Any23 Plugins

This is the root directory of the Any23 Plugins module.

A plugin is an extension of the Any23 core that can be plugged in using the Plugin Manager.

Basic Crawler

A CLI tool that extends the Rover CLI with crawler-specific capabilities.

HTML Scraper

The HTML scraper converts any HTML page into triples containing the text scraped from the page.

Office Scraper

The Office scraper converts the main MS Office-compatible formats into triples.

Integration Test

This module contains the integration tests for all the defined plugins.

Generate Plugin Packaging

To generate a plugin package, navigate to the plugin's directory and run:

mvn package

For example, to generate the basic-crawler plugin package:

$ cd $ANY23_HOME/plugins/basic-crawler
$ mvn package
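To package several plugins in one go, the per-plugin steps can be wrapped in a loop. This is a minimal sketch, not a script shipped with Any23: the `build_plugins` function is hypothetical, and it assumes `ANY23_HOME` points at the checkout that contains the `plugins/` directory shown above.

```shell
# Hypothetical helper (not part of Any23): runs `mvn package` in each named
# plugin directory under $ANY23_HOME/plugins, in the order given.
build_plugins() {
  local plugin
  for plugin in "$@"; do
    echo "Packaging ${plugin}..."
    # The subshell keeps `cd` from changing the caller's working directory;
    # `${ANY23_HOME:?}` aborts that plugin's step if ANY23_HOME is unset.
    # `|| return 1` stops at the first plugin whose cd or build fails.
    (cd "${ANY23_HOME:?ANY23_HOME must be set}/plugins/${plugin}" && mvn package) || return 1
  done
}
```

For example, `build_plugins basic-crawler html-scraper office-scraper` packages those three plugins one after another; each still produces its own `target/` directory as described below.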

This produces the following layout in the basic-crawler directory:

|-- pom.xml
|-- src
|   |-- main
|   |   |-- assembly
|   |   `-- java
|   `-- test
`-- target
    |-- any23-basic-crawler-${version}.jar
    |-- apache-any23-basic-crawler-${version}-bin.tar.gz <<<
    |-- apache-any23-basic-crawler-${version} <<<
    |-- archive-tmp
    |-- classes
    |   |-- META-INF
    |   `-- org
    |-- generated-sources
    |-- maven-archiver
    |-- maven-shared-archive-resources
    |-- surefire
    |-- surefire-reports
    `-- test-classes

Plugin-specific READMEs, which contain much more detailed information, can be found in the distribution artifacts generated under ./target (the .tar.gz or .zip archive and its unpacked directory, marked above with ‘<<<’).
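A README can also be read straight out of the tarball without unpacking it. The sketch below is illustrative only: `show_plugin_readmes` is a hypothetical helper, and it assumes the README files are simply members whose file name contains "readme" (case-insensitive); the exact names inside a given plugin archive may differ.

```shell
# Hypothetical helper (not part of Any23): prints every README-like member of
# a .tar.gz distribution archive to stdout, each preceded by a "==>" header.
show_plugin_readmes() {
  local archive="$1" member
  # -t lists members without extracting; keep those whose last path
  # component contains "readme" (case-insensitive), skipping directories.
  tar -tzf "$archive" | grep -i 'readme[^/]*$' | while IFS= read -r member; do
    echo "==> ${member}"
    # -O writes the extracted member to stdout instead of to disk.
    tar -xzOf "$archive" "$member"
  done
}
```

Run it from the plugin directory against the generated archive, for example `show_plugin_readmes target/apache-any23-basic-crawler-*-bin.tar.gz`, where the glob stands in for the actual `${version}`.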