The Prospects Table provides statistics on the number of subject/predicate/object data found in the triple store. It is currently a Map Reduce job that will run against the Apache Rya store and save all the statistics in the prospects table.
Deploy the extras/rya.prospector/target/rya.prospector-<version>-shade.jar
file to the hadoop cluster.
The prospector also requires a configuration file that defines where Accumulo is, which Rya table (has to be the SPO table) to read from, and which table to output to. (Note: Make sure you follow the same schema as the Rya tables (prospects table name: tableprefix_prospects)
A sample configuration file might look like the following:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>prospector.intable</name> <value>triplestore_spo</value> </property> <property> <name>prospector.outtable</name> <value>triplestore_prospects</value> </property> <property> <name>prospector.auths</name> <value>U,FOUO</value> </property> <property> <name>instance</name> <value>accumulo</value> </property> <property> <name>zookeepers</name> <value>localhost:2181</value> </property> <property> <name>username</name> <value>root</value> </property> <property> <name>password</name> <value>secret</value> </property> </configuration>
Run the command, filling in the correct information.
hadoop jar rya.prospector-3.0.4-SNAPSHOT-shade.jar org.apache.rya.prospector.mr.Prospector /tmp/prospectorConf.xml