Prospects Table

The Prospects Table provides counts of the subject, predicate, and object data found in the triple store. It is currently populated by a MapReduce job that runs against the Apache Rya store and saves all the statistics in the prospects table.

Build

Build the mmrts.git repository.

Run

Deploy the extras/rya.prospector/target/rya.prospector-<version>-shade.jar file to the Hadoop cluster.

The prospector also requires a configuration file that defines where Accumulo is, which Rya table to read from (this has to be the SPO table), and which table to write to. Note: make sure the output table follows the same naming schema as the Rya tables (prospects table name: tableprefix_prospects).

A sample configuration file might look like the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>prospector.intable</name>
        <value>triplestore_spo</value>
    </property>
    <property>
        <name>prospector.outtable</name>
        <value>triplestore_prospects</value>
    </property>
    <property>
        <name>prospector.auths</name>
        <value>U,FOUO</value>
    </property>
    <property>
        <name>instance</name>
        <value>accumulo</value>
    </property>
    <property>
        <name>zookeepers</name>
        <value>localhost:2181</value>
    </property>
    <property>
        <name>username</name>
        <value>root</value>
    </property>
    <property>
        <name>password</name>
        <value>secret</value>
    </property>
</configuration>
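
The naming convention described above (prospects table name: tableprefix_prospects) can be checked before the job is submitted. The following is a minimal sketch, assuming the input table name ends in _spo as in the sample configuration. The function name prospects_table_for is hypothetical and is not part of Rya.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rya): derive the expected prospects
# table name from an SPO table name, assuming the "<tableprefix>_spo"
# naming convention used in the sample configuration.
prospects_table_for() {
    intable="$1"
    case "$intable" in
        *_spo) ;;  # looks like an SPO table, continue
        *)
            echo "expected an SPO table name ending in _spo: $intable" >&2
            return 1
            ;;
    esac
    # Strip the trailing "_spo" and append "_prospects".
    prefix="${intable%_spo}"
    echo "${prefix}_prospects"
}

# Example: the value of prospector.intable from the sample configuration.
prospects_table_for triplestore_spo
```

The printed name should match the value given for prospector.outtable in the configuration file.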

Run the following command, substituting the version of the jar you built and the path to your configuration file.

hadoop jar rya.prospector-3.0.4-SNAPSHOT-shade.jar org.apache.rya.prospector.mr.Prospector /tmp/prospectorConf.xml