The complete forward-chaining inference task is performed by the class org.apache.rya.reasoning.mr.ReasoningDriver. This will run a series of MapReduce jobs on the input until no further inferences can be made. Input must be configured using properties given in a Hadoop configuration file and/or through command line parameters. Optional parameters control the output generated.
The following properties are all required to connect to Accumulo:
rya_
: the dataset to perform inference on)File input is not generally recommended, but is implemented for testing purposes. To read from a file instead of an Accumulo table, specify either input to use a file/directory on HDFS, or C) reasoning.inputLocal to use a local file/direcory (it will be uploaded to HDFS). This requires that the data can be loaded entirely in memory, and does not correctly handle bnodes in instance data. (Each individual MapReduce job parses the file again, yielding new identifiers. This means redundant derivations involving bnodes are not detected, potentially causing a loop.) Accumulo input is the most reliable option and should be preferred when possible.
The following properties are optional:
tmp/reasoning
)debug
(defaults to false)Useful MapReduce properties include (Hadoop 2 versions):
Given the following conf.xml
(assuming a running Accumulo instance with these parameters):
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>ac.instance</name> <value>dev</value> </property> <property> <name>ac.zk</name> <value>localhost:2181</value> </property> <property> <name>ac.username</name> <value>root</value> </property> <property> <name>ac.pwd</name> <value>root</value> </property> <property> <name>reasoning.workingDir</name> <value>example_output</value> </property> </configuration>
The following command will run the reasoner on the triples found in Rya table rya_spo
and produce file output in HDFS under example_output/final
:
hadoop jar target/rya.reasoning-3.2.10-SNAPSHOT-shaded.jar org.apache.rya.reasoning.mr.ReasoningDriver -conf conf.xml -Drdf.tablePrefix=rya_
Summary statistics will be printed to standard out, including the number of instance triples inferred and the number of inconsistencies found. By default, this information will be printed at each iteration's completion. If reasoning.stats is set to true, detailed statistics will be printed at the end instead.
The inferences themselves will be written to files under ${reasoning.workingDir}/final
, unless output was disabled. Triples are written as raw N-Triples, while inconsistencies are printed with their derivations. Triples and inconsistencies will be written to different files; each file will only exist if at least one of the corresponding fact types was generated.
If debug was set to true, data files and additional info for each step i
will be written to ${reasoning.workingDir}/step-${i}a
(output of the forward chaining reasoner for that step) and ${reasoning.workingDir}/step-${i}
(output of the duplicate elimination job for that step).
Some additional information can be found in the Hadoop logs if the log level was set to DEBUG (with the mapreduce.{map/reduce}.log.level properties).
A separate tool can run conformance tests specified according to the OWL test ontology. This tool loads appropriate tests from an input file (tests stated to apply to OWL RL and the RDF-based semantics), instantiates a Mini Accumulo Cluster, and runs each test. Each test contains a premise ontology which is used as input, and output is evaluated based on the type of test: inconsistency tests require that an inconsistency is found, entailment tests require that a conclusion ontology is entailed, etc. The tool considers a conclusion to be entailed if each triple in the conclusion is either explicitly produced by the reasoner, contained in the schema (since the current implementation is mainly concerned with consistency detection, it may correctly derive a schema fact but will not manifest it as a triple), or trivially implied (e.g. parsing a conclusion specified in XML using the <owl:Ontology>
tag will result in a triple [bnode] rdf:type owl:Ontology
, which the reasoner would not explicitly generate). Once all tests are run, either success or failure is reported for each. If a test has multiple types, e.g. both consistency and entailment, success requires passing all appropriate checks.
The command expects two arguments: the input file containing the tests, and a path to use for a temporary working directory for Mini Accumulo. Input format can be specified with rdf.format; defaults to RDF/XML. HDFS is still used for intermediate storage and reasoner output (reasoning.workingDir).
Example:
hadoop jar rya.reasoning-3.2.10-SNAPSHOT-shaded.jar org.apache.rya.reasoning.mr.ConformanceTest -conf conf.xml owl-test/profile-RL.rdf temp
Reasoning over full OWL 2 is generally N2EXPTIME-complete, with reasoning over the RDF-based semantics in particular being undecidable (see here for computational properties of different subsets). In practice, full OWL reasoners exist, but have limited scalability.
Most traditional reasoners use tableau algorithms, which maintain a set of logical assertions as a tree, and repeatedly reduce them to entail new assertions. Each time a reduction requires a nondeterministic choice -- e.g. A or B -- this corresponds to a branch in the tree. When a reduction results in an atomic assertion that would create a contradiction, the current branch is inconsistent, and the search for alternative reductions backtracks and chooses a new path. When all paths are eliminated, the original set of facts is necessarily inconsistent.
In the interests of scalability and distributed computation, we choose to use a simpler rule-based approach. The OWL RL profile was explicitly designed to support scalable if/then rules, so this is a natural choice. RL allows almost all of the OWL modeling vocabulary but its semantics are limited specifically so that nondeterministic inferences are never necessary. For example, no RL rule will imply that an individual must belong to one of two different classes. One might assert that c1 is a subclass of (c2 union c3), but this does not entail any inferences under RL. (However, if (c1 union c2) is a subclass of c3, the RL rules do entail that both c1 and c2 are subclasses of c3. The union construct can have implications in RL, but only deterministic ones.)
Our approach is similar to that of WebPie, with one notable difference being that we apply all rules on instance data in the same MapReduce job, using one standard partitioning scheme (by single node: a triple is partitioned based on its subject or object). (WebPie, by contrast, uses finer partitioning schemes tailored for specific sets of rules.)
Reasoning strategies can generally be classified as forward-chaining or backward-chaining (with some hybrid approaches). Forward-chaining systems generate the closure of the ontology: apply the inference rules over the whole knowledgebase, “materialize” any derived triples, and repeat until no new information can be inferred. Backward-chaining systems begin with a query and recursively apply rules that might result in the statisfaction of that query, inferring triples dynamically. Our current implementation is a forward-chaining batch job, suitable to offline inconsistency checking over the full dataset.
Consistency checking is implemented as a series of MapReduce jobs which repeatedly pass over the set of known triples (starting with an Accumulo table) to generate new triples and check for inconsistencies, and add any new triples to the set of known triples for the next iteration, until there are no new triples or inconsistencies generated.
Reasoning is distributed according to URI: Mappers read in triples from Accumulo and from previous iterations, and partition triples according to their subject and object. Reducers perform reasoning around one URI at a time, receiving as input only triples involving that URI. This enables us to use MapReduce, but requires making several passes through the data in cases where information needs to be propagated several steps through the graph (for example, with transitivity).
Many of the RL rules can be applied this way, if we can additionally assume that each reducer has access to the complete schema, or TBox. For example, if :c1
and :c2
are known to be disjoint classes, a reasoner responsible for processing triples involving node :x
can detect an inconsistency if it receives both :x rdf:type :c
and :x rdf:type :y
. But this only works if the disjoint class relationship was passed to that reducer, and if we process the input in parallel, the mapper that sees :c1 owl:disjointWith :c2
has no way of predicting which reducers will need this information. Therefore, we do an initial pass through the data to aggregate all semantically meaningful information about classes and properties (the schema), and ensure that this is sent to every reasoner. This assumes that the schema/TBox is very small relative to the instance data/ABox, and can be held in memory by each node.
There may be many ways to derive the same triple, and there may be cycles which could lead to circular derivations. An additional job is used to prune newly generated triples which are identical to previously known triples.
Together, this yields the following MapReduce jobs and workflow:
:s :p :o
and the schema has a domain for :p
, output <:s, (:s :p :o)>
so the reducer for :s
can apply the domain rule. If the triple has a range as well, also output <:o, (:s :p :o)>
).(:a :b :c)
and (:c :d :e)
are relevant to the reducer for :c
, then <:c, (:a :b :c)>
will be received before <:c, (:c :d :e)>
.) This allows joins to use less memory than they would otherwise need.The SchemaFilter MapReduce job maps all schema-relevant triples to one reducer, which constructs a Schema object incorporating the schema relationships: first by sending the triples to the schema object one by one, and then by applying schema inference rules to add inferred schema information.
SchemaFilter: # MapReduce job Map: for triple in input: if isSchemaTriple(triple): output(null, triple) Reduce(null, triples): # Single reducer schema <- new Schema() for triple in triples: schema.processTriple(triple) schema.computeClosure() output(schema) to file
Where:
isSchemaTriple(triple): (s p o) <- triple if (p is rdf:type AND o in { owl:TransitiveProperty, owl:IrreflexiveProperty, owl:SymmetricProperty, ... }) OR (p in { rdfs:subClassOf, rdfs:subPropertyOf, owl:inverseOf, owl:disjointWith, ... }): return true else: return false
Schema.processTriple(s, p, o): (s p o) <- triple # If this is a schema-relevant type assignment if p is rdf:type: if o in { owl:TransitiveProperty, owl:IrreflexiveProperty, ... }: # then s is a property with that characteristic properties[s] <- initialize property s if necessary if o == owl:TransitiveProperty: properties[s].transitive <- true ... # Repeat for each characteristic # If this connects two schema constructs else if p in { rdfs:subClassOf, owl:disjointWith, owl:onProperty, ... }: initialize s if necessary (class or property, depending on p) initialize o if necessary (class or property, depending on p) # Connect s to o according to p: switch(p): case rdfs:subClassOf: classes[s].superclasses.add(classes[o]) case rdfs:subPropertyOf: properties[s].superproperties.add(properties[o]) case rdfs:domain: properties[s].domain.add(classes[o]) case rdfs:range: properties[s].range.add(classes[o]) # Rules scm-eqc1, scm-eqc2: equivalent class is equivalent to mutual subClassOf: case owl:equivalentClass: classes[s].superclasses.add(classes[o]) classes[o].superclasses.add(classes[s]) # Rules scm-eqp1, scm-eqp2: equivalent property is equivalent to mutual subPropertyOf: case owl:equivalentProperty: properties[s].superproperties.add(properties[o]) properties[o].superproperties.add(properties[s]) # inverse, disjoint, complement relations are all symmetric: case owl:inverseOf: properties[s].inverseproperties.add(properties[o]) properties[o].inverseproperties.add(properties[s]) case owl:disjointWith: classes[s].disjointclasses.add(o) classes[o].disjointclasses.add(s) case owl:complementOf: classes[s].complementaryclasses.add(o) classes[o].complementaryclasses.add(s) case owl:propertyDisjointWith: properties[s].disjointproperties.add(properties[o]) properties[o].disjointproperties.add(properties[s]) case owl:onProperty: classes[s].onProperty.add(properties[o]) # Link properties back to their property restrictions properties[o].addRestriction.(properties[s]) case owl:someValuesFrom: classes[s].someValuesFrom.add(o) case owl:allValuesFrom: classes[s].allValuesFrom.add(o) case owl:hasValue: classes[s].hasValue.add(o) case owl:maxCardinality: classes[s].maxCardinality <- (int) o case owl:maxQualifiedCardinality: classes[s].maxQualifiedCardinality <- (int) o case owl:onClass: classes[s].onClass.add(classes[o])
computeSchemaClosure(): # Rule scm-spo states that owl:subPropertyOf relationships are transitive. # For each property, do a BFS to find all its superproperties. for p in properties: frontier <- { sp | sp in p.superproperties } while frontier is not empty: ancestors <- UNION{ f.superproperties | f in frontier } p.superproperties <- p.superproperties UNION ancestors frontier <- UNION{ a.superproperties | a in ancestors } - p.superproperties # Rules scm-hv, scm-svf2, and scm-avf2 use sub/superproperty information to # infer sub/superclass relations between property restrictions: for c1, c2 in classes: # Rules apply if c1 is a restriction on a subproperty and c2 is a restriction on a superproperty. if c1 is property restriction AND c2 is property restriction AND c1.onProperty.superproperties contains c2.onProperty: if v exists such that c1.hasValue(v) and c2.hasValue(v): c1.superclasses.add(c2) else if c3 exists such that c1.someValuesFrom(c3) and c2.someValuesFrom(c3): c1.superclasses.add(c2) if c3 exists such that c1.allValuesFrom(c3) and c2.allValuesFrom(c3): c2.superclasses.add(c1) # Repeatedly apply rules that operate on the subclass graph until complete: do: # scm-sco: subclass relationship is transitive compute owl:subClassOf hierarchy # same BFS algorithm as owl:subPropertyOf # scm-svf1, scm-avf1: restrictions on subclasses for p in properties: for r1, r2 in property restrictions on p: if c1, c2 exist such that c1.superclasses contains c2: AND ( (r1.someValuesFrom(c1) AND r2.someValuesFrom(c2)) OR (r1.someValuesFrom(c1) AND r2.someValuesFrom(c2)) ): r1.superclasses.add(r2) until no new information generated # Once all subclass and subproperty connections have been made, use them to # apply domain and range inheritance (scm-dom1, scm-dom2, scm-rng1, scm-rng2): for p in properties: for parent in p.superproperties: p.domain <- p.domain UNION parent.domain p.range <- p.range UNION parent.range for c in p.domain: p.domain <- p.domain UNION c.superclasses for c in p.range: p.range <- p.range UNION c.superclasses
The ForwardChain MapReduce job scans through both the input triples and any triples that have been derived in any previous iterations, maps triples according to ther subject or object (depending on which might be able to use them for reasoning: can be both or neither, resulting in 0-2 map outputs per triple), runs local reasoners around individual resources, and saves all generated triples and inconsistencies to an intermediate output directory.
Main ForwardChain logic:
ForwardChain(i): # MapReduce job, iteration i Setup: distribute schema file to each node Map(triples): # from input or previous job schema <- load from file for fact in facts: (s p o) <- fact.triple relevance <- getRelevance(fact.triple, schema) if relevance.relevantToSubject == true: output(s, fact) if relevance.relevantToObject == true: output(o, fact) Group (node, fact) according to node; Sort such that incoming edges (with respect to node) come first: (c, (a b c)) < (c, (c d e)) Reduce(node, facts): # from map output schema <- load from file reasoner <- new LocalReasoner(node, i) for fact in facts: reasoner.processFact(fact) output(node.collectOutput)
Where a fact consists of a triple, iteration generated (if any), and sources (if any), and where:
getRelevance(triple, schema): # Determine relevance according to rules and their implementations in the reasoner (s p o) <- triple if p is rdf:type: # If this triple is a schema-relevant type assignment if o in schema.classes: relevantToSubject <- true if p in schema.properties: if schema.properties[p].domain is not empty: relevantToSubject = true if schema.properties[p].range is not empty: relevantToObject = true if schema.properties[p].isTransitive: relevantToSubject = true relevantToObject = true ... # Check for all implemented rules
Some examples of LocalReasoner rule application:
processFact(fact): (s p o) <- fact.triple newFacts = {} if reasoner.node == o: # Limit recursion on self-referential edges unless s==o AND fact.iteration == reasoner.currentIteration: newFacts <- newFacts UNION processIncomingEdge(fact) if reasoner.node == s: if p is rdf:type: newFacts <- newFacts UNION processType(fact) else: newFacts <- newFacts UNION processOutgoingEdge(fact) results = {} for newFact in newFacts: results <- results UNION processFact(newFact) return results processIncomingEdge(fact): (s p o) <- fact.triple # Range rule for c in schema.properties[p].range: generate fact ((reasoner.node rdf:type c), source:fact, iteration:reasoner.currentIteration) # Inverse property rule for p2 in schema.properties[p].inverseOf: generate fact ((reasoner.node p2 s), source:fact, iteration:reasoner.currentIteration) # Symmetric property rule if schema.properties[p].isSymmetric: generate fact ((reasoner.node p s), source:fact, iteration:reasoner.currentIteration) # Irreflexive property rule if schema.properties[p].isIrreflexive: if s == o: generate inconsistency (source:fact, iteration:reasoner.currentIteration) # Transitive property rule (part 1/2) if schema.properties[p].isTransitive: if fact.distance() >= 2^(reasoner.currentIteration-1): # smart transitive closure reasoner.transitiveEdges.put(p, fact) ... # Apply each supported rule applying to incoming edges processOutgoingEdge(fact): (s p o) <- fact.triple # Domain rule for c in schema.properties[p].domain: generate fact ((reasoner.node rdf:type c), source:fact, iteration:reasoner.currentIteration) # Transitive property rule (part 2/2) if schema.properties[p].isTransitive: for leftFact in reasoner.transitiveEdges[p]: (s0 p node) <- leftFact generate fact ((s0 p o), source:{fact, leftFact}, iteration:reasoner.currentIteration) ... # Apply each supported rule applying to outgoing edges processType(fact): (reasoner.node rdf:type c) <- fact.triple reasoner.typeFacts.put(c, fact) # Disjoint class rule for c2 in schema.classes[c].disjointclasses: if reasoner.typeFacts contains c2: generate inconsistency (source:{fact, reasoner.typeFacts[c]}, iteration:reasoner.currentIteration) # Subclass rule for parent in schema.classes[c].superclasses: processType(fact((reasoner.node rdf:type parent), source:fact, iteration:reasoner.currentIteration)) ... # Apply each supported rule applying to types
The main reasoning logic is located in org.apache.rya.reasoning, while MapReduce tools and utilities for interacting with Accumulo are located in org.apache.rya.reasoning.mr. Reasoning logic makes use of RDF constructs in the org.openrdf.model API, in particular: Statement, URI, Resource, and Value.
OWL2: In general, the Sesame/openrdf API is used to represent RDF constructs and refer to the RDF, RDFS, and OWL vocabularies. However, the API only covers OWL 1 constructs. The OWL2 class contains static URIs for new OWL 2 vocabulary resources: owl:IrreflexiveProperty, owl:propertyDisjointWith, etc.
OwlRule: Enum containing the RL Rules implemented.
Schema: Schema represents the complete schema/TBox/RBox as a set of Java objects. A schema consists of properties, classes, and their relationships. The Schema object maps URIs to instances of OwlProperty and Resources to instances of OwlClass.
The schema is built one triple at a time: the Schema takes in a single triple, determines what kind of schema information it represents, and instantiates and/or connects class and/or property objects accordingly. Some RL rules are applied during this process. (For example, the rule that an equivalentClass relationship is equivalent to mutual subClassOf relationships). The remainder of the RL schema rules (those that require traversing multiple levels of the schema graph) must be applied explicitly. Invoking the closure() method after all triples are incorporated to fully apply these rules. (For example, the rules that infer a property's domain based on the domains of its superproperties and/or the superclasses of its domains.)
Once the schema has been constructed, it can be used in reasoning. The Schema object serves as a repository for OwlClass and OwlProperty objects, accessed by their URIs (or Resources), which in turn contain detailed schema information about themselves.
The Schema class also provides static method isSchemaTriple for checking whether a triple contains schema-relevant information at all.
An OwlProperty or OwlClass represents a property or a class, respectively. Each object holds a reference to the RDF entity that identifies it (using to the openrdf api): a URI for each OwlProperty, and a Resource for each class (because a class is more general; it can be a URI or a bnode).
Both maintain connections to other schema constructs, according to the relevant schema connections that can be made for each type. In general, these correspond to outgoing edges with the appropriate schema triple. Internally, they are represented as sets of those entities they are connected to, one set for each type of relationship. (For example, each OwlClass contains a set of its superclasses.) These are accessed via getter methods that return sets of URIs or Resources, as appropriate. Both also apply some RL schema rules. Some rules are applied by the getter methods (for example, every class being its own subclass), while others must be explicitly invoked by the Schema (for example, determining that one property restriction is a subclass of another based on the relationships between their corresponding properties and classes).
OwlProperty also includes boolean fields corresponding to whether the property is transitive, symmetric, asymmetric, etc. (All default to false.) An OwlProperty also contains a set of references to any property restrictions that apply to it.
In addition to common connections to other classes like superclasses and disjoint classes, OwlClass instances can also represent property restriction information. This is represented in the same way as the other schema connections: a set of classes corresponding to any allValuesFrom triples, a set of classes corresponding to any someValuesFrom triples, etc. Typically, only one such connection should exist (plus one onProperty relationship to specify what property this restriction applies to), but this representation is used for the sake of generality. (In principle, an RDF graph could contain two such triples, though the semantic implications aren't well-defined.)
Fact and Derivation: A Fact represents a piece of information, typically a triple, that either was contained in the input graph or was inferred by some rule. An inferred fact contains a Derivation explaining how it was inferred. A Derivation represents an application of a rule. It contains the OwlRule used, the set of Facts that caused it to apply, the Resource corresponding to the reasoner that made the inference, and the iteration number corresponding to when the inference was made. Together, they can represent an arbitrarily complex derivation tree. Both classes contain some methods for interacting with this tree.
Local reasoners operate on Facts and Derivations so that they can make use a triple's sources as well as its contents. MapReduce tools use Fact and Derivation for their inputs and outputs, so both implement WritableComparable. Derivations are compared based on their rules and direct sources. Facts are compared based only on their direct contents (i.e. their triples), unless neither have contents, in which case they are compared by their Derivations.
AbstractReasoner: Base class for local reasoners. The AbstractReasoner doesn't contain the semantics of any reasoning rule, but contains methods for collecting and returning derived triples and inconsistencies, represented as Fact and Derivation objects respectively. Every reasoner is centered around one specific node in the graph.
LocalReasoner: Main reasoning class for miscellaneous RL instance rules. A LocalReasoner processes triples one at a time, collecting any new facts it generates. The LocalReasoner is concerned with reasoning around one main node, and can use triples only if they represent edges to or from that node. Not all edges are useful: static methods are provided to determine whether a triple is relevant for its subject‘s reasoner, its object’s reasoner, both, or neither. Relevance is determined based on the schema and based on how the reasoner applies rules to its incoming and outgoing edges (incoming and outgoing edges can trigger different rules). An arbitrary triple may not be relevant to anything if the schema contains no information about the property. If that property has a range, it will be considered relevant to the object (i.e. the reasoner for the object will be able to apply the range rule). If the property is transitive, it will be relevant to both subject and object.
When the reasoner receives a triple, it determines the direction of the edge, checks it against the appropriate set of RL rules, applies those rules if necessary, and recursively processes any facts newly generated. The LocalReasoner will perform as much inference as it can in its neighborhood, but many triples may ultimately need to be passed to reasoners for other nodes in order to produce all implied information.
Most RL rules are handled in the LocalReasoner object itself, but it also contains a TypeReasoner specifically responsible for those rules involving the central node‘s types. Unlike other facts, the type facts aren’t collected automatically and must be explicitly gathered with a call to collectTypes. (Collecting type assignments one at a time would lead to a great deal of redundancy.)
TypeReasoner: Conducts reasoning having to do with a single node's types. Several rules involve types, so they are all handled in the same place. The TypeReasoner contains a set of types for its central node, which allows it to make sure the same type is not repeatedly generated and output during one iteration. (This is in contrast to the way generic triples are handled, because not all information can be cached this way, and because it is particularly common to be able to derive the same type many times; e.g. the domain rule will generate the same type assignment for every outgoing edge with the associated predicate.)
Also has the ability to hold information that should be asserted if and only if the node turns out to be a particular type, and return that information if/when that happens. (Particularly useful for property restriction rules.)
Contains MapReduce tools and utilities for interacting with HDFS, Accumulo, and Rya tables in Accumulo.
In between jobs, different types of intermediate data can be saved to different files using MultipleOutputs: relevant intermediate triples (Fact objects); other instance triples (Facts that were generated but which could not be used to apply any rules at this point, based on the Schema); schema triples (derived Fact objects which contain schema information -- this shouldn't generally happen, but if it does, it will trigger a regeneration of the global Schema); and inconsistencies (Derivation objects).
At each iteration, ForwardChain will output generated information to its output directory, and then DuplicateElimination will output a subset of that information to its output directory. Each job uses as input one directory from each previous iteration: the subset with duplicates eliminated, if found, or the original set of generated information otherwise.
The individual MapReduce jobs each have different mappers for different kinds of input: Accumulo (main source of initial input, takes in <org.apache.accumulo.core.data.Key, org.apache.accumulo.core.data.Value>
), RDF file (alternative initial input source, takes in <LongWritable, RyaStatementWritable>
), and HDFS sequence file (intermediate output/input, takes in <Fact, NullWritable>
or <Derivation, NullWritable>
).
ReasoningDriver: Main driver class for reasoning application. Executes high-level algorithm by running SchemaFilter, ForwardChain, DuplicateElimination, and OutputTool.
MRReasoningUtils: Defines configuration parameters and contains static methods for configuring input and output (using those parameters) and for passing information between jobs (using those parameters).
ResourceWritable: WritableComparable wrapper for org.openrdf.model.Resource, so it can be used as a key/value in MapReduce tasks. Also contains an integer field to enable arbitrary secondary sort. Provides static classes PrimaryComparator to use the Resource alone, and SecondaryComparator to use resource followed by key.
SchemaWritable: Writable subclass of Schema, allowing the schema to be directly output by a MapReduce job.
ConformanceTest: Reads and executes OWL conformance tests read from an RDF file, and reports the results.
AbstractReasoningTool: Base class for individual MapReduce jobs. Contains methods for configuring different kinds of input and output, loading the schema, defining and interacting with counters (e.g. number of instance triples derived during a job), and loading and distributing schema information to all nodes.
SchemaFilter: First MapReduce job to run. Runs with only one reducer.
<Key, Value>
, <LongWritable, RyaStatementWritable>
, <Fact, NullWritable>
<NullWritable, Fact>
<NullWritable, SchemaWritable>
ForwardChain: MapReduce job responsible for actual forward-chaining reasoning.
<Key, Value>
, <LongWritable, RyaStatementWritable>
, <Fact, NullWritable>
<ResourceWritable, Fact>
<Fact, NullWritable>
, <Derivation, NullWritable>
DuplicateElimination: MapReduce job responsible for eliminating redundant information. This includes redundant inconsistencies, so unlike ForwardChain, this job needs to take inconsistencies (Derivations) as input.
<Key, Value>
, <LongWritable, RyaStatementWritable>
, <Fact, NullWritable>
, <Derivation, NullWritable>
<Fact, Derivation>
<Fact, NullWritable>
, <Derivation, NullWritable>
OutputTool: MapReduce job that collects output from all previous iterations in one place.
<Fact, NullWritable>
, <Derivation, NullWritable>
<Text, Text>
<NullWritable, Text>
RunStatistics: Simple tool to collect statistics from each job executed during the run. Uses Hadoop counters to get the number of input/output triples, number of mappers and reducers, time spent by each, memory, input and output records, etc.