<!DOCTYPE html>
<html lang="en">
<head>
<title>Apache Jena - Apache Jena Elephas - RDF Stats Demo</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
<link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
<link href="/css/jena.css" rel="stylesheet" type="text/css">
<link rel="shortcut icon" href="/images/favicon.ico" />
<script src="https://code.jquery.com/jquery-2.2.4.min.js"
integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44="
crossorigin="anonymous"></script>
<script src="/js/jena-navigation.js" type="text/javascript"></script>
<script src="/js/bootstrap.min.js" type="text/javascript"></script>
<script src="/js/improve.js" type="text/javascript"></script>
</head>
<body>
<nav class="navbar navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/index.html">
<img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
<li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Tutorials</li>
<li><a href="/tutorials/index.html">Overview</a></li>
<li><a href="/documentation/fuseki2/index.html">Fuseki Triplestore</a></li>
<li><a href="/documentation/notes/index.html">How-To's</a></li>
<li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
<li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
<li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
<li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
<li class="divider"></li>
<li class="dropdown-header">References</li>
<li><a href="/documentation/index.html">Overview</a></li>
<li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
<li><a href="/documentation/rdfs/">Data with RDFS Inferencing</a></li>
<li><a href="/documentation/geosparql/index.html">GeoSPARQL</a></li>
<li><a href="/documentation/inference/index.html">Inference API</a></li>
<li><a href="/documentation/javadoc.html">Javadoc</a></li>
<li><a href="/documentation/ontology/">Ontology API</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/extras/querybuilder/index.html">Query Builder</a></li>
<li><a href="/documentation/rdf/index.html">RDF API</a></li>
<li><a href="/documentation/rdfconnection/">RDF Connection - SPARQL API</a></li>
<li><a href="/documentation/io/">RDF I/O</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/shacl/index.html">SHACL</a></li>
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/tdb/index.html">TDB</a></li>
<li><a href="/documentation/tdb2/index.html">TDB2</a></li>
<li><a href="/documentation/query/text-query.html">Text Search</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/documentation/javadoc.html">All Javadoc</a></li>
<li><a href="/documentation/javadoc/arq/">ARQ</a></li>
<li><a href="/documentation/javadoc_elephas.html">Elephas</a></li>
<li><a href="/documentation/javadoc/fuseki2/">Fuseki</a></li>
<li><a href="/documentation/javadoc/geosparql/">GeoSPARQL</a></li>
<li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
<li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
<li><a href="/documentation/javadoc/permissions/">Permissions</a></li>
<li><a href="/documentation/javadoc/extras/querybuilder/">Query Builder</a></li>
<li><a href="/documentation/javadoc/shacl/">SHACL</a></li>
<li><a href="/documentation/javadoc/tdb/">TDB</a></li>
<li><a href="/documentation/javadoc/text/">Text Search</a></li>
</ul>
</li>
<li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/getting_involved/index.html">Contribute</a></li>
<li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
<li class="divider"></li>
<li class="dropdown-header">Project</li>
<li><a href="/about_jena/about.html">About Jena</a></li>
<li><a href="/about_jena/architecture.html">Architecture</a></li>
<li><a href="/about_jena/citing.html">Citing</a></li>
<li><a href="/about_jena/team.html">Project team</a></li>
<li><a href="/about_jena/contributions.html">Related projects</a></li>
<li><a href="/about_jena/roadmap.html">Roadmap</a></li>
<li class="divider"></li>
<li class="dropdown-header">ASF</li>
<li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="row">
<div class="col-md-12">
<h1 class="title">Apache Jena Elephas - RDF Stats Demo</h1>
<p>The RDF Stats Demo is a pre-built application available as a ready-to-run Hadoop job JAR with all dependencies embedded. It uses the other Elephas libraries to calculate a number of basic statistics over any RDF data supported by Elephas.</p>
<p>To use it you will first need to build it from source or download the relevant Maven artefact:</p>
<pre><code>&lt;dependency&gt;
&lt;groupId&gt;org.apache.jena&lt;/groupId&gt;
&lt;artifactId&gt;jena-elephas-stats&lt;/artifactId&gt;
&lt;version&gt;x.y.z&lt;/version&gt;
&lt;classifier&gt;hadoop-job&lt;/classifier&gt;
&lt;/dependency&gt;
</code></pre>
<p>Where <code>x.y.z</code> is the desired version.</p>
<h1 id="pre-requisites">Pre-requisites</h1>
<p>To run this demo you need a Hadoop 2.x cluster. For simple experimentation, a <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html">single node cluster</a> is sufficient.</p>
<h1 id="running">Running</h1>
<p>Assuming your cluster is running and the <code>hadoop</code> command is available on your path, you can run the application without any arguments to see the help:</p>
<pre><code>&gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats
NAME
hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats - A
command which computes statistics on RDF data using Hadoop
SYNOPSIS
hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats
[ {-a | --all} ] [ {-d | --data-types} ] [ {-g | --graph-sizes} ]
[ {-h | --help} ] [ --input-type &lt;inputType&gt; ] [ {-n | --node-count} ]
[ --namespaces ] {-o | --output} &lt;OutputPath&gt; [ {-t | --type-count} ]
[--] &lt;InputPath&gt;...
OPTIONS
-a, --all
Requests that all available statistics be calculated
-d, --data-types
Requests that literal data type usage counts be calculated
-g, --graph-sizes
Requests that the size of each named graph be counted
-h, --help
Display help information
--input-type &lt;inputType&gt;
Specifies whether the input data is a mixture of quads and triples,
just quads or just triples. Using the most specific data type will
yield the most accurate statistics
This options value is restricted to the following value(s):
mixed
quads
triples
-n, --node-count
Requests that node usage counts be calculated
--namespaces
Requests that namespace usage counts be calculated
-o &lt;OutputPath&gt;, --output &lt;OutputPath&gt;
Sets the output path
-t, --type-count
Requests that rdf:type usage counts be calculated
--
This option can be used to separate command-line options from the
list of argument, (useful when arguments might be mistaken for
command-line options)
&lt;InputPath&gt;
Sets the input path(s)
</code></pre>
<p>For example, to calculate the node count on some data, run the following:</p>
<pre><code>&gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats --node-count --output /example/output /example/input
</code></pre>
<p>This calculates the node counts for the input data found in <code>/example/input</code> and places the generated counts in <code>/example/output</code>.</p>
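<p>To make the statistic concrete, here is a minimal Python sketch of what a node count computes. It is only an illustration, not the Elephas implementation (which runs as Hadoop jobs); the sample triples and the <code>count_nodes</code> helper are hypothetical.</p>

```python
from collections import Counter


def count_nodes(triples):
    """Count how often each RDF term appears, in any position of any triple."""
    counts = Counter()
    for triple in triples:
        # Subject, predicate and object each contribute one occurrence
        counts.update(triple)
    return counts


# Hypothetical example data: each triple is (subject, predicate, object)
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
]

node_counts = count_nodes(triples)
for node, count in sorted(node_counts.items()):
    print(node, count)
```

<p>Each distinct term ends up with a single total, regardless of whether it was used as a subject, predicate or object.</p>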
<h2 id="specifying-inputs-and-outputs">Specifying Inputs and Outputs</h2>
<p>Inputs are specified by providing one or more paths to the data you wish to analyse. You can also provide directory paths, in which case all files within the directory will be processed.</p>
<p>To specify the output location use the <code>-o</code> or <code>--output</code> option followed by the desired output path.</p>
<p>By default the demo application assumes the input is a mixture of quads and triples. If you know your data contains only triples or only quads, use the <code>--input-type</code> option followed by <code>triples</code> or <code>quads</code> to indicate the type of your data. Otherwise some statistics may be skewed, because with mixed input all triples are upgraded into quads when calculating the statistics.</p>
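<p>The following Python sketch illustrates one way such skew could arise. It assumes that an upgraded triple is placed in a placeholder graph and that the graph name is then counted like any other term. Both points are assumptions made for illustration; the graph name and the <code>upgrade_to_quad</code> helper are hypothetical and do not come from Elephas.</p>

```python
from collections import Counter

# Hypothetical placeholder for the graph name given to upgraded triples;
# the name actually used by Elephas is not stated in this document.
DEFAULT_GRAPH = "urn:example:default-graph"


def upgrade_to_quad(triple):
    """Turn (s, p, o) into (g, s, p, o) using the placeholder graph name."""
    return (DEFAULT_GRAPH,) + tuple(triple)


# Hypothetical triples-only input
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:alice"),
]

# Term counts when the data is treated as triples
as_triples = Counter(term for t in triples for term in t)
# Term counts when every triple is first upgraded into a quad
as_quads = Counter(term for t in triples for term in upgrade_to_quad(t))

# Terms that only show up because of the upgrade
extra_terms = set(as_quads) - set(as_triples)
print(extra_terms)
```

<p>Under these assumptions the counts for the original terms are unchanged, but an extra term appears that is not present in the source data, which is the kind of distortion that declaring the input type avoids.</p>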
<h2 id="available-statistics">Available Statistics</h2>
<p>The following statistics are available and are activated by the relevant command line option:</p>
<table>
<tr><th>Command Line Option</th><th>Statistic</th><th>Description &amp; Notes</th></tr>
<tr><td><code>-n</code> or <code>--node-count</code></td><td>Node Count</td><td>Counts the occurrences of each unique RDF term (a node, in Jena parlance)</td></tr>
<tr><td><code>-t</code> or <code>--type-count</code></td><td>Type Count</td><td>Counts the occurrences of each declared <code>rdf:type</code> value</td></tr>
<tr><td><code>-d</code> or <code>--data-types</code></td><td>Data Type Count</td><td>Counts the occurrences of each declared literal data type</td></tr>
<tr><td><code>--namespaces</code></td><td>Namespace Counts</td><td>Counts the occurrences of namespaces within the data.<br />Namespaces are determined by splitting URIs at the <code>#</code> fragment separator if present, otherwise at the last <code>/</code> character</td></tr>
<tr><td><code>-g</code> or <code>--graph-sizes</code></td><td>Graph Sizes</td><td>Counts the size of each named graph in the data</td></tr>
</table>
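<p>The namespace rule in the table can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the Elephas code. It assumes that the separator is kept as part of the namespace, that the first <code>#</code> is the split point, and that a URI with neither separator is returned unchanged; the <code>namespace_of</code> name is hypothetical.</p>

```python
def namespace_of(uri):
    """Return the namespace part of a URI using the rule from the table."""
    if "#" in uri:
        # Keep everything up to and including the fragment separator
        return uri[: uri.index("#") + 1]
    if "/" in uri:
        # Otherwise keep everything up to and including the last slash
        return uri[: uri.rindex("/") + 1]
    # No separator at all: returning the URI as-is is an assumption of this sketch
    return uri


# Hypothetical example URIs
for uri in ("http://example.org/ns#Person", "http://example.org/people/alice"):
    print(uri, "is in namespace", namespace_of(uri))
```

<p>Note that the <code>#</code> check comes first, so a URI containing both separators is split at the fragment rather than at a slash.</p>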
<p>You can also use the <code>-a</code> or <code>--all</code> option if you simply wish to calculate all statistics.</p>
</div>
</div>
</div>
<footer class="footer">
<div class="container" style="font-size:80%" >
<p>
Copyright &copy; 2011&ndash;2022 The Apache Software Foundation, Licensed under the
<a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
</p>
<p>
Apache Jena, Jena, the Apache Jena project logo, Apache and the Apache feather logos are trademarks of
The Apache Software Foundation.
<br/>
<a href="https://privacy.apache.org/policies/privacy-policy-public.html"
>Apache Software Foundation Privacy Policy</a>.
</p>
</div>
</footer>
<script type="text/javascript">
var link = $('a[href="' + this.location.pathname + '"]');
if (link != undefined)
link.parents('li,ul').addClass('active');
</script>
</body>
</html>