<!DOCTYPE html>
<html lang="en">
<head>
<title>Apache Jena - SAX Input into Jena and ARP</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
<link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
<link href="/css/jena.css" rel="stylesheet" type="text/css">
<link rel="shortcut icon" href="/images/favicon.ico" />
<script src="https://code.jquery.com/jquery-2.2.4.min.js"
integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44="
crossorigin="anonymous"></script>
<script src="/js/jena-navigation.js" type="text/javascript"></script>
<script src="/js/bootstrap.min.js" type="text/javascript"></script>
<script src="/js/improve.js" type="text/javascript"></script>
</head>
<body>
<nav class="navbar navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/index.html">
<img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
<li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Tutorials</li>
<li><a href="/tutorials/index.html">Overview</a></li>
<li><a href="/documentation/fuseki2/index.html">Fuseki Triplestore</a></li>
<li><a href="/documentation/notes/index.html">How-To's</a></li>
<li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
<li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
<li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
<li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
<li class="divider"></li>
<li class="dropdown-header">References</li>
<li><a href="/documentation/index.html">Overview</a></li>
<li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
<li><a href="/documentation/rdfs/">Data with RDFS Inferencing</a></li>
<li><a href="/documentation/geosparql/index.html">GeoSPARQL</a></li>
<li><a href="/documentation/inference/index.html">Inference API</a></li>
<li><a href="/documentation/javadoc.html">Javadoc</a></li>
<li><a href="/documentation/ontology/">Ontology API</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/extras/querybuilder/index.html">Query Builder</a></li>
<li><a href="/documentation/rdf/index.html">RDF API</a></li>
<li><a href="/documentation/rdfconnection/">RDF Connection - SPARQL API</a></li>
<li><a href="/documentation/io/">RDF I/O</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/shacl/index.html">SHACL</a></li>
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/tdb/index.html">TDB</a></li>
<li><a href="/documentation/tdb2/index.html">TDB2</a></li>
<li><a href="/documentation/query/text-query.html">Text Search</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/documentation/javadoc.html">All Javadoc</a></li>
<li><a href="/documentation/javadoc/arq/">ARQ</a></li>
<li><a href="/documentation/javadoc_elephas.html">Elephas</a></li>
<li><a href="/documentation/javadoc/fuseki2/">Fuseki</a></li>
<li><a href="/documentation/javadoc/geosparql/">GeoSPARQL</a></li>
<li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
<li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
<li><a href="/documentation/javadoc/permissions/">Permissions</a></li>
<li><a href="/documentation/javadoc/extras/querybuilder/">Query Builder</a></li>
<li><a href="/documentation/javadoc/shacl/">SHACL</a></li>
<li><a href="/documentation/javadoc/tdb/">TDB</a></li>
<li><a href="/documentation/javadoc/text/">Text Search</a></li>
</ul>
</li>
<li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/getting_involved/index.html">Contribute</a></li>
<li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
<li class="divider"></li>
<li class="dropdown-header">Project</li>
<li><a href="/about_jena/about.html">About Jena</a></li>
<li><a href="/about_jena/architecture.html">Architecture</a></li>
<li><a href="/about_jena/citing.html">Citing</a></li>
<li><a href="/about_jena/team.html">Project team</a></li>
<li><a href="/about_jena/contributions.html">Related projects</a></li>
<li><a href="/about_jena/roadmap.html">Roadmap</a></li>
<li class="divider"></li>
<li class="dropdown-header">ASF</li>
<li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="row">
<div class="col-md-12">
<h1 class="title">SAX Input into Jena and ARP</h1>
<p>Normally, both ARP and Jena are used to read files either from the
local machine or from the Web. A different use case, addressed
here, is when the XML source is available in-memory in some way. In
these cases, ARP and Jena can be used as a SAX event handler,
turning SAX events into triples, or a DOM tree can be parsed into a
Jena Model.</p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#1-overview">Overview</a></li>
<li><a href="#sample-code">Sample Code</a></li>
<li><a href="#initializing-sax-event-source">Initializing SAX event source</a></li>
<li><a href="#error-handler">Error Handler</a></li>
<li><a href="#options">Options</a></li>
<li><a href="#xml-lang-and-namespaces">XML Lang and Namespaces</a></li>
<li><a href="#using-your-own-triple-handler">Using your own triple handler</a></li>
<li><a href="#using-a-dom-as-input">Using a DOM as Input</a></li>
</ul>
<h2 id="1-overview">Overview</h2>
<p>To read an arbitrary SAX source as triples to be added into a Jena
model, it is not possible to use a
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#read(java.io.InputStream,%20java.lang.String)"><code>Model.read()</code></a>
operation. Instead, you construct a SAX event handler of class
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html"><code>SAX2Model</code></a>,
using the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html#create(java.lang.String,%20org.apache.jena.rdf.model.Model)"><code>create</code></a>
method, install it as the handler on your SAX event source, and
then stream the SAX events. It is possible to have fine-grained
control over the SAX events, for instance, by inserting or deleting
events, before passing them to the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html"><code>SAX2Model</code></a>
handler.</p>
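<p>As an illustration of deleting events before they reach the handler,
the following is a minimal sketch that uses the JDK class
<code>XMLFilterImpl</code> to forward only the element and character
events inside an <code>rdf:RDF</code> element. The class name
<code>RdfSubtreeFilter</code> and the method
<code>forwardedElements</code> are hypothetical, and the recording
handler is only a stand-in for a <code>SAX2Model</code>; no Jena class
is used.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: drops element and character events outside rdf:RDF. */
class RdfSubtreeFilter extends XMLFilterImpl {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Nesting depth counted from the rdf:RDF element; 0 means "outside".
    private int depth = 0;

    RdfSubtreeFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || (RDF_NS.equals(uri) && "RDF".equals(local))) {
            depth++;
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            super.endElement(uri, local, qName);
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth > 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses xml through the filter and returns the local names seen downstream. */
    static List<String> forwardedElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> seen = new ArrayList<>();
        RdfSubtreeFilter filter = new RdfSubtreeFilter(reader);
        // Stand-in for the real downstream handler (assumption: a SAX2Model in practice).
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add(local);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

<p>Other events, such as prefix mappings and document start and end,
pass through this filter unchanged.</p>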
<h2 id="sample-code">Sample Code</h2>
<p>This code uses the Xerces parser as the SAX event source, and adds
the triples to a
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html"><code>Model</code></a> using
default options.</p>
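<pre><code>// Jena package names follow the Javadoc links on this page.
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.jena.rdf.arp.SAX2Model;
import org.apache.jena.rdf.arp.SAX2RDF;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// The statements below belong in a method that declares
// IOException and SAXException.

// Use your own SAX source.
XMLReader saxParser = new SAXParser();
// base URI, declared before its first use
String base = &quot;http://example.org/&quot;;
// set up SAX input
InputStream in = new FileInputStream(&quot;kb.rdf&quot;);
InputSource ins = new InputSource(in);
ins.setSystemId(base);
Model m = ModelFactory.createDefaultModel();
// create handler, linked to Model
SAX2Model handler = SAX2Model.create(base, m);
// install handler on SAX event stream
SAX2RDF.installHandlers(saxParser, handler);
try {
    try {
        saxParser.parse(ins);
    } finally {
        // MUST ensure handler is closed.
        handler.close();
    }
} catch (SAXParseException e) {
    // Fatal parsing errors end here,
    // but they will already have been reported.
}
</code></pre>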
<h2 id="initializing-sax-event-source">Initializing SAX event source</h2>
<p>If your SAX event source implements <code>XMLReader</code>, then the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html#installHandlers(org.xml.sax.XMLReader,%20org.apache.jena.rdf.arp.XMLHandler)">installHandlers</a>
static method can be used as shown in the sample. Otherwise, you
have to do it yourself. The
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html#installHandlers(org.xml.sax.XMLReader,%20org.apache.jena.rdf.arp.XMLHandler)"><code>installHandlers</code></a>
code is like this:</p>
<pre><code>static public void installHandlers(XMLReader rdr, XMLHandler sax2rdf)
        throws SAXException
{
    rdr.setEntityResolver(sax2rdf);
    rdr.setDTDHandler(sax2rdf);
    rdr.setContentHandler(sax2rdf);
    rdr.setErrorHandler(sax2rdf);
    rdr.setFeature(&quot;http://xml.org/sax/features/namespaces&quot;, true);
    rdr.setFeature(
        &quot;http://xml.org/sax/features/namespace-prefixes&quot;,
        true);
    rdr.setProperty(
        &quot;http://xml.org/sax/properties/lexical-handler&quot;,
        sax2rdf);
}
</code></pre>
<p>For some other SAX source, the exact code will differ, but the
required operations are as above.</p>
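<p>To see what these operations do on a reader that was not created
directly from Xerces, the sketch below applies them to an
<code>XMLReader</code> obtained through JAXP. It uses only JDK classes:
<code>DefaultHandler2</code> stands in for the <code>SAX2RDF</code>
handler, and the names <code>HandlerInstallSketch</code>,
<code>RecordingHandler</code>, <code>install</code> and
<code>parse</code> are hypothetical.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class HandlerInstallSketch {

    /** Records attribute names and comments; a stand-in for a SAX2RDF instance. */
    static class RecordingHandler extends DefaultHandler2 {
        final List<String> attributeNames = new ArrayList<>();
        int comments = 0;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            for (int i = 0; i < atts.getLength(); i++) {
                attributeNames.add(atts.getQName(i));
            }
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            comments++;
        }
    }

    /** Performs the same operations as installHandlers, on any XMLReader. */
    static void install(XMLReader rdr, DefaultHandler2 handler) throws SAXException {
        rdr.setEntityResolver(handler);
        rdr.setDTDHandler(handler);
        rdr.setContentHandler(handler);
        rdr.setErrorHandler(handler);
        rdr.setFeature("http://xml.org/sax/features/namespaces", true);
        // With this feature on, xmlns declarations are also reported as attributes.
        rdr.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        // The lexical handler receives comments, CDATA boundaries and DTD events.
        rdr.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
    }

    /** Parses xml with a JAXP-provided reader and returns what the handler recorded. */
    static RecordingHandler parse(String xml) throws Exception {
        XMLReader rdr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingHandler handler = new RecordingHandler();
        install(rdr, handler);
        rdr.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```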
<h2 id="error-handler">Error Handler</h2>
<p>The <a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html">SAX2Model</a>
handler supports the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html#setErrorHandler(org.apache.jena.rdf.model.RDFErrorHandler)">setErrorHandler</a>
method, from the Jena
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html">RDFReader</a>
interface. This is used in the same way as that method to control
error reporting.</p>
<p>A specific fatal error, new in Jena 2.3, is ERR_INTERRUPTED, which
indicates that the current Thread received an interrupt. This
allows long jobs to be aborted on user request.</p>
<h2 id="options">Options</h2>
<p>The <a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html"><code>SAX2Model</code></a>
handler supports the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html#setProperty(java.lang.String,%20java.lang.Object)"><code>setProperty</code></a>
method, from the Jena
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html"><code>RDFReader</code></a>
interface. This is used in nearly the same way to give fine-grained
control over ARP's behaviour, particularly over error reporting; see
the <a href="iohowto.html#arp_properties">I/O howto</a>. Setting SAX or
Xerces properties cannot be done using this method.</p>
<h2 id="xml-lang-and-namespaces">XML Lang and Namespaces</h2>
<p>If you are only treating some document subset as RDF/XML then it is
necessary to ensure that ARP knows the correct value for <code>xml:lang</code>
and desirable that it knows the correct mappings of namespace
prefixes.</p>
<p>There is a second version of the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2Model.html#create(java.lang.String,%20org.apache.jena.rdf.model.Model,%20java.lang.String)"><code>create</code></a>
method, which allows specification of the <code>xml:lang</code> value from the
outer context. If this is inappropriate it is possible, but hard
work, to synthesize an appropriate SAX event.</p>
<p>For the namespace prefixes, it is possible to call the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html#startPrefixMapping(java.lang.String,%20java.lang.String)"><code>startPrefixMapping</code></a>
method, before passing the other SAX events, to declare each
namespace, one by one. Omitting this is permitted, but then, for
instance, a Jena Model will not know the (advisory) namespace
prefix bindings. These calls should be paired with <code>endPrefixMapping</code>
calls, but nothing untoward is likely if they are omitted.</p>
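<p>The order of calls can be sketched with plain SAX interfaces. In the
sketch below the outer-context prefixes are declared first, then the
events of the RDF/XML subset are passed (an empty <code>rdf:RDF</code>
element stands in for them), and finally the mappings are ended. The
class <code>PrefixReplaySketch</code> and its methods are hypothetical,
the recording handler is a stand-in for a <code>SAX2RDF</code>, and only
prefix and element events are shown.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class PrefixReplaySketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Declares each outer prefix, feeds an empty rdf:RDF element, then ends the mappings. */
    static void feed(ContentHandler handler, Map<String, String> outerPrefixes)
            throws SAXException {
        for (Map.Entry<String, String> e : outerPrefixes.entrySet()) {
            handler.startPrefixMapping(e.getKey(), e.getValue());
        }
        // The events of the RDF/XML subset would be passed here.
        handler.startElement(RDF_NS, "RDF", "rdf:RDF", new AttributesImpl());
        handler.endElement(RDF_NS, "RDF", "rdf:RDF");
        for (String prefix : outerPrefixes.keySet()) {
            handler.endPrefixMapping(prefix);
        }
    }

    /** Runs feed against a recording handler and returns the trace of events. */
    static List<String> trace(Map<String, String> outerPrefixes) throws SAXException {
        List<String> events = new ArrayList<>();
        feed(new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("map " + prefix);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                events.add("unmap " + prefix);
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end " + qName);
            }
        }, outerPrefixes);
        return events;
    }
}
```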
<h2 id="using-your-own-triple-handler">Using your own triple handler</h2>
<p>As with ARP, it is possible to use this functionality without
using other Jena features, in particular without using a Jena
Model. Instead of using the class SAX2Model, you use its superclass
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html">SAX2RDF</a>. The
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html#create(java.lang.String)">create</a>
method on this class does not provide any means of specifying what
to do with the triples. Instead, the class implements the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html">ARPConfig</a>
interface, which permits the setting of handlers and parser
options, as described in the documentation for using
<a href="standalone.html">ARP without Jena</a>.</p>
<p>Thus you need to:</p>
<ol>
<li>Create a SAX2RDF using
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/SAX2RDF.html#create(java.lang.String)">SAX2RDF.create()</a></li>
<li>Attach your StatementHandler and SAXErrorHandler and optionally
your NamespaceHandler and ExtendedHandler to the SAX2RDF instance.</li>
<li>Install the SAX2RDF instance as the SAX handler on your SAX
source.</li>
<li>Follow the remainder of the code sample above.</li>
</ol>
<h2 id="using-a-dom-as-input">Using a DOM as Input</h2>
<p>None of the approaches listed here work with Java 1.4.1_04. We
suggest using Java 1.4.2_04 or greater for this functionality.
This issue has no impact on any other Jena functionality.</p>
<h3 id="using-a-dom-as-input-to-jena">Using a DOM as Input to Jena</h3>
<p>The <a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/DOM2Model.html"><code>DOM2Model</code></a>
subclass of SAX2Model allows the parsing of a DOM using ARP. The
procedure to follow is:</p>
<ul>
<li>Construct a <code>DOM2Model</code>, using a factory method such as
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/DOM2Model.html#createD2M(java.lang.String,%20org.apache.jena.rdf.model.Model)"><code>createD2M</code></a>,
specifying the xml:base of the document to be loaded, the Model to
load into, and optionally the xml:lang value (particularly useful if
using a DOM Node from within a Document).</li>
<li>Set any properties, error handlers etc. on the <code>DOM2Model</code>
object.</li>
<li>The DOM is parsed simply by calling the
<a href="/documentation/javadoc/jena/org/apache/jena/rdf/arp/DOM2Model.html#load(org.w3c.dom.Node)"><code>load(Node)</code></a>
method.</li>
</ul>
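<p>When the RDF/XML is a Node inside a larger Document, the Node and
the <code>xml:lang</code> value from its outer context have to be found
first. The following JDK-only sketch shows one way to do that; the class
<code>EmbeddedRdfNode</code> and its methods are hypothetical, and the
Jena calls themselves are not shown.</p>

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EmbeddedRdfNode {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Builds a namespace-aware DOM so that elements carry their namespace URIs. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first rdf:RDF element in the document, or null if there is none. */
    static Element firstRdfElement(Document doc) {
        return (Element) doc.getElementsByTagNameNS(RDF_NS, "RDF").item(0);
    }

    /** Returns the xml:lang value set on the nearest ancestor of node, or "" if none. */
    static String inheritedLang(Node node) {
        for (Node n = node.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttributeNS(XMLConstants.XML_NS_URI, "lang")) {
                return e.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
            }
        }
        return "";
    }
}
```

<p>The element returned by <code>firstRdfElement</code> is the kind of
Node that would be passed to <code>load(Node)</code>, and the result of
<code>inheritedLang</code> is the kind of value the factory method
accepts as the optional xml:lang argument.</p>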
<h3 id="using-a-dom-as-input-to-arp">Using a DOM as Input to ARP</h3>
<p>DOM2Model is a subclass of SAX2RDF, and handlers etc. can be set on
the DOM2Model as for SAX2RDF. Using a null model as the argument to
the factory indicates this usage.</p>
</div>
</div>
</div>
<footer class="footer">
<div class="container" style="font-size:80%" >
<p>
Copyright &copy; 2011&ndash;2022 The Apache Software Foundation, Licensed under the
<a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
</p>
<p>
Apache Jena, Jena, the Apache Jena project logo, Apache and the Apache feather logos are trademarks of
The Apache Software Foundation.
<br/>
<a href="https://privacy.apache.org/policies/privacy-policy-public.html"
>Apache Software Foundation Privacy Policy</a>.
</p>
</div>
</footer>
<script type="text/javascript">
var link = $('a[href="' + this.location.pathname + '"]');
if (link != undefined)
link.parents('li,ul').addClass('active');
</script>
</body>
</html>