<!DOCTYPE html>
<html lang="en">
<title>Apache Jena - SDB Loading data</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
<link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
<link href="/css/jena.css" rel="stylesheet" type="text/css">
<link rel="shortcut icon" href="/images/favicon.ico" />
<script src="/js/jena-navigation.js" type="text/javascript"></script>
<script src="/js/bootstrap.min.js" type="text/javascript"></script>
<script src="/js/improve.js" type="text/javascript"></script>
<nav class="navbar navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<a class="navbar-brand" href="/index.html">
<img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
<li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Tutorials</li>
<li><a href="/tutorials/index.html">Overview</a></li>
<li><a href="/documentation/fuseki2/index.html">Fuseki Triplestore</a></li>
<li><a href="/documentation/notes/index.html">How-To's</a></li>
<li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
<li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
<li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
<li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
<li class="divider"></li>
<li class="dropdown-header">References</li>
<li><a href="/documentation/index.html">Overview</a></li>
<li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
<li><a href="/documentation/rdfs/">Data with RDFS Inferencing</a></li>
<li><a href="/documentation/geosparql/index.html">GeoSPARQL</a></li>
<li><a href="/documentation/inference/index.html">Inference API</a></li>
<li><a href="/documentation/javadoc.html">Javadoc</a></li>
<li><a href="/documentation/ontology/">Ontology API</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/extras/querybuilder/index.html">Query Builder</a></li>
<li><a href="/documentation/rdf/index.html">RDF API</a></li>
<li><a href="/documentation/rdfconnection/">RDF Connection - SPARQL API</a></li>
<li><a href="/documentation/io/">RDF I/O</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/shacl/index.html">SHACL</a></li>
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/tdb/index.html">TDB</a></li>
<li><a href="/documentation/tdb2/index.html">TDB2</a></li>
<li><a href="/documentation/query/text-query.html">Text Search</a></li>
<li class="drop down">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/documentation/javadoc.html">All Javadoc</a></li>
<li><a href="/documentation/javadoc/arq/">ARQ</a></li>
<li><a href="/documentation/javadoc_elephas.html">Elephas</a></li>
<li><a href="/documentation/javadoc/fuseki2/">Fuseki</a></li>
<li><a href="/documentation/javadoc/geosparql/">GeoSPARQL</a></li>
<li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
<li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
<li><a href="/documentation/javadoc/permissions/">Permissions</a></li>
<li><a href="/documentation/javadoc/extras/querybuilder/">Query Builder</a></li>
<li><a href="/documentation/javadoc/shacl/">SHACL</a></li>
<li><a href="/documentation/javadoc/tdb/">TDB</a></li>
<li><a href="/documentation/javadoc/text/">Text Search</a></li>
<li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/getting_involved/index.html">Contribute</a></li>
<li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
<li class="divider"></li>
<li class="dropdown-header">Project</li>
<li><a href="/about_jena/about.html">About Jena</a></li>
<li><a href="/about_jena/architecture.html">Architecture</a></li>
<li><a href="/about_jena/citing.html">Citing</a></li>
<li><a href="/about_jena/team.html">Project team</a></li>
<li><a href="/about_jena/contributions.html">Related projects</a></li>
<li><a href="/about_jena/roadmap.html">Roadmap</a></li>
<li class="divider"></li>
<li class="dropdown-header">ASF</li>
<li><a href="">Apache Software Foundation</a></li>
<li><a href="">Become a Sponsor</a></li>
<li><a href="">License</a></li>
<li><a href="">Security</a></li>
<li><a href="">Thanks</a></li>
<div class="container">
<div class="row">
<div class="col-md-12">
<h1 class="title">SDB Loading data</h1>
<p>There are three ways to load data into SDB:</p>
<ol>
<li>Use the command utility
<a href="commands.html#Loading_data" title="SDB/Commands">sdbload</a></li>
<li>Use one of the Jena <code>Model.read</code> operations</li>
<li>Use the Jena <code>Model.add</code> operations</li>
</ol>
<p>The last one of these requires the application to signal the
beginning and end of batches.</p>
<h2 id="loading-with-modelread">Loading with <code>Model.read</code></h2>
<p>A Jena Model obtained from SDB (for example via
<code>SDBFactory.connectModel(store)</code>) will automatically bulk load
data for each call of one of the <code>Model.read</code> operations.</p>
<h2 id="loading-with-modeladd">Loading with <code>Model.add</code></h2>
<p>The <code>Model.add</code> operations, in any form or combination of forms,
whether loading a single statement, a list of statements, or another
model, will invoke the bulk loader, provided the model was notified of
the start of a bulk operation before the add.</p>
<p>You can also explicitly delimit bulk operations:</p>
<pre><code>    model.notifyEvent(GraphEvents.startRead) ;
    // ... do add/remove operations ...
    model.notifyEvent(GraphEvents.finishRead) ;
</code></pre>
<p><strong>Failing to notify the end of the operations will result in data loss</strong>.</p>
<p>A try/finally block can ensure that the finish is notified.</p>
<pre><code>    model.notifyEvent(GraphEvents.startRead) ;
    try {
        // ... do add/remove operations ...
    } finally {
        model.notifyEvent(GraphEvents.finishRead) ;
    }
</code></pre>
<p>The <code>Model.read</code> operations do this automatically.</p>
<p>The bulk loader will automatically chunk large sequences of
additions to sizes appropriate to the underlying database. The bulk
loader is threaded and double-buffered: loading to the database
happens in parallel with the application thread and any RDF parsing.</p>
<h2 id="how-the-loader-works">How the loader works</h2>
<p>Loading consists of two phases: one in the Java VM and one on the
database itself. The SDB loader takes incoming triples and breaks them
down into components ready for the database. These prepared triples are
added to a queue for the database phase, which (by default) takes
place on a separate thread. When the number of queued triples reaches a
limit (default 20,000), or the end of the update is signalled, the
triples are passed to the database.</p>
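<p>The queue-and-chunk behaviour described above can be pictured with a
small stand-alone sketch. This is not SDB's loader code: the class
<code>ChunkingLoaderSketch</code>, its methods and the string
representation of a prepared triple are all hypothetical, and the
"database phase" is just a callback. Unlike a true double-buffered
design, this sketch does not limit how many chunks may be waiting.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Illustrative only (hypothetical names): buffers items and hands full chunks to a worker thread. */
class ChunkingLoaderSketch {
    private final int chunkSize;
    private final Consumer<List<String>> databasePhase; // stand-in for the work done on the database
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();    // only touched by the calling thread

    ChunkingLoaderSketch(int chunkSize, Consumer<List<String>> databasePhase) {
        this.chunkSize = chunkSize;
        this.databasePhase = databasePhase;
    }

    /** Called by the application thread for each prepared triple. */
    void add(String preparedTriple) {
        buffer.add(preparedTriple);
        if (buffer.size() >= chunkSize) {
            flush();
        }
    }

    /** Signals the end of the update: passes on any remainder and waits for the worker. */
    void finish() {
        if (!buffer.isEmpty()) {
            flush();
        }
        worker.shutdown();
        try {
            if (!worker.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("database phase did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the database phase", e);
        }
    }

    /** Swaps in a fresh buffer so the caller can keep adding while the chunk is processed. */
    private void flush() {
        List<String> chunk = buffer;
        buffer = new ArrayList<>();
        worker.execute(() -> databasePhase.accept(chunk));
    }

    /** Feeds {@code total} dummy triples through the loader and returns the size of each chunk. */
    static List<Integer> demo(int total, int chunkSize) {
        List<Integer> sizes = Collections.synchronizedList(new ArrayList<>());
        ChunkingLoaderSketch loader =
                new ChunkingLoaderSketch(chunkSize, chunk -> sizes.add(chunk.size()));
        for (int i = 0; i < total; i++) {
            loader.add("triple-" + i);
        }
        loader.finish();
        return new ArrayList<>(sizes);
    }
}
```

<p>With a chunk size of 20, feeding 45 items produces two full chunks
and one final partial chunk when <code>finish()</code> is called, which
mirrors the "limit reached or finish signalled" rule.</p>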
<p>You can configure whether to use threading and the &lsquo;chunk size&rsquo; &ndash;
the number of triples per load event &ndash; via <code>StoreLoader</code>.</p>
<pre><code>Store store; // SDB Store
store.getLoader().setChunkSize(5000);      // Load 5,000 triples per chunk
store.getLoader().setUseThreading(false);  // Don't thread
</code></pre>
<p>You should set these <em>before</em> the loader has been used.</p>
<p>Each loader sets up two temporary tables (<code>NNode</code> and <code>NTrip</code>) that
mirror the <code>Nodes</code> and <code>Triples</code> tables. These tables are virtually
identical to the originals, except that a) they are not indexed and
b) for the index variant there is no index column for nodes.</p>
<p>When loading, prepared triples &ndash; triples that have been broken down
ready for the database &ndash; are passed to the loader core (normally
running on a different thread). When the chunk size is reached, or
there are no more triples, the following happens:</p>
<ol>
<li>Prepared nodes are added in one go to <code>NNode</code>. Duplicate nodes
within a chunk are suppressed on the Java side (this is worth doing
since they are quite common, e.g. properties).</li>
<li>Prepared triples are added in one go to <code>NTrip</code>.</li>
<li>New nodes are added to the node table (duplicate suppression is
explained below).</li>
<li>New triples are added to the triple table (once again
suppressing duplicates). For the index case this involves joining on
the node table to do a hash to index lookup.</li>
<li>We commit.</li>
<li>If anything goes wrong the transaction (the chunk) is rolled
back, and an exception is thrown (or readied for throwing on the
calling thread).</li>
</ol>
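<p>The per-chunk steps listed above can be sketched in memory as
follows. Everything here is an assumption made for illustration: the
class <code>ChunkLoadSketch</code> is hypothetical, Java sets stand in
for the <code>Nodes</code> and <code>Triples</code> tables, and "rollback"
is simulated by restoring a snapshot rather than by a real database
transaction.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only (hypothetical names): one all-or-nothing chunk load against in-memory "tables". */
class ChunkLoadSketch {
    final Set<String> nodes = new LinkedHashSet<>();          // stand-in for the Nodes table
    final Set<List<String>> triples = new LinkedHashSet<>();  // stand-in for the Triples table

    /** Loads one chunk; each triple is expected to be a list [s, p, o]. */
    void loadChunk(List<List<String>> chunk) {
        // Snapshots play the role of the transaction that can be rolled back.
        Set<String> nodesBefore = new LinkedHashSet<>(nodes);
        Set<List<String>> triplesBefore = new LinkedHashSet<>(triples);
        try {
            // 1. Stage the nodes; using a Set suppresses duplicates within the chunk.
            Set<String> nNode = new LinkedHashSet<>();
            for (List<String> t : chunk) {
                nNode.addAll(t);
            }
            // 2. Stage the triples; this staging list may still contain duplicates.
            List<List<String>> nTrip = new ArrayList<>(chunk);
            // 3. Add new nodes; nodes already present are skipped by the Set.
            nodes.addAll(nNode);
            // 4. Add new triples, again skipping ones already present.
            for (List<String> t : nTrip) {
                if (t.size() != 3) {
                    throw new IllegalArgumentException("not a triple: " + t);
                }
                triples.add(t);
            }
            // 5. Commit: nothing to do here, the changes are already in place.
        } catch (RuntimeException e) {
            // 6. Roll back the whole chunk and rethrow to the caller.
            nodes.clear();
            nodes.addAll(nodesBefore);
            triples.clear();
            triples.addAll(triplesBefore);
            throw e;
        }
    }
}
```

<p>Loading a chunk with a malformed entry leaves both sets exactly as
they were before the call, which is the property the rollback step
provides.</p>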
<p>Thus there are five calls to the database for every chunk. The
database handles almost all of the work uninterrupted (duplicate
suppression, hash to index lookup), which makes loading reasonably
quick.</p>
<h2 id="duplicate-suppression">Duplicate Suppression</h2>
<p>MySQL has a very useful <code>INSERT IGNORE</code>, which will keep going,
skipping an offending row if a uniqueness constraint is violated.
For other databases we need something else.</p>
<p>Having tried a number of options, the best approach seems to be to
<code>INSERT</code> new items by <code>LEFT JOIN</code>ing them to existing items, then
filtering <code>WHERE (existing item feature) IS NULL</code>. Specifically,
for the triple hash case (where no id lookups are needed):</p>
<pre><code>INSERT INTO Triples
  SELECT DISTINCT NTrip.s, NTrip.p, NTrip.o -- DISTINCT because new triples may contain duplicates (not so for nodes)
  FROM NTrip LEFT JOIN Triples ON (NTrip.s=Triples.s AND NTrip.p=Triples.p AND NTrip.o=Triples.o)
  WHERE Triples.s IS NULL OR Triples.p IS NULL OR Triples.o IS NULL
</code></pre>
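<p>To make the effect of the statement above concrete, here is a
minimal in-memory sketch of the same "insert only what is missing"
logic. It does not run any SQL: the class <code>AntiJoinSketch</code>
and its method are hypothetical, triples are represented as
three-element lists, and a set lookup stands in for the
<code>LEFT JOIN ... IS NULL</code> test.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only (hypothetical names): selects staged triples that are not yet stored. */
class AntiJoinSketch {
    /**
     * @param nTrip   staged triples, possibly containing duplicates (the NTrip table)
     * @param triples triples already stored (the Triples table)
     * @return the distinct staged triples that have no match in {@code triples}
     */
    static List<List<String>> newTriples(List<List<String>> nTrip, Set<List<String>> triples) {
        // SELECT DISTINCT: collapse duplicates among the staged rows, keeping first-seen order.
        Set<List<String>> distinct = new LinkedHashSet<>(nTrip);
        List<List<String>> toInsert = new ArrayList<>();
        for (List<String> candidate : distinct) {
            // LEFT JOIN found no matching row, so the joined columns would be NULL.
            if (!triples.contains(candidate)) {
                toInsert.add(candidate);
            }
        }
        return toInsert;
    }
}
```

<p>For example, if the store already holds <code>(a, p, b)</code> and the
staged rows are <code>(a, p, b)</code>, <code>(a, p, c)</code> and
<code>(a, p, c)</code> again, only a single <code>(a, p, c)</code> is
selected for insertion.</p>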
<footer class="footer">
<div class="container" style="font-size:80%" >
Copyright &copy; 2011&ndash;2022 The Apache Software Foundation, Licensed under the
<a href="">Apache License, Version 2.0</a>.
Apache Jena, Jena, the Apache Jena project logo, Apache and the Apache feather logos are trademarks of
The Apache Software Foundation.
<a href=""
>Apache Software Foundation Privacy Policy</a>.