| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| |
| |
| <title>Apache Jena - SDB Loading data</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| |
| <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen"> |
| <link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css"> |
| <link href="/css/jena.css" rel="stylesheet" type="text/css"> |
| <link rel="shortcut icon" href="/images/favicon.ico" /> |
| |
| <script src="https://code.jquery.com/jquery-2.2.4.min.js" |
| integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44=" |
| crossorigin="anonymous"></script> |
| <script src="/js/jena-navigation.js" type="text/javascript"></script> |
| <script src="/js/bootstrap.min.js" type="text/javascript"></script> |
| |
| <script src="/js/improve.js" type="text/javascript"></script> |
| |
| |
| </head> |
| |
| <body> |
| |
| <nav class="navbar navbar-default" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a class="navbar-brand" href="/index.html"> |
| <img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a> |
| </div> |
| |
| <div class="collapse navbar-collapse navbar-ex1-collapse"> |
| <ul class="nav navbar-nav"> |
| <li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li> |
| <li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li class="dropdown-header">Tutorials</li> |
| <li><a href="/tutorials/index.html">Overview</a></li> |
| <li><a href="/documentation/fuseki2/index.html">Fuseki Triplestore</a></li> |
| <li><a href="/documentation/notes/index.html">How-To's</a></li> |
| <li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li> |
| <li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li> |
| <li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li> |
| <li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li> |
| <li class="divider"></li> |
| <li class="dropdown-header">References</li> |
| <li><a href="/documentation/index.html">Overview</a></li> |
| <li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li> |
| <li><a href="/documentation/assembler/index.html">Assembler</a></li> |
| <li><a href="/documentation/tools/index.html">Command-line tools</a></li> |
| <li><a href="/documentation/rdfs/">Data with RDFS Inferencing</a></li> |
| <li><a href="/documentation/geosparql/index.html">GeoSPARQL</a></li> |
| <li><a href="/documentation/inference/index.html">Inference API</a></li> |
| <li><a href="/documentation/javadoc.html">Javadoc</a></li> |
| <li><a href="/documentation/ontology/">Ontology API</a></li> |
| <li><a href="/documentation/permissions/index.html">Permissions</a></li> |
| <li><a href="/documentation/extras/querybuilder/index.html">Query Builder</a></li> |
| <li><a href="/documentation/rdf/index.html">RDF API</a></li> |
| <li><a href="/documentation/rdfconnection/">RDF Connection - SPARQL API</a></li> |
| <li><a href="/documentation/io/">RDF I/O</a></li> |
| <li><a href="/documentation/rdfstar/index.html">RDF-star</a></li> |
| <li><a href="/documentation/shacl/index.html">SHACL</a></li> |
| <li><a href="/documentation/shex/index.html">ShEx</a></li> |
| <li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li> |
| <li><a href="/documentation/tdb/index.html">TDB</a></li> |
| <li><a href="/documentation/tdb2/index.html">TDB2</a></li> |
| <li><a href="/documentation/query/text-query.html">Text Search</a></li> |
| </ul> |
| </li> |
| |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="/documentation/javadoc.html">All Javadoc</a></li> |
| <li><a href="/documentation/javadoc/arq/">ARQ</a></li> |
| <li><a href="/documentation/javadoc_elephas.html">Elephas</a></li> |
| <li><a href="/documentation/javadoc/fuseki2/">Fuseki</a></li> |
| <li><a href="/documentation/javadoc/geosparql/">GeoSPARQL</a></li> |
| <li><a href="/documentation/javadoc/jdbc/">JDBC</a></li> |
| <li><a href="/documentation/javadoc/jena/">Jena Core</a></li> |
| <li><a href="/documentation/javadoc/permissions/">Permissions</a></li> |
| <li><a href="/documentation/javadoc/extras/querybuilder/">Query Builder</a></li> |
| <li><a href="/documentation/javadoc/shacl/">SHACL</a></li> |
| <li><a href="/documentation/javadoc/tdb/">TDB</a></li> |
| <li><a href="/documentation/javadoc/text/">Text Search</a></li> |
| </ul> |
| </li> |
| |
| <li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li> |
| |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="/getting_involved/index.html">Contribute</a></li> |
| <li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li> |
| <li class="divider"></li> |
| <li class="dropdown-header">Project</li> |
| <li><a href="/about_jena/about.html">About Jena</a></li> |
| <li><a href="/about_jena/architecture.html">Architecture</a></li> |
| <li><a href="/about_jena/citing.html">Citing</a></li> |
| <li><a href="/about_jena/team.html">Project team</a></li> |
| <li><a href="/about_jena/contributions.html">Related projects</a></li> |
| <li><a href="/about_jena/roadmap.html">Roadmap</a></li> |
| <li class="divider"></li> |
| <li class="dropdown-header">ASF</li> |
| <li><a href="http://www.apache.org/">Apache Software Foundation</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li> |
| <li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li> |
| <li><a href="http://www.apache.org/security/">Security</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| </ul> |
| </li> |
| |
| |
| |
| |
| </ul> |
| </div> |
| </div> |
| </nav> |
| |
| |
| <div class="container"> |
| <div class="row"> |
| <div class="col-md-12"> |
| <h1 class="title">SDB Loading data</h1> |
| |
| <p>There are three ways to load data into SDB:</p> |
| <ol> |
| <li>Use the command-line utility |
| <a href="commands.html#Loading_data" title="SDB/Commands">sdbload</a></li> |
| <li>Use one of the Jena <code>model.read</code> operations</li> |
| <li>Use one of the Jena <code>model.add</code> operations</li> |
| </ol> |
| <p>The last one of these requires the application to signal the |
| beginning and end of batches.</p> |
| <h2 id="loading-with-modelread">Loading with <code>Model.read</code></h2> |
| <p>A Jena Model obtained from SDB via:</p> |
| <pre><code>SDBFactory.connectModel(store) |
| </code></pre> |
| <p>will automatically bulk load data for each call of one of the |
| <code>Model.read</code> operations.</p> |
| <h2 id="loading-with-modeladd">Loading with <code>Model.add</code></h2> |
| <p>The <code>Model.add</code> operations, in any form or combination of forms, |
| whether adding a single statement, a list of statements, or another |
| model, invoke the bulk loader only if the model was notified of the |
| start of a bulk operation before the add.</p> |
| <p>Delimit bulk operations explicitly with a pair of events:</p> |
| <pre><code> model.notifyEvent(GraphEvents.startRead) |
| ... do add/remove operations ... |
| model.notifyEvent(GraphEvents.finishRead) |
| </code></pre> |
| <p><strong>Failing to notify the end of the operations will result in data loss</strong>.</p> |
| <p>A try/finally block can ensure that the finish is notified.</p> |
| <pre><code> model.notifyEvent(GraphEvents.startRead) ; |
| try { |
| ... do add/remove operations ... |
| } finally { |
| model.notifyEvent(GraphEvents.finishRead) ; |
| } |
| </code></pre> |
| <p>The <code>model.read</code> operations do this automatically.</p> |
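| <p>To see why the finish notification matters, here is a minimal, self-contained |
| sketch of a batch buffer. It is a toy model, not SDB's implementation: the class |
| <code>BatchNotifySketch</code> and its methods <code>startBatch</code>, <code>add</code>, |
| <code>finishBatch</code> and <code>loadAll</code> are hypothetical names standing in for |
| the model and the <code>startRead</code>/<code>finishRead</code> events. It assumes that |
| additions made inside a batch are held back until the batch is finished.</p> |

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in (hypothetical, not SDB code) for a store that buffers additions during a batch. */
public class BatchNotifySketch {
    private final List<String> pending = new ArrayList<>();   // additions not yet written
    private final List<String> stored = new ArrayList<>();    // additions written to the "database"
    private boolean inBatch = false;

    /** Plays the role of notifying the start of a bulk operation. */
    public void startBatch() {
        inBatch = true;
    }

    public void add(String statement) {
        if (inBatch) {
            pending.add(statement);   // held back until finishBatch()
        } else {
            stored.add(statement);    // outside a batch: written immediately
        }
    }

    /** Plays the role of notifying the end: flushes everything still buffered. */
    public void finishBatch() {
        stored.addAll(pending);
        pending.clear();
        inBatch = false;
    }

    /** Returns a copy of what has actually been written. */
    public List<String> stored() {
        return new ArrayList<>(stored);
    }

    /** Adds all statements as one batch; the finally block guarantees the flush even if an add fails. */
    public static List<String> loadAll(BatchNotifySketch sink, List<String> statements) {
        sink.startBatch();
        try {
            for (String s : statements) {
                if (s.isEmpty()) {
                    throw new IllegalArgumentException("empty statement");
                }
                sink.add(s);
            }
        } finally {
            sink.finishBatch();
        }
        return sink.stored();
    }
}
```

| <p>In this sketch, anything added after <code>startBatch</code> stays in the pending |
| list until <code>finishBatch</code> runs, so skipping the finish call leaves it unwritten. |
| <code>loadAll</code> still flushes the statements added before a failure, because the |
| <code>finally</code> block runs whether or not an add throws.</p> |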
| <p>The bulk loader automatically chunks large sequences of |
| additions into sizes appropriate to the underlying database. The bulk |
| loader is threaded and double-buffered: loading to the database |
| happens in parallel with the application thread and any RDF parsing.</p> |
| <h2 id="how-the-loader-works">How the loader works</h2> |
| <p>Loading consists of two phases: one in the Java VM, and one on the database |
| itself. The SDB loader takes incoming triples and breaks them down |
| into components ready for the database. These prepared triples are |
| added to a queue for the database phase, which (by default) takes |
| place on a separate thread. When the number of queued triples reaches a |
| limit (default 20,000), or the end of the update is signalled, the triples |
| are passed to the database.</p> |
| <p>You can configure whether to use threading and the ‘chunk size’ – |
| the number of triples per load event – via <code>StoreLoader</code>.</p> |
| <pre><code>Store store; // SDB Store |
| ... |
| store.getLoader().setChunkSize(5000); // Triples per load event |
| store.getLoader().setUseThreading(false); // Don't thread |
| </code></pre> |
| <p>You should set these <em>before</em> the loader has been used.</p> |
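| <p>The two phases and the role of the chunk size can be illustrated with a small |
| producer/consumer sketch. This is an assumption-laden toy, not the SDB loader: |
| <code>ChunkedLoaderSketch</code> and its methods are hypothetical, "preparing" a triple |
| is reduced to trimming a string, and a "database write" is just appending a chunk to a |
| list. It uses a plain queue hand-off rather than SDB's double-buffering.</p> |

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Toy two-phase loader (hypothetical): the caller prepares items, a worker writes them in chunks. */
public class ChunkedLoaderSketch {
    // An empty Optional on the queue marks the end of the load.
    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
    private final List<List<String>> writtenChunks = new ArrayList<>();  // one entry per "database" write
    private final int chunkSize;
    private final Thread worker;

    public ChunkedLoaderSketch(int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.worker = new Thread(this::drain, "loader-sketch");
        this.worker.start();
    }

    /** Phase 1 (caller's thread): prepare the item and queue it for the database phase. */
    public void add(String triple) {
        queue.add(Optional.of(triple.trim()));
    }

    /** Signals the end of the load and waits until the worker has written the final chunk. */
    public List<List<String>> finish() {
        queue.add(Optional.empty());
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the loader", e);
        }
        return writtenChunks;
    }

    /** Phase 2 (worker thread): write a chunk when it is full or when the load ends. */
    private void drain() {
        List<String> chunk = new ArrayList<>();
        try {
            while (true) {
                Optional<String> item = queue.take();
                boolean done = !item.isPresent();
                if (!done) {
                    chunk.add(item.get());
                }
                if (chunk.size() == chunkSize || (done && !chunk.isEmpty())) {
                    writtenChunks.add(chunk);   // stands in for one round of database calls
                    chunk = new ArrayList<>();
                }
                if (done) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

| <p>With a chunk size of 2, adding five items and then calling <code>finish</code> |
| produces two full chunks and a final partial one, which mirrors the rule that a chunk is |
| passed on either when the limit is reached or when the end of the update is signalled.</p> |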
| <p>Each loader sets up two temporary tables (<code>NNode</code> and <code>NTrip</code>) that |
| mirror the <code>Nodes</code> and <code>Triples</code> tables. These tables are virtually |
| identical to the originals, except that a) they are not indexed and b) for the index |
| variant there is no index column for nodes.</p> |
| <p>When loading, prepared triples – triples that have been broken down |
| ready for the database – are passed to the loader core (normally |
| running on a different thread). When the chunk size is reached, or |
| there are no more triples, the following happens:</p> |
| <ul> |
| <li>Prepared nodes are added in one go to <code>NNode</code>. Duplicate nodes |
| within a chunk are suppressed on the Java side (this is worth doing |
| since they are quite common, e.g. properties).</li> |
| <li>Prepared triples are added in one go to <code>NTrip</code>.</li> |
| <li>New nodes are added to the node table (duplicate suppression is |
| explained below).</li> |
| <li>New triples are added to the triple table (once again |
| suppressing duplicates). For the index case this involves joining on the |
| node table to do a hash-to-index lookup.</li> |
| <li>We commit.</li> |
| <li>If anything goes wrong the transaction (the chunk) is rolled |
| back, and an exception is thrown (or readied for throwing on the |
| calling thread).</li> |
| </ul> |
| <p>Thus there are five calls to the database for every chunk. The |
| database handles almost all of the work uninterrupted (duplicate |
| suppression, hash to index lookup), which makes loading reasonably |
| quick.</p> |
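| <p>The per-chunk steps above can be sketched in memory for the index case. This is |
| only an illustration under stated assumptions, not SDB code: <code>ChunkTransactionSketch</code> |
| is a hypothetical class, a map from node string to integer id stands in for the node |
| table, a set of id triples stands in for the triple table, and "rollback" is modelled by |
| working on copies that are published only if every step succeeds.</p> |

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory model (hypothetical) of loading one chunk in the index variant. */
public class ChunkTransactionSketch {
    private Map<String, Integer> nodes = new HashMap<>();    // node -> id, stands in for the node table
    private Set<List<Integer>> triples = new HashSet<>();    // (s, p, o) ids, stands in for the triple table

    /** Loads one chunk of triples, each a list of three node strings; all or nothing. */
    public void loadChunk(List<List<String>> chunk) {
        // Stage the chunk's nodes, each once: duplicates are dropped before the "database" sees them.
        Set<String> stagedNodes = new LinkedHashSet<>();
        for (List<String> t : chunk) {
            stagedNodes.addAll(t);
        }

        // Work on copies so that a failure leaves the committed tables untouched (the rollback).
        Map<String, Integer> newNodes = new HashMap<>(nodes);
        Set<List<Integer>> newTriples = new HashSet<>(triples);

        // Add only nodes that are not already present; each new node gets the next id.
        for (String n : stagedNodes) {
            newNodes.putIfAbsent(n, newNodes.size());
        }

        // Look up each node's id, then add the id triple; the set suppresses duplicates.
        for (List<String> t : chunk) {
            if (t.size() != 3) {
                throw new IllegalArgumentException("not a triple: " + t);
            }
            newTriples.add(Arrays.asList(
                    newNodes.get(t.get(0)), newNodes.get(t.get(1)), newNodes.get(t.get(2))));
        }

        // Commit: publish both tables together.
        nodes = newNodes;
        triples = newTriples;
    }

    public int nodeCount() {
        return nodes.size();
    }

    public int tripleCount() {
        return triples.size();
    }
}
```

| <p>If a chunk contains a malformed entry, the exception leaves <code>nodeCount</code> and |
| <code>tripleCount</code> exactly as they were before the call, which is the behaviour the |
| rollback step describes.</p> |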
| <h2 id="duplicate-suppression">Duplicate Suppression</h2> |
| <p>MySQL has a very useful <code>INSERT IGNORE</code>, which will keep going, |
| skipping an offending row if a uniqueness constraint is violated. |
| For other databases we need something else.</p> |
| <p>Of the options tried, the best seems to be to <code>INSERT</code> |
| new items by <code>LEFT JOIN</code>ing new items to existing items, then |
| filtering with <code>WHERE (existing item feature) IS NULL</code>. Specifically, |
| for the triple hash case (where no id lookups are needed):</p> |
| <pre><code>INSERT INTO Triples |
| SELECT DISTINCT NTrip.s, NTrip.p, NTrip.o -- DISTINCT because new triples may contain duplicates (not so for nodes) |
| FROM NTrip LEFT JOIN Triples ON (NTrip.s=Triples.s AND NTrip.p=Triples.p AND NTrip.o=Triples.o) |
| WHERE Triples.s IS NULL OR Triples.p IS NULL OR Triples.o IS NULL |
| </code></pre> |
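| <p>The effect of that statement is an anti-join: keep each distinct staged triple that |
| has no matching row in the existing table. The following in-memory sketch shows the same |
| logic; <code>AntiJoinSketch</code> and <code>newTriples</code> are hypothetical names, and |
| triples are represented simply as lists of three strings.</p> |

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** In-memory illustration (hypothetical) of the LEFT JOIN ... IS NULL duplicate filter. */
public class AntiJoinSketch {
    /** Returns the staged triples to insert: distinct, and with no match among the existing ones. */
    public static List<List<String>> newTriples(List<List<String>> staged, Set<List<String>> existing) {
        // SELECT DISTINCT: the staged triples may contain duplicates of each other.
        Set<List<String>> distinct = new LinkedHashSet<>(staged);
        List<List<String>> result = new ArrayList<>();
        for (List<String> t : distinct) {
            // The LEFT JOIN found no matching row, so the joined columns would be NULL.
            if (!existing.contains(t)) {
                result.add(t);
            }
        }
        return result;
    }
}
```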
| |
| |
| </div> |
| </div> |
| |
| </div> |
| |
| <footer class="footer"> |
| <div class="container" style="font-size:80%" > |
| <p> |
| Copyright © 2011–2022 The Apache Software Foundation, Licensed under the |
| <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. |
| </p> |
| <p> |
| Apache Jena, Jena, the Apache Jena project logo, Apache and the Apache feather logos are trademarks of |
| The Apache Software Foundation. |
| <br/> |
| <a href="https://privacy.apache.org/policies/privacy-policy-public.html" |
| >Apache Software Foundation Privacy Policy</a>. |
| </p> |
| </div> |
| </footer> |
| |
| |
| <script type="text/javascript"> |
| var link = $('a[href="' + this.location.pathname + '"]'); |
| if (link != undefined) |
| link.parents('li,ul').addClass('active'); |
| </script> |
| |
| </body> |
| </html> |