blob: aa315bda9c9180f814921504cbf625a78930ce8b [file] [log] [blame]
<!DOCTYPE html>
<html lang="en">
<head>
<title>Apache Jena - SDB Database Notes</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
<link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
<link href="/css/jena.css" rel="stylesheet" type="text/css">
<link rel="shortcut icon" href="/images/favicon.ico" />
<script src="https://code.jquery.com/jquery-2.2.4.min.js"
integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44="
crossorigin="anonymous"></script>
<script src="/js/jena-navigation.js" type="text/javascript"></script>
<script src="/js/bootstrap.min.js" type="text/javascript"></script>
<script src="/js/improve.js" type="text/javascript"></script>
</head>
<body>
<nav class="navbar navbar-default" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/index.html">
<img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
<li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Tutorials</li>
<li><a href="/tutorials/index.html">Overview</a></li>
<li><a href="/documentation/fuseki2/index.html">Fuseki Triplestore</a></li>
<li><a href="/documentation/notes/index.html">How-To's</a></li>
<li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
<li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
<li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
<li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
<li class="divider"></li>
<li class="dropdown-header">References</li>
<li><a href="/documentation/index.html">Overview</a></li>
<li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
<li><a href="/documentation/assembler/index.html">Assembler</a></li>
<li><a href="/documentation/tools/index.html">Command-line tools</a></li>
<li><a href="/documentation/rdfs/">Data with RDFS Inferencing</a></li>
<li><a href="/documentation/geosparql/index.html">GeoSPARQL</a></li>
<li><a href="/documentation/inference/index.html">Inference API</a></li>
<li><a href="/documentation/javadoc.html">Javadoc</a></li>
<li><a href="/documentation/ontology/">Ontology API</a></li>
<li><a href="/documentation/permissions/index.html">Permissions</a></li>
<li><a href="/documentation/extras/querybuilder/index.html">Query Builder</a></li>
<li><a href="/documentation/rdf/index.html">RDF API</a></li>
<li><a href="/documentation/rdfconnection/">RDF Connection - SPARQL API</a></li>
<li><a href="/documentation/io/">RDF I/O</a></li>
<li><a href="/documentation/rdfstar/index.html">RDF-star</a></li>
<li><a href="/documentation/shacl/index.html">SHACL</a></li>
<li><a href="/documentation/shex/index.html">ShEx</a></li>
<li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
<li><a href="/documentation/tdb/index.html">TDB</a></li>
<li><a href="/documentation/tdb2/index.html">TDB2</a></li>
<li><a href="/documentation/query/text-query.html">Text Search</a></li>
</ul>
</li>
<li class="drop down">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/documentation/javadoc.html">All Javadoc</a></li>
<li><a href="/documentation/javadoc/arq/">ARQ</a></li>
<li><a href="/documentation/javadoc_elephas.html">Elephas</a></li>
<li><a href="/documentation/javadoc/fuseki2/">Fuseki</a></li>
<li><a href="/documentation/javadoc/geosparql/">GeoSPARQL</a></li>
<li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
<li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
<li><a href="/documentation/javadoc/permissions/">Permissions</a></li>
<li><a href="/documentation/javadoc/extras/querybuilder/">Query Builder</a></li>
<li><a href="/documentation/javadoc/shacl/">SHACL</a></li>
<li><a href="/documentation/javadoc/tdb/">TDB</a></li>
<li><a href="/documentation/javadoc/text/">Text Search</a></li>
</ul>
</li>
<li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/getting_involved/index.html">Contribute</a></li>
<li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
<li class="divider"></li>
<li class="dropdown-header">Project</li>
<li><a href="/about_jena/about.html">About Jena</a></li>
<li><a href="/about_jena/architecture.html">Architecture</a></li>
<li><a href="/about_jena/citing.html">Citing</a></li>
<li><a href="/about_jena/team.html">Project team</a></li>
<li><a href="/about_jena/contributions.html">Related projects</a></li>
<li><a href="/about_jena/roadmap.html">Roadmap</a></li>
<li class="divider"></li>
<li class="dropdown-header">ASF</li>
<li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
</ul>
</li>
<li id="edit"><a href="https://github.com/apache/jena-site/edit/main/source/documentation/archive/sdb/db_notes.md" title="Edit this page on GitHub"><span class="glyphicon glyphicon-pencil"></span> Edit this page</a></li>
</ul>
</div>
</div>
</nav>
<div class="container">
<div class="row">
<div class="col-md-12">
<div id="breadcrumbs">
<ol class="breadcrumb">
<li><a href='/documentation'>DOCUMENTATION</a></li>
<li><a href='/documentation/archive'>ARCHIVE</a></li>
<li><a href='/documentation/archive/sdb'>SDB</a></li>
<li class="active">DB NOTES</li>
</ol>
</div>
<h1 class="title">SDB Database Notes</h1>
<ul>
<li><a href="#db2">DB2</a></li>
<li><a href="#derby">Derby</a></li>
<li><a href="#ms-sql">MS SQL</a></li>
<li><a href="#mysql">MySQL</a></li>
<li><a href="#postgresql">PostgresQL</a></li>
</ul>
<h2 id="db2">DB2</h2>
<h3 id="database-creation">Database creation</h3>
<p>The database should be created with code set UTF-8 so unicode is
enabled (SDB creates tables CCSID UNICODE for full
internationalization support).</p>
<h2 id="derby">Derby</h2>
<h3 id="loading-restriction">Loading Restriction</h3>
<p>Only one load operation can be active at any one time. Limitations
on temporary tables in Derby mean the loader tables are not
temporary and hence are shared by all connections.</p>
<h2 id="ms-sql">MS SQL</h2>
<p>The collation sequence for the database must be one that is binary
(BIN in the name). It does not matter which one is used. Without
BIN, string matching is case insensitive but RDF requires case
sensitive literals and IRIs. The normal layout is not affected by
this because it does not use string comparisons.</p>
<h2 id="mysql">MySQL</h2>
<h3 id="national-characters">National Characters</h3>
<p>SDB formats all table columns used for storing text in the MySQL
schema to UTF-8. However, this does not cause the data to be
transmitted in UTF-8 over the JDBC connection.</p>
<p>The best way is to run the server with a default character set of
UTF-8. This is set in the MySQL server configuration file:</p>
<pre><code>[mysql]
default-character-set=utf8
</code></pre>
<p>A less reliable way is to pass parameters to the JDBC driver in the
JDBC URL. The application will need to explicitly set the JDBC URL
in the
<a href="store_description.html" title="SDB/Store Description">store configuration</a>
file.</p>
<pre><code> ...?useUnicode=true&amp;characterEncoding=UTF-8
</code></pre>
<h3 id="connection-timeouts">Connection timeouts</h3>
<p>If you get the connection timing out after (by default) 8 hours of
no activity, try setting <code>autoReconnect=true</code> in the JDBC URL.</p>
<h3 id="tuning">Tuning</h3>
<ol>
<li>For InndoDB, the critical parameter is
<code>innodb_buffer_pool_size</code>. See the MySQL sample configuration files
for details.</li>
<li>Using ANALYZE TABLE on the database tables can improve the
choices made by the MySQL optimizer.</li>
</ol>
<h3 id="connection-timeout">Connection Timeout</h3>
<p>MySQL closes the JDBC connection after a period of no use (8 hours
by default).</p>
<p>While deprecated my MySQL, <code>?autoReconnect=true</code> may help here.</p>
<p>Other ways of addressing the problem are to make a simple query
call on a regular basis just to keep the connection alive (e.g.
<code>SELECT * { &lt;http://example/junk&gt; &lt;http://example/junk&gt; &lt;http://example/junk&gt; }</code>).</p>
<p>Some connection pool systems automatic compensate for this feature
of MySQL.</p>
<h2 id="postgresql">PostgresQL</h2>
<h3 id="databases-must-use-utf-8-encoding">Databases must use UTF-8 encoding</h3>
<p>Create SDB stores with encoding UTF-8.</p>
<p>International character sets can cause corrupted databases
otherwise. The database will not pass the SDB test suite.</p>
<p>Set this when creating the database with pgAdmin or if you use the
command line, for example:</p>
<pre><code> CREATE DATABASE &quot;YourStoreName&quot;
WITH OWNER = &quot;user&quot;
ENCODING = 'UTF8'
TABLESPACE = pg_default;
</code></pre>
<h3 id="improving-loading-rates">Improving loading rates</h3>
<p>The index layout (&ldquo;layout2/index&rdquo;) usually loads faster than the
hash form.</p>
<p><em>Existing store</em></p>
<p>When loading into an existing store, where there is existing data
and <code>ANALYZE</code> has been run, the process is:</p>
<ul>
<li>
<p>Drop indexes</p>
<p>sdbconfig &ndash;drop</p>
</li>
<li>
<p>Load data</p>
<p>sdbload file</p>
</li>
<li>
<p>Redo the indexes</p>
<p>sdbconfig &ndash;index</p>
</li>
</ul>
<p><em>Fresh store</em></p>
<p>PostgreSQL needs statistics to improve load performance through the
use of <code>ANALYSE</code>.</p>
<p>When loading the first time, there are no statistics so, for a
large load, it is advisable to load a sample, run <code>ANALYSE</code> and
then load the whole data.</p>
<ul>
<li>
<p>Create the database without indexes (just the primary keys).</p>
<p>sdbconfig &ndash;format</p>
</li>
<li>
<p>Load a sample of the triples (say, a 100K or a million triples</p>
<ul>
<li>until</li>
</ul>
</li>
</ul>
<p>the load rate starts to drop appreciably). The sample must be
representative of the data.</p>
<pre><code> sdbload --time sample
</code></pre>
<ul>
<li>
<p>Run <code>ANALYZE</code> on the database.</p>
</li>
<li>
<p>If your sample is one part of a large set of files, this set is
not necessary at all. If you are loading one single large file then
you might wish to empty the database. This is only needed if the
data has bNodes in</p>
</li>
</ul>
<p>it because the load process suppresses duplicates.</p>
<pre><code> sdbconfig --truncate
</code></pre>
<ul>
<li>
<p>Now load the data or rest of the data.</p>
<p>sdbload &ndash;time file</p>
</li>
<li>
<p>Add the indexes. This only takes a few minutes even on a very
large store but calculating them during loading (that is,
<code>--create</code>, not <code>--format</code>) is noticeably slower.</p>
<p>sdbconfig &ndash;index</p>
</li>
<li>
<p>Run <code>ANALYZE</code> on the database again.</p>
</li>
</ul>
<h3 id="tuning-1">Tuning</h3>
<p>It is essential to run the PostgreSQL ANALYZE command on a
database, either during or after building. This is done via the
command line <code>psql</code> or via <code>pgAdmin</code>. The PostgreSQL documentation
describes ways to run this as a background daemon.</p>
<p>Various of the PostgreSQL configuration parameters will affect
performance, particularly <code>effective_cache_size</code>. The parameter
<code>enable_seqscan</code> <em>may</em> help avoid some unexpected slow queries.</p>
</div>
</div>
</div>
<footer class="footer">
<div class="container" style="font-size:80%" >
<p>
Copyright &copy; 2011&ndash;2022 The Apache Software Foundation, Licensed under the
<a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
</p>
<p>
Apache Jena, Jena, the Apache Jena project logo, Apache and the Apache feather logos are trademarks of
The Apache Software Foundation.
<br/>
<a href="https://privacy.apache.org/policies/privacy-policy-public.html"
>Apache Software Foundation Privacy Policy</a>.
</p>
</div>
</footer>
<script type="text/javascript">
var link = $('a[href="' + this.location.pathname + '"]');
if (link != undefined)
link.parents('li,ul').addClass('active');
</script>
</body>
</html>