[{"categories":null,"contents":"This page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. So you\u0026rsquo;ve got Eyeball installed and you\u0026rsquo;ve run it on one of your files, and Eyeball doesn\u0026rsquo;t like it. You\u0026rsquo;re not sure why, or what to do about it. Here\u0026rsquo;s what\u0026rsquo;s going on.\nEyeball inspects your model against a set of schemas. The default set of schemas includes RDF, RDFS, the XSD datatypes, and any models your model imports: you can add additional schemas from the command line or configuration file. Eyeball uses those schemas to work out what URIs count as \u0026ldquo;declared\u0026rdquo; in advance. It also checks URIs and literals for syntactic correctness and name space prefixes for being \u0026ldquo;sensible\u0026rdquo;. Let\u0026rsquo;s look at some of the messages you can get.\nUnknown predicate reports You\u0026rsquo;ll probably find several messages like this: predicate not declared in any schema: somePredicateURI\nEyeball treats the imported models, and (independently) the specified schemas, as single OntModels, and extracts those OntModels\u0026rsquo; properties. It includes the RDF and RDFS schemas. Anything used as a predicate that isn\u0026rsquo;t one of those properties is reported.\nIf you\u0026rsquo;re using OWL, you can silence the \u0026ldquo;undeclared property\u0026rdquo; messages about OWL properties by adding to your Eyeball command line the option: -assume owl\nEyeball will read the OWL schema (it has a copy stashed away in the mirror directory) and add the declared properties to its known list. This works for any filename or URL you like, so long as there\u0026rsquo;s RDF there and it has a suitable file suffix - .n3 for N3 or .rdf or .owl for RDF/XML - and for the built-in names dc (basic Dublin Core), dcterms (Dublin Core terms) and dc-all (both). So you can construct your own schemas, which declare your own domain-specific property declarations, and invoke Eyeball with\n-assume owl *mySchemaFile.n3* *otherSchemaFile.rdf* You can give short names (like dc and rdfs) to your own schemas, or collections of schemas, using an Eyeball config file, but you\u0026rsquo;ll have to see the manual to find out how.\nUnknown class reports You may see messages like this:\nclass not declared in any schema: someClassURI Having read the previous section, you can probably work out what\u0026rsquo;s going on: Eyeball reads the schemas (and imports) and extracts the declared OntClasses. Then anything used as a class that isn\u0026rsquo;t one of those declared classes is reported..\nAnd that\u0026rsquo;s exactly it. \u0026ldquo;Used as a class\u0026rdquo; means appearing as C or D in any statement of the form:\n\\_ rdf:type C \\_ rdfs:domain C \\_ rdfs:range C C rdfs:subClassOf D Suppressing inspectors It may be that you\u0026rsquo;re not interested in the \u0026ldquo;unknown predicate\u0026rdquo; or \u0026ldquo;unknown class\u0026rdquo; reports until you\u0026rsquo;ve sorted out the URIs. Or maybe you don\u0026rsquo;t care about them. In that case, you can switch them off.\nEyeball\u0026rsquo;s different checks are carried out by inspector classes. These classes are given short names by entries in Eyeball config files (which are RDF files written using N3; you can see the default config file by looking in Eyeball\u0026rsquo;s etc directory for eyeball2-config.n3). 
By adding eg:\n-exclude property class to the Eyeball command line, you can exclude the inspectors with those short names from the check. property is the short name for the \u0026ldquo;unknown property\u0026rdquo; inspector, and class is the short name for the \u0026ldquo;unknown class\u0026rdquo; inspector.\nNamespace and URI reports Eyeball checks all the URIs in the model, including (if available) those used for namespaces. (And literals, but see below.) Here\u0026rsquo;s an example:\nbad namespace URI: \u0026quot;file:some-filename\u0026quot; on prefix: \u0026quot;pqr\u0026quot; for reason: file URI inappropriate for namespace A \u0026ldquo;bad namespace URI\u0026rdquo; means that Eyeball doesn\u0026rsquo;t like the URI for a namespace in the model. The \u0026ldquo;on prefix\u0026rdquo; part of the report says what the namespace prefix is, and the \u0026ldquo;for reason\u0026rdquo; part gives the reason. In this case, we (the designer of Eyeball) feel that it is unwise to use file URIs - which tend to depend on internal details of your directory structure - for global concepts. A more usual reason is that the URI is syntactically illegal. Here are some possibilities:\nproblem explanation URI contains spaces literal spaces are not legal in URIs. This usually arises from file URIs when the file has a space in its name. Spaces in URIs have to be encoded. URI has no scheme The URI has no scheme at all. This usually happens when some relative URI hasn\u0026rsquo;t been resolved properly, eg there\u0026rsquo;s no xml base in an RDF/XML document. URI has an unrecognised scheme The scheme part of the URI - the bit before the first colon - isn\u0026rsquo;t recognised. Eyeball knows, by default, four schemes: http, ftp, file, and urn. This usually arises when a QName has \u0026ldquo;escaped\u0026rdquo; from somewhere, or from a typo. You can tell Eyeball about other schemes, if you need them. scheme should be lower-case The scheme part of the URI contains uppercase letters. While this is not actually wrong, it is unconventional and pointless. URI doesn\u0026rsquo;t fit pattern Eyeball has some (weak) checks on the syntax of URIs in different schemes, expressed as patterns in its config files. If a URI doesn\u0026rsquo;t match the pattern, Eyeball reports this problem. At the moment, you\u0026rsquo;ll only get this report for a urn URI like urn:x-hp:23487682347 where the URN id (the bit between the first and second colons, here x-hp) is illegal. URI syntax error A catch-all error: Java couldn\u0026rsquo;t make any sense of this URI at all. Problems with literals Eyeball checks literals (using the literal inspector, whose short name is literal if you want to switch it off), but the checking is quite weak because it doesn\u0026rsquo;t understand types at the moment. You can get two different classes of error.\nbad language: someLanguageCode on literal: theLiteralInQuestion Literals with language codes (things like en-UK or de) are checked to make sure that the language code conforms to the general syntax for language codes: alphanumeric words separated by hyphens, with the first containing no digits.\n(Later versions of Eyeball will likely allow you to specify which language codes you want to permit in your models. But we haven\u0026rsquo;t got there yet.)\nbad datatype URI: someURI on literal: theLiteralInQuestion for reason: theReason Similarly, literals with datatypes are checked to make sure that the type URI is legal. 
That\u0026rsquo;s it for the moment: Eyeball doesn\u0026rsquo;t try to find out if the URI really is a type URI, or if the spelling of the literal is OK for that type. But it spots the bad URIs. (The messages are the same as those that appear in the URI checking - above - for the very good reason that it\u0026rsquo;s the same code doing the checking.)\nProblematic prefixes Both RDF/XML and N3 allow (and RDF/XML requires) namespaces to be abbreviated by prefixes. Eyeball checks prefixes for two possible problems. The first:\nnon-standard namespace for prefix This arises when a \u0026ldquo;standard\u0026rdquo; prefix has been bound to a namespace URI which isn\u0026rsquo;t its usual one. The \u0026ldquo;standard\u0026rdquo; prefixes are taken from Jena\u0026rsquo;s PrefixMapping.Extended and are currently:\n**rdf, rdfs, owl, xsd, rss, vcard** And the second:\nJena generated prefix found This arises when the model contains prefixes of the form j.N, where N is a number. These are generated by Jena when writing RDF/XML for URIs that must have a prefix (because they are used as types or predicates) but haven\u0026rsquo;t been given one.\nIf you\u0026rsquo;re not bothered about inventing short prefixes for your namespaces, you can -exclude jena-prefix to suppress this inspection.\nBut how do I \u0026hellip; The reports described so far are part of Eyeball\u0026rsquo;s default set of inspections. There are some other checks that it can do that are switched off by default, because they are expensive, initially overwhelming, or downright obscure. If you need to add these checks to your eyeballing, this is how to do it.\n\u0026hellip; make sure everything is typed? Some applications (or a general notion of cleanliness) require that every individual in an RDF model has an explicit rdf:type. The Eyeball check for this isn\u0026rsquo;t enabled by default, because lots of casual RDF use doesn\u0026rsquo;t need it, and more sophisticated use has models with enough inference power to infer types.\nYou can add the all-typed inspector to the inspectors that Eyeball will run by adding to the command line:\n-inspectors defaultInspectors all-typed The all-typed inspector will generate a message\nresource has no rdf:type for each resource in the model which is not the subject of an rdf:type statement.\n\u0026hellip; check for type consistency? One easy mistake to make in RDF is to make an assertion - we\u0026rsquo;ll call it S P O - about some subject S which is \u0026ldquo;of the wrong type\u0026rdquo;, that is, not of whatever type P\u0026rsquo;s domain is. This isn\u0026rsquo;t, in principle, an error, since RDF resources can have multiple types, and this just makes S have a type which is a subtype of both P\u0026rsquo;s domain and whatever type it was supposed to have.\nTo spot this, and related problems, Eyeball has the consistent-type inspector. You can add it to the inspections in the same way as the all-typed inspector:\n-inspectors defaultInspectors consistent-type It checks that every resource which has been given at least one type has a type which is a subtype of all its types, under an additional assumption:\nTypes in the type graph (the network of rdfs:subClassOf statements) are disjoint (share no instances) unless the type graph says they're not. For example, suppose that both A and B are subclasses of Top, and that there are no other subclass relationships. Then consistent-types assumes that there are (supposed to be) no resources which have both A and B as types. 
If it finds a resource X which does have both types, it generates a message like this:\nno consistent type for: X has associated type: A has associated type: B has associated type: Top It\u0026rsquo;s up to you to disentangle the types and work out what went wrong.\nNote: this test requires that Eyeball do a significant amount of inference, to complete the type hierarchy and check the domains and ranges of properties. It\u0026rsquo;s quite slow, which is one reason it isn\u0026rsquo;t switched on by default.\n\u0026hellip; check the right number of values for a property? You want to make sure that your data has the right properties for things of a certain type: say, that a book has at least one author (or editor), an album has at least one track, nobody in your organisation has more than ten managers, a Jena contrib has at least a dc:creator, a dc:name, and a dc:description. You write some OWL cardinality constraints:\nmy:Type rdfs:subClassOf [owl:onProperty my:track; owl:minCardinality 1] Then you discover that, for wildly technical reasons, the OWL validation code in Jena doesn\u0026rsquo;t think it\u0026rsquo;s an error for some album to have no tracks (maybe there\u0026rsquo;s a namespace error). You can enable Eyeball\u0026rsquo;s cardinality inspector by adding\n-inspectors cardinality to the command line. You\u0026rsquo;ll now get a report item for every resource that has rdf:type your restricted type (my:Type above) but doesn\u0026rsquo;t have the right (at least one) value for the property. It will look something like:\ncardinality failure for: my:Instance on type: my:Type on property: my:track cardinality range: [min: 1] number of values: 0 values: {} If there are some values for the property - say you\u0026rsquo;ve supplied an owl:maxCardinality restriction and then gone over the top - they get listed inside the values curly braces.\n","permalink":"https://jena.apache.org/documentation/archive/eyeball/eyeball-guide.html","tags":null,"title":"A brief guide to Jena Eyeball"},{"categories":null,"contents":"Overview The goal of this document is to add Jena Permissions to a fuseki deployment to restrict access to graph data. This example will take the example application, deploy the data to a fuseki instance and add the Jena Permissions to achieve the same access restrictions that the example application has.\nTo do this you will need a Fuseki installation, the Permissions Packages and a SecurityEvaluator implementation. For this example we will use the SecurityEvaluator from the permissions-example.\nSet up Fuseki can be downloaded from: https://repository.apache.org/content/repositories/releases/org/apache/jena/apache-jena-fuseki/\nJena Permissions jars can be downloaded from: https://repository.apache.org/content/repositories/releases/org/apache/jena/jena-permissions/\nDownload and unpack Fuseki. The directory that you unpack Fuseki into will be referred to as the Fuseki Home directory for the remainder of this document.\nDownload the permissions jar and the associated permissions-example jar.\nCopy the permissions jar and the permissions-example jar into the Fuseki Home directory. For the rest of this document the permissions jar will be referred to as permissions.jar and the permissions-example.jar as example.jar\nDownload the Apache Commons Collections v4. 
Uncompress the commons-collections*.jar into the Fuseki Home directory.\nAdd security jars to the startup script/batch file.\nOn *NIX edit fuseki-server script\nComment out the line that reads exec java $JVM_ARGS -jar \u0026quot;$JAR\u0026quot; \u0026quot;$@\u0026quot; Uncomment the line that reads ## APPJAR=MyCode.jar Uncomment the line that reads ## java $JVM_ARGS -cp \u0026quot;$JAR:$APPJAR\u0026quot; org.apache.jena.fuseki.cmd.FusekiCmd \u0026quot;$@\u0026quot; change MyCode.jar to permissions.jar:example.jar:commons-collections*.jar On Windows edit fuseki-server.bat file.\nComment out the line that reads java -Xmx1200M -jar fuseki-server.jar %* Uncomment the line that reads @REM java ... -cp fuseki-server.jar;MyCustomCode.jar org.apache.jena.fuseki.cmd.FusekiCmd %* Change MyCustomCode.jar to permissions.jar;example.jar;commons-collections*.jar Run the fuseki-server script or batch file.\nStop the server.\nExtract the example configuration into the newly created Fuseki Home/run directory. From the example.jar archive:\nextract /org/apache/jena/permissions/example/example.ttl into the Fuseki Home/run directory extract /org/apache/jena/permissions/example/fuseki/config.ttl into the Fuseki Home/run directory extract /org/apache/jena/permissions/example/fuseki/shiro.ini into the Fuseki Home/run directory Run fuseki-server –config=run/config.ttl or fuseki-server.bat –config=run/config.ttl\nReview of configuration At this point the system is configured with the following logins:\nLoginpasswordAccess to adminadminEverything alicealiceOnly messages to or from alice bobbobOnly messages to or from bob chuckchuckOnly messages to or from chuck darladarlaOnly messages to or from darla The messages graph is defined in the run/example.ttl file.\nThe run/shiro.ini file lists the users and their passwords and configures Fuseki to require authentication to access to the graphs.\nThe run/config.ttl file adds the permissions to the graph as follows by applying the org.apache.jena.permissions.example.ShiroExampleEvaluator security evaluator to the message graph.\nDefine all the prefixes.\nPREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX perm: \u0026lt;http://apache.org/jena/permissions/Assembler#\u0026gt; PREFIX my: \u0026lt;http://example.org/#\u0026gt; Load the SecuredAssembler class from the permissions library and define the perm:Model as a subclass of ja:NamedModel.\n[] ja:loadClass \u0026quot;org.apache.jena.permissions.SecuredAssembler\u0026quot; . perm:Model rdfs:subClassOf ja:NamedModel . Define the base model that contains the unsecured data. This can be any model type. For our example we use an in memory model that reads the example.ttl file.\nmy:baseModel rdf:type ja:MemoryModel; ja:content [ja:externalContent \u0026lt;file:./example.ttl\u0026gt;] . Define the secured model. This is where permissions is applied to the my:baseModel to create a model that has permission restrictions. Note that it is using the security evaluator implementation (sec:evaluatorImpl) called my:secEvaluator which we will define next.\nmy:securedModel rdf:type sec:Model ; perm:baseModel my:baseModel ; ja:modelName \u0026quot;https://example.org/securedModel\u0026quot; ; perm:evaluatorImpl my:secEvaluator . 
Define the security evaluator. This is where we use the example ShiroExampleEvaluator. For your production environment you will replace \u0026ldquo;org.apache.jena.security.example.ShiroExampleEvaluator\u0026rdquo; with your SecurityEvaluator implementation. Note that ShiroExampleEvaluator constructor takes a Model argument. We pass in the unsecured baseModel so that the evaluator can read it unencumbered. Your implementation of SecurityEvaluator may have different parameters to meet your specific needs.\nmy:secEvaluator rdf:type perm:Evaluator ; perm:args [ rdf:_1 my:baseModel ; ] ; perm:evaluatorClass \u0026quot;org.apache.jena.permissions.example.ShiroExampleEvaluator\u0026quot; . Define the dataset that we will use for in the server. Note that in the example dataset only contains the single secured model, adding multiple models and missing secured and unsecured models is supported.\nmy:securedDataset rdf:type ja:RDFDataset ; ja:defaultGraph my:securedModel . Define the fuseki:Server.\nmy:fuseki rdf:type fuseki:Server ; fuseki:services ( my:service1 ) . Define the service for the fuseki:Service. Note that the fuseki:dataset served by this server is the secured dataset defined above.\nmy:service1 rdf:type fuseki:Service ; rdfs:label \u0026quot;My Secured Data Service\u0026quot; ; fuseki:name \u0026quot;myAppFuseki\u0026quot; ; # http://host:port/myAppFuseki fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL query service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read and write) # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset my:securedDataset ; . Review of ShiroExampleEvaluator The ShiroExampleEvaluator uses triple level permissions to limit access to the \u0026ldquo;messages\u0026rdquo; in the graph to only those people in the message is address to or from. It is connected to the Shiro system by the getPrincipal() implementation where it simply calls the Shiro SecurityUtils.getSubject() method to return the current shiro user.\n/** * Return the Shiro subject. This is the subject that Shiro currently has logged in. */ @Override public Object getPrincipal() { return SecurityUtils.getSubject(); } This example allows any action on a graph as is seen in the evaluate(Object principal, Action action, Node graphIRI) and evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) methods. This is the first permissions check. If you wish to restrict users from specific graphs this method should be recoded to perform the check.\n/** * We allow any action on the graph itself, so this is always true. */ @Override public boolean evaluate(Object principal, Action action, Node graphIRI) { // we allow any action on a graph. return true; } /** * As per our design, users can access any graph. If we were to implement rules that * restricted user access to specific graphs, those checks would be here and we would * return \u0026lt;code\u0026gt;false\u0026lt;/code\u0026gt; if they were not allowed to access the graph. Note that this * method is checking to see that the user may perform ANY of the actions in the set on the * graph. 
*/ @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } The other overridden methods are implemented using one of three (3) private methods that evaluate if the user should have access to the data based on our security design. To implement your security design you should understand what each of the methods checks. See the SecurityEvaluator javadocs and SecurityEvaluator implementation notes.\n","permalink":"https://jena.apache.org/documentation/permissions/example.html","tags":null,"title":"Adding Jena Permissions to Fuseki"},{"categories":null,"contents":"Another way of dealing with semi-structured data is to query for one of a number of possibilities. This section covers the UNION pattern, where one of a number of possibilities is tried.\nUNION - two ways to the same data Both the vcard and FOAF vocabularies have properties for people\u0026rsquo;s names. In vcard it is vcard:FN, the formatted name, and in FOAF it is foaf:name. In this section we will look at a small set of data in which people\u0026rsquo;s names may be given using either the FOAF or the vcard vocabulary.\nSuppose we have an RDF graph that contains name information using both the vcard and FOAF vocabularies.\n@prefix foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; . @prefix vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . _:a foaf:name \u0026#34;Matt Jones\u0026#34; . _:b foaf:name \u0026#34;Sarah Jones\u0026#34; . _:c vcard:FN \u0026#34;Becky Smith\u0026#34; . _:d vcard:FN \u0026#34;John Smith\u0026#34; . A query to access the name information could be (q-union1.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { { [] foaf:name ?name } UNION { [] vCard:FN ?name } } this returns:\n----------------- | name | ================= | \u0026#34;Matt Jones\u0026#34; | | \u0026#34;Sarah Jones\u0026#34; | | \u0026#34;Becky Smith\u0026#34; | | \u0026#34;John Smith\u0026#34; | ----------------- Whichever form of expression was used for the name, the ?name variable is bound. The same effect can be achieved using a FILTER, as this query shows (q-union-1alt.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { [] ?p ?name FILTER ( ?p = foaf:name || ?p = vCard:FN ) } testing whether the property is one URI or the other. The solutions may not come back in the same order. The first form is likely to be the faster one, depending on the data and the storage used, because the second form has to fetch all the triples in the graph to match the triple pattern with unbound variables (or blank nodes) in every slot, and then test each ?p to see whether it matches one of the values. It will depend on the sophistication of the query optimizer whether it executes the query more efficiently and pushes the test down to the storage layer.\nUnion - remembering where the data was found. The example above used the same variable in each branch. 
If different variables are used, the application can discover which sub-pattern caused the match (q-union2.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name1 ?name2 WHERE { { [] foaf:name ?name1 } UNION { [] vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026#34;Matt Jones\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;John Smith\u0026#34; | --------------------------------- This second query has retained information about where the person\u0026rsquo;s name came from by assigning the name to different variables.\nOPTIONAL and UNION In practice, OPTIONAL is more common than UNION, but both have their uses. OPTIONAL is useful for augmenting the solutions found; UNION is useful for concatenating the solutions from different possibilities. They do not necessarily return the information in the same way.\nQuery (q-union3.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name1 ?name2 WHERE { ?x a foaf:Person OPTIONAL { ?x foaf:name ?name1 } OPTIONAL { ?x vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026#34;Matt Jones\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;John Smith\u0026#34; | --------------------------------- But beware of using ?name in each OPTIONAL, because that makes the query order-dependent.\nNext: Named Graphs\n","permalink":"https://jena.apache.org/tutorials/sparql_union_pt.html","tags":null,"title":"Alternatives in a Pattern"},{"categories":null,"contents":"Preface This is a tutorial introduction to both W3C\u0026rsquo;s Resource Description Framework (RDF) and Jena, a Java API for RDF. It is written for the programmer who is unfamiliar with RDF and who learns best by prototyping, or, for other reasons, wishes to move quickly to implementation. Some familiarity with both XML and Java is assumed.\nImplementing too quickly, without first understanding the RDF data model, leads to frustration and disappointment. Yet studying the data model alone is dry stuff and often leads to tortuous metaphysical conundrums. It is better to approach understanding both the data model and how to use it in parallel. Learn a bit of the data model and try it out. Then learn a bit more and try that out. Then the theory informs the practice and the practice the theory. The data model is quite simple, so this approach does not take long.\nRDF has an XML syntax and many who are familiar with XML will think of RDF in terms of that syntax. This is a mistake. RDF should be understood in terms of its data model. RDF data can be represented in XML, but understanding the syntax is secondary to understanding the data model.\nAn implementation of the Jena API, including the working source code for all the examples used in this tutorial, can be downloaded from jena.apache.org/download/index.cgi.\nIntroduction The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources. What is a resource? That is rather a deep question and the precise definition is still the subject of debate. For our purposes we can think of it as anything we can identify. 
You are a resource, as is your home page, this tutorial, the number one and the great white whale in Moby Dick.\nOur examples in this tutorial will be about people. They use an RDF representation of VCARDS. RDF is best thought of in the form of node and arc diagrams. A simple vcard might look like this in RDF:\nThe resource, John Smith, is shown as an ellipse and is identified by a Uniform Resource Identifier (URI)1, in this case \"http://.../JohnSmith\". If you try to access that resource using your browser, you are unlikely to be successful; April the first jokes not withstanding, you would be rather surprised if your browser were able to deliver John Smith to your desk top. If you are unfamiliar with URI's, think of them simply as rather strange looking names.\nResources have properties. In these examples we are interested in the sort of properties that would appear on John Smith's business card. Figure 1 shows only one property, John Smith's full name. A property is represented by an arc, labeled with the name of a property. The name of a property is also a URI, but as URI's are rather long and cumbersome, the diagram shows it in XML qname form. The part before the ':' is called a namespace prefix and represents a namespace. The part after the ':' is called a local name and represents a name in that namespace. Properties are usually represented in this qname form when written as RDF XML and it is a convenient shorthand for representing them in diagrams and in text. Strictly, however, properties are identified by a URI. The nsprefix:localname form is a shorthand for the URI of the namespace concatenated with the localname. There is no requirement that the URI of a property resolve to anything when accessed by a browser.\nEach property has a value. In this case the value is a literal, which for now we can think of as a strings of characters2. Literals are shown in rectangles.\nJena is a Java API which can be used to create and manipulate RDF graphs like this one. Jena has object classes to represent graphs, resources, properties and literals. The interfaces representing resources, properties and literals are called Resource, Property and Literal respectively. In Jena, a graph is called a model and is represented by the Model interface.\nThe code to create this graph, or model, is simple:\n// some definitions static String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; static String fullName = \u0026#34;John Smith\u0026#34;; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource Resource johnSmith = model.createResource(personURI); // add the property johnSmith.addProperty(VCARD.FN, fullName); It begins with some constant definitions and then creates an empty Model or model, using the ModelFactory method createDefaultModel() to create a memory-based model. Jena contains other implementations of the Model interface, e.g one which uses a relational database: these types of Model are also available from ModelFactory.\nThe John Smith resource is then created and a property added to it. The property is provided by a \"constant\" class VCARD which holds objects representing all the definitions in the VCARD schema. Jena provides constant classes for other well known schemas, such as RDF and RDF schema themselves, Dublin Core and OWL.\nThe working code for this example can be found in the /src-examples directory of the Jena distribution as tutorial 1. 
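If you want to compile and run the fragment on its own, a minimal self-contained version might look like the sketch below; the class name and the choice of N-Triples output are ours, not part of the tutorial source.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class Tutorial01Sketch {
    // some definitions
    static String personURI = "http://somewhere/JohnSmith";
    static String fullName  = "John Smith";

    public static void main(String[] args) {
        // create an empty, memory-based model
        Model model = ModelFactory.createDefaultModel();

        // create the resource and add the vcard:FN property to it
        Resource johnSmith = model.createResource(personURI);
        johnSmith.addProperty(VCARD.FN, fullName);

        // print the single statement so we can see the result
        model.write(System.out, "N-TRIPLE");
    }
}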
As an exercise, take this code and modify it to create a simple VCARD for yourself.\nThe code to create the resource and add the property, can be more compactly written in a cascading style:\nResource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName); Now let's add some more detail to the vcard, exploring some more features of RDF and Jena.\nIn the first example, the property value was a literal. RDF properties can also take other resources as their value. Using a common RDF technique, this example shows how to represent the different parts of John Smith's name:\nHere we have added a new property, vcard:N, to represent the structure of John Smith's name. There are several things of interest about this Model. Note that the vcard:N property takes a resource as its value. Note also that the ellipse representing the compound name has no URI. It is known as an blank Node.\nThe Jena code to construct this example, is again very simple. First some declarations and the creation of the empty model.\n// some definitions String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; String givenName = \u0026#34;John\u0026#34;; String familyName = \u0026#34;Smith\u0026#34;; String fullName = givenName + \u0026#34; \u0026#34; + familyName; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource // and add the properties cascading style Resource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName) .addProperty(VCARD.N, model.createResource() .addProperty(VCARD.Given, givenName) .addProperty(VCARD.Family, familyName)); The working code for this example can be found as tutorial 2 in the /src-examples directory of the Jena distribution.\nStatements Each arc in an RDF Model is called a statement. Each statement asserts a fact about a resource. A statement has three parts:\nthe subject is the resource from which the arc leaves the predicate is the property that labels the arc the object is the resource or literal pointed to by the arc A statement is sometimes called a triple, because of its three parts.\nAn RDF Model is represented as a set of statements. Each call of addProperty in tutorial2 added another statement to the Model. (Because a Model is set of statements, adding a duplicate of a statement has no effect.) The Jena model interface defines a listStatements() method which returns an StmtIterator, a subtype of Java's Iterator over all the statements in a Model. StmtIterator has a method nextStatement() which returns the next statement from the iterator (the same one that next() would deliver, already cast to Statement). The Statement interface provides accessor methods to the subject, predicate and object of a statement.\nNow we will use that interface to extend tutorial2 to list all the statements created and print them out. 
The complete code for this can be found in tutorial 3.\n// list the statements in the Model StmtIterator iter = model.listStatements(); // print out the predicate, subject and object of each statement while (iter.hasNext()) { Statement stmt = iter.nextStatement(); // get next statement Resource subject = stmt.getSubject(); // get the subject Property predicate = stmt.getPredicate(); // get the predicate RDFNode object = stmt.getObject(); // get the object System.out.print(subject.toString()); System.out.print(\u0026#34; \u0026#34; + predicate.toString() + \u0026#34; \u0026#34;); if (object instanceof Resource) { System.out.print(object.toString()); } else { // object is a literal System.out.print(\u0026#34; \\\u0026#34;\u0026#34; + object.toString() + \u0026#34;\\\u0026#34;\u0026#34;); } System.out.println(\u0026#34; .\u0026#34;); } Since the object of a statement can be either a resource or a literal, the getObject() method returns an object typed as RDFNode, which is a common superclass of both Resource and Literal. The underlying object is of the appropriate type, so the code uses instanceof to determine which and processes it accordingly.\nWhen run, this program should produce output resembling:\nhttp://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N 413f6415-c3b0-4259-b74d-4bd6e757eb60 . 413f6415-c3b0-4259-b74d-4bd6e757eb60 http://www.w3.org/2001/vcard-rdf/3.0#Family \u0026#34;Smith\u0026#34; . 413f6415-c3b0-4259-b74d-4bd6e757eb60 http://www.w3.org/2001/vcard-rdf/3.0#Given \u0026#34;John\u0026#34; . http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN \u0026#34;John Smith\u0026#34; . Now you know why it is clearer to draw Models. If you look carefully, you will see that each line consists of three fields representing the subject, predicate and object of each statement. There are four arcs in the Model, so there are four statements. The \"14df86:ecc3dee17b:-7fff\" is an internal identifier generated by Jena. It is not a URI and should not be confused with one. It is simply an internal label used by the Jena implementation.\nThe W3C RDFCore Working Group have defined a similar simple notation called N-Triples. The name means \"triple notation\". We will see in the next section that Jena has an N-Triples writer built in.\nWriting RDF Jena has methods for reading and writing RDF as XML. These can be used to save an RDF model to a file and later read it back in again.\nTutorial 3 created a model and wrote it out in triple form. Tutorial 4 modifies tutorial 3 to write the model in RDF XML form to the standard output stream. The code again, is very simple: model.write can take an OutputStream argument.\n// now write the model in XML form to a file model.write(System.out); The output should look something like this:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vcard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/JohnSmith\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A0\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; The RDF specifications specify how to represent RDF as XML. 
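Because model.write accepts any OutputStream, saving the model to a file rather than to standard output only requires opening a stream first. A minimal sketch, assuming a writable file name of our own choosing:

import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.VCARD;

public class WriteModelToFileSketch {
    public static void main(String[] args) throws Exception {
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://somewhere/JohnSmith")
             .addProperty(VCARD.FN, "John Smith");

        // model.write takes an OutputStream, so a FileOutputStream works too;
        // "vcard-out.rdf" is an illustrative file name
        try (OutputStream out = new FileOutputStream("vcard-out.rdf")) {
            model.write(out);   // default RDF/XML serialization
        }
    }
}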
The RDF XML syntax is quite complex. The reader is referred to the primer being developed by the RDFCore WG for a more detailed introduction. However, let's take a quick look at how to interpret the above.\nRDF is usually embedded in an \u0026lt;rdf:RDF\u0026gt; element. The element is optional if there are other ways of knowing that some XML is RDF, but it is usually present. The RDF element defines the two namespaces used in the document. There is then an \u0026lt;rdf:Description\u0026gt; element which describes the resource whose URI is \"http://somewhere/JohnSmith\". If the rdf:about attribute was missing, this element would represent a blank node.\nThe \u0026lt;vcard:FN\u0026gt; element describes a property of the resource. The property name is the \"FN\" in the vcard namespace. RDF converts this to a URI reference by concatenating the URI reference for the namespace prefix and \"FN\", the local name part of the name. This gives a URI reference of \"http://www.w3.org/2001/vcard-rdf/3.0#FN\". The value of the property is the literal \"John Smith\".\nThe \u0026lt;vcard:N\u0026gt; element is a resource. In this case the resource is represented by a relative URI reference. RDF converts this to an absolute URI reference by concatenating it with the base URI of the current document.\nThere is an error in this RDF XML; it does not exactly represent the Model we created. The blank node in the Model has been given a URI reference. It is no longer blank. The RDF/XML syntax is not capable of representing all RDF Models; for example it cannot represent a blank node which is the object of two statements. The 'dumb' writer we used to write this RDF/XML makes no attempt to write correctly the subset of Models which can be written correctly. It gives a URI to each blank node, making it no longer blank.\nJena has an extensible interface which allows new writers for different serialization languages for RDF to be easily plugged in. The above call invoked the standard 'dumb' writer. Jena also includes a more sophisticated RDF/XML writer which can be invoked by using RDFDataMgr.write function call:\n// now write the model in a pretty form RDFDataMgr.write(System.out, model, Lang.RDFXML); This writer, the so called PrettyWriter, takes advantage of features of the RDF/XML abbreviated syntax to write a Model more compactly. It is also able to preserve blank nodes where that is possible. It is however, not suitable for writing very large Models, as its performance is unlikely to be acceptable. To write large files and preserve blank nodes, write in N-Triples format:\n// now write the model in N-TRIPLES form RDFDataMgr.write(System.out, model, Lang.NTRIPLES); This will produce output similar to that of tutorial 3 which conforms to the N-Triples specification.\nReading RDF Tutorial 5 demonstrates reading the statements recorded in RDF XML form into a model. With this tutorial, we have provided a small database of vcards in RDF/XML form. The following code will read it in and write it out. 
Note that for this application to run, the input file must be in the current directory.\n// create an empty model Model model = ModelFactory.createDefaultModel(); // use the RDFDataMgr to find the input file InputStream in = RDFDataMgr.open( inputFileName ); if (in == null) { throw new IllegalArgumentException(\u0026#34;File: \u0026#34; + inputFileName + \u0026#34; not found\u0026#34;); } // read the RDF/XML file model.read(in, null); // write it to standard out model.write(System.out); The second argument to the read() method call is the URI which will be used for resolving relative URI's. As there are no relative URI references in the test file, it is allowed to be empty. When run, tutorial 5 will produce XML output which looks like:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vcard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/JohnSmith/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A0\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/SarahJones/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Sarah Jones\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A1\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/MattJones/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Matt Jones\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A2\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A3\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Rebecca\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A1\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Jones\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Sarah\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A2\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Jones\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Matthew\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/RebeccaSmith/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Becky Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A3\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Controlling Prefixes Explicit prefix definitions In the previous section, we saw that the output XML declared a namespace prefix vcard and used that prefix to abbreviate URIs. While RDF uses only the full URIs, and not this shortened form, Jena provides ways of controlling the namespaces used on output with its prefix mappings. 
Here\u0026rsquo;s a simple example.\nModel m = ModelFactory.createDefaultModel(); String nsA = \u0026#34;http://somewhere/else#\u0026#34;; String nsB = \u0026#34;http://nowhere/else#\u0026#34;; Resource root = m.createResource( nsA + \u0026#34;root\u0026#34; ); Property P = m.createProperty( nsA + \u0026#34;P\u0026#34; ); Property Q = m.createProperty( nsB + \u0026#34;Q\u0026#34; ); Resource x = m.createResource( nsA + \u0026#34;x\u0026#34; ); Resource y = m.createResource( nsA + \u0026#34;y\u0026#34; ); Resource z = m.createResource( nsA + \u0026#34;z\u0026#34; ); m.add( root, P, x ).add( root, P, y ).add( y, Q, z ); System.out.println( \u0026#34;# -- no special prefixes defined\u0026#34; ); m.write( System.out ); System.out.println( \u0026#34;# -- nsA defined\u0026#34; ); m.setNsPrefix( \u0026#34;nsA\u0026#34;, nsA ); m.write( System.out ); System.out.println( \u0026#34;# -- nsA and cat defined\u0026#34; ); m.setNsPrefix( \u0026#34;cat\u0026#34;, nsB ); m.write( System.out ); The output from this fragment is lots of RDF/XML, with three different prefix mappings. First the default, with no prefixes other than the standard ones:\n# -- no special prefixes defined \u0026lt;rdf:RDF xmlns:j.0=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:j.1=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;j.1:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;j.1:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;j.0:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; We see that the rdf namespace is declared automatically, since it is required for tags such as \u0026lt;rdf:RDF\u0026gt; and \u0026lt;rdf:resource\u0026gt;. XML namespace declarations are also needed for using the two properties P and Q, but since their prefixes have not been introduced to the model in this example, they get invented namespace names: j.0 and j.1.\nThe method setNsPrefix(String prefix, String URI) declares that the namespace URI may be abbreviated by prefix. Jena requires that prefix be a legal XML namespace name, and that URI ends with a non-name character. The RDF/XML writer will turn these prefix declarations into XML namespace declarations and use them in its output: # -- nsA defined \u0026lt;rdf:RDF xmlns:j.0=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:nsA=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;j.0:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; is now used in the property tags. 
There\u0026rsquo;s no need for the prefix name to have anything to do with the variables in the Jena code:\n# -- nsA and cat defined \u0026lt;rdf:RDF xmlns:cat=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:nsA=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;cat:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Both prefixes are used for output, and no generated prefixes are needed.\nImplicit prefix definitions As well as prefix declarations provided by calls to setNsPrefix, Jena will remember the prefixes that were used in input to model.read().\nTake the output produced by the previous fragment, and paste it into some file, with URL file:/tmp/fragment.rdf say. Then run the code: Model m2 = ModelFactory.createDefaultModel(); m2.read( \u0026#34;file:/tmp/fragment.rdf\u0026#34; ); m2.write( System.out ); You\u0026rsquo;ll see that the prefixes from the input are preserved in the output. All the prefixes are written, even if they\u0026rsquo;re not used anywhere. You can remove a prefix with removeNsPrefix(String prefix) if you don\u0026rsquo;t want it in the output.\nSince NTriples doesn't have any short way of writing URIs, it takes no notice of prefixes on output and doesn't provide any on input. The notation N3, also supported by Jena, does have short prefixed names, and records them on input and uses them on output. Jena has further operations on the prefix mappings that a model holds, such as extracting a Java Map of the exiting mappings, or adding a whole group of mappings at once; see the documentation for PrefixMapping for details. Jena RDF Packages Jena is a Java API for semantic web applications. The key RDF package for the application developer is org.apache.jena.rdf.model. The API has been defined in terms of interfaces so that application code can work with different implementations without change. This package contains interfaces for representing models, resources, properties, literals, statements and all the other key concepts of RDF, and a ModelFactory for creating models. So that application code remains independent of the implementation, it is best if it uses interfaces wherever possible, not specific class implementations.\nThe org.apache.jena.tutorial package contains the working source code for all the examples used in this tutorial.\nThe org.apache.jena...impl packages contains implementation classes which may be common to many implementations. For example, they defines classes ResourceImpl, PropertyImpl, and LiteralImpl which may be used directly or subclassed by different implementations. Applications should rarely, if ever, use these classes directly. For example, rather than creating a new instance of ResourceImpl, it is better to use the createResource method of whatever model is being used. 
That way, if the model implementation has used an optimized implementation of Resource, then no conversions between the two types will be necessary.\nNavigating a Model So far, this tutorial has dealt mainly with creating, reading and writing RDF Models. It is now time to deal with accessing information held in a Model.\nGiven the URI of a resource, the resource object can be retrieved from a model using the Model.getResource(String uri) method. This method is defined to return a Resource object if one exists in the model, or otherwise to create a new one. For example, to retrieve the John Smith resource from the model read in from the file in tutorial 5:\n// retrieve the John Smith vcard resource from the model Resource vcard = model.getResource(johnSmithURI); The Resource interface defines a number of methods for accessing the properties of a resource. The Resource.getProperty(Property p) method accesses a property of the resource. This method does not follow the usual Java accessor convention in that the type of the object returned is Statement, not the Property that you might have expected. Returning the whole statement allows the application to access the value of the property using one of its accessor methods which return the object of the statement. For example to retrieve the resource which is the value of the vcard:N property:\n// retrieve the value of the N property Resource name = (Resource) vcard.getProperty(VCARD.N) .getObject(); In general, the object of a statement could be a resource or a literal, so the application code, knowing the value must be a resource, casts the returned object. One of the things that Jena tries to do is to provide type specific methods so the application does not have to cast and type checking can be done at compile time. The code fragment above, can be more conveniently written:\n// retrieve the value of the N property Resource name = vcard.getProperty(VCARD.N) .getResource(); Similarly, the literal value of a property can be retrieved:\nString fullName = vcard.getProperty(VCARD.FN) .getString(); In this example, the vcard resource has only one vcard:FN and one vcard:N property. RDF permits a resource to repeat a property; for example Adam might have more than one nickname. Let's give him two:\n// add two nickname properties to vcard vcard.addProperty(VCARD.NICKNAME, \u0026#34;Smithy\u0026#34;) .addProperty(VCARD.NICKNAME, \u0026#34;Adman\u0026#34;); As noted before, Jena represents an RDF Model as set of statements, so adding a statement with the subject, predicate and object as one already in the Model will have no effect. Jena does not define which of the two nicknames present in the Model will be returned. The result of calling vcard.getProperty(VCARD.NICKNAME) is indeterminate. Jena will return one of the values, but there is no guarantee even that two consecutive calls will return the same value.\nIf it is possible that a property may occur more than once, then the Resource.listProperties(Property p) method can be used to return an iterator which will list them all. This method returns an iterator which returns objects of type Statement. 
We can list the nicknames like this:\n// set up the output System.out.println(\u0026#34;The nicknames of \\\u0026#34;\u0026#34; + fullName + \u0026#34;\\\u0026#34; are:\u0026#34;); // list the nicknames StmtIterator iter = vcard.listProperties(VCARD.NICKNAME); while (iter.hasNext()) { System.out.println(\u0026#34; \u0026#34; + iter.nextStatement() .getObject() .toString()); } This code can be found in tutorial 6. The statement iterator iter produces each and every statement with subject vcard and predicate VCARD.NICKNAME, so looping over it allows us to fetch each statement by using nextStatement(), get the object field, and convert it to a string. The code produces the following output when run:\nThe nicknames of \u0026#34;John Smith\u0026#34; are: Smithy Adman All the properties of a resource can be listed by using the listProperties() method without an argument. Querying a Model The previous section dealt with the case of navigating a model from a resource with a known URI. This section deals with searching a model. The core Jena API supports only a limited query primitive. The more powerful query facilities of SPARQL are described elsewhere.\nThe Model.listStatements() method, which lists all the statements in a model, is perhaps the crudest way of querying a model. Its use is not recommended on very large Models. Model.listSubjects() is similar, but returns an iterator over all resources that have properties, ie are the subject of some statement.\nModel.listSubjectsWithProperty(Property p, RDFNode o) will return an iterator over all the resources which have property p with value o. If we assume that only vcard resources will have vcard:FN property, and that in our data, all such resources have such a property, then we can find all the vcards like this:\n// list vcards ResIterator iter = model.listSubjectsWithProperty(VCARD.FN); while (iter.hasNext()) { Resource r = iter.nextResource(); ... } All these query methods are simply syntactic sugar over a primitive query method model.listStatements(Selector s). This method returns an iterator over all the statements in the model 'selected' by s. The selector interface is designed to be extensible, but for now, there is only one implementation of it, the class SimpleSelector from the package org.apache.jena.rdf.model. Using SimpleSelector is one of the rare occasions in Jena when it is necessary to use a specific class rather than an interface. The SimpleSelector constructor takes three arguments:\nSelector selector = new SimpleSelector(subject, predicate, object); This selector will select all statements with a subject that matches subject, a predicate that matches predicate and an object that matches object. If a null is supplied in any of the positions, it matches anything; otherwise they match corresponding equal resources or literals. (Two resources are equal if they have equal URIs or are the same blank node; two literals are the same if all their components are equal.) Thus:\nSelector selector = new SimpleSelector(null, null, null); will select all the statements in a Model.\nSelector selector = new SimpleSelector(null, VCARD.FN, null); will select all the statements with VCARD.FN as their predicate, whatever the subject or object. 
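A selector built this way is simply handed to listStatements and the returned iterator is walked as before; a short sketch (the wrapper method is ours):

import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.VCARD;

public class SelectorSketch {
    // print the subject and full name of every statement whose predicate is VCARD.FN
    static void printFullNames(Model model) {
        Selector selector = new SimpleSelector(null, VCARD.FN, (RDFNode) null);
        StmtIterator iter = model.listStatements(selector);
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();
            System.out.println(stmt.getSubject() + " has full name " + stmt.getString());
        }
    }
}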
As a special shorthand, listStatements( S, P, O ) is equivalent to\nlistStatements( new SimpleSelector( S, P, O ) ) The following code, which can be found in full in tutorial 7 lists the full names on all the vcards in the database.\n// select all the resources with a VCARD.FN property ResIterator iter = model.listSubjectsWithProperty(VCARD.FN); if (iter.hasNext()) { System.out.println(\u0026#34;The database contains vcards for:\u0026#34;); while (iter.hasNext()) { System.out.println(\u0026#34; \u0026#34; + iter.nextResource() .getProperty(VCARD.FN) .getString()); } } else { System.out.println(\u0026#34;No vcards were found in the database\u0026#34;); } This should produce output similar to the following:\nThe database contains vcards for: Sarah Jones John Smith Matt Jones Becky Smith Your next exercise is to modify this code to use SimpleSelector instead of listSubjectsWithProperty.\nLet's see how to implement some finer control over the statements selected. SimpleSelector can be subclassed and its selects method modified to perform further filtering:\n// select all the resources with a VCARD.FN property // whose value ends with \u0026#34;Smith\u0026#34; StmtIterator iter = model.listStatements( new SimpleSelector(null, VCARD.FN, (RDFNode) null) { public boolean selects(Statement s) {return s.getString().endsWith(\u0026#34;Smith\u0026#34;);} }); This sample code uses a neat Java technique of overriding a method definition inline when creating an instance of the class. Here the selects(...) method checks to ensure that the full name ends with \"Smith\". It is important to note that filtering based on the subject, predicate and object arguments takes place before the selects(...) method is called, so the extra test will only be applied to matching statements.\nThe full code can be found in tutorial 8 and produces output like this:\nThe database contains vcards for: John Smith Becky Smith You might think that:\n// do all filtering in the selects method StmtIterator iter = model.listStatements( new SimpleSelector(null, null, (RDFNode) null) { public boolean selects(Statement s) { return (subject == null || s.getSubject().equals(subject)) \u0026amp;amp;\u0026amp;amp; (predicate == null || s.getPredicate().equals(predicate)) \u0026amp;amp;\u0026amp;amp; (object == null || s.getObject().equals(object)) ; } } }); is equivalent to:\nStmtIterator iter = model.listStatements(new SimpleSelector(subject, predicate, object) Whilst functionally they may be equivalent, the first form will list all the statements in the Model and test each one individually, whilst the second allows indexes maintained by the implementation to improve performance. Try it on a large Model and see for yourself, but make a cup of coffee first.\nOperations on Models Jena provides three operations for manipulating Models as a whole. These are the common set operations of union, intersection and difference.\nThe union of two Models is the union of the sets of statements which represent each Model. This is one of the key operations that the design of RDF supports. It enables data from disparate data sources to be merged. 
Consider the following two Models:\nand When these are merged, the two http://...JohnSmith nodes are merged into one and the duplicate vcard:FN arc is dropped to produce:\nLet's look at the code to do this (the full code is in tutorial 9) and see what happens.\n// read the RDF/XML files model1.read(new InputStreamReader(in1), \u0026#34;\u0026#34;); model2.read(new InputStreamReader(in2), \u0026#34;\u0026#34;); // merge the Models Model model = model1.union(model2); // print the Model as RDF/XML model.write(system.out, \u0026#34;RDF/XML-ABBREV\u0026#34;); The output produced by the pretty writer looks like this:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#34;\u0026lt;a href=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34;\u0026gt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026lt;/a\u0026gt;\u0026#34; xmlns:vcard=\u0026#34;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#34;\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/JohnSmith/\u0026#34;\u0026gt; \u0026lt;vcard:EMAIL\u0026gt; \u0026lt;vcard:internet\u0026gt; \u0026lt;rdf:value\u0026gt;John@somewhere.com\u0026lt;/rdf:value\u0026gt; \u0026lt;/vcard:internet\u0026gt; \u0026lt;/vcard:EMAIL\u0026gt; \u0026lt;vcard:N rdf:parseType=\u0026#34;Resource\u0026#34;\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;/vcard:N\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Even if you are unfamiliar with the details of the RDF/XML syntax, it should be reasonably clear that the Models have merged as expected. The intersection and difference of the Models can be computed in a similar manner, using the methods .intersection(Model) and .difference(Model); see the difference and intersection Javadocs for more details. Containers RDF defines a special kind of resources for representing collections of things. These resources are called containers. The members of a container can be either literals or resources. There are three kinds of container:\na BAG is an unordered collection an ALT is an unordered collection intended to represent alternatives a SEQ is an ordered collection A container is represented by a resource. That resource will have an rdf:type property whose value should be one of rdf:Bag, rdf:Alt or rdf:Seq, or a subclass of one of these, depending on the type of the container. The first member of the container is the value of the container's rdf:_1 property; the second member of the container is the value of the container's rdf:_2 property and so on. The rdf:_nnn properties are known as the ordinal properties.\nFor example, the Model for a simple bag containing the vcards of the Smith's might look like this:\nWhilst the members of the bag are represented by the properties rdf:_1, rdf:_2 etc the ordering of the properties is not significant. We could switch the values of the rdf:_1 and rdf:_2 properties and the resulting Model would represent the same information.\nAlt's are intended to represent alternatives. For example, lets say a resource represented a software product. It might have a property to indicate where it might be obtained from. The value of that property might be an Alt collection containing various sites from which it could be downloaded. Alt's are unordered except that the rdf:_1 property has special significance. 
It represents the default choice.\nWhilst containers can be handled using the basic machinery of resources and properties, Jena has explicit interfaces and implementation classes to handle them. It is not a good idea to have an object manipulating a container, and at the same time to modify the state of that container using the lower level methods.\nLet's modify tutorial 8 to create this bag:\n// create a bag Bag smiths = model.createBag(); // select all the resources with a VCARD.FN property // whose value ends with \u0026#34;Smith\u0026#34; StmtIterator iter = model.listStatements( new SimpleSelector(null, VCARD.FN, (RDFNode) null) { public boolean selects(Statement s) { return s.getString().endsWith(\u0026#34;Smith\u0026#34;); } }); // add the Smith\u0026#39;s to the bag while (iter.hasNext()) { smiths.add(iter.nextStatement().getSubject()); } If we write out this Model, it contains something like the following:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vcard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; \u0026gt; ... \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A3\u0026#34;\u0026gt; \u0026lt;rdf:type rdf:resource=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag\u0026#39;/\u0026gt; \u0026lt;rdf:_1 rdf:resource=\u0026#39;http://somewhere/JohnSmith/\u0026#39;/\u0026gt; \u0026lt;rdf:_2 rdf:resource=\u0026#39;http://somewhere/RebeccaSmith/\u0026#39;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; which represents the Bag resource.\nThe container interface provides an iterator to list the contents of a container:\n// print out the members of the bag NodeIterator iter2 = smiths.iterator(); if (iter2.hasNext()) { System.out.println(\u0026#34;The bag contains:\u0026#34;); while (iter2.hasNext()) { System.out.println(\u0026#34; \u0026#34; + ((Resource) iter2.next()) .getProperty(VCARD.FN) .getString()); } } else { System.out.println(\u0026#34;The bag is empty\u0026#34;); } which produces the following output:\nThe bag contains: John Smith Becky Smith Executable example code can be found in tutorial 10, which glues together the fragments above into a complete example.\nThe Jena classes offer methods for manipulating containers including adding new members, inserting new members into the middle of a container and removing existing members. The Jena container classes currently ensure that the list of ordinal properties used starts at rdf:_1 and is contiguous. The RDFCore WG have relaxed this constraint, which allows partial representation of containers. This therefore is an area of Jena may be changed in the future.\nMore about Literals and Datatypes RDF literals are not just simple strings. Literals may have a language tag to indicate the language of the literal. The literal \"chat\" with an English language tag is considered different to the literal \"chat\" with a French language tag. This rather strange behaviour is an artefact of the original RDF/XML syntax.\nFurther there are really two sorts of Literals. In one, the string component is just that, an ordinary string. In the other the string component is expected to be a well-balanced fragment of XML. When an RDF Model is written as RDF/XML a special construction using a parseType='Literal' attribute is used to represent it.\nIn Jena, these attributes of a literal may be set when the literal is constructed, e.g. 
in tutorial 11:\n// create the resource Resource r = model.createResource(); // add the property r.addProperty(RDFS.label, model.createLiteral(\u0026#34;chat\u0026#34;, \u0026#34;en\u0026#34;)) .addProperty(RDFS.label, model.createLiteral(\u0026#34;chat\u0026#34;, \u0026#34;fr\u0026#34;)) .addProperty(RDFS.label, model.createLiteral(\u0026#34;\u0026amp;lt;em\u0026amp;gt;chat\u0026amp;lt;/em\u0026amp;gt;\u0026#34;, true)); // write out the Model model.write(system.out); produces\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:rdfs=\u0026#39;http://www.w3.org/2000/01/rdf-schema#\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;rdfs:label xml:lang=\u0026#39;en\u0026#39;\u0026gt;chat\u0026lt;/rdfs:label\u0026gt; \u0026lt;rdfs:label xml:lang=\u0026#39;fr\u0026#39;\u0026gt;chat\u0026lt;/rdfs:label\u0026gt; \u0026lt;rdfs:label rdf:parseType=\u0026#39;Literal\u0026#39;\u0026gt;\u0026lt;em\u0026gt;chat\u0026lt;/em\u0026gt;\u0026lt;/rdfs:label\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; For two literals to be considered equal, they must either both be XML literals or both be simple literals. In addition, either both must have no language tag, or if language tags are present they must be equal. For simple literals the strings must be equal. XML literals have two notions of equality. The simple notion is that the conditions previously mentioned are true and the strings are also equal. The other notion is that they can be equal if the canonicalization of their strings is equal.\nJena's interfaces also support typed literals. The old-fashioned way (shown below) treats typed literals as shorthand for strings: typed values are converted in the usual Java way to strings and these strings are stored in the Model. For example, try (noting that for simple literals, we can omit the model.createLiteral(...) call):\n// create the resource Resource r = model.createResource(); // add the property r.addProperty(RDFS.label, \u0026#34;11\u0026#34;) .addProperty(RDFS.label, 11); // write out the Model model.write(system.out, \u0026#34;N-TRIPLE\u0026#34;); The output produced is:\n_:A... \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; \u0026#34;11\u0026#34; . Since both literals are really just the string \"11\", then only one statement is added.\nThe RDFCore WG has defined mechanisms for supporting datatypes in RDF. Jena supports these using the typed literal mechanisms; they are not discussed in this tutorial.\nGlossary Blank Node Represents a resource, but does not indicate a URI for the resource. Blank nodes act like existentially qualified variables in first order logic. Dublin Core A standard for metadata about web resources. Further information can be found at the Dublin Core web site. Literal A string of characters which can be the value of a property. Object The part of a triple which is the value of the statement. Predicate The property part of a triple. Property A property is an attribute of a resource. For example DC.title is a property, as is RDF.type. Resource Some entity. It could be a web resource such as web page, or it could be a concrete physical thing such as a tree or a car. It could be an abstract idea such as chess or football. Resources are named by URI's. Statement An arc in an RDF Model, normally interpreted as a fact. Subject The resource which is the source of an arc in an RDF Model Triple A structure containing a subject, a predicate and an object. 
Another term for a statement. Footnotes The identifier of an RDF resource can include a fragment identifier, e.g. http://hostname/rdf/tutorial/#ch-Introduction, so, strictly speaking, an RDF resource is identified by a URI reference. As well as being a string of characters, literals also have an optional language encoding to represent the language of the string. For example the literal \"two\" might have a language encoding of \"en\" for English and the literal \"deux\" might have a language encoding of \"fr\" for France. ","permalink":"https://jena.apache.org/tutorials/rdf_api.html","tags":null,"title":"An Introduction to RDF and the Jena RDF API"},{"categories":null,"contents":"This section contains detailed information about the various Jena sub-systems, aimed at developers using Jena. For more general introductions, please refer to the Getting started and Tutorial sections.\nDocumentation index The RDF API - the core RDF API in Jena SPARQL - querying and updating RDF models using the SPARQL standards Fuseki - SPARQL server which can present RDF data and answer SPARQL queries over HTTP I/O - reading and writing RDF data RDF Connection - a SPARQL API for local datasets and remote services Assembler - describing recipes for constructing Jena models declaratively using RDF Inference - using the Jena rules engine and other inference algorithms to derive consequences from RDF models Ontology - support for handling OWL models in Jena Data and RDFS - apply RDFS to graphs in a dataset TDB2 - a fast persistent triple store that stores directly to disk TDB - Original TDB database SHACL - SHACL processor for Jena ShEx - ShEx processor for Jena Text Search - enhanced indexes using Lucene for more efficient searching of text literals in Jena models and datasets. GeoSPARQL - support for GeoSPARQL Permissions - a permissions wrapper around Jena RDF implementation JDBC - a SPARQL over JDBC driver framework Tools - various command-line tools and utilities to help developers manage RDF data and other aspects of Jena How-To\u0026rsquo;s - various topic-specific how-to documents QueryBuilder - Classes to simplify the programmatic building of various query and update statements. Extras - various modules that provide utilities and larger packages that make Apache Jena development or usage easier but that do not fall within the standard Jena framework. Javadoc - JavaDoc generated from the Jena source ","permalink":"https://jena.apache.org/documentation/","tags":null,"title":"Apache Jena documentation overview"},{"categories":null,"contents":" The Jena Elephas module has been retired. The last release of Jena with Elephas is Jena 3.17.0. See jena-elephas/README.md. The original documentation.\n","permalink":"https://jena.apache.org/documentation/archive/hadoop/","tags":null,"title":"Apache Jena Elephas"},{"categories":null,"contents":"Apache Jena Elephas is a set of libraries which provide various basic building blocks which enable you to start writing Apache Hadoop based applications which work with RDF data.\nHistorically there has been no serious support for RDF within the Hadoop ecosystem and what support has existed has often been limited and task specific. 
These libraries aim to be as generic as possible and provide the necessary infrastructure that enables developers to create their application specific logic without worrying about the underlying plumbing.\nBeta These modules are currently considered to be in a Beta state, they have been under active development for about a year but have not yet been widely deployed and may contain as yet undiscovered bugs.\nPlease see the How to Report a Bug page for how to report any bugs you may encounter.\nDocumentation Overview Getting Started APIs Common IO Map/Reduce Javadoc Examples RDF Stats Demo Maven Artifacts Overview Apache Jena Elephas is published as a set of Maven module via its maven artifacts. The source for these libraries may be downloaded as part of the source distribution. These modules are built against the Hadoop 2.x. APIs and no backwards compatibility for 1.x is provided.\nThe core aim of these libraries it to provide the basic building blocks that allow users to start writing Hadoop applications that work with RDF. They are mostly fairly low level components but they are designed to be used as building blocks to help users and developers focus on actual application logic rather than on the low level plumbing.\nFirstly at the lowest level they provide Writable implementations that allow the basic RDF primitives - nodes, triples and quads - to be represented and exchanged within Hadoop applications, this support is provided by the Common library.\nSecondly they provide support for all the RDF serialisations which Jena supports as both input and output formats subject to the specific limitations of those serialisations. This support is provided by the IO library in the form of standard InputFormat and OutputFormat implementations.\nThere are also a set of basic Mapper and Reducer implementations provided by the Map/Reduce library which contains code that enables various common Hadoop tasks such as counting, filtering, splitting and grouping to be carried out on RDF data. Typically these will be used as a starting point to build more complex RDF processing applications.\nFinally there is a RDF Stats Demo which is a runnable Hadoop job JAR file that demonstrates using these libraries to calculate a number of basic statistics over arbitrary RDF data.\nGetting Started To get started you will need to add the relevant dependencies to your project, the exact dependencies necessary will depend on what you are trying to do. Typically you will likely need at least the IO library and possibly the Map/Reduce library:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-io\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-mapreduce\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Our libraries depend on the relevant Hadoop libraries but since these libraries are typically provided by the Hadoop cluster those dependencies are marked as provided and thus are not transitive. 
This means that you will typically also need to add the following additional dependencies:\n\u0026lt;!-- Hadoop Dependencies --\u0026gt; \u0026lt;!-- Note these will be provided on the Hadoop cluster hence the provided scope --\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-mapreduce-client-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; You can then write code to launch a Map/Reduce job that works with RDF. For example let us consider a RDF variation of the classic Hadoop word count example. In this example which we call node count we do the following:\nTake in some RDF triples Split them up into their constituent nodes i.e. the URIs, Blank Nodes \u0026amp; Literals Assign an initial count of one to each node Group by node and sum up the counts Output the nodes and their usage counts We will start with our Mapper implementation, as you can see this simply takes in a triple and splits it into its constituent nodes. It then outputs each node with an initial count of 1:\npackage org.apache.jena.hadoop.rdf.mapreduce.count; import org.apache.jena.hadoop.rdf.types.NodeWritable; import org.apache.jena.hadoop.rdf.types.TripleWritable; import org.apache.jena.graph.Triple; /** * A mapper for counting node usages within triples designed primarily for use * in conjunction with {@link NodeCountReducer} * * @param \u0026lt;TKey\u0026gt; Key type */ public class TripleNodeCountMapper\u0026lt;TKey\u0026gt; extends AbstractNodeTupleNodeCountMapper\u0026lt;TKey, Triple, TripleWritable\u0026gt; { @Override protected NodeWritable[] getNodes(TripleWritable tuple) { Triple t = tuple.get(); return new NodeWritable[] { new NodeWritable(t.getSubject()), new NodeWritable(t.getPredicate()), new NodeWritable(t.getObject()) }; } } And then our Reducer implementation, this takes in the data grouped by node and sums up the counts outputting the node and the final count:\npackage org.apache.jena.hadoop.rdf.mapreduce.count; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.mapreduce.Reducer; import org.apache.jena.hadoop.rdf.types.NodeWritable; /** * A reducer which takes node keys with a sequence of longs representing counts * as the values and sums the counts together into pairs consisting of a node * key and a count value. */ public class NodeCountReducer extends Reducer\u0026lt;NodeWritable, LongWritable, NodeWritable, LongWritable\u0026gt; { @Override protected void reduce(NodeWritable key, Iterable\u0026lt;LongWritable\u0026gt; values, Context context) throws IOException, InterruptedException { long count = 0; Iterator\u0026lt;LongWritable\u0026gt; iter = values.iterator(); while (iter.hasNext()) { count += iter.next().get(); } context.write(key, new LongWritable(count)); } } Finally we then need to define an actual Hadoop job we can submit to run this. 
Here we take advantage of the IO library to provide us with support for our desired RDF input format:\npackage org.apache.jena.hadoop.rdf.stats; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.jena.hadoop.rdf.io.input.TriplesInputFormat; import org.apache.jena.hadoop.rdf.io.output.ntriples.NTriplesNodeOutputFormat; import org.apache.jena.hadoop.rdf.mapreduce.count.NodeCountReducer; import org.apache.jena.hadoop.rdf.mapreduce.count.TripleNodeCountMapper; import org.apache.jena.hadoop.rdf.types.NodeWritable; public class RdfMapReduceExample { public static void main(String[] args) { try { // Get Hadoop configuration Configuration config = new Configuration(true); // Create job Job job = Job.getInstance(config); job.setJarByClass(RdfMapReduceExample.class); job.setJobName(\u0026quot;RDF Triples Node Usage Count\u0026quot;); // Map/Reduce classes job.setMapperClass(TripleNodeCountMapper.class); job.setMapOutputKeyClass(NodeWritable.class); job.setMapOutputValueClass(LongWritable.class); job.setReducerClass(NodeCountReducer.class); // Input and Output job.setInputFormatClass(TriplesInputFormat.class); job.setOutputFormatClass(NTriplesNodeOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(\u0026quot;/example/input/\u0026quot;)); FileOutputFormat.setOutputPath(job, new Path(\u0026quot;/example/output/\u0026quot;)); // Launch the job and await completion job.submit(); if (job.monitorAndPrintJob()) { // OK System.out.println(\u0026quot;Completed\u0026quot;); } else { // Failed System.err.println(\u0026quot;Failed\u0026quot;); } } catch (Throwable e) { e.printStackTrace(); } } } So this really is no different from configuring any other Hadoop job, we simply have to point to the relevant input and output formats and provide our mapper and reducer. Note that here we use the TriplesInputFormat which can handle RDF in any Jena supported format, if you know your RDF is in a specific format it is usually more efficient to use a more specific input format. Please see the IO page for more detail on the available input formats and the differences between them.\nWe recommend that you next take a look at our RDF Stats Demo which shows how to do some more complex computations by chaining multiple jobs together.\nAPIs There are three main libraries each with their own API:\nCommon - this provides the basic data model for representing RDF data within Hadoop IO - this provides support for reading and writing RDF Map/Reduce - this provides support for writing Map/Reduce jobs that work with RDF ","permalink":"https://jena.apache.org/documentation/archive/hadoop/elephas_index.html","tags":null,"title":"Apache Jena Elephas"},{"categories":null,"contents":"The Common API provides the basic data model for representing RDF data within Apache Hadoop applications. 
This primarily takes the form of Writable implementations and the necessary machinery to efficiently serialise and deserialise these.
Currently we represent the three main RDF primitives - Nodes, Triples and Quads - though in future a wider range of primitives may be supported if we receive contributions to implement them.
RDF Primitives
Nodes
The Writable type for nodes is predictably enough called NodeWritable and it implements the WritableComparable interface, which means it can be used as both a key and/or value in Map/Reduce. In standard Hadoop style a get() method returns the actual value as a Jena Node instance while a corresponding set() method allows the value to be set. Conveying null values is acceptable and fully supported.
Note that nodes are lazily converted to and from the underlying binary representation so there is minimal overhead if you create a NodeWritable instance that does not actually ever get read/written.
NodeWritable supports and automatically registers itself for Hadoop's WritableComparator mechanism, which allows it to provide high-efficiency binary comparisons on nodes; this helps reduce phases run faster by avoiding unnecessary deserialisation into POJOs.
However the downside of this is that the sort order for nodes may not be as natural as the sort order using POJOs or when sorting with SPARQL. Ultimately this is a performance trade-off and in our experiments the benefits far outweigh the lack of a more natural sort order.
You simply use it as follows:
NodeWritable nw = new NodeWritable();
// Set the value
nw.set(NodeFactory.createURI("http://example.org"));
// Get the value (remember this may be null)
Node value = nw.get();
Triples
Again, the Writable type for triples is simply called TripleWritable and it also implements the WritableComparable interface, meaning it may be used as both a key and/or value. Again the standard Hadoop conventions of a get() and set() method to get/set the value as a Jena Triple are followed. Unlike NodeWritable, this class does not support conveying null values.
Like the other primitives it is lazily converted to and from the underlying binary representations and it also supports & registers itself for Hadoop's WritableComparator mechanism.
Quads
Similarly, the Writable type for quads is simply called QuadWritable and it implements the WritableComparable interface, making it usable as both a key and/or value. As per the other primitives, standard Hadoop conventions of a get() and set() method are provided to get/set the value as a Jena Quad. Unlike NodeWritable, this class does not support conveying null values.
Like the other primitives it is lazily converted to and from the underlying binary representations and it also supports & registers itself for Hadoop's WritableComparator mechanism.
Arbitrary sized tuples
In some cases you may have data that is RDF-like but not itself RDF, or that is a mix of triples and quads, in which case you may wish to use the NodeTupleWritable. This is used to represent an arbitrarily sized tuple consisting of zero or more Node instances; there is no restriction on the number of nodes per tuple and no requirement that tuple data be uniform in size.
Like the other primitives it implements WritableComparable so can be used as a key and/or value.
However this primitive does not support binary comparisons meaning it may not perform as well as using the other primitives.\nIn this case the get() and set() methods get/set a Tuple\u0026lt;Node\u0026gt; instance which is a convenience container class provided by ARQ. Currently the implementation does not support lazy conversion so the full Tuple\u0026lt;Node\u0026gt; is reconstructed as soon as an NodeTupleWritable instance is deserialised.\n","permalink":"https://jena.apache.org/documentation/archive/hadoop/common.html","tags":null,"title":"Apache Jena Elephas - Common API"},{"categories":null,"contents":"The IO API provides support for reading and writing RDF within Apache Hadoop applications. This is done by providing InputFormat and OutputFormat implementations that cover all the RDF serialisations that Jena supports.\n{% toc %}\nBackground on Hadoop IO If you are already familiar with the Hadoop IO paradigm then please skip this section, if not please read as otherwise some of the later information will not make much sense.\nHadoop applications and particularly Map/Reduce exploit horizontally scalability by dividing input data up into splits where each split represents a portion of the input data that can be read in isolation from the other pieces. This isolation property is very important to understand, if a file format requires that the entire file be read sequentially in order to properly interpret it then it cannot be split and must be read as a whole.\nTherefore depending on the file formats used for your input data you may not get as much parallel performance because Hadoop\u0026rsquo;s ability to split the input data may be limited.\nIn some cases there are file formats that may be processed in multiple ways i.e. you can split them into pieces or you can process them as a whole. Which approach you wish to use will depend on whether you have a single file to process or many files to process. In the case of many files processing files as a whole may provide better overall throughput than processing them as chunks. However your mileage may vary especially if your input data has many files of uneven size.\nCompressed IO Hadoop natively provides support for compressed input and output providing your Hadoop cluster is appropriately configured. The advantage of compressing the input/output data is that it means there is less IO workload on the cluster however this comes with the disadvantage that most compression formats block Hadoop\u0026rsquo;s ability to split up the input.\nHadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary. However in order to use this your Hadoop cluster/job configuration must be appropriately configured to inform Hadoop about what compression codecs are in use.\nFor example to enable BZip2 compression (assuming your cluster doesn\u0026rsquo;t enable this by default):\n// Assumes you already have a Configuration object you are preparing // in the variable config config.set(HadoopIOConstants.IO_COMPRESSION_CODECS, BZip2Codec.class.getCanonicalName()); See the Javadocs for the Hadoop CompressionCodec API to see the available out of the box implementations. Note that some clusters may provide additional compression codecs beyond those built directly into Hadoop.\nRDF IO in Hadoop There are a wide range of RDF serialisations supported by ARQ, please see the RDF IO for an overview of the formats that Jena supports. 
In this section we go into a lot more depth of how exactly we support RDF IO in Hadoop.\nInput One of the difficulties posed when wrapping these for Hadoop IO is that the formats have very different properties in terms of our ability to split them into distinct chunks for Hadoop to process. So we categorise the possible ways to process RDF inputs as follows:\nLine Based - Each line of the input is processed as a single line Batch Based - The input is processed in batches of N lines (where N is configurable) Whole File - The input is processed as a whole There is then also the question of whether a serialisation encodes triples, quads or can encode both. Where a serialisation encodes both we provide two variants of it so you can choose whether you want to process it as triples/quads.\nBlank Nodes in Input Note that readers familiar with RDF may be wondering how we cope with blank nodes when splitting input and this is an important issue to address.\nEssentially Jena contains functionality that allows it to predictably generate identifiers from the original identifier present in the file e.g. _:blank. This means that wherever _:blank appears in the original file we are guaranteed to assign it the same internal identifier. Note that this functionality uses a seed value to ensure that blank nodes coming from different input files are not assigned the same identifier.\nWhen used with Hadoop this seed is chosen based on a combination of the Job ID and the input file path. This means that the same file processed by different jobs will produce different blank node identifiers each time. However within a job every read of the file will predictably generate blank node identifiers so splitting does not prevent correct blank node identification.\nAdditionally the binary serialisation we use for our RDF primitives (described on the Common API) page guarantees that internal identifiers are preserved as-is when communicating values across the cluster.\nMixed Inputs In many cases your input data may be in a variety of different RDF formats in which case we have you covered. The TriplesInputFormat, QuadsInputFormat and TriplesOrQuadsInputFormat can handle mixture of triples/quads/both triples \u0026amp; quads as desired. Note that in the case of TriplesOrQuadsInputFormat any triples are up-cast into quads in the default graph.\nWith mixed inputs the specific input format to use for each is determined based on the file extensions of each input file, unrecognised extensions will result in an IOException. Compression is handled automatically you simply need to name your files appropriately to indicate the type of compression used e.g. example.ttl.gz would be treated as GZipped Turtle, if you\u0026rsquo;ve used a decent compression tool it should have done this for you. The downside of mixed inputs is that it decides quite late what the input format is which means that it always processes inputs as whole files because it doesn\u0026rsquo;t decide on the format until after it has been asked to split the inputs.\nOutput As with input we also need to be careful about how we output RDF data. Similar to input some serialisations can be output in a streaming fashion while other serialisations require us to store up all the data and then write it out in one go at the end. 
We use the same categorisations for output though the meanings are slightly different:\nLine Based - Each record is written as soon as it is received Batch Based - Records are cached until N records are seen or the end of output and then the current batch is output (where N is configurable) Whole File - Records are cached until the end of output and then the entire output is written in one go However both the batch based and whole file approaches have the downside that it is possible to exhaust memory if you have large amounts of output to process (or set the batch size too high for batch based output).\nBlank Nodes in Output As with input blank nodes provide a complicating factor in producing RDF output. For whole file output formats this is not an issue but it does need to be considered for line and batch based formats.\nHowever what we have found in practise is that the Jena writers will predictably map internal identifiers to the blank node identifiers in the output serialisations. What this means is that even when processing output in batches we\u0026rsquo;ve found that using the line/batch based formats correctly preserve blank node identity.\nIf you are concerned about potential data corruption as a result of this then you should make sure to always choose a whole file output format but be aware that this can exhaust memory if your output is large.\nBlank Node Divergence in multi-stage pipelines The other thing to consider with regards to blank nodes in output is that Hadoop will by default create multiple output files (one for each reducer) so even if consistent and valid blank nodes are output they may be spread over multiple files.\nIn multi-stage pipelines you may need to manually concatenate these files back together (assuming they are in a format that allows this e.g. NTriples) as otherwise when you pass them as input to the next job the blank node identifiers will diverge from each other. JENA-820 discusses this problem and introduces a special configuration setting that can be used to resolve this. Note that even with this setting enabled some formats are not capable of respecting it, see the later section on Job Configuration Options for more details.\nAn alternative workaround is to always use RDF Thrift as the intermediate output format since it preserves blank node identifiers precisely as they are seen. This also has the advantage that RDF Thrift is extremely fast to read and write which can speed up multi-stage pipelines considerably.\nNode Output Format We also include a special NTriplesNodeOutputFormat which is capable of outputting pairs composed of a NodeWritable key and any value type. Think of this as being similar to the standard Hadoop TextOutputFormat except it understands how to format nodes as valid NTriples serialisation. This format is useful when performing simple statistical analysis such as node usage counts or other calculations over nodes.\nIn the case where the value of the key value pair is also a RDF primitive proper NTriples formatting is also applied to each of the nodes in the value\nRDF Serialisation Support Input The following table categorises how each supported RDF serialisation is processed for input. 
Note that in some cases we offer multiple ways to process a serialisation.
RDF Serialisation | Line Based | Batch Based | Whole File
Triple Formats
NTriples | Yes | Yes | Yes
Turtle | No | No | Yes
RDF/XML | No | No | Yes
RDF/JSON | No | No | Yes
Quad Formats
NQuads | Yes | Yes | Yes
TriG | No | No | Yes
TriX | No | No | Yes
Triple/Quad Formats
JSON-LD | No | No | Yes
RDF Thrift | No | No | Yes
Output
The following table categorises how each supported RDF serialisation can be processed for output. As with input, some serialisations may be processed in multiple ways.
RDF Serialisation | Line Based | Batch Based | Whole File
Triple Formats
NTriples | Yes | No | No
Turtle | Yes | Yes | No
RDF/XML | No | No | Yes
RDF/JSON | No | No | Yes
Quad Formats
NQuads | Yes | No | No
TriG | Yes | Yes | No
TriX | Yes | No | No
Triple/Quad Formats
JSON-LD | No | No | Yes
RDF Thrift | Yes | No | No
Job Setup
To use RDF as an input and/or output format you will need to configure your Job appropriately; this requires setting the input/output format and setting the data paths:
// Create a job using default configuration
Job job = Job.getInstance(new Configuration(true));
// Use Turtle as the input format
job.setInputFormatClass(TurtleInputFormat.class);
FileInputFormat.setInputPaths(job, new Path("/users/example/input"));
// Use NTriples as the output format
job.setOutputFormatClass(NTriplesOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/users/example/output"));
// Other job configuration...
This example takes in input in Turtle format from the directory /users/example/input and outputs the end results in NTriples in the directory /users/example/output.
Take a look at the Javadocs to find the actual available input and output format implementations.
Job Configuration Options
There are several useful configuration options that can be used to tweak the behaviour of the RDF IO functionality if desired.
Input Lines per Batch
Since our line based input formats use the standard Hadoop NLineInputFormat to decide how to split up inputs, we support the standard mapreduce.input.lineinputformat.linespermap configuration setting for changing the number of lines processed per map.
You can set this directly in your configuration:
job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 100);
Or you can use the convenience method of NLineInputFormat like so:
NLineInputFormat.setNumLinesPerMap(job, 100);
Max Line Length
When using line based inputs it may be desirable to ignore lines that exceed a certain length (for example if you are not interested in really long literals). Again we use the standard Hadoop configuration setting mapreduce.input.linerecordreader.line.maxlength to control this behaviour:
job.getConfiguration().setInt(HadoopIOConstants.MAX_LINE_LENGTH, 8192);
Ignoring Bad Tuples
In many cases you may have data that you know contains invalid tuples; in such cases it can be useful to just ignore the bad tuples and continue. By default we enable this behaviour and will skip over bad tuples, though they will be logged as an error. If you want you can disable this behaviour by setting the rdf.io.input.ignore-bad-tuples configuration setting:
job.getConfiguration().setBoolean(RdfIOConstants.INPUT_IGNORE_BAD_TUPLES, false);
Global Blank Node Identity
The default behaviour of this library is to allocate file scoped blank node identifiers in such a way that the same syntactic identifier read from the same file is allocated the same blank node ID even across input splits within a job.
Conversely the same syntactic identifier in different input files will result in different blank nodes within a job.\nHowever as discussed earlier in the case of multi-stage jobs the intermediate outputs may be split over several files which can cause the blank node identifiers to diverge from each other when they are read back in by subsequent jobs. For multi-stage jobs this is often (but not always) incorrect and undesirable behaviour in which case you will need to set the rdf.io.input.bnodes.global-identity property to true for the subsequent jobs:\njob.getConfiguration.setBoolean(RdfIOConstants.GLOBAL_BNODE_IDENTITY, true); Important - This should only be set for the later jobs in a multi-stage pipeline and should rarely (if ever) be set for single jobs or the first job of a pipeline.\nEven with this setting enabled not all formats are capable of honouring this option, RDF/XML and JSON-LD will ignore this option and should be avoided as intermediate output formats.\nAs noted earlier an alternative workaround to enabling this setting is to instead use RDF Thrift as the intermediate output format since it guarantees to preserve blank node identifiers as-is on both reads and writes.\nOutput Batch Size The batch size for batched output formats can be controlled by setting the rdf.io.output.batch-size property as desired. The default value for this if not explicitly configured is 10,000:\njob.getConfiguration.setInt(RdfIOConstants.OUTPUT_BATCH_SIZE, 25000); ","permalink":"https://jena.apache.org/documentation/archive/hadoop/io.html","tags":null,"title":"Apache Jena Elephas - IO API"},{"categories":null,"contents":"The Map/Reduce API provides a range of building block Mapper and Reducer implementations that can be used as a starting point for building Map/Reduce applications that process RDF. Typically more complex applications will need to implement their own variants but these basic ones may still prove useful as part of a larger pipeline.\n{% toc %}\nTasks The API is divided based upon implementations that support various common Hadoop tasks with appropriate Mapper and Reducer implementations provided for each. In most cases these are implemented to be at least partially abstract to make it easy to implement customised versions of these.\nThe following common tasks are supported:\nCounting Filtering Grouping Splitting Transforming Note that standard Map/Reduce programming rules apply as normal. For example if a mapper/reducer transforms between data types then you need to make setMapOutputKeyClass(), setMapOutputValueClass(), setOutputKeyClass() and setOutputValueClass() calls on your Job configuration as necessary.\nCounting Counting is one of the classic Map/Reduce tasks and features as both the official Map/Reduce example for both Hadoop itself and for Elephas. Implementations cover a number of different counting tasks that you might want to carry out upon RDF data, in most cases you will use the desired Mapper implementation in conjunction with the NodeCountReducer.\nNode Usage The simplest type of counting supported is to count the usages of individual RDF nodes within the triples/quads. Depending on whether your data is triples/quads you can use either the TripleNodeCountMapper or the QuadNodeCountMapper.\nIf you want to count only usages of RDF nodes in a specific position then we also provide variants for that, for example TripleSubjectCountMapper counts only RDF nodes present in the subject position. 
You can substitute Predicate or Object into the class name in place of Subject if you prefer to count just RDF nodes in the predicate/object position instead. Similarly replace Triple with Quad if you wish to count usage of RDF nodes in specific positions of quads, an additional QuadGraphCountMapper if you want to calculate the size of graphs.\nLiteral Data Types Another interesting variant of counting is to count the usage of literal data types, you can use the TripleDataTypeCountMapper or QuadDataTypeCountMapper if you want to do this.\nNamespaces Finally you may be interested in the usage of namespaces within your data, in this case the TripleNamespaceCountMapper or QuadNamespaceCountMapper can be used to do this. For this use case you should use the TextCountReducer to total up the counts for each namespace. Note that the mappers determine the namespace for a URI simply by splitting after the last # or / in the URI, if no such character exists then the full URI is considered to be the namespace.\nFiltering Filtering is another classic Map/Reduce use case, here you want to take the data and extract only the portions that you are interested in based on some criteria. All our filter Mapper implementations also support a Job configuration option named rdf.mapreduce.filter.invert allowing their effects to be inverted if desired e.g.\nconfig.setBoolean(RdfMapReduceConstants.FILTER_INVERT, true); Valid Data One type of filter that may be useful particularly if you are generating RDF data that may not be strict RDF is the ValidTripleFilterMapper and the ValidQuadFilterMapper. These filters only keep triples/quads that are valid according to strict RDF semantics i.e.\nSubject can only be URI/Blank Node Predicate can only be a URI Object can be a URI/Blank Node/Literal Graph can only be a URI or Blank Node If you wanted to extract only the bad data e.g. for debugging then you can of course invert these filters by setting rdf.mapreduce.filter.invert to true as shown above.\nGround Data In some cases you may only be interesting in triples/quads that are grounded i.e. don\u0026rsquo;t contain blank nodes in which case the GroundTripleFilterMapper and GroundQuadFilterMapper can be used.\nData with a specific URI In lots of case you may want to extract only data where a specific URI occurs in a specific position, for example if you wanted to extract all the rdf:type declarations then you might want to use the TripleFilterByPredicateUriMapper or QuadFilterByPredicateUriMapper as appropriate. The job configuration option rdf.mapreduce.filter.predicate.uris is used to provide a comma separated list of the full URIs you want the filter to accept e.g.\nconfig.set(RdfMapReduceConstants.FILTER_PREDICATE_URIS, \u0026quot;http://example.org/predicate,http://another.org/predicate\u0026quot;); Similar to the counting of node usage you can substitute Predicate for Subject, Object or Graph as desired. You will also need to do this in the job configuration option, for example to filter on subject URIs in quads use the QuadFilterBySubjectUriMapper and the rdf.mapreduce.filter.subject.uris configuration option e.g.\nconfig.set(RdfMapReduceConstants.FILTER_SUBJECT_URIS, \u0026quot;http://example.org/myInstance\u0026quot;); Grouping Grouping is again another frequent Map/Reduce use case, here we provide implementations that allow you to group triples or quads by a specific RDF node within the triples/quads e.g. by subject. 
For example to group quads by predicate use the QuadGroupByPredicateMapper, similar to filtering and counting you can substitute Predicate for Subject, Object or Graph if you wish to group by another node of the triple/quad.\nSplitting Splitting allows you to split triples/quads up into the constituent RDF nodes, we provide two kinds of splitting:\nTo Nodes - Splits pairs of arbitrary keys with triple/quad values into several pairs of the key with the nodes as the values With Nodes - Splits pairs of arbitrary keys with triple/quad values keeping the triple/quad as the key and the nodes as the values. Transforming Transforming provides some very simple implementations that allow you to convert between triples and quads. For the lossy case of going from quads to triples simply use the QuadsToTriplesMapper.\nIf you want to go the other way - triples to quads - this requires adding a graph field to each triple and we provide two implementations that do that. Firstly there is TriplesToQuadsBySubjectMapper which puts each triple into a graph based on its subject i.e. all triples with a common subject go into a graph named for the subject. Secondly there is TriplesToQuadsConstantGraphMapper which simply puts all triples into the default graph, if you wish to change the target graph you should extend this class. If you wanted to select the graph to use based on some arbitrary criteria you should look at extending the AbstractTriplesToQuadsMapper instead.\nExample Jobs Node Count The following example shows how to configure a job which performs a node count i.e. counts the usages of RDF terms (aka nodes in Jena parlance) within the data:\n// Assumes we have already created a Hadoop Configuration // and stored it in the variable config Job job = Job.getInstance(config); // This is necessary as otherwise Hadoop won't ship the JAR to all // nodes and you'll get ClassDefNotFound and similar errors job.setJarByClass(Example.class); // Give our job a friendly name job.setJobName(\u0026quot;RDF Triples Node Usage Count\u0026quot;); // Mapper class // Since the output type is different from the input type have to declare // our output types job.setMapperClass(TripleNodeCountMapper.class); job.setMapOutputKeyClass(NodeWritable.class); job.setMapOutputValueClass(LongWritable.class); // Reducer class job.setReducerClass(NodeCountReducer.class); // Input // TriplesInputFormat accepts any RDF triples serialisation job.setInputFormatClass(TriplesInputFormat.class); // Output // NTriplesNodeOutputFormat produces lines consisting of a Node formatted // according to the NTriples spec and the value separated by a tab job.setOutputFormatClass(NTriplesNodeOutputFormat.class); // Set your input and output paths FileInputFormat.setInputPath(job, new Path(\u0026quot;/example/input\u0026quot;)); FileOutputFormat.setOutputPath(job, new Path(\u0026quot;/example/output\u0026quot;)); // Now run the job... ","permalink":"https://jena.apache.org/documentation/archive/hadoop/mapred.html","tags":null,"title":"Apache Jena Elephas - Map/Reduce API"},{"categories":null,"contents":"The RDF Stats Demo is a pre-built application available as a ready to run Hadoop Job JAR with all dependencies embedded within it. 
The demo app uses the other libraries to allow calculating a number of basic statistics over any RDF data supported by Elephas.\nTo use it you will first need to build it from source or download the relevant Maven artefact:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-stats\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;classifier\u0026gt;hadoop-job\u0026lt;/classifier\u0026gt; \u0026lt;/dependency\u0026gt; Where x.y.z is the desired version.\nPre-requisites In order to run this demo you will need to have a Hadoop 2.x cluster available, for simple experimentation purposes a single node cluster will be sufficient.\nRunning Assuming your cluster is started and running and the hadoop command is available on your path you can run the application without any arguments to see help:\n\u0026gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats NAME hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats - A command which computes statistics on RDF data using Hadoop SYNOPSIS hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats [ {-a | --all} ] [ {-d | --data-types} ] [ {-g | --graph-sizes} ] [ {-h | --help} ] [ --input-type \u0026lt;inputType\u0026gt; ] [ {-n | --node-count} ] [ --namespaces ] {-o | --output} \u0026lt;OutputPath\u0026gt; [ {-t | --type-count} ] [--] \u0026lt;InputPath\u0026gt;... OPTIONS -a, --all Requests that all available statistics be calculated -d, --data-types Requests that literal data type usage counts be calculated -g, --graph-sizes Requests that the size of each named graph be counted -h, --help Display help information --input-type \u0026lt;inputType\u0026gt; Specifies whether the input data is a mixture of quads and triples, just quads or just triples. Using the most specific data type will yield the most accurate statistics This options value is restricted to the following value(s): mixed quads triples -n, --node-count Requests that node usage counts be calculated --namespaces Requests that namespace usage counts be calculated -o \u0026lt;OutputPath\u0026gt;, --output \u0026lt;OutputPath\u0026gt; Sets the output path -t, --type-count Requests that rdf:type usage counts be calculated -- This option can be used to separate command-line options from the list of argument, (useful when arguments might be mistaken for command-line options) \u0026lt;InputPath\u0026gt; Sets the input path(s) If we wanted to calculate the node count on some data we could do the following:\n\u0026gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats --node-count --output /example/output /example/input This calculates the node counts for the input data found in /example/input placing the generated counts in /example/output\nSpecifying Inputs and Outputs Inputs are specified simply by providing one or more paths to the data you wish to analyse. You can provide directory paths in which case all files within the directory will be processed.\nTo specify the output location use the -o or --output option followed by the desired output path.\nBy default the demo application assumes a mixture of quads and triples data, if you know your data is only in triples/quads then you can use the --input-type argument followed by triples or quads to indicate the type of your data. 
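For instance, to run the earlier node count over a directory known to contain only triple-based data, you could add --input-type triples to the invocation (a hypothetical example following the syntax shown in the help text above):
hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats --node-count --input-type triples --output /example/output /example/input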
Not doing this can skew some statistics as the default is to assume mixed data and so all triples are upgraded into quads when calculating the statistics.\nAvailable Statistics The following statistics are available and are activated by the relevant command line option:\nCommand Line OptionStatisticDescription \u0026 Notes -n or --node-countNode CountCounts the occurrences of each unique RDF term i.e. node in Jena parlance -t or --type-countType CountCounts the occurrences of each declared rdf:type value -d or --data-typesData Type CountCounts the occurrences of each declared literal data type --namespacesNamespace CountsCounts the occurrences of namespaces within the data.\nNamespaces are determined by splitting URIs at the # fragment separator if present and if not the last / character -g or --graph-sizesGraph SizesCounts the sizes of each graph declared in the data You can also use the -a or --all option if you simply wish to calculate all statistics.\n","permalink":"https://jena.apache.org/documentation/archive/hadoop/demo.html","tags":null,"title":"Apache Jena Elephas - RDF Stats Demo"},{"categories":null,"contents":"Apache Jena Fuseki is a SPARQL server. It can run as an operating system service, as a Java web application (WAR file), and as a standalone server.\nFuseki comes in two forms, a single system \u0026ldquo;webapp\u0026rdquo;, combined with a UI for admin and query, and as \u0026ldquo;main\u0026rdquo;, a server suitable to run as part of a larger deployment, including with Docker or running embedded. Both forms use the same core protocol engine and same configuration file format.\nFuseki provides the SPARQL 1.1 protocols for query and update as well as the SPARQL Graph Store protocol.\nFuseki is tightly integrated with TDB to provide a robust, transactional persistent storage layer, and incorporates Jena text query.\nContents Download with UI Getting Started Running Fuseki with UI As a standalone server with UI As a service As a web application Security with Apache Shiro Running Fuseki Server Setup As a Docker container As an embedded SPARQL server Security and data access control Logging Fuseki Configuration Server Statistics and Metrics How to Contribute Client access Use from Java SPARQL Over HTTP - scripts to help with data management. Extending Fuseki with Fuseki Modules Links to Standards The Jena users mailing is the place to get help with Fuseki.\nEmail support lists\nDownload Fuseki with UI Releases of Apache Jena Fuseki can be downloaded from one of the mirror sites:\nJena Downloads\nand previous releases are available from the archive. 
We strongly recommend that users use the latest official Apache releases of Jena Fuseki in preference to any older versions.
Fuseki download files
Filename | Description
apache-jena-fuseki-*VER*.zip | Fuseki with UI download
jena-fuseki-server | The Fuseki Main packaging
apache-jena-fuseki-*VER*.zip contains both a war file and an executable jar.
Fuseki Main is also available as a Maven artifact:
<dependency> <groupId>org.apache.jena</groupId> <artifactId>jena-fuseki-main</artifactId> <version>X.Y.Z</version> </dependency>
Previous releases
While previous releases are available, we strongly recommend that wherever possible users use the latest official Apache releases of Jena in preference to using any older versions of Jena.
Previous Apache Jena releases can be found in the Apache archive area at https://archive.apache.org/dist/jena
Development Builds
Regular development builds of all of Jena are available (these are not formal releases) from the Apache snapshots maven repository. This includes packaged builds of Fuseki.
Getting Started With Fuseki
The quick start section serves as a basic guide to getting a Fuseki server running on your local machine.
See all the ways to run Fuseki for complete coverage of all the deployment methods for Fuseki.
How to Contribute
We welcome contributions towards making Jena a better platform for semantic web and linked data applications. We appreciate feature suggestions, bug reports and patches for code or documentation.
See "Getting Involved" for ways to contribute to Jena and Fuseki, including patches and making GitHub pull requests.
Source code
The development codebase is available from git.
Development builds (not a formal release): SNAPSHOT
Source code: https://github.com/apache/jena/tree/main/jena-fuseki2
The Fuseki code is under "jena-fuseki2/":
Code | Purpose
jena-fuseki-core | The Fuseki engine. All SPARQL operations.
Fuseki/Main
jena-fuseki-main | Embedded server and command line
jena-fuseki-server | Build the combined jar for the Fuseki/main server
jena-fuseki-docker | Build a Docker container based on Fuseki/main
Webapp
jena-fuseki-webapp | Web application and command line startup
jena-fuseki-fulljar | Build the combined jar for the Fuseki/UI server
jena-fuseki-war | Build the war file for the Fuseki/UI server
apache-jena-fuseki | The download for Fuseki
Other
jena-fuseki-access | Data access control
jena-fuseki-geosparql | Integration for GeoSPARQL
","permalink":"https://jena.apache.org/documentation/fuseki2/","tags":null,"title":"Apache Jena Fuseki"},{"categories":null,"contents":"An implementation of the GeoSPARQL 1.0 standard for SPARQL query or API.
Integration with Fuseki is provided either by using the GeoSPARQL assembler or using the self-contained original jena-fuseki-geosparql. In either case, this page describes the GeoSPARQL supported features.
Getting Started
GeoSPARQL Jena can be accessed as a library using Maven etc. from Maven Central.
<dependency> <groupId>org.apache.jena</groupId> <artifactId>jena-geosparql</artifactId> <version>...</version> </dependency>
Features
This implementation follows the 11-052r4 OGC GeoSPARQL standard (https://www.ogc.org/standards/geosparql).
The implementation is pure Java and does not require any set-up or configuration of any third party relational databases and geospatial extensions.\nIt implements the six Conformance Classes described in the GeoSPARQL document:\nCore Topology Vocabulary Geometry Extension Geometry Topology RDFS Entailment Extension Query Rewrite Extension The WKT (as described in 11-052r4) and GML 2.0 Simple Features Profile (10-100r3) serialisations are supported. Additional serialisations can be implemented by extending the org.apache.jena.geosparql.implementation.datatype.GeometryDatatype and registering with Jena\u0026rsquo;s org.apache.jena.datatypes.TypeMapper.\nAll three spatial relation families are supported: Simple Feature, Egenhofer and RCC8.\nIndexing and caching of spatial objects and relations is performed on-demand during query execution. Therefore, set-up delays should be minimal. Spatial indexing is available based on the STRtree from the JTS library. The STRtree is readonly once built and contributions of a QuadTree implementation are welcome.\nBenchmarking of the implementation against Strabon and Parliament has found it to be comparable or quicker. The benchmarking used was the Geographical query and dataset (http://geographica.di.uoa.gr/).\nAdditional Features The following additional features are also provided:\nGeometry properties are automatically calculated and do not need to be asserted in the dataset. Conversion between EPSG spatial/coordinate reference systems is applied automatically. Therefore, mixed datasets or querying can be applied. This is reliance upon local installation of Apache SIS EPSG dataset, see Key Dependencies. Units of measure are automatically converted to the appropriate units for the coordinate reference system. Geometry, transformation and spatial relation results are stored in persistent and configurable time-limited caches to improve response times and reduce recalculations. Dataset conversion between serialisations and spatial/coordinate reference systems. Tabular data can also be loaded, see RDF Tables project (https://github.com/galbiston/rdf-tables). Functions to test Geometry properties directly on Geometry Literals have been included for convenience. SPARQL Query Configuration Using the library for SPARQL querying requires one line of code. All indexing and caching is performed during query execution and so there should be minimal delay during initialisation. This will register the Property Functions with ARQ query engine and configures the indexes used for time-limited caching.\nThere are three indexes which can be configured independently or switched off. These indexes retain data that may be required again when a query is being executed but may not be required between different queries. Therefore, the memory usage will grow during query execution and then recede as data is not re-used. All the indexes support concurrency and can be set to a maximum size or allowed to increase capacity as required.\nGeometry Literal: Geometry objects following de-serialisation from Geometry Literal. Geometry Transform: Geometry objects resulting from coordinate transformations between spatial reference systems. Query Rewrite: results of spatial relations between Feature and Geometry spatial objects. Testing has found up to 20% improvement in query completion durations using the indexes. 
The indexes can be configured by size, retention duration and frequency of clean up.\nBasic setup with default values: GeoSPARQLConfig.setupMemoryIndex()\nIndexes set to maximum sizes: GeoSPARQLConfig.setupMemoryIndexSize(50000, 50000, 50000)\nIndexes set to remove objects not used after 5 seconds: GeoSPARQLConfig.setupMemoryIndexExpiry(5000, 5000, 5000)\nNo indexes setup (Query rewrite still performed but results not stored) : GeoSPARQLConfig.setupNoIndex()\nNo indexes and no query rewriting: GeoSPARQLConfig.setupNoIndex(false)\nReset indexes and other stored data: GeoSPARQLConfig.reset()\nA variety of configuration methods are provided in org.apache.jena.geosparql.configuration.GeoSPARQLConfig. Caching of frequently used but small quantity data is also applied in several registries, e.g. coordinate reference systems and mathematical transformations.\nExample GeoSPARQL query:\nPREFIX geo: \u0026lt;http://www.opengis.net/ont/geosparql#\u0026gt; SELECT ?obj WHERE{ ?subj geo:sfContains ?obj } ORDER by ?obj Querying Datasets \u0026amp; Models with SPARQL The setup of GeoSPARQL Jena only needs to be performed once in an application. After it is set up querying is performed using Jena\u0026rsquo;s standard query methods.\nTo query a Model with GeoSPARQL or standard SPARQL:\nGeoSPARQLConfig.setupMemoryIndex(); Model model = .....; String query = ....; try (QueryExecution qe = QueryExecution.create(query, model)) { ResultSet rs = qe.execSelect(); ResultSetFormatter.outputAsTSV(rs); } If your dataset needs to be separate from your application and accessed over HTTP then you probably need the GeoSPARQL Assembler to integrate with Fuseki. The GeoSPARQL functionality needs to be setup in the application or Fuseki server where the dataset is located.\nIt is recommended that hasDefaultGeometry properties are included in the dataset to access all functionality. It is necessary that SpatialObject classes are asserted or inferred (i.e. a reasoner with the GeoSPARQL schema is applied) in the dataset. Methods to prepare a dataset can be found in org.apache.jena.geosparql.configuration.GeoSPARQLOperations.\nAPI The library can be used as an API in Java. The main class to handle geometries and their spatial relations is the GeometryWrapper. This can be obtained by parsing the string representation of a geometry using the appropriate datatype (e.g. WKT or GML). Alternatively, a Literal can be extracted automatically using the GeometryWrapper.extract() method and registered datatypes. The GeometryWrapperFactory can be used to directly construct a GeometryWrapper. 
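Putting these calls together, here is a minimal illustrative sketch (the WKT values are invented for the example, and the checked exceptions thrown by coordinate transformation and spatial relation methods are not handled):

GeometryWrapper lineString = WKTDatatype.INSTANCE.parse("LINESTRING(0 0, 10 10)");
GeometryWrapper point = WKTDatatype.INSTANCE.parse("POINT(5 6)");
// Convert to another spatial reference system if the geometries need to be compared in it.
GeometryWrapper converted = point.convertCRS("http://www.opengis.net/def/crs/EPSG/0/27700");
// Test a Simple Features spatial relation and a geometry property.
boolean isCrossing = lineString.crosses(point);
boolean isEmpty = point.isEmpty();

The individual operations used above are listed below.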
There is overlap between spatial relation families so repeated methods are not specified.\nParse a Geometry Literal: GeometryWrapper geometryWrapper = WKTDatatype.INSTANCE.parse(\u0026quot;POINT(1 1)\u0026quot;);\nExtract from a Jena Literal: GeometryWrapper geometryWrapper = GeometryWrapper.extract(geometryLiteral);\nCreate from a JTS Geometry: GeometryWrapper geometryWrapper = GeometryWrapperFactory.createGeometry(geometry, srsURI, geometryDatatypeURI);\nCreate from a JTS Point Geometry: GeometryWrapper geometryWrapper = GeometryWrapperFactory.createPoint(coordinate, srsURI, geometryDatatypeURI);\nConvert CRS/SRS: GeometryWrapper otherGeometryWrapper = geometryWrapper.convertCRS(\u0026quot;http://www.opengis.net/def/crs/EPSG/0/27700\u0026quot;)\nSpatial Relation: boolean isCrossing = geometryWrapper.crosses(otherGeometryWrapper);\nDE-9IM Intersection Pattern: boolean isRelated = geometryWrapper.relate(otherGeometryWrapper, \u0026quot;TFFFTFFFT\u0026quot;);\nGeometry Property: boolean isEmpty = geometryWrapper.isEmpty();\nThe GeoSPARQL standard specifies that WKT Geometry Literals without an SRS URI are defaulted to CRS84 http://www.opengis.net/def/crs/OGC/1.3/CRS84.\nKey Dependencies GeoSPARQL The OGC GeoSPARQL standard supports representing and querying geospatial data on the Semantic Web. GeoSPARQL defines a vocabulary for representing geospatial data in RDF, and it defines an extension to the SPARQL query language for processing geospatial data. In addition, GeoSPARQL is designed to accommodate systems based on qualitative spatial reasoning and systems based on quantitative spatial computations.\nThe GeoSPARQL standard is based upon the OGC Simple Features standard (http://www.opengeospatial.org/standards/sfa) used in relational databases. Modifications and enhancements have been made for usage with RDF and SPARQL. The Simple Features standard, and by extension GeoSPARQL, simplify calculations to Euclidean planer geometry. Therefore, datasets using a geographic spatial/coordinate reference system, which are based on latitude and longitude on an ellipsoid, e.g. WGS84, will have minor error introduced. This error has been deemed acceptable due to the simplification in calculation it offers.\nApache SIS/SIS_DATA Environment Variable Apache Spatial Information System (SIS) is a free software, Java language library for developing geospatial applications. SIS provides data structures for geographic features and associated meta-data along with methods to manipulate those data structures. The library is an implementation of GeoAPI 3.0 interfaces and can be used for desktop or server applications.\nA subset of the EPSG spatial/coordinate reference systems are included by default. The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. 
Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable (http://sis.apache.org/epsg.html).\nAn embedded EPSG dataset can be included in a Gradle application by adding the following dependency to build.gradle:\next.sisVersion = \u0026quot;1.1\u0026quot; implementation \u0026quot;org.apache.sis.non-free:sis-embedded-data:$sisVersion\u0026quot; Java Topology Suite The JTS Topology Suite is a Java library for creating and manipulating vector geometry.\nNote The following are implementation points that may be useful during usage.\nGeoSPARQL Schema An RDF/XML schema has been published for the GeoSPARQL v1.0 standard (v1.0.1 - http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf). This can be applied to Jena Models (see the inference documentation) to provide RDFS and OWL inferencing on a GeoSPARQL conforming dataset. However, the published schema does not conform with the standard.\nThe property hasDefaultGeometry is missing from the schema and instead the defaultGeometry property is stated.\nThis prevents RDFS inferencing being performed correctly and has been reported to the OGC Standards Tracker. A corrected version of the schema is available in the Resources folder.\nSpatial Relations The GeoSPARQL and Simple Features standard both define the DE-9IM intersection patterns for the three spatial relation families. However, these patterns are not always consistent with the patterns stated by the JTS library for certain relations.\nFor example, GeoSPARQL/Simple Features use TFFFTFFFT equals relations in Simple Feature, Egenhofer and RCC8. However, this does not yield the usually expected result when comparing a pair of point geometries. The Simple Features standard states that the boundary of a point is empty. Therefore, the boundary intersection of two points would also be empty so give a negative comparison result.\nJTS, and other libraries, use the alternative intersection pattern of T*F**FFF*. This is a combination of the within and contains relations and yields the expected results for all geometry types.\nThe spatial relations utilised by JTS have been implemented as the extension spatial:equals filter and property functions. A user can also supply their own DE-9IM intersection patterns by using the geof:relate filter function.\nSpatial Relations and Geometry Shapes/Types The spatial relations for the three spatial families do not apply to all combinations of the geometry shapes (Point, LineString, Polygon) and their collections (MultiPoint, MultiLineString, MultiPolygon). Therefore, some queries may not produce all the results that may initially be expected.\nSome examples are:\nIn some relations there may only be results when a collection of shapes is being used, e.g. two multi-points can overlap but two points cannot. A relation may only apply for one combination but not its reciprocal, e.g. a line may cross a polygon but a polygon may not cross a line. The RCC8 family only applies to Polygon and MultiPolygon types. Refer to pages 8-10 of 11-052r4 GeoSPARQL standard for more details.\nEquals Relations The three equals relations (sfEquals, ehEquals and rccEquals) use spatial equality and not lexical equality. Therefore, some comparisons using these relations may not be as expected.\nThe JTS description of sfEquals is:\nTrue if two geometries have at least one point in common and no point of either geometry lies in the exterior of the other geometry. Therefore, two empty geometries will return false as they are not spatially equal. 
Shapes which differ in the number of points but have the same geometry are equal and will return true.\ne.g. LINESTRING (0 0, 0 10) and LINESTRING (0 0, 0 5, 0 10) are spatially equal.\nQuery Rewrite Extension The Query Rewrite Extension provides for simpler querying syntax. Feature and Geometry can be used in spatial relations without needing the relations to be asserted in the dataset. This also means the Geometry Literal does not need to be specified in the query. In the case of Features this requires the hasDefaultGeometry property to be used in the dataset.\nThis means the query:\n?subj geo:hasDefaultGeometry ?subjGeom . ?subjGeom geo:hasSerialization ?subjLit . ?obj geo:hasDefaultGeometry ?objGeom . ?objGeom geo:hasSerialization ?objLit . FILTER(geof:sfContains(?subjLit, ?objLit)) becomes:\n?subj geo:sfContains ?obj . Methods are available to apply the hasDefaultGeometry property to every Geometry with a single hasGeometry property, see org.apache.jena.geosparql.configuration.GeoSPARQLOperations.\nDepending upon the spatial relation, queries may include the specified Feature and Geometry in the results. e.g. FeatureA is bound in a query on a dataset only containing FeatureA and GeometryA. The results FeatureA and GeometryA are returned rather than no results. Therefore, filtering using FILTER(!sameTerm(?subj, ?obj)) etc. may be needed in some cases. The query rewrite functionality can be switched off in the library configuration, see org.apache.jena.geosparql.configuration.GeoSPARQLConfig.\nEach dataset is assigned a Query Rewrite Index to store the results of previous tests. There is the potential that relations are tested multiple times in a query (i.e. Feature-Feature, Feature-Geometry, Geometry-Geometry, Geometry-Feature). Therefore, it is useful to retain the results for at least a short period of time.\nIterating through all combinations of spatial relations for a dataset containing n Geometry Literals will produce 27n^2 true/false results (asserting the true result statements in a dataset would be a subset). Control is given on a dataset basis to allow choice in when and how storage of rewrite results is applied, e.g. store all found results on a small dataset but on demand for a large dataset.\nThis index can be configured on a global and individual dataset basis for the maximum size and duration until unused items are removed. Query rewriting can be switched on independently of the indexes, i.e. query rewriting can be performed but an index is configured to not store the result.\nAs an extension to the standard, supplying a Geometry Literal is also permitted. For example:\n?subj geo:sfContains \u0026#34;POINT(0 0)\u0026#34;^^geo:wktLiteral . Dataset Conversion Methods to convert datasets between serialisations and spatial/coordinate reference systems are available in: org.apache.jena.geosparql..configuration.GeoSPARQLOperations\nThe following list shows some of the operations that can be performed. 
Once these operations have been performed they can be serialised to file or stored in a Jena TDB to remove the need to reprocess.\nLoad a Jena Model from file: Model dataModel = RDFDataMgr.loadModel(\u0026quot;data.ttl\u0026quot;);\nConvert Feature-GeometryLiteral to the GeoSPARQL Feature-Geometry-GeometryLiteral structure: Model geosparqlModel = GeoSPARQLOperations.convertGeometryStructure(dataModel);\nConvert Feature-Lat, Feature-Lon Geo predicates to the GeoSPARQL Feature-Geometry-GeometryLiteral structure, with option to remove Geo predicates: Model geosparqlModel = GeoSPARQLOperations.convertGeoPredicates(dataModel, true);\nAssert additional hasDefaultGeometry statements for single hasGeometry triples, used in Query Rewriting: GeoSPARQLOperations.applyDefaultGeometry(geosparqlModel);\nConvert Geometry Literals to the WGS84 spatial reference system and WKT datatype: Model model = GeoSPARQLOperations.convert(geosparqlModel, \u0026quot;http://www.opengis.net/def/crs/EPSG/0/4326\u0026quot;, \u0026quot;http://www.opengis.net/ont/geosparql#wktLiteral\u0026quot;);\nApply GeoSPARQL schema with RDFS inferencing and assert additional statements in the Model: GeoSPARQLOperations.applyInferencing(model);\nApply commonly used GeoSPARQL prefixes for URIs to the model: GeoSPARQLOperations.applyPrefixes(model);\nCreate Spatial Index for a Model within a Dataset for spatial querying: Dataset dataset = SpatialIndex.wrapModel(model);\nOther operations are available and can be applied to a Dataset containing multiple Models and in some cases files and folders. These operations do not configure and set up the GeoSPARQL functions or indexes that are required for querying.\nSpatial Index A Spatial Index can be created to improve searching of a dataset. The Spatial Index is expected to be unique to the dataset and should not be shared between datasets. Once built the Spatial Index cannot have additional items added to it.\nA Spatial Index is required for the jena-spatial property functions and is optional for the GeoSPARQL spatial relations. Only a single SRS can be used for a Spatial Index, and it is recommended that datasets are converted to a single SRS, see GeoSPARQLOperations.\nSetting up a Spatial Index can be done through org.apache.jena.geosparql.configuration.GeoSPARQLConfig. Additional methods for building, loading and saving Spatial Indexes are provided in org.apache.jena.geosparql.spatial.SpatialIndex.\nUnits URI Spatial/coordinate reference systems use a variety of measuring systems for defining distances. These can be specified using a URI identifier, as either URL or URN, with conversion undertaken automatically as required. It should be noted that there is error inherent in spatial reference systems and some variation in values may occur between different systems.\nThe following table gives some examples of units that are supported (additional units can be added to the UnitsRegistry using the javax.measure.Unit API). These URIs are all in the namespace http://www.opengis.net/def/uom/OGC/1.0/ and here use the prefix units.\nURI Description units:kilometre or units:kilometer Kilometres units:metre or units:meter Metres units:mile or units:statuteMile Miles units:degree Degrees units:radian Radians Full listing of default Units can be found in org.apache.jena.geosparql.implementation.vocabulary.Unit_URI.\nGeography Markup Language Support (GML) The supported GML profile is GML 2.0 Simple Features Profile (10-100r3), which is a profile of GML 3.2.1 (07-036r1). 
The profile restricts the geometry shapes permitted in GML 3.2.1 to a subset, see 10-100r3 page 22. The profile supports Points, LineString and Polygon shapes used in WKT. There are also additional shape serialisations available in the profile that do not exist in WKT or JTS to provide simplified representations which would otherwise use LineStrings or Polygons. Curves can be described by LineStringSegment, Arc, Circle and CircleByCenterPoint. Surfaces can be formed similarly to Polygons or using Curves. These additional shapes can be read as part of a dataset or query but will not be produced if the SRS of the shape is transformed, instead a LineString or Polygon representation will be produced.\nDetails of the GML structure for these shapes can be found in the geometryPrimitives.xsd, geometryBasic0d1d.xsd, geometryBasic2d.xsd and geometryAggregates.xsd schemas.\nThe labelling of collections is as follows:\nCollection Geometry MultiPoint Point MultiCurve LineString, Curve MultiSurface Polygon, Surface MultiGeometry Point, LineString, Curve, Polygon, Surface Apache Jena Spatial Functions/WGS84 Geo Predicates The jena-spatial module contains several SPARQL functions for querying datasets using the WGS84 Geo predicates for latitude (http://www.w3.org/2003/01/geo/wgs84_pos#lat) and longitude (http://www.w3.org/2003/01/geo/wgs84_pos#long). These jena-spatial functions are supported for both Geo predicates and Geometry Literals, i.e. a GeoSPARQL dataset. Additional SPARQL filter functions have been provided to convert Geo predicate properties into WKT strings and calculate Great Circle and Euclidean distances. The jena-spatialfunctions require setting up a Spatial Index for the target Dataset, e.g. GeoSPARQLConfig.setupSpatialIndex(dataset);, see Spatial Index section.\nSupported Features The Geo predicate form of spatial representation is restricted to only \u0026lsquo;Point\u0026rsquo; shapes in the WGS84 spatial/coordinate reference system. The Geo predicates are properties of the Feature and do not use the properties and structure of the GeoSPARQL standard, including Geometry Literals. Methods are available to convert datasets from Geo predicates to GeoSPARQL structure, see: org.apache.jena.geosparql.configuration.GeoSPARQLOperations\nThe spatial relations and query re-writing of GeoSPARQL outlined previously has been implemented for Geo predicates. However, only certain spatial relations are valid for Point to Point relationships. Refer to pages 8-10 of 11-052r4 GeoSPARQL standard for more details.\nGeo predicates can be converted to Geometry Literals in query and then used with the GeoSPARQL filter functions.\n?subj wgs:lat ?lat . ?subj wgs:long ?lon . BIND(spatialF:convertLatLon(?lat, ?lon) as ?point) . #Coordinate order is Lon/Lat without stated SRS URI. BIND(\u0026#34;POLYGON((...))\u0026#34;^^\u0026lt;http://www.opengis.net/ont/geosparql#wktLiteral\u0026gt; AS ?box) . FILTER(geof:sfContains(?box, ?point)) Alternatively, utilising more shapes, relations and spatial reference systems can be achieved by converting the dataset to the GeoSPARQL structure.\n?subj geo:hasGeometry ?geom . ?geom geo:hasSerialization ?geomLit . #Coordinate order is Lon/Lat without stated SRS URI. BIND(\u0026#34;POLYGON((...))\u0026#34;^^\u0026lt;http://www.opengis.net/ont/geosparql#wktLiteral\u0026gt; AS ?box) . FILTER(geof:sfContains(?box, ?geomLit)) Datasets can contain both Geo predicates and Geometry Literals without interference. 
However, a dataset containing both types will only examine those Features which have Geometry Literals for spatial relations, i.e. the check for Geo predicates is a fallback when Geometry Literals aren\u0026rsquo;t found. Therefore, it is not recommended to insert new Geo predicate properties after a dataset has been converted to GeoSPARQL structure (unless corresponding Geometry and Geometry Literals are included).\nFilter Functions These filter functions are available in the http://jena.apache.org/function/spatial# namespace and here use the prefix spatialF.\nFunction Name Description ?wktString spatialF:convertLatLon(?lat, ?lon) Converts Lat and Lon double values into WKT string of a Point with WGS84 SRS. ?wktString spatialF:convertLatLonBox(?latMin, ?lonMin, ?latMax, ?lonMax) Converts Lat and Lon double values into WKT string of a Polygon forming a box with WGS84 SRS. ?boolean spatialF:equals(?geomLit1, ?geomLit2) True, if geomLit1 is spatially equal to geomLit2. ?boolean spatialF:nearby(?geomLit1, ?geomLit2, ?distance, ?unitsURI) True, if geomLit1 is within distance of geomLit2 using the distance units. ?boolean spatialF:withinCircle(?geomLit1, ?geomLit2, ?distance, ?unitsURI) True, if geomLit1 is within distance of geomLit2 using the distance units. ?radians spatialF:angle(?x1, ?y1, ?x2, ?y2) Angle clockwise from y-axis from Point(x1,y1) to Point (x2,y2) in 0 to 2π radians. ?degrees spatialF:angleDeg(?x, ?y1, ?x2, ?y2) Angle clockwise from y-axis from Point(x1,y1) to Point (x2,y2) in 0 to 360 degrees. ?distance spatialF:distance(?geomLit1, ?geomLit2, ?unitsURI) Distance between two Geometry Literals in distance units. Chooses distance measure based on SRS type. Great Circle distance for Geographic SRS and Euclidean otherwise. ?radians spatialF:azimuth(?lat1, ?lon1, ?lat2, ?lon2) Forward azimuth clockwise from North between two Lat/Lon Points in 0 to 2π radians. ?degrees spatialF:azimuthDeg(?lat1, ?lon1, ?lat2, ?lon2) Forward azimuth clockwise from North between two Lat/Lon Points in 0 to 360 degrees. ?distance spatialF:greatCircle(?lat1, ?lon1, ?lat2, ?lon2, ?unitsURI) Great Circle distance (Vincenty formula) between two Lat/Lon Points in distance units. ?distance spatialF:greatCircleGeom(?geomLit1, ?geomLit2, ?unitsURI) Great Circle distance (Vincenty formula) between two Geometry Literals in distance units. Use http://www.opengis.net/def/function/geosparql/distance from GeoSPARQL standard for Euclidean distance. ?geomLit2 spatialF:transform(?geomLit1, ?datatypeURI, ?srsURI) Transform Geometry Literal by Datatype and SRS. ?geomLit2 spatialF:transformDatatype(?geomLit1, ?datatypeURI) Transform Geometry Literal by Datatype. ?geomLit2 spatialF:transformSRS(?geomLit1, ?srsURI) Transform Geometry Literal by SRS. Property Functions These property functions are available in the http://jena.apache.org/spatial# namespace and here use the prefix spatial. This is the same namespace as the jena-spatial functions utilise and these form direct replacements. The subject Feature may be bound, to test the pattern is true, or unbound, to find all cases the pattern is true. These property functions require a Spatial Index to be setup for the dataset.\nThe optional ?limit parameter restricts the number of results returned. The default value is -1 which returns all results. No guarantee is given for ordering of results. The optional ?unitsURI parameter specifies the units of a distance. 
The default value is kilometres through the string or resource http://www.opengis.net/def/uom/OGC/1.0/kilometre.\nThe spatial:equals property function behaves the same way as the main GeoSPARQL property functions. Either, both or neither of the subject and object can be bound. A Spatial Index is not required for the dataset with the spatial:equals property function.\nFunction Name Description ?spatialObject1 spatial:equals ?spatialObject2 Find spatialObjects (i.e. features or geometries) that are spatially equal. ?feature spatial:intersectBox(?latMin ?lonMin ?latMax ?lonMax [ ?limit]) Find features that intersect the provided box, up to the limit. ?feature spatial:intersectBoxGeom(?geomLit1 ?geomLit2 [ ?limit]) Find features that intersect the provided box, up to the limit. ?feature spatial:withinBox(?latMin ?lonMin ?latMax ?lonMax [ ?limit]) Find features that intersect the provided box, up to the limit. ?feature spatial:withinBoxGeom(?geomLit1 ?geomLit2 [ ?limit]) Find features that are within the provided box, up to the limit. ?feature spatial:nearby(?lat ?lon ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:nearbyGeom(?geomLit ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:withinCircle(?lat ?lon ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:withinCircleGeom(?geomLit ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. The Cardinal Functions find all Features that are present in the specified direction. In Geographic spatial reference systems (SRS), e.g. WGS84 and CRS84, the East/West directions wrap around. Therefore, a search is made from the shape\u0026rsquo;s edge for up to half the range of the SRS (i.e. 180 degrees in WGS84) and will continue across the East/West boundary if necessary. In other SRS, e.g. Projected onto a flat plane, the East/West check is made from the shape\u0026rsquo;s edge to the farthest limit of the SRS range, i.e. there is no wrap around.\nCardinal Function Name Description ?feature spatial:north(?lat ?lon [ ?limit]) Find features that are North of the Lat/Lon point (point to +90 degrees), up to the limit. ?feature spatial:northGeom(?geomLit [ ?limit]) Find features that are North of the Geometry Literal, up to the limit. ?feature spatial:south(?lat ?lon [ ?limit]) Find features that are South of the Lat/Lon point (point to -90 degrees), up to the limit. ?feature spatial:southGeom(?geomLit [ ?limit]) Find features that are South of the Geometry Literal, up to the limit. ?feature spatial:east(?lat ?lon [ ?limit]) Find features that are East of the Lat/Lon point (point plus 180 degrees longitude, wrapping round), up to the limit. ?feature spatial:eastGeom(?geomLit [ ?limit]) Find features that are East of the Geometry Literal, up to the limit. ?feature spatial:west(?lat ?lon [ ?limit]) Find features that are West of the Lat/Lon point (point minus 180 degrees longitude, wrapping round), up to the limit. ?feature spatial:westGeom(?geomLit [ ?limit]) Find features that are West of the Geometry Literal, up to the limit. Geometry Property Filter Functions The GeoSPARQL standard provides a set of properties related to geometries, see Section 8.4. These are applied on the Geometry resource and are automatically determined if not asserted in the data. 
However, it may be necessary to retrieve the properties of a Geometry Literal directly without an associated Geometry resource. Filter functions to do this have been included as part of the http://www.opengis.net/def/function/geosparql/ namespace as a minor variation to the GeoSPARQL standard. The relevant functions using the geof prefix are:\nGeometry Property Filter Function Name Description ?integer geof:dimension(?geometryLiteral) Topological dimension, e.g. 0 for Point, 1 for LineString and 2 for Polygon. ?integer geof:coordinateDimension(?geometryLiteral) Coordinate dimension, e.g. 2 for XY coordinates and 4 for XYZM coordinates. ?integer geof:spatialDimension(?geometryLiteral) Spatial dimension, e.g. 2 for XY coordinates and 3 for XYZM coordinates. ?boolean geof:isEmpty(?geometryLiteral) True, if geometry is empty. ?boolean geof:isSimple(?geometryLiteral) True, if geometry is simple. ?boolean geof:isValid(?geometryLiteral) True, if geometry is topologically valid. A dataset that follows the GeoSPARQL Feature-Geometry-GeometryLiteral can have simpler SPARQL queries without needing to use these functions by taking advantage of the Query Rewriting functionality. The geof:isValid filter function and geo:isValid property for a Geometry resource are not part of the GeoSPARQL standard but have been included as a minor variation.\nFuture Work Implementing GeoJSON as a GeometryLiteral serialisation (https://tools.ietf.org/html/rfc7946). Producing GeoJSON is already possible with geof:asGeoJSON(?geometryLiteral). Contributors The following individuals have made contributions to this project:\nGreg Albiston Haozhe Chen Taha Osman Why Use This Implementation? There are several implementations of the GeoSPARQL standard. The conformance and completeness of these implementations is difficult to ascertain and varies between features.\nHowever, the following may be of interest when considering whether to use this implementation based on reviewing several alternatives.\nThis Implementation Other Implementations Implements all six components of the GeoSPARQL standard. Generally partially implement the Geometry Topology and Geometry Extensions. Do not implement the Query Rewrite Extension. Pure Java and does not require a supporting relational database. Configuration requires a single line of code (although Apache SIS may need some setting up, see above). Require setting up a database, configuring a geospatial extension and setting environment variables. Uses Jena, which conforms to the W3C standards for RDF and SPARQL. New versions of the standards will quickly feed through. Not fully RDF and SPARQL compliant, e.g. RDFS/OWL inferencing or SPARQL syntax. Adding your own schema may not produce inferences. Automatically determines geometry properties and handles mixed cases of units or coordinate reference systems. The GeoSPARQL standard suggests this approach but does not require it. Tend to produce errors or no results in these situations. Performs indexing and caching on-demand which reduces set-up time and only performs calculations that are required. Perform indexing in the data loading phase and initialisation phase, which can lead to lengthy delays (even on relatively small datasets). Uses JTS which does not truncate coordinate precision and applies spatial equality. May truncate coordinate precision and apply lexical equality, which is quicker but does not comply with the GeoSPARQL standard. 
","permalink":"https://jena.apache.org/documentation/geosparql/","tags":null,"title":"Apache Jena GeoSPARQL"},{"categories":null,"contents":"Jena has an initialization sequence that is used to setup components available at runtime.\nApplication code is welcome to also use this mechanism. This must be done with care. During Jena initialization, there can be visibility of uninitialized data in class static members.\nThe standard initialization sequence is\nCore -\u0026gt; RIOT -\u0026gt; ARQ -\u0026gt; TDB -\u0026gt; other (including jena text)\nThe sequence from 0 to level 500 is the Jena platform initialization. Application may use the jena initialization mechanism and it is recommended to place initialization code above level 500.\nInitialization occurs when JenaSystem.init() is first called. Jena ensures that this is done when the application first uses any Jena code by using class initializers.\nApplication can call JenaSystem.init().\nSee notes on repacking Jena code for how to deal with ServiceLoader files in repacked jars.\nInitialization code Initialization code is an implementation of JenaSubsystemLifecycle which itself extends SubsystemLifecycle.\nFor use in the default initialization, the class must have a zero-argument constructor and implement:\npublic interface JenaSubsystemLifecycle { public void start() ; public void stop() ; default public int level() { return 9999 ; } } The code should supply a level, indicating its place in the order of initialization. The levels used by Jena are:\n0 - reserved 10 - Used by jena-core 15 - CLI Commands registry 20 - RIOT 30 - ARQ 40 - Text indexing 40 - TDB1 42 - TDB2 60 - Additional HTTP configuration 60 - RDFPatch 96 - SHACL 96 - ShEx 101 - Fuseki 9999 - Default. Levels up to 500 are considered to be \u0026ldquo;Jena system level\u0026rdquo;, Application code should use level above 500.\nFuseki initialization includes Fuseki Modules which uses SubsystemLifecycle with a different Java interface.\nThe Initialization Process The process followed by JenaSystem.init() is to load all java ServiceLoader registered JenaSubsystemLifecycle, sort into level order, then call init on each initialization object. Initialization code at the same level may be called in any order and that order may be different between runs.\nOnly the first call of JenaSystem.init() causes the process to run. Any subsequent calls are cheap, so calling JenaSystem.init() when in doubt about the initialization state is safe.\nOverlapping concurrent calls to JenaSystem.init() are thread-safe. On a return from JenaSystem.init(), Jena has been initialized at some point.\nDebugging There is a flag JenaSystem.DEBUG_INIT to help with development. 
It is not intended for runtime logging.\nJena components print their initialization beginning and end points on System.err to help track down ordering issues.\n","permalink":"https://jena.apache.org/documentation/notes/system-initialization.html","tags":null,"title":"Apache Jena Initialization"},{"categories":null,"contents":"In first name alphabetical order:\nAaron Coburn (acoburn) C Adam Soroka (ajs6f) CP Andy Seaborne (andy) CP VP Bruno Kinoshita (kinow) CP Chris Dollin (chrisdollin) CP Chris Tomlinson (codeferret) CP Claude Warren (claude) CP Damian Steer (damian) CP Dave Reynolds (der) CP Ian Dickinson (ijd) CP Lorenz Buehmann (lbuehmann) C Osma Suominen (osma) CP Paolo Castagna (castagna) CP Rob Vesse (rvesse) CP Stephen Allen (sallen) CP Ying Jiang (jpz6311whu) C Emeritus and Mentors:\nBenson Margulies C Dave Johnson Leo Simons Ross Gardler Key C a committer P a PMC member VP project chair and Apache Foundation Vice-President ","permalink":"https://jena.apache.org/about_jena/team.html","tags":null,"title":"Apache Jena project team members"},{"categories":null,"contents":"Apache Jena is packaged as downloads which contain the most commonly used portions of the systems:\napache-jena – contains the APIs, SPARQL engine, the TDB native RDF database and command line tools apache-jena-fuseki – the Jena SPARQL server Jena5 requires Java 17.\nJena jars are available from Maven.\nYou may verify the authenticity of artifacts below by using the PGP KEYS file.\nApache Jena Release Source release: this forms the official release of Apache Jena. All binaries artifacts and maven binaries correspond to this source.\nApache Jena Release SHA512 Signature jena-5.0.0-source-release.zip SHA512 PGP Apache Jena Binary Distributions The binary distribution of the Fuseki server:\nApache Jena Fuseki SHA512 Signature apache-jena-fuseki-5.0.0.tar.gz SHA512 PGP apache-jena-fuseki-5.0.0.zip SHA512 PGP \u0026nbsp;\nThe binary distribution of libraries contains the APIs, SPARQL engine, the TDB native RDF database and a variety of command line scripts and tools for working with these systems. Apache Jena Commands SHA512 Signature apache-jena-5.0.0.tar.gz SHA512 PGP apache-jena-5.0.0.zip SHA512 PGP \u0026nbsp;\nThe binary distribution of Fuseki as a WAR file: Apache Jena Fuseki SHA512 Signature jena-fuseki-war-5.0.0.war SHA512 PGP This can be run in any servlet application container supporting Jakarta Servlet 6.0 (part of Jakarta EE version 9), such as Apache Tomcat 10.x or later. 
The server must be running on the required version of Java.\nApache Jena Download area The source release and also the binaries are available from the Apache Jena Download area.\nIndividual Modules Apache Jena publishes a range of modules beyond those included in the binary distributions (code for all modules may be found in the source distribution).\nIndividual modules may be obtained using a dependency manager which can talk to Maven repositories; some modules are only available via Maven.\nMaven See \u0026ldquo;Using Jena with Apache Maven\u0026rdquo; for full details.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;apache-jena-libs\u0026lt;/artifactId\u0026gt; \u0026lt;type\u0026gt;pom\u0026lt;/type\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Source code The development codebase is available from git.\nhttps://gitbox.apache.org/repos/asf?p=jena.git\nThis is also available on GitHub:\nhttps://github.com/apache/jena\nPrevious releases While previous releases are available, we strongly recommend that wherever possible users use the latest official Apache releases of Jena in preference to using any older versions of Jena.\nPrevious Apache Jena releases can be found in the Apache archive area at https://archive.apache.org/dist/jena.\nDownload Source The Apache Software Foundation uses CDN distribution for Apache projects and the current release of Jena.\n","permalink":"https://jena.apache.org/download/","tags":null,"title":"Apache Jena Releases"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0. The original documentation.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/","tags":null,"title":"Apache Jena SDB - persistent triple stores using relational databases"},{"categories":null,"contents":"jena-shacl is an implementation of the W3C Shapes Constraint Language (SHACL). It implements SHACL Core and SHACL SPARQL Constraints.\nIn addition, it provides:\nSHACL Compact Syntax SPARQL-based targets Command line The command shacl introduces shacl operations; it takes a sub-command argument.\nTo validate:\nshacl validate --shapes SHAPES.ttl --data DATA.ttl shacl v -s SHAPES.ttl -d DATA.ttl The shapes and data files can be the same; the --shapes is optional and defaults to the same as --data. This includes running individual W3C Working Group tests.\nTo parse a file:\nshacl parse FILE shacl p FILE which writes out a text format.\nshacl p --out=FMT FILE writes out in text(t), compact(c), rdf(r) formats.
Multiple formats can be given, separated by \u0026ldquo;,\u0026rdquo; and format all outputs all 3 formats.\nIntegration with Apache Jena Fuseki Fuseki has a new service operation fuseki:shacl:\n\u0026lt;#serviceWithShacl\u0026gt;; rdf:type fuseki:Service ; rdfs:label \u0026#34;Dataset with SHACL validation\u0026#34; ; fuseki:name \u0026#34;\u0026lt;i\u0026gt;ds\u0026lt;/i\u0026gt;\u0026#34; ; fuseki:serviceReadWriteGraphStore \u0026#34;\u0026#34; ; fuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026#34;shacl\u0026#34; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . This requires a \u0026ldquo;new style\u0026rdquo; endpoint declaration: see \u0026ldquo;Fuseki Endpoint Configuration\u0026rdquo;.\nThis is not installed into a dataset setup by default; a configuration file using\nfuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026#34;shacl\u0026#34; ]; is necessary (or programmatic setup for Fuseki Main).\nThe service accepts a shapes graph posted as RDF to /ds/shacl with content negotiation.\nThere is a graph argument, ?graph=, that specifies the graph to validate. It is the URI of a named graph, default for the unnamed, default graph (and this is the assumed value of ?graph if not present), or union for union of all named graphs in the dataset.\nFurther, an argument target=uri validates a specific node in the data.\nUpload data in file fu-data.ttl:\ncurl -XPOST --data-binary @fu-data.ttl \\ --header \u0026#39;Content-type: text/turtle\u0026#39; \\ \u0026#39;http://localhost:3030/ds?default\u0026#39; Validate with shapes in fu-shapes.ttl and get back a validation report:\ncurl -XPOST --data-binary @fu-shapes.ttl \\ --header \u0026#39;Content-type: text/turtle\u0026#39; \\ \u0026#39;http://localhost:3030/ds/shacl?graph=default\u0026#39; API The package org.apache.jena.shacl has the main classes.\nShaclValidator for parsing and validation GraphValidation for updating graphs with validation API Examples https://github.com/apache/jena/tree/main/jena-examples/src/main/java/shacl/examples/\nExample Shacl01_validateGraph shows validation and printing of the validation report in a text form and in RDF:\npublic static void main(String ...args) { String SHAPES = \u0026#34;shapes.ttl\u0026#34;; String DATA = \u0026#34;data1.ttl\u0026#34;; Graph shapesGraph = RDFDataMgr.loadGraph(SHAPES); Graph dataGraph = RDFDataMgr.loadGraph(DATA); Shapes shapes = Shapes.parse(shapesGraph); ValidationReport report = ShaclValidator.get().validate(shapes, dataGraph); ShLib.printReport(report); System.out.println(); RDFDataMgr.write(System.out, report.getModel(), Lang.TTL); } Example Shacl02_validateTransaction shows how to update a graph only if, after the changes, the graph is validated according to the shapes provided.\nSHACL Compact Syntax Apache Jena supports SHACL Compact Syntax (SHACL-C) for both reading and writing.\nThe file extensions for SHACL-C are .shc and .shaclc and there is a registered language constant Lang.SHACLC.\nRDFDataMgr.load(\u0026#34;shapes.shc\u0026#34;); RDFDataMgr.read(\u0026#34;file:compactShapes\u0026#34;, Lang.SHACLC); RDFDataMgr.write(System.out, shapesGraph, Lang.SHACLC); SHACL-C is managed by the SHACL Community Group. It does not cover all possible shapes. When outputting SHACL-C, SHACL shapes not expressible in SHACL-C will cause an exception and data in the RDF graph that is not relevant will not be output. 
In other words, SHACL-C is a lossy format for RDF.\nThe Jena SHACL-C writer will output any valid SHACL-C document.\nExtensions:\nThe constraint grammar rule allows a shape reference to a node shape. The propertyParam grammar rule provides \u0026ldquo;group\u0026rdquo;, \u0026ldquo;order\u0026rdquo;, \u0026ldquo;name\u0026rdquo;, \u0026ldquo;description\u0026rdquo; and \u0026ldquo;defaultValue\u0026rdquo; to align with nodeParam. The nodeParam grammar rule supports \u0026ldquo;targetClass\u0026rdquo; (normally written with the shorthand -\u0026gt;) as well as the defined \u0026ldquo;targetNode\u0026rdquo;, \u0026ldquo;targetObjectsOf\u0026rdquo;, \u0026ldquo;targetSubjectsOf\u0026rdquo; SPARQL-based targets SPARQL-based targets allow the target nodes to be calculated with a SPARQL SELECT query.\nSee SPARQL-based targets for details.\nex:example sh:target [ a sh:SPARQLTarget ; sh:select \u0026#34;\u0026#34;\u0026#34; SELECT ?this WHERE { ... } \u0026#34;\u0026#34;\u0026#34; ; ] ; ValidationListener When given a ValidationListener the SHACL validation code emits events at each step of validation:\nwhen validation of a shape starts or finishes when the focus nodes of the shape have been identified when validation of a constraint begins, ends and yields positive or negative results For example, the following listener will just record all events in a List:\npublic class RecordingValidationListener implements ValidationListener { private final List\u0026lt;ValidationEvent\u0026gt; events = new ArrayList\u0026lt;\u0026gt;(); @Override public void onValidationEvent(ValidationEvent e) { events.add(e); } public List\u0026lt;ValidationEvent\u0026gt; getEvents() { return events; } } The listener must be passed to the constructor of the ValidationContext. The following example validates the dataGraph according to the shapesGraph using the ValidationListener above:\nGraph shapesGraph = RDFDataMgr.loadGraph(shapesGraphUri); //assuming shapesGraphUri points to an RDF file Graph dataGraph = RDFDataMgr.loadGraph(dataGraphUri); //assuming dataGraphUri points to an RDF file RecordingValidationListener listener = new RecordingValidationListener(); // see above Shapes shapes = Shapes.parse(shapesGraph); ValidationContext vCtx = ValidationContext.create(shapes, dataGraph, listener); // pass listener here for (Shape shape : shapes.getTargetShapes()) { Collection\u0026lt;Node\u0026gt; focusNodes = VLib.focusNodes(dataGraph, shape); for (Node focusNode : focusNodes) { VLib.validateShape(vCtx, dataGraph, shape, focusNode); } } List\u0026lt;ValidationEvent\u0026gt; actualEvents = listener.getEvents(); // all events have been recorded The events thus generated might look like this (event.toString(), one per line):\nFocusNodeValidationStartedEvent{focusNode=http://datashapes.org/sh/tests/core/node/class-001.test#Someone, shape=NodeShape[http://datashapes.org/sh/tests/core/node/class-001.test#TestShape]} ConstraintEvaluationForNodeShapeStartedEvent{constraint=ClassConstraint[\u0026lt;http://datashapes.org/sh/tests/core/node/class-001.test#Person\u0026gt;], focusNode=http://datashapes.org/sh/tests/core/node/class-001.test#Someone, shape=NodeShape[http://datashapes.org/sh/tests/core/node/class-001.test#TestShape]} ConstraintEvaluatedOnFocusNodeEvent{constraint=ClassConstraint[\u0026lt;http://datashapes.org/sh/tests/core/node/class-001.test#Person\u0026gt;], focusNode=http://datashapes.org/sh/tests/core/node/class-001.test#Someone, shape=NodeShape[http://datashapes.org/sh/tests/core/node/class-001.test#TestShape], 
valid=true} ConstraintEvaluationForNodeShapeFinishedEvent{constraint=ClassConstraint[\u0026lt;http://datashapes.org/sh/tests/core/node/class-001.test#Person\u0026gt;], focusNode=http://datashapes.org/sh/tests/core/node/class-001.test#Someone, shape=NodeShape[http://datashapes.org/sh/tests/core/node/class-001.test#TestShape]} FocusNodeValidationFinishedEvent{focusNode=http://datashapes.org/sh/tests/core/node/class-001.test#Someone, shape=NodeShape[http://datashapes.org/sh/tests/core/node/class-001.test#TestShape]} [...] Many use cases can be addressed with the HandlerBasedValidationListener, which allows for registering event handlers on a per-event basis. For example:\nValidationListener myListener = HandlerBasedValidationListener .builder() .forEventType(FocusNodeValidationStartedEvent.class) .addSimpleHandler(e -\u0026gt; { // ... }) .forEventType(ConstraintEvaluatedEvent.class) .addHandler(c -\u0026gt; c .iff(EventPredicates.isValid()) // use a Predicate\u0026lt;ValidationEvent\u0026gt; to select events .handle(e -\u0026gt; { // ... }) ) .build(); ","permalink":"https://jena.apache.org/documentation/shacl/","tags":null,"title":"Apache Jena SHACL"},{"categories":null,"contents":"jena-shex is an implementation of the ShEx (Shape Expressions) language.\nStatus jena-shex reads ShExC (the compact syntax) files.\nNot currently supported:\nsemantic actions EXTERNAL Blank node label validation is meaningless in Jena because a blank node label is scoped to the file, and not retained after the file has been read.\nCommand line The command shex introduces ShEx operations; it takes a sub-command argument.\nTo validate:\nshex validate --schema SCHEMA.shex --map MAP.smap --data DATA.ttl shex v -s SCHEMA.shex -m MAP.smap -d data.ttl To parse a file:\nshex parse FILE shex p FILE which writes out the parser results in a text format.\nAPI The package org.apache.jena.shex has the main classes.\nShex for reading ShEx related formats. ShexValidation for validation. API Examples Examples:\nhttps://github.com/apache/jena/tree/main/jena-examples/src/main/java/shex/examples/\npublic static void main(String ...args) { String SHAPES = \u0026#34;examples/schema.shex\u0026#34;; String SHAPES_MAP = \u0026#34;examples/shape-map.shexmap\u0026#34;; String DATA = \u0026#34;examples/data.ttl\u0026#34;; System.out.println(\u0026#34;Read data\u0026#34;); Graph dataGraph = RDFDataMgr.loadGraph(DATA); System.out.println(\u0026#34;Read schema\u0026#34;); ShexSchema shapes = Shex.readSchema(SHAPES); // Shapes map. System.out.println(\u0026#34;Read shapes map\u0026#34;); ShapeMap shapeMap = Shex.readShapeMap(SHAPES_MAP); // ShexReport System.out.println(\u0026#34;Validate\u0026#34;); ShexReport report = ShexValidator.get().validate(dataGraph, shapes, shapeMap); System.out.println(); // Print report. ShexLib.printReport(report); } ","permalink":"https://jena.apache.org/documentation/shex/","tags":null,"title":"Apache Jena ShEx"},{"categories":null,"contents":"Jump to the \u0026ldquo;Changes\u0026rdquo; section.\nOverview The SPARQL specifications provide query, update and the graph store protocol (GSP). 
In addition, Jena provides store operations for named graph formats.\nFor working with RDF data:\nAPI GPI Model Graph Statement Triple Resource Node Literal Node String Var Dataset DatasetGraph Quad and for SPARQL,\nAPI GPI RDFConnection RDFLink QueryExecution QueryExec UpdateExecution UpdateExec ResultSet RowSet ModelStore GSP ModelStore DSP Jena provides a single interface, RDFConnection, for working with local and remote RDF data using these protocols in a unified way. This is most useful for remote data because the setup to connect is more complicated and can be done once and reused.\nHTTP authentication support is provided, supporting both basic and digest authentication in challenge-response scenarios. Most authentication setup is abstracted away from the particular HTTP client library Jena is using.\nApplications can also use the various execution engines through QueryExecution, UpdateExecution and ModelStore.\nAll the main implementations work at \u0026ldquo;Graph SPI\u0026rdquo; (GPI) level and an application may wish to work with this lower level interface that implements generalized RDF (i.e. a triple is any three nodes, including ones like variables, and subsystem extension nodes).\nThe GPI version is the main machinery working at the storage and network level, and the API version is an adapter to convert to the Model API and related classes.\nUpdateProcessor is a legacy name for UpdateExecution.\nGSP provides the SPARQL Graph Store Protocol, and \u0026lsquo;DSP\u0026rsquo; (Dataset Store Protocol) provides for sending and receiving datasets, rather than individual graphs.\nBoth API and GPI provide builders for detailed setup, particularly for remote usage over HTTP and HTTPS where detailed control of the HTTP requests is sometimes necessary to work with other triple stores.\nUse of the builders is preferred to factories. Factory style functions for many common usage patterns are retained in QueryExecutionFactory and UpdateExecutionFactory. Note that any methods that involved Apache HttpClient objects have been removed.\nChanges from Jena 4.2.0 Changes at Jena 4.3.0 Execution objects have a companion builder. This is especially important for HTTP as there are many configuration options that may be needed. Local use is still covered by the existing QueryExecutionFactory as well as the new QueryExecutionBuilder.\nHTTP usage is provided by the JDK java.net.http package, with challenge-based authentication provided on top by Jena. See the authentication documentation.\nAuthentication support is uniformly applied to query, update, GSP, DSP and SERVICE.\nHTTP/2 support\nRemove Apache HttpClient usage\nWhen using this for authentication, application code changes will be necessary. Deprecate modifying QueryExecution after it is built.\nSubstitution of variables for concrete values in query and update execution. This is a form of parameterization that works in both local and remote usage (unlike \u0026ldquo;initial bindings\u0026rdquo; which are only available for local query execution). See the substitution section below.\nHttpOp, using java.net.http.HttpClient, is split into HttpRDF for GET/POST/PUT/DELETE of graphs and datasets and a new HttpOp for packaged-up common patterns of HTTP usage.\nThe previous HttpOp is available as HttpOp1 and Apache HttpClient is still a dependency.
Eventually, HttpOp and dependency on Apache HttpClient will be removed.\nGSP - support for dataset operations as well as graphs (also supported by Fuseki).\nDatasetAccessors removed - previously these were deprecated. GSP and ModelStore are the replacement for remote operations. RDFConnection and RDFLink provide APIs.\nChanges at Jena 4.5.0 Separate the dataset operations from the graph operations.\nGSP - SPARQL Graph Store Protocol\nDSP - Dataset Store Protocol: HTTP GET, POST, PUT operations on the datatse, e.g. quad formats like TriG.\nSubstitution All query and update builders provide operations to use a query and substitute variables for concrete RDF terms in the execution.\nUnlike \u0026ldquo;initial bindings\u0026rdquo; substitution is provided in query and update builders for both local and remote cases.\nSubstitution is always \u0026ldquo;replace variable with RDF term\u0026rdquo; in a query or update that is correct syntax. This means it does not apply to INSERT DATA or DELETE DATA but can be used with INSERT { ?s ?p ?o } WHERE {} and DELETE { ?s ?p ?o } WHERE {}.\nFull example: ExQuerySubstitute_01.java.\nResultSet resultSet1 = QueryExecution.dataset(dataset) .query(prefixes+\u0026#34;SELECT * { ?person foaf:name ?name }\u0026#34;) .substitution(\u0026#34;name\u0026#34;, name1) .select(); ResultSetFormatter.out(resultSet1); Substitution is to be preferred over \u0026ldquo;initial bindings\u0026rdquo; because it is clearly defined and applies to both query and update in both local and remote uses.\n\u0026ldquo;Substitution\u0026rdquo; and \u0026ldquo;initial bindings\u0026rdquo; are similar but not identical.\nSee also\nParameterized Queries Jena Query Builder which provide different ways to build a query.\nRDFConnection RDFConnection\ntry ( RDFConnection conn = RDFConnectionRemote.service(dataURL).build()) { conn.update(\u0026#34;INSERT DATA{}\u0026#34;); conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } or the less flexible:\ntry ( RDFConnection conn = RDFConnection.connect(dataURL) ) { conn.update(\u0026#34;INSERT DATA{}\u0026#34;); conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } Query Execution Builder Examples Builders are reusable and modifiable after a \u0026ldquo;build\u0026rdquo; operation.\nDataset dataset = ... Query query = ... try ( QueryExecution qExec = QueryExecution.create() .dataset(dataset) .query(query) .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } and remote calls:\ntry ( QueryExecution qExec = QueryExecutionHTTP.service(\u0026#34;http://....\u0026#34;) .query(query) .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } Factory Examples\nDataset dataset = ... Query query = ... try ( QueryExecution qExec = QueryExecutionFactory.create(query, dataset) ) { ResultSet results = qExec.execSelect(); ... use results ... } More complex setup:\n// JDK HttpClient HttpClient httpClient = HttpClient.newBuilder() .connectTimeout(Duration.ofSeconds(10)) // Timeout to connect .followRedirects(Redirect.NORMAL) .build(); try ( QueryExecution qExec = QueryExecutionHTTP.create() .service(\u0026#34;http:// ....\u0026#34;) .httpClient(httpClient) .query(query) .sendMode(QuerySendMode.asPost) .timeout(30, TimeUnit.SECONDS) // Timeout of request .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } There is only one timeout setting for eacho HTTP query execution. The \u0026ldquo;time to connect\u0026rdquo; is handled by the JDK HttpClient. 
Timeouts for local execution are \u0026ldquo;time to first result\u0026rdquo; and \u0026ldquo;time to all results\u0026rdquo; as before.\nModelStore and GSP Model model = ModelStore.service(\u0026#34;http://fuseki/dataset\u0026#34;).defaultGraph().GET(); Graph graph = GSP.service(\u0026#34;http://fuseki/dataset\u0026#34;).defaultGraph().GET(); Graph graph = ... ; GSP.request(\u0026#34;http://fuseki/dataset\u0026#34;).graphName(\u0026#34;http://data/myGraph\u0026#34;).POST(graph); DatasetGraph dataset = GSP.request(\u0026#34;http://fuseki/dataset\u0026#34;).getDataset(); SERVICE Old documentation - configuration, especially for authentication, has changed.\nSERVICE configuration See below for more on HTTP authentication with SERVICE.\nThe configuration of SERVICE operations changed in Jena 4.3.0 and the parameter names have changed.\nSymbol Java Constant Usage arq:httpServiceAllowed ARQ.httpServiceAllowed False to disable arq:httpQueryClient ARQ.httpQueryClient A java.net.http.HttpClient object arq:httpServiceSendMode ARQ.httpServiceSendMode See Service documentation where arq: is the prefix for \u0026lt;http://jena.apache.org/ARQ#\u0026gt;.\nThe timeout is now only for the overall request and managed by the HTTP client code.\nCompression of responses is not currently supported.\nCustomization of HTTP requests There is a mechanism to modify HTTP requests to specific endpoints or to a collection of endpoints with the same prefix.\nFor example, to add a header X-Tracker to each request to a particular server:\nAtomicLong counter = new AtomicLong(0); HttpRequestModifier modifier = (params, headers)-\u0026gt;{ long x = counter.incrementAndGet(); headers.put(\u0026#34;X-Tracker\u0026#34;, \u0026#34;Call=\u0026#34;+x); }; // serverURL is the HTTP URL for the server or part of the server HTTP space. RegistryRequestModifier.get().addPrefix(serverURL, modifier); The RegistryRequestModifier registry is checked on each HTTP operation. It maps URLs, or prefixes of URLs, to a function of interface HttpRequestModifier which has access to the headers and the query string parameters of the request.\nAuthentication Documentation for authentication.\n","permalink":"https://jena.apache.org/documentation/sparql-apis/","tags":null,"title":"Apache Jena SPARQL APIs"},{"categories":null,"contents":"ARQ is a query engine for Jena that supports the SPARQL RDF Query language. 
SPARQL is the query language developed by the W3C RDF Data Access Working Group.\nARQ Features Standard SPARQL Free text search via Lucene SPARQL/Update Access and extension of the SPARQL algebra Support for custom filter functions, including javascript functions Property functions for custom processing of semantic relationships Aggregation, GROUP BY and assignment as SPARQL extensions Support for federated query Support for extension to other storage systems Client-support for remote access to any SPARQL endpoint Introduction A Brief Tutorial on SPARQL Application API - covers the majority of application usages Frequently Asked Questions ARQ Support Application javadoc Command line utilities Querying remote SPARQL services HTTP Authentication for ARQ Logging Explaining queries Tutorial: manipulating SPARQL using ARQ Basic federated query (SERVICE) Property paths GROUP BY and counting SELECT expressions Sub-SELECT Negation Features of ARQ that are legal SPARQL syntax\nConditions in FILTERs Free text searches Accessing lists (RDF collections) Extension mechanisms Custom Expression Functions Property Functions Library Expression function library Property function library Writing SPARQL functions Writing SPARQL functions in JavaScript Custom execution of SERVICE Constructing queries programmatically Parameterized query strings ARQ and the SPARQL algebra Extending ARQ query execution and accessing different storage implementations Custom aggregates Caching and bulk-retrieval for SERVICE Extensions Feature of ARQ that go beyond SPARQL syntax.\nLATERAL Join RDF-star Operators and functions MOD and IDIV for modulus and integer division. LET variable assignment Order results using a Collation Construct Quad Generate JSON from SPARQL Update ARQ supports the W3C standard SPARQL Update language.\nSPARQL Update The ARQ SPARQL/Update API See Also Fuseki - Server implementation of the SPARQL protocol. TDB - A SPARQL database for Jena, a pure Java persistence layer for large graphs, high performance applications and embedded use. RDFConnection, a unified API for SPARQL Query, Update and Graph Store Protocol. W3C Documents SPARQL Query Language specification SPARQL Query Results JSON Format SPARQL Protocol Articles Articles and documentation elsewhere:\nIntroducing SPARQL: Querying the Semantic Web (xml.com article by Leigh Dodds) Search RDF data with SPARQL (by Phil McCarthy) - article published on IBM developer works about SPARQL and Jena. SPARQL reference card (by Dave Beckett) Parameterised Queries with SPARQL and ARQ (by Leigh Dodds) Writing an ARQ Extension Function (by Leigh Dodds) RDF Syntax Specifications Turtle N-Triples TriG N-Quads ","permalink":"https://jena.apache.org/documentation/query/","tags":null,"title":"ARQ - A SPARQL Processor for Jena"},{"categories":null,"contents":"ARQ includes support for GROUP BY and counting. This was previously an ARQ extension but is now legal SPARQL 1.1\nGROUP BY A GROUP BY clause transforms a result set so that only one row will appear for each unique set of grouping variables. All other variables from the query pattern are projected away and are not available in the SELECT clause.\nPREFIX SELECT ?p ?q { . . . } GROUP BY ?p ?q SELECT * will include variables from the GROUP BY but no others. This ensures that results are always the same - including other variables from the pattern would involve choosing some value that was not constant across each section of the group and so lead to indeterminate results.\nThe GROUP BY clause can involve an expression. 
If the expression is named, then the value is included in the columns, before projection. An unnamed expression is used for grouping but the value is not placed in the result set formed by the GROUP BY clause.\nSELECT ?productId ?cost { . . . } GROUP BY ?productId (?num * ?price AS ?cost) HAVING A query may specify a HAVING clause to apply a filter to the result set after grouping. The filter may involve variables from the GROUP BY clause or aggregations.\nSELECT ?p ?q { . . . } GROUP BY ?p ?q HAVING (count(distinct *) \u0026gt; 1) Aggregation Currently supported aggregations:\nAggregator Description count(*) Count rows of each group element, or the whole result set if no GROUP BY. count(distinct *) Count the distinct rows of each group element, or the whole result set if no GROUP BY. count(?var) Count the number of times ?var is bound in a group. count(distinct ?var) Count the number of distinct values ?var is bound to in a group. sum(?x) Sum the variable over the group (non-numeric values and unbound values are ignored). When a variable is used, what is being counted is occurrences of RDF terms, that is names. It is not a count of individuals because two names can refer to the same individual.\nIf there was no explicit GROUP BY clause, then it is as if the whole of the result set forms a single group element. Equivalently, it is GROUP BY of no variables. Only aggregation expressions make sense in the SELECT clause as there are no variables from the query pattern to project out.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/group-by.html","tags":null,"title":"ARQ - Aggregates"},{"categories":null,"contents":"The application API is in the package org.apache.jena.query.\nOther packages contain various parts of the system (execution engine, parsers, testing etc). Most applications will only need to use the main package. Only applications wishing to programmatically build queries or modify the behaviour of the query engine need to use the other packages directly.\nKey Classes The package org.apache.jena.query is the main application package.\nQuery - a class that represents the application query. It is a container for all the details of the query. Objects of class Query are normally created by calling one of the QueryFactory methods, which provide access to the various parsers. QueryExecution - represents one execution of a query. QueryExecutionFactory - a place to get QueryExecution instances. DatasetFactory - a place to make datasets. For SELECT queries: QuerySolution - A single solution to the query. ResultSet - All the QuerySolutions. An iterator. ResultSetFormatter - turn a ResultSet into various forms: JSON, text, or plain XML. SELECT queries The basic steps in making a SELECT query are outlined in the example below. A query is created from a string using the QueryFactory. The query and model or RDF dataset to be queried are then passed to QueryExecutionFactory to produce an instance of a query execution. QueryExecution objects are java.lang.AutoCloseable and can be used in try-resource. Results are handled in a loop and finally the query execution is closed.\nimport org.apache.jena.query.* ; Model model = ... ; String queryString = \u0026quot; .... 
\u0026quot; ; Query query = QueryFactory.create(queryString) ; try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) { ResultSet results = qexec.execSelect() ; for ( ; results.hasNext() ; ) { QuerySolution soln = results.nextSolution() ; RDFNode x = soln.get(\u0026quot;varName\u0026quot;) ; // Get a result variable by name. Resource r = soln.getResource(\u0026quot;VarR\u0026quot;) ; // Get a result variable - must be a resource Literal l = soln.getLiteral(\u0026quot;VarL\u0026quot;) ; // Get a result variable - must be a literal } } It is important to cleanly close the query execution when finished. System resources connected to persistent storage may need to be released.\nA ResultSet supports the Java iterator interface so the following is also a way to process the results if preferred:\nIterator\u0026lt;QuerySolution\u0026gt; results = qexec.execSelect() ; for ( ; results.hasNext() ; ) { QuerySolution soln = results.next() ; . . . } The step of creating a query and then a query execution can be reduced to one step in some common cases:\nimport org.apache.jena.query.* ; Model model = ... ; String queryString = \u0026quot; .... \u0026quot; ; try (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) { ResultSet results = qexec.execSelect() ; . . . } Passing a result set out of the processing loop. A ResultSet is an iterator and can be traversed only once. What is more, much of query execution and result set processing is handled internally in a streaming fashion. The ResultSet returned by execSelect is not valid after the QueryExecution is closed, whether explicitly or by try-resources as the QueryExecution implements AutoCloseable.\nA result set may be materialized - it is then usable outside the query execution:\ntry (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) { ResultSet results = qexec.execSelect() ; results = ResultSetFactory.copyResults(results) ; return results ; // Passes the result set out of the try-resources } The result set from ResultSetFactory.copyResults is a ResultSetRewindable which has a reset() operation that positions the iterator at the start of the result again.\nThis can also be used when the results are going to be used in a loop that modifies the data. It is not possible to update the model or dataset while looping over the results of a SELECT query.\nThe models returned by execConstruct and execDescribe are valid after the QueryExecution is closed.\nExample: formatting a result set Instead of a loop to deal with each row in the result set, the application can call an operation of the ResultSetFormatter. This is what the command line applications do.\nExample: processing results to produce a simple text presentation:\nResultSetFormatter fmt = new ResultSetFormatter(results, query) ; fmt.printAll(System.out) ; or simply:\nResultSetFormatter.out(System.out, results, query) ; Example: Processing results The results are objects from the Jena RDF API, and API calls which do not modify the model can be mixed with query result processing:\nfor ( ; results.hasNext() ; ) { QuerySolution soln = results.next() ; // Access variables by name: soln.get(\u0026quot;x\u0026quot;) RDFNode n = soln.get(\u0026quot;x\u0026quot;) ; // \u0026quot;x\u0026quot; is a variable in the query // If you need to test the thing returned if ( n.isLiteral() ) ((Literal)n).getLexicalForm() ; if ( n.isResource() ) { Resource r = (Resource)n ; if ( ! r.isAnon() ) { ... r.getURI() ... } } } Updates to the model must be carried out after the query execution has finished. 
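For example, a minimal sketch of this pattern (the query string, the variable ?x and the statement added to the model are purely illustrative, not part of the API):\nList\u0026lt;Resource\u0026gt; collected = new ArrayList\u0026lt;\u0026gt;() ; try (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) { ResultSet results = qexec.execSelect() ; while ( results.hasNext() ) { QuerySolution soln = results.next() ; collected.add(soln.getResource(\u0026quot;x\u0026quot;)) ; // collect while iterating } } // The query execution is now closed, so the model can be modified safely. for ( Resource r : collected ) { model.add(r, RDF.type, OWL.Thing) ; // illustrative update }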
As in the sketch above, this typically involves collecting results of interest in a local data structure and looping over that structure after the query execution has finished and been closed.\nCONSTRUCT Queries CONSTRUCT queries return a single RDF graph. As usual, the query execution should be closed after use.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; Model resultModel = qexec.execConstruct() ; qexec.close() ; DESCRIBE Queries DESCRIBE queries return a single RDF graph. Different handlers for the DESCRIBE operation can be added by the application.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; Model resultModel = qexec.execDescribe() ; qexec.close() ; ASK Queries The operation QueryExecution.execAsk() returns a boolean value indicating whether the query pattern matched the graph or dataset or not.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; boolean result = qexec.execAsk() ; qexec.close() ; Formatting XML results The ResultSetFormatter class has methods to write out the SPARQL Query Results XML Format. See ResultSetFormatter.outputAsXML method.\nDatasets The examples above are all queries on a single model. A SPARQL query is made on a dataset, which is a default graph and zero or more named graphs. Datasets can be constructed using the DatasetFactory:\nString dftGraphURI = \u0026quot;file:default-graph.ttl\u0026quot; ; List namedGraphURIs = new ArrayList() ; namedGraphURIs.add(\u0026quot;file:named-1.ttl\u0026quot;) ; namedGraphURIs.add(\u0026quot;file:named-2.ttl\u0026quot;) ; Query query = QueryFactory.create(queryString) ; Dataset dataset = DatasetFactory.create(dftGraphURI, namedGraphURIs) ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { ... } Already existing models can also be used:\nDataset dataset = DatasetFactory.create() ; dataset.setDefaultModel(model) ; dataset.addNamedModel(\u0026quot;http://example/named-1\u0026quot;, modelX) ; dataset.addNamedModel(\u0026quot;http://example/named-2\u0026quot;, modelY) ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { ... } ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/app_api.html","tags":null,"title":"ARQ - Application API"},{"categories":null,"contents":"ARQ includes support for a logical assignment of variables. If the variable is already bound, it acts like a filter, otherwise the value is assigned. This makes it position independent.\nThis involves a syntactic extension and is available if the query is parsed with language Syntax.syntaxARQ (which is the default).\nSee also SELECT expressions, which are also a form of assignment.\nAssignment The general form is:\nLET ( variable := expression ) For example:\nLET ( ?x := 2 ) { ?x :name ?name . LET ( ?age2 := ?age - 21 ) Note: Assignment is \u0026ldquo;:=\u0026rdquo;\nAssignment Rules ARQ assignment is single assignment, that is, once a variable is assigned a binding, then it cannot be changed in the same query solution.\nOnly one LET expression per variable is allowed in a single scope.\nThe execution rules are:\nIf the expression does not evaluate (e.g. unbound variable in the expression), no assignment occurs and the query continues. If the variable is unbound, and the expression evaluates, the variable is bound to the value. 
If the variable is bound to the same value as the expression evaluates to, nothing happens and the query continues. If the variable is bound to a different value than the expression evaluates to, an error occurs and the current solution will be excluded from the results. Note that \u0026ldquo;same value\u0026rdquo; means the same as applies to graph pattern matching, not to FILTER expressions. Some graph implementations only provide same-term graph pattern matching. FILTERs always do value-based comparisons for \u0026ldquo;=\u0026rdquo; for all graphs.\nUse with CONSTRUCT One use is to perform some calculation prior to forming the result graph in a CONSTRUCT query.\nCONSTRUCT { ?x :lengthInCM ?cm } WHERE { ?x :lengthInInches ?inch . LET ( ?cm := ?inch * 2.54 ) } Use with !BOUND The OPTIONAL/!BOUND/FILTER idiom for performing limited negation of a pattern in SPARQL can be inconvenient because it requires a variable in the OPTIONAL to be assigned by pattern matching. Using a LET can make that easier; here, we assign to ?z (any value will do) to mark when the matching pattern included the OPTIONAL pattern.\nExample: ?x with no \u0026ldquo;:p 1\u0026rdquo; triple:\n{ ?x a :FOO . OPTIONAL { ?x :p 1 . LET (?z := true) } FILTER ( !BOUND(?z) ) } Note that negation is supported properly through the NOT EXISTS form.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/assignment.html","tags":null,"title":"ARQ - Assignment"},{"categories":null,"contents":"There are already ways to access remote RDF data. The simplest is to read a document which is an RDF graph and query it. Another way is with the SPARQL protocol which allows a query to be sent to a remote service endpoint and the results sent back (in RDF, or an XML-based results format or even a JSON one).\nSERVICE is a feature of SPARQL 1.1 that allows an executing query to make a SPARQL protocol request to another SPARQL endpoint.\nSyntax PREFIX : \u0026lt;http://example/\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; SELECT ?a FROM \u0026lt;mybooks.rdf\u0026gt; { ?b dc:title ?title . SERVICE \u0026lt;http://sparql.org/books\u0026gt; { ?s dc:title ?title . ?s dc:creator ?a } } Algebra There is an operator in the algebra.\n(prefix ((dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt;)) (project (?a) (join (BGP [triple ?b dc:title ?title]) (service \u0026lt;http://sparql.org/books\u0026gt; (BGP [triple ?s dc:title ?title] [triple ?s dc:creator ?a] )) ))) Performance Considerations This feature is a basic building block to allow remote access in the middle of a query, not a general solution to the issues in distributed query evaluation. The algebra operation is executed without regard to how selective the pattern is. So the order of the query will affect the speed of execution. Because it involves HTTP operations, asking the query in the right order matters a lot. Don\u0026rsquo;t ask for the whole of a bookstore just to find a book whose title comes from a local RDF file - ask the bookshop a query with the title already bound from earlier in the query.\nControlling SERVICE requests. The SERVICE operation in a SPARQL query may be configured via the Context. The values for configuration can be set in the global context (accessed via ARQ.getContext()) or in the per-query execution context.\nThe prefix arq: is \u0026lt;http://jena.apache.org/ARQ#\u0026gt;.\nSymbol Java Constant Default arq:httpServiceAllowed ARQ.httpServiceAllowed true arq:httpQueryClient ARQ.httpQueryClient System default. 
arq:httpServiceSendMode ARQ.httpServiceSendMode unset arq:httpServiceAllowed This setting can be used to disable execution of any SERVICE request in a query. Set to \u0026ldquo;false\u0026rdquo; to prohibit SERVICE requests.\narq:httpQueryClient The java.net.http HttpClient object to use for SERVICE execution.\narq:httpServiceSendMode The HTTP operation to use. The value is a string or a QuerySendMode object.\nString settings are:\nSetting Effect \u0026ldquo;POST\u0026rdquo; Use HTTP POST. Same as \u0026ldquo;asPost\u0026rdquo;. \u0026ldquo;GET\u0026rdquo; Use HTTP GET unconditionally. Same as \u0026ldquo;asGetAlways\u0026rdquo;. \u0026ldquo;asGetAlways\u0026rdquo; Use HTTP GET. \u0026ldquo;asGetWithLimitBody\u0026rdquo; Use HTTP GET up to a size limit (usually 2kbytes). \u0026ldquo;asGetWithLimitForm\u0026rdquo; Use HTTP GET up to a size limit (usually 2kbytes), and use an HTML form for the query. \u0026ldquo;asPostForm\u0026rdquo; Use HTTP POST and use an HTML form for the query. \u0026ldquo;asPost\u0026rdquo; Use HTTP POST. Old Context setting Old settings are honored where possible but should not be used:\nThe prefix srv: is the IRI \u0026lt;http://jena.hpl.hp.com/Service#\u0026gt;.\nSymbol Usage Default srv:queryTimeout Set timeouts none srv:queryCompression Enable use of deflation and GZip true srv:queryClient Enable use of a specific client none srv:serviceContext Per-endpoint configuration none srv:queryTimeout As documented above.\nsrv:queryCompression Sets the flag for use of deflation and GZip.\nBoolean: True indicates that gzip compressed data is acceptable.\nsrv:queryClient Enable use of a specific client.\nProvides a slot for a specific HttpClient for use with a specific SERVICE.\nsrv:serviceContext Provides a mechanism to override system context settings on a per-URI basis.\nThe value is a Map\u0026lt;String,Context\u0026gt; where the map key is the URI of the service endpoint, and the Context is a set of values to override the default values.\nIf a context is provided for the URI, the system context is copied and the context for the URI is used to set specific values. This ensures that any URI specific settings will be used.\n","permalink":"https://jena.apache.org/documentation/query/service.html","tags":null,"title":"ARQ - Basic Federated SPARQL Query"},{"categories":null,"contents":"It is possible to build queries by building an abstract syntax tree (as the parser does) or by building the algebra expression for the query. It is usually better to work with the algebra form as it is more regular.\nSee the examples such as arq.examples.algrebra.AlgebraExec at jena-examples:arq/examples\nSee also ARQ - SPARQL Algebra\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/programmatic.html","tags":null,"title":"ARQ - Building Queries Programmatically"},{"categories":null,"contents":"ARQ supports sorting results in a query. Users are able to specify an expression that can be a function (built-in function, custom function, or a variable).\nBy default, results are sorted using the default behavior provided by the JVM. 
If you have the following query.\nSELECT ?label WHERE { VALUES ?label { \u0026quot;tsahurin kieli\u0026quot;@fi \u0026quot;tšekin kieli\u0026quot;@fi \u0026quot;tulun kieli\u0026quot;@fi \u0026quot;töyhtöhyyppä\u0026quot;@fi } } ORDER BY ?label The results will be returned exactly in the following order.\n\u0026ldquo;töyhtöhyyppä\u0026rdquo;@fi \u0026ldquo;tsahurin kieli\u0026rdquo;@fi \u0026ldquo;tšekin kieli\u0026rdquo;@fi \u0026ldquo;tulun kieli\u0026rdquo;@fi However, in Finnish the expected order is as follows.\n\u0026ldquo;tsahurin kieli\u0026rdquo;@fi \u0026ldquo;tšekin kieli\u0026rdquo;@fi \u0026ldquo;tulun kieli\u0026rdquo;@fi \u0026ldquo;töyhtöhyyppä\u0026rdquo;@fi To specify the collation used for sorting, we can use the ARQ collation function.\nPREFIX arq: \u0026lt;http://jena.apache.org/ARQ/function#\u0026gt; SELECT ?label WHERE { VALUES ?label { \u0026quot;tsahurin kieli\u0026quot;@fi \u0026quot;tšekin kieli\u0026quot;@fi \u0026quot;tulun kieli\u0026quot;@fi \u0026quot;töyhtöhyyppä\u0026quot;@fi } } ORDER BY arq:collation(\u0026quot;fi\u0026quot;, ?label) The function collation receives two parameters. The first is the desired collation, and the second is the function (which can be a variable, or another function).\nThe collation used, will be the Finnish collation algorithm provided with the JVM. This is done through calls to methods in the java.util.Locale class and in the java.text.Collator, to retrieve a collator.\nIf the desired collation is not available, or invalid, the JVM behavior is also adopted. It may return the default collator, but it may vary depending on the JVM vendor.\nNote that this function was released with Jena 3.4.0. Mixing locales may lead to undesired results. See JENA-1313 for more information about the implementation details.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/collation.html","tags":null,"title":"ARQ - Collation"},{"categories":null,"contents":"The arq package contains some command line applications to run queries, parse queries, process result sets and run test sets.\nYou will need to set the classpath, or use the helper scripts, to run these applications from the command line. The helper scripts are in bin/ (Linux, Unix, Cygwin, OS/X) and bat/ (Windows) directories. There are ancillary scripts in the directories that the main commands need - see the tools page for setup details.\nThe commands look for file log4j2.properties in the current directory, as well as the usual log4j2 initialization with property log4j.configurationFile and looking for classpath resource log4j2.properties; there is a default setup of log4j2 built-in.\narq.query is the main query driver.\narq.qparse : parse and print a SPARQL query.\narq.uparse : parse and print a SPARQL update.\narq.update : execute SPARQL/Update requests.\narq.remote : execute a query by HTTP on a remote SPARQL endpoint.\narq.rset : transform result sets.\narq.qexpr : evaluate and print an expression.\nAll commands have a --help command for a summary of the arguments.\nWhen using a query in a file, if the query file ends .rq, it is assumed to be a SPARQL query. If it ends .arq, it is assumed to be an ARQ query (extensions to SPARQL). You can specify the syntax explicitly.\narq.query This is the main command for executing queries on data. The wrappers just set the query language.\narq.sparql : wrapper for SPARQL queries arq.arq : wrapper for ARQ queries Running arq.query --help prints the usage message. 
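For example, a typical invocation (the file names here are just placeholders) is:\njava -cp ... arq.query --data data.ttl --query query.rq which runs the SPARQL query in query.rq over the data in data.ttl and prints the results as a text table.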
The main arguments are:\n--query FILE : The file with the query to execute --data FILE : The data to query. It will be included in the default graph. --namedgraph FILE : The data to query. It will be included as a named graph. --desc/--dataset: Jena Assembler description of the dataset to be queried, augmented with vocabulary for datasets, not just graphs. See etc/ for examples. The file extension is used to guess the file serialization format. If a data file ends .n3, it is assumed to be N3; if it ends .ttl, it is Turtle; if it ends .nt, it is N-Triples; otherwise it is assumed to be RDF/XML. The data serialization can be explicitly specified on the command line.\narq.qparse Parse a query and print it out.\narq.qparse will parse the query, print it out again (with line numbers by default) and then parse the serialized query again. If your query has a syntax error, a message is printed but no query is printed. If a query is printed and then you get a syntax error message, your query was syntactically correct but the ARQ serialization is broken. Please report this.\nThe command arq.qparse --print=op --file queryFile will print the SPARQL algebra for the query in SSE format.\narq.uparse Parse a SPARQL update and print it out.\narq.uparse will parse the update, print it out again (with line numbers by default) and then parse the serialized update again. If your update has a syntax error, a message is printed but no update is printed. If an update is printed and then you get a syntax error message, your update was syntactically correct but the ARQ serialization is broken. Please report this.\narq.update Execute SPARQL Update requests.\n--desc: Jena Assembler description of the dataset or graph store to be updated. See etc/ for examples. arq.rset Read and write result sets.\nIn particular,\njava -cp ... arq.rset --in xml --out text will translate a SPARQL XML Result Set into a tabular text form.\narq.qexpr Read and print an expression (something that can go in a FILTER clause). Indicates whether an evaluation exception occurred.\nThe -v argument prints the parsed expression.\narq.remote Execute a request on a remote SPARQL endpoint using HTTP.\n--service URL : The endpoint. --data FILE : Dataset description (default graph) added to the request. --namedgraph FILE : Dataset description (named graph) added to the request. --results FORMAT : Write results in specified format. Does not change the request to the server which is always for an XML form. ","permalink":"https://jena.apache.org/documentation/query/cmds.html","tags":null,"title":"ARQ - Command Line Applications"},{"categories":null,"contents":"The current W3C recommendation of SPARQL 1.1 supports the CONSTRUCT query form, which returns a single RDF graph specified by a graph template. The result is an RDF graph formed by taking each query solution in the solution sequence, substituting for the variables in the graph template, and combining the triples into a single RDF graph by set union. However, it does not directly generate quads or RDF datasets.\nIn order to eliminate this limitation, Jena ARQ extends the grammar of the CONSTRUCT query form and provides the corresponding components, which makes it more convenient for users to manipulate RDF datasets with SPARQL.\nThis feature was added in Jena 3.0.1.\nQuery Syntax A CONSTRUCT template in a SPARQL 1.1 query string is in Turtle format, with possible variables. The syntax for this extension follows that style in ARQ, using TriG plus variables. 
Just like SPARQL 1.1, there are 2 forms for ARQ Construct Quad query:\nComplete Form CONSTRUCT { # Named graph GRAPH :g { ?s :p ?o } # Default graph { ?s :p ?o } # Default graph :s ?p :o } WHERE { # SPARQL 1.1 WHERE Clause ... } The default graphs and the named graphs can be constructed within the CONSTRUCT clause in the above way. Note that, when constructing a named graph, the GRAPH keyword is optional. The braces around the triples to be constructed in the default graph are also optional.\nShort Form CONSTRUCT WHERE { # Basic dataset pattern (only the default graph and the named graphs) ... } A short form is provided for the case where the template and the pattern are the same and the pattern is just a basic dataset pattern (no FILTERs and no complex graph patterns are allowed in the short form). The keyword WHERE is required in the short form.\nGrammar The normative definition of the syntax grammar of the query string is defined in this table:\nRule Expression ConstructQuery ::= \u0026lsquo;CONSTRUCT\u0026rsquo; ( ConstructTemplate DatasetClause* WhereClause SolutionModifier | DatasetClause* \u0026lsquo;WHERE\u0026rsquo; \u0026lsquo;{\u0026rsquo; ConstructQuads \u0026lsquo;}\u0026rsquo; SolutionModifier ) ConstructTemplate ::= \u0026lsquo;{\u0026rsquo; ConstructQuads \u0026lsquo;}\u0026rsquo; ConstructQuads ::= TriplesTemplate? ( ConstructQuadsNotTriples \u0026lsquo;.\u0026rsquo;? TriplesTemplate? )* ConstructQuadsNotTriples ::= ( \u0026lsquo;GRAPH\u0026rsquo; VarOrBlankNodeIri )? \u0026lsquo;{\u0026rsquo; TriplesTemplate? \u0026lsquo;}\u0026rsquo; TriplesTemplate ::= TriplesSameSubject ( \u0026lsquo;.\u0026rsquo; TriplesTemplate? )? DatasetClause, WhereClause, SolutionModifier, TriplesTemplate, VarOrIri, TriplesSameSubject are as for the SPARQL 1.1 Grammar\nProgramming API ARQ provides 2 additional methods in QueryExecution for Construct Quad.\nIterator\u0026lt;Quad\u0026gt; QueryExecution.execConstructQuads() // allow duplication Dataset QueryExecution.execConstructDataset() // no duplication The difference between the 2 methods is that execConstructQuads() returns an Iterator of Quad, allowing duplicates, while execConstructDataset() constructs the desired Dataset object with only unique Quads.\nIn order to use these methods, it\u0026rsquo;s required to switch on the query syntax of ARQ beforehand, when creating the Query object:\nQuery query = QueryFactory.create(queryString, Syntax.syntaxARQ); If the query is supposed to construct only triples, not quads, the triples will be constructed in the default graph. For example:\nString queryString = \u0026quot;CONSTRUCT { ?s ?p ?o } WHERE ... \u0026quot; ... // The graph node of the quads is the default graph (ARQ uses \u0026lt;urn:x-arq:DefaultGraphNode\u0026gt;). Iterator\u0026lt;Quad\u0026gt; quads = qexec.execConstructQuads(); If the query string constructs quads but the method execConstructTriples() is called, it returns only the triples in the default graph of the CONSTRUCT query template. It\u0026rsquo;s called a \u0026ldquo;projection\u0026rdquo; on the default graph. For instance:\nString queryString = \u0026quot;CONSTRUCT { ?s ?p ?o . GRAPH ?g1 { ?s1 ?p1 ?o1 } } WHERE ...\u0026quot; ... // The part of \u0026quot;GRAPH ?g1 { ?s1 ?p1 ?o1 }\u0026quot; will be ignored. Only \u0026quot;?s ?p ?o\u0026quot; in the default graph will be returned. 
Iterator\u0026lt;Triple\u0026gt; triples = qexec.execConstructTriples(); More examples can be found at ExampleConstructQuads.java at jena-examples:arq/examples/constructquads/.\nFuseki Support Jena Fuseki also supports Construct Quad queries as a built-in feature. No additional configuration is required to switch it on. Because QueryEngineHTTP is just an implementation of QueryExecution, client code uses the programming API described in the previous sections in much the same way, e.g.\nString queryString = \u0026quot; CONSTRUCT { GRAPH \u0026lt;http://example/ns#g1\u0026gt; {?s ?p ?o} } WHERE {?s ?p ?o}\u0026quot; ; Query query = QueryFactory.create(queryString, Syntax.syntaxARQ); try ( QueryExecution qExec = QueryExecution.service(serviceQuery).query(query).build() ) { // serviceQuery is the URL of the remote service Iterator\u0026lt;Quad\u0026gt; result = qExec.execConstructQuads(); ... } ... ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/construct-quad.html","tags":null,"title":"ARQ - Construct Quad"},{"categories":null,"contents":"ARQ supports custom aggregate functions as allowed by the SPARQL 1.1 specification.\nSee jena-examples:arq/examples/aggregates.\n","permalink":"https://jena.apache.org/documentation/query/custom_aggregates.html","tags":null,"title":"ARQ - Custom aggregates"},{"categories":null,"contents":"Since Jena 4.2.0, ARQ features a plugin system for custom service executors. The relevant classes are located in the package org.apache.jena.sparql.service and are summarized as follows:\nServiceExecutorRegistry: A registry that holds a list of service executors. When Jena starts up, it configures a default registry to handle SERVICE requests against HTTP SPARQL endpoints and registers it with the global ARQ context accessible under ARQ.getContext().\nServiceExecutorFactory: This is the main interface for custom SERVICE handler implementations:\npublic interface ServiceExecutorFactory { public ServiceExecution createExecutor(OpService substituted, OpService original, Binding binding, ExecutionContext execCxt); } The second OpService parameter represents the original SERVICE clause as it occurs in the query, whereas the first parameter is the OpService obtained after substitution of all mentioned variables w.r.t. the current binding. A ServiceExecutorFactory can indicate its non-applicability for handling a request simply by returning null. In that case, Jena will ask the next service executor factory in the registry. If a request remains unhandled then a QueryExecException (\u0026ldquo;No SERVICE handler\u0026rdquo;) is raised.\nServiceExecution: If a ServiceExecutorFactory can handle a request then it needs to return a ServiceExecution instance: public interface ServiceExecution { public QueryIterator exec(); } The actual execution is started by calling the exec() method which returns a QueryIterator. Note that there are use cases where ServiceExecution instances may not have to be executed. For example, one may analyze which service executor factories among a set of them claim to be capable of handling a request. This can be useful for debugging or display in a dashboard of applicable service executors.\nExamples A runnable example suite is located in the jena-examples module at CustomServiceExecutor.java.\nIn the remainder we summarize the essentials of setting up a custom service executor. 
The following snippet sets up a simple service executor factory that relays queries targeted at Wikidata to DBpedia:\nNode WIKIDATA = NodeFactory.createURI(\u0026#34;http://query.wikidata.org/sparql\u0026#34;); Node DBPEDIA = NodeFactory.createURI(\u0026#34;http://dbpedia.org/sparql\u0026#34;); ServiceExecutorFactory myExecutorFactory = (opExecute, original, binding, execCxt) -\u0026gt; { if (opExecute.getService().equals(WIKIDATA)) { opExecute = new OpService(DBPEDIA, opExecute.getSubOp(), opExecute.getSilent()); return ServiceExecutorRegistry.httpService.createExecutor(opExecute, original, binding, execCxt); } return null; }; Global vs Local Service Executor Registration The global registry can be accessed and modified as shown below:\nServiceExecutorRegistry globalRegistry = ServiceExecutorRegistry.get(); // Note: registry.add() prepends executor factories to the internal list such // that they are consulted first! globalRegistry.add(myExecutorFactory); The following snippet shows how a custom service executor can be configured locally for an individual query execution:\nContext cxt = ARQ.getContext().copy(); ServiceExecutorRegistry localRegistry = ServiceExecutorRegistry.get().copy(); localRegistry.add(myExecutorFactory); String queryStr = \u0026#34;SELECT * { SERVICE \u0026lt;http://query.wikidata.org/sparql\u0026gt; { ?s ?p \u0026#34;Apache Jena\u0026#34;@en } }\u0026#34;; try (QueryExecution qe = QueryExecutionFactory.create(queryStr)) { ServiceExecutorRegistry.set(qe.getContext(), localRegistry); // ... } ","permalink":"https://jena.apache.org/documentation/query/custom_service_executors.html","tags":null,"title":"ARQ - Custom Service Executors"},{"categories":null,"contents":"This page describes the mechanisms that can be used to extend and modify query execution within ARQ. Through these mechanisms, ARQ can be used to query different graph implementations and to provide different query evaluation and optimization strategies for particular circumstances. These mechanisms are used by TDB.\nARQ can be extended in various ways to incorporate custom code into a query. Custom filter functions and property functions provide ways to add application specific code. The free text search capabilities, using Apache Lucene, are provided via a property function. Custom filter functions and property functions should be used where possible.\nJena itself can be extended by providing a new implementation of the Graph interface. This can be used to encapsulate specialised storage and also for wrapping non-RDF sources to look like RDF. There is a common implementation framework provided by GraphBase so only one operation, the find method, needs to be written for a read-only data source. Basic find works well in many cases, and the whole Jena API will be able to use the extension. For higher SPARQL performance, ARQ can be extended at the basic graph matching or algebra level.\nApplication writers who extend ARQ at the query execution level should be prepared to work with the source code for ARQ for specific details and for finding code to reuse. 
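As an illustration of the Graph extension point described above, a minimal read-only graph built on GraphBase might look like the following sketch (the class name and the backing collection are invented for this example):\nimport java.util.* ; import org.apache.jena.graph.Triple ; import org.apache.jena.graph.impl.GraphBase ; import org.apache.jena.util.iterator.ExtendedIterator ; import org.apache.jena.util.iterator.WrappedIterator ; public class MyBackedGraph extends GraphBase { private final Collection\u0026lt;Triple\u0026gt; triples ; public MyBackedGraph(Collection\u0026lt;Triple\u0026gt; triples) { this.triples = triples ; } @Override protected ExtendedIterator\u0026lt;Triple\u0026gt; graphBaseFind(Triple pattern) { // Return every stored triple that matches the (possibly wildcard) pattern. List\u0026lt;Triple\u0026gt; matches = new ArrayList\u0026lt;\u0026gt;() ; for ( Triple t : triples ) { if ( pattern.matches(t) ) matches.add(t) ; } return WrappedIterator.create(matches.iterator()) ; } } Wrapping such a graph in a Model (for example with ModelFactory.createModelForGraph) makes it usable with the rest of the Jena API and queryable with ARQ.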
Some examples can be found arq/examples directory\nOverview of ARQ Query processing The Main Query Engine Graph matching and a custom StageGenerator OpExecutor Quads Mixed Graph Implementation Datasets Custom Query Engines Extend the algebra Overview of ARQ Query Processing The sequence of actions performed by ARQ to perform a query are parsing, algebra generation, execution building, high-level optimization, low-level optimization and finally evaluation. It is not usual to modify the parsing step nor the conversion from the parse tree to the algebra form, which is a fixed algorithm defined by the SPARQL standard. Extensions can modify the algebra form by transforming it from one algebra expression to another, including introducing new operators. See also the documentation on working with the SPARQL algebra in ARQ including building algebra expressions programmatically, rather than obtaining them from a query string.\nParsing The parsing step turns a query string into a Query object. The class Query represents the abstract syntax tree (AST) for the query and provides methods to create the AST, primarily for use by the parser. The query object also provides methods to serialize the query to a string. Because this is from an AST, the string produced will be very close to the original query with the same syntactic elements, but without comments, and formatted with whitespace for readability. It is not usually the best way to build a query programmatically and the AST is not normally an extension point.\nThe query object can be used many times. It is not modified once created, and in particular it is not modified by query execution.\nAlgebra generation ARQ generates the SPARQL algebra expression for the query. After this a number of transformations can be applied (for example, identification of property functions) but the first step is the application of the algorithm in the SPARQL specification for translating a SPARQL query string, as held in a Query object into a SPARQL algebra expression. This includes the process of removing joins involving the identity pattern (the empty graph pattern).\nFor example, the query:\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name ?mbox ?nick WHERE { ?x foaf:name ?name ; foaf:mbox ?mbox . OPTIONAL { ?x foaf:nick ?nick } } becomes\n(prefix ((foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt;)) (project (?name ?mbox ?nick) (leftjoin (bgp (triple ?x foaf:name ?name) (triple ?x foaf:mbox ?mbox) ) (bgp (triple ?x foaf:nick ?nick) ) ))) using the SSE syntax to write out the internal data-structure for the algebra.\nThe online SPARQL validator at sparql.org can be used to see the algebra expression for a SPARQL query. This validator is also included in Fuseki.\nHigh-Level Optimization and Transformations There is a collection of transformations that can be applied to the algebra, such as replacing equality filters with a more efficient graph pattern and an assignment. When extending ARQ, a query processor for a custom storage layout can choose which optimizations are appropriate and can also provide its own algebra transformations.\nA transform is code that converts an algebra operation into other algebra operations. It is applied using the Transformer class:\nOp op = ... ; Transform someTransform = ... ; op = Transformer.transform(someTransform, op) ; The Transformer class applies the transform to each operation in the algebra expression tree. 
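As a minimal sketch of this step (queryString stands for any SPARQL query string), compiling a query to its algebra, applying a transform - here the identity TransformCopy, which simply rebuilds the tree unchanged - and printing the result looks like:\nQuery query = QueryFactory.create(queryString) ; Op op = Algebra.compile(query) ; // query to algebra op = Transformer.transform(new TransformCopy(), op) ; // apply a transform System.out.println(op) ; // prints the algebra in SSE syntax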
Transform itself is an interface, with one method signature for each operation type, returning a replacement for the operator instance it is called on.\nOne such transformation is to turn a SPARQL algebra expression involving named graphs and triples into one using quads. This transformation is performed by a call to Algebra.toQuadForm.\nTransformations proceed from the bottom of the expression tree to the top. Algebra expressions are best treated as immutable so a change made in one part of the tree should result in a copy of the tree above it. This is automated by the TransformCopy class which is the commonly used base class for writing transforms. The other helper base class is TransformBase, which provides the identify operation (returns the node supplied) for each transform operation.\nOperations can be printed out in SSE syntax. The Java toString method is overridden to provide pretty printing and the static methods in WriterOp provide output to various output objects like java.io.OutputStream.\nLow-Level Optimization and Evaluation The step of evaluating a query is the process of executing the algebra expression, as modified by any transformations applied, to yield a stream of pattern solutions. Low-level optimizations include choosing the order in which to evaluate basic graph patterns. These are the responsibility of the custom storage layer. Low-level optimization can be carried out dynamically as part of evaluation.\nInternally, ARQ uses iterators extensively. Where possible, evaluation of an operation is achieved by feeding the stream of results from the previous stage into the evaluation. A common pattern is to take each intermediate result one at a time (use QueryIterRepeatApply to be called for each binding) , substituting the variables of pattern with those in the incoming binding, and evaluating to a query iterator of all results for this incoming row. The result can be the empty iterator (one that always returns false for hasNext). It is also common to not have to touch the incoming stream at all but merely to pass it to sub-operations.\nQuery Engines and Query Engine Factories The steps from algebra generation to query evaluation are carried out when a query is executed via the QueryExecution.execSelect or other QueryExecution exec operation. It is possible to carry out storage-specific operations when the query execution is created. A query engine works in conjunction with a QueryExecution to provide the evaluation of a query pattern. QueryExecutionBase provides all the machinery for the different result types and does not need to be modified by extensions to query execution.\nARQ provides three query engine factories; the main query engine factory, one for a reference query engine and one to remotely execute a query. TDB provides its own query engine factories which they register during sub-system initialization. Both extend the main query engine described below.\nThe reference query engine is a direct top-down evaluation of the expression. Its purpose is to be simple so it can be easily verified and checked then its results used to check more complicated processing in the main engine and other implementations. All arguments to each operator are fully evaluated to produce intermediate in-memory tables then a simple implementation of the operator is called to calculate the results. It does not scale and does not perform any optimizations. 
It is intended to be clear and simple; it is not designed to be efficient.\nQuery engines are chosen by referring to the registry of query engine factories.\npublic interface QueryEngineFactory { public boolean accept(Query query, DatasetGraph dataset, Context context) ; public Plan create(Query query, DatasetGraph dataset, Binding inputBinding, Context context) ; public boolean accept(Op op, DatasetGraph dataset, Context context) ; public Plan create(Op op, DatasetGraph dataset, Binding inputBinding, Context context) ; } When the query execution factory is given a dataset and query, the query execution factory tries each registered engine factory in turn calling the accept method (for query or algebra depending on how it was presented). The registry is kept in reverse registration order - the most recently registered query engine factory is tried first. The first query engine factory to return true is chosen and no further engine factories are checked.\nWhen a query engine factory is chosen, the create method is called to return a Plan object for the execution. The main operation of the Plan interface is to get the QueryIterator for the query.\nSee the example arq.examples.engine.MyQueryEngine at jena-examples:arq/examples.\nThe Main Query Engine The main query engine can execute any query. It contains a number of basic graph pattern matching implementations including one that uses the Graph.find operation so it can work with any implementation of the Jena Graph SPI. The main query engine works with general purpose datasets but not directly with quad stores; it evaluates patterns on each graph in turn. The main query engine includes optimizations for the standard Jena implementation of in-memory graphs.\nHigh-level optimization is performed by a sequence of transformations. This set of optimizations is evolving. A custom implementation of a query engine can reuse some or all of these transformations (see Algebra.optimize which is the set of transforms used by the main query engine).\nThe main query engine is a streaming engine. It evaluates expressions as the client consumes each query solution. After preparing the execution by creating the initial conditions (a partial solution of one row and no bound variables or any initial bindings of variables), the main query engine calls QC.execute which is the algorithm to execute a query. Any extension that wished to reuse some of the main query engine by providing its own OpExecutor must call this method to evaluate a sub-operation.\nQC.execute finds the currently active OpExecutor factory, creates an OpExecutor object and invokes it to evaluate one algebra operation.\nThere are two points of extension for the main query engine:\nStage generators, for evaluating basic graph patterns and reusing the rest of the engine. OpExecutor to execute any algebra operator specially. The standard OpExecutor invokes the stage generator mechanism to match a basic graph pattern.\nGraph matching and a custom StageGenerator The correct point to hook into ARQ for just extending basic graph pattern matching (BGPs) is to provide a custom StageGenerator. (To hook into filtered basic graph patterns, the extension will need to provide its own OpExecutor factory). The advantage of the StageGenerator mechanism, as compared to the more general OpExecutor described below, is that it more self-contained and requires less detail about the internal evaluation of the other SPARQL algebra operators. 
This extension point corresponds to section 12.6 \u0026ldquo;Extending SPARQL Basic Graph Matching\u0026rdquo;.\nBelow is the default code to match a BGP from OpExecutor.execute(OpBGP, QueryIterator). It merely calls fixed code in the StageBuilder class.The input is a stream of results from earlier stages. The execution must return a query iterator that is all the possible ways to match the basic graph pattern for each of the inputs in turn. Order of results does not matter.\nprotected QueryIterator execute(OpBGP opBGP, QueryIterator input) { BasicPattern pattern = opBGP.getPattern() ; return StageBuilder.execute(pattern, input, execCxt) ; } The StageBuilder looks for the stage generator by accessing the context for the execution:\nStageGenerator stageGenerator = (StageGenerator)context.get(ARQ.stageGenerator) ; where the context is the global context and any query execution specific additions together with various execution control elements.\nA StageGenerator is an implementation of:\npublic interface StageGenerator { public QueryIterator execute(BasicPattern pattern, QueryIterator input, ExecutionContext execCxt) ; } Setting the Stage Generator An extension stage generator can be registered on a per-query execution basis or (more usually) in the global context.\nStageBuilder.setGenerator(Context, StageGenerator) The global context can be obtained by a call to ARQ.getContext()\nStageBuilder.setGenerator(ARQ.getContext(), myStageGenerator) ; In order to allow an extensions to still permit other graphs to be used, stage generators are usually chained, with each new custom one passing the execution request up the chain if the request is not supported by this custom stage generator.\npublic class MyStageGenerator implements StageGenerator { StageGenerator above = null ; public MyStageGenerator (StageGenerator original) { above = original ; } @Override public QueryIterator execute(BasicPattern pattern, QueryIterator input, ExecutionContext execCxt) { Graph g = execCxt.getActiveGraph() ; // Test to see if this is a graph we support. if ( ! ( g instanceof MySpecialGraphClass ) ) // Not us - bounce up the StageGenerator chain return above.execute(pattern, input, execCxt) ; MySpecialGraphClass graph = (MySpecialGraphClass )g ; // Create a QueryIterator for this request ... This is registered by setting the global context (StageBuilder has a convenience operation to do this):\n// Get the standard one. StageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ; // Create a new one StageGenerator myStageGenerator= new MyStageGenerator(orig) ; // Register it StageBuilder.setGenerator(ARQ.getContext(), myStageGenerator) ; Example: jena-examples:arq/examples/bgpmatching\nOpExecutor A StageGenerator provides matching for a basic graph pattern. If an extension wishes to take responsibility for more of the evaluation then it needs to work with OpExecutor. This includes evaluation of filtered basic graph patterns.\nAn example query using a filter:\nPREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX books: \u0026lt;http://example.org/book/\u0026gt; SELECT * WHERE { ?book dc:title ?title . FILTER regex(?title, \u0026quot;Paddington\u0026quot;) } results in the algebra expression for the pattern:\n(filter (regex ?title \u0026quot;Paddington\u0026quot;) (bgp (triple ?book dc:title ?title) )) showing that the filter is being applied to the results of a basic graph pattern matching.\nNote: this is not the way to provide custom filter operations. 
See the documentation for application-provided filter functions.\nEach step of evaluation in the main query engine is performed by an OpExecutor and a new one is created from a factory at each step. The factory is registered in the execution context. The implementation of a specialized OpExecutor can inherit from the standard one and override only those algebra operators it wishes to deal with, including inspecting the execution and choosing to pass up to the super-class based on the details of the operation. From the query above, only regex filters might be specially handled.\nRegistering an OpExecutorFactory:\nOpExecutorFactory customExecutorFactory = new MyOpExecutorFactory(...) ; QC.setFactory(ARQ.getContext(), customExecutorFactory) ; QC is a point of indirection that chooses the execution process at each stage in a query, so if the custom execution wishes to evaluate an algebra operation within another operation, it should call QC.execute. Be careful not to loop endlessly if the operation is itself handled by the custom evaluator. This can be done by swapping in a different OpExecutorFactory.\n// Execute an operation with a different OpExecutorFactory // New context. ExecutionContext ec2 = new ExecutionContext(execCxt) ; ec2.setExecutor(plainFactory) ; QueryIterator qIter = QC.execute(op, input, ec2) ; private static OpExecutorFactory plainFactory = new OpExecutorFactory() { @Override public OpExecutor create(ExecutionContext execCxt) { // The default OpExecutor of ARQ. return new OpExecutor(execCxt) ; } } ; Quads If a custom extension provides named graphs, then it may be useful to execute the quad form of the query. This is done by writing a custom query engine and overriding QueryEngineMain.modifyOp:\n@Override protected Op modifyOp(Op op) { op = Substitute.substitute(op, initialInput) ; // Use standard optimizations. op = super.modifyOp(op) ; // Turn into quad form. op = Algebra.toQuadForm(op) ; return op ; } The extension may need to provide its own dataset implementation so that it can detect when queries are directed to its named graph storage. TDB is an example of this.\nMixed Graph Implementation Datasets The dataset implementation used in normal operation does not work on quads but instead can provide a dataset with a collection of graphs each from different implementation sub-systems. In-memory graphs can be mixed with database backed graphs as well as custom storage systems. Query execution proceeds per-graph so a custom OpExecutor will need to test the graph it is working with to make sure it is of the right class. The pattern in the StageGenerator extension point is an example of a design pattern in that situation.\nCustom Query Engines A custom query engine enables an extension to choose which datasets it wishes to handle. It also allows the extension to intercept query execution during the setup of the execution so it can modify the algebra expression, introduce its own algebra extensions, choose which high-level optimizations to apply and also transform the expression into quad form. Execution can proceed with the normal algorithm or a custom OpExecutor or a custom Stage Generator or a combination of all three extension mechanisms.\nOnly a small, skeleton custom query engine is needed to intercept the initial setup. See the example in jena-examples:arq/examples arq.examples.engine.MyQueryEngine.\nWhile it is possible to replace the entire process of query evaluation, this is a substantial endeavour. 
QueryExecutionBase provides the machinery for result presentation (SELECT, CONSTRUCT, DESCRIBE, ASK), leaving the work of pattern evaluation to the custom query engine.\nAlgebra Extensions New operators can be added to the algebra using the OpExt class as the super-class of the new operator. They can be inserted into the expression to be evaluated using a custom query engine to intercept evaluation initialization. When evaluation of a query requires the evaluation of a sub-class of OpExt, the eval method is called.\n","permalink":"https://jena.apache.org/documentation/query/arq-query-eval.html","tags":null,"title":"ARQ - Extending Query Execution"},{"categories":null,"contents":"This page describes function-like operators that can be used in expressions, such as FILTERs, assignments and SELECT expressions.\nThese are not strictly functions - the evaluation semantics of custom functions is to evaluate each argument then call the function with the results of the sub-expressions. Examples in standard SPARQL include bound, which does not evaluate a variable as an expression but just tests whether it is set or not, and boolean operators || and \u0026amp;\u0026amp; which handle errors and do not just evaluate each branch and combining the results.\nThese were previously ARQ extensions but are now legal SPARQL 1.1\nIF The IF form evaluates its first argument to get a boolean result, then evaluates and return the value of the second if the boolean result is true, and the third argument if it is false.\nExamples:\nIF ( ?x\u0026lt;0 , \u0026#34;negative\u0026#34; , \u0026#34;positive\u0026#34; ) # A possible way to do default values. LET( ?z := IF(bound(?z) , ?z , \u0026#34;DftValue\u0026#34; ) ) COALESCE The COALESCEform returns the first argument of its argument list that is bound.\n# Suppose ?y is bound to \u0026#34;y\u0026#34; and ?z to \u0026#34;z\u0026#34; but ?x is not. COALESCE(?x , ?y , ?z) # return \u0026#34;y\u0026#34; ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/function_forms.html","tags":null,"title":"ARQ - Filter Forms"},{"categories":null,"contents":"java.lang.NoClassDefFoundError : Exception in thread \u0026ldquo;main\u0026rdquo; : The classpath is wrong. Include all the jar files in lib/ before running one of the command line applications.\njava.lang.NoSuchFieldError: actualValueType : This is almost always due to using the wrong version of the Xerces library. Jena and ARQ make use of XML schema support that changed at Xerces 2.6.0 and is not compatible with earlier versions. At the time of writing Jena ships with Xerces 2.6.1.\nIn some situations your runtime environment may be picking up an earlier version of Xerces from an \u0026quot;endorsed\u0026quot; directory. You will need to either disable use of that endorsed library or replace it by a more up to date version of Xerces. This appears to happen with some distributions of Tomcat 5.\\* and certain configurations of JDK 1.4.1. Query Debugging : Look at the data in N3 or Turtle or N-triples. This can give you a better sense of the graph than RDF/XML.\nUse the [command line tools](cmds.html) and a sample of your data to develop a query, especially a complex one. Break your query up into smaller sections. How do I do test substrings of literals? 
: SPARQL provides regular expression matching which can be used to test for substrings and other forms that SQL\u0026rsquo;s LIKE operator provides.\nExample: find resource with an RDFS label contains the substring \u0026quot;orange\u0026quot;, matching without respecting case of the string. PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; SELECT ?x WHERE { ?x rdfs:label ?v . FILTER regex(?v, \u0026quot;orange\u0026quot;, \u0026quot;i\u0026quot;) } The regular expression matching in ARQ is provided by java.util.regex. Accented characters and characters outside of basic latin ~ SPARQL queries are assumed to be Unicode strings. If typing from a text editor, ensure it is working in UTF-8 and not the operating system native character set. UTF-8 is not the default character set under MS Windows.\nARQ supports \\\\u escape sequences in queries for the input of 16bit codepoints. ARQ does not support 32 bit codepoints (it would require a move to Java 1.5, including all support libraries and checking the codebase for char/codepoint inconsistencies and drop support for Java 1.4). The same is true for data. XML files can be written in any XML-supported character set if the right `?xml` processing instruction is used. The default is UTF-8 or UTF-16. XSD DateTime : Examples of correctly formatted XSD DateTime literals are: these two are actually the same point in time and will test equal in a filter:\n\u0026quot;2005-04-04T04:04:04Z\u0026quot;^^xsd:dateTime \u0026quot;2004-12-31T18:01:00-05:00\u0026quot;^^\u0026lt;http://www.w3.org/2001/XMLSchema#dateTime\u0026gt; - The timezone is required. - The datatype must be given. String Operations : ARQ provides many of the XPath/XQuery functions and operators including string operations. These include: fn:contains, fn:starts-with, fn:ends-with. See the library page for details of all function provided.\nNote 1: For string operations taken from XQuery/XPath, character positions are numbered from 1, unlike Java where they are numbered from 0. Note 2: `fn:substring` operation takes the length of the substring as the 3rd argument, unlike Java where it is the end index. ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/faq.html","tags":null,"title":"ARQ - Frequently Asked Questions"},{"categories":null,"contents":"The current W3C recommendation of SPARQL 1.1 supports the query results in JSON format. What is described in this page is not that format, but an extension of Apache Jena, which allows users to define how results should be returned in a key/value pair fashion, providing this way a simpler output. 
This output can be easily used as model for web applications, or inspecting data.\nCompare the output of this extension:\n[ { \u0026quot;book\u0026quot;: \u0026quot;http://example.org/book/book6\u0026quot;, \u0026quot;title\u0026quot;: \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; }, { \u0026quot;book\u0026quot;: \u0026quot;http://example.org/book/book7\u0026quot;, \u0026quot;title\u0026quot;: \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; }, ] With the output of the SPARQL 1.1 query result JSON format below:\n{ \u0026quot;head\u0026quot;: { \u0026quot;vars\u0026quot;: [ \u0026quot;book\u0026quot; , \u0026quot;title\u0026quot; ] } , \u0026quot;results\u0026quot;: { \u0026quot;bindings\u0026quot;: [ { \u0026quot;book\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;uri\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;http://example.org/book/book6\u0026quot; } , \u0026quot;title\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;literal\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; } } , { \u0026quot;book\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;uri\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;http://example.org/book/book7\u0026quot; } , \u0026quot;title\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;literal\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; } } ] } } This feature was added in Jena 3.8.0.\nQuery Syntax The JSON syntax is similar in certain ways to the SPARQL CONSTRUCT syntax.\nPREFIX purl: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX w3: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; PREFIX : \u0026lt;http://example.org/book/\u0026gt; JSON { \u0026quot;author\u0026quot;: ?author, \u0026quot;title\u0026quot;: ?title } WHERE { ?book purl:creator ?author . ?book purl:title ?title . FILTER (?author = 'J.K. Rowling') } As in CONSTRUCT, users are able to specify how the output must look like, using a simple key/value pair pattern, which could produce the following output for the query above.\n[ { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Philosopher's Stone\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Order of the Phoenix\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. 
Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; } ] Grammar The normative definition of the syntax grammar of the query string is defined in this table:\nRule Expression JsonQuery ::= JsonClause ( DatasetClause )* WhereClause SolutionModifier JsonClause ::= \u0026lsquo;JSON\u0026rsquo; \u0026lsquo;{\u0026rsquo; JsonObjectMember ( \u0026lsquo;,\u0026rsquo; JsonObjectMember )* \u0026lsquo;}\u0026rsquo; JsonObjectMember ::= String \u0026lsquo;:\u0026rsquo; ( Var | RDFLiteral | NumericLiteral | BooleanLiteral ) DatasetClause, WhereClause, SolutionModifier, String, Var, \u0026lsquo;RDFLiteral\u0026rsquo;, NumericLiteral, and \u0026lsquo;BooleanLiteral\u0026rsquo; are as for the SPARQL 1.1 Grammar\nProgramming API ARQ provides 2 additional methods in QueryExecution for JSON.\nIterator\u0026lt;JsonObject\u0026gt; QueryExecution.execJsonItems() JsonArray QueryExecution.execJson() In order to use these methods, it\u0026rsquo;s required to switch on the query syntax of ARQ beforehand, when creating the Query object:\nQuery query = QueryFactory.create(queryString, Syntax.syntaxARQ) String queryString = \u0026quot;JSON { 'name' : ?name, 'age' : ?age } WHERE ... \u0026quot; ... Iterator\u0026lt;JsonObject\u0026gt; json = qexec.execJsonItems() Fuseki Support Users are able to use Fuseki web interface, as well as the other HTTP endpoints to submit queries using any programming language. The following example shows how to POST to the query endpoint passing the query as a form data field.\ncurl -XPOST --data \u0026quot;query=JSON { 'name' : ?name, 'age': ?age } WHERE { ... }\u0026quot; http://localhost:3030/ds/query The web interface editor parses the SPARQL implementation syntax, so syntax errors are expected in the web editor at this moment when using the JSON clause. The query should still be correctly executed, and the results displayed as with other normal SPARQL queries.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/generate-json-from-sparql.html","tags":null,"title":"ARQ - Generate JSON from SPARQL"},{"categories":null,"contents":"@@ Incomplete / misnamed?\nARQ consists of the following parts:\nThe SPARQL abstract syntax tree (AST) and the SPARQL parser\nThe algebra generator that turns SPARQL AST into algebra expressions\nImplementation of the translation in the SPARQL specification. Quad version compiling SPARQL to quad expressions, not basic graph patterns. Query engines to execute queries\nSPARQL protocol client - remote HTTP requests Reference engine - direct implementation of the algebra Quad engine - direct implementation of the algebra except The main engine TDB, a SPARQL database for large-sale persistent data Result set handling for the SPARQL XML results format, the JSON and text versions.\nMain packages Package Use org.apache.jena.query The application API org.apache.jena.sparql.syntax Abstract syntax tree org.apache.jena.sparql.algebra SPARQL algebra org.apache.jena.sparql.lang The parsers: SPARQL, ARQ, RDQL org.apache.jena.sparql.expr Expression code. org.apache.jena.sparql.serializer Output in SPARQL, ARQ forms, in SPARQL syntax, in an abstract form (useful in debugging) and in XML. org.apache.jena.sparql.engine The abstraction of a query engine. org.apache.jena.sparql.engine.main The usual query engine. 
org.apache.jena.sparql.engine.ref The reference query engine (and quad version) Key Execution Classes Bindings Query Iterators Context ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/architecture.html","tags":null,"title":"ARQ - Internal Design"},{"categories":null,"contents":"ARQ supports writing custom SPARQL functions in JavaScript. These functions can be used in FILTERs and for calculating values to assign with AS in BIND and SELECT expressions.\nXSD datatypes for strings, numbers and booleans are converted to the native JavaScript datatypes. RDFterms that do not fit easily into JavaScript datatypes are handled with a object class NV.\nApplications should be aware that there are risks in exposing a script engine with full computational capabilities through SPARQL. Script functions are only as secure as the script engine environment they run in.\nRequirements ARQ requires a javascript engine such as GraalVM to be added to the classpath.\n\u0026lt;properties\u0026gt; \u0026lt;ver.graalvm\u0026gt;....\u0026lt;/ver.graalvm\u0026gt; ... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.graalvm.js\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;js\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${ver.graalvm}\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.graalvm.js\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;js-scriptengine\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${ver.graalvm}\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Enabling and Loading JavaScript functions JavaScript is loaded from an external file using the context setting \u0026ldquo;http://jena.apache.org/ARQ#js-library\u0026quot;. This can be written as arq:js-library for commands and Fuseki configuration files.\nAccess to the script engine must be enabled at runtime. The Java system property to do this is jena:scripting.\nExample:\nexport JVM_ARGS=-Djena:scripting=true sparql --set arq:js-library=SomeFile.js --data ... --query ... and for MS Windows:\nset JVM_ARGS=-Djena:scripting=true sparql --set arq:js-library=SomeFile.js --data ... --query ... will execute on the data with the JavaScript functions from file \u0026ldquo;SomeFile.js\u0026rdquo; available.\nJavaScript functions can also be set from a string directly from within Java using constant ARQ.symJavaScriptFunctions (\u0026ldquo;http://jena.apache.org/ARQ#js-functions\u0026quot;).\nWARNING: Enabling this feature exposes the majority of the underlying scripting engine directly to SPARQL queries so may provide a vector for arbitrary code execution. Therefore it is recommended that this feature remain disabled for any publicly accessible deployment that utilises the ARQ query engine.\nIdentifying callable functions The context setting \u0026ldquo;\u0026ldquo;http://jena.apache.org/ARQ#scriptAllowList\u0026quot; is used to provide a comma-separated list of function names, which are the local part of the URI, that are allowed to be called as custom script functions.\nThis can be written as arq:scriptAllowList for commands and Fuseki configuration files. It is the java constant ARQ.symCustomFunctionScriptAllowList\nsparql --set arq:js-library=SomeFile.js \\ --set arq:scriptAllowList=toCamelCase,anotherFunction --data ... --query ... 
and a query of:\nPREFIX js: \u0026lt;http://jena.apache.org/ARQ/jsFunction#\u0026gt; SELECT ?input (js:toCamelCase(?input) AS ?X) { VALUES ?input { \u0026quot;some woRDs to PROCESS\u0026quot; } } Using JavaScript functions SPARQL functions implemented in JavaScript are automatically called when a URI starting \u0026ldquo;http://jena.apache.org/ARQ/jsFunction#\u0026quot; used.\nThis can conveniently be abbreviated by:\nPREFIX js: \u0026lt;http://jena.apache.org/ARQ/jsFunction#\u0026gt; Arguments and Function Results xsd:string (a string with no language tag), any XSD numbers (integer, decimal, float, double and all the derived types) and xsd:boolean are converted to JavaScript string, number and boolean respectively.\nSPARQL functions must return a value. When a function returns a value, it can be one of these JavaScript native datatypes, in which case the reverse conversion is applied back to XSD datatypes. For numbers, the conversion is back to xsd:integer (if it has no fractional part) or xsd:double.\nThe JavaScript function can also create NodeValue (or NV) objects for other datatypes by calling Java from within the JavaScript script engine of the Java runtime.\nURIs are passed as NV object and are available in JavaScript as a string.\nThe class NV is used for all other RDF terms.\nReturning JavaScript null is the error indicator and a SPARQL expression error (ExprEvalException) is raised, like any other expression error in SPARQL. That, in turn, will cause the whole expression the function is part of to evaluate to an error (unless a special form like COALESCE is used). In a FILTER that typically makes the filter evaluate to \u0026ldquo;false\u0026rdquo;.\nExample Suppose \u0026ldquo;functions.js\u0026rdquo; contains code to camel case words in a string. For example, \u0026ldquo;some words to process \u0026quot; becomes \u0026ldquo;someWordsToProcess\u0026rdquo;.\n// CamelCase a string // Words to be combined are separated by a space in the string. function toCamelCase(str) { return str .split(' ') .map(cc) .join(''); } function ucFirst(word) { return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase(); } function lcFirst(word) { return word.toLowerCase(); } function cc(word,index) { return (index == 0) ? 
lcFirst(word) : ucFirst(word); } and the query Q.rq\nPREFIX js: \u0026lt;http://jena.apache.org/ARQ/jsFunction#\u0026gt; SELECT ?input (js:toCamelCase(?input) AS ?X) { VALUES ?input { \u0026quot;some woRDs to PROCESS\u0026quot; } } which results in:\n-------------------------------------------------- | input | X | ================================================== | \u0026quot;some woRDs to PROCESS\u0026quot; | \u0026quot;someWordsToProcess\u0026quot; | -------------------------------------------------- Use with Fuseki The context setting can be provided on the command line starting the server, for example:\nexport JVM_ARGS=-Djena:scripting=true fuseki --set arq:js-library=functions.js \\ --set arq:scriptAllowList=toCamelCase \\ --mem /ds or it can be specified in the server configuration file config.ttl:\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; [] rdf:type fuseki:Server ; # Set the server-wide context ja:context [ ja:cxtName \u0026quot;arq:js-library\u0026quot; ; ja:cxtValue \u0026quot;/filepath/functions.js\u0026quot; ] ; ja:context [ ja:cxtName \u0026quot;arq:scriptAllowList\u0026quot; ; ja:cxtValue \u0026quot;toCamelCase\u0026quot; ] ; . \u0026lt;#service\u0026gt; rdf:type fuseki:Service; rdfs:label \u0026quot;Dataset\u0026quot;; fuseki:name \u0026quot;ds\u0026quot;; fuseki:serviceQuery \u0026quot;sparql\u0026quot;; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; rdf:type ja:DatasetTxnMem; ja:data \u0026lt;file:D.trig\u0026gt;; . and used as:\nexport JVM_ARGS=-Djena:scripting=true fuseki --conf config.ttl ","permalink":"https://jena.apache.org/documentation/query/javascript-functions.html","tags":null,"title":"ARQ - JavaScript SPARQL Functions"},{"categories":null,"contents":"Lateral joins using the keyword LATERAL were introduced in Apache Jena 4.7.0.\nA LATERAL join is like a foreach loop, looping on the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executing the right-hand side (RHS) query pattern once for each row, with the variables from the input LHS in-scope during each RHS evaluation.\nA regular join only executes the RHS once, and the variables from the LHS are used for the join condition after evaluation of the left and right sub-patterns.\nAnother way to think of a lateral join is as a flatmap.\nExamples:\n## Get exactly one label for each subject with type `:T` SELECT * { ?s rdf:type :T LATERAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } } ## Get zero or one labels for each subject. 
SELECT * { ?s ?p ?o LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } } } Syntax The LATERAL keyword which takes the graph pattern so far in the group, from the { starting of the current block, and a { } block afterwards.\nEvaluation Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.\nVariable assignment There needs to be a new syntax restriction: there can no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level of the LATERAL RHS, that is the same name as any in-scope variable from the LHS.\nSuch a variable assignment would conflict with the variable being set in variables of the row being joined.\n## ** Illegal ** SELECT * { ?s ?p ?o LATERAL { BIND( 123 AS ?o) } } See SPARQL Grammar note 12.\nIn ARQ, LET would work. LET for a variable that is bound acts like a filter.\nVariable Scopes In looping on the input, a lateral join makes the bindings of variables in the current row available to the right-hand side pattern, setting their value from the top down.\nIn SPARQL, it is possible to have variables of the same name which are not exposed within a sub-select. These are not lateral-joined to a variable of the same name from the LHS.\nThis is not specific to lateral joins. In\nSELECT * { ?s rdf:type :T { SELECT ?label { ?s rdfs:label ?label } } } the inner ?s can be replaced by ?z without changing the results because the inner ?s is not joined to the outer ?s but instead is hidden by the SELECT ?label.\nSELECT * { ?s rdf:type :T { SELECT ?label { ?z rdfs:label ?label } } } The same rule applies to lateral joins.\nSELECT * { ?s rdf:type :T LATERAL { SELECT ?label { ?s rdfs:label ?label } LIMIT 1 } } The inner ?s in the SELECT ?label is not the outer ?s because the SELECT ?label does not pass out ?s. As a sub-query the ?s could be any name except ?label for the same results.\nNotes There is a similarity to filter NOT EXISTS/EXISTS expressed as the non-legal FILTER ( ASK { pattern } ) where the variables of the row being filtered are available to \u0026ldquo;pattern\u0026rdquo;. This is similar to an SQL correlated subquery.\nSPARQL Specification Additional Material Syntax LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples). As a syntax form, it is similar to OPTIONAL.\n[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ... [57] OptionalGraphPattern ::= \u0026#39;OPTIONAL\u0026#39; GroupGraphPattern [ ] LateralGraphPattern ::= \u0026#39;LATERAL\u0026#39; GroupGraphPattern Algebra The new algebra operator is lateral which takes two expressions\nSELECT * { ?s ?p ?o LATERAL { ?a ?b ?c } } is translated to:\n(lateral (bgp (triple ?s ?p ?o)) (bgp (triple ?a ?b ?c))) Evaluation To evaluate lateral:\nEvaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings. For each solution mapping (\u0026ldquo;row\u0026rdquo;), inject variable bindings into the second argument Evaluate this pattern Add to results Outline:\nDefinition: Lateral Let Ω be a multiset of solution mappings. 
We define: Lateral(Ω, P) = { μ | union of Ω1 where foreach μ1 in Ω: pattern2 = inject(pattern, μ1) Ω1 = eval(D(G), pattern2) result Ω1 } where inject is the corrected substitute operation.\nAn alternative style is to define Lateral more like \u0026ldquo;evaluate P such that μ is in-scope\u0026rdquo; in some way, rather than rely on inject which is a mechanism.\nDefinition: Evaluation of Lateral eval(D(G), Lateral(P1, P2) = Lateral(eval(D(G), P1), P2) ","permalink":"https://jena.apache.org/documentation/query/lateral-join.html","tags":null,"title":"ARQ - Lateral Join"},{"categories":null,"contents":"ARQ uses SLF4j as the logging API and the query and RIOT commands use Log4J2 as a deployment system. You can use Java 1.4 logging instead.\nARQ does not output any logging messages at level INFO in normal operation. The code uses level TRACE and DEBUG. Running with logging set to an application at INFO will cause no output in normal operation. Output below INFO can be very verbose and is intended mainly to help debug ARQ. WARN and FATAL messages are only used when something is wrong.\nThe root of all the loggers is org.apache.jena. org.apache.jena.query is the application API. org.apache.jena.sparql is the implementation and extensions points.\nIf using in Tomcat, or other system that provides complex class loading arrangements, be careful about loading from jars in both the web application and the system directories as this can cause separate logging systems to be created (this may not matter).\nThe ARQ and RIOT command line utilities look for a file \u0026ldquo;log4j2.properties\u0026rdquo; in the current directory to control logging during command execution. There is also a built-in configuration so no configuration work is required.\nLogger Names Name Constant Logger Use org.apache.jena.arq.info ARQ.logInfoName ARQ.getLoggerInfo() General information org.apache.jena.arq.exec ARQ.logExecName ARQ.getLoggerExec() Execution information The reading of log4j2.properties from the current directory is achieved by a call to org.apache.jena.atlas.logging.Log.setlog4j2().\nExample log4j2.properties file:\nstatus = error name = PropertiesConfig filters = threshold filter.threshold.type = ThresholdFilter filter.threshold.level = INFO appender.console.type = Console appender.console.name = STDOUT appender.console.layout.type = PatternLayout appender.console.layout.pattern = %d{HH:mm:ss} %-5p %-15c{1} :: %m%n rootLogger.level = INFO rootLogger.appenderRef.stdout.ref = STDOUT logger.jena.name = org.apache.jena logger.jena.level = INFO logger.arq-exec.name = org.apache.jena.arq.exec logger.arq-exec.level = INFO logger.arq-info.name = org.apache.jena.arq.info logger.arq-info.level = INFO logger.riot.name = org.apache.jena.riot logger.riot.level = INFO A Fuseki server output can include ARQ execution logging, see Fuseki logging for the configuration.\nExecution Logging ARQ can log query and update execution details globally or for an individual operations. This adds another level of control on top of the logger level controls.\nExplanatory messages are controlled by the Explain.InfoLevel level in the execution context.\nThe logger used is called org.apache.jena.arq.exec. Message are sent at level \u0026ldquo;info\u0026rdquo;. So for log4j2, the following can be set in the log4j2.properties file:\nlogger.arq-exec.name = org.apache.jena.arq.exec logger.arq-exec.level = INFO The context setting is for key (Java constant) ARQ.symLogExec. 
To set globally:\nARQ.setExecutionLogging(Explain.InfoLevel.ALL); and it may also be set on an individual query execution using its local context.\ntry(QueryExecution qExec = QueryExecution.create()... .set(ARQ.symLogExec, Explain.InfoLevel.ALL).build) { ... } On the command line:\narq.query --explain --data data file --query=queryfile The command tdbquery takes the same --explain argument.\nInformation levels\nLevel Effect INFO Log each query FINE Log each query and its algebra form after optimization ALL Log query, algebra and every dataset access (can be expensive) NONE No information logged These can be specified as string, to the command line tools, or using the constants in Explain.InfoLevel.\nqExec.getContext().set(ARQ.symLogExec, Explain.InfoLevel.FINE); arq.query --set arq:logExec=FINE --data data file --query=queryfile ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/logging.html","tags":null,"title":"ARQ - Logging"},{"categories":null,"contents":"Negation by Failure (OPTIONAL + !BOUND) Standard SPARQL 1.0 can perform negation using the idiom of OPTIONAL/!BOUND. It is inconvenient and can be hard to use as complexity increases. SPARQL 1.1 supports additional operators for negation.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . OPTIONAL { ?x foaf:knows ?who } . FILTER (!BOUND(?who)) } EXISTS and NOT EXISTS The EXISTS and NOT EXISTS are now legal SPARQL 1.1 when used inside a FILTER, they may be used as bare graph patterns only when Syntax.syntaxARQ is used\nThere is the NOT EXISTS operator which acts at the point in the query where it is written. It does not bind any variables but variables already bound in the query will have their bound value.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER NOT EXISTS { ?x foaf:knows ?who } } There is also an EXISTS operator.\n# Names of people where it is stated that they know at least one other person. PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER EXISTS { ?x foaf:knows ?who . FILTER(?who != ?x) } } In this example, the pattern is a little more complex. Any graph pattern is allowed although use of OPTIONAL is pointless (which will always match, possible with no additional results).\nNOT EXISTS and EXISTS can also be used in FILTER expressions. In SPARQL, FILTER expressions act over the whole of the basic graph pattern in which they occur.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER (NOT EXISTS { ?x foaf:knows ?who }) } A note of caution:\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER (NOT EXISTS { ?x foaf:knows ?y }) ?x foaf:knows ?who } is the same as (it\u0026rsquo;s a single basic graph pattern - the filter does not break it in two):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . 
FILTER (NOT EXISTS { ?x foaf:knows ?who }) } and the FILTER will always be false ({ ?x foaf:knows ?y } must have matched to get to this point in the query and using ?who instead makes no difference).\nMINUS SPARQL 1.1 also provides a MINUS keyword which is broadly similar to NOT EXISTS though does have some key differences as explained in the specification:\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . MINUS { ?x foaf:knows \u0026lt;http://example.org/A\u0026gt; } } Here we subtract any solutions where ?x also knows http://example.org/A\nOne of the key differences between MINUS and NOT EXISTS is that it is a child graph pattern and so breaks the graph pattern and so the result of the query can change depending where the MINUS is placed. This is unlike the earlier NOT EXISTS examples where moving the position of the FILTER resulted in equivalent queries.\nNOT IN SPARQL 1.1 also has a simpler form of negation for when you simply need to restrict a variable to not being in a given set of values, this is the NOT IN function:\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . FILTER(?y NOT IN (\u0026lt;http://example.org/A\u0026gt;, \u0026lt;http://example.org/B\u0026gt;)) } This would filter out matches where the value of ?y is either http://example.org/A or http://example.org/B\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/negation.html","tags":null,"title":"ARQ - Negation"},{"categories":null,"contents":"SPARQL is a query language and a remote access protocol. The remote access protocol runs over HTTP.\nSee Fuseki for an implementation of the SPARQL protocol over HTTP. Fuseki uses ARQ to provide SPARQL query access to Jena models, including Jena persistent models.\nARQ includes a query engine capable of using the HTTP version.\nFrom your application The QueryExecutionHTTP has methods for creating a QueryExecution object for remote use. There are various HTTP specific settings; the default should work in most cases.\nThe remote request is made when the execSelect, execConstruct, execDescribe or execAsk method is called.\nThe results are held locally after remote execution and can be processed as usual.\nFrom the command line The arq.rsparql command can issue remote query requests using the --service argument:\njava -cp ... arq.rsparql --service 'http://host/service' --query 'SELECT ?s WHERE {?s [] []}' Or: rsparql \u0026ndash;service \u0026lsquo;http://host/service\u0026rsquo; \u0026ndash;query \u0026lsquo;SELECT ?s WHERE {?s [] []}\u0026rsquo;\nThis takes a URL that is the service location.\nThe query given is parsed locally to check for syntax errors before sending.\nAuthentication ARQ provides a flexible API for authenticating against remote services, see the HTTP Authentication documentation for more details.\nFirewalls and Proxies Don\u0026rsquo;t forget to set the proxy for Java if you are accessing a public server from behind a blocking firewall. 
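Going back to the application API described above, a minimal sketch of a remote SELECT using QueryExecutionHTTP; the endpoint URL and query are illustrative, and the builder-style calls assume a reasonably recent Jena release.

import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.sparql.exec.http.QueryExecutionHTTP;

public class RemoteQueryExample {
    public static void main(String[] args) {
        String service = "http://host/service";                       // illustrative endpoint
        String queryString = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"; // illustrative query

        // The HTTP request is only sent when execSelect() is called;
        // the results are then held locally and processed as usual.
        try (QueryExecutionHTTP qExec = QueryExecutionHTTP.service(service)
                                                          .query(queryString)
                                                          .build()) {
            ResultSet results = qExec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("s"));
            }
        }
    }
}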
Most home firewalls do not block outgoing requests; many corporate firewalls do block outgoing requests.\nIf, to use your web browser, you need to set a proxy, you need to do so for a Java program.\nSimple examples include:\n-DsocksProxyHost=YourSocksServer -DsocksProxyHost=YourSocksServer -DsocksProxyPort=port -Dhttp.proxyHost=WebProxy -Dhttp.proxyPort=Port This can be done in the application if it is done before any network connections are made:\nSystem.setProperty(\u0026quot;socksProxyHost\u0026quot;, \u0026quot;socks.corp.com\u0026quot;); Consult the Java documentation for more details. Searching the web is also very helpful.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/sparql-remote.html","tags":null,"title":"ARQ - Querying Remote SPARQL Services"},{"categories":null,"contents":"RDF collections, also called RDF lists, are difficult to query directly.\nARQ provides 3 property functions to work with RDF collections.\nlist:member \u0026ndash; members of a list list:index \u0026ndash; index of a member in a list list:length \u0026ndash; length of a list list:member is similar to rdfs:member except for RDF lists. ARQ also provides rdfs:member.\nSee the property functions library page.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/rdf_lists.html","tags":null,"title":"ARQ - RDF Collections"},{"categories":null,"contents":"The SELECT statement of a query can include expressions, not just variables. This was previously a SPARQL extension but is now legal SPARQL 1.1\nExpressions are enclosed in () and can be optionally named using AS. If no name is given, an internal name is allocated which may not be a legal SPARQL variable name. In order to make results portable in the SPARQL Query Results XML Format, the application must specify the name, so using AS is strongly encouraged.\nExpressions can involve group aggregations.\nExpressions that do not correctly evaluate result in an unbound variable in the results. That is, the illegal expression is silently skipped.\nExamples:\nPREFIX : \u0026lt;http://example/\u0026gt; SELECT (?p+1 AS ?q) { :x :p ?p } PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX : \u0026lt;http://example/\u0026gt; SELECT (count(*) AS ?count) { :x rdf:type :Class } ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/select_expr.html","tags":null,"title":"ARQ - SELECT Expressions"},{"categories":null,"contents":"A SPARQL query in ARQ goes through several stages of processing:\nString to Query (parsing) Translation from Query to a SPARQL algebra expression Optimization of the algebra expression Query plan determination and low-level optimization Evaluation of the query plan This page describes how to access and use expressions in the SPARQL algebra within ARQ. The definition of the SPARQL algebra is to be found in the SPARQL specification in section 12. ARQ can be extended to modify the evaluation of the algebra form to access different graph storage implementations.\nThe classes for the data structures for the algebra reside in the package org.apache.jena.sparql.algebra in the op subpackage.
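As a quick illustration of working with these algebra classes directly, a basic graph pattern can be assembled as an Op and evaluated without writing any SPARQL syntax; the data file name is illustrative and this is only a sketch.

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.algebra.op.OpBGP;
import org.apache.jena.sparql.core.BasicPattern;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.QueryIterator;
import org.apache.jena.vocabulary.RDF;

public class AlgebraSketch {
    public static void main(String[] args) {
        // Build the algebra expression (bgp (triple ?s rdf:type ?type)) directly.
        BasicPattern bp = new BasicPattern();
        bp.add(Triple.create(Var.alloc("s"), RDF.type.asNode(), Var.alloc("type")));
        Op op = new OpBGP(bp);

        // Evaluate it over a graph loaded from a file ("data.ttl" is illustrative).
        Graph graph = RDFDataMgr.loadGraph("data.ttl");
        QueryIterator qIter = Algebra.exec(op, graph);
        while (qIter.hasNext())
            System.out.println(qIter.nextBinding());
        qIter.close();
    }
}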
All the classes are named \u0026ldquo;Op...\u0026rdquo;; the interface that they all offer is \u0026ldquo;Op\u0026rdquo;.\nViewing the algebra expression for a Query The command line tool arq.qparse will print the algebra form of a query:\narq.qparse --print=op --query=Q.rq arq.qparse --print=op 'SELECT * { ?s ?p ?o}' The syntax of the output is SSE, a simple format for writing data structures involving RDF terms. It can be read back in again to produce the Java form of the algebra expression.\nTurning a query into an algebra expression Getting the algebra expression for a Query is simply a matter of passing the parsed Query object to the translation function in the Algebra class:\nQuery query = QueryFactory.create(.....) ; Op op = Algebra.compile(query) ; And back again.\nQuery query = OpAsQuery.asQuery(op) ; System.out.println(query.serialize()) ; This reverse translation can handle any algebra expression originally from a SPARQL Query, but not every algebra expression. It is possible to create programmatically useful algebra expressions that can not be turned into a query, especially if they involve algebra extensions. Also, the query produced may not be exactly the same but will yield the same results (for example, filters may be moved because the SPARQL query algebra translation in the SPARQL specification moves filter expressions around).\nDirectly reading and writing algebra expressions The SSE class is a collection of functions to parse SSE expressions for the SPARQL algebra but also RDF terms, filter expressions and even datasets and graphs.\nOp op = SSE.parseOp(\u0026quot;(bgp (?s ?p ?o))\u0026quot;) ; // Read a string Op op = SSE.readOp(\u0026quot;filename.sse\u0026quot;) ; // Read a file The SSE class simply calls the appropriate builder operation from the org.apache.jena.sparql.sse.builder package.\nTo go with this, there is a collection of writers for many of the Java structures in ARQ.\nOp op = ... ; SSE.write(op) ; // Write to stdout Writers default to writing to System.out but support calls to any output stream (it manages the conversion to UTF-8) and ARQ\u0026rsquo;s own IndentedWriter form for embedding in structured output. Again, SSE is simply passing the calls to the writer operation from the org.apache.jena.sparql.sse.writer package.\nCreating an algebra expression programmatically See the example in AlgebraExec.\nTo produce the complete javadoc for ARQ, download an ARQ distribution and run the ant task \u0026lsquo;javadoc-all\u0026rsquo;.\nEvaluating an algebra expression QueryIterator qIter = Algebra.exec(op,graph) ; QueryIterator qIter = Algebra.exec(op,datasetGraph) ; Evaluating an algebra expression produces an iterator of query solutions (called Bindings).\nfor ( ; qIter.hasNext() ; ) { Binding b = qIter.nextBinding() ; Node n = b.get(var_x) ; System.out.println(var_x+\u0026quot; = \u0026quot;+FmtUtils.stringForNode(n)) ; } qIter.close() ; Operations of CONSTRUCT, DESCRIBE and ASK are done on top of algebra evaluation. Applications can access this functionality by creating their own QueryEngine (see arq.examples.engine.MyQueryEngine) and its factory. A query engine is a one-time use object for each query execution.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/algebra.html","tags":null,"title":"ARQ - SPARQL Algebra"},{"categories":null,"contents":"SPARQL Update is a W3C standard for an RDF update language with SPARQL syntax.
It is described in \u0026ldquo;SPARQL 1.1 Update\u0026rdquo;.\nA SPARQL Update request is composed of a number of update operations, so in a single request graphs can be created, loaded with RDF data and modified.\nSome examples of ARQ\u0026rsquo;s SPARQL Update support are to be found in the download in jena-examples:arq/examples/update.\nThe main API classes are:\nUpdateRequest - A list of Update to be performed. UpdateFactory - Create UpdateRequest objects by parsing strings or parsing the contents of a file. UpdateAction - execute updates To execute a SPARQL Update request as a script from a file:\nDataset dataset = ... UpdateAction.readExecute(\u0026quot;update.ru\u0026quot;, dataset) ; To execute a SPARQL Update request as a string:\nDataset dataset = ... UpdateAction.parseExecute(\u0026quot;DROP ALL\u0026quot;, dataset) ; The application writer can create and execute operations:\nUpdateRequest request = UpdateFactory.create() ; request.add(\u0026quot;DROP ALL\u0026quot;) .add(\u0026quot;CREATE GRAPH \u0026lt;http://example/g2\u0026gt;\u0026quot;) .add(\u0026quot;LOAD \u0026lt;file:etc/update-data.ttl\u0026gt; INTO \u0026lt;http://example/g2\u0026gt;\u0026quot;) ; // And perform the operations. UpdateAction.execute(request, dataset) ; but be aware that each operation added needs to be a complete SPARQL Update operation, including prefixes if needed.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/update.html","tags":null,"title":"ARQ - SPARQL Update"},{"categories":null,"contents":"ARQ includes support for nested SELECTs. This was previously an ARQ extension but is now legal SPARQL 1.1\nNested SELECT A SELECT query can be placed inside a graph pattern to produce a table that is used within the outer query. A nested SELECT statement is enclosed in {} and is the only element in that group.\nExample: find toys with more than five orders:\nPREFIX : \u0026lt;http://example/\u0026gt; SELECT ?x { ?x a :Toy . { SELECT ?x ( count(?order) as ?q ) { ?x :order ?order } GROUP BY ?x } FILTER ( ?q \u0026gt; 5 ) } ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/sub-select.html","tags":null,"title":"ARQ - Sub Queries"},{"categories":null,"contents":"ARQ uses URIs of the form \u0026lt;java:\u0026lt;i\u0026gt;package.class\u0026lt;/i\u0026gt;\u0026gt; to provide dynamic loading of code for value functions and property functions. ARQ loads the class when needed. For functions and property functions, it also wraps it in the necessary factory code. A new instance of the function or property function is created for each mention of the name in each query.\nDynamic Code Loading Any classes loaded by ARQ must already be on the java classpath. ARQ does not create any new class loaders, nor modify the Java class path in any way. The class path must be set up to include any class files or jar files for dynamically loaded code.\nClasses can be mor conveniently named in queries using SPARQL PREFIXes but because dots can\u0026rsquo;t appear in the local part of a prefixed name, all the package name and the final dot must be in the PREFIX declaration.\nPREFIX fn: \u0026lt;java:org.example.functions.\u0026gt; # Including the final dot ... FILTER fn:alter(?x) ... Remapping All code loading is performed via the MappedLoader class. Before actually loading the code, the mapped loader applies any transformation of URIs. 
For example, the ARQ function library has a namespace of \u0026lt;http://jena.apache.org/ARQ/function#\u0026gt; and resides in the Java package org.apache.jena.sparql.function.library. The mapped loader includes a partial rewrite rule turning http URLs starting with that namespace into java: URIs using the package name.\n","permalink":"https://jena.apache.org/documentation/query/java-uri.html","tags":null,"title":"ARQ - The java: URI scheme"},{"categories":null,"contents":"Applications can add SPARQL functions to the query engine. This is done by writing a class implementing the right interface, then either registering it or using the fake java: URI scheme to dynamically call the function.\nWriting SPARQL Value Functions A SPARQL value function is an extension point of the SPARQL query language that allows URI to name a function in the query processor.\nIn the ARQ engine, code to implement function must implement the interface org.apache.jena.sparql.function.Function although it is easier to work with one of the abstract classes for specific numbers of arguments like org.apache.jena.sparql.function.FunctionBase1 for one argument functions. Functions do not have to have a fixed number of arguments.\nThe abstract class FunctionBase, the superclass of FunctionBase1 to FunctionBase4, evaluates its arguments and calls the implementation code with argument values (if a variable was unbound, an error will have been generated)\nIt is possible to get unevaluated arguments but care must be taken not to violate the rules of function evaluation. The standard functions that access unevaluated arguments are the logical \u0026lsquo;or\u0026rsquo; and logical \u0026lsquo;and\u0026rsquo; operations that back || and \u0026amp;\u0026amp; are special forms to allow for the special exception handling rules.\nNormally, function should be a pure evaluation based on its argument. It should not access a graph nor return different values for the same arguments (to allow expression optimization). Usually, these requirements can be better met with a property function. Functions can\u0026rsquo;t bind variables; this would be done in a property function as well.\nExample: (this is the max function in the standard ARQ library):\npublic class max extends FunctionBase2 { public max() { super() ; } public NodeValue exec(NodeValue nv1, NodeValue nv2) { return Functions.max(nv1, nv2) ; } } The function takes two arguments and returns a single value. The class NodeValue represents values and supports value-based operations. NodeValue value support includes the XSD datatypes, xsd:decimal and all its subtypes like xsd:integer and xsd:byte, xsd\u0026rsquo;;double, xsd:float, xsd:boolean, xsd:dateTime and xsd:date. Literals with language tags are also treated as values in additional \u0026ldquo;value spaces\u0026rdquo; determined by the language tag without regard to case.\nThe Functions class contains the core XML Functions and Operators operations. Class NodeFunctions contains the implementations of node-centric operations like isLiteral and str.\nIf any of the arguments are wrong, then the function should throw ExprEvalException.\nExample: calculate the canonical namespace from a URI (calls the Jena operation for the actual work):\npublic class namespace extends FunctionBase1 { public namespace() { super() ; } public NodeValue exec(NodeValue v) { Node n = v.asNode() ; if ( ! 
n.isURI() ) throw new ExprEvalException(\u0026quot;Not a URI: \u0026quot;+FmtUtils.stringForNode(n)) ; String str = n.getNameSpace() ; return NodeValue.makeString(str) ; } } This throws an evaluation exception if it is passed a value that is not a URI.\nThe standard library, in package org.apache.jena.sparql.function.library, contains many examples.\nRegistering Functions The query compiler finds functions based on the functions URI. There is a global registry of known functions, but any query execution can have its own function registry.\nFor each function, there is a function factory associated with the URI. A new function instance is created for each use of a function in each query execution.\n// Register with the global registry. FunctionRegistry.get().put(\u0026quot;http://example.org/function#myFunction\u0026quot;, new MyFunctionFactory()) ; A common case is registering a specific class for a function implementation so there is an addition method that takes a class, wraps in a built-in function factory and registers the function implementation.\n// Register with the global registry. FunctionRegistry.get().put(\u0026quot;http://example.org/function#myFunction\u0026quot;, MyFunction.class) ; Another convenience route to function calling is to use the java: URI scheme. This dynamically loads the code, which must be on the Java classpath. With this scheme, the function URI gives the class name. There is automatic registration of a wrapper into the function registry. This way, no explicit registration step is needed by the application and queries issues with the command line tools can load custom functions.\nPREFIX f: \u0026lt;java:app.myFunctions.\u0026gt; ... FILTER f:myTest(?x, ?y) ... FILTER (?x + f:myIntToXSD(?y)) ... ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/writing_functions.html","tags":null,"title":"ARQ - Writing Filter Functions"},{"categories":null,"contents":"ARQ - Writing Property Functions\nSee also Writing Filter Functions.\nApplications can add SPARQL property functions to the query engine. This is done by first implementing the PropertyFunction interface, and then either registering that function or using the fake java: URI scheme to dynamically load the function.\nWriting SPARQL Property Functions\nSimilar to SPARQL Filter Functions, a SPARQL Property Function is an extension point of the SPARQL query language that allows a URI to name a function in the query processor. A key difference is that Property Functions may generate new bindings.\nJust like org.apache.jena.sparql.function.Function there are various utility classes provided to simplify the creation of a Property Function. The selection of one depends on the \u0026lsquo;style\u0026rsquo; of the desired built-in. For example, PFuncSimple is expected to be the predicate of triple patterns ?such ex:as ?this, where neither argument is an rdf:list, and either may be a variable. Alternatively, PFuncAssignToObject assumes that the subject will be bound, while the object will be a variable.\nPropertyFunction | |--PropertyFunctionBase | |--PropertyFunctionEval | |--PFuncSimpleAndList | |--PFuncSimple | |--PFuncAssignToObject | |--PFuncAssignToSubject | |--PFuncListAndSimple | |--PFuncListAndList The choice of extension point determines the function signature that the developer will need to implement, and primarily determines whether some of the arguments will be org.apache.jena.graph.Nodes or org.apache.jena.sparql.pfunction.PropFuncArgs. 
In the latter case, the programmer can determine whether the argument is a list as well as how many arguments it consists of.\nRegistration\nEvery property function is associated with a particular org.apache.jena.sparql.util.Context. This allows you to limit the availability of the function to be global or associated with a particular dataset. For example, a custom Property Function may expose an index which only has meaning with respect to some set of data.\nAssuming you have an implementation of org.apache.jena.sparql.pfunction.PropertyFunctionFactory (shown later), you can register a function as follows:\nfinal PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ARQ.getContext()); reg.put(\u0026quot;urn:ex:fn#example\u0026quot;, new ExamplePropertyFunctionFactory()); PropertyFunctionRegistry.set(ARQ.getContext(), reg); The only difference between global and dataset-specific registration is where the Context object comes from:\nfinal Dataset ds = DatasetFactory.createGeneral(); final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ds.getContext()); reg.put(\u0026quot;urn:ex:fn#example\u0026quot;, new ExamplePropertyFunctionFactory()); PropertyFunctionRegistry.set(ds.getContext(), reg); Note that org.apache.jena.sparql.pfunction.PropertyFunctionRegistry has other put methods that allow registration by passing a Class object, as well.\nImplementation\nThe implementation of a Property Function is actually quite straight forward once one is aware of the tools at their disposal to do so. For example, if we wished to create a Property Function that returns no results regardless of their arguments we could do so as follows:\npublic class ExamplePropertyFunctionFactory implements PropertyFunctionFactory { @Override public PropertyFunction create(final String uri) {\treturn new PFuncSimple() { @Override public QueryIterator execEvaluated(final Binding parent, final Node subject, final Node predicate, final Node object, final ExecutionContext execCtx) {\treturn QueryIterNullIterator.create(execCtx); } }; } } Node and PropFuncArg objects allow the developer to reflect on the state of the arguments, and choose what bindings to generate given the intended usage of the Property Function. For example, if the function expects a list of three bound arguments for the object of the property, then it can throw a ExprEvalException (or derivative) to indicate incorrect use. It is the responsibility of the developer to identify what parts of the argument are bound, and to respond appropriately.\nFor example, if ?a ex:f ?b were a triple pattern in a query, it could be called with ?a bound, ?b bound, or neither. It may make sense to return new bindings that include ?b if passed a concrete value for ?a, or conversely to generate new bindings for ?a when passed a concrete ?b. If both ?a and ?b are bound, and the function wishes to confirm that the pairing is valid, it can return the existing binding. 
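To make the bound/unbound cases concrete, here is a sketch of a PFuncSimple whose property binds its object to the upper-cased lexical form of its subject literal; the class name and behaviour are illustrative, not a built-in ARQ property function.

import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.ExecutionContext;
import org.apache.jena.sparql.engine.QueryIterator;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.engine.binding.BindingFactory;
import org.apache.jena.sparql.engine.iterator.QueryIterNullIterator;
import org.apache.jena.sparql.engine.iterator.QueryIterSingleton;
import org.apache.jena.sparql.pfunction.PFuncSimple;

// Used as: ?s ex:upperCaseOf ?o
public class UpperCaseOf extends PFuncSimple {
    @Override
    public QueryIterator execEvaluated(Binding parent, Node subject, Node predicate,
                                       Node object, ExecutionContext execCxt) {
        if (!subject.isLiteral())
            // Unbound or non-literal subject: no solutions for this input row.
            return QueryIterNullIterator.create(execCxt);
        Node upper = NodeFactory.createLiteral(subject.getLiteralLexicalForm().toUpperCase());
        if (object.isVariable()) {
            // Object is an unbound variable: generate one new binding for it.
            Binding b = BindingFactory.binding(parent, Var.alloc(object), upper);
            return QueryIterSingleton.create(b, execCxt);
        }
        // Both ends are concrete: keep the existing row only if they correspond.
        return object.equals(upper)
            ? QueryIterSingleton.create(parent, execCxt)
            : QueryIterNullIterator.create(execCxt);
    }
}

Once registered as in the Registration section above (either via a factory or by passing the class), the function is used as the predicate of an ordinary triple pattern.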
If there are no valid solutions to return, then an empty solution may be presented.\nThere are several extremely useful implementations of QueryIterator within the Jena library that make it easy to support typical use cases.\nOf particular note:\nQueryIterNullIterator - to indicate that there are no valid solutions/bindings for the given values QueryIterSingleton - to provide a single solution/binding for the given values QueryIterPlainWrapper - to provide multiple solutions/bindings for the given values The second two cases require instances of Binding objects which can be obtained through static methods of BindingFactory. Creation of Binding objects will also require references to Var and NodeFactory\nNote that it can make a lot of sense to generate the Iterator\u0026lt;Binding\u0026gt; for QueryIterPlainWrapper by means of Jena\u0026rsquo;s ExtendedIterator. This can allow domain-specific value to be easily mapped to Binding objects in a lazy fashion.\nGraph Operations\nAdditional operations on the current, or another, Graph can be achieved through the Execution Context. Once retrieved the Graph can be operated upon directly, queried or wrapped in a Model, if preferred.\n// Retrieve current Graph. Graph graph = execCxt.getActiveGraph(); // Wrap Graph in a Model. Model model = ModelFactory.createModelForGraph(graph); Access another graph:\n// Retrieve DatasetGraph of current Graph. DatasetGraph datasetGraph = execCxt.getDataset(); // Retrieve a different Graph in the Dataset. Node otherGraphNode = NodeFactory.createURI(\u0026quot;http://example.org/otherGraph\u0026quot;); Graph otherGraph = datasetGraph.getNamedGraph(otherGraphNode); // Access the other graph ExtendedIterator\u0026lt;Triple\u0026gt; iter = otherGraph.find(...); ","permalink":"https://jena.apache.org/documentation/query/writing_propfuncs.html","tags":null,"title":"ARQ - Writing Property Functions"},{"categories":null,"contents":"Read the following first:\nFrequently Asked Questions Submitting a support request or bug reports The documentation Support for ARQ is provided via the Jena mailing list \u0026lt;mailto:users@jena.apache.org\u0026gt;.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/support.html","tags":null,"title":"ARQ – Support"},{"categories":null,"contents":"For details on downloading ARQ, please see the Jena downloads page.\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/download.html","tags":null,"title":"ARQ Downloads"},{"categories":null,"contents":"ARQ - Property Paths A property path is a possible route through a graph between two graph nodes. A trivial case is a property path of length exactly one, which is a triple pattern.\nMost property paths are now legal SPARQL 1.1 syntax, there are some advanced property paths which are syntactic extensions and are only available if the query is parsed with language Syntax.syntaxARQ.\nPath Language A property path expression (or just \u0026lsquo;path\u0026rsquo;) is similar to a string regular expression but over properties, not characters. ARQ determines all matches of a path expression and binds subject or object as appropriate. Only one match is recorded - no duplicates for any given path expression, although if the path is used in a situation where it\u0026rsquo;s initial points is already repeated in a pattern, then this duplication is preserved.\nPath example Meaning dc:title | rdfs:label Dublin Core title or an RDFS label. 
foaf:knows/foaf:name Name of people one \u0026ldquo;knows\u0026rdquo; steps away. foaf:knows/foaf:knows/foaf:name Name of people two \u0026ldquo;knows\u0026rdquo; steps away. In the description below, uri is either a URI or a prefixed name.\nSyntax Form Matches uri A URI or a prefixed name. A path of length one. ^elt Reverse path (object to subject) (elt) A group path elt, brackets control precedence. elt1 / elt2 A sequence path of elt1, followed by elt2 elt1 | elt2 A alternative path of elt1, or elt2 (both possibilities are tried) elt* A path of zero or more occurrences of elt. elt+ A path of one or more occurrences of elt. elt? A path of zero or one elt. !uri A path matching a property which isn\u0026rsquo;t uri (negated property set) !(uri1|\u0026hellip;|uriN) A path matching a property which isn\u0026rsquo;t any of uri1 ... uri (negated property set) ARQ extensions: to use these you must use Syntax.syntaxARQ\nSyntax Form Matches elt1 ^ elt2 Shorthand for elt1 / ^elt2, that is elt1 followed by reverse elt2. elt{n,m} A path between n and m occurrences of elt. elt{n} Exactly n occurrences of elt. A fixed length path. elt{n,} n or more occurrences of elt. elt{,n} Between 0 and n occurrences of elt. Precedence:\nURI, prefixed names Negated property set Groups Unary ^ reverse links Unary operators *, ?, + and {} forms Binary operators / and ^ Binary operator | Precedence is left-to-right within groups.\nPath Evaluation Paths are \u0026ldquo;simple\u0026rdquo; if they involve only operators / (sequence), ^ (reverse, unary or binary) and the form {n}, for some single integer n. Such paths are fixed length. They are translated to triple patterns by the query compiler and do not require special path-evaluation at runtime.\nA path of just a URI is still a single triple pattern.\nA path is \u0026ldquo;complex\u0026rdquo; if it involves one or more of the operators *,?, + and {}. Such paths require special evaluation and provide expressivity outside of strict SPARQL because paths can be of variable length. When used with models backed by SQL databases, complex path expressions may take some time.\nA path of length zero connects a graph node to itself.\nCycles in paths are possible and are handled.\nPaths do not need to be anchored at one end of the other, although this can lead to large numbers of result because the whole graph is searched.\nProperty functions in paths are only available for simple paths.\nExtended Language This involves is syntactic extension and is available if the query is parsed with language Syntax.syntaxARQ.\nPaths can be directly included in the query in the property position of a triple pattern:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; # Find the types of :x, following subClassOf SELECT * { :x rdf:type/rdfs:subClassOf* ?t } Examples Simple Paths Find the name of any people that Alice knows.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:name ?name . } Find the names of people 2 \u0026ldquo;foaf:knows\u0026rdquo; links away.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:knows/foaf:name ?name . } This is the same as the strict SPARQL query:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows [ foaf:knows [ foaf:name ?name ]]. } or, with explicit variables:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows ?a1 . ?a1 foaf:knows ?a2 . 
?a2 foaf:name ?name . } Because someone Alice knows may well know Alice, the example above may include Alice herself. This could be avoided with:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:knows ?y . FILTER ( ?x != ?y ) ?y foaf:name ?name } These two are the same query: the second is just reversing the property direction which swaps the roles of subject and object.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; } { \u0026lt;mailto:alice@example\u0026gt; ^foaf:mbox ?x } Mutual foaf:knows relationships: ?x knows someone who knows ?x\n{ ?x foaf:knows^foaf:knows ?x . } Negated property sets define matching by naming one or more properties that must not match. Match if there is a triple from ?x to ?y which is not rdf:type.\n{ ?x !rdf:type ?y . } { ?x !(rdf:type|^rdf:type) ?y . } Only properties and reverse properties are allowed in a negated property set, not a full path expression.\nComplex Paths Find the names of all the people can be reached from Alice by foaf:knows:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows+/foaf:name ?name . } Again, because of cycles in foaf:knows relationships, it is likely to include Alice herself.\nSome forms of limited inference are possible as well. For example: all types and supertypes of a resource:\n{ \u0026lt;http://example/\u0026gt; rdf:type/rdfs:subClassOf* ?type } All resources and all their inferred types:\n{ ?x rdf:type/rdfs:subClassOf* ?type } Use with Legal SPARQL Syntax A path can parsed, then installed as a property function to be referred to by URI. This way, when the URI is used in the predicate location in a triple pattern, the path expression is evaluated.\nPath path = ... String uri = ... PathLib.install(uri, path) ; For example:\nPath path = PathParser.parse(\u0026quot;rdf:type/rdfs:subClassOf*\u0026quot;, PrefixMapping.Standard) ; String uri = \u0026quot;http://example/ns#myType\u0026quot; ; PathLib.install(uri, path) ; and the SPARQL query:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; # Find the types of :x, following subClassOf SELECT * { :x ns:myType ?t} This also works with if an existing property is redefined (a URI in a path expression is not interpreted as a property function) so, for example, rdf:type can be redefined as a path that also considers RDFS sub -class relationships. The path is a complex path so the property function for rdf:type is not triggered.\nPath path = PathParser.parse(\u0026quot;rdf:type/rdfs:subClassOf*\u0026quot;, PrefixMapping.Standard) ; PathLib.install(RDF.type.getURI(), path) ; ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/property_paths.html","tags":null,"title":"ARQ Property Paths"},{"categories":null,"contents":"The name of the project is “Apache Jena”. That should appear as the first use in a paper and in a reference. After that \u0026ldquo;Jena\u0026rdquo; can be used. It is also a trademark of the Apache Software Foundation. This is also the industry practice.\nThe reference should indicate the website https://jena.apache.org/ (https is preferable). If relevant to reproducibility, or discussing performance, the release version number MUST also be included. The date of access would also be helpful to the reader.\nYou can use names such as “TDB” and “Fuseki” on their own. They are informal names to parts of the whole system. They also change over time and versions. 
You could say “Apache Jena Fuseki” for the triplestore but as the components function as part of the whole, “Apache Jena” would be accurate.\nThe first paper citing Jena is Jena: implementing the semantic web recommendations. That only covers the API and its implementation. Some parts of the system mentioned in that paper were dropped a long time ago (e.g. the “RDB” system). The paper also predates the move to the Apache Software Foundation. It is also good to acknowledge Brian McBride, who started the project.\nHere is an example of what a citation may look like:\nApache Software Foundation, 2021. Apache Jena, Available at: https://jena.apache.org/. ","permalink":"https://jena.apache.org/about_jena/citing.html","tags":null,"title":"Citing Jena"},{"categories":null,"contents":"Apache Jena uses Java\u0026rsquo;s ServiceLoader mechanism to locate initialization steps. The documentation for this process in Jena is available here.\nThere are a number of files (Java resources) in Jena jars named:\nMETA-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle Each has different contents, usually one or two lines.\nWhen making a combined jar (\u0026ldquo;uber-jar\u0026rdquo;, jar with dependencies) from Jena dependencies and application code, the contents of the Jena files must be combined and be present in the combined jar as a Java resource of the same name.\nMaven The Maven shade plugin is capable of doing this process in a build using a \u0026ldquo;transformer\u0026rdquo;.\nApache Jena itself uses this technique to make the combined jar for Fuseki. It uses the Maven shade plugin with a transformer.\nThis is an extract from the POM:\n\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-shade-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;configuration\u0026gt; ... \u0026lt;transformers\u0026gt; \u0026lt;transformer implementation=\u0026#34;org.apache.maven.plugins.shade.resource.ServicesResourceTransformer\u0026#34;/\u0026gt; ... other transformers ... \u0026lt;/transformers\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; See jena-fuseki2/jena-fuseki-server/pom.xml for the complete shade plugin setup used by Fuseki.\nGradle For Gradle, the shadowJar plugin has the mergeServiceFiles operation.\nplugins { ... id \u0026#34;com.github.johnrengelman.shadow\u0026#34; version \u0026#34;7.1.2\u0026#34; } shadowJar { mergeServiceFiles() } ... Manual assembling If doing this manually, create a single file (META-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle) in your application jar containing the lines of all the services resource files. The order does not matter. Jena calls modules in the right order.\n","permalink":"https://jena.apache.org/documentation/notes/jena-repack.html","tags":null,"title":"Combining Apache Jena jars"},{"categories":null,"contents":"Jena includes various command-line utilities which can help you with a variety of tasks in developing Jena-based applications.\nIndex of tools schemagen using schemagen from maven Setting up your Environment An environment variable JENA_HOME is used by all the command line tools to configure the class path automatically for you.
You can set this up as follows:\nOn Linux / Mac\nexport JENA_HOME=the directory you downloaded Jena to export PATH=$PATH:$JENA_HOME/bin On Windows\nSET JENA_HOME =the directory you downloaded Jena to SET PATH=%PATH%;%JENA_HOME%\\bat Running the Tools Once you\u0026rsquo;ve done the above you should now be able to run the tools from the command line like so:\nOn Linux / Mac\nsparql --version On Windows\nsparql.bat --version This command will simply print the versions of Jena and ARQ used in your distribution, all the tools support the --version option. To find out how to use a specific tool add the --help flag instead.\nNote that many examples of using Jena tools typically use the Linux style invocation because most of the Jena developers work on Linux/Mac platforms. When running on windows simply add .bat as an extension to the name of the command line tool to run it, on some versions of Windows this may not be required.\nCommon Issues with Running the Tools If you receive errors stating that a class is not found then it is most likely that JENA_HOME is not set correctly. As a quick check you can try the following to see if it is set appropriately:\nOn Linux / Mac\ncd $JENA_HOME On Windows\ncd %JENA_HOME% If this command fails then JENA_HOME is not correctly set, please ensure you have set it correctly and try again.\nWindows users may experience problems if trying to run the tools when their JENA_HOME path contains spaces in it, there are two workarounds for this:\nMove your Jena installation to a path without spaces Grab the latest scripts from main where they have been fixed to safely handle this. Future releases will include this fix and resolve this issue Command Line Tools Quick Reference riot and Related See Reading and Writing RDF in Apache Jena for more information.\nriot: parse RDF data, guessing the syntax from the file extension. Assumes that standard input is N-Quads/N-Triples unless you tell it otherwise with the --syntax parameter. riot can also do RDFS inferencing, count triples, convert serializations, validate syntax, concatenate datasets, and more.\nturtle, ntriples, nquads, trig, rdfxml: specialized versions of riot that assume that the input is in the named serialization.\nrdfparse: parse an RDF/XML document, for which you can usually just use riot, but this can also pull triples out of rdf:RDF elements embedded at arbitrary places in an XML document if you need to deal with those.\nSPARQL Queries on Local Files and Endpoints See ARQ - Command Line Applications for more about these.\narq and sparql: run a query in a file named as a command line parameter on a dataset in one or more files named as command line parameters.\nqparse: parse a query, report on any problems, and output a pretty-printed version of the query.\nuparse: do the same thing as qparse but for update requests.\nrsparql: send a local query to a SPARQL endpoint specified with a URL, giving you the same choice of output formats that arq does.\nrupdate: send a local update query to a SPARQL endpoint specified with a URL, assuming that is accepting updates from you.\nQuerying and Manipulating Fuseki Datasets The following utilities let you work with data stored using a local Fuseki triplestore. They can be useful for automating queries and updates of data stored there. 
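For example (an illustrative invocation; the assembler file and query file names are placeholders), a TDB2 dataset managed by Fuseki could be queried from a script with:\ntdb2.tdbquery --desc=mydataset.ttl --query=report.rq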
Each requires an assembler file pointing at a dataset as a parameter; Fuseki creates these for you.\nFor each pair of utilities shown, the first is used with data stored using the TDB format and the second with data stored using the newer and more efficient TDB2 format.\nThe TDB and TDB2 - Command Line Tools pages describe these further.\ntdbquery, tdb2.tdbquery: query a dataset that has been stored with Fuseki.\ntdbdump, tdb2.tdbdump: dump the contents of a Fuseki dataset to standard out.\ntdbupdate, tdb2.tdbupdate: run an update request against a Fuseki dataset.\ntdbloader, tdb2.tdbloader: load data from a file into a Fuseki dataset.\ntdbstats, tdb2.tdbstats: output a short report of information about a Fuseki dataset.\ntdbbackup, tdb2.tdbbackup: create a gzipped copy of the Fuseki dataset\u0026rsquo;s triples.\nnot implemented for TDB1, tdb2.tdbcompact: reduce the size of the Fuseki dataset.\nOther Handy Command Line Tools shacl: validate a dataset against a set of shapes and constraints described in a file that conforms to the W3C SHACL standard. Jena\u0026rsquo;s SHACL page has more on this utility.\nshex: validate data using ShEx from the W3C Shape Expressions Community Group. Jena\u0026rsquo;s ShEx page has more on this utility.\nrdfdiff: compare the triples in two datasets, regardless of their serializations, and list which are different between the two datasets. (Modeled on the UNIX diff utility.)\niri: Parse an IRI and tell you about it, with errors and warnings. Good for checking for issues like proper escaping.\n","permalink":"https://jena.apache.org/documentation/tools/","tags":null,"title":"Command-line and other tools for Jena developers"},{"categories":null,"contents":" All datasets provide transactions. This is the preferred way to handle concurrent access to data. Applications need to be aware of the concurrency issues in accessing Jena models. API operations are not thread safe by default. Thread safety would simply ensure that the model data-structures remained intact but would not give an application consistent access to the RDF graph. It would also limit the throughput of multi-threaded applications on multiprocessor machines where true concurrency can lead to a reduction in response time.\nFor example, suppose an application wishes to read the name and age of a person from a model. This takes two API calls. It is more convenient to be able to read that information in a consistent fashion, knowing that the access to the second piece of information is not being done after some model change has occurred.\nSpecial care is needed with iterators. In general, Jena\u0026rsquo;s iterators do not take a copy to enable safe use in the presence of concurrent update. A multi-threaded application needs to be aware of these issues and correctly use the mechanisms that Jena provides (or manage its own concurrency itself). While not zero, the application burden is not high.\nThere are two main cases:\nMultiple threads in the same JVM. Multiple applications accessing the same persistent model (typically, a database). This note describes the support for same-JVM, multi-threaded applications using in-memory Jena Models.\nLocks Locks provide critical section support for managing the interactions of multiple threads in the same JVM. Jena provides multiple-reader/single-writer concurrency support (MRSW).\nThe general pattern is:\nModel model = . . . ; model.enterCriticalSection(Lock.READ) ; // or Lock.WRITE try { ... perform actions on the model ... ...
obey contract - no update operations if a read lock } finally { model.leaveCriticalSection() ; } Applications are expected to obey the lock contract, that is, they must not do update operations if they have a read lock as there can be other application threads reading the model concurrently.\nIterators Care must be taken with iterators: unless otherwise stated, all iterators must be assumed to be iterating over the data-structures in the model or graph implementation itself. It is not possible to safely pass these out of critical sections.\n","permalink":"https://jena.apache.org/documentation/notes/concurrency-howto.html","tags":null,"title":"Concurrent access to Models"},{"categories":null,"contents":"As noted in the overview Jena JDBC drivers are built around a core library which implements much of the common functionality required in an abstract way. This means that it is relatively easy to build a custom driver just by relying on the core library and implementing a minimum of one class.\nCustom Driver class The one and only thing that you are required to do to create a custom driver is to implement a class that extends JenaDriver. This requires you to implement a constructor which simply needs to call the parent constructor with the relevant inputs, one of these is your driver specific connection URL prefix i.e. the foo in jdbc:jena:foo:. Implementation specific prefixes must conform to the regular expression [A-Za-z\\d\\-_]+: i.e. some combination of alphanumerics, hyphens and underscores terminated by a colon.\nAdditionally you must override and implement two abstract methods connect() and getPropertyInfo(). The former is used to produce an instance of a JenaConnection while the latter provides information that may be used by tools to present users with some form of user interface for configuring a connection to your driver.\nAn important thing to note is that this may be all you need to do to create a custom driver, it is perfectly acceptable for your connect() implementation to just return one of the implementations from the built-in drivers. This may be useful if you are writing a driver for a specific store and wish to provide simplified connection URL parameters and create the appropriate connection instance programmatically.\nCustom Connection class The next stage in creating a custom driver (where necessary) is to create a class derived from JenaConnection. This has a somewhat broader set of abstract methods which you will need to implement such as createStatementInternal() and various methods which you may optionally override if you need to deviate from the default behaviors.\nIf you wish to go down this route then we recommend looking at the source for the built in implementations to guide you in this. It may be easier to extend one of the built-in implementations rather than writing an entire custom implementation yourself.\nNote that custom implementations may also require you to implement custom JenaStatement and JenaPreparedStatement implementations.\nTesting your Driver To aid testing your custom driver the jena-jdbc-core module provides a number of abstract test classes which can be derived from in order to provide a wide variety of tests for your driver implementation. 
This is how all the built in drivers are tested so you can check out their test sources for examples of this.\n","permalink":"https://jena.apache.org/documentation/jdbc/custom_driver.html","tags":null,"title":"Creating a Custom Jena JDBC Driver"},{"categories":null,"contents":"Introduction Jena is a moderately complicated system, with several different kinds of Model and ways of constructing them. This note describes the Jena ModelFactory, a one-stop shop for creating Jena models. ModelFactory lives in Java package org.apache.jena.rdf.model.\nThis note is an introduction, not an exhaustive description. As usual consult the Javadoc for details of the methods and classes to use.\nSimple model creation The simplest way to create a model (if not the shortest) is to call ModelFactory.createDefaultModel(). This [by default] delivers a plain RDF model, stored in-memory, that does no inference and has no special ontology interface.\nDatabase model creation For methods of creating models for TDB please see the relevant reference sections.\nInference model creation An important feature of Jena is support for different kinds of inference over RDF-based models (used for RDFS and OWL). Inference models are constructed by applying reasoners to base models and optionally schema. The statements deduced by the reasoner from the base model then appear in the inferred model alongside the statements from the base model itself. RDFS reasoning is directly available:\ncreateRDFSModel(Model base) creates an inference model over the base model using the built-in RDFS inference rules and any RDFS statements in the base model.\ncreateRDFSModel(Model schema, Model base) creates an RDFS inference model from the base model and the supplied schema model. The advantage of supplying the schema separately is that the reasoner may be able to compute useful information in advance on the assumption that the schema won\u0026rsquo;t change, or at least not change as often as the base model.\nIt\u0026rsquo;s possible to use other reasoning systems than RDFS. For these a Reasoner is required:\ncreateInfModel(Reasoner reasoner, Model base) creates an inference model using the rules of reasoner over the model base.\ncreateInfModel(Reasoner reasoner, Model schema, Model base) Just as for the RDFS case, the schema may be supplied separately to allow the reasoner to digest them before working on the model.\nFrom where do you fetch your reasoners? From the reasoner registry, the class ReasonerRegistry. This allows reasoners to be looked up by name, but also provides some predefined access methods for well-know reasoners:\ngetOWLReasoner(): the reasoner used for OWL inference\ngetRDFSReasoner(): the reasoner used for RDFS inference\ngetTransitiveReasoner(): a reasoner for doing subclass and sub-property closure.\nOntology model creation An ontology model is one that presents RDF as an ontology - classes, individuals, different kinds of properties, and so forth. Jena supports RDFS and OWL ontologies through profiles. There is extensive documentation on Jena\u0026rsquo;s ontology support, so all we\u0026rsquo;ll do here is summarise the creation methods.\ncreateOntologyModel() Creates an ontology model which is in-memory and presents OWL ontologies.\ncreateOntologyModel(OntModelSpec spec, Model base) Creates an ontology model according the OntModelSpec spec which presents the ontology of base.\ncreateOntologyModel(OntModelSpec spec, ModelMaker maker, Model base) Creates an OWL ontology model according to the spec over the base model. 
If the ontology model needs to construct additional models (for OWL imports), use the ModelMaker to create them. [The previous method will construct a MemModelMaker for this.]\nWhere do OntModelSpecs come from? There\u0026rsquo;s a cluster of constants in the class which provide for common uses; to name but three:\nOntModelSpec.OWL_MEM_RDFS_INF OWL ontologies, model stored in memory, using RDFS entailment only\nOntModelSpec.RDFS_MEM RDFS ontologies, in memory, but doing no additional inferences\nOntModelSpec.OWL_DL_MEM_RULE_INF OWL ontologies, in memory, with the full OWL Lite inference\nCreating models from Assembler descriptions A model can be built from a description of the required model. This is documented in the assembler howto. Access to the assembler system for model creation is provided by three ModelFactory methods:\nassembleModelFrom( Model singleRoot ): assemble a Model from the single Model description in singleRoot. If there is no such description, or more than one, an exception is thrown. If a description has to be selected from more than one available candidates, consider using the methods below.\nfindAssemblerRoots( Model m ): answer a Set of all the Resources in m which are of type ja:Model, ie descriptions of models to assemble. (Note that this will include sub-descriptions of embedded models if they are present.)\nassembleModelFrom( Resource root ): answer a Model assembled according to the description hanging from root. Assemblers can construct other things as well as models, and the Assembler system is user-extensible: see the howto for details.\nFile-based models The method ModelFactory.createFileModelMaker(String) returns a ModelMaker which attaches models to filing-system files. The String argument is the fileBase. When a file-ModelMaker opens a file, it reads it from a file in the directory named by the fileBase; when the model is closed (and only then, in the current implementation), the contents of the model are written back to the file.\nBecause the names of models in a modelMaker can be arbitrary character strings, in particular URIs, they are translated slightly to avoid confusion with significant characters of common filing systems. In the current implementation,\ncolon : is converted to \\_C slash / is converted to \\_S underbar _ is converted to \\_U ModelMakers Plain models can be given names which allows them to be \u0026ldquo;saved\u0026rdquo; and looked up by name later. This is handled by implementations of the interface ModelMaker; each ModelMaker produces Models of the same kind. The simplest kind of ModelMaker is a memory model maker, which you get by calling ModelFactory.createMemModelMaker(). The methods you\u0026rsquo;d want to use to start with on a ModelMaker are:\ncreateModel(String): create a model with the given name in the ModelMaker. If a model with that name already exists, then that model is used instead.\nopenModel(String): open an existing model with the given name. If no such model exists, create a new empty one and give it that name. [createModel(String) and openModel(String) behave in the same way, but each has a two-argument form for which the behaviour is different. 
Use whichever one best fits your intention.]\ncreateModel(): create a fresh anonymous model.\ngetModel(): each ModelMaker has a default model; this method returns that model.\nThere are other methods, for removing models, additional control over create vs open, closing the maker, and looking names up; for those consult the ModelMaker JavaDoc.\nMiscellany Finally, ModelFactory contains a collection of methods for some special cases not conveniently dealt with elsewhere.\ncreateModelForGraph(Graph g) is used when an advanced user with access to the Jena SPI has constructed or obtained a Graph and wishes to present it as a model. This method wraps the graph up as a plain model. Alterations to the graph are visible in the model, and vice versa.\n","permalink":"https://jena.apache.org/documentation/notes/model-factory.html","tags":null,"title":"Creating Jena models"},{"categories":null,"contents":" This page covers the jena-csv module which has been retired. The last release of Jena with this module is Jena 3.9.0. See jena-csv/README.md. The original documentation.\n","permalink":"https://jena.apache.org/documentation/archive/csv/","tags":null,"title":"CSV PropertyTable"},{"categories":null,"contents":" This page covers the jena-csv module which has been retired. The last release of Jena with this module is Jena 3.9.0. See jena-csv/README.md. This is the original documentation.\nThis module is about getting CSVs into a form that is amenable to Jena SPARQL processing, and doing so in a way that is not specific to CSV files. It includes getting the right architecture in place for regular table shaped data, using the core abstraction of PropertyTable.\nIllustration\nThis module involves the basic mapping of CSV to RDF using a fixed algorithm, including interpreting data as numbers or strings.\nSuppose we have a CSV file located in “file:///c:/town.csv”, which has one header row, two data rows:\nTown,Population Southton,123000 Northville,654000 As RDF this might be viewable as:\n@prefix : \u0026lt;file:///c:/town.csv#\u0026gt; . @prefix csv: \u0026lt;http://w3c/future-csv-vocab/\u0026gt; . [ csv:row 1 ; :Town \u0026quot;Southton\u0026quot; ; :Population “123000”^^http://www.w3.org/2001/XMLSchema#int ] . [ csv:row 2 ; :Town \u0026quot;Northville\u0026quot; ; :Population “654000”^^http://www.w3.org/2001/XMLSchema#int ] . or without the bnode abbreviation:\n@prefix : \u0026lt;file:///c:/town.csv#\u0026gt; . @prefix csv: \u0026lt;http://w3c/future-csv-vocab/\u0026gt; . _:b0 csv:row 1 ; :Town \u0026quot;Southton\u0026quot; ; :Population “123000”^^http://www.w3.org/2001/XMLSchema#int . _:b1 csv:row 2 ; :Town \u0026quot;Northville\u0026quot; ; :Population “654000”^^http://www.w3.org/2001/XMLSchema#int. Each row is modeling one \u0026ldquo;entity\u0026rdquo; (here, a population observation). There is a subject (a blank node) and one predicate-value for each cell of the row. Row numbers are added because it can be important. Now the CSV file is viewed as a graph - normal, unmodified SPARQL can be used. Multiple CSVs files can be multiple graphs in one dataset to give query across different data sources.\nWe can use the following SPARQL query for “Towns over 500,000 people” mentioned in the CSV file:\nSELECT ?townName ?pop { GRAPH \u0026lt;file:///c:/town.csv\u0026gt; { ?x :Town ?townName ; :Popuation ?pop . FILTER(?pop \u0026gt; 500000) } } What\u0026rsquo;s more, we make some room for future extension through PropertyTable. 
The architecture is designed to be able to accommodate any table-like data sources, such as relational databases, Microsoft Excel, etc.\nDocumentation Get Started Design Implementation ","permalink":"https://jena.apache.org/documentation/archive/csv/csv_index.html","tags":null,"title":"CSV PropertyTable"},{"categories":null,"contents":"Architecture The architecture of CSV PropertyTable mainly involves 2 components:\nPropertyTable GraphPropertyTable PropertyTable A PropertyTable is collection of data that is sufficiently regular in shape it can be treated as a table. That means each subject has a value for each one of the set of properties. Irregularity in terms of missing values needs to be handled but not multiple values for the same property. With special storage, a PropertyTable\nis more compact and more amenable to custom storage (e.g. a JSON document store) can have custom indexes on specific columns can guarantee access orders More explicitly, PropertyTable is designed to be a table of RDF terms, or Nodes in Jena. Each Column of the PropertyTable has an unique columnKey Node of the predicate (or p for short). Each Row of the PropertyTable has an unique rowKey Node of the subject (or s for short). You can use getColumn() to get the Column by its columnKey Node of the predicate, while getRow() for Row.\nA PropertyTable should be constructed in this workflow (in order):\nCreate Columns using PropertyTable.createColumn() for each Column of the PropertyTable Create Rows using PropertyTable.createRow() for each Row of the PropertyTable For each Row created, set a value (Node) at the specified Column, by calling Row.setValue() Once a PropertyTable is built, tabular data within can be accessed by the API of PropertyTable.getMatchingRows(), PropertyTable.getColumnValues(), etc.\nGraphPropertyTable GraphPropertyTable implements the Graph interface (read-only) over a PropertyTable. This is subclass from GraphBase and implements find(). The graphBaseFind()(for matching a Triple) and propertyTableBaseFind()(for matching a whole Row) methods can choose the access route based on the find arguments. GraphPropertyTable holds/wraps a reference of the PropertyTable instance, so that such a Graph can be treated in a more table-like fashion.\nNote: Both PropertyTable and GraphPropertyTable are NOT restricted to CSV data. They are supposed to be compatible with any table-like data sources, such as relational databases, Microsoft Excel, etc.\nGraphCSV GraphCSV is a sub class of GraphPropertyTable aiming at CSV data. Its constructor takes a CSV file path as the parameter, parse the file using a CSV Parser, and makes a PropertyTable through PropertyTableBuilder.\nFor CSV to RDF mapping, we establish some basic principles:\nSingle-Value and Regular-Shaped CSV Only In the CSV-WG, it looks like duplicate column names are not going to be supported. Therefore, we just consider parsing single-valued CSV tables. There is the current editor working draft from the CSV on the Web Working Group, which is defining a more regular data out of CSV. This is the target for the CSV work of GraphCSV: tabular regular-shaped CSV; not arbitrary, irregularly shaped CSV.\nNo Additional CSV Metadata A CSV file with no additional metadata is directly mapped to RDF, which makes a simpler case compared to SQL-to-RDF work. It\u0026rsquo;s not necessary to have a defined primary column, similar to the primary key of database. 
The subject of the triple can be generated through one of:\nThe triples for each row have a blank node for the subject, e.g. something like the illustration The triples for row N have a subject URI which is \u0026lt;FILE#_N\u0026gt;. Data Type for Typed Literal All the values in CSV are parsed as strings line by line. As a better option for the user to turn on, a dynamic choice which is a posh way of saying attempt to parse it as an integer (or decimal, double, date) and if it passes, it\u0026rsquo;s an integer (or decimal, double, date). Note that for the current release, all of the numbers are parsed as double, and date is not supported yet.\nFile Path as Namespace RDF requires that the subjects and the predicates are URIs. We need to pass in the namespaces (or just the default namespaces) to make URIs by combining the namespaces with the values in CSV. We don’t have metadata of the namespaces for the columns, But subjects can be blank nodes which is useful because each row is then a new blank node. For predicates, suppose the URL of the CSV file is file:///c:/town.csv, then the columns can be \u0026lt;file:///c:/town.csv#Town\u0026gt; and \u0026lt;file:///c:/town.csv#Population\u0026gt;, as is showed in the illustration.\nFirst Line of Table Header Needed as Predicates The first line of the CSV file must be the table header. The columns of the first line are parsed as the predicates of the RDF triples. The RDF triple data are parsed starting from the second line.\nUTF-8 Encoded Only The CSV files must be UTF-8 encoded. If your CSV files are using Western European encodings, please change the encoding before using CSV PropertyTable.\n","permalink":"https://jena.apache.org/documentation/archive/csv/design.html","tags":null,"title":"CSV PropertyTable - Design"},{"categories":null,"contents":"Using CSV PropertyTable with Apache Maven See \u0026ldquo;Using Jena with Apache Maven\u0026rdquo; for full details.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-csv\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Using CSV PropertyTable from Java through the API In order to switch on CSV PropertyTable, it\u0026rsquo;s required to register LangCSV into Jena RIOT, through a simple method call:\nimport org.apache.jena.propertytable.lang.CSV2RDF; ... CSV2RDF.init() ; It\u0026rsquo;s a static method call of registration, which needs to be run just one time for an application before using CSV PropertyTable (e.g. during the initialization phase).\nOnce registered, CSV PropertyTable provides 2 ways for the users to play with (i.e. GraphCSV and RIOT):\nGraphCSV GraphCSV wrappers a CSV file as a Graph, which makes a Model for SPARQL query:\nModel model = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data.csv\u0026quot;)) ; QueryExecution qExec = QueryExecutionFactory.create(query, model) ; or for multiple CSV files and/or other RDF data:\nModel csv1 = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data1.csv\u0026quot;)) ; Model csv2 = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data2.csv\u0026quot;)) ; Model other = ModelFactory.createModelForGraph(otherGraph) ; Dataset dataset = ... ; dataset.addNamedModel(\u0026quot;http://example/table1\u0026quot;, csv1) ; dataset.addNamedModel(\u0026quot;http://example/table2\u0026quot;, csv2) ; dataset.addNamedModel(\u0026quot;http://example/other\u0026quot;, other) ; ... 
normal SPARQL execution ... You can also find the full examples from GraphCSVTest.\nIn short, for Jena ARQ, a CSV table is actually a Graph (i.e. GraphCSV), without any differences from other types of Graphs when using it from the Jena ARQ API.\nRIOT When LangCSV is registered into RIOT, CSV PropertyTable adds a new RDF syntax of \u0026lsquo;.csv\u0026rsquo; with the content type of \u0026ldquo;text/csv\u0026rdquo;. You can read \u0026ldquo;.csv\u0026rdquo; files into Model following the standard RIOT usages:\n// Usage 1: Direct reading through Model Model model_1 = ModelFactory.createDefaultModel() model.read(\u0026quot;test.csv\u0026quot;) ; // Usage 2: Reading using RDFDataMgr Model model_2 = RDFDataMgr.loadModel(\u0026quot;test.csv\u0026quot;) ; For more information, see Reading RDF in Apache Jena.\nNote that, the requirements for the CSV files are listed in the documentation of Design. CSV PropertyTable only supports single-Value, regular-Shaped, table-headed and UTF-8-encoded CSV files (NOT Microsoft Excel files).\nCommand Line Tool csv2rdf is a tool for direct transforming from CSV to the formatted RDF syntax of N-Triples. The script calls the csv2rdf java program in the riotcmd package in this way:\njava -cp ... riotcmdx.csv2rdf inputFile ... It transforms the CSV inputFile into N-Triples. For example,\njava -cp ... riotcmdx.csv2rdf src/test/resources/test.csv The script reuses Common framework for running RIOT parsers, so that it also accepts the same arguments (type \u0026quot;riot --help\u0026quot; to get command line reminders) from RIOT Command line tools:\n--validate: Checking mode: same as --strict --sink --check=true --check=true/false: Run with checking of literals and IRIs either on or off. --sink: No output of triples or quads in the standard output (i.e. System.out). --time: Output timing information. ","permalink":"https://jena.apache.org/documentation/archive/csv/get_started.html","tags":null,"title":"CSV PropertyTable - Get Started"},{"categories":null,"contents":"PropertyTable Implementations There are 2 implementations for PropertyTable. The pros and cons are summarised in the following table:\nPropertyTable Implementation Description Supported Indexes Advantages Disadvantages PropertyTableArrayImpl implemented by a two-dimensioned Java array of Nodes SPO, PSO compact memory usage, fast for querying with S and P, fast for query a whole Row slow for query with O, table Row/Column size provided PropertyTableHashMapImpl implemented by several Java HashMaps PSO, POS fast for querying with O, table Row/Column size not required more memory usage for HashMaps By default, [PropertyTableArrayImpl]((https://github.com/apache/jena/tree/main/jena-csv/src/main/java/org/apache/jena/propertytable/impl/PropertyTableArrayImpl.java) is used as the PropertyTable implementation held by GraphCSV. If you want to switch to PropertyTableHashMapImpl, just use the static method of GraphCSV.createHashMapImpl() to replace the default new GraphCSV() way. Here is an example:\nModel model_csv_array_impl = ModelFactory.createModelForGraph(new GraphCSV(file)); // PropertyTableArrayImpl Model model_csv_hashmap_impl = ModelFactory.createModelForGraph(GraphCSV.createHashMapImpl(file)); // PropertyTableHashMapImpl StageGenerator Optimization for GraphPropertyTable Accessing from SPARQL via Graph.find() will work, but it\u0026rsquo;s not ideal. Some optimizations can be done for processing a SPARQL basic graph pattern. 
More explicitly, in the method of OpExecutor.execute(OpBGP, ...), when the target for the query is a GraphPropertyTable, it can get a whole Row, or Rows, of the table data and match the pattern with the bindings.\nThe optimization of querying a whole Row in the PropertyTable are supported now. The following query pattern can be transformed into a Row querying, without generating triples:\n?x :prop1 ?v . ?x :prop2 ?w . ... It\u0026rsquo;s made by using the extension point of StageGenerator, because it\u0026rsquo;s now just concerned with BasicPattern. The detailed workflow goes in this way:\nSplit the incoming BasicPattern by subjects, (i.e. it becomes multiple sub BasicPatterns grouped by the same subjects. (see QueryIterPropertyTable ) For each sub BasicPattern, if the Triple size within is greater than 1 (i.e. at least 2 Triples), it\u0026rsquo;s turned into a Row querying, and processed by QueryIterPropertyTableRow, else if it contains only 1 Triple, it goes for the traditional Triple querying by graph.graphBaseFind() In order to turn on this optimization, we need to register the StageGeneratorPropertyTable into ARQ context, before performing SPARQL querying:\nStageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ; StageGenerator stageGenerator = new StageGeneratorPropertyTable(orig) ; StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ; ","permalink":"https://jena.apache.org/documentation/archive/csv/implementation.html","tags":null,"title":"CSV PropertyTable - Implementation"},{"categories":null,"contents":"Fuseki can provide access control at the level on the server, on datasets, on endpoints and also on specific graphs within a dataset. It also provides native https to protect data in-flight.\nFuseki Main provides some common patterns of authentication and also Graph level Data Access Control to provide control over the visibility of graphs within a dataset, including the union graph of a dataset and the default graph. Currently, Graph level access control only applies to read-only datasets.\nFuseki Full (Fuseki with the UI) can be used when run in a web application server such as Tomcat to provide authentication of the user. See \u0026ldquo;Fuseki Security\u0026rdquo; for configuring security over the whole of the Fuseki UI.\nThis page applies to Fuseki Main.\nHTTPS HTTPS support is configured from the fuseki server command line.\nServer Argument \u0026ndash;https=SETUP Name of file for certificate details. \u0026ndash;httpsPort=PORT The port for https Default: 3043 The --https argument names a file in JSON which includes the name of the certificate file and password for the certificate.\nHTTPS certificate details file The file is a simple JSON file:\n{ \u0026#34;cert\u0026#34;: KEYSTORE, \u0026#34;passwd\u0026#34;: SECRET } This file must be protected by file access settings so that it can only be read by the userid running the server. One way is to put the keystore certificate and the certificate details file in the same directory, then make the directory secure.\nSelf-signed certificates A self-signed certificate provides an encrypted link to the server and stops some attacks. What it does not do is guarantee the identity of the host name of the Fuseki server to the client system. A signed certificate provides that through the chain of trust. 
A self-signed certificate does protect data in HTTP responses.\nA self-signed certificate can be generated with:\n$ keytool -keystore $keystore -alias jetty -genkey -keyalg RSA For information on creating a certificate, see the Jetty documentation for generating certificates.\nAuthentication Authentication, is establishing the identity of the principal (user or program) accessing the system. Fuseki Main provides users/password setup and HTTP authentication, digest or basic).\nThese should be used with HTTPS.\nServer Argument \u0026ndash;passwd=FILE Password file \u0026ndash;auth= \u0026ldquo;basic\u0026rdquo; or \u0026ldquo;digest\u0026rdquo; Default is \u0026ldquo;digest\u0026rdquo; These can also be given in the server configuration file:\n\u0026lt;#server\u0026gt; rdf:type fuseki:Server ; fuseki:passwd \u0026#34;\u0026lt;i\u0026gt;password_file\u0026lt;/i\u0026gt;\u0026#34; ; fuseki:auth \u0026#34;\u0026lt;i\u0026gt;digest\u0026lt;/i\u0026gt;\u0026#34; ; ... The format of the password file is:\nusername: password and passwords can be stored in hash or obfuscated form.\nDocumentation of the Eclipse Jetty Password file format.\nIf different authentication is required, the full facilities of Eclipse Jetty configuration are available - see the section below.\nUsing curl See the curl documentation for full details. This section is a brief summary of some relevant options:\ncurl argument Value \u0026ndash; -n, --netrc Take passwords from .netrc (_netrc on windows) --user= user:password Set the user and password (visible to all on the local machine) --anyauth Use server nominated authentication scheme --basic Use HTTP basic auth --digest Use HTTP digest auth -k, --insecure Don\u0026rsquo;t check HTTPS certificate. This allows for self-signed or expired certificates, or ones with the wrong host name. Using wget See the wget documentation for full details. This section is a brief summary of some relevant options:\nwget argument Value \u0026ndash; --http-user user name Set the user. --http-password password Set the password (visible to all on the local machine) wget uses users/password from .wgetrc or .netrc by default. --no-check-certificate Don\u0026rsquo;t check HTTPS certificate. This allows for self-signed or expired, certificates or ones with the wrong host name. Access Control Lists ACLs can be applied to the server as a whole, to a dataset, to endpoints, and to graphs within a dataset. This section covers server, dataset and endpoint access control lists. Graph-level access control is covered below.\nAccess control lists (ACL) as part of the server configuration file.\n$ fuseki --conf configFile.ttl ACLs are provided by the fuseki:allowedUsers property\nFormat of fuseki:allowedUsers The list of users allowed access can be an RDF list or repeated use of the property or a mixture. The different settings are combined into one ACL.\nfuseki:allowedUsers \u0026#34;user1\u0026#34;, \u0026#34;user2\u0026#34;, \u0026#34;user3\u0026#34;; fuseki:allowedUsers \u0026#34;user3\u0026#34;; fuseki:allowedUsers ( \u0026#34;user1\u0026#34; \u0026#34;user2\u0026#34; \u0026#34;user3\u0026#34;) ; There is a special user name \u0026ldquo;*\u0026rdquo; which means \u0026ldquo;any authenticated user\u0026rdquo;.\nfuseki:allowedUsers \u0026#34;*\u0026#34; ; Server Level ACLs \u0026lt;#server\u0026gt; rdf:type fuseki:Server ; \u0026lt;b\u0026gt;fuseki:allowedUsers \u0026#34;user1\u0026#34;, \u0026#34;user2\u0026#34;, \u0026#34;user3\u0026#34;;\u0026lt;/b\u0026gt; ... fuseki:services ( ... ) ; ... . 
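Such a list is normally combined with a password file and an authentication scheme given on the command line, for example (illustrative file name):\n$ fuseki --conf configFile.ttl --passwd passwd.txt --auth digest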
A useful pattern is:\n\u0026lt;#server\u0026gt; rdf:type fuseki:Server ; \u0026lt;b\u0026gt;fuseki:allowedUsers \u0026#34;*\u0026#34;;\u0026lt;/b\u0026gt; ... fuseki:services ( ... ) ; ... . which requires all access to to be authenticated and the allowed users are those in the password file.\nDataset Level ACLs When there is an access control list on the fuseki:Service, it applies to all requests to the endpoints of the dataset.\nAny server-wide \u0026ldquo;allowedUsers\u0026rdquo; configuration also applies and both levels must allow the user access.\n\u0026lt;#service_auth\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026#34;ACL controlled dataset\u0026#34; ; fuseki:name \u0026#34;db-acl\u0026#34; ; # ACL here. fuseki:allowedUsers \u0026#34;user1\u0026#34;, \u0026#34;user3\u0026#34;; ## Choice of operations. fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026#34;sparql\u0026#34; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026#34;sparql\u0026#34; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \u0026#34;get\u0026#34; ] ; fuseki:dataset \u0026lt;#base_dataset\u0026gt;; . Endpoint Level ACLs An access control list can be applied to an individual endpoint. Again, any other \u0026ldquo;allowedUsers\u0026rdquo; configuration, service-wide, or server-wide) also applies.\nfuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026#34;query\u0026#34; ; fuseki:allowedUsers \u0026#34;user1\u0026#34;, \u0026#34;user2\u0026#34; ; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026#34;update\u0026#34; ; fuseki:allowedUsers \u0026#34;user1\u0026#34; ] ; Only user1 can use SPARQL update; both user1 and user2 can use SPARQL query.\nGraph Access Control Lists Graph level access control is defined using a specific dataset implementation for the service.\n\u0026lt;#access_dataset\u0026gt; rdf:type access:AccessControlledDataset ; access:registry ... ; access:dataset ... ; . Graph ACLs are defined in a Graph Security Registry which lists the users and graph URIs.\n\u0026lt;#service_tdb2\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026#34;Graph-level access controlled dataset\u0026#34; ; fuseki:name \u0026#34;db-graph-acl\u0026#34; ; ## Read-only operations on the dataset URL. fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ] ; fuseki:dataset \u0026lt;b\u0026gt;\u0026lt;#access_dataset\u0026gt;\u0026lt;/b\u0026gt; ; . # Define access on the dataset. \u0026lt;#access_dataset\u0026gt; rdf:type access:AccessControlledDataset ; access:registry \u0026lt;#securityRegistry\u0026gt; ; access:dataset \u0026lt;#tdb_dataset_shared\u0026gt; ; . \u0026lt;#securityRegistry\u0026gt;rdf:type access:SecurityRegistry ; . . . \u0026lt;#tdb_dataset_shared\u0026gt; rdf:type tdb:DatasetTDB ; . . . All dataset storage types are supported. TDB1 and TDB2 have special implementations for handling graph access control.\nGraph Security Registry The Graph Security Registry is defined as a number of access entries in either a list format \u0026ldquo;(user graph1 graph2 \u0026hellip;)\u0026rdquo; or as RDF properties access:user and access:graphs. 
The property access:graphs has graph URI or a list of URIs as its object.\n\u0026lt;#securityRegistry\u0026gt; rdf:type access:SecurityRegistry ; access:entry ( \u0026#34;user1\u0026#34; \u0026lt;http://host/graphname1\u0026gt; \u0026lt;http://host/graphname2\u0026gt; ) ; access:entry ( \u0026#34;user1\u0026#34; \u0026lt;http://host/graphname3\u0026gt; ) ; access:entry ( \u0026#34;user1\u0026#34; \u0026lt;urn:x-arq:DefaultGraph\u0026gt; ) ; access:entry ( \u0026#34;user2\u0026#34; \u0026lt;http://host/graphname9\u0026gt; ) ; access:entry [ access:user \u0026#34;user3\u0026#34; ; access:graphs ( \u0026lt;http://host/graphname3\u0026gt; \u0026lt;http://host/graphname4\u0026gt; ) ] ; access:entry [ access:user \u0026#34;user3\u0026#34; ; access:graphs \u0026lt;http://host/graphname5\u0026gt; ] ; access:entry [ access:user \u0026#34;userZ\u0026#34; ; access:graphs \u0026lt;http://host/graphnameZ\u0026gt; ] ; . Jetty Configuration For authentication configuration not covered by Fuseki configuration, the deployed server can be run using a Jetty configuration.\nServer command line: --jetty=jetty.xml.\nDocumentation for jetty.xml.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-data-access-control.html","tags":null,"title":"Data Access Control for Fuseki"},{"categories":null,"contents":"This page describes support for accessing data with additional statements derived using RDFS. It supports rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range. It does not provide RDF axioms. The RDFS vocabulary is not included in the data.\nIt does support use with RDF datasets, where each graph in the dataset has the same RDFS vocabulary applied to it.\nThis is not a replacement for the Jena RDFS Reasoner support which covers full RDFS inference.\nThe data is updateable, and graphs can be added and removed from the dataset. The vocabulary can not be changed during the lifetime of the RDFS dataset.\nAPI: RDFSFactory The API provides operation to build RDF-enabled datasets from data storage and vocabularies:\nExample:\nDatasetGraph data = ... // Load the vocabulary Graph vocab = RDFDataMgr.loadGraph(\u0026#34;vocabulary.ttl\u0026#34;); // Create a DatasetGraph with RDFS DatasetGraph dsg = datasetRDFS(DatasetGraph data, Graph vocab ); // (Optional) Present as a Dataset. Dataset dataset = DatasetFactory.wrap(dsg); The vocabulary is processed to produce datastructure needed for processing the data efficiently at run time. This is the SetupRDFS class that can be created and shared; it is thread-safe.\nSetupRDFS setup = setupRDFS(vocab); Assembler: RDFS Dataset Datasets with RDFS can be built with an assembler:\n\u0026lt;#rdfsDS\u0026gt; rdf:type ja:DatasetRDFS ; ja:rdfsSchema \u0026lt;vocabulary.ttl\u0026gt;; ja:dataset \u0026lt;#baseDataset\u0026gt; ; . \u0026lt;#baseDataset\u0026gt; rdf:type ...some dataset type ... ; ... . where \u0026lt;#baseDataset\u0026gt; is the definition of the dataset to be enriched.\nAssembler: RDFS Graph It is possible to build a single Model:\n\u0026lt;#rdfsGraph\u0026gt; rdf:type ja:GraphRDFS ; ja:rdfsSchema \u0026lt;vocabulary.ttl\u0026gt;; ja:graph \u0026lt;#baseGraph\u0026gt; ; . \u0026lt;#baseGraph\u0026gt; rdf:type ja:MemoryModel; ... 
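However the RDFS-enriched dataset or graph is built, it is queried like any other dataset or model; a minimal sketch (the query is illustrative) reusing the dsg variable from the RDFSFactory example above:\nDataset dataset = DatasetFactory.wrap(dsg) ;
Query query = QueryFactory.create(\u0026#34;PREFIX ns: \u0026lt;http://example/ns#\u0026gt; SELECT * { ?s a ns:T3 }\u0026#34;) ;
try ( QueryExecution qExec = QueryExecutionFactory.create(query, dataset) ) {
    // RDFS-derived statements, such as inferred types, appear alongside the asserted data.
    ResultSetFormatter.out(qExec.execSelect()) ;
}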
More generally, inference models can be defined using the Jena Inference and Rule engine: jena-fuseki2/examples/config-inference-1.ttl.\nUse with Fuseki The files for this example are available at: jena-fuseki2/examples/rdfs.\nFrom the command line (here, loading data from a file into an in-memory dataset):\nfuseki-server --data data.trig --rdfs vocabulary.ttl /dataset or from a configuration file with an RDFS Dataset:\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; [] rdf:type fuseki:Server ; fuseki:services ( :service ) . ## Fuseki service /dataset with SPARQ query ## /dataset?query= :service rdf:type fuseki:Service ; fuseki:name \u0026#34;dataset\u0026#34; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:update ] ; fuseki:dataset :rdfsDataset ; . ## RDFS :rdfsDataset rdf:type ja:DatasetRDFS ; ja:rdfsSchema \u0026lt;file:vocabulary.ttl\u0026gt;; ja:dataset :baseDataset; . ## Transactional in-memory dataset. :baseDataset rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data.trig\u0026gt;; . Querying the Fuseki server With the SOH tools, a query (asking for plain text output):\ns-query --service http://localhost:3030/dataset --output=text --file query.rq or with curl:\ncurl --data @query.rq \\ --header \u0026#39;Accept: text/plain\u0026#39; \\ --header \u0026#39;Content-type: application/sparql-query\u0026#39; \\ http://localhost:3030/dataset will return:\n------------------------- | s | p | o | ========================= | :s | ns:p | :o | | :s | rdf:type | ns:D | | :o | rdf:type | ns:T1 | | :o | rdf:type | ns:T3 | | :o | rdf:type | ns:T2 | ------------------------- Files data.trig:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; :s ns:p :o . vocabulary.ttl:\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX skos: \u0026lt;http://www.w3.org/2008/05/skos#\u0026gt; PREFIX list: \u0026lt;http://jena.hpl.hp.com/ARQ/list#\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; ns:T1 rdfs:subClassOf ns:T2 . ns:T2 rdfs:subClassOf ns:T3 . ns:p rdfs:domain ns:D . ns:p rdfs:range ns:T1 . query.rq:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; SELECT * { ?s ?p ?o } ","permalink":"https://jena.apache.org/documentation/rdfs/","tags":null,"title":"Data with RDFS Inferencing"},{"categories":null,"contents":"ModelChangedListener In Jena it is possible to monitor a Model for changes, so that code can be run after changes are applied without the coding for that Model having to do anything special. We call these changes \u0026ldquo;events\u0026rdquo;. 
This first design and implementation is open for user comment and we may refine or reduce the implementation as more experience is gained with it.\nTo monitor a Model, you must register a ModelChangedListener with that Model:\nModel m = ModelFactory.createDefaultModel(); ModelChangedListener L = new MyListener(); m.register( L ); MyListener must be an implementation of ModelChangedListener, for example:\nclass MyListener implements ModelChangedListener { public void addedStatement( Statement s ) { System.out.println( \u0026#34;\u0026gt;\u0026gt; added statement \u0026#34; + s ); } public void addedStatements( Statement [] statements ) {} public void addedStatements( List statements ) {} public void addedStatements( StmtIterator statements ) {} public void addedStatements( Model m ) {} public void removedStatement( Statement s ) {} public void removedStatements( Statement [] statements ) {} public void removedStatements( List statements ) {} public void removedStatements( StmtIterator statements ) {} public void removedStatements( Model m ) {} } This listener ignores everything except the addition of single statements to m; those it prints out. The listener has a method for each of the ways that statements can be added to a Model:\nas a single statement, Model::add(Statement) as an element of an array of statements, Model::add(Statement[]) as an element of a list of statements, Model::add(List) as an iterator over statements, Model::add(StmtIterator) as part of another Model, Model::add(Model) (Similarly for delete.)\nThe listener method is called when the statement(s) have been added to the Model, if no exceptions have been thrown. It does not matter if the statement was already in the Model or not; it is the act of adding it that fires the listener.\nThere is no guarantee that the statement, array, list, or model that is added or removed is the same one that is passed to the appropriate listener method, and the StmtIterator will never be the same one. However, in the current design:\na single Statement will be .equals to the original Statement a List will be .equals to the original List a Statement[] will be the same length and have .equal elements in the same order a StmtIterator will deliver .equal elements in the same order a Model will contain the same statements We advise not relying on these ordering properties; instead assume that for any bulk update operation on the model, the listener will be told the method of update and the statements added or removed, but that the order may be different and duplicate statements may have been removed. Note in particular that a Model with any Listeners will have to record the complete contents of any StmtIterator that is added or removed to the model, so that the Model and the Listener can both see all the statements.\nFinally, there is no guarantee that only Statements etc added through the Model API will be presented to the listener; any Triples added to its underlying Graph will also be presented to the listener as statements.\nUtility classes The full Listener API is rather chunky and it can be inconvenient to use, especially for the creation of inline classes. There are four utility classes in org.apache.jena.rdf.listeners:\nNullListener. This class\u0026rsquo;s methods do nothing. This is useful when you want to subclass and intercept only specific ways of updating a Model. ChangedListener. This class only records whether some change has been made, but not what it is. 
The method hasChanged() returns true if some change has been made since the last call of hasChanged() [or since the listener was created]. StatementListener. This class translates all bulk update calls (ie the ones other than addedStatement() and removedStatement()) into calls to addedStatement()/removedStatement() for each Statement in the collection. This allows statements to be tracked whether they are added one at a time or in bulk. ObjectListener. This class translates all the listener calls into added(Object) or removed(Object) as appropriate; it is left to the user code to distinguish among the types of argument. When listeners are called In the current implementation, listener methods are called immediately the additions or removals are made, in the same thread as the one making the update. If a model has multiple listeners registered, the order in which they are informed about an update is unspecified and may change from update to update. If any listener throws an exception, that exception is thrown through the update call, and other listeners may not be informed of the update. Hence listener code should be brief and exception-free if at all possible.\nRegistering and unregistering A listener may be registered with the same model multiple times. If so, it will be invoked as many times as it is registered for each update event on the model.\nA listener L may be unregistered from a Model using the method unregister(L). If L is not registered with the model, nothing happens.\nIf a listener is registered multiple times with the same model, each unregister() for that listener will remove just one of the registrations.\nTransactions and databases In the current design, listeners are not informed of transaction boundaries, and all events are fed to listeners as soon as they happen.\n","permalink":"https://jena.apache.org/documentation/notes/event-handler-howto.html","tags":null,"title":"Event handling in Jena"},{"categories":null,"contents":"Optimization in ARQ proceeds on two levels. After the query is parsed, the SPARQL algebra for the query is generated as described in the SPARQL specification. High-level optimization occurs by rewriting the algebra into new, equivalent algebra forms and introducing specialized algebra operators. During query execution, the low-level, storage-specific optimization occurs such as choosing the order of triple patterns within basic graph patterns.\nThe effect of high-level optimizations can be seen using arq.qparse and the low-level runtime optimizations can be seen by execution logging.\nAlgebra Transformations The preparation for a query for execution can be investigated with the command arq.qparse --explain --query QueryFile.rq. Different storage systems may perform different optimizations, usually chosen from the standard set. qparse shows the action of the memory-storage optimizer which applies all optimizations.\nOther useful arguments are:\nqparse arguments\nArgument Effect --print=query Print the parsed query --print=op Print the SPARQL algebra for the query. This is exactly the algebra specified by the SPARQL standard. --print=opt Print the optimized algebra for the query. --print=quad Print the quad form algebra for the query. --print=optquad Print the quad-form optimized algebra for the query. 
The argument --explain is equivalent to --print=query --print=opt\nExamples:\narq.qparse --explain --query Q.rq arq.qparse --explain \u0026#39;SELECT * { ?s ?p ?o }\u0026#39; Execution Logging ARQ can log query and update execution details globally or for individual operations. This adds another level of control on top of the logger level controls.\nFrom command line:\narq.sparql --explain --data ... --query ... Explanatory messages are controlled by the Explain.InfoLevel level in the execution context.\nExecution logging at level ALL can cause a significant slowdown in query execution speeds but the order of operations logged will be correct.\nThe logger used is called org.apache.jena.arq.exec. Messages are sent at level \u0026ldquo;info\u0026rdquo;. So for log4j2, the following can be set in the log4j2.properties file:\nlogger.arq-exec.name = org.apache.jena.arq.exec logger.arq-exec.level = INFO The context setting is for key (Java constant) ARQ.symLogExec. To set globally:\nARQ.setExecutionLogging(Explain.InfoLevel.ALL); and it may also be set on an individual query execution using its local context.\ntry(QueryExecution qExec = QueryExecution.create() ... .set(ARQ.symLogExec, Explain.InfoLevel.ALL).build() ) { ResultSet rs = qExec.execSelect() ; ... } On the command line:\narq.query --explain --data datafile --query=queryfile The command tdbquery takes the same --explain argument.\nLogging information levels: see the logging page\nTo get ARQ query explanation in Fuseki logs see Fuseki logging\n","permalink":"https://jena.apache.org/documentation/query/explain.html","tags":null,"title":"Explaining ARQ queries"},{"categories":null,"contents":"There are several ways to extend the ARQ query engine within the SPARQL syntax.\nExpression Functions - additional operations in FILTERS, BIND and SELECT expressions. Property functions - adding predicates that introduce custom query stages DESCRIBE handlers Support for finding blank nodes by label Extending query evaluation for querying different storage and inference systems Functions are a standard part of SPARQL. ARQ supports application-written functions and provides a function library. Applications can write and register their own functions.\nProperty functions provide a way to perform custom matching of particular predicates. They enable triple matches to be calculated, rather than just looked up in an RDF graph, and they are a way to add functionality and remain within SPARQL. ARQ has a property function library. Applications can write and register their own property functions.\nThe free text support in ARQ is provided by Lucene, using property functions.\nFilter Functions A SPARQL custom function is implementation dependent. Most details of the ARQ query engine do not have to be understood to write a function; it is a matter of implementing one interface. This is made simpler for many cases by a number of base classes that provide much of the machinery needed.\nFunction Registry Functions can be installed into the function registry by the application. The function registry is a mapping from URI to a factory class for functions (each time a function is mentioned in a query, a new instance is created) and there is an auto-factory class so just registering a Java class will work. A function can access the queried dataset.\nDynamically Loaded Functions The ARQ function library uses this mechanism.
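For example, a minimal application-written filter function and its registration with the function registry described above might look like the sketch below. The URI, class name and behaviour are illustrative only, not part of ARQ; the argument is expected to be a string literal.

// Assumed imports: org.apache.jena.sparql.function.FunctionBase1,
// org.apache.jena.sparql.function.FunctionRegistry, org.apache.jena.sparql.expr.NodeValue.
public class ToUpper extends FunctionBase1 {
    @Override
    public NodeValue exec(NodeValue v) {
        // One-argument function: upper-case the string value of the argument.
        return NodeValue.makeString(v.getString().toUpperCase());
    }
}
// Register under a URI; the auto-factory creates a new instance for each mention in a query.
FunctionRegistry.get().put("http://example/function#toUpper", ToUpper.class);

The function can then be called in a FILTER, BIND or SELECT expression by that URI, in the same way as the library functions described next.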
The namespace of the ARQ function library is \u0026lt;http://jena.apache.org/ARQ/function#\u0026gt;.\nPREFIX afn: \u0026lt;http://jena.apache.org/ARQ/function#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; SELECT ?v { ?x dc:date ?date . FILTER (?date \u0026lt; afn:now() ) } The afn:now returns the time the query started.\nThe expression functions in the ARQ distribution are described on the expression function library page.\nURIs for functions in the (fake) URI scheme java: are dynamically loaded. The class name forms the scheme specific part of the URI.\nProperty functions Property functions, sometimes called \u0026ldquo;magic properties\u0026rdquo;, are properties that cause triple matching to happen by executing some piece of code, determined by the property URI, and not by the usual graph matching. They can be used to give certain kinds of inference and rule processing. Some calculated properties have additional, non-declarative requirements such as needing one of other of the subject or object to be a query constant or a bound value, and not able to generate all possibilities for that slot.\nProperty functions must have fixed URI for the predicate (it can\u0026rsquo;t be query variable). They may take a list for subject or object.\nOne common case is for access to collections (RDF lists) or containers (rdf:Bag, rdf:Seq, rdf:Alt).\nPREFIX list: \u0026lt;http://jena.apache.org/ARQ/list#\u0026gt; SELECT ?member { ?x :p ?list . # Some way to find the list ?list list:member ?member . } which can also be written:\nPREFIX list: \u0026lt;http://jena.apache.org/ARQ/list#\u0026gt; SELECT ?member { ?x :p [ list:member ?member ] } Likewise, RDF containers:\nPREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; SELECT ?member { ?x :p ?bag . # Some way to find the bag ?bag rdfs:member ?member . } Property functions can also take lists in the subject or object slot.\nCode for properties can be dynamically loaded or pre-registered. For example, splitIRI will take an IRI and assign the namespace ad localname parts to variables (if the variables are already bound, not constants are used, splitIRI will check the values).\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX apf: \u0026lt;java:org.apache.jena.query.pfunction.library.\u0026gt; SELECT ?namespace ?localname { xsd:string apf:splitIRI (?namespace ?localname) } Property functions might conflict with inference rules and it can be turned off by the Java code:\nARQ.setFalse(ARQ.enablePropertyFunctions) ; or on a per instance basis:\ntry(QueryExecution qExec = ... ) { qExec.getContext().setFalse(ARQ.enablePropertyFunctions) ; ... } The property functions in the ARQ distribution are described on the property function library page.\nURIs for functions in the (fake) URI scheme java: are dynamically loaded. The class name forms the scheme specific part of the URI.\nDESCRIBE handlers The DESCRIBE result form in SPARQL does not define an exact form of RDF to return. Instead, it allows the server or query processor to return what it considers to be an appropriate description of the resources located. This description will be specific to the domain, data modelling or application.\nARQ comes with one built-in handler which calculates the blank node closure of resources found. While suitable for many situations, it is not general (for example, a FOAF file usually consists of all blank nodes). 
ARQ allows the application to replace or add handlers for producing DESCRIBE result forms.\nApplication-specific handlers can be added to the DescribeHandlerRegistry. The handler will be called for each resource (not literals) identified by the DESCRIBE query.\nBlank Node Labels URIs with the scheme name \u0026ldquo;_\u0026rdquo; (which is illegal) are created as blank node labels for directly accessing a blank node in the queried graph or dataset. These are constant terms in the query - not unnamed variables. Do not confuse these with the standard qname-like notation for blank nodes in queries. This is not portable - use with care.\n\u0026lt;_:1234-5678-90\u0026gt; # A blank node in the data _:b0 # A blank node in the query - a variable ","permalink":"https://jena.apache.org/documentation/query/extension.html","tags":null,"title":"Extensions in ARQ"},{"categories":null,"contents":"Eyeball is a Jena-based tool for checking RDF models (including OWL) for common problems. It is user-extensible using plugins.\nThis page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. Documentation index The brief guide. The manual. The JavaDoc. Getting the Eyeball release Installation Eyeball needs to be compiled from source.\nIf you have Ant installed, run the Eyeball test suite:\nant test Ensure all the jars in the Eyeball lib directory are on your classpath.\nUsing Eyeball with Apache Maven TODO\nTrying it out Pick one of your RDF files; we\u0026rsquo;ll call it FOO for now. Run the command-line command\njava jena.eyeball -check FOO You will likely get a whole bunch of messages about your RDF. The messages are supposed to be self-explanatory, so you may be able to go ahead and fix some problems straight away. If you get a Java error about NoClassDefFoundError, you\u0026rsquo;ve forgotten to set the classpath up or use the -cp myClassPath option to Java.\nYou may also want to try the experimental GUI, see below.\nIf the messages aren\u0026rsquo;t self-explanatory, or you want more details, please consult the guide.\nExperimental Eyeball GUI Eyeball includes a simple GUI tool which will allow multiple files to be checked at once and multiple schemas to be assumed. It will also allow you to select which inspectors are used.\nTo start the GUI, use the following (assuming your classpath is set up, as above): java jena.eyeballGUI\n","permalink":"https://jena.apache.org/documentation/archive/eyeball/eyeball-getting-started.html","tags":null,"title":"Eyeball - checking RDF/OWL for common problems"},{"categories":null,"contents":"","permalink":"https://jena.apache.org/help_and_support/faq.html","tags":null,"title":"Frequently asked questions"},{"categories":null,"contents":"The regular expressions for afn:localname and afn:namespace were incorrect. SPARQL allows custom functions in expressions so that queries can be used on domain-specific data. SPARQL defines a function by URI (or prefixed name) in FILTER expressions. ARQ provides a function library and supports application-provided functions.
Functions and property functions can be registered or dynamically loaded.\nApplications can also provide their own functions.\nARQ also provides an implementation the Leviathan Function Library.\nXQuery/XPath Functions and Operators supported ARQ supports the scalar functions and operators from \u0026ldquo;XQuery 1.0 and XPath 2.0 Functions and Operators v3.1\u0026rdquo;.\nFunctions in involving sequences are not supported.\nSee XSD Support for details of datatypes and functions currently supported. To check the exact current registrations, see function/StandardFunctions.java.\nSee also the property functions library page.\nFunction Library The prefix afn is \u0026lt;http://jena.apache.org/ARQ/function#\u0026gt;. (The old prefix of \u0026lt;http://jena.hpl.hp.com/ARQ/function#\u0026gt; continues to work. Applications are encouraged to switch.)\nDirect loading using a URI prefix of \u0026lt;java:org.apache.jena.sparql.function.library.\u0026gt; (note the final dot) is deprecated.\nThe prefix fn is \u0026lt;http://www.w3.org/2005/xpath-functions#\u0026gt; (the XPath and XQuery function namespace).\nThe prefix math is \u0026lt;http://www.w3.org/2005/xpath-functions/math#\u0026gt;.\nCustom Aggregates The prefix agg: is \u0026lt;http://jena.apache.org/ARQ/function/aggregate#\u0026gt;.\nThe statistical aggregates are provided are:\nagg:stdev, agg:stdev_samp, agg:stdev_pop, agg:variance, agg:var_samp, agg:var_pop\nThese are modelled after SQL aggregate functions STDEV, STDEV_SAMP, STDEV_POP, VARIANCE, VAR_SAMP, VAR_POP.\nThese, as keywords, are available in ARQ\u0026rsquo;s extended SPARQL (parse using Syntax.syntaxARQ).\nAdditional Functions Provided by ARQ Most of these have equivalents, or near equivalents, in SPARQL or as an XQuery function and are to be preferred. These ARQ-specific versions remain for compatibility.\nRDF Graph Functions\nFunction name Description Alternative afn:bnode(?x) Return the blank node label if ?x is a blank node. STR(?x) afn:localname(?x) The local name of ?x REPLACE(STR(?x), \u0026quot;^(.*)(/|#)([^#/]*)$\u0026quot;, \u0026quot;$3\u0026quot;) afn:namespace(?x) The namespace of ?x REPLACE(STR(?x), \u0026quot;^(.*)(/|#)([^#/]*)$\u0026quot;, \u0026quot;$1\u0026quot;) The prefix and local name of a IRI is based on splitting the IRI, not on any prefixes in the query or dataset.\nString Functions\nFunction name Description Alternative afn:sprintf(format, v1, v2, ...) Make a string from the format string and the RDF terms. afn:substr(string, startIndex [,endIndex]) Substring, Java style using startIndex and endIndex. afn:substring Synonym for afn:substr afn:strjoin(sep, string ...) Concatenate string together, with a separator. afn:sha1sum(resource) Calculate the SHA1 checksum of a literal or URI SHA1(STR(resource)) Notes:\nStrings in \u0026ldquo;XQuery 1.0 and XPath 2.0 Functions and Operators\u0026rdquo; start from character position one, unlike Java and C# where strings start from zero. The fn:substring operation takes an optional length, like C# but different from Java, where it is the endIndex of the first character after the substring. afn:substr uses Java-style startIndex and endIndex. 
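The IRI-splitting functions above can be called like any other function in a query. A small sketch in Java (the data, resource name and query are illustrative only; the standard QueryExecutionFactory API is used):

// Assumed imports: org.apache.jena.rdf.model.*, org.apache.jena.query.*, org.apache.jena.vocabulary.RDFS.
Model model = ModelFactory.createDefaultModel();
model.add(model.createResource("http://example/ns#thing"), RDFS.label, "thing");
String qs = "PREFIX afn: <http://jena.apache.org/ARQ/function#> "
          + "SELECT ?ns ?ln { ?s ?p ?o BIND(afn:namespace(?s) AS ?ns) BIND(afn:localname(?s) AS ?ln) }";
try (QueryExecution qExec = QueryExecutionFactory.create(qs, model)) {
    // Prints the namespace and local name parts of each subject IRI.
    ResultSetFormatter.out(qExec.execSelect());
}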
Mathematical Functions\nFunction name Description Alternative afn:min(num1, num2) Return the minimum of two numbers fn:min afn:max(num1, num2) Return the maximum of two numbers fn:max afn:pi() The value of pi, as an XSD double math:pi() afn:e() The value of e, as an XSD double math:exp(1) afn:sqrt(num) The square root of num math:sqrt Miscellaneous Functions\nFunction name Description Alternative afn:now() Current time. Actually, the time the query started. NOW() afn:sha1sum(resource) Calculate the SHA1 checksum SHASUM ","permalink":"https://jena.apache.org/documentation/query/library-function.html","tags":null,"title":"Functions in ARQ"},{"categories":null,"contents":"The jena-fuseki-docker package contains a Dockerfile, docker-compose file, and helper scripts to create a docker container for Apache Jena Fuseki.\nThe docker container is based on Fuseki main for running a SPARQL server.\nThere is no UI - all configuration is by command line and all usage by via the network protocols.\nDatabases can be mounted outside the docker container so they are preserved when the container terminates.\nThis build system allows the user to customize the docker image.\nThe docker build downloads the server binary from Maven central, checking the download against the SHA1 checksum.\nDatabase There is a volume mapping \u0026ldquo;./databases\u0026rdquo; in the current directory into the server. This can be used to contain databases outside, but accessible to, the container that do not get deleted when the container exits.\nSee examples below.\nBuild Choose the version number of Apache Jena release you wish to use. This toolkit defaults to the version of the overall Jena release it was part of. It is best to use the release of this set of tools from the same release of the desired server.\ndocker-compose build --build-arg JENA_VERSION=3.16.0 Note the build command must provide the version number.\nTest Run docker-compose run cam be used to test the build from the previous section.\nExamples:\nStart Fuseki with an in-memory, updatable dataset at http://host:3030/ds\ndocker-compose run --rm --service-ports fuseki --mem /ds Load a TDB2 database, and expose, read-only, via docker:\nmkdir -p databases/DB2 tdb2.tdbloader --loc databases/DB2 MyData.ttl # Publish read-only docker-compose run --rm --name MyServer --service-ports fuseki --loc databases/DB2 /ds To allow update on the database, add --update. Updates are persisted.\ndocker-compose run --rm --name MyServer --service-ports fuseki --update --loc databases/DB2 /ds See fuseki-configuration for more information on command line arguments.\nTo use docker-compose up, edit the docker-compose.yaml to set the Fuseki command line arguments appropriately.\nLayout The default layout in the container is:\nPath Use /opt/java-minimal A reduced size Java runtime /fuseki The Fuseki installation /fuseki/log4j2.properties Logging configuration /fuseki/databases/ Directory for a volume for persistent databases Setting JVM arguments Use JAVA_OPTIONS:\ndocker-compose run --service-ports --rm -e JAVA_OPTIONS=\u0026quot;-Xmx1048m -Xms1048m\u0026quot; --name MyServer fuseki --mem /ds Docker Commands If you prefer to use docker directly:\nBuild:\ndocker build --force-rm --build-arg JENA_VERSION=3.16.0 -t fuseki . 
Run:\ndocker run -i --rm -p \u0026quot;3030:3030\u0026quot; --name MyServer -t fuseki --mem /ds With databases on a bind mount to host filesystem directory:\nMNT=\u0026quot;--mount type=bind,src=$PWD/databases,dst=/fuseki/databases\u0026quot; docker run -i --rm -p \u0026quot;3030:3030\u0026quot; $MNT --name MyServer -t fuseki --update --loc databases/DB2 /ds Version specific notes: Versions of Jena up to 3.14.0 use Log4j1 for logging. The docker will build will ignore the log4j2.properties file Version 3.15.0: When run, a warning will be emitted.\nWARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.\nThis can be ignored. ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-docker.html","tags":null,"title":"Fuseki : Docker Tools"},{"categories":null,"contents":"Fuseki can be run within a larger JVM application as an embedded triplestore.\nDependencies and Setup Logging Building a Server Examples The application can safely access and modify the data published by the server if it does so inside a transaction using an appropriate storage choice. DatasetFactory.createTxnMem() is a good choice for in-memory use; TDB is a good choice for a persistent database.\nTo build and start the server:\nDataset ds = ... FusekiServer server = FusekiServer.create() .add(\u0026quot;/rdf\u0026quot;, ds) .build() ; server.start() ; then the application can modify the dataset:\n// Add some data while live. // Write transaction. Txn.execWrite(dsg, ()-\u0026gt;RDFDataMgr.read(dsg, \u0026quot;D.trig\u0026quot;)) ; or read the dataset and see any updates made by remote systems:\n// Query data while live // Read transaction. Txn.execRead(dsg, ()-\u0026gt;{ Dataset ds = DatasetFactory.wrap(dsg) ; try (QueryExecution qExec = QueryExecution.create(\u0026quot;SELECT * { ?s ?o}\u0026quot;, ds) ) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } }) ; The full Jena API can be used provided operations (read and write) are inside a transaction.\nDependencies and Setup To include an embedded Fuseki server in the application:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-fuseki-main\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.x.y\u0026lt;/version\u0026gt; \u0026lt;!-- Set the version --\u0026gt; \u0026lt;/dependency\u0026gt; This brings in enough dependencies to run Fuseki. Application writers are strongly encouraged to use a dependency manager because the number of Jetty and other dependencies is quite large and difficult to set manually.\nThis dependency does not include a logging setting. Fuseki uses slf4j.\nIf the application wishes to use a dataset with a text-index then the application will also need to include jena-text in its dependencies.\nLogging The application must set the logging provided for slf4j. 
Apache Jena provides helpers Apache Log4j v2.\nFor Apache Log4j2, call:\nFusekiLogging.setLogging(); and dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.logging.log4j\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;log4j-slf4j-impl\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.13.1\u0026lt;/version\u0026gt; \u0026lt;!-- Many versions work --\u0026gt; \u0026lt;/dependency\u0026gt; See Fuseki Logging.\nTo silence logging from Java, try:\nLogCtl.setLevel(Fuseki.serverLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.actionLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.requestLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.adminLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(\u0026quot;org.eclipse.jetty\u0026quot;, \u0026quot;WARN\u0026quot;); Building a server A FusekiServer is built by creating a configuration, building the server, then running it. The application needs to start the server.\nThe default port for a Fuseki embedded server is 3330. This is different for the default port for Fuseki running as a standalone server or as a webapp application.\nExamples of embedded use Example 1 Create a server on port 3330, that provides the default set of endpoints for an RDF dataset that can be updated via HTTP.\nDataset ds = DatasetFactory.createTxnMem() ; FusekiServer server = FusekiServer.create() .add(\u0026quot;/ds\u0026quot;, ds) .build() ; server.start() ; ... server.stop() ; The services are avilable on a named endpoint and also on the dataset URL itself.\nURLs:\nService Endpoint1 Endpoint2 SPARQL Query http://host:3330/ds/query http://host:3330/ds SPARQL Query http://host:3330/ds/sparql http://host:3330/ds SPARQL Update http://host:3330/ds/update http://host:3330/ds GSP read-write http://host:3330/ds/data http://host:3330/ds \u0026ldquo;GSP\u0026rdquo; = SPARQL Graph Store Protocol\nExample 2 Create a server on port 3332, that provides the default set of endpoints for a data set that is read-only over HTTP. The application can still update the dataset.\nDataset ds = ... ; FusekiServer server = FusekiServer.create() .port(3332) .add(\u0026quot;/ds\u0026quot;, ds, false) .build() ; server.start() ; Service Endpoint Endpoint2 SPARQL Query http://host:3332/ds/query http://host:3332/ds SPARQL Query http://host:3332/ds/sparql http://host:3332/ds GSP read-only http://host:3332/ds/data http://host:3332/ds Example 3 Different combinations of services and endpoint names can be given using a DataService.\nDatasetGraph dsg = ... ; DataService dataService = new DataService(dsg) ; dataService.addEndpoint(OperationName.GSP_RW, \u0026quot;\u0026quot;); dataService.addEndpoint(OperationName.Query, \u0026quot;\u0026quot;); dataService.addEndpoint(OperationName.Update, \u0026quot;\u0026quot;); FusekiServer server = FusekiServer.create() .port(3332) .add(\u0026quot;/data\u0026quot;, dataService) .build() ; server.start() ; This setup puts all the operation on the dataset URL. The Content-type and any query string is used to determine the operation.\nService Endpoint SPARQL Query http://host:3332/ds SPARQL Update http://host:3332/ds GSP read-write http://host:3332/ds Example 4 Multiple datasets can be served by one server.\nDataset ds1 = ... Dataset ds2 = ... 
FusekiServer server = FusekiServer.create() .add(\u0026quot;/data1\u0026quot;, ds1) .add(\u0026quot;/data1-readonly\u0026quot;, ds1, true) .add(\u0026quot;/data2\u0026quot;, ds2) .build() ; server.start() ; ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-embedded.html","tags":null,"title":"Fuseki : Embedded Server"},{"categories":null,"contents":"Fuseki main is a packaging of Fuseki as a triple store without a UI for administration.\nFuseki can be run in the background by an application as an embedded server. The application can safely work with the dataset directly from java while having Fuseki provide SPARQL access over HTTP. An embedded server is useful for adding functionality around a triple store and also for development and testing.\nRunning as a deployment or development server Running from Docker Running as an embedded server Dependencies and Setup Logging Building a Server Examples The main server does not depend on any files on disk (other than for databases provided by the application), and does not provide the Fuseki UI or admins functions to create dataset via HTTP.\nSee also Data Access Control for Fuseki.\nRunning as a configured deployment or development server The artifact org.apache.jena:jena-fuseki-server is a packaging of the \u0026ldquo;main\u0026rdquo; server that runs from the command line. Unlike the UI Fuseki server, it is only configured from the command line and has no persistent work area on-disk.\njava -jar jena-fuseki-server-$VER.jar --help The arguments are the same as the full UI server command line program. There are no special environment variables.\nThe entry point is org.apache.jena.fuseki.main.cmds.FusekiMainCmd so the server can also be run as:\njava -cp jena-fuseki-server-$VER.jar:...OtherJars... \\ org.apache.jena.fuseki.main.cmds.FusekiMainCmd ARGS Docker A kit to build a container with docker or docker compose\nhttps://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-docker/ Note: take care that databases are on mounted volumes if they are to persist after the container is removed.\nSee the Fuseki docker tools page for details.\nRunning as an embedded server Fuseki can be run from inside an Java application to provide SPARQL services to application data. The application can continue to access and update the datasets served by the server.\nTo build and start the server:\nDataset ds = ... FusekiServer server = FusekiServer.create() .add(\u0026quot;/dataset\u0026quot;, ds) .build() ; server.start() ; See Fuseki embedded documentation for details and examples.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-main.html","tags":null,"title":"Fuseki : Main Server"},{"categories":null,"contents":"A data service provides a number of operations on a dataset. These can be explicitly named endpoints or operations at the URL of the dataset. New operations can be configured in; these typically have their own named endpoints.\nSyntax Here is an example of a server configuration that provides one operation, SPARQL query, and then only on the dataset URL.\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; [] rdf:type fuseki:Server . \u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . 
## In memory transactional dataset initially loaded ## with the contents of file \u0026quot;data.trig\u0026quot; \u0026lt;#dataset\u0026gt; rdf:type ja:MemoryDataset; ja:data \u0026quot;data.trig\u0026quot; . This is invoked with a URL of the form http://host:port/dataset?query=... which is a SPARQL query request sent to the dataset URL.\nThe property fuseki:endpoint describes the operation available. No name is given so the operation is available at the URL of the dataset.\nfuseki:dataset names the dataset to be used with this data service.\nIn this second example:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot;; ]; fuseki:dataset \u0026lt;#dataset\u0026gt; . the endpoint has a name. The URL to invoke the operation is now:\nhttp://host:port/dataset/sparql?query=...\nand is similar to older form:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; fuseki:dataset \u0026lt;#dataset\u0026gt; . Operations on the dataset URL have the name \u0026quot;\u0026quot; (the empty string) and this is the default. The first example is the same as:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;\u0026quot; ; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . Original Configuration Syntax The syntax described on this page was introduced in Apache Jena 3.13.0.\nThe previous syntax is still valid.\nThe new syntax enables more configuration options and gives more control of server functionality:\nsetting the context on a per-endpoint basis having multiple operations at the service access point, switching based on operation type a more general structure for adding custom services adding custom extensions to a Fuseki server Operations The following operations are provided:\nURI Operation fuseki:query SPARQL 1.1 Query with ARQ extensions fuseki:update SPARQL 1.1 Update with ARQ extensions fuseki:gsp-r SPARQL Graph Store Protocol and Quad extensions (read only) fuseki:gsp-rw SPARQL Graph Store Protocol and Quad extensions fuseki:upload HTML form file upload fuseki:no-op An operation that causes a 400 or 404 error Custom extensions can be added (see Programmatic configuration of the Fuseki server). To be able to uniquely identify the operation, these are usually\nfuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026quot;shacl\u0026quot; ; ] ; See the section \u0026ldquo;Integration with Apache Jena Fuseki\u0026rdquo; for details of the SHACL support. 
While this operation is part of the standard Fuseki distribution, this operation is added during system initialization, using the custom operation support.\nCommand Line Equivalents The standard set of service installed by running the server from the command line without a configuration file is for a read-only:\n\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \u0026quot;get\u0026quot; ]; fuseki:dataset ... which supports requests such as:\nhttp://\u0026lt;i\u0026gt;host:port\u0026lt;/i\u0026gt;/dataset?query=... http://\u0026lt;i\u0026gt;host:port\u0026lt;/i\u0026gt;/dataset/sparql?query=... http://\u0026lt;i\u0026gt;host:port\u0026lt;/i\u0026gt;/dataset?default http://\u0026lt;i\u0026gt;host:port\u0026lt;/i\u0026gt;/dataset/get?default and for an updatable dataset (command line --mem for an in-memory dataset; or with TDB storage, with --update):\n\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ;]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \u0026quot;get\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; fuseki:name \u0026quot;data\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:upload ; fuseki:name \u0026quot;upload\u0026quot; ] fuseki:dataset ... which adds requests that can change the data.\nNew operations can be added by programmatic setup in Fuseki Main.\nDispatch \u0026ldquo;Dispatch\u0026rdquo; is the process of routing a HTTP request to a specific operation processor implementation to handle the request.\nDispatch to named endpoint usually happens from the name alone, when there is a unique name for an endpoint. If, however, two endpoints give the same fuseki:name, or if operations are defined for the dataset itself, then dispatch is based on a second step of determining the operation type by inspecting the request. Each of the SPARQL operations has a unique signature.\nA query is either a GET with query string including \u0026ldquo;?query=\u0026rdquo;, or a POST with a content type of the body \u0026ldquo;application/sparql-query\u0026rdquo;, or an HTML form with a field \u0026ldquo;query=\u0026rdquo;\nAn update is a POST where the body is \u0026ldquo;application/sparql-update\u0026rdquo; or an HTML form with field \u0026ldquo;update=\u0026rdquo;.\nA GSP operation has ?default or ?graph=.\nQuads operations are also provided by GSP endpoints when there is no query string and a have a Content-Type for a data in a RDF triples or quads syntax.\nSo, for example \u0026ldquo;GET /dataset\u0026rdquo; is a request to get all the triples and quads in the dataset. 
The syntax for the response is determined by content negotiation, defaulting to text/trig.\nCustom services usually use a named endpoint. Custom operations can specific a content type that they handle, which must be unique for the operation. They can not provide a query string signature for dispatch.\nCommon Cases This section describes a few deployment patterns:\nCase 1: Read-only Dataset The 2 SPARQL standard operations for a read-only dataset:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-read-only\u0026quot; ; ## fuseki:name \u0026quot;\u0026quot; is optional. fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . This is good for publishing data.\nCase 2: Dataset level operation. The 3 SPARQL standard operations for a read-write dataset, request are sent to http://host:port/dataset. There are no named endpoint services.\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-rw\u0026quot; ; ## fuseki:name \u0026quot;\u0026quot; is optional. fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:update;] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . Case 3: Named endpoints \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-named\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:upload; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_r; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_rw; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . The operation on this dataset can only be accessed as \u0026ldquo;/ds-named/sparql\u0026rdquo;, \u0026ldquo;/ds-named/update\u0026rdquo; etc, not as \u0026ldquo;/ds-named\u0026rdquo;.\nCase 4: Named endpoints with query of the dataset. \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:upload; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_r; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_rw; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . The operations on this dataset are accessed as \u0026ldquo;/ds/sparql\u0026rdquo;, \u0026ldquo;/ds/update\u0026rdquo; etc. In addition, \u0026ldquo;/ds?query=\u0026rdquo; provided SPARQL query.\nQuad extensions The GSP (SPARQL Graph Store Protocol) operations provide the HTTP operations of GET, POST, PUT and DELETE for specific graphs in the RDF dataset. The SPARQL GSP standard includes identifying the target graph with ?default or ?graph=...uri... 
and the request or response is one of the RDF triple syntaxes (Turtle, N-Triples, JSON-LD, RDF/XML) as well as older proposals (TriX and RDF/JSON).\nApache Jena Fuseki also provides quad operations for HTTP methods GET, POST, PUT (not DELETE, that would be the dataset itself), and the request or response is one of the syntaxes for datasets (TriG, N-Quads, JSON-LD, TriX).\nThe DSP (\u0026ldquo;Dataset Store Protocol\u0026rdquo;) operations provide operations similar to GSP but operating on the dataset, not a specific graph.\nFuseki also provides [RDF Binary](/documentation/io/rdf-binary.html) for triples and quads.\nContext Each operation execution is given a \u0026ldquo;context\u0026rdquo; - a set of name-value pairs. Internally, this is used for system registries and for the fixed \u0026ldquo;current time\u0026rdquo; of an operation. The context is the merge of the server\u0026rsquo;s context, any additional settings on the dataset and any settings for the endpoint. The merge is performed in that order - server then dataset then endpoint.\nUses for the context setting include query timeouts and making default query pattern matching apply to the union of named graphs, not the default graph.\nIn this example (prefix tdb2: is for URI \u0026lt;http://jena.apache.org/2016/tdb#\u0026gt;):\n\u0026lt;#servicetdb\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-tdb\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql-union\u0026quot; ; ja:context [ ja:cxtName \u0026quot;tdb:unionDefaultGraph\u0026quot; ; ja:cxtValue true ] ; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; ] ; fuseki:dataset \u0026lt;#tdbDataset\u0026gt; . \u0026lt;#tdbDataset\u0026gt; rdf:type tdb2:DatasetTDB ; ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000,30000\u0026quot; ] ; tdb2:location \u0026quot;DATA\u0026quot; . \u0026ldquo;/ds-tdb\u0026rdquo; is a TDB2 database with endpoints for SPARQL query and update on the dataset URL. In addition, it has a named service \u0026ldquo;/ds-tdb/sparql-union\u0026rdquo; where the query works with the union of named graphs as the default graph.\nQuery timeout is set for any use of the dataset with first result in 10 seconds, and complete results in 30 seconds.\nSecurity The page Data Access Control for Fuseki covers this in more detail.\nFor endpoints, the permitted users are part of the endpoint description.\nfuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ; fuseki:allowedUsers \u0026quot;user1\u0026quot;, \u0026quot;user2\u0026quot; ] ; ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-config-endpoint.html","tags":null,"title":"Fuseki Data Service Configuration Syntax"},{"categories":null,"contents":"This page describes the original Fuseki2 server configuration syntax.\nExample:\n## Updatable dataset. \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service (alt name) fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL update service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph Store Protocol (read and write) fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph Store Protocol (read only) fuseki:dataset \u0026lt;#dataset\u0026gt; ; .
\u0026lt;#dataset\u0026gt; refers to a dataset description in the same file.\nThere are a fixed set of services:\nService Description fuseki:serviceQuery SPARQL query service fuseki:serviceUpdate SPARQL update service fuseki:serviceReadGraphStore SPARQL Graph Store Protocol (read) fuseki:serviceReadWriteGraphStore SPARQL Graph Store Protocol (read and write) Configuration syntax can be mixed. If there are both old style and new style configurations for the same endpoint, the new style configuration is used.\nQuads operations on dataset are implied if there is a SPARQL Graph Store Protocol service configured.\nIf a request is made on the dataset (no service name in the request URL), then the dispatcher classifies the operation and looks for a named endpoint for that operation of any name. If one is found, that is used. In the full endpoint configuration syntax, the additional dataset services are specified explicitly.\nThe equivalent of\nfuseki:serviceQuery \u0026quot;sparql\u0026quot; ; is\nfuseki:endpoint [ fuseki:operation fuseki:query ; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; and the two endpoint can have different context setting and security.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-old-config-endpoint.html","tags":null,"title":"Fuseki Data Service Configuration Syntax - Old Style"},{"categories":null,"contents":"There are two areas: the fixed files provided by the distribution and the changing files for the local deployment,including the default location for TDB databases.\nTwo environment variables control the file system usage. Symbolic links can be used to create variations on the standard layout.\nFUSEKI_HOME - this contains the fixed files from the distribution and is used for Unix service deployments. When deployment as a WAR file, everything is in the WAR file itself.\nFUSEKI_BASE - this contains the deployment files.\nMode Environment Variable Default Setting Service FUSEKI_HOME /usr/share/fuseki FUSEKI_BASE /etc/fuseki Webapp FUSEKI_HOME N/A (Files in the Fuseki .war file) FUSEKI_BASE /etc/fuseki Standalone FUSEKI_HOME Current directory FUSEKI_BASE ${FUSEKI_HOME}/run/ When run in a web application container (e.g. Tomcat, Jetty or other webapp compliant server), FUSEKI_BASE will be /etc/fuseki.\nIf FUSEKI_BASE is the same as FUSEKI_HOME, be careful when upgrading not to delete server deployment files and directories.\nDistribution area \u0026ndash; FUSEKI_HOME Directory or File Usage fuseki Fuseki Service (Linux) fuseki-server Fuseki standalone command fuseki-server.bat Fuseki standalone command fuseki-server.jar The Fuseki Server binary fuseki.war The Fuseki Server as a WAR file bin/ Helper scripts webapp/ The webapp for the UI Runtime area \u0026ndash; FUSEKI_BASE Directory or File Usage config.ttl Server configuration shiro.ini Apache Shiro configuration databases/ TDB Databases backups/ Write area for live backups configuration/ Assembler files logs/ Log file area system/ System configuration database system_files/ Uploaded data service descriptions (copies) templates/ Templates for build-in configurations The system_files/ keeps a copy of any assemblers uploaded to configure the server. 
The primary copy is kept in the system database.\nResetting To reset the server, stop the server, and delete the system database in system/, the system_files/ and any other unwanted deployment files, then restart the server.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-layout.html","tags":null,"title":"Fuseki File System Layout"},{"categories":null,"contents":"This page describes the HTTP Protocol used to control an Fuseki server via its administrative interface.\nOperations Server Information Datasets and Services Adding a Dataset and its Services Removing a Dataset Dormant and Active Removing a dataset All admin operations have URL paths starting /$/ to avoid clashes with dataset names and this prefix is reserved for the Fuseki control functions. Further operations may be added within this naming scheme.\nOperations Replace {name} with a dataset name: e.g. /$/backup/myDataset.\nMethod URL pattern Description GET /$/ping Check if server is alive POST /$/ping GET /$/server Get basic information (version, uptime, datasets\u0026hellip;) POST /$/server GET /$/status Alias of /$/server POST /$/datasets Create a new dataset GET /$/datasets Get a list of datasets DELETE /$/datasets/{name} Remove a dataset GET /$/datasets/{name} Get information about a dataset POST /$/datasets/{name}?state=offline Switch state of dataset to offline POST /$/datasets/{name}?state=active Switch state of dataset to online POST /$/server/shutdown Not implemented yet GET /$/stats Get request statistics for all datasets GET /$/stats/{name} Get request statistics for a dataset POST /$/backup/{name} POST /$/backups/{name} Alias of /$/backup/{name} GET /$/backups-list POST /$/compact/{name}?deleteOld=true POST /$/sleep GET /$/tasks GET /$/tasks/{name} GET /$/metrics GET /$/logs Not implemented yet Ping Pattern: /$/ping\nThe URL /$/ping is a guaranteed low cost point to test whether a server is running or not. It returns no other information other than to respond to the request over GET or POST (to avoid any HTTP caching) with a 200 response.\nReturn: current timestamp\nServer Information Pattern: /$/server\nThe URL /$/server returns details about the server and it\u0026rsquo;s current status in JSON.\n@@details of JSON format.\nDatasets and Services Pattern: /$/datasets\n/$/datasets is a container representing all datasets present in the server. /$/datasets/{name} names a specific dataset. As a container, operations on items in the container, via GET, POST and DELETE, operate on specific dataset.\nAdding a Dataset and its Services. @@ May add server-managed templates\nA dataset set can be added to a running server. There are several methods for doing this:\nPost the assembler file HTML Form upload the assembler file Use a built-in template (in-memory or persistent) All require HTTP POST.\nChanges to the server state are carried across restarts.\nFor persistent datasets, for example TDB, the dataset is persists across restart.\nFor in-memory datasets, the dataset is rebuilt from it\u0026rsquo;s description (this may include loading data from a file) but any changes are lost.\nTemplates A short-cut form for some common set-ups is provided by POSTing with the following parameters (query string or HTML form):\nParameter dbType Either mem or tdb dbName URL path name The dataset name must not be already in-use.\nDatasets are created in directory databases/.\nAssembler example The assembler description contains data and service. 
It can be sent by posting the assembler RDF graph in any RDF format or by posting from an HTML form (the syntax must be Turtle).\nThe assembler file is stored by the server and will be used on restart or when making the dataset active again.\n@@\nRemoving a Dataset Note: DELETE means \u0026ldquo;gone for ever\u0026rdquo;. The dataset name and the details of its configuration are completely deleted and can not be recovered.\nThe data of a TDB dataset is not deleted.\nActive and Offline A dataset is in one of two modes: \u0026ldquo;active\u0026rdquo;, meaning it is servicing requests over HTTP (subject to configuration and security), or \u0026ldquo;offline\u0026rdquo;, meaning the configuration and name are known to the server but the dataset is not attached to the server. When \u0026ldquo;offline\u0026rdquo;, any persistent data can be manipulated outside the server.\nDatasets are initially \u0026ldquo;active\u0026rdquo;. The transition from \u0026ldquo;active\u0026rdquo; to \u0026ldquo;offline\u0026rdquo; is graceful - all outstanding requests are completed.\nStatistics Pattern: /$/stats/{name}\nStatistics can be obtained for each dataset or all datasets in a single response. /$/stats is treated as a container for this information.\n@@ stats details See Fuseki Server Information for details of statistics kept by a Fuseki server.\nBackup Pattern: /$/backup/{name}\nThis operation initiates a backup and returns a JSON object with the task Id in it.\nBackups are written to the server local directory \u0026lsquo;backups\u0026rsquo; as gzip-compressed N-Quads files.\nSee Tasks for how to monitor a backup\u0026rsquo;s progress.\nReturn: A task is allocated an identifier (usually, a number).\n{ \u0026#34;taskId\u0026#34; : \u0026#34;{taskId}\u0026#34; } The task id can be used to construct a URL to get details of the task:\n/$/tasks/{taskId} Pattern: /$/backups-list\nReturns a list of all the files in the backup area of the server. This is useful for managing the files externally.\nThe returned JSON object will have the form { backups: [ ... ] } where the [] array is a list of file names.\nSince 4.7.0 backups are written to a temporary file in the same directory and renamed on completion. In case of a server crash, it will not be renamed. This guarantees backups are complete. Cleanup of incomplete backups can be done by users on application / container start: remove all incomplete files.\nBackup policies Users can use the backup API of the Fuseki HTTP Administration Protocol to build backup policies. See https://github.com/apache/jena/issues/1500 for more information.\nCompact Pattern: /$/compact/{name}\nThis operation initiates a database compaction task and returns a JSON object with the task Id in it.\nThe optional parameter and value deleteOld=true deletes the old copy of the database after compaction has completed.\nYou can monitor the status of the task via the Tasks portion of the API. A successful compaction will have the finishPoint field set and the success field set to true.\nTasks Some operations cause a background task to be executed; backup is an example. The result of such operations includes a JSON object with the task id and also a Location: header with the URL of the task created.\nThe progress of the task can be monitored with HTTP GET operations:\nPattern: /$/tasks – All asynchronous tasks.
Pattern: /$/tasks/{taskId} – A particular task.\nThe URL /$/tasks returns a description of all running and recently tasks. A finished task can be identified by having a finishPoint and success fields.\nEach background task has an id. The URL /$/tasks/{taskId} gets a description about one single task.\nDetails of the last few completed tasks are retained, up to a fixed number. The records will eventually be removed as later tasks complete, and the task URL will then return 404.\nPattern: /$/tasks ; example:\n[ { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T12:52:51.860+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T12:52:50.859+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;sleep\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;1\u0026#34; , \u0026#34;success\u0026#34; : true } , { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T12:53:24.718+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T12:53:14.717+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;sleep\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;2\u0026#34; , \u0026#34;success\u0026#34; : true } ] Pattern: /$/tasks/1 : example:\n[ { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T13:54:13.608+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T13:54:03.607+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;backup\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;1\u0026#34; , \u0026#34;success\u0026#34; : false } ] This is inside an array to make the format returned the same as /$/tasks.\nMetrics Pattern: /$/metrics\n@@\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html","tags":null,"title":"Fuseki HTTP Administration Protocol"},{"categories":null,"contents":"Fuseki logs operation details and also provides a standard NCSA request log.\nLogging is via SLF4J over Apache Log4J2, or by the Tomcat configuration if running the WAR file.\nFull Log name Usage org.apache.jena.fuseki.Server General Server Messages org.apache.jena.fuseki.Request NCSA request Log org.apache.jena.fuseki.Fuseki The HTTP request log org.apache.jena.fuseki.Admin Administration operations org.apache.jena.fuseki.Builder Dataset and service build operations org.apache.jena.fuseki.Config Configuration NCSA request Log This log is in NCSA extended/combined log format.\nMany web log analysers can process this format.\nThis log is normally off. The logger name is org.apache.jena.fuseki.Request.\nWhen run as a WAR file inside a webapp container (e.g. Apache Tomcat), the webapp container or reverse proxy will log access requests anyway.\nSetting logging The Fuseki Main engine looks for the log4j2 configuration as follows:\nUse system property log4j2.configurationFile if defined (as usual for log4j2). Use file:log4j2.properties (current directory) if it exists Use java resource log4j2.properties on the classpath. Use java resource org/apache/jena/fuseki/log4j2.properties on the classpath. Use a built-in configuration. The last step is a fallback to catch the case where Fuseki has been repackaged into a new WAR file and org/apache/jena/fuseki/log4j2.properties omitted, or run from the base jar. 
It is better to include org/apache/jena/fuseki/log4j2.properties.\nThe preferred customization is to use a custom log4j2.properties file in the directory where Fuseki Main is run.\nFor the war file packaging, the log4j2.properties should go in FUSEKI_BASE which defaults to /etc/fuseki on Linux.\nFor the standalone webapp server, FUSEKI_BASE defaults to directory run/ within the directory where the server is run.\nThe property fuseki.loglogging can also be set to true for additional logging.\nSetting ARQ explain logging Query explanation can be turned on by setting the symbol arq:logExec in the context to \u0026ldquo;info\u0026rdquo;, \u0026ldquo;fine\u0026rdquo; or \u0026ldquo;all\u0026rdquo;. This can be done in the Assembler file by setting ja:context on the server, dataset, or endpoint:\n[] ja:context [ ja:cxtName \u0026quot;arq:logExec\u0026quot; ; ja:cxtValue \u0026quot;info\u0026quot; ] . Default setting The default log4j2.properties.\nLogrotate Below is an example logrotate(1) configuration (to go in /etc/logrotate.d) assuming the log file has been put in /etc/fuseki/logs/fuseki.log.\nIt rotates the logs once a month, compresses logs on rotation, and keeps them for 6 months.\nIt uses copytruncate. This may lead to at most one broken log file line.\n/etc/fuseki/logs/fuseki.log { compress monthly rotate 6 create missingok copytruncate # Date in extension. dateext # No need # delaycompress } ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-logging.html","tags":null,"title":"Fuseki Logging"},{"categories":null,"contents":"Fuseki modules are a mechanism to include extension code into a Fuseki server. Modules are invoked during the process of building a Fuseki Main server. A module can modify the server configuration, add new functionality, or react to a server being built and started.\nThis feature was added in Jena version 4.3.0. It is an experimental feature that will evolve based on feedback and use cases.\nThe interface for modules is FusekiModule; if automatically loaded, the interface is FusekiAutoModule which extends FusekiModule.\nFuseki modules can be provided in two ways:\nLoaded from additional jars on the classpath Programmatically controlling the setup of the FusekiServer server. Automatically loaded Fuseki Modules can be loaded using the JDK ServiceLoader by placing a jar file on the classpath, together with any additional dependencies. These provide the interface FusekiAutoModule. The service loader is controlled by the file resource META-INF/services/org.apache.jena.fuseki.main.sys.FusekiAutoModule in the jar file. The module class must have a no-argument constructor.\nThis is often done by placing the file in the development code in src/main/resources/META-INF/services/. The file contains a line with the implementation\u0026rsquo;s full class name. If repacking Fuseki with the maven-shade-plugin, make sure the ServicesResourceTransformer is used.\nThe method start is called when the module is loaded. Custom operations can be globally registered at this point (see the Fuseki examples directory).\nA FusekiAutoModule can provide a level, an integer, to control the order in which modules are invoked during server building. 
Lower numbers are invoked before larger numbers at each step.\nProgrammatically configuring a server If creating a Fuseki server from Java, the modules can be autoloaded as described above, or explicitly added to the server builder.\nA FusekiModules object is a collection of modules, called at each point in the order given when creating the object.\nFusekiModule myModule = new MyModule(); FusekiModules fmods = FusekiModules.create(myModule); FusekiServer server = FusekiServer.create() ... .fusekiModules(fmods) ... .build(); Fuseki Module operations The module lifecycle during the creation of a Fuseki server is:\nprepare - called at the start of the server build steps, before setting up the datasets. configured - access and modify the setup. This is called after the server has been configured, before the server is built. It defaults to calls to configDataAccessPoint for each dataset being hosted by the server. server - called after the build, before the return of FusekiServerBuilder.build() There are also operations notified when a server is reloaded while running.\nserverConfirmReload serverReload As of Jena 4.9.0, reload is not yet supported.\nThe Fuseki start up sequence is:\nserverBeforeStarting - called at the start of server.start() serverAfterStarting - called at the end of server.start() serverStopped - called just after the server has stopped in the server.stop() call. (note, this is not always called because a server can simply exit the JVM). A Fuseki module does not need to implement all these steps. The default for all steps is \u0026ldquo;do nothing\u0026rdquo;. Usually, an extension will only be interested in certain steps, such as prepare, or the configured step where the registry information is available.\nDuring the configuration step, the Fuseki configuration file for the server is available. If the server is built programmatically without a configuration file, this is null.\nThe configuration file can contain RDF information to build resources (e.g. it can contain additional assembler descriptions not directly linked to the server).\nThere is an example Fuseki Module in the Fuseki examples directory.\nFusekiModule interface /** * Module interface for Fuseki. * \u0026lt;p\u0026gt; * A module is additional code, usually in a separate jar, * but can also be part of the application code. */ public interface FusekiModule extends SubsystemLifecycle { /** * Display name to identify this module. */ public String name(); // -- Build cycle. /** * Called at the start of \u0026#34;build\u0026#34; step. The builder has been set according to the * configuration of API calls and parsing configuration files. No build actions have been carried out yet. * The module can make further FusekiServer.{@link Builder} calls. * The \u0026#34;configModel\u0026#34; parameter is set if a configuration file was used, otherwise it is null. */ public default void prepare(FusekiServer.Builder serverBuilder, Set\u0026lt;String\u0026gt; datasetNames, Model configModel) ; /** * Called after the DataAccessPointRegistry has been built. * \u0026lt;p\u0026gt; * The default implementation is to call {@link #configDataAccessPoint(DataAccessPoint, Model)} * for each {@link DataAccessPoint}. 
* \u0026lt;pre\u0026gt; * dapRegistry.accessPoints().forEach(accessPoint{@literal -\u0026gt;}configDataAccessPoint(accessPoint, configModel)); * \u0026lt;/pre\u0026gt; */ public default void configured(FusekiServer.Builder serverBuilder, DataAccessPointRegistry dapRegistry, Model configModel) { dapRegistry.accessPoints().forEach(accessPoint-\u0026gt;configDataAccessPoint(accessPoint, configModel)); } /** * This method is called for each {@link DataAccessPoint} by the default * implementation of {@link #configured} after the new servers * DataAccessPointRegistry has been built. */ public default void configDataAccessPoint(DataAccessPoint dap, Model configModel) {} /** * Built, not started, about to be returned to the builder caller. */ public default void server(FusekiServer server) { } /** * Confirm or reject a request to reload. */ public default boolean serverConfirmReload(FusekiServer server) { return true; } /** * Perform any operations necessary for a reload. */ public default void serverReload(FusekiServer server) { } // -- Server start up /** * Server starting - called just before server.start happens. */ public default void serverBeforeStarting(FusekiServer server) { } /** * Server started - called just after server.start happens, and before server * .start() returns to the application. */ public default void serverAfterStarting(FusekiServer server) { } /** Server stopping. * Do not rely on this to clear up external resources. * Usually there is no stop phase and the JVM just exits or is killed externally. * */ public default void serverStopped(FusekiServer server) { } /** Module unloaded : do not rely on this happening. */ @Override public default void stop() {} } FusekiAutoModules also provide the org.apache.jena.base.module.SubsystemLifecycle interface.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-modules.html","tags":null,"title":"Fuseki Modules"},{"categories":null,"contents":"This page describes how to achieve certain common tasks in the most direct way possible.\nRunning with Apache Tomcat and loading a file. Unpack the distribution. Copy the WAR file into the Apache tomcat webapp directory, under the name \u0026lsquo;fuseki\u0026rsquo; If the user under which Apache tomcat is running does not have write access to /etc, then please make sure to set the environment variable FUSEKI_BASE, whereas the value should be a directory where the user running Apache tomcat is able to write to. In a browser, go to [http://localhost:8080/fuseki/](http://localhost:8080/fuseki) (details such as port number depend on the Tomcat setup). Click on \u0026ldquo;Add one\u0026rdquo;, choose \u0026ldquo;in-memory\u0026rdquo;, choose a name for the URL for the dataset. Go to \u0026ldquo;add data\u0026rdquo; and load the file (single graph). Publish an RDF file as a SPARQL endpoint. Unpack the distribution. Run fuseki-server --file FILE /name Explore a TDB database Unpack the distribution. Run fuseki-server --loc=DATABASE /name In a browser, go to http://localhost:3030//query.html More details on running Fuseki can be found nearby, including running as an operating system service and in a web app or servlet container such as Apache Tomcat or Jetty.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-quick-start.html","tags":null,"title":"Fuseki Quickstart"},{"categories":null,"contents":"A Fuseki server is configured by defining the data services (data and actions available on the data). 
There is also server configuration although this is often unnecessary.\nThe data services configuration can come from:\nFor Fuseki Full (webapp with UI):\nThe directory FUSEKI_BASE/configuration/ with one data service assembler per file (includes endpoint details and the dataset description.) The system database. This includes uploaded assembler files. It also keeps the state of each data service (whether it\u0026rsquo;s active or offline). The server configuration file config.ttl. For compatibility, the server configuration file can also have data services. The command line, if not running as a web application from a .war file. FUSEKI_BASE is the location of the Fuseki run area.\nFor Fuseki Main:\nThe command line, using --conf to provide a configuration file. The command line, using arguments (e.g. --mem /ds or --tdb2 --loc DB2 /ds). Programmatic configuration of the server. See Fuseki Security for more information on security configuration.\nExamples Example server configuration files can be found at jena-fuseki2/examples.\nSecurity and Access Control Access Control can be configured on any of the server, data service or dataset. Fuseki Data Access Control.\nSeparately, Fuseki Full has request based security filtering provided by Apache Shiro: Fuseki Full Security\nFuseki Configuration File A Fuseki server can be set up using a configuration file. The command-line arguments for publishing a single dataset are a short cut that, internally, builds a default configuration based on the dataset name given.\nThe configuration is an RDF graph. One graph consists of one server description, with a number of services, and each service offers a number of endpoints over a dataset.\nThe example below is all one file (RDF graph in Turtle syntax) split to allow for commentary.\nPrefix declarations Some useful prefix declarations:\nPREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX tdb1: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX tdb2: \u0026lt;http://jena.apache.org/2016/tdb#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX : \u0026lt;#\u0026gt; Assembler Initialization All datasets are described by assembler descriptions. Assemblers provide an extensible way of describing many kinds of objects.\nDefining the service name and endpoints available Each data service assembler defines:\nThe base name The operations and endpoint names The dataset for the RDF data. This example offers SPARQL Query, SPARQL Update and SPARQL Graph Store protocol, as well as file upload.\nSee Data Service Configuration Syntax for the complete details of the endpoint configuration description. Here, we show some examples.\nThe original configuration syntax, using, for example, fuseki:serviceQuery, is still supported.\nThe base name is /ds.\n## Updatable in-memory dataset. 
\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:endpoint [ # SPARQL query service fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ # SPARQL query service (alt name) fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ # SPARQL update service fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ # HTML file upload service fuseki:operation fuseki:upload ; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ # SPARQL Graph Store Protocol (read) fuseki:operation fuseki:gsp_r ; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ # SPARQL Graph Store Protcol (read and write) fuseki:operation fuseki:gsp_rw ; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; refers to a dataset description in the same file.\nHTTP requests will include the service name: http://host:port/ds/sparql?query=....\nRead-only service This example offers only read-only endpoints (SPARQL Query and HTTP GET SPARQL Graph Store protocol).\nThis service offers read-only access to a dataset with a single graph of data.\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;/ds-ro\u0026quot; ; # http://host:port/ds-ro fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ; fuseki:name \u0026quot;data\u0026quot; ]; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . Data services on the dataset The standard SPARQL operations can also be defined on the dataset URL with no secondary service name:\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;/dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ]; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ]; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . HTTP requests use the URL of the dataset.\nSPARQL Query: http://host:port/dataset?query=... Fetch the default graph (SPARQL Graph Store Protocol): http://host:port/dataset?default Server Configuration If you need to load additional classes, or set global parameters, then these go in FUSEKI_BASE/config.ttl.\nAdditional classes can not be loaded if running as a .war file. You will need to create a custom .war file consisting of the contents of the Fuseki web application and the additional classes\nThe server section is optional.\nIf absent, fuseki configuration is performed by searching the configuration file for the type fuseki:Service.\nServer Section [] rdf:type fuseki:Server ; # Server-wide context parameters can be given here. # For example, to set query timeouts: on a server-wide basis: # Format 1: \u0026quot;1000\u0026quot; -- 1 second timeout # Format 2: \u0026quot;10000,60000\u0026quot; -- 10s timeout to first result, then 60s timeout to for rest of query. # See java doc for ARQ.queryTimeout # ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000\u0026quot; ] ; # Explicitly choose which services to add to the server. # If absent, include all descriptions of type `fuseki:Service`. # fuseki:services (\u0026lt;#service1\u0026gt; \u0026lt;#service2\u0026gt;) . 
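The same kind of service can also be set up from Java when Fuseki Main is embedded in an application, as mentioned above under programmatic configuration of the server. The following is a minimal sketch rather than an exact equivalent of the configuration file: the port, the dataset name /ds and the class name are illustrative, and it assumes the jena-fuseki-main artifact is on the classpath.

import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;

public class EmbeddedFusekiSketch {
    public static void main(String[] args) {
        // In-memory, transactional dataset (empty at start).
        Dataset dataset = DatasetFactory.createTxnMem();
        // Register the dataset under /ds with the default services and start the server.
        FusekiServer server = FusekiServer.create()
                .port(3030)
                .add("/ds", dataset)
                .build();
        server.start();
    }
}

Alternatively, Fuseki Main can be given a configuration file like the one in this section, for example via the --conf command line option described above.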
Datasets In-memory An in-memory dataset, with data in the default graph taken from a local file.\n\u0026lt;#books\u0026gt; rdf:type ja:RDFDataset ; rdfs:label \u0026quot;Books\u0026quot; ; ja:defaultGraph [ rdfs:label \u0026quot;books.ttl\u0026quot; ; a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:Data/books.ttl\u0026gt; ] ; ] ; . TDB \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb:unionDefaultGraph true ; . TDB2 \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB2 ; tdb:location \u0026quot;DB2\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb:unionDefaultGraph true ; . Inference An inference reasoner can be layered on top of a dataset as defined above. The type of reasoner must be selected carefully and should not include more reasoning than is required by the application, as extensive reasoning can be detrimental to performance.\nYou have to build up layers of dataset, inference model, and graph.\n\u0026lt;#dataset\u0026gt; rdf:type ja:RDFDataset; ja:defaultGraph \u0026lt;#inferenceModel\u0026gt; . \u0026lt;#inferenceModel\u0026gt; rdf:type ja:InfModel; ja:reasoner [ ja:reasonerURL \u0026lt;http://example/someReasonerURLHere\u0026gt; ]; ja:baseModel \u0026lt;#baseModel\u0026gt;; . \u0026lt;#baseModel\u0026gt; rdf:type tdb:GraphTDB2; # for example. tdb2:location \u0026quot;/some/path/to/store/data/to\u0026quot;; # etc . where http://example/someReasonerURLHere is one of the URLs below.\nPossible reasoners: Details are in the main documentation for inference.\nGeneric Rule Reasoner: http://jena.hpl.hp.com/2003/GenericRuleReasoner\nThe specific rule set and mode configuration can be set through parameters in the configuration Model.\nTransitive Reasoner: http://jena.hpl.hp.com/2003/TransitiveReasoner\nA simple \u0026ldquo;reasoner\u0026rdquo; used to help with API development.\nThis reasoner caches a transitive closure of the subClass and subProperty graphs. The generated infGraph allows both the direct and closed versions of these properties to be retrieved. The cache is built when the tbox is bound in but if the final data graph contains additional subProperty/subClass declarations then the cache has to be rebuilt.\nThe triples in the tbox (if present) will also be included in any query. Any of tbox or data graph are allowed to be null.\nRDFS Rule Reasoner: http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner\nA full implementation of RDFS reasoning using a hybrid rule system, together with optimized subclass/subproperty closure using the transitive graph caches. Implements the container membership property rules using an optional data scanning hook. 
Implements datatype range validation.\nFull OWL Reasoner: http://jena.hpl.hp.com/2003/OWLFBRuleReasoner\nA hybrid forward/backward implementation of the OWL closure rules.\nMini OWL Reasoner: http://jena.hpl.hp.com/2003/OWLMiniFBRuleReasoner\nKey limitations over the normal OWL configuration are:\nomits the someValuesFrom =\u0026gt; bNode entailments avoids any guard clauses which would break the find() contract omits inheritance of range implications for XSD datatype ranges Micro OWL Reasoner: http://jena.hpl.hp.com/2003/OWLMicroFBRuleReasoner\nThis only supports:\nRDFS entailments basic OWL axioms like ObjectProperty subClassOf Property intersectionOf, equivalentClass and forward implication of unionOf sufficient for traversal of explicit class hierarchies Property axioms (inverseOf, SymmetricProperty, TransitiveProperty, equivalentProperty) There is some experimental support for the cheaper class restriction handling which should not be relied on at this point.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html","tags":null,"title":"Fuseki: Configuring Fuseki"},{"categories":null,"contents":"A Fuseki server keeps detailed statistics for each dataset and each service of a dataset keeps counters as to the number of incoming requests, number of successful requests, number of bad requests (i.e client errors), and number of failing requests (i.e. server errors).\nStatistics are available in JSON and in Prometheus format. The Prometheus data includes both database and JVM metrics.\nEndpoints The following servers endpoints are available. They are present in Fuseki/UI; they need to be enabled with Fuseki/main, either on the command line or in the server configuration file with a boolean setting.\nEndpoint Config Property Usage /$/ping fuseki:pingEP Server liveness endpoint /$/stats fuseki:statsEP JSON format endpoint /$/metrics fuseki:metricsEP Prometheus format endpoint Ping The \u0026ldquo;ping\u0026rdquo; service can be used to test whether a Fuseki server is running. Calling this endpoint imposes minimal overhead on the server. Requests return the current time as a plain text string so to show the ping is current.\nHTTP GET and HTTP POST are supported. 
The GET request is marked \u0026ldquo;no-cache\u0026rdquo;.\nStructure of the Statistics Report The statistics report shows the endpoints for each dataset with total counts of requests, good request and bad requests.\nExample Endpoints with the format \u0026ldquo;_1\u0026rdquo; etc are unnamed services of the dataset.\n{ \u0026quot;datasets\u0026quot; : { \u0026quot;/ds\u0026quot; : { \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;endpoints\u0026quot; : { \u0026quot;data\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-rw\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol\u0026quot; } , \u0026quot;_1\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-rw\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol\u0026quot; } , \u0026quot;_2\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;query\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;sparql\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;get\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-r\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol (Read)\u0026quot; } , \u0026quot;update\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;update\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Update\u0026quot; } , \u0026quot;_3\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;update\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Update\u0026quot; } , \u0026quot;upload\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;upload\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;File Upload\u0026quot; } } } } } ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-server-info.html","tags":null,"title":"Fuseki: Server Information"},{"categories":null,"contents":"See the Fuseki2 documentation. This page covers Fuseki v1. Fuseki1 is deprecated and has been retired. The last release of Jena with this module is Jena 3.9.0.\nFuseki is a SPARQL server. 
It provides REST-style SPARQL HTTP Update, SPARQL Query, and SPARQL Update using the SPARQL protocol over HTTP.\nThe relevant SPARQL standards are:\nSPARQL 1.1 Query SPARQL 1.1 Update SPARQL 1.1 Protocol SPARQL 1.1 Graph Store HTTP Protocol Download Fuseki1 Binaries for Fuseki1 are available from the maven repositories.\nThe source code is available in the Apache Jena source release.\nGetting Started With Fuseki This section provides a brief guide to getting up and running with a simple server installation. It uses the SOH (SPARQL over HTTP) scripts included in the download.\nDownload the latest jena-fuseki-*-distribution\nUnpack the downloaded file with unzip or tar zxfv\nMove into the newly-created apache-jena-fuseki-* directory\n(Linux) chmod +x fuseki-server bin/s-*\nRun a server\n./fuseki-server \u0026ndash;update \u0026ndash;mem /ds\nThe server logging goes to the console:\n09:25:41 INFO Fuseki :: Dataset: in-memory 09:25:41 INFO Fuseki :: Update enabled 09:25:41 INFO Fuseki :: Fuseki development 09:25:41 INFO Fuseki :: Jetty 7.2.1.v20101111 09:25:41 INFO Fuseki :: Dataset = /ds 09:25:41 INFO Fuseki :: Started 2011/01/06 09:25:41 GMT on port 3030 User Interface The Fuseki download includes a number of services:\nSPARQL Query, SPARQL Update, and file upload to a selected dataset. Link to the documentation (here). Validators for SPARQL query and update and for non-RDF/XML formats. For the control panel:\nIn a browser, go to http://localhost:3030/ Click on Control Panel Select the dataset (if set up above, there is only one choice). The page offers SPARQL operations and file upload acting on the selected dataset.\nScript Control In a new window:\nLoad some RDF data into the default graph of the server:\ns-put http://localhost:3030/ds/data default books.ttl Get it back:\ns-get http://localhost:3030/ds/data default Query it with SPARQL using the \u0026hellip;/query endpoint.\ns-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}' Update it with SPARQL using the \u0026hellip;/update endpoint.\ns-update --service http://localhost:3030/ds/update 'CLEAR DEFAULT' Security and Access Control Fuseki does not currently offer security and access control itself.\nAuthentication and control of the number of concurrent requests can be added using an Apache server and either blocking the Fuseki port to outside traffic (e.g. on Amazon\u0026rsquo;s EC2) or by listening only the localhost network interface. This is especially important for update endpoints (SPARQL Update, SPARQL Graph Store protocol with PUT/POST/DELETE enabled).\nData can be updated without access control if the server is started with the --update argument. If started without that argument, data is read-only.\nLogging Fuseki uses Log4J for logging. There are two main logging channels:\nThe general server messages: org.apache.jena.fuseki.Server A channel for all request messages: org.apache.jena.fuseki.Fuseki The default settings are (this is an extract of a log4j properties file):\n# Fuseki # Server log. log4j.logger.org.apache.jena.fuseki.Server=INFO # Request log. log4j.logger.org.apache.jena.fuseki.Fuseki=INFO # Internal logs log4j.logger.org.apache.jena.fuseki=INFO Server URI scheme This details the service URIs for Fuseki:\nhttp://*host*/dataset/query \u0026ndash; the SPARQL query endpoint. http://*host*/dataset/update \u0026ndash; the SPARQL Update language endpoint. http://*host*/dataset/data \u0026ndash; the SPARQL Graph Store Protocol endpoint. http://*host*/dataset/upload \u0026ndash; the file upload endpoint. 
Where dataset is a URI path. Note that Fuseki defaults to using port 3030 so host is often localhost:3030.\nImportant - While you may set dataset to be the text dataset this should be avoided since it may interfere with the function of the control panel and web pages.\nThe URI http://host/dataset/sparql is currently mapped to /query but this may change to being a general purpose SPARQL query endpoint.\nRunning a Fuseki Server The server can be run with the script fuseki-server. Common forms are:\nfuseki-server --mem /DatasetPathName fuseki-server --file=FILE /DatasetPathName fuseki-server --loc=DB /DatasetPathName fuseki-server --config=ConfigFile There is an option --port=PORT to set the port number. It defaults to 3030.\n/DatasetPathName is the name under which the dataset will be accessible over HTTP. Please see the above section on Server URI scheme for notes regarding available URIs and choice of this name\nThe server will service read requests only unless the --update argument is used.\nThe full choice of dataset forms is:\nFuseki Dataset Descriptions\n--mem Create an empty, in-memory (non-persistent) dataset. --file=FILE Create an empty, in-memory (non-persistent) dataset, then load FILE into it. --loc=DIR Use an existing TDB database. Create an empty one if it does not exist. --desc=assemblerFile Construct a dataset based on the general assembler description. --config=ConfigFile Construct one or more service endpoints based on the configuration description. A copy of TDB is included in the standalone server. An example assembler file for TDB is in tdb.ttl.\nFuseki Server Arguments\n--help Print help message. --port=*number* Run on port number (default is 3030). --localhost Listen only to the localhost network interface. --update Allow update. Otherwise only read requests are served (ignored if a configuration file is given). Fuseki Server starting with an empty dataset fuseki-server --update --mem /ds runs the server on port 3030 with an in-memory dataset. It can be accessed via the appropriate protocol at the following URLs:\nSPARQL query: http://localhost:3030/ds/query SPARQL update: http://localhost:3030/ds/update SPARQL HTTP update: http://localhost:3030/ds/data The SPARQL Over HTTP scripts take care of naming and protocol details. For example, to load in a file data.rdf:\ns-put http://localhost:3030/ds/data default data.rdf Fuseki Server and TDB Fuseki includes a built-in version of TDB. Run the server with the --desc argument\nfuseki-server --desc tdb.ttl /ds and a database in the directory DB, an assembler description of:\n@prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; . @prefix ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; . @prefix tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; . \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; . The form:\nfuseki-server --loc=DB /ds is a shorthand for such an assembler with location DB.\nTo make triples from all the named graphs appear as the default, unnamed graph, use:\n\u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; tdb:unionDefaultGraph true ; . 
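Because the server uses a standard TDB database on disk, the same DB directory can also be opened directly from Java with the TDB API when the server is not running (a TDB1 database must only be used from one JVM at a time). A minimal sketch, assuming the DB location used above:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;

public class InspectTDB {
    public static void main(String[] args) {
        // Connect to (or create) the TDB1 database in directory "DB".
        Dataset dataset = TDBFactory.createDataset("DB");
        dataset.begin(ReadWrite.READ);
        try {
            // Count the triples in the default graph.
            System.out.println(dataset.getDefaultModel().size());
        } finally {
            dataset.end();
        }
    }
}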
Fuseki Server and general dataset descriptions The Fuseki server can be given an assembler description to build a variety of model and datasets types.\nfuseki-server --desc assembler.ttl /ds Full details of setting up models assembler is given in the assembler documentation and assembler howto.\nA general dataset is described by:\n# Dataset of default graph and one named graph. \u0026lt;#dataset\u0026gt; rdf:type ja:RDFDataset ; ja:defaultGraph \u0026lt;#modelDft\u0026gt; ; ja:namedGraph [ ja:graphName \u0026lt;http://example.org/name1\u0026gt; ; ja:graph \u0026lt;#model1\u0026gt; ] ; . \u0026lt;#modelDft\u0026gt; a ja:MemoryModel ; ja:content [ ja:externalContent \u0026lt;file:Data.ttl\u0026gt; . \u0026lt;#model1\u0026gt; rdf:type ja:MemoryModel ; ja:content [ ja:externalContent \u0026lt;file:FILE-1.ttl\u0026gt; ] ; ja:content [ ja:externalContent \u0026lt;file:FILE-2.ttl\u0026gt; ] ; . The models can be Jena inference models.\nFuseki Configuration File A Fuseki server can be set up using a configuration file. The command-line arguments for publishing a single dataset are a short cut that, internally, builds a default configuration based on the dataset name given.\nThe configuration is an RDF graph. One graph consists of one server description, with a number of services, and each service offers a number of endpoints over a dataset.\nThe example below is all one file (RDF graph in Turtle syntax) split to allow for commentary.\nPrefix declarations Some useful prefix declarations:\n@prefix fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; . @prefix tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; . @prefix ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; . @prefix : \u0026lt;#\u0026gt; . Server Section Order of the file does not matter to the machine, but it\u0026rsquo;s useful to start with the server description, then each of the services with its datasets.\n[] rdf:type fuseki:Server ; # Server-wide context parameters can be given here. # For example, to set query timeouts: on a server-wide basis: # Format 1: \u0026quot;1000\u0026quot; -- 1 second timeout # Format 2: \u0026quot;10000,60000\u0026quot; -- 10s timeout to first result, then 60s timeout to for rest of query. # See java doc for ARQ.queryTimeout # ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000\u0026quot; ] ; # Services available. Only explicitly listed services are configured. # If there is a service description not linked from this list, it is ignored. fuseki:services ( \u0026lt;#service1\u0026gt; \u0026lt;#service2\u0026gt; ) . Assembler Initialization All datasets are described by assembler descriptions. Assemblers provide an extensible way of describing many kinds of objects. Set up any assembler extensions - here, the TDB assembler support.\nService 1 This service offers SPARQL Query, SPARQL Update and SPARQL Graph Store protocol, as well as file upload, on an in-memory dataset. Initially, the dataset is empty.\n## --------------------------------------------------------------- ## Updatable in-memory dataset. 
\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL query service fuseki:serviceUpload \u0026quot;upload\u0026quot; ; # Non-SPARQL upload service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read and write) # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset \u0026lt;#dataset-mem\u0026gt; ; . \u0026lt;#dataset-mem\u0026gt; rdf:type ja:RDFDataset . Service 2 This service offers a number of endpoints. It is read-only, because only read-only endpoints are defined (SPARQL Query and HTTP GET SPARQl Graph Store protocol). The dataset is a single in-memory graph:\nThis service offers read-only access to a dataset with a single graph of data.\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;books\u0026quot; ; # http://host:port/books fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceReadGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset \u0026lt;#books\u0026gt; ; . \u0026lt;#books\u0026gt; rdf:type ja:RDFDataset ; rdfs:label \u0026quot;Books\u0026quot; ; ja:defaultGraph [ rdfs:label \u0026quot;books.ttl\u0026quot; ; a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:Data/books.ttl\u0026gt; ] ; ] ; . Service 3 This service offers SPARQL query access only to a TDB database. The TDB database can have specific features set, such as query timeout or making the default graph the union of all named graphs.\n\u0026lt;#service3\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;tdb\u0026quot; ; # http://host:port/tdb fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb:unionDefaultGraph true ; . SPARQL Over HTTP SOH (SPARQL Over HTTP) is a set of command-line scripts for working with SPARQL 1.1. 
SOH is server-independent and will work with any compliant SPARQL 1.1 system offering HTTP access.\nSee the SPARQL Over HTTP page.\nExamples # PUT a file s-put http://localhost:3030/ds/data default D.nt # GET a file s-get http://localhost:3030/ds/data default # PUT a file to a named graph s-put http://localhost:3030/ds/data http://example/graph D.nt # Query s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}' # Update s-update --service http://localhost:3030/ds/update --file=update.ru Use from Java SPARQL Query ARQ\u0026rsquo;s QueryExecutionFactory.sparqlService can be used.\nSPARQL Update See UpdateExecutionFactory.createRemote\nSPARQL HTTP See DatasetAccessor\n","permalink":"https://jena.apache.org/documentation/archive/serving_data/fuseki1.html","tags":null,"title":"Fuseki: serving RDF data over HTTP"},{"categories":null,"contents":"SPARQL Standards The relevant SPARQL 1.1 standards are:\nSPARQL 1.1 Query SPARQL 1.1 Update SPARQL 1.1 Protocol SPARQL 1.1 Graph Store HTTP Protocol SPARQL 1.1 Query Results JSON Format SPARQL 1.1 Query Results CSV and TSV Formats SPARQL Query Results XML Format RDF Standards Some RDF 1.1 standards\nRDF 1.1 Turtle RDF 1.1 Trig RDF 1.1 N-Triples RDF 1.1 N-Quads JSON-LD ","permalink":"https://jena.apache.org/documentation/fuseki2/rdf-sparql-standards.html","tags":null,"title":"Fuseki: SPARQL and RDF Standards"},{"categories":null,"contents":" Dataset Transactions Concurrency how-to Handling concurrent access to Jena models Event handler how-to Responding to events Stream manager how-to Redirecting URLs to local files Model factory Creating Jena models of various kinds RDF frames Viewing RDF statements as frame-like objects Typed literals Creating and extracting RDF typed literals SSE SPARQL Syntax Expressions Repacking Jena jars Jena Initialization ","permalink":"https://jena.apache.org/documentation/notes/","tags":null,"title":"General notes and how-to's"},{"categories":null,"contents":"Details of the GeoSPARQL support are provided on the GeoSPARQL page.\nThe assembler for GeoSPARQL support is part of the jena-geosparql artifact and must be on the Fuseki server classpath, along with its dependencies.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-geosparql\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;...\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; or download the binary from the Maven central repository org/apache/jena/jena-geosparql\nThe GeoSPARQL assembler can be used in a Fuseki configuration file.\nThis example is of a read-only:\nPREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX tdb2: \u0026lt;http://jena.apache.org/2016/tdb#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX geosparql: \u0026lt;http://jena.apache.org/geosparql#\u0026gt; \u0026lt;#service\u0026gt; rdf:type fuseki:Service; fuseki:name \u0026#34;geo\u0026#34;; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:dataset \u0026lt;#geo_ds\u0026gt; . \u0026lt;#geo_ds\u0026gt; rdf:type geosparql:geosparqlDataset ; geosparql:spatialIndexFile \u0026#34;DB/spatial.index\u0026#34;; geosparql:dataset \u0026lt;#baseDataset\u0026gt; ; . \u0026lt;#baseDataset\u0026gt; rdf:type tdb2:DatasetTDB2 ; tdb2:location \u0026#34;DB/\u0026#34; ; . 
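Once a Fuseki server is running with the configuration above, the geo service can be queried over HTTP like any other SPARQL endpoint. A minimal sketch from Java: the service URL follows from fuseki:name "geo" and the default Fuseki port 3030, and the query simply lists features together with their WKT geometry literals.

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.ResultSetFormatter;

public class QueryGeoService {
    public static void main(String[] args) {
        String service = "http://localhost:3030/geo";
        String query = String.join("\n",
                "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
                "SELECT ?feature ?wkt WHERE {",
                "  ?feature geo:hasGeometry/geo:asWKT ?wkt .",
                "} LIMIT 10");
        // Execute the query against the running Fuseki endpoint.
        try (QueryExecution qe = QueryExecution.service(service).query(query).build()) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}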
It is possible to run with a data file loaded into memory and a spatial in-memory index:\nPREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX geosparql: \u0026lt;http://jena.apache.org/geosparql#\u0026gt; \u0026lt;#service\u0026gt; rdf:type fuseki:Service; fuseki:name \u0026#34;ds\u0026#34;; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:dataset \u0026lt;#geo_ds\u0026gt; . # In-memory data and index. \u0026lt;#geo_ds\u0026gt; rdf:type geosparql:geosparqlDataset ; geosparql:dataset \u0026lt;#baseDataset\u0026gt; . \u0026lt;#baseDataset\u0026gt; rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:geosparql_data.ttl\u0026gt; ; . The full assembler properties with the default settings is:\n\u0026lt;#geo_ds\u0026gt; rdf:type geosparql:GeosparqlDataset ; # Build in-memory is absent. geosparql:spatialIndexFile \u0026#34;spatial.index\u0026#34;; ## Default settings. See documentation for meanings. geosparql:inference true ; geosparql:queryRewrite true ; geosparql:indexEnabled true ; geosparql:applyDefaultGeometry false ; # 3 item lists: [Geometry Literal, Geometry Transform, Query Rewrite] geosparql:indexSizes \u0026#34;-1,-1,-1\u0026#34; ; # Default - unlimited. geosparql:indexExpires \u0026#34;5000,5000,5000\u0026#34; ; # Default - time in milliseconds. ## Required setting - data over which GeoSPARQL is applied. geosparql:dataset \u0026lt;#baseDataset\u0026gt; ; . ","permalink":"https://jena.apache.org/documentation/geosparql/geosparql-assembler.html","tags":null,"title":"GeoSPARQL Assembler"},{"categories":null,"contents":"This application provides a HTTP server compliant with the GeoSPARQL standard.\nGeoSPARQL can also be integrated with Fuseki using the GeoSPARQL assembler with a general Fuseki server.\njena-fuseki-geosparql GeoSPARQL Fuseki can be accessed as an embedded server using Maven etc. from Maven Central or run from the command line. SPARQL queries directly on Jena Datasets and Models can be done using the GeoSPARQL Jena module.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-fuseki-geosparql\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;...\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; or download the binary from the Maven central repository org/apache/jena/jena-fuseki-geosparql\nThis uses the embedded server Fuseki and provides additional parameters for dataset loading.\nThe project uses the GeoSPARQL implementation from the GeoSPARQL Jena module, which includes a range of functions in addition to those from the GeoSPARQL standard.\nCurrently, there is no GUI interface as provided with this server.\nThe intended usage is to specify a TDB folder (either TDB1 or TDB2, created if required) for persistent storage of the dataset. File loading, inferencing and data conversion operations can also be specified to load and manipulate data into the dataset. When the server is restarted these conversion operations are not required again (as they have been stored in the dataset) unless there are relevant changes. The TDB dataset can also be prepared and manipulated programatically using the Jena API.\nUpdates can be made to the dataset while the Fuseki server is running. 
However, these changes will not be applied to inferencing and spatial indexes until the server restarts (any default or specified spatial index file must not exist to trigger building). This is due to the current implementation of RDFS inferencing in Jena (and is required in any Fuseki server with inferencing) and the selected spatial index.\nA subset of the EPSG spatial/coordinate reference systems are included by default from the Apache SIS project (http://sis.apache.org). The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable (http://sis.apache.org/epsg.html).\nIt is expected that at least one Geometry Literal or Geo Predicate is present in a dataset (otherwise a standard Fuseki server can be used). A spatial index is created and new data cannot be added to the index once built. The spatial index can optionally be stored for future usage and needs to be removed from a TDB folder if the index is to be rebuilt.\nClarifications on GeoSPARQL Geography Markup Language (GML) GeoSPARQL refers to the Geography Markup Language (GML) as one format for GeometryLiterals. This does not mean that GML is part of the GeoSPARQL standard. Instead, a subset of geometry encodings from the GML standards are permitted (specifically the GML 2.0 Simple Features Profile (10-100r3) is supported by GeoSPARQL Jena). The expected encoding of data is in RDF triples and can be loaded from any RDF file format supported by Apache Jena. Conversion of GML to RDF is out of scope of the GeoSPARQL standard and Apache Jena.\nGeo Predicates Lat/Lon Historically, geospatial data has frequently been encoded as Latitude/Longitude coordinates in the WGS84 coordinate reference system. The GeoSPARQL standard specifically chooses not to adopt this approach and instead uses the more versatile GeometryLiteral, which permits multiple encoding formats that support multiple coordinate reference systems and geometry shapes. Therefore, Lat/Lon Geo Predicates are not part of the GeoSPARQL standard. However, GeoSPARQL Jena provides two methods to support users with geo predicates in their geospatial data.\nConversion of Geo Predicates to the GeoSPARQL data structure (encoding the Lat/Lon as a Point geometry). Spatial extension which provides property and filter functions accepting Lat/Lon arguments. The Spatial extension functions (documented in the GeoSPARQL Jena module) support triples in either the GeoSPARQL data structure or Geo Predicates. Therefore, converting a dataset to GeoSPARQL will not lose functionality. By converting to the GeoSPARQL data structure, datasets can include a broader range of geospatial data.\nCommand Line Run from the command line and send queries over HTTP.\njava -jar jena-fuseki-geosparql-VER.jar ARGS\nwritten as geosparql-fuseki below.\nExamples java -jar jena-fuseki-geosparql-VER.jar -rf \u0026quot;geosparql_test.rdf\u0026quot; -i\nThe example file geosparql_test.rdf in the GitHub repository contains several geometries in geodetic WGS84 (EPSG:4326). The example file geosparql_test_27700.rdf is identical but in the projected OSGB36 (EPSG:27700) used in the United Kingdom. Both will return the same results as GeoSPARQL treats all SRS as being projected. 
RDFS inferencing is applied using the GeoSPARQL schema to infer additional relationships (which aren\u0026rsquo;t asserted in the example files) that are used in the spatial operations and data retrieval.\nExamples:\nLoad RDF file (XML format) into memory and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot;\nLoad RDF file (TTL format: default) into memory, apply GeoSPARQL schema with RDFS inferencing and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -i\nLoad RDF file into memory, write spatial index to file and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -si \u0026quot;spatial.index\u0026quot;\nLoad RDF file into persistent TDB and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -t \u0026quot;TestTDB\u0026quot;\nLoad from persistent TDB and run server: geosparql-fuseki -t \u0026quot;TestTDB\u0026quot;\nLoad from persistent TDB, change port and run server: geosparql-fuseki -t \u0026quot;TestTDB\u0026quot; -p 3030\nSee rdf-tables in Output Formats/Serialisations for supported RDF format keywords.\nN.B. Windows Powershell will strip quotation pairs from arguments and so triple quotation pairs may be required, e.g. \u0026quot;\u0026quot;\u0026quot;test.rdf\u0026quot;\u0026quot;\u0026quot;. Otherwise, logging output will be sent to a file called \u0026ldquo;xml\u0026rdquo;. Also, \u0026ldquo;The input line is too long\u0026rdquo; error can mean the path to the exceeds the character limit and needs shortening.\nEmbedded Server Run within a Java application to provide GeoSPARQL support over HTTP to other applications:\nFusekiLogging.setLogging(); GeosparqlServer server = new GeosparqlServer(portNumber, datasetName, isLoopbackOnly, dataset, isUpdate); SPARQL Query Example Once the default server is running it can be queried using Jena as follows:\nString service = \u0026#34;http://localhost:3030/ds\u0026#34;; String query = ....; try (QueryExecution qe = QueryExecution.service(service).query(query).build()) { ResultSet rs = qe.execSelect(); ResultSetFormatter.outputAsTSV(rs); } The server will respond to any valid SPARQL HTTP so an alternative SPARQL framework can be used. More information on SPARQL querying using Jena can be found on their website (https://jena.apache.org/tutorials/sparql.html).\nSIS_DATA Environment Variable The Apache SIS library is used to support the recognition and transformation of Coordinate/Spatial Reference Systems. These Reference Systems are published as the EPSG dataset. The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. A subset of the EPSG spatial/coordinate reference systems are included by default but the wider dataset may be required. 
Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable (http://sis.apache.org/epsg.html).\nAn embedded EPSG dataset can be included in an application by adding the following dependency:\nGradle dependency in build.gradle\next.sisVersion = \u0026ldquo;0.8\u0026rdquo; implementation \u0026ldquo;org.apache.sis.non-free:sis-embedded-data:$sisVersion\u0026rdquo;\nMaven dependency in pom.xml\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.sis.non-free\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;sis-embedded-data\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;0.8\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Command Line Arguments Boolean options that have false defaults only require \u0026ldquo;\u0026ndash;option\u0026rdquo; to make true in release v1.0.7 or later. Release v1.0.6 and earlier use the form \u0026ldquo;\u0026ndash;option true\u0026rdquo;.\n1) Port --port, -p The port number of the server. Default: 3030\n2) Dataset name --dataset, -d The name of the dataset used in the URL. Default: ds\n3) Loopback only --loopback, -l The server only accepts local host loopback requests. Default: true\n4) SPARQL update allowed --update, -u The server accepts updates to modify the dataset. Default: false\n5) TDB folder --tdb, -t An existing or new TDB folder used to persist the dataset. Default set to memory dataset. If accessing a dataset for the first time with GeoSPARQL then consider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset. TDB1 Dataset will be used by default, use -t \u0026lt;folder_path\u0026gt; -t2 options for TDB2 Dataset.\n6) Load RDF file into dataset --rdf_file, -rf Comma separated list of [RDF file path#graph name\u0026amp;RDF format] to load into dataset. Graph name is optional and will use default graph. RDF format is optional (default: ttl) or select from one of the following: json-ld, json-rdf, nt, nq, thrift, trig, trix, ttl, ttl-pretty, xml, xml-plain, xml-pretty. e.g. test.rdf#test\u0026amp;xml,test2.rdf will load test.rdf file into test graph as RDF/XML and test2.rdf into default graph as TTL.\nConsider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset.\nThe combination of specifying -t TDB folder and -rf loading RDF file will store the triples in the persistent TDB dataset. Therefore, loading the RDF file would only be required once.\n7) Load Tabular file into dataset --tabular_file, -tf Comma separated list of [Tabular file path#graph name|delimiter] to load into dataset. See RDF Tables for table formatting. Graph name is optional and will use default graph. Column delimiter is optional and will default to COMMA. Any character except \u0026lsquo;:\u0026rsquo;, \u0026lsquo;^\u0026rsquo; and \u0026lsquo;|\u0026rsquo;. Keywords TAB, SPACE and COMMA are also supported. e.g. test.rdf#test|TAB,test2.rdf will load test.rdf file into test graph as TAB delimited and test2.rdf into default graph as COMMA delimited.\nSee RDF Tables project (https://github.com/galbiston/rdf-tables) for more details on tabular format.\nConsider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset.\nThe combination of specifying -t TDB folder and -tf loading tabular file will store the triples in the persistent TDB dataset. 
Therefore, loading the tabular file would only be required once.\n8) GeoSPARQL RDFS inference --inference, -i Enable GeoSPARQL RDFS schema and inferencing (class and property hierarchy). Inferences will be applied to the dataset. Updates to dataset may require server restart. Default: false\nThe combination of specifying -t TDB folder and -i GeoSPARQL RDFS inference will store the triples in the persistent TDB dataset. Therefore, the GeoSPARL RDFS inference option would only be required when there is a change to the dataset.\n9) Apply hasDefaultGeometry --default_geometry, -dg Apply hasDefaultGeometry to single Feature hasGeometry Geometry statements. Additional properties will be added to the dataset. Default: false\nThe combination of specifying -t TDB folder and -dg apply hasDefaultGeometry will modify the triples in the persistent TDB dataset. Therefore, applying hasDefaultGeometry would only be required when there is a change to the dataset.\n10) Validate Geometry Literals --validate, -v Validate that the Geometry Literals in the dataset are valid. Default: false\n11) Convert Geo predicates --convert_geo, -c Convert Geo predicates in the data to Geometry with WKT WGS84 Point GeometryLiteral. Default: false\nThe combination of specifying -t TDB folder and -c convert Geo predicates will modify the triples in the persistent TDB dataset. Therefore, converting the Geo predicates would only be required once.\n12) Remove Geo predicates --remove_geo, -rg Remove Geo predicates in the data after combining to Geometry. Default: false\nThe combination of specifying -t TDB folder and -rg remove Geo predicates will modify the triples in the persistent TDB dataset. Therefore, removing the Geo predicates would only be required once.\n13) Query Rewrite enabled --rewrite, -r Enable query rewrite extension of GeoSPARQL standard to simplify queries, which relies upon the \u0026lsquo;hasDefaultGeometry\u0026rsquo; property. The \u0026lsquo;default_geometry\u0026rsquo; may be useful for adding the \u0026lsquo;hasDefaultGeometry\u0026rsquo; to a dataset. Default: true\n14) Indexing enabled --index, -x Enable caching of re-usable data to improve query performance. Default: true See GeoSPARQL Jena project for more details.\n15) Index sizes --index_sizes, -xs List of Index item sizes: [Geometry Literal, Geometry Transform, Query Rewrite]. Unlimited: -1, Off: 0 Unlimited: -1, Off: 0, Default: -1,-1,-1\n16) Index expiries --index_expiry, -xe List of Index item expiry in milliseconds: [Geometry Literal, Geometry Transform, Query Rewrite]. Off: 0, Minimum: 1001, Default: 5000,5000,500\n17) Spatial Index file --spatial_index, -si File to load or store the spatial index. Default to \u0026ldquo;spatial.index\u0026rdquo; in TDB folder if using TDB option and this option is not set. Otherwise spatial index is not stored and rebuilt at start up. The spatial index file must not exist for the index to be built (e.g. following changes to the dataset).\n18) Properties File Supply the above parameters as a file:\n$ java Main @/tmp/parameters Future Work GUI to assist users when querying a dataset. ","permalink":"https://jena.apache.org/documentation/geosparql/geosparql-fuseki.html","tags":null,"title":"GeoSPARQL Fuseki"},{"categories":null,"contents":"We are always happy to help you get your Jena project going. Jena has been around for many years, there are many archives of past questions, tutorials and articles on the web. A quick search may well answer your question directly! 
If not, please feel free to post a question to the user support list (details below).\nEmail support lists The main user support list is users@jena.apache.org. To join this list, please send an email to: users-subscribe@jena.apache.org from the email account you want to subscribe with. This list is a good place to ask for advice on developing Jena-based applications, or solving a problem with using Jena. Please see below for notes on asking good questions. The list is archived at lists.apache.org.\nThe developers list is dev@jena.apache.org. To join this list, please send an email to: dev-subscribe@jena.apache.org from the email account you want to subscribe with. This list is a good place to discuss the development of the Jena platform itself, including patches you want to submit.\nTo unsubscribe from a mailing list, send email to LIST-unsubscribe@jena.apache.org.\nFull details of Apache mailing lists: https://www.apache.org/foundation/mailinglists.html.\nOther resources There are curated collections of Jena questions on StackOverflow tagged \u0026lsquo;jena\u0026rsquo; and \u0026lsquo;apache-jena\u0026rsquo;. There are also questions and answers about SPARQL.\nHow to ask a good question Asking good questions is the best way to get good answers. Try to follow these tips:\nMake the question precise and specific. \u0026ldquo;My code doesn\u0026rsquo;t work\u0026rdquo;, for example, does not help us to help you as much as \u0026ldquo;The following SPARQL query gave me an answer I didn\u0026rsquo;t expect\u0026rdquo;.\nShow that you\u0026rsquo;ve tried to solve the problem yourself. Everyone who answers questions on the list has a full-time job or study to do; no-one gets paid for answering these support questions. Spend their goodwill wisely: \u0026ldquo;Here\u0026rsquo;s the code I tried\u0026hellip;\u0026rdquo; or \u0026ldquo;I read in the documentation that \u0026hellip;\u0026rdquo; shows that you\u0026rsquo;ve at least made some effort to find things out for yourself.\nWhere appropriate show a complete test case. Seeing where your code goes wrong is generally much easier if we can run it on our computers. Corollaries: don\u0026rsquo;t post your entire project - take some time to reduce it down to a minimal test case. Include enough data - runnable code is no help if critical resources like *.rdf files are missing. Reducing your code down to a minimal test case is often enough for you to figure out the problem yourself, which is always satisfying!\nDon\u0026rsquo;t re-post your question after only a few hours. People are busy, and may be in a different timezone to you. If you\u0026rsquo;re not sure if your question made it to the list, look in the archive.\nAdding lots of exclamation marks or other punctuation will not move your question up the queue. Quite the reverse, in fact.\nAsk questions on the list, rather than emailing the developers directly. This gives us the chance to share the load of answering questions, and also ensures that answers are archived in case they\u0026rsquo;re of use to others in the future.\n","permalink":"https://jena.apache.org/help_and_support/","tags":null,"title":"Getting help with Jena"},{"categories":null,"contents":"We welcome your contribution towards making Jena a better platform for semantic web and linked data applications.
We appreciate feature suggestions, bug reports and patches for code or documentation.\nIf you need help using Jena, please see our getting help page.\nHow to contribute You can help us by sending your suggestions, feature requests and bug reports (as well as patches) using Jena\u0026rsquo;s GitHub Issues.\nYou can discuss your contribution on the dev@jena.apache.org mailing list. You can also help other users by answering their questions on the users@jena.apache.org mailing list. See the subscription instructions for details.\nPlease see the Reviewing Contributions page for details of what committers will be looking for when reviewing contributions.\nImproving the Website You can also help us improve the documentation on this website via Pull Request.\nThe website source lives in an Apache git repository at gitbox.apache.org repo jena-site. There is also a full read-write mirror on GitHub, see jena-site on GitHub:\ngit clone https://github.com/apache/jena-site.git cd jena-site You can then make a branch, prepare your changes and submit a pull request. Please see the README.md in that repository for more details.\nSNAPSHOTs If you use Apache Maven and you are not afraid of being on the bleeding-edge, you can help us by testing our SNAPSHOTs which you can find in the Apache Maven repository.\nHere is, for example, how you can add TDB version X.Y.Z-SNAPSHOT to your project (please ask if you are unsure what the latest snapshot version number currently is):\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-tdb\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; See also how to use Jena with Maven.\nIf you have problems with any of our SNAPSHOTs, let us know.\nYou can check the state of each Jena development build on the Apache Jenkins continuous integration server.\nGit repository You can find the Jena source code in the Apache git repository: https://gitbox.apache.org/repos/asf/jena.git\nThere is also a full read-write mirror of Jena on GitHub:\ngit clone https://github.com/apache/jena.git cd jena mvn clean install You can fork Jena on GitHub and also submit pull requests to contribute your suggested changes to the code.\nOpen issues Apache Jena manages issues using github open issues.\nSubmit your patches You can develop new contributions and work on patches using either the Apache-hosted git repository or the mirror on GitHub.\nAlternatively, patches can be attached directly to issues in github.\nPlease inspect your contribution/patch and make sure it includes all (and only) the relevant changes for a single issue.
Don\u0026rsquo;t forget tests!\nIf you want to test if a patch applies cleanly you can use:\npatch -p0 \u0026lt; JENA-XYZ.patch If you use Eclipse: right click on the project name in Package Explorer, select Team \u0026gt; Create Patch or Team \u0026gt; Apply Patch.\nYou can also use git:\ngit format-patch origin/trunk IRC channel Some Jena developers hang out on #jena on irc.freenode.net.\nHow Apache Software Foundation works To better understand how to get involved and how the Apache Software Foundation works we recommend you read:\nhttps://www.apache.org/foundation/getinvolved.html https://www.apache.org/foundation/how-it-works.html https://www.apache.org/dev/contributors.html ","permalink":"https://jena.apache.org/getting_involved/","tags":null,"title":"Getting involved in Apache Jena"},{"categories":null,"contents":"Apache Jena (or Jena in short) is a free and open source Java framework for building semantic web and Linked Data applications. The framework is composed of different APIs interacting together to process RDF data. If you are new here, you might want to get started by following one of the tutorials. You can also browse the documentation if you are interested in a particular topic.\nTutorials RDF API tutorial - you will learn the essence of the semantic web and the graph representation behind RDF. SPARQL tutorial - will guide you to formulate expressive queries over RDF data. Ontology API - illustrates the usage of advanced semantic web features such as reasoning over your data using OWL. Finally, some of the tutorials are also available in Traditional Chinese, Portuguese and French. Documentation The following topics are covered in the documentation:\nThe RDF API - the core RDF API in Jena SPARQL - querying and updating RDF models using the SPARQL standards Fuseki - SPARQL server which can present RDF data and answer SPARQL queries over HTTP Assembler - describing recipes for constructing Jena models declaratively using RDF Inference - using the Jena rules engine and other inference algorithms to derive consequences from RDF models Javadoc - JavaDoc generated from the Jena source Text Search - enhanced indexes using Lucene or Solr for more efficient searching of text literals in Jena models and datasets I/O - notes on input and output of triples to and from Jena models How-To\u0026rsquo;s - various topic-specific how-to documents Ontology - support for handling OWL models in Jena TDB - a fast persistent triple store that stores directly to disk Tools - various command-line tools and utilities to help developers manage RDF data and other aspects of Jena Framework Architecture The interaction between the different APIs:\nOther resources ","permalink":"https://jena.apache.org/getting_started/","tags":null,"title":"Getting started with Apache Jena"},{"categories":null,"contents":"Old documentation (Jena 3.1.1 to Jena 4.2.0)\nJena 4.3.0 and later uses the JDK java.net.http package. Jena adds API support for challenge-based authentication and also provide HTTP digest authentication.\nAuthentication There are 5 variations:\nBasic authentication Challenge-Basic authentication Challenge-Digest authentication URL user (that is, user@host.net in the URL) URL user and password in the URL (that is, user:password@host.net in the URL) Basic authentication occurs where the app provides the user and password information to the JDK HttpClient and that information is always used when sending HTTP requests with that HttpClient. 
It does not require an initial request-challenge-resend to initiate. This is provided natively by the java.net.http JDK code. See HttpClient.newBuilder().authenticate(...).\nChallenge-based authentication, for \u0026ldquo;basic\u0026rdquo; or \u0026ldquo;digest\u0026rdquo;, is provided by Jena. The challenge happens on the first contact with the remote endpoint and the server returns a 401 response with an HTTP header saying which style of authentication is required. There is a registry of user names and passwords for endpoints which is consulted; the appropriate Authorization: header is generated and the request is resent. If no registration matches, the 401 is passed back to the application as an exception.\nBecause it is a challenge response to a request, the request must be sent twice, first to trigger the challenge and then again with the HTTP authentication information. To make this automatic, the first request must not be a streaming request (the stream is not repeatable). All HTTP requests generated by Jena are repeatable.\nThe URL can contain a userinfo part, either the users@host form, or the user:password@host form. If just the user is given, the authentication environment is consulted for registered user-password information. If both user and password are given, the details as given are used. This latter form is not recommended and should only be used if necessary because the password is in-clear in the SPARQL query.\nJena also has support for bearer authentication.\nJDK HttpClient.authenticator // Basic or Digest - determined when the challenge happens. AuthEnv.get().registerUsernamePassword(URI.create(dataURL), \u0026#34;user\u0026#34;, \u0026#34;password\u0026#34;); try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } Alternatively, the Java platform provides basic authentication. This is not challenge based - any request sent using an HttpClient configured with an authenticator will include the authentication details. (Caution - including sending username/password to the wrong site!). Digest authentication must use AuthEnv.get().registerUsernamePassword.\nAuthenticator authenticator = AuthLib.authenticator(\u0026#34;user\u0026#34;, \u0026#34;password\u0026#34;); HttpClient httpClient = HttpClient.newBuilder() .authenticator(authenticator) .build(); // Use with RDFConnection try ( RDFConnection conn = RDFConnectionRemote.service(dataURL) .httpClient(httpClient) .build()) { conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .httpClient(httpClient) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } Challenge registration AuthEnv maintains a registry of credentials and also a registry of which service URLs the credentials should be used for. It supports registration of endpoint prefixes so that one registration will apply to all URLs starting with a common root.\nThe main function is AuthEnv.get().registerUsernamePassword.\n// Application setup code AuthEnv.get().registerUsernamePassword(\u0026#34;username\u0026#34;, \u0026#34;password\u0026#34;); ...
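// (Aside, not part of the original example: a registration can also be scoped
// to a specific service URI, as in the first snippet above, e.g.
//   AuthEnv.get().registerUsernamePassword(URI.create(dataURL), "username", "password");
// so that the credentials are only offered to that endpoint or endpoint prefix.)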
try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } When an HTTP 401 response with a WWW-Authenticate header is received, the Jena http handling code will look for a suitable authentication registration (exact or longest prefix), and retry the request. If it succeeds, a modifier is installed so all subsequent requests to the same endpoint will have the authentication header added and there is no challenge round-trip.\nSERVICE The same mechanism is used for the URL in a SPARQL SERVICE clause. If there is a 401 challenge, the registry is consulted and authentication applied.\nIn addition, if the SERVICE URL has a username as the userinfo (that is, https://users@some.host/...), that user name is used to look in the authentication registry.\nIf the userinfo is of the form \u0026ldquo;username:password\u0026rdquo; then the information as given in the URL is used.\nAuthEnv.get().registerUsernamePassword(URI.create(\u0026#34;http://host/sparql\u0026#34;), \u0026#34;u\u0026#34;, \u0026#34;p\u0026#34;); // Registration applies to SERVICE. Query query = QueryFactory.create(\u0026#34;SELECT * { SERVICE \u0026lt;http://host/sparql\u0026gt; { ?s ?p ?o } }\u0026#34;); try ( QueryExecution qExec = QueryExecution.create().query(query).dataset(...).build() ) { System.out.println(\u0026#34;Call using SERVICE...\u0026#34;); ResultSet rs = qExec.execSelect(); ResultSetFormatter.out(rs); } Authentication Examples jena-examples:arq/examples/auth/.\nBearer Authentication Bearer authentication requires the application to obtain a token to present to the server.\nRFC 6750 RFC 6751 JSON Web Tokens (JWT) JSON Web Token Best Current Practices How this token is obtained depends on the deployment environment.\nThe application can either register the token to be used:\nAuthEnv.get().addBearerToken(targetURL, jwtString); or can provide a token provider for 401 challenges that request bearer authentication.\nAuthEnv.get().setBearerTokenProvider( (uri, challenge)-\u0026gt;{ ... ; return jwtString; }); ","permalink":"https://jena.apache.org/documentation/sparql-apis/http-auth.html","tags":null,"title":"HTTP Authentication"},{"categories":null,"contents":"Documentation for HTTP Authentication (Jena 3.1.1 to Jena 4.2.0) using Apache Commons HttpClient.\nAfter Jena 3.1.0, Jena exposes the underlying HTTP Commons functionality to support a range of authentication mechanisms as well as other HTTP configuration. From Jena 3.0.0 through Jena 3.1.0 there is a Jena-specific framework that provides a uniform mechanism for HTTP authentication. This documentation is therefore divided into two sections. The first explains how to use HTTP Commons code, and the second explains the older Jena-specific functionality.\nHTTP Authentication from Jena 3.1.1 APIs that support authentication typically provide methods for providing an HttpClient for use with the given instance of that API class. Since it may not always be possible/practical to configure authenticators on a per-request basis the API includes a means to specify a default client that is used when no other client is explicitly specified. This may be configured via the setDefaultHttpClient(HttpClient httpClient) method of the HttpOp class. This allows for static-scoped configuration of HTTP behavior.\nExamples of authentication This section includes a series of examples showing how to use HTTP Commons classes to perform authenticated work.
Most of them take advantage of HttpOp.setDefaultHttpClient as described above.\nSimple authentication using username and password First we build an authenticating client:\nCredentialsProvider credsProvider = new BasicCredentialsProvider(); Credentials credentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); credsProvider.setCredentials(AuthScope.ANY, credentials); HttpClient httpclient = HttpClients.custom() .setDefaultCredentialsProvider(credsProvider) .build(); HttpOp.setDefaultHttpClient(httpclient); Notice that we gave no scope for use with the credentials (AuthScope.ANY). We can make further use of that parameter if we want to assign a scope for some credentials:\nCredentialsProvider credsProvider = new BasicCredentialsProvider(); Credentials unscopedCredentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); credsProvider.setCredentials(AuthScope.ANY, unscopedCredentials); Credentials scopedCredentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); final String host = \u0026quot;http://example.com/sparql\u0026quot;; final int port = 80; final String realm = \u0026quot;aRealm\u0026quot;; final String schemeName = \u0026quot;DIGEST\u0026quot;; AuthScope authscope = new AuthScope(host, port, realm, schemeName); credsProvider.setCredentials(authscope, scopedCredentials); HttpClient httpclient = HttpClients.custom() .setDefaultCredentialsProvider(credsProvider) .build(); HttpOp.setDefaultHttpClient(httpclient); Authenticating via a form For this case we introduce an HttpClientContext, which we can use to retrieve the cookie we get from logging into a form. We then use the cookie to authenticate elsewhere.\n// we'll use this context to maintain our HTTP \u0026quot;conversation\u0026quot; HttpClientContext httpContext = new HttpClientContext(); // first we use a method on HttpOp to log in and get our cookie Params params = new Params(); params.addParam(\u0026quot;username\u0026quot;, \u0026quot;Bob Wu\u0026quot;); params.addParam(\u0026quot;password\u0026quot;, \u0026quot;my password\u0026quot;); HttpOp.execHttpPostForm(\u0026quot;http://example.com/loginform\u0026quot;, params , null, null, null, httpContext); // now our cookie is stored in httpContext CookieStore cookieStore = httpContext.getCookieStore(); // lastly we build a client that uses that cookie HttpClient httpclient = HttpClients.custom() .setDefaultCookieStore(cookieStore) .build(); HttpOp.setDefaultHttpClient(httpclient); // alternatively we could use the context directly Query query = ... QueryEngineHTTP qEngine = QueryExecutionFactory.createServiceRequest(\u0026quot;http:example.com/someSPARQL\u0026quot;, query); qEngine.setHttpContext(httpContext); ResultSet results = qEngine.execSelect(); Using authentication functionality in direct query execution Jena offers support for directly creating SPARQL queries against remote services. To use QueryExecutionFactory in this case, select the methods (sparqlService, createServiceRequest) that offer an HttpClient parameter and use an authenticating client in that slot. In the case of QueryEngineHTTP, it is possible to use constructors that have a parameter slot for an HttpClient, but it is also possible post-construction to use setClient(HttpClient client) and setHttpContext(HttpClientContext context) (as shown above). 
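For example, a minimal sketch of the QueryExecutionFactory route (the endpoint URL http://example.com/sparql is illustrative, and httpclient is assumed to be an authenticating client built as in the earlier examples):

Query query = QueryFactory.create( "SELECT * WHERE { ?s ?p ?o } LIMIT 10" );
// Pass the authenticating client in the HttpClient slot of sparqlService
try ( QueryExecution qe = QueryExecutionFactory.sparqlService(
        "http://example.com/sparql", query, httpclient )) {
    ResultSet results = qe.execSelect();
    ResultSetFormatter.out( results );
}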
These techniques allow control over HTTP behavior when requests are made to remote services.\nHTTP Authentication from Jena 3.0.0 through 3.1.0 APIs that support authentication typically provide two methods for providing authenticators, a setAuthentication(String username, char[] password) method which merely configures a SimpleAuthenticator. There will also be a setAuthenticator(HttpAuthenticator authenticator) method that allows you to configure an arbitrary authenticator.\nAuthenticators applied this way will only be used for requests by that specific API. APIs that currently support this are as follows:\nQueryEngineHTTP - This is the QueryExecution implementation returned by QueryExecutionFactory.sparqlService() calls UpdateProcessRemoteBase - This is the base class of UpdateProcessor implementations returned by UpdateExecutionFactory.createRemote() and UpdateExecutionFactory.createRemoteForm() calls DatasetGraphAccessorHTTP - This is the DatasetGraphAccessor implementation underlying remote dataset accessors. From 2.11.0 onwards the relevant factory methods include overloads that allow providing a HttpAuthenticator at creation time which avoids the needs to cast and manually set the authenticator afterwards e.g.\nHttpAuthenticator authenticator = new SimpleAuthenticator(\u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); try(QueryExecution qe = QueryExecutionFactory.sparqlService(\u0026quot;http://example.org/sparql\u0026quot;, \u0026quot;SELECT * WHERE { ?s a ?type }\u0026quot;, authenticator)) { ... } Authenticators Authentication mechanisms are provided by HttpAuthenticator implementations of which a number are provided built into ARQ.\nThis API provides the authenticator with access to the HttpClient, HttpContext and target URI of the request that is about to be carried out. This allows for authenticators to add credentials to requests on a per-request basis and/or to use different mechanisms and credentials for different services.\nSimpleAuthenticator The simple authenticator is as the name suggests the simplest implementation. It takes a single set of credentials which is applied to any service.\nAuthentication however is not preemptive so unless the remote service sends a HTTP challenge (401 Unauthorized or 407 Proxy Authorization Required) then credentials will not actually be submitted.\nScopedAuthenticator The scoped authenticator is an authenticator which maps credentials to different service URIs. This allows you to specify different credentials for different services as appropriate. Similarly to the simple authenticator this is not preemptive authentication so credentials are not sent unless the service requests them.\nScoping of credentials is not based on exact mapping of the request URI to credentials but rather on a longest match approach. For example if you define credentials for http://example.org then these are used for any request that requires authentication under that URI e.g. http://example.org/some/path. 
However, if you had also defined credentials for http://example.org/some/path then these would be used in favor of those for http://example.org\nServiceAuthenticator The service authenticator is an authenticator which uses information encoded in the ARQ context and basically provides access to the existing credential provision mechanisms provided for the SERVICE clause, see Basic Federated Query for more information on configuration for this.\nFormsAuthenticator The forms authenticator is an authenticator usable with services that require form based logins and use session cookies to verify login state. This is intended for use with services that don\u0026rsquo;t support HTTP\u0026rsquo;s built-in authentication mechanisms for whatever reason. One example of this are servers secured using Apache HTTP Server mod_auth_form.\nThis is one of the more complex authenticators to configure because it requires you to know certain details of the form login mechanism of the service you are authenticating against. In the simplest case where a site is using Apache mod_auth_form in its default configuration you merely need to know the URL to which login requests should be POSTed and your credentials. Therefore you can do the following to configure an authenticator:\nURI targetService = new URI(\u0026quot;http://example.org/sparql\u0026quot;); FormLogin formLogin = new ApacheModAuthFormLogin(\u0026quot;http://example.org/login\u0026quot;, \u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); FormsAuthenticator authenticator = new FormsAuthenticator(targetService, formLogin); In the above example the service we want to authenticate against is http://example.org/sparql and it requires us to first login by POSTing our credentials to http://example.org/login.\nHowever if the service is using a more complicated forms login setup you will additionally need to know what the names of the form fields used to submit the username and password. For example say we were authenticating to a service where the form fields were called id and pwd we\u0026rsquo;d need to configure our authenticator as follows:\nURI targetService = new URI(\u0026quot;http://example.org/sparql\u0026quot;); FormLogin formLogin = new ApacheModAuthFormLogin(\u0026quot;http://example.org/login\u0026quot;, \u0026quot;id\u0026quot;, \u0026quot;pwd\u0026quot;, \u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); FormsAuthenticator authenticator = new FormsAuthenticator(targetService, formLogin); Note that you can also create a forms authenticator that uses different login forms for different services by creating a Map\u0026lt;URI, FormLogin\u0026gt; that maps each service to an associated form login and passing that to the FormsAuthenticator constructor.\nCurrently forms based login that require more than just a username and password are not supported.\nPreemptiveBasicAuthenticator This authenticator is a decorator over another authenticator that enables preemptive basic authentication, this only works for servers that support basic authentication and so will cause authentication failures when any other authentication scheme is required. You should only use this when you know the remote server uses basic authentication.\nPreemptive authentication is not enabled by default for two reasons:\nIt reduces security as it can result in sending credentials to servers that don\u0026rsquo;t actually require them. It only works for basic authentication and not for other HTTP authentication mechanisms e.g. 
digest authentication The 2nd point is important to emphasise, this only works for servers using Basic authentication.\nAlso be aware that basic authentication is very insecure since it sends credentials over the wire with only obfuscation for protection. Therefore many servers will use more secure schemes like Digest authentication which cannot be done preemptively as they require more complex challenge response sequences.\nDelegatingAuthenticator The delegating authenticator allows for mapping different authenticators to different services, this is useful when you need to mix and match the types of authentication needed.\nThe Default Authenticator Since it may not always be possible/practical to configure authenticators on a per-request basis the API includes a means to specify a default authenticator that is used when no authenticator is explicitly specified. This may be configured via the setDefaultAuthenticator(HttpAuthenticator authenticator) method of the HttpOp class.\nBy default there is already a default authenticator configured which is the ServiceAuthenticator since this preserves behavioural backwards compatibility with prior versions of ARQ.\nYou can configure the default authenticator to whatever you need so even if you don\u0026rsquo;t directly control the code that is making HTTP requests provided that it is using ARQs APIs to make these then authentication will still be applied.\nNote that the default authenticator may be disabled by setting it to null.\nOther concerns Debugging Authentication ARQ uses Apache Http Client for all its HTTP operations and this provides detailed logging information that can be used for debugging. To see this information you need to configure your logging framework to set the org.apache.http package to either DEBUG or TRACE level.\nThe DEBUG level will give you general diagnostic information about requests and responses while the TRACE level will give you detailed HTTP traces i.e. allow you to see the exact HTTP requests and responses which can be extremely useful for debugging authentication problems.\nAuthenticating to a SPARQL federated service ARQ allows the user to configure HTTP behavior to use on a per-SERVICE basis, including authentication behavior such as is described above. This works via the ARQ context. See Basic Federated Query for more information on configuring this functionality.\n","permalink":"https://jena.apache.org/documentation/archive/versions/http-auth-old.html","tags":null,"title":"HTTP Authentication in ARQ (Superseded)"},{"categories":null,"contents":"The in-memory, transactional dataset provides a dataset with full ACID transaction semantics, including abort. It provides for multiple readers and a concurrent writer together with full snapshot isolation of the dataset. Readers see an unchanging, consistent dataset where aggregate operations return stable results.\nAPI use A new instance of the class is obtained by a call to DatasetFactory.createTxnMem():\nDataset ds = DatasetFactory.createTxnMem() ; This can then be used by the application for reading:\nDataset ds = DatasetFactory.createTxnMem() ; ds.begin(ReadWrite.READ) ; try { ... SPARQL query ... } finally { ds.end() ; } or writing:\nDataset ds = DatasetFactory.createTxnMem() ; ds.begin(ReadWrite.WRITE) ; try { ... SPARQL update ... ... SPARQL query ... ... SPARQL update ... ds.commit() ; } finally { ds.end() ; } If the application does not call commit(), the transaction aborts and the changes are lost. 
The same happens if the application throws an exception.\nNon-transactional use. If used outside of a transaction, the implementation provides \u0026ldquo;auto-commit\u0026rdquo; functionality. Each triple added or deleted is handled inside an implicit transaction. This has a measurable performance impact. It is better to do related operations inside a single transaction explicitly in the application code.\nAssembler Use The assembler provides for the creation of a dataset and also for loading it with data read from URLs (files or any other URL).\nType: ja:MemoryDataset Properties: ja:data urlForData ja:namedGraph, for loading a specific graph of the dataset. This uses ja:graphName to specify the name and ja:data to load data. The examples use the following prefixes:\nPREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; To create an empty in-memory dataset, all that is required is the line:\n[] rdf:type ja:MemoryDataset . With triples for the default graph, from file dataFile.ttl, Turtle format:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:dataFile.ttl\u0026gt; . With triples from several files:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data1.ttl\u0026gt; ; ja:data \u0026lt;file:data2.nt\u0026gt; ; ja:data \u0026lt;file:data3.jsonld\u0026gt; ; . Load TriG:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data.trig\u0026gt; . Load a file of triples into a named graph:\n[] rdf:type ja:MemoryDataset ; ja:namedGraph [ ja:graphName \u0026lt;http://example/graph\u0026gt; ; ja:data \u0026lt;file:///fullPath/data.ttl\u0026gt; ] . ","permalink":"https://jena.apache.org/documentation/rdf/datasets.html","tags":null,"title":"In-memory, transactional Dataset"},{"categories":null,"contents":"This document describes Jena\u0026rsquo;s built-in assembler classes and how to write and integrate your own assemblers. If you just need a quick guide to the common model specifications, see the assembler quickstart; if you want more details on writing assembler descriptions, see the assembler howto.\nThe Assembler interface An Assembler is an object that builds objects (most importantly, Models) from RDF descriptions.\npublic Object open( Assembler a, Resource root, Mode mode ); public Object open( Assembler a, Resource root ); public Object open( Resource root ); public Model openModel( Resource root ); public Model openModel( Resource root, Mode mode ); The fundamental method is the first: all the others are shorthands for ways of calling it. The abstract class AssemblerBase implements Assembler leaving only that method abstract and defining the others in terms of it.\nThe definition of a.open(Assembler sub, Resource root, Mode mode) is that a will construct the object described by the properties of root. If this requires the construction of sub-objects from descriptions hanging off root, sub.open is to be used to construct those. If the object is to be constructed in some persistent store, mode defines whether objects can be re-used or created: see modes for more details.\nBuiltin assemblers Jena comes with a collection of built-in assemblers: various basic assemblers and a composite general assembler. Each of these assemblers has a constant instance declared as a field of Assembler.\nAssembler Result class Type constant Temporarily omitted as the source got scrambled by the Markdown import TODO Inside Assemblers Assembler.general is a particular implementation of the Assembler interface.
An Assembler knows how to build the objects - not just models - described by an Assembler specification. The normal route into an Assembler is through the method:\nopen( Resource root ) -\u0026gt; Object The Assembler inspects the root resource properties and decides whether it can build an object with that description. If not, it throws an exception. Otherwise, it constructs and returns a suitable object. Since the creation of Models is the reason for the existence of Assemblers, there is a convenience wrapper method:\nopenModel( Resource root ) -\u0026gt; Model which constructs the object and checks that it\u0026rsquo;s a Model before returning it. When an Assembler requires sub-objects (for example, when an InfModel Assembler requires a Reasoner object), it uses the method:\nopen( Assembler sub, Resource root ) -\u0026gt; Model passing in a suitable Assembler object. In fact the standard implementation of open(root) is just\nopen( this, root ) passing in itself as the sub-assembler and having open(Assembler,Resource) be the place where all the work is done. (Amongst other things, this makes testing easier.) When working with named persistent objects (typically database models), sometimes you need to control whether new objects should be constructed or old models can be reused. There is an additional method\nopen( Assembler sub, Resource root, Mode mode ) where the Mode argument controls the creation (or not) of persistent models. The mode is passed down to all sub-object creation. The standard implementation of open(sub,root) is just:\nopen( sub, root, Mode.DEFAULT ) A Mode object has two methods:\npermitCreateNew( Resource root, String name ) permitUseExisting( Resource root, String name ) root is the root resource describing the object to be created or reused, and name is the name given to it. The result is true iff the permission is granted. Mode.DEFAULT permits the reuse of existing objects and denies the creation of new ones. There are four Mode constants:\nMode.DEFAULT - reuse existing objects\nMode.CREATE - create missing objects\nMode.REUSE - reuse existing objects\nMode.ANY - reuse existing objects, create missing ones\nSince the Mode methods are passed the resource root and name, the user can write specialised Modes that look at the name or the other root properties to make their decision. Note that the Modes only apply to persistent objects, so eg MemoryModels or PrefixMappings ignore their Mode arguments.\nImplementing your own assemblers (Temporary documentation pasted in from email; will be integrated and made nice RSN.)\nYou have to implement the Assembler interface, most straightforwardly done by subclassing AssemblerBase and overriding public Object open( Assembler a, Resource root, Mode mode ); because AssemblerBase both implements the boring methods that are just specialisations of `open` and provides some utility methods such as getting the values of unique properties. The arguments are:\n* a -- the assembler to use for any sub-assemblies\n* root -- the resource in the assembler description for this object\n* mode -- the persistent open vs create mode\nThe pattern is to look for the known properties of the root, use those to define any sub-objects of the object you're assembling (including using `a` for anything that's itself a structured object) and then construct a new result object from those components. Then you attach this new assembler object to its type in some AssemblerGroup using that group's `implementWith` method.
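A minimal sketch of that pattern follows (all names are illustrative: MY.MyThing, MY.fileName and the MyThing result class are hypothetical vocabulary and application classes, and AssemblerGroup.create() is assumed as the way to make a fresh group):

public class MyThingAssembler extends AssemblerBase {
    @Override public Object open( Assembler a, Resource root, Mode mode ) {
        // Read a literal-valued property of the root resource
        // (plain Model API; AssemblerBase also offers helpers for unique properties).
        String fileName = root.getRequiredProperty( MY.fileName ).getString();
        // Any structured sub-objects would be built with the passed-in assembler `a`.
        return new MyThing( fileName );
    }
}

// Attach the new assembler to its RDF type in a group:
AssemblerGroup group = AssemblerGroup.create();
group.implementWith( MY.MyThing, new MyThingAssembler() );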
You can attach it to the handy-but-public-and-shared group `Assembler.general` or you can construct your own group. The point about an AssemblerGroup is that it does the type-to-assembler mapping for you -- and when an AssemblerGroup calls a component assembler's `open` method, it passes /itself/ in as the `a` argument, so that the invoked assembler has access to all of the component assemblers of the Group. basic assemblers There is a family of basic assemblers, each of which knows how to assemble a specific kind of object so long as they\u0026rsquo;re given an Assembler that can construct their sub-objects. There are defined constants in Assembler for (an instance of) each of these basic assembler classes.\nproduces | Class | Type | constant\ndefault models | DefaultModelAssembler | ja:DefaultModel | defaultModel\nmemory models | MemoryModelAssembler | ja:MemoryModel | memoryModel\ninference models | InfModelAssembler | ja:InfModel | infModel\nreasoners | ReasonerAssembler | ja:Reasoner | reasoner\ncontent | ContentAssembler | ja:Content | content\nontology models | OntModelAssembler | ja:OntModel | ontModel\nrules | RuleSetAssembler | ja:RuleSet | rules\nunion models | UnionModelAssembler | ja:UnionModel | unionModel\nprefix mappings | PrefixMappingAssembler | ja:PrefixMapping | prefixMapping\nfile models | FileModelAssembler | ja:FileModel | fileModel\nAssembler.general is an assembler group, which ties together those basic assemblers. general can be extended by Jena coders if required. Jena components that use Assembler specifications to construct objects will use general unless documented otherwise.\nIn the remaining sections we will discuss the Assembler classes that return non-Model objects and conclude with a description of AssemblerGroup.\nBasic assembler ContentAssembler The ContentAssembler constructs Content objects (using the ja:Content vocabulary) used to supply content to models. A Content object has the method:\nfill( Model m ) -\u0026gt; m Invoking the fill method adds the represented content to the model. The supplied ModelAssemblers automatically apply the Content objects corresponding to ja:content property values.\nBasic assembler RulesetAssembler A RulesetAssembler generates lists of Jena rules.\nBasic assembler DefaultModelAssembler A \u0026ldquo;default model\u0026rdquo; is a model of unspecified type which is implemented as whatever kind the assembler for ja:DefaultModel generates. The default for a DefaultModel is to create a MemoryModel with no special properties.\nAssemblerGroup The AssemblerGroup class allows a bunch of other Assemblers to be bundled together and selected by RDF type. AssemblerGroup implements Assembler and adds the methods:\nimplementWith( Resource type, Assembler a ) -\u0026gt; this assemblerFor( Resource type ) -\u0026gt; Assembler AssemblerGroup\u0026rsquo;s implementation of open(sub,root) finds the most specific type of root that is a subclass of ja:Object and looks for the Assembler that has been associated with that type by a call of implementWith. It then delegates construction to that Assembler, passing itself as the sub-assembler. Hence each component Assembler only needs to know how to assemble its own particular objects.\nThe assemblerFor method returns the assembler associated with the argument type by a previous call of implementWith, or null if there is no associated assembler.\nLoading assembler classes AssemblerGroups implement the ja:assembler functionality. The object of an (type ja:assembler \u0026quot;ClassName\u0026quot;) statement is a string which is taken as the name of an Assembler implementation to load.
An instance of that class is associated with type using implementWith.\nIf the class has a constructor that takes a single Resource object, that constructor is used to initialise the class, passing in the type subject of the triple. Otherwise the no-argument constructor of the class is used.\n","permalink":"https://jena.apache.org/documentation/assembler/inside-assemblers.html","tags":null,"title":"Inside assemblers"},{"categories":null,"contents":"There\u0026rsquo;s quite a lot of code inside Jena, and it can be daunting for new Jena users to find their way around. On this page we\u0026rsquo;ll summarise the key features and interfaces in Jena, as a general overview and guide to the more detailed documentation.\nAt its core, Jena stores information as RDF triples in directed graphs, and allows your code to add, remove, manipulate, store and publish that information. We tend to think of Jena as a number of major subsystems with clearly defined interfaces between them. First let\u0026rsquo;s start with the big picture:\nRDF triples and graphs, and their various components, are accessed through Jena\u0026rsquo;s RDF API. Typical abstractions here are Resource representing an RDF resource (whether named with a URI or anonymous), Literal for data values (numbers, strings, dates, etc.), Statement representing an RDF triple and Model representing the whole graph. The RDF API has basic facilities for adding and removing triples to graphs and finding triples that match particular patterns. Here you can also read in RDF from external sources, whether files or URL\u0026rsquo;s, and serialize a graph in correctly-formatted text form. Both input and output support most of the commonly-used RDF syntaxes.\nWhile the programming interface to Model is quite rich, internally, the RDF graph is stored in a much simpler abstraction named Graph. This allows Jena to use a variety of different storage strategies equivalently, as long as they conform to the Graph interface. Out-of-the box, Jena can store a graph as an in-memory store, or as a persistent store using a custom disk-based tuple index. The graph interface is also a convenient extension point for connecting other stores to Jena, such as LDAP, by writing an adapter that allows the calls from the Graph API to work on that store.\nA key feature of semantic web applications is that the semantic rules of RDF, RDFS and OWL can be used to infer information that is not explicitly stated in the graph. For example, if class C is a sub-class of class B, and B a sub-class of A, then by implication C is a sub-class of A. Jena\u0026rsquo;s inference API provides the means to make these entailed triples appear in the store just as if they had been added explicitly. The inference API provides a number of rule engines to perform this job, either using the built-in rulesets for OWL and RDFS, or using application custom rules. Alternatively, the inference API can be connected up to an external reasoner, such as description logic (DL) engine, to perform the same job with different, specialised, reasoning algorithms.\nThe collection of standards that define semantic web technologies includes SPARQL - the query language for RDF. Jena conforms to all of the published standards, and tracks the revisions and updates in the under-development areas of the standard. Handling SPARQL, both for query and update, is the responsibility of the SPARQL API.\nOntologies are also key to many semantic web applications. 
Ontologies are formal logical descriptions, or models, of some aspect of the real-world that applications have to deal with. Ontologies can be shared with other developers and researchers, making it a good basis for building linked-data applications. There are two ontology languages for RDF: RDFS, which is rather weak, and OWL which is much more expressive. Both languages are supported in Jena though the Ontology API, which provides convenience methods that know about the richer representation forms available to applications through OWL and RDFS.\nWhile the above capabilities are typically accessed by applications directly through the Java API, publishing data over the Internet is a common requirement in modern applications. Fuseki is a data publishing server, which can present, and update, RDF models over the web using SPARQL and HTTP.\nThere are many other pieces to Jena, including command-line tools, specialised indexes for text-based lookup, etc. These, and further details on the pieces outlined above, can be found in the detailed documentation on this site.\n","permalink":"https://jena.apache.org/about_jena/architecture.html","tags":null,"title":"Jena architecture overview"},{"categories":null,"contents":"Introduction This document describes the vocabulary and effect of the built-in Jena assembler descriptions for constructing models (and other things). A companion document describes the built-in assembler classes and how to write and integrate your own assemblers. If you just need a quick guide to the common model specifications, see the assembler quickstart.\nThis document describes how to use the Assembler classes to construct models \u0026ndash; and other things \u0026ndash; from RDF descriptions that use the Jena Assembler vocabulary. That vocabulary is available in assembler.ttl as an RDFS schema with conventional prefix ja for the URI http://jena.hpl.hp.com/2005/11/Assembler#; the class JA is its Java rendition.\nThe examples used in this document are extracted from the examples file examples.ttl. The pieces of RDF/OWL schema are extracted from the ja-vocabulary file.\nThe property names selected are those which are the \u0026ldquo;declared properties\u0026rdquo; (as per Jena\u0026rsquo;s listDeclaredProperties method) of the class. Only the most specialised super-classes and range classes are shown, so (for example) rdf:Resource typically won\u0026rsquo;t appear.\nOverview An Assembler specification is a Resource in some RDF Model. The properties of that Resource describe what kind of object is to be assembled and what its components are: for example, an InfModel is constructed by specifying a base model and a reasoner. The specifications for the components are themselves Assembler specifications given by other Resources in the same Model.For example, to specify a memory model with data loaded from a file:\neg:model a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/example.n3\u0026gt;] . The rdf:type of eg:model specifies that the constructed Model is to be a Jena memory-based model. The ja:content property specifies that the model is to be loaded with the content of the resource file:Data/example.n3. 
The content handler guesses from the \u0026ldquo;.n3\u0026rdquo; suffix that this file is to be read using the Jena N3 reader.\nUnless otherwise specified by an application, Assembler specifications are interpreted after completion by\nincluding the JA schema, including (recursively) the objects of any owl:imports and ja:imports statements, and doing (limited) RDFS inference. (The supplied model is not modified.) In the example above, eg:model has to be given an explicit type, but the ja:externalContent bnode is implicitly typed by the domain of ja:externalContent. In this document, we will usually leave out inferrable types.\nWe can construct our example model from the specification like this (you may need to tweak the filename to make this work in your environment):\nModel spec = RDFDataMgr.loadModel( \u0026#34;examples.ttl\u0026#34; ); Resource root = spec.createResource( spec.expandPrefix( \u0026#34;eg:opening-example\u0026#34; ) ); Model m = Assembler.general.openModel( root ); The model is constructed from the \u0026ldquo;root resource\u0026rdquo;, eg:opening-example in our example. general knows how to create all the kinds of objects - not just Models - that we describe in the next sections.\nSpecifications common to all models Assembler specifications can describe many kinds of models: memory, inference, ontology, and file-backed. All of these model specifications share a set of base properties for attaching content and prefix mappings.\nja:Loadable a rdfs:Class ; rdfs:subClassOf ja:Object . ja:initialContent a rdf:Property ; rdfs:domain ja:Loadable rdfs:range ja:Content . ja:content a rdf:Property ; rdfs:domain ja:Loadable ; rdfs:range ja:Content . ja:Model a rdfs:Class ; rdfs:subClassOf ja:ContentItem ; rdfs:subClassOf ja:Loadable . ja:prefixMapping a rdf:Property ; rdfs:domain ja:Model ; rdfs:range ja:PrefixMapping . All of a model\u0026rsquo;s ja:content property values are interpreted as specifying Content objects and a single composite Content object is constructed and used to initialise the model. See Content for the description of Content specifications. For example:\neg:sharedContent ja:externalContent \u0026lt;http://somewhere/RDF/ont.owl\u0026gt; . eg:common-example a ja:MemoryModel ; ja:content eg:sharedContent ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/A.rdf\u0026gt;] ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/B.rdf\u0026gt;] . The model constructed for eg:A will be loaded with the contents of Data/A.n3, Data/B.rdf, and http://somewhere/RDF/ont.owl. If the model supports transactions, then the content is loaded inside a transaction; if the load fails, the transaction is aborted, and a TransactionAbortedException thrown. If the content has any prefix mappings, then they are also added to the model.\nAll of a model\u0026rsquo;s ja:prefixMapping, ja:prefix, and ja:namespace properties are interpreted as specifying a PrefixMapping object and a single composite PrefixMapping is constructed and used to set the prefixes of the model. See PrefixMapping for the description of Content specifications.\nContent specification A Content specification describes content that can be used to fill models. Content can be external (files and URLs) or literal (strings in the specification) or quotations (referring to RDF which is part of the specification).\nja:Content a rdfs:Class ; rdfs:subClassOf ja:HasFileManager . ja:HasFileManager a rdfs:Class ; rdfs:subClassOf ja:Object . 
ja:fileManager a rdf:Property ; rdfs:domain ja:HasFileManager ; rdfs:range ja:FileManager . A ja:Content specification may have zero or more ja:externalContent property values. These are URI resources naming an external (file or http etc) RDF object. The constructed Content object contains the union of the values of all such resources. For example:\neg:external-content-example ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/C.owl\u0026gt;, \u0026lt;http://jena.hpl.hp.com/some-jena-data.rdf\u0026gt; . The external content is located using a FileManager. If the Content resource has a ja:fileManager property, then the FileManager described by that resource is used. Otherwise, if the ContentAssembler assembling this specification was constructed with a FileManager argument, that FileManager is used. Otherwise, the default FileManager, FileManager.get(), is used.\nThe string literal value of the any ja:literalContent properties is interpreted as RDF in an appropriate language. The constructed Content object contains that RDF. The language is either specified by an explicit ja:contentEncoding property value, or guessed from the content of the string. The only encodings permitted are \u0026ldquo;N3\u0026rdquo; and \u0026ldquo;RDF/XML\u0026rdquo;. For example:\neg:literal-content-example ja:literalContent \u0026#34;_:it dc:title \u0026#39;Interesting Times\u0026#39;\u0026#34; . The literal content is wrapped so that prefix declarations for rdf, rdfs, owl, dc, and xsd apply before interpretation.\nThe property values of any ja:quotedContent properties should be resources. The subgraphs rooted at those resources (using the algorithm from ResourceUtils.reachableClosure()) are added to the content.\nInference models and reasoners Inference models are specified by supplying a description of the reasoner that is used by the model and (optionally) a base model to reason over. For example:\neg:inference-example ja:baseModel [a ja:MemoryModel] ; ja:reasoner [ja:reasonerURL \u0026lt;http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner\u0026gt;] . describes an inference model that uses RDFS reasoning. The reasonerURL property value is the URI used to identify the reasoner (it is the value of the Jena constant RDFSRuleReasonerFactory.URI). The base model is specified as a memory model; if it is left out, an empty memory model is used.\neg:db-inference-example ja:baseModel eg:model-example ; ja:reasoner [ja:reasonerURL \u0026lt;http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner\u0026gt;] . The same reasoner as used as in the previous example, but now the base model is a specific model description in the same way as our earlier example.\nBecause Jena\u0026rsquo;s access to external reasoners goes through the same API as for its internal reasoners, you can access a DIG reasoner (such as Pellet running as a server) using an Assembler specification:\neg:external-inference-example ja:reasoner [\u0026lt;http://jena.hpl.hp.com/2003/JenaReasoner#extReasonerURL\u0026gt; \u0026lt;http://localhost:2004/\u0026gt; ; ja:reasonerURL \u0026lt;http://jena.hpl.hp.com/2003/DIGReasoner\u0026gt;] . 
If there\u0026rsquo;s a DIG server running locally on port 2004, this specification will create a DIG inference model that uses it.\nThe internal rule reasoner can be supplied with rules written inside the specification, or outside from some resource (file or http: URL): eg:rule-inference-example ja:reasoner [ja:rule \u0026ldquo;[r1: (?x my:P ?y) -\u0026gt; (?x rdf:type my:T)]\u0026rdquo;] .\nThis reasoner will infer a type declaration from a use of a property. (The prefix my will have to be known to the rule parser, of course.)\nja:InfModel a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:reasoner; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:baseModel; owl:maxCardinality 1] ; rdfs:subClassOf ja:Model . ja:reasoner a rdf:Property ; rdfs:domain ja:InfModel ; rdfs:range ja:ReasonerFactory . ja:baseModel a rdf:Property ; rdfs:domain ja:InfModel ; rdfs:range ja:Model . ja:HasRules a rdfs:Class ; rdfs:subClassOf ja:Object . ja:rule a rdf:Property ; rdfs:domain ja:HasRules . ja:rulesFrom a rdf:Property ; rdfs:domain ja:HasRules . ja:rules a rdf:Property ; rdfs:domain ja:HasRules ; rdfs:range ja:RuleSet . An InfModel\u0026rsquo;s ja:baseModel property value specifies the base model for the inference model; if omitted, an empty memory model is used.\nAn InfModel\u0026rsquo;s ja:ReasonerFactory property value specifies the Reasoner for this inference model; if omitted, a GenericRuleReasoner is used.\nA Reasoner\u0026rsquo;s optional ja:schema property specifies a Model which contains the schema for the reasoner to be bound to. If omitted, no schema is used.\nIf the Reasoner is a GenericRuleReasoner, it may have any of the RuleSet properties ja:rules, ja:rulesFrom, or ja:rule. The rules of the implied RuleSet are added to the Reasoner.\nReasonerFactory A ReasonerFactory can be specified by URL or by class name (but not both).\nja:ReasonerFactory a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:ReasonerURL; owl:maxCardinality 1] ; rdfs:subClassOf ja:HasRules . ja:reasonerClass a rdf:Property ; rdfs:domain ja:ReasonerFactory . ja:reasonerURL a rdf:Property ; rdfs:domain ja:ReasonerFactory . ja:schema a rdf:Property ; rdfs:domain ja:ReasonerFactory ; rdfs:range ja:Model . If the optional unique property ja:reasonerURL is specified, then its resource value is the URI of a reasoner in the Jena reasoner registry; the reasoner is the one with the given URI.\nIf the optional property ja:schema is specified, then the models specified by all the schema properties are unioned and any reasoner produced by the factory will have that union bound in as its schema (using the Reasoner::bindSchema() method).\nIf the optional unique property ja:reasonerClass is specified, its value names a class which implements ReasonerFactory. That class is loaded and an instance of it used as the factory.\nThe class may be named by the lexical form of a literal, or by a URI with the (fake) \u0026ldquo;java:\u0026rdquo; scheme.\nIf the class has a method theInstance, that method is called to supply the ReasonerFactory instance to use. Otherwise, a new instance of that class is constructed. Jena\u0026rsquo;s reasoner factories come equipped with this method; for other factories, see the documentation.\nRulesets A RuleSet specification allows rules (for ReasonerFactories) to be specified inline, elsewhere in the specification model, or in an external resource.\nja:RuleSet a rdfs:Class ; rdfs:subClassOf ja:HasRules . 
The optional repeatable property ja:rule has as its value a literal string which is the text of a Jena rule or rules. All those rules are added to the RuleSet.\nThe optional repeatable property ja:rulesFrom has as its value a resource whose URI identifies a file or other external entity that can be loaded as Jena rules. All those rules are added to the RuleSet.\nThe optional repeatable property ja:rules has as its value a resource which identifies another RuleSet in the specification model. All those rules from that RuleSet are added to this RuleSet.\nOntology models Ontology models can be specified in several ways. The simplest is to use the name of an OntModelSpec from the Java OntModelSpec class:\neg:simple-ont-example ja:ontModelSpec ja:OWL_DL_MEM_RULE_INF . This constructs an OntModel with an empty base model and using the OWL_DL language and the full rule reasoner. All of the OntModelSpec constants in the Jena implementation are available in this way. A base model can be specified:\neg:base-ont-example ja:baseModel [a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;http://jena.hpl.hp.com/some-jena-data.rdf\u0026gt;]] . The OntModel has a base which is a memory model loaded with the contents of http://jena.hpl.hp.com/some-jena-data.rdf. Since the ontModelSpec was omitted, it defaults to OWL_MEM_RDFS_INF - the same default as ModelFactory.createOntologyModel().\nja:OntModel a rdfs:Class ; rdfs:subClassOf ja:UnionModel ; rdfs:subClassOf ja:InfModel . ja:ontModelSpec a rdf:Property ; rdfs:domain ja:OntModel ; rdfs:range ja:OntModelSpec . ja:OntModelSpec a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:like; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:reasonerFactory; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:importSource; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:documentManager; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:ontLanguage; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . ja:importSource a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:reasonerFactory a rdf:Property ; rdfs:domain ja:OntModelSpec ; rdfs:range ja:ReasonerFactory . ja:documentManager a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:ontLanguage a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:likeBuiltinSpec a rdf:Property ; rdfs:domain ja:OntModelSpec . OntModel is a subclass of InfModel, and the ja:baseModel property means the same thing.\nThe OntModelSpec property value is a resource, interpreted as an OntModelSpec description based on its name and the value of the appropriate properties:\nja:likeBuiltinSpec: The value of this optional unique property must be a JA resource whose local name is the same as the name of an OntModelSpec constant (as in the simple case above). This is the basis for the OntModelSpec constructed from this specification. If absent, then OWL_MEM_RDFS_INF is used. To build an OntModelSpec with no inference, use eg ja:likeBuiltinSpec ja:OWL_MEM. ja:importSource: The value of this optional unique property is a ModelSource description which describes where imports are obtained from. A ModelSource is usually of class ja:ModelSource. ja:documentManager: The value of this optional unique property is a DocumentManager specification. If absent, the default document manager is used. ja:reasonerFactory: The value of this optional unique property is the ReasonerFactory resource which will be used to construct this OntModelSpec\u0026rsquo;s reasoner. 
A reasonerFactory specification is the same as an InfModel\u0026rsquo;s reasoner specification (the different properties are required for technical reasons). ja:reasonerURL: as a special case of reasonerFactory, a reasoner may be specified by giving its URL as the object of the optional unique reasonerURL property. It is not permitted to supply both reasonerURL and reasonerFactory properties. ja:ontLanguage: The value of this optional unique property is one of the values in the ProfileRegistry class which identifies the ontology language of this OntModelSpec: OWL: http://www.w3.org/2002/07/owl# OWL DL: http://www.w3.org/TR/owl-features/#term_OWLDL OWL Lite: http://www.w3.org/TR/owl-features/#term_OWLLite RDFS: http://www.w3.org/2000/01/rdf-schema# Any unspecified properties have default values, normally taken from those of OntModelSpec.OWL_MEM_RDFS_INF. However, if the OntModelSpec resource is in the JA namespace, and its local name is the same as that of an OntModelSpec constant, then that constant is used as the default value.\nDocument managers An OntDocumentManager can be specified by a ja:DocumentManager specification which describes the OntDocumentManager\u0026rsquo;s file manager and policy settings.\neg:mapper lm:mapping [lm:altName \u0026#34;file:etc/foo.n3\u0026#34; ; lm:name \u0026#34;file:foo.n3\u0026#34;] . eg:document-manager-example ja:fileManager [ja:locationMapper eg:mapper] ; ja:meta [ dm:altURL \u0026lt;http://localhost/RDF/my-alt.rdf\u0026gt;] . In this example, eg:document-manager-example is a ja:DocumentManager specification. It has its own FileManager specification, the object of the ja:fileManager property; that FileManager has a location mapper, eg:mapper, that maps a single filename.\nThe document manager also has an additional property to link it to document manager meta-data: the sub-model of the assembler specification reachable from eg:document-manager-example is passed to the document manager when it is created. For the meanings of the dm: properties, see the Jena ontology documentation and the ontology.rdf ontology.\nja:DocumentManager a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:policyPath; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:fileManager; owl:maxCardinality 1] ; rdfs:subClassOf ja:HasFileManager . ja:policyPath a rdf:Property ; rdfs:domain ja:DocumentManager . The ja:fileManager property value, if present, has as its object a ja:FileManager specification; the constructed document manager is given a new file manager constructed from that specification. If there is no ja:fileManager property, then the default FileManager is used.\nThe ja:policyPath property value, if present, should be a string which is a path to policy files as described in the Jena ontology documentation. If absent, the usual default path is applied.\nIf the sub-model of the assembler specification reachable from the DocumentManager resource contains any OntDocumentManager DOC_MGR_POLICY or ONTOLOGY_SPEC objects, they will be interpreted by the constructed document manager object.\nja:FileManager a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:locationMapper; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . ja:locationMapper a rdf:Property ; rdfs:domain ja:FileManager ; rdfs:range ja:LocationMapper . 
A ja:FileManager object may have a ja:locationMapper property value which identifies the specification of a LocationMapper object initialising that file manager.\nja:LocationMapper a rdfs:Class ; rdfs:subClassOf [owl:onProperty lm:mapping; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . lm:mapping a rdf:Property ; rdfs:domain ja:LocationMapper . A ja:LocationMapper object may have lm:mapping property values, describing the location mapping, as described in the FileManager documentation. (Note that the vocabulary for those items is in a different namespace than the JA properties and classes.)\nUnion models Union models can be constructed from any number of sub-models and a single root model. The root model is the one written to when the union model is updated; the sub-models are untouched.\nja:UnionModel a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:rootModel; owl:maxCardinality 1] ; rdfs:subClassOf ja:Model . ja:rootModel a rdf:Property ; rdfs:domain ja:UnionModel ; rdfs:range ja:Model . ja:subModel a rdf:Property ; rdfs:domain ja:UnionModel ; rdfs:range ja:Model . If the single ja:rootModel property is present, its value describes a model to use as the root model of the union. All updates to the union are directed to this root model. If no root model is supplied, the union is given an immutable, empty model as its root.\nAny ja:subModel property values have objects describing the remaining sub-models of the union. The order of the sub-models in the union is undefined (which is why there\u0026rsquo;s a special rootModel property).\nPrefix mappings The PrefixMappings of a model may be set from PrefixMapping specifications.\nja:PrefixMapping a rdfs:Class ; rdfs:subClassOf ja:Object . ja:includes a rdf:Property ; rdfs:domain ja:PrefixMapping ; rdfs:range ja:PrefixMapping . ja:SinglePrefixMapping a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:namespace; owl:cardinality 1] ; rdfs:subClassOf [owl:onProperty ja:prefix; owl:cardinality 1] ; rdfs:subClassOf ja:PrefixMapping . ja:namespace a rdf:Property ; rdfs:domain ja:SinglePrefixMapping . ja:prefix a rdf:Property ; rdfs:domain ja:SinglePrefixMapping . The ja:includes property allows a PrefixMapping to include the content of other specified PrefixMappings.\nThe ja:prefix and ja:namespace properties allow the construction of a single element of a prefix mapping by specifying the prefix and namespace of the mapping.\nOther Assembler directives There are two more Assembler directives that can be used in an Assembler specification: the assembler and imports directives.\nAssembler A specification may contain statements of the form:\nsomeResource ja:assembler \u0026#34;some.Assembler.class.name\u0026#34; When someResource is used as the type of a root object, the AssemblerGroup that processes the description will use an instance of the Java class named by the object of the statement. That class must implement the Assembler interface. See loading assembler classes for more details.\nSimilarly, statements of the form:\nsomeResource ja:loadClass \u0026#34;some.class.name\u0026#34; will cause the named class to be loaded (but not treated as assemblers).\nImports If a specification contains statements of the form:\nanyResource owl:imports someURL or, equivalently,\nanyResource ja:imports someURL then the specification is regarded as also containing the contents of the RDF at someURL. 
That RDF may in turn contain imports referring to other RDF.\nLimited RDFS inference The Assembler engine uses limited RDFS inference to complete the model it is given, so that the spec-writer does not need to write excessive and redundant RDF. (It does not use the usual Jena reasoners because this limited once-off reasoning has been faster.) The inference steps are:\nadd all the classes from the JA schema. do subclass closure over all the classes. do domain and range inference. do simple intersection inference: if X is an instance of intersection A B C \u0026hellip;, then X is an instance of A, B, C \u0026hellip; (and their supertypes). This is sufficient for closed-world assembling. Other parts of the JA schema \u0026ndash; eg, cardinality constraints \u0026ndash; are hard-coded into the individual assemblers.\n","permalink":"https://jena.apache.org/documentation/assembler/assembler-howto.html","tags":null,"title":"Jena Assembler howto"},{"categories":null,"contents":"Jena\u0026rsquo;s assembler provides a means of constructing Jena models according to a recipe, where that recipe is itself stated in RDF. This is the Assembler quickstart page. For more detailed information, see the Assembler howto or Inside assemblers.\nWhat is an Assembler specification? An Assembler specification is an RDF description of how to construct a model and its associated resources, such as reasoners, prefix mappings, and initial content. The Assembler vocabulary is given in the Assembler schema, and we\u0026rsquo;ll use the prefix ja for its identifiers.\nWhat is an Assembler? An Assembler is an object that implements the Assembler interface and can construct objects (typically models) from Assembler specifications. The constant Assembler.general is an Assembler that knows how to construct some general patterns of model specification.\nHow can I make a model according to a specification? Suppose the Model M contains an Assembler specification whose root - the Resource describing the whole Model to construct is R (so R.getModel() == M). Invoke:\nAssembler.general.openModel(R) The result is the desired Model. Further details about the Assembler interface, the special Assembler general, and the details of specific Assemblers, are deferred to the Assembler howto.\nHow can I specify \u0026hellip; In the remaining sections, the object we want to describe is given the root resource my:root.\n\u0026hellip; a memory model? my:root a ja:MemoryModel. \u0026hellip; an inference model? my:root ja:reasoner [ja:reasonerURL theReasonerURL] ; ja:baseModel theBaseModelResource . theReasonerURL is one of the reasoner (factory) URLs given in the inference documentation and code; theBaseModelResource is another resource in the same document describing the base model.\n\u0026hellip; some initialising content? my:root ja:content [ja:externalContent \u0026lt;someContentURL\u0026gt;] ... rest of model specification ... . The model will be pre-loaded with the contents of someContentURL.\n\u0026hellip; an ontology model? my:root ja:ontModelSpec ja:OntModelSpecName ; ja:baseModel somebaseModel . The OntModelSpecName can be any of the predefined Jena OntModelSpec names, eg OWL_DL_MEM_RULE_INF. The baseModel is another model description - it can be left out, in which case you get an empty memory model. 
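For instance, putting these pieces together (the content URL is illustrative):\nmy:root ja:ontModelSpec ja:OWL_DL_MEM_RULE_INF ; ja:baseModel [a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;http://example.org/data.rdf\u0026gt;]] .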
See Assembler howto for construction of non-predefined OntModelSpecs.\n","permalink":"https://jena.apache.org/documentation/assembler/","tags":null,"title":"Jena assembler quickstart"},{"categories":null,"contents":"Jena Extra modules are modules that provide utilities and larger packages that make Apache Jena development or usage easier but that do not fall within the standard Jena framework.\nSub Packages Bulk retrieval and caching with SERVICE clauses Query Builder ","permalink":"https://jena.apache.org/documentation/extras/","tags":null,"title":"Jena Extras - Extra packages for Jena development."},{"categories":null,"contents":"This page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. This document describes Eyeball, an \u0026ldquo;RDF lint\u0026rdquo;. See the release notes for descriptions of changes from previous versions. Eyeball was a part of the Jena family of RDF/OWL tools.\nThroughout this document, the prefix eye: stands for the URL http://jena.hpl.hp.com/Eyeball\\#.\nIntroduction Eyeball is a library and command-line tool for checking RDF and OWL models for various common problems. These problems often result in technically correct but implausible RDF. Eyeball checks against user-provided schema files and makes various closed-world assumptions.\nEyeball can check for:\nunknown [with respect to the schemas] properties and classes bad prefix namespaces ill-formed URIs, with user-specifiable constraints ill-formed language tags on literals datatyped literals with illegal lexical forms unexpected local names in schema namespaces untyped resources and literals individuals having consistent types, assuming complete typing likely cardinality violations broken RDF list structures suspected broken use of the typed list idiom obviously broken OWL restrictions user-specified constraints written in SPARQL Eyeball\u0026rsquo;s checks are performed by Inspector plug-ins and can be customised by the user. Rendering its reports to output is performed by Renderer plug-ins which can also be customised by the user.\nInstallation Fetch the Eyeball distribution zipfile and unpack it somewhere convenient. Eyeball 2.1 comes with its own copy of Jena 2.5 with CVS updates. Do not attempt to use other versions of Jena with Eyeball.\nIn the Eyeball distribution directory, run the Eyeball tests:\nant test If these tests fail, something is wrong. Please ask on the user mailing list.\nIf the tests have passed, you can use Eyeball from the installation directory, or copy lib, etc and mirror to somewhere convenient.\nCommand line operation You must ensure that all the Eyeball jars from lib are on your classpath. (Note that Eyeball comes with its own Jena jar files and may not work with other Jena jars.) The directories etc and mirror should be in the current directory or also on your classpath.\nRun the Eyeball command:\njava [java options eg classpath and proxy] jena.eyeball (-check | -sign | -accept) specialURL+ [-assume Reference*] [-config fileOrURL*] [-set Setting*] [-root rootURI] [-render Name] [-include shortName*] [-exclude shortName*] [-analyse | -repair] [-remark] [-version] The -whatever sections can come in any order and may be repeated, in which case the additional arguments are appended to the existing ones. 
Exactly one of -check, -sign, -accept, or -version must be provided; all the other options are optional.\nWhen Eyeball resolves ordinary filenames or URLs it uses the Jena file manager to possibly map those names (eg to redirect an http: URL to a local cached copy). See the file manager howto for details on how to configure the file manager.\nExamples of command-line use java jena.eyeball -version java jena.eyeball -check myDataFile.rdf java jena.eyeball -assume dc -check http://example.com/nosuch.n3 java jena.eyeball -assume mySchema.rdf -check myData.rdf -render xml java jena.eyeball -check myData.rdf -include consistent-type java jena.eyeball -check myConfig.ttl -sign \u0026gt;signedConfig.ttl -check specialURL+ The -check command checks the specified models for problems. The specialURLs designate the models to be checked. In the simplest case, these are plain filenames, file: URLs, or http: URLs. At least one specialURL must be specified. Each specified model is checked independently of the others.\n-check myModel.ttl -check file:///c:/rdf/pizza.owl -check http://example.com/rdf/beer.rdf If the specialURL is of the form ont:NAME:base, then the checked model is the model base treated as an OntModel with the specification OntModelSpec.*NAME*; see the Jena ontology documentation for the available names.\n-check ont:OWL_MEM_RDFS_INF:myModel.ttl -check ont:OWL_DL_MEM_RULE_INF:http://example.com/rdf/beer.rdf If the specialURL is of the form ja:R@AF, then the model is that described by the resource R in the Jena assembler description file AF. R is prefix-expanded using the prefixes in AF.\n-check ja:my:root@my-assembly.ttl -check ont:OWL_MEM_RDFS_INF:my:root@my-assembly.ttl If the URL (or the base) is of the form jdbc:DB:head:model, then the checked model is the one called model in the database with connection jdbc:DB:head. (The database user and password must be specified independently using the jena.db.user and jena.db.password system properties.)\n-check jdbc:mysql://localhost/test:example -config fileOrURL and -root rootURI The -config fileOrURL options specify the Eyeball assembler configuration files to load. A single configuration model is constructed as the union of the contents of those files. If this option is omitted, the default configuration file etc/eyeball-config.n3 is loaded. See inside the Eyeball configuration file for details of the configuration file.\n-config my-hacked-config-file.n3 -config etc/eyeball-config.n3 extras.ttl The -root rootURI option specifies the root resource in the Eyeball configuration. If this option is omitted, eye:eyeball is used by default. rootURI is prefix-expanded using the prefixes in the configuration file.\n-root my:root -root my:sparse-config -root urn:x-hp:eyeball-roots:special -set Setting* The -set option allows command-line tweaks to the configuration, eg for enabling checking URIs for empty local names. You will rarely need to use this; it is presented here because of its association with the -config and -root options.\nEach Setting has the form S.P=O and adds the statement (S' P' O') to the configuration.\nThe current Eyeball converts the components of the S.P=O string into RDF nodes S', P', O' using some special rules:\nA component starting with a digit is treated as an xsd:integer literal (and hence should only appear as the object of the setting). A component starting with a quote, either \u0026quot; or ', is treated as a literal whose lexical form extends to the matching closing quote. 
Note: (a) literals with embedded spaces are not supported; (b) your command-line interpreter may treat quotes specially, and to allow the quotes to pass through to Eyeball, you\u0026rsquo;ll have to use another (different) pair of quotes! A component starting with _ is treated as a blank node with that label. Otherwise, the component is treated as a URI reference. If it starts with a prefix (eg, rdf:) that prefix is expanded using the prefixes of the configuration file. If it has no prefix, it is as though the empty prefix was specified: in the default configuration file, that is set to the Eyeball namespace, so it is as though the prefix eye: had been used. For example, to enable the URI inspectors non-default reporting of URIs with empty local names, use:\n-set URIInspector.reportEmptyLocalNames=\u0026quot;'true'\u0026quot; Note the nested different quotes required to pass \u0026rsquo;true\u0026rsquo; to Eyeball so that it can interpret this as a literal.\n-include/-exclude shortNames The various Eyeball inspectors are given short names in the configuration file. By default, an Eyeball check uses a specific set of inspectors with short name defaultInspectors. Additional inspectors can be enabled using the -include option, and default inspectors can be disabled using the -exclude option. See below for the available inspectors and their short names, and see inspectors configuration for how to configure inspectors.\n-include list all-typed -exclude cardinality -include owl -exclude consistent-type -assume Reference The -assume References identifies any assumed schemas used to specify the predicates and classes of the data model. The reference may be a file name or a URL (and may be mapped by the file manager).\nEyeball automatically assumes the RDF and RDFS schemas, and the built-in XSD datatype classes. The short name owl can be used to refer to the OWL schema, dc to the Dublin Core schema, dcterms to the Dublin Core terms schema, and dc-all to both.\n-assume owl -assume owl dc-all -assume owl my-ontology.owl -sign and -accept (experimental) If -sign is specified, Eyeball first does a -check. If no problem reports are generated, Eyeball writes a signed version of the current model to the standard output. The signature records the Eyeball configuration used and a weak hash of the model. If the input model is already signed, that signature is discarded before computing the new signature and writing the output.\nIf -accept is specified, the model is checked for its signature. If it is not signed, or if the signature does not match the content of the model \u0026ndash; either the hash fails, or the recorded configuration is not sufficient \u0026ndash; a problem is reported; otherwise not.\nThe intended use of -sign and -accept is that an application can require signed models which have passed some minimum set of inspections. The application code can then rely on the model having the desired properties, without having to run potentially expensive validation checks every time a model is loaded.\nImportant. Model signing is intended to catch careless mistakes, not for security against malicious users.\n-version Eyeball will print its version on the standard error stream (currently \u0026ldquo;Eyeball 2.1 (Nova Embers)\u0026rdquo;).\n-remark Normally Eyeball issues its report or signed model to the standard output and exits with code 0 (success) or 1 (failure) with no additional output. 
Specifying -remark causes a remark (\u0026ldquo;success\u0026rdquo; or \u0026ldquo;some problems reported\u0026rdquo;) to be written to standard error.\n-repair and -analyse (experimental) These operations are not currently documented. Try them at your peril: -repair may attempt to update your models.\n-render Name The eyeball reports are written to the standard output; by default, the reports appear as text (RDF rendered by omitting the subjects - which are all blank nodes - and lightly prettifying the predicate and object). To change the rendering style, supply the -render option with the name of the renderer as its value. Eyeball comes with N3, XML, and text renderers; the Eyeball config file associates renderer names with their classes.\n-render n3 -render rdf setting the proxy If any of the data or schema are identified by an http: URL, and you are behind a firewall, you will need to specify the proxy to Java using system properties; one way to do this is by using the Java command line options:\n-DproxySet=true -DproxyHost=theProxyHostName -DproxyPort=theProxyPortNumber Inspectors shipped with Eyeball Eyeball comes with a collection of inspectors that do relatively simple checks.\nPropertyInspector (short name: \u0026ldquo;property\u0026rdquo;) Checks that every predicate that appears in the model is declared in some -assumed schema or owl:imported model \u0026ndash; that is, is given rdf:type rdf:Property or some subclass of it.\nClassInspector (short name: \u0026ldquo;presumed-class\u0026rdquo;) Checks that every resource in the model that is used as a class, ie that appears as the object of an rdf:type, rdfs:domain, or rdfs:range statement, or as the subject or object of an rdfs:subClassOf statement, has been declared as a Class in the -assumed schemas or in the model under test.\nURIInspector (short name: \u0026ldquo;URI\u0026rdquo;) Checks that every URI in the model is well-formed according to the rules of the Jena IRI library. May apply additional rules specified in the configuration file: see uri configuration later for details.\nLiteralInspector (short name: \u0026ldquo;literal\u0026rdquo;) Checks literals for syntactically correct language codes, syntactically correct datatype URIs (using the same rules as the URIInspector), and conformance of the lexical form of typed literals to their datatype.\nPrefixInspector (short name: \u0026ldquo;prefix\u0026rdquo;) The PrefixInspector checks that the prefix declarations of the model have namespaces that are valid URIs and that if the prefix name is \u0026ldquo;well-known\u0026rdquo; (rdf, rdfs, owl, xsd, and dc) then the associated URI is the one usually associated with the prefix.\nThe PrefixInspector also reports a problem if any prefix looks like a Jena automatically-generated prefix, j.*Number*. 
(Jena generates these prefixes when writing RDF/XML if the XML syntactically requires a prefix but the model hasn\u0026rsquo;t defined one.)\nVocabularyInspector (short name: \u0026ldquo;vocabulary\u0026rdquo;) Checks that every URI in the model with a namespace which is mentioned in some schema is one of the URIs declared for that namespace \u0026ndash; that is, it assumes that the schemas define a closed set of URIs.\nThe inspector may be configured to suppress this check for specified namespaces: see vocabulary configuration later.\nOwlSyntaxInspector (short name: \u0026ldquo;owl\u0026rdquo;) This inspector looks for \u0026ldquo;suspicious restrictions\u0026rdquo; which have some of the OWL restriction properties but not exactly one owl:onProperty and exactly one constraint (owl:allValuesFrom, etc).\nSparqlDrivenInspector (short name: \u0026ldquo;sparql\u0026rdquo;) The SparqlDrivenInspector is configured according to configuring the SPARQL-driven inspector, and applies arbitrary SPARQL queries to the model. The queries can be required to match or prohibited from matching; a problem is reported if the constraint fails.\nAllTypedInspector (short name: \u0026ldquo;all-typed\u0026rdquo;) Checks that all URI and bnode resources in the model have an rdf:type property in the model or the schema(s). If there is a statement in the configuration with property eye:checkLiteralTypes and value eye:true, also checks that every literal has a type or a language. Not in the default set of inspectors.\nConsistentTypeInspector (short name: \u0026ldquo;consistent-type\u0026rdquo;) Checks that every subject in the model can be given a type which is the intersection of the subclasses of all its \u0026ldquo;attached\u0026rdquo; types \u0026ndash; a \u0026ldquo;consistent type\u0026rdquo;.\nFor example, if the model contains three types Top, Left, and Right, with Left and Right both being subtypes of Top and with no other subclass statements, then some S with rdf:types Left and Right would generate this warning.\nCardinalityInspector (short name: \u0026ldquo;cardinality\u0026rdquo;) Looks for classes C that are subclasses of cardinality restrictions on some property P with cardinality range min to max. For any X of rdf:type C, it checks that the number of values of P is in the range min..max and generates a report if it isn\u0026rsquo;t.\nLiterals are counted as distinct if their values (not just their lexical form) are distinct. Resources are counted as distinct if they have different case-sensitive URIs: the CardinalityInspector takes no account of owl:sameAs statements.\nListInspector (short name: \u0026ldquo;list\u0026rdquo;) The ListInspector performs two separate checks:\nlooks for lists that are ill-formed by having multiple or missing rdf:first or rdf:rest properties on their elements. looks for possible mis-uses of the \u0026ldquo;typed list\u0026rdquo; idiom, and reports the types so defined. The typed list idiom is boilerplate OWL for defining a type which is List-of-T for some type T. It takes the form:\nmy:EList a owl:Class ; rdfs:subClassOf rdf:List ; rdfs:subClassOf [owl:onProperty rdf:first; owl:allValuesFrom my:Element] ; rdfs:subClassOf [owl:onProperty rdf:rest; owl:allValuesFrom my:EList] . The type my:Element is the element type of the list, and the type EList is the resulting typed list. 
The list inspector checks that every subclass of rdf:List (such as EList above) that is also a subclass of some bnode (such as the two other superclasses of *EList*) that has any property (eg, owl:onProperty) with rdf:first or rdf:rest as its object is a subclass defined by the full idiom above: if not, it is reported as a suspectListIdiom.\nEyeball problem reports Eyeball generates its reports as items in a model. Each item has rdf:type eye:Item, and its other properties determine what problem report it is. The default text renderer displays a prettified form of each item; use -render n3 to expose the complete report structure.\nOne of the item\u0026rsquo;s properties is its main property, which identifies the problem; the others are qualifications supplying additional detail.\nPropertyInspector: predicate not declared [] eye:unknownPredicate \u0026quot;*URIString*\u0026quot;. The predicate with the given URI is not defined in any of the -assumed schemas.\nClassInspector: class not declared [] eye:unknownClass \u0026quot;*URIString*\u0026quot;. The resource with the given URI is used as a Class, but not defined in any of the -assumed schemas.\nURIInspector: bad URI [] eye:badURI \u0026quot;*URIString*\u0026quot;; eye:forReason *Reason*. The URIString isn\u0026rsquo;t legal as a URI, or is legal but fails a user-specified spelling constraint. Reason is a resource or string identifying the reason.\nreason explanation eye:uriContainsSpaces the URI contains unencoded spaces, probably as a result of sloppy use of file: URLs. eye:uriFileInappropriate a URI used as a namespace is a file: URI, which is inappropriate as a global identifier. eye:uriHasNoScheme a URI has no scheme field, probably a misused relative URI. eye:schemeShouldBeLowercase the scheme part of a URI is not lower-case; while technically correct, this is not usual practice. eye:uriFailsPattern a URI fails the pattern appropriate to its schema (as defined in the configuration for this eyeball). eye:unrecognisedScheme the URI scheme is unknown, perhaps a misplaced QName. eye:uriNoHttpAuthority an http: URI has no authority (domain name/port) component. eye:uriSyntaxFailure the URI can\u0026rsquo;t be parsed using the general URI syntax, even with any spaces removed. eye:namespaceEndsWithNameCharacter a namespace URI ends in a character that can appear in a name, leading to possible ambiguities. eye:uriHasNoLocalname a URI has no local name according to the XML name-splitting rules. (For example, the URI http://x.com/foo#12345 has no local name because a local name cannot start with a digit.) \u0026ldquo;did not match required pattern *Tail* for prefix *Head*\u0026rdquo;. This badURI starts with Head, but the remainder doesn\u0026rsquo;t match any of the required *Tail* patterns associated with that prefix. \u0026ldquo;matched prohibited pattern *Tail* for prefix *Head*\u0026rdquo;. This badURI starts with Head, and the remainder matched a prohibited Tail associated with that prefix. LiteralInspector: illegal language code [] eye:badLanguage \u0026quot;*badCode*\u0026quot;; eye:onLiteral \u0026quot;*spelling*\u0026quot;. A literal with the lexical form spelling has the illegal language code badCode.\nLiteralInspector: bad datatype URI [] eye:badDatatypeURI \u0026quot;*badURI*\u0026quot;; eye:onLiteral \u0026quot;*spelling*\u0026quot;. 
A literal with the lexical form spelling has the illegal datatype URI badURI.\nLiteralInspector: bad lexical form [] eye:badLexicalForm \u0026quot;*spelling*\u0026quot;; eye:forDatatype \u0026quot;*dtURI*\u0026quot;. A literal with the datatype URI dtURI has the lexical form spelling, which isn\u0026rsquo;t legal for that datatype.\nPrefixInspector: bad namespace URI [] eye:badNamespaceURI \u0026quot;*URIString*\u0026quot; ; eye:onPrefix \u0026quot;*prefix*\u0026quot; ; eye:forReason *Reason*. The namespace URIString for the declaration of prefix is suspicious for the given Reason (see the URIInspector reports for details of the possible reasons).\nPrefixInspector: Jena prefix found [] eye:jenaPrefixFound \u0026quot;*j.Digits*\u0026quot;; eye:forNamespace \u0026quot;*URIString*\u0026quot;. The namespace URIString has an automatically-generated Jena prefix.\nPrefixInspector: multiple prefixes for namespace [] eye:multiplePrefixesForNamespace \u0026quot;*NameSpace*\u0026quot; ; eye:onPrefix \u0026quot;*prefix1*\u0026quot; ... There are multiple prefix declarations for NameSpace, namely, prefix1 etc.\nVocabularyInspector: not from schema [] eye:notFromSchema \u0026quot;*NameSpace*\u0026quot;; eye:onResource *Resource*. The Resource has a URI in the NameSpace, but isn\u0026rsquo;t declared in the schema associated with that NameSpace.\nOwlSyntaxInspector: suspicious restriction [] eye:suspiciousRestriction *R*; eye:forReason *Reason*... The presumed restriction R is suspicious for the given Reasons:\neye:missingOnProperty \u0026ndash; there is no owl:onProperty property in this suspicious restriction. eye:multipleOnProperty \u0026ndash; there are multiple owl:onProperty properties in this suspicious restriction. eye:missingConstraint \u0026ndash; there is no owl:hasValue, owl:allValuesFrom, owl:someValuesFrom, or owl:[minC|maxC|c]ardinality property in this suspicious restriction. eye:multipleConstraint \u0026ndash; there are multiple constraints (as above) in this suspicious restriction. The restriction R is identified by (a) supplying its immediate properties, and (b) identifying its named equivalent classes and subclasses.\nSparqlDrivenInspector: require failed [] eye:sparqlRequireFailed \u0026quot;*message*\u0026quot;. A SPARQL query that was required to succeed against the model did not. The message is either the query that failed or a meaningful description, depending on the inspector configuration.\nSparqlDrivenInspector: prohibit failed [] eye:sparqlProhibitFailed \u0026quot;*message*\u0026quot;. A SPARQL query that was required to fail against the model did not. The message is either the query that succeeded or a meaningful description, depending on the inspector configuration.\nAllTypedInspector: should have type [] eye:shouldHaveType *Resource*. The Resource has no rdf:type. Note that when using models with inference, this report is unlikely, since inference may well give the resource a type even if it has no explicit type in the original model.\nConsistentTypeInspector: inconsistent types for resource [] eye:noConsistentTypeFor *URI* ; eye:hasAttachedType *TypeURIi* ... The resource URI has been given the various types TypeURIi, but if we assume that subtypes are disjoint unless otherwise specified, these types have no intersection.\nThe ConsistentTypeInspector must do at least some type inference. 
This release of Eyeball compromises by doing RDFS inference augmented by (very) limited union and intersection reasoning, as described in the Jena rules in etc/owl-like.rules, so its reports must be treated with caution. Even with these restrictions, doing type inference over a large model is costly: you may need to suppress it with -exclude until any other warnings are dealt with.\nWhile, technically, a resource with no attached types at all is automatically inconsistent, Eyeball quietly ignores such resources, since they turn up quite often in simple RDF models.\nCardinalityInspector: cardinality failure [] eye:cardinalityFailure *Subject*; eye:onType *T*; eye:onProperty *P* The Subject has a cardinality-constrained rdf:type T with owl:onProperty P, but the number of distinct values in the model isn\u0026rsquo;t consistent with the cardinality restriction.\nAdditional properties describe the cardinality restriction and the values found:\neye:numValues N: the number of distinct values for (Subject, P) in the model. eye:cardinality [eye:min min; eye:max max]: the minimum and maximum cardinalities permitted. eye:values Set: A blank node of type eye:Set with an rdfs:member value for each of the values of P. ListInspector: ill-formed list [] eye:illFormedList *URI* ; eye:because [eye:element *indexi*; *Problemi*] ... The list starting at URI is ill-formed because the element with index indexi had Problemi. The possible problems are:\neye:hasNoRest \u0026ndash; the element has no rdf:rest property. eye:hasMultipleRests \u0026ndash; the element has more than one rdf:rest property. eye:hasNoFirst \u0026ndash; the element has no rdf:first property. eye:hasMultipleFirsts \u0026ndash; the element has more than one rdf:first property. ListInspector: suspect list idiom [] eye:suspectListIdiom *Type*. The resource Type looks like it\u0026rsquo;s supposed to be a use of the \u0026ldquo;typed list idiom\u0026rdquo;, but it isn\u0026rsquo;t complete/accurate.\nInside the Eyeball configuration file Configuration files The Eyeball command-line utility is configured by files (or URLs) specified on the command line: their RDF contents are unioned together into a single config model. If no config file is specified, then etc/eyeball-config.n3 is loaded. The configuration file is a Jena assembler description (see Assemblers) with added Eyeball vocabulary.\nEyeball is also configured by the location-mapping file etc/location-mapping.n3. The Eyeball jar contains copies of both the default config and the location mapper; these are used by default. You can provide your own etc/eyeball-config.n3 file earlier on your classpath or in your current directory; this config replaces the default. You may provide additional location-mapping files earlier on your classpath or in your current directory.\nConfiguring schema names To avoid having to quote schema names in full on the Eyeball command line, (collections of) schemas can be given short names. [] eye:shortName shortNameLiteral ; eye:schema fullSchemaURL \u0026hellip; .\nA shortname can name several schemas. The Eyeball delivery has the short names rdf, rdfs, owl, and dc for the corresponding schemas (and mirror versions of those schemas so that they don\u0026rsquo;t need to be downloaded each time Eyeball is run.)\nConfiguring inspectors The inspectors that Eyeball runs over the model are specified by eye:inspector properties of inspector resources. 
These resources are identified by eye:shortNames (supplied on the command line). Each such property value must be a plain string literal whose value is the full name of the Inspector class to load and run; see the Javadoc of Inspector for details.\nAn inspector resource may refer to other inspector resources to include their inspectors, using either of the two properties eye:include or eye:includeByName. The value of an include property should be another inspector resource; the value of an includeByName property should be the shortName of an inspector resource.\nConfiguring the URI inspector As well as applying the standard URI rules, Eyeball allows extra pattern-oriented checks to be applied to URIs. These are specified by eye:check properties of the URIInspector object in the configuration.\nThe object of an eye:check property is a bnode with eye:prefix, eye:prohibit, and eye:require properties. The objects of these properties must be string literals.\nIf a URI U can be split into a prefix P and suffix S, and there is a check property with that prefix, and either:\nthere\u0026rsquo;s a prohibit property and S matches the object of that property, or there\u0026rsquo;s a require property and S does not match the object of that property, then a problem is reported. If there are multiple prohibits, then a problem is reported if any prohibition is violated; if there are multiple requires, a problem is reported if none of them succeed.\neye:URIInspector eye:check [eye:prefix \u0026quot;urn:x-hp:\u0026quot;; eye:prohibit \u0026quot;.*:.*\u0026quot;] ; [eye:prefix \u0026quot;http://example.com/\u0026quot;; eye:require \u0026quot;.*eyeball.*\u0026quot;] The prefixes, requires, and prohibits are treated as Java patterns. The URI inspector can be configured to report URIs with an empty local name. These arise because the meaning of \u0026ldquo;local name\u0026rdquo; comes from XML, and in XML a local name must start with an NCName character, typically a letter but not a digit. Hence URIs like http://example.com/productCode#1829 have an empty local name. This is sometimes confusing.\nTo report empty local names, add the property eye:reportEmptyLocalNames to the inspector eye:URIInspector with the property value true. You may edit the configuration file or use the -set command-line option.\nConfiguring the vocabulary inspector The vocabulary inspector defaults to assuming that schema namespaces are closed. To disable this for specified namespaces, the inspector object in the configuration can be given eye:openNamespace properties.\nThe object of each of these properties must be a resource; the URI of this resource is an open namespace for which the inspector will not report problems.\neye:VocabularyInspector eye:openNamespace \u0026lt;http://example.com/examples#\u0026gt; Configuring the SPARQL-driven inspector The SPARQL inspector object in the configuration may be given eye:sparql properties whose objects are resources specifying SPARQL queries and problem messages.\neye:SparqlDrivenInspector eye:sparql [...] 
The resource may specify a SPARQL query which must succeed in the model, and a message to produce if it does not.\neye:SparqlDrivenInspector eye:sparql [eye:require \u0026quot;select * where {?s ?p ?o}\u0026quot;; eye:message \u0026quot;must be non-empty\u0026quot;] If the query is non-trivial, the string may contain a reference to a file containing the query, rather than the entire query.\neye:require \u0026quot;@'/home/kers/example/query-one.sparql'\u0026quot; The quoted filename is read using the Jena file manager and so respects any filename mappings. \u0026ldquo;@\u0026rdquo; characters not followed by \u0026ldquo;\u0026rsquo;\u0026rdquo; are not subject to substitution, except that the sequence \u0026ldquo;@@\u0026rdquo; is replaced by \u0026ldquo;@\u0026rdquo;.\nUsing eye:prohibit rather than eye:require means that the problem is reported if the query succeeds, rather than if it fails.\nConfiguring renderers The renderer class that Eyeball uses to render the report into text is given in the config file by triples of the form:\n[] eye:renderer FullClassName ; eye:shortName ShortClassHandle The FullClassName is a string literal giving the full class name of the rendering class. That class must implement the Renderer interface and have a constructor that takes a Resource, its configuration root, as its argument.\nThe ShortClassHandle is a string literal giving the short name used to refer to the class. The default short name used is default. There should be no more than one eye:shortName statement with the same ShortClassHandle in the configuration file, but the same class can have many different short names.\nThe TextRenderer supports an additional property eye:labels to allow the appropriate labels for an ontology to be supplied to the renderer. Each object of an eye:labels statement names a model; all the rdfs:label statements in that model are used to supply strings which are used to render resources.\nThe model names are strings which are interpreted by Jena\u0026rsquo;s FileManager, so they may be redirected using Jena\u0026rsquo;s file mappings.\nInside the Eyeball code Eyeball can be used from within Java code; the command line merely provides a convenient external interface.\nCreating an Eyeball An Eyeball object has three subcomponents: the assumptions against which the model is to be checked, the inspectors which do the checking, and the renderer used to display the reports.\nThe assumptions are bundled into a single OntModel. Multiple assumptions can be supplied either by adding them as sub-models or by loading their content directly into the OntModel.\nThe inspectors are supplied as a single Inspector object. The method Inspector.Operations.create(List) creates a single Inspector from a list of Inspectors; this inspector delegates all its inspection methods to all of its sub-inspectors.\nThe renderer can be anything that implements the (simple) renderer interface.\nTo create an Eyeball:\nEyeball eyeball = new Eyeball( inspector, assumptions, renderer ); To eyeball a model Models to be inspected are provided as OntModels. The problems are delivered to a Report object, where they are represented as an RDF model.\neyeball.inspect( report, ontModelToBeInspected ) The result is that same report object. The Report::model() method delivers an RDF model which describes the problems found by the inspection. 
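Putting those calls together, a minimal sketch of programmatic use might look like this (how the Report, the Inspector list, and the Renderer are obtained here is illustrative only - consult the Javadoc for the exact constructors):\nOntModel assumptions = ModelFactory.createOntologyModel();\nOntModel toInspect = ModelFactory.createOntologyModel();\ntoInspect.read( \u0026quot;file:myData.rdf\u0026quot; ); // file name is illustrative\nInspector inspector = Inspector.Operations.create( listOfInspectors ); // listOfInspectors: a List of Inspector objects built elsewhere\nEyeball eyeball = new Eyeball( inspector, assumptions, someRenderer ); // someRenderer: any object implementing the Renderer interface\nReport report = eyeball.inspect( new Report(), toInspect ); // assumes Report has a no-argument constructor\nModel problems = report.model(); // an RDF model describing the problems found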
The inspections supplied in the distribution use the EYE vocabulary, and are used in the standard reports:\nEvery report item in the model is a blank node with rdf:type eye:Item. See earlier sections for the descriptions of the properties attached to an Item.\nRebuilding Eyeball The provided ant script can be used to rebuild Eyeball from source:\nant clean build jar (Omitting clean will do an incremental build, useful for small changes.)\nThe libraries required by Eyeball are all in the lib directory, including the necessary Jena jars.\nCreating and configuring an inspector To make a new inspector available to Eyeball, a new Inspector class must be created and that class has to be described in the Eyeball configuration.\nCreating an Inspector Any inspector must implement the Inspector interface, which has four operations:\nbegin( Report r, OntModel assume ): Begin a new inspection. r is the Report object which will accept the reports in this inspection; assume is the model containing the assumed ontologies. begin is responsible for declaring this inspector\u0026rsquo;s report properties. inspectModel( Report r, OntModel m ): Do a whole-model inspection of m, issuing reports to r. inspectStatement( Report r, Statement s ): Inspect the single statement s, issuing reports to r. end( Report r ): Do any tidying-up reports required. Typically end and one of inspectModel or inspectStatement do nothing.\nAn inspector must also have a constructor that takes a Resource argument. When Eyeball creates the Inspector object, it passes the Resource which is the root of this inspector\u0026rsquo;s configuration. (This is, for example, how the SPARQL-driven inspector receives the query strings to use.)\nDevelopers may find the class InspectorBase useful; it has empty implementations for all the Inspector methods. They may also find InspectorTestBase useful when writing their inspector\u0026rsquo;s tests, both for its convenience methods and because it requires that their class has the appropriate constructors.\nReports and report properties Eyeball reports are statements in a report model. To let the renderer know which property of a report is the \u0026ldquo;main\u0026rdquo; one, and which order the other properties should appear in, the inspector\u0026rsquo;s begin method should declare the properties:\nr.declareProperty( EYE.badDatatypeURI ); r.declareOrder( EYE.badLanguage, EYE.onLiteral ); declareProperty(P) announces that P is a report property of this inspector. declareOrder(F,S) says that both F and S are report properties, and that F should appear before S in the rendered report.\nReports are made up of report items, which are the subjects of the report properties. To create a report item, use one of reportItem() or reportItem(S). The second form is appropriate when the report is attached to some statement S of the model being inspected; a report renderer will attempt to display S.\nTo add the main property to a report item R, use R.addMainProperty(P,O); to add non-main properties, use R.addProperty(P,O).\nConfiguring an inspector To add an inspector to a configuration file, choose a URI for it (here we\u0026rsquo;re using my:Fresh and assuming a prefix declaration for my:) and a short name (here, \u0026ldquo;fresh\u0026rdquo;) and add a description to the configuration file:\nmy:Fresh a eye:Inspector ; eye:shortName \u0026quot;fresh\u0026quot; ; rdfs:label \u0026quot;fresh checks for my application\u0026quot; ; eye:className \u0026quot;full.path.to.Fresh\u0026quot; . 
Replace full.path.to.Fresh with the full classname of your inspector. Now you can use Fresh by adding -include fresh to the Eyeball command line (and ensuring that the class is on your classpath).\nIf you want Fresh to be included by default, then you must add it as an eye:inspector property of the configuration root, eg:\neye:eyeball a eye:Eyeball ; eye:inspector eye:PrefixInspector, # as delivered my:FreshInspector, # new inspector eye:URIInspector, # as delivered ... ","permalink":"https://jena.apache.org/documentation/archive/eyeball/eyeball-manual.html","tags":null,"title":"Jena Eyeball manual"},{"categories":null,"contents":"This extension to ARQ combines SPARQL and full text search via Lucene. It gives applications the ability to perform indexed full text searches within SPARQL queries. Here is a version compatibility table:\nJena Lucene Solr ElasticSearch upto 3.2.0 5.x or 6.x 5.x or 6.x not supported 3.3.0 - 3.9.0 6.4.x not supported 5.2.2 - 5.2.13 3.10.0 7.4.0 not supported 6.4.2 3.15.0 - 3.17.0 7.7.x not supported 6.8.6 4.0.0 - 4.6.1 8.8.x not supported not supported 4.7.0 - current 9.4.x not supported not supported Note: In Lucene 9, the default setup of the StandardAnalyzer changed to having no stop words. For more details, see analyzer specifications below.\nSPARQL allows the use of regex in FILTERs which is a test on a value retrieved earlier in the query so its use is not indexed. For example, if you\u0026rsquo;re searching for occurrences of \u0026quot;printer\u0026quot; in the rdfs:label of a bunch of products:\nPREFIX ex: \u0026lt;http://www.example.org/resources#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; SELECT ?s ?lbl WHERE { ?s a ex:Product ; rdfs:label ?lbl FILTER regex(?lbl, \u0026#34;printer\u0026#34;, \u0026#34;i\u0026#34;) } then the search will need to examine all selected rdfs:label statements and apply the regular expression to each label in turn. If there are many such statements and many such uses of regex, then it may be appropriate to consider using this extension to take advantage of the performance potential of full text indexing.\nText indexes provide additional information for accessing the RDF graph by allowing the application to have indexed access to the internal structure of string literals rather than treating such literals as opaque items. Unlike FILTER, an index can set the values of variables. 
Assuming appropriate configuration, the above query can use full text search via the ARQ property function extension, text:query:\nPREFIX ex: \u0026lt;http://www.example.org/resources#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX text: \u0026lt;http://jena.apache.org/text#\u0026gt; SELECT ?s ?lbl WHERE { ?s a ex:Product ; text:query (rdfs:label 'printer') ; rdfs:label ?lbl } This query makes a text query for 'printer' on the rdfs:label property; and then looks in the RDF data and retrieves the complete label for each match.\nThe full text engine can be either Apache Lucene hosted with Jena on a single machine, or Elasticsearch for a large scale enterprise search application where the full text engine is potentially distributed across separate machines.\nThis example code illustrates creating an in-memory dataset with a Lucene index.\nArchitecture In general, a text index engine (Lucene or Elasticsearch) indexes documents where each document is a collection of fields, the values of which are indexed so that searches matching contents of specified fields can return a reference to the document containing the fields with matching values.\nThere are two models for extending Jena with text indexing and search:\nOne Jena triple equals one Lucene document One Lucene document equals one Jena entity One triple equals one document The basic Jena text extension associates a triple with a document and the property of the triple with a field of a document and the object of the triple (which must be a literal) with the value of the field in the document. The subject of the triple then becomes another field of the document that is returned as the result of a search match to identify what was matched. (NB, the particular triple that matched is not identified. Only, its subject and optionally the matching literal and match score.)\nIn this manner, the text index provides an inverted index that maps query string matches to subject URIs.\nA text-indexed dataset is configured with a description of which properties are to be indexed. When triples are added, any properties matching the description cause a document to be added to the index by analyzing the literal value of the triple object and mapping to the subject URI. On the other hand, it is necessary to specifically configure the text-indexed dataset to delete index entries when the corresponding triples are dropped from the RDF store.\nThe text index uses the native query language of the index: Lucene query language (with restrictions) or Elasticsearch query language.\nOne document equals one entity There are two approaches to creating indexed documents that contain more than one indexed field:\nUsing an externally maintained Lucene index Multiple fields per document When using this integration model, text:query returns the subject URI for the document on which additional triples of metadata may be associated, and optionally the Lucene score for the match.\nExternal content When document content is externally indexed via Lucene and accessed in Jena via a text:TextDataset then the subject URI returned for a search result is considered to refer to the external content, and metadata about the document is represented as triples in Jena with the subject URI.\nThere is no requirement that the indexed document content be present in the RDF data. 
As long as the index contains the index text documents to match the index description, then text search can be performed with queries that explicitly mention indexed fields in the document.\nThat is, if the content of a collection of documents is externally indexed and the URI naming the document is the result of the text search, then an RDF dataset with the document metadata can be combined with accessing the content by URI.\nThe maintenance of the index is external to the RDF data store.\nExternal applications By using Elasticsearch, other applications can share the text index with SPARQL search.\nDocument structure As mentioned above, when using the (default) one-triple equals one-document model, text indexing of a triple involves associating a Lucene document with the triple. How is this done?\nLucene documents are composed of Fields. Indexing and searching are performed over the contents of these Fields. For an RDF triple to be indexed in Lucene the property of the triple must be configured in the entity map of a TextIndex. This associates a Lucene analyzer with the property which will be used for indexing and search. The property becomes the searchable Lucene Field in the resulting document.\nA Lucene index includes a default Field, which is specified in the configuration, that is the field to search if not otherwise named in the query. In jena-text this field is configured via the text:defaultField property which is then mapped to a specific RDF property via text:predicate (see entity map below).\nThere are several additional Fields that will be included in the document that is passed to the Lucene IndexWriter depending on the configuration options that are used. These additional fields are used to manage the interface between Jena and Lucene and are not generally searchable per se.\nThe most important of these additional Fields is the text:entityField. This configuration property defines the name of the Field that will contain the URI or blank node id of the subject of the triple being indexed. This property does not have a default and must be specified for most uses of jena-text. This Field is often given the name, uri, in examples. It is via this Field that ?s is bound in a typical use such as:\nselect ?s where { ?s text:query \u0026quot;some text\u0026quot; } Other Fields that may be configured: text:uidField, text:graphField, and so on are discussed below.\nGiven the triple:\nex:SomeOne skos:prefLabel \u0026quot;zorn protégé a prés\u0026quot;@fr ; The following is an abbreviated illustration a Lucene document that Jena will create and request Lucene to index:\nDocument\u0026lt; \u0026lt;uri:http://example.org/SomeOne\u0026gt; \u0026lt;graph:urn:x-arq:DefaultGraphNode\u0026gt; \u0026lt;label:zorn protégé a prés\u0026gt; \u0026lt;lang:fr\u0026gt; \u0026lt;uid:28959d0130121b51e1459a95bdac2e04f96efa2e6518ff3c090dfa7a1e6dcf00\u0026gt; \u0026gt; It may be instructive to refer back to this example when considering the various points below.\nQuery with SPARQL The URI of the text extension property function is http://jena.apache.org/text#query more conveniently written:\nPREFIX text: \u0026lt;http://jena.apache.org/text#\u0026gt; ... text:query ... 
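For example, assuming skos:prefLabel is configured as the indexed default field (as in the entity-map illustration above), a query that would match the document shown earlier might be:\nPREFIX text: \u0026lt;http://jena.apache.org/text#\u0026gt;\nPREFIX skos: \u0026lt;http://www.w3.org/2004/02/skos/core#\u0026gt;\nSELECT ?s ?label WHERE { (?s ?score ?label) text:query ('protégé' 'lang:fr') }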
Syntax The following forms are all legal:\n?s text:query 'word' # query ?s text:query ('word' 10) # with limit on results ?s text:query (rdfs:label 'word') # query specific property if multiple ?s text:query (rdfs:label 'protégé' 'lang:fr') # restrict search to French (?s ?score) text:query 'word' # query capturing also the score (?s ?score ?literal) text:query 'word' # ... and original literal value (?s ?score ?literal ?g) text:query 'word' # ... and the graph The most general form when using the default one-triple equals one-document integration model is:\n( ?s ?score ?literal ?g ) text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy' ) while for the one-document equals one-entity model, the general form is:\n( ?s ?score ) text:query ( 'query string' limit ) and if only the subject URI is needed:\n?s text:query ( 'query string' limit ) Input arguments: Argument Definition property (zero or more) property URIs (including prefix name form) query string Lucene query string fragment limit (optional) int limit on the number of results lang:xx (optional) language tag spec highlight:yy (optional) highlighting options The property URI is only necessary if multiple properties have been indexed and the property being searched over is not the default field of the index.\nSince 3.13.0, property may be a list of zero or more (prior to 3.13.0 zero or one) Lucene indexed properties, or a defined text:propList of indexed properties. The meaning is an OR of searches on a variety of properties. This can be used in place of SPARQL level UNIONs of individual text:querys. For example, instead of:\nselect ?foo where { { (?s ?sc ?lit) text:query ( rdfs:label \u0026quot;some query\u0026quot; ). } union { (?s ?sc ?lit) text:query ( skos:altLabel \u0026quot;some query\u0026quot; ). } union { (?s ?sc ?lit) text:query ( skos:prefLabel \u0026quot;some query\u0026quot; ). } } it can be more performant to push the unions into the Lucene query by rewriting as:\n(?s ?sc ?lit) text:query ( rdfs:label skos:prefLabel skos:altLabel \u0026quot;some query\u0026quot; ) which creates a Lucene query:\n(altLabel:\u0026quot;some query\u0026quot; OR prefLabel:\u0026quot;some query\u0026quot; OR label:\u0026quot;some query\u0026quot;) The query string syntax conforms to the underlying Lucene, or when appropriate, Elasticsearch.\nIn the case of the default one-triple equals one-document model, the Lucene query syntax is restricted to Terms, Term modifiers, Boolean Operators applied to Terms, and Grouping of terms.\nAdditionally, the use of Fields within the query string is supported when using the one-document equals one-entity text integration model.\nWhen using the default model, use of Fields in the query string will generally lead to unpredictable results.\nThe optional limit indicates the maximum hits to be returned by Lucene.\nThe lang:xx specification is an optional string, where xx is a BCP-47 language tag. This restricts searches to field values that were originally indexed with the tag xx. Searches may be restricted to field values with no language tag via \u0026quot;lang:none\u0026quot;.\nThe highlight:yy specification is an optional string where yy are options that control the highlighting of search result literals. 
See below for details.\nIf both limit and one or more of lang:xx or highlight:yy are present, then limit must precede these arguments.\nIf only the query string is required, the surrounding ( ) may be omitted.\nOutput arguments: Argument Definition subject URI The subject of the indexed RDF triple. score (optional) The score for the match. literal (optional) The matched object literal. graph URI (optional) The graph URI of the triple. property URI (optional) The property URI of the matched triple The results include the subject URI; the score assigned by the text search engine; and the entire matched literal (if the index has been configured to store literal values). The subject URI may be a variable, e.g., ?s, or a URI. In the latter case the search is restricted to triples with the specified subject. The score, literal, graph URI, and property URI must be variables. The property URI is meaningful when two or more properties are used in the query.\nQuery strings There are several points that need to be considered when formulating SPARQL queries using either of the Lucene integration models.\nAs mentioned above, in the case of the default model the query string syntax is restricted to Terms, Term modifiers, Boolean Operators applied to Terms, and Grouping of terms.\nExplicit use of Fields in the query string is only useful with the one-document equals one-entity model; and otherwise will generally produce unexpected results. See Queries across multiple Fields.\nSimple queries The simplest use of the jena-text Lucene integration is like:\n?s text:query \u0026quot;some phrase\u0026quot; This will bind ?s to each entity URI that is the subject of a triple that has the default property and an object literal that matches the argument string, e.g.:\nex:AnEntity skos:prefLabel \u0026quot;this is some phrase to match\u0026quot; This query form will indicate the subjects that have literals that match for the default property which is determined via the configuration of the text:predicate of the text:defaultField (in the above this has been assumed to be skos:prefLabel.\nFor a non-default property it is necessary to specify the property as an input argument to the text:query:\n?s text:query (rdfs:label \u0026quot;protégé\u0026quot;) (see below for how RDF property names are mapped to Lucene Field names).\nIf this use case is sufficient for your needs you can skip on to the sections on configuration.\nPlease note that the query:\n?s text:query \u0026quot;some phrase\u0026quot; when using the Lucene StandardAnalyzer or similar will treat the query string as an OR of terms: some and phrase. If a phrase search is required then it is necessary to surround the phrase by double quotes, \u0026quot;:\n?s text:query \u0026quot;\\\u0026quot;some phrase\\\u0026quot;\u0026quot; This will only match strings that contain \u0026quot;some phrase\u0026quot;, while the former query will match strings like: \u0026quot;there is a phrase for some\u0026quot; or \u0026quot;this is some of the various sorts of phrase that might be matched\u0026quot;.\nQueries with language tags When working with rdf:langStrings it is necessary that the text:langField has been configured. Then it is as simple as writing queries such as:\n?s text:query \u0026quot;protégé\u0026quot;@fr to return results where the given term or phrase has been indexed under French in the text:defaultField.\nIt is also possible to use the optional lang:xx argument, for example:\n?s text:query (\u0026quot;protégé\u0026quot; 'lang:fr') . 
In general, the presence of a language tag, xx, on the query string or lang:xx in the text:query adds AND lang:xx to the query sent to Lucene, so the above example becomes the following Lucene query:\n\u0026quot;label:protégé AND lang:fr\u0026quot; For non-default properties the general form is used:\n?s text:query (skos:altLabel \u0026quot;protégé\u0026quot; 'lang:fr') Note that an explicit language tag on the query string takes precedence over the lang:xx, so the following\n?s text:query (\u0026quot;protégé\u0026quot;@fr 'lang:none') will find French matches rather than matches indexed without a language tag.\nQueries that retrieve literals It is possible to retrieve the literals that Lucene finds matches for assuming that\n\u0026lt;#TextIndex#\u0026gt; text:storeValues true ; has been specified in the TextIndex configuration. So\n(?s ?sc ?lit) text:query (rdfs:label \u0026quot;protégé\u0026quot;) will bind the matching literals to ?lit, e.g.,\n\u0026quot;zorn protégé a prés\u0026quot;@fr Note it is necessary to include a variable to capture the Lucene score even if this value is not otherwise needed since the literal variable is determined by position.\nQueries with graphs Assuming that the text:graphField has been configured, then, when a triple is indexed, the graph that the triple resides in is included in the document and may be used to restrict searches or to retrieve the graph that a matching triple resides in.\nFor example:\nselect ?s ?lit where { graph ex:G2 { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; } . } will restrict searches to triples with the default property that reside in graph, ex:G2.\nOn the other hand:\nselect ?g ?s ?lit where { graph ?g { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; } . } will iterate over the graphs in the dataset, searching each in turn for matches.\nIf there is suitable structure to the graphs, e.g., a known rdf:type and depending on the selectivity of the text query and number of graphs, it may be more performant to express the query as follows:\nselect ?g ?s ?lit where { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; . graph ?g { ?s a ex:Item } . } Further, if tdb:unionDefaultGraph true for a TDB dataset backing a Lucene index then it is possible to retrieve the graphs that contain triples resulting from a Lucene search via the fourth output argument to text:query:\nselect ?g ?s ?lit where { (?s ?sc ?lit ?g) text:query \u0026quot;zorn\u0026quot; . } This will generally perform much better than either of the previous approaches when there are large numbers of graphs since the Lucene search will run once and the returned documents carry the containing graph URI for free as it were.\nQueries across multiple Fields As mentioned earlier, the Lucene text index uses the native Lucene query language.\nMultiple fields in the default integration model For the default integration model, since each document has only one field containing searchable text, searching for documents containing multiple fields will generally not find any results.\nNote that the default model provides three Lucene Fields in a document that are used during searching:\nthe field corresponding to the property of the indexed triple, the field for the language of the literal (if configured), and the graph that the triple is in (if configured). 
Given these, it should be clear from the above that the default model constructs a Lucene query from the property, query string, lang:xx, and SPARQL graph arguments.\nFor example, consider the following triples:\nex:SomePrinter rdfs:label \u0026quot;laser printer\u0026quot; ; ex:description \u0026quot;includes a large capacity cartridge\u0026quot; . assuming an appropriate configuration, if we try to retrieve ex:SomePrinter with the following Lucene query string:\n?s text:query \u0026quot;label:printer AND description:\\\u0026quot;large capacity cartridge\\\u0026quot;\u0026quot; then this query can not find the expected results since the AND is interpreted by Lucene to indicate that all documents that contain a matching label field and a matching description field are to be returned; yet, from the discussion above regarding the structure of Lucene documents in jena-text it is evident that there is not one but rather in fact two separate documents one with a label field and one with a description field so an effective SPARQL query is:\n?s text:query (rdfs:label \u0026quot;printer\u0026quot;) . ?s text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . which leads to ?s being bound to ex:SomePrinter.\nIn other words when a query is to involve two or more properties of a given entity then it is expressed at the SPARQL level, as it were, versus in Lucene\u0026rsquo;s query language.\nIt is worth noting that the equivalent of a Lucene OR of Fields can be expressed using SPARQL union, though since 3.13.0 this can be expressed in Jena text using a property list - see Input arguments:\n{ ?s text:query (rdfs:label \u0026quot;printer\u0026quot;) . } union { ?s text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . } Suppose the matching literals are required for the above then it should be clear from the above that:\n(?s ?sc1 ?lit1) text:query (skos:prefLabel \u0026quot;printer\u0026quot;) . (?s ?sc2 ?lit2) text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . will be the appropriate form to retrieve the subject and the associated literals, ?lit1 and ?lit2. (Obviously, in general, the score variables, ?sc1 and ?sc2 must be distinct since it is very unlikely that the scores of the two Lucene queries will ever match).\nThere is no loss of expressiveness of the Lucene query language versus the jena-text integration of Lucene. Any cross-field ANDs are replaced by concurrent SPARQL calls to text:query as illustrated above and uses of Lucene OR can be converted to SPARQL unions. Uses of Lucene NOT are converted to appropriate SPARQL filters.\nMultiple fields in the one-document equals one-entity model If Lucene documents have been indexed with multiple searchable fields then compound queries expressed directly in the Lucene query language can significantly improve search performance, in particular, where the individual components of the Lucene query generate a lot of results which must be combined in SPARQL.\nIt is possible to have text queries that search multiple fields within a text query. Doing this is more complex as it requires the use of either an externally managed text index or code must be provided to build the multi-field text documents to be indexed. See Multiple fields per document.\nQueries with Boolean Operators and Term Modifiers On the other hand the various features of the Lucene query language are all available to be used for searches within a Field. 
For example, Boolean Operators on Terms:\n?s text:query (ex:description \u0026quot;(large AND cartridge)\u0026quot;) and\n(?s ?sc ?lit) text:query (ex:description \u0026quot;(includes AND (large OR capacity))\u0026quot;) or fuzzy searches:\n?s text:query (ex:description \u0026quot;include~\u0026quot;) and so on will work as expected.\nAlways surround the query string with ( ) if more than a single term or phrase are involved.\nHighlighting The highlighting option uses the Lucene Highlighter and SimpleHTMLFormatter to insert highlighting markup into the literals returned from search results (hence the text dataset must be configured to store the literals). The highlighted results are returned via the literal output argument. This highlighting feature, introduced in version 3.7.0, does not require re-indexing by Lucene.\nThe simplest way to request highlighting is via 'highlight:'. This will apply all the defaults:\nOption Key Default maxFrags m: 3 fragSize z: 128 start s: RIGHT_ARROW end e: LEFT_ARROW fragSep f: DIVIDES joinHi jh: true joinFrags jf: true to the highlighting of the search results. For example if the query is:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:\u0026quot; ) then a resulting literal binding might be:\n\u0026quot;the quick ↦brown fox↤ jumped over the lazy baboon\u0026quot; The RIGHT_ARROW is Unicode \\u21a6 and the LEFT_ARROW is Unicode \\u21a4. These are chosen to be single characters that in most situations will be very unlikely to occur in resulting literals. The fragSize of 128 is chosen to be large enough that in many situations the matches will result in single fragments. If the literal is larger than 128 characters and there are several matches in the literal then there may be additional fragments separated by the DIVIDES, Unicode \\u2223.\nDepending on the analyzer used and the tokenizer, the highlighting will result in marking each token rather than an entire phrase. The joinHi option is by default true so that entire phrases are highlighted together rather than as individual tokens as in:\n\u0026quot;the quick ↦brown↤ ↦fox↤ jumped over the lazy baboon\u0026quot; which would result from:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:jh:n\u0026quot; ) The jh and jf boolean options are set false via n. Any other value is true. The defaults for these options have been selected to be reasonable for most applications.\nThe joining is performed post highlighting via Java String replaceAll rather than using the Lucene Unified Highlighter facility which requires that term vectors and positions be stored. The joining deletes extra highlighting with only intervening Unicode separators, \\p{Z}.\nThe more conventional output of the Lucene SimpleHTMLFormatter with html emphasis markup is achieved via, \u0026quot;highlight:s:\u0026lt;em class='hiLite'\u0026gt; | e:\u0026lt;/em\u0026gt;\u0026quot; (highlight options are separated by a Unicode vertical line, \\u007c. The spaces are not necessary). 
The result with the above example will be:\n\u0026quot;the quick \u0026lt;em class='hiLite'\u0026gt;brown fox\u0026lt;/em\u0026gt; jumped over the lazy baboon\u0026quot; which would result from the query:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:s:\u0026lt;em class='hiLite'\u0026gt; | e:\u0026lt;/em\u0026gt;\u0026quot; ) Good practice From the above it should be clear that best practice, except in the simplest cases, is to use explicit text:query forms such as:\n(?s ?sc ?lit) text:query (ex:someProperty \u0026quot;a single Field query\u0026quot;) possibly with limit and lang:xx arguments.\nFurther, the query engine does not have information about the selectivity of the text index and so effective query plans cannot be determined programmatically. It is helpful to be aware of the following two general query patterns.\nQuery pattern 1 – Find in the text index and refine results Access to the text index comes first in the query and is used to find a number of items of interest; further information is obtained about these items from the RDF data.\nSELECT ?s { ?s text:query (rdfs:label 'word' 10) ; rdfs:label ?label ; rdf:type ?type } The text:query limit argument is useful when working with large indexes to limit results to the higher scoring results – results are returned in the order of scoring by the text search engine.\nQuery pattern 2 – Filter results via the text index By finding items of interest first in the RDF data, the text search can be used to restrict the items found still further.\nSELECT ?s { ?s rdf:type :book ; dc:creator \u0026quot;John\u0026quot; . ?s text:query (dc:title 'word') ; } Configuration The usual way to describe a text index is with a Jena assembler description. Configurations can also be built with code. The assembler describes a \u0026rsquo;text dataset\u0026rsquo; which has an underlying RDF dataset and a text index. The text index describes the text index technology (Lucene or Elasticsearch) and the details needed for each.\nA text index has an \u0026ldquo;entity map\u0026rdquo; which defines the properties to index, the name of the Lucene/Elasticsearch field, and the field used for storing the URI itself.\nFor simple RDF use, there will be one field, mapping a property to a text index field. More complex setups, with multiple properties per entity (URI), are possible.\nThe assembler file can be either the default configuration file (\u0026hellip;/run/config.ttl) or a custom file in the \u0026hellip;/run/configuration folder.
Note that you can use several files simultaneously.\nYou have to edit the file (see comments in the assembler code below):\nprovide values for paths and a fixed URI for tdb:DatasetTDB modify the entity map : add the fields you want to index and desired options (filters, tokenizers\u0026hellip;) If your assembler file is run/config.ttl, you can index the dataset with this command :\njava -cp ./fuseki-server.jar jena.textindexer --desc=run/config.ttl Once configured, any data added to the text dataset is automatically indexed as well: Building a Text Index.\nText Dataset Assembler The following is an example of an assembler file defining a TDB dataset with a Lucene text index.\n######## Example of a TDB dataset and text index######################### # The main doc sources are: # - https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html # - https://jena.apache.org/documentation/assembler/assembler-howto.html # - https://jena.apache.org/documentation/assembler/assembler.ttl # See https://jena.apache.org/documentation/fuseki2/fuseki-layout.html for the destination of this file. ######################################################################### PREFIX : \u0026lt;http://localhost/jena_example/#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX text: \u0026lt;http://jena.apache.org/text#\u0026gt; PREFIX skos: \u0026lt;http://www.w3.org/2004/02/skos/core#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; [] rdf:type fuseki:Server ; fuseki:services ( :myservice ) . :myservice rdf:type fuseki:Service ; # e.g : `s-query --service=http://localhost:3030/myds \u0026quot;select * ...\u0026quot;` fuseki:name \u0026quot;myds\u0026quot; ; # SPARQL query service : /myds fuseki:endpoint [ fuseki:operation fuseki:query ; ]; # SPARQL query service : /myds/query fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; # SPARQL update service : /myds/update fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ]; # SPARQL Graph store protocol (read and write) : /myds/data fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; fuseki:name \u0026quot;data\u0026quot; ]; # The text-enabled dataset fuseki:dataset :text_dataset ; . ## --------------------------------------------------------------- # A TextDataset is a regular dataset with a text index. :text_dataset rdf:type text:TextDataset ; text:dataset :mydataset ; # \u0026lt;-- replace `:my_dataset` with the desired URI text:index \u0026lt;#indexLucene\u0026gt; ; . # A TDB dataset used for RDF storage :mydataset rdf:type tdb:DatasetTDB ; # \u0026lt;-- replace `:my_dataset` with the desired URI - as above tdb:location \u0026quot;DB\u0026quot; ; tdb:unionDefaultGraph true ; # Optional . # Text index description \u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:path\u0026gt; ; # \u0026lt;-- replace `\u0026lt;file:path\u0026gt;` with your path (e.g., `\u0026lt;file:/.../fuseki/run/databases/MY_INDEX\u0026gt;`) text:entityMap \u0026lt;#entMap\u0026gt; ; text:storeValues true ; text:analyzer [ a text:StandardAnalyzer ] ; text:queryAnalyzer [ a text:KeywordAnalyzer ] ; text:queryParser text:AnalyzingQueryParser ; text:propLists ( [ . . . ] . . . ) ; text:defineAnalyzers ( [ . . . ] . . . ) ; text:multilingualSupport true ; # optional . 
# Entity map (see documentation for other options) \u0026lt;#entMap\u0026gt; a text:EntityMap ; text:defaultField \u0026quot;label\u0026quot; ; text:entityField \u0026quot;uri\u0026quot; ; text:uidField \u0026quot;uid\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:map ( [ text:field \u0026quot;label\u0026quot; ; text:predicate skos:prefLabel ] ) . See below for more on defining an entity map.\nThe text:TextDataset has two properties:\na text:dataset, e.g., a tdb:DatasetTDB, to contain the RDF triples; and\nan index configured to use either text:TextIndexLucene or text:TextIndexES.\nThe \u0026lt;#indexLucene\u0026gt; instance of text:TextIndexLucene, above, has two required properties:\nthe text:directory file URI which specifies the directory that will contain the Lucene index files – if this has the value \u0026quot;mem\u0026quot; then the index resides in memory;\nthe text:entityMap, \u0026lt;#entMap\u0026gt;, that will define what properties are to be indexed and other features of the index;\nand several optional properties:\ntext:storeValues controls the storing of literal values. It indicates whether values are stored or not – values must be stored for the ?literal return value to be available in text:query in SPARQL.\ntext:analyzer specifies the default analyzer configuration to be used during indexing and querying. The default is Lucene\u0026rsquo;s StandardAnalyzer.\ntext:queryAnalyzer specifies an optional analyzer for query that will be used to analyze the query string. If not set, the analyzer used to index a given field is used.\ntext:queryParser is optional and specifies an alternative query parser.\ntext:propLists is optional and allows lists of indexed properties to be specified for use in text:query.\ntext:defineAnalyzers is optional and allows specification of additional analyzers, tokenizers and filters.\ntext:multilingualSupport enables Multilingual Support.\nIf using Elasticsearch then an index would be configured as follows:\n\u0026lt;#indexES\u0026gt; a text:TextIndexES ; # A comma-separated list of Host:Port values of the ElasticSearch Cluster nodes. text:serverList \u0026quot;127.0.0.1:9300\u0026quot; ; # Name of the ElasticSearch Cluster. If not specified defaults to 'elasticsearch' text:clusterName \u0026quot;elasticsearch\u0026quot; ; # The number of shards for the index. Defaults to 1 text:shards \u0026quot;1\u0026quot; ; # The number of replicas for the index. Defaults to 1 text:replicas \u0026quot;1\u0026quot; ; # Name of the Index. defaults to jena-text text:indexName \u0026quot;jena-text\u0026quot; ; text:entityMap \u0026lt;#entMap\u0026gt; ; . and text:index \u0026lt;#indexES\u0026gt; ; would be used in the configuration of :text_dataset.\nTo use a text index assembler configuration in Java code it is necessary to identify the dataset URI to be assembled, such as in:\nDataset ds = DatasetFactory.assemble( \u0026quot;text-config.ttl\u0026quot;, \u0026quot;http://localhost/jena_example/#text_dataset\u0026quot;) ; since the assembler contains two dataset definitions, one for the text dataset, one for the base data. Therefore, the application needs to identify the text dataset by its URI http://localhost/jena_example/#text_dataset.\nLists of Indexed Properties Since 3.13.0, an optional text:TextIndexLucene feature, text:propLists, allows lists of Lucene indexed properties to be defined for use in text:querys.
For example:\ntext:propLists ( [ text:propListProp ex:labels ; text:props ( skos:prefLabel skos:altLabel rdfs:label ) ; ] [ text:propListProp ex:workStmts ; text:props ( ex:workColophon ex:workAuthorshipStatement ex:workEditionStatement ) ; ] ) ; The text:propLists is a list of property list definitions. Each property list defines a new property, text:propListProp, that will be used to refer to the list in a text:query, for example, ex:labels and ex:workStmts, above. The text:props is a list of Lucene indexed properties that will be searched over when the property list property is referred to in a text:query. For example:\n?s text:query ( ex:labels \u0026quot;some text\u0026quot; ) . will request Lucene to search for documents representing triples, ?s ?p ?o, where ?p is one of: rdfs:label OR skos:prefLabel OR skos:altLabel, matching the query string.\nEntity Map definition A text:EntityMap has several properties that condition what is indexed, what information is stored, and what analyzers are used.\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:defaultField \u0026quot;label\u0026quot; ; text:entityField \u0026quot;uri\u0026quot; ; text:uidField \u0026quot;uid\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:map ( [ text:field \u0026quot;label\u0026quot; ; text:predicate rdfs:label ] ) . Default text field The text:defaultField specifies the default field name that Lucene will use in a query that does not otherwise specify a field. For example,\n?s text:query \u0026quot;\\\u0026quot;bread and butter\\\u0026quot;\u0026quot; will perform a search in the label field for the phrase \u0026quot;bread and butter\u0026quot;.\nEntity field The text:entityField specifies the field name of the field that will contain the subject URI that is returned on a match. The value of the property is arbitrary so long as it is unique among the defined names.\nUID Field and automatic document deletion When the text:uidField is defined in the EntityMap then dropping a triple will result in the corresponding document, if any, being deleted from the text index. The value, \u0026quot;uid\u0026quot;, is arbitrary and defines the name of a stored field in Lucene that holds a unique ID that represents the triple.\nIf you configure the index via Java code, you need to set this parameter to the EntityDefinition instance, e.g.\nEntityDefinition docDef = new EntityDefinition(entityField, defaultField); docDef.setUidField(\u0026quot;uid\u0026quot;); Note: If you migrate from an index without deletion support to an index with automatic deletion, you will need to rebuild the index to ensure that the uid information is stored.\nLanguage Field The text:langField is the name of the field that will store the language attribute of the literal in the case of an rdf:langString. This Entity Map property is a key element of the Linguistic support with Lucene index.\nGraph Field Setting the text:graphField allows graph-specific indexing of the text index to limit searching to a specified graph when a SPARQL query targets a single named graph. The field value is arbitrary and serves to store the graph ID that a triple belongs to when the index is updated.\nThe Analyzer Map The text:map is a list of analyzer specifications as described below.\nConfiguring an Analyzer Text to be indexed is passed through a text analyzer that divides it into tokens and may perform other transformations such as eliminating stop words.
If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.\nAs of Jena 4.7.x / Lucene 9.x onwards, the StandardAnalyzer does not default to having English stopwords if no stop words are provided. The setting up until Apache Lucene 8 had the stopwords:\n\"a\" \"an\" \"and\" \"are\" \"as\" \"at\" \"be\" \"but\" \"by\" \"for\" \"if\" \"in\" \"into\" \"is\" \"it\" \"no\" \"not\" \"of\" \"on\" \"or\" \"such\" \"that\" \"the\" \"their\" \"then\" \"there\" \"these\" \"they\" \"this\" \"to\" \"was\" \"will\" \"with\" In case of a TextIndexLucene the default analyzer can be replaced by another analyzer with the text:analyzer property on the text:TextIndexLucene resource in the text dataset assembler, for example with a SimpleAnalyzer:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:analyzer [ a text:SimpleAnalyzer ] . It is possible to configure an alternative analyzer for each field indexed in a Lucene index. For example:\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label ; text:analyzer [ a text:StandardAnalyzer ; text:stopWords (\u0026quot;a\u0026quot; \u0026quot;an\u0026quot; \u0026quot;and\u0026quot; \u0026quot;but\u0026quot;) ] ] ) . will configure the index to analyze values of the \u0026rsquo;text\u0026rsquo; field using a StandardAnalyzer with the given list of stop words.\nOther analyzer types that may be specified are SimpleAnalyzer and KeywordAnalyzer, neither of which has any configuration parameters. See the Lucene documentation for details of what these analyzers do. Jena also provides LowerCaseKeywordAnalyzer, which is a case-insensitive version of KeywordAnalyzer, and ConfigurableAnalyzer.\nSupport for the new LocalizedAnalyzer has been introduced in Jena 3.0.0 to deal with Lucene language specific analyzers. See Linguistic Support with Lucene Index for details.\nSupport for GenericAnalyzers has been introduced in Jena 3.4.0 to allow the use of Analyzers that do not have built-in support, e.g., BrazilianAnalyzer; require constructor parameters not otherwise supported, e.g., a stop words FileReader or a stemExclusionSet; and finally use of Analyzers not included in the bundled Lucene distribution, e.g., a SanskritIASTAnalyzer. See Generic and Defined Analyzer Support\nConfigurableAnalyzer ConfigurableAnalyzer was introduced in Jena 3.0.1. It allows more detailed configuration of text analysis parameters by independently selecting a Tokenizer and zero or more TokenFilters which are applied in order after tokenization. See the Lucene documentation for details on what each tokenizer and token filter does.\nThe available Tokenizer implementations are:\nStandardTokenizer KeywordTokenizer WhitespaceTokenizer LetterTokenizer The available TokenFilter implementations are:\nStandardFilter LowerCaseFilter ASCIIFoldingFilter SelectiveFoldingFilter Configuration is done using Jena assembler like this:\ntext:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer text:KeywordTokenizer ; text:filters (text:ASCIIFoldingFilter, text:LowerCaseFilter) ] From Jena 3.7.0, it is possible to define tokenizers and filters in addition to the built-in choices above that may be used with the ConfigurableAnalyzer. 
Tokenizers and filters are defined via text:defineAnalyzers in the text:TextIndexLucene assembler section using text:GenericTokenizer and text:GenericFilter.\nAnalyzer for Query New in Jena 2.13.0.\nThere is an ability to specify an analyzer to be used for the query string itself. It will find terms in the query text. If not set, then the analyzer used for the document will be used. The query analyzer is specified on the TextIndexLucene resource:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:queryAnalyzer [ a text:KeywordAnalyzer ] . Alternative Query Parsers New in Jena 3.1.0.\nIt is possible to select a query parser other than the default QueryParser.\nThe available QueryParser implementations are:\nAnalyzingQueryParser: Performs analysis for wildcard queries . This is useful in combination with accent-insensitive wildcard queries.\nComplexPhraseQueryParser: Permits complex phrase query syntax. Eg: \u0026ldquo;(john jon jonathan~) peters*\u0026rdquo;. This is useful for performing wildcard or fuzzy queries on individual terms in a phrase.\nSurroundQueryParser: Provides positional operators (w and n) that accept a numeric distance, as well as boolean operators (and, or, and not, wildcards (* and ?), quoting (with \u0026ldquo;), and boosting (via ^).\nThe query parser is specified on the TextIndexLucene resource:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:queryParser text:AnalyzingQueryParser . Elasticsearch currently doesn\u0026rsquo;t support Analyzers beyond Standard Analyzer.\nConfiguration by Code A text dataset can also be constructed in code as might be done for a purely in-memory setup:\n// Example of building a text dataset with code. // Example is in-memory. // Base dataset Dataset ds1 = DatasetFactory.createMem() ; EntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;text\u0026quot;, RDFS.label) ; // Lucene, in memory. Directory dir = new RAMDirectory(); // Join together into a dataset Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ; Graph-specific Indexing jena-text supports storing information about the source graph into the text index. This allows for more efficient text queries when the query targets only a single named graph. Without graph-specific indexing, text queries do not distinguish named graphs and will always return results from all graphs.\nSupport for graph-specific indexing is enabled by defining the name of the index field to use for storing the graph identifier.\nIf you use an assembler configuration, set the graph field using the text:graphField property on the EntityMap, e.g.\n# Mapping in the index # URI stored in field \u0026quot;uri\u0026quot; # Graph stored in field \u0026quot;graph\u0026quot; # rdfs:label is mapped to field \u0026quot;text\u0026quot; \u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label ] ) . 
If you configure the index in Java code, you need to use one of the EntityDefinition constructors that support the graphField parameter, e.g.\nEntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;text\u0026quot;, \u0026quot;graph\u0026quot;, RDFS.label.asNode()) ; Note: If you migrate from a global (non-graph-aware) index to a graph-aware index, you need to rebuild the index to ensure that the graph information is stored.\nLinguistic support with Lucene index Language tags associated with rdfs:langStrings occurring as literals in triples may be used to enhance indexing and queries. Sub-sections below detail different settings with the index, and use cases with SPARQL queries.\nExplicit Language Field in the Index The language tag for object literals of triples can be stored (during triple insert/update) into the index to extend query capabilities. For that, the text:langField property must be set in the EntityMap assembler :\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; . If you configure the index via Java code, you need to set this parameter to the EntityDefinition instance, e.g.\nEntityDefinition docDef = new EntityDefinition(entityField, defaultField); docDef.setLangField(\u0026quot;lang\u0026quot;); Note that configuring the text:langField does not determine a language specific analyzer. It merely records the tag associated with an indexed rdfs:langString.\nSPARQL Linguistic Clause Forms Once the langField is set, you can use it directly inside SPARQL queries. For that the lang:xx argument allows you to target specific localized values. For example:\n//target english literals ?s text:query (rdfs:label 'word' 'lang:en' ) //target unlocalized literals ?s text:query (rdfs:label 'word' 'lang:none') //ignore language field ?s text:query (rdfs:label 'word') Refer above for further discussion on querying.\nLocalizedAnalyzer You can specify a LocalizedAnalyzer in order to benefit from Lucene language specific analyzers (stemming, stop words,\u0026hellip;). Like any other analyzers, it can be done for default text indexing, for each different field or for query.\nUsing an assembler configuration, the text:language property needs to be provided, e.g :\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:analyzer [ a text:LocalizedAnalyzer ; text:language \u0026quot;fr\u0026quot; ] . will configure the index to analyze values of the default property field using a FrenchAnalyzer.\nTo configure the same example via Java code, you need to provide the analyzer to the index configuration object:\nTextIndexConfig config = new TextIndexConfig(def); Analyzer analyzer = Util.getLocalizedAnalyzer(\u0026quot;fr\u0026quot;); config.setAnalyzer(analyzer); Dataset ds = TextDatasetFactory.createLucene(ds1, dir, config) ; Where def, ds1 and dir are instances of EntityDefinition, Dataset and Directory classes.\nNote: You do not have to set the text:langField property with a single localized analyzer. Also note that the above configuration will use the FrenchAnalyzer for all strings indexed under the default property regardless of the language tag associated with the literal (if any).\nMultilingual Support Let us suppose that we have many triples with many localized literals in many different languages. 
It is possible to take all these languages into account for future mixed localized queries. Configure the text:multilingualSupport property to enable indexing and search via localized analyzers based on the language tag:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026quot;mem\u0026quot; ; text:multilingualSupport true; . Via Java code, set the multilingual support flag :\nTextIndexConfig config = new TextIndexConfig(def); config.setMultilingualSupport(true); Dataset ds = TextDatasetFactory.createLucene(ds1, dir, config) ; This multilingual index combines dynamically all localized analyzers of existing languages and the storage of langField properties.\nThe multilingual analyzer becomes the default analyzer and the Lucene StandardAnalyzer is the default analyzer used when there is no language tag.\nIt is straightforward to refer to different languages in the same text search query:\nSELECT ?s WHERE { { ?s text:query ( rdfs:label 'institut' 'lang:fr' ) } UNION { ?s text:query ( rdfs:label 'institute' 'lang:en' ) } } Hence, the result set of the query will contain \u0026ldquo;institute\u0026rdquo; related subjects (institution, institutional,\u0026hellip;) in French and in English.\nNote When multilingual indexing is enabled for a property, e.g., rdfs:label, there will actually be two copies of each literal indexed. One under the Field name, \u0026ldquo;label\u0026rdquo;, and one under the name \u0026ldquo;label_xx\u0026rdquo;, where \u0026ldquo;xx\u0026rdquo; is the language tag.\nGeneric and Defined Analyzer Support There are many Analyzers that do not have built-in support, e.g., BrazilianAnalyzer; require constructor parameters not otherwise supported, e.g., a stop words FileReader or a stemExclusionSet; or make use of Analyzers not included in the bundled Lucene distribution, e.g., a SanskritIASTAnalyzer. Two features have been added to enhance the utility of jena-text: 1) text:GenericAnalyzer; and 2) text:DefinedAnalyzer. Further, since Jena 3.7.0, features to allow definition of tokenizers and filters are included.\nGeneric Analyzers, Tokenizers and Filters A text:GenericAnalyzer includes a text:class which is the fully qualified class name of an Analyzer that is accessible on the jena classpath. This is trivial for Analyzer classes that are included in the bundled Lucene distribution and for other custom Analyzers a simple matter of including a jar containing the custom Analyzer and any associated Tokenizer and Filters on the classpath.\nSimilarly, text:GenericTokenizer and text:GenericFilter allow to access any tokenizers or filters that are available on the Jena classpath. These two types are used only to define tokenizer and filter configurations that may be referred to when specifying a ConfigurableAnalyzer.\nIn addition to the text:class it is generally useful to include an ordered list of text:params that will be used to select an appropriate constructor of the Analyzer class. If there are no text:params in the analyzer specification or if the text:params is an empty list then the nullary constructor is used to instantiate the analyzer. 
Each element of the list of text:params includes:\nan optional text:paramName of type Literal that is useful to identify the purpose of a parameter in the assembler configuration a text:paramType which is one of: Type Description text:TypeAnalyzer a subclass of org.apache.lucene.analysis.Analyzer text:TypeBoolean a java boolean text:TypeFile the String path to a file materialized as a java.io.FileReader text:TypeInt a java int text:TypeString a java String text:TypeSet an org.apache.lucene.analysis.CharArraySet and is required for the types text:TypeAnalyzer, text:TypeFile and text:TypeSet, but, since Jena 3.7.0, may be implied by the form of the literal for the types: text:TypeBoolean, text:TypeInt and text:TypeString.\na required text:paramValue with an object of the type corresponding to text:paramType In the case of an analyzer parameter the text:paramValue is any text:analyzer resource as describe throughout this document.\nAn example of the use of text:GenericAnalyzer to configure an EnglishAnalyzer with stop words and stem exclusions is:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;org.apache.lucene.analysis.en.EnglishAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;the\u0026quot; \u0026quot;a\u0026quot; \u0026quot;an\u0026quot;) ] [ text:paramName \u0026quot;stemExclusionSet\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;ing\u0026quot; \u0026quot;ed\u0026quot;) ] ) ] . Here is an example of defining an instance of ShingleAnalyzerWrapper:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper\u0026quot; ; text:params ( [ text:paramName \u0026quot;defaultAnalyzer\u0026quot; ; text:paramType text:TypeAnalyzer ; text:paramValue [ a text:SimpleAnalyzer ] ] [ text:paramName \u0026quot;maxShingleSize\u0026quot; ; text:paramType text:TypeInt ; text:paramValue 3 ] ) ] . If there is need of using an analyzer with constructor parameter types not included here then one approach is to define an AnalyzerWrapper that uses available parameter types, such as file, to collect the information needed to instantiate the desired analyzer. An example of such an analyzer is the Kuromoji morphological analyzer for Japanese text that uses constructor parameters of types: UserDictionary, JapaneseTokenizer.Mode, CharArraySet and Set\u0026lt;String\u0026gt;.\nAs mentioned above, the simple types: TypeInt, TypeBoolean, and TypeString may be written without explicitly including text:paramType in the parameter specification. For example:\n[ text:paramName \u0026quot;maxShingleSize\u0026quot; ; text:paramValue 3 ] is sufficient to specify the parameter.\nDefined Analyzers The text:defineAnalyzers feature allows to extend the Multilingual Support defined above. 
Further, this feature can also be used to name analyzers defined via text:GenericAnalyzer so that a single (perhaps complex) analyzer configuration can be used in several places.\nFurther, since Jena 3.7.0, this feature is also used to name tokenizers and filters that can be referred to in the specification of a ConfigurableAnalyzer.\nThe text:defineAnalyzers is used with text:TextIndexLucene to provide a list of analyzer definitions:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:defineAnalyzers ( [ text:addLang \u0026quot;sa-x-iast\u0026quot; ; text:analyzer [ . . . ] ] [ text:defineAnalyzer \u0026lt;#foo\u0026gt; ; text:analyzer [ . . . ] ] ) . References to a defined analyzer may be made in the entity map like:\ntext:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer \u0026lt;#foo\u0026gt; ] Since Jena 3.7.0, a ConfigurableAnalyzer specification can refer to any defined tokenizer and filters, as in:\ntext:defineAnalyzers ( [ text:defineAnalyzer :configuredAnalyzer ; text:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer :ngram ; text:filters ( :asciiff text:LowerCaseFilter ) ] ] [ text:defineTokenizer :ngram ; text:tokenizer [ a text:GenericTokenizer ; text:class \u0026quot;org.apache.lucene.analysis.ngram.NGramTokenizer\u0026quot; ; text:params ( [ text:paramName \u0026quot;minGram\u0026quot; ; text:paramValue 3 ] [ text:paramName \u0026quot;maxGram\u0026quot; ; text:paramValue 7 ] ) ] ] [ text:defineFilter :asciiff ; text:filter [ a text:GenericFilter ; text:class \u0026quot;org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter\u0026quot; ; text:params ( [ text:paramName \u0026quot;preserveOriginal\u0026quot; ; text:paramValue true ] ) ] ] ) ; From 3.8.0 onwards, users are able to use the JenaText custom filter SelectiveFoldingFilter. This filter is not part of Apache Lucene, but rather a custom implementation available for JenaText users.\nIt is based on Apache Lucene\u0026rsquo;s ASCIIFoldingFilter, but with the addition of a white-list for characters that must not be replaced. This is especially useful for languages where some special characters and diacritical marks are useful when searching.\nHere\u0026rsquo;s an example:\ntext:defineAnalyzers ( [ text:defineAnalyzer :configuredAnalyzer ; text:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer :tokenizer ; text:filters ( :selectiveFoldingFilter text:LowerCaseFilter ) ] ] [ text:defineTokenizer :tokenizer ; text:tokenizer [ a text:GenericTokenizer ; text:class \u0026quot;org.apache.lucene.analysis.core.LowerCaseTokenizer\u0026quot; ] ] [ text:defineFilter :selectiveFoldingFilter ; text:filter [ a text:GenericFilter ; text:class \u0026quot;org.apache.jena.query.text.filter.SelectiveFoldingFilter\u0026quot; ; text:params ( [ text:paramName \u0026quot;whitelisted\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;ç\u0026quot; \u0026quot;ä\u0026quot;) ] ) ] ] ) ; Extending multilingual support The Multilingual Support described above allows for a limited set of ISO 2-letter codes to be used to select from among built-in analyzers using the nullary constructor associated with each analyzer.
So if one wants to use:\na language not included, e.g., Brazilian; or use additional constructors defining stop words, stem exclusions and so on; or refer to custom analyzers that might be associated with generalized BCP-47 language tags, such as sa-x-iast for Sanskrit in the IAST transliteration, then text:defineAnalyzers with text:addLang will add the desired analyzers to the multilingual support so that fields with the appropriate language tags will use the appropriate custom analyzer.\nWhen text:defineAnalyzers is used with text:addLang then text:multilingualSupport is implicitly added if not already specified and a warning is put in the log:\ntext:defineAnalyzers ( [ text:addLang \u0026quot;sa-x-iast\u0026quot; ; text:analyzer [ . . . ] ] ) this adds an analyzer to be used when the text:langField has the value sa-x-iast during indexing and search.\nMultilingual enhancements for multi-encoding searches There are two multilingual search situations that are supported as of 3.8.0:\nSearch in one encoding and retrieve results that may have been entered in other encodings. For example, searching via Simplified Chinese (Hans) and retrieving results that may have been entered in Traditional Chinese (Hant) or Pinyin. This will simplify applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It\u0026rsquo;s all done under the covers in Lucene. Search with queries entered in a lossy, e.g., phonetic, encoding and retrieve results entered with accurate encoding. For example, searching via Pinyin without diacritics and retrieving all possible Hans and Hant triples. The first situation arises when entering triples that include languages with multiple encodings that for various reasons are not normalized to a single encoding. In this situation it is helpful to be able to retrieve appropriate result sets without regard for the encodings used at the time that the triples were inserted into the dataset.\nThere are several such languages of interest: Chinese, Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and ideographic variants.\nEncodings may not be normalized when inserting triples for a variety of reasons. A principal one is that the rdf:langString object often must be entered in the same encoding in which it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desirable to preserve the original form.\nThe second situation arises to provide simple support for phonetic or other forms of lossy search at the time that triples are indexed directly in the Lucene system.\nTo handle the first situation a text assembler predicate, text:searchFor, is introduced that specifies a list of language variants that should be searched whenever a query string of a given encoding (language tag) is used.
For example, the following text:defineAnalyzers fragment :\n[ text:addLang \u0026quot;bo\u0026quot; ; text:searchFor ( \u0026quot;bo\u0026quot; \u0026quot;bo-x-ewts\u0026quot; \u0026quot;bo-alalc97\u0026quot; ) ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.bo.TibetanAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;segmentInWords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;lemmatize\u0026quot; ; text:paramValue true ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;inputMode\u0026quot; ; text:paramValue \u0026quot;unicode\u0026quot; ] [ text:paramName \u0026quot;stopFilename\u0026quot; ; text:paramValue \u0026quot;\u0026quot; ] ) ] ; ] indicates that when using a search string such as \u0026ldquo;རྡོ་རྗེ་སྙིང་\u0026quot;@bo the Lucene index should also be searched for matches tagged as bo-x-ewts and bo-alalc97.\nThis is made possible by a Tibetan Analyzer that tokenizes strings in all three encodings into Tibetan Unicode. This is feasible since the bo-x-ewts and bo-alalc97 encodings are one-to-one with Unicode Tibetan. Since all fields with these language tags will have a common set of indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query analyzer to have access to the language tag for the query string along with the various fields that need to be considered.\nSupposing that the query is:\n(?s ?sc ?lit) text:query (\u0026quot;rje\u0026quot;@bo-x-ewts) Then the query formed in TextIndexLucene will be:\nlabel_bo:rje label_bo-x-ewts:rje label_bo-alalc97:rje which is translated using a suitable Analyzer, QueryMultilingualAnalyzer, via Lucene\u0026rsquo;s QueryParser to:\n+(label_bo:རྗེ label_bo-x-ewts:རྗེ label_bo-alalc97:རྗེ) which reflects the underlying Tibetan Unicode term encoding. During IndexSearcher.search all documents with one of the three fields in the index for term, \u0026ldquo;རྗེ\u0026rdquo;, will be returned even though the value in the fields label_bo-x-ewts and label_bo-alalc97 for the returned documents will be the original value \u0026ldquo;rje\u0026rdquo;.\nThis support simplifies applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It\u0026rsquo;s all done under the covers in Lucene.\nSolving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the text:defineAnalyzers. 
For example, the following fragment:\n[ text:defineAnalyzer :hanzAnalyzer ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;TC2SC\u0026quot; ] [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue 0 ] ) ] ; ] [ text:defineAnalyzer :han2pinyin ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;TC2PYstrict\u0026quot; ] [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue 0 ] ) ] ; ] [ text:defineAnalyzer :pinyin ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;PYstrict\u0026quot; ] ) ] ; ] [ text:addLang \u0026quot;zh-hans\u0026quot; ; text:searchFor ( \u0026quot;zh-hans\u0026quot; \u0026quot;zh-hant\u0026quot; ) ; text:auxIndex ( \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :hanzAnalyzer ] ; ] [ text:addLang \u0026quot;zh-hant\u0026quot; ; text:searchFor ( \u0026quot;zh-hans\u0026quot; \u0026quot;zh-hant\u0026quot; ) ; text:auxIndex ( \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :hanzAnalyzer ] ; ] [ text:addLang \u0026quot;zh-latn-pinyin\u0026quot; ; text:searchFor ( \u0026quot;zh-latn-pinyin\u0026quot; \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :pinyin ] ; ] [ text:addLang \u0026quot;zh-aux-han2pinyin\u0026quot; ; text:searchFor ( \u0026quot;zh-latn-pinyin\u0026quot; \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :pinyin ] ; text:indexAnalyzer :han2pinyin ; ] defines language tags for Traditional, Simplified, Pinyin and an auxiliary tag zh-aux-han2pinyin associated with an Analyzer, :han2pinyin. The purpose of the auxiliary tag is to define an Analyzer that will be used during indexing and to specify a list of tags that should be searched when the auxiliary tag is used with a query string.\nSearching is then done via the multi-encoding support discussed above. In this example the Analyzer, :han2pinyin, tokenizes strings in zh-hans and zh-hant as the corresponding pinyin so that at search time a pinyin query will retrieve appropriate triples inserted in Traditional or Simplified Chinese. Such a query would appear as:\n(?s ?sc ?lit ?g) text:query (\u0026quot;jīng\u0026quot;@zh-aux-han2pinyin) The auxiliary field support is needed to accommodate situations such as pinyin or sound-ex which are not exact, i.e., one-to-many rather than one-to-one as in the case of Simplified and Traditional.\nTextIndexLucene adds a field for each of the auxiliary tags associated with the tag of the triple object being indexed. These fields are in addition to the un-tagged field and the field tagged with the language of the triple object literal.\nNaming analyzers for later use Repeating a text:GenericAnalyzer specification for use with multiple fields in an entity map may be cumbersome. 
The text:defineAnalyzer is used in an element of a text:defineAnalyzers list to associate a resource with an analyzer so that it may be referred to later in a text:analyzer object. Assuming that an analyzer definition such as the following has appeared among the text:defineAnalyzers list:\n[ text:defineAnalyzer \u0026lt;#foo\u0026gt; ; text:analyzer [ . . . ] ] then in a text:analyzer specification in an entity map, for example, a reference to analyzer \u0026lt;#foo\u0026gt; is made via:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer \u0026lt;#foo\u0026gt; ] ] ) This makes it straightforward to refer to the same (possibly complex) analyzer definition in multiple fields.\nStoring Literal Values New in Jena 3.0.0.\nIt is possible to configure the text index to store enough information in the text index to be able to access the original indexed literal values at query time. This is controlled by two configuration options. First, the text:storeValues property must be set to true for the text index:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026quot;mem\u0026quot; ; text:storeValues true; . Or, using Java code, use the setValueStored method of TextIndexConfig:\nTextIndexConfig config = new TextIndexConfig(def); config.setValueStored(true); Additionally, setting the langField configuration option is recommended. See Linguistic Support with Lucene Index for details. Without the langField setting, the stored literals will not have language tag or datatype information.\nAt query time, the stored literals can be accessed by using a 3-element list of variables as the subject of the text:query property function. The literal value will be bound to the third variable:\n(?s ?score ?literal) text:query 'word' Working with Fuseki The Fuseki configuration simply points to the text dataset as the fuseki:dataset of the service.\n\u0026lt;#service_text_tdb\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026quot;TDB/text service\u0026quot; ; fuseki:name \u0026quot;ds\u0026quot; ; fuseki:serviceQuery \u0026quot;query\u0026quot; ; fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; fuseki:serviceUpdate \u0026quot;update\u0026quot; ; fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; fuseki:dataset :text_dataset ; .
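Putting the storage and query pieces above together, a complete SPARQL query against such a text-indexed dataset might look like the following sketch. The prefix declarations and an entity map that indexes rdfs:label are assumed here for illustration, and ?literal is only bound because text:storeValues has been enabled as shown above:

PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?score ?literal
WHERE {
  # match against the Lucene index on the rdfs:label field;
  # the third variable receives the stored literal value
  (?s ?score ?literal) text:query (rdfs:label "word") .
}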
Building a Text Index When working at scale, or when preparing a published, read-only, SPARQL service, creating the index by loading the text dataset is impractical.\nThe index and the dataset can be built using command line tools in two steps: first load the RDF data, second create an index from the existing RDF dataset.\nStep 1 - Building a TDB dataset Note: If you have an existing TDB dataset then you can skip this step\nBuild the TDB dataset:\njava -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --tdb=assembler_file data_file using the copy of TDB included with Fuseki.\nAlternatively, use one of the TDB utilities tdbloader or tdbloader2 which are better for bulk loading:\n$JENA_HOME/bin/tdbloader --loc=directory data_file Step 2 - Build the Text Index You can then build the text index with the jena.textindexer tool:\njava -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=assembler_file Because a Fuseki assembler description can have several dataset descriptions, and several text indexes, it may be necessary to extract a single dataset and index description into a separate assembler file for use in loading.\nUpdating the index If you allow updates to the dataset through Fuseki, the configured index will automatically be updated on every modification. This means that you do not have to run the above mentioned jena.textindexer after updates, only when you want to rebuild the index from scratch.\nConfiguring Alternative TextDocProducers Default Behavior The default behavior when performing text indexing is to index a single property as a single field, generating a different Document for each indexed triple. This behavior may be augmented by writing and configuring an alternative TextDocProducer.\nPlease note that TextDocProducer.change(...) is called once for each triple that is ADDed or DELETEd, and thus cannot be directly used to accumulate multiple properties for use in composing a single multi-fielded Lucene document. See below.\nTo configure a TextDocProducer, say net.code.MyProducer in a dataset assembly, use the property text:textDocProducer, e.g.:\n\u0026lt;#ds-with-lucene\u0026gt; rdf:type text:TextDataset; text:index \u0026lt;#indexLucene\u0026gt; ; text:dataset \u0026lt;#ds\u0026gt; ; text:textDocProducer \u0026lt;java:net.code.MyProducer\u0026gt; ; . where \u0026lt;java:net.code.MyProducer\u0026gt; gives the fully qualified Java class name of the producer. It must have either a single-argument constructor of type TextIndex, or a two-argument constructor (DatasetGraph, TextIndex). The TextIndex argument will be the configured text index, and the DatasetGraph argument will be the graph of the configured dataset.\nFor example, to explicitly create the default TextDocProducer use:\n... text:textDocProducer \u0026lt;java:org.apache.jena.query.text.TextDocProducerTriples\u0026gt; ; ... TextDocProducerTriples produces a new document for each subject/field added to the dataset, using TextIndex.addEntity(Entity).\nExample The example class below is a TextDocProducer that only indexes ADDs of quads for which the subject already had at least one property-value.
It uses the two-argument constructor to give it access to the dataset so that it can count the (?G, S, P, ?O) quads with that subject and predicate, and delegates the indexing to TextDocProducerTriples if there are at least two values for that property (one of those values, of course, is the one that gives rise to this change()).\npublic class Example extends TextDocProducerTriples { final DatasetGraph dg; public Example(DatasetGraph dg, TextIndex indexer) { super(indexer); this.dg = dg; } public void change(QuadAction qaction, Node g, Node s, Node p, Node o) { if (qaction == QuadAction.ADD) { if (alreadyHasOne(s, p)) super.change(qaction, g, s, p, o); } } private boolean alreadyHasOne(Node s, Node p) { int count = 0; Iterator\u0026lt;Quad\u0026gt; quads = dg.find( null, s, p, null ); while (quads.hasNext()) { quads.next(); count += 1; } return count \u0026gt; 1; } } Multiple fields per document In principle it should be possible to extend Jena to allow for creating documents with multiple searchable fields by extending org.apache.jena.sparql.core.DatasetChangesBatched such as with org.apache.jena.query.text.TextDocProducerEntities; however, this form of extension is not currently (Jena 3.13.1) functional.\nMaven Dependency The jena-text module is included in Fuseki. To use it within application code, use the following Maven dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-text\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; adjusting the version X.Y.Z as necessary. This will automatically include a compatible version of Lucene.\nFor the Elasticsearch implementation, include the following Maven dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-text-es\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; adjusting the version X.Y.Z as necessary.\n","permalink":"https://jena.apache.org/documentation/query/text-query.html","tags":null,"title":"Jena Full Text Search"},{"categories":null,"contents":" Jena Core Fuseki JavaDoc Fuseki2 Webapp Fuseki2 Main ARQ(SPARQL) RDF Connection TDB Text search SHACL ShEx RDF Patch GeoSPARQL Query Builder Service Enhancer Security Permissions JavaDoc ","permalink":"https://jena.apache.org/documentation/javadoc.html","tags":null,"title":"Jena JavaDoc"},{"categories":null,"contents":" Jena JDBC will be removed in Jena5.\nJena JDBC is a set of libraries which provide SPARQL over JDBC driver implementations.\nThis is a pure SPARQL over JDBC implementation: there is no attempt to present the underlying RDF data model as a relational model through the driver and only SPARQL queries and updates are supported.\nIt provides type 4 drivers in that they are pure Java based but the drivers are not JDBC compliant since by definition they do not support SQL.\nThis means that the drivers can be used with JDBC tools provided that those tools don\u0026rsquo;t restrict you to SQL or auto-generate SQL. So it can be used with a tool like SquirrelSQL since you can freely enter SPARQL queries and updates.
Conversely it cannot be used with a tool like a SQL based ORM which generates SQL.\nDocumentation Overview Basic Usage Alternatives Jena JDBC Drivers Maven Artifacts for Jena JDBC Implementing a custom Jena JDBC Driver Overview Jena JDBC aims to be a pure SPARQL over JDBC driver, it assumes that all commands that come in are either SPARQL queries or updates and processes them as such.\nAs detailed on the drivers page there are actually three drivers provided currently:\nIn-Memory - uses an in-memory dataset to provide non-persistent storage TDB - uses a TDB dataset to provide persistent and transactional storage Remote Endpoint - uses HTTP based remote endpoints to access any SPARQL protocol compliant storage These are all built on a core library which can be used to build custom drivers if desired. This means that all drivers share common infrastructure and thus exhibit broadly speaking the same behavior around handling queries, updates and results.\nJena JDBC is published as a Maven module via its maven artifacts. The source for Jena JDBC may be downloaded as part of the source distribution.\nTreatment of Results One important behavioral aspect to understand is how results are treated compared to a traditional JDBC driver. SPARQL provides four query forms and thus four forms of results while JDBC assumes all results have a simple tabular format. Therefore one of the main jobs of the core library is to marshal the results of each kind of query into a tabular format. For SELECT queries this is a trivial mapping, for CONSTRUCT and DESCRIBE the triples are mapped to columns named Subject, Predicate and Object respectively, finally for ASK the boolean is mapped to a single column named ASK.\nThe second issue is that JDBC expects uniform column typing throughout a result set which is not something that holds true for SPARQL results. Therefore the core library takes a pragmatic approach to column typing and makes the exact behavior configurable by the user. The default behavior of the core library is to type all columns as Types.NVARCHAR with a Java type of String, this provides the widest compatibility possible with both the SPARQL results and consuming tools since we can treat everything as a string. We refer to this default behavior as medium compatibility, it is sufficient to allow JDBC tools to interpret results for basic display but may be unsuitable for further processing.\nWe then provide two alternatives, the first of which we refer to as high compatibility aims to present the data in a way that is more amenable to subsequent processing by JDBC tools. In this mode the column types in a result set are detected by sniffing the data in the first row of the result set and assigning appropriate types. For example if the first row for a given column has the value \u0026quot;1234\u0026quot;^^xsd:integer then it would be assigned the type Types.BIGINT and have the Java type of Long. Doing this allows JDBC tools to carry out subsequent calculations on the data in a type appropriate way. It is important to be aware that this sniffing may not be accurate for the entire result set so can still result in errors processing some rows.\nThe second alternative we refer to as low compatibility and is designed for users who are using the driver directly and are fully aware that they are writing SPARQL queries and getting SPARQL results. In this mode we make no effort to type columns in a friendly way instead typing them as Types.JAVA_OBJECT with the Java type Node (i.e. 
the Jena Node class).\nRegardless of how you configure column typing, the core library does its best to allow you to marshal values into strong types. For example, even if using default compatibility and your columns are typed as strings from a JDBC perspective you can still call getLong(\u0026quot;column\u0026quot;) and if there is a valid conversion the library will make it for you.\nAnother point of interest is around our support of different result set types. The drivers support both ResultSet.TYPE_FORWARD_ONLY and ResultSet.TYPE_SCROLL_INSENSITIVE; note that regardless of the type chosen and the underlying query type all result sets are ResultSet.CONCUR_READ_ONLY, i.e. the setLong() style methods cannot be used to update the underlying RDF data. Users should be aware that the default behavior is to use forward only result sets since this allows the drivers to stream the results and minimizes memory usage. When scrollable result sets are used the drivers will cache all the results into memory which can use lots of memory when querying large datasets.\nBasic Usage The following takes you through the basic usage of the in-memory JDBC driver. The code should be familiar to anyone who has used JDBC before and is easily used with our other drivers simply by changing the connection URL appropriately.\nEstablishing a Connection Firstly we should ensure that the driver we wish to use is registered with the JDBC driver manager; a static method is provided for this:\nMemDriver.register(); Once this is done we can then make a JDBC connection just by providing an appropriate connection URL:\n// Make a connection using the In-Memory driver starting from an empty dataset Connection conn = DriverManager.getConnection(\u0026quot;jdbc:jena:mem:empty=true\u0026quot;); Now we can go ahead and use the connection as you would normally.\nPerforming Queries You make queries as you would with any JDBC driver, the only difference being that the queries must be SPARQL:\n// Need a statement Statement stmt = conn.createStatement(); try { // Make a query ResultSet rset = stmt.executeQuery(\u0026quot;SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 100\u0026quot;); // Iterate over results while (rset.next()) { // Print out type as a string System.out.println(rset.getString(\u0026quot;type\u0026quot;)); } // Clean up rset.close(); } catch (SQLException e) { System.err.println(\u0026quot;SQL Error - \u0026quot; + e.getMessage()); } finally { stmt.close(); } Performing Updates You make updates as you would with any JDBC driver.
Again the main difference is that updates must be SPARQL. One downside of this is that SPARQL provides no way to indicate the number of triples/quads affected by an update so the JDBC driver will either return 0 for successful updates or throw a SQLException for failed updates:\n// Need a statement Statement stmt = conn.createStatement(); // Make an update try { stmt.executeUpdate(\u0026quot;INSERT DATA { \u0026lt;http://x\u0026gt; \u0026lt;http://y\u0026gt; \u0026lt;http://z\u0026gt; }\u0026quot;); System.out.println(\u0026quot;Update succeeded\u0026quot;); } catch (SQLException e) { System.out.println(\u0026quot;Update Failed - \u0026quot; + e.getMessage()); } finally { // Clean up stmt.close(); } Alternatives If Jena JDBC does not fulfill your use case you may also be interested in some 3rd party projects which do SPARQL over JDBC in other ways:\nClaude Warren\u0026rsquo;s jdbc4sparql - An alternative approach that does expose the underlying RDF data model as a relational model and supports translating SQL into SPARQL William Greenly\u0026rsquo;s jdbc4sparql - A similar approach to Jena JDBC restricted to accessing HTTP based SPARQL endpoints Paul Gearon\u0026rsquo;s scon - A similar approach to Jena JDBC restricted to accessing HTTP based SPARQL endpoints ","permalink":"https://jena.apache.org/documentation/jdbc/","tags":null,"title":"Jena JDBC - A SPARQL over JDBC driver framework"},{"categories":null,"contents":"Jena JDBC comes with three built in drivers by default with the option of building custom drivers if desired. This page covers the differences between the provided drivers and the connection URL options for each.\nConnection URL Basics Connection URLs for Jena JDBC drivers have a common format; they all start with the following:\njdbc:jena:foo: Where foo is a driver specific prefix that indicates which specific driver implementation is being used.\nAfter the prefix the connection URL consists of a sequence of key value pairs; the characters ampersand (\u0026amp;), semicolon (;) and pipe (|) are considered to be separators between pairs, and the separators are reserved characters that may not be used in values. The key is separated from the value by an equals sign (=) though unlike the separators this is not a reserved character in values.\nThere is no notion of character escaping in connection parameters so if you need to use any of the reserved characters in your values then you should pass these to the connect(String, Properties) method directly in the Properties object.\nCommon Parameters There are some common parameters understood by all Jena JDBC drivers and which apply regardless of driver implementation.\nJDBC Compatibility Level As discussed in the overview the drivers have a notion of JDBC compatibility which is configurable. The jdbc-compatibility parameter is used in connection URLs. To avoid typos when creating URLs programmatically a constant (JenaDriver.PARAM_JDBC_COMPATIBILITY) is provided which contains the parameter name exactly as the code expects it. This parameter provides an integer value in the range 1-9 which denotes how compatible the driver should attempt to be.
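As an illustrative sketch of setting this parameter programmatically, and of passing values through the Properties object as recommended above for values containing reserved characters, the following assumes the in-memory driver and the MemDriver/JenaDriver classes shipped in the Jena JDBC artifacts:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

import org.apache.jena.jdbc.JenaDriver;
import org.apache.jena.jdbc.mem.MemDriver;

public class CompatibilityExample {
    public static void main(String[] args) throws Exception {
        // Ensure the driver is registered with the JDBC driver manager
        MemDriver.register();

        // Use the provided constant rather than typing "jdbc-compatibility" by hand
        Properties props = new Properties();
        props.setProperty(JenaDriver.PARAM_JDBC_COMPATIBILITY, "9");

        // Parameters supplied via the Properties object are combined with those in the URL
        try (Connection conn = DriverManager.getConnection("jdbc:jena:mem:empty=true", props)) {
            // ... use the connection as normal ...
        }
    }
}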
See the aforementioned overview documentation for more information on the interpretation of this parameter.\nWhen not set the default compatibility level is used; note that JenaConnection objects support changing this after the connection has been established.\nPre-Processors The second of the common parameters is the pre-processor parameter which is used to specify one/more CommandPreProcessor implementations to use. The parameter should be specified once for each pre-processor you wish to use and you should supply a fully qualified class name to ensure the pre-processor can be loaded and registered on your connections. The driver will report an error if you specify a class that cannot be appropriately loaded and registered.\nPre-processors are registered in the order that they are specified so if you use multiple pre-processors and they have ordering dependencies please ensure that you specify them in the desired order. Note that JenaConnection objects support changing registered pre-processors after the connection has been established.\nPost-Processors There is also a post-processor parameter which is used to specify one/more ResultsPostProcessor implementations to use. The parameter should be specified once for each post-processor you wish to use and you should supply a fully qualified class name to ensure the post-processor can be loaded and registered on your connections. The driver will report an error if you specify a class that cannot be appropriately loaded and registered.\nPost-processors are registered in the order that they are specified so if you use multiple post-processors and they have ordering dependencies please ensure that you specify them in the desired order. Note that JenaConnection objects support changing registered post-processors after the connection has been established.\nAvailable Drivers In-Memory TDB Remote Endpoint Each driver is available as a separate maven artifact; see the artifacts page for more information.\nIn-Memory The in-memory driver provides access to a non-persistent non-transactional in-memory dataset. This dataset may either be initially empty or may be initialized from an input file. Remember that this is non-persistent so even if the latter option is chosen changes are not persisted to the input file. This driver is primarily intended for testing and demonstration purposes.\nBeyond the common parameters it has two possible connection parameters. The first of these is the dataset parameter and is used to indicate an input file that the driver will initialize the in-memory dataset with e.g.\njdbc:jena:mem:dataset=file.nq If you prefer to start with an empty dataset you should use the empty parameter instead e.g.\njdbc:jena:mem:empty=true If both are specified then the dataset parameter has precedence.\nTDB The TDB driver provides access to a persistent Jena TDB dataset. This means that the dataset is both persistent and can be used transactionally. For correct transactional behavior it is typically necessary to set the holdability for connections and statements to ResultSet.HOLD_CURSORS_OVER_COMMIT as otherwise closing a result set or making an update will cause all other results to be closed.\nBeyond the common parameters the driver requires a single location parameter that provides the path to a location for a TDB dataset e.g.\njdbc:jena:tdb:location=/path/to/data By default a TDB dataset will be created in that location if one does not exist; if you would prefer not to do this i.e.
ensure you only access existing TDB datasets then you can add the must-exist parameter e.g.\njdbc:jena:tdb:location=/path/to/data\u0026amp;must-exist=true With this parameter set the connection will fail if the location does not exist as a directory, note that this does not validate that the location is a TDB dataset so it is still possible to pass in invalid paths even with this set.\nRemote Endpoint The Remote Endpoint driver provides access to any SPARQL Protocol compliant store that exposes SPARQL query and/or SPARQL update endpoints. This driver can be explicitly configured to be in read-only or write-only mode by providing only one of the required endpoints.\nThe query parameter sets the query endpoint whilst the update parameter sets the update endpoint e.g.\njdbc:jena:remote:query=http://localhost:3030/ds/query\u0026amp;update=http://localhost:3030/ds/update At least one of these parameters is required, if only one is provided you will get a read-only or write-only connection as appropriate.\nThis driver also provides a whole variety of parameters that may be used to customize its behavior further. Firstly there are a set of parameters which control the dataset description provided via the SPARQL protocol:\ndefault-graph-uri - Sets a default graph for queries named-graph-uri - Sets a named graph for queries using-graph-uri - Sets a default graph for updates using-named-graph-uri - Sets a named graph for updates All of these may be specified multiple times to specify multiple graph URIs for each.\nThen you have the select-results-type and model-results-type which are used to set the MIME type you\u0026rsquo;d prefer to have the driver retrieve SPARQL results from the remote endpoints in. If used you must set them to formats that ARQ supports, the ARQ WebContent class has constants for the various supported formats.\nAuthentication There is also comprehensive support for authentication using this driver, the standard JDBC user and password parameters are used for credentials and then a selection of driver specific parameters are used to configure how you wish the driver to authenticate.\nUnder the hood authentication uses the new HttpAuthenticator framework introduced in the same release as Jena JDBC, see HTTP Authentication in ARQ. This means that it can support standard HTTP auth methods (Basic, Digest etc) or can use more complex schemes such as forms based auth with session cookies.\nTo set up standard HTTP authentication it is sufficient to specify the user and password fields. As with any JDBC application we strongly recommend that you do not place these in the connection URL directly but rather use the Properties object to pass these in. One option you may wish to include if your endpoints use HTTP Basic authentication is the preemptive-auth parameter which when set to true will enable preemptive authentication. While this is less secure it can be more performant if you are making lots of queries.\nSetting up form based authentication is somewhat more complex, at a minimum you need to provide the form-url parameter with a value for the URL that user credentials should be POSTed to in order to login. 
You may need to specify the form-user-field and form-password-field parameters to provide the name of the fields for the login request, by default these assume you are using an Apache mod_auth_form protected server and use the appropriate default values.\nThe final option for authenticator is to use the authenticator parameter via the Properties object to pass in an actual instance of a HttpAuthenticator that you wish to use. This method is the most powerful in that it allows you to use any authentication method that you need.\n","permalink":"https://jena.apache.org/documentation/jdbc/drivers.html","tags":null,"title":"Jena JDBC Drivers"},{"categories":null,"contents":"This section is a general introduction to the Jena ontology API, including some of the common tasks you may need to perform. We won\u0026rsquo;t go into all of the many details of the API here: you should expect to refer to the Javadoc to get full details of the capabilities of the API.\nPrerequisites We\u0026rsquo;ll assume that you have a basic familiarity with RDF and with Jena. If not, there are other Jena help documents you can read for background on these topics, and a collection of tutorials.\nJena is a programming toolkit, using the Java programming language. While there are a few command-line tools to help you perform some key tasks using Jena, mostly you use Jena by writing Java programs. The examples in this document will be primarily code samples.\nWe also won\u0026rsquo;t be explaining the OWL or RDFS ontology languages in much detail in this document. You should refer to supporting documentation for details on those languages, for example the W3C OWL document index.\nNote: Although OWL version 1.1 is now a W3C recommendation, Jena\u0026rsquo;s support for OWL 1.1 features is limited. We will be addressing this in future versions Jena.\nOverview The section of the manual is broken into a number of sections. You do not need to read them in sequence, though later sections may refer to concepts and techniques introduced in earlier sections. The sections are:\nGeneral concepts Running example: the ESWC ontology Creating ontology models Compound ontology documents and imports processing The generic ontology type: OntResource Ontology classes and basic class expressions Ontology properties More complex class expressions Instances or individuals Ontology meta-data Ontology inference: overview Working with persistent ontologies Experimental ontology tools Further assistance Hopefully, this document will be sufficient to help most readers to get started using the Jena ontology API. For further support, please post questions to the Jena support list, or file a bug report.\nPlease note that we ask that you use the support list or the bug-tracker to communicate with the Jena team, rather than send email to the team members directly. This helps us manage Jena support more effectively, and facilitates contributions from other Jena community members.\nGeneral concepts In a widely-quoted definition, an ontology is\n\u0026ldquo;a specification of a conceptualization\u0026rdquo; [Gruber, T. 1993]\nLet\u0026rsquo;s unpack that brief characterisation a bit. An ontology allows a programmer to specify, in an open, meaningful, way, the concepts and relationships that collectively characterise some domain of interest. 
Examples might be the concepts of red and white wine, grape varieties, vintage years, wineries and so forth that characterise the domain of \u0026lsquo;wine\u0026rsquo;, and relationships such as \u0026lsquo;wineries produce wines\u0026rsquo;, \u0026lsquo;wines have a year of production\u0026rsquo;. This wine ontology might be developed initially for a particular application, such as a stock-control system at a wine warehouse. As such, it may be considered similar to a well-defined database schema. The advantage to an ontology is that it is an explicit, first-class description. So having been developed for one purpose, it can be published and reused for other purposes. For example, a given winery may use the wine ontology to link its production schedule to the stock system at the wine warehouse. Alternatively, a wine recommendation program may use the wine ontology, and a description (ontology) of different dishes to recommend wines for a given menu.\nThere are many ways of writing down an ontology, and a variety of opinions as to what kinds of definition should go in one. In practice, the contents of an ontology are largely driven by the kinds of application it will be used to support. In Jena, we do not take a particular view on the minimal or necessary components of an ontology. Rather, we try to support a variety of common techniques. In this section, we try to explain what is – and to some extent what isn\u0026rsquo;t – possible using Jena\u0026rsquo;s ontology support.\nSince Jena is fundamentally an RDF platform, Jena\u0026rsquo;s ontology support is limited to ontology formalisms built on top of RDF. Specifically this means RDFS, the varieties of OWL. We will provide a very brief introduction to these languages here, but please refer to the extensive on-line documentation for these formalisms for complete and authoritative details.\nRDFS RDFS is the weakest ontology language supported by Jena. RDFS allows the ontologist to build a simple hierarchy of concepts, and a hierarchy of properties. Consider the following trivial characterisation (with apologies to biology-trained readers!):\nTable 1: A simple concept hierarchy\nUsing RDFS, we can say that my ontology has five classes, and that Plant is a sub-class of Organism and so on. So every animal is also an organism. A good way to think of these classes is as describing sets of individuals: organism is intended to describe a set of living things, some of which are animals (i.e. a sub-set of the set of organisms is the set of animals), and some animals are fish (a subset of the set of all animals is the set of all fish).\nTo describe the attributes of these classes, we can associate properties with the classes. For example, animals have sensory organs (noses, eyes, etc.). A general property of an animal might be senseOrgan, to denote any given sensory organs a particular animal has. In general, fish have eyes, so a fish might have a eyes property to refer to a description of the particular eye structure of some species. Since eyes are a type of sensory organ, we can capture this relationship between these properties by saying that eye is a sub-property-of senseOrgan. Thus if a given fish has two eyes, it also has two sense organs. (It may have more, but we know that it must have two).\nWe can describe this simple hierarchy with RDFS. In general, the class hierarchy is a graph rather than a tree (i.e. not like Java class inheritance). 
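As a small illustrative sketch, this part of the hierarchy could be written down in Turtle as follows; the ex: namespace is invented here purely for illustration:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/bio#> .

# a simple class hierarchy
ex:Organism  a rdfs:Class .
ex:Plant     a rdfs:Class ; rdfs:subClassOf ex:Organism .
ex:Animal    a rdfs:Class ; rdfs:subClassOf ex:Organism .
ex:Fish      a rdfs:Class ; rdfs:subClassOf ex:Animal .

# a property hierarchy: every eyes value is also a senseOrgan value
ex:senseOrgan  a rdf:Property .
ex:eyes        a rdf:Property ; rdfs:subPropertyOf ex:senseOrgan .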
The slime mold is popularly, though perhaps not accurately, thought of as an organism that has characteristics of both plants and animals. We might model a slime mold in our ontology as a class that has both plant and animal classes among its super-classes. RDFS is too weak a language to express the constraint that a thing cannot be both a plant and an animal (which is perhaps lucky for the slime molds). In RDFS, we can only name the classes, we cannot construct expressions to describe interesting classes. However, for many applications it is sufficient to state the basic vocabulary, and RDFS is perfectly well suited to this.\nNote also that we can both describe classes, in general terms, and we can describe particular instances of those classes. So there may be a particular individual Fred who is a Fish (i.e. has rdf:type Fish), and who has two eyes. Their companion Freda, a Mexican Tetra, or blind cave fish, has no eyes. One use of an ontology is to allow us to fill-in missing information about individuals. Thus, though it is not stated directly, we can deduce that Fred is also an Animal and an Organism. Assume that there was no rdf:type asserting that Freda is a Fish. We may still infer Freda\u0026rsquo;s rdf:type since Freda has lateral lines as sense organs, and these only occur in fish. In RDFS, we state that the domain of the lateralLines property is the Fish class, so an RDFS reasoner can infer that Freda must be a fish.\nOWL In general, OWL allows us to say everything that RDFS allows, and much more besides. A key part of OWL is the ability to describe classes in more interesting and complex ways. For example, in OWL we can say that Plant and Animal are disjoint classes: no individual can be both a plant and an animal (which would have the unfortunate consequence of making SlimeMold an empty class). SaltwaterFish might be the intersection of Fish and the class SeaDwellers (which also includes, for example, cetaceans and sea plants).\nSuppose we have a property covering, intended to represent the scales of a fish or the fur of a mammal. We can now refine the mammal class to be \u0026lsquo;animals that have a covering that is hair\u0026rsquo;, using a property restriction to express the condition that property covering has a value from the class Hair. Similarly TropicalFish might be the intersection of the class of Fish and the class of things that have TropicalOcean as their habitat.\nFinally (for this brief overview), we can say more about properties in OWL. In RDFS, properties can be related via a property hierarchy. OWL extends this by allowing properties to be denoted as transitive, symmetric or functional, and allow one property to be declared to be the inverse of another. OWL also makes a distinction between properties that have individuals (RDF resources) as their range and properties that have data-values (known as literals in RDF terminology) as their range. Respectively these are object properties and datatype properties. One consequence of the RDF lineage of OWL is that OWL ontologies cannot make statements about literal values. We cannot say in RDF that seven has the property of being a prime number. We can, of course, say that the class of primes includes seven, doing so doesn\u0026rsquo;t require a number to be the subject of an RDF statement. In OWL, this distinction is important: only object properties can be transitive or symmetric.\nThe OWL language is sub-divided into three syntax classes: OWL Lite, OWL DL and OWL Full. 
OWL DL does not permit some constructions allowed in OWL Full, and OWL Lite has all the constraints of OWL DL plus some more. The intent for OWL Lite and OWL DL is to make the task of reasoning with expressions in that subset more tractable. Specifically, OWL DL is intended to be able to be processed efficiently by a description logic reasoner. OWL Lite is intended to be amenable to processing by a variety of reasonably simple inference algorithms, though experts in the field have challenged how successfully this has been achieved.\nWhile the OWL standards documents note that OWL builds on top of the (revised) RDF specifications, it is possible to treat OWL as a separate language in its own right, and not something that is built on an RDF foundation. This view uses RDF as a serialisation syntax; the RDF-centric view treats RDF triples as the core of the OWL formalism. While both views are valid, in Jena we take the RDF-centric view.\nOntology languages and the Jena Ontology API As we outlined above, there are various different ontology languages available for representing ontology information on the semantic web. They range from the most expressive, OWL Full, through to the weakest, RDFS. Through the Ontology API, Jena aims to provide a consistent programming interface for ontology application development, independent of which ontology language you are using in your programs.\nThe Jena Ontology API is language-neutral: the Java class names are not specific to the underlying language. For example, the OntClass Java class can represent an OWL class or RDFS class. To represent the differences between the various representations, each of the ontology languages has a profile, which lists the permitted constructs and the names of the classes and properties.\nThus in the OWL profile is it owl:ObjectProperty (short for http://www.w3.org/2002/07/owl#ObjectProperty) and in the RDFS profile it is null since RDFS does not define object properties.\nThe profile is bound to an ontology model, which is an extended version of Jena\u0026rsquo;s Model class. The base Model allows access to the statements in a collection of RDF data. OntModel extends this by adding support for the kinds of constructs expected to be in an ontology: classes (in a class hierarchy), properties (in a property hierarchy) and individuals.\nWhen you\u0026rsquo;re working with an ontology in Jena, all of the state information remains encoded as RDF triples (accessed as Jena Statements) stored in the RDF model. The ontology API doesn\u0026rsquo;t change the RDF representation of ontologies. What it does do is add a set of convenience classes and methods that make it easier for you to write programs that manipulate the underlying RDF triples.\nThe predicate names defined in the ontology language correspond to the accessor methods on the Java classes in the API. For example, an OntClass has a method to list its super-classes, which corresponds to the values of the subClassOf property in the RDF representation. This point is worth re-emphasising: no information is stored in the OntClass object itself. When you call the OntClass listSuperClasses() method, Jena will retrieve the information from the underlying RDF triples. Similarly, adding a subclass to an OntClass asserts an additional RDF triple, typically with predicate rdfs:subClassOf into the model.\nOntologies and reasoning One of the key benefits of building an ontology-based application is using a reasoner to derive additional truths about the concepts you are modelling. 
We saw a simple instance of this above: the assertion \u0026ldquo;Fred is a Fish\u0026rdquo; entails the deduction \u0026ldquo;Fred is an Animal\u0026rdquo;. There are many different styles of automated reasoner, and very many different reasoning algorithms. Jena includes support for a variety of reasoners through the inference API.\nA common feature of Jena reasoners is that they create a new RDF model which appears to contain the triples that are derived from reasoning as well as the triples that were asserted in the base model. This extended model nevertheless still conforms to the contract for Jena models. It can be used wherever a non-inference model can be used. The ontology API exploits this feature: the convenience methods provide by the ontology API can query an extended inference model in just the same way that they can a plain RDF model. In fact, this is such a common pattern that we provide simple recipes for constructing ontology models whose language, storage model and reasoning engine can all be simply specified when an OntModel is created. We\u0026rsquo;ll show examples shortly.\nFigure 2 shows one way of visualising this:\nGraph is an internal Jena interface that supports the composition of sets of RDF triples. The asserted statements, which may have been read in from an ontology document, are held in the base graph. The reasoner, or inference engine, can use the contents of the base graph and the semantic rules of the language to show a more complete set of base and entailed triples. This is also presented via a Graph interface, so the OntModel works only with the outermost interface. This regularity allows us to very easily build ontology models with or without a reasoner. It also means that the base graph can be an in-memory store, a database-backed persistent store, or some other storage structure altogether – e.g. an LDAP directory – again without affecting the operation of the ontology model (but noting that these different approaches may have very different efficiency profiles).\nRDF-level polymorphism and Java Deciding which Java abstract class to use to represent a given RDF resource can be surprisingly subtle. Consider the following RDF sample:\n\u0026lt;owl:Class rdf:ID=\u0026quot;DigitalCamera\u0026quot;\u0026gt; \u0026lt;/owl:Class\u0026gt; This declares that the resource with the relative URI #DigitalCamera is an OWL ontology class. It suggests that it would be appropriate to model that declaration in Java with an instance of an OntClass. Now suppose we add a triple to the RDF model to augment the class declaration with some more information:\n\u0026lt;owl:Class rdf:ID=\u0026quot;DigitalCamera\u0026quot;\u0026gt; \u0026lt;rdf:type owl:Restriction /\u0026gt; \u0026lt;/owl:Class\u0026gt; Now we are stating that #DigitalCamera is an OWL Restriction. Restriction is a subclass of owl:Class, so this is a perfectly consistent operation. The problem we then have is that Java does not allow us to dynamically change the Java class of the object representing this resource. The resource has not changed: it still has URI #DigitalCamera. But the appropriate Java class Jena might choose to encapsulate it has changed from OntClass to Restriction. 
Conversely, if we subsequently remove the rdf:type owl:Restriction from the model, using the Restriction Java class is no longer appropriate.\nEven worse, OWL Full allows us to state the following (rather counter-intuitive) construction:\n\u0026lt;owl:Class rdf:ID=\u0026quot;DigitalCamera\u0026quot;\u0026gt; \u0026lt;rdf:type owl:ObjectProperty /\u0026gt; \u0026lt;/owl:Class\u0026gt; That is, #DigitalCamera is both a class and a property. While this may not be a very useful claim, it illustrates a basic point: we cannot rely on a consistent or unique mapping between an RDF resource and the appropriate Java abstraction.\nJena accepts this basic characteristic of polymorphism at the RDF level by considering that the Java abstraction (OntClass, Restriction, DatatypeProperty, etc.) is just a view or facet of the resource. That is, there is a one-to-many mapping from a resource to the facets that the resource can present. If the resource is typed as an owl:Class, it can present the OntClass facet; given other types, it can present other facets. Jena provides the .as() method to efficiently map from an RDF object to one of its allowable facets. Given a RDF object (i.e. an instance of org.apache.jena.rdf.model.RDFNode or one of its sub-types), you can get a facet by invoking as() with an argument that denotes the facet required. Specifically, the facet is identified by the Java class object of the desired facet. For example, to get the OntClass facet of a resource, we can write:\nResource r = myModel.getResource( myNS + \u0026quot;DigitalCamera\u0026quot; ); OntClass cls = r.as( OntClass.class ); This pattern allows our code to defer decisions about the correct Java abstraction to use until run-time. The choice can depend on the properties of the resource itself. If a given RDFNode will not support the conversion to a given facet, it will raise a ConversionException. We can test whether .as() will succeed for a given facet with canAs(). This RDF-level polymorphism is used extensively in the Jena ontology API to allow maximum flexibility in handling ontology data.\nRunning example: the ESWC ontology To illustrate the principles of using the ontology API, we will use examples drawn from the ESWC ontology This ontology presents a simple model for describing the concepts and activities associated with a typical academic conference. A copy of the ontology serialized in RDF/XML is included with the Jena download, see: [eswc-2006-09-21.rdf] (note that you may need to view the page source in some browsers to see the XML code).\nA subset of the classes and properties from the ontology are shown in Figure 3:\nFigure 3: Classes and properties from ESWC ontology\nWe will use elements from this ontology to illustrate the ontology API throughout the rest of this document.\nCreating ontology models An ontology model is an extension of the Jena RDF model, providing extra capabilities for handling ontologies. Ontology models are created through the Jena ModelFactory. The simplest way to create an ontology model is as follows:\nOntModel m = ModelFactory.createOntologyModel(); This will create an ontology model with the default settings, which are set for maximum compatibility with the previous version of Jena. These defaults are:\nOWL-Full language in-memory storage RDFS inference, which principally produces entailments from the sub-class and sub-property hierarchies. 
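As a minimal sketch of what that default RDFS inference provides (the ex: namespace below is invented for illustration and is not part of any ontology used in this document), asserting a small hierarchy is enough to see entailed types appear:

import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class DefaultOntModelExample {
    public static void main(String[] args) {
        String ns = "http://example.org/eg#";

        // default settings: OWL-Full profile, in-memory storage, RDFS inference
        OntModel m = ModelFactory.createOntologyModel();

        OntClass animal = m.createClass(ns + "Animal");
        OntClass fish = m.createClass(ns + "Fish");
        fish.addSuperClass(animal);

        Individual fred = m.createIndividual(ns + "Fred", fish);

        // the asserted type is Fish, but the RDFS rule reasoner also
        // entails that Fred is an Animal
        System.out.println(fred.hasRDFType(animal));   // prints: true
    }
}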
Important note: this means that the default ontology model does include some inferencing, with consequences both for the performance of the model, and for the triples which appear in the model.\nIn many applications, such as driving a GUI, RDFS inference is too strong. For example, every class is inferred to be an immediate sub-class of owl:Thing. In other applications, stronger reasoning is needed. In general, to create an OntModel with a particular reasoner or language profile, you should pass a model specification to the createOntologyModel call. For example, an OWL model that performs no reasoning at all can be created with:\nOntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM ); To create an ontology model for a particular language, but leaving all of the other values as defaults, you should pass the URI of the ontology language to the model factory. The URI strings for the various language profiles are:

RDFS: http://www.w3.org/2000/01/rdf-schema#
OWL Full: http://www.w3.org/2002/07/owl#
OWL DL: http://www.w3.org/TR/owl-features/#term_OWLDL
OWL Lite: http://www.w3.org/TR/owl-features/#term_OWLLite

These URI\u0026rsquo;s are used to look-up the language profile from the ProfileRegistry. The profile registry contains public constant declarations so that you do not have to remember these URI\u0026rsquo;s. Please note that the URI\u0026rsquo;s denoting OWL Lite and OWL DL are not officially sanctioned by the OWL standard.\nBeyond these basic choices, the complexities of configuring an ontology model are wrapped up in a recipe object called OntModelSpec. This specification allows complete control over the configuration choices for the ontology model, including the language profile in use, the reasoner, and the means of handling compound documents. A number of common recipes are pre-declared as constants in OntModelSpec, and listed below (each entry gives the OntModelSpec constant, followed by its language profile, storage model and reasoner):

OWL_MEM: OWL full, in-memory, none
OWL_MEM_TRANS_INF: OWL full, in-memory, transitive class-hierarchy inference
OWL_MEM_RULE_INF: OWL full, in-memory, rule-based reasoner with OWL rules
OWL_MEM_MICRO_RULE_INF: OWL full, in-memory, optimised rule-based reasoner with OWL rules
OWL_MEM_MINI_RULE_INF: OWL full, in-memory, rule-based reasoner with subset of OWL rules
OWL_DL_MEM: OWL DL, in-memory, none
OWL_DL_MEM_RDFS_INF: OWL DL, in-memory, rule reasoner with RDFS-level entailment-rules
OWL_DL_MEM_TRANS_INF: OWL DL, in-memory, transitive class-hierarchy inference
OWL_DL_MEM_RULE_INF: OWL DL, in-memory, rule-based reasoner with OWL rules
OWL_LITE_MEM: OWL Lite, in-memory, none
OWL_LITE_MEM_TRANS_INF: OWL Lite, in-memory, transitive class-hierarchy inference
OWL_LITE_MEM_RDFS_INF: OWL Lite, in-memory, rule reasoner with RDFS-level entailment-rules
OWL_LITE_MEM_RULES_INF: OWL Lite, in-memory, rule-based reasoner with OWL rules
RDFS_MEM: RDFS, in-memory, none
RDFS_MEM_TRANS_INF: RDFS, in-memory, transitive class-hierarchy inference
RDFS_MEM_RDFS_INF: RDFS, in-memory, rule reasoner with RDFS-level entailment-rules

For details of reasoner capabilities, please see the inference documentation and the Javadoc for OntModelSpec.
See also further discussion below.\nNote: it is primarily the choice of reasoner, rather than the choice of language profile, which determines which entailments are seen by the ontology model.\nTo create a model with a given specification, you should invoke the ModelFactory as follows:\nOntModel m = ModelFactory.createOntologyModel( \u0026lt;model spec\u0026gt; ); for example:\nOntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF ); To create a custom model specification, you can create a new one from its constructor, and call the various setter methods to set the appropriate values. More often, we want a variation on an existing recipe. In this case, you copy an existing specification and then update the copy as necessary:\nOntModelSpec s = new OntModelSpec( OntModelSpec.OWL_MEM ); s.setDocumentManager( myDocMgr ); OntModel m = ModelFactory.createOntologyModel( s ); Compound ontology documents and imports processing The OWL ontology language includes some facilities for creating modular ontologies that can be re-used in a similar manner to software modules. In particular, one ontology can import another. Jena helps ontology developers to work with modular ontologies by automatically handling the imports statements in ontology models.\nThe key idea is that the base model of an ontology model is actually a collection of models, one per imported model. This means we have to modify figure 2 a bit. Figure 4 shows how the ontology model builds a collection of import models:\nFigure 4: ontology model compound document structure for imports\nWe will use the term document to describe an ontology serialized in some transport syntax, such as RDF/XML or N3. This terminology isn\u0026rsquo;t used by the OWL or RDFS standards, but it is a convenient way to refer to the written artifacts. However, from a broad view of the interlinked semantic web, a document view imposes artificial boundaries between regions of the global web of data and isn\u0026rsquo;t necessarily a useful way of thinking about ontologies.\nWe will load an ontology document into an ontology model in the same way as a normal Jena model, using the read method. There are several variants on read, that handle differences in the source of the document (to be read from a resolvable URL or directly from an input stream or reader), the base URI that will resolve any relative URI\u0026rsquo;s in the source document, and the serialisation language. In summary, these variants are:\nread( String url ) read( Reader reader, String base ) read( InputStream reader, String base ) read( String url, String lang ) read( Reader reader, String base, String Lang ) read( InputStream reader, String base, String Lang ) You can use any of these methods to load an ontology document. Note that we advise that you avoid the read() variants that accept a java.io.Reader argument when loading XML documents containing internationalised character sets, since the handling of character encoding by the Reader and by XML parsers is not compatible.\nBy default, when an ontology model reads an ontology document, it will also locate and load the document\u0026rsquo;s imports. An OWL document may contain an individual of class Ontology, which contains meta-data about that document itself. 
For example:\n\u0026lt;owl:Ontology rdf:about=\u0026quot;\u0026quot;\u0026gt; \u0026lt;dc:creator rdf:value=\u0026quot;Ian Dickinson\u0026quot; /\u0026gt; \u0026lt;owl:imports rdf:resource=\u0026quot;http://jena.apache.org/examples/example-ont\u0026quot; /\u0026gt; \u0026lt;/owl:Ontology\u0026gt; The construct rdf:about=\u0026quot;\u0026quot; is a relative URI. It will resolve to the document\u0026rsquo;s base URI: in other words it\u0026rsquo;s a shorthand way of referring to the document itself. The owl:imports line states that this ontology is constructed using classes, properties and individuals from the referenced ontology. When an OntModel reads this document, it will notice the owl:imports line and attempt to load the imported ontology into a sub-model of the ontology model being constructed. The definitions from both the base ontology and all of the imports will be visible to the reasoner.\nEach imported ontology document is held in a separate graph structure. This is important: we want to keep the original source ontology separate from the imports. When we write the model out again, normally only the base model is written (the alternative is that all you see is a confusing union of everything). And when we update the model, only the base model changes. To get the base model or base graph from an OntModel, use:\nModel base = myOntModel.getBaseModel(); Imports are processed recursively, so if our base document imports ontology A, and A imports B, we will end up with the structure shown in Figure 4. Note that the imports have been flattened out. A cycle check is used to prevent the document handler getting stuck if, for example, A imports B which imports A!\nThe ontology document manager Each ontology model has an associated document manager which assists with the processing and handling of ontology documents and related concerns. For convenience, there is one global document manager which is used by default by ontology models. You can get a reference to this shared instance through OntDocumentManager.getInstance(). In many cases, it will be sufficient to simply change the settings on the global document manager to suit your application\u0026rsquo;s needs. However, for more fine-grain control, you can create separate document managers, and pass them to the ontology model when it is created through the model factory. To do this, create an ontology specification object (see above), and set the document manager. For example:\nOntDocumentManager mgr = new OntDocumentManager(); // set mgr's properties now ... some code ... // now use it OntModelSpec s = new OntModelSpec( OntModelSpec.RDFS_MEM ); s.setDocumentManager( mgr ); OntModel m = ModelFactory.createOntologyModel( s ); Note that the model retains a reference to the document manager it was created with. Thus if you change a document manager\u0026rsquo;s properties, it will affect models that have previously been constructed with that document manager.\nDocument manager policy Since the document manager has a large number of configurable options, there are two ways in which you can customise it to your application requirements. Firstly, you can set the individual parameters of the document manager by Java code. Alternatively, when a given document manager is created it can load values for the various parameters from a policy file, expressed in RDF. The document manager has a list of URL\u0026rsquo;s which it will search for a policy document. It will stop at the first entry on the list that resolves to a retrievable document. 
The default search path for the policy is: file:./etc/ont-policy.rdf;file:ont-policy.rdf. You can find the default policy, which can serve as a template for defining your own policies, in the etc/ directory under the Jena download directory.\nWe can set the general properties of the document manager in the policy as follows:\n\u0026lt;DocumentManagerPolicy\u0026gt; \u0026lt;!-- policy for controlling the document manager's behaviour --\u0026gt; \u0026lt;processImports rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/processImports\u0026gt; \u0026lt;cacheModels rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/cacheModels\u0026gt; \u0026lt;/DocumentManagerPolicy\u0026gt; You can find the simple schema that declares the various properties that you can use in such an ontology document policy in the vocabularies directory of the Jena download. It\u0026rsquo;s called ont-manager.rdf. To change the search path that the document manager will use to initialise itself, you can either pass the new search path as a string when creating a new document manager object, or call the method setMetadataSearchPath().\nThe ModelMaker: creating storage on demand In order for the document manager to build the union of the imported documents (which we sometimes refer to as the imports closure), there must be some means of creating new graphs to store the imported ontologies. Loading a new import means that a new graph needs to be added. Jena defines a model maker as a simple interface that allows different kinds of model storage (in-memory, file-backed, in a persistent database, etc.) to be created on demand. For the database case, this may include passing the database user-name and password and other connection parameters. New model makers can be created with the ModelFactory.\nThere are two cases in which we may want to create storage for models on-demand. The first is when creating the OntModel for the first time. Some variants of createOntologyModel will allocate space for the base model (instead of, for example, being handed a base model to use as one of the method arguments). The second case when storage must be allocated is when adding an imported document to the union of imports. These cases often require different policies, so the OntModelSpec contains two model maker parameters: the base model maker and imports model maker, available via getBaseModelMaker() and getImportsModelMaker() methods respectively.\nThe default specifications in OntModelSpec which begin MEM_ use an in-memory model maker for the both the base model and the imported documents.\nImplementation note: internally to Jena, we use Graph as a primary data structure. However, application code will almost always refer to models, not graphs. What\u0026rsquo;s happening is that a Model is a wrapper around the Graph, which balances a rich, convenient programming interface (Model) with a simple, manageable internal data structure (Graph). Hence some potential confusion in that Figure 4, above, refers to a structure containing graphs, but we use a ModelMaker to generate new stores. The document manager extracts the appropriate graph from the containing model. Except in cases where you are extending Jena\u0026rsquo;s internal structures, you should think of Model as the container of RDF and ontology data.\nControlling imports processing By default, loading imports during the read() call is automatic. 
To read() an ontology without building the imports closure, call the method setProcessImports( false ) on the document manager object before calling read(). Alternatively, you can set the processImports property in the policy file. You can also be more selective, and ignore only certain URI\u0026rsquo;s when loading the imported documents. To selectively skip certain named imports, call the method addIgnoreImport( String uri ) on the document manager object, or set the ignoreImport property in the policy.\nManaging file references An advantage of working with ontologies is that we can reuse work done by other ontologists, by importing their published ontologies into our own. The OntModel can load such referenced ontologies automatically from their published URL\u0026rsquo;s. This can mean that an application suffers a delay on startup. Worse, it may require extra work to cope with intervening firewalls or web proxies. Worse still, connectivity may be intermittent: we do not want our application to fail just because it temporarily does not have Internet access, or because a previously published ontology has been moved. To alleviate these commonly experienced problems, we can use Jena\u0026rsquo;s FileManager to manage local indirections, so that an attempt to import a document from a given published URL means that a local copy of the document is loaded instead. This may be a file on the local disk, or simply a pointer to a local mirror web site.\nWhile the FileManager can be configured directly, we can also specify redirections declaratively in the document manager policy file:\n\u0026lt;OntologySpec\u0026gt; \u0026lt;publicURI rdf:resource=\u0026quot;... the public URI to map from...\u0026quot; /\u0026gt; \u0026lt;altURL rdf:resource=\u0026quot;... the local URL to map to ...\u0026quot; /\u0026gt; \u0026lt;!-- optional ontology language term --\u0026gt; \u0026lt;language rdf:resource=\u0026quot;... encoding used ...\u0026quot; /\u0026gt; \u0026lt;!-- optional prefix to associate with the public URL --\u0026gt; \u0026lt;prefix rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;a prefix\u0026lt;/prefix\u0026gt; \u0026lt;/OntologySpec\u0026gt; For example:\n\u0026lt;OntologySpec\u0026gt; \u0026lt;!-- local version of the RDFS vocabulary --\u0026gt; \u0026lt;publicURI rdf:resource=\u0026quot;http://www.w3.org/2000/01/rdf-schema\u0026quot; /\u0026gt; \u0026lt;altURL rdf:resource=\u0026quot;file:src/main/resources/rdf-schema.rdf\u0026quot; /\u0026gt; \u0026lt;/OntologySpec\u0026gt; This specifies that an attempt to load the RDFS vocabulary from http://www.w3.org/2000/01/rdf-schema will transparently cause file:src/main/resources/rdf-schema.rdf to be fetched instead. You can specify any number of such re-directions in the policy file, or you can add them to the document manager object directly by calling the various setter methods (see the Javadoc for details). As a side-effect, this mechanism also means that ontologies may be named with any legal URI (not necessarily resolvable) – so long as the altURL is itself resolvable.\nSee the notes on FileManager for details of additional options.\nIn the following example, we use the DocumentManager API to declare that the ESWC ontology is replicated locally on disk. We then load it using the normal URL. 
Assume that the constant JENA has been initialised to the directory in which Jena was installed.\nOntModel m = ModelFactory.createOntologyModel(); OntDocumentManager dm = m.getDocumentManager(); dm.addAltEntry( \u0026quot;http://www.eswc2006.org/technologies/ontology\u0026quot;, \u0026quot;file:\u0026quot; + JENA + \u0026quot;src/examples/resources/eswc-2006-09-21.rdf\u0026quot; ); m.read( \u0026quot;http://www.eswc2006.org/technologies/ontology\u0026quot; ); Specifying prefixes A model keeps a table of URI prefixes which can be used to present URI\u0026rsquo;s in the shortened prefix:name form. This is useful in displaying URI\u0026rsquo;s in a readable way in user interfaces, and is essential in producing legal XML names that denote arbitrary URI\u0026rsquo;s. The ontology model\u0026rsquo;s table of prefixes can be initialized from a table kept by the document manager, which contains the standard prefixes plus any that are declared by in the policy file (or added to subsequently by method calls).\nCaching of imported models You can use the document manager to assist with loading ontology documents through its cache. Suppose two ontologies, A and B, both import ontology C. We would like not to have to read C twice when loading A and then B. The document manager supports this use case by optionally caching C\u0026rsquo;s model, indexed by URI. When A tries to import C, there is no cached copy, so a new model is created for C, the contents of C\u0026rsquo;s URL read in to the model, then the C model is used in the compound document for A. Subsequently, when ontology B is loading imports, the document manager checks in its cache and finds an existing copy of C. This will be used in preference to reading a fresh copy of C from C\u0026rsquo;s source URL, saving both time and storage space.\nCaching of import models is switched on by default. To turn it off, use the policy property cacheModels, or call the method setCacheModels( boolean caching ) with caching = false. The document manager\u0026rsquo;s current model cache can be cleared at any time by calling clearCache().\nThe generic ontology type: OntResource All of the classes in the ontology API that represent ontology values have OntResource as a common super-class. This makes OntResource a good place to put shared functionality for all such classes, and makes a handy common return value for general methods. 
The Java interface OntResource extends Jena\u0026rsquo;s RDF Resource interface, so any general method that accepts a resource or an RDFNode will also accept an OntResource, and consequently, any other ontology value.\nSome of the common attributes of ontology resources that are expressed through methods on OntResource are shown below:\nAttribute Meaning versionInfo A string documenting the version or history of this resource comment A general comment associated with this value label A human-readable label seeAlso Another web location to consult for more information about this resource isDefinedBy A specialisation of seeAlso that is intended to supply a definition of this resource sameAs Denotes another resource that this resource is equivalent to differentFrom Denotes another resource that is distinct from this resource (by definition) For each of these properties, there is a standard pattern of available methods:\nMethod Effect add\u0026lt;property\u0026gt; Add an additional value for the given property set\u0026lt;property\u0026gt; Remove any existing values for the property, then add the given value list\u0026lt;property\u0026gt; Return an iterator ranging over the values of the property get\u0026lt;property\u0026gt; Return the value for the given property, if the resource has one. If not, return null. If it has more than one value, an arbitrary selection is made. has\u0026lt;property\u0026gt; Return true if there is at least one value for the given property. Depending on the name of the property, this is sometimes is\u0026lt;property\u0026gt; remove\u0026lt;property\u0026gt; Removes a given value from the values of the property on this resource. Has no effect if the resource does not have that value. For example: addSameAs( Resource r ), or isSameAs( Resource r ). For full details of the individual methods, please consult the Javadoc.\nOntResource defines some other general utility methods. For example, to find out how many values a resource has for a given property, you can call getCardinality( Property p ). To delete the resource from the ontology altogether, you can call remove(). The effect of this is to remove every statement that mentions this resource as a subject or object of a statement.\nTo get the value of a given property, use getPropertyValue( Property p ). To set it, setPropertyValue( Property p, RDFNode value ). Continuing the naming pattern, the values of a named property can be listed (with listPropertyValues), removed (with removeProperty) or added (with addProperty).\nFinally, OntResource provides methods for listing, getting and setting the rdf:type of a resource, which denotes a class to which the resource belongs (noting that, in RDF and OWL, a resource can belong to many classes at once). The rdf:type property is one for which many entailment rules are defined in the semantic models of the various ontology languages. Therefore, the values that listRDFTypes() returns is more than usually dependent on the reasoner bound to the ontology model. For example, suppose we have class A, class B which is a subclass of A, and resource x whose asserted rdf:type is B. With no reasoner, listing x\u0026rsquo;s RDF types will return only B. If the reasoner is able to calculate the closure of the subclass hierarchy (and most can), x\u0026rsquo;s RDF types would also include A. A complete OWL reasoner would also infer that x has rdf:type owl:Thing and rdf:Resource.\nFor some tasks, getting a complete list of the RDF types of a resource is exactly what is needed. 
For other tasks, this is not the case. If you are developing an ontology editor, for example, you may want to distinguish in its display between inferred and asserted types. In the above example, only x rdf:type B is asserted, everything else is inferred. One way to make this distinction is to make use of the base model (see Figure 4). Getting the resource from the base model and listing the type properties there would return only the asserted values. For example:\n// create the base model String SOURCE = \u0026quot;http://www.eswc2006.org/technologies/ontology\u0026quot;; String NS = SOURCE + \u0026quot;#\u0026quot;; OntModel base = ModelFactory.createOntologyModel( OWL_MEM ); base.read( SOURCE, \u0026quot;RDF/XML\u0026quot; ); // create the reasoning model using the base OntModel inf = ModelFactory.createOntologyModel( OWL_MEM_MICRO_RULE_INF, base ); // create a dummy paper for this example OntClass paper = base.getOntClass( NS + \u0026quot;Paper\u0026quot; ); Individual p1 = base.createIndividual( NS + \u0026quot;paper1\u0026quot;, paper ); // list the asserted types for (Iterator\u0026lt;Resource\u0026gt; i = p1.listRDFTypes(); i.hasNext(); ) { System.out.println( p1.getURI() + \u0026quot; is asserted in class \u0026quot; + i.next() ); } // list the inferred types p1 = inf.getIndividual( NS + \u0026quot;paper1\u0026quot; ); for (Iterator\u0026lt;Resource\u0026gt; i = p1.listRDFTypes(); i.hasNext(); ) { System.out.println( p1.getURI() + \u0026quot; is inferred to be in class \u0026quot; + i.next() ); } For other user interface or presentation tasks, we may want something between the complete list of types and the base list of only the asserted values. Consider the class hierarchy in figure 5 (i):\nFigure 5: asserted and inferred relationships\nFigure 5 (i) shows a base model, containing a class hierarchy and an instance x. Figure 5 (ii) shows the full set of relationships that might be inferred from this base model. In Figure 5 (iii), we see only the direct or maximally specific relationships. For example, in 5 (iii) x does not have rdf:type A, since this is an relationship that is covered by the fact that x has rdf:type D, and D is a subclass of A. Notice also that the rdf:type B link is also removed from the direct graph, for a similar reason. Thus the direct graph hides relationships from both the inferred and asserted graphs. When displaying instance x in a user interface, particularly in a tree view of some kind, the direct graph is often the most useful as it contains the useful information in the most compact form.\nTo list the RDF types of a resource, use:\nlistRDFTypes() // assumes not-direct listRDFTypes( boolean direct ) // if direct=true, show only direct relationships Related methods allow the rdf:type to be tested, set and returned.\nOntology classes and basic class expressions Classes are the basic building blocks of an ontology. A simple class is represented in Jena by an OntClass object. As mentioned above, an ontology class is a facet of an RDF resource. One way, therefore, to get an ontology class is to convert a plain RDF resource into its class facet. 
Assume that m is a suitably defined OntModel, into which the ESWC ontology has already been read, and that NS is a variable denoting the ontology namespace:\nResource r = m.getResource( NS + \u0026quot;Paper\u0026quot; ); OntClass paper = r.as( OntClass.class ); This can be shortened by calling getOntClass() on the ontology model:\nOntClass paper = m.getOntClass( NS + \u0026quot;Paper\u0026quot; ); The getOntClass method will retrieve the resource with the given URI, and attempt to obtain the OntClass facet. If either of these operations fails, getOntClass() will return null. Compare this with the createClass method, which will reuse an existing resource if possible, or create a new class resource if not:\nOntClass paper = m.createClass( NS + \u0026quot;Paper\u0026quot; ); OntClass bestPaper = m.createClass( NS + \u0026quot;BestPaper\u0026quot; ); You can use the createClass method to create an anonymous class – a class description with no associated URI. Anonymous classes are often used when building more complex ontologies in OWL. They are less useful in RDFS.\nOntClass anonClass = m.createClass(); Once you have the ontology class object, you can begin processing it through the methods defined on OntClass. The attributes of a class are handled in a similar way to the attributes of OntResource, above, with a collection of methods to set, add, get, test, list and remove values. Properties of classes that are handled in this way are:\nAttribute Meaning subClass A subclass of this class, i.e. those classes that are declared subClassOf this class. superClass A super-class of this class, i.e. a class that this class is a subClassOf. equivalentClass A class that represents the same concept as this class. This is not just having the same class extension: the class \u0026lsquo;British Prime Minister in 2003\u0026rsquo; contains the same individual as the class \u0026lsquo;the husband of Cherie Blair\u0026rsquo;, but they represent different concepts. disjointWith Denotes a class with which this class has no instances in common. Thus, in our example ontology, we can print a list of the subclasses of an Artefact as follows:\nOntClass artefact = m.getOntClass( NS + \u0026quot;Artefact\u0026quot; ); for (Iterator\u0026lt;OntClass\u0026gt; i = artefact.listSubClasses(); i.hasNext(); ) { OntClass c = i.next(); System.out.println( c.getURI() ); } Note that, under RDFS and OWL semantics, each class is a sub-class of itself (in other words, rdfs:subClassOf is reflexive). While this is true in the semantics, Jena users have reported finding it inconvenient. Therefore, the listSubClasses and listSuperClasses convenience methods remove the reflexive case from the list of results returned by the iterator. However, if you use the plain Model API to query for rdfs:subClassOf triples, assuming that a reasoner is in use, the reflexive triple will appear among the deduced triples.\nGiven an OntClass object, you can create or remove members of the class extension – individuals that are instances of the class – using the following methods:\nMethod Meaning listInstances()\nlistInstances(boolean direct) Returns an iterator over those instances that include this class among their rdf:type values. The direct flag can be used to select individuals that are direct members of the class, rather than indirectly through the class hierarchy. Thus if p1 has rdf:type :Paper, it will appear in the iterator returned by listInstances on :Artefact, but not in the iterator returned by listInstances(true) on :Artefact.
createIndividual()\ncreateIndividual(String uri) Adds a resource to the model, whose asserted rdf:type is this ontology class. If no URI is given, the individual is an anonymous resource. dropIndividual(Resource individual) Removes the association between the given individual and this ontology class. Effectively, this removes the rdf:type link between this class and the resource. Note that this is not the same as removing the individual altogether, unless the only thing that is known about the resource is that it is a member of the class. To delete an OntResource, including classes and individuals, use the remove() method. To test whether a class is a root of the class hierarchy in this model (i.e. it has no known super-classes), call isHierarchyRoot().\nThe domain of a property is intended to allow entailments about the class of an individual, given that it appears as a statement subject. It is not a constraint that can be used to validate a document, in the way that XML schema can do. Nevertheless, many developers find it convenient to use the domain of a property to document the design intent that the property only applies to known instances of the domain class. Given this observation, it can be a useful debugging or display aide to show the properties that have this class among their domain classes. The method listDeclaredProperties() attempts to identify the properties that are intended to apply to instances of this class. Using listDeclaredProperties is explained in detail in the RDF frames how-to.\nOntology properties In an ontology, a property denotes the name of a relationship between resources, or between a resource and a data value. It corresponds to a predicate in logic representations. One interesting aspect of RDFS and OWL is that properties are not defined as aspects of some enclosing class, but are first-class objects in their own right. This means that ontologies and ontology-applications can store, retrieve and make assertions about properties directly. Consequently, Jena has a set of Java classes that allow you to conveniently manipulate the properties represented in an ontology model.\nA property in an ontology model is an extension of the core Jena API class Property and allows access to the additional information that can be asserted about properties in an ontology language. The common API super-class for representing ontology properties in Java is OntProperty. Again, using the pattern of add, set, get, list, has, and remove methods, we can access the following attributes of an OntProperty:\nAttribute Meaning subProperty A sub property of this property; i.e. a property which is declared to be a subPropertyOf this property. If p is a sub property of q, and we know that A p B is true, we can infer that A q B is also true. superProperty A super property of this property, i.e. a property that this property is a subPropertyOf domain Denotes the class or classes that form the domain of this property. Multiple domain values are interpreted as a conjunction. The domain denotes the class of value the property maps from. range Denotes the class or classes that form the range of this property. Multiple range values are interpreted as a conjunction. The range denotes the class of values the property maps to. equivalentProperty Denotes a property that is the same as this property. inverse Denotes a property that is the inverse of this property. Thus if q is the inverse of p, and we know that A q B, then we can infer that B p A. 
In the example ontology, the property hasProgramme has a domain of OrganizedEvent, a range of Programme and the human-readable label \u0026ldquo;has programme\u0026rdquo;. We can reconstruct this definition in an empty ontology model as follows:\nOntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM ); OntClass programme = m.createClass( NS + \u0026quot;Programme\u0026quot; ); OntClass orgEvent = m.createClass( NS + \u0026quot;OrganizedEvent\u0026quot; ); ObjectProperty hasProgramme = m.createObjectProperty( NS + \u0026quot;hasProgramme\u0026quot; ); hasProgramme.addDomain( orgEvent ); hasProgramme.addRange( programme ); hasProgramme.addLabel( \u0026quot;has programme\u0026quot;, \u0026quot;en\u0026quot; ); As a further example, we can alternatively add information to an existing ontology. To add a super-property hasDeadline, to generalise the separate properties denoting the submission deadline, notification deadline and camera-ready deadline, do:\nOntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM ); m.read( \u0026quot;http://www.eswc2006.org/technologies/ontology\u0026quot; ); DatatypeProperty subDeadline = m.getDatatypeProperty( NS + \u0026quot;hasSubmissionDeadline\u0026quot; ); DatatypeProperty notifyDeadline = m.getDatatypeProperty( NS + \u0026quot;hasNotificationDeadline\u0026quot; ); DatatypeProperty cameraDeadline = m.getDatatypeProperty( NS + \u0026quot;hasCameraReadyDeadline\u0026quot; ); DatatypeProperty deadline = m.createDatatypeProperty( NS + \u0026quot;deadline\u0026quot; ); deadline.addDomain( m.getOntClass( NS + \u0026quot;Call\u0026quot; ) ); deadline.addRange( XSD.dateTime ); deadline.addSubProperty( subDeadline ); deadline.addSubProperty( notifyDeadline ); deadline.addSubProperty( cameraDeadline ); Note that, although we called the addSubProperty method on the object representing the new super-property, the serialized form of the ontology will contain rdfs:subPropertyOf axioms on each of the sub-property resources, since this is what the language defines. Jena will, in general, try to allow symmetric access to sub-properties and sub-classes from either direction.\nObject and Datatype properties OWL refines the basic property type from RDF into two sub-types: object properties and datatype properties (for more details see [OWL Reference]). The difference between them is that an object property can have only individuals in its range, while a datatype property has concrete data literals (only) in its range. Some OWL reasoners are able to exploit the differences between object and datatype properties to perform more efficient reasoning over ontologies. OWL also adds an annotation property, which is defined to have no semantic entailments, and so is useful when annotating ontology documents, for example.\nIn Jena, the Java interfaces ObjectProperty, DatatypeProperty and AnnotationProperty are sub-types of OntProperty. However, they do not have any behaviours (methods) particular to themselves. Their existence allows the more complex sub-types of ObjectProperty – transitive properties and so forth – to be kept separate in the class hierarchy. However, when you create an object property or datatype property in a model, it will have the effect of asserting different rdf:type statements into the underlying triple store.\nFunctional properties OWL permits object and datatype properties to be functional – that is, for a given individual in the domain, the range value will always be the same.
In particular, if father is a functional property, and individual :jane has father :jim and father :james, a reasoner is entitled to conclude that :jim and :james denote the same individual. A functional property is equivalent to stating that the property has a maximum cardinality of one.\nBeing a functional property is represented through the FunctionalProperty facet of an ontology property object. If a property is declared functional (test using the isFunctional() method), then the method asFunctionalProperty() conveniently returns the functional property facet. A non-functional property can be made functional through the convertToFunctionalProperty() method. When you are creating a property object, you also have the option of passing a Boolean parameter to the createObjectProperty() method on OntModel.\nOther property types There are several additional sub-types of ObjectProperty that represent additional capabilities of ontology properties. A TransitiveProperty means that if p is transitive, and we know :a p :b and also :b p :c, we can infer that :a p :c. A SymmetricProperty means that if p is symmetric, and we know :a p :b, we can infer :b p :a. An InverseFunctionalProperty means that for any given range element, the domain value is unique.\nGiven that all properties are RDFNode objects, and therefore support the as() method, you can use as() to change from an object property facet to a transitive property facet. To make this more straightforward, the OntProperty Java class has a number of methods that support directly switching to the corresponding facet view:\npublic TransitiveProperty asTransitiveProperty(); public FunctionalProperty asFunctionalProperty(); public SymmetricProperty asSymmetricProperty(); public InverseFunctionalProperty asInverseFunctionalProperty(); These methods all assume that the underlying model will support this change in perspective. If not, the operation will fail with a ConversionException. For example, if a given property p is not asserted to be a transitive property in the underlying RDF model, then invoking p.asTransitiveProperty() will throw a conversion exception. The following methods will, if necessary, add additional information (i.e. the additional rdf:type statement) to allow the conversion to an alternative facet to succeed.\npublic TransitiveProperty convertToTransitiveProperty(); public FunctionalProperty convertToFunctionalProperty(); public SymmetricProperty convertToSymmetricProperty(); public InverseFunctionalProperty convertToInverseFunctionalProperty(); Sometimes it is convenient not to check whether the .as() conversion is warranted by the underlying data. This may be the case, for example, if the developer knows that the conversions are correct given the information from an external ontology which is not currently loaded. To allow .as() to always succeed, set the attribute strictMode to false on the OntModel object: myOntModel.setStrictMode( false ).\nFinally, methods beginning is... (e.g. isTransitiveProperty) allow you to test whether a given property would support a given sub-type facet.\nMore complex class expressions We introduced the handling of basic, named classes above. These are the only kind of class description available in RDFS. In OWL, however, there are a number of additional types of class expression, which allow richer and more expressive descriptions of concepts. There are two main categories of additional class expression: restrictions and Boolean expressions.
We\u0026rsquo;ll examine each in turn.\nRestriction class expressions A restriction defines a class by reference to one of the properties of the individuals that comprise the members of the class, and then placing some constraint on that property. For example, in a simple view of animal taxonomy, we might say that mammals are covered in fur, and birds in feathers. Thus the property hasCovering is in one case restricted to have the value fur, in the other to have the value feathers. This is a has value restriction. Six restriction types are currently defined by OWL:\nRestriction type Meaning has value The restricted property has exactly the given value. all values from All values of the restricted property, if it has any, are members of the given class. some values from The property has at least one value which is a member of the given class. cardinality The property has exactly n values, for some positive integer n. min cardinality The property has at least n values, for some positive integer n. max cardinality The property has at most n values, for some positive integer n. Note that, at present, the Jena ontology API has only limited support for OWL2\u0026rsquo;s qualified cardinality restrictions (i.e. cardinalityQ, minCardinalityQ and maxCardinalityQ). Qualified cardinality restrictions are encapsulated in the interfaces CardinalityQRestriction, MinCardinalityQRestriction and MaxCardinalityQRestriction. OntModel also provides methods for creating and accessing qualified cardinality restrictions. Since they are not part of the OWL 1.0 language definition, qualified cardinality restrictions are not supported in OWL 1.0 ontologies. Qualified cardinality restrictions were added to the OWL 2 update. OWL2 support in Jena will be added in due course.\nJena provides a number of ways of creating restrictions, or retrieving them from a model. Firstly, you can retrieve a general restriction from the model by its URI, if known.\n// get restriction with a given URI Restriction r = m.getRestriction( NS + \u0026quot;theName\u0026quot; ); You can create a new restriction by nominating the property that the restriction applies to:\n// anonymous restriction on property p OntProperty p = m.createOntProperty( NS + \u0026quot;p\u0026quot; ); Restriction anonR = m.createRestriction( p ); Since a restriction is typically not assigned a URI in an ontology, retrieving an existing restriction by name may not be possible. However, you can list all of the restrictions in a model and search for the one you want:\nIterator\u0026lt;Restriction\u0026gt; i = m.listRestrictions(); while (i.hasNext()) { Restriction r = i.next(); if (isTheOne( r )) { // handle the restriction } } A common case is that we want the restrictions on some property p. In this case, from an object denoting p we can list the restrictions that mention that property:\nOntProperty p = m.getOntProperty( NS + \u0026quot;p\u0026quot; ); Iterator\u0026lt;Restriction\u0026gt; i = p.listReferringRestrictions(); while (i.hasNext()) { Restriction r = i.next(); // now handle the restriction ... } A general restriction can be converted to a specific type of restriction via as... methods (if the information is already in the model), or, if the information is not in the model, via convertTo... methods.
For example, to convert the example restriction r from the example above to an all values from restriction, we can do the following:\nOntClass c = m.createClass( NS + \u0026quot;SomeClass\u0026quot; ); AllValuesFromRestriction avf = r.convertToAllValuesFromRestriction( c ); To create a particular restriction ab initio, we can use the creation methods defined on OntModel. For example:\nOntClass c = m.createClass( NS + \u0026quot;SomeClass\u0026quot; ); ObjectProperty p = m.createObjectProperty( NS + \u0026quot;p\u0026quot; ); // null denotes the URI in an anonymous restriction AllValuesFromRestriction avf = m.createAllValuesFromRestriction( null, p, c ); Assuming that the above code fragment was using a model m which was created with the OWL language profile, it creates an instance of an OWL restriction that would have the following definition in RDF/XML:\n\u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#p\u0026quot;/\u0026gt; \u0026lt;owl:allValuesFrom rdf:resource=\u0026quot;#SomeClass\u0026quot;/\u0026gt; \u0026lt;/owl:Restriction\u0026gt; Once we have a particular restriction object, there are methods following the standard add, get, set and test naming pattern to access the aspects of the restriction. For example, in a camera ontology, we might find this definition of a class describing Large-Format cameras:\n\u0026lt;owl:Class rdf:ID=\u0026quot;Large-Format\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf rdf:resource=\u0026quot;#Camera\u0026quot;/\u0026gt; \u0026lt;rdfs:subClassOf\u0026gt; \u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#body\u0026quot;/\u0026gt; \u0026lt;owl:allValuesFrom rdf:resource=\u0026quot;#BodyWithNonAdjustableShutterSpeed\u0026quot;/\u0026gt; \u0026lt;/owl:Restriction\u0026gt; \u0026lt;/rdfs:subClassOf\u0026gt; \u0026lt;/owl:Class\u0026gt; Here\u0026rsquo;s one way to access the components of the all values from restriction. Assume m contains a suitable camera ontology:\nOntClass largeFormat = m.getOntClass( camNS + \u0026quot;Large-Format\u0026quot; ); for (Iterator\u0026lt;OntClass\u0026gt; i = largeFormat.listSuperClasses( true ); i.hasNext(); ) { OntClass c = i.next(); if (c.isRestriction()) { Restriction r = c.asRestriction(); if (r.isAllValuesFromRestriction()) { AllValuesFromRestriction av = r.asAllValuesFromRestriction(); System.out.println( \u0026quot;AllValuesFrom class \u0026quot; + av.getAllValuesFrom().getURI() + \u0026quot; on property \u0026quot; + av.getOnProperty().getURI() ); } } } Boolean class expressions Most developers are familiar with the use of Boolean operators to construct propositional expressions: conjunction (and), disjunction (or) and negation (not). OWL provides a means for constructing expressions describing classes with analogous operators, by considering class descriptions in terms of the set of individuals that comprise the members of the class.\nSuppose we wish to say that an instance x has rdf:type A and rdf:type B. This means that x is both a member of the set of individuals in A, and in the set of individuals in B. Thus, x lies in the intersection of classes A and B. If, on the other hand, x has either rdf:type A or rdf:type B, then x must lie in the union of A and B. Finally, to say that x does not have rdf:type A, it must lie in the complement of A. These operations, union, intersection and complement are the Boolean operators for constructing class expressions.
While complement takes only a single argument, union and intersection must necessarily take more than one argument. Before continuing with constructing and using Boolean class expressions, let\u0026rsquo;s briefly discuss lists.\nList expressions RDF originally had three container types: Seq, Alt and Bag. While useful, these are all open forms: it is not possible to say that a given container has a fixed number of values. Lists have subsequently been added to the core RDF specification, and are used extensively in OWL. A list follows the well-known cons cell pattern from Lisp, Prolog and other list-handling languages. Each cell of a list is either the end-of-list terminator (nil in Lisp), or is a pair consisting of a value and a pointer to the cell that is the first cell on the tail of the list. In RDF lists, the end-of-list is marked by a resource with name rdf:nil, while each list cell is an anonymous resource with two properties, one denoting the tail and the other the value. Fortunately, this complexity is hidden by some simple syntax:\n\u0026lt;p rdf:parseType=\u0026quot;Collection\u0026quot;\u0026gt; \u0026lt;A /\u0026gt; \u0026lt;B /\u0026gt; \u0026lt;/p\u0026gt; According to the RDF specification, this list of two elements has the following expansion as RDF triples:\n\u0026lt;p\u0026gt; \u0026lt;rdf:first\u0026gt;\u0026lt;A /\u0026gt;\u0026lt;/rdf:first\u0026gt; \u0026lt;rdf:rest\u0026gt; \u0026lt;rdf:first\u0026gt;\u0026lt;B /\u0026gt;\u0026lt;/rdf:first\u0026gt; \u0026lt;rdf:rest rdf:resource=\u0026quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#nil\u0026quot;/\u0026gt; \u0026lt;/rdf:rest\u0026gt; \u0026lt;/p\u0026gt; Given this construction, a well formed list (one with exactly one rdf:first and rdf:rest per cons cell) has a precisely determined set of members. Incidentally, the same list in Turtle is even more compact:\n:example :p ( :A :B ). Although lists are defined in the generic RDF model in Jena, they are extensively used by the ontology API so we mention them here. Full details of the methods defined are in the RDFList javadoc.\nVarious means of constructing lists are defined in Model, as variants on createList. For example, we can construct a list of three classes as follows:\nOntModel m = ModelFactory.createOntologyModel(); OntClass c0 = m.createClass( NS + \u0026quot;c0\u0026quot; ); OntClass c1 = m.createClass( NS + \u0026quot;c1\u0026quot; ); OntClass c2 = m.createClass( NS + \u0026quot;c2\u0026quot; ); RDFList cs = m.createList( new RDFNode[] {c0, c1, c2} ); Alternatively, we can build a list one element at a time:\nOntModel m = ModelFactory.createOntologyModel(); RDFList cs = m.createList(); // cs is empty cs = cs.cons( m.createClass( NS + \u0026quot;c0\u0026quot; ) ); cs = cs.cons( m.createClass( NS + \u0026quot;c1\u0026quot; ) ); cs = cs.cons( m.createClass( NS + \u0026quot;c2\u0026quot; ) ); Note that these two approaches end with the classes in the lists in opposite orders, since the cons operation adds a new list cell to the front of the list. Thus the second list will run c2 to c0. In the ontology operations we are discussing here, the order of values in the list is not considered significant.\nFinally, a resource which is a cell in a list sequence will accept .as( RDFList.class ).\nOnce the list has been created or obtained from the model, RDFList methods may be used to access members of the list, iterate over the list, and so forth.
For example:\nSystem.out.println( \u0026quot;List has \u0026quot; + myRDFList.size() + \u0026quot; members:\u0026quot; ); for (Iterator\u0026lt;RDFNode\u0026gt; i = myRDFList.iterator(); i.hasNext(); ) { System.out.println( i.next() ); } Intersection, union and complement class expressions Given Jena\u0026rsquo;s ability to construct lists, building intersection and union class expressions is straightforward. The create methods on OntModel allow us to construct an intersection or union directly. Alternatively, given an existing OntClass, we can use the convertTo... methods to construct a facet representing the more specialised expressions. For example, we can define the class of UK industry-related conferences as the intersection of conferences with a UK location and conferences with an industrial track. Here\u0026rsquo;s the XML declaration:\n\u0026lt;owl:Class rdf:ID=\u0026quot;UKIndustrialConference\u0026quot;\u0026gt; \u0026lt;owl:intersectionOf rdf:parseType=\u0026quot;Collection\u0026quot;\u0026gt; \u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#hasLocation\u0026quot;/\u0026gt; \u0026lt;owl:hasValue rdf:resource=\u0026quot;#united_kingdom\u0026quot;/\u0026gt; \u0026lt;/owl:Restriction\u0026gt; \u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#hasPart\u0026quot;/\u0026gt; \u0026lt;owl:someValuesFrom rdf:resource=\u0026quot;#IndustryTrack\u0026quot;/\u0026gt; \u0026lt;/owl:Restriction\u0026gt; \u0026lt;/owl:intersectionOf\u0026gt; \u0026lt;/owl:Class\u0026gt; Or, more compactly in N3/Turtle:\n:UKIndustrialConference a owl:Class ; owl:intersectionOf ( [a owl:Restriction ; owl:onProperty :hasLocation ; owl:hasValue :united_kingdom] [a owl:Restriction ; owl:onProperty :hasPart ; owl:someValuesFrom :IndustryTrack] ) . Here is code to create this class declaration using Jena, assuming that m is a model into which the ESWC ontology has been read:\n// get the class references OntClass place = m.getOntClass( NS + \u0026quot;Place\u0026quot; ); OntClass indTrack = m.getOntClass( NS + \u0026quot;IndustryTrack\u0026quot; ); // get the property references ObjectProperty hasPart = m.getObjectProperty( NS + \u0026quot;hasPart\u0026quot; ); ObjectProperty hasLoc = m.getObjectProperty( NS + \u0026quot;hasLocation\u0026quot; ); // create the UK instance Individual uk = place.createIndividual( NS + \u0026quot;united_kingdom\u0026quot; ); // now the anonymous restrictions HasValueRestriction ukLocation = m.createHasValueRestriction( null, hasLoc, uk ); SomeValuesFromRestriction hasIndTrack = m.createSomeValuesFromRestriction( null, hasPart, indTrack ); // finally create the intersection class IntersectionClass ukIndustrialConf = m.createIntersectionClass( NS + \u0026quot;UKIndustrialConference\u0026quot;, m.createList( new RDFNode[] {ukLocation, hasIndTrack} ) ); Union and intersection class expressions are very similar, so Jena defines a common super-class BooleanClassDescription. This class provides access to the operands to the expression. In the intersection example above, the operands are the two restrictions. The BooleanClassDescription class allows us to set the operands en masse by supplying a list, or to add or delete them one at a time.\nComplement class expressions are very similar. The principal difference is that they take only a single class as operand, and therefore do not accept a list of operands.\nEnumerated classes The final type of class expression allowed by OWL is the enumerated class.
Recall that a class is a set of individuals. Often, we want to define the members of the class implicitly: for example, \u0026ldquo;the class of UK conferences\u0026rdquo;. Sometimes it is convenient to define a class explicitly, by stating the individuals the class contains. An enumerated class is exactly the class whose members are the given individuals. For example, we know that the class of PrimaryColours contains exactly red, green and blue, and no others.\nIn Jena, an enumerated class is created in a similar way to other classes. The set of values that comprise the enumeration is described by an RDFList. For example, here\u0026rsquo;s a class defining the countries that comprise the United Kingdom:\n\u0026lt;owl:Class rdf:ID=\u0026quot;UKCountries\u0026quot;\u0026gt; \u0026lt;owl:oneOf rdf:parseType=\u0026quot;Collection\u0026quot;\u0026gt; \u0026lt;eswc:Place rdf:about=\u0026quot;#england\u0026quot;/\u0026gt; \u0026lt;eswc:Place rdf:about=\u0026quot;#scotland\u0026quot;/\u0026gt; \u0026lt;eswc:Place rdf:about=\u0026quot;#wales\u0026quot;/\u0026gt; \u0026lt;eswc:Place rdf:about=\u0026quot;#northern_ireland\u0026quot;/\u0026gt; \u0026lt;/owl:oneOf\u0026gt; \u0026lt;/owl:Class\u0026gt; To create and then list the contents of this enumeration, we could do the following:\nOntClass place = m.getOntClass( NS + \u0026quot;Place\u0026quot; ); EnumeratedClass ukCountries = m.createEnumeratedClass( NS + \u0026quot;UKCountries\u0026quot;, null ); ukCountries.addOneOf( place.createIndividual( NS + \u0026quot;england\u0026quot; ) ); ukCountries.addOneOf( place.createIndividual( NS + \u0026quot;scotland\u0026quot; ) ); ukCountries.addOneOf( place.createIndividual( NS + \u0026quot;wales\u0026quot; ) ); ukCountries.addOneOf( place.createIndividual( NS + \u0026quot;northern_ireland\u0026quot; ) ); for (Iterator i = ukCountries.listOneOf(); i.hasNext(); ) { Resource r = (Resource) i.next(); System.out.println( r.getURI() ); } An OWL DataRange is similar to an enumerated class, except that the members of the DataRange are literal values, such as integers, dates or strings. See the DataRange javadoc for more details.\nListing classes In many applications, we need to inspect the set of classes in an ontology. The list... methods on OntModel provide a variety of means of listing types of class. The methods available include:\npublic ExtendedIterator\u0026lt;OntClass\u0026gt; listClasses(); public ExtendedIterator\u0026lt;EnumeratedClass\u0026gt; listEnumeratedClasses(); public ExtendedIterator\u0026lt;UnionClass\u0026gt; listUnionClasses(); public ExtendedIterator\u0026lt;ComplementClass\u0026gt; listComplementClasses(); public ExtendedIterator\u0026lt;IntersectionClass\u0026gt; listIntersectionClasses(); public ExtendedIterator\u0026lt;Restriction\u0026gt; listRestrictions(); public ExtendedIterator\u0026lt;OntClass\u0026gt; listNamedClasses(); public ExtendedIterator\u0026lt;OntClass\u0026gt; listHierarchyRootClasses(); The last two methods deserve special mention. In OWL, class expressions are typically not named, but are denoted by anonymous resources (aka bNodes). In many applications, such as displaying an ontology in a user interface, we want to pick out the named classes only, ignoring those denoted by bNodes. This is what listNamedClasses() does. The method listHierarchyRootClasses() identifies the classes that are uppermost in the class hierarchy contained in the given model. These are the classes that have no super-classes. The iteration returned by listHierarchyRootClasses() may contain anonymous classes.
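For example, a minimal sketch (assuming m is an OntModel, as in the earlier examples) that prints each hierarchy root and marks the anonymous ones:
for (Iterator<OntClass> i = m.listHierarchyRootClasses(); i.hasNext(); ) {
    OntClass root = i.next();
    // anonymous class expressions have no URI, so print the bNode id instead
    System.out.println( root.isAnon() ? "anonymous root: " + root.getId() : root.getURI() );
}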
To get a list of named hierarchy root classes, i.e. the named classes that lie closest to the top of the hierarchy (alternatively: the shallowest fringe of the hierarchy consisting solely of named classes), use the OntTools method namedHierarchyRoots().\nYou should also note that it is important to close the iterators returned from the list... methods, particularly when the underlying store is a database. This is necessary so that any state (e.g. the database connection resources) can be released. Closing happens automatically when the hasNext() method on the iterator returns false. If your code does not iterate all the way to the end of the iterator, you should call the close() method explicitly. Note also that the values returned by these iterators will depend on the asserted data and the reasoner being used. For example, if the model contains a Restriction, that restriction will only be returned by the listClasses() iterator if the model is bound to a reasoner that can infer that any restriction is also a class, since Restriction is a subClassOf Class. This difference can be exploited by the programmer: to list classes and restrictions separately, perform the listClasses() and listRestrictions() methods on the base model only, or on a model with no reasoner attached.\nInstances or individuals In OWL Full any value can be an individual – and thus the subject of triples in the RDF graph other than ontology declarations. In OWL Lite and DL, the language terms and the instance data that the application is working with are kept separate, by definition of the language. Jena therefore supports a simple notion of an Individual, which is essentially an alias for Resource. While Individuals are largely synonymous with Resources, they do provide a programming interface that is consistent with the other Java classes in the ontology API.\nThere are two ways to create individuals. Both require the class to which the individual will initially belong:\nOntClass c = m.createClass( NS + \u0026quot;SomeClass\u0026quot; ); // first way: use a call on OntModel Individual ind0 = m.createIndividual( NS + \u0026quot;ind0\u0026quot;, c ); // second way: use a call on OntClass Individual ind1 = c.createIndividual( NS + \u0026quot;ind1\u0026quot; ); The only real difference between these approaches is that the second way will create the individual in the same model that the class is attached to (see the getModel() method). In both of the above examples the individual is named, but this is not necessary. The method OntModel.createIndividual( Resource cls ) creates an anonymous individual belonging to the given class. Note that the type of the class parameter is only Resource. You are not required to use as() to present a Resource as an OntClass before calling this method, though of course an OntClass is a Resource so using an OntClass will work perfectly well.\nIndividual provides a set of methods for testing and manipulating the ontology classes to which an individual belongs. This is a convenience: OWL and RDFS denote class membership through the rdf:type property, and methods for manipulating and testing rdf:type are defined on OntResource. You may use either approach interchangeably.\nOntology meta-data In OWL, but not RDFS, meta-data about the ontology itself is encoded as properties on an individual of class owl:Ontology. By convention, the URI of this individual is the URL, or web address, of the ontology document itself.
In the XML serialisation, this is typically shown as:\n\u0026lt;owl:Ontology rdf:about=\u0026quot;\u0026quot;\u0026gt; \u0026lt;/owl:Ontology\u0026gt; Note that the construct rdf:about=\u0026quot;\u0026quot; does not indicate a resource with no URI; it is in fact a shorthand way of referencing the base URI of the document containing the ontology. The base URI may be stated in the document through an xml:base declaration in the XML preamble. The base URI can also be specified when reading the document via Jena\u0026rsquo;s Model API (see the read() methods on OntModel for reference).\nWe can attach various meta-data statements to this object to indicate attributes of the ontology as a whole. The Java object Ontology represents this special instance, and uses the standard add, set, get, list, test and delete pattern to provide access to the following attributes:\nAttribute Meaning backwardCompatibleWith Names a prior version of this ontology that this version is compatible with. incompatibleWith Names a prior version of this ontology that this version is not compatible with priorVersion Names a prior version of this ontology. imports Names an ontology whose definitions this ontology includes In addition to these attributes, the Ontology element typically contains common meta-data properties, such as comment, label and version information.\nIn the Jena API, the ontology\u0026rsquo;s metadata properties can be accessed through the Ontology interface. Suppose we wish to know the list of URI\u0026rsquo;s that the ontology imports. First we must obtain the resource representing the ontology itself:\nString base = ...; // the base URI of the ontology OntModel m = ...; // the model containing the ontology statements Ontology ont = m.getOntology( base ); // now list the ontology imports for (String imp : ont.listImportedOntologyURIs()) { System.out.println( \u0026quot;Ontology \u0026quot; + base + \u0026quot; imports \u0026quot; + imp ); } If the base URI of the ontology is not known, you can list all resources of rdf:type Ontology in a given model by OntModel.listOntologies(). If there is only one of these, it should be safe to assume that it is the Ontology resource for the ontology. However, you should note that if more than one ontology document has been read in to the model (for example by including the imports of a document), there may well be more than one Ontology resource in the model. In this case, you may find it useful to list the ontology resources in just the base model:\nOntModel m = ... // the model, including imports OntModel mBase = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM, m.getBaseModel() ); for (Iterator i = mBase.listOntologies(); i.hasNext(); ) { Ontology ont = (Ontology) i.next(); // m's base model has ont as an import ... } A common practice is also to use the Ontology element to attach Dublin Core metadata to the ontology document. Jena provides a copy of the Dublin Core vocabulary, in org.apache.jena.vocabulary.DCTerms. 
To attach a statement saying that the ontology was authored by John Smith, we can say:\nOntology ont = m.getOntology( baseURI ); ont.addProperty( DCTerms.creator, \u0026quot;John Smith\u0026quot; ); It is also possible to programmatically add imports and other meta-data to a model, for example:\nString base = ...; // the base URI of the ontology OntModel m = ...; Ontology ont = m.createOntology( base ); ont.addImport( m.createResource( \u0026quot;http://example.com/import1\u0026quot; ) ); ont.addImport( m.createResource( \u0026quot;http://example.com/import2\u0026quot; ) ); Note that under default conditions, simply adding (or removing) an owl:imports statement to a model will not cause the corresponding document to be imported (or removed). However, by calling OntModel.setDynamicImports(true), the model will start noticing the addition or removal of owl:imports statements.\nOntology inference: overview You have the choice of whether to use the Ontology API with Jena\u0026rsquo;s reasoning capability turned on, and, if so, which of the various reasoners to use. Sometimes a reasoner will add information to the ontology model that it is not useful for your application to see. A good example is an ontology editor. Here, you may wish to present your users with the information they have entered in to their ontology; the addition of the entailed information into the editor\u0026rsquo;s display would be very confusing. Since Jena does not have a means for distinguishing inferred statements from those statements asserted into the base model, a common choice for ontology editors and similar applications is to run with no reasoner.\nIn many other cases, however, it is the addition of the reasoner that makes the ontology useful. For example, if we know that John is the father of Mary, we would expect a \u0026lsquo;yes\u0026rsquo; if we query whether John is the parent of Mary. The parent relationship is not asserted, but we know from our ontology that fatherOf is a sub-property of parentOf. If \u0026lsquo;John fatherOf Mary\u0026rsquo; is true, then \u0026lsquo;John parentOf Mary\u0026rsquo; is also true. The integrated reasoning capability in Jena exists to allow just such entailments to be seen and used.\nFor a complete and thorough description of Jena\u0026rsquo;s inference capabilities, please see the reasoner documentation. This section of of the ontology API documentation is intended to serve as only a brief guide and overview.\nRecall from the introduction that the reasoners in Jena operate by making it appear that triples entailed by the inference engine are part of the model in just the same way as the asserted triples (see Figure 2). The underlying architecture allows the reasoner to be part of the same Java virtual machine (as is the case with the built-in rule-based reasoners), or in a separate process on the local computer, or even a remote computer. Of course, each of these choices will have different characteristics of what reasoning capabilities are supported, and what the implications for performance are.\nThe reasoner attached to an ontology model, if any, is specified through the OntModelSpec. The methods setReasoner() and setReasonerFactory() on the model spec are used to specify a reasoner. The setReasoner variant is intended for use on a specification which will only be used to build a single model. The factory variant is used where the OntModelSpec will be used to build more than one model, ensuring that each model gets its own reasoner object. 
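For instance, a minimal sketch of the factory variant might look like this (RDFSRuleReasonerFactory, from org.apache.jena.reasoner.rulesys, is just one of the registered reasoner factories; any registered factory could be substituted):
OntModelSpec spec = new OntModelSpec( OntModelSpec.OWL_MEM );
// each model subsequently built from this spec gets its own reasoner instance
spec.setReasonerFactory( RDFSRuleReasonerFactory.theInstance() );
OntModel m = ModelFactory.createOntologyModel( spec );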
The ReasonerRegistry provides a collection of pre-built reasoners – see the reasoner documentation for more details. However, it is also possible for you to define your own reasoner that conforms to the appropriate interface. For example, there is an in-process interface to the open-source Pellet reasoner.\nTo facilitate the choice of reasoners for a given model, some common choices have been included in the pre-built ontology model specifications available as static fields on OntModelSpec. The available choices are described in the section on ont model specifications, above.\nDepending on which of these choices is made, the statements returned from queries to a given ontology model may vary considerably.\nAdditional notes Jena\u0026rsquo;s inference machinery defines some specialised services that are not exposed through the addition of extra triples to the model. These are exposed by the InfModel interface; for convenience OntModel extends this interface to make these services directly available to the user. Please note that calling inference-specific methods on an ontology model that does not contain a reasoner will have unpredictable results. Typically these methods will have no effect or return null, but you should not rely on this behaviour.\nIn general, inference models will add many additional statements to a given model, including the axioms appropriate to the ontology language. This is typically not something you will want in the output when the model is serialized, so write() on an ontology model will only write the statements from the base model. This is typically the desired behaviour, but there are occasions (e.g. during debugging) when you may want to write the entire model, virtual triples included. The easiest way to achieve this is to call the writeAll() method on OntModel. An alternative technique, which can sometimes be useful for a variety of use-cases, including debugging, is to snapshot the model by constructing a temporary plain model and adding to it: the contents of the ontology model:\nOntModel m = ... // snapshot the contents of ont model om Model snapshot = ModelFactory.createDefaultModel(); snapshot.add( om ); Working with persistent ontologies A common way to work with ontology data is to load the ontology axioms and instances at run-time from a set of source documents. This is a very flexible approach, but has limitations. In particular, your application must parse the source documents each time it is run. For large ontologies, this can be a source of significant overhead. Jena provides an implementation of the RDF model interface that stores the triples persistently in a database. This saves the overhead of loading the model each time, and means that you can store RDF models significantly larger than the computer\u0026rsquo;s main memory, but at the expense of a higher overhead (a database interaction) to retrieve and update RDF data from the model. In this section we briefly discuss using the ontology API with Jena\u0026rsquo;s persistent database models.\nFor information on setting-up and accessing the persistent models themselves, see the TDB reference sections.\nThere are two somewhat separate requirements for persistently storing ontology data. The first is making the main or base model itself persistent. The second is re-using or creating persistent models for the imports of an ontology. These two requirements are handled slightly differently.\nTo retrieve a Jena model from the database API, we have to know its name. 
Fortunately, common practice for ontologies on the Semantic Web is that each is named with a URI. We use this URI to name the model that is stored in the database. Note carefully what is actually happening here: we are exploiting a feature of the database sub-system to make persistently stored ontologies easy to retrieve, but we are not in any sense resolving the URI of the model. Once placed into the database, the name of the model is treated as an opaque string.\nTo create a persistent model for the ontology http://example.org/Customers, we create a model maker that will access our underlying database, and use the ontology URI as the database name. We then take the resulting persistent model, and use it as the base model when constructing an ontology model:\nModel base = getMaker().createModel( \u0026quot;http://example.org/Customers\u0026quot; ); OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_RULE_INF, base ); Here we assume that the getMaker() method returns a suitably initialized ModelMaker that will open the connection to the database. This step only creates a persistent model named with the ontology URI. To initialise the content, we must either add statements to the model using the OntModel API, or do a one-time read from a document:\nm.read( \u0026quot;http://example.org/Customers\u0026quot; ); Once this step is completed, the model contents may be accessed in future without needing to read again.\nIf the Customers ontology imports other ontologies, using owl:imports, the Jena Ontology API will build a union model containing the closure of the imports. Even if the base model is persistent, the predefined OntModelSpec objects only specify memory models to contain the imported ontologies, since memory models do not require any additional parameters.\nTo specify that the imported models should be stored in, and retrieved from, the database, we must update the ontology spec object to use the model maker that encapsulates the database connection:\nOntModelSpec spec = new OntModelSpec( OntModelSpec.OWL_MEM_RULE_INF ); // set the model maker for the base model spec.setBaseModelMaker( getMaker() ); // set the model maker for imports spec.setImportModelMaker( getMaker() ); This new model maker will then be used to generate persistent models named with the URI of the imported ontology, if this spec is passed, instead of OntModelSpec.OWL_MEM_RULE_INF, to the createOntologyModel method of the model factory. Note that once an import has been loaded into the database, it can be re-used by other ontologies that import it. Thus a given database will only contain at most one copy of each imported ontology.\nNote on performance The built-in Jena reasoners, including the rule reasoners, make many small queries into the model in order to propagate the effects of rules firing. When using a persistent database model, each of these small queries creates an SQL interaction with the database engine. This is a very inefficient way to interact with a database system, and performance suffers as a result. Efficient reasoning over large, persistent databases is currently an open research challenge. Our best suggested work-around is, where possible, to snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine. 
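A minimal sketch of this snapshot work-around, assuming as above that getMaker() returns the ModelMaker for the database and that the ontology has already been stored under the name http://example.org/Customers:\nModel persistent = getMaker().openModel( \u0026quot;http://example.org/Customers\u0026quot; ); // copy the persistent statements into a plain in-memory model Model inMemory = ModelFactory.createDefaultModel(); inMemory.add( persistent ); // run the reasoner over the in-memory copy OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_RULE_INF, inMemory );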
An alternative solution, that may be applicable if your application does not write to the datastore often, is to precompute the inference closure of the ontology and data in-memory, then store that into a database model to be queried by the run-time application. Such an off-line processing architecture will clearly not be applicable to every application problem.\nA sample program shows the above steps combined, to create an ontology in which both base model and imports are stored in a persistent database.\nExperimental ontology tools Starting with Jena release 2.6, the OntTools class provides a small collection of commonly-requested utilities for assisting with ontology processing. Given that this is a new feature, you should regard it as an experimental facility for the time being. We welcome feedback. The capabilities in OntTools are implemented as static methods. Currently available tools are:\nOntClass getLCA( OntModel m, OntClass u, OntClass v ) Determine the lowest common ancestor for classes u and v. This is the class that is lowest in the class hierarchy, and which includes both u and v among its sub-classes. Path findShortestPath( Model m, Resource start, RDFNode end, Filter onPath ) Breadth-first search, including a cycle check, to locate the shortest path from start to end, in which every triple on the path returns true to the onPath predicate. List namedHierarchyRoots( OntModel m ) Compute a list containing the uppermost fringe of the class hierarchy in the given model which consists only of named classes. ","permalink":"https://jena.apache.org/documentation/ontology/","tags":null,"title":"Jena Ontology API"},{"categories":null,"contents":"Jena Permissions is a SecurityEvaluator interface and a set of dynamic proxies that apply that interface to Jena Graphs, Models, and associated methods and classes. It does not implement any specific security policy but provides a framework for developers or integrators to implement any desired policy.\nDocumentation Overview Usage Notes Design Security Evaluator implementation Assembler for a Secured Model Adding Jena Permissions to Fuseki Overview Jena Permissions transparently intercepts calls to the Graph or Model interface, evaluates access restrictions and either allows or rejects the access. The system is authentication agnostic and will work with most authentication systems. The system uses dynamic proxies to wrap any Graph or Model implementation. The Jena Permissions module includes an Assembler module to extend the standard Assembler to include the ability to create secured models and graphs. A complete example application is also available.\nThe developer using Jena Permissions is required to implement a SecurityEvaluator that provides access to the Principal (User) using the system and also determines if that Principal has the proper access to execute a method. Through the SecurityEvaluator the developer may apply full CRUD (Create, Read, Update, and Delete) restrictions to graphs and optionally triples within the graphs.\nThe javadocs have additional annotations that specify what permissions at graph and triple levels are required for the user to execute the method.\nThere is an example jar that contains configuration examples for both a stand alone application and a Fuseki configuration option.\nUsage Notes When the system is correctly configured the developer creates a SecuredGraph by calling Factory.getInstance( SecurityEvaluator, String, Graph );. 
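For example, a minimal sketch (the evaluator instance and the graph URI used here are illustrative assumptions, not part of the API itself):\nSecurityEvaluator myEvaluator = ...; // your SecurityEvaluator implementation Graph baseGraph = ...; // the graph to be protected SecuredGraph securedGraph = Factory.getInstance( myEvaluator, \u0026quot;http://example.com/securedGraph\u0026quot;, baseGraph );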
Once created the resulting graph automatically makes the appropriate calls to the SecurityEvaluator before passing any approved requests to the underlying graph.\nSecured models are created by calling Factory.getInstance( SecurityEvaluator, String, Model ); or ModelFactory.createModelForGraph( SecuredGraph );\nNOTE: when creating a model by wrapping a secured graph (e.g. ModelFactory.createModelForGraph( SecuredGraph );) the resulting Model does not have the same security requirements as the standard secured model. For example, when creating a list on a secured model by calling model.createList( RDFNode[] );, the standard secured model verifies that the user has the right to update the triples and allows or denies the entire operation accordingly. The wrapped secured graph does not have visibility into the createList() command and can only operate on the instructions issued by the model.createList() implementation. In the standard implementation the model requests the graph to delete one triple and then insert another. Thus the user must have delete and add permissions, not the update permission.\nThere are several other cases where the difference in the layer can trip up the security system. In all known cases the result is a tighter security definition than was requested. For simplicity\u0026rsquo;s sake we recommend that the wrapped secured graph only be used in cases where access to the graph as a whole is granted/denied. In these cases the user either has all CRUD capabilities or none.\n","permalink":"https://jena.apache.org/documentation/permissions/","tags":null,"title":"Jena Permissions - A Permissions wrapper around the Jena RDF implementation"},{"categories":null,"contents":"Jena Permissions provides a standard Jena assembler making it easy to use the SecuredModel in an Assembler based environment. To use the permissions assembler the assembler file must contain the lines:\n[] ja:loadClass \u0026quot;org.apache.jena.permissions.SecuredAssembler\u0026quot; . sec:Model rdfs:subClassOf ja:NamedModel . The secured assembler provides a number of resources and properties for use in the assembler files.\nAssuming we define:\nPREFIX sec: \u0026lt;http://apache.org/jena/permissions/Assembler#\u0026gt; Then the following resources are defined:\nsec:Model - A secured model. One against which the security evaluator is running access checks. All sec:Model instances must have a ja:ModelName to identify them to the SecurityEvaluator.\nsec:Evaluator - An instance of SecurityEvaluator.\nThe following properties are also defined:\nsec:evaluatorFactory - Identifies the class name of a factory class that implements a no-argument getInstance() method that returns an instance of SecurityEvaluator.\nsec:baseModel - Identifies the ja:Model that is to have permissions applied to it.\nsec:evaluatorImpl - Identifies an instance of SecurityEvaluator.\nsec:evaluatorClass - Identifies a class that implements SecurityEvaluator.\nsec:args - Identifies arguments to the sec:evaluatorClass constructor.\nThe secured assembler provides two (2) mechanisms to create a secured graph. The first is to use a SecurityEvaluator factory.\nmy:securedModel rdf:type sec:Model ; sec:baseModel my:baseModel ; ja:modelName \u0026quot;https://example.org/securedBaseModel\u0026quot; ; sec:evaluatorFactory \u0026quot;the.evaluator.factory.class.name\u0026quot; . In the above example static method getInstance() is called on the.evaluator.factory.class.name and the result is used as the SecurityEvaluator. 
This is used to create a secured model (my:securedModel) that wraps the model my:baseModel and identifies itself to the SecurityEvaluator with the URI \u0026quot;https://example.org/securedBaseModel\u0026quot;.\nThe second mechanism is to use the sec:Evaluator method.\nmy:secEvaluator rdf:type sec:Evaluator ; sec:args [ rdf:_1 my:secInfoModel ; ] ; sec:evaluatorClass \u0026quot;your.implementation.SecurityEvaluator\u0026quot; . my:securedModel rdf:type sec:Model ; sec:baseModel my:baseModel ; ja:modelName \u0026quot;https://example.org/securedBaseModel\u0026quot; ; sec:evaluatorImpl my:secEvaluator . In the above example my:secEvaluator is defined as a sec:Evaluator implemented by the class \u0026quot;your.implementation.SecurityEvaluator\u0026quot;. When the instance is constructed the constructor with one argument is used and it is passed my:secInfoModel as an argument. my:secInfoModel may be any type supported by the assembler. If more than one argument is desired then rdf:_2, rdf:_3, rdf:_4, etc. may be added to the sec:args list. The \u0026quot;your.implementation.SecurityEvaluator\u0026quot; constructor with the proper number of arguments will be called. It is an error for the class to have more than one constructor with the proper number of arguments.\nAfter construction, the value of my:secEvaluator is used to construct the my:securedModel instance. This has the same properties as the previous example other than that the SecurityEvaluator instance is different.\n","permalink":"https://jena.apache.org/documentation/permissions/assembler.html","tags":null,"title":"Jena Permissions - Assembler for a Secured Model"},{"categories":null,"contents":"Jena Permissions is designed to allow integrators to implement almost any security policy. Fundamentally it works by implementing dynamic proxies on top of the Jena Graph and Model interfaces as well as objects returned by those interfaces. The proxy verifies that the actions on those objects are permitted by the policy before allowing the actions to proceed.\nThe graph or model is created by the org.apache.jena.permissions.Factory object by wrapping a Graph or Model implementation and associating it with a URI (graphIRI) and a SecurityEvaluator implementation. The graphIRI is the URI that will be used to identify the graph/model to the security evaluator.\nThe SecurityEvaluator is an object implemented by the integrator to perform the necessary permission checks. A discussion of the SecurityEvaluator implementation can be found in the Security Evaluator documentation.\nAccess to methods in secured objects is determined by the CRUD (Create, Read, Update and Delete) permissions assigned to the user.\nThe system is designed to allow shallow (graph/model level) or deep (triple/statement level) decisions.\nWhen a secured method is called the system performs the following checks in order:\nDetermines if the user has proper access to the underlying graph/model. Generally the required permission is Update (for add or delete methods), or Read.\nIf the user has access to the graph/model, determine if the user has permission to execute the method against all triples/statements in the graph/model. This is performed by calling SecurityEvaluator.evaluate(principal, action, graphIRI, Triple.ANY). If the evaluator returns true then the action is permitted. This is the general case for shallow permission systems. 
For deep permissions systems false may be returned.\nIf the user does not have permission to execute the method against all triples/statements the SecurityEvaluator.evaluate(principal, action, graphIRI, triple) method is called with the specific triple (note special cases below). If the evaluator returns true the action is permitted, otherwise a properly detailed PermissionDeniedException is thrown.\nSpecial Cases SecurityEvaluator.FUTURE There are a couple of special cases where the Node/Resource is not known when the permission check is made. An example is the creation of an RDF List object. For example, to create an empty list the following triple/statement must be constructed:\n_:b1 rdf:first rdf:nil . However, the permissions system cannot know the value of _:b1 until after the triple/statement is constructed and added to the graph/model. To handle this situation the permissions system asks the evaluator to evaluate the triple: (SecurityEvaluator.FUTURE, RDF.first, RDF.nil) Similar situations are found when adding to a list, creating reified statements, RDF alt objects, RDF sequences, or RDF anonymous resources of a specific type.\nSecurityEvaluator.VARIABLE The Node.ANY node is used to identify the case where any node may be returned. Specifically, it asks whether the user can perform the action on all the nodes in this position in the triple. For example:\nNode.ANY RDF:type FOAF:Person Asks if the operation can be performed on all of the nodes of type FOAF:Person.\nThe SecurityEvaluator.VARIABLE differs from Node.ANY in that the system is asking if there are any prohibitions, and not whether the user may perform the action. Thus queries with the VARIABLE type node should return true where ANY returns false. In general this type is used in query evaluation to determine if triple-level filtering of results must be performed. Thus:\nSecurityEvaluator.VARIABLE RDF:type FOAF:Person Asks if there are any restrictions against the user performing the action against all triples of type FOAF:Person. The assumption is that checking for restrictions may be a faster check than checking for all access. Note that by returning true the permissions system will check each explicit triple for access permissions. So if the system cannot determine if there are access restrictions it is safe to return true.\nObjects Returned from Secured Objects Models and Graphs often return objects from methods. For example, model.createStatement() returns a Statement object. That object holds a reference to the model and performs operations against the model (for example Statement.changeLiteralObject()). Since permissions provides a dynamic wrapper around the base model to create the secured model, returning the underlying model\u0026rsquo;s Statement would return an object that no longer has any permissions applied. Therefore the permissions system creates a SecuredStatement that applies permission checks to all operations before calling the base Statement methods.\nAll secured objects return secured objects if those objects may read or alter the underlying graph/model.\nAll secured objects are defined as interfaces and are returned as dynamic proxies.\nAll secured objects have concrete implementations. These implementations must remain concrete to ensure that we handle all cases where returned objects may alter the underlying graph/model.\nSecured Listeners Both the Graph and the Model interfaces provide a listener framework. Listeners are attached to the graph/model and changes to the graph/model are reported to them. 
In order to ensure that listeners do not leak information, the principal that was active when the listener was attached is preserved in a CachedSecurityEvaluator instance. This security evaluator implementation wraps the original implementation and retains the current user. Thus, when the listener performs the permission checks, the original user is used, not the current user. This is why the SecurityEvaluator must use the principal parameters and not call getPrincipal() directly during evaluation calls.\nProxy Implementation The proxy implementation uses a reflection InvocationHandler strategy. This strategy results in a proxy that implements all the interfaces of the original object. The original object, along with its InvocationHandler instance, is kept in an ItemHolder instance variable in the secured instance. When the invoker is called it determines if the called method is on the secured interface or not. If the method is on the secured interface the invocation handler method is called, otherwise the method on the base class is called.\n","permalink":"https://jena.apache.org/documentation/permissions/design.html","tags":null,"title":"Jena Permissions - Design"},{"categories":null,"contents":"When Jena moved from version 2 to version 3 there was a major renaming of packages. One of the packages renamed was the Jena Permissions package. It was formerly named Jena Security. There are several changes that need to occur to migrate from jena-security version 2.x to jena-permissions version 3.x.\nChanges Package Rename There are two major changes to package names.\nAs with the rest of the Jena code all references to com.hp.hpl.jena have been changed to org.apache.jena. For integrator code this means that a simple rename of the includes is generally all that is required for this. See the main Migration Notes page for other hints and tips regarding this change.\nJena Security has been renamed Jena Permissions and the Maven artifact id has been changed to jena-permissions to reflect this change.\nThe permissions assembler namespace has been changed to http://apache.org/jena/permissions/Assembler#\nExceptions Formerly, Jena Permissions used a single exception to identify the access restriction violations. With the tighter integration of permission concepts into the Jena core there are now 7 exceptions. This change will probably not require modification to the SecurityEvaluator implementation but may require modification to classes that utilize the permissions-based objects.\nAll exceptions are runtime exceptions and so do not have to be explicitly caught. Javadocs indicate which methods throw which exceptions.\nRemoval of org.apache.jena.permissions.AccessDeniedException. This is replaced by 5 individual exceptions.\nAddition of org.apache.jena.shared.OperationDeniedException. This exception is a child of the JenaException and is the root of all operation denied states whether through process errors or through permissions violations.\nAddition of org.apache.jena.shared.PermissionDeniedException. This exception is a child of the OperationDeniedException and is the root of all operations denied through permission violations. These can be because the object was statically prohibited from performing an operation (e.g. a read-only graph) or due to the Jena Permissions layer.\nAddition of org.apache.jena.shared.AddDeniedException. This exception is a child of PermissionDeniedException and used to indicate that an attempt was made to add to an unmodifiable object. 
It may be thrown by read-only graphs or by the permission layer when a create restriction is violated.\nAddition of org.apache.jena.shared.DeleteDeniedException. This exception is a child of PermissionDeniedException and used to indicate that an attempt was made to delete from an unmodifiable object. It may be thrown by read-only graphs or by the permission layer when a delete restriction is violated.\nAddition of org.apache.jena.shared.ReadDeniedException. This exception is a child of PermissionDeniedException and used to indicate that a read restriction was violated.\nAddition of org.apache.jena.shared.UpdateDeniedException. This exception is a child of PermissionDeniedException and used to indicate that a update restriction was violated.\nAddition of org.apache.jena.shared.AuthenticationRequiredException. This exception is a child of OperationDeniedException and used to indicate that user authentication is required but has not occurred. This exception should be thrown when the SecurityEvaluator attempts to evaluate an operation and there is both a permissions restriction and the object returned by getPrincipal() indicates that the user is unauthenticated.\nRemoval of Classes The original \u0026ldquo;security\u0026rdquo; code was intended to be graph agnostic and so injected a \u0026ldquo;shim\u0026rdquo; layer to convert from graph specific classes to security specific classes. With the renaming of the package to \u0026ldquo;permissions\u0026rdquo; and the tighter integration to the Jena core the \u0026ldquo;shim\u0026rdquo; structure has been removed. This should make the permissions layer faster and cleaner to implement.\nSecNode The SecNode class has been removed. This was effectively a proxy for the Jena Node object and has been replaced with that object. The SecNode maintained its type (e.g. URI, Literal or Variable) using an internal Enumeration. The method getType() was used to identify the internal type. With the Jena node replacement statements of the form\nif (secNode.getType().equals( SecNode.Type.Literal )) { // do something } are replaced with\nif (node.isLiteral()) { // do something } SecNode.ANY has been replaced with Node.ANY as it served the same purpose.\nSecNode.FUTURE has been replaced with SecurityEvaluator.FUTURE and is now implemented as a blank node with the label urn:jena-permissions:FUTURE.\nSecNode.VARIABLE has been replaced with SecurityEvaluator.VARIABLE and is now implemented as a blank node with the label urn:jena-permissions:VARIABLE.\nSecTriple The SecTriple class has been removed. This was effectively a proxy for the Jena Triple object and has been replaced with that object.\nMovement of Classes SecuredItem The SecuredItem interface was moved from org.apache.jena.permissions.impl to org.apache.jena.permissions.\nAdditional Methods SecurityEvaluator The method isAuthenticatedUser( Object principal ) has been added. The SecurityEvaluator should respond true if the principal is recognized as an authenticated user. The principal object is guaranteed to have been returned from an earlier getPrincipal() call.\n","permalink":"https://jena.apache.org/documentation/permissions/migration2To3.html","tags":null,"title":"Jena Permissions - Migration notes: Version 2.x to Version 3.x"},{"categories":null,"contents":"Overview The SecurityEvaluator interface defines the access control operations. 
It provides the interface between the authentication (answers the question: \u0026ldquo;who are you?\u0026rdquo;) and the authorization (answers the question: \u0026ldquo;what can you do?\u0026rdquo;), as such it provides access to the current principal (user). The javadocs contain detailed requirements for implementations of the SecurityEvaluator interface, short notes are provided below.\nNOTE The permissions system caches intermediate results and will only call the evaluator if the answer is not already in the cache. There is little or no advantage to implementing caching in the SecurityEvaluator itself.\nNOTE In earlier versions ReadDeniedException was thrown whenever read permissions were not granted. The current version defines a isHardReadError method that defines what action should be taken. The default implementation has changed. See Configuration Methods section below for information.\nActions Principals may perform Create, Read, Update or Delete operations on secured resources. These operations are defined in the Action enum in the SecurityEvaluator interface.\nNode The permission system uses the standard Node.ANY to represent a wild-card in a permission check and the standard Triple.ANY to represent a triple with wild-cards in each of the three positions: subject, predicate and object.\nThe permission system introduces two new node types SecurityEvaluator.VARIABLE, which represents a variable in a permissions query, and SecurityEvaluator.FUTURE, which represents an anonymous node that will be created in the future.\nEvaluator Methods The SecurityEvaluator connects the Jena permissions system with the authentication system used by the application. The SecurityEvaluator must be able to query the authentication system, or its proxy, to determine who the \u0026ldquo;current user\u0026rdquo; is. In this context the \u0026ldquo;current user\u0026rdquo; is the one making the request. In certain instances (specifically when using listeners on secured graphs and models) the \u0026ldquo;current user\u0026rdquo; may not be the user identified by the authentication system at the time of the query.\nThe SecurityEvaluator must implement the following methods. Any of these methods may throw an AuthenticationRequiredException if there is no authenticated user.\nMost of these methods have a principal parameter. The value of that parameter is guaranteed to be a value returned from an earlier calls to getPrincipal(). The principal parameter, not the \u0026ldquo;current user\u0026rdquo; as identified by getPrincipal(), should be used for the permissions evaluation.\nNone of these methods should throw any of the PermissionDeniedException based exceptions. 
That is handled in a different layer.\nSee the SecurityEvaluator javadocs for detailed implementation notes.\npublic boolean evaluate( Object principal, Action action, Node graphIRI ) throws AuthenticationRequiredException; Determine if the action is permitted on the graph.\npublic boolean evaluate( Object principal, Action action, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if the action is allowed on the triple within the graph.\npublic boolean evaluate( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI )throws AuthenticationRequiredException; Determine if all actions are allowed on the graph.\npublic boolean evaluate( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if all the actions are allowed on the triple within the graph.\npublic boolean evaluateAny( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI ) throws AuthenticationRequiredException; Determine if any of the actions are allowed on the graph.\npublic boolean evaluateAny( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if any of the actions are allowed on the triple within the graph.\npublic boolean evaluateUpdate( Object principal, Node graphIRI, Triple from, Triple to ) throws AuthenticationRequiredException; Determine if the user is allowed to update the \u0026ldquo;from\u0026rdquo; triple to the \u0026ldquo;to\u0026rdquo; triple.\npublic Object getPrincipal() throws AuthenticationRequiredException; Return the current principal or null if there is no current principal.\nConfiguration Methods The evaluator has one configuration method.\npublic default boolean isHardReadError() This method determines how the system will deal with read denied restrictions when attempting to create iterators, counts, or perform existential checks. If set true the system will throw a ReadDeniedException. This is the action that was perfomed in Jena version 3 and earlier. If set false, the default, methods that return iterators return empty iterators, methods that perform existential checks return false, and methods that return counts return 0 (zero).\nSample Implementation This sample is for a graph that contains a set of messages, access to the messages are limited to principals that the messages are to or from. Any triple that is not a message is not affected. This implementation simply has a setPrincipal(String name) method. A real implementation would request the user principal or name from the authentication system. This implementation also requires access to the underlying model to determine if the user has access, however, that is not a requirement of the SecurityEvaluator in general. Determining access from the information provided is an exercise for the implementer.\nNote that this implementation does not vary based on the graph being evaluated (graphIRI). 
The graphIRI parameter is provided for implementations where such variance is desired.\nSee the example jar for another implementation example.\npublic class ExampleEvaluator implements SecurityEvaluator { private Principal principal; private Model model; private RDFNode msgType = ResourceFactory.createResource( \u0026quot;http://example.com/msg\u0026quot; ); private Property pTo = ResourceFactory.createProperty( \u0026quot;http://example.com/to\u0026quot; ); private Property pFrom = ResourceFactory.createProperty( \u0026quot;http://example.com/from\u0026quot; ); /** * * @param model The graph we are going to evaluate against. */ public ExampleEvaluator( Model model ) { this.model = model; } @Override public boolean evaluate(Object principal, Action action, Node graphIRI) { // we allow any action on a graph. return true; } // not that in this implementation all permission checks flow through // this method. We can do this because we have a simple permissions // requirement. A more complex set of permissions requirement would // require a different strategy. private boolean evaluate( Object principalObj, Resource r ) { Principal principal = (Principal)principalObj; // we do not allow anonymous (un-authenticated) reads of data. // Another strategy would be to only require authentication if the // data being requested was restricted -- but that is a more complex // process and not suitable for this simple example. if (principal == null) { throw new AuthenticationRequiredException(); } // a message is only available to sender or recipient if (r.hasProperty( RDF.type, msgType )) { return r.hasProperty( pTo, principal.getName() ) || r.hasProperty( pFrom, principal.getName()); } return true; } // evaluate a node. private boolean evaluate( Object principal, Node node ) { if (node.equals( Node.ANY )) { // all wildcards are false. This forces each triple // to be explicitly checked. return false; } // if the node is a URI or a blank node evaluate it as a resource. if (node.isURI() || node.isBlank()) { Resource r = model.getRDFNode( node ).asResource(); return evaluate( principal, r ); } return true; } // evaluate the triple by evaluating the subject, predicate and object. 
private boolean evaluate( Object principal, Triple triple ) { return evaluate( principal, triple.getSubject()) \u0026amp;\u0026amp; evaluate( principal, triple.getObject()) \u0026amp;\u0026amp; evaluate( principal, triple.getPredicate()); } @Override public boolean evaluate(Object principal, Action action, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluate(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } @Override public boolean evaluate(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluateUpdate(Object principal, Node graphIRI, Triple from, Triple to) { return evaluate( principal, from ) \u0026amp;\u0026amp; evaluate( principal, to ); } public void setPrincipal( String userName ) { // clear the principal and stop if no user name was supplied if (userName == null) { principal = null; return; } principal = new BasicUserPrincipal( userName ); } @Override public Principal getPrincipal() { return principal; } @Override public boolean isPrincipalAuthenticated(Object principal) { return principal != null; } } ","permalink":"https://jena.apache.org/documentation/permissions/evaluator.html","tags":null,"title":"Jena Permissions - SecurityEvaluator implementation"},{"categories":null,"contents":"Overview Query Builder provides implementations of Ask, Construct, Select and Update builders that allow developers to create queries without resorting to StringBuilders or similar solutions. The Query Builder module is an extra package and is found in the jena-querybuilder jar.\nEach of the builders has a series of methods to define the query. Each method returns the builder for easy chaining. The example:\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026#34;*\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;?p\u0026#34;, \u0026#34;?o\u0026#34; ); Query q = sb.build() ; produces\nSELECT * WHERE { ?s ?p ?o } Standard Java variables can be used in the various clauses as long as the datatype has a registered Datatype within Jena. For example:\nInteger five = Integer.valueOf(5); SelectBuilder sb = new SelectBuilder() .addVar( \u0026#34;*\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;?p\u0026#34;, five ); Query q = sb.build() ; produces\nSELECT * WHERE { ?s ?p \u0026#34;5\u0026#34;^^\u0026lt;http://www.w3.org/2001/XMLSchema#integer\u0026gt; } Java Collections are properly expanded to RDF collections within the query builder provided there is a registered Datatype for the elements. Nested collections are expanded. Collections can also be defined with the standard SPARQL shorthand. 
So the following produce equivalent queries:\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026#34;*\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;?p\u0026#34;, List.of( \u0026#34;a\u0026#34;, \u0026#34;b\u0026#34;, \u0026#34;c\u0026#34;) ); Query q = sb.build() ; and\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026#34;*\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;?p\u0026#34;, \u0026#34;(\u0026#39;a\u0026#39; \u0026#39;b\u0026#39; \u0026#39;c\u0026#39;)\u0026#34; ); Query q = sb.build() ; It is common to create Var objects and use them in complex queries to make the query more readable. For example:\nVar node = Var.alloc(\u0026#34;node\u0026#34;); Var x = Var.alloc(\u0026#34;x\u0026#34;); Var y = Var.alloc(\u0026#34;y\u0026#34;); SelectBuilder sb = new SelectBuilder() .addVar(x).addVar(y) .addWhere(node, RDF.type, Namespace.Obst) .addWhere(node, Namespace.x, x) .addWhere(node, Namespace.y, y); Constructing Expressions Expressions are primarily used in filter and bind statements as well as in select clauses. All the standard expressions are implemented in the ExprFactory class. An ExprFactory can be retrieved from any Builder by calling the getExprFactory() method. This will create a Factory that has the same prefix mappings and the query. An alternative is to construct the ExprFactory directly, this factory will not have the prefixes defined in PrefixMapping.Extended.\nSelectBuilder builder = new SelectBuilder(); ExprFactory exprF = builder.getExprFactory() .addPrefix( \u0026#34;cf\u0026#34;, \u0026#34;http://vocab.nerc.ac.uk/collection/P07/current/CFSN0023/\u0026#34;); builder.addVar( exprF.floor( ?v ), ?floor ) .addWhere( ?s, \u0026#34;cf:air_temperature\u0026#34;, ?v ); Update Builder The UpdateBuilder is used to create Update, UpdateDeleteWhere or UpdateRequest objects. When an UpdateRequest is built is contains a single Update object as defined by the UpdateBuilder. Update objects can be added to an UpdateRequest using the appendTo() method.\nVar subj = Var.alloc( \u0026#34;s\u0026#34; ); Var obj = Var.alloc( \u0026#34;o\u0026#34; ); UpdateBuilder builder = new UpdateBuilder( PrefixMapping.Standard) .addInsert( subj, \u0026#34;rdfs:comment\u0026#34;, obj ) .addWhere( subj, \u0026#34;dc:title\u0026#34;, obj); UpdateRequest req = builder.buildRequest(); UpdateBuilder builder2 = new UpdateBuilder() .addPrefix( \u0026#34;dc\u0026#34;, \u0026#34;http://purl.org/dc/elements/1.1/\u0026#34;) .addDelete( subj, \u0026#34;?p\u0026#34;, obj) .where( subj, dc:creator, \u0026#34;me\u0026#34;) .appendTo( req ); Where Builder In some use cases it is desirable to create a where clause without constructing an entire query. The WhereBuilder is designed to fit this need. For example to construct the query:\nPREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT ?page ?type WHERE { ?s foaf:page ?page . { ?s rdfs:label \u0026#34;Microsoft\u0026#34;@en . BIND (\u0026#34;A\u0026#34; as ?type) } UNION { ?s rdfs:label \u0026#34;Apple\u0026#34;@en . 
BIND (\u0026#34;B\u0026#34; as ?type) } } You could use a WhereBuilder to construct the union queries and add them to a Select or other query builder.\nWhereBuilder whereBuilder = new WhereBuilder() .addPrefix( \u0026#34;rdfs\u0026#34;, \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; ) addWhere( \u0026#34;?s\u0026#34;, \u0026#34;rdfs:label\u0026#34;, \u0026#34;\u0026#39;Microsoft\u0026#39;@en\u0026#34; ) .addBind( \u0026#34;\u0026#39;A\u0026#39;\u0026#34;, \u0026#34;?type\u0026#34;) .addUnion( new WhereBuilder() .addPrefix( \u0026#34;rdfs\u0026#34;, \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;rdfs:label\u0026#34;, \u0026#34;\u0026#39;Apple\u0026#39;@en\u0026#34; ) .addBind( \u0026#34;\u0026#39;B\u0026#39;\u0026#34;, \u0026#34;?type\u0026#34;) ); SelectBuilder builder = new SelectBuilder() .addPrefix( \u0026#34;rdfs\u0026#34;, \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; ) .addPrefix( \u0026#34;foaf\u0026#34;, \u0026#34;http://xmlns.com/foaf/0.1/\u0026#34; ); .addVar( \u0026#34;?page\u0026#34;) .addVar( \u0026#34;?type\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;foaf:page\u0026#34;, \u0026#34;?page\u0026#34; ) .addWhere( whereBuilder ); The where clauses could be built inline as:\nSelectBuilder builder = new SelectBuilder() .addPrefixs( PrefixMapping.Standard ) .addPrefix( \u0026#34;foaf\u0026#34;, \u0026#34;http://xmlns.com/foaf/0.1/\u0026#34; ); .addVar( \u0026#34;?page\u0026#34;) .addVar( \u0026#34;?type\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;foaf:page\u0026#34;, \u0026#34;?page\u0026#34; ) .addWhere( new WhereBuilder() .addPrefix( \u0026#34;rdfs\u0026#34;, \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;rdfs:label\u0026#34;, \u0026#34;\u0026#39;Microsoft\u0026#39;@en\u0026#34; ) .addBind( \u0026#34;\u0026#39;A\u0026#39;\u0026#34;, \u0026#34;?type\u0026#34;) .addUnion( new WhereBuilder() .addPrefix( \u0026#34;rdfs\u0026#34;, \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;rdfs:label\u0026#34;, \u0026#34;\u0026#39;Apple\u0026#39;@en\u0026#34; ) .addBind( \u0026#34;\u0026#39;B\u0026#39;\u0026#34;, \u0026#34;?type\u0026#34;) ) ); Template Usage In addition to making it easier to build valid queries the QueryBuilder has a clone method. Using this a developer can create as \u0026ldquo;Template\u0026rdquo; query and add to it as necessary.\nFor example using the above query as the \u0026ldquo;template\u0026rdquo; with this code:\nSelectBuilder sb2 = sb.clone(); sb2.addPrefix( \u0026#34;foaf\u0026#34;, \u0026#34;http://xmlns.com/foaf/0.1/\u0026#34; ) .addWhere( ?s, RDF.type, \u0026#34;foaf:Person\u0026#34;) ; produces\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT * WHERE { ?s ?p ?o . ?s \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; foaf:person . } Prepared Statement Usage The query builders have the ability to replace variables with other values. 
This can be done by calling setVar() on the builder before building the query:\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026#34;*\u0026#34; ) .addWhere( \u0026#34;?s\u0026#34;, \u0026#34;?p\u0026#34;, \u0026#34;?o\u0026#34; ); sb.setVar( Var.alloc( \u0026#34;?o\u0026#34; ), NodeFactory.createURI( \u0026#34;http://xmlns.com/foaf/0.1/Person\u0026#34; ) ) ; Query q = sb.build(); produces\nSELECT * WHERE { ?s ?p \u0026lt;http://xmlns.com/foaf/0.1/Person\u0026gt; } ","permalink":"https://jena.apache.org/documentation/extras/querybuilder/","tags":null,"title":"Jena Query Builder - A query builder for Jena."},{"categories":null,"contents":" RDF/XML Input RDF/XML Output RDF/XML Input The RIOT RDF/XML parser is called RRX.\nThe ARP RDF/XML parser is still available but will be removed from Apache Jena.\nLegacy ARP RDF/XML input RDF/XML Output Two forms for output are provided:\nThe default output Lang.RDFXML, historically called \u0026ldquo;RDF/XML-ABBREV\u0026rdquo;, which also has a format name RDFFormat.RDFXML_PRETTY. It produces readable output. It requires working memory to analyse the data to be written and it is not streaming.\nFor efficient, streaming output, the basic RDF/XML RDFFormat.RDFXML_PLAIN works for data of any size. It outputs each subject together with all property values without using the full features of RDF/XML.\nFor \u0026ldquo;RDF/XML-ABBREV\u0026rdquo;:\nRDFDataMgr.write(System.out, model, Lang.RDFXML); or\nRDFWriter.source(model).lang(Lang.RDFXML).output(System.out); and for plain RDF/XML:\nRDFDataMgr.write(System.out, model, RDFFormat.RDFXML_PLAIN); or\nRDFWriter.source(model).format(RDFFormat.RDFXML_PLAIN).output(System.out); RDF/XML advanced output ","permalink":"https://jena.apache.org/documentation/io/rdfxml-io.html","tags":null,"title":"Jena RDF XML"},{"categories":null,"contents":"Legacy Documentation : may not be up-to-date\nThe original ARP parser will be removed from Jena. This is a guide to the RDF/XML I/O subsystem of Jena, ARP. The first section gives a quick introduction to the I/O subsystem. The other sections are aimed at users wishing to use advanced features within the RDF/XML I/O subsystem.\nOther content related to Jena RDF/XML How-To includes:\nDetails of ARP, the Jena RDF/XML parser Quick Introduction The main I/O methods in Jena use InputStreams and OutputStreams. This is important in order to handle character sets correctly.\nThese methods are found on the Model interface. These are:\nModel [read](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#read(java.io.InputStream, java.lang.String))(java.io.InputStream in, java.lang.String base) Add statements from an RDF/XML serialization. Model [read](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#read(java.io.InputStream, java.lang.String, java.lang.String))(java.io.InputStream in, java.lang.String base, java.lang.String lang) Add RDF statements represented in language lang to the model. Model read(java.lang.String url) Add the RDF statements from an XML document. Model write(java.io.OutputStream out) Write the model as an XML document. Model [write](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#write(java.io.OutputStream, java.lang.String))(java.io.OutputStream out, java.lang.String lang) Write a serialized representation of a model in a specified language. 
Model [write](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#write(java.io.OutputStream, java.lang.String, java.lang.String))(java.io.OutputStream out, java.lang.String lang, java.lang.String base) Write a serialized representation of a model in a specified language. The built-in languages are \u0026quot;RDF/XML\u0026quot;, \u0026quot;RDF/XML-ABBREV\u0026quot; as well as \u0026quot;N-TRIPLE\u0026quot;, and \u0026quot;TURTLE\u0026quot;.\nThere are also methods which use Readers and Writers. Do not use them, unless you are sure it is correct to. In advanced applications, they are useful, see below; and there is every intention to continue to support them. The RDF/XML parser now checks to see if the Model.read(Reader …) calls are being abused, and issues ERR_ENCODING_MISMATCH and WARN_ENCODING_MISMATCH errors. Most incorrect usage of Readers for RDF/XML input will result in such errors. Most incorrect usage of Writers for RDF/XML output will produce correct XML by using an appropriate XML declaration giving the encoding - e.g.\n\u0026lt;?xml version='1.0' encoding='ISO-8859-15'?\u0026gt; However, such XML is less portable than XML in UTF-8. Using the Model.write(OutputStream …) methods allows the Jena system code to choose UTF-8 encoding, which is the best choice.\nRDF/XML, RDF/XML-ABBREV For input, both of these are the same, and fully implement the RDF Syntax Recommendation, see conformance.\nFor output, \u0026quot;RDF/XML\u0026quot;, produces regular output reasonably efficiently, but it is not readable. In contrast, \u0026quot;RDF/XML-ABBREV\u0026quot;, produces readable output without much regard to efficiency.\nAll the readers and writers for RDF/XML are configurable, see below, input and output.\nCharacter Encoding Issues The easiest way to not read or understand this section is always to use InputStreams and OutputStreams with Jena, and to never use Readers and Writers. If you do this, Jena will do the right thing, for the vast majority of users. If you have legacy code that uses Readers and Writers, or you have special needs with respect to encodings, then this section may be helpful. The last part of this section summarizes the character encodings supported by Jena.\nCharacter encoding is the way that characters are mapped to bytes, shorts or ints. There are many different character encodings. Within Jena, character encodings are important in their relationship to Web content, particularly RDF/XML files, which cannot be understood without knowing the character encoding, and in relationship to Java, which provides support for many character encodings.\nThe Java approach to encodings is designed for ease of use on a single machine, which uses a single encoding; often being a one-byte encoding, e.g. for European languages which do not need thousands of different characters.\nThe XML approach is designed for the Web which uses multiple encodings, and some of them requiring thousands of characters.\nOn the Web, XML files, including RDF/XML files, are by default encoded in \u0026ldquo;UTF-8\u0026rdquo; (Unicode). This is always a good choice for creating content, and is the one used by Jena by default. Other encodings can be used, but may be less interoperable. 
Other encodings should be named using the canonical name registered at IANA, but other systems have no obligations to support any of these, other than UTF-8 and UTF-16.\nWithin Java, encodings appear primarily with the InputStreamReader and OutputStreamWriter classes, which convert between bytes and characters using a named encoding, and with their subclasses, FileReader and FileWriter, which convert between bytes in the file and characters using the default encoding of the platform. It is not possible to change the encoding used by a Reader or Writer while it is being used. The default encoding of the platform depends on a large range of factors. This default encoding may be useful for communicating with other programs on the same platform. Sometimes the default encoding is not registered at IANA, and so Jena application developers should not use the default encoding for Web content, but use UTF-8.\nEncodings Supported in Jena 2.2 and later On RDF/XML input any encoding supported by Java can be used. If this is not a canonical name registered at IANA a warning message is produced. Some encodings have better support in Java 1.5 than Java 1.4; for such encodings a warning message is produced on Java 1.4, suggesting upgrading.\nOn RDF/XML output any encoding supported by Java can be used, by constructing an OutputStreamWriter using that encoding, and using that for output. If the encoding is not registered at IANA then a warning message is produced. Some encodings have better support in Java 1.5 than Java 1.4; for such encodings a warning message is produced on Java 1.4, suggesting upgrading.\nJava can be configured either with or without a jar of extra encodings on the classpath. This jar is charsets.jar and sits in the lib directory of the Java Runtime. If this jar is not on your classpath then the range of encodings supported is fairly small.\nThe encodings supported by Java are listed by Sun, for 1.4.2, and 1.5.0. For an encoding that is not in these lists it is possible to write your own transcoder as documented in the java.nio.charset package documentation.\nEarlier versions of Jena supported fewer encodings.\nWhen to Use Reader and Writer? Infrequently.\nDespite the character encoding issues, it is still sometimes appropriate to use Readers and Writers with Jena I/O. A good example is using Readers and Writers into StringBuffers in memory. These do not need to be encoded and decoded so a character encoding does not need to be specified. Other examples are when an advanced user explicitly wishes to correctly control the encoding.\nModel [read](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#read(java.io.Reader, java.lang.String))(java.io.Reader reader, java.lang.String base) Using this method is often a mistake. Model [read](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#read(java.io.Reader, java.lang.String, java.lang.String))(java.io.Reader reader, java.lang.String base, java.lang.String lang) Using this method is often a mistake. Model write(java.io.Writer writer) Caution! Write the model as an XML document. Model [write](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#write(java.io.Writer, java.lang.String))(java.io.Writer writer, java.lang.String lang) Caution! Write a serialized representation of a model in a specified language. 
Model [write](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/Model.html#write(java.io.Writer, java.lang.String, java.lang.String))(java.io.Writer writer, java.lang.String lang, java.lang.String base) Caution! Write a serialized representation of a model in a specified language. Incorrect use of these read(Reader, …) methods results in warnings and errors with RDF/XML and RDF/XML-ABBREV (except in a few cases where the incorrect use cannot be automatically detected). Incorrect use of the write(Writer, …) methods results in peculiar XML declarations such as \u0026lt;?xml version=\u0026quot;1.0\u0026quot; encoding=\u0026quot;WINDOWS-1252\u0026quot;?\u0026gt;. This would reflect that the character encoding you used (probably without realizing) in your Writer is registered with IANA under the name \u0026ldquo;WINDOWS-1252\u0026rdquo;. The resulting XML is of reduced portability as a result. Glenn Marcy notes:\nsince UTF-8 and UTF-16 are the only encodings REQUIRED to be understood by all conformant XML processors, even ISO-8859-1 would technically be on shaky ground if not for the fact that it is in such widespread use that every reasonable XML processor supports it.With N-TRIPLE incorrect use is usually benign, since N-TRIPLE is ascii based.\nCharacter encoding issues of N3 are not well-defined; hence use of these methods may require changes in the future. Use of the InputStream and OutputStream methods will allow your code to work with future versions of Jena which do the right thing - whatever that is. Currently the OutputStream methods use UTF-8 encoding.\nIntroduction to Advanced Jena I/O The RDF/XML input and output is configurable. However, to configure it, it is necessary to access an RDFReader or RDFWriter object that remains hidden in the simpler interface above.\nThe four vital calls in the Model interface are:\nRDFReader getReader() Return an RDFReader instance for the default serialization language. RDFReader getReader(java.lang.String lang) Return an RDFReader instance for the specified serialization language. RDFReader getWriter() Return an RDFWriter instance for the default serialization language. RDFReader getWriter(java.lang.String lang) An RDFWriter instance for the specified serialization language. Each of these calls returns an RDFReader or RDFWriter that can be used to read or write any Model (not just the one which created it). As well as the necessary [read](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/RDFReader.html#read(org.apache.jena.rdf.model.Model, java.io.InputStream, java.lang.String)) and [write](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/RDFWriter.html#write(org.apache.jena.rdf.model.Model, java.io.OutputStream, java.lang.String)) methods, these interfaces provide:\nRDFErrorHandler setErrorHandler( RDFErrorHandler errHandler ) Set an error handler for the reader java.lang.Object [setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdf/model/RDFReader.html#setProperty(java.lang.String, java.lang.Object))(java.lang.String propName, java.lang.Object propValue) Set the value of a reader property. Setting properties, or the error handler, on an RDFReader or an RDFWriter allows the programmer to access non-default behaviour. 
Moreover, since the RDFReader and RDFWriter is not bound to a specific Model, a typical idiom is to create the RDFReader or RDFWriter on system initialization, to set the appropriate properties so that it behaves exactly as required in your application, and then to do all subsequent I/O through it.\nModel m = ModelFactory.createDefaultModel(); RDFWriter writer = m.getRDFWriter(); m = null; // m is no longer needed. writer.setErrorHandler(myErrorHandler); writer.setProperty(\u0026quot;showXmlDeclaration\u0026quot;,\u0026quot;true\u0026quot;); writer.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;8\u0026quot;); writer.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;same-document,relative\u0026quot;); … Model marray[]; … for (int i=0; i\u0026lt;marray.length; i++) { … OutputStream out = new FileOutputStream(\u0026quot;foo\u0026quot; + i + \u0026quot;.rdf\u0026quot;); writer.write(marray[i], out, \u0026quot;http://example.org/\u0026quot;); out.close(); } Note that all of the current implementations are synchronized, so that a specific RDFReader cannot be reading two different documents at the same time. In a multi-threaded application this may suggest a need for a pool of RDFReaders and/or RDFWriters, or alternatively to create, initialize, use and discard them as needed.\nFor N-TRIPLE there are currently no properties supported for either the RDFReader or the RDFWriter. Hence this idiom above is not very helpful, and just using the Model.write() methods may prove easier.\nFor RDF/XML and RDF/XML-ABBREV, there are many options in both the RDFReader and the RDFWriter. N3 has options on the RDFWriter. These options are detailed below. For RDF/XML they are also found in the JavaDoc for JenaReader.[setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) and RDFXMLWriterI.[setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/xmloutput/RDFXMLWriterI.html#setProperty(java.lang.String, java.lang.Object))(String, Object).\nAdvanced RDF/XML Input For access to these advanced features, first get an RDFReader object that is an instance of an ARP parser, by using the getReader() method on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for parsing RDF/XML. Many of the properties change the RDF parser, some change the XML parser. (The Jena RDF/XML parser, ARP, implements the RDF grammar over a Xerces2-J XML parser). However, changing the features and properties of the XML parser is not likely to be useful, but was easy to implement.\n[setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) can be used to set and get:\nARP properties These allow fine grain control over the extensive error reporting capabilities of ARP. And are detailed directly below. SAX2 features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. SAX2 properties See Xerces properties. Xerces features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. Xerces properties See Xerces properties. 
ARP properties An ARP property is referred to either by its property name, (see below) or by an absolute URL of the form http://jena.hpl.hp.com/arp/properties/\u0026lt;PropertyName\u0026gt;. The value should be a String, an Integer or a Boolean depending on the property.\nARP property names and string values are case insensitive.\nProperty Name Description Value class Legal Values iri-rules Set the engine for checking and resolving. \u0026quot;strict\u0026quot; sets the IRI engine with rules for valid IRIs, XLink and RDF; it does not permit spaces in IRIs. \u0026quot;iri\u0026quot;sets the IRI engine to IRI (RFC 3986, RFC 3987) . The default is \u0026quot;lax\u0026quot;(for backwards compatibility), the rules for RDF URI references only, which does permit spaces although the use of spaces is not good practice. String lax\nstrict\niri error-mode ARPOptions.setDefaultErrorMode() ARPOptions.setLaxErrorMode()\nARPOptions.setStrictErrorMode()\nARPOptions.setStrictErrorMode(int)\nThis allows a coarse-grained approach to control of error handling. Setting this property is equivalent to setting many of the fine-grained error handling properties. String default\nlax\nstrict\nstrict-ignore\nstrict-warning\nstrict-error\nstrict-fatal embedding ARPOptions.setEmbedding(boolean) This sets ARP to look for RDF embedded within an enclosing XML document. String or Boolean true\nfalse ERR_\u0026lt;XXX\u0026gt; WARN_\u0026lt;XXX\u0026gt;\nIGN_\u0026lt;XXX\u0026gt; See ARPErrorNumbers for a complete list of the error conditions detected. Setting one of these properties is equivalent to the method ARPOptions.setErrorMode(int, int). Thus fine-grained control over the behaviour in response to specific error conditions is possible. String or Integer EM_IGNORE\nEM_WARNING\nEM_ERROR\nEM_FATAL As an example, if you are working in an environment with legacy RDF data that uses unqualified RDF attributes such as \u0026ldquo;about\u0026rdquo; instead of \u0026ldquo;rdf:about\u0026rdquo;, then the following code is appropriate:\nModel m = ModelFactory.createDefaultModel(); RDFReader arp = m.getReader(); m = null; // m is no longer needed. // initialize arp // Do not warn on use of unqualified RDF attributes. arp.setProperty(\u0026quot;WARN_UNQUALIFIED_RDF_ATTRIBUTE\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … InputStream in = new FileInputStream(fname); arp.read(m,in,url); in.close(); As a second example, suppose you wish to work in strict mode, but allow \u0026quot;daml:collection\u0026quot;, the following works:\n… arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … The other way round does not work.\n… arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); … This is because in strict mode IGN_DAML_COLLECTION is treated as an error, and so the second call to setProperty overwrites the effect of the first.\nThe IRI rules and resolver can be set on a per-reader basis:\nInputStream in = ... ; String baseURI = ... ; Model model = ModelFactory.createDefaultModel(); RDFReader r = model.getReader(\u0026quot;RDF/XML\u0026quot;); r.setProperty(\u0026quot;iri-rules\u0026quot;, \u0026quot;strict\u0026quot;) ; r.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot;) ; // Warning will be errors. 
// Alternative to the above \u0026quot;error-mode\u0026quot;: set specific warning to be an error. //r.setProperty( \u0026quot;WARN_MALFORMED_URI\u0026quot;, ARPErrorNumbers.EM_ERROR) ; r.read(model, in, baseURI) ; in.close(); The global default IRI engine can be set with:\nARPOptions.setIRIFactoryGlobal(IRIFactory.iriImplementation()) ; or other IRI rule engine from IRIFactory.\nInterrupting ARP ARP can be interrupted using the Thread.interrupt() method. This causes an ERR_INTERRUPTED error during the parse, which is usually treated as a fatal error.\nHere is an illustrative code sample:\nARP a = new ARP(); final Thread arpt = Thread.currentThread(); Thread killt = new Thread(new Runnable() { public void run() { try { Thread.sleep(tim); } catch (InterruptedException e) { } arpt.interrupt(); } }); killt.start(); try { in = new FileInputStream(fileName); a.load(in); in.close(); fail(\u0026quot;Thread was not interrupted.\u0026quot;); } catch (SAXParseException e) { } Advanced RDF/XML Output The first RDF/XML output question is whether to use the \u0026quot;RDF/XML\u0026quot; or \u0026quot;RDF/XML-ABBREV\u0026quot; writer. While some of the code is shared, these two writers are really very different, resulting in different but equivalent output. RDF/XML-ABBREV is slower, but should produce more readable XML.\nFor access to advanced features, first get an RDFWriter object, of the appropriate language, by using getWriter(\u0026quot;RDF/XML\u0026quot;) or getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;) on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for writing RDF/XML.\nProperties to Control RDF/XML Output Property NameDescriptionValue classLegal Values xmlbase The value to be included for an xml:base attribute on the root element in the file. String A URI string, or null (default) longId Whether to use long or short id's for anon resources. Short id's are easier to read and are the default, but can run out of memory on very large models. String or Boolean \"true\", \"false\" (default) allowBadURIs URIs in the graph are, by default, checked prior to serialization. String or Boolean \"true\", \"false\" (default) relativeURIs What sort of relative URIs should be used. A comma separated list of options: same-document\nsame-document references (e.g. \u0026quot;\u0026quot; or \u0026ldquo;#foo\u0026rdquo;) network\nnetwork paths e.g. \u0026quot;//example.org/foo\u0026quot; omitting the URI scheme absolute\nabsolute paths e.g. \u0026quot;/foo\u0026quot; omitting the scheme and authority relative\nrelative path not beginning in \u0026quot;../\u0026quot; parent\nrelative path beginning in \u0026quot;../\u0026quot; grandparent\nrelative path beginning in \u0026quot;../../\u0026quot; The default value is \u0026ldquo;same-document, absolute, relative, parent\u0026rdquo;. To switch off relative URIs use the value \u0026ldquo;\u0026rdquo;. Relative URIs of any of these types are output where possible if and only if the option has been specified.\nString \u0026nbsp; showXmlDeclaration If true, an XML Declaration is included in the output, if false no XML declaration is included. The default behaviour only gives an XML Declaration when asked to write to an `OutputStreamWriter` that uses some encoding other than UTF-8 or UTF-16. In this case the encoding is shown in the XML declaration. 
To ensure that the encoding attribute is shown in the XML declaration either: Set this option to true and use the write(Model,Writer,String) variant with an appropriate OutputStreamWriter. Or set this option to false, and write the declaration to an OutputStream before calling write(Model,OutputStream,String). true, \"true\", false, \"false\" or \"default\" can be true, false or \"default\" (null) showDoctypeDeclaration If true, an XML Doctype declaration is included in the output. This declaration includes a `!ENTITY` declaration for each prefix mapping in the model, and any attribute value that starts with the URI of that mapping is written as starting with the corresponding entity invocation. String or Boolean true, false, \"true\", \"false\" tab The number of spaces with which to indent XML child elements. String or Integer positive integer \"2\" is the default attributeQuoteChar How to write XML attributes. String \"\\\"\" or \"'\" blockRules A list of `Resource` or a `String` being a comma separated list of fragment IDs from [http://www.w3.org/TR/rdf-syntax-grammar](http://www.w3.org/TR/rdf-syntax-grammar) indicating grammar rules that will not be used. Rules that can be blocked are: section-Reification (RDFSyntax.sectionReification) section-List-Expand (RDFSyntax.sectionListExpand) parseTypeLiteralPropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeResourcePropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeCollectionPropertyElt (RDFSyntax.parseTypeCollectionPropertyElt) idAttr (RDFSyntax.idAttr) propertyAttr (RDFSyntax.propertyAttr) In addition \u0026quot;daml:collection\u0026quot; (DAML_OIL.collection) can be blocked. Blocking idAttr also blocks section-Reification. By default, rule propertyAttr is blocked. For the basic writer (RDF/XML) only parseTypeLiteralPropertyElt has any effect, since none of the other rules are implemented by that writer.\nResource[] or String prettyTypes Only for the RDF/XML-ABBREV writer. This is a list of the types of the principal objects in the model. The writer will tend to create RDF/XML with resources of these types at the top level. Resource[] As an example,\nRDFWriter w = m.getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;); w.setProperty(\u0026quot;attributeQuoteChar\u0026quot;,\u0026quot;'\u0026quot;); w.setProperty(\u0026quot;showXMLDeclaration\u0026quot;,\u0026quot;true\u0026quot;); w.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;1\u0026quot;); w.setProperty(\u0026quot;blockRules\u0026quot;, \u0026quot;daml:collection,parseTypeLiteralPropertyElt,\u0026quot; +\u0026quot;parseTypeResourcePropertyElt,parseTypeCollectionPropertyElt\u0026quot;); creates a writer that does not use rdf:parseType (preferring rdf:datatype for rdf:XMLLiteral), indents only a little, and produces the XMLDeclaration. Attributes are used, and are quoted with \u0026quot;'\u0026quot;.\nNote that property attributes are not used at all, by default. However, the RDF/XML-ABBREV writer includes a rule to produce property attributes when the value does not contain any spaces. This rule is normally switched off. This rule can be turned on selectively by using the blockRules property as detailed above.\nConformance The RDF/XML I/O endeavours to conform with the RDF Syntax Recommendation.\nThe parser must be set to strict mode. 
(Note that, the conformant behaviour for rdf:parseType=\u0026quot;daml:collection\u0026quot; is to silently turn \u0026quot;daml:collection\u0026quot; into \u0026quot;Literal\u0026quot;).\nThe RDF/XML writer is conformant, but does not exercise much of the grammar.\nThe RDF/XML-ABBREV writer exercises all of the grammar and is conformant except that it uses the daml:collection construct for DAML ontologies. This non-conformant behaviour can be switched off using the blockRules property.\nFaster RDF/XML I/O To optimise the speed of writing RDF/XML it is suggested that all URI processing is turned off. Also do not use RDF/XML-ABBREV. It is unclear whether the longId attribute is faster or slower; the short IDs have to be generated on the fly and a table maintained during writing. The longer IDs are long, and hence take longer to write. The following creates a faster writer:\nModel m; … … RDFWriter fasterWriter = m.getWriter(\u0026quot;RDF/XML\u0026quot;); fasterWriter.setProperty(\u0026quot;allowBadURIs\u0026quot;,\u0026quot;true\u0026quot;); fasterWriter.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;\u0026quot;); fasterWriter.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;0\u0026quot;); When reading RDF/XML the check for reuse of rdf:ID has a memory overhead, which can be significant for very large files. In this case, this check can be suppressed by telling ARP to ignore this error.\nModel m; … … RDFReader bigFileReader = m.getReader(\u0026quot;RDF/XML\u0026quot;); bigFileReader.setProperty(\u0026quot;WARN_REDEFINITION_OF_ID\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … ","permalink":"https://jena.apache.org/documentation/io/rdfxml_howto.html","tags":null,"title":"Jena RDF/XML How-To"},{"categories":null,"contents":"Legacy Documentation : not up-to-date\nThe original ARP parser will be removed from Jena.\nThe current RDF/XML parser is RRX.\nThis is a guide to the RDF/XML legacy ARP input subsystem of Jena.\nThe ARP RDF/XML parser is designed for use with RIOT and to have the same handling of errors, IRI resolution, and treatment of base IRIs as other RIOT readers.\nThe ARP0 parser is the original standalone parser.\nRDF/XML Input The usual way to access the RDF/XML parser is via RDFDataMgr or RDFParser.\nModel model = RDFDataMgr.loadModel(\u0026quot;data.arp\u0026quot;); or\nModel model = RDFParser.source(\u0026quot;data.arp\u0026quot;).toModel(); Note the file extension is arp.\nLegacy ARP RDF/XML parser RIOT integrated ARP parser To access the parse from Java code use constants RRX.RDFXML_ARP1.\nThe syntax name is arp or arp1.\nThe file extension is arp or arp1.\nOriginal ARP0 parser To access the parse from Java code use constants RRX.RDFXML_ARP0.\nThe syntax name is arp0.\nThe file extension is arp0.\nDetails of the original Jena RDF/XML parser, ARP.\nAdvanced RDF/XML Input For access to these advanced features, first get an RDFReader object that is an instance of an ARP parser, by using the getReader() method on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput0/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for parsing RDF/XML. Many of the properties change the RDF parser, some change the XML parser. (The Jena RDF/XML parser, ARP, implements the RDF grammar over a Xerces2-J XML parser). 
However, changing the features and properties of the XML parser is not likely to be useful, but was easy to implement.\n[setProperty](/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/rdfxml/xmlinput0/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) can be used to set and get:\nARP properties These allow fine grain control over the extensive error reporting capabilities of ARP. And are detailed directly below. SAX2 features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. SAX2 properties See Xerces properties. Xerces features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. Xerces properties See Xerces properties. ARP properties An ARP property is referred to either by its property name, (see below) or by an absolute URL of the form http://jena.hpl.hp.com/arp/properties/\u0026lt;PropertyName\u0026gt;. The value should be a String, an Integer or a Boolean depending on the property.\nARP property names and string values are case insensitive.\nProperty Name Description Value class Legal Values iri-rules Set the engine for checking and resolving. \u0026quot;strict\u0026quot; sets the IRI engine with rules for valid IRIs, XLink and RDF; it does not permit spaces in IRIs. \u0026quot;iri\u0026quot;sets the IRI engine to IRI (RFC 3986, RFC 3987) . The default is \u0026quot;lax\u0026quot;(for backwards compatibility), the rules for RDF URI references only, which does permit spaces although the use of spaces is not good practice. String lax\nstrict\niri error-mode ARPOptions.setDefaultErrorMode() ARPOptions.setLaxErrorMode()\nARPOptions.setStrictErrorMode()\nARPOptions.setStrictErrorMode(int)\nThis allows a coarse-grained approach to control of error handling. Setting this property is equivalent to setting many of the fine-grained error handling properties. String default\nlax\nstrict\nstrict-ignore\nstrict-warning\nstrict-error\nstrict-fatal embedding ARPOptions.setEmbedding(boolean) This sets ARP to look for RDF embedded within an enclosing XML document. String or Boolean true\nfalse ERR_\u0026lt;XXX\u0026gt; WARN_\u0026lt;XXX\u0026gt;\nIGN_\u0026lt;XXX\u0026gt; See ARPErrorNumbers for a complete list of the error conditions detected. Setting one of these properties is equivalent to the method ARPOptions.setErrorMode(int, int). Thus fine-grained control over the behaviour in response to specific error conditions is possible. String or Integer EM_IGNORE\nEM_WARNING\nEM_ERROR\nEM_FATAL To set ARP properties, create a map of values to be set and put this in parser context:\nMap\u0026lt;String, Object\u0026gt; properties = new HashMap\u0026lt;\u0026gt;(); // See class ARPErrorNumbers for the possible ARP properties. properties.put(\u0026#34;WARN_BAD_NAME\u0026#34;, \u0026#34;EM_IGNORE\u0026#34;); // Build and run a parser Model model = RDFParser.create() .lang(Lang.RDFXML) .source(...) 
.set(SysRIOT.sysRdfReaderProperties, properties) .base(\u0026#34;http://base/\u0026#34;) .toModel(); System.out.println(\u0026#34;== Parsed data output in Turtle\u0026#34;); RDFDataMgr.write(System.out, model, Lang.TURTLE); See example ExRIOT_RDFXML_ReaderProperties.java.\nLegacy Example\nAs an example, if you are working in an environment with legacy RDF data that uses unqualified RDF attributes such as \u0026ldquo;about\u0026rdquo; instead of \u0026ldquo;rdf:about\u0026rdquo;, then the following code is appropriate:\nModel m = ModelFactory.createDefaultModel(); RDFReader arp = m.getReader(); m = null; // m is no longer needed. // initialize arp // Do not warn on use of unqualified RDF attributes. arp.setProperty(\u0026quot;WARN_UNQUALIFIED_RDF_ATTRIBUTE\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … InputStream in = new FileInputStream(fname); arp.read(m,in,url); in.close(); As a second example, suppose you wish to work in strict mode, but allow \u0026quot;daml:collection\u0026quot;, the following works:\n… arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … The other way round does not work.\n… arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); … This is because in strict mode IGN_DAML_COLLECTION is treated as an error, and so the second call to setProperty overwrites the effect of the first.\nThe IRI rules and resolver can be set on a per-reader basis:\nInputStream in = ... ; String baseURI = ... ; Model model = ModelFactory.createDefaultModel(); RDFReader r = model.getReader(\u0026quot;RDF/XML\u0026quot;); r.setProperty(\u0026quot;iri-rules\u0026quot;, \u0026quot;strict\u0026quot;) ; r.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot;) ; // Warning will be errors. // Alternative to the above \u0026quot;error-mode\u0026quot;: set specific warning to be an error. //r.setProperty( \u0026quot;WARN_MALFORMED_URI\u0026quot;, ARPErrorNumbers.EM_ERROR) ; r.read(model, in, baseURI) ; in.close(); The global default IRI engine can be set with:\nARPOptions.setIRIFactoryGlobal(IRIFactory.iriImplementation()) ; or other IRI rule engine from IRIFactory.\n","permalink":"https://jena.apache.org/documentation/io/rdfxml-input.html","tags":null,"title":"Jena RDF/XML Input How-To (ARP)"},{"categories":null,"contents":"Advanced RDF/XML Output Two forms for output are provided: pretty printed RDF/XML (\u0026ldquo;RDF/XML-ABBREV\u0026rdquo;) or plain RDF/XML\nWhile some of the code is shared, these two writers are really very different, resulting in different but equivalent RDF output. \u0026ldquo;RDF/XML-ABBREV\u0026rdquo; is slower, but should produce more readable XML.\nProperties to Control RDF/XML Output Property NameDescriptionValue classLegal Values xmlbase The value to be included for an xml:base attribute on the root element in the file. String A URI string, or null (default) longId Whether to use long or short id's for anon resources. Short id's are easier to read and are the default, but can run out of memory on very large models. String or Boolean \"true\", \"false\" (default) allowBadURIs URIs in the graph are, by default, checked prior to serialization. String or Boolean \"true\", \"false\" (default) relativeURIs What sort of relative URIs should be used. 
A comma separated list of options: same-document\nsame-document references (e.g. \u0026quot;\u0026quot; or \u0026ldquo;#foo\u0026rdquo;) network\nnetwork paths e.g. \u0026quot;//example.org/foo\u0026quot; omitting the URI scheme absolute\nabsolute paths e.g. \u0026quot;/foo\u0026quot; omitting the scheme and authority relative\nrelative path not beginning in \u0026quot;../\u0026quot; parent\nrelative path beginning in \u0026quot;../\u0026quot; grandparent\nrelative path beginning in \u0026quot;../../\u0026quot; The default value is \u0026ldquo;same-document, absolute, relative, parent\u0026rdquo;. To switch off relative URIs use the value \u0026ldquo;\u0026rdquo;. Relative URIs of any of these types are output where possible if and only if the option has been specified.\nString \u0026nbsp; showXmlDeclaration If true, an XML Declaration is included in the output, if false no XML declaration is included. The default behaviour only gives an XML Declaration when asked to write to an `OutputStreamWriter` that uses some encoding other than UTF-8 or UTF-16. In this case the encoding is shown in the XML declaration. To ensure that the encoding attribute is shown in the XML declaration either: Set this option to true and use the write(Model,Writer,String) variant with an appropriate OutputStreamWriter. Or set this option to false, and write the declaration to an OutputStream before calling write(Model,OutputStream,String). true, \"true\", false, \"false\" or \"default\" can be true, false or \"default\" (null) showDoctypeDeclaration If true, an XML Doctype declaration is included in the output. This declaration includes a `!ENTITY` declaration for each prefix mapping in the model, and any attribute value that starts with the URI of that mapping is written as starting with the corresponding entity invocation. String or Boolean true, false, \"true\", \"false\" tab The number of spaces with which to indent XML child elements. String or Integer positive integer \"2\" is the default attributeQuoteChar How to write XML attributes. String \"\\\"\" or \"'\" blockRules A list of `Resource` or a `String` being a comma separated list of fragment IDs from [http://www.w3.org/TR/rdf-syntax-grammar](http://www.w3.org/TR/rdf-syntax-grammar) indicating grammar rules that will not be used. Rules that can be blocked are: section-Reification (RDFSyntax.sectionReification) section-List-Expand (RDFSyntax.sectionListExpand) parseTypeLiteralPropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeResourcePropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeCollectionPropertyElt (RDFSyntax.parseTypeCollectionPropertyElt) idAttr (RDFSyntax.idAttr) propertyAttr (RDFSyntax.propertyAttr) In addition \u0026quot;daml:collection\u0026quot; (DAML_OIL.collection) can be blocked. Blocking idAttr also blocks section-Reification. By default, rule propertyAttr is blocked. For the basic writer (RDF/XML) only parseTypeLiteralPropertyElt has any effect, since none of the other rules are implemented by that writer.\nResource[] or String prettyTypes Only for the RDF/XML-ABBREV writer. This is a list of the types of the principal objects in the model. The writer will tend to create RDF/XML with resources of these types at the top level. Resource[] To set properties on the RDF/XML writer:\n// Properties to be set. 
Map\u0026lt;String, Object\u0026gt; properties = new HashMap\u0026lt;\u0026gt;() ; properties.put(\u0026#34;showXmlDeclaration\u0026#34;, \u0026#34;true\u0026#34;); RDFWriter.create() .base(\u0026#34;http://example.org/\u0026#34;) .format(RDFFormat.RDFXML_PLAIN) .set(SysRIOT.sysRdfWriterProperties, properties) .source(model) .output(System.out); See ExRIOT_RDFXML_WriterProperties.java.\nLegacy example\nAs an example,\nRDFWriter w = m.getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;); w.setProperty(\u0026quot;attributeQuoteChar\u0026quot;,\u0026quot;'\u0026quot;); w.setProperty(\u0026quot;showXMLDeclaration\u0026quot;,\u0026quot;true\u0026quot;); w.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;1\u0026quot;); w.setProperty(\u0026quot;blockRules\u0026quot;, \u0026quot;daml:collection,parseTypeLiteralPropertyElt,\u0026quot; +\u0026quot;parseTypeResourcePropertyElt,parseTypeCollectionPropertyElt\u0026quot;); creates a writer that does not use rdf:parseType (preferring rdf:datatype for rdf:XMLLiteral), indents only a little, and produces the XMLDeclaration. Attributes are used, and are quoted with \u0026quot;'\u0026quot;.\nNote that property attributes are not used at all, by default. However, the RDF/XML-ABBREV writer includes a rule to produce property attributes when the value does not contain any spaces. This rule is normally switched off. This rule can be turned on selectively by using the blockRules property as detailed above.\nConformance The RDF/XML I/O endeavours to conform with the RDF Syntax Recommendation.\nThe parser must be set to strict mode. (Note that, the conformant behaviour for rdf:parseType=\u0026quot;daml:collection\u0026quot; is to silently turn \u0026quot;daml:collection\u0026quot; into \u0026quot;Literal\u0026quot;).\nThe RDF/XML writer is conformant, but does not exercise much of the grammar.\nThe RDF/XML-ABBREV writer exercises all of the grammar and is conformant except that it uses the daml:collection construct for DAML ontologies. This non-conformant behaviour can be switched off using the blockRules property.\nFaster RDF/XML I/O To optimise the speed of writing RDF/XML it is suggested that all URI processing is turned off. Also do not use RDF/XML-ABBREV. It is unclear whether the longId attribute is faster or slower; the short IDs have to be generated on the fly and a table maintained during writing. The longer IDs are long, and hence take longer to write. The following creates a faster writer:\nModel m; … … RDFWriter fasterWriter = m.getWriter(\u0026quot;RDF/XML\u0026quot;); fasterWriter.setProperty(\u0026quot;allowBadURIs\u0026quot;,\u0026quot;true\u0026quot;); fasterWriter.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;\u0026quot;); fasterWriter.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;0\u0026quot;); When reading RDF/XML the check for reuse of rdf:ID has a memory overhead, which can be significant for very large files. 
In this case, this check can be suppressed by telling ARP to ignore this error.\nModel m; … … RDFReader bigFileReader = m.getReader(\u0026quot;RDF/XML\u0026quot;); bigFileReader.setProperty(\u0026quot;WARN_REDEFINITION_OF_ID\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … ","permalink":"https://jena.apache.org/documentation/io/rdfxml-output.html","tags":null,"title":"Jena RDF/XML Output How-To"},{"categories":null,"contents":"You can view a list of the open issues on Github.\nPull requests, patches and other contributions welcome!\n","permalink":"https://jena.apache.org/about_jena/roadmap.html","tags":null,"title":"Jena Roadmap"},{"categories":null,"contents":"The schemagen provided with Jena is used to convert an OWL or RDFS vocabulary into a Java class file that contains static constants for the terms in the vocabulary. This documents outlines the use of schemagen, and the various options and templates that may be used to control the output.\nSchemagen is typically invoked from the command line or from a built script (such as Ant). Synopsis of the command:\njava jena.schemagen -i \u0026lt;input\u0026gt; [-a \u0026lt;namespaceURI\u0026gt;] [-o \u0026lt;output file\u0026gt;] [-c \u0026lt;config uri\u0026gt;] [-e \u0026lt;encoding\u0026gt;] ... Schemagen is highly configurable, either with command line options or by RDF information read from a configuration file. Many other options are defined, and these are described in detail below. Note that the CLASSPATH environment variable must be set to include the Jena .jar libraries.\nSummary of configuration options For quick reference, here is a list of all of the schemagen options (both command line and configuration file). The use of these options is explained in detail below.\nTable 1: schemagen options\nCommand line option RDF config file property Meaning -a \u0026lt;uri\u0026gt; sgen:namespace The namespace URI for the vocabulary. Names with this URI as prefix are automatically included in the generated vocabulary. If not specified, the base URI of the ontology is used as a default (but note that some ontology documents don\u0026rsquo;t define a base URI). -c \u0026lt;filename\u0026gt;\n-c \u0026lt;url\u0026gt; Specify an alternative config file. --classdec \u0026lt;string\u0026gt; sgen:classdec Additional decoration for class header (such as implements) --classnamesuffix \u0026lt;string\u0026gt; sgen:classnamesuffix Option for adding a suffix to the generated class name, e.g. \u0026ldquo;Vocab\u0026rdquo;. --classSection \u0026lt;string\u0026gt; sgen:classSection Section declaration comment for class section. --classTemplate \u0026lt;string\u0026gt; sgen:classTemplate Template for writing out declarations of class resources. --datatypesSection \u0026lt;string\u0026gt; sgen:datatypesSection Section declaration comment for datatypes section. --datatypeTemplate \u0026lt;string\u0026gt; sgen:datatypeTemplate Template for writing out declarations of datatypes. --declarations \u0026lt;string\u0026gt; sgen:declarations Additional declarations to add at the top of the class. --dos sgen:dos Use MSDOS-style line endings (i.e. \\r\\n). Default is Unix-style line endings. -e \u0026lt;string\u0026gt; sgen:encoding The surface syntax of the input file (e.g. RDF/XML, N3). Defaults to RDF/XML. --footer \u0026lt;string\u0026gt; sgen:footer Template for standard text to add to the end of the file. --header \u0026lt;string\u0026gt; sgen:header Template for the file header, including the class comment. 
-i \u0026lt;filename\u0026gt; -i \u0026lt;url\u0026gt; sgen:input Specify the input document to load --include \u0026lt;uri\u0026gt; sgen:include Option for including non-local URI\u0026rsquo;s in vocabulary --individualsSection \u0026lt;string\u0026gt; sgen:individualsSection Section declaration comment for individuals section. --individualTemplate \u0026lt;string\u0026gt; sgen:individualTemplate Template for writing out declarations of individuals. --inference sgen:inference Causes the model that loads the document prior to being processed to apply inference rules appropriate to the language. E.g. OWL inference rules will be used on a .owl file. --marker \u0026lt;string\u0026gt; sgen:marker Specify the marker string for substitutions, default is \u0026lsquo;%\u0026rsquo; -n \u0026lt;string\u0026gt; sgen:classname The name of the generated class. The default is to synthesise a name based on input document name. --noclasses sgen:noclasses Option to suppress classes in the generated vocabulary file --nocomments sgen:noComments Turn off all comment output in the generated vocabulary --nodatatypes sgen:nodatatypes Option to suppress datatypes in the generated vocabulary file. --noheader sgen:noHeader Prevent the output of a file header, with class comment etc. --noindividuals sgen:noindividuals Option to suppress individuals in the generated vocabulary file. --noproperties sgen:noproperties Option to suppress properties in the generated vocabulary file. -o \u0026lt;filename\u0026gt; -o \u0026lt;dir\u0026gt; sgen:output Specify the destination for the output. If the given value evaluates to a directory, the generated class will be placed in that directory with a file name formed from the generated (or given) class name with \u0026ldquo;.java\u0026rdquo; appended. --nostrict sgen:noStrict Option to turn off strict checking for ontology classes and properties (prevents ConversionExceptions). --ontology sgen:ontology The generated vocabulary will use the ontology API terms, in preference to RDF model API terms. --owl sgen:owl Specify that the language of the source is OWL (the default). Note that RDFS is a subset of OWL, so this setting also suffices for RDFS. --package \u0026lt;string\u0026gt; sgen:package Specify the Java package name and directory. --propSection \u0026lt;string\u0026gt; sgen:propSection Section declaration comment for properties section. --propTemplate \u0026lt;string\u0026gt; sgen:propTemplate Template for writing out declarations of property resources. -r \u0026lt;uri\u0026gt; Specify the uri of the root node in the RDF configuration model. --rdfs sgen:rdfs Specify that the language of the source ontology is RDFS. --strictIndividuals sgen:strictIndividuals When selecting the individuals to include in the output class, schemagen will normally include those individuals whose rdf:type is in the included namespaces for the vocabulary. However, if strictIndividuals is turned on, then all individuals in the output class must themselves have a URI in the included namespaces. --uppercase sgen:uppercase Option for mapping constant names to uppercase (like Java constants). Default is to leave the case of names unchanged. --includeSource sgen:includeSource Serializes the source code of the vocabulary, and includes this into the generated class file. At class load time, creates a Model containing the definitions from the source What does schemagen do? RDFS and OWL provide a very convenient means to define a controlled vocabulary or ontology. 
For general ontology processing, Jena provides various API\u0026rsquo;s to allow the source files to be read in and manipulated. However, when developing an application, it is frequently convenient to refer to the controlled vocabulary terms directly from Java code. This leads typically to the declaration of constants, such as:\npublic static final Resource A_CLASS = new ResourceImpl( \u0026quot;http://example.org/schemas#a-class\u0026quot; ); When these constants are defined manually, it is tedious and error-prone to maintain them in sync with the source ontology file. Schemagen automates the production of Java constants that correspond to terms in an ontology document. By automating the step from source vocabulary to Java constants, a source of error and inconsistency is removed.\nExample Perhaps the easiest way to explain the detail of what schemagen does is to show an example. Consider the following mini-RDF vocabulary:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026quot; xmlns:rdfs=\u0026quot;http://www.w3.org/2000/01/rdf-schema#\u0026quot; xmlns=\u0026quot;http://example.org/eg#\u0026quot; xml:base=\u0026quot;http://example.org/eg\u0026quot;\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Dog\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;A class of canine companions\u0026lt;/rdfs:comment\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;petName\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;The name that everyone calls a dog\u0026lt;/rdfs:comment\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;http://example.org/eg#Dog\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;kennelName\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/rdfs:comment\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;Dog rdf:ID=\u0026quot;deputy\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;Deputy is a particular Dog\u0026lt;/rdfs:comment\u0026gt; \u0026lt;kennelName\u0026gt;Deputy Dawg of Chilcompton\u0026lt;/kennelName\u0026gt; \u0026lt;/Dog\u0026gt; \u0026lt;/rdf:RDF\u0026gt; We process this document with a command something like: Java jena.schemagen -i deputy.rdf -a http://example.org/eg# to produce the following generated class:\n/* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */ import org.apache.jena.rdf.model.*; /** * Vocabulary definitions from deputy.rdf * @author Auto-generated by schemagen on 01 May 2003 21:49 */ public class Deputy { /** \u0026lt;p\u0026gt;The RDF model that holds the vocabulary terms\u0026lt;/p\u0026gt; */ private static Model m_model = ModelFactory.createDefaultModel(); /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a string {@value}\u0026lt;/p\u0026gt; */ public static final String NS = \u0026quot;http://example.org/eg#\u0026quot;; /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a resource {@value}\u0026lt;/p\u0026gt; */ public static final Resource NAMESPACE = m_model.createResource( \u0026quot;http://example.org/eg#\u0026quot; ); /** \u0026lt;p\u0026gt;The name that everyone calls a dog\u0026lt;/p\u0026gt; */ public static final Property petName = m_model.createProperty( \u0026quot;http://example.org/eg#petName\u0026quot; ); /** \u0026lt;p\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/p\u0026gt; */ public static final Property kennelName = m_model.createProperty( \u0026quot;http://example.org/eg#kennelName\u0026quot; 
); /** \u0026lt;p\u0026gt;A class of canine companions\u0026lt;/p\u0026gt; */ public static final Resource Dog = m_model.createResource( \u0026quot;http://example.org/eg#Dog\u0026quot; ); /** \u0026lt;p\u0026gt;Deputy is a particular Dog\u0026lt;/p\u0026gt; */ public static final Resource deputy = m_model.createResource( \u0026quot;http://example.org/eg#deputy\u0026quot; ); } Some things to note in this example. All of the named classes, properties and individuals from the source document are translated to Java constants (below we show how to be more selective than this). The properties of the named resources are not translated: schemagen is for giving access to the names in the vocabulary or schema, not to perform a general translation of RDF to Java. The RDFS comments from the source code are translated to Javadoc comments. Finally, we no longer directly call new ResourceImpl: this idiom is no longer recommended by the Jena team.\nWe noted earlier that schemagen is highly configurable. One additional argument generates a vocabulary file that uses Jena\u0026rsquo;s ontology API, rather than the RDF model API. We change rdfs:Class to owl:Class, and invoke Java jena.schemagen -i deputy.rdf -b http://example.org/eg# --ontology to get:\n/* CVs $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */ import org.apache.jena.rdf.model.*; import org.apache.jena.ontology.*; /** * Vocabulary definitions from deputy.rdf * @author Auto-generated by schemagen on 01 May 2003 22:03 */ public class Deputy { /** \u0026lt;p\u0026gt;The ontology model that holds the vocabulary terms\u0026lt;/p\u0026gt; */ private static OntModel m_model = ModelFactory.createOntologyModel( ProfileRegistry.OWL_LANG ); /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a string {@value}\u0026lt;/p\u0026gt; */ public static final String NS = \u0026quot;http://example.org/eg#\u0026quot;; /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a resource {@value}\u0026lt;/p\u0026gt; */ public static final Resource NAMESPACE = m_model.createResource( \u0026quot;http://example.org/eg#\u0026quot; ); /** \u0026lt;p\u0026gt;The name that everyone calls a dog\u0026lt;/p\u0026gt; */ public static final Property petName = m_model.createProperty( \u0026quot;http://example.org/eg#petName\u0026quot; ); /** \u0026lt;p\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/p\u0026gt; */ public static final Property kennelName = m_model.createProperty( \u0026quot;http://example.org/eg#kennelName\u0026quot; ); /** \u0026lt;p\u0026gt;A class of canine companions\u0026lt;/p\u0026gt; */ public static final OntClass Dog = m_model.createClass( \u0026quot;http://example.org/eg#Dog\u0026quot; ); /** \u0026lt;p\u0026gt;Deputy is a particular Dog\u0026lt;/p\u0026gt; */ public static final Individual deputy = m_model.createIndividual( Dog, \u0026quot;http://example.org/eg#deputy\u0026quot; ); } General principles In essence, schemagen will load a single vocabulary file, and generate a Java class that contains static constants for the named classes, properties and instances of the vocabulary. Most of the generated components of the output Java file can be controlled by option flags, and formatted with a template. Default templates are provided for all elements, so the minimum amount of necessary information is actually very small.\nOptions can be specified on the command line (when invoking schemagen), or may be preset in an RDF file. Any mixture of command line and RDF option specification is permitted. 
Where a given option is specified both in an RDF file and on the command line, the command line setting takes precedence. Thus the options in the RDF file can be seen as defaults.\nSpecifying command line options To specify a command line option, add its name (and optional value) to the command line when invoking the schemagen tool. E.g: Java jena.schemagen -i myvocab.owl --ontology --uppercase\nSpecifying options in an RDF file To specify an option in an RDF file, create a resource of type sgen:Config, with properties corresponding to the option names listed in Table 1. The following fragment shows a small options file. A complete example configuration file is shown in appendix A.\nBy default, schemagen will look for a configuration file named schemagen.rdf in the current directory. To specify another configuration, use the -c option with a URL to reference the configuration. Multiple configurations (i.e. multiple sgen:Config nodes) can be placed in one RDF document. In this case, each configuration node must be named, and the URI specified in the -r command line option. If there is no -r option, schemagen will look for a node of type rdf:type sgen:Config. If there are multiple such nodes in the model, it is indeterminate which one will be used.\nUsing templates We have several times referred to a template being used to construct part of the generated file. What is a template? Simply put, it is a fragment of output file. Some templates will be used at most once (for example the file header template), some will be used many times (such as the template used to generate a class constant). In order to make the templates adaptable to the job they\u0026rsquo;re doing, before it is written out a template has keyword substitution performed on it. This looks for certain keywords delimited by a pair of special characters (% by default), and replaces them with the current binding for that keyword. Some keyword bindings stay the same throughout the processing of the file, and some are dependent on the language element being processed. The substitutions are:\nTable 2: Substitutable keywords in templates\nKeyword Meaning Typical value classname The name of the Java class being generated Automatically defined from the document name, or given with the -n option date The date and time the class was generated imports The Java imports for this class nl The newline character for the current platform package The Java package name As specified by an option. The option just gives the package name, schemagen turns the name into a legal Java statement. sourceURI The source of the document being processed As given by the -i option or in the config file. valclass The Java class of the value being defined E.g. Property for vocabulary properties, Resource for classes in RDFS, or OntClass for classes using the ontology API valcreator The method used to generate an instance of the Java representation E.g. createResource or createClass valname The name of the Java constant being generated This is generated from the name of the resource in the source file, adjusted to be a legal Java identifier. By default, this will preserve the case of the RDF constant, but setting --uppercase will map all constants to upper-case names (a common convention in Java code). valtype The rdf:type for an individual The class name or URI used when creating an individual in the ontology API valuri The full URI of the value being defined From the RDF, without adjustment. 
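As an illustration of both of the preceding sections, a small options file might look something like the following sketch. All of the values are illustrative: the root URI, file paths and package name are invented, the property names are those listed in Table 1, and the propTemplate value simply shows how the %keyword% substitutions from Table 2 can be used (it is not the built-in default template, although it assumes the default file header, which declares the m_model field seen in the generated examples earlier).

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sgen="http://jena.hpl.hp.com/2003/04/schemagen#">
  <sgen:Config rdf:about="http://example.org/sg#project1">
    <sgen:input rdf:resource="file:vocabs/trading.owl" />
    <sgen:output rdf:datatype="http://www.w3.org/2001/XMLSchema#string">src/generated</sgen:output>
    <sgen:package rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.example.vocab</sgen:package>
    <sgen:ontology rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</sgen:ontology>
    <sgen:propTemplate rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      /** &lt;p&gt;Property %valname% (%valuri%)&lt;/p&gt; */
      public static final %valclass% %valname% = m_model.%valcreator%( "%valuri%" );
    </sgen:propTemplate>
  </sgen:Config>
</rdf:RDF>

If this is saved as schemagen.rdf in the current directory it will be picked up automatically; otherwise pass it with -c and, if the file holds several configurations, name the root with -r, as described in the next section.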
Details of schemagen options We now go through each of the configuration options in detail.\nNote: for brevity, we assume a standard prefix sgen is defined for resource URI\u0026rsquo;s in the schemagen namespace. The expansion for sgen is: http://jena.hpl.hp.com/2003/04/schemagen#, thus:\nxmlns:sgen=\u0026quot;http://jena.hpl.hp.com/2003/04/schemagen#\u0026quot; Note on legal Java identifiers Schemagen will attempt to ensure that all generated code will compile as legal Java. Occasionally, this means that identifiers from input documents, which are legal components of RDF URI identifiers, have to be modified to be legal Java identifiers. Specifically, any character in an identifier name that is not a legal Java identifier character will be replaced with the character \u0026lsquo;_\u0026rsquo; (underscore). Thus the name \u0026lsquo;trading-price\u0026rsquo; might become 'trading_price\u0026rsquo;. In addition, Java requires that identifiers be distinct. If a name clash is detected (for example, trading-price and trading+price both map to the same Java identifier), schemagen will add disambiguators to the second and subsequent uses. These will be based on the role of the identifier; for example property names are disambiguated by appending _PROPn for increasing values of n. In a well-written ontology, identifiers are typically made distinct for clarity and ease-of-use by the ontology users, so the use of the disambiguation tactic is rare. Indeed, it may be taken as a hint that refactoring the ontology itself is desirable.\nSpecifying the configuration file Command line -c \u0026lt;*config-file-path*\u0026gt;\n-c \u0026lt;*config-file-URL*\u0026gt; Config file n/a The default configuration file name is schemagen.rdf in the current directory. To specify a different configuration file, either as a file name on the local file system, or as a URL (e.g. an http: address), the config file location is passed with the -c option. If no -c option is given, and there is no configuration file in the current directory, schemagen will continue and use default values (plus the other command line options) to configure the tool. If a file name or URL is given with -c, and that file cannot be located, schemagen will stop with an error.\nSchemagen will assume the language encoding of the configuration file is implied by the filename/URL suffix: \u0026ldquo;.n3\u0026rdquo; means N3, \u0026ldquo;.nt\u0026rdquo; means NTRIPLES, \u0026ldquo;.rdf\u0026rdquo; and \u0026ldquo;.owl\u0026rdquo; mean \u0026ldquo;RDF/XML\u0026rdquo;. By default it assumes RDF/XML.\nSpecifying the configuration root in the configuration file Command line -r \u0026lt;*config-root-URI*\u0026gt; Config file n/a It is possible to have more than one set of configuration options in one configuration file. If there is only one set of configuration options, schemagen will locate the root by searching for a resource of rdf:type sgen:Config. If there is more than one, and no root is specified on the command line, it is not specified which set of configuration options will be used. The root URI given as a command line option must match exactly with the URI given in the configuration file. For example:\nJava jena.schemagen -c config/localconf.rdf -r http://example.org/sg#project1 matches:\n... \u0026lt;sgen:Config rdf:about=\u0026quot;http://example.org/SG#project1\u0026quot;\u0026gt; .... 
\u0026lt;/sgen:Config\u0026gt; Specifying the input document Command line -i \u0026lt;*input-file-path*\u0026gt;\n-i \u0026lt;*input-URL*\u0026gt; Config file \u0026lt;sgen:input rdf:resource=\u0026quot;*inputURL*\u0026quot; /\u0026gt; The only mandatory argument to schemagen is the input document to process. This can be specified in the configuration file, though this does, of course, mean that the same configuration cannot be applied to multiple different input files for consistency. However, by specifying the input document in the default configuration file, schemagen can easily be invoked with the minimum of command line typing. For other means of automating schemagen, see using schemagen with Ant.\nSpecifying the output location Command line -o \u0026lt;*input-file-path*\u0026gt;\n-o \u0026lt;*output-dir*\u0026gt; Config file \u0026lt;sgen:output rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*output-path-or-dir*\u0026lt;/sgen:output\u0026gt; Schemagen must know where to write the generated Java file. By default, the output is written to the standard output. Various options exist to change this. The output location can be specified either on the command line, or in the configuration file. If specified in the configuration file, the resource must be a string literal, denoting the file path. If the path given resolves to an existing directory, then it is assumed that the output will be based on the name of the generated class (i.e. it will be the class name with Java appended). Otherwise, the path is assumed to point to a file. Any existing file that has the given path name will be overwritten.\nBy default, schemagen will create files that have the Unix convention for line-endings (i.e. \u0026lsquo;\\n\u0026rsquo;). To switch to DOS-style line endings, use --dos.\nCommand line --dos Config file \u0026lt;sgen:dos rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:dos\u0026gt; Specifying the class name Command line -n \u0026lt;*class-name*\u0026gt; Config file \u0026lt;sgen:classname rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*classname*\u0026lt;/sgen:classname\u0026gt; By default, the name of the class will be based on the name of the input file. Specifically, the last component of the input document\u0026rsquo;s path name, with the prefix removed, becomes the class name. By default, the initial letter is adjusted to a capital to conform to standard Java usage. Thus file:vocabs/trading.owl becomes Trading.java. To override this default algorithm, a class name specified by -n or in the config file is used exactly as given.\nSometimes it is convenient to have all vocabulary files distinguished by a common suffix, for example xyzSchema.java or xyzVocabs.java. This can be achieved by the classname-suffix option:\nCommand line --classnamesuffix \u0026lt;*suffix*\u0026gt; Config file \u0026lt;sgen:classnamesuffix rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*suffix*\u0026lt;/sgen:classnamesuffix\u0026gt; See also the note on legal Java identifiers, which applies to generated class names.\nSpecifying the vocabulary namespace Command line -a \u0026lt;*namespace-URI*\u0026gt; Config file \u0026lt;sgen:namespace rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*namespace*\u0026lt;/sgen:namespace\u0026gt; Since ontology files are often modularised, it is not the case that all of the resource names appearing in a given document are being defined by that ontology. 
They may appear simply as part of the definitions of other terms. Schemagen assumes that there is one primary namespace for each document, and it is names from that namespace that will appear in the generated Java file.\nIn an OWL ontology, this namespace is computed by finding the owl:Ontology element, and using its namespace as the primary namespace of the ontology. This may not be available (it is not, for example, a part of RDFS) or correct, so the namespace may be specified directly with the -a option or in the configuration file.\nSchemagen does not, in the present version, permit more than one primary namespace per generated Java class. However, constants from namespaces other than the primary namespace may be included in the generated Java class by the include option:\nCommand line --include \u0026lt;*namespace-URI*\u0026gt; Config file \u0026lt;sgen:include rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*namespace*\u0026lt;/sgen:include\u0026gt; The include option may repeated multiple times to include a variety of constants from other namespaces in the output class.\nSince OWL and RDFS ontologies may include individuals that are named instances of declared classes, schemagen will include individuals among the constants that it generates in Java. By default, an individual will be included if its class has a URI that is in one of the permitted namespaces for the vocabulary, even if the individual itself is not in that namespace. If the option strictIndividuals is set, individuals are only included if they have a URI that is in the permitted namespaces for the vocabulary.\nCommand line --strictIndividuals Config file \u0026lt;sgen:strictIndividuals /\u0026gt; Specifying the syntax (encoding) of the input document Command line -e \u0026lt;*encoding*\u0026gt; Config file \u0026lt;sgen:encoding rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*encoding*\u0026lt;/sgen:encoding\u0026gt; Jena can parse a number of different presentation syntaxes for RDF documents, including RDF/XML, N3 and NTRIPLE. By default, the encoding will be derived from the name of the input document (e.g. a document xyz.n3 will be parsed in N3 format), or, if the extension is non-obvious the default is RDF/XML. The encoding, and hence the parser, to use on the input document may be specified by the encoding configuration option.\nChoosing the style of the generated class: ontology or plain RDF Command line --ontology Config file \u0026lt;sgen:ontology rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;*true or false*\u0026lt;/sgen:ontology\u0026gt; By default, the Java class generated by schemagen will generate constants that are plain RDF Resource, Property or Literal constants. When working with OWL or RDFS ontologies, it may be more convenient to have constants that are OntClass, ObjectProperty, DatatypeProperty and Individual Java objects. To generate these ontology constants, rather than plain RDF constants, set the ontology configuration option.\nFurthermore, since Jena can handle input ontologies in OWL (the default), and RDFS, it is necessary to be able to specify which language is being processed. 
This will affect both the parsing of the input documents, and the language profile selected for the constants in the generated Java class.\nCommand line --owl Config file \u0026lt;sgen:owl rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:owl\u0026gt; Command line --rdfs Config file \u0026lt;sgen:rdfs rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:rdfs\u0026gt; Prior to Jena 2.2, schemagen used a Jena model to load the input document that also applied some rules of inference to the input data. So, for example, a resource that is mentioned as the owl:range of a property can be inferred to be rdf:type owl:Class, and hence listed in the class constants in the generated Java class, even if that fact is not directly asserted in the input model. From Jena 2.2 onwards, this option is off by default. If correct handling of an input document by schemagen requires the use of inference rules, this must be specified by the inference option.\nCommand line --inference Config file \u0026lt;sgen:inference rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:inference\u0026gt; Specifying the Java package Command line --package \u0026lt;*package-name*\u0026gt; Config file \u0026lt;sgen:package rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*package-name*\u0026lt;/sgen:package\u0026gt; By default, the Java class generated by schemagen will not be in a Java package. Set the package configuration option to specify the Java package name. Change from Jena 2.6.4-SNAPSHOT onwards: Setting the package name will affect the directory into which the generated class will be written: directories will be appended to the output directory to match the Java package.\nAdditional decorations on the main class declaration Command line --classdec \u0026lt;*class-declaration*\u0026gt; Config file \u0026lt;sgen:classdec rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*class-declaration*\u0026lt;/sgen:classdec\u0026gt; In some applications, it may be convenient to add additional information to the declaration of the Java class, for example that the class implements a given interface (such as java.lang.Serializable). Any string given as the value of the class-declaration option will be written immediately after \u0026ldquo;public class \u0026lt;i\u0026gt;ClassName\u0026lt;/i\u0026gt;\u0026rdquo;.\nAdding general declarations within the generated class Command line --declarations \u0026lt;*declarations*\u0026gt; Config file \u0026lt;sgen:declarations rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*declarations*\u0026lt;/sgen:declarations\u0026gt; Some more complex vocabularies may require access to static constants, or other Java objects or factories to fully declare the constants defined by the given templates. Any text given by the declarations option will be included in the generated class after the class declaration but before the body of the declared constants. The value of the option should be fully legal Java code (though the template substitutions will be performed on the code).
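For instance, a configuration fragment along the following lines adds an implements clause via classdec and a helper constant via declarations. The values are illustrative only: the root URI and the declared constant are invented.

<sgen:Config rdf:about="http://example.org/sg#project1">
  <sgen:classdec rdf:datatype="http://www.w3.org/2001/XMLSchema#string"> implements java.io.Serializable</sgen:classdec>
  <sgen:declarations rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
    /** Version stamp for this vocabulary, available to any custom templates */
    public static final String VOCAB_VERSION = "1.0";
  </sgen:declarations>
</sgen:Config>

Note the leading space in the classdec value: the text is written immediately after the class name, so the space keeps the generated declaration legal.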
Although the declarations option can be given on the command line, it is typically easier to specify it as a value in a configuration options file.\nOmitting sections of the generated vocabulary Command line --noclasses\n--nodatatypes\n--noproperties\n--noindividuals Config file \u0026lt;sgen:noclasses rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noclasses\u0026gt;\n\u0026lt;sgen:nodatatypes rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:nodatatypes\u0026gt;\n\u0026lt;sgen:noproperties rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noproperties\u0026gt;\n\u0026lt;sgen:noindividuals rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noindividuals\u0026gt; By default, the vocabulary class generated from a given ontology will include constants for each of the included classes, datatypes, properties and individuals in the ontology. To omit any of these groups, use the corresponding noXYZ configuration option. For example, specifying --noproperties means that the generated class will not contain any constants corresponding to predicate names from the ontology, irrespective of what is in the input document.\nSection header comments Command line --classSection *\u0026lt;section heading\u0026gt;*\n--datatypeSection *\u0026lt;section heading\u0026gt;*\n--propSection *\u0026lt;section heading\u0026gt;*\n--individualSection *\u0026lt;section heading\u0026gt;*\n--header *\u0026lt;file header section\u0026gt;*\n--footer *\u0026lt;file footer section\u0026gt;* Config file \u0026lt;sgen:classSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:classSection\u0026gt;\n\u0026lt;sgen:datatypeSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:datatypeSection\u0026gt;\n\u0026lt;sgen:propSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:propSection\u0026gt;\n\u0026lt;sgen:individualSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:individualSection\u0026gt;\n\u0026lt;sgen:header rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*file header*\u0026lt;/sgen:header\u0026gt;\n\u0026lt;sgen:footer rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*file footer*\u0026lt;/sgen:footer\u0026gt; Some coding styles use block comments to delineate different sections of a class. These options allow the introduction of arbitrary Java code, though typically this will be a comment block, at the head of the sections of class constant declarations, datatype constant declarations, property constant declarations, and individual constant declarations.\nInclude vocabulary source code Command line --includeSource Config file \u0026lt;sgen:includeSource rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:includeSource\u0026gt; Schemagen\u0026rsquo;s primary role is to provide Java constants corresponding to the names in a vocabulary. Sometimes, however, we may need more information from the vocabulary source file to be available: for example, to know the domain and range of the properties in the vocabulary. 
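As a rough illustration of why that can be useful, the hypothetical snippet below assumes a vocabulary class ExampleVocab (like the sketch above) generated with the includeSource option described next, so that its constants are attached to a model which also holds the source ontology's declarations:

import org.apache.jena.rdf.model.Statement;
import org.apache.jena.vocabulary.RDFS;

public class VocabLookup {
    public static void main(String[] args) {
        // Hypothetical: ExampleVocab generated with --includeSource, so its
        // constants belong to a model that also contains the source ontology.
        Statement domain = ExampleVocab.hasPart.getProperty(RDFS.domain);
        if (domain != null) {
            System.out.println("Declared domain of hasPart: " + domain.getResource());
        }
    }
}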
If you set the configuration parameter --includeSource, schemagen will:\nconvert the input vocabulary into string form and include that string form in the generated Java class\ncreate a Jena model when the Java vocabulary class is first loaded, and load the string-ified vocabulary into that model\nattach the generated constants to that model, so that, for example, you can look up the declared domain and range of a property or the declared super-classes of a class.\nNote that Java compilers typically impose some limit on the size of a Java source file (or, more specifically, on the size of the .class file they will generate). Loading a particularly large vocabulary with --includeSource may risk breaching that limit.\nUsing schemagen with Maven Apache Maven is a build automation tool typically used for Java. You can use exec-maven-plugin and build-helper-maven-plugin to run schemagen as part of the generate-sources goal of your project. The following example shows one way of performing this task. The developer should customize command-line options or use a configuration file instead as needed.\n\u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.codehaus.mojo\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;exec-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;phase\u0026gt;generate-sources\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;java\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;mainClass\u0026gt;jena.schemagen\u0026lt;/mainClass\u0026gt; \u0026lt;commandlineArgs\u0026gt; --inference \\ -i ${basedir}/src/main/resources/example.ttl \\ -e TTL \\ --package org.example.ont \\ -o ${project.build.directory}/generated-sources/java \\ -n ExampleOnt \u0026lt;/commandlineArgs\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.codehaus.mojo\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;build-helper-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;add-source\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;add-source\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;sources\u0026gt; \u0026lt;source\u0026gt;${project.build.directory}/generated-sources/java\u0026lt;/source\u0026gt; \u0026lt;/sources\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;/plugins\u0026gt; \u0026lt;/build\u0026gt; At this point you can run mvn generate-sources in your project to cause schemagen to run and create your Java source (note that this goal is run automatically from mvn compile or mvn install, so there really isn\u0026rsquo;t any reason to run it manually unless you wish to just generate the source). The source file is placed in the Maven standard target/generated-sources/java directory, which is added to the project classpath by build-helper-maven-plugin.\nUsing schemagen with Ant Apache Ant is a tool for automating build steps in Java (and other language) projects. For example, it is the tool used to compile the Jena sources to the jena.jar file, and to prepare the Jena distribution prior to download. 
Although it would be quite possible to create an Ant taskdef to automate the production of Java classes from input vocabularies, we have not yet done this. Nevertheless, it is straightforward to use schemagen from an ant build script, by making use of Ant\u0026rsquo;s built-in Java task, which can execute an arbitrary Java program.\nThe following example shows a complete ant target definition for generating ExampleVocab.java from example.owl. It ensures that the generation step is only performed when example.owl has been updated more recently than ExampleVocab.java (e.g. if the definitions in the owl file have recently been changed).\n\u0026lt;!-- properties --\u0026gt; \u0026lt;property name=\u0026quot;vocab.dir\u0026quot; value=\u0026quot;src/org/example/vocabulary\u0026quot; /\u0026gt; \u0026lt;property name=\u0026quot;vocab.template\u0026quot; value=\u0026quot;${rdf.dir}/exvocab.rdf\u0026quot; /\u0026gt; \u0026lt;property name=\u0026quot;vocab.tool\u0026quot; value=\u0026quot;jena.schemagen\u0026quot; /\u0026gt; \u0026lt;!-- Section: vocabulary generation --\u0026gt; \u0026lt;target name=\u0026quot;vocabularies\u0026quot; depends=\u0026quot;exVocab\u0026quot; /\u0026gt; \u0026lt;target name=\u0026quot;exVocab.check\u0026quot;\u0026gt; \u0026lt;uptodate property=\u0026quot;exVocab.nobuild\u0026quot; srcFile=\u0026quot;${rdf.dir}/example.owl\u0026quot; targetFile=\u0026quot;${vocab.dir}/ExampleVocab.java\u0026quot; /\u0026gt; \u0026lt;/target\u0026gt; \u0026lt;target name=\u0026quot;exVocab\u0026quot; depends=\u0026quot;exVocab.check\u0026quot; unless=\u0026quot;exVocab.nobuild\u0026quot;\u0026gt; \u0026lt;Java classname=\u0026quot;${vocab.tool}\u0026quot; classpathref=\u0026quot;classpath\u0026quot; fork=\u0026quot;yes\u0026quot;\u0026gt; \u0026lt;arg value=\u0026quot;-i\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;file:${rdf.dir}/example.owl\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;-c\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;${vocab.template}\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--classnamesuffix\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;Vocab\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--include\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;http://example.org/2004/01/services#\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--ontology\u0026quot; /\u0026gt; \u0026lt;/Java\u0026gt; \u0026lt;/target\u0026gt; Clearly it is up to each developer to find the appropriate balance between options that are specified via the command line options, and those that are specified in the configuration options file (exvocab.rdf in the above example). This is not the only, nor necessarily the \u0026ldquo;right\u0026rdquo; way to use schemagen from Ant, but if it points readers in the appropriate direction to produce a custom target for their own application it will have served its purpose.\nAppendix A: Complete example configuration file The source of this example is provided in the Jena download as etc/schemagen.rdf. 
For clarity, RDF/XML text is highlighted in blue.\n\u0026lt;?xml version='1.0'?\u0026gt; \u0026lt;!DOCTYPE rdf:RDF [ \u0026lt;!ENTITY jena 'http://jena.hpl.hp.com/'\u0026gt; \u0026lt;!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'\u0026gt; \u0026lt;!ENTITY rdfs 'http://www.w3.org/2000/01/rdf-schema#'\u0026gt; \u0026lt;!ENTITY owl 'http://www.w3.org/2002/07/owl#'\u0026gt; \u0026lt;!ENTITY xsd 'http://www.w3.org/2001/XMLSchema#'\u0026gt; \u0026lt;!ENTITY base '\u0026amp;jena;2003/04/schemagen'\u0026gt; \u0026lt;!ENTITY sgen '\u0026amp;base;#'\u0026gt; ]\u0026gt; \u0026lt;rdf:RDF xmlns:rdf =\u0026quot;\u0026amp;rdf;\u0026quot; xmlns:rdfs =\u0026quot;\u0026amp;rdfs;\u0026quot; xmlns:owl =\u0026quot;\u0026amp;owl;\u0026quot; xmlns:sgen =\u0026quot;\u0026amp;sgen;\u0026quot; xmlns =\u0026quot;\u0026amp;sgen;\u0026quot; xml:base =\u0026quot;\u0026amp;base;\u0026quot; \u0026gt; \u0026lt;!-- Example schemagen configuration for use with jena.schemagen Not all possible options are used in this example, see Javadoc and Howto for full details. Author: Ian Dickinson, mailto:ian.dickinson@hp.com CVs: $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ --\u0026gt; \u0026lt;sgen:Config\u0026gt; \u0026lt;!-- specifies that the source document uses OWL --\u0026gt; \u0026lt;sgen:owl rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:owl\u0026gt; \u0026lt;!-- specifies that we want the generated vocab to use OntClass, OntProperty, etc, not Resource and Property --\u0026gt; \u0026lt;sgen:ontology rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:ontology\u0026gt; \u0026lt;!-- specifies that we want names mapped to uppercase (as standard Java constants) --\u0026gt; \u0026lt;sgen:uppercase rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:uppercase\u0026gt; \u0026lt;!-- append Vocab to class name, so input beer.owl becomes BeerVocab.java --\u0026gt; \u0026lt;sgen:classnamesuffix rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;Vocab\u0026lt;/sgen:classnamesuffix\u0026gt; \u0026lt;!-- the Java package that the vocabulary is in --\u0026gt; \u0026lt;sgen:package rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;com.example.vocabulary\u0026lt;/sgen:package\u0026gt; \u0026lt;!-- the directory or file to write the results out to --\u0026gt; \u0026lt;sgen:output rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;src/com/example/vocabulary\u0026lt;/sgen:output\u0026gt; \u0026lt;!-- the template for the file header --\u0026gt; \u0026lt;sgen:header rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;/***************************************************************************** * Source code information * ----------------------- * Original author Jane Smart, example.com * Author email jane.smart@example.com * Package @package@ * Web site @website@ * Created %date% * Filename $RCSfile: schemagen.html,v $ * Revision $Revision: 1.16 $ * Release status @releaseStatus@ $State: Exp $ * * Last modified on $Date: 2010-06-11 00:08:23 $ * by $Author: ian_dickinson $ * * @copyright@ *****************************************************************************/ // Package /////////////////////////////////////// %package% // Imports /////////////////////////////////////// %imports% /** * Vocabulary definitions from %sourceURI% * @author Auto-generated by schemagen on %date% */\u0026lt;/sgen:header\u0026gt; \u0026lt;!-- the template for the file footer (note 
@footer@ is an Ant-ism, and will not be processed by SchemaGen) --\u0026gt; \u0026lt;sgen:footer rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; /* @footer@ */ \u0026lt;/sgen:footer\u0026gt; \u0026lt;!-- template for extra declarations at the top of the class file --\u0026gt; \u0026lt;sgen:declarations rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; /** Factory for generating symbols */ private static KsValueFactory s_vf = new DefaultValueFactory(); \u0026lt;/sgen:declarations\u0026gt; \u0026lt;!-- template for introducing the properties in the vocabulary --\u0026gt; \u0026lt;sgen:propSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary properties /////////////////////////// \u0026lt;/sgen:propSection\u0026gt; \u0026lt;!-- template for introducing the classes in the vocabulary --\u0026gt; \u0026lt;sgen:classSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary classes /////////////////////////// \u0026lt;/sgen:classSection\u0026gt; \u0026lt;!-- template for introducing the datatypes in the vocabulary --\u0026gt; \u0026lt;sgen:datatypeSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary datatypes /////////////////////////// \u0026lt;/sgen:datatypeSection\u0026gt; \u0026lt;!-- template for introducing the individuals in the vocabulary --\u0026gt; \u0026lt;sgen:individualsSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary individuals /////////////////////////// \u0026lt;/sgen:individualsSection\u0026gt; \u0026lt;!-- template for doing fancy declarations of individuals --\u0026gt; \u0026lt;sgen:individualTemplate rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;public static final KsSymbol %valname% = s_vf.newSymbol( \u0026quot;%valuri%\u0026quot; ); /** Ontology individual corresponding to {@link #%valname%} */ public static final %valclass% _%valname% = m_model.%valcreator%( %valtype%, \u0026quot;%valuri%\u0026quot; ); \u0026lt;/sgen:individualTemplate\u0026gt; \u0026lt;/sgen:Config\u0026gt; \u0026lt;/rdf:RDF\u0026gt; ","permalink":"https://jena.apache.org/documentation/tools/schemagen.html","tags":null,"title":"Jena schemagen HOWTO"},{"categories":null,"contents":"The Jena project has issued a number of security advisories during the lifetime of the project. On this page you\u0026rsquo;ll find details of our security issue process, as a listing of our past CVEs and relevant Dependency CVEs.\nProcess Jena follows the standard ASF Security for Committers policy for reporting and addressing security issues.\nIf you think you have identified a Security issue in our project please refer to that policy for how to report it, and the process that the Jena Project Management Committee (PMC) will follow in addressing the issue.\nSingle Supported Version\nAs a project, Apache Jena only has the resources to maintain a single release version. 
Any accepted security issue will be fixed in a future release in a timeframe appropriate to the severity of the issue.\nStandard Mitigation Advice\nNote that as a project our guidance to users is always to use the newest Jena version available to ensure you have any security fixes we have made available.\nWhere more specific mitigations are available, these will be denoted in the individual CVEs.\nEnd of Life (EOL) Components\nWhere a security advisory is issued for a component that is already EOL (sometimes referred to as archived or retired within our documentation) then we will not fix the issue but instead reiterate our previous recommendations that users cease using the EOL component and migrate to actively supported components.\nSuch issues will follow the CVE EOL Assignment Process and will be clearly denoted by the \u0026ldquo;UNSUPPORTED WHEN ASSIGNED\u0026rdquo; text at the start of the description.\nSecurity Issues in Dependencies\nFor our dependencies, the project relies primarily upon GitHub Dependabot Alerts to be made aware of available dependency updates, whether security related or otherwise. When a security related update is released and our analysis shows that Jena users may be affected we endeavour to take the dependency upgrade ASAP and make a new release in timeframe appropriate to the severity of the issue.\nJena CVEs The following CVEs specifically relate to the Jena codebase itself and have been addressed by the project. Per our policy above we advise users to always utilise the latest Jena release available.\nPlease refer to the individual CVE links for further details and mitigations.\nCVE-2023-32200 - Exposure of execution in script engine expressions\nCVE-2023-32200 affects Jena 3.7.0 through Jena 4.8.0 and relates to the Javascript SPARQL Functions feature of our ARQ SPARQL engine.\nThere is insufficient restrictions of called script functions in Apache Jena versions 4.8.0 and earlier, when invoking custom scripts. It allows a remote user to execute javascript via a SPARQL query.\nFrom Jena 4.9.0, script functions MUST be added to an explicit \u0026ldquo;allow\u0026rdquo; list for them to be called from the SPARQL query engine. This is in addition to the script enabling controls of Jena 4.8.0 which MUST also be applied.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2023-22665 - Exposure of arbitrary execution in script engine expressions\nCVE-2023-22665 affects Jena 3.7.0 through 4.7.0 and relates to the Javascript SPARQL Functions feature of our ARQ SPARQL engine.\nFrom Jena 4.8.0 onwards this feature MUST be explicitly enabled by end users, and on newer JVMs (Java 17 onwards) a JavaScript script engine MUST be explicitly added to the environment.\nHowever, when enabled this feature does expose the majority of the underlying scripting engine directly to SPARQL queries so may provide a vector for arbitrary code execution. 
Therefore, it is recommended that this feature remain disabled for any publicly accessible deployment that utilises the ARQ query engine.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2022-45136 - JDBC Serialisation in Apache Jena SDB\nCVE-2022-45136 affects all versions of Jena SDB up to and including the final 3.17.0 release.\nApache Jena SDB has been EOL since December 2020 and we recommend any remaining users migrate to Jena TDB 2 or other 3rd party vendor alternatives.\nApache Jena would like to thank Crilwa \u0026amp; LaNyer640 for reporting this issue.\nCVE-2022-28890 - Processing External DTD\nCVE-2022-28890 affects the RDF/XML parser in Jena 4.4.0 only.\nUsers should upgrade to latest Jena 4.x release available.\nApache Jena would like to thank Feras Daragma, Avishag Shapira \u0026amp; Amit Laish (GE Digital, Cyber Security Lab) for their report.\nCVE-2021-39239 - XML External Entity (XXE) Vulnerability\nCVE-2021-39239 affects XML parsing up to and including the Jena 4.1.0 release.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2021-33192 - Display information UI XSS in Apache Jena Fuseki\nCVE-2021-33192 affected Fuseki versions 2.0.0 through 4.0.0.\nUsers should upgrade to latest Jena 4.x release available.\nDependencies The following advisories are CVEs in Jena\u0026rsquo;s dependencies that may affect users of Jena. As with Jena-specific CVEs, our standard Security Issue Policy applies and any necessary dependency updates, dependency API and/or configuration changes have been adopted and released as soon as appropriate.\nlog4j2\nCVE-2021-44228, CVE-2021-45105 and CVE-2021-44832, collectively known as log4shell, were several vulnerabilities identified in the Apache Log4j project that Jena uses as the concrete logging implementation for Fuseki and our command line tools.\nJena versions prior to 4.4.0 included vulnerable versions of Log4j.\nUsers should upgrade to latest Jena 4.x release available.\n","permalink":"https://jena.apache.org/about_jena/security-advisories.html","tags":null,"title":"Jena Security Advisories"},{"categories":null,"contents":"This page gives an overview of transactions in Jena.\nThere are two APIs for transactions: the basic transaction interface styled after the conventional begin-commit and a higher level Txn API that builds on the basic API using Java8 features.\nAPIs Basic API for Transactions Txn, a high level API to transactions Overview Transactions provide applications with a safe way to use and update data between threads. The properties of transactions are ACID:\nAtomic, Consistent, Isolated, Durable - meaning that groups of changes are made visible to other transactions as a single unit, or no changes become visible at all; once made, changes are not reversed and, in the case of persistent storage, are not lost, nor is the database corrupted. Jena provides transactions on datasets and provides \u0026ldquo;serializable transactions\u0026rdquo;. Any application code reading data sees all changes made elsewhere, not parts of changes. In particular, SPARQL aggregations like COUNT are correct and do not see partial changes due to other transactions.\nThe exact details are dependent on the implementation.\nTransactions cannot be nested (a transaction happening inside an outer transaction results in changes visible only to the outer transaction until that commits).\nTransactions are \u0026ldquo;per thread\u0026rdquo;. 
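Since the Txn API mentioned above wraps the begin/commit/end pattern, a minimal sketch of its use (against an in-memory transactional dataset; the example URI is invented) looks like this:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.system.Txn;
import org.apache.jena.vocabulary.RDFS;

public class TxnSketch {
    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.createTxnMem();

        // Write transaction: begin/commit/end are handled by Txn.executeWrite.
        Txn.executeWrite(dataset, () ->
            dataset.getDefaultModel()
                   .createResource("http://example/thing")
                   .addProperty(RDFS.label, "thing"));

        // Read transaction: the committed change above is visible here.
        Txn.executeRead(dataset, () ->
            dataset.getDefaultModel().write(System.out, "TTL"));
    }
}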
Actions by different threads on the same dataset are always inside different transactions.\nImplementations Transactions are part of the interface to RDF Datasets. There is a default implementation, based on MRSW locking (multiple-reader or single-writer) that can be used with any mixed set of components. Certain storage sub-systems provide better concurrency with MR+SW (multiple-read and single writer).\nDataset Facilities Creation\nTxnMem MR+SW DatasetFactory.createTxnMem\nTDB MR+SW, persistent TDBFactory.create\nTDB2 MR+SW, persistent TDB2Factory.create\nGeneral MRSW DatasetFactory.create\nThe general dataset can have any graphs added to it (e.g. inference graphs).\nMore details of transactions in TDB.\n","permalink":"https://jena.apache.org/documentation/txn/","tags":null,"title":"Jena Transactions"},{"categories":null,"contents":" API for Transactions Read transactions Write transactions Transaction promotion Txn - A higher level API to transactions API for Transactions This page describes the basic transaction API in Jena (3.1.0 and later).\nThere is also a higher-level API useful in many situations but sometimes it is necessary to use the basic transaction API described here.\nRead transactions These are used for SPARQL queries and Jena API operations that do not change the data. The general pattern is:\ndataset.begin(ReadWrite.READ) ; try { ... } finally { dataset.end() ; } The dataset.end() declares the end of the read transaction. Applications may also call dataset.commit() or dataset.abort() which all have the same effect for a read transaction.\nThis example has two queries - no updates between or during the queries will be seen by this code even if another thread commits changes in the lifetime of this transaction.\nDataset dataset = ... ; dataset.begin(ReadWrite.READ) ; try { String qs1 = \u0026#34;SELECT * {?s ?p ?o} LIMIT 10\u0026#34; ; try(QueryExecution qExec = QueryExecution.create(qs1, dataset)) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } String qs2 = \u0026#34;SELECT * {?s ?p ?o} OFFSET 10 LIMIT 10\u0026#34; ; try(QueryExecution qExec = QueryExecution.create(qs2, dataset)) { rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } } finally { dataset.end() ; } Write transactions These are used for SPARQL queries, SPARQL updates and any Jena API actions that modify the data. Beware that large model.read operations to change a dataset may consume large amounts of temporary space.\nThe general pattern is:\ndataset.begin(ReadWrite.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } The dataset.end() will abort the transaction if there was no call to dataset.commit() or dataset.abort() inside the write transaction.\nOnce dataset.commit() or dataset.abort() is called, the application needs to start a new transaction to perform further operations on the dataset.\nDataset dataset = ... ; dataset.begin(TxnType.WRITE) ; try { Model model = dataset.getDefaultModel() ; // API calls to a model in the dataset // Make some changes via the model ... model.add( ... ) // A SPARQL query will see the new statement added. try (QueryExecution qExec = QueryExecution.create( \u0026#34;SELECT (count(?s) AS ?count) { ?s ?p ?o} LIMIT 10\u0026#34;, dataset)) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } // ... perform a SPARQL Update String sparqlUpdateString = StrUtils.strjoinNL( \u0026#34;PREFIX : 
\u0026lt;http://example/\u0026gt;\u0026#34;, \u0026#34;INSERT { :s :p ?now } WHERE { BIND(now() AS ?now) }\u0026#34; ) ; UpdateRequest request = UpdateFactory.create(sparqlUpdateString) ; UpdateExecution.dataset(dataset).update(request).execute() ; // Finally, commit the transaction. dataset.commit() ; // Or call .abort() } finally { dataset.end() ; } Transaction Types, Modes and Promotion. Transactions have a type (enum TxnType) and a mode (enum ReadWrite). TxnType.READ and TxnType.WRITE start the transaction in that mode and the mode is fixed for the transaction\u0026rsquo;s lifetime. A READ transaction can never update the data of the transactional object it is acting on.\nTransactions can have type TxnType.READ_PROMOTE or TxnType.READ_COMMITTED_PROMOTE. These start in mode READ but can become mode WRITE, either implicitly by attempting an update, or explicitly by calling promote.\nREAD_PROMOTE only succeeds if no writer has made any changes since this transaction started. It gives full isolation.\nREAD_COMMITTED_PROMOTE always succeeds because it changes the view of the data to include any changes made up to that point (it is \u0026ldquo;read committed\u0026rdquo;). Applications should be aware that data they have read up until the point of promotion (the first call of .promote or the first update made) may now be invalid. For this reason, READ_PROMOTE is preferred.\nbegin(), the method with no arguments, is equivalent to begin(TxnType.READ_PROMOTE).\nMulti-threaded use Each dataset object has one transaction active at a time per thread. A dataset object can be used by different threads, with independent transactions.\nThe usual idiom within multi-threaded applications is to have one dataset, and so there is one transaction per thread.\nEither:\n// Create a dataset and keep it globally. static Dataset dataset = TDBFactory.createDataset(location) ; Thread 1:\ndataset.begin(TxnType.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } Thread 2:\ndataset.begin(TxnType.READ) ; try { ... } finally { dataset.end() ; } It is possible (in TDB) to create different dataset objects to the same location.\nThread 1:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(TxnType.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } Thread 2:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(TxnType.READ) ; try { ... } finally { dataset.end() ; } Each thread has a separate dataset object; these safely share the same storage and have independent transactions.\nMulti JVM Multiple applications, running in multiple JVMs, using the same file databases is not supported and has a high risk of data corruption. Once corrupted a database cannot be repaired and must be rebuilt from the original source data. Therefore there must be a single JVM controlling the database directory and files. From 1.1.0 onwards, TDB includes automatic prevention against multi-JVM use, which prevents this under most circumstances.\nUse our Fuseki component to provide a database server for multiple applications. Fuseki supports SPARQL Query, SPARQL Update and the SPARQL Graph Store protocol.\n","permalink":"https://jena.apache.org/documentation/txn/transactions_api.html","tags":null,"title":"Jena Transactions API"},{"categories":null,"contents":"The following tutorials take a step-by-step approach to explaining aspects of RDF and linked-data applications programming in Jena. 
For a more task-oriented description, please see the getting started guide.\nRDF core API tutorial SPARQL tutorial Using Jena with Eclipse Manipulating SPARQL using ARQ Jena tutorials in other languages Quelques uns des tutoriels de Jena sont aussi disponibles en français. Vous pouvez les voir en suivant ces liens:\nUne introduction à RDF Requêtes SPARQL utilisant l\u0026rsquo;API Java ARQ Les entrées/sorties RDF Une introduction à SPARQL Os tutoriais a seguir explicam aspectos de RDF e da programação em Jena de aplicações linked-data. Veja também o guia getting started - em inglês.\nUma introdução à API RDF Tutorial SPARQL Manipulando SPARQL usando ARQ Usando o Jena com o Eclipse Simplified Chinese:\nRDF 和 Jena RDF API 入门 Greek:\nΕφαρμογές του Jena API στο Σημασιολογικό Ιστό ","permalink":"https://jena.apache.org/tutorials/","tags":null,"title":"Jena tutorials"},{"categories":null,"contents":"This page lists various projects and tools related to Jena - classes, packages, libraries, applications, or ontologies that enhance Jena or are built on top of it. These projects are not part of the Jena project itself, but may be useful to Jena users.\nThis list is provided for information purposes only, and is not meant as an endorsement of the mentioned projects by the Jena team.\nIf you wish your contribution to appear on this page, please raise a GitHub issue with the details to be published.\nRelated projects Name Description License Creator URL GeoSPARQL Jena Implementation of GeoSPARQL 1.0 standard using Apache Jena for SPARQL query or API. Apache 2.0 Greg Albiston and Haozhe Chen geosparql-jena at GitHub GeoSPARQL Fuseki HTTP server application compliant with the GeoSPARQL standard using GeoSPARQL Jena library and Apache Jena Fuseki server Apache 2.0 Greg Albiston geosparql-fuseki at GitHub Jastor Code generator that emits Java Beans from OWL Web Ontologies Common Public License Ben Szekely and Joe Betz Jastor website NG4J Named Graphs API for Jena BSD license Chris Bizer NG4J website Micro Jena (uJena) Reduced version of Jena for mobile devices as per Jena Fulvio Crivellaro and Gabriele Genovese and Giorgio Orsi Micro Jena Gloze XML to RDF, RDF to XML, XSD to OWL mapping tool as per Jena Steve Battle jena files page WYMIWYG KnoBot A fully Jena based semantic CMS. Implements URIQA. File-based persistence. Apache Reto Bachmann-Gmuer / wymiwyg.org Download KnoBot Infinite Graph An infinite graph implementation for RDF graphs BSD UTD Infinite Graph for Jena Twinkle A GUI interface for working with SPARQL queries Public Domain Leigh Dodds Twinkle project homepage GLEEN A path expression (a.k.a. \u0026ldquo;regular paths\u0026rdquo;) property function library for ARQ SparQL Apache 2.0 Todd Detwiler - University of Washington Structural Informatics Group GLEEN home Jena Sesame Model Jena Sesame Model - Sesame triple store for Jena models GNU Weijian Fang Jena Sesame Model D2RQ Treats non-RDF databases as virtual Jena RDF graphs GNU GPL License Chris Bizer D2RQ website GeoSpatialWeb This projects adds geo-spatial predicates and reasoning features to Jena property functions. GNU GPL License Marco Neumann and Taylor Cowan GeoSpatialWeb Jenabean Jenabean uses Jena\u0026rsquo;s flexible RDF/OWL API to persist Java beans. 
Apache 2.0 Taylor Cowan and David Donohue Jenabean project page Persistence Annotations 4 RDF Persistence Annotation for RDF (PAR) is a set of annotations and an entity manager that provides JPA like functionality on top of an RDF store while accounting for and exploiting the fundamental differences between graph storage and relational storage. PAR introduces three (3) annotations that map a RDF triple (subject, predicate, object) to a Plain Old Java Object (POJO) using Java\u0026rsquo;s dynamic proxy capabilities. Apache 2.0 Claude Warren PA4RDF at Sourceforge Semantic_Forms Swiss army knife for data management and social networking. open source Jean-Marc Vanel Semantic_Forms JDBC 4 SPARQL JDBC 4 SPARQL is a type 4 JDBC Driver that uses a SPARQL endpoint (or Jena Model) as the data store. Presents graph data as relational data to tools that understand SQL and utilize JDBC Apache 2.0 (Some components GNU LGPL V3.0) Claude Warren jdbc4sparql at GitHub ","permalink":"https://jena.apache.org/about_jena/contributions.html","tags":null,"title":"Jena-related projects and tools"},{"categories":null,"contents":"As of Jena 2.11.0, LARQ is replaced by jena-text\njena-text includes use of Apache Solr as a shared, search server, or Apache Lucene as a local text index. From Fuseki 0.2.7, jena-text is built into Fuseki.\nLARQ is not compatible with jena-text; the index format has changed and the integration with SPARQL is different.\nLARQ is a combination of ARQ and Lucene. It gives users the ability to perform free text searches within their SPARQL queries. Lucene indexes are additional information for accessing the RDF graph, not storage for the graph itself.\nSome example code is available here: https://svn.apache.org/repos/asf/jena/Archive/jena-larq/src/test/java/org/apache/jena/larq/examples/.\nTwo helper commands are provided: larq.larqbuilder and larq.larq used respectively for updating and querying LARQ indexes.\nA full description of the free text query language syntax is given in the Lucene query syntax document.\nUsage Patterns There are three basic usage patterns supported:\nPattern 1 : index string literals. The index will return the literals matching the Lucene search pattern. Pattern 2 : index subject resources by string literal. The index returns the subjects with property value matching a text query. Pattern 3 : index graph nodes based on strings not present in the graph. Patterns 1 and 2 have the indexed content in the graph. Both 1 and 2 can be modified by specifying a property so that only values of a given property are indexed. Pattern 2 is less flexible as discussed below. Pattern 3 is covered in the external content section below.\nLARQ can be used in other ways as well but the classes for these patterns are supplied. In both patterns 1 and 2, strings are indexed, being plain strings, string with any language tag or any literal with datatype XSD string.\nIndex Creation There are many ways to use Lucene, which can be set up to handle particular features or languages. The creation of the index is done outside of the ARQ query system proper and only accessed at query time. LARQ includes some platform classes and also utility classes to create indexes on string literals for the use cases above. Indexing can be performed as the graph is read in, or to built from an existing graph.\nIndex Builders An index builder is a class to create a Lucene index from RDF data.\nIndexBuilderString: This is the most commonly used index builder. 
It indexes plain literals (with or without language tags) and XSD strings and stores the complete literal. Optionally, a property can be supplied which restricts indexing to strings in statements using that property. IndexBuilderSubject: Index the subject resource by a string literal, a store the subject resource, possibly restricted by a specified property. Lucene has many ways to create indexes and the index builder classes do not attempt to provide all possible Lucene features. Applications may need to extend or modify the standard index builders provided by LARQ.\nIndex Creation An index can be built while reading RDF into a model:\n// -- Read and index all literal strings. IndexBuilderString larqBuilder = new IndexBuilderString() ; // -- Index statements as they are added to the model. model.register(larqBuilder) ; FileManager.get().readModel(model, datafile) ; // -- Finish indexing larqBuilder.closeWriter() ; model.unregister(larqBuilder) ; // -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; or it can be created from an existing model:\n// -- Create an index based on existing statements larqBuilder.indexStatements(model.listStatements()) ; // -- Finish indexing larqBuilder.closeWriter() ; // -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; Index Registration Next the index is made available to ARQ. This can be done globally:\n// -- Make globally available LARQ.setDefaultIndex(index) ; or it can be set on a per-query execution basis.\nQueryExecution qExec = QueryExecutionFactory.create(query, model) ; // -- Make available to this query execution only LARQ.setDefaultIndex(qExec.getContext(), index) ; In both these cases, the default index is set, which is the one expected by property function pf:textMatch. Use of multiple indexes in the same query can be achieved by introducing new properties. The application can subclass the search class org.apache.jena.larq.LuceneSearch to set different indexes with different property names.\nQuery using a Lucene index Query execution is as usual using the property function pf:textMatch. \u0026ldquo;textMatch\u0026rdquo; can be thought of as an implied relationship in the data. Note the prefix ends in \u0026ldquo;.\u0026rdquo;.\nString queryString = StringUtils.join(\u0026quot;\\n\u0026quot;, new String[]{ \u0026quot;PREFIX pf: \u0026lt;http://jena.hpl.hp.com/ARQ/property#\u0026gt;\u0026quot;, \u0026quot;SELECT * {\u0026quot; , \u0026quot; ?lit pf:textMatch '+text'\u0026quot;, \u0026quot;}\u0026quot; }) ; Query query = QueryFactory.create(queryString) ; QueryExecution qExec = QueryExecutionFactory.create(query, model) ; ResultSetFormatter.out(System.out, qExec.execSelect(), query) ; The subjects with a property value of the matched literals can be retrieved by looking up the literals in the model:\nPREFIX pf: \u0026lt;http://jena.hpl.hp.com/ARQ/property#\u0026gt; SELECT ?doc { ?lit pf:textMatch '+text' . ?doc ?p ?lit } This is a more flexible way of achieving the effect of using a IndexBuilderSubject. IndexBuilderSubject can be more compact when there are many large literals (it stores the subject not the literal) but does not work for blank node subjects without extremely careful co-ordination with a persistent model. Looking the literal up in the model does not have this complication.\nAccessing the Lucene Score The application can get access to the Lucene match score by using a list argument for the subject of pf:textMatch. 
The list must have two arguments, both unbound variables at the time of the query.\nPREFIX pf: \u0026lt;http://jena.hpl.hp.com/ARQ/property#\u0026gt; SELECT ?doc ?score { (?lit ?score ) pf:textMatch '+text' . ?doc ?p ?lit } Limiting the number of matches When used with just a query string, pf:textMatch returns all the Lucene matches. In many applications, the application is only interested in the first few matches (Lucene returns matches in order, highest scoring first), or only matches above some score threshold. The query argument that forms the object of the pf:textMatch property can also be a list, including a score threshold and a total limit on the number of results matched.\n?lit pf:textMatch ( '+text' 100 ) . # Limit to at most 100 hits ?lit pf:textMatch ( '+text' 0.5 ) . # Limit to Lucene scores of 0.5 and over. ?lit pf:textMatch ( '+text' 0.5 100 ) . # Limit to scores of 0.5 and limit to 100 hits Direct Application Use The IndexLARQ class provides the ability to search programmatically, not just from ARQ. The searchModelByIndex method returns an iterator over RDFNodes.\n// -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; NodeIterator nIter = index.searchModelByIndex(\u0026quot;+text\u0026quot;) ; for ( ; nIter.hasNext() ; ) { // if it's an index storing literals ... Literal lit = (Literal)nIter.nextNode() ; } External Content Pattern 3: index graph nodes based on strings not present in the graph. Sometimes, the index needs to be created based on external material and the index gives nodes in the graph. This can be done by using IndexBuilderNode which is a helper class to relate external material to some RDF node.\nHere, the indexed content is not in the RDF graph at all. For example, the indexed content may come from HTML.XHTML, PDFs or XML documents and the RDF graph only holds the metadata about these content items.\nThe Lucene contributions page lists some content converters.\nGetting Help and Getting Involved If you have a problem with LARQ, make sure you read the Getting help with Jena page and post a message on the users@jena.apache.org mailing list. You can also search the jena-users mailing list archives here.\nIf you use LARQ and you want to get involved, make sure you read the Getting Involved page. You can help us making LARQ better by:\nimproving this documentation, writing tutorials or blog posts about LARQ letting us know how you use LARQ, your use cases and what are in your opinion missing features answering users question about LARQ on the users@jena.apache.org mailing list submitting bug reports and feature requests checking out LARQ source code, playing with it and let us know your ideas for possible improvements: https://svn.apache.org/repos/asf/jena/Archive/jena-larq ","permalink":"https://jena.apache.org/documentation/archive/larq/","tags":null,"title":"LARQ - adding free text searches to SPARQL"},{"categories":null,"contents":"The Apache Jena Elephas libraries for Apache Hadoop are a collection of maven artifacts which can be used individually or together as desired. These are available from the same locations as any other Jena artifact, see Using Jena with Maven for more information.\nHadoop Dependencies The first thing to note is that although our libraries depend on relevant Hadoop libraries these dependencies are marked as provided and therefore are not transitive. 
This means that you may typically also need to declare these basic dependencies as provided in your own POM:\n\u0026lt;!-- Hadoop Dependencies --\u0026gt; \u0026lt;!-- Note these will be provided on the Hadoop cluster hence the provided scope --\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-mapreduce-client-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; Using Alternative Hadoop versions If you wish to use a different Hadoop version then we suggest that you build the modules yourself from source which can be found in the jena-elephas folder of our source release (available on the Downloads page) or from our Git repository (see Getting Involved for details of the repository).\nWhen building you need to set the hadoop.version property to the desired version e.g.\n\u0026gt; mvn clean package -Dhadoop.version=2.4.1 Would build for Hadoop 2.4.1\nNote that we only support Hadoop 2.x APIs and so Elephas cannot be built for Hadoop 1.x\nJena RDF Tools for Apache Hadoop Artifacts Common API The jena-elephas-common artifact provides common classes for enabling RDF on Hadoop. This is mainly composed of relevant Writable implementations for the various supported RDF primitives.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; IO API The IO API artifact provides support for reading and writing RDF in Hadoop:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-io\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Map/Reduce The Map/Reduce artifact provides various building block mapper and reducer implementations to help you get started writing Map/Reduce jobs over RDF data quicker:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-mapreduce\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; RDF Stats Demo The RDF Stats Demo artifact is a Hadoop job jar which can be used to run some simple demo applications over your own RDF data:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-stats\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;classifier\u0026gt;hadoop-job\u0026lt;/classifier\u0026gt; \u0026lt;/dependency\u0026gt; ","permalink":"https://jena.apache.org/documentation/archive/hadoop/artifacts.html","tags":null,"title":"Maven Artifacts for Apache Jena Elephas"},{"categories":null,"contents":"The Jena JDBC libraries are a collection of maven artifacts which can be used individually or together as desired. 
These are available from the same locations as any other Jena artifact, see Using Jena with Maven for more information.\nCore Library The jena-jdbc-core artifact is the core library that contains much of the common implementation for the drivers. This is a dependency of the other artifacts and will typically only be required as a direct dependency if you are implementing a custom driver\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-core\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; In-Memory Driver The in-memory driver artifact provides the JDBC driver for non-persistent in-memory datasets.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-mem\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; TDB Driver The TDB driver artifact provides the JDBC driver for TDB datasets.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-tdb\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Remote Endpoint Driver The Remote Endpoint driver artifact provides the JDBC driver for accessing arbitrary remote SPARQL compliant stores.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-remote\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Driver Bundle The driver bundle artifact is a shaded JAR (i.e. with dependencies included) suitable for dropping into tools to easily make Jena JDBC drivers available without having to do complex class path setups.\nThis artifact depends on all the other artifacts.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-bundle\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; ","permalink":"https://jena.apache.org/documentation/jdbc/artifacts.html","tags":null,"title":"Maven Artifacts for Jena JDBC"},{"categories":null,"contents":"Apache Jena3 is a major version release for Jena - it is not binary compatible with Jena2. The migration consists of package renaming and database reloading.\nKey Changes Package renaming RDF 1.1 Semantics for plain literals Persistent data (TDB, SDB) should be reloaded. Java8 is required. Security renamed to Permissions. Security Evaluator changes required Package Name Changes Packages with a base name of com.hp.hpl.jena become org.apache.jena.\nGlobal replacement of import com.hp.hpl.jena. with import org.apache.jena. will cover the majority of cases.\nThe Jena APIs remain unchanged expect for this renaming.\nVocabularies unchanged Only java package names are being changed. Vocabularies are not affected.\nAssemblers Migration support is provided by mapping ja:loadClass names beginning com.hp.hpl.jena internally to org.apache.jena. A warning is logged.\nLogging This will also affect logging: logger names reflect the java class naming so loggers for com.hp.hpl.jena become org.apache.jena\nRDF 1.1 Many of the changes and refinements for RDF 1.1 are already in Jena2. 
The parsers for Turtle-family languages already follow the RDF 1.1 grammars and output is compatible with RDF 1.1 as well as earlier output details.\nRDF 1.1 changes for plain literals In RDF 1.1, all literals have a datatype. The datatype of a plain literal with no language tag (also called a \u0026ldquo;simple literal\u0026rdquo;) has datatype xsd:string. A plain literal with a language tag has datatype rdf:langString.\nConsequences:\n\u0026quot;abc\u0026quot; and \u0026quot;abc\u0026quot;^^xsd:string are the same RDF term in RDF 1.1. Jena2 memory models have always treated these as the same value, but different terms. Jena2 persistent models treated them as two separate term and two separate values.\nData is not invalidated by this change.\nThe parsers will give datatypes to all data read, there is no need to change the data.\nOutput is in the datatype-less form (an abbreviated syntax) even in N-triples.\nApplications which explicitly use ^^xsd:string (or in RDF/XML, rdf:datatype=\u0026quot;http://www.w3.org/2001/XMLSchema#string\u0026quot;) will see a change in appearance.\nApplications with a mix of plain literals and explicit ^^xsd:string (the RDF 1.1 Work Group believed these to be uncommon) may see changes.\nApplications that do their own RDF output need to be careful to not assume that having datatype excludes the possibility of also having a language tag.\nPersistent Data For data stored in TDB and SDB, it is advisable to reload data.\nData that does not use explicit xsd:string should be safe but it is still recommended that data is reloaded at a convenient time.\nData that does use explicit xsd:string must be reloaded.\nSecurity package renamed to Permissions Jena Security has been renamed Jena Permissions and the Maven artifact id has been changed to jena-permissions to reflect this change.\nShim code that was introduced to map Jena classes to security classes has been removed. This change requires changes to SecurityEvaluator implementations. More details are available at the Permissions migration documentation.\nOther GraphStore interface has been removed ModelFactory.createFileModelMaker has been removed LateBindingIterator has been removed: use LazyIterator instead EarlyBindingIterator has been removed: no replacement UniqueExtendedIterator has been removed: use ExtendedIterator with unique filter ","permalink":"https://jena.apache.org/documentation/migrate_jena2_jena3.html","tags":null,"title":"Migrating from Jena2 to Jena3"},{"categories":null,"contents":"Note: These notes are not kept up to date.\nThey may be of interest into the original design of the Enhanced Node mechanism.\nEnhanced Nodes This note is a development of the original note on the enhanced node and graph design of Jena 2.\nKey objectives for the enhanced node design One problem with the Jena 1 design was that both the DAML layer and the RDB layer independently extended Resource with domain-specific information. That made it impossible to have a DAML-over-RDB implementation. While this could have been fixed by using the \u0026ldquo;enhanced resource\u0026rdquo; mechanism of Jena 1, that would have left a second problem.\nIn Jena 1.0, once a resource has been determined to be a DAML Class (for instance), that remains true for the lifetime of the model. If a resource starts out not qualifying as a DAML Class (no rdf:type daml:Class) then adding the type assertion later doesn\u0026rsquo;t make it a Class. 
Similarly, if a resource is a DAML Class, but then the type assertion is retracted, the resource is still apparently a class.\nHence being a DAMLClass is a view of the resource that may change over time. Moreover, a given resource may validly have a number of different views simultaneously. Using the current DAMLClass implementation method means that a given resource is limited to a single such view.\nA key objective of the new design is to allow different views, or facets, to be used dynamically when accessing a node. The new design allows nodes to be polymorphic, in the sense that the same underlying node from the graph can present different encapsulations - thus different affordances to the programmer - on request. In summary, the enhanced node design in Jena 2.0 allows programmers to:\nprovide alternative perspectives onto a node from a graph, supporting additional functionality particular to that perspective;\ndynamically convert between perspectives on nodes;\nregister implementations of implementation classes that present the node as an alternative perspective.\nTerminology To assist the following discussion, the key terms are introduced first.\nnode ~ A subject or object from a triple in the underlying graph\ngraph ~ The underlying container of RDF triples that simplifies the previous abstraction, Model\nenhanced node ~ An encapsulation of a node that adds additional state or functionality to the interface defined for node. For example, a bag is a resource that contains a number of other resources; an enhanced node encapsulating a bag might provide simplified programmatic access to the members of the bag.\nenhanced graph ~ Just as an enhanced node encapsulates a node and adds extra functionality, an enhanced graph encapsulates an underlying graph and provides additional features. For example, both Model and DAMLModel can be thought of as enhancements to the (deliberately simple) interface to graphs.\npolymorphic ~ An abstract super-class of enhanced graph and enhanced node that exists purely to provide shared implementation.\npersonality ~ An abstraction that circumscribes the set of alternative views that are available in a given context. In particular, defines a mapping from types (q.v.) to implementations (q.v.). This seems to be taken to be closed for graphs.\nimplementation ~ A factory object that is able to generate polymorphic objects that present a given enhanced node according to a given type. For example, an alt implementation can produce a sub-class of enhanced node that provides accessors for the members of the alt.\nKey points Some key features of the design are:\nevery enhanced graph has a single graph personality, which represents the types of all the enhanced nodes that can be created in this graph; every enhanced node refers to that personality\ndifferent kinds of enhanced graph can have different personalities, for example, may implement interfaces in different ways, or not implement some at all.\nenhanced nodes wrap information in the graph, but keep no independent state; they may be discarded and regenerated at whim. 
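As a concrete illustration of these key points, the short sketch below (written against the current org.apache.jena package names, with an invented URI) shows the same underlying resource acquiring an OntClass facet only once the relevant type assertion is present; exact behaviour depends on the OntModelSpec and checking mode in use.

import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class FacetSketch {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        Resource r = m.createResource("http://example/C");

        // No rdf:type owl:Class yet: the OntClass facet is not available.
        System.out.println(r.canAs(OntClass.class));

        // Adding the type assertion makes the facet available on the
        // same underlying node, without creating a new resource.
        r.addProperty(RDF.type, OWL.Class);
        if (r.canAs(OntClass.class)) {
            OntClass c = r.as(OntClass.class);
            System.out.println(c.getURI());
        }
    }
}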
How an enhanced node is created Creation from another enhanced node If en is an enhanced node representing some resource we wish to be able to view as being of some (Java) class/interface T, the expression en.as(T.class) will either deliver an EnhNode of type T, if it is possible to do so, or throw an exception if not.\nTo check if the conversion is allowed, without having to catch exceptions, the expression en.canAs(T.class) delivers true iff the conversion is possible.\nCreation from a base node Somehow, some seed enhanced node must be created, otherwise as() would have nothing to work on. Subclasses of enhanced node provide constructors (perhaps hidden behind factories) which wrap plain nodes up in enhanced graphs. Eventually these invoke the constructor EnhNode(Node,EnhGraph).\nIt\u0026rsquo;s up to the constructors for the enhanced node subclasses to ensure that they are called with appropriate arguments.\nInternal operation of the conversion as(Class T) is defined on EnhNode to invoke asInternal(T) in Polymorphic. If the original enhanced node en is already a valid instance of T, it is returned as the result. Validity is checked by the method isValid().\nIf en is not already of type T, then a cache of alternative views of en is consulted to see if a suitable alternative exists. The cache is implemented as a sibling ring of enhanced nodes - each enhanced node has a link to its next sibling, and the \u0026ldquo;last\u0026rdquo; node links back to the \u0026ldquo;first\u0026rdquo;. This makes it cheap to find alternative views if there are not too many of them, and avoids caches filling up with dead nodes and having to be flushed.\nIf there is no existing suitable enhanced node, the node\u0026rsquo;s personality is consulted. The personality maps the desired class type to an Implementation object, which is a factory with a wrap method which takes a (plain) node and an enhanced graph and delivers the new enhanced node after checking that its conditions apply. The new enhanced node is then linked into the sibling ring.\nHow to build an enhanced node \u0026amp; graph What you have to do to define an enhanced node/graph implementation:\ndefine an interface I for the new enhanced node. (You could use just the implementation class, but we\u0026rsquo;ve stuck with the interface, because there might be different implementations)\ndefine the implementation class C. This is just a front for the enhanced node. All the state of C is reflected in the graph (except for caching; but beware that the graph can change without notice).\ndefine an Implementation class for the factory. This class defines methods canWrap and wrap, which test a node to see if it is allowed to represent I and construct an implementation of C respectively.\nArrange that the personality of the graph maps the class of I to the factory. At the moment we do this by using (a copy of) the built-in graph personality as the personality for the enhanced graph. For an example, see the code for ReifiedStatementImpl.\nReification API Introduction This document describes the reification API in Jena2, following discussions based on the 0.5a document. The essential decision made during that discussion is that reification triples are captured and dealt with by the Model transparently and appropriately.\nContext The first Jena implementation made some attempt to optimise the representation of reification. 
In particular it tried to avoid so called \u0026lsquo;triple bloat\u0026rsquo;, ie requiring four triples to represent the reification of a statement. The approach taken was to make a Statement a subclass of Resource so that properties could be directly attached to statement objects.\nThere are a number of defects in the Jena 1 approach.\nNot everyone in the team was bought in to the approach; the .equals() method for Statements was arguably wrong and also violated the Java requirements on .equals(); the implied triples of a reification were not present, so could not be searched for; there was confusion between the optimised representation and the explicit representation of reification using triples; the optimisation did not round trip through RDF/XML using the writers and ARP. However, there are some supporters of the approach. They liked:\nthe avoidance of triple bloat; that the extra reification statements are not there to be found on queries or listStatements and do not affect the size() method. Since Jena was first written the RDFCore WG have clarified the meaning of a reified statement. Whilst Jena 1 took a reified statement to denote a statement, RDFCore have decided that a reified statement denotes an occurrence of a statement, otherwise called a stating. The Jena 1 .equals() method for Statements is thus inappropriate for comparing reified statements. The goals of reification support in the Jena 2 implementation are:\nto conform to the revised RDF specifications; to maintain the expectations of Jena 1 users, ie they should still be able to reify everything without worrying about triple bloat if they want to; as far as is consistent with 2, to not break existing code, or at least make it easy to transition old code to Jena 2; to enable round tripping through RDF/XML and other RDF representation languages; to enable a complete standards-compliant implementation, but not necessarily as the default. Presentation API Statement will no longer be a subclass of Resource. Thus a statement may not be used where a resource is expected. Instead, a new interface ReifiedStatement will be defined:\npublic interface ReifiedStatement extends Resource { public Statement getStatement(); // could call it a day at that or could duplicate convenience // methods from Statement, eg getSubject(), getInt(). ... } The Statement interface will be extended with the following methods: public interface Statement \u0026hellip; public ReifiedStatement createReifiedStatement(); public ReifiedStatement createReifiedStatement(String URI); public boolean isReified(); public ReifiedStatement getAnyReifiedStatement(); public RSIterator listReifiedStatements(); public void removeAllReifications(); \u0026hellip;\nRSIterator is a new iterator which returns ReifiedStatements. It is an extension of ResourceIterator. The Model interface will be extended with the following methods:\npublic interface Model ... public ReifiedStatement createReifiedStatement(Statement stmt); public ReifiedStatement createReifiedStatement(String URI, Statement stmt); public boolean isReified(Statement st); public ReifiedStatement getAnyReifiedStatement(Statement stmt); public RSIterator listReifiedStatements(); public RSIterator listReifiedStatements(Statement stmt); public void removeReifiedStatement(ReifiedStatement rs); public void removeAllReifications(Statement st); ... The methods in Statement are defined to be the obvious calls of methods in Model (a usage sketch follows below). The interaction of these methods with the model is described afterwards. 
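Before looking at that interaction, here is a minimal usage sketch of the reification methods listed above, written against the current org.apache.jena.rdf.model package names rather than the Jena2-era ones.
import org.apache.jena.rdf.model.*;

public class ReificationSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Statement st = m.createStatement(
                m.createResource(\u0026#34;http://example/book\u0026#34;),
                m.createProperty(\u0026#34;http://example/\u0026#34;, \u0026#34;title\u0026#34;),
                \u0026#34;Jena internals\u0026#34;);
        m.add(st);

        // Reify the statement under a named resource; the reification quad is added to the model.
        ReifiedStatement rs = st.createReifiedStatement(\u0026#34;http://example/stating1\u0026#34;);
        System.out.println(m.isReified(st));      // true
        System.out.println(rs.getStatement());    // the original statement

        // List every reification of st, then remove them all again.
        RSIterator it = m.listReifiedStatements(st);
        while (it.hasNext()) {
            System.out.println(\u0026#34;reified as: \u0026#34; + it.nextRS());
        }
        m.removeAllReifications(st);
    }
}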
Reification operates over statements in the model which use predicates rdf:subject, rdf:predicate, rdf:object, and rdf:type with object rdf:Statement. statements with those predicates are, by default, invisible. They do not appear in calls of listStatements, contains, or uses of the Query mechanism. Adding them to the model will not affect size(). Models that do not hide reification quads will also be available.\nRetrieval The Model::as() mechanism will allow the retrieval of reified statements.\nsomeResource.as( ReifiedStatement.class ) If someResource has an associated reification quad, then this will deliver an instance rs of ReifiedStatement such that rs.getStatement() will be the statement rs reifies. Otherwise a DoesNotReifyException will be thrown. (Use the predicate canAs() to test if the conversion is possible.) It does not matter how the quad components have arrived in the model; explicitly asserted or by the create mechanisms described below. If quad components are removed from the model, existing ReifiedStatement objects will continue to function, but conversions using as() will fail.\nCreation createReifiedStatement(Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a blank node.\ncreateReifiedStatement(String URI, Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a Resource with the URI given.\nEquality Two reified statements are .equals() iff they reify the same statement and have .equals() resources. Thus it is possible for equal Statements to have unequal reifications.\nIsReified isReified(Statement st) is true iff in the Model of this Statement there is a reification quad for this Statement. It does not matter if the quad was inserted piece-by-piece or all at once using a create method.\nFetching getAnyReifiedStatement(Statement st) delivers an existing ReifiedStatement object that reifies st, if there is one; otherwise it creates a new one. If there are multiple reifications for st, it is not specified which one will be returned.\nListing listReifiedStatements() will return an RSIterator which will deliver all the reified statements in the model.\nlistReifiedStatements( Statement st ) will return an RSIterator which will deliver all the reified statements in the model that reifiy st.\nRemoval removeReifiedStatement(ReifiedStatement rs) will remove the reification rs from the model by removing the reification quad. Other reified statements with different resources will remain.\nremoveAllReifications(Statement st) will remove all the reifications in this model which reify st.\nInput and output The writers will have access to the complete set of Statements and will be able to write out the quad components.\nThe readers need have no special machinery, but it would be efficient for them to be able to call createReifiedStatement when detecting an reification.\nPerformance Jena1\u0026rsquo;s \u0026ldquo;statements as resources\u0026rdquo; approach avoided triples bloat by not storing the reification quads. How, then, do we avoid triple bloat in Jena2?\nThe underlying machinery is intended to capture the reification quad components and store them in a form optimised for reification. 
In particular, in the case where a statement is completely reified, it is expected to store only the implementation representation of the Statement.\ncreateReifiedStatement is expected to bypass the construction and detection of the quad components, so that in the \u0026ldquo;usual case\u0026rdquo; they will never come into existence.\nThe Reification SPI Introduction This document describes the reification SPI, the mechanisms by which the Graph family supports the Model API reification interface.\nGraphs handle reification at two levels. First, their reifier supports requests to reify triples and to search for reifications. The reifier is responsible for managing the reification information it adds and removes - the graph is not involved.\nSecond, a graph may optionally allow all triples added and removed through its normal operations (including the bulk update interfaces) to be monitored by its reifier. If so, all appropriate triples become the property of the reifier - they are no longer visible through the graph.\nA graph may also have a reifier that doesn\u0026rsquo;t do any reification. This is useful for internal graphs that are not exposed as models. So there are three kinds of Graph:\nGraphs that do no reification; Graphs that only do explicit reification; Graphs that do implicit reification.\nGraph operations for reification The primary reification operation on graphs is to extract their Reifier instance. Handing reification off to a different class allows reification to be handled independently of other Graph issues, eg query handling, bulk update.\nGraph.getReifier() -\u0026gt; Reifier Returns the Reifier for this Graph. Each graph has a single reifier during its lifetime. The reifier object need not be allocated until the first call of getReifier().\nadd(Triple), delete(Triple) These two operations may defer their triples to the graph\u0026rsquo;s reifier using handledAdd(Triple) and handledDelete(Triple); see below for details.\nInterface Reifier Instances of Reifier handle reification requests from their Graph and from the API level code (issues by the API class ModelReifier.\nreifier.getHiddenTriples() -\u0026gt; Graph The reifier may keep reification triples to itself, coded in some special way, rather than having them stored in the parent Graph. This method exposes those triples as another Graph. This is a dynamic graph - it changes as the underlying reifications change. However, it is read-only; triples cannot be added to or removed from it. The SimpleReifier implementation currently does not implement a dynamic graph. This is a bug that will need fixing.\nreifier.getParentGraph() -\u0026gt; Graph Get the Graph that this reifier serves; the result is never null. (Thus the observable relationship between graphs and reifiers is 1-1.)\nclass AlreadyReifiedException This class extends RDFException; it is the exception that may be thrown by reifyAs.\nreifier.reifyAs( Triple t, Node n ) -\u0026gt; Node Record the t as reified in the parent Graph by the given n and returns n. If n already reifies a different Triple, throw a AlreadyReifiedException. Calling reifyAs(t,n) is like adding the triples:\nn rdf:type ref:Statement n rdf:subject t.getSubject() n rdf:predicate t.getPredicate() n rdf:object t.getObject() to the associated Graph; however, it is intended that it is efficient in both time and space.\nreifier.hasTriple( Triple t ) -\u0026gt; boolean Returns true iff some Node n reifies t in this Reifier, typically by an unretracted call of reifyAs(t,n). 
The intended (and actual) use for hasTriple(Triple) is in the implementation of isReified(Statement) in Model.\nreifier.getTriple( Node n ) -\u0026gt; Triple Get the single Triple associated with n, if there is one. If there isn\u0026rsquo;t, return null. A node reifies at most one triple. If reifyAs, with its explicit check, is bypassed, and extra reification triples are asserted into the parent graph, then getTriple() will simply return null.\nreifier.allNodes() -\u0026gt; ExtendedIterator Returns an (extended) iterator over all the nodes that (still) reify something in this reifier. This is intended for the implementation of listReifiedStatements in Model.\nreifier.allNodes( Triple t ) -\u0026gt; ClosableIterator Returns an iterator over all the nodes that (still) reify the triple t.\nreifier.remove( Node n, Triple t ) Remove the association between n and the triple t. Subsequently, hasNode(n) will return false and getTriple(n) will return null. This method is used to implement removeReification(Statement) in Model.\nreifier.remove( Triple t ) Remove all the associations between any node n and t; ie, for all n do remove(n,t). This method is used to implement removeAllReifications in Model.\nhandledAdd( Triple t ) -\u0026gt; boolean A graph doing reification may choose to monitor the triples being added to it and have the reifier handle reification triples. In this case, the graph\u0026rsquo;s add(t) should call handledAdd(t) and only proceed with its add if the result is false. A graph that does not use handledAdd() [and handledRemove()] can only use the explicit reification supplied by its reifier.\nhandledRemove( Triple t ) As for handledAdd(t), but applied to delete.\nSimpleReifier SimpleReifier is an implementation of Reifier suitable for in-memory Graphs built over GraphBase. It operates in either of two modes: with and without triple interception. With interception enabled, reification triples fed to (or removed from) its parent graph are captured using handledAdd() and handledRemove(); otherwise they are ignored and the graph must store them itself. SimpleReifier keeps a map from nodes to the reification information about that node. Nodes which have no reification information (most of them, in the usual case) do not appear in the map at all.\nNodes with partial or excessive reification information are associated with Fragments. A Fragments for a node n records separately:\nthe Ss of all n rdf:subject S triples; the Ps of all n rdf:predicate P triples; the Os of all n rdf:object O triples; the Ts of all n rdf:type T[Statement] triples. If the Fragments becomes singular, ie each of these sets contains exactly one element, then n represents a reification of the triple (S, P, O), and the Fragments object is replaced by that triple. 
(If another reification triple for n arrives, then the triple is re-exploded into Fragments.)\n","permalink":"https://jena.apache.org/documentation/notes/jena-internals.html","tags":null,"title":"Notes on Jena internals"},{"categories":null,"contents":"A Parameterized SPARQL String is a SPARQL query/update into which values may be injected.\nThe intended usage of this is where using a QuerySolutionMap as initial bindings is either inappropriate or not possible e.g.\nGenerating query/update strings in code without lots of error prone and messy string concatenation Preparing a query/update for remote execution Where you do not want to simply say some variable should have a certain value but rather wish to insert constants into the query/update in place of variables Defending against SPARQL injection when creating a query/update using some external input, see SPARQL Injection notes for limitations. Provide a more convenient way to prepend common prefixes to your query This class is useful for preparing both queries and updates hence the generic name as it provides programmatic ways to replace variables in the query with constants and to add prefix and base declarations. A Query or UpdateRequest can be created using the asQuery() and asUpdate() methods assuming the command an instance represents is actually valid as a query/update.\nBuilding parameterised commands A ParameterizedSparqlString is created as follows:\nParameterizedSparqlString pss = new ParameterizedSparqlString(); There are also constructor overloads that take in an initial command text, parameter values, namespace prefixes etc. which may allow you to simplify some code.\nOnce you have an instance you first set your template command with the setCommandText() method like so:\npss.setCommandText(\u0026quot;SELECT * WHERE {\\n\u0026quot; + \u0026quot; ?s a ?type .\\n\u0026quot; + \u0026quot; OPTIONAL { ?s rdfs:label ?label . }\\n\u0026quot; + \u0026quot;}\u0026quot;); Note that in the above example we did not define the rdfs: prefix so as it stands the query is invalid. However you can automatically populate BASE and PREFIX declarations for your command without having to explicitly declare them in your command text by using the setBaseUri() and setNsPrefix() method e.g.\n// Add a Base URI and define the rdfs prefix pss.setBaseUri(\u0026quot;http://example.org/base#\u0026quot;); pss.setNsPrefix(\u0026quot;rdfs\u0026quot;, \u0026quot;http://www.w3.org/2000/01/rdf-schema#\u0026quot;); You can always call toString() to see the current state of your instance e.g.\n// Print current state to stdout System.out.println(pss.toString()); Which based on the calls so far would print the following:\nBASE \u0026lt;http://example.org/base#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; SELECT * WHERE { ?s a ?type . OPTIONAL { ?s rdfs:label ?label . } } Note that the state of the instance returned by toString() will include any injected values. 
Part of what the toString() method does is check that your command is not subject to SPARQL injection attacks so in some cases where a possible injection is detected an ARQException will be thrown.\nInjecting Values Once you have a command text prepared then you want to actually inject values into it, values may be injected in several ways:\nBy treating a variable in the SPARQL string as a parameter Using JDBC style positional parameters Appending values directly to the command text being built See the ParameterizedSparqlString javadocs for a comprehensive reference of available methods for setting values, the following sections shows some basic examples of this.\nVariable Parameters Any SPARQL variable in the command text may have a value injected to it, injecting a value replaces all usages of that variable in the command i.e. substitutes the variable for a constant. Importantly injection is done by textual substitution so in some cases may cause unexpected side effects.\nVariables parameters are set via the various setX() methods which take a String as their first argument e.g.\n// Set an IRI pss.setIri(\u0026quot;x\u0026quot;, \u0026quot;http://example.org\u0026quot;); // Set a Literal pss.setLiteral(\u0026quot;x\u0026quot;, 1234); pss.setLiteral(\u0026quot;x\u0026quot;, true); pss.setLiteral(\u0026quot;x\u0026quot;, \u0026quot;value\u0026quot;); Where you set a value for a variable you have already set the existing value is overwritten. Setting any value to null has the same effect as calling the clearParam(\u0026quot;x\u0026quot;) method\nIf you have the value already as a RDFNode or Node instance you can call the setParam() method instead e.g.\n// Set a Node Node n = NodeFactory.createIRI(\u0026quot;http://example.org\u0026quot;); pas.setParam(\u0026quot;x\u0026quot;, n); Positional Parameters You can use JDBC style positional parameters if you prefer, a JDBC style parameter is a single ? followed by whitespace or certain punctuation characters (currently ; , .). Positional parameters have a unique index which reflects the order in which they appear in the string. Note that positional parameters use a zero based index.\nPositional parameters are set via the various setX() methods which take an int as their first argument e.g.\n// Set an IRI pss.setIri(0, \u0026quot;http://example.org\u0026quot;); // Set a Literal pss.setLiteral(0, 1234); pss.setLiteral(0, true); pss.setLiteral(0, \u0026quot;value\u0026quot;); Where you set a value for a variable you have already set the existing value is overwritten. 
Setting any value to null has the same effect as calling the clearParam(0) method\nIf you have the value already as a RDFNode or Node instance you can call the setParam() method instead e.g.\n// Set a Node Node n = NodeFactory.createIRI(\u0026quot;http://example.org\u0026quot;); pas.setParam(0, n); Non-existent parameters Where you try to set a variable/positional parameter that does not exist there will be no feedback that the parameter does not exist, however the value set will not be included in the string produced when calling the toString() method.\nBuffer Usage Additionally you may use this purely as a StringBuffer replacement for creating commands since it provides a large variety of convenience methods for appending things either as-is or as nodes (which causes appropriate formatting to be applied).\nFor example we could add an ORDER BY clause to our earlier example like so:\n// Add ORDER BY clause pss.append(\u0026quot;ORDER BY ?s\u0026quot;); Be aware that the basic append() methods append the given value as-is without any special formatting applied, if you wanted to use the value being appended as a constant in the SPARQL query then you should use the appropriate appendLiteral(), appendIri() or appendNode() method e.g.\n// Add a LIMIT clause pss.append(\u0026quot;LIMIT \u0026quot;); pss.appendLiteral(50); Getting a Query/Update Once you\u0026rsquo;ve prepared your command you should then call the asQuery() or asUpdate() method to get it as a Query or UpdateRequest object as appropriate. Doing this calls toString() to produce the final version of your command with all values injected and runs it through the appropriate parser (either QueryFactory or UpdateFactory).\nYou can then use the returned Query or UpdateRequest object as you would normally to make a query/update.\nSPARQL Injection Notes First a couple of warnings:\nThis class does not in any way check that your command is syntactically correct until such time as you try and parse it as a Query or UpdateRequest. Injection is done purely based on textual replacement, it does not understand or respect variable scope in any way. For example if your command text contains sub queries you should ensure that variables within the sub query which you don\u0026rsquo;t want replaced have distinct names from those in the outer query you do want replaced (or vice versa) While this class was in part designed to prevent SPARQL injection it is by no means foolproof because it works purely at the textual level. The current version of the code addresses some possible attack vectors that the developers have identified but we do not claim to be sufficiently devious to have thought of and prevented every possible attack vector.\nTherefore we strongly recommend that users concerned about SPARQL Injection attacks perform their own validation on provided parameters and test their use of this class themselves prior to its use in any security conscious deployment. We also recommend that users do not use easily guess-able variable names for their parameters as these can allow a chained injection attack though generally speaking the code should prevent these.\n","permalink":"https://jena.apache.org/documentation/query/parameterized-sparql-strings.html","tags":null,"title":"Parameterized SPARQL String"},{"categories":null,"contents":"The origins of RDF as a representation language include frame languages, in which an object, or frame, was the main unit of structuring data. 
Frames have slots, for example a Person frame might have an age slot, a heightslot etc. RDF, however, has taken a step beyond frame languages by making rdf:Property a first class value, not an element of a frame or resource per se. In RDF, for example, an age property can be defined: \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot;\u0026gt;, and then applied to any resource, including, but not limited to a Person resource.\nWhile this introduces an extra element of modelling flexibility in RDF, it is often the case that users want to treat some components in their models in a more structured way, similar to the original idea of frames. It is often assumed that rdfs:domain restricts a property to be used only on resources that are in the domain class. For example, a frequently asked question on the Jena support list is why the following is not an error:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Truck\u0026quot; /\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;Truck rdf:ID=\u0026quot;truck1\u0026quot;\u0026gt; \u0026lt;age\u0026gt;2\u0026lt;/age\u0026gt; \u0026lt;/Truck\u0026gt; Whereas many object-oriented or frame-oriented representations would regard it as an error that the age property was not being applied to a Person, RDF-based applications are simply entitled to infer that truck1 is a (that is, has rdf:type) Truck as well as a Person. This is unlikely to be the case in any real-world domain, but it is a valid RDF inference.\nA consequence of RDF\u0026rsquo;s design is that it is not really possible to answer the commonly asked question \u0026ldquo;Which properties can be applied to resources of class C?\u0026rdquo;. Strictly speaking, the RDF answer is \u0026ldquo;Any property\u0026rdquo;. However, many developers have a legitimate requirement to present a composite view of classes and their associated properties, forming more a more succinct structuring of an ontology or schema. The purpose of this note is to explain the mechanisms built-in to Jena to support a frame-like view of resources, while remaining correct with respect to RDF (and OWL) semantics.\nBasic principles: the properties of a class Since any RDF property can be applied to any RDF resource, we require a definition of the properties of a given class that respects RDF semantics. Consider the following RDF fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot; /\u0026gt; \u0026lt;Person rdf:ID=\u0026quot;jane_doe\u0026quot;\u0026gt; \u0026lt;age\u0026gt;23\u0026lt;/a\u0026gt; \u0026lt;/Person\u0026gt; Now consider that we add to this fragment that:\n\u0026lt;rdf:Property rdf:about=\u0026quot;age\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; This additional information about the domain of the age property does not add any new entailments to the model. Why? Because we already know that jane_doe is a Person. So we can consider age to be one of the properties of Person type resources, because if we use the property as a predicate of that resource, it doesn\u0026rsquo;t add any new rdf:type information about the resource. Conversely, if we know that some resource has an age, we don\u0026rsquo;t learn any new information by declaring that it has rdf:type Person. 
In summary, for the purposes of this HOWTO we define the properties of a class as just those properties that don\u0026rsquo;t entail any new type information when applied to resources that are already known to be of that class.\nSub-classes, and more complex class expressions Given these basic principles, now consider the following RDF fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;LivingThing\u0026quot; /\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Animal\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf rdf:resource=\u0026quot;#LivingThing\u0026quot;\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Mammal\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf rdf:resource=\u0026quot;#Animal\u0026quot;\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;hasSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Animal\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; Is hasSkeleton one of the properties of Animal? Yes, because any resource of rdf:type Animal can have a hasSkeleton property (with value either true or false) without adding type information. Similarly, any resource that is a Mammal also has rdf:type Animal (by the sub-class relation), so hasSkeleton is a property of Mammal. However, hasSkeleton is not a property of LivingThing, since we don\u0026rsquo;t automatically know that a living thing is an animal - it may be a plant. Stating that a given LivingThing has a hasSkeleton property, even if the value is false, would entail the additional rdf:type statement that the LivingThing is also an Animal.\nFor more complex class expressions in the domain, we look to see what simple domain constraints are entailed. For example, a domain constraint A ∩ B (i.e. \u0026ldquo;A intersection B\u0026rdquo;) for property p entails that both p rdfs:domain A and p rdfs:domain B are true. However, the properties of neither A nor B will include p. To see this, suppose we have a resource x that we already know is of type A, and a statement x p y. This entails x rdf:type A which we already know, but also x rdf:type B. So information is added, even if we know that x is an instance A, so p is not a property of A. The symmetrical argument holds for p not being a property of B.\nHowever, if the domain of p is A ∪ B (i.e. \u0026ldquo;A union B\u0026rdquo;), then both A and B will have p as a property, since an occurrence of, say x p y does not allow us to conclude that either x rdf:type A or x rdf:type B.\nProperty hierarchies Since sub-properties inherit the domain constraints of their parent property, the properties of a class will include the closure over the sub-property hierarchy. Extending the previous example, the properties of Animal and Mammal include both hasSkeleton and hasEndoSkeleton:\n\u0026lt;rdf:Property rdf:ID=\u0026quot;hasSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Animal\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;hasEndoSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:subPropertyOf rdf:resource=\u0026quot;#hasSkeleton\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; In general, there may be many different ways of deducing simple domain constraints from the axioms asserted in the ontology. 
Whether or not all of these possible deductions are present in any given RDF model depends on the power and completeness of the reasoner bound to that model.\nGlobal properties Under the principled definition that we propose here, properties which do not express a domain value are global, in the sense that they can apply to any resource. They do not, by definition, entail any new type information about the individuals they are applied to. Put another way, the domain of a property, if unspecified, is either rdfs:Resource or owl:Thing, depending on the ontology language. These are simply the types that all resources have by default. Therefore, every class has all of the global properties as one of the properties of the class.\nA commonly used idiom in some OWL ontologies is to use Restrictions to create an association between a class and the properties of instances of that class. For example, the following fragment shows that all instances of Person should have a familyName property:\n\u0026lt;owl:Class rdf:ID=\u0026quot;Person\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf\u0026gt; \u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#familyName\u0026quot; /\u0026gt; \u0026lt;owl:minCardinality rdf:datatype=\u0026quot;\u0026amp;xsd;int\u0026quot;\u0026gt;1\u0026lt;/owl:minCardinality\u0026gt; \u0026lt;/owl:Restriction\u0026gt; \u0026lt;/rdfs:subClassOf\u0026gt; \u0026lt;/owl:Class\u0026gt; This approach shows the intent of the ontology designer that Person instances have familyName properties. We do regard familyName as one of the properties of Person, but only because of the global properties principle. Unless a domain constraint is also specified for familyName, it will appear as one of the properties of classes other than Person. Note that this is a behaviour change from versions of Jena prior to release 2.2. Prior to this release, Jena used a heuristic method to attempt to associate restriction properties with the classes sub-classing that restriction. Since there were problems with precisely defining the heuristic, and ensuring correct behaviour (especially with inference models), we have dropped the use of this heuristic from Jena 2.2 onwards.\nThe Java API Support for frame-like views of classes and properties is provided through the ontology API. The following methods are used to access the properties of a class, and the converse for properties:\nOntClass.listDeclaredProperties(); OntClass.listDeclaredProperties( boolean direct ); OntClass.hasDeclaredProperty( Property prop, boolean direct ); OntProperty.listDeclaringClasses(); OntProperty.listDeclaringClasses( boolean direct ); All of the above API methods return a Jena ExtendedIterator.\nNote a change from the Jena 2.1 interface: the optional Boolean parameter on listDeclaredProperties has changed name from all (Jena 2.1 and earlier) to direct (Jena 2.2 and later). The meaning of the parameter has also changed: all was intended to simulate some reasoning steps in the absence of a reasoner, whereas direct is used to restrict the associations to only the local associations. See more on direct associations.\nA further difference from Jena 2.1 is that the models that are constructed without reasoners perform only very limited simulation of the inference closure of the model. Users who wish the declared properties to include entailments will need to construct their models with one of the built-in or external reasoners. 
The difference is illustrated by the following code fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;A\u0026quot; /\u0026gt; \u0026lt;rdfs:Property rdf:ID=\u0026quot;p\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;#A\u0026quot; /\u0026gt; \u0026lt;/rdfs:Property\u0026gt; \u0026lt;rdfs:Property rdf:ID=\u0026quot;q\u0026quot;\u0026gt; \u0026lt;rdfs:subPropertyOf rdf:resource=\u0026quot;#p\u0026quot; /\u0026gt; \u0026lt;/rdfs:Property\u0026gt; OntModel mNoInf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM ); OntClass a0 = mNoInf.getOntClass( NS + \u0026quot;A\u0026quot; ); Iterator i0 = a0.listDeclaredProperties(); OntModel mInf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_RULE_INF ); OntClass a1 = mInf.getOntClass( NS + \u0026quot;A\u0026quot; ); Iterator i1 = a1.listDeclaredProperties(); Iterator i1 will return p and q, while i0 will return only p.\nSummary of changes from Jena 2.2-beta-2 and older For users updating code that uses listDeclaredProperties from versions of Jena prior to 2.2-final, the following changes should be noted:\nGlobal properties listDeclaredProperties will treat properties with no specified domain as global, and regard them as properties of all classes. The use of the direct flag can hide global properties from non-root classes. Restriction properties listDeclaredProperties no longer heuristically returns properties associated with a class via the owl:onProperty predicate of a restriction. Limited simulated inference The old version of listDeclaredProperties attempted to simulate the entailed associations between classes and properties. Users are now advised to attach a reasoner to their models to do this. Change in parameter semantics The old version of listDeclaredProperties(boolean all) took one parameter, a Boolean flag to indicate whether additional declared (implied) properties should be listed. Since this is now covered by the use, or otherwise, of a reasoner attached to the model, the new method signature is listDeclaredProperties(boolean direct), where calling the method with direct = true will compress the returned results to use only the direct associations. ","permalink":"https://jena.apache.org/documentation/notes/rdf-frames.html","tags":null,"title":"Presenting RDF as frames"},{"categories":null,"contents":"SPARQL has four result forms:\nSELECT – Return a table of results. CONSTRUCT – Return an RDF graph, based on a template in the query. DESCRIBE – Return an RDF graph, based on what the query processor is configured to return. ASK – Ask a boolean query. The SELECT form directly returns a table of solutions as a result set, while DESCRIBE and CONSTRUCT use the outcome of matching to build RDF graphs.\nSolution Modifiers Pattern matching produces a set of solutions. This set can be modified in various ways:\nProjection - keep only selected variables OFFSET/LIMIT - chop the number solutions (best used with ORDER BY) ORDER BY - sorted results DISTINCT - yield only one row for one combination of variables and values. The solution modifiers OFFSET/LIMIT and ORDER BY always apply to all result forms.\nOFFSET and LIMIT A set of solutions can be abbreviated by specifying the offset (the start index) and the limit (the number of solutions) to be returned. Using LIMIT alone can be useful to ensure not too many solutions are returned, to restrict the effect of some unexpected situation. 
LIMIT and OFFSET can be used in conjunction with sorting to take a defined slice through the solutions found.\nORDER BY SPARQL solutions are sorted by expression, including custom functions.\nORDER BY ?x ?y ORDER BY DESC(?x) ORDER BY x:func(?x) # Custom sorting condition DISTINCT The SELECT result form can take the DISTINCT modifier which ensures that no two solutions returned are the same - this takes place after projection to the requested variables.\nSELECT The SELECT result form is a projection, with DISTINCT applied, of the solution set. SELECT identifies which named variables are in the result set. This may be \u0026ldquo;*\u0026rdquo; meaning \u0026ldquo;all named variables\u0026rdquo; (blank nodes in the query act like variables for matching but are never returned).\nCONSTRUCT CONSTRUCT builds an RDF based on a graph template. The graph template can have variables which are bound by a WHERE clause. The effect is to calculate the graph fragment, given the template, for each solution from the WHERE clause, after taking into account any solution modifiers. The graph fragments, one per solution, are merged into a single RDF graph which is the result.\nAny blank nodes explicitly mentioned in the graph template are created afresh for each time the template is used for a solution.\nDESCRIBE The CONSTRUCT form, takes an application template for the graph results. The DESCRIBE form also creates a graph but the form of that graph is provided the query processor, not the application. For each URI found, or explicitly mentioned in the DESCRIBE clause, the query processor should provide a useful fragment of RDF, such as all the known details of a book. ARQ allows domain-specific description handlers to be written.\nASK The ASK result form returns a boolean, true of the pattern matched otherwise false.\nReturn to index\n","permalink":"https://jena.apache.org/tutorials/sparql_results.html","tags":null,"title":"Producing Result Sets"},{"categories":null,"contents":"SPARQL tem quatro formas de se obter resultados:\nSELECT – Retorna uma tabela de resultados. CONSTRUCT – Retorna um grafo RDF, baseado num template da consulta. DESCRIBE – Retorna um grafo RDF, baseado no quê o processador está configurado para retornar. ASK – Faz uma consulta booleana. A forma SELECT, diretamente, retorna uma tabela de soluções como conjunto de resultados, enquanto que DESCRIBE e CONSTRUCT o resultado da consulta para montar um grafo RDF.\nModificadores de Soluções Casamento de padrões produz um conjunto de soluções. Esse conjunto pode ser modificado de várias maneiras:\nProjection - mantém apenas variáveis selecionadas OFFSET/LIMIT - recorta o número de soluções (melhor usado com ORDER BY) ORDER BY - resultados ordenados DISTINCT - retorna apenas uma linha para uma combinação de variáveis e valores. Os modificadores de solução OFFSET/LIMIT e ORDER BY sempre se aplica a todos os resultados.\nOFFSET e LIMIT Um conjunto de soluções pode ser abreviado especificando o deslocamento (índice inicial) e o limite (número de soluções) a ser retornados. Usando apenas LIMIT é útil para garantir que nem tantas soluções vão ser retornadas, para restringir o efeito de uma situação inesperada. 
LIMIT e OFFSET pode ser usado em conjunção com ordenamento para pegar um fatia definida dentre as soluções encontradas.\nORDER BY Soluções SPARQL são ordenadas por expressões, incluindo funções padrões.\nORDER BY ?x ?y ORDER BY DESC(?x) ORDER BY x:func(?x) # Custom sorting condition DISTINCT O SELECT pode usar o modificador DISTINCT para garantir que duas soluções retornadas sejam diferentes.\nSELECT O SELECT é uma projeção, com DISTINCT aplicado, do conjunto solução. SELECT identifica quais variáveis nomeadas estão no conjunto resultado. Isso pode ser um \u0026ldquo;*\u0026rdquo; significando que “todas as variáveis” (blank nodes na consulta atuam como variáveis para casamento, mas nada é retornado).\nCONSTRUCT CONSTRUCT monta um RDF baseado num grafo template. O grafo template pode ter variáveis que são definidas na clausula WHERE. O efeito é o cálculo de um fragmento de grafo, dado o template, para cada solução da clausula WHERE, depois levando em conta qualquer modificador de solução. Os fragmentos de grafo, um por solução, são juntados num único grafo RDF que é o resultado.\nQualquer blank node explicitamente mencionado no grafo template são criados novamente para cada vez que o template é usado para uma solução.\nDESCRIBE CONSTRUCT pega um template para o grafo de resultados. O DESCRIBE também cria um grafo mas a forma deste grafo é fornecida pelo processador da consulta, não a aplicação. Pra cada URI encontrada, ou explicitamente mencionada na clausula DESCRIBE, o processor de consultas deve prover um fragmento de RDF útil, como todos os detalhes conhecidos de um livro. ARQ permite a escrita de manipuladores de descrições especificas de domínio.\nASK ASK retorna um booleano, true se o padrão for casado, ou false caso contrário.\nRetornar ao índice\n","permalink":"https://jena.apache.org/tutorials/sparql_results_pt.html","tags":null,"title":"Produzindo resultados"},{"categories":null,"contents":"SPARQL allows custom property functions to add functionality to the triple matching process. Property functions can be registered or dynamically loaded.\nSee also the free text search page.\nSee also the FILTER functions FILTER functions library.\nApplications can also provide their own property functions.\nProperty Function Library The prefix apf is \u0026lt;http://jena.apache.org/ARQ/property#\u0026gt;. (The old prefix of \u0026lt;http://jena.hpl.hp.com/ARQ/property#\u0026gt; continues to work. Applications are encouraged to switch.)\nDirect loading using a URI prefix of \u0026lt;java:org.apache.jena.sparql.pfunction.library.\u0026gt; (note the final dot) also works.\nThe prefix list: is http://jena.apache.org/ARQ/list#.\nProperty nameDescription list list:member member Membership of an RDF List (RDF Collection). If list is not bound or a constant, find and iterate all lists in the graph (can be slow) else evaluate for one particular list. If member a variable, generate solutions with member bound to each element in the list. If member is bound or a constant expression, test to see if a member of the list. list list:index (index member) Index of an RDF List (RDF Collection). If list is not bound or a constant, find and iterate all lists in the graph (can be slow) else evaluate for one particular list. The object is a list pair, either element can be bound, unbound or a fixed node. Unbound variables in the object list are bound by the property function. list list:length length Length of an RDF List (RDF Collection). 
If list is not bound or a constant, find and iterate all lists in the graph (can be slow) else evaluate for one particular list. The object is tested against or bound to the length of the list. container rdfs:member member Membership of an RDF Container (rdf:Bag, rdf:Seq, rdf:Alt). Pre-registered URI. If this infers with queries running over a Jena inference model which also provides rdfs:member, then remove this from the global registry. PropertyFunctionRegistry.get().\nremove(RDFS.member.getURI()) ; apf:textMatch Free text match. bag apf:bag member The argument bag must be bound by this point in the query or a constant expression. If bag is bound or a URI, and member a variable, generate solutions with member bound to each element in the bag. If member is bound or a constant expression, test to see if a member of the list. seq apf:seq member The argument seq must be bound by this point in the query or a constant expression. If seq is bound or a URI, and member a variable, generate solutions with member bound to each element in the sequence. If member is bound or a constant expression, test to see if a member of the list. seq apf:alt member The argument alt must be bound by this point in the query or a constant expression. If alt is bound or a URI, and member a variable, generate solutions with member bound to each element in the alt . If member is bound or a constant expression, test to see if a member of the list. varOrTermapf:assignvarOrTerm Assign an RDF term from one side to the other. If both are fixed RDF terms or bound variables, it becomes a boolean test that the subject is the same RDF term as the object. iriapf:splitIRI (namespace localname)\niriapf:splitURI (namespace localname) Split the IRI or URI into namespace (an IRI) and local name (a string). Compare if given values or bound variables, otherwise set the variable. The object is a list with 2 elements. splitURI is an synonym. subject apf:str object The subject is the string form of the object, like the function str(). Object must be bound or a constant. Object can not be a blank node (see apf:blankNode) subject apf:blankNode label subject apf:bnode label\nSubject must be bound to a blank node or a constant. Label is either a string, in which case test for whether this is the blank node label of subject, or it is a variable, which is assigned the blank node label as a plain string. Argument mismatch causes no match. Use with care. subject apf:versionARQ version\nSet the subject to the IRI for ARQ and set the object to the version string (format \"N.N.N\" where N is a number). If any of the variables are already set, test for the correct value. var apf:concat (arg arg ...) Concatenate the arguments in the object list as strings, and assign to var. var apf:strSplit (arg arg) Split a string and return a binding for each result. The subject variable should be unbound. The first argument to the object list is the string to be split. The second argument to the object list is a regular expression by which to split the string. The subject var is bound for each result of the split, and each result has the whitespace trimmed from it. ARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/library-propfunc.html","tags":null,"title":"Property Functions in ARQ"},{"categories":null,"contents":"\u0026ldquo;RDF Binary\u0026rdquo; is a efficient format for RDF and RDF-related data using Apache Thrift or Google Protocol Buffers as the binary data encoding.\nThe W3C standard RDF syntaxes are text or XML based. 
These incur costs in parsing; the most human-readable formats also incur high costs to write, and have limited scalability due to the need to analyse the data for pretty printing rather than simply stream to output.\nBinary formats are faster to process - they do not incur the parsing costs of text-base formats. \u0026ldquo;RDF Binary\u0026rdquo; defines basic encoding for RDF terms, then builds data formats for RDF graphs, RDF datasets, and for SPARQL result sets. This gives a basis for high-performance linked data systems.\nThrift and Protobuf provides efficient, widely-used, binary encoding layers each with a large number of language bindings.\nFor more details of RDF Thrift.\nThrift encoding of RDF Terms RDF Thrift uses the Thrift compact protocol.\nSource: BinaryRDF.thrift\nRDF terms struct RDF_IRI { 1: required string iri } # A prefix name (abbrev for an IRI) struct RDF_PrefixName { 1: required string prefix ; 2: required string localName ; } struct RDF_BNode { 1: required string label } struct RDF_Literal { 1: required string lex ; 2: optional string langtag ; 3: optional string datatype ; 4: optional RDF_PrefixName dtPrefix ; } struct RDF_Decimal { 1: required i64 value ; 2: required i32 scale ; } struct RDF_VAR { 1: required string name ; } struct RDF_ANY { } struct RDF_UNDEF { } struct RDF_REPEAT { } union RDF_Term { 1: RDF_IRI iri 2: RDF_BNode bnode 3: RDF_Literal literal 4: RDF_PrefixName prefixName 5: RDF_VAR variable 6: RDF_ANY any 7: RDF_UNDEF undefined 8: RDF_REPEAT repeat 9: RDF_Triple tripleTerm # RDF-star # Value forms of literals. 10: i64 valInteger 11: double valDouble 12: RDF_Decimal valDecimal } Thrift encoding of Triples, Quads and rows. struct RDF_Triple { 1: required RDF_Term S 2: required RDF_Term P 3: required RDF_Term O } struct RDF_Quad { 1: required RDF_Term S 2: required RDF_Term P 3: required RDF_Term O 4: optional RDF_Term G } struct RDF_PrefixDecl { 1: required string prefix ; 2: required string uri ; } Thrift encoding of RDF Graphs and RDF Datasets union RDF_StreamRow { 1: RDF_PrefixDecl prefixDecl 2: RDF_Triple triple 3: RDF_Quad quad } RDF Graphs are encoded as a stream of RDF_Triple and RDF_PrefixDecl.\nRDF Datasets are encoded as a stream of RDF_Triple, RDF-Quad and RDF_PrefixDecl.\nThrift encoding of SPARQL Result Sets A SPARQL Result Set is encoded as a list of variables (the header), then a stream of rows (the results).\nstruct RDF_VarTuple { 1: list\u0026lt;RDF_VAR\u0026gt; vars } struct RDF_DataTuple { 1: list\u0026lt;RDF_Term\u0026gt; row } Protobuf encoding of RDF Terms The Protobuf schema is simialr.\nSource: binary-rdf.proto\nStreaming isused to allow for abitrary size graphs. Therefore the steram items (RDF_StreamRow below) are written with an initial length (writeDelimitedTo in the Java API).\nSee Protobuf Techniques Streaming.\nsyntax = \u0026#34;proto3\u0026#34;; option java_package = \u0026#34;org.apache.jena.riot.protobuf.wire\u0026#34; ; // Prefer one file with static inner classes. option java_outer_classname = \u0026#34;PB_RDF\u0026#34; ; // Optimize for speed (default) option optimize_for = SPEED ; //option java_multiple_files = true; // ==== RDF Term Definitions message RDF_IRI { string iri = 1 ; } // A prefix name (abbrev for an IRI) message RDF_PrefixName { string prefix = 1 ; string localName = 2 ; } message RDF_BNode { string label = 1 ; // 2 * fixed64 } // Common abbreviations for datatypes and other URIs? // union with additional values. 
message RDF_Literal { string lex = 1 ; oneof literalKind { bool simple = 9 ; string langtag = 2 ; string datatype = 3 ; RDF_PrefixName dtPrefix = 4 ; } } message RDF_Decimal { sint64 value = 1 ; sint32 scale = 2 ; } message RDF_Var { string name = 1 ; } message RDF_ANY { } message RDF_UNDEF { } message RDF_REPEAT { } message RDF_Term { oneof term { RDF_IRI iri = 1 ; RDF_BNode bnode = 2 ; RDF_Literal literal = 3 ; RDF_PrefixName prefixName = 4 ; RDF_Var variable = 5 ; RDF_Triple tripleTerm = 6 ; RDF_ANY any = 7 ; RDF_UNDEF undefined = 8 ; RDF_REPEAT repeat = 9 ; // Value forms of literals. sint64 valInteger = 20 ; double valDouble = 21 ; RDF_Decimal valDecimal = 22 ; } } // === StreamRDF items message RDF_Triple { RDF_Term S = 1 ; RDF_Term P = 2 ; RDF_Term O = 3 ; } message RDF_Quad { RDF_Term S = 1 ; RDF_Term P = 2 ; RDF_Term O = 3 ; RDF_Term G = 4 ; } // Prefix declaration message RDF_PrefixDecl { string prefix = 1; string uri = 2 ; } // StreamRDF message RDF_StreamRow { oneof row { RDF_PrefixDecl prefixDecl = 1 ; RDF_Triple triple = 2 ; RDF_Quad quad = 3 ; RDF_IRI base = 4 ; } } message RDF_Stream { repeated RDF_StreamRow row = 1 ; } // ==== SPARQL Result Sets message RDF_VarTuple { repeated RDF_Var vars = 1 ; } message RDF_DataTuple { repeated RDF_Term row = 1 ; } // ==== RDF Graph message RDF_Graph { repeated RDF_Triple triple = 1 ; } ","permalink":"https://jena.apache.org/documentation/io/rdf-binary.html","tags":null,"title":"RDF Binary using Apache Thrift"},{"categories":null,"contents":"RDFConnection provides a unified set of operations for working on RDF with SPARQL operations. It provides SPARQL Query, SPARQL Update and the SPARQL Graph Store operations. The interface is uniform - the same interface applies to local data and to remote data using HTTP and the SPARQL protocols (SPARQL protocol) and SPARQL Graph Store Protocol).\nOutline RDFConnection provides a number of different styles for working with RDF data in Java. It provides support for try-resource and functional code passing styles, as well the more basic sequence of methods calls.\nFor example: using try-resources to manage the connection, and perform two operations, one to load some data, and one to make a query can be written as:\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.load(\u0026#34;data.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; } This could have been written as (approximately \u0026ndash; the error handling is better in the example above):\nRDFConnection conn = RDFConnection.connect(...) conn.load(\u0026#34;data.ttl\u0026#34;) ; QueryExecution qExec = conn.query(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;) ; ResultSet rs = qExec.execSelect() ; while(rs.hasNext()) { QuerySolution qs = rs.next() ; Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; } qExec.close() ; conn.close() ; Transactions Transactions are the preferred way to work with RDF data. Operations on an RDFConnection outside of an application-controlled transaction will cause the system to add one for the duration of the operation. This \u0026ldquo;autocommit\u0026rdquo; feature may lead to inefficient operations due to excessive overhead.\nThe Txn class provides a Java8-style transaction API. 
Transactions are code passed in the Txn library that handles the transaction lifecycle.\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { Txn.execWrite(conn, () -\u0026gt; { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; }) ; } The traditional style of explicit begin, commit, abort is also available.\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.begin(ReadWrite.WRITE) ; try { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; conn.commit() ; } finally { conn.end() ; } } The use of try-finally ensures that transactions are properly finished. The conn.end() provides an abort in case an exception occurs in the transaction and a commit has not been issued. The use of try-finally ensures that transactions are properly finished.\nTxn is wrapping these steps up and calling the application supplied code for the transaction body.\nRemote Transactions SPARQL does not define a remote transaction standard protocol. Each remote operation should be atomic (all happens or nothing happens) - this is the responsibility of the remote server.\nAn RDFConnection will at least provide the client-side locking features. This means that overlapping operations that change data are naturally handled by the transaction pattern within a single JVM.\nConfiguring a remote RDFConnection. The default settings on a remote connection should work for any SPARQL triple store endpoint which supports HTTP content negotiation. Sometimes different settings are desirable or required and RDFConnectionRemote provides a builder to construct RDFConnectionRemotes.\nAt its simplest, it is:\nRDFConnectionRemoteBuilder builder = RDFConnection.create() .destination(\u0026#34;http://host/triplestore\u0026#34;); which uses default settings used by RDFConenctionFactory.connect.\nSee example 4 and example 5.\nThere are many options, including setting HTTP headers for content types (javadoc) and providing detailed configuration with Apache HttpComponents HttpClient.\nFuseki Specific Connection If the remote destination is an Apache Jena Fuseki server, then the default general settings work, but it is possible to have a specialised connection\nRDFConnectionRemoteBuilder builder = RDFConnectionFuseki.create() .destination(\u0026#34;http://host/fuseki\u0026#34;); which uses settings tuned to Fuseki, including round-trip handling of blank nodes.\nSee example 6.\nGraph Store Protocol The SPARQL Graph Store Protocol (GSP) is a set of operations to work on whole graphs in a dataset. It provides a standardised way to manage the data in a dataset.\nThe operations are to fetch a graph, set the RDF data in a graph, add more RDF data into a graph, and delete a graph from a dataset.\nFor example: load two files:\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.nt\u0026#34;) ; } The file extension is used to determine the syntax.\nThere is also a set of scripts to help do these operations from the command line with SOH. 
It is possible to write curl scripts as well. The SPARQL Graph Store Protocol provides a standardised way to manage the data in a dataset.\nIn addition, RDFConnection provides an extension to give the same style of operation to work on a whole dataset (deleting the dataset is not provided).\nconn.loadDataset(\u0026#34;data-complete.trig\u0026#34;) ; Local vs Remote GSP operations work on whole models and datasets. When used on a remote connection, the result of a GSP operation is a separate copy of the remote RDF data. When working with local connections, 3 isolation modes are available:\nCopy – the models and datasets returned are independent copies. Updates are made to the return copy only. This is most like a remote connection and is useful for testing. Read-only – the models and datasets are made read-only but any changes to the underlying RDF data by changes by another route will be visible. This provides a form of checking for large datasets when \u0026ldquo;copy\u0026rdquo; is impractical. None – the models and datasets are passed back with no additional wrappers, and they can be updated with the changes being made the underlying dataset. The default for a local RDFConnection is \u0026ldquo;none\u0026rdquo;. When used with TDB, accessing returned models must be done with transactions in this mode.\nQuery Usage RDFConnection provides methods for each of the SPARQL query forms (SELECT, CONSTRUCT, DESCRIBE, ASK) as well as a way to get the QueryExecution for specialized configuration. When creating an QueryExecution explicitly, care should be taken to close it.\nIf the application wishes to capture the result set from a SELECT query and retain it across the lifetime of the transaction or QueryExecution, then the application should create a copy which is not attached to any external system with ResultSetFactory.copyResults.\ntry ( RDFConnection conn = RDFConnection.connect(\u0026#34;https://...\u0026#34;) ) { ResultSet safeCopy = Txn.execReadReturn(conn, () -\u0026gt; { // Process results by row: conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34;+subject) ; }) ; ResultSet rs = conn.query(\u0026#34;SELECT * { ?s ?p ?o }\u0026#34;).execSelect() ; return ResultSetFactory.copyResults(rs) ; }) ; } Update Usage SPARQL Update operations can be performed and mixed with other operations.\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { Txn.execWrite(conn, () -\u0026gt; { conn.update(\u0026#34;DELETE DATA { ... }\u0026#34; ) ; conn.load(\u0026#34;data.ttl\u0026#34;) ; }) ; } Dataset operations In addition to the SPARQL Graph Store Protocol, operations on whole datasets are provided for fetching (HTTP GET), adding data (HTTP POST) and setting the data (HTTP PUT) on a dataset URL. This assumes the remote server supported these REST-style operations. Apache Jena Fuseki does provide these.\nSubinterfaces To help structure code, the RDFConnection consists of a number of different interfaces. An RDFConnection can be passed to application code as one of these interfaces so that only certain subsets of the full operations are visible to the called code.\nquery via SparqlQueryConnection update via SparqlUpdateConnection graph store protocol RDFDatasetAccessConnection (read operations), and RDFDatasetConnection (read and write operations). 
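For example, a connection can be handed to helper code typed only as SparqlQueryConnection, so the helper can query but not update or change graphs. A minimal sketch (the endpoint URL is made up):
import org.apache.jena.query.QuerySolution;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.SparqlQueryConnection;

public class SubinterfaceSketch {
    // The callee only sees the query operations of the connection.
    static void countTriples(SparqlQueryConnection qConn) {
        qConn.querySelect(\u0026#34;SELECT (count(*) AS ?n) { ?s ?p ?o }\u0026#34;,
            (QuerySolution qs) -\u0026gt; System.out.println(\u0026#34;Triples: \u0026#34; + qs.getLiteral(\u0026#34;n\u0026#34;)));
    }

    public static void main(String[] args) {
        try (RDFConnection conn = RDFConnection.connect(\u0026#34;http://localhost:3030/ds\u0026#34;)) {
            countTriples(conn);   // RDFConnection is-a SparqlQueryConnection
        }
    }
}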
Examples For simple usage examples, see https://github.com/apache/jena/tree/main/jena-examples/src/main/java/rdfconnection/examples. For an example of how to use RDFConnection with StreamRDF, see https://github.com/apache/jena/blob/main/jena-examples/src/main/java/org/apache/jena/example/streaming/StreamRDFToConnection.java. ","permalink":"https://jena.apache.org/documentation/rdfconnection/","tags":null,"title":"RDF Connection : SPARQL operations API"},{"categories":null,"contents":"This page describes RDF Patch. An RDF Patch is a set of changes to an RDF dataset. The changes are for triples, quads and prefixes.\nChanges to triples involving blank nodes are handled by using their system identifier, which uniquely identifies a blank node. Unlike RDF syntaxes, blank nodes are not generated afresh each time the document is parsed.\nExample This example ensures certain prefixes are in the dataset and adds some basic triples for a new subclass of \u0026lt;http://example/SUPER_CLASS\u0026gt;.\nTX . PA \u0026#34;rdf\u0026#34; \u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; . PA \u0026#34;owl\u0026#34; \u0026#34;http://www.w3.org/2002/07/owl#\u0026#34; . PA \u0026#34;rdfs\u0026#34; \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; \u0026lt;http://www.w3.org/2002/07/owl#Class\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#subClassOf\u0026gt; \u0026lt;http://example/SUPER_CLASS\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; \u0026#34;SubClass\u0026#34; . TC . Structure The text format for an RDF Patch is N-Triples-like: it is a series of rows, each row ending with a . (DOT). The tokens on a row are keywords, URIs, blank nodes (written with their label, see below) or RDF literals, in N-Triples syntax. A keyword follows the same rules as Turtle prefix declarations without a trailing :.\nA line has an operation code, then some number of items depending on the operation.\nOperation H Header TX TC TA Change block: transactions PA PD Change: Prefix add and delete A D Change: Add and delete triples and quads The general structure of an RDF patch is a header (possibly empty), then a number of change blocks.\nEach change block is a transaction. Transactions can be explicitly recorded (TX start, TC commit) to include multiple transactions in one patch. They are not required. If not present, the patch should be applied atomically to the data.\nheader TX Quad, triple or prefix changes TC or TA Multiple transaction blocks are allowed for multiple sets of changes in one patch.\nA binary version based on RDF Thrift is provided. Parsing the binary form, compared to the text form for N-Triples, achieves a 3x-4x increase in throughput.\nHeader The header provides basic information about the patch. It is a series of (key, value) pairs.\nIt is better to put complex metadata in a separate file and link to it from the header, but certain information is best kept with the patch. If patches are given an identifier, and also refer to the expected previous patch, they create a log, and patches can be applied in the right order.\nA header section can be used to provide additional information. In this example a patch has an identifier and refers to a previous patch.
This might be used to create a log of patches, a log being a sequence of changes to apply in order.\nH id \u0026lt;uuid:0686c69d-8f89-4496-acb5-744f0157a8db\u0026gt; . H prev \u0026lt;uuid:3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1\u0026gt; . TX . PA \u0026#34;rdf\u0026#34; \u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; . PA \u0026#34;owl\u0026#34; \u0026#34;http://www.w3.org/2002/07/owl#\u0026#34; . PA \u0026#34;rdfs\u0026#34; \u0026#34;http://www.w3.org/2000/01/rdf-schema#\u0026#34; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; \u0026lt;http://www.w3.org/2002/07/owl#Class\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#subClassOf\u0026gt; \u0026lt;http://example/SUPER_CLASS\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; \u0026#34;SubClass\u0026#34; . TC . Header format:\nH word RDFTerm . where word is a string in quotes, or an unquoted string (no spaces, starts with a letter, same as a prefix without the colon).\nThe header is ended by the first non-H line or the end of the patch.\nTransactions TX . TC . These delimit a block of quad, triple and prefix changes.\nAbort, TA, is provided so that changes can be streamed, without obliging the application to buffer changes and wait to confirm that the action is committed.\nTransactions should be applied atomically when a patch is applied.\nChanges A change is an add or delete of a quad or a prefix.\nPrefixes Prefixes do not apply to the data of the patch. They are changes to the data the patch is applied to.\nThe prefix name is without the trailing colon. It can be given as a quoted string or an unquoted string (keyword) with the same limitations as Turtle on the prefix name.\nPA rdf \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . PA is adding a prefix, PD is deleting a prefix.\nQuads and Triples Triples and quads are written like N-Quads, as 3 or 4 RDF terms, with the addition of an initial A or D for \u0026ldquo;add\u0026rdquo; or \u0026ldquo;delete\u0026rdquo;. Triples are in the order S-P-O, quads are S-P-O-G.\nAdd a triple:\nA \u0026lt;http://example/SubClass\u0026gt; \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; \u0026lt;http://www.w3.org/2002/07/owl#Class\u0026gt; . Blank nodes In order to synchronize datasets, changes involving blank nodes may need to refer to a blank node already in the data. RDF Patch deals with this by making blank node labels refer to the \u0026ldquo;system identifier\u0026rdquo; for the blank node.\nIn this way, RDF Patch is not an \u0026ldquo;RDF Format\u0026rdquo;. In all syntaxes for RDF (Turtle, TriG, RDF/XML etc), blank nodes are \u0026ldquo;document scoped\u0026rdquo;, meaning that the blank node is unique to that one reading of the document. A new blank node is generated every time the file is read into a graph or dataset, and that blank node does not appear in the existing data.\nIn practice, most RDF triplestores have some kind of internal identifier that identifies the blank node. RDF Patch requires a \u0026ldquo;system identifier\u0026rdquo; for blank nodes so that changes can refer to an existing blank node in the data.\nThese can be written as _:label or \u0026lt;_:label\u0026gt; (the latter provides a wider set of permissible characters in the label).
Note that _ is illegal as an IRI scheme to highlight the fact this is not, strictly, an IRI.\nRDF 1.1 describes skolemization where blank nodes are replaced by a URI. A system could use those for RDF Patch if it also meets the additional requirements to be able to receive and reverse the mapping back to the internal blank node object and also that all system generating patches can safely generate new, fresh skolem IRIs that will become new blank nodes in the RDF dataset then a patch is applied to it.\nPreferred Style The preferred style is to write patch rows on a single line, single space between tokens on a row and a single space before the terminal .. No comments should be included (comments start # and run to end of line).\nHeaders should be placed before the item they refer to; for information used by an RDF Patch Log, the metadata is about the whole patch and should be at the start of the file, before any TX.\n","permalink":"https://jena.apache.org/documentation/rdf-patch/","tags":null,"title":"RDF Patch"},{"categories":null,"contents":"Legacy Documentation : not up-to-date\nThe original ARP parser will be removed from Jena\nThe current RDF/XML parser is RRX.\nThis section details the Jena RDF/XML parser. ARP is the parsing subsystem in Jena for handling the RDF/XML syntax.\nARP Features Using ARP without Jena Using other SAX and DOM XML sources ARP Features Java based RDF parser. Compliant with RDF Syntax and RDF Test Cases Recommendations. Compliant with following standards and recommendations: xml:lang\nxml:lang is fully supported, both in RDF/XML and any document embedding RDF/XML. Moreover, the language tags are checked against RFC1766, RFC3066, ISO639-1, ISO3166. xml:base\nxml:base is fully supported, both in RDF/XML and any document embedding RDF/XML. URI\nAll URI references are checked against RFC2396. The treatment of international URIs implements the concept of RDF URI Reference. XML Names\nAll rdf:ID\u0026rsquo;s are checked against the XML Names specification. Unicode Normal Form C\nString literals are checked for conformance with an early uniform normalization processing model. XML Literals\nrdf:parseType='Literal' is processed respecting namespaces, processing instructions and XML comments. This follows the XML exclusive canonicalizations recommendation with comments. Relative Namespace URI references\nNamespace URI references are checked in light of the W3C XML Plenary decision. Command-line RDF/XML error checking. Can be used independently of Jena, with customizable StatementHandler. Highly configurable error processing. Xerces based XML parsing. Processes both standalone and embedded RDF/XML. Streaming parser, suitable for large files. Supports SAX and DOM, for integration with non-file XML sources. ","permalink":"https://jena.apache.org/documentation/io/arp/arp.html","tags":null,"title":"RDF/XML Input in Jena"},{"categories":null,"contents":"This page details the setup of RDF I/O technology (RIOT).\nFormats Commands Reading RDF in Jena Writing RDF in Jena Working with RDF Streams Formats The following RDF formats are supported by Jena. In addition, other syntaxes can be integrated into both the parser and writer registries.\nTurtle JSON-LD N-Triples N-Quads TriG RDF/XML TriX RDF/JSON RDF Binary RDF/JSON is different from JSON-LD - it is a direct encoding of RDF triples in JSON. See the description of RDF/JSON.\nRDF Binary is a binary encoding of RDF (graphs and datasets) that can be useful for fast parsing. 
See RDF Binary.\nCommand line tools There are scripts in Jena download to run these commands.\nriot - parse, guessing the syntax from the file extension. Assumes N-Quads/N-Triples from stdin. turtle, ntriples, nquads, trig, rdfxml - parse a particular language These can be called directly as Java programs:\nThe file extensions understood are:\nExtension Language .ttl Turtle .nt N-Triples .nq N-Quads .trig TriG .rdf RDF/XML .owl RDF/XML .jsonld JSON-LD .trdf RDF Thrift .rt RDF Thrift .rpb RDF Protobuf .pbrdf RDF Protobuf .rj RDF/JSON .trix TriX .n3 is supported but only as a synonym for Turtle.\nThe TriX support is for the core TriX format.\nIn addition, if the extension is .gz the file is assumed to be gzip compressed. The file name is examined for an inner extension. For example, .nt.gz is gzip compressed N-Triples.\nJena does not support all possible compression formats itself, only GZip and BZip2 are supported directly. If you want to use an alternative compression format you can do so by piping the output of the relevant decompression utility into one of Jena\u0026rsquo;s commands e.g.\nzstd -d \u0026lt; FILE.nq.zst | riot --syntax NQ ... These scripts call java programs in the riotcmd package. For example:\njava -cp ... riotcmd.riot file.ttl This can be a mixture of files in different syntaxes when file extensions are used to determine the file syntax type.\nThe scripts all accept the same arguments (type \u0026quot;riot --help\u0026quot; to get command line reminders):\n--syntax=NAME; Explicitly set the input syntax for all files. --validate: Checking mode: same as --strict --sink --check=true. --check=true/false: Run with checking of literals and IRIs either on or off. --time: Output timing information. --sink: No output. --output=FORMAT: Output in a given syntax (streaming if possible). --formatted=FORMAT: Output in a given syntax, using pretty printing. --stream=FORMAT: Output in a given syntax, streaming (not all syntaxes can be streamed). To aid in checking for errors in UTF8-encoded files, there is a utility which reads a file of bytes as UTF8 and checks the encoding.\nutf8 \u0026ndash; read bytes as UTF8 Inference RIOT support creation of inferred triples during the parsing process:\nriotcmd.infer --rdfs VOCAB FILE FILE ... Output will contain the base data and triples inferred based on RDF subclass, subproperty, domain and range declarations.\n","permalink":"https://jena.apache.org/documentation/io/","tags":null,"title":"Reading and Writing RDF in Apache Jena"},{"categories":null,"contents":" JSON-LD 1.1 is the default version of JSON-LD supported by Apache Jena. This page is out of date and left temporary only for information about using JSON-LD 1.1 in versions 4.2.x to 4.4.x. 
This page details support for reading JSON-LD 1.1 using Titanium JSON-LD.\nWhile Titanium is licensed under the Apache License, it has a dependency on the Eclipse Jakarta JSON Processing API, which is licensed under the Eclipse Public License 2.0.\nAdditional Dependencies The Titanium engine (com.apicatalog:titanium-json-ld) uses the Eclipse Jakarta JSON Processing API, licensed under the Eclipse Public License 2.0, with dependencies:\njakarta.json:jakarta.json-api org.glassfish:jakarta.json Failure to add these dependencies will result in an UnsupportedOperationException: both titanium-json-ld (1.1.0 or later) and org.glassfish:jakarta.json need to be on the classpath.\nUsage Jena currently (from version 4.2.0) offers both JSON-LD 1.0 and JSON-LD 1.1.\nThe file extension for JSON-LD 1.1 is .jsonld11.\nIf not reading from a file with this file extension, the application needs to force the language choice to be JSON-LD 1.1 with RDFParser using forceLang(Lang.JSONLD11):\nRDFParser.source(...) .forceLang(Lang.JSONLD11) ... .build() or in short-cut form:\nRDFParser.source(URL or InputStream) .forceLang(Lang.JSONLD11) .parse(dataset); ","permalink":"https://jena.apache.org/documentation/io/json-ld-11.html","tags":null,"title":"Reading JSON-LD 1.1"},{"categories":null,"contents":"This page details the setup of RDF I/O technology (RIOT) for Apache Jena.\nSee Writing RDF for details of the RIOT Writer system.\nAPI Determining the RDF syntax Example 1 : Using the RDFDataMgr Example 2 : Common usage Example 3 : Using RDFParser Logging The StreamManager and LocationMapper Configuring a StreamManager Configuring a LocationMapper Advanced examples Iterating over parser output Filtering the output of parsing Add a new language Full details of operations are given in the javadoc.\nAPI Much of the functionality is accessed via the Jena Model API; direct calling of the RIOT subsystem isn\u0026rsquo;t needed. A resource name with no URI scheme is assumed to be a local file name.\nApplications typically use at most RDFDataMgr to read RDF datasets.\nThe major classes in the RIOT API are:\nClass Comment RDFDataMgr Main set of functions to read and load models and datasets StreamRDF Interface for the output of all parsers RDFParser Detailed setup of a parser StreamManager Handles the opening of typed input streams RDFLanguages Registered languages RDFParserRegistry Registered parser factories Determining the RDF syntax The syntax of the RDF file is determined by the content type (if an HTTP request), then the file extension if there is no content type. Content type text/plain is ignored; it is assumed to be the type returned by an unconfigured HTTP server.
The application can also pass in a declared language hint.\nThe string name traditionally used in model.read is mapped to RIOT Lang as:\nJena reader RIOT Lang \u0026quot;TURTLE\u0026quot; TURTLE \u0026quot;TTL\u0026quot; TURTLE \u0026quot;Turtle\u0026quot; TURTLE \u0026quot;N-TRIPLES\u0026quot; NTRIPLES \u0026quot;N-TRIPLE\u0026quot; NTRIPLES \u0026quot;NT\u0026quot; NTRIPLES \u0026quot;RDF/XML\u0026quot; RDFXML \u0026quot;N3\u0026quot; N3 \u0026quot;JSON-LD\u0026quot; JSONLD \u0026quot;RDF/JSON\u0026quot; RDFJSON \u0026quot;RDF/JSON\u0026quot; RDFJSON The following is a suggested Apache httpd .htaccess file:\nAddType text/turtle .ttl AddType application/rdf+xml .rdf AddType application/n-triples .nt AddType application/ld+json .jsonld AddType text/trig .trig AddType application/n-quads .nq AddType application/trix+xml .trix AddType application/rdf+thrift .rt AddType application/rdf+protobuf .rpb Example 1 : Using the RDFDataMgr RDFDataMgr provides operations to load, read and write models and datasets.\nRDFDataMgr \u0026ldquo;load\u0026rdquo; operations create an in-memory container (model, or dataset as appropriate); \u0026ldquo;read\u0026rdquo; operations add data into an existing model or dataset.\n// Create a model and read into it from file // \u0026quot;data.ttl\u0026quot; assumed to be Turtle. Model model = RDFDataMgr.loadModel(\u0026quot;data.ttl\u0026quot;) ; // Create a dataset and read into it from file // \u0026quot;data.trig\u0026quot; assumed to be TriG. Dataset dataset = RDFDataMgr.loadDataset(\u0026quot;data.trig\u0026quot;) ; // Read into an existing Model RDFDataMgr.read(model, \u0026quot;data2.ttl\u0026quot;) ; Example 2 : Common usage The original Jena Model API operation for read and write provide another way to the same machinery:\nModel model = ModelFactory.createDefaultModel() ; model.read(\u0026quot;data.ttl\u0026quot;) ; If the syntax is not as the file extension, a language can be declared:\nmodel.read(\u0026quot;data.foo\u0026quot;, \u0026quot;TURTLE\u0026quot;) ; Example 3 : Using RDFParser Detailed control over the setup of the parsing process is provided by RDFParser which provides a builder pattern. It has many options - see the javadoc for all details.\nFor example, to read Trig data, and set the error handler specially,\nDataset dataset; // The parsers will do the necessary character set conversion. try (InputStream in = new FileInputStream(\u0026quot;data.some.unusual.extension\u0026quot;)) { dataset = RDFParser.create() .source(in) .lang(RDFLanguages.TRIG) .errorHandler(ErrorHandlerFactory.errorHandlerStrict) .base(\u0026quot;http://example/base\u0026quot;) .toDataset(noWhere); } Logging The parsers log to a logger called org.apache.jena.riot. To avoid WARN messages, set this to ERROR in the logging system of the application.\nStreamManager and LocationMapper Operations to read RDF data can be redirected to local copies and to other URLs. This is useful to provide local copies of remote resources.\nBy default, the RDFDataMgr uses the global StreamManager to open typed InputStreams. The StreamManager can be set using the RDFParser builder:\n// Create a copy of the global default StreamManager. 
StreamManager sm = StreamManager.get().clone(); // Add directory \u0026quot;/tmp\u0026quot; as a place to look for files sm.addLocator(new LocatorFile(\u0026quot;/tmp\u0026quot;)); RDFParser.create() .streamManager(sm) .source(\u0026quot;data.ttl\u0026quot;) .parse(...); It can also be set in a Context object given the the RDFParser for the operation, but normally this defaults to the global Context available via Context.get(). The constant SysRIOT.sysStreamManager, which is http://jena.apache.org/riot/streamManager, is used.\nSpecialized StreamManagers can be configured with specific locators for data:\nFile locator (with own current directory) URL locator Class loader locator Zip file locator Configuring a StreamManager The StreamManager can be reconfigured with different places to look for files. The default configuration used for the global StreamManager is a file access class, where the current directory is that of the java process, a URL accessor for reading from the web, and a class loader-based accessor. Different setups can be built and used either as the global set up, or on a per request basis.\nThere is also a LocationMapper for rewriting file names and URLs before use to allow placing known names in different places (e.g. having local copies of import http resources).\nConfiguring a LocationMapper Location mapping files are RDF, usually written in Turtle although an RDF syntax can be used.\nPREFIX lm: \u0026lt;http://jena.hpl.hp.com/2004/08/location-mapping#\u0026gt; [] lm:mapping [ lm:name \u0026quot;file:foo.ttl\u0026quot; ; lm:altName \u0026quot;file:etc/foo.ttl\u0026quot; ] , [ lm:prefix \u0026quot;file:etc/\u0026quot; ; lm:altPrefix \u0026quot;file:ETC/\u0026quot; ] , [ lm:name \u0026quot;file:etc/foo.ttl\u0026quot; ; lm:altName \u0026quot;file:DIR/foo.ttl\u0026quot; ] . There are two types of location mapping: exact match renaming and prefix renaming. When trying to find an alternative location, a LocationMapper first tries for an exact match; if none is found, the LocationMapper will search for the longest matching prefix. If two are the same length, there is no guarantee on order tried; there is no implied order in a location mapper configuration file (it sets up two hash tables).\nIn the example above, file:etc/foo.ttl becomes file:DIR/foo.ttl because that is an exact match. The prefix match of file:/etc/ is ignored.\nAll string tests are done case sensitively because the primary use is for URLs.\nNotes:\nProperty values are not URIs, but strings. This is a system feature, not an RDF feature. Prefix mapping is name rewriting; alternate names are not treated as equivalent resources in the rest of Jena. While application writers are encouraged to use URIs to identify files, this is not always possible. There is no check to see if the alternative system resource is equivalent to the original. A LocationMapper finds its configuration file by looking for the following files, in order:\nfile:location-mapping.rdf file:location-mapping.ttl file:etc/location-mapping.rdf file:etc/location-mapping.ttl This is a specified as a path - note the path separator is always the character \u0026lsquo;;\u0026rsquo; regardless of operating system because URLs contain \u0026lsquo;:\u0026rsquo;.\nApplications can also set mappings programmatically. 
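For example, a minimal sketch of a programmatic setup, assuming the LocationMapper and StreamManager classes from the RIOT stream system (the URLs and file paths are illustrative):\nLocationMapper mapper = new LocationMapper() ; mapper.addAltEntry(\u0026quot;http://example/schema.ttl\u0026quot;, \u0026quot;file:etc/schema.ttl\u0026quot;) ; mapper.addAltPrefix(\u0026quot;http://example/data/\u0026quot;, \u0026quot;file:local-cache/\u0026quot;) ; StreamManager sm = StreamManager.get().clone() ; sm.setLocationMapper(mapper) ; StreamManager.setGlobal(sm) ; // Reads file:etc/schema.ttl via the exact-match mapping. Model model = RDFDataMgr.loadModel(\u0026quot;http://example/schema.ttl\u0026quot;) ;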
No configuration file is necessary.\nThe base URI for reading models will be the original URI, not the alternative location.\nAdvanced examples Example code may be found in jena-examples:arq/examples.\nIterating over parser output One of the capabilities of the RIOT API is the ability to treat parser output as an iterator, this is useful when you don\u0026rsquo;t want to go to the trouble of writing a full sink implementation and can easily express your logic in normal iterator style.\nTo do this you use AsyncParser.asyncParseTriples which parses the input on another thread:\nIteratorCloseable\u0026lt;Triple\u0026gt; iter = AsyncParser.asyncParseTriples(filename); iter.forEachRemaining(triple-\u0026gt;{ // Do something with triple }); Calling the iterator\u0026rsquo;s close method stops parsing and closes the involved resources. For N-Triples and N-Quads, you can use RiotParsers.createIteratorNTriples(input) which parses the input on the calling thread.\nRIOT example 9.\nAdditional control over parsing is provided by the AsyncParser.of(...) methods which return AsyncParserBuilder instances. The builder features a fluent API that allows for fine-tuning internal buffer sizes as well as eventually obtaining a standard Java Stream. Calling the stream\u0026rsquo;s close method stops parsing and closes the involved resources. Therefore, these streams are best used in conjunction with try-with-resources blocks:\ntry (Stream\u0026lt;Triple\u0026gt; stream = AsyncParser.of(filename) .setQueueSize(2).setChunkSize(100).streamTriples().limit(1000)) { // Do something with the stream } The AsyncParser also supports parsing RDF into a stream of EltStreamRDF elements. Each element can hold a triple, quad, prefix, base IRI or exception. For all Stream-based methods there also exist Iterator-based versions:\nIteratorCloseable\u0026lt;EltStreamRDF\u0026gt; it = AsyncParser.of(filename).asyncParseElements(); try { while (it.hasNext()) { EltStreamRDF elt = it.next(); if (elt.isTriple()) { // Do something with elt.getTriple(); } else if (elt.isPrefix()) { // Do something with elt.getPrefix() and elt.getIri(); } } } finally { Iter.close(it); } Filter the output of parsing When working with very large files, it can be useful to process the stream of triples or quads produced by the parser so as to work in a streaming fashion.\nSee RIOT example 4\nAdd a new language The set of languages is not fixed. A new language, together with a parser, can be added to RIOT as shown in RIOT example 6\n","permalink":"https://jena.apache.org/documentation/io/rdf-input.html","tags":null,"title":"Reading RDF in Apache Jena"},{"categories":null,"contents":"This section of the documentation describes the current support for inference available within Jena. It includes an outline of the general inference API, together with details of the specific rule engines and configurations for RDFS and OWL inference supplied with Jena.\nNot all of the fine details of the API are covered here: refer to the Jena Javadoc to get the full details of the capabilities of the API. Note that this is a preliminary version of this document, some errors or inconsistencies are possible, feedback to the mailing lists is welcomed. Overview of inference support The Jena inference subsystem is designed to allow a range of inference engines or reasoners to be plugged into Jena. 
Such engines are used to derive additional RDF assertions which are entailed from some base RDF together with any optional ontology information and the axioms and rules associated with the reasoner. The primary use of this mechanism is to support the use of languages such as RDFS and OWL which allow additional facts to be inferred from instance data and class descriptions. However, the machinery is designed to be quite general and, in particular, it includes a generic rule engine that can be used for many RDF processing or transformation tasks.\nWe will try to use the term inference to refer to the abstract process of deriving additional information and the term reasoner to refer to a specific code object that performs this task. Such usage is arbitrary and if we slip into using equivalent terms like reasoning and inference engine, please forgive us. The overall structure of the inference machinery is illustrated below. Applications normally access the inference machinery by using the ModelFactory to associate a data set with some reasoner to create a new Model. Queries to the created model will return not only those statements that were present in the original data but also additional statements than can be derived from the data using the rules or other inference mechanisms implemented by the reasoner.\nAs illustrated the inference machinery is actually implemented at the level of the Graph SPI, so that any of the different Model interfaces can be constructed around an inference Graph. In particular, the Ontology API provides convenient ways to link appropriate reasoners into the OntModels that it constructs. As part of the general RDF API we also provide an InfModel, this is an extension to the normal Model interface that provides additional control and access to an underlying inference graph. The reasoner API supports the notion of specializing a reasoner by binding it to a set of schema or ontology data using the bindSchema call. The specialized reasoner can then be attached to different sets of instance data using bind calls. In situations where the same schema information is to be used multiple times with different sets of instance data then this technique allows for some reuse of inferences across the different uses of the schema. In RDF there is no strong separation between schema (aka Ontology AKA tbox) data and instance (AKA abox) data and so any data, whether class or instance related, can be included in either the bind or bindSchema calls - the names are suggestive rather than restrictive.\nTo keep the design as open ended as possible Jena also includes a ReasonerRegistry. This is a static class though which the set of reasoners currently available can be examined. It is possible to register new reasoner types and to dynamically search for reasoners of a given type. The ReasonerRegistry also provides convenient access to prebuilt instances of the main supplied reasoners.\nAvailable reasoners Included in the Jena distribution are a number of predefined reasoners:\nTransitive reasoner: Provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf. RDFS rule reasoner: Implements a configurable subset of the RDFS entailments. OWL, OWL Mini, OWL Micro Reasoners: A set of useful but incomplete implementation of the OWL/Lite subset of the OWL/Full language. Generic rule reasoner: A rule based reasoner that supports user defined rules. 
Forward chaining, tabled backward chaining and hybrid execution strategies are supported. [Index]\nThe Inference API Generic reasoner API Small examples Operations on inference models\n- Validation\n- Extended list statements\n- Direct and indirect relations\n- Derivations\n- Accessing raw data and deductions\n- Processing control\n- Tracing Generic reasoner API Finding a reasoner For each type of reasoner there is a factory class (which conforms to the interface ReasonerFactory) an instance of which can be used to create instances of the associated Reasoner. The factory instances can be located by going directly to a known factory class and using the static theInstance() method or by retrieval from a global ReasonerRegistry which stores factory instances indexed by URI assigned to the reasoner. In addition, there are convenience methods on the ReasonerRegistry for locating a prebuilt instance of each of the main reasoners (getTransitiveReasoner, getRDFSReasoner, getRDFSSimpleReasoner, getOWLReasoner, getOWLMiniReasoner, getOWLMicroReasoner).\nNote that the factory objects for constructing reasoners are just there to simplify the design and extension of the registry service. Once you have a reasoner instance, the same instance can reused multiple times by binding it to different datasets, without risk of interference - there is no need to create a new reasoner instance each time.\nIf working with the Ontology API it is not always necessary to explicitly locate a reasoner. The prebuilt instances of OntModelSpec provide easy access to the appropriate reasoners to use for different Ontology configurations.\nSimilarly, if all you want is a plain RDF Model with RDFS inference included then the convenience methods ModelFactory.createRDFSModel can be used. Configuring a reasoner The behaviour of many of the reasoners can be configured. To allow arbitrary configuration information to be passed to reasoners we use RDF to encode the configuration details. The ReasonerFactory.create method can be passed a Jena Resource object, the properties of that object will be used to configure the created reasoner.\nTo simplify the code required for simple cases we also provide a direct Java method to set a single configuration parameter, Reasoner.setParameter. The parameter being set is identified by the corresponding configuration property.\nFor the built in reasoners the available configuration parameters are described below and are predefined in the ReasonerVocabulary class.\nThe parameter value can normally be a String or a structured value. For example, to set a boolean value one can use the strings \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot;, or in Java use a Boolean object or in RDF use an instance of xsd:Boolean\nApplying a reasoner to data Once you have an instance of a reasoner it can then be attached to a set of RDF data to create an inference model. This can either be done by putting all the RDF data into one Model or by separating into two components - schema and instance data. For some external reasoners a hard separation may be required. For all of the built-in reasoners the separation is arbitrary. 
The prime value of this separation is to allow some deductions from one set of data (typically some schema definitions) to be efficiently applied to several subsidiary sets of data (typically sets of instance data).\nIf you want to specialize the reasoner this way, by partially-applying it to a set schema data, use the Reasoner.bindSchema method which returns a new, specialized, reasoner.\nTo bind the reasoner to the final data set to create an inference model see the ModelFactory methods, particularly ModelFactory.createInfModel. Accessing inferences Finally, having created an inference model, any API operations which access RDF statements will be able to access additional statements which are entailed from the bound data by means of the reasoner. Depending on the reasoner these additional virtual statements may all be precomputed the first time the model is touched, may be dynamically recomputed each time or may be computed on-demand but cached.\nReasoner description The reasoners can be described using RDF metadata which can be searched to locate reasoners with appropriate properties. The calls Reasoner.getCapabilities and Reasoner.supportsProperty are used to access this descriptive metadata.\n[API Index] [Main Index]\nSome small examples These initial examples are not designed to illustrate the power of the reasoners but to illustrate the code required to set one up.\nLet us first create a Jena model containing the statements that some property \u0026quot;p\u0026quot; is a subproperty of another property \u0026quot;q\u0026quot; and that we have a resource \u0026quot;a\u0026quot; with value \u0026quot;foo\u0026quot; for \u0026quot;p\u0026quot;. This could be done by writing an RDF/XML or N3 file and reading that in but we have chosen to use the RDF API:\nString NS = \u0026#34;urn:x-hp-jena:eg/\u0026#34;; // Build a trivial example data set Model rdfsExample = ModelFactory.createDefaultModel(); Property p = rdfsExample.createProperty(NS, \u0026#34;p\u0026#34;); Property q = rdfsExample.createProperty(NS, \u0026#34;q\u0026#34;); rdfsExample.add(p, RDFS.subPropertyOf, q); rdfsExample.createResource(NS+\u0026#34;a\u0026#34;).addProperty(p, \u0026#34;foo\u0026#34;); Now we can create an inference model which performs RDFS inference over this data by using:\nInfModel inf = ModelFactory.createRDFSModel(rdfsExample); // [1] We can then check that resulting model shows that \u0026quot;a\u0026quot; also has property \u0026quot;q\u0026quot; of value \u0026quot;foo\u0026quot; by virtue of the subPropertyOf entailment:\nResource a = inf.getResource(NS+\u0026#34;a\u0026#34;); System.out.println(\u0026#34;Statement: \u0026#34; + a.getProperty(q)); Which prints the output:\nStatement: [urn:x-hp-jena:eg/a, urn:x-hp-jena:eg/q, Literal\u0026lt;foo\u0026gt;] Alternatively we could have created an empty inference model and then added in the statements directly to that model.\nIf we wanted to use a different reasoner which is not available as a convenience method or wanted to configure one we would change line [1]. For example, to create the same setup manually we could replace \\[1\\] by:\nReasoner reasoner = ReasonerRegistry.getRDFSReasoner(); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); or even more manually by\nReasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); The purpose of creating a new reasoner instance like this variant would be to enable configuration parameters to be set. 
For example, if we were to call listStatements on the inf model we would see that it also \u0026quot;includes\u0026quot; all the RDFS axioms, of which there are quite a lot. It is sometimes useful to suppress these and only see the \u0026quot;interesting\u0026quot; entailments. This can be done by setting the processing level parameter, by creating a description of a new reasoner configuration and passing that to the factory method:\nResource config = ModelFactory.createDefaultModel() .createResource() .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, \u0026#34;simple\u0026#34;); Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); This is a rather long-winded way of setting a single parameter, though it can be useful in the cases where you want to store this sort of configuration information in a separate (RDF) configuration file. For hardwired cases the following alternative is often simpler:\nReasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null); reasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel, ReasonerVocabulary.RDFS_SIMPLE); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); Finally, suppose you have a more complex set of schema information, defined in a Model called schema, and you want to apply this schema to several sets of instance data without redoing too many of the same intermediate deductions. This can be done by using the SPI level methods: Reasoner boundReasoner = reasoner.bindSchema(schema); InfModel inf = ModelFactory.createInfModel(boundReasoner, data); This creates a new reasoner, independent from the original, which contains the schema data. Any queries to an InfModel created using the boundReasoner will see the schema statements, the data statements and any statements entailed from the combination of the two. Any updates to the InfModel will be reflected in updates to the underlying data model - the schema model will not be affected.\n[API Index] [Main Index]\nOperations on inference models For many applications one simply creates a model incorporating some inference step, using the ModelFactory methods, and then just works within the standard Jena Model API to access the entailed statements. However, sometimes it is necessary to gain more control over the processing or to access additional reasoner features not available as virtual triples.
At a minimum the individual reports should be printable descriptions of the problem but they can also contain an arbitrary reasoner-specific object which can be used to pass additional information which can be used for programmatic handling of the violations.\nFor example, to check a data set and list any problems one could do something like:\nModel data = RDFDataMgr.loadModel(fname); InfModel infmodel = ModelFactory.createRDFSModel(data); ValidityReport validity = infmodel.validate(); if (validity.isValid()) { System.out.println(\u0026#34;OK\u0026#34;); } else { System.out.println(\u0026#34;Conflicts\u0026#34;); for (Iterator i = validity.getReports(); i.hasNext(); ) { System.out.println(\u0026#34; - \u0026#34; + i.next()); } } The file testing/reasoners/rdfs/dttest2.nt declares a property bar with range xsd:integer and attaches a bar value to some resource with the value \u0026quot;25.5\u0026quot;^^xsd:decimal. If we run the above sample code on this file we see:\nConflicts - Error (dtRange): Property http://www.hpl.hp.com/semweb/2003/eg#bar has a typed range Datatype[http://www.w3.org/2001/XMLSchema#integer -\u0026gt; class java.math.BigInteger]that is not compatible with 25.5:http://www.w3.org/2001/XMLSchema#decimal Whereas the file testing/reasoners/rdfs/dttest3.nt uses the value \u0026quot;25\u0026quot;^^xsd:decimal instead, which is a valid integer and so passes. Note that the individual validation records can include warnings as well as errors. A warning does not affect the overall isValid() status but may indicate some issue the application may wish to be aware of. For example, it would be possible to develop a modification to the RDFS reasoner which warned about use of a property on a resource that is not explicitly declared to have the type of the domain of the property. A particular case of this arises in the case of OWL. In the Description Logic community a class which cannot have an instance is regarded as \u0026quot;inconsistent\u0026quot;. That term is used because it generally arises from an error in the ontology. However, it is not a logical inconsistency - i.e. something giving rise to a contradiction. Having an instance of such a class is, clearly a logical error. In the Jena 2.2 release we clarified the semantics of isValid(). An ontology which is logically consistent but contains empty classes is regarded as valid (that is isValid() is false only if there is a logical inconsistency). Class expressions which cannot be instantiated are treated as warnings rather than errors. To make it easier to test for this case there is an additional method Report.isClean() which returns true if the ontology is both valid (logically consistent) and generated no warnings (such as inconsistent classes).\nExtended list statements The default API supports accessing all entailed information at the level of individual triples. This is surprisingly flexible but there are queries which cannot be easily supported this way. The first such is when the query needs to make reference to an expression which is not already present in the data. For example, in description logic systems it is often possible to ask if there are any instances of some class expression. 
Whereas using the triple-based approach we can only ask if there are any instances of some class already defined (though it could be defined by a bNode rather than be explicitly named).\nTo overcome this limitation the InfModel API supports a notion of \u0026quot;posit\u0026quot;, that is, a set of assertions which can be used to temporarily declare new information such as the definition of some class expression. These temporary assertions can then be referenced by the other arguments to the listStatements command. With the current reasoners this is an expensive operation, involving the temporary creation of an entire new model with the additional posits added, and all inference has to start again from scratch. Thus it is worth considering preloading your data with expressions you might need to query over. However, for some external reasoners, especially description logic reasoners, we anticipate that restricted uses of this form of listStatements will be important.\nDirect and indirect relationships The second type of operation that is not obviously convenient at the triple level involves distinguishing between direct and indirect relationships. If a relation is transitive, for example rdfs:subClassOf, then we can define the notion of the minimal or direct form of the relationship from which all other values of the relation can be derived by transitive closure. Normally, when an InfGraph is queried for a transitive relation the results returned show the inferred relations, i.e. the full transitive closure (all the links (ii) in the illustration). However, in some cases, such as when building a hierarchical UI widget to represent the graph, it is more convenient to only see the direct relations (iii). This is achieved by defining special direct aliases for those relations which can be queried this way. For the built-in reasoners this functionality is available for rdfs:subClassOf and rdfs:subPropertyOf and the direct aliases for these are defined in ReasonerVocabulary.\nTypically, the easiest way to work with such indirect and direct relations is to use the Ontology API which hides the grubby details of these property aliases.\nDerivations It is sometimes useful to be able to trace where an inferred statement was generated from. This is achieved using the InfModel.getDerivation(Statement) method. This returns an iterator over a set of Derivation objects through which a brief description of the source of the derivation can be obtained. Typically understanding this involves tracing the sources for other statements which were used in this derivation, and the Derivation.printTrace method is used to do this recursively.\nThe general form of the Derivation objects is quite abstract but in the case of the rule-based reasoners they have a more detailed internal structure that can be accessed - see RuleDerivation.\nDerivation information is rather expensive to compute and store. For this reason, it is not recorded by default and InfModel.setDerivationLogging(true) must be used to enable derivations to be recorded. This should be called before any queries are made to the inference model.\nAs an illustration, suppose that we have a raw data model which asserts three triples:\neg:A eg:p eg:B . eg:B eg:p eg:C . eg:C eg:p eg:D .
and suppose that we have a trivial rule set which computes the transitive closure over the relation eg:p:\nString rules = \u0026#34;[rule1: (?a eg:p ?b) (?b eg:p ?c) -\u0026gt; (?a eg:p ?c)]\u0026#34;; Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules)); reasoner.setDerivationLogging(true); InfModel inf = ModelFactory.createInfModel(reasoner, rawData); Then we can query whether eg:A is related through eg:p to eg:D and list the derivation route using the following code fragment: PrintWriter out = new PrintWriter(System.out); for (StmtIterator i = inf.listStatements(A, p, D); i.hasNext(); ) { Statement s = i.nextStatement(); System.out.println(\u0026#34;Statement is \u0026#34; + s); for (Iterator id = inf.getDerivation(s); id.hasNext(); ) { Derivation deriv = (Derivation) id.next(); deriv.printTrace(out, true); } } out.flush(); This generates the output:\nStatement is [urn:x-hp:eg/A, urn:x-hp:eg/p, urn:x-hp:eg/D] Rule rule1 concluded (eg:A eg:p eg:D) \u0026lt;- Fact (eg:A eg:p eg:B) Rule rule1 concluded (eg:B eg:p eg:D) \u0026lt;- Fact (eg:B eg:p eg:C) Fact (eg:C eg:p eg:D) Accessing raw data and deductions From an InfModel it is easy to retrieve the original, unchanged, data over which the model has been computed using the getRawModel() call. This returns a model equivalent to the one used in the initial bind call. It might not be the same Java object but it uses the same Java object to hold the underlying data graph. Some reasoners, notably the forward chaining rule engine, store the deduced statements in a concrete form and this set of deductions can be obtained separately by using the getDeductionsModel() call. Processing control Having bound a Model into an InfModel by using a Reasoner, its content can still be changed by the normal add and remove calls to the InfModel. Any such change to the model will usually cause all current deductions and temporary rules to be discarded, and inference will start again from scratch at the next query. Some reasoners, such as the RETE-based forward rule engine, can work incrementally. In the non-incremental case the processing will not be started until a query is made. In that way a sequence of adds and removes can be undertaken without redundant work being performed at each change. In some applications it can be convenient to trigger the initial processing ahead of time to reduce the latency of the first query. This can be achieved using the InfModel.prepare() call. This call is not necessary in other cases; any query will automatically trigger an internal prepare phase if one is required.\nThere are times when the data in a model bound into an InfModel is changed \u0026quot;behind the scenes\u0026quot; instead of through calls to the InfModel. If this occurs the results of future queries to the InfModel are unpredictable. To overcome this and force the InfModel to reconsult the raw data, use the InfModel.rebind() call.\nFinally, some reasoners can store both intermediate and final query results between calls. This can substantially reduce the cost of working with the inference services but at the expense of memory usage. It is possible to force an InfModel to discard all such cached state by using the InfModel.reset() call. If there are any outstanding queries (i.e.
StmtIterators which have not been read to the end yet) then those will be aborted (the next hasNext() call will return false).\nTracing When developing new reasoner configurations, especially new rule sets for the rule engines, it is sometimes useful to be able to trace the operations of the associated inference engine. Though, often this generates too much information to be of use and selective use of the print builtin can be more effective. Tracing is not supported by a convenience API call but, for those reasoners that support it, it can be enabled using:\nreasoner.setParameter(ReasonerVocabulary.PROPtraceOn, Boolean.TRUE);\nDynamic tracing control is sometimes possible on the InfModel itself by retrieving its underlying InfGraph and calling setTraceOn() call. If you need to make use of this see the full javadoc for the relevant InfGraph implementation.\n[API Index] [Main Index]\nThe RDFS reasoner RDFS reasoner - introduction and coverage RDFS Configuration RDFS Example RDFS implementation and performance notes RDFS reasoner - intro and coverage Jena includes an RDFS reasoner (RDFSRuleReasoner) which supports almost all of the RDFS entailments described by the RDF Core working group [RDF Semantics]. The only omissions are deliberate and are described below.\nThis reasoner is accessed using ModelFactory.createRDFSModel or manually via ReasonerRegistry.getRDFSReasoner().\nDuring the preview phases of Jena experimental RDFS reasoners were released, some of which are still included in the code base for now but applications should not rely on their stability or continued existence.\nWhen configured in full mode (see below for configuration information) then the RDFS reasoner implements all RDFS entailments except for the bNode closure rules. These closure rules imply, for example, that for all triples of the form:\neg:a eg:p nnn^^datatype . we should introduce the corresponding blank nodes:\neg:a eg:p _:anon1 . _:anon1 rdf:type datatype . Whilst such rules are both correct and necessary to reduce RDF datatype entailment down to simple entailment they are not useful in implementation terms. In Jena simple entailment can be implemented by translating a graph containing bNodes to an equivalent query containing variables in place of the bNodes. Such a query is can directly match the literal node and the RDF API can be used to extract the datatype of the literal. The value to applications of directly seeing the additional bNode triples, even in virtual triple form, is negligible and so this has been deliberately omitted from the reasoner. [RDFS Index] [Main Index]\nRDFS configuration The RDFSRuleReasoner can be configured to work at three different compliance levels: Full This implements all of the RDFS axioms and closure rules with the exception of bNode entailments and datatypes (rdfD 1). See above for comments on these. This is an expensive mode because all statements in the data graph need to be checked for possible use of container membership properties. It also generates type assertions for all resources and properties mentioned in the data (rdf1, rdfs4a, rdfs4b). Default This omits the expensive checks for container membership properties and the \u0026quot;everything is a resource\u0026quot; and \u0026quot;everything used as a property is one\u0026quot; rules (rdf1, rdfs4a, rdfs4b). The latter information is available through the Jena API and creating virtual triples to this effect has little practical value.\nThis mode does include all the axiomatic rules. 
Thus, for example, even querying an \u0026quot;empty\u0026quot; RDFS InfModel will return triples such as [rdf:type rdfs:range rdfs:Class]. Simple This implements just the transitive closure of subPropertyOf and subClassOf relations, the domain and range entailments and the implications of subPropertyOf and subClassOf. It omits all of the axioms. This is probably the most useful mode but is not the default because it is a less complete implementation of the standard. The level can be set using the setParameter call, e.g.\nreasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel, ReasonerVocabulary.RDFS_SIMPLE); or by constructing an RDF configuration description and passing that to the RDFSRuleReasonerFactory, e.g.\nResource config = ModelFactory.createDefaultModel() .createResource() .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, \u0026#34;simple\u0026#34;); Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config); Summary of parameters Parameter Values Description PROPsetRDFSLevel \u0026quot;full\u0026quot;, \u0026quot;default\u0026quot;, \u0026quot;simple\u0026quot; Sets the RDFS processing level as described above. PROPenableCMPScan Boolean If true, forces a preprocessing pass which finds all usages of rdf:_n properties and declares them as ContainerMembershipProperties. This is implied by setting the level parameter to \u0026quot;full\u0026quot; and is not normally used directly. PROPtraceOn Boolean If true, switches on exhaustive tracing of rule executions at the INFO level. PROPderivationLogging Boolean If true, causes derivation routes to be recorded internally so that future getDerivation calls can return useful information. [RDFS Index] [Main Index]\nRDFS Example As a complete worked example let us create a simple RDFS schema, some instance data and use an instance of the RDFS reasoner to query the two.\nWe shall use a trivial schema:\n\u0026lt;rdf:Description rdf:about=\u0026#34;eg:mum\u0026#34;\u0026gt; \u0026lt;rdfs:subPropertyOf rdf:resource=\u0026#34;eg:parent\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;eg:parent\u0026#34;\u0026gt; \u0026lt;rdfs:range rdf:resource=\u0026#34;eg:Person\u0026#34;/\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026#34;eg:Person\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;eg:age\u0026#34;\u0026gt; \u0026lt;rdfs:range rdf:resource=\u0026#34;xsd:integer\u0026#34; /\u0026gt; \u0026lt;/rdf:Description\u0026gt; This defines a property parent from Person to Person, a sub-property mum of parent and an integer-valued property age.\nWe shall also use the even simpler instance file:\n\u0026lt;Teenager rdf:about=\u0026#34;eg:colin\u0026#34;\u0026gt; \u0026lt;mum rdf:resource=\u0026#34;eg:rosy\u0026#34; /\u0026gt; \u0026lt;age\u0026gt;13\u0026lt;/age\u0026gt; \u0026lt;/Teenager\u0026gt; This defines a Teenager called colin who has a mum rosy and an age of 13.\nThen the following code fragment can be used to read files containing these definitions, create an inference model and query it for information on the rdf:type of colin and the rdf:type of Person:\nModel schema = RDFDataMgr.loadModel(\u0026#34;file:data/rdfsDemoSchema.rdf\u0026#34;); Model data = RDFDataMgr.loadModel(\u0026#34;file:data/rdfsDemoData.rdf\u0026#34;); InfModel infmodel = ModelFactory.createRDFSModel(schema, data); Resource colin = infmodel.getResource(\u0026#34;urn:x-hp:eg/colin\u0026#34;); System.out.println(\u0026#34;colin has types:\u0026#34;);
printStatements(infmodel, colin, RDF.type, null); Resource Person = infmodel.getResource(\u0026#34;urn:x-hp:eg/Person\u0026#34;); System.out.println(\u0026#34;\\nPerson has types:\u0026#34;); printStatements(infmodel, Person, RDF.type, null); This produces the output:\ncolin has types: - (eg:colin rdf:type eg:Teenager) - (eg:colin rdf:type rdfs:Resource) - (eg:colin rdf:type eg:Person) Person has types: - (eg:Person rdf:type rdfs:Class) - (eg:Person rdf:type rdfs:Resource) This says that colin is both a Teenager (by direct definition), a Person (because he has a mum which means he has a parent and the domain of parent is Person) and an rdfs:Resource. It also says that Person is an rdfs:Class, even though that wasn't explicitly in the schema, because it is used as object of range and domain statements.\nIf we add the additional code:\nValidityReport validity = infmodel.validate(); if (validity.isValid()) { System.out.println(\u0026#34;\\nOK\u0026#34;); } else { System.out.println(\u0026#34;\\nConflicts\u0026#34;); for (Iterator i = validity.getReports(); i.hasNext(); ) { ValidityReport.Report report = (ValidityReport.Report)i.next(); System.out.println(\u0026#34; - \u0026#34; + report); } } Then we get the additional output:\nConflicts - Error (dtRange): Property urn:x-hp:eg/age has a typed range Datatype[http://www.w3.org/2001/XMLSchema#integer -\u0026gt; class java.math.BigInteger] that is not compatible with 13 because the age was given using an RDF plain literal where as the schema requires it to be a datatyped literal which is compatible with xsd:integer.\n[RDFS Index] [Main Index]\nRDFS implementation and performance notes The RDFSRuleReasoner is a hybrid implementation. The subproperty and subclass lattices are eagerly computed and stored in a compact in-memory form using the TransitiveReasoner (see below). The identification of which container membership properties (properties like rdf:_1) are present is implemented using a preprocessing hook. The rest of the RDFS operations are implemented by explicit rule sets executed by the general hybrid rule reasoner. The three different processing levels correspond to different rule sets. These rule sets are located by looking for files \u0026quot;`etc/*.rules`\u0026quot; on the classpath and so could, in principle, be overridden by applications wishing to modify the rules. Performance for in-memory queries appears to be good. Using a synthetic dataset we obtain the following times to determine the extension of a class from a class hierarchy:\nSet #concepts total instances #instances of concept JenaRDFS XSB* 1 155 1550 310 0.07 0.16 2 780 7800 1560 0.25 0.47 3 3905 39050 7810 1.16 2.11 The times are in seconds, normalized to a 1.1GHz Pentium processor. The XSB* figures are taken from a pre-published paper and may not be directly comparable (for example they do not include any rule compilation time) - they are just offered to illustrate that the RDFSRuleReasoner has broadly similar scaling and performance to other rule-based implementations.\nThe Jena RDFS implementation has not been tested and evaluated over database models. The Jena architecture makes it easy to construct such models but in the absence of caching we would expect the performance to be poor. 
Future work on adapting the rule engines to exploit the capabilities of the more sophisticated database backends will be considered.\n[RDFS Index] [Main Index]\nThe OWL reasoner OWL reasoner introduction OWL coverage OWL configuration OWL example OWL notes and limitations The second major set of reasoners supplied with Jena is a rule-based implementation of the OWL/lite subset of OWL/full.\nThe current release includes a default OWL reasoner and two small/faster configurations. Each of the configurations is intended to be a sound implementation of a subset of OWL/full semantics but none of them is complete (in the technical sense). For complete OWL DL reasoning use an external DL reasoner such as Pellet, Racer or FaCT. Performance (especially memory use) of the fuller reasoner configuration still leaves something to be desired and will the subject of future work - time permitting.\nSee also subsection 5 for notes on more specific limitations of the current implementation. OWL coverage The Jena OWL reasoners could be described as instance-based reasoners. That is, they work by using rules to propagate the if- and only-if- implications of the OWL constructs on instance data. Reasoning about classes is done indirectly - for each declared class a prototypical instance is created and elaborated. If the prototype for a class A can be deduced as being a member of class B then we conclude that A is a subClassOf B. This approach is in contrast to more sophisticated Description Logic reasoners which work with class expressions and can be less efficient when handling instance data but more efficient with complex class expressions and able to provide complete reasoning. We thus anticipate that the OWL rule reasoner will be most suited to applications involving primarily instance reasoning with relatively simple, regular ontologies and least suited to applications involving large rich ontologies. A better characterisation of the tradeoffs involved would be useful and will be sought.\nWe intend that the OWL reasoners should be smooth extensions of the RDFS reasoner described above. That is all RDFS entailments found by the RDFS reasoner will also be found by the OWL reasoners and scaling on RDFS schemas should be similar (though there are some costs, see later). The instance-based implementation technique is in keeping with this \u0026quot;RDFS plus a bit\u0026quot; approach.\nAnother reason for choosing this inference approach is that it makes it possible to experiment with support for different constructs, including constructs that go beyond OWL, by modification of the rule set. In particular, some applications of interest to ourselves involve ontology transformation which very often implies the need to support property composition. This is something straightforward to express in rule-based form and harder to express in standard Description Logics.\nSince RDFS is not a subset of the OWL/Lite or OWL/DL languages the Jena implementation is an incomplete implementation of OWL/full. We provide three implementations a default (\u0026quot;full\u0026quot; one), a slightly cut down \u0026quot;mini\u0026quot; and a rather smaller/faster \u0026quot;micro\u0026quot;. The default OWL rule reasoner (ReasonerRegistry.getOWLReasoner()) supports the constructs as listed below. 
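All three configurations are available directly from the ReasonerRegistry and are bound to data in the usual way. The following minimal sketch simply selects one of them (the data model is assumed to exist already):\nReasoner owlFull = ReasonerRegistry.getOWLReasoner(); // the default \u0026quot;full\u0026quot; configuration\nReasoner owlMini = ReasonerRegistry.getOWLMiniReasoner();\nReasoner owlMicro = ReasonerRegistry.getOWLMicroReasoner();\nInfModel inf = ModelFactory.createInfModel(owlMicro, data);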
The OWLMini reasoner is nearly the same but omits the forward entailments from minCardinality/someValuesFrom restrictions - that is it avoids introducing bNodes which avoids some infinite expansions and enables it to meet the Jena API contract more precisely. The OWLMicro reasoner just supports RDFS plus the various property axioms, intersectionOf, unionOf (partial) and hasValue. It omits the cardinality restrictions and equality axioms, which enables it to achieve much higher performance. Constructs Supported by Notes rdfs:subClassOf, rdfs:subPropertyOf, rdf:type all Normal RDFS semantics supported including meta use (e.g. taking the subPropertyOf subClassOf). rdfs:domain, rdfs:range all Stronger if-and-only-if semantics supported owl:intersectionOf all \u0026nbsp; owl:unionOf all Partial support. If C=unionOf(A,B) then will infer that A,B are subclasses of C, and thus that instances of A or B are instances of C. Does not handle the reverse (that an instance of C must be either an instance of A or an instance of B). owl:equivalentClass all owl:disjointWith full, mini owl:sameAs, owl:differentFrom, owl:distinctMembers full, mini owl:distinctMembers is currently translated into a quadratic set of owl:differentFrom assertions. Owl:Thing all \u0026nbsp; owl:equivalentProperty, owl:inverseOf all owl:FunctionalProperty, owl:InverseFunctionalProperty all owl:SymmetricProperty, owl:TransitiveProperty all owl:someValuesFrom full, (mini) Full supports both directions (existence of a value implies membership of someValuesFrom restriction, membership of someValuesFrom implies the existence of a bNode representing the value).\nMini omits the latter \u0026quot;bNode introduction\u0026quot; which avoids some infinite closures.\nowl:allValuesFrom full, mini Partial support, forward direction only (member of a allValuesFrom(p, C) implies that all p values are of type C). Does handle cases where the reverse direction is trivially true (e.g. by virtue of a global rdfs:range axiom). owl:minCardinality, owl:maxCardinality, owl:cardinality full, (mini) Restricted to cardinalities of 0 or 1, though higher cardinalities are partially supported in validation for the case of literal-valued properties.\nMini omits the bNodes introduction in the minCardinality(1) case, see someValuesFrom above.\nowl:hasValue all The critical constructs which go beyond OWL/lite and are not supported in the Jena OWL reasoner are complementOf and oneOf. As noted above the support for unionOf is partial (due to limitations of the rule based approach) but is useful for traversing class hierarchies.\nEven within these constructs rule based implementations are limited in the extent to which they can handle equality reasoning - propositions provable by reasoning over concrete and introduced instances are covered but reasoning by cases is not supported.\nNevertheless, the full reasoner passes the normative OWL working group positive and negative entailment tests for the supported constructs, though some tests need modification for the comprehension axioms (see below).\nThe OWL rule set does include incomplete support for validation of datasets using the above constructs. Specifically, it tests for:\nIllegal existence of a property restricted by a maxCardinality(0) restriction. Two individuals both sameAs and differentFrom each other. Two classes declared as disjoint but where one subsumes the other (currently reported as a violation concerning the class prototypes, error message to be improved). 
Range or a allValuesFrom violations for DatatypeProperties. Too many literal-values for a DatatypeProperty restricted by a maxCardinality(N) restriction. [OWL Index] [Main Index]\nOWL Configuration This reasoner is accessed using ModelFactory.createOntologyModel with the prebuilt OntModelSpec OWL_MEM_RULE_INF or manually via ReasonerRegistry.getOWLReasoner().\nThere are no OWL-specific configuration parameters though the reasoner supports the standard control parameters:\nParameter Values Description PROPtraceOn boolean If true switches on exhaustive tracing of rule executions at the INFO level. PROPderivationLogging Boolean If true causes derivation routes to be recorded internally so that future getDerivation calls can return useful information. As we gain experience with the ways in which OWL is used and the capabilities of the rule-based approach we imagine useful subsets of functionality emerging - like that supported by the RDFS reasoner in the form of the level settings.\n[OWL Index] [Main Index]\nOWL Example As an example of using the OWL inference support, consider the sample schema and data file in the data directory - owlDemoSchema.rdf and owlDemoData.rdf. The schema file shows a simple, artificial ontology concerning computers which defines a GamingComputer as a Computer which includes at least one bundle of type GameBundle and a component with the value gamingGraphics. The data file shows information on several hypothetical computer configurations including two different descriptions of the configurations \u0026quot;whiteBoxZX\u0026quot; and \u0026quot;bigName42\u0026quot;.\nWe can create an instance of the OWL reasoner, specialized to the demo schema and then apply that to the demo data to obtain an inference model, as follows:\nModel schema = RDFDataMgr.loadModel(\u0026#34;file:data/owlDemoSchema.rdf\u0026#34;); Model data = RDFDataMgr.loadModel(\u0026#34;file:data/owlDemoData.rdf\u0026#34;); Reasoner reasoner = ReasonerRegistry.getOWLReasoner(); reasoner = reasoner.bindSchema(schema); InfModel infmodel = ModelFactory.createInfModel(reasoner, data); A typical example operation on such a model would be to find out all we know about a specific instance, for example the nForce mother board. 
This can be done using:\nResource nForce = infmodel.getResource(\u0026#34;urn:x-hp:eg/nForce\u0026#34;); System.out.println(\u0026#34;nForce *:\u0026#34;); printStatements(infmodel, nForce, null, null); where printStatements is defined by: public void printStatements(Model m, Resource s, Property p, Resource o) { for (StmtIterator i = m.listStatements(s,p,o); i.hasNext(); ) { Statement stmt = i.nextStatement(); System.out.println(\u0026#34; - \u0026#34; + PrintUtil.print(stmt)); } } This produces the output:\nnForce *: - (eg:nForce rdf:type owl:Thing) - (eg:nForce owl:sameAs eg:unknownMB) - (eg:nForce owl:sameAs eg:nForce) - (eg:nForce rdf:type eg:MotherBoard) - (eg:nForce rdf:type rdfs:Resource) - (eg:nForce rdf:type a3b24:f7822755ad:-7ffd) - (eg:nForce eg:hasGraphics eg:gamingGraphics) - (eg:nForce eg:hasComponent eg:gamingGraphics) Note that this includes inferences based on subClass inheritance (being an eg:MotherBoard implies it is an owl:Thing and an rdfs:Resource), property inheritance (eg:hasComponent eg:gamingGraphics derives from hasGraphics being a subProperty of hasComponent) and cardinality reasoning (it is owl:sameAs eg:unknownMB because computers are defined to have only one motherboard and the two different descriptions of whiteBoxZX use these two different terms for the mother board). The anonymous rdf:type statement references the \u0026quot;hasValue(eg:hasComponent, eg:gamingGraphics)\u0026quot; restriction mentioned in the definition of GamingComputer.\nA second typical operation is instance recognition: testing if an individual is an instance of a class expression. In this case the whiteBoxZX is identifiable as a GamingComputer because it is a Computer, is explicitly declared as having an appropriate bundle and can be inferred to have a gamingGraphics component from the combination of the nForce inferences we've already seen and the transitivity of hasComponent. 
We can test this using:\nResource gamingComputer = infmodel.getResource(\u0026#34;urn:x-hp:eg/GamingComputer\u0026#34;); Resource whiteBox = infmodel.getResource(\u0026#34;urn:x-hp:eg/whiteBoxZX\u0026#34;); if (infmodel.contains(whiteBox, RDF.type, gamingComputer)) { System.out.println(\u0026#34;White box recognized as gaming computer\u0026#34;); } else { System.out.println(\u0026#34;Failed to recognize white box correctly\u0026#34;); } This generates the output:\nWhite box recognized as gaming computer Finally, we can check for inconsistencies within the data by using the validation interface:\nValidityReport validity = infmodel.validate(); if (validity.isValid()) { System.out.println(\u0026#34;OK\u0026#34;); } else { System.out.println(\u0026#34;Conflicts\u0026#34;); for (Iterator i = validity.getReports(); i.hasNext(); ) { ValidityReport.Report report = (ValidityReport.Report)i.next(); System.out.println(\u0026#34; - \u0026#34; + report); } } This generates the output:\nConflicts - Error (conflict): Two individuals both same and different, may be due to disjoint classes or functional properties Culprit = eg:nForce2 Implicated node: eg:bigNameSpecialMB \u0026hellip; + 3 other similar reports This is due to the two records for the bigName42 configuration referencing two motherboards which are explicitly defined to be different resources and thus violate the FunctionalProperty nature of hasMotherBoard.\n[OWL Index] [Main Index]\nOWL notes and limitations Comprehension axioms A critical implication of our variant of the instance-based approach is that the reasoner does not directly answer queries relating to dynamically introduced class expressions.\nFor example, given a model containing the RDF assertions corresponding to the two OWL axioms:\nclass A = intersectionOf (minCardinality(P, 1), maxCardinality(P,1)) class B = cardinality(P,1) then the reasoner can demonstrate that classes A and B are equivalent, in particular that any instance of A is an instance of B and vice versa. However, given a model just containing the first set of assertions you cannot directly query the inference model for the individual triples that make up cardinality(P,1). If the relevant class expressions are not already present in your model then you need to use the list-with-posits mechanism described above, though be warned that such posits start inference afresh each time and can be expensive. Actually, it would be possible to introduce comprehension axioms for simple cases like this example. We have, so far, chosen not to do so. First, since the OWL/full closure is generally infinite, some limitation on comprehension inferences seems to be useful. Secondly, the typical queries that Jena applications expect to be able to issue would suddenly jump in size and cost - causing a support nightmare. For example, queries such as (a, rdf:type, *) would become near-unusable.\nApproximately 10 of the OWL working group tests for the supported OWL subset currently rely on such comprehension inferences. The shipping version of the Jena rule reasoner passes these tests only after they have been rewritten to avoid the comprehension requirements.\nPrototypes As noted above the current OWL rule set introduces prototypical instances for each defined class. These prototypical instances used to be visible to queries. From release 2.1 they are used internally but should no longer be visible. 
Direct/indirect We noted above that the Jena reasoners support a separation of direct and indirect relations for transitive properties such as subClassOf. The current implementation of the full and mini OWL reasoner fails to do this and the direct forms of the queries will fail. The OWL Micro reasoner, which is but a small extension of RDFS, does support the direct queries.\nThis does not affect querying through the Ontology API, which works around this limitation. It only affects direct RDF accesses to the inference model.\nPerformance The OWL reasoners use the rule engines for all inference. The full and mini configurations omit some of the performance tricks employed by the RDFS reasoner (notably the use of the custom transitive reasoner) making those OWL reasoner configurations slower than the RDFS reasoner on pure RDFS data (typically around a 3-4x slowdown). The OWL Micro reasoner is intended to be as close to RDFS performance as possible while also supporting the core OWL constructs as described earlier.\nOnce the OWL constructs are used, substantial reasoning can be required. The most expensive aspect of the supported constructs is the equality reasoning implied by use of cardinality restrictions and FunctionalProperties. The current rule set implements equality reasoning by identifying all sameAs deductions during the initial forward \u0026quot;prepare\u0026quot; phase. This may require the entire instance dataset to be touched several times searching for occurrences of FunctionalProperties.\nBeyond this the rules implementing the OWL constructs can interact in complex ways leading to serious performance overheads for complex ontologies. Characterising the sorts of ontologies and inference problems that are well tackled by this sort of implementation and those best handled by plugging a Description Logic engine, or a saturation theorem prover, into Jena is a topic for future work.\nOne random hint: explicitly importing the owl.owl definitions causes much duplication of rule use and a substantial slow down - the OWL axioms that the reasoner can handle are already built in and don't need to be redeclared.\nIncompleteness The rule based approach cannot offer a complete solution for OWL/Lite, let alone the OWL/Full fragment corresponding to the OWL/Lite constructs. In addition the current implementation is still under development and may well have omissions and oversights. We intend that the reasoner should be sound (all inferred triples should be valid) but not complete. [OWL Index] [Main Index]\nThe transitive reasoner The TransitiveReasoner provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf. It is not all that exciting on its own but is one of the building blocks used for the more complex reasoners. It is a hardwired Java implementation that stores the class and property lattices as graph structures. It is slightly higher performance, and somewhat more space efficient, than the alternative of using the pure rule engines to perform transitive closure but its main advantage is that it implements the direct/minimal version of those relations as well as the transitively closed version.\nThe GenericRuleReasoner (see below) can optionally use an instance of the transitive reasoner for handling these two properties. 
This is the approach used in the default RDFS reasoner.\nIt has no configuration options.\n[Index]\nThe general purpose rule engine Overview of the rule engine(s) Rule syntax and structure Forward chaining engine Backward chaining engine Hybrid engine GenericRuleReasoner configuration Builtin primitives Example Combining RDFS/OWL with custom rules Notes Extensions Overview of the rule engine(s) Jena includes a general purpose rule-based reasoner which is used to implement both the RDFS and OWL reasoners but is also available for general use. This reasoner supports rule-based inference over RDF graphs and provides forward chaining, backward chaining and a hybrid execution model. To be more exact, there are two internal rule engines one forward chaining RETE engine and one tabled datalog engine - they can be run separately or the forward engine can be used to prime the backward engine which in turn will be used to answer queries.\nThe various engine configurations are all accessible through a single parameterized reasoner GenericRuleReasoner. At a minimum a GenericRuleReasoner requires a ruleset to define its behaviour. A GenericRuleReasoner instance with a ruleset can be used like any of the other reasoners described above - that is it can be bound to a data model and used to answer queries to the resulting inference model. The rule reasoner can also be extended by registering new procedural primitives. The current release includes a starting set of primitives which are sufficient for the RDFS and OWL implementations but is easily extensible.\n[Rule Index] [Main Index]\nRule syntax and structure A rule for the rule-based reasoner is defined by a Java Rule object with a list of body terms (premises), a list of head terms (conclusions) and an optional name and optional direction. Each term or ClauseEntry is either a triple pattern, an extended triple pattern or a call to a builtin primitive. A rule set is simply a List of Rules.\nFor convenience a rather simple parser is included with Rule which allows rules to be specified in reasonably compact form in text source files. However, it would be perfectly possible to define alternative parsers which handle rules encoded using, say, XML or RDF and generate Rule objects as output. It would also be possible to build a real parser for the current text file syntax which offered better error recovery and diagnostics.\nAn informal description of the simplified text rule syntax is:\nRule := bare-rule . or [ bare-rule ]\nor [ ruleName : bare-rule ] bare-rule := term, \u0026hellip; term -\u0026gt; hterm, \u0026hellip; hterm // forward rule or bhterm \u0026lt;- term, \u0026hellip; term // backward rule\nhterm := term or [ bare-rule ]\nterm := (node, node, node) // triple pattern or (node, node, functor) // extended triple pattern or builtin(node, \u0026hellip; node) // invoke procedural primitive\nbhterm := (node, node, node) // triple pattern\nfunctor := functorName(node, \u0026hellip; node) // structured literal\nnode := uri-ref // e.g. http://foo.com/eg or prefix:localname // e.g. rdf:type or \u0026lt;uri-ref\u0026gt; // e.g. \u0026lt;myscheme:myuri\u0026gt; or ?varname // variable or \u0026lsquo;a literal\u0026rsquo; // a plain string literal or \u0026rsquo;lex\u0026rsquo;^^typeURI // a typed literal, xsd:* type names supported or number // e.g. 
42 or 25.5\nThe \u0026quot;,\u0026quot; separators are optional.\nThe difference between the forward and backward rule syntax is only relevant for the hybrid execution strategy, see below.\nThe functor in an extended triple pattern is used to create and access structured literal values. The functorName can be any simple identifier and is not related to the execution of builtin procedural primitives; it is just a datastructure. It is useful when a single semantic structure is defined across multiple triples and allows a rule to collect those triples together in one place.\nTo keep rules readable, qname syntax is supported for URI refs. The known prefixes are those registered with the PrintUtil object. This initially knows about rdf, rdfs, owl, xsd and a test namespace eg, but more mappings can be registered in Java code. In addition it is possible to define additional prefix mappings in the rule file, see below. Here are some example rules which illustrate most of these constructs:\n[allID: (?C rdf:type owl:Restriction), (?C owl:onProperty ?P), (?C owl:allValuesFrom ?D) -\u0026gt; (?C owl:equivalentClass all(?P, ?D)) ] [all2: (?C rdfs:subClassOf all(?P, ?D)) -\u0026gt; print(\u0026lsquo;Rule for \u0026lsquo;, ?C) [all1b: (?Y rdf:type ?D) \u0026lt;- (?X ?P ?Y), (?X rdf:type ?C) ] ]\n[max1: (?A rdf:type max(?P, 1)), (?A ?P ?B), (?A ?P ?C) -\u0026gt; (?B owl:sameAs ?C) ] Rule allID illustrates the functor use for collecting the components of an OWL restriction into a single datastructure which can then fire further rules. Rule all2 illustrates a forward rule which creates a new backward rule and also calls the print procedural primitive. Rule max1 illustrates use of numeric literals.\nRule files may be loaded and parsed using: List rules = Rule.rulesFromURL(\u0026#34;file:myfile.rules\u0026#34;); or\nBufferedReader br = /* open reader */ ; List rules = Rule.parseRules( Rule.rulesParserFromReader(br) ); or\nString ruleSrc = /* list of rules in line */ ; List rules = Rule.parseRules( ruleSrc ); In the first two cases (reading from a URL or a BufferedReader) the rule file is preprocessed by a simple processor which strips comments and supports some additional macro commands:\n# ... A comment line. // ... A comment line.\n@prefix pre: \u0026lt;http://domain/url#\u0026gt;. Defines a prefix pre which can be used in the rules. The prefix is local to the rule file.\n@include \u0026lt;urlToRuleFile\u0026gt;. Includes the rules defined in the given file in this file. The included rules will appear before the user defined rules, irrespective of where in the file the @include directive appears. A set of special cases is supported to allow a rule file to include the predefined rules for RDFS and OWL - in place of a real URL for a rule file use one of the keywords RDFS OWL OWLMicro OWLMini (case insensitive). So an example complete rule file which includes the RDFS rules and defines a single extra rule is:\n# Example rule file @prefix pre: \u0026lt;http://jena.hpl.hp.com/prefix#\u0026gt;. @include \u0026lt;RDFS\u0026gt;. [rule1: (?f pre:father ?a) (?u pre:brother ?f) -\u0026gt; (?u pre:uncle ?a)] [Rule Index] [Main Index]\nForward chaining engine If the rule reasoner is configured to run in forward mode then only the forward chaining engine will be used. The first time the inference Model is queried (or when an explicit prepare() call is made, see above) then all of the relevant data in the model will be submitted to the rule engine. 
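Putting the reasoner into this mode is a single configuration call; the following minimal sketch uses an invented one-line ruleset and assumes an existing data model, purely for illustration:\nString rules = \u0026#34;[r1: (?a eg:parent ?b) -\u0026gt; (?b eg:child ?a)]\u0026#34;;\nGenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));\nreasoner.setMode(GenericRuleReasoner.FORWARD_RETE); // or GenericRuleReasoner.FORWARD for the older non-RETE engine\nInfModel inf = ModelFactory.createInfModel(reasoner, data);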
Any rules which fire that create additional triples do so in an internal deductions graph and can in turn trigger additional rules. There is a remove primitive that can be used to remove triples and such removals can also trigger rules to fire in removal mode. This cascade of rule firings continues until no more rules can fire. It is perfectly possible, though not a good idea, to write rules that will loop infinitely at this point.\nOnce the preparation phase is complete the inference graph will act as if it were the union of all the statements in the original model together with all the statements in the internal deductions graph generated by the rule firings. All queries will see all of these statements and will be of similar speed to normal model accesses. It is possible to separately access the original raw data and the set of deduced statements if required, see above.\nIf the inference model is changed by adding or removing statements through the normal API then this will trigger further rule firings. The forward rules work incrementally and only the consequences of the added or removed triples will be explored. The default rule engine is based on the standard RETE algorithm (C.L Forgy, RETE: A fast algorithm for the many pattern/many object pattern match problem, Artificial Intelligence 1982) which is optimized for such incremental changes. When run in forward mode all rules are treated as forward even if they were written in backward (\u0026quot;\u0026lt;-\u0026quot;) syntax. This allows the same rule set to be used in different modes to explore the performance tradeoffs.\nThere is no guarantee of the order in which matching rules will fire or the order in which body terms will be tested, however once a rule fires its head-terms will be executed in left-to-right sequence.\nIn forward mode then head-terms which assert backward rules (such as all1b above) are ignored.\nThere are in fact two forward engines included within the Jena code base, an earlier non-RETE implementation is retained for now because it can be more efficient in some circumstances but has identical external semantics. This alternative engine is likely to be eliminated in a future release once more tuning has been done to the default RETE engine.\n[Rule Index] [Main Index]\nBackward chaining engine If the rule reasoner is run in backward chaining mode it uses a logic programming (LP) engine with a similar execution strategy to Prolog engines. When the inference Model is queried then the query is translated into a goal and the engine attempts to satisfy that goal by matching to any stored triples and by goal resolution against the backward chaining rules.\nExcept as noted below rules will be executed in top-to-bottom, left-to-right order with backtracking, as in SLD resolution. In fact, the rule language is essentially datalog rather than full prolog, whilst the functor syntax within rules does allow some creation of nested data structures they are flat (not recursive) and so can be regarded a syntactic sugar for datalog.\nAs a datalog language the rule syntax is a little surprising because it restricts all properties to be binary (as in RDF) and allows variables in any position including the property position. In effect, rules of the form:\n(s, p, o), (s1, p1, o1) ... \u0026lt;- (sb1, pb1, ob1), .... Can be thought of as being translated to datalog rules of the form:\ntriple(s, p, o) :- triple(sb1, pb1, ob1), ... triple(s1, p1, o1) :- triple(sb1, pb1, ob1), ... ... 
where \u0026quot;triple/3\u0026quot; is a hidden implicit predicate. Internally, this transformation is not actually used, instead the rules are implemented directly.\nIn addition, all the data in the raw model supplied to the engine is treated as if it were a set of triple(s,p,o) facts which are prepended to the front of the rule set. Again, the implementation does not actually work that way but consults the source graph, with all its storage and indexing capabilities, directly.\nBecause the order of triples in a Model is not defined then this is one violation to strict top-to-bottom execution. Essentially all ground facts are consulted before all rule clauses but the ordering of ground facts is arbitrary.\nTabling The LP engine supports tabling. When a goal is tabled then all previously computed matches to that goal are recorded (memoized) and used when satisfying future similar goals. When such a tabled goal is called and all known answers have been consumed then the goal will suspend until some other execution branch has generated new results and then be resumed. This allows one to successfully run recursive rules such as transitive closure which would be infinite loops in normal SLD prolog. This execution strategy, SLG, is essentially the same as that used in the well known XSB system.\nIn the Jena rule engine the goals to be tabled are identified by the property field of the triple. One can request that all goals be tabled by calling the tableAll() primitive or that all goals involving a given property P be tabled by calling table(P). Note that if any property is tabled then goals such as (A, ?P, ?X) will all be tabled because the property variable might match one of the tabled properties.\nThus the rule set:\n-\u0026gt; table(rdfs:subClassOf). [r1: (?A rdfs:subClassOf ?C) \u0026lt;- (?A rdfs:subClassOf ?B) (?B rdfs:subClassOf ?C)] will successfully compute the transitive closure of the subClassOf relation. Any query of the form (*, rdfs:subClassOf, *) will be satisfied by a mixture of ground facts and resolution of rule r1. Without the first line this rule would be an infinite loop. The tabled results of each query are kept indefinitely. This means that queries can exploit all of the results of the subgoals involved in previous queries. In essence we build up a closure of the data set in response to successive queries. The reset() operation on the inference model will force these tabled results to be discarded, thus saving memory at the expense of response time for future queries.\nWhen the inference Model is updated by adding or removing statements all tabled results are discarded by an internal reset() and the next query will rebuild the tabled results from scratch. Note that backward rules can only have one consequent so that if writing rules that might be run in either backward or forward mode then they should be limited to a single consequent each. [Rule Index] [Main Index]\nHybrid rule engine The rule reasoner has the option of employing both of the individual rule engines in conjunction. When run in this hybrid mode the data flows look something like this: The forward engine runs, as described above, and maintains a set of inferred statements in the deductions store. 
Any forward rules which assert new backward rules will instantiate those rules according to the forward variable bindings and pass the instantiated rules on to the backward engine.\nQueries are answered by using the backward chaining LP engine, employing the merge of the supplied and generated rules applied to the merge of the raw and deduced data.\nThis split allows the ruleset developer to achieve greater performance by only including backward rules which are relevant to the dataset at hand. In particular, we can use the forward rules to compile a set of backward rules from the ontology information in the dataset. As a simple example consider trying to implement the RDFS subPropertyOf entailments using a rule engine. A simple approach would involve rules like:\n(?a ?q ?b) \u0026lt;- (?p rdfs:subPropertyOf ?q), (?a ?p ?b) . Such a rule would work but every goal would match the head of this rule and so every query would invoke a dynamic test for whether there was a subProperty of the property being queried for. Instead the hybrid rule:\n(?p rdfs:subPropertyOf ?q), notEqual(?p,?q) -\u0026gt; [ (?a ?q ?b) \u0026lt;- (?a ?p ?b) ] . would precompile all the declared subPropertyOf relationships into simple chain rules which would only fire if the query goal references a property which actually has a sub property. If there are no subPropertyOf relationships then there will be no overhead at query time for such a rule.\nNote that there are no loops in the above data flows. The backward rules are not employed when searching for matches to forward rule terms. This two-phase execution is simple to understand and keeps the semantics of the rule engines straightforward. However, it does mean that care needs to be take when formulating rules. If in the above example there were ways that the subPropertyOf relation could be derived from some other relations then that derivation would have to be accessible to the forward rules for the above to be complete.\nUpdates to an inference Model working in hybrid mode will discard all the tabled LP results, as they do in the pure backward case. However, the forward rules still work incrementally, including incrementally asserting or removing backward rules in response to the data changes.\n[Rule Index] [Main Index]\nGenericRuleReasoner configuration As with the other reasoners there are a set of parameters, identified by RDF properties, to control behaviour of the GenericRuleReasoner. These parameters can be set using the Reasoner.setParameter call or passed into the Reasoner factory in an RDF Model.\nThe primary parameter required to instantiate a useful GenericRuleReasoner is a rule set which can be passed into the constructor, for example:\nString ruleSrc = \u0026#34;[rule1: (?a eg:p ?b) (?b eg:p ?c) -\u0026amp;gt; (?a eg:p ?c)]\u0026#34;; List rules = Rule.parseRules(ruleSrc); ... Reasoner reasoner = new GenericRuleReasoner(rules);\u0026lt;/pre\u0026gt; A short cut, useful when the rules are defined in local text files using the syntax described earlier, is the ruleSet parameter which gives a file name which should be loadable from either the classpath or relative to the current working directory.\nSummary of parameters Parameter Values Description PROPruleMode \u0026quot;forward\u0026quot;, \u0026quot;forwardRETE\u0026quot;, \u0026quot;backward\u0026quot;, \u0026quot;hybrid\u0026quot; Sets the rule direction mode as discussed above. Default is \u0026quot;hybrid\u0026quot;. 
PROPruleSet filename-string The name of a rule text file which can be found on the classpath or from the current directory. PROPenableTGCCaching Boolean If true, causes an instance of the TransitiveReasoner to be inserted in the forward dataflow to cache the transitive closure of the subProperty and subClass lattices. PROPenableFunctorFiltering Boolean If set to true, this causes the structured literals (functors) generated by rules to be filtered out of any final queries. This allows them to be used for storing intermediate results hidden from the view of the InfModel's clients. PROPenableOWLTranslation Boolean If set to true this causes a procedural preprocessing step to be inserted in the dataflow which supports the OWL reasoner (it translates intersectionOf clauses into groups of backward rules in a way that is clumsy to express in pure rule form). PROPtraceOn Boolean If true, switches on exhaustive tracing of rule executions at the INFO level. PROPderivationLogging Boolean If true, causes derivation routes to be recorded internally so that future getDerivation calls can return useful information. [Rule Index] [Main Index]\nBuiltin primitives The procedural primitives which can be called by the rules are each implemented by a Java object stored in a registry. Additional primitives can be created and registered - see below for more details.\nEach primitive can optionally be used in either the rule body, the rule head or both. If used in the rule body then as well as binding variables (and any procedural side-effects like printing) the primitive can act as a test - if it returns false the rule will not match. Primitives used in the rule head are only used for their side effects.\nThe set of builtin primitives available at the time writing are:\nBuiltin Operations isLiteral(?x) notLiteral(?x)\nisFunctor(?x) notFunctor(?x)\nisBNode(?x) notBNode(?x)\nTest whether the single argument is or is not a literal, a functor-valued literal or a blank-node, respectively. bound(?x...) unbound(?x..) Test if all of the arguments are bound (not bound) variables equal(?x,?y) notEqual(?x,?y) Test if x=y (or x != y). The equality test is semantic equality so that, for example, the xsd:int 1 and the xsd:decimal 1 would test equal. lessThan(?x, ?y), greaterThan(?x, ?y)\nle(?x, ?y), ge(?x, ?y)\nTest if x is \u0026lt;, \u0026gt;, \u0026lt;= or \u0026gt;= y. Only passes if both x and y are numbers or time instants (can be integer or floating point or XSDDateTime). sum(?a, ?b, ?c)\naddOne(?a, ?c)\ndifference(?a, ?b, ?c)\nmin(?a, ?b, ?c)\nmax(?a, ?b, ?c)\nproduct(?a, ?b, ?c)\nquotient(?a, ?b, ?c)\nSets c to be (a+b), (a+1) (a-b), min(a,b), max(a,b), (a*b), (a/b). Note that these do not run backwards, if in sum a and c are bound and b is unbound then the test will fail rather than bind b to (c-a). This could be fixed. strConcat(?a1, .. ?an, ?t)\nuriConcat(?a1, .. ?an, ?t)\nConcatenates the lexical form of all the arguments except the last, then binds the last argument to a plain literal (strConcat) or a URI node (uriConcat) with that lexical form. In both cases if an argument node is a URI node the URI will be used as the lexical form. regex(?t, ?p)\nregex(?t, ?p, ?m1, .. ?mn)\nMatches the lexical form of a literal (?t) against a regular expression pattern given by another literal (?p). If the match succeeds, and if there are any additional arguments then it will bind the first n capture groups to the arguments ?m1 to ?mn. The regular expression pattern syntax is that provided by java.util.regex. 
Note that the capture groups are numbered from 1 and the first capture group will be bound to ?m1, we ignore the implicit capture group 0 which corresponds to the entire matched string. So for example regexp('foo bar', '(.*) (.*)', ?m1, ?m2) will bind m1 to \"foo\" and m2 to \"bar\". now(?x) Binds ?x to an xsd:dateTime value corresponding to the current time. makeTemp(?x) Binds ?x to a newly created blank node. makeInstance(?x, ?p, ?v)\nmakeInstance(?x, ?p, ?t, ?v) Binds ?v to be a blank node which is asserted as the value of the ?p property on resource ?x and optionally has type ?t. Multiple calls with the same arguments will return the same blank node each time - thus allowing this call to be used in backward rules. makeSkolem(?x, ?v1, ... ?vn) Binds ?x to be a blank node. The blank node is generated based on the values of the remain ?vi arguments, so the same combination of arguments will generate the same bNode. noValue(?x, ?p)\nnoValue(?x ?p ?v) True if there is no known triple (x, p, *) or (x, p, v) in the model or the explicit forward deductions so far. remove(n, ...)\ndrop(n, ...) Remove the statement (triple) which caused the n'th body term of this (forward-only) rule to match. Remove will propagate the change to other consequent rules including the firing rule (which must thus be guarded by some other clauses). In particular, if the removed statement (triple) appears in the body of a rule that has already fired, the consequences of such rule are retracted from the deducted model. Drop will silently remove the triple(s) from the graph but not fire any rules as a consequence. These are clearly non-monotonic operations and, in particular, the behaviour of a rule set in which different rules both drop and create the same triple(s) is undefined. isDType(?l, ?t) notDType(?l, ?t) Tests if literal ?l is (or is not) an instance of the datatype defined by resource ?t. print(?x, ...) Print (to standard out) a representation of each argument. This is useful for debugging rather than serious IO work. listContains(?l, ?x) listNotContains(?l, ?x) Passes if ?l is a list which contains (does not contain) the element ?x, both arguments must be ground, can not be used as a generator. listEntry(?list, ?index, ?val) Binds ?val to the ?index'th entry in the RDF list ?list. If there is no such entry the variable will be unbound and the call will fail. Only usable in rule bodies. listLength(?l, ?len) Binds ?len to the length of the list ?l. listEqual(?la, ?lb) listNotEqual(?la, ?lb) listEqual tests if the two arguments are both lists and contain the same elements. The equality test is semantic equality on literals (sameValueAs) but will not take into account owl:sameAs aliases. listNotEqual is the negation of this (passes if listEqual fails). listMapAsObject(?s, ?p ?l) listMapAsSubject(?l, ?p, ?o) These can only be used as actions in the head of a rule. They deduce a set of triples derived from the list argument ?l : listMapAsObject asserts triples (?s ?p ?x) for each ?x in the list ?l, listMapAsSubject asserts triples (?x ?p ?o). table(?p) tableAll() Declare that all goals involving property ?p (or all goals) should be tabled by the backward engine. hide(p) Declares that statements involving the predicate p should be hidden. Queries to the model will not report such statements. This is useful to enable non-monotonic forward rules to define flag predicates which are only used for inference control and do not \"pollute\" the inference results. 
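As a quick illustration of calling builtins from rules, ahead of the fuller worked example below, the following minimal sketch uses noValue as a body test and print as a head action to flag resources without a label (the eg:Item, eg:label and eg:status names are invented for illustration, and data is assumed to be an existing model):\nString rules = \u0026#34;[checkLabel: (?x rdf:type eg:Item), noValue(?x, eg:label) -\u0026gt; (?x eg:status 'unlabelled'), print('missing label for', ?x)]\u0026#34;;\nGenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));\nInfModel inf = ModelFactory.createInfModel(reasoner, data);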
[Rule Index] [Main Index]\nExample As a simple illustration suppose we wish to create a simple ontology language in which we can declare one property as being the concatenation of two others and to build a rule reasoner to implement this.\nAs a simple design we define two properties eg:concatFirst, eg:concatSecond which declare the first and second properties in a concatenation. Thus the triples:\neg:r eg:concatFirst eg:p . eg:r eg:concatSecond eg:q . mean that the property r = p o q.\nSuppose we have a Jena Model rawData which contains the above assertions together with the additional facts:\neg:A eg:p eg:B . eg:B eg:q eg:C . Then we want to be able to conclude that A is related to C through the composite relation r. The following code fragment constructs and runs a rule reasoner instance to implement this:\nString rules = \u0026#34;[r1: (?c eg:concatFirst ?p), (?c eg:concatSecond ?q) -\u0026amp;gt; \u0026#34; + \u0026#34; [r1b: (?x ?c ?y) \u0026amp;lt;- (?x ?p ?z) (?z ?q ?y)] ]\u0026#34;; Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules)); InfModel inf = ModelFactory.createInfModel(reasoner, rawData); System.out.println(\u0026#34;A * * =\u0026amp;gt;\u0026#34;); Iterator list = inf.listStatements(A, null, (RDFNode)null); while (list.hasNext()) { System.out.println(\u0026#34; - \u0026#34; + list.next()); } When run on a rawData model containing the above four triples this generates the (correct) output:\nA * * =\u0026gt; - [urn:x-hp:eg/A, urn:x-hp:eg/p, urn:x-hp:eg/B] - [urn:x-hp:eg/A, urn:x-hp:eg/r, urn:x-hp:eg/C] Example 2 As a second example, we'll look at ways to define a property as being both symmetric and transitive. Of course, this can be done directly in OWL but there are times when one might wish to do this outside of the full OWL rule set and, in any case, it makes for a compact illustration.\nThis time we'll put the rules in a separate file to simplify editing them and we'll use the machinery for configuring a reasoner using an RDF specification. The code then looks something like this:\n// Register a namespace for use in the demo String demoURI = \u0026#34;http://jena.hpl.hp.com/demo#\u0026#34;; PrintUtil.registerPrefix(\u0026#34;demo\u0026#34;, demoURI); // Create an (RDF) specification of a hybrid reasoner which // loads its data from an external file. 
Model m = ModelFactory.createDefaultModel(); Resource configuration = m.createResource(); configuration.addProperty(ReasonerVocabulary.PROPruleMode, \u0026#34;hybrid\u0026#34;); configuration.addProperty(ReasonerVocabulary.PROPruleSet, \u0026#34;data/demo.rules\u0026#34;); // Create an instance of such a reasoner Reasoner reasoner = GenericRuleReasonerFactory.theInstance().create(configuration); // Load test data Model data = RDFDataMgr.loadModel(\u0026#34;file:data/demoData.rdf\u0026#34;); InfModel infmodel = ModelFactory.createInfModel(reasoner, data); // Query for all things related to \u0026#34;a\u0026#34; by \u0026#34;p\u0026#34; Property p = data.getProperty(demoURI, \u0026#34;p\u0026#34;); Resource a = data.getResource(demoURI + \u0026#34;a\u0026#34;); StmtIterator i = infmodel.listStatements(a, p, (RDFNode)null); while (i.hasNext()) { System.out.println(\u0026#34; - \u0026#34; + PrintUtil.print(i.nextStatement())); } Here is file data/demo.rules which defines property demo:p as being both symmetric and transitive using pure forward rules:\n[transitiveRule: (?A demo:p ?B), (?B demo:p ?C) -\u0026gt; (?A demo:p ?C) ] [symmetricRule: (?Y demo:p ?X) -\u0026gt; (?X demo:p ?Y) ] Running this on data/demoData.rdf gives the correct output:\n- (demo:a demo:p demo:c) - (demo:a demo:p demo:a) - (demo:a demo:p demo:d) - (demo:a demo:p demo:b) However, those example rules are overly specialized. It would be better to define a new class of property to indicate symmetric-transitive properties and make demo:p a member of that class. We can generalize the rules to support this:\n[transitiveRule: (?P rdf:type demo:TransProp)(?A ?P ?B), (?B ?P ?C) -\u0026gt; (?A ?P ?C) ] [symmetricRule: (?P rdf:type demo:TransProp)(?Y ?P ?X) -\u0026gt; (?X ?P ?Y) ] These rules work but they compute the complete symmetric-transitive closure of p when the graph is first prepared. Suppose we have a lot of p values but only want to query some of them; then it would be better to compute the closure on demand using backward rules. We could do this using the same rules run in pure backward mode but then the rules would fire lots of times as they checked every property at query time to see if it has been declared as a demo:TransProp. The hybrid rule system allows us to get round this by using forward rules to recognize any demo:TransProp declarations once and to generate the appropriate backward rules:\n-\u0026gt; tableAll(). [rule1: (?P rdf:type demo:TransProp) -\u0026gt; [ (?X ?P ?Y) \u0026lt;- (?Y ?P ?X) ] [ (?A ?P ?C) \u0026lt;- (?A ?P ?B), (?B ?P ?C) ] ] [Rule Index] [Main Index]\nCombining RDFS/OWL with custom rules Sometimes one wishes to write generic inference rules but combine them with some RDFS or OWL inference. With the current Jena architecture limited forms of this are possible but you need to be aware of the limitations.\nThere are two ways of achieving this sort of configuration within Jena (not counting using an external engine that already supports such a combination).\nFirstly, it is possible to cascade reasoners, i.e. to construct one InfModel using another InfModel as the base data. The strength of this approach is that the two inference processes are separate and so can be of different sorts. For example one could create a GenericRuleReasoner whose base model is an external OWL reasoner. The chief weakness of the approach is that it is \"layered\" - the outer InfModel can see the results of the inner InfModel but not vice versa. 
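As an illustration of the cascading approach, the following minimal sketch layers a GenericRuleReasoner over an OWL Micro inference model (the rule file name is invented, and the choice of OWL Micro for the inner layer is just an assumption for the example):\nReasoner owl = ReasonerRegistry.getOWLMicroReasoner();\nInfModel inner = ModelFactory.createInfModel(owl, data); // inner layer: OWL inference over the raw data\nList rules = Rule.rulesFromURL(\u0026#34;file:myrules.rules\u0026#34;);\nGenericRuleReasoner custom = new GenericRuleReasoner(rules);\nInfModel outer = ModelFactory.createInfModel(custom, inner); // outer layer: custom rules can see the OWL entailments, but not vice versa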
For some applications that layering is fine and it is clear which way the inference should be layered, for some it is not. A second possible weakness is performance. A query to an InfModel is generally expensive and involves lots of queries to the data. The outer InfModel in our layered case will typically issue a lot of queries to the inner model, each of which may trigger more inference. If the inner model caches all of its inferences (e.g. a forward rule engine) then there may not be very much redundancy there but if not then performance can suffer dramatically. Secondly, one can create a single GenericRuleReasoner whose rules combine rules for RDFS or OWL and custom rules. At first glance this looks like it gets round the layering limitation. However, the default Jena RDFS and OWL rulesets use the Hybrid rule engine. The hybrid engine is itself layered, forward rules do not see the results of any backward rules. Thus layering is still present though you have finer grain control - all your inferences you want the RDFS/OWL rules to see should be forward, all the inferences which need all of the results of the RDFS/OWL rules should be backward. Note that the RDFS and OWL rulesets assume certain settings for the GenericRuleReasoner so a typical configuration is:\nModel data = RDFDataMgr.loadModel(\u0026#34;file:data.n3\u0026#34;); List rules = Rule.rulesFromURL(\u0026#34;myrules.rules\u0026#34;); GenericRuleReasoner reasoner = new GenericRuleReasoner(rules); reasoner.setOWLTranslation(true); // not needed in RDFS case reasoner.setTransitiveClosureCaching(true); InfModel inf = ModelFactory.createInfModel(reasoner, data); Where the myrules.rules file will use @include to include one of the RDFS or OWL rule sets.\nOne useful variant on this option, at least in simple cases, is to manually include a pure (non-hybrid) ruleset for the RDFS/OWL fragment you want so that there is no layering problem. [The reason the default rulesets use the hybrid mode is a performance tradeoff - trying to balance the better performance of forward reasoning with the cost of computing all possible answers when an application might only want a few.]\nA simple example of this is that the interesting bits of RDFS can be captured by enabling TransitiveClosureCaching and including just the four core rules:\n[rdfs2: (?x ?p ?y), (?p rdfs:domain ?c) -\u0026gt; (?x rdf:type ?c)] [rdfs3: (?x ?p ?y), (?p rdfs:range ?c) -\u0026gt; (?y rdf:type ?c)] [rdfs6: (?a ?p ?b), (?p rdfs:subPropertyOf ?q) -\u0026gt; (?a ?q ?b)] [rdfs9: (?x rdfs:subClassOf ?y), (?a rdf:type ?x) -\u0026gt; (?a rdf:type ?y)] [Rule Index] [Main Index]\nNotes One final aspect of the general rule engine to mention is that of validation rules. We described earlier how reasoners can implement a validate call which returns a set of error reports and warnings about inconsistencies in a dataset. Some reasoners (e.g. the RDFS reasoner) implement this feature through procedural code. Others (e.g. the OWL reasoner) does so using yet more rules.\nValidation rules take the general form:\n(?v rb:validation on()) ... -\u0026gt; [ (?X rb:violation error('summary', 'description', args)) \u0026lt;- ...) ] . The validation calls can be \"switched on\" by inserting an additional triple into the graph of the form:\n_:anon rb:validation on() . This makes it possible to build rules, such as the template above, which are ignored unless validation has been switched on - thus avoiding potential overhead in normal operation. 
This is optional and the \u0026quot;validation on()\u0026quot; guard can be omitted.\nThen the validate call queries the inference graph for all triples of the form:\n?x rb:violation f(summary, description, args) . The subject resource is the \u0026quot;prime suspect\u0026quot; implicated in the inconsistency, the relation rb:violation is a reserved property used to communicate validation reports from the rules to the reasoner, the object is a structured (functor-valued) literal. The name of the functor indicates the type of violation and is normally error or warning, the first argument is a short form summary of the type of problem, the second is a descriptive text and the remaining arguments are other resources involved in the inconsistency. Future extensions will improve the formatting capabilities and flexibility of this mechanism. [Rule Index] [Main Index]\nExtensions There are several places at which the rule system can be extended by application code.\nRule syntax First, as mentioned earlier, the rule engines themselves only see rules in terms of the Rule Java object. Thus applications are free to define an alternative rule syntax so long as it can be compiled into Rule objects.\nBuiltins Second, the set of procedural builtins can be extended. A builtin should implement the Builtin interface. The easiest way to achieve this is by subclassing BaseBuiltin and defining a name (getName), the number of arguments expected (getArgLength) and one or both of bodyCall and headAction. The bodyCall method is used when the builtin is invoked in the body of a rule clause and should return true or false according to whether the test passes. In both cases the arguments may be variables or bound values and the supplied RuleContext object can be used to dereference bound variables and to bind new variables. Once the Builtin has been defined then an instance of it needs to be registered with BuiltinRegistry for it to be seen by the rule parser and interpreters.\nThe easiest way to experiment with this is to look at the examples in the builtins directory. Preprocessing hooks The rule reasoner can optionally run a sequence of procedural preprocessing hooks over the data at the time the inference graph is prepared. These procedural hooks can be used to perform tests or translations which are slow or inconvenient to express in rule form. See GenericRuleReasoner.addPreprocessingHook and the RulePreprocessHook class for more details.\n[Index]\nExtending the inference support Apart from the extension points in the rule reasoner discussed above, the intention is that it should be possible to plug external inference engines into Jena. The core interfaces of InfGraph and Reasoner are kept as simple and generic as we can to make this possible and the ReasonerRegistry provides a means for mapping from reasoner ids (URIs) to reasoner instances at run time.\nIn a future Jena release we plan to provide at least one adapter to an example, freely available, reasoner to both validate the machinery and to provide an example of how this extension can be done.\n[Index]\nFutures Contributions for the following areas would be very welcome:\nDevelop a custom equality reasoner which can handle the \u0026quot;owl:sameAs\u0026quot; and related processing more efficiently that the plain rules engine. Tune the RETE engine to perform better with highly non-ground patterns. Tune the LP engine to further reduce memory usage (in particular explore subsumption tabling rather than the current variant tabling). 
Investigate routes to better integrating the rule reasoner with underlying database engines. This is a rather larger and longer term task than the others above and is the least likely to happen in the near future. [Index]\n","permalink":"https://jena.apache.org/documentation/inference/","tags":null,"title":"Reasoners and rule engines: Jena inference support"},{"categories":null,"contents":" Reification API support will be removed in Jena5.\nIntroduction This document describes the Jena API support for reification. As always, consult the Javadoc for interface details.\nReification in RDF and Jena is the ability to treat a Statement as a Resource, and hence to make assertions about that statement. A statement may be reified as many different resources, allowing different manifestations (\u0026ldquo;statings\u0026rdquo;) of that statement to be treated differently if required.\nRDF represents a reified statement as four statements with particular RDF properties and objects: the statement (S, P, O), reified by resource R, is represented by:\nR rdf:type rdf:Statement R rdf:subject S R rdf:predicate P R rdf:object O We shall call such a group of four statements a reification quad and its components quadlets. Users of reification in Jena may, by default, simply manipulate reified statements as these quads. However, just as for Bag, Seq, Alt and RDF lists in ordinary models, or ontology classes and individuals in OntModels, Jena has additional support for manipulating reified statements.\nThe interface ReifiedStatement is used to represent a reified statement as a Jena Resource that has direct access to the statement it reifies. The method\nReifiedStatement::getStatement() returns the Statement that the resource is reifying. All the other Resource methods, of course, may be applied to a ReifiedStatement.\nConverting resources to reified statements If a resource R is associated with a reified statement, but might not itself be a ReifiedStatement object, the conversion method RDFNode::as(Class) can be used to find (or create) a ReifiedStatement:\n(ReifiedStatement) R.as(ReifiedStatement.class) For example, a model that has been read in from an RDF/XML file may have reified statements: knowing the name of the resource allows a ReifiedStatement object to be constructed without knowing the statement itself.\nIf there is no such associated reified statement, a CannotReifyException is thrown. To find out in advance if the conversion is possible, use the predicate RDFNode::canAs(ReifiedStatement.class). (Jena only counts as \u0026ldquo;an associated reified statement\u0026rdquo; a resource with exactly one rdf:subject, rdf:predicate, and rdf:object which has rdf:type rdf:Statement. It can of course have other properties.)\nTesting statements for reification You may wish to know if some Statement is reified. The methods Statement::isReified() and Model::isReified(Statement) return true if (and only if) the statement has been reified in the model. Note that the Statement method tests to see if the statement is reified in its own model, and the model method tests to see if the Statement is reified in that model; there is no test to see if a Statement is reified in any other models.\nListing reified statements Just as listStatements is used to find the statements present in some model, there are methods for finding the reified statements of a model. 
Listing reified statements Just as listStatements is used to find the statements present in some model, there are methods for finding the reified statements of a model. Each of them returns an RSIterator object, which is an iterator each of whose elements is a ReifiedStatement and for which the convenience method nextRS() will deliver a suitably-cast reified statement.\nStatement::listReifiedStatements() - all the reifications of this statement in its model. Model::listReifiedStatements() - all the reified statements in this model. Model::listReifiedStatements(Statement s) - all the reified statements reifying s in this model. Creating reified statements directly You do not have to create reified statements by asserting the quad into a Model; they can be created directly from their Statements using one of the methods:\nStatement::createReifiedStatement() Statement::createReifiedStatement(String) Model::createReifiedStatement(Statement) Model::createReifiedStatement(String,Statement) Each of these returns a ReifiedStatement whose getStatement() method delivers the original statement (actually, a .equals() statement; it may not be the identical statement object). If the creation method passed in a (non-null) String, the ReifiedStatement is a named resource and that string is its URI. Otherwise it is a newly-minted bnode. The methods on Statement create a reified statement in that statement\u0026rsquo;s model; those on Model create a reified statement in that model.\nIt is not permitted for two different (non-equals) statements to be reified onto the same resource. An attempt to do so will generate an AlreadyReifiedException.\nThe additional method Model::getAnyReifiedStatement(Statement) returns some reification of the supplied Statement; an existing one if possible, otherwise a fresh one (reified by a fresh bnode).\nRemoving reified statements There are two methods which remove all the reifications of a Statement in some Model:\nStatement::removeReification() Model::removeAllReifications(Statement) All the reified statements in the model that reify the given statement are removed, whatever their reifying resource. To remove a particular reified statement only, use\nModel::removeReification(ReifiedStatement) ","permalink":"https://jena.apache.org/documentation/notes/reification.html","tags":null,"title":"Reification HowTo"},{"categories":null,"contents":"Reification styles Prior to version 2.10.0 of Jena, there were 3 styles of reification, \u0026ldquo;standard\u0026rdquo;, \u0026ldquo;minimal\u0026rdquo; and \u0026ldquo;convenient\u0026rdquo;. As of 2.10.0 and later, only what was previously the \u0026ldquo;standard\u0026rdquo; style is supported.\nBy default and as you might expect, Jena models allow reification quads to be manifested as ReifiedStatements. Similarly, explicitly created ReifiedStatements are visible as statement quads.\nSometimes, this is not desirable. For example, in an application that reifies large numbers of statements in the same model as those statements, most of the results from listStatements() will be quadlets; this is inefficient and confusing. One choice is to reify the statements in a different model. Another is to take advantage of reification styles.\nEach model has a reification style, described by constants in ModelFactory. The default style is called Standard because it corresponds most closely to the RDF standard. There are two other reification styles to choose from:\nConvenient: reification quadlets are not visible in the results of listStatements(). Otherwise everything is normal; quadlets that are added to the model contribute to ReifiedStatement construction. 
Minimal: reification quadlets play no role at all in the construction of ReifiedStatements, which can only be created by the methods discussed earlier. This style is most similar to that of Jena 1. The method ModelFactory.createDefaultModel() takes an optional Style argument, which defaults to Standard. Similarly, createFileModelMaker() and createMemModelMaker() can take Style arguments which are applied to every model they create. To take a model with hidden reification quads and expose them as statements, the method ModelFactory.withHiddenStatements(Model m) produces a new model which does just that.\n","permalink":"https://jena.apache.org/documentation/archive/reification_previous.html","tags":null,"title":"Reification styles (archive material)"},{"categories":null,"contents":"Please report bugs using Jena\u0026rsquo;s GitHub Issues. General suggestions or requests for changes can also be discussed on the user list or Jena\u0026rsquo;s GitHub Discussions but are less likely to be accidentally forgotten if you log them in a GitHub Issue.\nFor any security issues please refer to our Security Advisories page for how those should be reported and handled.\nPatches and other code contributions are made via git pull requests. See \u0026lsquo;Getting Involved\u0026rsquo;\nPlease note that ASF requires that all contributions must be covered by an appropriate license.\n","permalink":"https://jena.apache.org/help_and_support/bugs_and_suggestions.html","tags":null,"title":"Reporting bugs and making suggestions"},{"categories":null,"contents":"This page details how to review contributions submitted for Apache Jena, it is intended primarily for Jena committers but is also useful in helping contributors understand what we expect from a contribution.\nPatch Guidelines When reviewing contributed patches to Jena the committers are going to be considered the following:\nDoes the pull request includes tests? Does the pull request includes documentation? Does it have Apache copyright headers? Are there any @author tags? Is it contributed to Apache? What is the size and impact on Jena of the contribution? Is IP clearance required? Pull requests and commit messages Including Tests Including tests is almost always a must for a patch unless the patch is for non-code content e.g. CMS diffs, maven config tweaks.\nTests are essential for bug fixes but should be considered mandatory for any patch. Jena uses JUnit for tests and uses the standard Java src/test/ directory conventions within its modules.\nIncluding Documentation Users will not find or understand new feature if there is no documentation.\nApache Copyright Headers Code for inclusion in Jena should contain Apache copyright headers, only the contributor should remove/change copyright headers so if a different copyright header is present then you must request that the contributor change the copyright headers.\nNo @author Tags The Jena PMC have agreed not to maintain @author tags in the code since generally authorship can be identified from the SVN history anyway and over time committers will typically touch much code even if only briefly and in minor ways.\n@author tags will not prevent a contribution being accepted but should be removed by the committer who integrates the contribution.\nCode style Jena does not have a particular formal code style specification, but here are some simple tips for keeping your contribution in good order:\nJena uses the Java code conventions with spaces (not tabs!), an indent of 4, and opening braces on the same line. 
Use no trailing whitespace if avoidable. Use common sense to make your code readable for the next person. Don\u0026rsquo;t create a method signature that throws checked exceptions that aren\u0026rsquo;t ever actually thrown from the code in that method unless an API supertype specifies that signature. Otherwise, clients of your code will have to include unnecessary handling code. Don\u0026rsquo;t leave unused imports in your code. IDEs provide facilities to clean imports. If a type declares a supertype that isn\u0026rsquo;t a required declaration, consider whether that clarifies or confuses the intent. Minimize the new compiler warnings your patch creates. If you use @SuppressWarnings to hide them, please add a comment explaining the situation. Remove unused local variables or fields. If there is valuable code in some unused private method, add a @SuppressWarnings(\u0026ldquo;unused\u0026rdquo;) with an explanation of when it might become useful. Contribution to Apache The Apache License states that any contribution to an Apache project is automatically considered to be contributed to the Apache foundation and thus liable for inclusion in an Apache project.\nGenerally you will not have to worry about this but if anyone ever states that code is not for inclusion then we must abide by that or request that they make a statement that they are contributing the code to Apache.\nSize and Impact on Jena Small patches can generally be incorporated immediately, larger patches - particularly those adding significant features - should usually be discussed on the dev@jena.apache.org list prior to acceptance.\nUse your judgement here, a few hundred lines of code may be considered small if it isn\u0026rsquo;t changing/extending functionality significantly. Conversely a small patch that changes a core behavior should be more widely discussed.\nIP Clearance Depending on where a patch comes from there may be IP clearance issues, for small patches this is generally a non-issue. Where this comes into play is when a large patch is coming in which has been developed completely externally to Jena, particularly if that patch has been developed for/on behalf of a company rather than be developers working in their free time.\nFor patches like this we may require that the company in question submit a CCLA and that the developers involve submit ICLAs. There may also need to be IP Clearance vote called on the developer list to give developers a chance to review the code and check that there isn\u0026rsquo;t anything being incorporated that violates Apache policy.\nPull Requests and Commit Messages A pull request is a single unit so a large number of commits details the evolution internally but does not help record the external contribution.\nConsider asking the contributor to merge commits into a few with useful messages for an external reviewer.\nProject Processes Project Processes including:\nRelease process Commit Workflow for Github-ASF ","permalink":"https://jena.apache.org/getting_involved/reviewing_contributions.html","tags":null,"title":"Reviewing Contributions"},{"categories":null,"contents":"Fuseki/UI can be run in a number of ways:\nAs a standalone server As a service run by the operation system, for example, started when the machine boots As a Web Application inside a container such as Apache Tomcat or Jetty. 
Fuseki is also packaged as a plain server \u0026ldquo;Fuseki Main\u0026rdquo; with no UI for use as a configurable SPARQL server, for building as a Docker container, and as a deployment and development standalone server. It supports the arguments used by the standalone server.\nThe configuration file has the same format for all forms of the Fuseki server.\nSee \u0026ldquo;Fuseki Configuration\u0026rdquo; for information on how to provide datasets and configure services using the configuration file.\nFuseki as a Standalone Server This is running Fuseki from the command line.\nTo publish at http://host:3030/NAME:\nwhere /NAME is the dataset publishing name at this server in URI space.\nTDB2 database:\nfuseki-server [--loc=DIR] [[--update] /NAME] The --loc directory is either a TDB1 or TDB2 database. The directory DIR must exist. If the database in DIR does not exist, then a new database is created. By default this is a TDB2 database unless the argument --tdb1 is given.\nAn in-memory, non-persistent database (always updatable) is:\nfuseki-server --mem /NAME Load a file at start and provide it read-only:\nfuseki-server --file=MyData.ttl /NAME where MyData.ttl can be in any RDF format, either triples or quads.\nAdministrative functions are only available from \u0026ldquo;localhost\u0026rdquo;.\nSee fuseki-server --help for details of more arguments.\nLayout When run from the command line, the server creates its work area in the directory named by the environment variable FUSEKI_BASE; this defaults to the current directory.\nFuseki layout\nIf you get the error message Can't find jarfile to run then you either need to put a copy of fuseki-server.jar in the current directory or set the environment variable FUSEKI_HOME to point to an unpacked Fuseki distribution.\nStarting with no dataset and no configuration is possible. Datasets can be added from the admin UI to a running server.\nFuseki as a Service Fuseki can run as an operating system service, started when the server machine boots. The script fuseki is a Linux init.d script with the common secondary arguments of start and stop.\nProcess arguments are read from /etc/default/fuseki including FUSEKI_HOME and FUSEKI_BASE. FUSEKI_HOME should be the directory where the distribution was unpacked.\nFuseki as a Web Application Fuseki can run from a WAR file.\nThe war file can be downloaded from the project downloads page. It should then be renamed to the webapp name \u0026ldquo;ROOT.war\u0026rdquo; (this means there is no name in the URL) or \u0026ldquo;fuseki.war\u0026rdquo; (with a name /fuseki/) or some other choice of name.\nFUSEKI_HOME is not applicable.\nFUSEKI_BASE defaults to /etc/fuseki which must be a writeable directory. It is initialised the first time Fuseki runs, including an Apache Shiro security file, but this is only intended as a starting point. It restricts use of the admin UI to the local machine.\nWhen deploying as a web application a more fully featured Admin API is made available and described on the Fuseki Server Protocol (REST API) page.\nConfiguring logging When running from a WAR file in a webapp container such as Apache Tomcat, the logging configuration comes from the file log4j2.properties in the root of the unpacked war file, e.g. /var/lib/tomcat9/webapps/fuseki/log4j2.properties.\nThe name of the file is taken from web.xml:\nlog4jConfiguration log4j2.properties This only applies when running in a webapp container. 
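For orientation, a minimal log4j2.properties along the following lines sends server logging to the console. This is only an illustrative sketch: the logger names and layout pattern here are assumptions, and the log4j2.properties shipped in the webapp or distribution is the authoritative starting point.

status = error
name = FusekiLogging

appender.console.type = Console
appender.console.name = OUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{yyyy-MM-dd HH:mm:ss}] %-10c{1} %-5p %m%n

rootLogger.level = INFO
rootLogger.appenderRef.stdout.ref = OUT

logger.fuseki.name = org.apache.jena.fuseki
logger.fuseki.level = INFO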
When run from the command line, the server looks for log4j2.properties in the current directory and if not found, uses a built-in configuration.\nThis logging goes to the standard output.\nFuseki with Tomcat9 and systemd systemd may be set to sandbox Tomcat9. The file area /etc/fuseki will not be writable to Fuseki. To enable this area, add ReadWritePaths=/etc/fuseki/ to the file /etc/systemd/system/tomcat9.service.d/override.conf, creating the file if necessary.\nsystemd also captures standard out and routes it to the system journal:\njournalctl -u tomcat9 To direct the output to the traditional location of /var/log/tomcat9/catalina.out use the StandardOutput setting in override.conf:\n[Service] # Allow access to the Fuseki area ReadWritePaths=/etc/fuseki/ StandardOutput=file:/var/log/tomcat9/catalina.out ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-webapp.html","tags":null,"title":"Running Fuseki with UI"},{"categories":null,"contents":"Legacy Documentation : not up-to-date\nThe original ARP parser will be removed from Jena\nNormally, both ARP and Jena are used to read files either from the local machine or from the Web. A different use case, addressed here, is when the XML source is available in-memory in some way. In these cases, ARP and Jena can be used as a SAX event handler, turning SAX events into triples, or a DOM tree can be parsed into a Jena Model.\n1. Overview To read an arbitrary SAX source as triples to be added into a Jena model, it is not possible to use a Model.read() operation. Instead, you construct a SAX event handler of class SAX2Model, using the create method, install it as the handler on your SAX event source, and then stream the SAX events. It is possible to have fine-grained control over the SAX events, for instance, by inserting or deleting events, before passing them to the SAX2Model handler.\nSample Code This code uses the Xerces parser as a SAX event stream, and adds the triples to a Model using default options.\n// Use your own SAX source. XMLReader saxParser = new SAXParser(); // set up SAX input String base = \u0026quot;http://example.org/\u0026quot;; InputStream in = new FileInputStream(\u0026quot;kb.rdf\u0026quot;); InputSource ins = new InputSource(in); ins.setSystemId(base); Model m = ModelFactory.createDefaultModel(); // create handler, linked to Model SAX2Model handler = SAX2Model.create(base, m); // install handler on SAX event stream SAX2RDF.installHandlers(saxParser, handler); try { try { saxParser.parse(ins); } finally { // MUST ensure handler is closed. handler.close(); } } catch (SAXParseException e) { // Fatal parsing errors end here, // but they will already have been reported. } Initializing SAX event source If your SAX event source is a subclass of XMLReader, then the installHandlers static method can be used as shown in the sample. Otherwise, you have to do it yourself. 
The installHandlers code is like this:\nstatic public void installHandlers(XMLReader rdr, XMLHandler sax2rdf) throws SAXException { rdr.setEntityResolver(sax2rdf); rdr.setDTDHandler(sax2rdf); rdr.setContentHandler(sax2rdf); rdr.setErrorHandler(sax2rdf); rdr.setFeature(\u0026quot;http://xml.org/sax/features/namespaces\u0026quot;, true); rdr.setFeature( \u0026quot;http://xml.org/sax/features/namespace-prefixes\u0026quot;, true); rdr.setProperty( \u0026quot;http://xml.org/sax/properties/lexical-handler\u0026quot;, sax2rdf); } For some other SAX source, the exact code will differ, but the required operations are as above.\nError Handler The SAX2Model handler supports the setErrorHandler method, from the Jena RDFReader interface. This is used in the same way as that method to control error reporting.\nA specific fatal error, new in Jena 2.3, is ERR_INTERRUPTED, which indicates that the current Thread received an interrupt. This allows long jobs to be aborted on user request.\nOptions The SAX2Model handler supports the setProperty method, from the Jena RDFReader interface. This is used in nearly the same way to have fine grain control over ARPs behaviour, particularly over error reporting, see the I/O howto. Setting SAX or Xerces properties cannot be done using this method.\nXML Lang and Namespaces If you are only treating some document subset as RDF/XML then it is necessary to ensure that ARP knows the correct value for xml:lang and desirable that it knows the correct mappings of namespace prefixes.\nThere is a second version of the create method, which allows specification of the xml:lang value from the outer context. If this is inappropriate it is possible, but hard work, to synthesis an appropriate SAX event.\nFor the namespaces prefixes, it is possible to call the startPrefixMapping SAX event, before passing the other SAX events, to declare each namespace, one by one. Failure to do this is permitted, but, for instance, a Jena Model will then not know the (advisory) namespace prefix bindings. These should be paired with endPrefixMapping events, but nothing untoward is likely if such code is omitted.\nUsing your own triple handler As with ARP, it is possible to use this functionality, without using other Jena features, in particular, without using a Jena Model. Instead of using the class SAX2Model, you use its superclass SAX2RDF. The create method on this class does not provide any means of specifying what to do with the triples. Instead, the class implements the ARPConfig interface, which permits the setting of handlers and parser options, as described in the documentation for using ARP without Jena.\nThus you need to:\nCreate a SAX2RDF using SAX2RDF.create() Attach your StatementHandler and SAXErrorHandler and optionally your NamespaceHandler and ExtendedHandler to the SAX2RDF instance. Install the SAX2RDF instance as the SAX handler on your SAX source. Follow the remainder of the code sample above. Using a DOM as Input None of the approaches listed here work with Java 1.4.1_04. We suggest using Java 1.4.2_04 or greater for this functionality. This issue has no impact on any other Jena functionality.\nUsing a DOM as Input to Jena The DOM2Model subclass of SAX2Model, allows the parsing of a DOM using ARP. The procedure to follow is:\nConstruct a DOM2Model, using a factory method such as createD2M, specifying the xml:base of the document to be loaded, the Model to load into, optionally the xml:lang value (particularly useful if using a DOM Node from within a Document). 
Set any properties, error handlers etc. on the DOM2Model object. The DOM is parsed simply by calling the load(Node) method. Using a DOM as Input to ARP DOM2Model is a subclass of SAX2RDF, and handlers etc. can be set on the DOM2Model as for SAX2RDF. Using a null model as the argument to the factory indicates this usage.\n","permalink":"https://jena.apache.org/documentation/io/arp/arp_sax.html","tags":null,"title":"SAX Input into Jena and ARP"},{"categories":null,"contents":"Schemagen-maven: generating Java source files from OWL and RDFS ontologies via Maven The Apache Jena command line tool schemagen provides an automated way of creating Java source code constants from ontology files in an RDF-based project. This can be very convenient, as it provides both a level of robustness that the names of RDF classes, properties and individuals are being used correctly, and it can be used by IDE\u0026rsquo;s such as Eclipse to provide name-completion for constants from the ontology.\nFor some projects, invoking schemagen from the command line, perhaps via ant, is sufficient. For projects organised around Apache Maven, it would be convenient to integrate the schemagen translation step into Maven\u0026rsquo;s normal build process. This plugin provides a means to do just that.\nPre-requisites This plugin adds a step to the Maven build process to automatically translate RDFS and OWL files, encoded as RDF/XML, Turtle or N-triples into Java source files. This plugin is designed to be used with a Java project that is already using Apache Maven to control the build. Non-Java projects do not need this tool. Projects that are not using Maven should see the schemagen documentation for ways to run schemagen from the command line.\nInstalling Schemagen is available from the maven central repository. To use it, add the following dependency to your pom.xml:\n\u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-maven-tools\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;schemagen\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;translate\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;/plugins\u0026gt; \u0026lt;/build\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-core\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; Replace the x.y.z above with the latest versions as found by browsing jena-maven-tools and jena-core in Maven Central.\nConfiguration: basic principles Schemagen supports a large number of options, controlling such things as the name of the input file, the RDF namespace to expect, which Java package to place the output in and so forth. For a command line or Ant-based build, these options are normally passed on a per-file basis. When using maven, however, we point the plugin at a whole collection of input files to be converted to Java, and let the plugin figure out which ones need updating (e.g. because the RDF source is newer than the Java output, or because the Java file has not yet been generated). 
So we need:\na mechanism to specify which files to process a mechanism to specify common options for all input files a mechanism to specify per-file unique options In Maven, all such configuration information is provided via the pom.xml file. We tell Maven to use the plugin via the \u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; section:\n\u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-maven-tools\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;schemagen\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;translate\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;/plugins\u0026gt; \u0026lt;/build\u0026gt; Replace the x.y.z above with the latest versions as found by browsing jena-maven-tools in Maven Central.\nThe configuration options all nest inside the \u0026lt;configuration\u0026gt; section.\nSpecifying files to process An \u0026lt;include\u0026gt; directive specifies one file pattern to include in the set of files to process. Wildcards may be used. For example, the following section specifies that the tool will process all Turtle files, and the foaf.rdf file, in src/main/vocabs:\n\u0026lt;includes\u0026gt; \u0026lt;include\u0026gt;src/main/vocabs/*.ttl\u0026lt;/include\u0026gt; \u0026lt;include\u0026gt;src/main/vocabs/foaf.rdf\u0026lt;/include\u0026gt; \u0026lt;/includes\u0026gt; Specifying processing options Options are, in general, given in the \u0026lt;fileOptions\u0026gt; section. A given \u0026lt;source\u0026gt; refers to one input source - one file - as named by the \u0026lt;input\u0026gt; name. 
The actual option names are taken from the RDF config file property names, omitting the namespace:\n\u0026lt;fileOptions\u0026gt; \u0026lt;source\u0026gt; \u0026lt;!-- Test2.java (only) will contain OntModel declarations --\u0026gt; \u0026lt;input\u0026gt;src/main/vocabs/demo2.ttl\u0026lt;/input\u0026gt; \u0026lt;ontology\u0026gt;true\u0026lt;/ontology\u0026gt; \u0026lt;/source\u0026gt; \u0026lt;/fileOptions\u0026gt; The special source default provides a mechanism for specifying shared defaults across all input sources:\n\u0026lt;source\u0026gt; \u0026lt;input\u0026gt;default\u0026lt;/input\u0026gt; \u0026lt;package-name\u0026gt;org.example.test\u0026lt;/package-name\u0026gt; \u0026lt;/source\u0026gt; Example configuration Note: Replace the x.y.z below with the latest versions as found by browsing jena-maven-tools and jena-core in Maven Central.\n\u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-maven-tools\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;includes\u0026gt; \u0026lt;include\u0026gt;src/main/vocabs/*.ttl\u0026lt;/include\u0026gt; \u0026lt;include\u0026gt;src/main/vocabs/foaf.rdf\u0026lt;/include\u0026gt; \u0026lt;/includes\u0026gt; \u0026lt;fileOptions\u0026gt; \u0026lt;source\u0026gt; \u0026lt;input\u0026gt;default\u0026lt;/input\u0026gt; \u0026lt;package-name\u0026gt;org.example.test\u0026lt;/package-name\u0026gt; \u0026lt;/source\u0026gt; \u0026lt;source\u0026gt; \u0026lt;!-- Test2.java (only) will contain OntModel declarations --\u0026gt; \u0026lt;input\u0026gt;src/main/vocabs/demo2.ttl\u0026lt;/input\u0026gt; \u0026lt;ontology\u0026gt;true\u0026lt;/ontology\u0026gt; \u0026lt;!-- caution: the config file property name 'inference' is mapped to 'use-inf' --\u0026gt; \u0026lt;use-inf\u0026gt;true\u0026lt;/use-inf\u0026gt; \u0026lt;/source\u0026gt; \u0026lt;/fileOptions\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;schemagen\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;translate\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;/plugins\u0026gt; \u0026lt;/build\u0026gt; \u0026lt;dependencies\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-core\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;/dependencies\u0026gt; ","permalink":"https://jena.apache.org/documentation/tools/schemagen-maven.html","tags":null,"title":"Schemagen Maven"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and no longer supported. The last release of Jena with this module is Jena 3.17.0.\nSDB uses an SQL database for the storage and query of RDF data. Many databases are supported, both Open Source and proprietary.\nAn SDB store can be accessed and managed with the provided command line scripts and via the Jena API.\nUse of SDB for new applications is not recommended. This component is \u0026ldquo;maintenance only\u0026rdquo;.\nTDB is faster, more scalable and better supported than SDB.\nStatus As of June 2013 the Jena developers agreed to treat SDB as being only maintained where possible. 
See Future of SDB thread on the mailing list.\nThe developers intend to continue releasing SDB alongside other Jena components but it is not actively developed. None of the developers use it within their organizations.\nSDB may be revived as a fully supported component if members of the community come forward to develop it. The Jena team strongly recommends the use of TDB instead of SDB for all new development due to TDB\u0026rsquo;s substantially better performance and scalability.\nDocumentation SDB Installation Quickstart Command line utilities Store Description format Dataset And Model Descriptions Use from Java Specialized configuration Database Layouts FAQ Fuseki Integration Databases supported Downloads SDB is distributed from the Apache Jena project. See the downloads page for details.\nSupport Support and questions\nDetails Loading data Loading performance Query performance Database Notes List of databases supported\nNotes:\nPostgreSQL notes MySQL notes Oracle notes Microsoft SQL Server notes DB2 notes Derby notes HSQLDB notes H2 notes ","permalink":"https://jena.apache.org/documentation/archive/sdb/sdb_index.html","tags":null,"title":"SDB - persistent triple stores using relational databases"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nThis page describes the configuration options available. These are options for query processing, not for the database layout and storage, which is controlled by store descriptions.\nSetting Options Options can be set globally, throughout the JVM, or on a per query execution basis. SDB uses the same mechanism as ARQ.\nThere is a global context, which is given to each query execution as it is created. Modifications to the global context after the query execution is created are not seen by the query execution. Modifications to the context of a single query execution do not affect any other query execution nor the global context.\nA context is a set of symbol/value pairs. Symbols are created internally to ARQ and SDB and accessed via Java constants. Values are any Java object, together with the values true and false, which are short for the constants of class java.lang.Boolean.\nSetting globally:\nSDB.getContext().set(symbol, value) ; Per query execution:\nQueryExecution qExec = QueryExecutionFactory.create(...) ; qExec.getContext().set(symbol, value) ; Setting options for a query execution should be done before any query compilation or setup happens. Creation of a query execution object does not compile the query, which happens when the appropriate .exec method is called.\nCurrent Options The available symbols, their effects and their default values are:\nSDB.unionDefaultGraph: query patterns on the default graph match against the union of the named graphs (default: false).\nSDB.jdbcStream: attempt to stream JDBC results (default: true).\nSDB.jdbcFetchSize: set the JDBC fetch size on the SQL query statements; must be \u0026gt;= 0 (default: unset).\nSDB.streamGraphAPI: stream the Jena graph APIs; also requires jdbcStream and jdbcFetchSize (default: false).\nSDB.annotateGeneratedSQL: put SQL comments in the generated SQL (default: true).\nQueries over all Named Graphs All the named graphs can be treated as a single graph in two ways: either set the SDB option above or use the URI that refers to the RDF merge of the named graphs (urn:x-arq:UnionGraph).\nWhen querying the RDF merge of the named graphs, the default graph in the store is not included.\nThis feature applies to queries only. 
It does not affect the storage nor does it change loading.\nTo find out which named graph a triple can be found in, use GRAPH as usual.\nThe following special IRIs exist for use as a graph name in GRAPH only:\n\u0026lt;urn:x-arq:DefaultGraph\u0026gt; – the default graph, even when option for named union queries is set. \u0026lt;urn:x-arq:UnionGraph\u0026gt; – the union of all named graphs, even when the option for named union queries is not set. Streaming over JDBC By default, SDB processes results from SQL statements in a streaming fashion. It is important to close query execution objects, especially if not consuming all the results, because that causes the underlying JDBC result set to be closed. JDBC result sets can be a scarce system resource.\nIf this option is set, but the JDBC connection is not streaming, then this feature is harmless. Setting it false caused SDB to read all results of an SQL statement at once, treating streamed connections as unstreamed.\nNote that this only streams results end-to-end if the underlying JDBC connection itself is set up to stream. Most do not in the default configuration to reduce transient resource peaks on the server under load.\nSetting the fetch size enables cursors in some databases but there may be restrictions imposed by the database. See the documentation for your database for details.\nIn addition, operations on the graph API can be made streaming by also setting the Graph API to streaming.\nAnnotated SQL SQL generation can include SQL comments to show how SPARQL has been turned into SQL. This option is true by default and always set for the command sdbprint.\nSDB.getContext().setFalse(SDB.annotateGeneratedSQL) ; ","permalink":"https://jena.apache.org/documentation/archive/sdb/configuration.html","tags":null,"title":"SDB Configuration"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nSDB does not have a single database layout. This page is an informal overview of the two main types (\u0026ldquo;layout2/hash\u0026rdquo; and \u0026ldquo;layout2/index\u0026rdquo;).\nIn SDB one store is one RDF dataset is one SQL database.\nDatabases of type layout2 have a triples table for the default graph, a quads table for the named graphs. In the triples and quads tables, the columns are integers referencing a nodes table.\nIn the hash form, the integers are 8-byte hashes of the node.\nIn the index form, the integers are 4-byte sequence ids into the node table.\nTriples\n+-----------+ | S | P | O | +-----------+ Primary key: SPO Indexes: PO, OS\nQuads\n+---------------+ | G | S | P | O | +---------------+ Primary key: GSPO Indexes: GPO, GOS, SPO, OS, PO.\nNodes\nIn the index-based layout, the table is:\n+------------------------------------------------+ | Id | Hash | lex | lang | datatype | value type | +------------------------------------------------+ Primary key: Id Index: Hash\nHash:\n+-------------------------------------------+ | Hash | lex | lang | datatype | value type | +-------------------------------------------+ Primary key: Hash\nAll character fields are unicode, supporting any character set, including mixed language use.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/database_layouts.html","tags":null,"title":"SDB Database Layouts"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. 
The last release of Jena with this module was Apache Jena 3.17.0.\nDB2 Derby MS SQL MySQL PostgreSQL DB2 Database creation The database should be created with code set UTF-8 so unicode is enabled (SDB creates tables CCSID UNICODE for full internationalization support).\nDerby Loading Restriction Only one load operation can be active at any one time. Limitations on temporary tables in Derby mean the loader tables are not temporary and hence are shared by all connections.\nMS SQL The collation sequence for the database must be one that is binary (BIN in the name). It does not matter which one is used. Without BIN, string matching is case insensitive but RDF requires case sensitive literals and IRIs. The normal layout is not affected by this because it does not use string comparisons.\nMySQL National Characters SDB formats all table columns used for storing text in the MySQL schema to UTF-8. However, this does not cause the data to be transmitted in UTF-8 over the JDBC connection.\nThe best way is to run the server with a default character set of UTF-8. This is set in the MySQL server configuration file:\n[mysql] default-character-set=utf8 A less reliable way is to pass parameters to the JDBC driver in the JDBC URL. The application will need to explicitly set the JDBC URL in the store configuration file.\n...?useUnicode=true\u0026amp;characterEncoding=UTF-8 Connection timeouts If you get the connection timing out after (by default) 8 hours of no activity, try setting autoReconnect=true in the JDBC URL.\nTuning For InnoDB, the critical parameter is innodb_buffer_pool_size. See the MySQL sample configuration files for details. Using ANALYZE TABLE on the database tables can improve the choices made by the MySQL optimizer. Connection Timeout MySQL closes the JDBC connection after a period of no use (8 hours by default).\nWhile deprecated by MySQL, ?autoReconnect=true may help here.\nOther ways of addressing the problem are to make a simple query call on a regular basis just to keep the connection alive (e.g. SELECT * { \u0026lt;http://example/junk\u0026gt; \u0026lt;http://example/junk\u0026gt; \u0026lt;http://example/junk\u0026gt; }).\nSome connection pool systems automatically compensate for this feature of MySQL.\nPostgreSQL Databases must use UTF-8 encoding Create SDB stores with encoding UTF-8.\nInternational character sets can cause corrupted databases otherwise. The database will not pass the SDB test suite.\nSet this when creating the database with pgAdmin or if you use the command line, for example:\nCREATE DATABASE \u0026quot;YourStoreName\u0026quot; WITH OWNER = \u0026quot;user\u0026quot; ENCODING = 'UTF8' TABLESPACE = pg_default; Improving loading rates The index layout (\u0026ldquo;layout2/index\u0026rdquo;) usually loads faster than the hash form.\nExisting store\nWhen loading into an existing store, where there is existing data and ANALYZE has been run, the process is:\nDrop indexes\nsdbconfig --drop\nLoad data\nsdbload file\nRedo the indexes\nsdbconfig --index\nFresh store\nPostgreSQL needs statistics to improve load performance through the use of ANALYZE.\nWhen loading the first time, there are no statistics so, for a large load, it is advisable to load a sample, run ANALYZE and then load the whole data.\nCreate the database without indexes (just the primary keys).\nsdbconfig --format\nLoad a sample of the triples (say, 100K or a million triples, until the load rate starts to drop appreciably). 
The sample must be representative of the data.\nsdbload --time sample Run ANALYZE on the database.\nIf your sample is one part of a large set of files, this set is not necessary at all. If you are loading one single large file then you might wish to empty the database. This is only needed if the data has bNodes in\nit because the load process suppresses duplicates.\nsdbconfig --truncate Now load the data or rest of the data.\nsdbload \u0026ndash;time file\nAdd the indexes. This only takes a few minutes even on a very large store but calculating them during loading (that is, --create, not --format) is noticeably slower.\nsdbconfig \u0026ndash;index\nRun ANALYZE on the database again.\nTuning It is essential to run the PostgreSQL ANALYZE command on a database, either during or after building. This is done via the command line psql or via pgAdmin. The PostgreSQL documentation describes ways to run this as a background daemon.\nVarious of the PostgreSQL configuration parameters will affect performance, particularly effective_cache_size. The parameter enable_seqscan may help avoid some unexpected slow queries.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/db_notes.html","tags":null,"title":"SDB Database Notes"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nProduct Version Oracle 10g Including OracleXE Microsoft SQL Server 2005 Including MS SQL Express DB2 9 Including DB2 9 Express PostgreSQL v8.2 MySQL v5.0.22 Apache Derby v10.2.2.0 H2 1.0.71 HSQLDB 1.8.0 Support for a version implies support for later versions unless otherwise stated.\nMicrosoft SQL Server 2000 is also reported to work.\nH2 support was contributed by Martin Hein (March 2008).\nIBM DB2 support was contributed by Venkat Krishnamurthy (October 2007).\n","permalink":"https://jena.apache.org/documentation/archive/sdb/databases_supported.html","tags":null,"title":"SDB Databases Supported"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nAssembler descriptions for RDF Datasets and individual models are built from Store Descriptions. A dataset assembler just points to the store to use; a model assembler points to the store and identifies the model within the store to use (or use the default model).\nDatasets The example below creates an in-memory store implemented by HSQLDB.\nPREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX sdb: \u0026lt;http://jena.hpl.hp.com/2007/sdb#\u0026gt; sdb:DatasetStore rdfs:subClassOf ja:RDFDataset . \u0026lt;#dataset\u0026gt; rdf:type sdb:DatasetStore ; sdb:store \u0026lt;#store\u0026gt; . \u0026lt;#store\u0026gt; rdf:type sdb:Store ; ... . A dataset description for SDB is identical to a store description, except the rdf:type is sdb:DatasetStore. A different kind of Java object is created.\nModels To assemble a particular model in a store, especially to work with at the API level rather than at the query level, the following can be added to an assembler description:\n# Default graph \u0026lt;#myModel1\u0026gt; rdf:type sdb:Model ; sdb:dataset \u0026lt;#dataset\u0026gt; . 
# Named graph \u0026lt;#myModel2\u0026gt; rdf:type sdb:Model ; sdb:namedGraph data:graph1 ; sdb:dataset \u0026lt;#dataset\u0026gt; . There can be several model descriptions in the same file, referring to the same SDB dataset, or to different ones. The Jena assembler interface enables different items to be picked out.\nNote that creating a general (ARQ) dataset from models/graphs inside an SDB store is not the same as using a dataset which is the query interface to the store. It is the dataset for the store that triggers full SPARQL to SQL translation, not a model.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/dataset_description.html","tags":null,"title":"SDB Dataset Description"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nTune your database Database performance depends on the database being tuned. Some databases default to a \u0026ldquo;developer setup\u0026rdquo; which does not use much of the RAM but is only for functional testing.\nImproving loading rates For a large bulk load into an existing store, dropping the indexes, doing the load and then recreating the indexes can be noticeably faster.\nsdbconfig --drop sdbload file sdbconfig --index For a large bulk load into a new store, just format it without creating the indexes, do the load, and then create the indexes afterwards; this can be noticeably faster.\nsdbconfig --format sdbload --time file sdbconfig --index ","permalink":"https://jena.apache.org/documentation/archive/sdb/faq.html","tags":null,"title":"SDB FAQ"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nFuseki is a server that implements the SPARQL protocol for HTTP. It can be used to give a SPARQL interface to an SDB installation.\nThe Fuseki server needs the SDB jar files on its classpath. The Fuseki server configuration file needs to contain two triples to integrate SDB:\n[] rdf:type fuseki:Server ; fuseki:services ( \u0026lt;#service1\u0026gt; ) . ## Declare that sdb:DatasetStore is an implementation of ja:RDFDataset . sdb:DatasetStore rdfs:subClassOf ja:RDFDataset . then a Fuseki service can use an SDB-implemented dataset:\n\u0026lt;#dataset\u0026gt; rdf:type sdb:DatasetStore ; sdb:store \u0026lt;#store\u0026gt; . \u0026lt;#store\u0026gt; rdf:type sdb:Store ; sdb:layout \u0026quot;layout2\u0026quot; ; sdb:connection \u0026lt;#conn\u0026gt; ; . \u0026lt;#conn\u0026gt; rdf:type sdb:SDBConnection ; .... The database installation does not need to accept public requests, it needs only to be accessible to the Fuseki server itself.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/fuseki_integration.html","tags":null,"title":"SDB Fuseki Integration"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nA suitable database must be installed separately. Any database installation should be tuned according to the database documentation.\nThe SDB distribution is a zip file of a directory hierarchy.\nUnzip this. You may need to run chmod u+x on the scripts in the bin/ directory.\nWrite a sdb.ttl store description: there are examples in the Store/ directory, and a minimal example is sketched below.\nA database must be created before the tests can be run. 
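A minimal sdb.ttl, along the lines of the examples in the Store/ directory, might look like the following sketch. It assumes an embedded Derby database; the database name DB/SDB is illustrative, and sdb:sdbType, sdb:sdbName and sdb:driver should be adjusted for your own database.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdb: <http://jena.hpl.hp.com/2007/sdb#> .

_:conn rdf:type sdb:SDBConnection ;
    sdb:sdbType "derby" ;
    sdb:sdbName "DB/SDB" ;                              # illustrative database location
    sdb:driver "org.apache.derby.jdbc.EmbeddedDriver" .

[] rdf:type sdb:Store ;
    sdb:layout "layout2" ;
    sdb:connection _:conn .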
Microsoft SQL server and PostgreSQL need specific database options set when a database is created.\nTo use in a Java application, put all the jar files in lib/ on the build and classpath of your application. See the Java API.\nTo use command line scripts, see the scripts page including setting environment variables SDBROOT, SDB_USER, SDB_PASSWORD and SDB_JDBC.\nbin/sdbconfig --sdb=sdb.ttl --create ","permalink":"https://jena.apache.org/documentation/archive/sdb/installation.html","tags":null,"title":"SDB Installation"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nThis page describes how to use SDB from Java.\nCode examples are in src-examples/ in the SDB distribution.\nConcepts Store SDBFactory SDBConnection SDB loads and queries data based on the unit of a Store. The Store object has all the information for formatting, loading and accessing an SDB database. One database or table space is one Store. Store objects are made via the static method of the StoreFactory class.\nSDBConnection wraps the underlying database connection, as well as providing logging operations.\nStoreDesc A store description is the low level mechanism for describing stores to be created.\nDatasetStore GraphSDB Two further class are not immediately visible because they are managed by the SDBFactory which creates the necessary classes, such as Jena models and graphs.\nAn object of class DatasetStore represents an RDF dataset backed by an SDB store. Objects of this class trigger SPARQL queries being sent to SDB.\nThe class GraphSDB provides the adapter between the standard Jena Java API and an SDB store, either to the default graph or one of the named graphs. The SDBFactory can also create Jena Models backed by such a graph.\nObtaining the Store A store is build from a description. This can be a description in file as a Jena assembler or the application can build the store description programmatically.\nFrom a configuration file The stored description is the only point where the specific details of store are given. This includes connection information, the database name, and database type. It makes sense to place this outside the code. That way, the application can be switched between different databases (e.g. testing and production) by changing a configuration file, and not the code, which would require recompilation and a rebuild.\nTo create a Store from a store assembler\nStore store = SDBFactory.connectStore(\u0026quot;sdb.ttl\u0026quot;) ; The assembler file has two parts, the connection details and the store type.\n@prefix rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; . @prefix sdb: \u0026lt;http://jena.hpl.hp.com/2007/sdb#\u0026gt; . _:c rdf:type sdb:SDBConnection ; sdb:sdbType \u0026quot;derby\u0026quot; ; sdb:sdbName \u0026quot;DB/SDB2\u0026quot; ; sdb:driver \u0026quot;org.apache.derby.jdbc.EmbeddedDriver\u0026quot; ; . [] rdf:type sdb:Store ; sdb:layout \u0026quot;layout2\u0026quot; ; sdb:connection _:c ; . See the full details of store description files for the options.\nIn Java code The less flexible way to create a store description is to build it in Java. 
For example:\nStoreDesc storeDesc = new StoreDesc(LayoutType.LayoutTripleNodesHash, DatabaseType.Derby) ; JDBC.loadDriverDerby() ; String jdbcURL = \u0026quot;jdbc:derby:DB/SDB2\u0026quot;; SDBConnection conn = new SDBConnection(jdbcURL, null, null) ; Store store = SDBFactory.connectStore(conn, storeDesc) ; Database User and Password The user and password for the database can be set explicitly in the description file but it is usually better to use an environment variable or Java system property because this avoids writing the user and password in a file.\nEnvironment variable: SDB_USER Java property: jena.db.user\nEnvironment variable: SDB_PASSWORD Java property: jena.db.password\nConnection Management Each store has a JDBC connection associated with it.\nIn situations where such connections are managed externally, the store object can be created and used within a single operation.\nA Store is lightweight and does not perform any database actions when created, so creating and releasing stores will not impact performance. Closing a store does not close the JDBC connection.\nSimilarly, an SDBConnection is lightweight and creation does not result in any database or JDBC connection actions.\nThe store description can be read from the same file because any SDB connection information is ignored when reading to get just the store description. The store description can be kept across store creations:\nstoreDesc = StoreDesc.read(\u0026quot;sdb.ttl\u0026quot;) ; then used with a JDBC connection object passed from the connection container:\npublic static void query(String queryString, StoreDesc storeDesc, Connection jdbcConnection) { Query query = QueryFactory.create(queryString) ; SDBConnection conn = SDBFactory.createConnection(jdbcConnection) ; Store store = SDBFactory.connectStore(conn, storeDesc) ; Dataset ds = SDBFactory.connectDataset(store) ; try(QueryExecution qe = QueryExecutionFactory.create(query, ds)) { ResultSet rs = qe.execSelect() ; ResultSetFormatter.out(rs) ; } store.close() ; } Formatting or Emptying the Store SDB stores do not ensure that the database is formatted. You can check whether the store is already formatted using:\nStoreUtils.isFormatted(store); This is an expensive operation, and should be used sparingly.\nOnce you obtain a store for the first time you will need to:\nstore.getTableFormatter().create(); This will create the necessary tables and indexes required for a full SDB store.\nYou may empty the store completely using:\nstore.getTableFormatter().truncate(); Loading data Data loading uses the standard Jena Model.read operations. GraphSDB, and models made from a GraphSDB, implement the standard Jena bulk data interface, backed by an SDB implementation of that interface.\n
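As a minimal sketch (the file name data.ttl is illustrative), a file can be loaded into the default graph of a store through a model connected to that graph:

Model model = SDBFactory.connectDefaultModel(store) ;
model.read("file:data.ttl", "TURTLE") ;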
Executing Queries The interface to making queries with SDB is the same as that for querying with ARQ. SDB is an ARQ query engine that can handle queries made on an RDF dataset which is of the SDB class DatasetStore:\nDataset ds = DatasetStore.create(store) ; This is then used as normal with ARQ:\nDataset ds = DatasetStore.create(store) ; try(QueryExecution qe = QueryExecutionFactory.create(query, ds)) { ResultSet rs = qe.execSelect() ; ResultSetFormatter.out(rs) ; } When finished, the store should be closed to release any resources associated with the particular implementation. Closing a store does not close its JDBC connection.\nstore.close() ; Closing the SDBConnection does close the JDBC connection:\nstore.getConnection().close() ; store.close() ; If models or graphs backed by SDB are placed in a general Dataset then the query is not efficiently executed by SDB.\nUsing the Jena Model API with SDB A Jena model can be connected to one graph in the store and used with all the Jena API operations.\nHere, the graph for the model is the default graph:\nStore store = SDBFactory.connectStore(\u0026quot;sdb.ttl\u0026quot;) ; Model model = SDBFactory.connectDefaultModel(store) ; StmtIterator sIter = model.listStatements() ; for ( ; sIter.hasNext() ; ) { Statement stmt = sIter.nextStatement() ; System.out.println(stmt) ; } sIter.close() ; store.close() ; SDB is optimized for SPARQL queries but queries and other Jena API operations can be mixed. The results from a SPARQL query are Jena RDFNodes, with the associated model having a graph implemented by SDB.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/javaapi.html","tags":null,"title":"SDB JavaAPI"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nThere are three ways to load data into SDB:\nUse the command utility sdbload Use one of the Jena model.read operations Use the Jena model.add The last one of these requires the application to signal the beginning and end of batches.\nLoading with Model.read A Jena Model obtained from SDB via:\nSDBFactory.connectModel(store) will automatically bulk load data for each call of one of the Model.read operations.\nLoading with Model.add The Model.add operations, in any form or combination of forms, whether loading a single statement, a list of statements, or another model, will invoke the bulk loader if the start of a bulk operation has previously been notified.\nYou can also explicitly delimit bulk operations:\nmodel.notifyEvent(GraphEvents.startRead) ... do add/remove operations ... model.notifyEvent(GraphEvents.finishRead) Failing to notify the end of the operations will result in data loss.\nA try/finally block can ensure that the finish is notified.\nmodel.notifyEvent(GraphEvents.startRead) ; try { ... do add/remove operations ... } finally { model.notifyEvent(GraphEvents.finishRead) ; } The model.read operations do this automatically.\nThe bulk loader will automatically chunk large sequences of additions to sizes appropriate to the underlying database. The bulk loader is threaded and double-buffered; loading to the database happens in parallel to the application thread and any RDF parsing.\nHow the loader works Loading consists of two phases: in the Java VM, and on the database itself. The SDB loader takes incoming triples and breaks them down into components ready for the database. These prepared triples are added to a queue for the database phase, which (by default) takes place on a separate thread. When the number of triples reaches a limit (default 20,000), or the end of the update is signalled, the triples are passed to the database.\nYou can configure whether to use threading and the \u0026lsquo;chunk size\u0026rsquo; \u0026ndash; the number of triples per load event \u0026ndash; via StoreLoader.\nStore store; // SDB Store ... 
store.getLoader().setChunkSize(5000); // store.getLoader().setUseThreading(false); // Don't thread You should set these before the loader has been used.\nEach loader sets up two temporary tables (NNode and NTrip) that mirror Nodes and Triples tables. These tables are virtually identical, except that a) they are not indexed and b) for the index variant there is no index column for nodes.\nWhen loading prepared triples \u0026ndash; triples that have been broken down ready for the database \u0026ndash; are passed to the loader core (normally running on a different thread). When the chunk size is reached, or we are out of triples, the following happens:\nPrepared nodes are added in one go to NNode. Duplicate nodes within a chunk are suppressed on the java side (this is worth doing since they are quite common, e.g. properties). Prepared triples are added in one go to NTrip. New nodes are added to the node table (duplicate suppression is explained below). New triples are added to the triple table (once again suppressing dupes). For the index case this involves joining on the node table to do a hash to index lookup. We commit. If anything goes wrong the transaction (the chunk) is rolled back, and an exception is thrown (or readied for throwing on the calling thread). Thus there are five calls to the database for every chunk. The database handles almost all of the work uninterrupted (duplicate suppression, hash to index lookup), which makes loading reasonably quick.\nDuplicate Suppression MySQL has a very useful INSERT IGNORE, which will keep going, skipping an offending row if a uniqueness constraint is violated. For other databases we need something else.\nHaving tried a number of options the best seems to be to INSERT new items by LEFT JOIN new items to existing items, then filtering WHERE (existing item feature) IS NULL. Specifically, for the triple hash case (where no id lookups are needed):\nINSERT INTO Triples SELECT DISTINCT NTrip.s, NTrip.p, NTrip.o -- DISTINCT because new triples may contain duplicates (not so for nodes) NTrip LEFT JOIN Triples ON (NTrip.s=Triples.s AND NTrip.p=Triples.p AND NTrip.o=Triples.o) WHERE Triples.s IS NULL OR Triples.p IS NULL OR Triples.o IS NULL ","permalink":"https://jena.apache.org/documentation/archive/sdb/loading_data.html","tags":null,"title":"SDB Loading data"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nIntroduction The Databases and Hardware Hardware Windows setup Linux setup The Dataset and Queries LUBM dbpedia Loading Results Uniprot 700m loading: Tuning Helps Introduction Performance reporting is an area prone to misinterpretation, and such reports should be liberally decorated with disclaimers. In our case there are an alarming number of variables: the hardware, the operating system, the database engine and its myriad parameters, the data itself, the queries, and planetary alignment.\nGiven this here is some basic information. You may find it sufficient:\nLoading speed will be in the thousands of triples per second range. Expect to load around 5 million triples per hour. Index layout is usually better than hash for loading speed. Hash loading is very bad on MySQL. Hash layout is better for query speed. We suggest that you don\u0026rsquo;t choose your database based on these figures. 
The performance is broadly similar, so if you already have a relational database installed this is your best option.\nThe Databases and Hardware SDB supports a range of databases, but the figures here are limited to SQLServer and Postgresql. The hardware used was identical, although running linux (for Postgresql) and windows (for SQLServer).\nHardware Dual AMD Opteron processors, 64 bit, 1.8 GHz. 8 GB memory. 80 GB disk for database. Windows setup Windows server 2003 Java 6 64 bit SQLServer 2005 Linux setup Redhat Enterprise Linux 4 Java 6 64 bit Postgresql 8.2 The Dataset and Queries We use the Lehigh University Benchmark http://swat.cse.lehigh.edu/projects/lubm/ and dbpedia http://dbpedia.org/, together with some example queries that each provides. You can find the queries in SDB/PerfTests.\nLUBM LUBM generates artificial datasets. To be useful one needs to apply reasoning, and this was done in advance of loading. The queries are quite stressful for SDB in that they are not very ground (in many neither subjects nor objects are present), and many produce very large result sets. Thus they are probably atypical of many SPARQL queries.\nSize: 19 million triples (including inferred triples). dbpedia The dbpedia queries are, unlike LUBM, quite ground. dbpedia contains many large literals, in contrast to LUBM.\nSize: 25 million triples. Loading All operations were performed using SDB\u0026rsquo;s command line tools. The data was loaded into a freshly formatted SDB store \u0026ndash; although postgresql needs an ANALYSE to avoid silly planning \u0026ndash; then the additional indexes were added.\nResults Benchmark Database loading Speed (tps) Index time (s) Size (MB) LUBM Postgres (Hash) 4972 199 5124 LUBM Postgres (Index) 8658 176 3666 LUBM SQLServer (Hash) 8762 121 3200 LUBM SQLServer (Index) 7419 68 2029 DBpedia Postgres (Hash) 3029 298 10193 DBpedia Postgres (Index) 4293 227 6251 DBpedia SQLServer (Hash) 5345 162 6349 DBpedia SQLServer (Index) 4749 110 4930 Uniprot 700m loading: Tuning Helps To illustrate the variability in loading speed, and emphasise the importance of tuning, consider the case of Uniprot http://dev.isb-sib.ch/projects/uniprot-rdf/. Uniprot contains (at the time of writing) around 700 million triples. We loaded these on to the SQLServer setup given above, but with the following changes:\nThe database was stored on a separate disk. The database\u0026rsquo;s transactional logs were stored on yet another disk. So the rdf data, database data, and log data were all on distinct disks.\nLoading into an index-layout store proceeded at:\n11079 triples per second ","permalink":"https://jena.apache.org/documentation/archive/sdb/loading_performance.html","tags":null,"title":"SDB Loading performance"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nThis page compares the effect of SDB with RDB, Jena\u0026rsquo;s usual database layout. RDB was designed for supporting the fine-grained API calls as well as having some support for basic graph patterns. Therefore, the RDB design goals were not those of SDB.\nRDB uses a denormalised database layout in order that all statement-level operations do not require additional joins. The SDB layout is normalised so that the triple table is narrower and uses integers for RDF nodes, then does do joins to get the node representation. 
This optimizes for longer patterns, not API operations.\nThese figures were taken in July 2007.\nAs with any performance figures, these should be taken merely as a guide. The shape of the data, the hardware details, choice of database, and its configuration (particularly amount of memory used), as well as the queries themselves all greatly contribute to the execution costs.\nSetup Database and hardware setup was the same as for the load performance tests.\nData was generated with the LUBM test generator (with N = 15), then the inferences were expanded on loading to give about 19.5 million triples. This data is larger than the database could completely cache.\nThe queries are taken from the LUBM suite and rewritten in SPARQL.\nLUBM Query 1 PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX ub: \u0026lt;http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#\u0026gt; SELECT * WHERE { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse \u0026lt;http://www.Department0.University0.edu/GraduateCourse0\u0026gt; . } Jena: 24.16s SDB/index: 0.014s\nSDB/hash: 0.04s\nLUBM Query 2 PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX ub: \u0026lt;http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#\u0026gt; SELECT * WHERE { ?x rdf:type ub:GraduateStudent . ?y rdf:type ub:University . ?z rdf:type ub:Department . ?x ub:memberOf ?z . ?z ub:subOrganizationOf ?y . ?x ub:undergraduateDegreeFrom ?y . } This query searches for a particular pattern in the data without a specific starting point.\nJena: 232.1s (153s with an additional index on OP) SDB/index: 12.7s SDB/hash: 3.7s\nNotes: Removing the rdf:type statements actually slows the query down.\nSummary In SPARQL queries, there is often a sufficiently complex graph pattern that the SDB design tradeoff provides significant advantages in query performance.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/query_performance.html","tags":null,"title":"SDB Query performance"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nSDB provides some command line tools to work with SDB triple stores. In the following it is assumed that you have a store description set up for your database (sdb.ttl). See the store description format for details and the Store/ directory for some examples.\nSetting up your environment $ export SDBROOT=/path/to/sdb $ export PATH=$SDBROOT/bin:$PATH $ export SDB_USER=YourDatabaseUserName $ export SDB_PASSWORD=YourDatabasePassword $ export SDB_JDBC=YourJDBCdriver Initialising the database Be aware that this will wipe existing data from the database.\n$ sdbconfig --sdb sdb.ttl --format This creates a basic layout. It does not add all indexes to the triple table, which may be left until after loading.\nLoading data $ sdbload --sdb sdb.ttl file.rdf You might want to add the --verbose flag to show the load as it progresses.\nAdding indexes You need to do this at some point if you want your queries to execute in a reasonable time.\n$ sdbconfig --sdb sdb.ttl --index Query $ sdbquery --sdb sdb.ttl 'SELECT * WHERE { ?s a ?p }' $ sdbquery --sdb sdb.ttl --file query.rq ","permalink":"https://jena.apache.org/documentation/archive/sdb/quickstart.html","tags":null,"title":"SDB Quickstart"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. 
The last release of Jena with this module was Apache Jena 3.17.0.\nUse of an SDB store requires a Store object which is described in two parts:\na connection to the database a description of the store configuration These can be built from a Jena assembler description.\nStore objects themselves are lightweight so connections to an SDB database can be created on a per-request basis as required for use in J2EE application servers.\nStore Descriptions A store description identifies which storage layout is being used, the connection to use and the database type.\n[] rdf:type sdb:Store ; sdb:layout \u0026quot;layout2\u0026quot; ; sdb:connection \u0026lt;#conn\u0026gt; . \u0026lt;#conn\u0026gt; ... SDB Connections SDB connections, objects of class SDBConnection, abstract away from the details of the connection and also provide consist logging and transaction operations. Currently, SDB connections encapsulate JDBC connections but other connection technologies, such as direct database APIs, can be added.\nExample The sdbType is needed for both a connection and for a store description. It can be given in either part of the complete store description. If it is specified in both places, it must be the same.\n@prefix rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; . @prefix sdb: \u0026lt;http://jena.hpl.hp.com/2007/sdb#\u0026gt; . \u0026lt;#myStore\u0026gt; rdf:type sdb:Store ; sdb:layout \u0026quot;layout2\u0026quot; ; sdb:connection \u0026lt;#conn\u0026gt; ; . \u0026lt;#conn\u0026gt; rdf:type sdb:SDBConnection ; sdb:sdbType \u0026quot;derby\u0026quot; ; sdb:sdbName \u0026quot;DB/SDB2\u0026quot; ; sdb:driver \u0026quot;org.apache.derby.jdbc.EmbeddedDriver\u0026quot; ; . Examples of assembler files are to be found in the Store/ directory in the distribution.\nVocabulary Store The value of sdbType needed for the connection also applies to choosing the store type.\nlayout ~ Layout type (e.g. \u0026ldquo;layout2\u0026rdquo;, \u0026ldquo;layout2/hash\u0026rdquo; or \u0026ldquo;layout2/index\u0026rdquo;).\nconnection ~ The object of this triple is the subject of the connection description.\nengine ~ Set the MySQL engine type (MySQL only).\nConnection sdbType The type of the database (e.g. \u0026ldquo;oracle\u0026rdquo;, \u0026ldquo;MSSQLServerExpress\u0026rdquo;, \u0026ldquo;postgresql\u0026rdquo;, \u0026ldquo;mysql\u0026rdquo;). Controls both creating the JDBC URL, if not given explicitly, and the store type. sdbName Name used by the database service to select a database. Oracle SID. sdbHost Host name for the database server. Include :port to change the port from the default for the database. sdbUser sdbPassword Database user name and password. The environment variables SDB_USER and SDB_PASSWORD are a better way to pass in the user and password because they are not then written into store description files. In Java programs, the system properties jena.db.user and jena.db.password can be used. driver The JDBC driver class name. Normally, the system looks up the sdbType to find the driver. Setting this property overrides that choice. jdbcURL If necessary, the JDBC URL can be set explicitly, not constructed by SDB. The sdbType is still needed. 
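\nFor example, such a description can be used directly from Java code. The following is a minimal sketch (assuming the assembler file above is saved as sdb.ttl and a suitable JDBC driver is on the classpath); it connects to the store, attaches a model to the default graph, prints the number of statements, and then closes the JDBC connection and the store:\nStore store = SDBFactory.connectStore(\u0026quot;sdb.ttl\u0026quot;) ; Model model = SDBFactory.connectDefaultModel(store) ; System.out.println(model.size()) ; store.getConnection().close() ; store.close() ;\n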
","permalink":"https://jena.apache.org/documentation/archive/sdb/store_description.html","tags":null,"title":"SDB Store Description"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0.\nThis page describes the command line programs that can be used to create an SDB store, load data into it and to issue queries.\nScripts The directory bin/ contains shell scripts to run the commands from the command line. The scripts are bash scripts which also run over Cygwin.\nScript set up Set the environment variable SDBROOT to the root of the SDB installation.\nA store description can include naming the class for the JDBC driver. Getting a Store object from a store description will automatically load the JDBC driver from the classpath.\nWhen running scripts, set the environment variable SDB_JDBC to one or more jar files for JDBC drivers. If it is more than one jar file, use the classpath syntax for your system. You can also use the system property jdbc.drivers.\nSet the environment variables SDB_USER and SDB_PASSWORD to the database user name and password for JDBC.\n$ export SDBROOT=\u0026quot;/path/to/sdb $ export SDB_USER=\u0026quot;YourDbUserName\u0026quot; $ export SDB_PASSWORD=\u0026quot;YourDbPassword\u0026quot; $ export SDB_JDBC=\u0026quot;/path/to/driver.jar\u0026quot; They are bash scripts, and work on Linux and Cygwin for MS Windows.\n$ export PATH=$SDBROOT/bin:$PATH Alternatively, there are wrapper scripts in $SDBROOT/bin2 which can be placed in a convenient directory that is already on the shell command path.\nArgument Structure All commands take a SDB store description to extract the connection and configuration information they need. This is written SPEC in the command descriptions below but it can be composed of several arguments as described here.\nEach command then has command-specific arguments described below.\nAll commands support --help to give details of named and positional arguments.\nThere are two equivalent forms of named argument syntax:\n--arg=val --arg val Store Description If this is not specified, commands load the description file sdb.ttl from the current directory.\n--sdb=\u0026lt;sdb.ttl\u0026gt; This store description is a Jena assembler file. The description consists of two parts; a store description and a connection description.\nOften, this is all that is needed to describe which store to use. The individual components of a connection or configuration can be overridden after the description have been read, before it is processed.\nThe directory Store/ has example assembler files.\nThe full details of the assembler file is given in \u0026lsquo;SDB/Store Description\u0026rsquo;\nModifying the Store Description The individual items of a store description can be overridden by various command arguments. The description in the assembler file is read, then any command line arguments used to modify the description, then the appropriate object is created from the modified description.\nSet the layout type:\n--layout : layout name Currently, one of layout1, layout2, layout2/index, layout2/hash.\nSet JDBC details:\n--dbName : Database Name --dbHost : Host machine name --dbType : Database type. --dbUser : Database use --dbPassword : Database password. 
The host name can host or host:port.\nThe better way to handle passwords is to use environment variables SDB_USER and SDB_PASSWORD because then the user/password is not stored in a visible way.\nLogging and Monitoring All commands take the following arguments (although they may do nothing if they make no sense to the command).\n-v Be verbose.\n--time Print timing information. Treat with care - while the timer avoids recording JVM and some class loading time, it can\u0026rsquo;t avoid all class loading. Hence, the values of timing are more meaningful on longer operations. JDBC operation times to a remote server can also be a significant proportion in short operations.\n--log=[all|none|queries|statements|exceptions] to log SQL actions on the database connection (but not the prepared statements used by the loader). Can be repeated on the command line.\nSDB Commands Database creation sdbconfig SPEC [--create|--format|--indexes|--dropIndexes] Setup a database.\nOption Description --create formats the store and sets up indexes --format just formats the store and creates indexes for loading, not querying. --indexes Create indexes for querying --dropIndexes Drop indexes for querying. Loading large graphs can be faster by formatting, loading the data, then building the query indexes with this command.\nsdbtruncate SPEC Truncate the store. Non-transactional. Destroys data.\nLoading data sdbload SPEC FILE [FILE ...] Load RDF data into a store using the SDB bulk loader. Data is streamed into the database and is not loaded as a single transaction.\nThe file\u0026rsquo;s extension is used to determine the data syntax.\nTo load into a named graph:\nsdbload SPEC --graph=URI FILE [FILE ...] Query sdbquery SPEC --query=FILE Execute a query.\nsdbprint SPEC --print=X [--sql] --query=FILE Print details of a query. X is any of query, op, sqlNode, sql or plan. --print=X can be repeated. \u0026ndash;sql is short for \u0026ndash;print=sql. The default is --print=sql.\nTesting sdbtest SPEC MANIFEST Execute a test manifest file. The manifest of all query tests, which will test connection and loading of data, is in \u0026lt;em\u0026gt;SDBROOT\u0026lt;/em\u0026gt;/testing/manifest-sdb.ttl.\nOther sdbdump SPEC --out=SYNTAX Dump the contents of a store N-TRIPLES or a given serialization format (usual Jena syntax names, e.g. Turtle or TTL).\nOnly suitable for data sizes that fit in memory. All output syntaxes that do some form of pretty printing will need additional space for their internal datastructures.\nsdbsql SPEC [ --file=FILE | SQL string ] Execute a SQL command on the store, using the connection details from the store specification. The SQL command either comes from file FILE or the command line as a string.\nsdbinfo SPEC Details of a store.\nsdbmeta SPEC --out=SYNTAX Do things with the meta graphs of a store.\nsdbscript SPEC FILE Execute a script. Currently only JRuby is supported.\nsdbtuple SPEC [--create|--print|--drop|--truncate] tableName Many of the tables used within SDB are tuples of RDF nodes. This command allows low-level access to these tuple tables. Misuse of this command can corrupt the store.\n","permalink":"https://jena.apache.org/documentation/archive/sdb/commands.html","tags":null,"title":"SDB/Commands"},{"categories":null,"contents":"Fuseki2 webapp provides security by using Apache Shiro. This is controlled by the configuration file shiro.ini located at $FUSEKI_BASE/shiro.ini. If not found, the server initializes with a default configuration. This can then be replaced or edited as required. 
An existing file is never overwritten by the server.\nIn its default configuration, SPARQL endpoints are open to the public but administrative functions are limited to localhost. One can access it via http://localhost:.../.... Or the according IPv4 or IPv6 address, for example 127.0.0.1 (IPv4), or [::1] (IPv6). Access from an external machine is not considered as localhost and thus restricted.\nOnce Shiro has been configured to perform user authentication it provides a good foundation on which the Jena Permissions layer can be configured. There is an example implementation documented in the Jena Permissions section. The Jena Permissions layer can be used to restrict access to specific graphs or triples within graphs.\nA simple example to enable basic user/password authentication is shown in the default shiro.ini configuration. The default admin user is admin and the password is pw. This can be changed directly in the INI file. Note that this setup is not recommended for production for various reasons (no TLS, passwords in plain text etc.), consult the Shiro INI documentation for best practices.\nAs mentioned above, the default setup only restricts access to the admin pages of Fuseki. To avoid clashes with dataset names, the namespace of the admin interface starts with \u0026lsquo;/$/\u0026rsquo;, consult the Fuseki HTTP Administration Protocol documentation for more details.\nIf access to SPARQL endpoints should be restricted, additional Shiro ACLs are necessary. This is done in the [urls] section of the configuration. As an example, restricting access to the ../query SPARQL endpoint for all datasets on Fuseki could be done with this wildcard pattern:\n/**/query = authcBasic,user[admin]\nAnonymous SPARQL queries would no longer be possible in this example.\nAgain, please consult the Apache Shiro website for details and more sophisticated setups. The default configuration of Fuseki is kept simple but is not recommended for setups where sensitive data is provided.\nChanging the security setup requires a server restart.\nContributions of more examples are very welcome.\nExamples The shipped shiro.ini has additional comments.\nThe default configuration. This is a minimal configuration for the default configuration.\n[main] localhost=org.apache.jena.fuseki.authz.LocalhostFilter [urls] ## Control functions open to anyone /$/server = anon /$/ping = anon ## and the rest are restricted to localhost. ## See above for 'localhost' /$/** = localhost /**=anon Simple user/password This extract shows the simple user/password setup.\nIt adds a [users] section and changes the /$/** line in [urls]\n[users] admin=pw [urls] ## Control functions open to anyone /$/status = anon /$/ping = anon /$/** = authcBasic,user[admin] # Everything else /**=anon ","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-security.html","tags":null,"title":"Security in Fuseki2"},{"categories":null,"contents":"This page covers security for Fuseki Main.\nSee other documentation for the webapp packaging of Fuseki.\nServing RDF For any use of users-password information, and especially HTTP basic authentication, information is visible in the HTTP headers. When serving RDF and SPARQL requests, using HTTPS is necessary to avoid snooping. 
Digest authentication is also stronger over HTTPS because it protects against man-in-the-middle attacks.\n","permalink":"https://jena.apache.org/documentation/fuseki2/fuseki-main-security.html","tags":null,"title":"Security in Fuseki2 server"},{"categories":null,"contents":"The service enhancer (SE) plugin extends the functionality of the SERVICE clause with:\nBulk requests Correlated joins also known as lateral joins A streaming cache for SERVICE requests results which can also cope with bulk requests and correlated joins. Furthermore, queries that only differ in limit and offset will result in cache hits for overlapping ranges. At present, the plugin only ships with an in-memory cache provider. As a fundamental principle, a request making use of cache and bulk should return the exact same result as if those settings were omitted. As a consequence, runtime result set size recognition (RRR) is employed to reveal hidden result set limits. This is used to ensure that always only the appropriate amount of data is returned from the caches.\nA correlated join using this plugin is syntactically expressed with SERVICE \u0026lt;loop:\u0026gt; {}. It is a binary operation on two graph patterns: The operation \u0026ldquo;loops\u0026rdquo; over every binding obtained from evaluation of the left-hand-side (lhs) and uses it as an input to substitute the variables of the right-hand-side (rhs). Afterwards, the substituted rhs is evaluated to sequence of bindings. Each rhs binding is subsequently merged with lhs\u0026rsquo; input binding to produce a solution binding of the join.\nExample The following query demonstrates the features of the service enhancer. It executes as a single remote request to Wikidata:\nPREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX wd: \u0026lt;http://www.wikidata.org/entity/\u0026gt; SELECT ?s ?l { # The ids below correspond in order to: Apache Jena, Semantic Web, RDF, SPARQL, Andy Seaborne VALUES ?s { wd:Q1686799 wd:Q54837 wd:Q54872 wd:Q54871 wd:Q108379795 } SERVICE \u0026lt;cache:loop:bulk+5:https://query.wikidata.org/sparql\u0026gt; { SELECT ?l { ?s rdfs:label ?l FILTER(langMatches(lang(?l), \u0026#39;en\u0026#39;)) } ORDER BY ?l LIMIT 1 } } Click here to view the rewritten query SELECT * WHERE { { { { SELECT * WHERE { { SELECT ?l WHERE { \u0026lt;http://www.wikidata.org/entity/Q1686799\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; ?l FILTER langMatches(lang(?l), \u0026#34;en\u0026#34;) } } BIND(0 AS ?__idx__) } LIMIT 1 } } UNION { { { SELECT * WHERE { { SELECT ?l WHERE { \u0026lt;http://www.wikidata.org/entity/Q54837\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; ?l FILTER langMatches(lang(?l), \u0026#34;en\u0026#34;) } } BIND(1 AS ?__idx__) } LIMIT 1 } } UNION { { { SELECT * WHERE { { SELECT ?l WHERE { \u0026lt;http://www.wikidata.org/entity/Q54872\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; ?l FILTER langMatches(lang(?l), \u0026#34;en\u0026#34;) } } BIND(2 AS ?__idx__) } LIMIT 1 } } UNION { { { SELECT * WHERE { { SELECT ?l WHERE { \u0026lt;http://www.wikidata.org/entity/Q54871\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; ?l FILTER langMatches(lang(?l), \u0026#34;en\u0026#34;) } } BIND(3 AS ?__idx__) } LIMIT 1 } } UNION { { SELECT * WHERE { { SELECT ?l WHERE { \u0026lt;http://www.wikidata.org/entity/Q108379795\u0026gt; \u0026lt;http://www.w3.org/2000/01/rdf-schema#label\u0026gt; ?l FILTER langMatches(lang(?l), \u0026#34;en\u0026#34;) } } BIND(4 AS 
?__idx__) } LIMIT 1 } } } } } } UNION # This union member adds an end marker # Its absence in responses is # used to detect result set size limits { BIND(1000000000 AS ?__idx__) } } ORDER BY ASC(?__idx__) ?l Note that in the query above ?s has been substituted based on the respective input bindings (in this case the Wikidata IRIs). For every bulk query execution, the SE plugin assigns an increasing ID to every input binding (starting from 0). This ID is included in the service request via the ?__idx__ variable. (If the variable is already used then an unused name is allocated by appending a number such as ?__idx__1). Every obtained binding\u0026rsquo;s ?__idx__ value determines the input binding that has to be merged with in order to produce the final binding. A special value for ?__idx__ is the end marker. It is a number higher than any input binding ID and it is used to detect result set size limits: It\u0026rsquo;s absence in a result set means that it was cut off. This information is used to ensure that a request using a certain service IRI does not yield more results than limit.\nNote, that a repeated execution of a query (possibly with different limits/offsets) will serve the data from cache rather than making another remote request. The cache operates on a per-input-binding basis: For instance, in the example above it means that when removing bindings from the VALUES block data will still be served from the cache. Conversely, adding additional bindings to the VALUES block will only send a (bulk) remote request for those that lack cache entries.\nSERVICE loop: vs LATERAL Since Jena 4.7.0 the SPARQL engine has native support for the LATERAL keyword which should almost always be preferred over SERVICE \u0026lt;loop:\u0026gt;. The use of SERVICE \u0026lt;loop:\u0026gt; is essentially only justified in combination with bulk requests, such as SERVICE \u0026lt;loop:bulk+5:\u0026gt;.\nAlso note, that the semantics of loop: and LATERAL differ: the former substitutes variables regardless of scope, whereas the latter substitutes only in-scope variables. Another difference is, that loop: creates a new execution context for each request (even for local ones) such that the NOW() function will yield increasing timestamps as query execution progresses.\nCurrently, the SE plugin does not support bulk requests under LATERAL semantics.\nNamespace The plugin introduces the namespace http://jena.apache.org/service-enhancer# which is used for both ARQ context symbols as well as assembler configuration.\nMaven Dependency \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-serviceenhancer\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;\u0026lt;!-- Check the link below for available versions --\u0026gt;\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Available Versions.\nAdding this dependency will automatically initialize the plugin via service-loading of org.apache.jena.sparql.service.enhancer.init.ServiceEnhancerInit using Jena\u0026rsquo;s plugin system.\nProgrammatic Setup Loading the jena-serviceenhancer jar file automatically enables bulk requests and caching. Correlated joins however require explicit activation because they require specific algebra transformations to run as part of the query optimization process. 
For more details about the transformation see Programmatic Algebra Transformation.\nThe following snippet globally enables correlated joins by overriding the context\u0026rsquo;s optimizer:\nimport org.apache.jena.sparql.service.enhancer.init.ServiceEnhancerInit; ServiceEnhancerInit.wrapOptimizer(ARQ.getContext()); As usual, in order to avoid a global setup, the context of a dataset or statement execution (i.e. query / update) can be used instead:\nDatasetFactory dataset = DatasetFactory.create(); ServiceEnhancerInit.wrapOptimizer(dataset.getContext()); The lookup procedure for which optimizer to wrap first consults the given context and then the global one. If neither has an optimizer configured then Jena\u0026rsquo;s default one will be used.\nService requests that do not make use of this plugin\u0026rsquo;s options will not be affected even if the plugin is loaded. The plugin registration makes use of the custom service executor extension system.\nAssembler The se:DatasetServiceEnhancer assembler can be used to enable the SE plugin on a dataset. This procedure also automatically enables correlated joins using the dataset\u0026rsquo;s context as described in Programmatic Setup. By default, the SE assembler alters the base dataset\u0026rsquo;s context and returns the base dataset again. There is one important exception: If se:enableMgmt is true then the assembler\u0026rsquo;s final step is to create a wrapped dataset with a copy of the original dataset\u0026rsquo;s context where enableMgmt is true. This way, management functions are not available in the base dataset.\n# assembler.ttl PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX se: \u0026lt;http://jena.apache.org/service-enhancer#\u0026gt; \u0026lt;urn:example:root\u0026gt; a se:DatasetServiceEnhancer ; ja:baseDataset \u0026lt;urn:example:base\u0026gt; ; se:datasetId \u0026lt;https://my.dataset.id/\u0026gt; ; # Defaults to the value of ja:baseDataset se:cacheMaxEntryCount 300 ; # Maximum number of cache entries ; # identified by the tuple (service IRI, query, input binding) se:cacheMaxPageCount 15 ; # Maximum number of pages per cache entry se:cachePageSize 10000 ; # Number of bindings per page se:bulkMaxSize 100 ; # Maximum number of input bindings to group into a bulk request se:bulkSize 30 ; # Default bulk size when not specifying a size se:bulkMaxOutOfBandSize 30 ; # Dispatch non-full batches as soon as this number of non-fitting # input bindings have been encountered se:enableMgmt false # Enables management functions; # wraps the base dataset with an independent context . \u0026lt;urn:example:base\u0026gt; a ja:MemoryDataset . In the example above, the shown values for se:cacheMaxEntryCount, se:cacheMaxPageCount, se:cachePageSize, se:bulkMaxSize, se:bulkSize and se:bulkMaxOutOfBandSize are the defaults which are used if those options are left unspecified. They allow for caching up to 45mio bindings (300 x 15 x 10000). There is one caveat though: Specifying the cache options puts a new a cache instance in the dataset\u0026rsquo;s context. Without these options the global cache instance that is registered in the ARQ context by the SE plugin during service loading is used. 
Presently, the global instance cannot be configured via the assembler.\nCreating a dataset from the specification above is programmatically accomplished as follows:\nModel spec = RDFDataMgr.load(\u0026#34;assembler.ttl\u0026#34;); Dataset dataset = DatasetFactory.assemble(spec.getResource(\u0026#34;urn:example:root\u0026#34;)); The value of se:datasetId is used to look up caches when referring to the active dataset using SERVICE \u0026lt;urn:x-arq:self\u0026gt; {}.\nConfiguration with Fuseki Supplying Service Enhancer Dependencies Before Jena 4.10 No additional dependencies are needed.\nJena 4.10 and Jena 5 Guava needs to be supplied externally, as it is no longer part of Jena\u0026rsquo;s core.\nA Guava JAR can be downloaded manually, such as from the Maven Central repository, by picking a guava-VERSION.jar file from the published Guava versions. Typically, all newer versions should work. When in doubt, cross-check with the declaration(s) in the service enhancer\u0026rsquo;s POM file.\nThe POM file of the SE module includes a bundle profile which can be used to build a JAR bundle using Apache Maven using the commands below. Be sure to replace VERSION with a version that matches that of your Fuseki setup. See also the published Service Enhancer versions.\n# Fetch the serviceenhancer pom # and save it as ./jena-serviceenhancer.pom mvn dependency:copy -D\u0026#39;artifact=org.apache.jena:jena-serviceenhancer:VERSION:pom\u0026#39; \\ -Dmdep.stripVersion=true -D\u0026#39;outputDirectory=.\u0026#39; # Build using the \u0026#39;bundle\u0026#39; profile which creates the (shaded) JAR bundle # ./target/jena-serviceenhancer-VERSION.jar mvn -f jena-serviceenhancer.pom -Pbundle package Adding the Service Enhancer JAR This section assumes that one of the distributions of apache-jena-fuseki has been downloaded from [https://jena.apache.org/download/]. The extracted folder should contain the ./fuseki-server executable start script which automatically loads all jars (relative to $PWD) under run/extra. These folders need to be created e.g. using mkdir -p run/extra. The SE plugin can be manually built or downloaded from maven central, as described in the prior section. Placing it into the run/extra folder makes it available for use with Fuseki. The plugin and Fuseki version should match.\nFuseki Assembler Configuration The snippet below shows a simple setup of enabling the SE plugin for a given base dataset. Cache management can be performed via SPARQL extension functions. However, usually not every user should be allowed to invalidate caches as this could be exploited for service disruptions. Jena does not directly provide a security model for access privileges on functions such as known from SQL DBMSs. However, with Fuseki it is possible to create both a public and an admin endpoint over the same base dataset:\n\u0026lt;#myServicePublic\u0026gt; a fuseki:Service; fuseki:name \u0026#34;test\u0026#34;; fuseki:dataset \u0026lt;#myDsPublic\u0026gt; . \u0026lt;#myServiceAdmin\u0026gt; a fuseki:Service; fuseki:name \u0026#34;testAdmin\u0026#34;; fuseki:dataset \u0026lt;#myDsAdmin\u0026gt; . \u0026lt;#myDsPublic\u0026gt; a se:DatasetServiceEnhancer ; ja:baseDataset \u0026lt;#myDsBase\u0026gt; . \u0026lt;#myDsAdmin\u0026gt; a se:DatasetServiceEnhancer ; ja:baseDataset \u0026lt;#myDsBase\u0026gt; ; se:enableMgmt true . \u0026lt;#myDsBase\u0026gt; a ja:MemoryDataset . 
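\nWith this configuration, cache management is performed through the testAdmin service while regular queries use the public test service. For illustration, assuming the server runs on the default port 3030 and the usual /NAME/query endpoint naming, posting the following query to http://localhost:3030/testAdmin/query invalidates all unused cache entries, whereas the same query sent to the public test endpoint is not expected to succeed because se:enableMgmt is only set for the admin dataset:\nPREFIX se: \u0026lt;http://jena.apache.org/service-enhancer#\u0026gt; SELECT (se:cacheRm() AS ?count) { }\n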
For configuring access control with Fuseki please refer to Data Access Control for Fuseki.\nContext Symbols The service enhancer plugin defines several symbols for configuration via context. The context symbols are in the namespace http://jena.apache.org/service-enhancer#.\nSymbol Value type Default* Description enableMgmt boolean false This symbol must be set to true in the context in order to allow calling certain \u0026ldquo;privileged\u0026rdquo; SPARQL functions. serviceBulkBindingCount int 10 Number of bindings to group into a single bulk request serviceBulkMaxBindingCount int 100 Maximum number of input bindings to group into a single bulk request; restricts serviceBulkRequestItemCount. When using bulk+n then n will be capped to the configured value. serviceBulkMaxOutOfBandBindingCount int 30 Dispatch non-full batches as soon as this number of non-fitting bindings have been read from the input iterator datasetId String null An IRI to resolve urn:x-arq:self to. Used to discriminate cache entries for self-referenced datasets. serviceCache ServiceResponseCache null Symbol for the cache of services\u0026rsquo; result sets serviceResultSizeCache ServiceResultSizeCache null Symbol for the cache of services\u0026rsquo; result set sizes * The value that is assumed if the symbol is absent.\nThe class org.apache.jena.sparql.service.enhancer.init.ServiceEnhancerConstants defines the constants for programmatic usage. As usual, context attributes can be set on global, dataset and query execution level:\n// Global level ARQ.getContext().set(ServiceEnhancerConstants.serviceBulkBindingCount, 5); // Dataset level Dataset dataset = DatasetFactory.create(); dataset.getContext().set(ServiceEnhancerConstants.datasetId, \u0026#34;http://example.org/myDatasetId\u0026#34;); // Query Execution level try (QueryExecution qe = QueryExecutionFactory.create(dataset, \u0026#34;SELECT * { ?s ?p ?o }\u0026#34;)) { qe.getContext().set(ServiceEnhancerConstants.enableMgmt, true); // ... } Service Options The service option syntax is used to express a list of key-value pairs followed by an optional IRI. The first pair must always be terminated by a : in order to avoid misinterpreting it as a relative IRI which would be resolved against the configured base IRI. Multiple pairs are separated using :. Pairs may be followed by an IRI for the service. If it is absent, then the IRI urn:x-arq:self is implicitly assumed.\n(key[+value]:)* (key[+value][:] | IRI) The special IRI urn:x-arq:self is used to refer to the active dataset. This is the dataset the query is executed against. If service options are present that are not followed by an IRI then this IRI is assumed. Consequently, Both e.g. SERVICE \u0026lt;cache:\u0026gt; or SERVICE \u0026lt;bulk:loop\u0026gt; refer the active dataset.\nBulk Requests The bulk key enables bulk requests. The default bulk size is based on serviceBulkBindingCount. It can be overridden using e.g. SERVICE \u0026lt;bulk+20:\u0026gt; {...}. The specified number is silently capped by serviceBulkMaxBindingCount.\nExecution of a bulk request proceeds by first taking n items from the lhs to form a batch. 
Then the bulk query is generated by forming a union where the service\u0026rsquo;s graph pattern is substituted with every input binding in the batch as shown in the example.\nCorrelated Joins Informally, conventional joins in SPARQL are bottom-up such that the result of a join is obtained from evaluating the lhs and rhs of a join independently and merging all compatible bindings (and discarding the incompatible ones). Correlated joins are left-to-right such that each binding obtained from lhs\u0026rsquo;s evaluation is used to substitute the rhs prior to its evaluation. Correlated joins alter the scoping rules of variables as demonstrated by the subsequent two examples.\nThe following concepts are relevant to understand the how correlated joins are dealt with:\nScope rename SPARQL evaluation has a notion of scoping which determines whether a variable will be part of the solution bindings created from a graph pattern as defined here. Jena provides TransformScopeRename which renames variables such as their names are globally. Jena\u0026rsquo;s scope renaming prepends / characters before the original variable name so ?x may become ?/x or ?//x. TransformScopeRename is applied by the default optimizer. Substitution When evaluating the lhs of a join then the scope renaming enables that for each obtained binding all variables on the rhs can be substituted with the corresponding values of that binding. Base name The base name of a variable is it\u0026rsquo;s name without scoping. For example the variables ?x, ?/x and ?//x all have the base name x. Join key A join key of a join operation is the set of variables that is the intersection of lhs\u0026rsquo; visible variables with rhs\u0026rsquo; mentioned ones. Join binding A join binding is obtained by projecting an lhs\u0026rsquo; input binding with a join key. It is used to substitute variables on the rhs and is part of the key object used in caching. Example of Scoping in a Conventional join Consider the following example.\nSELECT ?p ?c { BIND(\u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; AS ?p) { SELECT (COUNT(*) AS ?c) { ?s ?p ?o } } } Note that the ?p on the right hand side becomes scoped as ?/p. Consequently, lhs\u0026rsquo; ?p and rhs\u0026rsquo; ?/p are considered different variables.\n(project (?p ?c) (join (extend ((?p \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt;)) (table unit)) (project (?c) (extend ((?c ?/.0)) (group () ((?/.0 (count))) (bgp (triple ?/s ?/p ?/o))))))) # ?/p is different from the ?p on the lhs Because there is no overlap in the variables on either side of the join the join key is the empty set of variables.\nExample of Scoping in a Correlated Join The two effects of the loop: transform are shown below. First, a sequence is enforced. And second, the scope of ?p is now the same on the lhs and rhs.\nSELECT ?p ?c { BIND(\u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; AS ?p) SERVICE \u0026lt;loop:\u0026gt; { SELECT (COUNT(*) AS ?c) { ?s ?p ?o } } } The obtained algebra now includes sequence instead of join and the variable ?p appears on both sides of it:\n(project (?p ?c) (sequence (extend ((?p \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt;)) (table unit)) (service \u0026lt;loop:\u0026gt; (project (?c) (extend ((?c ?/.0)) (group () ((?/.0 (count))) (bgp (triple ?/s ?p ?/o)))))))) # ?p is now the same here and on the lhs The join key is set containing ?p because this variable appears on either side of the join. 
The lhs will produce a single join binding where ?p is assigned to rdf:type.\nUpon evaluation, for each input binding of the lhs the ?p on the rhs is now substituted thus giving the count for the specific property. Note, that the cache system of this plugin caches per join binding even for bulk requests. Hence, use of SERVICE \u0026lt;loop:cache\u0026gt; {...} will produce cache hits for repeated join bindings regardless of the pattern on the lhs.\nProgrammatic Algebra Transformation In order to make loop: work the following machinery is in place:\nThe algebra transformation implemented by TransformSE_JoinStrategy needs to run bothe before and after the default algebra optimization. The reason is that is does two things:\nIt converts every OpJoin instance with a loop: on the right hand side into a OpSequence. Any mentioned variable on the rhs whose base name matches the base name of a visible variable on the lhs gets substituted by the lhs variable. String queryStr = \u0026#34;SELECT ...\u0026#34;; // Put any example query string here Transform loopTransform = new TransformSE_JoinStrategy(); Op op0 = Algebra.compile(QueryFactory.create(queryStr)); Op op1 = Transformer.transform(loopTransform, op0); Op op2 = Optimize.stdOptimizationFactory.create(ARQ.getContext()).rewrite(op1); Op op3 = Transformer.transform(loopTransform, op2); System.out.println(op3); Caching Any graph pattern contained in a SERVICE \u0026lt;cache:\u0026gt; { } block is subject to caching. The key of a cache entry is composed of three components:\nThe concrete service IRI The input binding that originates from the lhs The (algebra of) the SERVICE clause\u0026rsquo;s graph pattern (the rhs) The cache is slice-aware: If the rhs corresponds to a SPARQL query making use of LIMIT and/or OFFSET then the cache lookup will find any priorly fetched overlapping ranges and derive a backend request that only fetches the needed parts.\nThe cache service option can be used with the following values:\ncache: Read from cache when possible and write retrieved data to cache cache+default: Same as cache. cache+clear: Clears all cache entries for the current batch of input bindings. cache+off: Disables use the cache in the query execution PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; SELECT * { BIND(rdf:type AS ?p) SERVICE \u0026lt;loop:cache:\u0026gt; { SELECT * { ?s ?p ?o } OFSET 10 LIMIT 10 # ^ Altering limit/offset will match overlapping ranges of data in the cache } } Note, that in pathological cases this can require a bulk request to be repeatedly re-executed with disabled caches for each input binding. For example, assume that the largest result yet set seen for a service is 1000 and the system is about to serve the 1001st binding from cache for a specific input binding. The question is whether this would exceed the service\u0026rsquo;s so far unknown result set size limit. Therefore, in order to answer that question a remote request that bypasses the cache is performed. Furthermore, let\u0026rsquo;s assume that another request produced 2000 results. Then the problem repeats once another input binding\u0026rsquo;s 2001st result were about to be served.\nSPARQL Functions The service enhancer plugin introduces functions and property functions for listing cache content and removing cache entries. The namespace is\nPREFIX se: \u0026lt;http://jena.apache.org/service-enhancer#\u0026gt; Signature Description long se:cacheRm() Invalidates all entries from the cache that are not currently in use. 
Returns the number of invalidated entries. long se:cacheRm(long) Attempts to remove the given entry. Returns 1 on success or 0 otherwise (e.g. entry did not exist or was still in use). ?id se:cacheLs ([?serviceIri [?queryStr [?inputBindingStr]]]) Property function to list cache content. PREFIX sepf: \u0026lt;java:org.apache.jena.sparql.service.enhancer.pfunction.\u0026gt; SELECT * WHERE { ?id sepf:cacheLs (?service ?query ?binding) } If e.g. data was cached using the following query, then se:cacheLs will yield the result set below.\nSELECT * { SERVICE \u0026lt;loop:\u0026gt; { { SERVICE \u0026lt;cache:\u0026gt; { SELECT (\u0026lt;urn:x-arq:DefaultGraph\u0026gt; AS ?g) ?p (COUNT(*) AS ?c) { ?s ?p ?o } GROUP BY ?p } } UNION { SERVICE \u0026lt;cache:\u0026gt; { SELECT ?g ?p (COUNT(*) AS ?c) { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g ?p } } } # FILTER(CONTAINS(STR(?g), \u0026#39;filter over ?g\u0026#39;)) # FILTER(CONTAINS(STR(?p), \u0026#39;filter over ?p\u0026#39;)) } order by DESC(?c) ?g ?p ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | id | service | query | binding | ======================================================================================================================================================================== | 2 | \u0026#34;urn:x-arq:self@dataset813601419\u0026#34; | \u0026#34;SELECT (\u0026lt;urn:x-arq:DefaultGraph\u0026gt; AS ?g) ?p (count(*) AS ?c)\\nWHERE\\n { ?s a ?o }\\nGROUP BY ?p\\n\u0026#34; | \u0026#34;( ?p = rdf:type )\u0026#34; | | 3 | \u0026#34;urn:x-arq:self@dataset813601419\u0026#34; | \u0026#34;SELECT ?g ?p (count(*) AS ?c)\\nWHERE\\n { GRAPH ?g\\n { ?s a ?o }\\n }\\nGROUP BY ?g ?p\\n\u0026#34; | \u0026#34;( ?p = rdf:type )\u0026#34; | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Example: Invaliding All Cache Entries PREFIX se: \u0026lt;http://jena.apache.org/service-enhancer#\u0026gt; SELECT (se:cacheRm() AS ?count) { } Example: Invalidating Specific Cache Entries PREFIX se: \u0026lt;http://jena.apache.org/service-enhancer#\u0026gt; SELECT SUM(se:cacheRm(?id) AS ?count) { ?id se:cacheLs (\u0026lt;http://dbpedia.org/sparql\u0026gt;) } For completeness, the functions can be addressed via their fully qualified Java class names:\n\u0026lt;java:org.apache.jena.sparql.service.enhancer.pfunction.cacheLs\u0026gt; \u0026lt;java:org.apache.jena.sparql.service.enhancer.function.cacheRm\u0026gt; Limitations, Troubleshooting and Pitfalls Storing Caches to Disk At present the plugin only ships with an in-memory implementation of the cache. Custom storage strategies can be implemented based one the interface Slice. A file-based storage system is expected to be shipped with a later version of the SE plugin.\nCaching with Virtuoso There is a bug in Virtuoso that causes queries making use of DISTINCT a with non-zero OFFSET without LIMIT to fail. 
The remainder shows how the SE plugin may unexpectedly fail due to this bug and shows a workaround.\nThe following query will cause caching of the first 10 results:\nSELECT * { SERVICE \u0026lt;cache:http://dbpedia.org/sparql\u0026gt; { SELECT DISTINCT ?s { ?s a ?o } ORDER BY ?s LIMIT 10 } } Executing the following query afterwards will fail:\nSELECT * { SERVICE \u0026lt;cache:http://dbpedia.org/sparql\u0026gt; { SELECT DISTINCT ?s { ?s a ?o } ORDER BY ?s } } The reason is that the first 10 results will be read from cache and the actual query sent as a remote request is:\nSELECT * { SERVICE \u0026lt;cache:http://dbpedia.org/sparql\u0026gt; { SELECT DISTINCT ?s { ?s a ?o } ORDER BY ?s OFFSET 10 } } Thus we end up with a query using DISTINCT with a non-zero offset and without LIMIT.\nAs a workaround, note that if the service enhancer plugin detects a result set size limit then it will inject it in remote requests. In such cases, executing the query SELECT * { SERVICE \u0026lt;http://dbpedia.org/sparql\u0026gt; { ?s ?p ?o } } once will make the result set size limit known (at the time of writing DBpedia was configured with a limit of 10000), and therefore the modified request becomes\nSELECT * { SERVICE \u0026lt;cache:http://dbpedia.org/sparql\u0026gt; { SELECT DISTINCT ?s { ?s a ?o } ORDER BY ?s OFFSET 10 LIMIT 10000 } } Order of Bindings Differs between Cache and Remote Reads In practice, many triple store engines return the same response for the same graph pattern / query over the same physical database even if ordering is absent. As can be seen from the example, bulk requests result in a union which is sorted by the serial numbers assigned to the input bindings. However, SPARQL does not mandate stable sorting, therefore this approach may cause bindings with the same serial number to become \u0026lsquo;shuffled\u0026rsquo;. The solution is to include sufficient sort conditions in the SERVICE\u0026rsquo;s graph pattern. The bulk query will include those sort conditions after the serial number sort condition.\n","permalink":"https://jena.apache.org/documentation/query/service_enhancer.html","tags":null,"title":"Service Enhancer"},{"categories":null,"contents":"SOH (SPARQL Over HTTP) is a set of command-line scripts for working with SPARQL 1.1. SOH is server-independent and will work with any compliant SPARQL 1.1 system offering HTTP access.\nSOH is written in Ruby.\nCommands:\ns-http – SPARQL 1.1 HTTP Protocol s-get, s-put, s-delete, s-post, s-head – abbreviation for s-http get ... etc. s-query – SPARQL 1.1 Query, both GET and POST of queries. s-update – SPARQL 1.1 Update s-update-form – SPARQL 1.1 Update using the HTML form and a parameter of request=. Each command supports the -v flag to print out details of the HTTP interaction.\nSOH SPARQL Query s-query --service=endpointURL 'query string' s-query --service=endpointURL --query=queryFile.rq SOH SPARQL HTTP The SPARQL Graph Store Protocol is a way to read, create and update whole graphs in an RDF Dataset. It is useful for data management and building into automated processes because it is easy to script with tools like curl or wget.\nSOH provides commands that simplify the use of HTTP further by setting HTTP headers based on the operation performed.\nThe syntax of the commands is:\ns-http VERB datasetURI graphName [file] where graph name is a URI or the word default for the default graph.\ns-get, s-put, s-delete, s-post are abbreviations for s-http get, s-http put, s-http delete and s-http post respectively.\nfile is needed for PUT and POST. 
The file name extension determines the HTTP content type.\ns-put http://localhost:3030/dataset default data.ttl s-get http://localhost:3030/dataset default s-put http://localhost:3030/dataset http://example/graph data.ttl s-get http://localhost:3030/dataset http://example/graph SOH SPARQL Update s-update --service=endpointURL 'update string' s-update --service=endpointURL --update=updateFile.ru Service endpoints SOH is a general purpose set of scripts that work with any SPARQL 1.1. server. Different servers offer different naming conventions for HTTP REST, query and update. This section provides summary information about using SOH with some servers. See the documentation for each server for authoritative information.\nIf you have details for other servers, get involved\nFuseki If a Fuseki server is run with the command:\nfuseki-server --update --mem /MyDataset If you want to run fuseki server with persistent data store then use the following command:\nfuseki-server --loc=[store location] --update /MyDataset then the service endpoints are:\nHTTP: http://localhost:3030/MyDataset/data Query: http://localhost:3030/MyDataset/query or http://localhost:3030/MyDataset/sparql Update: http://localhost:3030/MyDataset/update ","permalink":"https://jena.apache.org/documentation/fuseki2/soh.html","tags":null,"title":"SOH - SPARQL over HTTP"},{"categories":null,"contents":"PAGE This section covers \u0026hellip;\nNext: @@\n","permalink":"https://jena.apache.org/tutorials/sparql_page.html","tags":null,"title":"Sparql page"},{"categories":null,"contents":"A way to write down data structures in an RDF-centric syntax.\nBut not an idea for another RDF serialization format.\nNeed The SPARQL algebra defines the semantics of a SPARQL graph pattern. Every SPARQL query string (the syntax) is mapped to a SPARQL algebra expression.\nIt is convenient to be able to print out such algebra expressions for discussion between people and for debugging. Further, if algebra expressions can be read back in as well, testing of specific parts of an implementation is also easier.\nThis is an example of a general problem : how to express data structures where the basic elements of RDF are based on RDF nodes.\nRDF itself is often the most appropriate way to do this, but sometimes it isn\u0026rsquo;t so convenient. An algebra expression is a tree, and order matters.\nWhen expressing a data structure, there are certain key structure that need to be expressible: arrays and maps, then sets and bags, but expression of a data structure is not the same as the high-level semantics of the data structure.\nA stack can be expressed as a list. And because we want to express the structure, and not express the operations on the structures, data structures with operational meaning don\u0026rsquo;t enter the picture. There are no operations, no push, pop or peek.\nNote that this is to express a data structure, not encode or represent it. By express we mean communicate it, between people or between cooperating machines. The structures are not completely self-representing. But we do discuss a way to express in RDF that does give a self-describing nature through the use of tagged structures.\nDesign Intent Concise (=\u0026gt; for people to write conveniently) format for data structures RDF-centric Non-goals:\nto directly represent any data structure. to be another RDF syntax. 
So desirable features are:\nConcise syntax for RDF terms Datastructures Other Approaches RDF RDF is \u0026ldquo;map-centric\u0026rdquo; but not all data structures are conveniently expressible in maps. RDF has lists, and these lists have convenient syntax in Turtle or N3.\nIf your data structure fits the RDF paradigm, then RDF is a better choice that SSE. Below is a possible mapping from SSE to RDF as Turtle.\nLisp Lacks convenient syntax for the RDF terms themselves.\nSSE syntax is almost valid Scheme; literal language tags and datatypes get split a separate list symbols but the information is recoverable. Scheme doesn\u0026rsquo;t use [] lists or single-quoted strings.\nXML Too verbose.\nJSON JSON provides values (strings, numbers, booleans, null), arrays and object (which are maps). SPARQL Query Results in JSON shows how JSON might be used. It describes how RDF terms are encoded into further substructures. Alternatively, we could put encoded terms in strings like \u0026ldquo;http://w3.org/\\\u0026rdquo; and have a parser-within-a-parser. But both these approaches do not make the writing of RDF terms as easy as it could be.\nDesign S-expressions using RDF terms.\nThe command arq.qparse --print=op --file queryFile will print the SPARQL algebra for the query in SSE format.\nTokens Tokens are the atomic elements of the syntax.\nExample Explanation \u0026quot;abc\u0026quot; string \u0026quot;abc\u0026quot;@en string with language tag. 123 number, specifically an xsd;integer. \u0026lt;http://example.org/\u0026gt; IRI (or URI). _:abc blank node. ?x variable ? variable ex:thing prefixed name ex:123 prefixed name SELECT symbol + symbol @xyz symbol For ? (no name), a unique, internal name for a fresh variable will be allocated; every use of ? is a different variable.\n??x creates a non-distinguished variable. ?? creates a fresh non-distinguished variable.\n_: creates a fresh blank node.\n@xyz - this is a symbol because a language tags only follow a lexical form.\nAlmost any sequence of characters which is not an RDF term or variable is a symbol that can be given special meaning by processing software.\nSSE Comments # or ; introduce comments, which run to the end of line, including the end-of-line characters.\nSSE Escapes \\u and \\U escape sequences for arbitrary Unicode codepoints. These apply to the input character stream before parsing. They don\u0026rsquo;t, for example, permit a space in a symbol.\nStrings provide \\n, \\t, \\r, \\b, \\b, \\f, \\\u0026quot;, \\' and \\\\ escape sequences as in SPARQL.\nStructures (?x ns:p \u0026quot;abc\u0026quot;) - list of 3 elements: a variable, a prefixed name and a string\n(bgp [?x ns:p \u0026quot;abc\u0026quot;]) A list of 2 elements: a symbol (bgp) and a list of 3 elements. Both () and [] delimit lists; they must match but otherwise it\u0026rsquo;s a free choice. Convention is that compact lists use []; large lists use ().\nTagged Structures The basic syntax defines tokens and lists. Higher level processing happens on this basic syntax and can be extended by interpreting the structure.\nLayers on top of the basic abstract syntax produce specialised data structures. This can be a transformation into a new SSE structure or the production of programming language objects.\nThis is driven by tagged (data) objects in an SSE expression. 
The tag is a symbol and the elements of the data object are the rest of the list.\n(+ 1 2) is tagged with symbol +\n(triple ?s ?p \u0026quot;text\u0026quot;@en) is tagged with symbol triple\nIRI resolution One such layer is IRI and prefix name resolution, using tags base and prefix.\nBasic syntax includes unresolved IRIs, (example \u0026lt;abc\u0026gt;) and prefixed names (example foaf:name). These are turned into absolute IRIs and the base and prefix tagged object wrappers are removed.\nThis is sufficiently important that the SSE library handles this in an optimized fashion where the IRI processing directly rewrites the streamed output of the parser.\nbase (base \u0026lt;http://example/\u0026gt; (triple \u0026lt;xyz\u0026gt; ?p \u0026quot;lex\u0026quot;^^\u0026lt;thing\u0026gt;)) becomes\n(triple \u0026lt;http://example/xyz\u0026gt; ?p \u0026quot;lex\u0026quot;^^\u0026lt;http://example/thing\u0026gt;) prefix (prefix ((: \u0026lt;http://example/\u0026gt;) (ns: \u0026lt;http://example/ns#\u0026gt;)) (triple :x ns:p \u0026quot;lex\u0026quot;^^ns:type)) becomes\n(triple \u0026lt;http://example/x\u0026gt; \u0026lt;http://example/ns#p\u0026gt; \u0026quot;lex\u0026quot;^^\u0026lt;http://example/ns#type\u0026gt;) Nesting The tagged structures can be combined and nested. The base or prefixes declared only apply to the body of the data object.\n(prefix ((: \u0026lt;http://jena.hpl.hp.com/2007/\u0026gt;) (foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt;)) (triple (base \u0026lt;http://jena.hpl.hp.com/\u0026gt; \u0026lt;afs\u0026gt; foaf:name \u0026quot;Andy\u0026quot;))) Combined with the triple builder, this will produce a triple:\n\u0026lt;http://jena.hpl.hp.com/afs\u0026gt; \u0026lt;http://xmlns.com/foaf/0.1/name\u0026gt; \u0026quot;Andy\u0026quot; . Links Not implemented\nNot all data structures can be conveniently expressed as nested lists. Sub-element sharing matters. A structure with shared elements can\u0026rsquo;t be serialized as a strict tree and some form of reference is needed.\nName a place in the structure: (name@ symbol X)\nLink to it: (@link symbol)\nThe link layer will produce an SSE structure without these tags, having replaced all name@ and @link with the shared structure X.\n@ is a convention for referencing.\nBuilding Java Objects Builders are code classes that process the structure into Java objects. Writing builders is straight-forward because low-level parsing details have been taken care of in the basic syntax. A typical builder is a recursive-decent parser over the abstract syntax tree, coding one is primarily walking the structure, with a tagged object to Java instance mapping being applied.\nSome tagged objects with builders are:\n(triple S P O) where X is an RDF node (RDF term or variable). (quad G S P O) (graph triple*) (graph@ URL) — Read a URL. @@ Need to write the abstract syntax for each tagged object\nMany builders have convenience syntax. Triples can be abbreviated by omitting the tag triple because usually the fact it is a triple is clear.\n(bgp (triple ?s ?p ?o)) (bgp (?s ?p ?o)) Quads have a similar abbreviation as 4-lists. In addition, _ is a quad on the default graph.\nElements for executing SPARQL:\nSPARQL algebra operators and basic graph patterns Filter expressions (in prefix notation (+ 1 2)) Query solutions (Bindings) and tables. 
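The same SSE output that arq.qparse prints can also be produced from Java. The following is a minimal sketch (not part of the original notes) using the ARQ algebra API; the printed form of an Op is its SSE rendering:

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

public class PrintAlgebra {
    public static void main(String[] args) {
        Query query = QueryFactory.create(
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
            "SELECT ?name { ?x foaf:name ?name }");
        // Map the query syntax to a SPARQL algebra expression.
        Op op = Algebra.compile(query);
        // Printing an Op gives the algebra in SSE form, roughly:
        //   (project (?name)
        //     (bgp (triple ?x <http://xmlns.com/foaf/0.1/name> ?name)))
        System.out.println(op);
    }
}

The SSE factory functions described next read this printed form back in, which is what makes round-trip testing of the algebra convenient.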
SSE Factory The class SSE in package org.apache.jena.sparql.sse provides many convenience functions to call builders for RDF and SPARQL structures.\nNode n = SSE.parseNode(\u0026quot;\u0026lt;http://example/node\u0026gt;\u0026quot;) ; Triple t = SSE.parseTriple(\u0026quot;(?s ?p ?o)\u0026quot;) ; Op op = SSE.parseOp(\u0026quot;(filter (\u0026gt; ?v 123) (bgp (?s ?p ?v)))\u0026quot;) ; Most of the operations have forms that allow a PrefixMapping to be specified - this is wrapped around the parser run so prefixed names can be used without explicit prefix declarations.\nThere is a default prefix mapping with a few common prefixes: rdf, rdfs, owl, xsd and fn (the XPath/XQuery functions and operators namespace).\nSSE Files The file extension is .sse and all files are UTF-8.\nA quick and pragmatic Emacs mode is given by:\n;; ==== SSE mode (define-derived-mode sse-mode lisp-mode \u0026quot;SSE\u0026quot; nil (make-local-variable 'lisp-indent-function) (setq lisp-indent-function 'sse-indent-function) ) ;; Everything in SSE is \u0026quot;def\u0026quot; like (defun sse-indent-function (indent-point state) (lisp-indent-defform state indent-point)) (setq auto-mode-alist (cons '(\u0026quot;\\\\.sse\u0026quot; . sse-mode) auto-mode-alist)) Longer Examples Query 1 PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; SELECT DISTINCT ?name ?nick { ?x foaf:mbox \u0026lt;mailt:person@server\u0026gt; . ?x foaf:name ?name OPTIONAL { ?x foaf:nick ?nick } } (prefix ((foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt;)) (distinct (project (?name ?nick) (leftjoin (BGP [triple ?x foaf:mbox \u0026lt;mailto:person@server\u0026gt;] [triple ?x foaf:name ?name] ) (BGP [triple ?x foaf:nick ?nick]) )))) Complete SPARQL Execution The following is a complete query execution, data and query. There is an inline dataset and a query of\nPREFIX : \u0026lt;http://example/\u0026gt; SELECT * { GRAPH :g1 { ?x ?p ?v } } The tag graph is used twice, with different meanings. First, for an RDF graph, and second in GRAPH SPARQL pattern. In a data structure, context sorts out the different usages.\n(prefix ((: \u0026lt;http://example/\u0026gt;)) (exec (dataset (default (graph (:x :p 1) (:x :p 2))) (namedgraph :g1 (graph (:x :gp 1) (:x :gp 2))) (namedgraph :g2 (graph (:y :gp 1) (:y :gp 2))) ) (graph :g1 (bgp (?x ?p ?v))) )) SSE Grammar @@ insert grammar here\n","permalink":"https://jena.apache.org/documentation/notes/sse.html","tags":null,"title":"SPARQL S-Expressions (or \"SPARQL Syntax Expressions\")"},{"categories":null,"contents":"The objective of this SPARQL tutorial is to give a fast course in SPARQL. The tutorial covers the major features of the query language through examples but does not aim to be complete.\nIf you are looking for a short introduction to SPARQL and Jena try Search RDF data with SPARQL. If you are looking to execute SPARQL queries in code and already known SPARQL then you likely want to read the ARQ Documentation instead.\nSPARQL is a query language and a protocol for accessing RDF designed by the W3C RDF Data Access Working Group.\nAs a query language, SPARQL is \u0026ldquo;data-oriented\u0026rdquo; in that it only queries the information held in the models; there is no inference in the query language itself. Of course, the Jena model may be \u0026lsquo;smart\u0026rsquo; in that it provides the impression that certain triples exist by creating them on-demand, including OWL reasoning. 
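As a small illustration of that point (a sketch written for this page, not code from it; the file names and the example class IRI are placeholders), the same query code can be pointed at an inference model, and the entailed triples then become visible to the query:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class QueryInferenceModel {
    public static void main(String[] args) {
        // Placeholder file names: some instance data and an RDFS vocabulary.
        Model data   = RDFDataMgr.loadModel("data.ttl");
        Model schema = RDFDataMgr.loadModel("schema.ttl");

        // An RDFS inference model: triples entailed by the schema are
        // produced on demand when the query engine asks for them.
        InfModel inf = ModelFactory.createRDFSModel(schema, data);

        String qs = "SELECT ?s { ?s a <http://example/SuperClass> }";
        try (QueryExecution qexec = QueryExecutionFactory.create(qs, inf)) {
            ResultSetFormatter.out(qexec.execSelect());
        }
    }
}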
SPARQL does not do anything other than take the description of what the application wants, in the form of a query, and returns that information, in the form of a set of bindings or an RDF graph.\nSPARQL tutorial Preliminaries: data! Executing a simple query Basic patterns Value constraints Optional information Alternatives Named Graphs Results Other Material The SPARQL query language definition document itself contains many examples. Search RDF data with SPARQL (by Phil McCarthy) - article published on IBM developer works about SPARQL and Jena. SPARQL reference card (by Dave Beckett) Detailed ARQ documentation\n","permalink":"https://jena.apache.org/tutorials/sparql.html","tags":null,"title":"SPARQL Tutorial"},{"categories":null,"contents":"In this section, we look at a simple first query and show how to execute it with Jena.\nA \u0026ldquo;hello world\u0026rdquo; of queries The file \u0026ldquo;q1.rq\u0026rdquo; contains the following query:\nSELECT ?x WHERE { ?x \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#FN\u0026gt; \u0026#34;John Smith\u0026#34; } executing that query with the command line query application;\n--------------------------------- | x | ================================= | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | --------------------------------- This works by matching the triple pattern in the WHERE clause against the triples in the RDF graph. The predicate and object of the triple are fixed values so the pattern is going to match only triples with those values. The subject is a variable, and there are no other restrictions on the variable. The pattern matches any triples with these predicate and object values, and it matches with solutions for x.\nThe item enclosed in \u0026lt;\\\u0026gt; is a URI (actually, it\u0026rsquo;s an IRI) and the item enclosed in \u0026quot;\u0026quot; is a plain literal. Just like Turtle, N3 or N-triples, typed literals are written with ^^ and language tags can be added with @.\n?x is a variable called x. The ? does not form part of the name which is why it does not appear in the table output.\nThere is one match. The query returns the match in the x query variable. The output shown was obtained by using one of ARQ\u0026rsquo;s command line applications.\nExecuting the query There are helper scripts in the Jena distribution bat/ and bin/ directories. You should check these scripts before use. They can be placed on the shell command path.\nWindows setup Execute:\n\u0026gt; bat\\sparql.bat --data=doc\\Tutorial\\vc-db-1.rdf --query=doc\\Tutorial\\q1.rq You can just put the bat/ directory on your classpath or copy the programs out of it.\nbash scripts for Linux/Cygwin/Unix Execute:\n$ bin/sparql --data=doc/Tutorial/vc-db-1.rdf --query=doc/Tutorial/q1.rq Using the Java command line applications directly (This is not necessary.)\nYou will need to set the classpath to include all the jar files in the Jena distribution lib/ directory.\n$ java -cp \u0026#39;DIST/lib/*\u0026#39; arq.sparql ... where DIST is the apache-jena-VERSION directory.\nNext: basic patterns\n","permalink":"https://jena.apache.org/tutorials/sparql_query1.html","tags":null,"title":"SPARQL Tutorial - A First SPARQL Query"},{"categories":null,"contents":"Another way of dealing with the semi-structured data is to query for one of a number of possibilities. This section covers UNION patterns, where one of a number of possibilities is tried.\nUNION - two ways to the same data Both the vCard vocabulary and the FOAF vocabulary have properties for people\u0026rsquo;s names. 
In vCard, it is vCard:FN, the \u0026ldquo;formatted name\u0026rdquo;, and in FOAF, it is foaf:name. In this section, we will look at a small set of data where the names of people can be given by either the FOAF or the vCard vocabulary.\nSuppose we have an RDF graph that contains name information using both the vCard and FOAF vocabularies.\n@prefix foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; . @prefix vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . _:a foaf:name \u0026#34;Matt Jones\u0026#34; . _:b foaf:name \u0026#34;Sarah Jones\u0026#34; . _:c vcard:FN \u0026#34;Becky Smith\u0026#34; . _:d vcard:FN \u0026#34;John Smith\u0026#34; . A query to access the name information, when it can be in either form, could be (q-union1.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { { [] foaf:name ?name } UNION { [] vCard:FN ?name } } This returns the results:\n----------------- | name | ================= | \u0026#34;Matt Jones\u0026#34; | | \u0026#34;Sarah Jones\u0026#34; | | \u0026#34;Becky Smith\u0026#34; | | \u0026#34;John Smith\u0026#34; | ----------------- It didn\u0026rsquo;t matter which form of expression was used for the name, the ?name variable is set. This can be achieved using a FILTER as this query (q-union-1alt.rq) shows:\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { [] ?p ?name FILTER ( ?p = foaf:name || ?p = vCard:FN ) } testing whether the property is one URI or another. The solutions may not come out in the same order. The first form is more likely to be faster, depending on the data and the storage used, because the second form may have to get all the triples from the graph to match the triple pattern with unbound variables (or blank nodes) in each slot, then test each ?p to see if it matches one of the values. It will depend on the sophistication of the query optimizer as to whether it spots that it can perform the query more efficiently and is able to pass the constraint down as well as to the storage layer.\nUNION - remembering where the data was found. The example above used the same variable in each branch. If different variables are used, the application can discover which sub-pattern caused the match (q-union2.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name1 ?name2 WHERE { { [] foaf:name ?name1 } UNION { [] vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026#34;Matt Jones\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;John Smith\u0026#34; | --------------------------------- This second query has retained information of where the name of the person came from by assigning the name to different variables.\nOPTIONAL and UNION In practice, OPTIONAL is more common than UNION but they both have their uses. OPTIONAL are useful for augmenting the solutions found, UNION is useful for concatenating the solutions from two possibilities. 
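Either kind of query can be run from Java as well as from the command line. The following minimal ARQ sketch (not from the original page) loads the small name data set shown earlier in this section from a string and runs q-union1.rq; q-union2.rq can be swapped in the same way:

import java.io.StringReader;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class UnionQueryExample {
    public static void main(String[] args) {
        String data =
            "@prefix foaf:  <http://xmlns.com/foaf/0.1/> .\n" +
            "@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .\n" +
            "_:a foaf:name \"Matt Jones\" .\n" +
            "_:b foaf:name \"Sarah Jones\" .\n" +
            "_:c vcard:FN  \"Becky Smith\" .\n" +
            "_:d vcard:FN  \"John Smith\" .\n";

        // Parse the Turtle above into an in-memory model.
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(data), null, "TTL");

        // Read the query from the file used in this section and execute it.
        Query query = QueryFactory.read("q-union1.rq");
        try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
            ResultSetFormatter.out(qexec.execSelect());
        }
    }
}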
The two forms do not necessarily return the information in the same way:\nQuery (q-union3.rq):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name1 ?name2 WHERE { ?x a foaf:Person OPTIONAL { ?x foaf:name ?name1 } OPTIONAL { ?x vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026#34;Matt Jones\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;John Smith\u0026#34; | --------------------------------- but beware of using ?name in each OPTIONAL because that is an order-dependent query.\nNext: Named Graphs\n","permalink":"https://jena.apache.org/tutorials/sparql_union.html","tags":null,"title":"SPARQL Tutorial - Alternatives in a Pattern"},{"categories":null,"contents":"This section covers basic patterns and solutions, the main building blocks of SPARQL queries.\nSolutions Query solutions are a set of pairs of a variable name with a value. A SELECT query directly exposes the solutions (after order/limit/offset are applied) as the result set - other query forms use the solutions to make a graph. The solution is the way the pattern matched - which values the variables must take for a pattern to match.\nThe first query example had a single solution. Change the pattern to this second query (q-bp1.rq):\nSELECT ?x ?fname WHERE {?x \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#FN\u0026gt; ?fname} This has 4 solutions, one for each vCard name property triple in the data source:\n---------------------------------------------------- | x | fname | ==================================================== | \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; | \u0026#34;Becky Smith\u0026#34; | | \u0026lt;http://somewhere/SarahJones/\u0026gt; | \u0026#34;Sarah Jones\u0026#34; | | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | \u0026#34;John Smith\u0026#34; | | \u0026lt;http://somewhere/MattJones/\u0026gt; | \u0026#34;Matt Jones\u0026#34; | ---------------------------------------------------- So far, with triple patterns and basic patterns, every variable will be defined in every solution. The solutions to a query can be thought of as a table, but in the general case, it is a table where not every row will have a value for every column. Not every solution to a given SPARQL query has to have a value for every variable, as we shall see later.\nBasic Patterns A basic pattern is a set of triple patterns. It matches when the triple patterns all match with the same value used each time the variable with the same name is used.\nSELECT ?givenName WHERE { ?y \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#Family\u0026gt; \u0026#34;Smith\u0026#34; . ?y \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#Given\u0026gt; ?givenName . } This query (q-bp2.rq) involves two triple patterns; each triple ends in a \u0026lsquo;.\u0026rsquo; (but the dot after the last one can be omitted, as it was in the one triple pattern example). The variable y has to be the same for each triple pattern match. The solutions are:\n------------- | givenName | ============= | \u0026#34;John\u0026#34; | | \u0026#34;Rebecca\u0026#34; | ------------- QNames There is a shorthand mechanism for writing long URIs using prefixes. The query above is more clearly written as the query (q-bp3.rq):\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?givenName WHERE { ?y vcard:Family \u0026#34;Smith\u0026#34; .
?y vcard:Given ?givenName . } This is a prefixing mechanism - the two parts of the URIs, from the prefix declaration and from the part after the \u0026ldquo;:\u0026rdquo; in the qname, are concatenated together. This is strictly not what an XML qname is but uses the RDF rule for turning a qname into a URI by concatenating the parts.\nBlank Nodes Change the query just a little to return y as well (q-bp4.rq) :\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?y ?givenName WHERE { ?y vcard:Family \u0026#34;Smith\u0026#34; . ?y vcard:Given ?givenName . } and the blank nodes appear\n-------------------- | y | givenName | ==================== | _:b0 | \u0026#34;John\u0026#34; | | _:b1 | \u0026#34;Rebecca\u0026#34; | -------------------- as odd looking qnames starting _:. This isn\u0026rsquo;t the internal label for the blank node - it is ARQ printing them out that assigned the _:b0, _:b1 to show when two blank nodes are the same. Here they are different. It does not reveal the internal label used for the blank node although that is available when using the Java API.\nNext: Filters\n","permalink":"https://jena.apache.org/tutorials/sparql_basic_patterns.html","tags":null,"title":"SPARQL Tutorial - Basic Patterns"},{"categories":null,"contents":"First, we need to be clear about what data is being queried. SPARQL queries RDF graphs. An RDF graph is a set of triples (Jena calls RDF graphs \u0026ldquo;models\u0026rdquo; and triples \u0026ldquo;statements\u0026rdquo; because that is what they were called at the time the Jena API was first designed).\nIt is important to realize that it is the triples that matter, not the serialization. The serialization is just a way to write the triples down. RDF/XML is the W3C recommendation but it can be difficult to see the triples in the serialized form because there are multiple ways to encode the same graph. In this tutorial, we use a more \u0026ldquo;triple-like\u0026rdquo; serialization, called Turtle (see also N3 language described in the W3C semantic web primer).\nWe will start with the simple data in vc-db-1.rdf: this file contains RDF for a number of vCard descriptions of people. vCards are described in RFC2426 and the RDF translation is described in the W3C note \u0026ldquo;Representing vCard Objects in RDF/XML\u0026rdquo;. Our example database just contains some name information.\nGraphically, the data looks like:\nIn triples, this might look like:\n@prefix vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix : \u0026lt;#\u0026gt; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:FN \u0026#34;Matt Jones\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Jones\u0026#34; ; vCard:Given \u0026#34;Matthew\u0026#34; ] . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:FN \u0026#34;Becky Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;Rebecca\u0026#34; ] . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:FN \u0026#34;John Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;John\u0026#34; ] . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:FN \u0026#34;Sarah Jones\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Jones\u0026#34; ; vCard:Given \u0026#34;Sarah\u0026#34; ] . or even more explicitly as triples:\n@prefix vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . 
@prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:FN \u0026#34;Matt Jones\u0026#34; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:N _:b0 . _:b0 vCard:Family \u0026#34;Jones\u0026#34; . _:b0 vCard:Given \u0026#34;Matthew\u0026#34; . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:FN \u0026#34;Becky Smith\u0026#34; . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:N _:b1 . _:b1 vCard:Family \u0026#34;Smith\u0026#34; . _:b1 vCard:Given \u0026#34;Rebecca\u0026#34; . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:FN \u0026#34;John Smith\u0026#34; . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:N _:b2 . _:b2 vCard:Family \u0026#34;Smith\u0026#34; . _:b2 vCard:Given \u0026#34;John\u0026#34; . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:FN \u0026#34;Sarah Jones\u0026#34; . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:N _:b3 . _:b3 vCard:Family \u0026#34;Jones\u0026#34; . _:b3 vCard:Given \u0026#34;Sarah\u0026#34; . It is important to realize that these are the same RDF graph and that the triples in the graph are in no particular order. They are just written in related groups above for the human reader - the machine does not care.\nNext: A Simple Query\n","permalink":"https://jena.apache.org/tutorials/sparql_data.html","tags":null,"title":"SPARQL Tutorial - Data Formats"},{"categories":null,"contents":"This section covers RDF Datasets - an RDF Dataset is the unit that is queried by a SPARQL query. It consists of a default graph, and a number of named graphs.\nQuerying datasets The graph matching operation (basic patterns, OPTIONALs, and UNIONs) work on one RDF graph. This starts out being the default graph of the dataset but it can be changed by the GRAPH keyword.\nGRAPH uri { ... pattern ... } GRAPH var { ... pattern ... } If a URI is given, the pattern will be matched against the graph in the dataset with that name - if there isn\u0026rsquo;t one, the GRAPH clause fails to match at all.\nIf a variable is given, all the named graphs (not the default graph) are tried. The variable may be used elsewhere so that if, during execution, its value is already known for a solution, only the specific named graph is tried.\nExample Data An RDF dataset can take a variety of forms. Two common setups are to have the default graph being the union (the RDF merge) of all the named graphs or to have the default graph be an inventory of the named graphs (where they came from, when they were read etc). There are no limitations - one graph can be included twice under different names, or some graphs may share triples with others.\nIn the examples below we will use the following dataset that might occur for an RDF aggregator of book details:\nDefault graph (ds-dft.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . @prefix xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; . \u0026lt;ds-ng-1.ttl\u0026gt; dc:date \u0026#34;2005-07-14T03:18:56+0100\u0026#34;^^xsd:dateTime . \u0026lt;ds-ng-2.ttl\u0026gt; dc:date \u0026#34;2005-09-22T05:53:05+0100\u0026#34;^^xsd:dateTime . Named graph (ds-ng-1.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . [] dc:title \u0026#34;Harry Potter and the Philospher\u0026#39;s Stone\u0026#34; . [] dc:title \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; . Named graph (ds-ng-2.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . 
[] dc:title \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; . [] dc:title \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; . That is, we have two small graphs describing some books, and we have a default graph which records when these graphs were last read.\nQueries can be run with the command line application (this would be all one line):\n$ java -cp ... arq.sparql --graph ds-dft.ttl --namedgraph ds-ng-1.ttl --namedgraph ds-ng-2.ttl --query query file Datasets don\u0026rsquo;t have to be created just for the lifetime of the query. They can be created and stored in a database, as would be more usual for an aggregator application.\nAccessing the Dataset The first example just accesses the default graph (q-ds-1.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * { ?s ?p ?o } (The \u0026ldquo;PREFIX : \u0026lt;.\u0026gt;\u0026rdquo; just helps format the output)\n---------------------------------------------------------------------- | s | p | o | ====================================================================== | :ds-ng-2.ttl | dc:date | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | | :ds-ng-1.ttl | dc:date | \u0026#34;2005-07-14T03:18:56+01:00\u0026#34;^^xsd:dateTime | ---------------------------------------------------------------------- This is the default graph only - nothing from the named graphs because they aren\u0026rsquo;t queried unless explicitly indicated via GRAPH.\nWe can query for all triples by querying the default graph and the named graphs (q-ds-2.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } giving:\n--------------------------------------------------------------------------------------- | s | p | o | g | ======================================================================================= | :ds-ng-2.ttl | dc:date | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | | | :ds-ng-1.ttl | dc:date | \u0026#34;2005-07-14T03:18:56+01:00\u0026#34;^^xsd:dateTime | | | _:b0 | dc:title | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | :ds-ng-2.ttl | | _:b1 | dc:title | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | :ds-ng-2.ttl | | _:b2 | dc:title | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | :ds-ng-1.ttl | | _:b3 | dc:title | \u0026#34;Harry Potter and the Philospher\u0026#39;s Stone\u0026#34; | :ds-ng-1.ttl | --------------------------------------------------------------------------------------- Querying a specific graph If the application knows the name graph, it can directly ask a query such as finding all the titles in a given graph (q-ds-3.rq):\nPREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT ?title { GRAPH :ds-ng-2.ttl { ?b dc:title ?title } } Results:\n--------------------------------------------- | title | ============================================= | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | --------------------------------------------- Querying to find data from graphs that match a pattern The name of the graphs to be queried can be determined with the query itself. 
The same process for variables applies whether they are part of a graph pattern or the GRAPH form. The query below (q-ds-4.rq) sets a condition on the variable used to select named graphs, based on information in the default graph.\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT ?date ?title { ?g dc:date ?date . FILTER (?date \u0026gt; \u0026#34;2005-08-01T00:00:00Z\u0026#34;^^xsd:dateTime ) GRAPH ?g { ?b dc:title ?title } } The results of executing this query on the example dataset are the titles in one of the graphs, the one with the date later than 1 August 2005.\n----------------------------------------------------------------------------------------- | date | title | ========================================================================================= | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | ----------------------------------------------------------------------------------------- Describing RDF Datasets - FROM and FROM NAMED A query execution can be given the dataset when the execution object is built or it can be described in the query itself. When the details are on the command line, a temporary dataset is created but an application can create datasets and then use them in many queries.\nWhen described in the query, FROM \u0026lt;url\u0026gt; is used to identify the contents to be in the default graph. There can be more than one FROM clause and the default graph is result of reading each file into the default graph. It is the RDF merge of the individual graphs.\nDon\u0026rsquo;t be confused by the fact the default graph is described by one or more URLs in FROM clauses. This is where the data is read from, not the name of the graph. As several FROM clauses can be given, the data can be read in from several places but none of them become the graph name.\nFROM NAMED \u0026lt;url\u0026gt; is used to identify a named graph. The graph is given the name url and the data is read from that location. Multiple FROM NAMED clauses cause multiple graphs to be added to the dataset.\nFor example, the query to find all the triples in both default graph and named graphs could be written as (q-ds-5.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * FROM \u0026lt;ds-dft.ttl\u0026gt; FROM NAMED \u0026lt;ds-ng-1.ttl\u0026gt; FROM NAMED \u0026lt;ds-ng-2.ttl\u0026gt; { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } Next: results\n","permalink":"https://jena.apache.org/tutorials/sparql_datasets.html","tags":null,"title":"SPARQL Tutorial - Datasets"},{"categories":null,"contents":"Essa sessão cobre datasets RDF – um dataset RDF é a unidade consultada por uma consulta SPARQL. Ele consiste de um grafo padrão, e certo número de grafos nomeados.\nConsultando datasets As operações de casamento de grafos (padrões básicos, OPTIONALs, e UNIONs) funcionam em um grafo RDF. Isso começa por ser o grafo padrão do conjunto de dados, mas pode ser alterado pela palavra-chave GRAPH. GRAPH uri { \u0026hellip; padrão \u0026hellip; }\nGRAPH var { ... padrão ... 
} Se um URI é fornecido, o padrão vai ser casado contra o grafo no dataset com esse nome – se não houver um, então clausula GRAPH falhará ao tentar casar.\nSe uma variável é dada, todos os grafos nomeados (não o grafo padrão) são testados. A variável pode ser usada em outro lugar, então, durante a execução, esse valor já é conhecido para a solução, somente o grafo nomeado é testado.\nDados de exemplo Um dataset RDF pode ter várias formas. A instalação comum é ter o grafo padrão sendo a união (o merge RDF) de todos os grafos nomeados e ter o grafo padrão como um inventário de grafos nomeados (de onde eles vieram, quando eles foram lidos, etc.). Não há limitações – um grafo pode ser incluído duas vezes sob diferentes nomes, ou alguns grafos podem compartilhar triplas com outros.\nNos exemplos abaixo, vamos usar o seguinte dataset que pode ocorrer para um RDF agregador de um livro de detalhes:\nGrafo padrão (ds-dft.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . @prefix xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; . \u0026lt;ds-ng-1.ttl\u0026gt; dc:date \u0026#34;2005-07-14T03:18:56+0100\u0026#34;^^xsd:dateTime . \u0026lt;ds-ng-2.ttl\u0026gt; dc:date \u0026#34;2005-09-22T05:53:05+0100\u0026#34;^^xsd:dateTime . Grafo nomeado (ds-ng-1.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . [] dc:title \u0026#34;Harry Potter and the Philospher\u0026#39;s Stone\u0026#34; . [] dc:title \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; . Grafo nomeado (ds-ng-2.ttl):\n@prefix dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; . [] dc:title \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; . [] dc:title \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; . Isto é, nós temos dois pequenos grafos descrevendo alguns livros, e nós temos um grafo padrão que armazena quando esses grafos foram lidos pela última vez.\nAs consultas podem ser executadas via linha de comando (tudo numa linha):\n$ java -cp ... arq.sparql --graph ds-dft.ttl --namedgraph ds-ng-1.ttl --namedgraph ds-ng-2.ttl --query query file Datasets não têm que ser criados só para o tempo de vida da consulta. 
Eles podem ser criados e armazenados num banco de dados, o que seria mais usual para uma aplicação agregadora.\nAcessando o Dataset O primeiro exemplo apenas acessa o grafo padrão: (q-ds-1.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * { ?s ?p ?o } (O \u0026ldquo;PREFIX : \u0026lt;.\u0026gt;\u0026rdquo; apenas ajuda a formatar a saída)\n---------------------------------------------------------------------- | s | p | o | ====================================================================== | :ds-ng-2.ttl | dc:date | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | | :ds-ng-1.ttl | dc:date | \u0026#34;2005-07-14T03:18:56+01:00\u0026#34;^^xsd:dateTime | ---------------------------------------------------------------------- Este é somente o grafo padrão – nada dos grafos nomeados porque eles não são consultados a menos que seja informado explicitamente via GRAPH.\nNós podemos consultar todas as triplas ao consultar o grafo padrão e os grafos nomeados: (q-ds-2.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } resultando em:\n--------------------------------------------------------------------------------------- | s | p | o | g | ======================================================================================= | :ds-ng-2.ttl | dc:date | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | | | :ds-ng-1.ttl | dc:date | \u0026#34;2005-07-14T03:18:56+01:00\u0026#34;^^xsd:dateTime | | | _:b0 | dc:title | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | :ds-ng-2.ttl | | _:b1 | dc:title | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | :ds-ng-2.ttl | | _:b2 | dc:title | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | :ds-ng-1.ttl | | _:b3 | dc:title | \u0026#34;Harry Potter and the Philospher\u0026#39;s Stone\u0026#34; | :ds-ng-1.ttl | --------------------------------------------------------------------------------------- Consultando um grafo especifico Se a aplicação souber o nome do grafo, ele pode consultar diretamente títulos num grafo dado: (q-ds-3.rq):\nPREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT ?title { GRAPH :ds-ng-2.ttl { ?b dc:title ?title } } Resultados:\n--------------------------------------------- | title | ============================================= | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | --------------------------------------------- Consulta para encontrar dados de grafos que casam com um padrão O nome dos grafos a ser consultados podem ser determinados na consulta.\nO mesmo processo se aplica a variáveis se elas são parte de um padrão de grafo ou na forma GRAPH form. A consulta abaixo (q-ds-4.rq) seta uma condição nas variáveis usadas para selecionar grafos nomeados, baseada na informação do grafo padrão.\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT ?date ?title { ?g dc:date ?date . 
FILTER (?date \u0026gt; \u0026#34;2005-08-01T00:00:00Z\u0026#34;^^xsd:dateTime ) GRAPH ?g { ?b dc:title ?title } } O resultado da consulta no dataset de exemplo são títulos em um dos grafos, o grafo com data anterior a 1 de agosto de 2005.\n----------------------------------------------------------------------------------------- | date | title | ========================================================================================= | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | \u0026#34;Harry Potter and the Sorcerer\u0026#39;s Stone\u0026#34; | | \u0026#34;2005-09-22T05:53:05+01:00\u0026#34;^^xsd:dateTime | \u0026#34;Harry Potter and the Chamber of Secrets\u0026#34; | ----------------------------------------------------------------------------------------- Descrevendo datasets RDF - FROM e FROM NAMED À execução de um consulta pode ser dado o dataset quando o objeto da execução é construído ou ele pode ser descrito na própria consulta. Quando os detalhes estão na linha de comando, um dataset temporário é criado, mas uma aplicação pode criar datasets e então usá-los em várias consultas.\nQuando descrito na consulta, FROM \u0026lt;i\u0026gt;url\u0026lt;/i\u0026gt; é usado para identificar o conteúdo a preencher o grafo padrão. Pode haver mais de uma cláusula FROM e o grafo padrão é resultado da leitura de cada arquivo no grafo padrão. Isto é o merge de RDF de grafos individuais.\nNão se confunda com o fato de um grafo padrão ser descrito por uma ou mais URL na clausula FROM. Esse é o lugar de onde o dado é lido, não o nome do grafo. Como muitas cláusulas FROM podem ser fornecidas, o dado pode ser lido de vários lugares, mas nenhum deles se torna o nome do grafo.\nFROM NAMED \u0026lt;i\u0026gt;url\u0026lt;/i\u0026gt; é usado para identificar o grafo nomeado. Ao grafo é dado a url e o dado é lido daquela localização. Múltiplas clausulas FROM NAMED causam em muitos grafos para serem adicionados ao dataset.\nPor exemplo, a consulta para buscar todas as triplas em ambos o grafo padrão e os grafos nomeados poderia ser escrita como (q-ds-5.rq):\nPREFIX xsd: \u0026lt;http://www.w3.org/2001/XMLSchema#\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;.\u0026gt; SELECT * FROM \u0026lt;ds-dft.ttl\u0026gt; FROM NAMED \u0026lt;ds-ng-1.ttl\u0026gt; FROM NAMED \u0026lt;ds-ng-2.ttl\u0026gt; { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } Next: Resultados\n","permalink":"https://jena.apache.org/tutorials/sparql_datasets_pt.html","tags":null,"title":"SPARQL Tutorial - Datasets"},{"categories":null,"contents":"Graph matching allows patterns in the graph to be found. This section describes how the values in a solution can be restricted. There are many comparisons available - we just cover two cases here.\nString Matching SPARQL provides an operation to test strings, based on regular expressions. This includes the ability to ask SQL \u0026ldquo;LIKE\u0026rdquo; style tests, although the syntax of the regular expression is different from SQL.\nThe syntax is:\nFILTER regex(?x, \u0026#34;pattern\u0026#34; [, \u0026#34;flags\u0026#34;]) The flags argument is optional. The flag \u0026ldquo;i\u0026rdquo; means a case-insensitive pattern match is done.\nThe example query (q-f1.rq) finds given names with an \u0026ldquo;r\u0026rdquo; or \u0026ldquo;R\u0026rdquo; in them.\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?g WHERE { ?y vcard:Given ?g . 
FILTER regex(?g, \u0026#34;r\u0026#34;, \u0026#34;i\u0026#34;) } with the results:\n------------- | g | ============= | \u0026#34;Rebecca\u0026#34; | | \u0026#34;Sarah\u0026#34; | ------------- The regular expression language is the same as the XQuery regular expression language, which is a codified version of that found in Perl.\nTesting Values There are times when the application wants to filter on the value of a variable. In the data file vc-db-2.rdf, we have added an extra field for age. Age is not defined by the vCard schema, so we have created a new property for the purpose of this tutorial. RDF allows such mixing of different definitions of information because URIs are unique. Note also that the info:age property value is typed.\nIn this extract of the data, we show the typed value. It can also be written as plain 23.\n\u0026lt;http://somewhere/RebeccaSmith/\u0026gt; info:age \u0026#34;23\u0026#34;^^xsd:integer ; vCard:FN \u0026#34;Becky Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;Rebecca\u0026#34; ] . So, a query (q-f2.rq) to find the people who are aged 24 or over is:\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; SELECT ?resource WHERE { ?resource info:age ?age . FILTER (?age \u0026gt;= 24) } The arithmetic expression must be in parentheses (round brackets). The only solution is:\n--------------------------------- | resource | ================================= | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | --------------------------------- Just one match, resulting in the resource URI for John Smith. Turning this round to ask for those aged less than 24 also yields one match, for Rebecca Smith. Nothing about the Jones'.\nThe database contains no age information about the Jones: there are no info:age properties on these vCards, so the variable age did not get a value and so was not tested by the filter.\nNext: Optionals\n","permalink":"https://jena.apache.org/tutorials/sparql_filters.html","tags":null,"title":"SPARQL Tutorial - Filters"},{"categories":null,"contents":"RDF is semi-structured data, so SPARQL has the ability to query for data without failing the whole query when that data does not exist. A query uses an optional part to extend the information found in a solution while still returning the non-optional information in any case.\nOPTIONALs This query (q-opt1.rq) gets the name of a person and also their age if that piece of information is available.\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age } } Two of the four people in the data (vc-db-2.rdf) have age properties, so two of the query solutions have that information. However, because the triple pattern for the age is optional, there is a pattern solution for the people who don\u0026rsquo;t have age information.\n------------------------ | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | 23 | | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- If the optional clause had not been there, no age information would have been retrieved.
If the triple pattern had been included but not optional then we would have the query (q-opt2.rq):\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . ?person info:age ?age . } with only two solutions:\n----------------------- | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | 23 | | \u0026#34;John Smith\u0026#34; | 25 | ----------------------- because the info:age property must now be present in a solution.\nOPTIONALs with FILTERs OPTIONAL is a binary operator that combines two graph patterns. The optional pattern is any group pattern and may involve any SPARQL pattern types. If the group matches, the solution is extended, if not, the original solution is given (q-opt3.rq).\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age . FILTER ( ?age \u0026gt; 24 ) } } So, if we filter for ages greater than 24 in the optional part, we will still get 4 solutions (from the vcard:FN pattern) but only get ages if they pass the test.\n----------------------- | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- No age included for \u0026ldquo;Becky Smith\u0026rdquo; because it is less than 24.\nIf the filter condition is moved out of the optional part, then it can influence the number of solutions, but it may be necessary to make the filter more complicated to allow for variable age being unbound (q-opt4.rq).\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age . } FILTER ( !bound(?age) || ?age \u0026gt; 24 ) } If a solution has an age variable, then it must be greater than 24. It can also be unbound. There are now three solutions:\n----------------------- | name | age | ======================= | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- Evaluating an expression which has an unbound variables where a bound one was expected causes an evaluation exception and the whole expression fails.\nOPTIONALs and Order Dependent Queries One thing to be careful of is using the same variable in two or more optional clauses (and not in some basic pattern as well):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { ?x a foaf:Person . OPTIONAL { ?x foaf:name ?name } OPTIONAL { ?x vCard:FN ?name } } If the first optional binds ?name and ?x to some values, the second OPTIONAL is an attempt to match the ground triples (?x and ?name have values). 
If the first optional did not match the optional part, then the second one is an attempt to match its triple with two variables.\nWith an example set of data in which every combination of values exist:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vCard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; xmlns:info=\u0026#39;http://somewhere/peopleInfo#\u0026#39; xmlns:foaf=\u0026#39;http://xmlns.com/foaf/0.1/\u0026#39; \u0026gt; \u0026lt;!-- both vCard:FN and foaf:name have values, and the values are the same --\u0026gt; \u0026lt;foaf:Person rdf:about=\u0026#34;http://somewhere/JohnSmith\u0026#34;\u0026gt; \u0026lt;vCard:FN\u0026gt;John Smith\u0026lt;/vCard:FN\u0026gt; \u0026lt;foaf:name\u0026gt;John Smith\u0026lt;/foaf:name\u0026gt; \u0026lt;/foaf:Person\u0026gt; \u0026lt;!-- both vCard:FN and foaf:name have values, but the values are not the same --\u0026gt; \u0026lt;foaf:Person rdf:about=\u0026#34;http://somewhere/RebeccaSmith\u0026#34;\u0026gt; \u0026lt;vCard:FN\u0026gt;Becky Smith\u0026lt;/vCard:FN\u0026gt; \u0026lt;foaf:name\u0026gt;Rebecca Smith\u0026lt;/foaf:name\u0026gt; \u0026lt;/foaf:Person\u0026gt; \u0026lt;!-- only vCard:FN has values --\u0026gt; \u0026lt;foaf:Person rdf:about=\u0026#34;http://somewhere/SarahJones\u0026#34;\u0026gt; \u0026lt;vCard:FN\u0026gt;Sarah Jones\u0026lt;/vCard:FN\u0026gt; \u0026lt;/foaf:Person\u0026gt; \u0026lt;!-- only foaf:name has values --\u0026gt; \u0026lt;foaf:Person rdf:about=\u0026#34;http://somewhere/MattJones\u0026#34;\u0026gt; \u0026lt;foaf:name\u0026gt;Matthew Jones\u0026lt;/foaf:name\u0026gt; \u0026lt;/foaf:Person\u0026gt; \u0026lt;!-- neither vCard:FN nor foaf:name have values --\u0026gt; \u0026lt;foaf:Person rdf:about=\u0026#34;http://somewhere/AdamJones\u0026#34; /\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Executing the above query will yield these solutions:\n------------------- | name | =================== | \u0026#34;John Smith\u0026#34; | | \u0026#34;Matthew Jones\u0026#34; | | \u0026#34;Sarah Jones\u0026#34; | | | | \u0026#34;Rebecca Smith\u0026#34; | ------------------- Next: union queries\n","permalink":"https://jena.apache.org/tutorials/sparql_optionals.html","tags":null,"title":"SPARQL Tutorial - Optional Information"},{"categories":null,"contents":"This module was first released with Jena 2.11.0. It was last released in Jena 3.12.0.\nJena provides a GeoSPARQL implementation.\nThis is an extension to Apache Jena ARQ, which combines SPARQL and simple spatial query. It gives applications the ability to perform simple spatial searches within SPARQL queries. Spatial indexes are additional information for accessing the RDF graph.\nThe spatial index can be either Apache Lucene for a same-machine spatial index, or Apache Solr for a large scale enterprise search application.\nSome example code is available here.\nIllustration\nThis query makes a spatial query for the places within 10 kilometres of Bristol UK (which as latitude/longitude of 51.46, 2.6).\nPREFIX spatial: \u0026lt;http://jena.apache.org/spatial#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; SELECT ?placeName { ?place spatial:nearby (51.46 2.6 10 'km') . ?place rdfs:label ?placeName } How to Use it by Code Create Spatial Dataset import org.apache.jena.query.spatial.EntityDefinition ... // In Lucene, \u0026quot;entityField\u0026quot; stores the uri of the subject (e.g. a place), // while \u0026quot;geoField\u0026quot; holds the indexed geo data (e.g. latitude/longitude). 
// Using fields \u0026quot;uri\u0026quot; and \u0026quot;geo\u0026quot;: EntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;geo\u0026quot;); // index in File system (or use an in-memory one) Directory dir = FSDirectory.open(indexDir); // The baseDataset can be an in-memory or TDB/SDB file based one which contains the geo data. Join together into a dataset. Dataset spatialDataset = SpatialDatasetFactory.createLucene(baseDataset, dir, entDef); ... Supported Geo Data for Indexing and Querying Builtin Geo Predicates There are mainly 2 types of RDF representation of geo data, which are both supported by jena-spatial:\n1) Latitude/Longitude Format (in gonames, DBPedia and Linked Geo Data)\nPREFIX geo: \u0026lt;http://www.w3.org/2003/01/geo/wgs84_pos#\u0026gt; :EGBB rdf:type :LargeAirport ; geo:lat \u0026quot;52.4539\u0026quot;^^xsd:float ; geo:long \u0026quot;-1.74803\u0026quot;^^xsd:float . :EGBB_String rdf:type :LargeAirport ; geo:lat \u0026quot;52.4539\u0026quot; ; geo:long \u0026quot;-1.74803\u0026quot; . 2) Well Known Text (WKT) Literal (in DBPedia and Linked Geo Data)\nPREFIX ogc: \u0026lt;http://www.opengis.net/ont/geosparql#\u0026gt; :node1000032677 a :Geometry ; ogc:asWKT \u0026quot;POINT(7.338818000000001 51.4433324)\u0026quot;^^ogc:wktLiteral . airports:EGBB_Fake_In_Box rdf:type airports_sc:LargeAirport ; ogc:asWKT \u0026quot;Polygon ((-2.0 51.2, 1.0 51.2, 1.0 51.8, -2.0 51.8, -2.0 51.2))\u0026quot;^^wkt:wktLiteral. For 2) WKT, DBPedia uses geo:geometry, while Linked Geo Data adopts ogc:asWKT and geo:geometry.\nThe builtin predicates that can be automatically processed by jena-spatial include: 1) geo:lat, geo:long; 2) geo:geometry, ogc:asWKT.\nImportant note In order to read geo data in 2) WKT literal format, jena-spatial uses JTS Topology Suite, which is under LGPL licence. jena-spatial does not make a hard dependency on JTS. In other words, if an end user just uses the feature of 1), there\u0026rsquo;s no need to depend on JTS (i.e. nothing needs to be done). If they want 2), they can make it by setting the SpatialContextFactory of EntityDefinition to JtsSpatialContextFactory, which is an optional choice. In this way, the JTS libs should be in the classpath. Here\u0026rsquo;s the sample code:\nimport org.apache.jena.query.spatial.EntityDefinition ... EntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;geo\u0026quot;); // use JtsSpatialContextFactory to support 2) WKT literals (optional) entDef.setSpatialContextFactory(\u0026quot;com.spatial4j.core.context.jts.JtsSpatialContextFactory\u0026quot;); ... Custom Geo Predicates However, there may be more predicates for other data sources for both 1) and 2). jena-spatial provides an interface for consuming all kinds of custom geo predicates. You can simply add predicates to let jena-spatial recognize them using EntityDefinition:\nimport org.apache.jena.query.spatial.EntityDefinition ... 
EntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;\u0026quot;, \u0026quot;geo\u0026quot;); // custom geo predicates for 1) Latitude/Longitude Format Resource lat_1 = ResourceFactory.createResource(\u0026quot;http://localhost/jena_example/#latitude_1\u0026quot;) ; Resource long_1 ResourceFactory.createResource(\u0026quot;http://localhost/jena_example/#longitude_1\u0026quot;) ; entDef.addSpatialPredicatePair(lat_1, long_1) ; // custom geo predicates for Well Known Text (WKT) Literal Resource wkt_1 = ResourceFactory.createResource(\u0026quot;http://localhost/jena_example/#wkt_1\u0026quot;); entDef.addWKTPredicate( wkt_1 ); See more supported geo data examples\nLoad Geo Data into Spatial Dataset spatialDataset.begin(ReadWrite.WRITE); try { Model m = spatialDataset.getDefaultModel(); RDFDataMgr.read(m, file); spatialDataset.commit(); } finally { spatialDataset.end(); } Now the spatial dataset is ready for spatial query.\nProperty Function Library The prefix spatial is \u0026lt;http://jena.apache.org/spatial#\u0026gt;.\nProperty name Description ?place spatial:nearby (latitude, longitude, radius [, units, limit])\n?place spatial:withinCircle (latitude, longitude, radius [, units, limit]) Query for the ?place within the radius distance of the location of (latitude, longitude). The distance units can be: \u0026ldquo;kilometres\u0026rdquo;/\u0026ldquo;km\u0026rdquo;, \u0026ldquo;miles\u0026rdquo;/\u0026ldquo;mi\u0026rdquo;, \u0026ldquo;metres\u0026rdquo;/\u0026ldquo;m\u0026rdquo;, \u0026ldquo;centimetres\u0026rdquo;/\u0026ldquo;cm\u0026rdquo;, \u0026ldquo;millimetres\u0026rdquo;/\u0026ldquo;mm\u0026rdquo; or \u0026ldquo;degrees\u0026rdquo;/\u0026ldquo;de\u0026rdquo;, which are delivered as the optional strings (the default value is \u0026ldquo;kilometres\u0026rdquo;). limit is an optional integer parameter for the limit of the query results (if limit\u0026lt;0, return all query results). ?place spatial:withinBox (latitude_min, longitude_min, latitude_max, longitude_max [, limit]) Query for the ?place within the box area of (latitude_min, longitude_min, latitude_max, longitude_max). ?place spatial:intersectBox (latitude_min, longitude_min, latitude_max, longitude_max [, limit]) Query for the ?place intersecting the box area of (latitude_min, longitude_min, latitude_max, longitude_max). ?place spatial:north (latitude, longitude [, limit]) Query for the ?place northing the location of (latitude, longitude). ?place spatial:south (latitude, longitude [, limit]) Query for the ?place southing the location of (latitude, longitude). ?place spatial:west (latitude, longitude [, limit]) Query for the ?place westing the location of (latitude, longitude). ?place spatial:east (latitude, longitude [, limit]) Query for the ?place easting the location of (latitude, longitude). See ESRIs docs on spatial relations\nSpatial Dataset Assembler The usual way to describe an index is with a Jena assembler description. Configurations can also be built with code. The assembler describes a \u0026ldquo;spatial dataset\u0026rdquo; which has an underlying RDF dataset and a spatial index. The spatial index describes the spatial index technology (Lucene or Solr) and the details needed for each.\nA spatial index has an EntityDefinition which defines the properties to index, the name of the lucene/solr field used for storing the URI itself (e.g. \u0026ldquo;entityField\u0026rdquo;) and its geo information (e.g. 
latitude/longitude as \u0026ldquo;geoField\u0026rdquo;), and the custom geo predicates.\nFor common RDF spatial query, only \u0026ldquo;entityField\u0026rdquo; and \u0026ldquo;geoField\u0026rdquo; are required with the builtin geo predicates working well. More complex setups, with multiple custom geo predicates besides the two fields are possible. You also optionally use JtsSpatialContextFactory to support indexing WKT literals.\nOnce setup this way, any data added to the spatial dataset is automatically indexed as well.\nThe following is an example of a TDB dataset with a spatial index.\n## Example of a TDB dataset and spatial index PREFIX : \u0026lt;http://localhost/jena_example/#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; PREFIX spatial: \u0026lt;http://jena.apache.org/spatial#\u0026gt; ## --------------------------------------------------------------- ## This URI must be fixed - it's used to assemble the spatial dataset. :spatial_dataset rdf:type spatial:SpatialDataset ; spatial:dataset \u0026lt;#dataset\u0026gt; ; ##spatial:index \u0026lt;#indexSolr\u0026gt; ; spatial:index \u0026lt;#indexLucene\u0026gt; ; . \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;--mem--\u0026quot; ; tdb:unionDefaultGraph true ; . \u0026lt;#indexLucene\u0026gt; a spatial:SpatialIndexLucene ; #spatial:directory \u0026lt;file:Lucene\u0026gt; ; spatial:directory \u0026quot;mem\u0026quot; ; spatial:definition \u0026lt;#definition\u0026gt; ; . \u0026lt;#definition\u0026gt; a spatial:EntityDefinition ; spatial:entityField \u0026quot;uri\u0026quot; ; spatial:geoField \u0026quot;geo\u0026quot; ; # custom geo predicates for 1) Latitude/Longitude Format spatial:hasSpatialPredicatePairs ( [ spatial:latitude :latitude_1 ; spatial:longitude :longitude_1 ] [ spatial:latitude :latitude_2 ; spatial:longitude :longitude_2 ] ) ; # custom geo predicates for 2) Well Known Text (WKT) Literal spatial:hasWKTPredicates (:wkt_1 :wkt_2) ; # custom SpatialContextFactory for 2) Well Known Text (WKT) Literal spatial:spatialContextFactory \u0026quot;com.spatial4j.core.context.jts.JtsSpatialContextFactory\u0026quot; . then use code such as:\nDataset spatialDataset = DatasetFactory.assemble( \u0026quot;spatial-config.ttl\u0026quot;, \u0026quot;http://localhost/jena_example/#spatial_dataset\u0026quot;) ; Key here is that the assembler contains two dataset definitions, one for the spatial dataset, one for the base data. Therefore, the application needs to identify the text dataset by its URI \u0026lsquo;http://localhost/jena_example/#spatial_dataset\u0026rsquo;.\nWorking with Solr Besides Lucene, jena-spatial can work with Solr for spatial query, powered by Lucene / Solr 4 Spatial and Solrj.\nIt\u0026rsquo;s required to add the field definitions for \u0026ldquo;entityField\u0026rdquo; and \u0026ldquo;geoField\u0026rdquo; respectively in schema.xml of Solr. The names of the fields in EntityDefinition should be in accordance with those in schema.xml. 
Here is an example defining the names of \u0026ldquo;entityField\u0026rdquo; as \u0026ldquo;uri\u0026rdquo; and \u0026ldquo;geoField\u0026rdquo; as \u0026ldquo;geo\u0026rdquo;:\n\u0026lt;field name=\u0026quot;uri\u0026quot; type=\u0026quot;string\u0026quot; indexed=\u0026quot;true\u0026quot; stored=\u0026quot;true\u0026quot; required=\u0026quot;true\u0026quot; multiValued=\u0026quot;false\u0026quot; /\u0026gt; \u0026lt;field name=\u0026quot;geo\u0026quot; type=\u0026quot;location_rpt\u0026quot; indexed=\u0026quot;true\u0026quot; stored=\u0026quot;true\u0026quot; multiValued=\u0026quot;true\u0026quot; /\u0026gt; The fieldType of \u0026ldquo;entityField\u0026rdquo; is string, while that of \u0026ldquo;geoField\u0026rdquo; is location_rpt:\n\u0026lt;fieldType name=\u0026quot;string\u0026quot; class=\u0026quot;solr.StrField\u0026quot; sortMissingLast=\u0026quot;true\u0026quot; /\u0026gt; \u0026lt;fieldType name=\u0026quot;location_rpt\u0026quot; class=\u0026quot;solr.SpatialRecursivePrefixTreeFieldType\u0026quot; geo=\u0026quot;true\u0026quot; distErrPct=\u0026quot;0.025\u0026quot; maxDistErr=\u0026quot;0.000009\u0026quot; units=\u0026quot;degrees\u0026quot; /\u0026gt; Additionally, in solrconfig.xml, there should be 2 requestHandlers defined for querying and updating the spatial data and the index.\n\u0026lt;requestHandler name=\u0026quot;/select\u0026quot; class=\u0026quot;solr.SearchHandler\u0026quot;\u0026gt;\u0026lt;/requestHandler\u0026gt; \u0026lt;requestHandler name=\u0026quot;/update\u0026quot; class=\u0026quot;solr.UpdateRequestHandler\u0026quot;\u0026gt;\u0026lt;/requestHandler\u0026gt; The above is the least required configuration to run jena-spatial in Solr. For more information about the configuration, please check the Lucene / Solr 4 Spatial documentation.\nThere are also some demonstrations of the usage of Solr in the unit tests of jena-spatial. They use a EmbeddedSolrServerwith a SOLR_HOME sample here.\nWorking with Fuseki The Fuseki configuration simply points to the spatial dataset as the fuseki:dataset of the service.\n\u0026lt;#service_spatial_tdb\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026quot;TDB/spatial service\u0026quot; ; fuseki:name \u0026quot;ds\u0026quot; ; fuseki:serviceQuery \u0026quot;query\u0026quot; ; fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; fuseki:serviceUpdate \u0026quot;update\u0026quot; ; fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; fuseki:dataset :spatial_dataset ; Building a Spatial Index When working at scale, or when preparing a published, read-only, SPARQL service, creating the index by loading the spatial dataset is impractical. The index and the dataset can be built using command line tools in two steps: first load the RDF data, second create an index from the existing RDF dataset.\nBuild the TDB dataset:\njava -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --tdb=assembler_file data_file using the copy of TDB included with Fuseki. Alternatively, use one of the TDB utilities tdbloader or tdbloader2:\n$JENA_HOME/bin/tdbloader --loc=directory data_file then build the spatial index with the jena.spatialindexer:\njava -cp jena-spatial.jar jena.spatialindexer --desc=assembler_file ","permalink":"https://jena.apache.org/documentation/query/spatial-query-doc.html","tags":null,"title":"Spatial searches with SPARQL"},{"categories":null,"contents":" The Jena Spatial module has been retired. The last release of Jena with jena-spatial was Jena 3.12.0. 
See jena-spatial/README.md. The original documentation is here\n","permalink":"https://jena.apache.org/documentation/query/spatial-query.html","tags":null,"title":"Spatial searches with SPARQL"},{"categories":null,"contents":"The Jena IRI Library is an implementation of RFC 3987 (IRI) and RFC 3986 (URI), and a partial implementation of other related standards. It is incomplete.\nJavadoc The IRI Library Javadoc (Public APIs)\nThe most important parts of the Javadoc are:\nViolationCodes Gives the relationships between the error codes and the specifications.\nIRI Gives the main interface for IRIs.\nIRIFactory Gives the main class for creating IRIs, including specifying which specifications you wish to be using, and with what degree of force.\nMinimal Documentation Unfortunately this version of the IRI Library has badly incomplete documentation; any help in producing good documentation would be appreciated.\nThe current version is incomplete with little indication as to where. It is primarily intended to support the functionality of checking strings against any of the various IRI or URI specifications. Some support for different levels of checking is provided.\nThese instructions are from a mail message on the jena-dev mailing list.\nSummary: ======= use something like: import org.apache.jena.iri.*; static IRIFactory iriFactory = IRIFactory.semanticWebImplementation(); ... boolean includeWarnings = false; IRI iri; iri = iriFactory.create(iriString); // always works if (iri.hasViolation(includeWarnings)) { // bad iri code } ... Since you are taking IRI rules seriously, you may want to have includeWarnings = true in the above. Full version ============ The code is found in the iri.jar, which is not particularly well documented, and the source and documentation is in the separate iri download, from the Jena download area. As shown, you start by building an IRIFactory org.apache.jena.iri.IRIFactory this embodies some set of rules, against which you will check an IRI. The one we use is: IRIFactory.jenaImplementation() For use by Jena team only. This method reflects the current IRI support in Jena, which is a moving target at present. (actually it hasn't ever moved - the main issue is to do with file: IRIs - we definitely want to be more liberal than a conservative reading of the specs allow, because, e.g. filenames with spaces in happen, and because file uris like file:localFile which aren't particularly conformant, also happen). Others, that allow you to control which specs you are checking against, are: IRIFactory.iriImplementation() RFC 3987 IRIFactory.uriImplementation() RFC 3986 (US-ASCII only) IRIFactory.semanticWebImplementation() This factory is a conservative implementation appropriate for Semantic Web applications. Having got your factory, you then convert a string into an IRI in one of two ways, depending on how you want to handle errors: e.g. IRI iri; try { iri = iriFactory.construct(iriString); } catch (IRIException e) { // bad iri code } or boolean includeWarnings = false; IRI iri; iri = iriFactory.create(iriString); // always works if (iri.hasViolation(includeWarnings)) { // bad iri code // e.g. Iterator it = iri.violations(includeWarnings); while (it.hasNext()) { Violation v = (Violation) it.next(); // do something: printErrorMessages(v); } } The various warning and error conditions are listed in the Javadoc for ViolationCodes (in the iri download). An error is a MUST force statement from the spec; a warning corresponds to a SHOULD force statement from the spec. 
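For completeness, here is one possible shape for the printErrorMessages helper used in the loop above. This is only a minimal sketch: it assumes the Violation accessors isError(), codeName() and getShortMessage() (check the Violation Javadoc for the exact method set), and the example IRI is the example.org one discussed in the next paragraph.

import java.util.Iterator;
import org.apache.jena.iri.IRI;
import org.apache.jena.iri.IRIFactory;
import org.apache.jena.iri.Violation;

public class IriCheck {
    static final IRIFactory iriFactory = IRIFactory.semanticWebImplementation();

    // MUST-level problems report as errors, SHOULD-level problems as warnings.
    static void printErrorMessages(Violation v) {
        String level = v.isError() ? "ERROR" : "WARNING";
        System.out.println(level + " [" + v.codeName() + "] " + v.getShortMessage());
    }

    public static void main(String[] args) {
        boolean includeWarnings = true;   // also report SHOULD-level violations
        // Port 80 is the default for http, so this should only raise a warning.
        IRI iri = iriFactory.create("http://example.org:80/foo");
        Iterator it = iri.violations(includeWarnings);
        while (it.hasNext()) {
            printErrorMessages((Violation) it.next());
        }
    }
}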
There is also some support for 'minting' violations, which provide a stricter level of checking for IRIs that you are generating, as opposed to IRIs that have been passed to your application from elsewhere. So that, if I remember correctly: http://example.org:80/foo raises a warning with code DEFAULT_PORT_SHOULD_BE_OMITTED Like this one, many of the SHOULD force statements help avoid having two different IRIs that have the same operational semantics. Each spec is implemented as some set of active error and warning codes, so depending on which factory you chose in the first place, you may get a different collection of spec violations, some with SHOULD force and some with MUST force. There are also potentially warnings associated with security issues like IRI spoofing, which may not strictly violate any SHOULDs in any spec. ","permalink":"https://jena.apache.org/documentation/notes/iri.html","tags":null,"title":"Support for Internationalised Resource Identifiers in Jena"},{"categories":null,"contents":"ARQ supports the functions and operators from \u0026ldquo;XQuery 1.0 and XPath 2.0 Functions and Operators v3.1\u0026rdquo;.\nARQ supports all the XSD atomic datatypes.\nThe prefix fn is \u0026lt;http://www.w3.org/2005/xpath-functions#\u0026gt; (the XPath and XQuery function namespace).\nThe prefix math is \u0026lt;http://www.w3.org/2005/xpath-functions/math#\u0026gt;\nTo check the exact registrations for a specific version, see function/StandardFunctions.java in the source code for that version.\nThe supported datatypes (including those required by SPARQL 1.1), including full operator support, are the XSD atomic datatypes except for XML-related ones. Sequences are not supported.\nxsd:string, xsd:boolean, xsd:decimal, xsd:integer, xsd:double, xsd:float, xsd:double\nxsd:long, xsd:int, xsd:short, xsd:byte, xsd:nonPositiveInteger, xsd:negativeInteger, xsd:nonNegativeInteger, xsd:positiveInteger, xsd:unsignedLong, xsd:unsignedInt, xsd:unsignedShort\nxsd:duration, xsd:dayTimeDuration, xsd:yearMonthDuration\nxsd:anyURI\nxsd:dateTime, xsd:dateTimeStamp, xsd:date, xsd:time xsd:gYear, xsd:gYearMonth, xsd:gMonth, xsd:gMonthDay, xsd:gDay\nFunctions on atomic types not currently supported are list below (but check for later additions).\nSupported functions:\nfn:concat, fn:substring, fn:string-length, fn:upper-case, fn:lower-case, fn:contains, fn:starts-with, fn:ends-with, fn:substring-before, fn:substring-after, fn:matches, fn:replace, fn:abs, fn:ceiling, fn:floor, fn:round, fn:encode-for-uri,\nfn:year-from-dateTime, fn:month-from-dateTime, fn:day-from-dateTime, fn:hours-from-dateTime, fn:minutes-from-dateTime, fn:seconds-from-dateTime, fn:timezone-from-dateTime, fn:years-from-duration, fn:months-from-duration, fn:days-from-duration, fn:hours-from-duration, fn:minutes-from-duration, fn:seconds-from-duration,\nfn:boolean, fn:not, fn:normalize-space, fn:normalize-unicode, fn:format-number, fn:round-half-to-even,\nmath:pi, math:exp, math:exp10, math:log, math:log10, math:pow, math:sqrt, math:sin, math:cos, math:tan, math:asin, math:acos, math:atan, math:atan2\nF\u0026amp;O Functions not currently supported: fn:format-dateTime, fn:format-date, fn:format-time.\n","permalink":"https://jena.apache.org/documentation/query/xsd-support.html","tags":null,"title":"Support for XSD Datatype and XQuery/Xpath Functions and Operations."},{"categories":null,"contents":"RDF-star is an extension to RDF that provides a way for one triple to refer to another triple. 
RDF* is the name of the original work which is described in Olaf Hartig\u0026rsquo;s blog entry.\nExample:\n\u0026lt;\u0026lt; :john foaf:name \u0026#34;John Smith\u0026#34; \u0026gt;\u0026gt; dct:source \u0026lt;http://example/directory\u0026gt; . The part \u0026lt;\u0026lt; :john foaf:name \u0026quot;John Smith\u0026quot; \u0026gt;\u0026gt; is a quoted triple and refers to the triple with subject :john, property foaf:name and object \u0026quot;John Smith\u0026quot;.\nTriple terms can be in the subject or object position.\nJena provides support for RDF-star and the related SPARQL-star.\nTurtle, N-Triples, TriG and N-Quads extended for Triple Terms syntax, input and output. There is no output in RDF/XML. SPARQL extended with Triple Term syntax for graph matching. SPARQL Result formats for JSON and XML extended to support quoted triples in results. Support in the Model API. Translation to and from RDF reification. All this is active by default in Fuseki.\nThe aim is to follow the definition of the RDF-star community.\nStorage in databases TDB1 and TDB2 as well as in-memory databases is supported.\nRDF-star RDF-star syntax for quoted triples is added to the parsers for Turtle, N-Triples, TriG and N-Quads.\nDatasets may have graphs that have quoted triples that refer to triples anywhere, not just in the same graph.\nSPARQL-star Matches for quoted triples:\nSELECT ?name { \u0026lt;\u0026lt;:john foaf:name ?name \u0026gt;\u0026gt; dct:source \u0026lt;http://example/directory\u0026gt; } Insert triples terms into the default graph to record the graph source.\nINSERT { \u0026lt;\u0026lt;?s ?p ?o\u0026gt;\u0026gt; dct:source \u0026lt;http://example/directory\u0026gt; } WHERE { GRAPH \u0026lt;http://example/directory\u0026gt; { ?s ?p ?o } } Use in expressions:\nSELECT ?t { ?s ?p ?o BIND(\u0026lt;\u0026lt; ?s ?p ?o\u0026gt;\u0026gt; AS ?t) } SELECT (\u0026lt;\u0026lt; ?s ?p ?o\u0026gt;\u0026gt; AS ?t) { ?s ?p ?o } SPARQL Functions related to quoted triples These functions cause an expression error if passed the wrong type of arguments.\nFunction Description TRIPLE(?s, ?p, ?o) Create a quoted triple from s/p/o isTRIPLE(?t) Return true if the argument value is a quoted triple SUBJECT(?t) Return the subject of the quoted triple PREDICATE(?t) Return the predicate (property) of the quoted triple OBJECT(?t) Return the object of the quoted triple SPARQL results The syntaxes for SPARQL results from a SELECT query, application/sparql-results+json, application/sparql-results+xml are extended to include quoted triples:\nThe quoted triple \u0026lt;\u0026lt; _:b0 \u0026lt;http://example/p\u0026gt; 123 \u0026gt;\u0026gt; is encoded, in application/sparql-results+json as:\n{ \u0026#34;type\u0026#34;: \u0026#34;triple\u0026#34; , \u0026#34;value\u0026#34;: { \u0026#34;subject\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;bnode\u0026#34; , \u0026#34;value\u0026#34;: \u0026#34;b0\u0026#34; } , \u0026#34;predicate\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;uri\u0026#34; , \u0026#34;value\u0026#34;: \u0026#34;http://example/p\u0026#34; } , \u0026#34;object\u0026#34;: { \u0026#34;type\u0026#34;: \u0026#34;literal\u0026#34; , \u0026#34;datatype\u0026#34;: \u0026#34;http://www.w3.org/2001/XMLSchema#integer\u0026#34; , \u0026#34;value\u0026#34;: \u0026#34;123\u0026#34; } } and similarly in application/sparql-results+xml:\n\u0026lt;triple\u0026gt; \u0026lt;subject\u0026gt; \u0026lt;bnode\u0026gt;b0\u0026lt;/bnode\u0026gt; \u0026lt;/subject\u0026gt; \u0026lt;predicate\u0026gt; 
\u0026lt;uri\u0026gt;http://example/p\u0026lt;/uri\u0026gt; \u0026lt;/predicate\u0026gt; \u0026lt;object\u0026gt; \u0026lt;literal datatype=\u0026#34;http://www.w3.org/2001/XMLSchema#integer\u0026#34;\u0026gt;123\u0026lt;/literal\u0026gt; \u0026lt;/object\u0026gt; \u0026lt;/triple\u0026gt; Model API RDF-star quoted triples are treated as Resource to preserve the typed Model API. They occur in the subject and object positions.\nA Resource contains a Statement object if the underlying RDF term is an RDF-star quoted triple.\nNew methods include:\nStatement Resource.getStatement() Resource Model.createResource(Statement) Resource ResourceFactory.createStatement Reification org.apache.jena.system.RDFStar provides functions to translate RDF-star into RDF reification, and translate it back again to RDF-star.\nTranslating back to RDF-star relies on the consistency constraint that there is only one reification for each unique quoted triple term.\n","permalink":"https://jena.apache.org/documentation/rdf-star/","tags":null,"title":"Support of RDF-star"},{"categories":null,"contents":"Writing a Support Request A good support request or bug report is clear and concise. For ARQ, reduce the problem to a short dataset (20 triples should be enough; Turtle, N3 or N-triples preferred) and a query that illustrates the problem. State what you expected to happen as well as what did happen.\nIt also a good idea to check the documentation.\nARQ FAQ ARQ Documentation Parse Errors The SPARQL parser outputs a line and column number - it is usually correct in identifying the first point of a syntax error.\nExecution failure If you are reporting a failure that produces a stack trace, please include the message, exception and the stack trace. Only if it is very long, should you truncate the trace but please include the whole trace to the point where it indicates it is entering ARQ (a package name starting org.apache.jena.query) and then one level which is your code.\nUnexpected results If you are getting unexpected results, show the results you expect as well as what you actually get.\nIf you are getting no results, or less than you expected, try cutting parts out of the query until something changes.\nThere are various formatters for result sets included in ARQ. Print your results in text form if possible. There is a testing format used by the Data Access Working Group and that is used for the scripted tests in the distribution.\nReports A bug report or support request should be complete and minimal. It helps you to develop a concise description of the problem - often, you will discover the solution yourself.\nComplete means that any query included should include prefixes and the whole syntax string, not a fragment. Any program code should be ready to run, not a fragment of a large program.\nMinimal means that the data sent should be an abbreviated selection to illustrate the point.\nTypically, any program will be less that 20 lines, and any data less than 20 triples.\nThe report should also include details of the execution environment:\nEnvironment:\nARQ version Java version Operating system What\u0026rsquo;s the CLASSPATH? How are you running the query?\nAre you running in an application server? Which one? Have you used the command line tools? Data:\nDoes your data parse as RDF? Are you querying an inference model? Query:\nHave you printed out the query exactly as it is (especially if the string has been assembled in by java code)? It is a good idea to print the query out after building it programmatically. 
Have you passed the query through the SPARQL syntax checker? java -cp ... arq.qparse 'your query' Bug reports should be sent to the mailto:users@jena.apache.org (you need to subscribe to send to this list).\nARQ documentation index\n","permalink":"https://jena.apache.org/documentation/query/support_request.html","tags":null,"title":"Support Request"},{"categories":null,"contents":"TDB is a component of Jena for RDF storage and query. It supports the full range of Jena APIs. TDB can be used as a high performance RDF store on a single machine. This documentation describes the latest version, unless otherwise noted.\nThis is the documentation for the current standard version of TDB. This is also called TDB1 to distinguish it from the next generation version TDB2\nTDB1 and TDB2 databases are not compatible.\nA TDB store can be accessed and managed with the provided command line scripts and via the Jena API. When accessed using transactions a TDB dataset is protected against corruption, unexpected process terminations and system crashes.\nA TDB dataset should only be directly accessed from a single JVM at a time otherwise data corruption may occur. From 1.1.0 onwards TDB includes automatic protection against multi-JVM usage which prevents this under most circumstances.\nIf you wish to share a TDB dataset between multiple applications please use our Fuseki component which provides a SPARQL server that can use TDB for persistent storage and provides the SPARQL protocols for query, update and REST update over HTTP.\nDocumentation Using TDB from Java through the API Command line utilities Transactions Assemblers for Graphs and Datasets Datasets and Named Graphs Dynamic Datasets: Query a subset of the named graphs Quad filtering: Hide information in the dataset The TDB Optimizer TDB Configuration Value Canonicalization TDB Design Use on 64 bit or 32 bit Java systems FAQs ","permalink":"https://jena.apache.org/documentation/tdb/","tags":null,"title":"TDB"},{"categories":null,"contents":"TDB (as of version Jena 3.0.0) supports configuration of the databases when they are first created and each time an application connects to an existing database. Databases using the default settings built-into TDB continue to work exactly as before.\nSetting Store Parameters In TDB, there is exactly one internal object for each dataset in the JVM and this is shared between all application datasets for that location of persistent storage.\nSetting store parameters is done by setting the internal system state before any other access to the disk area occurs. It is not possible to have different setups for the same dataset on disk.\nStoreParams are set by populating the internal state with the setup before an application level dataset is created.\nTDBFactory.setup(Location location, StoreParams params) This must be called before any application calls to get a Dataset (or DatasetGraph) object otherwise IllegalStateException is thrown by this function.\nLocation location = ... ; StoreParams customParams = ... ; TDBFactory.setup(location, customParams) ; Dataset ds = TDBFactory.createDataset(location) ; ... It is only possible to change store parameters by expelling the managed storage by calling TDBFactory.release(Location). This drops all caching. Access to the dataset is then a cold start.\nPer-connect Options The per-connect options are the ones that can be changed after the database has been created and can be different each time the application attaches to the database. 
A database can have at most one JVM attached to it (see Fuseki to share a database).\nThese options do not affect the on-disk structures.\nJSON key name Default value Notes tdb.file_mode See below tdb.node2nodeid_cache_size 100,000 50,000 on 32 bit java tdb.nodeid2node_cache_size 500,000 50,000 on 32 bit java tdb.node_miss_cache_size 100 tdb.block_read_cache_size 10000 Only in direct mode tdb.block_write_cache_size 2000 Only in direct mode File access - \u0026ldquo;mapped\u0026rdquo; and \u0026ldquo;direct\u0026rdquo; modes TDB has two modes of operation for accessing block files - \u0026ldquo;mapped\u0026rdquo; and \u0026ldquo;direct\u0026rdquo;.\n\u0026ldquo;mapped\u0026rdquo; uses memory mapped files and so the operating system is managing caching, flexing the amount of memory for file system cache to balance demands from other programmes on the same hardware.\n\u0026ldquo;direct\u0026rdquo; using TDB\u0026rsquo;s own in-heap block caching. It avoids the problem that addressing is limited to a total of about 1.5Gbytes on 32 bit Java.\nBy default, TDB uses memory mapped files on 64 bit Java and its own file caching on 32 bit java.\nOn Microsoft Windows, \u0026ldquo;mapped\u0026rdquo; databases can not be deleted while the JVM is running on MS Windows. This is a known issue with Java.\nTDB databases are compatible across these file modes. There is no difference to the file layouts. Memory mapped files may appear larger because they contain unused space. Some utilities report this in file size, some do not.\nCaching options. These are the useful tuning options. Only the node* choices have any effect when running in \u0026ldquo;mapped\u0026rdquo; mode.\nAll these options effect the amount of heap used. The block read/write cache sizes are tuned to 32 bit Java.\nIncreasing the Node/NodeId cache sizes on 64 bit machines may be beneficial.\nStatic Options While it is possible to customize a database, this is considered to be experimental. It is possible to corrupt, unrecoverable, existing databases and create nonsense databases with inappropriate settings. It will be useful in very few real situations. Not all combinations of index choices will work. Only the standard layout is supported; alternative schemes are for experimentation only.\nBlock Size The block size can not be changed once a database has been created.\nWhile the code attempts to detect block size mismatches, in order to retain compatibility with existing database, the testing can not be perfect. If undetected, any update will permanently and irrecoverably damage the database.\nStore Parameters File Format JSON is used for the on-disk record of store parameters, see the example below. 
Unspecified options default to those of the running setup.\nThese are default settings for a 64 bit Java:\n{ \u0026#34;tdb.file_mode\u0026#34; : \u0026#34;mapped\u0026#34; , \u0026#34;tdb.block_size\u0026#34; : 8192 , \u0026#34;tdb.block_read_cache_size\u0026#34; : 10000 , \u0026#34;tdb.block_write_cache_size\u0026#34; : 2000 , \u0026#34;tdb.node2nodeid_cache_size\u0026#34; : 100000 , \u0026#34;tdb.nodeid2node_cache_size\u0026#34; : 500000 , \u0026#34;tdb.node_miss_cache_size\u0026#34; : 100 , \u0026#34;tdb.index_node2id\u0026#34; : \u0026#34;node2id\u0026#34; , \u0026#34;tdb.index_id2node\u0026#34; : \u0026#34;nodes\u0026#34; , \u0026#34;tdb.triple_index_primary\u0026#34; : \u0026#34;SPO\u0026#34; , \u0026#34;tdb.triple_indexes\u0026#34; : [ \u0026#34;SPO\u0026#34; , \u0026#34;POS\u0026#34; , \u0026#34;OSP\u0026#34; ] , \u0026#34;tdb.quad_index_primary\u0026#34; : \u0026#34;GSPO\u0026#34; , \u0026#34;tdb.quad_indexes\u0026#34; : [ \u0026#34;GSPO\u0026#34; , \u0026#34;GPOS\u0026#34; , \u0026#34;GOSP\u0026#34; , \u0026#34;POSG\u0026#34; , \u0026#34;OSPG\u0026#34; , \u0026#34;SPOG\u0026#34; ] , \u0026#34;tdb.prefix_index_primary\u0026#34; : \u0026#34;GPU\u0026#34; , \u0026#34;tdb.prefix_indexes\u0026#34; : [ \u0026#34;GPU\u0026#34; ] , \u0026#34;tdb.file_prefix_index\u0026#34; : \u0026#34;prefixIdx\u0026#34; , \u0026#34;tdb.file_prefix_nodeid\u0026#34; : \u0026#34;prefix2id\u0026#34; , \u0026#34;tdb.file_prefix_id2node\u0026#34; : \u0026#34;prefixes\u0026#34; } Choosing the store parameters This is the policy applied when creating or reattaching to a database.\nIf the database location has a parameter file, tdb.cfg, then use that. This is modified by any dynamic options supplied by the application. So to create a specialized database, one way to do that is to create an empty directory and put a tdb.cfg in place.\nIf there is no parameter file and this is a new database, use the application provided store parameters, or if there are no application provided parameters, use the system default parameters. If application supplied parameters are used, write a tdb.cfg file.\nFinally, if this is an existing database, with no tdb.cfg, use the system default modified by any application parameters.\nIn other words, if there is no tdb.cfg assume the system defaults, except when creating a database.\nModification involves taking one set of store parameters and applying any dynamic parameters set in the second set. Only explicitly set dynamic parameters modify the original.\n","permalink":"https://jena.apache.org/documentation/tdb/store-parameters.html","tags":null,"title":"TDB - Store Parameters"},{"categories":null,"contents":"This page gives an overview of the TDB architecture. It applies to TDB1 and TDB2 with differences noted.\nTerminology Terms like \u0026ldquo;table\u0026rdquo; and \u0026ldquo;index\u0026rdquo; are used in this description. They don\u0026rsquo;t directly correspond to concepts in SQL. For example, in SQL terms, there is no triple table; that can be seen as just having indexes for the table or, alternatively, there are 3 tables, each of which has a primary key and TDB manages the relationship between them.\nDesign A dataset backed by TDB is stored in a single directory in the filing system. A dataset consists of\nThe node table Triple and Quad indexes The prefixes table The Node Table The node table stores the representation of RDF terms (except for inlined values - see below). 
It provides two mappings: from Node to NodeId and from NodeId to Node.\nThe Node to NodeId mapping is used during data loading and when converting constant terms in queries from their Jena Node representation to the TDB-specific internal ids.\nThe NodeId to Node mapping is used to turn query results expressed as TDB NodeIds into the Jena Node representation and also during query processing when filters are applied if the whole node representation is needed for testing (e.g. regex).\nNode table implementations usually provide a large cache - the NodeId to Node mapping is heavily used in query processing yet the same NodeId can appear in many query results.\nNodeIds are 8 byte quantities. The Node to NodeId mapping is based on a hash of the Node (a 128 bit MD5 hash - the length was found not to be a major performance factor).\nThe default storage of the node table is a sequential access file for the NodeId to Node mapping and a B+Tree for the Node to NodeId mapping.\nTriple and Quad indexes Quads are used for named graphs, triples for the default graph. Triples are held as 3-tuples of NodeIds in triple indexes - quads as 4-tuples. Otherwise they are handled in the same manner.\nThe triple table is 3 indexes - there is no distinguished triple table with secondary indexes. Instead, each index has all the information about a triple.\nThe default storage of each index is a B+Tree.\nPrefixes Table The prefixes table uses a node table and an index for GPU (Graph-\u0026gt;Prefix-\u0026gt;URI). It is usually small. It does not take part in query processing. It provides support for Jena\u0026rsquo;s PrefixMappings used mainly for presentation and for serialisation of triples in RDF/XML or Turtle.\nTDB B+Trees Many of the persistent data structures in TDB use a custom implementation of B+Trees. The TDB implementation only provides for fixed length key and fixed length value. There is no use of the value part in triple and quads indexes.\nTransactions Both TDB1 and TDB2 provide database transactions. The API is described on the Jena Transactions page.\nWhen running with transactions, TDB1 and TDB2 provide support for multiple read and write transactions without application involvement. There can be multiple readers active, and also a single writer active (referred to as \u0026ldquo;MR+SW\u0026rdquo;). TDB itself manages multiple writers, queuing them as necessary.\nTo support transactions, TDB2 uses copy-on-write MVCC data structures internally.\nTDB1 can run non-transactionally but the application is responsible for ensuring that there is one writer or several readers, not both. This is referred to as \u0026ldquo;MRSW\u0026rdquo;. Misuse of TDB1 in non-transactional mode can corrupt the database.\nInline values Values of certain datatypes are held as part of the NodeId. The top bit indicates whether the remaining 63 bits are a position in the stored RDF terms file (high bit is 0) or an encoded value (high bit 1).\nBy storing the value, the exact lexical form is not recorded. The integers 01 and 1 will both be treated as the value 1.\nTDB2 The TDB2 encoding is as follows:\nHigh bit (bit 63) 0 means the node is in the object table (PTR). High bit (bit 63) 1, bit 62 1: double as 62 bits. High bit (bit 63) 1, bit 62 0: 6 bits of type, 56 bits of value. If a value would not fit, it will be stored externally so there is no guarantee that all integers, say, are stored inline.\nInteger format: signed 56 bit number, the type field has the XSD type. Derived types of integer, each with their own datatype. 
Decimal format: 8 bits scale, 48bits of signed valued. Date and DateTime Boolean Float In the case of xsd:double, the standard Java 64 bit format is used except that the range of the exponent is reduced by 2 bits.\nbit 63 : sign bit bits 52-62 : exponent, 11 bits, the power of 2, bias -1023. bits 0-51 : mantissa (significand) 52 bits (the leading one is not stored). Exponents are 11 bits, with values -1022 to +1023 held as 1 to 2046 (11 bits, bias -1023) Exponents 0x000 and 0x7ff have a special meaning:\nThe xsd:dateTime and xsd:date ranges cover about 8000 years from year zero with a precision down to 1 millisecond. Timezone information is retained to an accuracy of 15 minutes with special timezones for Z and for no explicit timezone.\nTDB1 The value spaces handled are: xsd:decimal, xsd:integer, xsd:dateTime, xsd:date and xsd:boolean. Each has its own encoding to fit in 56 bits. If a node falls outside of the range of values that can be represented in the 56 bit encoding.\nThe xsd:dateTime and xsd:date ranges cover about 8000 years from year zero with a precision down to 1 millisecond. Timezone information is retained to an accuracy of 15 minutes with special timezones for Z and for no explicit timezone.\nDerived XSD datatypes are held as their base type. The exact datatype is not retained; the value of the RDF term is. An input of xsd:int will become xsd:integer.\nQuery Processing TDB uses quad-execution rewriting SPARQL algebra (graph...) to blocks of quads where possible. It extends OpExecutor. TDB provides low level optimization of basic graph patterns using a statistics based optimizer.\nCaching on 32 and 64 bit Java systems TDB runs on both 32-bit and 64-bit Java Virtual Machines. A 64-bit Java Virtual Machine is the normal mode of use. The same file formats are used on both systems and database files can be transferred between architectures (no TDB system should be running for the database at the time of copy). What differs is the file access mechanism used.\nThe node table caches are always in the Java heap but otherwise the OS file system plays an important part in index caching.\nThe file access mechanism can be set explicitly, but this is not a good idea for production usage, only for experimentation - see the File Access mode option.\nOn 64-bit Java, TDB uses memory mapped files, accessed 8M segments, and the operating system handles caching between RAM and disk. The amount of RAM used for file caching increases and decreases as other application run on the machine. The fewer other programs running on the machine, the more RAM will be available for file caching. The mapped address space counts as part of the application processes memory usage but this space is not part of the Java heap.\nOn a 32 bit JVM, this approach does not work because Java addressing is limited to about 1.5Gbytes (the exact figure is JVM specific and includes any memory mapped file usage) and this would limit the size of TDB datasets. Instead, TDB provides an in-heap LRU cache of B+Tree blocks. Applications should set the JVM heap to 1G or above (within the JVM specific limit).\nOn 32-bit Java, TDB uses its own file caching to enable large databases. 32-bit Java limits the address space of the JVM to about 1.5Gbytes (the exact size is JVM-dependent), and this includes memory mapped files, even though they are not in the Java heap. 
The JVM heap size may need to be increased to make space for the disk caches used by TDB.\n","permalink":"https://jena.apache.org/documentation/tdb/architecture.html","tags":null,"title":"TDB Architecture"},{"categories":null,"contents":"Assemblers are a general mechanism in Jena to describe objects to be built, often these objects are models and datasets. Assemblers are used heavily in Fuseki for dataset and model descriptions, for example.\nSPARQL queries operate over an RDF dataset, which is an unnamed, default graph and zero or more named graphs.\nHaving the description in a file means that the data that the application is going to work on can be changed without changing the program code.\nDataset This is needed for use in Fuseki.\nA dataset can be constructed in an assembler file:\nPREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; . Only one dataset can be stored in a location (filing system directory).\nThe first section declares the prefixes used later:\nPREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; then there is the description of the TDB dataset itself:\n\u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; The property tdb:location gives the file name as a string. It is relative to the applications current working directory, not where the assembler file is read from.\nThe dataset description is usually found by looking for the one subject with type tdb:GraphDataset. If more than one graph is given in a single file, the application will have to specify which description it wishes to use.\nUnion Default Graph An assembler can specify that the default graph for query is the union of the named graphs. This is done by adding tdb:unionDefaultGraph.\n\u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; tdb:unionDefaultGraph true ; . Graph TDB always stores data in an RDF dataset. It is possible to use just one of the graphs from the dataset. A common way of working with one graph is to use the default graph of the dataset.\nA single graph from a TDB dataset can be described by:\nPREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; \u0026lt;#graph\u0026gt; rdf:type tdb:GraphTDB ; tdb:dataset \u0026lt;#dataset\u0026gt; . A particular named graph in the dataset at a location can be assembled with:\n\u0026lt;#graphNamed\u0026gt; rdf:type tdb:GraphTDB ; tdb:dataset \u0026lt;#dataset\u0026gt; ; tdb:graphName \u0026lt;http://example/graph1\u0026gt; ; . 
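An assembler description like the ones above can then be loaded from Java code. The following is a minimal sketch only: it assumes the dataset description is saved as a file named tdb-assembler.ttl (the file name is illustrative) and uses the TDBFactory.assembleDataset convenience method; DatasetFactory.assemble can be used in the same way, as in the spatial example earlier.

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;

public class TDBAssemblerExample {
    public static void main(String[] args) {
        // Build the TDB-backed dataset from the assembler description
        // (the <#dataset> resource of type tdb:DatasetTDB).
        Dataset dataset = TDBFactory.assembleDataset("tdb-assembler.ttl");

        dataset.begin(ReadWrite.READ);
        try {
            // The default graph of the described dataset.
            Model model = dataset.getDefaultModel();
            System.out.println("Default graph size: " + model.size());
        } finally {
            dataset.end();
        }
    }
}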
Mixed Datasets It is possible to create a dataset with graphs backed by different storage subsystems, although query is not necessarily as efficient.\nTo include as a named graph in a dataset use vocabulary as shown below:\nPREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; # A dataset of one TDB-backed graph as the default graph and # an in-memory graph as a named graph. \u0026lt;#dataset\u0026gt; rdf:type ja:RDFDataset ; ja:defaultGraph \u0026lt;#graph\u0026gt; ; ja:namedGraph [ ja:graphName \u0026lt;http://example.org/name1\u0026gt; ; ja:graph \u0026lt;#graph2\u0026gt; ] ; . \u0026lt;#graph\u0026gt; rdf:type tdb:GraphTDB ; tdb:location \u0026quot;DB\u0026quot; ; . \u0026lt;#graph2\u0026gt; rdf:type ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:Data/books.n3\u0026gt; ] ; . Note here we added:\ntdb:DatasetTDB rdfs:subClassOf ja:RDFDataset . tdb:GraphTDB rdfs:subClassOf ja:Model . which provides for integration with complex model setups, such as reasoners.\nRDFS PREFIX tdb: \u0026lt;http://jena.hpl.hp.com/2008/tdb#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; tdb:Dataset a rdfs:Class . tdb:GraphTDB a rdfs:Class . tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset . tdb:GraphTDB rdfs:subClassOf ja:Model . tdb:location a rdf:Property ; # domain is tdb:Dataset or tdb:GraphTDB # The range is simple literal . tdb:unionDefaultGraph a rdf:Property ; rdfs:domain tdb:Dataset ; # The range is xsd:boolean . tdb:graphName a rdf:Property ; rdfs:domain tdb:GraphTDB ; # range is a URI . ","permalink":"https://jena.apache.org/documentation/tdb/assembler.html","tags":null,"title":"TDB Assembler"},{"categories":null,"contents":"Installation From Apache Jena version 2.7.x onwards, TDB is now installed as part of a single integrated Jena package. There is no longer a need to install a separate TDB package to run the TDB command line tools, or to use TDB in your Java programs. See the downloads page for details on getting the latest Jena release.\nScripts From the location The directory bin/ contains shell scripts to run the commands from the command line. The scripts are bash scripts which should work on Linux systems, Windows systems using Cygwin and Mac/OS systems. The directory bat/ contains Windows batch files which provide the same functionality for Windows systems that are not using Cygwin.\nScript set up The TDB tools are included in the jena toolset. 
See the command line tools page.\nCommand line script arguments Each command then has command-specific arguments described below.\nAll commands support --help to give details of named and positional arguments.\nThere are two equivalent forms of named argument syntax:\n--arg=val --arg val Setting options from the command line TDB has a number of configuration options which can be set from the command line using:\n--set tdb:symbol=value Using tdb: is really shorthand for the URI prefix http://jena.hpl.hp.com/TDB#, so the full URI form is\n--set http://jena.hpl.hp.com/TDB#symbol=value TDB Commands Store description TDB commands use an assembler description for the persistent store\n--desc=assembler.ttl --tdb=assembler.ttl or a direct reference to the directory with the index and node files:\n--loc=DIRECTORY --location=DIRECTORY The assembler description follows the form for a dataset given in the TDB assembler description page.\nIf neither assembler file nor location is given, --desc=tdb.ttl is assumed.\ntdbloader tdbloader --loc /path/for/database ...input files ... Input files can be any RDF syntax; triple formats (e.g. N-Triples, Turtle) are loaded into the default graph, quad formats (e.g. N-Quads, TriG) are loaded into the dataset according to the named graph, or into the default graph.\nBulk loader and index builder. Performs bulk load operations more efficiently than simply reading RDF into a TDB-backed model.\ntdb.xloader tdb1.xloader and tdb2.xloader are bulk loaders for very large data for TDB1 and TDB2.\nSee TDB xloader for more information. These loaders only work on Linux since they rely on some Unix system utilities.\ntdbquery Invoke a SPARQL query on a store. Use --time for timing information. The store is attached on each run of this command so timing includes some overhead not present in a running system.\nDetails about query execution can be obtained \u0026ndash; see notes on the TDB Optimizer.\ntdbdump Dump the store in N-Quads format.\ntdbstats Produce statistics for the dataset. See the TDB Optimizer description.\ntdbloader2 This has been replaced by TDB xloader.\nThis bulk loader can only be used to create a database. It may overwrite existing data. It requires the --loc argument and a list of files to load, e.g.\ntdbloader2 --loc /path/for/database input1.ttl input2.ttl ... Advanced tdbloader2 Usage There are various other advanced options available to customise the behaviour of the bulk loader. Run with --help to see the full usage summary.\nIt is possible to do builds in phases by using the tdbloader2data and tdbloader2index scripts separately, though this should only be used by advanced users. You can also do this by passing the --phase argument to the tdbloader2 script and specifying data or index as desired.\nThe indexing phase of the build uses the sort utility to prepare the raw data for indexing; this can potentially require large amounts of disk space and the scripts will automatically check and warn/abort if the disk space looks to be or is insufficient.\nIf you are building a large dataset (i.e. 
gigabytes of input data) you may wish to have the PipeViewer tool installed on your system as this will provide extra progress information during the indexing phase of the build.\n","permalink":"https://jena.apache.org/documentation/tdb/commands.html","tags":null,"title":"TDB Command-line Utilities"},{"categories":null,"contents":"There are a number of configuration options that affect the operation of TDB.\nSetting Options Options can be set globally, throughout the JVM, or on a per query execution basis. TDB uses the same mechanism as ARQ.\nThere is a global context, which is given to each query execution as it is created. Modifications to the global context after the query execution is created are not seen by the query execution. Modifications to the context of a single query execution do not affect any other query execution nor the global context.\nA context is a set of symbol/value pairs. Symbols are created internally to ARQ and TDB and accessed via Java constants. Values are any Java object, together with the values true and false, which are shorthand for the constants of class java.lang.Boolean.\nSetting globally:\nTDB.getContext().set(symbol, value) ; Per query execution:\ntry(QueryExecution qExec = QueryExecution.dataset(dataset) .query(query).set(ARQ.symLogExec,true).build() ) { .... } Setting options for a query execution happens before any query compilation or setup. Creation of a query execution object does not compile the query, which happens when the appropriate .exec method is called.\nSetting from the command line Options can also be set from the command line with --set.\nSetting with Java System properties (TDB 0.8.5 and later)\nOptions can be set when invoking the JVM using the Java system properties as set by \u0026ldquo;-D\u0026rdquo;.\nQuery of the union of named graphs See TDB/Datasets.\nLogging Query Execution If the symbol \u0026ldquo;tdb:logExec\u0026rdquo; is set to \u0026ldquo;true\u0026rdquo;, and also the logger org.apache.jena.tdb.exec is enabled from level \u0026ldquo;info\u0026rdquo;, then each basic graph pattern is logged before execution. The pattern logged is after substitution of variable values and after optimization by the BGP optimizer.\nDataset Caching (TDB 0.8.0 and later)\nTDB caches datasets based on the location of the backing directory. Within a single JVM, all attempts to create or open a dataset at a particular location go through the same dataset (and same disk caching). Therefore, an application can open the same location several times in different places in the code and still get the same underlying dataset for query and update.\nNote that closing the dataset closes it everywhere (the opening calls are not being reference counted).\nFile Access Mode The context symbol can be set to \u0026ldquo;mapped\u0026rdquo; or \u0026ldquo;direct\u0026rdquo;. Unset, or the value \u0026ldquo;default\u0026rdquo;, asks TDB to make the choice based on the JVM. Leaving it to the default is strongly encouraged.\nTDB Configuration Symbols Configuration Symbols\nSymbol Java Constant Effect Default tdb:logExec TDB.symLogExec Log execution of BGPs. Set to \u0026ldquo;true\u0026rdquo; to enable. Must also enable the logger \u0026ldquo;org.apache.jena.tdb.exec\u0026rdquo;. unset tdb:unionDefaultGraph TDB.symUnionDefaultGraph Query patterns on the default graph match against the union of the named graphs. 
unset tdb:fileMode SystemTDB.fileMode Force use of memory mapped files (\u0026quot;mapped\u0026quot;) or direct file caching (\u0026quot;direct\u0026quot;). See discussion of TDB on 32 or 64 bit hardware, especially limitations of memory mapped files on 32 bit Java. Set by the system based on 32 or 64 bit Java. Advanced Store Configuration Various internal caching sizes can be set to values different from the defaults. See the full description.\n","permalink":"https://jena.apache.org/documentation/tdb/configuration.html","tags":null,"title":"TDB Configuration"},{"categories":null,"contents":"An RDF Dataset is a collection of one unnamed default graph and zero or more named graphs. In a SPARQL query, a query pattern is matched against the default graph unless the GRAPH keyword is applied to a pattern.\nDataset Storage One file location (directory) is used to store one RDF dataset. The unnamed graph of the dataset is held as a single graph while all the named graphs are held in a collection of quad indexes.\nEvery dataset obtained via TDBFactory.createDataset(Location) within a JVM is the same dataset. (If a model is obtained via TDBFactory.createModel(Location), there is a hidden, shared dataset and the appropriate model is returned. The preferred style is to create the dataset, then get a model.)\nDataset Query There is full support for SPARQL query over named graphs in a TDB-backed dataset.\nAll the named graphs can be treated as a single graph which is the union (RDF merge) of all the named graphs. This is given the special graph name urn:x-arq:UnionGraph in a GRAPH pattern.\nWhen querying the RDF merge of named graphs, the default graph in the store is not included. This feature applies to queries only. It does not affect the storage nor does it change loading.\nAlternatively, if the symbol tdb:unionDefaultGraph (see TDB Configuration) is set, the unnamed graph for the query is the union of all the named graphs in the dataset. The stored default graph is ignored and is not part of the data of the union graph although it is accessible by the special name \u0026lt;urn:x-arq:DefaultGraph\u0026gt; in a GRAPH pattern.\nSet globally:\nTDB.getContext().set(TDB.symUnionDefaultGraph, true) ; or set on a per query basis:\ntry(QueryExecution qExec = QueryExecution.dataset(dataset) .query(query) .set(TDB.symUnionDefaultGraph,true) .build() ) { .... } Special Graph Names URI Meaning urn:x-arq:UnionGraph The RDF merge of all the named graphs in the datasets of the query. urn:x-arq:DefaultGraph The default graph of the dataset, used when the default graph of the query is the union graph. Note that setting tdb:unionDefaultGraph does not affect the default graph or default model obtained with dataset.getDefaultModel().\nThe RDF merge of all named graphs can be accessed as the named graph urn:x-arq:UnionGraph using Dataset.getNamedModel(\u0026quot;urn:x-arq:UnionGraph\u0026quot;).\nDataset Inferencing Inferencing on a Model in a Dataset, using the TDB Java API, follows the same pattern as an in-memory InfModel. The use of TDB Transactions is strongly recommended to avoid data corruption.\n//Open TDB Dataset String directory = ... Dataset dataset = TDBFactory.createDataset(directory); //Retrieve Named Graph from Dataset, or use Default Graph. String graphURI = \u0026quot;http://example.org/myGraph\u0026quot;; Model model = dataset.getNamedModel(graphURI); //Create RDFS Inference Model, or use other Reasoner e.g. OWL. InfModel infModel = ModelFactory.createRDFSModel(model); ... 
//Perform operations on infModel. ... ","permalink":"https://jena.apache.org/documentation/tdb/datasets.html","tags":null,"title":"TDB Datasets"},{"categories":null,"contents":"TDB version 0.8.5 and later\nThis feature allows a query to be made on a subset of all the named graphs in the TDB storage datasets. The SPARQL GRAPH pattern allows access to either a specific named graph or to all the named graph in a dataset. This feature means that only specified named graphs are visible to the query.\nSPARQL has the concept of a dataset description. In a query string, the clauses for FROM and FROM NAMED specify the dataset. The FROM clauses define the graphs that are merged to form the default graph, and the FROM NAMED clauses identify the graphs to be included as named graphs.\nNormally, ARQ interprets these as coming from the web; the graphs are read using HTTP GET. TDB modifies this behaviour; instead of the universe of graphs being the web, the universe of graphs is the TDB data store. FROM and FROM NAMED describe a dataset with graphs drawn only from the TDB data store.\nUsing one or more FROM clauses, causes the default graph of the dataset to be the union of those graphs. Using one or more FROM NAMED, with no FROM in a query, causes an empty graph to be used for the default graph. Using one or more FROM NAMED, with no FROM in a query, where the symbol TDB.symUnionDefaultGraph is also set, causes the default graph to be the set union of all the named graphs (FROM NAMED). Example\n#Follow a foaf:knows path across both Alice and Bobs FOAF data #where the data is in the datastore as named graphs. BASE \u0026lt;http://example\u0026gt; SELECT ?zName FROM \u0026lt;alice-foaf\u0026gt; FROM \u0026lt;bob-foaf\u0026gt; { \u0026lt;http://example/Alice#me\u0026gt; foaf:knows ?y . ?y foaf:knows ?z . ?z foaf:name ?zName . } ","permalink":"https://jena.apache.org/documentation/tdb/dynamic_datasets.html","tags":null,"title":"TDB Dynamic Datasets"},{"categories":null,"contents":"FAQs What are TDB1 and TDB2? Does TDB support Transactions? Can I share a TDB dataset between multiple applications? What is the Impossibly Large Object exception? What are the ObjectFile.read() and ObjectFileStorage.read() errors? What is the difference between tdbloader and tdbloader2? How large a Java heap size should I use for TDB? Does Fuseki/TDB have a memory leak? Should I use a SSD? Why do I get the exception Can\u0026rsquo;t open database at location /path/to/db as it is already locked by the process with PID 1234 when trying to open a TDB database? I see a warning that Location /path/to/db was not locked, if another JVM accessed this location simultaneously data corruption may have occurred in my logs? Why can\u0026rsquo;t I delete a dataset (MS Windows/64 bit)? What is the Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database error? My question isn\u0026rsquo;t answered here? \u0026lt;a name=\u0026ldquo;tdb1-tdb2\u0026gt;\nTDB1 and TDB2 TDB2 is a later generation of database for Jena. It is more robust and can handle large update transactions.\nThese are different databases systems - they have different on-disk file formats and databases for one are not compatible with other database engine.\nDoes TDB support transactions? 
Yes, TDB provides Serializable transactions, the highest isolation level.\nUsing transactions is strongly recommended as they help prevent data corruption from unexpected process termination and system crashes as well as data corruption that can otherwise occur from non-transactional use of TDB.\nPlease see the transactions documentation for how to use TDB transactionally.\nCan I share a TDB dataset between multiple applications? Multiple applications, running in multiple JVMs, using the same file databases is not supported and has a high risk of data corruption. Once corrupted, a database cannot be repaired and must be rebuilt from the original source data. Therefore there must be a single JVM controlling the database directory and files.\nTDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps protect your data from corruption.\nIf you wish to share a TDB dataset between applications use our Fuseki component which provides a database server. Fuseki supports SPARQL Query, SPARQL Update and the SPARQL Graph Store protocol. Applications should be written in terms of these protocols using the relevant Jena APIs, this has the added benefit of making your applications portable to another SPARQL backend should you ever need to.\nWhat is the Impossibly Large Object exception? The Impossibly Large Object exception is an exception that occurs when part of your TDB dataset has become corrupted. It may only affect a small section of your dataset so may only occur intermittently depending on your queries. For example some queries may continue to function normally while other queries or queries with/without particular features may fail. A particular query that fails with this error should continue to always fail unless the database is modified.\nA query that touches the entirety of the dataset will always encounter this exception and can be used to verify whether your database has this problem e.g.\nSELECT * WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } } The corruption may have happened at any time in the past and once it has happened there is no way to repair it. Corrupted datasets will need to be rebuilt from the original source data, this is why we strongly recommend you use transactions since this protects your dataset against corruption.\nTo resolve this problem you must rebuild your database from the original source data, a corrupted database cannot be repaired.\nWhat are the ObjectFile.read() and ObjectFileStorage.read() errors? These errors are closely related to the above Impossibly Large Object exception, they also indicate corruption to your TDB database.\nAs noted above to resolve this problem you must rebuild your database from the original source data, a corrupted database cannot be repaired. This is why we strongly recommend you use transactions since this protects your dataset against corruption.\nWhat is tdb.xloader? tdb1.xloader and tdb2.xloader are bulk loaders for very large datasets that take several hours to load.\nSee TDB xloader for more information.\nWhat is the different between tdbloader and tdbloader2? tdbloader2 has been replaced by tdb1.xloader and tdb2.xloader for TDB1 and TDB2 respectively.\ntdbloader and tdbloader2 differ in how they build databases.\ntdbloader is Java based and uses the same TDB APIs that you would use in your own Java code to perform the data load. The advantage of this is that it supports incremental loading of data into a TDB database. 
The downside is that the loader will be slower for initial database builds.\ntdbloader2 is POSIX compliant script based which limits it to running on POSIX systems only. The advantage this gives it is that it is capable of building the database files and indices directly without going through the Java API which makes it much faster. However this does mean that it can only be used for an initial database load since it does not know how to apply incremental updates. Using tdbloader2 on a pre-existing database will cause the existing database to be overwritten.\nOften a good strategy is to use tdbloader2 for your initial database creation and then use tdbloader for smaller incremental updates in the future.\nHow large a Java heap should I use for TDB? TDB uses memory mapped files heavily for providing fast access to data and indices. Memory mapped files live outside of the JVM heap and are managed by the OS therefore it is important to not allocate all available memory to the JVM heap.\nHowever JVM heap is needed for TDB related things like query \u0026amp; update processing, storing the in-memory journal etc and also for any other activities that your code carries out. What you should set the JVM heap to will depend on the kinds of queries that you are running, very specific queries will not need a large heap whereas queries that touch large amounts of data or use operators that may require lots of data to be buffered in-memory e.g. DISTINCT, GROUP BY, ORDER BY may need a much larger heap depending on the overall size of your database.\nThere is no hard and fast guidance we can give you on the exact numbers since it depends heavily on your data and your workload. Please ask on our mailing lists (see our Ask page) and provide as much detail as possible about your data and workload if you would like us to attempt to provide more specific guidance.\nDoes Fuseki/TDB have a memory leak? A number of users have reported a suspected memory leak when using Fuseki/TDB when it used to serve a database that has continuous high load with a mixture of queries and updates. Having investigate the problem this is not a memory leak per-se rather a limitation of how transactions are implemented for TDB.\nTDB uses write-ahead logging so new data is written both to an on-disk journal and kept in-memory. This is necessary because TDB permits a single writer and multiple readers at any one time and readers are guaranteed to always see the state of the database at the time they started reading. Therefore, until there are no active readers it is not possible to update the database directly since readers are actively accessing it hence why a journal is used. The in-memory journal holds some memory that cannot be freed up until such time as the database has no active readers/writers and the changes it holds can be safely flushed to disk.\nThis means that in scenarios where there is continuous high load on the system TDB never reaches a state where it is able to flush the journal eventually causing out of memory errors in Fuseki. You can see if you are experiencing this issue by examining your database directory, if it contains a .jrnl file that is non-empty then Fuseki/TDB is having to hold the journal in-memory.\nHowever, because this relates to transactional use and the journal is also stored on disk no data will be lost, by stopping and restarting Fuseki the journal will be flushed to disk. 
When using the TDB Java API, the journal can be flushed by closing any datasets and releasing the TDB resources.\nDataset dataset = TDBFactory.createDataset(directory) ; try{ ... dataset.begin(ReadWrite.READ) ; // Perform operations dataset.end() ; ... }finally{ dataset.close(); TDBFactory.release(dataset); } Should I use a SSD? Yes if you are able to\nUsing a SSD boost performance in a number of ways. Firstly bulk loads, inserts and deletions will be faster i.e. operations that modify the database and have to be flushed to disk at some point due to faster IO. Secondly TDB will start faster because the files can be mapped into memory faster.\nSSDs will make the most difference when performing bulk loads since the on-disk database format for TDB is entirely portable and may be safely copied between systems (provided there is no process accessing the database at the time). Therefore even if you can\u0026rsquo;t run your production system with a SSD you can always perform your bulk load on a SSD equipped system first and then move the database to your production system.\nWhy do I get the exception Can\u0026rsquo;t open database at location /path/to/db as it is already locked by the process with PID 1234 when trying to open a TDB database? This exception is a result of TDBs automatic multi-JVM usage prevention, as noted in the earlier Can I share a TDB dataset between multiple applications? question a TDB database can only be safely used by a single JVM otherwise data corruption may occur. From 1.1.0 onwards TDB automatically enforces this restriction wherever possible and you will get this exception if you attempt to access a database which is being accessed from another JVM.\nTo investigate this error use the process management tools for your OS to see what the process ID referenced in the error is. If it is another JVM then the error is entirely valid and you should follow the advice about sharing a TDB dataset between applications. You may need to coordinate with the owner of the other process (if it is not yourself) in order to do this.\nIn rare circumstances you may find that the process is entirely unrelated (this can happen due to stale lock files since they are not always automatically cleared up) in which case you can try and manually remove the tdb.lock file from the database directory. Please only do this if you are certain that the other process is not accessing the TDB database otherwise data corruption may occur.\nI see a warning that Location /path/to/db was not locked, if another JVM accessed this location simultaneously data corruption may have occurred in my logs? This warning can occur in rare circumstances when TDB detects that you are releasing a database location via StoreConnection.release() and that the database was eligible to be locked but wasn\u0026rsquo;t. This can usually only occur if you circumvented the normal TDB database opening procedures somehow.\nAs the warning states data corruption may occur if another JVM accesses the location while your process is accessing it. Ideally you should follow the advice on multi-JVM usage if this might happen, otherwise the warning can likely be safely ignored.\nWhy can\u0026rsquo;t I delete a dataset (MS Windows/64 bit)? Java on MS Windows does not provide the ability to delete a memory mapped file while the JVM is still running. The file is properly deleted when the JVM exits. This is a known issue with Java.\nSee the Java bug database e.g. Bug id: 4724038 and several others. 
While there are some workarounds mentioned on the web, none is known to always work on all JVMs.\nOn 64 bit systems, TDB uses memory mapped to manage datasets on disk. This means that the operating system dynamically controls how much of a file is held in RAM, trading off against requests by other applications. But it also means the database files are not properly deleted until the JVM exits. A new dataset can not be created in the same location (directory on disk).\nThe workaround is to use a different location.\nWhat is the Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database error? As described elsewhere in this FAQ (see Lock Exceptions and No Lock Warning) TDB uses a lock file to ensure that multiple JVMs don\u0026rsquo;t try to use the same TDB database simultaneously as this can lead to data corruption. However with the introduction of TDB2 there are now two versions of TDB, TDB2 also uses a lock file however it uses a slightly different format for that file.\nThis error means that you have tried to open a TDB2 database as a TDB1 database which is not permitted. Please adjust your usage of Jena libraries or command line tools to use TDB2 code/arguments as appropriate.\nFor example if Using TDB2 with Fuseki you would need to use the --tdb2 option.\nMy question isn\u0026rsquo;t answered here? If your question isn\u0026rsquo;t answered here please get in touch with the project, please check out the Ask page for ways to ask for further help.\n","permalink":"https://jena.apache.org/documentation/tdb/faqs.html","tags":null,"title":"TDB FAQs"},{"categories":null,"contents":"All the operations of the Jena API including the SPARQL query and SPARQL Update are supported. The application obtains a model or RDF datasets from TDB then uses it as for any other model or dataset.\nTDB also supports transactions.\nConstructing a model or dataset The class TDBFactory contains the static factory methods for creating and connecting to a TDB-backed graph or an RDF dataset. Models and datasets should be closed after use.\nAn application can specify the model or dataset by:\nGiving a directory name Giving an assembler file If a directory is empty, the TDB files for indexes and node table are created. If the directory contains files from a previous application run, TDB connects to the data already there.\nClosing the model or dataset is important. Any updates made are forced to disk if they have not been written already.\nUsing a directory name // Make a TDB-backed dataset String directory = \u0026quot;MyDatabases/Dataset1\u0026quot; ; Dataset dataset = TDBFactory.createDataset(directory) ; ... dataset.begin(ReadWrite.READ) ; // Get model inside the transaction Model model = dataset.getDefaultModel() ; dataset.end() ; ... dataset.begin(ReadWrite.WRITE) ; model = dataset.getDefaultModel() ; dataset.end() ; ... Using an assembler file // Assembler way: Make a TDB-back Jena model in the named directory. // This way, you can change the model being used without changing the code. // The assembler file is a configuration file. // The same assembler description will work in Fuseki. String assemblerFile = \u0026quot;Store/tdb-assembler.ttl\u0026quot; ; Dataset dataset = TDBFactory.assembleDataset(assemblerFile) ; ... dataset.begin(ReadWrite.READ) ; // Get model inside the transaction Model model = dataset.getDefaultModel() ; dataset.end() ; ... 
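The two examples above only read inside transactions; updates follow the same pattern using a WRITE transaction. A minimal sketch, assuming the directory-backed dataset created above (the data file name is illustrative; RDFDataMgr is Jena's general RDF reader):
dataset.begin(ReadWrite.WRITE) ;
try {
    Model model = dataset.getDefaultModel() ;
    // Any Jena API update can go here; reading a file into the default model is just an example.
    RDFDataMgr.read(model, \u0026quot;SomeData.ttl\u0026quot;) ;
    dataset.commit() ;
} finally {
    dataset.end() ;
}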
See the TDB assembler documentation for details of assembler descriptions.\nBulkloader The bulkloader is a faster way to load data into an empty dataset than just using the Jena update operations.\nIt is accessed through the command line utility tdbloader.\nConcurrency TDB supports transactions, which is the preferred way to work. It is possible to act directly on the dataset without transactions, using a Multiple Reader or Single Writer (MRSW) policy for concurrent access. Applications are expected to adhere to this policy - it is not automatically checked.\nOne gotcha is Java iterators. An iterator that is moving over the database is making read operations, so no updates to the dataset are possible while an iterator is being used.\nCaching and synchronization If used non-transactionally, then the application must be aware of the caching and synchronization used by TDB. TDB employs caching at various levels, from RDF terms to disk blocks. It is important to flush all caches to make the file state consistent with the cached states because some caches are write-behind so unwritten changes may be held in-memory.\nTDB provides an explicit call on dataset objects for synchronization with disk:\nDataset dataset = ... ; TDB.sync(dataset) ; Any dataset or model can be passed to these functions - if they are not backed by TDB then no action is taken and the call merely returns without error.\n","permalink":"https://jena.apache.org/documentation/tdb/java_api.html","tags":null,"title":"TDB Java API"},{"categories":null,"contents":"Query execution in TDB involves both static and dynamic optimizations. Static optimizations are transformations of the SPARQL algebra performed before query execution begins; dynamic optimizations involve deciding the best execution approach during the execution phase and can take into account the actual data so far retrieved.\nThe optimizer has a number of strategies: a statistics based strategy, a fixed strategy and a strategy of no reordering.\nFor the preferred statistics strategy, the TDB optimizer uses information captured in a per-database statistics file. The file takes the form of a number of rules for approximate matching counts for triple patterns. The statistics file can be automatically generated. The user can add and modify rules to tune the database based on higher level knowledge, such as inverse functional properties.\nThe commands look for file log4j2.properties in the current directory, as well as the usual log4j2 initialization with property log4j.configurationFile and looking for classpath resource log4j2.properties; there is a default setup of log4j2 built-in.\nQuickstart This section provides a practical how-to.\nLoad data. Generate the statistics file. Run tdbstats. Place the file generated in the database directory with the name stats.opt. Running tdbstats Usage:\ntdbstats --loc=DIR|--desc=assemblerFile [--graph=URI] Choosing the optimizer strategy TDB chooses the basic graph pattern optimizer by the presence of a file in the database directory.\nOptimizer control files\nFile name Effect none.opt No reordering - execute triple patterns in the order they appear in the query fixed.opt Use a built-in reordering based on the number of variables in a triple pattern. stats.opt The contents of this file are the weighting rules (see below). The contents of the files none.opt and fixed.opt are not read and don\u0026rsquo;t matter.
They can be zero-length files.\nIf more than one file is found, the choice is made: stats.opt over fixed.opt over none.opt.\nOptimization can be disabled by setting arq:optReorderBGP to false. This can be done in the Assembler file by setting ja:context on the server, dataset, or endpoint:\n[] ja:context [ ja:cxtName \u0026quot;arq:optReorderBGP\u0026quot; ; ja:cxtValue false ] . Filter placement One of the key optimizations is the handling of filtered basic graph patterns. This optimization decides the best order of triple patterns in a basic graph pattern and also the best point at which to apply the filters within the triple patterns.\nAny filter expression of a basic graph pattern is placed immediately after the point at which all of its variables will be bound. Conjunctions at the top level in filter expressions are broken into their constituent pieces and placed separately.\nInvestigating what is going on TDB can optionally log query execution details. This is controlled by two settings: the logging level and a context setting. Having two settings means it is possible to log some queries and not others.\nThe logger used is called org.apache.jena.arq.exec. Messages are sent at level \u0026ldquo;INFO\u0026rdquo;. So for log4j2, the following can be set in the log4j2.properties file:\n# Execution logging logger.arq-exec.name = org.apache.jena.arq.exec logger.arq-exec.level = INFO logger.arq-info.name = org.apache.jena.arq.info logger.arq-info.level = INFO The context setting is for key (Java constant) ARQ.symLogExec. To set globally:\nARQ.getContext().set(ARQ.symLogExec,true) ; and it may also be set on an individual query execution using its local context.\ntry(QueryExecution qExec = QueryExecution.dataset(dataset) .query(query) .set(ARQ.symLogExec,true) .build() ) { ResultSet rs = qExec.execSelect() ; } On the command line:\ntdbquery --set arq:logExec=true --file queryfile This can also be done in the Assembler file by setting ja:context on the server, dataset, or endpoint:\n[] ja:context [ ja:cxtName \u0026quot;arq:logExec\u0026quot; ; ja:cxtValue \u0026quot;info\u0026quot; ] . Explanation Levels Level Effect INFO Log each query FINE Log each query and its algebra form after optimization ALL Log query, algebra and every database access (can be expensive) NONE No information logged These can be specified as strings, to the command line tools, or using the constants in Explain.InfoLevel.\nqExec.getContext().set(ARQ.symLogExec,Explain.InfoLevel.FINE) ; tdbquery --explain The --explain parameter can be used for understanding the query execution.
An execution can detail the query, algebra and every point at which the dataset is touched.\nFor example, given the sample query execution with tdbquery below\ntdbquery --loc=DB \u0026quot;SELECT * WHERE { ?a ?b ?c }\u0026quot; we can include the --explain parameter to the command\ntdbquery --explain --loc=DB \u0026quot;SELECT * WHERE { ?a ?b ?c }\u0026quot; and increase the logging levels, in order to output more information about the query execution.\n# log4j2.properties log4j.rootLogger=INFO, stdlog log4j.appender.stdlog=org.apache.log4j.ConsoleAppender log4j.appender.stdlog.layout=org.apache.log4j.PatternLayout log4j.appender.stdlog.layout.ConversionPattern=%d{HH:mm:ss} %-5p %-25c{1} :: %m%n status = error name = PropertiesConfig filters = threshold filter.threshold.type = ThresholdFilter filter.threshold.level = INFO appender.console.type = Console appender.console.name = STDOUT appender.console.layout.type = PatternLayout appender.console.layout.pattern = %d{HH:mm:ss} %-5p %-15c{1} :: %m%n rootLogger.level = INFO rootLogger.appenderRef.stdout.ref = STDOUT # the query execution logger # Execution logging logger.arq-exec.name = org.apache.jena.arq.exec logger.arq-exec.level = INFO The command output will be similar to this one.\n00:05:20 INFO exec :: QUERY SELECT * WHERE { ?a ?b ?c } 00:05:20 INFO exec :: ALGEBRA (quadpattern (quad \u0026lt;urn:x-arq:DefaultGraphNode\u0026gt; ?a ?b ?c)) 00:05:20 INFO exec :: TDB (quadpattern (quad \u0026lt;urn:x-arq:DefaultGraphNode\u0026gt; ?a ?b ?c)) 00:05:20 INFO exec :: Execute :: (?a ?b ?c) The logging operation can be expensive, so try to limit it when possible.\nStatistics Rule File The syntax is SSE, a simple format that uses Turtle-syntax for RDF terms, keywords for other terms (for example, the keyword stats marks a statistics data structure), and forms a tree data structure.\nThe structure of a statistics file takes the form:\n(prefix ... (stats (meta ...) rule rule )) that is, a meta block and a number of pattern rules.\nA simple example:\n(prefix ((: \u0026lt;http://example/\u0026gt;)) (stats (meta (timestamp \u0026quot;2008-10-23T10:35:19.122+01:00\u0026quot;^^\u0026lt;http://www.w3.org/2001/XMLSchema#dateTime\u0026gt;) (run@ \u0026quot;2008/10/23 10:35:19\u0026quot;) (count 11)) (:p 7) (\u0026lt;http://example/q\u0026gt; 7) )) This example statistics file contains some metadata about the statistics (the time and date the file was generated, the size of the graph), and the frequency counts for two predicates http://example/p (written using a prefixed name) and http://example/q (written in full).\nThe numbers are the estimated counts. They do not have to be exact; they guide the optimizer in choosing one execution plan over another. Nor do they have to be exactly up-to-date, provided the relative counts are representative of the data. Statistics Rule Language A rule is made up of a triple pattern and a count estimation for the approximate number of matches that the pattern will yield. This does not have to be exact, only an indication.\nIn addition, the optimizer considers which variables will be bound to RDF terms by the time a triple pattern is reached in the execution plan being considered. For example, in the basic graph pattern:\n{ ?x :identifier 1234 . ?x :name ?name .
} then ?x will be bound in pattern ?x :name ?name to an RDF term if executed after the pattern ?x :identifier 1234.\nA rule is of the form:\n( (subj pred obj) count) where subj, pred, obj are either RDF terms or one of the tokens in the following table:\nStatistic rule tokens Token | Description TERM | Matches any RDF term (URI, Literal, Blank node) VAR | Matches a named variable (e.g. ?x) URI | Matches a URI LITERAL | Matches an RDF literal BNODE | Matches an RDF blank node (in the data) ANY |Matches anything - a term or variable\nFrom the example above, (VAR :identifier TERM) will match ?x :identifier 1234.\n(TERM :name VAR) will match ?x :name ?name when in a potential plan where the :identifier triple pattern is first because ?x will be a bound term at that point but not if this triple pattern is considered first.\nWhen searching for a weighting of a triple pattern, the first rule to match is taken.\nThe rule which says an RDF graph is a set of triples:\n((TERM TERM TERM) 1) is always implicitly present.\nBNODE does not match a blank node in the query (which is a variable and matches VAR) but in the data, if it is known that slot of a triple pattern is a blank node.\nAbbreviated Rule Form While a complete rule is of the form:\n( (subj pred obj) count) there is an abbreviated form:\n(predicate count) The abbreviated form is equivalent to writing:\n((TERM predicate ANY) X) ((ANY predicate TERM) Y) ((ANY predicate ANY) count) where for small graphs (less that 100 triples) X=2, Y=4 but Y=40 if the predicate is rdf:type and 2, 10, 1000 for large graphs. Use of \u0026ldquo;VAR rdf:type Class\u0026rdquo; can be a quite unselective triple pattern and so there is a preference to move it later in the order of execution to allow more selective patterns reduce the set of possibilities first. The astute reader may notice that ontological information may render it unnecessary (the domain or range of another property implies the class of some resource). TDB does not currently perform this optimization.\nThese number are merely convenient guesses and the application can use the full rules language for detailed control of pattern weightings.\nDefaults A rule of the form:\n(other number) is used when no matches from other rules (abbreviated or full) when matching a triple pattern that has a URI in the predicate position. If a rule of this form is absent, the default is to place the triple pattern after all known triple patterns; this is the same as specifying -1 as the number. To declare that the rules are complete and no other predicates occur in the data, set this to 0 (zero) because the triple pattern can not match the data (the predicate does not occur).\nGenerating a statistics file The command line tdbstats will scan the data and produce a rules file based on the frequency of properties. The output should first go to a temporary file, then that file moved into the database location.\nPractical tip: Don\u0026rsquo;t feed the output of this command directly to location/stats.opt because when the command starts it will find an empty statistics file at that location.\nGenerating statistics for Union Graphs By default tdbstats only processes the default graph of a dataset. 
However, in some circumstances it is desirable to have the statistics generated over the Named Graphs in the dataset.\nThe tdb:unionDefaultGraph option will cause TDB to synthesize a default graph for SPARQL queries, from the union of all Named Graphs in the dataset.\nIdeally the statistics file should be generated against this union graph. This can be achieved using the --graph option as follows:\ntdbstats --graph urn:x-arq:UnionGraph --loc /path/to/indexes The graph parameter uses a built-in TDB special graph name.\nWriting Rules Rule for an inverse functional property:\n((VAR :ifp TERM) 1 ) and even if a property is only approximately identifying for resources (e.g. date of birth in a small dataset of people), it is useful to indicate this. Because the counts needed are only approximations, so that the optimizer can choose one order over another and does not need to predict exact counts, rules that are usually right but may be slightly wrong are still useful overall.\nRules involving rdf:type can be useful where they indicate whether a particular class is common or not. In some datasets\n((VAR rdf:type class) ...) may help little because a property whose domain is that class, or a subclass, may be more selective. So a rule like:\n((VAR :property VAR) ...) is more selective.\nIn other datasets, there may be many classes, each with a small number of instances, in which case\n((VAR rdf:type class) ...) is a useful selective rule.\n","permalink":"https://jena.apache.org/documentation/tdb/optimizer.html","tags":null,"title":"TDB Optimizer"},{"categories":null,"contents":"This page describes how to filter quads at the lowest level of TDB. It can be used to hide certain quads (triples in named graphs) or triples.\nThe code for the example on this page can be found in the TDB examples. Filtering quads should be used with care. The performance of the tuple filter callback is critical.\nSee also Dynamic Datasets to select only certain specified named graphs for a query.\nTDB will call a registered filter on every quad that it retrieves from any of the indexes, both quads (for named graphs) and triples (for the stored default graph). This filter indicates whether to accept or reject the quad or triple. This happens during basic graph pattern processing.\nA rejected quad is simply not processed further in the basic graph pattern and it is as if it is not in the dataset.\nThe filter has a signature of: java.util.function.Predicate\u0026lt;Tuple\u0026lt;NodeId\u0026gt;\u0026gt; with a type parameter of Tuple\u0026lt;NodeId\u0026gt;. NodeId is the low level internal identifier TDB uses for RDF terms. Tuple is a class for immutable tuples of values of the same type.\n/** Create a filter to exclude the graph http://example/g2 */ private static Predicate\u0026lt;Tuple\u0026lt;NodeId\u0026gt;\u0026gt; createFilter(Dataset ds) { DatasetGraphTransaction dst = (DatasetGraphTransaction)(ds.asDatasetGraph()) ; DatasetGraphTDB dsg = dst.getBaseDatasetGraph(); NodeTable nodeTable = dsg.getQuadTable().getNodeTupleTable().getNodeTable() ; // Filtering operates at a very low level: // need to know the internal identifier for the graph name. final NodeId target = nodeTable.getNodeIdForNode(Node.createURI(\u0026quot;http://example/g2\u0026quot;)) ; // Filter to accept/reject a quad as being visible. Predicate\u0026lt;Tuple\u0026lt;NodeId\u0026gt;\u0026gt; filter = item -\u0026gt; { // Quads are 4-tuples, triples are 3-tuples.
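// Slot 0 of a quad tuple is the graph NodeId; 3-tuples (triples in the default graph) never match the test below and so are always accepted.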
if ( item.size() == 4 \u0026amp;\u0026amp; item.get(0).equals(target) ) // reject return false ; // Accept return true ; } ; return filter ; } To install a filter, put it in the context of a query execution under the symbol SystemTDB.symTupleFilter, then execute the query as normal.\nDataset ds = ... ; Predicate\u0026lt;Tuple\u0026lt;NodeId\u0026gt;\u0026gt; filter = createFilter(ds) ; Query query = ... ; try (QueryExecution qExec = QueryExecution.dataset(ds) .query(query) .set(SystemTDB.symTupleFilter, filter) .build() ) { ResultSet rs = qExec.execSelect() ; ... } ","permalink":"https://jena.apache.org/documentation/tdb/quadfilter.html","tags":null,"title":"TDB Quad Filter"},{"categories":null,"contents":"TDB can run on 32 bit or 64 bit JVMs. It adapts to the underlying architecture by choosing different file access mechanisms. 64 bit Java is preferred for large scale and production deployments. On 64 bit Java, TDB uses memory mapped files.\nOn 32 bit platforms, TDB uses in-heap caching of file data. In practice, the JVM heap size should be set to at least 1Gbyte. While there are no inherent scaling limits on the size of the database, in practice only one large dataset can be handled per TDB instance.\nThe on-disk file format is compatible between 32 and 64 bit systems and databases can be transferred between systems by file copy if the databases are not in use (no TDB instance is accessing them at the time). Databases can not be copied while TDB is running, even if TDB is not actively processing a query or update.\n","permalink":"https://jena.apache.org/documentation/tdb/requirements.html","tags":null,"title":"TDB Requirements"},{"categories":null,"contents":"TDB provides ACID transaction support through the use of write-ahead-logging in TDB1 and copy-on-write MVCC structures in TDB2.\nUse of transactions protects a TDB dataset against data corruption, unexpected process termination and system crashes.\nNon-transactional use of TDB1 should be avoided; TDB2 only operates with transactions.\nOverview TDB2 uses MVCC via a copy-on-write mechanism. Update transactions can be of any size.\nThe TDB1 transaction mechanism is based on write-ahead-logging. All changes made inside a write-transaction are written to journals, then propagated to the main database at a suitable moment. Transactions in TDB1 are limited in size to a few 10\u0026rsquo;s of million triples because they retain data in-memory until indexes can be updated.\nTransactional TDB supports one active write transaction, and multiple read transactions at the same time. Read-transactions started before a write-transaction commits see the database in a state without any changes visible. Any transaction starting after a write-transaction commits sees the database with the changes visible, whether fully propagated back to the database or not. There can be active read transactions seeing the state of the database before the updates, and read transactions seeing the state of the database after the updates running at the same time.\nTransactional TDB works with SPARQL Query, SPARQL Update, SPARQL Graph Store Update as well as the full Jena API.\nTDB provides Serializable transactions, the highest isolation level.\nLimitations (some of these limitations may be removed in later versions)\nBulk loads: the TDB bulk loader is not transactional Nested transactions are not supported. TDB2 removed the limitations of TDB1:\nSome active transaction state is held exclusively in-memory, limiting scalability. Long-running transactions.
Read-transactions cause a build-up of pending changes; if a single read transaction runs for a long time when there are many updates, the TDB1 system will consume a lot of temporary resources.\nAPI for Transactions This section uses the primitives of the transaction mechanism.\nBetter APIs are described in the transaction API documentation.\nRead transactions These are used for SPARQL queries and Jena API actions that do not change the data. The general pattern is:\ndataset.begin(ReadWrite.READ) ; try { ... } finally { dataset.end() ; } The dataset.end() declares the end of the read transaction. Applications may also call dataset.commit() or dataset.abort(), which all have the same effect for a read transaction.\nLocation location = ... ; Dataset dataset = ... ; dataset.begin(ReadWrite.READ) ; String qs1 = \u0026quot;SELECT * {?s ?p ?o} LIMIT 10\u0026quot; ; try(QueryExecution qExec = QueryExecution.dataset(dataset).query(qs1).build() ) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } String qs2 = \u0026quot;SELECT * {?s ?p ?o} OFFSET 10 LIMIT 10\u0026quot; ; try(QueryExecution qExec = QueryExecution.dataset(dataset).query(qs2).build() ) { rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } Write transactions These are used for SPARQL queries, SPARQL updates and any Jena API actions that modify the data. Beware that large model.read operations consume large amounts of temporary space.\nThe general pattern is:\ndataset.begin(ReadWrite.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } The dataset.end() will abort the transaction if there was no call to dataset.commit() or dataset.abort() inside the write transaction.\nOnce dataset.commit() or dataset.abort() is called, the application needs to start a new transaction to perform further operations on the dataset.\nLocation location = ... ; Dataset dataset = ... ; dataset.begin(ReadWrite.WRITE) ; try { Model model = dataset.getDefaultModel() ; // API calls to a model in the dataset model.add( ... ) ; // A SPARQL query will see the new statement added. try (QueryExecution qExec = QueryExecution.dataset(dataset) .query(\u0026quot;SELECT (count(*) AS ?count) { ?s ?p ?o} LIMIT 10\u0026quot;) .build() ) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } // ... perform a SPARQL Update String sparqlUpdateString = StrUtils.strjoinNL( \u0026quot;PREFIX : \u0026lt;http://example/\u0026gt;\u0026quot;, \u0026quot;INSERT { :s :p ?now } WHERE { BIND(now() AS ?now) }\u0026quot; ) ; UpdateRequest request = UpdateFactory.create(sparqlUpdateString) ; UpdateExecution.dataset(dataset).update(request).execute(); // Finally, commit the transaction. dataset.commit() ; // Or call .abort() } finally { dataset.end() ; } Multi-threaded use Each dataset object has one transaction active at a time per thread. A dataset object can be used by different threads, with independent transactions.\nThe usual idiom within multi-threaded applications is to have one dataset per thread, and so there is one transaction per thread.\nEither:\n// Create a dataset and keep it globally. Dataset dataset = TDBFactory.createDataset(location) ; Thread 1:\ndataset.begin(ReadWrite.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } Thread 2:\ndataset.begin(ReadWrite.READ) ; try { ... } finally { dataset.end() ; } or create a dataset object on the thread:\nThread 1:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(ReadWrite.WRITE) ; try { ...
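// Updates made here are not visible to the reader in Thread 2 until commit() completes.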
dataset.commit() ; } finally { dataset.end() ; } Thread 2:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(ReadWrite.READ) ; try { ... } finally { dataset.end() ; } Each thread has a separate dataset object; these safely share the same storage. In both cases, the transactions are independent.\nMulti JVM Multiple applications, running in multiple JVMs, using the same file databases is not supported and has a high risk of data corruption. Once corrupted, a database cannot be repaired and must be rebuilt from the original source data. Therefore there must be a single JVM controlling the database directory and files. TDB includes automatic prevention of multi-JVM usage, which prevents this under most circumstances.\nUse Fuseki to provide a database server for multiple applications. Fuseki supports SPARQL Query, SPARQL Update and the SPARQL Graph Store protocol.\nBulk loading Bulk loaders are not transactional.\n","permalink":"https://jena.apache.org/documentation/tdb/tdb_transactions.html","tags":null,"title":"TDB Transactions"},{"categories":null,"contents":"API for Transactions TDB1 and TDB2 TDB1, the original native TDB database for Apache Jena, and TDB2 are related but different systems. Their transaction systems both provide Serializable transactions, the highest isolation level, with one active write transaction and multiple read transactions at the same time.\nTDB2 does not have the transaction size limitations of TDB1.\nTDB1 The transaction mechanism in TDB is based on write-ahead-logging. All changes made inside a write-transaction are written to journals, then propagated to the main database at a suitable moment. This design allows for read-transactions to proceed without locking or other overhead over the base database.\nTransactional TDB supports one active write transaction, and multiple read transactions at the same time. Read-transactions started before a write-transaction commits see the database in a state without any changes visible. Any transaction starting after a write-transaction commits sees the database with the changes visible, whether fully propagated back to the database or not. There can be active read transactions seeing the state of the database before the updates, and read transactions seeing the state of the database after the updates running at the same time.\nTDB provides Serializable transactions, the highest isolation level.\nLimitations Nested transactions are not supported. Some active transaction state is held exclusively in-memory, limiting scalability. Long-running transactions. Read-transactions cause a build-up of pending changes; if a single read transaction runs for a long time when there are many updates, the system will consume a lot of temporary resources.\nTDB2 The transaction mechanism in TDB2 is based on MVCC using immutable data structures, which are known as \u0026ldquo;persistent data structures\u0026rdquo; (although this name, for the functional programming community, is slightly confusing).\nLimitations (some of these limitations may be removed in later versions)\nBulk loads: the TDB2 bulk loader is not transactional Nested transactions are not supported. ","permalink":"https://jena.apache.org/documentation/txn/transactions_tdb.html","tags":null,"title":"TDB Transactions"},{"categories":null,"contents":"TDB canonicalizes certain XSD datatypes. The value of literals of these datatypes is stored, not the original lexical form.
For example, \u0026quot;01\u0026quot;^^xsd:integer, \u0026quot;1\u0026quot;^^xsd:integer and \u0026quot;+001\u0026quot;^^xsd:integer are all the same value and are stored as the same RDF literal. In addition, derived types for integers are also understood by TDB. For example, \u0026quot;01\u0026quot;^^xsd:integer and \u0026quot;1\u0026quot;^^xsd:byte are the same value.\nWhen RDF terms for these values are returned, the lexical form will be the canonical representation.\nOnly certain ranges of values are directly encoded as values. If a literal is outside the canonicalization range, its lexical representation is stored. TDB transparently switches between value and non-value based literals in graph matching and filter expressions; non-canonicalized and canonicalized values will be compared as needed.\n(Future versions of TDB may increase the ranges canonicalized.)\nThe datatypes canonicalized by TDB are:\nXSD decimal (canonicalized range: 8 bits of scale, signed 48 bits of value) XSD integer (canonicalized range: 56 bits) XSD dateTime (canonicalized range: 0 to the year 8000, millisecond accuracy, timezone to 15 minutes). XSD date (canonicalized range: 0 to the year 8000, timezone to 15 minutes). XSD boolean (canonicalized range: true and false) ","permalink":"https://jena.apache.org/documentation/tdb/value_canonicalization.html","tags":null,"title":"TDB Value Canonicalization"},{"categories":null,"contents":"TDB runs on both 32-bit and 64-bit Java Virtual Machines. The same file formats are used on both systems and database files can be transferred between architectures (no TDB system should be running for the database at the time of copy). The difference is that a different file access mechanism used.\nThe file access mechanism can be set explicitly, but this is not a good idea for production usage, only for experimentation - see the File Access mode option.\n64-bit Java On 64-bit Java, TDB uses memory mapped files and the operating system handles much of the caching between RAM and disk. The amount of RAM used for file caching increases and decreases as other application run on the machine. The fewer other programs running on the machine, the more RAM will be available for file caching.\nTDB is faster on a 64 bit JVM because more memory is available for file caching.\n32-bit Java On 32-bit Java, TDB uses its own file caching to enable large databases. 32-bit Java limits the address space of the JVM to about 1.5Gbytes (the exact size is JVM-dependent), and this includes memory mapped files, even though they are not in the Java heap. The JVM heap size may need to be increased to make space for the disk caches used by TDB.\nDisk Format The on-disk file format is compatible between 32 and 64 bit systems and databases can be transferred between systems by file copy if the databases are not in use (no TDB or Fuseki instance is accessing them at the time). Databases can not be copied while TDB is running, even if TDB is not actively processing a query or update.\n","permalink":"https://jena.apache.org/documentation/tdb/tdb_system.html","tags":null,"title":"TDB with 64-bit and 32-bit JVMs"},{"categories":null,"contents":"TDB xloader (\u0026ldquo;x\u0026rdquo; for external) is a bulkloader for very large datasets. The goal is stability and reliability for long running loading, running on modest hardware and can be use to load a database on rotating disk or SSD.\nxloader is not a replacement for regular TDB1 and TDB2 loaders. 
It is for very large datasets.\nThere are two scripts to load data using the xloader subsystem.\n\u0026ldquo;tdb1.xloader\u0026rdquo;, which was called \u0026ldquo;tdbloader2\u0026rdquo;, has some improvements.\nIt is not as fast as other TDB loaders on datasets where the general loaders work without encountering progressive slowdown.\nThe xloaders for TDB1 and TDB2 are not identical. The TDB2 xloader is more capable; it is based on the same design approach with further refinements to building the node table and to reduce the total amount of temporary file space used.\nThe xloader does not run on MS Windows. It uses an external sort program from unix - sort(1).\nThe xloader only builds a fresh database from empty. It can not be used to load an existing database.\nRunning xloader tdb2.xloader --loc DIRECTORY FILE\u0026hellip;\nor\ntdb1.xloader --loc DIRECTORY FILE\u0026hellip;\nAdditionally, there is an argument --tmpdir to use a different directory for temporary files.\nFILE is any RDF syntax supported by Jena. Syntax is determined by the file extension and can include an addtional \u0026ldquo;.gz\u0026rdquo; or \u0026ldquo;.bz2\u0026rdquo; for compressed files.\ntdb2.xloader also supports argument --threads to set the number of threads to use with sort(1). The default is 2. The recommendation for an initial setting is to set it to the number of cores (not hardware threads) minus 1. This is sensitive to the hardware environment. Experimentation may show a different, better setting.\nAdvice To avoid a load failing due to a syntax or other data error, it is advisable to run riot --check on the data first. Parsing is faster than loading.\nThe TDB databases will take up a lot of disk space and in addition during loading xloader uses a significant amount of temporary disk space.\nIf desired, the data can be converted to RDF Thrift at this stage by adding --stream rdf-thrift to the riot checking run. Parsing RDF Thrift is faster than parsing N-Triples although the bulk of the loading process is not limited by parser speed.\nDo not capture the bulk loader output in a file on the same disk as the database or temporary directory; it slows loading down.\n","permalink":"https://jena.apache.org/documentation/tdb/tdb-xloader.html","tags":null,"title":"TDB xloader"},{"categories":null,"contents":"TDB2 is a component of Apache Jena for RDF storage and query. It supports the full range of Jena APIs. TDB2 can be used as a high performance RDF store on a single machine. TDB2 can be used with Apache Jena Fuseki.\nTDB1 is the previous generation native storage system for Jena.\nCompared to TDB1:\nNo size limits on transactions : bulk uploads into a live Fuseki can be 100\u0026rsquo;s of millions of triples. Models and Graphs can be passed across transactions Transactional only (there is currently no \u0026ldquo;autocommit\u0026rdquo; mode). Better transaction control No queue of delayed updates No backlog problems. \u0026ldquo;Writer pays\u0026rdquo; - readers don\u0026rsquo;t Datatypes of numerics preserved; xsd:doubles supported. 
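As an illustration of the transactional-only style, the following minimal sketch connects to an on-disk TDB2 dataset and loads a file inside a write transaction, using the Txn helper shown later in the migration example; the directory and file names are illustrative.
import org.apache.jena.query.Dataset ;
import org.apache.jena.riot.RDFDataMgr ;
import org.apache.jena.system.Txn ;
import org.apache.jena.tdb2.TDB2Factory ;

// Connect to (or create) a TDB2 database held in a directory.
Dataset ds = TDB2Factory.connectDataset(\u0026quot;MyDatabases/TDB2\u0026quot;) ;
// All access must be inside a transaction.
Txn.execWrite(ds, ()-\u0026gt; RDFDataMgr.read(ds, \u0026quot;SomeData.ttl\u0026quot;)) ;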
TDB2 is not compatible with TDB1\nDocumentation Migrating from TDB1 Use with Fuseki2 Command line tools Database administration ","permalink":"https://jena.apache.org/documentation/tdb2/","tags":null,"title":"TDB2"},{"categories":null,"contents":"TDB2 is not compatible with TDB1\nDo not run TDB1 tools on a TDB2 database, nor run TDB2 tools on a TDB1 database.\nThese scripts are available jena binary distribution.\ntdb2.tdbbackup tdb2.tdbdump tdb2.tdbcompact tdb2.tdbloader tdb2.tdbquery tdb2.tdbupdate tdb2.tdbstats On MS Windows, these commands are called tdb2_tdbquery etc.\nExample usage:\ntdb2.tdbloader --loc \u0026lt;DB location\u0026gt; file1 file2 ... Note:\ntdbloader2 is a TDB1 command tool.\ntdb2.tdbloader Basic usage: load files into a database at location \u0026ldquo;DB\u0026rdquo;:\ntdb2.tdbloader --loc DB file1 file2 .... To load the data into a named graph, use the --graph=IRI argument:\ntdb2.tdbloader --loc DB --graph=https://example.org/graph#name file1 For the complete syntax and list of all arguments use --help:\ntdb2.tdbloader --help All TDB2 loaders can update datasets and do not have to work on an empty dataset. However, only the basic and sequential loader are fully transactional in the presence of crashes. The other loaders, while faster, work by manipulating the low-level datastructures, and are tuned for large changes of data. They do not provide perfect transaction isolation in case a load goes wrong for some reason. The multiphase loading operations use partial transactions which can leave the database in a strange state.\nWhen working with large data to load, it is advisable to check it completely first with riot --validate. Parse errors during loading can lead to inconsistent indexing. Fixing bad data, even if legal RDF, such as bad lexical forms of literals or bad URIs, is much easier before the data is in the database.\nBecause loading in hardware dependent, the right choice for any situation can only be found by trying each loader to see what works best and the notes below are only initial guidance. The default choice is a reasonable starting point. Closing all applications to release their memory and not use CPU improves the loading process performance.\nLoading very large datasets (like Wikidata) with tdb2.tdbloader may sometimes on linux configurations fail with errors like:\nNative memory allocation (mmap) failed to map 65536 bytes for committing reserved memory. This can be avoided by adding a larger value to the vm.max_map_count option. The command sudo sysctl -w vm.max_map_count=262144 updates the value for your current session, or you can persist the change by editing the value in /etc/sysctl.conf or in /etc/sysctl.d/* override files if available.\nLoader options The choice of loader is given by the optional --loader argument.\n--loader=basic\nThe basic loader loads data as a single transaction into the dataset on a single thread. It is suitable for small data and also for incrementally adding to a dataset safely, A machine crash while running this loader will not invalidate the database; the load simply will not happen.\n--loader=sequential\nThe sequential loader is a single threaded loader that loads the primary index then each of the other indexes. 
It is suitable only for low resource hardware, especially in a low I/O bandwidth situation.\n--loader=phased (default)\nThe phased loader, the default if no --loader argument is provided, is a balance between performance and hardware demands.\nIt uses multiple threads for the initial loading (3 worker threads) and then 2 threads in parallel for building the other indexes.\n--loader=parallel\nThe parallel loader runs all operations at once. It can deliver the best performance provided enough RAM is available and the persistent storage is SSD. It can consume all hardware resources, greatly impacting any other applications running.\ntdb2.tdbstats Produce statistics for the dataset, which can be used for optimization rules. See the TDB Optimizer description.\nFor TDB2 the statistics file is read from, and should be placed in, the Data-NNNN directory (Data-0001/stats.opt).\n","permalink":"https://jena.apache.org/documentation/tdb2/tdb2_cmds.html","tags":null,"title":"TDB2 - Command Line Tools"},{"categories":null,"contents":"TDB2 directory layout A TDB2 database is contained in a directory location DIR as:\nDIR/ Backups/ Data-0001/ Data-0002/ tdb.lock where Data-NNNN are the compacted generations of the database. The highest number is the currently live database. The others are not used and not touched by the TDB2 subsystem. They can be deleted, moved elsewhere, or compressed as required. Each is a valid database in its own right.\nBackups is the directory used to write backup files.\ntdb.lock is the lock file to stop multiple use of the same database at the same time by different JVM processes. (If you wish to share a database between processes, or machines, consider using Fuseki2 with TDB2.)\nCompaction TDB2 databases grow over time as updates occur. They can be compacted by calling:\nDatabaseMgr.compact(dataset.asDatasetGraph()); Compaction can be done on a live database. Read requests will continue to be serviced; write requests are held up until compaction has finished. This can be a long time for large databases.\nCompaction creates a new Data-NNNN subdirectory, copies the latest view of the RDF dataset into that directory, then switches to using that generation of the database.\nThere is also a command line tool tdb2.tdbcompact to run the compaction process on a database not in use. The command line option --deleteOld removes the old database after compaction.\nCompaction can also be called from the Fuseki HTTP Administration Protocol for live Fuseki webapps.\nBackup A TDB2 database can be backed up by calling:\nDatabaseMgr.backup(dataset.asDatasetGraph()); which will create a dump file including a timestamp:\nlocation/Backups/backup-yyyy-MM-dd_HH:mm:ss.nq.gz The file is a compressed N-Quads file.\nBackup can be done on a live database. It takes a consistent view of the data and does not include any updates committed after the backup starts.\nBackup can be called on a live database and read and write transactions continue to be serviced.\nThere is also a command line tool tdb2.tdbbackup to run the backup process on a database not in use.\n","permalink":"https://jena.apache.org/documentation/tdb2/tdb2_admin.html","tags":null,"title":"TDB2 - Database Administration"},{"categories":null,"contents":"Migrating Data TDB2 is not compatible with TDB1. Data must be reloaded from RDF again.\nMigrating Code Simple migration of code is to use TDB2Factory in place of TDBFactory to create datasets.
DatasetGraph objects are now created via DatabaseMgr.\nBeware that many classes have the same name in TDB1 and TDB2 but are in different packages. The base package name for TDB2 is org.apache.jena.tdb2.\nExample code: TDB2Factory\nimport org.apache.jena.tdb2.TDB2Factory; ... public static void main(String[] args) { Dataset ds = TDB2Factory.createDataset() ; Txn.execWrite(ds, ()-\u0026gt;{ RDFDataMgr.read(ds, \u0026#34;SomeData.ttl\u0026#34;); }) ; Txn.execRead(ds, ()-\u0026gt;{ RDFDataMgr.write(System.out, ds, Lang.TRIG) ; }) ; } ","permalink":"https://jena.apache.org/documentation/tdb2/tdb2_migration.html","tags":null,"title":"TDB2 - Migration from TDB1"},{"categories":null,"contents":"TDB2 is incorporated into Fuseki2, both in the full server, with UI, and in the embeddable Fuseki2 main server.\nThe TDB2 database can be in a configuration file, either a complete server configuration (see below) or as an entry in the FUSEKI_BASE/configuration/ area of the full server.\nThe command line start-up for Fuseki (both full and basic versions) uses the --tdb2 flag to modify the --loc argument to work with a TDB2 dataset.\nExample complete server configuration file for full or basic servers: The base URL will be of the form http://_host:port_/tdb2-database.\nNote the tdb2: prefix.\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; # tdb2 in this line, below. PREFIX tdb2: \u0026lt;http://jena.apache.org/2016/tdb#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; [] rdf:type fuseki:Server ; fuseki:services ( \u0026lt;#service_tdb2\u0026gt; ) . \u0026lt;#service_tdb2\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026#34;TDB2 Service (RW)\u0026#34; ; fuseki:name \u0026#34;tdb2-database\u0026#34; ; fuseki:serviceQuery \u0026#34;query\u0026#34; ; fuseki:serviceQuery \u0026#34;sparql\u0026#34; ; fuseki:serviceUpdate \u0026#34;update\u0026#34; ; fuseki:serviceReadWriteGraphStore \u0026#34;data\u0026#34; ; # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026#34;get\u0026#34; ; fuseki:dataset \u0026lt;#tdb_dataset_readwrite\u0026gt; ; . \u0026lt;#tdb_dataset_readwrite\u0026gt; rdf:type tdb2:DatasetTDB2 ; tdb2:location \u0026#34;TDB2\u0026#34; ; ## This is supported: tdb2:unionDefaultGraph true ; . This example is available in config-tdb2.ttl.\nThe key difference is the declared rdf:type of the dataset.\nNote that the Fuseki UI does not provide a way to create TDB2 databases; a configuration file must be used.
Once set up, upload, query and graph editing will be routed to the TDB2 database.\nFor a service configuration in FUSEKI_BASE/configuration/:\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;http://jena.apache.org/fuseki#\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; PREFIX rdfs: \u0026lt;http://www.w3.org/2000/01/rdf-schema#\u0026gt; PREFIX tdb2: \u0026lt;http://jena.apache.org/2016/tdb#\u0026gt; PREFIX ja: \u0026lt;http://jena.hpl.hp.com/2005/11/Assembler#\u0026gt; \u0026lt;#service_tdb2\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026#34;TDB2 Service (RW)\u0026#34; ; fuseki:name \u0026#34;tdb2-database\u0026#34; ; fuseki:serviceQuery \u0026#34;query\u0026#34; ; fuseki:serviceQuery \u0026#34;sparql\u0026#34; ; fuseki:serviceUpdate \u0026#34;update\u0026#34; ; fuseki:serviceReadWriteGraphStore \u0026#34;data\u0026#34; ; # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026#34;get\u0026#34; ; fuseki:dataset \u0026lt;#tdb_dataset_readwrite\u0026gt; ; . \u0026lt;#tdb_dataset_readwrite\u0026gt; rdf:type tdb2:DatasetTDB2 ; tdb2:location \u0026#34;TDB2\u0026#34; ; ## tdb2:unionDefaultGraph true ; . ","permalink":"https://jena.apache.org/documentation/tdb2/tdb2_fuseki.html","tags":null,"title":"TDB2 - Use with Fuseki2"},{"categories":null,"contents":"This section provides some basic reference notes on the core Jena RDF API. For a more tutorial introduction, please see the tutorials.\nCore concepts Graphs, models In Jena, all state information provided by a collection of RDF triples is contained in a data structure called a Model. The model denotes an RDF graph, so called because it contains a collection of RDF nodes, attached to each other by labelled relations. Each relationship goes only in one direction, so the triple:\nexample:ijd foaf:name \u0026#34;Ian\u0026#34; can be read as \u0026lsquo;resource example:ijd has property foaf:name with value \u0026quot;Ian\u0026quot;\u0026rsquo;. Clearly the reverse is not true. Mathematically, this makes the model an instance of a directed graph.\nIn Java terms, we use the class Model as the primary container of RDF information contained in graph form. Model is designed to have a rich API, with many methods intended to make it easier to write RDF-based programs and applications. One of Model\u0026rsquo;s other roles is to provide an abstraction over different ways of storing the RDF nodes and relations: in-memory data structures, disk-based persistent stores and inference engines, for example, all provide Model as a core API.\nWhile this common abstraction is appealing to API users, it is less convenient when trying to create a new abstraction over a different storage medium. For example, suppose we wanted to present an RDF triples view of an LDAP store by wrapping it as a Jena Model. Internally, Jena uses a much simpler abstraction, Graph, as the common interface to low-level RDF stores. Graph has a much simpler API, so is easier to re-implement for different store substrates.\nIn summary there are three distinct concepts of RDF containers in Jena:\ngraph, a mathematical view of the directed relations between nodes in a connected structure Model, a rich Java API with many convenience methods for Java application developers Graph, a simpler Java API intended for extending Jena\u0026rsquo;s functionality.
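As a small, concrete illustration of the Model API, the sketch below creates an in-memory model and adds the foaf:name triple used in the example above (the expansion chosen here for the example: prefix is only illustrative):
import org.apache.jena.rdf.model.Model ;
import org.apache.jena.rdf.model.ModelFactory ;
import org.apache.jena.rdf.model.Property ;
import org.apache.jena.rdf.model.Resource ;

Model model = ModelFactory.createDefaultModel() ;
Resource ijd = model.createResource(\u0026quot;http://example.org/ijd\u0026quot;) ;
Property name = model.createProperty(\u0026quot;http://xmlns.com/foaf/0.1/name\u0026quot;) ;
// Adds the statement: example:ijd foaf:name \u0026quot;Ian\u0026quot; .
ijd.addProperty(name, \u0026quot;Ian\u0026quot;) ;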
As an application developer, you will mostly be concerned with Model.\nNodes: resources, literals and blank nodes So if RDF information is contained in a graph of connected nodes, what do the nodes themselves look like? There are two distinct types of nodes: URI references and literals. Essentially, these denote, respectively, some resource about which we wish to make some assertions, and concrete data values that appear in those assertions. In the example above, example:ijd is a resource, denoting a person, and \u0026quot;Ian\u0026quot; denotes the value of a property of that resource (that property being first name, in this case). The resource is denoted by a URI, shown in abbreviated form here (about which more below).\nWhat is the nature of the relationship between the resource node in the graph (example:ijd) and an actual person (the author of this document)? That turns out to be a surprisingly subtle and complex matter, which we won\u0026rsquo;t dwell on here. See this very good summary of the issues by Jeni Tennison for a detailed analysis. Suffice to say here that resources - somehow - denote the things we want to describe in an RDF model.\nA resource represented as a URI denotes a named thing - it has an identity. We can use that identity to refer to directly the resource, as we will see below. Another kind of node in the graph is a literal, which just represents a data value such as the string \u0026quot;ten\u0026quot; or the number 10. Literals representing values other than strings may have an attached datatype, which helps an RDF processor correctly convert the string representation of the literal into the correct value in the computer. By default, RDF assumes the datatypes used XSD are available, but in fact any datatype URI may be used.\nRDF allows one special case of resources, in which we don\u0026rsquo;t actually know the identity (i.e. the URI) of the resource. Consider the sentence \u0026ldquo;I gave my friend five dollars\u0026rdquo;. We know from this claim that I have friend, but we don\u0026rsquo;t know who that friend is. We also know a property of the friend - namely that he or she is five dollars better off than before. In RDF, we can model this situation by using a special type of resource called an anonymous resource. In the RDF semantics, an anonymous resource is represented as having an identity which is blank, so they are often referred to as nodes in the graph with blank identities, or blank nodes, typically shortened to bNodes.\nIn Jena, the Java interface Resource represents both ordinary URI resources and bNodes (in the case of a bNode, the getURI() method returns null, and the isAnon() method returns true). The Java interface Literal represents literals. Since both resources and literals may appear as nodes in a graph, the common interface RDFNode is a super-class of both Resource and Literal.\nTriples In an RDF graph, the relationships always connect one subject resource to one other resource or one literal. For example:\nexample:ijd foaf:firstName \u0026#34;Ian\u0026#34;. example:ijd foaf:knows example:mary. The relationship, or predicate, always connects two nodes (formally, it has arity two). The first argument of the predicate is node we are linking from, and the second is the node we are linking to. We will often refer to these as the subject and object of the RDF statement, respectively. The pattern subject-predicate-object is sufficiently commonplace that we will sometimes use the abbreviation SPO. 
More commonly, we refer to a statement of one subject, predicate and object as a triple, leading naturally to the term triplestore to refer to a means of storing RDF information.\nIn Jena, the Java class used to represent a single triple is Statement. According to the RDF specification, only resources can be the subject of an RDF triple, whereas the object can be a resource or a literal. The key methods for extracting the elements of a Statement are then:\ngetSubject() returning a Resource getObject() returning an RDFNode getPredicate() returning a Property (see below for more on Properties) The predicate of a triple corresponds to the label on an edge in the RDF graph. So in the figure below, the two representations are equivalent:\nTechnically, an RDF graph corresponds to a set of RDF triples. This means that an RDF resource can only be the subject of at most one triple with the same predicate and object (because sets do not contain any duplicates).\nProperties As mentioned above, the connection between two resources or a resource and a literal in an RDF graph is labelled with the identity of the property. Just as RDF itself uses URI\u0026rsquo;s as names for resources, minimising the chances of accidental name collisions, so too are properties identified with URI\u0026rsquo;s. In fact, RDF Properties are just a special case of RDF Resources. Properties are denoted in Jena by the Property object, which is a Java sub-class of Resource (itself a Java sub-class of RDFNode).\nOne difference between properties and resources in general is that RDF does not permit anonymous properties, so you can\u0026rsquo;t use a bNode in place of a Property in the graph.\nNamespaces Suppose two companies, Acme Inc, and Emca Inc, decide to encode their product catalogues in RDF. A key piece of information to include in the graph is the price of the product, so both decide to use a price predicate to denote the relationship between a product and its current price. However, Acme wants the price to include applicable sales taxes, whereas Emca wants to exclude them. So the notion of price is slightly different in each case. However, using the name \u0026lsquo;price\u0026rsquo; on its own risks losing this distinction.\nFortunately, RDF specifies that a property is identified by a URI, and \u0026lsquo;price\u0026rsquo; on its own is not a URI. A logical solution is for both Acme and Emca to use their own web spaces to provide different base URIs on which to construct the URI for the property:\nhttp://acme.example/schema/products#price http://emca.example/ontology/catalogue/price These are clearly now two distinct identities, and so each company can define the semantics of the price property without interfering with the other. Writing out such long strings each time, however, can be unwieldy and a source of error. A compact URI or curie is an abbreviated form in which a namespace and name are separated by a colon character:\nacme-product:price emca-catalogue:price where acme-product is defined to be http://acme.example/schema/products#. This can be defined, for example, in Turtle:\nPREFIX acme-product: \u0026lt;http://acme.example/schema/products#\u0026gt; acme-product:widget acme-product:price \u0026#34;44.99\u0026#34;^^xsd:decimal. The datatype xsd:decimal is another example of an abbreviated URI. Note that no PREFIX rules are defined by RDF or Turtle: authors of RDF content should ensure that all prefixes used in curies are defined before use.\nNote\nJena does not treat namespaces in a special way. 
A Model will remember any prefixes defined in the input RDF (see the PrefixMapping interface; all Jena Model objects extend PrefixMapping), and the output writers which serialize a model to XML or Turtle will normally attempt to use prefixes to abbreviate URI\u0026rsquo;s. However, internally a Resource URI is not separated into a namespace and local-name pair. The method getLocalName() on Resource will attempt to calculate what a reasonable local name might have been, but it may not always recover the pairing that was used in the input document.\nA resource can be used as the subject of statements about the properties of that resource, as above, but also as the value of a statement. For example, the property is-a-friend-of might typically connect two resources denoting people.\nJena packages As a guide to the various features of Jena, here\u0026rsquo;s a description of the main Java packages. For brevity, we shorten org.apache.jena to oaj.\noaj.jena.rdf.model - The Jena core. Creating and manipulating RDF graphs.\noaj.riot - Reading and writing RDF.\noaj.jena.datatypes - Provides the core interfaces through which datatypes are described to Jena (see Typed literals).\noaj.jena.ontology - Abstractions and convenience classes for accessing and manipulating ontologies represented in RDF (see Ontology API).\noaj.jena.rdf.listeners - Listening for changes to the statements in a model.\noaj.jena.reasoner - The reasoner subsystem supports a range of inference engines which derive additional information from an RDF model (see Reasoner how-to).\noaj.jena.shared - Common utility classes.\noaj.jena.vocabulary - A package containing constant classes with predefined constant objects for classes and properties defined in well known vocabularies.\noaj.jena.xmloutput - Writing RDF/XML (see I/O index).\n","permalink":"https://jena.apache.org/documentation/rdf/","tags":null,"title":"The core RDF API"},{"categories":null,"contents":"These notes were written February 2016.\nDatasetGraph forms the basis of storage for an RDF Dataset. There is a class hierarchy to make implementation a matter of choosing the style of implementation and adding specific functionality.\nThe hierarchy of the significant classes is (there are others adding special features):\nDatasetGraph - the interface DatasetGraphBase DatasetGraphBaseFind DatasetGraphCollection DatasetGraphMapLink - ad hoc collection of graphs DatasetGraphOne DatasetGraphTriplesQuads DatasetGraphInMemory - fully transactional in-memory. DatasetGraphMap DatasetGraphQuads DatasetGraphTrackActive - transaction support DatasetGraphTransaction - This is the main TDB dataset. DatasetGraphWithLock - MRSW support DatasetGraphWrapper DatasetGraphTxn - TDB usage DatasetGraphViewGraphs Other important classes:\nGraphView DatasetGraph This is the interface. Includes Transactional operations.\nThere are two markers for the transaction features supported.\nIf begin, commit and end are supported (which is normally the case), supportsTransactions returns true.\nIf, further, abort is supported, then supportsTransactionAbort is true.\nGeneral hierarchy DatasetGraphBase\nThis provides some basic machinery and provides implementations of operations that have alternative styles. It converts add(G,S,P,O) to add(quad) and delete(G,S,P,O) to delete(quad) and converts find(quad) to find(G,S,P,O).\nIt provides basic implementations of deleteAny(?,?,?,?)
and clear().\nIt provides a Lock (LockMRSW) and the Context.\nFrom here on down, the storage aspect of the hierarchy splits depending on implementation style.\nDatasetGraphBaseFind\nThis is the beginning of the hierarchy for DSGs that store using different units for default graph and named graphs. This class splits find/4 into the following variants:\nfindInDftGraph findInUnionGraph findQuadsInUnionGraph findUnionGraphTriples findInSpecificNamedGraph findInAnyNamedGraphs DatasetGraphTriplesQuads\nThis is the beginning of the hierarchy for DSGs implemented as a set of triples for the default graph and a set of quads for all the named graphs.\nIt splits add(Quad) and delete(Quad) into:\naddToDftGraph addToNamedGraph deleteFromDftGraph deleteFromNamedGraph and makes\nsetDefaultGraph addGraph removeGraph copy-in operations - triples are copied into the graph or removed from the graph, rather than the graph being shared.\nDatasetGraphInMemory\nThe main in-memory implementation, providing full transactions (serializable isolation, abort).\nUse this one!\nThis class backs DatasetFactory.createTxnMem().\nDatasetGraphMap\nThe in-memory implementation using in-memory Graphs as the storage for Triples. It provides MRSW-transactions (serializable isolation, no real abort). Use this for a single-threaded application.\nThis class backs DatasetFactory.create().\nDatasetGraphCollection\nOperations split into operations on a collection of Graphs, one for the default graph, and a map of (Node,Graph) for the named graphs. It provides MRSW-transactions (serializable isolation, no real abort).\nDatasetGraphMapLink\nThis implementation manages Graphs provided by the application.\nIt provides MRSW-transactions (serializable isolation, no real abort). Applications need to be careful when modifying the Graphs directly and also modifying them via the DatasetGraph interface.\nThis class backs DatasetFactory.createGeneral().\nDatasetGraphWrapper\nIndirection to another DatasetGraph.\nSurprisingly useful.\nDatasetGraphViewGraphs\nA small class that provides the \u0026ldquo;get a graph\u0026rdquo; operations over a DatasetGraph using GraphView.\nNot used because subclasses usually want to inherit from a different part of the hierarchy, but the idea of implementing getDefaultGraph() and getGraph(Node) as calls to GraphView is used elsewhere.\nDo not use with implementations that store using graphs (e.g. DatasetGraphMap, DatasetGraphMapLink) because it goes into an infinite recursion if they use GraphView internally.\nGraphView\nImplementation of the Graph interface as a view of a DatasetGraph, including providing a basic implementation of the union graph. Subclasses can, and do, provide better mechanisms for the union graph based on their internal indexes.\nDatasetGraphOne\nAn implementation that only provides a default graph, given at creation time. This is fixed - the app can\u0026rsquo;t add named graphs.\nCuts through all the machinery to be a simple, direct implementation.\nBacks DatasetGraphFactory.createOneGraph but not DatasetFactory.create(Model), which provides for adding named graphs.\nDatasetGraphQuads\nRoot for implementations based on just quad storage, with no special triples in the default graph (e.g. the default graph is always the calculated union of named graphs).\nNot used currently.\nDatasetGraphTrackActive\nFramework for implementing transactions. Provides checking.\nDatasetGraphWithLock\nProvides transactions, without abort by default, using a lock.
If the lock is LockMRSW, we get multiple readers or a single writer at any given moment in time. As most datastructures are multi-thread reader-safe, this style works over systems that do not themselves provide transactions.\nAbort requires work to be undone. Jena may in the future provide reverse-replay abort (do the adds and deletes in reverse operation, reverse order) but this is partial. It does not protect against the DatasetGraph implementation throwing exceptions nor JVM or machine crash (if any persistence). It still needs MRSW to achieve isolation.\nRead-committed needs synchronization-safe datastructures - including co-ordinated changes to several places at once (ConcurrentHashMap isn\u0026rsquo;t enough - need to update 2 or more ConcurrentHashMaps together).\nTDB DatasetGraphTDB\nDatasetGraphTDB is concerned with the storage layer; it is historical and not used directly by applications.\nDatasetGraphTransaction\nThis is the class returned by TDBFactory, wrapped in DatasetImpl.\nDifferent in TDB2 - DatasetGraphTransaction is not used, DatasetGraphTDB is transactional.\nDatasetGraphTxn\nThis is the TDB per-transaction DatasetGraph using the transaction view of indexes. For the application, it is held in the transaction\u0026rsquo;s ThreadLocal in DatasetGraphTransaction.\nInternally, each read transaction for the same generation of the data uses the same DatasetGraphTransaction.\n","permalink":"https://jena.apache.org/documentation/notes/datasetgraph.html","tags":null,"title":"The DatasetGraph hierarchy."},{"categories":null,"contents":"The StreamManager is a utility to find and read files into models. There is a standard global StreamManager and applications may also define specific ones by constructing additional StreamManagers.\nThe LocationMapper provides alternative locations for RDF data.\nThe Stream Manager Files are named by a string, according to the conventions of their storage system. Typically this is by URI. There are a number of storage system adapters provided:\nFile locator (with own current directory) URL locator (HTTP and FTP) Class loader locator Zip file locator The global stream manager has a file locator, a URL locator and a class loader (tried in that order).\nA StreamManager can have an associated LocationMapper that transforms names before use. This means local copies of documents can be used transparently to the rest of the application.\nA StreamManager provides an \u0026ldquo;open\u0026rdquo; operation to get an InputStream to the resource.\nThe LocationMapper configuration file Location mapping files are RDF - they may be written in RDF/XML, Turtle (file suffix .ttl) or N-Triples (file suffix .nt). The default is RDF/XML unless one of these suffixes is found.\nPREFIX lm: \u0026lt;http://jena.hpl.hp.com/2004/08/location-mapping#\u0026gt; [] lm:mapping [ lm:name \u0026quot;file:foo.ttl\u0026quot; ; lm:altName \u0026quot;file:etc/foo.ttl\u0026quot; ] , [ lm:prefix \u0026quot;file:etc/\u0026quot; ; lm:altPrefix \u0026quot;file:ETC/\u0026quot; ] , [ lm:name \u0026quot;file:etc/foo.ttl\u0026quot; ; lm:altName \u0026quot;file:DIR/foo.ttl\u0026quot; ] . There are two types of location mapping: exact match renaming and prefix renaming. When trying to find an alternative location, a LocationMapper first tries for an exact match; if none is found, the LocationMapper will search for the longest matching prefix.
If two are the same length, there is no guarantee on order tried; there is no implied order in a location mapper configuration file (it sets up two hash tables).\nIn the example above, file:etc/foo.ttl becomes file:DIR/foo.ttl because that is an exact match. The prefix match of file:etc/ is ignored.\nAll string tests are done case sensitively because the primary use is for URLs.\nNotes:\nProperty values are not URIs, but strings. This is a system feature, not an RDF feature. Prefix mapping is name rewriting; alternate names are not treated as equivalent resources in the rest of Jena. While application writers are encouraged to use URIs to identify files, this is not always possible. There is no check to see if the alternative system resource is equivalent to the original. A LocationMapper finds its configuration file by looking for the following files, in order:\nlocation-mapping.ttl location-mapping.rdf etc/location-mapping.rdf etc/location-mapping.ttl This is specified as a path - note the path separator is always the character \u0026lsquo;;\u0026rsquo; regardless of operating system because URLs contain \u0026lsquo;:\u0026rsquo;.\nlocation-mapping.ttl;location-mapping.rdf;etc/location-mapping.rdf;etc/location-mapping.ttl\nApplications can also set mappings programmatically. No configuration file is necessary.\nThe base URI for reading models with the StreamManager will be the original URI, not the alternative location.\nDebugging If log4j2, set the logging level of the classes:\nlogger.filemanager.name = org.apache.jena.riot.system.stream.StreamManager logger.filemanager.level = ALL logger.location-manager.name = org.apache.jena.riot.system.stream.LocationMapper logger.location-manager.level = ALL See also Javadoc:\nStreamManager LocationMapper ","permalink":"https://jena.apache.org/documentation/notes/stream-manager.html","tags":null,"title":"The Jena StreamManager and LocationMapper"},{"categories":null,"contents":"The title above will generate an h1 element from the CMS scripts\n{% toc %}\nI am the first h2 lorem fubarum\nI am the second h2 lorem fubarum\nI am the first h3 lorem fubarum\nAnd I\u0026rsquo;m back to h2 lorem fubarum\nTOC test The title above will generate an h1 element from the CMS scripts\nI am the first h2 I am the second h2 I am the first h3 And I'm back to h2 I am the first h2 lorem fubarum\nI am the second h2 lorem fubarum\nI am the first h3 lorem fubarum\nAnd I'm back to h2 lorem fubarum\n","permalink":"https://jena.apache.org/documentation/ontology/toc-test.html","tags":null,"title":"TOC test"},{"categories":null,"contents":"Jena supports TriX, a simple XML format for RDF, for both reading and writing RDF data.\nThe support is of the TriX core, without processing instructions.\nBoth the original HPlabs and W3C DTDs are supported for reading. 
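As a small sketch (the file name data.trix is illustrative), reading and writing TriX goes through RIOT in the usual way, using classes from org.apache.jena.rdf.model and org.apache.jena.riot:
Model model = ModelFactory.createDefaultModel() ;
RDFDataMgr.read(model, \u0026#34;data.trix\u0026#34;, Lang.TRIX) ;   // parse TriX into a model
RDFDataMgr.write(System.out, model, Lang.TRIX) ;     // serialize the model as TriX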
Writing is according to the W3C DTD, that is using root element \u0026lt;trix\u0026gt;, rather than \u0026lt;TriX\u0026gt;.\nNote: This format should not be confused with RDF/XML, the W3C standardised XML format for RDF.\nTriX History TriX originated from work by Jeremy Carroll (then at HP Labs, Bristol) and Patrick Stickler (then at Nokia) and published as a tech report HPL-2004-56 There is also earlier work published in HPL-2003-268.\nThe work within the Semantic Web Interest Group on Named Graphs, including TriX, is documented at https://www.w3.org/2004/03/trix/.\nTriX DTD: http://www.w3.org/2004/03/trix/trix-1/trix-1.0.dtd Trix XML Schema: http://www.w3.org/2004/03/trix/trix-1/trix-1.0.xsd\nThe W3C DTD differs from HPL-2004-56 by having root element \u0026lt;trix\u0026gt; not \u0026lt;TriX\u0026gt;.\n\u0026lt;!-- TriX: RDF Triples in XML --\u0026gt; \u0026lt;!ELEMENT trix (graph*)\u0026gt; \u0026lt;!ATTLIST trix xmlns CDATA #FIXED \u0026#34;http://www.w3.org/2004/03/trix/trix-1/\u0026#34;\u0026gt; \u0026lt;!ELEMENT graph (uri, triple*)\u0026gt; \u0026lt;!ELEMENT triple ((id|uri|plainLiteral|typedLiteral), uri, (id|uri|plainLiteral|typedLiteral))\u0026gt; \u0026lt;!ELEMENT id (#PCDATA)\u0026gt; \u0026lt;!ELEMENT uri (#PCDATA)\u0026gt; \u0026lt;!ELEMENT plainLiteral (#PCDATA)\u0026gt; \u0026lt;!ATTLIST plainLiteral xml:lang CDATA #IMPLIED\u0026gt; \u0026lt;!ELEMENT typedLiteral (#PCDATA)\u0026gt; \u0026lt;!ATTLIST typedLiteral datatype CDATA #REQUIRED\u0026gt; TriX-star The format is extended for RDF-star with embedded triples by allowing nested \u0026lt;triple\u0026gt;.\nTrix-star (2021) adds \u0026rsquo;triple\u0026rsquo; to subject and object positions of ELEMENT triple.\n\u0026lt;!ELEMENT triple ((id|uri|plainLiteral|typedLiteral|triple), uri, (id|uri|plainLiteral|typedLiteral|triple))\u0026gt; Example The Turtle:\nPREFIX : \u0026lt;http://example/\u0026gt; :s :p \u0026#34;ABC\u0026#34; . \u0026lt;\u0026lt; :s :p :o \u0026gt;\u0026gt; :q :r . is written in Trix as:\n\u0026lt;trix xmlns=\u0026#34;http://www.w3.org/2004/03/trix/trix-1/\u0026#34;\u0026gt; \u0026lt;graph\u0026gt; \u0026lt;triple\u0026gt; \u0026lt;uri\u0026gt;http://example/s\u0026lt;/uri\u0026gt; \u0026lt;uri\u0026gt;http://example/p\u0026lt;/uri\u0026gt; \u0026lt;plainLiteral\u0026gt;ABC\u0026lt;/plainLiteral\u0026gt; \u0026lt;/triple\u0026gt; \u0026lt;triple\u0026gt; \u0026lt;triple\u0026gt; \u0026lt;uri\u0026gt;http://example/s\u0026lt;/uri\u0026gt; \u0026lt;uri\u0026gt;http://example/p\u0026lt;/uri\u0026gt; \u0026lt;uri\u0026gt;http://example/o\u0026lt;/uri\u0026gt; \u0026lt;/triple\u0026gt; \u0026lt;uri\u0026gt;http://example/q\u0026lt;/uri\u0026gt; \u0026lt;uri\u0026gt;http://example/r\u0026lt;/uri\u0026gt; \u0026lt;/triple\u0026gt; \u0026lt;/graph\u0026gt; \u0026lt;/trix\u0026gt; ","permalink":"https://jena.apache.org/documentation/io/trix.html","tags":null,"title":"TriX support in Apache Jena"},{"categories":null,"contents":"Quando você começa a trabalhar com SPARQL você rapidamente descobre que queries estáticas são restritivas. Talvez você queira mudar um valor, adicionar um filtro, alterar o limite, etc. Sendo do tipo impaciente, você começa a manipular a string da query e isso funciona. Mas o que dizer de little Bobby Tables? Além do mais, mesmo que você limpe ao máximo suas entradas, manipulação de strings é um processo tenso e erros de sintaxe esperam por você. 
Muito embora possa parecer mais difícil do que string munging, a API ARQ é sua amiga na longa jornada.\nOriginalmente publicado em Research Revealed project blog\nInserindo valores (comandos simples prontos) Vamos começar com algo simples. Suponha que nós queiramos restringir a query a seguir a uma pessoa (person) em particular:\nselect * { ?person \u0026lt;http://xmlns.com/foaf/0.1/name\u0026gt; ?name } String#replaceAll deveria funcionar, mas existe um jeito mais seguro. QueryExecutionFactory em muitos casos, permite que você alimente uma QuerySolution com a qual você pode prefixar valores.\nQuerySolutionMap initialBinding = new QuerySolutionMap(); initialBinding.add(\u0026quot;name\u0026quot;, personResource); qe = QueryExecutionFactory.create(query, dataset, initialBinding); Isto geralmente é muito mais simples do que a string equivalente desde que você não tenha usar aspas para citações. (Esteja ciente de que isto não funciona para sparqlService, o que é uma pena. Seria legal gastar algum tempo consertando isto.)\nFazendo uma Query a partir do zero As limitações previamente mencionadas se devem ao fato de que prefixação na verdade não muda a query em nada, apenas a execução daquela query. Então, como nós realmente alteramos queries?\nARQ provê duas maneiras de se trabalhar com queries: no nível de sintaxe (Query and Element), ou no nível algébrico (Op). A distinção entre eles fica claro com os filtros:\nSELECT ?s { ?s \u0026lt;http://example.com/val\u0026gt; ?val . FILTER ( ?val \u0026lt; 20 ) } Se você trabalha no nível de sintaxe, você irá descobrir que isso (em pseudo código) se parece com :\n(GROUP (PATTERN ( ?s \u0026lt;http://example.com/val\u0026gt; ?val )) (FILTER ( \u0026lt; ?val 20 ) )) Isto é, existe um grupo contendo um padrão triplo e um filtro, do mesmo jeito que você vê na query. A álgebra é diferente e nós podemos vê-la usando arq.qparse --print op\n$ java arq.qparse --print op 'SELECT ?s { ?s \u0026lt;http://example.com/val\u0026gt; ?val . FILTER ( ?val \u0026lt; 20 ) }' (base \u0026lt;file:///...\u0026gt; (project (?s) (filter (\u0026lt; ?val 20) (bgp (triple ?s \u0026lt;http://example.com/val\u0026gt; ?val))))) Aqui o filtro contém o padrão, ao invés de se situar próximo a ele. Esta forma torna claro que a expressão está filtrando o padrão.\nVamos criar esta query do zero usando ARQ. Nós começamos com algumas partes comuns: a tripla a ser comparada e a expressão a ser filtrada.\n// ?s ?p ?o . Triple pattern = Triple.create(Var.alloc(\u0026quot;s\u0026quot;), Var.alloc(\u0026quot;p\u0026quot;), Var.alloc(\u0026quot;o\u0026quot;)); // ( ?s \u0026lt; 20 ) Expr e = new E_LessThan(new ExprVar(\u0026quot;s\u0026quot;), new NodeValueInteger(20)); Triple deveria ser familiar de Jena. Var é uma extensão de Node para variáveis. 
Expr é a interface raíz para expressões, aquelas coisas que aparecem em FILTER and LET.\nPrimeiro, o caminho da sintaxe:\nElementTriplesBlock block = new ElementTriplesBlock(); // Make a BGP block.addTriple(pattern); // Add our pattern match ElementFilter filter = new ElementFilter(e); // Make a filter matching the expression ElementGroup body = new ElementGroup(); // Group our pattern match and filter body.addElement(block); body.addElement(filter); Query q = QueryFactory.make(); q.setQueryPattern(body); // Set the body of the query to our group q.setQuerySelectType(); // Make it a select query q.addResultVar(\u0026quot;s\u0026quot;); // Select ?s Agora a álgebra:\nOp op; BasicPattern pat = new BasicPattern(); // Make a pattern pat.add(pattern); // Add our pattern match op = new OpBGP(pat); // Make a BGP from this pattern op = OpFilter.filter(e, op); // Filter that pattern with our expression op = new OpProject(op, Arrays.asList(Var.alloc(\u0026quot;s\u0026quot;))); // Reduce to just ?s Query q = OpAsQuery.asQuery(op); // Convert to a query q.setQuerySelectType(); // Make is a select query Note que o tipo da query (SELECT, CONSTRUCT, DESCRIBE, ASK)não é parte da álgebra, e que nós temos que configurar isso na query (embora SELECT seja o padrão). FROM e FROM NAMED estão igualmente ausentes.\nNavegando e Aprendendo: Visitors Você também pode olhar para a álgebra e a sintaxe usando visitors. Comece estendendo OpVisitorBase (ElementVisitorBase) que apaga a interface de modo que você pode se concentrar nas partes de interesse, então dê um passo a frente e use OpWalker.walk(Op, OpVisitor) (ElementWalker.walk(Element, ElementVisitor)). Isso funciona no esquema “bottom up” (de baixo para cima).\nPara algumas alterações, como manipulação de padrões triplos no local, visitors irão trabalhar bem. Eles provêm um jeito simples de manipular as partes certas da query e você pode alterar as BGPs backing padrões tanto na álgebra quanto na sintaxe. Entretanto, mutações (mutation) não estão consistentemente disponíveis, não conte com elas.\nTransformando a Álgebra A primeira vista, não há vantagens óbvias em usar a álgebra. O real poder fica claro com o uso de transformers (transformações), que lhe permitem reorganizar uma álgebra completamente. ARQ faz amplo uso de transformers para simplificar e aperfeiçoar execuções de query.\nEm Research Revealed (Pesquisa revelada, em tradução livre), eu escrevi algum código para pegar certo número de constraints (constantes) e produzir uma query. Havia várias formas de se fazer isto, mas o jeito que eu achei foi gerar ops de cada constraint e juntar o resultado.\nfor (Constraint con: cons) { op = OpJoin.create(op, consToOp(cons)); // join } O resultado foi uma bagunça incrivelmente correta, que é remotamente compreensível em apenas três condições:\n(join (join (filter (\u0026lt; ?o0 20) (bgp (triple ?s \u0026lt;urn:ex:prop0\u0026gt; ?o0))) (filter (\u0026lt; ?o1 20) (bgp (triple ?s \u0026lt;urn:ex:prop1\u0026gt; ?o1)))) (filter (\u0026lt; ?o2 20) (bgp (triple ?s \u0026lt;urn:ex:prop2\u0026gt; ?o2)))) Cada uma das constraints é um filtro e um bgp. Isso pode ser muito mais compreensível removendo os filtros e juntando (merging) os padrões triplos. 
Nós podemos fazer isso usando Transform:\nclass QueryCleaner extends TransformBase { @Override public Op transform(OpJoin join, Op left, Op right) { // Bail if not of the right form if (!(left instanceof OpFilter \u0026amp;\u0026amp; right instanceof OpFilter)) return join; OpFilter leftF = (OpFilter) left; OpFilter rightF = (OpFilter) right; // Add all of the triple matches to the LHS BGP ((OpBGP) leftF.getSubOp()).getPattern().addAll(((OpBGP) rightF.getSubOp()).getPattern()); // Add the RHS filter to the LHS leftF.getExprs().addAll(rightF.getExprs()); return leftF; } } ... op = Transformer.transform(new QueryCleaner(), op); // clean query O código abaixo procura pelos joins do formulário:\n(join (filter (exp1) (bgp1)) (filter (exp2) (bgp2))) E substitui ele com:\n(filter (exp1 \u0026amp;\u0026amp; exp2) (bgp1 \u0026amp;\u0026amp; bgp2)) Enquanto nós percorremos a query original, todos os joins são removidos e o resultado final é:\n(filter (exprlist (\u0026lt; ?o0 20) (\u0026lt; ?o1 20) (\u0026lt; ?o2 20)) (bgp (triple ?s \u0026lt;urn:ex:prop0\u0026gt; ?o0) (triple ?s \u0026lt;urn:ex:prop1\u0026gt; ?o1) (triple ?s \u0026lt;urn:ex:prop2\u0026gt; ?o2) )) Isto completa esta breve introdução. Existe muito mais em ARQ, claro, mas esperamos que você tenha tido um gostinho do que ele pode fazer.\n","permalink":"https://jena.apache.org/documentation/query/manipulating_sparql_using_arq_pt.html","tags":null,"title":"Tutorial - Manipulando SPARQL usando ARQ"},{"categories":null,"contents":"When you\u0026rsquo;ve been working with SPARQL you quickly find that static queries are restrictive. Maybe you want to vary a value, perhaps add a filter, alter the limit, etc etc. Being an impatient sort you dive in to the query string, and it works. But what about little Bobby Tables? And, even if you sanitise your inputs, string manipulation is a fraught process and syntax errors await you. Although it might seem harder than string munging, the ARQ API is your friend in the long run.\nOriginally published on the Research Revealed project blog\nInserting values (simple prepared statements) Let\u0026rsquo;s begin with something simple. Suppose we wanted to restrict the following query to a particular person:\nselect * { ?person \u0026lt;http://xmlns.com/foaf/0.1/name\u0026gt; ?name } String#replaceAll would work, but there is a safer way. QueryExecutionFactory in most cases lets you supply a QuerySolution with which you can prebind values.\nQuerySolutionMap initialBinding = new QuerySolutionMap(); initialBinding.add(\u0026quot;name\u0026quot;, personResource); qe = QueryExecutionFactory.create(query, dataset, initialBinding); This is often much simpler than the string equivalent since you don\u0026rsquo;t have to escape quotes in literals. (Beware that this doesn\u0026rsquo;t work for sparqlService, which is a great shame. It would be nice to spend some time remedying that.)\nMaking a Query from Scratch The previously mentioned limitation is due to the fact that prebinding doesn\u0026rsquo;t actually change the query at all, but the execution of that query. So what how do we really alter queries?\nARQ provides two ways to work with queries: at the syntax level (Query and Element), or the algebra level (Op). The distinction is clear in filters:\nSELECT ?s { ?s \u0026lt;http://example.com/val\u0026gt; ?val . 
FILTER ( ?val \u0026lt; 20 ) } If you work at the syntax level you\u0026rsquo;ll find that this looks (in pseudo code) like:\n(GROUP (PATTERN ( ?s \u0026lt;http://example.com/val\u0026gt; ?val )) (FILTER ( \u0026lt; ?val 20 ) )) That is there\u0026rsquo;s a group containing a triple pattern and a filter, just as you see in the query. The algebra is different, and we can see it using arq.qparse --print op\n$ java arq.qparse --print op 'SELECT ?s { ?s \u0026lt;http://example.com/val\u0026gt; ?val . FILTER ( ?val \u0026lt; 20 ) }' (base \u0026lt;file:///...\u0026gt; (project (?s) (filter (\u0026lt; ?val 20) (bgp (triple ?s \u0026lt;http://example.com/val\u0026gt; ?val))))) Here the filter contains the pattern, rather than sitting next to it. This form makes it clear that the expression is filtering the pattern.\nLet\u0026rsquo;s create that query from scratch using ARQ. We begin with some common pieces: the triple to match, and the expression for the filter.\n// ?s ?p ?o . Triple pattern = Triple.create(Var.alloc(\u0026quot;s\u0026quot;), Var.alloc(\u0026quot;p\u0026quot;), Var.alloc(\u0026quot;o\u0026quot;)); // ( ?s \u0026lt; 20 ) Expr e = new E_LessThan(new ExprVar(\u0026quot;s\u0026quot;), new NodeValueInteger(20)); Triple should be familiar from jena. Var is an extension of Node for variables. Expr is the root interface for expressions, those things that appear in FILTER and LET.\nFirst the syntax route:\nElementTriplesBlock block = new ElementTriplesBlock(); // Make a BGP block.addTriple(pattern); // Add our pattern match ElementFilter filter = new ElementFilter(e); // Make a filter matching the expression ElementGroup body = new ElementGroup(); // Group our pattern match and filter body.addElement(block); body.addElement(filter); Query q = QueryFactory.make(); q.setQueryPattern(body); // Set the body of the query to our group q.setQuerySelectType(); // Make it a select query q.addResultVar(\u0026quot;s\u0026quot;); // Select ?s Now the algebra:\nOp op; BasicPattern pat = new BasicPattern(); // Make a pattern pat.add(pattern); // Add our pattern match op = new OpBGP(pat); // Make a BGP from this pattern op = OpFilter.filter(e, op); // Filter that pattern with our expression op = new OpProject(op, Arrays.asList(Var.alloc(\u0026quot;s\u0026quot;))); // Reduce to just ?s Query q = OpAsQuery.asQuery(op); // Convert to a query q.setQuerySelectType(); // Make is a select query Notice that the query form (SELECT, CONSTRUCT, DESCRIBE, ASK) isn\u0026rsquo;t part of the algebra, and we have to set this in the query (although SELECT is the default). FROM and FROM NAMED are similarly absent.\nNavigating and Tinkering: Visitors You can also look around the algebra and syntax using visitors. Start by extending OpVisitorBase (ElementVisitorBase) which stubs out the interface so you can concentrate on the parts of interest, then walk using OpWalker.walk(Op, OpVisitor) (ElementWalker.walk(Element, ElementVisitor)). These work bottom up.\nFor some alterations, like manipulating triple matches in place, visitors will do the trick. They provide a simple way to get to the right parts of the query, and you can alter the pattern backing BGPs in both the algebra and syntax. Mutation isn\u0026rsquo;t consistently available, however, so don\u0026rsquo;t depend on it.\nTransforming the Algebra So far there is no obvious advantage in using the algebra. The real power is visible in transformers, which allow you to reorganise an algebra completely. 
ARQ makes extensive use of transformations to simplify and optimise query execution.\nIn Research Revealed I wrote some code to take a number of constraints and produce a query. There were a number of ways to do this, but one way I found was to generate ops from each constraint and join the results:\nfor (Constraint con: cons) { op = OpJoin.create(op, consToOp(cons)); // join } The result was a perfectly correct mess, which is only barely readable with just three conditions:\n(join (join (filter (\u0026lt; ?o0 20) (bgp (triple ?s \u0026lt;urn:ex:prop0\u0026gt; ?o0))) (filter (\u0026lt; ?o1 20) (bgp (triple ?s \u0026lt;urn:ex:prop1\u0026gt; ?o1)))) (filter (\u0026lt; ?o2 20) (bgp (triple ?s \u0026lt;urn:ex:prop2\u0026gt; ?o2)))) Each of the constraints is a filter on a bgp. This can be made much more readable by moving the filters out, and merging the triple patterns. We can do this with the following Transform:\nclass QueryCleaner extends TransformBase { @Override public Op transform(OpJoin join, Op left, Op right) { // Bail if not of the right form if (!(left instanceof OpFilter \u0026amp;\u0026amp; right instanceof OpFilter)) return join; OpFilter leftF = (OpFilter) left; OpFilter rightF = (OpFilter) right; // Add all of the triple matches to the LHS BGP ((OpBGP) leftF.getSubOp()).getPattern().addAll(((OpBGP) rightF.getSubOp()).getPattern()); // Add the RHS filter to the LHS leftF.getExprs().addAll(rightF.getExprs()); return leftF; } } ... op = Transformer.transform(new QueryCleaner(), op); // clean query This looks for joins of the form:\n(join (filter (exp1) (bgp1)) (filter (exp2) (bgp2))) And replaces it with:\n(filter (exp1 \u0026amp;\u0026amp; exp2) (bgp1 \u0026amp;\u0026amp; bgp2)) As we go through the original query all joins are removed, and the result is:\n(filter (exprlist (\u0026lt; ?o0 20) (\u0026lt; ?o1 20) (\u0026lt; ?o2 20)) (bgp (triple ?s \u0026lt;urn:ex:prop0\u0026gt; ?o0) (triple ?s \u0026lt;urn:ex:prop1\u0026gt; ?o1) (triple ?s \u0026lt;urn:ex:prop2\u0026gt; ?o2) )) That completes this brief introduction. There is much more to ARQ, of course, but hopefully you now have a taste for what it can do.\n","permalink":"https://jena.apache.org/documentation/query/manipulating_sparql_using_arq.html","tags":null,"title":"Tutorial - Manipulating SPARQL using ARQ"},{"categories":null,"contents":"O objetivo deste tutorial é dar um curso rápido sobre SPARQL. Esse tutorial cobre os principais aspectos desta linguagem de consulta através de exemplos, mas não tem como objetivo ser completo.\nSe você estiver procurando uma pequena introdução a SPARQL e Jena, experimente Search RDF data with SPARQL. Se você quer executar consultas SPARQL e já sabe como ele funciona, então você deveria ler a ARQ Documentation.\nSPARQL é uma linguagem de consulta e um protocolo para acesso a RDF elaborado pelo W3C RDF Data Access Working Group.\nComo uma linguagem de consulta, SPARQL é orientada a dados de forma que só consulta as informações presentes nos modelos, não há inferência propriamente dita nesta linguagem de consulta. Por acaso, os modelos de Jena são “inteligentes” quanto a isso, e nos dá a impressão de que certas triplas são criadas sob demanda, incluindo raciocínio OWL. SPARQL nada mais faz do que pegar a descrição do que a aplicação quer, na forma de uma consulta, e retornar a informação, na forma de um conjunto de ligações ou grafo RDF.\nTutorial SPARQL Preliminares: dados! 
Executando uma consulta simples Padrões básicos Restrição de valores Informação opcional Alternativas Grafos nomeados Resultados Outros Materiais SPARQL query language definition document - contem muitos exemplos. Search RDF data with SPARQL (by Phil McCarthy) - artigo publicado por um desenvolvedor da IBM sobre SPARQL e Jena Guia de referência SPARQL (por Dave Beckett) Detalhado ARQ documentation\n","permalink":"https://jena.apache.org/tutorials/sparql_pt.html","tags":null,"title":"Tutorial SPARQL"},{"categories":null,"contents":"Nesta sessão, vamos olhar para a primeira consulta simples e mostrar como executá-la em Jena.\nO \u0026ldquo;hello world\u0026rdquo; de consultas O arquivo \u0026ldquo;q1.rq\u0026rdquo; contem a seguinte consulta:\nSELECT ?x WHERE { ?x \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#FN\u0026gt; \u0026#34;John Smith\u0026#34; } executando esta consulta com a aplicação de consultas em linhas de comando:\n--------------------------------- | x | ================================= | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | --------------------------------- Isso funciona casando o padrão da tripla na clausula WHERE contra as triplas no grafo RDF. O predicado e o objeto da tripla são valores fixos, então o padrão vai casar somente triplas com estes valores. O sujeito é a variável, e não há outras restrições para a variável. O padrão casa qualquer tripla com aquele predicado e aquele objeto, e isso casa com soluções para x.\nO item entre \u0026lt;\\\u0026gt; é a URI (atualmente, é uma IRI) e o item entre \u0026quot;\u0026quot; é uma literal. Assim como Turtle, N3 ou N-Triplas, literais tipadas são escritas com ^^e tags de linguagem podem ser adicionadas com @.\n?x é uma variável chamada x. A ? não faz parte do nome e por conta disso não aparece nos resultados.\nHá um casamento. A consulta retorna o casamento na variável x da consulta. A saída mostrada foi obtida usando uma das aplicações de ARQ em linhas de comando.\nExecutando a consulta Há scripts de ajuda nos diretórios de ARQ bat/ e bin/ de sua distribuição. Eles podem não estar na distribuição do Jena. Você deve checar esses scripts antes de usá-los.\nInstalação no Windows Aponte a variável de ambiente ARQROOT para a localização do arquivo na distribuição do ARQ.\n\u0026gt; set ARQROOT=c:\\MyProjects\\ARQ A distribuição normalmente contém o número da versão no nome do diretório.\nNo diretório do ARQ, execute:\n\u0026gt; bat\\sparql.bat --data=doc\\Tutorial\\vc-db-1.rdf --query=doc\\Tutorial\\q1.rq Você pode simplesmente colocar o diretório bat/ no seu classpath ou copiar os programas lá. Todos eles dependem de ARQROOT.\nscripts bash para Linux/Cyqwin/Unix Aponte a variável de ambiente ARQROOT para a localização do arquivo na distribuição do ARQ.\n$ export ARQROOT=$HOME/MyProjects/ARQ A distribuição normalmente contém o número da versão no nome do diretório.\nNo diretório do ARQ, execute:\n$ bin/sparql --data=doc/Tutorial/vc-db-1.rdf --query=doc/Tutorial/q1.rq Você pode simplesmente colocar o diretório bin/ no seu classpath ou copiar os programas lá. 
Todos eles dependem de ARQROOT.\nCygwin é um ambiente Linux para Windows.\nUsando a aplicação de linhas de comando de Jena diretamente Você precisará modificar o classpath para incluir todos os arquivos jar do diretório lib/ do ARQ.\nPor exemplo, no Windows:\nARQdir\\lib\\antlr-2.7.5.jar;ARQdir\\lib\\arq-extra.jar;ARQdir\\lib\\arq.jar; ARQdir\\lib\\commons-logging-1.1.jar;ARQdir\\lib\\concurrent.jar;ARQdir\\lib\\icu4j_3_4.jar; ARQdir\\lib\\iri.jar;ARQdir\\lib\\jena.jar;ARQdir\\lib\\jenatest.jar; ARQdir\\lib\\json.jar;ARQdir\\lib\\junit.jar;ARQdir\\lib\\log4j-1.2.12.jar; ARQdir\\lib\\lucene-core-2.2.0.jar;ARQdir\\lib\\stax-api-1.0.jar; ARQdir\\lib\\wstx-asl-3.0.0.jar;ARQdir\\lib\\xercesImpl.jar;ARQdir\\lib\\xml-apis.jar onde ARQdir é onde você descompactou o ARQ. Isso tudo precisa estar numa linha.\nOs nomes dos arquivos JAR muitas vezes mudam e novos arquivos são adicionados – verifique essa lista com sua versão do ARQ.\nOs comandos estão no pacote ARQ.\nPróximo: Padrões Básicos\n","permalink":"https://jena.apache.org/tutorials/sparql_query1_pt.html","tags":null,"title":"Tutorial SPARQL - A primeira consulta SPARQL"},{"categories":null,"contents":"Primeiro, nós precisamos esclarecer quais dados estão sendo consultados. SPARQL consulta grafos RDF. Um grafo RDF é um conjunto de triplas (Jena chama os grafos de modelos e as triplas de sentenças porque assim eram chamadas quando a API foi elaborada inicialmente).\nÉ importante perceber que o que importa são as triplas, e não a serialização. A serialização é apenas uma maneira de escrever as triplas. RDF/XML é uma recomendação da W3C, mas isso pode dificultar a visão das triplas porque há múltiplas formas de codificar o mesmo grafo. Neste tutorial, usamos uma serialização mais parecida com triplas, chamada Turtle (veja também a linguagem N3 descrita pela W3C semantic web primer).\nNós vamos começar os dados em vc-db-1.rdf: este arquivo contém RDF para uma quantidade de descrições de vcards de pessoas. Vcards são descritos em RFC2426 e a tradução RDF é descrita na nota da W3C \u0026ldquo;Representing vCard Objects in RDF/XML\u0026rdquo;. Nosso banco de dados exemplo apenas contém alguma informação sobre nomes.\nGraficamente, os dados se assemelham a:\nEm triplas, devem se parecer com:\n@prefix vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . @prefix : \u0026lt;#\u0026gt; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:FN \u0026#34;Matt Jones\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Jones\u0026#34; ; vCard:Given \u0026#34;Matthew\u0026#34; ] . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:FN \u0026#34;Becky Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;Rebecca\u0026#34; ] . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:FN \u0026#34;John Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;John\u0026#34; ] . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:FN \u0026#34;Sarah Jones\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Jones\u0026#34; ; vCard:Given \u0026#34;Sarah\u0026#34; ] . ou então mais explicitamente como triplas:\n@prefix vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; . @prefix rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:FN \u0026#34;Matt Jones\u0026#34; . \u0026lt;http://somewhere/MattJones/\u0026gt; vCard:N _:b0 . 
_:b0 vCard:Family \u0026#34;Jones\u0026#34; . _:b0 vCard:Given \u0026#34;Matthew\u0026#34; . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:FN \u0026#34;Becky Smith\u0026#34; . \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; vCard:N _:b1 . _:b1 vCard:Family \u0026#34;Smith\u0026#34; . _:b1 vCard:Given \u0026#34;Rebecca\u0026#34; . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:FN \u0026#34;John Smith\u0026#34; . \u0026lt;http://somewhere/JohnSmith/\u0026gt; vCard:N _:b2 . _:b2 vCard:Family \u0026#34;Smith\u0026#34; . _:b2 vCard:Given \u0026#34;John\u0026#34; . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:FN \u0026#34;Sarah Jones\u0026#34; . \u0026lt;http://somewhere/SarahJones/\u0026gt; vCard:N _:b3 . _:b3 vCard:Family \u0026#34;Jones\u0026#34; . _:b3 vCard:Given \u0026#34;Sarah\u0026#34; . É importante perceber que elas são as mesmas do grafo RDF e que as triplas no grafo não estão em alguma ordem particular. Elas são apenas escritas em grupos relacionados para a leitura humana – a máquina não se importa com isso.\nPróximo: Uma consulta simples\n","permalink":"https://jena.apache.org/tutorials/sparql_data_pt.html","tags":null,"title":"Tutorial SPARQL - Formato de Dados"},{"categories":null,"contents":"RDF é dado semi-estruturado então SPARQL tem a habilidade de consultá-lo, mas não para falhar quando o dado não existe. A consulta usa uma parte opcional para extender a informação encontrada na solução de uma consulta, mas para retornar a informação não opcional de qualquer maneira.\nOPICIONAIS Essa consulta (q-opt1.rq) pega o nome da pessoa e também sua idade se essa informação estiver disponível.\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age } } Duas das quatro pessoas nos dados (vc-db-2.rdf) possui a propriedade idade, então duas das soluções da consulta têm essa informação. No entanto, já que o padrão de tripla para a idade é opcional, há uma solução padrão para a pessoa que não tiver informação sobre a idade.\n------------------------ | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | 23 | | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- Se a clausula opcional não estivesse ali, nenhuma informação sobre idade seria retornada. Se o padrão da tripla fosse incluída, mas não fosse opcional, nós teríamos a consulta (q-opt2.rq):\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . ?person info:age ?age . } com os dois únicos resultados:\n----------------------- | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | 23 | | \u0026#34;John Smith\u0026#34; | 25 | ----------------------- porque a propriedade info:age deve estar presente na solução agora.\nOPCIONAIS com FILTROS OPTIONAL é um operador binário que combina dois padrões de grafo. O padrão opcional é qualquer padrão de grupo e deve envolver qualquer tipo de padrão SPARQL. Se o grupo casar, a solução é estendida, senão, a solução original é dada (q-opt-3.rq).\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age . 
FILTER ( ?age \u0026gt; 24 ) } } Portanto, se filtrarmos por idades maiores que 24 na parte opcional, nós ainda teremos quatro soluções (do padrão vcard:FN) mas somente pegaremos idades se elas passarem no teste.\n----------------------- | name | age | ======================= | \u0026#34;Becky Smith\u0026#34; | | | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- Idade não incluída para \u0026ldquo;Becky Smith\u0026rdquo; porque é menor que 24.\nSe a condição do filtro é movida para a parte opcional, então isso pode influenciar no número de soluções, mas deve ser necessário fazer um filtro mais complicado para permitir que a variável age seja não limitada (q-opt4.rq).\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; PREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age . } FILTER ( !bound(?age) || ?age \u0026gt; 24 ) } Se a solução tiver uma variável age, então ela deve ser maior que 24. Isso também pode ser não limitado. Agora há 3 soluções:\n----------------------- | name | age | ======================= | \u0026#34;Sarah Jones\u0026#34; | | | \u0026#34;John Smith\u0026#34; | 25 | | \u0026#34;Matt Jones\u0026#34; | | ----------------------- Avaliar uma expressão que tem variáveis não limitadas onde uma variável limitada é esperada causa uma exceção de avaliação e toda a expressão falha.\nOPCIONAIS e consultas dependentes de ordem Uma coisa a se ter cuidado ao usar a mesma variável em duas ou mais cláusulas opcionais (e não em algum padrão básico também):\nPREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX vCard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?name WHERE { ?x a foaf:Person . OPTIONAL { ?x foaf:name ?name } OPTIONAL { ?x vCard:FN ?name } } Se a primeira opção liga ?name e ?x a algum valor, a segunda opção é uma tentativa de casar as outras triplas (?x e \u0026lt;kbd\u0026gt;?name\u0026lt;/kbd\u0026gt; têm valor). Se a primeira opção não casar com a parte opcional, então a segunda é uma tentativa para casar a tripla com duas variáveis.\nPróximo: União de Consultas\n","permalink":"https://jena.apache.org/tutorials/sparql_optionals_pt.html","tags":null,"title":"Tutorial SPARQL - Informações Opcionais"},{"categories":null,"contents":"Casamento em Grafos permite que sejam encontrados padrões no grafo. Essa seção descreve como os valores numa solução podem ser restritas. Há muitas comparações disponíveis – vamos apenas cobrir dois casos destes.\nCasamento de Strings SPARQL fornece uma operação para testar strings, baseada em expressões regulares. Isso inclui a habilidade de testes como SQL \u0026ldquo;LIKE\u0026rdquo;, no entanto, a sintaxe de expressões regulares é diferente de SQL.\nA sintaxe é:\nFILTER regex(?x, \u0026#34;pattern\u0026#34; [, \u0026#34;flags\u0026#34;]) O argumento flags é opcional. A flag \u0026ldquo;i\u0026rdquo; significa casamento de padrão case-insensitivo.\nA consulta (q-f1.rq) procura nomes com um “r” ou “R” neles.\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?g WHERE { ?y vcard:Given ?g . 
FILTER regex(?g, \u0026#34;r\u0026#34;, \u0026#34;i\u0026#34;) } resultados:\n------------- | g | ============= | \u0026#34;Rebecca\u0026#34; | | \u0026#34;Sarah\u0026#34; | ------------- A linguagem de expressão regular XQuery regular expression language é a versão codificada da mesma encontrada em Perl.\nTestando valores Muitas vezes, a aplicação necessita filtrar com o valor de uma variável. No arquivo vc-db-2.rdf, nós adicionamos um campo extra para idade. Idade não é definida no esquema de vcard então tivemos que criar uma nova propriedade para usar neste tutorial. RDF permite a mistura de diferentes definições de informação porque URIs são únicas. Note também que a propriedade info:age é tipada.\nNesse pedaço de dado, nós mostramos o valor tipado.\n\u0026lt;http://somewhere/RebeccaSmith/\u0026gt; info:age \u0026#34;23\u0026#34;^^xsd:integer ; vCard:FN \u0026#34;Becky Smith\u0026#34; ; vCard:N [ vCard:Family \u0026#34;Smith\u0026#34; ; vCard:Given \u0026#34;Rebecca\u0026#34; ] . Então a consulta (q-f2.rq) para procurar as pessoas mais velhas que 24 anos é:\nPREFIX info: \u0026lt;http://somewhere/peopleInfo#\u0026gt; SELECT ?resource WHERE { ?resource info:age ?age . FILTER (?age \u0026gt;= 24) } A expressão aritmética precisa estar em parêntesis. A única solução é:\n--------------------------------- | resource | ================================= | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | --------------------------------- Apenas um resultado, resultando na URI para o recurso Jonh Smith. Se consultássemos os mais novos que 24 anos, resultaria em Rebecca Smith. Nada sobre os Jones.\nO banco de dados não contém informação sobre a idade dos Jones: não há propriedades info:age nos seus vcards, então a variável age não recebe um valor, então não é testada.\nPróximo: Opcionais\n","permalink":"https://jena.apache.org/tutorials/sparql_filters_pt.html","tags":null,"title":"Tutorial SPARQL – Filtros"},{"categories":null,"contents":"Esta sessão cobre os padrões básicos e as soluções, os principais blocos das consultas SPARQL.\nSoluções Soluções são um conjunto de pares de variáveis com um valor. Uma consulta SELECT expõe diretamente as soluções (depois de ordenar/limitar/deslocar) como o conjunto resultado – outras formas de consulta usam as soluções para fazer um grafo. Solução é a maneira como o padrão é casado – em que os valores das variáveis são utilizados para casar com o padrão.\nA primeira consulta de exemplo teve uma solução simples. Mude o padrão para esta segunda consulta: (q-bp1.rq):\nSELECT ?x ?fname WHERE {?x \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#FN\u0026gt; ?fname} Isso tem quatro soluções, uma pra cada propriedade nome de VCARD das triplas na fonte de dados.\n---------------------------------------------------- | x | name | ==================================================== | \u0026lt;http://somewhere/RebeccaSmith/\u0026gt; | \u0026#34;Becky Smith\u0026#34; | | \u0026lt;http://somewhere/SarahJones/\u0026gt; | \u0026#34;Sarah Jones\u0026#34; | | \u0026lt;http://somewhere/JohnSmith/\u0026gt; | \u0026#34;John Smith\u0026#34; | | \u0026lt;http://somewhere/MattJones/\u0026gt; | \u0026#34;Matt Jones\u0026#34; | ---------------------------------------------------- Até agora, com padrões de triplas e padrões básicos, cada variável será definida em cada solução. As soluções de uma consulta podem ser pensadas como uma tabela, mas no caso geral, é uma tabela onde nem sempre cada linha vai ter um valor para cada coluna. 
Todas as soluções para uma consulta SPARQL não têm que ter valores para todas as variáveis em todas as soluções como veremos depois.\nPadrões Básicos Um padrão básico é um conjunto de padrões de triplas. Ele casa quando todo o padrão da tripla casa com o mesmo valor usado cada vez que a variável com o mesmo nome é usada.\nSELECT ?givenName WHERE { ?y \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#Family\u0026gt; \u0026#34;Smith\u0026#34; . ?y \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#Given\u0026gt; ?givenName . } Essa consulta (q-bp2.rq)envolve dois padrões de triplas, cada tripla termina com \u0026lsquo;.\u0026rsquo; (mas o ponto depois do último pode ser omitido como foi omitido no exemplo de padrão de uma tripla). A variável y tem que ser a mesma para cada casamento de padrão de tripla. As soluções são:\n------------- | givenName | ============= | \u0026#34;John\u0026#34; | | \u0026#34;Rebecca\u0026#34; | ------------- QNames Aqui temos um mecanismo prático para escrever longas URIs usando prefixos. A consulta acima poderia ser escrita mais claramente como a consulta: (q-bp3.rq):\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?givenName WHERE { ?y vcard:Family \u0026#34;Smith\u0026#34; . ?y vcard:Given ?givenName . } Isso é um mecanismo de prefixagem – as duas partes do URI, da declaração do prefixo e da parte depois de \u0026ldquo;:\u0026rdquo; no qname são concatenadas. Isso não é exatamente como um qname XML é, mas usa as regras de RDF para transformar o qname numa URI concatenando as partes.\nBlank Nodes Mude a consulta só para retornar y da seguinte forma: (q-bp4.rq) :\nPREFIX vcard: \u0026lt;http://www.w3.org/2001/vcard-rdf/3.0#\u0026gt; SELECT ?y ?givenName WHERE { ?y vcard:Family \u0026#34;Smith\u0026#34; . ?y vcard:Given ?givenName . } e os blank nodes aparecem\n-------------------- | y | givenName | ==================== | _:b0 | \u0026#34;John\u0026#34; | | _:b1 | \u0026#34;Rebecca\u0026#34; | -------------------- como os estranhos qnames iniciados com _:. Isso não é o título interno do blank node – isso é o ARQ imprimindo-os, atribuindo _:b0, _:b1para mostrar quando dois blank nodes são o mesmo. Aqui eles são diferentes. Isso não revela o título interno usado para um blank node, mas isso está disponível quando usar a API Java.\nPróximo: Filtros\n","permalink":"https://jena.apache.org/tutorials/sparql_basic_patterns_pt.html","tags":null,"title":"Tutorial SPARQL – Padrões Básicos"},{"categories":null,"contents":"Txn provides a high level interface to Jena transactions. It is a library over the core functionality - applications do not have to use Txn to use transactions.\nFeatures:\nJava8 idioms Application exceptions cause transaction aborts. \u0026ldquo;Transaction continuation\u0026rdquo; - use any existing active transaction. Autocommit - ensure actions are inside a transaction even if none is active. Transactions The basic transactions API provides operations begin, commit, abort and end.\nA write transaction looks like:\ndataset.begin(ReadWrite.write) ; try { ... write operations ... dataset.commit() ; // Or abort } finally { dataset.end() ; } This can be simplified by wrapping application code, contained in a Java lambda expression or a Java Runnable object, and calling a method of the dataset or other transactional object. This wil apply the correct entry and exit code for a transaction, eliminating coding errors.\nThis is also available via transactional objects such as Dataset.\nThe pattern is:\nDataset dataset = ... 
dataset.executeRead(()-\u0026gt; { . . . }) ; and\ndataset.executeWrite(()-\u0026gt; { . . . }) ; The form is:\nTxn.executeRead(ds, ()-\u0026gt; { . . . }) ; and\nTxn.executeWrite(ds, ()-\u0026gt; { . . . }) ; is also available (Txn is the implementation of this machinery). Using Txn is this way is necessary for Jena3.\nUsage This first example shows how to write a SPARQL query .\nDataset dataset = ... ; Query query = ... ; dataset.executeRead(()-\u0026gt; { try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { ResultSetFormatter.out(qExec.execSelect()) ; } }) ; Here, a try-with-resources ensures correct handling of the QueryExecution inside a read transaction.\nWriting to a file is a read-action (it does not update the RDF data, the writer needs to read the dataset or model):\nDataset dataset = ... ; dataset.executeRead(()-\u0026gt; { RDFDataMgr.write(System.out, dataset, Lang.TRIG) ; }) ; whereas reading data into an RDF dataset needs to be a write transaction (the dataset or model is changed).\nDataset dataset = ... ; dataset.executeWrite(()-\u0026gt; { RDFDataMgr.read(\u0026#34;data.ttl\u0026#34;) ; }) ; Applications are not limited to a single operation inside a transaction. It can involve multiple applications read operations, such as making several queries:\nDataset dataset = ... ; Query query1 = ... ; Query query2 = ... ; dataset.executeRead(()-\u0026gt; { try(QueryExecution qExec1 = QueryExecutionFactory.create(query1, dataset)) { ... } try(QueryExecution qExec2 = QueryExecutionFactory.create(query2, dataset)) { ... } }) ; A calculateRead block can return a result but only with the condition that what is returned does not touch the data again unless it uses a new transaction.\nThis includes returning a result set or returning a model from a dataset.\nResultSets by default stream - each time hasNext or next is called, new data might be read from the RDF dataset. A copy of the results needs to be taken:\nDataset dataset = ... ; Query query = ... ; List\u0026lt;String\u0026gt; results = dataset.calculateRead(()-\u0026gt; { List\u0026lt;String\u0026gt; accumulator = new ArrayList\u0026lt;\u0026gt;() ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { qExec.execSelect().forEachRemaining((row)-\u0026gt;{ String strThisRow = row.getLiteral(\u0026#34;variable\u0026#34;).getLexicalForm() ; accumulator.add(strThisRow) ; }) ; } return accumulator ; }) ; // use \u0026#34;results\u0026#34; Dataset dataset = ... ; Query query = ... ; ResultSet List\u0026lt;String\u0026gt; resultSet = dataset.calculateRead(()-\u0026gt; { List\u0026lt;String\u0026gt; accumulator = new ArrayList\u0026lt;\u0026gt;() ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { return ResultSetFactory.copyResults(qExec.execSelect()) ; } }) ; // use \u0026#34;resultSet\u0026#34; The functions execute and calculate start READ_PROMOTE transactions which start in \u0026ldquo;read\u0026rdquo; mode but convert to \u0026ldquo;write\u0026rdquo; mode if needed. For details of transaction promotion see the section in the transaction API documentation.\nWorking with RDF Models The unit of transaction is the dataset. Model in datasets are just views of that dataset. Model should not be passed out of a transaction because they are still attached to the dataset.\nAutocommit and Transaction continuation If there is a transaction already started for the thread, then execute... or calculate... 
\nAutocommit and Transaction continuation If there is a transaction already started for the thread, then execute... or calculate... will be performed as part of the transaction and that transaction is not terminated. If there is no transaction already started, a transaction is wrapped around the execute... or calculate... action.\nDataset dataset = ... // Main transaction. dataset.begin(ReadWrite.WRITE) ; try { ... // Part of the transaction above. dataset.executeRead(() -\u0026gt; ...) ; ... // Part of the transaction above - no commit/abort dataset.executeWrite(() -\u0026gt; ...) ; // Outer transaction dataset.commit() ; } finally { dataset.end() ; } Design Txn uses Java Runnable for the application code, passed into control code that wraps the transaction operations around the application code. This results in transaction begin/commit/end being applied automatically around the application code as needed.\nA bare read transaction requires the following code structure (no exception handling):\ntxn.begin(ReadWrite.READ) ; try { ... application code ... } finally { txn.end() ; } while a write transaction requires either a commit or an abort at the end of the application code as well.\nWithout the transaction continuation code (simplified), the Txn code for a read transaction takes the form:\npublic static \u0026lt;T extends Transactional\u0026gt; void executeRead(T txn, Runnable r) { txn.begin(ReadWrite.READ) ; try { r.run() ; } catch (Throwable th) { txn.end() ; throw th ; } txn.end() ; } See Txn.java for full details.\n","permalink":"https://jena.apache.org/documentation/txn/txn.html","tags":null,"title":"Txn - A library for working with Transactions"},{"categories":null,"contents":"What are typed literals? In the original RDF specifications there were two types of literal values defined - plain literals (which are basically strings with an optional language tag) and XML literals (which are more or less plain literals plus a \u0026ldquo;well-formed-xml\u0026rdquo; flag).\nPart of the remit for the 2001 RDF Core working group was to add to RDF support for typed values, i.e. things like numbers. These notes describe the support for typed literals in Jena2.\nBefore going into the Jena details here are some informal reminders of how typed literals work in RDF. We refer readers to the RDF core semantics, syntax and concepts documents for more precise details.\nIn RDF, typed literal values comprise a string (the lexical form of the literal) and a datatype (identified by a URI). The datatype is supposed to denote a mapping from lexical forms to some space of values. The pair comprising the literal then denotes an element of the value space of the datatype. For example, a typed literal comprising (\u0026quot;true\u0026quot;, xsd:boolean) would denote the abstract true value T.\nIn the RDF/XML syntax typed literals are notated with syntax such as:\n\u0026lt;age rdf:datatype=\u0026quot;http://www.w3.org/2001/XMLSchema#int\u0026quot;\u0026gt;13\u0026lt;/age\u0026gt; In NTriple syntax the notation is:\n\u0026quot;13\u0026quot;^^\u0026lt;http://www.w3.org/2001/XMLSchema#int\u0026gt; In Turtle, it can be abbreviated:\n\u0026quot;13\u0026quot;^^xsd:int This ^^ notation will appear in literals printed by Jena.\nNote that a literal is either typed or plain (an old style literal) and which it is can be determined statically.
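As a small illustration (a sketch; the value 13 is arbitrary), the difference is visible directly on the Literal object:\nModel m = ModelFactory.createDefaultModel();\nLiteral typed = m.createTypedLiteral(Integer.valueOf(13)); // \u0026quot;13\u0026quot;^^xsd:int\nLiteral plain = m.createLiteral(\u0026quot;13\u0026quot;); // an old style plain literal\nSystem.out.println(typed.getDatatypeURI()); // http://www.w3.org/2001/XMLSchema#int\nSystem.out.println(plain.getDatatypeURI()); // null for a plain literal, as described below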
There is no way to define a literal as having a lexical value of, say \u0026ldquo;13\u0026rdquo; but leave its datatype open and then infer the datatype from some schema or ontology definition.\nIn the new scheme of things well-formed XML literals are treated as typed literals whose datatype is the special type rdf:XMLLiteral.\nBasic API operations Jena will correctly parse typed literals within RDF/XML, NTriple and Turtle source files. The same Java object, Literal will represent \u0026ldquo;plain\u0026rdquo; and \u0026ldquo;typed\u0026rdquo; literals. Literal now supports some new methods:\ngetDatatype() Returns null for a plain literal or a Java object which represents the datatype of a typed Literal.\ngetDatatypeURI() Returns null for a plain literal or the URI of the datatype of a typed Literal.\ngetValue() Returns a Java object representing the value of the literal, for example for an xsd:int this will be a java.lang.Integer, for plain literals it will be a String. The converse operation of creating a Java object to represent a typed literal in a model can be achieved using:\nmodel.createTypedLiteral(value, datatype) This allows the value to be specified by a lexical form (i.e. a String) or by a Java object representing the typed value; the datatype can be specified by a URI string or a Java object representing the datatype.\nIn addition there is a built in mapping from standard Java wrapper objects to XSD datatypes (see later) so that the simpler call:\nmodel.createTypedLiteral(Object) will create a typed literal with the datatype appropriate for representing that java object. For example,\nLiteral l = model.createTypedLiteral(new Integer(25)); will create a typed literal with the lexical value \u0026ldquo;25\u0026rdquo;, of type xsd:int.\nNote that there are also functions which look similar but do not use typed literals. For example::\nLiteral l = model.createLiteral(25); int age = l.getInt(); These worked by converting the primitive to a string and storing the resulting string as a plain literal. The inverse operation then attempts to parse the string of the plain literal (as an int in this example). These are for backward compatibility with earlier versions of Jena and older datasets. In normal circumstances createTypedLiteral is preferable.\nEquality issues There is a well defined notion of when two typed literals should be equal, based on the equality defined for the datatype in question. Jena2 implements this equality function by using the method sameValueAs. Thus two literals (\u0026ldquo;13\u0026rdquo;, xsd:int) and (\u0026ldquo;13\u0026rdquo;, xsd:decimal) will test as sameValueAs each other but neither will test sameValueAs (\u0026ldquo;13\u0026rdquo;, xsd:string).\nNote that this is a different function from the Java equals method. Had we changed the equals method to test for semantic equality problems would have arisen because the two objects are not substitutable in the Java sense (for example they return different values from a getDatatype() call). This would, for example, have made it impossible to cache literals in a hash table.\nHow datatypes are represented Datatypes for typed literals are represented by instances of the interface org.apache.jena.datatypes.RDFDatatype. 
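As a quick sketch of what such a datatype object offers (using the prebuilt xsd:int type described later):\nRDFDatatype intType = XSDDatatype.XSDint;\nSystem.out.println(intType.getURI()); // http://www.w3.org/2001/XMLSchema#int\nSystem.out.println(intType.isValid(\u0026quot;13\u0026quot;)); // true - \u0026quot;13\u0026quot; is a legal lexical form\nSystem.out.println(intType.isValid(\u0026quot;thirteen\u0026quot;)); // false\nObject value = intType.parse(\u0026quot;13\u0026quot;); // an Integer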
Instances of this interface can be used to parse and serialize typed data, test for equality and test if a typed or lexical value is a legal value for this datatype.\nPrebuilt instances of this interface are included for all the main XSD datatypes (see below).\nIn addition, it is possible for an application to define new datatypes and register them against some URI (see below).\nError detection When Jena parses a literal whose lexical value is not legal for the declared datatype, it does not immediately throw an error. This is because the RDFCore working group has defined that illegal datatype values are errors but are not syntactic errors, so we try to avoid having parsers break at this point. Instead a literal is created which is marked internally as ill-formed, and the first time an application attempts to access its value (with getValue()) an error will be thrown.\nWhen Jena is reading a file there is also the issue of what to do when it encounters a typed value whose datatype URI is not one that it knows about. The default behaviour is to create a new datatype object (whose value space is the same as its lexical space). Again this behaviour seems in keeping with the working group preference that illegal datatypes are semantic but not syntactic errors.\nHowever, both of these behaviours can mean that simple common errors (such as mis-spelling the xsd namespace) may go unnoticed until very late on. To overcome this we have hidden some global switches that allow you to force Jena to report such syntactic errors earlier. These are static boolean parameters:\norg.apache.jena.shared.impl.JenaParameters.enableEagerLiteralValidation org.apache.jena.shared.impl.JenaParameters.enableSilentAcceptanceOfUnknownDatatypes They are placed here in an impl package (and thus only visible in the full javadoc, not the API javadoc) because they should not be regarded as stable. We plan to develop a cleaner way of setting mode switches for Jena and these switches will migrate there in due course, if they prove to be useful.\nXSD data types Jena includes prebuilt, and pre-registered, instances of RDFDatatype for all of the relevant XSD types:\nfloat double int long short byte unsignedByte unsignedShort unsignedInt unsignedLong decimal integer nonPositiveInteger nonNegativeInteger positiveInteger negativeInteger boolean string normalizedString anyURI token Name QName language NMTOKEN ENTITIES NMTOKENS ENTITY ID NCName IDREF IDREFS NOTATION hexBinary base64Binary date time dateTime duration gDay gMonth gYear gYearMonth gMonthDay These are all available as static member variables from org.apache.jena.datatypes.xsd.XSDDatatype.\nOf these types, the following are registered as the default type to use to represent certain Java classes:\nJava class xsd type Float float Double double Integer int Long long Short short Byte byte BigInteger integer BigDecimal decimal Boolean boolean String string Thus, when creating a typed literal from a Java BigInteger, xsd:integer will be used. The converse mapping is more adaptive. When parsing an xsd:integer the Java value object used will be an Integer, Long or BigInteger depending on the size of the specific value being represented.\nUser defined XSD data types XML schema allows derived types to be defined in which a base type is modified through some facet restriction such as limiting the min/max of an integer or restricting a string to a regular expression.
It also allows new types to be created by unioning other types or by constructing lists of other types.\nJena provides support for derived and union types but not for list types.\nThese are supported through the XSDDatatype.loadUserDefined method which allows an XML schema datatype file to be loaded. This registers a new RDFDatatype that can be used to create, parse, serialize, test instances of that datatype.\nThere is one difficult issue in here, what URI to give to the user defined datatype? This is not defined by XML Schema, nor RDF nor OWL. Jena2 adopts the position that the defined datatype should have the base URI of the schema file with a fragment identifier given by the datatype name.\nTo illustrate working with the defined types, the following code then tries to create and use two instances of the over 12 type:\nModel m = ModelFactory.createDefaultModel(); RDFDatatype over12Type = tm.getSafeTypeByName(uri + \u0026quot;#over12\u0026quot;); Object value = null; try { value = \u0026quot;15\u0026quot;; m.createTypedLiteral((String)value, over12Type).getValue(); System.out.println(\u0026quot;Over 12 value of \u0026quot; + value + \u0026quot; is ok\u0026quot;); value = \u0026quot;12\u0026quot;; m.createTypedLiteral((String)value, over12Type).getValue(); System.out.println(\u0026quot;Over 12 value of \u0026quot; + value + \u0026quot; is OK\u0026quot;); } catch (DatatypeFormatException e) { System.out.println(\u0026quot;Over 12 value of \u0026quot; + value + \u0026quot; is illegal\u0026quot;); } which products the output:\nOver 12 value of 15 is OK Over 12 value of 12 is illegal User defined non-XSD data types RDF allows any URI to be used as a datatype but provides no standard for how to map the datatype URI to a datatype definition.\nWithin Jena2 we allow new datatypes to be created and registered by using the TypeMapper class.\nThe easiest way to define a new RDFDatatype is to subclass BaseDatatype and define implementations for parse, unparse and isEqual.\nFor example here is the outline of a type used to represent rational numbers:\nclass RationalType extends BaseDatatype { public static final String theTypeURI = \u0026quot;urn:x-hp-dt:rational\u0026quot;; public static final RDFDatatype theRationalType = new RationalType(); /** private constructor - single global instance */ private RationalType() { super(theTypeURI); } /** * Convert a value of this datatype out * to lexical form. */ public String unparse(Object value) { Rational r = (Rational) value; return Integer.toString(r.getNumerator()) + \u0026quot;/\u0026quot; + r.getDenominator(); } /** * Parse a lexical form of this datatype to a value * @throws DatatypeFormatException if the lexical form is not legal */ public Object parse(String lexicalForm) throws DatatypeFormatException { int index = lexicalForm.indexOf(\u0026quot;/\u0026quot;); if (index == -1) { throw new DatatypeFormatException(lexicalForm, theRationalType, \u0026quot;\u0026quot;); } try { int numerator = Integer.parseInt(lexicalForm.substring(0, index)); int denominator = Integer.parseInt(lexicalForm.substring(index+1)); return new Rational(numerator, denominator); } catch (NumberFormatException e) { throw new DatatypeFormatException(lexicalForm, theRationalType, \u0026quot;\u0026quot;); } } /** * Compares two instances of values of the given datatype. * This does not allow rationals to be compared to other number * formats, Lang tag is not significant. 
*/ Public Boolean isEqual(LiteralLabel value1, LiteralLabel value2) { return value1.getDatatype() == value2.getDatatype() \u0026amp;\u0026amp; value1.getValue().equals(value2.getValue()); } } To register and use this type you simply need the call:\nRDFDatatype rtype = RationalType.theRationalType; TypeMapper.getInstance().registerDatatype(rtype); ... // Create a rational literal Literal l1 = m.createTypedLiteral(\u0026quot;3/5\u0026quot;, rtype); Note that whilst any serialization of RDF containing such user defined literals will be perfectly legal a client application has no standard way of looking up the datatype URI you have chosen. This has to be done \u0026ldquo;out of band\u0026rdquo; as they say.\nA note on xml:Lang Plain literals have an xml:Lang tag as well as a string value. Two plain literals with the same string but different Lang tags are not equal.\nXML Schema states that xml:Lang is not meaningful on xsd datatypes.\nThus for almost all typed literals there is no xml:Lang tag.\nAt the time of last call the RDF specifications allowed the special case that rdf:XMLLiterals could have a Lang tag that would be significant in equality testing. Thus in preview releases of Jena2 the createTypedLiterals calls took an extra Lang tag argument.\nHowever, at the time of writing that specification has been changed so that Lang tags will never be significant on typed literals (whether this means that xml:Lang is not significant on XMLLiterals or means that XMLLiteral will cease to be a typed literal is not completely certain).\nFor this reason we have removed the Lang tag from the createTypedLiterals calls and deprecated the createLiteral call which allowed both wellFormedXML and Lang tag to be specified.\nWe do not expect to need to change the API even if the working group decision changes again, the most we might expect to do would be to undeprecate the 3-argument version of createLiteral.\n","permalink":"https://jena.apache.org/documentation/notes/typed-literals.html","tags":null,"title":"Typed literals how-to"},{"categories":null,"contents":"Prefácio Este é um tutorial introdutório ao framework de descrição de recursos (RDF) e Jena, uma API Java para RDF. Ele é escrito para programadores que não estão familiarizados com RDF e que aprendem melhor através de prototipagem, ou, por outros motivos, desejam avançar rapidamente para a implementação. Familiaridade com XML e Java é assumido.\nAvançar direto para a implementação, sem conhecer inicialmente o modelo de dados de RDF, levará à frustração e ao desapontamento. No entanto, estudar unicamente o modelo de dados é desgastante e muitas vezes leva a \"enigmas metafísicos torturantes\". É melhor, então, abordar os conceitos do modelo de dados e como usá-lo, paralelamente. Aprender um pouco o modelo de dados, e praticá-lo. Então aprender um pouco mais e praticar. A teoria leva à prática , e a prática leva à teoria. O modelo de dados é relativamente simples, então esta abordagem não exigirá muito tempo.\nRDF possui uma sintaxe XML, e muitos dos que são familiarizados com XML irão pensar em RDF em termos da sintaxe do XML. Isso é um erro. RDF deve ser entendido em termos do seu modelo de dados. 
Os dados RDF podem ser representados em XML, mas entender a sintaxe é menos importante do que entender o modelo de dados.\nUma implementação da API JENA, incluindo o código fonte dos exemplos usados neste tutorial, podem ser baixados em jena.apache.org/download/index.cgi.\nIntrodução O framework de descrição de recursos (RDF) é um padrão (tecnicamente uma recomendação da W3C) para descrever recursos. Mas o que são recursos? Isso é uma questão profunda e a definição precisa ainda é um assunto de debates. Para nossos propósitos, nós podemos pensar em recursos como tudo que podemos identificar. Você é um recurso, assim como sua página pessoal, este tutorial, o número um e a grande baleia branca em Moby Dick.\nNossos exemplos neste tutorial serão sobre pessoas. Elas usam uma representação RDF de cartão de negócios (VCARDS). RDF é melhor representado como um diagrama de nós e arcos. Um simples vcard se assemelha a isto em RDF:\nO recurso, John Smith, é exibido como uma elipse e identificado por um Identificador Uniforme de Recurso (URI)1, neste caso \"http://.../JohnSmith\". Se você tentar acessar o recurso usando seu navegador, não vai obter sucesso. Se você não tem familiaridade com URI's, pense neles como nomes estranhos.\nRecursos possuem propriedades. Nesses exemplos, nós estamos interessados nos tipos de propriedades que apareceriam no cartão de negócios de Jonh Smith. A figura 1 mostra somente uma propriedade, o nome completo (full name) de Jonh Smith. Uma propriedade é representada por um arco, intitulado com o nome da propriedade. O nome da propriedade é também um URI, mas como os URIs são longos e incomodas, o diagrama o exibe em forma XML qname. A parte antes de ':' é chamada de prefixo namespace e representa um namespace. A parte depois de ':' é um nome local e representa o nome naquele namespace. Propriedades são normalmente representadas nesta forma de qname quando escrito como RDF XML, e isso é uma maneira prática de representá-los em diagramas e textos. No entanto, propriedades são rigorosamente representadas por um URI. A forma nsprefix:localname é um atalho para o URI do namespace concatenado com o nome local. Não há exigências de que o URI de uma propriedade resulte em algo quando acessado do navegador.\nToda propriedade possui um valor. Neste caso, o valor é uma literal, que por hora podemos pensar nelas como uma cadeia de caracteres2. Literais são exibidas em retângulos.\nJena é uma API Java que pode ser usada para pra criar e manipular grafos RDF como o apresentado no exemplo. Jena possui classes para representar grafos, recursos, propriedades e literais. As interfaces que representam recursos, propriedades e literais são chamadas de modelo e é representada pela interface Model.\nO código para criar este grafo, ou modelo, é simples:\n// some definitions static String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; static String fullName = \u0026#34;John Smith\u0026#34;; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource Resource johnSmith = model.createResource(personURI); // add the property johnSmith.addProperty(VCARD.FN, fullName); Ele começa com algumas definições de constantes e então cria um Model vazio, usando o método createDefaultModel() de ModelFactory para criar um modelo na memória. Jena possui outras implementações da interface Model, e.g. 
uma que usa banco de dados relacionais: esses tipos de modelo são também disponibilizados a partir de ModelFactory.\nO recurso Jonh Smith é então criado, e uma propriedade é adicionada a ele. A propriedade é fornecida pela a classe \"constante\" VCARD, que mantém os objetos que representam todas as definições no esquema de VCARD. Jena provê classes constantes para outros esquemas bem conhecidos, bem como os próprios RDF e RDFs , Dublin Core e OWL.\nO código para criar o recurso e adicionar a propriedade pode ser escrito de forma mais compacta usando um estilo cascata:\nResource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName); Os códigos desse exemplo podem ser encontrados no diretório /src-examples no pacote de distribuição do Jena como tutorial 1. Como exercício, pegue este código e modifique-o para criar um próprio VCARD para você.\nAgora vamos adicionar mais detalhes ao vcard, explorando mais recursos de RDF e Jena.\nNo primeiro exemplo, o valor da propriedade foi um número. As propriedades RDF podem também assumir outros recursos como valor. Usando uma técnica comum em RDF, este exemplo mostra como representar diferentes partes do nome de Jonh Smith:\nAqui, nós adicionamos uma nova propriedade, vcard:N, para representar a estrutura do nome de Jonh Smith. Há muitas coisas interessantes sobre este modelo. Note que a propriedade vcard:N usa um recurso como seu valor. Note também que a elipse que representa a composição do nome não possui URI. Isso é conhecido como blank Node.\nO código Jena para construir este exemplo é, novamente, muito simples. Primeiro algumas declarações e a criação do modelo vazio.\n// some definitions String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; String givenName = \u0026#34;John\u0026#34;; String familyName = \u0026#34;Smith\u0026#34;; String fullName = givenName + \u0026#34; \u0026#34; + familyName; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource // and add the properties cascading style Resource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName) .addProperty(VCARD.N, model.createResource() .addProperty(VCARD.Given, givenName) .addProperty(VCARD.Family, familyName)); Os códigos desse exemplo podem ser encontrados no diretório /src-examples no pacote de distribuição do Jena como tutorial 2.\nSentenças Cada arco no modelo RDF é chamado de sentença. Cada sentença define um fato sobre o recurso. Uma sentença possui três partes:\no sujeito é o recurso de onde o arco começa. o predicado é a propriedade que nomeia o arco. o objeto é o recurso ou literal apontado pelo arco. Uma sentença é algumas vezes chamadas de tripla, por causa de suas três partes.\nUm modelo RDF é representado como um conjunto de sentenças. Cada chamada a addProperty no tutorial2 adiciona uma nova sentença. (Já que um modelo é um conjunto de sentenças, adicionar sentenças duplicadas não afeta em nada). A interface modelo de Jena define o método listStatements() que retorna um StmtIterator, um subtipo de Iterator Java sobre todas as sentenças de um modelo. StmtIterator possui o método nextStatement() que retorna a próxima sentença do iterador (o mesmo que next() faz, já convertido para Statement). A interface Statement provê métodos de acesso ao sujeito, predicado e objeto de uma sentença.\nAgora vamos usar essa interface para estender tutorial2 para listar todas as sentenças criadas e imprimi-las. 
O código completo deste exemplo pode ser encontrado em tutorial 3.\n// list the statements in the Model StmtIterator iter = model.listStatements(); // print out the predicate, subject and object of each statement while (iter.hasNext()) { Statement stmt = iter.nextStatement(); // get next statement Resource subject = stmt.getSubject(); // get the subject Property predicate = stmt.getPredicate(); // get the predicate RDFNode object = stmt.getObject(); // get the object System.out.print(subject.toString()); System.out.print(\u0026#34; \u0026#34; + predicate.toString() + \u0026#34; \u0026#34;); if (object instanceof Resource) { System.out.print(object.toString()); } else { // object is a literal System.out.print(\u0026#34; \\\u0026#34;\u0026#34; + object.toString() + \u0026#34;\\\u0026#34;\u0026#34;); } System.out.println(\u0026#34; .\u0026#34;); } Já que o objeto de uma sentença pode ser tanto um recurso quanto uma literal, o método getObject() retorna um objeto do tipo RDFNode, que é uma superclasse comum de ambos Resource e Literal. O objeto em si é do tipo apropriado, então o código usa instanceof para determinar qual e processá-lo de acordo.\nQuando executado, o programa deve produzir a saída:\nhttp://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N anon:14df86:ecc3dee17b:-7fff . anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Family \u0026#34;Smith\u0026#34; . anon:14df86:ecc3dee17b:-7fff http://www.w3.org/2001/vcard-rdf/3.0#Given \u0026#34;John\u0026#34; . http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN \u0026#34;John Smith\u0026#34; . Agora você sabe o porquê de ser simples elaborar modelos. Se você olhar atentamente, você perceberá que cada linha consiste de três campos representando o sujeito, predicado e objeto de cada sentença. Há quatro arcos no nosso modelo, então há quatro sentenças. O \"anon:14df86:ecc3dee17b:-7fff\" é um identificador interno gerado pelo Jena. Não é uma URI e não deve ser confundido como tal. É simplesmente um nome interno usado pela implementação do Jena.\nO W3C RDFCore Working Group definiu uma notação similar chamada N-Triples. O nome significa \"notação de triplas\". Nós veremos na próxima sessão que o Jena possui uma interface de escrita de N-Triples também.\nEscrita de RDF Jena possui métodos para ler e escrever RDF como XML. Eles podem ser usados para armazenar o modelo RDF em um arquivo e carregá-lo novamente em outro momento.\nO Tutorial 3 criou um modelo e o escreveu no formato de triplas. Tutorial 4 modifica o tutorial 3 para escrever o modelo na forma de RDF XML numa stream de saída. 
O código, novamente, é muito simples: model.write pode receber um OutputStream como argumento.\n// now write the model in XML form to a file model.write(System.out); A saída deve parecer com isso:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vcard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/JohnSmith\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A0\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; As especificações de RDF especificam como representar RDF como XML. A sintaxe de RDF XML é bastante complexa. Recomendamos ao leitor dar uma olhada no primer sendo desenvolvido pelo RDFCore WG para uma introdução mais detalhada. Entretanto, vamos dar uma olhada rápida em como interpretar a saída acima.\nRDF é normalmente encapsulada num elemento \u0026lt;rdf:RDF\u0026gt;. O elemento é opcional se houver outras maneiras de saber se aquele XML é RDF, mas normalmente ele é presente. O elemento RDF define os dois namespaces usados no documento. Há um elemento \u0026lt;rdf:Description\u0026gt; que descreve o recurso cuja URI é \"http://somewhere/JohnSmith\". Se o atributo rdf:about estivesse ausente, esse elemento representaria um blank node.\nO elemento \u0026lt;vcard:FN\u0026gt; descreve uma propriedade do recurso. O nome da propriedade é o \"FN\" no namespace do vcard. RDF o converte para uma referência URI concatenando a referência URI do prefixo presente no namespace de vcard e \"FN\", o nome local parte do nome. Isto nos dá a referência URI \" http://www.w3.org/2001/vcard-rdf/3.0#FN\". O valor da propriedade é a literal \"Jonh Smith\".\nO elemento \u0026lt;vcard:N\u0026gt; é um recurso. Neste caso, o recurso é representado por uma referência URI relativa. RDF o converte para uma referência URI absoluta concatenando com o URI base do documento.\nHá um erro nesse RDF XML: ele não representa exatamente o modelo que criamos. Foi dado uma URI ao blank node do modelo. Ele não é mais um blank node portanto. A sintaxe RDF/XML não é capaz de representar todos os modelos RDF; por exemplo, ela não pode representar um blank node que é o objeto de duas sentenças. O escritor que usamos para escrever este RDF/XML não é capaz de escrever corretamente o subconjunto de modelos que podem ser escritos corretamente. Ele dá uma URI a cada blank node, tornando-o não mais blank.\nJena possui uma interface extensível que permite novos escritores para diferentes linguagens de serialização RDF. Jena possuem também um escritor RDF/XML mais sofisticado que pode ser invocado ao especificar outro argumento à chamada de método RDFDataMgr.write:\n// now write the model in a pretty form RDFDataMgr.write(System.out, model, Lang.RDFXML); Este escritor, chamado também de PrettyWriter, ganha vantagem ao usar as particularidades da sintaxe abreviada de RDF/XML ao criar um modelo mais compacto. Ele também é capaz de preservar os blank nodes onde é possível. Entretanto, não é recomendável para escrever modelos muito grandes, já que sua desempenho deixa a desejar. 
Para escrever grandes arquivos e preservar os blank nodes, escreva no formato de N-Triplas:\n// now write the model in XML form to a file RDFDataMgr.write(System.out, model, Lang.NTRIPLES); Isso produzirá uma saída similar à do tutorial 3, que está em conformidade com a especificação de N-Triplas.\nLeitura de RDF Tutorial 5 demonstra a leitura num modelo de sentenças gravadas num RDF XML. Com este tutorial, nós teremos criado uma pequena base de dados de vcards na forma RDF/XML. O código a seguir fará leitura e escrita. Note que para esta aplicação rodar, o arquivo de entrada precisa estar no diretório da aplicação.\n// create an empty model Model model = ModelFactory.createDefaultModel(); // use the RDFDataMgr to find the input file InputStream in = RDFDataMgr.open( inputFileName ); if (in == null) { throw new IllegalArgumentException( \u0026#34;File: \u0026#34; + inputFileName + \u0026#34; not found\u0026#34;); } // read the RDF/XML file model.read(in, null); // write it to standard out model.write(System.out); O segundo argumento da chamada de método read() é a URI que será usada para resolver URIs relativas. Como não há referências URI relativas no arquivo de teste, ele pode ser vazio. Quando executado, tutorial 5 produzirá uma saída XML como esta:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#39; xmlns:vcard=\u0026#39;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/JohnSmith/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A0\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/SarahJones/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Sarah Jones\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A1\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/MattJones/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Matt Jones\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A2\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A3\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Rebecca\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A1\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Jones\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Sarah\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A2\u0026#34;\u0026gt; \u0026lt;vcard:Family\u0026gt;Jones\u0026lt;/vcard:Family\u0026gt; \u0026lt;vcard:Given\u0026gt;Matthew\u0026lt;/vcard:Given\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#39;http://somewhere/RebeccaSmith/\u0026#39;\u0026gt; \u0026lt;vcard:FN\u0026gt;Becky Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;vcard:N rdf:nodeID=\u0026#34;A3\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Controlando prefixos Definições explícitas de prefixos Na sessão anterior, nós vimos que a saída XML declarou um prefixo 
namespace\nvcard e o usou para abreviar URIs. Enquanto que RDF usa somente URIs completas, e não sua forma encurtada, Jena provê formas de controlar namespaces usados na saída com seu mapeamento de prefixos. Aqui vai um exemplo simples.\nModel m = ModelFactory.createDefaultModel(); String nsA = \u0026#34;http://somewhere/else#\u0026#34;; String nsB = \u0026#34;http://nowhere/else#\u0026#34;; Resource root = m.createResource( nsA + \u0026#34;root\u0026#34; ); Property P = m.createProperty( nsA + \u0026#34;P\u0026#34; ); Property Q = m.createProperty( nsB + \u0026#34;Q\u0026#34; ); Resource x = m.createResource( nsA + \u0026#34;x\u0026#34; ); Resource y = m.createResource( nsA + \u0026#34;y\u0026#34; ); Resource z = m.createResource( nsA + \u0026#34;z\u0026#34; ); m.add( root, P, x ).add( root, P, y ).add( y, Q, z ); System.out.println( \u0026#34;# -- no special prefixes defined\u0026#34; ); m.write( System.out ); System.out.println( \u0026#34;# -- nsA defined\u0026#34; ); m.setNsPrefix( \u0026#34;nsA\u0026#34;, nsA ); m.write( System.out ); System.out.println( \u0026#34;# -- nsA and cat defined\u0026#34; ); m.setNsPrefix( \u0026#34;cat\u0026#34;, nsB ); m.write( System.out ); A saída deste fragmento são três blocos de RDF/XML, com três diferentes mapeamentos de prefixos. Primeiro o padrão, sem prefixos diferentes dos padrões:\n# -- no special prefixes defined \u0026lt;rdf:RDF xmlns:j.0=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:j.1=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;j.1:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;j.1:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;j.0:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Nós vimos que o namespace rdf é declarado automaticamente, já que são requeridos para tags como \u0026lt;RDF:rdf\u0026gt; e \u0026lt;rdf:resource\u0026gt;. Declarações de namespace são também necessárias para o uso das duas propriedades P e Q, mas já que seus namespaces não foram introduzidos no modelo, eles recebem nomes namespaces inventados j.0 e j.1.\nO método setNsPrefix(String prefix, String URI) declara que o namespace da URI deve ser abreviado por prefixos. Jena requer que o prefixo seja um namespace XML correto, e que o URI termine com um caractere sem-nome. 
O escritor RDF/XML transformará essas declarações de prefixos em declarações de namespaces XML e as usará nas suas saídas: # -- nsA defined \u0026lt;rdf:RDF xmlns:j.0=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:nsA=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;j.0:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; O outro namespace ainda recebe o nome criado automaticamente, mas o nome nsA é agora usado nas tags de propriedades. Não há necessidade de que o nome do prefixo tenha alguma relação com as variáveis do código Jena:\n# -- nsA and cat defined \u0026lt;rdf:RDF xmlns:cat=\u0026#34;http://nowhere/else#\u0026#34; xmlns:rdf=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34; xmlns:nsA=\u0026#34;http://somewhere/else#\u0026#34; \u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#root\u0026#34;\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#x\u0026#34;/\u0026gt; \u0026lt;nsA:P rdf:resource=\u0026#34;http://somewhere/else#y\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/else#y\u0026#34;\u0026gt; \u0026lt;cat:Q rdf:resource=\u0026#34;http://somewhere/else#z\u0026#34;/\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Ambos os prefixos são usados na saída, e não houve a necessidade de prefixos gerados automaticamente.\nDefinições implícitas de prefixos Assim como as declarações de prefixos definidas por chamadas a setNsPrefix, Jena vai lembrar-se dos prefixos que foram usados na entrada para model.read().\nPegue a saída produzida pelo fragmento anterior e cole-o dentro de algum arquivo, com a URL file:/tmp/fragment.rdf say. E execute o código: Model m2 = ModelFactory.createDefaultModel(); m2.read( \u0026#34;file:/tmp/fragment.rdf\u0026#34; ); m2.write( System.out ); Você verá que os prefixos da entrada são preservados na saída. Todos os prefixos são escritos, mesmo se eles não forem usados em lugar algum. Você pode remover um prefixo com removeNsPrefix(String prefix) se você não o quiser na saída.\nComo N-Triplas não possuem nenhuma forma reduzida de escrever URIs, não há prefixos nem na entrada nem na saída. A notação N3, também suportada pelo Jena, possui nomes prefixados reduzidos, e grava-os na entrada e usa-os na saída. Jena possui outras operações sobre mapeamento de prefixos de um modelo, como umMap de Java extraído a partir dos mapeamentos existentes, ou a adição de um grupo inteiro de mapeamentos de uma só vez; olhe a documentação de PrefixMapping para mais detalhes. Pacotes Jena RDF Jena é uma API JAVA para aplicações de web semântica. O pacote RDF chave para o desenvolvedor é org.apache.jena.rdf.model. A API tem sido definida em termos de interfaces, logo o código da aplicação pode trabalhar com diferentes implementações sem causar mudanças. 
Esse pacote contém interfaces para representar modelos, recursos, propriedades, literais, sentenças e todos os outros conceitos chaves de RDF, e um ModelFactory para criação de modelos. Portanto, o código da aplicação permanece independente da implementação, o melhor é usar interfaces onde for possível e não implementações específicas de classes.\nO pacote org.apache.jena.tutorial contém o código fonte funcional de todos os exemplos usados neste tutorial.\nOs pacotes org.apache.jena...impl contêm a implementação de classes que podem ser comuns a várias implementações. Por exemplo, eles definem as classes ResourceImpl, PropertyImpl, e LiteralImpl que podem ser usadas diretamente ou então herdadas por diferentes implementações. As aplicações devem raramente usar essas classes diretamente. Por exemplo, em vez de criar um nova instância de ResourceImpl, é melhor usar o método createResource do modelo que estiver sendo usado. Desta forma, se a implementação do modelo usar uma implementação otimizada de Resource, então não serão necessárias conversões entre os dois tipos.\nNavegação em Modelos Até agora, este tutorial mostrou como criar, ler e escrever modelos RDF. Chegou o momento de mostrar como acessar as informações mantidas num modelo.\nDada a URI de um recurso, o objeto do recurso pode ser recuperado de um modelo usando o método Model.getResource(String uri). Este método é definido para retornar um objeto Resource se ele existir no modelo, ou, caso contrário, criar um novo. Por exemplo, para recuperar o recurso Adam Smith do modelo lido a partir do arquivo no tutorial 5:\n// retrieve the John Smith vcard resource from the model Resource vcard = model.getResource(johnSmithURI); A interface Resource define numerosos métodos para acessar as propriedades de um recurso. O método Resource.getProperty(Property p) acessa uma propriedade do recurso. Este método não segue a convenção usual de Java de acesso já que o tipo do objeto retorna é Statement, e não Property como era de se esperar. Retornando toda a sentença permite à aplicação acessar o valor da propriedade usando um de seus métodos de acesso que retornam o objeto da sentença. Por exemplo, para recuperar o recurso que é o valor da propriedade vcard:N\n// retrieve the value of the N property Resource name = (Resource) vcard.getProperty(VCARD.N) .getObject(); De modo geral, o objeto de uma sentença pode ser um recurso ou uma literal, então a aplicação, sabendo que o valor precisar ser um recurso, faz o cast do objeto retornado. Uma das coisas que Jena tenta fazer é fornecer tipos específicos de métodos, então a aplicação não tem que fazer cast, e checagem de tipos pode ser feita em tempo de compilação. O fragmento de código acima poderia ser mais convenientemente escrito assim:\n// retrieve the value of the N property Resource name = vcard.getProperty(VCARD.N) .getResource(); Similarmente, o valor literal de uma propriedade pode ser recuperado:\nString fullName = vcard.getProperty(VCARD.FN) .getString(); Neste exemplo, o recurso vcard possui somente as propriedades vcard:FN e vcard:N. RDF permite a um recurso repetir uma propriedade; por exemplo, Adam pode ter mais de um apelido. 
Vamos dar dois apelidos a ele:\n// add two nickname properties to vcard vcard.addProperty(VCARD.NICKNAME, \u0026#34;Smithy\u0026#34;) .addProperty(VCARD.NICKNAME, \u0026#34;Adman\u0026#34;); Como notado anteriormente, Jena representa um modelo RDF como um conjunto de sentenças, então, adicionar uma sentença com um sujeito, predicado e objeto igual a um já existente não terá efeito. Jena não define qual do dois apelidos será retornado. O resultado da chamada a vcard.getProperty(VCARD.NICKNAME) é indeterminado. Jena vai retornar um dos valores, mas não há garantia nem mesmo de que duas chamadas consecutivas irá retornar o mesmo valor.\nSe for possível que uma propriedade ocorra mais de uma vez, então o método Resource.listProperties(Property p) pode ser usado para retornar um iterador para lista-las. Este método retorna um iterador que retorna objetos do tipo Statement. Nós podemos listar os apelidos assim:\n// set up the output System.out.println(\u0026#34;The nicknames of \\\u0026#34;\u0026#34; + fullName + \u0026#34;\\\u0026#34; are:\u0026#34;); // list the nicknames StmtIterator iter = vcard.listProperties(VCARD.NICKNAME); while (iter.hasNext()) { System.out.println(\u0026#34; \u0026#34; + iter.nextStatement() .getObject() .toString()); } Esse código pode ser encontrado em tutorial 6. O iterador iter reproduz todas as sentenças com sujeito vcard e predicado VCARD.NICKNAME, então, iterar sobre ele permite recuperar cada sentença usando nextStatement(), pegar o campo do objeto, e convertê-lo para string. O código produz a seguinte saída quando executado:\nThe nicknames of \u0026#34;John Smith\u0026#34; are: Smithy Adman Todas as propriedades de um recurso podem ser listadas usando o método listProperties() sem argumentos. Consultas em Modelos A sessão anterior mostrou como navegar um modelo a partir de um recurso com uma URI conhecida. Essa sessão mostrará como fazer buscas em um modelo. O núcleo da API Jena suporta um limitada primitiva de consulta. As consultas mais poderosas de SPARQL são descritas em outros lugares.\nO método Model.listStatements(), que lista todos as sentenças de um modelo, é talvez a forma mais crua de se consultar um modelo. Este uso não é recomendado em modelos muito grandes. Model.listSubjects() é similar, mas retorna um iterador sobre todos os recursos que possuem propriedades, ie são sujeitos de alguma sentença.\nModel.listSubjectsWithProperty(Property p, RDFNode o) retornará um iterador sobre todos os recursos com propriedade p de valor o. Se nós assumirmos que somente recursos vcard terão a propriedade vcard:FN e que, em nossos dados, todos esses recursos têm essa propriedade, então podemos encontrar todos os vCards assim:\n// list vcards ResIterator iter = model.listSubjectsWithProperty(VCARD.FN); while (iter.hasNext()) { Resource r = iter.nextResource(); ... } Todos esses métodos de consulta são açúcar sintático sobre o método primitivo de consulta model.listStatements(Selector s). Esse método retorna um iterador sobre todas as sentenças no modelo 'selecionado' por s. A interface de selector foi feita para ser extensível, mas por hora, só há uma implementação dela, a classe SimpleSelector do pacote org.apache.jena.rdf.model. Usar SimpleSelector é uma das raras ocasiões em Jena onde é necessário usar uma classe especifica em vez de uma interface. 
O construtor de SimpleSelector recebe três argumentos:\nSelector selector = new SimpleSelector(subject, predicate, object); Esse selector vai selecionar todas as sentenças em que o sujeito casa com subject, um predicado que casa com predicate e um objeto que casa com object. Se null é passado para algum dos argumentos, ele vai casar com qualquer coisa, caso contrário, ele vai casar com os recursos ou literais correspondentes. (Dois recursos são iguais se eles possuem o mesmo URI ou são o mesmo blank node; dois literais são iguais se todos os seus componentes forem iguais.) Assim:\nSelector selector = new SimpleSelector(null, null, null); vai selecionar todas as sentenças do modelo.\nSelector selector = new SimpleSelector(null, VCARD.FN, null); vai selecionar todas as sentenças com o predicado VCARD.FN, independente do sujeito ou objeto. Como um atalho especial, listStatements( S, P, O ) é equivalente a\nlistStatements( new SimpleSelector( S, P, O ) ) O código a seguir, que pode ser encontrado em tutorial 7 que lista os nomes completos de todos os vcards do banco de dados.\n// select all the resources with a VCARD.FN property ResIterator iter = model.listSubjectsWithProperty(VCARD.FN); if (iter.hasNext()) { System.out.println(\u0026#34;The database contains vcards for:\u0026#34;); while (iter.hasNext()) { System.out.println(\u0026#34; \u0026#34; + iter.nextResource() .getProperty(VCARD.FN) .getString()); } } else { System.out.println(\u0026#34;No vcards were found in the database\u0026#34;); } Isso deve produzir uma saída similar a:\nThe database contains vcards for: Sarah Jones John Smith Matt Jones Becky Smith Seu próximo exercício é modificar o código para usar SimpleSelector em vez de listSubjectsWithProperty.\nVamos ver como implementar um controle mais refinado sobre as sentenças selecionadas. SimpleSelector pode ser herdado ter seus selects modificado para mais filtragens:\n// select all the resources with a VCARD.FN property // whose value ends with \u0026#34;Smith\u0026#34; StmtIterator iter = model.listStatements( new SimpleSelector(null, VCARD.FN, (RDFNode) null) { public boolean selects(Statement s) {return s.getString().endsWith(\u0026#34;Smith\u0026#34;);} }); Esse código usa uma técnica elegante de Java para sobrescrever a definição de um método quando criamos uma instância da classe. Aqui, o método selects(...) garante que o nome completo termine com “Smith”. É importante notar que a filtragem baseada nos argumentos sujeito, predicado e objeto tem lugar antes que o método selects(...) seja chamado, então esse teste extra só será aplicado para casar sentenças.\nO código completo pode ser encontrado no tutorial 8 e produz uma saída igual a:\nThe database contains vcards for: John Smith Becky Smith Você pode imaginar que:\n// do all filtering in the selects method StmtIterator iter = model.listStatements( new SimpleSelector(null, null, (RDFNode) null) { public boolean selects(Statement s) { return (subject == null || s.getSubject().equals(subject)) \u0026amp;amp;\u0026amp;amp; (predicate == null || s.getPredicate().equals(predicate)) \u0026amp;amp;\u0026amp;amp; (object == null || s.getObject().equals(object)) ; } } }); é equivalente a:\nStmtIterator iter = model.listStatements(new SimpleSelector(subject, predicate, object) Embora possam ser funcionalmente equivalentes, a primeira forma vai listar todas as sentenças do modelo e testar cada uma individualmente, a segunda forma permite índices mantidos pela implementação para melhor a perfomance. 
Tente isso em modelos grandes e veja você mesmo, mas prepare uma chícara de café antes.\nOperações em Modelos Jena provê três operações para manipular modelos. Elas são operações comuns de conjunto: união, intersecção e diferença.\nA união de dois modelos é a união do conjunto de sentenças que representa cada modelo. Esta é uma das operações chaves que RDF suporta. Isso permite que fontes de dados discrepantes sejam juntadas. Considere o seguintes modelos:\nand Quando eles são juntados, os dois nós http://...JohnSmith são unidos em um, e o arco vcard:FN duplicado é descartado para produzir:\nVamos ver o código (o código completo está em tutorial 9) e ver o que acontece.\n// read the RDF/XML files model1.read(new InputStreamReader(in1), \u0026#34;\u0026#34;); model2.read(new InputStreamReader(in2), \u0026#34;\u0026#34;); // merge the Models Model model = model1.union(model2); // print the Model as RDF/XML model.write(system.out, \u0026#34;RDF/XML-ABBREV\u0026#34;); A saída produzida pelo PrettyWriter se assemelha a:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#34;\u0026lt;a href=\u0026#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026#34;\u0026gt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026lt;/a\u0026gt;\u0026#34; xmlns:vcard=\u0026#34;http://www.w3.org/2001/vcard-rdf/3.0#\u0026#34;\u0026gt; \u0026lt;rdf:Description rdf:about=\u0026#34;http://somewhere/JohnSmith/\u0026#34;\u0026gt; \u0026lt;vcard:EMAIL\u0026gt; \u0026lt;vcard:internet\u0026gt; \u0026lt;rdf:value\u0026gt;John@somewhere.com\u0026lt;/rdf:value\u0026gt; \u0026lt;/vcard:internet\u0026gt; \u0026lt;/vcard:EMAIL\u0026gt; \u0026lt;vcard:N rdf:parseType=\u0026#34;Resource\u0026#34;\u0026gt; \u0026lt;vcard:Given\u0026gt;John\u0026lt;/vcard:Given\u0026gt; \u0026lt;vcard:Family\u0026gt;Smith\u0026lt;/vcard:Family\u0026gt; \u0026lt;/vcard:N\u0026gt; \u0026lt;vcard:FN\u0026gt;John Smith\u0026lt;/vcard:FN\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; Mesmo que você não seja familiarizado com os detalhes da sintaxe RDF/XML, deve ser relativamente claro que os modelos foram unidos como esperado A interseção e a diferença de modelos podem ser computados de maneira semelhante, usando os métodos .intersection(Model) e .difference(Model); veja a documentação de difference e ","permalink":"https://jena.apache.org/tutorials/rdf_api_pt.html","tags":null,"title":"Uma Introdução a RDF e à API RDF de Jena"},{"categories":null,"contents":"Este tutorial mostrará como importar o projeto do Jena no Eclipse. A versão do Eclipse utilizada foi 4.7.0, e do Java foi 1.8.0_121. O sistema operacional não deve ser um problema, entã os ũnicos requisitos são Eclipse, Java 1.8.x, e git para baixar o código-fonte do Jena.\nConfigurando seu ambiente O primeiro passo é instalar o Java JDK 1.8.x. As instruções para a instalação variam dependendo do sistema operacional, e não serão abordadas neste tutorial.\nApós a instalação do Java, o próximo passo é o Eclipse. Você pode baixar uma versão do Eclipse, ou baixar o instalador e escolher entre as diferentes versões disponíveis. As instruções e screenshots a seguir foram feitos com a versão “Eclipse IDE for Java Developers”.\nO Eclipse vem com uma versão do Apache Maven embutida, mas talvez você prefira utilizar uma versão externa para que você possa customizar as configurações para o seu ambiente. 
Este passo não é necessário para este tutorial, e também não será discutido neste tutorial.\nBaixando o código-fonte Siga as instruções da página Getting involved in Apache Jena para baixar o código-fonte do repositório Git. Muitos desenvolvedores baixam o código-fonte em um diretório dentro do workspace do Eclipse. Mas você pode importar o código-fonte no Eclipse de qualquer diretório, como será demonstrado a seguir.\nE não esqueça de executar mvn clean install, para que o Eclipse possa encontrar todos as dependências necessárias.\nImportando o código-fonte no Eclipse Por padrão, o Eclipse provê uma integração com Maven. Antigamente você teria que instalar um plug-in primeiro. Mas se você tiver seguido as intruções anteriores, você deve estar pronto para importar o código-fonte.\nNa figura anterior, o workspace do Eclipse está não tem nenhum projeto ainda. A perspectiva foi configurada para mostrar “working sets”, e já há um working set criado para o Jena. Este passo não é necessário para este tutorial, mas pode ser útil se você tiver vários projetos no seu workspace (por exemplo, se você tiver importado Apache Commons RDF e Apache Jena no mesmo workspace).\nPadrão o Eclipse mantém seus projetos no painel à esquerda. Clique com o botão direito do mouse sobre este painel e escolha “Import”. Se você preferir, você pode utilizar o menu superior e ir para File / Import.\nVocê deverá ver um diálogo, onde poderá escolher entre diferentes tipos de projetos para importar no seu workspace. Para o Jena, você deve selecionar importar Existing Maven Project, que fica na categoria de projetos Maven.\nClicando em Next, você verá uma nova tela onde você poderá escolher a localização do código-fonte do Jena. Escolha o diretório onde você baixou o código-fonte na seção anterior deste tutorial.\nAgora clique em Finish e o Eclipse deverá começar a importar o projeto. Este passo pode levar vários minutos, dependendo dos recursos disponíveis no seu sistema operacional e hardware. Você pode acompanhar o progresso na aba Progress, no painel inferior.\nAssim que o projeto tiver sido importado no seu workspace, você deverá ver algo similar à tela seguinte.\nQuando o projeto tiver sido importado, o Eclipse deverá começar a construir o projeto automaticamente se você estiver com as configurações padrões, senão você pode clicar em Project / _ Build All_.\nO Eclipse mostrará um ícone vermelho nos projetos importados que tiverem problemas. Agora veremos como arrumar estes problemas.\nOs problemas são geralmente relacionados a um problema conhecido por como um dos projetos utiliza o Maven Shade Plugin nas classes do Google Guava.\nA solução é garantir que o projeto jena-shaded-guava fique fechado no workspace do Eclipse. Você pode simplesmente clicar com o botão direito sobre o projeto, e escolher Close. 
O ícone do projeto deverá mudar, indicando que ele foi fechado com sucesso.\nFeito isso, é uma boa ideia selecionar a opção para limpar (Clean) todos os projetos abertos, para que o Eclipse então comece a construir os projetos novamente.\nVocê também pode atualizar as configurações dos projetos Maven, para que o Eclipse entenda que um projeto foi fechado e utilize a dependência do seu repositório Maven local, ao invés do projeto importado no workspace.\nSe você seguiu todos os passos até aqui, e não há nenhuma tarefa rodando em segundo-plano (verifique a aba Progress) então o seu projeto deve estar sendo construído com sucesso.\nSe você quiser testar o Fuseki agora, por exemplo, abra o projeto jena-fuseki-core, navegue até o pacote org.apache.jena.fuseki.cmd, e execute FusekiCmd como Java Application.\nO Fuseki deverá iniciar, e estará disponível em http://localhost:3030.\nAgora você já pode debugar o Jena, modificar o código-fonte e construir o projeto novamente, ou importar ou criar outros projetos no seu workspace, e utilizá-los com a última versão do Jena.\n","permalink":"https://jena.apache.org/tutorials/using_jena_with_eclipse_pt.html","tags":null,"title":"Usando o Jena com o Eclipse"},{"categories":null,"contents":"Legacy Documentation : not up-to-date\nThe original ARQ parser will be removed from Jena.\nARP can be used both as a Jena subsystem, or as a standalone RDF/XML parser. This document gives a quick guide to using ARP standalone.\nOverview To load an RDF file:\nCreate an ARP instance. Set parse options, particularly error detection control, using getOptions or setOptionsWith. Set its handlers, by calling the getHandlers or setHandlersWith methods, and then. Setting the statement handler. Optionally setting the other handlers. Call a load method Xerces is used for parsing the XML. The SAXEvents generated by Xerces are then analysed as RDF by ARP. It is possible to use a different source of SAX events.\nErrors may occur in either the XML or the RDF part.\nSample Code ARP arp = new ARP(); // initialisation - uses ARPConfig interface only. arp.getOptions().setLaxErrorMode(); arp.getHandlers().setErrorHandler(new ErrorHandler(){ public void fatalError(SAXParseException e){ // TODO code } public void error(SAXParseException e){ // TODO code } public void warning(SAXParseException e){ // TODO code } }); arp.getHandlers().setStatementHandler(new StatementHandler(){ public void statement(AResource a, AResource b, ALiteral l){ // TODO code } public void statement(AResource a, AResource b, AResource l){ // TODO code } }); // parsing. try { // Loading fixed input ... arp.load(new StringReader( \u0026quot;\u0026lt;rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'\u0026gt;\\n\u0026quot; +\u0026quot;\u0026lt;rdf:Description\u0026gt;\u0026lt;rdf:value rdf:parseType='Literal'\u0026gt;\u0026quot; +\u0026quot;\u0026lt;b\u0026gt;hello\u0026lt;/b\u0026gt;\u0026lt;/rdf:value\u0026gt;\\n\u0026quot; +\u0026quot;\u0026lt;/rdf:Description\u0026gt;\u0026lt;/rdf:RDF\u0026gt;\u0026quot; )); } catch (IOException ioe){ // something unexpected went wrong } catch (SAXParseException s){ // This error will have been reported } catch (SAXException ss) { // This error will not have been reported. } ARP Event Handling ARP reports events concerning:\nTriples found in the input. Errors in the input. Namespace declarations. Scope of blank nodes. User code is needed to respond to any of these events of interest. 
This is written by implementing any of the relevant interfaces: StatementHandler, org.xml.sax.ErrorHandler, NamespaceHandler, and ExtendedHandler.\nAn individual handler is set by calling the getHandlers method on the ARP instance. This returns an encapsulation of all the handlers being used. A specific handler is set by calling the appropriate set\u0026hellip;Handler method on that object, e.g. setStatementHandler.\nAll the handlers can be copied from one ARP instance to another by using the setHandlersWith method:\nARP from, to; // initialize from and to // ... to.setHandlersWith(from.getHandlers()); The error handler reports both XML and RDF errors, the former detected by Xerces. See ARPHandlers.setErrorHandler for details of how to distinguish between them.\nConfiguring ARP ARP can be configured to treat most error conditions as warnings or to be ignored, and to treat some non-error conditions as warnings or errors.\nIn addition, the behaviour in response to input that does not have an \u0026lt;rdf:RDF\u0026gt; root element is configurable: either to treat the whole file as RDF anyway, or to scan the file looking for embedded \u0026lt;rdf:RDF\u0026gt; elements.\nAs with the handlers, there is an options object that encapsulates these settings. It can be accessed using getOptions, and then individual settings can be made using the methods in ARPOptions.\nIt is also possible to copy all the option settings from one ARP instance to another:\nARP from, to; // initialize from and to ... to.setOptionsWith(from.getOptions()); The I/O how-to gives some more detail about the options settings, although it assumes the use of the Jena RDFReader interface.\nInterrupting ARP It is possible to interrupt an ARP thread. See the I/O how-to for details.\nUsing Other SAX Sources It is possible to use ARP with other SAX input sources, e.g. from a non-Xerces parser, or from an in-memory XML source, such as a DOM tree.\nInstead of an ARP instance, you create an instance of SAX2RDF using the newInstance method. This can be configured just like an ARP instance, following the initialization section of the sample code.\nThis is used like a SAX2Model instance as described elsewhere.\nMemory usage For very large files, ARP does not use any additional memory except when either the ExtendedHandler.discardNodesWithNodeID returns false or when the AResource.setUserData method has been used. In these cases ARP needs to remember the rdf:nodeID usage through the file life time.\n","permalink":"https://jena.apache.org/documentation/io/arp/arp_standalone.html","tags":null,"title":"Using ARP Without Jena"},{"categories":null,"contents":"Apache Maven is a tool to help Java projects manage their dependencies on library code, such as Jena. By declaring a dependency on the core of Jena in your project\u0026rsquo;s pom.xml file, you will get the consistent set of library files that Jena depends on automatically added too.\nThis page assumes you have Maven installed on your computer. 
If this is not the case, please read and follow these instructions.\nRepositories Released Maven artifacts are mirrored to the central Maven repositories.\nDevelopment snapshots are available as well.\nhttps://repository.apache.org/content/repositories/snapshots/\nStable Jena releases are automatically mirrored by the central Maven repositories, so there will normally be no need to add any extra repositories to your pom.xml or settings.xml.\nSpecifying Jena as a dependency This is how to specify in your pom.xml file the dependency on a version of Jena:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;apache-jena-libs\u0026lt;/artifactId\u0026gt; \u0026lt;type\u0026gt;pom\u0026lt;/type\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; This will transitively resolve all the dependencies for you: jena-core, jena-arq, jena-tdb and jena-iri and their dependencies.\nNote the use of \u0026lt;type\u0026gt;pom\u0026lt;/type\u0026gt; above.\nOther modules need to be added separately, for example:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-text\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Please check for the latest versions.\nMajor Artifacts Jena provides a number of Maven artifacts as delivery points.\nThere are also a number of Maven artifacts used as part of structuring Jena development.\nArtifact ID Packaging (\u0026lt;type\u0026gt;) Description apache-jena-libs pom A POM artifact that may be referenced to pull in all the standard Jena Libraries (Core, ARQ, IRI, and TDB) with a single dependency. apache-jena pom The binary distribution apache-jena-fuseki pom Fuseki2 distribution jena The formal source release for each Jena release. This is not a Maven-runnable set of binary files. jena-fuseki-main war Fuseki packaged for standalone and embedded use. jena-text jar SPARQL Text Search. Included in Fuseki. jena-shacl jar SHACL engine for Jena. jena-shex jar ShEx engine for Jena. jena-serviceenhancer jar Bulk retrieval and caching for SERVICE clauses jena-querybuilder jar A utility package to simplify the building of ARQ queries in code. jena-permissions jar Security wrapper around Jena RDF implementation. jena-jdbc-driver-bundle jar A collection of JDBC drivers There are also a number of artifacts used in development. The full list can be seen by browsing Maven\nReleased Jena artifacts\n(This includes historic artifacts which are no longer active.)\nYou can run mvn dependency:tree to print the dependency tree.\nSpecifying dependencies on SNAPSHOTs If you want to depend on Jena development snapshots and help with Jena development, e.g. 
to get access to recent bug fixes for testing, you should add the following to your pom.xml:\n\u0026lt;repository\u0026gt; \u0026lt;id\u0026gt;apache-repo-snapshots\u0026lt;/id\u0026gt; \u0026lt;url\u0026gt;https://repository.apache.org/content/repositories/snapshots/\u0026lt;/url\u0026gt; \u0026lt;releases\u0026gt; \u0026lt;enabled\u0026gt;false\u0026lt;/enabled\u0026gt; \u0026lt;/releases\u0026gt; \u0026lt;snapshots\u0026gt; \u0026lt;enabled\u0026gt;true\u0026lt;/enabled\u0026gt; \u0026lt;/snapshots\u0026gt; \u0026lt;/repository\u0026gt; Build and install artifacts in your local Maven repository If you want, you can check out the Jena sources, build the artifacts and install them in your local Maven repository: simply check out the source tree and build with mvn install. This assumes you have Maven and Git installed:\n$ git clone https://github.com/apache/jena/ $ cd jena $ mvn clean install Each of the modules can be built on its own, but they require the current snapshots and the Jena parent POM to be installed.\n","permalink":"https://jena.apache.org/download/maven.html","tags":null,"title":"Using Jena with Apache Maven"},{"categories":null,"contents":"This tutorial will guide you through setting up Jena in Eclipse. At the time of writing, the latest version of Eclipse is 4.7.0. The version of Java used for this tutorial was Java 1.8.0_121. The operating system should not matter, so the only requirements are Eclipse, Java 1.8.x, and Git to check out the Jena source code.\nSetting up your environment The first thing you will need to install is a Java JDK 1.8.x. The installation instructions vary depending on the operating system, and will not be covered in this tutorial.\nOnce you have Java installed, you can proceed to install Eclipse. You can either download an Eclipse distribution, or download the installer and choose one amongst the available packages. For this tutorial, you will see instructions and screenshots taken from an Eclipse IDE for Java Developers.\nEclipse comes with a bundled Apache Maven, but you may prefer to install it to another directory and customize your local settings. As this is not a hard requirement, it will not be covered in this tutorial.\nGetting the source code Follow the instructions from our Getting involved in Apache Jena page to check out the code from the Git repository. Most developers will check out the code into their Eclipse workspace folder. But you should be able to import it into Eclipse from a different folder too, as will be shown in the next sections.\nDo not forget to run mvn clean install as instructed, so that Eclipse will be able to find all local artifacts with no issues.\nImporting the source code into Eclipse Eclipse comes, by default, with Maven integration. In the past you would have to install and configure a plug-in for that. But assuming you followed the instructions from the previous sections, you should be ready to import the source code.\nIn the previous picture, you can see an empty Eclipse workspace. The view was configured to display working sets, and there is a Jena working set already created. This is not necessary for this tutorial, but you may find it useful if you work on separate projects at the same time (e.g. working on Apache Commons RDF and Apache Jena projects simultaneously).\nEclipse keeps, by default, your projects in a panel on the left-hand side. Right click somewhere on that panel and choose Import. 
Alternatively, you can navigate using the top menu to File / Import.\nThat will open a menu dialog, where you should find several types of projects to import into your workspace. For Jena, you must select import Existing Maven Projects, under the Maven project category.\nClicking Next will bring you to another screen where you can choose the location of Jena source code. Point it to the folder where you checked out the Jena source code in the previous section of this tutorial.\nClick Finish and Eclipse will start importing your project. This may take a few minutes, depending on your computer resources. You can keep an eye at the Progress tab, in the bottom panel, to see what is the status of the import process.\nOnce the project has been imported into your workspace, you should see something similar to the following screenshot.\nAfter the import process is complete, Eclipse will start building the project automatically if you have it configured with the default settings, or you may have to click on Project / Build All.\nEclipse will display a red icon on the project folders with build problems. We will see now how to fix these build problems, so Eclipse can successfully build and run the project.\nThe build problems are related to a known issue due to how the project shades Google Guava classes.\nThe workaround is to make sure the jena-shaded-guava Maven module remains closed in Eclipse. You can simply right click on the project, and choose Close. Its icon should change, indicating it has been closed.\nAfter doing that, it is good to trigger a Clean on all projects, so that Eclipse can clean and re-build everything.\nYou may also need to update the Maven project settings, so that Eclipse is aware that the project is closed and it will use a local artifact, rather than the module in the workspace.\nIf you followed all steps, and there is nothing else running in your Eclipse (check the Progress tab) then your Jena project should have been built with success.\nIf you would like to test Fuseki now, for example, you can expand the jena-fuseki-core Maven module, navigate to the org.apache.jena.fuseki.cmd package, and run FusekiCmd as a Java Application.\nThat should initialize Fuseki, and have it listening on http://localhost:3030.\nNow you should also be able to debug Jena, modify the source code and build the project again, or import or create other projects into your workspace, and use them with the latest version of Jena.\n","permalink":"https://jena.apache.org/tutorials/using_jena_with_eclipse.html","tags":null,"title":"Using Jena with Eclipse"},{"categories":null,"contents":" Welcome to the Apache Jena project! Jena is a Java framework for building Semantic Web applications. 
Jena provides a collection of tools and Java libraries to help you develop semantic web and linked-data apps, tools and servers.\nThe Jena Framework includes:\nan API for reading, processing and writing RDF data in XML, N-triples and Turtle formats; an ontology API for handling OWL and RDFS ontologies; a rule-based inference engine for reasoning with RDF and OWL data sources; stores to allow large numbers of RDF triples to be efficiently stored on disk; a query engine compliant with the latest SPARQL specification; servers to allow RDF data to be published to other applications using a variety of protocols, including SPARQL.\nIn April 2012, Jena graduated from the Apache incubator process and was approved as a top-level Apache project.\nQuick shortcuts I would like to \u0026hellip;\n\u0026hellip; download Jena components \u0026hellip; use Jena with Maven \u0026hellip; find out more about the Jena project \u0026hellip; see who\u0026rsquo;s involved \u0026hellip; follow a tutorial \u0026hellip; see how to get started with Jena \u0026hellip; report a bug \u0026hellip; get help using Jena \u0026hellip; read the Javadoc \u0026hellip; contribute to the project! \u0026hellip; get the DOAP description of the Jena project in RDF ","permalink":"https://jena.apache.org/about_jena/","tags":null,"title":"Welcome to Apache Jena"},{"categories":null,"contents":"Jena is a Java framework for building Semantic Web applications. It provides extensive Java libraries for helping developers develop code that handles RDF, RDFS, RDFa, OWL and SPARQL in line with published W3C recommendations. Jena includes a rule-based inference engine to perform reasoning based on OWL and RDFS ontologies, and a variety of storage strategies to store RDF triples in memory or on disk.\nHistory Jena was originally developed by researchers in HP Labs, starting in Bristol, UK, in 2000. Jena has always been an open-source project, and has been extensively used in a wide variety of semantic web applications and demonstrators. In 2009, HP decided to refocus development activity away from direct support of development of Jena, though remaining supportive of the project\u0026rsquo;s aims. The project team successfully applied to have Jena adopted by the Apache Software Foundation in November 2010 (see the vote result).\nCurrent status Jena entered incubation with the Apache Software Foundation in November 2010, and graduated as a top-level project in April 2012.\nThanks YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit\u0026rsquo;s leading software products: YourKit Java Profiler and YourKit .NET Profiler.\n","permalink":"https://jena.apache.org/about_jena/about.html","tags":null,"title":"What is Jena?"},{"categories":null,"contents":"Jena has operations useful in processing RDF in a streaming fashion. Streaming can be used for manipulating RDF at scale. Jena provides high performance readers and writers for all standard RDF formats, and it can be extended with custom formats.\nRDF Binary provides the highest input parsing performance. N-Triples/N-Quads provide the highest input parsing performance among the W3C standard syntaxes.\nFiles ending in .gz are assumed to be gzip-compressed. Input and output to such files takes this into account, including using the file extension underneath the .gz to determine the syntax. 
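As a quick sketch, such a file can be loaded directly and the decompression is handled for you; the file name here matches the example discussed next, and the required classes are RDFDataMgr (org.apache.jena.riot) and Model (org.apache.jena.rdf.model).
// Sketch: reading a gzip-compressed N-Triples file.
// "data.nt.gz" is the example file name used below; the syntax is taken from
// the extension underneath ".gz" and the stream is decompressed transparently.
Model model = RDFDataMgr.loadModel("data.nt.gz") ;
System.out.println("Parsed " + model.size() + " triples") ;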
data.nt.gz is parsed as a gzip-compressed N-Triples file.\nJena does not support all possible compression formats itself, only GZip and BZip2 are supported directly. If you want to use an alternative compression format you can do so by adding suitable dependencies into your project and passing an appropriate InputStream/OutputStream implementation to Jena code e.g.\nInputStream input = new ZstdCompressorInputStream(....); Graph graph = RDFParser.source(input).lang(Lang.NQ).toGraph(); StreamRDF The central abstraction is StreamRDF which is an interface for streamed RDF data. It covers triples and quads, and also parser events for prefix settings and base URI declarations.\npublic interface StreamRDF { /** Start processing */ public void start() ; /** Triple emitted */ public void triple(Triple triple) ; /** Quad emitted */ public void quad(Quad quad) ; /** base declaration seen */ public void base(String base) ; /** prefix declaration seen */ public void prefix(String prefix, String iri) ; /** Finish processing */ public void finish() ; } There are utilities to help:\nStreamRDFLib – create StreamRDF objects StreamRDFOps – helpers for sending RDF data to StreamRDF objects Reading data All parsers of RDF syntaxes provided by RIOT are streaming with the exception of JSON-LD. A JSON object can have members in any order so the parser may need the whole top-level object in order to have the information needed for parsing.\nThe parse functions of RDFDataMgr directs the output of the parser to a StreamRDF. For example:\nStreamRDF destination = ... RDFDataMgr.parse(destination, \u0026quot;http://example/data.ttl\u0026quot;) ; The above code reads the remote URL, with content negotiation, and sends the triples to the destination.\nWriting data Not all RDF formats are suitable for writing as a stream. Formats that provide pretty printing (for example the default RDFFormat for each of Turtle, TriG and RDF/XML) require analysis of the entire model in order to determine nestable structures of blank nodes and for using specific syntax for RDF lists.\nThese languages can be used for streaming output but with an appearance that is necessarily \u0026ldquo;less pretty\u0026rdquo;. 
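Stepping back to the reading side for a moment, here is a minimal sketch of a custom StreamRDF destination, one that simply counts the triples and quads it receives from a parse. The package locations in the imports are assumptions based on current Jena releases; check them against your version.
// Sketch: a StreamRDF destination that counts what it receives.
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.sparql.core.Quad;

public class CountingStream implements StreamRDF {
    private long triples = 0 ;
    private long quads = 0 ;
    @Override public void start() { }
    @Override public void triple(Triple triple) { triples++ ; }
    @Override public void quad(Quad quad) { quads++ ; }
    @Override public void base(String base) { }
    @Override public void prefix(String prefix, String iri) { }
    @Override public void finish() { System.out.println(triples + " triples, " + quads + " quads") ; }
}
Such a destination can be fed by the parser in the same way as shown earlier, for example RDFDataMgr.parse(new CountingStream(), "http://example/data.ttl") ;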
See \u0026ldquo;Streamed Block Formats\u0026rdquo; for details.\nThe StreamRDFWriter class has functions that write graphs and datasets using a streaming writer and also provides for the creation of an StreamRDF backed by a stream-based writer\nStreamRDFWriter.write(output, model.getGraph(), lang) ; which can be done as:\nStreamRDF writer = StreamRDFWriter.getWriterStream(output, lang) ; StreamRDFOps.graphToStream(model.getGraph(), writer) ; N-Triples and N-Quads are always written as a stream.\nRDFFormat and Lang RDFFormat Lang shortcut RDFFormat.TURTLE_BLOCKS Lang.TURTLE RDFFormat.TURTLE_FLAT RDFFormat.TRIG_BLOCKS Lang.TRIG RDFFormat.TRIG_FLAT RDFFormat.NTRIPLES_UTF8 Lang.NTRIPLES RDFFormat.NTRIPLES_ASCII RDFFormat.NQUADS_UTF8 Lang.NQUADS RDFFormat.NQUADS_ASCII RDFFormat.TRIX Lang.TRIX RDFFormat.RDF_THRIFT Lang.RDFTHRIFT RDFFormat.RDF_PROTO Lang.RDFPROTO ","permalink":"https://jena.apache.org/documentation/io/streaming-io.html","tags":null,"title":"Working with RDF Streams in Apache Jena"},{"categories":null,"contents":"This page describes the RIOT (RDF I/O technology) output capabilities.\nSee Reading RDF for details of the RIOT Reader system.\nAPI RDFFormat RDFFormats and Jena syntax names Formats Normal Printing Pretty Printed Languages Streamed Block Formats Line printed formats Turtle and Trig format options N-Triples and N-Quads JSON-LD RDF Binary RDF/XML Examples Notes API There are two ways to write RDF data using Apache Jena RIOT, either via the RDFDataMgr\nRDFDataMgr.write(OutputStream, Model, Lang) ; RDFDataMgr.write(OutputStream, Dataset, Lang) ; RDFDataMgr.write(OutputStream, Model, RDFFormat) ; RDFDataMgr.write(OutputStream, Dataset, RDFFormat) ; or the legacy way using the model API, where there is a limited set of \u0026quot;format\u0026quot; names\nmodel.write(output, \u0026quot;format\u0026quot;) ; The format names are described below; they include the names Jena has supported before RIOT.\nMany variations of these methods exist. See the full javadoc for details.\nRDFFormat Output using RIOT depends on the format, which involves both the language (syntax) being written and the variant of that syntax.\nThe RIOT writer architecture is extensible. The following languages are available as part of the standard setup.\nTurtle N-Triples NQuads TriG JSON-LD RDF/XML RDF/JSON TriX RDF Binary In addition, there are variants of Turtle, TriG for pretty printing, streamed output and flat output. RDF/XML has variants for pretty printing and plain output. Jena RIOT uses org.apache.jena.riot.RDFFormat as a way to identify the language and variant to be written. The class contains constants for the standard supported formats.\nNote:\nRDF/JSON is not JSON-LD. See the description of RDF/JSON. N3 is treated as Turtle for output. RDFFormats and Jena syntax names The string name traditionally used in model.write is mapped to RIOT RDFFormat as follows:\nJena writer name RIOT RDFFormat \u0026quot;TURTLE\u0026quot; TURTLE \u0026quot;TTL\u0026quot; TURTLE \u0026quot;Turtle\u0026quot; TURTLE \u0026quot;N-TRIPLES\u0026quot; NTRIPLES \u0026quot;N-TRIPLE\u0026quot; NTRIPLES \u0026quot;NT\u0026quot; NTRIPLES \u0026quot;JSON-LD\u0026quot; JSONLD \u0026quot;RDF/XML-ABBREV\u0026quot; RDFXML \u0026quot;RDF/XML\u0026quot; RDFXML_PLAIN \u0026quot;N3\u0026quot; N3 \u0026quot;RDF/JSON\u0026quot; RDFJSON Formats Normal Printing A Lang can be used for the writer format, in which case it is mapped to an RDFFormat internally. 
The normal writers are:\nRDFFormat or Lang Default TURTLE Turtle, pretty printed TTL Turtle, pretty printed NTRIPLES N-Triples, UTF-8 TRIG TriG, pretty printed NQUADS N-Quads, UTF-8 JSONLD JSON-LD, pretty printed RDFXML RDF/XML, pretty printed RDFJSON TRIX RDFTHRFT RDF Binary Thrift RDFPROTO RDF Binary Protobuf Pretty printed RDF/XML is also known as RDF/XML-ABBREV.\nPretty Printed Languages All Turtle and TriG formats use prefix names, and short forms for literals.\nThe pretty printed versions of Turtle and TriG prints data with the same subject in the same graph together. All the properties for a given subject are sorted into a predefined order. RDF lists are printed as (...) and [...] is used for blank nodes where possible.\nThe analysis for determining what can be pretty printed requires temporary datastructures and also a scan of the whole graph before writing begins. Therefore, pretty printed formats are not suitable for writing persistent graphs and datasets.\nWhen writing at scale use either a \u0026ldquo;blocked\u0026rdquo; version of Turtle or TriG, or write N-triples/N-Quads.\nExample:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; :book dc:author ( :a :b ) . :a a foaf:Person ; foaf:knows [ foaf:name \u0026quot;Bob\u0026quot; ] ; foaf:name \u0026quot;Alice\u0026quot; . :b foaf:knows :a . The default pretty printed output (shown above) aligns predicates and objects, which can result in wide lines. For a narrower indentation style, set ttl:indentStyle to long. See Turtle and Trig format options.\nPretty printed formats:\nRDFFormat Same as TURTLE_PRETTY TURTLE, TTL TRIG_PRETTY TRIG RDFXML_PRETTY RDFXML_ABBREV, RDFXML Streamed Block Formats Fully pretty printed formats can not be streamed. They require analysis of all of the data to be written in order to choose the short forms. This limits their use in fully scalable applications.\nSome formats can be written streaming style, where the triples or quads are partially grouped together by adjacent subject or graph/subject in the output stream.\nThe written data is like the pretty printed forms of Turtle or TriG, but without RDF lists being written in the \u0026lsquo;(\u0026hellip;)\u0026rsquo; form, without using [...] for blank nodes.\nThis gives some degree of readability while not requiring excessive temporary datastructure. Arbitrary amounts of data can be written but blank node labels need to be tracked in order to use the short label form.\nExample:\nPREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX : \u0026lt;http://example/\u0026gt; PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; :book dc:author _:b0 . _:b0 rdf:rest _:b1 ; rdf:first :a . :a foaf:knows _:b2 ; foaf:name \u0026quot;Alice\u0026quot; ; rdf:type foaf:Person . _:b2 foaf:name \u0026quot;Bob\u0026quot; . :b foaf:knows :a . _:b1 rdf:rest rdf:nil ; rdf:first :b . Formats:\nRDFFormat TURTLE_BLOCKS TRIG_BLOCKS Line printed formats There are writers for Turtle and Trig that use the abbreviated formats for prefix names and short forms for literals. 
They write each triple or quad on a single line.\nThe regularity of the output can be useful for text processing of data.\nThese formats do not offer more scalability than the stream forms.\nExample:\nThe FLAT writers abbreviates IRIs, literals and blank node labels but always writes one complete triple on one line (no use of ;).\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; _:b0 foaf:name \u0026quot;Bob\u0026quot; . :book dc:author _:b1 . _:b2 rdf:rest rdf:nil . _:b2 rdf:first :b . :a foaf:knows _:b0 . :a foaf:name \u0026quot;Alice\u0026quot; . :a rdf:type foaf:Person . _:b1 rdf:rest _:b2 . _:b1 rdf:first :a . :b foaf:knows :a . \u0026nbsp;\nRDFFormat TURTLE_FLAT TRIG_FLAT Turtle and Trig format options Some context settings affect the output of Turtle and TriG writers. Unless otherwise noted, the setting applies to both Turtle and TriG.\nContext setting Cmd line Values RIOT.symTurtleDirectiveStyle \u0026ldquo;ttl:directiveStyle\u0026rdquo; \u0026ldquo;sparql\u0026rdquo;, \u0026ldquo;rdf11\u0026rdquo;, \u0026ldquo;at\u0026rdquo;, \u0026ldquo;n3\u0026rdquo; RIOT.symTurtleIndentStyle \u0026ldquo;ttl:indentStyle\u0026rdquo; \u0026ldquo;wide\u0026rdquo;, \u0026ldquo;long\u0026rdquo; RIOT.symTurtleOmitBase \u0026ldquo;ttl:omitBase\u0026rdquo; \u0026ldquo;true\u0026rdquo;, \u0026ldquo;false\u0026rdquo; \u0026nbsp;\nDirective Style Effect \u0026ldquo;sparql\u0026rdquo;, \u0026ldquo;rdf11\u0026rdquo; Use PREFIX and BASE in output. \u0026ldquo;at\u0026rdquo;, \u0026ldquo;rdf10 Use @prefix and @base in output. unset Use PREFIX and BASE in output. \u0026nbsp; Format Option Usage Setting directive style riot --set ttl:directiveStyle=rdf11 --pretty Turtle file1.rdf file2.nt ... and in code:\nRDFWriter.source(model) .set(RIOT.symTurtleDirectiveStyle, \u0026#34;sparql\u0026#34;) .lang(Lang.TTL) .output(System.out); Setting indent style riot --set ttl:indentStyle=long --formatted=ttl file1.rdf file2.nt ... and in code:\nRDFWriter.source(model) .format(RDFFormat.TURTLE_LONG) .output(System.out); or:\nRDFWriter.source(model) .set(RIOT.symTurtleIndentStyle, \u0026#34;long\u0026#34;) .lang(Lang.TTL) .output(System.out); Base URI Output can be written with relative URIs and no base. Note: such output is not portable; its meaning depends on the base URI at the time of reading.\nTurtle and Trig can be written with relative URIs by setting the base URI for writing and switching off output of the base URI.\nRDFWriter.create() .base(\u0026#34;http://host/someBase\u0026#34;) .set(RIOT.symTurtleOmitBase, true) .lang(Lang.TTL) .source(model) .output(System.out); N-Triples and N-Quads These provide the formats that are fastest to write, and data of any size can be output. They do not use any internal state and formats always stream without limitation.\nThey maximise the interoperability with other systems and are useful for database dumps. They are not human readable, even at moderate scale.\nThe files can be large but they compress well with gzip. Compression ratios of x8-x10 can often be obtained.\nExample:\nThe N-Triples writer makes no attempt to make it\u0026rsquo;s output readable. It uses internal blank nodes to ensure correct labeling without needing any writer state.\n_:BX2Dc2b3371X3A13cf8faaf53X3AX2D7fff \u0026lt;http://xmlns.com/foaf/0.1/name\u0026gt; \u0026quot;Bob\u0026quot; . 
\u0026lt;http://example/book\u0026gt; \u0026lt;http://purl.org/dc/elements/1.1/author\u0026gt; _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe . _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#rest\u0026gt; \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#nil\u0026gt; . _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#first\u0026gt; \u0026lt;http://example/b\u0026gt; . \u0026lt;http://example/a\u0026gt; \u0026lt;http://xmlns.com/foaf/0.1/knows\u0026gt; _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7fff . \u0026lt;http://example/a\u0026gt; \u0026lt;http://xmlns.com/foaf/0.1/name\u0026gt; \u0026quot;Alice\u0026quot; . \u0026lt;http://example/a\u0026gt; \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type\u0026gt; \u0026lt;http://xmlns.com/foaf/0.1/Person\u0026gt; . _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#rest\u0026gt; _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffd . _:BX2Dc2b3371X3A13cf8faaf53X3AX2D7ffe \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#first\u0026gt; \u0026lt;http://example/a\u0026gt; . \u0026lt;http://example/b\u0026gt; \u0026lt;http://xmlns.com/foaf/0.1/knows\u0026gt; \u0026lt;http://example/a\u0026gt; . \u0026nbsp;\nRDFFormat Other names NTRIPLE NTRIPLE, NT, NTRIPLES_UTF8 NQUADS NQUADS, NQ, NQUADS_UTF8 \u0026nbsp;\nThe main N-Triples and N-Quads writers follow RDF 1.1 and output using UTF-8.\nFor compatibility with old software, writers are provided that output in ASCII (using \\u escape sequences for non-ASCI characters where necessary).\nRDFFormat NTRIPLES_ASCII NQUADS_ASCII JSON-LD Caution. This section describes features that may be removed.\nJena uses different third party processors for JSON-LD 1.0 and JSON-LD 1.1.\nThis section describes support for passing configuration to the JSON-LD 1.0 processor only. It does not apply to the JSON-LD 1.1 processor.\nIt is planned that support for JSON-LD 1.0 will be removed in Jena5.\nThe project is looking for contributions for passing framing configuration to the JSON-LD 1.1 processor, which is titanium-json-ld.\nJSON-LD output is supported, in its various flavors (\u0026ldquo;compacted\u0026rdquo;, \u0026ldquo;expanded\u0026rdquo;, \u0026ldquo;flattened\u0026rdquo;, \u0026ldquo;framed\u0026rdquo;), by using one of the following RDFFormats:\nRDFFormat JSONLD_EXPAND_PRETTY JSONLD_EXPAND_FLAT JSONLD_COMPACT_PRETTY JSONLD_COMPACT_FLAT JSONLD_FLATTEN_PRETTY JSONLD_FLATTEN_FLAT JSONLD_FRAME_PRETTY JSONLD_FRAME_FLAT The default registration for JSONLD is JSONLD_PRETTY. JSONLD_PRETTY is identical to JSONLD_COMPACT_PRETTY.\nOutput can be customized, passing more info to the writer by using the \u0026ldquo;Context\u0026rdquo; mechanism provided by Jena. The same mechanism is used to pass the \u0026ldquo;frame\u0026rdquo; in the JSONLD_FRAME_PRETTY and JSONLD_FRAME_FLAT cases.\nWhat can be done, and how it can be, is explained in the sample code.\nRDF Binary This is a binary encoding using Apache Thrift or Google Protocol Buffers for RDF Graphs and RDF Datasets, as well as SPARQL Result Sets, and it provides faster parsing compared to the text-based standardised syntax such as N-triples, Turtle or RDF/XML.\nRDFFormat RDF_THRIFT RDF_THRIFT_VALUES RDF_PROTO RDF_PROTO_VALUES RDF_THRIFT_VALUES and RDF_PROTO_VALUES are variants where numeric values are written as values, not as lexical format and datatype. See the description of RDF Binary. for discussion.\nRDF/XML RIOT supports output in RDF/XML. 
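The two variants can be selected explicitly with RDFFormat constants. A small sketch, assuming an existing Model in the variable model:
// Sketch: choosing the RDF/XML variant explicitly; "model" is assumed to be
// an existing org.apache.jena.rdf.model.Model.
RDFDataMgr.write(System.out, model, RDFFormat.RDFXML) ; // pretty printed (RDF/XML-ABBREV)
RDFDataMgr.write(System.out, model, RDFFormat.RDFXML_PLAIN) ; // plain, streaming output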
RIOT RDFFormats defaults to pretty printed RDF/XML, while the jena writer name defaults to a streaming plain output.\nRDFFormat Other names Jena writer name RDFXML RDFXML_PRETTY, RDFXML_ABBREV \u0026ldquo;RDF/XML-ABBREV\u0026rdquo; RDFXML_PLAIN \u0026ldquo;RDF/XML\u0026rdquo; More details RDF/XML Output.\nExamples Example code may be found in jena-examples:arq/examples.\nWays to write a model The following are different ways to write a model in Turtle:\nModel model = ... ; // Write a model in Turtle syntax, default style (pretty printed) RDFDataMgr.write(System.out, model, Lang.TURTLE) ; // Write Turtle using the blocks variant RDFDataMgr.write(System.out, model, RDFFormat.TURTLE_BLOCKS) ; // Write as Turtle via model.write model.write(System.out, \u0026quot;TTL\u0026quot;) ; Ways to write a dataset The preferred style is to use RDFDataMgr:\nDataset ds = ... ; // Write as TriG RDFDataMgr.write(System.out, ds, Lang.TRIG) ; // Write as N-Quads RDFDataMgr.write(System.out, ds, Lang.NQUADS) ; Additionally, a single model can be written in a dataset format - it becomes the default graph of the dataset.\nModel m = ... ; RDFDataMgr.write(System.out, m, Lang.TRIG) ; might give:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX dc: \u0026lt;http://purl.org/dc/elements/1.1/\u0026gt; PREFIX foaf: \u0026lt;http://xmlns.com/foaf/0.1/\u0026gt; PREFIX rdf: \u0026lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#\u0026gt; { :book dc:author ( :a :b ) . :a a foaf:Person ; foaf:knows [ foaf:name \u0026quot;Bob\u0026quot; ] ; foaf:name \u0026quot;Alice\u0026quot; . :b foaf:knows :a . } Adding a new output format A complete example of adding a new output format is given in the example file: RIOT Output example 7.\nNotes Using OutputStreams is strongly encouraged. This allows the writers to manage the character encoding using UTF-8. Using java.io.Writer does not allow this; on platforms such as MS Windows, the default configuration of a Writer is not suitable for Turtle because the character set is the platform default, and not UTF-8. The only case where a Writer is generally useful is java.io.StringWriter.\n","permalink":"https://jena.apache.org/documentation/io/rdf-output.html","tags":null,"title":"Writing RDF in Apache Jena"}]