[{"categories":null,"contents":"This page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. So you\u0026rsquo;ve got Eyeball installed and you\u0026rsquo;ve run it on one of your files, and Eyeball doesn\u0026rsquo;t like it. You\u0026rsquo;re not sure why, or what to do about it. Here\u0026rsquo;s what\u0026rsquo;s going on.\nEyeball inspects your model against a set of schemas. The default set of schemas includes RDF, RDFS, the XSD datatypes, and any models your model imports: you can add additional schemas from the command line or configuration file. Eyeball uses those schemas to work out what URIs count as \u0026ldquo;declared\u0026rdquo; in advance. It also checks URIs and literals for syntactic correctness and name space prefixes for being \u0026ldquo;sensible\u0026rdquo;. Let\u0026rsquo;s look at some of the messages you can get.\nUnknown predicate reports You\u0026rsquo;ll probably find several messages like this: predicate not declared in any schema: somePredicateURI\nEyeball treats the imported models, and (independently) the specified schemas, as single OntModels, and extracts those OntModels\u0026rsquo; properties. It includes the RDF and RDFS schemas. Anything used as a predicate that isn\u0026rsquo;t one of those properties is reported.\nIf you\u0026rsquo;re using OWL, you can silence the \u0026ldquo;undeclared property\u0026rdquo; messages about OWL properties by adding to your Eyeball command line the option: -assume owl\nEyeball will read the OWL schema (it has a copy stashed away in the mirror directory) and add the declared properties to its known list. This works for any filename or URL you like, so long as there\u0026rsquo;s RDF there and it has a suitable file suffix - .n3 for N3 or .rdf or .owl for RDF/XML - and for the built-in names dc (basic Dublin Core), dcterms (Dublin Core terms) and dc-all (both). 
So you can construct your own schemas, which contain your own domain-specific property declarations, and invoke Eyeball with\n-assume owl *mySchemaFile.n3* *otherSchemaFile.rdf* You can give short names (like dc and rdfs) to your own schemas, or collections of schemas, using an Eyeball config file, but you\u0026rsquo;ll have to see the manual to find out how.\nUnknown class reports You may see messages like this:\nclass not declared in any schema: someClassURI Having read the previous section, you can probably work out what\u0026rsquo;s going on: Eyeball reads the schemas (and imports) and extracts the declared OntClasses. Then anything used as a class that isn\u0026rsquo;t one of those declared classes is reported.\nAnd that\u0026rsquo;s exactly it. \u0026ldquo;Used as a class\u0026rdquo; means appearing as C or D in any statement of the form:\n\\_ rdf:type C \\_ rdfs:domain C \\_ rdfs:range C C rdfs:subClassOf D Suppressing inspectors It may be that you\u0026rsquo;re not interested in the \u0026ldquo;unknown predicate\u0026rdquo; or \u0026ldquo;unknown class\u0026rdquo; reports until you\u0026rsquo;ve sorted out the URIs. Or maybe you don\u0026rsquo;t care about them. In that case, you can switch them off.\nEyeball\u0026rsquo;s different checks are carried out by inspector classes. These classes are given short names by entries in Eyeball config files (which are RDF files written using N3; you can see the default config file by looking in Eyeball\u0026rsquo;s etc directory for eyeball2-config.n3). By adding e.g.:\n-exclude property class to the Eyeball command line, you can exclude the inspectors with those short names from the check. property is the short name for the \u0026ldquo;unknown property\u0026rdquo; inspector, and class is the short name for the \u0026ldquo;unknown class\u0026rdquo; inspector.\nNamespace and URI reports Eyeball checks all the URIs in the model, including (if available) those used for namespaces. (And literals, but see below.) 
Here\u0026rsquo;s an example:\nbad namespace URI: \u0026quot;file:some-filename\u0026quot; on prefix: \u0026quot;pqr\u0026quot; for reason: file URI inappropriate for namespace A \u0026ldquo;bad namespace URI\u0026rdquo; means that Eyeball doesn\u0026rsquo;t like the URI for a namespace in the model. The \u0026ldquo;on prefix\u0026rdquo; part of the report says what the namespace prefix is, and the \u0026ldquo;for reason\u0026rdquo; part gives the reason. In this case, we (the designer of Eyeball) feel that it is unwise to use file URIs - which tend to depend on internal details of your directory structure - for global concepts. A more usual reason is that the URI is syntactically illegal. Here are some possibilities:\nURI contains spaces: literal spaces are not legal in URIs. This usually arises from file URIs when the file has a space in its name. Spaces in URIs have to be encoded.\nURI has no scheme: the URI has no scheme at all. This usually happens when some relative URI hasn\u0026rsquo;t been resolved properly, e.g. there\u0026rsquo;s no xml:base in an RDF/XML document.\nURI has an unrecognised scheme: the scheme part of the URI - the bit before the first colon - isn\u0026rsquo;t recognised. Eyeball knows, by default, four schemes: http, ftp, file, and urn. This usually arises when a QName has \u0026ldquo;escaped\u0026rdquo; from somewhere, or from a typo. You can tell Eyeball about other schemes, if you need them.\nscheme should be lower-case: the scheme part of the URI contains uppercase letters. While this is not actually wrong, it is unconventional and pointless.\nURI doesn\u0026rsquo;t fit pattern: Eyeball has some (weak) checks on the syntax of URIs in different schemes, expressed as patterns in its config files. If a URI doesn\u0026rsquo;t match the pattern, Eyeball reports this problem. At the moment, you\u0026rsquo;ll only get this report for a urn URI like urn:x-hp:23487682347 where the URN id (the bit between the first and second colons, here x-hp) is illegal.\nURI syntax error: a catch-all error: Java couldn\u0026rsquo;t make any sense of this URI at all.\nProblems with literals Eyeball checks literals (using the literal inspector, whose short name is literal if you want to switch it off), but the checking is quite weak because it doesn\u0026rsquo;t understand types at the moment. You can get two different classes of error.\nbad language: someLanguageCode on literal: theLiteralInQuestion Literals with language codes (things like en-UK or de) are checked to make sure that the language code conforms to the general syntax for language codes: alphanumeric words separated by hyphens, with the first containing no digits.\n(Later versions of Eyeball will likely allow you to specify which language codes you want to permit in your models. But we haven\u0026rsquo;t got there yet.)\nbad datatype URI: someURI on literal: theLiteralInQuestion for reason: theReason Similarly, literals with datatypes are checked to make sure that the type URI is legal. That\u0026rsquo;s it for the moment: Eyeball doesn\u0026rsquo;t try to find out if the URI really is a type URI, or if the spelling of the literal is OK for that type. But it spots the bad URIs. (The messages are the same as those that appear in the URI checking - above - for the very good reason that it\u0026rsquo;s the same code doing the checking.)\nProblematic prefixes Both RDF/XML and N3 allow (and RDF/XML requires) namespaces to be abbreviated by prefixes. Eyeball checks prefixes for two possible problems. The first:\nnon-standard namespace for prefix This arises when a \u0026ldquo;standard\u0026rdquo; prefix has been bound to a namespace URI which isn\u0026rsquo;t its usual one. 
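As an aside, some of the URI and literal checks described above can be roughly mimicked with nothing but the JDK. This is only an illustrative sketch under stated assumptions - the class name, method names and messages below are invented, and Eyeball's real checks are pattern-driven, configurable and stricter - but it shows the flavour of the diagnoses:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

public class RoughChecks {
    // "alphanumeric words separated by hyphens, with the first containing no digits"
    static final Pattern LANG = Pattern.compile("[a-zA-Z]+(-[a-zA-Z0-9]+)*");

    // Classify a URI string into a few of the problem categories listed above.
    static String diagnoseUri(String s) {
        try {
            URI u = new URI(s);
            if (u.getScheme() == null) return "URI has no scheme";
            if (!u.getScheme().equals(u.getScheme().toLowerCase()))
                return "scheme should be lower-case";
            return "ok";
        } catch (URISyntaxException e) {
            return "URI syntax error"; // the catch-all: literal spaces and the like
        }
    }

    // Does a language code fit the general syntax quoted above?
    static boolean sensibleLanguageCode(String tag) {
        return LANG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        System.out.println(diagnoseUri("file:my file.rdf"));    // space -> syntax error
        System.out.println(diagnoseUri("some-filename"));       // no scheme
        System.out.println(diagnoseUri("HTTP://example.org/")); // upper-case scheme
        System.out.println(sensibleLanguageCode("en-UK"));      // true
        System.out.println(sensibleLanguageCode("12-en"));      // false: first word has digits
    }
}
```

Note that java.net.URI accepts some URIs Eyeball would still complain about (for example, a syntactically legal file URI used as a namespace), so this is at best a first approximation.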
The \u0026ldquo;standard\u0026rdquo; prefixes are taken from Jena\u0026rsquo;s PrefixMapping.Extended and are currently:\n**rdf, rdfs, owl, xsd, rss, vcard** And the second:\nJena generated prefix found This arises when the model contains prefixes of the form j.N, where N is a number. These are generated by Jena when writing RDF/XML for URIs that must have a prefix (because they are used as types or predicates) but haven\u0026rsquo;t been given one.\nIf you\u0026rsquo;re not bothered about inventing short prefixes for your namespaces, you can -exclude jena-prefix to suppress this inspection.\nBut how do I \u0026hellip; The reports described so far are part of Eyeball\u0026rsquo;s default set of inspections. There are some other checks that it can do that are switched off by default, because they are expensive, initially overwhelming, or downright obscure. If you need to add these checks to your eyeballing, this is how to do it.\n\u0026hellip; make sure everything is typed? Some applications (or a general notion of cleanliness) require that every individual in an RDF model has an explicit rdf:type. The Eyeball check for this isn\u0026rsquo;t enabled by default, because lots of casual RDF use doesn\u0026rsquo;t need it, and more sophisticated use has models with enough inference power to infer types.\nYou can add the all-typed inspector to the inspectors that Eyeball will run by adding to the command line:\n-inspectors defaultInspectors all-typed The all-typed inspector will generate a message\nresource has no rdf:type for each resource in the model which is not the subject of an rdf:type statement.\n\u0026hellip; check for type consistency? One easy mistake to make in RDF is to make an assertion - we\u0026rsquo;ll call it S P O - about some subject S which is \u0026ldquo;of the wrong type\u0026rdquo;, that is, not of whatever type P\u0026rsquo;s domain is. 
This isn\u0026rsquo;t, in principle, an error, since RDF resources can have multiple types, and this just makes S have a type which is a subtype of both P\u0026rsquo;s domain and whatever type it was supposed to have.\nTo spot this, and related problems, Eyeball has the consistent-type inspector. You can add it to the inspections in the same way as the all-typed inspector:\n-inspectors defaultInspectors consistent-type It checks that every resource which has been given at least one type has a type which is a subtype of all its types, under an additional assumption:\nTypes in the type graph (the network of rdfs:subClassOf statements) are disjoint (share no instances) unless the type graph says they're not. For example, suppose that both A and B are subclasses of Top, and that there are no other subclass relationships. Then consistent-type assumes that there are (supposed to be) no resources which have both A and B as types. If it finds a resource X which does have both types, it generates a message like this:\nno consistent type for: X has associated type: A has associated type: B has associated type: Top It\u0026rsquo;s up to you to disentangle the types and work out what went wrong.\nNote: this test requires that Eyeball do a significant amount of inference, to complete the type hierarchy and check the domains and ranges of properties. It\u0026rsquo;s quite slow, which is one reason it isn\u0026rsquo;t switched on by default.\n\u0026hellip; check the right number of values for a property? You want to make sure that your data has the right properties for things of a certain type: say, that a book has at least one author (or editor), an album has at least one track, nobody in your organisation has more than ten managers, a Jena contrib has at least a dc:creator, a dc:name, and a dc:description. 
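The disjointness assumption behind the consistent-type check described above can be sketched with plain sets and maps. This is a toy illustration, not Eyeball's implementation - the real inspector works over RDFS inference, and all the names below are made up:

```java
import java.util.*;

public class TypeConsistency {
    // A resource's declared types are taken to be consistent if some known
    // type is a subtype of (or equal to) all of them.
    static final Map<String, Set<String>> supers = new HashMap<>();

    static void subClassOf(String sub, String sup) {
        supers.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    // t itself plus all its (transitive) superclasses
    static Set<String> allSupers(String t) {
        Set<String> out = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(t);
        while (!todo.isEmpty()) {
            String s = todo.pop();
            if (out.add(s))
                todo.addAll(supers.getOrDefault(s, Collections.emptySet()));
        }
        return out;
    }

    // Is there a known type below every declared type of the resource?
    static boolean consistent(Set<String> declaredTypes, Set<String> knownTypes) {
        for (String t : knownTypes)
            if (allSupers(t).containsAll(declaredTypes)) return true;
        return false;
    }

    public static void main(String[] args) {
        subClassOf("A", "Top");
        subClassOf("B", "Top");
        Set<String> known = new HashSet<>(Arrays.asList("A", "B", "Top"));
        // X has types A and B; no known type sits below both, so X is reported
        System.out.println(consistent(new HashSet<>(Arrays.asList("A", "B")), known));
        System.out.println(consistent(new HashSet<>(Arrays.asList("A", "Top")), known));
    }
}
```

Here consistent(...) only looks for a witness among the types it already knows about, which is exactly the closed-world flavour of the assumption quoted above: classes are disjoint unless the type graph says otherwise.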
You write some OWL cardinality constraints:\nmy:Type rdfs:subClassOf [owl:onProperty my:track; owl:minCardinality 1] Then you discover that, for wildly technical reasons, the OWL validation code in Jena doesn\u0026rsquo;t think it\u0026rsquo;s an error for some album to have no tracks (maybe there\u0026rsquo;s a namespace error). You can enable Eyeball\u0026rsquo;s cardinality inspector by adding\n-inspectors cardinality to the command line. You\u0026rsquo;ll now get a report item for every resource that has rdf:type your restricted type (my:Type above) but doesn\u0026rsquo;t have the right (at least one) value for the property. It will look something like:\ncardinality failure for: my:Instance on type: my:Type on property: my:track cardinality range: [min: 1] number of values: 0 values: {} If there are some values for the property - say you\u0026rsquo;ve supplied an owl:maxCardinality restriction and then gone over the top - they get listed inside the values curly braces.\n","permalink":"","tags":null,"title":"A brief guide to Jena Eyeball"},{"categories":null,"contents":"Overview The goal of this document is to add Jena Permissions to a fuseki deployment to restrict access to graph data. This example will take the example application, deploy the data to a fuseki instance and add the Jena Permissions to achieve the same access restrictions that the example application has.\nTo do this you will need a Fuseki installation, the Permissions Packages and a SecurityEvaluator implementation. For this example we will use the SecurityEvaluator from the permissions-example.\nSet up This example uses Fuseki 2.3.0 or higher, Permissions 3.1.0 or higher and Apache Commons Collections v4.\nFuseki can be downloaded from:\nJena Permissions jars can be downloaded from:\nDownload and unpack Fuseki. 
The directory that you unpack Fuseki into will be referred to as the Fuseki Home directory for the remainder of this document.\nDownload the permissions jar and the associated permissions-example jar.\nCopy the permissions jar and the permissions-example jar into the Fuseki Home directory. For the rest of this document the permissions jar will be referred to as permissions.jar and the permissions-example.jar as example.jar.\nDownload the Apache Commons Collections v4 archive and extract the commons-collections*.jar into the Fuseki Home directory.\nAdd the security jars to the startup script/batch file.\nOn *NIX edit the fuseki-server script:\nComment out the line that reads exec java $JVM_ARGS -jar \u0026quot;$JAR\u0026quot; \u0026quot;$@\u0026quot; Uncomment the line that reads ## APPJAR=MyCode.jar Uncomment the line that reads ## java $JVM_ARGS -cp \u0026quot;$JAR:$APPJAR\u0026quot; org.apache.jena.fuseki.cmd.FusekiCmd \u0026quot;$@\u0026quot; Change MyCode.jar to permissions.jar:example.jar:commons-collections*.jar On Windows edit the fuseki-server.bat file:\nComment out the line that reads java -Xmx1200M -jar fuseki-server.jar %* Uncomment the line that reads @REM java ... -cp fuseki-server.jar;MyCustomCode.jar org.apache.jena.fuseki.cmd.FusekiCmd %* Change MyCustomCode.jar to permissions.jar;example.jar;commons-collections*.jar Run the fuseki-server script or batch file.\nStop the server.\nExtract the example configuration into the newly created Fuseki Home/run directory. 
From the example.jar archive:\nextract /org/apache/jena/permissions/example/example.ttl into the Fuseki Home/run directory\nextract /org/apache/jena/permissions/example/fuseki/config.ttl into the Fuseki Home/run directory\nextract /org/apache/jena/permissions/example/fuseki/shiro.ini into the Fuseki Home/run directory\nRun fuseki-server --config=run/config.ttl or fuseki-server.bat --config=run/config.ttl\nReview of configuration At this point the system is configured with the following logins:\nLogin password Access\nadmin admin Everything\nalice alice Only messages to or from alice\nbob bob Only messages to or from bob\nchuck chuck Only messages to or from chuck\ndarla darla Only messages to or from darla\nThe messages graph is defined in the run/example.ttl file.\nThe run/shiro.ini file lists the users and their passwords and configures Fuseki to require authentication to access the graphs.\nThe run/config.ttl file adds the permissions to the graph by applying the org.apache.jena.permissions.example.ShiroExampleEvaluator security evaluator to the message graph, as follows.\nDefine all the prefixes.\n@prefix fuseki: \u0026lt;\u0026gt; . @prefix tdb: \u0026lt;\u0026gt; . @prefix rdf: \u0026lt;\u0026gt; . @prefix rdfs: \u0026lt;\u0026gt; . @prefix ja: \u0026lt;\u0026gt; . @prefix perm: \u0026lt;\u0026gt; . @prefix my: \u0026lt;\u0026gt; . Load the SecuredAssembler class from the permissions library and define perm:Model as a subclass of ja:NamedModel.\n[] ja:loadClass \u0026quot;org.apache.jena.permissions.SecuredAssembler\u0026quot; . perm:Model rdfs:subClassOf ja:NamedModel . Define the base model that contains the unsecured data. This can be any model type. For our example we use an in-memory model that reads the example.ttl file.\nmy:baseModel rdf:type ja:MemoryModel; ja:content [ja:externalContent \u0026lt;file:./example.ttl\u0026gt;] . Define the secured model. This is where permissions are applied to my:baseModel to create a model that has permission restrictions. 
Note that it is using the security evaluator implementation (perm:evaluatorImpl) called my:secEvaluator, which we will define next.\nmy:securedModel rdf:type perm:Model ; perm:baseModel my:baseModel ; ja:modelName \u0026quot;\u0026quot; ; perm:evaluatorImpl my:secEvaluator . Define the security evaluator. This is where we use the example ShiroExampleEvaluator. For your production environment you will replace \u0026ldquo;\u0026rdquo; with your SecurityEvaluator implementation. Note that the ShiroExampleEvaluator constructor takes a Model argument. We pass in the unsecured baseModel so that the evaluator can read it unencumbered. Your implementation of SecurityEvaluator may have different parameters to meet your specific needs.\nmy:secEvaluator rdf:type perm:Evaluator ; perm:args [ rdf:_1 my:baseModel ; ] ; perm:evaluatorClass \u0026quot;org.apache.jena.permissions.example.ShiroExampleEvaluator\u0026quot; . Define the dataset that we will use in the server. Note that in the example the dataset contains only the single secured model; adding multiple models and mixing secured and unsecured models is supported.\nmy:securedDataset rdf:type ja:RDFDataset ; ja:defaultGraph my:securedModel . Define the fuseki:Server.\nmy:fuseki rdf:type fuseki:Server ; fuseki:services ( my:service1 ) . Define the service for the fuseki:Server. 
Note that the fuseki:dataset served by this service is the secured dataset defined above.\nmy:service1 rdf:type fuseki:Service ; rdfs:label \u0026quot;My Secured Data Service\u0026quot; ; fuseki:name \u0026quot;myAppFuseki\u0026quot; ; # http://host:port/myAppFuseki fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL update service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read and write) # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset my:securedDataset ; . Review of ShiroExampleEvaluator The ShiroExampleEvaluator uses triple-level permissions to limit access to the \u0026ldquo;messages\u0026rdquo; in the graph to only those people the message is addressed to or from. It is connected to the Shiro system by the getPrincipal() implementation, where it simply calls the Shiro SecurityUtils.getSubject() method to return the current Shiro user.\n/** * Return the Shiro subject. This is the subject that Shiro currently has logged in. */ @Override public Object getPrincipal() { return SecurityUtils.getSubject(); } This example allows any action on a graph, as is seen in the evaluate(Object principal, Action action, Node graphIRI) and evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) methods. This is the first permissions check. If you wish to restrict users from specific graphs, this method should be recoded to perform the check.\n/** * We allow any action on the graph itself, so this is always true. */ @Override public boolean evaluate(Object principal, Action action, Node graphIRI) { // we allow any action on a graph. return true; } /** * As per our design, users can access any graph. 
If we were to implement rules that * restricted user access to specific graphs, those checks would be here and we would * return \u0026lt;code\u0026gt;false\u0026lt;/code\u0026gt; if they were not allowed to access the graph. Note that this * method is checking to see that the user may perform ANY of the actions in the set on the * graph. */ @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } The other overridden methods are implemented using one of three (3) private methods that evaluate whether the user should have access to the data based on our security design. To implement your security design you should understand what each of the methods checks. See the SecurityEvaluator javadocs and SecurityEvaluator implementation notes.\n","permalink":"","tags":null,"title":"Adding Jena Permissions to Fuseki"},{"categories":null,"contents":"Another way of dealing with semi-structured data is to query one of a number of possibilities. This section covers the UNION pattern, where each of a number of possibilities is tried.\nUNION - two ways to the same data Both the vcard and FOAF vocabularies have properties for people\u0026rsquo;s names. In vcard it is vcard:FN, the formatted name, and in FOAF it is foaf:name. In this section we will look at a small set of data in which people\u0026rsquo;s names may be given using either the FOAF or the vcard vocabulary.\nSuppose we have an RDF graph that contains name information using both the vcard and FOAF vocabularies.\n@prefix foaf: \u0026lt;\u0026gt; . @prefix vcard: \u0026lt;\u0026gt; . _:a foaf:name \u0026quot;Matt Jones\u0026quot; . _:b foaf:name \u0026quot;Sarah Jones\u0026quot; . _:c vcard:FN \u0026quot;Becky Smith\u0026quot; . _:d vcard:FN \u0026quot;John Smith\u0026quot; . 
A query to access the name information could be (q-union1.rq):\nPREFIX foaf: \u0026lt;\u0026gt; PREFIX vCard: \u0026lt;\u0026gt; SELECT ?name WHERE { { [] foaf:name ?name } UNION { [] vCard:FN ?name } } This returns:\n----------------- | name | ================= | \u0026quot;Matt Jones\u0026quot; | | \u0026quot;Sarah Jones\u0026quot; | | \u0026quot;Becky Smith\u0026quot; | | \u0026quot;John Smith\u0026quot; | ----------------- Whichever form of expression was used for the name, the ?name variable is bound. The same effect can be achieved using a FILTER, as this query shows (q-union-1alt.rq):\nPREFIX foaf: \u0026lt;\u0026gt; PREFIX vCard: \u0026lt;\u0026gt; SELECT ?name WHERE { [] ?p ?name FILTER ( ?p = foaf:name || ?p = vCard:FN ) } testing whether the property is one URI or the other. The solutions may not come out in the same order. The first form is likely to be the faster one, depending on the data and the storage used, because the second form has to fetch all the triples in the graph to match the triple pattern with unbound variables (or blank nodes) in each slot, and then test each ?p to see whether it matches one of the values. Whether the query is executed more efficiently, and pushed down into the storage layer, will depend on the sophistication of the query optimizer.\nUnion - remembering where the data was found The example above used the same variable in each branch. 
If different variables are used, the application can discover which sub-pattern caused the match (q-union2.rq):\nPREFIX foaf: \u0026lt;\u0026gt; PREFIX vCard: \u0026lt;\u0026gt; SELECT ?name1 ?name2 WHERE { { [] foaf:name ?name1 } UNION { [] vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026quot;Matt Jones\u0026quot; | | | \u0026quot;Sarah Jones\u0026quot; | | | | \u0026quot;Becky Smith\u0026quot; | | | \u0026quot;John Smith\u0026quot; | --------------------------------- This second query has retained information about where the person\u0026rsquo;s name came from by binding the name to different variables.\nOPTIONAL and UNION In practice, OPTIONAL is more common than UNION, but both have their uses. OPTIONAL is useful for augmenting the solutions found; UNION is useful for concatenating solutions from different possibilities. They do not necessarily return the information in the same way.\nQuery (q-union3.rq):\nPREFIX foaf: \u0026lt;\u0026gt; PREFIX vCard: \u0026lt;\u0026gt; SELECT ?name1 ?name2 WHERE { ?x a foaf:Person OPTIONAL { ?x foaf:name ?name1 } OPTIONAL { ?x vCard:FN ?name2 } } --------------------------------- | name1 | name2 | ================================= | \u0026quot;Matt Jones\u0026quot; | | | \u0026quot;Sarah Jones\u0026quot; | | | | \u0026quot;Becky Smith\u0026quot; | | | \u0026quot;John Smith\u0026quot; | --------------------------------- But be careful about using ?name in each OPTIONAL, because that is an order-dependent query.\nNext: Named Graphs\n","permalink":"","tags":null,"title":"Alternatives in a pattern"},{"categories":null,"contents":"Preface This is a tutorial introduction to both W3C\u0026rsquo;s Resource Description Framework (RDF) and Jena, a Java API for RDF. It is written for the programmer who is unfamiliar with RDF and who learns best by prototyping, or, for other reasons, wishes to move quickly to implementation. 
Some familiarity with both XML and Java is assumed.\nImplementing too quickly, without first understanding the RDF data model, leads to frustration and disappointment. Yet studying the data model alone is dry stuff and often leads to tortuous metaphysical conundrums. It is better to approach understanding both the data model and how to use it in parallel. Learn a bit of the data model and try it out. Then learn a bit more and try that out. Then the theory informs the practice and the practice the theory. The data model is quite simple, so this approach does not take long.\nRDF has an XML syntax and many who are familiar with XML will think of RDF in terms of that syntax. This is a mistake. RDF should be understood in terms of its data model. RDF data can be represented in XML, but understanding the syntax is secondary to understanding the data model.\nAn implementation of the Jena API, including the working source code for all the examples used in this tutorial, can be downloaded from\nIntroduction The Resource Description Framework (RDF) is a standard (technically a W3C Recommendation) for describing resources. What is a resource? That is rather a deep question and the precise definition is still the subject of debate. For our purposes we can think of it as anything we can identify. You are a resource, as is your home page, this tutorial, the number one and the great white whale in Moby Dick.\nOur examples in this tutorial will be about people. They use an RDF representation of VCARDS. RDF is best thought of in the form of node and arc diagrams. A simple vcard might look like this in RDF:\nThe resource, John Smith, is shown as an ellipse and is identified by a Uniform Resource Identifier (URI)1, in this case \u0026quot;http://.../JohnSmith\u0026quot;. If you try to access that resource using your browser, you are unlikely to be successful; April the first jokes notwithstanding, you would be rather surprised if your browser were able to deliver John Smith to your desktop. 
If you are unfamiliar with URI's, think of them simply as rather strange looking names.\nResources have properties. In these examples we are interested in the sort of properties that would appear on John Smith's business card. Figure 1 shows only one property, John Smith's full name. A property is represented by an arc, labeled with the name of a property. The name of a property is also a URI, but as URI's are rather long and cumbersome, the diagram shows it in XML qname form. The part before the ':' is called a namespace prefix and represents a namespace. The part after the ':' is called a local name and represents a name in that namespace. Properties are usually represented in this qname form when written as RDF XML and it is a convenient shorthand for representing them in diagrams and in text. Strictly, however, properties are identified by a URI. The nsprefix:localname form is a shorthand for the URI of the namespace concatenated with the localname. There is no requirement that the URI of a property resolve to anything when accessed by a browser.\nEach property has a value. In this case the value is a literal, which for now we can think of as strings of characters2. Literals are shown in rectangles.\nJena is a Java API which can be used to create and manipulate RDF graphs like this one. Jena has object classes to represent graphs, resources, properties and literals. The interfaces representing resources, properties and literals are called Resource, Property and Literal respectively. 
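The qname shorthand just described is purely mechanical, as this tiny sketch shows (the namespace URI below is invented for illustration; real vocabularies each define their own):

```java
public class QNameDemo {
    // The full property URI is simply the namespace URI concatenated
    // with the local name, exactly as described above.
    static String expand(String namespace, String localName) {
        return namespace + localName;
    }

    public static void main(String[] args) {
        String namespace = "http://somewhere/vcard#"; // bound to the prefix "vcard"
        String localName = "FN";                      // qname: vcard:FN
        System.out.println(expand(namespace, localName)); // http://somewhere/vcard#FN
    }
}
```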
In Jena, a graph is called a model and is represented by the Model interface.\nThe code to create this graph, or model, is simple:\n// some definitions static String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; static String fullName = \u0026#34;John Smith\u0026#34;; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource Resource johnSmith = model.createResource(personURI); // add the property johnSmith.addProperty(VCARD.FN, fullName); It begins with some constant definitions and then creates an empty Model or model, using the ModelFactory method createDefaultModel() to create a memory-based model. Jena contains other implementations of the Model interface, e.g. one which uses a relational database: these types of Model are also available from ModelFactory.\nThe John Smith resource is then created and a property added to it. The property is provided by a \"constant\" class VCARD which holds objects representing all the definitions in the VCARD schema. Jena provides constant classes for other well known schemas, such as RDF and RDF schema themselves, Dublin Core and OWL.\nThe working code for this example can be found in the /src-examples directory of the Jena distribution as tutorial 1. As an exercise, take this code and modify it to create a simple VCARD for yourself.\nThe code to create the resource and add the property can be more compactly written in a cascading style:\nResource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName); Now let's add some more detail to the vcard, exploring some more features of RDF and Jena.\nIn the first example, the property value was a literal. RDF properties can also take other resources as their value. Using a common RDF technique, this example shows how to represent the different parts of John Smith's name:\nHere we have added a new property, vcard:N, to represent the structure of John Smith's name. 
There are several things of interest about this Model. Note that the vcard:N property takes a resource as its value. Note also that the ellipse representing the compound name has no URI. It is known as a blank Node.\nThe Jena code to construct this example is again very simple. First some declarations and the creation of the empty model.\n// some definitions String personURI = \u0026#34;http://somewhere/JohnSmith\u0026#34;; String givenName = \u0026#34;John\u0026#34;; String familyName = \u0026#34;Smith\u0026#34;; String fullName = givenName + \u0026#34; \u0026#34; + familyName; // create an empty Model Model model = ModelFactory.createDefaultModel(); // create the resource // and add the properties cascading style Resource johnSmith = model.createResource(personURI) .addProperty(VCARD.FN, fullName) .addProperty(VCARD.N, model.createResource() .addProperty(VCARD.Given, givenName) .addProperty(VCARD.Family, familyName)); The working code for this example can be found as tutorial 2 in the /src-examples directory of the Jena distribution.\nStatements Each arc in an RDF Model is called a statement. Each statement asserts a fact about a resource. A statement has three parts:\nthe subject is the resource from which the arc leaves\nthe predicate is the property that labels the arc\nthe object is the resource or literal pointed to by the arc\nA statement is sometimes called a triple, because of its three parts.\nAn RDF Model is represented as a set of statements. Each call of addProperty in tutorial2 added another statement to the Model. (Because a Model is a set of statements, adding a duplicate of a statement has no effect.) The Jena model interface defines a listStatements() method which returns a StmtIterator, a subtype of Java's Iterator over all the statements in a Model. StmtIterator has a method nextStatement() which returns the next statement from the iterator (the same one that next() would deliver, already cast to Statement). 
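The point that a Model is a set of statements can be sketched in plain Java. This is only an illustration of the idea, not Jena's own classes (Jena's Statement and Model interfaces are richer):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class StatementSetDemo {
    // A statement modelled as (subject, predicate, object).
    static final class Stmt {
        final String subject, predicate, object;
        Stmt(String subject, String predicate, String object) {
            this.subject = subject; this.predicate = predicate; this.object = object;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Stmt)) return false;
            Stmt s = (Stmt) o;
            return subject.equals(s.subject) && predicate.equals(s.predicate)
                && object.equals(s.object);
        }
        @Override public int hashCode() { return Objects.hash(subject, predicate, object); }
    }

    public static void main(String[] args) {
        // A "model" as a set of statements: adding a duplicate has no effect.
        Set<Stmt> model = new HashSet<>();
        model.add(new Stmt("http://somewhere/JohnSmith", "vcard:FN", "John Smith"));
        model.add(new Stmt("http://somewhere/JohnSmith", "vcard:FN", "John Smith"));
        System.out.println(model.size()); // 1: the duplicate is absorbed
    }
}
```

Jena's Model behaves the same way: asserting the same triple twice leaves the model unchanged.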
The Statement interface provides accessor methods to the subject, predicate and object of a statement.
Now we will use that interface to extend tutorial 2 to list all the statements created and print them out. The complete code for this can be found in tutorial 3.

// list the statements in the Model
StmtIterator iter = model.listStatements();

// print out the predicate, subject and object of each statement
while (iter.hasNext()) {
    Statement stmt      = iter.nextStatement();  // get next statement
    Resource  subject   = stmt.getSubject();     // get the subject
    Property  predicate = stmt.getPredicate();   // get the predicate
    RDFNode   object    = stmt.getObject();      // get the object

    System.out.print(subject.toString());
    System.out.print(" " + predicate.toString() + " ");
    if (object instanceof Resource) {
        System.out.print(object.toString());
    } else {
        // object is a literal
        System.out.print(" \"" + object.toString() + "\"");
    }
    System.out.println(" .");
}

Since the object of a statement can be either a resource or a literal, the getObject() method returns an object typed as RDFNode, which is a common superclass of both Resource and Literal. The underlying object is of the appropriate type, so the code uses instanceof to determine which and processes it accordingly.
When run, this program should produce output resembling:

http://somewhere/JohnSmith  413f6415-c3b0-4259-b74d-4bd6e757eb60 .
413f6415-c3b0-4259-b74d-4bd6e757eb60  "Smith" .
413f6415-c3b0-4259-b74d-4bd6e757eb60  "John" .
http://somewhere/JohnSmith  "John Smith" .

Now you know why it is clearer to draw Models. If you look carefully, you will see that each line consists of three fields representing the subject, predicate and object of each statement. There are four arcs in the Model, so there are four statements.
The "413f6415-c3b0-4259-b74d-4bd6e757eb60" is an internal identifier generated by Jena. It is not a URI and should not be confused with one. It is simply an internal label used by the Jena implementation.
The W3C RDFCore Working Group defined a similar simple notation called N-Triples. The name means "triple notation". We will see in the next section that Jena has an N-Triples writer built in.
Writing RDF
Jena has methods for reading and writing RDF as XML. These can be used to save an RDF model to a file and later read it back in again.
Tutorial 3 created a model and wrote it out in triple form. Tutorial 4 modifies tutorial 3 to write the model in RDF/XML form to the standard output stream. The code, again, is very simple: model.write can take an OutputStream argument.

// now write the model in XML form to a file
model.write(System.out);

The output should look something like this:

<rdf:RDF xmlns:rdf='' xmlns:vcard='' >
  <rdf:Description rdf:about='http://somewhere/JohnSmith'>
    <vcard:FN>John Smith</vcard:FN>
    <vcard:N rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <vcard:Given>John</vcard:Given>
    <vcard:Family>Smith</vcard:Family>
  </rdf:Description>
</rdf:RDF>

The RDF specifications specify how to represent RDF as XML. The RDF/XML syntax is quite complex. The reader is referred to the primer being developed by the RDFCore WG for a more detailed introduction. However, let's take a quick look at how to interpret the above.
RDF is usually embedded in an <rdf:RDF> element. The element is optional if there are other ways of knowing that some XML is RDF, but it is usually present.
The RDF element defines the two namespaces used in the document. There is then an <rdf:Description> element which describes the resource whose URI is "http://somewhere/JohnSmith". If the rdf:about attribute were missing, this element would represent a blank node.
The <vcard:FN> element describes a property of the resource. The property name is the "FN" in the vcard namespace. RDF converts this to a URI reference by concatenating the URI reference for the namespace prefix and "FN", the local name part of the name. This gives a URI reference of "". The value of the property is the literal "John Smith".
The <vcard:N> element is a resource. In this case the resource is represented by a relative URI reference. RDF converts this to an absolute URI reference by concatenating it with the base URI of the current document.
There is an error in this RDF/XML; it does not exactly represent the Model we created. The blank node in the Model has been given a URI reference. It is no longer blank. The RDF/XML syntax is not capable of representing all RDF Models; for example it cannot represent a blank node which is the object of two statements. The 'dumb' writer we used to write this RDF/XML makes no attempt to write correctly the subset of Models which can be written correctly. It gives a URI to each blank node, making it no longer blank.
Jena has an extensible interface which allows new writers for different RDF serialization languages to be easily plugged in. The above call invoked the standard 'dumb' writer. Jena also includes a more sophisticated RDF/XML writer which can be invoked using the RDFDataMgr.write function:

// now write the model in a pretty form
RDFDataMgr.write(System.out, model, Lang.RDFXML);

This writer, the so-called PrettyWriter, takes advantage of features of the RDF/XML abbreviated syntax to write a Model more compactly. It is also able to preserve blank nodes where that is possible.
It is, however, not suitable for writing very large Models, as its performance is unlikely to be acceptable. To write large files and preserve blank nodes, write in N-Triples format:

// now write the model in N-TRIPLES form
RDFDataMgr.write(System.out, model, Lang.NTRIPLES);

This will produce output similar to that of tutorial 3 which conforms to the N-Triples specification.
Reading RDF
Tutorial 5 demonstrates reading the statements recorded in RDF/XML form into a model. With this tutorial, we have provided a small database of vcards in RDF/XML form. The following code will read it in and write it out. Note that for this application to run, the input file must be in the current directory.

// create an empty model
Model model = ModelFactory.createDefaultModel();

// use RDFDataMgr to locate the input file
InputStream in = RDFDataMgr.open( inputFileName );
if (in == null) {
    throw new IllegalArgumentException("File: " + inputFileName + " not found");
}

// read the RDF/XML file
model.read(in, null);

// write it to standard out
model.write(System.out);

The second argument to the read() method call is the URI which will be used for resolving relative URIs. As there are no relative URI references in the test file, it is allowed to be empty.
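To illustrate that second, base-URI argument of read(), here is a minimal sketch. The RDF/XML string, the base URI http://somewhere/ and the class name are invented for this example: the document contains the relative reference 'JohnSmith', which read() resolves against the supplied base.

```java
import java.io.StringReader;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class BaseUriSketch {
    public static void main(String[] args) {
        // a tiny RDF/XML document using a relative URI reference (hypothetical data)
        String rdfXml =
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
          + "<rdf:Description rdf:about='JohnSmith'>"
          + "<rdf:value>a vcard</rdf:value>"
          + "</rdf:Description>"
          + "</rdf:RDF>";

        Model model = ModelFactory.createDefaultModel();
        // the second argument is the base URI used to resolve 'JohnSmith'
        model.read(new StringReader(rdfXml), "http://somewhere/");

        // the subject is now the absolute URI http://somewhere/JohnSmith
        model.write(System.out, "N-TRIPLE");
    }
}
```

Changing the base URI changes the subject URI of the resulting statement, which is why a non-empty base matters whenever the input contains relative references.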
When run, tutorial 5 will produce XML output which looks like:

<rdf:RDF xmlns:rdf='' xmlns:vcard='' >
  <rdf:Description rdf:nodeID="A0">
    <vcard:Family>Smith</vcard:Family>
    <vcard:Given>John</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/JohnSmith/'>
    <vcard:FN>John Smith</vcard:FN>
    <vcard:N rdf:nodeID="A0"/>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/SarahJones/'>
    <vcard:FN>Sarah Jones</vcard:FN>
    <vcard:N rdf:nodeID="A1"/>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/MattJones/'>
    <vcard:FN>Matt Jones</vcard:FN>
    <vcard:N rdf:nodeID="A2"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A3">
    <vcard:Family>Smith</vcard:Family>
    <vcard:Given>Rebecca</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A1">
    <vcard:Family>Jones</vcard:Family>
    <vcard:Given>Sarah</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A2">
    <vcard:Family>Jones</vcard:Family>
    <vcard:Given>Matthew</vcard:Given>
  </rdf:Description>
  <rdf:Description rdf:about='http://somewhere/RebeccaSmith/'>
    <vcard:FN>Becky Smith</vcard:FN>
    <vcard:N rdf:nodeID="A3"/>
  </rdf:Description>
</rdf:RDF>

Controlling Prefixes
Explicit prefix definitions
In the previous section, we saw that the output XML declared a namespace prefix vcard and used that prefix to abbreviate URIs. While RDF uses only the full URIs, and not this shortened form, Jena provides ways of controlling the namespaces used on output with its prefix mappings. Here's a simple example.

Model m = ModelFactory.createDefaultModel();
String nsA = "http://somewhere/else#";
String nsB = "http://nowhere/else#";

Resource root = m.createResource( nsA + "root" );
Property P = m.createProperty( nsA + "P" );
Property Q = m.createProperty( nsB + "Q" );
Resource x = m.createResource( nsA + "x" );
Resource y = m.createResource( nsA + "y" );
Resource z = m.createResource( nsA + "z" );
m.add( root, P, x ).add( root, P, y ).add( y, Q, z );

System.out.println( "# -- no special prefixes defined" );
m.write( System.out );
System.out.println( "# -- nsA defined" );
m.setNsPrefix( "nsA", nsA );
m.write( System.out );
System.out.println( "# -- nsA and cat defined" );
m.setNsPrefix( "cat", nsB );
m.write( System.out );

The output from this fragment is three lots of RDF/XML, with three different prefix mappings.
First the default, with no prefixes other than the standard ones:

# -- no special prefixes defined
<rdf:RDF
    xmlns:j.0="http://nowhere/else#"
    xmlns:rdf=""
    xmlns:j.1="http://somewhere/else#" >
  <rdf:Description rdf:about="http://somewhere/else#root">
    <j.1:P rdf:resource="http://somewhere/else#x"/>
    <j.1:P rdf:resource="http://somewhere/else#y"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/else#y">
    <j.0:Q rdf:resource="http://somewhere/else#z"/>
  </rdf:Description>
</rdf:RDF>

We see that the rdf namespace is declared automatically, since it is required for tags such as <rdf:RDF> and attributes such as rdf:resource. XML namespace declarations are also needed for using the two properties P and Q, but since their prefixes have not been introduced to the model in this example, they get invented namespace names: j.0 and j.1.
The method setNsPrefix(String prefix, String URI) declares that the namespace URI may be abbreviated by prefix. Jena requires that prefix be a legal XML namespace name, and that URI ends with a non-name character.
The RDF/XML writer will turn these prefix declarations into XML namespace declarations and use them in its output:

# -- nsA defined
<rdf:RDF
    xmlns:j.0="http://nowhere/else#"
    xmlns:rdf=""
    xmlns:nsA="http://somewhere/else#" >
  <rdf:Description rdf:about="http://somewhere/else#root">
    <nsA:P rdf:resource="http://somewhere/else#x"/>
    <nsA:P rdf:resource="http://somewhere/else#y"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/else#y">
    <j.0:Q rdf:resource="http://somewhere/else#z"/>
  </rdf:Description>
</rdf:RDF>

The other namespace still gets the constructed name, but the nsA name is now used in the property tags. There's no need for the prefix name to have anything to do with the variables in the Jena code:

# -- nsA and cat defined
<rdf:RDF
    xmlns:cat="http://nowhere/else#"
    xmlns:rdf=""
    xmlns:nsA="http://somewhere/else#" >
  <rdf:Description rdf:about="http://somewhere/else#root">
    <nsA:P rdf:resource="http://somewhere/else#x"/>
    <nsA:P rdf:resource="http://somewhere/else#y"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/else#y">
    <cat:Q rdf:resource="http://somewhere/else#z"/>
  </rdf:Description>
</rdf:RDF>

Both prefixes are used for output, and no generated prefixes are needed.
Implicit prefix definitions
As well as prefix declarations provided by calls to setNsPrefix, Jena will remember the prefixes that were used in input to model.read().
Take the output produced by the
previous fragment, and paste it into some file, with URL file:/tmp/fragment.rdf say. Then run the code:

Model m2 = ModelFactory.createDefaultModel();
m2.read( "file:/tmp/fragment.rdf" );
m2.write( System.out );

You'll see that the prefixes from the input are preserved in the output. All the prefixes are written, even if they're not used anywhere. You can remove a prefix with removeNsPrefix(String prefix) if you don't want it in the output.
Since N-Triples doesn't have any short way of writing URIs, it takes no notice of prefixes on output and doesn't provide any on input. The notation N3, also supported by Jena, does have short prefixed names, and records them on input and uses them on output. Jena has further operations on the prefix mappings that a model holds, such as extracting a Java Map of the existing mappings, or adding a whole group of mappings at once; see the documentation for PrefixMapping for details.
Jena RDF Packages
Jena is a Java API for semantic web applications. The key RDF package for the application developer is org.apache.jena.rdf.model. The API has been defined in terms of interfaces so that application code can work with different implementations without change. This package contains interfaces for representing models, resources, properties, literals, statements and all the other key concepts of RDF, and a ModelFactory for creating models. So that application code remains independent of the implementation, it is best if it uses interfaces wherever possible, not specific class implementations.
The org.apache.jena.tutorial package contains the working source code for all the examples used in this tutorial.
The org.apache.jena...impl packages contain implementation classes which may be common to many implementations. For example, they define classes ResourceImpl, PropertyImpl, and LiteralImpl which may be used directly or subclassed by different implementations.
Applications should rarely, if ever, use these classes directly. For example, rather than creating a new instance of ResourceImpl, it is better to use the createResource method of whatever model is being used. That way, if the model implementation has used an optimized implementation of Resource, then no conversions between the two types will be necessary.
Navigating a Model
So far, this tutorial has dealt mainly with creating, reading and writing RDF Models. It is now time to deal with accessing information held in a Model.
Given the URI of a resource, the resource object can be retrieved from a model using the Model.getResource(String uri) method. This method is defined to return a Resource object if one exists in the model, or otherwise to create a new one. For example, to retrieve the John Smith resource from the model read in from the file in tutorial 5:

// retrieve the John Smith vcard resource from the model
Resource vcard = model.getResource(johnSmithURI);

The Resource interface defines a number of methods for accessing the properties of a resource. The Resource.getProperty(Property p) method accesses a property of the resource. This method does not follow the usual Java accessor convention in that the type of the object returned is Statement, not the Property that you might have expected. Returning the whole statement allows the application to access the value of the property using one of its accessor methods which return the object of the statement. For example, to retrieve the resource which is the value of the vcard:N property:

// retrieve the value of the N property
Resource name = (Resource) vcard.getProperty(VCARD.N)
                                .getObject();

In general, the object of a statement could be a resource or a literal, so the application code, knowing the value must be a resource, casts the returned object.
One of the things that Jena tries to do is to provide type-specific methods so the application does not have to cast and type checking can be done at compile time. The code fragment above can be more conveniently written:

// retrieve the value of the N property
Resource name = vcard.getProperty(VCARD.N)
                     .getResource();

Similarly, the literal value of a property can be retrieved:

String fullName = vcard.getProperty(VCARD.FN)
                       .getString();

In this example, the vcard resource has only one vcard:FN and one vcard:N property. RDF permits a resource to repeat a property; for example, Adam might have more than one nickname. Let's give him two:

// add two nickname properties to vcard
vcard.addProperty(VCARD.NICKNAME, "Smithy")
     .addProperty(VCARD.NICKNAME, "Adman");

As noted before, Jena represents an RDF Model as a set of statements, so adding a statement with the same subject, predicate and object as one already in the Model will have no effect. Jena does not define which of the two nicknames present in the Model will be returned. The result of calling vcard.getProperty(VCARD.NICKNAME) is indeterminate. Jena will return one of the values, but there is no guarantee even that two consecutive calls will return the same value.
If it is possible that a property may occur more than once, then the Resource.listProperties(Property p) method can be used to return an iterator which will list them all. This method returns an iterator which returns objects of type Statement. We can list the nicknames like this:

// set up the output
System.out.println("The nicknames of \"" + fullName + "\" are:");
// list the nicknames
StmtIterator iter = vcard.listProperties(VCARD.NICKNAME);
while (iter.hasNext()) {
    System.out.println("    " + iter.nextStatement()
                                    .getObject()
                                    .toString());
}

This code can be found in tutorial 6.
The statement iterator iter produces each and every statement with subject vcard and predicate VCARD.NICKNAME, so looping over it allows us to fetch each statement by using nextStatement(), get the object field, and convert it to a string. The code produces the following output when run:

The nicknames of "John Smith" are:
    Smithy
    Adman

All the properties of a resource can be listed by using the listProperties() method without an argument.
Querying a Model
The previous section dealt with the case of navigating a model from a resource with a known URI. This section deals with searching a model. The core Jena API supports only a limited query primitive. The more powerful query facilities of SPARQL are described elsewhere.
The Model.listStatements() method, which lists all the statements in a model, is perhaps the crudest way of querying a model. Its use is not recommended on very large Models. Model.listSubjects() is similar, but returns an iterator over all resources that have properties, i.e. are the subject of some statement.
Model.listSubjectsWithProperty(Property p, RDFNode o) will return an iterator over all the resources which have property p with value o. If we assume that only vcard resources will have a vcard:FN property, and that in our data all such resources have such a property, then we can find all the vcards like this:

// list vcards
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
while (iter.hasNext()) {
    Resource r = iter.nextResource();
    ...
}

All these query methods are simply syntactic sugar over a primitive query method, model.listStatements(Selector s). This method returns an iterator over all the statements in the model 'selected' by s. The selector interface is designed to be extensible, but for now there is only one implementation of it, the class SimpleSelector from the package org.apache.jena.rdf.model.
Using SimpleSelector is one of the rare occasions in Jena when it is necessary to use a specific class rather than an interface. The SimpleSelector constructor takes three arguments:

Selector selector = new SimpleSelector(subject, predicate, object);

This selector will select all statements with a subject that matches subject, a predicate that matches predicate and an object that matches object. If a null is supplied in any of the positions, it matches anything; otherwise they match corresponding equal resources or literals. (Two resources are equal if they have equal URIs or are the same blank node; two literals are the same if all their components are equal.) Thus:

Selector selector = new SimpleSelector(null, null, null);

will select all the statements in a Model.

Selector selector = new SimpleSelector(null, VCARD.FN, null);

will select all the statements with VCARD.FN as their predicate, whatever the subject or object. As a special shorthand, listStatements( S, P, O ) is equivalent to

listStatements( new SimpleSelector( S, P, O ) )

The following code, which can be found in full in tutorial 7, lists the full names on all the vcards in the database.

// select all the resources with a VCARD.FN property
ResIterator iter = model.listSubjectsWithProperty(VCARD.FN);
if (iter.hasNext()) {
    System.out.println("The database contains vcards for:");
    while (iter.hasNext()) {
        System.out.println("  " + iter.nextResource()
                                      .getProperty(VCARD.FN)
                                      .getString());
    }
} else {
    System.out.println("No vcards were found in the database");
}

This should produce output similar to the following:

The database contains vcards for:
  Sarah Jones
  John Smith
  Matt Jones
  Becky Smith

Your next exercise is to modify this code to use SimpleSelector instead of listSubjectsWithProperty.
Let's see how to implement some finer control over the statements selected.
SimpleSelector can be subclassed and its selects method overridden to perform further filtering:

// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s) {
            return s.getString().endsWith("Smith");
        }
    });

This sample code uses a neat Java technique of overriding a method definition inline when creating an instance of the class. Here the selects(...) method checks to ensure that the full name ends with "Smith". It is important to note that filtering based on the subject, predicate and object arguments takes place before the selects(...) method is called, so the extra test will only be applied to matching statements.
The full code can be found in tutorial 8 and produces output like this:

The database contains vcards for:
  John Smith
  Becky Smith

You might think that:

// do all filtering in the selects method
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, null, (RDFNode) null) {
        public boolean selects(Statement s) {
            return (subject == null   || s.getSubject().equals(subject))
                && (predicate == null || s.getPredicate().equals(predicate))
                && (object == null    || s.getObject().equals(object));
        }
    });

is equivalent to:

StmtIterator iter = model.listStatements(
    new SimpleSelector(subject, predicate, object));

Whilst functionally they may be equivalent, the first form will list all the statements in the Model and test each one individually, whilst the second allows indexes maintained by the implementation to improve performance. Try it on a large Model and see for yourself, but make a cup of coffee first.
Operations on Models
Jena provides three operations for manipulating Models as a whole.
These are the common set operations of union, intersection and difference.
The union of two Models is the union of the sets of statements which represent each Model. This is one of the key operations that the design of RDF supports. It enables data from disparate data sources to be merged. Consider the following two Models:

and

When these are merged, the two http://...JohnSmith nodes are merged into one and the duplicate vcard:FN arc is dropped to produce:

Let's look at the code to do this (the full code is in tutorial 9) and see what happens.

// read the RDF/XML files
model1.read(new InputStreamReader(in1), "");
model2.read(new InputStreamReader(in2), "");

// merge the Models
Model model = model1.union(model2);

// print the Model as RDF/XML
model.write(System.out, "RDF/XML-ABBREV");

The output produced by the pretty writer looks like this:

<rdf:RDF xmlns:rdf="" xmlns:vcard="">
  <rdf:Description rdf:about="http://somewhere/JohnSmith/">
    <vcard:EMAIL>
      <vcard:internet>
        <rdf:value></rdf:value>
      </vcard:internet>
    </vcard:EMAIL>
    <vcard:N rdf:parseType="Resource">
      <vcard:Given>John</vcard:Given>
      <vcard:Family>Smith</vcard:Family>
    </vcard:N>
    <vcard:FN>John Smith</vcard:FN>
  </rdf:Description>
</rdf:RDF>

Even if you are unfamiliar with the details of the RDF/XML syntax, it should be reasonably clear that the Models have merged as expected.
The intersection and difference of the Models can be computed in a similar manner, using the methods .intersection(Model) and .difference(Model); see the difference and intersection Javadocs for more details.
Containers
RDF defines a special kind of resource for representing collections of things. These resources are called containers. The members of a container can be either literals or resources. There are three kinds of container:
a BAG is an unordered collection
an ALT is an unordered collection intended to represent alternatives
a SEQ is an ordered collection
A container is represented by a resource. That resource will have an rdf:type property whose value should be one of rdf:Bag, rdf:Alt or rdf:Seq, or a subclass of one of these, depending on the type of the container. The first member of the container is the value of the container's rdf:_1 property; the second member of the container is the value of the container's rdf:_2 property, and so on. The rdf:_nnn properties are known as the ordinal properties.
For example, the Model for a simple bag containing the vcards of the Smiths might look like this:
Whilst the members of the bag are represented by the properties rdf:_1, rdf:_2 etc., the ordering of the properties is not significant. We could switch the values of the rdf:_1 and rdf:_2 properties and the resulting Model would represent the same information.
Alts are intended to represent alternatives. For example, let's say a resource represented a software product. It might have a property to indicate where it might be obtained from. The value of that property might be an Alt collection containing various sites from which it could be downloaded. Alts are unordered except that the rdf:_1 property has special significance. It represents the default choice.
Whilst containers can be handled using the basic machinery of resources and properties, Jena has explicit interfaces and implementation classes to handle them.
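As a minimal sketch of those container interfaces (the member strings are invented for this example, and an in-memory model is assumed), a Seq maintains the ordinal properties rdf:_1, rdf:_2, ... for you, so members keep their order:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Seq;

public class SeqSketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();

        // create an ordered container; Jena gives it rdf:type rdf:Seq
        Seq seq = model.createSeq();

        // each add() appends a member at the next ordinal property
        seq.add("John Smith");
        seq.add("Becky Smith");

        // members are retrieved by 1-based index
        System.out.println(seq.getString(1));
        System.out.println("size = " + seq.size());
    }
}
```

Bags and Alts are created analogously with model.createBag() and model.createAlt(); only the Seq interface attaches meaning to the member order.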
It is not a good idea to have an object manipulating a container, and at the same time to modify the state of that container using the lower-level methods.
Let's modify tutorial 8 to create this bag:

// create a bag
Bag smiths = model.createBag();

// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s) {
            return s.getString().endsWith("Smith");
        }
    });
// add the Smiths to the bag
while (iter.hasNext()) {
    smiths.add(iter.nextStatement().getSubject());
}

If we write out this Model, it contains something like the following:

<rdf:RDF xmlns:rdf='' xmlns:vcard='' >
  ...
  <rdf:Description rdf:nodeID="A3">
    <rdf:type rdf:resource=''/>
    <rdf:_1 rdf:resource='http://somewhere/JohnSmith/'/>
    <rdf:_2 rdf:resource='http://somewhere/RebeccaSmith/'/>
  </rdf:Description>
</rdf:RDF>

which represents the Bag resource.
The container interface provides an iterator to list the contents of a container:

// print out the members of the bag
NodeIterator iter2 = smiths.iterator();
if (iter2.hasNext()) {
    System.out.println("The bag contains:");
    while (iter2.hasNext()) {
        System.out.println("  " + ((Resource) iter2.nextNode())
                                      .getProperty(VCARD.FN)
                                      .getString());
    }
} else {
    System.out.println("The bag is empty");
}

which produces the following output:

The bag contains:
  John Smith
  Becky Smith

Executable example code can be found in tutorial 10, which glues together the fragments above into a complete example.
The Jena classes offer methods for manipulating containers including adding new members, inserting new members into the middle of a
container and removing existing members. The Jena container classes currently ensure that the list of ordinal properties used starts at rdf:_1 and is contiguous. The RDFCore WG has relaxed this constraint, which allows partial representation of containers. This, therefore, is an area of Jena that may be changed in the future.
More about Literals and Datatypes
RDF literals are not just simple strings. Literals may have a language tag to indicate the language of the literal. The literal "chat" with an English language tag is considered different to the literal "chat" with a French language tag. This rather strange behaviour is an artefact of the original RDF/XML syntax.
Further, there are really two sorts of literals. In one, the string component is just that, an ordinary string. In the other, the string component is expected to be a well-balanced fragment of XML. When an RDF Model is written as RDF/XML, a special construction using a parseType='Literal' attribute is used to represent it.
In Jena, these attributes of a literal may be set when the literal is constructed, e.g.
in tutorial 11:\n// create the resource Resource r = model.createResource(); // add the property r.addProperty(RDFS.label, model.createLiteral(\u0026#34;chat\u0026#34;, \u0026#34;en\u0026#34;)) .addProperty(RDFS.label, model.createLiteral(\u0026#34;chat\u0026#34;, \u0026#34;fr\u0026#34;)) .addProperty(RDFS.label, model.createLiteral(\u0026#34;\u0026amp;lt;em\u0026amp;gt;chat\u0026amp;lt;/em\u0026amp;gt;\u0026#34;, true)); // write out the Model model.write(System.out); produces\n\u0026lt;rdf:RDF xmlns:rdf=\u0026#39;\u0026#39; xmlns:rdfs=\u0026#39;\u0026#39; \u0026gt; \u0026lt;rdf:Description rdf:nodeID=\u0026#34;A0\u0026#34;\u0026gt; \u0026lt;rdfs:label xml:lang=\u0026#39;en\u0026#39;\u0026gt;chat\u0026lt;/rdfs:label\u0026gt; \u0026lt;rdfs:label xml:lang=\u0026#39;fr\u0026#39;\u0026gt;chat\u0026lt;/rdfs:label\u0026gt; \u0026lt;rdfs:label rdf:parseType=\u0026#39;Literal\u0026#39;\u0026gt;\u0026lt;em\u0026gt;chat\u0026lt;/em\u0026gt;\u0026lt;/rdfs:label\u0026gt; \u0026lt;/rdf:Description\u0026gt; \u0026lt;/rdf:RDF\u0026gt; For two literals to be considered equal, they must either both be XML literals or both be simple literals. In addition, either both must have no language tag, or if language tags are present they must be equal. For simple literals the strings must be equal. XML literals have two notions of equality. The simple notion is that the conditions previously mentioned are true and the strings are also equal. The other notion is that they can be equal if the canonicalization of their strings is equal.\nJena's interfaces also support typed literals. The old-fashioned way (shown below) treats typed literals as shorthand for strings: typed values are converted in the usual Java way to strings and these strings are stored in the Model. For example, try (noting that for simple literals, we can omit the model.createLiteral(...) 
call):\n// create the resource Resource r = model.createResource(); // add the property r.addProperty(RDFS.label, \u0026#34;11\u0026#34;) .addProperty(RDFS.label, 11); // write out the Model model.write(System.out, \u0026#34;N-TRIPLE\u0026#34;); The output produced is:\n_:A... \u0026lt;\u0026gt; \u0026#34;11\u0026#34; . Since both literals are really just the string \"11\", only one statement is added.\nThe RDFCore WG has defined mechanisms for supporting datatypes in RDF. Jena supports these using the typed literal mechanisms; they are not discussed in this tutorial.\nGlossary Blank Node Represents a resource, but does not indicate a URI for the resource. Blank nodes act like existentially quantified variables in first order logic. Dublin Core A standard for metadata about web resources. Further information can be found at the Dublin Core web site. Literal A string of characters which can be the value of a property. Object The part of a triple which is the value of the statement. Predicate The property part of a triple. Property A property is an attribute of a resource. For example DC.title is a property, as is RDF.type. Resource Some entity. It could be a web resource such as a web page, or it could be a concrete physical thing such as a tree or a car. It could be an abstract idea such as chess or football. Resources are named by URIs. Statement An arc in an RDF Model, normally interpreted as a fact. Subject The resource which is the source of an arc in an RDF Model. Triple A structure containing a subject, a predicate and an object. Another term for a statement. Footnotes The identifier of an RDF resource can include a fragment identifier, e.g. http://hostname/rdf/tutorial/#ch-Introduction, so, strictly speaking, an RDF resource is identified by a URI reference. As well as being a string of characters, literals also have an optional language encoding to represent the language of the string. 
For example the literal \"two\" might have a language encoding of \"en\" for English and the literal \"deux\" might have a language encoding of \"fr\" for French. ","permalink":"","tags":null,"title":"An Introduction to RDF and the Jena RDF API"},{"categories":null,"contents":"This section contains detailed information about the various Jena sub-systems, aimed at developers using Jena. For more general introductions, please refer to the Getting started and Tutorial sections.\nDocumentation index The RDF API - the core RDF API in Jena SPARQL - querying and updating RDF models using the SPARQL standards Fuseki - SPARQL server which can present RDF data and answer SPARQL queries over HTTP I/O - reading and writing RDF data RDF Connection - a SPARQL API for local datasets and remote services Assembler - describing recipes for constructing Jena models declaratively using RDF Inference - using the Jena rules engine and other inference algorithms to derive consequences from RDF models Ontology - support for handling OWL models in Jena Data and RDFS - apply RDFS to graphs in a dataset TDB2 - a fast persistent triple store that stores directly to disk TDB - Original TDB database SHACL - SHACL processor for Jena ShEx - ShEx processor for Jena Text Search - enhanced indexes using Lucene for more efficient searching of text literals in Jena models and datasets. GeoSPARQL - support for GeoSPARQL Permissions - a permissions wrapper around Jena RDF implementation JDBC - a SPARQL over JDBC driver framework Tools - various command-line tools and utilities to help developers manage RDF data and other aspects of Jena How-To\u0026rsquo;s - various topic-specific how-to documents QueryBuilder - Classes to simplify the programmatic building of various query and update statements. Extras - various modules that provide utilities and larger packages that make Apache Jena development or usage easier but that do not fall within the standard Jena framework. 
Javadoc - JavaDoc generated from the Jena source ","permalink":"","tags":null,"title":"Apache Jena documentation overview"},{"categories":null,"contents":" The Jena Elephas module has been retired. The last release of Jena with Elephas is Jena 3.17.0. See jena-elephas/ for the original documentation.\n","permalink":"","tags":null,"title":"Apache Jena Elephas"},{"categories":null,"contents":"Apache Jena Elephas is a set of libraries which provide various basic building blocks which enable you to start writing Apache Hadoop based applications which work with RDF data.\nHistorically there has been no serious support for RDF within the Hadoop ecosystem and what support has existed has often been limited and task specific. These libraries aim to be as generic as possible and provide the necessary infrastructure that enables developers to create their application specific logic without worrying about the underlying plumbing.\nBeta These modules are currently considered to be in a Beta state; they have been under active development for about a year but have not yet been widely deployed and may contain as yet undiscovered bugs.\nPlease see the How to Report a Bug page for how to report any bugs you may encounter.\nDocumentation Overview Getting Started APIs Common IO Map/Reduce Javadoc Examples RDF Stats Demo Maven Artifacts Overview Apache Jena Elephas is published as a set of Maven modules via its Maven artifacts. The source for these libraries may be downloaded as part of the source distribution. These modules are built against the Hadoop 2.x APIs and no backwards compatibility for 1.x is provided.\nThe core aim of these libraries is to provide the basic building blocks that allow users to start writing Hadoop applications that work with RDF. 
They are mostly fairly low level components but they are designed to be used as building blocks to help users and developers focus on actual application logic rather than on the low level plumbing.\nFirstly at the lowest level they provide Writable implementations that allow the basic RDF primitives - nodes, triples and quads - to be represented and exchanged within Hadoop applications; this support is provided by the Common library.\nSecondly they provide support for all the RDF serialisations which Jena supports as both input and output formats subject to the specific limitations of those serialisations. This support is provided by the IO library in the form of standard InputFormat and OutputFormat implementations.\nThere are also a set of basic Mapper and Reducer implementations provided by the Map/Reduce library which contains code that enables various common Hadoop tasks such as counting, filtering, splitting and grouping to be carried out on RDF data. Typically these will be used as a starting point to build more complex RDF processing applications.\nFinally there is an RDF Stats Demo which is a runnable Hadoop job JAR file that demonstrates using these libraries to calculate a number of basic statistics over arbitrary RDF data.\nGetting Started To get started you will need to add the relevant dependencies to your project; the exact dependencies necessary will depend on what you are trying to do. 
Typically you will likely need at least the IO library and possibly the Map/Reduce library:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-io\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-mapreduce\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Our libraries depend on the relevant Hadoop libraries but since these libraries are typically provided by the Hadoop cluster those dependencies are marked as provided and thus are not transitive. This means that you will typically also need to add the following additional dependencies:\n\u0026lt;!-- Hadoop Dependencies --\u0026gt; \u0026lt;!-- Note these will be provided on the Hadoop cluster hence the provided scope --\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-mapreduce-client-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; You can then write code to launch a Map/Reduce job that works with RDF. For example let us consider an RDF variation of the classic Hadoop word count example. In this example, which we call node count, we do the following:\nTake in some RDF triples Split them up into their constituent nodes i.e. 
the URIs, Blank Nodes \u0026amp; Literals Assign an initial count of one to each node Group by node and sum up the counts Output the nodes and their usage counts We will start with our Mapper implementation; as you can see this simply takes in a triple and splits it into its constituent nodes. It then outputs each node with an initial count of 1:\npackage org.apache.jena.hadoop.rdf.mapreduce.count; import org.apache.jena.hadoop.rdf.types.NodeWritable; import org.apache.jena.hadoop.rdf.types.TripleWritable; import org.apache.jena.graph.Triple; /** * A mapper for counting node usages within triples designed primarily for use * in conjunction with {@link NodeCountReducer} * * @param \u0026lt;TKey\u0026gt; Key type */ public class TripleNodeCountMapper\u0026lt;TKey\u0026gt; extends AbstractNodeTupleNodeCountMapper\u0026lt;TKey, Triple, TripleWritable\u0026gt; { @Override protected NodeWritable[] getNodes(TripleWritable tuple) { Triple t = tuple.get(); return new NodeWritable[] { new NodeWritable(t.getSubject()), new NodeWritable(t.getPredicate()), new NodeWritable(t.getObject()) }; } } And then our Reducer implementation; this takes in the data grouped by node and sums up the counts outputting the node and the final count:\npackage org.apache.jena.hadoop.rdf.mapreduce.count; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.mapreduce.Reducer; import org.apache.jena.hadoop.rdf.types.NodeWritable; /** * A reducer which takes node keys with a sequence of longs representing counts * as the values and sums the counts together into pairs consisting of a node * key and a count value. 
*/ public class NodeCountReducer extends Reducer\u0026lt;NodeWritable, LongWritable, NodeWritable, LongWritable\u0026gt; { @Override protected void reduce(NodeWritable key, Iterable\u0026lt;LongWritable\u0026gt; values, Context context) throws IOException, InterruptedException { long count = 0; Iterator\u0026lt;LongWritable\u0026gt; iter = values.iterator(); while (iter.hasNext()) { count += iter.next().get(); } context.write(key, new LongWritable(count)); } } Finally we need to define an actual Hadoop job we can submit to run this. Here we take advantage of the IO library to provide us with support for our desired RDF input format:\npackage org.apache.jena.hadoop.rdf.stats; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import; import; import org.apache.jena.hadoop.rdf.mapreduce.count.NodeCountReducer; import org.apache.jena.hadoop.rdf.mapreduce.count.TripleNodeCountMapper; import org.apache.jena.hadoop.rdf.types.NodeWritable; public class RdfMapReduceExample { public static void main(String[] args) { try { // Get Hadoop configuration Configuration config = new Configuration(true); // Create job Job job = Job.getInstance(config); job.setJarByClass(RdfMapReduceExample.class); job.setJobName(\u0026quot;RDF Triples Node Usage Count\u0026quot;); // Map/Reduce classes job.setMapperClass(TripleNodeCountMapper.class); job.setMapOutputKeyClass(NodeWritable.class); job.setMapOutputValueClass(LongWritable.class); job.setReducerClass(NodeCountReducer.class); // Input and Output job.setInputFormatClass(TriplesInputFormat.class); job.setOutputFormatClass(NTriplesNodeOutputFormat.class); FileInputFormat.setInputPaths(job, new Path(\u0026quot;/example/input/\u0026quot;)); FileOutputFormat.setOutputPath(job, new Path(\u0026quot;/example/output/\u0026quot;)); // Launch the job and await 
completion job.submit(); if (job.monitorAndPrintJob()) { // OK System.out.println(\u0026quot;Completed\u0026quot;); } else { // Failed System.err.println(\u0026quot;Failed\u0026quot;); } } catch (Throwable e) { e.printStackTrace(); } } } So this really is no different from configuring any other Hadoop job; we simply have to point to the relevant input and output formats and provide our mapper and reducer. Note that here we use the TriplesInputFormat which can handle RDF in any Jena supported format; if you know your RDF is in a specific format it is usually more efficient to use a more specific input format. Please see the IO page for more detail on the available input formats and the differences between them.\nWe recommend that you next take a look at our RDF Stats Demo which shows how to do some more complex computations by chaining multiple jobs together.\nAPIs There are three main libraries each with their own API:\nCommon - this provides the basic data model for representing RDF data within Hadoop IO - this provides support for reading and writing RDF Map/Reduce - this provides support for writing Map/Reduce jobs that work with RDF ","permalink":"","tags":null,"title":"Apache Jena Elephas"},{"categories":null,"contents":"The Common API provides the basic data model for representing RDF data within Apache Hadoop applications. This primarily takes the form of Writable implementations and the necessary machinery to efficiently serialise and deserialise these.\nCurrently we represent the three main RDF primitives - Nodes, Triples and Quads - though in future a wider range of primitives may be supported if we receive contributions to implement them.\nRDF Primitives Nodes The Writable type for nodes is predictably enough called NodeWritable and it implements the WritableComparable interface which means it can be used as both a key and/or value in Map/Reduce. 
In standard Hadoop style a get() method returns the actual value as a Jena Node instance while a corresponding set() method allows the value to be set. Conveying null values is acceptable and fully supported.\nNote that nodes are lazily converted to and from the underlying binary representation so there is minimal overhead if you create a NodeWritable instance that does not actually ever get read/written.\nNodeWritable supports and automatically registers itself for Hadoop\u0026rsquo;s WritableComparator mechanism which allows it to provide high efficiency binary comparisons on nodes which helps the reduce phase run faster by avoiding unnecessary deserialisation into POJOs.\nHowever the downside of this is that the sort order for nodes may not be as natural as the sort order using POJOs or when sorting with SPARQL. Ultimately this is a performance trade off and in our experiments the benefits far outweigh the lack of a more natural sort order.\nYou simply use it as follows:\nNodeWritable nw = new NodeWritable(); // Set the value nw.set(NodeFactory.createURI(\u0026quot;\u0026quot;)); // Get the value (remember this may be null) Node value = nw.get(); Triples Again the Writable type for triples is simply called TripleWritable and it also implements the WritableComparable interface meaning it may be used as both a key and/or value. Again the standard Hadoop conventions of a get() and set() method to get/set the value as a Jena Triple are followed. Unlike the NodeWritable this class does not support conveying null values.\nLike the other primitives it is lazily converted to and from the underlying binary representations and it also supports \u0026amp; registers itself for Hadoop\u0026rsquo;s WritableComparator mechanism.\nQuads Similarly the Writable type for quads is again simply called QuadWritable and it implements the WritableComparable interface making it usable as both a key and/or value. 
As per the other primitives standard Hadoop conventions of a get() and set() method are provided to get/set the value as a Jena Quad. Unlike the NodeWritable this class does not support conveying null values.\nLike the other primitives it is lazily converted to and from the underlying binary representations and it also supports \u0026amp; registers itself for Hadoop\u0026rsquo;s WritableComparator mechanism.\nArbitrary sized tuples In some cases you may have data that is RDF like but not itself RDF or that is a mix of triples and quads in which case you may wish to use the NodeTupleWritable. This is used to represent an arbitrarily sized tuple consisting of zero or more Node instances; there is no restriction on the number of nodes per tuple and no requirement that tuple data be uniform in size.\nLike the other primitives it implements WritableComparable so can be used as a key and/or value. However this primitive does not support binary comparisons meaning it may not perform as well as using the other primitives.\nIn this case the get() and set() methods get/set a Tuple\u0026lt;Node\u0026gt; instance which is a convenience container class provided by ARQ. Currently the implementation does not support lazy conversion so the full Tuple\u0026lt;Node\u0026gt; is reconstructed as soon as a NodeTupleWritable instance is deserialised.\n","permalink":"","tags":null,"title":"Apache Jena Elephas - Common API"},{"categories":null,"contents":"The IO API provides support for reading and writing RDF within Apache Hadoop applications. 
This is done by providing InputFormat and OutputFormat implementations that cover all the RDF serialisations that Jena supports.\nBackground on Hadoop IO If you are already familiar with the Hadoop IO paradigm then please skip this section; if not, please read on, as otherwise some of the later information will not make much sense.\nHadoop applications and particularly Map/Reduce exploit horizontal scalability by dividing input data up into splits where each split represents a portion of the input data that can be read in isolation from the other pieces. This isolation property is very important to understand: if a file format requires that the entire file be read sequentially in order to properly interpret it then it cannot be split and must be read as a whole.\nTherefore depending on the file formats used for your input data you may not get as much parallel performance because Hadoop\u0026rsquo;s ability to split the input data may be limited.\nIn some cases there are file formats that may be processed in multiple ways i.e. you can split them into pieces or you can process them as a whole. Which approach you wish to use will depend on whether you have a single file to process or many files to process. In the case of many files processing files as a whole may provide better overall throughput than processing them as chunks. However your mileage may vary especially if your input data has many files of uneven size.\nCompressed IO Hadoop natively provides support for compressed input and output providing your Hadoop cluster is appropriately configured. The advantage of compressing the input/output data is that it means there is less IO workload on the cluster however this comes with the disadvantage that most compression formats block Hadoop\u0026rsquo;s ability to split up the input.\nHadoop generally handles compression automatically and all our input and output formats are capable of handling compressed input and output as necessary. 
However in order to use this your Hadoop cluster/job configuration must be appropriately configured to inform Hadoop about what compression codecs are in use.\nFor example to enable BZip2 compression (assuming your cluster doesn\u0026rsquo;t enable this by default):\n// Assumes you already have a Configuration object you are preparing // in the variable config config.set(HadoopIOConstants.IO_COMPRESSION_CODECS, BZip2Codec.class.getCanonicalName()); See the Javadocs for the Hadoop CompressionCodec API to see the available out of the box implementations. Note that some clusters may provide additional compression codecs beyond those built directly into Hadoop.\nRDF IO in Hadoop There are a wide range of RDF serialisations supported by ARQ; please see the RDF IO documentation for an overview of the formats that Jena supports. In this section we go into a lot more depth on how exactly we support RDF IO in Hadoop.\nInput One of the difficulties posed when wrapping these for Hadoop IO is that the formats have very different properties in terms of our ability to split them into distinct chunks for Hadoop to process. So we categorise the possible ways to process RDF inputs as follows:\nLine Based - Each line of the input is processed as a single record Batch Based - The input is processed in batches of N lines (where N is configurable) Whole File - The input is processed as a whole There is then also the question of whether a serialisation encodes triples, quads or can encode both. Where a serialisation encodes both we provide two variants of it so you can choose whether you want to process it as triples/quads.\nBlank Nodes in Input Note that readers familiar with RDF may be wondering how we cope with blank nodes when splitting input and this is an important issue to address.\nEssentially Jena contains functionality that allows it to predictably generate identifiers from the original identifier present in the file e.g. _:blank. 
This means that wherever _:blank appears in the original file we are guaranteed to assign it the same internal identifier. Note that this functionality uses a seed value to ensure that blank nodes coming from different input files are not assigned the same identifier.\nWhen used with Hadoop this seed is chosen based on a combination of the Job ID and the input file path. This means that the same file processed by different jobs will produce different blank node identifiers each time. However within a job every read of the file will predictably generate blank node identifiers so splitting does not prevent correct blank node identification.\nAdditionally the binary serialisation we use for our RDF primitives (described on the Common API page) guarantees that internal identifiers are preserved as-is when communicating values across the cluster.\nMixed Inputs In many cases your input data may be in a variety of different RDF formats in which case we have you covered. The TriplesInputFormat, QuadsInputFormat and TriplesOrQuadsInputFormat can handle a mixture of triples/quads/both triples \u0026amp; quads as desired. Note that in the case of TriplesOrQuadsInputFormat any triples are up-cast into quads in the default graph.\nWith mixed inputs the specific input format to use for each is determined based on the file extensions of each input file; unrecognised extensions will result in an IOException. Compression is handled automatically; you simply need to name your files appropriately to indicate the type of compression used e.g. example.ttl.gz would be treated as GZipped Turtle; if you\u0026rsquo;ve used a decent compression tool it should have done this for you. 
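The seeded blank node allocation described above can be sketched in plain Java. This is a minimal illustration of the scheme, not the actual Elephas implementation: the class name, the way the seed is derived from the job ID and file path, and the hash-based identifier are all hypothetical, but the behaviour matches what the text describes (same label and seed give the same identifier; a different seed gives different identifiers).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of seeded blank node allocation (hypothetical, for illustration):
 * within one seed (job + input file) the same syntactic label always maps
 * to the same identifier, while a different seed yields different ones.
 */
public class SeededBlankNodeAllocator {
    private final String seed;
    private final Map<String, String> allocated = new HashMap<>();

    public SeededBlankNodeAllocator(String jobId, String inputPath) {
        // Hypothetical seed derivation: combine the job ID and the file path
        this.seed = jobId + "|" + inputPath;
    }

    /** Deterministically allocate an identifier for a syntactic label. */
    public String allocate(String label) {
        return allocated.computeIfAbsent(label,
                l -> "_:b" + Integer.toHexString(Objects.hash(seed, l)));
    }

    public static void main(String[] args) {
        SeededBlankNodeAllocator split1 = new SeededBlankNodeAllocator("job_1", "/data/a.ttl");
        SeededBlankNodeAllocator split2 = new SeededBlankNodeAllocator("job_1", "/data/a.ttl");
        SeededBlankNodeAllocator other  = new SeededBlankNodeAllocator("job_1", "/data/b.ttl");

        // Same label, same file, same job: identical identifier across splits
        System.out.println(split1.allocate("blank").equals(split2.allocate("blank")));
        // Same label in a different input file: a different identifier
        System.out.println(split1.allocate("blank").equals(other.allocate("blank")));
    }
}
```

Because allocation depends only on the seed and the label, two mappers reading different splits of the same file agree on every identifier, which is the property that makes splitting safe.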
The downside of mixed inputs is that it decides quite late what the input format is which means that it always processes inputs as whole files because it doesn\u0026rsquo;t decide on the format until after it has been asked to split the inputs.\nOutput As with input we also need to be careful about how we output RDF data. Similar to input some serialisations can be output in a streaming fashion while other serialisations require us to store up all the data and then write it out in one go at the end. We use the same categorisations for output though the meanings are slightly different:\nLine Based - Each record is written as soon as it is received Batch Based - Records are cached until N records are seen or the end of output and then the current batch is output (where N is configurable) Whole File - Records are cached until the end of output and then the entire output is written in one go However both the batch based and whole file approaches have the downside that it is possible to exhaust memory if you have large amounts of output to process (or set the batch size too high for batch based output).\nBlank Nodes in Output As with input blank nodes provide a complicating factor in producing RDF output. For whole file output formats this is not an issue but it does need to be considered for line and batch based formats.\nHowever what we have found in practice is that the Jena writers will predictably map internal identifiers to the blank node identifiers in the output serialisations. 
What this means is that even when processing output in batches we\u0026rsquo;ve found that using the line/batch based formats correctly preserves blank node identity.\nIf you are concerned about potential data corruption as a result of this then you should make sure to always choose a whole file output format but be aware that this can exhaust memory if your output is large.\nBlank Node Divergence in multi-stage pipelines The other thing to consider with regards to blank nodes in output is that Hadoop will by default create multiple output files (one for each reducer) so even if consistent and valid blank nodes are output they may be spread over multiple files.\nIn multi-stage pipelines you may need to manually concatenate these files back together (assuming they are in a format that allows this e.g. NTriples) as otherwise when you pass them as input to the next job the blank node identifiers will diverge from each other. JENA-820 discusses this problem and introduces a special configuration setting that can be used to resolve this. Note that even with this setting enabled some formats are not capable of respecting it; see the later section on Job Configuration Options for more details.\nAn alternative workaround is to always use RDF Thrift as the intermediate output format since it preserves blank node identifiers precisely as they are seen. This also has the advantage that RDF Thrift is extremely fast to read and write which can speed up multi-stage pipelines considerably.\nNode Output Format We also include a special NTriplesNodeOutputFormat which is capable of outputting pairs composed of a NodeWritable key and any value type. Think of this as being similar to the standard Hadoop TextOutputFormat except it understands how to format nodes as valid NTriples serialisation. 
This format is useful when performing simple statistical analysis such as node usage counts or other calculations over nodes.\nIn the case where the value of the key value pair is also an RDF primitive, proper NTriples formatting is also applied to each of the nodes in the value.\nRDF Serialisation Support Input The following table categorises how each supported RDF serialisation is processed for input. Note that in some cases we offer multiple ways to process a serialisation.\nRDF Serialisation Line Based Batch Based Whole File Triple Formats NTriples Yes Yes Yes Turtle No No Yes RDF/XML No No Yes RDF/JSON No No Yes Quad Formats NQuads Yes Yes Yes TriG No No Yes TriX No No Yes Triple/Quad Formats JSON-LD No No Yes RDF Thrift No No Yes Output The following table categorises how each supported RDF serialisation can be processed for output. As with input some serialisations may be processed in multiple ways.\nRDF Serialisation Line Based Batch Based Whole File Triple Formats NTriples Yes No No Turtle Yes Yes No RDF/XML No No Yes RDF/JSON No No Yes Quad Formats NQuads Yes No No TriG Yes Yes No TriX Yes No No Triple/Quad Formats JSON-LD No No Yes RDF Thrift Yes No No Job Setup To use RDF as an input and/or output format you will need to configure your Job appropriately; this requires setting the input/output format and setting the data paths:\n// Create a job using default configuration Job job = Job.getInstance(new Configuration(true)); // Use Turtle as the input format job.setInputFormatClass(TurtleInputFormat.class); FileInputFormat.setInputPaths(job, new Path(\u0026quot;/users/example/input\u0026quot;)); // Use NTriples as the output format job.setOutputFormatClass(NTriplesOutputFormat.class); FileOutputFormat.setOutputPath(job, new Path(\u0026quot;/users/example/output\u0026quot;)); // Other job configuration... 
This example takes in input in Turtle format from the directory /users/example/input and outputs the end results in NTriples in the directory /users/example/output.\nTake a look at the Javadocs to find the actual available input and output format implementations.\nJob Configuration Options There are several useful configuration options that can be used to tweak the behaviour of the RDF IO functionality if desired.\nInput Lines per Batch Since our line based input formats use the standard Hadoop NLineInputFormat to decide how to split up inputs we support the standard mapreduce.input.lineinputformat.linespermap configuration setting for changing the number of lines processed per map.\nYou can set this directly in your configuration:\njob.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 100); Or you can use the convenience method of NLineInputFormat like so:\nNLineInputFormat.setNumLinesPerMap(job, 100); Max Line Length When using line based inputs it may be desirable to ignore lines that exceed a certain length (for example if you are not interested in really long literals). Again we use the standard Hadoop configuration setting mapreduce.input.linerecordreader.line.maxlength to control this behaviour:\njob.getConfiguration().setInt(HadoopIOConstants.MAX_LINE_LENGTH, 8192); Ignoring Bad Tuples In many cases you may have data that you know contains invalid tuples; in such cases it can be useful to just ignore the bad tuples and continue. By default we enable this behaviour and will skip over bad tuples though they will be logged as an error. 
If you wish, you can disable this behaviour via the configuration setting:\njob.getConfiguration().setBoolean(RdfIOConstants.INPUT_IGNORE_BAD_TUPLES, false); Global Blank Node Identity The default behaviour of this library is to allocate file scoped blank node identifiers in such a way that the same syntactic identifier read from the same file is allocated the same blank node ID even across input splits within a job. Conversely, the same syntactic identifier in different input files will result in different blank nodes within a job.\nHowever, as discussed earlier, in the case of multi-stage jobs the intermediate outputs may be split over several files, which can cause the blank node identifiers to diverge from each other when they are read back in by subsequent jobs. For multi-stage jobs this is often (but not always) incorrect and undesirable behaviour, in which case you will need to set the property to true for the subsequent jobs:\njob.getConfiguration().setBoolean(RdfIOConstants.GLOBAL_BNODE_IDENTITY, true); Important - This should only be set for the later jobs in a multi-stage pipeline and should rarely (if ever) be set for single jobs or the first job of a pipeline.\nEven with this setting enabled, not all formats are capable of honouring it: RDF/XML and JSON-LD will ignore this option and should be avoided as intermediate output formats.\nAs noted earlier, an alternative workaround to enabling this setting is to instead use RDF Thrift as the intermediate output format since it guarantees to preserve blank node identifiers as-is on both reads and writes.\nOutput Batch Size The batch size for batched output formats can be controlled by setting the RdfIOConstants.OUTPUT_BATCH_SIZE property as desired. 
The default value, if not explicitly configured, is 10,000:\njob.getConfiguration().setInt(RdfIOConstants.OUTPUT_BATCH_SIZE, 25000); ","permalink":"","tags":null,"title":"Apache Jena Elephas - IO API"},{"categories":null,"contents":"The Map/Reduce API provides a range of building block Mapper and Reducer implementations that can be used as a starting point for building Map/Reduce applications that process RDF. Typically more complex applications will need to implement their own variants but these basic ones may still prove useful as part of a larger pipeline.\nTasks The API is divided based upon implementations that support various common Hadoop tasks with appropriate Mapper and Reducer implementations provided for each. In most cases these are implemented to be at least partially abstract to make it easy to implement customised versions of these.\nThe following common tasks are supported:\nCounting Filtering Grouping Splitting Transforming Note that standard Map/Reduce programming rules apply as normal. For example, if a mapper/reducer transforms between data types then you need to make setMapOutputKeyClass(), setMapOutputValueClass(), setOutputKeyClass() and setOutputValueClass() calls on your Job configuration as necessary.\nCounting Counting is one of the classic Map/Reduce tasks and features as the official Map/Reduce example for both Hadoop itself and Elephas. Implementations cover a number of different counting tasks that you might want to carry out upon RDF data; in most cases you will use the desired Mapper implementation in conjunction with the NodeCountReducer.\nNode Usage The simplest type of counting supported is to count the usages of individual RDF nodes within the triples/quads. 
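As a concrete instance of the type-declaration rule above: a node-count mapper consumes triples but emits (node, count) pairs, so the changed output types must be declared on the Job. A configuration sketch using class names from this API (the full runnable example appears later in this article):

```java
// The mapper changes the key/value types flowing through the job, so
// Hadoop must be told the new classes explicitly:
job.setMapperClass(TripleNodeCountMapper.class);
job.setMapOutputKeyClass(NodeWritable.class);   // keys are RDF nodes
job.setMapOutputValueClass(LongWritable.class); // values are counts
// If the reducer's final output types also differ, declare them too:
job.setOutputKeyClass(NodeWritable.class);
job.setOutputValueClass(LongWritable.class);
```

Omitting these declarations is a common source of runtime type-mismatch errors, since Hadoop otherwise assumes the job's input types throughout.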
Depending on whether your data is triples/quads you can use either the TripleNodeCountMapper or the QuadNodeCountMapper.\nIf you want to count only usages of RDF nodes in a specific position then we also provide variants for that; for example, TripleSubjectCountMapper counts only RDF nodes present in the subject position. You can substitute Predicate or Object into the class name in place of Subject if you prefer to count just RDF nodes in the predicate/object position instead. Similarly replace Triple with Quad if you wish to count usage of RDF nodes in specific positions of quads; there is also an additional QuadGraphCountMapper if you want to calculate the size of graphs.\nLiteral Data Types Another interesting variant of counting is to count the usage of literal data types; you can use the TripleDataTypeCountMapper or QuadDataTypeCountMapper if you want to do this.\nNamespaces Finally you may be interested in the usage of namespaces within your data; in this case the TripleNamespaceCountMapper or QuadNamespaceCountMapper can be used to do this. For this use case you should use the TextCountReducer to total up the counts for each namespace. Note that the mappers determine the namespace for a URI simply by splitting after the last # or / in the URI; if no such character exists then the full URI is considered to be the namespace.\nFiltering Filtering is another classic Map/Reduce use case: here you want to take the data and extract only the portions that you are interested in based on some criteria. All our filter Mapper implementations also support a Job configuration option named rdf.mapreduce.filter.invert allowing their effects to be inverted if desired e.g.\nconfig.setBoolean(RdfMapReduceConstants.FILTER_INVERT, true); Valid Data One type of filter that may be useful, particularly if you are generating RDF data that may not be strict RDF, is the ValidTripleFilterMapper and the ValidQuadFilterMapper. 
These filters only keep triples/quads that are valid according to strict RDF semantics i.e.\nSubject can only be a URI/Blank Node Predicate can only be a URI Object can be a URI/Blank Node/Literal Graph can only be a URI or Blank Node If you wanted to extract only the bad data e.g. for debugging then you can of course invert these filters by setting rdf.mapreduce.filter.invert to true as shown above.\nGround Data In some cases you may only be interested in triples/quads that are grounded i.e. don\u0026rsquo;t contain blank nodes, in which case the GroundTripleFilterMapper and GroundQuadFilterMapper can be used.\nData with a specific URI In many cases you may want to extract only data where a specific URI occurs in a specific position; for example, if you wanted to extract all the rdf:type declarations then you might want to use the TripleFilterByPredicateUriMapper or QuadFilterByPredicateUriMapper as appropriate. The job configuration option rdf.mapreduce.filter.predicate.uris is used to provide a comma separated list of the full URIs you want the filter to accept e.g.\nconfig.set(RdfMapReduceConstants.FILTER_PREDICATE_URIS, \u0026quot;,\u0026quot;); Similar to the counting of node usage you can substitute Predicate for Subject, Object or Graph as desired. You will also need to do this in the job configuration option; for example, to filter on subject URIs in quads use the QuadFilterBySubjectUriMapper and the rdf.mapreduce.filter.subject.uris configuration option e.g.\nconfig.set(RdfMapReduceConstants.FILTER_SUBJECT_URIS, \u0026quot;\u0026quot;); Grouping Grouping is another frequent Map/Reduce use case; here we provide implementations that allow you to group triples or quads by a specific RDF node within the triples/quads e.g. by subject. 
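Returning to the namespace counting described earlier: the documented heuristic splits a URI after the last # or /, and otherwise treats the whole URI as the namespace. That rule can be sketched as a plain Java method; this is an illustrative re-implementation of the described behaviour, not the mappers' actual code, and whether the separator character itself is kept is an assumption here:

```java
public class NamespaceUtil {
    /**
     * Illustrative version of the documented namespace heuristic:
     * everything up to and including the last '#' or '/', or the
     * full URI if neither character occurs.
     */
    public static String namespace(String uri) {
        int idx = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        return idx >= 0 ? uri.substring(0, idx + 1) : uri;
    }

    public static void main(String[] args) {
        System.out.println(namespace("http://example.org/ns#label"));
        System.out.println(namespace("http://example.org/ns/label"));
        System.out.println(namespace("urn:example:thing"));
    }
}
```

Under this rule both http://example.org/ns#label and http://example.org/ns#comment fall into the namespace http://example.org/ns#, so their counts aggregate under a single key in the reducer.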
For example to group quads by predicate use the QuadGroupByPredicateMapper, similar to filtering and counting you can substitute Predicate for Subject, Object or Graph if you wish to group by another node of the triple/quad.\nSplitting Splitting allows you to split triples/quads up into the constituent RDF nodes, we provide two kinds of splitting:\nTo Nodes - Splits pairs of arbitrary keys with triple/quad values into several pairs of the key with the nodes as the values With Nodes - Splits pairs of arbitrary keys with triple/quad values keeping the triple/quad as the key and the nodes as the values. Transforming Transforming provides some very simple implementations that allow you to convert between triples and quads. For the lossy case of going from quads to triples simply use the QuadsToTriplesMapper.\nIf you want to go the other way - triples to quads - this requires adding a graph field to each triple and we provide two implementations that do that. Firstly there is TriplesToQuadsBySubjectMapper which puts each triple into a graph based on its subject i.e. all triples with a common subject go into a graph named for the subject. Secondly there is TriplesToQuadsConstantGraphMapper which simply puts all triples into the default graph, if you wish to change the target graph you should extend this class. If you wanted to select the graph to use based on some arbitrary criteria you should look at extending the AbstractTriplesToQuadsMapper instead.\nExample Jobs Node Count The following example shows how to configure a job which performs a node count i.e. 
counts the usages of RDF terms (aka nodes in Jena parlance) within the data:\n// Assumes we have already created a Hadoop Configuration // and stored it in the variable config Job job = Job.getInstance(config); // This is necessary as otherwise Hadoop won't ship the JAR to all // nodes and you'll get NoClassDefFoundError and similar errors job.setJarByClass(Example.class); // Give our job a friendly name job.setJobName(\u0026quot;RDF Triples Node Usage Count\u0026quot;); // Mapper class // Since the output type is different from the input type we have to declare // our output types job.setMapperClass(TripleNodeCountMapper.class); job.setMapOutputKeyClass(NodeWritable.class); job.setMapOutputValueClass(LongWritable.class); // Reducer class job.setReducerClass(NodeCountReducer.class); // Input // TriplesInputFormat accepts any RDF triples serialisation job.setInputFormatClass(TriplesInputFormat.class); // Output // NTriplesNodeOutputFormat produces lines consisting of a Node formatted // according to the NTriples spec and the value separated by a tab job.setOutputFormatClass(NTriplesNodeOutputFormat.class); // Set your input and output paths FileInputFormat.setInputPaths(job, new Path(\u0026quot;/example/input\u0026quot;)); FileOutputFormat.setOutputPath(job, new Path(\u0026quot;/example/output\u0026quot;)); // Now run the job... ","permalink":"","tags":null,"title":"Apache Jena Elephas - Map/Reduce API"},{"categories":null,"contents":"The RDF Stats Demo is a pre-built application available as a ready to run Hadoop Job JAR with all dependencies embedded within it. 
The demo app uses the other libraries to allow calculating a number of basic statistics over any RDF data supported by Elephas.\nTo use it you will first need to build it from source or download the relevant Maven artefact:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-stats\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;classifier\u0026gt;hadoop-job\u0026lt;/classifier\u0026gt; \u0026lt;/dependency\u0026gt; Where x.y.z is the desired version.\nPre-requisites In order to run this demo you will need to have a Hadoop 2.x cluster available, for simple experimentation purposes a single node cluster will be sufficient.\nRunning Assuming your cluster is started and running and the hadoop command is available on your path you can run the application without any arguments to see help:\n\u0026gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats NAME hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats - A command which computes statistics on RDF data using Hadoop SYNOPSIS hadoop jar PATH_TO_JAR org.apache.jena.hadoop.rdf.stats.RdfStats [ {-a | --all} ] [ {-d | --data-types} ] [ {-g | --graph-sizes} ] [ {-h | --help} ] [ --input-type \u0026lt;inputType\u0026gt; ] [ {-n | --node-count} ] [ --namespaces ] {-o | --output} \u0026lt;OutputPath\u0026gt; [ {-t | --type-count} ] [--] \u0026lt;InputPath\u0026gt;... OPTIONS -a, --all Requests that all available statistics be calculated -d, --data-types Requests that literal data type usage counts be calculated -g, --graph-sizes Requests that the size of each named graph be counted -h, --help Display help information --input-type \u0026lt;inputType\u0026gt; Specifies whether the input data is a mixture of quads and triples, just quads or just triples. 
Using the most specific data type will yield the most accurate statistics This options value is restricted to the following value(s): mixed quads triples -n, --node-count Requests that node usage counts be calculated --namespaces Requests that namespace usage counts be calculated -o \u0026lt;OutputPath\u0026gt;, --output \u0026lt;OutputPath\u0026gt; Sets the output path -t, --type-count Requests that rdf:type usage counts be calculated -- This option can be used to separate command-line options from the list of argument, (useful when arguments might be mistaken for command-line options) \u0026lt;InputPath\u0026gt; Sets the input path(s) If we wanted to calculate the node count on some data we could do the following:\n\u0026gt; hadoop jar jena-elephas-stats-VERSION-hadoop-job.jar org.apache.jena.hadoop.rdf.stats.RdfStats --node-count --output /example/output /example/input This calculates the node counts for the input data found in /example/input placing the generated counts in /example/output\nSpecifying Inputs and Outputs Inputs are specified simply by providing one or more paths to the data you wish to analyse. You can provide directory paths in which case all files within the directory will be processed.\nTo specify the output location use the -o or --output option followed by the desired output path.\nBy default the demo application assumes a mixture of quads and triples data, if you know your data is only in triples/quads then you can use the --input-type argument followed by triples or quads to indicate the type of your data. Not doing this can skew some statistics as the default is to assume mixed data and so all triples are upgraded into quads when calculating the statistics.\nAvailable Statistics The following statistics are available and are activated by the relevant command line option:\nCommand Line OptionStatisticDescription \u0026 Notes -n or --node-countNode CountCounts the occurrences of each unique RDF term i.e. 
node in Jena parlance -t or --type-countType CountCounts the occurrences of each declared rdf:type value -d or --data-typesData Type CountCounts the occurrences of each declared literal data type --namespacesNamespace CountsCounts the occurrences of namespaces within the data.\nNamespaces are determined by splitting URIs at the # fragment separator if present and if not the last / character -g or --graph-sizesGraph SizesCounts the sizes of each graph declared in the data You can also use the -a or --all option if you simply wish to calculate all statistics.\n","permalink":"","tags":null,"title":"Apache Jena Elephas - RDF Stats Demo"},{"categories":null,"contents":"Apache Jena Fuseki is a SPARQL server. It can run as an operating system service, as a Java web application (WAR file), and as a standalone server.\nFuseki comes in two forms, a single system \u0026ldquo;webapp\u0026rdquo;, combined with a UI for admin and query, and as \u0026ldquo;main\u0026rdquo;, a server suitable to run as part of a larger deployment, including with Docker or running embedded. Both forms use the same core protocol engine and same configuration file format.\nFuseki provides the SPARQL 1.1 protocols for query and update as well as the SPARQL Graph Store protocol.\nFuseki is tightly integrated with TDB to provide a robust, transactional persistent storage layer, and incorporates Jena text query.\nContents Download with UI Getting Started Running Fuseki with UI As a standalone server with UI As a service As a web application Security with Apache Shiro Running Fuseki Server Setup As a Docker container As an embedded SPARQL server Security and data access control Logging Fuseki Configuration Server Statistics and Metrics How to Contribute Client access Use from Java SPARQL Over HTTP - scripts to help with data management. 
Extending Fuseki with Fuseki Modules Links to Standards The Jena users mailing list is the place to get help with Fuseki.\nEmail support lists\nDownload Fuseki with UI Releases of Apache Jena Fuseki can be downloaded from one of the mirror sites:\nJena Downloads\nand previous releases are available from the archive. We strongly recommend that users use the latest official Apache releases of Jena Fuseki in preference to any older versions.\nFuseki download files\nFilename Description apache-jena-fuseki-*VER*.zip Fuseki with UI download jena-fuseki-server The Fuseki Main packaging apache-jena-fuseki-*VER*.zip contains both a war file and an executable jar.\nFuseki Main is also available as a Maven artifact:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-fuseki-main\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Previous releases While previous releases are available, we strongly recommend that wherever possible users use the latest official Apache releases of Jena in preference to using any older versions of Jena.\nDevelopment Builds Regular development builds of all of Jena are available (these are not formal releases) from the Apache snapshots maven repository. This includes packaged builds of Fuseki.\nGetting Started With Fuseki The quick start section serves as a basic guide to getting a Fuseki server running on your local machine.\nSee all the ways to run Fuseki for complete coverage of all the deployment methods for Fuseki.\nHow to Contribute We welcome contributions towards making Jena a better platform for semantic web and linked data applications. 
We appreciate feature suggestions, bug reports and patches for code or documentation.\nSee \u0026ldquo;Getting Involved\u0026rdquo; for ways to contribute to Jena and Fuseki, including patches and making github pull-requests.\nSource code The development codebase is available from git.\nDevelopment builds (not a formal release): SNAPSHOT\nSource code:\nThe Fuseki code is under \u0026ldquo;jena-fuseki2/\u0026rdquo;:\nCode Purpose jena-fuseki-core The Fuseki engine. All SPARQL operations. Fuseki/Main jena-fuseki-main Embedded server and command line jena-fuseki-server Build the combined jar for Fuseki/main server jena-fuseki-docker Build a Docker container based on Fuseki/main Webapp jena-fuseki-webapp Web application and command line startup jena-fuseki-fulljar Build the combined jar for Fuseki/UI server jena-fuseki-war Build the war file for Fuseki/UI server apache-jena-fuseki The download for Fuseki Other jena-fuseki-access Data access control jena-fuseki-geosparql Integration for GeoSPARQL ","permalink":"","tags":null,"title":"Apache Jena Fuseki"},{"categories":null,"contents":"An implementation of the GeoSPARQL 1.0 standard for SPARQL query or API.\nIntegration with Fuseki is provided either by using the GeoSPARQL assembler or using the self-contained original jena-fuseki-geosparql. In either case, this page describes the GeoSPARQL supported features.\nGetting Started GeoSPARQL Jena can be accessed as a library using Maven etc. 
from Maven Central.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-geosparql\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;...\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Features This implementation follows the 11-052r4 OGC GeoSPARQL standard ( The implementation is pure Java and does not require any set-up or configuration of any third party relational databases or geospatial extensions.\nIt implements the six Conformance Classes described in the GeoSPARQL document:\nCore Topology Vocabulary Geometry Extension Geometry Topology RDFS Entailment Extension Query Rewrite Extension The WKT (as described in 11-052r4) and GML 2.0 Simple Features Profile (10-100r3) serialisations are supported. Additional serialisations can be implemented by extending org.apache.jena.geosparql.implementation.datatype.GeometryDatatype and registering with Jena\u0026rsquo;s org.apache.jena.datatypes.TypeMapper.\nAll three spatial relation families are supported: Simple Feature, Egenhofer and RCC8.\nIndexing and caching of spatial objects and relations is performed on-demand during query execution. Therefore, set-up delays should be minimal. Spatial indexing is available based on the STRtree from the JTS library. The STRtree is read-only once built, and contributions of a QuadTree implementation are welcome.\nBenchmarking of the implementation against Strabon and Parliament has found it to be comparable or quicker. The benchmarking used was the Geographical query and dataset (\nAdditional Features The following additional features are also provided:\nGeometry properties are automatically calculated and do not need to be asserted in the dataset. Conversion between EPSG spatial/coordinate reference systems is applied automatically. Therefore, mixed datasets or querying can be applied. This relies upon a local installation of the Apache SIS EPSG dataset, see Key Dependencies. 
Units of measure are automatically converted to the appropriate units for the coordinate reference system. Geometry, transformation and spatial relation results are stored in persistent and configurable time-limited caches to improve response times and reduce recalculations. Dataset conversion between serialisations and spatial/coordinate reference systems. Tabular data can also be loaded, see RDF Tables project ( Functions to test Geometry properties directly on Geometry Literals have been included for convenience. SPARQL Query Configuration Using the library for SPARQL querying requires one line of code. All indexing and caching is performed during query execution and so there should be minimal delay during initialisation. This will register the Property Functions with ARQ query engine and configures the indexes used for time-limited caching.\nThere are three indexes which can be configured independently or switched off. These indexes retain data that may be required again when a query is being executed but may not be required between different queries. Therefore, the memory usage will grow during query execution and then recede as data is not re-used. All the indexes support concurrency and can be set to a maximum size or allowed to increase capacity as required.\nGeometry Literal: Geometry objects following de-serialisation from Geometry Literal. Geometry Transform: Geometry objects resulting from coordinate transformations between spatial reference systems. Query Rewrite: results of spatial relations between Feature and Geometry spatial objects. Testing has found up to 20% improvement in query completion durations using the indexes. 
The indexes can be configured by size, retention duration and frequency of clean up.\nBasic setup with default values: GeoSPARQLConfig.setupMemoryIndex()\nIndexes set to maximum sizes: GeoSPARQLConfig.setupMemoryIndexSize(50000, 50000, 50000)\nIndexes set to remove objects not used after 5 seconds: GeoSPARQLConfig.setupMemoryIndexExpiry(5000, 5000, 5000)\nNo indexes setup (Query rewrite still performed but results not stored) : GeoSPARQLConfig.setupNoIndex()\nNo indexes and no query rewriting: GeoSPARQLConfig.setupNoIndex(false)\nReset indexes and other stored data: GeoSPARQLConfig.reset()\nA variety of configuration methods are provided in org.apache.jena.geosparql.configuration.GeoSPARQLConfig. Caching of frequently used but small quantity data is also applied in several registries, e.g. coordinate reference systems and mathematical transformations.\nExample GeoSPARQL query:\nPREFIX geo: \u0026lt;\u0026gt; SELECT ?obj WHERE{ ?subj geo:sfContains ?obj } ORDER by ?obj Querying Datasets \u0026amp; Models with SPARQL The setup of GeoSPARQL Jena only needs to be performed once in an application. After it is setup querying is performed using Jena\u0026rsquo;s standard query methods.\nTo query a Model with GeoSPARQL or standard SPARQL:\nGeoSPARQLConfig.setupMemoryIndex(); Model model = .....; String query = ....; try (QueryExecution qe = QueryExecution.create(query, model)) { ResultSet rs = qe.execSelect(); ResultSetFormatter.outputAsTSV(rs); } If your dataset needs to be separate from your application and accessed over HTTP then you probably need the GeoSPARQL Assembler to integrate with Fuseki. The GeoSPARQL functionality needs to be setup in the application or Fuseki server where the dataset is located.\nIt is recommended that hasDefaultGeometry properties are included in the dataset to access all functionality. It is necessary that SpatialObject classes are asserted or inferred (i.e. a reasoner with the GeoSPARQL schema is applied) in the dataset. 
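Putting the one-line setup and the standard query execution above together, a minimal end-to-end sketch (assuming the jena-geosparql dependency is on the classpath; data.ttl is a placeholder file name, and the geo: prefix expands to the standard GeoSPARQL namespace):

```java
import org.apache.jena.geosparql.configuration.GeoSPARQLConfig;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class GeoQueryExample {
    public static void main(String[] args) {
        // One-off setup: registers the GeoSPARQL property functions
        // and the default in-memory indexes.
        GeoSPARQLConfig.setupMemoryIndex();

        // Load the data (file name is a placeholder).
        Model model = RDFDataMgr.loadModel("data.ttl");

        // Standard Jena query execution, as shown in this section.
        String query = "PREFIX geo: <http://www.opengis.net/ont/geosparql#> "
                     + "SELECT ?obj WHERE { ?subj geo:sfContains ?obj } ORDER BY ?obj";
        try (QueryExecution qe = QueryExecution.create(query, model)) {
            ResultSet rs = qe.execSelect();
            ResultSetFormatter.outputAsTSV(rs);
        }
    }
}
```

The setup call only needs to happen once per application; all subsequent queries against any model benefit from the registered property functions and indexes.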
Methods to prepare a dataset can be found in org.apache.jena.geosparql.configuration.GeoSPARQLOperations.\nAPI The library can be used as an API in Java. The main class to handle geometries and their spatial relations is the GeometryWrapper. This can be obtained by parsing the string representation of a geometry using the appropriate datatype (e.g. WKT or GML). Alternatively, a Literal can be extracted automatically using the GeometryWrapper.extract() method and registered datatypes. The GeometryWrapperFactory can be used to directly construct a GeometryWrapper. There is overlap between spatial relation families so repeated methods are not specified.\nParse a Geometry Literal: GeometryWrapper geometryWrapper = WKTDatatype.INSTANCE.parse(\u0026quot;POINT(1 1)\u0026quot;);\nExtract from a Jena Literal: GeometryWrapper geometryWrapper = GeometryWrapper.extract(geometryLiteral);\nCreate from a JTS Geometry: GeometryWrapper geometryWrapper = GeometryWrapperFactory.createGeometry(geometry, srsURI, geometryDatatypeURI);\nCreate from a JTS Point Geometry: GeometryWrapper geometryWrapper = GeometryWrapperFactory.createPoint(coordinate, srsURI, geometryDatatypeURI);\nConvert CRS/SRS: GeometryWrapper otherGeometryWrapper = geometryWrapper.convertCRS(\u0026quot;\u0026quot;)\nSpatial Relation: boolean isCrossing = geometryWrapper.crosses(otherGeometryWrapper);\nDE-9IM Intersection Pattern: boolean isRelated = geometryWrapper.relate(otherGeometryWrapper, \u0026quot;TFFFTFFFT\u0026quot;);\nGeometry Property: boolean isEmpty = geometryWrapper.isEmpty();\nThe GeoSPARQL standard specifies that WKT Geometry Literals without an SRS URI are defaulted to CRS84\nKey Dependencies GeoSPARQL The OGC GeoSPARQL standard supports representing and querying geospatial data on the Semantic Web. GeoSPARQL defines a vocabulary for representing geospatial data in RDF, and it defines an extension to the SPARQL query language for processing geospatial data. 
In addition, GeoSPARQL is designed to accommodate systems based on qualitative spatial reasoning and systems based on quantitative spatial computations.\nThe GeoSPARQL standard is based upon the OGC Simple Features standard ( used in relational databases. Modifications and enhancements have been made for usage with RDF and SPARQL. The Simple Features standard, and by extension GeoSPARQL, simplify calculations to Euclidean planar geometry. Therefore, datasets using a geographic spatial/coordinate reference system, which are based on latitude and longitude on an ellipsoid, e.g. WGS84, will have minor error introduced. This error has been deemed acceptable due to the simplification in calculation it offers.\nApache SIS/SIS_DATA Environment Variable Apache Spatial Information System (SIS) is a free software, Java language library for developing geospatial applications. SIS provides data structures for geographic features and associated meta-data along with methods to manipulate those data structures. The library is an implementation of GeoAPI 3.0 interfaces and can be used for desktop or server applications.\nA subset of the EPSG spatial/coordinate reference systems is included by default. The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. 
Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable (\nAn embedded EPSG dataset can be included in a Gradle application by adding the following dependency to build.gradle:\next.sisVersion = \u0026quot;1.1\u0026quot; implementation \u0026quot;org.apache.sis.non-free:sis-embedded-data:$sisVersion\u0026quot; Java Topology Suite The JTS Topology Suite is a Java library for creating and manipulating vector geometry.\nNote The following are implementation points that may be useful during usage.\nGeoSPARQL Schema An RDF/XML schema has been published for the GeoSPARQL v1.0 standard (v1.0.1 - This can be applied to Jena Models (see the inference documentation) to provide RDFS and OWL inferencing on a GeoSPARQL conforming dataset. However, the published schema does not conform with the standard.\nThe property hasDefaultGeometry is missing from the schema and instead the defaultGeometry property is stated.\nThis prevents RDFS inferencing being performed correctly and has been reported to the OGC Standards Tracker. A corrected version of the schema is available in the Resources folder.\nSpatial Relations The GeoSPARQL and Simple Features standard both define the DE-9IM intersection patterns for the three spatial relation families. However, these patterns are not always consistent with the patterns stated by the JTS library for certain relations.\nFor example, GeoSPARQL/Simple Features use TFFFTFFFT equals relations in Simple Feature, Egenhofer and RCC8. However, this does not yield the usually expected result when comparing a pair of point geometries. The Simple Features standard states that the boundary of a point is empty. Therefore, the boundary intersection of two points would also be empty so give a negative comparison result.\nJTS, and other libraries, use the alternative intersection pattern of T*F**FFF*. 
This is a combination of the within and contains relations and yields the expected results for all geometry types.\nThe spatial relations utilised by JTS have been implemented as the extension spatial:equals filter and property functions. A user can also supply their own DE-9IM intersection patterns by using the geof:relate filter function.\nSpatial Relations and Geometry Shapes/Types The spatial relations for the three spatial families do not apply to all combinations of the geometry shapes (Point, LineString, Polygon) and their collections (MultiPoint, MultiLineString, MultiPolygon). Therefore, some queries may not produce all the results that may initially be expected.\nSome examples are:\nIn some relations there may only be results when a collection of shapes is being used, e.g. two multi-points can overlap but two points cannot. A relation may only apply for one combination but not its reciprocal, e.g. a line may cross a polygon but a polygon may not cross a line. The RCC8 family only applies to Polygon and MultiPolygon types. Refer to pages 8-10 of 11-052r4 GeoSPARQL standard for more details.\nEquals Relations The three equals relations (sfEquals, ehEquals and rccEquals) use spatial equality and not lexical equality. Therefore, some comparisons using these relations may not be as expected.\nThe JTS description of sfEquals is:\nTrue if two geometries have at least one point in common and no point of either geometry lies in the exterior of the other geometry. Therefore, two empty geometries will return false as they are not spatially equal. Shapes which differ in the number of points but have the same geometry are equal and will return true.\ne.g. LINESTRING (0 0, 0 10) and LINESTRING (0 0, 0 5, 0 10) are spatially equal.\nQuery Rewrite Extension The Query Rewrite Extension provides for simpler querying syntax. Feature and Geometry can be used in spatial relations without needing the relations to be asserted in the dataset. 
This also means the Geometry Literal does not need to be specified in the query. In the case of Features this requires the hasDefaultGeometry property to be used in the dataset.\nThis means the query:\n?subj geo:hasDefaultGeometry ?subjGeom . ?subjGeom geo:hasSerialization ?subjLit . ?obj geo:hasDefaultGeometry ?objGeom . ?objGeom geo:hasSerialization ?objLit . FILTER(geof:sfContains(?subjLit, ?objLit)) becomes:\n?subj geo:sfContains ?obj . Methods are available to apply the hasDefaultGeometry property to every Geometry with a single hasGeometry property, see org.apache.jena.geosparql.configuration.GeoSPARQLOperations.\nDepending upon the spatial relation, queries may include the specified Feature and Geometry in the results. e.g. FeatureA is bound in a query on a dataset only containing FeatureA and GeometryA. The results FeatureA and GeometryA are returned rather than no results. Therefore, filtering using FILTER(!sameTerm(?subj, ?obj)) etc. may be needed in some cases. The query rewrite functionality can be switched off in the library configuration, see org.apache.jena.geosparql.configuration.GeoSPARQLConfig.\nEach dataset is assigned a Query Rewrite Index to store the results of previous tests. There is the potential that relations are tested multiple times in a query (i.e. Feature-Feature, Feature-Geometry, Geometry-Geometry, Geometry-Feature). Therefore, it is useful to retain the results for at least a short period of time.\nIterating through all combinations of spatial relations for a dataset containing n Geometry Literals will produce 27n^2 true/false results (asserting the true result statements in a dataset would be a subset). Control is given on a dataset basis to allow choice in when and how storage of rewrite results is applied, e.g. store all found results on a small dataset but on demand for a large dataset.\nThis index can be configured on a global and individual dataset basis for the maximum size and duration until unused items are removed. 
Query rewriting can be switched on independently of the indexes, i.e. query rewriting can be performed but an index is configured to not store the result.\nAs an extension to the standard, supplying a Geometry Literal is also permitted. For example:\n?subj geo:sfContains \u0026quot;POINT(0 0)\u0026quot;^^geo:wktLiteral . Dataset Conversion Methods to convert datasets between serialisations and spatial/coordinate reference systems are available in: org.apache.jena.geosparql.configuration.GeoSPARQLOperations\nThe following list shows some of the operations that can be performed. Once these operations have been performed they can be serialised to file or stored in a Jena TDB to remove the need to reprocess.\nLoad a Jena Model from file: Model dataModel = RDFDataMgr.loadModel(\u0026quot;data.ttl\u0026quot;);\nConvert Feature-GeometryLiteral to the GeoSPARQL Feature-Geometry-GeometryLiteral structure: Model geosparqlModel = GeoSPARQLOperations.convertGeometryStructure(dataModel);\nConvert Feature-Lat, Feature-Lon Geo predicates to the GeoSPARQL Feature-Geometry-GeometryLiteral structure, with option to remove Geo predicates: Model geosparqlModel = GeoSPARQLOperations.convertGeoPredicates(dataModel, true);\nAssert additional hasDefaultGeometry statements for single hasGeometry triples, used in Query Rewriting: GeoSPARQLOperations.applyDefaultGeometry(geosparqlModel);\nConvert Geometry Literals to the WGS84 spatial reference system and WKT datatype: Model model = GeoSPARQLOperations.convert(geosparqlModel, \u0026quot;\u0026quot;, \u0026quot;\u0026quot;);\nApply GeoSPARQL schema with RDFS inferencing and assert additional statements in the Model: GeoSPARQLOperations.applyInferencing(model);\nApply commonly used GeoSPARQL prefixes for URIs to the model: GeoSPARQLOperations.applyPrefixes(model);\nCreate Spatial Index for a Model within a Dataset for spatial querying: Dataset dataset = SpatialIndex.wrapModel(model);\nOther operations are available and can be applied to a 
Dataset containing multiple Models and in some cases files and folders. These operations do not configure and set up the GeoSPARQL functions or indexes that are required for querying.\nSpatial Index A Spatial Index can be created to improve searching of a dataset. The Spatial Index is expected to be unique to the dataset and should not be shared between datasets. Once built, the Spatial Index cannot have additional items added to it.\nA Spatial Index is required for the jena-spatial property functions and is optional for the GeoSPARQL spatial relations. Only a single SRS can be used for a Spatial Index and it is recommended that datasets are converted to a single SRS, see GeoSPARQLOperations.\nSetting up a Spatial Index can be done through org.apache.jena.geosparql.configuration.GeoSPARQLConfig. Additional methods for building, loading and saving Spatial Indexes are provided in org.apache.jena.geosparql.spatial.SpatialIndex.\nUnits URI Spatial/coordinate reference systems use a variety of measuring systems for defining distances. These can be specified using a URI identifier, as either URL or URN, with conversion undertaken automatically as required. It should be noted that there is error inherent in spatial reference systems and some variation in values may occur between different systems.\nThe following table gives some examples of units that are supported (additional units can be added to the UnitsRegistry using the javax.measure.Unit API). These URIs are all in the namespace and here use the prefix units.\nURI Description units:kilometre or units:kilometer Kilometres units:metre or units:meter Metres units:mile or units:statuteMile Miles units:degree Degrees units:radian Radians Full listing of default Units can be found in org.apache.jena.geosparql.implementation.vocabulary.Unit_URI.\nGeography Markup Language Support (GML) The supported GML profile is GML 2.0 Simple Features Profile (10-100r3), which is a profile of GML 3.2.1 (07-036r1). 
The profile restricts the geometry shapes permitted in GML 3.2.1 to a subset, see 10-100r3 page 22. The profile supports the Point, LineString and Polygon shapes used in WKT. There are also additional shape serialisations available in the profile that do not exist in WKT or JTS to provide simplified representations which would otherwise use LineStrings or Polygons. Curves can be described by LineStringSegment, Arc, Circle and CircleByCenterPoint. Surfaces can be formed similarly to Polygons or using Curves. These additional shapes can be read as part of a dataset or query but will not be produced if the SRS of the shape is transformed; instead, a LineString or Polygon representation will be produced.\nDetails of the GML structure for these shapes can be found in the geometryPrimitives.xsd, geometryBasic0d1d.xsd, geometryBasic2d.xsd and geometryAggregates.xsd schemas.\nThe labelling of collections is as follows:\nCollection Geometry MultiPoint Point MultiCurve LineString, Curve MultiSurface Polygon, Surface MultiGeometry Point, LineString, Curve, Polygon, Surface Apache Jena Spatial Functions/WGS84 Geo Predicates The jena-spatial module contains several SPARQL functions for querying datasets using the WGS84 Geo predicates for latitude ( and longitude ( These jena-spatial functions are supported for both Geo predicates and Geometry Literals, i.e. a GeoSPARQL dataset. Additional SPARQL filter functions have been provided to convert Geo predicate properties into WKT strings and calculate Great Circle and Euclidean distances. The jena-spatial functions require setting up a Spatial Index for the target Dataset, e.g. GeoSPARQLConfig.setupSpatialIndex(dataset);, see Spatial Index section.\nSupported Features The Geo predicate form of spatial representation is restricted to only \u0026lsquo;Point\u0026rsquo; shapes in the WGS84 spatial/coordinate reference system. 
The Geo predicates are properties of the Feature and do not use the properties and structure of the GeoSPARQL standard, including Geometry Literals. Methods are available to convert datasets from Geo predicates to GeoSPARQL structure, see: org.apache.jena.geosparql.configuration.GeoSPARQLOperations\nThe spatial relations and query re-writing of GeoSPARQL outlined previously has been implemented for Geo predicates. However, only certain spatial relations are valid for Point to Point relationships. Refer to pages 8-10 of 11-052r4 GeoSPARQL standard for more details.\nGeo predicates can be converted to Geometry Literals in query and then used with the GeoSPARQL filter functions.\n?subj wgs:lat ?lat . ?subj wgs:long ?lon . BIND(spatialF:convertLatLon(?lat, ?lon) as ?point) . #Coordinate order is Lon/Lat without stated SRS URI. BIND(\u0026quot;POLYGON((...))\u0026quot;^^\u0026lt;\u0026gt; AS ?box) . FILTER(geof:sfContains(?box, ?point)) Alternatively, utilising more shapes, relations and spatial reference systems can be achieved by converting the dataset to the GeoSPARQL structure.\n?subj geo:hasGeometry ?geom . ?geom geo:hasSerialization ?geomLit . #Coordinate order is Lon/Lat without stated SRS URI. BIND(\u0026quot;POLYGON((...))\u0026quot;^^\u0026lt;\u0026gt; AS ?box) . FILTER(geof:sfContains(?box, ?geomLit)) Datasets can contain both Geo predicates and Geometry Literals without interference. However, a dataset containing both types will only examine those Features which have Geometry Literals for spatial relations, i.e. the check for Geo predicates is a fallback when Geometry Literals aren\u0026rsquo;t found. 
Therefore, it is not recommended to insert new Geo predicate properties after a dataset has been converted to GeoSPARQL structure (unless corresponding Geometry and Geometry Literals are included).\nFilter Functions These filter functions are available in the namespace and here use the prefix spatialF.\nFunction Name Description ?wktString spatialF:convertLatLon(?lat, ?lon) Converts Lat and Lon double values into WKT string of a Point with WGS84 SRS. ?wktString spatialF:convertLatLonBox(?latMin, ?lonMin, ?latMax, ?lonMax) Converts Lat and Lon double values into WKT string of a Polygon forming a box with WGS84 SRS. ?boolean spatialF:equals(?geomLit1, ?geomLit2) True, if geomLit1 is spatially equal to geomLit2. ?boolean spatialF:nearby(?geomLit1, ?geomLit2, ?distance, ?unitsURI) True, if geomLit1 is within distance of geomLit2 using the distance units. ?boolean spatialF:withinCircle(?geomLit1, ?geomLit2, ?distance, ?unitsURI) True, if geomLit1 is within distance of geomLit2 using the distance units. ?radians spatialF:angle(?x1, ?y1, ?x2, ?y2) Angle clockwise from y-axis from Point(x1,y1) to Point (x2,y2) in 0 to 2π radians. ?degrees spatialF:angleDeg(?x1, ?y1, ?x2, ?y2) Angle clockwise from y-axis from Point(x1,y1) to Point (x2,y2) in 0 to 360 degrees. ?distance spatialF:distance(?geomLit1, ?geomLit2, ?unitsURI) Distance between two Geometry Literals in distance units. Chooses distance measure based on SRS type. Great Circle distance for Geographic SRS and Euclidean otherwise. ?radians spatialF:azimuth(?lat1, ?lon1, ?lat2, ?lon2) Forward azimuth clockwise from North between two Lat/Lon Points in 0 to 2π radians. ?degrees spatialF:azimuthDeg(?lat1, ?lon1, ?lat2, ?lon2) Forward azimuth clockwise from North between two Lat/Lon Points in 0 to 360 degrees. ?distance spatialF:greatCircle(?lat1, ?lon1, ?lat2, ?lon2, ?unitsURI) Great Circle distance (Vincenty formula) between two Lat/Lon Points in distance units. 
?distance spatialF:greatCircleGeom(?geomLit1, ?geomLit2, ?unitsURI) Great Circle distance (Vincenty formula) between two Geometry Literals in distance units. Use from GeoSPARQL standard for Euclidean distance. ?geomLit2 spatialF:transform(?geomLit1, ?datatypeURI, ?srsURI) Transform Geometry Literal by Datatype and SRS. ?geomLit2 spatialF:transformDatatype(?geomLit1, ?datatypeURI) Transform Geometry Literal by Datatype. ?geomLit2 spatialF:transformSRS(?geomLit1, ?srsURI) Transform Geometry Literal by SRS. Property Functions These property functions are available in the namespace and here use the prefix spatial. This is the same namespace as the jena-spatial functions utilise and these form direct replacements. The subject Feature may be bound, to test the pattern is true, or unbound, to find all cases the pattern is true. These property functions require a Spatial Index to be setup for the dataset.\nThe optional ?limit parameter restricts the number of results returned. The default value is -1 which returns all results. No guarantee is given for ordering of results. The optional ?unitsURI parameter specifies the units of a distance. The default value is kilometres through the string or resource\nThe spatial:equals property function behaves the same way as the main GeoSPARQL property functions. Either, both or neither of the subject and object can be bound. A Spatial Index is not required for the dataset with the spatial:equals property function.\nFunction Name Description ?spatialObject1 spatial:equals ?spatialObject2 Find spatialObjects (i.e. features or geometries) that are spatially equal. ?feature spatial:intersectBox(?latMin ?lonMin ?latMax ?lonMax [ ?limit]) Find features that intersect the provided box, up to the limit. ?feature spatial:intersectBoxGeom(?geomLit1 ?geomLit2 [ ?limit]) Find features that intersect the provided box, up to the limit. 
?feature spatial:withinBox(?latMin ?lonMin ?latMax ?lonMax [ ?limit]) Find features that are within the provided box, up to the limit. ?feature spatial:withinBoxGeom(?geomLit1 ?geomLit2 [ ?limit]) Find features that are within the provided box, up to the limit. ?feature spatial:nearby(?lat ?lon ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:nearbyGeom(?geomLit ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:withinCircle(?lat ?lon ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. ?feature spatial:withinCircleGeom(?geomLit ?radius [ ?unitsURI [ ?limit]]) Find features that are within radius of the distance units, up to the limit. The Cardinal Functions find all Features that are present in the specified direction. In Geographic spatial reference systems (SRS), e.g. WGS84 and CRS84, the East/West directions wrap around. Therefore, a search is made from the shape\u0026rsquo;s edge for up to half the range of the SRS (i.e. 180 degrees in WGS84) and will continue across the East/West boundary if necessary. In other SRS, e.g. projected onto a flat plane, the East/West check is made from the shape\u0026rsquo;s edge to the farthest limit of the SRS range, i.e. there is no wrap around.\nCardinal Function Name Description ?feature spatial:north(?lat ?lon [ ?limit]) Find features that are North of the Lat/Lon point (point to +90 degrees), up to the limit. ?feature spatial:northGeom(?geomLit [ ?limit]) Find features that are North of the Geometry Literal, up to the limit. ?feature spatial:south(?lat ?lon [ ?limit]) Find features that are South of the Lat/Lon point (point to -90 degrees), up to the limit. ?feature spatial:southGeom(?geomLit [ ?limit]) Find features that are South of the Geometry Literal, up to the limit. 
?feature spatial:east(?lat ?lon [ ?limit]) Find features that are East of the Lat/Lon point (point plus 180 degrees longitude, wrapping round), up to the limit. ?feature spatial:eastGeom(?geomLit [ ?limit]) Find features that are East of the Geometry Literal, up to the limit. ?feature spatial:west(?lat ?lon [ ?limit]) Find features that are West of the Lat/Lon point (point minus 180 degrees longitude, wrapping round), up to the limit. ?feature spatial:westGeom(?geomLit [ ?limit]) Find features that are West of the Geometry Literal, up to the limit. Geometry Property Filter Functions The GeoSPARQL standard provides a set of properties related to geometries, see Section 8.4. These are applied on the Geometry resource and are automatically determined if not asserted in the data. However, it may be necessary to retrieve the properties of a Geometry Literal directly without an associated Geometry resource. Filter functions to do this have been included as part of the namespace as a minor variation to the GeoSPARQL standard. The relevant functions using the geof prefix are:\nGeometry Property Filter Function Name Description ?integer geof:dimension(?geometryLiteral) Topological dimension, e.g. 0 for Point, 1 for LineString and 2 for Polygon. ?integer geof:coordinateDimension(?geometryLiteral) Coordinate dimension, e.g. 2 for XY coordinates and 4 for XYZM coordinates. ?integer geof:spatialDimension(?geometryLiteral) Spatial dimension, e.g. 2 for XY coordinates and 3 for XYZM coordinates. ?boolean geof:isEmpty(?geometryLiteral) True, if geometry is empty. ?boolean geof:isSimple(?geometryLiteral) True, if geometry is simple. ?boolean geof:isValid(?geometryLiteral) True, if geometry is topologically valid. A dataset that follows the GeoSPARQL Feature-Geometry-GeometryLiteral can have simpler SPARQL queries without needing to use these functions by taking advantage of the Query Rewriting functionality. 
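To illustrate, here is a sketch of a query using these geometry property filter functions. The prefix URIs are the standard GeoSPARQL ones; the data pattern via geo:hasSerialization is an assumption about the dataset, not part of the text above.

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

# For each Geometry Literal, keep only valid, non-empty geometries and
# report the topological dimension (0=Point, 1=LineString, 2=Polygon).
SELECT ?geomLit ?dim
WHERE {
  ?geom geo:hasSerialization ?geomLit .
  FILTER(geof:isValid(?geomLit) && !geof:isEmpty(?geomLit))
  BIND(geof:dimension(?geomLit) AS ?dim)
}
```

The functions apply directly to the literal, so no Geometry resource with asserted property triples is needed.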
The geof:isValid filter function and geo:isValid property for a Geometry resource are not part of the GeoSPARQL standard but have been included as a minor variation.\nFuture Work Implementing GeoJSON as a GeometryLiteral serialisation ( Producing GeoJSON is already possible with geof:asGeoJSON(?geometryLiteral). Contributors The following individuals have made contributions to this project:\nGreg Albiston Haozhe Chen Taha Osman Why Use This Implementation? There are several implementations of the GeoSPARQL standard. The conformance and completeness of these implementations is difficult to ascertain and varies between features.\nHowever, the following may be of interest when considering whether to use this implementation based on reviewing several alternatives.\nThis Implementation Other Implementations Implements all six components of the GeoSPARQL standard. Generally partially implement the Geometry Topology and Geometry Extensions. Do not implement the Query Rewrite Extension. Pure Java and does not require a supporting relational database. Configuration requires a single line of code (although Apache SIS may need some setting up, see above). Require setting up a database, configuring a geospatial extension and setting environment variables. Uses Jena, which conforms to the W3C standards for RDF and SPARQL. New versions of the standards will quickly feed through. Not fully RDF and SPARQL compliant, e.g. RDFS/OWL inferencing or SPARQL syntax. Adding your own schema may not produce inferences. Automatically determines geometry properties and handles mixed cases of units or coordinate reference systems. The GeoSPARQL standard suggests this approach but does not require it. Tend to produce errors or no results in these situations. Performs indexing and caching on-demand which reduces set-up time and only performs calculations that are required. 
Perform indexing in the data loading phase and initialisation phase, which can lead to lengthy delays (even on relatively small datasets). Uses JTS which does not truncate coordinate precision and applies spatial equality. May truncate coordinate precision and apply lexical equality, which is quicker but does not comply with the GeoSPARQL standard. ","permalink":"","tags":null,"title":"Apache Jena GeoSPARQL"},{"categories":null,"contents":"Jena has an initialization sequence that is used to setup components available at runtime.\nApplication code is welcome to also use this mechanism. This must be done with care. During Jena initialization, there can be visibility of uninitialized data in class static members.\nThe standard initialization sequence is\nCore -\u0026gt; RIOT -\u0026gt; ARQ -\u0026gt; TDB -\u0026gt; other (including jena text)\nThe sequence from 0 to level 500 is the Jena platform initialization. Application may use the jena initialization mechanism and it is recommended to place initialization code above level 500.\nInitialization occurs when JenaSystem.init() is first called. Jena ensures that this is done when the application first uses any Jena code by using class initializers.\nApplication can call JenaSystem.init().\nSee notes on repacking Jena code for how to deal with ServiceLoader files in repacked jars.\nInitialization code Initialization code is an implementation of JenaSubsystemLifecycle which itself extends SubsystemLifecycle.\nFor use in the default initialization, the class must have a zero-argument constructor and implement:\npublic interface JenaSubsystemLifecycle { public void start() ; public void stop() ; default public int level() { return 9999 ; } } The code should supply a level, indicating its place in the order of initialization. 
The levels used by Jena are:\n0 - reserved 10 - Used by jena-core 15 - CLI Commands registry 20 - RIOT 30 - ARQ 40 - Text indexing 40 - TDB1 42 - TDB2 60 - Additional HTTP configuration 60 - RDFPatch 96 - SHACL 96 - ShEx 101 - Fuseki 9999 - Default. Levels up to 500 are considered to be \u0026ldquo;Jena system level\u0026rdquo;, Application code should use level above 500.\nFuseki initialization includes Fuseki Modules which uses SubsystemLifecycle with a different Java interface.\nThe Initialization Process The process followed by JenaSystem.init() is to load all java ServiceLoader registered JenaSubsystemLifecycle, sort into level order, then call init on each initialization object. Initialization code at the same level may be called in any order and that order may be different between runs.\nOnly the first call of JenaSystem.init() causes the process to run. Any subsequent calls are cheap, so calling JenaSystem.init() when in doubt about the initialization state is safe.\nOverlapping concurrent calls to JenaSystem.init() are thread-safe. On a return from JenaSystem.init(), Jena has been initialized at some point.\nDebugging There is a flag JenaSystem.DEBUG_INIT to help with development. 
It is not intended for runtime logging.\nJena components print their initialization beginning and end points on System.err to help track down ordering issues.\n","permalink":"","tags":null,"title":"Apache Jena Initialization"},{"categories":null,"contents":"In first name alphabetical order:\nAaron Coburn (acoburn) C Adam Soroka (ajs6f) CP Andy Seaborne (andy) CP VP Bruno Kinoshita (kinow) CP Chris Dollin (chrisdollin) CP Chris Tomlinson (codeferret) CP Claude Warren (claude) CP Damian Steer (damian) CP Dave Reynolds (der) CP Ian Dickinson (ijd) CP Lorenz Buehmann (lbuehmann) C Osma Suominen (osma) CP Paolo Castagna (castagna) CP Rob Vesse (rvesse) CP Stephen Allen (sallen) CP Ying Jiang (jpz6311whu) C Emeritus and Mentors:\nBenson Margulies C Dave Johnson Leo Simons Ross Gardler Key C a committer P a PMC member VP project chair and Apache Foundation Vice-President ","permalink":"","tags":null,"title":"Apache Jena project team members"},{"categories":null,"contents":"Apache Jena is packaged as downloads which contain the most commonly used portions of the systems:\napache-jena – contains the APIs, SPARQL engine, the TDB native RDF database and command line tools apache-jena-fuseki – the Jena SPARQL server Jena4 requires Java 11.\nJena jars are available from Maven.\nYou may verify the authenticity of artifacts below by using the PGP KEYS file.\nApache Jena Release Source release: this forms the official release of Apache Jena. All binaries artifacts and maven binaries correspond to this source.\nApache Jena Release SHA512 Signature SHA512 PGP Apache Jena Binary Distributions The binary distribution of the Fuseki server:\nApache Jena Fuseki SHA512 Signature apache-jena-fuseki-4.9.0.tar.gz SHA512 PGP SHA512 PGP \u0026nbsp;\nThe binary distribution of libraries contains the APIs, SPARQL engine, the TDB native RDF database and a variety of command line scripts and tools for working with these systems. 
Apache Jena libraries SHA512 Signature apache-jena-4.9.0.tar.gz SHA512 PGP SHA512 PGP \u0026nbsp;\nThe binary distribution of Fuseki as a WAR file: Apache Jena Fuseki SHA512 Signature jena-fuseki-war-4.9.0.war SHA512 PGP Individual Modules Apache Jena publishes a range of modules beyond those included in the binary distributions (code for all modules may be found in the source distribution).\nIndividual modules may be obtained using a dependency manager which can talk to Maven repositories, some modules are only available via Maven.\nMaven See \u0026ldquo;Using Jena with Apache Maven\u0026rdquo; for full details.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;apache-jena-libs\u0026lt;/artifactId\u0026gt; \u0026lt;type\u0026gt;pom\u0026lt;/type\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Source code The development codebase is available from git.\n\nThis is also available on github:\n\nPrevious releases While previous releases are available, we strongly recommend that wherever possible users use the latest official Apache releases of Jena in preference to using any older versions of Jena.\nPrevious Apache Jena releases can be found in the Apache archive area at\nDownload Source The Apache Software foundation uses CDN-distribution for Apache projects and the current release of Jena.\n","permalink":"","tags":null,"title":"Apache Jena Releases"},{"categories":null,"contents":" The Apache Jena SDB module has been retired and is no longer supported. The last release of Jena with this module was Apache Jena 3.17.0. 
 The original documentation.\n","permalink":"","tags":null,"title":"Apache Jena SDB - persistent triple stores using relational databases"},{"categories":null,"contents":"jena-shacl is an implementation of the W3C Shapes Constraint Language (SHACL). It implements SHACL Core and SHACL SPARQL Constraints.\nIn addition, it provides:\nSHACL Compact Syntax SPARQL-based targets Command line The command shacl introduces shacl operations; it takes a sub-command argument.\nTo validate:\nshacl validate --shapes SHAPES.ttl --data DATA.ttl shacl v -s SHAPES.ttl -d DATA.ttl The shapes and data files can be the same; the --shapes is optional and defaults to the same as --data. This includes running individual W3C Working Group tests.\nTo parse a file:\nshacl parse FILE shacl p FILE which writes out a text format.\nshacl p --out=FMT FILE writes out in text(t), compact(c), rdf(r) formats. Multiple formats can be given, separated by \u0026ldquo;,\u0026rdquo; and format all outputs all 3 formats.\nIntegration with Apache Jena Fuseki Fuseki has a new service operation fuseki:shacl:\n\u0026lt;#serviceWithShacl\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026#34;Dataset with SHACL validation\u0026#34; ; fuseki:name \u0026#34;ds\u0026#34; ; fuseki:serviceReadWriteGraphStore \u0026#34;\u0026#34; ; fuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026#34;shacl\u0026#34; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . 
This requires a \u0026ldquo;new style\u0026rdquo; endpoint declaration: see \u0026ldquo;Fuseki Endpoint Configuration\u0026rdquo;.\nThis is not installed into a dataset setup by default; a configuration file using\nfuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026#34;shacl\u0026#34; ]; is necessary (or programmatic setup for Fuseki Main).\nThe service accepts a shapes graph posted as RDF to /ds/shacl with content negotiation.\nThere is a graph argument, ?graph=, that specifies the graph to validate. It is the URI of a named graph, default for the unnamed, default graph (and this is the assumed value of ?graph if not present), or union for union of all named graphs in the dataset.\nFurther, an argument target=uri validates a specific node in the data.\nUpload data in file fu-data.ttl:\ncurl -XPOST --data-binary @fu-data.ttl \\ --header \u0026#39;Content-type: text/turtle\u0026#39; \\ \u0026#39;http://localhost:3030/ds?default\u0026#39; Validate with shapes in fu-shapes.ttl and get back a validation report:\ncurl -XPOST --data-binary @fu-shapes.ttl \\ --header \u0026#39;Content-type: text/turtle\u0026#39; \\ \u0026#39;http://localhost:3030/ds/shacl?graph=default\u0026#39; API The package org.apache.jena.shacl has the main classes.\nShaclValidator for parsing and validation GraphValidation for updating graphs with validation API Examples\nExample Shacl01_validateGraph shows validation and printing of the validation report in a text form and in RDF:\npublic static void main(String ...args) { String SHAPES = \u0026#34;shapes.ttl\u0026#34;; String DATA = \u0026#34;data1.ttl\u0026#34;; Graph shapesGraph = RDFDataMgr.loadGraph(SHAPES); Graph dataGraph = RDFDataMgr.loadGraph(DATA); Shapes shapes = Shapes.parse(shapesGraph); ValidationReport report = ShaclValidator.get().validate(shapes, dataGraph); ShLib.printReport(report); System.out.println(); RDFDataMgr.write(System.out, report.getModel(), Lang.TTL); } Example Shacl02_validateTransaction shows how to 
update a graph only if, after the changes, the graph is validated according to the shapes provided.\nSHACL Compact Syntax Apache Jena supports SHACL Compact Syntax (SHACL-C) for both reading and writing.\nThe file extensions for SHACL-C are .shc and .shaclc and there is a registered language constant Lang.SHACLC.\nGraph shapesGraph = RDFDataMgr.loadGraph(\u0026#34;shapes.shc\u0026#34;); RDFDataMgr.read(shapesGraph, \u0026#34;file:compactShapes\u0026#34;, Lang.SHACLC); RDFDataMgr.write(System.out, shapesGraph, Lang.SHACLC); SHACL-C is managed by the SHACL Community Group. It does not cover all possible shapes. When outputting SHACL-C, SHACL shapes not expressible in SHACL-C will cause an exception and data in the RDF graph that is not relevant will not be output. In other words, SHACL-C is a lossy format for RDF.\nThe Jena SHACL-C writer will output any valid SHACL-C document.\nExtensions:\nThe constraint grammar rule allows a shape reference to a node shape. The propertyParam grammar rule provides \u0026ldquo;group\u0026rdquo;, \u0026ldquo;order\u0026rdquo;, \u0026ldquo;name\u0026rdquo;, \u0026ldquo;description\u0026rdquo; and \u0026ldquo;defaultValue\u0026rdquo; to align with nodeParam. The nodeParam grammar rule supports \u0026ldquo;targetClass\u0026rdquo; (normally written with the shorthand -\u0026gt;) as well as the defined \u0026ldquo;targetNode\u0026rdquo;, \u0026ldquo;targetObjectsOf\u0026rdquo;, \u0026ldquo;targetSubjectsOf\u0026rdquo;.\nSPARQL-based targets SPARQL-based targets allow the target nodes to be calculated with a SPARQL SELECT query.\nSee SPARQL-based targets for details.\nex:example sh:target [ a sh:SPARQLTarget ; sh:select \u0026#34;\u0026#34;\u0026#34; SELECT ?this WHERE { ... 
} \u0026#34;\u0026#34;\u0026#34; ; ] ; ValidationListener When given a ValidationListener the SHACL validation code emits events at each step of validation:\nwhen validation of a shape starts or finishes when the focus nodes of the shape have been identified when validation of a constraint begins, ends and yields positive or negative results For example, the following listener will just record all events in a List:\npublic class RecordingValidationListener implements ValidationListener { private final List\u0026lt;ValidationEvent\u0026gt; events = new ArrayList\u0026lt;\u0026gt;(); @Override public void onValidationEvent(ValidationEvent e) { events.add(e); } public List\u0026lt;ValidationEvent\u0026gt; getEvents() { return events; } } The listener must be passed to the constructor of the ValidationContext. The following example validates the dataGraph according to the shapesGraph using the ValidationListener above:\nGraph shapesGraph = RDFDataMgr.loadGraph(shapesGraphUri); //assuming shapesGraphUri points to an RDF file Graph dataGraph = RDFDataMgr.loadGraph(dataGraphUri); //assuming dataGraphUri points to an RDF file RecordingValidationListener listener = new RecordingValidationListener(); // see above Shapes shapes = Shapes.parse(shapesGraph); ValidationContext vCtx = ValidationContext.create(shapes, dataGraph, listener); // pass listener here for (Shape shape : shapes.getTargetShapes()) { Collection\u0026lt;Node\u0026gt; focusNodes = VLib.focusNodes(dataGraph, shape); for (Node focusNode : focusNodes) { VLib.validateShape(vCtx, dataGraph, shape, focusNode); } } List\u0026lt;ValidationEvent\u0026gt; actualEvents = listener.getEvents(); // all events have been recorded The events thus generated might look like this (event.toString(), one per line):\nFocusNodeValidationStartedEvent{focusNode=, shape=NodeShape[]} ConstraintEvaluationForNodeShapeStartedEvent{constraint=ClassConstraint[\u0026lt;\u0026gt;], focusNode=, shape=NodeShape[]} 
ConstraintEvaluatedOnFocusNodeEvent{constraint=ClassConstraint[\u0026lt;\u0026gt;], focusNode=, shape=NodeShape[], valid=true} ConstraintEvaluationForNodeShapeFinishedEvent{constraint=ClassConstraint[\u0026lt;\u0026gt;], focusNode=, shape=NodeShape[]} FocusNodeValidationFinishedEvent{focusNode=, shape=NodeShape[]} [...] Many use cases can be addressed with the HandlerBasedValidationListener, which allows for registering event handlers on a per-event basis. For example:\nValidationListener myListener = HandlerBasedValidationListener .builder() .forEventType(FocusNodeValidationStartedEvent.class) .addSimpleHandler(e -\u0026gt; { // ... }) .forEventType(ConstraintEvaluatedEvent.class) .addHandler(c -\u0026gt; c .iff(EventPredicates.isValid()) // use a Predicate\u0026lt;ValidationEvent\u0026gt; to select events .handle(e -\u0026gt; { // ... }) ) .build(); ","permalink":"","tags":null,"title":"Apache Jena SHACL"},{"categories":null,"contents":"jena-shex is an implementation of the ShEx (Shape Expressions) language.\nStatus jena-shex reads ShExC (the compact syntax) files.\nNot currently supported:\nsemantic actions EXTERNAL Blank node label validation is meaningless in Jena because a blank node label is scoped to the file, and not retained after the file has been read.\nCommand line The command shex introduces ShEx operations; it takes a sub-command argument.\nTo validate:\nshex validate --schema SCHEMA.shex --map MAP.smap --data DATA.ttl shex v -s SCHEMA.shex -m MAP.smap -d data.ttl To parse a file:\nshex parse FILE shex p FILE which writes out the parser results in a text format.\nAPI The package org.apache.jena.shex has the main classes.\nShex for reading ShEx related formats. ShexValidator for validation.
API Examples Examples:\n\npublic static void main(String ...args) { String SHAPES = \u0026#34;examples/schema.shex\u0026#34;; String SHAPES_MAP = \u0026#34;examples/shape-map.shexmap\u0026#34;; String DATA = \u0026#34;examples/data.ttl\u0026#34;; System.out.println(\u0026#34;Read data\u0026#34;); Graph dataGraph = RDFDataMgr.loadGraph(DATA); System.out.println(\u0026#34;Read schema\u0026#34;); ShexSchema shapes = Shex.readSchema(SHAPES); // Shapes map. System.out.println(\u0026#34;Read shapes map\u0026#34;); ShapeMap shapeMap = Shex.readShapeMap(SHAPES_MAP); // ShexReport System.out.println(\u0026#34;Validate\u0026#34;); ShexReport report = ShexValidator.get().validate(dataGraph, shapes, shapeMap); System.out.println(); // Print report. ShexLib.printReport(report); } ","permalink":"","tags":null,"title":"Apache Jena ShEx"},{"categories":null,"contents":"Jump to the \u0026ldquo;Changes\u0026rdquo; section.\nOverview The SPARQL specifications provide query, update and the graph store protocol (GSP). In addition, Jena provided store operations for named graph formats.\nFor working with RDF data:\nAPI GPI Model Graph Statement Triple Resource Node Literal Node String Var Dataset DatasetGraph Quad and for SPARQL,\nAPI GPI RDFConnection RDFLink QueryExecution QueryExec UpdateExecution UpdateExec ResultSet RowSet ModelStore GSP ModelStore DSP Jena provides a single interface, RDFConnection for working with local and remote RDF data using these protocols in a unified way. This is most useful for remote data because the setup to connect is more complicated and can be done once and reused.\nHTTP authentication support is provided, supporting both basic and digest authentication in challenge-response scenarios. 
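As a sketch of how challenge-response authentication can be wired up with the JDK HTTP client (the client library the modern Jena HTTP code builds on), a java.net.Authenticator can supply credentials when the server issues a challenge. The class name, user and password below are placeholders for illustration only:

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;

public class BasicAuthClientSketch {
    // Build a JDK HttpClient that answers HTTP basic-auth challenges.
    // The credentials are hypothetical - substitute real ones in an application.
    static HttpClient build(String user, String password) {
        return HttpClient.newBuilder()
                .authenticator(new Authenticator() {
                    @Override
                    protected PasswordAuthentication getPasswordAuthentication() {
                        return new PasswordAuthentication(user, password.toCharArray());
                    }
                })
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = build("user", "password");
        // The authenticator is now registered on the client.
        System.out.println(client.authenticator().isPresent());
    }
}
```

A client built this way can then be handed to the execution builders via their httpClient(...) step, as in the QueryExecutionHTTP example later on this page.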
Most authentication setup is abstracted away from the particular HTTP client library Jena is using.\nApplications can also use the various execution engines through QueryExecution, UpdateExecution and ModelStore.\nAll the main implementations work at \u0026ldquo;Graph SPI\u0026rdquo; (GPI) level and an application may wish to work with this lower level interface that implements generalized RDF (i.e. a triple is any three nodes, including ones like variables, and subsystem extension nodes).\nThe GPI version is the main machinery working at the storage and network level, and the API version is an adapter to convert to the Model API and related classes.\nUpdateProcessor is a legacy name for UpdateExecution.\nGSP provides the SPARQL Graph Store Protocol, and \u0026lsquo;DSP\u0026rsquo; (Dataset Store Protocol) provides for sending and receiving datasets, rather than individual graphs.\nBoth API and GPI provide builders for detailed setup, particularly for remote usage over HTTP and HTTPS where detailed control of the HTTP requests is sometimes necessary to work with other triple stores.\nUse of the builders is preferred to factories. Factory style functions for many common usage patterns are retained in QueryExecutionFactory and UpdateExecutionFactory. Note that any methods that involved Apache HttpClient objects have been removed.\nChanges from Jena 4.2.0 Changes at Jena 4.3.0 Execution objects have a companion builder. This is especially important for HTTP as there are many configuration options that may be needed. Local use is still covered by the existing QueryExecutionFactory as well as the new QueryExecutionBuilder.\nHTTP usage is provided by the JDK HttpClient, with challenge-based authentication provided on top by Jena. See the authentication documentation.\nAuthentication support is uniformly applied to query, update, GSP, DSP and SERVICE.\nHTTP/2 support\nRemove Apache HttpClient usage\nWhen using this for authentication, application code changes will be necessary.
Deprecate modifying QueryExecution after it is built.\nSubstitution of variables for concrete values in query and update execution. This is a form of parameterization that works in both local and remote usage (unlike \u0026ldquo;initial bindings\u0026rdquo; which are only available for local query execution). See the substitution section below.\nHttpOp is split into HttpRDF for GET/POST/PUT/DELETE of graphs and datasets and a new HttpOp for packaged-up common patterns of HTTP usage.\nThe previous HttpOp is available as HttpOp1 and Apache HttpClient is still a dependency. Eventually, HttpOp1 and the dependency on Apache HttpClient will be removed.\nGSP - support for dataset operations as well as graphs (also supported by Fuseki).\nDatasetAccessors removed - previously these were deprecated. GSP and ModelStore are the replacement for remote operations. RDFConnection and RDFLink provide APIs.\nChanges at Jena 4.5.0 Separate the dataset operations from the graph operations.\nGSP - SPARQL Graph Store Protocol\nDSP - Dataset Store Protocol: HTTP GET, POST, PUT operations on the dataset, e.g. quad formats like TriG.\nSubstitution All query and update builders provide operations to use a query and substitute variables for concrete RDF terms in the execution.\nUnlike \u0026ldquo;initial bindings\u0026rdquo;, substitution is provided in query and update builders for both local and remote cases.\nSubstitution is always \u0026ldquo;replace variable with RDF term\u0026rdquo; in a query or update that is correct syntax.
This means it does not apply to INSERT DATA or DELETE DATA but can be used with INSERT { ?s ?p ?o } WHERE {} and DELETE { ?s ?p ?o } WHERE {}.\nFull example:\nResultSet resultSet1 = QueryExecution.dataset(dataset) .query(prefixes+\u0026#34;SELECT * { ?person foaf:name ?name }\u0026#34;) .substitution(\u0026#34;name\u0026#34;, name1) .select(); ResultSetFormatter.out(resultSet1); Substitution is to be preferred over \u0026ldquo;initial bindings\u0026rdquo; because it is clearly defined and applies to both query and update in both local and remote uses.\n\u0026ldquo;Substitution\u0026rdquo; and \u0026ldquo;initial bindings\u0026rdquo; are similar but not identical.\nSee also\nParameterized Queries Jena Query Builder which provide different ways to build a query.\nRDFConnection RDFConnection\ntry ( RDFConnection conn = RDFConnectionRemote.service(dataURL).build()) { conn.update(\u0026#34;INSERT DATA{}\u0026#34;); conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } or the less flexible:\ntry ( RDFConnection conn = RDFConnection.connect(dataURL) ) { conn.update(\u0026#34;INSERT DATA{}\u0026#34;); conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } Query Execution Builder Examples Builders are reusable and modifiable after a \u0026ldquo;build\u0026rdquo; operation.\nDataset dataset = ... Query query = ... try ( QueryExecution qExec = QueryExecution.create() .dataset(dataset) .query(query) .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } and remote calls:\ntry ( QueryExecution qExec = QueryExecutionHTTP.service(\u0026#34;http://....\u0026#34;) .query(query) .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } Factory Examples\nDataset dataset = ... Query query = ... try ( QueryExecution qExec = QueryExecutionFactory.create(query, dataset) ) { ResultSet results = qExec.execSelect(); ... use results ... 
} More complex setup:\n// JDK HttpClient HttpClient httpClient = HttpClient.newBuilder() .connectTimeout(Duration.ofSeconds(10)) // Timeout to connect .followRedirects(Redirect.NORMAL) .build(); try ( QueryExecution qExec = QueryExecutionHTTP.create() .service(\u0026#34;http:// ....\u0026#34;) .httpClient(httpClient) .query(query) .sendMode(QuerySendMode.asPost) .timeout(30, TimeUnit.SECONDS) // Timeout of request .build() ) { ResultSet results = qExec.execSelect(); ... use results ... } There is only one timeout setting for each HTTP query execution. The \u0026ldquo;time to connect\u0026rdquo; is handled by the JDK HttpClient. Timeouts for local execution are \u0026ldquo;time to first result\u0026rdquo; and \u0026ldquo;time to all results\u0026rdquo; as before.\nModelStore and GSP Model model = ModelStore.service(\u0026#34;http://fuseki/dataset\u0026#34;).defaultGraph().GET(); Graph graph = GSP.service(\u0026#34;http://fuseki/dataset\u0026#34;).defaultGraph().GET(); Graph graph = ... ; GSP.request(\u0026#34;http://fuseki/dataset\u0026#34;).graphName(\u0026#34;http://data/myGraph\u0026#34;).POST(graph); DatasetGraph dataset = GSP.request(\u0026#34;http://fuseki/dataset\u0026#34;).getDataset(); SERVICE Old documentation - configuration, especially for authentication, has changed.\nSERVICE configuration See below for more on HTTP authentication with SERVICE.\nThe configuration of SERVICE operations has changed in Jena 4.3.0 and the parameter names have changed.\nSymbol Java Constant Usage arq:httpServiceAllowed ARQ.httpServiceAllowed False to disable arq:httpQueryClient ARQ.httpQueryClient An object arq:httpServiceSendMode ARQ.httpServiceSendMode See Service documentation where arq: is prefix for \u0026lt;\u0026gt;.\nThe timeout is now only for the overall request and managed by the HTTP client code.\nCompression of responses is not currently supported.\nCustomization of HTTP requests There is a mechanism to modify HTTP requests to specific endpoints or to a
collection of endpoints with the same prefix.\nFor example, to add a header X-Tracker to each request to a particular server:\nAtomicLong counter = new AtomicLong(0); HttpRequestModifier modifier = (params, headers)-\u0026gt;{ long x = counter.incrementAndGet(); headers.put(\u0026#34;X-Tracker\u0026#34;, \u0026#34;Call=\u0026#34;+x); }; // serverURL is the HTTP URL for the server or part of the server HTTP space. RegistryRequestModifier.get().addPrefix(serverURL, modifier); The RegistryRequestModifier registry is checked on each HTTP operation. It maps URLs or prefixes of URLs to a function of interface HttpRequestModifier which has access to the headers and the query string parameters of the request.\nAuthentication Documentation for authentication.\n","permalink":"","tags":null,"title":"Apache Jena SPARQL APIs"},{"categories":null,"contents":"ARQ is a query engine for Jena that supports the SPARQL RDF Query language. SPARQL is the query language developed by the W3C RDF Data Access Working Group.\nARQ Features Standard SPARQL Free text search via Lucene SPARQL/Update Access and extension of the SPARQL algebra Support for custom filter functions, including javascript functions Property functions for custom processing of semantic relationships Aggregation, GROUP BY and assignment as SPARQL extensions Support for federated query Support for extension to other storage systems Client-support for remote access to any SPARQL endpoint Introduction A Brief Tutorial on SPARQL Application API - covers the majority of application usages Frequently Asked Questions ARQ Support Application javadoc Command line utilities Querying remote SPARQL services HTTP Authentication for ARQ Logging Explaining queries Tutorial: manipulating SPARQL using ARQ Basic federated query (SERVICE) Property paths GROUP BY and counting SELECT expressions Sub-SELECT Negation Features of ARQ that are legal SPARQL syntax\nConditions in FILTERs Free text searches Accessing lists (RDF collections) Extension
mechanisms Custom Expression Functions Property Functions Library Expression function library Property function library Writing SPARQL functions Writing SPARQL functions in JavaScript Custom execution of SERVICE Constructing queries programmatically Parameterized query strings ARQ and the SPARQL algebra Extending ARQ query execution and accessing different storage implementations Custom aggregates Caching and bulk-retrieval for SERVICE Extensions Features of ARQ that go beyond SPARQL syntax.\nLATERAL Join RDF-star Operators and functions MOD and IDIV for modulus and integer division. LET variable assignment Order results using a Collation Construct Quad Generate JSON from SPARQL Update ARQ supports the W3C standard SPARQL Update language.\nSPARQL Update The ARQ SPARQL/Update API See Also Fuseki - Server implementation of the SPARQL protocol. TDB - A SPARQL database for Jena, a pure Java persistence layer for large graphs, high performance applications and embedded use. RDFConnection, a unified API for SPARQL Query, Update and Graph Store Protocol. W3C Documents SPARQL Query Language specification SPARQL Query Results JSON Format SPARQL Protocol Articles Articles and documentation elsewhere:\nIntroducing SPARQL: Querying the Semantic Web (article by Leigh Dodds) Search RDF data with SPARQL (by Phil McCarthy) - article published on IBM developer works about SPARQL and Jena. SPARQL reference card (by Dave Beckett) Parameterised Queries with SPARQL and ARQ (by Leigh Dodds) Writing an ARQ Extension Function (by Leigh Dodds) RDF Syntax Specifications Turtle N-Triples TriG N-Quads ","permalink":"","tags":null,"title":"ARQ - A SPARQL Processor for Jena"},{"categories":null,"contents":"ARQ includes support for GROUP BY and counting. This was previously an ARQ extension but is now legal SPARQL 1.1.\nGROUP BY A GROUP BY clause transforms a result set so that only one row will appear for each unique set of grouping variables.
All other variables from the query pattern are projected away and are not available in the SELECT clause.\nPREFIX SELECT ?p ?q { . . . } GROUP BY ?p ?q SELECT * will include variables from the GROUP BY but no others. This ensures that results are always the same - including other variables from the pattern would involve choosing some value that was not constant across each section of the group and so lead to indeterminate results.\nThe GROUP BY clause can involve an expression. If the expression is named, then the value is included in the columns, before projection. An unnamed expression is used for grouping but the value is not placed in the result set formed by the GROUP BY clause.\nSELECT ?productId ?cost { . . . } GROUP BY ?productId (?num * ?price AS ?cost) HAVING A query may specify a HAVING clause to apply a filter to the result set after grouping. The filter may involve variables from the GROUP BY clause or aggregations.\nSELECT ?p ?q { . . . } GROUP BY ?p ?q HAVING (count(distinct *) \u0026gt; 1) Aggregation Currently supported aggregations:\nAggregator Description count(*) Count rows of each group element, or the whole result set if no GROUP BY. count(distinct *) Count the distinct rows of each group element, or the whole result set if no GROUP BY. count(?var) Count the number of times ?var is bound in a group. count(distinct ?var) Count the number of distinct values ?var is bound to in a group. sum(?x) Sum the variable over the group (non-numeric values and unbound values are ignored). When a variable is used, what is being counted is occurrences of RDF terms, that is names. It is not a count of individuals because two names can refer to the same individual.\nIf there was no explicit GROUP BY clause, then it is as if the whole of the result set forms a single group element. Equivalently, it is GROUP BY of no variables. 
Only aggregation expressions make sense in the SELECT clause as there are no variables from the query pattern to project out.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Aggregates"},{"categories":null,"contents":"The application API is in the package org.apache.jena.query.\nOther packages contain various parts of the system (execution engine, parsers, testing etc). Most applications will only need to use the main package. Only applications wishing to programmatically build queries or modify the behaviour of the query engine need to use the other packages directly.\nKey Classes The package org.apache.jena.query is the main application package.\nQuery - a class that represents the application query. It is a container for all the details of the query. Objects of class Query are normally created by calling one of the QueryFactory methods, which provide access to the various parsers. QueryExecution - represents one execution of a query. QueryExecutionFactory - a place to get QueryExecution instances. DatasetFactory - a place to make datasets. For SELECT queries: QuerySolution - A single solution to the query. ResultSet - All the QuerySolutions. An iterator. ResultSetFormatter - turn a ResultSet into various forms; into JSON, text, or as plain XML. SELECT queries The basic steps in making a SELECT query are outlined in the example below. A query is created from a string using the QueryFactory. The query and model or RDF dataset to be queried are then passed to QueryExecutionFactory to produce an instance of a query execution. QueryExecution objects are java.lang.AutoCloseable and can be used in try-resources. Results are handled in a loop and finally the query execution is closed.\nimport org.apache.jena.query.* ; Model model = ... ; String queryString = \u0026quot; ....
\u0026quot; ; Query query = QueryFactory.create(queryString) ; try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) { ResultSet results = qexec.execSelect() ; for ( ; results.hasNext() ; ) { QuerySolution soln = results.nextSolution() ; RDFNode x = soln.get(\u0026quot;varName\u0026quot;) ; // Get a result variable by name. Resource r = soln.getResource(\u0026quot;VarR\u0026quot;) ; // Get a result variable - must be a resource Literal l = soln.getLiteral(\u0026quot;VarL\u0026quot;) ; // Get a result variable - must be a literal } } It is important to cleanly close the query execution when finished. System resources connected to persistent storage may need to be released.\nA ResultSet supports the Java iterator interface so the following is also a way to process the results if preferred:\nIterator\u0026lt;QuerySolution\u0026gt; results = qexec.execSelect() ; for ( ; results.hasNext() ; ) { QuerySolution soln = results.next() ; . . . } The step of creating a query and then a query execution can be reduced to one step in some common cases:\nimport org.apache.jena.query.* ; Model model = ... ; String queryString = \u0026quot; .... \u0026quot; ; try (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) { ResultSet results = qexec.execSelect() ; . . . } Passing a result set out of the processing loop. A ResultSet is an iterator and can be traversed only once. What is more, much of query execution and result set processing is handled internally in a streaming fashion.
The ResultSet returned by execSelect is not valid after the QueryExecution is closed, whether explicitly or by try-resources as the QueryExecution implements AutoCloseable.\nA result set may be materialized - this is then usable outside the query execution:\ntry (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) { ResultSet results = qexec.execSelect() ; results = ResultSetFactory.copyResults(results) ; return results ; // Passes the result set out of the try-resources } The result set from ResultSetFactory.copyResults is a ResultSetRewindable which has a reset() operation that positions the iterator at the start of the result again.\nThis can also be used when the results are going to be used in a loop that modifies the data. It is not possible to update the model or dataset while looping over the results of a SELECT query.\nThe models returned by execConstruct and execDescribe are valid after the QueryExecution is closed.\nExample: formatting a result set Instead of a loop to deal with each row in the result set, the application can call an operation of the ResultSetFormatter. This is what the command line applications do.\nExample: processing results to produce a simple text presentation:\nResultSetFormatter fmt = new ResultSetFormatter(results, query) ; fmt.printAll(System.out) ; or simply:\nResultSetFormatter.out(System.out, results, query) ; Example: Processing results The results are objects from the Jena RDF API and API calls, which do not modify the model, can be mixed with query results processing:\nfor ( ; results.hasNext() ; ) { QuerySolution soln = results.nextSolution() ; // Access variables: soln.get(\u0026quot;x\u0026quot;) ; RDFNode n = soln.get(\u0026quot;x\u0026quot;) ; // \u0026quot;x\u0026quot; is a variable in the query // If you need to test the thing returned if ( n.isLiteral() ) ((Literal)n).getLexicalForm() ; if ( n.isResource() ) { Resource r = (Resource)n ; if ( ! r.isAnon() ) { ... r.getURI() ...
} } } Updates to the model must be carried out after the query execution has finished. Typically, this involves collecting results of interest in a local data structure and looping over that structure after the query execution has finished and been closed.\nCONSTRUCT Queries CONSTRUCT queries return a single RDF graph. As usual, the query execution should be closed after use.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; Model resultModel = qexec.execConstruct() ; qexec.close() ; DESCRIBE Queries DESCRIBE queries return a single RDF graph. Different handlers for the DESCRIBE operation can be added by the application.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; Model resultModel = qexec.execDescribe() ; qexec.close() ; ASK Queries The operation QueryExecution.execAsk() returns a boolean value indicating whether the query pattern matched the graph or dataset or not.\nQuery query = QueryFactory.create(queryString) ; QueryExecution qexec = QueryExecutionFactory.create(query, model) ; boolean result = qexec.execAsk() ; qexec.close() ; Formatting XML results The ResultSetFormatter class has methods to write out the SPARQL Query Results XML Format. See ResultSetFormatter.outputAsXML method.\nDatasets The examples above are all queries on a single model. A SPARQL query is made on a dataset, which is a default graph and zero or more named graphs. Datasets can be constructed using the DatasetFactory:\nString dftGraphURI = \u0026quot;file:default-graph.ttl\u0026quot; ; List namedGraphURIs = new ArrayList() ; namedGraphURIs.add(\u0026quot;file:named-1.ttl\u0026quot;) ; namedGraphURIs.add(\u0026quot;file:named-2.ttl\u0026quot;) ; Query query = QueryFactory.create(queryString) ; Dataset dataset = DatasetFactory.create(dftGraphURI, namedGraphURIs) ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { ...
} Already existing models can also be used:\nDataset dataset = DatasetFactory.create() ; dataset.setDefaultModel(model) ; dataset.addNamedModel(\u0026quot;http://example/named-1\u0026quot;, modelX) ; dataset.addNamedModel(\u0026quot;http://example/named-2\u0026quot;, modelY) ; try(QueryExecution qExec = QueryExecutionFactory.create(query, dataset)) { ... } ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Application API"},{"categories":null,"contents":"ARQ includes support for a logical assignment of variables. If the variable is already bound, it acts like a filter; otherwise the value is assigned. This makes it position independent.\nThis involves a syntactic extension and is available if the query is parsed with language Syntax.syntaxARQ (which is the default).\nSee also SELECT expressions which is also a form of assignment.\nAssignment The general form is:\nLET ( variable := expression ) For example:\nLET ( ?x := 2 ) { ?x :name ?name . LET ( ?age2 := ?age - 21 ) Note: Assignment is \u0026ldquo;:=\u0026rdquo;\nAssignment Rules ARQ assignment is single assignment, that is, once a variable is assigned a binding, it can not be changed in the same query solution.\nOnly one LET expression per variable is allowed in a single scope.\nThe execution rules are:\nIf the expression does not evaluate (e.g. unbound variable in the expression), no assignment occurs and the query continues. If the variable is unbound, and the expression evaluates, the variable is bound to the value. If the variable is bound to the same value as the expression evaluates to, nothing happens and the query continues. If the variable is bound to a different value from what the expression evaluates to, an error occurs and the current solution will be excluded from the results. Note that \u0026ldquo;same value\u0026rdquo; means the same as applies to graph pattern matching, not to FILTER expressions. Some graph implementations only provide same-term graph pattern matching.
FILTERs always do value-based comparisons for \u0026ldquo;=\u0026rdquo; for all graphs.\nUse with CONSTRUCT One use is to perform some calculation prior to forming the result graph in a CONSTRUCT query.\nCONSTRUCT { ?x :lengthInCM ?cm } WHERE { ?x :lengthInInches ?inch . LET ( ?cm := ?inch * 2.54 ) } Use with !BOUND The OPTIONAL/!BOUND/FILTER idiom for performing limited negation of a pattern in SPARQL can be inconvenient because it requires a variable in the OPTIONAL to be assigned by pattern matching. Using a LET can make that easier; here, we assign to ?z (any value will do) to mark when the matching pattern included the OPTIONAL pattern.\nExample: ?x with no \u0026ldquo;:p 1\u0026rdquo; triple:\n{ ?x a :FOO . OPTIONAL { ?x :p 1 . LET (?z := true) } FILTER ( !BOUND(?z) ) } Note that negation is supported properly through the NOT EXISTS form.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Assignment"},{"categories":null,"contents":"There are already ways to access remote RDF data. The simplest is to read a document which is an RDF graph and query it. Another way is with the SPARQL protocol which allows a query to be sent to a remote service endpoint and the results sent back (in RDF, or an XML-based results format or even a JSON one).\nSERVICE is a feature of SPARQL 1.1 that allows an executing query to make a SPARQL protocol call to another SPARQL endpoint.\nSyntax PREFIX : \u0026lt;http://example/\u0026gt; PREFIX dc: \u0026lt;\u0026gt; SELECT ?a FROM \u0026lt;mybooks.rdf\u0026gt; { ?b dc:title ?title . SERVICE \u0026lt;\u0026gt; { ?s dc:title ?title .
?s dc:creator ?a } } Algebra There is an operator in the algebra.\n(prefix ((dc: \u0026lt;\u0026gt;)) (project (?a) (join (BGP [triple ?b dc:title ?title]) (service \u0026lt;\u0026gt; (BGP [triple ?s dc:title ?title] [triple ?s dc:creator ?a] )) ))) Performance Considerations This feature is a basic building block to allow remote access in the middle of a query, not a general solution to the issues in distributed query evaluation. The algebra operation is executed without regard to how selective the pattern is. So the order of the query will affect the speed of execution. Because it involves HTTP operations, asking the query in the right order matters a lot. Don\u0026rsquo;t ask for the whole of a bookstore just to find a book whose title comes from a local RDF file - ask the bookshop a query with the title already bound from earlier in the query.\nControlling SERVICE requests. The SERVICE operation in a SPARQL query may be configured via the Context. The values for configuration can be set in the global context (accessed via ARQ.getContext()) or in the per-query execution context.\nThe prefix arq: is \u0026lt;\u0026gt;.\nSymbol Java Constant Default arq:httpServiceAllowed ARQ.httpServiceAllowed true arq:httpQueryClient ARQ.httpQueryClient System default. arq:httpServiceSendMode `ARQ.httpServiceSendMode unset arq:httpServiceAllowed This setting can be used to disable execution of any SERVICE request in query. Set to \u0026ldquo;false\u0026rdquo; to prohibit SERVICE requests.\narq:httpQueryClient The HttpClient object to use for SERVICE execution.\narq:httpServiceSendMode The HTTP operation to use. The value is a string or a QuerySendMode object.\nString settings are:\nSetting Effect \u0026ldquo;POST\u0026rdquo; Use HTTP POST. Same as \u0026ldquo;asPost\u0026rdquo;. \u0026ldquo;GET\u0026rdquo; Use HTTP GET unconditionally. Same as \u0026ldquo;asGetAlways\u0026rdquo;. \u0026ldquo;asGetAlways\u0026rdquo; Use HTTP GET. 
\u0026ldquo;asGetWithLimitBody\u0026rdquo; Use HTTP GET up to a size limit (usually 2 kbytes). \u0026ldquo;asGetWithLimitForm\u0026rdquo; Use HTTP GET up to a size limit (usually 2 kbytes), and use an HTML form for the query. \u0026ldquo;asPostForm\u0026rdquo; Use HTTP POST and use an HTML form for the query. \u0026ldquo;asPost\u0026rdquo; Use HTTP POST. Old Context setting Old settings are honored where possible but should not be used:\nThe prefix srv: is the IRI \u0026lt;\u0026gt;.\nSymbol Usage Default srv:queryTimeout Set timeouts none srv:queryCompression Enable use of deflation and GZip true srv:queryClient Enable use of a specific client none srv:serviceContext Per-endpoint configuration none srv:queryTimeout As documented above.\nsrv:queryCompression Sets the flag for use of deflation and GZip.\nBoolean: true indicates that gzip compressed data is acceptable.\nsrv:queryClient Enable use of a specific client.\nProvides a slot for a specific HttpClient for use with a specific SERVICE.\nsrv:serviceContext Provides a mechanism to override system context settings on a per-URI basis.\nThe value is a Map\u0026lt;String,Context\u0026gt; where the map key is the URI of the service endpoint, and the Context is a set of values to override the default values.\nIf a context is provided for the URI, the system context is copied and the context for the URI is used to set specific values. This ensures that any URI-specific settings will be used.\n","permalink":"","tags":null,"title":"ARQ - Basic Federated SPARQL Query"},{"categories":null,"contents":"It is possible to build queries by building an abstract syntax tree (as the parser does) or by building the algebra expression for the query.
It is usually better to work with the algebra form as it is more regular.\nSee the examples such as arq.examples.algebra.AlgebraExec at jena-examples:arq/examples\nSee also ARQ - SPARQL Algebra\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Building Queries Programmatically"},{"categories":null,"contents":"ARQ supports sorting results in a query. Users are able to specify an expression that can be a function (built-in function, custom function, or a variable).\nBy default, results are sorted using the default behavior provided by the JVM. If you have the following query:\nSELECT ?label WHERE { VALUES ?label { \u0026quot;tsahurin kieli\u0026quot;@fi \u0026quot;tšekin kieli\u0026quot;@fi \u0026quot;tulun kieli\u0026quot;@fi \u0026quot;töyhtöhyyppä\u0026quot;@fi } } ORDER BY ?label The results will be returned exactly in the following order.\n\u0026ldquo;töyhtöhyyppä\u0026rdquo;@fi \u0026ldquo;tsahurin kieli\u0026rdquo;@fi \u0026ldquo;tšekin kieli\u0026rdquo;@fi \u0026ldquo;tulun kieli\u0026rdquo;@fi However, in Finnish the expected order is as follows.\n\u0026ldquo;tsahurin kieli\u0026rdquo;@fi \u0026ldquo;tšekin kieli\u0026rdquo;@fi \u0026ldquo;tulun kieli\u0026rdquo;@fi \u0026ldquo;töyhtöhyyppä\u0026rdquo;@fi To specify the collation used for sorting, we can use the ARQ collation function.\nPREFIX arq: \u0026lt;\u0026gt; SELECT ?label WHERE { VALUES ?label { \u0026quot;tsahurin kieli\u0026quot;@fi \u0026quot;tšekin kieli\u0026quot;@fi \u0026quot;tulun kieli\u0026quot;@fi \u0026quot;töyhtöhyyppä\u0026quot;@fi } } ORDER BY arq:collation(\u0026quot;fi\u0026quot;, ?label) The function collation receives two parameters. The first is the desired collation, and the second is the expression to sort (which can be a variable, or another function).\nThe collation used will be the Finnish collation algorithm provided with the JVM.
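The JVM-level collation behaviour described here can be sketched directly with the standard java.text.Collator class. This is a plain-Java illustration, not Jena API; the label strings are taken from the query above, and the result depends on the locale data shipped with the JVM:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class FinnishCollationSketch {
    public static void main(String[] args) {
        String[] labels = {
            "töyhtöhyyppä", "tsahurin kieli", "tšekin kieli", "tulun kieli"
        };
        // Obtain a collator for the Finnish locale and sort with it.
        Collator fi = Collator.getInstance(Locale.forLanguageTag("fi"));
        Arrays.sort(labels, fi);
        for (String label : labels) {
            System.out.println(label);
        }
    }
}
```

On a JVM with Finnish locale data, this prints the labels in the Finnish order shown above, with "töyhtöhyyppä" last.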
This is done through calls to methods in the java.util.Locale and java.text.Collator classes to retrieve a collator.\nIf the desired collation is not available or is invalid, the JVM behavior is adopted: it may return the default collator, but this may vary depending on the JVM vendor.\nNote that this function was released with Jena 3.4.0. Mixing locales may lead to undesired results. See JENA-1313 for more information about the implementation details.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Collation"},{"categories":null,"contents":"The arq package contains some command line applications to run queries, parse queries, process result sets and run test sets.\nYou will need to set the classpath, or use the helper scripts, to run these applications from the command line. The helper scripts are in bin/ (Linux, Unix, Cygwin, OS/X) and bat/ (Windows) directories. There are ancillary scripts in the directories that the main commands need - see the tools page for setup details.\nThe commands look for a configuration file in the current directory, as well as performing the usual log4j2 initialization with the property log4j.configurationFile and looking for a classpath resource; there is a default setup of log4j2 built-in.\narq.query is the main query driver.\narq.qparse : parse and print a SPARQL query.\narq.uparse : parse and print a SPARQL update.\narq.update : execute SPARQL/Update requests.\narq.remote : execute a query by HTTP on a remote SPARQL endpoint.\narq.rset : transform result sets.\narq.qexpr : evaluate and print an expression.\nAll commands have a --help argument for a summary of the arguments.\nWhen using a query in a file, if the query file ends .rq, it is assumed to be a SPARQL query. If it ends .arq, it is assumed to be an ARQ query (extensions to SPARQL). You can specify the syntax explicitly.\narq.query This is the main command for executing queries on data. 
The wrappers just set the query language.\narq.sparql : wrapper for SPARQL queries arq.arq : wrapper for ARQ queries Running arq.query --help prints the usage message. The main arguments are:\n--query FILE : The file with the query to execute --data FILE : The data to query. It will be included in the default graph. --namedgraph FILE : The data to query. It will be included as a named graph. --desc/--dataset: Jena Assembler description of the dataset to be queried, augmented with vocabulary for datasets, not just graphs. See etc/ for examples. The file extension is used to guess the file serialization format. If a data file ends .n3, it is assumed to be N3; if it ends .ttl, Turtle; if it ends .nt, N-Triples; otherwise it is assumed to be RDF/XML. The data serialization can be explicitly specified on the command line.\narq.qparse Parse a query and print it out.\narq.qparse will parse the query, print it out again (with line numbers by default) and then parse the serialized query again. If your query has a syntax error, a message is printed but no query is printed. If a query is printed and you then get a syntax error message, your query was syntactically correct but the ARQ serialization is broken. Please report this.\nThe command arq.qparse --print=op --file queryFile will print the SPARQL algebra for the query in SSE format.\narq.uparse Parse a SPARQL update and print it out.\narq.uparse will parse the update, print it out again (with line numbers by default) and then parse the serialized update again. If your update has a syntax error, a message is printed but no update is printed. If an update is printed and you then get a syntax error message, your update was syntactically correct but the ARQ serialization is broken. Please report this.\narq.update Execute SPARQL Update requests.\n--desc: Jena Assembler description of the dataset or graph store to be updated. See etc/ for examples. 
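As a minimal illustration of input for arq.update (file contents only; the file name is arbitrary, and --help gives the exact invocation), an update request might look like:

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Insert one triple into the default graph of the target dataset.
INSERT DATA {
  <http://example/book1> dc:title "A new book" .
}
```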
arq.rset Read and write result sets.\nIn particular,\njava -cp ... arq.rset --in xml --out text will translate a SPARQL XML Result Set into a tabular text form.\narq.qexpr Read and print an expression (something that can go in a FILTER clause). Indicates whether an evaluation exception occurred.\nThe -v argument prints the parsed expression.\narq.remote Execute a request on a remote SPARQL endpoint using HTTP.\n--service URL : The endpoint. --data FILE : Dataset description (default graph) added to the request. --namedgraph FILE : Dataset description (named graph) added to the request. --results FORMAT : Write results in specified format. Does not change the request to the server which is always for an XML form. ","permalink":"","tags":null,"title":"ARQ - Command Line Applications"},{"categories":null,"contents":"The current W3C recommendation of SPARQL 1.1 supports the CONSTRUCT query form, which returns a single RDF graph specified by a graph template. The result is an RDF graph formed by taking each query solution in the solution sequence, substituting for the variables in the graph template, and combining the triples into a single RDF graph by set union. However, it does not directly generate quads or RDF datasets.\nIn order to eliminate this limitation, Jena ARQ extends the grammar of the CONSTRUCT query form and provides the corresponding components, which makes it more convenient to manipulate RDF datasets with SPARQL.\nThis feature was added in Jena 3.0.1.\nQuery Syntax A CONSTRUCT template of a SPARQL 1.1 query string is in Turtle format with possible variables. The syntax for this extension follows that style in ARQ, using TriG plus variables. Just like SPARQL 1.1, there are two forms for an ARQ Construct Quad query:\nComplete Form CONSTRUCT { # Named graph GRAPH :g { ?s :p ?o } # Default graph { ?s :p ?o } # Default graph :s ?p :o } WHERE { # SPARQL 1.1 WHERE Clause ... 
} The default graphs and the named graphs can be constructed within the CONSTRUCT clause in the above way. Note that, for constructing the named graph, the token of GRAPH can be optional. The brackets of the triples to be constructed in the default graph can also be optional.\nShort Form CONSTRUCT WHERE { # Basic dataset pattern (only the default graph and the named graphs) ... } A short form is provided for the case where the template and the pattern are the same and the pattern is just a basic dataset pattern (no FILTERs and no complex graph patterns are allowed in the short form). The keyword WHERE is required in the short form.\nGrammar The normative definition of the syntax grammar of the query string is defined in this table:\nRule Expression ConstructQuery ::= \u0026lsquo;CONSTRUCT\u0026rsquo; ( ConstructTemplate DatasetClause* WhereClause SolutionModifier | DatasetClause* \u0026lsquo;WHERE\u0026rsquo; \u0026lsquo;{\u0026rsquo; ConstructQuads \u0026lsquo;}\u0026rsquo; SolutionModifier ) ConstructTemplate ::= \u0026lsquo;{\u0026rsquo; ConstructQuads \u0026lsquo;}\u0026rsquo; ConstructQuads ::= TriplesTemplate? ( ConstructQuadsNotTriples \u0026lsquo;.\u0026rsquo;? TriplesTemplate? )* ConstructQuadsNotTriples ::= ( \u0026lsquo;GRAPH\u0026rsquo; VarOrBlankNodeIri )? \u0026lsquo;{\u0026rsquo; TriplesTemplate? \u0026lsquo;}\u0026rsquo; TriplesTemplate ::= TriplesSameSubject ( \u0026lsquo;.\u0026rsquo; TriplesTemplate? )? DatasetClause, WhereClause, SolutionModifier, TriplesTemplate, VarOrIri, TriplesSameSubject are as for the SPARQL 1.1 Grammar\nProgramming API ARQ provides 2 additional methods in QueryExecution for Construct Quad.\nIterator\u0026lt;Quad\u0026gt; QueryExecution.execConstructQuads() // allow duplication Dataset QueryExecution.execConstructDataset() // no duplication One difference of the 2 methods is: The method of execConstructQuads() returns an Iterator of Quad, allowing duplication. 
But execConstructDataset() constructs the desired Dataset object with only unique Quads.\nIn order to use these methods, it\u0026rsquo;s required to switch on the query syntax of ARQ beforehand, when creating the Query object:\nQuery query = QueryFactory.create(queryString, Syntax.syntaxARQ); If the query is supposed to construct only triples, not quads, the triples will be constructed in the default graph. For example:\nString queryString = \u0026quot;CONSTRUCT { ?s ?p ?o } WHERE ... \u0026quot; ... // The graph node of the quads is the default graph (ARQ uses \u0026lt;urn:x-arq:DefaultGraphNode\u0026gt;). Iterator\u0026lt;Quad\u0026gt; quads = qexec.execConstructQuads(); If the query string constructs quads but the method execConstructTriples() is called, it returns only the triples in the default graph of the CONSTRUCT query template. This is called a \u0026ldquo;projection\u0026rdquo; on the default graph. For instance:\nString queryString = \u0026quot;CONSTRUCT { ?s ?p ?o . GRAPH ?g1 { ?s1 ?p1 ?o1 } } WHERE ...\u0026quot; ... // The part of \u0026quot;GRAPH ?g1 { ?s1 ?p1 ?o1 }\u0026quot; will be ignored. Only \u0026quot;?s ?p ?o\u0026quot; in the default graph will be returned. Iterator\u0026lt;Triple\u0026gt; triples = qexec.execConstructTriples(); More examples can be found at jena-examples:arq/examples/constructquads/.\nFuseki Support Jena Fuseki also supports Construct Quad queries as a built-in feature. No additional configuration is required to switch it on. 
Because QueryEngineHTTP is just an implementation of QueryExecution, client code uses the programming API described in the previous sections in much the same way, e.g.\nString queryString = \u0026quot; CONSTRUCT { GRAPH \u0026lt;http://example/ns#g1\u0026gt; {?s ?p ?o} } WHERE {?s ?p ?o}\u0026quot; ; Query query = QueryFactory.create(queryString, Syntax.syntaxARQ); try ( QueryExecution qExec = QueryExecution.service(serviceQuery).query(query).build() ) { // serviceQuery is the URL of the remote service Iterator\u0026lt;Quad\u0026gt; result = qExec.execConstructQuads(); ... } ... ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Construct Quad"},{"categories":null,"contents":"ARQ supports custom aggregate functions as allowed by the SPARQL 1.1 specification.\nSee jena-examples:arq/examples/aggregates.\n","permalink":"","tags":null,"title":"ARQ - Custom aggregates"},{"categories":null,"contents":"Since Jena 4.2.0, ARQ features a plugin system for custom service executors. The relevant classes are located in the package org.apache.jena.sparql.service and are summarized as follows:\nServiceExecutorRegistry: A registry that holds a list of service executors. When Jena starts up, it configures a default registry to handle SERVICE requests against HTTP SPARQL endpoints and registers it with the global ARQ context accessible under ARQ.getContext().\nServiceExecutorFactory: This is the main interface for custom SERVICE handler implementations:\npublic interface ServiceExecutorFactory { public ServiceExecution createExecutor(OpService substituted, OpService original, Binding binding, ExecutionContext execCxt); } The second OpService parameter represents the original SERVICE clause as it occurs in the query, whereas the first parameter is the OpService obtained after substitution of all mentioned variables w.r.t. the current binding. 
A ServiceExecutorFactory can indicate its non-applicability for handling a request simply by returning null. In that case, Jena will ask the next service executor factory in the registry. If a request remains unhandled then a QueryExecException (\u0026ldquo;No SERVICE handler\u0026rdquo;) is raised.\nServiceExecution: If a ServiceExecutorFactory can handle a request then it needs to return a ServiceExecution instance: public interface ServiceExecution { public QueryIterator exec(); } The actual execution is started by calling the exec() method which returns a QueryIterator. Note that there are use cases where ServiceExecution instances may not have to be executed. For example, one may analyze which service executor factories among a set of them claim to be capable of handling a request. This can be useful for debugging or display in a dashboard of applicable service executors.\nExamples A runnable example suite is located in the jena-examples module at\nIn the remainder we summarize the essentials of setting up a custom service executor. The following snippet sets up a simple service executor factory that relays queries targeted at Wikidata to DBpedia:\nNode WIKIDATA = NodeFactory.createURI(\u0026#34;\u0026#34;); Node DBPEDIA = NodeFactory.createURI(\u0026#34;\u0026#34;); ServiceExecutorFactory myExecutorFactory = (opExecute, original, binding, execCxt) -\u0026gt; { if (opExecute.getService().equals(WIKIDATA)) { opExecute = new OpService(DBPEDIA, opExecute.getSubOp(), opExecute.getSilent()); return ServiceExecutorRegistry.httpService.createExecutor(opExecute, original, binding, execCxt); } return null; }; Global vs Local Service Executor Registration The global registry can be accessed and modified as shown below:\nServiceExecutorRegistry globalRegistry = ServiceExecutorRegistry.get(); // Note: registry.add() prepends executor factories to the internal list such // that they are consulted first! 
globalRegistry.add(myExecutorFactory); The following snippet shows how a custom service executor can be configured locally for an individual query execution:\nContext cxt = ARQ.getContext().copy(); ServiceExecutorRegistry localRegistry = ServiceExecutorRegistry.get().copy(); localRegistry.add(myExecutorFactory); String queryStr = \u0026#34;SELECT * { SERVICE \u0026lt;\u0026gt; { ?s ?p \u0026#34;Apache Jena\u0026#34;@en } }\u0026#34;; try (QueryExecution qe = QueryExecutionFactory.create(queryStr)) { ServiceExecutorRegistry.set(qe.getContext(), localRegistry); // ... } ","permalink":"","tags":null,"title":"ARQ - Custom Service Executors"},{"categories":null,"contents":"This page describes the mechanisms that can be used to extend and modify query execution within ARQ. Through these mechanisms, ARQ can be used to query different graph implementations and to provide different query evaluation and optimization strategies for particular circumstances. These mechanisms are used by TDB.\nARQ can be extended in various ways to incorporate custom code into a query. Custom filter functions and property functions provide ways to add application-specific code. The free text search capabilities, using Apache Lucene, are provided via a property function. Custom filter functions and property functions should be used where possible.\nJena itself can be extended by providing a new implementation of the Graph interface. This can be used to encapsulate specialised storage and also for wrapping non-RDF sources to look like RDF. There is a common implementation framework provided by GraphBase so only one operation, the find method, needs to be written for a read-only data source. Basic find works well in many cases, and the whole Jena API will be able to use the extension. 
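A minimal sketch of such a read-only extension, assuming a hypothetical class name and an in-memory list standing in for the custom storage (only graphBaseFind is implemented; GraphBase supplies the rest):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.graph.impl.GraphBase;
import org.apache.jena.util.iterator.ExtendedIterator;
import org.apache.jena.util.iterator.WrappedIterator;

/** Hypothetical read-only graph backed by a fixed list of triples. */
public class MyReadOnlyGraph extends GraphBase {
    private final List<Triple> triples = new ArrayList<>();

    public MyReadOnlyGraph() {
        // Stand-in data; a real implementation would read from its own storage.
        triples.add(Triple.create(
            NodeFactory.createURI("http://example/s"),
            NodeFactory.createURI("http://example/p"),
            NodeFactory.createLiteral("o")));
    }

    @Override
    protected ExtendedIterator<Triple> graphBaseFind(Triple pattern) {
        // Return all triples matching the (possibly wildcard) pattern.
        List<Triple> matches = new ArrayList<>();
        for (Triple t : triples)
            if (pattern.matches(t))
                matches.add(t);
        return WrappedIterator.create(matches.iterator());
    }
}
```

This is an unverified sketch against the Jena API; with the Jena dependency on the classpath, queries through the normal API machinery reach the data via find.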
For higher SPARQL performance, ARQ can be extended at the basic graph matching or algebra level.\nApplication writers who extend ARQ at the query execution level should be prepared to work with the source code for ARQ for specific details and for finding code to reuse. Some examples can be found in the arq/examples directory.\nOverview of ARQ Query processing The Main Query Engine Graph matching and a custom StageGenerator OpExecutor Quads Mixed Graph Implementation Datasets Custom Query Engines Extend the algebra Overview of ARQ Query Processing The sequence of actions performed by ARQ to perform a query is parsing, algebra generation, execution building, high-level optimization, low-level optimization and finally evaluation. It is not usual to modify the parsing step nor the conversion from the parse tree to the algebra form, which is a fixed algorithm defined by the SPARQL standard. Extensions can modify the algebra form by transforming it from one algebra expression to another, including introducing new operators. See also the documentation on working with the SPARQL algebra in ARQ including building algebra expressions programmatically, rather than obtaining them from a query string.\nParsing The parsing step turns a query string into a Query object. The class Query represents the abstract syntax tree (AST) for the query and provides methods to create the AST, primarily for use by the parser. The query object also provides methods to serialize the query to a string. Because this is from an AST, the string produced will be very close to the original query with the same syntactic elements, but without comments, and formatted with whitespace for readability. It is not usually the best way to build a query programmatically and the AST is not normally an extension point.\nThe query object can be used many times. 
It is not modified once created, and in particular it is not modified by query execution.\nAlgebra generation ARQ generates the SPARQL algebra expression for the query. After this a number of transformations can be applied (for example, identification of property functions) but the first step is the application of the algorithm in the SPARQL specification for translating a SPARQL query string, as held in a Query object into a SPARQL algebra expression. This includes the process of removing joins involving the identity pattern (the empty graph pattern).\nFor example, the query:\nPREFIX foaf: \u0026lt;\u0026gt; SELECT ?name ?mbox ?nick WHERE { ?x foaf:name ?name ; foaf:mbox ?mbox . OPTIONAL { ?x foaf:nick ?nick } } becomes\n(prefix ((foaf: \u0026lt;\u0026gt;)) (project (?name ?mbox ?nick) (leftjoin (bgp (triple ?x foaf:name ?name) (triple ?x foaf:mbox ?mbox) ) (bgp (triple ?x foaf:nick ?nick) ) ))) using the SSE syntax to write out the internal data-structure for the algebra.\nThe online SPARQL validator at can be used to see the algebra expression for a SPARQL query. This validator is also included in Fuseki.\nHigh-Level Optimization and Transformations There is a collection of transformations that can be applied to the algebra, such as replacing equality filters with a more efficient graph pattern and an assignment. When extending ARQ, a query processor for a custom storage layout can choose which optimizations are appropriate and can also provide its own algebra transformations.\nA transform is code that converts an algebra operation into other algebra operations. It is applied using the Transformer class:\nOp op = ... ; Transform someTransform = ... ; op = Transformer.transform(someTransform, op) ; The Transformer class applies the transform to each operation in the algebra expression tree. 
Transform itself is an interface, with one method signature for each operation type, returning a replacement for the operator instance it is called on.\nOne such transformation is to turn a SPARQL algebra expression involving named graphs and triples into one using quads. This transformation is performed by a call to Algebra.toQuadForm.\nTransformations proceed from the bottom of the expression tree to the top. Algebra expressions are best treated as immutable so a change made in one part of the tree should result in a copy of the tree above it. This is automated by the TransformCopy class which is the commonly used base class for writing transforms. The other helper base class is TransformBase, which provides the identity operation (returns the node supplied) for each transform operation.\nOperations can be printed out in SSE syntax. The Java toString method is overridden to provide pretty printing and the static methods in WriterOp provide output to various output objects like\nLow-Level Optimization and Evaluation The step of evaluating a query is the process of executing the algebra expression, as modified by any transformations applied, to yield a stream of pattern solutions. Low-level optimizations include choosing the order in which to evaluate basic graph patterns. These are the responsibility of the custom storage layer. Low-level optimization can be carried out dynamically as part of evaluation.\nInternally, ARQ uses iterators extensively. Where possible, evaluation of an operation is achieved by feeding the stream of results from the previous stage into the evaluation. A common pattern is to take each intermediate result one at a time (use QueryIterRepeatApply to be called for each binding), substituting the variables of the pattern with those in the incoming binding, and evaluating to a query iterator of all results for this incoming row. The result can be the empty iterator (one that always returns false for hasNext). 
It is also common to not have to touch the incoming stream at all but merely to pass it to sub-operations.\nQuery Engines and Query Engine Factories The steps from algebra generation to query evaluation are carried out when a query is executed via the QueryExecution.execSelect or other QueryExecution exec operation. It is possible to carry out storage-specific operations when the query execution is created. A query engine works in conjunction with a QueryExecution to provide the evaluation of a query pattern. QueryExecutionBase provides all the machinery for the different result types and does not need to be modified by extensions to query execution.\nARQ provides three query engine factories; the main query engine factory, one for a reference query engine and one to remotely execute a query. TDB provides its own query engine factories which it registers during sub-system initialization. These extend the main query engine described below.\nThe reference query engine is a direct top-down evaluation of the expression. Its purpose is to be simple so it can be easily verified and checked, and its results then used to check more complicated processing in the main engine and other implementations. All arguments to each operator are fully evaluated to produce intermediate in-memory tables, then a simple implementation of the operator is called to calculate the results. It does not scale and does not perform any optimizations. 
It is intended to be clear and simple; it is not designed to be efficient.\nQuery engines are chosen by referring to the registry of query engine factories.\npublic interface QueryEngineFactory { public boolean accept(Query query, DatasetGraph dataset, Context context) ; public Plan create(Query query, DatasetGraph dataset, Binding inputBinding, Context context) ; public boolean accept(Op op, DatasetGraph dataset, Context context) ; public Plan create(Op op, DatasetGraph dataset, Binding inputBinding, Context context) ; } When the query execution factory is given a dataset and query, the query execution factory tries each registered engine factory in turn calling the accept method (for query or algebra depending on how it was presented). The registry is kept in reverse registration order - the most recently registered query engine factory is tried first. The first query engine factory to return true is chosen and no further engine factories are checked.\nWhen a query engine factory is chosen, the create method is called to return a Plan object for the execution. The main operation of the Plan interface is to get the QueryIterator for the query.\nSee the example arq.examples.engine.MyQueryEngine at jena-examples:arq/examples.\nThe Main Query Engine The main query engine can execute any query. It contains a number of basic graph pattern matching implementations including one that uses the Graph.find operation so it can work with any implementation of the Jena Graph SPI. The main query engine works with general purpose datasets but not directly with quad stores; it evaluates patterns on each graph in turn. The main query engine includes optimizations for the standard Jena implementation of in-memory graphs.\nHigh-level optimization is performed by a sequence of transformations. This set of optimizations is evolving. 
A custom implementation of a query engine can reuse some or all of these transformations (see Algebra.optimize which is the set of transforms used by the main query engine).\nThe main query engine is a streaming engine. It evaluates expressions as the client consumes each query solution. After preparing the execution by creating the initial conditions (a partial solution of one row and no bound variables or any initial bindings of variables), the main query engine calls QC.execute which is the algorithm to execute a query. Any extension that wishes to reuse some of the main query engine by providing its own OpExecutor must call this method to evaluate a sub-operation.\nQC.execute finds the currently active OpExecutor factory, creates an OpExecutor object and invokes it to evaluate one algebra operation.\nThere are two points of extension for the main query engine:\nStage generators, for evaluating basic graph patterns and reusing the rest of the engine. OpExecutor to execute any algebra operator specially. The standard OpExecutor invokes the stage generator mechanism to match a basic graph pattern.\nGraph matching and a custom StageGenerator The correct point to hook into ARQ for just extending basic graph pattern matching (BGPs) is to provide a custom StageGenerator. (To hook into filtered basic graph patterns, the extension will need to provide its own OpExecutor factory). The advantage of the StageGenerator mechanism, as compared to the more general OpExecutor described below, is that it is more self-contained and requires less detail about the internal evaluation of the other SPARQL algebra operators. This extension point corresponds to section 12.6 \u0026ldquo;Extending SPARQL Basic Graph Matching\u0026rdquo;.\nBelow is the default code to match a BGP from OpExecutor.execute(OpBGP, QueryIterator). It merely calls fixed code in the StageBuilder class. The input is a stream of results from earlier stages. 
The execution must return a query iterator that is all the possible ways to match the basic graph pattern for each of the inputs in turn. Order of results does not matter.\nprotected QueryIterator execute(OpBGP opBGP, QueryIterator input) { BasicPattern pattern = opBGP.getPattern() ; return StageBuilder.execute(pattern, input, execCxt) ; } The StageBuilder looks for the stage generator by accessing the context for the execution:\nStageGenerator stageGenerator = (StageGenerator)context.get(ARQ.stageGenerator) ; where the context is the global context and any query execution specific additions together with various execution control elements.\nA StageGenerator is an implementation of:\npublic interface StageGenerator { public QueryIterator execute(BasicPattern pattern, QueryIterator input, ExecutionContext execCxt) ; } Setting the Stage Generator An extension stage generator can be registered on a per-query execution basis or (more usually) in the global context.\nStageBuilder.setGenerator(Context, StageGenerator) The global context can be obtained by a call to ARQ.getContext().\nStageBuilder.setGenerator(ARQ.getContext(), myStageGenerator) ; In order to allow an extension to still permit other graphs to be used, stage generators are usually chained, with each new custom one passing the execution request up the chain if the request is not supported by this custom stage generator.\npublic class MyStageGenerator implements StageGenerator { StageGenerator above = null ; public MyStageGenerator (StageGenerator original) { above = original ; } @Override public QueryIterator execute(BasicPattern pattern, QueryIterator input, ExecutionContext execCxt) { Graph g = execCxt.getActiveGraph() ; // Test to see if this is a graph we support. if ( ! ( g instanceof MySpecialGraphClass ) ) // Not us - bounce up the StageGenerator chain return above.execute(pattern, input, execCxt) ; MySpecialGraphClass graph = (MySpecialGraphClass)g ; // Create a QueryIterator for this request ... 
This is registered by setting the global context (StageBuilder has a convenience operation to do this):\n// Get the standard one. StageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ; // Create a new one StageGenerator myStageGenerator= new MyStageGenerator(orig) ; // Register it StageBuilder.setGenerator(ARQ.getContext(), myStageGenerator) ; Example: jena-examples:arq/examples/bgpmatching\nOpExecutor A StageGenerator provides matching for a basic graph pattern. If an extension wishes to take responsibility for more of the evaluation then it needs to work with OpExecutor. This includes evaluation of filtered basic graph patterns.\nAn example query using a filter:\nPREFIX dc: \u0026lt;\u0026gt; PREFIX books: \u0026lt;\u0026gt; SELECT * WHERE { ?book dc:title ?title . FILTER regex(?title, \u0026quot;Paddington\u0026quot;) } results in the algebra expression for the pattern:\n(filter (regex ?title \u0026quot;Paddington\u0026quot;) (bgp (triple ?book dc:title ?title) )) showing that the filter is being applied to the results of a basic graph pattern matching.\nNote: this is not the way to provide custom filter operations. See the documentation for application-provided filter functions.\nEach step of evaluation in the main query engine is performed by a OpExecutor and a new one is created from a factory at each step. The factory is registered in the execution context. The implementation of a specialized OpExecutor can inherit from the standard one and override only those algebra operators it wishes to deal with, including inspecting the execution and choosing to pass up to the super-class based on the details of the operation. From the query above, only regex filters might be specially handled.\nRegistering an OpExecutorFactory:\nOpExecutorFactory customExecutorFactory = new MyOpExecutorFactory(...) 
; QC.setFactory(ARQ.getContext(), customExecutorFactory) ; QC is a point of indirection that chooses the execution process at each stage in a query so if the custom execution wishes to evaluate an algebra operation within another operation, it should call QC.execute. Be careful not to loop endlessly if the operation is itself handled by the custom evaluator. This can be done by swapping in a different OpExecutorFactory.\n// Execute an operation with a different OpExecution Factory // New context. ExecutionContext ec2 = new ExecutionContext(execCxt) ; ec2.setExecutor(plainFactory) ; QueryIterator qIter = QC.execute(op, input, ec2) ; private static OpExecutorFactory plainFactory = new OpExecutorFactory() { @Override public OpExecutor create(ExecutionContext execCxt) { // The default OpExecutor of ARQ. return new OpExecutor(execCxt) ; } } ; Quads If a custom extension provides named graphs, then it may be useful to execute the quad form of the query. This is done by writing a custom query engine and overriding QueryEngineMain.modifyOp:\n@Override protected Op modifyOp(Op op) { op = Substitute.substitute(op, initialInput) ; // Use standard optimizations. op = super.modifyOp(op) ; // Turn into quad form. op = Algebra.toQuadForm(op) ; return op ; } The extension may need to provide its own dataset implementation so that it can detect when queries are directed to its named graph storage. TDB is an example of this.\nMixed Graph Implementation Datasets The dataset implementation used in normal operation does not work on quads but instead can provide a dataset with a collection of graphs each from different implementation sub-systems. In-memory graphs can be mixed with database-backed graphs as well as custom storage systems. Query execution proceeds per-graph so a custom OpExecutor will need to test the graph it is working with to make sure it is of the right class. 
The pattern in the StageGenerator extension point is an example of a design pattern in that situation.\nCustom Query Engines A custom query engine enables an extension to choose which datasets it wishes to handle. It also allows the extension to intercept query execution during the setup of the execution so it can modify the algebra expression, introduce its own algebra extensions, choose which high-level optimizations to apply and also transform the expression into quad form. Execution can proceed with the normal algorithm or a custom OpExecutor or a custom Stage Generator or a combination of all three extension mechanisms.\nOnly a small, skeleton custom query engine is needed to intercept the initial setup. See the example in jena-examples:arq/examples arq.examples.engine.MyQueryEngine.\nWhile it is possible to replace the entire process of query evaluation, this is a substantial endeavour. QueryExecutionBase provides the machinery for result presentation (SELECT, CONSTRUCT, DESCRIBE, ASK), leaving the work of pattern evaluation to the custom query engine.\nAlgebra Extensions New operators can be added to the algebra using the OpExt class as the super-class of the new operator. They can be inserted into the expression to be evaluated using a custom query engine to intercept evaluation initialization. When evaluation of a query requires the evaluation of a sub-class of OpExt, the eval method is called.\n","permalink":"","tags":null,"title":"ARQ - Extending Query Execution"},{"categories":null,"contents":"This page describes function-like operators that can be used in expressions, such as FILTERs, assignments and SELECT expressions.\nThese are not strictly functions - the evaluation semantics of custom functions is to evaluate each argument then call the function with the results of the sub-expressions. 
Examples in standard SPARQL include bound, which does not evaluate a variable as an expression but just tests whether it is set or not, and the boolean operators || and \u0026amp;\u0026amp;, which handle errors and do not just evaluate each branch and combine the results.\nThese were previously ARQ extensions but are now legal SPARQL 1.1.\nIF The IF form evaluates its first argument to get a boolean result, then evaluates and returns the value of the second argument if the boolean result is true, and of the third argument if it is false.\nExamples:\nIF ( ?x\u0026lt;0 , \u0026#34;negative\u0026#34; , \u0026#34;positive\u0026#34; ) # A possible way to do default values. LET( ?z := IF(bound(?z) , ?z , \u0026#34;DftValue\u0026#34; ) ) COALESCE The COALESCE form returns the first argument of its argument list that is bound.\n# Suppose ?y is bound to \u0026#34;y\u0026#34; and ?z to \u0026#34;z\u0026#34; but ?x is not. COALESCE(?x , ?y , ?z) # returns \u0026#34;y\u0026#34; ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Filter Forms"},{"categories":null,"contents":"java.lang.NoClassDefFoundError : Exception in thread \u0026ldquo;main\u0026rdquo; : The classpath is wrong. Include all the jar files in lib/ before running one of the command line applications.\njava.lang.NoSuchFieldError: actualValueType : This is almost always due to using the wrong version of the Xerces library. Jena and ARQ make use of XML schema support that changed at Xerces 2.6.0 and is not compatible with earlier versions. At the time of writing Jena ships with Xerces 2.6.1.\nIn some situations your runtime environment may be picking up an earlier version of Xerces from an \u0026quot;endorsed\u0026quot; directory. You will need to either disable use of that endorsed library or replace it with a more up-to-date version of Xerces. This appears to happen with some distributions of Tomcat 5.\\* and certain configurations of JDK 1.4.1. Query Debugging : Look at the data in N3 or Turtle or N-triples. 
This can give you a better sense of the graph than RDF/XML.\nUse the [command line tools](cmds.html) and a sample of your data to develop a query, especially a complex one. Break your query up into smaller sections. How do I test substrings of literals? : SPARQL provides regular expression matching, which can be used to test for substrings and the other forms that SQL\u0026rsquo;s LIKE operator provides.\nExample: find resources whose RDFS label contains the substring \u0026quot;orange\u0026quot;, matching without respect to case. PREFIX rdfs: \u0026lt;\u0026gt; SELECT ?x WHERE { ?x rdfs:label ?v . FILTER regex(?v, \u0026quot;orange\u0026quot;, \u0026quot;i\u0026quot;) } The regular expression matching in ARQ is provided by java.util.regex. Accented characters and characters outside of basic latin : SPARQL queries are assumed to be Unicode strings. If typing from a text editor, ensure it is working in UTF-8 and not the operating system\u0026rsquo;s native character set. UTF-8 is not the default character set under MS Windows.\nARQ supports \\\\u escape sequences in queries for the input of 16-bit code points. ARQ does not support 32-bit code points (it would require a move to Java 1.5, including all support libraries, checking the codebase for char/codepoint inconsistencies and dropping support for Java 1.4). The same is true for data. XML files can be written in any XML-supported character set if the right `?xml` processing instruction is used. The default is UTF-8 or UTF-16. XSD DateTime : Examples of correctly formatted XSD DateTime literals are: these two are actually the same point in time and will test equal in a filter:\n\u0026quot;2005-04-04T04:04:04Z\u0026quot;^^xsd:dateTime \u0026quot;2004-12-31T18:01:00-05:00\u0026quot;^^\u0026lt;\u0026gt; - The timezone is required. - The datatype must be given. String Operations : ARQ provides many of the XPath/XQuery functions and operators including string operations. 
These include: fn:contains, fn:starts-with, fn:ends-with. See the library page for details of all functions provided.\nNote 1: For string operations taken from XQuery/XPath, character positions are numbered from 1, unlike Java where they are numbered from 0. Note 2: The `fn:substring` operation takes the length of the substring as the 3rd argument, unlike Java where it is the end index. ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Frequently Asked Questions"},{"categories":null,"contents":"The current W3C recommendation of SPARQL 1.1 supports query results in JSON format. What is described in this page is not that format, but an extension of Apache Jena which allows users to define how results should be returned in a key/value pair fashion, providing a simpler output. This output can easily be used as a model for web applications, or for inspecting data.\nCompare the output of this extension:\n[ { \u0026quot;book\u0026quot;: \u0026quot;\u0026quot;, \u0026quot;title\u0026quot;: \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; }, { \u0026quot;book\u0026quot;: \u0026quot;\u0026quot;, \u0026quot;title\u0026quot;: \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; }, ] With the output of the SPARQL 1.1 query result JSON format below:\n{ \u0026quot;head\u0026quot;: { \u0026quot;vars\u0026quot;: [ \u0026quot;book\u0026quot; , \u0026quot;title\u0026quot; ] } , \u0026quot;results\u0026quot;: { \u0026quot;bindings\u0026quot;: [ { \u0026quot;book\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;uri\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;\u0026quot; } , \u0026quot;title\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;literal\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; } } , { \u0026quot;book\u0026quot;: { \u0026quot;type\u0026quot;: \u0026quot;uri\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;\u0026quot; } , \u0026quot;title\u0026quot;: { 
\u0026quot;type\u0026quot;: \u0026quot;literal\u0026quot; , \u0026quot;value\u0026quot;: \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; } } ] } } This feature was added in Jena 3.8.0.\nQuery Syntax The JSON syntax is similar in certain ways to the SPARQL CONSTRUCT syntax.\nPREFIX purl: \u0026lt;\u0026gt; PREFIX w3: \u0026lt;\u0026gt; PREFIX : \u0026lt;\u0026gt; JSON { \u0026quot;author\u0026quot;: ?author, \u0026quot;title\u0026quot;: ?title } WHERE { ?book purl:creator ?author . ?book purl:title ?title . FILTER (?author = 'J.K. Rowling') } As in CONSTRUCT, users are able to specify what the output should look like, using a simple key/value pair pattern, which could produce the following output for the query above.\n[ { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Deathly Hallows\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Philosopher's Stone\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Order of the Phoenix\u0026quot; } { \u0026quot;author\u0026quot; : \u0026quot;J.K. 
Rowling\u0026quot; , \u0026quot;title\u0026quot; : \u0026quot;Harry Potter and the Half-Blood Prince\u0026quot; } ] Grammar The normative definition of the grammar of the query string is given in this table:\nRule Expression JsonQuery ::= JsonClause ( DatasetClause )* WhereClause SolutionModifier JsonClause ::= \u0026lsquo;JSON\u0026rsquo; \u0026lsquo;{\u0026rsquo; JsonObjectMember ( \u0026lsquo;,\u0026rsquo; JsonObjectMember )* \u0026lsquo;}\u0026rsquo; JsonObjectMember ::= String \u0026lsquo;:\u0026rsquo; ( Var | RDFLiteral | NumericLiteral | BooleanLiteral ) DatasetClause, WhereClause, SolutionModifier, String, Var, \u0026lsquo;RDFLiteral\u0026rsquo;, NumericLiteral, and \u0026lsquo;BooleanLiteral\u0026rsquo; are as for the SPARQL 1.1 Grammar.\nProgramming API ARQ provides two additional methods in QueryExecution for JSON.\nIterator\u0026lt;JsonObject\u0026gt; QueryExecution.execJsonItems() JsonArray QueryExecution.execJson() In order to use these methods, the extended ARQ query syntax must be switched on beforehand, when creating the Query object:\nQuery query = QueryFactory.create(queryString, Syntax.syntaxARQ) String queryString = \u0026quot;JSON { 'name' : ?name, 'age' : ?age } WHERE ... \u0026quot; ... Iterator\u0026lt;JsonObject\u0026gt; json = qexec.execJsonItems() Fuseki Support Users are able to use the Fuseki web interface, as well as the other HTTP endpoints, to submit queries using any programming language. The following example shows how to POST to the query endpoint, passing the query as a form data field.\ncurl -XPOST --data \u0026quot;query=JSON { 'name' : ?name, 'age': ?age } WHERE { ... }\u0026quot; http://localhost:3030/ds/query The web interface editor parses standard SPARQL syntax, so syntax errors are currently expected in the web editor when using the JSON clause. 
The query should still be executed correctly, and the results displayed as for other SPARQL queries.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Generate JSON from SPARQL"},{"categories":null,"contents":"@@ Incomplete / misnamed?\nARQ consists of the following parts:\nThe SPARQL abstract syntax tree (AST) and the SPARQL parser\nThe algebra generator that turns the SPARQL AST into algebra expressions\nImplementation of the translation in the SPARQL specification. Quad version compiling SPARQL to quad expressions, not basic graph patterns. Query engines to execute queries\nSPARQL protocol client - remote HTTP requests Reference engine - direct implementation of the algebra Quad engine - direct implementation of the algebra except The main engine TDB, a SPARQL database for large-scale persistent data Result set handling for the SPARQL XML results format, the JSON and text versions.\nMain packages Package Use org.apache.jena.query The application API org.apache.jena.sparql.syntax Abstract syntax tree org.apache.jena.sparql.algebra SPARQL algebra org.apache.jena.sparql.lang The parsers: SPARQL, ARQ, RDQL org.apache.jena.sparql.expr Expression code. org.apache.jena.sparql.serializer Output in SPARQL, ARQ forms, in SPARQL syntax, in an abstract form (useful in debugging) and in XML. org.apache.jena.sparql.engine The abstraction of a query engine. org.apache.jena.sparql.engine.main The usual query engine. org.apache.jena.sparql.engine.ref The reference query engine (and quad version) Key Execution Classes Bindings Query Iterators Context ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Internal Design"},{"categories":null,"contents":"ARQ supports writing custom SPARQL functions in JavaScript. These functions can be used in FILTERs and for calculating values to assign with AS in BIND and SELECT expressions.\nXSD datatypes for strings, numbers and booleans are converted to the native JavaScript datatypes. 
RDF terms that do not fit easily into JavaScript datatypes are handled with an object class, NV.\nApplications should be aware that there are risks in exposing a script engine with full computational capabilities through SPARQL. Script functions are only as secure as the script engine environment they run in.\nRequirements ARQ requires a JavaScript engine such as GraalVM to be added to the classpath.\n\u0026lt;properties\u0026gt; \u0026lt;ver.graalvm\u0026gt;....\u0026lt;/ver.graalvm\u0026gt; ... \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.graalvm.js\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;js\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${ver.graalvm}\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.graalvm.js\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;js-scriptengine\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;${ver.graalvm}\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Enabling and Loading JavaScript functions JavaScript is loaded from an external file using the context setting \u0026ldquo;\u0026quot;. This can be written as arq:js-library for commands and Fuseki configuration files.\nAccess to the script engine must be enabled at runtime. The Java system property to do this is jena:scripting.\nExample:\nexport JVM_ARGS=-Djena:scripting=true sparql --set arq:js-library=SomeFile.js --data ... --query ... and for MS Windows:\nset JVM_ARGS=-Djena:scripting=true sparql --set arq:js-library=SomeFile.js --data ... --query ... 
will execute on the data with the JavaScript functions from file \u0026ldquo;SomeFile.js\u0026rdquo; available.\nJavaScript functions can also be set from a string directly from within Java using the constant ARQ.symJavaScriptFunctions (\u0026ldquo;\u0026quot;).\nWARNING: Enabling this feature exposes the majority of the underlying scripting engine directly to SPARQL queries so may provide a vector for arbitrary code execution. Therefore it is recommended that this feature remain disabled for any publicly accessible deployment that utilises the ARQ query engine.\nIdentifying callable functions The context setting \u0026ldquo;\u0026ldquo;\u0026quot; is used to provide a comma-separated list of function names, which are the local part of the URI, that are allowed to be called as custom script functions.\nThis can be written as arq:scriptAllowList for commands and Fuseki configuration files. It is the Java constant ARQ.symCustomFunctionScriptAllowList.\nsparql --set arq:js-library=SomeFile.js \\ --set arq:scriptAllowList=toCamelCase,anotherFunction --data ... --query ... and a query of:\nPREFIX js: \u0026lt;\u0026gt; SELECT ?input (js:toCamelCase(?input) AS ?X) { VALUES ?input { \u0026quot;some woRDs to PROCESS\u0026quot; } } Using JavaScript functions SPARQL functions implemented in JavaScript are automatically called when a URI starting \u0026ldquo;\u0026quot; is used.\nThis can conveniently be abbreviated by:\nPREFIX js: \u0026lt;\u0026gt; Arguments and Function Results xsd:string (a string with no language tag), any XSD numbers (integer, decimal, float, double and all the derived types) and xsd:boolean are converted to JavaScript string, number and boolean respectively.\nSPARQL functions must return a value. When a function returns a value, it can be one of these JavaScript native datatypes, in which case the reverse conversion is applied back to XSD datatypes. 
For numbers, the conversion is back to xsd:integer (if it has no fractional part) or xsd:double.\nThe JavaScript function can also create NodeValue (or NV) objects for other datatypes by calling Java from within the JavaScript script engine of the Java runtime.\nURIs are passed as NV object and are available in JavaScript as a string.\nThe class NV is used for all other RDF terms.\nReturning JavaScript null is the error indicator and a SPARQL expression error (ExprEvalException) is raised, like any other expression error in SPARQL. That, in turn, will cause the whole expression the function is part of to evaluate to an error (unless a special form like COALESCE is used). In a FILTER that typically makes the filter evaluate to \u0026ldquo;false\u0026rdquo;.\nExample Suppose \u0026ldquo;functions.js\u0026rdquo; contains code to camel case words in a string. For example, \u0026ldquo;some words to process \u0026quot; becomes \u0026ldquo;someWordsToProcess\u0026rdquo;.\n// CamelCase a string // Words to be combined are separated by a space in the string. function toCamelCase(str) { return str .split(' ') .map(cc) .join(''); } function ucFirst(word) { return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase(); } function lcFirst(word) { return word.toLowerCase(); } function cc(word,index) { return (index == 0) ? 
lcFirst(word) : ucFirst(word); } and the query Q.rq\nPREFIX js: \u0026lt;\u0026gt; SELECT ?input (js:toCamelCase(?input) AS ?X) { VALUES ?input { \u0026quot;some woRDs to PROCESS\u0026quot; } } which results in:\n-------------------------------------------------- | input | X | ================================================== | \u0026quot;some woRDs to PROCESS\u0026quot; | \u0026quot;someWordsToProcess\u0026quot; | -------------------------------------------------- Use with Fuseki The context setting can be provided on the command line starting the server, for example:\nexport JVM_ARGS=-Djena:scripting=true fuseki --set arq:js-library=functions.js \\ --set arq:scriptAllowList=toCamelCase \\ --mem /ds or it can be specified in the server configuration file config.ttl:\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX ja: \u0026lt;\u0026gt; [] rdf:type fuseki:Server ; # Set the server-wide context ja:context [ ja:cxtName \u0026quot;arq:js-library\u0026quot; ; ja:cxtValue \u0026quot;/filepath/functions.js\u0026quot; ] ; ja:context [ ja:cxtName \u0026quot;arq:scriptAllowList\u0026quot; ; ja:cxtValue \u0026quot;toCamelCase\u0026quot; ] ; . \u0026lt;#service\u0026gt; rdf:type fuseki:Service; rdfs:label \u0026quot;Dataset\u0026quot;; fuseki:name \u0026quot;ds\u0026quot;; fuseki:serviceQuery \u0026quot;sparql\u0026quot;; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; rdf:type ja:DatasetTxnMem; ja:data \u0026lt;file:D.trig\u0026gt;; . 
and used as:\nexport JVM_ARGS=-Djena:scripting=true fuseki --conf config.ttl ","permalink":"","tags":null,"title":"ARQ - JavaScript SPARQL Functions"},{"categories":null,"contents":"Lateral joins using the keyword LATERAL were introduced in Apache Jena 4.7.0.\nA LATERAL join is like a foreach loop, looping on the results from the left-hand side (LHS), the pattern before the LATERAL keyword, and executing the right-hand side (RHS) query pattern once for each row, with the variables from the input LHS in-scope during each RHS evaluation.\nA regular join only executes the RHS once, and the variables from the LHS are used for the join condition after evaluation of the left and right sub-patterns.\nAnother way to think of a lateral join is as a flatmap.\nExamples:\n## Get exactly one label for each subject with type `:T` SELECT * { ?s rdf:type :T LATERAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } } ## Get zero or one labels for each subject. SELECT * { ?s ?p ?o LATERAL { OPTIONAL { SELECT * { ?s rdfs:label ?label } LIMIT 1 } } } Syntax The LATERAL keyword takes the graph pattern so far in the group, from the { starting the current block, and a { } block afterwards.\nEvaluation Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL.\nVariable assignment There needs to be a new syntax restriction: there can be no variable introduced by AS (BIND, or sub-query) or VALUES in-scope at the top level of the LATERAL RHS with the same name as any in-scope variable from the LHS.\nSuch a variable assignment would conflict with the variable already being set in the row being joined.\n## ** Illegal ** SELECT * { ?s ?p ?o LATERAL { BIND( 123 AS ?o) } } See SPARQL Grammar note 12.\nIn ARQ, LET would work. 
LET for a variable that is bound acts like a filter.\nVariable Scopes In looping on the input, a lateral join makes the bindings of variables in the current row available to the right-hand side pattern, setting their value from the top down.\nIn SPARQL, it is possible to have variables of the same name which are not exposed within a sub-select. These are not lateral-joined to a variable of the same name from the LHS.\nThis is not specific to lateral joins. In\nSELECT * { ?s rdf:type :T { SELECT ?label { ?s rdfs:label ?label } } } the inner ?s can be replaced by ?z without changing the results because the inner ?s is not joined to the outer ?s but instead is hidden by the SELECT ?label.\nSELECT * { ?s rdf:type :T { SELECT ?label { ?z rdfs:label ?label } } } The same rule applies to lateral joins.\nSELECT * { ?s rdf:type :T LATERAL { SELECT ?label { ?s rdfs:label ?label } LIMIT 1 } } The inner ?s in the SELECT ?label is not the outer ?s because the SELECT ?label does not pass out ?s. As a sub-query the ?s could be any name except ?label for the same results.\nNotes There is a similarity to filter NOT EXISTS/EXISTS expressed as the non-legal FILTER ( ASK { pattern } ) where the variables of the row being filtered are available to \u0026ldquo;pattern\u0026rdquo;. This is similar to an SQL correlated subquery.\nSPARQL Specification Additional Material Syntax LATERAL is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples]( As a syntax form, it is similar to OPTIONAL.\n[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ... 
[57] OptionalGraphPattern ::= \u0026#39;OPTIONAL\u0026#39; GroupGraphPattern [ ] LateralGraphPattern ::= \u0026#39;LATERAL\u0026#39; GroupGraphPattern Algebra The new algebra operator is lateral, which takes two expressions:\nSELECT * { ?s ?p ?o LATERAL { ?a ?b ?c } } is translated to:\n(lateral (bgp (triple ?s ?p ?o)) (bgp (triple ?a ?b ?c))) Evaluation To evaluate lateral:\nEvaluate the first argument (left-hand side from syntax) to get a multiset of solution mappings. For each solution mapping (\u0026ldquo;row\u0026rdquo;), inject variable bindings into the second argument. Evaluate this pattern. Add to results. Outline:\nDefinition: Lateral Let Ω be a multiset of solution mappings. We define: Lateral(Ω, P) = { μ | union of Ω1 where foreach μ1 in Ω: pattern2 = inject(pattern, μ1) Ω1 = eval(D(G), pattern2) result Ω1 } where inject is the corrected substitute operation.\nAn alternative style is to define Lateral more like \u0026ldquo;evaluate P such that μ is in-scope\u0026rdquo; in some way, rather than rely on inject, which is a mechanism.\nDefinition: Evaluation of Lateral eval(D(G), Lateral(P1, P2)) = Lateral(eval(D(G), P1), P2) ","permalink":"","tags":null,"title":"ARQ - Lateral Join"},{"categories":null,"contents":"ARQ uses SLF4j as the logging API and the query and RIOT commands use Log4J2 as a deployment system. You can use Java 1.4 logging instead.\nARQ does not output any logging messages at level INFO in normal operation. The code uses levels TRACE and DEBUG. Running an application with the logging level set to INFO will therefore cause no output in normal operation. Output below INFO can be very verbose and is intended mainly to help debug ARQ. WARN and FATAL messages are only used when something is wrong.\nThe root of all the loggers is org.apache.jena. org.apache.jena.query is the application API. 
org.apache.jena.sparql is the implementation and extension points.\nIf using Tomcat, or another system that provides complex class loading arrangements, be careful about loading from jars in both the web application and the system directories, as this can cause separate logging systems to be created (this may not matter).\nThe ARQ and RIOT command line utilities look for a file \u0026ldquo;\u0026rdquo; in the current directory to control logging during command execution. There is also a built-in configuration so no configuration work is required.\nLogger Names Name Constant Logger Use ARQ.logInfoName ARQ.getLoggerInfo() General information org.apache.jena.arq.exec ARQ.logExecName ARQ.getLoggerExec() Execution information The reading of from the current directory is achieved by a call to org.apache.jena.atlas.logging.Log.setlog4j2().\nExample file:\nstatus = error name = PropertiesConfig filters = threshold filter.threshold.type = ThresholdFilter filter.threshold.level = INFO appender.console.type = Console = STDOUT appender.console.layout.type = PatternLayout appender.console.layout.pattern = %d{HH:mm:ss} %-5p %-15c{1} :: %m%n rootLogger.level = INFO rootLogger.appenderRef.stdout.ref = STDOUT = org.apache.jena logger.jena.level = INFO = org.apache.jena.arq.exec logger.arq-exec.level = INFO = logger.arq-info.level = INFO = org.apache.jena.riot logger.riot.level = INFO Fuseki server output can include ARQ execution logging; see Fuseki logging for the configuration.\nExecution Logging ARQ can log query and update execution details globally or for individual operations. This adds another level of control on top of the logger level controls.\nExplanatory messages are controlled by the Explain.InfoLevel level in the execution context.\nThe logger used is called org.apache.jena.arq.exec. Messages are sent at level \u0026ldquo;info\u0026rdquo;. 
So for log4j2, the following can be set in the file:\ = org.apache.jena.arq.exec logger.arq-exec.level = INFO The context setting is for key (Java constant) ARQ.symLogExec. To set globally:\nARQ.setExecutionLogging(Explain.InfoLevel.ALL) ; and it may also be set on an individual query execution using its local context.\ntry(QueryExecution qExec = QueryExecution.create()... .set(ARQ.symLogExec, Explain.InfoLevel.ALL).build()) { ... } On the command line:\narq.query --explain --data data file --query=queryfile The command tdbquery takes the same --explain argument.\nInformation levels\nLevel Effect INFO Log each query FINE Log each query and its algebra form after optimization ALL Log query, algebra and every dataset access (can be expensive) NONE No information logged These can be specified as strings, to the command line tools, or using the constants in Explain.InfoLevel.\nqExec.getContext().set(ARQ.symLogExec, Explain.InfoLevel.FINE) ; arq.query --set arq:logExec=FINE --data data file --query=queryfile ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Logging"},{"categories":null,"contents":"Negation by Failure (OPTIONAL + !BOUND) Standard SPARQL 1.0 can perform negation using the idiom of OPTIONAL/!BOUND. It is inconvenient and can be hard to use as complexity increases. SPARQL 1.1 supports additional operators for negation.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . OPTIONAL { ?x foaf:knows ?who } . FILTER (!BOUND(?who)) } EXISTS and NOT EXISTS EXISTS and NOT EXISTS are now legal SPARQL 1.1 when used inside a FILTER; they may be used as bare graph patterns only when Syntax.syntaxARQ is used.\nThere is the NOT EXISTS operator, which acts at the point in the query where it is written. 
It does not bind any variables, but variables already bound in the query will have their bound value.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER NOT EXISTS { ?x foaf:knows ?who } } There is also an EXISTS operator.\n# Names of people where it is stated that they know at least one other person. PREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER EXISTS { ?x foaf:knows ?who . FILTER(?who != ?x) } } In this example, the pattern is a little more complex. Any graph pattern is allowed, although use of OPTIONAL is pointless (it will always match, possibly with no additional results).\nNOT EXISTS and EXISTS can also be used in FILTER expressions. In SPARQL, FILTER expressions act over the whole of the basic graph pattern in which they occur.\n# Names of people who have not stated that they know anyone PREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER (NOT EXISTS { ?x foaf:knows ?who }) } A note of caution:\nPREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . FILTER (NOT EXISTS { ?x foaf:knows ?y }) ?x foaf:knows ?who } is the same as (it\u0026rsquo;s a single basic graph pattern - the filter does not break it in two):\nPREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . FILTER (NOT EXISTS { ?x foaf:knows ?who }) } and the FILTER will always be false ({ ?x foaf:knows ?y } must have matched to get to this point in the query and using ?who instead makes no difference).\nMINUS SPARQL 1.1 also provides a MINUS keyword, which is broadly similar to NOT EXISTS though it does have some key differences, as explained in the specification:\nPREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . 
MINUS { ?x foaf:knows \u0026lt;\u0026gt; } } Here we subtract any solutions where ?x also knows\nOne of the key differences between MINUS and NOT EXISTS is that MINUS is a child graph pattern and so breaks up the enclosing graph pattern; the result of the query can therefore change depending on where the MINUS is placed. This is unlike the earlier NOT EXISTS examples, where moving the position of the FILTER resulted in equivalent queries.\nNOT IN SPARQL 1.1 also has a simpler form of negation for when you simply need to restrict a variable to not being in a given set of values; this is the NOT IN function:\nPREFIX foaf: \u0026lt;\u0026gt; SELECT ?name WHERE { ?x foaf:givenName ?name . ?x foaf:knows ?y . FILTER(?y NOT IN (\u0026lt;\u0026gt;, \u0026lt;\u0026gt;)) } This would filter out matches where the value of ?y is either or\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Negation"},{"categories":null,"contents":"SPARQL is a query language and a remote access protocol. The remote access protocol runs over HTTP.\nSee Fuseki for an implementation of the SPARQL protocol over HTTP. Fuseki uses ARQ to provide SPARQL query access to Jena models, including Jena persistent models.\nARQ includes a query engine capable of using the HTTP version.\nFrom your application The QueryExecutionHTTP class has methods for creating a QueryExecution object for remote use. There are various HTTP-specific settings; the defaults should work in most cases.\nThe remote request is made when the execSelect, execConstruct, execDescribe or execAsk method is called.\nThe results are held locally after remote execution and can be processed as usual.\nFrom the command line The arq.rsparql command can issue remote query requests using the --service argument:\njava -cp ... 
arq.rsparql --service 'http://host/service' --query 'SELECT ?s WHERE {?s [] []}' Or: rsparql --service 'http://host/service' --query 'SELECT ?s WHERE {?s [] []}'\nThis takes a URL that is the service location.\nThe query given is parsed locally to check for syntax errors before sending.\nAuthentication ARQ provides a flexible API for authenticating against remote services; see the HTTP Authentication documentation for more details.\nFirewalls and Proxies Don\u0026rsquo;t forget to set the proxy for Java if you are accessing a public server from behind a blocking firewall. Most home firewalls do not block outgoing requests; many corporate firewalls do block outgoing requests.\nIf, to use your web browser, you need to set a proxy, you need to do so for a Java program.\nSimple examples include:\n-DsocksProxyHost=YourSocksServer -DsocksProxyHost=YourSocksServer -DsocksProxyPort=port -Dhttp.proxyHost=WebProxy -Dhttp.proxyPort=Port This can be done in the application if it is done before any network connections are made:\nSystem.setProperty(\u0026quot;socksProxyHost\u0026quot;, \u0026quot;\u0026quot;); Consult the Java documentation for more details. Searching the web is also very helpful.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Querying Remote SPARQL Services"},{"categories":null,"contents":"RDF collections, also called RDF lists, are difficult to query directly.\nARQ provides three property functions to work with RDF collections. list:member \u0026ndash; members of a list list:index \u0026ndash; index of a member in a list list:length \u0026ndash; length of a list list:member is similar to rdfs:member except for RDF lists. 
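As an illustration, a sketch of a query using all three property functions follows; the list: prefix URI, the :hasList property and the data are assumptions for the example, not taken from this page.

```sparql
PREFIX list: <http://jena.apache.org/ARQ/list#>
PREFIX :     <http://example/>

# For data such as:   :x :hasList (1 2 3) .
SELECT ?member ?index ?length
WHERE {
  :x :hasList ?list .                  # ?list is the head of the collection
  ?list list:index (?index ?member) .  # each member with its position
  ?list list:length ?length .          # the same length on every row
}
```

Each solution binds one member together with its index; the length is repeated on every row.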
ARQ also provides rdfs:member.\nSee the property functions library page.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - RDF Collections"},{"categories":null,"contents":"The SELECT statement of a query can include expressions, not just variables. This was previously a SPARQL extension but is now legal SPARQL 1.1.\nExpressions are enclosed in () and can be optionally named using AS. If no name is given, an internal name is allocated, which may not be a legal SPARQL variable name. In order to make results portable in the SPARQL Query Results XML Format, the application should specify the name, so using AS is strongly encouraged.\nExpressions can involve group aggregations.\nExpressions that do not correctly evaluate result in an unbound variable in the results. That is, the illegal expression is silently skipped.\nExamples:\nPREFIX : \u0026lt;http://example/\u0026gt; SELECT (?p+1 AS ?q) { :x :p ?p } PREFIX rdf: \u0026lt;\u0026gt; PREFIX : \u0026lt;http://example/\u0026gt; SELECT (count(*) AS ?count) { :x rdf:type :Class } ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - SELECT Expressions"},{"categories":null,"contents":"A SPARQL query in ARQ goes through several stages of processing:\nString to Query (parsing) Translation from Query to a SPARQL algebra expression Optimization of the algebra expression Query plan determination and low-level optimization Evaluation of the query plan This page describes how to access and use expressions in the SPARQL algebra within ARQ. The definition of the SPARQL algebra is to be found in the SPARQL specification in section 12. ARQ can be extended to modify the evaluation of the algebra form to access different graph storage implementations.\nThe classes for the data structures for the algebra reside in the package org.apache.jena.sparql.algebra in the op subpackage. 
All the classes are named \u0026ldquo;Op...\u0026rdquo;; the interface that they all offer is \u0026ldquo;Op\u0026rdquo;.\nViewing the algebra expression for a Query The command line tool arq.qparse will print the algebra form of a query:\narq.qparse --print=op --query=Q.rq arq.qparse --print=op 'SELECT * { ?s ?p ?o}' The syntax of the output is SSE, a simple format for writing data structures involving RDF terms. It can be read back in again to produce the Java form of the algebra expression.\nTurning a query into an algebra expression Getting the algebra expression for a Query is simply a matter of passing the parsed Query object to the translation function in the Algebra class:\nQuery query = QueryFactory.create(.....) ; Op op = Algebra.compile(query) ; And back again.\nQuery query = OpAsQuery.asQuery(op) ; System.out.println(query.serialize()) ; This reverse translation can handle any algebra expression originally from a SPARQL Query, but not any algebra expression. It is possible to create programmatically useful algebra expressions that can not be turned into a query, especially if they involve algebra extensions. Also, the query produced may not be exactly the same but will yield the same results (for example, filters may be moved because the SPARQL query algebra translation in the SPARQL specification moves filter expressions around).\nDirectly reading and writing algebra expressions The SSE class is a collection of functions to parse SSE expressions for the SPARQL algebra but also RDF terms, filter expressions and even datasets and graphs.\nOp op = SSE.parseOp(\u0026quot;(bgp (?s ?p ?o))\u0026quot;) ; // Read a string Op op = SSE.readOp(\u0026quot;filename.sse\u0026quot;) ; // Read a file The SSE class simply calls the appropriate builder operation from the org.apache.jena.sparql.sse.builder package.\nTo go with this, there is a collection of writers for many of the Java structures in ARQ. Op op = ...
; SSE.write(op) ; // Write to stdout Writers default to writing to System.out but support calls to any output stream (it manages the conversion to UTF-8) and ARQ\u0026rsquo;s own IndentedWriter form for embedding in structured output. Again, SSE is simply passing the calls to the writer operation from the org.apache.jena.sparql.sse.writer package.\nCreating an algebra expression programmatically See the example in AlgebraExec.\nTo produce the complete javadoc for ARQ, download an ARQ distribution and run the ant task \u0026lsquo;javadoc-all\u0026rsquo;.\nEvaluating an algebra expression QueryIterator qIter = Algebra.exec(op,graph) ; QueryIterator qIter = Algebra.exec(op,datasetGraph) ; Evaluating an algebra expression produces an iterator of query solutions (called Bindings).\nfor ( ; qIter.hasNext() ; ) { Binding b = qIter.nextBinding() ; Node n = b.get(var_x) ; System.out.println(var_x+\u0026quot; = \u0026quot;+FmtUtils.stringForNode(n)) ; } qIter.close() ; Operations of CONSTRUCT, DESCRIBE and ASK are done on top of algebra evaluation. Applications can access this functionality by creating their own QueryEngine (see arq.examples.engine.MyQueryEngine) and its factory. A query engine is a one-time use object for each query execution.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - SPARQL Algebra"},{"categories":null,"contents":"SPARQL Update is a W3C standard for an RDF update language with SPARQL syntax. It is described in \u0026ldquo;SPARQL 1.1 Update\u0026rdquo;.\nA SPARQL Update request is composed of a number of update operations, so in a single request graphs can be created, loaded with RDF data and modified.\nSome examples of ARQ\u0026rsquo;s SPARQL Update support are to be found in the download in jena-examples:arq/examples/update.\nThe main API classes are:\nUpdateRequest - A list of Update operations to be performed. UpdateFactory - Create UpdateRequest objects by parsing strings or parsing the contents of a file.
UpdateAction - Execute updates. To execute a SPARQL Update request as a script from a file:\nDataset dataset = ... UpdateAction.readExecute(\u0026quot;\u0026quot;, dataset) ; To execute a SPARQL Update request as a string:\nDataset dataset = ... UpdateAction.parseExecute(\u0026quot;DROP ALL\u0026quot;, dataset) ; The application writer can create and execute operations:\nUpdateRequest request = UpdateFactory.create() ; request.add(\u0026quot;DROP ALL\u0026quot;) .add(\u0026quot;CREATE GRAPH \u0026lt;http://example/g2\u0026gt;\u0026quot;) .add(\u0026quot;LOAD \u0026lt;file:etc/update-data.ttl\u0026gt; INTO \u0026lt;http://example/g2\u0026gt;\u0026quot;) ; // And perform the operations. UpdateAction.execute(request, dataset) ; but be aware that each operation added needs to be a complete SPARQL Update operation, including prefixes if needed.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - SPARQL Update"},{"categories":null,"contents":"ARQ includes support for nested SELECTs. This was previously an ARQ extension but is now legal SPARQL 1.1.\nNested SELECT A SELECT query can be placed inside a graph pattern to produce a table that is used within the outer query. A nested SELECT statement is enclosed in {} and is the only element in that group.\nExample: find toys with more than five orders:\nPREFIX : \u0026lt;http://example/\u0026gt; SELECT ?x { ?x a :Toy . { SELECT ?x ( count(?order) as ?q ) { ?x :order ?order } GROUP BY ?x } FILTER ( ?q \u0026gt; 5 ) } ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Sub Queries"},{"categories":null,"contents":"ARQ uses URIs of the form \u0026lt;java:\u0026lt;i\u0026gt;package.class\u0026lt;/i\u0026gt;\u0026gt; to provide dynamic loading of code for value functions and property functions. ARQ loads the class when needed. For functions and property functions, it also wraps the loaded class in the necessary factory code.
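The class loading itself is ordinary JVM reflection; a minimal stdlib sketch (not ARQ\u0026rsquo;s actual code, with java.util.ArrayList standing in for a user-written function class such as app.myFunctions.myTest):

```java
// Sketch: how a java: URI maps to a class. The name after "java:" is a
// fully qualified class name that must already be on the application classpath.
public class DynamicLoadSketch {
    public static void main(String[] args) throws Exception {
        String className = "java.util.ArrayList"; // stand-in for a user function class
        Class<?> cls = Class.forName(className);  // throws ClassNotFoundException if absent
        // ARQ would then wrap the class in factory code; here we just instantiate it.
        Object instance = cls.getDeclaredConstructor().newInstance();
        System.out.println(instance.getClass().getName()); // prints java.util.ArrayList
    }
}
```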
A new instance of the function or property function is created for each mention of the name in each query.\nDynamic Code Loading Any classes loaded by ARQ must already be on the Java classpath. ARQ does not create any new class loaders, nor modify the Java class path in any way. The class path must be set up to include any class files or jar files for dynamically loaded code.\nClasses can be more conveniently named in queries using SPARQL PREFIXes but, because dots can\u0026rsquo;t appear in the local part of a prefixed name, the whole package name, including the final dot, must be in the PREFIX declaration.\nPREFIX fn: \u0026lt;java:org.example.functions.\u0026gt; # Including the final dot ... FILTER fn:alter(?x) ... Remapping All code loading is performed via the MappedLoader class. Before actually loading the code, the mapped loader applies any transformation of URIs. For example, the ARQ function library has a namespace of \u0026lt;\u0026gt; and resides in the Java package org.apache.jena.sparql.function.library. The mapped loader includes a partial rewrite rule turning http URLs starting with that namespace into java: URIs using the package name.\n","permalink":"","tags":null,"title":"ARQ - The java: URI scheme"},{"categories":null,"contents":"Applications can add SPARQL functions to the query engine. This is done by writing a class implementing the right interface, then either registering it or using the fake java: URI scheme to dynamically call the function.\nWriting SPARQL Value Functions A SPARQL value function is an extension point of the SPARQL query language that allows a URI to name a function in the query processor.\nIn the ARQ engine, code implementing a function must implement the interface org.apache.jena.sparql.function.Function, although it is easier to work with one of the abstract classes for specific numbers of arguments, like org.apache.jena.sparql.function.FunctionBase1 for one-argument functions.
Functions do not have to have a fixed number of arguments.\nThe abstract class FunctionBase, the superclass of FunctionBase1 to FunctionBase4, evaluates its arguments and calls the implementation code with argument values (if a variable was unbound, an error will have been generated). It is possible to get unevaluated arguments but care must be taken not to violate the rules of function evaluation. The standard functions that access unevaluated arguments are the logical \u0026lsquo;or\u0026rsquo; and logical \u0026lsquo;and\u0026rsquo; operations backing || and \u0026amp;\u0026amp;; these are special forms to allow for the special exception handling rules.\nNormally, a function should be a pure evaluation based on its arguments. It should not access a graph nor return different values for the same arguments (to allow expression optimization). Usually, such requirements are better met with a property function. Functions can\u0026rsquo;t bind variables; that, too, would be done in a property function.\nExample: (this is the max function in the standard ARQ library):\npublic class max extends FunctionBase2 { public max() { super() ; } public NodeValue exec(NodeValue nv1, NodeValue nv2) { return Functions.max(nv1, nv2) ; } } The function takes two arguments and returns a single value. The class NodeValue represents values and supports value-based operations. NodeValue value support includes the XSD datatypes, xsd:decimal and all its subtypes like xsd:integer and xsd:byte, xsd:double, xsd:float, xsd:boolean, xsd:dateTime and xsd:date. Literals with language tags are also treated as values in additional \u0026ldquo;value spaces\u0026rdquo; determined by the language tag without regard to case.\nThe Functions class contains the core XML Functions and Operators operations.
Class NodeFunctions contains the implementations of node-centric operations like isLiteral and str.\nIf any of the arguments are wrong, then the function should throw an ExprEvalException.\nExample: calculate the canonical namespace from a URI (calls the Jena operation for the actual work):\npublic class namespace extends FunctionBase1 { public namespace() { super() ; } public NodeValue exec(NodeValue v) { Node n = v.asNode() ; if ( ! n.isURI() ) throw new ExprEvalException(\u0026quot;Not a URI: \u0026quot;+FmtUtils.stringForNode(n)) ; String str = n.getNameSpace() ; return NodeValue.makeString(str) ; } } This throws an evaluation exception if it is passed a value that is not a URI.\nThe standard library, in package org.apache.jena.sparql.function.library, contains many examples.\nRegistering Functions The query compiler finds functions based on the function\u0026rsquo;s URI. There is a global registry of known functions, but any query execution can have its own function registry.\nFor each function, there is a function factory associated with the URI. A new function instance is created for each use of a function in each query execution.\n// Register with the global registry. FunctionRegistry.get().put(\u0026quot;\u0026quot;, new MyFunctionFactory()) ; A common case is registering a specific class for a function implementation, so there is an additional method that takes a class, wraps it in a built-in function factory and registers the function implementation.\n// Register with the global registry. FunctionRegistry.get().put(\u0026quot;\u0026quot;, MyFunction.class) ; Another convenience route to function calling is to use the java: URI scheme. This dynamically loads the code, which must be on the Java classpath. With this scheme, the function URI gives the class name. There is automatic registration of a wrapper into the function registry.
This way, no explicit registration step is needed by the application, and queries issued with the command line tools can load custom functions.\nPREFIX f: \u0026lt;java:app.myFunctions.\u0026gt; ... FILTER f:myTest(?x, ?y) ... FILTER (?x + f:myIntToXSD(?y)) ... ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ - Writing Filter Functions"},{"categories":null,"contents":"ARQ - Writing Property Functions\nSee also Writing Filter Functions.\nApplications can add SPARQL property functions to the query engine. This is done by first implementing the PropertyFunction interface, and then either registering that function or using the fake java: URI scheme to dynamically load the function.\nWriting SPARQL Property Functions\nSimilar to SPARQL Filter Functions, a SPARQL Property Function is an extension point of the SPARQL query language that allows a URI to name a function in the query processor. A key difference is that Property Functions may generate new bindings.\nJust like org.apache.jena.sparql.function.Function, there are various utility classes provided to simplify the creation of a Property Function. The selection of one depends on the \u0026lsquo;style\u0026rsquo; of the desired built-in. For example, PFuncSimple is expected to be the predicate of triple patterns ?such ex:as ?this, where neither argument is an rdf:list, and either may be a variable. Alternatively, PFuncAssignToObject assumes that the subject will be bound, while the object will be a variable.\nPropertyFunction | |--PropertyFunctionBase | |--PropertyFunctionEval | |--PFuncSimpleAndList | |--PFuncSimple | |--PFuncAssignToObject | |--PFuncAssignToSubject | |--PFuncListAndSimple | |--PFuncListAndList The choice of extension point determines the function signature that the developer will need to implement, and primarily determines whether some of the arguments will be org.apache.jena.graph.Nodes or org.apache.jena.sparql.pfunction.PropFuncArgs.
In the latter case, the programmer can determine whether the argument is a list as well as how many arguments it consists of.\nRegistration\nEvery property function is associated with a particular org.apache.jena.sparql.util.Context. This allows the availability of the function to be either global or limited to a particular dataset. For example, a custom Property Function may expose an index which only has meaning with respect to some set of data.\nAssuming you have an implementation of org.apache.jena.sparql.pfunction.PropertyFunctionFactory (shown later), you can register a function as follows:\nfinal PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ARQ.getContext()); reg.put(\u0026quot;urn:ex:fn#example\u0026quot;, new ExamplePropertyFunctionFactory()); PropertyFunctionRegistry.set(ARQ.getContext(), reg); The only difference between global and dataset-specific registration is where the Context object comes from:\nfinal Dataset ds = DatasetFactory.createGeneral(); final PropertyFunctionRegistry reg = PropertyFunctionRegistry.chooseRegistry(ds.getContext()); reg.put(\u0026quot;urn:ex:fn#example\u0026quot;, new ExamplePropertyFunctionFactory()); PropertyFunctionRegistry.set(ds.getContext(), reg); Note that org.apache.jena.sparql.pfunction.PropertyFunctionRegistry has other put methods that allow registration by passing a Class object, as well.\nImplementation\nThe implementation of a Property Function is actually quite straightforward once one is aware of the available tools.
For example, if we wished to create a Property Function that returns no results regardless of its arguments, we could do so as follows:\npublic class ExamplePropertyFunctionFactory implements PropertyFunctionFactory { @Override public PropertyFunction create(final String uri) {\treturn new PFuncSimple() { @Override public QueryIterator execEvaluated(final Binding parent, final Node subject, final Node predicate, final Node object, final ExecutionContext execCtx) {\treturn QueryIterNullIterator.create(execCtx); } }; } } Node and PropFuncArg objects allow the developer to reflect on the state of the arguments, and choose what bindings to generate given the intended usage of the Property Function. For example, if the function expects a list of three bound arguments for the object of the property, then it can throw an ExprEvalException (or derivative) to indicate incorrect use. It is the responsibility of the developer to identify what parts of the argument are bound, and to respond appropriately.\nFor example, if ?a ex:f ?b were a triple pattern in a query, it could be called with ?a bound, ?b bound, or neither. It may make sense to return new bindings that include ?b if passed a concrete value for ?a, or conversely to generate new bindings for ?a when passed a concrete ?b. If both ?a and ?b are bound, and the function wishes to confirm that the pairing is valid, it can return the existing binding.
If there are no valid solutions to return, then an empty iterator may be returned.\nThere are several extremely useful implementations of QueryIterator within the Jena library that make it easy to support typical use cases.\nOf particular note:\nQueryIterNullIterator - to indicate that there are no valid solutions/bindings for the given values QueryIterSingleton - to provide a single solution/binding for the given values QueryIterPlainWrapper - to provide multiple solutions/bindings for the given values The latter two cases require instances of Binding objects, which can be obtained through static methods of BindingFactory. Creation of Binding objects will also require references to Var and NodeFactory.\nNote that it can make a lot of sense to generate the Iterator\u0026lt;Binding\u0026gt; for QueryIterPlainWrapper by means of Jena\u0026rsquo;s ExtendedIterator. This can allow domain-specific values to be easily mapped to Binding objects in a lazy fashion.\nGraph Operations\nAdditional operations on the current, or another, Graph can be achieved through the Execution Context. Once retrieved, the Graph can be operated upon directly, queried or wrapped in a Model, if preferred.\n// Retrieve current Graph. Graph graph = execCxt.getActiveGraph(); // Wrap Graph in a Model. Model model = ModelFactory.createModelForGraph(graph); Access another graph:\n// Retrieve DatasetGraph of current Graph. DatasetGraph datasetGraph = execCxt.getDataset(); // Retrieve a different Graph in the Dataset.
Node otherGraphNode = NodeFactory.createURI(\u0026quot;\u0026quot;); Graph otherGraph = datasetGraph.getGraph(otherGraphNode); // Access the other graph ExtendedIterator\u0026lt;Triple\u0026gt; iter = otherGraph.find(...); ","permalink":"","tags":null,"title":"ARQ - Writing Property Functions"},{"categories":null,"contents":"Read the following first:\nFrequently Asked Questions Submitting a support request or bug reports The documentation Support for ARQ is provided via the Jena mailing list \u0026lt;\u0026gt;.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ – Support"},{"categories":null,"contents":"For details on downloading ARQ, please see the Jena downloads page.\nARQ documentation index\n","permalink":"","tags":null,"title":"ARQ Downloads"},{"categories":null,"contents":"ARQ - Property Paths A property path is a possible route through a graph between two graph nodes. A trivial case is a property path of length exactly one, which is a triple pattern.\nMost property paths are now legal SPARQL 1.1 syntax; some advanced property paths are syntactic extensions and are only available if the query is parsed with language Syntax.syntaxARQ.\nPath Language A property path expression (or just \u0026lsquo;path\u0026rsquo;) is similar to a string regular expression but over properties, not characters. ARQ determines all matches of a path expression and binds subject or object as appropriate. Only one match is recorded - no duplicates for any given path expression, although if the path is used in a situation where its initial point is already repeated in a pattern, then this duplication is preserved.\nPath example Meaning dc:title | rdfs:label Dublin Core title or an RDFS label. foaf:knows/foaf:name Name of people one \u0026ldquo;knows\u0026rdquo; step away. foaf:knows/foaf:knows/foaf:name Name of people two \u0026ldquo;knows\u0026rdquo; steps away.
In the description below, uri is either a URI or a prefixed name.\nSyntax Form Matches uri A URI or a prefixed name. A path of length one. ^elt Reverse path (object to subject) (elt) A group path elt, brackets control precedence. elt1 / elt2 A sequence path of elt1, followed by elt2 elt1 | elt2 An alternative path of elt1, or elt2 (both possibilities are tried) elt* A path of zero or more occurrences of elt. elt+ A path of one or more occurrences of elt. elt? A path of zero or one elt. !uri A path matching a property which isn\u0026rsquo;t uri (negated property set) !(uri1|\u0026hellip;|uriN) A path matching a property which isn\u0026rsquo;t any of uri1 ... uriN (negated property set) ARQ extensions: to use these you must use Syntax.syntaxARQ.\nSyntax Form Matches elt1 ^ elt2 Shorthand for elt1 / ^elt2, that is elt1 followed by reverse elt2. elt{n,m} A path between n and m occurrences of elt. elt{n} Exactly n occurrences of elt. A fixed length path. elt{n,} n or more occurrences of elt. elt{,n} Between 0 and n occurrences of elt. Precedence:\nURI, prefixed names Negated property set Groups Unary ^ reverse links Unary operators *, ?, + and {} forms Binary operators / and ^ Binary operator | Precedence is left-to-right within groups.\nPath Evaluation Paths are \u0026ldquo;simple\u0026rdquo; if they involve only operators / (sequence), ^ (reverse, unary or binary) and the form {n}, for some single integer n. Such paths are fixed length. They are translated to triple patterns by the query compiler and do not require special path-evaluation at runtime.\nA path is \u0026ldquo;complex\u0026rdquo; if it involves one or more of the operators *, ?, + and {}. Such paths require special evaluation and provide expressivity outside of strict SPARQL because paths can be of variable length.
When used with models backed by SQL databases, complex path expressions may take some time.\nA path of length zero connects a graph node to itself.\nCycles in paths are possible and are handled.\nPaths do not need to be anchored at one end or the other, although this can lead to large numbers of results because the whole graph is searched.\nProperty functions in paths are only available for simple paths.\nExtended Language This is a syntactic extension and is available if the query is parsed with language Syntax.syntaxARQ.\nPaths can be directly included in the query in the property position of a triple pattern:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; # Find the types of :x, following subClassOf SELECT * { :x rdf:type/rdfs:subClassOf* ?t } Examples Simple Paths Find the name of any people that Alice knows.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:name ?name . } Find the names of people 2 \u0026ldquo;foaf:knows\u0026rdquo; links away.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:knows/foaf:name ?name . } This is the same as the strict SPARQL query:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows [ foaf:knows [ foaf:name ?name ]]. } or, with explicit variables:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows ?a1 . ?a1 foaf:knows ?a2 . ?a2 foaf:name ?name . } Because someone Alice knows may well know Alice, the example above may include Alice herself. This could be avoided with:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows/foaf:knows ?y .
FILTER ( ?x != ?y ) ?y foaf:name ?name } These two are the same query: the second just reverses the property direction, which swaps the roles of subject and object.\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; } { \u0026lt;mailto:alice@example\u0026gt; ^foaf:mbox ?x } Mutual foaf:knows relationships: ?x knows someone who knows ?x\n{ ?x foaf:knows^foaf:knows ?x . } Negated property sets define matching by naming one or more properties that must not match. Match if there is a triple from ?x to ?y whose predicate is not rdf:type.\n{ ?x !rdf:type ?y . } { ?x !(rdf:type|^rdf:type) ?y . } Only properties and reverse properties are allowed in a negated property set, not a full path expression.\nComplex Paths Find the names of all the people who can be reached from Alice by foaf:knows:\n{ ?x foaf:mbox \u0026lt;mailto:alice@example\u0026gt; . ?x foaf:knows+/foaf:name ?name . } Again, because of cycles in foaf:knows relationships, it is likely to include Alice herself.\nSome forms of limited inference are possible as well. For example: all types and supertypes of a resource:\n{ \u0026lt;http://example/\u0026gt; rdf:type/rdfs:subClassOf* ?type } All resources and all their inferred types:\n{ ?x rdf:type/rdfs:subClassOf* ?type } Use with Legal SPARQL Syntax A path can be parsed, then installed as a property function to be referred to by URI. This way, when the URI is used in the predicate location in a triple pattern, the path expression is evaluated.\nPath path = ... String uri = ...
PathLib.install(uri, path) ; For example:\nPath path = PathParser.parse(\u0026quot;rdf:type/rdfs:subClassOf*\u0026quot;, PrefixMapping.Standard) ; String uri = \u0026quot;http://example/ns#myType\u0026quot; ; PathLib.install(uri, path) ; and the SPARQL query:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; # Find the types of :x, following subClassOf SELECT * { :x ns:myType ?t} This also works if an existing property is redefined (a URI in a path expression is not interpreted as a property function) so, for example, rdf:type can be redefined as a path that also considers RDFS subclass relationships. The path is a complex path so the property function for rdf:type is not triggered.\nPath path = PathParser.parse(\u0026quot;rdf:type/rdfs:subClassOf*\u0026quot;, PrefixMapping.Standard) ; PathLib.install(RDF.type.getURI(), path) ; ARQ documentation index\n","permalink":"","tags":null,"title":"ARQ Property Paths"},{"categories":null,"contents":"The name of the project is “Apache Jena”. That should appear as the first use in a paper and in a reference. After that \u0026ldquo;Jena\u0026rdquo; can be used. It is also a trademark of the Apache Software Foundation. This is also the industry practice.\nThe reference should indicate the website (https is preferable). If relevant to reproducibility, or discussing performance, the release version number MUST also be included. The date of access would also be helpful to the reader.\nYou can use names such as “TDB” and “Fuseki” on their own. They are informal names for parts of the whole system. They also change over time and versions. You could say “Apache Jena Fuseki” for the triplestore but, as the components function as part of the whole, “Apache Jena” would be accurate.\nThe first paper citing Jena is Jena: implementing the semantic web recommendations. That only covers the API and its implementation.
Some parts of the system mentioned in that paper were dropped a long time ago (e.g. the “RDB” system). The paper also predates the move to the Apache Software Foundation. It is also good to acknowledge Brian McBride, who started the project.\nHere is an example of what a citation may look like:\nApache Software Foundation, 2021. Apache Jena, Available at: ","permalink":"","tags":null,"title":"Citing Jena"},{"categories":null,"contents":"Apache Jena\u0026rsquo;s initialization uses Java\u0026rsquo;s ServiceLoader mechanism to locate initialization steps. The documentation for this process in Jena is available here.\nThere are a number of files (Java resources) in Jena jars named:\nMETA-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle Each has different contents, usually one or two lines.\nWhen making a combined jar (\u0026ldquo;uber-jar\u0026rdquo;, jar with dependencies) from Jena dependencies and application code, the contents of the Jena files must be combined and be present in the combined jar as a Java resource of the same name.\nMaven The Maven shade plugin can do this merging during a build, using a \u0026ldquo;transformer\u0026rdquo;.\nApache Jena itself uses this technique to make the combined jar for Fuseki, using the Maven shade plugin with a transformer.\nThis is an extract from the POM:\n\u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.apache.maven.plugins\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;maven-shade-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;configuration\u0026gt; ... \u0026lt;transformers\u0026gt; \u0026lt;transformer implementation=\u0026#34;org.apache.maven.plugins.shade.resource.ServicesResourceTransformer\u0026#34;/\u0026gt; ... other transformers ...
\u0026lt;/transformers\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/plugin\u0026gt; See jena-fuseki2/jena-fuseki-server/pom.xml for the complete shade plugin setup used by Fuseki.\nGradle For Gradle, the shadowJar plugin has the mergeServiceFiles operation.\nplugins { ... id \u0026#34;com.github.johnrengelman.shadow\u0026#34; version \u0026#34;7.1.2\u0026#34; } shadowJar { mergeServiceFiles() } ... Manual assembling If doing this manually, create a single file (META-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle) in your application jar containing the lines of all the services resource files. The order does not matter. Jena calls modules in the right order.\n","permalink":"","tags":null,"title":"Combining Apache Jena jars"},{"categories":null,"contents":"Jena includes various command-line utilities which can help you with a variety of tasks in developing Jena-based applications.\nIndex of tools schemagen using schemagen from maven Setting up your Environment An environment variable JENA_HOME is used by all the command line tools to configure the class path automatically for you. You can set this up as follows:\nOn Linux / Mac\nexport JENA_HOME=the directory you downloaded Jena to export PATH=$PATH:$JENA_HOME/bin On Windows\nSET JENA_HOME=the directory you downloaded Jena to SET PATH=%PATH%;%JENA_HOME%\\bat Running the Tools Once you\u0026rsquo;ve done the above you should now be able to run the tools from the command line like so:\nOn Linux / Mac\nsparql --version On Windows\nsparql.bat --version This command will simply print the versions of Jena and ARQ used in your distribution; all the tools support the --version option. To find out how to use a specific tool, add the --help flag instead.\nNote that many examples of using Jena tools typically use the Linux style invocation because most of the Jena developers work on Linux/Mac platforms.
When running on Windows, simply add .bat as an extension to the name of the command line tool to run it; on some versions of Windows this may not be required.\nCommon Issues with Running the Tools If you receive errors stating that a class is not found, then it is most likely that JENA_HOME is not set correctly. As a quick check you can try the following to see if it is set appropriately:\nOn Linux / Mac\ncd $JENA_HOME On Windows\ncd %JENA_HOME% If this command fails then JENA_HOME is not correctly set; please ensure you have set it correctly and try again.\nWindows users may experience problems if trying to run the tools when their JENA_HOME path contains spaces in it; there are two workarounds for this:\nMove your Jena installation to a path without spaces Grab the latest scripts from main where they have been fixed to safely handle this. Future releases will include this fix and resolve this issue. Command Line Tools Quick Reference riot and Related See Reading and Writing RDF in Apache Jena for more information.\nriot: parse RDF data, guessing the syntax from the file extension. Assumes that standard input is N-Quads/N-Triples unless you tell it otherwise with the --syntax parameter.
riot can also do RDFS inferencing, count triples, convert serializations, validate syntax, concatenate datasets, and more.\nturtle, ntriples, nquads, trig, rdfxml: specialized versions of riot that assume that the input is in the named serialization.\nrdfparse: parse an RDF/XML document, for which you can usually just use riot, but this can also pull triples out of rdf:RDF elements embedded at arbitrary places in an XML document if you need to deal with those.\nSPARQL Queries on Local Files and Endpoints See ARQ - Command Line Applications for more about these.\narq and sparql: run a query in a file named as a command line parameter on a dataset in one or more files named as command line parameters.\nqparse: parse a query, report on any problems, and output a pretty-printed version of the query.\nuparse: do the same thing as qparse but for update requests.\nrsparql: send a local query to a SPARQL endpoint specified with a URL, giving you the same choice of output formats that arq does.\nrupdate: send a local update query to a SPARQL endpoint specified with a URL, assuming that it is accepting updates from you.\nQuerying and Manipulating Fuseki Datasets The following utilities let you work with data stored using a local Fuseki triplestore. They can be useful for automating queries and updates of data stored there.
Each requires an assembler file pointing at a dataset as a parameter; Fuseki creates these for you.\nFor each pair of utilities shown, the first is used with data stored using the TDB format and the second with data stored using the newer and more efficient TDB2 format.\nThe TDB and TDB2 - Command Line Tools pages describe these further.\ntdbquery, tdb2.tdbquery: query a dataset that has been stored with Fuseki.\ntdbdump, tdb2.tdbdump: dump the contents of a Fuseki dataset to standard out.\ntdbupdate, tdb2.tdbupdate: run an update request against a Fuseki dataset.\ntdbloader, tdb2.tdbloader: load data from a file into a Fuseki dataset.\ntdbstats, tdb2.tdbstats: output a short report of information about a Fuseki dataset.\ntdbbackup, tdb2.tdbbackup: create a gzipped copy of the Fuseki dataset\u0026rsquo;s triples.\nnot implemented for TDB1, tdb2.tdbcompact: reduce the size of the Fuseki dataset.\nOther Handy Command Line Tools shacl: validate a dataset against a set of shapes and constraints described in a file that conforms to the W3C SHACL standard. Jena\u0026rsquo;s SHACL page has more on this utility.\nshex: validate data using ShEx from the W3C Shape Expressions Community Group. Jena\u0026rsquo;s ShEx page has more on this utility.\nrdfdiff: compare the triples in two datasets, regardless of their serializations, and list which are different between the two datasets. (Modeled on the UNIX diff utility.)\niri: Parse an IRI and tell you about it, with errors and warnings. Good for checking for issues like proper escaping.\n","permalink":"","tags":null,"title":"Command-line and other tools for Jena developers"},{"categories":null,"contents":" All datasets provide transactions. This is the preferred way to handle concurrent access to data. Applications need to be aware of the concurrency issues in accessing Jena models. API operations are not thread safe by default. 
Thread safety would simply ensure that the model data-structures remained intact but would not give an application consistent access to the RDF graph. It would also limit the throughput of multi-threaded applications on multiprocessor machines where true concurrency can lead to a reduction in response time.\nFor example, suppose an application wishes to read the name and age of a person from a model. This takes two API calls. It is more convenient to be able to read that information in a consistent fashion, knowing that the access to the second piece of information is not being done after some model change has occurred.\nSpecial care is needed with iterators. In general, Jena\u0026rsquo;s iterators do not take a copy to enable safe use in the presence of concurrent update. A multi-threaded application needs to be aware of these issues and correctly use the mechanisms that Jena provides (or manage its own concurrency itself). While not zero, the application burden is not high.\nThere are two main cases:\nMultiple threads in the same JVM. Multiple applications accessing the same persistent model (typically, a database). This note describes the support for same-JVM, multi-threaded applications using in-memory Jena Models.\nLocks Locks provide critical section support for managing the interactions of multiple threads in the same JVM. Jena provides multiple-reader/single-writer concurrency support (MRSW).\nThe general pattern is:\nModel model = . . . ; model.enterCriticalSection(Lock.READ) ; // or Lock.WRITE try { ... perform actions on the model ... ... 
obey contract - no update operations if a read lock } finally { model.leaveCriticalSection() ; } Applications are expected to obey the lock contract, that is, they must not do update operations if they have a read lock as there can be other application threads reading the model concurrently.\nIterators Care must be taken with iterators: unless otherwise stated, all iterators must be assumed to be iterating over the data-structures in the model or graph implementation itself. It is not possible to safely pass these out of critical sections.\n","permalink":"","tags":null,"title":"Concurrent access to Models"},{"categories":null,"contents":"As noted in the overview Jena JDBC drivers are built around a core library which implements much of the common functionality required in an abstract way. This means that it is relatively easy to build a custom driver just by relying on the core library and implementing a minimum of one class.\nCustom Driver class The one and only thing that you are required to do to create a custom driver is to implement a class that extends JenaDriver. This requires you to implement a constructor which simply needs to call the parent constructor with the relevant inputs, one of these is your driver specific connection URL prefix i.e. the foo in jdbc:jena:foo:. Implementation specific prefixes must conform to the regular expression [A-Za-z\\d\\-_]+: i.e. some combination of alphanumerics, hyphens and underscores terminated by a colon.\nAdditionally you must override and implement two abstract methods connect() and getPropertyInfo(). 
The former is used to produce an instance of a JenaConnection, while the latter provides information that may be used by tools to present users with some form of user interface for configuring a connection to your driver.\nAn important thing to note is that this may be all you need to do to create a custom driver; it is perfectly acceptable for your connect() implementation to just return one of the implementations from the built-in drivers. This may be useful if you are writing a driver for a specific store and wish to provide simplified connection URL parameters and create the appropriate connection instance programmatically.\nCustom Connection class The next stage in creating a custom driver (where necessary) is to create a class derived from JenaConnection. This has a somewhat broader set of abstract methods which you will need to implement, such as createStatementInternal(), and various methods which you may optionally override if you need to deviate from the default behaviors.\nIf you wish to go down this route then we recommend looking at the source for the built-in implementations to guide you in this. It may be easier to extend one of the built-in implementations rather than writing an entire custom implementation yourself.\nNote that custom implementations may also require you to implement custom JenaStatement and JenaPreparedStatement implementations.\nTesting your Driver To aid testing your custom driver the jena-jdbc-core module provides a number of abstract test classes which can be derived from in order to provide a wide variety of tests for your driver implementation. This is how all the built-in drivers are tested so you can check out their test sources for examples of this.\n","permalink":"","tags":null,"title":"Creating a Custom Jena JDBC Driver"},{"categories":null,"contents":"Introduction Jena is a moderately complicated system, with several different kinds of Model and ways of constructing them. 
This note describes the Jena ModelFactory, a one-stop shop for creating Jena models. ModelFactory lives in Java package org.apache.jena.rdf.model.\nThis note is an introduction, not an exhaustive description. As usual consult the Javadoc for details of the methods and classes to use.\nSimple model creation The simplest way to create a model (if not the shortest) is to call ModelFactory.createDefaultModel(). This [by default] delivers a plain RDF model, stored in-memory, that does no inference and has no special ontology interface.\nDatabase model creation For methods of creating models for TDB please see the relevant reference sections.\nInference model creation An important feature of Jena is support for different kinds of inference over RDF-based models (used for RDFS and OWL). Inference models are constructed by applying reasoners to base models and optionally schema. The statements deduced by the reasoner from the base model then appear in the inferred model alongside the statements from the base model itself. RDFS reasoning is directly available:\ncreateRDFSModel(Model base) creates an inference model over the base model using the built-in RDFS inference rules and any RDFS statements in the base model.\ncreateRDFSModel(Model schema, Model base) creates an RDFS inference model from the base model and the supplied schema model. The advantage of supplying the schema separately is that the reasoner may be able to compute useful information in advance on the assumption that the schema won\u0026rsquo;t change, or at least not change as often as the base model.\nIt\u0026rsquo;s possible to use other reasoning systems than RDFS. 
For these a Reasoner is required:\ncreateInfModel(Reasoner reasoner, Model base) creates an inference model using the rules of reasoner over the model base.\ncreateInfModel(Reasoner reasoner, Model schema, Model base) Just as for the RDFS case, the schema may be supplied separately to allow the reasoner to digest it before working on the model.\nFrom where do you fetch your reasoners? From the reasoner registry, the class ReasonerRegistry. This allows reasoners to be looked up by name, but also provides some predefined access methods for well-known reasoners:\ngetOWLReasoner(): the reasoner used for OWL inference\ngetRDFSReasoner(): the reasoner used for RDFS inference\ngetTransitiveReasoner(): a reasoner for doing subclass and sub-property closure.\nOntology model creation An ontology model is one that presents RDF as an ontology - classes, individuals, different kinds of properties, and so forth. Jena supports RDFS and OWL ontologies through profiles. There is extensive documentation on Jena\u0026rsquo;s ontology support, so all we\u0026rsquo;ll do here is summarise the creation methods.\ncreateOntologyModel() Creates an ontology model which is in-memory and presents OWL ontologies.\ncreateOntologyModel(OntModelSpec spec, Model base) Creates an ontology model according to the OntModelSpec spec which presents the ontology of base.\ncreateOntologyModel(OntModelSpec spec, ModelMaker maker, Model base) Creates an OWL ontology model according to the spec over the base model. If the ontology model needs to construct additional models (for OWL imports), use the ModelMaker to create them. [The previous method will construct a MemModelMaker for this.]\nWhere do OntModelSpecs come from? 
There\u0026rsquo;s a cluster of constants in the class which provide for common uses; to name but three:\nOntModelSpec.OWL_MEM_RDFS_INF OWL ontologies, model stored in memory, using RDFS entailment only\nOntModelSpec.RDFS_MEM RDFS ontologies, in memory, but doing no additional inferences\nOntModelSpec.OWL_DL_MEM_RULE_INF OWL ontologies, in memory, with the full OWL Lite inference\nCreating models from Assembler descriptions A model can be built from a description of the required model. This is documented in the assembler howto. Access to the assembler system for model creation is provided by three ModelFactory methods:\nassembleModelFrom( Model singleRoot ): assemble a Model from the single Model description in singleRoot. If there is no such description, or more than one, an exception is thrown. If a description has to be selected from more than one available candidates, consider using the methods below.\nfindAssemblerRoots( Model m ): answer a Set of all the Resources in m which are of type ja:Model, ie descriptions of models to assemble. (Note that this will include sub-descriptions of embedded models if they are present.)\nassembleModelFrom( Resource root ): answer a Model assembled according to the description hanging from root. Assemblers can construct other things as well as models, and the Assembler system is user-extensible: see the howto for details.\nFile-based models The method ModelFactory.createFileModelMaker(String) returns a ModelMaker which attaches models to filing-system files. The String argument is the fileBase. When a file-ModelMaker opens a file, it reads it from a file in the directory named by the fileBase; when the model is closed (and only then, in the current implementation), the contents of the model are written back to the file.\nBecause the names of models in a modelMaker can be arbitrary character strings, in particular URIs, they are translated slightly to avoid confusion with significant characters of common filing systems. 
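As a sketch of such an escaping scheme, a reserved character can be replaced by an escape sequence; the exact substitutions Jena documents are listed next. This is a hypothetical helper for illustration, not Jena's own implementation:

```java
// Sketch of the model-name-to-filename escaping used by file ModelMakers:
// characters significant to filing systems are replaced by escape sequences
// (':' -> "_C", '/' -> "_S", '_' -> "_U", per the scheme Jena documents).
// Hypothetical helper illustrating the idea; not Jena's own code.
public class NameToFile {
    public static String encode(String modelName) {
        StringBuilder sb = new StringBuilder();
        for (char c : modelName.toCharArray()) {
            switch (c) {
                case ':' -> sb.append("_C");
                case '/' -> sb.append("_S");
                case '_' -> sb.append("_U");
                default  -> sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A URI-like model name becomes a flat, filesystem-safe string.
        System.out.println(encode("http://example/m_1"));
        // -> http_C_S_Sexample_Sm_U1
    }
}
```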
In the current implementation,\ncolon : is converted to \\_C slash / is converted to \\_S underbar _ is converted to \\_U ModelMakers Plain models can be given names which allows them to be \u0026ldquo;saved\u0026rdquo; and looked up by name later. This is handled by implementations of the interface ModelMaker; each ModelMaker produces Models of the same kind. The simplest kind of ModelMaker is a memory model maker, which you get by calling ModelFactory.createMemModelMaker(). The methods you\u0026rsquo;d want to use to start with on a ModelMaker are:\ncreateModel(String): create a model with the given name in the ModelMaker. If a model with that name already exists, then that model is used instead.\nopenModel(String): open an existing model with the given name. If no such model exists, create a new empty one and give it that name. [createModel(String) and openModel(String) behave in the same way, but each has a two-argument form for which the behaviour is different. Use whichever one best fits your intention.]\ncreateModel(): create a fresh anonymous model.\ngetModel(): each ModelMaker has a default model; this method returns that model.\nThere are other methods, for removing models, additional control over create vs open, closing the maker, and looking names up; for those consult the ModelMaker JavaDoc.\nMiscellany Finally, ModelFactory contains a collection of methods for some special cases not conveniently dealt with elsewhere.\ncreateModelForGraph(Graph g) is used when an advanced user with access to the Jena SPI has constructed or obtained a Graph and wishes to present it as a model. This method wraps the graph up as a plain model. Alterations to the graph are visible in the model, and vice versa.\n","permalink":"","tags":null,"title":"Creating Jena models"},{"categories":null,"contents":" This page covers the jena-csv module which has been retired. The last release of Jena with this module is Jena 3.9.0. 
See jena-csv/ for the original documentation.\n","permalink":"","tags":null,"title":"CSV PropertyTable"},{"categories":null,"contents":" This page covers the jena-csv module which has been retired. The last release of Jena with this module is Jena 3.9.0. See jena-csv/ This is the original documentation.\nThis module is about getting CSVs into a form that is amenable to Jena SPARQL processing, and doing so in a way that is not specific to CSV files. It includes getting the right architecture in place for regular table shaped data, using the core abstraction of PropertyTable.\nIllustration\nThis module involves the basic mapping of CSV to RDF using a fixed algorithm, including interpreting data as numbers or strings.\nSuppose we have a CSV file located in “file:///c:/town.csv”, which has one header row, two data rows:\nTown,Population Southton,123000 Northville,654000 As RDF this might be viewable as:\n@prefix : \u0026lt;file:///c:/town.csv#\u0026gt; . @prefix csv: \u0026lt;http://w3c/future-csv-vocab/\u0026gt; . [ csv:row 1 ; :Town \u0026quot;Southton\u0026quot; ; :Population “123000”^^ ] . [ csv:row 2 ; :Town \u0026quot;Northville\u0026quot; ; :Population “654000”^^ ] . or without the bnode abbreviation:\n@prefix : \u0026lt;file:///c:/town.csv#\u0026gt; . @prefix csv: \u0026lt;http://w3c/future-csv-vocab/\u0026gt; . _:b0 csv:row 1 ; :Town \u0026quot;Southton\u0026quot; ; :Population “123000”^^ . _:b1 csv:row 2 ; :Town \u0026quot;Northville\u0026quot; ; :Population “654000”^^ Each row is modeling one \u0026ldquo;entity\u0026rdquo; (here, a population observation). There is a subject (a blank node) and one predicate-value for each cell of the row. Row numbers are added because they can be important. Now the CSV file is viewed as a graph - normal, unmodified SPARQL can be used. 
Multiple CSV files can be multiple graphs in one dataset to give query across different data sources.\nWe can use the following SPARQL query for “Towns over 500,000 people” mentioned in the CSV file:\nSELECT ?townName ?pop { GRAPH \u0026lt;file:///c:/town.csv\u0026gt; { ?x :Town ?townName ; :Population ?pop . FILTER(?pop \u0026gt; 500000) } } What\u0026rsquo;s more, we make some room for future extension through PropertyTable. The architecture is designed to be able to accommodate any table-like data sources, such as relational databases, Microsoft Excel, etc.\nDocumentation Get Started Design Implementation ","permalink":"","tags":null,"title":"CSV PropertyTable"},{"categories":null,"contents":"Architecture The architecture of CSV PropertyTable mainly involves 2 components:\nPropertyTable GraphPropertyTable PropertyTable A PropertyTable is a collection of data that is sufficiently regular in shape that it can be treated as a table. That means each subject has a value for each one of the set of properties. Irregularity in terms of missing values needs to be handled but not multiple values for the same property. With special storage, a PropertyTable\nis more compact and more amenable to custom storage (e.g. a JSON document store) can have custom indexes on specific columns can guarantee access orders More explicitly, PropertyTable is designed to be a table of RDF terms, or Nodes in Jena. Each Column of the PropertyTable has a unique columnKey Node of the predicate (or p for short). Each Row of the PropertyTable has a unique rowKey Node of the subject (or s for short). 
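The row/column keying can be pictured as a toy data structure: each row, keyed by its subject, holds at most one value per predicate-keyed column. This is an illustrative sketch only; Jena's real PropertyTable interface is richer and works over Nodes, not strings:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch of the PropertyTable idea: every row (keyed by subject) has
// at most one value per column (keyed by predicate). Illustrative only;
// not Jena's actual PropertyTable API.
public class ToyPropertyTable {
    // rowKey (subject) -> (columnKey (predicate) -> value)
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    public void setValue(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>()).put(columnKey, value);
    }

    // Returns null when the row is absent or the cell is missing,
    // reflecting the "irregularity as missing values" point above.
    public String getValue(String rowKey, String columnKey) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnKey);
    }

    public static void main(String[] args) {
        ToyPropertyTable t = new ToyPropertyTable();
        t.setValue("_:b0", ":Town", "Southton");
        t.setValue("_:b0", ":Population", "123000");
        System.out.println(t.getValue("_:b0", ":Town")); // Southton
    }
}
```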
You can use getColumn() to get the Column by its columnKey Node of the predicate, and getRow() to get a Row by its rowKey.\nA PropertyTable should be constructed in this workflow (in order):\nCreate Columns using PropertyTable.createColumn() for each Column of the PropertyTable Create Rows using PropertyTable.createRow() for each Row of the PropertyTable For each Row created, set a value (Node) at the specified Column, by calling Row.setValue() Once a PropertyTable is built, tabular data within can be accessed by the API of PropertyTable.getMatchingRows(), PropertyTable.getColumnValues(), etc.\nGraphPropertyTable GraphPropertyTable implements the Graph interface (read-only) over a PropertyTable. It is a subclass of GraphBase and implements find(). The graphBaseFind() (for matching a Triple) and propertyTableBaseFind() (for matching a whole Row) methods can choose the access route based on the find arguments. GraphPropertyTable holds/wraps a reference of the PropertyTable instance, so that such a Graph can be treated in a more table-like fashion.\nNote: Both PropertyTable and GraphPropertyTable are NOT restricted to CSV data. They are supposed to be compatible with any table-like data sources, such as relational databases, Microsoft Excel, etc.\nGraphCSV GraphCSV is a subclass of GraphPropertyTable aiming at CSV data. Its constructor takes a CSV file path as the parameter, parses the file using a CSV Parser, and makes a PropertyTable through PropertyTableBuilder.\nFor CSV to RDF mapping, we establish some basic principles:\nSingle-Value and Regular-Shaped CSV Only In the CSV-WG, it looks like duplicate column names are not going to be supported. Therefore, we just consider parsing single-valued CSV tables. There is the current editor working draft from the CSV on the Web Working Group, which is defining a more regular form of data out of CSV. 
This is the target for the CSV work of GraphCSV: tabular regular-shaped CSV; not arbitrary, irregularly shaped CSV.\nNo Additional CSV Metadata A CSV file with no additional metadata is directly mapped to RDF, which makes a simpler case compared to SQL-to-RDF work. It\u0026rsquo;s not necessary to have a defined primary column, similar to the primary key of a database. The subject of the triple can be generated through one of:\nThe triples for each row have a blank node for the subject, e.g. something like the illustration The triples for row N have a subject URI which is \u0026lt;FILE#_N\u0026gt;. Data Type for Typed Literal All the values in CSV are parsed as strings line by line. As a better option for the user to turn on, a dynamic choice which is a posh way of saying attempt to parse it as an integer (or decimal, double, date) and if it passes, it\u0026rsquo;s an integer (or decimal, double, date). Note that for the current release, all of the numbers are parsed as double, and date is not supported yet.\nFile Path as Namespace RDF requires that the subjects and the predicates are URIs. We need to pass in the namespaces (or just the default namespaces) to make URIs by combining the namespaces with the values in CSV. We don’t have metadata of the namespaces for the columns, but subjects can be blank nodes, which is useful because each row is then a new blank node. For predicates, suppose the URL of the CSV file is file:///c:/town.csv, then the columns can be \u0026lt;file:///c:/town.csv#Town\u0026gt; and \u0026lt;file:///c:/town.csv#Population\u0026gt;, as is shown in the illustration.\nFirst Line of Table Header Needed as Predicates The first line of the CSV file must be the table header. The columns of the first line are parsed as the predicates of the RDF triples. The RDF triple data are parsed starting from the second line.\nUTF-8 Encoded Only The CSV files must be UTF-8 encoded. 
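Putting the principles above together, the fixed row-to-triples mapping can be sketched as follows. This is an illustrative toy with a hypothetical toTriples helper; it ignores quoting, typed literals, and escaping, all of which GraphCSV's real parser handles:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fixed CSV-to-RDF mapping: one blank node per data row,
// a csv:row number, and one predicate per column formed from the file
// URL plus "#" plus the header name. Illustrative only; real GraphCSV
// handles quoting, typing, and escaping.
public class CsvMappingSketch {
    public static List<String> toTriples(String fileUrl, String[] header,
                                         String[] row, int rowNumber) {
        List<String> triples = new ArrayList<>();
        String subject = "_:b" + rowNumber;   // fresh blank node per row
        triples.add(subject + " <http://w3c/future-csv-vocab/row> " + rowNumber + " .");
        for (int i = 0; i < header.length; i++) {
            // Predicate is the file URL plus "#" plus the column header.
            triples.add(subject + " <" + fileUrl + "#" + header[i] + "> \"" + row[i] + "\" .");
        }
        return triples;
    }

    public static void main(String[] args) {
        String[] header = { "Town", "Population" };
        String[] row = { "Southton", "123000" };
        toTriples("file:///c:/town.csv", header, row, 1).forEach(System.out::println);
    }
}
```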
If your CSV files are using Western European encodings, please change the encoding before using CSV PropertyTable.\n","permalink":"","tags":null,"title":"CSV PropertyTable - Design"},{"categories":null,"contents":"Using CSV PropertyTable with Apache Maven See \u0026ldquo;Using Jena with Apache Maven\u0026rdquo; for full details.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-csv\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Using CSV PropertyTable from Java through the API In order to switch on CSV PropertyTable, it\u0026rsquo;s required to register LangCSV into Jena RIOT, through a simple method call:\nimport org.apache.jena.propertytable.lang.CSV2RDF; ... CSV2RDF.init() ; It\u0026rsquo;s a static method call of registration, which needs to be run just one time for an application before using CSV PropertyTable (e.g. during the initialization phase).\nOnce registered, CSV PropertyTable provides 2 ways for the users to play with (i.e. GraphCSV and RIOT):\nGraphCSV GraphCSV wraps a CSV file as a Graph, which makes a Model for SPARQL query:\nModel model = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data.csv\u0026quot;)) ; QueryExecution qExec = QueryExecutionFactory.create(query, model) ; or for multiple CSV files and/or other RDF data:\nModel csv1 = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data1.csv\u0026quot;)) ; Model csv2 = ModelFactory.createModelForGraph(new GraphCSV(\u0026quot;data2.csv\u0026quot;)) ; Model other = ModelFactory.createModelForGraph(otherGraph) ; Dataset dataset = ... ; dataset.addNamedModel(\u0026quot;http://example/table1\u0026quot;, csv1) ; dataset.addNamedModel(\u0026quot;http://example/table2\u0026quot;, csv2) ; dataset.addNamedModel(\u0026quot;http://example/other\u0026quot;, other) ; ... normal SPARQL execution ... 
You can also find the full examples from GraphCSVTest.\nIn short, for Jena ARQ, a CSV table is actually a Graph (i.e. GraphCSV), without any differences from other types of Graphs when using it from the Jena ARQ API.\nRIOT When LangCSV is registered into RIOT, CSV PropertyTable adds a new RDF syntax of \u0026lsquo;.csv\u0026rsquo; with the content type of \u0026ldquo;text/csv\u0026rdquo;. You can read \u0026ldquo;.csv\u0026rdquo; files into Model following the standard RIOT usages:\n// Usage 1: Direct reading through Model Model model_1 = ModelFactory.createDefaultModel().read(\u0026quot;test.csv\u0026quot;) ; // Usage 2: Reading using RDFDataMgr Model model_2 = RDFDataMgr.loadModel(\u0026quot;test.csv\u0026quot;) ; For more information, see Reading RDF in Apache Jena.\nNote that the requirements for the CSV files are listed in the documentation of Design. CSV PropertyTable only supports single-Value, regular-Shaped, table-headed and UTF-8-encoded CSV files (NOT Microsoft Excel files).\nCommand Line Tool csv2rdf is a tool for directly transforming CSV into the formatted RDF syntax of N-Triples. The script calls the csv2rdf java program in the riotcmdx package in this way:\njava -cp ... riotcmdx.csv2rdf inputFile ... It transforms the CSV inputFile into N-Triples. For example,\njava -cp ... riotcmdx.csv2rdf src/test/resources/test.csv The script reuses the common framework for running RIOT parsers, so that it also accepts the same arguments (type \u0026quot;riot --help\u0026quot; to get command line reminders) from RIOT Command line tools:\n--validate: Checking mode: same as --strict --sink --check=true --check=true/false: Run with checking of literals and IRIs either on or off. --sink: No output of triples or quads in the standard output (i.e. System.out). --time: Output timing information. ","permalink":"","tags":null,"title":"CSV PropertyTable - Get Started"},{"categories":null,"contents":"PropertyTable Implementations There are 2 implementations for PropertyTable. 
The pros and cons are summarised in the following table:\nPropertyTable Implementation Description Supported Indexes Advantages Disadvantages PropertyTableArrayImpl implemented by a two-dimensional Java array of Nodes SPO, PSO compact memory usage, fast for querying with S and P, fast for querying a whole Row slow for querying with O, table Row/Column size provided PropertyTableHashMapImpl implemented by several Java HashMaps PSO, POS fast for querying with O, table Row/Column size not required more memory usage for HashMaps By default, PropertyTableArrayImpl is used as the PropertyTable implementation held by GraphCSV. If you want to switch to PropertyTableHashMapImpl, just use the static method of GraphCSV.createHashMapImpl() to replace the default new GraphCSV() way. Here is an example:\nModel model_csv_array_impl = ModelFactory.createModelForGraph(new GraphCSV(file)); // PropertyTableArrayImpl Model model_csv_hashmap_impl = ModelFactory.createModelForGraph(GraphCSV.createHashMapImpl(file)); // PropertyTableHashMapImpl StageGenerator Optimization for GraphPropertyTable Accessing from SPARQL via Graph.find() will work, but it\u0026rsquo;s not ideal. Some optimizations can be done for processing a SPARQL basic graph pattern. More explicitly, in the method of OpExecutor.execute(OpBGP, ...), when the target for the query is a GraphPropertyTable, it can get a whole Row, or Rows, of the table data and match the pattern with the bindings.\nThe optimization of querying a whole Row in the PropertyTable is now supported. The following query pattern can be transformed into a Row querying, without generating triples:\n?x :prop1 ?v . ?x :prop2 ?w . ... It\u0026rsquo;s made by using the extension point of StageGenerator, because it\u0026rsquo;s now just concerned with BasicPattern. The detailed workflow goes in this way:\nSplit the incoming BasicPattern by subjects (i.e. it becomes multiple sub BasicPatterns grouped by the same subjects). 
(see QueryIterPropertyTable) For each sub BasicPattern, if the Triple size within is greater than 1 (i.e. at least 2 Triples), it\u0026rsquo;s turned into a Row querying, and processed by QueryIterPropertyTableRow; else if it contains only 1 Triple, it goes for the traditional Triple querying by graph.graphBaseFind(). In order to turn on this optimization, we need to register the StageGeneratorPropertyTable into the ARQ context, before performing SPARQL querying:\nStageGenerator orig = (StageGenerator)ARQ.getContext().get(ARQ.stageGenerator) ; StageGenerator stageGenerator = new StageGeneratorPropertyTable(orig) ; StageBuilder.setGenerator(ARQ.getContext(), stageGenerator) ; ","permalink":"","tags":null,"title":"CSV PropertyTable - Implementation"},{"categories":null,"contents":"Fuseki can provide access control at the level of the server, on datasets, on endpoints and also on specific graphs within a dataset. It also provides native https to protect data in-flight.\nFuseki Main provides some common patterns of authentication and also Graph level Data Access Control to provide control over the visibility of graphs within a dataset, including the union graph of a dataset and the default graph. Currently, Graph level access control only applies to read-only datasets.\nFuseki Full (Fuseki with the UI) can be used when run in a web application server such as Tomcat to provide authentication of the user. See \u0026ldquo;Fuseki Security\u0026rdquo; for configuring security over the whole of the Fuseki UI.\nThis page applies to Fuseki Main.\nHTTPS HTTPS support is configured from the fuseki server command line.\nServer Argument \u0026ndash;https=SETUP Name of file for certificate details. 
\u0026ndash;httpsPort=PORT The port for https Default: 3043 The --https argument names a file in JSON which includes the name of the certificate file and password for the certificate.\nHTTPS certificate details file The file is a simple JSON file:\n{ \"cert\": KEYSTORE , \"passwd\": SECRET } This file must be protected by file access settings so that it can only be read by the userid running the server. One way is to put the keystore certificate and the certificate details file in the same directory, then make the directory secure.\nSelf-signed certificates A self-signed certificate provides an encrypted link to the server and stops some attacks. What it does not do is guarantee the identity of the host name of the Fuseki server to the client system. A signed certificate provides that through the chain of trust. A self-signed certificate does protect data in HTTP responses.\nA self-signed certificate can be generated with:\nkeytool -keystore keystore -alias jetty -genkey -keyalg RSA For information on creating a certificate, see the Jetty documentation for generating certificates.\nAuthentication Authentication is establishing the identity of the principal (user or program) accessing the system. Fuseki Main provides users/password setup and HTTP authentication (digest or basic).\nThese should be used with HTTPS.\nServer Argument \u0026ndash;passwd=FILE Password file \u0026ndash;auth= \u0026ldquo;basic\u0026rdquo; or \u0026ldquo;digest\u0026rdquo; Default is \u0026ldquo;digest\u0026rdquo; These can also be given in the server configuration file:\n\u0026lt;#server\u0026gt; rdf:type fuseki:Server ; fuseki:passwd \"password_file\" ; fuseki:auth \"digest\" ; ... 
The format of the password file is:\nusername: password and passwords can be stored in hash or obfuscated form.\nDocumentation of the Eclipse Jetty Password file format.\nIf different authentication is required, the full facilities of Eclipse Jetty configuration are available - see the section below.\nUsing curl See the curl documentation for full details. This section is a brief summary of some relevant options:\ncurl argument Value \u0026ndash; -n, --netrc Take passwords from .netrc (_netrc on windows) --user= user:password Set the user and password (visible to all on the local machine) --anyauth Use server nominated authentication scheme --basic Use HTTP basic auth --digest Use HTTP digest auth -k, --insecure Don\u0026rsquo;t check HTTPS certificate. This allows for self-signed or expired certificates, or ones with the wrong host name. Using wget See the wget documentation for full details. This section is a brief summary of some relevant options:\nwget argument Value \u0026ndash; --http-user user name Set the user. --http-password password Set the password (visible to all on the local machine) wget uses users/password from .wgetrc or .netrc by default. --no-check-certificate Don\u0026rsquo;t check HTTPS certificate. This allows for self-signed or expired certificates, or ones with the wrong host name. Access Control Lists ACLs can be applied to the server as a whole, to a dataset, to endpoints, and to graphs within a dataset. This section covers server, dataset and endpoint access control lists. Graph-level access control is covered below.\nAccess control lists (ACLs) are given as part of the server configuration file.\nfuseki --conf configFile.ttl ACLs are provided by the fuseki:allowedUsers property.\nFormat of fuseki:allowedUsers The list of users allowed access can be an RDF list or repeated use of the property or a mixture. 
The different settings are combined into one ACL.\nfuseki:allowedUsers \u0026quot;user1\u0026quot;, \u0026quot;user2\u0026quot;, \u0026quot;user3\u0026quot;; fuseki:allowedUsers \u0026quot;user3\u0026quot;; fuseki:allowedUsers ( \u0026quot;user1\u0026quot; \u0026quot;user2\u0026quot; \u0026quot;user3\u0026quot;) ; There is a special user name \u0026ldquo;*\u0026rdquo; which means \u0026ldquo;any authenticated user\u0026rdquo;.\nfuseki:allowedUsers \u0026quot;*\u0026quot; ; Server Level ACLs \u0026lt;#server\u0026gt; rdf:type fuseki:Server ; fuseki:allowedUsers \"user1\", \"user2\", \"user3\"; ... fuseki:services ( ... ) ; ... . A useful pattern is:\n\u0026lt;#server\u0026gt; rdf:type fuseki:Server ; fuseki:allowedUsers \"*\"; ... fuseki:services ( ... ) ; ... . which requires all access to be authenticated, and the allowed users are those in the password file.\nDataset Level ACLs When there is an access control list on the fuseki:Service, it applies to all requests to the endpoints of the dataset.\nAny server-wide \u0026ldquo;allowedUsers\u0026rdquo; configuration also applies, and both levels must allow the user access.\n\u0026lt;#service_auth\u0026gt; rdf:type fuseki:Service ; rdfs:label \"ACL controlled dataset\" ; fuseki:name \"db-acl\" ; fuseki:allowedUsers \"user1\", \"user3\"; ## Choice of operations. fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \"sparql\" ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \"sparql\" ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \"get\" ] ; fuseki:dataset \u0026lt;#base_dataset\u0026gt;; . Endpoint Level ACLs An access control list can be applied to an individual endpoint. 
Again, any other \u0026ldquo;allowedUsers\u0026rdquo; configuration (service-wide or server-wide) also applies.\nfuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ; fuseki:allowedUsers \u0026quot;user1\u0026quot;, \u0026quot;user2\u0026quot; ; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ; fuseki:allowedUsers \u0026quot;user1\u0026quot; ] ; Only user1 can use SPARQL update; both user1 and user2 can use SPARQL query.\nGraph Access Control Lists Graph level access control is defined using a specific dataset implementation for the service.\n\u0026lt;#access_dataset\u0026gt; rdf:type access:AccessControlledDataset ; access:registry ... ; access:dataset ... ; . Graph ACLs are defined in a Graph Security Registry which lists the users and graph URIs.\n\u0026lt;#service_tdb2\u0026gt; rdf:type fuseki:Service ; rdfs:label \"Graph-level access controlled dataset\" ; fuseki:name \"db-graph-acl\" ; ## Read-only operations on the dataset URL. fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ] ; fuseki:dataset \u0026lt;#access_dataset\u0026gt; ; . # Define access on the dataset. \u0026lt;#access_dataset\u0026gt; rdf:type access:AccessControlledDataset ; access:registry \u0026lt;#securityRegistry\u0026gt; ; access:dataset \u0026lt;#tdb_dataset_shared\u0026gt; ; . \u0026lt;#securityRegistry\u0026gt; rdf:type access:SecurityRegistry ; . . . \u0026lt;#tdb_dataset_shared\u0026gt; rdf:type tdb:DatasetTDB ; . . . All dataset storage types are supported. TDB1 and TDB2 have special implementations for handling graph access control.\nGraph Security Registry The Graph Security Registry is defined as a number of access entries in either a list format \u0026ldquo;(user graph1 graph2 \u0026hellip;)\u0026rdquo; or as RDF properties access:user and access:graphs. 
The property access:graphs has a graph URI or a list of URIs as its object.\n\u0026lt;#securityRegistry\u0026gt; rdf:type access:SecurityRegistry ; access:entry ( \u0026quot;user1\u0026quot; \u0026lt;http://host/graphname1\u0026gt; \u0026lt;http://host/graphname2\u0026gt; ) ; access:entry ( \u0026quot;user1\u0026quot; \u0026lt;http://host/graphname3\u0026gt; ) ; access:entry ( \u0026quot;user1\u0026quot; \u0026lt;urn:x-arq:DefaultGraph\u0026gt; ) ; access:entry ( \u0026quot;user2\u0026quot; \u0026lt;http://host/graphname9\u0026gt; ) ; access:entry [ access:user \u0026quot;user3\u0026quot; ; access:graphs ( \u0026lt;http://host/graphname3\u0026gt; \u0026lt;http://host/graphname4\u0026gt; ) ] ; access:entry [ access:user \u0026quot;user3\u0026quot; ; access:graphs \u0026lt;http://host/graphname5\u0026gt; ] ; access:entry [ access:user \u0026quot;userZ\u0026quot; ; access:graphs \u0026lt;http://host/graphnameZ\u0026gt; ] ; . Jetty Configuration For authentication configuration not covered by Fuseki configuration, the deployed server can be run using a Jetty configuration.\nServer command line: \u0026ndash;jetty=jetty.xml.\nDocumentation for jetty.xml.\n","permalink":"","tags":null,"title":"Data Access Control for Fuseki"},{"categories":null,"contents":"This page describes support for accessing data with additional statements derived using RDFS. It supports rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range. It does not provide RDF axioms. The RDFS vocabulary is not included in the data.\nIt does support use with RDF datasets, where each graph in the dataset has the same RDFS vocabulary applied to it.\nThis is not a replacement for the Jena RDFS Reasoner support which covers full RDFS inference.\nThe data is updateable, and graphs can be added and removed from the dataset. 
The vocabulary cannot be changed during the lifetime of the RDFS dataset.\nAPI: RDFSFactory The API provides operations to build RDFS-enabled datasets from data storage and vocabularies:\nExample:\nDatasetGraph data = ... // Load the vocabulary Graph vocab = RDFDataMgr.loadGraph(\u0026#34;vocabulary.ttl\u0026#34;); // Create a DatasetGraph with RDFS DatasetGraph dsg = datasetRDFS(data, vocab); // (Optional) Present as a Dataset. Dataset dataset = DatasetFactory.wrap(dsg); The vocabulary is processed to produce the data structures needed for processing the data efficiently at run time. This is the SetupRDFS class, which can be created and shared; it is thread-safe.\nSetupRDFS setup = setupRDFS(vocab); Assembler: RDFS Dataset Datasets with RDFS can be built with an assembler:\n\u0026lt;#rdfsDS\u0026gt; rdf:type ja:DatasetRDFS ; ja:rdfsSchema \u0026lt;vocabulary.ttl\u0026gt;; ja:dataset \u0026lt;#baseDataset\u0026gt; ; . \u0026lt;#baseDataset\u0026gt; rdf:type ...some dataset type ... ; ... . where \u0026lt;#baseDataset\u0026gt; is the definition of the dataset to be enriched.\nAssembler: RDFS Graph It is possible to build a single Model:\n\u0026lt;#rdfsGraph\u0026gt; rdf:type ja:GraphRDFS ; ja:rdfsSchema \u0026lt;vocabulary.ttl\u0026gt;; ja:graph \u0026lt;#baseGraph\u0026gt; ; . \u0026lt;#baseGraph\u0026gt; rdf:type ja:MemoryModel; ... 
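The vocabulary pre-processing step described above amounts to pre-computing closures over the vocabulary, such as the transitive closure of rdfs:subClassOf, so that inferred triples can be produced cheaply at query time. A minimal, Jena-independent Java sketch of that idea, using the example vocabulary from this page (ns:T1 a subclass of ns:T2, itself a subclass of ns:T3); the class and method names here are illustrative, not the Jena implementation:

```java
import java.util.*;

/** Simplified sketch: expand rdfs:subClassOf transitively, as an
 *  RDFS setup step might, so instance types can be widened at query time. */
public class SubClassClosure {
    // direct: class -> its directly declared superclasses.
    static Map<String, Set<String>> closure(Map<String, Set<String>> direct) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String c : direct.keySet()) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> todo = new ArrayDeque<>(direct.get(c));
            while (!todo.isEmpty()) {
                String sup = todo.pop();
                if (seen.add(sup))                       // follow each superclass once
                    todo.addAll(direct.getOrDefault(sup, Set.of()));
            }
            result.put(c, seen);
        }
        return result;
    }

    public static void main(String[] args) {
        // Vocabulary from the example files: ns:T1 subClassOf ns:T2, ns:T2 subClassOf ns:T3.
        Map<String, Set<String>> direct = Map.of(
                "ns:T1", Set.of("ns:T2"),
                "ns:T2", Set.of("ns:T3"));
        Map<String, Set<String>> all = closure(direct);
        // An instance typed ns:T1 (e.g. via the range of ns:p) is also ns:T2 and ns:T3.
        System.out.println(all.get("ns:T1"));   // [ns:T2, ns:T3]
    }
}
```

With this closure in hand, an instance given type ns:T1 by the range declaration of ns:p also receives ns:T2 and ns:T3, matching the query results shown in the Fuseki example on this page.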
More generally, inference models can be defined using the Jena Inference and Rule engine: jena-fuseki2/examples/config-inference-1.ttl.\nUse with Fuseki The files for this example are available at: jena-fuseki2/examples/rdfs.\nFrom the command line (here, loading data from a file into an in-memory dataset):\nfuseki-server --data data.trig --rdfs vocabulary.ttl /dataset or from a configuration file with an RDFS Dataset:\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX ja: \u0026lt;\u0026gt; [] rdf:type fuseki:Server ; fuseki:services ( :service ) . ## Fuseki service /dataset with SPARQL query ## /dataset?query= :service rdf:type fuseki:Service ; fuseki:name \u0026#34;dataset\u0026#34; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:update ] ; fuseki:dataset :rdfsDataset ; . ## RDFS :rdfsDataset rdf:type ja:DatasetRDFS ; ja:rdfsSchema \u0026lt;file:vocabulary.ttl\u0026gt;; ja:dataset :baseDataset; . ## Transactional in-memory dataset. :baseDataset rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data.trig\u0026gt;; . Querying the Fuseki server With the SOH tools, a query (asking for plain text output):\ns-query --service http://localhost:3030/dataset --output=text --file query.rq or with curl:\ncurl --data @query.rq \\ --header \u0026#39;Accept: text/plain\u0026#39; \\ --header \u0026#39;Content-type: application/sparql-query\u0026#39; \\ http://localhost:3030/dataset will return:\n------------------------- | s | p | o | ========================= | :s | ns:p | :o | | :s | rdf:type | ns:D | | :o | rdf:type | ns:T1 | | :o | rdf:type | ns:T3 | | :o | rdf:type | ns:T2 | ------------------------- Files data.trig:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; :s ns:p :o . 
vocabulary.ttl:\nPREFIX xsd: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX skos: \u0026lt;\u0026gt; PREFIX list: \u0026lt;\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; ns:T1 rdfs:subClassOf ns:T2 . ns:T2 rdfs:subClassOf ns:T3 . ns:p rdfs:domain ns:D . ns:p rdfs:range ns:T1 . query.rq:\nPREFIX : \u0026lt;http://example/\u0026gt; PREFIX ns: \u0026lt;http://example/ns#\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; SELECT * { ?s ?p ?o } ","permalink":"","tags":null,"title":"Data with RDFS Inferencing"},{"categories":null,"contents":"ModelChangedListener In Jena it is possible to monitor a Model for changes, so that code can be run after changes are applied without the coding for that Model having to do anything special. We call these changes \u0026ldquo;events\u0026rdquo;. This first design and implementation is open for user comment and we may refine or reduce the implementation as more experience is gained with it.\nTo monitor a Model, you must register a ModelChangedListener with that Model:\nModel m = ModelFactory.createDefaultModel(); ModelChangedListener L = new MyListener(); m.register( L ); MyListener must be an implementation of ModelChangedListener, for example:\nclass MyListener implements ModelChangedListener { public void addedStatement( Statement s ) { System.out.println( \u0026#34;\u0026gt;\u0026gt; added statement \u0026#34; + s ); } public void addedStatements( Statement [] statements ) {} public void addedStatements( List statements ) {} public void addedStatements( StmtIterator statements ) {} public void addedStatements( Model m ) {} public void removedStatement( Statement s ) {} public void removedStatements( Statement [] statements ) {} public void removedStatements( List statements ) {} public void removedStatements( StmtIterator statements ) {} public void removedStatements( Model m ) {} } This listener ignores everything except the addition of single statements to m; those it prints out. 
The listener has a method for each of the ways that statements can be added to a Model:\nas a single statement, Model::add(Statement) as an element of an array of statements, Model::add(Statement[]) as an element of a list of statements, Model::add(List) as an iterator over statements, Model::add(StmtIterator) as part of another Model, Model::add(Model) (Similarly for delete.)\nThe listener method is called when the statement(s) have been added to the Model, if no exceptions have been thrown. It does not matter if the statement was already in the Model or not; it is the act of adding it that fires the listener.\nThere is no guarantee that the statement, array, list, or model that is added or removed is the same one that is passed to the appropriate listener method, and the StmtIterator will never be the same one. However, in the current design:\na single Statement will be .equals to the original Statement a List will be .equals to the original List a Statement[] will be the same length and have .equal elements in the same order a StmtIterator will deliver .equal elements in the same order a Model will contain the same statements We advise not relying on these ordering properties; instead assume that for any bulk update operation on the model, the listener will be told the method of update and the statements added or removed, but that the order may be different and duplicate statements may have been removed. 
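The dispatch behaviour described above can be sketched without the Jena API: the change is applied first, then each registration is told, and a bulk update can be fanned out to the single-statement callback (in the style of Jena's StatementListener utility, described below). All class names in this sketch are illustrative, not part of the Jena API:

```java
import java.util.*;

/** Simplified, Jena-independent sketch of Model change events:
 *  listeners are notified after statements are added, once per registration. */
public class ListenerSketch {
    interface Listener {
        void addedStatement(String s);
        // Bulk updates fan out to the single-statement callback by default.
        default void addedStatements(List<String> ss) {
            ss.forEach(this::addedStatement);
        }
    }

    static class MiniModel {
        private final List<String> statements = new ArrayList<>();
        private final List<Listener> listeners = new ArrayList<>();

        void register(Listener l)   { listeners.add(l); }       // may be added twice
        void unregister(Listener l) { listeners.remove(l); }    // removes one registration

        void add(List<String> ss) {
            statements.addAll(ss);                 // the change is applied first,
            for (Listener l : listeners)           // then every registration is told
                l.addedStatements(ss);
        }
    }

    public static void main(String[] args) {
        MiniModel m = new MiniModel();
        List<String> heard = new ArrayList<>();
        Listener l = heard::add;
        m.register(l);
        m.register(l);                             // registered twice -> told twice
        m.add(List.of(":s :p :o"));
        System.out.println(heard.size());          // 2
    }
}
```

Registering the same listener twice means it is told twice, and unregistering removes one registration at a time, matching the registration rules described on this page.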
Note in particular that a Model with any Listeners will have to record the complete contents of any StmtIterator that is added or removed to the model, so that the Model and the Listener can both see all the statements.\nFinally, there is no guarantee that only Statements etc added through the Model API will be presented to the listener; any Triples added to its underlying Graph will also be presented to the listener as statements.\nUtility classes The full Listener API is rather chunky and it can be inconvenient to use, especially for the creation of inline classes. There are four utility classes in org.apache.jena.rdf.listeners:\nNullListener. This class\u0026rsquo;s methods do nothing. This is useful when you want to subclass and intercept only specific ways of updating a Model. ChangedListener. This class only records whether some change has been made, but not what it is. The method hasChanged() returns true if some change has been made since the last call of hasChanged() [or since the listener was created]. StatementListener. This class translates all bulk update calls (ie the ones other than addedStatement() and removedStatement()) into calls to addedStatement()/removedStatement() for each Statement in the collection. This allows statements to be tracked whether they are added one at a time or in bulk. ObjectListener. This class translates all the listener calls into added(Object) or removed(Object) as appropriate; it is left to the user code to distinguish among the types of argument. When listeners are called In the current implementation, listener methods are called immediately the additions or removals are made, in the same thread as the one making the update. If a model has multiple listeners registered, the order in which they are informed about an update is unspecified and may change from update to update. If any listener throws an exception, that exception is thrown through the update call, and other listeners may not be informed of the update. 
Hence listener code should be brief and exception-free if at all possible.\nRegistering and unregistering A listener may be registered with the same model multiple times. If so, it will be invoked as many times as it is registered for each update event on the model.\nA listener L may be unregistered from a Model using the method unregister(L). If L is not registered with the model, nothing happens.\nIf a listener is registered multiple times with the same model, each unregister() for that listener will remove just one of the registrations.\nTransactions and databases In the current design, listeners are not informed of transaction boundaries, and all events are fed to listeners as soon as they happen.\n","permalink":"","tags":null,"title":"Event handling in Jena"},{"categories":null,"contents":"Optimization in ARQ proceeds on two levels. After the query is parsed, the SPARQL algebra for the query is generated as described in the SPARQL specification. High-level optimization occurs by rewriting the algebra into new, equivalent algebra forms and introducing specialized algebra operators. During query execution, the low-level, storage-specific optimization occurs such as choosing the order of triple patterns within basic graph patterns.\nThe effect of high-level optimizations can be seen using arq.qparse and the low-level runtime optimizations can be seen by execution logging.\nAlgebra Transformations The preparation for a query for execution can be investigated with the command arq.qparse --explain --query QueryFile.rq. Different storage systems may perform different optimizations, usually chosen from the standard set. qparse shows the action of the memory-storage optimizer which applies all optimizations.\nOther useful arguments are:\nqparse arguments\nArgument Effect --print=query Print the parsed query --print=op Print the SPARQL algebra for the query. This is exactly the algebra specified by the SPARQL standard. 
--print=opt Print the optimized algebra for the query. --print=quad Print the quad form algebra for the query. --print=optquad Print the quad-form optimized algebra for the query. The argument --explain is equivalent to --print=query --print=opt\nExamples:\narq.qparse --explain --query Q.rq arq.qparse --explain \u0026#39;SELECT * { ?s ?p ?o }\u0026#39; Execution Logging ARQ can log query and update execution details globally or for individual operations. This adds another level of control on top of the logger level controls.\nFrom command line:\narq.sparql --explain --data ... --query ... Explanatory messages are controlled by the Explain.InfoLevel level in the execution context.\nExecution logging at level ALL can cause a significant slowdown in query execution speeds but the order of operations logged will be correct.\nThe logger used is called org.apache.jena.arq.exec. Messages are sent at level \u0026ldquo;info\u0026rdquo;. So for log4j2, the following can be set in the file:\ = org.apache.jena.arq.exec logger.arq-exec.level = INFO The context setting is for key (Java constant) ARQ.symLogExec. To set globally:\nARQ.setExecutionLogging(Explain.InfoLevel.ALL) ; and it may also be set on an individual query execution using its local context.\ntry(QueryExecution qExec = QueryExecution.create() ... .set(ARQ.symLogExec, Explain.InfoLevel.ALL).build() ) { ResultSet rs = qExec.execSelect() ; ... } On the command line:\narq.query --explain --data datafile --query=queryfile The command tdbquery takes the same \u0026ndash;explain argument.\nLogging information levels: see the logging page\nTo get ARQ query explanation in Fuseki logs see Fuseki logging\n","permalink":"","tags":null,"title":"Explaining ARQ queries"},{"categories":null,"contents":"There are several ways to extend the ARQ query engine within the SPARQL syntax.\nExpression Functions - additional operations in FILTERS, BIND and SELECT expressions. 
Property functions - adding predicates that introduce custom query stages DESCRIBE handlers Support for finding blank nodes by label Extending query evaluation for querying different storage and inference systems Functions are a standard part of SPARQL. ARQ supports application-written functions and provides a function library. Applications can write and register their own functions.\nProperty functions provide a way to perform custom matching of particular predicates. They enable triple matches to be calculated, rather than just looked up in an RDF graph, and they are a way to add functionality while remaining within SPARQL. ARQ has a property function library. Applications can write and register their own property functions.\nThe free text support in ARQ is provided by Lucene, using property functions.\nFilter Functions A SPARQL custom function is implementation dependent. Most details of the ARQ query engine do not have to be understood to write a function; it is a matter of implementing one interface. This is made simpler for many cases by a number of base classes that provide much of the machinery needed.\nFunction Registry Functions can be installed into the function registry by the application. The function registry is a mapping from URI to a factory class for functions (each time a function is mentioned in a query, a new instance is created) and there is an auto-factory class so just registering a Java class will work. A function can access the queried dataset.\nDynamically Loaded Functions The ARQ function library uses this mechanism. The namespace of the ARQ function library is \u0026lt;\u0026gt;.\nPREFIX afn: \u0026lt;\u0026gt; PREFIX dc: \u0026lt;\u0026gt; SELECT ?v { ?x dc:date ?date . FILTER (?date \u0026lt; afn:now() ) } The afn:now returns the time the query started.\nThe expression functions in the ARQ distribution are described on the expression function library page.\nURIs for functions in the (fake) URI scheme java: are dynamically loaded. 
The class name forms the scheme specific part of the URI.\nProperty functions Property functions, sometimes called \u0026ldquo;magic properties\u0026rdquo;, are properties that cause triple matching to happen by executing some piece of code, determined by the property URI, and not by the usual graph matching. They can be used to give certain kinds of inference and rule processing. Some calculated properties have additional, non-declarative requirements such as needing one or other of the subject or object to be a query constant or a bound value, and may not be able to generate all possibilities for that slot.\nProperty functions must have a fixed URI for the predicate (it can\u0026rsquo;t be a query variable). They may take a list for subject or object.\nOne common case is for access to collections (RDF lists) or containers (rdf:Bag, rdf:Seq, rdf:Alt).\nPREFIX list: \u0026lt;\u0026gt; SELECT ?member { ?x :p ?list . # Some way to find the list ?list list:member ?member . } which can also be written:\nPREFIX list: \u0026lt;\u0026gt; SELECT ?member { ?x :p [ list:member ?member ] } Likewise, RDF containers:\nPREFIX rdfs: \u0026lt;\u0026gt; SELECT ?member { ?x :p ?bag . # Some way to find the bag ?bag rdfs:member ?member . } Property functions can also take lists in the subject or object slot.\nCode for properties can be dynamically loaded or pre-registered. For example, splitIRI will take an IRI and assign the namespace and localname parts to variables (if the variables are already bound or constants are used, splitIRI will check the values).\nPREFIX xsd: \u0026lt;\u0026gt; PREFIX apf: \u0026lt;java:org.apache.jena.query.pfunction.library.\u0026gt; SELECT ?namespace ?localname { xsd:string apf:splitIRI (?namespace ?localname) } Property functions might conflict with inference rules; they can be turned off in Java code:\nARQ.setFalse(ARQ.enablePropertyFunctions) ; or on a per instance basis:\ntry(QueryExecution qExec = ... 
) { qExec.getContext().setFalse(ARQ.enablePropertyFunctions) ; ... } The property functions in the ARQ distribution are described on the property function library page.\nURIs for functions in the (fake) URI scheme java: are dynamically loaded. The class name forms the scheme specific part of the URI.\nDESCRIBE handlers The DESCRIBE result form in SPARQL does not define an exact form of RDF to return. Instead, it allows the server or query processor to return what it considers to be an appropriate description of the resources located. This description will be specific to the domain, data modelling or application.\nARQ comes with one built-in handler which calculates the blank node closure of resources found. While suitable for many situations, it is not general (for example, a FOAF file usually consists of all blank nodes). ARQ allows the application to replace or add handlers for producing DESCRIBE result forms.\nApplication-specific handlers can be added to the DescribeHandlerRegistry. The handler will be called for each resource (not literals) identified by the DESCRIBE query.\nBlank Node Labels URIs with the scheme name \u0026ldquo;_\u0026rdquo; (which is illegal) are created as blank node labels for directly accessing a blank node in the queried graph or dataset. These are constant terms in the query - not unnamed variables. Do not confuse these with the standard qname-like notation for blank nodes in queries. This is not portable - use with care.\n\u0026lt;_:1234-5678-90\u0026gt; # A blank node in the data _:b0 # A blank node in the query - a variable ","permalink":"","tags":null,"title":"Extensions in ARQ"},{"categories":null,"contents":"Eyeball is a Jena-based tool for checking RDF models (including OWL) for common problems. It is user-extensible using plugins.\nThis page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. 
Documentation index The brief guide. The manual. The JavaDoc. Getting the Eyeball release Installation Eyeball needs to be compiled from source.\nIf you have Ant installed, run the Eyeball test suite:\nant test Ensure all the jars in the Eyeball lib directory are on your classpath.\nUsing Eyeball with Apache Maven TODO\nTrying it out Pick one of your RDF files; we\u0026rsquo;ll call it FOO for now. Run the command-line command\njava jena.eyeball -check FOO You will likely get a whole bunch of messages about your RDF. The messages are supposed to be self-explanatory, so you may be able to go ahead and fix some problems straight away. If you get a Java error about NoClassDefFoundError, you\u0026rsquo;ve forgotten to set the classpath up or use the -cp myClassPath option to Java.\nYou may also want to try the experimental GUI, see below.\nIf the messages aren\u0026rsquo;t self-explanatory, or you want more details, please consult the guide.\nExperimental Eyeball GUI Eyeball includes a simple GUI tool which will allow multiple files to be checked at once and multiple schemas to be assumed. It will also allow you to select which inspectors are used.\nTo start the GUI, use the following (assuming your classpath is set up, as above): java jena.eyeballGUI\n","permalink":"","tags":null,"title":"Eyeball - checking RDF/OWL for common problems"},{"categories":null,"contents":"","permalink":"","tags":null,"title":"Frequently asked questions"},{"categories":null,"contents":"The regular expressions for afn:localname and afn:namespace were incorrect. SPARQL allows custom functions in expressions so that queries can be used on domain-specific data. SPARQL defines a function by URI (or prefixed name) in FILTER expressions. ARQ provides a function library and supports application-provided functions. 
Functions and property functions can be registered or dynamically loaded.\nApplications can also provide their own functions.\nARQ also provides an implementation of the Leviathan Function Library.\nXQuery/XPath Functions and Operators supported ARQ supports the scalar functions and operators from \u0026ldquo;XQuery 1.0 and XPath 2.0 Functions and Operators v3.1\u0026rdquo;.\nFunctions involving sequences are not supported.\nSee XSD Support for details of datatypes and functions currently supported. To check the exact current registrations, see function/\nSee also the property functions library page.\nFunction Library The prefix afn is \u0026lt;\u0026gt;. (The old prefix of \u0026lt;\u0026gt; continues to work. Applications are encouraged to switch.)\nDirect loading using a URI prefix of \u0026lt;java:org.apache.jena.sparql.function.library.\u0026gt; (note the final dot) is deprecated.\nThe prefix fn is \u0026lt;\u0026gt; (the XPath and XQuery function namespace).\nThe prefix math is \u0026lt;\u0026gt;.\nCustom Aggregates The prefix agg: is \u0026lt;\u0026gt;.\nThe statistical aggregates provided are:\nagg:stdev, agg:stdev_samp, agg:stdev_pop, agg:variance, agg:var_samp, agg:var_pop\nThese are modelled after SQL aggregate functions STDEV, STDEV_SAMP, STDEV_POP, VARIANCE, VAR_SAMP, VAR_POP.\nThese, as keywords, are available in ARQ\u0026rsquo;s extended SPARQL (parse using Syntax.syntaxARQ).\nAdditional Functions Provided by ARQ Most of these have equivalents, or near equivalents, in SPARQL or as an XQuery function and are to be preferred. These ARQ-specific versions remain for compatibility.\nRDF Graph Functions\nFunction name Description Alternative afn:bnode(?x) Return the blank node label if ?x is a blank node. 
STR(?x) afn:localname(?x) The local name of ?x `REPLACE(STR(?x), \u0026ldquo;^(.*)(/ afn:namespace(?x) The namespace of ?x `REPLACE(STR(?x), \u0026ldquo;^(.*)(/ The prefix and local name of a IRI is based on splitting the IRI, not on any prefixes in the query or dataset.\nString Functions\nFunction name Description Alternative afn:sprintf(format, v1, v2, ...) Make a string from the format string and the RDF terms. afn:substr(string, startIndex [,endIndex]) Substring, Java style using startIndex and endIndex. afn:substring Synonym for afn:substr afn:strjoin(sep, string ...) Concatenate string together, with a separator. afn:sha1sum(resource) Calculate the SHA1 checksum of a literal or URI SHA1(STR(resource)) Notes:\nStrings in \u0026ldquo;XQuery 1.0 and XPath 2.0 Functions and Operators\u0026rdquo; start from character position one, unlike Java and C# where strings start from zero. The fn:substring operation takes an optional length, like C# but different from Java, where it is the endIndex of the first character after the substring. afn:substr uses Java-style startIndex and endIndex. Mathematical Functions\nFunction name Description Alternative afn:min(num1, num2) Return the minimum of two numbers fn:min afn:max(num1, num2) Return the maximum of two numbers fn:max afn:pi() The value of pi, as an XSD double math:pi() afn:e() The value of e, as an XSD double math:exp(1) afn:sqrt(num) The square root of num math:sqrt Miscellaneous Functions\nFunction name Description Alternative afn:now() Current time. Actually, the time the query started. 
NOW() afn:sha1sum(resource) Calculate the SHA1 checksum SHASUM ","permalink":"","tags":null,"title":"Functions in ARQ"},{"categories":null,"contents":"The jena-fuseki-docker package contains a Dockerfile, docker-compose file, and helper scripts to create a docker container for Apache Jena Fuseki.\nThe docker container is based on Fuseki main for running a SPARQL server.\nThere is no UI - all configuration is by command line and all usage is via the network protocols.\nDatabases can be mounted outside the docker container so they are preserved when the container terminates.\nThis build system allows the user to customize the docker image.\nThe docker build downloads the server binary from Maven central, checking the download against the SHA1 checksum.\nDatabase There is a volume mapping of \u0026ldquo;./databases\u0026rdquo; in the current directory into the server. This can be used to contain databases outside, but accessible to, the container that do not get deleted when the container exits.\nSee examples below.\nBuild Choose the version number of the Apache Jena release you wish to use. This toolkit defaults to the version of the overall Jena release it was part of. It is best to use the release of this set of tools from the same release of the desired server.\ndocker-compose build --build-arg JENA_VERSION=3.16.0 Note the build command must provide the version number.\nTest Run docker-compose run can be used to test the build from the previous section.\nExamples:\nStart Fuseki with an in-memory, updatable dataset at http://host:3030/ds\ndocker-compose run --rm --service-ports fuseki --mem /ds Load a TDB2 database, and expose, read-only, via docker:\nmkdir -p databases/DB2 tdb2.tdbloader --loc databases/DB2 MyData.ttl # Publish read-only docker-compose run --rm --name MyServer --service-ports fuseki --loc databases/DB2 /ds To allow update on the database, add --update. 
Updates are persisted.\ndocker-compose run --rm --name MyServer --service-ports fuseki --update --loc databases/DB2 /ds See fuseki-configuration for more information on command line arguments.\nTo use docker-compose up, edit the docker-compose.yaml to set the Fuseki command line arguments appropriately.\nLayout The default layout in the container is:\nPath Use /opt/java-minimal A reduced size Java runtime /fuseki The Fuseki installation /fuseki/ Logging configuration /fuseki/databases/ Directory for a volume for persistent databases Setting JVM arguments Use JAVA_OPTIONS:\ndocker-compose run --service-ports --rm -e JAVA_OPTIONS=\u0026quot;-Xmx1048m -Xms1048m\u0026quot; --name MyServer fuseki --mem /ds Docker Commands If you prefer to use docker directly:\nBuild:\ndocker build --force-rm --build-arg JENA_VERSION=3.16.0 -t fuseki . Run:\ndocker run -i --rm -p \u0026quot;3030:3030\u0026quot; --name MyServer -t fuseki --mem /ds With databases on a bind mount to a host filesystem directory:\nMNT=\u0026quot;--mount type=bind,src=$PWD/databases,dst=/fuseki/databases\u0026quot; docker run -i --rm -p \u0026quot;3030:3030\u0026quot; $MNT --name MyServer -t fuseki --update --loc databases/DB2 /ds Version specific notes: Versions of Jena up to 3.14.0 use Log4j1 for logging. The docker build will ignore the file Version 3.15.0: When run, a warning will be emitted.\nWARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.\nThis can be ignored. ","permalink":"","tags":null,"title":"Fuseki : Docker Tools"},{"categories":null,"contents":"Fuseki can be run within a larger JVM application as an embedded triplestore.\nDependencies and Setup Logging Building a Server Examples The application can safely access and modify the data published by the server if it does so inside a transaction using an appropriate storage choice. 
DatasetFactory.createTxnMem() is a good choice for in-memory use; TDB is a good choice for a persistent database.\nTo build and start the server:\nDataset ds = ... FusekiServer server = FusekiServer.create() .add(\u0026quot;/rdf\u0026quot;, ds) .build() ; server.start() ; then the application can modify the dataset:\n// Add some data while live. // Write transaction. Txn.execWrite(dsg, ()-\u0026gt; RDFDataMgr.read(dsg, \u0026quot;D.trig\u0026quot;)) ; or read the dataset and see any updates made by remote systems:\n// Query data while live // Read transaction. Txn.execRead(dsg, ()-\u0026gt;{ Dataset ds = DatasetFactory.wrap(dsg) ; try (QueryExecution qExec = QueryExecutionFactory.create(\u0026quot;SELECT * { ?s ?p ?o }\u0026quot;, ds) ) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } }) ; The full Jena API can be used provided operations (read and write) are inside a transaction.\nDependencies and Setup To include an embedded Fuseki server in the application:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-fuseki-main\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;3.x.y\u0026lt;/version\u0026gt; \u0026lt;!-- Set the version --\u0026gt; \u0026lt;/dependency\u0026gt; This brings in enough dependencies to run Fuseki. Application writers are strongly encouraged to use a dependency manager because the number of Jetty and other dependencies is quite large and difficult to set manually.\nThis dependency does not include a logging setting. Fuseki uses slf4j.\nIf the application wishes to use a dataset with a text-index then the application will also need to include jena-text in its dependencies.\nLogging The application must set the logging provider for slf4j. 
Apache Jena provides helpers for Apache Log4j v2.\nFor Apache Log4j2, call:\nFusekiLogging.setLogging(); and add the dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.logging.log4j\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;log4j-slf4j-impl\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.13.1\u0026lt;/version\u0026gt; \u0026lt;!-- Many versions work --\u0026gt; \u0026lt;/dependency\u0026gt; See Fuseki Logging.\nTo silence logging from Java, try:\nLogCtl.setLevel(Fuseki.serverLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.actionLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.requestLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(Fuseki.adminLogName, \u0026quot;WARN\u0026quot;); LogCtl.setLevel(\u0026quot;org.eclipse.jetty\u0026quot;, \u0026quot;WARN\u0026quot;); Building a server A FusekiServer is built by creating a configuration, building the server, then running it. The application needs to start the server.\nThe default port for a Fuseki embedded server is 3330. This is different from the default port for Fuseki running as a standalone server or as a webapp application.\nExamples of embedded use Example 1 Create a server on port 3330, that provides the default set of endpoints for an RDF dataset that can be updated via HTTP.\nDataset ds = DatasetFactory.createTxnMem() ; FusekiServer server = FusekiServer.create() .add(\u0026quot;/ds\u0026quot;, ds) .build() ; server.start() ; ... 
server.stop() ; The services are available on a named endpoint and also on the dataset URL itself.\nURLs:\nService Endpoint1 Endpoint2 SPARQL Query http://host:3330/ds/query http://host:3330/ds SPARQL Query http://host:3330/ds/sparql http://host:3330/ds SPARQL Update http://host:3330/ds/update http://host:3330/ds GSP read-write http://host:3330/ds/data http://host:3330/ds \u0026ldquo;GSP\u0026rdquo; = SPARQL Graph Store Protocol\nExample 2 Create a server on port 3332, that provides the default set of endpoints for a data set that is read-only over HTTP. The application can still update the dataset.\nDataset ds = ... ; FusekiServer server = FusekiServer.create() .port(3332) .add(\u0026quot;/ds\u0026quot;, ds, false) .build() ; server.start() ; Service Endpoint1 Endpoint2 SPARQL Query http://host:3332/ds/query http://host:3332/ds SPARQL Query http://host:3332/ds/sparql http://host:3332/ds GSP read-only http://host:3332/ds/data http://host:3332/ds Example 3 Different combinations of services and endpoint names can be given using a DataService.\nDatasetGraph dsg = ... ; DataService dataService = new DataService(dsg) ; dataService.addEndpoint(OperationName.GSP_RW, \u0026quot;\u0026quot;); dataService.addEndpoint(OperationName.Query, \u0026quot;\u0026quot;); dataService.addEndpoint(OperationName.Update, \u0026quot;\u0026quot;); FusekiServer server = FusekiServer.create() .port(3332) .add(\u0026quot;/data\u0026quot;, dataService) .build() ; server.start() ; This setup puts all the operations on the dataset URL. The Content-type and any query string are used to determine the operation.\nService Endpoint SPARQL Query http://host:3332/ds SPARQL Update http://host:3332/ds GSP read-write http://host:3332/ds Example 4 Multiple datasets can be served by one server.\nDataset ds1 = ... Dataset ds2 = ... 
FusekiServer server = FusekiServer.create() .add(\u0026quot;/data1\u0026quot;, ds1) .add(\u0026quot;/data1-readonly\u0026quot;, ds1, false) .add(\u0026quot;/data2\u0026quot;, ds2) .build() ; server.start() ; ","permalink":"","tags":null,"title":"Fuseki : Embedded Server"},{"categories":null,"contents":"Fuseki main is a packaging of Fuseki as a triple store without a UI for administration.\nFuseki can be run in the background by an application as an embedded server. The application can safely work with the dataset directly from Java while having Fuseki provide SPARQL access over HTTP. An embedded server is useful for adding functionality around a triple store and also for development and testing.\nRunning as a deployment or development server Running from Docker Running as an embedded server Dependencies and Setup Logging Building a Server Examples The main server does not depend on any files on disk (other than for databases provided by the application), and does not provide the Fuseki UI or admin functions to create datasets via HTTP.\nSee also Data Access Control for Fuseki.\nRunning as a configured deployment or development server The artifact org.apache.jena:jena-fuseki-server is a packaging of the \u0026ldquo;main\u0026rdquo; server that runs from the command line. Unlike the UI Fuseki server, it is only configured from the command line and has no persistent work area on-disk.\njava -jar jena-fuseki-server-$VER.jar --help The arguments are the same as the full UI server command line program. There are no special environment variables.\nThe entry point is org.apache.jena.fuseki.main.cmds.FusekiMainCmd so the server can also be run as:\njava -cp jena-fuseki-server-$VER.jar:...OtherJars... 
\\ org.apache.jena.fuseki.main.cmds.FusekiMainCmd ARGS Docker A kit to build a container with docker or docker compose\n Note: take care that databases are on mounted volumes if they are to persist after the container is removed.\nSee the Fuseki docker tools page for details.\nRunning as an embedded server Fuseki can be run from inside a Java application to provide SPARQL services to application data. The application can continue to access and update the datasets served by the server.\nTo build and start the server:\nDataset ds = ... FusekiServer server = FusekiServer.create() .add(\u0026quot;/dataset\u0026quot;, ds) .build() ; server.start() ; See Fuseki embedded documentation for details and examples.\n","permalink":"","tags":null,"title":"Fuseki : Main Server"},{"categories":null,"contents":"A data service provides a number of operations on a dataset. These can be explicitly named endpoints or operations at the URL of the dataset. New operations can be configured in; these typically have their own named endpoints.\nSyntax Here is an example of a server configuration that provides one operation, SPARQL query, and then only on the dataset URL.\nPREFIX : \u0026lt;#\u0026gt; PREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; [] rdf:type fuseki:Server . \u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . ## In memory transactional dataset initially loaded ## with the contents of file \u0026quot;data.trig\u0026quot; \u0026lt;#dataset\u0026gt; rdf:type ja:MemoryDataset; ja:data \u0026quot;data.trig\u0026quot; . This is invoked with a URL of the form http://host:port/dataset?query=... which is a SPARQL query request sent to the dataset URL.\nThe property fuseki:endpoint describes the operation available. 
No name is given so the operation is available at the URL of the dataset.\nfuseki:dataset names the dataset to be used with this data service.\nIn this second example:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot;; ]; fuseki:dataset \u0026lt;#dataset\u0026gt; . the endpoint has a name. The URL to invoke the operation is now:\nhttp://host:port/dataset/sparql?query=...\nand is similar to the older form:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; fuseki:dataset \u0026lt;#dataset\u0026gt; . Operations on the dataset URL have the name \u0026quot;\u0026quot; (the empty string) and this is the default. The first example is the same as:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;\u0026quot; ; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . 
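Putting the two forms together: a single service can offer an operation both on the dataset URL and on a named endpoint. The sketch below is illustrative only (it reuses the \u0026lt;#dataset\u0026gt; description from the first example):

```turtle
<#service> rdf:type fuseki:Service ;
    fuseki:name "dataset" ;
    ## Query on the dataset URL: http://host:port/dataset?query=...
    fuseki:endpoint [ fuseki:operation fuseki:query ] ;
    ## Query on a named endpoint: http://host:port/dataset/sparql?query=...
    fuseki:endpoint [ fuseki:operation fuseki:query ;
                      fuseki:name "sparql" ] ;
    fuseki:dataset <#dataset> .
```

Both endpoints operate on the same dataset; only the request URL differs.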
Original Configuration Syntax The syntax described on this page was introduced in Apache Jena 3.13.0.\nThe previous syntax is still valid.\nThe new syntax enables more configuration options and gives more control of server functionality:\nsetting the context on a per-endpoint basis having multiple operations at the service access point, switching based on operation type a more general structure for adding custom services adding custom extensions to a Fuseki server Operations The following operations are provided:\nURI Operation fuseki:query SPARQL 1.1 Query with ARQ extensions fuseki:update SPARQL 1.1 Update with ARQ extensions fuseki:gsp-r SPARQL Graph Store Protocol and Quad extensions (read only) fuseki:gsp-rw SPARQL Graph Store Protocol and Quad extensions fuseki:upload HTML form file upload fuseki:no-op An operation that causes a 400 or 404 error Custom extensions can be added (see Programmatic configuration of the Fuseki server). To be able to uniquely identify the operation, these are usually given a named endpoint:\nfuseki:endpoint [ fuseki:operation fuseki:shacl ; fuseki:name \u0026quot;shacl\u0026quot; ; ] ; See the section \u0026ldquo;Integration with Apache Jena Fuseki\u0026rdquo; for details of the SHACL support. 
While this operation is part of the standard Fuseki distribution, this operation is added during system initialization, using the custom operation support.\nCommand Line Equivalents The standard set of services installed by running the server from the command line without a configuration file is, for a read-only dataset:\n\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \u0026quot;get\u0026quot; ]; fuseki:dataset ... which supports requests such as:\nhttp://host:port/dataset?query=... http://host:port/dataset/sparql?query=... 
http://host:port/dataset?default http://host:port/dataset/get?default and for an updatable dataset (command line --mem for an in-memory dataset; or with TDB storage, with --update):\n\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ;]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; ]; fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-r ; fuseki:name \u0026quot;get\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; fuseki:name \u0026quot;data\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:upload ; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:dataset ... which adds requests that can change the data.\nNew operations can be added by programmatic setup in Fuseki Main.\nDispatch \u0026ldquo;Dispatch\u0026rdquo; is the process of routing an HTTP request to a specific operation processor implementation to handle the request.\nDispatch to a named endpoint usually happens from the name alone, when there is a unique name for an endpoint. If, however, two endpoints give the same fuseki:name, or if operations are defined for the dataset itself, then dispatch is based on a second step of determining the operation type by inspecting the request. 
Each of the SPARQL operations has a unique signature.\nA query is either a GET with query string including \u0026ldquo;?query=\u0026rdquo;, or a POST with a content type of the body \u0026ldquo;application/sparql-query\u0026rdquo;, or an HTML form with a field \u0026ldquo;query=\u0026rdquo;.\nAn update is a POST where the body is \u0026ldquo;application/sparql-update\u0026rdquo; or an HTML form with field \u0026ldquo;update=\u0026rdquo;.\nA GSP operation has ?default or ?graph=.\nQuads operations are also provided by GSP endpoints when there is no query string and have a Content-Type for data in an RDF triples or quads syntax.\nSo, for example \u0026ldquo;GET /dataset\u0026rdquo; is a request to get all the triples and quads in the dataset. The syntax for the response is determined by content negotiation, defaulting to text/trig.\nCustom services usually use a named endpoint. Custom operations can specify a content type that they handle, which must be unique for the operation. They cannot provide a query string signature for dispatch.\nCommon Cases This section describes a few deployment patterns:\nCase 1: Read-only Dataset The 2 SPARQL standard operations for a read-only dataset:\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-read-only\u0026quot; ; ## fuseki:name \u0026quot;\u0026quot; is optional. fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . This is good for publishing data.\nCase 2: Dataset level operation. The 3 SPARQL standard operations for a read-write dataset; requests are sent to http://host:port/dataset. There are no named endpoint services.\n\u0026lt;#service\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-rw\u0026quot; ; ## fuseki:name \u0026quot;\u0026quot; is optional. 
fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:update;] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . Case 3: Named endpoints \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-named\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:upload; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . The operations on this dataset can only be accessed as \u0026ldquo;/ds-named/sparql\u0026rdquo;, \u0026ldquo;/ds-named/update\u0026rdquo; etc, not as \u0026ldquo;/ds-named\u0026rdquo;.\nCase 4: Named endpoints with query of the dataset. \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:upload; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-r; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; . 
The operations on this dataset are accessed as \u0026ldquo;/ds/sparql\u0026rdquo;, \u0026ldquo;/ds/update\u0026rdquo; etc. In addition, \u0026ldquo;/ds?query=\u0026rdquo; provides SPARQL query.\nQuad extensions The GSP (SPARQL Graph Store Protocol) operations provide the HTTP operations of GET, POST, PUT and DELETE for specific graphs in the RDF dataset. The SPARQL GSP standard includes identifying the target graph with ?default or ?graph=...uri... and the request or response is one of the RDF triple syntaxes (Turtle, N-Triples, JSON-LD, RDF/XML) as well as older proposals (TriX and RDF/JSON).\nApache Jena Fuseki also provides quad operations for HTTP methods GET, POST, PUT (not DELETE, that would be the dataset itself), and the request or response is one of the syntaxes for datasets (TriG, N-Quads, JSON-LD, TriX).\nThe DSP (\u0026ldquo;Dataset Store Protocol\u0026rdquo;) operations provide operations similar to GSP but operating on the dataset, not a specific graph.\nFuseki also provides [RDF Binary](/documentation/io/rdf-binary.html) for triples and quads.\nContext Each operation execution is given a \u0026ldquo;context\u0026rdquo; - a set of name-value pairs. Internally, this is used for system registries and for the fixed \u0026ldquo;current time\u0026rdquo; for an operation. The context is the merge of the server\u0026rsquo;s context, any additional settings on the dataset and any settings for the endpoint. 
The merge is performed in that order - server then dataset then endpoint.\nUses for the context setting include query timeouts and making default query pattern matching apply to the union of named graphs, not the default graph.\nIn this example (prefix tdb2: is for URI \u0026lt;\u0026gt;):\n\u0026lt;#servicetdb\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds-tdb\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql-union\u0026quot; ; ja:context [ ja:cxtName \u0026quot;tdb:unionDefaultGraph\u0026quot; ; ja:cxtValue true ] ; ] ; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:endpoint [ fuseki:operation fuseki:update; ] ; fuseki:dataset \u0026lt;#tdbDataset\u0026gt; . \u0026lt;#tdbDataset\u0026gt; rdf:type tdb2:DatasetTDB ; ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000,30000\u0026quot; ] ; tdb2:location \u0026quot;DATA\u0026quot; . \u0026ldquo;/ds-tdb\u0026rdquo; is a TDB2 database with endpoints for SPARQL query and update on the dataset URL. In addition, it has a named service \u0026ldquo;/ds-tdb/sparql-union\u0026rdquo; where the query works with the union of named graphs as the default graph.\nQuery timeout is set for any use of the dataset with first result in 10 seconds, and complete results in 30 seconds.\nSecurity The page Data Access Control for Fuseki covers this topic in detail.\nFor endpoints, the permitted users are part of the endpoint description.\nfuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name \u0026quot;sparql\u0026quot; ; fuseki:allowedUsers \u0026quot;user1\u0026quot;, \u0026quot;user2\u0026quot; ] ; ","permalink":"","tags":null,"title":"Fuseki Data Service Configuration Syntax"},{"categories":null,"contents":"This page describes the original Fuseki2 server configuration syntax.\nExample:\n## Updatable dataset. 
\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service (alt name) fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL update service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph Store Protocol (read and write) fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph Store Protocol (read only) fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; refers to a dataset description in the same file.\nThere is a fixed set of services:\nService Description fuseki:serviceQuery SPARQL query service fuseki:serviceUpdate SPARQL update service fuseki:serviceReadGraphStore SPARQL Graph Store Protocol (read) fuseki:serviceReadWriteGraphStore SPARQL Graph Store Protocol (read and write) Configuration syntax can be mixed. If there are both old style and new style configurations for the same endpoint, the new style configuration is used.\nQuads operations on the dataset are implied if there is a SPARQL Graph Store Protocol service configured.\nIf a request is made on the dataset (no service name in the request URL), then the dispatcher classifies the operation and looks for a named endpoint for that operation of any name. If one is found, that is used. 
In the full endpoint configuration syntax, the additional dataset services are specified explicitly.\nThe equivalent of\nfuseki:serviceQuery \u0026quot;sparql\u0026quot; ; is\nfuseki:endpoint [ fuseki:operation fuseki:query ; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; and the two endpoints can have different context settings and security.\n","permalink":"","tags":null,"title":"Fuseki Data Service Configuration Syntax - Old Style"},{"categories":null,"contents":"There are two areas: the fixed files provided by the distribution and the changing files for the local deployment, including the default location for TDB databases.\nTwo environment variables control the file system usage. Symbolic links can be used to create variations on the standard layout.\nFUSEKI_HOME - this contains the fixed files from the distribution and is used for Unix service deployments. When deployed as a WAR file, everything is in the WAR file itself.\nFUSEKI_BASE - this contains the deployment files.\nMode Environment Variable Default Setting Service FUSEKI_HOME /usr/share/fuseki FUSEKI_BASE /etc/fuseki Webapp FUSEKI_HOME N/A (Files in the Fuseki .war file) FUSEKI_BASE /etc/fuseki Standalone FUSEKI_HOME Current directory FUSEKI_BASE ${FUSEKI_HOME}/run/ When run in a web application container (e.g. 
Tomcat, Jetty or other webapp compliant server), FUSEKI_BASE will be /etc/fuseki.\nIf FUSEKI_BASE is the same as FUSEKI_HOME, be careful when upgrading not to delete server deployment files and directories.\nDistribution area \u0026ndash; FUSEKI_HOME Directory or File Usage fuseki Fuseki Service (Linux) fuseki-server Fuseki standalone command fuseki-server.bat Fuseki standalone command fuseki-server.jar The Fuseki Server binary fuseki.war The Fuseki Server as a WAR file bin/ Helper scripts webapp/ The webapp for the UI Runtime area \u0026ndash; FUSEKI_BASE Directory or File Usage config.ttl Server configuration shiro.ini Apache Shiro configuration databases/ TDB Databases backups/ Write area for live backups configuration/ Assembler files logs/ Log file area system/ System configuration database system_files/ Uploaded data service descriptions (copies) templates/ Templates for build-in configurations The system_files/ keeps a copy of any assemblers uploaded to configure the server. The primary copy is kept in the system database.\nResetting To reset the server, stop the server, and delete the system database in system/, the system_files/ and any other unwanted deployment files, then restart the server.\n","permalink":"","tags":null,"title":"Fuseki File System Layout"},{"categories":null,"contents":"This page describes the HTTP Protocol used to control an Fuseki server via its administrative interface.\nOperations Server Information Datasets and Services Adding a Dataset and its Services Removing a Dataset Dormant and Active Removing a dataset All admin operations have URL paths starting /$/ to avoid clashes with dataset names and this prefix is reserved for the Fuseki control functions. Further operations may be added within this naming scheme.\nOperations Replace {name} with a dataset name: e.g. 
/$/backup/myDataset.\nMethod URL pattern Description GET /$/ping Check if server is alive POST /$/ping GET /$/server Get basic information (version, uptime, datasets\u0026hellip;) POST /$/server GET /$/status Alias of /$/server POST /$/datasets Create a new dataset GET /$/datasets Get a list of datasets DELETE /$/datasets/{name} Remove a dataset GET /$/datasets/{name} Get information about a dataset POST /$/datasets/{name}?state=offline Switch state of dataset to offline POST /$/datasets/{name}?state=active Switch state of dataset to online POST /$/server/shutdown Not implemented yet GET /$/stats Get request statistics for all datasets GET /$/stats/{name} Get request statistics for a dataset POST /$/backup/{name} POST /$/backups/{name} Alias of /$/backup/{name} GET /$/backups-list POST /$/compact/{name}?deleteOld=true POST /$/sleep GET /$/tasks GET /$/tasks/{name} GET /$/metrics GET /$/logs Not implemented yet Ping Pattern: /$/ping\nThe URL /$/ping is a guaranteed low cost point to test whether a server is running or not. It returns no other information other than to respond to the request over GET or POST (to avoid any HTTP caching) with a 200 response.\nReturn: current timestamp\nServer Information Pattern: /$/server\nThe URL /$/server returns details about the server and its current status in JSON.\n@@details of JSON format.\nDatasets and Services Pattern: /$/datasets\n/$/datasets is a container representing all datasets present in the server. /$/datasets/{name} names a specific dataset. As a container, operations on items in the container, via GET, POST and DELETE, operate on a specific dataset.\nAdding a Dataset and its Services. @@ May add server-managed templates\nA dataset can be added to a running server. 
There are several methods for doing this:\nPost the assembler file HTML Form upload the assembler file Use a built-in template (in-memory or persistent) All require HTTP POST.\nChanges to the server state are carried across restarts.\nFor persistent datasets, for example TDB, the dataset persists across restarts.\nFor in-memory datasets, the dataset is rebuilt from its description (this may include loading data from a file) but any changes are lost.\nTemplates A short-cut form for some common set-ups is provided by POSTing with the following parameters (query string or HTML form):\nParameter dbType Either mem or tdb dbName URL path name The dataset name must not be already in-use.\nDatasets are created in directory databases/.\nAssembler example The assembler description contains data and service. It can be sent by posting the assembler RDF graph in any RDF format or by posting from an HTML form (the syntax must be Turtle).\nThe assembler file is stored by the server and will be used on restart or when making the dataset active again.\n@@\nRemoving a Dataset Note: DELETE means \u0026ldquo;gone for ever\u0026rdquo;. The dataset name and the details of its configuration are completely deleted and cannot be recovered.\nThe data of a TDB dataset is not deleted.\nActive and Offline A dataset is in one of two modes: \u0026ldquo;active\u0026rdquo;, meaning it is servicing requests over HTTP (subject to configuration and security), or \u0026ldquo;offline\u0026rdquo;, meaning the configuration and name are known to the server but the dataset is not attached to the server. When \u0026ldquo;offline\u0026rdquo;, any persistent data can be manipulated outside the server.\nDatasets are initially \u0026ldquo;active\u0026rdquo;. 
The transition from \u0026ldquo;active\u0026rdquo; to \u0026ldquo;offline\u0026rdquo; is graceful - all outstanding requests are completed.\nStatistics Pattern: /$/stats/{name}\nStatistics can be obtained for each dataset or all datasets in a single response. /$/stats is treated as a container for this information.\n@@ stats details See Fuseki Server Information for details of statistics kept by a Fuseki server.\nBackup Pattern: /$/backup/{name}\nThis operation initiates a backup and returns a JSON object with the task Id in it.\nBackups are written to the server local directory \u0026lsquo;backups\u0026rsquo; as gzip-compressed N-Quads files.\nSee Tasks for how to monitor a backup\u0026rsquo;s progress.\nReturn: A task is allocated an identifier (usually, a number).\n{ \u0026#34;taskId\u0026#34; : \u0026#34;{taskId}\u0026#34; } The task id can be used to construct a URL to get details of the task:\n/$/tasks/{taskId} Pattern: /$/backups-list\nReturns a list of all the files in the backup area of the server. This is useful for managing the files externally.\nThe returned JSON object will have the form { backups: [ ... ] } where the [] array is a list of file names.\nSince 4.7.0 backups are written to a temporary file in the same directory and renamed on completion. In case of server crash, it will not be renamed. This guarantees backups are complete. Cleanup of incomplete backups can be done by users on application / container start: remove all incomplete files.\nBackup policies Users can use the backup API of the Fuseki HTTP Administration Protocol to build backup policies. 
See the issue for more information.\nCompact Pattern: /$/compact/{name}\nThis operation initiates a database compaction task and returns a JSON object with the task Id in it.\nThe optional parameter deleteOld=true deletes the old copy of the database after compaction completes.\nCompaction ONLY applies to TDB2 datasets, see TDB2 Database Administration for more details of this operation.\nYou can monitor the status of the task via the Tasks portion of the API. A successful compaction will have the finishPoint field set and the success field set to true.\nTasks Some operations cause a background task to be executed; backup is an example. The result of such operations includes a JSON object with the task id and also a Location: header with the URL of the task created.\nThe progress of the task can be monitored with HTTP GET operations:\nPattern: /$/tasks – All asynchronous tasks. Pattern: /$/tasks/{taskId} – A particular task.\nThe URL /$/tasks returns a description of all running and recently completed tasks. A finished task can be identified by having finishPoint and success fields.\nEach background task has an id. The URL /$/tasks/{taskId} gets a description of a single task.\nDetails of the last few completed tasks are retained, up to a fixed number. 
The records will eventually be removed as later tasks complete, and the task URL will then return 404.\nPattern: /$/tasks ; example:\n[ { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T12:52:51.860+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T12:52:50.859+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;sleep\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;1\u0026#34; , \u0026#34;success\u0026#34; : true } , { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T12:53:24.718+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T12:53:14.717+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;sleep\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;2\u0026#34; , \u0026#34;success\u0026#34; : true } ] Pattern: /$/tasks/1 : example:\n[ { \u0026#34;finished\u0026#34; : \u0026#34;2014-05-28T13:54:13.608+01:00\u0026#34; , \u0026#34;started\u0026#34; : \u0026#34;2014-05-28T13:54:03.607+01:00\u0026#34; , \u0026#34;task\u0026#34; : \u0026#34;backup\u0026#34; , \u0026#34;taskId\u0026#34; : \u0026#34;1\u0026#34; , \u0026#34;success\u0026#34; : false } ] This is inside an array to make the format returned the same as /$/tasks.\nMetrics Pattern: /$/metrics\n@@\n","permalink":"","tags":null,"title":"Fuseki HTTP Administration Protocol"},{"categories":null,"contents":"Fuseki logs operation details and also provides a standard NCSA request log.\nLogging is via SLF4J over Apache Log4J2, or by the Tomcat configuration if running the WAR file.\nFull Log name Usage org.apache.jena.fuseki.Server General Server Messages org.apache.jena.fuseki.Request NCSA request Log org.apache.jena.fuseki.Fuseki The HTTP request log org.apache.jena.fuseki.Admin Administration operations org.apache.jena.fuseki.Builder Dataset and service build operations org.apache.jena.fuseki.Config Configuration NCSA request Log This log is in NCSA extended/combined log format.\nMany web log analysers can process this format.\nThis log is normally off. 
The logger name is org.apache.jena.fuseki.Request.\nWhen run as a WAR file inside a webapp container (e.g. Apache Tomcat), the webapp container or reverse proxy will log access requests anyway.\nSetting logging The Fuseki Main engine looks for the log4j2 configuration as follows:\nUse the system property log4j2.configurationFile if defined (as usual for log4j2). Use the configuration file in the current directory, if it exists. Use a java resource on the classpath. Use the java resource org/apache/jena/fuseki/ on the classpath. Use a built-in configuration. The last step is a fallback to catch the case where Fuseki has been repackaged into a new WAR file and org/apache/jena/fuseki/ omitted, or run from the base jar. It is better to include org/apache/jena/fuseki/\nThe preferred customization is to use a custom file in the directory where Fuseki Main is run.\nFor the war file packaging, the configuration file should go in FUSEKI_BASE, which defaults to /etc/fuseki on Linux.\nFor the standalone webapp server, FUSEKI_BASE defaults to the directory run/ within the directory where the server is run.\nThe property fuseki.loglogging can also be set to true for additional logging.\nSetting ARQ explain logging Query explanation can be turned on by setting the symbol arq:logExec in the context to \u0026ldquo;info\u0026rdquo;, \u0026ldquo;fine\u0026rdquo; or \u0026ldquo;all\u0026rdquo;. This can be done in the Assembler file by setting ja:context on the server, dataset, or endpoint:\n[] ja:context [ ja:cxtName \u0026quot;arq:logExec\u0026quot; ; ja:cxtValue \u0026quot;info\u0026quot; ] . Default setting The default\nLogrotate Below is an example logrotate(1) configuration (to go in /etc/logrotate.d) assuming the log file has been put in /etc/fuseki/logs/fuseki.log.\nIt rotates the logs once a month, compresses logs on rotation, and keeps them for 6 months.\nIt uses copytruncate. 
This may lead to at most one broken log file line.\n/etc/fuseki/logs/fuseki.log { compress monthly rotate 6 create missingok copytruncate # Date in extension. dateext # No need # delaycompress } ","permalink":"","tags":null,"title":"Fuseki Logging"},{"categories":null,"contents":"Fuseki modules are a mechanism to include extension code into a Fuseki server. Modules are invoked during the process of building a Fuseki Main server. A module can modify the server configuration, add new functionality, or react to a server being built and started.\nThis feature was added in Jena version 4.3.0. It is an experimental feature that will evolve based on feedback and use cases.\nThe interface for modules is FusekiModule; if automatically loaded, the interface is FusekiAutoModule, which extends FusekiModule.\nFuseki modules can be provided in two ways:\nLoaded from additional jars on the classpath Programmatically controlling the setup of the FusekiServer server. Automatically loaded Fuseki Modules can be loaded using the JDK ServiceLoader by placing a jar file on the classpath, together with any additional dependencies. These provide the interface FusekiAutoModule. The service loader is controlled by file resources META-INF/services/org.apache.jena.fuseki.main.sys.FusekiAutoModule in the jar file. The module class must have a no-argument constructor.\nThis is often done by placing the file in the development code in src/main/resources/META-INF/services/. The file contains a line with the full class name of the implementation. If repacking Fuseki with the maven-shade-plugin, make sure the ServicesResourceTransformer is used.\nThe method start is called when the module is loaded. Custom operations can be globally registered at this point (see the Fuseki examples directory).\nA FusekiAutoModule can provide a level, an integer, to control the order in which modules are invoked during server building. 
Lower numbers are invoked before larger numbers at each step.\nProgrammatically configuring a server If creating a Fuseki server from Java, the modules can be autoloaded as described above, or explicitly added to the server builder.\nA FusekiModules object is a collection of modules, called at each point in the order given when creating the object.\nFusekiModule myModule = new MyModule(); FusekiModules fmods = FusekiModules.create(myModule); FusekiServer server = FusekiServer.create() ... .fusekiModules(fmods) ... .build(); Fuseki Module operations The module lifecycle during creating a Fuseki server is:\nprepare - called at the start of the server build steps before setting up the datasets. configured - access and modify the setup. This is called after the server has been configured, before the server is built. It defaults to calls to configDataAccessPoint for each dataset being hosted by the server. server - called after the server is built, before it is returned to the builder caller. There are also operations notified when a server is reloaded while running.\nserverConfirmReload serverReload As of Jena 4.9.0, reload is not yet supported.\nThe Fuseki start up sequence is:\nserverBeforeStarting - called at the start of server.start() serverAfterStarting - called at the end of server.start() serverStopped - called just after the server has stopped in the server.stop() call. (note, this is not always called because a server can simply exit the JVM). A Fuseki module does not need to implement all these steps. The default for all steps is \u0026ldquo;do nothing\u0026rdquo;. Usually, an extension will only be interested in certain steps, such as prepare, or the registry information of configuration.\nDuring the configuration step, the Fuseki configuration file for the server is available. If the server is built programmatically without a configuration file, this is null.\nThe configuration file can contain RDF information to build resources (e.g. 
it can contain additional assembler descriptions not directly linked to the server).\nThere is an example Fuseki Module in the Fuseki examples directory.\nFusekiModule interface /** * Module interface for Fuseki. * \u0026lt;p\u0026gt; * A module is additional code, usually in a separate jar, * but can also be part of the application code. */ public interface FusekiModule extends SubsystemLifecycle { /** * Display name to identify this module. */ public String name(); // -- Build cycle. /** * Called at the start of \u0026#34;build\u0026#34; step. The builder has been set according to the * configuration of API calls and parsing configuration files. No build actions have been carried out yet. * The module can make further FusekiServer.{@link Builder} calls. * The \u0026#34;configModel\u0026#34; parameter is set if a configuration file was used otherwise it is null. */ public default void prepare(FusekiServer.Builder serverBuilder, Set\u0026lt;String\u0026gt; datasetNames, Model configModel) ; /** * Called after the DataAccessPointRegistry has been built. * \u0026lt;p\u0026gt; * The default implementation is to call {@link #configDataAccessPoint(DataAccessPoint, Model)} * for each {@link DataAccessPoint}. * \u0026lt;pre\u0026gt; * dapRegistry.accessPoints().forEach(accessPoint{@literal -\u0026gt;}configDataAccessPoint(accessPoint, configModel)); * \u0026lt;/pre\u0026gt; */ public default void configured(FusekiServer.Builder serverBuilder, DataAccessPointRegistry dapRegistry, Model configModel) { dapRegistry.accessPoints().forEach(accessPoint-\u0026gt;configDataAccessPoint(accessPoint, configModel)); } /** * This method is called for each {@link DataAccessPoint} by the default * implementation of {@link #configured} after the new servers * DataAccessPointRegistry has been built. */ public default void configDataAccessPoint(DataAccessPoint dap, Model configModel) {} /** * Built, not started, about to be returned to the builder caller. 
*/ public default void server(FusekiServer server) { } /** * Confirm or reject a request to reload. */ public default boolean serverConfirmReload(FusekiServer server) { return true; } /** * Perform any operations necessary for a reload. */ public default void serverReload(FusekiServer server) { } // -- Server start up /** * Server starting - called just before server.start happens. */ public default void serverBeforeStarting(FusekiServer server) { } /** * Server started - called just after server.start happens, and before server * .start() returns to the application. */ public default void serverAfterStarting(FusekiServer server) { } /** Server stopping. * Do not rely on this to clear up external resources. * Usually there is no stop phase and the JVM just exits or is killed externally. * */ public default void serverStopped(FusekiServer server) { } /** Module unloaded : do not rely on this happening. */ @Override public default void stop() {} } FusekiAutoModules also provide the org.apache.jena.base.module.SubsystemLifecycle interface.\n","permalink":"","tags":null,"title":"Fuseki Modules"},{"categories":null,"contents":"This page describes how to achieve certain common tasks in the most direct way possible.\nRunning with Apache Tomcat and loading a file. Unpack the distribution. Copy the WAR file into the Apache Tomcat webapps directory, under the name \u0026lsquo;fuseki\u0026rsquo;. If the user under which Apache Tomcat runs does not have write access to /etc, set the environment variable FUSEKI_BASE to a directory that the user running Apache Tomcat can write to. In a browser, go to [http://localhost:8080/fuseki/](http://localhost:8080/fuseki) (details such as port number depend on the Tomcat setup). Click on \u0026ldquo;Add one\u0026rdquo;, choose \u0026ldquo;in-memory\u0026rdquo;, choose a name for the URL for the dataset. Go to \u0026ldquo;add data\u0026rdquo; and load the file (single graph). 
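The "add data" step can also be done over HTTP with the SPARQL Graph Store Protocol rather than through the UI. A hedged sketch using only the Python standard library; the host, port, and dataset name /ds are assumptions following the Tomcat setup above, and the request is constructed but deliberately not sent, since sending it needs a running server:

```python
from urllib import request

# Hypothetical target: a dataset named "ds" under the Fuseki webapp at the
# Tomcat default port; adjust host, port, and dataset name to your setup.
url = "http://localhost:8080/fuseki/ds/data?default"
turtle = b"<http://example/s> <http://example/p> <http://example/o> ."

# PUT replaces the default graph with the request body (Graph Store
# Protocol); request.urlopen(req) would perform the upload.
req = request.Request(url, data=turtle, method="PUT",
                      headers={"Content-Type": "text/turtle"})
print(req.get_method(), req.full_url)
```

POST instead of PUT would add the triples to the graph rather than replace it.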
Publish an RDF file as a SPARQL endpoint. Unpack the distribution. Run fuseki-server --file FILE /name Explore a TDB database Unpack the distribution. Run fuseki-server --loc=DATABASE /name In a browser, go to http://localhost:3030//query.html More details on running Fuseki can be found nearby, including running as an operating system service and in a web app or servlet container such as Apache Tomcat or Jetty.\n","permalink":"","tags":null,"title":"Fuseki Quickstart"},{"categories":null,"contents":"A Fuseki server is configured by defining the data services (data and actions available on the data). There is also server configuration although this is often unnecessary.\nThe data services configuration can come from:\nFor Fuseki Full (webapp with UI):\nThe directory FUSEKI_BASE/configuration/ with one data service assembler per file (includes endpoint details and the dataset description.) The system database. This includes uploaded assembler files. It also keeps the state of each data service (whether it\u0026rsquo;s active or offline). The server configuration file config.ttl. For compatibility, the server configuration file can also have data services. The command line, if not running as a web application from a .war file. FUSEKI_BASE is the location of the Fuseki run area.\nFor Fuseki Main:\nThe command line, using --conf to provide a configuration file. The command line, using arguments (e.g. --mem /ds or --tdb2 --loc DB2 /ds). Programmatic configuration of the server. See Fuseki Security for more information on security configuration.\nExamples Example server configuration files can be found at jena-fuseki2/examples.\nSecurity and Access Control Access Control can be configured on any of the server, data service or dataset. Fuseki Data Access Control.\nSeparately, Fuseki Full has request based security filtering provided by Apache Shiro: Fuseki Full Security\nFuseki Configuration File A Fuseki server can be set up using a configuration file. 
The command-line arguments for publishing a single dataset are a short cut that, internally, builds a default configuration based on the dataset name given.\nThe configuration is an RDF graph. One graph consists of one server description, with a number of services, and each service offers a number of endpoints over a dataset.\nThe example below is all one file (RDF graph in Turtle syntax) split to allow for commentary.\nPrefix declarations Some useful prefix declarations:\nPREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX tdb1: \u0026lt;\u0026gt; PREFIX tdb2: \u0026lt;\u0026gt; PREFIX ja: \u0026lt;\u0026gt; PREFIX : \u0026lt;#\u0026gt; Assembler Initialization All datasets are described by assembler descriptions. Assemblers provide an extensible way of describing many kinds of objects.\nDefining the service name and endpoints available Each data service assembler defines:\nThe base name The operations and endpoint names The dataset for the RDF data. This example offers SPARQL Query, SPARQL Update and SPARQL Graph Store protocol, as well as file upload.\nSee Data Service Configuration Syntax for the complete details of the endpoint configuration description. Here, we show some examples.\nThe original configuration syntax, using, for example, fuseki:serviceQuery, is still supported.\nThe base name is /ds.\n## Updatable in-memory dataset. 
\u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:endpoint [ # SPARQL query service fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ] ; fuseki:endpoint [ # SPARQL query service (alt name) fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ] ; fuseki:endpoint [ # SPARQL update service fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ] ; fuseki:endpoint [ # HTML file upload service fuseki:operation fuseki:upload ; fuseki:name \u0026quot;upload\u0026quot; ] ; fuseki:endpoint [ # SPARQL Graph Store Protocol (read) fuseki:operation fuseki:gsp_r ; fuseki:name \u0026quot;get\u0026quot; ] ; fuseki:endpoint [ # SPARQL Graph Store Protocol (read and write) fuseki:operation fuseki:gsp_rw ; fuseki:name \u0026quot;data\u0026quot; ] ; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; refers to a dataset description in the same file.\nHTTP requests will include the service name: http://host:port/ds/sparql?query=....\nRead-only service This example offers only read-only endpoints (SPARQL Query and HTTP GET SPARQL Graph Store protocol).\nThis service offers read-only access to a dataset with a single graph of data.\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;/ds-ro\u0026quot; ; # http://host:port/ds-ro fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;sparql\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ; fuseki:name \u0026quot;data\u0026quot; ]; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . 
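As noted, HTTP requests include the service name after the dataset name. A small illustration of building such a request URL with the Python standard library, percent-encoding the query as the SPARQL 1.1 Protocol requires (host, port, and dataset name are placeholders following the /ds example above):

```python
from urllib.parse import urlencode

# Endpoint name "sparql" under the dataset /ds, as configured above.
service = "http://localhost:3030/ds/sparql"
query = "SELECT * { ?s ?p ?o } LIMIT 10"

# A query sent by GET travels percent-encoded in the "query" parameter.
url = service + "?" + urlencode({"query": query})
print(url)
```

Fetching this URL with any HTTP client executes the query against the running server.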
Data services on the dataset The standard SPARQL operations can also be defined on the dataset URL with no secondary service name:\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;/dataset\u0026quot; ; fuseki:endpoint [ fuseki:operation fuseki:query ]; fuseki:endpoint [ fuseki:operation fuseki:gsp_r ]; fuseki:dataset \u0026lt;#dataset\u0026gt; ; . HTTP requests use the URL of the dataset.\nSPARQL Query: http://host:port/dataset?query=... Fetch the default graph (SPARQL Graph Store Protocol): http://host:port/dataset?default Server Configuration If you need to load additional classes, or set global parameters, then these go in FUSEKI_BASE/config.ttl.\nAdditional classes cannot be loaded if running as a .war file. You will need to create a custom .war file consisting of the contents of the Fuseki web application and the additional classes.\nThe server section is optional.\nIf absent, fuseki configuration is performed by searching the configuration file for the type fuseki:Service.\nServer Section [] rdf:type fuseki:Server ; # Server-wide context parameters can be given here. # For example, to set query timeouts on a server-wide basis: # Format 1: \u0026quot;1000\u0026quot; -- 1 second timeout # Format 2: \u0026quot;10000,60000\u0026quot; -- 10s timeout to first result, then 60s timeout for the rest of the query. # See java doc for ARQ.queryTimeout # ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000\u0026quot; ] ; # Explicitly choose which services to add to the server. # If absent, include all descriptions of type `fuseki:Service`. # fuseki:services (\u0026lt;#service1\u0026gt; \u0026lt;#service2\u0026gt;) . 
Datasets In-memory An in-memory dataset, with data in the default graph taken from a local file.\n\u0026lt;#books\u0026gt; rdf:type ja:RDFDataset ; rdfs:label \u0026quot;Books\u0026quot; ; ja:defaultGraph [ rdfs:label \u0026quot;books.ttl\u0026quot; ; a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:Data/books.ttl\u0026gt; ] ; ] ; . TDB \u0026lt;#dataset\u0026gt; rdf:type tdb1:DatasetTDB ; tdb1:location \u0026quot;DB\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb1:unionDefaultGraph true ; . TDB2 \u0026lt;#dataset\u0026gt; rdf:type tdb2:DatasetTDB2 ; tdb2:location \u0026quot;DB2\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb2:unionDefaultGraph true ; . Inference An inference reasoner can be layered on top of a dataset as defined above. The type of reasoner must be selected carefully and should not include more reasoning than is required by the application, as extensive reasoning can be detrimental to performance.\nYou have to build up layers of dataset, inference model, and graph.\n\u0026lt;#dataset\u0026gt; rdf:type ja:RDFDataset; ja:defaultGraph \u0026lt;#inferenceModel\u0026gt; . \u0026lt;#inferenceModel\u0026gt; rdf:type ja:InfModel; ja:reasoner [ ja:reasonerURL \u0026lt;http://example/someReasonerURLHere\u0026gt; ]; ja:baseModel \u0026lt;#baseModel\u0026gt;; . \u0026lt;#baseModel\u0026gt; rdf:type tdb2:GraphTDB2; # for example. tdb2:location \u0026quot;/some/path/to/store/data/to\u0026quot;; # etc . 
where http://example/someReasonerURLHere is one of the URLs below.\nPossible reasoners: Details are in the main documentation for inference.\nGeneric Rule Reasoner:\nThe specific rule set and mode configuration can be set through parameters in the configuration Model.\nTransitive Reasoner:\nA simple \u0026ldquo;reasoner\u0026rdquo; used to help with API development.\nThis reasoner caches a transitive closure of the subClass and subProperty graphs. The generated infGraph allows both the direct and closed versions of these properties to be retrieved. The cache is built when the tbox is bound in but if the final data graph contains additional subProperty/subClass declarations then the cache has to be rebuilt.\nThe triples in the tbox (if present) will also be included in any query. Any of tbox or data graph are allowed to be null.\nRDFS Rule Reasoner:\nA full implementation of RDFS reasoning using a hybrid rule system, together with optimized subclass/subproperty closure using the transitive graph caches. Implements the container membership property rules using an optional data scanning hook. 
Implements datatype range validation.\nFull OWL Reasoner:\nA hybrid forward/backward implementation of the OWL closure rules.\nMini OWL Reasoner:\nKey limitations over the normal OWL configuration are:\nomits the someValuesFrom =\u0026gt; bNode entailments avoids any guard clauses which would break the find() contract omits inheritance of range implications for XSD datatype ranges Micro OWL Reasoner:\nThis only supports:\nRDFS entailments basic OWL axioms like ObjectProperty subClassOf Property intersectionOf, equivalentClass and forward implication of unionOf sufficient for traversal of explicit class hierarchies Property axioms (inverseOf, SymmetricProperty, TransitiveProperty, equivalentProperty) There is some experimental support for the cheaper class restriction handling which should not be relied on at this point.\n","permalink":"","tags":null,"title":"Fuseki: Configuring Fuseki"},{"categories":null,"contents":"A Fuseki server keeps detailed statistics for each dataset. Each service of a dataset keeps counters for the number of incoming requests, number of successful requests, number of bad requests (i.e. client errors), and number of failing requests (i.e. server errors).\nStatistics are available in JSON and in Prometheus format. The Prometheus data includes both database and JVM metrics.\nEndpoints The following server endpoints are available. They are present in Fuseki/UI; they need to be enabled with Fuseki/main, either on the command line or in the server configuration file with a boolean setting.\nEndpoint Config Property Usage /$/ping fuseki:pingEP Server liveness endpoint /$/stats fuseki:statsEP JSON format endpoint /$/metrics fuseki:metricsEP Prometheus format endpoint Ping The \u0026ldquo;ping\u0026rdquo; service can be used to test whether a Fuseki server is running. Calling this endpoint imposes minimal overhead on the server. 
Requests return the current time as a plain text string, so as to show that the ping is current.\nHTTP GET and HTTP POST are supported. The GET request is marked \u0026ldquo;no-cache\u0026rdquo;.\nStructure of the Statistics Report The statistics report shows the endpoints for each dataset with total counts of requests, good requests and bad requests.\nExample Endpoints with the format \u0026ldquo;_1\u0026rdquo; etc. are unnamed services of the dataset.\n{ \u0026quot;datasets\u0026quot; : { \u0026quot;/ds\u0026quot; : { \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;endpoints\u0026quot; : { \u0026quot;data\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-rw\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol\u0026quot; } , \u0026quot;_1\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-rw\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol\u0026quot; } , \u0026quot;_2\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;query\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;sparql\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , 
\u0026quot;operation\u0026quot; : \u0026quot;query\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Query\u0026quot; } , \u0026quot;get\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;gsp-r\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;Graph Store Protocol (Read)\u0026quot; } , \u0026quot;update\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;update\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Update\u0026quot; } , \u0026quot;_3\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;update\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;SPARQL Update\u0026quot; } , \u0026quot;upload\u0026quot; : { \u0026quot;RequestsBad\u0026quot; : 0 , \u0026quot;Requests\u0026quot; : 0 , \u0026quot;RequestsGood\u0026quot; : 0 , \u0026quot;operation\u0026quot; : \u0026quot;upload\u0026quot; , \u0026quot;description\u0026quot; : \u0026quot;File Upload\u0026quot; } } } } } ","permalink":"","tags":null,"title":"Fuseki: Server Information"},{"categories":null,"contents":"See the Fuseki2 documentation. This page covers Fuseki v1. Fuseki1 is deprecated and has been retired. The last release of Jena with this module is Jena 3.9.0.\nFuseki is a SPARQL server. 
It provides REST-style SPARQL HTTP Update, SPARQL Query, and SPARQL Update using the SPARQL protocol over HTTP.\nThe relevant SPARQL standards are:\nSPARQL 1.1 Query SPARQL 1.1 Update SPARQL 1.1 Protocol SPARQL 1.1 Graph Store HTTP Protocol Download Fuseki1 Binaries for Fuseki1 are available from the maven repositories.\nThe source code is available in the Apache Jena source release.\nGetting Started With Fuseki This section provides a brief guide to getting up and running with a simple server installation. It uses the SOH (SPARQL over HTTP) scripts included in the download.\nDownload the latest jena-fuseki-*-distribution\nUnpack the downloaded file with unzip or tar zxfv\nMove into the newly-created apache-jena-fuseki-* directory\n(Linux) chmod +x fuseki-server bin/s-*\nRun a server\n./fuseki-server --update --mem /ds\nThe server logging goes to the console:\n09:25:41 INFO Fuseki :: Dataset: in-memory 09:25:41 INFO Fuseki :: Update enabled 09:25:41 INFO Fuseki :: Fuseki development 09:25:41 INFO Fuseki :: Jetty 7.2.1.v20101111 09:25:41 INFO Fuseki :: Dataset = /ds 09:25:41 INFO Fuseki :: Started 2011/01/06 09:25:41 GMT on port 3030 User Interface The Fuseki download includes a number of services:\nSPARQL Query, SPARQL Update, and file upload to a selected dataset. Link to the documentation (here). Validators for SPARQL query and update and for non-RDF/XML formats. For the control panel:\nIn a browser, go to http://localhost:3030/ Click on Control Panel Select the dataset (if set up above, there is only one choice). 
The page offers SPARQL operations and file upload acting on the selected dataset.\nScript Control In a new window:\nLoad some RDF data into the default graph of the server:\ns-put http://localhost:3030/ds/data default books.ttl Get it back:\ns-get http://localhost:3030/ds/data default Query it with SPARQL using the \u0026hellip;/query endpoint.\ns-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}' Update it with SPARQL using the \u0026hellip;/update endpoint.\ns-update --service http://localhost:3030/ds/update 'CLEAR DEFAULT' Security and Access Control Fuseki does not currently offer security and access control itself.\nAuthentication and control of the number of concurrent requests can be added using an Apache server and either blocking the Fuseki port to outside traffic (e.g. on Amazon\u0026rsquo;s EC2) or by listening only on the localhost network interface. This is especially important for update endpoints (SPARQL Update, SPARQL Graph Store protocol with PUT/POST/DELETE enabled).\nData can be updated without access control if the server is started with the --update argument. If started without that argument, data is read-only.\nLogging Fuseki uses Log4J for logging. There are two main logging channels:\nThe general server messages: org.apache.jena.fuseki.Server A channel for all request messages: org.apache.jena.fuseki.Fuseki The default settings are (this is an extract of a log4j properties file):\n# Fuseki # Server log. # Request log. # Internal logs Server URI scheme This details the service URIs for Fuseki:\nhttp://*host*/dataset/query \u0026ndash; the SPARQL query endpoint. http://*host*/dataset/update \u0026ndash; the SPARQL Update language endpoint. http://*host*/dataset/data \u0026ndash; the SPARQL Graph Store Protocol endpoint. http://*host*/dataset/upload \u0026ndash; the file upload endpoint. Where dataset is a URI path. 
Note that Fuseki defaults to using port 3030 so host is often localhost:3030.\nImportant - While you may use the text \u0026ldquo;dataset\u0026rdquo; itself as the dataset name, this should be avoided since it may interfere with the function of the control panel and web pages.\nThe URI http://host/dataset/sparql is currently mapped to /query but this may change to being a general purpose SPARQL query endpoint.\nRunning a Fuseki Server The server can be run with the script fuseki-server. Common forms are:\nfuseki-server --mem /DatasetPathName fuseki-server --file=FILE /DatasetPathName fuseki-server --loc=DB /DatasetPathName fuseki-server --config=ConfigFile There is an option --port=PORT to set the port number. It defaults to 3030.\n/DatasetPathName is the name under which the dataset will be accessible over HTTP. Please see the above section on Server URI scheme for notes regarding available URIs and choice of this name.\nThe server will service read requests only unless the --update argument is used.\nThe full choice of dataset forms is:\nFuseki Dataset Descriptions\n--mem Create an empty, in-memory (non-persistent) dataset. --file=FILE Create an empty, in-memory (non-persistent) dataset, then load FILE into it. --loc=DIR Use an existing TDB database. Create an empty one if it does not exist. --desc=assemblerFile Construct a dataset based on the general assembler description. --config=ConfigFile Construct one or more service endpoints based on the configuration description. A copy of TDB is included in the standalone server. An example assembler file for TDB is in tdb.ttl.\nFuseki Server Arguments\n--help Print help message. --port=*number* Run on port number (default is 3030). --localhost Listen only to the localhost network interface. --update Allow update. Otherwise only read requests are served (ignored if a configuration file is given). Fuseki Server starting with an empty dataset fuseki-server --update --mem /ds runs the server on port 3030 with an in-memory dataset. 
It can be accessed via the appropriate protocol at the following URLs:\nSPARQL query: http://localhost:3030/ds/query SPARQL update: http://localhost:3030/ds/update SPARQL HTTP update: http://localhost:3030/ds/data The SPARQL Over HTTP scripts take care of naming and protocol details. For example, to load in a file data.rdf:\ns-put http://localhost:3030/ds/data default data.rdf Fuseki Server and TDB Fuseki includes a built-in version of TDB. Run the server with the --desc argument\nfuseki-server --desc tdb.ttl /ds and a database in the directory DB, an assembler description of:\n@prefix rdf: \u0026lt;\u0026gt; . @prefix rdfs: \u0026lt;\u0026gt; . @prefix ja: \u0026lt;\u0026gt; . @prefix tdb: \u0026lt;\u0026gt; . \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; . The form:\nfuseki-server --loc=DB /ds is a shorthand for such an assembler with location DB.\nTo make triples from all the named graphs appear as the default, unnamed graph, use:\n\u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; tdb:unionDefaultGraph true ; . Fuseki Server and general dataset descriptions The Fuseki server can be given an assembler description to build a variety of model and dataset types.\nfuseki-server --desc assembler.ttl /ds Full details of setting up models with assemblers are given in the assembler documentation and assembler howto.\nA general dataset is described by:\n# Dataset of default graph and one named graph. \u0026lt;#dataset\u0026gt; rdf:type ja:RDFDataset ; ja:defaultGraph \u0026lt;#modelDft\u0026gt; ; ja:namedGraph [ ja:graphName \u0026lt;\u0026gt; ; ja:graph \u0026lt;#model1\u0026gt; ] ; . \u0026lt;#modelDft\u0026gt; a ja:MemoryModel ; ja:content [ ja:externalContent \u0026lt;file:Data.ttl\u0026gt; ] ; . \u0026lt;#model1\u0026gt; rdf:type ja:MemoryModel ; ja:content [ ja:externalContent \u0026lt;file:FILE-1.ttl\u0026gt; ] ; ja:content [ ja:externalContent \u0026lt;file:FILE-2.ttl\u0026gt; ] ; . 
The models can be Jena inference models.\nFuseki Configuration File A Fuseki server can be set up using a configuration file. The command-line arguments for publishing a single dataset are a shortcut that, internally, builds a default configuration based on the dataset name given.\nThe configuration is an RDF graph. One graph consists of one server description, with a number of services, and each service offers a number of endpoints over a dataset.\nThe example below is all one file (RDF graph in Turtle syntax) split to allow for commentary.\nPrefix declarations Some useful prefix declarations:\n@prefix fuseki: \u0026lt;\u0026gt; . @prefix rdf: \u0026lt;\u0026gt; . @prefix rdfs: \u0026lt;\u0026gt; . @prefix tdb: \u0026lt;\u0026gt; . @prefix ja: \u0026lt;\u0026gt; . @prefix : \u0026lt;#\u0026gt; . Server Section Order of the file does not matter to the machine, but it\u0026rsquo;s useful to start with the server description, then each of the services with its datasets.\n[] rdf:type fuseki:Server ; # Server-wide context parameters can be given here. # For example, to set query timeouts on a server-wide basis: # Format 1: \u0026quot;1000\u0026quot; -- 1 second timeout # Format 2: \u0026quot;10000,60000\u0026quot; -- 10s timeout to first result, then 60s timeout for the rest of the query. # See the javadoc for ARQ.queryTimeout # ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;10000\u0026quot; ] ; # Services available. Only explicitly listed services are configured. # If there is a service description not linked from this list, it is ignored. fuseki:services ( \u0026lt;#service1\u0026gt; \u0026lt;#service2\u0026gt; ) . Assembler Initialization All datasets are described by assembler descriptions. Assemblers provide an extensible way of describing many kinds of objects. 
Set up any assembler extensions - here, the TDB assembler support.\nService 1 This service offers SPARQL Query, SPARQL Update and SPARQL Graph Store protocol, as well as file upload, on an in-memory dataset. Initially, the dataset is empty.\n## --------------------------------------------------------------- ## Updatable in-memory dataset. \u0026lt;#service1\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;ds\u0026quot; ; # http://host:port/ds fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:serviceUpdate \u0026quot;update\u0026quot; ; # SPARQL update service fuseki:serviceUpload \u0026quot;upload\u0026quot; ; # Non-SPARQL upload service fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read and write) # A separate read-only graph store endpoint: fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset \u0026lt;#dataset-mem\u0026gt; ; . \u0026lt;#dataset-mem\u0026gt; rdf:type ja:RDFDataset . Service 2 This service offers a number of endpoints. It is read-only, because only read-only endpoints are defined (SPARQL Query and HTTP GET SPARQL Graph Store protocol). The dataset is a single in-memory graph:\nThis service offers read-only access to a dataset with a single graph of data.\n\u0026lt;#service2\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;books\u0026quot; ; # http://host:port/books fuseki:serviceQuery \u0026quot;query\u0026quot; ; # SPARQL query service fuseki:serviceReadGraphStore \u0026quot;data\u0026quot; ; # SPARQL Graph store protocol (read only) fuseki:dataset \u0026lt;#books\u0026gt; ; . 
\u0026lt;#books\u0026gt; rdf:type ja:RDFDataset ; rdfs:label \u0026quot;Books\u0026quot; ; ja:defaultGraph [ rdfs:label \u0026quot;books.ttl\u0026quot; ; a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:Data/books.ttl\u0026gt; ] ; ] ; . Service 3 This service offers SPARQL query access only to a TDB database. The TDB database can have specific features set, such as query timeout or making the default graph the union of all named graphs.\n\u0026lt;#service3\u0026gt; rdf:type fuseki:Service ; fuseki:name \u0026quot;tdb\u0026quot; ; # http://host:port/tdb fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; # SPARQL query service fuseki:dataset \u0026lt;#dataset\u0026gt; ; . \u0026lt;#dataset\u0026gt; rdf:type tdb:DatasetTDB ; tdb:location \u0026quot;DB\u0026quot; ; # Query timeout on this dataset (1s, 1000 milliseconds) ja:context [ ja:cxtName \u0026quot;arq:queryTimeout\u0026quot; ; ja:cxtValue \u0026quot;1000\u0026quot; ] ; # Make the default graph be the union of all named graphs. ## tdb:unionDefaultGraph true ; . SPARQL Over HTTP SOH (SPARQL Over HTTP) is a set of command-line scripts for working with SPARQL 1.1. 
SOH is server-independent and will work with any compliant SPARQL 1.1 system offering HTTP access.\nSee the SPARQL Over HTTP page.\nExamples # PUT a file s-put http://localhost:3030/ds/data default D.nt # GET a file s-get http://localhost:3030/ds/data default # PUT a file to a named graph s-put http://localhost:3030/ds/data http://example/graph D.nt # Query s-query --service http://localhost:3030/ds/query 'SELECT * {?s ?p ?o}' # Update s-update --service http://localhost:3030/ds/update Use from Java SPARQL Query ARQ\u0026rsquo;s QueryExecutionFactory.sparqlService can be used.\nSPARQL Update See UpdateExecutionFactory.createRemote\nSPARQL HTTP See DatasetAccessor\n","permalink":"","tags":null,"title":"Fuseki: serving RDF data over HTTP"},{"categories":null,"contents":"SPARQL Standards The relevant SPARQL 1.1 standards are:\nSPARQL 1.1 Query SPARQL 1.1 Update SPARQL 1.1 Protocol SPARQL 1.1 Graph Store HTTP Protocol SPARQL 1.1 Query Results JSON Format SPARQL 1.1 Query Results CSV and TSV Formats SPARQL Query Results XML Format RDF Standards Some RDF 1.1 standards:\nRDF 1.1 Turtle RDF 1.1 Trig RDF 1.1 N-Triples RDF 1.1 N-Quads JSON-LD ","permalink":"","tags":null,"title":"Fuseki: SPARQL and RDF Standards"},{"categories":null,"contents":" Dataset Transactions Concurrency how-to Handling concurrent access to Jena models Event handler how-to Responding to events Stream manager how-to Redirecting URLs to local files Model factory Creating Jena models of various kinds RDF frames Viewing RDF statements as frame-like objects Typed literals Creating and extracting RDF typed literals SSE SPARQL Syntax Expressions Repacking Jena jars Jena Initialization ","permalink":"","tags":null,"title":"General notes and how-to's"},{"categories":null,"contents":"Details of the GeoSPARQL support are provided on the GeoSPARQL page.\nThe assembler for GeoSPARQL support is part of the jena-geosparql artifact and must be on the Fuseki server classpath, along with its 
dependencies.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-geosparql\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;...\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; or download the binary from the Maven central repository org/apache/jena/jena-geosparql\nThe GeoSPARQL assembler can be used in a Fuseki configuration file.\nThis example is of a read-only service:\nPREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX tdb2: \u0026lt;\u0026gt; PREFIX ja: \u0026lt;\u0026gt; PREFIX geosparql: \u0026lt;\u0026gt; \u0026lt;#service\u0026gt; rdf:type fuseki:Service; fuseki:name \u0026#34;geo\u0026#34;; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:dataset \u0026lt;#geo_ds\u0026gt; . \u0026lt;#geo_ds\u0026gt; rdf:type geosparql:geosparqlDataset ; geosparql:spatialIndexFile \u0026#34;DB/spatial.index\u0026#34;; geosparql:dataset \u0026lt;#baseDataset\u0026gt; ; . \u0026lt;#baseDataset\u0026gt; rdf:type tdb2:DatasetTDB2 ; tdb2:location \u0026#34;DB/\u0026#34; ; . It is possible to run with a data file loaded into memory and an in-memory spatial index:\nPREFIX fuseki: \u0026lt;\u0026gt; PREFIX rdf: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX ja: \u0026lt;\u0026gt; PREFIX geosparql: \u0026lt;\u0026gt; \u0026lt;#service\u0026gt; rdf:type fuseki:Service; fuseki:name \u0026#34;ds\u0026#34;; fuseki:endpoint [ fuseki:operation fuseki:query; ] ; fuseki:dataset \u0026lt;#geo_ds\u0026gt; . # In-memory data and index. \u0026lt;#geo_ds\u0026gt; rdf:type geosparql:geosparqlDataset ; geosparql:dataset \u0026lt;#baseDataset\u0026gt; . \u0026lt;#baseDataset\u0026gt; rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:geosparql_data.ttl\u0026gt; ; . The full set of assembler properties, with the default settings, is:\n\u0026lt;#geo_ds\u0026gt; rdf:type geosparql:GeosparqlDataset ; # Build in-memory if absent. 
geosparql:spatialIndexFile \u0026#34;spatial.index\u0026#34;; ## Default settings. See documentation for meanings. geosparql:inference true ; geosparql:queryRewrite true ; geosparql:indexEnabled true ; geosparql:applyDefaultGeometry false ; # 3 item lists: [Geometry Literal, Geometry Transform, Query Rewrite] geosparql:indexSizes \u0026#34;-1,-1,-1\u0026#34; ; # Default - unlimited. geosparql:indexExpires \u0026#34;5000,5000,5000\u0026#34; ; # Default - time in milliseconds. ## Required setting - data over which GeoSPARQL is applied. geosparql:dataset \u0026lt;#baseDataset\u0026gt; ; . ","permalink":"","tags":null,"title":"GeoSPARQL Assembler"},{"categories":null,"contents":"This application provides a HTTP server compliant with the GeoSPARQL standard.\nGeoSPARQL can also be integrated with Fuseki using the GeoSPARQL assembler with a general Fuseki server.\njena-fuseki-geosparql GeoSPARQL Fuseki can be accessed as an embedded server using Maven etc. from Maven Central or run from the command line. SPARQL queries directly on Jena Datasets and Models can be done using the GeoSPARQL Jena module.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-fuseki-geosparql\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;...\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; or download the binary from the Maven central repository org/apache/jena/jena-fuseki-geosparql\nThis uses the embedded server Fuseki and provides additional parameters for dataset loading.\nThe project uses the GeoSPARQL implementation from the GeoSPARQL Jena module, which includes a range of functions in addition to those from the GeoSPARQL standard.\nCurrently, there is no GUI interface as provided with this server.\nThe intended usage is to specify a TDB folder (either TDB1 or TDB2, created if required) for persistent storage of the dataset. 
File loading, inferencing and data conversion operations can also be specified to load and manipulate data into the dataset. When the server is restarted these conversion operations are not required again (as they have been stored in the dataset) unless there are relevant changes. The TDB dataset can also be prepared and manipulated programmatically using the Jena API.\nUpdates can be made to the dataset while the Fuseki server is running. However, these changes will not be applied to inferencing and spatial indexes until the server restarts (any default or specified spatial index file must not exist to trigger building). This is due to the current implementation of RDFS inferencing in Jena (and is required in any Fuseki server with inferencing) and the selected spatial index.\nA subset of the EPSG spatial/coordinate reference systems are included by default from the Apache SIS project. The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable.\nIt is expected that at least one Geometry Literal or Geo Predicate is present in a dataset (otherwise a standard Fuseki server can be used). A spatial index is created and new data cannot be added to the index once built. The spatial index can optionally be stored for future use and needs to be removed from a TDB folder if the index is to be rebuilt.\nClarifications on GeoSPARQL Geography Markup Language (GML) GeoSPARQL refers to the Geography Markup Language (GML) as one format for GeometryLiterals. This does not mean that GML is part of the GeoSPARQL standard. Instead, a subset of geometry encodings from the GML standards are permitted (specifically the GML 2.0 Simple Features Profile (10-100r3) is supported by GeoSPARQL Jena). The expected encoding of data is in RDF triples and can be loaded from any RDF file format supported by Apache Jena. 
Conversion of GML to RDF is out of scope of the GeoSPARQL standard and Apache Jena.\nGeo Predicates Lat/Lon Historically, geospatial data has frequently been encoded as Latitude/Longitude coordinates in the WGS84 coordinate reference system. The GeoSPARQL standard specifically chooses not to adopt this approach and instead uses the more versatile GeometryLiteral, which permits multiple encoding formats that support multiple coordinate reference systems and geometry shapes. Therefore, Lat/Lon Geo Predicates are not part of the GeoSPARQL standard. However, GeoSPARQL Jena provides two methods to support users with geo predicates in their geospatial data.\nConversion of Geo Predicates to the GeoSPARQL data structure (encoding the Lat/Lon as a Point geometry). Spatial extension which provides property and filter functions accepting Lat/Lon arguments. The Spatial extension functions (documented in the GeoSPARQL Jena module) support triples in either the GeoSPARQL data structure or Geo Predicates. Therefore, converting a dataset to GeoSPARQL will not lose functionality. By converting to the GeoSPARQL data structure, datasets can include a broader range of geospatial data.\nCommand Line Run from the command line and send queries over HTTP.\njava -jar jena-fuseki-geosparql-VER.jar ARGS\nThis is written as geosparql-fuseki below.\nExamples java -jar jena-fuseki-geosparql-VER.jar -rf \u0026quot;geosparql_test.rdf\u0026quot; -i\nThe example file geosparql_test.rdf in the GitHub repository contains several geometries in geodetic WGS84 (EPSG:4326). The example file geosparql_test_27700.rdf is identical but in the projected OSGB36 (EPSG:27700) used in the United Kingdom. Both will return the same results as GeoSPARQL treats all SRS as being projected. 
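Once one of the example files is loaded, the server can be exercised with a spatial query. The following is a sketch only - the bounding polygon coordinates and variable names are illustrative, not taken from the example files - using the geof:sfWithin filter function and geo:wktLiteral datatype from the GeoSPARQL standard:

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

# Find features whose geometry lies within an illustrative bounding polygon.
SELECT ?feature ?wkt
WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom    geo:asWKT       ?wkt .
  FILTER(geof:sfWithin(?wkt,
      "POLYGON((-1 50, 2 50, 2 54, -1 54, -1 50))"^^geo:wktLiteral))
}
```

Sent with s-query or any SPARQL client, such a query should behave identically against either example file, in line with the note above that GeoSPARQL treats all SRS as being projected.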
RDFS inferencing is applied using the GeoSPARQL schema to infer additional relationships (which aren\u0026rsquo;t asserted in the example files) that are used in the spatial operations and data retrieval.\nExamples:\nLoad RDF file (XML format) into memory and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot;\nLoad RDF file (TTL format: default) into memory, apply GeoSPARQL schema with RDFS inferencing and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -i\nLoad RDF file into memory, write spatial index to file and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -si \u0026quot;spatial.index\u0026quot;\nLoad RDF file into persistent TDB and run server: geosparql-fuseki -rf \u0026quot;test.rdf\u0026quot; -t \u0026quot;TestTDB\u0026quot;\nLoad from persistent TDB and run server: geosparql-fuseki -t \u0026quot;TestTDB\u0026quot;\nLoad from persistent TDB, change port and run server: geosparql-fuseki -t \u0026quot;TestTDB\u0026quot; -p 3030\nSee rdf-tables in Output Formats/Serialisations for supported RDF format keywords.\nN.B. Windows Powershell will strip quotation pairs from arguments and so triple quotation pairs may be required, e.g. \u0026ldquo;\u0026ldquo;\u0026ldquo;test.rdf\u0026rdquo;\u0026rdquo;\u0026rdquo;. Otherwise, logging output will be sent to a file called \u0026ldquo;xml\u0026rdquo;. 
Also, \u0026ldquo;The input line is too long\u0026rdquo; error can mean the path exceeds the character limit and needs shortening.\nEmbedded Server Run within a Java application to provide GeoSPARQL support over HTTP to other applications:\nFusekiLogging.setLogging(); GeosparqlServer server = new GeosparqlServer(portNumber, datasetName, isLoopbackOnly, dataset, isUpdate); SPARQL Query Example Once the default server is running it can be queried using Jena as follows:\nString service = \u0026quot;http://localhost:3030/ds\u0026quot;; String query = ....; try (QueryExecution qe = QueryExecution.service(service).query(query).build()) { ResultSet rs = qe.execSelect(); ResultSetFormatter.outputAsTSV(rs); } The server will respond to any valid SPARQL HTTP request, so an alternative SPARQL framework can be used. More information on SPARQL querying using Jena can be found on the Jena website.\nSIS_DATA Environment Variable The Apache SIS library is used to support the recognition and transformation of Coordinate/Spatial Reference Systems. These Reference Systems are published as the EPSG dataset. The full EPSG dataset is not distributed due to the EPSG terms of use being incompatible with the Apache Licence. A subset of the EPSG spatial/coordinate reference systems are included by default but the wider dataset may be required. Several options are available to include the EPSG dataset by setting the SIS_DATA environment variable.\nAn embedded EPSG dataset can be included in an application by adding the following dependency:\nGradle dependency in build.gradle\next.sisVersion = \u0026ldquo;0.8\u0026rdquo; implementation \u0026ldquo;org.apache.sis.non-free:sis-embedded-data:$sisVersion\u0026rdquo;\nMaven dependency in pom.xml\norg.apache.sis.non-free sis-embedded-data 0.8 Command Line Arguments Boolean options that have false defaults only require \u0026ldquo;\u0026ndash;option\u0026rdquo; to set them true in release v1.0.7 or later. 
Release v1.0.6 and earlier use the form \u0026ldquo;\u0026ndash;option true\u0026rdquo;.\n1) Port --port, -p The port number of the server. Default: 3030\n2) Dataset name --dataset, -d The name of the dataset used in the URL. Default: ds\n3) Loopback only --loopback, -l The server only accepts local host loopback requests. Default: true\n4) SPARQL update allowed --update, -u The server accepts updates to modify the dataset. Default: false\n5) TDB folder --tdb, -t An existing or new TDB folder used to persist the dataset. Default set to memory dataset. If accessing a dataset for the first time with GeoSPARQL then consider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset. TDB1 Dataset will be used by default, use -t \u0026lt;folder_path\u0026gt; -t2 options for TDB2 Dataset.\n6) Load RDF file into dataset --rdf_file, -rf Comma separated list of [RDF file path#graph name\u0026amp;RDF format] to load into dataset. Graph name is optional and will use default graph. RDF format is optional (default: ttl) or select from one of the following: json-ld, json-rdf, nt, nq, thrift, trig, trix, ttl, ttl-pretty, xml, xml-plain, xml-pretty. e.g. test.rdf#test\u0026amp;xml,test2.rdf will load test.rdf file into test graph as RDF/XML and test2.rdf into default graph as TTL.\nConsider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset.\nThe combination of specifying -t TDB folder and -rf loading RDF file will store the triples in the persistent TDB dataset. Therefore, loading the RDF file would only be required once.\n7) Load Tabular file into dataset --tabular_file, -tf Comma separated list of [Tabular file path#graph name|delimiter] to load into dataset. See RDF Tables for table formatting. Graph name is optional and will use default graph. Column delimiter is optional and will default to COMMA. 
Any character except \u0026lsquo;:\u0026rsquo;, \u0026lsquo;^\u0026rsquo; and \u0026lsquo;|\u0026rsquo;. Keywords TAB, SPACE and COMMA are also supported. e.g. test.rdf#test|TAB,test2.rdf will load test.rdf file into test graph as TAB delimited and test2.rdf into default graph as COMMA delimited.\nSee the RDF Tables project for more details on tabular format.\nConsider the --inference, --default_geometry and --validate options. These operations may add additional statements to the dataset.\nThe combination of specifying -t TDB folder and -tf loading tabular file will store the triples in the persistent TDB dataset. Therefore, loading the tabular file would only be required once.\n8) GeoSPARQL RDFS inference --inference, -i Enable GeoSPARQL RDFS schema and inferencing (class and property hierarchy). Inferences will be applied to the dataset. Updates to the dataset may require a server restart. Default: false\nThe combination of specifying -t TDB folder and -i GeoSPARQL RDFS inference will store the triples in the persistent TDB dataset. Therefore, the GeoSPARQL RDFS inference option would only be required when there is a change to the dataset.\n9) Apply hasDefaultGeometry --default_geometry, -dg Apply hasDefaultGeometry to single Feature hasGeometry Geometry statements. Additional properties will be added to the dataset. Default: false\nThe combination of specifying -t TDB folder and -dg apply hasDefaultGeometry will modify the triples in the persistent TDB dataset. Therefore, applying hasDefaultGeometry would only be required when there is a change to the dataset.\n10) Validate Geometry Literals --validate, -v Validate that the Geometry Literals in the dataset are valid. Default: false\n11) Convert Geo predicates --convert_geo, -c Convert Geo predicates in the data to Geometry with WKT WGS84 Point GeometryLiteral. Default: false\nThe combination of specifying -t TDB folder and -c convert Geo predicates will modify the triples in the persistent TDB dataset. 
Therefore, converting the Geo predicates would only be required once.\n12) Remove Geo predicates --remove_geo, -rg Remove Geo predicates in the data after combining to Geometry. Default: false\nThe combination of specifying -t TDB folder and -rg remove Geo predicates will modify the triples in the persistent TDB dataset. Therefore, removing the Geo predicates would only be required once.\n13) Query Rewrite enabled --rewrite, -r Enable the query rewrite extension of the GeoSPARQL standard to simplify queries, which relies upon the \u0026lsquo;hasDefaultGeometry\u0026rsquo; property. The \u0026lsquo;default_geometry\u0026rsquo; option may be useful for adding the \u0026lsquo;hasDefaultGeometry\u0026rsquo; property to a dataset. Default: true\n14) Indexing enabled --index, -x Enable caching of re-usable data to improve query performance. Default: true See the GeoSPARQL Jena project for more details.\n15) Index sizes --index_sizes, -xs List of Index item sizes: [Geometry Literal, Geometry Transform, Query Rewrite]. Unlimited: -1, Off: 0, Default: -1,-1,-1\n16) Index expiries --index_expiry, -xe List of Index item expiry in milliseconds: [Geometry Literal, Geometry Transform, Query Rewrite]. Off: 0, Minimum: 1001, Default: 5000,5000,5000\n17) Spatial Index file --spatial_index, -si File to load or store the spatial index. Defaults to \u0026ldquo;spatial.index\u0026rdquo; in the TDB folder if using the TDB option and this option is not set. Otherwise the spatial index is not stored and is rebuilt at start-up. The spatial index file must not exist for the index to be built (e.g. following changes to the dataset).\n18) Properties File Supply the above parameters as a file:\n$ java Main @/tmp/parameters Future Work GUI to assist users when querying a dataset. ","permalink":"","tags":null,"title":"GeoSPARQL Fuseki"},{"categories":null,"contents":"We are always happy to help you get your Jena project going. 
Jena has been around for many years, there are many archives of past questions, tutorials and articles on the web. A quick search may well answer your question directly! If not, please feel free to post a question to the user support list (details below).\nEmail support lists The main user support list is To join this list, please send an email to: from the email account you want to subscribe with. This list is a good place to ask for advice on developing Jena-based applications, or solving a problem with using Jena. Please see below for notes on asking good questions. The list is archived at or externally at\nThe developers list is To join this list, please send an email to: from the email account you want to subscribe with. This list is a good place to discuss the development of the Jena platform itself, including patches you want to submit.\nTo unsubscribe from a mailing list, send email to\nFull details of Apache mailing lists:\nOther resources There are curated collections of Jena questions on StackOverflow tagged \u0026lsquo;jena\u0026rsquo; and \u0026lsquo;apache-jena\u0026rsquo;. There are also questions and answers about SPARQL.\nHow to ask a good question Asking good questions is the best way to get good answers. Try to follow these tips:\nMake the question precise and specific. \u0026ldquo;My code doesn\u0026rsquo;t work\u0026rdquo;, for example, does not help us to help you as much as \u0026ldquo;The following SPARQL query gave me an answer I didn\u0026rsquo;t expect\u0026rdquo;.\nShow that you\u0026rsquo;ve tried to solve the problem yourself. Everyone who answers questions on the list has a full-time job or study to do; no-one gets paid for answering these support questions. 
Spend their goodwill wisely: \u0026ldquo;Here\u0026rsquo;s the code I tried\u0026hellip;\u0026rdquo; or \u0026ldquo;I read in the documentation that \u0026hellip;\u0026rdquo; shows that you\u0026rsquo;ve at least made some effort to find things out for yourself.\nWhere appropriate show a complete test case. Seeing where your code goes wrong is generally much easier if we can run it our computers. Corollaries: don\u0026rsquo;t post your entire project - take some time to reduce it down to a minimal test case. Include enough data - runnable code is no help if critical resources like *.rdf files are missing. Reducing your code down to a minimal test case is often enough for you to figure out the problem yourself, which is always satisfying!\nDon\u0026rsquo;t re-post your question after only a few hours. People are busy, and may be in a different timezone to you. If you\u0026rsquo;re not sure if your question made it to the list, look in the archive.\nAdding lots of exclamation marks or other punctuation will not move your question up the queue. Quite the reverse, in fact.\nAsk questions on the list, rather than emailing the developers directly. This gives us the chance to share the load of answering questions, and also ensures that answers are archived in case they\u0026rsquo;re of use to others in the future.\n","permalink":"","tags":null,"title":"Getting help with Jena"},{"categories":null,"contents":"We welcome your contribution towards making Jena a better platform for semantic web and linked data applications. We appreciate feature suggestions, bug reports and patches for code or documentation.\nIf you need help using Jena, please see our getting help page.\nHow to contribute You can help us sending your suggestions, feature requests and bug reports (as well as patches) using Jena\u0026rsquo;s GitHub Issues or Jena JIRA.\nYou can discuss your contribution, before or after adding it to Jira, on the mailing list. 
You can also help other users by answering their questions on the mailing list. See the subscription instructions for details.\nPlease see the Reviewing Contributions page for details of what committers will be looking for when reviewing contributions.\nImproving the Website You can also help us improve the documentation on this website via Pull Request.\nThe website source lives in an Apache git repository at repo jena-site. There is also a full read-write mirror on GitHub, see jena-site on GitHub:\ngit clone cd jena-site You can then make a branch, prepare your changes and submit a pull request. Please see the in that repository for more details.\nSNAPSHOTs If you use Apache Maven and you are not afraid of being on the bleeding-edge, you can help us by testing our SNAPSHOTs which you can find in the Apache Maven repository.\nHere is, for example, how you can add TDB version X.Y.Z-SNAPSHOT to your project (please ask if you are unsure what the latest snapshot version number currently is):\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-tdb\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z-SNAPSHOT\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; See also how to use Jena with Maven.\nIf you have problems with any of our SNAPSHOTs, let us know.\nYou can check the state of each Jena development builds on the Apache Jenkins continuous integration server.\nGit repository You can find the Jena source code in the Apache git repository:\nThere is also a full read-write mirror of Jena on GitHub:\ngit clone cd jena mvn clean install You can fork Jena on GitHub and also submit pull requests to contribute your suggested changes to the code.\nOpen issues You can find a list of the open issues on JIRA (sorted by priority). 
Or, you can look at the last week\u0026rsquo;s activity to get a sense of what people are working on.\nSubmit your patches You can develop new contributions and work on patches using either the Apache-hosted git repository or the mirror on GitHub.\nGitHub pull requests are forwarded to the dev mailing list for review by the Jena committers. You should subscribe to the dev mailing list to follow the feedback on your pull request.\nAlternatively, patches can be attached directly to issues in Jira (click on More Actions \u0026gt; Attach Files).\nPlease inspect your contribution/patch and make sure it includes all (and only) the relevant changes for a single issue. Don\u0026rsquo;t forget tests!\nIf you want to test if a patch applies cleanly you can use:\npatch -p0 \u0026lt; JENA-XYZ.patch If you use Eclipse: right click on the project name in Package Explorer, select Team \u0026gt; Create Patch or Team \u0026gt; Apply Patch.\nYou can also use git:\ngit format-patch origin/trunk IRC channel Some Jena developers hang out on #jena on\n","permalink":"","tags":null,"title":"Getting involved in Apache Jena"},{"categories":null,"contents":"Apache Jena (or Jena in short) is a free and open source Java framework for building semantic web and Linked Data applications. The framework is composed of different APIs interacting together to process RDF data. If you are new here, you might want to get started by following one of the tutorials. You can also browse the documentation if you are interested in a particular topic.\nTutorials RDF API tutorial - you will learn the essence of the semantic web and the graph representation behind RDF. SPARQL tutorial - will guide you to formulate expressive queries over RDF data. Ontology API - illustrates the usage of advanced semantic web features such as reasoning over your data using OWL. 
Finally, some of the tutorials are also available in Traditional Chinese, Portuguese and French. Documentation The following topics are covered in the documentation:\nThe RDF API - the core RDF API in Jena SPARQL - querying and updating RDF models using the SPARQL standards Fuseki - SPARQL server which can present RDF data and answer SPARQL queries over HTTP Assembler - describing recipes for constructing Jena models declaratively using RDF Inference - using the Jena rules engine and other inference algorithms to derive consequences from RDF models Javadoc - JavaDoc generated from the Jena source Text Search - enhanced indexes using Lucene or Solr for more efficient searching of text literals in Jena models and datasets I/O - notes on input and output of triples to and from Jena models How-To\u0026rsquo;s - various topic-specific how-to documents Ontology - support for handling OWL models in Jena TDB - a fast persistent triple store that stores directly to disk Tools - various command-line tools and utilities to help developers manage RDF data and other aspects of Jena Framework Architecture The interaction between the different APIs:\nOther resources ","permalink":"","tags":null,"title":"Getting started with Apache Jena"},{"categories":null,"contents":"Old documentation (Jena 3.1.1 to Jena 4.2.0)\nJena 4.3.0 and later uses the JDK package. Jena adds API support for challenge-based authentication and also provide HTTP digest authentication.\nAuthentication There are 5 variations:\nBasic authentication Challenge-Basic authentication Challenge-Digest authentication URL user (that is, in the URL) URL user and password in the URL (that is, in the URL) Basic authentication occurs where the app provides the user and password information to the JDK HttpClient and that information is always used when sending HTTP requests with that HttpClient. It does not require an initial request-challenge-resend to initiate. This is provided natively by the JDK code. 
See HttpClient.newBuilder().authenticator(...).\nChallenge-based authentication, for \u0026ldquo;basic\u0026rdquo; or \u0026ldquo;digest\u0026rdquo;, is provided by Jena. The challenge happens on the first contact with the remote endpoint and the server returns a 401 response with an HTTP header saying which style of authentication is required. There is a registry of user names and passwords for endpoints which is consulted and the appropriate Authorization: header is generated and the request resent. If no registration matches, the 401 is passed back to the application as an exception.\nBecause it is a challenge response to a request, the request must be sent twice, first to trigger the challenge and then again with the HTTP authentication information. To make this automatic, the first request must not be a streaming request (the stream is not repeatable). All HTTP requests generated by Jena are repeatable.\nThe URL can contain a userinfo part, either the user@host form, or the user:password@host form. If just the user is given, the authentication environment is consulted for registered user-password information. If both user and password are given, the details as given are used. This latter form is not recommended and should only be used if necessary because the password is in clear text in the SPARQL query.\nJena also has support for bearer authentication.\nJDK HttpClient.authenticator // Basic or Digest - determined when the challenge happens. AuthEnv.get().registerUsernamePassword(URI.create(dataURL), \u0026#34;user\u0026#34;, \u0026#34;password\u0026#34;); try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } Alternatively, the Java platform provides basic authentication. This is not challenge based - any request sent using an HttpClient configured with an authenticator will include the authentication details. 
(Caution - including sending username/password to the wrong site!). Digest authentication must use AuthEnv.get().registerUsernamePassword.\nAuthenticator authenticator = AuthLib.authenticator(\u0026#34;user\u0026#34;, \u0026#34;password\u0026#34;); HttpClient httpClient = HttpClient.newBuilder() .authenticator(authenticator) .build(); // Use with RDFConnection try ( RDFConnection conn = RDFConnectionRemote.service(dataURL) .httpClient(httpClient) .build()) { conn.queryAsk(\u0026#34;ASK{}\u0026#34;); } try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .httpClient(httpClient) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } Challenge registration AuthEnv maintains a registry of credentials and also a registry of which service URLs the credentials should be used for. It supports registration of endpoint prefixes so that one registration will apply to all URLs starting with a common root.\nThe main function is AuthEnv.get().registerUsernamePassword.\n// Application setup code AuthEnv.get().registerUsernamePassword(\u0026#34;username\u0026#34;, \u0026#34;password\u0026#34;); ... try ( QueryExecution qExec = QueryExecutionHTTP.service(dataURL) .endpoint(dataURL) .queryString(\u0026#34;ASK{}\u0026#34;) .build()) { qExec.execAsk(); } When an HTTP 401 response with a WWW-Authenticate header is received, the Jena HTTP handling code will look for a suitable authentication registration (exact or longest prefix), and retry the request. If it succeeds, a modifier is installed so all subsequent requests to the same endpoint will have the authentication header added and there is no challenge round-trip.\nSERVICE The same mechanism is used for the URL in a SPARQL SERVICE clause. 
If there is a 401 challenge, the registry is consulted and authentication applied.\nIn addition, if the SERVICE URL has a username as the userinfo (that is, the user@host form), that user name is used to look in the authentication registry.\nIf the userinfo is of the form \u0026ldquo;username:password\u0026rdquo; then the information as given in the URL is used.\nAuthEnv.get().registerUsernamePassword(URI.create(\u0026#34;http://host/sparql\u0026#34;), \u0026#34;u\u0026#34;, \u0026#34;p\u0026#34;); // Registration applies to SERVICE. Query query = QueryFactory.create(\u0026#34;SELECT * { SERVICE \u0026lt;http://host/sparql\u0026gt; { ?s ?p ?o } }\u0026#34;); try ( QueryExecution qExec = QueryExecution.create().query(query).dataset(...).build() ) { System.out.println(\u0026#34;Call using SERVICE...\u0026#34;); ResultSet rs = qExec.execSelect(); ResultSetFormatter.out(rs); } Authentication Examples jena-examples:arq/examples/auth/.\nBearer Authentication Bearer authentication requires the application to obtain a token to present to the server.\nRFC 6750 RFC 6751 JSON Web Tokens (JWT) JSON Web Token Best Current Practices How this token is obtained depends on the deployment environment.\nThe application can either register the token to be used:\nAuthEnv.get().addBearerToken(targetURL, jwtString); or can provide a token provider for 401 challenges requesting bearer authentication.\nAuthEnv.get().setBearerTokenProvider( (uri, challenge)-\u0026gt;{ ... ; return jwtString; }); ","permalink":"","tags":null,"title":"HTTP Authentication"},{"categories":null,"contents":"Documentation for HTTP Authentication (Jena 3.1.1 to Jena 4.2.0) using Apache Commons HttpClient.\nAfter Jena 3.1.0, Jena exposes the underlying HTTP Commons functionality to support a range of authentication mechanisms as well as other HTTP configuration. From Jena 3.0.0 through Jena 3.1.0 there is a Jena-specific framework that provides a uniform mechanism for HTTP authentication. 
This documentation is therefore divided into two sections. The first explains how to use HTTP Commons code, and the second explains the older Jena-specific functionality.\nHTTP Authentication from Jena 3.1.1 APIs that support authentication typically provide methods for providing an HttpClient for use with the given instance of that API class. Since it may not always be possible/practical to configure authenticators on a per-request basis the API includes a means to specify a default client that is used when no other client is explicitly specified. This may be configured via the setDefaultHttpClient(HttpClient httpClient) method of the HttpOp class. This allows for static-scoped configuration of HTTP behavior.\nExamples of authentication This section includes a series of examples showing how to use HTTP Commons classes to perform authenticated work. Most of them take advantage of HttpOp.setDefaultHttpClient as described above.\nSimple authentication using username and password First we build an authenticating client:\nCredentialsProvider credsProvider = new BasicCredentialsProvider(); Credentials credentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); credsProvider.setCredentials(AuthScope.ANY, credentials); HttpClient httpclient = HttpClients.custom() .setDefaultCredentialsProvider(credsProvider) .build(); HttpOp.setDefaultHttpClient(httpclient); Notice that we gave no scope for use with the credentials (AuthScope.ANY). 
We can make further use of that parameter if we want to assign a scope for some credentials:\nCredentialsProvider credsProvider = new BasicCredentialsProvider(); Credentials unscopedCredentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); credsProvider.setCredentials(AuthScope.ANY, unscopedCredentials); Credentials scopedCredentials = new UsernamePasswordCredentials(\u0026quot;user\u0026quot;, \u0026quot;passwd\u0026quot;); final String host = \u0026quot;\u0026quot;; final int port = 80; final String realm = \u0026quot;aRealm\u0026quot;; final String schemeName = \u0026quot;DIGEST\u0026quot;; AuthScope authscope = new AuthScope(host, port, realm, schemeName); credsProvider.setCredentials(authscope, scopedCredentials); HttpClient httpclient = HttpClients.custom() .setDefaultCredentialsProvider(credsProvider) .build(); HttpOp.setDefaultHttpClient(httpclient); Authenticating via a form For this case we introduce an HttpClientContext, which we can use to retrieve the cookie we get from logging into a form. We then use the cookie to authenticate elsewhere.\n// we'll use this context to maintain our HTTP \u0026quot;conversation\u0026quot; HttpClientContext httpContext = new HttpClientContext(); // first we use a method on HttpOp to log in and get our cookie Params params = new Params(); params.addParam(\u0026quot;username\u0026quot;, \u0026quot;Bob Wu\u0026quot;); params.addParam(\u0026quot;password\u0026quot;, \u0026quot;my password\u0026quot;); HttpOp.execHttpPostForm(\u0026quot;\u0026quot;, params , null, null, null, httpContext); // now our cookie is stored in httpContext CookieStore cookieStore = httpContext.getCookieStore(); // lastly we build a client that uses that cookie HttpClient httpclient = HttpClients.custom() .setDefaultCookieStore(cookieStore) .build(); HttpOp.setDefaultHttpClient(httpclient); // alternatively we could use the context directly Query query = ... 
QueryEngineHTTP qEngine = QueryExecutionFactory.createServiceRequest(\u0026quot;\u0026quot;, query); qEngine.setHttpContext(httpContext); ResultSet results = qEngine.execSelect(); Using authentication functionality in direct query execution Jena offers support for directly creating SPARQL queries against remote services. To use QueryExecutionFactory in this case, select the methods (sparqlService, createServiceRequest) that offer an HttpClient parameter and use an authenticating client in that slot. In the case of QueryEngineHTTP, it is possible to use constructors that have a parameter slot for an HttpClient, but it is also possible post-construction to use setClient(HttpClient client) and setHttpContext(HttpClientContext context) (as shown above). These techniques allow control over HTTP behavior when requests are made to remote services.\nHTTP Authentication from Jena 3.0.0 through 3.1.0 APIs that support authentication typically provide two methods for providing authenticators, a setAuthentication(String username, char[] password) method which merely configures a SimpleAuthenticator. There will also be a setAuthenticator(HttpAuthenticator authenticator) method that allows you to configure an arbitrary authenticator.\nAuthenticators applied this way will only be used for requests by that specific API. APIs that currently support this are as follows:\nQueryEngineHTTP - This is the QueryExecution implementation returned by QueryExecutionFactory.sparqlService() calls UpdateProcessRemoteBase - This is the base class of UpdateProcessor implementations returned by UpdateExecutionFactory.createRemote() and UpdateExecutionFactory.createRemoteForm() calls DatasetGraphAccessorHTTP - This is the DatasetGraphAccessor implementation underlying remote dataset accessors. 
From 2.11.0 onwards the relevant factory methods include overloads that allow providing an HttpAuthenticator at creation time which avoids the need to cast and manually set the authenticator afterwards e.g.\nHttpAuthenticator authenticator = new SimpleAuthenticator(\u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); try(QueryExecution qe = QueryExecutionFactory.sparqlService(\u0026quot;\u0026quot;, \u0026quot;SELECT * WHERE { ?s a ?type }\u0026quot;, authenticator)) { ... } Authenticators Authentication mechanisms are provided by HttpAuthenticator implementations, a number of which are built into ARQ.\nThis API provides the authenticator with access to the HttpClient, HttpContext and target URI of the request that is about to be carried out. This allows for authenticators to add credentials to requests on a per-request basis and/or to use different mechanisms and credentials for different services.\nSimpleAuthenticator The simple authenticator is, as the name suggests, the simplest implementation. It takes a single set of credentials which is applied to any service.\nAuthentication however is not preemptive, so unless the remote service sends an HTTP challenge (401 Unauthorized or 407 Proxy Authentication Required) credentials will not actually be submitted.\nScopedAuthenticator The scoped authenticator is an authenticator which maps credentials to different service URIs. This allows you to specify different credentials for different services as appropriate. Similarly to the simple authenticator this is not preemptive authentication so credentials are not sent unless the service requests them.\nScoping of credentials is not based on exact mapping of the request URI to credentials but rather on a longest match approach. For example if you define credentials for then these are used for any request that requires authentication under that URI e.g. 
However, if you had also defined credentials for then these would be used in favor of those for\nServiceAuthenticator The service authenticator is an authenticator which uses information encoded in the ARQ context and basically provides access to the existing credential provision mechanisms provided for the SERVICE clause, see Basic Federated Query for more information on configuration for this.\nFormsAuthenticator The forms authenticator is an authenticator usable with services that require form-based logins and use session cookies to verify login state. This is intended for use with services that don\u0026rsquo;t support HTTP\u0026rsquo;s built-in authentication mechanisms for whatever reason. One example of this is servers secured using Apache HTTP Server mod_auth_form.\nThis is one of the more complex authenticators to configure because it requires you to know certain details of the form login mechanism of the service you are authenticating against. In the simplest case where a site is using Apache mod_auth_form in its default configuration you merely need to know the URL to which login requests should be POSTed and your credentials. Therefore you can do the following to configure an authenticator:\nURI targetService = new URI(\u0026quot;\u0026quot;); FormLogin formLogin = new ApacheModAuthFormLogin(\u0026quot;\u0026quot;, \u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); FormsAuthenticator authenticator = new FormsAuthenticator(targetService, formLogin); In the above example the service we want to authenticate against is and it requires us to first log in by POSTing our credentials to\nHowever if the service is using a more complicated forms login setup you will additionally need to know the names of the form fields used to submit the username and password. 
For example say we were authenticating to a service where the form fields were called id and pwd, we\u0026rsquo;d need to configure our authenticator as follows:\nURI targetService = new URI(\u0026quot;\u0026quot;); FormLogin formLogin = new ApacheModAuthFormLogin(\u0026quot;\u0026quot;, \u0026quot;id\u0026quot;, \u0026quot;pwd\u0026quot;, \u0026quot;user\u0026quot;, \u0026quot;password\u0026quot;.toCharArray()); FormsAuthenticator authenticator = new FormsAuthenticator(targetService, formLogin); Note that you can also create a forms authenticator that uses different login forms for different services by creating a Map\u0026lt;URI, FormLogin\u0026gt; that maps each service to an associated form login and passing that to the FormsAuthenticator constructor.\nCurrently, form-based logins that require more than just a username and password are not supported.\nPreemptiveBasicAuthenticator This authenticator is a decorator over another authenticator that enables preemptive basic authentication; this only works for servers that support basic authentication and so will cause authentication failures when any other authentication scheme is required. You should only use this when you know the remote server uses basic authentication.\nPreemptive authentication is not enabled by default for two reasons:\nIt reduces security as it can result in sending credentials to servers that don\u0026rsquo;t actually require them. It only works for basic authentication and not for other HTTP authentication mechanisms e.g. digest authentication The 2nd point is important to emphasise: this only works for servers using Basic authentication.\nAlso be aware that basic authentication is very insecure since it sends credentials over the wire with only obfuscation for protection. 
Therefore many servers will use more secure schemes like Digest authentication which cannot be done preemptively as they require more complex challenge-response sequences.\nDelegatingAuthenticator The delegating authenticator allows for mapping different authenticators to different services; this is useful when you need to mix and match the types of authentication needed.\nThe Default Authenticator Since it may not always be possible/practical to configure authenticators on a per-request basis the API includes a means to specify a default authenticator that is used when no authenticator is explicitly specified. This may be configured via the setDefaultAuthenticator(HttpAuthenticator authenticator) method of the HttpOp class.\nBy default there is already a default authenticator configured which is the ServiceAuthenticator since this preserves behavioural backwards compatibility with prior versions of ARQ.\nYou can configure the default authenticator to whatever you need so even if you don\u0026rsquo;t directly control the code that is making HTTP requests, provided that it is using ARQ\u0026rsquo;s APIs to make these then authentication will still be applied.\nNote that the default authenticator may be disabled by setting it to null.\nOther concerns Debugging Authentication ARQ uses Apache HttpClient for all its HTTP operations and this provides detailed logging information that can be used for debugging. To see this information you need to configure your logging framework to set the org.apache.http package to either DEBUG or TRACE level.\nThe DEBUG level will give you general diagnostic information about requests and responses while the TRACE level will give you detailed HTTP traces i.e. 
allow you to see the exact HTTP requests and responses which can be extremely useful for debugging authentication problems.\nAuthenticating to a SPARQL federated service ARQ allows the user to configure HTTP behavior to use on a per-SERVICE basis, including authentication behavior such as is described above. This works via the ARQ context. See Basic Federated Query for more information on configuring this functionality.\n","permalink":"","tags":null,"title":"HTTP Authentication in ARQ (Superseded)"},{"categories":null,"contents":"The in-memory, transactional dataset provides a dataset with full ACID transaction semantics, including abort. It provides for multiple readers and a concurrent writer together with full snapshot isolation of the dataset. Readers see an unchanging, consistent dataset where aggregate operations return stable results.\nAPI use A new instance of the class is obtained by a call to DatasetFactory.createTxnMem():\nDataset ds = DatasetFactory.createTxnMem() ; This can then be used by the application for reading:\nDataset ds = DatasetFactory.createTxnMem() ; ds.begin(ReadWrite.READ) ; try { ... SPARQL query ... } finally { ds.end() ; } or writing:\nDataset ds = DatasetFactory.createTxnMem() ; ds.begin(ReadWrite.WRITE) ; try { ... SPARQL update ... ... SPARQL query ... ... SPARQL update ... ds.commit() ; } finally { ds.end() ; } If the application does not call commit(), the transaction aborts and the changes are lost. The same happens if the application throws an exception.\nNon-transactional use. If used outside of a transaction, the implementation provides \u0026ldquo;auto-commit\u0026rdquo; functionality. Each triple added or deleted is done inside an implicit transaction. This has a measurable performance impact. 
It is better to do related operations inside a single transaction explicitly in the application code.\nAssembler Use The assembler provides for the creation of a dataset and also loading it with data read from URLs (files or from any other URL).\nType: ja:MemoryDataset Properties: ja:data urlForData ja:namedGraph, for loading a specific graph of the dataset. This uses ja:graphName to specify the name and ja:data to load data. The examples use the following prefixes:\n@prefix ja: \u0026lt;\u0026gt; . @prefix rdf: \u0026lt;\u0026gt; . To create an empty in-memory dataset, all that is required is the line:\n[] rdf:type ja:MemoryDataset . With triples for the default graph, from file dataFile.ttl, Turtle format:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:dataFile.ttl\u0026gt; . With triples from several files:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data1.ttl\u0026gt; ; ja:data \u0026lt;file:data2.nt\u0026gt; ; ja:data \u0026lt;file:data3.jsonld\u0026gt; ; . Load TriG:\n[] rdf:type ja:MemoryDataset ; ja:data \u0026lt;file:data.trig\u0026gt; . Load a file of triples into a named graph:\n[] rdf:type ja:MemoryDataset ; ja:namedGraph [ ja:graphName \u0026lt;http://example/graph\u0026gt; ; ja:data \u0026lt;file:///fullPath/data.ttl\u0026gt; ] . ","permalink":"","tags":null,"title":"In-memory, transactional Dataset"},{"categories":null,"contents":"This document describes Jena\u0026rsquo;s built-in assembler classes and how to write and integrate your own assemblers. 
If you just need a quick guide to the common model specifications, see the assembler quickstart; if you want more details on writing assembler descriptions, see the assembler howto.\nThe Assembler interface An Assembler is an object that builds objects (most importantly, Models) from RDF descriptions.\npublic Object open( Assembler a, Resource root, Mode mode ); public Object open( Assembler a, Resource root ); public Object open( Resource root ); public Model openModel( Resource root ); public Model openModel( Resource root, Mode mode ); The fundamental method is the first: all the others are shorthands for ways of calling it. The abstract class AssemblerBase implements Assembler leaving only that method abstract and defining the others in terms of it.\nThe definition of open( Assembler a, Resource root, Mode mode ) is that the assembler will construct the object described by the properties of root. If this requires the construction of sub-objects from descriptions hanging off root, a is to be used to construct those. If the object is to be constructed in some persistent store, mode defines whether objects can be re-used or created: see modes for more details.\nBuiltin assemblers Jena comes with a collection of built-in assemblers: various basic assemblers and a composite general assembler. Each of these assemblers has a constant instance declared as a field of Assembler.\nAssembler Result class Type constant Temporarily omitted as the source got scrambled by the Markdown import TODO Inside Assemblers Assembler.general is a particular implementation of the Assembler interface. An Assembler knows how to build the objects - not just models - described by an Assembler specification. The normal route into an Assembler is through the method:\nopen( Resource root ) → Object The Assembler inspects the root resource properties and decides whether it can build an object with that description. If not, it throws an exception. Otherwise, it constructs and returns a suitable object. 
Since the creation of Models is the reason for the existence of Assemblers, there is a convenience wrapper method:\nopenModel( Resource root ) → Model which constructs the object and checks that it\u0026rsquo;s a Model before returning it. When an Assembler requires sub-objects (for example, when an InfModel Assembler requires a Reasoner object), it uses the method:\nopen( Assembler sub, Resource root ) → Object passing in a suitable Assembler object. In fact the standard implementation of open(root) is just\nopen( this, root ) passing in itself as the sub-assembler and having open(Assembler,Resource) be the place where all the work is done. (Amongst other things, this makes testing easier.) When working with named persistent objects (typically database models), sometimes you need to control whether new objects should be constructed or old models can be reused. There is an additional method\nopen( Assembler sub, Resource root, Mode mode ) where the Mode argument controls the creation (or not) of persistent models. The mode is passed down to all sub-object creation. The standard implementation of open(sub,root) is just:\nopen( sub, root, Mode.DEFAULT ) A Mode object has two methods:\npermitCreateNew( Resource root, String name ) permitUseExisting( Resource root, String name ) root is the root resource describing the object to be created or reused, and name is the name given to it. The result is true iff the permission is granted. Mode.DEFAULT permits the reuse of existing objects and denies the creation of new ones. There are four Mode constants:\nMode.DEFAULT - reuse existing objects Mode.CREATE - create missing objects Mode.REUSE - reuse existing objects Mode.ANY - reuse existing objects, create missing ones Since the Mode methods are passed the resource root and name, the user can write specialised Modes that look at the name or the other root properties to make their decision. 
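For illustration, here is a standalone sketch of such a name-based decision. This is plain Java, not Jena\u0026rsquo;s classes: the interface only mimics the shape of the two Mode methods (the Resource argument is dropped for brevity), and the dev- prefix rule is invented for the example.

```java
// Standalone mimic of a Mode-style policy: permit creating new persistent
// objects only when the object's name carries a "dev-" prefix.
interface CreationPolicy {
    boolean permitCreateNew(String name);
    boolean permitUseExisting(String name);
}

public class ModeSketch {
    public static void main(String[] args) {
        CreationPolicy devOnlyCreate = new CreationPolicy() {
            public boolean permitCreateNew(String name) { return name.startsWith("dev-"); }
            public boolean permitUseExisting(String name) { return true; }
        };
        System.out.println(devOnlyCreate.permitCreateNew("dev-model"));
        System.out.println(devOnlyCreate.permitCreateNew("prod-model"));
    }
}
```

Run as shown, the first check passes and the second is refused, which is the kind of per-name decision a specialised Mode can make.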
Note that the Modes only apply to persistent objects, so eg MemoryModels or PrefixMappings ignore their Mode arguments.\nImplementing your own assemblers (Temporary documentation pasted in from email; will be integrated and made nice RSN.)\nYou have to implement the Assembler interface, most straightforwardly done by subclassing AssemblerBase and overriding public Object open( Assembler a, Resource root, Mode mode ); because AssemblerBase both implements the boring methods that are just specialisations of `open` and provides some utility methods such as getting the values of unique properties. The arguments are * a -- the assembler to use for any sub-assemblies * root -- the resource in the assembler description for this object * mode -- the persistent open vs create mode The pattern is to look for the known properties of the root, use those to define any sub-objects of the object you're assembling (including using `a` for anything that's itself a structured object) and then constructing a new result object from those components. Then you attach this new assembler object to its type in some AssemblerGroup using that group's `implementWith` method. You can attach it to the handy-but-public-and-shared group `Assembler.general` or you can construct your own group. The point about an AssemblerGroup is that it does the type-to-assembler mapping for you -- and when an AssemblerGroup calls a component assembler's `open` method, it passes /itself/ in as the `a` argument, so that the invoked assembler has access to all of the component assemblers of the Group. basic assemblers There is a family of basic assemblers, each of which knows how to assemble a specific kind of object so long as they\u0026rsquo;re given an Assembler that can construct their sub-objects. 
There are defined constants in Assembler for (an instance of) each of these basic assembler classes.\nproduces / Class / Type / constant: default models / DefaultModelAssembler / ja:DefaultModel / defaultModel; memory models / MemoryModelAssembler / ja:MemoryModel / memoryModel; inference models / InfModelAssembler / ja:InfModel / infModel; reasoners / ReasonerAssembler / ja:Reasoner / reasoner; content / ContentAssembler / ja:Content / content; ontology models / OntModelAssembler / ja:OntModel / ontModel; rules / RuleSetAssembler / ja:RuleSet / rules; union models / UnionModelAssembler / ja:UnionModel / unionModel; prefix mappings / PrefixMappingAssembler / ja:PrefixMapping / prefixMapping; file models / FileModelAssembler / ja:FileModel / fileModel. Assembler.general is an assembler group, which ties together those basic assemblers. general can be extended by Jena coders if required. Jena components that use Assembler specifications to construct objects will use general unless documented otherwise.\nIn the remaining sections we will discuss the Assembler classes that return non-Model objects and conclude with a description of AssemblerGroup.\nBasic assembler ContentAssembler The ContentAssembler constructs Content objects (using the ja:Content vocabulary) used to supply content to models. A Content object has the method:\nfill( Model m ) → m Invoking the fill method adds the represented content to the model. The supplied ModelAssemblers automatically apply the Content objects corresponding to ja:content property values.\nBasic assembler RulesetAssembler A RulesetAssembler generates lists of Jena rules.\nBasic assembler DefaultModelAssembler A \u0026ldquo;default model\u0026rdquo; is a model of unspecified type which is implemented as whatever kind the assembler for ja:DefaultModel generates. The default for a DefaultModel is to create a MemoryModel with no special properties.\nAssemblerGroup The AssemblerGroup class allows a bunch of other Assemblers to be bundled together and selected by RDF type. 
AssemblerGroup implements Assembler and adds the methods:\nimplementWith( Resource type, Assembler a ) → this assemblerFor( Resource type ) → Assembler AssemblerGroup\u0026rsquo;s implementation of open(sub,root) finds the most specific type of root that is a subclass of ja:Object and looks for the Assembler that has been associated with that type by a call of implementWith. It then delegates construction to that Assembler, passing itself as the sub-assembler. Hence each component Assembler only needs to know how to assemble its own particular objects.\nThe assemblerFor method returns the assembler associated with the argument type by a previous call of implementWith, or null if there is no associated assembler.\nLoading assembler classes AssemblerGroups implement the ja:assembler functionality. The object of an (type ja:assembler \u0026quot;ClassName\u0026quot;) statement is a string which is taken as the name of an Assembler implementation to load. An instance of that class is associated with type using implementWith.\nIf the class has a constructor that takes a single Resource object, that constructor is used to initialise the class, passing in the type subject of the triple. Otherwise the no-argument constructor of the class is used.\n","permalink":"","tags":null,"title":"Inside assemblers"},{"categories":null,"contents":"There\u0026rsquo;s quite a lot of code inside Jena, and it can be daunting for new Jena users to find their way around. On this page we\u0026rsquo;ll summarise the key features and interfaces in Jena, as a general overview and guide to the more detailed documentation.\nAt its core, Jena stores information as RDF triples in directed graphs, and allows your code to add, remove, manipulate, store and publish that information. We tend to think of Jena as a number of major subsystems with clearly defined interfaces between them. 
First let\u0026rsquo;s start with the big picture:\nRDF triples and graphs, and their various components, are accessed through Jena\u0026rsquo;s RDF API. Typical abstractions here are Resource representing an RDF resource (whether named with a URI or anonymous), Literal for data values (numbers, strings, dates, etc.), Statement representing an RDF triple and Model representing the whole graph. The RDF API has basic facilities for adding triples to and removing triples from graphs, and finding triples that match particular patterns. Here you can also read in RDF from external sources, whether files or URLs, and serialize a graph in correctly-formatted text form. Both input and output support most of the commonly-used RDF syntaxes.\nWhile the programming interface to Model is quite rich, internally, the RDF graph is stored in a much simpler abstraction named Graph. This allows Jena to use a variety of different storage strategies equivalently, as long as they conform to the Graph interface. Out of the box, Jena can store a graph as an in-memory store, or as a persistent store using a custom disk-based tuple index. The graph interface is also a convenient extension point for connecting other stores to Jena, such as LDAP, by writing an adapter that allows the calls from the Graph API to work on that store.\nA key feature of semantic web applications is that the semantic rules of RDF, RDFS and OWL can be used to infer information that is not explicitly stated in the graph. For example, if class C is a sub-class of class B, and B a sub-class of A, then by implication C is a sub-class of A. Jena\u0026rsquo;s inference API provides the means to make these entailed triples appear in the store just as if they had been added explicitly. The inference API provides a number of rule engines to perform this job, either using the built-in rulesets for OWL and RDFS, or using application custom rules. 
Alternatively, the inference API can be connected up to an external reasoner, such as a description logic (DL) engine, to perform the same job with different, specialised, reasoning algorithms.\nThe collection of standards that define semantic web technologies includes SPARQL - the query language for RDF. Jena conforms to all of the published standards, and tracks the revisions and updates in the under-development areas of the standard. Handling SPARQL, both for query and update, is the responsibility of the SPARQL API.\nOntologies are also key to many semantic web applications. Ontologies are formal logical descriptions, or models, of some aspect of the real world that applications have to deal with. Ontologies can be shared with other developers and researchers, making them a good basis for building linked-data applications. There are two ontology languages for RDF: RDFS, which is rather weak, and OWL, which is much more expressive. Both languages are supported in Jena through the Ontology API, which provides convenience methods that know about the richer representation forms available to applications through OWL and RDFS.\nWhile the above capabilities are typically accessed by applications directly through the Java API, publishing data over the Internet is a common requirement in modern applications. Fuseki is a data publishing server, which can present, and update, RDF models over the web using SPARQL and HTTP.\nThere are many other pieces to Jena, including command-line tools, specialised indexes for text-based lookup, etc. These, and further details on the pieces outlined above, can be found in the detailed documentation on this site.\n","permalink":"","tags":null,"title":"Jena architecture overview"},{"categories":null,"contents":"Introduction This document describes the vocabulary and effect of the built-in Jena assembler descriptions for constructing models (and other things). 
A companion document describes the built-in assembler classes and how to write and integrate your own assemblers. If you just need a quick guide to the common model specifications, see the assembler quickstart.\nThis document describes how to use the Assembler classes to construct models \u0026ndash; and other things \u0026ndash; from RDF descriptions that use the Jena Assembler vocabulary. That vocabulary is available in assembler.ttl as an RDFS schema with conventional prefix ja for the URI; the class JA is its Java rendition.\nThe examples used in this document are extracted from the examples file examples.ttl. The pieces of RDF/OWL schema are extracted from the ja-vocabulary file.\nThe property names selected are those which are the \u0026ldquo;declared properties\u0026rdquo; (as per Jena\u0026rsquo;s listDeclaredProperties method) of the class. Only the most specialised super-classes and range classes are shown, so (for example) rdf:Resource typically won\u0026rsquo;t appear.\nOverview An Assembler specification is a Resource in some RDF Model. The properties of that Resource describe what kind of object is to be assembled and what its components are: for example, an InfModel is constructed by specifying a base model and a reasoner. The specifications for the components are themselves Assembler specifications given by other Resources in the same Model. For example, to specify a memory model with data loaded from a file:\neg:model a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/example.n3\u0026gt;] . The rdf:type of eg:model specifies that the constructed Model is to be a Jena memory-based model. The ja:content property specifies that the model is to be loaded with the content of the resource file:Data/example.n3. 
The content handler guesses from the \u0026ldquo;.n3\u0026rdquo; suffix that this file is to be read using the Jena N3 reader.\nUnless otherwise specified by an application, Assembler specifications are interpreted after completion by\nincluding the JA schema, including (recursively) the objects of any owl:imports and ja:imports statements, and doing (limited) RDFS inference. (The supplied model is not modified.) In the example above, eg:model has to be given an explicit type, but the ja:externalContent bnode is implicitly typed by the domain of ja:externalContent. In this document, we will usually leave out inferrable types.\nWe can construct our example model from the specification like this (you may need to tweak the filename to make this work in your environment):\nModel spec = RDFDataMgr.loadModel( \u0026quot;examples.ttl\u0026quot; ); Resource root = spec.createResource( spec.expandPrefix( \u0026quot;eg:opening-example\u0026quot; ) ); Model m = Assembler.general.openModel( root ); The model is constructed from the \u0026ldquo;root resource\u0026rdquo;, eg:opening-example in our example. general knows how to create all the kinds of objects - not just Models - that we describe in the next sections.\nSpecifications common to all models Assembler specifications can describe many kinds of models: memory, inference, ontology, and file-backed. All of these model specifications share a set of base properties for attaching content and prefix mappings.\nja:Loadable a rdfs:Class ; rdfs:subClassOf ja:Object . ja:initialContent a rdf:Property ; rdfs:domain ja:Loadable rdfs:range ja:Content . ja:content a rdf:Property ; rdfs:domain ja:Loadable ; rdfs:range ja:Content . ja:Model a rdfs:Class ; rdfs:subClassOf ja:ContentItem ; rdfs:subClassOf ja:Loadable . ja:prefixMapping a rdf:Property ; rdfs:domain ja:Model ; rdfs:range ja:PrefixMapping . 
All of a model\u0026rsquo;s ja:content property values are interpreted as specifying Content objects and a single composite Content object is constructed and used to initialise the model. See Content for the description of Content specifications. For example:\neg:sharedContent ja:externalContent \u0026lt;http://somewhere/RDF/ont.owl\u0026gt; . eg:common-example a ja:MemoryModel ; ja:content eg:sharedContent ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/A.rdf\u0026gt;] ; ja:content [ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/B.rdf\u0026gt;] . The model constructed for eg:common-example will be loaded with the contents of Data/A.rdf, Data/B.rdf, and http://somewhere/RDF/ont.owl. If the model supports transactions, then the content is loaded inside a transaction; if the load fails, the transaction is aborted, and a TransactionAbortedException thrown. If the content has any prefix mappings, then they are also added to the model.\nAll of a model\u0026rsquo;s ja:prefixMapping, ja:prefix, and ja:namespace properties are interpreted as specifying a PrefixMapping object and a single composite PrefixMapping is constructed and used to set the prefixes of the model. See PrefixMapping for the description of PrefixMapping specifications.\nContent specification A Content specification describes content that can be used to fill models. Content can be external (files and URLs) or literal (strings in the specification) or quotations (referring to RDF which is part of the specification).\nja:Content a rdfs:Class ; rdfs:subClassOf ja:HasFileManager . ja:HasFileManager a rdfs:Class ; rdfs:subClassOf ja:Object . ja:fileManager a rdf:Property ; rdfs:domain ja:HasFileManager ; rdfs:range ja:FileManager . A ja:Content specification may have zero or more ja:externalContent property values. These are URI resources naming an external (file or http etc) RDF object. 
The constructed Content object contains the union of the values of all such resources. For example:\neg:external-content-example ja:externalContent \u0026lt;file:////home/kers/projects/jena2/doc/assembler/Data/C.owl\u0026gt;, \u0026lt;\u0026gt; . The external content is located using a FileManager. If the Content resource has a ja:fileManager property, then the FileManager described by that resource is used. Otherwise, if the ContentAssembler assembling this specification was constructed with a FileManager argument, that FileManager is used. Otherwise, the default FileManager, FileManager.get(), is used.\nThe string literal value of any ja:literalContent properties is interpreted as RDF in an appropriate language. The constructed Content object contains that RDF. The language is either specified by an explicit ja:contentEncoding property value, or guessed from the content of the string. The only encodings permitted are \u0026ldquo;N3\u0026rdquo; and \u0026ldquo;RDF/XML\u0026rdquo;. For example:\neg:literal-content-example ja:literalContent \u0026quot;_:it dc:title 'Interesting Times'\u0026quot; . The literal content is wrapped so that prefix declarations for rdf, rdfs, owl, dc, and xsd apply before interpretation.\nThe property values of any ja:quotedContent properties should be resources. The subgraphs rooted at those resources (using the algorithm from ResourceUtils.reachableClosure()) are added to the content.\nInference models and reasoners Inference models are specified by supplying a description of the reasoner that is used by the model and (optionally) a base model to reason over. For example:\neg:inference-example ja:baseModel [a ja:MemoryModel] ; ja:reasoner [ja:reasonerURL \u0026lt;\u0026gt;] . describes an inference model that uses RDFS reasoning. The reasonerURL property value is the URI used to identify the reasoner (it is the value of the Jena constant RDFSRuleReasonerFactory.URI). 
The base model is specified as a memory model; if it is left out, an empty memory model is used.\neg:db-inference-example ja:baseModel eg:model-example ; ja:reasoner [ja:reasonerURL \u0026lt;\u0026gt;] . The same reasoner is used as in the previous example, but now the base model is a specific model, described in the same way as our earlier example.\nBecause Jena\u0026rsquo;s access to external reasoners goes through the same API as for its internal reasoners, you can access a DIG reasoner (such as Pellet running as a server) using an Assembler specification:\neg:external-inference-example ja:reasoner [\u0026lt;\u0026gt; \u0026lt;http://localhost:2004/\u0026gt; ; ja:reasonerURL \u0026lt;\u0026gt;] . If there\u0026rsquo;s a DIG server running locally on port 2004, this specification will create a DIG inference model that uses it.\nThe internal rule reasoner can be supplied with rules written inside the specification, or outside from some resource (file or http: URL): eg:rule-inference-example ja:reasoner [ja:rule \u0026quot;[r1: (?x my:P ?y) -\u0026gt; (?x rdf:type my:T)]\u0026quot;] .\nThis reasoner will infer a type declaration from a use of a property. (The prefix my will have to be known to the rule parser, of course.)\nja:InfModel a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:reasoner; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:baseModel; owl:maxCardinality 1] ; rdfs:subClassOf ja:Model . ja:reasoner a rdf:Property ; rdfs:domain ja:InfModel ; rdfs:range ja:ReasonerFactory . ja:baseModel a rdf:Property ; rdfs:domain ja:InfModel ; rdfs:range ja:Model . ja:HasRules a rdfs:Class ; rdfs:subClassOf ja:Object . ja:rule a rdf:Property ; rdfs:domain ja:HasRules . ja:rulesFrom a rdf:Property ; rdfs:domain ja:HasRules . ja:rules a rdf:Property ; rdfs:domain ja:HasRules ; rdfs:range ja:RuleSet . 
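The rule vocabulary above also allows rules to be loaded from an external resource rather than written inline. A minimal sketch (the rules file name is invented for this example; any file or URL containing Jena rules would do):

```turtle
# etc/my-rules.rules is a hypothetical file of Jena rules;
# ja:rulesFrom loads its contents into the reasoner's rule set.
eg:rules-from-example ja:reasoner [ja:rulesFrom <file:etc/my-rules.rules>] .
```

This behaves like the inline ja:rule form, but keeps the rules in a separate file that can be shared between specifications.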
An InfModel\u0026rsquo;s ja:baseModel property value specifies the base model for the inference model; if omitted, an empty memory model is used.\nAn InfModel\u0026rsquo;s ja:reasoner property value specifies the ReasonerFactory for this inference model; if omitted, a GenericRuleReasoner is used.\nA ReasonerFactory\u0026rsquo;s optional ja:schema property specifies a Model which contains the schema for the reasoner to be bound to. If omitted, no schema is used.\nIf the Reasoner is a GenericRuleReasoner, it may have any of the RuleSet properties ja:rules, ja:rulesFrom, or ja:rule. The rules of the implied RuleSet are added to the Reasoner.\nReasonerFactory A ReasonerFactory can be specified by URL or by class name (but not both).\nja:ReasonerFactory a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:reasonerURL; owl:maxCardinality 1] ; rdfs:subClassOf ja:HasRules . ja:reasonerClass a rdf:Property ; rdfs:domain ja:ReasonerFactory . ja:reasonerURL a rdf:Property ; rdfs:domain ja:ReasonerFactory . ja:schema a rdf:Property ; rdfs:domain ja:ReasonerFactory ; rdfs:range ja:Model . If the optional unique property ja:reasonerURL is specified, then its resource value is the URI of a reasoner in the Jena reasoner registry; the reasoner is the one with the given URI.\nIf the optional property ja:schema is specified, then the models specified by all the schema properties are unioned and any reasoner produced by the factory will have that union bound in as its schema (using the Reasoner::bindSchema() method).\nIf the optional unique property ja:reasonerClass is specified, its value names a class which implements ReasonerFactory. That class is loaded and an instance of it used as the factory.\nThe class may be named by the lexical form of a literal, or by a URI with the (fake) \u0026ldquo;java:\u0026rdquo; scheme.\nIf the class has a method theInstance, that method is called to supply the ReasonerFactory instance to use. Otherwise, a new instance of that class is constructed. 
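As a sketch of the class-name form (the factory class name here is invented; any class implementing ReasonerFactory and visible on the classpath would do):

```turtle
# com.example.MyReasonerFactory is hypothetical;
# it must implement the ReasonerFactory interface.
eg:class-factory-example ja:reasoner [ja:reasonerClass "com.example.MyReasonerFactory"] .
```

If the hypothetical com.example.MyReasonerFactory declares a theInstance method, that method supplies the factory instance; otherwise a fresh instance is constructed.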
Jena\u0026rsquo;s reasoner factories come equipped with this method; for other factories, see the documentation.\nRulesets A RuleSet specification allows rules (for ReasonerFactories) to be specified inline, elsewhere in the specification model, or in an external resource.\nja:RuleSet a rdfs:Class ; rdfs:subClassOf ja:HasRules . The optional repeatable property ja:rule has as its value a literal string which is the text of a Jena rule or rules. All those rules are added to the RuleSet.\nThe optional repeatable property ja:rulesFrom has as its value a resource whose URI identifies a file or other external entity that can be loaded as Jena rules. All those rules are added to the RuleSet.\nThe optional repeatable property ja:rules has as its value a resource which identifies another RuleSet in the specification model. All those rules from that RuleSet are added to this RuleSet.\nOntology models Ontology models can be specified in several ways. The simplest is to use the name of an OntModelSpec from the Java OntModelSpec class:\neg:simple-ont-example ja:ontModelSpec ja:OWL_DL_MEM_RULE_INF . This constructs an OntModel with an empty base model and using the OWL_DL language and the full rule reasoner. All of the OntModelSpec constants in the Jena implementation are available in this way. A base model can be specified:\neg:base-ont-example ja:baseModel [a ja:MemoryModel ; ja:content [ja:externalContent \u0026lt;\u0026gt;]] . The OntModel has a base which is a memory model loaded with the specified external content. Since the ontModelSpec was omitted, it defaults to OWL_MEM_RDFS_INF - the same default as ModelFactory.createOntologyModel().\nja:OntModel a rdfs:Class ; rdfs:subClassOf ja:UnionModel ; rdfs:subClassOf ja:InfModel . ja:ontModelSpec a rdf:Property ; rdfs:domain ja:OntModel ; rdfs:range ja:OntModelSpec . 
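The two idioms can be combined, supplying both a specification name and a base model (the root resource name is invented for this example):

```turtle
# eg:combined-ont-example is a made-up root; OWL_MEM_RDFS_INF is a
# built-in OntModelSpec constant, here stated explicitly rather than defaulted.
eg:combined-ont-example a ja:OntModel ;
    ja:ontModelSpec ja:OWL_MEM_RDFS_INF ;
    ja:baseModel [a ja:MemoryModel] .
```

Since OWL_MEM_RDFS_INF is the default anyway, this spec is equivalent to omitting the ja:ontModelSpec property; naming it explicitly just makes the intent visible.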
ja:OntModelSpec a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:likeBuiltinSpec; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:reasonerFactory; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:importSource; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:documentManager; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:ontLanguage; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . ja:importSource a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:reasonerFactory a rdf:Property ; rdfs:domain ja:OntModelSpec ; rdfs:range ja:ReasonerFactory . ja:documentManager a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:ontLanguage a rdf:Property ; rdfs:domain ja:OntModelSpec . ja:likeBuiltinSpec a rdf:Property ; rdfs:domain ja:OntModelSpec . OntModel is a subclass of InfModel, and the ja:baseModel property means the same thing.\nThe ja:ontModelSpec property value is a resource, interpreted as an OntModelSpec description based on its name and the value of the appropriate properties:\nja:likeBuiltinSpec: The value of this optional unique property must be a JA resource whose local name is the same as the name of an OntModelSpec constant (as in the simple case above). This is the basis for the OntModelSpec constructed from this specification. If absent, then OWL_MEM_RDFS_INF is used. To build an OntModelSpec with no inference, use, for example, ja:likeBuiltinSpec ja:OWL_MEM. ja:importSource: The value of this optional unique property is a ModelSource description which describes where imports are obtained from. A ModelSource is usually of class ja:ModelSource. ja:documentManager: The value of this optional unique property is a DocumentManager specification. If absent, the default document manager is used. ja:reasonerFactory: The value of this optional unique property is the ReasonerFactory resource which will be used to construct this OntModelSpec\u0026rsquo;s reasoner. 
A reasonerFactory specification is the same as an InfModel\u0026rsquo;s reasoner specification (the different properties are required for technical reasons). ja:reasonerURL: as a special case of reasonerFactory, a reasoner may be specified by giving its URL as the object of the optional unique reasonerURL property. It is not permitted to supply both reasonerURL and reasonerFactory properties. ja:ontLanguage: The value of this optional unique property is one of the values in the ProfileRegistry class which identifies the ontology language of this OntModelSpec: OWL, OWL DL, OWL Lite, or RDFS.\nAny unspecified properties have default values, normally taken from those of OntModelSpec.OWL_MEM_RDFS_INF. However, if the OntModelSpec resource is in the JA namespace, and its local name is the same as that of an OntModelSpec constant, then that constant is used as the default value.\nDocument managers An OntDocumentManager can be specified by a ja:DocumentManager specification which describes the OntDocumentManager\u0026rsquo;s file manager and policy settings.\neg:mapper lm:mapping [lm:altName \u0026quot;file:etc/foo.n3\u0026quot; ; lm:name \u0026quot;file:foo.n3\u0026quot;] . eg:document-manager-example ja:fileManager [ja:locationMapper eg:mapper] ; ja:meta [ dm:altURL \u0026lt;http://localhost/RDF/my-alt.rdf\u0026gt;] . In this example, eg:document-manager-example is a ja:DocumentManager specification. It has its own FileManager specification, the object of the ja:fileManager property; that FileManager has a location mapper, eg:mapper, that maps a single filename.\nThe document manager also has an additional property to link it to document manager meta-data: the sub-model of the assembler specification reachable from eg:document-manager-example is passed to the document manager when it is created. 
For the meanings of the dm: properties, see the Jena ontology documentation and the ontology.rdf ontology.\nja:DocumentManager a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:policyPath; owl:maxCardinality 1] ; rdfs:subClassOf [owl:onProperty ja:fileManager; owl:maxCardinality 1] ; rdfs:subClassOf ja:HasFileManager . ja:policyPath a rdf:Property ; rdfs:domain ja:DocumentManager . The ja:fileManager property value, if present, has as its object a ja:FileManager specification; the constructed document manager is given a new file manager constructed from that specification. If there is no ja:fileManager property, then the default FileManager is used.\nThe ja:policyPath property value, if present, should be a string which is a path to policy files as described in the Jena ontology documentation. If absent, the usual default path is applied.\nIf the sub-model of the assembler specification reachable from the DocumentManager resource contains any OntDocumentManager DOC_MGR_POLICY or ONTOLOGY_SPEC objects, they will be interpreted by the constructed document manager object.\nja:FileManager a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:locationMapper; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . ja:locationMapper a rdf:Property ; rdfs:domain ja:FileManager ; rdfs:range ja:LocationMapper . A ja:FileManager object may have a ja:locationMapper property value which identifies the specification of a LocationMapper object initialising that file manager.\nja:LocationMapper a rdfs:Class ; rdfs:subClassOf [owl:onProperty lm:mapping; owl:maxCardinality 1] ; rdfs:subClassOf ja:Object . lm:mapping a rdf:Property ; rdfs:domain ja:LocationMapper . A ja:LocationMapper object may have lm:mapping property values, describing the location mapping, as described in the FileManager documentation. 
(Note that the vocabulary for those items is in a different namespace than the JA properties and classes.)\nUnion models Union models can be constructed from any number of sub-models and a single root model. The root model is the one written to when the union model is updated; the sub-models are untouched.\nja:UnionModel a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:rootModel; owl:maxCardinality 1] ; rdfs:subClassOf ja:Model . ja:rootModel a rdf:Property ; rdfs:domain ja:UnionModel ; rdfs:range ja:Model . ja:subModel a rdf:Property ; rdfs:domain ja:UnionModel ; rdfs:range ja:Model . If the single ja:rootModel property is present, its value describes a model to use as the root model of the union. All updates to the union are directed to this root model. If no root model is supplied, the union is given an immutable, empty model as its root.\nAny ja:subModel property values have objects describing the remaining sub-models of the union. The order of the sub-models in the union is undefined (which is why there\u0026rsquo;s a special rootModel property).\nPrefix mappings The PrefixMappings of a model may be set from PrefixMapping specifications.\nja:PrefixMapping a rdfs:Class ; rdfs:subClassOf ja:Object . ja:includes a rdf:Property ; rdfs:domain ja:PrefixMapping ; rdfs:range ja:PrefixMapping . ja:SinglePrefixMapping a rdfs:Class ; rdfs:subClassOf [owl:onProperty ja:namespace; owl:cardinality 1] ; rdfs:subClassOf [owl:onProperty ja:prefix; owl:cardinality 1] ; rdfs:subClassOf ja:PrefixMapping . ja:namespace a rdf:Property ; rdfs:domain ja:SinglePrefixMapping . ja:prefix a rdf:Property ; rdfs:domain ja:SinglePrefixMapping . 
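Putting the prefix-mapping vocabulary together, a memory model can be given prefixes both directly and by inclusion (the resource names and the dc binding are invented for this example):

```turtle
# eg:mapping-example and eg:shared-mapping are made-up names;
# the first ja:prefixMapping value is a SinglePrefixMapping binding "dc",
# the second pulls in the prefixes of another PrefixMapping specification.
eg:mapping-example a ja:MemoryModel ;
    ja:prefixMapping [ja:prefix "dc" ; ja:namespace "http://purl.org/dc/elements/1.1/"] ;
    ja:prefixMapping [ja:includes eg:shared-mapping] .
```

Because a model may have several ja:prefixMapping values, a single composite mapping is built from all of them and used to set the model's prefixes.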
The ja:includes property allows a PrefixMapping to include the content of other specified PrefixMappings.\nThe ja:prefix and ja:namespace properties allow the construction of a single element of a prefix mapping by specifying the prefix and namespace of the mapping.\nOther Assembler directives There are two more Assembler directives that can be used in an Assembler specification: the assembler and imports directives.\nAssembler A specification may contain statements of the form:\nsomeResource ja:assembler \u0026quot;\u0026quot; When someResource is used as the type of a root object, the AssemblerGroup that processes the description will use an instance of the Java class named by the object of the statement. That class must implement the Assembler interface. See loading assembler classes for more details.\nSimilarly, statements of the form:\nsomeResource ja:loadClass \u0026quot;\u0026quot; will cause the named class to be loaded (but not treated as assemblers).\nImports If a specification contains statements of the form:\nanyResource owl:imports someURL or, equivalently,\nanyResource ja:imports someURL then the specification is regarded as also containing the contents of the RDF at someURL. That RDF may in turn contain imports referring to other RDF.\nLimited RDFS inference The Assembler engine uses limited RDFS inference to complete the model it is given, so that the spec-writer does not need to write excessive and redundant RDF. (It does not use the usual Jena reasoners because this limited once-off reasoning has been faster.) The inference steps are:\nadd all the classes from the JA schema. do subclass closure over all the classes. do domain and range inference. do simple intersection inference: if X is an instance of intersection A B C \u0026hellip;, then X is an instance of A, B, C \u0026hellip; (and their supertypes). This is sufficient for closed-world assembling. 
Other parts of the JA schema \u0026ndash; eg, cardinality constraints \u0026ndash; are hard-coded into the individual assemblers.\n","permalink":"","tags":null,"title":"Jena Assembler howto"},{"categories":null,"contents":"Jena\u0026rsquo;s assembler provides a means of constructing Jena models according to a recipe, where that recipe is itself stated in RDF. This is the Assembler quickstart page. For more detailed information, see the Assembler howto or Inside assemblers.\nWhat is an Assembler specification? An Assembler specification is an RDF description of how to construct a model and its associated resources, such as reasoners, prefix mappings, and initial content. The Assembler vocabulary is given in the Assembler schema, and we\u0026rsquo;ll use the prefix ja for its identifiers.\nWhat is an Assembler? An Assembler is an object that implements the Assembler interface and can construct objects (typically models) from Assembler specifications. The constant Assembler.general is an Assembler that knows how to construct some general patterns of model specification.\nHow can I make a model according to a specification? Suppose the Model M contains an Assembler specification whose root (the Resource describing the whole Model to construct) is R (so R.getModel() == M). Invoke:\nAssembler.general.openModel(R) The result is the desired Model. Further details about the Assembler interface, the special Assembler general, and the details of specific Assemblers, are deferred to the Assembler howto.\nHow can I specify \u0026hellip; In the remaining sections, the object we want to describe is given the root resource my:root.\n\u0026hellip; a memory model? my:root a ja:MemoryModel. \u0026hellip; an inference model? my:root ja:reasoner [ja:reasonerURL theReasonerURL] ; ja:baseModel theBaseModelResource . 
theReasonerURL is one of the reasoner (factory) URLs given in the inference documentation and code; theBaseModelResource is another resource in the same document describing the base model.\n\u0026hellip; some initialising content? my:root ja:content [ja:externalContent \u0026lt;someContentURL\u0026gt;] ... rest of model specification ... . The model will be pre-loaded with the contents of someContentURL.\n\u0026hellip; an ontology model? my:root ja:ontModelSpec ja:OntModelSpecName ; ja:baseModel somebaseModel . The OntModelSpecName can be any of the predefined Jena OntModelSpec names, eg OWL_DL_MEM_RULE_INF. The baseModel is another model description - it can be left out, in which case you get an empty memory model. See Assembler howto for construction of non-predefined OntModelSpecs.\n","permalink":"","tags":null,"title":"Jena assembler quickstart"},{"categories":null,"contents":"Jena Extra modules are modules that provide utilities and larger packages that make Apache Jena development or usage easier but that do not fall within the standard Jena framework.\nSub Packages Bulk retrieval and caching with SERVICE clauses Query Builder ","permalink":"","tags":null,"title":"Jena Extras - Extra packages for Jena development."},{"categories":null,"contents":"This page is historical \u0026ldquo;for information only\u0026rdquo; - there is no Apache release of Eyeball and the code has not been updated for Jena3.\nThe original source code is available. This document describes Eyeball, an \u0026ldquo;RDF lint\u0026rdquo;. See the release notes for descriptions of changes from previous versions. Eyeball was a part of the Jena family of RDF/OWL tools.\nThroughout this document, the prefix eye: stands for the Eyeball namespace URL.\nIntroduction Eyeball is a library and command-line tool for checking RDF and OWL models for various common problems. These problems often result in technically correct but implausible RDF. 
Eyeball checks against user-provided schema files and makes various closed-world assumptions.\nEyeball can check for:\nunknown [with respect to the schemas] properties and classes bad prefix namespaces ill-formed URIs, with user-specifiable constraints ill-formed language tags on literals datatyped literals with illegal lexical forms unexpected local names in schema namespaces untyped resources and literals individuals having consistent types, assuming complete typing likely cardinality violations broken RDF list structures suspected broken use of the typed list idiom obviously broken OWL restrictions user-specified constraints written in SPARQL Eyeball\u0026rsquo;s checks are performed by Inspector plug-ins and can be customised by the user. Rendering its reports to output is performed by Renderer plug-ins which can also be customised by the user.\nInstallation Fetch the Eyeball distribution zipfile and unpack it somewhere convenient. Eyeball 2.1 comes with its own copy of Jena 2.5 with CVS updates. Do not attempt to use other versions of Jena with Eyeball.\nIn the Eyeball distribution directory, run the Eyeball tests:\nant test If these tests fail, something is wrong. Please ask on the user mailing list or file a Jira issue.\nIf the tests have passed, you can use Eyeball from the installation directory, or copy lib, etc and mirror to somewhere convenient.\nCommand line operation You must ensure that all the Eyeball jars from lib are on your classpath. (Note that Eyeball comes with its own Jena jar files and may not work with other Jena jars.) 
The directories etc and mirror should be in the current directory or also on your classpath.\nRun the Eyeball command:\njava [java options eg classpath and proxy] jena.eyeball (-check | -sign | -accept) specialURL+ [-assume Reference*] [-config fileOrURL*] [-set Setting*] [-root rootURI] [-render Name] [-include shortName*] [-exclude shortName*] [-analyse | -repair] [-remark] [-version] The -whatever sections can come in any order and may be repeated, in which case the additional arguments are appended to the existing ones. Exactly one of -check, -sign, -accept, or -version must be provided; all the other options are optional.\nWhen Eyeball resolves ordinary filenames or URLs it uses the Jena file manager to possibly map those names (eg to redirect an http: URL to a local cached copy). See the file manager howto for details on how to configure the file manager.\nExamples of command-line use java jena.eyeball -version java jena.eyeball -check myDataFile.rdf java jena.eyeball -assume dc -check java jena.eyeball -assume mySchema.rdf -check myData.rdf -render xml java jena.eyeball -check myData.rdf -include consistent-type java jena.eyeball -check myConfig.ttl -sign \u0026gt;signedConfig.ttl -check specialURL+ The -check command checks the specified models for problems. The specialURLs designate the models to be checked. In the simplest case, these are plain filenames, file: URLs, or http: URLs. At least one specialURL must be specified. 
Each specified model is checked independently of the others.\n-check myModel.ttl -check file:///c:/rdf/pizza.owl -check If the specialURL is of the form ont:NAME:base, then the checked model is the model base treated as an OntModel with the specification OntModelSpec.NAME; see the Jena ontology documentation for the available names.\n-check ont:OWL_MEM_RDFS_INF:myModel.ttl -check ont:OWL_DL_MEM_RULE_INF: If the specialURL is of the form ja:R@AF, then the model is that described by the resource R in the Jena assembler description file AF. R is prefix-expanded using the prefixes in AF.\n-check ja:my:root@my-assembly.ttl -check ont:OWL_MEM_RDFS_INF:my:root@my-assembly.ttl If the URL (or the base) is of the form jdbc:DB:head:model, then the checked model is the one called model in the database with connection jdbc:DB:head. (The database user and password must be specified independently using the jena.db.user and jena.db.password system properties.)\n-check jdbc:mysql://localhost/test:example -config fileOrURL and -root rootURI The -config fileOrURL options specify the Eyeball assembler configuration files to load. A single configuration model is constructed as the union of the contents of those files. If this option is omitted, the default configuration file etc/eyeball-config.n3 is loaded. See inside the Eyeball configuration file for details of the configuration file.\n-config my-hacked-config-file.n3 -config etc/eyeball-config.n3 extras.ttl The -root rootURI option specifies the root resource in the Eyeball configuration. If this option is omitted, eye:eyeball is used by default. rootURI is prefix-expanded using the prefixes in the configuration file.\n-root my:root -root my:sparse-config -root urn:x-hp:eyeball-roots:special -set Setting* The -set option allows command-line tweaks to the configuration, eg for enabling checking URIs for empty local names. 
You will rarely need to use this; it is presented here because of its association with the -config and -root options.\nEach Setting has the form S.P=O and adds the statement (S' P' O') to the configuration.\nThe current Eyeball converts the components of the S.P=O string into RDF nodes S', P', O' using some special rules:\nA component starting with a digit is treated as an xsd:integer literal (and hence should only appear as the object of the setting). A component starting with a quote, either \u0026quot; or ', is treated as a literal whose lexical form extends to the matching closing quote. Note: (a) literals with embedded spaces are not supported; (b) your command-line interpreter may treat quotes specially, and to allow the quotes to pass through to Eyeball, you\u0026rsquo;ll have to use another (different) pair of quotes! A component starting with _ is treated as a blank node with that label. Otherwise, the component is treated as a URI reference. If it starts with a prefix (eg, rdf:) that prefix is expanded using the prefixes of the configuration file. If it has no prefix, it is as though the empty prefix was specified: in the default configuration file, that is set to the Eyeball namespace, so it is as though the prefix eye: had been used. For example, to enable the URI inspector\u0026rsquo;s non-default reporting of URIs with empty local names, use:\n-set URIInspector.reportEmptyLocalNames=\u0026quot;'true'\u0026quot; Note the nested different quotes required to pass \u0026rsquo;true\u0026rsquo; to Eyeball so that it can interpret this as a literal.\n-include/-exclude shortNames The various Eyeball inspectors are given short names in the configuration file. By default, an Eyeball check uses a specific set of inspectors with short name defaultInspectors. Additional inspectors can be enabled using the -include option, and default inspectors can be disabled using the -exclude option. 
See below for the available inspectors and their short names, and see inspectors configuration for how to configure inspectors.\n-include list all-typed -exclude cardinality -include owl -exclude consistent-type -assume Reference The -assume References identify the assumed schemas used to specify the predicates and classes of the data model. Each reference may be a file name or a URL (and may be mapped by the file manager).\nEyeball automatically assumes the RDF and RDFS schemas, and the built-in XSD datatype classes. The short name owl can be used to refer to the OWL schema, dc to the Dublin Core schema, dcterms to the Dublin Core terms schema, and dc-all to both.\n-assume owl -assume owl dc-all -assume owl my-ontology.owl -sign and -accept (experimental) If -sign is specified, Eyeball first does a -check. If no problem reports are generated, Eyeball writes a signed version of the current model to the standard output. The signature records the Eyeball configuration used and a weak hash of the model. If the input model is already signed, that signature is discarded before computing the new signature and writing the output.\nIf -accept is specified, the model is checked for its signature. If it is not signed, or if the signature does not match the content of the model \u0026ndash; either the hash fails, or the recorded configuration is not sufficient \u0026ndash; a problem is reported; otherwise not.\nThe intended use of -sign and -accept is that an application can require signed models which have passed some minimum set of inspections. The application code can then rely on the model having the desired properties, without having to run potentially expensive validation checks every time a model is loaded.\nImportant. 
Model signing is intended to catch careless mistakes, not for security against malicious users.\n-version Eyeball will print its version on the standard error stream (currently \u0026ldquo;Eyeball 2.1 (Nova Embers)\u0026rdquo;).\n-remark Normally Eyeball issues its report or signed model to the standard output and exits with code 0 (success) or 1 (failure) with no additional output. Specifying -remark causes a message reporting success or the presence of problems to be written to the standard error stream.\n-repair and -analyse (experimental) These operations are not currently documented. Try them at your peril: -repair may attempt to update your models.\n-render Name The Eyeball reports are written to the standard output; by default, the reports appear as text (RDF rendered by omitting the subjects - which are all blank nodes - and lightly prettifying the predicate and object). To change the rendering style, supply the -render option with the name of the renderer as its value. Eyeball comes with N3, XML, and text renderers; the Eyeball config file associates renderer names with their classes.\n-render n3 -render rdf setting the proxy If any of the data or schema are identified by an http: URL, and you are behind a firewall, you will need to specify the proxy to Java using system properties; one way to do this is by using the Java command line options:\n-DproxySet=true -DproxyHost=theProxyHostName -DproxyPort=theProxyPortNumber Inspectors shipped with Eyeball Eyeball comes with a collection of inspectors that do relatively simple checks.\nPropertyInspector (short name: \u0026ldquo;property\u0026rdquo;) Checks that every predicate that appears in the model is declared in some -assumed schema or owl:imported model \u0026ndash; that is, is given rdf:type rdf:Property or some subclass of it.\nClassInspector (short name: \u0026ldquo;presumed-class\u0026rdquo;) Checks that every resource in the model that is used as a class, ie that appears as the object of an rdf:type, rdfs:domain, or rdfs:range statement, 
or as the subject or object of an rdfs:subClassOf statement, has been declared as a Class in the -assumed schemas or in the model under test.\nURIInspector (short name: \u0026ldquo;URI\u0026rdquo;) Checks that every URI in the model is well-formed according to the rules of the Jena IRI library. May apply additional rules specified in the configuration file: see uri configuration later for details.\nLiteralInspector (short name: \u0026ldquo;literal\u0026rdquo;) Checks literals for syntactically correct language codes, syntactically correct datatype URIs (using the same rules as the URIInspector), and conformance of the lexical form of typed literals to their datatype.\nPrefixInspector (short name: \u0026ldquo;prefix\u0026rdquo;) The PrefixInspector checks that the prefix declarations of the model have namespaces that are valid URIs and that if the prefix name is \u0026ldquo;well-known\u0026rdquo; (rdf, rdfs, owl, xsd, and dc) then the associated URI is the one usually associated with the prefix.\nThe PrefixInspector also reports a problem if any prefix looks like a Jena automatically-generated prefix, j.\u0026lt;i\u0026gt;Number\u0026lt;/i\u0026gt;. 
(Jena generates these prefixes when writing RDF/XML if the XML syntactically requires a prefix but the model hasn\u0026rsquo;t defined one.)\nVocabularyInspector (short name: \u0026ldquo;vocabulary\u0026rdquo;) Checks that every URI in the model with a namespace which is mentioned in some schema is one of the URIs declared for that namespace \u0026ndash; that is, it assumes that the schemas define a closed set of URIs.\nThe inspector may be configured to suppress this check for specified namespaces: see vocabulary configuration later.\nOwlSyntaxInspector (short name: \u0026ldquo;owl\u0026rdquo;) This inspector looks for \u0026ldquo;suspicious restrictions\u0026rdquo; which have some of the OWL restriction properties but not exactly one owl:onProperty and exactly one constraint (owl:allValuesFrom, etc).\nSparqlDrivenInspector (short name: \u0026ldquo;sparql\u0026rdquo;) The SparqlDrivenInspector is configured according to configuring the SPARQL-driven inspector, and applies arbitrary SPARQL queries to the model. The queries can be required to match or prohibited from matching; a problem is reported if the constraint fails.\nAllTypedInspector (short name: \u0026ldquo;all-typed\u0026rdquo;) Checks that all URI and bnode resources in the model have an rdf:type property in the model or the schema(s). If there is a statement in the configuration with property eye:checkLiteralTypes and value eye:true, also checks that every literal has a type or a language. 
Not in the default set of inspectors.\nConsistentTypeInspector (short name: \u0026ldquo;consistent-type\u0026rdquo;) Checks that every subject in the model can be given a type which is the intersection of the subclasses of all its \u0026ldquo;attached\u0026rdquo; types \u0026ndash; a \u0026ldquo;consistent type\u0026rdquo;.\nFor example, if the model contains three types Top, Left, and Right, with Left and Right both being subtypes of Top and with no other subclass statements, then some S with rdf:types Left and Right would generate this warning.\nCardinalityInspector (short name: \u0026ldquo;cardinality\u0026rdquo;) Looks for classes C that are subclasses of cardinality restrictions on some property P with cardinality range min to max. For any X of rdf:type C, it checks that the number of values of P is in the range min..max and generates a report if it isn\u0026rsquo;t.\nLiterals are counted as distinct if their values (not just their lexical form) are distinct. Resources are counted as distinct if they have different case-sensitive URIs: the CardinalityInspector takes no account of owl:sameAs statements.\nListInspector (short name: \u0026ldquo;list\u0026rdquo;) The ListInspector performs two separate checks:\nlooks for lists that are ill-formed by having multiple or missing rdf:first or rdf:rest properties on their elements. looks for possible mis-uses of the \u0026ldquo;typed list\u0026rdquo; idiom, and reports the types so defined. The typed list idiom is boilerplate OWL for defining a type which is List-of-T for some type T. It takes the form:\nmy:EList a owl:Class ; rdfs:subClassOf rdf:List ; rdfs:subClassOf [owl:onProperty rdf:first; owl:allValuesFrom my:Element] ; rdfs:subClassOf [owl:onProperty rdf:rest; owl:allValuesFrom my:EList] . The type my:Element is the element type of the list, and the type EList is the resulting typed list. 
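As an illustration of the ill-formed-list check, here is a hedged Turtle sketch (the ex: namespace is hypothetical) contrasting a well-formed list with one the inspector would report:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.com/ns#> .

# well-formed: each element has exactly one rdf:first and one rdf:rest
ex:goodList rdf:first ex:a ;
            rdf:rest [ rdf:first ex:b ; rdf:rest rdf:nil ] .

# ill-formed: the head element has two rdf:first values and no rdf:rest
ex:badList rdf:first ex:a ;
           rdf:first ex:b .
```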
The list inspector checks every subclass of rdf:List (such as EList above) that is also a subclass of some bnode (such as the two other superclasses of *EList*) having any property (eg, owl:onProperty) whose object is either rdf:first or rdf:rest: each such subclass must be defined by the full idiom above; if not, it is reported as a suspectListIdiom.\nEyeball problem reports Eyeball generates its reports as items in a model. Each item has rdf:type eye:Item, and its other properties determine what problem report it is. The default text renderer displays a prettified form of each item; use -render n3 to expose the complete report structure.\nOne of the item\u0026rsquo;s properties is its main property, which identifies the problem; the others are qualifications supplying additional detail.\nPropertyInspector: predicate not declared [] eye:unknownPredicate \u0026quot;*URIString*\u0026quot;. The predicate with the given URI is not defined in any of the -assumed schemas.\nClassInspector: class not declared [] eye:unknownClass \u0026quot;*URIString*\u0026quot;. The resource with the given URI is used as a Class, but not defined in any of the -assumed schemas.\nURIInspector: bad URI [] eye:badURI \u0026quot;*URIString*\u0026quot;; eye:forReason *Reason*. The URIString isn\u0026rsquo;t legal as a URI, or is legal but fails a user-specified spelling constraint. Reason is a resource or string identifying the reason.\nreason explanation eye:uriContainsSpaces the URI contains unencoded spaces, probably as a result of sloppy use of file: URLs. eye:uriFileInappropriate a URI used as a namespace is a file: URI, which is inappropriate as a global identifier. eye:uriHasNoScheme a URI has no scheme field, probably a misused relative URI. eye:schemeShouldBeLowercase the scheme part of a URI is not lower-case; while technically correct, this is not usual practice. 
eye:uriFailsPattern a URI fails the pattern appropriate to its schema (as defined in the configuration for this eyeball). eye:unrecognisedScheme the URI scheme is unknown, perhaps a misplaced QName. eye:uriNoHttpAuthority an http: URI has no authority (domain name/port) component. eye:uriSyntaxFailure the URI can\u0026rsquo;t be parsed using the general URI syntax, even with any spaces removed. eye:namespaceEndsWithNameCharacter a namespace URI ends in a character that can appear in a name, leading to possible ambiguities. eye:uriHasNoLocalname a URI has no local name according to the XML name-splitting rules. (For example, the URI has no local name because a local name cannot start with a digit.) \u0026ldquo;did not match required pattern Taili for prefix Head\u0026rdquo;. This badURI starts with Head, but the remainder doesn\u0026rsquo;t match any of the *Taili*s associated with that prefix. \u0026ldquo;matched prohibited pattern Tail for prefix Head\u0026rdquo;. This badURI starts with Head, and the remainder matched a prohibited Tail associated with that prefix. LiteralInspector: illegal language code [] eye:badLanguage \u0026quot;*badCode*\u0026quot;; eye:onLiteral \u0026quot;*spelling*\u0026quot;. A literal with the lexical form spelling has the illegal language code badCode.\nLiteralInspector: bad datatype URI [] eye:badDatatypeURI \u0026quot;*badURI*\u0026quot;; eye:onLiteral \u0026quot;*spelling*\u0026quot;. A literal with the lexical form spelling has the illegal datatype URI badURI.\nLiteralInspector: bad lexical form [] eye:badLexicalForm \u0026quot;*spelling*\u0026quot;; eye:forDatatype \u0026quot;*dtURI*\u0026quot;. A literal with the datatype URI dtURI has the lexical form spelling, which isn\u0026rsquo;t legal for that datatype.\nPrefixInspector: bad namespace URI [] eye:badNamespaceURI \u0026quot;*URIString*\u0026quot; ; eye:onPrefix \u0026quot;*prefix*\u0026quot; ; eye:forReason *Reason*. 
The namespace URIString for the declaration of prefix is suspicious for the given Reason (see the URIInspector reports for details of the possible reasons).\nPrefixInspector: Jena prefix found [] eye:jenaPrefixFound \u0026quot;*j.Digits*\u0026quot;; eye:forNamespace \u0026quot;*URIString*\u0026quot;. The namespace URIString has an automatically-generated Jena prefix.\nPrefixInspector: multiple prefixes for namespace [] eye:multiplePrefixesForNamespace \u0026quot;*NameSpace*\u0026quot; ; eye:onPrefix \u0026quot;*prefix\u0026lt;sub\u0026gt;1\u0026lt;/sub\u0026gt;\u0026quot;* ... There are multiple prefix declarations for NameSpace, namely, prefix1 etc.\nVocabularyInspector: not from schema [] eye:notFromSchema \u0026quot;*NameSpace*\u0026quot;; eye:onResource *Resource*. The Resource has a URI in the NameSpace, but isn\u0026rsquo;t declared in the schema associated with that NameSpace.\nOwlSyntaxInspector: suspicious restriction [] eye:suspiciousRestriction *R*; eye:forReason *Reason*... The presumed restriction R is suspicious for the given Reasons:\neye:missingOnProperty \u0026ndash; there is no owl:onProperty property in this suspicious restriction. eye:multipleOnProperty \u0026ndash; there are multiple owl:onProperty properties in this suspicious restriction. eye:missingConstraint \u0026ndash; there is no owl:hasValue, owl:allValuesFrom, owl:someValuesFrom, or owl:[minC|maxC|c]ardinality property in this suspicious restriction. eye:multipleConstraint \u0026ndash; there are multiple constraints (as above) in this suspicious restriction. The restriction R is identified by (a) supplying its immediate properties, and (b) identifying its named equivalent classes and subclasses.\nSparqlDrivenInspector: require failed [] eye:sparqlRequireFailed \u0026quot;*message*\u0026quot;. A SPARQL query that was required to succeed against the model did not. 
The message is either the query that failed or a meaningful description, depending on the inspector configuration.\nSparqlDrivenInspector: prohibit failed [] eye:sparqlProhibitFailed \u0026quot;*message*\u0026quot;. A SPARQL query that was required to fail against the model did not. The message is either the query that succeeded or a meaningful description, depending on the inspector configuration.\nAllTypedInspector: should have type [] eye:shouldHaveType *Resource*. The Resource has no rdf:type. Note that when using models with inference, this report is unlikely, since inference may well give the resource a type even if it has no explicit type in the original model.\nConsistentTypeInspector: inconsistent types for resource [] eye:noConsistentTypeFor *URI* ; eye:hasAttachedType *TypeURI\u0026lt;sub\u0026gt;i\u0026lt;/sub\u0026gt;* ... The resource URI has been given the various types TypeURIi, but if we assume that subtypes are disjoint unless otherwise specified, these types have no intersection.\nThe ConsistentTypeInspector must do at least some type inference. This release of Eyeball compromises by doing RDFS inference augmented by (very) limited union and intersection reasoning, as described in the Jena rules in etc/owl-like.rules, so its reports must be treated with caution. 
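The Top/Left/Right example from the inspector description earlier can be sketched in Turtle (the ex: namespace is hypothetical); with no declared common subtype of ex:Left and ex:Right, the resource ex:s would trigger this report:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.com/ns#> .

ex:Left  rdfs:subClassOf ex:Top .
ex:Right rdfs:subClassOf ex:Top .

# ex:s has attached types ex:Left and ex:Right; assuming subtypes are
# disjoint unless otherwise specified, no consistent type exists for it
ex:s rdf:type ex:Left, ex:Right .
```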
Even with these restrictions, doing type inference over a large model is costly: you may need to suppress it with -exclude until any other warnings are dealt with.\nWhile, technically, a resource with no attached types at all is automatically inconsistent, Eyeball quietly ignores such resources, since they turn up quite often in simple RDF models.\nCardinalityInspector: cardinality failure [] eye:cardinalityFailure *Subject*; eye:onType *T*; eye:onProperty *P* The Subject has a cardinality-constrained rdf:type T with owl:onProperty P, but the number of distinct values in the model isn\u0026rsquo;t consistent with the cardinality restriction.\nAdditional properties describe the cardinality restriction and the values found:\neye:numValues N: the number of distinct values for (Subject, P) in the model. eye:cardinality [eye:min min; eye:max max]: the minimum and maximum cardinalities permitted. eye:values Set: A blank node of type eye:Set with an rdfs:member value for each of the values of P. ListInspector: ill-formed list [] eye:illFormedList *URI* ; eye:because [eye:element *index\u0026lt;sub\u0026gt;i\u0026lt;/sub\u0026gt;*; *Problem\u0026lt;sub\u0026gt;i\u0026lt;/sub\u0026gt;*] ... The list starting at URI is ill-formed because the element with index index\u0026lt;sub\u0026gt;i\u0026lt;/sub\u0026gt; had Problem\u0026lt;sub\u0026gt;i\u0026lt;/sub\u0026gt;. The possible problems are:\neye:hasNoRest \u0026ndash; the element has no rdf:rest property. eye:hasMultipleRests \u0026ndash; the element has more than one rdf:rest property. eye:hasNoFirst \u0026ndash; the element has no rdf:first property. eye:hasMultipleFirsts \u0026ndash; the element has more than one rdf:first property. ListInspector: suspect list idiom [] eye:suspectListIdiom *Type*. 
The resource Type looks like it\u0026rsquo;s supposed to be a use of the \u0026ldquo;typed list idiom\u0026rdquo;, but it isn\u0026rsquo;t complete/accurate.\nInside the Eyeball configuration file Configuration files The Eyeball command-line utility is configured by files (or URLs) specified on the command line: their RDF contents are unioned together into a single config model. If no config file is specified, then etc/eyeball-config.n3 is loaded. The configuration file is a Jena assembler description (see Assemblers) with added Eyeball vocabulary.\nEyeball is also configured by the location-mapping file etc/location-mapping.n3. The Eyeball jar contains copies of both the default config and the location mapper; these are used by default. You can provide your own etc/eyeball-config.n3 file earlier on your classpath or in your current directory; this config replaces the default. You may provide additional location-mapping files earlier on your classpath or in your current directory.\nConfiguring schema names To avoid having to quote schema names in full on the Eyeball command line, (collections of) schemas can be given short names. [] eye:shortName shortNameLiteral ; eye:schema fullSchemaURL \u0026hellip; .\nA shortname can name several schemas. The Eyeball delivery has the short names rdf, rdfs, owl, and dc for the corresponding schemas (and mirror versions of those schemas so that they don\u0026rsquo;t need to be downloaded each time Eyeball is run.)\nConfiguring inspectors The inspectors that Eyeball runs over the model are specified by eye:inspector properties of inspector resources. These resources are identified by eye:shortNames (supplied on the command line). 
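A hedged sketch of such an inspector resource (prefixes as declared in the configuration file; the my: name, short name, and class name are hypothetical):

```turtle
# hypothetical inspector resource: short name for the command line,
# the full class name to load, plus inclusion of the default set
my:extras eye:shortName "extras" ;
    eye:inspector "com.example.inspect.DateRangeInspector" ;
    eye:includeByName "defaultInspectors" .
```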
Each such property value must be a plain string literal whose value is the full name of the Inspector class to load and run; see the Javadoc of Inspector for details.\nAn inspector resource may refer to other inspector resources to include their inspectors, using either of the two properties eye:include or eye:includeByName. The value of an include property should be another inspector resource; the value of an includeByName property should be the shortName of an inspector resource.\nConfiguring the URI inspector As well as applying the standard URI rules, Eyeball allows extra pattern-oriented checks to be applied to URIs. These are specified by eye:check properties of the URIInspector object in the configuration.\nThe object of an eye:check property is a bnode with eye:prefix, eye:prohibit, and eye:require properties. The objects of these properties must be string literals.\nIf a URI U can be split into a prefix P and suffix S, and there is a check property with that prefix, and either:\nthere\u0026rsquo;s a prohibit property and S matches the object of that property, or there\u0026rsquo;s a require property and S does not match the object of that property, then a problem is reported. If there are multiple prohibits, then a problem is reported if any prohibition is violated; if there are multiple requires, a problem is reported if none of them succeed.\neye:URIInspector eye:check [eye:prefix \u0026quot;urn:x-hp:\u0026quot;; eye:prohibit \u0026quot;.*:.*\u0026quot;] ; [eye:prefix \u0026quot;\u0026quot;; eye:require \u0026quot;.*eyeball.*\u0026quot;] The prefixes, requires, and prohibits are treated as Java patterns. The URI inspector can be configured to report URIs with an empty local name. These arise because the meaning of \u0026ldquo;local name\u0026rdquo; comes from XML, and in XML a local name must start with an NCName character, typically a letter but not a digit. Hence URIs like have an empty local name. 
This is sometimes confusing.\nTo report empty local names, add the property eye:reportEmptyLocalNames to the inspector eye:URIInspector with the property value true. You may edit the configuration file or use the -set command-line option.\nConfiguring the vocabulary inspector The vocabulary inspector defaults to assuming that schema namespaces are closed. To disable this for specified namespaces, the inspector object in the configuration can be given eye:openNamespace properties.\nThe object of each of these properties must be a resource; the URI of this resource is an open namespace for which the inspector will not report problems.\neye:VocabularyInspector eye:openNamespace \u0026lt;\u0026gt; Configuring the SPARQL-driven inspector The SPARQL inspector object in the configuration may be given eye:sparql properties whose objects are resources specifying SPARQL queries and problem messages.\neye:SparqlDrivenInspector eye:sparql [...] The resource may specify a SPARQL query which must succeed in the model, and a message to produce if it does not.\neye:SparqlDrivenInspector eye:sparql [eye:require \u0026quot;select * where {?s ?p ?o}\u0026quot;; eye:message \u0026quot;must be non-empty\u0026quot;] If the query is non-trivial, the string may contain a reference to a file containing the query, rather than the entire query.\neye:require \u0026quot;@'/home/kers/example/query-one.sparql'\u0026quot; The quoted filename is read using the Jena file manager and so respects any filename mappings. 
\u0026ldquo;@\u0026rdquo; characters not followed by \u0026ldquo;\u0026rsquo;\u0026rdquo; are not subject to substitution, except that the sequence \u0026ldquo;@@\u0026rdquo; is replaced by \u0026ldquo;@\u0026rdquo;.\nUsing eye:prohibit rather than eye:require means that the problem is reported if the query succeeds, rather than if it fails.\nConfiguring renderers The renderer class that Eyeball uses to render the report into text is given in the config file by triples of the form:\n[] eye:renderer FullClassName ; eye:shortName ShortClassHandle The FullClassName is a string literal giving the full class name of the rendering class. That class must implement the Renderer interface and have a constructor that takes a Resource, its configuration root, as its argument.\nThe ShortClassHandle is a string literal giving the short name used to refer to the class. The default short name used is default. There should be no more than one eye:shortName statement with the same ShortClassHandle in the configuration file, but the same class can have many different short names.\nThe TextRenderer supports an additional property eye:labels to allow the appropriate labels for an ontology to be supplied to the renderer. Each object of an eye:labels statement names a model; all the rdfs:label statements in that model are used to supply strings which are used to render resources.\nThe model names are strings which are interpreted by Jena\u0026rsquo;s FileManager, so they may be redirected using Jena\u0026rsquo;s file mappings.\nInside the Eyeball code Eyeball can be used from within Java code; the command line merely provides a convenient external interface.\nCreating an Eyeball An Eyeball object has three subcomponents: the assumptions against which the model is to be checked, the inspectors which do the checking, and the renderer used to display the reports.\nThe assumptions are bundled into a single OntModel. 
Multiple assumptions can be supplied either by adding them as sub-models or by loading their content directly into the OntModel.\nThe inspectors are supplied as a single Inspector object. The method Inspector.Operations.create(List) creates a single Inspector from a list of Inspectors; this inspector delegates all its inspection methods to all of its sub-inspectors.\nThe renderer can be anything that implements the (simple) renderer interface.\nTo create an Eyeball:\nEyeball eyeball = new Eyeball( inspector, assumptions, renderer ); To eyeball a model Models to be inspected are provided as OntModels. The problems are delivered to a Report object, where they are represented as an RDF model.\neyeball.inspect( report, ontModelToBeInspected ) The result is that same report object. The Report::model() method delivers an RDF model which describes the problems found by the inspection. The inspections supplied in the distribution use the EYE vocabulary, and are used in the standard reports:\nEvery report item in the model is a blank node with rdf:type eye:Item. See earlier sections for the descriptions of the properties attached to an Item.\nRebuilding Eyeball The provided ant script can be used to rebuild Eyeball from source:\nant clean build jar (Omitting clean will do an incremental build, useful for small changes.)\nThe libraries required by Eyeball are all in the lib directory, including the necessary Jena jars.\nCreating and configuring an inspector To make a new inspector available to Eyeball, a new Inspector class must be created and that class has to be described in the Eyeball configuration.\nCreating an Inspector Any inspector must implement the Inspector interface, which has four operations:\nbegin( Report r, OntModel assume ): Begin a new inspection. r is the Report object which will accept the reports in this inspection; assume is the model containing the assumed ontologies. begin is responsible for declaring this inspector\u0026rsquo;s report properties. 
inspectModel( Report r, OntModel m ): Do a whole-model inspection of m, issuing reports to r. inspectStatement( Report r, Statement s ): Inspect the single statement s, issuing reports to r. end( Report r ): Do any tidying-up reports required. Typically end and one of inspectModel or inspectStatement do nothing.\nAn inspector must also have a constructor that takes a Resource argument. When Eyeball creates the Inspector object, it passes the Resource which is the root of this inspector\u0026rsquo;s configuration. (This is, for example, how the SPARQL-driven inspector receives the query strings to use.)\nDevelopers may find the class InspectorBase useful; it has empty implementations for all the Inspector methods. They may also find InspectorTestBase useful when writing their inspector\u0026rsquo;s tests, both for its convenience methods and because it requires that their class has the appropriate constructors.\nReports and report properties Eyeball reports are statements in a report model. To let the renderer know which property of a report is the \u0026ldquo;main\u0026rdquo; one, and which order the other properties should appear in, the inspector\u0026rsquo;s begin method should declare the properties:\nr.declareProperty( EYE.badDatatypeURI ); r.declareOrder( EYE.badLanguage, EYE.onLiteral ); declareProperty(P) announces that P is a report property of this inspector. declareOrder(F,S) says that both F and S are report properties, and that F should appear before S in the rendered report.\nReports are made up of report items, which are the subjects of the report properties. To create a report item, use one of reportItem() or reportItem(S). 
The second form is appropriate when the report is attached to some statement S of the model being inspected; a report renderer will attempt to display S.\nTo add the main property to a report item R, use R.addMainProperty(P,O); to add non-main properties, use R.addProperty(P,O).\nConfiguring an inspector To add an inspector to a configuration file, choose a URI for it (here we\u0026rsquo;re using my:Fresh and assuming a prefix declaration for my:) and a short name (here, \u0026ldquo;fresh\u0026rdquo;) and add a description to the configuration file:\nmy:Fresh a eye:Inspector ; eye:shortName \u0026quot;fresh\u0026quot; ; rdfs:label \u0026quot;fresh checks for my application\u0026quot; ; eye:className \u0026quot;\u0026quot; . Replace with the full classname of your inspector. Now you can use Fresh by adding -include fresh to the Eyeball command line (and ensuring that the class is on your classpath).\nIf you want Fresh to be included by default, then you must add it as an eye:inspector property of the configuration root, eg:\neye:eyeball a eye:Eyeball ; eye:inspector eye:PrefixInspector, # as delivered my:FreshInspector, # new inspector eye:URIInspector, # as delivered ... ","permalink":"","tags":null,"title":"Jena Eyeball manual"},{"categories":null,"contents":"This extension to ARQ combines SPARQL and full text search via Lucene. It gives applications the ability to perform indexed full text searches within SPARQL queries. Here is a version compatibility table:\nJena Lucene Solr ElasticSearch up to 3.2.0 5.x or 6.x 5.x or 6.x not supported 3.3.0 - 3.9.0 6.4.x not supported 5.2.2 - 5.2.13 3.10.0 7.4.0 not supported 6.4.2 3.15.0 - 3.17.0 7.7.x not supported 6.8.6 4.0.0 - 4.6.1 8.8.x not supported not supported 4.7.0 - current 9.4.x not supported not supported Note: In Lucene 9, the default setup of the StandardAnalyzer changed to having no stop words. 
For more details, see analyzer specifications below.\nSPARQL allows the use of regex in FILTERs which is a test on a value retrieved earlier in the query so its use is not indexed. For example, if you\u0026rsquo;re searching for occurrences of \u0026quot;printer\u0026quot; in the rdfs:label of a bunch of products:\nPREFIX ex: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; SELECT ?s ?lbl WHERE { ?s a ex:Product ; rdfs:label ?lbl FILTER regex(?lbl, \u0026quot;printer\u0026quot;, \u0026quot;i\u0026quot;) } then the search will need to examine all selected rdfs:label statements and apply the regular expression to each label in turn. If there are many such statements and many such uses of regex, then it may be appropriate to consider using this extension to take advantage of the performance potential of full text indexing.\nText indexes provide additional information for accessing the RDF graph by allowing the application to have indexed access to the internal structure of string literals rather than treating such literals as opaque items. Unlike FILTER, an index can set the values of variables. 
Assuming appropriate configuration, the above query can use full text search via the ARQ property function extension, text:query:\nPREFIX ex: \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; PREFIX text: \u0026lt;\u0026gt; SELECT ?s ?lbl WHERE { ?s a ex:Product ; text:query (rdfs:label 'printer') ; rdfs:label ?lbl } This query makes a text query for 'printer' on the rdfs:label property; and then looks in the RDF data and retrieves the complete label for each match.\nThe full text engine can be either Apache Lucene hosted with Jena on a single machine, or Elasticsearch for a large scale enterprise search application where the full text engine is potentially distributed across separate machines.\nThis example code illustrates creating an in-memory dataset with a Lucene index.\nArchitecture In general, a text index engine (Lucene or Elasticsearch) indexes documents where each document is a collection of fields, the values of which are indexed so that searches matching contents of specified fields can return a reference to the document containing the fields with matching values.\nThere are two models for extending Jena with text indexing and search:\nOne Jena triple equals one Lucene document\nOne Lucene document equals one Jena entity\nOne triple equals one document The basic Jena text extension associates a triple with a document, the property of the triple with a field of a document, and the object of the triple (which must be a literal) with the value of the field in the document. The subject of the triple then becomes another field of the document that is returned as the result of a search match to identify what was matched. (NB the particular triple that matched is not identified; only its subject, and optionally the matching literal and match score, are returned.)\nIn this manner, the text index provides an inverted index that maps query string matches to subject URIs.\nA text-indexed dataset is configured with a description of which properties are to be indexed.
When triples are added, any properties matching the description cause a document to be added to the index by analyzing the literal value of the triple object and mapping to the subject URI. On the other hand, it is necessary to specifically configure the text-indexed dataset to delete index entries when the corresponding triples are dropped from the RDF store.\nThe text index uses the native query language of the index: Lucene query language (with restrictions) or Elasticsearch query language.\nOne document equals one entity There are two approaches to creating indexed documents that contain more than one indexed field:\nUsing an externally maintained Lucene index Multiple fields per document When using this integration model, text:query returns the subject URI for the document on which additional triples of metadata may be associated, and optionally the Lucene score for the match.\nExternal content When document content is externally indexed via Lucene and accessed in Jena via a text:TextDataset then the subject URI returned for a search result is considered to refer to the external content, and metadata about the document is represented as triples in Jena with the subject URI.\nThere is no requirement that the indexed document content be present in the RDF data. 
As long as the index contains the indexed text documents matching the index description, text search can be performed with queries that explicitly mention indexed fields in the document.\nThat is, if the content of a collection of documents is externally indexed and the URI naming the document is the result of the text search, then an RDF dataset with the document metadata can be combined with accessing the content by URI.\nThe maintenance of the index is external to the RDF data store.\nExternal applications By using Elasticsearch, other applications can share the text index with SPARQL search.\nDocument structure As mentioned above, when using the (default) one-triple equals one-document model, text indexing of a triple involves associating a Lucene document with the triple. How is this done?\nLucene documents are composed of Fields. Indexing and searching are performed over the contents of these Fields. For an RDF triple to be indexed in Lucene the property of the triple must be configured in the entity map of a TextIndex. This associates a Lucene analyzer with the property which will be used for indexing and search. The property becomes the searchable Lucene Field in the resulting document.\nA Lucene index includes a default Field, which is specified in the configuration, that is the field to search if not otherwise named in the query. In jena-text this field is configured via the text:defaultField property which is then mapped to a specific RDF property via text:predicate (see entity map below).\nThere are several additional Fields that will be included in the document that is passed to the Lucene IndexWriter depending on the configuration options that are used. These additional fields are used to manage the interface between Jena and Lucene and are not generally searchable per se.\nThe most important of these additional Fields is the text:entityField.
This configuration property defines the name of the Field that will contain the URI or blank node id of the subject of the triple being indexed. This property does not have a default and must be specified for most uses of jena-text. This Field is often given the name uri in examples. It is via this Field that ?s is bound in a typical use such as:\nselect ?s where { ?s text:query \u0026quot;some text\u0026quot; } Other Fields that may be configured (text:uidField, text:graphField, and so on) are discussed below.\nGiven the triple:\nex:SomeOne skos:prefLabel \u0026quot;zorn protégé a prés\u0026quot;@fr . The following is an abbreviated illustration of a Lucene document that Jena will create and request Lucene to index:\nDocument\u0026lt; \u0026lt;uri:\u0026gt; \u0026lt;graph:urn:x-arq:DefaultGraphNode\u0026gt; \u0026lt;label:zorn protégé a prés\u0026gt; \u0026lt;lang:fr\u0026gt; \u0026lt;uid:28959d0130121b51e1459a95bdac2e04f96efa2e6518ff3c090dfa7a1e6dcf00\u0026gt; \u0026gt; It may be instructive to refer back to this example when considering the various points below.\nQuery with SPARQL The URI of the text extension property function is more conveniently written:\nPREFIX text: \u0026lt;\u0026gt; ... text:query ... Syntax The following forms are all legal:\n?s text:query 'word' # query\n?s text:query ('word' 10) # with limit on results\n?s text:query (rdfs:label 'word') # query specific property if multiple\n?s text:query (rdfs:label 'protégé' 'lang:fr') # restrict search to French\n(?s ?score) text:query 'word' # query capturing also the score\n(?s ?score ?literal) text:query 'word' # ... and original literal value\n(?s ?score ?literal ?g) text:query 'word' # ...
and the graph The most general form when using the default one-triple equals one-document integration model is:\n( ?s ?score ?literal ?g ) text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy' ) while for the one-document equals one-entity model, the general form is:\n( ?s ?score ) text:query ( 'query string' limit ) and if only the subject URI is needed:\n?s text:query ( 'query string' limit ) Input arguments:\nArgument | Definition\nproperty | (zero or more) property URIs (including prefix name form)\nquery string | Lucene query string fragment\nlimit | (optional) int limit on the number of results\nlang:xx | (optional) language tag spec\nhighlight:yy | (optional) highlighting options\nThe property URI is only necessary if multiple properties have been indexed and the property being searched over is not the default field of the index.\nSince 3.13.0, property may be a list of zero or more (prior to 3.13.0 zero or one) Lucene indexed properties, or a defined text:propList of indexed properties. The meaning is an OR of searches on a variety of properties. This can be used in place of SPARQL level UNIONs of individual text:querys. For example, instead of:\nselect ?foo where { { (?s ?sc ?lit) text:query ( rdfs:label \u0026quot;some query\u0026quot; ). } union { (?s ?sc ?lit) text:query ( skos:altLabel \u0026quot;some query\u0026quot; ). } union { (?s ?sc ?lit) text:query ( skos:prefLabel \u0026quot;some query\u0026quot; ). 
} } it can be more performant to push the unions into the Lucene query by rewriting as:\n(?s ?sc ?lit) text:query ( rdfs:label skos:prefLabel skos:altLabel \u0026quot;some query\u0026quot; ) which creates a Lucene query:\n(altLabel:\u0026quot;some query\u0026quot; OR prefLabel:\u0026quot;some query\u0026quot; OR label:\u0026quot;some query\u0026quot;)\nThe query string syntax conforms to that of the underlying Lucene or, when appropriate, Elasticsearch.\nIn the case of the default one-triple equals one-document model, the Lucene query syntax is restricted to Terms, Term modifiers, Boolean Operators applied to Terms, and Grouping of terms.\nAdditionally, the use of Fields within the query string is supported when using the one-document equals one-entity text integration model.\nWhen using the default model, use of Fields in the query string will generally lead to unpredictable results.\nThe optional limit indicates the maximum hits to be returned by Lucene.\nThe lang:xx specification is an optional string, where xx is a BCP-47 language tag. This restricts searches to field values that were originally indexed with the tag xx. Searches may be restricted to field values with no language tag via \u0026quot;lang:none\u0026quot;.\nThe highlight:yy specification is an optional string where yy are options that control the highlighting of search result literals. See below for details.\nIf both limit and one or more of lang:xx or highlight:yy are present, then limit must precede these arguments.\nIf only the query string is required, the surrounding ( ) may be omitted.\nOutput arguments:\nArgument | Definition\nsubject URI | The subject of the indexed RDF triple.\nscore | (optional) The score for the match.\nliteral | (optional) The matched object literal.\ngraph URI | (optional) The graph URI of the triple.\nproperty URI | (optional) The property URI of the matched triple.\nThe results include the subject URI; the score assigned by the text search engine; and the entire matched literal (if the index has been configured to store literal values). The subject URI may be a variable, e.g., ?s, or a URI. In the latter case the search is restricted to triples with the specified subject. The score, literal, graph URI, and property URI must be variables. The property URI is meaningful when two or more properties are used in the query.\nQuery strings There are several points that need to be considered when formulating SPARQL queries using either of the Lucene integration models.\nAs mentioned above, in the case of the default model the query string syntax is restricted to Terms, Term modifiers, Boolean Operators applied to Terms, and Grouping of terms.\nExplicit use of Fields in the query string is only useful with the one-document equals one-entity model, and will otherwise generally produce unexpected results.
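The fixed-subject restriction described under the output arguments above can be sketched as follows. This is a sketch only: ex:SomePrinter is a hypothetical resource, and rdfs:label is assumed to be an indexed property.

```sparql
# Search restricted to a single known subject: the pattern matches only if
# the indexed rdfs:label of ex:SomePrinter matches 'printer'.
# The score output argument is still bound and can be returned.
SELECT ?score WHERE {
  (ex:SomePrinter ?score) text:query (rdfs:label 'printer')
}
```

Supplying a URI in the subject position avoids scanning all matching documents when only one resource is of interest.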
See Queries across multiple Fields.\nSimple queries The simplest use of the jena-text Lucene integration is like:\n?s text:query \u0026quot;some phrase\u0026quot; This will bind ?s to each entity URI that is the subject of a triple that has the default property and an object literal that matches the argument string, e.g.:\nex:AnEntity skos:prefLabel \u0026quot;this is some phrase to match\u0026quot; This query form will indicate the subjects that have literals that match for the default property, which is determined via the configuration of the text:predicate of the text:defaultField (in the above this has been assumed to be skos:prefLabel).\nFor a non-default property it is necessary to specify the property as an input argument to the text:query:\n?s text:query (rdfs:label \u0026quot;protégé\u0026quot;) (see below for how RDF property names are mapped to Lucene Field names).\nIf this use case is sufficient for your needs you can skip on to the sections on configuration.\nPlease note that the query:\n?s text:query \u0026quot;some phrase\u0026quot; when using the Lucene StandardAnalyzer or similar will treat the query string as an OR of terms: some and phrase. If a phrase search is required then it is necessary to surround the phrase by double quotes, \u0026quot;:\n?s text:query \u0026quot;\\\u0026quot;some phrase\\\u0026quot;\u0026quot; This will only match strings that contain \u0026quot;some phrase\u0026quot;, while the former query will match strings like: \u0026quot;there is a phrase for some\u0026quot; or \u0026quot;this is some of the various sorts of phrase that might be matched\u0026quot;.\nQueries with language tags When working with rdf:langStrings it is necessary that the text:langField has been configured.
Then it is as simple as writing queries such as:\n?s text:query \u0026quot;protégé\u0026quot;@fr to return results where the given term or phrase has been indexed under French in the text:defaultField.\nIt is also possible to use the optional lang:xx argument, for example:\n?s text:query (\u0026quot;protégé\u0026quot; 'lang:fr') . In general, the presence of a language tag, xx, on the query string or lang:xx in the text:query adds AND lang:xx to the query sent to Lucene, so the above example becomes the following Lucene query:\n\u0026quot;label:protégé AND lang:fr\u0026quot; For non-default properties the general form is used:\n?s text:query (skos:altLabel \u0026quot;protégé\u0026quot; 'lang:fr') Note that an explicit language tag on the query string takes precedence over the lang:xx, so the following\n?s text:query (\u0026quot;protégé\u0026quot;@fr 'lang:none') will find French matches rather than matches indexed without a language tag.\nQueries that retrieve literals It is possible to retrieve the literals that Lucene finds matches for assuming that\n\u0026lt;#TextIndex#\u0026gt; text:storeValues true ; has been specified in the TextIndex configuration. So\n(?s ?sc ?lit) text:query (rdfs:label \u0026quot;protégé\u0026quot;) will bind the matching literals to ?lit, e.g.,\n\u0026quot;zorn protégé a prés\u0026quot;@fr Note it is necessary to include a variable to capture the Lucene score even if this value is not otherwise needed since the literal variable is determined by position.\nQueries with graphs Assuming that the text:graphField has been configured, then, when a triple is indexed, the graph that the triple resides in is included in the document and may be used to restrict searches or to retrieve the graph that a matching triple resides in.\nFor example:\nselect ?s ?lit where { graph ex:G2 { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; } . 
} will restrict searches to triples with the default property that reside in graph, ex:G2.\nOn the other hand:\nselect ?g ?s ?lit where { graph ?g { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; } . } will iterate over the graphs in the dataset, searching each in turn for matches.\nIf there is suitable structure to the graphs, e.g., a known rdf:type and depending on the selectivity of the text query and number of graphs, it may be more performant to express the query as follows:\nselect ?g ?s ?lit where { (?s ?sc ?lit) text:query \u0026quot;zorn\u0026quot; . graph ?g { ?s a ex:Item } . } Further, if tdb:unionDefaultGraph true for a TDB dataset backing a Lucene index then it is possible to retrieve the graphs that contain triples resulting from a Lucene search via the fourth output argument to text:query:\nselect ?g ?s ?lit where { (?s ?sc ?lit ?g) text:query \u0026quot;zorn\u0026quot; . } This will generally perform much better than either of the previous approaches when there are large numbers of graphs since the Lucene search will run once and the returned documents carry the containing graph URI for free as it were.\nQueries across multiple Fields As mentioned earlier, the Lucene text index uses the native Lucene query language.\nMultiple fields in the default integration model For the default integration model, since each document has only one field containing searchable text, searching for documents containing multiple fields will generally not find any results.\nNote that the default model provides three Lucene Fields in a document that are used during searching:\nthe field corresponding to the property of the indexed triple, the field for the language of the literal (if configured), and the graph that the triple is in (if configured). 
Given these, it should be clear from the above that the default model constructs a Lucene query from the property, query string, lang:xx, and SPARQL graph arguments.\nFor example, consider the following triples:\nex:SomePrinter rdfs:label \u0026quot;laser printer\u0026quot; ; ex:description \u0026quot;includes a large capacity cartridge\u0026quot; . assuming an appropriate configuration, if we try to retrieve ex:SomePrinter with the following Lucene query string:\n?s text:query \u0026quot;label:printer AND description:\\\u0026quot;large capacity cartridge\\\u0026quot;\u0026quot; then this query cannot find the expected results, since the AND is interpreted by Lucene to indicate that all documents that contain a matching label field and a matching description field are to be returned. Yet, from the discussion above regarding the structure of Lucene documents in jena-text, it is evident that there is not one but in fact two separate documents, one with a label field and one with a description field, so an effective SPARQL query is:\n?s text:query (rdfs:label \u0026quot;printer\u0026quot;) . ?s text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . which leads to ?s being bound to ex:SomePrinter.\nIn other words, when a query is to involve two or more properties of a given entity then it is expressed at the SPARQL level, as it were, versus in Lucene\u0026rsquo;s query language.\nIt is worth noting that the equivalent of a Lucene OR of Fields can be expressed using SPARQL union, though since 3.13.0 this can be expressed in Jena text using a property list - see Input arguments:\n{ ?s text:query (rdfs:label \u0026quot;printer\u0026quot;) . } union { ?s text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . } Suppose the matching literals are required for the above; then:\n(?s ?sc1 ?lit1) text:query (skos:prefLabel \u0026quot;printer\u0026quot;) .
(?s ?sc2 ?lit2) text:query (ex:description \u0026quot;large capacity cartridge\u0026quot;) . will be the appropriate form to retrieve the subject and the associated literals, ?lit1 and ?lit2. (Obviously, in general, the score variables, ?sc1 and ?sc2, must be distinct since it is very unlikely that the scores of the two Lucene queries will ever match).\nThere is no loss of expressiveness of the Lucene query language versus the jena-text integration of Lucene. Any cross-field ANDs are replaced by concurrent SPARQL calls to text:query as illustrated above, and uses of Lucene OR can be converted to SPARQL unions. Uses of Lucene NOT are converted to appropriate SPARQL filters.\nMultiple fields in the one-document equals one-entity model If Lucene documents have been indexed with multiple searchable fields then compound queries expressed directly in the Lucene query language can significantly improve search performance, in particular, where the individual components of the Lucene query generate a lot of results which must be combined in SPARQL.\nIt is possible for a single text query to search multiple fields. Doing this is more complex as it requires either an externally managed text index or code to build the multi-field text documents to be indexed. See Multiple fields per document.\nQueries with Boolean Operators and Term Modifiers On the other hand, the various features of the Lucene query language are all available to be used for searches within a Field.
For example, Boolean Operators on Terms:\n?s text:query (ex:description \u0026quot;(large AND cartridge)\u0026quot;) and\n(?s ?sc ?lit) text:query (ex:description \u0026quot;(includes AND (large OR capacity))\u0026quot;) or fuzzy searches:\n?s text:query (ex:description \u0026quot;include~\u0026quot;) and so on will work as expected.\nAlways surround the query string with ( ) if more than a single term or phrase is involved.\nHighlighting The highlighting option uses the Lucene Highlighter and SimpleHTMLFormatter to insert highlighting markup into the literals returned from search results (hence the text dataset must be configured to store the literals). The highlighted results are returned via the literal output argument. This highlighting feature, introduced in version 3.7.0, does not require re-indexing by Lucene.\nThe simplest way to request highlighting is via 'highlight:'. This will apply all the defaults to the highlighting of the search results:\nOption | Key | Default\nmaxFrags | m: | 3\nfragSize | z: | 128\nstart | s: | RIGHT_ARROW\nend | e: | LEFT_ARROW\nfragSep | f: | DIVIDES\njoinHi | jh: | true\njoinFrags | jf: | true\nFor example, if the query is:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:\u0026quot; ) then a resulting literal binding might be:\n\u0026quot;the quick ↦brown fox↤ jumped over the lazy baboon\u0026quot; The RIGHT_ARROW is Unicode \\u21a6 and the LEFT_ARROW is Unicode \\u21a4. These are chosen to be single characters that in most situations will be very unlikely to occur in resulting literals. The fragSize of 128 is chosen to be large enough that in many situations the matches will result in single fragments. If the literal is larger than 128 characters and there are several matches in the literal then there may be additional fragments separated by the DIVIDES, Unicode \\u2223.\nDepending on the analyzer used and the tokenizer, the highlighting will result in marking each token rather than an entire phrase.
The joinHi option is by default true so that entire phrases are highlighted together rather than as individual tokens as in:\n\u0026quot;the quick ↦brown↤ ↦fox↤ jumped over the lazy baboon\u0026quot; which would result from:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:jh:n\u0026quot; ) The jh and jf boolean options are set false via n. Any other value is true. The defaults for these options have been selected to be reasonable for most applications.\nThe joining is performed post-highlighting via Java String replaceAll rather than using the Lucene Unified Highlighter facility, which requires that term vectors and positions be stored. The joining deletes extra highlighting with only intervening Unicode separators, \\p{Z}.\nThe more conventional output of the Lucene SimpleHTMLFormatter with HTML emphasis markup is achieved via \u0026quot;highlight:s:\u0026lt;em class='hiLite'\u0026gt; | e:\u0026lt;/em\u0026gt;\u0026quot; (highlight options are separated by a Unicode vertical line, \\u007c. The spaces are not necessary). The result with the above example will be:\n\u0026quot;the quick \u0026lt;em class='hiLite'\u0026gt;brown fox\u0026lt;/em\u0026gt; jumped over the lazy baboon\u0026quot; which would result from the query:\n(?s ?sc ?lit) text:query ( \u0026quot;brown fox\u0026quot; \u0026quot;highlight:s:\u0026lt;em class='hiLite'\u0026gt; | e:\u0026lt;/em\u0026gt;\u0026quot; ) Good practice From the above it should be clear that best practice, except in the simplest cases, is to use explicit text:query forms such as:\n(?s ?sc ?lit) text:query (ex:someProperty \u0026quot;a single Field query\u0026quot;) possibly with limit and lang:xx arguments.\nFurther, the query engine does not have information about the selectivity of the text index and so effective query plans cannot be determined programmatically.
It is helpful to be aware of the following two general query patterns.\nQuery pattern 1 – Find in the text index and refine results Access to the text index is first in the query and used to find a number of items of interest; further information is obtained about these items from the RDF data.\nSELECT ?s { ?s text:query (rdfs:label 'word' 10) ; rdfs:label ?label ; rdf:type ?type } The text:query limit argument is useful when working with large indexes to limit results to the higher scoring results – results are returned in the order of scoring by the text search engine.\nQuery pattern 2 – Filter results via the text index By finding items of interest first in the RDF data, the text search can be used to restrict the items found still further.\nSELECT ?s { ?s rdf:type :book ; dc:creator \u0026quot;John\u0026quot; . ?s text:query (dc:title 'word') ; } Configuration The usual way to describe a text index is with a Jena assembler description. Configurations can also be built with code. The assembler describes a \u0026lsquo;text dataset\u0026rsquo; which has an underlying RDF dataset and a text index. The text index describes the text index technology (Lucene or Elasticsearch) and the details needed for each.\nA text index has an \u0026ldquo;entity map\u0026rdquo; which defines the properties to index, the name of the Lucene/Elasticsearch field, and the field used for storing the URI itself.\nFor simple RDF use, there will be one field, mapping a property to a text index field. More complex setups, with multiple properties per entity (URI) are possible.\nThe assembler file can be either the default configuration file (\u0026hellip;/run/config.ttl) or a custom file in the \u0026hellip;run/configuration folder.
Note that you can use several files simultaneously.\nYou have to edit the file (see comments in the assembler code below):\nprovide values for paths and a fixed URI for tdb:DatasetTDB\nmodify the entity map: add the fields you want to index and desired options (filters, tokenizers\u0026hellip;)\nIf your assembler file is run/config.ttl, you can index the dataset with this command:\njava -cp ./fuseki-server.jar jena.textindexer --desc=run/config.ttl Once configured, any data added to the text dataset is automatically indexed as well: Building a Text Index.\nText Dataset Assembler The following is an example of an assembler file defining a TDB dataset with a Lucene text index.\n######## Example of a TDB dataset and text index######################### # The main doc sources are: # - # - # - # See for the destination of this file. ######################################################################### @prefix : \u0026lt;http://localhost/jena_example/#\u0026gt; . @prefix rdf: \u0026lt;\u0026gt; . @prefix rdfs: \u0026lt;\u0026gt; . @prefix tdb: \u0026lt;\u0026gt; . @prefix text: \u0026lt;\u0026gt; . @prefix skos: \u0026lt;\u0026gt; . @prefix fuseki: \u0026lt;\u0026gt; . [] rdf:type fuseki:Server ; fuseki:services ( :myservice ) . :myservice rdf:type fuseki:Service ; # e.g : `s-query --service=http://localhost:3030/myds \u0026quot;select * ...\u0026quot;` fuseki:name \u0026quot;myds\u0026quot; ; # SPARQL query service : /myds fuseki:endpoint [ fuseki:operation fuseki:query ; ]; # SPARQL query service : /myds/query fuseki:endpoint [ fuseki:operation fuseki:query ; fuseki:name \u0026quot;query\u0026quot; ]; # SPARQL update service : /myds/update fuseki:endpoint [ fuseki:operation fuseki:update ; fuseki:name \u0026quot;update\u0026quot; ]; # SPARQL Graph store protocol (read and write) : /myds/data fuseki:endpoint [ fuseki:operation fuseki:gsp-rw ; fuseki:name \u0026quot;data\u0026quot; ]; # The text-enabled dataset fuseki:dataset :text_dataset ; . 
## --------------------------------------------------------------- # A TextDataset is a regular dataset with a text index. :text_dataset rdf:type text:TextDataset ; text:dataset :mydataset ; # \u0026lt;-- replace `:mydataset` with the desired URI text:index \u0026lt;#indexLucene\u0026gt; ; . # A TDB dataset used for RDF storage :mydataset rdf:type tdb:DatasetTDB ; # \u0026lt;-- replace `:mydataset` with the desired URI - as above tdb:location \u0026quot;DB\u0026quot; ; tdb:unionDefaultGraph true ; # Optional . # Text index description \u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:path\u0026gt; ; # \u0026lt;-- replace `\u0026lt;file:path\u0026gt;` with your path (e.g., `\u0026lt;file:/.../fuseki/run/databases/MY_INDEX\u0026gt;`) text:entityMap \u0026lt;#entMap\u0026gt; ; text:storeValues true ; text:analyzer [ a text:StandardAnalyzer ] ; text:queryAnalyzer [ a text:KeywordAnalyzer ] ; text:queryParser text:AnalyzingQueryParser ; text:propLists ( [ . . . ] . . . ) ; text:defineAnalyzers ( [ . . . ] . . . ) ; text:multilingualSupport true ; # optional . # Entity map (see documentation for other options) \u0026lt;#entMap\u0026gt; a text:EntityMap ; text:defaultField \u0026quot;label\u0026quot; ; text:entityField \u0026quot;uri\u0026quot; ; text:uidField \u0026quot;uid\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:map ( [ text:field \u0026quot;label\u0026quot; ; text:predicate skos:prefLabel ] ) . 
See below for more on defining an entity map.\nThe text:TextDataset has two properties:\na text:dataset, e.g., a tdb:DatasetTDB, to contain the RDF triples; and\nan index configured to use either text:TextIndexLucene or text:TextIndexES.\nThe \u0026lt;#indexLucene\u0026gt; instance of text:TextIndexLucene, above, has two required properties:\nthe text:directory file URI which specifies the directory that will contain the Lucene index files – if this has the value \u0026quot;mem\u0026quot; then the index resides in memory;\nthe text:entityMap, \u0026lt;#entMap\u0026gt;, that will define what properties are to be indexed and other features of the index;\nand several optional properties:\ntext:storeValues controls the storing of literal values. It indicates whether values are stored or not – values must be stored for the ?literal return value to be available in text:query in SPARQL.\ntext:analyzer specifies the default analyzer configuration to be used during indexing and querying. This defaults to Lucene\u0026rsquo;s StandardAnalyzer.\ntext:queryAnalyzer specifies an optional analyzer for query that will be used to analyze the query string. If not set, the analyzer used to index a given field is used.\ntext:queryParser is optional and specifies an alternative query parser.\ntext:propLists is optional and allows specifying lists of indexed properties for use in text:query.\ntext:defineAnalyzers is optional and allows specification of additional analyzers, tokenizers and filters.\ntext:multilingualSupport enables Multilingual Support.\nIf using Elasticsearch then an index would be configured as follows:\n\u0026lt;#indexES\u0026gt; a text:TextIndexES ; # A comma-separated list of Host:Port values of the ElasticSearch Cluster nodes. text:serverList \u0026quot;\u0026quot; ; # Name of the ElasticSearch Cluster. If not specified defaults to 'elasticsearch' text:clusterName \u0026quot;elasticsearch\u0026quot; ; # The number of shards for the index.
Defaults to 1 text:shards \u0026quot;1\u0026quot; ; # The number of replicas for the index. Defaults to 1 text:replicas \u0026quot;1\u0026quot; ; # Name of the Index. Defaults to jena-text text:indexName \u0026quot;jena-text\u0026quot; ; text:entityMap \u0026lt;#entMap\u0026gt; ; . and text:index \u0026lt;#indexES\u0026gt; ; would be used in the configuration of :text_dataset.\nTo use a text index assembler configuration in Java code it is necessary to identify the dataset URI to be assembled, such as in:\nDataset ds = DatasetFactory.assemble( \u0026quot;text-config.ttl\u0026quot;, \u0026quot;http://localhost/jena_example/#text_dataset\u0026quot;) ; since the assembler contains two dataset definitions, one for the text dataset, one for the base data. Therefore, the application needs to identify the text dataset by its URI http://localhost/jena_example/#text_dataset.\nLists of Indexed Properties Since 3.13.0, an optional text:TextIndexLucene feature, text:propLists, allows defining lists of Lucene indexed properties that may be used in text:querys. For example:\ntext:propLists ( [ text:propListProp ex:labels ; text:props ( skos:prefLabel skos:altLabel rdfs:label ) ; ] [ text:propListProp ex:workStmts ; text:props ( ex:workColophon ex:workAuthorshipStatement ex:workEditionStatement ) ; ] ) ; The text:propLists is a list of property list definitions. Each property list defines a new property, text:propListProp, that will be used to refer to the list in a text:query, for example, ex:labels and ex:workStmts, above. The text:props is a list of Lucene indexed properties that will be searched over when the property list property is referred to in a text:query. For example:\n?s text:query ( ex:labels \u0026quot;some text\u0026quot; ) . 
will request Lucene to search for documents representing triples, ?s ?p ?o, where ?p is one of: rdfs:label OR skos:prefLabel OR skos:altLabel, matching the query string.\nEntity Map definition A text:EntityMap has several properties that condition what is indexed, what information is stored, and what analyzers are used.\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:defaultField \u0026quot;label\u0026quot; ; text:entityField \u0026quot;uri\u0026quot; ; text:uidField \u0026quot;uid\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:map ( [ text:field \u0026quot;label\u0026quot; ; text:predicate rdfs:label ] ) . Default text field The text:defaultField specifies the default field name that Lucene will use in a query that does not otherwise specify a field. For example,\n?s text:query \u0026quot;\\\u0026quot;bread and butter\\\u0026quot;\u0026quot; will perform a search in the label field for the phrase \u0026quot;bread and butter\u0026quot;.\nEntity field The text:entityField specifies the name of the field that will contain the subject URI that is returned on a match. The value of the property is arbitrary so long as it is unique among the defined names.\nUID Field and automatic document deletion When the text:uidField is defined in the EntityMap then dropping a triple will result in the corresponding document, if any, being deleted from the text index. 
The value, \u0026quot;uid\u0026quot;, is arbitrary and defines the name of a stored field in Lucene that holds a unique ID that represents the triple.\nIf you configure the index via Java code, you need to set this parameter on the EntityDefinition instance, e.g.\nEntityDefinition docDef = new EntityDefinition(entityField, defaultField); docDef.setUidField(\u0026quot;uid\u0026quot;); Note: If you migrate from an index without deletion support to an index with automatic deletion, you will need to rebuild the index to ensure that the uid information is stored.\nLanguage Field The text:langField is the name of the field that will store the language attribute of the literal in the case of an rdf:langString. This Entity Map property is a key element of the Linguistic support with Lucene index described below.\nGraph Field Setting the text:graphField allows graph-specific indexing of the text index to limit searching to a specified graph when a SPARQL query targets a single named graph. The field value is arbitrary and serves to store the graph ID that a triple belongs to when the index is updated.\nThe Analyzer Map The text:map is a list of analyzer specifications as described below.\nConfiguring an Analyzer Text to be indexed is passed through a text analyzer that divides it into tokens and may perform other transformations such as eliminating stop words. If a Lucene or Elasticsearch text index is used, then by default the Lucene StandardAnalyzer is used.\nFrom Jena 4.7.x / Lucene 9.x onwards, the StandardAnalyzer does not default to having English stopwords if no stop words are provided. 
Up until Apache Lucene 8, the default setting included the stopwords:\n\"a\" \"an\" \"and\" \"are\" \"as\" \"at\" \"be\" \"but\" \"by\" \"for\" \"if\" \"in\" \"into\" \"is\" \"it\" \"no\" \"not\" \"of\" \"on\" \"or\" \"such\" \"that\" \"the\" \"their\" \"then\" \"there\" \"these\" \"they\" \"this\" \"to\" \"was\" \"will\" \"with\" In the case of a TextIndexLucene, the default analyzer can be replaced by another analyzer with the text:analyzer property on the text:TextIndexLucene resource in the text dataset assembler, for example with a SimpleAnalyzer:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:analyzer [ a text:SimpleAnalyzer ] . It is possible to configure an alternative analyzer for each field indexed in a Lucene index. For example:\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label ; text:analyzer [ a text:StandardAnalyzer ; text:stopWords (\u0026quot;a\u0026quot; \u0026quot;an\u0026quot; \u0026quot;and\u0026quot; \u0026quot;but\u0026quot;) ] ] ) . will configure the index to analyze values of the \u0026rsquo;text\u0026rsquo; field using a StandardAnalyzer with the given list of stop words.\nOther analyzer types that may be specified are SimpleAnalyzer and KeywordAnalyzer, neither of which has any configuration parameters. See the Lucene documentation for details of what these analyzers do. Jena also provides LowerCaseKeywordAnalyzer, which is a case-insensitive version of KeywordAnalyzer, and ConfigurableAnalyzer.\nSupport for the new LocalizedAnalyzer has been introduced in Jena 3.0.0 to deal with Lucene language-specific analyzers. 
See Linguistic Support with Lucene Index for details.\nSupport for GenericAnalyzers has been introduced in Jena 3.4.0 to allow the use of Analyzers that do not have built-in support, e.g., BrazilianAnalyzer; require constructor parameters not otherwise supported, e.g., a stop words FileReader or a stemExclusionSet; and finally use of Analyzers not included in the bundled Lucene distribution, e.g., a SanskritIASTAnalyzer. See Generic and Defined Analyzer Support below.\nConfigurableAnalyzer ConfigurableAnalyzer was introduced in Jena 3.0.1. It allows more detailed configuration of text analysis parameters by independently selecting a Tokenizer and zero or more TokenFilters, which are applied in order after tokenization. See the Lucene documentation for details on what each tokenizer and token filter does.\nThe available Tokenizer implementations are:\nStandardTokenizer KeywordTokenizer WhitespaceTokenizer LetterTokenizer The available TokenFilter implementations are:\nStandardFilter LowerCaseFilter ASCIIFoldingFilter SelectiveFoldingFilter Configuration is done using the Jena assembler like this:\ntext:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer text:KeywordTokenizer ; text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter) ] From Jena 3.7.0, it is possible to define tokenizers and filters in addition to the built-in choices above that may be used with the ConfigurableAnalyzer. Tokenizers and filters are defined via text:defineAnalyzers in the text:TextIndexLucene assembler section using text:GenericTokenizer and text:GenericFilter.\nAnalyzer for Query New in Jena 2.13.0.\nAn analyzer may be specified for the query string itself; it determines how the query string is broken into search terms. If not set, then the analyzer used for the document will be used. 
The query analyzer is specified on the TextIndexLucene resource:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:queryAnalyzer [ a text:KeywordAnalyzer ] . Alternative Query Parsers New in Jena 3.1.0.\nIt is possible to select a query parser other than the default QueryParser.\nThe available QueryParser implementations are:\nAnalyzingQueryParser: Performs analysis for wildcard queries. This is useful in combination with accent-insensitive wildcard queries.\nComplexPhraseQueryParser: Permits complex phrase query syntax, e.g., \u0026ldquo;(john jon jonathan~) peters*\u0026rdquo;. This is useful for performing wildcard or fuzzy queries on individual terms in a phrase.\nSurroundQueryParser: Provides positional operators (w and n) that accept a numeric distance, as well as boolean operators (and, or, and not), wildcards (* and ?), quoting (with \u0026ldquo;), and boosting (via ^).\nThe query parser is specified on the TextIndexLucene resource:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:queryParser text:AnalyzingQueryParser . Elasticsearch currently doesn\u0026rsquo;t support Analyzers beyond Standard Analyzer.\nConfiguration by Code A text dataset can also be constructed in code as might be done for a purely in-memory setup:\n// Example of building a text dataset with code. // Example is in-memory. // Base dataset Dataset ds1 = DatasetFactory.createMem() ; EntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;text\u0026quot;, RDFS.label) ; // Lucene, in memory. Directory dir = new RAMDirectory(); // Join together into a dataset Dataset ds = TextDatasetFactory.createLucene(ds1, dir, entDef) ; Graph-specific Indexing jena-text supports storing information about the source graph into the text index. 
This allows for more efficient text queries when the query targets only a single named graph. Without graph-specific indexing, text queries do not distinguish named graphs and will always return results from all graphs.\nSupport for graph-specific indexing is enabled by defining the name of the index field to use for storing the graph identifier.\nIf you use an assembler configuration, set the graph field using the text:graphField property on the EntityMap, e.g.\n# Mapping in the index # URI stored in field \u0026quot;uri\u0026quot; # Graph stored in field \u0026quot;graph\u0026quot; # rdfs:label is mapped to field \u0026quot;text\u0026quot; \u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:graphField \u0026quot;graph\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label ] ) . If you configure the index in Java code, you need to use one of the EntityDefinition constructors that support the graphField parameter, e.g.\nEntityDefinition entDef = new EntityDefinition(\u0026quot;uri\u0026quot;, \u0026quot;text\u0026quot;, \u0026quot;graph\u0026quot;, RDFS.label.asNode()) ; Note: If you migrate from a global (non-graph-aware) index to a graph-aware index, you need to rebuild the index to ensure that the graph information is stored.\nLinguistic support with Lucene index Language tags associated with rdf:langString literals occurring in triples may be used to enhance indexing and queries. Sub-sections below detail different settings with the index, and use cases with SPARQL queries.\nExplicit Language Field in the Index The language tag for object literals of triples can be stored (during triple insert/update) into the index to extend query capabilities. 
For that, the text:langField property must be set in the EntityMap assembler:\n\u0026lt;#entMap\u0026gt; a text:EntityMap ; text:entityField \u0026quot;uri\u0026quot; ; text:defaultField \u0026quot;text\u0026quot; ; text:langField \u0026quot;lang\u0026quot; ; . If you configure the index via Java code, you need to set this parameter on the EntityDefinition instance, e.g.\nEntityDefinition docDef = new EntityDefinition(entityField, defaultField); docDef.setLangField(\u0026quot;lang\u0026quot;); Note that configuring the text:langField does not determine a language-specific analyzer. It merely records the tag associated with an indexed rdf:langString.\nSPARQL Linguistic Clause Forms Once the langField is set, you can use it directly inside SPARQL queries. For that, the lang:xx argument allows you to target specific localized values. For example:\n//target english literals ?s text:query (rdfs:label 'word' 'lang:en' ) //target unlocalized literals ?s text:query (rdfs:label 'word' 'lang:none') //ignore language field ?s text:query (rdfs:label 'word') Refer above for further discussion on querying.\nLocalizedAnalyzer You can specify a LocalizedAnalyzer in order to benefit from Lucene language-specific analyzers (stemming, stop words,\u0026hellip;). Like any other analyzer, it can be set as the default for text indexing, for each different field, or for queries.\nUsing an assembler configuration, the text:language property needs to be provided, e.g.:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:analyzer [ a text:LocalizedAnalyzer ; text:language \u0026quot;fr\u0026quot; ] . 
will configure the index to analyze values of the default property field using a FrenchAnalyzer.\nTo configure the same example via Java code, you need to provide the analyzer to the index configuration object:\nTextIndexConfig config = new TextIndexConfig(def); Analyzer analyzer = Util.getLocalizedAnalyzer(\u0026quot;fr\u0026quot;); config.setAnalyzer(analyzer); Dataset ds = TextDatasetFactory.createLucene(ds1, dir, config) ; where def, ds1 and dir are instances of the EntityDefinition, Dataset and Directory classes.\nNote: You do not have to set the text:langField property with a single localized analyzer. Also note that the above configuration will use the FrenchAnalyzer for all strings indexed under the default property regardless of the language tag associated with the literal (if any).\nMultilingual Support Let us suppose that we have many triples with many localized literals in many different languages. It is possible to take all these languages into account for future mixed localized queries. Configure the text:multilingualSupport property to enable indexing and search via localized analyzers based on the language tag:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026quot;mem\u0026quot; ; text:multilingualSupport true; . 
Via Java code, set the multilingual support flag:\nTextIndexConfig config = new TextIndexConfig(def); config.setMultilingualSupport(true); Dataset ds = TextDatasetFactory.createLucene(ds1, dir, config) ; This multilingual index dynamically combines the localized analyzers of all existing languages with the storage of langField properties.\nThe multilingual analyzer becomes the default analyzer, and the Lucene StandardAnalyzer is used when there is no language tag.\nIt is straightforward to refer to different languages in the same text search query:\nSELECT ?s WHERE { { ?s text:query ( rdfs:label 'institut' 'lang:fr' ) } UNION { ?s text:query ( rdfs:label 'institute' 'lang:en' ) } } Hence, the result set of the query will contain \u0026ldquo;institute\u0026rdquo;-related subjects (institution, institutional,\u0026hellip;) in French and in English.\nNote When multilingual indexing is enabled for a property, e.g., rdfs:label, there will actually be two copies of each literal indexed: one under the field name, \u0026ldquo;label\u0026rdquo;, and one under the name \u0026ldquo;label_xx\u0026rdquo;, where \u0026ldquo;xx\u0026rdquo; is the language tag.\nGeneric and Defined Analyzer Support There are many Analyzers that do not have built-in support, e.g., BrazilianAnalyzer; require constructor parameters not otherwise supported, e.g., a stop words FileReader or a stemExclusionSet; or make use of Analyzers not included in the bundled Lucene distribution, e.g., a SanskritIASTAnalyzer. Two features have been added to enhance the utility of jena-text: 1) text:GenericAnalyzer; and 2) text:DefinedAnalyzer. Further, since Jena 3.7.0, features to allow definition of tokenizers and filters are included.\nGeneric Analyzers, Tokenizers and Filters A text:GenericAnalyzer includes a text:class which is the fully qualified class name of an Analyzer that is accessible on the jena classpath. 
This is trivial for Analyzer classes that are included in the bundled Lucene distribution and for other custom Analyzers a simple matter of including a jar containing the custom Analyzer and any associated Tokenizer and Filters on the classpath.\nSimilarly, text:GenericTokenizer and text:GenericFilter allow access to any tokenizers or filters that are available on the Jena classpath. These two types are used only to define tokenizer and filter configurations that may be referred to when specifying a ConfigurableAnalyzer.\nIn addition to the text:class it is generally useful to include an ordered list of text:params that will be used to select an appropriate constructor of the Analyzer class. If there are no text:params in the analyzer specification or if the text:params is an empty list then the nullary constructor is used to instantiate the analyzer. Each element of the list of text:params includes:\nan optional text:paramName of type Literal that is useful to identify the purpose of a parameter in the assembler configuration\na text:paramType which is one of:\ntext:TypeAnalyzer – a subclass of org.apache.lucene.analysis.Analyzer\ntext:TypeBoolean – a java boolean\ntext:TypeFile – the String path to a file materialized as a Reader\ntext:TypeInt – a java int\ntext:TypeString – a java String\ntext:TypeSet – an org.apache.lucene.analysis.CharArraySet\nThe text:paramType is required for the types text:TypeAnalyzer, text:TypeFile and text:TypeSet, but, since Jena 3.7.0, may be implied by the form of the literal for the types: text:TypeBoolean, text:TypeInt and text:TypeString.\na required text:paramValue with an object of the type corresponding to text:paramType\nIn the case of an analyzer parameter the text:paramValue is any text:analyzer resource as described throughout this document.\nAn example of the use of text:GenericAnalyzer to configure an EnglishAnalyzer with stop words and stem exclusions is:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label; 
text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;org.apache.lucene.analysis.en.EnglishAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;the\u0026quot; \u0026quot;a\u0026quot; \u0026quot;an\u0026quot;) ] [ text:paramName \u0026quot;stemExclusionSet\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;ing\u0026quot; \u0026quot;ed\u0026quot;) ] ) ] ] ) . Here is an example of defining an instance of ShingleAnalyzerWrapper:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper\u0026quot; ; text:params ( [ text:paramName \u0026quot;defaultAnalyzer\u0026quot; ; text:paramType text:TypeAnalyzer ; text:paramValue [ a text:SimpleAnalyzer ] ] [ text:paramName \u0026quot;maxShingleSize\u0026quot; ; text:paramType text:TypeInt ; text:paramValue 3 ] ) ] ] ) . If there is a need to use an analyzer with constructor parameter types not included here, then one approach is to define an AnalyzerWrapper that uses available parameter types, such as file, to collect the information needed to instantiate the desired analyzer. An example of such an analyzer is the Kuromoji morphological analyzer for Japanese text that uses constructor parameters of types: UserDictionary, JapaneseTokenizer.Mode, CharArraySet and Set\u0026lt;String\u0026gt;.\nAs mentioned above, the simple types: TypeInt, TypeBoolean, and TypeString may be written without explicitly including text:paramType in the parameter specification. For example:\n[ text:paramName \u0026quot;maxShingleSize\u0026quot; ; text:paramValue 3 ] is sufficient to specify the parameter.\nDefined Analyzers The text:defineAnalyzers feature allows extending the Multilingual Support defined above. 
Further, this feature can also be used to name analyzers defined via text:GenericAnalyzer so that a single (perhaps complex) analyzer configuration can be used in several places.\nFurther, since Jena 3.7.0, this feature is also used to name tokenizers and filters that can be referred to in the specification of a ConfigurableAnalyzer.\nThe text:defineAnalyzers is used with text:TextIndexLucene to provide a list of analyzer definitions:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026lt;file:Lucene\u0026gt; ; text:entityMap \u0026lt;#entMap\u0026gt; ; text:defineAnalyzers ( [ text:addLang \u0026quot;sa-x-iast\u0026quot; ; text:analyzer [ . . . ] ] [ text:defineAnalyzer \u0026lt;#foo\u0026gt; ; text:analyzer [ . . . ] ] ) . References to a defined analyzer may be made in the entity map like:\ntext:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer \u0026lt;#foo\u0026gt; ] Since Jena 3.7.0, a ConfigurableAnalyzer specification can refer to any defined tokenizers and filters, as in:\ntext:defineAnalyzers ( [ text:defineAnalyzer :configuredAnalyzer ; text:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer :ngram ; text:filters ( :asciiff text:LowerCaseFilter ) ] ] [ text:defineTokenizer :ngram ; text:tokenizer [ a text:GenericTokenizer ; text:class \u0026quot;org.apache.lucene.analysis.ngram.NGramTokenizer\u0026quot; ; text:params ( [ text:paramName \u0026quot;minGram\u0026quot; ; text:paramValue 3 ] [ text:paramName \u0026quot;maxGram\u0026quot; ; text:paramValue 7 ] ) ] ] [ text:defineFilter :asciiff ; text:filter [ a text:GenericFilter ; text:class \u0026quot;org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter\u0026quot; ; text:params ( [ text:paramName \u0026quot;preserveOriginal\u0026quot; ; text:paramValue true ] ) ] ] ) ; From 3.8.0, users can use the JenaText custom filter SelectiveFoldingFilter. 
This filter is not part of Apache Lucene, but rather a custom implementation available for JenaText users.\nIt is based on Apache Lucene\u0026rsquo;s ASCIIFoldingFilter, but with the addition of a white-list for characters that must not be replaced. This is especially useful for languages where some special characters and diacritical marks are significant when searching.\nHere\u0026rsquo;s an example:\ntext:defineAnalyzers ( [ text:defineAnalyzer :configuredAnalyzer ; text:analyzer [ a text:ConfigurableAnalyzer ; text:tokenizer :tokenizer ; text:filters ( :selectiveFoldingFilter text:LowerCaseFilter ) ] ] [ text:defineTokenizer :tokenizer ; text:tokenizer [ a text:GenericTokenizer ; text:class \u0026quot;org.apache.lucene.analysis.core.LowerCaseTokenizer\u0026quot; ] ] [ text:defineFilter :selectiveFoldingFilter ; text:filter [ a text:GenericFilter ; text:class \u0026quot;org.apache.jena.query.text.filter.SelectiveFoldingFilter\u0026quot; ; text:params ( [ text:paramName \u0026quot;whitelisted\u0026quot; ; text:paramType text:TypeSet ; text:paramValue (\u0026quot;ç\u0026quot; \u0026quot;ä\u0026quot;) ] ) ] ] ) ; Extending multilingual support The Multilingual Support described above allows for a limited set of ISO 2-letter codes to be used to select from among built-in analyzers using the nullary constructor associated with each analyzer. 
So if one wants to use:\na language not included, e.g., Brazilian; or\nuse additional constructors defining stop words, stem exclusions and so on; or\nrefer to custom analyzers that might be associated with generalized BCP-47 language tags, such as, sa-x-iast for Sanskrit in the IAST transliteration,\nthen text:defineAnalyzers with text:addLang will add the desired analyzers to the multilingual support so that fields with the appropriate language tags will use the appropriate custom analyzer.\nWhen text:defineAnalyzers is used with text:addLang then text:multilingualSupport is implicitly added if not already specified and a warning is put in the log:\ntext:defineAnalyzers ( [ text:addLang \u0026quot;sa-x-iast\u0026quot; ; text:analyzer [ . . . ] ] This adds an analyzer to be used when the text:langField has the value sa-x-iast during indexing and search.\nMultilingual enhancements for multi-encoding searches There are two multilingual search situations that are supported as of 3.8.0:\nSearch in one encoding and retrieve results that may have been entered in other encodings. For example, searching via Simplified Chinese (Hans) and retrieving results that may have been entered in Traditional Chinese (Hant) or Pinyin. This will simplify applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It\u0026rsquo;s all done under the covers in Lucene.\nSearch with queries entered in a lossy, e.g., phonetic, encoding and retrieve results entered with accurate encoding. For example, searching via Pinyin without diacritics and retrieving all possible Hans and Hant triples.\nThe first situation arises when entering triples that include languages with multiple encodings that for various reasons are not normalized to a single encoding. 
In this situation it is helpful to be able to retrieve appropriate result sets without regard for the encodings used at the time that the triples were inserted into the dataset.\nThere are several such languages of interest: Chinese, Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and ideographic variants.\nEncodings may not be normalized when inserting triples for a variety of reasons. A principal one is that the rdf:langString object often must be entered in the same encoding that it occurs in some physical text that is being catalogued. Another is that metadata may be imported from sources that use different encoding conventions and it is desirable to preserve the original form.\nThe second situation arises to provide simple support for phonetic or other forms of lossy search at the time that triples are indexed directly in the Lucene system.\nTo handle the first situation a text assembler predicate, text:searchFor, is introduced that specifies the list of language variants that should be searched whenever a query string of a given encoding (language tag) is used. 
For example, the following text:defineAnalyzers fragment:\n[ text:addLang \u0026quot;bo\u0026quot; ; text:searchFor ( \u0026quot;bo\u0026quot; \u0026quot;bo-x-ewts\u0026quot; \u0026quot;bo-alalc97\u0026quot; ) ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;\u0026quot; ; text:params ( [ text:paramName \u0026quot;segmentInWords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;lemmatize\u0026quot; ; text:paramValue true ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;inputMode\u0026quot; ; text:paramValue \u0026quot;unicode\u0026quot; ] [ text:paramName \u0026quot;stopFilename\u0026quot; ; text:paramValue \u0026quot;\u0026quot; ] ) ] ; ] indicates that when using a search string such as \u0026quot;རྡོ་རྗེ་སྙིང་\u0026quot;@bo the Lucene index should also be searched for matches tagged as bo-x-ewts and bo-alalc97.\nThis is made possible by a Tibetan Analyzer that tokenizes strings in all three encodings into Tibetan Unicode. This is feasible since the bo-x-ewts and bo-alalc97 encodings are one-to-one with Unicode Tibetan. Since all fields with these language tags will have a common set of indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query analyzer to have access to the language tag for the query string along with the various fields that need to be considered.\nSupposing that the query is:\n(?s ?sc ?lit) text:query (\u0026quot;rje\u0026quot;@bo-x-ewts) Then the query formed in TextIndexLucene will be:\nlabel_bo:rje label_bo-x-ewts:rje label_bo-alalc97:rje which is translated using a suitable Analyzer, QueryMultilingualAnalyzer, via Lucene\u0026rsquo;s QueryParser to:\n+(label_bo:རྗེ label_bo-x-ewts:རྗེ label_bo-alalc97:རྗེ) which reflects the underlying Tibetan Unicode term encoding. 
During search, all documents with one of the three fields in the index containing the term \u0026ldquo;རྗེ\u0026rdquo; will be returned, even though the value in the fields label_bo-x-ewts and label_bo-alalc97 for the returned documents will be the original value \u0026ldquo;rje\u0026rdquo;.\nThis support simplifies applications by permitting encoding independent retrieval without additional layers of transcoding and so on. It\u0026rsquo;s all done under the covers in Lucene.\nSolving the second situation simplifies applications by adding appropriate fields and indexing via configuration in the text:defineAnalyzers. For example, the following fragment:\n[ text:defineAnalyzer :hanzAnalyzer ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;TC2SC\u0026quot; ] [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue 0 ] ) ] ; ] [ text:defineAnalyzer :han2pinyin ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;TC2PYstrict\u0026quot; ] [ text:paramName \u0026quot;stopwords\u0026quot; ; text:paramValue false ] [ text:paramName \u0026quot;filterChars\u0026quot; ; text:paramValue 0 ] ) ] ; ] [ text:defineAnalyzer :pinyin ; text:analyzer [ a text:GenericAnalyzer ; text:class \u0026quot;io.bdrc.lucene.zh.ChineseAnalyzer\u0026quot; ; text:params ( [ text:paramName \u0026quot;profile\u0026quot; ; text:paramValue \u0026quot;PYstrict\u0026quot; ] ) ] ; ] [ text:addLang \u0026quot;zh-hans\u0026quot; ; text:searchFor ( \u0026quot;zh-hans\u0026quot; \u0026quot;zh-hant\u0026quot; ) ; text:auxIndex ( \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :hanzAnalyzer ] ; 
] [ text:addLang \u0026quot;zh-hant\u0026quot; ; text:searchFor ( \u0026quot;zh-hans\u0026quot; \u0026quot;zh-hant\u0026quot; ) ; text:auxIndex ( \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :hanzAnalyzer ] ; ] [ text:addLang \u0026quot;zh-latn-pinyin\u0026quot; ; text:searchFor ( \u0026quot;zh-latn-pinyin\u0026quot; \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :pinyin ] ; ] [ text:addLang \u0026quot;zh-aux-han2pinyin\u0026quot; ; text:searchFor ( \u0026quot;zh-latn-pinyin\u0026quot; \u0026quot;zh-aux-han2pinyin\u0026quot; ) ; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer :pinyin ] ; text:indexAnalyzer :han2pinyin ; ] defines language tags for Traditional, Simplified, Pinyin and an auxiliary tag zh-aux-han2pinyin associated with an Analyzer, :han2pinyin. The purpose of the auxiliary tag is to define an Analyzer that will be used during indexing and to specify a list of tags that should be searched when the auxiliary tag is used with a query string.\nSearching is then done via the multi-encoding support discussed above. In this example the Analyzer, :han2pinyin, tokenizes strings in zh-hans and zh-hant as the corresponding pinyin so that at search time a pinyin query will retrieve appropriate triples inserted in Traditional or Simplified Chinese. Such a query would appear as:\n(?s ?sc ?lit ?g) text:query (\u0026quot;jīng\u0026quot;@zh-aux-han2pinyin) The auxiliary field support is needed to accommodate situations such as pinyin or sound-ex which are not exact, i.e., one-to-many rather than one-to-one as in the case of Simplified and Traditional.\nTextIndexLucene adds a field for each of the auxiliary tags associated with the tag of the triple object being indexed. 
These fields are in addition to the un-tagged field and the field tagged with the language of the triple object literal.\nNaming analyzers for later use Repeating a text:GenericAnalyzer specification for use with multiple fields in an entity map may be cumbersome. The text:defineAnalyzer is used in an element of a text:defineAnalyzers list to associate a resource with an analyzer so that it may be referred to later in a text:analyzer object. Assuming that an analyzer definition such as the following has appeared among the text:defineAnalyzers list:\n[ text:defineAnalyzer \u0026lt;#foo\u0026gt; ; text:analyzer [ . . . ] ] then in a text:analyzer specification in an entity map, for example, a reference to analyzer \u0026lt;#foo\u0026gt; is made via:\ntext:map ( [ text:field \u0026quot;text\u0026quot; ; text:predicate rdfs:label; text:analyzer [ a text:DefinedAnalyzer ; text:useAnalyzer \u0026lt;#foo\u0026gt; ] ] ) This makes it straightforward to refer to the same (possibly complex) analyzer definition in multiple fields.\nStoring Literal Values New in Jena 3.0.0.\nIt is possible to configure the text index to store enough information in the text index to be able to access the original indexed literal values at query time. This is controlled by two configuration options. First, the text:storeValues property must be set to true for the text index:\n\u0026lt;#indexLucene\u0026gt; a text:TextIndexLucene ; text:directory \u0026quot;mem\u0026quot; ; text:storeValues true; . Or, using Java code, use the setValueStored method of TextIndexConfig:\nTextIndexConfig config = new TextIndexConfig(def); config.setValueStored(true); Additionally, setting the langField configuration option is recommended. See Linguistic Support with Lucene Index for details. 
Without the langField setting, the stored literals will not have language tag or datatype information.\nAt query time, the stored literals can be accessed by using a 3-element list of variables as the subject of the text:query property function. The literal value will be bound to the third variable:\n(?s ?score ?literal) text:query 'word' Working with Fuseki The Fuseki configuration simply points to the text dataset as the fuseki:dataset of the service.\n\u0026lt;#service_text_tdb\u0026gt; rdf:type fuseki:Service ; rdfs:label \u0026quot;TDB/text service\u0026quot; ; fuseki:name \u0026quot;ds\u0026quot; ; fuseki:serviceQuery \u0026quot;query\u0026quot; ; fuseki:serviceQuery \u0026quot;sparql\u0026quot; ; fuseki:serviceUpdate \u0026quot;update\u0026quot; ; fuseki:serviceReadGraphStore \u0026quot;get\u0026quot; ; fuseki:serviceReadWriteGraphStore \u0026quot;data\u0026quot; ; fuseki:dataset :text_dataset ; . Building a Text Index When working at scale, or when preparing a published, read-only, SPARQL service, creating the index by loading the text dataset is impractical.\nThe index and the dataset can be built using command line tools in two steps: first load the RDF data, second create an index from the existing RDF dataset.\nStep 1 - Building a TDB dataset Note: If you have an existing TDB dataset then you can skip this step\nBuild the TDB dataset:\njava -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --tdb=assembler_file data_file using the copy of TDB included with Fuseki.\nAlternatively, use one of the TDB utilities tdbloader or tdbloader2 which are better for bulk loading:\n$JENA_HOME/bin/tdbloader --loc=directory data_file Step 2 - Build the Text Index You can then build the text index with the jena.textindexer tool:\njava -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=assembler_file Because a Fuseki assembler description can have several datasets descriptions, and several text indexes, it may be necessary to extract a single dataset and index 
description into a separate assembler file for use in loading.\nUpdating the index If you allow updates to the dataset through Fuseki, the configured index will automatically be updated on every modification. This means that you do not have to run the above-mentioned jena.textindexer after updates, only when you want to rebuild the index from scratch.\nConfiguring Alternative TextDocProducers Default Behavior The default behavior when performing text indexing is to index a single property as a single field, generating a different Document for each indexed triple. This behavior may be augmented by writing and configuring an alternative TextDocProducer.\nPlease note that TextDocProducer.change(...) is called once for each triple that is ADDed or DELETEd, and thus cannot be directly used to accumulate multiple properties for use in composing a single multi-fielded Lucene document. See below.\nTo configure a TextDocProducer, say net.code.MyProducer, in a dataset assembly, use the property textDocProducer, eg:\n\u0026lt;#ds-with-lucene\u0026gt; rdf:type text:TextDataset; text:index \u0026lt;#indexLucene\u0026gt; ; text:dataset \u0026lt;#ds\u0026gt; ; text:textDocProducer \u0026lt;java:net.code.MyProducer\u0026gt; ; . where net.code.MyProducer is the full Java class name. It must have either a single-argument constructor of type TextIndex, or a two-argument constructor (DatasetGraph, TextIndex). The TextIndex argument will be the configured text index, and the DatasetGraph argument will be the graph of the configured dataset.\nFor example, to explicitly create the default TextDocProducer use:\n... text:textDocProducer \u0026lt;java:org.apache.jena.query.text.TextDocProducerTriples\u0026gt; ; ... TextDocProducerTriples produces a new document for each subject/field added to the dataset, using TextIndex.addEntity(Entity).\nExample The example class below is a TextDocProducer that only indexes ADDs of quads for which the subject already had at least one property-value. 
It uses the two-argument constructor to give it access to the dataset so that it can count the (?G, S, P, ?O) quads with that subject and predicate, and delegates the indexing to TextDocProducerTriples if there are at least two values for that property (one of those values, of course, is the one that gives rise to this change()).\npublic class Example extends TextDocProducerTriples { final DatasetGraph dg; public Example(DatasetGraph dg, TextIndex indexer) { super(indexer); this.dg = dg; } public void change(QuadAction qaction, Node g, Node s, Node p, Node o) { if (qaction == QuadAction.ADD) { if (alreadyHasOne(s, p)) super.change(qaction, g, s, p, o); } } private boolean alreadyHasOne(Node s, Node p) { int count = 0; Iterator\u0026lt;Quad\u0026gt; quads = dg.find( null, s, p, null ); while (quads.hasNext()) { quads.next(); count += 1; } return count \u0026gt; 1; } } Multiple fields per document In principle it should be possible to extend Jena to allow for creating documents with multiple searchable fields by extending org.apache.jena.sparql.core.DatasetChangesBatched such as with org.apache.jena.query.text.TextDocProducerEntities; however, this form of extension is not currently (Jena 3.13.1) functional.\nMaven Dependency The jena-text module is included in Fuseki. To use it within application code, use the following Maven dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-text\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; adjusting the version X.Y.Z as necessary. 
This will automatically include a compatible version of Lucene.\nFor the Elasticsearch implementation, you can include the following Maven dependency:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-text-es\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;X.Y.Z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; adjusting the version X.Y.Z as necessary.\n","permalink":"","tags":null,"title":"Jena Full Text Search"},{"categories":null,"contents":" Jena Core JavaDoc ARQ JavaDoc (SPARQL) TDB JavaDoc RDF Connection Fuseki JavaDoc Fuseki2 Webapp Fuseki2 Main Text search SHACL ShEx RDF Patch GeoSPARQL Query Builder Service Enhancer Security Permissions JavaDoc JDBC JavaDoc ","permalink":"","tags":null,"title":"Jena JavaDoc"},{"categories":null,"contents":"Jena JDBC is a set of libraries which provide SPARQL over JDBC driver implementations.\nThis is a pure SPARQL over JDBC implementation; there is no attempt to present the underlying RDF data model as a relational model through the driver, and only SPARQL queries and updates are supported.\nIt provides type 4 drivers in that they are pure Java based, but the drivers are not JDBC compliant since by definition they do not support SQL.\nThis means that the drivers can be used with JDBC tools provided that those tools don\u0026rsquo;t restrict you to SQL or auto-generate SQL. So it can be used with a tool like SquirrelSQL since you can freely enter SPARQL queries and updates. 
Conversely it cannot be used with a tool like a SQL based ORM which generates SQL.\nDocumentation Overview Basic Usage Alternatives Jena JDBC Drivers Maven Artifacts for Jena JDBC Implementing a custom Jena JDBC Driver Overview Jena JDBC aims to be a pure SPARQL over JDBC driver, it assumes that all commands that come in are either SPARQL queries or updates and processes them as such.\nAs detailed on the drivers page there are actually three drivers provided currently:\nIn-Memory - uses an in-memory dataset to provide non-persistent storage TDB - uses a TDB dataset to provide persistent and transactional storage Remote Endpoint - uses HTTP based remote endpoints to access any SPARQL protocol compliant storage These are all built on a core library which can be used to build custom drivers if desired. This means that all drivers share common infrastructure and thus exhibit broadly speaking the same behavior around handling queries, updates and results.\nJena JDBC is published as a Maven module via its maven artifacts. The source for Jena JDBC may be downloaded as part of the source distribution.\nTreatment of Results One important behavioral aspect to understand is how results are treated compared to a traditional JDBC driver. SPARQL provides four query forms and thus four forms of results while JDBC assumes all results have a simple tabular format. Therefore one of the main jobs of the core library is to marshal the results of each kind of query into a tabular format. For SELECT queries this is a trivial mapping, for CONSTRUCT and DESCRIBE the triples are mapped to columns named Subject, Predicate and Object respectively, finally for ASK the boolean is mapped to a single column named ASK.\nThe second issue is that JDBC expects uniform column typing throughout a result set which is not something that holds true for SPARQL results. Therefore the core library takes a pragmatic approach to column typing and makes the exact behavior configurable by the user. 
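The triple-to-column marshalling described above can be sketched as follows (a deliberately simplified model for illustration, not the actual Jena JDBC internals): each triple of a CONSTRUCT or DESCRIBE result becomes one row of a fixed three-column table.

```java
import java.util.*;

// Illustrative sketch only -- not the actual Jena JDBC internals.
// Shows how CONSTRUCT/DESCRIBE triples can be marshalled into the
// three-column (Subject, Predicate, Object) tabular form described above.
public class TripleTableSketch {
    static final List<String> COLUMNS = List.of("Subject", "Predicate", "Object");

    // Each triple is a String[3]; each output row maps column name to value.
    static List<Map<String, String>> toRows(List<String[]> triples) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String[] t : triples) {
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < 3; i++) row.put(COLUMNS.get(i), t[i]);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] { "http://x", "http://y", "http://z" });
        List<Map<String, String>> rows = toRows(triples);
        System.out.println(rows.get(0).get("Subject")); // http://x
    }
}
```

A SELECT result needs no such mapping, and an ASK result reduces to a single-row table with the one column named ASK.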
The default behavior of the core library is to type all columns as Types.NVARCHAR with a Java type of String; this provides the widest compatibility possible with both the SPARQL results and consuming tools since we can treat everything as a string. We refer to this default behavior as medium compatibility; it is sufficient to allow JDBC tools to interpret results for basic display but may be unsuitable for further processing.\nWe then provide two alternatives. The first, which we refer to as high compatibility, aims to present the data in a way that is more amenable to subsequent processing by JDBC tools. In this mode the column types in a result set are detected by sniffing the data in the first row of the result set and assigning appropriate types. For example if the first row for a given column has the value \u0026quot;1234\u0026quot;^^xsd:integer then it would be assigned the type Types.BIGINT and have the Java type of Long. Doing this allows JDBC tools to carry out subsequent calculations on the data in a type appropriate way. It is important to be aware that this sniffing may not be accurate for the entire result set so can still result in errors processing some rows.\nThe second alternative, which we refer to as low compatibility, is designed for users who are using the driver directly and are fully aware that they are writing SPARQL queries and getting SPARQL results. In this mode we make no effort to type columns in a friendly way, instead typing them as Types.JAVA_OBJECT with the Java type Node (i.e. the Jena Node class).\nRegardless of how you configure column typing, the core library does its best to allow you to marshal values into strong types. 
For example even if using default compatibility and your columns are typed as strings from a JDBC perspective you can still call getLong(\u0026quot;column\u0026quot;) and if there is a valid conversion the library will make it for you.\nAnother point of interest is around our support of different result set types. The drivers support both ResultSet.TYPE_FORWARD_ONLY and ResultSet.TYPE_SCROLL_INSENSITIVE, note that regardless of the type chosen and the underlying query type all result sets are ResultSet.CONCUR_READ_ONLY i.e. the setLong() style methods cannot be used to update the underlying RDF data. Users should be aware that the default behavior is to use forward only result sets since this allows the drivers to stream the results and minimizes memory usage. When scrollable result sets are used the drivers will cache all the results into memory which can use lots of memory when querying large datasets.\nBasic Usage The following takes you through the basic usage of the in-memory JDBC driver. 
The code should be familiar to anyone who has used JDBC before and is easily used with our other drivers simply by changing the connection URL appropriately.\nEstablishing a Connection Firstly we should ensure that the driver we wish to use is registered with the JDBC driver manager; a static method is provided for this:\nMemDriver.register(); Once this is done we can then make a JDBC connection just by providing an appropriate connection URL:\n// Make a connection using the In-Memory driver starting from an empty dataset Connection conn = DriverManager.getConnection(\u0026quot;jdbc:jena:mem:empty=true\u0026quot;); Now we can go ahead and use the connection as we would normally.\nPerforming Queries You make queries as you would with any JDBC driver, the only difference being that the queries must be SPARQL:\n// Need a statement Statement stmt = conn.createStatement(); try { // Make a query ResultSet rset = stmt.executeQuery(\u0026quot;SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 100\u0026quot;); // Iterate over results while ( { // Print out type as a string System.out.println(rset.getString(\u0026quot;type\u0026quot;)); } // Clean up rset.close(); } catch (SQLException e) { System.err.println(\u0026quot;SQL Error - \u0026quot; + e.getMessage()); } finally { stmt.close(); } Performing Updates You make updates as you would with any JDBC driver. 
Again the main difference is that updates must be SPARQL. One downside of this is that SPARQL provides no way to indicate the number of triples/quads affected by an update, so the JDBC driver will either return 0 for successful updates or throw a SQLException for failed updates:\n// Need a statement Statement stmt = conn.createStatement(); // Make an update try { stmt.executeUpdate(\u0026quot;INSERT DATA { \u0026lt;http://x\u0026gt; \u0026lt;http://y\u0026gt; \u0026lt;http://z\u0026gt; }\u0026quot;); System.out.println(\u0026quot;Update succeeded\u0026quot;); } catch (SQLException e) { System.out.println(\u0026quot;Update Failed - \u0026quot; + e.getMessage()); } finally { // Clean up stmt.close(); } Alternatives If Jena JDBC does not fulfill your use case you may also be interested in some 3rd party projects which do SPARQL over JDBC in other ways:\nClaude Warren\u0026rsquo;s jdbc4sparql - An alternative approach that does expose the underlying RDF data model as a relational model and supports translating SQL into SPARQL William Greenly\u0026rsquo;s jdbc4sparql - A similar approach to Jena JDBC restricted to accessing HTTP based SPARQL endpoints Paul Gearon\u0026rsquo;s scon - A similar approach to Jena JDBC restricted to accessing HTTP based SPARQL endpoints ","permalink":"","tags":null,"title":"Jena JDBC - A SPARQL over JDBC driver framework"},{"categories":null,"contents":"Jena JDBC comes with three built-in drivers by default, with the option of building custom drivers if desired. 
This page covers the differences between the provided drivers and the connection URL options for each.\nConnection URL Basics Connection URLs for Jena JDBC drivers have a common format, they all start with the following:\njdbc:jena:foo: Where foo is a driver specific prefix that indicates which specific driver implementation is being used.\nAfter the prefix the connection URL consists of a sequence of key value pairs, the characters ampersand (\u0026amp;), semicolon (;) and pipe (|) are considered to be separators between pairs, the separators are reserved characters and may not be used in values. The key is separated from the value by a equals sign (=) though unlike the separators this is not a reserved character in values.\nThere is no notion of character escaping in connection parameters so if you need to use any of the reserved characters in your values then you should pass these to the connect(String, Properties) method directly in the Properties object.\nCommon Parameters There are some common parameter understood by all Jena JDBC drivers and which apply regardless of driver implementation.\nJDBC Compatibility Level As discussed in the overview the drivers have a notion of JDBC compatibility which is configurable. The jdbc-compatibility parameter is used in connection URLs. To avoid typos when creating URLs programmatically a constant (JenaDriver.PARAM_JDBC_COMPATIBILITY) is provided which contains the parameter name exactly as the code expects it. This parameter provides an integer value in the range 1-9 which denotes how compatible the driver should attempt to be. 
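The URL conventions just described (a jdbc:jena:foo: prefix followed by key=value pairs split on the reserved separators ampersand, semicolon and pipe) can be sketched with a simplified parser. This is illustrative only, not the driver's own code, and it omits the Properties-based fallback for values containing reserved characters:

```java
import java.util.*;

// Sketch of Jena JDBC connection URL parsing as described above.
// Splits the part after the "jdbc:jena:foo:" prefix into key=value
// pairs on the reserved separators '&', ';' and '|'.
public class ConnectionUrlSketch {
    static Map<String, String> parseParams(String url) {
        // Skip past "jdbc:jena:" and the driver-specific prefix "foo:"
        int prefixEnd = url.indexOf(':', "jdbc:jena:".length()) + 1;
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : url.substring(prefixEnd).split("[&;|]")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('='); // '=' is not a reserved character in values
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = parseParams("jdbc:jena:mem:empty=true&jdbc-compatibility=5");
        System.out.println(p); // {empty=true, jdbc-compatibility=5}
    }
}
```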
See the aforementioned overview documentation for more information on the interpretation of this parameter.\nWhen not set, the default compatibility level is used; note that JenaConnection objects support changing this after the connection has been established.\nPre-Processors The second of the common parameters is the pre-processor parameter which is used to specify one/more CommandPreProcessor implementations to use. The parameter should be specified once for each pre-processor you wish to use and you should supply a fully qualified class name to ensure the pre-processor can be loaded and registered on your connections. The driver will report an error if you specify a class that cannot be appropriately loaded and registered.\nPre-processors are registered in the order that they are specified so if you use multiple pre-processors and they have ordering dependencies please ensure that you specify them in the desired order. Note that JenaConnection objects support changing registered pre-processors after the connection has been established.\nPost-Processors There is also a post-processor parameter which is used to specify one/more ResultsPostProcessor implementations to use. The parameter should be specified once for each post-processor you wish to use and you should supply a fully qualified class name to ensure the post-processor can be loaded and registered on your connections. The driver will report an error if you specify a class that cannot be appropriately loaded and registered.\nPost-processors are registered in the order that they are specified so if you use multiple post-processors and they have ordering dependencies please ensure that you specify them in the desired order. 
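Putting the common parameters together, an in-memory connection URL might look like the following (the processor class names here are hypothetical placeholders, not classes shipped with Jena JDBC):

```
jdbc:jena:mem:empty=true;jdbc-compatibility=5;pre-processor=com.example.MyPreProcessor;post-processor=com.example.MyPostProcessor
```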
Note that JenaConnection objects support changing registered post-processors after the connection has been established.\nAvailable Drivers In-Memory TDB Remote Endpoint Each driver is available as a separate maven artifact, see the artifacts page for more information.\nIn-Memory The in-memory driver provides access to a non-persistent non-transactional in-memory dataset. This dataset may either be initially empty or may be initialized from an input file. Remember that this is non-persistent so even if the latter option is chosen changes are not persisted to the input file. This driver is primarily intended for testing and demonstration purposes.\nBeyond the common parameters it has two possible connection parameters. The first of these is the dataset parameter and is used to indicate an input file that the driver will initialize the in-memory dataset with e.g.\njdbc:jena:mem:dataset=file.nq If you prefer to start with an empty dataset you should use the empty parameter instead e.g.\njdbc:jena:mem:empty=true If both are specified then the dataset parameter has precedence.\nTDB The TDB driver provides access to a persistent Jena TDB dataset. This means that the dataset is both persistent and can be used transactionally. For correct transactional behavior it is typically necessary to set the holdability for connections and statements to ResultSet.HOLD_CURSORS_OVER_COMMIT as otherwise closing a result set or making an update will cause all other results to be closed.\nBeyond the common parameters the driver requires a single location parameter that provides the path to a location for a TDB dataset e.g.\njdbc:jena:tdb:location=/path/to/data By default a TDB dataset will be created in that location if one does not exist, if you would prefer not to do this i.e. 
ensure you only access existing TDB datasets then you can add the must-exist parameter e.g.\njdbc:jena:tdb:location=/path/to/data\u0026amp;must-exist=true With this parameter set the connection will fail if the location does not exist as a directory, note that this does not validate that the location is a TDB dataset so it is still possible to pass in invalid paths even with this set.\nRemote Endpoint The Remote Endpoint driver provides access to any SPARQL Protocol compliant store that exposes SPARQL query and/or SPARQL update endpoints. This driver can be explicitly configured to be in read-only or write-only mode by providing only one of the required endpoints.\nThe query parameter sets the query endpoint whilst the update parameter sets the update endpoint e.g.\njdbc:jena:remote:query=http://localhost:3030/ds/query\u0026amp;update=http://localhost:3030/ds/update At least one of these parameters is required, if only one is provided you will get a read-only or write-only connection as appropriate.\nThis driver also provides a whole variety of parameters that may be used to customize its behavior further. Firstly there are a set of parameters which control the dataset description provided via the SPARQL protocol:\ndefault-graph-uri - Sets a default graph for queries named-graph-uri - Sets a named graph for queries using-graph-uri - Sets a default graph for updates using-named-graph-uri - Sets a named graph for updates All of these may be specified multiple times to specify multiple graph URIs for each.\nThen you have the select-results-type and model-results-type which are used to set the MIME type you\u0026rsquo;d prefer to have the driver retrieve SPARQL results from the remote endpoints in. 
If used you must set them to formats that ARQ supports, the ARQ WebContent class has constants for the various supported formats.\nAuthentication There is also comprehensive support for authentication using this driver, the standard JDBC user and password parameters are used for credentials and then a selection of driver specific parameters are used to configure how you wish the driver to authenticate.\nUnder the hood authentication uses the new HttpAuthenticator framework introduced in the same release as Jena JDBC, see HTTP Authentication in ARQ. This means that it can support standard HTTP auth methods (Basic, Digest etc) or can use more complex schemes such as forms based auth with session cookies.\nTo set up standard HTTP authentication it is sufficient to specify the user and password fields. As with any JDBC application we strongly recommend that you do not place these in the connection URL directly but rather use the Properties object to pass these in. One option you may wish to include if your endpoints use HTTP Basic authentication is the preemptive-auth parameter which when set to true will enable preemptive authentication. While this is less secure it can be more performant if you are making lots of queries.\nSetting up form based authentication is somewhat more complex, at a minimum you need to provide the form-url parameter with a value for the URL that user credentials should be POSTed to in order to login. You may need to specify the form-user-field and form-password-field parameters to provide the name of the fields for the login request, by default these assume you are using an Apache mod_auth_form protected server and use the appropriate default values.\nThe final option for authenticator is to use the authenticator parameter via the Properties object to pass in an actual instance of a HttpAuthenticator that you wish to use. 
This method is the most powerful in that it allows you to use any authentication method that you need.\n","permalink":"","tags":null,"title":"Jena JDBC Drivers"},{"categories":null,"contents":"This section is a general introduction to the Jena ontology API, including some of the common tasks you may need to perform. We won\u0026rsquo;t go into all of the many details of the API here: you should expect to refer to the Javadoc to get full details of the capabilities of the API.\nPrerequisites We\u0026rsquo;ll assume that you have a basic familiarity with RDF and with Jena. If not, there are other Jena help documents you can read for background on these topics, and a collection of tutorials.\nJena is a programming toolkit, using the Java programming language. While there are a few command-line tools to help you perform some key tasks using Jena, mostly you use Jena by writing Java programs. The examples in this document will be primarily code samples.\nWe also won\u0026rsquo;t be explaining the OWL or RDFS ontology languages in much detail in this document. You should refer to supporting documentation for details on those languages, for example the W3C OWL document index.\nNote: Although OWL version 1.1 is now a W3C recommendation, Jena\u0026rsquo;s support for OWL 1.1 features is limited. We will be addressing this in future versions of Jena.\nOverview This section of the manual is broken into a number of sections. You do not need to read them in sequence, though later sections may refer to concepts and techniques introduced in earlier sections. 
The sections are:\nGeneral concepts Running example: the ESWC ontology Creating ontology models Compound ontology documents and imports processing The generic ontology type: OntResource Ontology classes and basic class expressions Ontology properties More complex class expressions Instances or individuals Ontology meta-data Ontology inference: overview Working with persistent ontologies Experimental ontology tools Further assistance Hopefully, this document will be sufficient to help most readers to get started using the Jena ontology API. For further support, please post questions to the Jena support list, or file a bug report.\nPlease note that we ask that you use the support list or the bug-tracker to communicate with the Jena team, rather than send email to the team members directly. This helps us manage Jena support more effectively, and facilitates contributions from other Jena community members.\nGeneral concepts In a widely-quoted definition, an ontology is\n\u0026ldquo;a specification of a conceptualization\u0026rdquo; [Gruber, T. 1993]\nLet\u0026rsquo;s unpack that brief characterisation a bit. An ontology allows a programmer to specify, in an open, meaningful, way, the concepts and relationships that collectively characterise some domain of interest. Examples might be the concepts of red and white wine, grape varieties, vintage years, wineries and so forth that characterise the domain of \u0026lsquo;wine\u0026rsquo;, and relationships such as \u0026lsquo;wineries produce wines\u0026rsquo;, \u0026lsquo;wines have a year of production\u0026rsquo;. This wine ontology might be developed initially for a particular application, such as a stock-control system at a wine warehouse. As such, it may be considered similar to a well-defined database schema. The advantage to an ontology is that it is an explicit, first-class description. So having been developed for one purpose, it can be published and reused for other purposes. 
For example, a given winery may use the wine ontology to link its production schedule to the stock system at the wine warehouse. Alternatively, a wine recommendation program may use the wine ontology, and a description (ontology) of different dishes to recommend wines for a given menu.\nThere are many ways of writing down an ontology, and a variety of opinions as to what kinds of definition should go in one. In practice, the contents of an ontology are largely driven by the kinds of application it will be used to support. In Jena, we do not take a particular view on the minimal or necessary components of an ontology. Rather, we try to support a variety of common techniques. In this section, we try to explain what is – and to some extent what isn\u0026rsquo;t – possible using Jena\u0026rsquo;s ontology support.\nSince Jena is fundamentally an RDF platform, Jena\u0026rsquo;s ontology support is limited to ontology formalisms built on top of RDF. Specifically this means RDFS, the varieties of OWL. We will provide a very brief introduction to these languages here, but please refer to the extensive on-line documentation for these formalisms for complete and authoritative details.\nRDFS RDFS is the weakest ontology language supported by Jena. RDFS allows the ontologist to build a simple hierarchy of concepts, and a hierarchy of properties. Consider the following trivial characterisation (with apologies to biology-trained readers!):\nTable 1: A simple concept hierarchy\nUsing RDFS, we can say that my ontology has five classes, and that Plant is a sub-class of Organism and so on. So every animal is also an organism. A good way to think of these classes is as describing sets of individuals: organism is intended to describe a set of living things, some of which are animals (i.e. 
a sub-set of the set of organisms is the set of animals), and some animals are fish (a subset of the set of all animals is the set of all fish).\nTo describe the attributes of these classes, we can associate properties with the classes. For example, animals have sensory organs (noses, eyes, etc.). A general property of an animal might be senseOrgan, to denote any given sensory organs a particular animal has. In general, fish have eyes, so a fish might have a eyes property to refer to a description of the particular eye structure of some species. Since eyes are a type of sensory organ, we can capture this relationship between these properties by saying that eye is a sub-property-of senseOrgan. Thus if a given fish has two eyes, it also has two sense organs. (It may have more, but we know that it must have two).\nWe can describe this simple hierarchy with RDFS. In general, the class hierarchy is a graph rather than a tree (i.e. not like Java class inheritance). The slime mold is popularly, though perhaps not accurately, thought of as an organism that has characteristics of both plants and animals. We might model a slime mold in our ontology as a class that has both plant and animal classes among its super-classes. RDFS is too weak a language to express the constraint that a thing cannot be both a plant and an animal (which is perhaps lucky for the slime molds). In RDFS, we can only name the classes, we cannot construct expressions to describe interesting classes. However, for many applications it is sufficient to state the basic vocabulary, and RDFS is perfectly well suited to this.\nNote also that we can both describe classes, in general terms, and we can describe particular instances of those classes. So there may be a particular individual Fred who is a Fish (i.e. has rdf:type Fish), and who has two eyes. Their companion Freda, a Mexican Tetra, or blind cave fish, has no eyes. One use of an ontology is to allow us to fill-in missing information about individuals. 
Thus, though it is not stated directly, we can deduce that Fred is also an Animal and an Organism. Assume that there was no rdf:type asserting that Freda is a Fish. We may still infer Freda\u0026rsquo;s rdf:type since Freda has lateral lines as sense organs, and these only occur in fish. In RDFS, we state that the domain of the lateralLines property is the Fish class, so an RDFS reasoner can infer that Freda must be a fish.\nOWL In general, OWL allows us to say everything that RDFS allows, and much more besides. A key part of OWL is the ability to describe classes in more interesting and complex ways. For example, in OWL we can say that Plant and Animal are disjoint classes: no individual can be both a plant and an animal (which would have the unfortunate consequence of making SlimeMold an empty class). SaltwaterFish might be the intersection of Fish and the class SeaDwellers (which also includes, for example, cetaceans and sea plants).\nSuppose we have a property covering, intended to represent the scales of a fish or the fur of a mammal. We can now refine the mammal class to be \u0026lsquo;animals that have a covering that is hair\u0026rsquo;, using a property restriction to express the condition that property covering has a value from the class Hair. Similarly TropicalFish might be the intersection of the class of Fish and the class of things that have TropicalOcean as their habitat.\nFinally (for this brief overview), we can say more about properties in OWL. In RDFS, properties can be related via a property hierarchy. OWL extends this by allowing properties to be denoted as transitive, symmetric or functional, and allow one property to be declared to be the inverse of another. OWL also makes a distinction between properties that have individuals (RDF resources) as their range and properties that have data-values (known as literals in RDF terminology) as their range. Respectively these are object properties and datatype properties. 
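The object-property versus datatype-property distinction above can be written down directly in RDF. The following Turtle sketch is purely illustrative (the property and class names are invented for this example):

```turtle
@prefix owl:  <> .
@prefix rdfs: <> .
@prefix xsd:  <> .
@prefix :     <> .

# An object property: its values are individuals (RDF resources)
:habitat   a owl:ObjectProperty ;
           rdfs:range :Ocean .

# A datatype property: its values are literals
:finCount  a owl:DatatypeProperty ;
           rdfs:range xsd:nonNegativeInteger .

# Only object properties may carry characteristics such as symmetry
:connectedTo a owl:ObjectProperty , owl:SymmetricProperty .
```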
One consequence of the RDF lineage of OWL is that OWL ontologies cannot make statements about literal values. We cannot say in RDF that seven has the property of being a prime number. We can, of course, say that the class of primes includes seven; doing so doesn't require a number to be the subject of an RDF statement. In OWL, this distinction is important: only object properties can be transitive or symmetric.

The OWL language is sub-divided into three syntax classes: OWL Lite, OWL DL and OWL Full. OWL DL does not permit some constructions allowed in OWL Full, and OWL Lite has all the constraints of OWL DL plus some more. The intent for OWL Lite and OWL DL is to make the task of reasoning with expressions in that subset more tractable. Specifically, OWL DL is intended to be able to be processed efficiently by a description logic reasoner. OWL Lite is intended to be amenable to processing by a variety of reasonably simple inference algorithms, though experts in the field have challenged how successfully this has been achieved.

While the OWL standards documents note that OWL builds on top of the (revised) RDF specifications, it is possible to treat OWL as a separate language in its own right, and not something that is built on an RDF foundation. This view uses RDF as a serialisation syntax; the RDF-centric view treats RDF triples as the core of the OWL formalism. While both views are valid, in Jena we take the RDF-centric view.

Ontology languages and the Jena Ontology API

As we outlined above, there are various different ontology languages available for representing ontology information on the semantic web. They range from the most expressive, OWL Full, through to the weakest, RDFS.
Through the Ontology API, Jena aims to provide a consistent programming interface for ontology application development, independent of which ontology language you are using in your programs.

The Jena Ontology API is language-neutral: the Java class names are not specific to the underlying language. For example, the OntClass Java class can represent an OWL class or an RDFS class. To represent the differences between the various representations, each of the ontology languages has a profile, which lists the permitted constructs and the names of the classes and properties.

Thus in the OWL profile the object property is owl:ObjectProperty (short for http://www.w3.org/2002/07/owl#ObjectProperty), and in the RDFS profile it is null, since RDFS does not define object properties.

The profile is bound to an ontology model, which is an extended version of Jena's Model class. The base Model allows access to the statements in a collection of RDF data. OntModel extends this by adding support for the kinds of constructs expected to be in an ontology: classes (in a class hierarchy), properties (in a property hierarchy) and individuals.

When you're working with an ontology in Jena, all of the state information remains encoded as RDF triples (accessed as Jena Statements) stored in the RDF model. The ontology API doesn't change the RDF representation of ontologies. What it does do is add a set of convenience classes and methods that make it easier for you to write programs that manipulate the underlying RDF triples.

The predicate names defined in the ontology language correspond to the accessor methods on the Java classes in the API. For example, an OntClass has a method to list its super-classes, which corresponds to the values of the subClassOf property in the RDF representation. This point is worth re-emphasising: no information is stored in the OntClass object itself. When you call the OntClass listSuperClasses() method, Jena will retrieve the information from the underlying RDF triples.
Similarly, adding a subclass to an OntClass asserts an additional RDF triple, typically with predicate rdfs:subClassOf, into the model.

Ontologies and reasoning

One of the key benefits of building an ontology-based application is using a reasoner to derive additional truths about the concepts you are modelling. We saw a simple instance of this above: the assertion "Fred is a Fish" entails the deduction "Fred is an Animal". There are many different styles of automated reasoner, and very many different reasoning algorithms. Jena includes support for a variety of reasoners through the inference API.

A common feature of Jena reasoners is that they create a new RDF model which appears to contain the triples that are derived from reasoning as well as the triples that were asserted in the base model. This extended model nevertheless still conforms to the contract for Jena models. It can be used wherever a non-inference model can be used. The ontology API exploits this feature: the convenience methods provided by the ontology API can query an extended inference model in just the same way that they can a plain RDF model. In fact, this is such a common pattern that we provide simple recipes for constructing ontology models whose language, storage model and reasoning engine can all be simply specified when an OntModel is created. We'll show examples shortly.

Figure 2 shows one way of visualising this:

Graph is an internal Jena interface that supports the composition of sets of RDF triples. The asserted statements, which may have been read in from an ontology document, are held in the base graph. The reasoner, or inference engine, can use the contents of the base graph and the semantic rules of the language to present a more complete set of base and entailed triples. This is also presented via a Graph interface, so the OntModel works only with the outermost interface.
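Because the extended model conforms to the ordinary Model contract, an entailed triple can be retrieved exactly as if it had been asserted. A minimal sketch, assuming an invented namespace http://example.org/demo# and class names A, B, C (not from the ESWC ontology):

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.RDFS;

public class EntailedTriples {
    static final String NS = "http://example.org/demo#"; // hypothetical namespace

    public static boolean entailedVisible() {
        OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );
        OntClass a = m.createClass( NS + "A" );
        OntClass b = m.createClass( NS + "B" );
        OntClass c = m.createClass( NS + "C" );
        a.addSubClass( b );   // asserted: B rdfs:subClassOf A
        b.addSubClass( c );   // asserted: C rdfs:subClassOf B

        // C rdfs:subClassOf A is entailed, not asserted, yet it is
        // visible through the plain Model query interface
        return m.contains( c, RDFS.subClassOf, a );
    }

    public static void main( String[] args ) {
        System.out.println( entailedVisible() );
    }
}
```

Querying the same data in a model created with OntModelSpec.OWL_MEM (no reasoner) would not find the entailed triple; only the two asserted subclass statements would be present.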
This regularity allows us to very easily build ontology models with or without a reasoner. It also means that the base graph can be an in-memory store, a database-backed persistent store, or some other storage structure altogether – e.g. an LDAP directory – again without affecting the operation of the ontology model (but noting that these different approaches may have very different efficiency profiles).

RDF-level polymorphism and Java

Deciding which Java abstract class to use to represent a given RDF resource can be surprisingly subtle. Consider the following RDF sample:

<owl:Class rdf:ID="DigitalCamera">
</owl:Class>

This declares that the resource with the relative URI #DigitalCamera is an OWL ontology class. It suggests that it would be appropriate to model that declaration in Java with an instance of an OntClass. Now suppose we add a triple to the RDF model to augment the class declaration with some more information:

<owl:Class rdf:ID="DigitalCamera">
  <rdf:type owl:Restriction />
</owl:Class>

Now we are stating that #DigitalCamera is an OWL Restriction. Restriction is a subclass of owl:Class, so this is a perfectly consistent operation. The problem we then have is that Java does not allow us to dynamically change the Java class of the object representing this resource. The resource has not changed: it still has URI #DigitalCamera. But the appropriate Java class Jena might choose to encapsulate it has changed from OntClass to Restriction.
Conversely, if we subsequently remove the rdf:type owl:Restriction statement from the model, using the Restriction Java class is no longer appropriate.

Even worse, OWL Full allows us to state the following (rather counter-intuitive) construction:

<owl:Class rdf:ID="DigitalCamera">
  <rdf:type owl:ObjectProperty />
</owl:Class>

That is, #DigitalCamera is both a class and a property. While this may not be a very useful claim, it illustrates a basic point: we cannot rely on a consistent or unique mapping between an RDF resource and the appropriate Java abstraction.

Jena accepts this basic characteristic of polymorphism at the RDF level by considering that the Java abstraction (OntClass, Restriction, DatatypeProperty, etc.) is just a view or facet of the resource. That is, there is a one-to-many mapping from a resource to the facets that the resource can present. If the resource is typed as an owl:Class, it can present the OntClass facet; given other types, it can present other facets. Jena provides the .as() method to efficiently map from an RDF object to one of its allowable facets. Given an RDF object (i.e. an instance of org.apache.jena.rdf.model.RDFNode or one of its sub-types), you can get a facet by invoking as() with an argument that denotes the facet required. Specifically, the facet is identified by the Java class object of the desired facet. For example, to get the OntClass facet of a resource, we can write:

Resource r = myModel.getResource( myNS + "DigitalCamera" );
OntClass cls = r.as( OntClass.class );

This pattern allows our code to defer decisions about the correct Java abstraction to use until run-time. The choice can depend on the properties of the resource itself. If a given RDFNode will not support the conversion to a given facet, it will raise a ConversionException. We can test whether .as() will succeed for a given facet with canAs().
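The interplay of rdf:type statements and available facets can be sketched as follows. The namespace is invented for illustration; the behaviour shown assumes a model with no reasoner, where facet availability follows directly from asserted types:

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.ontology.Restriction;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class FacetDemo {
    public static void main( String[] args ) {
        OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
        // hypothetical URI for the DigitalCamera example
        Resource r = m.createResource( "http://example.org/cameras#DigitalCamera" );
        r.addProperty( RDF.type, OWL.Class );

        System.out.println( r.canAs( OntClass.class ) );    // the class facet is available
        OntClass cls = r.as( OntClass.class );

        System.out.println( r.canAs( Restriction.class ) ); // but not the Restriction facet

        // adding the extra rdf:type makes the Restriction facet available too
        r.addProperty( RDF.type, OWL.Restriction );
        System.out.println( r.canAs( Restriction.class ) );
    }
}
```

Guarding each as() call with canAs() (or catching ConversionException) is the usual defensive pattern when the types of incoming RDF data are not known in advance.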
This RDF-level polymorphism is used extensively in the Jena ontology API to allow maximum flexibility in handling ontology data.

Running example: the ESWC ontology

To illustrate the principles of using the ontology API, we will use examples drawn from the ESWC ontology. This ontology presents a simple model for describing the concepts and activities associated with a typical academic conference. A copy of the ontology serialized in RDF/XML is included with the Jena download; see eswc-2006-09-21.rdf (note that you may need to view the page source in some browsers to see the XML code).

A subset of the classes and properties from the ontology is shown in Figure 3:

Figure 3: Classes and properties from ESWC ontology

We will use elements from this ontology to illustrate the ontology API throughout the rest of this document.

Creating ontology models

An ontology model is an extension of the Jena RDF model, providing extra capabilities for handling ontologies. Ontology models are created through the Jena ModelFactory. The simplest way to create an ontology model is as follows:

OntModel m = ModelFactory.createOntologyModel();

This will create an ontology model with the default settings, which are set for maximum compatibility with the previous version of Jena. These defaults are:

- OWL-Full language
- in-memory storage
- RDFS inference, which principally produces entailments from the sub-class and sub-property hierarchies.

Important note: this means that the default ontology model does include some inferencing, with consequences both for the performance of the model, and for the triples which appear in the model.

In many applications, such as driving a GUI, RDFS inference is too strong. For example, every class is inferred to be an immediate sub-class of owl:Thing. In other applications, stronger reasoning is needed. In general, to create an OntModel with a particular reasoner or language profile, you should pass a model specification to the createOntologyModel call.
For example, an OWL model that performs no reasoning at all can be created with:

OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );

To create an ontology model for a particular language, but leaving all of the other values as defaults, you should pass the URI of the ontology language to the model factory. The URI strings for the various language profiles are:

Ontology language | URI
RDFS              | http://www.w3.org/2000/01/rdf-schema#
OWL Full          | http://www.w3.org/2002/07/owl#
OWL DL            | http://www.w3.org/TR/owl-features/#term_OWLDL
OWL Lite          | http://www.w3.org/TR/owl-features/#term_OWLLite

These URI's are used to look-up the language profile from the ProfileRegistry. The profile registry contains public constant declarations so that you do not have to remember these URI's. Please note that the URI's denoting OWL Lite and OWL DL are not officially sanctioned by the OWL standard.

Beyond these basic choices, the complexities of configuring an ontology model are wrapped up in a recipe object called OntModelSpec. This specification allows complete control over the configuration choices for the ontology model, including the language profile in use, the reasoner, and the means of handling compound documents.
A number of common recipes are pre-declared as constants in OntModelSpec, and listed below.

OntModelSpec           | Language profile | Storage model | Reasoner
OWL_MEM                | OWL Full | in-memory | none
OWL_MEM_TRANS_INF      | OWL Full | in-memory | transitive class-hierarchy inference
OWL_MEM_RULE_INF       | OWL Full | in-memory | rule-based reasoner with OWL rules
OWL_MEM_MICRO_RULE_INF | OWL Full | in-memory | optimised rule-based reasoner with OWL rules
OWL_MEM_MINI_RULE_INF  | OWL Full | in-memory | rule-based reasoner with subset of OWL rules
OWL_DL_MEM             | OWL DL   | in-memory | none
OWL_DL_MEM_RDFS_INF    | OWL DL   | in-memory | rule reasoner with RDFS-level entailment-rules
OWL_DL_MEM_TRANS_INF   | OWL DL   | in-memory | transitive class-hierarchy inference
OWL_DL_MEM_RULE_INF    | OWL DL   | in-memory | rule-based reasoner with OWL rules
OWL_LITE_MEM           | OWL Lite | in-memory | none
OWL_LITE_MEM_TRANS_INF | OWL Lite | in-memory | transitive class-hierarchy inference
OWL_LITE_MEM_RDFS_INF  | OWL Lite | in-memory | rule reasoner with RDFS-level entailment-rules
OWL_LITE_MEM_RULES_INF | OWL Lite | in-memory | rule-based reasoner with OWL rules
RDFS_MEM               | RDFS     | in-memory | none
RDFS_MEM_TRANS_INF     | RDFS     | in-memory | transitive class-hierarchy inference
RDFS_MEM_RDFS_INF      | RDFS     | in-memory | rule reasoner with RDFS-level entailment-rules

For details of reasoner capabilities, please see the inference documentation and the Javadoc for OntModelSpec. See also further discussion below.

Note: it is primarily the choice of reasoner, rather than the choice of language profile, which determines which entailments are seen by the ontology model.

To create a model with a given specification, you should invoke the ModelFactory as follows:

OntModel m = ModelFactory.createOntologyModel( <model spec> );

for example:

OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF );

To create a custom model specification, you can create a new one from its constructor, and call the various setter methods to set the appropriate values.
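The note above, that the reasoner choice determines which entailments are visible, can be seen by loading the same data into models built from two different recipes. This sketch revisits the Freda example from the introduction; the namespace and names are invented for illustration:

```java
import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.ObjectProperty;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.OWL;

public class ReasonerChoice {
    static final String NS = "http://example.org/zoo#"; // hypothetical namespace

    // build the same assertions in a model created from the given spec
    static boolean fredaIsAFish( OntModelSpec spec ) {
        OntModel m = ModelFactory.createOntologyModel( spec );
        OntClass fish = m.createClass( NS + "Fish" );
        ObjectProperty lateralLines = m.createObjectProperty( NS + "lateralLines" );
        lateralLines.setDomain( fish );

        Individual freda = m.createIndividual( NS + "Freda", OWL.Thing );
        freda.addProperty( lateralLines, m.createResource( NS + "lateralLine1" ) );
        return freda.hasOntClass( fish );
    }

    public static void main( String[] args ) {
        // no reasoner: only the asserted type (owl:Thing) is seen
        System.out.println( fredaIsAFish( OntModelSpec.OWL_MEM ) );
        // rule reasoner: the rdfs:domain entailment makes Freda a Fish
        System.out.println( fredaIsAFish( OntModelSpec.OWL_MEM_MICRO_RULE_INF ) );
    }
}
```

The asserted triples are identical in both models; only the entailments visible through the ontology API differ.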
More often, we want a variation on an existing recipe. In this case, you copy an existing specification and then update the copy as necessary:

OntModelSpec s = new OntModelSpec( OntModelSpec.OWL_MEM );
s.setDocumentManager( myDocMgr );
OntModel m = ModelFactory.createOntologyModel( s );

Compound ontology documents and imports processing

The OWL ontology language includes some facilities for creating modular ontologies that can be re-used in a similar manner to software modules. In particular, one ontology can import another. Jena helps ontology developers to work with modular ontologies by automatically handling the imports statements in ontology models.

The key idea is that the base model of an ontology model is actually a collection of models, one per imported model. This means we have to modify figure 2 a bit. Figure 4 shows how the ontology model builds a collection of import models:

Figure 4: ontology model compound document structure for imports

We will use the term document to describe an ontology serialized in some transport syntax, such as RDF/XML or N3. This terminology isn't used by the OWL or RDFS standards, but it is a convenient way to refer to the written artifacts. However, from a broad view of the interlinked semantic web, a document view imposes artificial boundaries between regions of the global web of data and isn't necessarily a useful way of thinking about ontologies.

We will load an ontology document into an ontology model in the same way as a normal Jena model, using the read method. There are several variants on read that handle differences in the source of the document (to be read from a resolvable URL or directly from an input stream or reader), the base URI that will resolve any relative URI's in the source document, and the serialisation language.
In summary, these variants are:

read( String url )
read( Reader reader, String base )
read( InputStream reader, String base )
read( String url, String lang )
read( Reader reader, String base, String lang )
read( InputStream reader, String base, String lang )

You can use any of these methods to load an ontology document. Note that we advise that you avoid the read() variants that accept a Reader argument when loading XML documents containing internationalised character sets, since the handling of character encoding by the Reader and by XML parsers is not compatible.

By default, when an ontology model reads an ontology document, it will also locate and load the document's imports. An OWL document may contain an individual of class Ontology, which contains meta-data about that document itself. For example:

<owl:Ontology rdf:about="">
  <dc:creator rdf:value="Ian Dickinson" />
  <owl:imports rdf:resource="" />
</owl:Ontology>

The construct rdf:about="" is a relative URI. It will resolve to the document's base URI: in other words it's a shorthand way of referring to the document itself. The owl:imports line states that this ontology is constructed using classes, properties and individuals from the referenced ontology. When an OntModel reads this document, it will notice the owl:imports line and attempt to load the imported ontology into a sub-model of the ontology model being constructed. The definitions from both the base ontology and all of the imports will be visible to the reasoner.

Each imported ontology document is held in a separate graph structure. This is important: we want to keep the original source ontology separate from the imports. When we write the model out again, normally only the base model is written (the alternative is that all you see is a confusing union of everything).
And when we update the model, only the base model changes. To get the base model or base graph from an OntModel, use:

Model base = myOntModel.getBaseModel();

Imports are processed recursively, so if our base document imports ontology A, and A imports B, we will end up with the structure shown in Figure 4. Note that the imports have been flattened out. A cycle check is used to prevent the document handler getting stuck if, for example, A imports B which imports A!

The ontology document manager

Each ontology model has an associated document manager which assists with the processing and handling of ontology documents and related concerns. For convenience, there is one global document manager which is used by default by ontology models. You can get a reference to this shared instance through OntDocumentManager.getInstance(). In many cases, it will be sufficient to simply change the settings on the global document manager to suit your application's needs. However, for more fine-grained control, you can create separate document managers, and pass them to the ontology model when it is created through the model factory. To do this, create an ontology specification object (see above), and set the document manager. For example:

OntDocumentManager mgr = new OntDocumentManager();
// set mgr's properties now
... some code ...
// now use it
OntModelSpec s = new OntModelSpec( OntModelSpec.RDFS_MEM );
s.setDocumentManager( mgr );
OntModel m = ModelFactory.createOntologyModel( s );

Note that the model retains a reference to the document manager it was created with. Thus if you change a document manager's properties, it will affect models that have previously been constructed with that document manager.

Document manager policy

Since the document manager has a large number of configurable options, there are two ways in which you can customise it to your application requirements.
Firstly, you can set the individual parameters of the document manager from Java code. Alternatively, when a given document manager is created it can load values for the various parameters from a policy file, expressed in RDF. The document manager has a list of URL's which it will search for a policy document. It will stop at the first entry on the list that resolves to a retrievable document. The default search path for the policy is: file:./etc/ont-policy.rdf;file:ont-policy.rdf. You can find the default policy, which can serve as a template for defining your own policies, in the etc/ directory under the Jena download directory.

We can set the general properties of the document manager in the policy as follows:

<DocumentManagerPolicy>
  <!-- policy for controlling the document manager's behaviour -->
  <processImports rdf:datatype="&xsd;boolean">true</processImports>
  <cacheModels rdf:datatype="&xsd;boolean">true</cacheModels>
</DocumentManagerPolicy>

You can find the simple schema that declares the various properties that you can use in such an ontology document policy in the vocabularies directory of the Jena download. It's called ont-manager.rdf. To change the search path that the document manager will use to initialise itself, you can either pass the new search path as a string when creating a new document manager object, or call the method setMetadataSearchPath().

The ModelMaker: creating storage on demand

In order for the document manager to build the union of the imported documents (which we sometimes refer to as the imports closure), there must be some means of creating new graphs to store the imported ontologies. Loading a new import means that a new graph needs to be added.
Jena defines a model maker as a simple interface that allows different kinds of model storage (in-memory, file-backed, in a persistent database, etc.) to be created on demand. For the database case, this may include passing the database user-name and password and other connection parameters. New model makers can be created with the ModelFactory.

There are two cases in which we may want to create storage for models on demand. The first is when creating the OntModel for the first time. Some variants of createOntologyModel will allocate space for the base model (instead of, for example, being handed a base model to use as one of the method arguments). The second case when storage must be allocated is when adding an imported document to the union of imports. These cases often require different policies, so the OntModelSpec contains two model maker parameters: the base model maker and the imports model maker, available via the getBaseModelMaker() and getImportsModelMaker() methods respectively.

The default specifications in OntModelSpec whose names include MEM use an in-memory model maker for both the base model and the imported documents.

Implementation note: internally to Jena, we use Graph as a primary data structure. However, application code will almost always refer to models, not graphs. What's happening is that a Model is a wrapper around the Graph, which balances a rich, convenient programming interface (Model) with a simple, manageable internal data structure (Graph). Hence some potential confusion in that Figure 4, above, refers to a structure containing graphs, but we use a ModelMaker to generate new stores. The document manager extracts the appropriate graph from the containing model. Except in cases where you are extending Jena's internal structures, you should think of Model as the container of RDF and ontology data.

Controlling imports processing

By default, loading imports during the read() call is automatic.
To read() an ontology without building the imports closure, call the method setProcessImports( false ) on the document manager object before calling read(). Alternatively, you can set the processImports property in the policy file. You can also be more selective, and ignore only certain URI's when loading the imported documents. To selectively skip certain named imports, call the method addIgnoreImport( String uri ) on the document manager object, or set the ignoreImport property in the policy.

Managing file references

An advantage of working with ontologies is that we can reuse work done by other ontologists, by importing their published ontologies into our own. The OntModel can load such referenced ontologies automatically from their published URL's. This can mean that an application suffers a delay on startup. Worse, it may require extra work to cope with intervening firewalls or web proxies. Worse still, connectivity may be intermittent: we do not want our application to fail just because it temporarily does not have Internet access, or because a previously published ontology has been moved. To alleviate these commonly experienced problems, we can use Jena's FileManager to manage local indirections, so that an attempt to import a document from a given published URL means that a local copy of the document is loaded instead. This may be a file on the local disk, or simply a pointer to a local mirror web site.

While the FileManager can be configured directly, we can also specify redirections declaratively in the document manager policy file:

<OntologySpec>
  <publicURI rdf:resource="... the public URI to map from ..." />
  <altURL rdf:resource="... the local URL to map to ..." />
  <!-- optional ontology language term -->
  <language rdf:resource="... encoding used ..." />
  <!-- optional prefix to associate with the public URL -->
  <prefix rdf:datatype="&xsd;string">a prefix</prefix>
</OntologySpec>

For example:

<OntologySpec>
  <!-- local version of the RDFS vocabulary -->
  <publicURI rdf:resource="http://www.w3.org/2000/01/rdf-schema" />
  <altURL rdf:resource="file:src/main/resources/rdf-schema.rdf" />
</OntologySpec>

This specifies that an attempt to load the RDFS vocabulary from http://www.w3.org/2000/01/rdf-schema will transparently cause file:src/main/resources/rdf-schema.rdf to be fetched instead. You can specify any number of such re-directions in the policy file, or you can add them to the document manager object directly by calling the various setter methods (see the Javadoc for details). As a side-effect, this mechanism also means that ontologies may be named with any legal URI (not necessarily resolvable) – so long as the altURL is itself resolvable.

See the notes on FileManager for details of additional options.

In the following example, we use the DocumentManager API to declare that the ESWC ontology is replicated locally on disk. We then load it using the normal URL. Assume that the constant JENA has been initialised to the directory in which Jena was installed.

OntModel m = ModelFactory.createOntologyModel();
OntDocumentManager dm = m.getDocumentManager();
dm.addAltEntry( "http://www.eswc2006.org/technologies/ontology",
                "file:" + JENA + "src/examples/resources/eswc-2006-09-21.rdf" );
m.read( "http://www.eswc2006.org/technologies/ontology" );

Specifying prefixes

A model keeps a table of URI prefixes which can be used to present URI's in the shortened prefix:name form. This is useful in displaying URI's in a readable way in user interfaces, and is essential in producing legal XML names that denote arbitrary URI's.
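Prefix handling is part of the ordinary Model contract (the PrefixMapping interface), so it can be sketched without any ontology-specific machinery. The prefix name eswc and the namespace URI are our own choices for this illustration:

```java
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class PrefixDemo {
    public static void main( String[] args ) {
        OntModel m = ModelFactory.createOntologyModel();
        // register a prefix for the ESWC namespace (prefix name is our choice)
        m.setNsPrefix( "eswc", "http://www.eswc2006.org/technologies/ontology#" );

        // present a full URI in shortened prefix:name form
        System.out.println( m.shortForm( "http://www.eswc2006.org/technologies/ontology#Paper" ) );
        // and expand a shortened name back to the full URI
        System.out.println( m.expandPrefix( "eswc:Paper" ) );
    }
}
```

The same prefix table is consulted when the model is serialised, so registered prefixes also make the written RDF/XML or Turtle output more readable.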
The ontology model's table of prefixes can be initialized from a table kept by the document manager, which contains the standard prefixes plus any that are declared in the policy file (or added subsequently by method calls).

Caching of imported models

You can use the document manager to assist with loading ontology documents through its cache. Suppose two ontologies, A and B, both import ontology C. We would like not to have to read C twice when loading A and then B. The document manager supports this use case by optionally caching C's model, indexed by URI. When A tries to import C, there is no cached copy, so a new model is created for C, the contents of C's URL are read in to the model, then the C model is used in the compound document for A. Subsequently, when ontology B is loading its imports, the document manager checks in its cache and finds an existing copy of C. This will be used in preference to reading a fresh copy of C from C's source URL, saving both time and storage space.

Caching of import models is switched on by default. To turn it off, use the policy property cacheModels, or call the method setCacheModels( boolean caching ) with caching = false. The document manager's current model cache can be cleared at any time by calling clearCache().

The generic ontology type: OntResource

All of the classes in the ontology API that represent ontology values have OntResource as a common super-class. This makes OntResource a good place to put shared functionality for all such classes, and makes a handy common return value for general methods.
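The caching controls just described can be sketched on a standalone document manager. This assumes the getter getCacheModels() alongside the setter mentioned in the text; no documents are actually read here:

```java
import org.apache.jena.ontology.OntDocumentManager;

public class CachingDemo {
    public static void main( String[] args ) {
        OntDocumentManager dm = new OntDocumentManager();

        System.out.println( dm.getCacheModels() );  // caching is on by default

        dm.setCacheModels( false );                 // read a fresh copy of each import
        System.out.println( dm.getCacheModels() );

        dm.setCacheModels( true );                  // re-enable caching
        dm.clearCache();                            // drop any previously cached models
    }
}
```

In a long-running application, clearCache() is useful when an imported ontology is known to have changed at its source and a stale cached copy should not be reused.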
The Java interface OntResource extends Jena's RDF Resource interface, so any general method that accepts a resource or an RDFNode will also accept an OntResource, and consequently, any other ontology value.

Some of the common attributes of ontology resources that are expressed through methods on OntResource are shown below:

Attribute     | Meaning
versionInfo   | A string documenting the version or history of this resource
comment       | A general comment associated with this value
label         | A human-readable label
seeAlso       | Another web location to consult for more information about this resource
isDefinedBy   | A specialisation of seeAlso that is intended to supply a definition of this resource
sameAs        | Denotes another resource that this resource is equivalent to
differentFrom | Denotes another resource that is distinct from this resource (by definition)

For each of these properties, there is a standard pattern of available methods:

Method           | Effect
add<property>    | Add an additional value for the given property
set<property>    | Remove any existing values for the property, then add the given value
list<property>   | Return an iterator ranging over the values of the property
get<property>    | Return the value for the given property, if the resource has one. If not, return null. If it has more than one value, an arbitrary selection is made.
has<property>    | Return true if there is at least one value for the given property. Depending on the name of the property, this is sometimes rendered as is<property>.
remove<property> | Remove a given value from the values of the property on this resource. Has no effect if the resource does not have that value.

For example: addSameAs( Resource r ), or isSameAs( Resource r ). For full details of the individual methods, please consult the Javadoc.

OntResource defines some other general utility methods.
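The add/set/get/has/remove pattern can be sketched with the label and comment properties, which take an optional language tag. The class URI and label values are invented for illustration:

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;

public class LabelDemo {
    public static void main( String[] args ) {
        OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
        OntClass paper = m.createClass( "http://example.org/conf#Paper" ); // hypothetical URI

        paper.addLabel( "paper", "en" );                 // add<property>: accumulates values
        paper.addLabel( "Artikel", "de" );
        paper.setComment( "A conference paper", "en" );  // set<property>: replaces any existing value

        System.out.println( paper.getLabel( "de" ) );          // get<property>, selected by language
        System.out.println( paper.hasLabel( "paper", "en" ) ); // has<property>

        paper.removeLabel( "Artikel", "de" );            // remove<property>
        System.out.println( paper.getLabel( "de" ) );    // no German label remains
    }
}
```
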
For example, to find out how many values a resource has for a given property, you can call getCardinality( Property p ). To delete the resource from the ontology altogether, you can call remove(). The effect of this is to remove every statement that mentions this resource as the subject or object of a statement.

To get the value of a given property, use getPropertyValue( Property p ). To set it, use setPropertyValue( Property p, RDFNode value ). Continuing the naming pattern, the values of a named property can be listed (with listPropertyValues), removed (with removeProperty) or added (with addProperty).

Finally, OntResource provides methods for listing, getting and setting the rdf:type of a resource, which denotes a class to which the resource belongs (noting that, in RDF and OWL, a resource can belong to many classes at once). The rdf:type property is one for which many entailment rules are defined in the semantic models of the various ontology languages. Therefore, the values that listRDFTypes() returns are more than usually dependent on the reasoner bound to the ontology model. For example, suppose we have class A, class B which is a subclass of A, and resource x whose asserted rdf:type is B. With no reasoner, listing x's RDF types will return only B. If the reasoner is able to calculate the closure of the subclass hierarchy (and most can), x's RDF types would also include A. A complete OWL reasoner would also infer that x has rdf:type owl:Thing and rdfs:Resource.

For some tasks, getting a complete list of the RDF types of a resource is exactly what is needed. For other tasks, this is not the case. If you are developing an ontology editor, for example, you may want to distinguish in its display between inferred and asserted types. In the above example, only x rdf:type B is asserted; everything else is inferred. One way to make this distinction is to make use of the base model (see Figure 4).
Getting the resource from the base model and listing the type properties there would return only the asserted values. For example:

```java
// create the base model
String SOURCE = "";
String NS = SOURCE + "#";
OntModel base = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
base.read( SOURCE, "RDF/XML" );

// create the reasoning model using the base
OntModel inf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_MICRO_RULE_INF, base );

// create a dummy paper for this example
OntClass paper = base.getOntClass( NS + "Paper" );
Individual p1 = base.createIndividual( NS + "paper1", paper );

// list the asserted types
for (Iterator<Resource> i = p1.listRDFTypes( false ); i.hasNext(); ) {
    System.out.println( p1.getURI() + " is asserted in class " + i.next() );
}

// list the inferred types
p1 = inf.getIndividual( NS + "paper1" );
for (Iterator<Resource> i = p1.listRDFTypes( false ); i.hasNext(); ) {
    System.out.println( p1.getURI() + " is inferred to be in class " + i.next() );
}
```

For other user interface or presentation tasks, we may want something between the complete list of types and the base list of only the asserted values. Consider the class hierarchy in Figure 5 (i):

Figure 5: asserted and inferred relationships

Figure 5 (i) shows a base model, containing a class hierarchy and an instance x. Figure 5 (ii) shows the full set of relationships that might be inferred from this base model. In Figure 5 (iii), we see only the direct or maximally specific relationships. For example, in 5 (iii) x does not have rdf:type A, since this is a relationship that is covered by the fact that x has rdf:type D, and D is a subclass of A. Notice that the rdf:type B link is also removed from the direct graph, for a similar reason. Thus the direct graph hides relationships from both the inferred and asserted graphs.
When displaying instance x in a user interface, particularly in a tree view of some kind, the direct graph is often the most useful, as it contains the useful information in the most compact form.

To list the RDF types of a resource, use:

```java
listRDFTypes()                  // assumes not-direct
listRDFTypes( boolean direct )  // if direct=true, show only direct relationships
```

Related methods allow the rdf:type to be tested, set and returned.

### Ontology classes and basic class expressions

Classes are the basic building blocks of an ontology. A simple class is represented in Jena by an OntClass object. As mentioned above, an ontology class is a facet of an RDF resource. One way, therefore, to get an ontology class is to convert a plain RDF resource into its class facet. Assume that m is a suitably defined OntModel, into which the ESWC ontology has already been read, and that NS is a variable denoting the ontology namespace:

```java
Resource r = m.getResource( NS + "Paper" );
OntClass paper = r.as( OntClass.class );
```

This can be shortened by calling getOntClass() on the ontology model:

```java
OntClass paper = m.getOntClass( NS + "Paper" );
```

The getOntClass method will retrieve the resource with the given URI, and attempt to obtain the OntClass facet. If either of these operations fails, getOntClass() will return null. Compare this with the createClass method, which will reuse an existing resource if possible, or create a new class resource if not:

```java
OntClass paper = m.createClass( NS + "Paper" );
OntClass bestPaper = m.createClass( NS + "BestPaper" );
```

You can use the create class method to create an anonymous class – a class description with no associated URI. Anonymous classes are often used when building more complex ontologies in OWL. They are less useful in RDFS.

```java
OntClass anonClass = m.createClass();
```

Once you have the ontology class object, you can begin processing it through the methods defined on OntClass.
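The contrast between getOntClass and createClass on a fresh model can be sketched as follows. This is an illustrative fragment, assuming current Jena package names and a hypothetical namespace rather than the ESWC ontology:

```java
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class GetVsCreate {
    public static void main( String[] args ) {
        String NS = "http://example.org/test#";  // hypothetical namespace
        OntModel m = ModelFactory.createOntologyModel();

        // getOntClass returns null: nothing with this URI can be seen as a class yet
        OntClass missing = m.getOntClass( NS + "Paper" );
        System.out.println( "before create: " + missing );

        // createClass asserts the class declaration and returns the facet
        OntClass paper = m.createClass( NS + "Paper" );

        // now getOntClass can obtain the class facet for the same URI
        System.out.println( "after create: " + m.getOntClass( NS + "Paper" ) );
    }
}
```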
The attributes of a class are handled in a similar way to the attributes of OntResource, above, with a collection of methods to set, add, get, test, list and remove values. Properties of classes that are handled in this way are:

| Attribute | Meaning |
|-----------|---------|
| subClass | A subclass of this class, i.e. those classes that are declared subClassOf this class. |
| superClass | A super-class of this class, i.e. a class that this class is a subClassOf. |
| equivalentClass | A class that represents the same concept as this class. This is not just having the same class extension: the class 'British Prime Minister in 2003' contains the same individual as the class 'the husband of Cherie Blair', but they represent different concepts. |
| disjointWith | Denotes a class with which this class has no instances in common. |

Thus, in our example ontology, we can print a list of the subclasses of an Artefact as follows:

```java
OntClass artefact = m.getOntClass( NS + "Artefact" );
for (Iterator<OntClass> i = artefact.listSubClasses(); i.hasNext(); ) {
    OntClass c = i.next();
    System.out.println( c.getURI() );
}
```

Note that, under RDFS and OWL semantics, each class is a sub-class of itself (in other words, rdfs:subClassOf is reflexive). While this is true in the semantics, Jena users have reported finding it inconvenient. Therefore, the listSubClasses and listSuperClasses convenience methods remove the reflexive case from the list of results returned by the iterator. However, if you use the plain Model API to query for rdfs:subClassOf triples, assuming that a reasoner is in use, the reflexive triple will appear among the deduced triples.

Given an OntClass object, you can create or remove members of the class extension – individuals that are instances of the class – using the following methods:

| Method | Meaning |
|--------|---------|
| listInstances()  listInstances(boolean direct) | Returns an iterator over those instances that include this class among their rdf:type values. The direct flag can be used to select individuals that are direct members of the class, rather than indirectly through the class hierarchy. Thus if p1 has rdf:type :Paper, it will appear in the iterator returned by listInstances on :Artefact, but not in the iterator returned by listInstances(true) on :Artefact. |
| createIndividual()  createIndividual(String uri) | Adds a resource to the model, whose asserted rdf:type is this ontology class. If no URI is given, the individual is an anonymous resource. |
| dropIndividual(Resource individual) | Removes the association between the given individual and this ontology class. Effectively, this removes the rdf:type link between this class and the resource. Note that this is not the same as removing the individual altogether, unless the only thing that is known about the resource is that it is a member of the class. |

To delete an OntResource, including classes and individuals, use the remove() method. To test whether a class is a root of the class hierarchy in this model (i.e. it has no known super-classes), call isHierarchyRoot().

The domain of a property is intended to allow entailments about the class of an individual, given that it appears as a statement subject. It is not a constraint that can be used to validate a document, in the way that XML Schema can. Nevertheless, many developers find it convenient to use the domain of a property to document the design intent that the property only applies to known instances of the domain class. Given this observation, it can be a useful debugging or display aid to show the properties that have this class among their domain classes. The method listDeclaredProperties() attempts to identify the properties that are intended to apply to instances of this class. Using listDeclaredProperties is explained in detail in the RDF frames how-to.

### Ontology properties

In an ontology, a property denotes the name of a relationship between resources, or between a resource and a data value.
It corresponds to a predicate in logic representations. One interesting aspect of RDFS and OWL is that properties are not defined as aspects of some enclosing class, but are first-class objects in their own right. This means that ontologies and ontology-applications can store, retrieve and make assertions about properties directly. Consequently, Jena has a set of Java classes that allow you to conveniently manipulate the properties represented in an ontology model.

A property in an ontology model is an extension of the core Jena API class Property, and allows access to the additional information that can be asserted about properties in an ontology language. The common API super-class for representing ontology properties in Java is OntProperty. Again, using the pattern of add, set, get, list, has, and remove methods, we can access the following attributes of an OntProperty:

| Attribute | Meaning |
|-----------|---------|
| subProperty | A sub-property of this property; i.e. a property which is declared to be a subPropertyOf this property. If p is a sub-property of q, and we know that A p B is true, we can infer that A q B is also true. |
| superProperty | A super-property of this property, i.e. a property that this property is a subPropertyOf. |
| domain | Denotes the class or classes that form the domain of this property. Multiple domain values are interpreted as a conjunction. The domain denotes the class of values the property maps from. |
| range | Denotes the class or classes that form the range of this property. Multiple range values are interpreted as a conjunction. The range denotes the class of values the property maps to. |
| equivalentProperty | Denotes a property that is the same as this property. |
| inverse | Denotes a property that is the inverse of this property. Thus if q is the inverse of p, and we know that A q B, then we can infer that B p A. |
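The inverse attribute, for instance, follows the same method pattern, and in the underlying model it is simply an owl:inverseOf triple. A minimal sketch, using hypothetical property names:

```java
import org.apache.jena.ontology.ObjectProperty;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.OWL;

public class InverseSketch {
    public static void main( String[] args ) {
        String NS = "http://example.org/test#";  // hypothetical namespace
        OntModel m = ModelFactory.createOntologyModel();

        ObjectProperty hasPart = m.createObjectProperty( NS + "hasPart" );
        ObjectProperty isPartOf = m.createObjectProperty( NS + "isPartOf" );

        // assert that hasPart is the inverse of isPartOf
        hasPart.addInverseOf( isPartOf );

        // the assertion is just an owl:inverseOf triple in the underlying model
        System.out.println( hasPart.hasProperty( OWL.inverseOf, isPartOf ) );
    }
}
```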
In the example ontology, the property hasProgramme has a domain of OrganizedEvent, a range of Programme and the human-readable label "has programme". We can reconstruct this definition in an empty ontology model as follows:

```java
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
OntClass programme = m.createClass( NS + "Programme" );
OntClass orgEvent = m.createClass( NS + "OrganizedEvent" );

ObjectProperty hasProgramme = m.createObjectProperty( NS + "hasProgramme" );
hasProgramme.addDomain( orgEvent );
hasProgramme.addRange( programme );
hasProgramme.addLabel( "has programme", "en" );
```

As a further example, we can alternatively add information to an existing ontology. To add a super-property hasDeadline, to generalise the separate properties denoting the submission deadline, notification deadline and camera-ready deadline, do:

```java
OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM );
m.read( "" );

DatatypeProperty subDeadline = m.getDatatypeProperty( NS + "hasSubmissionDeadline" );
DatatypeProperty notifyDeadline = m.getDatatypeProperty( NS + "hasNotificationDeadline" );
DatatypeProperty cameraDeadline = m.getDatatypeProperty( NS + "hasCameraReadyDeadline" );

DatatypeProperty deadline = m.createDatatypeProperty( NS + "deadline" );
deadline.addDomain( m.getOntClass( NS + "Call" ) );
deadline.addRange( XSD.dateTime );

deadline.addSubProperty( subDeadline );
deadline.addSubProperty( notifyDeadline );
deadline.addSubProperty( cameraDeadline );
```

Note that, although we called the addSubProperty method on the object representing the new super-property, the serialized form of the ontology will contain rdfs:subPropertyOf axioms on each of the sub-property resources, since this is what the language defines.
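Although the axioms are serialized on the sub-property resources, the relationship can be queried from either end. A small sketch, using a hypothetical namespace rather than the full ESWC ontology:

```java
import org.apache.jena.ontology.DatatypeProperty;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class SubPropertyAccess {
    public static void main( String[] args ) {
        String NS = "http://example.org/test#";  // hypothetical namespace
        OntModel m = ModelFactory.createOntologyModel();

        DatatypeProperty deadline = m.createDatatypeProperty( NS + "deadline" );
        DatatypeProperty subDeadline = m.createDatatypeProperty( NS + "hasSubmissionDeadline" );

        // assert the relationship from the super-property side ...
        deadline.addSubProperty( subDeadline );

        // ... and read it back from either side
        System.out.println( deadline.hasSubProperty( subDeadline, false ) );
        System.out.println( subDeadline.hasSuperProperty( deadline, false ) );
    }
}
```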
Jena will, in general, try to allow symmetric access to sub-properties and sub-classes from either direction.

### Object and Datatype properties

OWL refines the basic property type from RDF into two sub-types: object properties and datatype properties (for more details see [OWL Reference]). The difference between them is that an object property can have only individuals in its range, while a datatype property has concrete data literals (only) in its range. Some OWL reasoners are able to exploit the differences between object and datatype properties to perform more efficient reasoning over ontologies. OWL also adds an annotation property, which is defined to have no semantic entailments, and so is useful when annotating ontology documents, for example.

In Jena, the Java interfaces ObjectProperty, DatatypeProperty and AnnotationProperty are sub-types of OntProperty. However, they do not have any behaviours (methods) particular to themselves. Their existence allows the more complex sub-types of ObjectProperty – transitive properties and so forth – to be kept separate in the class hierarchy. However, when you create an object property or datatype property in a model, it will have the effect of asserting different rdf:type statements into the underlying triple store.

### Functional properties

OWL permits object and datatype properties to be functional – that is, for a given individual in the domain, the range value will always be the same. In particular, if father is a functional property, and individual :jane has father :jim and father :james, a reasoner is entitled to conclude that :jim and :james denote the same individual. A functional property is equivalent to stating that the property has a maximum cardinality of one.

Being a functional property is represented through the FunctionalProperty facet of an ontology property object.
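A brief sketch of how this looks in code, using hypothetical names; the Boolean argument to createObjectProperty asks Jena to also declare the property functional:

```java
import org.apache.jena.ontology.FunctionalProperty;
import org.apache.jena.ontology.ObjectProperty;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.rdf.model.ModelFactory;

public class FunctionalSketch {
    public static void main( String[] args ) {
        String NS = "http://example.org/test#";  // hypothetical namespace
        OntModel m = ModelFactory.createOntologyModel();

        // second argument true: also assert that the property is functional
        ObjectProperty father = m.createObjectProperty( NS + "father", true );

        System.out.println( father.isFunctionalProperty() );

        // the facet view is available because the declaration is in the model
        FunctionalProperty f = father.asFunctionalProperty();
        System.out.println( f.getURI() );
    }
}
```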
If a property is declared functional (test using the isFunctionalProperty() method), then the method asFunctionalProperty() conveniently returns the functional property facet. A non-functional property can be made functional through the convertToFunctionalProperty() method. When you are creating a property object, you also have the option of passing a Boolean parameter to the createObjectProperty() method on OntModel.

### Other property types

There are several additional sub-types of ObjectProperty that represent additional capabilities of ontology properties. A TransitiveProperty means that if p is transitive, and we know :a p :b and also :b p :c, we can infer that :a p :c. A SymmetricProperty means that if p is symmetric, and we know :a p :b, we can infer :b p :a. An InverseFunctionalProperty means that for any given range element, the domain value is unique.

Given that all properties are RDFNode objects, and therefore support the as() method, you can use as() to change from an object property facet to a transitive property facet. To make this more straightforward, the OntProperty Java class has a number of methods that support directly switching to the corresponding facet view:

```java
public TransitiveProperty asTransitiveProperty();
public FunctionalProperty asFunctionalProperty();
public SymmetricProperty asSymmetricProperty();
public InverseFunctionalProperty asInverseFunctionalProperty();
```

These methods all assume that the underlying model will support this change in perspective. If not, the operation will fail with a ConversionException. For example, if a given property p is not asserted to be a transitive property in the underlying RDF model, then invoking p.asTransitiveProperty() will throw a conversion exception. The following methods will, if necessary, add additional information (i.e.
the additional rdf:type statement) to allow the conversion to an alternative facet to succeed.

```java
public TransitiveProperty convertToTransitiveProperty();
public FunctionalProperty convertToFunctionalProperty();
public SymmetricProperty convertToSymmetricProperty();
public InverseFunctionalProperty convertToInverseFunctionalProperty();
```

Sometimes it is convenient not to check whether the .as() conversion is warranted by the underlying data. This may be the case, for example, if the developer knows that the conversions are correct given the information from an external ontology which is not currently loaded. To allow .as() to always succeed, set the attribute strictMode to false on the OntModel object: myOntModel.setStrictMode( false ).

Finally, methods beginning is... (e.g. isTransitiveProperty) allow you to test whether a given property would support a given sub-type facet.

### More complex class expressions

We introduced the handling of basic, named classes above. These are the only kind of class description available in RDFS. In OWL, however, there are a number of additional types of class expression, which allow richer and more expressive descriptions of concepts. There are two main categories of additional class expression: restrictions and Boolean expressions. We'll examine each in turn.

### Restriction class expressions

A restriction defines a class by reference to one of the properties of the individuals that comprise the members of the class, and then placing some constraint on that property. For example, in a simple view of animal taxonomy, we might say that mammals are covered in fur, and birds in feathers. Thus the property hasCovering is in one case restricted to have the value fur, in the other to have the value feathers. This is a has value restriction. Six restriction types are currently defined by OWL:

| Restriction type | Meaning |
|------------------|---------|
| has value | The restricted property has exactly the given value. |
| all values from | All values of the restricted property, if it has any, are members of the given class. |
| some values from | The property has at least one value which is a member of the given class. |
| cardinality | The property has exactly n values, for some positive integer n. |
| min cardinality | The property has at least n values, for some positive integer n. |
| max cardinality | The property has at most n values, for some positive integer n. |

Note that, at present, the Jena ontology API has only limited support for OWL2's qualified cardinality restrictions (i.e. cardinalityQ, minCardinalityQ and maxCardinalityQ). Qualified cardinality restrictions are encapsulated in the interfaces CardinalityQRestriction, MinCardinalityQRestriction and MaxCardinalityQRestriction. OntModel also provides methods for creating and accessing qualified cardinality restrictions. Since they are not part of the OWL 1.0 language definition, qualified cardinality restrictions are not supported in OWL ontologies. Qualified cardinality restrictions were added in the OWL 2 update; OWL2 support in Jena will be added in due course.

Jena provides a number of ways of creating restrictions, or retrieving them from a model. Firstly, you can retrieve a general restriction from the model by its URI, if known:

```java
// get restriction with a given URI
Restriction r = m.getRestriction( NS + "theName" );
```

You can create a new restriction by nominating the property that the restriction applies to:

```java
// anonymous restriction on property p
OntProperty p = m.createOntProperty( NS + "p" );
Restriction anonR = m.createRestriction( p );
```

Since a restriction is typically not assigned a URI in an ontology, retrieving an existing restriction by name may not be possible.
However, you can list all of the restrictions in a model and search for the one you want:

```java
Iterator<Restriction> i = m.listRestrictions();
while (i.hasNext()) {
    Restriction r = i.next();
    if (isTheOne( r )) {
        // handle the restriction
    }
}
```

A common case is that we want the restrictions on some property p. In this case, from an object denoting p we can list the restrictions that mention that property:

```java
OntProperty p = m.getOntProperty( NS + "p" );
Iterator<Restriction> i = p.listReferringRestrictions();
while (i.hasNext()) {
    Restriction r = i.next();
    // now handle the restriction ...
}
```

A general restriction can be converted to a specific type of restriction via as... methods (if the information is already in the model), or, if the information is not in the model, via convertTo... methods. For example, to convert the example restriction r from the example above to an all values from restriction, we can do the following:

```java
OntClass c = m.createClass( NS + "SomeClass" );
AllValuesFromRestriction avf = r.convertToAllValuesFromRestriction( c );
```

To create a particular restriction ab initio, we can use the creation methods defined on OntModel.
For example:

```java
OntClass c = m.createClass( NS + "SomeClass" );
ObjectProperty p = m.createObjectProperty( NS + "p" );

// null denotes the URI in an anonymous restriction
AllValuesFromRestriction avf = m.createAllValuesFromRestriction( null, p, c );
```

Assuming that the above code fragment was using a model m which was created with the OWL language profile, it creates an instance of an OWL restriction that would have the following definition in RDF/XML:

```xml
<owl:Restriction>
  <owl:onProperty rdf:resource="#p"/>
  <owl:allValuesFrom rdf:resource="#SomeClass"/>
</owl:Restriction>
```

Once we have a particular restriction object, there are methods following the standard add, get, set and test naming pattern to access the aspects of the restriction. For example, in a camera ontology, we might find this definition of a class describing Large-Format cameras:

```xml
<owl:Class rdf:ID="Large-Format">
  <rdfs:subClassOf rdf:resource="#Camera"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#body"/>
      <owl:allValuesFrom rdf:resource="#BodyWithNonAdjustableShutterSpeed"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```

Here's one way to access the components of the all values from restriction.
Assume m contains a suitable camera ontology:

```java
OntClass largeFormat = m.getOntClass( camNS + "Large-Format" );
for (Iterator<OntClass> i = largeFormat.listSuperClasses( true ); i.hasNext(); ) {
    OntClass c = i.next();

    if (c.isRestriction()) {
        Restriction r = c.asRestriction();

        if (r.isAllValuesFromRestriction()) {
            AllValuesFromRestriction av = r.asAllValuesFromRestriction();
            System.out.println( "AllValuesFrom class " + av.getAllValuesFrom().getURI() +
                                " on property " + av.getOnProperty().getURI() );
        }
    }
}
```

### Boolean class expressions

Most developers are familiar with the use of Boolean operators to construct propositional expressions: conjunction (and), disjunction (or) and negation (not). OWL provides a means for constructing expressions describing classes with analogous operators, by considering class descriptions in terms of the set of individuals that comprise the members of the class.

Suppose we wish to say that an instance x has rdf:type A and rdf:type B. This means that x is both a member of the set of individuals in A, and of the set of individuals in B. Thus, x lies in the intersection of classes A and B. If, on the other hand, x has rdf:type A or rdf:type B, then x must lie in the union of A and B. Finally, to say that x does not have rdf:type A, it must lie in the complement of A. These operations, union, intersection and complement, are the Boolean operators for constructing class expressions. While complement takes only a single argument, union and intersection must necessarily take more than one argument. Before continuing with constructing and using Boolean class expressions, let's briefly discuss lists.

### List expressions

RDF originally had three container types: Seq, Alt and Bag. While useful, these are all open forms: it is not possible to say that a given container has a fixed number of values.
Lists have subsequently been added to the core RDF specification, and are used extensively in OWL. A list follows the well-known cons cell pattern from Lisp, Prolog and other list-handling languages. Each cell of a list is either the end-of-list terminator (nil in Lisp), or is a pair consisting of a value and a pointer to the cell that is the first cell on the tail of the list. In RDF lists, the end-of-list is marked by a resource with the name rdf:nil, while each list cell is an anonymous resource with two properties, one denoting the tail and the other the value. Fortunately, this complexity is hidden by some simple syntax:

```xml
<p rdf:parseType="Collection">
  <A />
  <B />
</p>
```

According to the RDF specification, this list of two elements has the following expansion as RDF triples:

```xml
<p>
  <rdf:first><A /></rdf:first>
  <rdf:rest>
    <rdf:first><B /></rdf:first>
    <rdf:rest rdf:resource=""/>
  </rdf:rest>
</p>
```

Given this construction, a well-formed list (one with exactly one rdf:first and rdf:rest per cons cell) has a precisely determined set of members. Incidentally, the same list in Turtle is even more compact:

```turtle
:example :p ( :A :B ) .
```

Although lists are defined in the generic RDF model in Jena, they are extensively used by the ontology API, so we mention them here. Full details of the methods defined are in the RDFList javadoc.

Various means of constructing lists are defined in Model, as variants on createList.
For example, we can construct a list of three classes as follows:

```java
OntModel m = ModelFactory.createOntologyModel();
OntClass c0 = m.createClass( NS + "c0" );
OntClass c1 = m.createClass( NS + "c1" );
OntClass c2 = m.createClass( NS + "c2" );

RDFList cs = m.createList( new RDFNode[] {c0, c1, c2} );
```

Alternatively, we can build a list one element at a time:

```java
OntModel m = ModelFactory.createOntologyModel();
RDFList cs = m.createList(); // cs is empty
cs = cs.cons( m.createClass( NS + "c0" ) );
cs = cs.cons( m.createClass( NS + "c1" ) );
cs = cs.cons( m.createClass( NS + "c2" ) );
```

Note that these two approaches end with the classes in the lists in opposite orders, since the cons operation adds a new list cell to the front of the list. Thus the second list will run c2 to c0. In the ontology operations we are discussing here, the order of values in the list is not considered significant.

Finally, a resource which is a cell in a list sequence will accept .as( RDFList.class ).

Once the list has been created or obtained from the model, RDFList methods may be used to access members of the list, iterate over the list, and so forth. For example:

```java
System.out.println( "List has " + myRDFList.size() + " members:" );
for (Iterator<RDFNode> i = myRDFList.iterator(); i.hasNext(); ) {
    System.out.println( i.next() );
}
```

### Intersection, union and complement class expressions

Given Jena's ability to construct lists, building intersection and union class expressions is straightforward. The create methods on OntModel allow us to construct an intersection or union directly. Alternatively, given an existing OntClass, we can use the convertTo... methods to construct a facet representing the more specialised expression.
For example, we can define the class of UK industry-related conferences as the intersection of conferences with a UK location and conferences with an industrial track. Here's the XML declaration:

```xml
<owl:Class rdf:ID="UKIndustrialConference">
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasLocation"/>
      <owl:hasValue rdf:resource="#united_kingdom"/>
    </owl:Restriction>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasPart"/>
      <owl:someValuesFrom rdf:resource="#IndustryTrack"/>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
```

Or, more compactly, in N3/Turtle:

```turtle
:UKIndustrialConference a owl:Class ;
    owl:intersectionOf (
        [ a owl:Restriction ;
          owl:onProperty :hasLocation ;
          owl:hasValue :united_kingdom ]
        [ a owl:Restriction ;
          owl:onProperty :hasPart ;
          owl:someValuesFrom :IndustryTrack ]
    ) .
```

Here is code to create this class declaration using Jena, assuming that m is a model into which the ESWC ontology has been read:

```java
// get the class references
OntClass place = m.getOntClass( NS + "Place" );
OntClass indTrack = m.getOntClass( NS + "IndustryTrack" );

// get the property references
ObjectProperty hasPart = m.getObjectProperty( NS + "hasPart" );
ObjectProperty hasLoc = m.getObjectProperty( NS + "hasLocation" );

// create the UK instance
Individual uk = place.createIndividual( NS + "united_kingdom" );

// now the anonymous restrictions
HasValueRestriction ukLocation = m.createHasValueRestriction( null, hasLoc, uk );
SomeValuesFromRestriction hasIndTrack = m.createSomeValuesFromRestriction( null, hasPart, indTrack );

// finally create the intersection class
IntersectionClass ukIndustrialConf = m.createIntersectionClass(
        NS + "UKIndustrialConference",
        m.createList( new RDFNode[] {ukLocation, hasIndTrack} ) );
```

Union and intersection class expressions are very similar, so Jena defines a common super-class, BooleanClassDescription. This class provides access to the operands of the expression. In the intersection example above, the operands are the two restrictions. The BooleanClassDescription class allows us to set the operands en masse by supplying a list, or to add and remove them one at a time.

Complement class expressions are very similar. The principal difference is that they take only a single class as operand, and therefore do not accept a list of operands.

### Enumerated classes

The final type of class expression allowed by OWL is the enumerated class. Recall that a class is a set of individuals. Often, we want to define the members of the class implicitly: for example, "the class of UK conferences". Sometimes it is convenient to define a class explicitly, by stating the individuals the class contains. An enumerated class is exactly the class whose members are the given individuals. For example, we know that the class of PrimaryColours contains exactly red, green and blue, and no others.

In Jena, an enumerated class is created in a similar way to other classes. The set of values that comprise the enumeration is described by an RDFList.
For example, here's a class defining the countries that comprise the United Kingdom:

```xml
<owl:Class rdf:ID="UKCountries">
  <owl:oneOf rdf:parseType="Collection">
    <eswc:Place rdf:about="#england"/>
    <eswc:Place rdf:about="#scotland"/>
    <eswc:Place rdf:about="#wales"/>
    <eswc:Place rdf:about="#northern_ireland"/>
  </owl:oneOf>
</owl:Class>
```

To create this enumeration and list its contents, we could do the following:

```java
OntClass place = m.getOntClass( NS + "Place" );

EnumeratedClass ukCountries = m.createEnumeratedClass( NS + "UKCountries", null );
ukCountries.addOneOf( place.createIndividual( NS + "england" ) );
ukCountries.addOneOf( place.createIndividual( NS + "scotland" ) );
ukCountries.addOneOf( place.createIndividual( NS + "wales" ) );
ukCountries.addOneOf( place.createIndividual( NS + "northern_ireland" ) );

for (Iterator i = ukCountries.listOneOf(); i.hasNext(); ) {
    Resource r = (Resource) i.next();
    System.out.println( r.getURI() );
}
```

An OWL DataRange is similar to an enumerated class, except that the members of the DataRange are literal values, such as integers, dates or strings. See the DataRange javadoc for more details.

### Listing classes

In many applications, we need to inspect the set of classes in an ontology. The list... methods on OntModel provide a variety of means of listing types of class.
The methods available include:\npublic ExtendedIterator\u0026lt;OntClass\u0026gt; listClasses(); public ExtendedIterator\u0026lt;EnumeratedClass\u0026gt; listEnumeratedClasses(); public ExtendedIterator\u0026lt;UnionClass\u0026gt; listUnionClasses(); public ExtendedIterator\u0026lt;ComplementClass\u0026gt; listComplementClasses(); public ExtendedIterator\u0026lt;IntersectionClass\u0026gt; listIntersectionClasses(); public ExtendedIterator\u0026lt;Restriction\u0026gt; listRestrictions(); public ExtendedIterator\u0026lt;OntClass\u0026gt; listNamedClasses(); public ExtendedIterator\u0026lt;OntClass\u0026gt; listHierarchyRootClasses(); The last two methods deserve special mention. In OWL, class expressions are typically not named, but are denoted by anonymous resources (aka bNodes). In many applications, such as displaying an ontology in a user interface, we want to pick out the named classes only, ignoring those denoted by bNodes. This is what listNamedClasses() does. The method listHierarchyRootClasses() identifies the classes that are uppermost in the class hierarchy contained in the given model. These are the classes that have no super-classes. The iteration returned by listHierarchyRootClasses() may contain anonymous classes. To get a list of named hierarchy root classes, i.e. the named classes that lie closest to the top of the hierarchy (alternatively: the shallowest fringe of the hierarchy consisting solely of named classes), use the OntTools method namedHierarchyRoots().\nYou should also note that it is important to close the iterators returned from the list... methods, particularly when the underlying store is a database. This is necessary so that any state (e.g. the database connection resources) can be released. Closing happens automatically when the hasNext() method on the iterator returns false. If your code does not iterate all the way to the end of the iterator, you should call the close() method explicitly. 
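The close-on-exhaustion contract described above can be sketched in plain Java. This is not Jena's ExtendedIterator; ClosingIterator and its release hook are invented stand-ins showing the pattern: the iterator releases its underlying resource automatically when hasNext() returns false, and offers an explicit close() for early exit.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of the "close when hasNext() returns false" contract; ClosingIterator
// and the 'release' hook are hypothetical stand-ins, not Jena classes.
public class ClosingIteratorDemo {
    static class ClosingIterator<T> implements Iterator<T> {
        private final Iterator<T> base;
        private final Runnable release;   // e.g. returns a database connection
        private boolean closed = false;

        ClosingIterator(Iterator<T> base, Runnable release) {
            this.base = base;
            this.release = release;
        }

        public boolean hasNext() {
            if (!closed && !base.hasNext()) close();   // auto-close at the end
            return !closed && base.hasNext();
        }

        public T next() { return; }

        public void close() {                          // explicit close for early exit
            if (!closed) { closed = true;; }
        }

        public boolean isClosed() { return closed; }
    }

    static boolean autoClosesWhenExhausted() {
        ClosingIterator<String> it =
            new ClosingIterator<>(List.of("a", "b").iterator(), () -> {});
        while (it.hasNext()) {; }               // iterate to the end
        return it.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(autoClosesWhenExhausted());  // true
    }
}
```

If the loop had broken out early, the caller would be responsible for invoking close() itself, exactly as the text above recommends.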
Note also that the values returned by these iterators will depend on the asserted data and the reasoner being used. For example, if the model contains a Restriction, that restriction will only be returned by the listClasses() iterator if the model is bound to a reasoner that can infer that any restriction is also a class, since Restriction is a subClassOf Class. This difference can be exploited by the programmer: to list classes and restrictions separately, perform the listClasses() and listRestrictions() methods on the base model only, or on a model with no reasoner attached.\nInstances or individuals In OWL Full any value can be an individual – and thus the subject of triples in the RDF graph other than ontology declarations. In OWL Lite and DL, the language terms and the instance data that the application is working with are kept separate, by definition of the language. Jena therefore supports a simple notion of an Individual, which is essentially an alias for Resource. While Individuals are largely synonymous with Resources, they do provide a programming interface that is consistent with the other Java classes in the ontology API.\nThere are two ways to create individuals. Both require the class to which the individual will initially belong:\nOntClass c = m.createClass( NS + \u0026quot;SomeClass\u0026quot; ); // first way: use a call on OntModel Individual ind0 = m.createIndividual( NS + \u0026quot;ind0\u0026quot;, c ); // second way: use a call on OntClass Individual ind1 = c.createIndividual( NS + \u0026quot;ind1\u0026quot; ); The only real difference between these approaches is that the second way will create the individual in the same model that the class is attached to (see the getModel() method). In both of the above examples the individual is named, but this is not necessary. The method OntModel.createIndividual( Resource cls ) creates an anonymous individual belonging to the given class. Note that the type of the class parameter is only Resource. 
You are not required to use as() to present a Resource as an OntClass before calling this method, though of course an OntClass is a Resource so using an OntClass will work perfectly well.\nIndividual provides a set of methods for testing and manipulating the ontology classes to which an individual belongs. This is a convenience: OWL and RDFS denote class membership through the rdf:type property, and methods for manipulating and testing rdf:type are defined on OntResource. You may use either approach interchangeably.\nOntology meta-data In OWL, but not RDFS, meta-data about the ontology itself is encoded as properties on an individual of class owl:Ontology. By convention, the URI of this individual is the URL, or web address, of the ontology document itself. In the XML serialisation, this is typically shown as:\n\u0026lt;owl:Ontology rdf:about=\u0026quot;\u0026quot;\u0026gt; \u0026lt;/owl:Ontology\u0026gt; Note that the construct rdf:about=\u0026quot;\u0026quot; does not indicate a resource with no URI; it is in fact a shorthand way of referencing the base URI of the document containing the ontology. The base URI may be stated in the document through an xml:base declaration in the XML preamble. The base URI can also be specified when reading the document via Jena\u0026rsquo;s Model API (see the read() methods on OntModel for reference).\nWe can attach various meta-data statements to this object to indicate attributes of the ontology as a whole. The Java object Ontology represents this special instance, and uses the standard add, set, get, list, test and delete pattern to provide access to the following attributes:\nAttribute Meaning backwardCompatibleWith Names a prior version of this ontology that this version is compatible with. incompatibleWith Names a prior version of this ontology that this version is not compatible with. priorVersion Names a prior version of this ontology. 
imports Names an ontology whose definitions this ontology includes. In addition to these attributes, the Ontology element typically contains common meta-data properties, such as comment, label and version information.\nIn the Jena API, the ontology\u0026rsquo;s metadata properties can be accessed through the Ontology interface. Suppose we wish to know the list of URIs that the ontology imports. First we must obtain the resource representing the ontology itself:\nString base = ...; // the base URI of the ontology OntModel m = ...; // the model containing the ontology statements Ontology ont = m.getOntology( base ); // now list the ontology imports for (String imp : ont.listImportedOntologyURIs()) { System.out.println( \u0026quot;Ontology \u0026quot; + base + \u0026quot; imports \u0026quot; + imp ); } If the base URI of the ontology is not known, you can list all resources of rdf:type Ontology in a given model by OntModel.listOntologies(). If there is only one of these, it should be safe to assume that it is the Ontology resource for the ontology. However, you should note that if more than one ontology document has been read into the model (for example by including the imports of a document), there may well be more than one Ontology resource in the model. In this case, you may find it useful to list the ontology resources in just the base model:\nOntModel m = ... // the model, including imports OntModel mBase = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM, m.getBaseModel() ); for (Iterator i = mBase.listOntologies(); i.hasNext(); ) { Ontology ont = (Ontology); // m's base model has ont as an import ... } A common practice is also to use the Ontology element to attach Dublin Core metadata to the ontology document. Jena provides a copy of the Dublin Core vocabulary, in org.apache.jena.vocabulary.DCTerms. 
To attach a statement saying that the ontology was authored by John Smith, we can say:\nOntology ont = m.getOntology( baseURI ); ont.addProperty( DCTerms.creator, \u0026quot;John Smith\u0026quot; ); It is also possible to programmatically add imports and other meta-data to a model, for example:\nString base = ...; // the base URI of the ontology OntModel m = ...; Ontology ont = m.createOntology( base ); ont.addImport( m.createResource( \u0026quot;\u0026quot; ) ); ont.addImport( m.createResource( \u0026quot;\u0026quot; ) ); Note that under default conditions, simply adding (or removing) an owl:imports statement to a model will not cause the corresponding document to be imported (or removed). However, by calling OntModel.setDynamicImports(true), the model will start noticing the addition or removal of owl:imports statements.\nOntology inference: overview You have the choice of whether to use the Ontology API with Jena\u0026rsquo;s reasoning capability turned on, and, if so, which of the various reasoners to use. Sometimes a reasoner will add information to the ontology model that is not useful for your application to see. A good example is an ontology editor. Here, you may wish to present your users with the information they have entered into their ontology; the addition of the entailed information into the editor\u0026rsquo;s display would be very confusing. Since Jena does not have a means for distinguishing inferred statements from those statements asserted into the base model, a common choice for ontology editors and similar applications is to run with no reasoner.\nIn many other cases, however, it is the addition of the reasoner that makes the ontology useful. For example, if we know that John is the father of Mary, we would expect a \u0026lsquo;yes\u0026rsquo; if we query whether John is the parent of Mary. The parent relationship is not asserted, but we know from our ontology that fatherOf is a sub-property of parentOf. 
If \u0026lsquo;John fatherOf Mary\u0026rsquo; is true, then \u0026lsquo;John parentOf Mary\u0026rsquo; is also true. The integrated reasoning capability in Jena exists to allow just such entailments to be seen and used.\nFor a complete and thorough description of Jena\u0026rsquo;s inference capabilities, please see the reasoner documentation. This section of the ontology API documentation is intended to serve as only a brief guide and overview.\nRecall from the introduction that the reasoners in Jena operate by making it appear that triples entailed by the inference engine are part of the model in just the same way as the asserted triples (see Figure 2). The underlying architecture allows the reasoner to be part of the same Java virtual machine (as is the case with the built-in rule-based reasoners), or in a separate process on the local computer, or even a remote computer. Of course, each of these choices will have different characteristics of what reasoning capabilities are supported, and what the implications for performance are.\nThe reasoner attached to an ontology model, if any, is specified through the OntModelSpec. The methods setReasoner() and setReasonerFactory() on the model spec are used to specify a reasoner. The setReasoner variant is intended for use on a specification which will only be used to build a single model. The factory variant is used where the OntModelSpec will be used to build more than one model, ensuring that each model gets its own reasoner object. The ReasonerRegistry provides a collection of pre-built reasoners – see the reasoner documentation for more details. However, it is also possible for you to define your own reasoner that conforms to the appropriate interface. 
For example, there is an in-process interface to the open-source Pellet reasoner.\nTo facilitate the choice of reasoners for a given model, some common choices have been included in the pre-built ontology model specifications available as static fields on OntModelSpec. The available choices are described in the section on ont model specifications, above.\nDepending on which of these choices is made, the statements returned from queries to a given ontology model may vary considerably.\nAdditional notes Jena\u0026rsquo;s inference machinery defines some specialised services that are not exposed through the addition of extra triples to the model. These are exposed by the InfModel interface; for convenience OntModel extends this interface to make these services directly available to the user. Please note that calling inference-specific methods on an ontology model that does not contain a reasoner will have unpredictable results. Typically these methods will have no effect or return null, but you should not rely on this behaviour.\nIn general, inference models will add many additional statements to a given model, including the axioms appropriate to the ontology language. This is typically not something you will want in the output when the model is serialized, so write() on an ontology model will only write the statements from the base model. This is typically the desired behaviour, but there are occasions (e.g. during debugging) when you may want to write the entire model, virtual triples included. The easiest way to achieve this is to call the writeAll() method on OntModel. An alternative technique, which can sometimes be useful for a variety of use-cases, including debugging, is to snapshot the model by constructing a temporary plain model and adding to it the contents of the ontology model:\nOntModel om = ... 
// snapshot the contents of ont model om Model snapshot = ModelFactory.createDefaultModel(); snapshot.add( om ); Working with persistent ontologies A common way to work with ontology data is to load the ontology axioms and instances at run-time from a set of source documents. This is a very flexible approach, but has limitations. In particular, your application must parse the source documents each time it is run. For large ontologies, this can be a source of significant overhead. Jena provides an implementation of the RDF model interface that stores the triples persistently in a database. This saves the overhead of loading the model each time, and means that you can store RDF models significantly larger than the computer\u0026rsquo;s main memory, but at the expense of a higher overhead (a database interaction) to retrieve and update RDF data from the model. In this section we briefly discuss using the ontology API with Jena\u0026rsquo;s persistent database models.\nFor information on setting-up and accessing the persistent models themselves, see the TDB reference sections.\nThere are two somewhat separate requirements for persistently storing ontology data. The first is making the main or base model itself persistent. The second is re-using or creating persistent models for the imports of an ontology. These two requirements are handled slightly differently.\nTo retrieve a Jena model from the database API, we have to know its name. Fortunately, common practice for ontologies on the Semantic Web is that each is named with a URI. We use this URI to name the model that is stored in the database. Note carefully what is actually happening here: we are exploiting a feature of the database sub-system to make persistently stored ontologies easy to retrieve, but we are not in any sense resolving the URI of the model. 
Once placed into the database, the name of the model is treated as an opaque string.\nTo create a persistent model for the ontology, we create a model maker that will access our underlying database, and use the ontology URI as the database name. We then take the resulting persistent model, and use it as the base model when constructing an ontology model:\nModel base = getMaker().createModel( \u0026quot;\u0026quot; ); OntModel m = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_RULE_INF, base ); Here we assume that the getMaker() method returns a suitably initialized ModelMaker that will open the connection to the database. This step only creates a persistent model named with the ontology URI. To initialise the content, we must either add statements to the model using the OntModel API, or do a one-time read from a document:\ \u0026quot;\u0026quot; );\nOnce this step is completed, the model contents may be accessed in future without needing to read again.\nIf the Customers ontology imports other ontologies, using owl:imports, the Jena Ontology API will build a union model containing the closure of the imports. Even if the base model is persistent, the predefined OntModelSpec objects only specify memory models to contain the imported ontologies, since memory models do not require any additional parameters.\nTo specify that the imported models should be stored in, and retrieved from, the database, we must update the ontology spec object to use the model maker that encapsulates the database connection:\nOntModelSpec spec = new OntModelSpec( OntModelSpec.OWL_MEM_RULE_INF ); // set the model maker for the base model spec.setBaseModelMaker( getMaker() ); // set the model maker for imports spec.setImportModelMaker( getMaker() ); This new model maker will then be used to generate persistent models named with the URI of the imported ontology, if this spec is passed, instead of OntModelSpec.OWL_MEM_RULE_INF, to the createOntologyModel method of the model factory. 
Note that once the import has been loaded once into the database, it can be re-used by other ontologies that import it. Thus a given database will only contain at most one copy of each imported ontology.\nNote on performance The built-in Jena reasoners, including the rule reasoners, make many small queries into the model in order to propagate the effects of rules firing. When using a persistent database model, each of these small queries creates an SQL interaction with the database engine. This is a very inefficient way to interact with a database system, and performance suffers as a result. Efficient reasoning over large, persistent databases is currently an open research challenge. Our best suggested work-around is, where possible, to snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine. An alternative solution, that may be applicable if your application does not write to the datastore often, is to precompute the inference closure of the ontology and data in-memory, then store that into a database model to be queried by the run-time application. Such an off-line processing architecture will clearly not be applicable to every application problem.\nA sample program shows the above steps combined, to create an ontology in which both base model and imports are stored in a persistent database.\nExperimental ontology tools Starting with Jena release 2.6, the OntTools class provides a small collection of commonly-requested utilities for assisting with ontology processing. Given that this is a new feature, you should regard it as an experimental facility for the time being. We welcome feedback. The capabilities in OntTools are implemented as static methods. Currently available tools are:\nOntClass getLCA( OntModel m, OntClass u, OntClass v ) Determine the lowest common ancestor for classes u and v. This is the class that is lowest in the class hierarchy, and which includes both u and v among its sub-classes. 
Path findShortestPath( Model m, Resource start, RDFNode end, Filter onPath ) Breadth-first search, including a cycle check, to locate the shortest path from start to end, in which every triple on the path returns true to the onPath predicate. List namedHierarchyRoots( OntModel m ) Compute a list containing the uppermost fringe of the class hierarchy in the given model which consists only of named classes. ","permalink":"","tags":null,"title":"Jena Ontology API"},{"categories":null,"contents":"Jena Permissions is a SecurityEvaluator interface and a set of dynamic proxies that apply that interface to Jena Graphs, Models, and associated methods and classes. It does not implement any specific security policy but provides a framework for developers or integrators to implement any desired policy.\nDocumentation Overview Usage Notes Design Security Evaluator implementation Assembler for a Secured Model Adding Jena Permissions to Fuseki Overview Jena Permissions transparently intercepts calls to the Graph or Model interface, evaluates access restrictions and either allows or rejects the access. The system is authentication agnostic and will work with most authentication systems. The system uses dynamic proxies to wrap any Graph or Model implementation. The Jena Permissions module includes an Assembler module to extend the standard Assembler to include the ability to create secured models and graphs. A complete example application is also available.\nThe developer using Jena Permissions is required to implement a SecurityEvaluator that provides access to the Principal (User) using the system and also determines if that Principal has the proper access to execute a method. 
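The idea behind findShortestPath can be sketched without any Jena types: a breadth-first search over a toy Map-based graph, with a visited set serving as the cycle check and a parent map used to rebuild the path. BfsSketch and its String-keyed graph are invented for the demo; they are not the OntTools API.

```java
import java.util.*;

// Plain-Java sketch of the breadth-first search with cycle check behind
// OntTools.findShortestPath, over a toy Map-based graph (no Jena types).
public class BfsSketch {
    static List<String> shortestPath(Map<String, List<String>> edges,
                                     String start, String end) {
        Map<String, String> parent = new HashMap<>(); // also serves as visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(end)) {                   // rebuild path start..end
                LinkedList<String> path = new LinkedList<>();
                for (String n = node; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {      // cycle check: skip seen nodes
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return null;                                  // no path exists
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a"),                        // cycle back to a
            "c", List.of("d"),
            "d", List.of());
        System.out.println(shortestPath(g, "a", "d")); // [a, c, d]
    }
}
```

The real method additionally filters each candidate triple through the onPath predicate; the visited-set discipline shown here is what makes the search safe on cyclic RDF graphs.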
Through the SecurityEvaluator the developer may apply full CRUD (Create, Read, Update, and Delete) restrictions to graphs and optionally triples within the graphs.\nThe javadocs have additional annotations that specify what permissions at graph and triple levels are required for the user to execute the method.\nThere is an example jar that contains configuration examples for both a stand-alone application and a Fuseki configuration option.\nUsage Notes When the system is correctly configured the developer creates a SecuredGraph by calling Factory.getInstance( SecurityEvaluator, String, Graph );. Once created the resulting graph automatically makes the appropriate calls to the SecurityEvaluator before passing any approved requests to the underlying graph.\nSecured models are created by calling Factory.getInstance( SecurityEvaluator, String, Model ); or ModelFactory.createModelForGraph( SecuredGraph );\nNOTE: when creating a model by wrapping a secured graph (e.g. ModelFactory.createModelForGraph( SecuredGraph );) the resulting Model does not have the same security requirements as the standard secured model. For example, when creating a list on a secured model by calling model.createList( RDFNode[] );, the standard secured model verifies that the user has the right to update the triples and allows or denies the entire operation accordingly. The wrapped secured graph does not have visibility to the createList() command and can only operate on the instructions issued by the model.createList() implementation. In the standard implementation the model requests the graph to delete one triple and then insert another. Thus the user must have delete and add permissions, not the update permission.\nThere are several other cases where the difference in the layer can trip up the security system. In all known cases the result is a tighter security definition than was requested. 
For simplicity\u0026rsquo;s sake we recommend that the wrapped secured graph only be used in cases where access to the graph as a whole is granted/denied. In these cases the user either has all CRUD capabilities or none.\n","permalink":"","tags":null,"title":"Jena Permissions - A Permissions wrapper around the Jena RDF implementation"},{"categories":null,"contents":"Jena Permissions provides a standard Jena assembler making it easy to use the SecuredModel in an Assembler based environment. To use the permissions assembler the assembler file must contain the lines:\n[] ja:loadClass \u0026quot;org.apache.jena.permissions.SecuredAssembler\u0026quot; . sec:Model rdfs:subClassOf ja:NamedModel . The secured assembler provides a number of resources and properties for the assembler files.\nAssuming we define:\n@prefix sec: \u0026lt;\u0026gt; . Then the following resources are defined:\nsec:Model - A secured model. One against which the security evaluator is running access checks. All sec:Model instances must have a ja:modelName to identify it to the SecurityEvaluator.\nsec:Evaluator - An instance of SecurityEvaluator.\nThe following properties are also defined:\nsec:evaluatorFactory - Identifies the class name of a factory class that implements a no-argument getInstance() method that returns an instance of SecurityEvaluator.\nsec:baseModel - Identifies the ja:Model that is to have permissions applied to it.\nsec:evaluatorImpl - Identifies an instance of SecurityEvaluator.\nsec:evaluatorClass - Identifies a class that implements SecurityEvaluator\nsec:args - Identifies arguments to the sec:evaluatorClass constructor.\nThe secured assembler provides two (2) mechanisms to create a secured graph. The first is to use a SecurityEvaluator factory.\nmy:securedModel rdf:type sec:Model ; sec:baseModel my:baseModel ; ja:modelName \u0026quot;\u0026quot; ; sec:evaluatorFactory \u0026quot;\u0026quot; . 
In the above example static method getInstance() is called on and the result is used as the SecurityEvaluator. This is used to create a secured model (my:securedModel) that wraps the model my:baseModel and identifies itself to the SecurityEvaluator with the URI \u0026quot;\u0026quot;.\nThe second mechanism is to use the sec:Evaluator method.\nmy:secEvaluator rdf:type sec:Evaluator ; sec:args [ rdf:_1 my:secInfoModel ; ] ; sec:evaluatorClass \u0026quot;your.implementation.SecurityEvaluator\u0026quot; . my:securedModel rdf:type sec:Model ; sec:baseModel my:baseModel ; ja:modelName \u0026quot;\u0026quot; ; sec:evaluatorImpl my:secEvaluator . In the above example my:secEvaluator is defined as a sec:Evaluator implemented by the class \u0026quot;your.implementation.SecurityEvaluator\u0026quot;. When the instance is constructed the constructor with one argument is used and it is passed my:secInfoModel as an argument. my:secInfoModel may be any type supported by the assembler. If more than one argument is desired then rdf:_2, rdf:_3, rdf:_4, etc. may be added to the sec:args list. The \u0026quot;your.implementation.SecurityEvaluator\u0026quot; constructor with the proper number of arguments will be called. It is an error to have more than one constructor with the proper number of arguments.\nAfter construction, my:secEvaluator is used to construct the my:securedModel instance. This has the same properties as the previous example other than that the SecurityEvaluator instance is different.\n","permalink":"","tags":null,"title":"Jena Permissions - Assembler for a Secured Model"},{"categories":null,"contents":"Jena Permissions is designed to allow integrators to implement almost any security policy. Fundamentally it works by implementing dynamic proxies on top of the Jena Graph and Model interfaces as well as objects returned by those interfaces. 
The proxy verifies that the actions on those objects are permitted by the policy before allowing the actions to proceed.\nThe graph or model is created by the org.apache.jena.permissions.Factory object by wrapping a Graph or Model implementation and associating it with a URI (graphIRI) and a SecurityEvaluator implementation. The graphIRI is the URI that will be used to identify the graph/model to the security evaluator.\nThe SecurityEvaluator is an object implemented by the integrator to perform the necessary permission checks. A discussion of the SecurityEvaluator implementation can be found in the Security Evaluator documentation.\nAccess to methods in secured objects are determined by the CRUD (Create, Read, Update and Delete) permissions assigned to the user.\nThe system is designed to allow shallow (graph/model level) or deep (triple/statement level) decisions.\nWhen a secured method is called the system performs the following checks in order:\nDetermines if the user has proper access to the underlying graph/model. Generally the required permission is Update (for add or delete methods), or Read.\nIf the user has access to the graph/model determine if the user has permission to execute the method against all triples/statements in the graph/model. This is performed by calling SecurityEvaluator.evaluate(principal, action, graphIRI, Triple.ANY). If the evaluator returns true then the action is permitted. This is general case for shallow permission systems. For deep permissions systems false may be returned.\nif the user does not have permission to execute the method against all triples/statements the SecurityEvaluator.evaluate(principal, action, graphIRI, triple) method is called with the specific triple (note special cases below). 
If the evaluator returns true the action is permitted, otherwise a properly detailed PermissionDeniedException is thrown.\nSpecial Cases SecurityEvaluator.FUTURE There are a couple of special cases where the Node/Resource is not known when the permission check is made. An example is the creation of an RDF List object. For example to create an empty list the following triple/statement must be constructed:\n_:b1 rdf:first rdf:nil . However, the permissions system cannot know the value of _:b1 until after the triple/statement is constructed and added to the graph/model. To handle this situation the permissions system asks the evaluator to evaluate the triple: (SecurityEvaluator.FUTURE, RDF.first, RDF.nil) Similar situations are found when adding to a list, creating reified statements, RDF alt objects, RDF sequences, or RDF anonymous resources of a specific type.\nSecurityEvaluator.VARIABLE The Node.ANY node is used to identify the case where any node may be returned. Specifically it asks if the user can perform the action on all the nodes in this position in the triple. For example:\nNode.ANY RDF:type FOAF:Person Asks if the operation can be performed on all of the nodes of type FOAF:Person.\nThe SecurityEvaluator.VARIABLE differs from Node.ANY in that the system is asking if there are any prohibitions, and not if the user may perform the action. Thus queries with the VARIABLE type node should return true where ANY returns false. In general this type is used in query evaluation to determine if triple level filtering of results must be performed. Thus:\nSecurityEvaluator.VARIABLE RDF:type FOAF:Person Asks if there are any restrictions against the user performing the action against all triples of type FOAF:Person. The assumption is that checking for restrictions may be a faster check than checking for all access. Note that by returning true the permissions system will check each explicit triple for access permissions. 
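The three-step check order described above can be sketched in plain Java. This is a minimal sketch, not the Jena API: Evaluator, Action and ANY are invented stand-ins for SecurityEvaluator, the permission set and Triple.ANY, and the "secret" rule exists only to exercise the deep (per-triple) branch.

```java
// Minimal sketch of the graph-level, all-triples, then per-triple check order;
// all names here are hypothetical stand-ins, not the Jena Permissions API.
public class CheckOrderSketch {
    enum Action { CREATE, READ, UPDATE, DELETE }
    static final String ANY = "ANY";   // plays the role of Triple.ANY

    interface Evaluator {
        boolean evaluateGraph(String user, Action action, String graphIRI);
        boolean evaluate(String user, Action action, String graphIRI, String triple);
    }

    static boolean permitted(Evaluator e, String user, Action action,
                             String graphIRI, String triple) {
        // 1. does the user have access to the underlying graph/model at all?
        if (!e.evaluateGraph(user, action, graphIRI)) return false;
        // 2. may the user act on ALL triples? (shallow systems stop here)
        if (e.evaluate(user, action, graphIRI, ANY)) return true;
        // 3. deep check: fall back to asking about the specific triple
        return e.evaluate(user, action, graphIRI, triple);
    }

    // A deep evaluator: everyone may open the graph, but only "admin" may
    // touch triples mentioning "secret".
    static final Evaluator DEEP = new Evaluator() {
        public boolean evaluateGraph(String u, Action a, String g) { return true; }
        public boolean evaluate(String u, Action a, String g, String t) {
            if (t.equals(ANY)) return false;          // force per-triple checks
            return u.equals("admin") || !t.contains("secret");
        }
    };

    static boolean demo(String user) {
        return permitted(DEEP, user, Action.DELETE, "graphIRI", "(s, p, secret)");
    }

    public static void main(String[] args) {
        System.out.println(demo("alice"));  // false
        System.out.println(demo("admin"));  // true
    }
}
```

Returning false for the ANY probe is what a deep (triple-level) policy does; a shallow policy would answer the ANY probe directly and step 3 would never run.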
So if the system can not determine if there are access restrictions it is safe to return true.\nObjects Returned from Secured Objects Models and Graphs often return objects from methods. For example the model.createStatement() returns a Statement object. That object holds a reference to the model and performs operations against the model (for example Statement.changeLiteralObject()). Since permissions provides a dynamic wrapper around the base model to create the secured model, returning the model Statement would return an object that no longer has any permissions applied. Therefore the permissions system creates a SecuredStatement that applies permission checks to all operations before calling the base Statement methods.\nAll secured objects return secured objects if those objects may read or alter the underlying graph/model.\nAll secured objects are defined as interfaces and are returned as dynamic proxies.\nAll secured objects have concrete implementations. These implementations must remain concrete to ensure that we handle all cases where returned objects may alter the underlying graph/model.\nSecured Listeners Both the Graph and the Model interfaces provide a listener framework. Listeners are attached to the graph/model and changes to the graph/model are reported to them. In order to ensure that listeners do not leak information, the principal that was active when the listener was attached is preserved in a CachedSecurityEvaluator instance. This security evaluator implementation, wraps the original implementation and retains the current user. Thus when the listener performs the permission checks the original user is used not the current user. This is why the SecurityEvaluator must use the principal parameters and not call getPrincipal() directly during evaluation calls.\nProxy Implementation The proxy implementation uses a reflection InvocationHandler strategy. This strategy results in a proxy that implements all the interfaces of the original object. 
The original object and its InvocationHandler instance are kept together in an ItemHolder instance variable in the secured instance. When the invoker is called it determines whether the called method is on the secured interface. If it is, the invocation handler method is called; otherwise the method on the base class is called.\n","permalink":"","tags":null,"title":"Jena Permissions - Design"},{"categories":null,"contents":"When Jena moved from version 2 to version 3 there was a major renaming of packages. One of the packages renamed was the Jena Permissions package, formerly named Jena Security. There are several changes that need to occur to migrate from jena-security version 2.x to jena-permissions version 3.x.\nChanges Package Rename There are two major changes to package names.\nAs with the rest of the Jena code, all references to com.hp.hpl.jena have been changed to org.apache.jena. For integrator code this means that a simple rename of the includes is generally all that is required. See the main Migration Notes page for other hints and tips regarding this change.\nJena Security has been renamed Jena Permissions and the Maven artifact id has been changed to jena-permissions to reflect this change.\nThe permissions assembler namespace has been changed to\nExceptions Formerly Jena Permissions used a single exception to identify access restriction violations. With the tighter integration of permission concepts into the Jena core there are now 7 exceptions. This change will probably not require modification to the SecurityEvaluator implementation but may require modification to classes that utilize the permissions-based objects.\nAll exceptions are runtime exceptions and so do not have to be explicitly caught. Javadocs indicate which methods throw which exceptions.\nRemoval of org.apache.jena.permissions.AccessDeniedException. 
This is replaced by 5 individual exceptions.\nAddition of org.apache.jena.shared.OperationDeniedException. This exception is a child of JenaException and is the root of all operation-denied states, whether through process errors or through permissions violations.\nAddition of org.apache.jena.shared.PermissionDeniedException. This exception is a child of OperationDeniedException and is the root of all operations denied through permission violations. These can be because the object was statically prohibited from performing an operation (e.g. a read-only graph) or due to the Jena Permissions layer.\nAddition of org.apache.jena.shared.AddDeniedException. This exception is a child of PermissionDeniedException and is used to indicate that an attempt was made to add to an unmodifiable object. It may be thrown by read-only graphs or by the permission layer when a create restriction is violated.\nAddition of org.apache.jena.shared.DeleteDeniedException. This exception is a child of PermissionDeniedException and is used to indicate that an attempt was made to delete from an unmodifiable object. It may be thrown by read-only graphs or by the permission layer when a delete restriction is violated.\nAddition of org.apache.jena.shared.ReadDeniedException. This exception is a child of PermissionDeniedException and is used to indicate that a read restriction was violated.\nAddition of org.apache.jena.shared.UpdateDeniedException. This exception is a child of PermissionDeniedException and is used to indicate that an update restriction was violated.\nAddition of org.apache.jena.shared.AuthenticationRequiredException. This exception is a child of OperationDeniedException and is used to indicate that user authentication is required but has not occurred. 
This exception should be thrown when the SecurityEvaluator attempts to evaluate an operation, there is a permissions restriction, and the object returned by getPrincipal() indicates that the user is unauthenticated.\nRemoval of Classes The original \u0026ldquo;security\u0026rdquo; code was intended to be graph agnostic and so injected a \u0026ldquo;shim\u0026rdquo; layer to convert from graph-specific classes to security-specific classes. With the renaming of the package to \u0026ldquo;permissions\u0026rdquo; and the tighter integration with the Jena core, the \u0026ldquo;shim\u0026rdquo; structure has been removed. This should make the permissions layer faster and cleaner to implement.\nSecNode The SecNode class has been removed. This was effectively a proxy for the Jena Node object and has been replaced with that object. The SecNode maintained its type (e.g. URI, Literal or Variable) using an internal Enumeration. The method getType() was used to identify the internal type. With the Jena Node replacement, statements of the form\nif (secNode.getType().equals( SecNode.Type.Literal )) { // do something } are replaced with\nif (node.isLiteral()) { // do something } SecNode.ANY has been replaced with Node.ANY as it served the same purpose.\nSecNode.FUTURE has been replaced with SecurityEvaluator.FUTURE and is now implemented as a blank node with the label urn:jena-permissions:FUTURE.\nSecNode.VARIABLE has been replaced with SecurityEvaluator.VARIABLE and is now implemented as a blank node with the label urn:jena-permissions:VARIABLE.\nSecTriple The SecTriple class has been removed. This was effectively a proxy for the Jena Triple object and has been replaced with that object.\nMovement of Classes SecuredItem The SecuredItem interface was moved from org.apache.jena.permissions.impl to org.apache.jena.permissions.\nAdditional Methods SecurityEvaluator The method isAuthenticatedUser( Object principal ) has been added. 
The SecurityEvaluator should respond true if the principal is recognized as an authenticated user. The principal object is guaranteed to have been returned from an earlier getPrincipal() call.\n","permalink":"","tags":null,"title":"Jena Permissions - Migration notes: Version 2.x to Version 3.x"},{"categories":null,"contents":"Overview The SecurityEvaluator interface defines the access control operations. It provides the interface between the authentication (answers the question: \u0026ldquo;who are you?\u0026rdquo;) and the authorization (answers the question: \u0026ldquo;what can you do?\u0026rdquo;); as such it provides access to the current principal (user). The javadocs contain detailed requirements for implementations of the SecurityEvaluator interface; short notes are provided below.\nNOTE The permissions system caches intermediate results and will only call the evaluator if the answer is not already in the cache. There is little or no advantage to implementing caching in the SecurityEvaluator itself.\nNOTE In earlier versions ReadDeniedException was thrown whenever read permissions were not granted. The current version defines an isHardReadError method that determines what action should be taken. The default implementation has changed. See the Configuration Methods section below for more information.\nActions Principals may perform Create, Read, Update or Delete operations on secured resources. 
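The Create/Read/Update/Delete model can be pictured as a permission table keyed by principal. The following is a plain-Java sketch for illustration only: the enum mirrors, but is not, Jena's SecurityEvaluator.Action, and the class name, map, and user names are invented.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CrudSketch {
    // Mirrors the Create/Read/Update/Delete model; not Jena's actual Action enum.
    enum Action { CREATE, READ, UPDATE, DELETE }

    // Toy permission table: principal name -> set of allowed actions.
    static final Map<String, Set<Action>> permissions = new HashMap<>();
    static {
        permissions.put("alice", EnumSet.allOf(Action.class)); // full access
        permissions.put("bob", EnumSet.of(Action.READ));       // read-only
    }

    // Unknown principals get no permissions at all.
    static boolean evaluate(String principal, Action action) {
        return permissions.getOrDefault(principal, EnumSet.noneOf(Action.class))
                          .contains(action);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("alice", Action.DELETE)); // true
        System.out.println(evaluate("bob", Action.UPDATE));   // false
    }
}
```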
These operations are defined in the Action enum in the SecurityEvaluator interface.\nNode The permission system uses the standard Node.ANY to represent a wild-card in a permission check and the standard Triple.ANY to represent a triple with wild-cards in each of the three positions: subject, predicate and object.\nThe permission system introduces two new node types: SecurityEvaluator.VARIABLE, which represents a variable in a permissions query, and SecurityEvaluator.FUTURE, which represents an anonymous node that will be created in the future.\nEvaluator Methods The SecurityEvaluator connects the Jena permissions system with the authentication system used by the application. The SecurityEvaluator must be able to query the authentication system, or its proxy, to determine who the \u0026ldquo;current user\u0026rdquo; is. In this context the \u0026ldquo;current user\u0026rdquo; is the one making the request. In certain instances (specifically when using listeners on secured graphs and models) the \u0026ldquo;current user\u0026rdquo; may not be the user identified by the authentication system at the time of the query.\nThe SecurityEvaluator must implement the following methods. Any of these methods may throw an AuthenticationRequiredException if there is no authenticated user.\nMost of these methods have a principal parameter. The value of that parameter is guaranteed to be a value returned from an earlier call to getPrincipal(). The principal parameter, not the \u0026ldquo;current user\u0026rdquo; as identified by getPrincipal(), should be used for the permissions evaluation.\nNone of these methods should throw any of the PermissionDeniedException-based exceptions. 
That is handled in a different layer.\nSee the SecurityEvaluator javadocs for detailed implementation notes.\npublic boolean evaluate( Object principal, Action action, Node graphIRI ) throws AuthenticationRequiredException; Determine if the action is permitted on the graph.\npublic boolean evaluate( Object principal, Action action, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if the action is allowed on the triple within the graph.\npublic boolean evaluate( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI ) throws AuthenticationRequiredException; Determine if all actions are allowed on the graph.\npublic boolean evaluate( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if all the actions are allowed on the triple within the graph.\npublic boolean evaluateAny( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI ) throws AuthenticationRequiredException; Determine if any of the actions are allowed on the graph.\npublic boolean evaluateAny( Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple ) throws AuthenticationRequiredException; Determine if any of the actions are allowed on the triple within the graph.\npublic boolean evaluateUpdate( Object principal, Node graphIRI, Triple from, Triple to ) throws AuthenticationRequiredException; Determine if the user is allowed to update the \u0026ldquo;from\u0026rdquo; triple to the \u0026ldquo;to\u0026rdquo; triple.\npublic Object getPrincipal() throws AuthenticationRequiredException; Return the current principal or null if there is no current principal.\nConfiguration Methods The evaluator has one configuration method.\npublic default boolean isHardReadError() This method determines how the system will deal with read denied restrictions when attempting to create iterators, counts, or perform existential checks. 
If set true the system will throw a ReadDeniedException. This is the action that was performed in Jena version 3 and earlier. If set false, the default, methods that return iterators return empty iterators, methods that perform existential checks return false, and methods that return counts return 0 (zero).\nSample Implementation This sample is for a graph that contains a set of messages; access to the messages is limited to principals that the messages are to or from. Any triple that is not a message is not affected. This implementation simply has a setPrincipal(String name) method. A real implementation would request the user principal or name from the authentication system. This implementation also requires access to the underlying model to determine if the user has access; however, that is not a requirement of the SecurityEvaluator in general. Determining access from the information provided is an exercise for the implementer.\nNote that this implementation does not vary based on the graph being evaluated (graphIRI). The graphIRI parameter is provided for implementations where such variance is desired.\nSee the example jar for another implementation example.\npublic class ExampleEvaluator implements SecurityEvaluator { private Principal principal; private Model model; private RDFNode msgType = ResourceFactory.createResource( \u0026quot;\u0026quot; ); private Property pTo = ResourceFactory.createProperty( \u0026quot;\u0026quot; ); private Property pFrom = ResourceFactory.createProperty( \u0026quot;\u0026quot; ); /** * * @param model The graph we are going to evaluate against. */ public ExampleEvaluator( Model model ) { this.model = model; } @Override public boolean evaluate(Object principal, Action action, Node graphIRI) { // we allow any action on a graph. return true; } // note that in this implementation all permission checks flow through // this method. We can do this because we have a simple permissions // requirement. 
A more complex set of permissions requirements would // require a different strategy. private boolean evaluate( Object principalObj, Resource r ) { Principal principal = (Principal)principalObj; // we do not allow anonymous (un-authenticated) reads of data. // Another strategy would be to only require authentication if the // data being requested was restricted -- but that is a more complex // process and not suitable for this simple example. if (principal == null) { throw new AuthenticationRequiredException(); } // a message is only available to sender or recipient if (r.hasProperty( RDF.type, msgType )) { return r.hasProperty( pTo, principal.getName() ) || r.hasProperty( pFrom, principal.getName()); } return true; } // evaluate a node. private boolean evaluate( Object principal, Node node ) { if (node.equals( Node.ANY )) { // all wildcards are false. This forces each triple // to be explicitly checked. return false; } // if the node is a URI or a blank node evaluate it as a resource. if (node.isURI() || node.isBlank()) { Resource r = model.getRDFNode( node ).asResource(); return evaluate( principal, r ); } return true; } // evaluate the triple by evaluating the subject, predicate and object. 
private boolean evaluate( Object principal, Triple triple ) { return evaluate( principal, triple.getSubject()) \u0026amp;\u0026amp; evaluate( principal, triple.getObject()) \u0026amp;\u0026amp; evaluate( principal, triple.getPredicate()); } @Override public boolean evaluate(Object principal, Action action, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluate(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } @Override public boolean evaluate(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI) { return true; } @Override public boolean evaluateAny(Object principal, Set\u0026lt;Action\u0026gt; actions, Node graphIRI, Triple triple) { return evaluate( principal, triple ); } @Override public boolean evaluateUpdate(Object principal, Node graphIRI, Triple from, Triple to) { return evaluate( principal, from ) \u0026amp;\u0026amp; evaluate( principal, to ); } public void setPrincipal( String userName ) { if (userName == null) { principal = null; return; } principal = new BasicUserPrincipal( userName ); } @Override public Principal getPrincipal() { return principal; } @Override public boolean isPrincipalAuthenticated(Object principal) { return principal != null; } } ","permalink":"","tags":null,"title":"Jena Permissions - SecurityEvaluator implementation"},{"categories":null,"contents":"Overview Query Builder provides implementations of Ask, Construct, Select and Update builders that allow developers to create queries without resorting to StringBuilders or similar solutions. The Query Builder module is an extra package and is found in the jena-querybuilder jar.\nEach of the builders has a series of methods to define the query. Each method returns the builder for easy chaining. 
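The chaining style itself (each method returning `this`) can be illustrated with a self-contained toy builder. This is plain Java for illustration only, not Jena's SelectBuilder: the class name, methods, and query-string assembly are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Toy fluent builder illustrating method chaining; not Jena's SelectBuilder.
public class QuerySketch {
    private final List<String> vars = new ArrayList<>();
    private final List<String> wheres = new ArrayList<>();

    // Returning "this" from every mutator is what makes chaining possible.
    QuerySketch addVar(String v) { vars.add(v); return this; }
    QuerySketch addWhere(String s, String p, String o) {
        wheres.add(s + " " + p + " " + o);
        return this;
    }

    String build() {
        StringJoiner where = new StringJoiner(" . ", "{ ", " }");
        wheres.forEach(where::add);
        return "SELECT " + String.join(" ", vars) + " WHERE " + where;
    }

    public static void main(String[] args) {
        String q = new QuerySketch().addVar("*").addWhere("?s", "?p", "?o").build();
        System.out.println(q); // SELECT * WHERE { ?s ?p ?o }
    }
}
```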
The example:\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026quot;*\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;?p\u0026quot;, \u0026quot;?o\u0026quot; ); Query q = ; produces\nSELECT * WHERE { ?s ?p ?o } Constructing Expressions Expressions are primarily used in filter and bind statements as well as in select clauses. All the standard expressions are implemented in the ExprFactory class. An ExprFactory can be retrieved from any Builder by calling the getExprFactory() method. This will create a Factory that has the same prefix mappings as the query. An alternative is to construct the ExprFactory directly; this factory will not have the prefixes defined in PrefixMapping.Extended.\nSelectBuilder builder = new SelectBuilder(); ExprFactory exprF = builder.getExprFactory() .addPrefix( \u0026quot;cf\u0026quot;, \u0026quot;\u0026quot;) builder.addVar( exprF.floor( ?v ), ?floor ) .addWhere( ?s, \u0026quot;cf:air_temperature\u0026quot;, ?v ) Update Builder The UpdateBuilder is used to create Update, UpdateDeleteWhere or UpdateRequest objects. When an UpdateRequest is built it contains a single Update object as defined by the UpdateBuilder. Update objects can be added to an UpdateRequest using the appendTo() method.\nVar subj = Var.alloc( \u0026quot;s\u0026quot; ); Var obj = Var.alloc( \u0026quot;o\u0026quot; ); UpdateBuilder builder = new UpdateBuilder( PrefixMapping.Standard) .addInsert( subj, \u0026quot;rdfs:comment\u0026quot;, obj ) .addWhere( subj, \u0026quot;dc:title\u0026quot;, obj); UpdateRequest req = builder.buildRequest(); UpdateBuilder builder2 = new UpdateBuilder() .addPrefix( \u0026quot;dc\u0026quot;, \u0026quot;\u0026quot;) .addDelete( subj, \u0026quot;?p\u0026quot;, obj) .addWhere( subj, \u0026quot;dc:creator\u0026quot;, \u0026quot;me\u0026quot;) .appendTo( req ); Where Builder In some use cases it is desirable to create a where clause without constructing an entire query. The WhereBuilder is designed to fit this need. 
For example, to construct the query:\nPREFIX rdfs: \u0026lt;\u0026gt; PREFIX foaf: \u0026lt;\u0026gt; SELECT ?page ?type WHERE { ?s foaf:page ?page . { ?s rdfs:label \u0026quot;Microsoft\u0026quot;@en . BIND (\u0026quot;A\u0026quot; as ?type) } UNION { ?s rdfs:label \u0026quot;Apple\u0026quot;@en . BIND (\u0026quot;B\u0026quot; as ?type) } } You could use a WhereBuilder to construct the union queries and add them to a Select or other query builder.\nWhereBuilder whereBuilder = new WhereBuilder() .addPrefix( \u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;rdfs:label\u0026quot;, \u0026quot;'Microsoft'@en\u0026quot; ) .addBind( \u0026quot;'A'\u0026quot;, \u0026quot;?type\u0026quot;) .addUnion( new WhereBuilder() .addPrefix( \u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;rdfs:label\u0026quot;, \u0026quot;'Apple'@en\u0026quot; ) .addBind( \u0026quot;'B'\u0026quot;, \u0026quot;?type\u0026quot;) ); SelectBuilder builder = new SelectBuilder() .addPrefix( \u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot; ) .addPrefix( \u0026quot;foaf\u0026quot;, \u0026quot;\u0026quot; ) .addVar( \u0026quot;?page\u0026quot;) .addVar( \u0026quot;?type\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;foaf:page\u0026quot;, \u0026quot;?page\u0026quot; ) .addWhere( whereBuilder ); The where clauses could be built inline as:\nSelectBuilder builder = new SelectBuilder() .addPrefixes( PrefixMapping.Standard ) .addPrefix( \u0026quot;foaf\u0026quot;, \u0026quot;\u0026quot; ) .addVar( \u0026quot;?page\u0026quot;) .addVar( \u0026quot;?type\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;foaf:page\u0026quot;, \u0026quot;?page\u0026quot; ) .addWhere( new WhereBuilder() .addPrefix( \u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;rdfs:label\u0026quot;, \u0026quot;'Microsoft'@en\u0026quot; ) .addBind( 
\u0026quot;'A'\u0026quot;, \u0026quot;?type\u0026quot;) .addUnion( new WhereBuilder() .addPrefix( \u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;rdfs:label\u0026quot;, \u0026quot;'Apple'@en\u0026quot; ) .addBind( \u0026quot;'B'\u0026quot;, \u0026quot;?type\u0026quot;) ) ); Template Usage In addition to making it easier to build valid queries the QueryBuilder has a clone method. Using this a developer can create a \u0026ldquo;Template\u0026rdquo; query and add to it as necessary.\nFor example using the above query as the \u0026ldquo;template\u0026rdquo; with this code:\nSelectBuilder sb2 = sb.clone(); sb2.addPrefix( \u0026quot;foaf\u0026quot;, \u0026quot;\u0026quot; ) .addWhere( ?s, RDF.type, \u0026quot;foaf:Person\u0026quot;) ; produces\nPREFIX foaf: \u0026lt;\u0026gt; SELECT * WHERE { ?s ?p ?o . ?s \u0026lt;\u0026gt; foaf:Person . } Prepared Statement Usage The query builders have the ability to replace variables with other values. 
This can be used in a manner similar to a prepared statement:\nSelectBuilder sb = new SelectBuilder() .addVar( \u0026quot;*\u0026quot; ) .addWhere( \u0026quot;?s\u0026quot;, \u0026quot;?p\u0026quot;, \u0026quot;?o\u0026quot; ); sb.setVar( Var.alloc( \u0026quot;?o\u0026quot; ), NodeFactory.createURI( \u0026quot;\u0026quot; ) ) ; Query q =; produces\nSELECT * WHERE { ?s ?p \u0026lt;\u0026gt; } ","permalink":"","tags":null,"title":"Jena Query Builder - A query builder for Jena."},{"categories":null,"contents":"This is a guide to the RDF/XML I/O subsystem of Jena.\nThe RDF/XML parser is designed for use with RIOT and to have the same handling of errors, IRI resolution, and treatment of base IRIs as other RIOT readers.\nRDF/XML Input The usual way to access the RDF/XML parser is via RDFDataMgr or RDFParser.\nModel model = RDFDataMgr.loadModel(\u0026quot;data.rdf\u0026quot;); or\nModel model = RDFParser.source(\u0026quot;data.rdf\u0026quot;).toModel(); The original \u0026ldquo;ARP\u0026rdquo; parser is still available but may be phased out. To access the legacy parser, set the context symbol RIOT.symRDFXML0 to true or \u0026ldquo;true\u0026rdquo;.\nModel model = RDFParser.source(\u0026quot;data.rdf\u0026quot;) .set(RIOT.symRDFXML0, true) .parse(dest); This applies to the command line:\nriot --set rdfxml:rdfxml0=true data.rdf This can be set globally in the JVM:\nRIOT.getContext().set(RIOT.symRDFXML0, \u0026quot;true\u0026quot;); Details of legacy RDF/XML input.\nDetails of the original Jena RDF/XML parser, ARP.\nRDF/XML Output Two forms for output are provided:\nThe default output Lang.RDFXML, historically called \u0026ldquo;RDF/XML-ABBREV\u0026rdquo;, which also has a format name RDFFormat.RDFXML_PRETTY. It produces readable output. It requires working memory to analyse the data to be written and it is not streaming.\nFor efficient, streaming output, the basic RDF/XML RDFFormat.RDFXML_PLAIN works for data of any size. 
It outputs each subject together with all property values without using the full features of RDF/XML.\nFor \u0026ldquo;RDF/XML-ABBREV\u0026rdquo;:\nRDFDataMgr.write(System.out, model, Lang.RDFXML); or\nRDFWriter.source(model).lang(Lang.RDFXML).output(System.out); and for plain RDF/XML:\nRDFDataMgr.write(System.out, model, RDFFormat.RDFXML_PLAIN); or\nRDFWriter.source(model).format(RDFFormat.RDFXML_PLAIN).output(System.out); Details of legacy RDF/XML output.\n","permalink":"","tags":null,"title":"Jena RDF XML"},{"categories":null,"contents":"Legacy Documentation : may not be up-to-date\nThis is a guide to the RDF/XML I/O subsystem of Jena, ARP. The first section gives a quick introduction to the I/O subsystem. The other sections are aimed at users wishing to use advanced features within the RDF/XML I/O subsystem.\nOther content related to Jena RDF/XML How-To includes:\nDetails of ARP, the Jena RDF/XML parser Quick Introduction The main I/O methods in Jena use InputStreams and OutputStreams. This is important for correctly handling character sets.\nThese methods are found on the Model interface. These are:\nModel [read](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#read(, java.lang.String))( in, java.lang.String base) Add statements from an RDF/XML serialization Model [read](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#read(, java.lang.String, java.lang.String))( in, java.lang.String base, java.lang.String lang) Add RDF statements represented in language lang to the model. Model read(java.lang.String url) Add the RDF statements from an XML document. Model write( out) Write the model as an XML document. Model [write](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#write(, java.lang.String))( out, java.lang.String lang) Write a serialized representation of a model in a specified language. 
Model [write](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#write(, java.lang.String, java.lang.String))( out, java.lang.String lang, java.lang.String base) Write a serialized representation of a model in a specified language. The built-in languages are \u0026quot;RDF/XML\u0026quot;, \u0026quot;RDF/XML-ABBREV\u0026quot; as well as \u0026quot;N-TRIPLE\u0026quot;, and \u0026quot;TURTLE\u0026quot;.\nThere are also methods which use Readers and Writers. Do not use them, unless you are sure it is correct to. In advanced applications they are useful (see below), and there is every intention to continue to support them. The RDF/XML parser now checks to see if the read(Reader, …) calls are being abused, and issues ERR_ENCODING_MISMATCH and WARN_ENCODING_MISMATCH errors. Most incorrect usage of Readers for RDF/XML input will result in such errors. Most incorrect usage of Writers for RDF/XML output will produce correct XML by using an appropriate XML declaration giving the encoding - e.g.\n\u0026lt;?xml version='1.0' encoding='ISO-8859-15'?\u0026gt; However, such XML is less portable than XML in UTF-8. Using the Model.write(OutputStream …) methods allows the Jena system code to choose UTF-8 encoding, which is the best choice.\nRDF/XML, RDF/XML-ABBREV For input, both of these are the same, and fully implement the RDF Syntax Recommendation, see conformance.\nFor output, \u0026quot;RDF/XML\u0026quot; produces regular output reasonably efficiently, but it is not readable. In contrast, \u0026quot;RDF/XML-ABBREV\u0026quot; produces readable output without much regard to efficiency.\nAll the readers and writers for RDF/XML are configurable, see below, input and output.\nCharacter Encoding Issues The easiest way to not read or understand this section is always to use InputStreams and OutputStreams with Jena, and to never use Readers and Writers. If you do this, Jena will do the right thing, for the vast majority of users. 
If you have legacy code that uses Readers and Writers, or you have special needs with respect to encodings, then this section may be helpful. The last part of this section summarizes the character encodings supported by Jena.\nCharacter encoding is the way that characters are mapped to bytes, shorts or ints. There are many different character encodings. Within Jena, character encodings are important in their relationship to Web content, particularly RDF/XML files, which cannot be understood without knowing the character encoding, and in relationship to Java, which provides support for many character encodings.\nThe Java approach to encodings is designed for ease of use on a single machine, which uses a single encoding, often a one-byte encoding, e.g. for European languages which do not need thousands of different characters.\nThe XML approach is designed for the Web, which uses multiple encodings, some of which require thousands of characters.\nOn the Web, XML files, including RDF/XML files, are by default encoded in \u0026ldquo;UTF-8\u0026rdquo; (Unicode). This is always a good choice for creating content, and is the one used by Jena by default. Other encodings can be used, but may be less interoperable. Other encodings should be named using the canonical name registered at IANA, but other systems have no obligations to support any of these, other than UTF-8 and UTF-16.\nWithin Java, encodings appear primarily with the InputStreamReader and OutputStreamWriter classes, which convert between bytes and characters using a named encoding, and with their subclasses, FileReader and FileWriter, which convert between bytes in the file and characters using the default encoding of the platform. It is not possible to change the encoding used by a Reader or Writer while it is being used. The default encoding of the platform depends on a large range of factors. This default encoding may be useful for communicating with other programs on the same platform. 
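The encoding pitfall described above can be demonstrated with a small self-contained JDK program (no Jena involved). It writes a string through an explicit UTF-8 writer, then shows that decoding the same bytes with a different charset corrupts the text; the class name and sample string are invented for the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    public static void main(String[] args) throws Exception {
        String text = "café"; // 'é' is encoded differently by different charsets

        // Writing through an OutputStream with an explicitly chosen UTF-8
        // writer is unambiguous; this is what the OutputStream-based
        // Model.write methods let the library do for you.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (OutputStreamWriter w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        byte[] utf8 = out.toByteArray();
        System.out.println(utf8.length); // 5: 'é' takes two bytes in UTF-8

        // Decoding those bytes with the wrong charset corrupts the text:
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(text)); // false
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(right.equals(text)); // true
    }
}
```

A Reader or Writer built on the platform default charset reproduces exactly the "wrong" case above whenever the document's declared encoding and the platform default disagree.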
Sometimes the default encoding is not registered at IANA, and so Jena application developers should not use the default encoding for Web content, but use UTF-8.\nEncodings Supported in Jena 2.2 and later On RDF/XML input any encoding supported by Java can be used. If this is not a canonical name registered at IANA a warning message is produced. Some encodings have better support in Java 1.5 than Java 1.4; for such encodings a warning message is produced on Java 1.4, suggesting upgrading.\nOn RDF/XML output any encoding supported by Java can be used, by constructing an OutputStreamWriter using that encoding, and using that for output. If the encoding is not registered at IANA then a warning message is produced. Some encodings have better support in Java 1.5 than Java 1.4; for such encodings a warning message is produced on Java 1.4, suggesting upgrading.\nJava can be configured either with or without a jar of extra encodings on the classpath. This jar is charsets.jar and sits in the lib directory of the Java Runtime. If this jar is not on your classpath then the range of encodings supported is fairly small.\nThe encodings supported by Java are listed by Sun, for 1.4.2, and 1.5.0. For an encoding that is not in these lists it is possible to write your own transcoder as documented in the java.nio.charset package documentation.\nEarlier versions of Jena supported fewer encodings.\nWhen to Use Reader and Writer? Infrequently.\nDespite the character encoding issues, it is still sometimes appropriate to use Readers and Writers with Jena I/O. A good example is using Readers and Writers into StringBuffers in memory. These do not need to be encoded and decoded so a character encoding does not need to be specified. 
Other examples are when an advanced user explicitly wishes to correctly control the encoding.\nModel [read](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#read(, java.lang.String))( reader, java.lang.String base) Using this method is often a mistake. Model [read](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#read(, java.lang.String, java.lang.String))( reader, java.lang.String base, java.lang.String lang) Using this method is often a mistake. Model write( writer) Caution! Write the model as an XML document. Model [write](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#write(, java.lang.String))( writer, java.lang.String lang) Caution! Write a serialized representation of a model in a specified language. Model [write](/documentation/javadoc/jena/org/apache/jena/rdf/model/Model.html#write(, java.lang.String, java.lang.String))( writer, java.lang.String lang, java.lang.String base) Caution! Write a serialized representation of a model in a specified language. Incorrect use of these read(Reader, …) methods results in warnings and errors with RDF/XML and RDF/XML-ABBREV (except in a few cases where the incorrect use cannot be automatically detected). Incorrect use of the write(Writer, …) methods results in peculiar XML declarations such as \u0026lt;?xml version=\u0026quot;1.0\u0026quot; encoding=\u0026quot;WINDOWS-1252\u0026quot;?\u0026gt;. This would reflect that the character encoding you used (probably without realizing) in your Writer is registered with IANA under the name \u0026ldquo;WINDOWS-1252\u0026rdquo;. The resulting XML is of reduced portability as a result. 
Glenn Marcy notes:\nsince UTF-8 and UTF-16 are the only encodings REQUIRED to be understood by all conformant XML processors, even ISO-8859-1 would technically be on shaky ground if not for the fact that it is in such widespread use that every reasonable XML processor supports it.\nWith N-TRIPLE, incorrect use is usually benign, since N-TRIPLE is ASCII based.\nCharacter encoding issues of N3 are not well-defined; hence use of these methods may require changes in the future. Use of the InputStream and OutputStream methods will allow your code to work with future versions of Jena which do the right thing - whatever that is. Currently the OutputStream methods use UTF-8 encoding.\nIntroduction to Advanced Jena I/O The RDF/XML input and output is configurable. However, to configure it, it is necessary to access an RDFReader or RDFWriter object that remains hidden in the simpler interface above.\nThe four vital calls in the Model interface are:\nRDFReader getReader() Return an RDFReader instance for the default serialization language. RDFReader getReader(java.lang.String lang) Return an RDFReader instance for the specified serialization language. RDFWriter getWriter() Return an RDFWriter instance for the default serialization language. RDFWriter getWriter(java.lang.String lang) Return an RDFWriter instance for the specified serialization language. Each of these calls returns an RDFReader or RDFWriter that can be used to read or write any Model (not just the one which created it). 
As well as the necessary [read](/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html#read(org.apache.jena.rdf.model.Model, java.io.Reader, java.lang.String)) and [write](/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFWriter.html#write(org.apache.jena.rdf.model.Model, java.io.Writer, java.lang.String)) methods, these interfaces provide:\nRDFErrorHandler setErrorHandler( RDFErrorHandler errHandler ) Set an error handler for the reader java.lang.Object [setProperty](/documentation/javadoc/jena/org/apache/jena/rdf/model/RDFReader.html#setProperty(java.lang.String, java.lang.Object))(java.lang.String propName, java.lang.Object propValue) Set the value of a reader property. Setting properties, or the error handler, on an RDFReader or an RDFWriter allows the programmer to access non-default behaviour. Moreover, since the RDFReader and RDFWriter are not bound to a specific Model, a typical idiom is to create the RDFReader or RDFWriter on system initialization, to set the appropriate properties so that it behaves exactly as required in your application, and then to do all subsequent I/O through it.\nModel m = ModelFactory.createDefaultModel(); RDFWriter writer = m.getWriter(); m = null; // m is no longer needed. writer.setErrorHandler(myErrorHandler); writer.setProperty(\u0026quot;showXmlDeclaration\u0026quot;,\u0026quot;true\u0026quot;); writer.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;8\u0026quot;); writer.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;same-document,relative\u0026quot;); … Model marray[]; … for (int i=0; i\u0026lt;marray.length; i++) { … OutputStream out = new FileOutputStream(\u0026quot;foo\u0026quot; + i + \u0026quot;.rdf\u0026quot;); writer.write(marray[i], out, \u0026quot;\u0026quot;); out.close(); } Note that all of the current implementations are synchronized, so that a specific RDFReader cannot be reading two different documents at the same time. 
In a multi-threaded application this may suggest a need for a pool of RDFReaders and/or RDFWriters, or alternatively to create, initialize, use and discard them as needed.\nFor N-TRIPLE there are currently no properties supported for either the RDFReader or the RDFWriter. Hence the idiom above is not very helpful, and just using the Model.write() methods may prove easier.\nFor RDF/XML and RDF/XML-ABBREV, there are many options in both the RDFReader and the RDFWriter. N3 has options on the RDFWriter. These options are detailed below. For RDF/XML they are also found in the JavaDoc for JenaReader.[setProperty](/documentation/javadoc/jena/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) and RDFXMLWriterI.[setProperty](/documentation/javadoc/jena/org/apache/jena/xmloutput/RDFXMLWriterI.html#setProperty(java.lang.String, java.lang.Object))(String, Object).\nAdvanced RDF/XML Input For access to these advanced features, first get an RDFReader object that is an instance of an ARP parser, by using the getReader() method on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for parsing RDF/XML. Many of the properties change the RDF parser, some change the XML parser. (The Jena RDF/XML parser, ARP, implements the RDF grammar over a Xerces2-J XML parser). However, changing the features and properties of the XML parser is not likely to be useful, but was easy to implement.\n[setProperty](/documentation/javadoc/jena/org/apache/jena/rdfxml/xmlinput/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) can be used to set and get:\nARP properties These allow fine-grained control over the extensive error reporting capabilities of ARP, and are detailed directly below. SAX2 features See Xerces features. 
Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. SAX2 properties See Xerces properties. Xerces features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. Xerces properties See Xerces properties. ARP properties An ARP property is referred to either by its property name (see below) or by an absolute URL of the form\u0026lt;PropertyName\u0026gt;. The value should be a String, an Integer or a Boolean depending on the property.\nARP property names and string values are case insensitive.\nProperty Name Description Value class Legal Values iri-rules Set the engine for checking and resolving. \u0026quot;strict\u0026quot; sets the IRI engine with rules for valid IRIs, XLink and RDF; it does not permit spaces in IRIs. \u0026quot;iri\u0026quot; sets the IRI engine to IRI (RFC 3986, RFC 3987). The default is \u0026quot;lax\u0026quot; (for backwards compatibility), the rules for RDF URI references only, which does permit spaces although the use of spaces is not good practice. String lax\nstrict\niri error-mode ARPOptions.setDefaultErrorMode() ARPOptions.setLaxErrorMode()\nARPOptions.setStrictErrorMode()\nARPOptions.setStrictErrorMode(int)\nThis allows a coarse-grained approach to control of error handling. Setting this property is equivalent to setting many of the fine-grained error handling properties. String default\nlax\nstrict\nstrict-ignore\nstrict-warning\nstrict-error\nstrict-fatal embedding ARPOptions.setEmbedding(boolean) This sets ARP to look for RDF embedded within an enclosing XML document. String or Boolean true\nfalse ERR_\u0026lt;XXX\u0026gt; WARN_\u0026lt;XXX\u0026gt;\nIGN_\u0026lt;XXX\u0026gt; See ARPErrorNumbers for a complete list of the error conditions detected. Setting one of these properties is equivalent to the method ARPOptions.setErrorMode(int, int). 
Thus fine-grained control over the behaviour in response to specific error conditions is possible. String or Integer EM_IGNORE\nEM_WARNING\nEM_ERROR\nEM_FATAL As an example, if you are working in an environment with legacy RDF data that uses unqualified RDF attributes such as \u0026ldquo;about\u0026rdquo; instead of \u0026ldquo;rdf:about\u0026rdquo;, then the following code is appropriate:\nModel m = ModelFactory.createDefaultModel(); RDFReader arp = m.getReader(); // initialize arp // Do not warn on use of unqualified RDF attributes. arp.setProperty(\u0026quot;WARN_UNQUALIFIED_RDF_ATTRIBUTE\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … InputStream in = new FileInputStream(fname); arp.read(m,in,url); in.close(); As a second example, suppose you wish to work in strict mode, but allow \u0026quot;daml:collection\u0026quot;, the following works:\n… arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … The other way round does not work.\n… arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); … This is because in strict mode IGN_DAML_COLLECTION is treated as an error, and so the second call to setProperty overwrites the effect of the first.\nThe IRI rules and resolver can be set on a per-reader basis:\nInputStream in = ... ; String baseURI = ... ; Model model = ModelFactory.createDefaultModel(); RDFReader r = model.getReader(\u0026quot;RDF/XML\u0026quot;); r.setProperty(\u0026quot;iri-rules\u0026quot;, \u0026quot;strict\u0026quot;) ; r.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot;) ; // Warnings will be errors. // Alternative to the above \u0026quot;error-mode\u0026quot;: set specific warning to be an error. 
//r.setProperty( \u0026quot;WARN_MALFORMED_URI\u0026quot;, ARPErrorNumbers.EM_ERROR) ; r.read(model, in, baseURI) ; in.close(); The global default IRI engine can be set with:\nARPOptions.setIRIFactoryGlobal(IRIFactory.iriImplementation()) ; or other IRI rule engine from IRIFactory.\nInterrupting ARP ARP can be interrupted using the Thread.interrupt() method. This causes an ERR_INTERRUPTED error during the parse, which is usually treated as a fatal error.\nHere is an illustrative code sample:\nARP a = new ARP(); final Thread arpt = Thread.currentThread(); Thread killt = new Thread(new Runnable() { public void run() { try { Thread.sleep(tim); } catch (InterruptedException e) { } arpt.interrupt(); } }); killt.start(); try { InputStream in = new FileInputStream(fileName); a.load(in); in.close(); fail(\u0026quot;Thread was not interrupted.\u0026quot;); } catch (SAXParseException e) { } Advanced RDF/XML Output The first RDF/XML output question is whether to use the \u0026quot;RDF/XML\u0026quot; or \u0026quot;RDF/XML-ABBREV\u0026quot; writer. While some of the code is shared, these two writers are really very different, resulting in different but equivalent output. RDF/XML-ABBREV is slower, but should produce more readable XML.\nFor access to advanced features, first get an RDFWriter object, of the appropriate language, by using getWriter(\u0026quot;RDF/XML\u0026quot;) or getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;) on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org/apache/jena/xmloutput/RDFXMLWriterI.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for writing RDF/XML.\nProperties to Control RDF/XML Output Property Name Description Value class Legal Values xmlbase The value to be included for an xml:base attribute on the root element in the file. String A URI string, or null (default) longId Whether to use long or short ids for anon resources. 
Short ids are easier to read and are the default, but can run out of memory on very large models. String or Boolean \"true\", \"false\" (default) allowBadURIs URIs in the graph are, by default, checked prior to serialization; setting this to true suppresses the check. String or Boolean \"true\", \"false\" (default) relativeURIs What sort of relative URIs should be used. A comma-separated list of options: same-document\nsame-document references (e.g. \u0026quot;\u0026quot; or \u0026ldquo;#foo\u0026rdquo;) network\nnetwork paths e.g. \u0026quot;//\u0026quot; omitting the URI scheme absolute\nabsolute paths e.g. \u0026quot;/foo\u0026quot; omitting the scheme and authority relative\nrelative path not beginning in \u0026quot;../\u0026quot; parent\nrelative path beginning in \u0026quot;../\u0026quot; grandparent\nrelative path beginning in \u0026quot;../../\u0026quot; The default value is \u0026ldquo;same-document, absolute, relative, parent\u0026rdquo;. To switch off relative URIs use the value \u0026ldquo;\u0026rdquo;. Relative URIs of any of these types are output where possible if and only if the option has been specified.\nString \u0026nbsp; showXmlDeclaration If true, an XML declaration is included in the output; if false, no XML declaration is included. The default behaviour only gives an XML declaration when asked to write to an `OutputStreamWriter` that uses some encoding other than UTF-8 or UTF-16. In this case the encoding is shown in the XML declaration. To ensure that the encoding attribute is shown in the XML declaration either: Set this option to true and use the write(Model,Writer,String) variant with an appropriate OutputStreamWriter. Or set this option to false, and write the declaration to an OutputStream before calling write(Model,OutputStream,String). true, \"true\", false, \"false\" or \"default\" can be true, false or \"default\" (null) showDoctypeDeclaration If true, an XML Doctype declaration is included in the output. 
This declaration includes a `!ENTITY` declaration for each prefix mapping in the model, and any attribute value that starts with the URI of that mapping is written as starting with the corresponding entity invocation. String or Boolean true, false, \"true\", \"false\" tab The number of spaces with which to indent XML child elements. String or Integer positive integer \"2\" is the default attributeQuoteChar How to write XML attributes. String \"\\\"\" or \"'\" blockRules A list of `Resource` or a `String` being a comma-separated list of fragment IDs from the RDF Syntax Grammar indicating grammar rules that will not be used. Rules that can be blocked are: section-Reification (RDFSyntax.sectionReification) section-List-Expand (RDFSyntax.sectionListExpand) parseTypeLiteralPropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeResourcePropertyElt (RDFSyntax.parseTypeResourcePropertyElt) parseTypeCollectionPropertyElt (RDFSyntax.parseTypeCollectionPropertyElt) idAttr (RDFSyntax.idAttr) propertyAttr (RDFSyntax.propertyAttr) In addition \u0026quot;daml:collection\u0026quot; (DAML_OIL.collection) can be blocked. Blocking idAttr also blocks section-Reification. By default, rule propertyAttr is blocked. For the basic writer (RDF/XML) only parseTypeLiteralPropertyElt has any effect, since none of the other rules are implemented by that writer.\nResource[] or String prettyTypes Only for the RDF/XML-ABBREV writer. This is a list of the types of the principal objects in the model. The writer will tend to create RDF/XML with resources of these types at the top level. 
Resource[] As an example,\nRDFWriter w = m.getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;); w.setProperty(\u0026quot;attributeQuoteChar\u0026quot;,\u0026quot;'\u0026quot;); w.setProperty(\u0026quot;showXmlDeclaration\u0026quot;,\u0026quot;true\u0026quot;); w.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;1\u0026quot;); w.setProperty(\u0026quot;blockRules\u0026quot;, \u0026quot;daml:collection,parseTypeLiteralPropertyElt,\u0026quot; +\u0026quot;parseTypeResourcePropertyElt,parseTypeCollectionPropertyElt\u0026quot;); creates a writer that does not use rdf:parseType (preferring rdf:datatype for rdf:XMLLiteral), indents only a little, and produces the XML declaration. Attributes are used, and are quoted with \u0026quot;'\u0026quot;.\nNote that property attributes are not used at all, by default. However, the RDF/XML-ABBREV writer includes a rule to produce property attributes when the value does not contain any spaces. This rule is normally switched off. This rule can be turned on selectively by using the blockRules property as detailed above.\nConformance The RDF/XML I/O endeavours to conform with the RDF Syntax Recommendation.\nThe parser must be set to strict mode. (Note that the conformant behaviour for rdf:parseType=\u0026quot;daml:collection\u0026quot; is to silently turn \u0026quot;daml:collection\u0026quot; into \u0026quot;Literal\u0026quot;).\nThe RDF/XML writer is conformant, but does not exercise much of the grammar.\nThe RDF/XML-ABBREV writer exercises all of the grammar and is conformant except that it uses the daml:collection construct for DAML ontologies. This non-conformant behaviour can be switched off using the blockRules property.\nFaster RDF/XML I/O To optimise the speed of writing RDF/XML it is suggested that all URI processing is turned off. Also do not use RDF/XML-ABBREV. It is unclear whether the longId attribute is faster or slower; the short IDs have to be generated on the fly and a table maintained during writing. 
The longer IDs are long, and hence take longer to write. The following creates a faster writer:\nModel m; … … RDFWriter fasterWriter = m.getWriter(\u0026quot;RDF/XML\u0026quot;); fasterWriter.setProperty(\u0026quot;allowBadURIs\u0026quot;,\u0026quot;true\u0026quot;); fasterWriter.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;\u0026quot;); fasterWriter.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;0\u0026quot;); When reading RDF/XML the check for reuse of rdf:ID has a memory overhead, which can be significant for very large files. In this case, this check can be suppressed by telling ARP to ignore this error.\nModel m; … … RDFReader bigFileReader = m.getReader(\u0026quot;RDF/XML\u0026quot;); bigFileReader.setProperty(\u0026quot;WARN_REDEFINITION_OF_ID\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … ","permalink":"","tags":null,"title":"Jena RDF/XML How-To"},{"categories":null,"contents":"Legacy Documentation : may not be up-to-date\nOriginal RDF/XML HowTo.\nThis is a guide to the RDF/XML legacy input subsystem of Jena, ARP.\nAdvanced RDF/XML Input For access to these advanced features, first get an RDFReader object that is an instance of an ARP parser, by using the getReader() method on any Model. It is then configured using the [setProperty](/documentation/javadoc/jena/org/apache/jena/rdfxml/xmlinput0/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) method. This changes the properties for parsing RDF/XML. Many of the properties change the RDF parser, some change the XML parser. (The Jena RDF/XML parser, ARP, implements the RDF grammar over a Xerces2-J XML parser). 
However, changing the features and properties of the XML parser is not likely to be useful, but was easy to implement.\n[setProperty](/documentation/javadoc/jena/org/apache/jena/rdfxml/xmlinput0/JenaReader.html#setProperty(java.lang.String, java.lang.Object))(String, Object) can be used to set and get:\nARP properties These allow fine-grained control over the extensive error reporting capabilities of ARP, and are detailed directly below. SAX2 features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. SAX2 properties See Xerces properties. Xerces features See Xerces features. Value should be given as a String \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot; or a Boolean. Xerces properties See Xerces properties. ARP properties An ARP property is referred to either by its property name (see below) or by an absolute URL of the form\u0026lt;PropertyName\u0026gt;. The value should be a String, an Integer or a Boolean depending on the property.\nARP property names and string values are case insensitive.\nProperty Name Description Value class Legal Values iri-rules Set the engine for checking and resolving. \u0026quot;strict\u0026quot; sets the IRI engine with rules for valid IRIs, XLink and RDF; it does not permit spaces in IRIs. \u0026quot;iri\u0026quot; sets the IRI engine to IRI (RFC 3986, RFC 3987). The default is \u0026quot;lax\u0026quot; (for backwards compatibility), the rules for RDF URI references only, which does permit spaces although the use of spaces is not good practice. String lax\nstrict\niri error-mode ARPOptions.setDefaultErrorMode() ARPOptions.setLaxErrorMode()\nARPOptions.setStrictErrorMode()\nARPOptions.setStrictErrorMode(int)\nThis allows a coarse-grained approach to control of error handling. Setting this property is equivalent to setting many of the fine-grained error handling properties. 
String default\nlax\nstrict\nstrict-ignore\nstrict-warning\nstrict-error\nstrict-fatal embedding ARPOptions.setEmbedding(boolean) This sets ARP to look for RDF embedded within an enclosing XML document. String or Boolean true\nfalse ERR_\u0026lt;XXX\u0026gt; WARN_\u0026lt;XXX\u0026gt;\nIGN_\u0026lt;XXX\u0026gt; See ARPErrorNumbers for a complete list of the error conditions detected. Setting one of these properties is equivalent to the method ARPOptions.setErrorMode(int, int). Thus fine-grained control over the behaviour in response to specific error conditions is possible. String or Integer EM_IGNORE\nEM_WARNING\nEM_ERROR\nEM_FATAL To set ARP properties, create a map of values to be set and put this in the parser context:\nMap\u0026lt;String, Object\u0026gt; properties = new HashMap\u0026lt;\u0026gt;(); // See class ARPErrorNumbers for the possible ARP properties. properties.put(\u0026#34;WARN_BAD_NAME\u0026#34;, \u0026#34;EM_IGNORE\u0026#34;); // Build and run a parser Model model = RDFParser.create() .lang(Lang.RDFXML) .source(...) .set(SysRIOT.sysRdfReaderProperties, properties) .base(\u0026#34;http://base/\u0026#34;) .toModel(); System.out.println(\u0026#34;== Parsed data output in Turtle\u0026#34;); RDFDataMgr.write(System.out, model, Lang.TURTLE); See example\nLegacy Example\nAs an example, if you are working in an environment with legacy RDF data that uses unqualified RDF attributes such as \u0026ldquo;about\u0026rdquo; instead of \u0026ldquo;rdf:about\u0026rdquo;, then the following code is appropriate:\nModel m = ModelFactory.createDefaultModel(); RDFReader arp = m.getReader(); // initialize arp // Do not warn on use of unqualified RDF attributes. 
arp.setProperty(\u0026quot;WARN_UNQUALIFIED_RDF_ATTRIBUTE\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … InputStream in = new FileInputStream(fname); arp.read(m,in,url); in.close(); As a second example, suppose you wish to work in strict mode, but allow \u0026quot;daml:collection\u0026quot;, the following works:\n… arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … The other way round does not work.\n… arp.setProperty(\u0026quot;IGN_DAML_COLLECTION\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); arp.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot; ); … This is because in strict mode IGN_DAML_COLLECTION is treated as an error, and so the second call to setProperty overwrites the effect of the first.\nThe IRI rules and resolver can be set on a per-reader basis:\nInputStream in = ... ; String baseURI = ... ; Model model = ModelFactory.createDefaultModel(); RDFReader r = model.getReader(\u0026quot;RDF/XML\u0026quot;); r.setProperty(\u0026quot;iri-rules\u0026quot;, \u0026quot;strict\u0026quot;) ; r.setProperty(\u0026quot;error-mode\u0026quot;, \u0026quot;strict\u0026quot;) ; // Warnings will be errors. // Alternative to the above \u0026quot;error-mode\u0026quot;: set specific warning to be an error. 
//r.setProperty( \u0026quot;WARN_MALFORMED_URI\u0026quot;, ARPErrorNumbers.EM_ERROR) ; r.read(model, in, baseURI) ; in.close(); The global default IRI engine can be set with:\nARPOptions.setIRIFactoryGlobal(IRIFactory.iriImplementation()) ; or other IRI rule engine from IRIFactory.\nFurther details Details of ARP, the Jena RDF/XML parser\n","permalink":"","tags":null,"title":"Jena RDF/XML Input How-To"},{"categories":null,"contents":"Legacy Documentation : may not be up-to-date\nOriginal RDF/XML HowTo.\nAdvanced RDF/XML Output Two forms for output are provided: pretty-printed RDF/XML (\u0026ldquo;RDF/XML-ABBREV\u0026rdquo;) or plain RDF/XML.\nWhile some of the code is shared, these two writers are really very different, resulting in different but equivalent output. \u0026ldquo;RDF/XML-ABBREV\u0026rdquo; is slower, but should produce more readable XML.\nProperties to Control RDF/XML Output Property Name Description Value class Legal Values xmlbase The value to be included for an xml:base attribute on the root element in the file. String A URI string, or null (default) longId Whether to use long or short ids for anon resources. Short ids are easier to read and are the default, but can run out of memory on very large models. String or Boolean \"true\", \"false\" (default) allowBadURIs URIs in the graph are, by default, checked prior to serialization; setting this to true suppresses the check. String or Boolean \"true\", \"false\" (default) relativeURIs What sort of relative URIs should be used. A comma-separated list of options: same-document\nsame-document references (e.g. \u0026quot;\u0026quot; or \u0026ldquo;#foo\u0026rdquo;) network\nnetwork paths e.g. \u0026quot;//\u0026quot; omitting the URI scheme absolute\nabsolute paths e.g. 
\u0026quot;/foo\u0026quot; omitting the scheme and authority relative\nrelative path not beginning in \u0026quot;../\u0026quot; parent\nrelative path beginning in \u0026quot;../\u0026quot; grandparent\nrelative path beginning in \u0026quot;../../\u0026quot; The default value is \u0026ldquo;same-document, absolute, relative, parent\u0026rdquo;. To switch off relative URIs use the value \u0026ldquo;\u0026rdquo;. Relative URIs of any of these types are output where possible if and only if the option has been specified.\nString \u0026nbsp; showXmlDeclaration If true, an XML declaration is included in the output; if false, no XML declaration is included. The default behaviour only gives an XML declaration when asked to write to an `OutputStreamWriter` that uses some encoding other than UTF-8 or UTF-16. In this case the encoding is shown in the XML declaration. To ensure that the encoding attribute is shown in the XML declaration either: Set this option to true and use the write(Model,Writer,String) variant with an appropriate OutputStreamWriter. Or set this option to false, and write the declaration to an OutputStream before calling write(Model,OutputStream,String). true, \"true\", false, \"false\" or \"default\" can be true, false or \"default\" (null) showDoctypeDeclaration If true, an XML Doctype declaration is included in the output. This declaration includes a `!ENTITY` declaration for each prefix mapping in the model, and any attribute value that starts with the URI of that mapping is written as starting with the corresponding entity invocation. String or Boolean true, false, \"true\", \"false\" tab The number of spaces with which to indent XML child elements. String or Integer positive integer \"2\" is the default attributeQuoteChar How to write XML attributes. String \"\\\"\" or \"'\" blockRules A list of `Resource` or a `String` being a comma-separated list of fragment IDs from the RDF Syntax Grammar indicating grammar rules that will not be used. 
Rules that can be blocked are: section-Reification (RDFSyntax.sectionReification) section-List-Expand (RDFSyntax.sectionListExpand) parseTypeLiteralPropertyElt (RDFSyntax.parseTypeLiteralPropertyElt) parseTypeResourcePropertyElt (RDFSyntax.parseTypeResourcePropertyElt) parseTypeCollectionPropertyElt (RDFSyntax.parseTypeCollectionPropertyElt) idAttr (RDFSyntax.idAttr) propertyAttr (RDFSyntax.propertyAttr) In addition \u0026quot;daml:collection\u0026quot; (DAML_OIL.collection) can be blocked. Blocking idAttr also blocks section-Reification. By default, rule propertyAttr is blocked. For the basic writer (RDF/XML) only parseTypeLiteralPropertyElt has any effect, since none of the other rules are implemented by that writer.\nResource[] or String prettyTypes Only for the RDF/XML-ABBREV writer. This is a list of the types of the principal objects in the model. The writer will tend to create RDF/XML with resources of these types at the top level. Resource[] To set properties on the RDF/XML writer:\n// Properties to be set. 
Map\u0026lt;String, Object\u0026gt; properties = new HashMap\u0026lt;\u0026gt;() ; properties.put(\u0026#34;showXmlDeclaration\u0026#34;, \u0026#34;true\u0026#34;); RDFWriter.create() .base(\u0026#34;\u0026#34;) .format(RDFFormat.RDFXML_PLAIN) .set(SysRIOT.sysRdfWriterProperties, properties) .source(model) .output(System.out); See\nLegacy example\nAs an example,\nRDFWriter w = m.getWriter(\u0026quot;RDF/XML-ABBREV\u0026quot;); w.setProperty(\u0026quot;attributeQuoteChar\u0026quot;,\u0026quot;'\u0026quot;); w.setProperty(\u0026quot;showXmlDeclaration\u0026quot;,\u0026quot;true\u0026quot;); w.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;1\u0026quot;); w.setProperty(\u0026quot;blockRules\u0026quot;, \u0026quot;daml:collection,parseTypeLiteralPropertyElt,\u0026quot; +\u0026quot;parseTypeResourcePropertyElt,parseTypeCollectionPropertyElt\u0026quot;); creates a writer that does not use rdf:parseType (preferring rdf:datatype for rdf:XMLLiteral), indents only a little, and produces the XML declaration. Attributes are used, and are quoted with \u0026quot;'\u0026quot;.\nNote that property attributes are not used at all, by default. However, the RDF/XML-ABBREV writer includes a rule to produce property attributes when the value does not contain any spaces. This rule is normally switched off. This rule can be turned on selectively by using the blockRules property as detailed above.\nConformance The RDF/XML I/O endeavours to conform with the RDF Syntax Recommendation.\nThe parser must be set to strict mode. (Note that the conformant behaviour for rdf:parseType=\u0026quot;daml:collection\u0026quot; is to silently turn \u0026quot;daml:collection\u0026quot; into \u0026quot;Literal\u0026quot;).\nThe RDF/XML writer is conformant, but does not exercise much of the grammar.\nThe RDF/XML-ABBREV writer exercises all of the grammar and is conformant except that it uses the daml:collection construct for DAML ontologies. 
This non-conformant behaviour can be switched off using the blockRules property.\nFaster RDF/XML I/O To optimise the speed of writing RDF/XML it is suggested that all URI processing is turned off. Also do not use RDF/XML-ABBREV. It is unclear whether the longId attribute is faster or slower; the short IDs have to be generated on the fly and a table maintained during writing. The longer IDs are long, and hence take longer to write. The following creates a faster writer:\nModel m; … … RDFWriter fasterWriter = m.getWriter(\u0026quot;RDF/XML\u0026quot;); fasterWriter.setProperty(\u0026quot;allowBadURIs\u0026quot;,\u0026quot;true\u0026quot;); fasterWriter.setProperty(\u0026quot;relativeURIs\u0026quot;,\u0026quot;\u0026quot;); fasterWriter.setProperty(\u0026quot;tab\u0026quot;,\u0026quot;0\u0026quot;); When reading RDF/XML the check for reuse of rdf:ID has a memory overhead, which can be significant for very large files. In this case, this check can be suppressed by telling ARP to ignore this error.\nModel m; … … RDFReader bigFileReader = m.getReader(\u0026quot;RDF/XML\u0026quot;); bigFileReader.setProperty(\u0026quot;WARN_REDEFINITION_OF_ID\u0026quot;,\u0026quot;EM_IGNORE\u0026quot;); … ","permalink":"","tags":null,"title":"Jena RDF/XML Output How-To"},{"categories":null,"contents":"You can view a list of the open issues on GitHub and older open issues on JIRA.\nPull requests, patches and other contributions welcome!\n","permalink":"","tags":null,"title":"Jena Roadmap"},{"categories":null,"contents":"The schemagen provided with Jena is used to convert an OWL or RDFS vocabulary into a Java class file that contains static constants for the terms in the vocabulary. This document outlines the use of schemagen, and the various options and templates that may be used to control the output.\nSchemagen is typically invoked from the command line or from a build script (such as Ant). 
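To illustrate the idea (this is not schemagen's actual code), turning vocabulary terms into Java constant declarations amounts to something like the following miniature sketch, where the namespace and term names are hypothetical:

```java
import java.util.List;

public class VocabSketch {
    // Hypothetical miniature of what schemagen emits for each property term:
    // a static constant declaration over the vocabulary namespace.
    static String propertyConstant(String ns, String localName) {
        return "public static final Property " + localName
                + " = m.createProperty(\"" + ns + localName + "\");";
    }

    public static void main(String[] args) {
        String ns = "http://example.org/vocab#"; // hypothetical namespace
        for (String term : List.of("name", "age")) {
            System.out.println(propertyConstant(ns, term));
        }
    }
}
```

The real tool does considerably more (class, datatype and individual constants, templated headers, ontology-API variants), as the options below describe, but each generated constant follows this basic shape.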
Synopsis of the command:\njava jena.schemagen -i \u0026lt;input\u0026gt; [-a \u0026lt;namespaceURI\u0026gt;] [-o \u0026lt;output file\u0026gt;] [-c \u0026lt;config uri\u0026gt;] [-e \u0026lt;encoding\u0026gt;] ... Schemagen is highly configurable, either with command line options or by RDF information read from a configuration file. Many other options are defined, and these are described in detail below. Note that the CLASSPATH environment variable must be set to include the Jena .jar libraries.\nSummary of configuration options For quick reference, here is a list of all of the schemagen options (both command line and configuration file). The use of these options is explained in detail below.\nTable 1: schemagen options\nCommand line option RDF config file property Meaning -a \u0026lt;uri\u0026gt; sgen:namespace The namespace URI for the vocabulary. Names with this URI as prefix are automatically included in the generated vocabulary. If not specified, the base URI of the ontology is used as a default (but note that some ontology documents don\u0026rsquo;t define a base URI). -c \u0026lt;filename\u0026gt;\n-c \u0026lt;url\u0026gt; Specify an alternative config file. --classdec \u0026lt;string\u0026gt; sgen:classdec Additional decoration for class header (such as implements) --classnamesuffix \u0026lt;string\u0026gt; sgen:classnamesuffix Option for adding a suffix to the generated class name, e.g. \u0026ldquo;Vocab\u0026rdquo;. --classSection \u0026lt;string\u0026gt; sgen:classSection Section declaration comment for class section. --classTemplate \u0026lt;string\u0026gt; sgen:classTemplate Template for writing out declarations of class resources. --datatypesSection \u0026lt;string\u0026gt; sgen:datatypesSection Section declaration comment for datatypes section. --datatypeTemplate \u0026lt;string\u0026gt; sgen:datatypeTemplate Template for writing out declarations of datatypes. 
--declarations \u0026lt;string\u0026gt; sgen:declarations Additional declarations to add at the top of the class. --dos sgen:dos Use MSDOS-style line endings (i.e. \\r\\n). Default is Unix-style line endings. -e \u0026lt;string\u0026gt; sgen:encoding The surface syntax of the input file (e.g. RDF/XML, N3). Defaults to RDF/XML. --footer \u0026lt;string\u0026gt; sgen:footer Template for standard text to add to the end of the file. --header \u0026lt;string\u0026gt; sgen:header Template for the file header, including the class comment. -i \u0026lt;filename\u0026gt; -i \u0026lt;url\u0026gt; sgen:input Specify the input document to load --include \u0026lt;uri\u0026gt; sgen:include Option for including non-local URI\u0026rsquo;s in vocabulary --individualsSection \u0026lt;string\u0026gt; sgen:individualsSection Section declaration comment for individuals section. --individualTemplate \u0026lt;string\u0026gt; sgen:individualTemplate Template for writing out declarations of individuals. --inference sgen:inference Causes the model that loads the document prior to being processed to apply inference rules appropriate to the language. E.g. OWL inference rules will be used on a .owl file. --marker \u0026lt;string\u0026gt; sgen:marker Specify the marker string for substitutions, default is \u0026lsquo;%\u0026rsquo; -n \u0026lt;string\u0026gt; sgen:classname The name of the generated class. The default is to synthesise a name based on input document name. --noclasses sgen:noclasses Option to suppress classes in the generated vocabulary file --nocomments sgen:noComments Turn off all comment output in the generated vocabulary --nodatatypes sgen:nodatatypes Option to suppress datatypes in the generated vocabulary file. --noheader sgen:noHeader Prevent the output of a file header, with class comment etc. --noindividuals sgen:noindividuals Option to suppress individuals in the generated vocabulary file. 
--noproperties sgen:noproperties Option to suppress properties in the generated vocabulary file. -o \u0026lt;filename\u0026gt; -o \u0026lt;dir\u0026gt; sgen:output Specify the destination for the output. If the given value evaluates to a directory, the generated class will be placed in that directory with a file name formed from the generated (or given) class name with \u0026ldquo;.java\u0026rdquo; appended. --nostrict sgen:noStrict Option to turn off strict checking for ontology classes and properties (prevents ConversionExceptions). --ontology sgen:ontology The generated vocabulary will use the ontology API terms, in preference to RDF model API terms. --owl sgen:owl Specify that the language of the source is OWL (the default). Note that RDFS is a subset of OWL, so this setting also suffices for RDFS. --package \u0026lt;string\u0026gt; sgen:package Specify the Java package name and directory. --propSection \u0026lt;string\u0026gt; sgen:propSection Section declaration comment for properties section. --propTemplate \u0026lt;string\u0026gt; sgen:propTemplate Template for writing out declarations of property resources. -r \u0026lt;uri\u0026gt; Specify the uri of the root node in the RDF configuration model. --rdfs sgen:rdfs Specify that the language of the source ontology is RDFS. --strictIndividuals sgen:strictIndividuals When selecting the individuals to include in the output class, schemagen will normally include those individuals whose rdf:type is in the included namespaces for the vocabulary. However, if strictIndividuals is turned on, then all individuals in the output class must themselves have a URI in the included namespaces. --uppercase sgen:uppercase Option for mapping constant names to uppercase (like Java constants). Default is to leave the case of names unchanged. --includeSource sgen:includeSource Serializes the source code of the vocabulary, and includes this into the generated class file. 
At class load time, creates a Model containing the definitions from the source What does schemagen do? RDFS and OWL provide a very convenient means to define a controlled vocabulary or ontology. For general ontology processing, Jena provides various API\u0026rsquo;s to allow the source files to be read in and manipulated. However, when developing an application, it is frequently convenient to refer to the controlled vocabulary terms directly from Java code. This leads typically to the declaration of constants, such as:\npublic static final Resource A_CLASS = new ResourceImpl( \u0026quot;\u0026quot; ); When these constants are defined manually, it is tedious and error-prone to maintain them in sync with the source ontology file. Schemagen automates the production of Java constants that correspond to terms in an ontology document. By automating the step from source vocabulary to Java constants, a source of error and inconsistency is removed.\nExample Perhaps the easiest way to explain the detail of what schemagen does is to show an example. 
Consider the following mini-RDF vocabulary:\n\u0026lt;rdf:RDF xmlns:rdf=\u0026quot;\u0026quot; xmlns:rdfs=\u0026quot;\u0026quot; xmlns=\u0026quot;\u0026quot; xml:base=\u0026quot;\u0026quot;\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Dog\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;A class of canine companions\u0026lt;/rdfs:comment\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;petName\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;The name that everyone calls a dog\u0026lt;/rdfs:comment\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;kennelName\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/rdfs:comment\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;Dog rdf:ID=\u0026quot;deputy\u0026quot;\u0026gt; \u0026lt;rdfs:comment\u0026gt;Deputy is a particular Dog\u0026lt;/rdfs:comment\u0026gt; \u0026lt;kennelName\u0026gt;Deputy Dawg of Chilcompton\u0026lt;/kennelName\u0026gt; \u0026lt;/Dog\u0026gt; \u0026lt;/rdf:RDF\u0026gt; We process this document with a command something like: Java jena.schemagen -i deputy.rdf -a to produce the following generated class:\n/* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */ import org.apache.jena.rdf.model.*; /** * Vocabulary definitions from deputy.rdf * @author Auto-generated by schemagen on 01 May 2003 21:49 */ public class Deputy { /** \u0026lt;p\u0026gt;The RDF model that holds the vocabulary terms\u0026lt;/p\u0026gt; */ private static Model m_model = ModelFactory.createDefaultModel(); /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a string {@value}\u0026lt;/p\u0026gt; */ public static final String NS = \u0026quot;\u0026quot;; /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a resource {@value}\u0026lt;/p\u0026gt; */ public static final Resource NAMESPACE = m_model.createResource( 
\u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;The name that everyone calls a dog\u0026lt;/p\u0026gt; */ public static final Property petName = m_model.createProperty( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/p\u0026gt; */ public static final Property kennelName = m_model.createProperty( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;A class of canine companions\u0026lt;/p\u0026gt; */ public static final Resource Dog = m_model.createResource( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;Deputy is a particular Dog\u0026lt;/p\u0026gt; */ public static final Resource deputy = m_model.createResource( \u0026quot;\u0026quot; ); } Some things to note in this example. All of the named classes, properties and individuals from the source document are translated to Java constants (below we show how to be more selective than this). The properties of the named resources are not translated: schemagen is for giving access to the names in the vocabulary or schema, not to perform a general translation of RDF to Java. The RDFS comments from the source code are translated to Javadoc comments. Finally, we no longer directly call new ResourceImpl: this idiom is no longer recommended by the Jena team.\nWe noted earlier that schemagen is highly configurable. One additional argument generates a vocabulary file that uses Jena\u0026rsquo;s ontology API, rather than the RDF model API. 
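To illustrate how the generated constants are used, here is a minimal sketch only: it assumes the plain Deputy class shown above has been generated and is on the classpath, together with the Jena 3.x jars. Application code refers to the vocabulary terms directly rather than retyping URI strings:

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDF;

public class DeputyExample {
    public static void main(String[] args) {
        Model data = ModelFactory.createDefaultModel();
        // Use a blank node so that no URI needs to be invented for this sketch
        Resource someDog = data.createResource();
        // Type the node using the generated class constant, not a raw URI
        data.add(someDog, RDF.type, Deputy.Dog);
        // Write the model to stdout in the default RDF/XML serialization
        data.write(System.out);
    }
}
```

Mistyping a constant name here is a compile-time error, whereas a mistyped URI string would fail silently at run time; this is the main benefit of generating the vocabulary class.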
We change rdfs:Class to owl:Class, and invoke Java jena.schemagen -i deputy.rdf -b --ontology to get:\n/* CVs $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */ import org.apache.jena.rdf.model.*; import org.apache.jena.ontology.*; /** * Vocabulary definitions from deputy.rdf * @author Auto-generated by schemagen on 01 May 2003 22:03 */ public class Deputy { /** \u0026lt;p\u0026gt;The ontology model that holds the vocabulary terms\u0026lt;/p\u0026gt; */ private static OntModel m_model = ModelFactory.createOntologyModel( ProfileRegistry.OWL_LANG ); /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a string {@value}\u0026lt;/p\u0026gt; */ public static final String NS = \u0026quot;\u0026quot;; /** \u0026lt;p\u0026gt;The namespace of the vocabulary as a resource {@value}\u0026lt;/p\u0026gt; */ public static final Resource NAMESPACE = m_model.createResource( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;The name that everyone calls a dog\u0026lt;/p\u0026gt; */ public static final Property petName = m_model.createProperty( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;Posh dogs have a formal name on their KC certificate\u0026lt;/p\u0026gt; */ public static final Property kennelName = m_model.createProperty( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;A class of canine companions\u0026lt;/p\u0026gt; */ public static final OntClass Dog = m_model.createClass( \u0026quot;\u0026quot; ); /** \u0026lt;p\u0026gt;Deputy is a particular Dog\u0026lt;/p\u0026gt; */ public static final Individual deputy = m_model.createIndividual( Dog, \u0026quot;\u0026quot; ); } General principles In essence, schemagen will load a single vocabulary file, and generate a Java class that contains static constants for the named classes, properties and instances of the vocabulary. Most of the generated components of the output Java file can be controlled by option flags, and formatted with a template. 
Default templates are provided for all elements, so the minimum amount of necessary information is actually very small.\nOptions can be specified on the command line (when invoking schemagen), or may be preset in an RDF file. Any mixture of command line and RDF option specification is permitted. Where a given option is specified both in an RDF file and on the command line, the command line setting takes precedence. Thus the options in the RDF file can be seen as defaults.\nSpecifying command line options To specify a command line option, add its name (and optional value) to the command line when invoking the schemagen tool. E.g.: java jena.schemagen -i myvocab.owl --ontology --uppercase\nSpecifying options in an RDF file To specify an option in an RDF file, create a resource of type sgen:Config, with properties corresponding to the option names listed in Table 1. The following fragment shows a small options file. A complete example configuration file is shown in appendix A.\nBy default, schemagen will look for a configuration file named schemagen.rdf in the current directory. To specify another configuration, use the -c option with a URL to reference the configuration. Multiple configurations (i.e. multiple sgen:Config nodes) can be placed in one RDF document. In this case, each configuration node must be named, and the URI specified in the -r command line option. If there is no -r option, schemagen will look for a node with rdf:type sgen:Config. If there are multiple such nodes in the model, it is indeterminate which one will be used.\nUsing templates We have several times referred to a template being used to construct part of the generated file. What is a template? Simply put, it is a fragment of the output file. Some templates will be used at most once (for example the file header template), some will be used many times (such as the template used to generate a class constant). 
In order to make the templates adaptable to the job they\u0026rsquo;re doing, before it is written out a template has keyword substitution performed on it. This looks for certain keywords delimited by a pair of special characters (% by default), and replaces them with the current binding for that keyword. Some keyword bindings stay the same throughout the processing of the file, and some are dependent on the language element being processed. The substitutions are:\nTable 2: Substitutable keywords in templates\nKeyword Meaning Typical value classname The name of the Java class being generated Automatically defined from the document name, or given with the -n option date The date and time the class was generated imports The Java imports for this class nl The newline character for the current platform package The Java package name As specified by an option. The option just gives the package name, schemagen turns the name into a legal Java statement. sourceURI The source of the document being processed As given by the -i option or in the config file. valclass The Java class of the value being defined E.g. Property for vocabulary properties, Resource for classes in RDFS, or OntClass for classes using the ontology API valcreator The method used to generate an instance of the Java representation E.g. createResource or createClass valname The name of the Java constant being generated This is generated from the name of the resource in the source file, adjusted to be a legal Java identifier. By default, this will preserve the case of the RDF constant, but setting --uppercase will map all constants to upper-case names (a common convention in Java code). valtype The rdf:type for an individual The class name or URI used when creating an individual in the ontology API valuri The full URI of the value being defined From the RDF, without adjustment. 
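For illustration only, the following shows how several of these keywords might be combined in a template supplied with --propTemplate. The template is invented for this example, modelled on the generated Deputy output shown earlier; it is not necessarily one of schemagen's built-in defaults:

```java
/** \u0026lt;p\u0026gt;Declaration of %valname%\u0026lt;/p\u0026gt; */
public static final %valclass% %valname% = m_model.%valcreator%( \u0026quot;%valuri%\u0026quot; );
```

When a property is processed, %valclass% is bound to Property (or an ontology API class when --ontology is set), %valname% to the legal Java identifier derived from the resource name, %valcreator% to the matching factory method such as createProperty, and %valuri% to the full URI from the RDF. The % marker character can be changed with the --marker option.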
Details of schemagen options We now go through each of the configuration options in detail.\nNote: for brevity, we assume a standard prefix sgen is defined for resource URI\u0026rsquo;s in the schemagen namespace. The expansion for sgen is:, thus:\nxmlns:sgen=\u0026quot;\u0026quot; Note on legal Java identifiers Schemagen will attempt to ensure that all generated code will compile as legal Java. Occasionally, this means that identifiers from input documents, which are legal components of RDF URI identifiers, have to be modified to be legal Java identifiers. Specifically, any character in an identifier name that is not a legal Java identifier character will be replaced with the character \u0026lsquo;_\u0026rsquo; (underscore). Thus the name \u0026lsquo;trading-price\u0026rsquo; might become 'trading_price\u0026rsquo;. In addition, Java requires that identifiers be distinct. If a name clash is detected (for example, trading-price and trading+price both map to the same Java identifier), schemagen will add disambiguators to the second and subsequent uses. These will be based on the role of the identifier; for example property names are disambiguated by appending _PROPn for increasing values of n. In a well-written ontology, identifiers are typically made distinct for clarity and ease-of-use by the ontology users, so the use of the disambiguation tactic is rare. Indeed, it may be taken as a hint that refactoring the ontology itself is desirable.\nSpecifying the configuration file Command line -c \u0026lt;*config-file-path*\u0026gt;\n-c \u0026lt;*config-file-URL*\u0026gt; Config file n/a The default configuration file name is schemagen.rdf in the current directory. To specify a different configuration file, either as a file name on the local file system, or as a URL (e.g. an http: address), the config file location is passed with the -c option. 
If no -c option is given, and there is no configuration file in the current directory, schemagen will continue and use default values (plus the other command line options) to configure the tool. If a file name or URL is given with -c, and that file cannot be located, schemagen will stop with an error.\nSchemagen will assume the language encoding of the configuration file is implied by the filename/URL suffix: \u0026ldquo;.n3\u0026rdquo; means N3, \u0026ldquo;.nt\u0026rdquo; means NTRIPLES, \u0026ldquo;.rdf\u0026rdquo; and \u0026ldquo;.owl\u0026rdquo; mean \u0026ldquo;RDF/XML\u0026rdquo;. By default it assumes RDF/XML.\nSpecifying the configuration root in the configuration file Command line -r \u0026lt;*config-root-URI*\u0026gt; Config file n/a It is possible to have more than one set of configuration options in one configuration file. If there is only one set of configuration options, schemagen will locate the root by searching for a resource of rdf:type sgen:Config. If there is more than one, and no root is specified on the command line, it is not specified which set of configuration options will be used. The root URI given as a command line option must match exactly with the URI given in the configuration file. For example:\nJava jena.schemagen -c config/localconf.rdf -r matches:\n... \u0026lt;sgen:Config rdf:about=\u0026quot;\u0026quot;\u0026gt; .... \u0026lt;/sgen:Config\u0026gt; Specifying the input document Command line -i \u0026lt;*input-file-path*\u0026gt;\n-i \u0026lt;*input-URL*\u0026gt; Config file \u0026lt;sgen:input rdf:resource=\u0026quot;*inputURL*\u0026quot; /\u0026gt; The only mandatory argument to schemagen is the input document to process. This can be specified in the configuration file, though this does, of course, mean that the same configuration cannot be applied to multiple different input files for consistency. 
However, by specifying the input document in the default configuration file, schemagen can easily be invoked with the minimum of command line typing. For other means of automating schemagen, see using schemagen with Ant.\nSpecifying the output location Command line -o \u0026lt;*output-file-path*\u0026gt;\n-o \u0026lt;*output-dir*\u0026gt; Config file \u0026lt;sgen:output rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*output-path-or-dir*\u0026lt;/sgen:output\u0026gt; Schemagen must know where to write the generated Java file. By default, the output is written to the standard output. Various options exist to change this. The output location can be specified either on the command line, or in the configuration file. If specified in the configuration file, the resource must be a string literal, denoting the file path. If the path given resolves to an existing directory, then it is assumed that the output will be based on the name of the generated class (i.e. it will be the class name with .java appended). Otherwise, the path is assumed to point to a file. Any existing file that has the given path name will be overwritten.\nBy default, schemagen will create files that have the Unix convention for line-endings (i.e. \u0026lsquo;\\n\u0026rsquo;). To switch to DOS-style line endings, use --dos.\nCommand line --dos Config file \u0026lt;sgen:dos rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:dos\u0026gt; Specifying the class name Command line -n \u0026lt;*class-name*\u0026gt; Config file \u0026lt;sgen:classname rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*classname*\u0026lt;/sgen:classname\u0026gt; By default, the name of the class will be based on the name of the input file. Specifically, the last component of the input document\u0026rsquo;s path name, with the suffix removed, becomes the class name. By default, the initial letter is adjusted to a capital to conform to standard Java usage. 
Thus file:vocabs/trading.owl becomes To override this default algorithm, a class name specified by -n or in the config file is used exactly as given.\nSometimes it is convenient to have all vocabulary files distinguished by a common suffix, for example or This can be achieved by the classname-suffix option:\nCommand line --classnamesuffix \u0026lt;*suffix*\u0026gt; Config file \u0026lt;sgen:classnamesuffix rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*suffix*\u0026lt;/sgen:classnamesuffix\u0026gt; See also the note on legal Java identifiers, which applies to generated class names.\nSpecifying the vocabulary namespace Command line -a \u0026lt;*namespace-URI*\u0026gt; Config file \u0026lt;sgen:namespace rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*namespace*\u0026lt;/sgen:namespace\u0026gt; Since ontology files are often modularised, it is not the case that all of the resource names appearing in a given document are being defined by that ontology. They may appear simply as part of the definitions of other terms. Schemagen assumes that there is one primary namespace for each document, and it is names from that namespace that will appear in the generated Java file.\nIn an OWL ontology, this namespace is computed by finding the owl:Ontology element, and using its namespace as the primary namespace of the ontology. This may not be available (it is not, for example, a part of RDFS) or correct, so the namespace may be specified directly with the -a option or in the configuration file.\nSchemagen does not, in the present version, permit more than one primary namespace per generated Java class. 
However, constants from namespaces other than the primary namespace may be included in the generated Java class by the include option:\nCommand line --include \u0026lt;*namespace-URI*\u0026gt; Config file \u0026lt;sgen:include rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*namespace*\u0026lt;/sgen:include\u0026gt; The include option may repeated multiple times to include a variety of constants from other namespaces in the output class.\nSince OWL and RDFS ontologies may include individuals that are named instances of declared classes, schemagen will include individuals among the constants that it generates in Java. By default, an individual will be included if its class has a URI that is in one of the permitted namespaces for the vocabulary, even if the individual itself is not in that namespace. If the option strictIndividuals is set, individuals are only included if they have a URI that is in the permitted namespaces for the vocabulary.\nCommand line --strictIndividuals Config file \u0026lt;sgen:strictIndividuals /\u0026gt; Specifying the syntax (encoding) of the input document Command line -e \u0026lt;*encoding*\u0026gt; Config file \u0026lt;sgen:encoding rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*encoding*\u0026lt;/sgen:encoding\u0026gt; Jena can parse a number of different presentation syntaxes for RDF documents, including RDF/XML, N3 and NTRIPLE. By default, the encoding will be derived from the name of the input document (e.g. a document xyz.n3 will be parsed in N3 format), or, if the extension is non-obvious the default is RDF/XML. 
The encoding, and hence the parser, to use on the input document may be specified by the encoding configuration option.\nChoosing the style of the generated class: ontology or plain RDF Command line --ontology Config file \u0026lt;sgen:ontology rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;*true or false*\u0026lt;/sgen:ontology\u0026gt; By default, the Java class generated by schemagen will contain plain RDF Resource, Property or Literal constants. When working with OWL or RDFS ontologies, it may be more convenient to have constants that are OntClass, ObjectProperty, DatatypeProperty and Individual Java objects. To generate these ontology constants, rather than plain RDF constants, set the ontology configuration option.\nFurthermore, since Jena can handle input ontologies in OWL (the default), and RDFS, it is necessary to be able to specify which language is being processed. This will affect both the parsing of the input documents, and the language profile selected for the constants in the generated Java class.\nCommand line --owl Config file \u0026lt;sgen:owl rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:owl\u0026gt; Command line --rdfs Config file \u0026lt;sgen:rdfs rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:rdfs\u0026gt; Prior to Jena 2.2, schemagen used a Jena model to load the input document that also applied some rules of inference to the input data. So, for example, a resource that is mentioned as the owl:range of a property can be inferred to be rdf:type owl:Class, and hence listed in the class constants in the generated Java class, even if that fact is not directly asserted in the input model. From Jena 2.2 onwards, this option is now off by default. 
If correct handling of an input document by schemagen requires the use of inference rules, this must be specified by the inference option.\nCommand line --inference Config file \u0026lt;sgen:inference rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:inference\u0026gt; Specifying the Java package Command line --package \u0026lt;*package-name*\u0026gt; Config file \u0026lt;sgen:package rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*package-name*\u0026lt;/sgen:package\u0026gt; By default, the Java class generated by schemagen will not be in a Java package. Set the package configuration option to specify the Java package name. Change from Jena 2.6.4-SNAPSHOT onwards: Setting the package name will affect the directory into which the generated class will be written: directories will be appended to the output directory to match the Java package.\nAdditional decorations on the main class declaration Command line --classdec \u0026lt;*class-declaration*\u0026gt; Config file \u0026lt;sgen:classdec rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*class-declaration*\u0026lt;/sgen:classdec\u0026gt; In some applications, it may be convenient to add additional information to the declaration of the Java class, for example that the class implements a given interface (such as java.lang.Serializable). Any string given as the value of the class-declaration option will be written immediately after \u0026ldquo;public class \u0026lt;i\u0026gt;ClassName\u0026lt;/i\u0026gt;\u0026rdquo;.\nAdding general declarations within the generated class Command line --declarations \u0026lt;*declarations*\u0026gt; Config file \u0026lt;sgen:declarations rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*declarations*\u0026lt;/sgen:declarations\u0026gt; Some more complex vocabularies may require access to static constants, or other Java objects or factories to fully declare the constants defined by the given templates. 
Any text given by the declarations option will be included in the generated class after the class declaration but before the body of the declared constants. The value of the option should be fully legal Java code (though the template substitutions will be performed on the code). Although this option can be declared as a command line option, it is typically easier to specify it as a value in a configuration options file.\nOmitting sections of the generated vocabulary Command line --noclasses\n--nodatatypes\n--noproperties\n--noindividuals Config file \u0026lt;sgen:noclasses rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noclasses\u0026gt;\n\u0026lt;sgen:nodatatypes rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:nodatatypes\u0026gt;\n\u0026lt;sgen:noproperties rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noproperties\u0026gt;\n\u0026lt;sgen:noindividuals rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:noindividuals\u0026gt; By default, the vocabulary class generated from a given ontology will include constants for each of the included classes, datatypes, properties and individuals in the ontology. To omit any of these groups, use the corresponding noXYZ configuration option. 
For example, specifying --noproperties means that the generated class will not contain any constants corresponding to predicate names from the ontology, irrespective of what is in the input document.\nSection header comments Command line --classSection *\u0026lt;section heading\u0026gt;*\n--datatypeSection *\u0026lt;section heading\u0026gt;*\n--propSection *\u0026lt;section heading\u0026gt;*\n--individualSection *\u0026lt;section heading*\u0026gt;\n--header *\u0026lt;file header section\u0026gt;*\n--footer *\u0026lt;file footer section\u0026gt;* Config file \u0026lt;sgen:classSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:classSection\u0026gt;\n\u0026lt;sgen:datatypeSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:datatypeSection\u0026gt;\n\u0026lt;sgen:propSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:propSection\u0026gt;\n\u0026lt;sgen:individualSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*section heading*\u0026lt;/sgen:individualSection\u0026gt;\n\u0026lt;sgen:header rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*file header*\u0026lt;/sgen:header\u0026gt;\n\u0026lt;sgen:footer rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;*file footer*\u0026lt;/sgen:footer\u0026gt; Some coding styles use block comments to delineate different sections of a class. 
These options allow the introduction of arbitrary Java code, though typically this will be a comment block, at the head of the sections of class constant declarations, datatype constant declarations, property constant declarations, and individual constant declarations.\nInclude vocabulary source code Command line --includeSource Config file \u0026lt;sgen:includeSource rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:includeSource\u0026gt; Schemagen\u0026rsquo;s primary role is to provide Java constants corresponding to the names in a vocabulary. Sometimes, however, we may need more information from the vocabulary source file to be available. For example, to know the domain and range of the properties in the vocabulary. If you set the configuration parameter --includeSource, schemagen will:\nconvert the input vocabulary into string form and include that string form in the generated Java class create a Jena model when the Java vocabulary class is first loaded, and load the string-ified vocabulary into that model attach the generated constants to that model, so that, for example, you can look up the declared domain and range of a property or the declared super-classes of a class. Note that Java compilers typically impose some limit on the size of a Java source file (or, more specifically, on the size of the .class file they will generate). Loading a particularly large vocabulary with --includeSource may risk breaching that limit.\nUsing schemagen with Maven Apache Maven is a build automation tool typically used for Java. You can use exec-maven-plugin and build-helper-maven-plugin to run schemagen as part of the generate-sources phase of your project. The following example shows one way of performing this task. 
The developer should customize command-line options or use a configuration file instead as needed.\n\u0026lt;build\u0026gt; \u0026lt;plugins\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.codehaus.mojo\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;exec-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;phase\u0026gt;generate-sources\u0026lt;/phase\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;java\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;mainClass\u0026gt;jena.schemagen\u0026lt;/mainClass\u0026gt; \u0026lt;commandlineArgs\u0026gt; --inference \\ -i ${basedir}/src/main/resources/example.ttl \\ -e TTL \\ --package org.example.ont \\ -o ${}/generated-sources/java \\ -n ExampleOnt \u0026lt;/commandlineArgs\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;plugin\u0026gt; \u0026lt;groupId\u0026gt;org.codehaus.mojo\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;build-helper-maven-plugin\u0026lt;/artifactId\u0026gt; \u0026lt;executions\u0026gt; \u0026lt;execution\u0026gt; \u0026lt;id\u0026gt;add-source\u0026lt;/id\u0026gt; \u0026lt;goals\u0026gt; \u0026lt;goal\u0026gt;add-source\u0026lt;/goal\u0026gt; \u0026lt;/goals\u0026gt; \u0026lt;configuration\u0026gt; \u0026lt;sources\u0026gt; \u0026lt;source\u0026gt;${}/generated-sources/java\u0026lt;/source\u0026gt; \u0026lt;/sources\u0026gt; \u0026lt;/configuration\u0026gt; \u0026lt;/execution\u0026gt; \u0026lt;/executions\u0026gt; \u0026lt;/plugin\u0026gt; \u0026lt;/plugins\u0026gt; \u0026lt;/build\u0026gt; At this point you can run mvn generate-sources in your project to cause schemagen to run and create your Java source (note that this goal is run automatically from mvn compile or mvn install, so there really isn\u0026rsquo;t any reason to run it manually unless you wish to just generate the source). 
The source file is placed in the standard Maven target/generated-sources/java directory, which is added to the project classpath by build-helper-maven-plugin.\nUsing schemagen with Ant Apache Ant is a tool for automating build steps in Java (and other language) projects. For example, it is the tool used to compile the Jena sources to the jena.jar file, and to prepare the Jena distribution prior to download. Although it would be quite possible to create an Ant taskdef to automate the production of Java classes from input vocabularies, we have not yet done this. Nevertheless, it is straightforward to use schemagen from an Ant build script, by making use of Ant\u0026rsquo;s built-in java task, which can execute an arbitrary Java program.\nThe following example shows a complete Ant target definition for generating from example.owl. It ensures that the generation step is only performed when example.owl has been updated more recently than (e.g. if the definitions in the owl file have recently been changed).\n\u0026lt;!-- properties --\u0026gt; \u0026lt;property name=\u0026quot;vocab.dir\u0026quot; value=\u0026quot;src/org/example/vocabulary\u0026quot; /\u0026gt; \u0026lt;property name=\u0026quot;vocab.template\u0026quot; value=\u0026quot;${rdf.dir}/exvocab.rdf\u0026quot; /\u0026gt; \u0026lt;property name=\u0026quot;vocab.tool\u0026quot; value=\u0026quot;jena.schemagen\u0026quot; /\u0026gt; \u0026lt;!-- Section: vocabulary generation --\u0026gt; \u0026lt;target name=\u0026quot;vocabularies\u0026quot; depends=\u0026quot;exVocab\u0026quot; /\u0026gt; \u0026lt;target name=\u0026quot;exVocab.check\u0026quot;\u0026gt; \u0026lt;uptodate property=\u0026quot;exVocab.nobuild\u0026quot; srcFile=\u0026quot;${rdf.dir}/example.owl\u0026quot; targetFile=\u0026quot;${vocab.dir}/\u0026quot; /\u0026gt; \u0026lt;/target\u0026gt; \u0026lt;target name=\u0026quot;exVocab\u0026quot; depends=\u0026quot;exVocab.check\u0026quot; unless=\u0026quot;exVocab.nobuild\u0026quot;\u0026gt; \u0026lt;java classname=\u0026quot;${vocab.tool}\u0026quot; classpathref=\u0026quot;classpath\u0026quot; fork=\u0026quot;yes\u0026quot;\u0026gt; \u0026lt;arg value=\u0026quot;-i\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;file:${rdf.dir}/example.owl\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;-c\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;${vocab.template}\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--classnamesuffix\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;Vocab\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--include\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;\u0026quot; /\u0026gt; \u0026lt;arg value=\u0026quot;--ontology\u0026quot; /\u0026gt; \u0026lt;/java\u0026gt; \u0026lt;/target\u0026gt; Clearly it is up to each developer to find the appropriate balance between options that are specified on the command line and those that are specified in the configuration options file (exvocab.rdf in the above example). This is not the only, nor necessarily the \u0026ldquo;right\u0026rdquo; way to use schemagen from Ant, but if it points readers in the appropriate direction to produce a custom target for their own application it will have served its purpose.\nAppendix A: Complete example configuration file The source of this example is provided in the Jena download as etc/schemagen.rdf. 
For clarity, RDF/XML text is highlighted in blue.\n\u0026lt;?xml version='1.0'?\u0026gt; \u0026lt;!DOCTYPE rdf:RDF [ \u0026lt;!ENTITY jena ''\u0026gt; \u0026lt;!ENTITY rdf ''\u0026gt; \u0026lt;!ENTITY rdfs ''\u0026gt; \u0026lt;!ENTITY owl ''\u0026gt; \u0026lt;!ENTITY xsd ''\u0026gt; \u0026lt;!ENTITY base '\u0026amp;jena;2003/04/schemagen'\u0026gt; \u0026lt;!ENTITY sgen '\u0026amp;base;#'\u0026gt; ]\u0026gt; \u0026lt;rdf:RDF xmlns:rdf =\u0026quot;\u0026amp;rdf;\u0026quot; xmlns:rdfs =\u0026quot;\u0026amp;rdfs;\u0026quot; xmlns:owl =\u0026quot;\u0026amp;owl;\u0026quot; xmlns:sgen =\u0026quot;\u0026amp;sgen;\u0026quot; xmlns =\u0026quot;\u0026amp;sgen;\u0026quot; xml:base =\u0026quot;\u0026amp;base;\u0026quot; \u0026gt; \u0026lt;!-- Example schemagen configuration for use with jena.schemagen Not all possible options are used in this example, see Javadoc and Howto for full details. Author: Ian Dickinson, CVs: $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ --\u0026gt; \u0026lt;sgen:Config\u0026gt; \u0026lt;!-- specifies that the source document uses OWL --\u0026gt; \u0026lt;sgen:owl rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:owl\u0026gt; \u0026lt;!-- specifies that we want the generated vocab to use OntClass, OntProperty, etc, not Resource and Property --\u0026gt; \u0026lt;sgen:ontology rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:ontology\u0026gt; \u0026lt;!-- specifies that we want names mapped to uppercase (as standard Java constants) --\u0026gt; \u0026lt;sgen:uppercase rdf:datatype=\u0026quot;\u0026amp;xsd;boolean\u0026quot;\u0026gt;true\u0026lt;/sgen:uppercase\u0026gt; \u0026lt;!-- append Vocab to class name, so input beer.owl becomes --\u0026gt; \u0026lt;sgen:classnamesuffix rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;Vocab\u0026lt;/sgen:classnamesuffix\u0026gt; \u0026lt;!-- the Java package that the vocabulary is in --\u0026gt; 
\u0026lt;sgen:package rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;com.example.vocabulary\u0026lt;/sgen:package\u0026gt; \u0026lt;!-- the directory or file to write the results out to --\u0026gt; \u0026lt;sgen:output rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;src/com/example/vocabulary\u0026lt;/sgen:output\u0026gt; \u0026lt;!-- the template for the file header --\u0026gt; \u0026lt;sgen:header rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;/***************************************************************************** * Source code information * ----------------------- * Original author Jane Smart, * Author email * Package @package@ * Web site @website@ * Created %date% * Filename $RCSfile: schemagen.html,v $ * Revision $Revision: 1.16 $ * Release status @releaseStatus@ $State: Exp $ * * Last modified on $Date: 2010-06-11 00:08:23 $ * by $Author: ian_dickinson $ * * @copyright@ *****************************************************************************/ // Package /////////////////////////////////////// %package% // Imports /////////////////////////////////////// %imports% /** * Vocabulary definitions from %sourceURI% * @author Auto-generated by schemagen on %date% */\u0026lt;/sgen:header\u0026gt; \u0026lt;!-- the template for the file footer (note @footer@ is an Ant-ism, and will not be processed by SchemaGen) --\u0026gt; \u0026lt;sgen:footer rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; /* @footer@ */ \u0026lt;/sgen:footer\u0026gt; \u0026lt;!-- template for extra declarations at the top of the class file --\u0026gt; \u0026lt;sgen:declarations rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; /** Factory for generating symbols */ private static KsValueFactory s_vf = new DefaultValueFactory(); \u0026lt;/sgen:declarations\u0026gt; \u0026lt;!-- template for introducing the properties in the vocabulary --\u0026gt; \u0026lt;sgen:propSection 
rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary properties /////////////////////////// \u0026lt;/sgen:propSection\u0026gt; \u0026lt;!-- template for introducing the classes in the vocabulary --\u0026gt; \u0026lt;sgen:classSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary classes /////////////////////////// \u0026lt;/sgen:classSection\u0026gt; \u0026lt;!-- template for introducing the datatypes in the vocabulary --\u0026gt; \u0026lt;sgen:datatypeSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary datatypes /////////////////////////// \u0026lt;/sgen:datatypeSection\u0026gt; \u0026lt;!-- template for introducing the individuals in the vocabulary --\u0026gt; \u0026lt;sgen:individualsSection rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt; // Vocabulary individuals /////////////////////////// \u0026lt;/sgen:individualsSection\u0026gt; \u0026lt;!-- template for doing fancy declarations of individuals --\u0026gt; \u0026lt;sgen:individualTemplate rdf:datatype=\u0026quot;\u0026amp;xsd;string\u0026quot;\u0026gt;public static final KsSymbol %valname% = s_vf.newSymbol( \u0026quot;%valuri%\u0026quot; ); /** Ontology individual corresponding to {@link #%valname%} */ public static final %valclass% _%valname% = m_model.%valcreator%( %valtype%, \u0026quot;%valuri%\u0026quot; ); \u0026lt;/sgen:individualTemplate\u0026gt; \u0026lt;/sgen:Config\u0026gt; \u0026lt;/rdf:RDF\u0026gt; ","permalink":"","tags":null,"title":"Jena schemagen HOWTO"},{"categories":null,"contents":"The Jena project has issued a number of security advisories during the lifetime of the project. 
On this page you\u0026rsquo;ll find details of our security issue process, as well as a listing of our past CVEs and relevant Dependency CVEs.\nProcess Jena follows the standard ASF Security for Committers policy for reporting and addressing security issues.\nIf you think you have identified a security issue in our project, please refer to that policy for how to report it, and the process that the Jena Project Management Committee (PMC) will follow in addressing the issue.\nSingle Supported Version As a project, Apache Jena only has the resources to maintain a single release version. Any accepted security issue will be fixed in a future release in a timeframe appropriate to the severity of the issue.\nStandard Mitigation Advice Note that as a project our guidance to users is always to use the newest Jena version available to ensure you have any security fixes we have made available.\nWhere more specific mitigations are available, these will be denoted in the individual CVEs.\nEnd of Life (EOL) Components Where a security advisory is issued for a component that is already EOL (sometimes referred to as archived or retired within our documentation) then we will not fix the issue but instead reiterate our previous recommendations that users cease using the EOL component and migrate to actively supported components.\nSuch issues will follow the CVE EOL Assignment Process and will be clearly denoted by the UNSUPPORTED WHEN ASSIGNED text at the start of the description.\nSecurity Issues in Dependencies For our dependencies, the project relies primarily upon GitHub Dependabot Alerts to be made aware of available dependency updates, whether security related or otherwise. 
When a security-related update is released and our analysis shows that Jena users may be affected, we endeavour to take the dependency upgrade ASAP and make a new release in a timeframe appropriate to the severity of the issue.\nJena CVEs The following CVEs specifically relate to the Jena codebase itself and have been addressed by the project. Per our policy above we advise users to always utilise the latest Jena release available.\nPlease refer to the individual CVE links for further details and mitigations.\nCVE-2023-32200 - Exposure of execution in script engine expressions. CVE-2023-32200 affects Jena 3.7.0 through Jena 4.8.0 and relates to the Javascript SPARQL Functions feature of our ARQ SPARQL engine.\nThere are insufficient restrictions on called script functions in Apache Jena versions 4.8.0 and earlier, when invoking custom scripts. It allows a remote user to execute JavaScript via a SPARQL query.\nFrom Jena 4.9.0, script functions MUST be added to an explicit \u0026ldquo;allow\u0026rdquo; list for them to be called from the SPARQL query engine. This is in addition to the script enabling controls of Jena 4.8.0 which MUST also be applied.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2023-22665 - Exposure of arbitrary execution in script engine expressions. CVE-2023-22665 affects Jena 3.7.0 through 4.7.0 and relates to the Javascript SPARQL Functions feature of our ARQ SPARQL engine.\nFrom Jena 4.8.0 onwards this feature MUST be explicitly enabled by end users, and on newer JVMs (Java 17 onwards) a JavaScript script engine MUST be explicitly added to the environment.\nHowever, when enabled this feature does expose the majority of the underlying scripting engine directly to SPARQL queries so may provide a vector for arbitrary code execution. 
Therefore, it is recommended that this feature remain disabled for any publicly accessible deployment that utilises the ARQ query engine.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2022-45136 - JDBC Serialisation in Apache Jena SDB CVE-2022-45136 affects all versions of Jena SDB up to and including the final 3.17.0 release.\nApache Jena SDB has been EOL since December 2020 and we recommend any remaining users migrate to Jena TDB 2 or other 3rd party vendor alternatives.\nApache Jena would like to thank Crilwa \u0026amp; LaNyer640 for reporting this issue.\nCVE-2022-28890 - Processing External DTDs CVE-2022-28890 affects the RDF/XML parser in Jena 4.4.0 only.\nUsers should upgrade to latest Jena 4.x release available.\nApache Jena would like to thank Feras Daragma, Avishag Shapira \u0026amp; Amit Laish (GE Digital, Cyber Security Lab) for their report.\nCVE-2021-39239 - XML External Entity (XXE) Vulnerability CVE-2021-39239 affects XML parsing up to and including the Jena 4.1.0 release.\nUsers should upgrade to latest Jena 4.x release available.\nCVE-2021-33192 - Display information UI XSS in Apache Jena Fuseki CVE-2021-33192 affected Fuseki versions 2.0.0 through 4.0.0.\nUsers should upgrade to latest Jena 4.x release available.\nCVEs in Jena Dependencies The following advisories are CVEs in Jena\u0026rsquo;s dependencies that may affect users of Jena. As with Jena-specific CVEs, our standard Security Issue Policy applies, and any necessary dependency updates, dependency API and/or configuration changes have been adopted and released as soon as appropriate.\nlog4shell CVE-2021-44228, CVE-2021-45105 and CVE-2021-44832, collectively known as Log4Shell, were several vulnerabilities identified in the Apache Log4j project that Jena uses as the concrete logging implementation for Fuseki and our command line tools.\nJena versions prior to 4.4.0 included vulnerable versions of Log4j.\nUsers should upgrade to latest Jena 4.x release 
available.\n","permalink":"","tags":null,"title":"Jena Security Advisories"},{"categories":null,"contents":"This page gives an overview of transactions in Jena.\nThere are two APIs for transactions: the basic transaction interface styled after the conventional begin-commit, and a higher level Txn API that builds on the basic API using Java 8 features.\nAPIs Basic API for Transactions Txn, a high level API to transactions Overview Transactions provide applications with a safe way to use and update data between threads. The properties of transactions are ACID:\nAtomic, Consistent, Isolated, Durable - meaning that groups of changes are made visible to other transactions in a single unit or no changes become visible, and, once made, changes are not reversed or, in the case of persistent storage, lost, nor is the database corrupted. Jena provides transactions on datasets and provides \u0026ldquo;serializable transactions\u0026rdquo;. Any application code reading data sees all changes made elsewhere, not parts of changes. In particular, SPARQL aggregations like COUNT are correct and do not see partial changes due to other transactions.\nThe exact details are dependent on the implementation.\nTransactions cannot be nested (a transaction happening inside an outer transaction results in changes visible only to the outer transaction until that commits).\nTransactions are \u0026ldquo;per thread\u0026rdquo;. Actions by different threads on the same dataset are always inside different transactions.\nImplementations Transactions are part of the interface to RDF Datasets. There is a default implementation, based on MRSW locking (multiple-reader or single-writer), that can be used with any mixed set of components. 
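Whatever the underlying implementation, the usage pattern is the same. A minimal sketch using the higher level Txn API with an in-memory transactional dataset (the example triple is illustrative):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.system.Txn;

public class TxnSketch {
    public static void main(String[] args) {
        // An in-memory, transactional dataset
        Dataset dataset = DatasetFactory.createTxnMem();

        // Write transaction: commits on normal return, aborts if the code throws
        Txn.executeWrite(dataset, () -> {
            Model m = dataset.getDefaultModel();
            m.add(m.createResource("http://example/s"),
                  m.createProperty("http://example/p"),
                  "o");
        });

        // Read transaction: sees the committed state as a consistent whole
        Txn.executeRead(dataset, () ->
            System.out.println(dataset.getDefaultModel().size())); // prints 1
    }
}
```

The equivalent begin/commit/end form of these transactions is covered by the basic transaction API.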
Certain storage sub-systems provide better concurrency with MR+SW (multiple-reader and single-writer).\nDataset Facilities Creation TxnMem MR+SW DatasetFactory.createTxnMem TDB MR+SW, persistent TDBFactory.create TDB2 MR+SW, persistent TDB2Factory.create General MRSW DatasetFactory.create The general dataset can have any graphs added to it (e.g. inference graphs).\nMore details of transactions in TDB.\n","permalink":"","tags":null,"title":"Jena Transactions"},{"categories":null,"contents":" API for Transactions Read transactions Write transactions Transaction promotion Txn - A higher level API to transactions API for Transactions This page describes the basic transaction API in Jena (3.1.0 and later).\nThere is also a higher-level API useful in many situations but sometimes it is necessary to use the basic transaction API described here.\nRead transactions These are used for SPARQL queries and Jena API actions that do not change the data. The general pattern is:\ndataset.begin(ReadWrite.READ) ; try { ... } finally { dataset.end() ; } The dataset.end() declares the end of the read transaction. Applications may also call dataset.commit() or dataset.abort() which all have the same effect for a read transaction.\nThis example has two queries - no updates between or during the queries will be seen by this code even if another thread commits changes in the lifetime of this transaction.\nDataset dataset = ... 
; dataset.begin(ReadWrite.READ) ; try { String qs1 = \u0026quot;SELECT * {?s ?p ?o} LIMIT 10\u0026quot; ; try(QueryExecution qExec = QueryExecution.create(qs1, dataset)) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } String qs2 = \u0026quot;SELECT * {?s ?p ?o} OFFSET 10 LIMIT 10\u0026quot; ; try(QueryExecution qExec = QueryExecution.create(qs2, dataset)) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } } finally { dataset.end() ; } Write transactions These are used for SPARQL queries, SPARQL updates and any Jena API actions that modify the data. Beware that large operations to change a dataset may consume large amounts of temporary space.\nThe general pattern is:\ndataset.begin(ReadWrite.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } The dataset.end() will abort the transaction if there was no call to dataset.commit() or dataset.abort() inside the write transaction.\nOnce dataset.commit() or dataset.abort() is called, the application needs to start a new transaction to perform further operations on the dataset.\nDataset dataset = ... ; dataset.begin(TxnType.WRITE) ; try { Model model = dataset.getDefaultModel() ; // API calls to a model in the dataset // Make some changes via the model ... model.add( ... ) // A SPARQL query will see the new statement added. try (QueryExecution qExec = QueryExecution.create( \u0026quot;SELECT (count(?s) AS ?count) { ?s ?p ?o} LIMIT 10\u0026quot;, dataset)) { ResultSet rs = qExec.execSelect() ; ResultSetFormatter.out(rs) ; } // ... perform a SPARQL Update String sparqlUpdateString = StrUtils.strjoinNL( \u0026quot;PREFIX : \u0026lt;http://example/\u0026gt;\u0026quot;, \u0026quot;INSERT { :s :p ?now } WHERE { BIND(now() AS ?now) }\u0026quot; ) ; UpdateRequest request = UpdateFactory.create(sparqlUpdateString) ; UpdateExecution.dataset(dataset).update(request).execute(); // Finally, commit the transaction. 
dataset.commit() ; // Or call .abort() } finally { dataset.end() ; } Transaction Types, Modes and Promotion. Transactions have a type (enum TxnType) and a mode (enum ReadWrite). TxnType.READ and TxnType.WRITE start the transaction in that mode and the mode is fixed for the transaction\u0026rsquo;s lifetime. A READ transaction can never update the data of the transactional object it is acting on.\nTransactions can have type TxnType.READ_PROMOTE or TxnType.READ_COMMITTED_PROMOTE. These start in mode READ but can become mode WRITE, either implicitly by attempting an update, or explicitly by calling promote().\nREAD_PROMOTE only succeeds if no writer has made any changes since this transaction started. It gives full isolation.\nREAD_COMMITTED_PROMOTE always succeeds because it changes the view of the data to include any changes made up to that point (it is \u0026ldquo;read committed\u0026rdquo;). Applications should be aware that data they have read up until the point of promotion (the first call of promote() or the first update made) may now be invalid. For this reason, READ_PROMOTE is preferred.\nbegin(), the method with no arguments, is equivalent to begin(TxnType.READ_PROMOTE).\nMulti-threaded use Each dataset object has one transaction active at a time per thread. A dataset object can be used by different threads, with independent transactions.\nThe usual idiom within multi-threaded applications is to have one dataset, and so there is one transaction per thread.\nEither:\n// Create a dataset and keep it globally. static Dataset dataset = TDBFactory.createDataset(location) ; Thread 1:\ndataset.begin(TxnType.WRITE) ; try { ... dataset.commit() ; } finally { dataset.end() ; } Thread 2:\ndataset.begin(TxnType.READ) ; try { ... } finally { dataset.end() ; } It is possible (in TDB) to create different dataset objects to the same location.\nThread 1:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(TxnType.WRITE) ; try { ... 
dataset.commit() ; } finally { dataset.end() ; } Thread 2:\nDataset dataset = TDBFactory.createDataset(location) ; dataset.begin(TxnType.READ) ; try { ... } finally { dataset.end() ; } Each thread has a separate dataset object; these safely share the same storage and have independent transactions.\nMulti JVM Use of the same file database by multiple applications running in multiple JVMs is not supported and carries a high risk of data corruption. Once corrupted, a database cannot be repaired and must be rebuilt from the original source data. Therefore there must be a single JVM controlling the database directory and files. From 1.1.0 onwards, TDB includes automatic protection which prevents multi-JVM use under most circumstances.\nUse our Fuseki component to provide a database server for multiple applications. Fuseki supports SPARQL Query, SPARQL Update and the SPARQL Graph Store protocol.\n","permalink":"","tags":null,"title":"Jena Transactions API"},{"categories":null,"contents":"The following tutorials take a step-by-step approach to explaining aspects of RDF and linked-data applications programming in Jena. For a more task-oriented description, please see the getting started guide.\nRDF core API tutorial SPARQL tutorial Using Jena with Eclipse Manipulating SPARQL using ARQ Jena tutorials in other languages Quelques uns des tutoriels de Jena sont aussi disponibles en français. Vous pouvez les voir en suivant ces liens:\nUne introduction à RDF Requêtes SPARQL utilisant l\u0026rsquo;API Java ARQ Les entrées/sorties RDF Une introduction à SPARQL Os tutoriais a seguir explicam aspectos de RDF e da programação em Jena de aplicações linked-data. 
Veja também o guia getting started - em inglês.\nUma introdução à API RDF Tutorial SPARQL Manipulando SPARQL usando ARQ Usando o Jena com o Eclipse Simplified Chinese:\nRDF 和 Jena RDF API 入门 Greek:\nΕφαρμογές του Jena API στο Σημασιολογικό Ιστό ","permalink":"","tags":null,"title":"Jena tutorials"},{"categories":null,"contents":"This page lists various projects and tools related to Jena - classes, packages, libraries, applications, or ontologies that enhance Jena or are built on top of it. These projects are not part of the Jena project itself, but may be useful to Jena users.\nThis list is provided for information purposes only, and is not meant as an endorsement of the mentioned projects by the Jena team.\nIf you wish your contribution to appear on this page, please raise a GitHub or JIRA issue with the details to be published.\nRelated projects Name Description License Creator URL GeoSPARQL Jena Implementation of GeoSPARQL 1.0 standard using Apache Jena for SPARQL query or API. Apache 2.0 Greg Albiston and Haozhe Chen geosparql-jena at GitHub GeoSPARQL Fuseki HTTP server application compliant with the GeoSPARQL standard using GeoSPARQL Jena library and Apache Jena Fuseki server Apache 2.0 Greg Albiston geosparql-fuseki at GitHub Jastor Code generator that emits Java Beans from OWL Web Ontologies Common Public License Ben Szekely and Joe Betz Jastor website NG4J Named Graphs API for Jena BSD license Chris Bizer NG4J website Micro Jena (uJena) Reduced version of Jena for mobile devices as per Jena Fulvio Crivellaro and Gabriele Genovese and Giorgio Orsi Micro Jena Gloze XML to RDF, RDF to XML, XSD to OWL mapping tool as per Jena Steve Battle jena files page WYMIWYG KnoBot A fully Jena based semantic CMS. Implements URIQA. File-based persistence. 
Apache Reto Bachmann-Gmuer / Download KnoBot Infinite Graph An infinite graph implementation for RDF graphs BSD UTD Infinite Graph for Jena Twinkle A GUI interface for working with SPARQL queries Public Domain Leigh Dodds Twinkle project homepage GLEEN A path expression (a.k.a. \u0026ldquo;regular paths\u0026rdquo;) property function library for ARQ SparQL Apache 2.0 Todd Detwiler - University of Washington Structural Informatics Group GLEEN home Jena Sesame Model Jena Sesame Model - Sesame triple store for Jena models GNU Weijian Fang Jena Sesame Model D2RQ Treats non-RDF databases as virtual Jena RDF graphs GNU GPL License Chris Bizer D2RQ website GeoSpatialWeb This projects adds geo-spatial predicates and reasoning features to Jena property functions. GNU GPL License Marco Neumann and Taylor Cowan GeoSpatialWeb Jenabean Jenabean uses Jena\u0026rsquo;s flexible RDF/OWL API to persist Java beans. Apache 2.0 Taylor Cowan and David Donohue Jenabean project page Persistence Annotations 4 RDF Persistence Annotation for RDF (PAR) is a set of annotations and an entity manager that provides JPA like functionality on top of an RDF store while accounting for and exploiting the fundamental differences between graph storage and relational storage. PAR introduces three (3) annotations that map a RDF triple (subject, predicate, object) to a Plain Old Java Object (POJO) using Java\u0026rsquo;s dynamic proxy capabilities. Apache 2.0 Claude Warren PA4RDF at Sourceforge Semantic_Forms Swiss army knife for data management and social networking. open source Jean-Marc Vanel Semantic_Forms JDBC 4 SPARQL JDBC 4 SPARQL is a type 4 JDBC Driver that uses a SPARQL endpoint (or Jena Model) as the data store. 
Presents graph data as relational data to tools that understand SQL and utilize JDBC Apache 2.0 (Some components GNU LGPL V3.0) Claude Warren jdbc4sparql at GitHub ","permalink":"","tags":null,"title":"Jena-related projects and tools"},{"categories":null,"contents":"As of Jena 2.11.0, LARQ is replaced by jena-text\njena-text includes use of Apache Solr as a shared, search server, or Apache Lucene as a local text index. From Fuseki 0.2.7, jena-text is built into Fuseki.\nLARQ is not compatible with jena-text; the index format has changed and the integration with SPARQL is different.\nLARQ is a combination of ARQ and Lucene. It gives users the ability to perform free text searches within their SPARQL queries. Lucene indexes are additional information for accessing the RDF graph, not storage for the graph itself.\nSome example code is available here:\nTwo helper commands are provided: larq.larqbuilder and larq.larq used respectively for updating and querying LARQ indexes.\nA full description of the free text query language syntax is given in the Lucene query syntax document.\nUsage Patterns There are three basic usage patterns supported:\nPattern 1 : index string literals. The index will return the literals matching the Lucene search pattern. Pattern 2 : index subject resources by string literal. The index returns the subjects with property value matching a text query. Pattern 3 : index graph nodes based on strings not present in the graph. Patterns 1 and 2 have the indexed content in the graph. Both 1 and 2 can be modified by specifying a property so that only values of a given property are indexed. Pattern 2 is less flexible as discussed below. Pattern 3 is covered in the external content section below.\nLARQ can be used in other ways as well but the classes for these patterns are supplied. 
In both patterns 1 and 2, strings are indexed, whether plain strings, strings with a language tag, or literals with datatype XSD string.\nIndex Creation There are many ways to use Lucene, which can be set up to handle particular features or languages. The creation of the index is done outside of the ARQ query system proper and only accessed at query time. LARQ includes some platform classes and also utility classes to create indexes on string literals for the use cases above. Indexing can be performed as the graph is read in, or built from an existing graph.\nIndex Builders An index builder is a class to create a Lucene index from RDF data.\nIndexBuilderString: This is the most commonly used index builder. It indexes plain literals (with or without language tags) and XSD strings and stores the complete literal. Optionally, a property can be supplied which restricts indexing to strings in statements using that property. IndexBuilderSubject: Index the subject resource by a string literal, and store the subject resource, possibly restricted by a specified property. Lucene has many ways to create indexes and the index builder classes do not attempt to provide all possible Lucene features. Applications may need to extend or modify the standard index builders provided by LARQ.\nIndex Creation An index can be built while reading RDF into a model:\n// -- Read and index all literal strings. IndexBuilderString larqBuilder = new IndexBuilderString() ; // -- Index statements as they are added to the model. 
model.register(larqBuilder) ; FileManager.get().readModel(model, datafile) ; // -- Finish indexing larqBuilder.closeWriter() ; model.unregister(larqBuilder) ; // -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; or it can be created from an existing model:\n// -- Create an index based on existing statements larqBuilder.indexStatements(model.listStatements()) ; // -- Finish indexing larqBuilder.closeWriter() ; // -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; Index Registration Next the index is made available to ARQ. This can be done globally:\n// -- Make globally available LARQ.setDefaultIndex(index) ; or it can be set on a per-query execution basis.\nQueryExecution qExec = QueryExecutionFactory.create(query, model) ; // -- Make available to this query execution only LARQ.setDefaultIndex(qExec.getContext(), index) ; In both these cases, the default index is set, which is the one expected by property function pf:textMatch. Use of multiple indexes in the same query can be achieved by introducing new properties. The application can subclass the search class org.apache.jena.larq.LuceneSearch to set different indexes with different property names.\nQuery using a Lucene index Query execution is as usual using the property function pf:textMatch. \u0026ldquo;textMatch\u0026rdquo; can be thought of as an implied relationship in the data. 
Note the prefix ends in \u0026ldquo;.\u0026rdquo;.\nString queryString = StringUtils.join(\u0026quot;\\n\u0026quot;, new String[]{ \u0026quot;PREFIX pf: \u0026lt;\u0026gt;\u0026quot;, \u0026quot;SELECT * {\u0026quot; , \u0026quot; ?lit pf:textMatch '+text'\u0026quot;, \u0026quot;}\u0026quot; }) ; Query query = QueryFactory.create(queryString) ; QueryExecution qExec = QueryExecutionFactory.create(query, model) ; ResultSetFormatter.out(System.out, qExec.execSelect(), query) ; The subjects with a property value of the matched literals can be retrieved by looking up the literals in the model:\nPREFIX pf: \u0026lt;\u0026gt; SELECT ?doc { ?lit pf:textMatch '+text' . ?doc ?p ?lit } This is a more flexible way of achieving the effect of using a IndexBuilderSubject. IndexBuilderSubject can be more compact when there are many large literals (it stores the subject not the literal) but does not work for blank node subjects without extremely careful co-ordination with a persistent model. Looking the literal up in the model does not have this complication.\nAccessing the Lucene Score The application can get access to the Lucene match score by using a list argument for the subject of pf:textMatch. The list must have two arguments, both unbound variables at the time of the query.\nPREFIX pf: \u0026lt;\u0026gt; SELECT ?doc ?score { (?lit ?score ) pf:textMatch '+text' . ?doc ?p ?lit } Limiting the number of matches When used with just a query string, pf:textMatch returns all the Lucene matches. In many applications, the application is only interested in the first few matches (Lucene returns matches in order, highest scoring first), or only matches above some score threshold. The query argument that forms the object of the pf:textMatch property can also be a list, including a score threshold and a total limit on the number of results matched.\n?lit pf:textMatch ( '+text' 100 ) . # Limit to at most 100 hits ?lit pf:textMatch ( '+text' 0.5 ) . # Limit to Lucene scores of 0.5 and over. 
?lit pf:textMatch ( '+text' 0.5 100 ) . # Limit to scores of 0.5 and limit to 100 hits Direct Application Use The IndexLARQ class provides the ability to search programmatically, not just from ARQ. The searchModelByIndex method returns an iterator over RDFNodes.\n// -- Create the access index IndexLARQ index = larqBuilder.getIndex() ; NodeIterator nIter = index.searchModelByIndex(\u0026quot;+text\u0026quot;) ; for ( ; nIter.hasNext() ; ) { // if it's an index storing literals ... Literal lit = (Literal)nIter.nextNode() ; } External Content Pattern 3: index graph nodes based on strings not present in the graph. Sometimes, the index needs to be created based on external material and the index gives nodes in the graph. This can be done by using IndexBuilderNode, which is a helper class to relate external material to some RDF node.\nHere, the indexed content is not in the RDF graph at all. For example, the indexed content may come from HTML, XHTML, PDFs or XML documents and the RDF graph only holds the metadata about these content items.\nThe Lucene contributions page lists some content converters.\nGetting Help and Getting Involved If you have a problem with LARQ, make sure you read the Getting help with Jena page and post a message on the mailing list. You can also search the jena-users mailing list archives here.\nIf you use LARQ and you want to get involved, make sure you read the Getting Involved page. 
You can help us make LARQ better by:\nimproving this documentation, writing tutorials or blog posts about LARQ\nletting us know how you use LARQ, your use cases and what you think the missing features are\nanswering users\u0026rsquo; questions about LARQ on the mailing list\nsubmitting bug reports and feature requests on JIRA\nvoting on or submitting patches for the currently open bugs or improvements for LARQ\nchecking out the LARQ source code, playing with it and letting us know your ideas for possible improvements ","permalink":"","tags":null,"title":"LARQ - adding free text searches to SPARQL"},{"categories":null,"contents":"The Apache Jena Elephas libraries for Apache Hadoop are a collection of maven artifacts which can be used individually or together as desired. These are available from the same locations as any other Jena artifact, see Using Jena with Maven for more information.\nHadoop Dependencies The first thing to note is that although our libraries depend on relevant Hadoop libraries, these dependencies are marked as provided and therefore are not transitive. 
This means that you may typically also need to declare these basic dependencies as provided in your own POM:\n\u0026lt;!-- Hadoop Dependencies --\u0026gt; \u0026lt;!-- Note these will be provided on the Hadoop cluster hence the provided scope --\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; \u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.hadoop\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;hadoop-mapreduce-client-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;2.6.0\u0026lt;/version\u0026gt; \u0026lt;scope\u0026gt;provided\u0026lt;/scope\u0026gt; \u0026lt;/dependency\u0026gt; Using Alternative Hadoop versions If you wish to use a different Hadoop version then we suggest that you build the modules yourself from source which can be found in the jena-elephas folder of our source release (available on the Downloads page) or from our Git repository (see Getting Involved for details of the repository).\nWhen building you need to set the hadoop.version property to the desired version e.g.\n\u0026gt; mvn clean package -Dhadoop.version=2.4.1 Would build for Hadoop 2.4.1\nNote that we only support Hadoop 2.x APIs and so Elephas cannot be built for Hadoop 1.x\nJena RDF Tools for Apache Hadoop Artifacts Common API The jena-elephas-common artifact provides common classes for enabling RDF on Hadoop. 
This is mainly composed of relevant Writable implementations for the various supported RDF primitives.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-common\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; IO API The IO API artifact provides support for reading and writing RDF in Hadoop:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-io\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Map/Reduce The Map/Reduce artifact provides various building block mapper and reducer implementations to help you get started writing Map/Reduce jobs over RDF data quicker:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-mapreduce\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; RDF Stats Demo The RDF Stats Demo artifact is a Hadoop job jar which can be used to run some simple demo applications over your own RDF data:\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-elephas-stats\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;classifier\u0026gt;hadoop-job\u0026lt;/classifier\u0026gt; \u0026lt;/dependency\u0026gt; ","permalink":"","tags":null,"title":"Maven Artifacts for Apache Jena Elephas"},{"categories":null,"contents":"The Jena JDBC libraries are a collection of maven artifacts which can be used individually or together as desired. 
These are available from the same locations as any other Jena artifact, see Using Jena with Maven for more information.\nCore Library The jena-jdbc-core artifact is the core library that contains much of the common implementation for the drivers. This is a dependency of the other artifacts and will typically only be required as a direct dependency if you are implementing a custom driver\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-core\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; In-Memory Driver The in-memory driver artifact provides the JDBC driver for non-persistent in-memory datasets.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-mem\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; TDB Driver The TDB driver artifact provides the JDBC driver for TDB datasets.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-tdb\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Remote Endpoint Driver The Remote Endpoint driver artifact provides the JDBC driver for accessing arbitrary remote SPARQL compliant stores.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-remote\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; Driver Bundle The driver bundle artifact is a shaded JAR (i.e. 
with dependencies included) suitable for dropping into tools to easily make Jena JDBC drivers available without having to do complex class path setups.\nThis artifact depends on all the other artifacts.\n\u0026lt;dependency\u0026gt; \u0026lt;groupId\u0026gt;org.apache.jena\u0026lt;/groupId\u0026gt; \u0026lt;artifactId\u0026gt;jena-jdbc-driver-bundle\u0026lt;/artifactId\u0026gt; \u0026lt;version\u0026gt;x.y.z\u0026lt;/version\u0026gt; \u0026lt;/dependency\u0026gt; ","permalink":"","tags":null,"title":"Maven Artifacts for Jena JDBC"},{"categories":null,"contents":"Apache Jena3 is a major version release for Jena - it is not binary compatible with Jena2. The migration consists of package renaming and database reloading.\nKey Changes Package renaming RDF 1.1 Semantics for plain literals Persistent data (TDB, SDB) should be reloaded. Java8 is required. Security renamed to Permissions. Security Evaluator changes required Package Name Changes Packages with a base name of com.hp.hpl.jena become org.apache.jena.\nGlobal replacement of import com.hp.hpl.jena. with import org.apache.jena. will cover the majority of cases.\nThe Jena APIs remain unchanged except for this renaming.\nVocabularies unchanged Only java package names are being changed. Vocabularies are not affected.\nAssemblers Migration support is provided by mapping ja:loadClass names beginning com.hp.hpl.jena internally to org.apache.jena. A warning is logged.\nLogging This will also affect logging: logger names reflect the java class naming so loggers for com.hp.hpl.jena become org.apache.jena.\nRDF 1.1 Many of the changes and refinements for RDF 1.1 are already in Jena2. The parsers for Turtle-family languages already follow the RDF 1.1 grammars and output is compatible with RDF 1.1 as well as earlier output details.\nRDF 1.1 changes for plain literals In RDF 1.1, all literals have a datatype. 
A plain literal with no language tag (also called a \u0026ldquo;simple literal\u0026rdquo;) has datatype xsd:string. A plain literal with a language tag has datatype rdf:langString.\nConsequences:\n\u0026quot;abc\u0026quot; and \u0026quot;abc\u0026quot;^^xsd:string are the same RDF term in RDF 1.1. Jena2 memory models have always treated these as the same value, but different terms. Jena2 persistent models treated them as two separate terms and two separate values.\nData is not invalidated by this change.\nThe parsers will give datatypes to all data read; there is no need to change the data.\nOutput is in the datatype-less form (an abbreviated syntax) even in N-triples.\nApplications which explicitly use ^^xsd:string (or in RDF/XML, rdf:datatype=\u0026quot;\u0026quot;) will see a change in appearance.\nApplications with a mix of plain literals and explicit ^^xsd:string (the RDF 1.1 Working Group believed these to be uncommon) may see changes.\nApplications that do their own RDF output need to be careful not to assume that having a datatype excludes the possibility of also having a language tag.\nPersistent Data For data stored in TDB and SDB, it is advisable to reload data.\nData that does not use explicit xsd:string should be safe but it is still recommended that data is reloaded at a convenient time.\nData that does use explicit xsd:string must be reloaded.\nSecurity package renamed to Permissions Jena Security has been renamed Jena Permissions and the Maven artifact id has been changed to jena-permissions to reflect this change.\nShim code that was introduced to map Jena classes to security classes has been removed. This change requires changes to SecurityEvaluator implementations. 
More details are available at the Permissions migration documentation.\nOther GraphStore interface has been removed\nModelFactory.createFileModelMaker has been removed\nLateBindingIterator has been removed: use LazyIterator instead\nEarlyBindingIterator has been removed: no replacement\nUniqueExtendedIterator has been removed: use ExtendedIterator with unique filter ","permalink":"","tags":null,"title":"Migrating from Jena2 to Jena3"},{"categories":null,"contents":"Note: These notes are not kept up to date.\nThey may be of interest as background on the original design of the Enhanced Node mechanism.\nEnhanced Nodes This note is a development of the original note on the enhanced node and graph design of Jena 2.\nKey objectives for the enhanced node design One problem with the Jena 1 design was that both the DAML layer and the RDB layer independently extended Resource with domain-specific information. That made it impossible to have a DAML-over-RDB implementation. While this could have been fixed by using the \u0026ldquo;enhanced resource\u0026rdquo; mechanism of Jena 1, that would have left a second problem.\nIn Jena 1.0, once a resource has been determined to be a DAML Class (for instance), that remains true for the lifetime of the model. If a resource starts out not qualifying as a DAML Class (no rdf:type daml:Class) then adding the type assertion later doesn\u0026rsquo;t make it a Class. Similarly, if a resource is a DAML Class, but then the type assertion is retracted, the resource is still apparently a class.\nHence being a DAMLClass is a view of the resource that may change over time. Moreover, a given resource may validly have a number of different views simultaneously. Using the current DAMLClass implementation method means that a given resource is limited to a single such view.\nA key objective of the new design is to allow different views, or facets, to be used dynamically when accessing a node. 
The new design allows nodes to be polymorphic, in the sense that the same underlying node from the graph can present different encapsulations - thus different affordances to the programmer - on request. In summary, the enhanced node design in Jena 2.0 allows programmers to:\nprovide alternative perspectives onto a node from a graph, supporting additional functionality particular to that perspective; dynamically convert between perspectives on nodes; register implementations of implementation classes that present the node as an alternative perspective. Terminology To assist the following discussion, the key terms are introduced first.\nnode ~ A subject or object from a triple in the underlying graph\ngraph ~ The underlying container of RDF triples that simplifies the previous abstraction Model\nenhanced node ~ An encapsulation of a node that adds additional state or functionality to the interface defined for node. For example, a bag is a resource that contains a number of other resources; an enhanced node encapsulating a bag might provide simplified programmatic access to the members of the bag.\nenhanced graph ~ Just as an enhanced node encapsulates a node and adds extra functionality, an enhanced graph encapsulates an underlying graph and provides additional features. For example, both Model and DAMLModel can be thought of as enhancements to the (deliberately simple) interface to graphs.\npolymorphic ~ An abstract super-class of enhanced graph and enhanced node that exists purely to provide shared implementation.\npersonality ~ An abstraction that circumscribes the set of alternative views that are available in a given context. In particular, it defines a mapping from types (q.v.) to implementations (q.v.). This seems to be taken to be closed for graphs.\nimplementation ~ A factory object that is able to generate polymorphic objects that present a given enhanced node according to a given type. 
For example, an alt implementation can produce a sub-class of enhanced node that provides accessors for the members of the alt.\nKey points Some key features of the design are:\nevery enhanced graph has a single graph personality, which represents the types of all the enhanced nodes that can be created in this graph; every enhanced node refers to that personality\ndifferent kinds of enhanced graph can have different personalities; for example, they may implement interfaces in different ways, or not implement some at all\nenhanced nodes wrap information in the graph, but keep no independent state; they may be discarded and regenerated at whim\nHow an enhanced node is created Creation from another enhanced node If en is an enhanced node representing some resource we wish to be able to view as being of some (Java) class/interface T, the expression will either deliver an EnhNode of type C, if it is possible to do so, or throw an exception if not.\nTo check if the conversion is allowed, without having to catch exceptions, the expression en.canAs(T.class) delivers true iff the conversion is possible.\nCreation from a base node Somehow, some seed enhanced node must be created, otherwise as() would have nothing to work on. Subclasses of enhanced node provide constructors (perhaps hidden behind factories) which wrap plain nodes up in enhanced graphs. Eventually these invoke the constructor EnhNode(Node,EnhGraph).\nIt\u0026rsquo;s up to the constructors for the enhanced node subclasses to ensure that they are called with appropriate arguments.\ninternal operation of the conversion as(Class T) is defined on EnhNode to invoke asInternal(T) in Polymorphic. If the original enhanced node en is already a valid instance of T, it is returned as the result. Validity is checked by the method isValid().\nIf en is not already of type T, then a cache of alternative views of en is consulted to see if a suitable alternative exists. 
The cache is implemented as a sibling ring of enhanced nodes - each enhanced node has a link to its next sibling, and the \u0026ldquo;last\u0026rdquo; node links back to the \u0026ldquo;first\u0026rdquo;. This makes it cheap to find alternative views if there are not too many of them, and avoids caches filling up with dead nodes and having to be flushed.\nIf there is no existing suitable enhanced node, the node\u0026rsquo;s personality is consulted. The personality maps the desired class type to an Implementation object, which is a factory with a wrap method which takes a (plain) node and an enhanced graph and delivers the new enhanced node after checking that its conditions apply. The new enhanced node is then linked into the sibling ring.\nHow to build an enhanced node \u0026amp; graph What you have to do to define an enhanced node/graph implementation:\ndefine an interface I for the new enhanced node. (You could use just the implementation class, but we\u0026rsquo;ve stuck with the interface, because there might be different implementations)\ndefine the implementation class C. This is just a front for the enhanced node. All the state of C is reflected in the graph (except for caching; but beware that the graph can change without notice).\ndefine an Implementation class for the factory. This class defines methods canWrap and wrap, which test a node to see if it is allowed to represent I and construct an implementation of C respectively.\nArrange that the personality of the graph maps the class of I to the factory. At the moment we do this by using (a copy of) the built-in graph personality as the personality for the enhanced graph. For an example, see the code for ReifiedStatementImpl.\nReification API Introduction This document describes the reification API in Jena2, following discussions based on the 0.5a document. 
The essential decision made during that discussion is that reification triples are captured and dealt with by the Model transparently and appropriately.\nContext The first Jena implementation made some attempt to optimise the representation of reification. In particular it tried to avoid so-called \u0026rsquo;triple bloat\u0026rsquo;, ie requiring four triples to represent the reification of a statement. The approach taken was to make a Statement a subclass of Resource so that properties could be directly attached to statement objects.\nThere are a number of defects in the Jena 1 approach.\nNot everyone in the team was bought in to the approach\nThe .equals() method for Statements was arguably wrong and also violated the Java requirements on .equals()\nThe implied triples of a reification were not present so could not be searched for\nThere was confusion between the optimised representation and the explicit representation of reification using triples\nThe optimisation did not round trip through RDF/XML using the writers and ARP.\nHowever, there are some supporters of the approach. They liked:\nthe avoidance of triple bloat\nthat the extra reification statements are not there to be found on queries or ListStatements and do not affect the size() method.\nSince Jena was first written the RDFCore WG have clarified the meaning of a reified statement. Whilst Jena 1 took a reified statement to denote a statement, RDFCore have decided that a reified statement denotes an occurrence of a statement, otherwise called a stating. The Jena 1 .equals() method for Statements is thus inappropriate for comparing reified statements. 
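The distinction just drawn between a statement and a stating can be sketched in dependency-free Java. This is purely illustrative: Triple and Stating below are hypothetical classes modelling the semantics, not part of the Jena API.

```java
// Illustrative model of the "stating" semantics: a reified statement denotes
// an occurrence of a statement, so its identity includes the reifying resource.
// Triple and Stating are hypothetical, not Jena classes.
public class StatingDemo {

    record Triple(String subject, String predicate, String object) {}

    // Record equality compares BOTH the resource and the reified triple.
    record Stating(String resource, Triple triple) {}

    public static void main(String[] args) {
        Triple t = new Triple("ex:s", "ex:p", "ex:o");
        Stating a = new Stating("_:b1", t);
        Stating b = new Stating("_:b2", t);
        System.out.println(a.equals(b));                   // false: different statings
        System.out.println(a.triple().equals(b.triple())); // true: same statement
    }
}
```

Two statings of the same statement via different resources are distinct, which is exactly why the Jena 1 Statement-based .equals() cannot serve for reified statements.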
The goals of reification support in the Jena 2 implementation are:\nto conform to the revised RDF specifications\nto maintain the expectations of Jena 1; ie users should still be able to reify everything without worrying about triple bloat if they want to\nas far as is consistent with 2, to not break existing code, or at least make it easy to transition old code to Jena 2\nto enable round tripping through RDF/XML and other RDF representation languages\nto enable a complete standard-compliant implementation, but not necessarily as default\nPresentation API Statement will no longer be a subclass of Resource. Thus a statement may not be used where a resource is expected. Instead, a new interface ReifiedStatement will be defined:\npublic interface ReifiedStatement extends Resource { public Statement getStatement(); // could call it a day at that or could duplicate convenience // methods from Statement, eg getSubject(), getInt(). ... } The Statement interface will be extended with the following methods:\npublic interface Statement \u0026hellip; public ReifiedStatement createReifiedStatement(); public ReifiedStatement createReifiedStatement(String URI); public boolean isReified(); public ReifiedStatement getAnyReifiedStatement(); public RSIterator listReifiedStatements(); public void removeAllReifications(); \u0026hellip;\nRSIterator is a new iterator which returns ReifiedStatements. It is an extension of ResourceIterator. The Model interface will be extended with the following methods:\npublic interface Model ... 
public ReifiedStatement createReifiedStatement(Statement stmt); public ReifiedStatement createReifiedStatement(String URI, Statement stmt); public boolean isReified(Statement st); public ReifiedStatement getAnyReifiedStatement(Statement stmt); public RSIterator listReifiedStatements(); public RSIterator listReifiedStatements(Statement stmt); public void removeReifiedStatement(ReifiedStatement rs); public void removeAllReifications(Statement st); ... The methods in Statement are defined to be the obvious calls of methods in Model. The interaction of those models is expressed below. Reification operates over statements in the model which use the predicates rdf:subject, rdf:predicate, rdf:object, and rdf:type with object rdf:Statement. Statements with those predicates are, by default, invisible. They do not appear in calls of listStatements, contains, or uses of the Query mechanism. Adding them to the model will not affect size(). Models that do not hide reification quads will also be available.\nRetrieval The Model::as() mechanism will allow the retrieval of reified statements:\ ReifiedStatement.class ) If someResource has an associated reification quad, then this will deliver an instance rs of ReifiedStatement such that rs.getStatement() will be the statement rs reifies. Otherwise a DoesNotReifyException will be thrown. (Use the predicate canAs() to test if the conversion is possible.) It does not matter how the quad components have arrived in the model; explicitly asserted or by the create mechanisms described below. If quad components are removed from the model, existing ReifiedStatement objects will continue to function, but conversions using as() will fail.\nCreation createReifiedStatement(Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. 
The resulting resource is a blank node.\ncreateReifiedStatement(String URI, Statement stmt) creates a new ReifiedStatement object that reifies stmt; the appropriate quads are inserted into the model. The resulting resource is a Resource with the URI given.\nEquality Two reified statements are .equals() iff they reify the same statement and have .equals() resources. Thus it is possible for equal Statements to have unequal reifications.\nIsReified isReified(Statement st) is true iff in the Model of this Statement there is a reification quad for this Statement. It does not matter if the quad was inserted piece-by-piece or all at once using a create method.\nFetching getAnyReifiedStatement(Statement st) delivers an existing ReifiedStatement object that reifies st, if there is one; otherwise it creates a new one. If there are multiple reifications for st, it is not specified which one will be returned.\nListing listReifiedStatements() will return an RSIterator which will deliver all the reified statements in the model.\nlistReifiedStatements( Statement st ) will return an RSIterator which will deliver all the reified statements in the model that reify st.\nRemoval removeReifiedStatement(ReifiedStatement rs) will remove the reification rs from the model by removing the reification quad. Other reified statements with different resources will remain.\nremoveAllReifications(Statement st) will remove all the reifications in this model which reify st.\nInput and output The writers will have access to the complete set of Statements and will be able to write out the quad components.\nThe readers need have no special machinery, but it would be efficient for them to be able to call createReifiedStatement when detecting a reification.\nPerformance Jena1\u0026rsquo;s \u0026ldquo;statements as resources\u0026rdquo; approach avoided triple bloat by not storing the reification quads. 
How, then, do we avoid triple bloat in Jena2?\nThe underlying machinery is intended to capture the reification quad components and store them in a form optimised for reification. In particular, in the case where a statement is completely reified, it is expected to store only the implementation representation of the Statement.\ncreateReifiedStatement is expected to bypass the construction and detection of the quad components, so that in the \u0026ldquo;usual case\u0026rdquo; they will never come into existence.\nThe Reification SPI Introduction This document describes the reification SPI, the mechanisms by which the Graph family supports the Model API reification interface.\nGraphs handle reification at two levels. First, their reifier supports requests to reify triples and to search for reifications. The reifier is responsible for managing the reification information it adds and removes - the graph is not involved.\nSecond, a graph may optionally allow all triples added and removed through its normal operations (including the bulk update interfaces) to be monitored by its reifier. If so, all appropriate triples become the property of the reifier - they are no longer visible through the graph.\nA graph may also have a reifier that doesn\u0026rsquo;t do any reification. This is useful for internal graphs that are not exposed as models. So there are three kinds of Graph:\nGraphs that do no reification; Graphs that only do explicit reification; Graphs that do implicit reification.\nGraph operations for reification The primary reification operation on graphs is to extract their Reifier instance. Handing reification off to a different class allows reification to be handled independently of other Graph issues, eg query handling, bulk update.\nGraph.getReifier() -\u0026gt; Reifier Returns the Reifier for this Graph. Each graph has a single reifier during its lifetime. 
The reifier object need not be allocated until the first call of getReifier().\nadd(Triple), delete(Triple) These two operations may defer their triples to the graph\u0026rsquo;s reifier using handledAdd(Triple) and handledDelete(Triple); see below for details.\nInterface Reifier Instances of Reifier handle reification requests from their Graph and from the API level code (issued by the API class ModelReifier).\nreifier.getHiddenTriples() -\u0026gt; Graph The reifier may keep reification triples to itself, coded in some special way, rather than having them stored in the parent Graph. This method exposes those triples as another Graph. This is a dynamic graph - it changes as the underlying reifications change. However, it is read-only; triples cannot be added to or removed from it. The SimpleReifier implementation currently does not implement a dynamic graph. This is a bug that will need fixing.\nreifier.getParentGraph() -\u0026gt; Graph Get the Graph that this reifier serves; the result is never null. (Thus the observable relationship between graphs and reifiers is 1-1.)\nclass AlreadyReifiedException This class extends RDFException; it is the exception that may be thrown by reifyAs.\nreifier.reifyAs( Triple t, Node n ) -\u0026gt; Node Record t as reified in the parent Graph by the given n, and return n. If n already reifies a different Triple, throw an AlreadyReifiedException. Calling reifyAs(t,n) is like adding the triples:\nn rdf:type rdf:Statement n rdf:subject t.getSubject() n rdf:predicate t.getPredicate() n rdf:object t.getObject() to the associated Graph; however, it is intended that it is efficient in both time and space.\nreifier.hasTriple( Triple t ) -\u0026gt; boolean Returns true iff some Node n reifies t in this Reifier, typically by an unretracted call of reifyAs(t,n). 
The intended (and actual) use for hasTriple(Triple) is in the implementation of isReified(Statement) in Model.\nreifier.getTriple( Node n ) -\u0026gt; Triple Get the single Triple associated with n, if there is one. If there isn\u0026rsquo;t, return null. A node reifies at most one triple. If reifyAs, with its explicit check, is bypassed, and extra reification triples are asserted into the parent graph, then getTriple() will simply return null.\nreifier.allNodes() -\u0026gt; ExtendedIterator Returns an (extended) iterator over all the nodes that (still) reify something in this reifier. This is intended for the implementation of listReifiedStatements in Model.\nreifier.allNodes( Triple t ) -\u0026gt; ClosableIterator Returns an iterator over all the nodes that (still) reify the triple t.\nreifier.remove( Node n, Triple t ) Remove the association between n and the triple t. Subsequently, hasNode(n) will return false and getTriple(n) will return null. This method is used to implement removeReification(Statement) in Model.\nreifier.remove( Triple t ) Remove all the associations between any node n and t; ie, for all n do remove(n,t). This method is used to implement removeAllReifications in Model.\nhandledAdd( Triple t ) -\u0026gt; boolean A graph doing reification may choose to monitor the triples being added to it and have the reifier handle reification triples. In this case, the graph\u0026rsquo;s add(t) should call handledAdd(t) and only proceed with its add if the result is false. A graph that does not use handledAdd() [and handledDelete()] can only use the explicit reification supplied by its reifier.\nhandledRemove( Triple t ) As for handledAdd(t), but applied to delete.\nSimpleReifier SimpleReifier is an implementation of Reifier suitable for in-memory Graphs built over GraphBase. It operates in either of two modes: with and without triple interception. 
With interception enabled, reification triples fed to (or removed from) its parent graph are captured using handledAdd() and handledRemove(); otherwise they are ignored and the graph must store them itself. SimpleReifier keeps a map from nodes to the reification information about that node. Nodes which have no reification information (most of them, in the usual case) do not appear in the map at all.\nNodes with partial or excessive reification information are associated with Fragments. A Fragments for a node n records separately:\nthe Ss of all n rdf:subject S triples\nthe Ps of all n rdf:predicate P triples\nthe Os of all n rdf:object O triples\nthe Ts of all n rdf:type T[Statement] triples\nIf the Fragments becomes singular, ie each of these sets contains exactly one element, then n represents a reification of the triple (S, P, O), and the Fragments object is replaced by that triple. (If another reification triple for n arrives, then the triple is re-exploded into Fragments.)\n","permalink":"","tags":null,"title":"Notes on Jena internals"},{"categories":null,"contents":"A Parameterized SPARQL String is a SPARQL query/update into which values may be injected.\nThe intended usage of this is where using a QuerySolutionMap as initial bindings is either inappropriate or not possible e.g.\nGenerating query/update strings in code without lots of error-prone and messy string concatenation\nPreparing a query/update for remote execution\nWhere you do not want to simply say some variable should have a certain value but rather wish to insert constants into the query/update in place of variables\nDefending against SPARQL injection when creating a query/update using some external input, see SPARQL Injection notes for limitations. 
Provide a more convenient way to prepend common prefixes to your query This class is useful for preparing both queries and updates, hence the generic name, as it provides programmatic ways to replace variables in the query with constants and to add prefix and base declarations. A Query or UpdateRequest can be created using the asQuery() and asUpdate() methods, assuming the command an instance represents is actually valid as a query/update.\nBuilding parameterised commands A ParameterizedSparqlString is created as follows:\nParameterizedSparqlString pss = new ParameterizedSparqlString(); There are also constructor overloads that take in an initial command text, parameter values, namespace prefixes, etc., which may allow you to simplify some code.\nOnce you have an instance, you first set your template command with the setCommandText() method like so:\npss.setCommandText(\u0026quot;SELECT * WHERE {\\n\u0026quot; + \u0026quot; ?s a ?type .\\n\u0026quot; + \u0026quot; OPTIONAL { ?s rdfs:label ?label . }\\n\u0026quot; + \u0026quot;}\u0026quot;); Note that in the above example we did not define the rdfs: prefix, so as it stands the query is invalid. However, you can automatically populate BASE and PREFIX declarations for your command without having to explicitly declare them in your command text by using the setBaseUri() and setNsPrefix() methods, e.g.\n// Add a Base URI and define the rdfs prefix pss.setBaseUri(\u0026quot;\u0026quot;); pss.setNsPrefix(\u0026quot;rdfs\u0026quot;, \u0026quot;\u0026quot;); You can always call toString() to see the current state of your instance e.g.\n// Print current state to stdout System.out.println(pss.toString()); Which, based on the calls so far, would print the following:\nBASE \u0026lt;\u0026gt; PREFIX rdfs: \u0026lt;\u0026gt; SELECT * WHERE { ?s a ?type . OPTIONAL { ?s rdfs:label ?label . } } Note that the state of the instance returned by toString() will include any injected values.
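The prefix-prepending behaviour described above can be sketched in plain Java. This is a hypothetical illustration only, not Jena's ParameterizedSparqlString implementation; the class name PrefixedCommand is invented for this sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: prepend BASE and PREFIX declarations to a command
// string. Not Jena's implementation; names are illustrative.
class PrefixedCommand {
    private final Map<String, String> prefixes = new LinkedHashMap<>();
    private String baseUri;
    private String commandText = "";

    void setCommandText(String text) { this.commandText = text; }
    void setBaseUri(String uri) { this.baseUri = uri; }
    void setNsPrefix(String prefix, String uri) { prefixes.put(prefix, uri); }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        // BASE first, then each PREFIX in insertion order, then the command
        if (baseUri != null)
            sb.append("BASE <").append(baseUri).append(">\n");
        prefixes.forEach((p, u) ->
            sb.append("PREFIX ").append(p).append(": <").append(u).append(">\n"));
        return sb.append(commandText).toString();
    }
}
```

Calling setNsPrefix("rdfs", ...) and then toString() on such an object yields a command whose header matches the BASE/PREFIX output shown above.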
Part of what the toString() method does is check that your command is not subject to SPARQL injection attacks, so in some cases where a possible injection is detected an ARQException will be thrown.\nInjecting Values Once you have a command text prepared, you will want to actually inject values into it; values may be injected in several ways:\nBy treating a variable in the SPARQL string as a parameter Using JDBC style positional parameters Appending values directly to the command text being built See the ParameterizedSparqlString javadocs for a comprehensive reference of available methods for setting values; the following sections show some basic examples of this.\nVariable Parameters Any SPARQL variable in the command text may have a value injected into it; injecting a value replaces all usages of that variable in the command, i.e. substitutes the variable for a constant. Importantly, injection is done by textual substitution, so in some cases it may cause unexpected side effects.\nVariable parameters are set via the various setX() methods which take a String as their first argument e.g.\n// Set an IRI pss.setIri(\u0026quot;x\u0026quot;, \u0026quot;\u0026quot;); // Set a Literal pss.setLiteral(\u0026quot;x\u0026quot;, 1234); pss.setLiteral(\u0026quot;x\u0026quot;, true); pss.setLiteral(\u0026quot;x\u0026quot;, \u0026quot;value\u0026quot;); Where you set a value for a variable you have already set, the existing value is overwritten. Setting any value to null has the same effect as calling the clearParam(\u0026quot;x\u0026quot;) method.\nIf you have the value already as an RDFNode or Node instance you can call the setParam() method instead e.g.\n// Set a Node Node n = NodeFactory.createIRI(\u0026quot;\u0026quot;); pss.setParam(\u0026quot;x\u0026quot;, n); Positional Parameters You can use JDBC style positional parameters if you prefer; a JDBC style parameter is a single ? followed by whitespace or certain punctuation characters (currently ; , .).
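The rule just stated, a lone ? followed by whitespace or one of ; , . , can be sketched as a plain-Java scan; the class name PositionalParams and the regex are assumptions for illustration, not Jena's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: locate JDBC-style positional parameters, i.e. a
// lone '?' followed by whitespace, ';', ',', '.', or end of string.
// The position in the returned list is the (zero-based) parameter index.
class PositionalParams {
    static List<Integer> locate(String command) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile("\\?(?=[\\s;,.]|$)").matcher(command);
        while (m.find())
            offsets.add(m.start());   // character offset of each parameter
        return offsets;
    }
}
```

Note that the lookahead deliberately rejects named variables such as ?s or ?p, since a parameter must be a bare ? with nothing but whitespace or punctuation after it.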
Positional parameters have a unique index which reflects the order in which they appear in the string. Note that positional parameters use a zero-based index.\nPositional parameters are set via the various setX() methods which take an int as their first argument e.g.\n// Set an IRI pss.setIri(0, \u0026quot;\u0026quot;); // Set a Literal pss.setLiteral(0, 1234); pss.setLiteral(0, true); pss.setLiteral(0, \u0026quot;value\u0026quot;); Where you set a value for a parameter you have already set, the existing value is overwritten. Setting any value to null has the same effect as calling the clearParam(0) method.\nIf you have the value already as an RDFNode or Node instance you can call the setParam() method instead e.g.\n// Set a Node Node n = NodeFactory.createIRI(\u0026quot;\u0026quot;); pss.setParam(0, n); Non-existent parameters Where you try to set a variable/positional parameter that does not exist, there will be no feedback that the parameter does not exist; however, the value set will not be included in the string produced when calling the toString() method.\nBuffer Usage Additionally, you may use this purely as a StringBuffer replacement for creating commands since it provides a large variety of convenience methods for appending things either as-is or as nodes (which causes appropriate formatting to be applied).\nFor example we could add an ORDER BY clause to our earlier example like so:\n// Add ORDER BY clause pss.append(\u0026quot;ORDER BY ?s\u0026quot;); Be aware that the basic append() methods append the given value as-is without any special formatting applied. If you want to use the value being appended as a constant in the SPARQL query then you should use the appropriate appendLiteral(), appendIri() or appendNode() method e.g.\n// Add a LIMIT clause pss.append(\u0026quot;LIMIT \u0026quot;); pss.appendLiteral(50); Getting a Query/Update Once you\u0026rsquo;ve prepared your command you should then call the asQuery() or asUpdate() method to get it as a Query or
UpdateRequest object as appropriate. Doing this calls toString() to produce the final version of your command with all values injected and runs it through the appropriate parser (either QueryFactory or UpdateFactory).\nYou can then use the returned Query or UpdateRequest object as you would normally to make a query/update.\nSPARQL Injection Notes First a couple of warnings:\nThis class does not in any way check that your command is syntactically correct until such time as you try and parse it as a Query or UpdateRequest. Injection is done purely based on textual replacement; it does not understand or respect variable scope in any way. For example, if your command text contains sub-queries, you should ensure that variables within the sub-query which you don\u0026rsquo;t want replaced have distinct names from those in the outer query you do want replaced (or vice versa). While this class was in part designed to prevent SPARQL injection, it is by no means foolproof because it works purely at the textual level. The current version of the code addresses some possible attack vectors that the developers have identified, but we do not claim to be sufficiently devious to have thought of and prevented every possible attack vector.\nTherefore we strongly recommend that users concerned about SPARQL Injection attacks perform their own validation on provided parameters and test their use of this class themselves prior to its use in any security-conscious deployment. We also recommend that users do not use easily guessable variable names for their parameters, as these can allow a chained injection attack, though generally speaking the code should prevent these.\n","permalink":"","tags":null,"title":"Parameterized SPARQL String"},{"categories":null,"contents":"The origins of RDF as a representation language include frame languages, in which an object, or frame, was the main unit of structuring data. Frames have slots; for example, a Person frame might have an age slot, a height slot, etc.
RDF, however, has taken a step beyond frame languages by making rdf:Property a first-class value, not an element of a frame or resource per se. In RDF, for example, an age property can be defined: \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot;\u0026gt;, and then applied to any resource, including, but not limited to, a Person resource.\nWhile this introduces an extra element of modelling flexibility in RDF, it is often the case that users want to treat some components in their models in a more structured way, similar to the original idea of frames. It is often assumed that rdfs:domain restricts a property to be used only on resources that are in the domain class. For example, a frequently asked question on the Jena support list is why the following is not an error:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Truck\u0026quot; /\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;Truck rdf:ID=\u0026quot;truck1\u0026quot;\u0026gt; \u0026lt;age\u0026gt;2\u0026lt;/age\u0026gt; \u0026lt;/Truck\u0026gt; Whereas many object-oriented or frame-oriented representations would regard it as an error that the age property was not being applied to a Person, RDF-based applications are simply entitled to infer that truck1 is a (that is, has rdf:type) Truck as well as a Person. This is unlikely to be the case in any real-world domain, but it is a valid RDF inference.\nA consequence of RDF\u0026rsquo;s design is that it is not really possible to answer the commonly asked question \u0026ldquo;Which properties can be applied to resources of class C?\u0026rdquo;. Strictly speaking, the RDF answer is \u0026ldquo;Any property\u0026rdquo;.
However, many developers have a legitimate requirement to present a composite view of classes and their associated properties, forming a more succinct structuring of an ontology or schema. The purpose of this note is to explain the mechanisms built into Jena to support a frame-like view of resources, while remaining correct with respect to RDF (and OWL) semantics.\nBasic principles: the properties of a class Since any RDF property can be applied to any RDF resource, we require a definition of the properties of a given class that respects RDF semantics. Consider the following RDF fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;age\u0026quot; /\u0026gt; \u0026lt;Person rdf:ID=\u0026quot;jane_doe\u0026quot;\u0026gt; \u0026lt;age\u0026gt;23\u0026lt;/age\u0026gt; \u0026lt;/Person\u0026gt; Now consider that we add to this fragment that:\n\u0026lt;rdf:Property rdf:about=\u0026quot;age\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Person\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; This additional information about the domain of the age property does not add any new entailments to the model. Why? Because we already know that jane_doe is a Person. So we can consider age to be one of the properties of Person type resources, because if we use the property as a predicate of that resource, it doesn\u0026rsquo;t add any new rdf:type information about the resource. Conversely, if we know that some resource has an age, we don\u0026rsquo;t learn any new information by declaring that it has rdf:type Person.
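The rule just described, that a property belongs to a class when asserting it adds no new type information, reduces to a subclass check against the property's declared domain. The following plain-Java sketch illustrates that check; FrameView, the Student class, and the string-valued class names are hypothetical, and this is not Jena's ontology API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: p is a property of class C when C is (reflexively,
// transitively) a subclass of p's declared domain, so "x p y" on a known
// instance of C entails no new rdf:type statement.
class FrameView {
    final Map<String, String> superOf = new HashMap<>();   // class -> direct superclass
    final Map<String, String> domainOf = new HashMap<>();  // property -> domain class

    boolean subClassOf(String c, String d) {
        for (String k = c; k != null; k = superOf.get(k))
            if (k.equals(d)) return true;
        return false;
    }

    Set<String> propertiesOf(String c) {
        Set<String> result = new HashSet<>();
        domainOf.forEach((p, dom) -> { if (subClassOf(c, dom)) result.add(p); });
        return result;
    }
}
```

With age declared to have domain Person, propertiesOf("Person") includes age, and so does any subclass of Person, while an unrelated class such as Truck does not pick it up.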
In summary, for the purposes of this HOWTO we define the properties of a class as just those properties that don\u0026rsquo;t entail any new type information when applied to resources that are already known to be of that class.\nSub-classes, and more complex class expressions Given these basic principles, now consider the following RDF fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;LivingThing\u0026quot; /\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Animal\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf rdf:resource=\u0026quot;#LivingThing\u0026quot;\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdfs:Class rdf:ID=\u0026quot;Mammal\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf rdf:resource=\u0026quot;#Animal\u0026quot;\u0026gt; \u0026lt;/rdfs:Class\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;hasSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Animal\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; Is hasSkeleton one of the properties of Animal? Yes, because any resource of rdf:type Animal can have a hasSkeleton property (with value either true or false) without adding type information. Similarly, any resource that is a Mammal also has rdf:type Animal (by the sub-class relation), so hasSkeleton is a property of Mammal. However, hasSkeleton is not a property of LivingThing, since we don\u0026rsquo;t automatically know that a living thing is an animal - it may be a plant. Stating that a given LivingThing has a hasSkeleton property, even if the value is false, would entail the additional rdf:type statement that the LivingThing is also an Animal.\nFor more complex class expressions in the domain, we look to see what simple domain constraints are entailed. For example, a domain constraint A ∩ B (i.e. \u0026ldquo;A intersection B\u0026rdquo;) for property p entails that both p rdfs:domain A and p rdfs:domain B are true. However, the properties of neither A nor B will include p. 
To see this, suppose we have a resource x that we already know is of type A, and a statement x p y. This entails x rdf:type A, which we already know, but also x rdf:type B. So information is added, even if we know that x is an instance of A, so p is not a property of A. The symmetrical argument holds for p not being a property of B.\nHowever, if the domain of p is A ∪ B (i.e. \u0026ldquo;A union B\u0026rdquo;), then both A and B will have p as a property, since an occurrence of, say, x p y does not allow us to conclude that either x rdf:type A or x rdf:type B.\nProperty hierarchies Since sub-properties inherit the domain constraints of their parent property, the properties of a class will include the closure over the sub-property hierarchy. Extending the previous example, the properties of Animal and Mammal include both hasSkeleton and hasEndoSkeleton:\n\u0026lt;rdf:Property rdf:ID=\u0026quot;hasSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;Animal\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;hasEndoSkeleton\u0026quot;\u0026gt; \u0026lt;rdfs:subPropertyOf rdf:resource=\u0026quot;#hasSkeleton\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; In general, there may be many different ways of deducing simple domain constraints from the axioms asserted in the ontology. Whether or not all of these possible deductions are present in any given RDF model depends on the power and completeness of the reasoner bound to that model.\nGlobal properties Under the principled definition that we propose here, properties which do not express a domain value are global, in the sense that they can apply to any resource. They do not, by definition, entail any new type information about the individuals they are applied to. Put another way, the domain of a property, if unspecified, is either rdfs:Resource or owl:Thing, depending on the ontology language. These are simply the types that all resources have by default.
Therefore, every class has all of the global properties as one of the properties of the class.\nA commonly used idiom in some OWL ontologies is to use Restrictions to create an association between a class and the properties of instances of that class. For example, the following fragment shows that all instances of Person should have a familyName property:\n\u0026lt;owl:Class rdf:ID=\u0026quot;Person\u0026quot;\u0026gt; \u0026lt;rdfs:subClassOf\u0026gt; \u0026lt;owl:Restriction\u0026gt; \u0026lt;owl:onProperty rdf:resource=\u0026quot;#familyName\u0026quot; /\u0026gt; \u0026lt;owl:minCardinality rdf:datatype=\u0026quot;\u0026amp;xsd;int\u0026quot;\u0026gt;1\u0026lt;/owl:minCardinality\u0026gt; \u0026lt;/owl:Restriction\u0026gt; \u0026lt;/rdfs:subClassOf\u0026gt; \u0026lt;/owl:Class\u0026gt; This approach shows the intent of the ontology designer that Person instances have familyName properties. We do regard familyName as one of the properties of Person, but only because of the global properties principle. Unless a domain constraint is also specified for familyName, it will appear as one of the properties of classes other than Person. Note that this is a behaviour change from versions of Jena prior to release 2.2. Prior to this release, Jena used a heuristic method to attempt to associate restriction properties with the classes sub-classing that restriction. Since there were problems with precisely defining the heuristic, and ensuring correct behaviour (especially with inference models), we have dropped the use of this heuristic from Jena 2.2 onwards.\nThe Java API Support for frame-like views of classes and properties is provided through the ontology API. 
The following methods are used to access the properties of a class, and the converse for properties:\nOntClass.listDeclaredProperties(); OntClass.listDeclaredProperties( boolean direct ); OntClass.hasDeclaredProperty( Property prop, boolean direct ); OntProperty.listDeclaringClasses(); OntProperty.listDeclaringClasses( boolean direct ); All of the above API methods return a Jena ExtendedIterator.\nNote a change from the Jena 2.1 interface: the optional Boolean parameter on listDeclaredProperties has changed name from all (Jena 2.1 and earlier) to direct (Jena 2.2 and later). The meaning of the parameter has also changed: all was intended to simulate some reasoning steps in the absence of a reasoner, whereas direct is used to restrict the associations to only the local associations. See more on direct associations.\nA further difference from Jena 2.1 is that the models that are constructed without reasoners perform only very limited simulation of the inference closure of the model. Users who wish the declared properties to include entailments will need to construct their models with one of the built-in or external reasoners. 
The difference is illustrated by the following code fragment:\n\u0026lt;rdfs:Class rdf:ID=\u0026quot;A\u0026quot; /\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;p\u0026quot;\u0026gt; \u0026lt;rdfs:domain rdf:resource=\u0026quot;#A\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; \u0026lt;rdf:Property rdf:ID=\u0026quot;q\u0026quot;\u0026gt; \u0026lt;rdfs:subPropertyOf rdf:resource=\u0026quot;#p\u0026quot; /\u0026gt; \u0026lt;/rdf:Property\u0026gt; OntModel mNoInf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM ); OntClass a0 = mNoInf.getOntClass( NS + \u0026quot;A\u0026quot; ); Iterator i0 = a0.listDeclaredProperties(); OntModel mInf = ModelFactory.createOntologyModel( OntModelSpec.OWL_MEM_RULE_INF ); OntClass a1 = mInf.getOntClass( NS + \u0026quot;A\u0026quot; ); Iterator i1 = a1.listDeclaredProperties(); Iterator i1 will return p and q, while i0 will return only p.\nSummary of changes from Jena 2.2-beta-2 and older For users updating code that uses listDeclaredProperties from versions of Jena prior to 2.2-final, the following changes should be noted:\nGlobal properties listDeclaredProperties will treat properties with no specified domain as global, and regard them as properties of all classes. The use of the direct flag can hide global properties from non-root classes. Restriction properties listDeclaredProperties no longer heuristically returns properties associated with a class via the owl:onProperty predicate of a restriction. Limited simulated inference The old version of listDeclaredProperties attempted to simulate the entailed associations between classes and properties. Users are now advised to attach a reasoner to their models to do this. Change in parameter semantics The old version of listDeclaredProperties(boolean all) took one parameter, a Boolean flag to indicate whether additional declared (implied) properties should be listed.
Since this is now covered by the use, or otherwise, of a reasoner attached to the model, the new method signature is listDeclaredProperties(boolean direct), where calling the method with direct = true will compress the returned results to use only the direct associations. ","permalink":"","tags":null,"title":"Presenting RDF as frames"},{"categories":null,"contents":"SPARQL has four result forms:\nSELECT – Return a table of results. CONSTRUCT – Return an RDF graph, based on a template in the query. DESCRIBE – Return an RDF graph, based on what the query processor is configured to return. ASK – Ask a boolean query. The SELECT form directly returns a table of solutions as a result set, while DESCRIBE and CONSTRUCT use the outcome of matching to build RDF graphs.\nSolution Modifiers Pattern matching produces a set of solutions. This set can be modified in various ways:\nProjection - keep only selected variables OFFSET/LIMIT - chop the number of solutions (best used with ORDER BY) ORDER BY - sorted results DISTINCT - yield only one row for one combination of variables and values. The solution modifiers OFFSET/LIMIT and ORDER BY always apply to all result forms. OFFSET and LIMIT A set of solutions can be abbreviated by specifying the offset (the start index) and the limit (the number of solutions) to be returned. Using LIMIT alone can be useful to ensure not too many solutions are returned, to restrict the effect of some unexpected situation.
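The modifier pipeline described above, sorting, de-duplicating, then slicing with OFFSET and LIMIT, can be sketched over an in-memory list of solutions. The class name Modifiers and the string-valued solutions are illustrative assumptions; this is not ARQ's implementation:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: apply solution modifiers to a list of solutions.
class Modifiers {
    static List<String> apply(List<String> solutions, int offset, int limit) {
        return solutions.stream()
                .sorted()      // ORDER BY
                .distinct()    // DISTINCT
                .skip(offset)  // OFFSET
                .limit(limit)  // LIMIT
                .collect(Collectors.toList());
    }
}
```

Applying OFFSET/LIMIT after sorting is what makes the slice deterministic, which is why the text recommends combining them with ORDER BY.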
LIMIT and OFFSET can be used in conjunction with sorting to take a defined slice through the solutions found.\nORDER BY SPARQL solutions are sorted by expression, including custom functions.\nORDER BY ?x ?y ORDER BY DESC(?x) ORDER BY x:func(?x) # Custom sorting condition DISTINCT The SELECT result form can take the DISTINCT modifier which ensures that no two solutions returned are the same - this takes place after projection to the requested variables.\nSELECT The SELECT result form is a projection, with DISTINCT applied, of the solution set. SELECT identifies which named variables are in the result set. This may be \u0026ldquo;*\u0026rdquo; meaning \u0026ldquo;all named variables\u0026rdquo; (blank nodes in the query act like variables for matching but are never returned).\nCONSTRUCT CONSTRUCT builds an RDF graph based on a graph template. The graph template can have variables which are bound by a WHERE clause. The effect is to calculate the graph fragment, given the template, for each solution from the WHERE clause, after taking into account any solution modifiers. The graph fragments, one per solution, are merged into a single RDF graph which is the result.\nAny blank nodes explicitly mentioned in the graph template are created afresh each time the template is used for a solution.\nDESCRIBE The CONSTRUCT form takes an application template for the graph results. The DESCRIBE form also creates a graph, but the form of that graph is provided by the query processor, not the application. For each URI found, or explicitly mentioned in the DESCRIBE clause, the query processor should provide a useful fragment of RDF, such as all the known details of a book.
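The CONSTRUCT evaluation described earlier, instantiating the template once per solution and merging the fragments into a single graph, can be sketched as follows. ConstructSketch, the string-encoded triples, and the binding maps are hypothetical simplifications, not ARQ code:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of CONSTRUCT: for each solution (a variable binding),
// substitute into every template triple, merging results into one graph.
class ConstructSketch {
    static Set<String> build(List<String[]> template,
                             List<Map<String, String>> solutions) {
        Set<String> graph = new LinkedHashSet<>();  // a graph is a set: duplicates merge
        for (Map<String, String> sol : solutions) {
            for (String[] triple : template) {
                String s = sol.getOrDefault(triple[0], triple[0]);
                String p = sol.getOrDefault(triple[1], triple[1]);
                String o = sol.getOrDefault(triple[2], triple[2]);
                graph.add(s + " " + p + " " + o);
            }
        }
        return graph;
    }
}
```

Because the result is a set, two solutions that instantiate the template identically contribute only one triple, matching the merge behaviour described above.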
ARQ allows domain-specific description handlers to be written.\nASK The ASK result form returns a boolean: true if the pattern matched, otherwise false.\nReturn to index\n","permalink":"","tags":null,"title":"Producing Result Sets"},{"categories":null,"contents":"SPARQL has four result forms:\nSELECT – Returns a table of results. CONSTRUCT – Returns an RDF graph, based on a template in the query. DESCRIBE – Returns an RDF graph, based on what the query processor is configured to return. ASK – Asks a boolean query. The SELECT form directly returns a table of solutions as a result set, while DESCRIBE and CONSTRUCT use the outcome of the query to build an RDF graph.\nSolution Modifiers Pattern matching produces a set of solutions. This set can be modified in several ways:\nProjection - keeps only the selected variables OFFSET/LIMIT - cuts the number of solutions (best used with ORDER BY) ORDER BY - sorted results DISTINCT - returns only one row for a given combination of variables and values. The solution modifiers OFFSET/LIMIT and ORDER BY always apply to all results.\nOFFSET and LIMIT A set of solutions can be abbreviated by specifying the offset (the start index) and the limit (the number of solutions) to be returned. Using LIMIT alone is useful to guarantee that not too many solutions are returned, to restrict the effect of an unexpected situation. LIMIT and OFFSET can be used in conjunction with ordering to take a defined slice through the solutions found.\nORDER BY SPARQL solutions are ordered by expressions, including custom functions.\nORDER BY ?x ?y ORDER BY DESC(?x) ORDER BY x:func(?x) # Custom sorting condition DISTINCT The SELECT form can use the DISTINCT modifier to guarantee that no two solutions returned are the same.\nSELECT The SELECT form is a projection, with DISTINCT applied, of the solution set. SELECT identifies which named variables are in the result set. This may be \u0026ldquo;*\u0026rdquo;, meaning \u0026ldquo;all named variables\u0026rdquo; (blank nodes in the query act as variables for matching, but are never returned).\nCONSTRUCT CONSTRUCT builds an RDF graph based on a graph template. The graph template can have variables, which are bound by the WHERE clause. The effect is to calculate the graph fragment, given the template, for each solution of the WHERE clause, after taking into account any solution modifiers. The graph fragments, one per solution, are merged into a single RDF graph, which is the result.\nAny blank node explicitly mentioned in the graph template is created afresh each time the template is used for a solution.\nDESCRIBE CONSTRUCT takes a template for the result graph. DESCRIBE also creates a graph, but the form of that graph is provided by the query processor, not the application. For each URI found, or explicitly mentioned in the DESCRIBE clause, the query processor should provide a useful fragment of RDF, such as all the known details of a book. ARQ allows domain-specific description handlers to be written.\nASK ASK returns a boolean: true if the pattern matched, false otherwise.\nReturn to index\n","permalink":"","tags":null,"title":"Produzindo resultados"},{"categories":null,"contents":"SPARQL allows custom property functions to add functionality to the triple matching process. Property functions can be registered or dynamically loaded.\nSee also the free text search page.\nSee also the FILTER functions library.\nApplications can also provide their own property functions.\nProperty Function Library The prefix apf is \u0026lt;\u0026gt;. (The old prefix of \u0026lt;\u0026gt; continues to work.
Applications are encouraged to switch.)\nDirect loading using a URI prefix of \u0026lt;java:org.apache.jena.sparql.pfunction.library.\u0026gt; (note the final dot) also works.\nThe prefix list: is\nProperty name Description list list:member member Membership of an RDF List (RDF Collection). If list is not bound or a constant, find and iterate all lists in the graph (can be slow), else evaluate for one particular list. If member is a variable, generate solutions with member bound to each element in the list. If member is bound or a constant expression, test to see if it is a member of the list. list list:index (index member) Index of an RDF List (RDF Collection). If list is not bound or a constant, find and iterate all lists in the graph (can be slow), else evaluate for one particular list. The object is a list pair; either element can be bound, unbound or a fixed node. Unbound variables in the object list are bound by the property function. list list:length length Length of an RDF List (RDF Collection). If list is not bound or a constant, find and iterate all lists in the graph (can be slow), else evaluate for one particular list. The object is tested against or bound to the length of the list. container rdfs:member member Membership of an RDF Container (rdf:Bag, rdf:Seq, rdf:Alt). Pre-registered URI. If this interferes with queries running over a Jena inference model which also provides rdfs:member, then remove this from the global registry. PropertyFunctionRegistry.get().remove(RDFS.member.getURI()) ; apf:textMatch Free text match. bag apf:bag member The argument bag must be bound by this point in the query or a constant expression. If bag is bound or a URI, and member is a variable, generate solutions with member bound to each element in the bag. If member is bound or a constant expression, test to see if it is a member of the bag. seq apf:seq member The argument seq must be bound by this point in the query or a constant expression.
If seq is bound or a URI, and member is a variable, generate solutions with member bound to each element in the sequence. If member is bound or a constant expression, test to see if it is a member of the sequence. alt apf:alt member The argument alt must be bound by this point in the query or a constant expression. If alt is bound or a URI, and member is a variable, generate solutions with member bound to each element in the alt. If member is bound or a constant expression, test to see if it is a member of the alt. varOrTerm apf:assign varOrTerm Assign an RDF term from one side to the other. If both are fixed RDF terms or bound variables, it becomes a boolean test that the subject is the same RDF term as the object. iri apf:splitIRI (namespace localname)\niri apf:splitURI (namespace localname) Split the IRI or URI into namespace (an IRI) and local name (a string). Compare if given values or bound variables, otherwise set the variable. The object is a list with 2 elements. splitURI is a synonym. subject apf:str object The subject is the string form of the object, like the function str(). Object must be bound or a constant. Object cannot be a blank node (see apf:blankNode). subject apf:blankNode label subject apf:bnode label\nSubject must be bound to a blank node or a constant. Label is either a string, in which case test for whether this is the blank node label of subject, or it is a variable, which is assigned the blank node label as a plain string. Argument mismatch causes no match. Use with care. subject apf:version ARQ version\nSet the subject to the IRI for ARQ and set the object to the version string (format \"N.N.N\" where N is a number). If any of the variables are already set, test for the correct value. var apf:concat (arg arg ...) Concatenate the arguments in the object list as strings, and assign to var. var apf:strSplit (arg arg) Split a string and return a binding for each result. The subject variable should be unbound.
The first argument to the object list is the string to be split. The second argument to the object list is a regular expression by which to split the string. The subject var is bound for each result of the split, and each result has the whitespace trimmed from it. ARQ documentation index\n","permalink":"","tags":null,"title":"Property Functions in ARQ"},{"categories":null,"contents":"\u0026ldquo;RDF Binary\u0026rdquo; is an efficient format for RDF and RDF-related data using Apache Thrift or Google Protocol Buffers as the binary data encoding.\nThe W3C standard RDF syntaxes are text or XML based. These incur costs in parsing; the most human-readable formats also incur high costs to write, and have limited scalability due to the need to analyse the data for pretty printing rather than simply stream to output.\nBinary formats are faster to process - they do not incur the parsing costs of text-based formats. \u0026ldquo;RDF Binary\u0026rdquo; defines a basic encoding for RDF terms, then builds data formats for RDF graphs, RDF datasets, and for SPARQL result sets.
This gives a basis for high-performance linked data systems.\nThrift and Protobuf provide efficient, widely used binary encoding layers, each with a large number of language bindings.\nFor more details, see RDF Thrift.\nThrift encoding of RDF Terms RDF Thrift uses the Thrift compact protocol.\nSource: BinaryRDF.thrift\nRDF terms struct RDF_IRI { 1: required string iri } # A prefix name (abbrev for an IRI) struct RDF_PrefixName { 1: required string prefix ; 2: required string localName ; } struct RDF_BNode { 1: required string label } struct RDF_Literal { 1: required string lex ; 2: optional string langtag ; 3: optional string datatype ; 4: optional RDF_PrefixName dtPrefix ; } struct RDF_Decimal { 1: required i64 value ; 2: required i32 scale ; } struct RDF_VAR { 1: required string name ; } struct RDF_ANY { } struct RDF_UNDEF { } struct RDF_REPEAT { } union RDF_Term { 1: RDF_IRI iri 2: RDF_BNode bnode 3: RDF_Literal literal 4: RDF_PrefixName prefixName 5: RDF_VAR variable 6: RDF_ANY any 7: RDF_UNDEF undefined 8: RDF_REPEAT repeat 9: RDF_Triple tripleTerm # RDF-star # Value forms of literals. 10: i64 valInteger 11: double valDouble 12: RDF_Decimal valDecimal } Thrift encoding of Triples, Quads and rows.
struct RDF_Triple { 1: required RDF_Term S 2: required RDF_Term P 3: required RDF_Term O } struct RDF_Quad { 1: required RDF_Term S 2: required RDF_Term P 3: required RDF_Term O 4: optional RDF_Term G } struct RDF_PrefixDecl { 1: required string prefix ; 2: required string uri ; } Thrift encoding of RDF Graphs and RDF Datasets union RDF_StreamRow { 1: RDF_PrefixDecl prefixDecl 2: RDF_Triple triple 3: RDF_Quad quad } RDF Graphs are encoded as a stream of RDF_Triple and RDF_PrefixDecl.\nRDF Datasets are encoded as a stream of RDF_Triple, RDF_Quad and RDF_PrefixDecl.\nThrift encoding of SPARQL Result Sets A SPARQL Result Set is encoded as a list of variables (the header), then a stream of rows (the results).\nstruct RDF_VarTuple { 1: list\u0026lt;RDF_VAR\u0026gt; vars } struct RDF_DataTuple { 1: list\u0026lt;RDF_Term\u0026gt; row } Protobuf encoding of RDF Terms The Protobuf schema is similar.\nSource: binary-rdf.proto\nStreaming is used to allow for arbitrary-size graphs. Therefore the stream items (RDF_StreamRow below) are written with an initial length (writeDelimitedTo in the Java API).\nSee Protobuf Techniques Streaming.\nsyntax = \u0026#34;proto3\u0026#34;; option java_package = \u0026#34;org.apache.jena.riot.protobuf.wire\u0026#34; ; // Prefer one file with static inner classes. option java_outer_classname = \u0026#34;PB_RDF\u0026#34; ; // Optimize for speed (default) option optimize_for = SPEED ; //option java_multiple_files = true; // ==== RDF Term Definitions message RDF_IRI { string iri = 1 ; } // A prefix name (abbrev for an IRI) message RDF_PrefixName { string prefix = 1 ; string localName = 2 ; } message RDF_BNode { string label = 1 ; // 2 * fixed64 } // Common abbreviations for datatypes and other URIs? // union with additional values. 
message RDF_Literal { string lex = 1 ; oneof literalKind { bool simple = 9 ; string langtag = 2 ; string datatype = 3 ; RDF_PrefixName dtPrefix = 4 ; } } message RDF_Decimal { sint64 value = 1 ; sint32 scale = 2 ; } message RDF_Var { string name = 1 ; } message RDF_ANY { } message RDF_UNDEF { } message RDF_REPEAT { } message RDF_Term { oneof term { RDF_IRI iri = 1 ; RDF_BNode bnode = 2 ; RDF_Literal literal = 3 ; RDF_PrefixName prefixName = 4 ; RDF_Var variable = 5 ; RDF_Triple tripleTerm = 6 ; RDF_ANY any = 7 ; RDF_UNDEF undefined = 8 ; RDF_REPEAT repeat = 9 ; // Value forms of literals. sint64 valInteger = 20 ; double valDouble = 21 ; RDF_Decimal valDecimal = 22 ; } } // === StreamRDF items message RDF_Triple { RDF_Term S = 1 ; RDF_Term P = 2 ; RDF_Term O = 3 ; } message RDF_Quad { RDF_Term S = 1 ; RDF_Term P = 2 ; RDF_Term O = 3 ; RDF_Term G = 4 ; } // Prefix declaration message RDF_PrefixDecl { string prefix = 1; string uri = 2 ; } // StreamRDF message RDF_StreamRow { oneof row { RDF_PrefixDecl prefixDecl = 1 ; RDF_Triple triple = 2 ; RDF_Quad quad = 3 ; RDF_IRI base = 4 ; } } message RDF_Stream { repeated RDF_StreamRow row = 1 ; } // ==== SPARQL Result Sets message RDF_VarTuple { repeated RDF_Var vars = 1 ; } message RDF_DataTuple { repeated RDF_Term row = 1 ; } // ==== RDF Graph message RDF_Graph { repeated RDF_Triple triple = 1 ; } ","permalink":"","tags":null,"title":"RDF Binary using Apache Thrift"},{"categories":null,"contents":"RDFConnection provides a unified set of operations for working on RDF with SPARQL operations. It provides SPARQL Query, SPARQL Update and the SPARQL Graph Store operations. The interface is uniform - the same interface applies to local data and to remote data using HTTP and the SPARQL protocols (SPARQL protocol) and SPARQL Graph Store Protocol).\nOutline RDFConnection provides a number of different styles for working with RDF data in Java. 
It provides support for try-resource and functional code-passing styles, as well as the more basic sequence of method calls.\nFor example, using try-resources to manage the connection and perform two operations (one to load some data, one to make a query) can be written as:\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.load(\u0026#34;data.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; } This could have been written as (approximately \u0026ndash; the error handling is better in the example above):\nRDFConnection conn = RDFConnection.connect(...) ; conn.load(\u0026#34;data.ttl\u0026#34;) ; QueryExecution qExec = conn.query(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;) ; ResultSet rs = qExec.execSelect() ; while(rs.hasNext()) { QuerySolution qs = ; Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; } qExec.close() ; conn.close() ; Transactions Transactions are the preferred way to work with RDF data. Operations on an RDFConnection outside of an application-controlled transaction will cause the system to add one for the duration of the operation. This \u0026ldquo;autocommit\u0026rdquo; feature may lead to inefficient operations due to excessive overhead.\nThe Txn class provides a Java8-style transaction API. Transactions are code passed to the Txn library, which handles the transaction lifecycle.\ntry ( RDFConnection conn = RDFConnection.connect(...) 
) { Txn.execWrite(conn, () -\u0026gt; { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; }) ; } The traditional style of explicit begin, commit, abort is also available.\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.begin(ReadWrite.WRITE) ; try { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.ttl\u0026#34;) ; conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34; + subject) ; }) ; conn.commit() ; } finally { conn.end() ; } } The use of try-finally ensures that transactions are properly finished. The conn.end() provides an abort in case an exception occurs in the transaction and a commit has not been issued.\nTxn wraps these steps up, calling the application-supplied code for the transaction body.\nRemote Transactions SPARQL does not define a standard protocol for remote transactions. Each remote operation should be atomic (all happens or nothing happens) - this is the responsibility of the remote server.\nAn RDFConnection will at least provide the client-side locking features. This means that overlapping operations that change data are naturally handled by the transaction pattern within a single JVM.\nConfiguring a remote RDFConnection. The default settings on a remote connection should work for any SPARQL triple store endpoint which supports HTTP content negotiation. 
Sometimes different settings are desirable or required, and RDFConnectionRemote provides a builder to construct RDFConnectionRemotes.\nAt its simplest, it is:\nRDFConnectionRemoteBuilder builder = RDFConnection.create() .destination(\u0026#34;http://host/triplestore\u0026#34;); which uses the default settings used by RDFConnectionFactory.connect.\nSee example 4 and example 5.\nThere are many options, including setting HTTP headers for content types (javadoc) and providing detailed configuration with Apache HttpComponents HttpClient.\nFuseki Specific Connection If the remote destination is an Apache Jena Fuseki server, then the default general settings work, but it is possible to have a specialised connection:\nRDFConnectionRemoteBuilder builder = RDFConnectionFuseki.create() .destination(\u0026#34;http://host/fuseki\u0026#34;); which uses settings tuned to Fuseki, including round-trip handling of blank nodes.\nSee example 6.\nGraph Store Protocol The SPARQL Graph Store Protocol (GSP) is a set of operations to work on whole graphs in a dataset. It provides a standardised way to manage the data in a dataset.\nThe operations are to fetch a graph, set the RDF data in a graph, add more RDF data into a graph, and delete a graph from a dataset.\nFor example: load two files:\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { conn.load(\u0026#34;data1.ttl\u0026#34;) ; conn.load(\u0026#34;data2.nt\u0026#34;) ; } The file extension is used to determine the syntax.\nThere is also a set of scripts to help do these operations from the command line with SOH. It is possible to write curl scripts as well.\nIn addition, RDFConnection provides an extension to give the same style of operation to work on a whole dataset (deleting the dataset is not provided).\nconn.loadDataset(\u0026#34;data-complete.trig\u0026#34;) ; Local vs Remote GSP operations work on whole models and datasets. 
When used on a remote connection, the result of a GSP operation is a separate copy of the remote RDF data. When working with local connections, 3 isolation modes are available:\nCopy – the models and datasets returned are independent copies. Updates are made to the returned copy only. This is most like a remote connection and is useful for testing. Read-only – the models and datasets are made read-only, but any changes made to the underlying RDF data by another route will be visible. This provides a form of checking for large datasets when \u0026ldquo;copy\u0026rdquo; is impractical. None – the models and datasets are passed back with no additional wrappers, and they can be updated, with the changes being made to the underlying dataset. The default for a local RDFConnection is \u0026ldquo;none\u0026rdquo;. When used with TDB, accessing returned models must be done with transactions in this mode.\nQuery Usage RDFConnection provides methods for each of the SPARQL query forms (SELECT, CONSTRUCT, DESCRIBE, ASK) as well as a way to get the QueryExecution for specialized configuration. 
When creating a QueryExecution explicitly, care should be taken to close it.\nIf the application wishes to capture the result set from a SELECT query and retain it across the lifetime of the transaction or QueryExecution, then the application should create a copy which is not attached to any external system with ResultSetFactory.copyResults.\ntry ( RDFConnection conn = RDFConnection.connect(\u0026#34;https://...\u0026#34;) ) { ResultSet safeCopy = Txn.execReadReturn(conn, () -\u0026gt; { // Process results by row: conn.querySelect(\u0026#34;SELECT DISTINCT ?s { ?s ?p ?o }\u0026#34;, (qs) -\u0026gt; { Resource subject = qs.getResource(\u0026#34;s\u0026#34;) ; System.out.println(\u0026#34;Subject: \u0026#34;+subject) ; }) ; ResultSet rs = conn.query(\u0026#34;SELECT * { ?s ?p ?o }\u0026#34;).execSelect() ; return ResultSetFactory.copyResults(rs) ; }) ; } Update Usage SPARQL Update operations can be performed and mixed with other operations.\ntry ( RDFConnection conn = RDFConnection.connect(...) ) { Txn.execWrite(conn, () -\u0026gt; { conn.update(\u0026#34;DELETE DATA { ... }\u0026#34; ) ; conn.load(\u0026#34;data.ttl\u0026#34;) ; }) ; } Dataset operations In addition to the SPARQL Graph Store Protocol, operations on whole datasets are provided for fetching (HTTP GET), adding data (HTTP POST) and setting the data (HTTP PUT) on a dataset URL. This assumes the remote server supports these REST-style operations. Apache Jena Fuseki does provide these.\nSubinterfaces To help structure code, the RDFConnection consists of a number of different interfaces. An RDFConnection can be passed to application code as one of these interfaces so that only certain subsets of the full operations are visible to the called code.\nquery via SparqlQueryConnection update via SparqlUpdateConnection graph store protocol RDFDatasetAccessConnection (read operations), and RDFDatasetConnection (read and write operations). 
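As a sketch of the subinterface pattern (the dataset, update and query here are illustrative; the interface and method names are from the RDFConnection javadoc), the called code receives only the query subset of the connection:

```java
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.SparqlQueryConnection;

public class NarrowedConnection {
    // Called code sees only the query operations, not update or GSP.
    static int countSubjects(SparqlQueryConnection qConn) {
        int[] n = { 0 };
        qConn.querySelect("SELECT DISTINCT ?s { ?s ?p ?o }", qs -> n[0]++);
        return n[0];
    }

    public static void main(String[] args) {
        // Local, in-memory transactional dataset for illustration.
        try (RDFConnection conn = RDFConnection.connect(DatasetFactory.createTxnMem())) {
            conn.update("INSERT DATA { <http://example/s> <http://example/p> <http://example/o> }");
            // The same object, passed as the narrower interface.
            System.out.println(countSubjects(conn));
        }
    }
}
```

The same technique applies to SparqlUpdateConnection and the dataset-access interfaces: accept the narrowest interface the code actually needs.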
Examples for simple usage examples see for example of how to use with StreamRDF see ","permalink":"","tags":null,"title":"RDF Connection : SPARQL operations API"},{"categories":null,"contents":"This page describes RDF Patch. An RDF Patch is a set of changes to an RDF dataset. The changes are for triples, quads and prefixes.\nChanges to triples involving blank nodes are handled by using their system identifier, which uniquely identifies a blank node. Unlike RDF syntaxes, blank nodes are not generated afresh each time the document is parsed.\nExample This example ensures certain prefixes are in the dataset and adds some basic triples for a new subclass of \u0026lt;http://example/SUPER_CLASS\u0026gt;.\nTX . PA \u0026#34;rdf\u0026#34; \u0026#34;\u0026#34; . PA \u0026#34;owl\u0026#34; \u0026#34;\u0026#34; . PA \u0026#34;rdfs\u0026#34; \u0026#34;\u0026#34; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026lt;\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026lt;http://example/SUPER_CLASS\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026#34;SubClass\u0026#34; . TC . Structure The text format for an RDF Patch is N-Triples-like: it is a series of rows, each row ends with a . (DOT). The tokens on a row are keywords, URIs, blank nodes, written with their label (see below), or RDF Literals, in N-Triples syntax. A keyword follows the same rules as Turtle prefix declarations without a trailing :.\nA line has an operation code, then some number of items depending on the operation.\nOperation H Header TXTCTA Change block: transactions PAPD Change: Prefix add and delete AD Change: Add and delete triples and quads The general structure of an RDF patch is a header (possibly empty), then a number of change blocks.\nEach change block is a transaction. Transactions can be explicitly recorded (\u0026lsquo;TX\u0026rsquo; start, TC commit) to include multiple transactions in one patch. They are not required. 
If not present, the patch should be applied atomically to the data.\nheader TX Quad, triple or prefix changes TC or TA Multiple transaction blocks are allowed for multiple sets of changes in one patch.\nA binary version based on RDF Thrift is provided. Parsing the binary form, compared to text N-Triples, achieves a 3x-4x increase in throughput.\nHeader The header provides basic information about the patch. It is a series of (key, value) pairs.\nIt is better to put complex metadata in a separate file and link to it from the header, but certain information is best kept with the patch. If patches are given an identifier, and also refer to the expected previous patch, this creates a log, and patches can be applied in the right order.\nA header section can be used to provide additional information. In this example a patch has an identifier and refers to a previous patch. This might be used to create a log of patches, a log being a sequence of changes to apply in order.\nH id \u0026lt;uuid:0686c69d-8f89-4496-acb5-744f0157a8db\u0026gt; . H prev \u0026lt;uuid:3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1\u0026gt; . TX . PA \u0026#34;rdf\u0026#34; \u0026#34;\u0026#34; . PA \u0026#34;owl\u0026#34; \u0026#34;\u0026#34; . PA \u0026#34;rdfs\u0026#34; \u0026#34;\u0026#34; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026lt;\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026lt;http://example/SUPER_CLASS\u0026gt; . A \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026#34;SubClass\u0026#34; . TC . Header format:\nH word RDFTerm . where word is a string in quotes, or an unquoted string (no spaces, starts with a letter, same as a prefix without the colon).\nThe header is ended by the first non-H line or the end of the patch.\nTransactions TX . TC . 
These delimit a block of quad, triple and prefix changes.\nAbort, TA, is provided so that changes can be streamed, not obliging the application to buffer changes and wait to confirm the action is committed.\nTransactions should be applied atomically when a patch is applied.\nChanges A change is an add or delete of a quad or a prefix.\nPrefixes Prefixes do not apply to the data of the patch. They are changes to the data the patch is applied to.\nThe prefix name is without the trailing colon. It can be given as a quoted string or unquoted string (keyword) with the same limitations as Turtle on the prefix name.\nPA rdf \u0026lt;\u0026gt; . PA adds a prefix; PD deletes a prefix.\nQuads and Triples Triples and quads are written like N-Quads, 3 or 4 RDF terms, with the addition of an initial A or D for \u0026ldquo;add\u0026rdquo; or \u0026ldquo;delete\u0026rdquo;. Triples are in the order S-P-O, quads are S-P-O-G.\nAdd a triple:\nA \u0026lt;http://example/SubClass\u0026gt; \u0026lt;\u0026gt; \u0026lt;\u0026gt; . Blank nodes In order to synchronize datasets, changes involving blank nodes may need to refer to a blank node already in the data. RDF Patch deals with this by making blank node labels refer to the \u0026ldquo;system identifier\u0026rdquo; for the blank node.\nIn this way, RDF Patch is not an \u0026ldquo;RDF Format\u0026rdquo;. In all syntaxes for RDF (Turtle, TriG, RDF/XML etc), blank nodes are \u0026ldquo;document scoped\u0026rdquo;, meaning that the blank node is unique to that one reading of the document. A new blank node is generated every time the file is read into a graph or dataset, and that blank node does not appear in the existing data.\nIn practice, most RDF triplestores have some kind of internal identifier that identifies the blank node. 
RDF Patch requires a \u0026ldquo;system identifier\u0026rdquo; for blank nodes so that changes can refer to an existing blank node in the data.\nThese can be written as _:label or \u0026lt;_:label\u0026gt; (the latter provides a wider set of permissible characters in the label). Note that _ is illegal as an IRI scheme to highlight the fact this is not, strictly, an IRI.\nRDF 1.1 describes skolemization, where blank nodes are replaced by a URI. A system could use these for RDF Patch if it meets two additional requirements: it must be able to receive a skolem IRI and reverse the mapping back to the internal blank node object, and all systems generating patches must be able to safely generate new, fresh skolem IRIs that become new blank nodes in the RDF dataset when a patch is applied to it.\nPreferred Style The preferred style is to write patch rows on a single line, single space between tokens on a row and a single space before the terminal .. No comments should be included (comments start # and run to end of line).\nHeaders should be placed before the item they refer to; for information used by an RDF Patch Log, the metadata is about the whole patch and should be at the start of the file, before any TX.\n","permalink":"","tags":null,"title":"RDF Patch"},{"categories":null,"contents":"This section details the Jena RDF/XML parser. ARP is the parsing subsystem in Jena for handling the RDF/XML syntax.\nARP Features Using ARP without Jena Using other SAX and DOM XML sources ARP Features Java-based RDF parser. Compliant with RDF Syntax and RDF Test Cases Recommendations. Compliant with the following standards and recommendations: xml:lang\nxml:lang is fully supported, both in RDF/XML and any document embedding RDF/XML. Moreover, the language tags are checked against RFC1766, RFC3066, ISO639-1, ISO3166. xml:base\nxml:base is fully supported, both in RDF/XML and any document embedding RDF/XML. URI\nAll URI references are checked against RFC2396. 
The treatment of international URIs implements the concept of RDF URI Reference. XML Names\nAll rdf:ID\u0026rsquo;s are checked against the XML Names specification. Unicode Normal Form C\nString literals are checked for conformance with an early uniform normalization processing model. XML Literals\nrdf:parseType='Literal' is processed respecting namespaces, processing instructions and XML comments. This follows the XML exclusive canonicalizations recommendation with comments. Relative Namespace URI references\nNamespace URI references are checked in light of the W3C XML Plenary decision. Command-line RDF/XML error checking. Can be used independently of Jena, with customizable StatementHandler. Highly configurable error processing. Xerces based XML parsing. Processes both standalone and embedded RDF/XML. Streaming parser, suitable for large files. Supports SAX and DOM, for integration with non-file XML sources. ","permalink":"","tags":null,"title":"RDF/XML Input in Jena"},{"categories":null,"contents":"This page details the setup of RDF I/O technology (RIOT).\nFormats Commands Reading RDF in Jena Writing RDF in Jena Working with RDF Streams Additional details on working with RDF/XML Formats The following RDF formats are supported by Jena. In addition, other syntaxes can be integrated into both the parser and writer registries.\nTurtle RDF/XML N-Triples JSON-LD RDF/JSON TriG N-Quads TriX RDF Binary RDF/JSON is different from JSON-LD - it is a direct encoding of RDF triples in JSON. See the description of RDF/JSON.\nFrom Jena 4.5.0, JSON-LD 1.1 is the main supported version of JSON-LD.\nRDF Binary is a binary encoding of RDF (graphs and datasets) that can be useful for fast parsing. See RDF Binary.\nCommand line tools There are scripts in Jena download to run these commands.\nriot - parse, guessing the syntax from the file extension. Assumes N-Quads/N-Triples from stdin. 
turtle, ntriples, nquads, trig, rdfxml - parse a particular language These can be called directly as Java programs:\nThe file extensions understood are:\nExtension Language .ttl Turtle .nt N-Triples .nq N-Quads .trig TriG .rdf RDF/XML .owl RDF/XML .jsonld JSON-LD .trdf RDF Thrift .rt RDF Thrift .rpb RDF Protobuf .pbrdf RDF Protobuf .rj RDF/JSON .trix TriX .n3 is supported but only as a synonym for Turtle.\nThe TriX support is for the core TriX format.\nIn addition, if the extension is .gz the file is assumed to be gzip compressed. The file name is examined for an inner extension. For example, .nt.gz is gzip compressed N-Triples.\nJena does not support all possible compression formats itself, only GZip and BZip2 are supported directly. If you want to use an alternative compression format you can do so by piping the output of the relevant decompression utility into one of Jena\u0026rsquo;s commands e.g.\nzstd -d \u0026lt; FILE.nq.zst | riot --syntax NQ ... These scripts call java programs in the riotcmd package. For example:\njava -cp ... riotcmd.riot file.ttl This can be a mixture of files in different syntaxes when file extensions are used to determine the file syntax type.\nThe scripts all accept the same arguments (type \u0026quot;riot --help\u0026quot; to get command line reminders):\n--syntax=NAME; Explicitly set the input syntax for all files. --validate: Checking mode: same as --strict --sink --check=true. --check=true/false: Run with checking of literals and IRIs either on or off. --time: Output timing information. --sink: No output. --output=FORMAT: Output in a given syntax (streaming if possible). --formatted=FORMAT: Output in a given syntax, using pretty printing. --stream=FORMAT: Output in a given syntax, streaming (not all syntaxes can be streamed). 
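The --check and --sink behaviour can be approximated from Java. Below is a sketch (not part of the original page): the Turtle strings and the isValid helper are illustrative; RDFParser.fromString, checking(boolean) and StreamRDFLib.sinkNull() are from the RIOT javadoc.

```java
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.RiotException;
import org.apache.jena.riot.system.StreamRDFLib;

public class CheckOnly {
    // Parse with checking on and discard the output:
    // roughly "riot --check=true --sink" for in-memory input.
    static boolean isValid(String turtle) {
        try {
            RDFParser.fromString(turtle)
                .lang(Lang.TURTLE)
                .checking(true)
                .parse(StreamRDFLib.sinkNull());   // no output, errors still reported
            return true;
        } catch (RiotException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<http://example/s> <http://example/p> <http://example/o> ."));
        System.out.println(isValid("this is not turtle"));
    }
}
```

For files, replace fromString with source("file.ttl"); a syntax error is reported through the error handler and surfaces as a RiotException by default.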
To aid in checking for errors in UTF8-encoded files, there is a utility which reads a file of bytes as UTF8 and checks the encoding.\nutf8 \u0026ndash; read bytes as UTF8 Inference RIOT supports the creation of inferred triples during the parsing process:\nriotcmd.infer --rdfs VOCAB FILE FILE ... Output will contain the base data and triples inferred based on RDF subclass, subproperty, domain and range declarations.\n","permalink":"","tags":null,"title":"Reading and Writing RDF in Apache Jena"},{"categories":null,"contents":" JSON-LD 1.1 is the default version of JSON-LD supported by Apache Jena. This page is out of date and left temporarily, only for information about using JSON-LD 1.1 in versions 4.2.x to 4.4.x. This page details support for reading JSON-LD 1.1 using Titanium JSON-LD.\nWhile Titanium is licensed under the Apache License, it has a dependency on the Eclipse Jakarta JSON Processing API, which is licensed under the Eclipse Public License 2.0.\nAdditional Dependencies The Titanium engine (com.apicatalog:titanium-json-ld) uses the Eclipse Jakarta JSON Processing API, licensed under the Eclipse Public License 2.0, with dependencies:\njakarta.json:jakarta.json-api org.glassfish:jakarta.json Failure to add these dependencies will result in an UnsupportedOperationException.\nBoth titanium-json-ld (1.1.0 or later) and org.glassfish:jakarta.json are needed on the classpath. Usage Jena currently (from version 4.2.0) offers both JSON-LD 1.0 and JSON-LD 1.1.\nThe file extension for JSON-LD 1.1 is .jsonld11.\nIf not reading from a file with this file extension, the application needs to force the language choice to be JSON-LD 1.1 with RDFParser using forceLang(Lang.JSONLD11):\nRDFParser.source(...) .forceLang(Lang.JSONLD11) ... 
.build() or short-cut form:\nRDFParser.source(URL or InputStream) .forceLang(Lang.JSONLD11) .parse(dataset); ","permalink":"","tags":null,"title":"Reading JSON-LD 1.1"},{"categories":null,"contents":"This page details the setup of RDF I/O technology (RIOT) for Apache Jena.\nSee Writing RDF for details of the RIOT Writer system.\nAPI Determining the RDF syntax Example 1 : Using the RDFDataMgr Example 2 : Model usage Example 3 : Using RDFParser Logging The StreamManager and LocationMapper Configuring a StreamManager Configuring a LocationMapper Advanced examples Iterating over parser output Filtering the output of parsing Add a new language Full details of operations are given in the javadoc.\nAPI Much of the functionality is accessed via the Jena Model API; direct calling of the RIOT subsystem isn\u0026rsquo;t needed. A resource name with no URI scheme is assumed to be a local file name.\nApplications typically use at most RDFDataMgr to read RDF datasets.\nThe major classes in the RIOT API are:\nClass Comment RDFDataMgr Main set of functions to read and load models and datasets StreamRDF Interface for the output of all parsers RDFParser Detailed setup of a parser StreamManager Handles the opening of typed input streams RDFLanguages Registered languages RDFParserRegistry Registered parser factories Determining the RDF syntax The syntax of the RDF file is determined by the content type (if an HTTP request), then the file extension if there is no content type. Content type text/plain is ignored; it is assumed to be the type returned for an unconfigured http server. 
The application can also pass in a declared language hint.\nThe string name traditionally used in is mapped to RIOT Lang as:\nJena reader RIOT Lang \u0026quot;TURTLE\u0026quot; TURTLE \u0026quot;TTL\u0026quot; TURTLE \u0026quot;Turtle\u0026quot; TURTLE \u0026quot;N-TRIPLES\u0026quot; NTRIPLES \u0026quot;N-TRIPLE\u0026quot; NTRIPLES \u0026quot;NT\u0026quot; NTRIPLES \u0026quot;RDF/XML\u0026quot; RDFXML \u0026quot;N3\u0026quot; N3 \u0026quot;JSON-LD\u0026quot; JSONLD \u0026quot;RDF/JSON\u0026quot; RDFJSON The following is a suggested Apache httpd .htaccess file:\nAddType text/turtle .ttl AddType application/rdf+xml .rdf AddType application/n-triples .nt AddType application/ld+json .jsonld AddType text/trig .trig AddType application/n-quads .nq AddType application/trix+xml .trix AddType application/rdf+thrift .rt AddType application/rdf+protobuf .rpb Example 1 : Using the RDFDataMgr RDFDataMgr provides operations to load, read and write models and datasets.\nRDFDataMgr \u0026ldquo;load\u0026rdquo; operations create an in-memory container (model, or dataset as appropriate); \u0026ldquo;read\u0026rdquo; operations add data into an existing model or dataset.\n// Create a model and read into it from file // \u0026quot;data.ttl\u0026quot; assumed to be Turtle. Model model = RDFDataMgr.loadModel(\u0026quot;data.ttl\u0026quot;) ; // Create a dataset and read into it from file // \u0026quot;data.trig\u0026quot; assumed to be TriG. 
Dataset dataset = RDFDataMgr.loadDataset(\u0026quot;data.trig\u0026quot;) ; // Read into an existing Model,\u0026quot;data2.ttl\u0026quot;) ; Example 2 : Model usage The original Jena Model API operations for read and write provide another way to the same machinery:\nModel model = ModelFactory.createDefaultModel() ;\u0026quot;data.ttl\u0026quot;) ; If the syntax is not that indicated by the file extension, a language can be declared:\\u0026quot;\u0026quot;, \u0026quot;TURTLE\u0026quot;) ; Example 3 : Using RDFParser Detailed control over the setup of the parsing process is provided by RDFParser, which provides a builder pattern. It has many options - see the javadoc for all details.\nFor example, to read TriG data, and set the error handler specially,\nDataset dataset; // The parsers will do the necessary character set conversion. try (InputStream in = new FileInputStream(\u0026quot;data.some.unusual.extension\u0026quot;)) { dataset = RDFParser.create() .source(in) .lang(RDFLanguages.TRIG) .errorHandler(ErrorHandlerFactory.errorHandlerStrict) .base(\u0026quot;http://example/base\u0026quot;) .toDataset(); } Logging The parsers log to a logger called org.apache.jena.riot. To avoid WARN messages, set this to ERROR in the logging system of the application.\nStreamManager and LocationMapper Operations to read RDF data can be redirected to local copies and to other URLs. This is useful to provide local copies of remote resources.\nBy default, the RDFDataMgr uses the global StreamManager to open typed InputStreams. The StreamManager can be set using the RDFParser builder:\n// Create a copy of the global default StreamManager. 
StreamManager sm = StreamManager.get().clone(); // Add directory \u0026quot;/tmp\u0026quot; as a place to look for files sm.addLocator(new LocatorFile(\u0026quot;/tmp\u0026quot;)); RDFParser.create() .streamManager(sm) .source(\u0026quot;data.ttl\u0026quot;) .parse(...); It can also be set in a Context object given to the RDFParser for the operation, but normally this defaults to the global Context available via Context.get(). The constant SysRIOT.sysStreamManager ( is used.\nSpecialized StreamManagers can be configured with specific locators for data:\nFile locator (with own current directory) URL locator Class loader locator Zip file locator Configuring a StreamManager The StreamManager can be reconfigured with different places to look for files. The default configuration used for the global StreamManager is a file access class, where the current directory is that of the java process, a URL accessor for reading from the web, and a class loader-based accessor. Different setups can be built and used either as the global setup, or on a per-request basis.\nThere is also a LocationMapper for rewriting file names and URLs before use to allow placing known names in different places (e.g. having local copies of import http resources).\nConfiguring a LocationMapper Location mapping files are RDF, usually written in Turtle, although any RDF syntax can be used.\n@prefix lm: \u0026lt;\u0026gt; . [] lm:mapping [ lm:name \u0026quot;file:foo.ttl\u0026quot; ; lm:altName \u0026quot;file:etc/foo.ttl\u0026quot; ] , [ lm:prefix \u0026quot;file:etc/\u0026quot; ; lm:altPrefix \u0026quot;file:ETC/\u0026quot; ] , [ lm:name \u0026quot;file:etc/foo.ttl\u0026quot; ; lm:altName \u0026quot;file:DIR/foo.ttl\u0026quot; ] . There are two types of location mapping: exact match renaming and prefix renaming. When trying to find an alternative location, a LocationMapper first tries for an exact match; if none is found, the LocationMapper will search for the longest matching prefix. 
If two are the same length, there is no guarantee on the order tried; there is no implied order in a location mapper configuration file (it sets up two hash tables).\nIn the example above, file:etc/foo.ttl becomes file:DIR/foo.ttl because that is an exact match. The prefix match of file:/etc/ is ignored.\nAll string tests are done case sensitively because the primary use is for URLs.\nNotes:\nProperty values are not URIs, but strings. This is a system feature, not an RDF feature. Prefix mapping is name rewriting; alternate names are not treated as equivalent resources in the rest of Jena. While application writers are encouraged to use URIs to identify files, this is not always possible. There is no check to see if the alternative system resource is equivalent to the original. A LocationMapper finds its configuration file by looking for the following files, in order:\nfile:location-mapping.rdf file:location-mapping.ttl file:etc/location-mapping.rdf file:etc/location-mapping.ttl This is specified as a path - note the path separator is always the character \u0026lsquo;;\u0026rsquo; regardless of operating system because URLs contain \u0026lsquo;:\u0026rsquo;.\nApplications can also set mappings programmatically. 
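For example, a sketch of setting the mappings programmatically (the file names are illustrative; LocationMapper and StreamManager are in org.apache.jena.riot.system.stream):

```java
import org.apache.jena.riot.system.stream.LocationMapper;
import org.apache.jena.riot.system.stream.StreamManager;

public class ProgrammaticMapping {
    static LocationMapper mapper() {
        LocationMapper mapper = new LocationMapper();
        // An exact-match rename and a prefix rename, as in the Turtle file above.
        mapper.addAltEntry("", "file:etc/foo.ttl");
        mapper.addAltPrefix("", "file:etc/");
        return mapper;
    }

    public static void main(String[] args) {
        // Copy the global StreamManager and attach the mapper to the copy.
        StreamManager sm = StreamManager.get().clone();
        sm.setLocationMapper(mapper());
        // The copy can then be used for one parse run:
        // RDFParser.create().streamManager(sm).source("").parse(...);
        System.out.println(mapper().altMapping(""));
    }
}
```

Setting the mapper on a cloned StreamManager keeps the redirection local to one operation rather than changing the global setup.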
No configuration file is necessary.\nThe base URI for reading models will be the original URI, not the alternative location.\nAdvanced examples Example code may be found in jena-examples:arq/examples.\nIterating over parser output One of the capabilities of the RIOT API is the ability to treat parser output as an iterator; this is useful when you don\u0026rsquo;t want to go to the trouble of writing a full sink implementation and can easily express your logic in normal iterator style.\nTo do this you use AsyncParser.asyncParseTriples which parses the input on another thread:\nIteratorCloseable\u0026lt;Triple\u0026gt; iter = AsyncParser.asyncParseTriples(filename); iter.forEachRemaining(triple-\u0026gt;{ // Do something with triple }); Calling the iterator\u0026rsquo;s close method stops parsing and closes the involved resources. For N-Triples and N-Quads, you can use RiotParsers.createIteratorNTriples(input), which parses the input on the calling thread.\nRIOT example 9.\nAdditional control over parsing is provided by the AsyncParser.of(...) methods, which return AsyncParserBuilder instances. The builder features a fluent API that allows for fine-tuning internal buffer sizes as well as eventually obtaining a standard Java Stream. Calling the stream\u0026rsquo;s close method stops parsing and closes the involved resources. Therefore, these streams are best used in conjunction with try-with-resources blocks:\ntry (Stream\u0026lt;Triple\u0026gt; stream = AsyncParser.of(filename) .setQueueSize(2).setChunkSize(100).streamTriples().limit(1000)) { // Do something with the stream } The AsyncParser also supports parsing RDF into a stream of EltStreamRDF elements. Each element can hold a triple, quad, prefix, base IRI or exception. 
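A sketch of consuming these elements in Stream form follows; it assumes the builder exposes a `streamElements()` method alongside `streamTriples()`, and that `EltStreamRDF` lives in `org.apache.jena.riot.lang` - verify both against your Jena version's Javadoc:

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.jena.riot.lang.EltStreamRDF;
import org.apache.jena.riot.system.AsyncParser;

public class ElementStreamExample {
    public static void main(String[] args) throws Exception {
        // Illustrative data: a one-triple N-Triples file.
        Path file = Files.createTempFile("data", ".nt");
        Files.writeString(file, "<> <> <> .\n");

        long tripleCount;
        try (Stream<EltStreamRDF> elements = AsyncParser.of(file.toString()).streamElements()) {
            // Elements may carry triples, quads, prefixes, the base IRI or exceptions;
            // here we keep only the triple elements.
            tripleCount = elements.filter(EltStreamRDF::isTriple).count();
        }
        System.out.println("Triples: " + tripleCount);
    }
}
```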
For all Stream-based methods there also exist Iterator-based versions:\nIteratorCloseable\u0026lt;EltStreamRDF\u0026gt; it = AsyncParser.of(filename).asyncParseElements(); try { while (it.hasNext()) { EltStreamRDF elt =; if (elt.isTriple()) { // Do something with elt.getTriple(); } else if (elt.isPrefix()) { // Do something with elt.getPrefix() and elt.getIri(); } } } finally { Iter.close(it); } Filter the output of parsing When working with very large files, it can be useful to process the stream of triples or quads produced by the parser so as to work in a streaming fashion.\nSee RIOT example 4.\nAdd a new language The set of languages is not fixed. A new language, together with a parser, can be added to RIOT as shown in RIOT example 6.\n","permalink":"","tags":null,"title":"Reading RDF in Apache Jena"},{"categories":null,"contents":"This section of the documentation describes the current support for inference available within Jena. It includes an outline of the general inference API, together with details of the specific rule engines and configurations for RDFS and OWL inference supplied with Jena.\nNot all of the fine details of the API are covered here: refer to the Jena Javadoc to get the full details of the capabilities of the API. Note that this is a preliminary version of this document; some errors or inconsistencies are possible. Feedback to the mailing lists is welcomed. Overview of inference support The Jena inference subsystem is designed to allow a range of inference engines or reasoners to be plugged into Jena. Such engines are used to derive additional RDF assertions which are entailed from some base RDF together with any optional ontology information and the axioms and rules associated with the reasoner. The primary use of this mechanism is to support the use of languages such as RDFS and OWL which allow additional facts to be inferred from instance data and class descriptions. 
However, the machinery is designed to be quite general and, in particular, it includes a generic rule engine that can be used for many RDF processing or transformation tasks.\nWe will try to use the term inference to refer to the abstract process of deriving additional information and the term reasoner to refer to a specific code object that performs this task. Such usage is arbitrary and if we slip into using equivalent terms like reasoning and inference engine, please forgive us. The overall structure of the inference machinery is illustrated below. Applications normally access the inference machinery by using the ModelFactory to associate a data set with some reasoner to create a new Model. Queries to the created model will return not only those statements that were present in the original data but also additional statements that can be derived from the data using the rules or other inference mechanisms implemented by the reasoner.\nAs illustrated, the inference machinery is actually implemented at the level of the Graph SPI, so that any of the different Model interfaces can be constructed around an inference Graph. In particular, the Ontology API provides convenient ways to link appropriate reasoners into the OntModels that it constructs. As part of the general RDF API we also provide an InfModel; this is an extension to the normal Model interface that provides additional control and access to an underlying inference graph. The reasoner API supports the notion of specializing a reasoner by binding it to a set of schema or ontology data using the bindSchema call. The specialized reasoner can then be attached to different sets of instance data using bind calls. In situations where the same schema information is to be used multiple times with different sets of instance data, this technique allows for some reuse of inferences across the different uses of the schema. 
In RDF there is no strong separation between schema (aka ontology, aka tbox) data and instance (aka abox) data and so any data, whether class or instance related, can be included in either the bind or bindSchema calls - the names are suggestive rather than restrictive.\nTo keep the design as open ended as possible Jena also includes a ReasonerRegistry. This is a static class through which the set of reasoners currently available can be examined. It is possible to register new reasoner types and to dynamically search for reasoners of a given type. The ReasonerRegistry also provides convenient access to prebuilt instances of the main supplied reasoners.\nAvailable reasoners Included in the Jena distribution are a number of predefined reasoners:\nTransitive reasoner: Provides support for storing and traversing class and property lattices. This implements just the transitive and reflexive properties of rdfs:subPropertyOf and rdfs:subClassOf. RDFS rule reasoner: Implements a configurable subset of the RDFS entailments. OWL, OWL Mini, OWL Micro Reasoners: A set of useful but incomplete implementations of the OWL/Lite subset of the OWL/Full language. Generic rule reasoner: A rule based reasoner that supports user defined rules. Forward chaining, tabled backward chaining and hybrid execution strategies are supported. [Index]\nThe Inference API Generic reasoner API Small examples Operations on inference models\n- Validation\n- Extended list statements\n- Direct and indirect relations\n- Derivations\n- Accessing raw data and deductions\n- Processing control\n- Tracing Generic reasoner API Finding a reasoner For each type of reasoner there is a factory class (which conforms to the interface ReasonerFactory), an instance of which can be used to create instances of the associated Reasoner. 
The factory instances can be located by going directly to a known factory class and using the static theInstance() method or by retrieval from a global ReasonerRegistry which stores factory instances indexed by the URI assigned to the reasoner. In addition, there are convenience methods on the ReasonerRegistry for locating a prebuilt instance of each of the main reasoners (getTransitiveReasoner, getRDFSReasoner, getRDFSSimpleReasoner, getOWLReasoner, getOWLMiniReasoner, getOWLMicroReasoner).\nNote that the factory objects for constructing reasoners are just there to simplify the design and extension of the registry service. Once you have a reasoner instance, the same instance can be reused multiple times by binding it to different datasets, without risk of interference - there is no need to create a new reasoner instance each time.\nIf working with the Ontology API it is not always necessary to explicitly locate a reasoner. The prebuilt instances of OntModelSpec provide easy access to the appropriate reasoners to use for different ontology configurations.\nSimilarly, if all you want is a plain RDF Model with RDFS inference included then the convenience method ModelFactory.createRDFSModel can be used. Configuring a reasoner The behaviour of many of the reasoners can be configured. To allow arbitrary configuration information to be passed to reasoners we use RDF to encode the configuration details. The ReasonerFactory.create method can be passed a Jena Resource object; the properties of that object will be used to configure the created reasoner.\nTo simplify the code required for simple cases we also provide a direct Java method to set a single configuration parameter, Reasoner.setParameter. 
The parameter being set is identified by the corresponding configuration property.\nFor the built in reasoners the available configuration parameters are described below and are predefined in the ReasonerVocabulary class.\nThe parameter value can normally be a String or a structured value. For example, to set a boolean value one can use the strings \u0026quot;true\u0026quot; or \u0026quot;false\u0026quot;, or in Java use a Boolean object, or in RDF use an instance of xsd:boolean.\nApplying a reasoner to data Once you have an instance of a reasoner it can then be attached to a set of RDF data to create an inference model. This can either be done by putting all the RDF data into one Model or by separating it into two components - schema and instance data. For some external reasoners a hard separation may be required. For all of the built in reasoners the separation is arbitrary. The prime value of this separation is to allow some deductions from one set of data (typically some schema definitions) to be efficiently applied to several subsidiary sets of data (typically sets of instance data).\nIf you want to specialize the reasoner this way, by partially applying it to a set of schema data, use the Reasoner.bindSchema method which returns a new, specialized, reasoner.\nTo bind the reasoner to the final data set to create an inference model see the ModelFactory methods, particularly ModelFactory.createInfModel. Accessing inferences Finally, having created an inference model, any API operations which access RDF statements will be able to access additional statements which are entailed from the bound data by means of the reasoner. Depending on the reasoner these additional virtual statements may all be precomputed the first time the model is touched, may be dynamically recomputed each time, or may be computed on-demand but cached.\nReasoner description The reasoners can be described using RDF metadata which can be searched to locate reasoners with appropriate properties. 
The calls Reasoner.getCapabilities and Reasoner.supportsProperty are used to access this descriptive metadata.\n[API Index] [Main Index]\nSome small examples These initial examples are not designed to illustrate the power of the reasoners but to illustrate the code required to set one up.\nLet us first create a Jena model containing the statements that some property \u0026quot;p\u0026quot; is a subproperty of another property \u0026quot;q\u0026quot; and that we have a resource \u0026quot;a\u0026quot; with value \u0026quot;foo\u0026quot; for \u0026quot;p\u0026quot;. This could be done by writing an RDF/XML or N3 file and reading that in, but we have chosen to use the RDF API:\nString NS = \u0026quot;urn:x-hp-jena:eg/\u0026quot;; // Build a trivial example data set Model rdfsExample = ModelFactory.createDefaultModel(); Property p = rdfsExample.createProperty(NS, \u0026quot;p\u0026quot;); Property q = rdfsExample.createProperty(NS, \u0026quot;q\u0026quot;); rdfsExample.add(p, RDFS.subPropertyOf, q); rdfsExample.createResource(NS+\u0026quot;a\u0026quot;).addProperty(p, \u0026quot;foo\u0026quot;); Now we can create an inference model which performs RDFS inference over this data by using:\nInfModel inf = ModelFactory.createRDFSModel(rdfsExample); // [1] We can then check that the resulting model shows that \u0026quot;a\u0026quot; also has property \u0026quot;q\u0026quot; of value \u0026quot;foo\u0026quot; by virtue of the subPropertyOf entailment:\nResource a = inf.getResource(NS+\u0026quot;a\u0026quot;); System.out.println(\u0026quot;Statement: \u0026quot; + a.getProperty(q)); Which prints the output:\nStatement: [urn:x-hp-jena:eg/a, urn:x-hp-jena:eg/q, Literal\u0026lt;foo\u0026gt;] Alternatively we could have created an empty inference model and then added the statements directly to that model.\nIf we wanted to use a different reasoner which is not available as a convenience method, or wanted to configure one, we would change line [1]. 
For example, to create the same setup manually we could replace [1] by:\nReasoner reasoner = ReasonerRegistry.getRDFSReasoner(); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); or even more manually by\nReasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); The purpose of creating a new reasoner instance like this is to enable configuration parameters to be set. For example, if we were to call listStatements on the inf model we would see that it also \u0026quot;includes\u0026quot; all the RDFS axioms, of which there are quite a lot. It is sometimes useful to suppress these and only see the \u0026quot;interesting\u0026quot; entailments. This can be done by setting the processing level parameter by creating a description of a new reasoner configuration and passing that to the factory method:\nResource config = ModelFactory.createDefaultModel() .createResource() .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, \u0026quot;simple\u0026quot;); Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); This is a rather long-winded way of setting a single parameter, though it can be useful in the cases where you want to store this sort of configuration information in a separate (RDF) configuration file. For hardwired cases the following alternative is often simpler:\nReasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(null); reasoner.setParameter(ReasonerVocabulary.PROPsetRDFSLevel, ReasonerVocabulary.RDFS_SIMPLE); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsExample); Finally, supposing you have a more complex set of schema information, defined in a Model called schema, and you want to apply this schema to several sets of instance data without redoing too many of the same intermediate deductions. 
This can be done by using the SPI-level methods: Reasoner boundReasoner = reasoner.bindSchema(schema); InfModel inf = ModelFactory.createInfModel(boundReasoner, data); This creates a new reasoner, independent from the original, which contains the schema data. Any queries to an InfModel created using the boundReasoner will see the schema statements, the data statements and any statements entailed from the combination of the two. Any updates to the InfModel will be reflected in updates to the underlying data model - the schema model will not be affected.\n[API Index] [Main Index]\nOperations on inference models For many applications one simply creates a model incorporating some inference step, using the ModelFactory methods, and then just works within the standard Jena Model API to access the entailed statements. However, sometimes it is necessary to gain more control over the processing or to access additional reasoner features not available as virtual triples.\nValidation The most common reasoner operation which can't be exposed through additional triples in the inference model is that of validation. Typically the ontology languages used with the semantic web allow constraints to be expressed; the validation interface is used to detect when such constraints are violated by some data set. A simple but typical example is that of datatype ranges in RDFS. RDFS allows us to specify the range of a property as lying within the value space of some datatype. If an RDF statement asserts an object value for that property which lies outside the given value space there is an inconsistency.\nTo test for inconsistencies with a data set using a reasoner we use the InfModel.validate() interface. This performs a global check across the schema and instance data looking for inconsistencies. 
The result is a ValidityReport object which comprises a simple pass/fail flag (ValidityReport.isValid()) together with a list of specific reports (instances of the ValidityReport.Report interface) which detail any detected inconsistencies. At a minimum the individual reports should be printable descriptions of the problem but they can also contain an arbitrary reasoner-specific object which can be used to pass additional information for programmatic handling of the violations.\nFor example, to check a data set and list any problems one could do something like:\nModel data = RDFDataMgr.loadModel(fname); InfModel infmodel = ModelFactory.createRDFSModel(data); ValidityReport validity = infmodel.validate(); if (validity.isValid()) { System.out.println(\u0026quot;OK\u0026quot;); } else { System.out.println(\u0026quot;Conflicts\u0026quot;); for (Iterator i = validity.getReports(); i.hasNext(); ) { System.out.println(\u0026quot; - \u0026quot; +; } } The file testing/reasoners/rdfs/dttest2.nt declares a property bar with range xsd:integer and attaches a bar value to some resource with the value \u0026quot;25.5\u0026quot;^^xsd:decimal. If we run the above sample code on this file we see:\nConflicts - Error (dtRange): Property has a typed range Datatype[ -\u0026gt; class java.math.BigInteger] that is not compatible with 25.5: Whereas the file testing/reasoners/rdfs/dttest3.nt uses the value \u0026quot;25\u0026quot;^^xsd:decimal instead, which is a valid integer and so passes. Note that the individual validation records can include warnings as well as errors. A warning does not affect the overall isValid() status but may indicate some issue the application may wish to be aware of. For example, it would be possible to develop a modification to the RDFS reasoner which warned about use of a property on a resource that is not explicitly declared to have the type of the domain of the property. A particular case of this arises in the case of OWL. 
In the Description Logic community a class which cannot have an instance is regarded as \u0026quot;inconsistent\u0026quot;. That term is used because it generally arises from an error in the ontology. However, it is not a logical inconsistency - i.e. something giving rise to a contradiction. Having an instance of such a class is, clearly, a logical error. In the Jena 2.2 release we clarified the semantics of isValid(). An ontology which is logically consistent but contains empty classes is regarded as valid (that is, isValid() is false only if there is a logical inconsistency). Class expressions which cannot be instantiated are treated as warnings rather than errors. To make it easier to test for this case there is an additional method Report.isClean() which returns true if the ontology is both valid (logically consistent) and generated no warnings (such as inconsistent classes).\nExtended list statements The default API supports accessing all entailed information at the level of individual triples. This is surprisingly flexible but there are queries which cannot be easily supported this way. The first such case is when the query needs to make reference to an expression which is not already present in the data. For example, in description logic systems it is often possible to ask if there are any instances of some class expression, whereas using the triple-based approach we can only ask if there are any instances of some class already defined (though it could be defined by a bNode rather than be explicitly named).\nTo overcome this limitation the InfModel API supports a notion of \u0026quot;posit\u0026quot;, that is a set of assertions which can be used to temporarily declare new information such as the definition of some class expression. These temporary assertions can then be ref