------
Apache Any23 - Supported Formats
------
The Apache Software Foundation
------
2011-2012
~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
Supported Formats in Apache Any23
<<Apache Any23>> supports all the main standard formats introduced by the <<Semantic Web>> community.
* <<Input Formats>>
The following list shows the accepted input formats and the support level for each one.
* <<(X)HTML>> with <<RDFa 1.0>>, <<RDFa 1.1>>, <<Microdata>> and <<Microformats>>. <<Apache Any23>> fully supports the
{{{http://www.w3.org/TR/html5/}(X)HTML5}} input format and in particular
provides a set of extractors for processing embedded {{{http://www.w3.org/TR/rdfa-syntax/}RDFa 1.0}},
{{{http://www.w3.org/TR/rdfa-core/}RDFa 1.1}}, {{{http://microformats.org/}Microformats}}
and {{{http://www.w3.org/TR/microdata/}Microdata}}.
* <<Turtle>> <<Apache Any23>> fully supports the {{{http://www.w3.org/TeamSubmission/turtle/}Turtle}} specification.
* <<N-Triples>> <<Apache Any23>> fully supports the {{{http://www.w3.org/TR/rdf-testcases/#ntriples}N-Triples}} specification.
* <<N-Quads>> <<Apache Any23>> Version 1.1 supports the 2012 {{{https://web.archive.org/web/20150322024714/http://sw.deri.org/2008/07/n-quads/}N-Quads}} specification (last accessed: 2016-06-17). <<Apache Any23>> Version 1.2 will support the current {{{https://www.w3.org/TR/n-quads/}N-Quads}} specification.
* <<RDF/XML>> <<Apache Any23>> fully supports the {{{http://www.w3.org/TR/rdf-syntax-grammar/}RDF/XML}} specification.
* <<CSV>> <<Apache Any23>> allows you to represent header-provided {{{http://www.ietf.org/rfc/rfc4180.txt}CSV}} files with RDF using a specific {{{./dev-csv-extractor.html}algorithm}}.
* <<Output Formats>>
The supported output formats are listed below.
* <<Turtle>> <<Apache Any23>> is able to produce output in {{{http://www.w3.org/TeamSubmission/turtle/}Turtle}}.
* <<N-Triples>> <<Apache Any23>> is able to produce output in {{{http://www.w3.org/TR/rdf-testcases/#ntriples}N-Triples}}.
* <<N-Quads>> <<Apache Any23>> is able to produce output in the 2012 {{{https://web.archive.org/web/20150322024714/http://sw.deri.org/2008/07/n-quads/}N-Quads}} format (last accessed: 2016-06-17). <<Apache Any23>> Version 1.2 will support the current {{{https://www.w3.org/TR/n-quads/}N-Quads}} specification.
* <<RDF/XML>> <<Apache Any23>> is able to produce output in {{{http://www.w3.org/TR/rdf-syntax-grammar/}RDF/XML}}.
* <<JSON Statements>> <<Apache Any23>> is able to produce output in {{{http://www.json.org/}JSON}}. See the specific {{{json-statements}format}}.
* <<XML Report>> <<Apache Any23>> is able to produce a detailed report of the latest document extraction if required. See further details {{{./service.html#report-format}here}}.
* JSON Statements Format
{json-statements}
Apache Any23 is able to produce JSON output following the format described below.
Given the following example statements (expressed in N-Quads format):
+-------------------------------------------------------------------------------
_:bn1 <http://pred/1> <http://value/1> <http://graph/1> .
<http://sub/2> <http://pred/2> "language literal"@en <http://graph/2> .
<http://sub/3> <http://pred/3> "123"^^<http://datatype> <http://graph/3> .
+-------------------------------------------------------------------------------
these will be represented as:
+-------------------------------------------------------------------------------
{
  "quads" : [
    [
      {
        "type" : "bnode",
        "value" : "bn1"
      },
      "http://pred/1",
      {
        "type" : "uri",
        "value" : "http://value/1"
      },
      "http://graph/1"
    ],
    [
      {
        "type" : "uri",
        "value" : "http://sub/2"
      },
      "http://pred/2",
      {
        "type" : "literal",
        "value" : "language literal",
        "lang" : "en",
        "datatype" : null
      },
      "http://graph/2"
    ],
    [
      {
        "type" : "uri",
        "value" : "http://sub/3"
      },
      "http://pred/3",
      {
        "type" : "literal",
        "value" : "123",
        "lang" : null,
        "datatype" : "http://datatype"
      },
      "http://graph/3"
    ]
  ]
}
+-------------------------------------------------------------------------------
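The mapping between the two representations can be illustrated with a short client-side sketch. The Python snippet below is not part of Apache Any23; the function names <<term_to_nquads>> and <<quads_to_nquads>> are hypothetical, and the literal escaping is deliberately minimal (only backslashes and double quotes are handled). It assumes a JSON document shaped like the example above and turns it back into N-Quads lines.

```python
import json


def term_to_nquads(term):
    """Render a subject or object term (a dict with "type" and "value") as an N-Quads term."""
    kind = term["type"]
    value = term["value"]
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    if kind == "literal":
        # Assumption: only backslash and double-quote escaping is needed for this sketch.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'
        if term.get("lang"):
            return f"{text}@{term['lang']}"
        if term.get("datatype"):
            return f"{text}^^<{term['datatype']}>"
        return text
    raise ValueError(f"unknown term type: {kind!r}")


def quads_to_nquads(document):
    """Convert a parsed JSON Statements document into a list of N-Quads lines."""
    lines = []
    for subject, predicate, obj, graph in document["quads"]:
        parts = [term_to_nquads(subject), f"<{predicate}>", term_to_nquads(obj)]
        # A null graph means the statement has no graph component.
        if graph is not None:
            parts.append(f"<{graph}>")
        lines.append(" ".join(parts) + " .")
    return lines


def json_text_to_nquads(text):
    """Parse JSON text and return the corresponding N-Quads lines."""
    return quads_to_nquads(json.loads(text))
```

Applied to the JSON document above, this sketch should yield the three N-Quads statements given at the start of this section.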
The <<JSON object>> structure is described by the following <<BNF>> rules,
where quotes are omitted to improve readability:
+-------------------------------------------------------------------------------
<json-response> ::= { "quads" : <statements> }
<statements> ::= [ <statement>+ ]
<statement> ::= [ <subject> , <predicate> , <object> , <graph> ]
<subject> ::= { "type" : <subject-type> , "value" : <value> }
<predicate> ::= <uri>
<object> ::= { "type" : <object-type> , "value" : <value> , "lang" : <lang> , "datatype" : <datatype> }
<graph> ::= <uri> | null
<subject-type> ::= "uri" | "bnode"
<object-type> ::= "uri" | "bnode" | "literal"
<value> ::= String
<lang> ::= String | null
<datatype> ::= <uri> | null
<uri> ::= String
+-------------------------------------------------------------------------------
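As a rough illustration of how a consumer might check a response against these rules, the following Python sketch validates an already-parsed JSON value. It is not part of Apache Any23 and all function names are hypothetical. One assumption is made explicitly: "lang" and "datatype" are treated as optional keys, because the example above omits them for an object of type "uri" even though the <object> rule lists them.

```python
SUBJECT_TYPES = {"uri", "bnode"}
OBJECT_TYPES = {"uri", "bnode", "literal"}


def _is_optional_string(value):
    """Match the `String | null` alternatives used by <lang>, <datatype> and <graph>."""
    return value is None or isinstance(value, str)


def _has_type_and_value(term, allowed_types):
    """Check the shared part of <subject> and <object>: a known type and a string value."""
    return (
        isinstance(term, dict)
        and isinstance(term.get("type"), str)
        and term["type"] in allowed_types
        and isinstance(term.get("value"), str)
    )


def is_valid_subject(term):
    return _has_type_and_value(term, SUBJECT_TYPES)


def is_valid_object(term):
    # Assumption: missing "lang" / "datatype" keys are accepted, as in the example above.
    return (
        _has_type_and_value(term, OBJECT_TYPES)
        and _is_optional_string(term.get("lang"))
        and _is_optional_string(term.get("datatype"))
    )


def is_valid_statement(statement):
    """A <statement> is a four-element array: subject, predicate, object, graph."""
    if not isinstance(statement, list) or len(statement) != 4:
        return False
    subject, predicate, obj, graph = statement
    return (
        is_valid_subject(subject)
        and isinstance(predicate, str)
        and is_valid_object(obj)
        and _is_optional_string(graph)
    )


def is_valid_response(document):
    """A <json-response> is an object whose "quads" key holds one or more statements."""
    if not isinstance(document, dict):
        return False
    quads = document.get("quads")
    return (
        isinstance(quads, list)
        and len(quads) >= 1
        and all(is_valid_statement(statement) for statement in quads)
    )
```

The check is structural only: it does not verify that strings used as URIs are well-formed, which the grammar itself leaves open by defining <uri> as a plain String.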