------
Apache Any23 - Data Conversion
------
The Apache Software Foundation
------
2011-2012
~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.
Data Conversion
+----------------------------------------------------------------------------------------------
/*1*/ Any23 runner = new Any23();
/*2*/ final String content = "@prefix foo: <http://example.org/ns#> . " +
                             "@prefix : <http://other.example.org/ns#> ." +
                             "foo:bar foo: : . " +
                             ":bar : foo:bar . ";
      // The second argument of StringDocumentSource() must be a valid IRI.
/*3*/ DocumentSource source = new StringDocumentSource(content, "http://host.com/service");
/*4*/ ByteArrayOutputStream out = new ByteArrayOutputStream();
/*5*/ TripleHandler handler = new NTriplesWriter(out);
      try {
/*6*/     runner.extract(source, handler);
      } finally {
/*7*/     handler.close();
      }
/*8*/ String nt = out.toString("UTF-8");
+----------------------------------------------------------------------------------------------
This example demonstrates how to use <<Apache Any23>> to perform RDF data conversion.
The code takes input data expressed as <<Turtle>> and converts it to <<NTriples>> format.
At <<line 1>> we create a new instance of the <<Apache Any23>> facade, which provides all the methods
needed for the transformation. The facade constructor accepts a list of extractor names. If the list is specified,
extraction is performed only with those extractors; otherwise the <MIME Type> of the data is detected and
all the compatible extractors declared within the
{{{./apidocs/org/apache/any23/extractor/ExtractorRegistry.html}ExtractorRegistry}} are applied.
<<Line 2>> defines the input string containing some {{{http://www.w3.org/TeamSubmission/turtle/}Turtle}} data.
At <<line 3>> we instantiate a {{{./apidocs/org/apache/any23/source/StringDocumentSource.html}StringDocumentSource}},
specifying the content and the source <IRI>.
The <IRI> should identify the source of the content data, and must be valid.
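As a rough illustration of the validity requirement on the source <IRI>, the sketch below pre-checks a candidate string with the JDK class <<java.net.URI>> before it would be passed to the document source. The class name <<SourceIriCheck>> and the method <<isAbsoluteIri>> are hypothetical, and the check is only an approximation: Any23 may apply different validation rules of its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Any23: a JDK-only pre-check for the source IRI.
class SourceIriCheck {

    // Returns true when the string parses as a URI and carries a scheme
    // (for example "http:"). Assumption: this approximates "valid IRI";
    // the library may accept or reject other inputs.
    static boolean isAbsoluteIri(String candidate) {
        try {
            return new URI(candidate).isAbsolute();
        } catch (URISyntaxException e) {
            // Syntax errors such as unescaped spaces end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteIri("http://host.com/service")); // true
        System.out.println(isAbsoluteIri("host.com/service"));        // false: no scheme
        System.out.println(isAbsoluteIri("not an iri"));              // false: illegal spaces
    }
}
```

Running such a check before building the document source turns a malformed identifier into an early, explicit failure instead of an error raised during extraction.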
Besides the {{{./apidocs/org/apache/any23/source/StringDocumentSource.html}StringDocumentSource}},
you can also provide input from other sources, such as <HTTP> requests
and local files. See the classes in the sources {{{./apidocs/org/apache/any23/source/package-summary.html}package}}.
<<Line 4>> defines an in-memory output stream that stores the data produced by the
writer declared at <<line 5>>.
A writer stores the extracted triples in some destination.
We use an {{{./apidocs/org/apache/any23/writer/NTriplesWriter.html}NTriplesWriter}} here that writes
into a <<ByteArrayOutputStream>>. Writers for the main <<RDF>> formats are available, and it is also possible to store
the triples directly into an <<RDF4J>> repository to query them via <<SPARQL>>.
See {{{./apidocs/org/apache/any23/writer/RepositoryWriter.html}RepositoryWriter}} and the writer
{{{./apidocs/org/apache/any23/writer/package-summary.html}package}}.
The <<extract>> method invoked at <<line 6>> performs the metadata extraction.
This method accepts as first argument a {{{./apidocs/org/apache/any23/source/DocumentSource.html}DocumentSource}} and as
second argument a {{{./apidocs/org/apache/any23/writer/TripleHandler.html}TripleHandler}},
which receives the sequence of parsing events generated by the applied extractors. The extract method also has
another signature that accepts a charset encoding for the input data. If the encoding is <<null>>, the charset
is detected automatically.
The {{{./apidocs/org/apache/any23/writer/TripleHandler.html}TripleHandler}} needs to be explicitly closed.
This is done safely in a <<finally>> block at <<line 7>>, so the handler is closed even if the extraction fails.
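The reason for closing the handler in a <<finally>> block can be sketched without Any23 at all. In the example below, <<Handler>>, <<RecordingHandler>> and the simulated <<extract>> method are hypothetical stand-ins, not the Any23 types; they only show that <<close()>> runs whether or not the extraction step throws.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch of the close-in-finally pattern; all types here are hypothetical.
class CloseInFinally {

    // Stand-in for a handler that receives events and must be closed.
    interface Handler {
        void receive(String event);
        void close();
    }

    static class RecordingHandler implements Handler {
        final List<String> events = new ArrayList<>();
        boolean closed = false;

        public void receive(String event) { events.add(event); }
        public void close() { closed = true; }
    }

    // Simulated extraction: emits one event, optionally fails, then emits a second.
    static void extract(Handler handler, boolean fail) {
        handler.receive("triple-1");
        if (fail) {
            throw new IllegalStateException("simulated extraction failure");
        }
        handler.receive("triple-2");
    }

    // Runs the extraction and always closes the handler, as at line 7 of the example.
    static RecordingHandler run(boolean fail) {
        RecordingHandler handler = new RecordingHandler();
        try {
            extract(handler, fail);
        } catch (IllegalStateException e) {
            // Swallowed only to keep the demonstration short.
        } finally {
            handler.close();
        }
        return handler;
    }

    public static void main(String[] args) {
        RecordingHandler failed = run(true);
        System.out.println(failed.closed + " " + failed.events.size());
    }
}
```

Without the <<finally>> block, a failure in the middle of the extraction would skip the call to <<close()>> and could leave buffered output unflushed or resources unreleased.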
At <<line 8>> the bytes collected in the output stream are decoded as <UTF-8>. The expected output is:
+----------------------------------------------------------------------------------------------
<http://example.org/ns#bar> <http://example.org/ns#> <http://other.example.org/ns#> .
<http://other.example.org/ns#bar> <http://other.example.org/ns#> <http://example.org/ns#bar> .
+----------------------------------------------------------------------------------------------
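The in-memory capture at <<line 4>> and the decoding at <<line 8>> rely only on JDK classes, so they can be tried in isolation. The sketch below is a minimal illustration: the class <<InMemoryCapture>> and its <<capture>> method are hypothetical, and the bytes are written directly instead of coming from a writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// JDK-only sketch: collect bytes in memory, then decode them as UTF-8.
class InMemoryCapture {

    // Writes the UTF-8 bytes of the given text into an in-memory stream
    // and decodes them back, mirroring lines 4 and 8 of the example.
    static String capture(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(text.getBytes(StandardCharsets.UTF_8));
        // Naming the charset explicitly avoids depending on the platform default.
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        String line = "<http://example.org/ns#bar> <http://example.org/ns#> <http://other.example.org/ns#> .";
        System.out.println(capture(line));
    }
}
```

Passing the charset name to <<toString>> matters when the data contains non-ASCII characters, because the no-argument variant may use a platform-dependent default charset.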