| ------ |
| Apache Any23 - Data Conversion |
| ------ |
| The Apache Software Foundation |
| ------ |
| 2011-2012 |
| |
| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| |
| Data Conversion |
| |
| +---------------------------------------------------------------------------------------------- |
| /*1*/ Any23 runner = new Any23(); |
| /*2*/ final String content = "@prefix foo: <http://example.org/ns#> . " + |
| "@prefix : <http://other.example.org/ns#> ." + |
| "foo:bar foo: : . " + |
| ":bar : foo:bar . "; |
| // The second argument of StringDocumentSource() must be a valid IRI. |
| /*3*/ DocumentSource source = new StringDocumentSource(content, "http://host.com/service"); |
| /*4*/ ByteArrayOutputStream out = new ByteArrayOutputStream(); |
| /*5*/ TripleHandler handler = new NTriplesWriter(out); |
| try { |
| /*6*/ runner.extract(source, handler); |
| } finally { |
| /*7*/ handler.close(); |
| } |
| /*8*/ String nt = out.toString("UTF-8"); |
| +---------------------------------------------------------------------------------------------- |
| |
| This example demonstrates how to use <<Apache Any23>> to perform RDF data conversion.
| The code takes some input data expressed as <<Turtle>> and converts it to <<NTriples>> format.
| |
| At <<line 1>> we create a new instance of the <<Apache Any23>> facade, which provides all the methods
| needed for the transformation. The facade constructor accepts a list of extractor names. If names are specified,
| the extraction is performed only with those extractors; otherwise the <MIME Type> of the data is detected and
| all the compatible extractors declared within the
| {{{./apidocs/org/apache/any23/extractor/ExtractorRegistry.html}ExtractorRegistry}} are applied.
| |
| <<Line 2>> defines the input string containing some {{{http://www.w3.org/TeamSubmission/turtle/}Turtle}} data.
| |
| At <<line 3>> we instantiate a {{{./apidocs/org/apache/any23/source/StringDocumentSource.html}StringDocumentSource}},
| specifying the content and the source <IRI>.
| The <IRI> should identify where the content data comes from, and must be valid.
| Besides the {{{./apidocs/org/apache/any23/source/StringDocumentSource.html}StringDocumentSource}}, |
| you can also provide input from other sources, such as <HTTP> requests |
| and local files. See the classes in the sources {{{./apidocs/org/apache/any23/source/package-summary.html}package}}. |
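| 
| The validity requirement on the source <IRI> can be pre-checked before building the document source.
| The following is a minimal sketch, assuming that "valid" means at least a syntactically well-formed,
| absolute identifier. It relies only on <<java.net.URI>> from the JDK; the class and method names are
| hypothetical, and <<Apache Any23>> may apply different or stricter rules.
| 

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Apache Any23.
class IriCheck {

    /** Returns true if the text parses as a URI and carries a scheme (is absolute). */
    static boolean isAbsoluteIri(String text) {
        try {
            return new URI(text).isAbsolute();
        } catch (URISyntaxException e) {
            // Malformed input, for example a string containing spaces.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteIri("http://host.com/service"));
        System.out.println(isAbsoluteIri("not a valid iri"));
    }
}
```

| 
| A check like this only rejects clearly malformed or relative values early; it does not replace the
| validation performed by the library itself.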
| |
| <<Line 4>> defines an in-memory output stream that stores the data produced by the
| writer declared at <<line 5>>.
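| 
| The role of the <<ByteArrayOutputStream>> can be illustrated in isolation. The sketch below uses only
| JDK classes and a plain <<write>> call as a stand-in for the writer (an assumption made to keep the example
| self-contained); the class and method names are hypothetical. It shows that bytes accumulate in memory
| and are later decoded with an explicit charset, as done at <<line 8>>.
| 

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache Any23.
class InMemoryBuffer {

    /** Writes the text as UTF-8 bytes into an in-memory stream and decodes it back. */
    static String roundTrip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // Stand-in for the writer: this overload declares no checked exception.
        out.write(bytes, 0, bytes.length);
        try {
            // Same decoding call as in the conversion example.
            return out.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported by every JVM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<http://example.org/ns#bar>"));
    }
}
```

| 
| Naming the charset when decoding avoids depending on the platform default encoding.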
| |
| A writer stores the extracted triples in some destination. |
| We use an {{{./apidocs/org/apache/any23/writer/NTriplesWriter.html}NTriplesWriter}} here that writes |
| into a <<ByteArrayOutputStream>>. Writers for the main <<RDF>> formats are available, and it is also possible to store
| the triples directly into an <<RDF4J>> repository to query them via <<SPARQL>>.
| See {{{./apidocs/org/apache/any23/writer/RepositoryWriter.html}RepositoryWriter}} and the writer |
| {{{./apidocs/org/apache/any23/writer/package-summary.html}package}}. |
| |
| The <<extract>> method invoked at <<line 6>> performs the metadata extraction.
| This method accepts a {{{./apidocs/org/apache/any23/source/DocumentSource.html}DocumentSource}} as its first argument and
| a {{{./apidocs/org/apache/any23/writer/TripleHandler.html}TripleHandler}} as its second argument;
| the handler receives the sequence of parsing events generated by the applied extractors. The extract method also has
| another signature that allows a charset encoding to be specified for the input data. If it is <<null>>, the charset
| is auto-detected.
| |
| The {{{./apidocs/org/apache/any23/writer/TripleHandler.html}TripleHandler}} must be closed explicitly;
| this is done safely in a <<finally>> block at <<line 7>>.
| |
| At <<line 8>> the collected output is decoded as <UTF-8>. The expected result is:
| |
| +---------------------------------------------------------------------------------------------- |
| <http://example.org/ns#bar> <http://example.org/ns#> <http://other.example.org/ns#> . |
| <http://other.example.org/ns#bar> <http://other.example.org/ns#> <http://example.org/ns#bar> . |
| +---------------------------------------------------------------------------------------------- |
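| 
| When checking a conversion result such as the one above, it can help to split each <<NTriples>> line into
| subject, predicate and object. The following is a minimal sketch under a strong assumption: it only handles
| triples whose three terms are all IRIs, as in the expected output, and silently skips anything else
| (literals, blank nodes, comments). It uses a regular expression rather than a real parser, and the class
| and method names are hypothetical.
| 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Any23.
class NTriplesLines {

    // Matches only lines of the form: <subject> <predicate> <object> .
    private static final Pattern IRI_TRIPLE =
            Pattern.compile("<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*");

    /** Returns one {subject, predicate, object} array per IRI-only triple line. */
    static List<String[]> parse(String nt) {
        List<String[]> triples = new ArrayList<>();
        for (String line : nt.split("\\r?\\n")) {
            Matcher m = IRI_TRIPLE.matcher(line);
            if (m.matches()) {
                triples.add(new String[] { m.group(1), m.group(2), m.group(3) });
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        String nt = "<http://example.org/ns#bar> <http://example.org/ns#> <http://other.example.org/ns#> .\n";
        for (String[] t : parse(nt)) {
            System.out.println(t[0] + " | " + t[1] + " | " + t[2]);
        }
    }
}
```

| 
| For anything beyond a quick check of IRI-only output, a full RDF parser is the more reliable choice.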