| ------ |
| Apache Any23 - Configuration |
| ------ |
| The Apache Software Foundation |
| ------ |
| 2011-2012 |
| |
| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| |
| Configuration |
| |
| * Configure the Core Module |
| |
| The core module contains the main library code and the command-line implementation. |
| |
| The main library configuration parameters are managed by the |
| {{{./apidocs/org/apache/any23/configuration/DefaultConfiguration.html} Configuration}} |
| class. The default values are declared within the {{{https://github.com/apache/any23/blob/master/api/src/main/resources/default-configuration.properties} default-configuration.properties}} |
| file. The following sections explain how to override the default configuration. |
| |
| ** Override Default Configuration from Command-line |
| |
| The default configuration can be overridden from the command line by passing system properties to the <<java>> command, |
| using the same names as the properties declared in the configuration. |
| |
| For example, to override the <<HTTP Max Client Connections>> parameter, add the following |
| option to the <<java>> command-line invocation: |
| |
| +---------------------------------------------------------------------------------------------- |
| -Dany23.http.client.max.connections=10 |
| +---------------------------------------------------------------------------------------------- |
| |
| The <<any23>> and <<any23server>> scripts read the <<ANY23_OPTS>> environment variable for custom options. |
| For example, to customize <<HTTP Max Client Connections>> for the <<any23>> script: |
| |
| +---------------------------------------------------------------------------------------------- |
| cli/target/appassembler/bin/$ ANY23_OPTS="-Dany23.http.client.max.connections=10" any23 http://path/to/resource |
| +---------------------------------------------------------------------------------------------- |
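The override rule described above (a system property wins over the packaged default of the same name) can be sketched in plain Java. This is only an illustration built on <<java.util.Properties>>, not the Apache Any23 implementation; the class name <<SystemPropertyOverrideSketch>> and the <<resolve>> helper are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch: NOT the Any23 implementation, just the lookup rule.
public class SystemPropertyOverrideSketch {

    /**
     * Returns the value passed with -D<name>=<value> if present,
     * otherwise the value found in the given defaults.
     */
    public static String resolve(Properties defaults, String name) {
        return System.getProperty(name, defaults.getProperty(name));
    }

    public static void main(String[] args) {
        // Stand-in for the values loaded from default-configuration.properties.
        Properties defaults = new Properties();
        defaults.setProperty("any23.http.client.max.connections", "5");

        // Prints the default unless the JVM was started with
        // -Dany23.http.client.max.connections=<n>.
        System.out.println(resolve(defaults, "any23.http.client.max.connections"));
    }
}
```

Running the class with <<-Dany23.http.client.max.connections=10>> should print the overriding value instead of the default.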
| |
| ** Override Default Configuration Programmatically |
| |
| The {{{./apidocs/org/apache/any23/configuration/Configuration.html} Configuration}} |
| properties can be read by retrieving the configuration <<singleton>> instance.\ |
| This instance is <immutable>: |
| |
| +---------------------------------------------------------------------------------------------- |
| final Configuration immutableConf = DefaultConfiguration.singleton(); |
| final String propertyValue = immutableConf.getProperty("propertyName", "default value"); |
| ... |
| +---------------------------------------------------------------------------------------------- |
| |
| To obtain a <modifiable> {{{./apidocs/org/apache/any23/configuration/Configuration.html} Configuration}} |
| instead, use the <<copy()>> method.\ |
| One of the <<Any23>> constructors accepts a <<Configuration>> object, which customizes the behavior |
| of the <<Any23>> instance for its entire life-cycle. |
| |
| +---------------------------------------------------------------------------------------------- |
| final ModifiableConfiguration modifiableConf = DefaultConfiguration.copy(); |
| final String oldPropertyValue = modifiableConf.setProperty("propertyName", "new property value"); |
| final Any23 any23 = new Any23(modifiableConf, "extractor1", ...); |
| ... |
| +---------------------------------------------------------------------------------------------- |
| |
| * Use of ExtractionParameters |
| |
| It is possible to customize the behavior of a single data extraction by providing an |
| {{{./apidocs/org/apache/any23/extractor/ExtractionParameters.html} ExtractionParameters}} |
| instance to one of the <Any23#extract()> methods that accept it. <<ExtractionParameters>> allows customizing any <property> and <flag> |
| other than the <<specific extraction options>>.\ |
| If no custom parameters are specified, the default configuration values are used. |
| |
| +---------------------------------------------------------------------------------------------- |
| final Any23 any23 = ... |
| final TripleHandler tripleHandler = ... |
| final ExtractionParameters extractionParameters = ExtractionParameters.getDefault(); |
| extractionParameters.setFlag("any23.microdata.strict", true); |
| any23.extract(extractionParameters, "http://path/to/doc", tripleHandler); |
| +---------------------------------------------------------------------------------------------- |
| |
| * Apache Any23 Core Module Default Configuration |
| |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| | Property Name | Default Property Value |Description | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| | any23.core.version | <current any23 core version> |String declaring the Apache Any23 Core module version. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.http.user.agent.default |Apache Any23-CLI |User Agent Name used for HTTP requests. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.http.client.timeout                  |10000 (10 secs)                |Timeout in milliseconds for an HTTP request.                                        | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.http.client.max.connections |5 |Max number of concurrent HTTP connections allowed by the internal Apache Any23 HTTP client.| |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.rdfa.extractor.xslt |rdfa.xslt |XSLT Stylesheet to be used to perform HTML to RDF extraction of RDFa. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.metadata.timesize |off (possible values: on/off) |Activates/deactivates the generation of time and size metadata triples. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.metadata.nesting |on (possible values: on/off) |Activates/deactivates the generation of nesting triples for Microformat entities. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.metadata.domain.per.entity|on (possible values: on/off) |Activates/deactivates the generation of domain triple per entity. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.rdfa.programmatic         |on  (possible values: on/off)  |Switches between the programmatic RDFa 1.1 Extractor and the RDFa 1.0 XSLT-based one.| |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.context.iri               |?(means current document IRI)  |Default value for extraction context IRI.                                           | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.plugin.dirs |./plugins |Directory containing Apache Any23 plugins. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.microdata.strict |on (possible values: on/off) |Activates/deactivates the microdata strict validation. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.microdata.ns.default |http://schema.org/|Microdata default namespace. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.head.meta |on (possible values: on/off) |Activates/deactivates the HTMLMetaExtractor. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.csv.field |, |CSVExtractor field separator. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
| |any23.extraction.csv.comment |# |CSVExtractor line comment marker. | |
| *-------------------------------------------+-------------------------------+------------------------------------------------------------------------------------+ |
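Several properties in the table are flags whose only listed values are <<on>> and <<off>>. The following is a minimal sketch of how such a value could be turned into a boolean, assuming that any other value should be rejected. The class <<FlagPropertySketch>> and the <<parseFlag>> method are hypothetical and are not part of the Apache Any23 API.

```java
import java.util.Properties;

// Hypothetical sketch: interprets on/off flag values as booleans.
public class FlagPropertySketch {

    /**
     * Maps "on" to true and "off" to false.
     * Any other value (including a missing property) is rejected.
     */
    public static boolean parseFlag(Properties props, String name) {
        String value = props.getProperty(name);
        if ("on".equals(value)) {
            return true;
        }
        if ("off".equals(value)) {
            return false;
        }
        throw new IllegalArgumentException(
                "Property '" + name + "' must be 'on' or 'off' but was: " + value);
    }

    public static void main(String[] args) {
        // Values copied from the default configuration table.
        Properties props = new Properties();
        props.setProperty("any23.microdata.strict", "on");
        props.setProperty("any23.extraction.metadata.timesize", "off");

        System.out.println(parseFlag(props, "any23.microdata.strict"));
        System.out.println(parseFlag(props, "any23.extraction.metadata.timesize"));
    }
}
```

Failing fast on an unexpected value is a design choice of this sketch: it surfaces a mistyped flag such as <<true>> or <<yes>> instead of silently treating it as <<off>>.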