apache-stanbol-data-1.1.0
STANBOL-889: copy for tag apache-stanbol-data-1.1.0

git-svn-id: https://svn.apache.org/repos/asf/stanbol/tags/apache-stanbol-data-1.1.0@1432861 13f79535-47bb-0310-9956-ffa450edef68
tree: c2427259531ed248d12449859147f13839e66b26
  1. defaultconfig/
  2. opennlp/
  3. parent/
  4. registries/
  5. sentiment/
  6. sites/
  7. pom.xml
  8. README.md
README.md

Data files and configurations for Stanbol

Introduction

This source repository holds artifacts that are used to provide

  • data files (NER models, Entity data ...)
  • configurations for Stanbol components, well-known datasets, and specific domains

This repository also contains the default configuration used by the default Stanbol Launchers. Have a look at the defaultdata and opennlp bundlelists in the launchers/bundlelists folder. In general, this is a good place to look for examples if you want to build your own launcher.

Management of Data Files

To avoid loading the Subversion repository with large binary files, these artifacts are typically not included and need to be built, precomputed, or downloaded from other sites. See the documentation of the respective module for details.

Modules of this repository tree are typically NOT part of the Stanbol reactor, because they are considered optional and usually require downloading or precomputing resources, which users might not want to do on every build.

Bundles used as the default configuration by the Stanbol Launchers are also available from the included Maven repositories and are downloaded during the normal Stanbol build (if not yet available in the local cache).

OpenNLP

This sub-folder contains bundles that package several OpenNLP models. These bundles contribute the model files to the Stanbol DataFileProvider.

Sites

This sub-folder contains bundles that install ReferencedSites in the Stanbol Entityhub. Typically, such bundles contain only the configuration and not the actual data. For small data sets, however, the index might also be included in the bundle. See the README.md files for details.

Notes

Bundles created by the various modules depend on the following two components:

DataFileProvider Service

The DataFileProvider Service is typically used by components that need to load big binary files into Apache Stanbol. See {stanbol-root}/commons/stanboltools/datafileprovider for details.
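
As a rough illustration of the idea only, the sketch below models a component that asks a lookup service for a data file by name, with bundles acting as sources that contribute such files. All names here (DataFileSource, InMemorySource, Lookup, readText, and the file name en-token.bin) are hypothetical and chosen for this example; this is not the Stanbol DataFileProvider API, and the real service may resolve files differently. See the module referenced above for the actual interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a data file lookup; NOT the Stanbol API.
final class DataFileLookupSketch {

    /** Hypothetical source of data files, e.g. a bundle that ships models. */
    interface DataFileSource {
        /** Returns a stream for the named file, or null if this source lacks it. */
        InputStream open(String fileName) throws IOException;
    }

    /** In-memory source standing in for a bundle that contributes data files. */
    static final class InMemorySource implements DataFileSource {
        private final Map<String, byte[]> files = new HashMap<>();

        InMemorySource add(String fileName, String content) {
            files.put(fileName, content.getBytes(StandardCharsets.UTF_8));
            return this;
        }

        @Override
        public InputStream open(String fileName) {
            byte[] data = files.get(fileName);
            return data == null ? null : new ByteArrayInputStream(data);
        }
    }

    /** Asks each registered source in registration order; first match wins. */
    static final class Lookup {
        private final List<DataFileSource> sources = new ArrayList<>();

        void register(DataFileSource source) {
            sources.add(source);
        }

        InputStream open(String fileName) throws IOException {
            for (DataFileSource source : sources) {
                InputStream in = source.open(fileName);
                if (in != null) {
                    return in;
                }
            }
            throw new IOException("Data file not available: " + fileName);
        }
    }

    /** Reads a named data file fully as UTF-8 text. */
    static String readText(Lookup lookup, String fileName) throws IOException {
        try (InputStream in = lookup.open(fileName)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Lookup lookup = new Lookup();
        // The file name and content are placeholders, not a real model.
        lookup.register(new InMemorySource().add("en-token.bin", "placeholder model bytes"));
        System.out.println(readText(lookup, "en-token.bin"));
    }
}
```

The point of the pattern is that the consuming component only knows the file name; which bundle actually ships the file can change without touching the consumer.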

Bundleprovider

The Bundleprovider is an extension to the Apache Sling installer framework that supports loading multiple configuration files from a single bundle.

It is intended for cases where a single Stanbol module needs to package several configuration files (e.g. the configuration of several OSGi services).

See {stanbol-root}/commons/installer/bundleprovider for details.