blob: b845ad3cd475d4af8b870122ccd9fc70bda94d13 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/references/ref.resources/">
<!ENTITY tp "ugr.ref.resources.">
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ugr.ref.resources">
<title>UIMA Resources</title>
<titleabbrev>UIMA Resources</titleabbrev>
<section id="ugr.ref.resources.overview">
<title>What is a UIMA Resource?</title>
<para>UIMA uses the term <code>Resource</code> to describe all UIMA components
that can be acquired by an application or by other resources.</para>
<figure id="ref.resource.fig.kinds">
<title>Resource Kinds</title>
<mediaobject>
<imageobject>
<imagedata width="3in" format="PNG" fileref="&imgroot;res_resource_kinds.png"/>
</imageobject>
<textobject><phrase>Resource Kinds, a partial list</phrase>
</textobject>
</mediaobject>
</figure>
<para>There are many kinds of resources; here's a list of the main kinds:
<variablelist>
<varlistentry>
<term><emphasis role="strong">Annotator</emphasis></term>
<listitem><para>a user written component, receives a CAS, does some processing, and returns the possibly
updated CAS. Variants include CollectionReaders, CAS Consumers, CAS Multipliers.</para></listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">Flow Controller</emphasis></term>
<listitem><para>a user written component controlling the flow of CASes within an aggregate.</para></listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">External Resource</emphasis></term>
<listitem><para>a user written component. Variants include:
<itemizedlist spacing="compact">
<listitem><para>Data - includes special lifecycle call to load data</para></listitem>
<listitem><para>Parameterized - allows multiple instantiations with simple string parameter variants;
example: a dictionary, that has variants in content for different languages</para></listitem>
<listitem><para>Configurable - supports configuration from the XML specifier</para></listitem>
</itemizedlist>
</para></listitem>
</varlistentry>
</variablelist>
</para>
<section id="ugr.ref.resources.resource-inner-implementations">
<title>Resource Inner Implementations</title>
<para>Many of the resource kinds include in their specification a (possibly optional) element, which is
the name of a Java class which implements the resource. We will call this class the "inner implementation".</para>
<para>The UIMA framework creates instances of Resource from resource specifiers, by calling
the framework's <code>produceResource(specifier, additional_parameters)</code> method.
This call produces a instance of Resource. </para>
<blockquote>
<para>
For example, calling produceResource on an AnalysisEngineDescription produces an instance of
AnalysisEngine. This, in turn will have a reference to the user-written inner implementation class.
specified by the <code>annotatorImplementationName</code>.
</para>
<para>External resource descriptors may include an <code>implementationName</code> element.
Calling produceResource on a ExternalResourceDescription produces an instance of Resource;
the resource obtained by subsequent calls to <code>getResource(...)</code>
is dependent on the particular descriptor, and may be an instance of
the inner implementation class.
</para>
</blockquote>
<para>For external resources, each resource specifier kind handles the case where
the inner implementation is omitted. If it is supplied, the named class must implement
the interface specified in the bindings for this resource. In addition, the particular specifier kind may
further restrict the kinds of classes the user supplies as the implementationName.
</para>
<para>Some examples of this further restriction:
<variablelist>
<varlistentry>
<term><emphasis role="strong">customResource</emphasis></term>
<listitem><para>the class must also implement the Resource interface</para></listitem>
</varlistentry>
<varlistentry>
<term><emphasis role="strong">dataResource</emphasis></term>
<listitem><para>the class must also implement the SharedResourceObject interface</para></listitem>
</varlistentry>
</variablelist>
</para>
</section>
</section>
<section id="ugr.ref.resources.sharing-across-pipelines">
<title>Sharing Resources, even across pipelines</title>
<titleabbrev>Sharing Resources</titleabbrev>
<para>UIMA applications run one or more UIMA Pipelines. Each pipeline has a top-level Analysis Engine, which
may be an aggregation of many other Analysis Engine components. The UIMA framework instantiates Annotator
resources as specified to configure the pipelines.</para>
<para>Sometimes, many identical pipelines are created (for example,
in order to exploit multi-core hardware by processing multiple CASes in parallel). In this case, the framework
would produce multiple instances of those Annotation resources; these are implemented as multiple instances
of the same Java class.</para>
<para>Sets of External Resources plus a CAS Pool and UIMA Extension ClassLoader are set up and kept,
per instance of a ResourceManager;
this instance serves to allow sharing of these items across one or more pipelines.
<itemizedlist>
<listitem>
<para>The UIMA Extension ClassLoader (if specified) is used to find the resources to be loaded
by the framework</para>
</listitem>
<listitem>
<para>The <code>External Resources</code> are specified by a pipeline's resource configuration.</para>
</listitem>
<listitem>
<para>The CAS Pool is a pool of CASs all with identical type systems and index definitions, associated
with a pipeline.</para>
</listitem>
</itemizedlist> </para>
<para>When setting up a pipeline, the UIMA Framework's <code>produceResource</code>
or one of its specialized variants is called, and a new
ResourceManager being created and used for that pipeline. However, in many cases, it may be advantageous to
share the same Resources across multiple pipelines; this is easily doable by passing a common instance of the
ResourceManager to the pipeline creation methods (using the additional parameters of the produceResource method).</para>
<para>
To handle additional use cases, the ResourceManager has a <code>copy()</code> method which creates a copy of the
Resource Manager instance. The new instance is created with a null CAS Manager; if you want to share the
the CAS Pool, you have to copy the CAS Manager: <code>newRM.setCasManager(originalRM.getCasManager())</code>.
You also may set the Extension Class Loader in the new instance (PEAR wrappers use this to allow
PEARs to have their own classpath). See the Javadocs for details.
</para>
</section>
<section id="ugr.ref.resources.external-resource-multiple-parameterized-instances">
<title>External Resources support for multiple Parameterized Instances</title>
<para>A typical external resource gets a single instantiation, shared with all users of a particular
ResourceManager.
Sometimes, multiple instantiations may be useful (of the same resource). The framework supports this for
ParameterizedDataResources. There's one kind supplied with UIMA - the fileLanguageResourceSpecifier.
This works by having each call to getResource(name, extra_keys[]) use the extra keys to select a particular
instance. On the first call for a particular instance, the named resource uses the extra keys to
initialize a new instance by calling its <code>load</code> method with a data resource derived from the
extra keys by the named resource.
</para>
<para>For example, the fileLanguageResourceSpecifier uses the language code and goes through
a process with lots of defaulting and fall back to find a resource to load, based on the language code.
</para>
</section>
</chapter>