blob: e7a30db640f74ca36198d2002dff866cbdb4b0de [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[ugr.ref.resources]]
= UIMA Resources
// <titleabbrev>UIMA Resources</titleabbrev>
[[ugr.ref.resources.overview]]
== What is a UIMA Resource?
UIMA uses the term `Resource` to describe all UIMA components that can be acquired by an application or by other resources.
.Resource Kinds
image::images/references/ref.resources/res_resource_kinds.png["Resource Kinds, a partial list"]
There are many kinds of resources; here's a list of the main kinds:
*Annotator*::
a user written component, receives a CAS, does some processing, and returns the possibly updated CAS.
Variants include CollectionReaders, CAS Consumers, CAS Multipliers.
*Flow Controller*::
a user written component controlling the flow of CASes within an aggregate.
*External Resource*::
a user written component.
Variants include:
+
* Data - includes special lifecycle call to load data
* Parameterized - allows multiple instantiations with simple string parameter variants; example: a dictionary, that has variants in content for different languages
* Configurable - supports configuration from the XML specifier
[[ugr.ref.resources.resource_inner_implementations]]
=== Resource Inner Implementations
Many of the resource kinds include in their specification a (possibly optional) element, which is the name of a Java class which implements the resource.
We will call this class the "inner implementation".
The UIMA framework creates instances of Resource from resource specifiers, by calling the framework's `produceResource(specifier, additional_parameters)` method.
This call produces a instance of Resource.
____
For example, calling produceResource on an AnalysisEngineDescription produces an instance of AnalysisEngine.
This, in turn will have a reference to the user-written inner implementation class.
specified by the ``annotatorImplementationName``.
External resource descriptors may include an `implementationName` element.
Calling produceResource on a ExternalResourceDescription produces an instance of Resource; the resource obtained by subsequent calls to `getResource(...)` is dependent on the particular descriptor, and may be an instance of the inner implementation class.
____
For external resources, each resource specifier kind handles the case where the inner implementation is omitted.
If it is supplied, the named class must implement the interface specified in the bindings for this resource.
In addition, the particular specifier kind may further restrict the kinds of classes the user supplies as the implementationName.
Some examples of this further restriction:
*customResource*::
the class must also implement the Resource interface
*dataResource*::
the class must also implement the SharedResourceObject interface
[[ugr.ref.resources.sharing_across_pipelines]]
== Sharing Resources, even across pipelines
// <titleabbrev>Sharing Resources</titleabbrev>
UIMA applications run one or more UIMA Pipelines.
Each pipeline has a top-level Analysis Engine, which may be an aggregation of many other Analysis Engine components.
The UIMA framework instantiates Annotator resources as specified to configure the pipelines.
Sometimes, many identical pipelines are created (for example, in order to exploit multi-core hardware by processing multiple CASes in parallel). In this case, the framework would produce multiple instances of those Annotation resources; these are implemented as multiple instances of the same Java class.
Sets of External Resources plus a CAS Pool and UIMA Extension ClassLoader are set up and kept, per instance of a ResourceManager; this instance serves to allow sharing of these items across one or more pipelines.
* The UIMA Extension ClassLoader (if specified) is used to find the resources to be loaded by the framework
* The `External Resources` are specified by a pipeline's resource configuration.
* The CAS Pool is a pool of CASs all with identical type systems and index definitions, associated with a pipeline.
When setting up a pipeline, the UIMA Framework's `produceResource` or one of its specialized variants is called, and a new ResourceManager being created and used for that pipeline.
However, in many cases, it may be advantageous to share the same Resources across multiple pipelines; this is easily doable by passing a common instance of the ResourceManager to the pipeline creation methods (using the additional parameters of the produceResource method).
To handle additional use cases, the ResourceManager has a `copy()` method which creates a copy of the Resource Manager instance.
The new instance is created with a null CAS Manager; if you want to share the the CAS Pool, you have to copy the CAS Manager: ``newRM.setCasManager(originalRM.getCasManager())``.
You also may set the Extension Class Loader in the new instance (PEAR wrappers use this to allow PEARs to have their own classpath). See the Javadocs for details.
[[ugr.ref.resources.external_resource_multiple_parameterized_instances]]
== External Resources support for multiple Parameterized Instances
A typical external resource gets a single instantiation, shared with all users of a particular ResourceManager.
Sometimes, multiple instantiations may be useful (of the same resource). The framework supports this for ParameterizedDataResources.
There's one kind supplied with UIMA - the fileLanguageResourceSpecifier.
This works by having each call to getResource(name, extra_keys[]) use the extra keys to select a particular instance.
On the first call for a particular instance, the named resource uses the extra keys to initialize a new instance by calling its `load` method with a data resource derived from the extra keys by the named resource.
For example, the fileLanguageResourceSpecifier uses the language code and goes through a process with lots of defaulting and fall back to find a resource to load, based on the language code.