blob: 1621fa0307db7bd632a3873a19a860283e3bf3f5 [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[_ugr.tools.uimafit.externalresources]]
= External Resources
An analysis engine often uses some data model.
This may be as simple as word frequency counts or as complex as the model of a parser.
Often these models can become quite large.
If an analysis engine is deployed multiple times in the same pipeline or runs on multiple CPU cores, memory can be saved by using a shared instance of the data model.
UIMA supports such a scenario by so-called external resources.
The following sections illustrates how external resources can be used with uimaFIT.
First create a class for the shared data model.
Usually this class would load its data from some URI and then expose it via its methods.
An example would be to load word frequency counts and to provide a `getFrequency()` method.
In our simple example we do not load anything from the provided URI - we just offer a method to get the URI from which data be loaded.
[source,java]
----
// Simple model that only stores the URI it was loaded from. Normally data
// would be loaded from the URI instead and made accessible through methods
// in this class. This simple example only allows accessing the URI.
public static final class SharedModel implements SharedResourceObject {
private String uri;
public void load(DataResource aData)
throws ResourceInitializationException {
uri = aData.getUri().toString();
}
public String getUri() { return uri; }
}
----
== Resource injection
=== Regular UIMA components
When an external resource is used in a regular UIMA component, it is usually fetched from the context, cast and copied to a class member variable.
[source,java]
----
class MyAnalysisEngine extends CasAnnotator_ImplBase {
final static String MODEL_KEY = "Model";
private SharedModel model;
public void initialize(UimaContext context)
throws ResourceInitializationException {
configuredResource = (SharedModel)
getContext().getResourceObject(MODEL_KEY);
}
}
----
uimaFIT can be used to inject external resources into such traditional components using the `createDependencyAndBind()` method.
To show that this works with any off-the-shelf UIMA component, the following example uses uimaFIT to configure the OpenNLP Tokenizer:
[source,java]
----
// Create descriptor
AnalysisEngineDescription tokenizer = createEngineDescription(
Tokenizer.class,
UimaUtil.TOKEN_TYPE_PARAMETER, Token.class.getName(),
UimaUtil.SENTENCE_TYPE_PARAMETER, Sentence.class.getName());
// Create the external resource dependency for the model and bind it
createDependencyAndBind(tokenizer, UimaUtil.MODEL_PARAMETER,
TokenizerModelResourceImpl.class,
"http://opennlp.sourceforge.net/models-1.5/en-token.bin");
----
[NOTE]
====
We recommend declaring parameter constants in the classes that use them, e.g.
here in `Tokenizer`.
This way, the parameters for a class can be found easily.
However, OpenNLP declares parameters centrally in `UimaUtil`.
Thus, the example above is correct, although unconventional.
====
[NOTE]
====
Note that uimaFIT is unable to perform type-coercion on parameters if a descriptor is created from a class that does not contain `@ConfigurationParameter` annotations, such as the OpenNLP `Tokenizer`.
Such a descriptor does not contain any parameter declarations! However, it is still possible to configure such a component using uimaFIT by passing exactly the expected types as parameter values.
Thus, we need use the `getName()` method to get the class name as a string, instead of simply passing the class itself.
Also, setting multi-valued parameter from a list or single value does not work here.
Multi-values parameters must be passed as an array of the required type.
Only the default UIMA types are possible: `String`, `boolean`, `int`, and `float`.
====
=== uimaFIT-aware components
uimaFIT provides the `@ExternalResource` annotation to inject external resources directly into class member variables.
.`@ExternalResource` annotation
[cols="1,1,1", frame="all", options="header"]
|===
| Parameter
| Description
| Default
|key
|Resource key
|field name
|api
|Used when the external resource type is different from the field type, e.g.
when using an ExternalResourceLocator
|field type
|mandatory
|Whether a value must be specified
|true
|===
[source,java]
----
// Example annotator that uses the SharedModel. In the process() we only
// test if the model was properly initialized by uimaFIT
public static class Annotator
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase {
final static String MODEL_KEY = "Model";
@ExternalResource(key = MODEL_KEY)
private SharedModel model;
public void process(JCas aJCas) throws AnalysisEngineProcessException {
assertTrue(model.getUri().endsWith("gene_model_v02.bin"));
// Prints the instance ID to the console - this proves the same
// instance of the SharedModel is used in both Annotator instances.
System.out.println(model);
}
}
----
Note, that it is no longer necessary to implement the `initialize()` method.
uimaFIT takes care of locating the external resource `Model` in the UIMA context and assigns it to the field `model`.
If a mandatory resource is not present in the context, an exception is thrown.
The resource injection mechanism is implemented in the `ExternalResourceInitializer` class.
uimaFIT provides several base classes that already come with an `initialize()` method using the initializer:
* `CasAnnotator_ImplBase`
* `CasCollectionReader_ImplBase`
* `CasConsumer_ImplBase`
* `CasFlowController_ImplBase`
* `CasMultiplier_ImplBase`
* `JCasAnnotator_ImplBase`
* `JCasCollectionReader_ImplBase`
* `JCasConsumer_ImplBase`
* `JCasFlowController_ImplBase`
* `JCasMultiplier_ImplBase`
* `Resource_ImplBase`
When building a pipeline, external resources can be set of a component just like configuration parameters.
External resources and configuration parameters can be mixed and appear in any order when creating a component description.
Note that in the following example, we create only one external resource description and use it to configure two different analysis engines.
Because we only use a single description, also only a single instance of the external resource is created and shared between the two engines.
[source,java]
----
ExternalResourceDescription extDesc = createSharedResourceDescription(
SharedModel.class, new File("somemodel.bin"));
// Binding external resource to each Annotator individually
AnalysisEngineDescription aed1 = createEngineDescription(
Annotator.class,
Annotator.MODEL_KEY, extDesc);
AnalysisEngineDescription aed2 = createEngineDescription(
Annotator.class,
Annotator.MODEL_KEY, extDesc);
// Check the external resource was injected
AnalysisEngineDescription aaed = createEngineDescription(aed1, aed2);
AnalysisEngine ae = createEngine(aaed);
ae.process(ae.newJCas());
----
This example is given as a full JUnit-based example in the the _uimaFIT-examples_ project.
=== Resources extending Resource_ImplBase
One kind of resources extend `Resource_ImplBase`.
These are the easiest to handle, because uimaFIT's version of `Resource_ImplBase` already implements the necessary logic.
Just be sure to call `super.initialize()` when overriding `initialize()`.
Also mind that external resources are not available yet when `initialize()` is called.
For any initialization logic that requires resources, override and implement `afterResourcesInitialized()`.
Other than that, injection of external resources works as usual.
[source,java]
----
public static class ChainableResource extends Resource_ImplBase {
public final static String PARAM_CHAINED_RESOURCE = "chainedResource";
@ExternalResource(key = PARAM_CHAINED_RESOURCE)
private ChainableResource chainedResource;
public void afterResourcesInitialized() {
// init logic that requires external resources
}
}
----
=== Resources implementing SharedResourceObject
The other kind of resources implement `SharedResourceObject``.
Since this is an interface, uimaFIT cannot provide the initialization logic, so you have to implement a couple of things in the resource:
* implement `ExternalResourceAware`
* declare a configuration parameter `ExternalResourceFactory.PARAM_RESOURCE_NAME` and return its value in `getResourceName()`
* invoke `ConfigurationParameterInitializer.initialize()` in the `load()` method.
Again, mind that external resource not properly initialized until uimaFIT invokes `afterResourcesInitialized()`.
[source,java]
----
public class TestSharedResourceObject implements
SharedResourceObject, ExternalResourceAware {
@ConfigurationParameter(name=ExternalResourceFactory.PARAM_RESOURCE_NAME)
private String resourceName;
public final static String PARAM_CHAINED_RESOURCE = "chainedResource";
@ExternalResource(key = PARAM_CHAINED_RESOURCE)
private ChainableResource chainedResource;
public String getResourceName() {
return resourceName;
}
public void load(DataResource aData)
throws ResourceInitializationException {
ConfigurationParameterInitializer.initialize(this, aData);
// rest of the init logic that does not require external resources
}
public void afterResourcesInitialized() {
// init logic that requires external resources
}
}
----
=== Note on injecting resources into resources
Nested resources are only initialized if they are used in a pipeline which contains at least one component that calls `ConfigurationParameterInitializer.initialize()`.
Any component extending uimaFIT's component base classes qualifies.
If you use nested resources in a pipeline without any uimaFIT-aware components, you can just add uimaFIT's `NoopAnnotator` to the pipeline.
== Resource locators
Normally, in UIMA an external resource needs to implement either `SharedResourceObject` or `Resource`.
In order to inject arbitrary objects, uimaFIT has the concept of `ExternalResourceLocator`.
When a resource implements this interface, not the resource itself is injected, but the method `getResource()` is called on the resource and the result is injected.
The following example illustrates how to inject an object from JNDI into a UIMA component:
[source,java]
----
class MyAnalysisEngine2 extends JCasAnnotator_ImplBase {
static final String RES_DICTIONARY = "dictionary";
@ExternalResource(key = RES_DICTIONARY)
Dictionary dictionary;
}
AnalysisEngineDescription desc = createEngineDescription(
MyAnalysisEngine2.class);
bindResource(desc, MyAnalysisEngine2.RES_DICTIONARY,
JndiResourceLocator.class,
JndiResourceLocator.PARAM_NAME, "dictionaries/german");
----