<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/tutorials_and_users_guides/tug.fc/">
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ugr.tug.fc">
<title>Flow Controller Developer's Guide</title>
<para>A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the
Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that
CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.</para>
<para>Flow Controllers may decide the flow dynamically, based on the contents of the CAS. For example,
you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then,
based on the output of that annotator, routes the CAS to an Annotator that is specialized
for the identified language.</para>
<section id="ugr.tug.fc.developing_fc_code">
<title>Developing the Flow Controller Code</title>
<section id="ugr.tug.fc.fc_interface_overview">
<title>Flow Controller Interface Overview</title>
<para>Flow Controller implementations should extend the
<literal>JCasFlowController_ImplBase</literal> or
<literal>CasFlowController_ImplBase</literal> class, depending on which CAS interface they prefer
to use. As with other types of components, the Flow Controller ImplBase classes define optional
<literal>initialize</literal>, <literal>destroy</literal>, and <literal>reconfigure</literal>
methods. They also define the required method <literal>computeFlow</literal>.</para>
<para>The <literal>computeFlow</literal> method is called by the framework whenever a new CAS enters the
Aggregate Analysis Engine. It is given the CAS as an argument and must return an object that implements the
<literal>Flow</literal> interface (the Flow object). The Flow Controller developer must define this
object. It is the object that is responsible for routing this particular CAS through the components of the
Aggregate Analysis Engine. For convenience, the framework provides basic implementations of flow objects
in the classes <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal>; use the JCas
one if you are using the JCas interface to the CAS.</para>
<para>The framework then calls the Flow object's <literal>next()</literal> method, which returns
a <literal>Step</literal> object (implemented by the UIMA Framework) that indicates what to do
with this CAS next. There are three types of steps currently supported:</para>
<itemizedlist>
<listitem>
<para><literal>SimpleStep</literal>, which specifies a single Analysis Engine that should receive
the CAS next.</para>
</listitem>
<listitem>
<para><literal>ParallelStep</literal>, which specifies that multiple Analysis Engines should
receive the CAS next, and that the relative order in which these Analysis Engines execute does not
matter. Logically, they can run in parallel. The runtime is not obligated to actually execute them in
parallel, however, and the current implementation will execute them serially in an arbitrary
order.</para>
</listitem>
<listitem>
<para><literal>FinalStep</literal>, which indicates that the flow is completed.</para>
</listitem>
</itemizedlist>
<para>After executing the step, the framework will call the Flow object's <literal>next()</literal>
method again to determine the next destination, and this will be repeated until the Flow object indicates
that processing is complete by returning a <literal>FinalStep</literal>.</para>
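<para>The calling sequence described above can be sketched in plain Java. The following is a minimal model,
not UIMA code: <literal>DemoStep</literal>, <literal>DemoSimpleStep</literal>,
<literal>DemoFinalStep</literal>, <literal>DemoFlow</literal>, <literal>FixedOrderFlow</literal>, and
<literal>RoutingLoop</literal> are hypothetical stand-ins invented for this illustration. The loop only
approximates what the framework does; it assumes every non-final step is a simple step and ignores parallel
steps and error handling.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the UIMA classes.
interface DemoStep {}

/** Stand-in for a step that names the single delegate to call next. */
final class DemoSimpleStep implements DemoStep {
  final String aeKey;

  DemoSimpleStep(String aeKey) {
    this.aeKey = aeKey;
  }
}

/** Stand-in for the step that ends the flow. */
final class DemoFinalStep implements DemoStep {}

/** Stand-in for a Flow object: one instance routes one CAS. */
interface DemoFlow {
  DemoStep next();
}

/** A flow that visits a fixed list of delegate keys in order. */
final class FixedOrderFlow implements DemoFlow {
  private final Iterator<String> remaining;

  FixedOrderFlow(List<String> aeKeys) {
    this.remaining = aeKeys.iterator();
  }

  @Override
  public DemoStep next() {
    if (remaining.hasNext()) {
      return new DemoSimpleStep(remaining.next());
    }
    return new DemoFinalStep();
  }
}

/** Models the framework side: keep asking the flow for steps until it is done. */
final class RoutingLoop {
  static List<String> route(DemoFlow flow) {
    List<String> visited = new ArrayList<>();
    DemoStep step = flow.next();
    while (!(step instanceof DemoFinalStep)) {
      // A real framework would invoke the delegate here; the model only records its key.
      visited.add(((DemoSimpleStep) step).aeKey);
      step = flow.next();
    }
    return visited;
  }
}]]></programlisting>
<para>In this model, routing a <literal>FixedOrderFlow</literal> built from the keys
<literal>Tokenizer</literal> and <literal>Tagger</literal> visits the two keys in that order and then stops,
because the third call to <literal>next()</literal> returns the final step.</para>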
<para>The Flow Controller has access to a <literal>FlowControllerContext</literal>, which is a subtype of
<literal>UimaContext</literal>. In addition to the configuration parameter and resource access
provided by a <literal>UimaContext</literal>, the <literal>FlowControllerContext</literal> also
gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most
Flow Controllers will need to use this information to make routing decisions. You can get a handle to the
<literal>FlowControllerContext</literal> by calling the <literal>getContext()</literal> method
defined in <literal>JCasFlowController_ImplBase</literal> and
<literal>CasFlowController_ImplBase</literal>. You can then call the
<literal>FlowControllerContext.getAnalysisEngineMetaDataMap</literal> method to get a
map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as
the delegate analysis engine keys specified in the aggregate descriptor, and the values are the
corresponding <literal>AnalysisEngineMetaData</literal> objects.</para>
<para>Finally, the Flow Controller has optional methods <literal>addAnalysisEngines</literal> and
<literal>removeAnalysisEngines</literal>. These methods are intended to notify the Flow Controller if
new Analysis Engines are available to route CASes to, or if previously available Analysis Engines are no
longer available. However, the current version of the Apache UIMA framework does not support dynamically
adding or removing Analysis Engines to/from an aggregate, so these methods are not currently called. Future
versions may support this feature.</para>
</section>
<section id="ugr.tug.fc.example_code">
<title>Example Code</title>
<para>This section walks through the source code of an example Flow Controller that simulates a simple version
of the <quote>Whiteboard</quote> flow model. At each step of the flow, the Flow Controller looks at all of the
available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are
satisfied.</para>
<para>The Java class for the example is
<literal>org.apache.uima.examples.flow.WhiteboardFlowController</literal> and the source code is
included in the UIMA SDK under the <literal>examples/src</literal> directory.</para>
<section id="ugr.tug.fc.whiteboard">
<title>The WhiteboardFlowController Class</title>
<programlisting>import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.flow.CasFlowController_ImplBase;
import org.apache.uima.flow.CasFlow_ImplBase;
import org.apache.uima.flow.Flow;

public class WhiteboardFlowController
        extends CasFlowController_ImplBase {
  public Flow computeFlow(CAS aCAS)
          throws AnalysisEngineProcessException {
    WhiteboardFlow flow = new WhiteboardFlow();
    // As of release 2.3.0, the following is not needed,
    // because the framework does this automatically
    // flow.setCas(aCAS);
    return flow;
  }

  class WhiteboardFlow extends CasFlow_ImplBase {
    // Discussed Later
  }
}</programlisting>
<para>The <literal>WhiteboardFlowController</literal> extends
<literal>CasFlowController_ImplBase</literal> and implements the
<literal>computeFlow</literal> method. The implementation of the <literal>computeFlow</literal>
method is very simple; it just constructs a new <literal>WhiteboardFlow</literal> object that will be
responsible for routing this CAS. The framework gives the new Flow object a handle to that CAS,
which the Flow object will later use to make its routing decisions.</para>
<para>Note that there is one instance of <literal>WhiteboardFlow</literal> per CAS, so there is no
confusion when multiple CASes are being processed simultaneously.</para>
</section>
<section id="ugr.tug.fc.whiteboardflow">
<title>The WhiteboardFlow Class</title>
<programlisting>class WhiteboardFlow extends CasFlow_ImplBase {
  private Set mAlreadyCalled = new HashSet();

  public Step next() throws AnalysisEngineProcessException {
    // Get the CAS that this Flow object is responsible for routing.
    // Each Flow instance is responsible for a single CAS.
    CAS cas = getCas();

    // iterate over available AEs
    Iterator aeIter = getContext().getAnalysisEngineMetaDataMap().
            entrySet().iterator();
    while (aeIter.hasNext()) {
      Map.Entry entry = (Map.Entry) aeIter.next();
      // skip AEs that were already called on this CAS
      String aeKey = (String) entry.getKey();
      if (!mAlreadyCalled.contains(aeKey)) {
        // check for satisfied input capabilities
        // (i.e. the CAS contains at least one instance
        // of each required input)
        AnalysisEngineMetaData md =
                (AnalysisEngineMetaData) entry.getValue();
        Capability[] caps = md.getCapabilities();
        boolean satisfied = true;
        for (int i = 0; i &lt; caps.length; i++) {
          satisfied = inputsSatisfied(caps[i].getInputs(), cas);
          if (satisfied)
            break;
        }
        if (satisfied) {
          mAlreadyCalled.add(aeKey);
          if (getContext().getLogger().isLoggable(Level.FINEST)) {
            getContext().getLogger().log(Level.FINEST,
                    "Next AE is: " + aeKey);
          }
          return new SimpleStep(aeKey);
        }
      }
    }
    // no appropriate AEs to call - end of flow
    getContext().getLogger().log(Level.FINEST, "Flow Complete.");
    return new FinalStep();
  }

  private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
    // implementation detail; see the actual source code
  }
}</programlisting>
<para>Each instance of <literal>WhiteboardFlow</literal> is responsible for routing a
single CAS. A handle to the CAS instance is available by calling the <literal>getCas()</literal> method,
which is a standard method defined on the <literal>CasFlow_ImplBase</literal> superclass.</para>
<para>Each time the <literal>next</literal> method is called, the Flow object iterates over the metadata
of all of the available Analysis Engines (obtained via the call to
<literal>getContext().getAnalysisEngineMetaDataMap()</literal>) and checks whether the input types declared in an
<literal>AnalysisEngineMetaData</literal> object are satisfied by the CAS (that is, the CAS contains at least one instance of
each declared input type). The exact details of checking for instances of types in the CAS are not discussed
here; see the WhiteboardFlowController.java file for the complete source.</para>
<para>When the Flow object decides which Analysis Engine should be called next, it indicates this by
creating a <literal>SimpleStep</literal> object with the key for that Analysis Engine and returning it:</para>
<programlisting>return new SimpleStep(aeKey);</programlisting>
<para>The Flow object keeps a record of which Analysis Engines it has invoked in the
<literal>mAlreadyCalled</literal> field, and never invokes the same Analysis Engine twice. Note this
is not a hard requirement. It is acceptable to design a Flow Controller that invokes the same Analysis
Engine more than once. However, if you do this you must make sure that the flow will eventually
terminate.</para>
<para>If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals
the end of the flow by returning a <literal>FinalStep</literal> object:</para>
<programlisting>return new FinalStep();</programlisting>
<para>Also, note the use of the logger to write tracing messages indicating the decisions made by the Flow
Controller. This is a good practice that helps with debugging if the Flow Controller is behaving in an
unexpected way.</para>
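<para>To see how the selection rule plays out over a whole flow, it can help to model it without the
framework. The following sketch is not taken from the UIMA SDK: <literal>WhiteboardModel</literal> and its
methods are hypothetical, plain strings stand in for CAS types, and each Analysis Engine is assumed to have a
single set of required inputs (the real example checks every capability set declared in the metadata).</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java model of the whiteboard selection rule. All names are hypothetical:
 * type names stand in for CAS contents, and a map of required type names stands
 * in for the Analysis Engine metadata.
 */
final class WhiteboardModel {
  private final Map<String, Set<String>> requiredInputs; // AE key -> required type names
  private final Set<String> alreadyCalled = new HashSet<>();

  WhiteboardModel(Map<String, Set<String>> requiredInputs) {
    this.requiredInputs = requiredInputs;
  }

  /** Returns the key of the next AE to call, or null when the flow is complete. */
  String next(Set<String> typesInCas) {
    for (Map.Entry<String, Set<String>> entry : requiredInputs.entrySet()) {
      String aeKey = entry.getKey();
      // skip AEs already called; pick the first one whose inputs are all present
      if (!alreadyCalled.contains(aeKey) && typesInCas.containsAll(entry.getValue())) {
        alreadyCalled.add(aeKey);
        return aeKey;
      }
    }
    return null; // plays the role of returning a final step
  }

  /** Runs a whole flow; outputs maps each AE key to the type names it adds to the CAS. */
  static List<String> run(Map<String, Set<String>> inputs, Map<String, Set<String>> outputs) {
    WhiteboardModel flow = new WhiteboardModel(inputs);
    Set<String> typesInCas = new HashSet<>();
    List<String> order = new ArrayList<>();
    String aeKey;
    while ((aeKey = flow.next(typesInCas)) != null) {
      order.add(aeKey);
      typesInCas.addAll(outputs.getOrDefault(aeKey, new HashSet<>()));
    }
    return order;
  }
}]]></programlisting>
<para>Because each Analysis Engine is called at most once and every call to <literal>next</literal> either
marks one more engine as called or ends the flow, this model always terminates. An engine whose inputs are
never produced is simply never called. Pass a map with a stable iteration order (for example a
<literal>LinkedHashMap</literal>) if the choice among several eligible engines should be repeatable.</para>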
</section>
</section>
</section>
<section id="ugr.tug.fc.creating_fc_descriptor">
<title>Creating the Flow Controller Descriptor</title>
<para>To create a Flow Controller Descriptor in the CDE, use File → New → Other
→ UIMA → Flow Controller Descriptor File:
<screenshot>
<mediaobject>
<imageobject>
<imagedata width="5.5in" format="JPG" fileref="&imgroot;image002.jpg"/>
</imageobject>
<textobject><phrase>Screenshot of Eclipse new object wizard showing Flow Controller</phrase></textobject>
</mediaobject>
</screenshot></para>
<para>This will bring up the Overview page for the Flow Controller Descriptor:
<screenshot>
<mediaobject>
<imageobject>
<imagedata width="5.5in" format="JPG" fileref="&imgroot;image004.jpg"/>
</imageobject>
<textobject><phrase>Screenshot of Component Descriptor Editor Overview page for new Flow Controller</phrase></textobject>
</mediaobject>
</screenshot></para>
<para>Type in the name of the Java class that implements the Flow Controller, or use the <quote>Browse</quote> button
to select it. You must select a Java class that implements the <literal>FlowController</literal>
interface.</para>
<para>Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors; for
example, you can specify configuration parameters and external resources if you wish.</para>
<para>If you wish to edit a Flow Controller Descriptor by hand, see <olink targetdoc="&uima_docs_ref;"/>
<olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.xml.component_descriptor.flow_controller"/> for the syntax.</para>
</section>
<section id="ugr.tug.fc.adding_fc_to_aggregate">
<title>Adding a Flow Controller to an Aggregate Analysis Engine</title>
<titleabbrev>Adding Flow Controller to an Aggregate</titleabbrev>
<para>To use a Flow Controller you must add it to an Aggregate Analysis Engine. You can only have one Flow
Controller per Aggregate Analysis Engine. In the Component Descriptor Editor, the Flow Controller is
specified on the Aggregate page, as a choice of flow control kind: pick <quote>User-defined Flow</quote>.
When you do, the Browse and Search buttons underneath become active and let you specify an existing Flow
Controller Descriptor; when you select one, it is imported into the aggregate descriptor.
<screenshot>
<mediaobject>
<imageobject>
<imagedata width="4.5in" format="JPG" fileref="&imgroot;image006.jpg"/>
</imageobject>
<textobject><phrase>Screenshot of Component Descriptor Editor Aggregate page showing selecting user-defined flow</phrase></textobject>
</mediaobject>
</screenshot></para>
<para>The key name is created automatically from the name element in the Flow Controller Descriptor being
imported. If you need to change this name, you can do so by switching to the <quote>Source</quote> view using the
bottom tabs, and editing the name in the XML source.</para>
<para>If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:
<programlisting>  &lt;delegateAnalysisEngineSpecifiers&gt;
    ...
  &lt;/delegateAnalysisEngineSpecifiers&gt;
  <emphasis role="bold">&lt;flowController key=<quote>[String]</quote>&gt;
    &lt;import .../&gt;
  &lt;/flowController&gt;</emphasis></programlisting></para>
<para>As usual, you can use either an import by location or an import by name; see <olink
targetdoc="&uima_docs_ref;"/> <olink
targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor.imports"/>.</para>
<para>The key that you assign to the Flow Controller can be used elsewhere in the Aggregate Analysis Engine
Descriptor: in parameter overrides, resource bindings, and Sofa mappings.</para>
</section>
<section id="ugr.tug.fc.adding_fc_to_cpe">
<title>Adding a Flow Controller to a Collection Processing Engine</title>
<titleabbrev>Adding Flow Controller to CPE</titleabbrev>
<para>Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a
CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis
Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE's deployment and error handling
options can then only be configured for the entire Aggregate Analysis Engine as a unit.</para>
</section>
<section id="ugr.tug.fc.using_fc_with_cas_multipliers">
<title>Using Flow Controllers with CAS Multipliers</title>
<para>If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier
(see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>), there are additional
things you must consider.</para>
<para>When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that
then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call
the <literal>newCasProduced</literal> method on the Flow object that was managing the flow of the parent CAS
(the one that was input to the CAS Multiplier). The <literal>newCasProduced</literal> method must create a new Flow
object that will be responsible for routing the new output CAS.</para>
<para>In the <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal> classes, the
<literal>newCasProduced</literal> method is defined to throw an exception indicating that the Flow
Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS
Multipliers you must override this method.</para>
<para>If your Flow class extends <literal>CasFlow_ImplBase</literal>, the method signature to override is:
<programlisting>protected Flow newCasProduced(CAS newOutputCas, String producedBy)</programlisting>
</para>
<para>If your Flow class extends <literal>JCasFlow_ImplBase</literal>, the method signature to override is:
<programlisting>protected Flow newCasProduced(JCas newOutputCas, String producedBy)</programlisting>
</para>
<para>Also, there is a variant of <literal>FinalStep</literal> which can only be specified for output CASes
produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This
version of <literal>FinalStep</literal> is produced by calling the constructor with a
<literal>true</literal> argument, and it causes the CAS to be immediately released back to the pool. No
further processing will be done on it and it will not be output from the aggregate. This is the way that you can
build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new
CASes to be output from the Aggregate Analysis Engine, you don't need to use this; instead just declare
<literal>&lt;outputsNewCASes&gt;false&lt;/outputsNewCASes&gt;</literal> in your Aggregate Analysis
Engine Descriptor as described in <olink targetdoc="&uima_docs_tutorial_guides;"
targetptr="ugr.tug.cm.aggregate_cms"/>.</para>
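<para>The interplay between <literal>newCasProduced</literal> and the dropping variant of the final step can
be modelled in plain Java. This is only a sketch under stated assumptions, not UIMA code:
<literal>ModelFinalStep</literal>, <literal>ModelFlow</literal>, <literal>SegmentFlow</literal>,
<literal>DocumentFlow</literal>, and <literal>MultiplierModel</literal> are hypothetical stand-ins, a string
stands in for each CAS, the default method throws <literal>UnsupportedOperationException</literal> rather than
a framework exception, and each child flow is assumed to finish immediately.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the UIMA classes.

/** Stand-in for the final step, carrying the "drop this CAS" flag. */
final class ModelFinalStep {
  final boolean forceCasToBeDropped;

  ModelFinalStep(boolean forceCasToBeDropped) {
    this.forceCasToBeDropped = forceCasToBeDropped;
  }
}

/** Stand-in for a Flow object that may be asked to create flows for new CASes. */
abstract class ModelFlow {
  /** Models the default behaviour: flows that do not override this reject new CASes. */
  ModelFlow newCasProduced(String newCasText, String producedBy) {
    throw new UnsupportedOperationException("This flow does not handle CAS Multipliers");
  }

  abstract ModelFinalStep finish();
}

/** Flow for a CAS produced by the modelled CAS Multiplier: drops blank segments. */
final class SegmentFlow extends ModelFlow {
  private final String text;

  SegmentFlow(String text) {
    this.text = text;
  }

  @Override
  ModelFinalStep finish() {
    // true means: release the CAS and do not output it from the aggregate
    return new ModelFinalStep(text.trim().isEmpty());
  }
}

/** Flow for the parent CAS: gives every new output CAS its own SegmentFlow. */
final class DocumentFlow extends ModelFlow {
  @Override
  ModelFlow newCasProduced(String newCasText, String producedBy) {
    return new SegmentFlow(newCasText);
  }

  @Override
  ModelFinalStep finish() {
    return new ModelFinalStep(false);
  }
}

/** Models the framework side for one parent CAS and the CASes produced from it. */
final class MultiplierModel {
  static List<String> outputs(ModelFlow parentFlow, String producedBy, List<String> producedCases) {
    List<String> output = new ArrayList<>();
    for (String cas : producedCases) {
      // the parent's flow decides which flow routes each new CAS
      ModelFlow childFlow = parentFlow.newCasProduced(cas, producedBy);
      if (!childFlow.finish().forceCasToBeDropped) {
        output.add(cas);
      }
    }
    return output;
  }
}]]></programlisting>
<para>In this model only the non-blank segments are output from the aggregate, while a parent flow that does
not override <literal>newCasProduced</literal> fails as soon as the first new CAS appears.</para>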
<para>For more information on how CAS Multipliers interact with Flow Controllers, see
<olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm.cm_and_fc"/>.
</para>
</section>
<section id="ugr.tug.fc.continuing_when_exceptions_occur">
<title>Continuing the Flow When Exceptions Occur</title>
<para>If an exception occurs when processing a CAS, the framework may call the method
<programlisting>boolean continueOnFailure(String failedAeKey, Exception failure)</programlisting>
on the Flow object that was managing the flow of that CAS. If this method returns <literal>true</literal>, then
the framework may continue to call the <literal>next()</literal> method to continue routing the CAS. If this
method returns <literal>false</literal> (the default), the framework will not make any more calls to the
<literal>next()</literal> method.</para>
<para>In the case where the last Step was a ParallelStep, if at least one of the destinations resulted in a failure,
then <literal>continueOnFailure</literal> will be called to report one of the failures. If this method
returns true, but one of the other destinations in the ParallelStep resulted in a failure, then the
<literal>continueOnFailure</literal> method will be called again to report the next failure. This
continues until either this method returns false or there are no more failures.</para>
<para>Note that it is possible for processing of a CAS to be aborted without this method being called. This method
is only called when an attempt is being made to continue processing of the CAS following an exception, which may
be an application configuration decision.</para>
<para>In any case, if processing is aborted by the framework for any reason, including because
<literal>continueOnFailure</literal> returned false, the framework will call the
<literal>Flow.aborted()</literal> method to allow the Flow object to clean up any resources.</para>
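<para>The callback sequence described above can be illustrated with a small plain-Java model. It is a sketch,
not UIMA code: <literal>FailureAwareFlow</literal>, <literal>TolerantFlow</literal>, and
<literal>FailureLoop</literal> are hypothetical stand-ins, delegates are reduced to keys that either succeed
or throw, and the model assumes the framework always consults <literal>continueOnFailure</literal>, which the
real framework is not obliged to do.</para>
<programlisting><![CDATA[import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT the UIMA classes.

/** Stand-in for a Flow object with the failure callbacks described above. */
interface FailureAwareFlow {
  /** Returns the key of the next delegate, or null when the flow is complete. */
  String next();

  boolean continueOnFailure(String failedAeKey, Exception failure);

  void aborted();
}

/** A flow that tolerates failures only in delegates it considers optional. */
final class TolerantFlow implements FailureAwareFlow {
  private final Iterator<String> remaining;
  private final Set<String> optionalKeys;
  boolean wasAborted = false;

  TolerantFlow(List<String> aeKeys, Set<String> optionalKeys) {
    this.remaining = aeKeys.iterator();
    this.optionalKeys = optionalKeys;
  }

  @Override
  public String next() {
    return remaining.hasNext() ? remaining.next() : null;
  }

  @Override
  public boolean continueOnFailure(String failedAeKey, Exception failure) {
    return optionalKeys.contains(failedAeKey);
  }

  @Override
  public void aborted() {
    wasAborted = true; // a real flow would release its resources here
  }
}

/** Models the framework side; delegates listed in failingKeys throw when called. */
final class FailureLoop {
  static List<String> route(FailureAwareFlow flow, Set<String> failingKeys) {
    List<String> completed = new ArrayList<>();
    String aeKey;
    while ((aeKey = flow.next()) != null) {
      try {
        if (failingKeys.contains(aeKey)) {
          throw new IllegalStateException(aeKey + " failed");
        }
        completed.add(aeKey);
      } catch (Exception failure) {
        // ask the flow whether routing may continue after this failure
        if (!flow.continueOnFailure(aeKey, failure)) {
          flow.aborted();
          break;
        }
      }
    }
    return completed;
  }
}]]></programlisting>
<para>In this model, a failure in an optional delegate is skipped and routing continues with the next key,
whereas a failure in any other delegate ends routing and triggers the <literal>aborted()</literal>
callback.</para>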
<para>For an example of how to continue after an exception, see the example
code <literal>org.apache.uima.examples.flow.AdvancedFixedFlowController</literal>, in
the <literal>examples/src</literal> directory of the UIMA SDK. This example also demonstrates the use of
<literal>ParallelStep</literal>.</para>
</section>
</chapter>