<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ugr.tools.uimafit.introduction">
<title>Introduction</title>
<para>While uimaFIT provides many features for a UIMA developer, most of them fall under one of
two overarching themes. These two sides of uimaFIT are, while complementary, largely
independent of each other. One of the beauties of uimaFIT is that a developer who uses one side
of uimaFIT extensively is not required to use the other side at all. </para>
<section>
<title>Simplify Component Implementation</title>
<para>The first broad theme of uimaFIT provides features that <emphasis>simplify component
implementation</emphasis>. Our favorite example of this is the
<classname>@ConfigurationParameter</classname> annotation which allows you to annotate a
member variable as a configuration parameter. This annotation in combination with the method
<methodname>ConfigurationParameterInitializer.initialize()</methodname> completely automates
the process of initializing member variables with values from the
<interfacename>UimaContext</interfacename> passed into your analysis engine's initialize
method. Similarly, the <classname>@ExternalResource</classname> annotation in
combination with the method <methodname>ExternalResourceInitializer.initialize()</methodname>
completely automates the binding of an external resource as defined in the
<interfacename>UimaContext</interfacename> to a member variable. Dispensing with manually
writing the code that performs these two tasks reduces effort, eliminates verbose and
potentially buggy boilerplate code, and makes implementing a UIMA component more enjoyable.
Consider, for example, a member variable that is of type <classname>Locale</classname>. With
uimaFIT you can simply annotate the member variable with
<classname>@ConfigurationParameter</classname> and have your initialize method automatically
initialize the variable correctly with a string value in the
<interfacename>UimaContext</interfacename> such as <literal>en_US</literal>. </para>
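<para>As a minimal sketch of the <classname>Locale</classname> case just described, the component
below declares one annotated field. The class name <classname>LocaleAwareAnnotator</classname>
and the parameter name <literal>locale</literal> are hypothetical and chosen only for
illustration. The sketch assumes the component extends uimaFIT's own
<classname>JCasAnnotator_ImplBase</classname>, whose initialize method takes care of calling
<methodname>ConfigurationParameterInitializer.initialize()</methodname>.</para>
<programlisting>import java.util.Locale;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.jcas.JCas;

// Hypothetical component; the class and parameter names are illustrative only.
public class LocaleAwareAnnotator extends JCasAnnotator_ImplBase {

  // Name under which the value is looked up in the UimaContext (an assumption).
  public static final String PARAM_LOCALE = "locale";

  // Filled in during initialize() from a string value such as "en_US".
  @ConfigurationParameter(name = PARAM_LOCALE, mandatory = true)
  private Locale locale;

  @Override
  public void process(JCas jCas) throws AnalysisEngineProcessException {
    // The field is already set when process() is called.
    System.out.println("Processing with locale " + locale);
  }
}</programlisting>
<para>No hand-written initialize method is needed here. A component that extends the plain UIMA
base class instead would have to call
<methodname>ConfigurationParameterInitializer.initialize()</methodname> from its own initialize
method.</para>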
</section>
<section>
<title>Simplify Component Instantiation</title>
<para>The second broad theme of uimaFIT provides features that <emphasis>simplify component
instantiation</emphasis>. Working with UIMA, have you ever said to yourself <quote>but I
just want to tag some text!?</quote> What does it take to <quote>just tag some text?</quote>
Here's a list of things you must do with the traditional approach:</para>
<itemizedlist>
<listitem>
<para>wrap your tagger as a UIMA analysis engine</para>
</listitem>
<listitem>
<para>write a descriptor file for your analysis engine</para>
</listitem>
<listitem>
<para>write a CAS consumer that produces the desired output</para>
</listitem>
<listitem>
<para>write another descriptor file for the CAS consumer</para>
</listitem>
<listitem>
<para>write a descriptor file for a collection reader</para>
</listitem>
<listitem>
<para>write a descriptor file that describes a pipeline</para>
</listitem>
<listitem>
<para>invoke the Collection Processing Manager with your pipeline descriptor file</para>
</listitem>
</itemizedlist>
<section>
<title>From a class</title>
<para>Each of these steps has its own pitfalls and can be rather time-consuming. This is a
rather unsatisfying answer to our simple desire to just tag some text. With uimaFIT you can
eliminate all of these steps. </para>
<para>Here's a simple snippet of Java code that illustrates <quote>tagging some text</quote>
with uimaFIT:</para>
<programlisting>import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.JCasFactory.createJCas;
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;
import static org.apache.uima.fit.util.JCasUtil.select;

import org.apache.uima.jcas.JCas;

// Create an empty CAS and set the text to analyze
JCas jCas = createJCas();
jCas.setDocumentText("some text");

// Run the tokenizer and the tagger without any XML descriptors
runPipeline(jCas,
    createEngineDescription(MyTokenizer.class),
    createEngineDescription(MyTagger.class));

// Print the tag assigned to each token
for (Token token : select(jCas, Token.class)) {
  System.out.println(token.getTag());
}</programlisting>
<para>This code uses several static method imports for brevity. While its terseness won't make
a Python programmer blush, it is certainly much easier than the seven steps outlined
above! </para>
</section>
<section>
<title>From an XML descriptor</title>
<para>uimaFIT provides mechanisms to instantiate and run UIMA components programmatically with
or without descriptor files. For example, if you have a descriptor file for your analysis
engine defined by <classname>MyTagger</classname> (as shown above), then you can instead
instantiate the analysis engine with:</para>
<programlisting>AnalysisEngineDescription tagger = createEngineDescription(
"mypackage.MyTagger");</programlisting>
<para>This will find the descriptor file <filename>mypackage/MyTagger.xml</filename> by name.
Similarly, you can find a descriptor file by location with
<methodname>createEngineDescriptionFromPath()</methodname>. However, if you want to dispense
with XML descriptor files altogether (and you probably do), you can use the method
<methodname>createEngineDescription()</methodname> as shown above. One of the driving motivations
for creating the second side of uimaFIT was our frustration with descriptor files and our
desire to eliminate them. Descriptor files are difficult to maintain because they are
generally tightly coupled with Java code, they decay without warning, they are wearisome to
test, and they proliferate, among other reasons.</para>
</section>
</section>
<section>
<title>Is this cheating?</title>
<para>One question that is often raised by new uimaFIT users is whether or not it breaks the
<emphasis>UIMA way</emphasis>. That is, does adopting uimaFIT lead me down a path of
creating UIMA components and systems that are incompatible with the traditional UIMA approach?
The answer to this question is <emphasis>no</emphasis>. For starters, uimaFIT does not skirt
the UIMA mechanism of describing components; it only skips the XML part of it. For example,
when the method <methodname>createEngineDescription()</methodname> is called (as shown above) an
<interfacename>AnalysisEngineDescription</interfacename> is created for the analysis engine.
This is the same object type that is instantiated when a descriptor file is used. So, instead
of parsing XML to instantiate an analysis engine description from XML, uimaFIT uses a factory
method to instantiate it from method parameters. One of the happy benefits of this approach is
that for a given <interfacename>AnalysisEngineDescription</interfacename> you can generate
an XML descriptor file using <methodname>AnalysisEngineDescription.toXML()</methodname>. So,
uimaFIT actually provides a very simple and direct path for <emphasis>generating</emphasis>
XML descriptor files rather than manually creating and maintaining them! </para>
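<para>The descriptor generation described above might look like the following sketch. The
classes <classname>DescriptorExport</classname> and <classname>NoOpAnnotator</classname> are
hypothetical stand-ins for your own code, and the XML is written to a
<classname>StringWriter</classname> only to keep the example self-contained. A
<classname>FileWriter</classname> would work the same way.</para>
<programlisting>import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;

import java.io.StringWriter;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;

// Hypothetical helper class, not part of uimaFIT.
public class DescriptorExport {

  // Stand-in for a real component such as MyTagger; it does nothing.
  public static class NoOpAnnotator extends JCasAnnotator_ImplBase {
    @Override
    public void process(JCas jCas) {
      // intentionally empty
    }
  }

  // Builds the description in code and serializes it as an XML descriptor.
  public static String toDescriptorXml() throws Exception {
    AnalysisEngineDescription description = createEngineDescription(NoOpAnnotator.class);
    StringWriter writer = new StringWriter();
    description.toXML(writer);
    return writer.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(toDescriptorXml());
  }
}</programlisting>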
<para>It is also useful to clarify that if you only want to use one side or the other of
uimaFIT, then you are free to do so. This is possible precisely because uimaFIT does not
work around UIMA's mechanisms for describing components but rather uses them directly. For
example, if the only thing you want to use in uimaFIT is the
<classname>@ConfigurationParameter</classname> annotation, then you can do so without worrying about
what effect this will have on your descriptor files. This is because your analysis engine will
be initialized with exactly the same <interfacename>UimaContext</interfacename> regardless of
whether you instantiate your analysis engine in the <emphasis>UIMA way</emphasis> or use one
of uimaFIT's factory methods. Similarly, a UIMA component does not need to be annotated with
<classname>@ConfigurationParameter</classname> for you to make use of the
<methodname>createEngineDescription()</methodname> method. This is because when you pass
configuration parameter values to the <methodname>createEngineDescription()</methodname> method,
they are added to an <interfacename>AnalysisEngineDescription</interfacename> which is used by
UIMA to populate a <interfacename>UimaContext</interfacename>, just as it would if you used a
descriptor file. </para>
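<para>To illustrate the second case, the sketch below describes a component that carries no
uimaFIT annotations and reads its parameter from the
<interfacename>UimaContext</interfacename> by hand. The names
<classname>PlainUimaExample</classname>, <classname>PlainTagger</classname> and the parameter
<literal>language</literal> are hypothetical. The sketch assumes that name and value pairs
passed to <methodname>createEngineDescription()</methodname> are recorded as parameter settings
in the resulting description, as the paragraph above states.</para>
<programlisting>import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;

import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

// Hypothetical example class, not part of uimaFIT.
public class PlainUimaExample {

  // Plain UIMA annotator: core base class, no uimaFIT annotations.
  public static class PlainTagger extends JCasAnnotator_ImplBase {
    private String language;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
      super.initialize(context);
      // Traditional manual lookup of the configuration parameter.
      language = (String) context.getConfigParameterValue("language");
    }

    @Override
    public void process(JCas jCas) {
      // A real tagger would use the language field here.
    }
  }

  // Describes the plain component with a parameter value set from code.
  public static AnalysisEngineDescription describe() throws ResourceInitializationException {
    return createEngineDescription(PlainTagger.class, "language", "en");
  }

  public static void main(String[] args) throws Exception {
    Object value = describe().getAnalysisEngineMetaData()
        .getConfigurationParameterSettings().getParameterValue("language");
    System.out.println(value);
  }
}</programlisting>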
</section>
<section>
<title>Conclusion</title>
<para>Because uimaFIT can be used to simplify component implementation and instantiation, it is
easy to assume that you can't do one without the other. This chapter has demonstrated that while
these two sides of uimaFIT complement each other, they are not coupled together and each can
be effectively used without the other. Similarly, by understanding how uimaFIT uses the UIMA
component description mechanisms directly, one can be assured that uimaFIT enables UIMA
development that is compatible and consistent with the UIMA standard and APIs. </para>
</section>
</chapter>