| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <chapter id="ugr.tools.uimafit.introduction"> |
| <title>Introduction</title> |
| <para>While uimaFIT provides many features for a UIMA developer, there are two overarching themes |
| that most features fall under. These two sides of uimaFIT are, while complementary, largely |
| independent of each other. One of the beauties of uimaFIT is that a developer who uses one side |
| of uimaFIT extensively is not required to use the other side at all. </para> |
| <section> |
| <title>Simplify Component Implementation</title> |
| <para>The first broad theme of uimaFIT provides features that <emphasis>simplify component |
| implementation</emphasis>. Our favorite example of this is the |
| <classname>@ConfigurationParameter</classname> annotation which allows you to annotate a |
| member variable as a configuration parameter. This annotation in combination with the method |
| <methodname>ConfigurationParameterInitializer.initialize()</methodname> completely automates |
| the process of initializing member variables with values from the |
| <interfacename>UimaContext</interfacename> passed into your analysis engine's initialize |
| method. Similarly, the <classname>@ExternalResource</classname> annotation in |
| combination with the method <methodname>ExternalResourceInitializer.initialize()</methodname> |
| completely automates the binding of an external resource as defined in the |
| <interfacename>UimaContext</interfacename> to a member variable. Dispensing with manually |
| writing the code that performs these two tasks reduces effort, eliminates verbose and |
| potentially buggy boilerplate code, and makes implementing a UIMA component more enjoyable. |
| Consider, for example, a member variable that is of type <classname>Locale</classname>. With |
| uimaFIT you can simply annotate the member variable with |
| <classname>@ConfigurationParameter</classname> and have your initialize method automatically |
| initialize the variable correctly with a string value in the |
| <interfacename>UimaContext</interfacename> such as <literal>en_US</literal>. </para> |
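| <para>The idea behind this kind of annotation-driven initialization can be sketched with plain |
| Java reflection. The following is a minimal illustration only and is not uimaFIT's |
| implementation: the <classname>@Param</classname> annotation, the |
| <classname>Tagger</classname> class, and the <methodname>initialize()</methodname> helper are |
| hypothetical names, and a <classname>java.util.Properties</classname> object stands in for the |
| <interfacename>UimaContext</interfacename>.</para> |
| <programlisting>import java.lang.annotation.ElementType; |
| import java.lang.annotation.Retention; |
| import java.lang.annotation.RetentionPolicy; |
| import java.lang.annotation.Target; |
| import java.lang.reflect.Field; |
| import java.util.Locale; |
| import java.util.Properties; |
| |
| public class ParamInjectionSketch { |
| |
|   // Hypothetical stand-in for an annotation such as @ConfigurationParameter |
|   @Retention(RetentionPolicy.RUNTIME) |
|   @Target(ElementType.FIELD) |
|   @interface Param { |
|     String name(); |
|   } |
| |
|   // A component that declares its parameter instead of reading it by hand |
|   static class Tagger { |
|     @Param(name = "locale") |
|     Locale locale; |
|   } |
| |
|   // Copies string settings into every field annotated with @Param |
|   static void initialize(Object component, Properties settings) |
|       throws IllegalAccessException { |
|     for (Field field : component.getClass().getDeclaredFields()) { |
|       Param param = field.getAnnotation(Param.class); |
|       if (param == null) { |
|         continue; // not a declared parameter |
|       } |
|       String raw = settings.getProperty(param.name()); |
|       if (raw == null) { |
|         continue; // no value supplied, keep the field's current value |
|       } |
|       field.setAccessible(true); |
|       if (field.getType() == Locale.class) { |
|         // "en_US" uses an underscore, language tags use a hyphen |
|         field.set(component, Locale.forLanguageTag(raw.replace('_', '-'))); |
|       } else { |
|         field.set(component, raw); // this sketch handles only String otherwise |
|       } |
|     } |
|   } |
| |
|   public static void main(String[] args) throws Exception { |
|     Properties settings = new Properties(); |
|     settings.setProperty("locale", "en_US"); |
| |
|     Tagger tagger = new Tagger(); |
|     initialize(tagger, settings); |
|     System.out.println(tagger.locale.toLanguageTag()); |
|   } |
| }</programlisting> |
| <para>The component only declares what it needs; the type conversion and the lookup live in |
| one shared helper rather than in every component's initialize method.</para> |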
| </section> |
| <section> |
| <title>Simplify Component Instantiation</title> |
| <para>The second broad theme of uimaFIT provides features that <emphasis>simplify component |
| instantiation</emphasis>. Working with UIMA, have you ever said to yourself <quote>but I |
| just want to tag some text!?</quote> What does it take to <quote>just tag some text?</quote> |
| Here's a list of things you must do with the traditional approach:</para> |
| <itemizedlist> |
| <listitem> |
| <para>wrap your tagger as a UIMA analysis engine</para> |
| </listitem> |
| <listitem> |
| <para>write a descriptor file for your analysis engine</para> |
| </listitem> |
| <listitem> |
| <para>write a CAS consumer that produces the desired output</para> |
| </listitem> |
| <listitem> |
| <para>write another descriptor file for the CAS consumer</para> |
| </listitem> |
| <listitem> |
| <para>write a descriptor file for a collection reader</para> |
| </listitem> |
| <listitem> |
| <para>write a descriptor file that describes a pipeline</para> |
| </listitem> |
| <listitem> |
| <para>invoke the Collection Processing Manager with your pipeline descriptor file</para> |
| </listitem> |
| </itemizedlist> |
| <section> |
| <title>From a class</title> |
| <para>Each of these steps has its own pitfalls and can be time-consuming. This is an |
| unsatisfying answer to our simple desire to just tag some text. With uimaFIT you can |
| eliminate all of these steps. </para> |
| <para>Here's a simple snippet of Java code that illustrates <quote>tagging some text</quote> |
| with uimaFIT:</para> |
| <programlisting>import static org.apache.uima.fit.factory.JCasFactory.createJCas; |
| import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline; |
| import static org.apache.uima.fit.util.JCasUtil.select; |
| import static |
| org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription; |
| |
| // Create an empty JCas and give it a document to process |
| JCas jCas = createJCas(); |
| |
| jCas.setDocumentText("some text"); |
| |
| // Run the tokenizer and the tagger over the document |
| runPipeline(jCas, |
|     createEngineDescription(MyTokenizer.class), |
|     createEngineDescription(MyTagger.class)); |
| |
| // Print the tag assigned to each token |
| for (Token token : select(jCas, Token.class)) { |
|   System.out.println(token.getTag()); |
| }</programlisting> |
| <para>This code uses several static method imports for brevity. And while the |
| terseness of this code won't make a Python programmer blush, it is certainly much easier |
| than the seven steps outlined above! </para> |
| </section> |
| <section> |
| <title>From an XML descriptor</title> |
| <para>uimaFIT provides mechanisms to instantiate and run UIMA components programmatically with |
| or without descriptor files. For example, if you have a descriptor file for your analysis |
| engine defined by <classname>MyTagger</classname> (as shown above), then you can instead |
| instantiate the analysis engine with:</para> |
| <programlisting>AnalysisEngineDescription tagger = createEngineDescription( |
| "mypackage.MyTagger");</programlisting> |
| <para>This will find the descriptor file <filename>mypackage/MyTagger.xml</filename> by name. |
| Similarly, you can find a descriptor file by location with |
| <methodname>createEngineDescriptionFromPath()</methodname>. However, if you want to dispense |
| with XML descriptor files altogether (and you probably do), you can use the method |
| <methodname>createEngineDescription()</methodname> as shown above. One of the driving motivations |
| for creating the second side of uimaFIT is our frustration with descriptor files and our |
| desire to eliminate them. Descriptor files are difficult to maintain because they are |
| generally tightly coupled with Java code, they decay without warning, they are wearisome to |
| test, and they proliferate, among other reasons.</para> |
| </section> |
| </section> |
| <section> |
| <title>Is this cheating?</title> |
| <para>One question that is often raised by new uimaFIT users is whether or not it breaks the |
| <emphasis>UIMA way</emphasis>. That is, does adopting uimaFIT lead me down a path of |
| creating UIMA components and systems that are incompatible with the traditional UIMA approach? |
| The answer to this question is <emphasis>no</emphasis>. For starters, uimaFIT does not skirt |
| the UIMA mechanism of describing components; it only skips the XML part of it. For example, |
| when the method <methodname>createEngineDescription()</methodname> is called (as shown above) an |
| <interfacename>AnalysisEngineDescription</interfacename> is created for the analysis engine. |
| This is the same object type that is instantiated when a descriptor file is used. So, instead |
| of parsing XML to instantiate an analysis engine description from XML, uimaFIT uses a factory |
| method to instantiate it from method parameters. One of the happy benefits of this approach is |
| that for a given <interfacename>AnalysisEngineDescription</interfacename> you can generate |
| an XML descriptor file using <methodname>AnalysisEngineDescription.toXML()</methodname>. So, |
| uimaFIT actually provides a very simple and direct path for <emphasis>generating</emphasis> |
| XML descriptor files rather than manually creating and maintaining them! </para> |
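| <para>A minimal sketch of that path, assuming uimaFIT 2.x and UIMA are on the classpath. |
| uimaFIT's bundled <classname>NoOpAnnotator</classname> stands in here for your own component, |
| and the class name <classname>DescriptorExportSketch</classname> is only illustrative.</para> |
| <programlisting>import static |
| org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription; |
| |
| import java.io.StringWriter; |
| |
| import org.apache.uima.analysis_engine.AnalysisEngineDescription; |
| import org.apache.uima.fit.component.NoOpAnnotator; |
| |
| public class DescriptorExportSketch { |
| |
|   // Builds a description from a class and serializes it as descriptor XML |
|   public static String describe() throws Exception { |
|     AnalysisEngineDescription description = |
|         createEngineDescription(NoOpAnnotator.class); |
| |
|     StringWriter xml = new StringWriter(); |
|     description.toXML(xml); |
|     return xml.toString(); |
|   } |
| |
|   public static void main(String[] args) throws Exception { |
|     System.out.println(describe()); |
|   } |
| }</programlisting> |
| <para>Writing to a <classname>java.io.FileWriter</classname> instead of a |
| <classname>StringWriter</classname> would save the descriptor to a file.</para> |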
| <para>It is also useful to clarify that if you only want to use one side or the other of |
| uimaFIT, then you are free to do so. This is possible precisely because uimaFIT does not |
| work around UIMA's mechanisms for describing components but rather uses them directly. For |
| example, if the only thing you want to use in uimaFIT is the |
| <classname>@ConfigurationParameter</classname>, then you can do so without worrying about |
| what effect this will have on your descriptor files. This is because your analysis engine will |
| be initialized with exactly the same <interfacename>UimaContext</interfacename> regardless of |
| whether you instantiate your analysis engine in the <emphasis>UIMA way</emphasis> or use one |
| of uimaFIT's factory methods. Similarly, a UIMA component does not need to be annotated with |
| <classname>@ConfigurationParameter</classname> for you to make use of the |
| <methodname>createEngineDescription()</methodname> method. This is because when you pass |
| configuration parameter values into the <methodname>createEngineDescription()</methodname> method, |
| they are added to an <interfacename>AnalysisEngineDescription</interfacename> which is used by |
| UIMA to populate a <interfacename>UimaContext</interfacename>, just as it would if you used a |
| descriptor file. </para> |
| </section> |
| <section> |
| <title>Conclusion</title> |
| <para>Because uimaFIT can be used to simplify component implementation and instantiation, it is |
| easy to assume that you can't do one without the other. This chapter has demonstrated that while |
| these two sides of uimaFIT complement each other, they are not coupled together and each can |
| be effectively used without the other. Similarly, by understanding how uimaFIT uses the UIMA |
| component description mechanisms directly, one can be assured that uimaFIT enables UIMA |
| development that is compatible and consistent with the UIMA standard and APIs. </para> |
| </section> |
| </chapter> |