| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <chapter id="ugr.tools.uimafit.typesystem"> |
| <title>Type System Detection</title> |
| <para>UIMA requires that types that are used in the CAS are defined in XML files - so-called |
| <emphasis>type system descriptions</emphasis> (TSD). Whenever a UIMA component is created, it |
| must be associated with such a type system. While it is possible to manually load the type |
| system descriptors and pass them to each UIMA component and to each created CAS, it is quite |
| inconvenient to do so. For this reason, uimaFIT supports the automatic detection of such files |
| in the classpath. Thus is becomes possible for a UIMA component provider to have component's |
| type automatically detected and thus the components becomes immediately usable by adding it to |
| the classpath.</para> |
| <section> |
| <title>Making types auto-detectable</title> |
| <para>The provider of a type system should create a file |
| <filename>META-INF/org.apache.uima.fit/types.txt</filename> in the classpath. This file |
| should define the locations of the type system descriptions. Assume that a type |
| <classname>org.apache.uima.fit.type.Token</classname> is specified in the TSD |
| <filename>org/apache/uima/fit/type/Token.xml</filename>, then the file should have the |
| following contents:</para> |
| <programlisting>classpath*:org/apache/uima/fit/type/Token.xml</programlisting> |
| <note> |
| <para>Mind that the file <filename>types.txt</filename> is must be located in |
| <filename>META-INF/org.apache.uima.fit</filename> where |
| <filename>org.apache.uima.fit</filename> is the name of a sub-directory inside |
| <filename>META-INF</filename>. <emphasis>We are not using the Java package notation |
| here!</emphasis></para> |
| </note> |
| <para>To specify multiple TSDs, add additonal lines to the file. If you have a large number of |
| TSDs, you may prefer to add a pattern. Assume that we have a large number of TSDs under |
| <filename>org/apache/uima/fit/type</filename>, we can use the following pattern which |
| recursively scans the package <package>org.apache.uima.fit.type</package> and all sub-packages |
| for XML files and tries to load them as TSDs.</para> |
| <programlisting>classpath*:org/apache/uima/fit/type/**/*.xml</programlisting> |
| <para>Try to design your packages structure in a way that TSDs and JCas wrapper classes |
| generated from them are separate from the rest of your code.</para> |
| <para>If it is not possible or inconvenient to add the `types.txt` file, patterns can also be |
| specified using the system property |
| <parameter>org.apache.uima.fit.type.import_pattern</parameter>. Multiple patterns may be |
| specified separated by semicolon:</para> |
| <programlisting>-Dorg.apache.uima.fit.type.import_pattern=\ |
| classpath*:org/apache/uima/fit/type/**/*.xml</programlisting> |
| <note> |
| <para>The <literal>\</literal> in the example is used as a line-continuation indicator. It |
| and all spaces following it should be ommitted.</para> |
| </note> |
| </section> |
| <section> |
| <title>Making index definitions and type priorities auto-detectable</title> |
| <para>Auto-detection also works for index definitions and type priority definitions. For |
| index definitions, the respective file where to register the index definition XML files is |
| <filename>META-INF/org.apache.uima.fit/fsindexes.txt</filename> and for type priorities, it |
| is <filename>META-INF/org.apache.uima.fit/typepriorities.txt</filename>.</para> |
| </section> |
| <section> |
| <title>Using type auto-detection </title> |
| <para>The auto-detected type system can be obtained from the |
| <classname>TypeSystemDescriptionFactory</classname>:</para> |
| <programlisting>TypeSystemDescription tsd = |
| TypeSystemDescriptionFactory.createTypeSystemDescription()</programlisting> |
| <para>Popular factory methods also support auto-detection:</para> |
| <programlisting>AnalysisEngine ae = createEngine(MyEngine.class);</programlisting> |
| </section> |
| <section> |
| <title>Multiple META-INF/org.apache.uima.fit/types.txt files</title> |
| <para>uimaFIT supports multiple <filename>types.txt</filename> files in the classpath (e.g. in differnt JARs). The |
| <filename>types.txt</filename> files are located via Spring using the classpath search |
| pattern: </para> |
| <programlisting>TYPE_MANIFEST_PATTERN = "classpath*:META-INF/org.apache.uima.fit/types.txt" </programlisting> |
| <para>This resolves to a list URLs pointing to ALL <filename>types.txt</filename> files. The |
| resolved URLs are unique and will point either to a specific point in the file system or into |
| a specific JAR. These URLs can be handled by the standard Java URL loading mechanism. |
| Example:</para> |
| <programlisting>jar:/path/to/syntax-types.jar!/META-INF/org.apache.uima.fit/types.txt |
| jar:/path/to/token-types.jar!/META-INF/org.apache.uima.fit/types.txt</programlisting> |
| <para>uimaFIT then reads all patters from all of these URLs and uses these to search the |
| classpath again. The patterns now resolve to a list of URLs pointing to the individual type |
| system XML descriptors. All of these URLs are collected in a set to avoid duplicate loading |
| (for performance optimization - not strictly necessary because the UIMA type system merger can |
| handle compatible duplicates). Then the descriptors are loaded into memory and merged using |
| the standard UIMA type system merger |
| (<methodname>CasCreationUtils.mergeTypeSystems()</methodname>). Example:</para> |
| <programlisting>jar:/path/to/syntax-types.jar!/desc/types/Syntax.xml |
| jar:/path/to/token-types.jar!/org/foobar/typesystems/Tokens.xml </programlisting> |
| <para>Voilá, the result is a type system covering all types could be found in the |
| classpath.</para> |
| <para>It is recommended <orderedlist> |
| <listitem> |
| <para>to put type system descriptors into packages resembling a namespace you "own" and to |
| use a package-scoped wildcard |
| search<programlisting>classpath*:org/apache/uima/fit/type/**/*.xml`</programlisting></para> |
| </listitem> |
| <listitem> |
| <para>or when putting descriptors into a "well-known" package like |
| <package>desc.type</package>, that <filename>types.txt</filename> file should |
| explicitly list all type system descriptors instead of using a wildcard |
| search<programlisting>classpath*:desc/type/Token.xml |
| classpath*:desc/type/Syntax.xml </programlisting></para> |
| </listitem> |
| </orderedlist>Method 1 should be preferred. Both methods can be mixed. </para> |
| </section> |
| <section> |
| <title>Performance note and caching</title> |
| <para>Currently uimaFIT evaluates the patterns for TSDs once and caches the locations, but not |
| the actual merged type system description. A rescan can be forced using |
| <methodname>TypeSystemDescriptionFactory.forceTypeDescriptorsScan()</methodname>. This may |
| change in future.</para> |
| </section> |
| <section> |
| <title>Potential problems</title> |
| <para>The mechanism works fine. However, there are specific issues with Java in general that one |
| should be aware of.</para> |
| <section> |
| <title>m2eclipse fails to copy descriptors to target/classes</title> |
| <para>There seems to be a bug in some older versions of m2eclipse that causes resources not |
| always to be copied to <filename>target/classes</filename>. If UIMA complains about type |
| definitions missing at runtime, try to <emphasis>clean/rebuild</emphasis> your project and |
| carefully check the m2eclipse console in the console view for error messages that might |
| cause m2eclipse to abort.</para> |
| </section> |
| <section> |
| <title>Class version conflicts</title> |
| <para>A problem can occur if you end up having multiple incompatible versions of the same type |
| system in the classpath. This is a general problem and not related to the auto-detection |
| feature. It is the same as when you have incompatible version of a particular class (e.g. |
| <interfacename>JCas</interfacename> wrapper or some third-party-library) in the classpath. |
| The behavior of the Java Classloader is undefined in that case. The detection will do its |
| best to try and load everything it can find, but the UIMA type system merger may barf or you |
| may end up with undefined behavior at runtime because one of the class versions is used at |
| random. </para> |
| </section> |
| <section> |
| <title>Classes and resources in the default package</title> |
| <para>It is bad practice to place classes into the default (unnamed) package. In fact it is |
| not possible to import classes from the default package in another class. Similarly it is a |
| bad idea to put resources at the root of the classpath. The Spring documentation on |
| resources <ulink |
| url="http://static.springsource.org/spring/docs/3.0.x/reference/resources.html#resources-app-ctx-wildcards-in-resource-paths" |
| >explains this in detail</ulink>.</para> |
| <para>For this reason the <filename>types.txt</filename> resides in |
| <filename>/META-INF/org.apache.uima.fit</filename> and it is suggest that type system |
| descriptors reside either in a proper package like |
| <filename>/org/foobar/typesystems/XXX.xml</filename> or in |
| <filename>/desc/types/XXX.xml</filename>. </para> |
| </section> |
| </section> |
| </chapter> |