blob: e10a57040e51d92312704efb3375af3680529f42 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
<!ENTITY imgroot "images/tools/tools.jcasgen/" >
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
%uimaents;
]>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<chapter id="ugr.tools.jcasgen">
<title>JCasGen User&apos;s Guide</title>
<para>JCasGen reads a descriptor for an application (either an Analysis Engine Descriptor,
or a Type System Descriptor), creates the merged type system
specification by merging all the type system information from all the components
referred to in the descriptor, and then uses this merged type system to create Java source
files for classes that enable JCas access to the CAS. Java classes are not produced for the
built-in types, since these classes are already provided by the UIMA SDK. (An exception is
the built-in type <literal>uima.tcas.DocumentAnnotation</literal>, see the warning below.) </para>
<warning><para>If the components comprising the input to the type merging process
have different definitions for the same type name,
JCasGen will show a warning, and in some environments may offer to abort the operation.
If you continue past this warning,
JCasGen will produce correct Java source files representing the merged types
(that is, the
type definition containing all of the features defined on that type by all of the
components). It is recommended that you do not use this capability (of having
two different definitions for the same type name, with different feature sets) since it can make it
difficult to combine/package your annotator with others. See <olink targetdoc="&uima_docs_ref;"/>
<olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.jcas.merging_types_from_other_specs"/> for more information.
</para>
<para>Also note that if your type system declares a custom version of the
<literal>uima.tcas.DocumentAnnotation</literal>
built-in type, then JCasGen will generate a Java source file for it. If you do this, you need to be
aware of the issues discussed in <olink targetdoc="&uima_docs_ref;"/>
<olink
targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.jcas.documentannotation_issues"/>.</para></warning>
<para>JCasGen can be run in many ways. For Eclipse users using the Component Descriptor Editor, there's a button
on the Type System Description page to run it on that type system. There's also a jcasgen-maven-plugin to use
in maven build scripts. There's a menu-driven GUI tool for it.
And, there are command line scripts you can use to invoke it.</para>
<para>There are several versions of JCasGen. The basic version reads an XML descriptor
which contains a type system descriptor, and generates the corresponding Java Class
Models for those types. Variants exist for the Eclipse environment that allow merging the
newly generated Java source code with previously augmented versions; see <olink
targetdoc="&uima_docs_ref;"/> <olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.jcas.augmenting_generated_code"/> for a discussion of how the
Java Class Models can be augmented by adding additional methods and fields.</para>
<para>Input to JCasGen needs to be mostly self-contained. In particular, any types that are
defined to depend on user-defined supertypes must have that supertype defined, if the
supertype is <literal>uima.tcas.Annotation </literal>or a subtype of it. Any features
referencing ranges which are subtypes of uima.cas.String must have those subtypes
included. If this is not followed, a warning message is given stating that the resulting
generation may be inaccurate.</para>
<para>JCasGen is typically invoked automatically when using the Component Descriptor
Editor (see <olink targetdoc="&uima_docs_tools;"
targetptr="ugr.tools.cde.auto_jcasgen"/>), but can also be run using a shell
script. These scripts can take 0, 1, or 2 arguments. The first argument is the location of
the file containing the input XML descriptor. The second argument specifies where the
generated Java source code should go. If it isn&apos;t given, JCasGen generates its
output into a subfolder called JCas (or sometimes JCasNew &ndash; see below), of the first
argument&apos;s path.</para>
<para>The first argument, the input file, can be written as
<literal>jar:&lt;url>!{entry}</literal>, for example:
<literal>jar:http://www.foo.com/bar/baz.jar!/COM/foo/quux.class</literal></para>
<para>If no arguments are given to JCasGen, then it launches a GUI to interact with the user
and ask for the same input. The GUI will remember the arguments you previously used.
Here&apos;s what it looks like:
<screenshot>
<mediaobject>
<imageobject>
<imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>
</imageobject>
<textobject><phrase>JCasGen tool showing fields for input arguments</phrase>
</textobject>
</mediaobject>
</screenshot></para>
<para>When running with automatic merging of the generated Java source with previously
augmented versions, the output location is where the merge function obtains the source
for the merge operation.</para>
<para>As is customary for Java, the generated class source files are placed in the
appropriate subdirectory structure according to Java conventions that correspond to
the package (name space) name.</para>
<para>The Java classes must be compiled and the resulting class files included in the class
path of your application; you make these classes available for other annotator writers
using your types, perhaps packaged as an xxx.jar file. If the xxx.jar file is made to
contain only the Java Class Models for the CAS types, it can be reused by any users of these
types.</para>
<section id="ugr.tools.jcasgen.running_without_eclipse">
<title>Running stand-alone without Eclipse</title>
<para>There is no capability to automatically merge the generated Java source with
previous versions, unless running with Eclipse. If run without Eclipse, no automatic
merging of the generated Java source is done with any previous versions. In this case,
the output is put in a folder called <quote>JCasNew</quote> unless overridden by
specifying a second argument.</para>
<para>The distribution includes a shell script/bat file to run the stand-alone version,
called jcasgen.</para>
</section>
<section id="ugr.tools.jcasgen.running_standalone_with_eclipse">
<title>Running stand-alone with Eclipse</title>
<para>If you have Eclipse and EMF (EMF = Eclipse Modeling Framework; both of these are
available from <ulink url="http://www.eclipse.org"/>) installed (version 3 or
later) JCasGen can merge the Java code it generates with previous versions, picking up
changes you might have inserted by hand. The output (and source of the merge input) is in a
folder <quote>JCas</quote> under the same path as the input XML file, unless
overridden by specifying a second argument.</para>
<para>You must install the UIMA plug-ins into Eclipse to enable this function.</para>
<para>The distribution includes a shell script/bat file to run the stand-alone with
Eclipse version, called jcasgen_merge. This works by starting Eclipse in
<quote>headless</quote> mode (no GUI) and invoking JCasGen within Eclipse. You will
need to set the ECLIPSE_HOME environment variable or modify the jcasgen_merge shell
script to specify where to find Eclipse. The version of Eclipse needed is 3 or higher,
with the EMF plug-in and the UIMA runtime plug-in installed. A temporary workspace is
used; the name/location of this is customizable in the shell script.</para>
<para>Log and error messages are written to the UIMA log. This file is called uima.log, and
is located in the default working directory, which if not overridden, is the startup
directory of Eclipse.</para>
</section>
<section id="ugr.tools.jcasgen.running_within_eclipse">
<title>Running within Eclipse</title>
<para>There are two ways to run JCasGen within Eclipse. The first way is to configure an
Eclipse external tools launcher, and use it to run the stand-alone shell scripts, with
the arguments filled in. Here&apos;s a picture of a typical launcher configuration
screen (you get here by navigating from the top menu: Run &ndash;&gt; External Tools
&ndash;&gt; External tools...).
<screenshot>
<mediaobject>
<imageobject>
<imagedata width="5.8in" format="JPG" fileref="&imgroot;image004.jpg"/>
</imageobject>
<textobject><phrase>Running JCasGen within Eclipse using the external tool launcher</phrase>
</textobject>
</mediaobject>
</screenshot></para>
<para>The second way (which is the normal way it's done) to run within Eclipse is to use the
Component Descriptor Editor (CDE) (see <olink targetdoc="&uima_docs_tools;"
targetptr="ugr.tools.cde"/>). This tool can be configured to automatically
launch JCasGen whenever the type system descriptor is modified. In this release, this
operation completely regenerates the files, even if just a small thing changed. For
very large type systems, you probably don&apos;t want to enable this all the time. The
configurator tool has an option to enable/disable this function.</para>
</section>
<section id="ugr.tools.jcasgen.maven_plugin">
<title>Using the jcasgen-maven-plugin</title>
<para>For Maven builds, you can use the jcasgen-maven-plugin to take one or more
top level descriptors (Type System or Analysis Engine descriptors), merge them
together in the standard way UIMA merges type definitions, and produce the corresponding
JCas source classes. These, by default, are generated to the standard spot for Maven
builds for generated files.</para>
<para>You can use ant-like include / exclude patterns to specify the top level descriptor
files. If you set &lt;limitToProject> to true, then after a complete UIMA type system
merge is done with all of the types, including those that are imported, only those
types which are defined within this Maven project (that is, in some subdirectory of the project)
will be generated.</para>
<para>To use the jcasgen-maven-plugin, specify it in the POM as follows:</para>
<programlisting><![CDATA[<plugin>
<groupId>org.apache.uima</groupId>
<artifactId>jcasgen-maven-plugin</artifactId>
<version>2.4.1</version> <!-- change this to the latest version -->
<executions>
<execution>
<goals><goal>generate</goal></goals> <!-- this is the only goal -->
<!-- runs in phase process-resources by default -->
<configuration>
<!-- REQUIRED -->
<typeSystemIncludes>
<!-- one or more ant-like file patterns
identifying top level descriptors -->
<typeSystemInclude>src/main/resources/MyTs.xml
</typeSystemInclude>
</typeSystemIncludes>
<!-- OPTIONAL -->
<!-- a sequence of ant-like file patterns
to exclude from the above include list -->
<typeSystemExcludes>
</typeSystemExcludes>
<!-- OPTIONAL -->
<!-- where the generated files go -->
<!-- default value:
${project.build.directory}/generated-sources/jcasgen" -->
<outputDirectory>
</outputDirectory>
<!-- true or false, default = false -->
<!-- if true, then although the complete merged type system
will be created internally, only those types whose
definition is contained within this maven project will be
generated. The others will be presumed to be
available via other projects. -->
<!-- OPTIONAL -->
<limitToProject>false</limitToProject>
</configuration>
</execution>
</executions>
</plugin>]]></programlisting>
</section>
</chapter>