blob: 8b162c97aa0f594ce65c62de7aca093fd49026db [file] [log] [blame]
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
[[ugr.tools.jcasgen]]
= JCasGen User's Guide
JCasGen reads a descriptor for an application (either an Analysis Engine Descriptor, or a Type System Descriptor), creates the merged type system specification by merging all the type system information from all the components referred to in the descriptor, and then uses this merged type system to create Java source files for classes that enable JCas access to the CAS.
Java classes are not produced for the built-in types, since these classes are already provided by the UIMA SDK.
(An exception is the built-in type ``uima.tcas.DocumentAnnotation``, see the warning below.)
[WARNING]
====
If the components comprising the input to the type merging process have different definitions for the same type name, JCasGen will show a warning, and in some environments may offer to abort the operation.
If you continue past this warning, JCasGen will produce correct Java source files representing the merged types (that is, the type definition containing all of the features defined on that type by all of the components). It is recommended that you do not use this capability (of having two different definitions for the same type name, with different feature sets) since it can make it difficult to xref:ref.adoc#ugr.ref.jcas.merging_types_from_other_specs[combine/package] your annotator with others.
Also note that if your type system declares a custom version of the `uima.tcas.DocumentAnnotation` built-in type, then JCasGen will generate a Java source file for it.
If you do this, you need to be aware of the issues discussed in the xref:ref.adoc#ugr.ref.jcas.documentannotation_issues[JCas Reference].
====
JCasGen can be run in many ways.
For Eclipse users using the Component Descriptor Editor, there's a button on the Type System Description page to run it on that type system.
There's also a jcasgen-maven-plugin to use in maven build scripts.
There's a menu-driven GUI tool for it.
And, there are command line scripts you can use to invoke it.
There are several versions of JCasGen.
The basic version reads an XML descriptor which contains a type system descriptor, and generates the corresponding Java Class Models for those types.
Variants exist for the Eclipse environment that allow merging the newly generated Java source code with xref:ref.adoc#ugr.ref.jcas.augmenting_generated_code[previously augmented versions].
Input to JCasGen needs to be mostly self-contained.
In particular, any types that are defined to depend on user-defined supertypes must have that supertype defined, if the supertype is `uima.tcas.Annotation` or a subtype of it.
Any features referencing ranges which are subtypes of uima.cas.String must have those subtypes included.
If this is not followed, a warning message is given stating that the resulting generation may be inaccurate.
JCasGen is typically invoked automatically when using the xref:tools.adoc#ugr.tools.cde.auto_jcasgen[Component Descriptor Editor], but can also be run using a shell script.
These scripts can take 0, 1, or 2 arguments.
The first argument is the location of the file containing the input XML descriptor.
The second argument specifies where the generated Java source code should go.
If it isn't given, JCasGen generates its output into a subfolder called JCas (or sometimes JCasNew -- see below), of the first argument's path.
The first argument, the input file, can be written as `jar:<url>!{entry}`, for example: `jar:http://www.foo.com/bar/baz.jar!/COM/foo/quux.class`
If no arguments are given to JCasGen, then it launches a GUI to interact with the user and ask for the same input.
The GUI will remember the arguments you previously used.
Here's what it looks like:
.JCasGen tool showing fields for input arguments
image::images/tools/tools.jcasgen/image002.jpg[JCasGen tool showing fields for input arguments]
When running with automatic merging of the generated Java source with previously augmented versions, the output location is where the merge function obtains the source for the merge operation.
As is customary for Java, the generated class source files are placed in the appropriate subdirectory structure according to Java conventions that correspond to the package (name space) name.
The Java classes must be compiled and the resulting class files included in the class path of your application; you make these classes available for other annotator writers using your types, perhaps packaged as an xxx.jar file.
If the xxx.jar file is made to contain only the Java Class Models for the CAS types, it can be reused by any users of these types.
[[ugr.tools.jcasgen.running_without_eclipse]]
== Running stand-alone without Eclipse
There is no capability to automatically merge the generated Java source with previous versions, unless running with Eclipse.
If run without Eclipse, no automatic merging of the generated Java source is done with any previous versions.
In this case, the output is put in a folder called "`JCasNew`" unless overridden by specifying a second argument.
The distribution includes a shell script/bat file to run the stand-alone version, called jcasgen.
[[ugr.tools.jcasgen.running_standalone_with_eclipse]]
== Running stand-alone with Eclipse
If you have Eclipse and EMF (EMF = Eclipse Modeling Framework; both of these are available from http://www.eclipse.org) installed (version 3 or later) JCasGen can merge the Java code it generates with previous versions, picking up changes you might have inserted by hand.
The output (and source of the merge input) is in a folder "`JCas`" under the same path as the input XML file, unless overridden by specifying a second argument.
You must install the UIMA plug-ins into Eclipse to enable this function.
The distribution includes a shell script/bat file to run the stand-alone with Eclipse version, called jcasgen_merge.
This works by starting Eclipse in "`headless`" mode (no GUI) and invoking JCasGen within Eclipse.
You will need to set the ECLIPSE_HOME environment variable or modify the jcasgen_merge shell script to specify where to find Eclipse.
The version of Eclipse needed is 3 or higher, with the EMF plug-in and the UIMA runtime plug-in installed.
A temporary workspace is used; the name/location of this is customizable in the shell script.
Log and error messages are written to the UIMA log.
This file is called uima.log, and is located in the default working directory, which if not overridden, is the startup directory of Eclipse.
[[ugr.tools.jcasgen.running_within_eclipse]]
== Running within Eclipse
There are two ways to run JCasGen within Eclipse.
The first way is to configure an Eclipse external tools launcher, and use it to run the stand-alone shell scripts, with the arguments filled in.
Here's a picture of a typical launcher configuration screen (you get here by navigating from the top menu: Run –> External Tools –> External tools...).
image::images/tools/tools.jcasgen/image004.jpg[Running JCasGen within Eclipse using the external tool launcher]
The second way (which is the normal way it's done) to run within Eclipse is to use the xref:tools.adoc#ugr.tools.cde[Component Descriptor Editor (CDE)]. This tool can be configured to automatically launch JCasGen whenever the type system descriptor is modified.
In this release, this operation completely regenerates the files, even if just a small thing changed.
For very large type systems, you probably don't want to enable this all the time.
The configurator tool has an option to enable/disable this function.
[[ugr.tools.jcasgen.maven_plugin]]
== Using the jcasgen-maven-plugin
For Maven builds, you can use the jcasgen-maven-plugin to take one or more top level descriptors (Type System or Analysis Engine descriptors), merge them together in the standard way UIMA merges type definitions, and produce the corresponding JCas source classes.
These, by default, are generated to the standard spot for Maven builds for generated files.
You can use ant-like include / exclude patterns to specify the top level descriptor files.
If you set `<limitToProject>` to `true`, then after a complete UIMA type system merge is done with all of the types, including those that are imported, only those types which are defined within this Maven project (that is, in some subdirectory of the project) will be generated.
To use the `jcasgen-maven-plugin`, specify it in the POM as follows:
[source]
----
<plugin>
<groupId>org.apache.uima</groupId>
<artifactId>jcasgen-maven-plugin</artifactId>
<version>2.4.1</version> <!-- change this to the latest version -->
<executions>
<execution>
<goals><goal>generate</goal></goals> <!-- this is the only goal -->
<!-- runs in phase process-resources by default -->
<configuration>
<!-- REQUIRED -->
<typeSystemIncludes>
<!-- one or more ant-like file patterns
identifying top level descriptors -->
<typeSystemInclude>src/main/resources/MyTs.xml
</typeSystemInclude>
</typeSystemIncludes>
<!-- OPTIONAL -->
<!-- a sequence of ant-like file patterns
to exclude from the above include list -->
<typeSystemExcludes>
</typeSystemExcludes>
<!-- OPTIONAL -->
<!-- where the generated files go -->
<!-- default value:
${project.build.directory}/generated-sources/jcasgen" -->
<outputDirectory>
</outputDirectory>
<!-- true or false, default = false -->
<!-- if true, then although the complete merged type system
will be created internally, only those types whose
definition is contained within this maven project will be
generated. The others will be presumed to be
available via other projects. -->
<!-- OPTIONAL -->
<limitToProject>false</limitToProject>
</configuration>
</execution>
</executions>
</plugin>
----