Apache UIMA (Unstructured Information Management Architecture) v2.4.1 SDK
-------------------------------------------------------------------------
Building from the Source Distribution
-------------------------------------
We use Maven 3.0 or later for building; download it if needed,
and set the environment variable MAVEN_OPTS to -Xmx800m -XX:MaxPermSize=256m.
Then build by changing to the .../uimaj directory and issuing the command
mvn clean install
This builds everything except the ...source-release.zip file. If you want that,
change the command to
mvn clean install -Papache-release
Look for the result here:
target/uimaj-[version]-source-release.zip (if run with -Papache-release)
For more details, please see http://uima.apache.org/building-uima.html
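The build steps above can be sketched as a small POSIX shell script. This is a minimal, hypothetical helper: the function name uima_build_cmd is not part of the UIMA distribution. It sets MAVEN_OPTS as recommended and only prints the Maven command, so you can review it before running it yourself from the .../uimaj directory.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with UIMA: sets MAVEN_OPTS as recommended
# above and prints the Maven command line. Pass "release" to include the
# apache-release profile, which also produces the ...source-release.zip.
export MAVEN_OPTS="-Xmx800m -XX:MaxPermSize=256m"

uima_build_cmd() {
  if [ "${1:-}" = "release" ]; then
    echo "mvn clean install -Papache-release"
  else
    echo "mvn clean install"
  fi
}

# Print the default command. To actually build, change to the .../uimaj
# directory and execute: $(uima_build_cmd)
uima_build_cmd
```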
What's New in 2.4.1
-------------------
There are many improvements, some bug fixes, and tooling enhancements in this release. The major changes include:
* Documentation of binary serialization.
* New forms of compressed binary serialization that significantly reduce the size of the serialized data,
including one form that supports differing source and target type systems
* A new facility called External Parameter Overrides, which uses properties files to specify annotator
parameter settings independently of the annotator hierarchy
* CasCopier enhancements to allow copying one view to a different view.
* Additional options to restrict JCasGen operation to generating just those types that are defined in a project,
excluding other types that are imported from other projects
* A new Maven plugin that runs JCasGen (see tools documentation for how to configure and use this)
* A new ability to preserve white space (indentation) when parsing XML descriptors; this is now used in the
Component Descriptor Editor (CDE) to preserve indentation when editing an existing descriptor.
* Performance and space improvements
* Some "bulk" methods for efficiently removing Feature Structures from Indexes
* The CDE supports three new features: preserving existing white space in XML descriptors,
External Parameter Overrides, and a configuration option that restricts JCasGen to just those
types defined in the project.
* Enhancements to the DocumentAnalyzer utility to support reading CASes in various formats
Some user-facing interfaces and classes have new methods:
* FsIndexRepository - new methods for bulk removal of all instances of a type from the indexes
* JCas - same methods added for bulk removal, as above
* Serialization - Javadocs added to document the kinds of serialization and deserialization supported (binary and compressed binary forms)
- Methods added to support Binary Compressed serialization / deserialization
* CasCopier - new copyCasView methods for copying to a different view
Some interfaces and classes, less likely to be used by users, were changed:
* ConfigurationManager's createContext method has an additional parameter for the new external parameter override mechanism
* ConfigurationParameter - has new support for external parameter override names
Supported Platforms
--------------------
Apache UIMA requires Java level 1.5; it has been tested with Sun/Oracle Java SDK v5, v6, and v7, and with IBM Java 6 and 7.
Running the Eclipse plugin tooling for UIMA also requires that you start Eclipse using Java 5 or later.
The supported platforms are: Windows, Linux, Solaris, AIX and Mac OS X.
Other platforms and Java (5+) implementations should work, but have not been significantly tested.
Many of the scripts in the /bin directory invoke Java. They use the value of the environment variable JAVA_HOME
to locate the Java to use; if it is not set, they invoke "java", expecting to find a suitable Java in your PATH.
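The lookup order just described can be sketched in a few lines of shell. This only mirrors the documented behavior; the function name uima_java_cmd is hypothetical, and the actual scripts in /bin may be written differently.

```shell
#!/bin/sh
# Sketch of the documented lookup order (hypothetical function name; the real
# scripts in /bin may differ): prefer $JAVA_HOME/bin/java when JAVA_HOME is
# set, otherwise fall back to whatever "java" is found on the PATH.
uima_java_cmd() {
  if [ -n "${JAVA_HOME:-}" ]; then
    echo "$JAVA_HOME/bin/java"
  else
    echo "java"
  fi
}

# Show which Java the scripts would pick. To also check its version, run:
#   "$(uima_java_cmd)" -version
echo "UIMA scripts would run: $(uima_java_cmd)"
```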
Environment Variables
----------------------
After you have unpacked the Apache UIMA distribution from the package of your choice (e.g. .zip or .gz),
perform the steps below to set up UIMA so that it will function properly.
* Set JAVA_HOME to the directory of the JRE installation you would like UIMA to use.
* Set UIMA_HOME to the apache-uima directory of your unpacked Apache UIMA distribution.
* Append UIMA_HOME/bin to your PATH.
* Please run the script UIMA_HOME/bin/adjustExamplePaths.bat (or .sh), to update
paths in the examples based on the actual UIMA_HOME directory path.
This script runs a Java program;
you must either have java in your PATH or set the environment variable JAVA_HOME to a
suitable JRE.
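The setup steps above can be sketched for sh/bash as follows. Both directory paths are placeholders (assumptions, not real defaults); substitute your own JRE directory and unpacked apache-uima directory.

```shell
#!/bin/sh
# Sketch for sh/bash on Linux, Solaris, AIX, or Mac OS X. Both paths are
# placeholders: substitute your own JRE directory and unpacked apache-uima
# directory. An existing JAVA_HOME is kept if one is already set.
export JAVA_HOME="${JAVA_HOME:-/path/to/jre}"
export UIMA_HOME="/path/to/apache-uima"

# Append UIMA_HOME/bin to PATH so the UIMA scripts can be run by name.
export PATH="$PATH:$UIMA_HOME/bin"

# Update the example paths for this UIMA_HOME. The script runs a Java
# program, so Java must be reachable via JAVA_HOME or PATH.
# Skipped when the script is not found (e.g. while UIMA_HOME is a placeholder).
if [ -f "$UIMA_HOME/bin/adjustExamplePaths.sh" ]; then
  sh "$UIMA_HOME/bin/adjustExamplePaths.sh"
fi
```

These exports last only for the current shell session; to keep them, you might add the export lines to your shell's startup file (for example ~/.profile).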
Note: The procedures for setting up global environment variables on the Mac OS X operating system
are described at http://developer.apple.com/qa/qa2001/qa1067.html.
Verifying Your Installation
----------------------------
To test the installation, run the documentAnalyzer.bat (or .sh) file located in the bin subdirectory.
This should pop up a "Document Analyzer" window. Set the values displayed in this GUI as follows:
* Input Directory: UIMA_HOME/examples/data
* Output Directory: UIMA_HOME/examples/data/processed
* Location of Analysis Engine XML Descriptor: UIMA_HOME/examples/descriptors/analysis_engine/PersonTitleAnnotator.xml
Replace UIMA_HOME above with the path of your Apache UIMA installation.
Next, click the "Run" button, which should, after a brief pause, pop up an "Analyzed Results" window.
Double-click on one of the documents to display the analysis results for that document.
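Before launching the GUI, you can sanity-check the example inputs from a shell. The function below is a hypothetical sketch (uima_check_install is not part of UIMA): it verifies that the example data directory and the PersonTitleAnnotator.xml descriptor exist under a given UIMA_HOME, then prints the values to enter in the Document Analyzer window with UIMA_HOME already substituted.

```shell
#!/bin/sh
# Hypothetical pre-check, not part of UIMA: confirms the inputs used by the
# Document Analyzer test exist under the given UIMA_HOME, then prints the
# three values to enter in the GUI. Returns non-zero if anything is missing.
uima_check_install() {
  home="$1"
  desc="$home/examples/descriptors/analysis_engine/PersonTitleAnnotator.xml"
  if [ ! -d "$home/examples/data" ] || [ ! -f "$desc" ]; then
    echo "missing example data or descriptor under $home" >&2
    return 1
  fi
  echo "Input Directory: $home/examples/data"
  echo "Output Directory: $home/examples/data/processed"
  echo "Location of Analysis Engine XML Descriptor: $desc"
}

# Usage: uima_check_install "$UIMA_HOME" && "$UIMA_HOME/bin/documentAnalyzer.sh"
```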
Getting Started
----------------
For an introduction to Apache UIMA and how to use it, please read the documentation
located in the docs subdirectory. A good place to start is the overview_and_setup
book's first chapter, which has a brief guide to the documentation.