The UIMA C++ framework supports testing and embedding UIMA components into native processes. A UIMA C++ test driver, runAECpp
, is available so that UIMA C++ components can be fully developed and tested in the native environment, no use of Java is needed.
UIMA C++ includes APIs to parse component descriptors, instantiate and call analysis engines, so that UIMA C++ compliant AE can be used in native applications. The Apache UIMA C++ SDK is Docker-based. For interoperability, UIMA C++ components are expected to be built and distributed against a particular Docker image, thus ensuring correct compiler and dependent library settings.
Checkout the source code as follows:
git clone https://github.com/apache/uima-uimacpp.git
UIMACPP runtime prerequisites are APR, ICU, Xerces-C, APR-Util and a JDK for building the JNI interface. The SDK also requires doxygen for building the documentation. See the Dockerfile for details.
The Docker image is built on top of Debian stable slim image. After cloning the project, on the root directory do:
sudo docker build . -t apache:uimacpp
This should create an image about 250+ Mb in size.
The easier way to test it is by running the Perltorator:
mkdir out sudo docker run --interactive --tty --name uimacppdev \ --mount type=bind,source="$(pwd)"/examples/data,target=/data \ --mount type=bind,source="$(pwd)"/out,target=/out \ apache:uimacpp \ /usr/local/uimacpp/desc/Perltator.xml /data /out
The out
folder will be populated by XMI files with the same name as the original files in data
.
Other useful Docker commands:
sudo docker rm uimacppdev
To remove an old container.
sudo docker run --interactive --tty --name uimacppdev --entrypoint /bin/bash apache:uimacpp
To run a container interactively using bash
.
The UIMACPP package includes several sample UIMA C++ annotators and a sample C++ application that instantiates and uses a C++ annotator. More details on how to build and run the examples will be available over time.
A UIMA C++ annotator descriptor differs from a Java descriptor in the frameworkImplementation, specifying
<frameworkImplementation>org.apache.uima.cpp</frameworkImplementation>
For a C++ annotator, the annotatorImplementationName specifies the name of a dynamic link library. UIMACPP will add the OS appropriate suffix and search the active dynamic libary path (LD_LIBRARY_PATH
for Linux). The suffix is not automatically added when the annotatorImplementationName includes a path. An annotator library is derived from the UIMACPP class “Annotator” and must implement basic annotator methods. Annotators in Perl and Python languages each use a C++ annotator to instantiate the appropriate interpreter, load the specified annotator source and call the annotator methods.
As in UIMA, UIMACPP includes application level methods to instantiate an Analysis Engine from a UIMA annotator descriptor, create a CAS using the AE type system, and call AE methods.
examples/src/ExampleApplication.cpp
is a simple program that instantiates the specified annotator, reads a directory of txt files, and for each file sets the document text in a CAS and calls the AE process method. For annotator development, this program can be modified to create arbitrary CAS content to drive the annotator. Because the entire application is C++, standard tools such as gdb
or devenv
can be easily used for debugging.
runAECpp
is a UIMA C++ application driver modeled closely after the Java tool runAE. Like ExampleApplication
, this tool can read a directory of text files and exercise the given annotator. In addition, runAECpp
can take input from XML format CAS files, call the annotator's process()
method, and output the resultant CAS in XML format files. XML format CAS input files can be created from upstream UIMA components, or created manually with the content needed to develop and unit test an annotator. This is the default entrypoint point for the Docker image.