<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
-->
<html>
<head>
<title>UIMA C++ Examples</title>
</head>
<body>
<h1>UIMA C++ Examples</h1>
<p>
This directory contains example code that illustrates how to use the UIMA C++ Framework.
</p>
<h2>Building the Examples</h2>
Set the environment variables as described in the overview, then build the examples as follows:
<h3>On Linux</h3>
<ul>
<code>
cd $UIMACPP_HOME/examples<br>
make -C src -f all.mak
</code>
</ul>
This creates shared libraries and executables in the <code>src</code> directory. From the <code>examples</code> directory, add it to <code>LD_LIBRARY_PATH</code> and <code>PATH</code> as follows:
<ul>
<code>
export LD_LIBRARY_PATH=`pwd`/src:$LD_LIBRARY_PATH<br>
export PATH=`pwd`/src:$PATH
</code>
</ul>
<h3>On Windows</h3>
<ul>
<code>
cd %UIMACPP_HOME%\examples<br>
devenv src\uimacpp-examples.sln /build release<br>
</code>
</ul>
This creates DLLs and executables in the <code>src</code> directory. From the <code>examples</code> directory, add it to the <code>PATH</code> as follows:
<ul>
<code>
set PATH=%CD%\src;%PATH%
</code>
</ul>
<h2>Running the Sample UIMA Components</h2>
<p>These components can be run in the native C++ environment with the <code>runAECpp</code> program, from Java with the <code>runAE</code> utility, or integrated into a CPE. The UIMA C++ descriptors are located in the <code>descriptors</code> subdirectory.</p>
<h3>DaveDetector</h3>
A UIMA annotator that finds occurrences of a given string, such as "Dave", in the document text and annotates each one. Its single configuration parameter, <code>DaveString</code>, specifies the string to match. It illustrates how to use the CAS APIs to create annotations and add them to the index.
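<p>The matching step can be sketched without the framework. The code below is an illustration under stated assumptions, not the DaveDetector source: it searches a plain <code>std::string</code>, whereas the real annotator works on the Unicode document text obtained from the CAS, and the helper name <code>findOccurrences</code> is hypothetical. In the annotator, each returned span would become an annotation created with the CAS APIs and added to the index.</p>

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the UIMA C++ API): returns the
// [begin, end) offsets of every non-overlapping occurrence of `pattern`
// in `text`, in document order.
std::vector<std::pair<std::size_t, std::size_t>>
findOccurrences(const std::string& text, const std::string& pattern) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    if (pattern.empty()) {
        return spans;  // an empty pattern would "match" at every position
    }
    std::size_t pos = text.find(pattern);
    while (pos != std::string::npos) {
        spans.emplace_back(pos, pos + pattern.size());
        // Resume after the match so that matches never overlap.
        pos = text.find(pattern, pos + pattern.size());
    }
    return spans;
}
```

<p>For example, <code>findOccurrences("Dave met Dave", "Dave")</code> yields the spans [0, 4) and [9, 13). Resuming the search after each match keeps matches from overlapping; whether the real annotator allows overlapping matches is not stated here.</p>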
<p>
To run this annotator in C++:
</p>
<ul>
<code>
runAECpp descriptors/DaveDetector.xml data/example.txt &lt;yourOutputDir&gt;<br>
runAECpp -xmi descriptors/DaveDetector.xml data/tcas.xmi &lt;yourOutputDir&gt;<br>
runAECpp -xmi descriptors/DaveDetector.xml data/sofa.xmi &lt;yourOutputDir&gt; -s EnglishDocument<br>
</code>
</ul>
<h3>SofaExampleAnnotator</h3>
A simple multi-Sofa example annotator that expects an English text Sofa as input and creates a German text Sofa as output. This annotator has no configuration parameters and requires no initialization method.
To run this annotator in C++:
<ul>
<code>
runAECpp -xmi descriptors/SofaExampleAnnotator.xml data/sofa.xmi &lt;yourOutputDir&gt;<br>
</code>
</ul>
<h3>SofaStreamHandlerFile</h3>
This component implements the Sofa stream handler interface defined in <code>sofastreamhandler.hpp</code> to provide stream access to data located by a <code>file:</code> URI. It enables a UIMA component to access remote Sofa data, that is, Sofa data referenced by a <code>file:</code> URI rather than stored in the CAS.
This example may be used as a model for building handlers for custom URI schemes.
The shared library <code>SofaStreamHandlerFile</code> must be registered with the framework as follows:
<ul>
On Windows
<ul>
<code>
set UIMACPP_STREAMHANDLERS=file:SofaStreamHandlerFile %UIMACPP_STREAMHANDLERS%<br>
</code>
</ul>
On Linux
<ul>
<code>
export UIMACPP_STREAMHANDLERS="file:SofaStreamHandlerFile $UIMACPP_STREAMHANDLERS"<br>
</code>
</ul>
<p>
Handlers for several URI schemes may be registered by separating the entries with spaces. Only one handler can be registered per URI scheme.
</p>
</ul>
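<p>To make the registration format concrete, the following sketch parses a <code>UIMACPP_STREAMHANDLERS</code> value into a map from URI scheme to library name. It is an assumption-based illustration, not the framework's own parser: the function name <code>parseStreamHandlers</code> is hypothetical, it splits each entry at the first colon, and it rejects a repeated scheme because only one handler is allowed per scheme (how the framework itself reacts to a duplicate is not described here).</p>

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for the UIMACPP_STREAMHANDLERS format described above:
// whitespace-separated "scheme:library" entries, at most one per scheme.
// Fills `handlers` (scheme -> library) and returns false on a malformed entry
// or a repeated scheme.
bool parseStreamHandlers(const std::string& value,
                         std::map<std::string, std::string>& handlers) {
    handlers.clear();
    std::istringstream in(value);
    std::string entry;
    while (in >> entry) {  // operator>> skips leading and repeated whitespace
        const std::string::size_type colon = entry.find(':');
        if (colon == std::string::npos || colon == 0 ||
            colon + 1 == entry.size()) {
            return false;  // scheme or library name is missing
        }
        // Split at the first colon only, so the library part may contain ':'.
        const std::string scheme = entry.substr(0, colon);
        const std::string library = entry.substr(colon + 1);
        // emplace() leaves an existing key untouched and reports false.
        if (!handlers.emplace(scheme, library).second) {
            return false;  // only one handler per URI scheme
        }
    }
    return true;
}
```

<p>A value such as <code>"file:SofaStreamHandlerFile "</code>, which the Linux command above produces when the variable was previously empty, parses to a single entry; the trailing space is ignored because entries are read with whitespace-skipping extraction.</p>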
The <code>SofaDataAnnotator</code> described below illustrates reading Sofa data as a stream.
<h3>SofaDataAnnotator</h3>
An annotator that accesses the data in the Sofa "EnglishDocument" as a text stream.
It tokenizes the data on whitespace and creates an annotation for each token. The annotator can process an input Sofa whose data is local, or one whose data is remote and specified as a URI. To run this annotator in C++ on a Sofa with local data:
<ul>
<code>
runAECpp -xmi descriptors/SofaDataAnnotator.xml data/sofa.xmi &lt;yourOutputDir&gt;<br>
</code>
</ul>
To process a Sofa whose data is specified as a <code>file:</code> URI, first register the example <code>SofaStreamHandlerFile</code> handler for the <code>file</code> URI scheme as described in the section above, then run:
<ul>
<code>
runAECpp -xmi descriptors/SofaDataAnnotator.xml data/filetcas.xmi &lt;yourOutputDir&gt;<br>
</code>
</ul>
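<p>The tokenizing step can be pictured with a standard C++ stream standing in for the Sofa data stream. This is a hedged sketch rather than the SofaDataAnnotator source: the <code>Token</code> struct and the <code>tokenizeStream</code> and <code>tokenizeText</code> names are hypothetical, and the offsets count <code>char</code> values (bytes), which differ from character-based offsets for non-ASCII text. In the annotator, each token span would become an annotation in the CAS.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One token: its text and its [begin, end) offsets, counted in chars.
struct Token {
    std::string text;
    std::size_t begin;
    std::size_t end;
};

// Hypothetical sketch: reads `in` one char at a time and splits the data on
// whitespace, recording where each token starts and ends.
std::vector<Token> tokenizeStream(std::istream& in) {
    std::vector<Token> tokens;
    std::string current;     // characters of the token being built
    std::size_t offset = 0;  // offset of the char currently being examined
    std::size_t start = 0;   // offset where `current` began
    char c;
    while (in.get(c)) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (!current.empty()) {
                tokens.push_back({current, start, offset});
                current.clear();
            }
        } else {
            if (current.empty()) {
                start = offset;
            }
            current += c;
        }
        ++offset;
    }
    if (!current.empty()) {  // flush a token that runs to the end of the data
        tokens.push_back({current, start, offset});
    }
    return tokens;
}

// Convenience wrapper for in-memory text.
std::vector<Token> tokenizeText(const std::string& text) {
    std::istringstream in(text);
    return tokenizeStream(in);
}
```

<p>For example, <code>tokenizeText("Hello UIMA world")</code> returns the tokens "Hello" [0, 5), "UIMA" [6, 10) and "world" [11, 16). Because <code>tokenizeStream</code> reads one character at a time, it never needs the raw Sofa data in memory all at once, which is the point of stream access.</p>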
<h3>SimpleTextSegmenter</h3>
A simple example CAS Multiplier, a type of analysis component that outputs new CASes. This example illustrates one use of a CAS Multiplier: breaking a large CAS down into smaller pieces, each placed in a new CAS. The <code>SimpleTextSegmenter</code> splits the input document into segments at a delimiter and creates a Sofa for each segment in a new CAS. The delimiter is set by the configuration parameter <code>DelimiterString</code> in the descriptor.
To run this CAS Multiplier in C++:
<ul>
<code>
runAECpp -xmi descriptors/SimpleTextSegmenter.xml data/docforsegmenter.xmi &lt;yourOutputDir&gt;<br>
</code>
</ul>
The original input CAS and the new CASes containing the segments are written in XMI format as separate files in &lt;yourOutputDir&gt;.
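<p>The segmentation rule can be sketched in isolation. This is an illustration under assumptions, not the SimpleTextSegmenter source: the function name <code>segment</code> is hypothetical, and this version drops the delimiters and skips empty segments, which the real component may handle differently. In the CAS Multiplier, each returned string would become the Sofa data of a new CAS.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of delimiter-based segmentation: splits `text` at every
// occurrence of `delimiter`, drops the delimiters themselves, and skips empty
// segments.
std::vector<std::string> segment(const std::string& text,
                                 const std::string& delimiter) {
    std::vector<std::string> segments;
    if (delimiter.empty()) {  // nothing to split on: the whole text is one segment
        if (!text.empty()) {
            segments.push_back(text);
        }
        return segments;
    }
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = text.find(delimiter, start);
        const std::size_t end = (pos == std::string::npos) ? text.size() : pos;
        if (end > start) {  // skip empty segments between adjacent delimiters
            segments.push_back(text.substr(start, end - start));
        }
        if (pos == std::string::npos) {
            break;
        }
        start = pos + delimiter.size();
    }
    return segments;
}
```

<p>With the delimiter <code>"|"</code>, the text <code>"one|two||three|"</code> yields the three segments "one", "two" and "three".</p>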
<h3>XCasWriterCasConsumer</h3>
A simple example CAS Consumer that generates an XCAS (an XML representation of a CAS) and by default writes it to stdout. It can instead write the XCAS output to a file in a directory of your choice: edit the descriptor and set the configuration parameter <code>OutputDirectory</code> to that directory. The <code>XCasWriterCasConsumer</code> can be inserted at any point in an aggregate or CPE flow to dump the contents of the CAS, which is useful for debugging.
To run this CAS Consumer in C++:
<ul>
<code>
runAECpp -xmi descriptors/XCasWriterCasConsumer.xml data/tcas.xmi &lt;yourOutputDir&gt;<br>
runAECpp -xmi descriptors/XCasWriterCasConsumer.xml data/sofa.xmi &lt;yourOutputDir&gt;<br>
</code>
</ul>
To view the results of the earlier examples:
<ul>
<code>
runAECpp -xmi descriptors/XCasWriterCasConsumer.xml &lt;yourOutputDir&gt;/tcas.xmi<br>
runAECpp -xmi descriptors/XCasWriterCasConsumer.xml &lt;yourOutputDir&gt;/sofa.xmi<br>
runAECpp -xmi descriptors/XCasWriterCasConsumer.xml &lt;yourOutputDir&gt;/filetcas.xmi<br>
</code>
</ul>
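<p>The choice between stdout and <code>OutputDirectory</code> can be sketched as follows. This is a hypothetical illustration, not the XCasWriterCasConsumer source: the function name <code>selectOutput</code>, the document-name parameter, and the <code>.xcas</code> file-name suffix are all assumptions made for the example.</p>

```cpp
#include <fstream>
#include <iostream>
#include <ostream>
#include <string>

// Hypothetical sketch of the OutputDirectory setting: with no directory
// configured the XCAS goes to stdout; otherwise `file` is opened on
// outputDirectory/documentName.xcas (a naming scheme assumed here).
// Returns the stream to write to, or nullptr if the file could not be opened.
std::ostream* selectOutput(const std::string& outputDirectory,
                           const std::string& documentName,
                           std::ofstream& file) {
    if (outputDirectory.empty()) {
        return &std::cout;
    }
    file.open(outputDirectory + "/" + documentName + ".xcas");
    return file.is_open() ? &file : nullptr;
}
```

<p>Passing the <code>std::ofstream</code> in from the caller keeps the file open for as long as the caller needs it, while the stdout case needs no file at all; a <code>nullptr</code> result signals that the output file could not be opened.</p>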
<h2>Running the Sample UIMA Applications</h2>
These illustrate how to write stand-alone C++ applications that run UIMA C++ components.
Build the examples and set up the environment variables as described above.
<h3>ExampleApplication</h3>
This application reads all the .txt files in a directory, creates a CAS for
each in turn, and passes each CAS through an AnalysisEngine.
The results are printed to stdout as XCAS. The application takes two arguments: the path to a UIMA C++ descriptor file, and a file or directory containing input data:
<ul>
<code>
ExampleApplication descriptors/DaveDetector.xml data<br>
</code>
</ul>
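<p>The input handling described above, a single file or every <code>.txt</code> file in a directory, can be sketched with C++17 <code>std::filesystem</code>. This is an assumption-based sketch, not the ExampleApplication source: the names <code>collectInputs</code> and <code>readDocument</code> are hypothetical, a single file argument is assumed to be used regardless of its extension, and the sorting step is added only to make the processing order predictable. Each string returned by <code>readDocument</code> would become the document text of a new CAS before the CAS is passed to the AnalysisEngine.</p>

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sketch: a regular file is used as-is; for a directory, every
// regular file with a ".txt" extension is collected and sorted by path.
// A path that is neither (for example, a missing one) yields an empty list.
std::vector<fs::path> collectInputs(const fs::path& input) {
    std::vector<fs::path> files;
    std::error_code ec;
    if (fs::is_regular_file(input, ec)) {
        files.push_back(input);
    } else if (fs::is_directory(input, ec)) {
        for (const auto& entry : fs::directory_iterator(input, ec)) {
            if (entry.is_regular_file(ec) && entry.path().extension() == ".txt") {
                files.push_back(entry.path());
            }
        }
        std::sort(files.begin(), files.end());
    }
    return files;
}

// Reads a whole file into a string: the document text for one CAS.
std::string readDocument(const fs::path& file) {
    std::ifstream in(file, std::ios::binary);
    std::ostringstream text;
    text << in.rdbuf();
    return text.str();
}
```

<p>For example, <code>collectInputs("data")</code> would list the <code>.txt</code> files in the examples' <code>data</code> directory in name order, skipping subdirectories and files with other extensions.</p>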
<h3>SofaExampleApplication</h3>
A multi-Sofa example that creates a text Sofa named EnglishDocument, sets its Sofa data to some English text, and calls the SofaExampleAnnotator, which produces a Sofa with German text. The application then writes the annotations to stdout. It takes one argument, the path to the SofaExampleAnnotator descriptor file:
<ul>
<code>
SofaExampleApplication descriptors/SofaExampleAnnotator.xml<br>
</code>
</ul>
</body>
</html>