 <!--
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  *   http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  -->
 <html>
 <head>
   <title>Apache UIMA C++ v2.2.2 Release Notes</title>
 </head>
 <body>
 <h1>Apache UIMA C++ (Unstructured Information Management Architecture) v2.2.2 Release Notes</h1>

 <h2>Contents</h2>
 <p>
 <a href="#what.is.uima">1. What is UIMA?</a><br/>
 <a href="#major.changes">2. Major Changes in this Release</a><br/>
 <a href="#migrating">3. Migrating from IBM UIMA C++ to Apache UIMA C++</a><br/>
 <a href="#get.involved">4. How to Get Involved</a><br/>
 <a href="#report.issues">5. How to Report Issues</a><br/>
 <a href="#more.info">6. More Documentation on Apache UIMA C++</a><br/>
 </p>

 <h2><a name="what.is.uima">1. What is UIMA?</a></h2>

      <p>
   			Unstructured Information Management applications are
 				software systems that analyze large volumes of
 				unstructured information in order to discover knowledge
 				that is relevant to an end user. UIMA is a framework and
 				SDK for developing such applications. An example UIM
 				application might ingest plain text and identify
 				entities, such as persons, places, organizations; or
 				relations, such as works-for or located-at. UIMA enables
 				such an application to be decomposed into components,
 				for example "language identification" -&gt; "language
 				specific segmentation" -&gt; "sentence boundary
 				detection" -&gt; "entity detection (person/place names
 				etc.)". Each component must implement interfaces defined
 				by the framework and must provide self-describing
 				metadata via XML descriptor files. The framework manages
 				these components and the data flow between them.
 				Components are written in Java or C++; the data that
 				flows between components is designed for efficient
 				mapping between these languages. UIMA additionally
 				provides capabilities to wrap components as network
 				services, and can scale to very large volumes by
 				replicating processing pipelines over a cluster of
 				networked nodes.
 			</p>
                                                 <p>
 				Apache UIMA is an Apache-licensed open source
 				implementation of the UIMA specification (that
 				specification is, in turn, being developed concurrently
 				by a technical committee within
 				<a href="http://www.oasis-open.org">OASIS</a>, a standards
 				organization). We invite and encourage you
 				to participate in both the implementation and
 				specification efforts.
 			</p>
                                                 <p>
 				UIMA is a component framework for analysing unstructured
 				content such as text, audio and video. It comprises an
 				SDK and tooling for composing and running analytic
 				components written in Java and C++, with some support
 				for Perl, Python and Tcl.
 			</p>

 <h2><a name="major.changes">2. Major Changes in this Release</a></h2>
 <p>
 This section describes what has changed between version 1.4.4 and version 2.2.2 of
 UIMA C++. A migration guide is provided below that describes the required updates to
 your C++ code and descriptors. See Section 3, "Migrating from IBM UIMA C++ to
 Apache UIMA C++".
 </p>

 <!--
 tutorial and other interlock with Java?
 -->

 <h3>2.1. Complete Content for Build, Test and Package</h3>
 <p>
 This release includes a test suite for the uimacpp library. Also
 included are the tools to build both source and binary distribution
 packages.
 </p>

 <h3>2.2. Extended Platform Support</h3>
 <p>
 On 64-bit Unix platforms the Apache UIMA C++ framework can be built as
 a 64-bit library. This enables C++, Perl, Python and Tcl analytics to
 fully utilize a 64-bit address space. Both XML and binary CAS
 serialization formats are compatible between 32-bit and 64-bit builds.
 </p>
 <p>
 Mac OS X is now fully supported for SDK build and use.
 </p>

 <h3>2.3. Better Integration with Java SDK</h3>
 <p>
 The Apache UIMA SDK shell scripts and Eclipse run configurations set native environment paths on the assumption that the UIMA C++ SDK is installed directly under $UIMA_HOME. This enables the standard UIMA SDK tools to work seamlessly with C++-based annotators.
 </p>
 <p>
 On Unix platforms, the UIMA C++ examples directory can be loaded as an Eclipse CDT project, supporting development of both UIMA C++ and Java components in the same Eclipse IDE.
 </p>
 <p>
 By default, when a uimacpp annotator is instantiated from Java, the annotator runs in the JVM process and communicates with Java via JNI. All uimacpp annotators instantiated in the same JVM share the same native environment; therefore they must all use the same version of the UIMA C++ framework. As before, a uimacpp annotator can be isolated by wrapping it as a Vinci service.
 </p>
 <p>
 A new approach in this release allows process isolation of uimacpp annotators without wrapping each one in a JVM. When deployed from Java as a UIMA-AS service, a uimacpp annotator is spawned by the JVM as a native process. The native UIMA-AS service communicates with clients via JMS messaging, completely independently of the JVM. However, the native service connects back to the JVM to enable JMX monitoring and logfile integration with other UIMA annotators running in the same JVM.
 </p>

 <h3>2.4. C++ Namespace and Module Name Changes</h3>
 <p>
 The UIMA C++ namespace and shared library have changed from "taf" to "uima".
 Environment variable TAFROOT has changed to UIMACPP_HOME.
 All of the source files have dropped the prefix "taf_". SDK header files
 have moved from $TAFROOT/include/ to $UIMACPP_HOME/include/uima/.
 </p>

 <h3>2.5. XML Descriptor Changes</h3>
 <p>
 The XML namespace in UIMA component descriptors has changed from
 <code>http://uima.watson.ibm.com/resourceSpecifier</code> to
 <code>http://uima.apache.org/resourceSpecifier</code>. The value of the
 <code>&lt;frameworkImplementation&gt;</code> element for C++ components must now be <code>org.apache.uima.cpp</code>.
 Although <code>taeDescription</code> is still supported, the use of <code>analysisEngineDescription</code>
 is recommended.
 </p>

 <h3>2.6. TCAS replaced by CAS</h3>
 <p>
 In Apache UIMA the TCAS interface has been removed. All uses of it must now be
 replaced by the CAS interface. All methods that used to be defined on TCAS
 were moved to CAS.
 All annotators should now derive from class <code>Annotator</code>, although for backwards
 compatibility C++ annotators can still derive from the class <code>TextAnnotator</code>.
 For all C++ component types, the CAS delivered to the process method is the base CAS if Sofa capabilities are
 declared in the component descriptor; otherwise it is the selected CAS view.
 </p>
 <p>
 The call
 </p>
 <ul>
  <li><code>CAS.getTCAS(getSofa(getAnnotatorContext().mapToSofaID("SofaName")))</code></li>
 </ul>
 <p>
 has been replaced with
 </p>
 <ul>
  <li><code>CAS-&gt;getView("SofaName")</code></li>
 </ul>
 <p>
 because the Sofa mapping code has been integrated into the CAS.
 </p>

 <h3>2.7. Support added for XMI Serialization</h3>
 <p>
 UIMA C++ now supports XMI serialization, the proposed standard for XML interchange of CAS data.
 The C++ application driver, runAECpp, has a new option to accept XMI-format input files,
 and its output format is now XMI.
 </p>
 <p>
 XMI serialization is also key to implementing the UIMA-AS service wrapper for uimacpp-based annotators.
 </p>

 <h3>2.8. Building the SDK on Unix is Simplified</h3>
 <p>
 The Unix build is simplified by redistributing GNU automake output files
 in the source tarball. When building from an SVN checkout, up-to-date versions
 of GNU automake, autoconf and libtool are still required.
 </p>

 <h2><a name="migrating">3. Migrating from IBM UIMA C++ to Apache UIMA C++</a></h2>
 <p>
 Although not required, C++ component descriptors of type <code>taeDescription</code> should be changed to type <code>analysisEngineDescription</code>.
 </p>

 <h3>3.1. Migrating C++ Source Code</h3>
 <p>
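 This section describes the source code changes required to migrate from
 UIMA C++ version 1.4.4 to Apache UIMA C++ v2.2.2. Note that the first two replacements
 are order dependent: <code>getTCAS</code> must be replaced before <code>TCAS</code>, otherwise
 every <code>getTCAS</code> would first be turned into <code>getCAS</code>.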
 </p>

 <ul>
 <li>Replace [case sensitive] all occurrences of <code>getTCAS</code> with <code>getView</code></li>
 <li>Replace [case sensitive] all occurrences of <code>TCAS</code> with <code>CAS</code></li>
 <li>Replace [case sensitive] all occurrences of <code>TAF_</code> with <code>UIMA_</code></li>
 <li>Replace [case sensitive] all occurrences of <code>taf_</code> with <code>uima/</code></li>
 <li>Replace <code>"tafapi.hpp"</code> with <code>"uima/api.hpp"</code></li>
 <li>Replace <code>TextAnnotator</code> with <code>Annotator</code></li>
 <li>Replace the generic C API wrapper, usually at the bottom of a C++ component, with
 the <code>MAKE_AE()</code> macro. See the sample code in $UIMACPP_HOME/examples/src.</li>
 </ul>

 <h3>3.2. Migrating Scriptator Source Code</h3>
 <p>
 Tcl source code using variables of type TCAS should use CAS instead.
 No changes should be necessary for Perl or Python source.
 </p>

 <h2><a name="get.involved">4. How to Get Involved</a></h2>
 <p>
 The Apache UIMA project really needs and appreciates any contributions,
 including documentation help, source code and feedback.  If you are interested
 in contributing, please visit
 <a href="http://incubator.apache.org/uima/get-involved.html">
   http://incubator.apache.org/uima/get-involved.html</a>.
 </p>

 <h2><a name="report.issues">5. How to Report Issues</a></h2>
 <p>
 The Apache UIMA project uses JIRA for issue tracking.  Please report any
 issues you find at
 <a href="http://issues.apache.org/jira/browse/uima">http://issues.apache.org/jira/browse/uima</a>.
 </p>

 <h2><a name="more.info">6. More Documentation on Apache UIMA C++</a></h2>
 <p>
 Please see <a href="docs/overview_and_setup.html">Overview and Setup</a>
 for a high-level overview of UIMA C++,
 and <a href="docs/html/index.html">Doxygen</a> for details on the UIMA C++ APIs.
 </p>

 </body>
 </html>