
 	<!--
     ***************************************************************
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements.  See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership.  The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License.  You may obtain a copy of the License at
     *
     *   http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing,
     * software distributed under the License is distributed on an
     * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     * KIND, either express or implied.  See the License for the
     * specific language governing permissions and limitations
     * under the License.
     ***************************************************************
    -->
 <html>
 <head>
   <title>Apache UIMA v2.7.0 Release Notes</title>
 </head>
 <body>
 <h1>Apache UIMA (Unstructured Information Management Architecture) v2.7.0 Release Notes</h1>

 <h2>Contents</h2>
 <p>
 <a href="#what.is.uima">What is UIMA?</a><br/>
 <a href="#major.changes">Major Changes in this Release</a><br/>
 <a href="#get.involved">How to Get Involved</a><br/>
 <a href="#report.issues">How to Report Issues</a><br/>
 <a href="#list.issues">List of JIRA Issues Fixed in this Release</a><br/>
 </p>

 <h2><a id="what.is.uima">What is UIMA?</a></h2>

      <p>
   			Unstructured Information Management applications are
 				software systems that analyze large volumes of
 				unstructured information in order to discover knowledge
 				that is relevant to an end user. UIMA is a framework and
 				SDK for developing such applications. An example UIM
 				application might ingest plain text and identify
 				entities, such as persons, places, organizations; or
 				relations, such as works-for or located-at. UIMA enables
 				such an application to be decomposed into components,
 				for example "language identification" -&gt; "language
 				specific segmentation" -&gt; "sentence boundary
 				detection" -&gt; "entity detection (person/place names
 				etc.)". Each component must implement interfaces defined
 				by the framework and must provide self-describing
 				metadata via XML descriptor files. The framework manages
 				these components and the data flow between them.
 				Components are written in Java or C++; the data that
 				flows between components is designed for efficient
 				mapping between these languages. UIMA additionally
 				provides capabilities to wrap components as network
 				services, and can scale to very large volumes by
 				replicating processing pipelines over a cluster of
 				networked nodes.
 			</p>
       <p>
 				Apache UIMA is an Apache-licensed open source
 				implementation of the UIMA specification (that
 				specification is, in turn, being developed concurrently
 				by a technical committee within
 				<a href="http://www.oasis-open.org">OASIS</a>,
 				a standards organization). We invite and encourage you
 				to participate in both the implementation and
 				specification efforts.
 			</p>
       <p>
 				UIMA is a component framework for analysing unstructured
 				content such as text, audio and video. It comprises an
 				SDK and tooling for composing and running analytic
 				components written in Java and C++, with some support
 				for Perl, Python and TCL.
 			</p>

 <h2><a id="major.changes">Major Changes in this Release</a></h2>

 <h3>Java 7 minimum level</h3>
 <p>Java 7 is now the minimum level of Java required.</p>

 <h3>Several JVM properties support backwards compatibility</h3>
 <p>See the README and <a target="_blank" href="http://uima.apache.org/d/uimaj-2.7.0/references.html#ugr.ref.config">
 the Reference chapter</a> for a description of these.</p>

 <h3>JSON serialization support</h3>
 <p>
  JSON serialization support has been added for Type System Descriptions and for CASes.
   Several formats for JSON CAS serialization are provided; see the corresponding chapter in the
   UIMA reference documentation for details.
 </p>

 <h3>Sorted and Bag indexes no longer store multiple instances of identical FSs</h3>
 <p>
 The meaning of "bag" and "sorted" indexes has been made consistent with how they are handled
 when sending CASes to remote services.  Adding the identical Feature Structure (FS) to the
 indexes multiple times no longer adds duplicate index entries, and removing an FS from an index
 now guarantees that that particular FS is no longer in any index.  (Previously, if you had
 added a particular FS to a sorted or bag index multiple times, a remove would
 remove just one of the instances.)  Deserialization of CASes sent to remote services has never
 added Feature Structures to the index multiple times, so this change makes the local behavior
 consistent with it. For more details, see <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-3399">
 Jira issue UIMA-3399</a>.
 </p>
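 <p>As a rough illustration of these semantics (not UIMA's implementation), the following self-contained Java
 sketch models an index that keeps at most one entry per identical object, so that a single remove takes the
 object out completely. The class name <code>IdentityBagSketch</code> and its methods are hypothetical and
 exist only for this example.</p>

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in for a bag index; not a UIMA class.
class IdentityBagSketch<T> {
    // Membership is decided by object identity (==), not by equals().
    private final Set<T> members =
        Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());

    // Returns false (and changes nothing) if this exact object is already indexed.
    boolean add(T fs) { return members.add(fs); }

    // After a successful remove, the object is no longer in the index at all.
    boolean remove(T fs) { return members.remove(fs); }

    boolean contains(T fs) { return members.contains(fs); }

    int size() { return members.size(); }
}
```

 <p>In this sketch, adding the same object twice leaves the size at one, while a distinct object with equal
 content is still stored as a separate entry.</p>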

 <p>Because some users may need the previous behavior, which permitted duplicates of identical Feature Structures in
 the sorted and bag indexes, this change can be disabled by running the JVM with the property
 "uima.allow_duplicate_add_to_indexes" defined.</p>


 <h3>Index corruption avoidance</h3>
 <p>To prevent potential index corruption, UIMA now recovers (unless disabled by
    <code>"-Duima.disable_auto_protect_indexes"</code>)
   from illegal modifications of features.</p>

   <p>These are modifications to features used as index keys, done while the Feature Structure
   being modified is currently in one or more indexes (see
   <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4135">Jira issue UIMA-4135</a>).</p>

   <p>Corruption is prevented by first removing the feature structure being updated from the indexes,
   then doing the update, and then adding the feature structure back to the indexes.
   Because these actions can affect performance, it is recommended that you run with the JVM property
   "-Duima.report_fs_update_corrupts_index" to see whether any user code has this problem. Fix any such
   code by redesigning it, or by wrapping the affected areas with a form of <code>protectIndexes()</code>, which does the
   needed removes and add-backs under your control, so you can do several feature updates at once
   before adding the feature structure back.  <code>protectIndexes</code> is described in the
   <a target="_blank" href=
   "http://uima.apache.org/d/uimaj-2.7.0/references.html#ugr.ref.cas.updating_indexed_feature_structures">
   CAS Reference Chapter</a> and the CAS Javadocs; you can also use it with the JCas.</p>
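   <p>The remove, update, add-back sequence can be sketched with plain JDK collections. This is only an analogy
   under stated assumptions, not UIMA code: <code>SpanSketch</code> and <code>SortedIndexSketch</code> are
   hypothetical names, and a <code>TreeSet</code> stands in for a sorted index whose key is the
   <code>begin</code> field.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an indexed feature structure; not a UIMA class.
class SpanSketch {
    int begin;            // used as the sort key of the index
    final String label;

    SpanSketch(int begin, String label) {
        this.begin = begin;
        this.label = label;
    }
}

// Hypothetical sorted index keyed on SpanSketch.begin (ties broken by label).
class SortedIndexSketch {
    private final TreeSet<SpanSketch> index = new TreeSet<SpanSketch>(
        new Comparator<SpanSketch>() {
            @Override
            public int compare(SpanSketch a, SpanSketch b) {
                int c = Integer.compare(a.begin, b.begin);
                return c != 0 ? c : a.label.compareTo(b.label);
            }
        });

    void add(SpanSketch s) { index.add(s); }

    boolean contains(SpanSketch s) { return index.contains(s); }

    // Remove, update the key, add back: the entry is never in the
    // index while its key is changing, so the ordering stays valid.
    void updateBegin(SpanSketch s, int newBegin) {
        boolean wasIndexed = index.remove(s);
        s.begin = newBegin;
        if (wasIndexed) {
            index.add(s);
        }
    }

    // Labels in index order, for inspection.
    List<String> labelsInOrder() {
        List<String> out = new ArrayList<String>();
        for (SpanSketch s : index) {
            out.add(s.label);
        }
        return out;
    }
}
```

   <p>Assigning <code>begin</code> directly while the span is still in the <code>TreeSet</code> would leave it
   filed under its old key, which is the kind of corruption described above.</p>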

   <p>Because this protection is automatic and hidden, if you are iterating over sorted or set indexes, the
   automatic recovery may cause unexpected ConcurrentModificationExceptions to be thrown by the iterator when
   advancing. To work around this, either stop modifying features which are used as keys in the index being iterated over,
   or use Snapshot iterators (see following).</p>

 <h3>New Snapshot iterators won't throw ConcurrentModificationException</h3>
 <p>A long-standing difficulty with Feature Structure iterators is that adding to or removing from the underlying
   index is not allowed while iterating (unless you use a moveToXXX kind of operation to "reset" the
   iterator). This is addressed with a new class of "snapshot" iterators.
   These take a snapshot of the state of the index when the iterator is created; subsequent modifications to the index
   are then permitted, while the iterator continues to iterate over the snapshot it created. These iterators do not
   throw ConcurrentModificationException.  The feature is provided by a new method on FSIndex,
   <a target="_blank" href=
   "http://uima.apache.org/d/uimaj-2.7.0/apidocs/org/apache/uima/cas/FSIndex.html#withSnapshotIterators--">
   withSnapshotIterators()</a>, which
   creates a lightweight copy of the FSIndex instance whose iterator methods return snapshot iterators.
   This approach allows using the new index in Java's enhanced "for" statement.</p>

   <p>The current implementation of the snapshot iterators makes a snapshot of the index being iterated over, at
   creation time, which has a cost in space and time.</p>
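   <p>The idea can be sketched with JDK collections alone. The class below is a hypothetical analogy, not the
   FSIndex implementation: <code>SnapshotIndexSketch</code> is an invented name, and its
   <code>withSnapshotIterators()</code> method merely mirrors the name of the real method to show how a
   lightweight copy can change only the kind of iterator handed out.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an index; not a UIMA class.
class SnapshotIndexSketch<T> implements Iterable<T> {
    private final List<T> live;       // shared, modifiable backing data
    private final boolean snapshot;   // whether iterators copy the data first

    SnapshotIndexSketch(List<T> live) {
        this(live, false);
    }

    private SnapshotIndexSketch(List<T> live, boolean snapshot) {
        this.live = live;
        this.snapshot = snapshot;
    }

    // Lightweight copy: shares the backing list, changes only the iterator kind.
    SnapshotIndexSketch<T> withSnapshotIterators() {
        return new SnapshotIndexSketch<T>(live, true);
    }

    @Override
    public Iterator<T> iterator() {
        // The copy costs space and time proportional to the index size.
        return snapshot ? new ArrayList<T>(live).iterator() : live.iterator();
    }
}
```

   <p>Because the sketch implements <code>Iterable</code>, a loop such as
   <code>for (T item : index.withSnapshotIterators())</code> can add to or remove from the backing list without a
   ConcurrentModificationException, whereas the same modification during a loop over the plain index would
   typically trigger one.</p>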

 <h3>Other changes</h3>
 <p>Some of the other major changes are listed here; for the complete list, see <a href="issuesFixed/jira-report.html">
 the Issues Fixed report</a>.</p>
 <ul><li>The JCasGen Eclipse plugin now works with more varieties of class path specifications.
  Jira issues: UIMA-<a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4080">4080</a>/
  <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4081">4081</a></li>

  <li>moveTo(a_Feature_Structure) or creating a new iterator to start at a feature sometimes went to the wrong place.
  Jira issues: <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4094">UIMA-4094</a>
  and <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4105">UIMA-4105</a>.</li>

  <li>deserialization of deltaCAS when modifying existing indexed Feature Structures could corrupt the indexes.
  Jira issue: <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4100">UIMA-4100</a>.</li>

  <li>default bag indexes will now be created if there are only Set indexes.
  Jira issue: <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4111">UIMA-4111</a>.</li>

  <li><p>Xmi CAS Serialization now checks to see that list and array feature values marked as multipleReferencesAllowed=false
  (or not marked at all, which defaults to multipleReferencesAllowed=false) are not multiply-referenced.  If they are,
  they continue to be serialized as if they are independent objects (as was previously done), but now a new
  warning message is issued.  Because there can be a huge number of these messages, they are automatically
  throttled down, to prevent running out of room in the error logs.</p>
  <p>The message strings for these look like "Feature [some-feature-name] is marked multipleReferencesAllowed=false,
  but it has multiple references.  These will be serialized in duplicate."</p></li>

  <li>The CasCopier now checks to ensure that the range type of the target feature has the same name as the
  range type of the source feature, which catches errors when two different type systems are used for the source and
  target.  For example, this now prevents a feature with range type "uima.cas.Float" from being copied
  into one with range type "uima.cas.String".
  </li>
  </ul>

  <h3>Performance change highlights</h3>
  <ul>
  <li>Iterators obtained for Bag indexes and from the method getAllIndexedFS( ... type ... ) are,
  by definition, unordered.  The implementation of these iterators is now
  much faster: it takes advantage of the lack of required ordering and returns items in a more
  efficient sequence.
  Jira issue: <a target="_blank" href="https://issues.apache.org/jira/browse/UIMA-4166">UIMA-4166</a>.</li>
  <li>CasCopier has been reimplemented and is approximately 5-10 times faster.</li>
  </ul>

  <p>The complete list of fixes is <a href="issuesFixed/jira-report.html">here</a>.</p>

 <h2><a id="get.involved">How to Get Involved</a></h2>
 <p>
 The Apache UIMA project really needs and appreciates any contributions,
 including documentation help, source code and feedback.  If you are interested
 in contributing, please visit
 <a href="http://uima.apache.org/get-involved.html">
   http://uima.apache.org/get-involved.html</a>.
 </p>

 <h2><a id="report.issues">How to Report Issues</a></h2>
 <p>
 The Apache UIMA project uses JIRA for issue tracking.  Please report any
 issues you find at
 <a href="http://issues.apache.org/jira/browse/uima">http://issues.apache.org/jira/browse/uima</a>.
 </p>

 <h2><a id="list.issues">List of JIRA Issues Fixed in this Release</a></h2>
 <p>Click <a href="issuesFixed/jira-report.html">issuesFixed/jira-report.html</a> for the list of
 issues fixed in this release.</p>

 </body>
 </html>