<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" | |
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ | |
<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent"> | |
%uimaents; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<chapter id="ugr.tug.mvs"> | |
<title>Multiple CAS Views of an Artifact</title> | |
<titleabbrev>Multiple CAS Views</titleabbrev> | |
<para>UIMA provides an extension to the basic model of the CAS which supports analysis of | |
multiple views of the same artifact, all contained with the CAS. This chapter describes | |
the concepts, terminology, and the API and XML extensions that enable this.</para> | |
<para>Multiple CAS Views can simplify things when different versions of the artifact are | |
needed at different stages of the analysis. They are also key to enabling multimodal | |
analysis where the initial artifact is transformed from one modality to another, or where | |
the artifact itself is multimodal, such as the audio, video and closed-captioned text | |
associated with an MPEG object. Each representation of the artifact can be analyzed | |
independently with the standard UIMA programming model; in addition, multi-view | |
components and applications can be constructed.</para> | |
<para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects, | |
one for each view, where these objects share most of the same underlying CAS, except for two | |
things: each view has its own set of indexed Feature Structures, and each view has its own | |
subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature | |
Structure instances themselves are in the shared part of the CAS; only the entries in the | |
indexes are unique for each CAS view.</para> | |
<para>All of these CAS view objects are kept together with the CAS, and passed as a unit | |
between components in a UIMA application. APIs exist which allow components and | |
applications to switch among the various view objects, as needed.</para> | |
<para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS | |
Views facilitate adding or removing Feature Structures to or from their index | |
repositories:</para> | |
<programlisting>aView.addFsToIndexes(aFeatureStructure) | |
aView.removeFsFromIndexes(aFeatureStructure)</programlisting> | |
<para>specify the view in which this Feature Structure should be added to or removed from the | |
indexes.</para> | |
<section id="ugr.tug.mvs.cas_views_and_sofas"> | |
<title>CAS Views and Sofas</title> | |
<para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;" | |
targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation, | |
every CAS view has one associated Sofa, and every Sofa has one associated CAS | |
View.</para> | |
<section id="ugr.tug.mvs.naming_views_sofas"> | |
<title>Naming CAS Views and Sofas</title> | |
<para>The developer assigns a name to the View / Sofa, which is a simple string | |
(following the rules for Java identifiers, usually without periods, but see special | |
exception below). These names are declared in the component XML metadata, and are | |
used during assembly and by the runtime to enable switching among multiple Views of | |
the CAS at the same time.</para> | |
<note><para>The name is called the Sofa name, for historical reasons, but it applies | |
equally to the View. In the rest of this chapter, we'll refer to it as the Sofa | |
name.</para></note> | |
<para>Some applications contain components that expect a variable number of Sofas as | |
input or output. An example of a component that takes a variable number of input Sofas | |
could be one that takes several translations of a document and merges them, where each | |
translation was in a separate Sofa. </para> | |
<para> You can specify a variable number of input or output sofa names, where each name | |
has the same base part, by writing the base part of the name (with no periods), followed | |
by a period character and an asterisk character (.*). These denote sofas that have | |
names matching the base part up to the period; for example, names such as | |
<literal>base_name_part.TTX_3d</literal> would match a specification of | |
<literal>base_name_part.*</literal>.</para> | |
</section> | |
<section id="ugr.tug.mvs.multi_view_and_single_view"> | |
<title>Multi-View, Single-View components & applications</title> | |
<titleabbrev>Multi/Single View parts in Applications</titleabbrev> | |
<para>Components and applications can be written to be Multi-View or Single-View. | |
Most components used as primitive building blocks are expected to be Single-View. | |
UIMA provides capabilities to combine these kinds of components with Multi-View | |
components when assembling analysis aggregates or applications.</para> | |
<para>Single-View components and applications use only one subject of analysis, and | |
one CAS View. The code and descriptors for these components do not use the facilities | |
described in this chapter.</para> | |
<para>Conversely, Multi-View components and applications are aware of the | |
possibility of multiple Views and Sofas, and have code and XML descriptors that | |
create and manipulate them.</para> | |
</section> | |
</section> | |
<section id="ugr.tug.mvs.multi_view_components"> | |
<title>Multi-View Components</title> | |
<section id="ugr.tug.mvs.deciding_multi_view"> | |
<title>How UIMA decides if a component is Multi-View</title> | |
<titleabbrev>Deciding: Multi-View</titleabbrev> | |
<para>Every UIMA component has an associated XML Component Descriptor. Multi-View | |
components are identified simply as those whose descriptors declare one or more Sofa | |
names in their Capability sections, as inputs or outputs. If a Component Descriptor | |
does not mention any input or output Sofa names, the framework treats that component | |
as a Single-View component.</para> | |
</section> | |
<section id="ugr.tug.mvs.additional_capabilities"> | |
<title>Multi-View: additional capabilities</title> | |
<para>Additional capabilities provided for components and applications aware of the | |
possibilities of multiple Views and Sofas include:</para> | |
<itemizedlist spacing="compact"><listitem><para>Creating new Views, and for | |
each, setting up the associated Sofa data</para></listitem> | |
<listitem><para>Getting a reference to an existing View and its associated Sofa, by | |
name </para></listitem> | |
<listitem><para>Specifying a view in which to index a particular Feature Structure | |
instance </para></listitem></itemizedlist> | |
</section> | |
<section id="ugr.tug.mvs.component_xml_metadata"> | |
<title>Component XML metadata</title> | |
<para>Each Multi-View component that creates a Sofa or wants to switch to a specific | |
previously created Sofa must declare the name for the Sofa in the capabilities | |
section. For example, a component expecting as input a web document in html format and | |
creating a plain text document for further processing might declare:</para> | |
<programlisting><capabilities> | |
<capability> | |
<inputs/> | |
<outputs/> | |
<inputSofas> | |
<emphasis role="bold"> <sofaName>rawContent</sofaName></emphasis> | |
</inputSofas> | |
<outputSofas> | |
<emphasis role="bold"> <sofaName>detagContent</sofaName></emphasis> | |
</outputSofas> | |
</capability> | |
</capabilities></programlisting> | |
<para>Details on this specification are found in <olink | |
targetdoc="&uima_docs_ref;"/> <olink | |
targetdoc="&uima_docs_ref;" | |
targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor | |
Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;" | |
targetptr="ugr.tools.cde.capabilities">Capabilites Page</olink>.</para> | |
</section> | |
</section> | |
<section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps"> | |
<title>Sofa Capabilities and APIs for Applications</title> | |
<titleabbrev>Sofa Capabilities & APIs for Apps</titleabbrev> | |
<para>In addition to components, applications can make use of these capabilities. When | |
an application creates a new CAS, it also creates the initial view of that CAS - and this | |
view is the object that is returned from the create call. Additional views beyond this | |
first one can be dynamically created at any time. The application can use the Sofa APIs | |
described in <olink targetdoc="&uima_docs_tutorial_guides;" | |
targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para> | |
<para>If an Application creates a new CAS, the initial CAS that is created will be a view | |
named <quote>_InitialView</quote>. This name can be used in the application and in | |
Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para> | |
</section> | |
<section id="ugr.tug.mvs.sofa_name_mapping"> | |
<title>Sofa Name Mapping</title> | |
<para>Sofa Name mapping is the mechanism which enables UIMA component developers to | |
choose locally meaningful Sofa names in their source code and let aggregate, | |
collection processing engine developers, and application developers connect output | |
Sofas created in one component to input Sofas required in another.</para> | |
<para>At a given aggregation level, the assembler or application developer defines | |
names for all the Sofas, and then specifies how these names map to the contained | |
components, using the Sofa Map.</para> | |
<para>Consider annotator code to create a new CAS view:</para> | |
<programlisting>CAS viewX = cas.createView("X");</programlisting> | |
<para>Or code to get an existing CAS view:</para> | |
<programlisting>CAS viewX = cas.getView("X");</programlisting> | |
<para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>. | |
However, if a name mapping for <quote>X</quote> has been specified by the aggregate or | |
CPE calling this annotator, the actual SofaID in the CAS can be different.</para> | |
<para>All Sofas in a CAS must have unique names. This is accomplished by mapping all | |
declared Sofas as described in the following sections. An attempt to create a Sofa with a | |
SofaID already in use will throw an exception.</para> | |
<para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa | |
mapping maps names up to the <quote>.</quote> and appends the period and the following | |
characters to the mapped name.</para> | |
<para>To get a Java Iterator for all the views in a CAS:</para> | |
<programlisting>Iterator allViews = cas.getViewIterator();</programlisting> | |
<para>To get a Java Iterator for selected views in a CAS, for example, views whose name | |
is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix | |
can be any String:</para> | |
<programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting> | |
<note><para>Sofa name mapping is applied to namePrefix.</para></note> | |
<para>Sofa name mappings are not currently supported for remote Analysis Engines. | |
See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para> | |
<section id="ugr.tug.mvs.name_mapping_aggregate"> | |
<title>Name Mapping in an Aggregate Descriptor</title> | |
<para>For each component of an Aggregate, name mapping specifies the conversion | |
between component Sofa names and names at the aggregate level.</para> | |
<para>Here's an example. Consider two Multi-View annotators to be assembled | |
into an aggregate which takes an audio segment consisting of spoken English and | |
produces a German text translation.</para> | |
<para>The first annotator takes an audio segment as input Sofa and produces a text | |
transcript as output Sofa. The annotator designer might choose these Sofa names to be | |
<quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para> | |
<para>The second annotator is designed to translate text from English to German. This | |
developer might choose the input and output Sofa names to be | |
<quote>EnglishDocument</quote> and <quote>GermanDocument</quote>, | |
respectively.</para> | |
<para>In order to hook these two annotators together, the following section would be | |
added to the top level of the aggregate descriptor:</para> | |
<programlisting><![CDATA[<sofaMappings> | |
<sofaMapping> | |
<componentKey>SpeechToText</componentKey> | |
<componentSofaName>AudioInput</componentSofaName> | |
<aggregateSofaName>SegementedAudio</aggregateSofaName> | |
</sofaMapping> | |
<sofaMapping> | |
<componentKey>SpeechToText</componentKey> | |
<componentSofaName>TranscribedText</componentSofaName> | |
<aggregateSofaName>EnglishTranscript</aggregateSofaName> | |
</sofaMapping> | |
<sofaMapping> | |
<componentKey>EnglishToGermanTranslator</componentKey> | |
<componentSofaName>EnglishDocument</componentSofaName> | |
<aggregateSofaName>EnglishTranscript</aggregateSofaName> | |
</sofaMapping> | |
<sofaMapping> | |
<componentKey>EnglishToGermanTranslator</componentKey> | |
<componentSofaName>GermanDocument</componentSofaName> | |
<aggregateSofaName>GermanTranslation</aggregateSofaName> | |
</sofaMapping> | |
</sofaMappings>]]></programlisting> | |
<para>The Component Descriptor Editor supports Sofa name mapping in aggregates and | |
simplifies the task. See <olink targetdoc="&uima_docs_tools;"/> | |
<olink targetdoc="&uima_docs_tools;" | |
targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para> | |
</section> | |
<section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE | |
Descriptor</title> | |
<para>The CPE descriptor aggregates together a Collection Reader and CAS Processors | |
(Annotators and CAS Consumers). Sofa mappings can be added to the following elements | |
of CPE descriptors: <literal><collectionIterator></literal>, | |
<literal><casInitializer></literal> and the | |
<literal><casProcessor></literal>. To be consistent with the | |
organization of CPE descriptors, the maps for the CPE descriptor are distributed | |
among the XML markup for each of the parts (collectionIterator, casInitializer, | |
casProcessor). Because of this the<literal> | |
<componentKey></literal> element is not needed. Finally, rather than | |
sub-elements for the parts, the XML markup for these uses attributes. See <olink | |
targetdoc="&uima_docs_ref;"/> <olink | |
targetdoc="&uima_docs_ref;" | |
targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para> | |
<para>Here's an example. Let's use the aggregate from the previous section | |
in a collection processing engine. Here we will add a Collection Reader that outputs | |
audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to | |
declare an output Sofa nextSegment in the collection reader description. | |
We'll add a CAS Consumer in the next section.</para> | |
<programlisting><collectionReader> | |
<collectionIterator> | |
<descriptor> | |
. . . | |
</descriptor> | |
<configurationParameterSettings>...</configurationParameterSettings> | |
<emphasis role="bold"> <sofaNameMappings> | |
<sofaNameMapping componentSofaName="nextSegment" | |
cpeSofaName="SegementedAudio"/> | |
</sofaNameMappings> | |
</emphasis> </collectionIterator> | |
<casInitializer/> | |
<collectionReader></programlisting> | |
<para>At this point the CAS Processor section for the aggregate does not need any Sofa | |
mapping because the aggregate input Sofa has the same name, | |
<quote>SegementedAudio</quote>, as is being produced by the Collection | |
Reader.</para> | |
</section> | |
<section id="ugr.tug.mvs.specifying_cas_view_for_process"> | |
<title>Specifying the CAS View delivered to a Components Process Method</title> | |
<titleabbrev>CAS View received by Process</titleabbrev> | |
<para>All components receive a Sofa named <quote>_InitialView</quote>, or | |
a Sofa that is mapped to this name.</para> | |
<para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View | |
component that expects the analysis results associated with the input CAS, and that | |
we want it to use the results from the translated German text Sofa. The following | |
mapping added to the CAS Processor section for the CPE will instruct the CPE to get the | |
CAS view for the German text Sofa and pass it to the CAS Consumer:</para> | |
<programlisting><casProcessor> | |
. . . | |
<emphasis role="bold"><sofaNameMappings> | |
<sofaNameMapping componentSofaName="_InitialView" | |
cpeSofaName="GermanTranslation"/> | |
<sofaNameMappings> | |
</emphasis></casProcessor></programlisting> | |
<para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for | |
this kind of mapping is to simply leave out the component sofa name in this | |
case.</para> | |
</section> | |
<section id="ugr.tug.mvs.name_mapping_application"> | |
<title>Name Mapping in a UIMA Application</title> | |
<para>Applications which instantiate UIMA components directly using the | |
UIMAFramework methods can also create a top level Sofa mapping using the | |
<quote>additional parameters</quote> capability.</para> | |
<programlisting>//create a "root" UIMA context for your whole application | |
UimaContextAdmin rootContext = | |
UIMAFramework.newUimaContext(UIMAFramework.getLogger(), | |
UIMAFramework.newDefaultResourceManager(), | |
UIMAFramework.newConfigurationManager()); | |
input = new XMLInputSource("test.xml"); | |
desc = UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input); | |
//setup sofa name mappings using the api | |
HashMap sofamappings = new HashMap(); | |
sofamappings.put("localName1", "globalName1"); | |
sofamappings.put("localName2", "globalName2"); | |
//create a UIMA Context for the new AE we are about to create | |
//first argument is unique key among all AEs used in the application | |
UimaContextAdmin childContext = rootContext.createChild("myAE", sofamap); | |
//instantiate AE, passing the UIMA Context through the additional | |
//parameters map | |
Map additionalParams = new HashMap(); | |
additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext); | |
AnalysisEngine ae = | |
UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting> | |
<para>Sofa mappings are applied from the inside out, i.e., local to global. First, any | |
aggregate mappings are applied, then any CPE mappings, and finally, any specified | |
using this <quote>additional parameters</quote> capability.</para> | |
</section> | |
<section id="ugr.tug.mvs.name_mapping_remote_services"> | |
<title>Name Mapping for Remote Services</title> | |
<para>Currently, no client-side Sofa mapping information is passed from a UIMA client | |
to a remote service. This can cause complications for UIMA services in a Multi-View | |
application.</para> | |
<para>Remote Multi-View services will work only if the service is Single-View, or if the | |
Sofa names expected by the service exactly match the Sofa names produced by the client.</para> | |
<para>If your application requires Sofa mappings for a remote Analysis Engine, you | |
can wrap your remotely deployed AE in an aggregate (on the remote side), and specify | |
the necessary Sofa mappings in the descriptor for that aggregate.</para> | |
</section> | |
</section> | |
<section id="ugr.tug.mvs.jcas_extensions_for_multi_views"> | |
<title>JCas extensions for Multiple Views</title> | |
<para>The JCas interface to the CAS can be used with any / all views. | |
You can always get a JCas object from an existing CAS | |
object by using the method getJCas(); this call will create the JCas if it doesn't | |
already exist. If it does exist, it just returns the existing JCas that corresponds to | |
the CAS.</para> | |
<para>JCas implements the getView(...) method, enabling switching to other named | |
views, just like the corresponding method on the CAS. The JCas version, however, | |
returns JCas objects, instead of CAS objects, corresponding to the view.</para> | |
</section> | |
<section id="ugr.tug.mvs.sample_application"> | |
<title>Sample Multi-View Application</title> | |
<para>The UIMA SDK contains a simple Sofa example application which demonstrates many | |
Sofa specific concepts and methods. The source code for the application driver is in | |
<literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal> | |
and the Multi-View annotator is given in | |
<literal>SofaExampleAnnotator.java</literal> in the same directory.</para> | |
<para>This sample application demonstrates a language translator annotator which | |
expects an input text Sofa with an English document and creates an output text Sofa | |
containing a German translation. Some of the key Sofa concepts illustrated here | |
include:</para> | |
<itemizedlist spacing="compact"><listitem><para>Sofa creation.</para> | |
</listitem> | |
<listitem><para>Access of multiple CAS views.</para></listitem> | |
<listitem><para>Unique feature structure index space for each view.</para> | |
</listitem> | |
<listitem><para>Feature structures containing cross references between | |
annotations in different CAS views.</para></listitem> | |
<listitem><para>The strong affinity of annotations with a specific Sofa. </para> | |
</listitem></itemizedlist> | |
<section id="ugr.tug.mvs.sample_application.descriptor"> | |
<title>Annotator Descriptor</title> | |
<para>The annotator descriptor in | |
<literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal> | |
declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa | |
named <quote>GermanDocument</quote>. A custom type | |
<quote>CrossAnnotation</quote> is also defined:</para> | |
<programlisting><![CDATA[<typeDescription> | |
<name>sofa.test.CrossAnnotation</name> | |
<description/> | |
<supertypeName>uima.tcas.Annotation</supertypeName> | |
<features> | |
<featureDescription> | |
<name>otherAnnotation</name> | |
<description/> | |
<rangeTypeName>uima.tcas.Annotation</rangeTypeName> | |
</featureDescription> | |
</features> | |
</typeDescription>]]></programlisting> | |
<para>The <literal>CrossAnnotation</literal> type is derived from | |
<literal>uima.tcas.Annotation </literal>and includes one new feature: a | |
reference to another annotation.</para> | |
</section> | |
<section id="ugr.tug.mvs.sample_application.setup"> | |
<title>Application Setup</title> | |
<para>The application driver instantiates an analysis engine, | |
<literal>seAnnotator</literal>, from the annotator descriptor, obtains a new | |
CAS using that engine's CAS definition, and creates the expected input | |
Sofa using:</para> | |
<programlisting>CAS cas = seAnnotator.newCAS(); | |
CAS aView = cas.createView("EnglishDocument");</programlisting> | |
<para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa | |
mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>. | |
Local Sofa data is set using:</para> | |
<programlisting>aView.setDocumentText("this beer is good");</programlisting> | |
<para>At this point the CAS contains all necessary inputs for the translation | |
annotator and its process method is called.</para> | |
</section> | |
<section id="ugr.tug.mvs.sample_application.annotator_processing"> | |
<title>Annotator Processing</title> | |
<para>Annotator processing consists of parsing the English document into individual | |
words, doing word-by-word translation and concatenating the translations into a | |
German translation. Analysis metadata on the English Sofa will be an annotation for | |
each English word. Analysis metadata on the German Sofa will be a | |
<literal>CrossAnnotation</literal> for each German word, where the | |
<literal>otherAnnotation</literal> feature will be a reference to the associated | |
English annotation.</para> | |
<para>Code of interest includes two CAS views:</para> | |
<programlisting>// get View of the English text Sofa | |
englishView = aCas.getView("EnglishDocument"); | |
// Create the output German text Sofa | |
germanView = aCas.createView("GermanDocument");</programlisting> | |
<para>the indexing of annotations with the appropriate view:</para> | |
<programlisting>englishView.addFsToIndexes(engAnnot); | |
. . . | |
germanView.addFsToIndexes(germAnnot);</programlisting> | |
<para>and the combining of metadata belonging to different Sofas in the same feature | |
structure:</para> | |
<programlisting>// add link to English text | |
germAnnot.setFeatureValue(other, engAnnot);</programlisting> | |
</section> | |
<section id="ugr.tug.mvs.sample_application.accessing_results"> | |
<title>Accessing the results of analysis</title> | |
<para>The application needs to get the results of analysis, which may be in different | |
views. Analysis results for each Sofa are dumped independently by iterating over all | |
annotations for each associated CAS view. For the English Sofa:</para> | |
<programlisting>//get annotation iterator for this CAS | |
FSIndex anIndex = aView.getAnnotationIndex(); | |
FSIterator anIter = anIndex.iterator(); | |
while (anIter.isValid()) { | |
AnnotationFS annot = (AnnotationFS) anIter.get(); | |
System.out.println(" " + annot.getType().getName() | |
+ ": " + annot.getCoveredText()); | |
anIter.moveToNext(); | |
}</programlisting> | |
<para>Iterating over all German annotations looks the same, except for the | |
following:</para> | |
<programlisting>if (annot.getType() == cross) { | |
AnnotationFS crossAnnot = | |
(AnnotationFS) annot.getFeatureValue(other); | |
System.out.println(" other annotation feature: " | |
+ crossAnnot.getCoveredText()); | |
}</programlisting> | |
<para>Of particular interest here is the built-in Annotation type method | |
<literal>getCoveredText()</literal>. This method uses the | |
<quote>begin</quote> and <quote>end</quote> features of the annotation to create | |
a substring from the CAS document. The SofaRef feature of the annotation is used to | |
identify the correct Sofa's data from which to create the substring.</para> | |
<para>The example program output is:</para> | |
<programlisting>---Printing all annotations for English Sofa--- | |
uima.tcas.DocumentAnnotation: this beer is good | |
uima.tcas.Annotation: this | |
uima.tcas.Annotation: beer | |
uima.tcas.Annotation: is | |
uima.tcas.Annotation: good | |
---Printing all annotations for German Sofa--- | |
uima.tcas.DocumentAnnotation: das bier ist gut | |
sofa.test.CrossAnnotation: das | |
other annotation feature: this | |
sofa.test.CrossAnnotation: bier | |
other annotation feature: beer | |
sofa.test.CrossAnnotation: ist | |
other annotation feature: is | |
sofa.test.CrossAnnotation: gut | |
other annotation feature: good</programlisting> | |
</section> | |
</section> | |
<section id="ugr.tug.mvs.views_api_summary"> | |
<title>Views API Summary</title> | |
<para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use by Sofa-mapping in | |
the CPE and/or aggregate descriptors.</para> | |
<para>For <emphasis role="bold-italic">Multi-View </emphasis> components or | |
applications, the following methods are used to create or get a reference to a CAS view | |
for a particular Sofa:</para> | |
<para>Creating a new View:</para> | |
<programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping); | |
CAS newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting> | |
<para>Getting a View from a CAS or JCas:</para> | |
<programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping); | |
CAS myView = aCAS .getView(String localNameOfTheViewBeforeMapping); | |
Iterator allViews = aCasOrJCas.getViewIterator(); | |
Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting> | |
<para>The following methods are useful for all annotators and applications:</para> | |
<para>Setting Sofa data for a CAS or JCas:</para> | |
<programlisting>aCasOrJCas.setDocumentText(String docText); | |
aCasOrJCas.setSofaDataString(String docText, String mimeType); | |
aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType); | |
aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting> | |
<para>Getting Sofa data for a particular CAS or JCas:</para> | |
<programlisting>String doc = aCasOrJCas.getDocumentText(); | |
String doc = aCasOrJCas.getSofaDataString(); | |
FeatureStructure array = aCasOrJCas.getSofaDataArray(); | |
String uri = aCasOrJCas.getSofaDataURI(); | |
InputStream is = aCasOrJCas.getSofaDataStream();</programlisting> | |
</section> | |
<section id="ugr.tug.mvs.sofa_incompatibilities_v1_v2"> | |
<title>Sofa Incompatibilities between UIMA version 1 and version 2</title> | |
<titleabbrev>Sofa Incompatibilities: V1 and V2</titleabbrev> | |
<para>A major change in version 2 is related to the support of Single-View components | |
and applications. Given an analysis engine, <literal>ae</literal>, the API | |
<programlisting>CAS cas = ae.newCas();</programlisting> | |
used to return the base CAS. Now it returns a view of the Sofa named | |
<quote>_InitialView</quote>. This Sofa will actually only be created if any Sofa data | |
is set for this view. The initial view is used for Single-View applications and | |
Multi-View annotators with no Sofa mapping.</para> | |
<para>The process method of Multi-View annotators receive the base CAS, however the base | |
CAS no longer has an index repository to hold <quote>global</quote> data. Global data | |
needs to be put in a specific named CAS view of your choice.</para> | |
<note>New behavior as of version 2.7.0 | |
<para>A Multi-View component is no longer passed the base CAS view. | |
See <xref linkend="ugr.tug.mvs.specifying_cas_view_for_process"/>.</para> | |
</note> | |
<para>Because of these changes, the following scenarios will break with v2.0 clients: | |
<itemizedlist spacing="compact"><listitem><para>Any version 1.x services (you | |
must migrate the services to version 2).</para></listitem> | |
<listitem><para>Applications or components explicitly referencing | |
<quote>_DefaultTextSofaName</quote> in code or descriptors.</para> | |
</listitem> | |
<listitem><para>Multi-View applications using the Base CAS index repository. | |
</para></listitem></itemizedlist></para> | |
</section> | |
</chapter> |