#summary Describes nuances of Sofas, CASes and CAS Views (in development)
= Introduction =
There are some use cases for copying text and annotations between CASes and between views. Several annotators and classes from different projects help with these situations:
* UIMA has the CasCopier, which copies annotations between separate CASes, preserving view names.
* uimaFIT has the ViewTextCopierAnnotator, which copies text between views in the same CAS.
* Philipp Wetzler has work in progress on code that copies annotations, documented in uimaFIT issue 46 and ClearTK issue 175. The code is attached to the uimaFIT issue. It needs test code.
This page describes the problems and the tools. A glossary is included.
= Details =
==CASes, Views and Multi-View CASes==
In a simple UIMA annotator, a CAS is a structure that contains FeatureStructures and the Sofa (the Subject of Annotation), which is typically the document text.
A more involved annotator can work with a CAS that has more than one view. Such an annotator uses a multi-view CAS and is said to be sofa-aware. A multi-view CAS is a base CAS that contains views but has no Sofa, indexes or FeatureStructures of its own; each view has its own Sofa, indexes and FeatureStructures. The name sofa-aware comes from a member variable, mSofaAware, in PrimitiveAnalysisEngine_impl. An annotator written for single-view CASes can be passed a multi-view CAS: it will always access the _InitialView, because the surrounding AnalysisEngine picks that view out and passes it to the annotator (the AnalysisComponent). An annotator written for multi-view CASes, when passed a single-view CAS, will only have the _InitialView available.
An annotator's process(JCas) method is applicable to both kinds of CAS (single-view and multi-view).
An annotator is sofa-aware if it declares sofa capabilities (sofaCapabilities; see section 6.2.3 of the UTDG). A sofa-aware annotator must get a view to work on, since the top-level CAS has no Sofa or FeatureStructures. This is accomplished with jcas.getView("_InitialView"), or with the name of whichever view is needed. A sofa-unaware annotator is always passed the default view, _InitialView.
Any annotator, sofa-aware or not, can call jcas.getView("_InitialView").
Don't be lulled into a false sense of control when writing pipelines in uimaFIT. Just because you can pass a view to the process method of an annotator does not mean that view will be the one available inside the process method. When annotators are created in uimaFIT, as in UIMA, the annotator (an analysis component) is wrapped by an analysis engine. The engine's process method works to pass the right view to the component in case it is a sofa-unaware, single-view component. (personal experience)
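The dispatch rule described above can be illustrated with a small toy model. This is only a sketch of the idea, not UIMA code: ViewDispatchSketch, BaseCas, SofaAwareAnnotator, SofaUnawareAnnotator, runSofaAware and runSofaUnaware are hypothetical names invented for the illustration, and the view name "GoldView" is an assumption. A sofa-aware annotator is handed the base CAS and asks for views by name, while a sofa-unaware annotator is only ever handed the contents of _InitialView.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT the UIMA API: every class and method name here is hypothetical.
final class ViewDispatchSketch {

    static final String INITIAL_VIEW = "_InitialView";

    /** Stand-in for a base CAS: it holds named views but has no sofa of its own. */
    static final class BaseCas {
        private final Map<String, String> sofaByView = new LinkedHashMap<>();

        void createView(String viewName, String sofaText) {
            sofaByView.put(viewName, sofaText);
        }

        String getSofa(String viewName) {
            String sofa = sofaByView.get(viewName);
            if (sofa == null) {
                throw new IllegalArgumentException("No such view: " + viewName);
            }
            return sofa;
        }
    }

    /** A sofa-aware annotator receives the base CAS and picks views by name. */
    interface SofaAwareAnnotator {
        String process(BaseCas cas);
    }

    /** A sofa-unaware annotator only ever sees a single sofa. */
    interface SofaUnawareAnnotator {
        String process(String sofaText);
    }

    /** Stand-in for the wrapping engine when the annotator is sofa-aware. */
    static String runSofaAware(BaseCas cas, SofaAwareAnnotator annotator) {
        return annotator.process(cas);
    }

    /** Stand-in for the wrapping engine when the annotator is sofa-unaware:
     *  the engine, not the caller, selects _InitialView. */
    static String runSofaUnaware(BaseCas cas, SofaUnawareAnnotator annotator) {
        return annotator.process(cas.getSofa(INITIAL_VIEW));
    }

    public static void main(String[] args) {
        BaseCas cas = new BaseCas();
        cas.createView(INITIAL_VIEW, "raw text");
        cas.createView("GoldView", "gold text");

        // The sofa-unaware annotator cannot reach GoldView, whatever the caller intends.
        System.out.println(runSofaUnaware(cas, sofa -> sofa));
        // The sofa-aware annotator chooses its own view.
        System.out.println(runSofaAware(cas, c -> c.getSofa("GoldView")));
    }
}
```

In this model the caller has no way to hand a different view to the sofa-unaware annotator, which is the pitfall the paragraph above warns about.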
When copying annotations from one view to another, using clone() for example, make sure the sofa feature of each copy is changed to indicate the new view. (need details on exactly how to do this)
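The concern can be sketched with a toy model. This does not show the actual UIMA mechanism, which still needs to be documented here; ViewCopySketch, Ann, View, copyTo, addToIndex and copyAnnotations are hypothetical names, and the view names are assumptions. The point it illustrates is that a copy must be bound to the target view before it is indexed there, rather than keeping the source view's sofa reference.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the UIMA API: every name below is hypothetical.
final class ViewCopySketch {

    /** An annotation records which view (sofa) its offsets refer to. */
    static final class Ann {
        final String sofaView;
        final String type;
        final int begin;
        final int end;

        Ann(String sofaView, String type, int begin, int end) {
            this.sofaView = sofaView;
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Returns a copy bound to the target view instead of the original one. */
        Ann copyTo(String targetView) {
            return new Ann(targetView, type, begin, end);
        }
    }

    /** A view: one sofa text plus the annotations indexed against it. */
    static final class View {
        final String name;
        final String sofaText;
        final List<Ann> index = new ArrayList<>();

        View(String name, String sofaText) {
            this.name = name;
            this.sofaText = sofaText;
        }

        void addToIndex(Ann a) {
            // Reject annotations that still point at another view's sofa.
            if (!a.sofaView.equals(name)) {
                throw new IllegalArgumentException(
                    "Annotation belongs to " + a.sofaView + ", not " + name);
            }
            // Reject offsets that do not fit this view's text.
            if (a.begin < 0 || a.begin > a.end || a.end > sofaText.length()) {
                throw new IllegalArgumentException("Offsets outside sofa text");
            }
            index.add(a);
        }
    }

    /** Copies every annotation of the source view into the target view. */
    static void copyAnnotations(View source, View target) {
        for (Ann a : source.index) {
            target.addToIndex(a.copyTo(target.name));
        }
    }

    public static void main(String[] args) {
        View gold = new View("GoldView", "UIMA views");
        View system = new View("SystemView", "UIMA views");
        gold.addToIndex(new Ann("GoldView", "Token", 0, 4));

        copyAnnotations(gold, system);

        Ann copied = system.index.get(0);
        System.out.println(copied.sofaView + " "
            + system.sofaText.substring(copied.begin, copied.end));
    }
}
```

Adding the original annotation to the target view directly would fail the first check in addToIndex, which mirrors the mistake the note above is about. The offsets are only meaningful in the target view if its text lines up with the source text.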
==Related Tools==
* *CasCopier* use this to copy between distinct CASes
* *....ViewCreator* use this to safely create views
* *ViewTextCopierAnnotator* use this to copy text between views
* *foo* use this to copy annotations between views. (untested)
----
=Glossary=
* *CAS* Common Annotation/Analysis Structure, an array of FeatureStructures, various indexes, and the DocumentText (Ch. 1 UTDG)
* *base CAS* The top-level CAS that contains views. It differs from a simple CAS that contains no views in that the base CAS has no Sofa or indexes.
* *JCas* A Java interface to a CAS (Ch. 1 UTDG)
* *FeatureStructure* CAS Feature Structure type, types of annotations described in a TypeSystem.
* *Feature* An attribute of a FeatureStructure (?)
* *Sofa* Subject of Annotation. The Sofa is the text under consideration. The view is the Sofa plus its related annotations. The two terms are often used interchangeably.
* *multi-view*, *multiple CAS views* UIMA supports a kind of CAS that has more than one view. Each view has its own Sofa and related FeatureStructures. (Ch. 6 "Multiple CAS Views of an Artifact" in the UTDG)
* *sofa-aware* A sofa-aware annotator can work with different views. A sofa-unaware annotator will always work on the default view, _InitialView. The name comes from a member variable, mSofaAware, in PrimitiveAnalysisEngine_impl in UIMA.
* *subview*
UTDG: UIMA Tutorial and Developer's Guide