 Contents
 - Introduction
 - Negation Annotator
 	- NegationAnnotator.xml
 	- updating negex patterns
 - Status Annotator

 ############
 Introduction
 ############

 The context annotator provides a mechanism for examining the context of existing annotations, finding
 events of interest in the context, and acting on those events in some way.  The negation and status
 annotators both take advantage of this infrastructure by examining the context of named entities
 (e.g. disorders and findings) to see if they should be considered as negated (e.g. "no chest pain")
 or if their status should be modified (e.g. "myocardial infarction" should have status "history of").

 In fact, the "negation annotator" is really just the context annotator configured to deal with negations.

 Similarly, the "status annotator" is the context annotator configured to identify the status of named entities.

 To better understand the context annotator code, start by reading the Javadoc for the class
 org.apache.ctakes.necontexts.ContextAnnotator (ContextAnnotator.java). It gives a good conceptual overview of how the code works.
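 The "same annotator, different configuration" idea can be pictured with a minimal sketch. It assumes simplified,
 hypothetical types (Entity, Analyzer, HitConsumer, ContextPipelineSketch) that are NOT the real cTAKES classes or
 signatures, and toy trigger words chosen only for illustration. One generic driver runs an analyzer over a focus
 annotation's context and hands any hit to a consumer; plugging in a different analyzer/consumer pair turns the same
 driver into a "negation" or a "status" annotator. The -1 certainty value follows the NegationContextHitConsumer
 behavior described in the Negation Annotator section.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the cTAKES ContextAnnotator API.
final class Entity {
    final String text;
    int certainty = 0;     // placeholder default; the negation consumer sets -1
    String status = null;  // e.g. "history of"
    Entity(String text) { this.text = text; }
}

// Examines a context (a list of tokens) and returns a hit, or null if there is none.
interface Analyzer {
    String analyze(List<String> context);
}

// Acts on a hit for the focus annotation it was found for.
interface HitConsumer {
    void consume(Entity focus, String hit);
}

final class ContextPipelineSketch {
    // The generic driver: it knows nothing about negation or status.
    static void process(Entity focus, List<String> context,
                        Analyzer analyzer, HitConsumer consumer) {
        String hit = analyzer.analyze(context);
        if (hit != null) {
            consumer.consume(focus, hit);
        }
    }

    // "Negation annotator" configuration (toy trigger words, hypothetical).
    static final Analyzer NEGATION_ANALYZER =
        ctx -> (ctx.contains("no") || ctx.contains("denies")) ? "negation" : null;
    static final HitConsumer NEGATION_CONSUMER = (focus, hit) -> focus.certainty = -1;

    // "Status annotator" configuration (toy trigger word, hypothetical).
    static final Analyzer STATUS_ANALYZER =
        ctx -> ctx.contains("history") ? "history of" : null;
    static final HitConsumer STATUS_CONSUMER = (focus, hit) -> focus.status = hit;

    public static void main(String[] args) {
        Entity pain = new Entity("chest pain");
        process(pain, List.of("patient", "denies"), NEGATION_ANALYZER, NEGATION_CONSUMER);
        System.out.println(pain.certainty); // -1

        Entity mi = new Entity("myocardial infarction");
        process(mi, List.of("history", "of"), STATUS_ANALYZER, STATUS_CONSUMER);
        System.out.println(mi.status); // history of
    }
}
```

 Keeping the driver ignorant of what a "hit" means is what lets one engine serve both tasks; only the two plugged-in
 pieces change.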

 ##################
 Negation Annotator
 ##################
 What follows is an explanation of how negation is performed using the context annotator.

 The negation detection annotator is a pattern-based approach (no maxent models are required or used) that uses
 finite state machines and is roughly based on the popular NegEx algorithm introduced by Wendy Chapman.

 %%%%%%%%%%%%%%%%%%%%%
 NegationAnnotator.xml

 We will start by examining the descriptor file desc/NegationAnnotator.xml.  Open this file with the Component
 Descriptor Editor.  Select the first tab labeled "Overview" and observe that the analysis engine that is
 specified is org.apache.ctakes.necontexts.ContextAnnotator.  There is no "negation annotator" analysis
 engine - we simply configure the ContextAnnotator for the task.  Next select the tab labeled "Parameter Settings".

 We will discuss each setting in turn:

  - MaxLeftScopeSize = 10
    At most ten annotations (here, tokens) to the left of the named entity make up the left-hand context.
    Increase or decrease this value to widen or narrow the left-hand context.
  - MaxRightScopeSize = 10
    At most ten annotations (here, tokens) to the right of the named entity make up the right-hand context.
    Increase or decrease this value to widen or narrow the right-hand context.
  - ScopeOrder = LEFT, RIGHT
    The context annotator will look for signs of negation on the left-hand side of a named entity first and then
    the right-hand side.
  - ContextAnalyzerClass = org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
    The context analyzer examines the context (e.g. up to 10 tokens to the left or right of the named entity) and
    determines whether the named entity should be negated. If so, the negation context analyzer
    generates a context hit to be consumed by the context hit consumer (see below).
  - ContextHitConsumerClass = org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
    The context hit consumer handles context hits generated by the context analyzer. In this case, the negation
    context hit consumer simply sets the certainty of the named entity to -1, which indicates that it has been negated.
  - WindowAnnotationClass = edu.mayo.bmi.uima.common.type.Sentence
    When the context annotator collects the context annotations for a named entity, it will not look beyond the
    boundaries of the sentence containing the named entity.
  - FocusAnnotationClass = edu.mayo.bmi.uima.common.type.NamedEntity
    The negation annotator is concerned with negating named entities and thus the focus annotation type
    (the annotations for which a context is generated and examined) specifies named entities.
  - ContextAnnotationClass = edu.mayo.bmi.uima.common.type.BaseToken
    The context of the named entities is a list of tokens - this is what the context analyzer is going to examine.
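 The scope-related settings above (MaxLeftScopeSize, MaxRightScopeSize, ScopeOrder, and the Sentence window) can be
 pictured with a small sketch. It assumes the window has already been split into tokens and that the named entity's
 token span is known as a half-open range [begin, end); the class and method names are hypothetical and this is not
 how ContextAnnotator is actually implemented. The left scope is the up-to-MaxLeftScopeSize tokens immediately before
 the entity, the right scope the up-to-MaxRightScopeSize tokens immediately after it, and neither extends past the
 sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of scope collection; NOT the cTAKES implementation.
final class ScopeSketch {
    enum Scope { LEFT, RIGHT }

    // Up to maxLeft tokens immediately before the entity, clipped at the sentence start.
    static List<String> leftScope(List<String> sentence, int begin, int maxLeft) {
        int from = Math.max(0, begin - maxLeft);
        return new ArrayList<>(sentence.subList(from, begin));
    }

    // Up to maxRight tokens immediately after the entity, clipped at the sentence end.
    static List<String> rightScope(List<String> sentence, int end, int maxRight) {
        int to = Math.min(sentence.size(), end + maxRight);
        return new ArrayList<>(sentence.subList(end, to));
    }

    // Builds one context per scope, in the configured order (e.g. LEFT, RIGHT).
    static List<List<String>> contexts(List<String> sentence, int begin, int end,
                                       int maxLeft, int maxRight, List<Scope> order) {
        List<List<String>> result = new ArrayList<>();
        for (Scope scope : order) {
            result.add(scope == Scope.LEFT
                    ? leftScope(sentence, begin, maxLeft)
                    : rightScope(sentence, end, maxRight));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sentence window; the named entity "chest pain" covers tokens [4, 6).
        List<String> sentence =
            List.of("the", "patient", "has", "no", "chest", "pain", "today");
        System.out.println(contexts(sentence, 4, 6, 2, 10,
                List.of(Scope.LEFT, Scope.RIGHT)));
        // prints [[has, no], [today]]
    }
}
```

 With a left scope size of 2, "no" still falls inside the left-hand context; shrinking it further would move the
 negation cue out of view, which is the trade-off the scope parameters control.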

 So, negating a named entity involves two steps:
  1) the NegationContextAnalyzer finds negations, and
  2) the NegationContextHitConsumer updates the certainty of each negated NamedEntity.

 The NegationContextAnalyzer is a lightweight wrapper around another class, org.apache.ctakes.core.fsm.machine.NegationFSM,
 which contains all of the negation pattern-finding logic. If you want to change the pattern matching used for negation
 detection, that is the class to modify.

 %%%%%%%%%%%%%%%%%%%%%%%
 updating negex patterns

 Updating the negation detection patterns will involve either 1) trial-and-error experimentation or 2) understanding how
 the NegationFSM works. The rules, patterns, and words that identify negation are hard-coded into the class
 org.apache.ctakes.core.fsm.machine.NegationFSM, which is found in the core project. I suggest starting with the
 trial-and-error approach. For example, if you wanted to add "impossible" to the lexicon of negation words,
 you could try adding it to _negAdjectivesSet and then test the behavior.
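 To make the trial-and-error approach concrete, here is a deliberately tiny, hypothetical finite-state matcher in the
 spirit of (but much simpler than) NegationFSM. The word sets, the state names, and the patterns it recognizes (a
 negation word such as "no", or a form of "to be" followed by a negation adjective, e.g. "is unlikely") are
 assumptions for illustration; the real class uses its own word sets and machines. The sketch's negAdjectives set
 plays the role of _negAdjectivesSet: adding "impossible" to it and re-running the same context shows how a lexicon
 change alters the result.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy FSM; NOT the actual NegationFSM rules or word sets.
final class ToyNegationFsm {
    enum State { START, AFTER_BE }

    private final Set<String> negWords = Set.of("no", "not", "denies");
    private final Set<String> beVerbs = Set.of("is", "was", "are", "were");
    // Mutable so it can be extended during experimentation (cf. _negAdjectivesSet).
    final Set<String> negAdjectives = new HashSet<>(Set.of("unlikely", "absent"));

    // Runs the machine over the context tokens; returning true means
    // the (implicit) accepting "negated" state was reached.
    boolean matches(List<String> context) {
        State state = State.START;
        for (String raw : context) {
            String tok = raw.toLowerCase(Locale.ROOT);
            if (negWords.contains(tok)) {
                return true;  // "no", "not", "denies": negated outright
            }
            if (state == State.AFTER_BE && negAdjectives.contains(tok)) {
                return true;  // "is unlikely", "was absent", ...
            }
            // Transition: a form of "to be" arms the adjective pattern; anything else resets it.
            state = beVerbs.contains(tok) ? State.AFTER_BE : State.START;
        }
        return false;
    }

    public static void main(String[] args) {
        ToyNegationFsm fsm = new ToyNegationFsm();
        // Right-hand context of a named entity such as "pneumonia".
        List<String> right = List.of("is", "impossible", "given", "the", "normal", "x-ray");
        System.out.println(fsm.matches(right)); // false
        fsm.negAdjectives.add("impossible");    // the trial-and-error lexicon change
        System.out.println(fsm.matches(right)); // true
    }
}
```

 Note that even this toy machine shows why experimentation matters: "is very unlikely" would not match, because the
 intervening "very" resets the state. Real pattern sets have to anticipate such variations.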


 ################
 Status Annotator
 ################

 The status annotator works very much like the negation annotator.
 You are encouraged to read the section above, examine the parameter settings in desc/StatusAnnotator.xml,
 and look at org.apache.ctakes.core.fsm.machine.StatusIndicatorFSM.
