
 Contents
 - Introduction
 - Running the Drug NER pipeline
 	- AggregateTAE.xml  for CDA documents conforming to the provided DTD.
 	- AggregatePlaintextProcessor.xml  for plaintext documents.
 - Fixes
 	- Version 1.2.1

 ############
 Introduction
 ############

 This project adds the ability to identify attributes of drug mentions, such as Dosage, Frequency, Frequency Unit,
 Route and Strength, in either plaintext or CDA documents. It also lets you specify which sections
 of a note contain drugs in a list format, as opposed to drug mentions within the narrative of the note. This allows
 each kind of section to be processed differently and generally improves the quality of the annotations.
 This project uses various cTAKES components and therefore requires cTAKES to be installed before it can be used.


 ############################################################################
 Running the Drug NER pipeline
 ############################################################################

 This project (or PEAR file), the "Drug NER", relies on other projects/PEAR files such as
 'ctakes-clinical-pipeline', 'context dependent tokenizer', 'core', 'dictionary lookup',
 'LVG' and 'NE contexts'.


 The pipeline can process two types of documents:
  - plaintext files
  - Clinical Document Architecture (CDA) XML files that conform to the DTD provided


 %%%%%%%%%%%%%%%%%%%%%%%%%
 AggregateTAE.xml  for CDA documents conforming to the provided DTD.

 The file desc/analysis_engine/AggregateTAE.xml is the aggregate
 analysis engine to use to run the entire pipeline, including the
 CdaCasInitializer analysis engine, which reads CDA documents that conform
 to the DTD provided and creates Segment annotations based on the sections
 within the CDA document.

 Open this file using the Component Descriptor Editor as described in the tutorial.
 Click on the tab labeled "Aggregate" to observe that the Component Engine Flow (pipeline)
 defined by this descriptor includes CdaCasInitializer as the first component.
 Observe that part of speech tagging (POSTagger) comes before chunking (Chunker), etc.

 Click on the tab labeled "Parameter Settings" to view the parameters set in this
 descriptor.  The 'medicationRelatedSection' parameter is *not* set in the default
 implementation (for the Mayo Corpus it is generally set to 20104, 20133, 20147).
 If this parameter is left blank, all sections are treated as narrative sections. If any of
 those sections actually contain drugs in list format, the accuracy of identifying drug
 mentions and their attributes may not be acceptable.
 If sections containing drugs in a list format are available, it is recommended to
 specify their section ids.

 Another parameter, related to 'medicationRelatedSection', is 'sectionOverrideSet'.
 It specifies the section ids in which DrugLookupWindow annotations span the complete
 text of the specified section. Like 'medicationRelatedSection', 'sectionOverrideSet' is
 *not* set in the default implementation (for the Mayo Corpus it is generally set to
 20104, 20133, 20147). If this parameter is left blank, all sections are treated as
 narrative sections. If any of those sections actually contain drugs in list format,
 the accuracy of identifying drug mentions and their attributes may not be acceptable.
 If sections containing drugs in a list format are available, it is recommended to
 specify their section ids.

 If your input is plain text rather than CDA documents, and you prefer that the entire
 document be handled as a list rather than as narrative, enter 'SIMPLE_SEGMENT' into
 'medicationRelatedSection' or 'sectionOverrideSet' (see the tutorial for
 additional information on adding the 'SIMPLE_SEGMENT' to the Component Engine Flow (pipeline)).

 The parameters are:
 DrugMentionAnnotator.xml
 - medicationRelatedSection -  IDs of sections generated by your Segment Annotator where
                               drug mentions appear in a list format.

 DrugCNP2LookupWindow.xml
 - sectionOverrideSet - IDs of sections (or segments) where the complete section will be treated
                        as a DrugLookupWindow, which is designed to process medications or drugs in
                        'list format'.
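
 As a rough illustration of the two parameters above, the saved settings might look like
 the fragment below. This is a sketch, not a copy of the shipped descriptors: it assumes
 both parameters are declared as multi-valued String parameters, and it reuses the Mayo
 section ids purely as placeholder values. Substitute the ids generated by your own
 Segment Annotator.

```xml
<!-- Sketch only: assumes multi-valued String parameters; ids are the Mayo examples. -->
<configurationParameterSettings>
  <nameValuePair>
    <name>medicationRelatedSection</name>
    <value>
      <array>
        <string>20104</string>
        <string>20133</string>
        <string>20147</string>
      </array>
    </value>
  </nameValuePair>
  <nameValuePair>
    <name>sectionOverrideSet</name>
    <value>
      <array>
        <string>20104</string>
        <string>20133</string>
        <string>20147</string>
      </array>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

 Entering the values through the "Parameter Settings" tab is likely the less error-prone
 route, since the editor writes this structure for you.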

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 AggregatePlaintextProcessor.xml  for plaintext documents.

 The file desc/analysis_engine/AggregatePlaintextProcessor.xml is the aggregate
 analysis engine to use to run the entire pipeline, including the
 SimpleSegmentAnnotator analysis engine, which creates a Segment annotation that
 wraps the entire plaintext document.  Other annotators in the pipeline require
 at least one Segment annotation.

 Click on the tab labeled "Parameter Settings" to view the parameters set in this
 descriptor.  The 'medicationRelatedSection' is set to 20104, 20133, 20147. These
 are section ids specific to Mayo's CDA documents. These section ids must be changed
 to match the ids generated by your Segment Annotator.

 The parameters are:
 - SegmentID - the identifier or name to assign to the Segment annotation
 - medicationRelatedSection -  IDs of sections generated by your Segment Annotator where
                               drug mentions appear in a list format.
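
 For plaintext input where the whole document should be handled as a list, the two
 parameters above could be set along these lines. This is a sketch under assumptions:
 it uses 'SIMPLE_SEGMENT' (the id mentioned earlier in this README) as the SegmentID
 value, treats SegmentID as a single-valued String, and treats medicationRelatedSection
 as a multi-valued String. Check the parameter declarations in your descriptors before
 relying on it.

```xml
<!-- Sketch only: parameter types and the SIMPLE_SEGMENT value are assumptions. -->
<configurationParameterSettings>
  <nameValuePair>
    <name>SegmentID</name>
    <value>
      <string>SIMPLE_SEGMENT</string>
    </value>
  </nameValuePair>
  <nameValuePair>
    <name>medicationRelatedSection</name>
    <value>
      <array>
        <string>SIMPLE_SEGMENT</string>
      </array>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

 The point of the pairing is that the id listed in 'medicationRelatedSection' has to match
 the id assigned to the single Segment annotation; otherwise the document would still be
 treated as narrative.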

 ############
 Fixes
 ############

 Version 1.2.1 -
 	- Fix problem where drug mentions were not aligned correctly with the named entity parent
 	- Fix issue where a drug change status of increase/decrease/change was not handled correctly, creating a noChange mention and/or assigning incorrect attributes.