Contents
- Introduction
- Running the Drug NER pipeline
  - AggregateTAE.xml for CDA documents conforming to the provided DTD.
  - AggregatePlaintextProcessor.xml for plaintext documents.
- Fixes
  - Version 1.2.1
############
Introduction
############
This project adds the ability to identify attributes of drug mentions, such as Dosage, Frequency, Frequency Unit,
Route and Strength, in either plaintext or CDA documents. It also lets you specify which sections of a note
contain drugs in a list format, as opposed to drug mentions within the narrative of the note. Each kind of
section can then be processed differently, which generally improves the quality of the annotations.
This project utilizes various cTAKES components and hence requires cTAKES to be installed prior to using this component.
############################################################################
Running the ctakes-clinical-pipeline
############################################################################
This project (or PEAR file), the "Drug NER", relies on other projects/PEAR files such as
'ctakes-clinical-pipeline', 'context dependent tokenizer', 'core', 'dictionary lookup',
'LVG' and 'NE contexts'.
The pipeline can process two types of documents:
- plaintext files
- Clinical Document Architecture (CDA) XML files that conform to the DTD provided
%%%%%%%%%%%%%%%%%%%%%%%%%
AggregateTAE.xml for CDA documents conforming to the provided DTD.
The file desc/analysis_engine/AggregateTAE.xml is the aggregate
analysis engine to use to run the entire pipeline, including the
CdaCasInitializer analysis engine, which reads CDA documents that conform
to the DTD provided and creates Segment annotations based on the sections
within the CDA document.
Open this file using the Component Descriptor Editor as described in the tutorial.
Click on the tab labeled "Aggregate" to observe that the Component Engine Flow (pipeline)
defined by this descriptor includes CdaCasInitializer as the first component.
Observe that part of speech tagging (POSTagger) comes before chunking (Chunker), etc.
Click on the tab labeled "Parameter Settings" to view the parameters set in this
descriptor. The 'medicationRelatedSection' parameter is *not* set in the default implementation
(for the Mayo Corpus it is generally set to 20104, 20133, 20147). If this parameter is left blank,
all sections are treated as narrative sections. If any of those sections actually contain drugs in
list format, the accuracy of identifying drug mentions and their attributes may not be acceptable.
It is recommended to specify the ids of sections that contain drugs in a list format if
such sections are available.
A parameter related to 'medicationRelatedSection' is 'sectionOverrideSet'.
It specifies the section ids for which a DrugLookupWindow annotation will cover the complete
text of the section. The 'sectionOverrideSet' parameter is also *not* set in the default
implementation (for the Mayo Corpus it is generally set to 20104, 20133, 20147). If this parameter
is left blank, all sections are treated as narrative sections. If any of those sections actually
contain drugs in list format, the accuracy of identifying drug mentions and their attributes may
not be acceptable. It is recommended to specify the ids of sections that contain drugs in a list
format if such sections are available.
If you plan to use plain text documents rather than CDA documents as input, and you
prefer that the entire document's contents be handled as lists rather than narrative, then 'SIMPLE_SEGMENT'
can be entered into 'medicationRelatedSection' or 'sectionOverrideSet' (see the tutorial for
additional information on adding 'SIMPLE_SEGMENT' to the Component Engine Flow (pipeline)).
The parameters are:
DrugMentionAnnotator.xml
- medicationRelatedSection - IDs of sections generated by your Segment Annotator where
drug mentions appear in a list format.
DrugCNP2LookupWindow.xml
- sectionOverrideSet - IDs of sections (or segments) where the complete section will be treated
as a DrugLookupWindow, which is designed to process medications or drugs in
'list format'.
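
The effect of these two parameters can be pictured as a set membership check on the section id.
The sketch below is only an illustration in plain Java, not the cTAKES implementation: the class
and method names are hypothetical, the id 20112 is an arbitrary stand-in for a narrative section,
and the real annotators read these values from their descriptors.

```java
import java.util.Set;

// Hypothetical illustration; none of these names belong to the cTAKES API.
public class SectionRouting {

    // Section ids this README cites for the Mayo Corpus.
    static final Set<String> MAYO_LIST_SECTIONS = Set.of("20104", "20133", "20147");

    // A section is handled as a drug list only if its id is configured;
    // an empty set (parameter left blank) makes every section narrative.
    static String processingMode(String sectionId, Set<String> medicationRelatedSection) {
        return medicationRelatedSection.contains(sectionId) ? "list" : "narrative";
    }

    // Sections named in the override set get one lookup window over their full text.
    static boolean windowCoversWholeSection(String sectionId, Set<String> sectionOverrideSet) {
        return sectionOverrideSet.contains(sectionId);
    }

    public static void main(String[] args) {
        System.out.println(processingMode("20104", MAYO_LIST_SECTIONS));
        System.out.println(processingMode("20112", MAYO_LIST_SECTIONS)); // 20112: arbitrary id
        System.out.println(processingMode("20104", Set.of()));
        System.out.println(windowCoversWholeSection("SIMPLE_SEGMENT", Set.of("SIMPLE_SEGMENT")));
    }
}
```

The third call shows the default behavior described above: with nothing configured, even a
section that really is a medication list falls through to narrative handling.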
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
AggregatePlaintextProcessor.xml for plaintext documents.
The file desc/analysis_engine/AggregatePlaintextProcessor.xml is the aggregate
analysis engine to use to run the entire pipeline, including the
SimpleSegmentAnnotator analysis engine, which creates a single Segment annotation that
wraps the entire plaintext document. Other annotators in the pipeline require
at least one Segment annotation.
Click on the tab labeled "Parameter Settings" to view the parameters set in this
descriptor. The 'medicationRelatedSection' is set to 20104, 20133, 20147. These
are section ids specific to Mayo's CDA documents. These section ids must be changed
to match the ids generated by your Segment Annotator.
The parameters are:
- SegmentID - the identifier or name to assign to the Segment annotation
- medicationRelatedSection - IDs of sections generated by your Segment Annotator where
drug mentions appear in a list format.
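
As a rough picture of what wrapping the entire plaintext document means, the sketch below builds
one segment that starts at offset 0 and ends at the length of the text, labeled with the
configured SegmentID. It is a plain Java illustration under stated assumptions: the Segment class
here is a hypothetical stand-in, not the cTAKES Segment annotation type, and the sample sentence
is invented.

```java
// Hypothetical illustration; Segment here is a plain class, not the cTAKES Segment type.
public class WholeDocumentSegment {

    static final class Segment {
        final String id;
        final int begin;
        final int end;

        Segment(String id, int begin, int end) {
            this.id = id;
            this.begin = begin;
            this.end = end;
        }
    }

    // Wrap the entire text in one segment carrying the configured identifier.
    static Segment wrap(String documentText, String segmentId) {
        return new Segment(segmentId, 0, documentText.length());
    }

    public static void main(String[] args) {
        String text = "Aspirin 81 mg daily.";
        Segment s = wrap(text, "SIMPLE_SEGMENT");
        System.out.println(s.id + " [" + s.begin + ", " + s.end + ")");
    }
}
```

Because the whole document carries a single id, listing that id in 'medicationRelatedSection'
is what switches the entire document to list-style handling.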
############
Fixes
############
Version 1.2.1
- Fix problem where drug mentions were not aligned correctly with their named entity parent.
- Fix issue where a drug change status of increase/decrease/change was not handled correctly,
creating a noChange mention and/or assigning incorrect attributes.