ctakes-dictionary-lookup-fast - ctakes

The fast dictionary lookup annotator identifies terms in text and normalizes them to codes in an ontology such as UMLS (CUIs), SNOMED CT, or RxNorm. The module ships with several pre-packaged configurations and can also be customized and extended.

When the annotator is initialized, it performs two steps:
A. Parse the Dictionary Descriptor file.
B. Create the Dictionaries and Concept Factories.
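The "Create Dictionaries" step can be pictured as building an in-memory index from lookup tokens to dictionary terms. The sketch below is a hypothetical illustration, not the cTAKES code: it assumes that each term is filed under its least frequent token (a "rare word"), so each text token needs only one index probe. The class `RareWordIndexSketch`, the `Term` record, and the placeholder identifiers `CUI_A` and similar are all invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: index each dictionary term under its least frequent token. */
public class RareWordIndexSketch {

    /** A dictionary term: concept id, its tokens, and the position of the indexed token. */
    record Term(String cui, List<String> tokens, int rareIndex) {}

    /** Builds an index from lookup key (the rare token) to the terms filed under it. */
    static Map<String, List<Term>> buildIndex(Map<String, String> textToCui) {
        // Count how often each token appears across all term texts.
        Map<String, Integer> frequency = new HashMap<>();
        for (String text : textToCui.keySet()) {
            for (String token : text.toLowerCase().split("\\s+")) {
                frequency.merge(token, 1, Integer::sum);
            }
        }
        Map<String, List<Term>> index = new HashMap<>();
        for (Map.Entry<String, String> entry : textToCui.entrySet()) {
            List<String> tokens = List.of(entry.getKey().toLowerCase().split("\\s+"));
            // Pick the token with the lowest frequency; ties go to the earliest token.
            int rareIndex = 0;
            for (int i = 1; i < tokens.size(); i++) {
                if (frequency.get(tokens.get(i)) < frequency.get(tokens.get(rareIndex))) {
                    rareIndex = i;
                }
            }
            Term term = new Term(entry.getValue(), tokens, rareIndex);
            index.computeIfAbsent(tokens.get(rareIndex), k -> new ArrayList<>()).add(term);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary; the identifiers are placeholders, not real CUIs.
        Map<String, String> dictionary = Map.of(
                "heart attack", "CUI_A",
                "heart failure", "CUI_B",
                "failure", "CUI_C");
        buildIndex(dictionary).forEach((key, terms) -> System.out.println(key + " -> " + terms));
    }
}
```

In this toy dictionary "heart attack" is filed under "attack" rather than the more common "heart", which keeps the candidate list for each probe short. Recording `rareIndex` lets a later step align the whole term around the matched token.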

It then processes each document as follows:
1. Get the Lookup Windows from the CAS.
2. For each Lookup Window, get the candidate Lookup Tokens.
3. For each Lookup Token, get matches in the Dictionary Index.
4. For each Token match, check the Lookup Window for a Full Text match.
5. For each Full Text match, create Concepts.
6. Store the appropriate Concepts in the CAS as Annotations.
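The processing steps can be sketched as nested loops. This is a minimal, hypothetical Java illustration, not the cTAKES implementation: it uses no UIMA or cTAKES classes, treats each Lookup Window as a pre-tokenized list of lowercase strings (standing in for a Sentence annotation), assumes each term is indexed under one of its tokens at a known position, and returns plain records instead of storing annotations in the CAS. `LookupLoopSketch`, `Term`, and `Concept` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the lookup loop; not the cTAKES API. */
public class LookupLoopSketch {

    /** A dictionary term stored in the index under the token at rareIndex. */
    record Term(String cui, List<String> tokens, int rareIndex) {}

    /** A matched concept with token offsets [begin, end) inside its window. */
    record Concept(String cui, int windowIndex, int begin, int end) {}

    static List<Concept> lookup(List<List<String>> windows, Map<String, List<Term>> index) {
        List<Concept> concepts = new ArrayList<>();
        for (int w = 0; w < windows.size(); w++) {                  // each Lookup Window
            List<String> window = windows.get(w);
            for (int t = 0; t < window.size(); t++) {               // each Lookup Token
                for (Term term : index.getOrDefault(window.get(t), List.of())) { // each index match
                    // Align the term so its indexed token sits at window position t.
                    int begin = t - term.rareIndex();
                    int end = begin + term.tokens().size();
                    if (begin < 0 || end > window.size()) {
                        continue;                                   // term would spill outside the window
                    }
                    // Full Text match: every term token must equal the aligned window token.
                    if (window.subList(begin, end).equals(term.tokens())) {
                        concepts.add(new Concept(term.cui(), w, begin, end));
                    }
                }
            }
        }
        return concepts;
    }

    public static void main(String[] args) {
        // Hypothetical index: "heart attack" is stored under "attack" (position 1).
        Map<String, List<Term>> index = Map.of(
                "attack", List.of(new Term("CUI_A", List.of("heart", "attack"), 1)),
                "aspirin", List.of(new Term("CUI_D", List.of("aspirin"), 0)));
        List<List<String>> windows = List.of(
                List.of("patient", "had", "a", "heart", "attack"),
                List.of("started", "on", "aspirin"));
        for (Concept c : lookup(windows, index)) {
            System.out.println(c);
        }
    }
}
```

Because only the window bounds limit the full-text check, a wider window (a Sentence rather than a Noun Phrase) lets a term match wherever it falls, at the cost of probing the index for more tokens.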

[Figure: Structure Diagram]

The relevant configuration files are:
The main descriptor: ...-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
The resource (dictionary) configuration file: resources/.../dictionary/lookup/fast/sno_rx_16ab.xml (the file name may differ if you created your own custom dictionary)

Full Sentences, rather than Noun Phrases, are used as the default lookup windows because:
Not all terms fall within Noun Phrases.
Some Noun Phrases overlap, causing repeated lookups (in my trials with the 3.0 candidate).
Not all cTAKES Noun Phrases are accurate.

Because the lookup is fast, using a full Sentence as the lookup window does not seem to hurt performance much. However, you can always switch back to Noun Phrase windows to see whether the gain in precision is large enough to justify the loss in recall. The lookup window type is changed in UmlsLookupAnnotator.xml.
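One way to make that comparison is to run the pipeline with each window type over a gold-standard annotated set and compute precision and recall for both runs. The helper below is a hypothetical sketch (`WindowComparisonSketch` and the `"begin-end:CUI"` string encoding are invented for illustration); the sets in `main` are made-up toy values chosen to show the trade-off, not measured results.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: score system annotations against gold annotations. */
public class WindowComparisonSketch {

    record Scores(double precision, double recall) {}

    /** Each annotation is encoded as a string such as "begin-end:CUI". */
    static Scores score(Set<String> system, Set<String> gold) {
        Set<String> truePositives = new HashSet<>(system);
        truePositives.retainAll(gold);                 // annotations found in both sets
        double precision = system.isEmpty() ? 0.0 : (double) truePositives.size() / system.size();
        double recall = gold.isEmpty() ? 0.0 : (double) truePositives.size() / gold.size();
        return new Scores(precision, recall);
    }

    public static void main(String[] args) {
        // Made-up toy annotations, not real cTAKES output.
        Set<String> gold = Set.of("10-22:CUI_A", "40-47:CUI_B", "60-66:CUI_C");
        Set<String> sentenceRun = Set.of("10-22:CUI_A", "40-47:CUI_B", "60-66:CUI_C", "70-75:CUI_X");
        Set<String> nounPhraseRun = Set.of("10-22:CUI_A", "40-47:CUI_B");
        System.out.println("Sentence windows:    " + score(sentenceRun, gold));
        System.out.println("Noun Phrase windows: " + score(nounPhraseRun, gold));
    }
}
```

With these toy sets the Sentence run finds every gold annotation plus one spurious one (precision 0.75, recall 1.0), while the Noun Phrase run finds fewer but only correct ones (precision 1.0, recall about 0.67).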