ctakes-dictionary-lookup-fast/README.md

The fast dictionary lookup annotator identifies terms in text and normalizes them to ontology codes: UMLS CUI, SNOMED CT, RxNorm, etc. The module ships with several pre-packaged configurations and is also customizable and extensible.

  • A Parse Dictionary Descriptor file
  • B Create Dictionaries and Concept Factories
  1. Get Lookup Windows from CAS
  2. For each Lookup Window, get candidate Lookup Tokens
  3. For each Lookup Token, get matches in Dictionary Index
  4. For each Token match, check Lookup Window for Full Text match
  5. For each Full Text match, create Concepts
  6. Store appropriate Concepts in CAS as Annotations
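
Steps 2 through 5 above can be illustrated with a minimal, self-contained sketch. This is not the cTAKES implementation: the class `FastLookupSketch`, the `Term` type, the placeholder codes `CODE_A` and `CODE_B`, and the choice to index each term under its longest token are all assumptions made for illustration. The idea shown is only that each term is filed under one of its tokens, so a token hit in the window yields a short list of candidates whose full text is then checked against the window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FastLookupSketch {

    /** One dictionary term: its tokens, the position of the token it is indexed under, and its code. */
    static final class Term {
        final String[] tokens;
        final int indexTokenPos;
        final String code;

        Term(String[] tokens, int indexTokenPos, String code) {
            this.tokens = tokens;
            this.indexTokenPos = indexTokenPos;
            this.code = code;
        }
    }

    /** Files each term under one of its tokens (assumption: the longest token, first one on ties). */
    static Map<String, List<Term>> buildIndex(Map<String, String> textToCode) {
        Map<String, List<Term>> index = new HashMap<>();
        for (Map.Entry<String, String> entry : textToCode.entrySet()) {
            String[] tokens = entry.getKey().toLowerCase().split("\\s+");
            int pos = 0;
            for (int i = 1; i < tokens.length; i++) {
                if (tokens[i].length() > tokens[pos].length()) {
                    pos = i;
                }
            }
            index.computeIfAbsent(tokens[pos], k -> new ArrayList<>())
                 .add(new Term(tokens, pos, entry.getValue()));
        }
        return index;
    }

    /** Steps 2-5 for one lookup window; results are "code@firstToken-lastToken" strings. */
    static List<String> lookup(String[] window, Map<String, List<Term>> index) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < window.length; i++) {
            // Step 3: does this window token hit the dictionary index?
            List<Term> candidates = index.get(window[i].toLowerCase());
            if (candidates == null) {
                continue;
            }
            for (Term term : candidates) {
                // Step 4: align the term on the hit and compare its full text to the window.
                int start = i - term.indexTokenPos;
                if (start < 0 || start + term.tokens.length > window.length) {
                    continue;
                }
                boolean fullMatch = true;
                for (int j = 0; j < term.tokens.length; j++) {
                    if (!term.tokens[j].equalsIgnoreCase(window[start + j])) {
                        fullMatch = false;
                        break;
                    }
                }
                // Step 5: a full text match becomes a concept (here just a string).
                if (fullMatch) {
                    found.add(term.code + "@" + start + "-" + (start + term.tokens.length - 1));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("heart attack", "CODE_A"); // placeholder codes, not real CUIs
        dictionary.put("attack", "CODE_B");
        String[] window = {"Patient", "had", "a", "heart", "attack", "today"};
        System.out.println(lookup(window, buildIndex(dictionary)));
        // prints [CODE_A@3-4, CODE_B@4-4]
    }
}
```

Because only tokens that appear in the index trigger a full text comparison, most tokens in a window cost a single map lookup, which is consistent with the later remark that widening the window to a full Sentence does not hurt much.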

Structure Diagram

  1. The main descriptor ...-fast/desc/analysis_engine/UmlsLookupAnnotator.xml
  2. The resource (dictionary) configuration file resources/.../dictionary/lookup/fast/sno_rx_16ab.xml (The file name might be different if you created your own custom dictionary)

Reasons for using a Sentence rather than a Noun Phrase as the lookup window:

  1. Not all terms are within Noun Phrases.
  2. Some Noun Phrases overlapped, causing repeated lookups (in my 3.0 candidate trials).
  3. Not all cTAKES Noun Phrases are accurate.

Because the lookup is fast, using a full Sentence as the lookup window doesn't seem to hurt much. However, you can always switch the window back to Noun Phrases to see whether precision increases enough to warrant the decrease in recall. The lookup window is set in UmlsLookupAnnotator.xml.