ctakes-dictionary-lookup - ctakes

ctakes-dictionary-lookup/README.md

This project includes two sample dictionaries.
A sample database (a Lucene index) containing a few drug names is included for running the examples.
A sample database (using 2 Lucene indexes) containing a few anatomical sites, procedures, and disorders/diseases is included for running the examples.
The programs used to create these Lucene indexes are:
scripts/java/edu/mayo/bmi/dictionarytools/CreateLuceneIndexForExampleDrugs.java
scripts/java/edu/mayo/bmi/dictionarytools/CreateLuceneIndexForSnomedLikeSample.java

Creating your own dictionaries

To create a more complete dictionary of drug concepts, you could download a copy of the UMLS Metathesaurus and build upon the CreateLuceneIndexForExampleDrugs program mentioned above to create a Lucene index of the complete RxNorm or another source of drug concepts.
Or you could use a different program in that same package that reads from a pipe-delimited file: scripts/java/edu/mayo/bmi/dictionarytools/CreateLuceneIndexFromDelimitedFile
The pipe-delimited file should contain lines in the following format.
CUI|drug name aka description|terminology aka source|codeInThatSource|PreferredIndicator|TUI
Here PreferredIndicator is P if the name is the preferred name for the drug.
For example, if you want to include terms from semantic type “Biomedical or Dental Material” (TUI T122), one line in the file you create should be:
C1154185|Topical Spray|RXNORM|346165|P|T122
The CreateLuceneIndexFromDelimitedFile class could then be used to create a Lucene index from the data in the file.
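
Before running the indexer, it can help to check that every line of your file has the six fields described above. The following is a minimal sketch, not part of cTAKES: the class name DictionaryLine and its methods are hypothetical helpers introduced only for illustration, and the sketch assumes that no field value itself contains the pipe character.

```java
/**
 * Hypothetical helper (not part of cTAKES) that builds and checks lines of the form
 * CUI|name|source|codeInThatSource|PreferredIndicator|TUI
 */
final class DictionaryLine {
    static final int FIELD_COUNT = 6;

    /** Joins the six fields with '|' after checking that none contains the delimiter. */
    static String format(String cui, String name, String source,
                         String code, String preferredIndicator, String tui) {
        String[] fields = { cui, name, source, code, preferredIndicator, tui };
        for (String field : fields) {
            if (field.indexOf('|') >= 0) {
                throw new IllegalArgumentException("field contains '|': " + field);
            }
        }
        return String.join("|", fields);
    }

    /** Splits a line into its six fields; the -1 limit keeps trailing empty fields. */
    static String[] parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "expected " + FIELD_COUNT + " fields but found " + fields.length + ": " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = format("C1154185", "Topical Spray", "RXNORM", "346165", "P", "T122");
        System.out.println(line);           // C1154185|Topical Spray|RXNORM|346165|P|T122
        System.out.println(parse(line)[1]); // Topical Spray
    }
}
```

Running parse over each line of the file and stopping at the first exception is one simple way to catch a missing or extra delimiter before the index is built.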

To create a more complete dictionary of anatomical sites, procedures, signs/symptoms, and/or diseases/disorders, you could download a copy of the UMLS Metathesaurus and build upon (add code to) the CreateLuceneIndexForSnomedLikeSample class to create two Lucene indexes: one for the concepts and their CUIs, and one that maps the codes from the source(s) to the CUIs.

Alternatively, you could create and populate a database with the following two tables:
umls_ms_2005 (or whatever name you specify within LookupDesc_Db.xml), with columns “fword”, “cui”, “tui”, “text”
umls_snomed_map, with columns “cui”, “code”

Configuring the annotator to use your dictionary

These steps are not necessary in order to run the pipeline with the very small sample dictionaries that are included with this project.
If you created your own dictionary(s) as outlined above, here is how you could configure this annotator to use your dictionary(s).
If you created a Lucene index directory called drug_index, you could update the value of IndexDirectory for the external resource RxnormIndex, within the descriptor DictionaryLookupAnnotator.xml, to reference the location of your drug_index directory. Note that this portion of the descriptor is within a configurableDataResourceSpecifier, so to edit it you need to use a text editor or the Source tab.
Alternatively, you could simply replace the contents of directory dictionary lookup/resources/lookup/drug_index with the contents of the Lucene index directory you created.
If you created two Lucene index directories using CreateLuceneIndexForSnomedLikeSample.java, you could simply replace the contents of the two directories
dictionary lookup/resources/lookup/snomed-like_sample
dictionary lookup/resources/lookup/snomed-like_codes_sample
with the contents of the Lucene index directories you created.

Alternatively, if you created database tables umls_snomed_map and umls_ms_2005 as outlined above, you could do the following steps:

1. Replace the use of DictionaryLookupAnnotator.xml with DictionaryLookupAnnotatorDB.xml in your pipeline (in your aggregate flow, e.g. in AggregatePlaintextProcessor.xml). The class UmlsToSnomedDbConsumerImpl.java that is used in this case is included with this distribution.
2. Update the following values within DictionaryLookupAnnotatorDB.xml for your environment:
   2a) Username
   2b) Password
   2c) DriverClassName
   2d) URL
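
Before editing the descriptor, you may want to confirm that the four values from step 2 actually open a connection. The following is a minimal sketch using only the standard java.sql API. The class JdbcSettingsCheck is a hypothetical helper that is not part of cTAKES, and the sketch assumes that the JDBC driver jar for your database is on the classpath when you run it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper (not part of cTAKES) for trying out JDBC settings. */
final class JdbcSettingsCheck {

    /** Returns null if a connection could be opened, otherwise a short description of the problem. */
    static String check(String driverClassName, String url, String username, String password) {
        try {
            // Loading the class confirms that the driver is on the classpath.
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            return "driver class not found: " + driverClassName;
        }
        // try-with-resources closes the connection as soon as it has been opened.
        try (Connection connection = DriverManager.getConnection(url, username, password)) {
            return null;
        } catch (SQLException e) {
            return "connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("usage: JdbcSettingsCheck <DriverClassName> <URL> <Username> <Password>");
            return;
        }
        String problem = check(args[0], args[1], args[2], args[3]);
        System.out.println(problem == null ? "OK" : problem);
    }
}
```

If the check reports a problem, correcting the values here first is usually quicker than debugging the same failure from inside the pipeline.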