| # This module has been deprecated. It is much faster to precompute lexical variants for your dictionary. |
| |
| |
| |
| Contents |
| - Introduction |
| - Description of resources |
| - lvg.properties |
| - LVG database |
| - Running the LVG annotator |
| - LvgAnnotator.xml |
| - AggregateAE.xml |
| |
| ############ |
| Introduction |
| ############ |
| |
| This annotator wraps the National Library of Medicine (NLM) SPECIALIST lexical tools. |
| |
| See the cTAKES Wiki for the latest information about this annotator: |
| https://cwiki.apache.org/confluence/display/CTAKES/cTAKES |
| |
| Documentation for the SPECIALIST lexical tools is at: |
| https://lsg3.nlm.nih.gov/Specialist/Home/index.html |
| |
| Documentation for Lvg and Norm can be found at: |
| https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html |
| |
| This annotator generates a canonical form for each word and also generates a list of lemma |
| entries with Penn Treebank tags. These tags could be useful for a part of speech (POS) tagger. |
| |
| However, for the OpenNLP POS tagger, cTAKES uses a tag dictionary rather than lemma information. |
| See the documentation for the POS tagger annotator. |
| |
| |
| ######################## |
| Description of resources |
| ######################## |
| |
| %%%%%%%%%%%%%%%% |
| lvg.properties |
| %%%%%%%%%%%%%%%% |
| The LVG configuration file lvg.properties defines the location |
| and attributes of the LVG database and the jdbc driver used. |
| |
| %%%%%%%%%%%%%%%% |
| LVG database |
| %%%%%%%%%%%%%%%% |
| |
| The database engine used is hsqldb. |
| |
| The LVG database available from the NLM is hundreds of megabytes. To keep this |
| project relatively small, the database tables included with this project have a |
| relatively small number of rows. |
| |
| ######################### |
| Running the LVG annotator |
| ######################### |
| |
| %%%%%%%%%%%%%%%% |
| LvgAnnotator.xml |
| %%%%%%%%%%%%%%%% |
| |
| The parameters are: |
| UseSegments - controls whether only certain sections will be annotated by this annotator |
| SegmentsToSkip - list of sections not to be processed by this annotator |
| UseCmdCache - controls whether to look up information in a cache before using norm |
| CmdCacheFileLocation - location of norm cache file |
| CmdCacheFrequencyCutoff - |
| ExclusionSet - words for which canonicalForm is never set and Lemma entries are never posted |
| XeroxTreebankMap - mapping of part of speech tags, used to POS tags from lexical tools to Penn Treebank tags |
| PostLemmas - controls whether any lemma entries are posted to the CAS |
| UseLemmaCache - controls whether to look up lemma information in a cache before using lvg |
| LemmaCacheFileLocation - the location of the cache file |
| LemmaCacheFileFrequencyCutoff - |
| |
| Note: as distributed, PostLemmas is set to false. This is done to reduce the size of the CAS. |
| Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma annotations added to the CAS. |
| |