ctakes-lvg/README - ctakes - Git at Google

 # This module has been deprecated.  It is much faster to precompute lexical variants for your dictionary.


 Contents
 - Introduction
 - Description of resources
     - lvg.properties
     - LVG database
 - Running the LVG annotator
 	- LvgAnnotator.xml
 	- AggregateAE.xml

 ############
 Introduction
 ############

 This annotator wraps the National Library of Medicine (NLM) SPECIALIST lexical tools.

 See the cTAKES Wiki for the latest information about this annotator:
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

 Documentation for the SPECIALIST lexical tools is at:
 https://lsg3.nlm.nih.gov/Specialist/Home/index.html

 Documentation for Lvg and Norm can be found at:
 https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html

 This annotator generates a canonical form for each word and also generates a list of lemma
 entries with Penn Treebank tags.  These tags could be useful for a part of speech (POS) tagger.

 However, for the OpenNLP POS tagger, cTAKES uses a tag dictionary rather than lemma information.
 See the documentation for the POS tagger annotator.


 ########################
 Description of resources
 ########################

 %%%%%%%%%%%%%%%%
 lvg.properties
 %%%%%%%%%%%%%%%%
 The LVG configuration file lvg.properties defines the location
 and attributes of the LVG database and the jdbc driver used.

 %%%%%%%%%%%%%%%%
 LVG database
 %%%%%%%%%%%%%%%%

 The database engine used is hsqldb.

 The LVG database available from the NLM is hundreds of megabytes.  To keep this
 project relatively small, the database tables included with this project have a
 relatively small number of rows.

 #########################
 Running the LVG annotator
 #########################

 %%%%%%%%%%%%%%%%
 LvgAnnotator.xml
 %%%%%%%%%%%%%%%%

 The parameters are:
   UseSegments - controls whether only certain sections will be annotated by this annotator
   SegmentsToSkip - list of sections not to be processed by this annotator
   UseCmdCache - controls whether to look up information in a cache before using norm
   CmdCacheFileLocation - location of norm cache file
   CmdCacheFrequencyCutoff -
   ExclusionSet - words for which canonicalForm is never set and Lemma entries are never posted
   XeroxTreebankMap - mapping of part of speech tags, used to POS tags from lexical tools to Penn Treebank tags
   PostLemmas - controls whether any lemma entries are posted to the CAS
   UseLemmaCache - controls whether to look up lemma information in a cache before using lvg
   LemmaCacheFileLocation - the location of the cache file
   LemmaCacheFileFrequencyCutoff -

 Note: as distributed, PostLemmas is set to false.  This is done to reduce the size of the CAS.
 Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma annotations added to the CAS.
	# This module has been deprecated. It is much faster to precompute lexical variants for your dictionary.



	Contents
	- Introduction
	- Description of resources
	- lvg.properties
	- LVG database
	- Running the LVG annotator
	- LvgAnnotator.xml
	- AggregateAE.xml

	############
	Introduction
	############

	This annotator wraps the National Library of Medicine (NLM) SPECIALIST lexical tools.

	See the cTAKES Wiki for the latest information about this annotator:
	https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

	Documentation for the SPECIALIST lexical tools is at:
	https://lsg3.nlm.nih.gov/Specialist/Home/index.html

	Documentation for Lvg and Norm can be found at:
	https://lsg3.nlm.nih.gov/LexSysGroup/Projects/lvg/current/web/index.html

	This annotator generates a canonical form for each word and also generates a list of lemma
	entries with Penn Treebank tags. These tags could be useful for a part of speech (POS) tagger.

	However, for the OpenNLP POS tagger, cTAKES uses a tag dictionary rather than lemma information.
	See the documentation for the POS tagger annotator.


	########################
	Description of resources
	########################

	%%%%%%%%%%%%%%%%
	lvg.properties
	%%%%%%%%%%%%%%%%
	The LVG configuration file lvg.properties defines the location
	and attributes of the LVG database and the jdbc driver used.

	%%%%%%%%%%%%%%%%
	LVG database
	%%%%%%%%%%%%%%%%

	The database engine used is hsqldb.

	The LVG database available from the NLM is hundreds of megabytes. To keep this
	project relatively small, the database tables included with this project have a
	relatively small number of rows.

	#########################
	Running the LVG annotator
	#########################

	%%%%%%%%%%%%%%%%
	LvgAnnotator.xml
	%%%%%%%%%%%%%%%%

	The parameters are:
	UseSegments - controls whether only certain sections will be annotated by this annotator
	SegmentsToSkip - list of sections not to be processed by this annotator
	UseCmdCache - controls whether to look up information in a cache before using norm
	CmdCacheFileLocation - location of norm cache file
	CmdCacheFrequencyCutoff -
	ExclusionSet - words for which canonicalForm is never set and Lemma entries are never posted
	XeroxTreebankMap - mapping of part of speech tags, used to POS tags from lexical tools to Penn Treebank tags
	PostLemmas - controls whether any lemma entries are posted to the CAS
	UseLemmaCache - controls whether to look up lemma information in a cache before using lvg
	LemmaCacheFileLocation - the location of the cache file
	LemmaCacheFileFrequencyCutoff -

	Note: as distributed, PostLemmas is set to false. This is done to reduce the size of the CAS.
	Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma annotations added to the CAS.