 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
 "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
 ]>
 <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
 	license agreements. See the NOTICE file distributed with this work for additional
 	information regarding copyright ownership. The ASF licenses this file to
 	you under the Apache License, Version 2.0 (the "License"); you may not use
 	this file except in compliance with the License. You may obtain a copy of
 	the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
 	by applicable law or agreed to in writing, software distributed under the
 	License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
 	OF ANY KIND, either express or implied. See the License for the specific
 	language governing permissions and limitations under the License. -->

 <chapter id="tools.lemmatizer">
 	<title>Lemmatizer</title>
 		<para>
 			The lemmatizer returns, for a given word form (token) and part-of-speech
 			(POS) tag, the dictionary form of the word, which is usually referred to
 			as its lemma. A token can be ambiguous between several dictionary forms,
 			which is why the POS tag of the word is required to find the lemma. For
 			example, the form "show" may refer either to the verb "to show" or to the
 			noun "show".
 			OpenNLP currently implements statistical and dictionary-based lemmatizers.
 		</para>
 		<section id="tools.lemmatizer.tagging.cmdline">
 			<title>Lemmatizer Tool</title>
 			<para>
 				The easiest way to try out the Lemmatizer is the command line tool,
 				which provides access to the statistical
 				lemmatizer. Note that the tool is only intended for demonstration and testing.
 			</para>
 			<para>
 				Once you have trained a lemmatizer model (see below for instructions),
 				you can start the Lemmatizer Tool with this command:
 			</para>
 			<para>
 				<screen>
 		   <![CDATA[
 $ opennlp LemmatizerME en-lemmatizer.bin < sentences]]>
 		  </screen>
 				The Lemmatizer now reads one POS-tagged sentence per line from
 				standard input. For example, you can copy this sentence to the
 				console:
 				<screen>
 		    <![CDATA[
 Rockwell_NNP International_NNP Corp._NNP 's_POS Tulsa_NNP unit_NN said_VBD it_PRP
 signed_VBD a_DT tentative_JJ agreement_NN extending_VBG its_PRP$ contract_NN with_IN
 Boeing_NNP Co._NNP to_TO provide_VB structural_JJ parts_NNS for_IN Boeing_NNP 's_POS
 747_CD jetliners_NNS ._.]]>
 		  </screen>
 				The Lemmatizer will now echo the lemma for each word and POS tag
 				pair to the console:
 				<screen>
 		    <![CDATA[
 Rockwell NNP rockwell
 International NNP international
 Corp. NNP corp.
 's POS 's
 Tulsa NNP tulsa
 unit NN unit
 said VBD say
 it PRP it
 signed VBD sign
 ...
 ]]>
 		  </screen>
 			</para>
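 			<para>
 				The input format shown above (one word_TAG pair per token, separated
 				by whitespace) can be split into parallel token and tag arrays with a
 				few lines of code. The following is only a minimal sketch: the class
 				name TaggedLineParser and the choice to split on the last underscore
 				are assumptions made for illustration and are not part of the OpenNLP
 				API.
 				<programlisting language="java">
 		<![CDATA[
 import java.util.Arrays;

 // Hypothetical helper, not an OpenNLP class.
 public class TaggedLineParser {

   // Splits a line such as "said_VBD it_PRP" into parallel arrays:
   // result[0] holds the tokens, result[1] holds the POS tags.
   public static String[][] parse(String line) {
     String[] pairs = line.trim().split("\\s+");
     String[] tokens = new String[pairs.length];
     String[] tags = new String[pairs.length];
     for (int i = 0; i < pairs.length; i++) {
       // Assumption: split on the last underscore so tokens containing '_' stay intact.
       int split = pairs[i].lastIndexOf('_');
       if (split < 0) {
         throw new IllegalArgumentException("Missing tag in: " + pairs[i]);
       }
       tokens[i] = pairs[i].substring(0, split);
       tags[i] = pairs[i].substring(split + 1);
     }
     return new String[][] { tokens, tags };
   }

   public static void main(String[] args) {
     String[][] parsed = parse("said_VBD it_PRP signed_VBD");
     System.out.println(Arrays.toString(parsed[0]));
     System.out.println(Arrays.toString(parsed[1]));
   }
 }]]>
 			</programlisting>
 				The two arrays have the shape that the lemmatize method described in
 				the API section expects.
 			</para>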
 		</section>
 		<section id="tools.lemmatizer.tagging.api">
 			<title>Lemmatizer API</title>
 			<para>
 				The Lemmatizer can be embedded into an application via its API.
 				Currently a statistical lemmatizer (LemmatizerME) and a
 				DictionaryLemmatizer are available. Note that these two methods are
 				complementary: the DictionaryLemmatizer can also be used to
 				post-process the output of the statistical lemmatizer.
 			</para>
 			<para>
 				The statistical lemmatizer requires that a trained model is loaded
 				into memory from disk or from another source.
 				In the example below it is loaded from disk:
 				<programlisting language="java">
 		<![CDATA[
 LemmatizerModel model = null;
 // try-with-resources closes the stream once the model has been read
 try (InputStream modelIn = new FileInputStream("en-lemmatizer.bin")) {
   model = new LemmatizerModel(modelIn);
 }
 ]]>
 			</programlisting>
 				After the model is loaded, a LemmatizerME can be instantiated.
 				<programlisting language="java">
 				<![CDATA[
 LemmatizerME lemmatizer = new LemmatizerME(model);]]>
 			</programlisting>
 				The Lemmatizer instance is now ready to lemmatize data. It expects
 				two inputs: a tokenized sentence, represented as a String array in
 				which each String object is one token, and a second String array
 				with the POS tag of each token.
 			</para>
 			<para>
 				The following code shows how to determine the most likely lemma for
 				a sentence.
 				<programlisting language="java">
 		  <![CDATA[
 String[] tokens = new String[] { "Rockwell", "International", "Corp.", "'s",
     "Tulsa", "unit", "said", "it", "signed", "a", "tentative", "agreement",
     "extending", "its", "contract", "with", "Boeing", "Co.", "to",
     "provide", "structural", "parts", "for", "Boeing", "'s", "747",
     "jetliners", "." };

 String[] postags = new String[] { "NNP", "NNP", "NNP", "POS", "NNP", "NN",
     "VBD", "PRP", "VBD", "DT", "JJ", "NN", "VBG", "PRP$", "NN", "IN",
     "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP", "POS", "CD", "NNS",
     "." };

 String[] lemmas = lemmatizer.lemmatize(tokens, postags);]]>
 		</programlisting>
 				The lemmas array contains one lemma for each token in the
 				input array. The tag and lemma of a token are found at the same
 				index as the token in the input array.
 			</para>

 			<para>
 				The DictionaryLemmatizer is constructed
 				by passing the InputStream of a lemmatizer dictionary. Such a dictionary
 				is a text file in which each row contains a word, its postag and the
 				corresponding lemma, with the columns separated by a tab character.
 				<screen>
 		<![CDATA[
 show		NN	show
 showcase	NN	showcase
 showcases	NNS	showcase
 showdown	NN	showdown
 showdowns	NNS	showdown
 shower		NN	shower
 showers		NNS	shower
 showman		NN	showman
 showmanship	NN	showmanship
 showmen		NNS	showman
 showroom	NN	showroom
 showrooms	NNS	showroom
 shows		NNS	show
 shrapnel	NN	shrapnel
 		]]>
 		</screen>
 				Alternatively, if a (word, postag) pair can have multiple lemmas, the
 				lemmatizer dictionary consists of a text file containing, for
 				each row, a word, its postag and the corresponding lemmas separated by "#":
 				<screen>
 		<![CDATA[
 muestras	NN	muestra
 cantaba		V	cantar
 fue		V	ir#ser
 entramos	V	entrar
 		]]>
 					</screen>
 				First the dictionary must be loaded into memory from disk or another
 				source.
 				In the sample below it is loaded from disk.
 				<programlisting language="java">
 				<![CDATA[
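 // Open the dictionary file; the stream must stay open until the
 // DictionaryLemmatizer has been constructed, and should be closed afterwards.
 InputStream dictLemmatizer = new FileInputStream("english-lemmatizer.txt");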
 ]]>
 			</programlisting>
 				After the dictionary is loaded, the DictionaryLemmatizer can be
 				instantiated.
 				<programlisting language="java">
 			  <![CDATA[
 DictionaryLemmatizer lemmatizer = new DictionaryLemmatizer(dictLemmatizer);]]>
 			</programlisting>
 				The DictionaryLemmatizer instance is now ready. It expects two
 				String arrays as input,
 				one containing the tokens and another containing their respective postags.
 			</para>
 			<para>
 				The following code shows how to find a lemma using a
 				DictionaryLemmatizer.
 				<programlisting language="java">
 		  <![CDATA[
 String[] tokens = new String[]{"Most", "large", "cities", "in", "the", "US", "had",
                              "morning", "and", "afternoon", "newspapers", "."};
 // tagger is assumed to be an already initialized POS tagger
 String[] tags = tagger.tag(tokens);
 String[] lemmas = lemmatizer.lemmatize(tokens, tags);
 ]]>
 			</programlisting>
 				The tags array contains one part-of-speech tag for each token in the
 				input array, and the lemmas array contains one lemma for each token.
 				The tag and lemma of a token are found at the same index as the
 				token in the input array.
 			</para>
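 			<para>
 				To make the dictionary format described earlier in this section more
 				concrete, the following sketch reads tab-separated rows into a map and
 				splits multiple lemmas on "#". It is an illustration only and does not
 				show how DictionaryLemmatizer is implemented; the class name
 				LemmaDictionarySketch, the map key layout, and the tolerance for
 				repeated tabs are assumptions.
 				<programlisting language="java">
 		<![CDATA[
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.StringReader;
 import java.util.Arrays;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;

 // Hypothetical helper, not an OpenNLP class.
 public class LemmaDictionarySketch {

   // Reads rows of the form word<TAB>postag<TAB>lemmas into a map keyed by
   // "word<TAB>postag". The value is the list of lemmas for that pair.
   public static Map<String, List<String>> read(BufferedReader reader) throws IOException {
     Map<String, List<String>> entries = new HashMap<>();
     String line;
     while ((line = reader.readLine()) != null) {
       // Assumption: accept runs of tabs in case columns were padded for alignment.
       String[] columns = line.trim().split("\t+");
       if (columns.length != 3) {
         continue; // skip blank or malformed rows
       }
       // Several lemmas for one (word, postag) pair are separated by '#'.
       entries.put(columns[0] + "\t" + columns[1], Arrays.asList(columns[2].split("#")));
     }
     return entries;
   }

   public static void main(String[] args) throws IOException {
     String sample = "shows\tNNS\tshow\nfue\tV\tir#ser\n";
     Map<String, List<String>> dict = read(new BufferedReader(new StringReader(sample)));
     System.out.println(dict.get("fue\tV"));
   }
 }]]>
 			</programlisting>
 				Keying the map on the word together with its postag mirrors the
 				point made at the start of this chapter: the same word form can have
 				different lemmas depending on its part of speech.
 			</para>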
 		</section>
 		<section id="tools.lemmatizer.training">
 			<title>Lemmatizer Training</title>
 			<para>
 				The training data consist of three columns separated by spaces. Each
 				word is placed on a separate line, and there is an empty line after
 				each sentence. The first column contains the current word, the second
 				its part-of-speech tag and the third its lemma.
 			</para>
 			<para>
 				Sample sentence of the training data:
 				<screen>
 		<![CDATA[
 He        PRP  he
 reckons   VBZ  reckon
 the       DT   the
 current   JJ   current
 accounts  NNS  account
 deficit   NN   deficit
 will      MD   will
 narrow    VB   narrow
 to        TO   to
 only      RB   only
 #         #    #
 1.8       CD   1.8
 millions  CD   million
 in        IN   in
 September NNP  september
 .         .    O]]>
 		</screen>
 				The Universal Dependencies Treebank and the CoNLL 2009 datasets
 				distribute training data for many languages.
 			</para>
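 			<para>
 				The layout described above (three columns per line, an empty line
 				after each sentence) can be checked with a small reader before
 				training. The sketch below groups the lines of a training file into
 				sentences. The class name TrainingDataSketch is hypothetical, and
 				splitting on arbitrary whitespace is an assumption; the reader that
 				OpenNLP itself uses may be stricter about the column separator.
 				<programlisting language="java">
 		<![CDATA[
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 // Hypothetical helper, not an OpenNLP class.
 public class TrainingDataSketch {

   // Groups lines into sentences; each row is {word, postag, lemma}.
   public static List<List<String[]>> readSentences(List<String> lines) {
     List<List<String[]>> sentences = new ArrayList<>();
     List<String[]> current = new ArrayList<>();
     for (String line : lines) {
       if (line.trim().isEmpty()) {
         // An empty line ends the current sentence.
         if (!current.isEmpty()) {
           sentences.add(current);
           current = new ArrayList<>();
         }
         continue;
       }
       // Assumption: columns are separated by one or more whitespace characters.
       String[] columns = line.trim().split("\\s+");
       if (columns.length != 3) {
         throw new IllegalArgumentException("Expected 3 columns: " + line);
       }
       current.add(columns);
     }
     if (!current.isEmpty()) {
       sentences.add(current); // last sentence without a trailing empty line
     }
     return sentences;
   }

   public static void main(String[] args) {
     List<String> lines = Arrays.asList("He PRP he", "reckons VBZ reckon", "", "It PRP it");
     List<List<String[]>> sentences = readSentences(lines);
     System.out.println(sentences.size() + " sentences");
   }
 }]]>
 			</programlisting>
 			</para>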
 			<section id="tools.lemmatizer.training.tool">
 				<title>Training Tool</title>
 				<para>
 					OpenNLP has a command line tool which is used to train the models on
 					various corpora.
 				</para>
 				<para>
 					Usage of the tool:
 					<screen>
 		<![CDATA[
 $ opennlp LemmatizerTrainerME
 Usage: opennlp LemmatizerTrainerME [-factory factoryName] [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName]

 Arguments description:
 	-factory factoryName
 		A sub-class of LemmatizerFactory where to get implementation and resources.
 	-params paramsFile
 		training parameters file.
 	-lang language
 		language which is being processed.
 	-model modelFile
 		output model file.
 	-data sampleData
 		data to be used, usually a file name.
 	-encoding charsetName
 		encoding for reading and writing text, if absent the system default is used.
 		]]>
 		</screen>
 					Assume that the English lemmatizer model should be trained
 					from a file called en-lemmatizer.train, which is encoded as UTF-8.
 					The following command will train the
 					lemmatizer and write the model to en-lemmatizer.bin:
 					<screen>
 		<![CDATA[
 $ opennlp LemmatizerTrainerME -model en-lemmatizer.bin -params PerceptronTrainerParams.txt -lang en -data en-lemmatizer.train -encoding UTF-8]]>
 		</screen>
 				</para>
 			</section>
 			<section id="tools.lemmatizer.training.api">
 				<title>Training API</title>
 				<para>
 					The Lemmatizer offers an API to train a new lemmatizer model. First,
 					the training parameters need to be loaded, falling back to the
 					defaults if none are provided:
 					<programlisting language="java">
                     <![CDATA[
  TrainingParameters mlParams = CmdLineUtil.loadTrainingParameters(params.getParams(), false);
  if (mlParams == null) {
    mlParams = ModelUtil.createDefaultTrainingParameters();
  }]]>
                 </programlisting>
 					Then we read the training data:
 					<programlisting language="java">
                     <![CDATA[
 InputStreamFactory inputStreamFactory = null;
     try {
       inputStreamFactory = new MarkableFileInputStreamFactory(
           new File("en-lemmatizer.train"));
     } catch (FileNotFoundException e) {
       e.printStackTrace();
     }
     ObjectStream<String> lineStream = null;
     LemmaSampleStream lemmaStream = null;
     try {
       lineStream = new PlainTextByLineStream(
       (inputStreamFactory), "UTF-8");
       lemmaStream = new LemmaSampleStream(lineStream);
     } catch (IOException e) {
       CmdLineUtil.handleCreateObjectStreamError(e);
     }
 ]]>
                 </programlisting>
 					The following step proceeds to train the model:
 					<programlisting language="java">
     LemmatizerModel model;
     try {
       LemmatizerFactory lemmatizerFactory = LemmatizerFactory
           .create(params.getFactory());
       model = LemmatizerME.train(params.getLang(), lemmaStream, mlParams,
           lemmatizerFactory);
     } catch (IOException e) {
       throw new TerminateToolException(-1,
           "IO error while reading training data or indexing data: "
               + e.getMessage(),
           e);
     } finally {
       try {
         lemmaStream.close();
       } catch (IOException e) {
         // ignored: nothing to recover if closing the stream fails
       }
     }
 		</programlisting>
 				</para>
 			</section>
 			</section>
 			<section id="tools.lemmatizer.evaluation">
 				<title>Lemmatizer Evaluation</title>
 				<para>
 					The built-in evaluation can measure the accuracy of the statistical
 					lemmatizer on a test data set.
 				</para>
 				<para>
 					There is a command line tool to evaluate a given model on a test
 					data set.
 					The following command shows how the tool can be run:
 					<screen>
 				<![CDATA[
 $ opennlp LemmatizerEvaluator -model en-lemmatizer.bin -data en-lemmatizer.test -encoding utf-8]]>
 			 </screen>
 					This will display the resulting accuracy score, e.g.:
 					<screen>
 				<![CDATA[
 Loading model ... done
 Evaluating ... done

 Accuracy: 0.9659110277825124]]>
 			 </screen>
 				</para>
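 				<para>
 					As a rough illustration of what such a score means, the sketch
 					below computes the share of tokens whose predicted lemma equals the
 					reference lemma. This is an assumption about how the accuracy is
 					defined, made for illustration; the evaluator's exact computation is
 					not described in this chapter, and the class name
 					LemmaAccuracySketch is hypothetical.
 					<programlisting language="java">
 				<![CDATA[
 // Hypothetical helper, not an OpenNLP class.
 public class LemmaAccuracySketch {

   // Returns the fraction of positions where predicted and reference lemmas match.
   public static double accuracy(String[] predicted, String[] reference) {
     if (predicted.length != reference.length) {
       throw new IllegalArgumentException("Arrays must have the same length");
     }
     if (predicted.length == 0) {
       return 0.0; // no tokens to score
     }
     int correct = 0;
     for (int i = 0; i < predicted.length; i++) {
       if (predicted[i].equals(reference[i])) {
         correct++;
       }
     }
     return (double) correct / predicted.length;
   }

   public static void main(String[] args) {
     String[] predicted = { "say", "it", "sign", "a" };
     String[] reference = { "say", "it", "sign", "an" };
     System.out.println(accuracy(predicted, reference));
   }
 }]]>
 			 </programlisting>
 				</para>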
 		</section>
 </chapter>