| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN" |
| "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[ |
| ]> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <chapter id="opennlp.ml"> |
| <title>Machine Learning</title> |
| <section id="opennlp.ml.maxent"> |
| <title>Maximum Entropy</title> |
| <para> |
| To explain what maximum entropy is, it will be simplest to quote from Manning and Schutze* (p. 589): |
| <quote> |
| Maximum entropy modeling is a framework for integrating information from many heterogeneous |
| information sources for classification. The data for a classification problem is described |
| as a (potentially large) number of features. These features can be quite complex and allow |
                the experimenter to make use of prior knowledge about what types of information are expected
| to be important for classification. Each feature corresponds to a constraint on the model. |
| We then compute the maximum entropy model, the model with the maximum entropy of all the models |
| that satisfy the constraints. This term may seem perverse, since we have spent most of the book |
| trying to minimize the (cross) entropy of models, but the idea is that we do not want to go beyond |
                the data. If we chose a model with less entropy, we would add 'information' constraints to the
| model that are not justified by the empirical evidence available to us. Choosing the maximum |
| entropy model is motivated by the desire to preserve as much uncertainty as possible. |
| </quote> |
| </para> |
| <para> |
            So that gives a rough idea of what the maximum entropy framework is: don't assume anything
            about your probability distribution other than what you have observed.
| </para> |
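        <para>
            More concretely, the maximum entropy models discussed here take the standard conditional
            log-linear form: each feature f_i is a function of an outcome and a context, each weight
            lambda_i is estimated during training, and Z(context) simply normalizes the scores into a
            probability distribution. The notation below is the textbook formulation rather than anything
            specific to this implementation.
            <programlisting>
p(outcome | context) = exp( sum_i lambda_i * f_i(outcome, context) ) / Z(context)

Z(context) = sum over all outcomes o' of exp( sum_i lambda_i * f_i(o', context) )
            </programlisting>
        </para>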
| <para> |
            On the engineering level, using maxent is an excellent way of creating programs which perform
            very difficult classification tasks very well. For example, precision and recall figures for
            programs using maxent models have reached or defined the state of the art on tasks like
            part-of-speech tagging, sentence detection, prepositional phrase attachment, and named entity
            recognition. An added benefit is that the person creating a maxent model only needs to inform
            the training procedure of the event space, and need not worry about independence between
            features.
| </para> |
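        <para>
            As a purely illustrative example (the feature names below are invented for this paragraph),
            consider sentence detection: for each candidate end-of-sentence character, an event pairs an
            outcome with whatever contextual predicates the experimenter considers relevant. The training
            procedure only ever sees such (outcome, context) pairs; it never needs to be told how the
            predicates relate to one another.
            <programlisting>
outcome=split     context: cur=.  prev=said  next=The    next-capitalized
outcome=no-split  context: cur=.  prev=Mr    next=Smith  next-capitalized
            </programlisting>
        </para>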
| <para> |
            While the authors of this implementation of maximum entropy are generally interested in using
| maxent models in natural language processing, the framework is certainly quite general and |
| useful for a much wider variety of fields. In fact, maximum entropy modeling was originally |
| developed for statistical physics. |
| </para> |
| <para> |
| For a very in-depth discussion of how maxent can be used in natural language processing, |
            try reading Adwait Ratnaparkhi's dissertation. Also, check out Berger, Della Pietra,
            and Della Pietra's paper "A Maximum Entropy Approach to Natural Language Processing", which
| provides an excellent introduction and discussion of the framework. |
| </para> |
| <para> |
            * Foundations of Statistical Natural Language Processing. Christopher D. Manning and
            Hinrich Schutze. Cambridge, Mass.: MIT Press, 1999.
| </para> |
| <section id="opennlp.ml.maxent.impl"> |
| <title>Implementation</title> |
| <para> |
| We have tried to make the opennlp.maxent implementation easy to use. To create a model, one |
| needs (of course) the training data, and then implementations of two interfaces in the |
| opennlp.maxent package, EventStream and ContextGenerator. These have fairly simple specifications, |
| and example implementations can be found in the OpenNLP Tools preprocessing components. |
| </para> |
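            <para>
                The sketch below shows roughly how the two interfaces fit together. It is illustrative
                only: the class names and the sample format are invented for this example, and the
                EventStream and ContextGenerator method names follow the older opennlp.maxent interfaces,
                which have shifted slightly between releases, so check them against the release you are
                using.
                <programlisting><![CDATA[
import java.util.Iterator;
import java.util.List;

import opennlp.maxent.ContextGenerator;
import opennlp.maxent.Event;
import opennlp.maxent.EventStream;

/** Turns a raw sample into the contextual predicates the model will see. */
class WhitespaceContextGenerator implements ContextGenerator {
  public String[] getContext(Object sample) {
    // here a "sample" is simply a whitespace-separated list of predicates
    return ((String) sample).trim().split("\\s+");
  }
}

/** Feeds (outcome, context) pairs to the trainer, one Event at a time. */
class SimpleEventStream implements EventStream {
  private final Iterator<String[]> samples; // each entry: { outcome, "pred1 pred2 ..." }
  private final ContextGenerator cg = new WhitespaceContextGenerator();

  SimpleEventStream(List<String[]> labeledSamples) {
    this.samples = labeledSamples.iterator();
  }

  public boolean hasNext() {
    return samples.hasNext();
  }

  public Event nextEvent() {
    String[] sample = samples.next();
    return new Event(sample[0], cg.getContext(sample[1]));
  }
}
]]></programlisting>
            </para>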
| <para> |
| We have also set in place some interfaces and code to make it easier to automate the training |
| and evaluation process (the Evalable interface and the TrainEval class). It is not necessary |
| to use this functionality, but if you do you'll find it much easier to see how well your models |
| are doing. The opennlp.grok.preprocess.namefind package is an example of a maximum entropy |
| component which uses this functionality. |
| </para> |
| <para> |
                We have used several techniques to reduce the size of the models when writing them to
                disk, which also means that reading in a model for use is much quicker than with less
                compact encodings of the model. This was especially important to us since we use many
                maxent models in the Grok library, and we wanted the start-up time and the physical size
                of the library to be as small as possible. As of version 1.2.0, maxent has an io package
                which greatly simplifies the process of loading and saving models in different formats.
| </para> |
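            <para>
                The following sketch ties the pieces together: train a model from an event stream,
                persist it with the io package, and read it back for use. It reuses the SimpleEventStream
                invented above, and the class names (GIS, SuffixSensitiveGISModelWriter,
                SuffixSensitiveGISModelReader) are taken from the opennlp.maxent and opennlp.maxent.io
                packages as they existed around these releases; verify them against the version you are
                working with.
                <programlisting><![CDATA[
import java.io.File;
import java.util.Arrays;
import java.util.List;

import opennlp.maxent.GIS;
import opennlp.maxent.GISModel;
import opennlp.maxent.MaxentModel;
import opennlp.maxent.io.SuffixSensitiveGISModelReader;
import opennlp.maxent.io.SuffixSensitiveGISModelWriter;

public class TrainAndPersist {
  public static void main(String[] args) throws Exception {
    // toy training data, one entry per event: { outcome, "pred1 pred2 ..." }
    List<String[]> data = Arrays.asList(
        new String[] { "split",    "cur=. prev=said next-capitalized" },
        new String[] { "no-split", "cur=. prev=Mr next-capitalized" });

    // 100 iterations of GIS, no predicate cutoff
    GISModel model = GIS.trainModel(new SimpleEventStream(data), 100, 0);

    // the suffix-sensitive writer chooses a plain or gzipped encoding
    // from the file name, which keeps the on-disk model compact
    File modelFile = new File("example.bin.gz");
    new SuffixSensitiveGISModelWriter(model, modelFile).persist();

    // read the model back in and evaluate a context
    MaxentModel loaded = new SuffixSensitiveGISModelReader(modelFile).getModel();
    double[] probs = loaded.eval(new String[] { "cur=.", "prev=said", "next-capitalized" });
    System.out.println(loaded.getBestOutcome(probs));
  }
}
]]></programlisting>
            </para>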
| </section> |
| </section> |
| </chapter> |