<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [ | |
<!ENTITY imgroot "./images/" > | |
<!ENTITY % xinclude SYSTEM "../../../uima-docbook-tool/xinclude.mod"> | |
%xinclude; | |
]> | |
<!-- | |
Licensed to the Apache Software Foundation (ASF) under one | |
or more contributor license agreements. See the NOTICE file | |
distributed with this work for additional information | |
regarding copyright ownership. The ASF licenses this file | |
to you under the Apache License, Version 2.0 (the | |
"License"); you may not use this file except in compliance | |
with the License. You may obtain a copy of the License at | |
http://www.apache.org/licenses/LICENSE-2.0 | |
Unless required by applicable law or agreed to in writing, | |
software distributed under the License is distributed on an | |
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | |
KIND, either express or implied. See the License for the | |
specific language governing permissions and limitations | |
under the License. | |
--> | |
<book lang="en"> | |
<title>Tagger Annotator Documentation</title> | |
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" | |
href="../../../SandboxDocs/src/docbook/book_info.xml" /> | |
<preface id="sandbox.tagger.introduction"> | |
<title>Introduction</title> | |
<para> | |
The Tagger Annotator is an Apache UIMA statistical analysis
engine that annotates tokens with their corresponding grammatical
types (parts of speech, or POS). The tagger is a
standard hidden Markov model (HMM) tagger.
</para> | |
</preface> | |
<chapter id="sandbox.tagger.prerequisites"> | |
<title>Prerequisites</title> | |
<para> | |
The UIMA HMM Tagger annotator assumes that sentences and | |
tokens have already been annotated in the CAS with Sentence | |
and Token annotations respectively (see e.g. | |
<code>Whitespace Tokenizer Annotator</code> | |
). | |
Further, the tagger requires a parameter file which
specifies a number of parameters necessary for the
tagging procedure (see
<xref | |
linkend="sandbox.tagger.annotatorDescriptor.configParam" /> | |
). | |
Two trained models for English and German are included in | |
the package (in the | |
<code>resources</code> | |
folder). Other models can be trained outside of the UIMA | |
framework (see | |
<xref linkend="sandbox.tagger.training" /> | |
). | |
</para> | |
</chapter> | |
<chapter id="sandbox.tagger.processingOverview"> | |
<title>Processing Overview</title> | |
<para> | |
The algorithm iterates over sentences and tokens in turn to
accumulate a list of words, which are then passed to the
HMM tagger's processing engine. For each
<code>Token</code> | |
, the | |
<code>posTag</code> | |
field is updated with the corresponding part of speech (e.g. | |
<code>posTag = "NN"</code> | |
where | |
<code>NN</code> | |
stands for | |
<emphasis>common noun</emphasis> | |
). | |
</para> | |
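<para>
The following is a minimal sketch of this loop (illustrative only, not
the actual implementation); it assumes JCas cover classes for the
<code>SentenceAnnotation</code> and <code>TokenAnnotation</code> types with a
<code>setPosTag</code> setter for the <code>posTag</code> feature, and a
hypothetical <code>tagger.viterbi(...)</code> call standing in for the HMM
processing engine:
<programlisting><![CDATA[// Sketch: collect the words of one sentence, tag them, and
// write the tags back to the Token annotations.
FSIterator<Annotation> tokenIter = jcas.getAnnotationIndex(TokenAnnotation.type)
    .subiterator(sentence); // tokens covered by the current sentence
List<TokenAnnotation> sentenceTokens = new ArrayList<>();
List<String> words = new ArrayList<>();
while (tokenIter.hasNext()) {
  TokenAnnotation token = (TokenAnnotation) tokenIter.next();
  sentenceTokens.add(token);
  words.add(token.getCoveredText());
}
List<String> tags = tagger.viterbi(words); // hypothetical call into the HMM engine
for (int i = 0; i < sentenceTokens.size(); i++) {
  sentenceTokens.get(i).setPosTag(tags.get(i)); // e.g. posTag = "NN"
}]]></programlisting>
</para>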
</chapter> | |
<chapter id="sandbox.tagger.annotatorDescriptor"> | |
<title>Annotator Descriptor</title> | |
<para> | |
Two descriptors are employed to configure the tagger's
functionality:
<itemizedlist> | |
<listitem> | |
<para> | |
<code>HmmTagger.xml</code> | |
- a primitive analysis engine descriptor, which
defines the tagger's basic functionality and
can be combined in an aggregate analysis engine
with an arbitrary tokenizer. This descriptor
cannot be used on its own, as the tagger alone
does not perform tokenization.
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>HmmTaggerTAE.xml</code> | |
- an aggregate analysis engine descriptor whose
only function is to combine the UIMA
<code>Whitespace Tokenizer Annotator</code>
with the
<code>HMM Tagger Annotator</code>
and is thereby a "ready to use" tagging
descriptor (a short usage sketch follows below).
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
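<para>
As a usage illustration, the aggregate descriptor can be instantiated
through the standard UIMA factory methods (a minimal sketch; the
descriptor path depends on your installation):
<programlisting><![CDATA[import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

// Parse the aggregate descriptor and produce an analysis engine.
XMLInputSource in = new XMLInputSource("desc/HmmTaggerTAE.xml");
ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);

// Tag a short text: tokenization and tagging both happen inside the aggregate.
CAS cas = ae.newCAS();
cas.setDocumentText("Jerry loves Wansley .");
ae.process(cas);]]></programlisting>
</para>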
<section id="sandbox.tagger.annotatorDescriptor.configParam"> | |
<title>Configuration Parameters</title> | |
<para> | |
The HMM tagger annotator ( | |
<code>HmmTagger.xml</code> | |
) requires the following configuration parameters: | |
</para> | |
<para> | |
<itemizedlist> | |
<listitem> | |
<para> | |
<code>NGRAM_SIZE</code> | |
- an Integer parameter defining whether a
bigram or a trigram model should be
used for tagging (the default is N=3).
<programlisting><emphasis><![CDATA[ <configurationParameters> | |
<configurationParameter> | |
<name>NGRAM_SIZE</name> | |
<type>Integer</type> | |
<multiValued>false</multiValued> | |
<mandatory>true</mandatory> | |
</configurationParameter> | |
</configurationParameters> | |
<configurationParameterSettings> | |
<nameValuePair> | |
<name>NGRAM_SIZE</name> | |
<value> | |
<integer>3</integer> | |
</value> | |
</nameValuePair> | |
</configurationParameterSettings>]]></emphasis></programlisting> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>ModelFile</code> | |
- the binary file containing the statistical model to be used for tagging; it is defined as an external resource:
<programlisting><emphasis><![CDATA[ | |
<externalResources> | |
<externalResource> | |
<name>ModelFile</name> | |
<description>HMM Tagger model file</description> | |
<fileResourceSpecifier> | |
<fileUrl>file:german/TuebaModel.dat</fileUrl> | |
</fileResourceSpecifier> | |
<implementationName>org.apache.uima.examples.tagger.ModelResource</implementationName> | |
</externalResource> | |
</externalResources>]]></emphasis></programlisting> | |
Thus, one can easily use a different model by changing the <code>fileUrl</code> entry,
here <code>file:german/TuebaModel.dat</code>.
(NB: <emphasis>new models must be located in the <code>resources</code> folder</emphasis>.)
After these two parameters have been set, the tagger is ready to use;
a sketch of how the annotator reads them follows below.
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
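<para>
In code, the annotator picks these two settings up through the standard
<code>UimaContext</code> calls. The following is a minimal sketch (not the
actual implementation), assuming the <code>IModelResource</code> interface
described in <xref linkend="sandbox.tagger.tagger" />:
<programlisting><![CDATA[// Sketch: reading the two settings inside initialize().
public void initialize(UimaContext aContext)
    throws ResourceInitializationException {
  super.initialize(aContext);
  // NGRAM_SIZE: 2 for a bigram model, 3 for a trigram model.
  Integer n = (Integer) aContext.getConfigParameterValue("NGRAM_SIZE");
  try {
    // ModelFile: the external resource holding the trained model.
    IModelResource model = (IModelResource) aContext.getResourceObject("ModelFile");
    // (in a real annotator, n and model would be stored in fields)
  } catch (ResourceAccessException e) {
    throw new ResourceInitializationException(e);
  }
}]]></programlisting>
</para>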
</section> | |
<section id="sandbox.tagger.annotatorDescriptor.capabilities"> | |
<title>Capabilities</title> | |
<para> | |
As the tagger inherits tokenization indexes from the CAS, | |
<code>uima.SentenceAnnotation</code> and <code>uima.TokenAnnotation</code> with their | |
<code>begin</code> and <code>end</code> features respectively have to be defined as | |
input capabilities in the HMM Tagger annotator descriptor. <code>Token</code> also
receives an additional <code>posTag</code> feature as an output capability.
</para> | |
<para> | |
<programlisting><emphasis><![CDATA[<capabilities> | |
<capability> | |
<inputs> | |
<type>org.apache.uima.TokenAnnotation</type> | |
<type allAnnotatorFeatures="true">org.apache.uima.SentenceAnnotation</type> | |
<feature>org.apache.uima.TokenAnnotation:end</feature> | |
<feature>org.apache.uima.TokenAnnotation:begin</feature> | |
</inputs> | |
<outputs> | |
<type>org.apache.uima.TokenAnnotation</type> | |
<feature>org.apache.uima.TokenAnnotation:posTag</feature> | |
<feature>org.apache.uima.TokenAnnotation:end</feature> | |
<feature>org.apache.uima.TokenAnnotation:begin</feature> | |
</outputs> | |
</capability> | |
</capabilities>]]></emphasis></programlisting> | |
</para> | |
</section> | |
</chapter> | |
<chapter id="sandbox.tagger.unittest"> | |
<title>Functionality Test</title> | |
<para> | |
<code>TaggerTest</code> is a JUnit test (available in the <code>test</code> folder)
that exercises the provided models for English and German,
as well as the basic functionality of the tagger. In order to check whether
the tagger's configuration is correct, just run this file as a JUnit test; you should get the following output:
<programlisting><![CDATA[Tesing German Model... | |
The used model is:resources/german/TuebaModel.dat | |
61646 distinct words in the model | |
Number of part-of-speech tags used: 54 | |
These are: [$(, $,, $., ADJA, ADJD, ADV, APPO, APPR, APPRART, APZR, ART, CARD, ... ] | |
Testing German trigram tagger.. | |
[Jerry, liebt, Wansley, .] | |
expected: [NE, VVFIN, NE, $.] | |
tagger output: [NE, VVFIN, NE, $.] | |
Very Good! | |
========================================================== | |
Tesing English Model... | |
The used model is:resources/english/BrownModel.dat | |
56012 distinct words in the model | |
Number of part-of-speech tags used: 473 | |
These are: [', '', (, ), *, ,, --, ., :, ``, abl, abn, abx, ap, ap$, at, be, bed, ...] | |
Testing English trigram tagger... | |
[Jerry, loves, Wansley, .] | |
expected: [np, vbz, np, .] | |
tagger output: [np, vbz, np, .] | |
Very Good!]]></programlisting> | |
</para> | |
</chapter> | |
<chapter id="sandbox.tagger.tagger"> | |
<title>Overview of the Tagger Package</title>
<para> | |
The package <code>org.apache.uima.examples.tagger</code> contains: | |
<itemizedlist> | |
<listitem> | |
<para> | |
two interfaces: | |
<orderedlist> | |
<listitem> | |
<para> | |
<code>IModelResource</code> | |
- the model resource interface
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>Tagger</code> | |
- a general tagger interface, provided in case one wants to integrate further tagger types.
</para> | |
</listitem> | |
</orderedlist> | |
</para> | |
</listitem> | |
<listitem> | |
<para> three classes: | |
<orderedlist> | |
<listitem> | |
<para> | |
<code>HMMTagger</code> | |
- a hidden Markov model tagger for UIMA that uses the Viterbi algorithm to compute the most
probable part-of-speech sequence for a given list of tokens.
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>Viterbi</code> | |
- the implementation of the Viterbi algorithm. This class makes up the core of the tagger
(an illustrative sketch of the algorithm follows this list).
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>ModelResource.java</code> | |
- the implementation of the <code>IModelResource</code> interface.
</para> | |
</listitem> | |
</orderedlist> | |
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
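<para>
To illustrate the core idea (this is not the code of the
<code>Viterbi</code> class itself), a bigram Viterbi over
integer-coded words and tags can be sketched as follows; the
probability tables are assumed to be given in log space:
<programlisting><![CDATA[// Illustrative bigram Viterbi (log probabilities), sketch only.
// pi[j]       = log P(tag j at sentence start)
// trans[i][j] = log P(tag j | tag i)
// emit[j][w]  = log P(word w | tag j)
static int[] viterbi(int[] words, double[] pi, double[][] trans, double[][] emit) {
  int T = words.length, N = pi.length;
  double[][] delta = new double[T][N]; // best log score ending in tag j at position t
  int[][] back = new int[T][N];        // backpointers for path recovery
  for (int j = 0; j < N; j++) {
    delta[0][j] = pi[j] + emit[j][words[0]];
  }
  for (int t = 1; t < T; t++) {
    for (int j = 0; j < N; j++) {
      double best = Double.NEGATIVE_INFINITY;
      int arg = 0;
      for (int i = 0; i < N; i++) {
        double score = delta[t - 1][i] + trans[i][j];
        if (score > best) { best = score; arg = i; }
      }
      delta[t][j] = best + emit[j][words[t]];
      back[t][j] = arg;
    }
  }
  // Pick the best final tag and follow the backpointers.
  int bestLast = 0;
  for (int j = 1; j < N; j++) {
    if (delta[T - 1][j] > delta[T - 1][bestLast]) bestLast = j;
  }
  int[] tags = new int[T];
  tags[T - 1] = bestLast;
  for (int t = T - 1; t > 0; t--) {
    tags[t - 1] = back[t][tags[t]];
  }
  return tags;
}]]></programlisting>
</para>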
</chapter> | |
<chapter id="sandbox.tagger.training"> | |
<title>Training Own Models</title> | |
<para> | |
Though we decided not to include training directly in the UIMA framework, one can easily
train other models on different pre-annotated corpora outside of UIMA using the <code>ModelGeneration</code> class,
available in the subpackage <code>org.apache.uima.examples.tagger.trainAndTest</code>.
This subpackage includes some further files needed for training your own models:
<itemizedlist> | |
<listitem> | |
<para> | |
<code>MappingInterface</code> | |
- defines a mapping for a tagset. For example, one may wish to map a more detailed tagset
to a less distinctive one (i.e. tell the program to tag all verbs as just <code>VERB</code>
instead of differentiating between <code>verb infinitive</code>, <code>verb imperative</code>, etc.).
Two sample implementations of <code>MappingInterface</code> are included,
namely <code>TagMappingBrown</code> (a mapping reducing the Brown corpus tagset from more than 400 tags to 93) and
<code>GrobMappingTueba</code> (a mapping reducing the German STTS tagset from 54 tags to 11 basic categories plus special symbols and punctuation).
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>ModelGeneration</code> | |
- trains an N-gram model for the tagger, iterating over a List of <code>Token</code>s,
and writes the resulting model to a binary file. At the moment,
only bi- and trigram models are supported; further N-grams can be easily integrated.
<code>ModelGeneration</code> is not concerned with
whether the training corpus is given as a single file or as a directory containing a number of files,
as this is a <code>CORPUS_READER</code> implementation issue. The two supplied readers handle a corpus
given as a single file (<code>TT_FormatReader</code>) or as a directory (<code>BrownReader</code>).
</para> | |
</listitem> | |
<listitem> | |
<para> | |
The <code>CorpusReader</code> interface
- should be used to implement corpus readers for your own corpora; the reader
takes charge of the preprocessing and transforms tokenized units
(usually <emphasis>words</emphasis>) into a List of <code>Token</code> objects
(a minimal reader sketch follows this list).
Two sample implementations of <code>CorpusReader</code> are included: | |
<orderedlist> | |
<listitem> | |
<para> | |
<code>BrownReader</code> | |
- for the Brown corpus from the nltk distribution (nltk.sourceforge.net) | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>TT_FormatReader</code> | |
- for the corpora in TreeTagger format, i.e. one word per line | |
with tags separated from the words by tabs. | |
</para> | |
</listitem> | |
</orderedlist> | |
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
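<para>
The following is a minimal sketch of a reader for the TreeTagger format
(one word and tag per line, tab-separated). The <code>Token(word, tag)</code>
constructor is an assumption made for illustration; the actual
<code>Token</code> class used by <code>ModelGeneration</code> may differ:
<programlisting><![CDATA[import java.io.*;
import java.util.*;

// Sketch of a TreeTagger-format corpus reader.
public List<Token> read(String fileName) throws IOException {
  List<Token> tokens = new ArrayList<>();
  try (BufferedReader in = new BufferedReader(
      new InputStreamReader(new FileInputStream(fileName), "UTF-8"))) {
    String line;
    while ((line = in.readLine()) != null) {
      String[] parts = line.split("\t");
      if (parts.length == 2) {
        tokens.add(new Token(parts[0], parts[1])); // word, tag (hypothetical constructor)
      }
    }
  }
  return tokens;
}]]></programlisting>
</para>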
<para> | |
To train a new model, one should adjust a number of parameters in the <code>tagger.properties</code> file,
which is in Java properties file format (see <xref linkend="properties.file"/>). After the parameters are set, you just need to run
<code>ModelGeneration.java</code>:
<programlisting id="properties.file" xreflabel="tagger.properties file"><emphasis><![CDATA[######## This is the default tagger.properties file | |
######## This file is used for training and testing only, | |
######## The configuration for tagging is directly tuned in the descriptor "HmmTagger.xml" | |
########################## BOTH FOR TRAINING AND EVALUATION ################################ | |
######## THESE ARE THE DEFAULT MODEL FILES FOR GERMAN AND ENGLISH | |
######## You can either uncomment one of them, if you want to replace given models with your own one, | |
#MODEL_FILE = resources/german/TuebaModel.dat | |
#MODEL_FILE = resources/english/BrownModel.dat | |
######## or specify a completely different name | |
MODEL_FILE = | |
######## If mapping of tags is desired, uncomment the following | |
#DO_MAPPING = true | |
####### EXAMPLES OF MAPPING CLASSES | |
## Basic mapping for the Brown corpus (nltk distribution) tagset: to get 93 tags out of 473 | |
#MAPPING = org.apache.uima.examples.tagger.TagMappingBrown | |
## Basic mapping for STTS tagset: from 54 tags onto the basic ca. 15 classes plus punctuation | |
#MAPPING = org.apache.uima.examples.tagger.GrobMappingTueba | |
## If you implement your own mapping, you should specify here in the same manner as above a java-path to the class | |
MAPPING = | |
####### FILE CONTAINING TRAINING CORPUS: | |
####### can be specified either as an absolute or as a relative path
####### e.g. FILE = ../../tueba_tigerFormat.txt or FILE = C:/Data/tueba.txt | |
FILE = | |
######## If corpus is in a different format and cannot be read with the provided READERS, | |
######## you should specify here a java-path to the class (s. examples below) | |
#CORPUS_READER = org.apache.uima.examples.tagger.trainAndTest.TT_FormatReader | |
#CORPUS_READER = org.apache.uima.examples.tagger.trainAndTest.BrownReader | |
CORPUS_READER = | |
################# ONLY FOR EVALUATION ############################### | |
######### GOLD STANDARD CORPUS FILE: | |
######### can be specified as an absolute or as a relative path | |
## e.g. GOLD_STANDARD = ../../tueba_tigerFormat.txt or GOLD_STANDARD = C:/Data/tueba.txt | |
GOLD_STANDARD = | |
######### Here we specify whether one intends to test a bi- or a trigram model (default is a trigram model) | |
N=3 | |
]]></emphasis> | |
</programlisting> | |
</para> | |
</chapter> | |
<chapter id="sandbox.tagger.evaluation"> | |
<title>Evaluation</title> | |
<para> | |
To evaluate performance when a "gold standard" corpus is available, one can use the following provided file:
<itemizedlist> | |
<listitem> | |
<para> | |
<code>TaggerEvaluation.java</code> | |
- can be used to evaluate the tagger and/or new models on a manually annotated corpus. | |
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
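<para>
At its core, such an evaluation is a token-level comparison of predicted
tags against the gold standard. The following is a minimal sketch of that
computation (a hypothetical helper, not the code of
<code>TaggerEvaluation.java</code>):
<programlisting><![CDATA[// Sketch: fraction of tokens whose predicted tag matches the gold tag.
static double accuracy(List<String> predicted, List<String> gold) {
  int correct = 0;
  for (int i = 0; i < gold.size(); i++) {
    if (predicted.get(i).equals(gold.get(i))) correct++;
  }
  return (double) correct / gold.size();
}]]></programlisting>
</para>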
<para> | |
<code>HMMTagger</code> was evaluated for English and German. For English, it was trained on 80% of the Brown corpus
(180,000 tokens) and tested on the remaining unseen 20%. The achieved accuracy was about 96%; the test corpus contained 4.5% unknown tokens.
</para> | |
<para> | |
For German, the tagger achieves between 95% and 96% accuracy when trained and tested on the same type of corpus, i.e. with 80% of the corpus used for training and 20% for testing.
Accuracy drops somewhat when tagging a different type of corpus than the one used for training, mostly due to the growing number of unknown words.
</para> | |
</chapter> | |
<appendix id="sandbox.tagger.theory"> | |
<title>Theory Behind the Tagger</title>
<para> | |
This appendix is just a sketch of the statistical model
underlying the tagger.
Hidden Markov Models (HMMs) are a mainstay of
applications employing statistical modeling,
such as speech recognition and production systems, signal
processing, and part-of-speech tagging.
A Hidden Markov Model is a probabilistic function of a
Markov process. A Markov process is a process that fulfills
the Markov assumptions:
<itemizedlist> | |
<listitem> | |
<para> | |
<code>limited horizon</code> | |
- Markov processes have no memory beyond the
condition of the current state.
Though we usually consider sequences of
variables that are not independent of each
other, it often suffices to know the value of
the current situation without going deep into
past happenings. As [
<biblioref linkend="schuetze" />
] put it, we do not really need to know how
many books were in the library last week or last
year in order to predict how many books there
will be tomorrow; it is often enough to know the
current situation. Thus, future states in a
Markov process are independent of the past; they
depend only on the present. Let
<inlineequation> | |
<mathphrase> | |
X = (X | |
<subscript>1</subscript> | |
, ..., X | |
<subscript>T</subscript> | |
) | |
</mathphrase> | |
</inlineequation> | |
be a sequence of random variables taking the | |
values from the finite state space | |
<inlineequation> | |
<mathphrase> | |
S = (s | |
<subscript>1</subscript> | |
, ..., s | |
<subscript>N</subscript> | |
) | |
</mathphrase> | |
</inlineequation> | |
, then the limited horizon property can be
formalized as:
<informalequation> | |
<mathphrase> | |
P(X | |
<subscript>t+1</subscript> | |
= s | |
<subscript>k</subscript> | |
|X | |
<subscript>1</subscript> | |
, ..., X | |
<subscript>t</subscript> | |
) = P(X | |
<subscript>t+1</subscript> | |
= s | |
<subscript>k</subscript> | |
|X | |
<subscript>t</subscript> | |
) | |
</mathphrase> | |
</informalequation> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>time invariance</code> | |
</para> | |
<para> | |
The probabilities do not change over time, i.e.
if we know that the probability of observing a
rainbow after the rain is equal to 90%, we know
that this holds today as well as
tomorrow.
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
<para> | |
If | |
<code>X</code> | |
conforms to these two properties, then it is said to be a | |
Markov chain. | |
One can describe a Markov chain by a transition matrix: | |
<informalequation> | |
<mathphrase> | |
A = a | |
<subscript>i,j</subscript> | |
= P(X | |
<subscript>t+1</subscript> | |
= s | |
<subscript>j</subscript> | |
|X | |
<subscript>t</subscript> | |
=s | |
<subscript>i</subscript> | |
) | |
</mathphrase> | |
</informalequation> | |
<informalequation>
<mathphrase>
where a
<subscript>i,j</subscript>
>= 0 (for all
<emphasis>i,j</emphasis>
) and, for each state
<emphasis>i</emphasis>
, the transition probabilities a
<subscript>i,j</subscript>
summed over
<emphasis>j</emphasis>
equal 1
</mathphrase>
</informalequation>
</para> | |
<para> | |
Markov models can be used whenever one needs to model the
probability of a linear sequence of variables.
One distinguishes Visible Markov Models (VMMs) from Hidden
Markov Models. The difference is that when we work with
"visible" events, we can directly estimate the corresponding
probabilities (which is the case when a training corpus is
available for training one's own HMM tagger models).
Finding a sequence of part-of-speech tags (i.e. the Viterbi
part of the tagger), in contrast, involves a hidden Markov
model, as the states (tags) are not directly observable.
</para> | |
<para> | |
<emphasis>The goal of an HMM-based tagger</emphasis>
is to find the part-of-speech tags (= hidden states) that
generate a sequence of words (= observable states). Most
known implementations of POS taggers view text as
being produced by a hidden Markov model, so that tagging can
be regarded as deciding which states the
system went through to generate a given text.
</para> | |
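<para>
In the bigram case this amounts to finding, for a word sequence
<inlineequation>
<mathphrase>
w
<subscript>1</subscript>
, ..., w
<subscript>T</subscript>
</mathphrase>
</inlineequation>
, the tag sequence maximizing the product, over all
<emphasis>t</emphasis>
, of
<inlineequation>
<mathphrase>
P(w
<subscript>t</subscript>
|t
<subscript>t</subscript>
) P(t
<subscript>t</subscript>
|t
<subscript>t-1</subscript>
)
</mathphrase>
</inlineequation>
; this is exactly the search that the Viterbi algorithm performs
efficiently.
</para>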
<para> | |
<emphasis>General Form of HMM</emphasis> | |
</para> | |
<para> | |
An HMM is a five-tuple:
<inlineequation>
<mathphrase>(S, K, &pgr;, A, B)</mathphrase>
</inlineequation> | |
<informalexample> | |
<para>where:</para> | |
<para> | |
<itemizedlist> | |
<listitem> | |
<para> | |
<code>S</code> | |
- the set of states (here: parts of | |
speech) | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>K</code> | |
- the set of observations (here: words) | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>&pgr;</code> | |
- initial state probabilities | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>A</code> | |
- state transition probabilities
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>B</code> | |
- symbol emission probabilities
</para> | |
</listitem> | |
</itemizedlist> | |
</para> | |
</informalexample> | |
</para> | |
<para> | |
Further, | |
<code> | |
X | |
<subscript>t</subscript> | |
</code> | |
(state sequence) and | |
<code> | |
O | |
<subscript>t</subscript> | |
</code> | |
(output sequence) are given. | |
The tagging procedure is then the following:
<informalexample> | |
<orderedlist> | |
<listitem> | |
<para> | |
<code>t := 1</code> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code> | |
Start in state s | |
<subscript>i</subscript> | |
with probability &pgr; | |
<subscript>i</subscript> | |
(i.e., X | |
<subscript>1</subscript> | |
= i) | |
</code> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>forever do:</code> | |
</para> | |
<itemizedlist> | |
<listitem> | |
<para> | |
<code> | |
Move from s | |
<subscript>i</subscript> | |
to s | |
<subscript>j</subscript> | |
with probability a | |
<subscript>i,j</subscript> | |
(i.e. X | |
<subscript>t+1</subscript> | |
= j) | |
</code> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code> | |
Emit observation symbol o | |
<subscript>t</subscript> | |
= k with probability b | |
<subscript>i,j,k</subscript> | |
</code> | |
</para> | |
</listitem> | |
<listitem> | |
<para> | |
<code>t := t+1</code>
</para> | |
</listitem> | |
</itemizedlist> | |
</listitem> | |
<listitem> | |
<para> | |
<code>end</code> | |
</para> | |
</listitem> | |
</orderedlist> | |
</informalexample> | |
</para> | |
<para> | |
Despite their limitations, HMMs are one of the most
successful techniques in natural language processing and are
widely used, especially in sequence tagging applications.
The best statistical taggers all perform at about the same | |
level of accuracy. | |
</para> | |
</appendix> | |
<glossary> | |
<title>Glossary</title> | |
<glossdiv> | |
<title>HMM</title> | |
<glossentry id="hmm"> | |
<glossterm>Hidden Markov Model</glossterm> | |
<acronym>HMM</acronym> | |
<glossdef> | |
<para>A probabilistic function of a Markov process; the statistical model underlying the tagger (see <xref linkend="sandbox.tagger.theory" />).</para>
</glossdef> | |
</glossentry> | |
</glossdiv> | |
<glossdiv> | |
<title>POS</title> | |
<glossentry id="pos"> | |
<glossterm>Part of Speech</glossterm> | |
<acronym>POS</acronym> | |
<glossdef> | |
<para>The grammatical type of a token, e.g. <code>NN</code> (common noun); written to the <code>posTag</code> feature of <code>Token</code> annotations.</para>
</glossdef> | |
</glossentry> | |
</glossdiv> | |
</glossary> | |
<bibliography> | |
<biblioentry xreflabel="ManningSchuetze99" id="schuetze"> | |
<authorgroup> | |
<author> | |
<firstname>Christopher</firstname> | |
<surname>Manning</surname> | |
</author> | |
<author> | |
<firstname>Hinrich</firstname> | |
<surname>Schuetze</surname> | |
</author> | |
</authorgroup> | |
<title> | |
Foundations of Statistical Natural Language Processing | |
</title> | |
<copyright> | |
<year>1999</year> | |
</copyright> | |
<publisher> | |
<publishername>MIT Press</publishername> | |
</publisher> | |
</biblioentry> | |
</bibliography> | |
</book> |