---
layout: default
title: Hidden Markov Models
---


Hidden Markov Models

Introduction and Usage

Hidden Markov Models are used in multiple areas of Machine Learning, such as speech recognition, handwriting recognition, and natural language processing.

Formal Definition

A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our case discrete) random variables O and Y, which change their state sequentially. The variable Y with states {y_1, ..., y_n} is called the “hidden variable”, since its state is not directly observable. The state of Y changes sequentially with a so-called, in our case first-order, Markov property: the state change probability of Y depends only on its current state and does not change over time. Formally we write: P(Y(t+1)=y_i|Y(0)...Y(t)) = P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)).

The variable O with states {o_1, ..., o_m} is called the “observable variable”, since its state can be directly observed. O does not have a Markov property, but its state probability depends only on the current state of Y.

Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing the initial hidden state probabilities, A is the n×n-dimensional “transition matrix” containing the transition probabilities such that A[i,j]=P(Y(t)=y_i|Y(t-1)=y_j), and B is the m×n-dimensional “emission matrix” containing the observation probabilities such that B[i,j]=P(O=o_i|Y=y_j).
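
To make this definition concrete, the following is a minimal sketch of an HMM as a plain Java container mirroring the tuple M=(n,m,P,A,B). The class and field names are illustrative only, not part of Mahout’s API, and the indexing follows the convention above (columns of A and B index the current hidden state).

    // Minimal HMM container mirroring the tuple M = (n, m, P, A, B).
    // Names are illustrative only, not Mahout's API.
    public class Hmm {
        final int n;        // number of hidden states
        final int m;        // number of observable states
        final double[] pi;  // P: initial hidden state probabilities, length n
        final double[][] a; // A: n x n, a[i][j] = P(Y(t)=y_i | Y(t-1)=y_j)
        final double[][] b; // B: m x n, b[i][j] = P(O=o_i | Y=y_j)

        Hmm(double[] pi, double[][] a, double[][] b) {
            this.n = pi.length;
            this.m = b.length;
            this.pi = pi;
            this.a = a;
            this.b = b;
        }
    }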

Problems

Rabiner [1] defined three main problems for HMMs:

  1. Evaluation: Given a sequence O of observations and a model M, what is the probability P(O|M) that the sequence O was generated by model M? The Evaluation problem can be efficiently solved using the Forward algorithm (sketched after this list).
  2. Decoding: Given a sequence O of observations and a model M, what is the most likely sequence Y*=argmax(Y) P(O|M,Y) of hidden states to have generated this sequence? The Decoding problem can be efficiently solved using the Viterbi algorithm (also sketched after this list).
  3. Learning: Given a sequence O of observations, what is the most likely model M*=argmax(M) P(O|M) to have generated this sequence? The Learning problem can be efficiently solved using the Baum-Welch algorithm.
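
To illustrate how the Evaluation and Decoding problems are solved, here is a sketch of the Forward and Viterbi recursions in plain Java, using the Hmm container from the sketch above and the indexing conventions of the formal definition. It shows the structure of the two algorithms, not Mahout's internal implementation.

    public class HmmAlgorithmsSketch {

        // Forward algorithm: returns P(O|M) for an observation sequence o
        // (each entry is an observable state index 0..m-1).
        static double evaluate(Hmm hmm, int[] o) {
            double[] alpha = new double[hmm.n];
            // Initialization: alpha_1(i) = P[i] * B[o_1, i]
            for (int i = 0; i < hmm.n; i++)
                alpha[i] = hmm.pi[i] * hmm.b[o[0]][i];
            // Recursion: alpha_{t+1}(i) = B[o_{t+1}, i] * sum_j A[i,j] * alpha_t(j)
            for (int t = 1; t < o.length; t++) {
                double[] next = new double[hmm.n];
                for (int i = 0; i < hmm.n; i++) {
                    double sum = 0.0;
                    for (int j = 0; j < hmm.n; j++)
                        sum += hmm.a[i][j] * alpha[j];
                    next[i] = hmm.b[o[t]][i] * sum;
                }
                alpha = next;
            }
            // Termination: P(O|M) = sum_i alpha_T(i)
            double p = 0.0;
            for (double v : alpha) p += v;
            return p;
        }

        // Viterbi algorithm: the same recursion with max in place of sum,
        // plus backpointers; returns the most likely hidden state sequence Y*.
        static int[] decode(Hmm hmm, int[] o) {
            int T = o.length;
            double[][] delta = new double[T][hmm.n];
            int[][] psi = new int[T][hmm.n];
            for (int i = 0; i < hmm.n; i++)
                delta[0][i] = hmm.pi[i] * hmm.b[o[0]][i];
            for (int t = 1; t < T; t++) {
                for (int i = 0; i < hmm.n; i++) {
                    int best = 0;
                    double bestVal = hmm.a[i][0] * delta[t - 1][0];
                    for (int j = 1; j < hmm.n; j++) {
                        double v = hmm.a[i][j] * delta[t - 1][j];
                        if (v > bestVal) { bestVal = v; best = j; }
                    }
                    delta[t][i] = hmm.b[o[t]][i] * bestVal;
                    psi[t][i] = best;
                }
            }
            // Backtrack from the most likely final state.
            int[] y = new int[T];
            for (int i = 1; i < hmm.n; i++)
                if (delta[T - 1][i] > delta[T - 1][y[T - 1]]) y[T - 1] = i;
            for (int t = T - 1; t > 0; t--)
                y[t - 1] = psi[t][y[t]];
            return y;
        }
    }

In practice, implementations work with scaled or log-space probabilities, since these raw products underflow quickly for long sequences (compare the tiny values in the matrices printed in the example below).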

Example

To train a Hidden Markov Model and use it to generate predictions, try a simple example like this:

Create an input file to train the model. Here we have a sequence drawn from the set of observable states 0, 1, 2, and 3, separated by space characters.

$ echo "0 1 2 2 2 1 1 0 0 3 3 3 2 1 2 1 1 1 1 2 2 2 0 0 0 0 0 0 2 2 2 0 0 0 0 0 0 2 2 2 3 3 3 3 3 3 2 3 2 3 2 3 2 1 3 0 0 0 1 0 1 0 2 1 2 1 2 1 2 3 3 3 3 2 2 3 2 1 1 0" > hmm-input

Now set MAHOUT_LOCAL to true so that the job uses your local file system, and run the baumwelch job to train the model. The -nh and -no flags give the number of hidden and observable states, -e the convergence threshold, and -m the maximum number of iterations.

$ export MAHOUT_LOCAL=true
$ $MAHOUT_HOME/bin/mahout baumwelch -i hmm-input -o hmm-model -nh 3 -no 4 -e .0001 -m 1000

Output like the following should appear in the console.

Initial probabilities: 
0 1 2 
1.0 0.0 3.5659361683006626E-251 
Transition matrix:
  0 1 2 
0 6.098919959130616E-5 0.9997275322964165 2.1147850399214744E-4 
1 7.404648706054873E-37 0.9086408633885092 0.09135913661149081 
2 0.2284374545687356 7.01786289571088E-11 0.7715625453610858 
Emission matrix: 
  0 1 2 3 
0 0.9999997858591223 2.0536163836449762E-39 2.1414087769942127E-7 1.052441093535389E-27 
1 7.495656581383351E-34 0.2241269055449904 0.4510889999455847 0.32478409450942497 
2 0.815051477991782 0.18494852200821799 8.465660634827592E-33 2.8603899591778015E-36 
14/03/22 09:52:21 INFO driver.MahoutDriver: Program took 180 ms (Minutes: 0.003)

The model trained on the input sequence is now stored in the file ‘hmm-model’, which we can use to generate a predicted sequence; the -l flag gives the length of the sequence to generate.

$ $MAHOUT_HOME/bin/mahout hmmpredict -m hmm-model -o hmm-predictions -l 10

To see the predictions:

$ cat hmm-predictions 
0 1 3 3 2 2 2 2 1 2
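
hmmpredict produces a fresh observation sequence from the trained model. As an illustration of what such a generation step involves (a sketch built on the Hmm container from above, not Mahout's actual implementation), one can draw the initial hidden state from P, then alternately emit an observation via B and transition via A:

    import java.util.Random;

    public class HmmSampler {
        // Draw an index from a discrete distribution given as a probability vector.
        static int sample(double[] dist, Random rng) {
            double r = rng.nextDouble(), cum = 0.0;
            for (int i = 0; i < dist.length; i++) {
                cum += dist[i];
                if (r < cum) return i;
            }
            return dist.length - 1; // guard against rounding error
        }

        // Generate `length` observations from the model, using the indexing
        // convention of the definition above (columns of A and B index the
        // current hidden state).
        static int[] generate(Hmm hmm, int length, Random rng) {
            int[] o = new int[length];
            int y = sample(hmm.pi, rng);            // initial hidden state ~ P
            for (int t = 0; t < length; t++) {
                double[] emit = new double[hmm.m];  // column y of B
                for (int k = 0; k < hmm.m; k++) emit[k] = hmm.b[k][y];
                o[t] = sample(emit, rng);           // observation ~ B[., y]
                double[] trans = new double[hmm.n]; // column y of A
                for (int i = 0; i < hmm.n; i++) trans[i] = hmm.a[i][y];
                y = sample(trans, rng);             // next hidden state ~ A[., y]
            }
            return o;
        }
    }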

Resources

[1] Lawrence R. Rabiner (February 1989). “A tutorial on Hidden Markov Models and selected applications in speech recognition”. Proceedings of the IEEE 77 (2): 257-286. doi:10.1109/5.18626.