The following convention is used for naming language model files:

<language>[_<territory>][.<codeset>].<format>

The <language> field is lowercase and taken from ISO 639-1. If
ISO 639-1 does not define a two-letter code for the language, the
three-letter code defined by ISO 639-2 is used.

The <territory> field is optional, uppercase, and taken from
ISO 3166-1. If ISO 3166-1 does not define a two-letter country code,
use two or three lowercase letters and, if possible, the country's
top-level domain.

The <codeset> field may be omitted only if a single codeset is present
for the language. It should be given as the lowercase form of the
codeset's preferred MIME name.

The <format> field is "lm" for the original language model format and
"ln" for the new language model format.
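
A minimal Python sketch of a filename check for this convention follows.
The names `parse_model_filename` and `ModelName` are hypothetical. The
accepted characters for <codeset> (lowercase letters, digits, "-" and "_")
are an assumption about what lowercase preferred MIME names look like. The
sketch checks only the shape of each field. It does not look codes up in
the ISO 639 or ISO 3166-1 lists, and it cannot enforce the rule that
<codeset> may be omitted only when a language has a single codeset,
because that rule depends on the whole set of model files.

```python
import re
from typing import NamedTuple, Optional

# Field shapes follow the naming convention; the <codeset> character set is an assumption.
_MODEL_NAME = re.compile(
    r"(?P<language>[a-z]{2,3})"                   # ISO 639-1 (2 letters) or ISO 639-2 (3 letters)
    r"(?:_(?P<territory>[A-Z]{2}|[a-z]{2,3}))?"   # ISO 3166-1 uppercase, or lowercase fallback
    r"(?:\.(?P<codeset>[a-z0-9][a-z0-9_-]*))?"    # lowercase preferred MIME name (assumed charset)
    r"\.(?P<format>lm|ln)"                        # "lm" = original format, "ln" = new format
)


class ModelName(NamedTuple):
    """Hypothetical container for the parsed filename fields."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    format: str


def parse_model_filename(name: str) -> Optional[ModelName]:
    """Return the parsed fields, or None if the name does not follow the convention."""
    m = _MODEL_NAME.fullmatch(name)
    if m is None:
        return None
    return ModelName(m["language"], m["territory"], m["codeset"], m["format"])


if __name__ == "__main__":
    # The first two names appear in this document; the rest are illustrative.
    for name in ("tr.iso-8859-9.ln", "ja.iso-2022-jp.ln", "en.lm", "pt_BR.lm", "EN.lm"):
        print(name, parse_model_filename(name))
```

Because <language> never contains a dot, the first dot ends the
language/territory part and the last dot introduces <format>, so a
codeset such as "iso-2022-jp" is recovered intact. A name such as
"EN.lm" is rejected because <language> must be lowercase.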

--

The original language models in "lm" format are part of the TextCat
program, located at http://odur.let.rug.nl/~vannoord/TextCat/, and were
authored by Gertjan van Noord <vannoord@let.rug.nl>.

tr.iso-8859-9.ln and ja.iso-2022-jp.ln were collated by Daniel Quinlan.