Updated file format information
diff --git a/5.0/file-formats.md b/5.0/file-formats.md
index 68af2b0..980da2e 100644
--- a/5.0/file-formats.md
+++ b/5.0/file-formats.md
@@ -7,53 +7,52 @@
## Translation models (grammars)
-Joshua supports three grammar file formats.
+Joshua supports two grammar file formats: a text-based format (also used by Hiero and
+[cdec](), and supported by [hierarchical Moses]()) and an efficient
+[packed representation](packing.html) developed by [Juri Ganitkevich](http://cs.jhu.edu/~juri).
-1. Thrax / Hiero
-1. SAMT [deprecated]
-1. packed
-
-The *Hiero* format is not restricted to Hiero grammars, but simply means *the format that David
-Chiang developed for Hiero*. It can support a much broader class of SCFGs containing an arbitrary
-set of nonterminals. Similarly, the *SAMT* format is not restricted to SAMT grammars but instead
-simply denotes *the grammar format that Zollmann and Venugopal developed for their decoder*. To
-remove this source of confusion, "thrax" is the preferred format designation, and is in fact the
-default.
-
-The packed grammar format is the efficient grammar representation developed by
-[Juri Ganitkevich](http://cs.jhu.edu/~juri) [is described in detail elsewhere](packing.html).
-
-Grammar rules in the Thrax format follow this format:
+Grammar rules follow this format:
[LHS] ||| SOURCE-SIDE ||| TARGET-SIDE ||| FEATURES
-Here are some two examples, one for a Hiero grammar, and the other for an SAMT grammar:
+The source and target sides contain a mixture of terminals and nonterminals. The nonterminals are
+linked across sides by indices. There is no limit on the number of paired nonterminals in a rule
+or on the nonterminal labels (Joshua supports decoding with SAMT and GHKM grammars).
- [X] ||| el chico [X] ||| the boy [X] ||| -3.14 0 2 17
- [S] ||| el chico [VP] ||| the boy [VP] ||| -3.14 0 2 17
+ [X] ||| el chico [X,1] ||| the boy [X,1] ||| -3.14 0 2 17
+ [S] ||| el chico [VP,1] ||| the boy [VP,1] ||| -3.14 0 2 17
+ [VP] ||| [NP,1] [IN,2] [VB,3] ||| [VB,3] [IN,2] [NP,1] ||| 0.0019026637 0.81322956
+
The feature values can have optional labels, e.g.:
- [X] ||| el chico [X] ||| the boy [X] ||| lexprob=-3.14 abstract=0 numwords=2 count=17
+ [X] ||| el chico [X,1] ||| the boy [X,1] ||| lexprob=-3.14 abstract=0 numwords=2 count=17
-These feature names are made up. For an actual list of feature names, please
-[see the Thrax documentation](thrax.html).
+One file common to decoding is the glue grammar, which for a Hiero grammar is defined as follows:
-The SAMT grammar format is deprecated and undocumented.
+ [GOAL] ||| <s> ||| <s> ||| 0
+ [GOAL] ||| [GOAL,1] [X,2] ||| [GOAL,1] [X,2] ||| -1
+ [GOAL] ||| [GOAL,1] </s> ||| [GOAL,1] </s> ||| 0
+
+Joshua's [pipeline](pipeline.html) supports extraction of Hiero and SAMT grammars via
+[Thrax](thrax.html), or of GHKM grammars using [Michel Galley](http://www-nlp.stanford.edu/~mgalley/)'s
+GHKM extractor (included) or Moses' GHKM extractor (if Moses is installed).
## Language Model
-Joshua has three language model implementations: [KenLM](), [BerkeleyLM](), and an (unrecommended)
-dummy Java implementation. All language model implementations support the standard ARPA format
-output by [SRILM](). In addition, KenLM and BerkeleyLM support compiled formats that can be loaded
-more quickly and efficiently.
+Joshua has two language model implementations: [KenLM](http://kheafield.com/code/kenlm/) and
+[BerkeleyLM](http://berkeleylm.googlecode.com). All language model implementations support the
+standard ARPA format output by [SRILM](http://www.speech.sri.com/projects/srilm/). In addition,
+KenLM and BerkeleyLM support compiled formats that can be loaded more quickly and efficiently. KenLM
+is written in C++ and is supported via a JNI bridge, while BerkeleyLM is written in Java. KenLM is
+the default because of its support for left-state minimization.
### Compiling for KenLM
To compile an ARPA language model for KenLM, use the (provided) `build_binary` command, located in
Joshua's `bin` directory:
- $JOSHUA/src/joshua/decoder/ff/lm/kenlm/build_binary lm.arpa lm.kenlm
+ $JOSHUA/bin/build_binary lm.arpa lm.kenlm
This command takes the `lm.arpa` file and produces the compiled version in `lm.kenlm`.
@@ -65,14 +64,10 @@
The `lm.berkeleylm` file can then be listed directly in the [Joshua configuration file](decoder.html).
-## Joshua configuration
+## Joshua configuration file
-See [the decoder page](decoder.html).
-
-## Pipeline configuration
-
-See [the pipeline page](pipeline.html).
+The [decoder page](decoder.html) documents decoder command-line and config file options.
## Thrax configuration
-See [the thrax page](thrax.html).
+See [the thrax page](thrax.html) for more information about the Thrax configuration file.