OPENNLP-615
Greatly simplified fuzzy string match scoring by normalizing the Levenshtein output from Lucene, and fixed a bug in the filtering of hits below the threshold. Refined the deduping logic a bit, and made the default bag-of-words radius for doccat larger, which improved scores in testing.
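
For illustration only, here is a minimal sketch of the kind of scoring described above: an edit distance normalized into [0, 1] by the longer string's length, followed by a threshold filter. It is not the code in this change. The class and method names (`FuzzyScoreSketch`, `normalizedSimilarity`, `filterHits`) are hypothetical, a hand-rolled Levenshtein distance stands in for the Lucene output, and treating the threshold as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and the inclusive threshold are assumptions.
public class FuzzyScoreSketch {

    // Classic dynamic-programming Levenshtein distance, keeping two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the raw distance to a similarity in [0, 1]; 1.0 means identical.
    static double normalizedSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Keeps only candidates whose similarity meets the threshold (inclusive).
    static List<String> filterHits(String query, List<String> candidates,
                                   double threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (normalizedSimilarity(query, candidate) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = filterHits("london",
                Arrays.asList("londn", "paris"), 0.8);
        System.out.println(hits);
    }
}
```

Normalizing by the longer string's length keeps scores comparable across names of different lengths, so a single threshold can be applied to every hit.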
5 files changed