commit | e17e199bc651322455f9ef186a3f7d1ac778d2d1 | |
---|---|---|
author | Mark Giaconia <markg@apache.org> | Sun Feb 16 21:51:41 2014 +0000 |
committer | Mark Giaconia <markg@apache.org> | Sun Feb 16 21:51:41 2014 +0000 |
tree | 14a9e59b9beab86ec6812458fcfe3dbe8cbecf9a | |
parent | 95a83e919fbb9e30679bb1f390c8b2bd5f05ed73 | |
OPENNLP-615 Greatly simplified fuzzy string match scoring by normalizing the Levenshtein output from Lucene, and fixed a bug in the filtering of hits that fall below the threshold. Refined the deduplication logic, and increased the default bag-of-words radius for doccat, which improved scores in testing.
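
The scoring idea in the message can be illustrated with a minimal, self-contained sketch. This is not the code from this commit: the class and method names (`FuzzyScoreSketch`, `normalizedScore`, `filterHits`) are hypothetical, the edit distance is computed locally rather than taken from Lucene, and normalizing by the longer string's length is an assumption about what "normalizing" means here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and normalization scheme are assumptions.
public class FuzzyScoreSketch {

    // Levenshtein edit distance via dynamic programming, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the raw distance to a score in [0, 1], where 1.0 is an exact match.
    // Assumption: normalize by the length of the longer string.
    static double normalizedScore(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Keeps only candidates whose score meets the threshold;
    // hits scoring below it are dropped.
    static List<String> filterHits(String query, List<String> candidates,
                                   double threshold) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (normalizedScore(query, candidate) >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

With a single normalized score, the threshold check reduces to one comparison per hit, which leaves less room for the kind of filtering bug the message mentions.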