Finite state transducers
This package implements Finite State Transducers with the following characteristics:
Fast and low memory overhead construction of the minimal FST (but inputs must be provided in sorted order)
Low object overhead and quick deserialization (byte[] representation)
Optional two-pass compression: FST.pack
Lookup-by-output when the outputs are in sorted order (e.g., ordinals or file pointers)
Pluggable Outputs representation
N-shortest-paths search by weight
Enumerators (IntsRef and BytesRef) that behave like {@link java.util.SortedMap SortedMap} iterators
FST Construction example:
// Input values (keys). These must be provided to Builder in Unicode sorted order! String inputValues[] = {"cat", "dog", "dogs"}; long outputValues[] = {5, 7, 12}; PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton(); Builder<Long> builder = new Builder<Long>(INPUT_TYPE.BYTE1, outputs); BytesRef scratchBytes = new BytesRef(); IntsRef scratchInts = new IntsRef(); for (int i = 0; i < inputValues.length; i++) { scratchBytes.copyChars(inputValues[i]); builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]); } FST<Long> fst = builder.finish();
Retrieval by key:
Long value = Util.get(fst, new BytesRef("dog")); System.out.println(value); // 7
Retrieval by value:
// Only works because outputs are also in sorted order IntsRef key = Util.getByOutput(fst, 12); System.out.println(Util.toBytesRef(key, scratchBytes).utf8ToString()); // dogs
Iterate over key-value pairs in sorted order:
// Like TermsEnum, this also supports seeking (advance) BytesRefFSTEnum<Long> iterator = new BytesRefFSTEnum<Long>(fst); while (iterator.next() != null) { InputOutput<Long> mapEntry = iterator.current(); System.out.println(mapEntry.input.utf8ToString()); System.out.println(mapEntry.output); }
N-shortest paths by weight:
Comparator<Long> comparator = new Comparator<Long>() { public int compare(Long left, Long right) { return left.compareTo(right); } }; Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>()); MinResult<Long> paths[] = Util.shortestPaths(fst, firstArc, comparator, 2); System.out.println(Util.toBytesRef(paths[0].input, scratchBytes).utf8ToString()); // cat System.out.println(paths[0].output); // 5 System.out.println(Util.toBytesRef(paths[1].input, scratchBytes).utf8ToString()); // dog System.out.println(paths[1].output); // 7