Speed-up reduce view merging in fabric
Previously, `find_next_key/5` reduce collector [1] effectively ran
`hd(lists:sort(...))` to get the next key. It was spending time sorting, then
throwing everything away except the head. For the next row, it was sorting
again, throwing everything away, and taking only the head, and so on...
Needless to say, that's wasteful and we should simply find the minimum and
return that.
As an another optimization replace the dict with a map. Maps are more
efficient and generally faster.
Also, add extra unit tests asserting ICU collation behavior actually takes
place. In a few cases explicitly assert that the new behavior is the same as
the previous `hd(lists:sort(...))` behavior.
To benchmark used a simple reduce view with a built-in _count reducer [2]. For
100k docs, Q=8 this optimization leads to an ~2x speedup in view querying.
Theoretically it would be an O(n log n) -> O(n) speedup.
PR
```
./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
loaded 100000 docs in 4.4s
building index...
built in 4.5s
timing group=true query x7...
rows=100000
min=5644.0ms median=5786.6ms max=6108.0ms
```
MAIN
```
./reduce_group_bench.py --ndocs 100000 --q 8 --reps 7
db=reduce_group_bench q=8 n=1 ndocs=100000 group=true
loading...
loaded 100000 docs in 4.5s
building index...
built in 4.3s
timing group=true query x7...
rows=100000
min=10910.8ms median=11212.5ms max=11627.4ms
```
[1] Reduce collector buffers rows from workers and keys them by the view key.
For each row it emits it 1) finds the next key 2) takes all the rows that
collate equal to it. This optimization will only improve reduce views
[2] https://gist.github.com/nickva/cc0a3fb296be965e014eec8c484aa8a5
2 files changed