
nouveau

Lucene 9 + DropWizard = Maybe a good search option for Apache CouchDB?

Nouveau is an experimental search extension for CouchDB 3.x.

What works?

  • you can define a default analyzer and different analyzers by field name.
  • sorting on text and numbers
  • classic lucene query syntax
  • count and range facets
  • cursor support for paginating efficiently through large result sets
  • indexes automatically deleted if database is deleted (as long as nouveau is running!)
  • integration with ken
  • update=false
  • support for stale=ok
  • integration with mango

What doesn't work yet?

  • include_docs=true
  • No support for results grouping
  • No support for configuring stop words for analyzers

Why is this better than dreyfus/clouseau?

  • No scalang (or Scala!)
  • Supports any version of Java that Lucene 9 supports
  • memory-mapped I/O for performance
  • direct I/O used for segment merging (so we don't evict useful data from disk cache)
  • It's new and shiny.

Erlang side

You'll need to run a fork of couchdb: https://github.com/rnewson/couchdb-nouveau

Getting started

Build Nouveau with;

mvn package

Run Nouveau with;

java -jar target/nouveau-*.jar server nouveau.yaml

Now run CouchDB using the ‘nouveau’ branch of my fork at https://github.com/rnewson/couchdb-nouveau;

make && dev/run --admin=foo:bar

Make a database with some data and an index definition;

#!/bin/bash
# bash is required: the script uses $RANDOM and {1..100} brace expansion.

URL="http://foo:bar@127.0.0.1:15984/foo"

curl -X DELETE "$URL"
curl -X PUT "$URL?n=3&q=16"

curl -X PUT "$URL/_design/foo" -d '{"nouveau":{"bar":{"default_analyzer":"standard", "field_analyzers":{"foo":"english"}, "index":"function(doc) { index(\"foo\", \"bar\", \"string\"); index(\"foo\", \"bar\", \"stored_string\"); }"}}}'

# curl "$URL/_index" -Hcontent-type:application/json -d '{"type":"nouveau", "index": {"fields": [{"name": "bar", "type":"number"}]}}'

for I in {1..100}; do
    DOCID=$((RANDOM % 100000))
    BAR=$((RANDOM % 100000))
    curl -X PUT "$URL/doc$DOCID" -d "{\"bar\": $BAR}"
done
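
The design document in the script carries the index function as a JSON string, so every quote inside it has to be escaped by hand. As a minimal sketch (not part of Nouveau itself), the same body can be generated in JavaScript by writing the function normally and serialising it. The names `indexFn` and `buildDesignDoc` are made up for this illustration; the document layout is copied from the curl example.

```javascript
// Index function written as ordinary JavaScript. `index` is supplied by
// Nouveau at indexing time, so it is deliberately not defined here; this
// function is only serialised, never called.
const indexFn = function (doc) {
  index("foo", "bar", "string");
  index("foo", "bar", "stored_string");
};

// Build the design document body. The layout ("nouveau" -> index name ->
// default_analyzer / field_analyzers / index) mirrors the curl example.
function buildDesignDoc(fn) {
  return {
    nouveau: {
      bar: {
        default_analyzer: "standard",
        field_analyzers: { foo: "english" },
        // Function.prototype.toString returns the function's source text.
        index: fn.toString(),
      },
    },
  };
}

// JSON.stringify takes care of escaping the quotes in the function source.
const designDocBody = JSON.stringify(buildDesignDoc(indexFn));
console.log(designDocBody);
```

The printed string can be used as the `-d` payload of the `PUT "$URL/_design/foo"` request.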

In order not to collide with dreyfus I've hooked Nouveau in with some uglier paths for now;

curl 'foo:bar@localhost:15984/foo/_design/foo/_nouveau/bar?q=*:*'

This will cause Nouveau to build an index for each copy (N) of each shard range (Q), then perform the search and return the results. Much of the query syntax works, as does sorting on strings and numbers (`sort=["fieldnamehere<string>"]` or `sort=["fieldnamehere<number>"]`, defaulting to number).
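
A `sort` value contains brackets, quotes and angle brackets, all of which shells and URL parsers treat specially. The sketch below builds a percent-encoded search URL in JavaScript. The helper name `searchUrl` is hypothetical, the host, port and path come from the curl example above, and sorting on `foo` is an assumption: whether a given field is sortable depends on how it was indexed.

```javascript
// Build a Nouveau search URL from a base path and a map of parameters.
function searchUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    // Strings are sent as-is; arrays and objects are sent as JSON text.
    const text = typeof value === "string" ? value : JSON.stringify(value);
    // searchParams percent-encodes reserved characters when serialised.
    url.searchParams.set(key, text);
  }
  return url.toString();
}

const base = "http://localhost:15984/foo/_design/foo/_nouveau/bar";

// Match everything, sorted by the string field "foo" (assumed sortable).
const sortedQuery = searchUrl(base, { q: "*:*", sort: ["foo<string>"] });
console.log(sortedQuery);
```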

Facet support

Counts of string fields and Ranges for numeric fields;

curl 'foo:bar@localhost:15984/foo/_design/foo/_nouveau/bar?q=*:*&limit=1&ranges={"bar":[{"label":"cheap","min":0,"max":100}]}&counts=["foo"]' -g
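
When more than one range bucket is needed, writing the `ranges` JSON by hand gets tedious. Here is a small sketch that derives buckets from a list of boundaries and encodes them into the query string. It reuses the `{label, min, max}` shape from the request above; the helper `rangeBuckets`, the extra "moderate" label and the boundary values are invented for illustration. This README does not say whether `min` and `max` are inclusive, so adjacent buckets that share a boundary may overlap.

```javascript
// Turn sorted boundaries into range facet definitions, one bucket per
// adjacent pair of boundaries.
function rangeBuckets(boundaries, labels) {
  const buckets = [];
  for (let i = 0; i + 1 < boundaries.length; i++) {
    buckets.push({ label: labels[i], min: boundaries[i], max: boundaries[i + 1] });
  }
  return buckets;
}

// Range facets on the numeric field "bar", count facets on the string field "foo".
const ranges = { bar: rangeBuckets([0, 100, 1000], ["cheap", "moderate"]) };
const counts = ["foo"];

// URLSearchParams percent-encodes the JSON, so curl's -g flag is not needed
// when the resulting URL is used.
const query = new URLSearchParams({
  q: "*:*",
  limit: "1",
  ranges: JSON.stringify(ranges),
  counts: JSON.stringify(counts),
});
const facetUrl = "http://localhost:15984/foo/_design/foo/_nouveau/bar?" + query.toString();
console.log(facetUrl);
```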

Index function

To ease migration, nouveau index functions can use the ‘index’ function exactly as it exists in dreyfus, but the function also supports a new style.

| Arguments | Effect |
| --- | --- |
| `index("foo", "bar");` | adds a TextField. |
| `index("foo", "bar", {"store":true});` | adds a TextField and a StoredField. |
| `index("foo", "bar", {"store":true, "facet":true});` | adds a TextField, a StoredField and a SortedSetDocValuesField. |
| `index("foo", "bar", "text");` | adds a TextField. |
| `index("foo", "bar", "string");` | adds a StringField. |
| `index("foo", "bar", "stored_string");` | adds a StoredField. |
| `index("foo", "bar", "sorted_set_dv");` | adds a SortedSetDocValuesField. |
| `index("foo", "bar", "string", true);` | adds a TextField with Store.YES |
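
To show how the two styles might combine in practice, here is a sketch of an index function that reads values from the document instead of indexing literals. The document shape (`name`, `category`) is hypothetical, and the `index` defined here is only a recording stand-in so the sketch runs under Node; inside Nouveau the real `index` is provided for you. Pairing a `string` field with a `sorted_set_dv` field for count facets is an assumption based on the table above.

```javascript
// Stand-in for the `index` function Nouveau provides. It only records
// its arguments so the calls can be inspected.
const calls = [];
function index(name, value, typeOrOptions) {
  calls.push([name, value, typeOrOptions]);
}

// Hypothetical index function for documents shaped like
// {"name": "...", "category": "..."}.
const indexFn = function (doc) {
  if (typeof doc.name === "string") {
    // Options-object style: per the table, a TextField and a StoredField.
    index("name", doc.name, { store: true });
  }
  if (typeof doc.category === "string") {
    // Type-name style: a StringField plus a SortedSetDocValuesField
    // (assumed here to be what count facets need).
    index("category", doc.category, "string");
    index("category", doc.category, "sorted_set_dv");
  }
};

// Try it on two sample documents; the second has no category.
indexFn({ name: "red bicycle", category: "sports" });
indexFn({ name: "untitled" });
console.log(calls);
```

In a design document only the function itself is stored, as a string under the `index` key; the stub and the sample calls are there for local experimentation.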