tree 254216d1014ef116e47c0b13fcac94cdd43afc1f
parent a297e2e9e2c006381f982e73a08fff29c4f8db8f
author brkolla <bkolla@cloudant.com> 1445559402 -0400
committer brkolla <bkolla@cloudant.com> 1445969894 -0400

Provide an ability to disable the indexing of array lengths.

Depending on the shape of the data, Cloudant Query can end up creating
many thousands of unique fields. This leads to JVM heap exhaustion,
because Lucene caches information about each field and is not designed
to handle many thousands of fields.

This change allows the user to disable indexing of the array length
fields, so that they don’t pay the performance cost if they don’t plan
to use those fields in their queries (the $size operator).

An array length field is a single extra field per unique path to an
array. We found this problem with a client whose documents used
arbitrary data as keys, which exploded the number of fields in Lucene.
The obvious fix was to index only the fields they wanted to query on.
Unfortunately, that did not prevent the automatically created array
length fields from being created. This patch is a big hammer for
removing the auto-generated array length fields, which may be generally
useful. We are also planning another patch that removes array length
fields for anything that is not specified in the index's field list.
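
To make the "one extra field per unique path to an array" point
concrete, here is a rough Python sketch (not the actual indexer code).
The ":length" suffix, the ".[]" marker for array elements, and the
sample "readings"/"sensor-N" documents are assumptions made up for
illustration only.

```python
def array_paths(value, prefix=""):
    """Yield the dotted path of every array found inside a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from array_paths(child, path)
    elif isinstance(value, list):
        yield prefix
        for child in value:
            # Elements are walked under the array's path plus a ".[]" marker.
            yield from array_paths(child, f"{prefix}.[]")


def length_fields(docs):
    """Return the set of hypothetical length-field names for some docs."""
    return {f"{path}:length" for doc in docs for path in array_paths(doc)}


# Documents that use arbitrary data as keys: each one adds a new array path.
docs = [{"readings": {f"sensor-{i}": [1, 2, 3]}} for i in range(5000)]
fields = length_fields(docs)
print(len(fields))  # 5000
```

Every distinct key produces its own length field here, even when no
query ever uses $size on it.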

Add index_array_lengths to the list of valid fields in the index
document, so that an index document containing this field passes
validation, and enforce that its value is a boolean.
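
The validation described above could look roughly like the following
Python sketch. It is only an illustration of the rule, not the code in
this patch: validate_index, the VALID_INDEX_KEYS list, and the default
of true when the option is absent are all assumptions.

```python
# Hypothetical list of accepted keys; the real list lives in the indexer.
VALID_INDEX_KEYS = {"fields", "selector", "index_array_lengths"}


def validate_index(index):
    """Check an index definition and return the index_array_lengths flag."""
    unknown = set(index) - VALID_INDEX_KEYS
    if unknown:
        raise ValueError(f"invalid index fields: {sorted(unknown)}")
    # Assumed default: array lengths stay indexed unless explicitly disabled.
    flag = index.get("index_array_lengths", True)
    if not isinstance(flag, bool):
        raise ValueError("index_array_lengths must be a boolean")
    return flag


# An index definition that opts out of array length fields.
index = {
    "fields": [{"name": "tags.[]", "type": "string"}],
    "index_array_lengths": False,
}
print(validate_index(index))  # False
```

A non-boolean value such as the string "false" is rejected rather than
coerced, so a typo cannot silently leave the length fields enabled.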
