
 .. Licensed under the Apache License, Version 2.0 (the "License"); you may not
 .. use this file except in compliance with the License. You may obtain a copy of
 .. the License at
 ..
 ..   http://www.apache.org/licenses/LICENSE-2.0
 ..
 .. Unless required by applicable law or agreed to in writing, software
 .. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 .. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 .. License for the specific language governing permissions and limitations under
 .. the License.

 .. _views/collation:

 ===============
 Views Collation
 ===============

 Basics
 ======

 View functions specify a key and a value to be returned for each row. CouchDB
 collates the view rows by this key. In the following example, the ``LastName``
 property serves as the key, thus the result will be sorted by ``LastName``:

 .. code-block:: javascript

     function(doc) {
         if (doc.Type == "customer") {
             emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
         }
     }

 CouchDB allows arbitrary JSON structures to be used as keys. You can use JSON
 arrays as keys for fine-grained control over sorting and grouping.
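
 As a minimal sketch of an array key, the map function below emits
 ``[LastName, FirstName]`` so that rows would be sorted by last name and then by
 first name. The field names follow the example above; the ``emit`` stub and the
 sample documents are assumptions added only so the snippet runs outside
 CouchDB:

 .. code-block:: javascript

     // Stand-in for CouchDB's emit(); it collects rows in emit order.
     // CouchDB itself would store the rows sorted by key.
     const rows = [];
     function emit(key, value) {
         rows.push({key: key, value: value});
     }

     // Map function with an array key: last name first, then first name.
     function map(doc) {
         if (doc.Type == "customer") {
             emit([doc.LastName, doc.FirstName], null);
         }
     }

     // Hypothetical sample documents.
     [
         {_id: "c1", Type: "customer", LastName: "Smith", FirstName: "Bob"},
         {_id: "o1", Type: "order", customer_id: "c1"},
         {_id: "c2", Type: "customer", LastName: "Jones", FirstName: "Amy"}
     ].forEach(map);

     console.log(JSON.stringify(rows));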

 Examples
 ========

 The following clever trick would return both customer and order documents.
 The key is composed of a customer ``_id`` and a sorting token. Because the key
 for order documents begins with the ``_id`` of a customer document, all the
 orders will be sorted by customer. Because the sorting token for customers is
 lower than the token for orders, the customer document will come before the
 associated orders. The values 0 and 1 for the sorting token are arbitrary.

 .. code-block:: javascript

     function(doc) {
         if (doc.Type == "customer") {
             emit([doc._id, 0], null);
         } else if (doc.Type == "order") {
             emit([doc.customer_id, 1], null);
         }
     }

 To list a specific customer with ``_id`` XYZ, and all of that customer's orders,
 limit the startkey and endkey ranges to cover only documents for that customer's
 ``_id``::

     startkey=["XYZ"]&endkey=["XYZ", {}]

 It is not recommended to emit the document itself in the view. Instead, to
 include the bodies of the documents when requesting the view, request the view
 with ``?include_docs=true``.
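
 Because ``startkey`` and ``endkey`` are JSON values, a client normally has to
 serialize and URL-encode them. The sketch below shows one way to build such a
 query string; the helper name ``customerWithOrdersQuery`` and the database and
 view names are hypothetical:

 .. code-block:: javascript

     // Builds the query string for one customer and that customer's orders.
     // The function name is an assumption made for this example.
     function customerWithOrdersQuery(customerId) {
         const params = {
             startkey: JSON.stringify([customerId]),      // ["XYZ"]
             endkey: JSON.stringify([customerId, {}]),    // ["XYZ",{}]
             include_docs: "true"
         };
         return Object.keys(params)
             .map(function (name) {
                 return name + "=" + encodeURIComponent(params[name]);
             })
             .join("&");
     }

     // Assumed database, design document and view names.
     const viewPath = "/mydb/_design/customers/_view/with_orders";
     console.log(viewPath + "?" + customerWithOrdersQuery("XYZ"));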

 Sorting by Dates
 ================

 It may be convenient to store date attributes in a human-readable format
 (i.e. as a `string`), but still sort by date. This can be done by converting
 the date to a `number` in the :js:func:`emit` function. For example, given
 a document with a ``created_at`` attribute of
 ``'Wed Jul 23 16:29:21 +0100 2013'``, the following emit function would sort
 by date:

 .. code-block:: javascript

     // Date.parse() already returns the time in milliseconds as a number.
     emit(Date.parse(doc.created_at), null);

 Alternatively, if you use a date format which sorts lexicographically,
 such as ``"2013/06/09 13:52:11 +0000"``, you can just

 .. code-block:: javascript

     emit(doc.created_at, null);

 and avoid the conversion. As a bonus, this date format is compatible with the
 JavaScript date parser, so you can use ``new Date(doc.created_at)`` in your
 client side JavaScript to make date sorting easy in the browser.
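
 The difference between the two formats can be illustrated with a plain string
 sort. This is only a sketch: JavaScript's default sort compares code units
 rather than using ICU, but for strings in a fixed format made of zero-padded
 digits the two orderings should agree. The sample dates are made up:

 .. code-block:: javascript

     // Zero-padded "YYYY/MM/DD hh:mm:ss +0000": text order equals time order.
     const sortable = [
         "2013/06/09 13:52:11 +0000",
         "2012/12/31 23:59:59 +0000",
         "2013/06/09 08:00:00 +0000"
     ];
     const sortedSortable = sortable.slice().sort();
     console.log(sortedSortable);

     // Human-readable strings sort by weekday name first, not by time:
     // the August date ends up before the July date.
     const readable = [
         "Wed Jul 23 16:29:21 +0100 2013",
         "Fri Aug 02 10:00:00 +0100 2013"
     ];
     const sortedReadable = readable.slice().sort();
     console.log(sortedReadable);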

 String Ranges
 =============

 If you need start and end keys that encompass every string with a given prefix,
 it is better to use a high-value Unicode character than to use a ``'ZZZZ'``
 suffix.

 That is, rather than::

     startkey="abc"&endkey="abcZZZZZZZZZ"

 You should use::

     startkey="abc"&endkey="abc\ufff0"
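
 A small helper can build this kind of range for any prefix. The following is a
 sketch; ``prefixRange`` is a hypothetical name, and the code sends the
 ``U+FFF0`` character itself (URL-encoded as UTF-8) rather than the ``\ufff0``
 JSON escape, which should decode to the same key:

 .. code-block:: javascript

     // Returns JSON-encoded startkey/endkey covering every string key
     // that begins with `prefix`. Hypothetical helper, not part of CouchDB.
     function prefixRange(prefix) {
         return {
             startkey: JSON.stringify(prefix),
             endkey: JSON.stringify(prefix + "\ufff0")
         };
     }

     const range = prefixRange("abc");
     const query = "startkey=" + encodeURIComponent(range.startkey) +
         "&endkey=" + encodeURIComponent(range.endkey);
     console.log(query);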

 Collation Specification
 =======================

 This section is based on the view_collation function in `view_collation.js`_:

 .. _view_collation.js: https://github.com/apache/couchdb/blob/main/test/javascript/tests/view_collation.js

 .. code-block:: javascript

     // special values sort before all other types
     null
     false
     true

     // then numbers
     1
     2
     3.0
     4

     // then text, case sensitive
     "a"
     "A"
     "aa"
     "b"
     "B"
     "ba"
     "bb"

     // then arrays. compared element by element until different.
     // Longer arrays sort after their prefixes
     ["a"]
     ["b"]
     ["b","c"]
     ["b","c", "a"]
     ["b","d"]
     ["b","d", "e"]

     // then object, compares each key value in the list until different.
     // larger objects sort after their subset objects.
     {a:1}
     {a:2}
     {b:1}
     {b:2}
     {b:2, a:1} // Member order does matter for collation.
                // CouchDB preserves member order
                // but doesn't require that clients will.
                // this test might fail if used with a js engine
                // that doesn't preserve order
     {b:2, c:2}

 Comparison of strings is done using `ICU`_ which implements the
 `Unicode Collation Algorithm`_, giving a dictionary sorting of keys.
 This can give surprising results if you were expecting ASCII ordering.
 Note that:

 - All symbols sort before numbers and letters (even the "high" symbols like
   tilde, ``0x7e``)

 - Differing sequences of letters are compared without regard to case, so
   ``a < aa`` but also ``A < aa`` and ``a < AA``

 - Identical sequences of letters are compared with regard to case, with
   lowercase before uppercase, so ``a < A``

 .. _ICU: http://site.icu-project.org/
 .. _Unicode Collation Algorithm: https://www.unicode.org/reports/tr10/
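
 The ordering rules above can be approximated in a few lines of JavaScript.
 This is a simplified sketch, not CouchDB's implementation: the function names
 are made up for the example, and ``Intl.Collator`` stands in for CouchDB's ICU
 string comparison, so results for unusual strings may differ:

 .. code-block:: javascript

     // Rank of each JSON type in the order listed above.
     function typeRank(v) {
         if (v === null) return 0;
         if (v === false) return 1;
         if (v === true) return 2;
         if (typeof v === "number") return 3;
         if (typeof v === "string") return 4;
         if (Array.isArray(v)) return 5;
         return 6; // plain object
     }

     // Assumption: Intl.Collator approximates the ICU ordering CouchDB uses.
     const collator = new Intl.Collator("en");

     function collate(a, b) {
         const ra = typeRank(a), rb = typeRank(b);
         if (ra !== rb) return ra - rb;
         if (ra < 3) return 0;                         // same special value
         if (ra === 3) return a - b;                   // numbers
         if (ra === 4) return collator.compare(a, b);  // strings
         if (ra === 5) {
             // Arrays: element by element, then the shorter array first.
             for (let i = 0; i < Math.min(a.length, b.length); i++) {
                 const c = collate(a[i], b[i]);
                 if (c !== 0) return c;
             }
             return a.length - b.length;
         }
         // Objects: members in their stored order (key, then value),
         // then the object with fewer members first.
         const ea = Object.entries(a), eb = Object.entries(b);
         for (let i = 0; i < Math.min(ea.length, eb.length); i++) {
             const c = collate(ea[i][0], eb[i][0]) ||
                 collate(ea[i][1], eb[i][1]);
             if (c !== 0) return c;
         }
         return ea.length - eb.length;
     }

     const keys = [{b: 2, a: 1}, ["b", "c"], "aa", 3.0, true, "A",
                   null, ["b"], "a", {a: 1}, false, 1];
     const sortedKeys = keys.slice().sort(collate);
     console.log(JSON.stringify(sortedKeys));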

 You can demonstrate the collation sequence for 7-bit ASCII characters like this:

 .. code-block:: ruby

     require 'rubygems'
     require 'restclient'
     require 'json'

     DB="http://127.0.0.1:5984/collator"

     RestClient.delete DB rescue nil
     RestClient.put "#{DB}",""

     (32..126).each do |c|
         RestClient.put "#{DB}/#{c.to_s(16)}", {"x"=>c.chr}.to_json
     end

     RestClient.put "#{DB}/_design/test", <<EOS
     {
         "views":{
             "one":{
                 "map":"function (doc) { emit(doc.x,null); }"
             }
         }
     }
     EOS

     puts RestClient.get("#{DB}/_design/test/_view/one")

 This shows the collation sequence to be::

     ` ^ _ - , ; : ! ? . ' " ( ) [ ] { } @ * / \ & # % + < = > | ~ $ 0 1 2 3 4 5 6 7 8 9
     a A b B c C d D e E f F g G h H i I j J k K l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z

 Key ranges
 ----------

 Take special care when querying key ranges. For example, the query::

     startkey="Abc"&endkey="AbcZZZZ"

 will match "ABC" and "abc1", but not "abc". This is because UCA sorts as::

     abc < Abc < ABC < abc1 < AbcZZZZ

 For most applications, to avoid problems you should lowercase the `startkey`.
 The query::

     startkey="abc"&endkey="abcZZZZZZZZ"

 will match all keys starting with ``[aA][bB][cC]``.
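
 The effect of the two ranges can be reproduced outside CouchDB. The sketch
 below uses ``Intl.Collator`` as an approximation of the UCA ordering (an
 assumption; CouchDB calls ICU directly), and ``inRange`` is a hypothetical
 helper that mimics an inclusive key range:

 .. code-block:: javascript

     // Approximation of the UCA ordering used for view keys.
     const uca = new Intl.Collator("en");

     const sample = ["abc1", "ABC", "AbcZZZZ", "abc", "Abc"];
     const ordered = sample.slice().sort(uca.compare);
     console.log(ordered.join(" < "));

     // Keys that fall inside an inclusive [startkey, endkey] range.
     function inRange(startkey, endkey) {
         return ordered.filter(function (key) {
             return uca.compare(startkey, key) <= 0 &&
                 uca.compare(key, endkey) <= 0;
         });
     }

     console.log(inRange("Abc", "AbcZZZZ"));      // "abc" is left out
     console.log(inRange("abc", "abcZZZZZZZZ"));  // every sample key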

 Complex keys
 ------------

 The query ``startkey=["foo"]&endkey=["foo",{}]`` will match most array keys
 with "foo" in the first element, such as ``["foo","bar"]`` and
 ``["foo",["bar","baz"]]``. However, it will not match
 ``["foo",{"an":"object"}]``, because a non-empty object sorts after ``{}``.

 _all_docs
 =========

 The :ref:`_all_docs <api/db/all_docs>` view is a special case because it uses
 ASCII collation for doc ids, not UCA. The query::

     startkey="_design/"&endkey="_design/ZZZZZZZZ"

 will not find ``_design/abc`` because `'Z'` comes before `'a'` in the ASCII
 sequence. A better solution is::

     startkey="_design/"&endkey="_design0"
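
 JavaScript compares plain strings by code unit, which matches ASCII order for
 ids like these, so the two ranges can be checked with ordinary comparisons.
 This is a sketch with made-up document ids; ``idsBetween`` is a hypothetical
 helper:

 .. code-block:: javascript

     // Hypothetical design document ids.
     const ids = ["_design/abc", "_design/Orders", "_design/users"];

     // Ids inside an inclusive range, using ASCII (code unit) comparison.
     function idsBetween(startkey, endkey) {
         return ids.filter(function (id) {
             return startkey <= id && id <= endkey;
         });
     }

     // Lowercase names are missed because "Z" sorts before "a" in ASCII.
     console.log(idsBetween("_design/", "_design/ZZZZZZZZ"));

     // "0" is the character immediately after "/" in ASCII,
     // so "_design0" sits just past every "_design/..." id.
     console.log(idsBetween("_design/", "_design0"));
     console.log(String.fromCharCode("/".charCodeAt(0) + 1));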

 Raw collation
 =============

 To squeeze a little more performance out of views, you can specify
 ``"options":{"collation":"raw"}``  within the view definition for native Erlang
 collation, especially if you don't require UCA. This gives a different collation
 sequence:

 .. code-block:: javascript

     1
     false
     null
     true
     {"a":"a"}
     ["a"]
     "a"

 Beware that ``{}`` is no longer a suitable "high" key sentinel value. Use a
 string like ``"\ufff0"`` instead.
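
 For illustration, a design document with raw collation enabled on one view
 might look like the following sketch. Only the ``options`` member comes from
 the text above; the document id, view name and map function are assumptions
 borrowed from the earlier test script:

 .. code-block:: javascript

     // Hypothetical design document with raw collation on the "one" view.
     const designDoc = {
         _id: "_design/test",
         views: {
             one: {
                 map: "function (doc) { emit(doc.x, null); }",
                 options: {collation: "raw"}
             }
         }
     };

     // JSON body to PUT to the design document URL.
     console.log(JSON.stringify(designDoc, null, 4));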