blob: ceff7adf4164f6a50af349e0a5bdc8be87002c90 [file] [log] [blame]
.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
.. use this file except in compliance with the License. You may obtain a copy of
.. the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
.. License for the specific language governing permissions and limitations under
.. the License.
.. _views/nosql:
=============================
View Cookbook for SQL Jockeys
=============================
This is a collection of some common SQL queries and how to get the same result
in CouchDB. The key to remember here is that CouchDB does not work like an SQL
database at all, and that best practices from the SQL world do not translate
well or at all to CouchDB. This document’s “cookbook” assumes that you are
familiar with the CouchDB basics such as creating and updating databases and
documents.
Using Views
===========
How you would do this in SQL::
CREATE TABLE
or::
ALTER TABLE
How you can do this in CouchDB?
Using views is a two-step process. First you define a view; then you query it.
This is analogous to defining a table structure (with indexes) using
``CREATE TABLE`` or ``ALTER TABLE`` and querying it using an SQL query.
Defining a View
---------------
Defining a view is done by creating a special document in a CouchDB database.
The only real specialness is the ``_id`` of the document, which starts with
``_design/`` — for example, _design/application. Other than that, it is just a
regular CouchDB document. To make sure CouchDB understands that you are defining
a view, you need to prepare the contents of that design document in a special
format. Here is an example:
.. code-block:: javascript
{
"_id": "_design/application",
"_rev": "1-C1687D17",
"views": {
"viewname": {
"map": "function(doc) { ... }",
"reduce": "function(keys, values) { ... }"
}
}
}
We are defining a view `viewname`. The definition of the view consists of two
functions: the map function and the reduce function. Specifying a reduce
function is optional. We’ll look at the nature of the functions later. Note that
`viewname` can be whatever you like: ``users``, ``by-name``, or ``by-date`` are
just some examples.
A single design document can also include multiple view definitions, each
identified by a unique name:
.. code-block:: javascript
{
"_id": "_design/application",
"_rev": "1-C1687D17",
"views": {
"viewname": {
"map": "function(doc) { ... }",
"reduce": "function(keys, values) { ... }"
},
"anotherview": {
"map": "function(doc) { ... }",
"reduce": "function(keys, values) { ... }"
}
}
}
Querying a View
---------------
The name of the design document and the name of the view are significant for
querying the view. To query the view `viewname`, you perform an HTTP ``GET``
request to the following URI::
/database/_design/application/_view/viewname
database is the name of the database you created your design document in. Next
up is the design document name, and then the view name prefixed with ``_view/``.
To query `anotherview`, replace `viewname` in that URI with `anotherview`.
If you want to query a view in a different design document, adjust the design
document name.
MapReduce Functions
-------------------
MapReduce is a concept that solves problems by applying a two-step process,
aptly named the map phase and the reduce phase. The map phase looks at all
documents in CouchDB separately one after the other and creates a `map result`.
The map result is an ordered list of key/value pairs. Both key and value can
be specified by the user writing the map function. A map function may call the
built-in ``emit(key, value)`` function 0 to N times per document, creating a row
in the map result per invocation.
CouchDB is smart enough to run a map function only once for every document, even
on subsequent queries on a view. Only changes to documents or new documents need
to be processed anew.
Map functions
-------------
Map functions run in isolation for every document. They can’t modify the
document, and they can’t talk to the outside world—they can’t have side effects.
This is required so that CouchDB can guarantee correct results without having
to recalculate a complete result when only one document gets changed.
The map result looks like this:
.. code-block:: javascript
{"total_rows":3,"offset":0,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":"Lena","value":5},
{"id":"1fb2449f9b9d4e466dbfa47ebe675063","key":"Lisa","value":4},
{"id":"8ede09f6f6aeb35d948485624b28f149","key":"Sarah","value":6}
]}
It is a list of rows sorted by the value of key. The id is added automatically
and refers back to the document that created this row. The value is the data
you’re looking for. For example purposes, it’s the girl’s age.
The map function that produces this result is:
.. code-block:: javascript
function(doc) {
if(doc.name && doc.age) {
emit(doc.name, doc.age);
}
}
It includes the if statement as a sanity check to ensure that we’re operating
on the right fields and calls the emit function with the name and age as the key
and value.
Look Up by Key
==============
How you would do this in SQL::
SELECT field FROM table WHERE value="searchterm"
How you can do this in CouchDB?
Use case: get a result (which can be a record or set of records) associated
with a key ("searchterm").
To look something up quickly, regardless of the storage mechanism, an index is
needed. An index is a data structure optimized for quick search and retrieval.
CouchDB’s map result is stored in such an index, which happens to be a B+ tree.
To look up a value by "searchterm", we need to put all values into the key of a
view. All we need is a simple map function:
.. code-block:: javascript
function(doc) {
if(doc.value) {
emit(doc.value, null);
}
}
This creates a list of documents that have a value field sorted by the data in
the value field. To find all the records that match "searchterm", we query the
view and specify the search term as a query parameter::
/database/_design/application/_view/viewname?key="searchterm"
Consider the documents from the previous section, and say we’re indexing on the
age field of the documents to find all the five-year-olds:
.. code-block:: javascript
function(doc) {
if(doc.age && doc.name) {
emit(doc.age, doc.name);
}
}
Query::
/ladies/_design/ladies/_view/age?key=5
Result:
.. code-block:: javascript
{"total_rows":3,"offset":1,"rows":[
{"id":"fc2636bf50556346f1ce46b4bc01fe30","key":5,"value":"Lena"}
]}
Easy.
Note that you have to emit a value. The view result includes the associated
document ID in every row. We can use it to look up more data from the document
itself. We can also use the ``?include_docs=true`` parameter to have CouchDB
fetch the individual documents for us.
Look Up by Prefix
=================
How you would do this in SQL::
SELECT field FROM table WHERE value LIKE "searchterm%"
How you can do this in CouchDB?
Use case: find all documents that have a field value that starts with
`searchterm`. For example, say you stored a MIME type (like `text/html` or
`image/jpg`) for each document and now you want to find all documents that are
images according to the MIME type.
The solution is very similar to the previous example: all we need is a map
function that is a little more clever than the first one. But first, an example
document:
.. code-block:: javascript
{
"_id": "Hugh Laurie",
"_rev": "1-9fded7deef52ac373119d05435581edf",
"mime-type": "image/jpg",
"description": "some dude"
}
The clue lies in extracting the prefix that we want to search for from our
document and putting it into our view index. We use a regular expression to
match our prefix:
.. code-block:: javascript
function(doc) {
if(doc["mime-type"]) {
// from the start (^) match everything that is not a slash ([^\/]+) until
// we find a slash (\/). Slashes needs to be escaped with a backslash (\/)
var prefix = doc["mime-type"].match(/^[^\/]+\//);
if(prefix) {
emit(prefix, null);
}
}
}
We can now query this view with our desired MIME type prefix and not only find
all images, but also text, video, and all other formats::
/files/_design/finder/_view/by-mime-type?key="image/"
Aggregate Functions
===================
How you would do this in SQL::
SELECT COUNT(field) FROM table
How you can do this in CouchDB?
Use case: calculate a derived value from your data.
We haven’t explained reduce functions yet. Reduce functions are similar to
aggregate functions in SQL. They compute a value over multiple documents.
To explain the mechanics of reduce functions, we’ll create one that doesn’t make
a whole lot of sense. But this example is easy to understand. We’ll explore more
useful reductions later.
Reduce functions operate on the output of the map function (also called the map
result or intermediate result). The reduce function’s job, unsurprisingly, is to
reduce the list that the map function produces.
Here’s what our summing reduce function looks like:
.. code-block:: javascript
function(keys, values) {
var sum = 0;
for(var idx in values) {
sum = sum + values[idx];
}
return sum;
}
Here’s an alternate, more idiomatic JavaScript version:
.. code-block:: javascript
function(keys, values) {
var sum = 0;
values.forEach(function(element) {
sum = sum + element;
});
return sum;
}
.. note::
Don't miss effective built-in :ref:`reduce functions <reducefun>` like
``_sum`` and ``_count``
This reduce function takes two arguments: a list of keys and a list of values.
For our summing purposes we can ignore the keys-list and consider only the value
list. We’re looping over the list and add each item to a running total that
we’re returning at the end of the function.
You’ll see one difference between the map and the reduce function. The map
function uses ``emit()`` to create its result, whereas the reduce function
returns a value.
For example, from a list of integer values that specify the age, calculate the
sum of all years of life for the news headline,
`“786 life years present at event.”` A little contrived, but very simple and
thus good for demonstration purposes. Consider the documents and the map view we
used earlier in this document.
The reduce function to calculate the total age of all girls is:
.. code-block:: javascript
function(keys, values) {
return sum(values);
}
Note that, instead of the two earlier versions, we use CouchDB’s predefined
:js:func:`sum` function. It does the same thing as the other two, but it is such
a common piece of code that CouchDB has it included.
The result for our reduce view now looks like this:
.. code-block:: javascript
{"rows":[
{"key":null,"value":15}
]}
The total sum of all age fields in all our documents is 15. Just what we wanted.
The key member of the result object is null, as we can’t know anymore which
documents took part in the creation of the reduced result. We’ll cover more
advanced reduce cases later on.
As a rule of thumb, the reduce function should reduce to a single scalar value.
That is, an integer; a string; or a small, fixed-size list or object that
includes an aggregated value (or values) from the values argument.
It should never just return values or similar. CouchDB will give you a warning
if you try to use reduce “the wrong way”:
.. code-block:: javascript
{
"error":"reduce_overflow_error",
"message":"Reduce output must shrink more rapidly: Current output: ..."
}
Get Unique Values
=================
How you would do this in SQL::
SELECT DISTINCT field FROM table
How you can do this in CouchDB?
Getting unique values is not as easy as adding a keyword. But a reduce view and
a special query parameter give us the same result. Let’s say you want a list of
tags that your users have tagged themselves with and no duplicates.
First, let’s look at the source documents. We punt on ``_id`` and ``_rev``
attributes here:
.. code-block:: javascript
{
"name":"Chris",
"tags":["mustache", "music", "couchdb"]
}
.. code-block:: javascript
{
"name":"Noah",
"tags":["hypertext", "philosophy", "couchdb"]
}
.. code-block:: javascript
{
"name":"Jan",
"tags":["drums", "bike", "couchdb"]
}
Next, we need a list of all tags. A map function will do the trick:
.. code-block:: javascript
function(doc) {
if(doc.name && doc.tags) {
doc.tags.forEach(function(tag) {
emit(tag, null);
});
}
}
The result will look like this:
.. code-block:: javascript
{"total_rows":9,"offset":0,"rows":[
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"bike","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"couchdb","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"couchdb","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"couchdb","value":null},
{"id":"3525ab874bc4965fa3cda7c549e92d30","key":"drums","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"hypertext","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"music","value":null},
{"id":"da5ea89448a4506925823f4d985aabbd","key":"mustache","value":null},
{"id":"53f82b1f0ff49a08ac79a9dff41d7860","key":"philosophy","value":null}
]}
As promised, these are all the tags, including duplicates. Since each document
gets run through the map function in isolation, it cannot know if the same key
has been emitted already. At this stage, we need to live with that. To achieve
uniqueness, we need a reduce:
.. code-block:: javascript
function(keys, values) {
return true;
}
This reduce doesn’t do anything, but it allows us to specify a special query
parameter when querying the view::
/dudes/_design/dude-data/_view/tags?group=true
CouchDB replies:
.. code-block:: javascript
{"rows":[
{"key":"bike","value":true},
{"key":"couchdb","value":true},
{"key":"drums","value":true},
{"key":"hypertext","value":true},
{"key":"music","value":true},
{"key":"mustache","value":true},
{"key":"philosophy","value":true}
]}
In this case, we can ignore the value part because it is always true, but the
result includes a list of all our tags and no duplicates!
With a small change we can put the reduce to good use, too. Let’s see how many
of the non-unique tags are there for each tag. To calculate the tag frequency,
we just use the summing up we already learned about. In the map function,
we emit a 1 instead of null:
.. code-block:: javascript
function(doc) {
if(doc.name && doc.tags) {
doc.tags.forEach(function(tag) {
emit(tag, 1);
});
}
}
In the reduce function, we return the sum of all values:
.. code-block:: javascript
function(keys, values) {
return sum(values);
}
Now, if we query the view with the ``?group=true`` parameter, we get back the
count for each tag:
.. code-block:: javascript
{"rows":[
{"key":"bike","value":1},
{"key":"couchdb","value":3},
{"key":"drums","value":1},
{"key":"hypertext","value":1},
{"key":"music","value":1},
{"key":"mustache","value":1},
{"key":"philosophy","value":1}
]}
Enforcing Uniqueness
====================
How you would do this in SQL::
UNIQUE KEY(column)
How you can do this in CouchDB?
Use case: your applications require that a certain value exists only once in a
database.
This is an easy one: within a CouchDB database, each document must have a
unique ``_id`` field. If you require unique values in a database, just assign
them to a document’s ``_id`` field and CouchDB will enforce uniqueness for you.
There’s one caveat, though: in the distributed case, when you are running more
than one CouchDB node that accepts write requests, uniqueness can be guaranteed
only per node or outside of CouchDB. CouchDB will allow two identical IDs to be
written to two different nodes. On replication, CouchDB will detect a conflict
and flag the document accordingly.