blob: df30078b7777d2ce7bd276c5584a942b4e1fc4a5 [file] [log] [blame]
.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
.. use this file except in compliance with the License. You may obtain a copy of
.. the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
.. License for the specific language governing permissions and limitations under
.. the License.
.. _views/json:
================
Joins With Views
================
Linked Documents
================
If your :ref:`map function <mapfun>` emits an object value which has
``{'_id': XXX}`` and you :ref:`query view <api/ddoc/view>` with
``include_docs=true`` parameter, then CouchDB will fetch the document with id
``XXX`` rather than the document which was processed to emit the key/value pair.
This means that if one document contains the ids of other documents, it can
cause those documents to be fetched in the view too, adjacent to the same key
if required.
For example, if you have the following hierarchically-linked documents:
.. code-block:: javascript
[
{ "_id": "11111" },
{ "_id": "22222", "ancestors": ["11111"], "value": "hello" },
{ "_id": "33333", "ancestors": ["22222","11111"], "value": "world" }
]
You can emit the values with the ancestor documents adjacent to them in the view
like this:
.. code-block:: javascript
function(doc) {
if (doc.value) {
emit([doc.value, 0], null);
if (doc.ancestors) {
for (var i in doc.ancestors) {
emit([doc.value, Number(i)+1], {_id: doc.ancestors[i]});
}
}
}
}
The result you get is:
.. code-block:: javascript
{
"total_rows": 5,
"offset": 0,
"rows": [
{
"id": "22222",
"key": [
"hello",
0
],
"value": null,
"doc": {
"_id": "22222",
"_rev": "1-0eee81fecb5aa4f51e285c621271ff02",
"ancestors": [
"11111"
],
"value": "hello"
}
},
{
"id": "22222",
"key": [
"hello",
1
],
"value": {
"_id": "11111"
},
"doc": {
"_id": "11111",
"_rev": "1-967a00dff5e02add41819138abb3284d"
}
},
{
"id": "33333",
"key": [
"world",
0
],
"value": null,
"doc": {
"_id": "33333",
"_rev": "1-11e42b44fdb3d3784602eca7c0332a43",
"ancestors": [
"22222",
"11111"
],
"value": "world"
}
},
{
"id": "33333",
"key": [
"world",
1
],
"value": {
"_id": "22222"
},
"doc": {
"_id": "22222",
"_rev": "1-0eee81fecb5aa4f51e285c621271ff02",
"ancestors": [
"11111"
],
"value": "hello"
}
},
{
"id": "33333",
"key": [
"world",
2
],
"value": {
"_id": "11111"
},
"doc": {
"_id": "11111",
"_rev": "1-967a00dff5e02add41819138abb3284d"
}
}
]
}
which makes it very cheap to fetch a document plus all its ancestors in one
query.
Note that the ``"id"`` in the row is still that of the originating document.
The only difference is that ``include_docs`` fetches a different doc.
The current revision of the document is resolved at query time, not at the time
the view is generated. This means that if a new revision of the linked document
is added later, it will appear in view queries even though the view itself
hasn't changed. To force a specific revision of a linked document to be used,
emit a ``"_rev"`` property as well as ``"_id"``.
Using View Collation
====================
:Author: Christopher Lenz
:Date: 2007-10-05
:Source: http://www.cmlenz.net/archives/2007/10/couchdb-joins
Just today, there was a discussion on IRC on how you'd go about modeling a
simple blogging system with post and comment entities, where any blog
post might have N comments. If you'd be using an SQL database, you'd obviously
have two tables with foreign keys and you'd be using joins. (At least until you
needed to add some `denormalization`_).
.. _denormalization: http://en.wikipedia.org/wiki/Denormalization
But what would the “obvious” approach in CouchDB look like?
Approach #1: Comments Inlined
-----------------------------
A simple approach would be to have one document per blog post, and store the
comments inside that document:
.. code-block:: javascript
{
"_id": "myslug",
"_rev": "123456",
"author": "john",
"title": "My blog post",
"content": "Bla bla bla …",
"comments": [
{"author": "jack", "content": "…"},
{"author": "jane", "content": "…"}
]
}
.. note::
Of course the model of an actual blogging system would be more extensive,
you'd have tags, timestamps, etc, etc. This is just to demonstrate the basics.
The obvious advantage of this approach is that the data that belongs together
is stored in one place. Delete the post, and you automatically delete the
corresponding comments, and so on.
You may be thinking that putting the comments inside the blog post document
would not allow us to query for the comments themselves, but you'd be wrong.
You could trivially write a CouchDB view that would return all comments across
all blog posts, keyed by author:
.. code-block:: javascript
function(doc) {
for (var i in doc.comments) {
emit(doc.comments[i].author, doc.comments[i].content);
}
}
Now you could list all comments by a particular user by invoking the view and
passing it a ``?key="username"`` query string parameter.
However, this approach has a drawback that can be quite significant for many
applications: To add a comment to a post, you need to:
- Fetch the blog post document
- Add the new comment to the JSON structure
- Send the updated document to the server
Now if you have multiple client processes adding comments at roughly the same
time, some of them will get a `HTTP 409 Conflict` error on step 3 (that's
optimistic concurrency in action). For some applications this makes sense, but
in many other apps, you'd want to append new related data regardless of whether
other data has been added in the meantime.
The only way to allow non-conflicting addition of related data is by putting
that related data into separate documents.
Approach #2: Comments Separate
------------------------------
Using this approach you'd have one document per blog post, and one document per
comment. The comment documents would have a backlink to the post they belong
to.
The blog post document would look similar to the above, minus the comments
property. Also, we'd now have a type property on all our documents so that we
can tell the difference between posts and comments:
.. code-block:: javascript
{
"_id": "myslug",
"_rev": "123456",
"type": "post",
"author": "john",
"title": "My blog post",
"content": "Bla bla bla …"
}
The comments themselves are stored in separate documents, which also have a type
property (this time with the value “comment”), and additionally feature a post
property containing the ID of the post document they belong to:
.. code-block:: javascript
{
"_id": "ABCDEF",
"_rev": "123456",
"type": "comment",
"post": "myslug",
"author": "jack",
"content": "…"
}
.. code-block:: javascript
{
"_id": "DEFABC",
"_rev": "123456",
"type": "comment",
"post": "myslug",
"author": "jane",
"content": "…"
}
To list all comments per blog post, you'd add a simple view, keyed by blog post
ID:
.. code-block:: javascript
function(doc) {
if (doc.type == "comment") {
emit(doc.post, {author: doc.author, content: doc.content});
}
}
And you'd invoke that view passing it a ``?key="post_id"`` query string
parameter.
Viewing all comments by author is just as easy as before:
.. code-block:: javascript
function(doc) {
if (doc.type == "comment") {
emit(doc.author, {post: doc.post, content: doc.content});
}
}
So this is better in some ways, but it also has a disadvantage.
Imagine you want to display a blog post with all the associated comments on the
same web page. With our first approach, we needed just a single request to the
CouchDB server, namely a ``GET`` request to the document. With this second
approach, we need two requests: a ``GET`` request to the post document, and a
``GET`` request to the view that returns all comments for the post.
That is okay, but not quite satisfactory. Just imagine you wanted to add
threaded comments: you'd now need an additional fetch per comment. What we'd
probably want then would be a way to join the blog post and the various comments
together to be able to retrieve them with a single HTTP request.
This was when Damien Katz, the author of CouchDB, chimed in to the discussion
on IRC to show us the way.
Optimization: Using the Power of View Collation
-----------------------------------------------
Obvious to Damien, but not at all obvious to the rest of us: it's fairly simple
to make a view that includes both the content of the blog post document, and
the content of all the comments associated with that post. The way you do that
is by using `complex keys`. Until now we've been using simple string values for
the view keys, but in fact they can be arbitrary JSON values, so let's make
some use of that:
.. code-block:: javascript
function(doc) {
if (doc.type == "post") {
emit([doc._id, 0], null);
} else if (doc.type == "comment") {
emit([doc.post, 1], null);
}
}
Okay, this may be confusing at first. Let's take a step back and look at what
views in CouchDB are really about.
CouchDB views are basically highly efficient on-disk dictionaries that map keys
to values, where the key is automatically indexed and can be used to filter
and/or sort the results you get back from your views. When you “invoke” a view,
you can say that you're only interested in a subset of the view rows by
specifying a ``?key=foo`` query string parameter. Or you can specify
``?startkey=foo`` and/or ``?endkey=bar`` query string parameters to fetch rows
over a range of keys. Finally, by adding ``?include_docs=true`` to the query,
the result will include the full body of each emitted document.
It's also important to note that keys are always used for collating (i.e.
sorting) the rows. CouchDB has well defined (but as of yet undocumented) rules
for comparing arbitrary JSON objects for collation. For example, the JSON value
``["foo", 2]`` is sorted after (considered “greater than”) the values
``["foo"]`` or ``["foo", 1, "bar"]``, but before e.g. ``["foo", 2, "bar"]``.
This feature enables a whole class of tricks that are rather non-obvious...
.. seealso::
:ref:`views/collation`
With that in mind, let's return to the view function above. First note that,
unlike the previous view functions we've used here, this view handles both
"post" and "comment" documents, and both of them end up as rows in the same
view. Also, the key in this view is not just a simple string, but an array.
The first element in that array is always the ID of the post, regardless of
whether we're processing an actual post document, or a comment associated with
a post. The second element is 0 for post documents, and 1 for comment documents.
Let's assume we have two blog posts in our database. Without limiting the view
results via ``key``, ``startkey``, or ``endkey``, we'd get back something like
the following:
.. code-block:: javascript
{
"total_rows": 5, "offset": 0, "rows": [{
"id": "myslug",
"key": ["myslug", 0],
"value": null
}, {
"id": "ABCDEF",
"key": ["myslug", 1],
"value": null
}, {
"id": "DEFABC",
"key": ["myslug", 1],
"value": null
}, {
"id": "other_slug",
"key": ["other_slug", 0],
"value": null
}, {
"id": "CDEFAB",
"key": ["other_slug", 1],
"value": null
},
]
}
.. note::
The ``...`` placeholders here would contain the complete JSON encoding of the
corresponding documents
Now, to get a specific blog post and all associated comments, we'd invoke that
view with the query string::
?startkey=["myslug"]&endkey=["myslug", 2]&include_docs=true
We'd get back the first three rows, those that belong to the ``myslug`` post,
but not the others, along with the full bodies of each document. Et voila, we
now have the data we need to display a post with all associated comments,
retrieved via a single ``GET`` request.
You may be asking what the 0 and 1 parts of the keys are for. They're simply
to ensure that the post document is always sorted before the the associated
comment documents. So when you get back the results from this view for a
specific post, you'll know that the first row contains the data for the blog
post itself, and the remaining rows contain the comment data.
One remaining problem with this model is that comments are not ordered, but
that's simply because we don't have date/time information associated with them.
If we had, we'd add the timestamp as third element of the key array, probably
as ISO date/time strings. Now we would continue using the query string
``?startkey=["myslug"]&endkey=["myslug", 2]&include_docs=true`` to fetch the
blog post and all associated comments, only now they'd be in chronological
order.