.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
.. use this file except in compliance with the License. You may obtain a copy of
.. the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
.. License for the specific language governing permissions and limitations under
.. the License.

.. _cluster/purging:

===============
Clustered Purge
===============

The primary purpose of clustered purge is to clean databases that have multiple
deleted tombstones or single documents that contain large numbers of conflicts.
But it can also be used to purge any document (deleted or non-deleted) with any
number of revisions.
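
For example, an interactive purge request to the ``_purge`` endpoint names
each document and the exact revisions to remove (the database name, docid, and
revision below are illustrative):

.. code-block:: text

    POST /db/_purge HTTP/1.1
    Content-Type: application/json

    {
        "c6114c65e295552ab1019e2b046b10e": [
            "3-b06fcd1c1c9e0ec7c480ee8aa467bf3b"
        ]
    }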

Clustered purge is designed to maintain eventual consistency and prevent
unnecessary invalidation of secondary indexes. To this end, every database
keeps track of a certain number of historical purges requested in the
database, as well as its current ``purge_seq``. Internal replication jobs and
secondary indexes process the database's purges and periodically update their
corresponding purge checkpoint documents to record the last ``purge_seq`` they
have processed. To ensure eventual consistency, the database removes stored
historical purge requests only after they have been processed by the internal
replication jobs and secondary indexes.

Internal Structures
====================================

To enable internal replication of purge information between nodes and
secondary indexes, two internal purge trees were added to the database file to
track historical purges:

.. code-block:: text

    purge_tree: UUID -> {PurgeSeq, DocId, Revs}
    purge_seq_tree: PurgeSeq -> {UUID, DocId, Revs}

Each interactive request to the ``_purge`` API creates an ordered set of pairs
of increasing ``purge_seq`` and purge request, where a purge request is a
tuple containing a docid and a list of revisions. A UUID is generated for each
purge request, and the request is then added to both internal purge trees: a
tuple ``{UUID -> {PurgeSeq, DocId, Revs}}`` is added to ``purge_tree``, and a
tuple ``{PurgeSeq -> {UUID, DocId, Revs}}`` is added to ``purge_seq_tree``.
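
The bookkeeping above can be sketched with plain dictionaries (a simplified
illustration, not the actual on-disk B-tree implementation; all names are made
up):

.. code-block:: python

    import uuid

    # Simplified stand-ins for the two internal trees (illustrative only).
    purge_tree = {}      # UUID -> (PurgeSeq, DocId, Revs)
    purge_seq_tree = {}  # PurgeSeq -> (UUID, DocId, Revs)
    next_purge_seq = 0

    def record_purge(doc_id, revs):
        """Assign the next purge_seq, generate a UUID, and index the
        purge request in both trees, mirroring the description above."""
        global next_purge_seq
        next_purge_seq += 1
        req_uuid = uuid.uuid4().hex
        purge_tree[req_uuid] = (next_purge_seq, doc_id, revs)
        purge_seq_tree[next_purge_seq] = (req_uuid, doc_id, revs)
        return req_uuid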

Compaction of Purges
====================================

During the compaction of the database, the oldest purge requests are removed
so that only ``purged_infos_limit`` purges are stored in the database. But in
order to keep the database consistent with indexes and other replicas, we can
only remove purge requests that have already been processed by indexes and
internal replication jobs. Thus, occasionally purge trees may store more than
``purged_infos_limit`` purges. If the number of stored purges in the database
exceeds ``purged_infos_limit`` by a certain threshold, a warning is produced
in the logs, signaling a problem with the synchronization of the database's
purges with indexes and other replicas.
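
The compaction rule can be sketched as follows (an illustrative model; the
function and variable names are made up): the compactor drops only those purge
requests that every client checkpoint has already passed, even when that
leaves more than ``purged_infos_limit`` entries stored.

.. code-block:: python

    def seqs_to_keep(purge_seqs, client_checkpoints, purged_infos_limit):
        """Return the stored purge_seqs that must survive compaction.

        purge_seqs: sorted purge sequence numbers currently stored.
        client_checkpoints: last purge_seq processed by each index or
        internal replication job.
        """
        latest = purge_seqs[-1] if purge_seqs else 0
        # Keep at most purged_infos_limit of the newest requests ...
        limit_floor = latest - purged_infos_limit
        # ... but never drop a request some client has not yet processed.
        client_floor = min(client_checkpoints, default=latest)
        floor = min(limit_floor, client_floor)
        return [s for s in purge_seqs if s > floor]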

Local Purge Checkpoint Documents
====================================

Indexes and internal replication jobs that process a database's purges create
and periodically update local purge checkpoint documents:
``_local/purge-$type-$hash``. These documents record the last ``purge_seq``
processed by the client and the timestamp of the last processing. An example
of a local purge checkpoint document:

.. code-block:: json

    {
        "_id": "_local/purge-mrview-86cacdfbaf6968d4ebbc324dd3723fe7",
        "type": "mrview",
        "purge_seq": 10,
        "updated_on": 1540541874,
        "ddoc_id": "_design/foo",
        "signature": "5d10247925f826ae3e00966ec24b7bf6"
    }

The image below shows the possible local purge checkpoint documents that a
database may have.

.. figure:: ../../images/purge-checkpoint-docs.png
    :align: center
    :alt: Local Purge Checkpoint Documents

    Local Purge Checkpoint Documents

Internal Replication
====================================

.. rst-class:: open


Purge requests are replayed across all nodes in an eventually consistent
manner. Internal replication of purges consists of two steps:

1. Pull replication. Internal replication first pulls purges from the target
and applies them on the source, to make sure we don't reintroduce to the
target docs/revs from the source that have already been purged on the target.
In this step, we use the purge checkpoint documents stored on the target to
keep track of the last of the target's ``purge_seq`` values processed by the
source. We find the purge requests that occurred after this ``purge_seq`` and
replay them on the source. This step is completed by updating the target's
purge checkpoint documents with the latest processed ``purge_seq`` and a
timestamp.

2. Push replication. Internal replication then proceeds as usual, with an
extra step inserted to push the source's purge requests to the target. In this
step, we use local internal replication checkpoint documents that are updated
on both the target and the source.

Under normal conditions, an interactive purge request is already sent to every
node containing a replica of the database shard, and applied on every replica.
Internal replication of purges between nodes is just an extra step to ensure
consistency between replicas, where all purge requests on one node are
replayed on another node. To avoid replaying the same purge request on a
replica twice, each interactive purge request is tagged with a unique
``uuid``. Internal replication filters out purge requests whose UUIDs already
exist in the replica's ``purge_tree``, and applies only purge requests whose
UUIDs do not. This is why two internal purge trees are needed: 1)
``purge_tree``: ``{UUID -> {PurgeSeq, DocId, Revs}}`` allows purge requests
with UUIDs that already exist in the replica to be found quickly; 2)
``purge_seq_tree``: ``{PurgeSeq -> {UUID, DocId, Revs}}`` allows iteration
from a given ``purge_seq`` to collect all purge requests that happened after
it.
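
The replay-and-filter step can be sketched like this (a simplified Python
model of the filtering, not the actual Erlang implementation; for brevity the
source reuses the target's sequence numbers):

.. code-block:: python

    def replay_purges(source_purge_tree, target_purge_seq_tree, since_seq):
        """Replay on the source every target purge after since_seq,
        skipping UUIDs the source has already applied."""
        applied = []
        # purge_seq_tree allows iterating from a given purge_seq ...
        for seq in sorted(s for s in target_purge_seq_tree if s > since_seq):
            req_uuid, doc_id, revs = target_purge_seq_tree[seq]
            # ... while purge_tree gives fast lookup by UUID.
            if req_uuid in source_purge_tree:
                continue  # this replica has already seen this request
            source_purge_tree[req_uuid] = (seq, doc_id, revs)
            applied.append((doc_id, revs))
        return applied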

Indexes
====================================

Each purge request bumps the ``update_seq`` of the database, so that each
secondary index is also updated and applies the purge requests, maintaining
consistency with the main database.

Config Settings
====================================

These settings can be updated in ``default.ini`` or ``local.ini``:

+------------------------+--------------------------------------------+---------+
| Field                  | Description                                | Default |
+========================+============================================+=========+
| max_document_id_number | Maximum number of documents allowed in one | 100     |
|                        | purge request                              |         |
+------------------------+--------------------------------------------+---------+
| max_revisions_number   | Maximum number of accumulated revisions    | 1000    |
|                        | allowed in one purge request               |         |
+------------------------+--------------------------------------------+---------+
| allowed_purge_seq_lag  | Besides ``purged_infos_limit``, the        | 100     |
|                        | additional buffer allowed for storing      |         |
|                        | purge requests                             |         |
+------------------------+--------------------------------------------+---------+
| index_lag_warn_seconds | How long an index may go without updating  | 86400   |
|                        | its local purge checkpoint document before |         |
|                        | a warning is issued                        |         |
+------------------------+--------------------------------------------+---------+
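
For example, these settings could be overridden in ``local.ini`` (the
``[purge]`` section name shown here is an assumption; check the configuration
reference for your CouchDB version):

.. code-block:: ini

    [purge]
    max_document_id_number = 100
    max_revisions_number = 1000
    allowed_purge_seq_lag = 100
    index_lag_warn_seconds = 86400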

During a database compaction, we check all purge checkpoint documents. A
client (an index or internal replication job) is allowed to have its last
reported ``purge_seq`` be smaller than the current database shard's
``purge_seq`` by no more than ``purged_infos_limit + allowed_purge_seq_lag``.
If a client's ``purge_seq`` is even smaller than that, and the client has not
checkpointed within ``index_lag_warn_seconds``, it prevents compaction of the
purge trees, and the following warning is logged for this client:

.. code-block:: text

    Purge checkpoint '_local/purge-mrview-9152d15c12011288629bcffba7693fd4'
    not updated in 86400 seconds in
    <<"shards/00000000-1fffffff/testdb12.1491979089">>

If this type of warning appears in the logs, check the client to see why its
processing of purge requests has stalled.
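
The check described above can be sketched as follows (an illustrative model;
the function and its parameters mirror the settings table and are not actual
CouchDB APIs):

.. code-block:: python

    def check_client(client_seq, client_ts, db_purge_seq, now,
                     purged_infos_limit, allowed_purge_seq_lag=100,
                     index_lag_warn_seconds=86400):
        """Classify a purge checkpoint client during compaction."""
        if db_purge_seq - client_seq <= purged_infos_limit + allowed_purge_seq_lag:
            return "ok"
        # The client is too far behind and blocks purge-tree compaction;
        # warn if it has also not checkpointed recently.
        if now - client_ts > index_lag_warn_seconds:
            return "warn"
        return "lagging"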

There is a mapping relationship between a design document of indexes and its
local checkpoint documents. If a design document of indexes is updated or
deleted, the corresponding local checkpoint document should also be deleted
automatically. But if, unexpectedly, a design document was updated or deleted
while its checkpoint document still exists in the database, the following
warning will be logged:

.. code-block:: text

    "Invalid purge doc '<<"_design/bar">>' on database
    <<"shards/00000000-1fffffff/testdb12.1491979089">>
    with purge_seq '50'"

If this type of warning appears in the logs, remove the local purge doc from
the database.