Developer Oriented Replicator Description
=========================================

This description of the scheduling replicator's functionality is mainly geared
to CouchDB developers. It dives a bit into the internals and explains how
everything is connected together. A higher level overview is available in the
[RFC](https://github.com/apache/couchdb-documentation/pull/581). This
documentation assumes the audience is familiar with that description, as well
as with the [Couch Jobs
RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/007-background-jobs.md)
and the [Node Types
RFC](https://github.com/apache/couchdb-documentation/blob/main/rfcs/013-node-types.md).

A natural place to start is the top application supervisor:
`couch_replicator_sup`. The set of children in the supervisor is split into
`frontend` and `backend`. The `frontend` set is started on nodes which have
the `api_frontend` node type label set to `true`, and the `backend` ones are
started on nodes which have the `replication` label set to `true`. The same
node could have both labels set to `true`, in which case it acts as both a
replication frontend and backend node. However, it is not guaranteed that jobs
created by the frontend part of a node will run on the backend part of that
same node.

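As a minimal illustration of that split, a supervisor `init/1` along these
lines could assemble its child list from the node type labels. The
`is_node_type/1`, `frontend_children/0` and `backend_children/0` helpers are
hypothetical placeholders, not the actual `couch_replicator_sup` code:

```erlang
%% Hypothetical sketch of assembling the child list from node type labels.
%% is_node_type/1, frontend_children/0 and backend_children/0 are placeholder
%% names; the real selection happens in couch_replicator_sup:init/1.
init(_Args) ->
    Frontend = case is_node_type(api_frontend) of
        true -> frontend_children();  % db monitoring, epi hooks, HTTP handling
        false -> []
    end,
    Backend = case is_node_type(replication) of
        true -> backend_children();   % job server, acceptors, connection pool
        false -> []
    end,
    {ok, {#{strategy => one_for_one}, Frontend ++ Backend}}.
```
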
Frontend Description
--

The "frontend" consists of the parts which handle HTTP requests and monitor
`_replicator` databases for changes, and then create `couch_jobs` replication
job records. Some of the modules involved in this are:

* `couch_replicator` : Contains the main API "entry" point into the
  `couch_replicator` application. The `replicate/2` function creates transient
  replication jobs. The `after_db_create/2`, `after_db_delete/2` and
  `after_doc_write/6` functions are called from `couch_epi` callbacks to
  create replication jobs from `_replicator` db events. Eventually they all
  call `couch_replicator_jobs:add_job/3` to create a `couch_jobs` replication
  job. Before the job is created, either the HTTP request body or the
  `_replicator` doc body is parsed into a `Rep` map object. An important
  property of this object is that it can be serialized to and deserialized
  from JSON. This object is saved in the `?REP` field of the replication
  `couch_jobs` job data (see the sketch after this list). Besides creating
  replication jobs, `couch_replicator` is also responsible for handling the
  `_scheduler/jobs` and `_scheduler/docs` monitoring API responses. That
  happens in the `jobs/0`, `job/1`, `docs/` and `doc/2` functions.

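A rough sketch of that flow for a transient replication is shown below. The
field names inside the `Rep` map, the job id derivation, and the first
argument to `add_job/3` are illustrative assumptions; only the overall shape
(parse the request, build a `Rep` map, store it under `?REP`, call
`couch_replicator_jobs:add_job/3`) follows the description above:

```erlang
%% Illustrative only: field names, the job id derivation and the add_job/3
%% argument order are assumptions, not the actual internal API contract.
create_transient_job(SourceUrl, TargetUrl) ->
    Rep = #{
        <<"source">> => SourceUrl,
        <<"target">> => TargetUrl,
        <<"options">> => #{<<"continuous">> => true}
    },
    JobId = couch_uuids:random(),
    %% ?REP is the job data field described above; it is defined in the
    %% replicator's internal header (assumption about where it lives).
    JobData = #{?REP => Rep},
    couch_replicator_jobs:add_job(undefined, JobId, JobData).
```
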
Backend Description
--

The "backend" consists of the parts which run replication jobs, update their
state, and handle rescheduling on intermittent errors. All the job activity on
these nodes is ultimately driven by `couch_jobs` acceptors which wait in
`couch_jobs:accept/2` for replication jobs, roughly as sketched below.

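A minimal sketch of such an accept loop follows, assuming `accept/2` returns
`{ok, Job, JobData}` or `{error, not_found}` as described in the Couch Jobs
RFC. The job type value, the options map, and the `run_replication/3` hand-off
are made up for illustration; the real loop lives in
`couch_replicator_job_server` and `couch_replicator_job`:

```erlang
%% Sketch of a backend acceptor loop. JobType, the empty options map, and
%% run_replication/3 are placeholders for illustration.
accept_loop(JobType) ->
    case couch_jobs:accept(JobType, #{}) of
        {ok, Job, #{?REP := Rep} = JobData} ->
            %% Hand the accepted job off to a worker process
            spawn_link(fun() -> run_replication(Job, Rep, JobData) end),
            accept_loop(JobType);
        {error, not_found} ->
            %% No pending job became available before the timeout; retry
            accept_loop(JobType)
    end.
```
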
* `couch_replicator_job_server` : A singleton process in charge of spawning
  and keeping track of `couch_replicator_job` processes. It ensures there is a
  limited number of replication jobs running on each node. It periodically
  accepts new jobs and stops the oldest running ones in order to give other
  pending jobs a chance to run. It runs this logic in the `reschedule/1`
  function, which is called with a frequency defined by the `interval_sec`
  configuration setting. The other parameters which determine how jobs start
  and stop are `max_jobs` and `max_churn`. The node will try to limit itself
  to running `max_jobs` jobs on average, with periodic spikes of up to
  `max_jobs + max_churn` jobs at a time, and it will try not to start more
  than `max_churn` jobs during each rescheduling cycle (see the scheduling
  sketch after this list).

* `couch_replicator_connection` : Maintains a global replication connection
  pool. It allows reusing connections across replication tasks. The main
  interface is `acquire/1` and `release/1` (see the usage sketch after this
  list). The general idea is that once a connection is established, it is kept
  around for `replicator.connection_close_interval` milliseconds in case
  another replication task wants to re-use it. It is worth pointing out how
  linking and monitoring is handled: workers are linked to the connection pool
  when they are created. If they crash, the connection pool will receive an
  'EXIT' event and clean up after the worker. The connection pool also
  monitors owners (by monitoring the `Pid` from the `From` argument in the
  call to `acquire/1`) and cleans up if the owner dies and the pool receives a
  'DOWN' message. Another interesting thing is that connection establishment
  (creation) happens in the owner process, so the pool is not blocked on it.

* `couch_replicator_rate_limiter` : Implements a rate limiter to handle
  connection throttling from sources or targets, where requests return 429
  error codes. It uses the Additive Increase / Multiplicative Decrease (AIMD)
  feedback control algorithm to converge on the channel capacity, and is
  implemented using a 16-way sharded ETS table to maintain connection state.
  The table sharding code is split out into the
  `couch_replicator_rate_limiter_tables` module. The purpose of the module is
  to maintain and continually estimate sleep intervals for each connection,
  represented as a `{Method, Url}` pair. The interval is updated on each call
  to `success/1` or `failure/1`: for a successful request, a client should
  call `success/1`, and whenever a 429 response is received the client should
  call `failure/1` (see the sketch after this list). When no failures are
  happening, the code ensures the ETS tables are empty in order to have a
  lower impact on a running system.
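
The per-cycle start/stop budget described for `couch_replicator_job_server`
can be restated roughly as follows. This is a paraphrase of the `max_jobs` /
`max_churn` behavior above, not the actual `reschedule/1` code; the function
name and exact formula are illustrative:

```erlang
%% Rough restatement of the per-cycle churn budget; the name and formula are
%% illustrative, not the actual couch_replicator_job_server logic.
churn_budget(Running, MaxJobs, MaxChurn) ->
    %% Stop enough of the oldest jobs to get back down to max_jobs ...
    ToStop = max(0, Running - MaxJobs),
    %% ... and start at most max_churn pending jobs, allowing a temporary
    %% spike of up to max_jobs + max_churn running jobs.
    ToStart = min(MaxChurn, max(0, MaxJobs + MaxChurn - (Running - ToStop))),
    {ToStop, ToStart}.
```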
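
For the connection pool, a typical caller brackets its request between
`acquire/1` and `release/1`. The sketch below assumes `acquire/1` returns
`{ok, Worker}`; the request made through the worker is left as a callback,
since only `acquire/1` and `release/1` are the documented interface:

```erlang
%% Usage sketch: acquire a pooled connection for a URL, run a request through
%% it, and always release it back, even if the request fails.
with_connection(Url, Fun) ->
    {ok, Worker} = couch_replicator_connection:acquire(Url),
    try
        Fun(Worker)  %% e.g. an HTTP request issued through this worker
    after
        couch_replicator_connection:release(Worker)
    end.
```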
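
Feeding the rate limiter from a client could look like the sketch below, keyed
by the `{Method, Url}` pair described above. The wrapper function name is made
up; only `success/1` and `failure/1` are the documented calls:

```erlang
%% Report each response to the rate limiter so it can adjust the sleep
%% interval for this {Method, Url} channel: 429s grow the interval, successes
%% shrink it, per the AIMD scheme described above.
report_response(Method, Url, HttpCode) ->
    Key = {Method, Url},
    case HttpCode of
        429 -> couch_replicator_rate_limiter:failure(Key);
        _ -> couch_replicator_rate_limiter:success(Key)
    end.
```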