| <!-- doc/src/sgml/replication-origins.sgml --> |
| <chapter id="replication-origins"> |
| <title>Replication Progress Tracking</title> |
| |
| <indexterm zone="replication-origins"> |
| <primary>Replication Progress Tracking</primary> |
| </indexterm> |
| <indexterm zone="replication-origins"> |
| <primary>Replication Origins</primary> |
| </indexterm> |
| |
| <para> |
| Replication origins are intended to make it easier to implement |
| logical replication solutions on top |
| of <link linkend="logicaldecoding">logical decoding</link>. |
| They provide a solution to two common problems: |
| <itemizedlist> |
| <listitem> |
| <para>How to safely keep track of replication progress</para> |
| </listitem> |
| <listitem> |
| <para>How to change replication behavior based on the |
| origin of a row; for example, to prevent loops in bi-directional |
| replication setups</para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| <para> |
| Replication origins have just two properties, a name and an OID. The name, |
| which is what should be used to refer to the origin across systems, is |
| free-form <type>text</type>. It should be used in a way that makes conflicts |
| between replication origins created by different replication solutions |
| unlikely; e.g., by prefixing the replication solution's name to it. |
| The OID is used only to avoid having to store the full name |
| in situations where space efficiency is important. It should never be shared |
| across systems. |
| </para> |
| |
| <para> |
| Replication origins can be created using the function |
| <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>; |
| dropped using |
| <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>; |
| and seen in the |
| <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link> |
| system catalog. |
| </para> |
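| |
| <para> |
| As a minimal sketch of that lifecycle, the following statements create an |
| origin, look it up in the catalog, and drop it again. The origin name |
| <literal>myrepl_node_a</literal> is a hypothetical example chosen to follow |
| the prefixing advice above, and the calls assume a role that is allowed to |
| execute the replication origin functions (by default, only superusers). |
| </para> |
| <programlisting> |
| -- Create an origin; the function returns the internal ID assigned to it. |
| SELECT pg_replication_origin_create('myrepl_node_a'); |
| |
| -- Look up the catalog entry: roident is the internal ID, roname the name. |
| SELECT roident, roname |
|   FROM pg_replication_origin |
|  WHERE roname = 'myrepl_node_a'; |
| |
| -- Drop the origin again, together with any replay progress recorded for it. |
| SELECT pg_replication_origin_drop('myrepl_node_a'); |
| </programlisting> |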
| |
| <para> |
| One nontrivial part of building a replication solution is to keep track of |
| replay progress in a safe manner. When the applying process, or the whole |
| cluster, dies, it needs to be possible to find out up to where data has |
| successfully been replicated. Naive solutions to this, such as updating a |
| row in a table for every replayed transaction, have problems like run-time |
| overhead and database bloat. |
| </para> |
| |
| <para> |
| Using the replication origin infrastructure, a session can be |
| marked as replaying from a remote node (using the |
| <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link> |
| function). Additionally, the <acronym>LSN</acronym> and commit |
| time stamp of every source transaction can be configured on a |
| per-transaction basis using |
| <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>. |
| If that is done, replication progress will persist in a crash-safe |
| manner. Replay progress for all replication origins can be seen in the |
| <link linkend="view-pg-replication-origin-status"> |
| <structname>pg_replication_origin_status</structname> |
| </link> view. An individual origin's progress, e.g., when resuming |
| replication, can be acquired using |
| <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link> |
| for any origin or |
| <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link> |
| for the origin configured in the current session. |
| </para> |
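| |
| <para> |
| The following is a sketch of how an applying session might use these |
| functions; it is not a complete apply process. The origin name |
| <literal>myrepl_node_a</literal>, the table <structname>target_tbl</structname>, |
| and the <acronym>LSN</acronym> and time stamp values are hypothetical |
| placeholders; a real apply process would take the latter two from the |
| decoded source transaction. The example also assumes that the server has a |
| free slot for tracking replication progress. |
| </para> |
| <programlisting> |
| -- Hypothetical target table and origin, created once. |
| CREATE TABLE target_tbl (id serial PRIMARY KEY, data text); |
| SELECT pg_replication_origin_create('myrepl_node_a'); |
| |
| -- Mark this session as replaying changes that come from that origin. |
| SELECT pg_replication_origin_session_setup('myrepl_node_a'); |
| |
| BEGIN; |
| -- Record the source transaction's commit LSN and commit time stamp. |
| SELECT pg_replication_origin_xact_setup('0/AABBCCDD', '2013-01-01 00:00'); |
| INSERT INTO target_tbl (data) VALUES ('replayed row'); |
| COMMIT; |
| |
| -- Progress of the origin configured in this session; passing false means |
| -- the local transaction is not guaranteed to have been flushed to disk. |
| SELECT pg_replication_origin_session_progress(false); |
| |
| -- Progress of a named origin, and the overview of all origins. |
| SELECT pg_replication_origin_progress('myrepl_node_a', false); |
| SELECT external_id, remote_lsn, local_lsn FROM pg_replication_origin_status; |
| |
| -- Detach the session from the origin once replay is finished. |
| SELECT pg_replication_origin_session_reset(); |
| </programlisting> |
| <para> |
| When resuming after a crash, an apply process would typically request |
| changes from the source starting at the reported remote |
| <acronym>LSN</acronym>. |
| </para> |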
| |
| <para> |
| In replication topologies more complex than replication from exactly one |
| system to one other system, another problem can be that it is hard to avoid |
| replicating replayed rows again. That can lead to both cycles in the |
| replication and inefficiencies. Replication origins provide an optional |
| mechanism to recognize and prevent that. When configured using the functions |
| referenced in the previous paragraph, every change and transaction passed to |
| output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin"/>) |
| generated by the session is tagged with the replication origin of the |
| generating session. This allows treating them differently in the output |
| plugin, e.g., ignoring all but locally originating rows. Additionally, |
| the <link linkend="logicaldecoding-output-plugin-filter-origin"> |
| <function>filter_by_origin_cb</function></link> callback can be used |
| to filter the logical decoding change stream based on the |
| source. While less flexible, filtering via that callback is |
| considerably more efficient than doing it in the output plugin. |
| </para> |
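| |
| <para> |
| As an illustration of origin-based filtering, the sketch below uses the |
| <filename>test_decoding</filename> example output plugin, whose |
| <literal>only-local</literal> option skips changes that carry a replication |
| origin. It assumes that <filename>test_decoding</filename> is installed and |
| that <varname>wal_level</varname> is set to <literal>logical</literal>; the |
| slot name <literal>myrepl_demo_slot</literal> is a hypothetical example. |
| </para> |
| <programlisting> |
| -- Create a logical replication slot that decodes with test_decoding. |
| SELECT 'init' |
|   FROM pg_create_logical_replication_slot('myrepl_demo_slot', 'test_decoding'); |
| |
| -- ... run some ordinary local transactions, and some transactions in a |
| -- session that was set up with pg_replication_origin_session_setup() ... |
| |
| -- With only-local enabled, changes tagged with a replication origin are |
| -- filtered out, so only locally originating changes are returned. |
| SELECT data |
|   FROM pg_logical_slot_get_changes('myrepl_demo_slot', NULL, NULL, |
|                                    'only-local', '1'); |
| |
| -- Clean up the demonstration slot. |
| SELECT pg_drop_replication_slot('myrepl_demo_slot'); |
| </programlisting> |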
| </chapter> |