---
# Top navigation
top-nav-group: user-guide
top-nav-pos: 1
top-nav-title: Introduction
# Sub-level navigation
sub-nav-group: user-guide
sub-nav-parent: user-guide
sub-nav-id: introduction
sub-nav-pos: 1
sub-nav-title: Introduction
layout: default
---
.. contents:: DistributedLog Overview
Introduction
============
DistributedLog (DL) is a high-performance replicated log service.
It offers durability, replication and strong consistency, which provide the fundamental building blocks
for building reliable distributed systems, e.g. replicated state machines, general pub/sub systems,
distributed databases and distributed queues.
DistributedLog maintains sequences of records in categories called *Logs* (aka *Log Streams*).
The processes that write records to a DL log are *writers*, while the processes that read
from logs and process the records are *readers*.
.. figure:: ../images/softwarestack.png
:align: center
Figure 1. DistributedLog Software Stack
Logs
----
A **log** is an ordered, immutable sequence of *log records*.
.. figure:: ../images/datamodel.png
:align: center
Figure 2. Anatomy of a log stream
Log Records
~~~~~~~~~~~
Each **log record** is a sequence of bytes.
**Log records** are written sequentially into a *log stream*, and each is assigned
a unique sequence number called a **DLSN** (DistributedLog Sequence Number). Besides the *DLSN*,
applications can assign their own sequence number when constructing log records. This
application-defined sequence number is called the **TransactionID** (*txid*). Either the *DLSN*
or the *TransactionID* can be used to position readers to start reading from a specific
*log record*.
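As a rough illustration, here is a minimal sketch of writing a record with an application-supplied *TransactionID* and getting back the assigned *DLSN*. It is written against the incubator-era ``com.twitter.distributedlog`` core API; the writer is assumed to be already opened (see the Writers section below), and the message content is made up.

.. code-block:: java

    import java.nio.charset.StandardCharsets;

    import com.twitter.distributedlog.AsyncLogWriter;
    import com.twitter.distributedlog.DLSN;
    import com.twitter.distributedlog.LogRecord;
    import com.twitter.util.Await;
    import com.twitter.util.Future;

    public class LogRecordExample {

        // 'writer' is an AsyncLogWriter opened on a log stream (see the Writers section below).
        public static DLSN writeWithTxid(AsyncLogWriter writer, long txid, String message) throws Exception {
            // A log record is just a sequence of bytes, tagged with an application-defined txid.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            LogRecord record = new LogRecord(txid, payload);
            // The write is acknowledged with the DLSN that the log assigned to this record.
            Future<DLSN> writeFuture = writer.write(record);
            return Await.result(writeFuture);
        }
    }

Either the returned *DLSN* or the *txid* supplied above can later be handed to a reader as a starting position.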
Log Segments
~~~~~~~~~~~~
A **log** is broken down into *segments*, and each log segment contains a subset of the log's
records. **Log segments** are distributed and stored in a log segment store (e.g. Apache BookKeeper).
DistributedLog rolls log segments based on a configured rolling policy - either a configurable
period of time (e.g. every 2 hours) or a configurable maximum size (e.g. every 128MB).
The data of a log is thus divided into roughly equal-sized *log segments* that are distributed evenly
across the log segment storage nodes. This allows a log to scale beyond a size that would fit on
a single server, and also spreads read traffic across the cluster.
The data of a log is either kept until the application *explicitly* truncates it, or retained
for a configurable period of time. **Explicit Truncation** is useful for building replicated
state machines such as distributed databases, which usually require strong control over when
data may be truncated. **Time-based Retention** is useful for real-time analytics applications, which only
care about the data within a given period of time.
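As a sketch of how the rolling and retention policies are configured, the snippet below uses setter names as they appear in the incubator-era ``DistributedLogConfiguration``; treat the exact names and values as assumptions to double-check against the configuration reference for your release.

.. code-block:: java

    import com.twitter.distributedlog.DistributedLogConfiguration;

    public class SegmentPolicyExample {

        public static DistributedLogConfiguration segmentPolicyConf() {
            DistributedLogConfiguration conf = new DistributedLogConfiguration();
            // Rolling policy: start a new log segment every 2 hours ...
            conf.setLogSegmentRollingIntervalMinutes(120);
            // ... or whenever the current segment reaches 128MB, whichever comes first.
            conf.setMaxLogSegmentBytes(128 * 1024 * 1024);
            // Time-based retention: reclaim log segments older than 72 hours.
            conf.setRetentionPeriodHours(72);
            // Alternatively, leave truncation entirely to the application (explicit truncation).
            // conf.setExplicitTruncationByApplication(true);
            return conf;
        }
    }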
Namespaces
~~~~~~~~~~
*Log streams* that belong to the same organization are usually categorized and managed under
a **namespace**. A DL **namespace** is essentially how applications locate where the
*log streams* are. Applications can *create* and *delete* streams under a namespace,
and can also *truncate* a stream to a given sequence number (either a *DLSN* or a *TransactionID*).
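For illustration, here is a minimal sketch of namespace operations with the incubator-era Java API; the ZooKeeper ensemble, namespace path and stream name are placeholders.

.. code-block:: java

    import java.net.URI;

    import com.twitter.distributedlog.DistributedLogConfiguration;
    import com.twitter.distributedlog.DistributedLogManager;
    import com.twitter.distributedlog.namespace.DistributedLogNamespace;
    import com.twitter.distributedlog.namespace.DistributedLogNamespaceBuilder;

    public class NamespaceExample {

        public static void main(String[] args) throws Exception {
            // The namespace URI is how applications locate their log streams.
            URI uri = URI.create("distributedlog://zk1:2181/messaging/my-namespace");
            DistributedLogNamespace namespace = DistributedLogNamespaceBuilder.newBuilder()
                    .conf(new DistributedLogConfiguration())
                    .uri(uri)
                    .build();

            // Create, locate and delete log streams under the namespace.
            namespace.createLog("orders");
            DistributedLogManager dlm = namespace.openLog("orders");
            // ... write to / read from the stream through 'dlm' ...
            dlm.close();
            namespace.deleteLog("orders");
            namespace.close();
        }
    }

Truncating a stream up to a given *DLSN* goes through the stream's writer in this API (e.g. ``AsyncLogWriter#truncate``).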
Writers
-------
Writers write data into the logs of their choice. All records are
appended to a log in order. The sequencing is done by the writer,
which means there is only one active writer for a log at a given time.
DL guarantees correctness when two writers attempt to write to
the same log during a network partition - via the fencing mechanism
of the log segment store.
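A minimal sketch of the single-writer path with the core library is shown below; method names follow the incubator-era API, and the ``DistributedLogManager`` is assumed to have been opened from a namespace as in the sketch above.

.. code-block:: java

    import java.nio.charset.StandardCharsets;

    import com.twitter.distributedlog.AsyncLogWriter;
    import com.twitter.distributedlog.DLSN;
    import com.twitter.distributedlog.DistributedLogManager;
    import com.twitter.distributedlog.LogRecord;
    import com.twitter.util.Await;

    public class CoreWriterExample {

        public static void writeSome(DistributedLogManager dlm) throws Exception {
            // Only one active writer per log: acquiring the writer fences off any
            // previous writer through the log segment store.
            AsyncLogWriter writer = dlm.startAsyncLogSegmentNonPartitioned();
            try {
                for (long txid = 1L; txid <= 10L; txid++) {
                    byte[] payload = ("record-" + txid).getBytes(StandardCharsets.UTF_8);
                    // Records are appended in the order they are written.
                    DLSN dlsn = Await.result(writer.write(new LogRecord(txid, payload)));
                }
            } finally {
                Await.result(writer.asyncClose());
            }
        }
    }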
Log writers are served and managed in a service tier called the *Write Proxy*.
The *Write Proxy* is used for accepting fan-in writes from a large number
of clients. Details on **Fan-in and Fan-out** can be found later in this doc.
Readers
-------
Readers read records from the logs of their choice, starting from a provided
position. The provided position can be either a *DLSN* or a *TransactionID*.
Readers read records from a log in strict order. Different readers
can read records starting from different positions in the same log.
Unlike other pub/sub systems, DistributedLog doesn't record or manage readers' positions.
It leaves the tracking responsibility to applications, as different applications
might have different requirements for tracking and coordinating positions, and it is hard
to get this right with a single approach. For example, distributed databases might store
reader positions along with their SSTables, so they would resume applying transactions
from the positions stored in the SSTables. Tracking reader positions can easily be done
at the application level using various stores (e.g. ZooKeeper, a file system, or key/value stores).
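A sketch of the read path with the core library follows (same incubator-era API as above); ``DLSN.InitialDLSN`` would start from the beginning of the stream, and the position store is left to the application.

.. code-block:: java

    import com.twitter.distributedlog.DLSN;
    import com.twitter.distributedlog.DistributedLogManager;
    import com.twitter.distributedlog.LogReader;
    import com.twitter.distributedlog.LogRecordWithDLSN;

    public class CoreReaderExample {

        // 'fromDLSN' is the position the application recorded from its previous run
        // (a TransactionID-based overload of getInputStream also exists).
        public static DLSN readFrom(DistributedLogManager dlm, DLSN fromDLSN) throws Exception {
            LogReader reader = dlm.getInputStream(fromDLSN);
            DLSN lastRead = fromDLSN;
            try {
                LogRecordWithDLSN record;
                // Non-blocking reads: returns null once the reader has caught up.
                while ((record = reader.readNext(true)) != null) {
                    process(record.getPayload());        // application-specific processing
                    lastRead = record.getDlsn();         // the application tracks its own position
                }
            } finally {
                reader.close();
            }
            // Persist 'lastRead' wherever the application keeps reader positions
            // (e.g. ZooKeeper, a file, or a key/value store) and pass it back in next time.
            return lastRead;
        }

        private static void process(byte[] payload) {
            // ... consume the record ...
        }
    }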
Log records can be cached in a service tier called the *Read Proxy*, to serve
a large number of readers.
Fan-in and Fan-out
------------------
The core of DistributedLog supports single-writer, multiple-reader semantics. A service layer
is built on top of the *DistributedLog Core* to support a large number of writers and readers.
The service layer includes the **Write Proxy** and the **Read Proxy**. The **Write Proxy** manages
the writers of logs and fails them over when machines fail. It allows supporting applications
that don't care about log ownership by aggregating writes from many sources (aka *Fan-in*).
The **Read Proxy** optimizes the read path by caching log records in cases where hundreds or
thousands of readers are consuming the same log stream.
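For illustration, here is a hedged sketch of writing through the Write Proxy with the thin proxy client that ships with the service layer. Builder method names follow the incubator-era tutorials; the client id, finagle name and stream name are placeholders.

.. code-block:: java

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import com.twitter.distributedlog.DLSN;
    import com.twitter.distributedlog.service.DistributedLogClient;
    import com.twitter.distributedlog.service.DistributedLogClientBuilder;
    import com.twitter.finagle.thrift.ClientId$;
    import com.twitter.util.Await;

    public class WriteProxyClientExample {

        public static void main(String[] args) throws Exception {
            // Connect to the write proxy cluster; the finagle name is a placeholder
            // for the actual proxy endpoints.
            DistributedLogClient client = DistributedLogClientBuilder.newBuilder()
                    .clientId(ClientId$.MODULE$.apply("example-fan-in-writer"))
                    .name("example-fan-in-writer")
                    .thriftmux(true)
                    .finagleNameStr("inet!write-proxy-host:8000")
                    .build();

            // Many such clients can write to the same stream; the proxy owning the
            // stream aggregates the writes (fan-in) onto the single core writer.
            ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            DLSN dlsn = Await.result(client.write("orders", payload));

            client.close();
        }
    }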
Guarantees
----------
At a high level, DistributedLog gives the following guarantees:
* Records written by a writer to a log will be appended in the order they are written. That is, if a record *R1* is written by the same writer as a record *R2*, and *R1* is written first, then *R1* will have a smaller sequence number than *R2*.
* Readers see records in the same order they were written to the log.
* All records are persisted on disk before being acknowledged, to guarantee durability.
* For a log with a replication factor of N, DistributedLog tolerates up to N-1 server failures without losing any records.
More details on these guarantees are given in the `design section <http://distributedlog.incubator.apache.org/docs/latest/user_guide/design/main.html>`_.
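As a rough illustration of the replication settings behind these guarantees, the sketch below uses setter names from the incubator-era ``DistributedLogConfiguration``; treat the exact names and values as assumptions to verify against the configuration reference.

.. code-block:: java

    import com.twitter.distributedlog.DistributedLogConfiguration;

    public class ReplicationConfExample {

        public static DistributedLogConfiguration replicatedConf() {
            DistributedLogConfiguration conf = new DistributedLogConfiguration();
            // Each log segment is spread across an ensemble of 3 storage nodes.
            conf.setEnsembleSize(3);
            // Every record is replicated to 3 of them (replication factor N = 3).
            conf.setWriteQuorumSize(3);
            // A write is acknowledged only after 2 of them have persisted it to disk.
            conf.setAckQuorumSize(2);
            return conf;
        }
    }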