src/doc/4.0-rc2/_sources/architecture/overview.rst.txt - cassandra-website - Git at Google

 .. Licensed to the Apache Software Foundation (ASF) under one
 .. or more contributor license agreements.  See the NOTICE file
 .. distributed with this work for additional information
 .. regarding copyright ownership.  The ASF licenses this file
 .. to you under the Apache License, Version 2.0 (the
 .. "License"); you may not use this file except in compliance
 .. with the License.  You may obtain a copy of the License at
 ..
 ..     http://www.apache.org/licenses/LICENSE-2.0
 ..
 .. Unless required by applicable law or agreed to in writing, software
 .. distributed under the License is distributed on an "AS IS" BASIS,
 .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 .. See the License for the specific language governing permissions and
 .. limitations under the License.

 .. _overview:

 Overview
 ========

 Apache Cassandra is an open source, distributed, NoSQL database. It presents
 a partitioned wide column storage model with eventually consistent semantics.

 Apache Cassandra was initially designed at `Facebook
 <https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf>`_
 using a staged event-driven architecture (`SEDA
 <http://www.sosp.org/2001/papers/welsh.pdf>`_) to implement a combination of
 Amazon’s `Dynamo
 <http://courses.cse.tamu.edu/caverlee/csce438/readings/dynamo-paper.pdf>`_
 distributed storage and replication techniques combined with Google's `Bigtable
 <https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf>`_
 data and storage engine model. Dynamo and Bigtable were both developed to meet
 emerging requirements for scalable, reliable and highly available storage
 systems, but each had areas that could be improved.

 Cassandra was designed as a best in class combination of both systems to meet
 emerging large scale, both in data footprint and query volume, storage
 requirements. As applications began to require full global replication and
 always available low-latency reads and writes, it became imperative to design a
 new kind of database model as the relational database systems of the time
 struggled to meet the new requirements of global scale applications.

 Systems like Cassandra are designed for these challenges and seek the
 following design objectives:

 - Full multi-master database replication
 - Global availability at low latency
 - Scaling out on commodity hardware
 - Linear throughput increase with each additional processor
 - Online load balancing and cluster growth
 - Partitioned key-oriented queries
 - Flexible schema

 Features
 --------

 Cassandra provides the Cassandra Query Language (CQL), an SQL-like language,
 to create and update database schema and access data. CQL allows users to
 organize data within a cluster of Cassandra nodes using:

 - **Keyspace**: defines how a dataset is replicated, for example in which
   datacenters and how many copies. Keyspaces contain tables.
 - **Table**: defines the typed schema for a collection of partitions. Cassandra
   tables have flexible addition of new columns to tables with zero downtime.
   Tables contain partitions, which contain partitions, which contain columns.
 - **Partition**: defines the mandatory part of the primary key all rows in
   Cassandra must have. All performant queries supply the partition key in
   the query.
 - **Row**: contains a collection of columns identified by a unique primary key
   made up of the partition key and optionally additional clustering keys.
 - **Column**: A single datum with a type which belong to a row.

 CQL supports numerous advanced features over a partitioned dataset such as:

 - Single partition lightweight transactions with atomic compare and set
   semantics.
 - User-defined types, functions and aggregates
 - Collection types including sets, maps, and lists.
 - Local secondary indices
 - (Experimental) materialized views

 Cassandra explicitly chooses not to implement operations that require cross
 partition coordination as they are typically slow and hard to provide highly
 available global semantics. For example Cassandra does not support:

 - Cross partition transactions
 - Distributed joins
 - Foreign keys or referential integrity.

 Operating
 ---------

 Apache Cassandra configuration settings are configured in the ``cassandra.yaml``
 file that can be edited by hand or with the aid of configuration management tools.
 Some settings can be manipulated live using an online interface, but others
 require a restart of the database to take effect.

 Cassandra provides tools for managing a cluster. The ``nodetool`` command
 interacts with Cassandra's live control interface, allowing runtime manipulation
 of many settings from ``cassandra.yaml``. The ``auditlogviewer`` is used
 to view the audit logs. The  ``fqltool`` is used to view, replay and compare
 full query logs.  The ``auditlogviewer`` and ``fqltool`` are new tools in
 Apache Cassandra 4.0.

 In addition, Cassandra supports out of the box atomic snapshot functionality,
 which presents a point in time snapshot of Cassandra's data for easy
 integration with many backup tools. Cassandra also supports incremental backups
 where data can be backed up as it is written.

 Apache Cassandra 4.0 has added several new features including virtual tables.
 transient replication, audit logging, full query logging, and support for Java
 11. Two of these features are experimental: transient replication and Java 11
 support.
	.. Licensed to the Apache Software Foundation (ASF) under one
	.. or more contributor license agreements. See the NOTICE file
	.. distributed with this work for additional information
	.. regarding copyright ownership. The ASF licenses this file
	.. to you under the Apache License, Version 2.0 (the
	.. "License"); you may not use this file except in compliance
	.. with the License. You may obtain a copy of the License at
	..
	.. http://www.apache.org/licenses/LICENSE-2.0
	..
	.. Unless required by applicable law or agreed to in writing, software
	.. distributed under the License is distributed on an "AS IS" BASIS,
	.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	.. See the License for the specific language governing permissions and
	.. limitations under the License.

	.. _overview:

	Overview
	========

	Apache Cassandra is an open source, distributed, NoSQL database. It presents
	a partitioned wide column storage model with eventually consistent semantics.

	Apache Cassandra was initially designed at `Facebook
	<https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf>`_
	using a staged event-driven architecture (`SEDA
	<http://www.sosp.org/2001/papers/welsh.pdf>`_) to implement a combination of
	Amazon’s `Dynamo
	<http://courses.cse.tamu.edu/caverlee/csce438/readings/dynamo-paper.pdf>`_
	distributed storage and replication techniques combined with Google's `Bigtable
	<https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf>`_
	data and storage engine model. Dynamo and Bigtable were both developed to meet
	emerging requirements for scalable, reliable and highly available storage
	systems, but each had areas that could be improved.

	Cassandra was designed as a best in class combination of both systems to meet
	emerging large scale, both in data footprint and query volume, storage
	requirements. As applications began to require full global replication and
	always available low-latency reads and writes, it became imperative to design a
	new kind of database model as the relational database systems of the time
	struggled to meet the new requirements of global scale applications.

	Systems like Cassandra are designed for these challenges and seek the
	following design objectives:

	- Full multi-master database replication
	- Global availability at low latency
	- Scaling out on commodity hardware
	- Linear throughput increase with each additional processor
	- Online load balancing and cluster growth
	- Partitioned key-oriented queries
	- Flexible schema

	Features
	--------

	Cassandra provides the Cassandra Query Language (CQL), an SQL-like language,
	to create and update database schema and access data. CQL allows users to
	organize data within a cluster of Cassandra nodes using:

	- Keyspace: defines how a dataset is replicated, for example in which
	datacenters and how many copies. Keyspaces contain tables.
	- Table: defines the typed schema for a collection of partitions. Cassandra
	tables have flexible addition of new columns to tables with zero downtime.
	Tables contain partitions, which contain partitions, which contain columns.
	- Partition: defines the mandatory part of the primary key all rows in
	Cassandra must have. All performant queries supply the partition key in
	the query.
	- Row: contains a collection of columns identified by a unique primary key
	made up of the partition key and optionally additional clustering keys.
	- Column: A single datum with a type which belong to a row.

	CQL supports numerous advanced features over a partitioned dataset such as:

	- Single partition lightweight transactions with atomic compare and set
	semantics.
	- User-defined types, functions and aggregates
	- Collection types including sets, maps, and lists.
	- Local secondary indices
	- (Experimental) materialized views

	Cassandra explicitly chooses not to implement operations that require cross
	partition coordination as they are typically slow and hard to provide highly
	available global semantics. For example Cassandra does not support:

	- Cross partition transactions
	- Distributed joins
	- Foreign keys or referential integrity.

	Operating
	---------

	Apache Cassandra configuration settings are configured in the ``cassandra.yaml``
	file that can be edited by hand or with the aid of configuration management tools.
	Some settings can be manipulated live using an online interface, but others
	require a restart of the database to take effect.

	Cassandra provides tools for managing a cluster. The ``nodetool`` command
	interacts with Cassandra's live control interface, allowing runtime manipulation
	of many settings from ``cassandra.yaml``. The ``auditlogviewer`` is used
	to view the audit logs. The ``fqltool`` is used to view, replay and compare
	full query logs. The ``auditlogviewer`` and ``fqltool`` are new tools in
	Apache Cassandra 4.0.

	In addition, Cassandra supports out of the box atomic snapshot functionality,
	which presents a point in time snapshot of Cassandra's data for easy
	integration with many backup tools. Cassandra also supports incremental backups
	where data can be backed up as it is written.

	Apache Cassandra 4.0 has added several new features including virtual tables.
	transient replication, audit logging, full query logging, and support for Java
	11. Two of these features are experimental: transient replication and Java 11
	support.