// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Cluster Snapshots
== Overview
Ignite provides the ability to create full cluster snapshots for deployments that use
link:persistence/native-persistence[Ignite Persistence]. An Ignite snapshot includes a consistent cluster-wide copy of
all data records persisted on disk, along with some other files needed for the restore procedure.
The snapshot structure is similar to the layout of the
link:persistence/native-persistence#configuring-persistent-storage-directory[Ignite Persistence storage directory],
with several exceptions. Let's take this snapshot as an example to review the structure:
[source,shell]
----
work
└── snapshots
    └── backup23012020
        └── db
            ├── binary_meta
            │   ├── node1
            │   ├── node2
            │   └── node3
            ├── marshaller
            │   ├── node1
            │   ├── node2
            │   └── node3
            ├── node1
            │   └── my-sample-cache
            │       ├── cache_data.dat
            │       ├── part-3.bin
            │       ├── part-4.bin
            │       └── part-6.bin
            ├── node2
            │   └── my-sample-cache
            │       ├── cache_data.dat
            │       ├── part-1.bin
            │       ├── part-5.bin
            │       └── part-7.bin
            └── node3
                └── my-sample-cache
                    ├── cache_data.dat
                    ├── part-0.bin
                    └── part-2.bin
----
* The snapshot is located in the `work/snapshots` directory and is named `backup23012020`, where `work` is Ignite's work
directory.
* The snapshot is created for a 3-node cluster with all the nodes running on the same machine. In this example,
the nodes are named `node1`, `node2`, and `node3`; in practice, the names are equal to the nodes'
link:https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStoreunderthehood-SubfoldersGeneration[consistent IDs].
* The snapshot keeps a copy of the `my-sample-cache` cache.
* The `db` folder keeps a copy of the data records in the `part-N.bin` and `cache_data.dat` files. Write-ahead log and checkpointing
files are not added to the snapshot because they are not required for the restore procedure.
* The `binary_meta` and `marshaller` directories store metadata and marshaller-specific information.
[NOTE]
====
[discrete]
=== A Snapshot Is Usually Spread Across the Cluster
The previous example shows a snapshot created for a cluster running on a single physical machine, so the whole
snapshot is located in one place. In practice, the nodes run on different machines, and the snapshot data is spread
across the cluster. Each node keeps a segment of the snapshot with the data belonging to that particular node.
The link:persistence/snapshots#restoring-from-snapshot[restore procedure] explains how to stitch all the segments together during recovery.
====
== Configuring Snapshot Directory
By default, a segment of the snapshot is stored in the work directory of the respective Ignite node, on the same storage
media where Ignite Persistence keeps its data, index, WAL, and other files. Since a snapshot can consume as much space as
the persistence files already take, and can affect your applications' performance by sharing disk I/O with the
Ignite Persistence routines, it is recommended to store the snapshot and the persistence files on different media.
You can avoid this interference between Ignite Persistence and snapshotting
by either changing the link:persistence/native-persistence#configuring-persistent-storage-directory[storage directories of the persistence files]
or overriding the default snapshot location as shown below:
[tabs]
--
tab:XML[]
[source, xml]
----
include::code-snippets/xml/snapshots.xml[tags=ignite-config;!discovery, indent=0]
----
tab:Java[]
[source, java]
----
include::{javaCodeDir}/Snapshots.java[tags=config, indent=0]
----
--
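For quick reference, the override can also be sketched directly in Java. This is a minimal sketch, assuming the `IgniteConfiguration#setSnapshotPath` setter available in snapshot-enabled Ignite versions; the path below is a hypothetical example:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SnapshotPathConfig {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Store snapshot segments on a dedicated drive, separate from the
        // media used by Ignite Persistence (hypothetical mount point).
        cfg.setSnapshotPath("/mnt/backup-drive/ignite/snapshots");

        Ignite ignite = Ignition.start(cfg);
    }
}
```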
== Creating Snapshot
Ignite provides several APIs for snapshot creation. Let's review the options.
=== Using Control Script
Ignite ships the link:control-script[control script] that supports snapshots-related commands listed below:
[source,shell]
----
# Create a cluster snapshot:
control.(sh|bat) --snapshot create snapshot_name

# Cancel a running snapshot:
control.(sh|bat) --snapshot cancel snapshot_name

# Kill a running snapshot:
control.(sh|bat) --kill SNAPSHOT snapshot_name
----
=== Using JMX
Use the `SnapshotMXBean` interface to perform the snapshot-specific procedures via JMX:
[cols="1,1",opts="header"]
|===
|Method | Description
|createSnapshot(String snpName) | Create a snapshot.
|cancelSnapshot(String snpName) | Cancel a snapshot on the node that initiated its creation.
|===
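A rough sketch of invoking the bean over a remote JMX connection follows. The JMX port and the `ObjectName` query pattern are assumptions; Ignite registers its MBeans under names that include the instance name, so check the exact names with a tool such as `jconsole` first:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SnapshotOverJmx {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint of one of the server nodes.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:49112/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Look the bean up with a wildcard query instead of hard-coding
            // the full ObjectName (the group name below is an assumption).
            for (ObjectName name : mbs.queryNames(
                    new ObjectName("*:group=Snapshot,*"), null)) {
                // Same signature as in the table above.
                mbs.invoke(name, "createSnapshot",
                        new Object[] {"backup23012020"},
                        new String[] {String.class.getName()});
            }
        }
    }
}
```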
=== Using Java API
It is also possible to create a snapshot programmatically in Java:
[tabs]
--
tab:Java[]
[source, java]
----
include::{javaCodeDir}/Snapshots.java[tags=create, indent=0]
----
--
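For example, assuming a node of a persistence-enabled cluster has been started, a cluster snapshot can be taken through the `IgniteSnapshot` facade; the snapshot name and the configuration path below are illustrative:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class CreateSnapshot {
    public static void main(String[] args) {
        // Start (or connect to) a node of a persistence-enabled cluster;
        // the configuration path is a hypothetical example.
        Ignite ignite = Ignition.start("config/ignite-config.xml");

        // Trigger a cluster-wide snapshot and block until it completes.
        ignite.snapshot().createSnapshot("backup23012020").get();
    }
}
```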
== Restoring From Snapshot
Currently, the restore procedure has to be performed manually. In a nutshell, you need to stop the cluster,
replace the persistence data and other files with the data from the snapshot, and restart the nodes.
The detailed procedure looks as follows:
. Stop the cluster you intend to restore.
. Remove all files from the checkpoint directory: `$IGNITE_HOME/work/cp`.
. Do the following on each node (clean the
link:persistence/native-persistence#configuring-persistent-storage-directory[`db/{node_id}`] directory separately if
it is not located under the Ignite `work` directory):
- Remove the files related to the `{node_id}` from the `$IGNITE_HOME/work/db/binary_meta` directory.
- Remove the files related to the `{node_id}` from the `$IGNITE_HOME/work/db/marshaller` directory.
- Remove the files and sub-directories related to the `{node_id}` under the `$IGNITE_HOME/work/db` directory.
- Copy the files belonging to the node with the `{node_id}` from the snapshot into the `$IGNITE_HOME/work/` directory.
If the `db/{node_id}` directory is not located under the Ignite `work` directory, copy the data files there instead.
. Restart the cluster.
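The per-node steps can be sketched in shell. This is a self-contained illustration that uses a temporary directory to stand in for a real deployment; the node ID `node1` and the snapshot name `backup23012020` mirror the example layout earlier on this page:

```shell
set -e
# A temporary directory stands in for $IGNITE_HOME of a real node.
IGNITE_HOME=$(mktemp -d)
NODE_ID=node1
SNAP="$IGNITE_HOME/work/snapshots/backup23012020"

# --- Fake pre-existing state: a stale checkpoint, stale data files,
# --- and a snapshot segment for this node.
mkdir -p "$IGNITE_HOME/work/cp" \
         "$IGNITE_HOME/work/db/binary_meta/$NODE_ID" \
         "$IGNITE_HOME/work/db/marshaller/$NODE_ID" \
         "$IGNITE_HOME/work/db/$NODE_ID/my-sample-cache"
touch "$IGNITE_HOME/work/cp/stale-checkpoint" \
      "$IGNITE_HOME/work/db/$NODE_ID/my-sample-cache/part-9.bin"
mkdir -p "$SNAP/db/binary_meta/$NODE_ID" \
         "$SNAP/db/marshaller/$NODE_ID" \
         "$SNAP/db/$NODE_ID/my-sample-cache"
touch "$SNAP/db/$NODE_ID/my-sample-cache/cache_data.dat" \
      "$SNAP/db/$NODE_ID/my-sample-cache/part-3.bin"

# Step: remove all files from the checkpoint directory.
rm -rf "$IGNITE_HOME/work/cp/"*

# Step: remove this node's metadata and data files.
rm -rf "$IGNITE_HOME/work/db/binary_meta/$NODE_ID" \
       "$IGNITE_HOME/work/db/marshaller/$NODE_ID" \
       "$IGNITE_HOME/work/db/$NODE_ID"

# Step: copy the node's snapshot segment back into the work directory.
cp -R "$SNAP/db/binary_meta/$NODE_ID" "$IGNITE_HOME/work/db/binary_meta/"
cp -R "$SNAP/db/marshaller/$NODE_ID"  "$IGNITE_HOME/work/db/marshaller/"
cp -R "$SNAP/db/$NODE_ID"             "$IGNITE_HOME/work/db/"
```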
*Restoring on a Cluster of a Different Topology*
Sometimes you might want to create a snapshot of an N-node cluster and restore it on an M-node cluster. The table
below explains which options are supported:
[cols="1,1",opts="header"]
|===
|Condition | Description
|N == M | The *recommended* case. Create and restore the snapshot on clusters of the same topology.
|N < M | Start the first N nodes of the M-node cluster and apply the snapshot. Then add the remaining nodes to
the topology and wait until the data is rebalanced and the indexes are rebuilt.
|N > M | Unsupported.
|===
== Consistency Guarantees
All snapshots are fully consistent with respect to concurrent cluster-wide operations as well as ongoing changes to the Ignite
Persistence data, index, schema, binary metadata, marshaller, and other files on the nodes.
Cluster-wide snapshot consistency is achieved by triggering the link:https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood[Partition-Map-Exchange]
procedure. As a result, the cluster eventually reaches a point in time when all previously started transactions are completed and new
ones are paused. Once this happens, the cluster initiates the snapshot creation procedure. The PME procedure ensures
that the snapshot includes both the primary and backup copies of partitions in a consistent state.
The consistency between the Ignite Persistence files and their snapshot copies is achieved by copying the original
files to the destination snapshot directory while tracking all concurrent ongoing changes. This tracking
might require extra space on the Ignite Persistence storage media (up to 1x the size of the storage).
== Current Limitations
The snapshot procedure has some limitations that you should be aware of before using the feature in your production environment:
* Snapshotting of specific caches/tables is not supported. You always create a full cluster snapshot.
* Caches/tables that are not persisted with Ignite Persistence are not included in the snapshot.
* Encrypted caches are not included in the snapshot.
* Only one snapshot operation can run at a time.
* The snapshot procedure is interrupted if a server node leaves the cluster.
* A snapshot can be restored only on a cluster with the same topology and the same node IDs.
* The automatic restore procedure is not available yet; you have to restore snapshots manually.
If any of these limitations prevent you from using Apache Ignite snapshots, consider alternative snapshotting
implementations for Ignite provided by enterprise vendors.