 ---
 title:  Starting Up and Shutting Down Your System
 ---

 <!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->

 Determine the proper startup and shutdown procedures, and write your startup and shutdown scripts.

 Well-designed procedures for starting and stopping your system can speed startup and protect your data. The processes you need to start and stop include server and locator processes and your other <%=vars.product_name%> applications, including clients. The procedures you use depend in part on your system’s configuration and the dependencies between your system processes.

 Use the following guidelines to create startup and shutdown procedures and scripts. Some of these instructions use [`gfsh`](../../tools_modules/gfsh/chapter_overview.html).

 ## <a id="starting_up_shutting_down__section_3D111558326D4A38BE48C17D44BB66DB" class="no-quick-link"></a>Starting Up Your System

 Follow these ordering guidelines when starting your <%=vars.product_name%> system.

 Start servers before you start their client applications. In each cluster, follow these guidelines for member startup:

 -   Start locators first. See [Running <%=vars.product_name%> Locator Processes](running_the_locator.html) for examples of locator startup commands.
 -   Start cache servers before the rest of your processes unless the implementation requires that other processes be started ahead of them. See [Running <%=vars.product_name%> Server Processes](running_the_cacheserver.html) for examples of server startup commands.
 -   If your cluster uses both persistent replicated and non-persistent replicated regions, you should start up all the persistent replicated members in parallel before starting the non-persistent regions. This way, persistent members will not delay their startup for other persistent members with later data.
 -   For a system that includes persistent regions, see [Start Up and Shut Down with Disk Stores](../../managing/disk_storage/starting_system_with_disk_stores.html).
 -   If you are running producer processes and consumer or event listener processes, start the consumers first. This ensures the consumers and listeners do not miss any notifications or updates.
 -   If you are starting up your locators and peer members all at once, you can set the `locator-wait-time` property (in seconds) at process startup. This timeout allows peers to wait for the locators to finish starting up before attempting to join the cluster. If a process has been configured to wait for a locator to start, it logs an info-level message:

     > `GemFire startup was unable to contact a locator. Waiting for one to start. Configured locators are frodo[12345],pippin[12345].`

     The process then sleeps for one second and retries until it either connects or the number of seconds specified in `locator-wait-time` has elapsed. By default, `locator-wait-time` is set to zero, meaning that a process that cannot connect to a locator upon startup throws an exception.
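The ordering guidelines above can be captured in a small startup script. The following is a minimal sketch, not a definitive procedure: the member names, the locator port, and the 120-second wait are hypothetical placeholders, and passing `locator-wait-time` as a `gemfire.`-prefixed Java system property through `--J` is an assumption about how your members are configured (the property can also be set in a properties file). By default the script only prints the `gfsh` invocations so that you can review them.

```shell
#!/bin/sh
# Sketch of an ordered startup script.
# All names, the port, and the wait time are hypothetical placeholders.
LOCATOR_NAME="locator1"
LOCATOR_PORT="10334"
SERVER_NAMES="server1 server2"
LOCATOR_WAIT_SECONDS="120"

# Print the gfsh commands in the recommended order: locator first, then servers.
# Assumption: locator-wait-time is passed as a gemfire.-prefixed system property.
# Servers that share a host also need distinct --server-port values (not shown).
startup_commands() {
  echo "start locator --name=${LOCATOR_NAME} --port=${LOCATOR_PORT}"
  for name in ${SERVER_NAMES}; do
    echo "start server --name=${name} --locators=localhost[${LOCATOR_PORT}] --J=-Dgemfire.locator-wait-time=${LOCATOR_WAIT_SECONDS}"
  done
}

# Dry run by default: print each gfsh invocation without executing it.
# Set RUN_GFSH=1 to execute (requires gfsh on the PATH).
run_startup() {
  startup_commands | while IFS= read -r cmd; do
    if [ "${RUN_GFSH:-0}" = "1" ]; then
      # Redirect stdin so gfsh cannot consume the remaining commands.
      gfsh -e "${cmd}" </dev/null || exit 1
    else
      echo "gfsh -e \"${cmd}\""
    fi
  done
}

run_startup
```

This sketch starts members one after another. If your cluster has persistent replicated members, starting those members in parallel, as recommended above, would need a different loop.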

 **Note:**
 You can optionally override the default timeout period for shutting down individual processes. This override setting must be specified during member startup. See [Option for System Member Shutdown Behavior](#starting_up_shutting_down__section_7CF680CF8A924C57A7052AE2F975DA81) for details.

 ## <a id="starting_up_shutting_down__section_2F8ABBFCE641463C8A8721841407993D" class="no-quick-link"></a>Starting Up After Losing Data on Disk

 This information pertains to catastrophic loss of <%=vars.product_name%> disk store files. If you lose disk store files, your next startup may hang, waiting for the lost disk stores to come back online. If your system hangs at startup, use the `gfsh` command `show missing-disk-stores` to list missing disk stores and, if needed, use `revoke missing-disk-store` to revoke them so your system startup can complete. You must use the Disk Store ID to revoke a disk store. These are the two commands:

 ``` pre
 gfsh>show missing-disk-stores

            Disk Store ID             |   Host    |               Directory
 ------------------------------------ | --------- | -------------------------------------
 60399215-532b-406f-b81f-9b5bd8d1b55a | excalibur | /usr/local/gemfire/deploy/disk_store1

 gfsh>revoke missing-disk-store --id=60399215-532b-406f-b81f-9b5bd8d1b55a
 ```

 **Note:**
 These `gfsh` commands require a connection to the cluster through a JMX Manager node.
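When several disk stores are missing, copying each Disk Store ID by hand is error-prone. As a hedged sketch, the following shell function reads `show missing-disk-stores` output and prints one `revoke missing-disk-store` command per ID. It assumes the tabular layout shown above (pipe-separated columns with the ID in the first column), and the function name `revoke_commands` is hypothetical. It only prints commands and does not run them.

```shell
#!/bin/sh
# Read `show missing-disk-stores` output on stdin and print one revoke
# command per Disk Store ID. Assumes pipe-separated columns with the ID first.
revoke_commands() {
  awk -F'|' '
    # Keep only rows whose first column looks like a hyphenated hex ID.
    $1 ~ /^[ \t]*[0-9a-f]+(-[0-9a-f]+)+[ \t]*$/ {
      gsub(/[ \t]/, "", $1)
      print "revoke missing-disk-store --id=" $1
    }
  '
}
```

For example, saving the command output to a file and running `revoke_commands < missing.txt` prints the candidate commands. Review the list before running any of them, because revoking tells the cluster to stop waiting for that disk store.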

 ## <a id="starting_up_shutting_down__section_mnx_4cp_cv" class="no-quick-link"></a>Shutting Down the System

 Shut down your <%=vars.product_name%> system either by using the `gfsh` `shutdown` command or by shutting down individual members one at a time.

 ## <a id="starting_up_shutting_down__section_0EB4DDABB6A348BA83B786EEE7C84CF1" class="no-quick-link"></a>Using the shutdown Command

 If you are using persistent regions (members are persisting data to disk), use the `gfsh` `shutdown` command to stop the running system in an orderly fashion. This command synchronizes persistent partitioned regions before shutting down, which makes the next startup of the cluster as efficient as possible.

 If possible, all members should be running before you shut them down so synchronization can occur. Shut down the system using the following `gfsh` command:

 ``` pre
 gfsh>shutdown
 ```

 By default, the shutdown command will only shut down data nodes. If you want to shut down all nodes including locators, specify the `--include-locators=true` parameter. For example:

 ``` pre
 gfsh>shutdown --include-locators=true
 ```

 This will shut down all locators one by one, shutting down the manager last.

 To shut down all data members after a grace period, specify the `--time-out` option (in seconds):

 ``` pre
 gfsh>shutdown --time-out=60
 ```

 To shut down all members, including locators, after a grace period, combine `--include-locators=true` with the `--time-out` option (in seconds):

 ``` pre
 gfsh>shutdown --include-locators=true --time-out=60
 ```
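For scripted shutdowns, the options above can be combined in a single non-interactive `gfsh` invocation. The following is a minimal sketch, assuming a locator reachable at the hypothetical address `localhost[10334]`. The function name and its arguments are illustrative only.

```shell
#!/bin/sh
# Hypothetical locator address; replace with one of your own locators.
LOCATOR="localhost[10334]"

# Build a shutdown command.
#   $1: "true" to stop locators as well (default "false")
#   $2: optional grace period in seconds
shutdown_command() {
  cmd="shutdown"
  if [ "${1:-false}" = "true" ]; then
    cmd="${cmd} --include-locators=true"
  fi
  if [ -n "${2:-}" ]; then
    cmd="${cmd} --time-out=${2}"
  fi
  echo "${cmd}"
}

# Print the full gfsh invocation; remove the leading echo to execute it.
echo gfsh -e "connect --locator=${LOCATOR}" -e "$(shutdown_command true 60)"
```

Keeping the command construction in one function lets a script use the same code path for a data-members-only shutdown (`shutdown_command false 60`) and a full cluster shutdown.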

 ## <a id="starting_up_shutting_down__section_A07D40BC118544D0984860A3B4A5CB29" class="no-quick-link"></a>Shutting Down System Members Individually

 If you are not using persistent regions, you can shut down the cluster by shutting down each member in the reverse order of their startup. (See [Starting Up Your System](#starting_up_shutting_down__section_3D111558326D4A38BE48C17D44BB66DB) for the recommended order of member startup.)

 Shut down the cluster members according to the type of member. For example, use the following mechanisms to shut down members:

 -   Use the appropriate mechanism to shut down any <%=vars.product_name%>-connected client applications that are running in the cluster.
 -   Shut down any cache servers. To shut down a server, issue the following `gfsh` command:

     ``` pre
     gfsh>stop server --name=<...>
     ```

     or

     ``` pre
     gfsh>stop server --dir=<server_working_dir>
     ```

 -   Shut down any locators. To shut down a locator, issue the following `gfsh` command:

     ``` pre
     gfsh>stop locator --name=<...>
     ```

     or

     ``` pre
     gfsh>stop locator --dir=<locator_working_dir>
     ```

 -   Do not use the command line `kill -9` to shut down a server under
 ordinary circumstances.
 Especially on systems with a small number of members,
 using `kill` instead of `gfsh stop` can cause the partition detection
 mechanism to place the system in an end state that waits forever to
 reconnect to the killed server,
 with no way to restart that killed server.
 If a `kill` command appears to be the only way to rid the system of a server,
 then `kill` *all* the processes of the cluster
 or use `kill -INT`, which allows an orderly shutdown of the process.
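The reverse-order rule can be sketched in shell as follows. This is an illustration under stated assumptions: the `STARTUP_ORDER` list and its `kind:name` entries are a hypothetical record of how the members were started, and the function only prints the `gfsh` stop commands. Client applications are stopped first by their own mechanisms and are not covered here.

```shell
#!/bin/sh
# Hypothetical record of the startup order: locators first, then servers.
STARTUP_ORDER="locator:locator1 server:server1 server:server2"

# Print gfsh stop commands in the reverse of the startup order.
stop_commands() {
  reversed=""
  for member in ${STARTUP_ORDER}; do
    # Prepend each entry so the last member started comes first.
    reversed="${member} ${reversed}"
  done
  for member in ${reversed}; do
    kind="${member%%:*}"   # text before the colon: "locator" or "server"
    name="${member#*:}"    # text after the colon: the member name
    echo "stop ${kind} --name=${name}"
  done
}

stop_commands
```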

 ## <a id="starting_up_shutting_down__section_7CF680CF8A924C57A7052AE2F975DA81" class="no-quick-link"></a>Option for System Member Shutdown Behavior

 The `DISCONNECT_WAIT` system property sets the maximum time for each individual step in the shutdown process. If any step takes longer than the specified amount, it is forced to end. Each operation is given this grace period, so the total length of time the cache member takes to shut down depends on the number of operations and the `DISCONNECT_WAIT` setting. During the shutdown process, <%=vars.product_name%> produces messages such as:

 ``` pre
 Disconnect listener still running
 ```

 The `DISCONNECT_WAIT` default is 10000 milliseconds.

 To change it, set this system property on the Java command line used for member startup. For example:

 ``` pre
 gfsh>start server --J=-DDistributionManager.DISCONNECT_WAIT=<milliseconds>
 ```

 Each process can have different `DISCONNECT_WAIT` settings.
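Because each shutdown step gets its own grace period, the worst-case shutdown time for a member grows with the number of steps. The following is a small arithmetic sketch. The step count is a hypothetical input, since the number of steps depends on the member, and the function name is illustrative.

```shell
#!/bin/sh
# Upper bound on shutdown time in milliseconds.
#   $1: number of shutdown steps (hypothetical; depends on the member)
#   $2: DISCONNECT_WAIT in milliseconds (defaults to 10000)
max_shutdown_ms() {
  echo $(( $1 * ${2:-10000} ))
}

max_shutdown_ms 5         # 5 steps at the default wait
max_shutdown_ms 5 2000    # 5 steps with DISCONNECT_WAIT=2000
```

With five steps, the default setting allows up to 50000 milliseconds in the worst case, while a `DISCONNECT_WAIT` of 2000 caps the same member at 10000 milliseconds.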