// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Troubleshooting and Debugging
This article covers some common tips and tricks for debugging and troubleshooting Ignite deployments.
== Debugging Tools: Consistency Check Command
The `./control.sh|bat` utility includes a set of link:tools/control-script#consistency-check-commands[consistency check commands]
that help with verifying internal data consistency invariants.
== Persistence Files Disappear on Restart
On some systems, the default location for Ignite persistence files might be under a `temp` folder. As a result, the operating system may remove the persistence files whenever a node process is restarted. To avoid this:
* Ensure that the `WARN` logging level is enabled for Ignite. You will see a warning if the persistence files are written to the temporary directory.
* Change the location of all persistence files using the `DataStorageConfiguration` APIs, such as `setStoragePath(...)`,
`setWalPath(...)`, and `setWalArchivePath(...)`.
== Cluster Doesn't Start After Field Type Changes
When developing your application, you may need to change the type of a custom
object’s field. For instance, let’s say you have object `A` with field `A.range` of
`int` type and then you decide to change the type of `A.range` to `long` right in
the source code. When you do this, the cluster or the application will fail to
restart because Ignite doesn't support field/column type changes.
When this happens _and you are still in development_, you need to go into the
file system and remove the following directories: `marshaller/`, `db/`, and `wal/`
located in the Ignite working directory (`db` and `wal` might be located in other
places if you have redefined their location).
However, if you are _in production_, then instead of changing field types, add a
new field with a different name to your object model and remove the old one. This operation is fully
supported. Likewise, you can use the `ALTER TABLE` command to add new
columns or remove existing ones at run time.
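For the development-only cleanup described above, a minimal shell sketch might look like the following. The `clean_ignite_dev_state` helper is hypothetical and not part of Ignite. It assumes that `marshaller/`, `db/`, and `wal/` live directly under the work directory passed as its argument and that every node using that directory has been stopped. It permanently deletes persisted data, so never run it against a production work directory.
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
# Hypothetical helper (not part of Ignite): deletes development-time state
# so that a cluster can start again after a field type change.
# WARNING: this permanently removes persisted data. Stop all nodes first.
clean_ignite_dev_state() {
  work_dir="${1:-}"
  if [ -z "$work_dir" ] || [ ! -d "$work_dir" ]; then
    echo "usage: clean_ignite_dev_state <ignite-work-dir>" >&2
    return 1
  fi
  # Assumes the default layout; adjust if db/ or wal/ were relocated.
  for dir in marshaller db wal; do
    rm -rf -- "${work_dir:?}/${dir}"
  done
}
----
++++
</code-tab>
</code-tabs>
++++
For example, `clean_ignite_dev_state "$IGNITE_HOME/work"` would apply, assuming your work directory is at that location.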
== Debugging GC Issues
This section contains information that may be helpful when you need to debug and
troubleshoot issues related to Java heap usage or GC pauses.
=== Heap Dumps
If the JVM throws `OutOfMemoryError`, configure it to dump the heap automatically the next time the error occurs.
This helps when the root cause of the error is unclear and you need a closer look at the heap state at the moment of failure:
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/path/to/heapdump
-XX:OnOutOfMemoryError="kill -9 %p"
-XX:+ExitOnOutOfMemoryError
----
++++
</code-tab>
</code-tabs>
++++
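One way to apply these options is to append them to the `JVM_OPTS` environment variable before launching a node. The sketch below assumes that your startup script reads `JVM_OPTS` and that `IGNITE_HOME` points to your installation. `/path/to/heapdump` remains a placeholder to replace with a real directory.
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
# Append the heap dump options to any JVM options that are already set.
JVM_OPTS="${JVM_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/heapdump"
export JVM_OPTS
# Start the node only if IGNITE_HOME points to an installation (assumption).
if [ -x "${IGNITE_HOME:-}/bin/ignite.sh" ]; then
  "${IGNITE_HOME}/bin/ignite.sh"
fi
----
++++
</code-tab>
</code-tabs>
++++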
=== Detailed GC Logs
To capture detailed information about GC-related activity, make sure the settings below are configured
in the JVM settings of your cluster nodes:
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=100M
-Xloggc:/path/to/gc/logs/log.txt
----
++++
</code-tab>
</code-tabs>
++++
Replace `/path/to/gc/logs/` with an actual path on your file system.
In addition, for the G1 collector, set the property below. It provides many additional details that are
purposefully not included in the `-XX:+PrintGCDetails` setting:
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
-XX:+PrintAdaptiveSizePolicy
----
++++
</code-tab>
</code-tabs>
++++
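The flags in this section use the JDK 8 option names. JDK 9 and later replaced them with unified JVM logging (`-Xlog`), so on newer JVMs a roughly equivalent file-and-rotation setup might look like the line below. It is offered as a sketch rather than a tuned recommendation, and the path is the same placeholder used above.
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
-Xlog:gc*:file=/path/to/gc/logs/log.txt:time,uptime,level,tags:filecount=10,filesize=100M
----
++++
</code-tab>
</code-tabs>
++++
Here the `time` and `uptime` decorators add wall-clock and JVM-uptime stamps to each line, while `filecount` and `filesize` configure log rotation.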
=== Performance Analysis With Flight Recorder
When you need to debug performance or memory issues, you can use Java Flight Recorder to continuously
collect low-level runtime statistics, enabling after-the-fact incident analysis. To enable Java Flight Recorder, use the
following settings:
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder
-XX:+UnlockDiagnosticVMOptions
-XX:+DebugNonSafepoints
----
++++
</code-tab>
</code-tabs>
++++
To start recording the state on a particular Ignite node, use the following command:
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
jcmd <PID> JFR.start name=<recording_name> duration=60s filename=/var/recording/recording.jfr settings=profile
----
++++
</code-tab>
</code-tabs>
++++
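Once a recording is running, you might want to confirm that it is active and write out its data before the configured duration ends. The sketch below wraps the standard `jcmd` commands `JFR.check` and `JFR.dump`. The `jfr_snapshot` function name and its arguments are hypothetical, and the sketch assumes that `jcmd` from the same JDK as the node is on the `PATH`.
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
# Hypothetical helper: checks recordings on a JVM, then dumps one to a file.
jfr_snapshot() {
  pid="${1:-}"
  name="${2:-}"
  out="${3:-}"
  if [ -z "$pid" ] || [ -z "$name" ] || [ -z "$out" ]; then
    echo "usage: jfr_snapshot <pid> <recording_name> <output.jfr>" >&2
    return 1
  fi
  # List the recordings currently known to the JVM.
  jcmd "$pid" JFR.check || return 1
  # Write the data collected so far without stopping the recording.
  jcmd "$pid" JFR.dump name="$name" filename="$out"
}
----
++++
</code-tab>
</code-tabs>
++++
Running `jcmd -l` lists the local JVMs with their PIDs. A call could then look like `jfr_snapshot 12345 my_recording /var/recording/snapshot.jfr`, where the PID, the recording name, and the output file are placeholders.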
For more details on Flight Recorder, refer to Oracle's official documentation.
=== JVM Pauses
Occasionally you may see a warning message about the JVM being paused for too long. This can happen during bulk loading, for example.
Adjusting the `IGNITE_JVM_PAUSE_DETECTOR_THRESHOLD` timeout setting may give the process time to finish without generating the warning. You can set the threshold via an environment variable, pass it as a JVM argument (`-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000`), or pass it as a parameter to `ignite.sh` (`-J-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000`).
The value is in milliseconds.
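The launch options described above can be sketched as follows. The 5000 ms value is only the example threshold used in this section, and the `IGNITE_HOME` location is an assumption about your installation.
++++
<code-tabs>
<code-tab data-tab="Shell">
++++
[source,shell]
----
# Option 1: set the threshold (in milliseconds) through an environment variable.
export IGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000
# Option 2 (alternative): pass it to ignite.sh as a JVM system property, for example:
#   "${IGNITE_HOME}/bin/ignite.sh" -J-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000
# Start the node only if IGNITE_HOME points to an installation (assumption).
if [ -x "${IGNITE_HOME:-}/bin/ignite.sh" ]; then
  "${IGNITE_HOME}/bin/ignite.sh"
fi
----
++++
</code-tab>
</code-tabs>
++++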