| // Licensed to the Apache Software Foundation (ASF) under one or more |
| // contributor license agreements. See the NOTICE file distributed with |
| // this work for additional information regarding copyright ownership. |
| // The ASF licenses this file to You under the Apache License, Version 2.0 |
| // (the "License"); you may not use this file except in compliance with |
| // the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| = Troubleshooting and Debugging |
| |
| This article covers some common tips and tricks for debugging and troubleshooting Ignite deployments. |
| |
| == Debugging Tools: Consistency Check Command |
| |
| The `./control.sh|bat` utility includes a set of link:tools/control-script#consistency-check-commands[consistency check commands] |
| that help with verifying internal data consistency invariants. |
| |
| == Persistence Files Disappear on Restart |
| |
| On some systems, the default location for Ignite persistence files may be under a `temp` folder. In that case, the operating system may delete the persistence files whenever the node process is restarted. To avoid this: |
| |
| * Ensure that the `WARN` logging level is enabled for Ignite. Ignite logs a warning if the persistence files are written to a temporary directory. |
| * Change the location of all persistence files using the `DataStorageConfiguration` APIs, such as `setStoragePath(...)`, |
| `setWalPath(...)`, and `setWalArchivePath(...)`. |
| |
| == Cluster Doesn't Start After Field Type Changes |
| |
| When developing your application, you may need to change the type of a custom |
| object’s field. For instance, let’s say you have object `A` with field `A.range` of |
| `int` type and then you decide to change the type of `A.range` to `long` right in |
| the source code. When you do this, the cluster or the application will fail to |
| restart because Ignite doesn't support field/column type changes. |
| |
| When this happens _and you are still in development_, you need to go into the |
| file system and remove the following directories: `marshaller/`, `db/`, and `wal/` |
| located in the Ignite working directory (`db` and `wal` might be located in other |
| places if you have redefined their location). Note that this deletes the data persisted on the node. |
| |
| However, if you are _in production_, do not change field types. Instead, add a |
| new field with a different name to your object model and remove the old one. This operation is fully |
| supported. Likewise, the `ALTER TABLE` command can be used to add new |
| columns or remove existing ones at run time. |
| |
| == Debugging GC Issues |
| |
| This section contains information that may be helpful when you need to debug and |
| troubleshoot issues related to Java heap usage or GC pauses. |
| |
| === Heap Dumps |
| |
| If the JVM throws `OutOfMemoryError`, configure it to dump the heap automatically the next time the error occurs. |
| This helps when the root cause of the error is unclear and you need a closer look at the heap state at the moment of failure: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| -XX:+HeapDumpOnOutOfMemoryError |
| -XX:HeapDumpPath=/path/to/heapdump |
| -XX:OnOutOfMemoryError="kill -9 %p" |
| -XX:+ExitOnOutOfMemoryError |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |
| |
| === Detailed GC Logs |
| |
| To capture detailed information about GC-related activity, add the options below |
| to the JVM settings of your cluster nodes: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| -XX:+PrintGCDetails |
| -XX:+PrintGCTimeStamps |
| -XX:+PrintGCDateStamps |
| -XX:+UseGCLogFileRotation |
| -XX:NumberOfGCLogFiles=10 |
| -XX:GCLogFileSize=100M |
| -Xloggc:/path/to/gc/logs/log.txt |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |
| |
| Replace `/path/to/gc/logs/` with an actual path on your file system. |
| |
| In addition, if you use the G1 collector, set the option below. It provides many additional details that are |
| purposely not included in the `-XX:+PrintGCDetails` output: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| -XX:+PrintAdaptiveSizePolicy |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |
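| |
| The GC logging options in this section use the JDK 8 flag names. On JDK 9 and later, these flags were superseded by |
| unified JVM logging (`-Xlog`). A roughly equivalent configuration, assuming such a runtime and the same placeholder |
| path as above, might look like this: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| -Xlog:gc*:file=/path/to/gc/logs/log.txt:time,uptime,level,tags:filecount=10,filesize=100M |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |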
| |
| === Performance Analysis With Flight Recorder |
| |
| When you need to debug performance or memory issues, you can use Java Flight Recorder to continuously |
| collect low-level runtime statistics, which enables after-the-fact incident analysis. To enable Java Flight Recorder, use the |
| following settings: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| -XX:+UnlockCommercialFeatures |
| -XX:+FlightRecorder |
| -XX:+UnlockDiagnosticVMOptions |
| -XX:+DebugNonSafepoints |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |
| |
| To start recording the state of a particular Ignite node, use the following command: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| jcmd <PID> JFR.start name=<recording_name> duration=60s filename=/var/recording/recording.jfr settings=profile |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |
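| |
| Once a recording is running, it can be inspected, dumped, or stopped with the other `JFR.*` diagnostic commands of |
| `jcmd`. The sketch below reuses the `<PID>` and recording name placeholders from the start command; the snapshot |
| file name is only an illustration: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| # List the recordings that are currently active in the target JVM |
| jcmd <PID> JFR.check |
| # Write the data collected so far to a file without stopping the recording |
| jcmd <PID> JFR.dump name=<recording_name> filename=/var/recording/snapshot.jfr |
| # Stop the recording before its configured duration elapses |
| jcmd <PID> JFR.stop name=<recording_name> |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |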
| |
| For more details on Flight Recorder, refer to Oracle's official documentation. |
| |
| === JVM Pauses |
| |
| Occasionally you may see a warning message about the JVM being paused for too long. This can happen during bulk loading, for example. |
| |
| Raising the `IGNITE_JVM_PAUSE_DETECTOR_THRESHOLD` value may give the process time to finish without generating the warning. You can set the threshold via an environment variable, pass it as a JVM argument (`-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000`), or pass it as a parameter to `ignite.sh` (`-J-DIGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000`). |
| |
| The value is in milliseconds. |
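| |
| As an illustration, the environment variable form might look like the sketch below. It assumes a Unix-like shell and |
| a node started with `bin/ignite.sh` from the Ignite installation directory; adjust the launch command to your deployment: |
| |
| ++++ |
| <code-tabs> |
| <code-tab data-tab="Shell"> |
| ++++ |
| [source,shell] |
| ---- |
| # Raise the pause detector threshold to 5000 ms (5 seconds) |
| export IGNITE_JVM_PAUSE_DETECTOR_THRESHOLD=5000 |
| # Start the node from the same shell so that it inherits the variable |
| bin/ignite.sh |
| ---- |
| ++++ |
| </code-tab> |
| </code-tabs> |
| ++++ |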
| |