| --- |
| layout: website-normal |
| title: Brooklyn Slow or Unresponsive |
| toc: /guide/toc.json |
| --- |
| |
| There are many possible causes for a Brooklyn server becoming slow or unresponsive. This guide |
| describes some possible reasons, and some commands and tools that can help diagnose the problem. |
| |
| Possible reasons include: |
| |
| * CPU is max'ed out |
| * Memory usage is extremely high |
| * SSH'ing is very slow due (e.g. due to lack of entropy) |
| * Out of disk space |
| |
| See [Brooklyn Requirements](/guide/ops/requirements.md) for details of server |
| requirements. |
| |
| |
| ## Machine Diagnostics |
| |
| The following commands will collect OS-level diagnostics about the machine, and about the Brooklyn |
| process. The commands below assume use of CentOS 6.x. Minor adjustments may be required for |
| other platforms. |
| |
| |
| #### OS and Machine Details |
| |
| To display system information, run: |
| |
| {% highlight bash %} |
| uname -a |
| {% endhighlight %} |
| |
| To show details of the CPU and memory available to the machine, run: |
| |
| {% highlight bash %} |
| cat /proc/cpuinfo |
| cat /proc/meminfo |
| {% endhighlight %} |
| |
| |
| #### User Limits |
| |
| To display information about user limits, run the command below (while logged in as the same user |
| who runs Brooklyn): |
| |
| {% highlight bash %} |
| ulimit -a |
| {% endhighlight %} |
| |
| If Brooklyn is run as a different user (e.g. with user name "adalovelace"), then instead run: |
| |
| {% highlight bash %} |
| ulimit -a -u adalovelace |
| {% endhighlight %} |
| |
| Of particular interest is the limit for "open files". |
| |
| See [Increase System Resource Limits](increase-system-resource-limits.md) |
| for more information. |
| |
| |
| #### Disk Space |
| |
| The command below will list the disk size for each partition, including the amount used and |
| available. If the Brooklyn base directory, persistence directory or logging directory are close |
| to 0% available, this can cause serious problems: |
| |
| {% highlight bash %} |
| df -h |
| {% endhighlight %} |
| |
| |
| #### CPU and Memory Usage |
| |
| To view the CPU and memory usage of all processes, and of the machine as a whole, one can use the |
| `top` command. This runs interactively, updating every few seconds. To collect the output once |
| (e.g. to share diagnostic information in a bug report), run: |
| |
| {% highlight bash %} |
| top -n 1 -b > top.txt |
| {% endhighlight %} |
| |
| |
| #### File and Network Usage |
| |
| To count the number of open files for the Brooklyn process (which includes open socket connections): |
| |
| {% highlight bash %} |
| BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin |
| BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) |
| lsof -p $BROOKLYN_PID | wc -l |
| {% endhighlight %} |
| |
| To count (or view the number of "established" internet connections, run: |
| |
| {% highlight bash %} |
| netstat -an | grep ESTABLISHED | wc -l |
| {% endhighlight %} |
| |
| |
| #### Linux Kernel Entropy |
| |
| A lack of entropy can cause random number generation to be extremely slow. This can cause |
| tasks like ssh to also be extremely slow. See |
| [Linux kernel entropy](increase-entropy.md) |
| for details of how to work around this. |
| |
| |
| #### SSHD Limits |
| |
| Apache Brooklyn will attempt to re-use the SSH connections to machines on a per-location basis, by default, |
| keeping sessions open for up to 5 minutes if the entity/location is managed. |
| If the same target is used via multiple `SshMachineLocation` instances |
| (such as through BYON or localhost), this may trigger SSHD throttling. |
| |
| This can be resolved by setting either `sshCacheExpiryDuration: 10s` or `brooklyn.ssh.config.close: true`, |
| on the location, as described [here](/guide/locations#ssh-low-level-configuration). |
| |
| It could also be resolved by increasing `MaxSessions` and `MaxStartups` in `sshd_config` on the target system. |
| More info on SSHD limits are documented [here](https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Load_Balancing). |
| |
| |
| |
| ## Process Diagnostics |
| |
| #### Thread and Memory Usage |
| |
| To get memory and thread usage for the Brooklyn (Java) process, two useful tools are `jstack` |
| and `jmap`. These require the "development kit" to also be installed |
| (e.g. `yum install java-1.8.0-openjdk-devel`). Some useful commands are shown below: |
| |
| {% highlight bash %} |
| BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin |
| BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) |
| |
| jstack $BROOKLYN_PID |
| jmap -histo:live $BROOKLYN_PID |
| jmap -heap $BROOKLYN_PID |
| {% endhighlight %} |
| |
| |
| #### Runnable Threads |
| |
| The [jstack-active](https://github.com/apache/brooklyn-dist/blob/master/scripts/jstack-active.sh) |
| script is a convenient light-weight way to quickly see which threads of a running Brooklyn |
| server are attempting to consume the CPU. It filters the output of `jstack`, to show only the |
| "really-runnable" threads (as opposed to those that are blocked). |
| |
| {% highlight bash %} |
| BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin |
| BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) |
| |
| curl -O https://raw.githubusercontent.com/apache/brooklyn-dist/master/scripts/jstack-active.sh |
| |
| jstack-active $BROOKLYN_PID |
| {% endhighlight %} |
| |
| |
| #### OSGi Resolution |
| |
| The Karaf OSGi subsystem can in some cases spend a lot of energy identifying which bundles should provide packages. |
| Often this can be resolved by removing redundant bundles. |
| To observe this, add the following to `etc/org.ops4j.pax.logging.cfg` (no restart normally needed): |
| |
| log4j2.logger.felix.name = org.apache.felix |
| log4j2.logger.felix.level = DEBUG |
| |
| To get a unique list of duplicate packages: `grep chains data/log/*.debug.log | sed 's/.*exposed to //' | sed 's/from .*//' | sort | uniq` |
| |
| |
| #### Profiling |
| |
| If an in-depth investigation of the CPU usage (and/or object creation) of a Brooklyn Server is |
| requiring, there are many profiling tools designed specifically for this purpose. These generally |
| require that the process be launched in such a way that a profiler can attach, which may not be |
| appropriate for a production server. |
| |
| |
| #### Remote Debugging |
| |
| If the Brooklyn Server was originally run to allow a remote debugger to connect (strongly |
| discouraged in production!), then this provides a convenient way to investigate why Brooklyn |
| is being slow or unresponsive. See the Debugging Tips in the |
| tip [Debugging Remote Brooklyn](/guide/dev/tips/debugging-remote-brooklyn.md) |
| and the [IDE docs](/guide/dev/env/ide) for more information. |
| |
| |
| ## Log Files |
| |
| Apache Brooklyn will by default create brooklyn.info.log and brooklyn.debug.log files. See the |
| [Logging](/guide/ops/logging/index.md) docs for more information. |
| |
| The following are useful log messages to search for (e.g. using `grep`). Note the wording of |
| these messages (or their very presence) may change in future version of Brooklyn. |
| |
| |
| #### Normal Logging |
| |
| The lines below are commonly logged, and can be useful to search for when finding the start of a section of logging. |
| |
| {% highlight text %} |
| 2016-05-30 17:05:51,458 INFO o.a.b.l.BrooklynWebServer [main]: Started Brooklyn console at http://127.0.0.1:8081/, running classpath://brooklyn.war |
| 2016-05-30 17:06:04,098 INFO o.a.b.c.m.h.HighAvailabilityManagerImpl [main]: Management node tF3GPvQ5 running as HA MASTER autodetected |
| 2016-05-30 17:06:08,982 INFO o.a.b.c.m.r.InitialFullRebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebinding from /home/compose/compose-amp-state/brooklyn-persisted-state/data for master rvpnFTeL... |
| 2016-05-30 17:06:11,105 INFO o.a.b.c.m.r.RebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebind complete (MASTER) in 2s: 19 apps, 54 entities, 50 locations, 46 policies, 704 enrichers, 0 feeds, 160 catalog items |
| {% endhighlight %} |
| |
| |
| #### Memory Usage |
| |
| The debug log includes (every minute) a log statement about the memory usage and task activity. For example: |
| |
| {% highlight text %} |
| 2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (before) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) |
| 2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) |
| {% endhighlight %} |
| |
| These can be extremely useful if investigating a memory or thread leak, or to determine whether a |
| surprisingly high number of tasks are being executed. |
| |
| |
| #### Subscriptions |
| |
| One source of high CPU in Brooklyn is when a subscription (e.g. for a policy or enricher) is being |
| triggered many times (i.e. handling many events). A log message like that below will be logged on |
| every 1000 events handled by a given single subscription. |
| |
| {% highlight text %} |
| 2016-05-30 17:29:09,125 DEBUG o.a.b.c.m.i.LocalSubscriptionManager [brooklyn-execmanager-rvpnFTeL-8]: 1000 events for subscriber Subscription[SCFnav9g;CanopyComposeApp{id=gIeTwhU2}@gIeTwhU2:webapp.url] |
| {% endhighlight %} |
| |
| If a subscription is handling a huge number of events, there are a couple of common reasons: |
| * first, it could be subscribing to too much activity - e.g. a wildcard subscription for all |
| events from all entities. |
| * second it could be an infinite loop (e.g. where an enricher responds to a sensor-changed event |
| by setting that same sensor, thus triggering another sensor-changed event). |
| |
| |
| #### User Activity |
| |
| All activity triggered by the REST API or web-console will be logged. Some examples are shown below: |
| |
| {% highlight text %} |
| 2016-05-19 17:52:30,150 INFO o.a.b.r.r.ApplicationResource [brooklyn-jetty-server-8081-qtp1058726153-17473]: Launched from YAML: name: My Example App |
| location: aws-ec2:us-east-1 |
| services: |
| - type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer |
| |
| 2016-05-30 14:46:19,516 DEBUG brooklyn.REST [brooklyn-jetty-server-8081-qtp1104967201-20881]: Request Tisj14 starting: POST /v1/applications/NiBy0v8Q/entities/NiBy0v8Q/expunge from 77.70.102.66 |
| {% endhighlight %} |
| |
| |
| #### Entity Activity |
| |
| If investigating the behaviour of a particular entity (e.g. on failure), it can be very useful to |
| `grep` the info and debug log for the entity's id. For a software process, the debug log will |
| include the stdout and stderr of all the commands executed by that entity. |
| |
| It can also be very useful to search for all effector invocations, to see where the behaviour |
| has been triggered: |
| |
| {% highlight text %} |
| 2016-05-27 12:45:43,529 DEBUG o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-gvP7MuZF-14364]: Invoking effector stop on TomcatServerImpl{id=mPujYmPd} |
| {% endhighlight %} |