There are many possible causes for a Brooklyn server becoming slow or unresponsive. This guide describes some possible reasons, and some commands and tools that can help diagnose the problem.
Possible reasons include:
See Brooklyn Requirements for details of server requirements.
The following commands will collect OS-level diagnostics about the machine, and about the Brooklyn process. The commands below assume use of CentOS 6.x. Minor adjustments may be required for other platforms.
To display system information, run:
{% highlight bash %} uname -a {% endhighlight %}
To show details of the CPU and memory available to the machine, run:
{% highlight bash %} cat /proc/cpuinfo cat /proc/meminfo {% endhighlight %}
To display information about user limits, run the command below (while logged in as the same user who runs Brooklyn):
{% highlight bash %} ulimit -a {% endhighlight %}
If Brooklyn is run as a different user (e.g. with user name “adalovelace”), then instead run:
{% highlight bash %} ulimit -a -u adalovelace {% endhighlight %}
Of particular interest is the limit for “open files”.
See Increase System Resource Limits for more information.
The command below will list the disk size for each partition, including the amount used and available. If the Brooklyn base directory, persistence directory or logging directory are close to 0% available, this can cause serious problems:
{% highlight bash %} df -h {% endhighlight %}
To view the CPU and memory usage of all processes, and of the machine as a whole, one can use the top
command. This runs interactively, updating every few seconds. To collect the output once (e.g. to share diagnostic information in a bug report), run:
{% highlight bash %} top -n 1 -b > top.txt {% endhighlight %}
To count the number of open files for the Brooklyn process (which includes open socket connections):
{% highlight bash %} BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java) lsof -p $BROOKLYN_PID | wc -l {% endhighlight %}
To count (or view the number of “established” internet connections, run:
{% highlight bash %} netstat -an | grep ESTABLISHED | wc -l {% endhighlight %}
A lack of entropy can cause random number generation to be extremely slow. This can cause tasks like ssh to also be extremely slow. See Linux kernel entropy for details of how to work around this.
To get memory and thread usage for the Brooklyn (Java) process, two useful tools are jstack
and jmap
. These require the “development kit” to also be installed (e.g. yum install java-1.8.0-openjdk-devel
). Some useful commands are shown below:
{% highlight bash %} BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
jstack $BROOKLYN_PID jmap -histo:live $BROOKLYN_PID jmap -heap $BROOKLYN_PID {% endhighlight %}
The jstack-active script is a convenient light-weight way to quickly see which threads of a running Brooklyn server are attempting to consume the CPU. It filters the output of jstack
, to show only the “really-runnable” threads (as opposed to those that are blocked).
{% highlight bash %} BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
curl -O https://raw.githubusercontent.com/apache/brooklyn-dist/master/scripts/jstack-active.sh
jstack-active $BROOKLYN_PID {% endhighlight %}
The Karaf OSGi subsystem can in some cases spend a lot of energy identifying which bundles should provide packages. Often this can be resolved by removing redundant bundles. To observe this, add the following to etc/org.ops4j.pax.logging.cfg
(no restart normally needed):
log4j2.logger.felix.name = org.apache.felix log4j2.logger.felix.level = DEBUG
To get a unique list of duplicate packages: grep chains data/log/*.debug.log | sed 's/.*exposed to //' | sed 's/from .*//' | sort | uniq
If an in-depth investigation of the CPU usage (and/or object creation) of a Brooklyn Server is requiring, there are many profiling tools designed specifically for this purpose. These generally require that the process be launched in such a way that a profiler can attach, which may not be appropriate for a production server.
If the Brooklyn Server was originally run to allow a remote debugger to connect (strongly discouraged in production!), then this provides a convenient way to investigate why Brooklyn is being slow or unresponsive. See the Debugging Tips in the tip Debugging Remote Brooklyn and the IDE docs for more information.
Apache Brooklyn will by default create brooklyn.info.log and brooklyn.debug.log files. See the Logging docs for more information.
The following are useful log messages to search for (e.g. using grep
). Note the wording of these messages (or their very presence) may change in future version of Brooklyn.
The lines below are commonly logged, and can be useful to search for when finding the start of a section of logging.
{% highlight text %} 2016-05-30 17:05:51,458 INFO o.a.b.l.BrooklynWebServer [main]: Started Brooklyn console at http://127.0.0.1:8081/, running classpath://brooklyn.war 2016-05-30 17:06:04,098 INFO o.a.b.c.m.h.HighAvailabilityManagerImpl [main]: Management node tF3GPvQ5 running as HA MASTER autodetected 2016-05-30 17:06:08,982 INFO o.a.b.c.m.r.InitialFullRebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebinding from /home/compose/compose-amp-state/brooklyn-persisted-state/data for master rvpnFTeL... 2016-05-30 17:06:11,105 INFO o.a.b.c.m.r.RebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebind complete (MASTER) in 2s: 19 apps, 54 entities, 50 locations, 46 policies, 704 enrichers, 0 feeds, 160 catalog items {% endhighlight %}
The debug log includes (every minute) a log statement about the memory usage and task activity. For example:
{% highlight text %} 2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (before) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) 2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector [brooklyn-gc]: brooklyn gc (after) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted) {% endhighlight %}
These can be extremely useful if investigating a memory or thread leak, or to determine whether a surprisingly high number of tasks are being executed.
One source of high CPU in Brooklyn is when a subscription (e.g. for a policy or enricher) is being triggered many times (i.e. handling many events). A log message like that below will be logged on every 1000 events handled by a given single subscription.
{% highlight text %} 2016-05-30 17:29:09,125 DEBUG o.a.b.c.m.i.LocalSubscriptionManager [brooklyn-execmanager-rvpnFTeL-8]: 1000 events for subscriber Subscription[SCFnav9g;CanopyComposeApp{id=gIeTwhU2}@gIeTwhU2:webapp.url] {% endhighlight %}
If a subscription is handling a huge number of events, there are a couple of common reasons:
All activity triggered by the REST API or web-console will be logged. Some examples are shown below:
{% highlight text %} 2016-05-19 17:52:30,150 INFO o.a.b.r.r.ApplicationResource [brooklyn-jetty-server-8081-qtp1058726153-17473]: Launched from YAML: name: My Example App location: aws-ec2:us-east-1 services:
2016-05-30 14:46:19,516 DEBUG brooklyn.REST [brooklyn-jetty-server-8081-qtp1104967201-20881]: Request Tisj14 starting: POST /v1/applications/NiBy0v8Q/entities/NiBy0v8Q/expunge from 77.70.102.66 {% endhighlight %}
If investigating the behaviour of a particular entity (e.g. on failure), it can be very useful to grep
the info and debug log for the entity's id. For a software process, the debug log will include the stdout and stderr of all the commands executed by that entity.
It can also be very useful to search for all effector invocations, to see where the behaviour has been triggered:
{% highlight text %} 2016-05-27 12:45:43,529 DEBUG o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-gvP7MuZF-14364]: Invoking effector stop on TomcatServerImpl{id=mPujYmPd} {% endhighlight %}