blob: ff892fad54303bc5237068f4715d57a56ed474f6 [file] [log] [blame]
<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties>
<title>Performance</title>
<author email="rpopma@apache.org">Remko Popma</author>
<author email="rgoers@apache.org">Ralph Goers</author>
</properties>
<body>
<section name="Performance">
<!--
<p>One of the often-cited arguments against logging is its
computational cost. This is a legitimate concern as even moderately
sized applications can generate thousands of log requests. Much
effort was spent measuring and tweaking logging performance. Log4j
claims to be fast and flexible: speed first, flexibility second.
</p>
-->
<p>Apart from functional requirements, an important reason for selecting a logging library is often how well it
fulfills non-functional requirements like reliability and performance.</p>
<p>This page compares the performance of a number of logging frameworks
(java.util.logging "JUL", Logback, Log4j 1.2 and Log4j 2.6),
and documents some performance trade-offs for Log4j 2 functionality.
</p>
<a name="benchmarks" />
<h3>Benchmarks</h3>
<p>Performance can mean different things to different people. Common terms in this context are
throughput and latency: <em>Throughput</em> is a measure of capacity and can be expressed in a single number:
how many messages can be logged in a certain period of time.
<em>Response time latency</em> is how long it takes to log a message.
This cannot be expressed in a single number because each measurement has its own
response time and we are often most interested in the outliers: how many there were and how large they were.</p>
<p>When evaluating a logging framework's performance these may be useful questions to ask:</p>
<ul>
<li>What is its <b>peak throughput</b>?
Many systems that react to external events need to log bursts of messages followed by periods of
relative quiet.
This number is the maximum throughput measured over a short period of time and gives some idea
of how well the logging library deals with bursts.
For systems that need to log a lot at a constant high rate (for example, batch jobs)
this is less likely to be a useful measure of performance.</li>
<li>What is the <b>maximum sustained throughput</b>? This is the throughput averaged over a long time.
This is a useful measure of the "upper limit" of the capacity of the logging library.
It is not recommended that reactive applications actually log at this rate since under this load
they will likely experience jitter and large response time spikes.</li>
<a name="responseTime" />
<li><p>What is its <b>response time</b> behaviour under various loads?
This is the most important question for applications that need to react to external events in a timely manner.
Response time is the total amount of time it takes to log a message and is the sum of the
service time and wait time.
The <b>service time</b> is the time it takes to do the work to log the message.
As the workload increases, the service time often varies little:
to do X amount of work it always takes X amount of time.
The <b>wait time</b> is how long the request had to wait in a queue before being serviced.
<em>As the workload increases, wait time often grows to many times the service time.</em>
</p>
</li>
</ul>
<a name="responseTimeVsServiceTime" />
<table>
<tr><td>
<h5>Sidebar: Why Care About Response Time Latency?</h5>
<table border="0" style="border: 0">
<tr style="border: 0">
<td width="50%" style="border: 0"><p>
What is often measured and reported as <em>latency</em> is actually <em>service time</em>,
and omits that a service time spike adds wait time for many subsequent events.
This may present results that are more optimistic than what users experience.
</p><p>
The graph on the right illustrates how much more optimistic service time is than response time.
The graph shows response time and service time for the same system under a load of 100,000 messages
per second. Out of 24 million measurements, only ~50 are more than 250 microseconds, less than 0.001%.
In a service time-only graph this would hardly be visible.
However, the depending on the load it will take a while to catch up after a spike.
</p><p>
The response time graph shows that in reality many more events are
impacted by these delays than the service time numbers alone would suggest.
</p>
<p>
To learn more, watch Gil Tene's eye-opening presentation
<a href="http://www.infoq.com/presentations/latency-response-time">How NOT to measure
latency</a>.
</p>
</td>
<td width="50%" style="border: 0"><a href="images/ResponseTimeVsServiceTimeAsyncLoggers.png"><img
src="images/ResponseTimeVsServiceTimeAsyncLoggers.png" width="480" height="288"/></a>
</td>
</tr>
</table>
</td></tr>
</table>
<a name="loglibComparison" />
<h3>Logging Library Performance Comparison</h3>
<a name="asyncLogging" />
<h4>Asynchronous Logging - Peak Throughput Comparison</h4>
<p>Asynchronous logging is useful to deal with bursts of events. How this works is that
a minimum amount of work is done by the application thread to capture all required information in a log event,
and this log event is then put on a queue for later processing by a background thread.
As long as the queue is sized large enough, the application threads should be able to spend
very little time on the logging call and return to the business logic very quickly.
</p>
<p>It turns out that the choice of queue is extremely important for peak throughput.
Log4j 2's Async Loggers use a
<a href="http://lmax-exchange.github.com/disruptor/">lock-free data structure</a>, whereas
Logback, Log4j 1.2 and Log4j 2's Asynchronous Appenders use an ArrayBlockingQueue.
With a blocking queue, multi-threaded applications often experience lock contention when trying to
enqueue the log event.
</p>
<p>The graph below illustrates the difference a lock-free data structure can make to throughput
in multi-threaded scenarios. <em>Log4j 2 scales better with the number of threads:
an application with more threads can log more. The other logging libraries suffer
from lock contention and total throughput stays constant or drops when more threads are logging.
This means that with the other logging libraries, each individual thread will be able to log less.</em></p>
<p>Bear in mind that this is <em>peak</em> throughput: Log4j 2's Async Loggers give better throughput up to a point, but
once the queue is full, the appender thread needs to wait until a slot becomes available in the queue,
and throughput will drop to the maximum sustained throughput of the underlying appenders at best.
</p>
<p><img src="images/async-throughput-comparison.png" alt="Peak throughput comparison" />
</p>
<p>For details, see the <a href="manual/async.html">Async
Loggers</a> manual page.</p>
<a name="asyncLoggingResponseTime" />
<h4>Asynchronous Logging Response Time</h4>
<p>Response time behaviour varies a lot with the workload and the number of threads that log concurrently.
The <a href="manual/async.html#Latency">Async Loggers</a> manual page and the
<a href="manual/garbagefree.html#Latency">garbage-free logging</a> manual page
provide some graphs showing response time behaviour under various loads.
</p>
<p>This section shows another graph showing response time latency behaviour
under a modest total workload of 64,000 messages per second, with 4 threads logging concurrently.
At this load and on this hardware/OS/JVM configuration, lock contention and context switches play less of a role
and the pauses are mostly caused by minor garbage collections.
Garbage collection pause duration and frequency can vary a lot: when testing the Log4j 1.2.17
Async Appender
a minor GC pause of 7 milliseconds occurred while the Log4j 2 Async Appender test only saw
a GC pause of a little over 2 milliseconds. This does not necessarily mean that one is better than the other.
</p>
<p>Generally, garbage-free async loggers had the best response time behaviour
in all configurations we tested. </p>
<p><img src="images/ResponseTimeAsyncLogging4Threads@16kEach.png" alt="" /></p>
<p>The above result was obtained with
the ResponseTimeTest class which can be found in the Log4j 2 unit test source directory,
running on JDK 1.8.0_45 on
RHEL 6.5 (Linux 2.6.32-573.1.1.el6.x86_64) with
10-core Xeon CPU E5-2660 v3 @2.60GHz with hyperthreading switched on (20 virtual cores).
</p>
<a name="asyncLoggingWithParams" />
<h4>Asynchronous Logging Parameterized Messages</h4>
<p>Many logging libraries offer an API for logging parameterized messages.
This enables application code to look something like this:<pre>
logger.debug("Entry number: {} is {}", i, entry[i]);</pre>
In the above example, the fully formatted message text is not created unless the DEBUG level is enabled for the logger.
Without this API, you would need three lines of code to accomplish the same:
<pre>
if (logger.isDebugEnabled()) {
logger.debug("Entry number: " + i + " is " + entry[i].toString());
}</pre>
</p>
<p>If the DEBUG level <em>is</em> enabled, then at some point the message needs to be formatted.
When logging asynchronously, the message parameters may be changed
by the application thread before the background thread had a chance to log the message.
This would show the wrong values in the log file.
To prevent this, Log4j 2, Log4j 1.2 and Logback format the message text in the application thread
<em>before</em> passing off the log event to the background thread.</p>
<p>This is the safe thing to do, but the formatting has a performance cost.
The graph below compares the throughput of logging messages with parameters using various logging libraries.
These are all asynchronous logging calls, so these numbers do not include the cost of disk I/O
and represent <em>peak</em> throughput.</p>
<p>JUL (java.util.logging) does not have a built-in asynchronous Handler.
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/logging/MemoryHandler.html">MemoryHandler</a>
is the nearest thing available so we included it here.
MemoryHandler does <em>not</em> do the safe thing of taking a snapshot
of the current parameter state (it just keeps a reference to the original parameter objects),
and as a result it is very fast when single-threaded.
However, when more application threads are logging concurrently, the cost of lock contention outweighs this gain.</p>
<p>In absolute numbers, <em>Log4j 2's Async Loggers perform well compared to the other logging
frameworks, but notice that the message formatting cost increases sharply with the number of parameters.
In this area, Log4j 2 still has work to do to improve: we would like to keep this cost more constant.</em>
</p>
<p><img src="images/ParamMsgThrpt1-4T.png" /></p>
<p>The results above are for JUL (java.util.logging) 1.8.0_45, Log4j 2.6, Log4j 1.2.17 and Logback 1.1.7,
and were obtained with the
<a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> Java benchmark harness.
See the AsyncAppenderLog4j1Benchmark, AsyncAppenderLog4j2Benchmark, AsyncAppenderLogbackBenchmark,
AsyncLoggersBenchmark and the MemoryHandlerJULBenchmark source code in the log4j-perf module.
</p>
<a name="asyncLoggingWithLocation" />
<h4>Asynchronous Logging with Caller Location Information</h4>
<p>
Some layouts can show the class, method and line number in the application where the logging call was made.
In Log4j 2, examples of such layout options are HTML
<a href="layouts.html#HtmlLocationInfo">locationInfo</a>,
or one of the patterns <a href="layouts.html#PatternClass">%C or $class</a>,
<a href="layouts.html#PatternFile">%F or %file</a>,
<a href="layouts.html#PatternLocation">%l or %location</a>,
<a href="layouts.html#PatternLine">%L or %line</a>,
<a href="layouts.html#PatternMethod">%M or %method</a>.
In order to provide caller location information, the logging library
will take a snapshot of the stack, and walk the stack trace to find the location information.
</p>
<p>
The graph below shows the performance impact of capturing caller location information when logging
asynchronously from a single thread. Our tests show that <em>capturing caller location has a similar impact
across all logging libraries, and slows down asynchronous
logging by about 30-100x</em>.
</p>
<p><img src="images/AsyncWithLocationThrpt1T-labeled.png" /></p>
<p>The results above are for JUL (java.util.logging) 1.8.0_45, Log4j 2.6, Log4j 1.2.17 and Logback 1.1.7,
and were obtained with the
<a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> Java benchmark harness.
See the AsyncAppenderLog4j1LocationBenchmark, AsyncAppenderLog4j2LocationBenchmark,
AsyncAppenderLogbackLocationBenchmark, AsyncLoggersLocationBenchmark and the
MemoryHandlerJULLocationBenchmark source code in the log4j-perf module.
</p>
<a name="fileLoggingComparison" />
<h4>Synchronous File Logging - Sustained Throughput Comparison</h4>
<p>This section discusses the maximum sustained throughput of logging to a file.
In any system, the maximum sustained throughput is determined by its slowest component.
In the case of logging, this is the appender, where the message formatting and disk I/O takes place.
For this reason we will look at simple <em>synchronous</em> logging to a file,
without queues or background threads.</p>
<p>The graph below compares Log4j 2.6's RandomAccessFile appender
to the respective File appenders of Log4j 1.2.17, Logback 1.1.7 and
Java util logging (JUL) on Oracle Java 1.8.0_45. ImmediateFlush was set to false for all
loggers that support this. The JUL results are for the XMLFormatter (which in our measurements was
about twice as fast as the SimpleFormatter).</p>
<p><em>Log4j 2's sustained throughput drops a little when more threads are logging simultaneously,
but its fine-grained locking pays off and throughput stays relatively high.
The other logging frameworks' throughput drops dramatically in multi-threaded applications:
Log4j 1.2 has 1/4th of its single-threaded capacity,
Logback has 1/10th of its single-threaded capacity, and JUL steadily drops from 1/4th to 1/10th of its
single-threaded throughput as more threads are added.</em></p>
<p><img src="images/SyncThroughputLoggerComparisonLinux.png" /></p>
<p>The synchronous logging throughput results above are obtained with the
<a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> Java benchmark harness.
See the FileAppenderBenchmark source code in the log4j-perf module.</p>
<h4>Synchronous File Logging - Response Time Comparison</h4>
<p>Response time for synchronous file logging varies a lot with the workload and the
number of threads. Below is a sample for a workload of 32,000 events per second,
with 2 threads logging 16,000 events per second each.</p>
<p><img src="images/SynchronousFileResponseTime2T32k-labeled.png" /></p>
<p>The above result was obtained with the ResponseTimeTest class which can be found in the Log4j 2
unit test source directory, running on JDK 1.8.0_45 on RHEL 6.5 (Linux 2.6.32-573.1.1.el6.x86_64)
with 10-core Xeon CPU E5-2660 v3 @2.60GHz with hyperthreading switched on (20 virtual cores).</p>
<!--
TODO
<h4>Synchronous Socket Sustained Throughput Comparison</h4>
<h4>Synchronous Syslog Sustained Throughput Comparison</h4>
-->
<a name="filtering" />
<h4>Filtering by Level</h4>
<p>The most basic filtering a logging framework provides is filtering by log level.
When logging is turned off entirely or just for a set of Levels, the cost of a log request consists
of a number of method invocations plus an integer comparison.
Unlike Log4j, Log4j 2 Loggers don't "walk a hierarchy".
Loggers point directly to the Logger configuration that best matches the Logger's name.
This incurs extra overhead when the Logger is first created but reduces the overhead every
time the Logger is used.
</p>
<h4>Advanced Filtering</h4>
<p>
Both Logback and Log4j 2 support advanced filtering. Logback calls them TurboFilters while
Log4j 2 has a single Filter object. Advanced filtering provides the capability to filter
LogEvents using more than just the Level before the events are passed to Appenders.
However, this flexibility does come with some cost.
Since multi-threading can also have an impact on the
performance of advanced filtering, the chart below shows the difference in performance of filtering based
on a Marker or a Marker's parent.</p>
<p>
The "Simple Marker" comparison checks to see if a Marker that has no references
to other markers matches the requested Marker. The "Parent Marker" comparison checks to see
if a Marker that does have references to other markers matches the requested Marker. </p>
<p>It appears that coarse-grained synchronization in SLF4J can impact performance in
multi-threaded scenarios. See
<a href="http://jira.qos.ch/browse/SLF4J-240">SLF4J-240</a>.</p>
<p><img src="images/MarkerFilterCostComparison.png" /></p>
<p>Log4j and Logback also support filtering
on a value in the Log4j ThreadContext vs filtering in Logback on a value in the MDC.
The graph below shows that the performance difference between Log4j 2 and Logback
is small for the ThreadContext filter.
</p><p><img src="images/ThreadContextFilterCostComparison.png" /></p>
<p>The Filter comparison results above are obtained with the
<a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> Java benchmark harness.
See the MarkerFilterBenchmark and MDCFilterBenchmark in the log4j-perf module for details on these
benchmarks.</p>
<a name="tradeoffs" />
<h3>Trade-offs</h3>
<a name="whichAppender" />
<h4>Which Log4j 2 Appender to Use?</h4>
<p>Assuming that you selected Log4j 2 as your logging framework,
next you may be interested in learning what the performance trade-offs are for
selecting a specific Log4j 2 configuration. For example, there are three appenders
for logging to a file: the File, RandomAccessFile and MemoryMappedFile appenders.
Which one should you use?</p>
<p>If performance is all you care about, the graphs below show your best choice is either
the MemoryMappedFile appender or the RandomAccessFile appender.
Some things to bear in mind:
</p>
<ul>
<li>MemoryMappedFile appender does not have a rolling variant yet.</li>
<li>When the log file size exceeds the MemoryMappedFile's region length, the file needs to be remapped.
This can be a very expensive operation, taking several seconds if the region is large.</li>
<li>MemoryMappedFile appender creates a presized file from the beginning and fills it up gradually.
This can confuse tools like <tt>tail</tt>; many such tools don't work very well with memory mapped files.</li>
<li>On Windows, using a tool like <tt>tail</tt> on a file created by RandomAccessFile appender
can hold a lock on this file which may prevent Log4j from opening the file again when the
application is restarted. In a development environment where you expect to restart your application
regularly while using tools like tail to view the log file contents, the File appender may be
a reasonable trade-off between performance and flexibility. For production environments
performance may have higher priority.</li>
</ul>
<p>
The graph below shows sustained throughput for the console and file appenders in Log4j 2.6,
and for reference also provides the 2.5 performance.</p>
<p>It turns out that the garbage-free text encoding logic in 2.6 gives
these appenders a performance boost compared to Log4j 2.5.
It used to be that the RandomAccessFile appender was significantly faster,
especially in multi-threaded scenarios, but with the 2.6 release the File appender
performance has improved and the performance difference between these two appender is smaller.</p>
<p>Another takeaway is just how much of a performance drag logging to the console can be.
Considering logging to a file and using a tool like <tt>tail</tt> to watch the file change in real time.</p>
<p><img src="images/Log4j2AppenderThroughputComparison-linux.png"/></p>
<p>On Windows, the results are similar but the RandomAccessFile and MemoryMappedFile appenders outperform
the plain File appender in multi-threaded scenarios.
The absolute numbers are higher on Windows: we don't know why but it looks like Windows
handles lock contention better than Linux.</p>
<p><img src="images/Log4j2AppenderThroughputComparison-windows.png"/></p>
<p>The Log4j 2 appender comparison results above are obtained with the
<a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> Java benchmark harness.
See the Log4j2AppenderComparisonBenchmark source code in the log4j-perf module.</p>
<!--
<p>The user should be aware of the following performance issues.</p>
<h3>Logging performance when logging is turned off.</h3>
<p>When logging is turned off entirely or just for a set of Levels, the cost of a log request consists of
two method invocations plus an integer comparison. On a 2.53 GHz Intel Core 2 Duo MacBook Pro
calling isDebugEnabled 10 million times produces an average result in nanoseconds of:</p>
<pre>
Log4j: 4
Logback: 5
Log4j 2: 3
</pre>
<p>
The numbers above will vary slightly from run to run so the only conclusion that should be
drawn is that all 3 frameworks perform similarly on this task.
</p>
<p>However, The method invocation involves the "hidden" cost of parameter construction.
</p>
<p>For example,
</p>
<pre>
logger.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
</pre>
<p>
incurs the cost of constructing the message parameter, i.e. converting both integer
<code>i</code> and <code>entry[i]</code> to a String, and concatenating intermediate strings,
regardless of whether the message will be logged or not.
This cost of parameter construction can be quite high and it
depends on the size of the parameters involved.
A comparison run on the same hardware as above yields:
</p>
<pre>
Log4j: 188
Logback: 183
Log4j 2: 188
</pre>
<p>
Again, no conclusion should be drawn regarding relative differences between the frameworks on
this task, but it should be obvious that it is considerably more expensive than simply testing
the level.
</p>
<p>
The best approach to avoid the cost of parameter construction is to use Log4j 2's formatting
capabilities. For example, instead of the above write:
</p>
<pre>
logger.debug("Entry number: {} is {}", i, entry[i]);
</pre>
<p>
Using this approach, a comparison run again on the same hardware produces:
</p>
<pre>
Log4j: Not supported
Logback: 9
Log4j 2: 4
</pre>
<p>
These results show that the difference in performance between the call to isDebugEnabled and
logger.debug is barely discernible.
</p>
<p>In some circumstances one of the parameters to logger.debug will be a costly method call that
should be avoided if debugging is disabled. In those cases write:
</p>
<pre>
if(logger.isDebugEnabled() {
logger.debug("Entry number: " + i + " is " + entry[i].toString());
}
</pre>
<p>This will not incur the cost of whatever the toString() method needs to do if debugging is disabled.
On the other hand, if the logger is enabled for the debug level, it will incur twice the cost of
evaluating whether the logger is enabled or not: once
in <code>isDebugEnabled</code> and once in <code>debug</code>. This is an insignificant
overhead because evaluating a logger takes about 1% of the time it takes to actually log.
</p>
<p>Certain users resort to pre-processing or compile-time
techniques to compile out all log statements. This leads to perfect
performance efficiency with respect to logging. However, since the
resulting application binary does not contain any log statements,
logging cannot be turned on for that binary. This seems to be
a disproportionate price to pay in exchange for a small performance
gain.
</p>
<h3>The performance of deciding whether to log or not to log when logging is turned on.</h3>
<p>
Unlike Log4j, Log4j 2 Loggers don't "walk a hierarchy". Loggers point directly to the
Logger configuration that best matches the Logger's name. This incurs extra overhead when the Logger
is first created but reduces the overhead every time the Logger is used.
</p>
<h3>Actually outputting log messages</h3>
<p>This is the cost of formatting the log output and sending it to its target destination. Here again,
a serious effort was made to make layouts (formatters) perform as quickly as possible. The same
is true for appenders. One of the fundamental tenets of Log4j 2 is to use immutable objects whenever
possible and to lock at the lowest granularity possible. However, the cost of actually formatting and
delivering log events will never be insignificant. For example, the results of writing to a simple log
file using the same format using Log4j, Logback and Log4j 2 are:
</p>
<pre>
Log4j: 1651
Logback: 1419
Log4j 2.0: 1542
</pre>
<p>
As with many of the other results on this page the differences between the frameworks above should be
considered insignificant. The values will change somewhat on each execution and changing the order the
frameworks are tested or adding calls to System.gc() between the tests can cause a variation in the
reported times. However, these results show that actually writing out the events can be at least 1000
times more expensive than when they are disabled, so it is always recommended to take advantage of
Log4j 2's fine-grained filtering capabilities.
</p>
-->
</section>
</body>
</document>