| //// |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| https://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| = Performance |
| Remko Popma <rpopma@apache.org>; Ralph Goers <rgoers@apache.org> |
| |
| //// |
| One of the often-cited arguments against logging is its |
| computational cost. This is a legitimate concern as even moderately |
| sized applications can generate thousands of log requests. Much |
| effort was spent measuring and tweaking logging performance. Log4j |
| claims to be fast and flexible: speed first, flexibility second. |
| //// |
| |
| Apart from functional requirements, an important reason for selecting a |
| logging library is often how well it fulfills non-functional |
| requirements like reliability and performance. |
| |
| This page compares the performance of a number of logging frameworks |
| (java.util.logging "JUL", Logback, Log4j 1.2 and Log4j 2.6), and |
| documents some performance trade-offs for Log4j 2 functionality. |
| |
| [#benchmarks] |
| == Benchmarks |
| |
| Performance can mean different things to different people. Common terms |
| in this context are throughput and latency: _Throughput_ is a measure of |
| capacity and can be expressed in a single number: how many messages can |
| be logged in a certain period of time. _Response time latency_ is how |
| long it takes to log a message. This cannot be expressed in a single |
| number because each measurement has its own response time and we are |
| often most interested in the outliers: how many there were and how large |
| they were. |
| |
| When evaluating a logging framework's performance these may be useful |
| questions to ask: |
| |
| [qanda] |
| What is its *peak throughput*?:: |
| Many systems that react to external |
| events need to log bursts of messages followed by periods of relative |
| quiet. This number is the maximum throughput measured over a short |
| period of time and gives some idea of how well the logging library deals |
| with bursts. For systems that need to log a lot at a constant high rate |
| (for example, batch jobs) this is less likely to be a useful measure of |
| performance. |
| |
| What is the *maximum sustained throughput*?:: |
| This is the throughput |
| averaged over a long time. This is a useful measure of the "upper limit" |
| of the capacity of the logging library. It is not recommended that |
| reactive applications actually log at this rate since under this load |
| they will likely experience jitter and large response time spikes. |
| |
| What is its *response time* behaviour under various loads?:: |
| This is the |
| most important question for applications that need to react to external |
| events in a timely manner. Response time is the total amount of time it |
| takes to log a message and is the sum of the service time and wait time. |
| The *service time* is the time it takes to do the work to log the |
| message. As the workload increases, the service time often varies |
| little: to do X amount of work it always takes X amount of time. The |
| *wait time* is how long the request had to wait in a queue before being |
| serviced. _As the workload increases, wait time often grows to many |
| times the service time._ |
| |
| [[responseTimeVsServiceTime]] |
| |
| .Why Care About Response Time Latency? |
| **** |
| [width="100%",cols="50%,50%"] |
| |=== |
| a| |
| What is often measured and reported as _latency_ is actually _service |
| time_, and omits that a service time spike adds wait time for many |
| subsequent events. This may present results that are more optimistic |
| than what users experience. |
| |
| The graph on the right illustrates how much more optimistic service time |
| is than response time. The graph shows response time and service time |
| for the same system under a load of 100,000 messages per second. Out of |
| 24 million measurements, only ~50 are more than 250 microseconds, less |
| than 0.001%. In a service time-only graph this would hardly be visible. |
| However, the depending on the load it will take a while to catch up |
| after a spike. |
| |
| The response time graph shows that in reality many more events are |
| impacted by these delays than the service time numbers alone would |
| suggest. |
| |
| To learn more, watch Gil Tene's eye-opening presentation |
| http://www.infoq.com/presentations/latency-response-time[How NOT to |
| measure latency]. |
| |
| |
| |link:images/ResponseTimeVsServiceTimeAsyncLoggers.png[image:images/ResponseTimeVsServiceTimeAsyncLoggers.png[image,width=480,height=288]] |
| |=== |
| **** |
| |
| [#loglibComparison] |
| == Logging Library Performance Comparison |
| |
| [#asyncLogging] |
| === Asynchronous Logging - Peak Throughput Comparison |
| |
| Asynchronous logging is useful to deal with bursts of events. How this |
| works is that a minimum amount of work is done by the application thread |
| to capture all required information in a log event, and this log event |
| is then put on a queue for later processing by a background thread. As |
| long as the queue is sized large enough, the application threads should |
| be able to spend very little time on the logging call and return to the |
| business logic very quickly. |
| |
| It turns out that the choice of queue is extremely important for peak |
| throughput. Log4j 2's Async Loggers use a |
| http://lmax-exchange.github.com/disruptor/[lock-free data structure], |
| whereas Logback, Log4j 1.2 and Log4j 2's Asynchronous Appenders use an |
| ArrayBlockingQueue. With a blocking queue, multi-threaded applications |
| often experience lock contention when trying to enqueue the log event. |
| |
| The graph below illustrates the difference a lock-free data structure |
| can make to throughput in multi-threaded scenarios. _Log4j 2 scales |
| better with the number of threads: an application with more threads can |
| log more. The other logging libraries suffer from lock contention and |
| total throughput stays constant or drops when more threads are logging. |
| This means that with the other logging libraries, each individual thread |
| will be able to log less._ |
| |
| Bear in mind that this is _peak_ throughput: Log4j 2's Async Loggers |
| give better throughput up to a point, but once the queue is full, the |
| appender thread needs to wait until a slot becomes available in the |
| queue, and throughput will drop to the maximum sustained throughput of |
| the underlying appenders at best. |
| |
| image:images/async-throughput-comparison.png[Peak throughput comparison] |
| |
| For details, see the link:manual/async.html[Async Loggers] manual page. |
| |
| [#asyncLoggingResponseTime] |
| === Asynchronous Logging Response Time |
| |
| Response time behaviour varies a lot with the workload and the number of |
| threads that log concurrently. The link:manual/async.html#Latency[Async |
| Loggers] manual page and the |
| link:manual/garbagefree.html#Latency[garbage-free logging] manual page |
| provide some graphs showing response time behaviour under various loads. |
| |
| This section shows another graph showing response time latency behaviour |
| under a modest total workload of 64,000 messages per second, with 4 |
| threads logging concurrently. At this load and on this hardware/OS/JVM |
| configuration, lock contention and context switches play less of a role |
| and the pauses are mostly caused by minor garbage collections. Garbage |
| collection pause duration and frequency can vary a lot: when testing the |
| Log4j 1.2.17 Async Appender a minor GC pause of 7 milliseconds occurred |
| while the Log4j 2 Async Appender test only saw a GC pause of a little |
| over 2 milliseconds. This does not necessarily mean that one is better |
| than the other. |
| |
| Generally, garbage-free async loggers had the best response time |
| behaviour in all configurations we tested. |
| |
| image:images/ResponseTimeAsyncLogging4Threads@16kEach.png[Response time comparison] |
| |
| The above result was obtained with the ResponseTimeTest class which can |
| be found in the Log4j 2 unit test source directory, running on JDK |
| 1.8.0_45 on RHEL 6.5 (Linux 2.6.32-573.1.1.el6.x86_64) with 10-core Xeon |
| CPU E5-2660 v3 @2.60GHz with hyperthreading switched on (20 virtual |
| cores). |
| |
| [#asyncLoggingWithParams] |
| === Asynchronous Logging Parameterized Messages |
| |
| Many logging libraries offer an API for logging parameterized messages. |
| This enables application code to look something like this: |
| |
| [source,java] |
| ---- |
| logger.debug("Entry number: {} is {}", i, entry[i]); |
| ---- |
| |
| In the above example, the fully formatted message text is not created |
| unless the DEBUG level is enabled for the logger. Without this API, you |
| would need three lines of code to accomplish the same: |
| |
| [source,java] |
| ---- |
| if (logger.isDebugEnabled()) { |
| logger.debug("Entry number: " + i + " is " + entry[i].toString()); |
| } |
| ---- |
| |
| If the DEBUG level _is_ enabled, then at some point the message needs to |
| be formatted. When logging asynchronously, the message parameters may be |
| changed by the application thread before the background thread had a |
| chance to log the message. This would show the wrong values in the log |
| file. To prevent this, Log4j 2, Log4j 1.2 and Logback format the message |
| text in the application thread _before_ passing off the log event to the |
| background thread. |
| |
| This is the safe thing to do, but the formatting has a performance cost. |
| The graph below compares the throughput of logging messages with |
| parameters using various logging libraries. These are all asynchronous |
| logging calls, so these numbers do not include the cost of disk I/O and |
| represent _peak_ throughput. |
| |
| JUL (java.util.logging) does not have a built-in asynchronous Handler. |
| https://docs.oracle.com/javase/8/docs/api/java/util/logging/MemoryHandler.html[`MemoryHandler`] |
| is the nearest thing available so we included it here. MemoryHandler |
| does _not_ do the safe thing of taking a snapshot of the current |
| parameter state (it just keeps a reference to the original parameter |
| objects), and as a result it is very fast when single-threaded. However, |
| when more application threads are logging concurrently, the cost of lock |
| contention outweighs this gain. |
| |
| In absolute numbers, _Log4j 2's Async Loggers perform well compared to |
| the other logging frameworks, but notice that the message formatting |
| cost increases sharply with the number of parameters. In this area, |
| Log4j 2 still has work to do to improve: we would like to keep this cost |
| more constant._ |
| |
| image:images/ParamMsgThrpt1-4T.png[image] |
| |
| The results above are for JUL (java.util.logging) 1.8.0_45, Log4j 2.6, |
| Log4j 1.2.17 and Logback 1.1.7, and were obtained with the |
| http://openjdk.java.net/projects/code-tools/jmh/[JMH] Java benchmark |
| harness. See the AsyncAppenderLog4j1Benchmark, |
| AsyncAppenderLog4j2Benchmark, AsyncAppenderLogbackBenchmark, |
| AsyncLoggersBenchmark and the MemoryHandlerJULBenchmark source code in |
| the log4j-perf module. |
| |
| [#asyncLoggingWithLocation] |
| === Asynchronous Logging with Caller Location Information |
| |
| Some layouts can show the class, method and line number in the |
| application where the logging call was made. In Log4j 2, examples of |
| such layout options are HTML |
| link:layouts.html#HtmlLocationInfo[locationInfo], or one of the patterns |
| link:layouts.html#PatternClass[%C or $class], |
| link:layouts.html#PatternFile[%F or %file], |
| link:layouts.html#PatternLocation[%l or %location], |
| link:layouts.html#PatternLine[%L or %line], |
| link:layouts.html#PatternMethod[%M or %method]. In order to provide |
| caller location information, the logging library will take a snapshot of |
| the stack, and walk the stack trace to find the location information. |
| |
| The graph below shows the performance impact of capturing caller |
| location information when logging asynchronously from a single thread. |
| Our tests show that _capturing caller location has a similar impact |
| across all logging libraries, and slows down asynchronous logging by |
| about 30-100x_. |
| |
| image:images/AsyncWithLocationThrpt1T-labeled.png[image] |
| |
| The results above are for JUL (java.util.logging) 1.8.0_45, Log4j 2.6, |
| Log4j 1.2.17 and Logback 1.1.7, and were obtained with the |
| http://openjdk.java.net/projects/code-tools/jmh/[JMH] Java benchmark |
| harness. See the AsyncAppenderLog4j1LocationBenchmark, |
| AsyncAppenderLog4j2LocationBenchmark, |
| AsyncAppenderLogbackLocationBenchmark, AsyncLoggersLocationBenchmark and |
| the MemoryHandlerJULLocationBenchmark source code in the log4j-perf |
| module. |
| |
| [#fileLoggingComparison] |
| === Synchronous File Logging - Sustained Throughput Comparison |
| |
| This section discusses the maximum sustained throughput of logging to a |
| file. In any system, the maximum sustained throughput is determined by |
| its slowest component. In the case of logging, this is the appender, |
| where the message formatting and disk I/O takes place. For this reason |
| we will look at simple _synchronous_ logging to a file, without queues |
| or background threads. |
| |
| The graph below compares Log4j 2.6's RandomAccessFile appender to the |
| respective File appenders of Log4j 1.2.17, Logback 1.1.7 and Java util |
| logging (JUL) on Oracle Java 1.8.0_45. ImmediateFlush was set to false |
| for all loggers that support this. The JUL results are for the |
| XMLFormatter (which in our measurements was about twice as fast as the |
| SimpleFormatter). |
| |
| _Log4j 2's sustained throughput drops a little when more threads are |
| logging simultaneously, but its fine-grained locking pays off and |
| throughput stays relatively high. The other logging frameworks' |
| throughput drops dramatically in multi-threaded applications: Log4j 1.2 |
| has 1/4th of its single-threaded capacity, Logback has 1/10th of its |
| single-threaded capacity, and JUL steadily drops from 1/4th to 1/10th of |
| its single-threaded throughput as more threads are added._ |
| |
| image:images/SyncThroughputLoggerComparisonLinux.png[image] |
| |
| The synchronous logging throughput results above are obtained with the |
| http://openjdk.java.net/projects/code-tools/jmh/[JMH] Java benchmark |
| harness. See the FileAppenderBenchmark source code in the log4j-perf |
| module. |
| |
| === Synchronous File Logging - Response Time Comparison |
| |
| Response time for synchronous file logging varies a lot with the |
| workload and the number of threads. Below is a sample for a workload of |
| 32,000 events per second, with 2 threads logging 16,000 events per |
| second each. |
| |
| image:images/SynchronousFileResponseTime2T32k-labeled.png[image] |
| |
| The above result was obtained with the ResponseTimeTest class which can |
| be found in the Log4j 2 unit test source directory, running on JDK |
| 1.8.0_45 on RHEL 6.5 (Linux 2.6.32-573.1.1.el6.x86_64) with 10-core Xeon |
| CPU E5-2660 v3 @2.60GHz with hyperthreading switched on (20 virtual |
| cores). |
| |
| //// |
| TODO |
| === Synchronous Socket Sustained Throughput Comparison |
| |
| === Synchronous Syslog Sustained Throughput Comparison |
| //// |
| |
| [#filtering] |
| === Filtering by Level |
| |
| The most basic filtering a logging framework provides is filtering by |
| log level. When logging is turned off entirely or just for a set of |
| Levels, the cost of a log request consists of a number of method |
| invocations plus an integer comparison. Unlike Log4j, Log4j 2 Loggers |
| don't "walk a hierarchy". Loggers point directly to the Logger |
| configuration that best matches the Logger's name. This incurs extra |
| overhead when the Logger is first created but reduces the overhead every |
| time the Logger is used. |
| |
| === Advanced Filtering |
| |
| Both Logback and Log4j 2 support advanced filtering. Logback calls them |
| TurboFilters while Log4j 2 has a single Filter object. Advanced |
| filtering provides the capability to filter LogEvents using more than |
| just the Level before the events are passed to Appenders. However, this |
| flexibility does come with some cost. Since multi-threading can also |
| have an impact on the performance of advanced filtering, the chart below |
| shows the difference in performance of filtering based on a Marker or a |
| Marker's parent. |
| |
| The "Simple Marker" comparison checks to see if a Marker that has no |
| references to other markers matches the requested Marker. The "Parent |
| Marker" comparison checks to see if a Marker that does have references |
| to other markers matches the requested Marker. |
| |
| It appears that coarse-grained synchronization in SLF4J can impact |
| performance in multi-threaded scenarios. See |
| http://jira.qos.ch/browse/SLF4J-240[SLF4J-240]. |
| |
| image:images/MarkerFilterCostComparison.png[image] |
| |
| Log4j and Logback also support filtering on a value in the Log4j |
| ThreadContext vs filtering in Logback on a value in the MDC. The graph |
| below shows that the performance difference between Log4j 2 and Logback |
| is small for the ThreadContext filter. |
| |
| image:images/ThreadContextFilterCostComparison.png[image] |
| |
| The Filter comparison results above are obtained with the |
| http://openjdk.java.net/projects/code-tools/jmh/[JMH] Java benchmark |
| harness. See the MarkerFilterBenchmark and MDCFilterBenchmark in the |
| log4j-perf module for details on these benchmarks. |
| |
| [#tradeoffs] |
| == Trade-offs |
| |
| [#whichAppender] |
| === Which Log4j 2 Appender to Use? |
| |
| Assuming that you selected Log4j 2 as your logging framework, next you |
| may be interested in learning what the performance trade-offs are for |
| selecting a specific Log4j 2 configuration. For example, there are three |
| appenders for logging to a file: the File, RandomAccessFile and |
| MemoryMappedFile appenders. Which one should you use? |
| |
| If performance is all you care about, the graphs below show your best |
| choice is either the MemoryMappedFile appender or the RandomAccessFile |
| appender. Some things to bear in mind: |
| |
| * MemoryMappedFile appender does not have a rolling variant yet. |
| * When the log file size exceeds the MemoryMappedFile's region length, |
| the file needs to be remapped. This can be a very expensive operation, |
| taking several seconds if the region is large. |
| * MemoryMappedFile appender creates a presized file from the beginning |
| and fills it up gradually. This can confuse tools like `tail`; many such |
| tools don't work very well with memory mapped files. |
| * On Windows, using a tool like `tail` on a file created by |
| RandomAccessFile appender can hold a lock on this file which may prevent |
| Log4j from opening the file again when the application is restarted. In |
| a development environment where you expect to restart your application |
| regularly while using tools like tail to view the log file contents, the |
| File appender may be a reasonable trade-off between performance and |
| flexibility. For production environments performance may have higher |
| priority. |
| |
| The graph below shows sustained throughput for the console and file |
| appenders in Log4j 2.6, and for reference also provides the 2.5 |
| performance. |
| |
| It turns out that the garbage-free text encoding logic in 2.6 gives |
| these appenders a performance boost compared to Log4j 2.5. It used to be |
| that the RandomAccessFile appender was significantly faster, especially |
| in multi-threaded scenarios, but with the 2.6 release the File appender |
| performance has improved and the performance difference between these |
| two appender is smaller. |
| |
| Another takeaway is just how much of a performance drag logging to the |
| console can be. Considering logging to a file and using a tool like |
| `tail` to watch the file change in real time. |
| |
| image:images/Log4j2AppenderThroughputComparison-linux.png[image] |
| |
| On Windows, the results are similar but the RandomAccessFile and |
| MemoryMappedFile appenders outperform the plain File appender in |
| multi-threaded scenarios. The absolute numbers are higher on Windows: we |
| don't know why but it looks like Windows handles lock contention better |
| than Linux. |
| |
| image:images/Log4j2AppenderThroughputComparison-windows.png[image] |
| |
| The Log4j 2 appender comparison results above are obtained with the |
| http://openjdk.java.net/projects/code-tools/jmh/[JMH] Java benchmark |
| harness. See the Log4j2AppenderComparisonBenchmark source code in the |
| log4j-perf module. |
| |
| //// |
| The user should be aware of the following performance issues. |
| |
| === Logging performance when logging is turned off. |
| |
| When logging is turned off entirely or just for a set of Levels, the |
| cost of a log request consists of two method invocations plus an integer |
| comparison. On a 2.53 GHz Intel Core 2 Duo MacBook Pro calling |
| isDebugEnabled 10 million times produces an average result in |
| nanoseconds of: |
| |
| .... |
| Log4j: 4 |
| Logback: 5 |
| Log4j 2: 3 |
| |
| .... |
| |
| The numbers above will vary slightly from run to run so the only |
| conclusion that should be drawn is that all 3 frameworks perform |
| similarly on this task. |
| |
| However, The method invocation involves the "hidden" cost of parameter |
| construction. |
| |
| For example, |
| |
| .... |
| logger.debug("Entry number: " + i + " is " + String.valueOf(entry[i])); |
| |
| .... |
| |
| incurs the cost of constructing the message parameter, i.e. converting |
| both integer `i` and `entry[i]` to a String, and concatenating |
| intermediate strings, regardless of whether the message will be logged |
| or not. This cost of parameter construction can be quite high and it |
| depends on the size of the parameters involved. A comparison run on the |
| same hardware as above yields: |
| |
| .... |
| Log4j: 188 |
| Logback: 183 |
| Log4j 2: 188 |
| |
| .... |
| |
| Again, no conclusion should be drawn regarding relative differences |
| between the frameworks on this task, but it should be obvious that it is |
| considerably more expensive than simply testing the level. |
| |
| The best approach to avoid the cost of parameter construction is to use |
| Log4j 2's formatting capabilities. For example, instead of the above |
| write: |
| |
| .... |
| logger.debug("Entry number: {} is {}", i, entry[i]); |
| |
| .... |
| |
| Using this approach, a comparison run again on the same hardware |
| produces: |
| |
| .... |
| Log4j: Not supported |
| Logback: 9 |
| Log4j 2: 4 |
| |
| .... |
| |
| These results show that the difference in performance between the call |
| to isDebugEnabled and logger.debug is barely discernible. |
| |
| In some circumstances one of the parameters to logger.debug will be a |
| costly method call that should be avoided if debugging is disabled. In |
| those cases write: |
| |
| .... |
| if(logger.isDebugEnabled() { |
| logger.debug("Entry number: " + i + " is " + entry[i].toString()); |
| } |
| |
| .... |
| |
| This will not incur the cost of whatever the toString() method needs to |
| do if debugging is disabled. On the other hand, if the logger is enabled |
| for the debug level, it will incur twice the cost of evaluating whether |
| the logger is enabled or not: once in `isDebugEnabled` and once in |
| `debug`. This is an insignificant overhead because evaluating a logger |
| takes about 1% of the time it takes to actually log. |
| |
| Certain users resort to pre-processing or compile-time techniques to |
| compile out all log statements. This leads to perfect performance |
| efficiency with respect to logging. However, since the resulting |
| application binary does not contain any log statements, logging cannot |
| be turned on for that binary. This seems to be a disproportionate price |
| to pay in exchange for a small performance gain. |
| |
| The performance of deciding whether to log or not to log when logging is |
| turned on. |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Unlike Log4j, Log4j 2 Loggers don't "walk a hierarchy". Loggers point |
| directly to the Logger configuration that best matches the Logger's |
| name. This incurs extra overhead when the Logger is first created but |
| reduces the overhead every time the Logger is used. |
| |
| === Actually outputting log messages |
| |
| This is the cost of formatting the log output and sending it to its |
| target destination. Here again, a serious effort was made to make |
| layouts (formatters) perform as quickly as possible. The same is true |
| for appenders. One of the fundamental tenets of Log4j 2 is to use |
| immutable objects whenever possible and to lock at the lowest |
| granularity possible. However, the cost of actually formatting and |
| delivering log events will never be insignificant. For example, the |
| results of writing to a simple log file using the same format using |
| Log4j, Logback and Log4j 2 are: |
| |
| .... |
| Log4j: 1651 |
| Logback: 1419 |
| Log4j 2.0: 1542 |
| |
| .... |
| |
| As with many of the other results on this page the differences between |
| the frameworks above should be considered insignificant. The values will |
| change somewhat on each execution and changing the order the frameworks |
| are tested or adding calls to System.gc() between the tests can cause a |
| variation in the reported times. However, these results show that |
| actually writing out the events can be at least 1000 times more |
| expensive than when they are disabled, so it is always recommended to |
| take advantage of Log4j 2's fine-grained filtering capabilities. |
| //// |