| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| .. include:: ../../common.defs |
| |
| .. _performance-tuning: |
| |
| Performance Tuning |
| ****************** |
| |
| |ATS| in its default configuration should perform suitably for running the |
| included regression test suite, but will need special attention to both its own |
| configuration and the environment in which it runs to perform optimally for |
| production usage. |
| |
| There are numerous options and strategies for tuning the performance of |TS| |
| and we attempt to document as many of them as possible in the sections below. |
| Because |TS| offers enough flexibility to be useful for many caching and |
| proxying scenarios, which tuning strategies will be most effective for any |
| given use case may differ, as well as the specific values for various |
| configuration options. |
| |
| .. toctree:: |
| :maxdepth: 2 |
| |
| Before You Start |
| ================ |
| |
| One of the most important aspects of any attempt to optimize the performance |
| of a |TS| installation is the ability to measure that installation's |
| performance; both prior to and after any changes are made. To that end, it is |
| strongly recommended that you establish some means to monitor and record a |
| variety of performance metrics: request and response speed, latency, and |
| throughput; memory and CPU utilization; and storage I/O operations. |
| |
| Attempts to tune a system without being able to compare the impact of changes |
| made will at best result in haphazard, *feel good* results that may end up |
| having no real world impact on your customers' experiences, and at worst may |
| even result in lower performance than before you started. Additionally, in the |
| all too common situation of budget constraints, having proper measurements of |
| existing performance will greatly ease the process of focusing on those |
| individual components that, should they require hardware expenditures or larger |
| investments of employee time, have the highest potential gains relative to |
| their cost. |
| |
| Building Traffic Server |
| ======================= |
| |
| While the default compilation settings for |TS| will produce a set of binaries |
| capable of serving most caching and proxying needs, there are some build |
| options worth considering in specific environments. |
| |
| .. TODO:: |
| |
| - any reasons why someone wouldn't want to just go with distro packages? |
| (other than "distro doesn't package versions i want") |
| - list relevant build options, impact each can potentially have |
| |
| Hardware Tuning |
| =============== |
| |
| As with any other server software, efficient allocation of hardware resources |
| will have a significant impact on |TS| performance. |
| |
| CPU Selection |
| ------------- |
| |
| |ATS| uses a hybrid event-driven engine and multi-threaded processing model for |
| handling incoming requests. As such, it is highly scalable and makes efficient |
| use of modern, multicore processor architectures. |
| |
| .. TODO:: |
| |
| any benchmarks showing relative req/s improvements between 1 core, 2 core, |
| N core? diminishing rate of return? can't be totally linear, but maybe it |
| doesn't realistically drop off within the currently available options (i.e. |
| the curve holds up pretty well all the way through current four socket xeon |
| 8 core systems, so given a lack of monetary constraint, adding more cores |
| is a surefire performance improvement (up to the bandwidth limits), or does |
| it fall off earlier, or can any modern 4 core saturate a 10G network link |
| given fast enough disks?) |
| |
| Memory Allocation |
| ----------------- |
| |
| Though |TS| stores cached content within an on-disk host database, the entire |
| :ref:`cache-directory` is always maintained in memory during server operation. |
| Additionally, most operating systems will maintain disk caches within system |
| memory. It is also possible, and commonly advisable, to maintain an in-memory |
| cache of frequently accessed content. |
| |
| The memory footprint of the |TS| process is largely fixed at the time of server |
| startup. Your |TS| systems will need at least enough memory to satisfy basic |
| operating system requirements, as well as capacity for the cache directory, and |
| any memory cache you wish to use. The default settings allocate roughly 10 |
| megabytes of RAM cache for every gigabyte of disk cache storage, though this |
| setting can be adjusted manually in :file:`records.config` using the setting |
| :ts:cv:`proxy.config.cache.ram_cache.size`. |TS| will, under the default |
| configuration, adjust this automatically if your system does not have enough |
| physical memory to accomodate the aforementioned target. |
| |
| Aside from the cost of physical memory, and necessary supporting hardware to |
| make use of large amounts of RAM, there is little downside to increasing the |
| memory allocation of your cache servers. You will see, however, no benefit from |
| sizing your memory allocation larger than the sum of your content (and index |
| overhead). |
| |
| Disk Storage |
| ------------ |
| |
| Except in cases where your entire cache may fit into system memory, your cache |
| nodes will eventually need to interact with their disks. While a more detailed |
| discussion of storage stratification is covered in `Cache Partitioning`_ below, |
| very briefly you may be able to realize gains in performance by separating |
| more frequently accessed content onto faster disks (PCIe SSDs, for instance) |
| while maintaining the bulk of your on-disk cache objects, which may not receive |
| the same high volume of requests, on lower-cost mechanical drives. |
| |
| |
| |
| Operating System Tuning |
| ======================== |
| |
| |ATS| is supported on a variety of operating systems, and as a result the tuning |
| strategies available at the OS level will vary depending upon your chosen |
| platform. |
| |
| General Recommendations |
| ----------------------- |
| |
| TCP Keep Alive |
| ~~~~~~~~~~~~~~ |
| |
| TCP Congestion Control Settings |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Ephemeral and Reserved Ports |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Jumbo Frames |
| ~~~~~~~~~~~~ |
| |
| .. TODO:: would they be useful/harmful/neutral for anything other than local forward/transparent proxies? |
| |
| Linux |
| ----- |
| |
| FreeBSD |
| ------- |
| |
| OmniOS / illumos |
| ---------------- |
| |
| Mac OS X |
| -------- |
| |
| Traffic Server Tuning |
| ===================== |
| |
| |TS| itself, of course, has many options you may want to consider adjusting to |
| achieve optimal performance in your environment. Many of these settings are |
| recorded in :file:`records.config` and may be adjusted with the |
| :option:`traffic_ctl config set` command line utility while the server is operating. |
| |
| CPU and Thread Optimization |
| --------------------------- |
| |
| Thread Scaling |
| ~~~~~~~~~~~~~~ |
| |
| By default, |TS| creates 1.5 threads per CPU core on the host system. This may |
| be adjusted with the following settings in :file:`records.config`: |
| |
| * :ts:cv:`proxy.config.exec_thread.autoconfig` |
| * :ts:cv:`proxy.config.exec_thread.autoconfig.scale` |
| * :ts:cv:`proxy.config.exec_thread.limit` |
| |
| Thread Affinity |
| ~~~~~~~~~~~~~~~ |
| |
| On multi-socket servers, such as Intel architectures with NUMA, you can adjust |
| the thread affinity configuration to take advantage of cache pipelines and |
| faster memory access, as well as preventing possibly costly thread migrations |
| across sockets. This is adjusted with :ts:cv:`proxy.config.exec_thread.affinity` |
| in :file:`records.config`. :: |
| |
| CONFIG proxy.config.exec_thread.affinity INT 1 |
| |
| Thread Stack Size |
| ~~~~~~~~~~~~~~~~~ |
| |
| :ts:cv:`proxy.config.thread.default.stacksize` |
| |
| .. TODO:: |
| |
| is there ever a need to fiddle with this, outside of possibly custom developed plugins? |
| |
| Polling Timeout |
| ~~~~~~~~~~~~~~~ |
| |
| If you are experiencing unusually or unacceptably high CPU utilization during |
| idle workloads, you may consider adjusting the polling timeout with |
| :ts:cv:`proxy.config.net.poll_timeout`:: |
| |
| CONFIG proxy.config.net.poll_timeout INT 60 |
| |
| Memory Optimization |
| ------------------- |
| |
| :ts:cv:`proxy.config.thread.default.stacksize` |
| :ts:cv:`proxy.config.cache.ram_cache.size` |
| |
| |
| Disk Storage Optimization |
| ------------------------- |
| |
| :ts:cv:`proxy.config.cache.force_sector_size` |
| :ts:cv:`proxy.config.cache.max_doc_size` |
| :ts:cv:`proxy.config.cache.target_fragment_size` |
| |
| Cache Partitioning |
| ~~~~~~~~~~~~~~~~~~ |
| |
| Network Tuning |
| -------------- |
| |
| :ts:cv:`proxy.config.net.connections_throttle` |
| |
| Error responses from origins are conistent and costly |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| If error responses are costly for your origin server to generate, you may elect |
| to have |TS| cache these responses for a period of time. The default behavior is |
| to consider all of these responses to be uncacheable, which will lead to every |
| client request to result in an origin request. |
| |
| This behavior is controlled by both enabling the feature via |
| :ts:cv:`proxy.config.http.negative_caching_enabled` and setting the cache time |
| (in seconds) with :ts:cv:`proxy.config.http.negative_caching_lifetime`. :: |
| |
| CONFIG proxy.config.http.negative_caching_enabled INT 1 |
| CONFIG proxy.config.http.negative_caching_lifetime INT 10 |
| |
| SSL-Specific Options |
| ~~~~~~~~~~~~~~~~~~~~ |
| |
| :ts:cv:`proxy.config.ssl.max_record_size` |
| :ts:cv:`proxy.config.ssl.session_cache` |
| :ts:cv:`proxy.config.ssl.session_cache.size` |
| |
| Thread Types |
| ------------ |
| |
| Logging Configuration |
| --------------------- |
| |
| .. TODO:: |
| |
| binary vs. ascii output |
| multiple log formats (netscape+squid+custom vs. just custom) |
| overhead to log collation |
| using direct writes vs. syslog target |
| |
| Plugin Tuning |
| ============= |
| |
| Common Scenarios and Pitfalls |
| ============================= |
| |
| While environments vary widely and |TS| is useful in a great number of different |
| situations, there are at least some recurring elements that may be used as |
| shortcuts to identifying problem areas, or realizing easier performance gains. |
| |
| .. TODO:: |
| |
| - origins not sending proper expiration headers (can fix at the origin (preferable) or use proxy.config.http.cache.heuristic_(min|max)_lifetime as hacky bandaids) |
| - cookies and http_auth prevent caching |
| - avoid thundering herd with read-while-writer (link to section in http-proxy-caching) |