| .. Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| |
| .. include:: ../../common.defs |
| |
| .. _developer-cache-ram-cache: |
| |
| |
| RAM Cache |
| ********* |
| |
| New RAM Cache Algorithm (CLFUS) |
| =============================== |
| |
| The new RAM Cache uses ideas from a number of cache replacement policies and |
| algorithms, including LRU, LFU, CLOCK, GDFS and 2Q, called CLFUS (Clocked Least |
| Frequently Used by Size). It avoids any patented algorithms and includes the |
| following features: |
| |
| * Balances Recentness, Frequency and Size to maximize hit rate (not byte hit |
| rate). |
| |
| * Is Scan Resistant and extracts robust hit rates even when the working set does |
| not fit in the RAM Cache. |
| |
| * Supports compression at 3 levels: fastlz, gzip (libz), and xz (liblzma). |
| Compression can be moved to another thread. |
| |
| * Has very low CPU overhead, only slightly more than a basic LRU. Rather than |
| using an O(lg n) heap, it uses a probabilistic replacement policy for O(1) |
| cost with low C. |
| |
| * Has relatively low memory overhead of approximately 200 bytes per object in |
| memory. |
| |
| The rationale for emphasizing hit rate over byte hit rate is that the overhead |
| of pulling more bytes from secondary storage is low compared to the cost of a |
| request. |
| |
| The RAM Cache consists of an object hash fronting 2 LRU/CLOCK lists and a *seen* |
| hash table. The first cached list contains objects in memory, while the second |
| contains a history of objects which have either recently been in memory or are |
| being considered for keeping in memory. The *seen* hash table is used to make |
| the algorithm scan resistant. |
| |
| The list entries record the following information: |
| |
| ============== ================================================================ |
| Value Description |
| ============== ================================================================ |
| key 16 byte unique object identifier |
| auxkeys 8 bytes worth of version number (in our system, the block in the |
| partition). When the version of an object changes old entries are |
| purged from the cache. |
| hits Number of hits within this clock period. |
| size size of the object in the cache. |
| len Length of the object, which differs from *size* because of |
| compression and padding). |
| compressed_len Compressed length of the object. |
| compressed Compression type, or ``none`` if no compression. Possible types |
| are: *fastlz*, *libz*, and *liblzma*. |
| uncompressible Flag indicating that content cannot be compressed (true), or that |
| it mat be compressed (false). |
| copy Whether or not this object should be copied in and copied out |
| (e.g. HTTP HDR). |
| LRU link |
| HASH link |
| IOBufferData Smart point to the data buffer. |
| ============== ================================================================ |
| |
| The interface to the cache is *Get* and *Put* operations. Get operations check |
| if an object is in the cache and are called on a read attempt. The Put operation |
| decides whether or not to cache the provided object in memory. It is called |
| after a read from secondary storage. |
| |
| Seen Hash |
| ========= |
| |
| The *Seen List* becomes active after the *Cached* and *History* lists become |
| full following a cold start. The purpose is to make the cache scan resistant, |
| which means that the cache state must not be affected at all by a long sequence |
| Get and Put operations on objects which are seen only once. This is essential, |
| and without it not only would the cache be polluted, but it could lose critical |
| information about the objects that it cares about. It is therefore essential |
| that the Cache and History lists are not affected by Get or Put operations on |
| objects seen the first time. The Seen Hash maintains a set of 16 bit hash tags, |
| and requests which do not hit in the object cache (are in the Cache List or |
| History List) and do not match the hash tag result in the hash tag being updated |
| but are otherwise ignored. The Seen Hash is sized to approximately the number of |
| objects in the cache in order to match the number that are passed through it |
| with the CLOCK rate of the Cached and History Lists. |
| |
| Cached List |
| =========== |
| |
| The *Cached List* contains objects actually in memory. The basic operation is |
| LRU with new entries inserted into a FIFO queue and hits causing objects to be |
| reinserted. The interesting bit comes when an object is being considered for |
| insertion. A check is first made against the Object Hash to see if the object |
| is in the Cached List or History. Hits result in updating the ``hit`` field and |
| reinsertion of the object. History hits result in the ``hit`` field being |
| updated and a comparison to see if this object should be kept in memory. The |
| comparison is against the least recently used members of the Cache List, and |
| is based on a weighted frequency:: |
| |
| CACHE_VALUE = hits / (size + overhead) |
| |
| A new object must be enough bytes worth of currently cached objects to cover |
| itself. Each time an object is considered for replacement the CLOCK moves |
| forward. If the History object has a greater value then it is inserted into the |
| Cached List and the replaced objects are removed from memory and their list |
| entries are inserted into the History List. If the History object has a lesser |
| value it is reinserted into the History List. Objects considered for replacement |
| (at least one) but not replaced have their ``hits`` field set to ``0`` and are |
| reinserted into the Cached List. This is the CLOCK operation on the Cached List. |
| |
| History List |
| ============ |
| |
| Each CLOCK, the least recently used entry in the History List is dequeued and |
| if the ``hits`` field is not greater than ``1`` (it was hit at least once in |
| the History or Cached List) it is deleted. Otherwise, the ``hits`` is set to |
| ``0`` and it is requeued on the History List. |
| |
| Compression and Decompression |
| ============================= |
| |
| Compression is performed by a background operation (currently called as part of |
| Put) which maintains a pointer into the Cached List and runs toward the head |
| compressing entries. Decompression occurs on demand during a Get. In the case |
| of objects tagged ``copy``, the compressed version is reinserted in the LRU |
| since we need to make a copy anyway. Those not tagged ``copy`` are inserted |
| uncompressed in the hope that they can be reused in uncompressed form. This is |
| a compile time option and may be something we want to change. |
| |
| There are 3 algorithms and levels of compression (speed on an Intel i7 920 |
| series processor using one thread): |
| |
| ======= ================ ================== ==================================== |
| Method Compression Rate Decompression Rate Notes |
| ======= ================ ================== ==================================== |
| fastlz 173 MB/sec 442 MB/sec Basically free since disk or network |
| will limit first; ~53% final size. |
| libz 55 MB/sec 234 MB/sec Almost free, particularly |
| decompression; ~37% final size. |
| liblzma 3 MB/sec 50 MB/sec Expensive; ~27% final size. |
| ======= ================ ================== ==================================== |
| |
| These are ballpark numbers, and your millage will vary enormously. JPEG, for |
| example, will not compress with any of these (or at least will only do so at |
| such a marginal level that the cost of compression and decompression is wholly |
| unjustified), and the same is true of many other media and binary file types |
| which embed some form of compression. The RAM Cache does detect compression |
| level and will declare something *incompressible* if it doesn't get below 90% of |
| the original size. This value is cached so that the RAM Cache will not attempt |
| to compress it again (at least as long as it is in the history). |
| |