| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| .. highlight:: console |
| |
| =========== |
| Swiss Table |
| =========== |
| |
| A specialized hash table implementation used to dynamically map combinations of |
| key field values to a dense set of integer ids. Ids can later be used in place |
| of keys to identify groups of rows with equal keys. |
| |
| Introduction |
| ============ |
| |
| Hash group-by in Arrow uses a variant of a hash table based on a data structure |
| called Swiss table. Swiss table uses linear probing. There is an array of slots |
| and the information related to inserted keys is stored in these slots. A hash |
| function determines the slot where the search for a matching key will start |
| during hash table lookup. Then the slots are visited sequentially, wrapping |
| around the end of an array, until either a match or an empty slot is found, the |
| latter case meaning that there is no match. Swiss table organizes the slots in |
| blocks of 8 and has a design that enables data level parallelism at the block |
| level. More precisely, it allows for visiting all slots within a block at once |
| during lookups, by simply using 64-bit arithmetic. SIMD instructions can further |
| enhance this data level parallelism allowing to process multiple blocks related |
| to multiple input keys together using SIMD vectors of 64-bit elements. Occupied |
| slots within a block are always clustered together. The name Swiss table comes |
| from likening resulting sequences of empty slots to holes in a one dimensional |
| cheese. |
| |
| Interface |
| ========= |
| |
| Hash table used in query processing for implementing join and group-by operators |
| does not need to provide all of the operations that a general purpose hash table |
| would. Simplified requirements can help achieve a simpler and more efficient |
| design. For instance we do not need to be able to remove previously inserted |
| keys. It’s an append-only data structure: new keys can be added but old keys are |
| never erased. Also, only a single copy of each key can be inserted - it is like |
| ``std::map`` in that sense and not ``std::multimap``. |
| |
| Our Swiss table is fully vectorized. That means that all methods work on vectors |
| of input keys processing them in batches. Specialized SIMD implementations of |
| processing functions are almost always provided for performance critical |
| operations. All callback interfaces used from the core hash table code are also |
| designed to work on batches of inputs instead of individual keys. The batch size |
| can be almost arbitrary and is selected by the client of the hash table. Batch |
| size should be the smallest number of input items, big enough so that the |
| benefits of vectorization and SIMD can be fully experienced. Keeping it small |
| means less memory used for temporary arrays storing intermediate results of |
| computation (vector equivalent of some temporary variables kept on the stack). |
| That in turn means smaller space in CPU caches, which also means less impact on |
| other memory access intensive operations. We pick 1024 as the default size of |
| the batch. We will call it a **mini-batch** to distinguish it from potentially |
| other forms of batches used at higher levels in the code, e.g. when scheduling |
| work for worker threads or relational operators inside an analytic query. |
| |
| The main functionality provided by Swiss table is mapping of arbitrarily complex |
| keys to unique integer ids. Let us call it **lookup-or-insert**. Given a |
| sequence of key values, return a corresponding sequence of integer ids, such |
| that all keys that are equal receive the same id and for K distinct keys the |
| integer ids will be assigned from the set of numbers 0 to (K-1). If we find a |
| matching key in a hash table for a given input, we return the **key id** |
| assigned when the key was first inserted into a hash table. If we fail to find |
| an already inserted match, we assign the first unused integer as a key id and |
| add a new entry to a hash table. Due to vectorized processing, which may result |
| in out-of-order processing of individual inputs, it is not guaranteed that if |
| there are two new key values in the same input batch and one of them appears |
| earlier in the input sequence, then it will receive a smaller key id. Additional |
| mapping functionality can be built on top of basic mapping to integer key id, |
| for instance if we want to assign and perhaps keep updating some values to all |
| unique keys, we can keep these values in a resizable vector indexed by obtained |
| key id. |
| |
| The implementation of Swiss table does not need to have any information related |
| to the domain of the keys. It does not use their logical data type or |
| information about their physical representation and does not even use pointers |
| to keys. All access to keys is delegated to a separate class or classes that |
| provide callback functions for three operations: |
| |
| - computing hashes of keys; |
| - checking equality for given pairs of keys; |
| - appending a given sequence of keys to a stack maintained outside of Swiss |
| table object, so that they can be referenced later on by key ids (key ids will |
| be equal to their positions in the stack). |
| |
| When passing arguments to callback functions the keys are referenced using |
| integer ids. For the left side - that is the keys present in the input |
| mini-batch - ordinal positions within that mini-batch are used. For the right |
| side - that is the keys inserted into the hash table - these are identified by |
| key ids assigned to them and stored inside Swiss table when they were first |
| encountered and processed. |
| |
| Diagram with logical view of information passing in callbacks: |
| |
| .. image:: img/swiss_table_1.jpg |
| |
| Hash table values for inserted keys are also stored inside Swiss table. Because |
| of that, hash table logic does not need to ever re-evaluate the hash, and there |
| is actually no need for a hash function callback. It is enough that the caller |
| provides hash values for all entries in the batch when calling lookup-or-insert. |
| |
| Basic architecture and organization of data |
| =========================================== |
| |
| The hash table is an array of **slots**. Slots are grouped in groups of 8 called |
| **blocks**. The number of blocks is a power of 2. The empty hash table starts |
| with a single block, with all slots empty. Then, as the keys are getting |
| inserted and the amount of empty slots is shrinking, at some point resizing of |
| the hash table is triggered. The data stored in slots is moved to a new hash |
| table that has the double of the number of blocks. |
| |
| The diagram below shows the basic organization of data in our implementation of |
| Swiss table: |
| |
| .. image:: img/swiss_table_2.jpg |
| |
| N is the log of the number of blocks, :math:`2^{N+3}` is the number of slots and |
| also the maximum number of inserted keys and hence (N + 3) is the number of bits |
| required to store a key id. We will refer to N as the **size of the hash table**. |
| |
| Index of a block within an array will be called **block id**, and similarly index |
| of a slot will be **slot id**. Sometimes we will focus on a single block and |
| refer to slots that belong to it by using a **local slot id**, which is an index |
| from 0 to 7. |
| |
| Every slot can either be **empty** or store data related to a single inserted |
| key. There are three pieces of information stored inside a slot: |
| |
| - status byte, |
| - key id, |
| - key hash. |
| |
| Status byte, as the name suggests, stores 8 bits. The highest bit indicates if |
| the slot is empty (the highest bit is set) or corresponds to one of inserted |
| keys (the highest bit is zero). The remaining 7 bits contain 7 bits of key hash |
| that we call a **stamp**. The stamp is used to eliminate some false positives |
| when searching for a matching key for a given input. Slot also stores **key id**, |
| which is a non-negative integer smaller than the number of inserted keys, that is |
| used as a reference to the actual inserted key. The last piece of information |
| related to an inserted key is its **hash** value. We store hashes for all keys, |
| so that they never need to be re-computed. That greatly simplifies some |
| operations, like resizing of a hash table, that may not even need to look at the |
| keys at all. For an empty slot, the status byte is 0x80, key id is zero and the |
| hash is not used and can be set to any number. |
| |
| A single block contains 8 slots and can be viewed as a micro-stack of up to 8 |
| inserted keys. When the first key is inserted into an empty block, it will occupy |
| a slot with local id 0. The second inserted key will go into slot number 1 and so |
| on. We use N highest bits of hash to get an index of a **start block**, when |
| searching for a match or an empty slot to insert a previously not seen key when |
| that is the case. If the start block contains any empty slots, then the search |
| for either a match or place to insert a key will end at that block. We will call |
| such a block an **open block**. A block that is not open is a full block. In the |
| case of full block, the input key related search may continue in the next block |
| modulo the number of blocks. If the key is not inserted into its start block, we |
| will refer to it as an **overflow** entry, other entries being **non-overflow**. |
| Overflow entries are slower to process, since they require visiting more than one |
| block, so we want to keep their percentage low. This is done by choosing the |
| right **load factor** (percentage of occupied slots in the hash table) at which |
| the hash table gets resized and the number of blocks gets doubled. By tuning this |
| value we can control the probability of encountering an overflow entry. |
| |
| The most interesting part of each block is the set of status bytes of its slots, |
| which is simply a single 64-bit word. The implementation of efficient searches |
| across these bytes during lookups require using either leading zero count or |
| trailing zero count intrinsic. Since there are cases when only the first one is |
| available, in order to take advantage of it, we order the bytes in the 64-bit |
| status word so that the first slot within a block uses the highest byte and the |
| last one uses the lowest byte (slots are in reversed bytes order). The diagram |
| below shows how the information about slots is stored within a 64-bit status |
| word: |
| |
| .. image:: img/swiss_table_3.jpg |
| |
| Each status byte has a 7-bit fragment of hash value - a **stamp** - and an empty |
| slot bit. Empty slots have status byte equal to 0x80 - the highest bit is set to |
| 1 to indicate an empty slot and the lowest bits, which are used by a stamp, are |
| set to zero. |
| |
| The diagram below shows which bits of hash value are used by hash table: |
| |
| .. image:: img/swiss_table_4.jpg |
| |
| If a hash table has :math:`2^{N}` blocks, then we use N highest bits of a hash |
| to select a start block when searching for a match. The next 7 bits are used as |
| a stamp. Using the highest bits to pick a start block means that a range of hash |
| values can be easily mapped to a range of block ids of start blocks for hashes |
| in that range. This is useful when resizing a hash table or merging two hash |
| tables together. |
| |
| Interleaving status bytes and key ids |
| ------------------------------------- |
| |
| Status bytes and key ids for all slots are stored in a single array of bytes. |
| They are first grouped by 8 into blocks, then each block of status bytes is |
| interleaved with a corresponding block of key ids. Finally key ids are |
| represented using the smallest possible number of bits and bit-packed (bits |
| representing each next key id start right after the last bit of the previous key |
| id). Note that regardless of the chosen number of bits, a block of bit-packed |
| key ids (that is 8 of them) will start and end on the byte boundary. |
| |
| The diagram below shows the organization of bytes and bits of a single block in |
| interleaved array: |
| |
| .. image:: img/swiss_table_5.jpg |
| |
| From the size of the hash table we can derive the number K of bits needed in the |
| worst case to encode any key id. K is equal to the number of bits needed to |
| represent slot id (number of keys is not greater than the number of slots and any |
| key id is strictly less than the number of keys), which for a hash table of size |
| N (N blocks) equals (N+3). To simplify bit packing and unpacking and avoid |
| handling of special cases, we will round up K to full bytes for K > 24 bits. |
| |
| Status bytes are stored in a single 64-bit word in reverse byte order (the last |
| byte corresponds to the slot with local id 0). On the other hand key ids are |
| stored in the normal order (the order of slot ids). |
| |
| Since both status byte and key id for a given slot are stored in the same array |
| close to each other, we can expect that most of the lookups will read only one |
| CPU cache-line from memory inside Swiss table code (then at least another one |
| outside Swiss table to access the bytes of the key for the purpose of |
| comparison). Even if we hit an overflow entry, it is still likely to reside on |
| the same cache-line as the start block data. Hash values, which are stored |
| separately from status byte and key id, are only used when resizing and do not |
| impact the lookups outside these events. |
| |
| .. note:: |
| Improvement to consider: |
| In addition to the Swiss table data, we need to store an array of inserted |
| keys, one for each key id. If keys are of fixed length, then the address of |
| the bytes of the key can be calculated by multiplying key id by the common |
| length of the key. If keys are of varying length, then there will be an |
| additional array with an offset of each key within the array of concatenated |
| bytes of keys. That means that any key comparison during lookup will involve |
| 3 arrays: one to get key id, one to get key offset and final one with bytes of |
| the key. This could be reduced to 2 array lookups if we stored key offset |
| instead of key id interleaved with slot status bytes. Offset indexed by key id |
| and stored in its own array becomes offset indexed by slot id and stored |
| interleaved with slot status bytes. At the same time key id indexed by slot id |
| and interleaved with slot status bytes before becomes key id referenced using |
| offset and stored with key bytes. There may be a slight increase in the total |
| size of memory needed by the hash table, equal to the difference in the number |
| of bits used to store offset and those used to store key id, multiplied by the |
| number of slots, but that should be a small fraction of the total size. |
| |
| 32-bit hash vs 64-bit hash |
| -------------------------- |
| |
| Currently we use 32-bit hash values in Swiss table code and 32-bit integers as |
| key ids. For the robust implementation, sooner or later we will need to support |
| 64-bit hash and 64-bit key ids. When we use 32-bit hash, it means that we run |
| out of hash bits when hash table size N is greater than 25 (25 bits of hash |
| needed to select a block and 7 bits needed to generate a stamp byte reach 32 |
| total bits). When the number of inserted keys exceeds the maximal number of keys |
| stored in a hash table of size 25 (which is at least :math:`2^{24}`), the chance |
| of false positives during lookups will start quickly growing. 32-bit hash should |
| not be used with more than about 16 million inserted keys. |
| |
| Low memory footprint and low chance of hash collisions |
| ------------------------------------------------------ |
| |
| Swiss table is a good choice of a hash table for modern hardware, because it |
| combines lookups that can take advantage of special CPU instructions with space |
| efficiency and low chance of hash collisions. |
| |
| Space efficiency is important for performance, because the cost of random array |
| accesses, often dominating the lookup cost for larger hash tables, increases with |
| the size of the arrays. This happens due to limited space of CPU caches. Let us |
| look at what is the amortized additional storage cost for a key in a hash table |
| apart from the essential cost of storing data of all those keys. Furthermore, we |
| can skip the storage of hash values, since these are only used during infrequent |
| hash table resize operations (should not have a big impact on CPU cache usage in |
| normal cases). |
| |
| Half full hash table of size N will use 2 status bytes per inserted key (because |
| for every filled slot there is one empty slot) and 2*(N+3) bits for key id |
| (again, one for the occupied slot and one for the empty). For N = 16 for |
| instance this is slightly under 7 bytes per inserted key. |
| |
| Swiss table also has a low probability of false positives leading to wasted key |
| comparisons. Here is some rationale behind why this should be the case. Hash |
| table of size N can contain up to :math:`2^{N+3}` keys. Search for a match |
| involves (N + 7) hash bits: N to select a start block and 7 to use as a stamp. |
| There are always at least 16 times more combinations of used hash bits than |
| there are keys in the hash table (32 times more if the hash table is half full). |
| These numbers mean that the probability of false positives resulting from a |
| search for a matching slot should be low. That corresponds to an expected number |
| of comparisons per lookup being close to 1 for keys already present and 0 for |
| new keys. |
| |
| Lookup |
| ====== |
| |
| Lookup-or-insert operation, given a hash of a key, finds a list of candidate |
| slots with corresponding keys that are likely to be equal to the input key. The |
| list may be empty, which means that the key does not exist yet in the hash |
| table. If it is not empty, then the callback function for key comparison is |
| called for each next candidate to verify that there is indeed a match. False |
| positives get rejected and we end up either finding an actual match or an empty |
| slot, which means that the key is new to the hash table. New keys get assigned |
| next available integers as key ids, and are appended to the set of keys stored in |
| the hash table. As a result of inserting new keys to the hash table, the density |
| of occupied slots may reach an upper limit, at which point the hash table will be |
| resized and will afterwards have twice as many slots. That is in summary |
| lookup-or-insert functionality, but the actual implementation is a bit more |
| involved, because of vectorization of the processing and various optimizations |
| for common cases. |
| |
| Search within a single block |
| ---------------------------- |
| |
| There are three possible cases that can occur when searching for a match for a |
| given key (that is, for a given stamp of a key) within a single block, |
| illustrated below. |
| |
| 1. There is a matching stamp in the block of status bytes: |
| |
| .. image:: img/swiss_table_6.jpg |
| |
| 2. There is no matching stamp in the block, but there is an empty slot in the |
| block: |
| |
| .. image:: img/swiss_table_7.jpg |
| |
| 3. There is no matching stamp in the block and the block is full (there are no |
| empty slots left): |
| |
| .. image:: img/swiss_table_8.jpg |
| |
| 64-bit arithmetic can be used to search for a matching slot within the entire |
| single block at once, without iterating over all slots in it. Following is an |
| example of a sequence of steps to find the first status byte for a given stamp, |
| returning the first empty slot on miss if the block is not full or 8 (one past |
| maximum local slot id) otherwise. |
| |
| Following is a sketch of the possible steps to execute when searching for the |
| matching stamp in a single block. |
| |
| | *Example will use input stamp 0x5E and a 64-bit status bytes word with one empty |
| slot:* |
| | *0x 4B17 5E3A 5E2B 1180* |
| |
| 1. [1 instruction] Replicate stamp to all bytes by multiplying it by 0x 0101 0101 |
| 0101 0101. |
| |
| | *We obtain: 0x 5E5E 5E5E 5E5E 5E5E.* |
| |
| 2. [1 instruction] XOR replicated stamp with status bytes word. Bytes corresponding |
| to a matching stamp will be 0, bytes corresponding to empty slots will have a |
| value between 128 and 255, bytes corresponding to non-matching non-empty slots |
| will have a value between 1 and 127. |
| |
| | *We obtain: 0x 1549 0064 0075 4FDE.* |
| |
| 3. [2 instructions] In the next step we want to have information about a match in |
| the highest bit of each byte. We can ignore here empty slot bytes, because they |
| will be taken care of at a later step. Set the highest bit in each byte (OR with |
| 0x 8080 8080 8080 8080) and then subtract 1 from each byte (subtract 0x 0101 0101 |
| 0101 0101 from 64-bit word). Now if a byte corresponds to a non-empty slot then |
| the highest bit 0 indicates a match and 1 indicates a miss. |
| |
| | *We obtain: 0x 95C9 80E4 80F5 CFDE,* |
| | *then 0x 94C8 7FE3 7FF4 CEDD.* |
| |
| 4. [3 instructions] In the next step we want to obtain in each byte one of two |
| values: 0x80 if it is either an empty slot or a match, 0x00 otherwise. We do |
| it in three steps: NOT the result of the previous step to change the meaning |
| of the highest bit; OR with the original status word to set highest bit in a |
| byte to 1 for empty slots; mask out everything other than the highest bits in |
| all bytes (AND with 0x 8080 8080 8080 8080). |
| |
| | *We obtain: 6B37 801C 800B 3122,* |
| | *then 6B37 DE3E DE2B 31A2,* |
| | *finally 0x0000 8000 8000 0080.* |
| |
| 5. [2 instructions] Finally, use leading zero bits count and divide it by 8 to |
| find an index of the last byte that corresponds either to a match or an empty |
| slot. If the leading zero count intrinsic returns 64 for a 64-bit input zero, |
| then after dividing by 8 we will also get the desired answer in case of a full |
| block without any matches. |
| |
| | *We obtain: 16,* |
| | *then 2 (index of the first slot within the block that matches the stamp).* |
| |
| If SIMD instructions with 64-bit lanes are available, multiple single block |
| searches for different keys can be executed together. For instance AVX2 |
| instruction set allows to process quadruplets of 64-bit values in a single |
| instruction, four searches at once. |
| |
| Complete search potentially across multiple blocks |
| -------------------------------------------------- |
| |
| Full implementation of a search for a matching key may involve visiting multiple |
| blocks beginning with the start block selected based on the hash of the key. We |
| move to the next block modulo the number of blocks, whenever we do not find a |
| match in the current block and the current block is full. The search may also |
| involve visiting one or more slots in each block. Visiting in this case means |
| calling a comparison callback to verify the match whenever a slot with a matching |
| stamp is encountered. Eventually the search stops when either: |
| |
| - the matching key is found in one of the slots matching the stamp, or |
| - an empty slot is reached. This is illustrated in the diagram below: |
| |
| .. image:: img/swiss_table_9.jpg |
| |
| Optimistic processing with two passes |
| ------------------------------------- |
| |
| Hash table lookups may have high cost in the pessimistic case, when we encounter |
| cases of hash collisions and full blocks that lead to visiting further blocks. In |
| the majority of cases we can expect an optimistic situation - the start block is |
| not full, so we will only visit this one block, and all stamps in the block are |
| different, so we will need at most one comparison to find a match. We can expect |
| about 90% of the key lookups for an existing key to go through the optimistic |
| path of processing. For that reason it pays off to optimize especially for this |
| 90% of inputs. |
| |
| Lookups in Swiss table are split into two passes over an input batch of keys. The |
| **first pass: fast-path lookup**, is a highly optimized, vectorized, |
| SIMD-friendly, branch-free code that fully handles optimistic cases. The **second |
| pass: slow-path lookup**, is normally executed only for the selection of inputs |
| that have not been finished in the first pass, although it can also be called |
| directly on all of the inputs, skipping fast-path lookup. It handles all special |
| cases and inserts but in order to be robust it is not as efficient as fast-path. |
| Slow-path lookup does not need to repeat the work done in fast-path lookup - it |
| can use the state reached at the end of fast-path lookup as a starting point. |
| |
| Fast-path lookup implements search only for the first stamp match and only within |
| the start block. It only makes sense when we already have at least one key |
| inserted into the hash table, since it does not handle inserts. It takes a vector |
| of key hashes as an input and based on it outputs three pieces of information for |
| each key: |
| |
| - Key id corresponding to the slot in which a matching stamp was found. Any valid |
| key id if a matching stamp was not found. |
| - A flag indicating if a match was found or not. |
| - Slot id of a slot from which slow-path should pick up the search if the first |
| match was either not found or it turns out to be false positive after |
| evaluating key comparison. |
| |
| .. note:: |
| Improvement to consider: precomputing 1st pass lookup results. |
| |
| If the hash table is small, the number of inserted keys is small, we could |
| further simplify and speed-up the first pass by storing in a lookup table |
| pre-computed results for all combinations of hash bits. Let us consider the |
| case of Swiss table of size 5 that has 256 slots and up to 128 inserted keys. |
| Only 12 bits of hash are used by lookup in that case: 5 to select a block, 7 |
| to create a stamp. For all :math:`2^{12}` combinations of those bits we could |
| keep the result of first pass lookup in an array. Key id and a match |
| indicating flag can use one byte: 7 bits for key id and 1 bit for the flag. |
| Note that slot id is only needed if we go into 2nd pass lookup, so it can be |
| stored separately and likely only accessed by a small subset of keys. |
| Fast-path lookup becomes almost a single fetch of result from a 4KB array. |
| Lookup arrays used to implement this need to be kept in sync with the main |
| copy of data about slots, which requires extra care during inserts. Since the |
| number of entries in lookup arrays is much higher than the number of slots, |
| this technique only makes sense for small hash tables. |
| |
| Dense comparisons |
| ----------------- |
| |
| If there is at least one key inserted into a hash table, then every slot contains |
| a key id value that corresponds to some actual key that can be used in |
| comparison. That is because empty slots are initialized with 0 as their key id. |
| After the fast-path lookup we get a match-found flag for each input. If it is |
| set, then we need to run a comparison of the input key with the key in the hash |
| table identified by key id returned by fast-path code. The comparison will verify |
| that there is a true match between the keys. We only need to do this for a |
| subset of inputs that have a match candidate, but since we have key id values |
| corresponding to some real key for all inputs, we may as well execute |
| comparisons on all inputs unconditionally. If the majority (e.g. more than 80%) |
| of the keys have a match candidate, the cost of evaluating comparison for the |
| remaining fraction of keys but without filtering may actually be cheaper than the |
| cost of running evaluation only for required keys while referencing filter |
| information. This can be seen as a variant of general preconditioning techniques |
| used to avoid diverging conditional branches in the code. It may be used, based |
| on some heuristic, to verify matches reported by fast-path lookups and is |
| referred to as **dense comparisons**. |
| |
| Resizing |
| ======== |
| |
| New hash table is initialized as empty and has only a single block with a space |
| for only a few key entries. Doubling of the hash table size becomes necessary as |
| more keys get inserted. It is invoked during the 2nd pass of the lookups, which |
| also handles inserts. It happens immediately after the number of inserted keys |
| reaches a specific upper limit decided based on a current size of the hash table. |
| There may still be unprocessed entries from the input mini-batch after resizing, |
| so the 2nd pass of the lookup is restarted right after, with the bigger hash |
| table and the remaining subset of unprocessed entries. |
| |
| Current policy, that should work reasonably well, is to resize a small hash table |
| (up to 8KB) when it is 50% full. Larger hash tables are resized when 75% full. |
| We want to keep size in memory as small as possible, while maintaining a low |
| probability of blocks becoming full. |
| |
| When discussing resizing we will be talking about **resize source** and **resize |
| target** tables. The diagram below shows how the same hash bits are interpreted |
| differently by the source and the target. |
| |
| .. image:: img/swiss_table_10.jpg |
| |
| For a given hash, if a start block id was L in the source table, it will be |
| either (2*L+0) or (2*L+1) in the target table. Based on that we can expect data |
| access locality when migrating the data between the tables. |
| |
| Resizing is cheap also thanks to the fact that hash values for keys in the hash |
| table are kept together with other slot data and do not need to be recomputed. |
| That means that resizing procedure does not ever need to access the actual bytes |
| of the key. |
| |
| 1st pass |
| -------- |
| |
| Based on the hash value for a given slot we can tell whether this slot contains |
| an overflow or non-overflow entry. In the first pass we go over all source slots |
| in sequence, filter out overflow entries and move to the target table all other |
| entries. Non-overflow entries from a block L will be distributed between blocks |
| (2*L+0) and (2*L+1) of the target table. None of these target blocks can |
| overflow, since they will be accommodating at most 8 input entries during this |
| pass. |
| |
| For every non-overflow entry, the highest bit of a stamp in the source slot |
| decides whether it will go to the left or to the right target block. It is |
| further possible to avoid any conditional branches in this partitioning code, so |
| that the result is friendly to the CPU execution pipeline. |
| |
| .. image:: img/swiss_table_11.jpg |
| |
| 2nd pass |
| -------- |
| |
| In the second pass of resizing, we scan all source slots again, this time |
| focusing only on the overflow entries that were all skipped in the 1st pass. We |
| simply reinsert them in the target table using generic insertion code with one |
| exception. Since we know that all the source keys are different, there is no |
| need to search for a matching stamp or run key comparisons (or look at the key |
| values). We just need to find the first open block beginning with the start |
| block in the target table and use its first empty slot as the insert |
| destination. |
| |
| We expect overflow entries to be rare and therefore the relative cost of that |
| pass should stay low. |