| |
| This is Bob Jenkins' hash table code from burtleburtle.net/bob/hash/hashtab.html. |
| The author, Bob Jenkins, has placed this code in the public domain, |
| He states as such in comments in the code itself. Regarding inclusion of the |
| code in an Apache project, he states: |
| I do prefer that you keep a pointer back to me as the source, so I can |
| answer questions if any come up. I would object if someone claimed |
| they'd written it themselves. But other than that, there's no restrictions, |
| you can remove or rewrite any of the comments or code, |
| including the disclaimers. |
| |
| James DeCocq (jadecocq) wrote: |
| > Hi Bob, |
| > Last year I used your excellent hash table C code in a project which |
| > is now to become an Apache open source project. What is the usual |
| > attribution and licensing disclaimer procedures when folks use your |
| > code in open source projects? |
| |
| CUSTOMIZED FOR ETCH |
| The original code has been customized for use by etch. Changes are as follows: |
| - renamed files belonging to this subproject to begin with "jenk". |
| - changed name of unique.c to jenktest.c; changed this to read from file not stdin |
| - added includes to .c files such that they compile individually |
| |
| BEGIN JENKINS COMMENTS ON THE CODE |
| hashtab.h and hashtab.c form the hash table module. |
| The files before those are the support files that I always use. |
| The file after it (unique.c) is an example of how to use the hash table. |
| (The program UNIQUE takes a file in STDIN and dumps the unique lines |
| (duplicates removed) to STDOUT. |
| It also shuffles the unique lines pseudorandomly. |
| The sample input provided doesn't have any duplicate lines, |
| so the output should be the same size as the input, but the lines will be shuffled.) |
| |
| The hash table has a fixed hash function, and its size is a power of two. |
| It doubles in size when it needs to, and when it doubles, it doubles all at once. |
| It never shrinks. Input keys and data are pointed to, not copied. |
| Keys are unique. |
| Collisions are handled by chaining. |
| |
| Functions are: |
| hcreate - create a hash table |
| hdestroy - destroy a hash table |
| hcount - how many items are in the hash table? |
| hkey - the key at the current position |
| hkeyl - the length of the key at the current position |
| hstuff - the other data at the current position |
| hfind - position the hash table at some key |
| hadd - add a new <key,stuff> pair to the table |
| hdel - delete the item at the current position |
| hfirst - position at the first item in the table |
| hnext - move to the next item in the table |
| hstat - print statistics about this table |
| |
| The most unusual thing about this module is that it maintains a current position. |
| This means you can't have a dangling pointer into it. |
| If you position on something, and then delete it, the position moves to another item. |
| On the other hand, it also means it's hard to do nested loops over all the items in the table, |
| since there can be only one position at a time. |
| END JENKINS COMMENTS ON THE CODE |
| |
| |
| JENKINS EMAIL RE 64-BIT COMPATIBILITY |
| I don't have any. However, Thomas Wang came up with this one: |
| |
| public long hash64shift(long key) |
| { |
| key = (~key) + (key << 21); // key = (key << 21) - key - 1; |
| key = key ^ (key >>> 24); |
| key = (key + (key << 3)) + (key << 8); // key * 265 |
| key = key ^ (key >>> 14); |
| key = (key + (key << 2)) + (key << 4); // key * 21 |
| key = key ^ (key >>> 28); |
| key = key + (key << 31); |
| return key; |
| } |
| |
| I haven't tested it, but the functions of his that I have tested aren't bad, and the operations look about like what I'd expect is needed. My preliminary stabs at a 64-bit functions said 8 or 9 shifts are needed, he's got 10, but many are done in parallel, so it does look like a promising function. |
| |
| James DeCocq (jadecocq) wrote: |
| > Hi Bob, |
| > |
| > Have you tried the code in 64 bits? I don't currently have the ability |
| > to do so, but a glance at some of the code, specifically the renew() |
| > macro, seems as if it might assume that a pointer is the same size as |
| > an int. If it is the case that it makes assumptions as to 32 bits, are |
| > there specific tweaks you may have made for 64 bits that you can share? |
| END JENKINS EMAIL RE 64-BIT COMPATIBILITY |
| |
| |