blob: b782a65b6674d7b37da8e8b3ed8e66f8e7a7df6a [file] [log] [blame]
This is Bob Jenkins' hash table code from burtleburtle.net/bob/hash/hashtab.html.
The author, Bob Jenkins, has placed this code in the public domain,
He states as such in comments in the code itself. Regarding inclusion of the
code in an Apache project, he states:
I do prefer that you keep a pointer back to me as the source, so I can
answer questions if any come up. I would object if someone claimed
they'd written it themselves. But other than that, there's no restrictions,
you can remove or rewrite any of the comments or code,
including the disclaimers.
James DeCocq (jadecocq) wrote:
> Hi Bob,
> Last year I used your excellent hash table C code in a project which
> is now to become an Apache open source project. What is the usual
> attribution and licensing disclaimer procedures when folks use your
> code in open source projects?
CUSTOMIZED FOR ETCH
The original code has been customized for use by etch. Changes are as follows:
- renamed files belonging to this subproject to begin with "jenk".
- changed name of unique.c to jenktest.c; changed this to read from file not stdin
- added includes to .c files such that they compile individually
BEGIN JENKINS COMMENTS ON THE CODE
hashtab.h and hashtab.c form the hash table module.
The files before those are the support files that I always use.
The file after it (unique.c) is an example of how to use the hash table.
(The program UNIQUE takes a file in STDIN and dumps the unique lines
(duplicates removed) to STDOUT.
It also shuffles the unique lines pseudorandomly.
The sample input provided doesn't have any duplicate lines,
so the output should be the same size as the input, but the lines will be shuffled.)
The hash table has a fixed hash function, and its size is a power of two.
It doubles in size when it needs to, and when it doubles, it doubles all at once.
It never shrinks. Input keys and data are pointed to, not copied.
Keys are unique.
Collisions are handled by chaining.
Functions are:
hcreate - create a hash table
hdestroy - destroy a hash table
hcount - how many items are in the hash table?
hkey - the key at the current position
hkeyl - the length of the key at the current position
hstuff - the other data at the current position
hfind - position the hash table at some key
hadd - add a new <key,stuff> pair to the table
hdel - delete the item at the current position
hfirst - position at the first item in the table
hnext - move to the next item in the table
hstat - print statistics about this table
The most unusual thing about this module is that it maintains a current position.
This means you can't have a dangling pointer into it.
If you position on something, and then delete it, the position moves to another item.
On the other hand, it also means it's hard to do nested loops over all the items in the table,
since there can be only one position at a time.
END JENKINS COMMENTS ON THE CODE
JENKINS EMAIL RE 64-BIT COMPATIBILITY
I don't have any. However, Thomas Wang came up with this one:
public long hash64shift(long key)
{
key = (~key) + (key << 21); // key = (key << 21) - key - 1;
key = key ^ (key >>> 24);
key = (key + (key << 3)) + (key << 8); // key * 265
key = key ^ (key >>> 14);
key = (key + (key << 2)) + (key << 4); // key * 21
key = key ^ (key >>> 28);
key = key + (key << 31);
return key;
}
I haven't tested it, but the functions of his that I have tested aren't bad, and the operations look about like what I'd expect is needed. My preliminary stabs at a 64-bit functions said 8 or 9 shifts are needed, he's got 10, but many are done in parallel, so it does look like a promising function.
James DeCocq (jadecocq) wrote:
> Hi Bob,
>
> Have you tried the code in 64 bits? I don't currently have the ability
> to do so, but a glance at some of the code, specifically the renew()
> macro, seems as if it might assume that a pointer is the same size as
> an int. If it is the case that it makes assumptions as to 32 bits, are
> there specific tweaks you may have made for 64 bits that you can share?
END JENKINS EMAIL RE 64-BIT COMPATIBILITY