binding-c/runtime/c/ext/hashtab/readme-jenk.txt - etch - Git at Google


 This is Bob Jenkins' hash table code from burtleburtle.net/bob/hash/hashtab.html.
 The author, Bob Jenkins, has placed this code in the public domain,
 He states as such in comments in the code itself. Regarding inclusion of the
 code in an Apache project, he states:
    I do prefer that you keep a pointer back to me as the source, so I can
    answer questions if any come up.  I would object if someone claimed
    they'd written it themselves. But other than that, there's no restrictions,
    you can remove or rewrite any of the comments or code,
    including the disclaimers.

    James DeCocq (jadecocq) wrote:
    > Hi Bob,
    > Last year I used your excellent hash table C code in a project which
    > is now to become an Apache open source project. What is the usual
    > attribution and licensing disclaimer procedures when folks use your
    > code in open source projects?

 CUSTOMIZED FOR ETCH
 The original code has been customized for use by etch. Changes are as follows:
 - renamed files belonging to this subproject to begin with "jenk".
 - changed name of unique.c to jenktest.c; changed this to read from file not stdin
 - added includes to .c files such that they compile individually

 BEGIN JENKINS COMMENTS ON THE CODE
 hashtab.h and hashtab.c form the hash table module.
 The files before those are the support files that I always use.
 The file after it (unique.c) is an example of how to use the hash table.
 (The program UNIQUE takes a file in STDIN and dumps the unique lines
 (duplicates removed) to STDOUT.
 It also shuffles the unique lines pseudorandomly.
 The sample input provided doesn't have any duplicate lines,
 so the output should be the same size as the input, but the lines will be shuffled.)

 The hash table has a fixed hash function, and its size is a power of two.
 It doubles in size when it needs to, and when it doubles, it doubles all at once.
 It never shrinks. Input keys and data are pointed to, not copied.
 Keys are unique.
 Collisions are handled by chaining.

 Functions are:
 hcreate  - create a hash table
 hdestroy - destroy a hash table
 hcount   - how many items are in the hash table?
 hkey     - the key at the current position
 hkeyl    - the length of the key at the current position
 hstuff   - the other data at the current position
 hfind    - position the hash table at some key
 hadd     - add a new <key,stuff> pair to the table
 hdel     - delete the item at the current position
 hfirst   - position at the first item in the table
 hnext    - move to the next item in the table
 hstat    - print statistics about this table

 The most unusual thing about this module is that it maintains a current position.
 This means you can't have a dangling pointer into it.
 If you position on something, and then delete it, the position moves to another item.
 On the other hand, it also means it's hard to do nested loops over all the items in the table,
 since there can be only one position at a time.
 END JENKINS COMMENTS ON THE CODE


 JENKINS EMAIL RE 64-BIT COMPATIBILITY
 I don't have any.  However, Thomas Wang came up with this one:

 public long hash64shift(long key)
 {
   key = (~key) + (key << 21); // key = (key << 21) - key - 1;
   key = key ^ (key >>> 24);
   key = (key + (key << 3)) + (key << 8); // key * 265
   key = key ^ (key >>> 14);
   key = (key + (key << 2)) + (key << 4); // key * 21
   key = key ^ (key >>> 28);
   key = key + (key << 31);
   return key;
 }

 I haven't tested it, but the functions of his that I have tested aren't bad, and the operations look about like what I'd expect is needed.  My preliminary stabs at a 64-bit functions said 8 or 9 shifts are needed, he's got 10, but many are done in parallel, so it does look like a promising function.

 James DeCocq (jadecocq) wrote:
 > Hi Bob,
 >
 > Have you tried the code in 64 bits? I don't currently have the ability
 > to do so, but a glance at some of the code, specifically the renew()
 > macro, seems as if it might assume that a pointer is the same size as
 > an int. If it is the case that it makes assumptions as to 32 bits, are
 > there specific tweaks you may have made for 64 bits that you can share?
 END JENKINS EMAIL RE 64-BIT COMPATIBILITY

	This is Bob Jenkins' hash table code from burtleburtle.net/bob/hash/hashtab.html.
	The author, Bob Jenkins, has placed this code in the public domain,
	He states as such in comments in the code itself. Regarding inclusion of the
	code in an Apache project, he states:
	I do prefer that you keep a pointer back to me as the source, so I can
	answer questions if any come up. I would object if someone claimed
	they'd written it themselves. But other than that, there's no restrictions,
	you can remove or rewrite any of the comments or code,
	including the disclaimers.

	James DeCocq (jadecocq) wrote:
	> Hi Bob,
	> Last year I used your excellent hash table C code in a project which
	> is now to become an Apache open source project. What is the usual
	> attribution and licensing disclaimer procedures when folks use your
	> code in open source projects?

	CUSTOMIZED FOR ETCH
	The original code has been customized for use by etch. Changes are as follows:
	- renamed files belonging to this subproject to begin with "jenk".
	- changed name of unique.c to jenktest.c; changed this to read from file not stdin
	- added includes to .c files such that they compile individually

	BEGIN JENKINS COMMENTS ON THE CODE
	hashtab.h and hashtab.c form the hash table module.
	The files before those are the support files that I always use.
	The file after it (unique.c) is an example of how to use the hash table.
	(The program UNIQUE takes a file in STDIN and dumps the unique lines
	(duplicates removed) to STDOUT.
	It also shuffles the unique lines pseudorandomly.
	The sample input provided doesn't have any duplicate lines,
	so the output should be the same size as the input, but the lines will be shuffled.)

	The hash table has a fixed hash function, and its size is a power of two.
	It doubles in size when it needs to, and when it doubles, it doubles all at once.
	It never shrinks. Input keys and data are pointed to, not copied.
	Keys are unique.
	Collisions are handled by chaining.

	Functions are:
	hcreate - create a hash table
	hdestroy - destroy a hash table
	hcount - how many items are in the hash table?
	hkey - the key at the current position
	hkeyl - the length of the key at the current position
	hstuff - the other data at the current position
	hfind - position the hash table at some key
	hadd - add a new <key,stuff> pair to the table
	hdel - delete the item at the current position
	hfirst - position at the first item in the table
	hnext - move to the next item in the table
	hstat - print statistics about this table

	The most unusual thing about this module is that it maintains a current position.
	This means you can't have a dangling pointer into it.
	If you position on something, and then delete it, the position moves to another item.
	On the other hand, it also means it's hard to do nested loops over all the items in the table,
	since there can be only one position at a time.
	END JENKINS COMMENTS ON THE CODE


	JENKINS EMAIL RE 64-BIT COMPATIBILITY
	I don't have any. However, Thomas Wang came up with this one:

	public long hash64shift(long key)
	{
	key = (~key) + (key << 21); // key = (key << 21) - key - 1;
	key = key ^ (key >>> 24);
	key = (key + (key << 3)) + (key << 8); // key * 265
	key = key ^ (key >>> 14);
	key = (key + (key << 2)) + (key << 4); // key * 21
	key = key ^ (key >>> 28);
	key = key + (key << 31);
	return key;
	}

	I haven't tested it, but the functions of his that I have tested aren't bad, and the operations look about like what I'd expect is needed. My preliminary stabs at a 64-bit functions said 8 or 9 shifts are needed, he's got 10, but many are done in parallel, so it does look like a promising function.

	James DeCocq (jadecocq) wrote:
	> Hi Bob,
	>
	> Have you tried the code in 64 bits? I don't currently have the ability
	> to do so, but a glance at some of the code, specifically the renew()
	> macro, seems as if it might assume that a pointer is the same size as
	> an int. If it is the case that it makes assumptions as to 32 bits, are
	> there specific tweaks you may have made for 64 bits that you can share?
	END JENKINS EMAIL RE 64-BIT COMPATIBILITY