binding-c/runtime/c/project/notes/hashmods.txt - etch


 1. we probably do not need the hashkey uint at the head of object,
    a. it is there in order to tell jenkins that an object is length 4 and to hash those 4 bytes.
    b. if we can supply the hash code to jenkins, rather than it doing the hash based on address and length,
       then we can lose this item.

 2. however we will need to replace the hashkey uint with a hashkey *function*.
    a. we want to be able to ask an object to give us its hashkey.
       1. or alternatively it could give us its hashable address and length.
       2. however it is probably cleaner to simply return a 4 byte hashkey.
    b. this is the way c# objects work: each object supplies its own hash code (GetHashCode). there is a
       default, however types to be used as keys are supposed to override it.
    c. we could add methods to jenkins to accept a hash function callback pointer.
       1. if our objects supply address and length, jenkins can keep doing the hash itself and its
          hashing does not need to change.
       2. if our objects instead supply the hashkey value, we would need to add functions to jenkins to accept
          a precalculated hashcode rather than do the hash calculation itself.
    d. each object would need to have a unique vector of at least 4 bytes to hash.
       0. jenkins will copy the key to its own memory
       1. objects not likely to be hashed could simply supply their memory address.
       2. objects with names obviously use their name.
       3. objects with unique numeric keys use either the stringed key or the numeric key.
       4. we could establish an object id if we need it.
       5. for an object to do hash lookup rather than sequential search, the most common lookup key
          should be that used for hash.
              a. therefore hashing on object id is not very useful.
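
 a minimal sketch of item 2, assuming a layout in which the function pointer sits where the hashkey uint
 is today. every name here (demo_object, get_hashkey_fn, demo_hash_bytes, hashkey_by_*) is hypothetical
 and not part of etch or jenkins, and the hash is a 32-bit FNV-1a stand-in, not the jenkins hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical names throughout; none of these are existing etch or jenkins APIs */
typedef struct demo_object demo_object;
typedef uint32_t (*get_hashkey_fn)(const demo_object* obj);

struct demo_object
{
    get_hashkey_fn get_hashkey;  /* assumed replacement for the hashkey uint */
    const char*    name;         /* NULL if the object is unnamed (2.d.2) */
    uint32_t       id;           /* unique numeric key, if any (2.d.3) */
};

/* stand-in hash (32-bit FNV-1a); the real code would use the jenkins hash */
static uint32_t demo_hash_bytes(const void* data, size_t len)
{
    const unsigned char* p = data;
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* 2.d.2: objects with names hash their name */
static uint32_t hashkey_by_name(const demo_object* obj)
{
    assert(obj->name != NULL);
    return demo_hash_bytes(obj->name, strlen(obj->name));
}

/* 2.d.3: objects with unique numeric keys hash the numeric key */
static uint32_t hashkey_by_id(const demo_object* obj)
{
    return demo_hash_bytes(&obj->id, sizeof obj->id);
}

/* 2.d.1: objects not likely to be looked up hash their own address */
static uint32_t hashkey_by_address(const demo_object* obj)
{
    return demo_hash_bytes(&obj, sizeof obj);
}
```

 two objects sharing a name return the same hashkey from hashkey_by_name, which is what lets a lookup
 by name find the stored object. hashkey_by_address only ever matches the identical object.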

 3. if an object hashkey function returns its key address and length:
    a. we make new versions of all jenkins functions which currently accept key and length.
    b. these new functions accept a callback instead of key and length.
    c. the new functions continue to compute the hashkey from the returned key and length as usual.
    d. we should optionally return a comparator function with the hashkey return.
    e. how will we return more than one value (key, length, comparator)?
       1. jenkins passes pointer to struct hashkeyinfo, which is info it wants back:
          a. char* key;
          b. uint  keylen;
          c. etch_comparator* compare;
          d. uint optional_hashkey; /* if non-zero, jenkins uses precalc key */
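    f. so the callback looks like this
       int (*get_hashkey) (hashkeyinfo* info, const int option);
       and jenkins calls it with the address of its own struct, e.g. get_hashkey(&my_hashkeyinfo, option);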
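
 a sketch of item 3 under stated assumptions: the struct fields follow 3.e, but the callback takes an
 explicit object pointer as an extra first argument (an assumption, since c has no implicit "this"),
 demo_comparator stands in for etch_comparator, and demo_hash_bytes is a FNV-1a stand-in for the
 jenkins hash. demo_named, demo_precalc and demo_resolve_hash are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* stand-in for etch_comparator (assumed shape: strcmp-style result) */
typedef int demo_comparator(const void* a, const void* b);

typedef struct hashkeyinfo
{
    char*            key;
    unsigned int     keylen;
    demo_comparator* compare;
    unsigned int     optional_hashkey;  /* if non-zero, map uses precalc key */
} hashkeyinfo;

/* assumption: explicit obj argument added to the callback from 3.f */
typedef int (*get_hashkey_fn)(void* obj, hashkeyinfo* info, const int option);

/* stand-in hash (32-bit FNV-1a), not the jenkins hash */
static unsigned int demo_hash_bytes(const void* data, size_t len)
{
    const unsigned char* p = data;
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

typedef struct demo_named   { char name[32]; }         demo_named;
typedef struct demo_precalc { unsigned int hashcode; } demo_precalc;

static int demo_compare_names(const void* a, const void* b)
{
    return strcmp(a, b);
}

/* object which returns key address, length and comparator */
static int named_get_hashkey(void* obj, hashkeyinfo* info, const int option)
{
    demo_named* self = obj;
    (void) option;
    info->key     = self->name;
    info->keylen  = (unsigned int) strlen(self->name);
    info->compare = demo_compare_names;
    return 0;
}

/* object which returns a precalculated hashkey instead */
static int precalc_get_hashkey(void* obj, hashkeyinfo* info, const int option)
{
    demo_precalc* self = obj;
    (void) option;
    info->optional_hashkey = self->hashcode;
    return 0;
}

/* map side: ask the object, then either trust the precalc key or hash as usual */
static int demo_resolve_hash(void* obj, get_hashkey_fn get_hashkey, unsigned int* out)
{
    hashkeyinfo info = {0};
    assert(get_hashkey != NULL && out != NULL);
    if (get_hashkey(obj, &info, 0) != 0) return -1;
    if (info.optional_hashkey != 0)      *out = info.optional_hashkey;
    else if (info.key != NULL)           *out = demo_hash_bytes(info.key, info.keylen);
    else return -1;
    return 0;
}
```

 one consequence of 3.e.1.d as written: a precalculated hashkey of zero is indistinguishable from
 "not supplied", so an object whose hash code can be zero would need to supply key and length.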


 4. plan for objects with no current search key
    a. the objects are responsible for themselves. objects with no natural key (anonymous Object keys)
       need to plan for one.

 5. we should find_by the actual object, rather than by the actual key.
    a. for example, type lookup, keyed by field.
    b.
       result = jenkins_find(mytype->get_hashkey, &optional_out);
       or ...
       result = etchmap_find(mytype, &optional_out);


 6. however when we *put* an object, jenkins expects the address we give it to be both the key object,
    *and* the hash start.
    a. we could add a function to jenkins to accept different addresses for object and hashkey.
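
 a sketch of items 5 and 6 together, assuming a tiny open-addressing table in place of the jenkins
 hashtable. all names (demo_map, demo_put, demo_find, demo_type and its wrappers) are hypothetical.
 unlike jenkins per 2.d.0, this sketch stores the key pointer rather than copying the key, and it
 omits duplicate-key detection and removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 16

typedef struct demo_entry
{
    const void* object;  /* the key object handed back to the caller */
    const void* key;     /* start of hashable bytes, may differ from object */
    size_t      keylen;
    int         used;
} demo_entry;

typedef struct demo_map { demo_entry slots[DEMO_SLOTS]; } demo_map;

/* stand-in hash (32-bit FNV-1a), not the jenkins hash */
static uint32_t demo_hash_bytes(const void* data, size_t len)
{
    const unsigned char* p = data;
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* 6.a: put accepting different addresses for object and hashkey */
static int demo_put(demo_map* map, const void* object, const void* key, size_t keylen)
{
    size_t start, i;
    assert(map != NULL && key != NULL);
    start = demo_hash_bytes(key, keylen) % DEMO_SLOTS;
    for (i = 0; i < DEMO_SLOTS; i++)
    {
        demo_entry* e = &map->slots[(start + i) % DEMO_SLOTS];
        if (!e->used)
        {
            e->object = object; e->key = key; e->keylen = keylen; e->used = 1;
            return 0;
        }
    }
    return -1;  /* table full */
}

/* linear probe from the hashed slot, stopping at the first unused slot */
static const void* demo_find(const demo_map* map, const void* key, size_t keylen)
{
    size_t start = demo_hash_bytes(key, keylen) % DEMO_SLOTS;
    size_t i;
    for (i = 0; i < DEMO_SLOTS; i++)
    {
        const demo_entry* e = &map->slots[(start + i) % DEMO_SLOTS];
        if (!e->used) return NULL;
        if (e->keylen == keylen && memcmp(e->key, key, keylen) == 0)
            return e->object;
    }
    return NULL;
}

/* 5.a: a type keyed by field name; the hashable bytes are not at offset 0 */
typedef struct demo_type { int payload; char field_name[16]; } demo_type;

static int demo_put_type(demo_map* map, const demo_type* t)
{
    return demo_put(map, t, t->field_name, strlen(t->field_name));
}

/* 5: find by the actual object; the wrapper extracts the key from the probe */
static const demo_type* demo_find_type(const demo_map* map, const demo_type* probe)
{
    return demo_find(map, probe->field_name, strlen(probe->field_name));
}
```

 the caller passes only objects to demo_put_type and demo_find_type; the knowledge of where the key
 lives stays with the type, which is the point of item 5.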

 7. the way jenkins works it expects the key object and the hashable info to exist at the same
    address. and the way etch works is that this address must point at the key object.
    a. possibly we can change jenkins to assume that the supplied address points at a hashinfo callback.


 X. test plan
    a. don't change object layout yet, but *assume* hashkey header is the callback
    b. write test program which defines new custom object types with varying keys
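
 a possible starting point for the test program, assuming the header at the head of each object is
 a callback pointer as in X.a. the two object types, the callback shape (returning key address and
 length) and demo_hash_object are all hypothetical, and the hash is a FNV-1a stand-in for jenkins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct demo_keyinfo { const void* key; size_t keylen; } demo_keyinfo;
typedef int (*demo_hashinfo_fn)(const void* obj, demo_keyinfo* out);

/* assumed header: every hashable object starts with the callback pointer */
typedef struct demo_header { demo_hashinfo_fn get_hashinfo; } demo_header;

/* custom object types with varying keys */
typedef struct demo_field   { demo_header hdr; char name[16];   } demo_field;
typedef struct demo_counter { demo_header hdr; uint32_t number; } demo_counter;

static int field_hashinfo(const void* obj, demo_keyinfo* out)
{
    const demo_field* f = obj;
    out->key = f->name;
    out->keylen = strlen(f->name);
    return 0;
}

static int counter_hashinfo(const void* obj, demo_keyinfo* out)
{
    const demo_counter* c = obj;
    out->key = &c->number;
    out->keylen = sizeof c->number;
    return 0;
}

/* stand-in hash (32-bit FNV-1a), not the jenkins hash */
static uint32_t demo_hash_bytes(const void* data, size_t len)
{
    const unsigned char* p = data;
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* map side: treat the supplied address as pointing at the callback header */
static int demo_hash_object(const void* obj, uint32_t* out)
{
    const demo_header* hdr = obj;  /* valid because hdr is the first member */
    demo_keyinfo info = {0};
    assert(obj != NULL && out != NULL);
    if (hdr->get_hashinfo == NULL || hdr->get_hashinfo(obj, &info) != 0) return -1;
    *out = demo_hash_bytes(info.key, info.keylen);
    return 0;
}
```

 demo_hash_object never needs to know the concrete type: it is handed the key object address, as etch
 requires, and reaches the hashable bytes through the header.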
