| This directory contains several hash-map implementations, similar in |
| API to SGI's hash_map class, but with different performance |
| characteristics. sparse_hash_map uses very little space overhead, 1-2 |
| bits per entry. dense_hash_map is very fast, particulary on lookup. |
| (sparse_hash_set and dense_hash_set are the set versions of these |
| routines.) On the other hand, these classes have requirements that |
| may not make them appropriate for all applications. |
| |
| All these implementation use a hashtable with internal quadratic |
| probing. This method is space-efficient -- there is no pointer |
| overhead -- and time-efficient for good hash functions. |
| |
| COMPILING |
| --------- |
| To compile test applications with these classes, run ./configure |
| followed by make. To install these header files on your system, run |
| 'make install'. (On Windows, the instructions are different; see |
| README.windows.) See INSTALL for more details. |
| |
| This code should work on any modern C++ system. It has been tested on |
| Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0, |
| OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8. |
| |
| USING |
| ----- |
| See the html files in the doc directory for small example programs |
| that use these classes. It's enough to just include the header file: |
| |
| #include <google/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ... |
| google::sparse_hash_set<int, int> number_mapper; |
| |
| and use the class the way you would other hash-map implementations. |
| (Though see "API" below for caveats.) |
| |
| By default (you can change it via a flag to ./configure), these hash |
| implementations are defined in the google namespace. |
| |
| API |
| --- |
| The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and |
| dense_hash_set, are a superset of the API of SGI's hash_map class. |
| See doc/sparse_hash_map.html, et al., for more information about the |
| API. |
| |
| The usage of these classes differ from SGI's hash_map, and other |
| hashtable implementations, in the following major ways: |
| |
| 1) dense_hash_map requires you to set aside one key value as the |
| 'empty bucket' value, set via the set_empty_key() method. This |
| *MUST* be called before you can use the dense_hash_map. It is |
| illegal to insert any elements into a dense_hash_map whose key is |
| equal to the empty-key. |
| |
| 2) For both dense_hash_map and sparse_hash_map, if you wish to delete |
| elements from the hashtable, you must set aside a key value as the |
| 'deleted bucket' value, set via the set_deleted_key() method. If |
| your hash-map is insert-only, there is no need to call this |
| method. If you call set_deleted_key(), it is illegal to insert any |
| elements into a dense_hash_map or sparse_hash_map whose key is |
| equal to the deleted-key. |
| |
| 3) These hash-map implementation support I/O. See below. |
| |
| There are also some smaller differences: |
| |
| 1) The constructor takes an optional argument that specifies the |
| number of elements you expect to insert into the hashtable. This |
| differs from SGI's hash_map implementation, which takes an optional |
| number of buckets. |
| |
| 2) erase() does not immediately reclaim memory. As a consequence, |
| erase() does not invalidate any iterators, making loops like this |
| correct: |
| for (it = ht.begin(); it != ht.end(); ++it) |
| if (...) ht.erase(it); |
| As another consequence, a series of erase() calls can leave your |
| hashtable using more memory than it needs to. The hashtable will |
| automatically compact() at the next call to insert(), but to |
| manually compact a hashtable, you can call |
| ht.resize(0) |
| |
| 3) While sparse_hash_map et al. accept an Allocator template argument, |
| they ignore it. They use malloc() and free() for all memory |
| allocations. |
| |
| 4) sparse_hash_map et al. do not use exceptions. |
| |
| I/O |
| --- |
| In addition to the normal hash-map operations, sparse_hash_map can |
| read and write hashtables to disk. (dense_hash_map also has the API, |
| but it has not yet been implemented, and writes will always fail.) |
| |
| In the simplest case, writing a hashtable is as easy as calling two |
| methods on the hashtable: |
| ht.write_metadata(fp); |
| ht.write_nopointer_data(fp); |
| |
| Reading in this data is equally simple: |
| google::sparse_hash_map<...> ht; |
| ht.read_metadata(fp); |
| ht.read_nopointer_data(fp); |
| |
| The above is sufficient if the key and value do not contain any |
| pointers: they are basic C types or agglomorations of basic C types. |
| If the key and/or value do contain pointers, you can still store the |
| hashtable by replacing write_nopointer_data() with a custom writing |
| routine. See sparse_hash_map.html et al. for more information. |
| |
| SPARSETABLE |
| ----------- |
| In addition to the hash-map and hash-set classes, this package also |
| provides sparsetable.h, an array implementation that uses space |
| proportional to the number of elements in the array, rather than the |
| maximum element index. It uses very little space overhead: 1 bit per |
| entry. See doc/sparsetable.html for the API. |
| |
| RESOURCE USAGE |
| -------------- |
| * sparse_hash_map has memory overhead of about 2 bits per hash-map |
| entry. |
| * dense_hash_map has a factor of 2-3 memory overhead: if your |
| hashtable data takes X bytes, dense_hash_map will use 3X-4X memory |
| total. |
| |
| Hashtables tend to double in size when resizing, creating an |
| additional 50% space overhead. dense_hash_map does in fact have a |
| significant "high water mark" memory use requirement. |
| sparse_hash_map, however, is written to need very little space |
| overhead when resizing: only a few bits per hashtable entry. |
| |
| PERFORMANCE |
| ----------- |
| You can compile and run the included file time_hash_map.cc to examine |
| the performance of sparse_hash_map, dense_hash_map, and your native |
| hash_map implementation on your system. One test against the |
| SGI hash_map implementation gave the following timing information for |
| a simple find() call: |
| SGI hash_map: 22 ns |
| dense_hash_map: 13 ns |
| sparse_hash_map: 117 ns |
| SGI map: 113 ns |
| |
| See doc/performance.html for more detailed charts on resource usage |
| and performance data. |
| |
| --- |
| 16 March 2005 |
| (Last updated: 20 March 2007) |