c3170a9fc6b5c30e85b94be615d8064ed7d26cd6 - kudu

commit	c3170a9fc6b5c30e85b94be615d8064ed7d26cd6
author	Todd Lipcon <todd@apache.org>	Fri Jan 17 09:04:18 2020 -0800
committer	Todd Lipcon <todd@apache.org>	Tue Jan 21 23:36:59 2020 +0000
tree	28ca59d12d30e934fe12493ac1b6a16c8d625b07
parent	638d95d295da0bddf3fa157792ec7de8889ac919

schema: use dense_hash_map instead of std::unordered_map

In a time series benchmark I'm working on, the client spent 12% of its
CPU in Schema::FindColumn. In particular, most of the CPU went to the
bucket calculation in std::unordered_map, which required a 'divq'
instruction that can take hundreds of cycles.

This switches Schema to use a dense_hash_map instead, which performs
better. After this change, the percentage of CPU used by my benchmark
worker thread in Schema::FindColumn dropped from ~12% to ~1.5%, which
resulted in a few percent overall throughput increase.
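
As a rough illustration of why the bucket calculation gets cheaper, here is a
minimal sketch of an open-addressing table whose bucket count is always a
power of two, so the bucket index is a bitwise AND rather than a modulo by a
non-power-of-two count (which typically compiles to an integer division).
ColumnIndexMap and its members are hypothetical names for this example. It is
not Kudu's Schema code or the sparsehash implementation, and it uses linear
probing with a 'used' flag for brevity, whereas dense_hash_map uses its own
probing scheme and a reserved empty key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical name -> column index map, for illustration only.
class ColumnIndexMap {
 public:
  enum { kNotFound = -1 };

  ColumnIndexMap() : slots_(8) {
    // The masking trick below is only valid for power-of-two table sizes.
    assert((slots_.size() & (slots_.size() - 1)) == 0);
  }

  void Insert(const std::string& name, int idx) {
    // Keep the load factor at or below 50% so probing always terminates.
    if ((size_ + 1) * 2 > slots_.size()) Grow();
    InsertNoGrow(name, idx);
  }

  int Find(const std::string& name) const {
    const size_t mask = slots_.size() - 1;
    // Bucket index via AND: no division instruction on the lookup path.
    size_t i = std::hash<std::string>()(name) & mask;
    while (slots_[i].used) {
      if (slots_[i].name == name) return slots_[i].idx;
      i = (i + 1) & mask;  // linear probe, wrapping with the same mask
    }
    return kNotFound;
  }

  size_t size() const { return size_; }
  size_t bucket_count() const { return slots_.size(); }

 private:
  struct Slot {
    std::string name;
    int idx = kNotFound;
    bool used = false;
  };

  void InsertNoGrow(const std::string& name, int idx) {
    const size_t mask = slots_.size() - 1;
    size_t i = std::hash<std::string>()(name) & mask;
    while (slots_[i].used && slots_[i].name != name) i = (i + 1) & mask;
    if (!slots_[i].used) {
      slots_[i].used = true;
      slots_[i].name = name;
      ++size_;
    }
    slots_[i].idx = idx;
  }

  void Grow() {
    std::vector<Slot> old;
    old.swap(slots_);
    slots_.resize(old.size() * 2);  // doubling keeps the size a power of two
    size_ = 0;
    for (const Slot& s : old) {
      if (s.used) InsertNoGrow(s.name, s.idx);
    }
  }

  std::vector<Slot> slots_;
  size_t size_ = 0;
};
```

The trade-off in this sketch is memory for speed: the table is kept at most
half full, in exchange for a lookup that touches one contiguous array and
needs no division.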

This also made the fancy allocator which tried to count memory usage
unnecessary, since dense_hash_map is a simple enough data structure that
we can directly compute the memory usage. Now we can also simplify the
constructors since we no longer need to pass an allocator instance.
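
To illustrate the "directly compute" point: when a table is a single
contiguous array of fixed-size slots, its footprint is roughly the bucket
count times the slot size, so no counting allocator is needed. The sketch
below uses hypothetical types (FlatSlot, FlatTable, FlatTableFootprint) and
is not the code from this change. It also assumes the slots own no heap
memory themselves.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size slot; assumed to own no heap memory.
struct FlatSlot {
  uint64_t key_hash;
  int32_t column_idx;
};

// Hypothetical flat table backed by one contiguous array.
struct FlatTable {
  using value_type = FlatSlot;
  std::vector<FlatSlot> slots;
  size_t bucket_count() const { return slots.size(); }
};

// Footprint estimate for any table exposing bucket_count() and value_type:
// the object itself plus one slot per bucket. Ignores allocator overhead.
template <typename Table>
size_t FlatTableFootprint(const Table& t) {
  return sizeof(Table) + t.bucket_count() * sizeof(typename Table::value_type);
}
```

A node-based map cannot be sized this way, because each element lives in its
own heap allocation, which is why an instrumented allocator is one way to
track its usage.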

Change-Id: I8e8f80229b2dcfad05e204a6f6e50ce7dc3f4c73
Reviewed-on: http://gerrit.cloudera.org:8080/15064
Reviewed-by: Adar Dembo <adar@cloudera.com>
Tested-by: Kudu Jenkins

6 files changed
