commit	d78ca2f5bcbef57a644591ad7805bc5e28d3ab91	[log] [tgz]
author	Knut Nesheim <knutin@gmail.com>	Fri Jan 10 15:53:25 2014 +0100
committer	Knut Nesheim <knutin@gmail.com>	Fri Jan 10 15:53:25 2014 +0100
tree	d210536a0feb3ffcef89fc14ec677b2b92490a82
parent	23dd5b13cbe802fa056299b0a5ed7cf1b6ee108a [diff]

tree: d210536a0feb3ffcef89fc14ec677b2b92490a82

bin/
doc/
src/
test/
.gitignore
Makefile
README.md
rebar
rebar.config

README.md

HyperLogLog for Erlang

hyper is an implementation of the HyperLogLog algorithm with bias correction from HLL++ as (described by Google)[http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/40671.pdf].

Using hyper you can estimate the cardinality of large data sets using constant memory. The union of two HLL filters is lossless, which allows perfect parallellization.

Usage

1> hyper:insert(<<"foobar">>, hyper:insert(<<"quux">>, hyper:new(4))).
{hyper,4,{hyper_gb,{2,{10,1,{8,1,nil,nil},nil}}}}
2> hyper:card(v(-1)).
2.136502281992361

The error from estimations can be seen in this example:

10> Run = fun (P, Card) -> hyper:card(lists:foldl(fun (_, H) -> hyper:insert(crypto:rand_bytes(32), H) end, hyper:new(P), lists:seq(1, Card))) end.
#Fun<erl_eval.12.80484245>
11> Run(12, 10000).
9984.777312630407
12> Run(14, 10000).
9994.719695068114
13> Run(16, 10000).
9991.34646427657

HyperLogLog for Erlang

Usage

Report