initial commit

commit: eb22734b85f857900bffd3a856626629c0f15019 [log] [tgz]
author: joewilliams <williams.joe@gmail.com> Fri Mar 30 15:22:30 2012 -0700
committer: joewilliams <williams.joe@gmail.com> Fri Mar 30 15:22:30 2012 -0700
tree: e77fd6fd582e5d919fb2d7b2f17ecfe529a19b76
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..11069ed
--- /dev/null
+++ b/LICENSE

@@ -0,0 +1,201 @@
+                              Apache License
+                        Version 2.0, January 2004
+                     http://www.apache.org/licenses/
+
+TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+1. Definitions.
+
+   "License" shall mean the terms and conditions for use, reproduction,
+   and distribution as defined by Sections 1 through 9 of this document.
+
+   "Licensor" shall mean the copyright owner or entity authorized by
+   the copyright owner that is granting the License.
+
+   "Legal Entity" shall mean the union of the acting entity and all
+   other entities that control, are controlled by, or are under common
+   control with that entity. For the purposes of this definition,
+   "control" means (i) the power, direct or indirect, to cause the
+   direction or management of such entity, whether by contract or
+   otherwise, or (ii) ownership of fifty percent (50%) or more of the
+   outstanding shares, or (iii) beneficial ownership of such entity.
+
+   "You" (or "Your") shall mean an individual or Legal Entity
+   exercising permissions granted by this License.
+
+   "Source" form shall mean the preferred form for making modifications,
+   including but not limited to software source code, documentation
+   source, and configuration files.
+
+   "Object" form shall mean any form resulting from mechanical
+   transformation or translation of a Source form, including but
+   not limited to compiled object code, generated documentation,
+   and conversions to other media types.
+
+   "Work" shall mean the work of authorship, whether in Source or
+   Object form, made available under the License, as indicated by a
+   copyright notice that is included in or attached to the work
+   (an example is provided in the Appendix below).
+
+   "Derivative Works" shall mean any work, whether in Source or Object
+   form, that is based on (or derived from) the Work and for which the
+   editorial revisions, annotations, elaborations, or other modifications
+   represent, as a whole, an original work of authorship. For the purposes
+   of this License, Derivative Works shall not include works that remain
+   separable from, or merely link (or bind by name) to the interfaces of,
+   the Work and Derivative Works thereof.
+
+   "Contribution" shall mean any work of authorship, including
+   the original version of the Work and any modifications or additions
+   to that Work or Derivative Works thereof, that is intentionally
+   submitted to Licensor for inclusion in the Work by the copyright owner
+   or by an individual or Legal Entity authorized to submit on behalf of
+   the copyright owner. For the purposes of this definition, "submitted"
+   means any form of electronic, verbal, or written communication sent
+   to the Licensor or its representatives, including but not limited to
+   communication on electronic mailing lists, source code control systems,
+   and issue tracking systems that are managed by, or on behalf of, the
+   Licensor for the purpose of discussing and improving the Work, but
+   excluding communication that is conspicuously marked or otherwise
+   designated in writing by the copyright owner as "Not a Contribution."
+
+   "Contributor" shall mean Licensor and any individual or Legal Entity
+   on behalf of whom a Contribution has been received by Licensor and
+   subsequently incorporated within the Work.
+
+2. Grant of Copyright License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   copyright license to reproduce, prepare Derivative Works of,
+   publicly display, publicly perform, sublicense, and distribute the
+   Work and such Derivative Works in Source or Object form.
+
+3. Grant of Patent License. Subject to the terms and conditions of
+   this License, each Contributor hereby grants to You a perpetual,
+   worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+   (except as stated in this section) patent license to make, have made,
+   use, offer to sell, sell, import, and otherwise transfer the Work,
+   where such license applies only to those patent claims licensable
+   by such Contributor that are necessarily infringed by their
+   Contribution(s) alone or by combination of their Contribution(s)
+   with the Work to which such Contribution(s) was submitted. If You
+   institute patent litigation against any entity (including a
+   cross-claim or counterclaim in a lawsuit) alleging that the Work
+   or a Contribution incorporated within the Work constitutes direct
+   or contributory patent infringement, then any patent licenses
+   granted to You under this License for that Work shall terminate
+   as of the date such litigation is filed.
+
+4. Redistribution. You may reproduce and distribute copies of the
+   Work or Derivative Works thereof in any medium, with or without
+   modifications, and in Source or Object form, provided that You
+   meet the following conditions:
+
+   (a) You must give any other recipients of the Work or
+       Derivative Works a copy of this License; and
+
+   (b) You must cause any modified files to carry prominent notices
+       stating that You changed the files; and
+
+   (c) You must retain, in the Source form of any Derivative Works
+       that You distribute, all copyright, patent, trademark, and
+       attribution notices from the Source form of the Work,
+       excluding those notices that do not pertain to any part of
+       the Derivative Works; and
+
+   (d) If the Work includes a "NOTICE" text file as part of its
+       distribution, then any Derivative Works that You distribute must
+       include a readable copy of the attribution notices contained
+       within such NOTICE file, excluding those notices that do not
+       pertain to any part of the Derivative Works, in at least one
+       of the following places: within a NOTICE text file distributed
+       as part of the Derivative Works; within the Source form or
+       documentation, if provided along with the Derivative Works; or,
+       within a display generated by the Derivative Works, if and
+       wherever such third-party notices normally appear. The contents
+       of the NOTICE file are for informational purposes only and
+       do not modify the License. You may add Your own attribution
+       notices within Derivative Works that You distribute, alongside
+       or as an addendum to the NOTICE text from the Work, provided
+       that such additional attribution notices cannot be construed
+       as modifying the License.
+
+   You may add Your own copyright statement to Your modifications and
+   may provide additional or different license terms and conditions
+   for use, reproduction, or distribution of Your modifications, or
+   for any such Derivative Works as a whole, provided Your use,
+   reproduction, and distribution of the Work otherwise complies with
+   the conditions stated in this License.
+
+5. Submission of Contributions. Unless You explicitly state otherwise,
+   any Contribution intentionally submitted for inclusion in the Work
+   by You to the Licensor shall be under the terms and conditions of
+   this License, without any additional terms or conditions.
+   Notwithstanding the above, nothing herein shall supersede or modify
+   the terms of any separate license agreement you may have executed
+   with Licensor regarding such Contributions.
+
+6. Trademarks. This License does not grant permission to use the trade
+   names, trademarks, service marks, or product names of the Licensor,
+   except as required for reasonable and customary use in describing the
+   origin of the Work and reproducing the content of the NOTICE file.
+
+7. Disclaimer of Warranty. Unless required by applicable law or
+   agreed to in writing, Licensor provides the Work (and each
+   Contributor provides its Contributions) on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+   implied, including, without limitation, any warranties or conditions
+   of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+   PARTICULAR PURPOSE. You are solely responsible for determining the
+   appropriateness of using or redistributing the Work and assume any
+   risks associated with Your exercise of permissions under this License.
+
+8. Limitation of Liability. In no event and under no legal theory,
+   whether in tort (including negligence), contract, or otherwise,
+   unless required by applicable law (such as deliberate and grossly
+   negligent acts) or agreed to in writing, shall any Contributor be
+   liable to You for damages, including any direct, indirect, special,
+   incidental, or consequential damages of any character arising as a
+   result of this License or out of the use or inability to use the
+   Work (including but not limited to damages for loss of goodwill,
+   work stoppage, computer failure or malfunction, or any and all
+   other commercial damages or losses), even if such Contributor
+   has been advised of the possibility of such damages.
+
+9. Accepting Warranty or Additional Liability. While redistributing
+   the Work or Derivative Works thereof, You may choose to offer,
+   and charge a fee for, acceptance of support, warranty, indemnity,
+   or other liability obligations and/or rights consistent with this
+   License. However, in accepting such obligations, You may act only
+   on Your own behalf and on Your sole responsibility, not on behalf
+   of any other Contributor, and only if You agree to indemnify,
+   defend, and hold each Contributor harmless for any liability
+   incurred by, or claims asserted against, such Contributor by reason
+   of your accepting any such warranty or additional liability.
+
+END OF TERMS AND CONDITIONS
+
+APPENDIX: How to apply the Apache License to your work.
+
+   To apply the Apache License to your work, attach the following
+   boilerplate notice, with the fields enclosed by brackets "[]"
+   replaced with your own identifying information. (Don't include
+   the brackets!)  The text should be enclosed in the appropriate
+   comment syntax for the file format. We also recommend that a
+   file or class name and description of purpose be included on the
+   same "printed page" as the copyright notice for easier
+   identification within third-party archives.
+
+Copyright [yyyy] [name of copyright owner]
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5b941da
--- /dev/null
+++ b/README.md

@@ -0,0 +1 @@
+### bear : a set of statistics functions for erlang

diff --git a/rebar b/rebar
new file mode 100755
index 0000000..77abae6
--- /dev/null
+++ b/rebar
Binary files differ

diff --git a/rebar.config b/rebar.config
new file mode 100644
index 0000000..3ee1ec7
--- /dev/null
+++ b/rebar.config

@@ -0,0 +1,3 @@
+{deps, []}.
+{erl_opts, [debug_info]}.
+{cover_enabled, true}.

diff --git a/src/bear.erl b/src/bear.erl
new file mode 100644
index 0000000..97bac0f
--- /dev/null
+++ b/src/bear.erl

@@ -0,0 +1,372 @@
+%%%
+%%% Copyright 2011, Boundary
+%%%
+%%% Licensed under the Apache License, Version 2.0 (the "License");
+%%% you may not use this file except in compliance with the License.
+%%% You may obtain a copy of the License at
+%%%
+%%%     http://www.apache.org/licenses/LICENSE-2.0
+%%%
+%%% Unless required by applicable law or agreed to in writing, software
+%%% distributed under the License is distributed on an "AS IS" BASIS,
+%%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+%%% See the License for the specific language governing permissions and
+%%% limitations under the License.
+%%%
+
+
+%%%-------------------------------------------------------------------
+%%% File:      bear.erl
+%%% @author    joe williams <j@boundary.com>
+%%% @doc
+%%% statistics functions for calucating based on id and a list of values
+%%% @end
+%%%------------------------------------------------------------------
+
+-module(bear).
+
+-compile([export_all]).
+
+-export([
+         get_statistics/1,
+         get_statistics/2
+        ]).
+
+-define(HIST_BINS, 10).
+
+-define(STATS_MIN, 5).
+
+-record(scan_result, {n=0, sumX=0, sumXX=0, sumInv=0, sumLog, max, min}).
+-record(scan_result2, {x2=0, x3=0, x4=0}).
+
+-compile([native]).
+
+get_statistics(Values) when length(Values) < ?STATS_MIN ->
+    [
+     {min, 0.0},
+     {max, 0.0},
+     {arithmetic_mean, 0.0},
+     {geometric_mean, 0.0},
+     {harmonic_mean, 0.0},
+     {median, 0.0},
+     {variance, 0.0},
+     {standard_deviation, 0.0},
+     {skewness, 0.0},
+     {kurtosis, 0.0},
+     {percentile,
+      [
+       {75, 0.0},
+       {95, 0.0},
+       {99, 0.0},
+       {999, 0.0}
+      ]
+     },
+     {histogram, [{0, 0}]}
+     ];
+get_statistics(Values) ->
+    Scan_res = scan_values(Values),
+    Scan_res2 = scan_values2(Values, Scan_res),
+    Variance = variance(Scan_res, Scan_res2),
+    SortedValues = lists:sort(Values),
+    [
+     {min, Scan_res#scan_result.min},
+     {max, Scan_res#scan_result.max},
+     {arithmetic_mean, arithmetic_mean(Scan_res)},
+     {geometric_mean, geometric_mean(Scan_res)},
+     {harmonic_mean, harmonic_mean(Scan_res)},
+     {median, percentile(SortedValues, Scan_res, 0.5)},
+     {variance, Variance},
+     {standard_deviation, std_deviation(Scan_res, Scan_res2)},
+     {skewness, skewness(Scan_res, Scan_res2)},
+     {kurtosis, kurtosis(Scan_res, Scan_res2)},
+     {percentile,
+      [
+       {75, percentile(SortedValues, Scan_res, 0.75)},
+       {95, percentile(SortedValues, Scan_res, 0.95)},
+       {99, percentile(SortedValues, Scan_res, 0.99)},
+       {999, percentile(SortedValues, Scan_res, 0.999)}
+      ]
+     },
+     {histogram, get_histogram(Values, Scan_res, Scan_res2)}
+     ].
+
+get_statistics(Values, _) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_statistics(_, Values) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_statistics(Values1, Values2) when length(Values1) /= length(Values2) ->
+    0.0;
+get_statistics(Values1, Values2) ->
+    [
+     {covariance, get_covariance(Values1, Values2)},
+     {tau, get_kendall_correlation(Values1, Values2)},
+     {rho, get_pearson_correlation(Values1, Values2)},
+     {r, get_spearman_correlation(Values1, Values2)}
+    ].
+
+%%%===================================================================
+%%% Internal functions
+%%%===================================================================
+
+scan_values([X|Values]) ->
+    scan_values(Values, #scan_result{n=1, sumX=X, sumXX=X*X,
+             sumLog=math_log(X),
+             max=X, min=X, sumInv=inverse(X)}).
+
+scan_values([X|Values],
+      #scan_result{n=N, sumX=SumX, sumXX=SumXX, sumLog=SumLog,
+                   max=Max, min=Min, sumInv=SumInv}=Acc) ->
+    scan_values(Values,
+                Acc#scan_result{n=N+1, sumX=SumX+X, sumXX=SumXX+X*X,
+                                sumLog=SumLog+math_log(X),
+                                max=max(X,Max), min=min(X,Min),
+                                sumInv=SumInv+inverse(X)});
+scan_values([], Acc) ->
+    Acc.
+
+scan_values2(Values, #scan_result{n=N, sumX=SumX}) ->
+    scan_values2(Values, SumX/N, #scan_result2{}).
+
+scan_values2([X|Values], Mean, #scan_result2{x2=X2, x3=X3, x4=X4}=Acc) ->
+    Diff = X-Mean,
+    Diff2 = Diff*Diff,
+    Diff3 = Diff2*Diff,
+    Diff4 = Diff2*Diff2,
+    scan_values2(Values, Mean, Acc#scan_result2{x2=X2+Diff2, x3=X3+Diff3,
+            x4=X4+Diff4});
+scan_values2([], _, Acc) ->
+    Acc.
+
+
+arithmetic_mean(#scan_result{n=N, sumX=Sum}) ->
+    Sum/N.
+
+geometric_mean(#scan_result{n=N, sumLog=SumLog}) ->
+    math:exp(SumLog/N).
+
+harmonic_mean(#scan_result{n=N, sumInv=Sum}) ->
+    N/Sum.
+
+percentile(SortedValues, #scan_result{n=N}, Percentile)
+  when is_list(SortedValues) ->
+    Element = round(Percentile * N),
+    lists:nth(Element, SortedValues).
+
+%% Two pass variance
+%% Results match those given by the 'var' function in R
+variance(#scan_result{n=N}, #scan_result2{x2=X2}) ->
+    X2/(N-1).
+
+std_deviation(Scan_res, Scan_res2) ->
+    math:sqrt(variance(Scan_res, Scan_res2)).
+
+%% http://en.wikipedia.org/wiki/Skewness
+%%
+%% skewness results should match this R function:
+%% skewness <- function(x) {
+%%    m3 <- mean((x - mean(x))^3)
+%%    skew <- m3 / (sd(x)^3)
+%%    skew
+%% }
+skewness(#scan_result{n=N}=Scan_res, #scan_result2{x3=X3}=Scan_res2) ->
+    case math:pow(std_deviation(Scan_res,Scan_res2), 3) of
+        0.0 ->
+            0.0;  %% Is this really the correct thing to do here?
+        Else ->
+            (X3/N)/Else
+    end.
+
+%% http://en.wikipedia.org/wiki/Kurtosis
+%%
+%% results should match this R function:
+%% kurtosis <- function(x) {
+%%     m4 <- mean((x - mean(x))^4)
+%%     kurt <- m4 / (sd(x)^4) - 3
+%%     kurt
+%% }
+kurtosis(#scan_result{n=N}=Scan_res, #scan_result2{x4=X4}=Scan_res2) ->
+    case math:pow(std_deviation(Scan_res,Scan_res2), 4) of
+        0.0 ->
+            0.0;  %% Is this really the correct thing to do here?
+        Else ->
+            ((X4/N)/Else) - 3
+    end.
+
+get_histogram(Values, Scan_res, Scan_res2) ->
+    Bins = get_hist_bins(Scan_res#scan_result.min,
+                         Scan_res#scan_result.max,
+                         std_deviation(Scan_res, Scan_res2),
+                         length(Values)
+                        ),
+
+    Dict = lists:foldl(fun (Value, Dict) ->
+             update_bin(Value, Bins, Dict)
+           end,
+           dict:from_list([{Bin, 0} || Bin <- Bins]),
+           Values),
+
+    lists:sort(dict:to_list(Dict)).
+
+update_bin(Value, [Bin|_Bins], Dict) when Value =< Bin ->
+    dict:update_counter(Bin, 1, Dict);
+update_bin(Values, [_Bin|Bins], Dict) ->
+    update_bin(Values, Bins, Dict).
+
+%% two pass covariance
+%% (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance)
+%% matches results given by excel's 'covar' function
+get_covariance(Values, _) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_covariance(_, Values) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_covariance(Values1, Values2) when length(Values1) /= length(Values2) ->
+    0.0;
+get_covariance(Values1, Values2) ->
+    {SumX, SumY, N} = foldl2(fun (X, Y, {SumX, SumY, N}) ->
+             {SumX+X, SumY+Y, N+1}
+           end, {0,0,0}, Values1, Values2),
+    MeanX = SumX/N,
+    MeanY = SumY/N,
+    Sum = foldl2(fun (X, Y, Sum) ->
+       Sum + ((X - MeanX) * (Y - MeanY))
+     end,
+     0, Values1, Values2),
+    Sum/N.
+
+get_kendall_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_kendall_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_kendall_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+    0.0;
+get_kendall_correlation(Values1, Values2) ->
+    bear_scutil:kendall_correlation(Values1, Values2).
+
+get_spearman_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_spearman_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_spearman_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+    0.0;
+get_spearman_correlation(Values1, Values2) ->
+    TR1 = ranks_of(Values1),
+    TR2 = ranks_of(Values2),
+    Numerator   = 6 * foldl2(fun (X, Y, Acc) ->
+             Diff = X-Y,
+             Acc + Diff*Diff
+           end, 0, TR1,TR2),
+    N = length(Values1),
+    Denominator = math:pow(N,3)-N,
+    1-(Numerator/Denominator).
+
+ranks_of(Values) when is_list(Values) ->
+    [Fst|Rest] = revsort(Values),
+    TRs = ranks_of(Rest, [], 2, Fst, 1),
+    Dict = gb_trees:from_orddict(TRs),
+    L = lists:foldl(fun (Val, Acc) ->
+                            Rank = gb_trees:get(Val, Dict),
+                            [Rank|Acc]
+                    end, [], Values),
+    lists:reverse(L).
+
+ranks_of([E|Es],Acc, N, E, S) ->
+    ranks_of(Es, Acc, N+1, E, S);
+ranks_of([E|Es], Acc, N, P, S) ->
+    ranks_of(Es,[{P,(S+N-1)/2}|Acc], N+1, E, N);
+ranks_of([],  Acc, N, P, S) ->
+    [{P,(S+N-1)/2}|Acc].
+
+
+get_pearson_correlation(Values, _) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_pearson_correlation(_, Values) when length(Values) < ?STATS_MIN ->
+    0.0;
+get_pearson_correlation(Values1, Values2) when length(Values1) /= length(Values2) ->
+    0.0;
+get_pearson_correlation(Values1, Values2) ->
+    {SumX, SumY, SumXX, SumYY, SumXY, N} =
+  foldl2(fun (X,Y,{SX, SY, SXX, SYY, SXY, N}) ->
+          {SX+X, SY+Y, SXX+X*X, SYY+Y*Y, SXY+X*Y, N+1}
+        end, {0,0,0,0,0,0}, Values1, Values2),
+    Numer = (N*SumXY) - (SumX * SumY),
+    case math:sqrt(((N*SumXX)-(SumX*SumX)) * ((N*SumYY)-(SumY*SumY))) of
+        0.0 ->
+            0.0; %% Is this really the correct thing to do here?
+        Denom ->
+            Numer/Denom
+    end.
+
+revsort(L) ->
+    lists:reverse(lists:sort(L)).
+
+%% Foldl over two lists
+foldl2(F, Acc, [I1|L1], [I2|L2]) when is_function(F,3) ->
+    foldl2(F, F(I1, I2, Acc), L1, L2);
+foldl2(_F, Acc, [], []) ->
+    Acc.
+
+%% wrapper for math:log/1 to avoid dividing by zero
+math_log(0) ->
+    1;
+math_log(X) ->
+    math:log(X).
+
+%% wrapper for calculating inverse to avoid dividing by zero
+inverse(0) ->
+    0;
+inverse(X) ->
+    1/X.
+
+get_hist_bins(Min, Max, StdDev, Count) ->
+    BinWidth = get_bin_width(StdDev, Count),
+    BinCount = get_bin_count(Min, Max, BinWidth),
+    case get_bin_list(BinWidth, BinCount, []) of
+        List when length(List) =< 1 ->
+            [Max];
+        Bins ->
+            %% add Min to Bins
+            [Bin + Min || Bin <- Bins]
+    end.
+
+get_bin_list(Width, Bins, Acc) when Bins > length(Acc) ->
+    Bin = ((length(Acc) + 1) * Width ),
+    get_bin_list(Width, Bins, [round_bin(Bin)| Acc]);
+get_bin_list(_, _, Acc) ->
+    lists:usort(Acc).
+
+round_bin(Bin) ->
+    Base = case erlang:trunc(math:pow(10, round(math:log10(Bin) - 1))) of
+        0 ->
+            1;
+        Else ->
+            Else
+    end,
+    %io:format("bin ~p, base ~p~n", [Bin, Base]),
+    round_bin(Bin, Base).
+
+round_bin(Bin, Base) when Bin rem Base == 0 ->
+    Bin;
+round_bin(Bin, Base) ->
+    Bin + Base - (Bin rem Base).
+
+% the following is up for debate as far as what the best method
+% of choosing bin counts and widths. these seem to work *good enough*
+% in my testing
+
+% bin width based on Sturges
+% http://www.jstor.org/pss/2965501
+get_bin_width(StdDev, Count) ->
+    %io:format("stddev: ~p, count: ~p~n", [StdDev, Count]),
+    case round((3.5 * StdDev) / math:pow(Count, 0.3333333)) of
+        0 ->
+            1;
+        Else ->
+            Else
+    end.
+
+% based on the simple ceilng function at
+% http://en.wikipedia.org/wiki/Histograms#Number_of_bins_and_width
+% with a modification to attempt to get on bin beyond the max value
+get_bin_count(Min, Max, Width) ->
+    %io:format("min: ~p, max: ~p, width ~p~n", [Min, Max, Width]),
+    round((Max - Min) / Width) + 1.

diff --git a/src/bear_scutil.erl b/src/bear_scutil.erl
new file mode 100644
index 0000000..e684fb7
--- /dev/null
+++ b/src/bear_scutil.erl

@@ -0,0 +1,75 @@
+%% taken from http://crunchyd.com/scutil/
+%% All code here is MIT Licensed
+%%  http://scutil.com/license.html
+
+-module(bear_scutil).
+
+-export([
+         kendall_correlation/2
+        ]).
+-compile([export_all]).
+-compile([native]).
+
+% seems to match the value returned by the 'cor' (method="kendal") R function
+% http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
+kendall_correlation(List1, List2) when is_list(List1), is_list(List2) ->
+    {RA,_} = lists:unzip(tied_ordered_ranking(List1)),
+    {RB,_} = lists:unzip(tied_ordered_ranking(List2)),
+
+    Ordering = lists:keysort(1, lists:zip(RA,RB)),
+    {_,OrdB} = lists:unzip(Ordering),
+
+    N = length(List1),
+    P = lists:sum(kendall_right_of(OrdB, [])),
+
+    -(( (4*P) / (N * (N - 1))) - 1).
+
+%%%===================================================================
+%%% Internal functions
+%%%==================================================================
+
+simple_ranking(List) when is_list(List) ->
+    lists:zip(lists:seq(1,length(List)),lists:reverse(lists:sort(List))).
+
+tied_ranking(List) ->
+    tied_rank_worker(simple_ranking(List), [], no_prev_value).
+
+tied_ordered_ranking(List) when is_list(List) ->
+    tied_ordered_ranking(List, tied_ranking(List), []).
+
+tied_ordered_ranking([], [], Work) ->
+    lists:reverse(Work);
+
+tied_ordered_ranking([Front|Rem], Ranks, Work) ->
+    {value,Item}  = lists:keysearch(Front,2,Ranks),
+    {IRank,Front} = Item,
+    tied_ordered_ranking(Rem, Ranks--[Item], [{IRank,Front}]++Work).
+
+kendall_right_of([], Work) ->
+    lists:reverse(Work);
+kendall_right_of([F|R], Work) ->
+    kendall_right_of(R, [kendall_right_of_item(F,R)]++Work).
+
+kendall_right_of_item(B, Rem) ->
+    length([R || R <- Rem, R < B]).
+
+tied_add_prev(Work, {FoundAt, NewValue}) ->
+    lists:duplicate( length(FoundAt), {lists:sum(FoundAt)/length(FoundAt), NewValue} ) ++ Work.
+
+tied_rank_worker([], Work, PrevValue) ->
+    lists:reverse(tied_add_prev(Work, PrevValue));
+
+tied_rank_worker([Item|Remainder], Work, PrevValue) ->
+    case PrevValue of
+        no_prev_value ->
+            {BaseRank,BaseVal} = Item,
+            tied_rank_worker(Remainder, Work, {[BaseRank],BaseVal});
+        {FoundAt,OldVal} ->
+            case Item of
+                {Id,OldVal} ->
+                    tied_rank_worker(Remainder, Work, {[Id]++FoundAt,OldVal});
+                {Id,NewVal} ->
+                    tied_rank_worker(Remainder, tied_add_prev(Work, PrevValue), {[Id],NewVal})
+
+            end
+    end.
commit	eb22734b85f857900bffd3a856626629c0f15019	[log] [tgz]
author	joewilliams <williams.joe@gmail.com>	Fri Mar 30 15:22:30 2012 -0700
committer	joewilliams <williams.joe@gmail.com>	Fri Mar 30 15:22:30 2012 -0700
tree	e77fd6fd582e5d919fb2d7b2f17ecfe529a19b76