blob: 64e43686a5a170ee4a2a32248aab9b2a1b22f04b [file] [log] [blame]
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## KLL Sketch Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic Sketch Usage"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from datasketches import kll_floats_sketch, kll_ints_sketch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Options are a `kll_floats_sketch` or `kll_ints_sketch`. We'll use the former so we can draw samples from a Gaussian distribution. We start by creating a sketch with $k=200$, which gives a normalized rank error of about 1.65%, and feeding in 1 million points."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"n = 1000000\n",
"kll = kll_floats_sketch(200)\n",
"from numpy.random import randn\n",
"for i in range(0, n):\n",
" kll.update(randn()) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the data is distributed as $\\cal{N}(0,1)$, 0.0 should be near the median rank (0.5)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.497608"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kll.get_rank(0.0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the median should also be near 0.0"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.003108405973762274"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kll.get_quantile(0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We track the min and max values as well. They are stored separately from the quantile data so we can always determine the full _empirical_ data range. In this case they should be very roughly symmetric around 0.0. We can query these values explicitly, or implicitly by asking for the values at ranks 0.0 and 1.0."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-4.6000142097473145, 4.779754638671875]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[kll.get_min_value(), kll.get_max_value()]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-4.6000142097473145, 4.779754638671875]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kll.get_quantiles([0.0, 1.0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And out of curiosity, we can check how many items the sketch has seen and how many it is retaining"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1000000"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kll.get_n()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"614"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"kll.get_num_retained()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can serialize the sketch for archiving, and reconstruct it later. Note that the serialized image does _not_ contain information on whether it is a floats or ints sketch."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2536"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sk_bytes = kll.serialize()\n",
"len(sk_bytes)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"### KLL sketch summary:\n",
" K : 200\n",
" min K : 200\n",
" M : 8\n",
" N : 1000000\n",
" Epsilon : 1.33%\n",
" Epsilon PMF : 1.65%\n",
" Empty : false\n",
" Estimation mode: true\n",
" Levels : 13\n",
" Sorted : true\n",
" Capacity items : 617\n",
" Retained items : 614\n",
" Storage bytes : 2536\n",
" Min value : -4.6\n",
" Max value : 4.78\n",
"### End sketch summary\n",
"\n"
]
}
],
"source": [
"kll2 = kll_floats_sketch.deserialize(sk_bytes)\n",
"print(kll2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Merging Sketches"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"KLL sketches have a `merge()` operation to combine sketches. The resulting sketch will have no worse error boudns than if the full data had been sent to a single sketch."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our previous sketch used $\\cal{N}(0,1)$, so now we'll generate a shifted Gaussian distributed as $\\cal{N}(4,1)$. For added variety, we can use half as many points. The next section will generate a plot, so we will defer queries of the merged skech to that section."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"sk2 = kll_floats_sketch(200)\n",
"for i in range(0, int(n/2)):\n",
" sk2.update(4 + randn())"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"### KLL sketch summary:\n",
" K : 200\n",
" min K : 200\n",
" M : 8\n",
" N : 1500000\n",
" Epsilon : 1.33%\n",
" Epsilon PMF : 1.65%\n",
" Empty : false\n",
" Estimation mode: true\n",
" Levels : 13\n",
" Sorted : false\n",
" Capacity items : 617\n",
" Retained items : 580\n",
" Storage bytes : 2400\n",
" Min value : -4.6\n",
" Max value : 9.06\n",
"### End sketch summary\n",
"\n"
]
}
],
"source": [
"kll.merge(sk2)\n",
"print(kll)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generating Histograms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The KLL sketch allows us compute histograms via the probability mass function (pmf). Since histograms are a typical plot type when visualizing data distributions, we will create such a figure. To instead create a cumulative distribution function (cdf) from the sketch, simply replace the call to `get_pmf()` with `get_cdf()`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want our x-axis to have evenly distributed bins, so the first step is to split the empirical data range\n",
"into a set of bins."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"xmin = kll.get_min_value()\n",
"num_splits = 30\n",
"step = (kll.get_max_value() - xmin) / num_splits\n",
"splits = [xmin + (i*step) for i in range(0, num_splits)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_pmf()` returns the probability mass in the range $(x_{i-1}, x_i]$, for each bin $i$. If we use the minimum value for $x_{i-1}$ this covers the low end, but `get_pmf()` also returns an extra bin with all mass greater than the last-provided split point. As a result, the pmf array is 1 larger than the list of split points. We need to be sure to append a value to the split points for plotting."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"pmf = kll.get_pmf(splits)\n",
"x = splits # this will hold the x-axis values, so need to append the max value\n",
"x.append(kll.get_max_value())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need some plotting-related imports and options"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"sns.set(color_codes=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using a negative width in the plot gives right-aligned bins, which matches the bin definition noted earlier."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<BarContainer object of 31 artists>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD7CAYAAABpJS8eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAUgklEQVR4nO3dfYxc11nH8e+u7V3b8Tpt3aliJyQISh8UBHFoCCp2oCgG1IjUlCSNFKM0ahs3qigvSkVVxYHwkgoaBVcWKVQJqgPutrSuqJvWSSs3hhSqNKWKE4mkj2glAjhGmA0Q240d22v+mLvRdLu7c3d3dmdmz/cjWZp7zrnrZ95+e/fMvWcGzp07hyRp6RvsdgGSpMVh4EtSIQx8SSqEgS9JhTDwJakQy7tdwDSGgZ8CjgBnu1yLJPWLZcB64BvAqcmdvRr4PwV8tdtFSFKfugr4h8mNvRr4RwD+539OMD7eW9cJrFu3hrGx490uoxZrXRj9VCv0V73WOj+DgwO8+tXnQZWhk9UK/Ii4CdgBDAE7M/O+acY9CBzMzN3V9ibgI8AKYAx4Z2Y+V+O/PAswPn6u5wIf6MmapmOtC6OfaoX+qtdaO2LKqfC2H9pGxIXA3cBm4DJge0RcOmnMhoh4CLhh0u6fAN6VmRur27vmULgkqQPqnKWzBXg0M1/IzBPAXuD6SWO2AfuAT080RMQwsCMzn66angYunn/JkqS5qDOls4HvnQ86AlzZOiAz7wGIiM0tbaeAPVX7IHAX8Ln5lStJmqs6gT8wRdt43f8gIoaAB6v/60N194PmhyK9qNEY6XYJtVnrwuinWqG/6rXWhVMn8A/TPMVnwnrg+To/PCLWAJ+n+YHt1sw8PZvixsaO99yHIo3GCEePHut2GbVY68Lop1qhv+q11vkZHByY8UC5TuAfAO6KiAZwArgO2F7z/98DfBt4T2b2VnJLUmHafmibmYeBO4CDwCFgNDOfiIj9EXHFdPtFxOXAVmAT8GREHIqI/R2qW5I0S7XOw8/MUWB0Uts1U4y7peX2k0w9/6+CjKxdxcrhmV9mJ0+dWaRqpLL16pW2WiJWDi/n2tv3zTjmoXu3LlI1UtlcLVOSCuERvnrCy6fPtj3F7eSpMxx78aVFqkhaegx89YShFctqTf301klwUn9xSkeSCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBVieZ1BEXETsAMYAnZm5n3TjHsQOJiZu6vti4E9wOuABLZl5vEO1K1CjaxdxcrhmV+2J0+d4diLLy1SRVL/aBv4EXEhcDfwRuAU8LWIOJiZz7SM2QB8DLgaONiy+0eBj2bmpyLiTuBO4AMdrF+FWTm8nGtv3zfjmIfu3cqxRapH6id1pnS2AI9m5guZeQLYC1w/acw2YB/w6YmGiFgB/Gw1HmA3cMN8C5YkzU2dKZ0NwJGW7SPAla0DMvMegIjY3NL8WuDFzDzTst9Fcy9VkjQfdQJ/YIq28QXc7xXr1q2ZzfBF02iMdLuE2vqp1k5a6Pvdb49rP9VrrQunTuAfBq5q2V4PPF9jv6PA2ohYlplnZ7HfK8bGjjM+fm42uyy4RmOEo0f7Y4a4F2rt1htiIe93Lzyus9FP9Vrr/AwODsx4oFxnDv8AcHVENCJiNXAd8Ei7nTLzNPBV4Maq6Wbg4Rr/nyRpAbQN/Mw8DNxB8+ybQ8BoZj4REfsj4oo2u78X2B4Rz9D8K2HHfAuWJM1NrfPwM3MUGJ3Uds0U426ZtP0c8Oa5lydJ6hSvtJWkQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIWotniZNVvfLxCX1DgNfc1L3y8Ql9Q6ndCSpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgpRa7XMiLgJ2AEMATsz875J/RuB+4HzgceA2zLzTET8IPBXwFrgf4F3ZOZznStfklRX2yP8iLgQuBvYDFwGbI+ISycN2wO8LzPfAAwAt1btfwh8MjM3Ap+tfo4kqQvqTOlsAR7NzBcy8wSwF7h+ojMiLgFWZebjVdNu4Ibq9jKaR/cA5wEvdaJoSdLs1ZnS2QAcadk+AlzZpv+i6vadwNci4jdoTge9aTbFrVu3ZjbDF02jMdLtEmrrp1o7aaHvd789rv1Ur7UunDqBPzBF23jN/geB7Zm5LyKuA/42In4iM8/VKW5s7Djj47WGLppGY4SjR491u4xaFrLWXn+hL+Rz1E+vAeiveq11fgYHB2Y8UK4zpXMYuKBlez3wfLv+iGgAP5qZ+wAy87PVuNfWK12S1El1Av8AcHVENCJiNXAd8MhEZ3XWzcmI2FQ13Qw8DPx31b4ZoOo/lplHO3kHJEn1tA38zDwM3AEcBA4Bo5n5RETsj4grqmHbgJ0R8SzND2d3VdM2vwrcGxFPAx+m+ctCktQFtc7Dz8xRYHRS2zUtt5/iez/InWh/AvjpedYoSeoAr7SVpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQtT6xiupn7x8+iyNxkjbcSdPneHYiy8tQkVSbzDwteQMrVjGtbfvazvuoXu3cmwR6pF6hVM6klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVotZ5+BFxE7ADGAJ2ZuZ9k/o3AvcD5wOPAbdl5pmIWA88AGwAvgtsy8x/7Vz5kqS62h7hR8SFwN3AZuAyYHtEXDpp2B7gfZn5BmAAuLVq/2vgocy8vLr9J50qXJI0O3WmdLYAj2bmC5l5AtgLXD/RGRGXAKsy8/GqaTdwQ0S8luYviI9V7R+n+VeCJKkL6kzpbACOtGwfAa5s038R8MPAvwE7I+Lnq9u/Ppvi1q1bM5vhi6bOOi29Yra1vnz6LEMrli1QNb1nrs9lr7wG6j5f579qdd88r73y2NbRT7VCvcAfmKJtvEb/cuBy4Pcy87ci4t3Ag8Cb6xY3Nnac8fFzdYcvikZjhKNH+2MFlrnU2miM1F6HZimYy3PZS6+B2TxfvVLzTHrpsW2nF2sdHByY8UC5zpTOYeCClu31wPM1+v8TOJaZX6jaR/nevwwkSYuoTuAfAK6OiEZErAauAx6Z6MzM54CTEbGparoZeDgzvwMcjoi3VO3XAt/sXOmS6ppYMrrdv5G1q7pdqhZQ2ymdzDwcEXcAB2melvlAZj4REfuB383MfwK2AfdHxAjwJLCr2v1twMci4h7gReAdC3EnJM3MJaMFNc/Dz8xRmlMyrW3XtNx+iimmazIzmcWcvSRp4XilrSQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpELW+AEXS4htZu4qVw75F1Tm+mqQetXJ4eduvJXzo3q2LVI2WAqd0JKkQBr4kFcLAl6RCOIcv6RUvnz5LozEy45iTp85w7MWXFqkidZKBL+kVQyuW1fqg+Ngi1aPOckpHkgph4EtSIQx8SSpErTn8iLgJ2AEMATsz875J/RuB+4HzgceA2zLzTEv/5cDjmTncqcIlSbPT9gg/Ii4E7gY2A5cB2yPi0knD9gDvy8w3AAPArS37rwb+jOYvC0lSl9SZ0tkCPJqZL2TmCWAvcP1EZ0RcAqzKzMerpt3ADS373wvs7Ey5kqS5qhP4G4AjLdtHgIvq9EfEW4HVmbl3nnVKkuapzhz+wBRt4+36I+ICmvP+W+ZSGMC6dWvmuuuCandhSi/pp1q7Ya6PT+mP60Le/356bPupVqgX+IeBq1q21wPPT+q/YIr+XwbWAY9FBAARcQi4KjNrXbcxNnac8fFzdYYumkZjhKNH++Oyk7nU2m8v4Pmay3O5WK+BXn4uFur+L/X310IbHByY8UC5TuAfAO6KiAZwArgO2D7RmZnPRcTJiNiUmf8I3Aw8nJkPAA9MjIuIc5m5cY73Q5I0T23n8DPzMHAHcBA4BIxm5hMRsT8irqiGbQN2RsSzwHnAroUqWJI0N7XOw8/MUWB0Uts1LbefAq5s8zOmmuuXJC0Sr7SVpEIY+JJUCANfkgrhevgqVp0v+4DOf+HHyNpVrBye+a138tSZGfuluTDwVaw6X/YBnf/Cj5XDy2t9yYjUaQZ+IeocVUpa2kyAQtQ5qgSPLKWlzA9tJakQBr4kFcLAl6RCGPiSVAgDX5IK4Vk6kmat7sVjnbxgTfNn4EuatboXj/XW14PIKR1JKoSBL0mFMPAlqRDO4UttTLWq5uTtidUtXQVTvczAl9qos6rmxBpEroKpXuaUjiQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9Jhah1Hn5E3ATsAIaAnZl536T+jcD9wPnAY8BtmXkmIjYBHwFWAGPAOzPzuQ7WL0mqqe0RfkRcCNwNbAYuA7ZHxKWThu0B3peZbwAGgFur9k8A78rMjdXtXZ0qXJI0O3WmdLYAj2bmC5l5AtgLXD/RGRGXAKsy8/GqaTdwQ0QMAzsy8+mq/Wng4o5VLkmalTpTOhuAIy3bR4Ar2/RflJmnaB75ExGDwF3A5+ZTrCRp7uoE/sAUbeN1+yNiCHiw+r8+NJvi1q1bM5vhi2bywlm9rJ9q1dIzl9dfP71m+6lWqBf4h4GrWrbXA89P6r9gqv6IWAN8nuYHtlsz8/RsihsbO874+LnZ7LLgGo0Rjh7tj+/xaa21316YWhpm+17p1/dXrxgcHJjxQLnOHP4B4OqIaETEauA64JGJzuqsm5PVGTkANwMPV7f3AN8G3l5N8UiSuqRt4GfmYeAO4CBwCBjNzCciYn9EXFEN2wbsjIhngfOAXRFxObAV2AQ8GRGHImL/gtwLSVJbtc7Dz8xRYHRS2zUtt5/iez/IBXiSqef3JUld4JW2klQIA1+SCmHgS1IhDHxJKoRfYi5pwYysXcXK4Zlj5uSpMxx78aVFqqhsBr6kBbNyeDnX3r5vxjEP3buV3rp8aeky8JeAmY6ivMJW0gQDfwmoexQlqWx+aCtJhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhBdeSeq61qvFp7s63DV35s/Al9R1rrmzOJzSkaRCGPiSVAgDX5IKYeBLUiEMfEkqhGfp9LC6Xw8nSXUY+D3MLzaR1ElO6UhSIQx8SSpErSmdiLgJ2AEMATsz875J/RuB+4HzgceA2zLzTERcDOwBXgcksC0zj3ewfklSTW0DPyIuBO4G3gicAr4WEQcz85mWYXuAd2fm4xHxl8CtwJ8DHwU+mpmfiog7gTuBD3T6TvQbP4yVZu/l02enXWen1amXzzI8tKztuBLX5qlzhL8FeDQzXwCIiL3A9cAfVNuXAKsy8/Fq/G7g9yPiAeBngV9paf976gX+MoDBwYFad2KxzbeulcPLedcffXnGMX+54xcBeN2rV9X6mXXGdfJn9fq4Xq6t0+N6ubZOjhtasazt+waa7526407M873caxnVUs+Uv/EGzp07N+MPiIgPAudl5o5q+93AlZm5vdp+E3BPZm6utl8P7Ad+DvhGZl5UtS8HvpuZQzXq3gx8tcY4SdL3uwr4h8mNdY7wp/oVNl6jv91+M/kGzYKPAGdr7iNJpVsGrKeZod+nTuAfphm+E9YDz0/qv2CK/qPA2ohYlplnp9hvJqeY4reTJKmt70zXUee0zAPA1RHRiIjVwHXAIxOdmfkccDIiNlVNNwMPZ+ZpmtMyN7a2z6F4SVIHtA38zDwM3AEcBA4Bo5n5RETsj4grqmHbgJ0R8SxwHrCran8vsD0inqH5V8KOTt8BSVI9bT+0lSQtDV5pK0mFMPAlqRAGviQVwsCXpEK4Hv4cRcTlwOOZOdztWqZTnSr7EWAFMAa8szqNtqe0W5yvl0TE7wFvrza/mJm/08166oiIe4BGZt7S7VqmExHXAnfRPMvvS5n5m92taHoR8WvAB6vNhzPz/d2sZzY8wp+D6nqEP6MZUL3sE8C7MnNjdXtXm/GLrmVxvs3AZTRP4720u1VNLSK2AL8IXA5sBN4YEW/rblUzi4irgVu6XcdMIuKHgL8AtgI/DvxkRLylu1VNrXrv76K5dMxlwFXV66IvGPhzcy+ws9tFzCQihoEdmfl01fQ0cHEXS5rOK4vzZeYJYGJxvl50BLg9M1+uLix8lt58TAGIiNfQ/GX6oW7X0sbbgL/JzP+oHtcbga93uabpLKOZm+fR/Mt5BdA3S246pTNLEfFWYHVm7o2Ibpczrcw8RXPZaiJikOafy5/rZk3T2EAzSCccAa7sUi0zysx/nrgdET9CM5h+pnsVtfUxmhdN/kC3C2nj9cDLEfElmsu0PERzKfWek5nHqqXev0Uz6P8O+FpXi5oFA38aEXED338U/y1gLc2j0p4xXa2ZuSUihoAHaT7XvXikN59F9roiIn4M+CLw/sz8l27XM5VqVdt/z8yvRMQt3a6njeU0l1J/M3Ac2Ae8g+aS6j0lIn4CeCdwCfB/NA+q3g/c08266jLwp5GZnwE+09pWvYk+CDw2cXQfEYeAqzLz2KIXWZmqVoCIWAN8nuYHtlurP5d7TbvF+XpK9UH4Z4HfysxPdbueGdwIrK9en68B1kTEzsz87S7XNZX/BA5k5lGAiPgczb/ydnezqGn8EvCVzPwvgIjYTXMJGQN/qcnMB4AHJrYj4lz1gWiv2gN8G3hPZvbqGhoHgLsiogGcoLk43/buljS1iPgBmtNiN2bmo92uZyaZ+QsTt6sj/Df3aNgDfAF4MCJeBRwD3kJvTj8CPAV8OCLOA74LXMs0SxH3Ij+0XaKq00a3ApuAJyPiUETs73JZ32e6xfm6W9W03g+sBP60ejwPRcRt3S6q32Xm14EP01wS/RngOeDjXS1qGpn5ZeCTwDdpngixAvjjrhY1Cy6eJkmF8Ahfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVIj/B2kl2CPiXnJXAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.bar(x=x,height=pmf,align='edge',width=-0.43)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The leftmost peak came from the first sketch, with data centered around 0.0. The smaller, rightmost peak came from our second sketch, which had half as many samples and was centered around 4.0. The KLL sketch captures the shape of the combiend distribution."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}