blob: b4e3584616c109228e5b9286189db76f6934f205 [file] [log] [blame]
= Interpolation, Derivatives and Integrals
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Interpolation, derivatives and integrals are three interrelated topics which are part of the field of mathematics called numerical analysis. This section explores the math expressions available for numerical anlysis.
== Interpolation
Interpolation is used to construct new data points between a set of known control of points.
The ability to predict new data points allows for sampling along the curve defined by the
control points.
The interpolation functions described below all return an _interpolation model_
that can be passed to other functions which make use of the sampling capability.
If returned directly the interpolation model returns an array containing predictions for each of the
control points. This is useful in the case of `loess` interpolation which first smooths the control points
and then interpolates the smoothed points. All other interpolation functions simply return the original
control points because interpolation predicts a curve that passes through the original control points.
There are different algorithms for interpolation that will result in different predictions
along the curve. The math expressions library currently supports the following
interpolation functions:
* `lerp`: Linear interpolation predicts points that pass through each control point and
form straight lines between control points.
* `spline`: Spline interpolation predicts points that pass through each control point
and form a smooth curve between control points.
* `akima`: Akima spline interpolation is similar to spline interpolation but is stable to outliers.
* `loess`: Loess interpolation first performs a non-linear local regression to smooth the original
control points. Then a spline is used to interpolate the smoothed control points.
=== Upsampling
Interpolation can be used to increase the sampling rate along a curve. One example
of this would be to take a time series with samples every minute and create a data set with
samples every second. In order to do this the data points between the minutes must be created.
The `predict` function can be used to predict values anywhere within the bounds of the interpolation
range. The example below shows a very simple example of upsampling.
[source,text]
----
let(x=array(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20), <1>
y=array(5, 10, 60, 190, 100, 130, 100, 20, 30, 10, 5), <2>
l=lerp(x, y), <3>
u=array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20), <4>
p=predict(l, u)) <5>
----
<1> In the example linear interpolation is performed on the arrays in variables *`x`* and *`y`*. The *`x`* variable,
which is the x-axis, is a sequence from 0 to 20 with a stride of 2.
<2> The *`y`* variable defines the curve along the x-axis.
<3> The `lerp` function performs the interpolation and returns the interpolation model.
<4> The `u` value is an array from 0 to 20 with a stride of 1. This fills in the gaps of the original x axis.
The `predict` function then uses the interpolation function in variable *`l`* to predict values for
every point in the array assigned to variable *`u`*.
<5> The variable *`p`* is the array of predictions, which is the upsampled set of *`y`* values.
When this expression is sent to the `/stream` handler it responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"g": [
5,
7.5,
10,
35,
60,
125,
190,
145,
100,
115,
130,
115,
100,
60,
20,
25,
30,
20,
10,
7.5,
5
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
=== Smoothing Interpolation
The `loess` function is a smoothing interpolator which means it doesn't derive
a function that passes through the original control points. Instead the `loess` function
returns a function that smooths the original control points.
A technique known as local regression is used to compute the smoothed curve. The size of the
neighborhood of the local regression can be adjusted
to control how close the new curve conforms to the original control points.
The `loess` function is passed *`x`*- and *`y`*-axes and fits a smooth curve to the data.
If only a single array is provided it is treated as the *`y`*-axis and a sequence is generated
for the *`x`*-axis.
The example below uses the `loess` function to fit a curve to a set of *`y`* values in an array.
The `bandwidth` parameter defines the percent of data to use for the local
regression. The lower the percent the smaller the neighborhood used for the local
regression and the closer the curve will be to the original data.
[source,text]
----
let(echo="residuals, sumSqError",
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(y, bandwidth=.3),
residuals=ebeSubtract(y, curve),
sumSqError=sumSq(residuals))
----
In the example the fitted curve is subtracted from the original curve using the
`ebeSubtract` function. The output shows the error between the
fitted curve and the original curve, known as the residuals. The output also includes
the sum-of-squares of the residuals which provides a measure
of how large the error is:
[source,json]
----
{
"result-set": {
"docs": [
{
"residuals": [
0,
0,
0,
-0.040524802275866634,
-0.10531988096456502,
0.5906115002526198,
0.004215074334896762,
0.4201374330912433,
0.09618315578013803,
0.012107948556718817,
-0.9892939034492398,
0.012014364143757561,
0.1093830927709325,
0.523166271893805,
0.09658362075164639,
-0.011433819306139625,
0.9899403519886416,
-0.011707983372932773,
-0.004223284004140737,
-0.00021462867928434548,
0.0018723112875456138
],
"sumSqError": 2.8016013870800616
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
In the next example the curve is fit using a `bandwidth` of `.25`:
[source,text]
----
let(echo="residuals, sumSqError",
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(y, .25),
residuals=ebeSubtract(y, curve),
sumSqError=sumSq(residuals))
----
Notice that the curve is a closer fit, shown by the smaller `residuals` and lower value for the sum-of-squares of the
residuals:
[source,json]
----
{
"result-set": {
"docs": [
{
"residuals": [
0,
0,
0,
0,
-0.19117650587715396,
0.442863451538809,
-0.18553845993358564,
0.29990769020356645,
0,
0.23761890236245709,
-0.7344358765888117,
0.2376189023624491,
0,
0.30373119215254984,
-3.552713678800501e-15,
-0.23761890236245264,
0.7344358765888046,
-0.2376189023625095,
0,
2.842170943040401e-14,
-2.4868995751603507e-14
],
"sumSqError": 1.7539413576337557
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
== Derivatives
The derivative of a function measures the rate of change of the *`y`* value in respects to the
rate of change of the *`x`* value.
The `derivative` function can compute the derivative of any interpolation function.
It can also compute the derivative of a derivative.
The example below computes the derivative for a `loess` interpolation function.
[source,text]
----
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
derivative=derivative(curve))
----
When this expression is sent to the `/stream` handler it
responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"derivative": [
1.0022002675659012,
0.9955994648681976,
1.0154018729613081,
1.0643674501141696,
1.0430879694757085,
0.9698717643975381,
0.7488201070357539,
0.44627000894357516,
0.19019561285422165,
0.01703599324311178,
-0.001908408138535126,
-0.009121607450087499,
-0.2576361507216319,
-0.49378951291352746,
-0.7288073815664,
-0.9871806872210384,
-1.0025400632604322,
-1.001836567536853,
-1.0076227586138085,
-1.0021524620888589,
-1.0020541789058157
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
== Integrals
An integral is a measure of the volume underneath a curve.
The `integrate` function computes an integral for a specific
range of an interpolated curve.
In the example below the `integrate` function computes an
integral for the entire range of the curve, 0 through 20.
[source,text]
----
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
integral=integrate(curve, 0, 20))
----
When this expression is sent to the `/stream` handler it
responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"integral": 90.17446104846645
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
In the next example an integral is computed for the range of 0 through 10.
[source,text]
----
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
integral=integrate(curve, 0, 10))
----
When this expression is sent to the `/stream` handler it
responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"integral": 45.300912584519914
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
== Bicubic Spline
The `bicubicSpline` function can be used to interpolate and predict values
anywhere within a grid of data.
A simple example will make this more clear:
[source,text]
----
let(years=array(1998, 2000, 2002, 2004, 2006),
floors=array(1, 5, 9, 13, 17, 19),
prices = matrix(array(300000, 320000, 330000, 350000, 360000, 370000),
array(320000, 330000, 340000, 350000, 365000, 380000),
array(400000, 410000, 415000, 425000, 430000, 440000),
array(410000, 420000, 425000, 435000, 445000, 450000),
array(420000, 430000, 435000, 445000, 450000, 470000)),
bspline=bicubicSpline(years, floors, prices),
prediction=predict(bspline, 2003, 8))
----
In this example a bicubic spline is used to interpolate a matrix of real estate data.
Each row of the matrix represent specific `years`. Each column of the matrix
represents `floors` of the building. The grid of numbers is the average selling price of
an apartment for each year and floor. For example in 2002 the average selling price for
the 9th floor was `415000` (row 3, column 3).
The `bicubicSpline` function is then used to
interpolate the grid, and the `predict` function is used to predict a value for year 2003, floor 8.
Notice that the matrix does not include a data point for year 2003, floor 8. The `bicupicSpline`
function creates that data point based on the surrounding data in the matrix:
[source,json]
----
{
"result-set": {
"docs": [
{
"prediction": 418279.5009328358
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----