= Interpolation, Derivatives and Integrals
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
This section explores the interrelated math expressions for interpolation and numerical calculus.
== Interpolation
Interpolation is used to construct new data points between a set of known control points.
The ability to predict new data points allows for sampling along the curve defined by the
control points.
The interpolation functions described below all return an _interpolation function_
that can be passed to other functions which make use of the sampling capability.
If returned directly, the interpolation function returns an array containing predictions for each of the
control points. This is useful in the case of `loess` interpolation, which first smooths the control points
and then interpolates the smoothed points. All other interpolation functions simply return the original
control points, because they predict a curve that passes through the original control points.
There are different algorithms for interpolation that will result in different predictions
along the curve. The math expressions library currently supports the following
interpolation functions:
* `lerp`: Linear interpolation predicts points that pass through each control point and
form straight lines between control points.
* `spline`: Spline interpolation predicts points that pass through each control point
and form a smooth curve between control points.
* `akima`: Akima spline interpolation is similar to spline interpolation but is less sensitive to outliers.
* `loess`: Loess interpolation first performs a non-linear local regression to smooth the original
control points. Then a spline is used to interpolate the smoothed control points.
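
As a minimal sketch of how an interpolation function is passed to another function, the expression below builds a linear interpolation and asks `predict` for a y-axis value between two control points. The control point values and variable names are made up for illustration and do not come from this guide.

[source,text]
----
# Hypothetical control points, chosen only for illustration
let(x=array(0, 1, 2, 3, 4),
    y=array(0, 1, 4, 9, 16),
    l=lerp(x, y),
    p=predict(l, 2.5))
----

Because `lerp` draws a straight line between the control points (2, 4) and (3, 9), the prediction for 2.5 should fall halfway between 4 and 9. Replacing `lerp` with `spline` or `akima` would be expected to give a somewhat different value for the same x-axis point.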
=== Sampling Along the Curve
One way to better understand interpolation is to visualize what it means to sample along a curve. The example
below zooms in on a specific region of a curve by sampling the curve between a specific x-axis range.
image::images/math-expressions/interpolate1.png[]
The visualization above first creates two arrays with x and y-axis points. Notice that the x-axis ranges from
0 to 9. Then the `akima`, `spline` and `lerp`
functions are applied to the vectors to create three interpolation functions.
Then 500 random samples are drawn from a uniform distribution between 0 and 3. These are
the new zoomed-in x-axis points, between 0 and 3. Notice that we are sampling a specific
area of the curve.
Then the `predict` function is used to predict y-axis points for
the sampled x-axis, for all three interpolation functions. Finally all three prediction vectors
are plotted with the sampled x-axis points.
The red line is the `lerp` interpolation, the blue line is the `akima` and the purple line is
the `spline` interpolation. You can see they each produce different curves in between the control
points.
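
The expression behind this visualization is visible only in the image. A sketch consistent with the description might look like the following. The y-axis values, the variable names, and the series names passed to `zplot` are placeholders, not the ones used in the image.

[source,text]
----
# Hypothetical y-axis control points; the x-axis ranges from 0 to 9 as described
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    y=array(5, 3, 2, 6, 7, 8, 4, 3, 9, 10),
    a=akima(x, y),
    s=spline(x, y),
    l=lerp(x, y),
    xzoom=sample(uniformDistribution(0, 3), 500),
    akima_y=predict(a, xzoom),
    spline_y=predict(s, xzoom),
    lerp_y=predict(l, xzoom),
    zplot(x=xzoom, akima_y=akima_y, spline_y=spline_y, lerp_y=lerp_y))
----

Each call to `predict` evaluates one interpolation function at the same 500 sampled x-axis points, which is why the three series can share a single x-axis in the plot.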
=== Smoothing Interpolation
The `loess` function is a smoothing interpolator which means it doesn't derive
a function that passes through the original control points. Instead the `loess` function
returns a function that smooths the original control points.
A technique known as local regression is used to compute the smoothed curve. The size of the
neighborhood of the local regression can be adjusted
to control how closely the new curve conforms to the original control points.
The `loess` function is passed x- and y-axes and fits a smooth curve to the data.
If only a single array is provided it is treated as the y-axis and a sequence is generated
for the x-axis.
The example below shows the `loess` function being used to model a monthly
time series. In the example the `timeseries` function is used to generate
a monthly time series of average closing prices for the stock ticker
*AMZN*. The `date_dt` and `avg(close_d)` fields from the time series
are then vectorized and stored in variables `x` and `y`. The `loess`
function is then applied to the `y` vector containing the average closing
prices. The `bandwidth` named parameter specifies the fraction
of the data set used to compute each local regression. The `loess` function
returns the fitted model of smoothed data points, which is stored in the variable `y1`.
The `zplot` function is then used to plot the `x`, `y` and `y1`
variables.
image::images/math-expressions/loess.png[]
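
The expression itself appears only in the image above. A sketch that follows the description is shown below. The collection name `stocks`, the `ticker_s` query field, the date range, and the `bandwidth` value are assumptions made for illustration.

[source,text]
----
# Collection name, query field, date range and bandwidth are hypothetical
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    y1=loess(y, bandwidth=.3),
    zplot(x=x, y=y, y1=y1))
----

Only the `y` vector is passed to `loess` here, so a sequence is generated for the x-axis as described above. A smaller `bandwidth` should make the smoothed line follow the raw prices more closely, while a larger value should smooth more aggressively.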
== Derivatives
The derivative of a function measures the rate of change of the `y` value with respect to
the `x` value.
The `derivative` function can compute the derivative for any of the
interpolation functions described above. Each interpolation function
will produce different derivatives that match the characteristics
of the function.
=== The First Derivative (Velocity)
A simple example shows how the `derivative` function is used to calculate
the rate of change or *velocity*.
In the example two vectors are created, one representing hours and
one representing miles traveled. The `lerp` function is then used to
create a linear interpolation of the `hours` and `miles` vectors.
The `derivative` function is then applied to the
linear interpolation. `zplot` is then used to plot the *`hours`*
on the x-axis and `miles` on the y-axis, and the `derivative` as `mph`,
at each x-axis point.
image::images/math-expressions/derivative.png[]
Notice that the *miles_traveled* line has a slope of 10 until the
5th hour where
it changes to a slope of 50. The *mph* line, which is
the derivative, visualizes the *velocity* of the
*miles_traveled* line.
Also notice that the derivative is calculated along
straight lines showing immediate change from one point to the next. This
is because linear interpolation (`lerp`) is used as the interpolation
function. If the `spline` or `akima` functions had been used, the result would have been
a derivative with rounded curves.
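
The expression for this example is visible only in the image. A sketch that matches the description might look like this. The `hours` and `miles` values are assumptions, chosen so that the slope is 10 for the first five hours and 50 afterwards.

[source,text]
----
# Hypothetical data: 10 miles per hour for five hours, then 50 miles per hour
let(hours=array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    miles=array(10, 20, 30, 40, 50, 100, 150, 200, 250, 300),
    l=lerp(hours, miles),
    mph=derivative(l),
    zplot(x=hours, miles_traveled=miles, mph=mph))
----

The `derivative` function is applied to the interpolation function `l`, not to the raw `miles` vector, so changing `lerp` to `spline` or `akima` changes the shape of the resulting `mph` series.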
=== The Second Derivative (Acceleration)
While the first derivative represents velocity, the second derivative
represents *acceleration*. The second derivative is the derivative
of the first derivative.
The example below builds on the first example and adds the second derivative.
Notice that the second derivative `d2` is taken by applying the
derivative function to a linear interpolation of the first derivative.
The second derivative is plotted as *acceleration* on the chart.
image::images/math-expressions/derivatives.png[]
Notice that the acceleration line is 0 until the *mph* line increases from 10 to 50. At this
point the *acceleration* line moves to 40. As the *mph* line stays at 50, the acceleration
line drops to 0.
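
A sketch of the expression described in this section is shown below. The `hours` and `miles` values are assumptions chosen to match the slopes of 10 and 50 mentioned above, and the series names passed to `zplot` follow the labels used in the text.

[source,text]
----
# Hypothetical data: 10 miles per hour for five hours, then 50 miles per hour
let(hours=array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    miles=array(10, 20, 30, 40, 50, 100, 150, 200, 250, 300),
    l=lerp(hours, miles),
    mph=derivative(l),
    d2=derivative(lerp(hours, mph)),
    zplot(x=hours, miles_traveled=miles, mph=mph, acceleration=d2))
----

The first derivative `mph` is a plain vector, so it is interpolated again with `lerp` before `derivative` is applied a second time.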
=== Price Velocity
The example below shows how to plot the `derivative` for a time series generated
by the `timeseries` function. In the example a monthly time series is
generated for the average closing price for the stock ticker `amzn`.
The `avg(close)` column is vectorized and interpolated using linear
interpolation (`lerp`). The `zplot` function is then used to plot the derivative
of the time series.
image::images/math-expressions/derivative2.png[]
Notice that the derivative plot clearly shows the rates of change in the stock price over time.
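
The expression is visible only in the image. The sketch below follows the description under several assumptions: the collection name `stocks`, the `ticker_s` query field, the `date_dt` and `close_d` field names, and the date range are hypothetical. It also assumes that `lerp` accepts a single array and generates a sequence for the x-axis, as described for `loess` earlier in this section.

[source,text]
----
# Collection, field names and date range are hypothetical
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    l=lerp(y),
    d=derivative(l),
    zplot(x=x, y=d))
----

With a generated x-axis sequence, each step corresponds to one month, so the derivative can be read as the change in average closing price per month.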
== Integrals
An integral is a measure of the area underneath a curve.
The `integral` function computes the cumulative integrals for a curve or the integral for a specific
range of an interpolated curve. Like the `derivative` function the `integral` function operates
over interpolation functions.
=== Single Integral
If the `integral` function is passed a *start* and *end* range it will compute the area under the
curve within that specific range.
In the example below the `integral` function computes an
integral for the range 0 through 10, the first half of a curve whose x-axis runs from 0 to 20. Notice that the `integral` function is passed
the interpolated curve and the start and end range, and returns the integral for the range.
[source,text]
----
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
    curve=loess(x, y, bandwidth=.3),
    integral=integral(curve, 0, 10))
----
When this expression is sent to the `/stream` handler it
responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"integral": 45.300912584519914
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
=== Cumulative Integral Plot
If the `integral` function is passed a single interpolated curve it returns a vector of the cumulative
integrals for the curve. The cumulative integrals vector contains a cumulative integral calculation
for each x-axis point. The cumulative integral is calculated by taking the
integral of the range between each x-axis point and the *first* x-axis point. In the example above this would
mean calculating a vector of integrals as such:
[source,text]
----
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
    curve=loess(x, y, bandwidth=.3),
    integrals=array(0, integral(curve, 0, 1), integral(curve, 0, 2), integral(curve, 0, 3), ...))
----
The plot of cumulative integrals visualizes how much cumulative area lies under the curve up to each
x-axis point.
The example below shows the cumulative integral plot for a time series generated by
the `timeseries` function. In the example a monthly time series is
generated for the average closing price for the stock ticker `amzn`.
The `avg(close)` column is vectorized and interpolated using a `spline`.
The `zplot` function is then used to plot the cumulative integral
of the time series.
image::images/math-expressions/integral.png[]
The plot above visualizes the area under the curve as the *AMZN* stock
price changes over time. Because this plot is cumulative, the area under
a stock price time series that stays the same over time will
have a positive *linear* slope. A stock that has rising prices will produce a curve that bends upward (a *convex* shape) and
a stock with falling prices will produce a curve that bends downward (a *concave* shape).
In this particular example the integral plot bends upward more steeply over time,
showing accelerating increases in stock price.
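
The expression for this plot is visible only in the image. The sketch below follows the description under several assumptions: the collection name `stocks`, the `ticker_s` query field, the `date_dt` and `close_d` field names, and the date range are hypothetical. It also assumes that `spline` accepts a single array and generates a sequence for the x-axis.

[source,text]
----
# Collection, field names and date range are hypothetical
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    curve=spline(y),
    ci=integral(curve),
    zplot(x=x, y=ci))
----

Because `integral` is passed only the interpolated curve, with no start and end range, it returns the vector of cumulative integrals, one value for each x-axis point.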
== Bicubic Spline
The `bicubicSpline` function can be used to interpolate and predict values
anywhere within a grid of data.
A simple example makes this clearer:
[source,text]
----
let(years=array(1998, 2000, 2002, 2004, 2006),
    floors=array(1, 5, 9, 13, 17, 19),
    prices=matrix(array(300000, 320000, 330000, 350000, 360000, 370000),
                  array(320000, 330000, 340000, 350000, 365000, 380000),
                  array(400000, 410000, 415000, 425000, 430000, 440000),
                  array(410000, 420000, 425000, 435000, 445000, 450000),
                  array(420000, 430000, 435000, 445000, 450000, 470000)),
    bspline=bicubicSpline(years, floors, prices),
    prediction=predict(bspline, 2003, 8))
----
In this example a bicubic spline is used to interpolate a matrix of real estate data.
Each row of the matrix represents a year from the `years` array. Each column of the matrix
represents a floor from the `floors` array. The grid of numbers is the average selling price of
an apartment for each year and floor. For example in 2002 the average selling price for
the 9th floor was `415000` (row 3, column 3).
The `bicubicSpline` function is then used to
interpolate the grid, and the `predict` function is used to predict a value for year 2003, floor 8.
Notice that the matrix does not include a data point for year 2003, floor 8. The `bicubicSpline`
function creates that data point based on the surrounding data in the matrix:
[source,json]
----
{
"result-set": {
"docs": [
{
"prediction": 418279.5009328358
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----