blob: 50d7ac63a7128cd22a55e482b0b8d9188f5c769d [file] [log] [blame]
= Computational Geometry
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
This section of the math expressions user guide covers computational geometry functions.
== Convex Hull
A convex hull is the smallest convex set of points that encloses a data set. Math expressions has support for computing
the convex hull of a 2D data set. Once a convex hull has been calculated, a set of math expression functions
can be applied to geometrically describe and visualize the convex hull.
=== Visualization
The `convexHull` function can be used to visualize a border around a
set of 2D points. Border visualizations can be useful for understanding where data points are
in relation to the border.
In the examples below the `convexHull` function is used
to visualize a border for a set of latitude and longitude points of rat sightings in the NYC311
complaints database. An investigation of the border around the rat sightings can be done
to better understand how rats may be entering or exiting the specific region.
==== Scatter Plot
Before visualizing the convex hull its often useful to visualize the 2D points as a scatter plot.
In this example the `random` function draws a sample of records from the NYC311 (complaints database) collection where
the complaint description matches "rat sighting" and the zip code is 11238. The latitude and longitude fields
are then vectorized and plotted as a scatter plot with longitude on x-axis and latitude on the
y-axis.
image::images/math-expressions/convex0.png[]
Notice from the scatter plot that many of the points appear to lie near the border of the plot.
==== Convex Hull Plot
The `convexHull` function can be used to visualize the border. The example uses the same points
drawn from the NYC311 database. But instead of plotting the points directly the latitude and
longitude points are added as rows to a matrix. The matrix is then transposed with `transpose`
function so that each row of the matrix contains a single latitude and longitude point.
The `convexHull` function is then used calculate the convex hull for the matrix of points.
The convex hull is set a variable called `hull`.
Once the convex hull has been created the `getVertices` function can be used to
retrieve the matrix of points in the scatter plot that comprise the convex border around the scatter plot.
The `colAt` function can then be used to retrieve the latitude and longitude vectors from the matrix
so they can visualized by the `zplot` function. In the example below the convex hull points are
visualized as a scatter plot.
image::images/math-expressions/hullplot.png[]
Notice that the 15 points in the scatter plot describe that latitude and longitude points of the
convex hull.
==== Projecting and Clustering
The once a convex hull as been calculated the `projectToBorder` can then be used to project
points to the nearest point on the border. In the example below the `projectToBorder` function
is used to project the original scatter scatter plot points to the nearest border.
The `projectToBorder` function returns a matrix of lat/lon points for the border projections. In
the example the matrix of border points is then clustered into 7 clusters using kmeans clustering.
The `zplot` function is then used to plot the clustered border points.
image::images/math-expressions/convex1.png[]
Notice in the visualization its easy to see which spots along the border have the highest
density of points. In the case or the rat sightings this information is useful in understanding
which border points are closest for the rats to enter or exit from.
==== Plotting the Centroids
Once the border points have been clustered its very easy to extract the centroids of the clusters
and plot them on a map. The example below extracts the centroids from the clusters using the
`getCentroids` function. `getCentroids` returns the matrix of lat/lon points which represent
the centroids of border clusters. The `colAt` function can then be used to extract the lat/lon
vectors so they can be plotted on a map using `zplot`.
image::images/math-expressions/convex2.png[]
The map above shows the centroids of the border clusters. The centroids from the highest
density clusters can now be zoomed and investigated geo-spatially to determine what might be
the best places to begin an investigation of the border.
== Enclosing Disk
The `enclosingDisk` function finds the smallest enclosing circle the encloses a 2D data set.
Once an enclosing disk has been calculated, a set of math expression functions
can be applied to geometrically describe the enclosing disk.
In the example below an enclosing disk is calculated for a randomly generated set of 1000 2D observations.
Then the following functions are called on the enclosing disk:
* `getCenter`: Returns the 2D point that is the center of the disk.
* `getRadius`: Returns the radius of the disk.
* `getSupportPoints`: Returns the support points of the disk.
[source,text]
----
let(echo="center, radius, support",
x=sample(normalDistribution(0, 20), 1000),
y=sample(normalDistribution(0, 20), 1000),
observations=transpose(matrix(x,y)),
disk=enclosingDisk(observations),
center=getCenter(disk),
radius=getRadius(disk),
support=getSupportPoints(disk))
----
When this expression is sent to the `/stream` handler it responds with:
[source,json]
----
{
"result-set": {
"docs": [
{
"center": [
-6.668825009733749,
-2.9825450908240025
],
"radius": 72.66109546907208,
"support": [
[
20.350992271739464,
64.46791279377014
],
[
33.02079953093981,
57.880978456420365
],
[
-44.7273247899923,
-64.87911518353323
]
]
},
{
"EOF": true,
"RESPONSE_TIME": 8
}
]
}
}
----