
 = Linear Regression
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 The math expressions library supports simple and multivariate linear regression.

 <<Simple Linear Regression, Simple Linear Regression>> -
 <<Regression Plot, Regression Plot>> -
 <<Residual Plot, Residual Plot>> -
 <<Multivariate Linear Regression, Multivariate Linear Regression>>

 == Simple Linear Regression

 The `regress` function is used to build a linear regression model
 between two random variables. Sample observations are provided with two
 numeric arrays. The first numeric array is the independent variable and
 the second array is the dependent variable.

 In the example below the `random` function selects 50000 random samples each containing
 the fields `filesize_d` and `response_d`. The two fields are vectorized
 and stored in variables *`x`* and *`y`*. Then the `regress` function performs a regression
 analysis on the two numeric arrays.

 The `regress` function returns a single tuple with the results of the regression
 analysis.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="50000", fl="filesize_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, response_d),
     r=regress(x, y))
 ----

 Note that in this regression analysis the value of `RSquared` is approximately `.82`. This means that changes in
 `filesize_d` explain about 82% of the variability of the `response_d` variable:

 [source,json]
 ----
 {
   "result-set": {
     "docs": [
       {
         "significance": 0,
         "totalSumSquares": 96595678.64838874,
         "R": 0.9052835767815126,
         "RSquared": 0.8195383543903288,
         "meanSquareError": 348.6502485633668,
         "intercept": 55.64040842391729,
         "slopeConfidenceInterval": 0.0000822026526346821,
         "regressionSumSquares": 79163863.52071753,
         "slope": 0.019984612363694493,
         "interceptStdErr": 1.6792610845256566,
         "N": 50000
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 344
       }
     ]
   }
 }
 ----

 The diagnostics can be visualized in a table using Zeppelin-Solr.

 image::images/math-expressions/diagnostics.png[]
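
Several fields in the regression tuple are tied together by the standard least-squares formulas. The Python sketch below is not Solr's implementation. It is a minimal illustration, assuming ordinary least squares on two equal-length lists, of how `slope`, `intercept`, `R` and `RSquared` can be derived. The function name `simple_regress` and the four-point sample are hypothetical. The last two lines apply the same identities to the values reported in the response above.

```python
import math

def simple_regress(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (illustrative only)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)              # Pearson correlation
    return {
        "slope": slope,
        "intercept": intercept,
        "R": r,
        "RSquared": r * r,
        "totalSumSquares": syy,
        "regressionSumSquares": slope * sxy,    # sum of squares explained by the fit
        "N": n,
    }

# Hypothetical sample: y is exactly 2x + 1, so the fit is perfect.
model = simple_regress([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])

# The same identities applied to the values in the response above.
doc_r_squared = 0.9052835767815126 ** 2             # R squared
doc_ratio = 79163863.52071753 / 96595678.64838874   # regressionSumSquares / totalSumSquares
```

Both `doc_r_squared` and `doc_ratio` agree with the reported `RSquared` value, which is why `RSquared` can be read as the fraction of the total variability explained by the model.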


 === Prediction

 The `predict` function uses the regression model to make predictions.
 Using the example above the regression model can be used to predict the value
 of `response_d` given a value for `filesize_d`.

 In the example below the `predict` function uses the regression analysis to predict
 the value of `response_d` for the `filesize_d` value of `40000`.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, response_d),
     r=regress(x, y),
     p=predict(r, 40000))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,json]
 ----
 {
   "result-set": {
     "docs": [
       {
         "p": 748.079241022975
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 95
       }
     ]
   }
 }
 ----

 The `predict` function can also make predictions for an array of values. In this
 case it returns an array of predictions.

 In the example below the `predict` function uses the regression analysis to
 predict values for each of the 5000 samples of `filesize_d` used to generate the model.
 In this case 5000 predictions are returned.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, response_d),
     r=regress(x, y),
     p=predict(r, x))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,text]
 ----
 {
   "result-set": {
     "docs": [
       {
         "p": [
           742.2525322514165,
           709.6972488729955,
           687.8382568904871,
           820.2511324266264,
           720.4006432289061,
           761.1578181053039,
           759.1304101159126,
           699.5597256337142,
           742.4738911248204,
           769.0342605881644,
           746.6740473150268,
           ...
           ]
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 113
       }
     ]
   }
 }
 ----
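
For a simple linear model a prediction is the intercept plus the slope times the input value. The Python sketch below illustrates both forms described above, a single value and an array of values. It is an illustration only, not Solr's code, and the coefficients are hypothetical rather than taken from the responses above.

```python
def predict(model, values):
    """Apply y = intercept + slope * x to a number or to a list of numbers."""
    def one(x):
        return model["intercept"] + model["slope"] * x
    if isinstance(values, (list, tuple)):
        return [one(v) for v in values]
    return one(values)

# Hypothetical coefficients, chosen only to keep the arithmetic easy to follow.
model = {"intercept": 1.0, "slope": 2.0}

single = predict(model, 10.0)            # one prediction
many = predict(model, [0.0, 1.0, 2.5])   # one prediction per input value
```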

 === Regression Plot

 Using *zplot* and the Zeppelin-Solr interpreter we can visualize both the observations and the predictions in
 the same scatter plot. In the example below zplot is plotting the `filesize_d` observations on the
 *x* axis, the `response_d` observations on the *y* axis and the predictions on the *y1* axis.

 image::images/math-expressions/linear.png[]

 === Residuals

 The difference between the observed value and the predicted value is known as the
 residual. There isn't a specific function to calculate the residuals but vector
 math can be used to perform the calculation.

 In the example below the predictions are stored in variable *`p`*. The `ebeSubtract`
 function is then used to subtract the predictions
 from the actual `response_d` values stored in variable *`y`*. Variable *`e`* contains
 the array of residuals.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="500", fl="filesize_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, response_d),
     r=regress(x, y),
     p=predict(r, x),
     e=ebeSubtract(y, p))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,text]
 ----
 {
   "result-set": {
     "docs": [
       {
         "e": [
           31.30678554491226,
           -30.292830927953446,
           -30.49508862647258,
           -30.499884780783532,
           -9.696458959319784,
           -30.521563961535094,
           -30.28380938033081,
           -9.890289849359306,
           30.819723560583157,
           -30.213178859683012,
           -30.609943619066826,
           10.527700442607625,
           10.68046928406568,
           ...
           ]
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 113
       }
     ]
   }
 }
 ----
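
The residual calculation is an element-by-element subtraction of the predictions from the observations. A minimal Python sketch of that operation follows. The helper name `ebe_subtract` and the sample arrays are hypothetical and only mirror what the expression above does with `ebeSubtract(y, p)`.

```python
def ebe_subtract(a, b):
    """Element-by-element a[i] - b[i]; both lists must have the same length."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return [ai - bi for ai, bi in zip(a, b)]

observed = [3.0, 5.5, 6.5, 9.0]    # hypothetical response values (y)
predicted = [3.0, 5.0, 7.0, 9.0]   # hypothetical predictions (p)

# A positive residual means the model under-predicted; a negative one means it over-predicted.
residuals = ebe_subtract(observed, predicted)
```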

 === Residual Plot

 Using *zplot* and Zeppelin-Solr we can visualize the residuals with
 a residual plot. The example residual plot below plots the predicted value on the
 *x* axis and the error of the prediction on the *y* axis.

 image::images/math-expressions/residual-plot.png[]

 The residual plot can be used to interpret the reliability of the model. Three things
 to look for are:

 1) Do the residuals appear to be normally distributed with a mean of 0? This makes
 it easier to interpret the results of the model and to determine if the distribution
 of the errors is acceptable for predictions. It also makes it easier to use a model
 of the residuals for anomaly detection on new predictions.

 2) Do the residuals appear to be *homoscedastic*? That is, is the variance
 of the residuals the same across the range of predictions? By plotting the prediction
 on the *x* axis and the error on the *y* axis we can see if the variability stays the same
 as the predictions get higher. If the residuals are homoscedastic it means
 that we can trust the model's error to be consistent across the range of predictions.

 3) Is there any pattern to the residuals? If so there is likely still a signal within the
 data that needs to be modeled.
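
The first two checks can also be approximated numerically. The Python sketch below is a rough diagnostic, not a formal statistical test and not part of Solr. It reports the mean residual and compares the residual spread in the lower and upper halves of the prediction range. The function name `residual_summary` and the data are hypothetical.

```python
import statistics

def residual_summary(predictions, residuals):
    """Return the mean residual and the residual spread in the lower and upper
    halves of the prediction range (a rough, non-rigorous diagnostic)."""
    pairs = sorted(zip(predictions, residuals))   # order by predicted value
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return {
        "mean": statistics.fmean(residuals),
        "low_stdev": statistics.pstdev(low),
        "high_stdev": statistics.pstdev(high),
    }

# Hypothetical data: the errors grow with the prediction, so the spread is uneven.
predictions = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
residuals = [1.0, -1.0, 1.0, -1.0, 5.0, -5.0, 5.0, -5.0]

summary = residual_summary(predictions, residuals)
```

In this made-up sample the mean residual is 0 but the spread in the upper half is much larger than in the lower half, which is the kind of uneven variance the second check is meant to catch.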


 == Multivariate Linear Regression

 The `olsRegress` function performs a multivariate linear regression analysis. Multivariate linear
 regression models the linear relationship between two or more independent variables and a dependent variable.

 The example below extends the simple linear regression example by introducing a new independent variable
 called `load_d`. The `load_d` variable is the load on the network while the file is being downloaded.

 Notice that the two independent variables `filesize_d` and `load_d` are vectorized and stored
 in the variables *`x`* and *`y`*. The variables *`x`* and *`y`* are then added as rows to a `matrix`. The matrix is
 then transposed so that each row in the matrix represents one observation with `filesize_d` and `load_d`.
 The `olsRegress` function then performs the multivariate regression analysis using the observation matrix as the
 independent variables and the `response_d` values, stored in variable *`z`*, as the dependent variable.

 [source,text]
 ----
 let(a=random(testapp, q="*:*", rows="30000", fl="filesize_d, load_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, load_d),
     z=col(a, response_d),
     m=transpose(matrix(x, y)),
     r=olsRegress(m, z))
 ----

 Notice in the response that the `RSquared` of the regression analysis is approximately `.885`. This means that the linear
 combination of `filesize_d` and `load_d` explains about 88.5% of the variability of the `response_d` variable:

 [source,json]
 ----
 {
   "result-set": {
     "docs": [
       {
         "regressionParametersStandardErrors": [
           1.7792032752524236,
           0.0000429945089590394,
           0.0008592489428291642
         ],
         "RSquared": 0.8850359458670845,
         "regressionParameters": [
           0.7318766882597804,
           0.01998298784650873,
           0.10982104952105468
         ],
         "regressandVariance": 1938.8190758686717,
         "regressionParametersVariance": [
           [
             0.014201127587649602,
             -3.326633951803927e-7,
             -0.000001732754417954437
           ],
           [
             -3.326633951803927e-7,
             8.292732891338694e-12,
             2.0407522508189773e-12
           ],
           [
             -0.000001732754417954437,
             2.0407522508189773e-12,
             3.3121477630934995e-9
           ]
         ],
         "adjustedRSquared": 0.8850282808303053,
         "residualSumSquares": 6686612.141261716
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 374
       }
     ]
   }
 }
 ----
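
Some of the fields in this tuple can be cross-checked against one another. The Python sketch below assumes the conventional definition of adjusted R-squared for a model with an intercept, and it assumes that `regressionParametersVariance` is reported without the residual variance factor. The second assumption is inferred from the numbers in the response rather than confirmed from the Solr source, so treat it as a hypothesis.

```python
import math

n = 30000          # rows requested in the expression above
k = 2              # independent variables: filesize_d and load_d
r_squared = 0.8850359458670845
residual_ss = 6686612.141261716

# Conventional adjusted R-squared for a model with an intercept.
adjusted = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Residual variance with n - (k + 1) degrees of freedom.
sigma2 = residual_ss / (n - k - 1)

# Diagonal of "regressionParametersVariance" from the response above.
variance_diagonal = [0.014201127587649602, 8.292732891338694e-12, 3.3121477630934995e-9]

# Assumption: each standard error is sqrt(sigma2 * diagonal entry).
standard_errors = [math.sqrt(sigma2 * v) for v in variance_diagonal]
```

Under these assumptions `adjusted` lines up with the reported `adjustedRSquared`, and `standard_errors` lines up with `regressionParametersStandardErrors`.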

 === Prediction

 The `predict` function can also be used to make predictions for multivariate linear regression.

 Below is an example of a single prediction using the multivariate linear regression model and a single observation.
 The observation is an array that matches the structure of the observation matrix used to build the model. In this case
 the first value represents a `filesize_d` of `40000` and the second value represents a `load_d` of `4`.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, load_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, load_d),
     z=col(a, response_d),
     m=transpose(matrix(x, y)),
     r=olsRegress(m, z),
     p=predict(r, array(40000, 4)))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,json]
 ----
 {
   "result-set": {
     "docs": [
       {
         "p": 801.7725344814675
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 70
       }
     ]
   }
 }
 ----

 The `predict` function can also make predictions for more than one multivariate observation. In this scenario
 an observation matrix is used.

 In the example below the observation matrix used to build the multivariate regression model
 is passed to the `predict` function and it returns an array of predictions.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, load_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, load_d),
     z=col(a, response_d),
     m=transpose(matrix(x, y)),
     r=olsRegress(m, z),
     p=predict(r, m))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,text]
 ----
 {
   "result-set": {
     "docs": [
       {
         "p": [
           917.7122088913725,
           900.5418518783401,
           871.7805676516689,
           822.1887964840801,
           828.0842807117554,
           785.1262470470162,
           833.2583851225845,
           802.016811579941,
           841.5253327135974,
           896.9648275225625,
           858.6511235977382,
           869.8381475112501,
           ...
           ]
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 113
       }
     ]
   }
 }
 ----
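
To make the shapes concrete, the Python sketch below fits a small multivariate model by solving the normal equations and then predicts for a single observation and for an observation matrix. It is an illustration under stated assumptions, not Solr's implementation: the helper names `ols_fit` and `ols_predict` are hypothetical, the data is made up, and placing the intercept first in the parameter list is an assumption of this sketch.

```python
def ols_fit(rows, y):
    """Solve the normal equations (X'X) b = X'y with an added intercept column."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]   # [intercept, b1, b2, ...]

def ols_predict(params, observation):
    """Predict for one observation (a list) or for a matrix (a list of lists)."""
    if observation and isinstance(observation[0], (list, tuple)):
        return [ols_predict(params, row) for row in observation]
    return params[0] + sum(b * v for b, v in zip(params[1:], observation))

# Hypothetical data generated from z = 1 + 2*x + 3*y with no noise.
m = [[1.0, 1.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0], [5.0, 5.0]]
z = [1 + 2 * x + 3 * y for x, y in m]

params = ols_fit(m, z)
single = ols_predict(params, [6.0, 2.0])   # one observation, one prediction
many = ols_predict(params, m)              # one prediction per matrix row
```

Each row of `m` plays the role of one row of the transposed observation matrix in the expressions above, so a single observation is a list with the same number of values as a matrix row.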

 === Residuals

 Once the predictions are generated the residuals can be calculated using the same approach used with
 simple linear regression.

 Below is an example of the residuals calculation following a multivariate linear regression. In the example
 the predictions stored in variable *`p`* are subtracted from the observed values stored in variable *`z`*.

 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, load_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, load_d),
     z=col(a, response_d),
     m=transpose(matrix(x, y)),
     r=olsRegress(m, z),
     p=predict(r, m),
     e=ebeSubtract(z, p))
 ----

 When this expression is sent to the `/stream` handler it responds with:

 [source,text]
 ----
 {
   "result-set": {
     "docs": [
       {
         "e": [
           21.452271655340496,
           9.647947283595727,
           -23.02328008866334,
           -13.533046479596806,
           -16.1531952414299,
           4.966514036315402,
           23.70151322413119,
           -4.276176642246014,
           10.781062392156628,
           0.00039750380267378205,
           -1.8307638852961645,
           ...
           ]
       },
       {
         "EOF": true,
         "RESPONSE_TIME": 113
       }
     ]
   }
 }
 ----

 === Residual Plot

 The residual plot for multivariate linear regression is the same as for simple linear regression.
 The predictions are plotted on the *x* axis and the error is plotted on the *y* axis.

 image::images/math-expressions/residual-plot2.png[]

 The residual plot for multivariate linear regression can be interpreted in exactly the same way as for simple linear regression.