| = Monte Carlo Simulations |
| // Licensed to the Apache Software Foundation (ASF) under one |
| // or more contributor license agreements. See the NOTICE file |
| // distributed with this work for additional information |
| // regarding copyright ownership. The ASF licenses this file |
| // to you under the Apache License, Version 2.0 (the |
| // "License"); you may not use this file except in compliance |
| // with the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, |
| // software distributed under the License is distributed on an |
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| // KIND, either express or implied. See the License for the |
| // specific language governing permissions and limitations |
| // under the License. |
| |
| |
| Monte Carlo simulations are commonly used to model the behavior of |
| stochastic systems. This section describes |
| how to perform both uncorrelated and correlated Monte Carlo simulations |
| using the sampling capabilities of the probability distribution framework. |
| |
| == Uncorrelated Simulations |
| |
| Uncorrelated Monte Carlo simulations model stochastic systems with the assumption |
| that the underlying random variables move independently of each other. |
| A simple example of a Monte Carlo simulation using two independently changing random variables |
| is described below. |
| |
| In this example a Monte Carlo simulation is used to determine the probability that a simple hinge assembly will |
| fall within a required length specification. |
| |
| The hinge has two components A and B. The combined length of the two components must be less then 5 centimeters |
| to fall within specification. |
| |
| A random sampling of lengths for component A has shown that its length conforms to a |
| normal distribution with a mean of 2.2 centimeters and a standard deviation of .0195 |
| centimeters. |
| |
| A random sampling of lengths for component B has shown that its length conforms |
| to a normal distribution with a mean of 2.71 centimeters and a standard deviation of .0198 centimeters. |
| |
| [source,text] |
| ---- |
| let(componentA=normalDistribution(2.2, .0195), <1> |
| componentB=normalDistribution(2.71, .0198), <2> |
| simresults=monteCarlo(sampleA=sample(componentA), <3> |
| sampleB=sample(componentB), |
| add(sampleA, sampleB), <4> |
| 100000), <5> |
| simmodel=empiricalDistribution(simresults), <6> |
| prob=cumulativeProbability(simmodel, 5)) <7> |
| ---- |
| |
| The Monte Carlo simulation below performs the following steps: |
| |
| <1> A normal distribution with a mean of 2.2 and a standard deviation of .0195 is created to model the length of `componentA`. |
| <2> A normal distribution with a mean of 2.71 and a standard deviation of .0198 is created to model the length of `componentB`. |
| <3> The `monteCarlo` function samples from the `componentA` and `componentB` distributions and sets the values to variables `sampleA` and `sampleB`. |
| <4> It then calls the `add(sampleA, sampleB)`* function to find the combined lengths of the samples. |
| <5> The `monteCarlo` function runs a set number of times, 100000, and collects the results in an array. Each |
| time the function is called new samples are drawn from the `componentA` |
| and `componentB` distributions. On each run, the `add` function adds the two samples to calculate the combined length. |
| The result of each run is collected in an array and assigned to the `simresults` variable. |
| <6> An `empiricalDistribution` function is then created from the `simresults` array to model the distribution of the |
| simulation results. |
| <7> Finally, the `cumulativeProbability` function is called on the `simmodel` to determine the cumulative probability |
| that the combined length of the components is 5 or less. |
| |
| Based on the simulation there is .9994371944629039 probability that the combined length of a component pair will |
| be 5 or less: |
| |
| [source,json] |
| ---- |
| { |
| "result-set": { |
| "docs": [ |
| { |
| "prob": 0.9994371944629039 |
| }, |
| { |
| "EOF": true, |
| "RESPONSE_TIME": 660 |
| } |
| ] |
| } |
| } |
| ---- |
| |
| == Correlated Simulations |
| |
| The simulation above assumes that the lengths of `componentA` and `componentB` vary independently. |
| What would happen to the probability model if there was a correlation between the lengths of |
| `componentA` and `componentB`? |
| |
| In the example below a database containing assembled pairs of components is used to determine |
| if there is a correlation between the lengths of the components, and how the correlation effects the model. |
| |
| Before performing a simulation of the effects of correlation on the probability model its |
| useful to understand what the correlation is between the lengths of `componentA` and `componentB`. |
| |
| [source,text] |
| ---- |
| let(a=random(collection5, q="*:*", rows="5000", fl="componentA_d, componentB_d"), <1> |
| b=col(a, componentA_d)), <2> |
| c=col(a, componentB_d)), |
| d=corr(b, c)) <3> |
| ---- |
| |
| <1> In the example, 5000 random samples are selected from a collection of assembled hinges. |
| Each sample contains lengths of the components in the fields `componentA_d` and `componentB_d`. |
| <2> Both fields are then vectorized. The *componentA_d* vector is stored in |
| variable *`b`* and the *componentB_d* variable is stored in variable *`c`*. |
| <3> Then the correlation of the two vectors is calculated using the `corr` function. |
| |
| Note from the result that the outcome from `corr` is 0.9996931313216989. |
| This means that `componentA_d` and *`componentB_d` are almost perfectly correlated. |
| |
| [source,json] |
| ---- |
| { |
| "result-set": { |
| "docs": [ |
| { |
| "d": 0.9996931313216989 |
| }, |
| { |
| "EOF": true, |
| "RESPONSE_TIME": 309 |
| } |
| ] |
| } |
| } |
| ---- |
| |
| === Correlation Effects on the Probability Model |
| |
| The example below explores how to use a multivariate normal distribution function |
| to model how correlation effects the probability of hinge defects. |
| |
| In this example 5000 random samples are selected from a collection |
| containing length data for assembled hinges. Each sample contains |
| the fields `componentA_d` and `componentB_d`. |
| |
| Both fields are then vectorized. The `componentA_d` vector is stored in |
| variable *`b`* and the `componentB_d` variable is stored in variable *`c`*. |
| |
| An array is created that contains the means of the two vectorized fields. |
| |
| Then both vectors are added to a matrix which is transposed. This creates |
| an observation matrix where each row contains one observation of |
| `componentA_d` and `componentB_d`. A covariance matrix is then created from the columns of |
| the observation matrix with the |
| `cov` function. The covariance matrix describes the covariance between `componentA_d` and `componentB_d`. |
| |
| The `multivariateNormalDistribution` function is then called with the |
| array of means for the two fields and the covariance matrix. The model |
| for the multivariate normal distribution is stored in variable *`g`*. |
| |
| The `monteCarlo` function then calls the function `add(sample(g))` 50000 times |
| and collections the results in a vector. Each time the function is called a single sample |
| is drawn from the multivariate normal distribution. Each sample is a vector containing |
| one `componentA` and `componentB` pair. The `add` function adds the values in the vector to |
| calculate the length of the pair. Over the long term the samples drawn from the |
| multivariate normal distribution will conform to the covariance matrix used to construct it. |
| |
| Just as in the non-correlated example an empirical distribution is used to model probabilities |
| of the simulation vector and the `cumulativeProbability` function is used to compute the cumulative |
| probability that the combined component length will be 5 centimeters or less. |
| |
| Notice that the probability of a hinge meeting specification has dropped to 0.9889517439980468. |
| This is because the strong correlation |
| between the lengths of components means that their lengths rise together causing more hinges to |
| fall out of the 5 centimeter specification. |
| |
| [source,text] |
| ---- |
| let(a=random(hinges, q="*:*", rows="5000", fl="componentA_d, componentB_d"), |
| b=col(a, componentA_d), |
| c=col(a, componentB_d), |
| cor=corr(b,c), |
| d=array(mean(b), mean(c)), |
| e=transpose(matrix(b, c)), |
| f=cov(e), |
| g=multiVariateNormalDistribution(d, f), |
| h=monteCarlo(add(sample(g)), 50000), |
| i=empiricalDistribution(h), |
| j=cumulativeProbability(i, 5)) |
| ---- |
| |
| When this expression is sent to the `/stream` handler it responds with: |
| |
| [source,json] |
| ---- |
| { |
| "result-set": { |
| "docs": [ |
| { |
| "j": 0.9889517439980468 |
| }, |
| { |
| "EOF": true, |
| "RESPONSE_TIME": 599 |
| } |
| ] |
| } |
| } |
| ---- |