Improving scale-up documentation for OpenWhisk deployment on Kubernetes (#552)

commit: 606e97a80074cd0160846a8be1cb0bf3c2d7b974 [log] [tgz]
author: Ali Raza Tariq <17100275@lums.edu.pk> Thu Dec 05 05:35:58 2019 -0800
committer: David Grove <dgrove-oss@users.noreply.github.com> Thu Dec 05 08:35:58 2019 -0500
tree: 3a49e3fdcc960fd0bb2c3d7353284eca058bfa8a
parent: bb2ed4281c6bbcbc514052d2d6a3cf6ec9dfed97 [diff]
diff --git a/README.md b/README.md
index 532f617..9ecfc82 100644
--- a/README.md
+++ b/README.md

@@ -306,6 +306,15 @@
 If your deployment is not working, check our
 [troubleshooting guide](./docs/troubleshooting.md) for ideas.
 
+## Scale-up your OpenWhisk Deployment
+
+With the default settings, your deployment is configured to provide a bare-minimum working platform for testing and exploration. For specialized workloads, you can scale up your OpenWhisk deployment by defining your deployment configuration in your `mycluster.yaml`, which overrides the defaults in `helm/openwhisk/values.yaml`. Some important parameters to consider (for other parameters, check `helm/openwhisk/values.yaml` and [configurationChoices](./docs/configurationChoices.md)):
+* `actionsInvokesPerminute`: limits the maximum number of invocations per minute.
+* `actionsInvokesConcurrent`: limits the maximum number of concurrent invocations.
+* `containerPool.userMemory`: total memory available per `invoker` instance. The `invoker` uses this memory to create containers for user actions. The concurrency limit (the number of actions running in parallel) depends on the total memory configured for `containerPool` and the memory allocated per action (default: 256 MB per container).
+
+For more information about increasing the concurrency limit, check [scaling up your deployment](./docs/k8s-custom-build-cluster-scaleup.md).
+
 # Administering OpenWhisk
 
 [Wskadmin](https://github.com/apache/openwhisk/tree/master/tools/admin) is the tool to perform various administrative operations against an OpenWhisk deployment.

diff --git a/docs/k8s-custom-build-cluster-scaleup.md b/docs/k8s-custom-build-cluster-scaleup.md
new file mode 100644
index 0000000..e290371
--- /dev/null
+++ b/docs/k8s-custom-build-cluster-scaleup.md

@@ -0,0 +1,73 @@
+<!--
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+-->
+
+# Scaling Up an OpenWhisk Deployment on a Custom-Built Kubernetes Cluster
+
+## Overview
+
+The default configuration of an OpenWhisk deployment supports only a low concurrency limit, which is suitable only for testing purposes. This document outlines how to increase this concurrency limit in order to scale up an OpenWhisk deployment for more practical use on a custom-built Kubernetes cluster. It also provides information about some issues one might encounter while scaling up.
+
+## Scale-up
+
+### Small Scale
+
+By default, an OpenWhisk deployment is configured to provide a bare-minimum working platform for testing and exploration. For specialized workloads, you can scale up your OpenWhisk deployment by defining your deployment configuration in your `mycluster.yaml`, which overrides the defaults in `helm/openwhisk/values.yaml`. Some important parameters to consider (for other parameters, check `helm/openwhisk/values.yaml` and [configurationChoices](./configurationChoices.md)):
+* `actionsInvokesPerminute`: limits the maximum number of invocations per minute.
+* `actionsInvokesConcurrent`: limits the maximum number of concurrent invocations.
+* `containerPool.userMemory`: total memory available per `invoker` instance. The `invoker` uses this memory to create containers for user actions. The concurrency limit (the number of actions running in parallel) depends on the total memory configured for `containerPool` and the memory allocated per action (default: 256 MB per container).
+* `triggersFiresPerminute`: limits the maximum number of triggers fired per minute.
+
+By modifying the parameters listed above, one can easily increase the concurrency limit (default: 8) to `100` or `200` without affecting runtime performance (results may vary depending on the functions being run). To increase the concurrency limit further, see `Large Scale` below.
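+
+As a rough illustration, a `mycluster.yaml` fragment targeting a concurrency limit of about 100 might look like the sketch below. The nesting under `whisk:` is an assumption based on the layout of `helm/openwhisk/values.yaml`, and all numbers are illustrative rather than recommended; the `userMemory` value simply multiplies the target concurrency by the default 256 MB per container.
+
+```yaml
+# Sketch only: key nesting assumed from helm/openwhisk/values.yaml; values are illustrative.
+whisk:
+  limits:
+    actionsInvokesPerminute: 6000    # hypothetical per-minute invocation cap
+    actionsInvokesConcurrent: 100    # target number of concurrent invocations
+    triggersFiresPerminute: 6000     # hypothetical per-minute trigger cap
+  containerPool:
+    # 100 containers x 256 MB per container = 25600 MB per invoker instance
+    userMemory: "25600m"
+```
+
+The nodes that host the invoker must actually have this much memory available; otherwise the configured limit cannot be reached in practice.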
+
+### Large Scale
+
+To scale up beyond `Small Scale`, one needs to adjust the following additional settings appropriately (on top of those mentioned above):
+* `invoker:jvmHeapMB`: JVM heap memory available to each invoker instance. This may or may not need to be increased, depending on the functions being run. For more information, see `Troubleshooting` below.
+* `invoker:containerFactory:_:replicaCount`: number of invoker instances used to handle the incoming workload. By default, there is only one invoker instance, which can become overwhelmed if the workload goes beyond a certain threshold.
+* `controller:replicaCount`: number of controller instances used to handle the incoming workload. The same consideration applies as for invoker instances.
+* `invoker:options`: Log processing at the invoker can become a bottleneck for the KubernetesContainerFactory. One might try disabling invoker log processing by setting this option to `-Dwhisk.spi.LogStoreProvider=org.apache.openwhisk.core.containerpool.logging.LogDriverLogStoreProvider`. In general, one needs to offload log processing from the invoker to a node-level log store provider when trying to push a large load through the system.
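+
+The settings above could be combined in `mycluster.yaml` roughly as in the following sketch. It assumes that the `_` placeholder in `invoker:containerFactory:_:replicaCount` stands for the container factory implementation in use (`kubernetes` here) and that the keys nest as they do in `helm/openwhisk/values.yaml`; the replica counts and heap size are illustrative values, not tuned recommendations.
+
+```yaml
+# Sketch only: key nesting assumed from helm/openwhisk/values.yaml; values are illustrative.
+invoker:
+  jvmHeapMB: "1024"                  # raise only if the invoker logs show heap exhaustion
+  # Disable invoker-side log processing, as described above.
+  options: "-Dwhisk.spi.LogStoreProvider=org.apache.openwhisk.core.containerpool.logging.LogDriverLogStoreProvider"
+  containerFactory:
+    impl: "kubernetes"
+    kubernetes:
+      replicaCount: 4                # hypothetical number of invoker instances
+controller:
+  replicaCount: 2                    # hypothetical number of controller instances
+```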
+
+## Troubleshooting
+
+### Client-side
+
+On the client side, the most frequently received error is:
+```
+"error": "The server is currently unavailable (because it is overloaded or down for maintenance)."
+```
+This error occurs when the controller is unable to find any healthy invoker instance to serve the incoming requests. To resolve this issue, one needs to debug the `Deployment-side` to figure out why the invoker instances are unhealthy.
+
+### Deployment-side
+
+For debugging, one needs to identify the `invoker` and `controller` pods and check their logs for further details. A few known errors:
+```
+class io.fabric8.kubernetes.client.KubernetesClientTimeoutException - Timed out waiting for [0] milliseconds for [Pod] with name
+```
+The above error occurs when one has configured a `containerPool` large enough to match the incoming workload, but has not scaled up the invoker instance(s) to keep up with the serving rate.
+
+```
+java.lang.OutOfMemoryError: Java heap space
+```
+The above error occurs when the configured `invoker:jvmHeapMB` memory is insufficient for the workload the invoker faces.
+
+#### Error: only a single invoker instance is used to handle the entire workload
+
+OpenWhisk treats [blackbox (docker) actions](https://github.com/apache/openwhisk/blob/master/docs/actions-docker.md) differently from regular actions. By default, the OpenWhisk load balancer is configured to use only `10%` of the invoker instances (only 1 invoker instance if there are fewer than 10 in total) for `blackbox` actions. This behavior can be configured by modifying `whisk.loadbalancer.blackbox-fraction` in `helm/openwhisk/values.yaml`.
+
+