:electric_plug: Apache OpenWhisk - Performance Tests

A few simple but efficient test suites for determining the maximum throughput and end-user latency of the Apache OpenWhisk system.

Workflow

A standard OpenWhisk system is deployed. (Note that the API Gateway is currently left out for the tests.)
All limits are set to 999999, which in our current use case means “No throttling at all”.
The deployment is using the docker setup proposed by the OpenWhisk development team: overlay driver and HTTP API enabled via a UNIX port.

The load is driven by the blazingly fast wrk.

Travis Machine Setup

The machine provided by Travis has ~2 CPU cores (likely shared through virtualization) and 7.5GB memory.

Suites

wrk

Latency Test

Determines the end-to-end latency a user experiences when doing a blocking invocation. The action used is a no-op, so the numbers returned are the plain overhead of the OpenWhisk system. The requests are sent directly to the controller.

1 HTTP request at a time (concurrency: 1)
You can specify how long this test will run. The default is 30s.
no-op action

Note: The throughput number has a 100% correlation with the latency in this case. This test does not serve to determine the maximum throughput of the system.
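With a single connection and blocking requests, each request has to finish before the next one starts, so the throughput is the inverse of the mean latency. A minimal sketch of that arithmetic follows; the function name `expected_rps` and the 12.5 ms figure are hypothetical and not measured values.

```shell
# Expected requests per second for a closed loop of blocking requests:
#   throughput = concurrency * 1000 / mean latency in milliseconds
expected_rps() {
  concurrency="$1"
  mean_latency_ms="$2"
  awk -v c="$concurrency" -v l="$mean_latency_ms" \
    'BEGIN { printf "%.1f\n", c * 1000 / l }'
}

# Hypothetical measurement: 1 connection, 12.5 ms mean latency.
expected_rps 1 12.5   # prints 80.0
```

This is why the latency test says nothing about the maximum throughput: the number only restates the latency.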

Throughput Test

Determines the maximum throughput a user can get out of the system while using a single action. The action used is a no-op, so the numbers are plain OpenWhisk overhead. Note that the throughput does not directly correlate to end-to-end latency here, as the system does more processing in the background than it shows to the user in a blocking invocation. The requests are sent directly to the controller.

4 HTTP requests at a time (concurrency: 4) (using CPU cores * 2 to exploit some buffering)
10,000 samples with a single user
no-op action

Running tests against your own system is simple too!

All you have to do is use the corresponding script located in the /*_tests folder. Keep in mind that the parameters are defined inline.

gatling

Simulations

You can specify two thresholds for the simulations, because Gatling is able to handle each assertion as a JUnit test. In CI/CD pipelines (e.g. Jenkins), you can then set a threshold on the number of failed test cases to mark the build as stable, unstable or failed.

ApiV1Simulation

This simulation calls the api/v1. You can specify the endpoint, the number of connections against the backend and the duration of this burst.

The test sends as many requests as possible for the given amount of time (SECONDS). Afterwards it checks whether the test reached the intended throughput (REQUESTS_PER_SEC, MIN_REQUESTS_PER_SEC).

Available environment variables:

OPENWHISK_HOST                (required)
CONNECTIONS                   (required)
SECONDS                       (default: 10)
REQUESTS_PER_SEC              (required)
MIN_REQUESTS_PER_SEC          (default: REQUESTS_PER_SEC)
MAX_ERRORS_ALLOWED            (default: 0)
MAX_ERRORS_ALLOWED_PERCENTAGE (default: 0)
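The two throughput thresholds can be read as two separate assertions. The sketch below shows one possible CI policy built on that reading; the function name `classify_throughput` and the mapping of failed assertions to stable, unstable and failed are assumptions for illustration, not the behaviour of Gatling or Jenkins themselves.

```shell
# Classify a measured throughput against REQUESTS_PER_SEC and MIN_REQUESTS_PER_SEC.
# Assumed policy: stable   = both assertions hold,
#                 unstable = only the minimum holds,
#                 failed   = neither holds.
classify_throughput() {
  measured="$1"
  target="$2"
  minimum="${3:-$target}"   # MIN_REQUESTS_PER_SEC defaults to REQUESTS_PER_SEC
  awk -v m="$measured" -v t="$target" -v mn="$minimum" 'BEGIN {
    if (m >= t)       print "stable"
    else if (m >= mn) print "unstable"
    else              print "failed"
  }'
}

classify_throughput 55 50 40   # prints stable
classify_throughput 45 50 40   # prints unstable
classify_throughput 30 50 40   # prints failed
classify_throughput 45 50      # prints failed, since the minimum defaults to 50
```

Leaving MIN_REQUESTS_PER_SEC unset collapses both thresholds into one, so under this policy a run is either stable or failed.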

You can run the simulation with (in OPENWHISK_HOME)

OPENWHISK_HOST="openwhisk.mydomain.com" CONNECTIONS="10" REQUESTS_PER_SEC="50" ./gradlew gatlingRun-org.apache.openwhisk.ApiV1Simulation

Latency Simulation

This simulation creates actions of the following four kinds: nodejs:default, swift:default, java:default and python:default. After an action is created, it is invoked once. This is the cold start, and it is not part of the thresholds. Next, the action is invoked 100 times, blocking and one after another, with a pause of PAUSE_BETWEEN_INVOKES milliseconds between invocations. The last step is to delete the action.

Once one kind is finished, the next kind is taken; the kinds do not run in parallel. There is never more than one activation in the system, as we only want to measure the latency of warm activations. As all actions are invoked blocking and only one action is in the system, it doesn't matter how many controllers and invokers are deployed. If several controllers or invokers are deployed, all controllers always send the activation to the same invoker.

The thresholds are compared against the mean response times of the warm activations.
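To illustrate what "mean response time of the warm activations" means, the sketch below drops the first sample (the cold start) and averages the rest. The function name `warm_mean` and the sample values are hypothetical; the simulation itself computes this inside Gatling.

```shell
# Mean of the warm response times in milliseconds.
# Arguments: one response time per invocation, cold start first.
# Expects at least one argument; the first one is skipped.
warm_mean() {
  shift
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# Hypothetical samples: an 850 ms cold start followed by four warm invocations.
warm_mean 850 18 22 20 20   # prints 20.0
```

Including the cold start would have pushed the mean in this example to 186 ms, which is why it is kept out of the thresholds.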

Available environment variables:

OPENWHISK_HOST                (required)
API_KEY                       (required, format: UUID:KEY)
PAUSE_BETWEEN_INVOKES         (default: 0)
MEAN_RESPONSE_TIME            (required)
MAX_MEAN_RESPONSE_TIME        (default: MEAN_RESPONSE_TIME)
EXCLUDED_KINDS                (default: "", format: "python:default,java:default,swift:default")
MAX_ERRORS_ALLOWED            (default: 0)
MAX_ERRORS_ALLOWED_PERCENTAGE (default: 0)
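As a rough illustration of EXCLUDED_KINDS, the sketch below lists the kinds that would still be tested for a given value. The function name `kinds_to_run` is hypothetical, and the exact matching done by the simulation is an assumption; only the four kinds and the comma-separated format come from this document.

```shell
# Print the kinds that remain after removing the comma-separated excluded ones.
kinds_to_run() {
  excluded="$1"
  for kind in nodejs:default swift:default java:default python:default; do
    case ",$excluded," in
      *",$kind,"*) ;;                      # excluded, skip it
      *) printf '%s\n' "$kind" ;;
    esac
  done
}

# Prints nodejs:default and swift:default, one per line.
kinds_to_run "python:default,java:default"
```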

It is possible to override the MEAN_RESPONSE_TIME, MAX_MEAN_RESPONSE_TIME, MAX_ERRORS_ALLOWED and MAX_ERRORS_ALLOWED_PERCENTAGE for each kind by adding the kind as prefix in upper case, like JAVA_MEAN_RESPONSE_TIME.
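The override rule can be sketched as a lookup that prefers the kind-specific variable and falls back to the global one. The function name `threshold_for` is hypothetical, and deriving the prefix from the part of the kind before the colon is an assumption based on the JAVA_MEAN_RESPONSE_TIME example.

```shell
# Resolve a threshold for one kind: try <KIND>_<NAME> first, then <NAME>.
threshold_for() {
  kind="$1"   # e.g. java:default
  name="$2"   # e.g. MEAN_RESPONSE_TIME
  # Assumed prefix: the part before the colon, in upper case (java -> JAVA).
  prefix=$(printf '%s' "${kind%%:*}" | tr '[:lower:]' '[:upper:]')
  eval "value=\${${prefix}_${name}:-}"
  if [ -z "$value" ]; then
    eval "value=\${${name}:-}"
  fi
  printf '%s\n' "$value"
}

# Hypothetical settings: a global threshold and a looser one for Java.
MEAN_RESPONSE_TIME=20
JAVA_MEAN_RESPONSE_TIME=35

threshold_for java:default MEAN_RESPONSE_TIME     # prints 35
threshold_for python:default MEAN_RESPONSE_TIME   # prints 20
```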

You can run the simulation with (in OPENWHISK_HOME)

OPENWHISK_HOST="openwhisk.mydomain.com" MEAN_RESPONSE_TIME="20" API_KEY="UUID:KEY" ./gradlew gatlingRun-org.apache.openwhisk.LatencySimulation

BlockingInvokeOneActionSimulation

This simulation executes the same action with the same user over and over again. The aim of this test is to measure the throughput of the system when all containers are always warm.

The invoked action writes one log line and returns a small JSON result.

The simulation creates the action at the beginning, invokes it as often as possible for 5 seconds to warm up all containers, and afterwards invokes it for the given amount of time. The warm-up phase is not part of the assertions.

To run the test, you can specify the number of concurrent requests. Keep in mind that the actions are invoked blocking and the system is limited to AMOUNT_OF_INVOKERS * SLOTS_PER_INVOKER * NON_BLACKBOX_INVOKER_RATIO concurrent actions/requests.
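The concurrency limit is a plain product, so it is easy to compute before picking CONNECTIONS. The sketch below rounds the result down; the function name `max_concurrent_actions` and the deployment numbers are hypothetical.

```shell
# Upper bound on concurrent blocking invocations:
#   AMOUNT_OF_INVOKERS * SLOTS_PER_INVOKER * NON_BLACKBOX_INVOKER_RATIO, rounded down
max_concurrent_actions() {
  awk -v i="$1" -v s="$2" -v r="$3" 'BEGIN { printf "%d\n", int(i * s * r) }'
}

# Hypothetical deployment: 2 invokers, 16 slots each, ratio 0.9.
max_concurrent_actions 2 16 0.9   # prints 28
```

Setting CONNECTIONS above this bound would presumably make requests queue rather than run in parallel, which would show up as higher latency and not as higher throughput.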

Available environment variables:

OPENWHISK_HOST                (required)
API_KEY                       (required, format: UUID:KEY)
CONNECTIONS                   (required)
SECONDS                       (default: 10)
REQUESTS_PER_SEC              (required)
MIN_REQUESTS_PER_SEC          (default: REQUESTS_PER_SEC)
MAX_ERRORS_ALLOWED            (default: 0)
MAX_ERRORS_ALLOWED_PERCENTAGE (default: 0)

You can run the simulation with

OPENWHISK_HOST="openwhisk.mydomain.com" CONNECTIONS="10" REQUESTS_PER_SEC="50" API_KEY="UUID:KEY" ./gradlew gatlingRun-org.apache.openwhisk.BlockingInvokeOneActionSimulation

ColdBlockingInvokeSimulation

This simulation makes as many cold invocations as possible. Therefore, you have to specify how many users should be used. These users execute actions in parallel. I recommend using as many users as there are Node.js action slots in your invokers.

The users are loaded from the file gatling_tests/src/gatling/resources/data/users.csv. If you want to increase the number of parallel users, that file has to contain at least that many valid users.

Each user creates n actions (default is 5). Afterwards all users execute their actions in parallel, but each user rotates through its actions. That's how the cold starts are enforced.
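The rotation can be pictured as each user cycling through its own actions. The sketch below prints the index of the action a user would invoke on each request; the function name `rotation` is hypothetical, and strict round-robin order is an assumption, since the description only says that the actions are rotated.

```shell
# Print the action index used for each of the first `count` requests of one user,
# cycling through `actions_per_user` actions in round-robin order.
rotation() {
  actions_per_user="$1"
  count="$2"
  i=0
  sequence=""
  while [ "$i" -lt "$count" ]; do
    sequence="$sequence$((i % actions_per_user)) "
    i=$((i + 1))
  done
  printf '%s\n' "${sequence% }"
}

# With the default of 5 actions, a user never invokes the same action twice in a row.
rotation 5 7   # prints 0 1 2 3 4 0 1
```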

The aim of the test is to measure the throughput of the system when all containers are always cold.

The invoked action writes one log line and returns a small JSON result.

Available environment variables:

OPENWHISK_HOST                (required)
USERS                         (required)
SECONDS                       (default: 10)
REQUESTS_PER_SEC              (required)
MIN_REQUESTS_PER_SEC          (default: REQUESTS_PER_SEC)
MAX_ERRORS_ALLOWED            (default: 0)
MAX_ERRORS_ALLOWED_PERCENTAGE (default: 0)

You can run the simulation with

OPENWHISK_HOST="openwhisk.mydomain.com" USERS="10" REQUESTS_PER_SEC="50" ./gradlew gatlingRun-org.apache.openwhisk.ColdBlockingInvokeSimulation