This module contains benchmarks used to test the performance of the RunInference transform running inference with common models and frameworks. Each benchmark is explained in detail below. Beam's performance over time can be viewed at http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1
The Pytorch RunInference Image Classification 50K benchmark runs an example image classification pipeline using several different ResNet image classification models (the benchmarks on Beam's dashboard display resnet101 and resnet152) against 50,000 example images from the OpenImage dataset. The benchmarks produce metrics that are published to InfluxDB and BigQuery.
The following pipeline configurations are benchmarked:

- Pytorch Image Classification with Resnet 101
- Pytorch Image Classification with Resnet 152
- Pytorch Imagenet Classification with Resnet 152 on a Tesla T4 GPU
Approximate sizes of the models used in the tests:
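For a concrete picture of the shape of such a pipeline, below is a minimal sketch built on RunInference and Beam's PytorchModelHandlerTensor. It is not the benchmark's exact code: the GCS paths are placeholders, and the preprocessing is a standard ImageNet-style transform assumed for illustration.

```python
import io

import apache_beam as beam
import torch
from apache_beam.io.filesystems import FileSystems
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from PIL import Image
from torchvision import models, transforms


def preprocess_image(image_path: str) -> torch.Tensor:
  """Reads one image and applies standard ImageNet preprocessing."""
  transform = transforms.Compose([
      transforms.Resize(256),
      transforms.CenterCrop(224),
      transforms.ToTensor(),
      transforms.Normalize(
          mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])
  with FileSystems.open(image_path) as f:
    image = Image.open(io.BytesIO(f.read())).convert('RGB')
  return transform(image)


# Placeholder path: the weights are a torch state dict saved ahead of time.
model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://my-bucket/resnet101.pth',
    model_class=models.resnet101,
    model_params={'num_classes': 1000})

with beam.Pipeline() as p:
  _ = (
      p
      # Placeholder file listing one image path per line.
      | 'ReadImagePaths' >> beam.io.ReadFromText('gs://my-bucket/images.txt')
      | 'Preprocess' >> beam.Map(preprocess_image)
      | 'RunInference' >> RunInference(model_handler)
      # Each PredictionResult holds the input example and the model output.
      | 'ToPredictedClass' >> beam.Map(
          lambda result: int(torch.argmax(result.inference))))
```

RunInference handles batching the input tensors and loading the model on the workers, which is the behavior these benchmarks exercise at scale.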
The Pytorch RunInference Language Modeling benchmark runs an example language modeling pipeline using the Bert large uncased and Bert base uncased models against a dataset of 50,000 manually generated sentences. The benchmarks produce metrics that are published to InfluxDB and BigQuery.
The following pipeline configurations are benchmarked:

- Pytorch Language Modeling using the Hugging Face bert-base-uncased model
- Pytorch Language Modeling using the Hugging Face bert-large-uncased model
Approximate sizes of the models used in the tests:
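As with the image classification benchmark, the shape of this pipeline can be sketched with RunInference, here using PytorchModelHandlerKeyedTensor because BERT consumes a dictionary of tensors. The state dict path and the sample sentence are illustrative placeholders, not the benchmark's actual configuration.

```python
import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerKeyedTensor
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


def tokenize(sentence: str) -> dict:
  """Tokenizes one sentence into a dict of tensors for keyed inference."""
  tokens = tokenizer(sentence, return_tensors='pt')
  # Squeeze out the batch dimension so RunInference can batch examples itself.
  return {key: torch.squeeze(value) for key, value in tokens.items()}


# Placeholder path: the weights are a torch state dict saved ahead of time.
model_handler = PytorchModelHandlerKeyedTensor(
    state_dict_path='gs://my-bucket/bert-base-uncased.pth',
    model_class=BertForMaskedLM,
    model_params={'config': BertConfig.from_pretrained('bert-base-uncased')})

with beam.Pipeline() as p:
  _ = (
      p
      | 'Create' >> beam.Create(['Beam is a [MASK] processing framework.'])
      | 'Tokenize' >> beam.Map(tokenize)
      | 'RunInference' >> RunInference(model_handler)
      # The model output is a dict; 'logits' has shape (seq_len, vocab_size).
      | 'DecodePredictions' >> beam.Map(
          lambda result: tokenizer.decode(
              torch.argmax(result.inference['logits'], dim=-1))))
```

The bert-large-uncased variant would differ only in which config and state dict are loaded.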
All the performance tests are defined in `job_InferenceBenchmarkTests_Python.groovy`.