RunInference Benchmarks
This module contains benchmarks used to test the performance of the RunInference transform running inference with common models and frameworks. Each benchmark is explained in detail below. Beam's performance over time can be viewed at http://s.apache.org/beam-community-metrics/d/ZpS8Uf44z/python-ml-runinference-benchmarks?orgId=1
Pytorch RunInference Image Classification 50K
The Pytorch RunInference Image Classification 50K benchmark runs an example image classification pipeline using several resnet image classification models (the benchmarks on Beam's dashboard display resnet101 and resnet152) against 50,000 example images from the OpenImage dataset. The benchmarks produce the following metrics:
- Mean Inference Requested Batch Size - the average batch size that RunInference groups the images into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of images
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker startup, so the cost is amortized across the pipeline.
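As a rough illustration of how the three metrics above are defined, the sketch below computes them from a handful of per-batch records. The record layout, the helper name `summarize`, and the sample numbers are all hypothetical and chosen only for illustration; they are not benchmark results.

```python
from statistics import mean


def summarize(batches, model_loads_ms):
    """Compute the three benchmark-style means.

    batches: list of (batch_size, latency_ms) tuples, one per inference call.
    model_loads_ms: list of model load times in milliseconds, one per load.
    """
    return {
        "mean_inference_requested_batch_size": mean(size for size, _ in batches),
        "mean_inference_batch_latency_ms": mean(lat for _, lat in batches),
        "mean_load_model_latency_ms": mean(model_loads_ms),
    }


# Hypothetical sample values, for illustration only.
sample_batches = [(8, 120.0), (16, 200.0), (12, 160.0)]
sample_loads_ms = [3000.0, 3400.0]

summary = summarize(sample_batches, sample_loads_ms)
print(summary)
```

Note that the load latency is averaged over model loads, not over batches, which is why it is passed in as a separate list.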
Approximate size of the models used in the tests:
- resnet101: 170.5 MB
- resnet152: 230.4 MB
The above tests run with the following configuration:
- machine_type: n1-standard-2
- num_workers: 75
- autoscaling_algorithm: NONE
- disk_size_gb: 50
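The settings above correspond to Beam pipeline options on Dataflow. A minimal sketch of turning them into command-line flags follows; `to_flags` is a hypothetical helper, and the way the benchmark itself passes these options is not shown in this document.

```python
def to_flags(config):
    """Render a dict of option names and values as --name=value flags."""
    return [f"--{name}={value}" for name, value in config.items()]


# Values copied from the configuration listed above.
image_classification_config = {
    "machine_type": "n1-standard-2",
    "num_workers": 75,
    "autoscaling_algorithm": "NONE",
    "disk_size_gb": 50,
}

flags = to_flags(image_classification_config)
print(" ".join(flags))
```

A list like this could be passed to `apache_beam.options.pipeline_options.PipelineOptions(flags)`. A real Dataflow job needs further options (project, region, temp location, and so on) that are not listed in this section and are omitted here.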
Pytorch RunInference Language Modeling
The Pytorch RunInference Language Modeling benchmark runs an example language modeling pipeline using the BERT large uncased and BERT base uncased models and a dataset of 50,000 manually generated sentences. The benchmarks produce the following metrics:
- Mean Inference Requested Batch Size - the average batch size that RunInference groups the sentences into for batch prediction
- Mean Inference Batch Latency - the average amount of time it takes to perform inference on a given batch of sentences
- Mean Load Model Latency - the average amount of time it takes to load a model. This is done once per DoFn instance on worker startup, so the cost is amortized across the pipeline.
Approximate size of the models used in the tests:
- bert-base-uncased: 417.7 MB
- bert-large-uncased: 1.2 GB
The above tests run with the following configuration:
- machine_type: n1-standard-2
- num_workers: 250
- autoscaling_algorithm: NONE
- disk_size_gb: 75
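For a sense of scale, the sketch below divides each benchmark's 50,000 inputs across its fixed worker count. It assumes a perfectly even split, which a real runner does not guarantee, and `examples_per_worker` is a hypothetical helper, so treat the results as rough orders of magnitude rather than measured values.

```python
def examples_per_worker(num_examples, num_workers):
    """Average number of inputs per worker, assuming an even split."""
    return num_examples / num_workers


# Input counts and worker counts taken from the two benchmark descriptions.
image_share = examples_per_worker(50_000, 75)
language_share = examples_per_worker(50_000, 250)

print(f"image classification: ~{image_share:.0f} images per worker")
print(f"language modeling: ~{language_share:.0f} sentences per worker")
```

Under this assumption, each worker handles only a few hundred inputs, which is the workload over which the one-time model load cost is spread.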