Prompt: What are the AI and ML capabilities in Apache Beam?

Response: Apache Beam has several built-in AI and ML capabilities that enable you to:

  • Process large datasets for both preprocessing and model inference.
  • Conduct exploratory data analysis and smoothly scale up data pipelines in production as part of your MLOps ecosystem.
  • Run your models in production with varying data loads, both in batch and streaming pipelines.

For common AI platform integration patterns in Apache Beam, see the ‘AI Platform integration patterns’ section in the Apache Beam documentation.

The recommended way to implement inference in Apache Beam is by using the RunInference API. For more information about how to use RunInference for PyTorch, scikit-learn, and TensorFlow, see the ‘Use RunInference in Apache Beam’ example in the Apache Beam GitHub repository.

Using pre-trained models in Apache Beam is also supported with PyTorch, Scikit-learn, and TensorFlow. Running inference on custom models is also supported.

Apache Beam also supports automatically updating the model being used with the RunInference PTransform in streaming pipelines without stopping the pipeline. The feature lets you avoid downtime downtime. For more information, see ‘Automatic model refresh’ on the Apache Beam website.

For more information about using machine learning models with Apache Beam, read the blog post ‘Running ML models now easier with new Dataflow ML innovations on Apache Beam’.