learning/prompts/documentation-lookup/38_ai_runinference.md - beam - Git at Google

 Prompt:
 How can I run inference on a trained ML model using Apache Beam?

 Response:
 Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. The `RunInference` transform performs inference on a [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections) of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions.

 Key features of the RunInference API include:
 * Support for both batch and streaming inference.
 * Centralized model management for efficient memory and bandwidth usage.
 * Compatibility with multiple model frameworks and model hubs.
 * Automatic model refreshing to ensure the latest model version is used.
 * GPU support for model inference.

 The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks.

 To import a model into your Apache Beam pipeline, you will need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference.

 Here is an example of importing a PyTorch model handler for use in your pipeline:

 ```python
 from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
 from apache_beam.ml.inference.base import RunInference

   model_handler = PytorchModelHandlerTensor(
     # Model handler setup
   )

 with pipeline as p:
     predictions = p |  'Read' >> beam.ReadFromSource('a_source')
                     |  'RunInference' >> RunInference(model_handler)
 ```

 For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).

 If you need to run inference on a model that isn't explicitly supported, you can [create your own `ModelHandler` or `KeyedModelHandler`](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models with custom logic] to load and use your model. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository.

 For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the [RunInference Patterns](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns) section in the official documentation.

 For an example of using the RunInference API in the Java SDK, see the [example multi-language pipelines](https://github.com/apache/beam/tree/master/examples/multi-language) in the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) section in the official documentation.
	Prompt:
	How can I run inference on a trained ML model using Apache Beam?

	Response:
	Apache Beam enables efficient inference on both local and remote ML models within your pipelines through the RunInference API. This functionality is available in the Python SDK versions 2.40.0 and later. The Java SDK versions 2.41.0 and later also support the API through Apache Beam’s [Multi-language Pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines) framework. The `RunInference` transform performs inference on a [`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections) of examples using an ML model and outputs a `PCollection` containing both the input examples and the corresponding output predictions.

	Key features of the RunInference API include:
	* Support for both batch and streaming inference.
	* Centralized model management for efficient memory and bandwidth usage.
	* Compatibility with multiple model frameworks and model hubs.
	* Automatic model refreshing to ensure the latest model version is used.
	* GPU support for model inference.

	The RunInference API supports a variety of frameworks and model hubs, including [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [Scikit-learn](https://scikit-learn.org/), [XGBoost](https://xgboost.ai/), [Hugging Face](https://huggingface.co/), [TensorFlow Hub](https://www.tensorflow.org/hub), [Vertex AI](https://cloud.google.com/vertex-ai), [TensorRT](https://developer.nvidia.com/tensorrt), and [ONNX](https://onnx.ai/). Additionally, you can easily integrate custom model frameworks.

	To import a model into your Apache Beam pipeline, you will need to configure the [`ModelHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) object, which wraps the underlying model and allows you to set necessary environment variables for inference.

	Here is an example of importing a PyTorch model handler for use in your pipeline:

	```python
	from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
	from apache_beam.ml.inference.base import RunInference

	model_handler = PytorchModelHandlerTensor(
	# Model handler setup
	)

	with pipeline as p:
	predictions = p \| 'Read' >> beam.ReadFromSource('a_source')
	\| 'RunInference' >> RunInference(model_handler)
	```

	For comprehensive end-to-end examples of inference with supported model frameworks and model hubs, refer to the [Apache Beam GitHub repository](https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/inference).

	If you need to run inference on a model that isn't explicitly supported, you can [create your own `ModelHandler` or `KeyedModelHandler`](https://beam.apache.org/documentation/ml/about-ml/#use-custom-models with custom logic] to load and use your model. For an example of running inference on a custom model loaded with [spaCy](https://spacy.io/), refer to the [Bring your own ML model to Beam RunInference](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb) example in the Apache Beam GitHub repository.

	For recommended patterns and best practices when leveraging Apache Beam for inference tasks, see the [RunInference Patterns](https://beam.apache.org/documentation/ml/about-ml/#runinference-patterns) section in the official documentation.

	For an example of using the RunInference API in the Java SDK, see the [example multi-language pipelines](https://github.com/apache/beam/tree/master/examples/multi-language) in the Apache Beam GitHub repository. Additionally, for an illustration of a composite Python transform integrating the RunInference API with preprocessing and postprocessing from a Beam Java SDK pipeline, you can refer to the [Using RunInference from Java SDK](https://beam.apache.org/documentation/ml/multi-language-inference/) section in the official documentation.