Prompt: How can I integrate BigQuery ML with Apache Beam?

Response: BigQuery ML allows you to develop and execute ML models using GoogleSQL queries. You can seamlessly integrate these models into your Apache Beam pipeline using TFX Basic Shared Libraries (tfx_bsl) and perform inference on them using the RunInference API.

For an example of training a basic BigQuery ML model, refer to the BigQuery documentation.

Once you have trained your model, you will need to export it. Here is an example BigQuery command to export a model to a Google Cloud Storage bucket:

bq extract -m <model name> gs://<cloud storage path>

To incorporate your BigQuery ML model into an Apache Beam pipeline using tfx_bsl, it must be saved in the TensorFlow SavedModel format. You will need to download the model to your local directory to perform local predictions:

import apache_beam
import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.framework import tensor_util
from tfx_bsl.beam import run_inference
from tfx_bsl.public.beam import RunInference
from tfx_bsl.public.proto import model_spec_pb2

inputs = tf.train.Example(...)

model_path = <path to the directory where the model is stored>

def extract_prediction(response):
    # Extract the prediction from the response depending on the signature of the model

with beam.Pipeline() as p:
    res = (
        p
        | beam.Create([inputs])
        | RunInference(
            model_spec_pb2.InferenceSpecType(
                saved_model_spec=model_spec_pb2.SavedModelSpec(
                    model_path=model_path,
                    signature_name=['serving_default'])))
        | beam.ParDo(extract_prediction)
    )

This example uses the RunInference transform from the tfx_bsl library, directing it to the local directory where the model is stored. The transform takes tf.train.Example type elements as inputs and produces tensorflow_serving.apis.prediction_log_pb2.PredictionLog type elements as outputs.

Depending on the signature of your model, you can extract values from the output. For the prediction output format of exported models for each model type, refer to the ‘Prediction output format’ section in the BigQuery documentation.