| { |
| "nbformat": 4, |
| "nbformat_minor": 0, |
| "metadata": { |
| "colab": { |
| "provenance": [] |
| }, |
| "kernelspec": { |
| "name": "python3", |
| "display_name": "Python 3" |
| }, |
| "language_info": { |
| "name": "python" |
| } |
| }, |
| "cells": [ |
| { |
| "cell_type": "code", |
| "source": [ |
| "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n", |
| "\n", |
| "# Licensed to the Apache Software Foundation (ASF) under one\n", |
| "# or more contributor license agreements. See the NOTICE file\n", |
| "# distributed with this work for additional information\n", |
| "# regarding copyright ownership. The ASF licenses this file\n", |
| "# to you under the Apache License, Version 2.0 (the\n", |
| "# \"License\"); you may not use this file except in compliance\n", |
| "# with the License. You may obtain a copy of the License at\n", |
| "#\n", |
| "# http://www.apache.org/licenses/LICENSE-2.0\n", |
| "#\n", |
| "# Unless required by applicable law or agreed to in writing,\n", |
| "# software distributed under the License is distributed on an\n", |
| "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n", |
| "# KIND, either express or implied. See the License for the\n", |
| "# specific language governing permissions and limitations\n", |
| "# under the License" |
| ], |
| "metadata": { |
| "cellView": "form", |
| "id": "IVkpU8HZ1eyz" |
| }, |
| "execution_count": 1, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "# Use RunInference for Generative AI\n", |
| "\n", |
| "<table align=\"left\">\n", |
| " <td>\n", |
| " <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_generative_ai.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n", |
| " </td>\n", |
| " <td>\n", |
| " <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_generative_ai.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n", |
| " </td>\n", |
| "</table>\n" |
| ], |
| "metadata": { |
| "id": "kH8SORNim8on" |
| } |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "This notebook shows how to use the Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) transform for generative AI tasks. It uses a large language model (LLM) from the [Hugging Face Model Hub](https://huggingface.co/models).\n", |
| "\n", |
| "This notebook demonstrates the following steps:\n", |
| "- Load and save a model from the Hugging Face Model Hub.\n", |
| "- Use the PyTorch model handler for RunInference.\n", |
| "\n", |
| "For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation." |
| ], |
| "metadata": { |
| "id": "7N2XzwoA0k4L" |
| } |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "## Install the Apache Beam SDK and dependencies\n", |
| "\n", |
| "Use the following code to install the Apache Beam Python SDK, PyTorch, and Transformers." |
| ], |
| "metadata": { |
| "id": "nhf_lOeEsO1C" |
| } |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "id": "wS9a3Y0oZ_l5" |
| }, |
| "outputs": [], |
| "source": [ |
| "!pip install apache_beam[interactive,gcp]==2.48.0\n", |
| "!pip install torch\n", |
| "!pip install transformers" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "Use the following code to import dependencies\n", |
| "\n", |
| "**Important**: If an error occurs, restart your runtime." |
| ], |
| "metadata": { |
| "id": "I7vMsFGW16bZ" |
| } |
| }, |
| { |
| "cell_type": "code", |
| "source": [ |
| "import os\n", |
| "import apache_beam as beam\n", |
| "from apache_beam.options.pipeline_options import PipelineOptions\n", |
| "from apache_beam.ml.inference.base import PredictionResult\n", |
| "from apache_beam.ml.inference.base import RunInference\n", |
| "from apache_beam.ml.inference.pytorch_inference import make_tensor_model_fn\n", |
| "from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor\n", |
| "import torch\n", |
| "from transformers import AutoConfig\n", |
| "from transformers import AutoModelForSeq2SeqLM\n", |
| "from transformers import AutoTokenizer\n", |
| "from transformers.tokenization_utils import PreTrainedTokenizer\n", |
| "\n", |
| "\n", |
| "MAX_RESPONSE_TOKENS = 256\n", |
| "\n", |
| "model_name = \"google/flan-t5-small\"\n", |
| "state_dict_path = \"saved_model\"" |
| ], |
| "metadata": { |
| "id": "uhbOYUzvbOSc" |
| }, |
| "execution_count": 3, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "## Download and save the model\n", |
| "This notebook uses the [auto classes](https://huggingface.co/docs/transformers/model_doc/auto) from Hugging Face to instantly load the model in memory. Later, the model is saved to the path defined previously." |
| ], |
| "metadata": { |
| "id": "yRls3LmxswrC" |
| } |
| }, |
| { |
| "cell_type": "code", |
| "source": [ |
| "model = AutoModelForSeq2SeqLM.from_pretrained(\n", |
| " model_name, torch_dtype=torch.bfloat16\n", |
| " )\n", |
| "\n", |
| "directory = os.path.dirname(state_dict_path)\n", |
| "torch.save(model.state_dict(), state_dict_path)" |
| ], |
| "metadata": { |
| "id": "PKhkiQFJe44n" |
| }, |
| "execution_count": null, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "## Define utility functions\n", |
| "The input and output for the [`google/flan-t5-small`](https://huggingface.co/google/flan-t5-small) model are token tensors. These utility functions are used for the conversion of text to token tensors and then back to text.\n" |
| ], |
| "metadata": { |
| "id": "7TSqb3l1s7F7" |
| } |
| }, |
| { |
| "cell_type": "code", |
| "source": [ |
| "def to_tensors(input_text: str, tokenizer) -> torch.Tensor:\n", |
| " \"\"\"Encodes input text into token tensors.\n", |
| " Args:\n", |
| " input_text: Input text for the LLM model.\n", |
| " tokenizer: Tokenizer for the LLM model.\n", |
| " Returns: Tokenized input tokens.\n", |
| " \"\"\"\n", |
| " return tokenizer(input_text, return_tensors=\"pt\").input_ids[0]\n", |
| "\n", |
| "\n", |
| "def from_tensors(result: PredictionResult, tokenizer) -> str:\n", |
| " \"\"\"Decodes output token tensors into text.\n", |
| " Args:\n", |
| " result: Prediction results from the RunInference transform.\n", |
| " tokenizer: Tokenizer for the LLM model.\n", |
| " Returns: The model's response as text.\n", |
| " \"\"\"\n", |
| " output_tokens = result.inference\n", |
| " return tokenizer.decode(output_tokens, skip_special_tokens=True)" |
| ], |
| "metadata": { |
| "id": "OeTMbaLidnBe" |
| }, |
| "execution_count": 5, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "code", |
| "source": [ |
| "# Load the tokenizer.\n", |
| "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", |
| "\n", |
| "# Create an instance of the PyTorch model handler.\n", |
| "model_handler = PytorchModelHandlerTensor(\n", |
| " state_dict_path=state_dict_path,\n", |
| " model_class=AutoModelForSeq2SeqLM.from_config,\n", |
| " model_params={\"config\": AutoConfig.from_pretrained(model_name)},\n", |
| " inference_fn=make_tensor_model_fn(\"generate\"),\n", |
| " )" |
| ], |
| "metadata": { |
| "id": "77mOPlQR5Mev" |
| }, |
| "execution_count": null, |
| "outputs": [] |
| }, |
| { |
| "cell_type": "markdown", |
| "source": [ |
| "## Run the Pipeline\n" |
| ], |
| "metadata": { |
| "id": "y7IU3cYKtA3e" |
| } |
| }, |
| { |
| "cell_type": "code", |
| "source": [ |
| "example = [\"translate English to Spanish: We are in New York City.\"]\n", |
| "\n", |
| "pipeline = beam.Pipeline(options=PipelineOptions(save_main_session=True,pickle_library=\"cloudpickle\"))\n", |
| "\n", |
| "with pipeline as p:\n", |
| " _ = (\n", |
| " p\n", |
| " | \"Create Examples\" >> beam.Create(example)\n", |
| " | \"To tensors\" >> beam.Map(to_tensors, tokenizer)\n", |
| " | \"RunInference\"\n", |
| " >> RunInference(\n", |
| " model_handler,\n", |
| " inference_args={\"max_new_tokens\": MAX_RESPONSE_TOKENS},\n", |
| " )\n", |
| " | \"From tensors\" >> beam.Map(from_tensors, tokenizer)\n", |
| " | \"Print\" >> beam.Map(print)\n", |
| " )" |
| ], |
| "metadata": { |
| "colab": { |
| "base_uri": "https://localhost:8080/", |
| "height": 106 |
| }, |
| "id": "JUBrJEDYcjRG", |
| "outputId": "e7e7f0a2-dc0d-427d-bdf8-047739bb3160" |
| }, |
| "execution_count": 7, |
| "outputs": [ |
| { |
| "output_type": "display_data", |
| "data": { |
| "application/javascript": [ |
| "\n", |
| " if (typeof window.interactive_beam_jquery == 'undefined') {\n", |
| " var jqueryScript = document.createElement('script');\n", |
| " jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n", |
| " jqueryScript.type = 'text/javascript';\n", |
| " jqueryScript.onload = function() {\n", |
| " var datatableScript = document.createElement('script');\n", |
| " datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n", |
| " datatableScript.type = 'text/javascript';\n", |
| " datatableScript.onload = function() {\n", |
| " window.interactive_beam_jquery = jQuery.noConflict(true);\n", |
| " window.interactive_beam_jquery(document).ready(function($){\n", |
| " \n", |
| " });\n", |
| " }\n", |
| " document.head.appendChild(datatableScript);\n", |
| " };\n", |
| " document.head.appendChild(jqueryScript);\n", |
| " } else {\n", |
| " window.interactive_beam_jquery(document).ready(function($){\n", |
| " \n", |
| " });\n", |
| " }" |
| ] |
| }, |
| "metadata": {} |
| }, |
| { |
| "output_type": "stream", |
| "name": "stdout", |
| "text": [ |
| "Estamos en Nueva York City.\n" |
| ] |
| } |
| ] |
| } |
| ] |
| } |