{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/apache/beam/blob/master//Users/dcavazos/src/beam/examples/notebooks/documentation/transforms/python/elementwise/regex-py.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "view-the-docs-top"
},
"source": [
"<table align=\"left\"><td><a target=\"_blank\" href=\"https://beam.apache.org/documentation/transforms/python/elementwise/regex\"><img src=\"https://beam.apache.org/images/logos/full-color/name-bottom/beam-logo-full-color-name-bottom-100.png\" width=\"32\" height=\"32\" />View the docs</a></td></table>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "_-code"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\")\n",
"# Licensed to the Apache Software Foundation (ASF) under one\n",
"# or more contributor license agreements. See the NOTICE file\n",
"# distributed with this work for additional information\n",
"# regarding copyright ownership. The ASF licenses this file\n",
"# to you under the Apache License, Version 2.0 (the\n",
"# \"License\"); you may not use this file except in compliance\n",
"# with the License. You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing,\n",
"# software distributed under the License is distributed on an\n",
"# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
"# KIND, either express or implied. See the License for the\n",
"# specific language governing permissions and limitations\n",
"# under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "regex"
},
"source": [
"# Regex\n",
"\n",
"<script type=\"text/javascript\">\n",
"localStorage.setItem('language', 'language-py')\n",
"</script>\n",
"\n",
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.Regex\"><img src=\"https://beam.apache.org/images/logos/sdks/python.png\" width=\"32px\" height=\"32px\" alt=\"Pydoc\"/> Pydoc</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>\n",
"\n",
"Filters input string elements based on a regex. May also transform them based on the matching groups."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "setup"
},
"source": [
"## Setup\n",
"\n",
"To run a code cell, you can click the **Run cell** button at the top left of the cell,\n",
"or select it and press **`Shift+Enter`**.\n",
"Try modifying a code cell and re-running it to see what happens.\n",
"\n",
"> To learn more about Colab, see\n",
"> [Welcome to Colaboratory!](https://colab.sandbox.google.com/notebooks/welcome.ipynb).\n",
"\n",
"First, let's install the `apache-beam` module."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "setup-code"
},
"outputs": [],
"source": [
"!pip install --quiet -U apache-beam"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "examples"
},
"source": [
"## Examples\n",
"\n",
"In the following examples, we create a pipeline with a `PCollection` of text strings.\n",
"Then, we use the `Regex` transform to search, replace, and split through the text elements using\n",
"[regular expressions](https://docs.python.org/3/library/re.html).\n",
"\n",
"You can use tools to help you create and test your regular expressions, such as\n",
"[regex101](https://regex101.com/).\n",
"Make sure to specify the Python flavor at the left side bar.\n",
"\n",
"Lets look at the\n",
"[regular expression `(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)`](https://regex101.com/r/Z7hTTj/3)\n",
"for example.\n",
"It matches anything that is not a whitespace `\\s` (`[ \\t\\n\\r\\f\\v]`) or comma `,`\n",
"until a comma is found and stores that in the named group `icon`,\n",
"this can match even `utf-8` strings.\n",
"Then it matches any number of whitespaces, followed by at least one word character\n",
"`\\w` (`[a-zA-Z0-9_]`), which is stored in the second group for the *name*.\n",
"It does the same with the third group for the *duration*.\n",
"\n",
"> *Note:* To avoid unexpected string escaping in your regular expressions,\n",
"> it is recommended to use\n",
"> [raw strings](https://docs.python.org/3/reference/lexical_analysis.html?highlight=raw#string-and-bytes-literals)\n",
"> such as `r'raw-string'` instead of `'escaped-string'`."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-1-regex-match"
},
"source": [
"### Example 1: Regex match\n",
"\n",
"`Regex.matches` keeps only the elements that match the regular expression,\n",
"returning the matched group.\n",
"The argument `group` is set to `0` (the entire match) by default,\n",
"but can be set to a group number like `3`, or to a named group like `'icon'`.\n",
"\n",
"`Regex.matches` starts to match the regular expression at the beginning of the string.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"To start matching at any point instead of the beginning of the string, use\n",
"[`Regex.find(regex)`](#example-4-regex-find)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-1-regex-match-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_matches = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓, Strawberry, perennial',\n",
" '🥕, Carrot, biennial ignoring trailing words',\n",
" '🍆, Eggplant, perennial',\n",
" '🍅, Tomato, annual',\n",
" '🥔, Potato, perennial',\n",
" '# 🍌, invalid, format',\n",
" 'invalid, 🍉, format',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.matches(regex)\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-1-regex-match-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-2-regex-match-with-all-groups"
},
"source": [
"### Example 2: Regex match with all groups\n",
"\n",
"`Regex.all_matches` keeps only the elements that match the regular expression,\n",
"returning *all groups* as a list.\n",
"The groups are returned in the order encountered in the regular expression,\n",
"including `group 0` (the entire match) as the first group.\n",
"\n",
"`Regex.all_matches` starts to match the regular expression at the beginning of the string.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"To start matching at any point instead of the beginning of the string, use\n",
"[`Regex.find_all(regex, group=Regex.ALL, outputEmpty=False)`](#example-5-regex-find-all)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-2-regex-match-with-all-groups-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_all_matches = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓, Strawberry, perennial',\n",
" '🥕, Carrot, biennial ignoring trailing words',\n",
" '🍆, Eggplant, perennial',\n",
" '🍅, Tomato, annual',\n",
" '🥔, Potato, perennial',\n",
" '# 🍌, invalid, format',\n",
" 'invalid, 🍉, format',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.all_matches(regex)\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-2-regex-match-with-all-groups-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-3-regex-match-into-key-value-pairs"
},
"source": [
"### Example 3: Regex match into key-value pairs\n",
"\n",
"`Regex.matches_kv` keeps only the elements that match the regular expression,\n",
"returning a key-value pair using the specified groups.\n",
"The argument `keyGroup` is set to a group number like `3`, or to a named group like `'icon'`.\n",
"The argument `valueGroup` is set to `0` (the entire match) by default,\n",
"but can be set to a group number like `3`, or to a named group like `'icon'`.\n",
"\n",
"`Regex.matches_kv` starts to match the regular expression at the beginning of the string.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"To start matching at any point instead of the beginning of the string, use\n",
"[`Regex.find_kv(regex, keyGroup)`](#example-6-regex-find-as-key-value-pairs)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-3-regex-match-into-key-value-pairs-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_matches_kv = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓, Strawberry, perennial',\n",
" '🥕, Carrot, biennial ignoring trailing words',\n",
" '🍆, Eggplant, perennial',\n",
" '🍅, Tomato, annual',\n",
" '🥔, Potato, perennial',\n",
" '# 🍌, invalid, format',\n",
" 'invalid, 🍉, format',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.matches_kv(regex, keyGroup='icon')\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-3-regex-match-into-key-value-pairs-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-4-regex-find"
},
"source": [
"### Example 4: Regex find\n",
"\n",
"`Regex.find` keeps only the elements that match the regular expression,\n",
"returning the matched group.\n",
"The argument `group` is set to `0` (the entire match) by default,\n",
"but can be set to a group number like `3`, or to a named group like `'icon'`.\n",
"\n",
"`Regex.find` matches the first occurrence of the regular expression in the string.\n",
"To start matching at the beginning, add `'^'` at the beginning of the regular expression.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"If you need to match from the start only, consider using\n",
"[`Regex.matches(regex)`](#example-1-regex-match)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-4-regex-find-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_matches = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '# 🍓, Strawberry, perennial',\n",
" '# 🥕, Carrot, biennial ignoring trailing words',\n",
" '# 🍆, Eggplant, perennial - 🍌, Banana, perennial',\n",
" '# 🍅, Tomato, annual - 🍉, Watermelon, annual',\n",
" '# 🥔, Potato, perennial',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.find(regex)\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-4-regex-find-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-5-regex-find-all"
},
"source": [
"### Example 5: Regex find all\n",
"\n",
"`Regex.find_all` returns a list of all the matches of the regular expression,\n",
"returning the matched group.\n",
"The argument `group` is set to `0` by default, but can be set to a group number like `3`, to a named group like `'icon'`, or to `Regex.ALL` to return all groups.\n",
"The argument `outputEmpty` is set to `True` by default, but can be set to `False` to skip elements where no matches were found.\n",
"\n",
"`Regex.find_all` matches the regular expression anywhere it is found in the string.\n",
"To start matching at the beginning, add `'^'` at the start of the regular expression.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"If you need to match all groups from the start only, consider using\n",
"[`Regex.all_matches(regex)`](#example-2-regex-match-with-all-groups)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-5-regex-find-all-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_find_all = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '# 🍓, Strawberry, perennial',\n",
" '# 🥕, Carrot, biennial ignoring trailing words',\n",
" '# 🍆, Eggplant, perennial - 🍌, Banana, perennial',\n",
" '# 🍅, Tomato, annual - 🍉, Watermelon, annual',\n",
" '# 🥔, Potato, perennial',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.find_all(regex)\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-5-regex-find-all-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-6-regex-find-as-key-value-pairs"
},
"source": [
"### Example 6: Regex find as key-value pairs\n",
"\n",
"`Regex.find_kv` returns a list of all the matches of the regular expression,\n",
"returning a key-value pair using the specified groups.\n",
"The argument `keyGroup` is set to a group number like `3`, or to a named group like `'icon'`.\n",
"The argument `valueGroup` is set to `0` (the entire match) by default,\n",
"but can be set to a group number like `3`, or to a named group like `'icon'`.\n",
"\n",
"`Regex.find_kv` matches the first occurrence of the regular expression in the string.\n",
"To start matching at the beginning, add `'^'` at the beginning of the regular expression.\n",
"To match until the end of the string, add `'$'` at the end of the regular expression.\n",
"\n",
"If you need to match as key-value pairs from the start only, consider using\n",
"[`Regex.matches_kv(regex)`](#example-3-regex-match-into-key-value-pairs)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-6-regex-find-as-key-value-pairs-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"# Matches a named group 'icon', and then two comma-separated groups.\n",
"regex = r'(?P<icon>[^\\s,]+), *(\\w+), *(\\w+)'\n",
"with beam.Pipeline() as pipeline:\n",
" plants_matches_kv = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '# 🍓, Strawberry, perennial',\n",
" '# 🥕, Carrot, biennial ignoring trailing words',\n",
" '# 🍆, Eggplant, perennial - 🍌, Banana, perennial',\n",
" '# 🍅, Tomato, annual - 🍉, Watermelon, annual',\n",
" '# 🥔, Potato, perennial',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.find_kv(regex, keyGroup='icon')\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-6-regex-find-as-key-value-pairs-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-7-regex-replace-all"
},
"source": [
"### Example 7: Regex replace all\n",
"\n",
"`Regex.replace_all` returns the string with all the occurrences of the regular expression replaced by another string.\n",
"You can also use\n",
"[backreferences](https://docs.python.org/3/library/re.html?highlight=backreference#re.sub)\n",
"on the `replacement`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-7-regex-replace-all-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"with beam.Pipeline() as pipeline:\n",
" plants_replace_all = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓 : Strawberry : perennial',\n",
" '🥕 : Carrot : biennial',\n",
" '🍆\\t:\\tEggplant\\t:\\tperennial',\n",
" '🍅 : Tomato : annual',\n",
" '🥔 : Potato : perennial',\n",
" ])\n",
" | 'To CSV' >> beam.Regex.replace_all(r'\\s*:\\s*', ',')\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-7-regex-replace-all-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-8-regex-replace-first"
},
"source": [
"### Example 8: Regex replace first\n",
"\n",
"`Regex.replace_first` returns the string with the first occurrence of the regular expression replaced by another string.\n",
"You can also use\n",
"[backreferences](https://docs.python.org/3/library/re.html?highlight=backreference#re.sub)\n",
"on the `replacement`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-8-regex-replace-first-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"with beam.Pipeline() as pipeline:\n",
" plants_replace_first = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓, Strawberry, perennial',\n",
" '🥕, Carrot, biennial',\n",
" '🍆,\\tEggplant, perennial',\n",
" '🍅, Tomato, annual',\n",
" '🥔, Potato, perennial',\n",
" ])\n",
" | 'As dictionary' >> beam.Regex.replace_first(r'\\s*,\\s*', ': ')\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-8-regex-replace-first-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-9-regex-split"
},
"source": [
"### Example 9: Regex split\n",
"\n",
"`Regex.split` returns the list of strings that were delimited by the specified regular expression.\n",
"The argument `outputEmpty` is set to `False` by default, but can be set to `True` to keep empty items in the output list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "example-9-regex-split-code"
},
"outputs": [],
"source": [
"import apache_beam as beam\n",
"\n",
"with beam.Pipeline() as pipeline:\n",
" plants_split = (\n",
" pipeline\n",
" | 'Garden plants' >> beam.Create([\n",
" '🍓 : Strawberry : perennial',\n",
" '🥕 : Carrot : biennial',\n",
" '🍆\\t:\\tEggplant : perennial',\n",
" '🍅 : Tomato : annual',\n",
" '🥔 : Potato : perennial',\n",
" ])\n",
" | 'Parse plants' >> beam.Regex.split(r'\\s*:\\s*')\n",
" | beam.Map(print)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "example-9-regex-split-2"
},
"source": [
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/elementwise/regex.py\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" width=\"32px\" height=\"32px\" alt=\"View source code\"/> View source code</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "related-transforms"
},
"source": [
"## Related transforms\n",
"\n",
"* [FlatMap](https://beam.apache.org/documentation/transforms/python/elementwise/flatmap) behaves the same as `Map`, but for\n",
" each input it may produce zero or more outputs.\n",
"* [Map](https://beam.apache.org/documentation/transforms/python/elementwise/map) applies a simple 1-to-1 mapping function over each element in the collection\n",
"\n",
"<table align=\"left\" style=\"margin-right:1em\">\n",
" <td>\n",
" <a class=\"button\" target=\"_blank\" href=\"https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.Regex\"><img src=\"https://beam.apache.org/images/logos/sdks/python.png\" width=\"32px\" height=\"32px\" alt=\"Pydoc\"/> Pydoc</a>\n",
" </td>\n",
"</table>\n",
"\n",
"<br/><br/><br/></icon>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "view-the-docs-bottom"
},
"source": [
"<table align=\"left\"><td><a target=\"_blank\" href=\"https://beam.apache.org/documentation/transforms/python/elementwise/regex\"><img src=\"https://beam.apache.org/images/logos/full-color/name-bottom/beam-logo-full-color-name-bottom-100.png\" width=\"32\" height=\"32\" />View the docs</a></td></table>"
]
}
],
"metadata": {
"colab": {
"name": "Regex - element-wise transform",
"toc_visible": true
},
"kernelspec": {
"display_name": "python3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}