Adding a new example, code snippet, or Tour of Beam learning unit into the Playground is a three-step process:
Playground sources and output presentation formats:
This guide will walk through all steps.
Playground runs example code snippets using the Apache Beam Direct Runner and requires that a code snippet be complete, runnable code.
Code snippets can use data sources to demonstrate transforms and concepts. Playground restricts code access to the Internet for security reasons. The following are the recommended ways to provide data sources and dependencies for a code snippet:
Source/Dependency | Notes |
---|---|
File | Store the code snippet's data file in a GCS bucket in the apache-beam-testing project. |
BigQuery | Create a BigQuery dataset/table in the apache-beam-testing project. |
Python package | Python packages accessible by Playground are located in the Beam Python SDK container and in the Playground Python container. Add required packages to the Playground Python container. Please submit a pull request with changes to the container or contact dev@beam.apache.org. |
GitHub repo | If your example clones or depends on files in a GitHub repo, copy the required files to a GCS bucket in the apache-beam-testing project and use the GCS files. |
Playground provides multiple features to help focus users on certain parts of the code.
Playground automatically applies the following to all snippets:
Playground supports Named Sections to tag code blocks and provide the following view options:
Please see the Snippet View Options section for details on how the different view options can be used.
If you do not need any of those view options, skip to the next step.
Named Sections are defined with the following syntax:

    // [START section_name]
    void method() {
    ...
    }
    // [END section_name]

Create a named section for each part of your code that you want the above features for. To learn more about the syntax, please see the README of the editor that Playground uses.
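To make the marker syntax concrete, here is a minimal Python sketch that locates named sections in a snippet. It is not the parser Playground or its editor uses; the regular expression, the function name `find_named_sections`, and the choice to report 1-based line ranges are assumptions made for illustration.

```python
import re

# Matches "[START name]" or "[END name]" anywhere in a line, so the marker can
# follow any comment prefix such as "//" or "#". (Illustrative pattern only.)
MARKER = re.compile(r"\[(START|END)\s+(\w+)\]")


def find_named_sections(source: str) -> dict:
    """Return {section_name: (first_line, last_line)} with 1-based line numbers
    of the code strictly between the START and END marker lines."""
    open_sections = {}
    sections = {}
    for number, line in enumerate(source.splitlines(), start=1):
        match = MARKER.search(line)
        if match is None:
            continue
        kind, name = match.groups()
        if kind == "START":
            open_sections[name] = number
        elif name in open_sections:
            # The section body starts after the START line and ends before END.
            sections[name] = (open_sections.pop(name) + 1, number - 1)
    return sections


SNIPPET = """\
// [START section_name]
void method() {
  ...
}
// [END section_name]
"""

print(find_named_sections(SNIPPET))
```

A helper like this can be handy for checking that every START marker has a matching END before submitting a snippet.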
There are several types of code snippets in the Playground:
See the workflow above for how artifacts map to these sources.
Playground Examples Catalog helps users discover example snippets and is the recommended way to add examples. Playground automatically scans, verifies and deploys example snippets from the directories listed below.
Note: SCIO examples are stored in a separate repository. To add support for a new SCIO example, please refer to this section of TASKS.md.
Playground Java, Python, and Go examples are automatically picked up from these predefined directories by the playground_examples_ci.yml GitHub workflow after a PR is merged into the Beam repo:

- /examples
- /learning/katas
- /sdks

Adding Scala example snippets automatically is not supported; Scala example snippets can be added to the catalog manually.
Playground relies on a metadata comment block to identify an example and place it into the database, which is required for the example to show in the Examples Catalog. See this for an example. Playground automatically removes the metadata comment block before storing the example in the database, so the metadata is not visible to end users. The block is in the format of a YAML map:

    beam-playground:
      # Name of the Beam example that will be displayed in the Playground Examples Catalog. Required.
      name: ""
      # Description of the Beam example that will be displayed in the Playground Examples Catalog. Required.
      description: ""
      # Contains information about pipeline options of the Beam example/test/kata. Optional.
      pipeline_options: "--name1 value1 --name2 value2"
      # The line number to scroll to when the snippet is loaded.
      # Note that lines of the metadata block are cut so line numbers after it are shifted.
      # Optional, defaults to 1 (the first line).
      context_line: 1
      # Categories this example is included into. See below for the supported values.
      # Optional, defaults to no categories making the example unlisted.
      categories:
        - "Combiners"
        - "Core Transforms"
      # Tags by which this snippet can be found in the Example Catalog. Optional.
      tags:
        - "numbers"
        - "count"
      # Helps user to identify example's complexity. Values: BASIC|MEDIUM|ADVANCED. Required.
      complexity: BASIC
      # Specifies the example to be loaded as default when its SDK selected in the Playground.
      # See section "Default examples" below. Optional, defaults to false.
      default_example: true
      # If the snippet has a Colab notebook, can link the URL of the Colab notebook that is based on this snippet.
      url_notebook: "https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/documentation/transforms/python/elementwise/filter-py.ipynb"
      # Specifies if the given example consists of multiple files or not. Optional, defaults to false.
      multifile: true
      # Specifies example caching by Playground. Optional, defaults to false (the output is cached).
      always_run: true
      # Datasets which will be used by emulators. Optional.
      # Please see section "Kafka emulator" for more information.
      datasets:
        # Dataset name
        CountWords:
          # Dataset location. Only "local" is supported. Required.
          location: local
          # Dataset format. Supported values are "avro" and "json". Required.
          format: avro
      # List of emulators to start during pipeline execution. Currently only `kafka` type is supported. Optional.
      emulators:
        - type: kafka
          topic:
            # Dataset id. Will be used as a topic name.
            id: dataset
            # Name of dataset specified in "datasets" section.
            source_dataset: "CountWords"

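Because the metadata block is removed before the example is stored, line numbers after it shift. The following Python sketch illustrates that effect on a Python snippet. It is not Playground's implementation: the `#` comment prefix, the rule that the block ends at the first line that is not an indented comment, and the names `strip_metadata` and `EXAMPLE` are all assumptions.

```python
METADATA_HEADER = "beam-playground:"


def strip_metadata(source: str):
    """Remove the metadata comment block.

    Returns (stripped_source, number_of_lines_removed). Assumes the block
    starts at a comment line reading "# beam-playground:" and continues over
    the following comment lines whose text is indented (nested YAML keys).
    """
    lines = source.splitlines()
    start = next(
        (i for i, line in enumerate(lines)
         if line.startswith("#") and line[1:].strip() == METADATA_HEADER),
        None,
    )
    if start is None:
        return source, 0
    end = start + 1
    while end < len(lines) and lines[end].startswith("#  "):
        end += 1
    kept = lines[:start] + lines[end:]
    return "\n".join(kept) + "\n", end - start


# Hypothetical example file; only the required fields are filled in.
EXAMPLE = """\
# beam-playground:
#   name: WordCount
#   description: Counts words.
#   complexity: BASIC
#   tags:

import apache_beam as beam

with beam.Pipeline() as pipeline:
    pass
"""

stripped, removed = strip_metadata(EXAMPLE)
print(f"removed {removed} metadata lines")
print(stripped)
```

In this example the import statement moves from line 7 of the file to line 2 of the stored snippet, which is the kind of shift to keep in mind when choosing `context_line`.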
For metadata reference, see the fields in the Tag class here.
The list of supported categories for an example is here. To add a new category, submit a PR that adds a category to the categories.yaml. When it is merged, the new category can be used in an example.
Each SDK must have a single default example. If there is none, the user will see an error in the app and a blank editor. If there is more than one, it is not defined which one will be selected.
Examples that require the Kafka server emulator need to include the emulators tag and provide a dataset in the example's tag. You can refer to an example here.
Add your dataset in either JSON or Avro format into the playground/backend/datasets path.
Add the following elements to the example's metadata tag:

    emulators:
      - type: kafka
        topic:
          id: dataset
          source_dataset: <dataset_name>
    datasets:
      <dataset_name>:
        location: local
        format: json # or 'avro'

Replace <dataset_name> with the name of your dataset file without the file name extension.
Use the exact string "kafka_server:9092" as the server name in your code snippet. Playground automatically replaces this string with the actual host name and port before the compilation step.
Kafka emulator limitations:
- Playground Kafka emulator currently supports only Beam Java SDK.
- The exact string "kafka_server:9092" should be present in the code snippet; any other variation like "kafka_server" + ":9092" will not work.
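The exact-string requirement is easier to see with a small sketch. The Python below assumes the substitution behaves like a plain textual replacement over the source, which matches the limitation described above but is not taken from Playground's code; the function name and the address `10.0.0.5:32771` are hypothetical.

```python
PLACEHOLDER = "kafka_server:9092"


def inject_kafka_address(source: str, address: str) -> str:
    """Replace every exact occurrence of the placeholder with the real address."""
    return source.replace(PLACEHOLDER, address)


# Two Java fragments: one with the exact placeholder, one that builds it
# by concatenation so the placeholder never appears as a single string.
exact = 'withBootstrapServers("kafka_server:9092")'
concatenated = 'withBootstrapServers("kafka_server" + ":9092")'

print(inject_kafka_address(exact, "10.0.0.5:32771"))
print(inject_kafka_address(concatenated, "10.0.0.5:32771"))
```

The first fragment receives the real address, while the second is left unchanged and would still point at a host name that does not exist at run time.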
Create and submit a PR with the code snippet into the Apache Beam repository following the Contribution guide. Verify that all pre-commit tests are passing.
Playground CI will verify and deploy the example to the Playground Examples Catalog when the PR is merged.
The snippet will be assigned an ID. You can find it in the address bar of the browser when you select it in the dropdown.
For example, in this URL:
https://play.beam.apache.org/?path=SDK_JAVA_MinimalWordCount&sdk=java
the ID is SDK_JAVA_MinimalWordCount.
You will need the snippet ID to embed the Playground with the snippet into a website page.
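If you collect snippet links in a script, the ID can be read from the `path` query parameter. This is a small Python sketch using only the standard library; the function name `snippet_id_from_url` is an assumption for illustration.

```python
from urllib.parse import parse_qs, urlparse


def snippet_id_from_url(url: str):
    """Return the value of the `path` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("path")
    return values[0] if values else None


print(snippet_id_from_url(
    "https://play.beam.apache.org/?path=SDK_JAVA_MinimalWordCount&sdk=java"
))
```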
Not all examples must be visible in the example dropdown. Some examples are best in the context of Apache Beam documentation. To embed them into the documentation, use unlisted examples. They work and are checked and cached the same way as the examples displayed in the Playground catalog.
Proceed the same way as with Source 1. Playground Examples Catalog, except:

- Use the /learning/beamdoc directory.
- Do not use the following metadata attributes: categories, default_example, tags.
The ID of the snippet is a function of the SDK and the name attribute from its metadata:
SDK | ID |
---|---|
Go | SDK_GO_name |
Java | SDK_JAVA_name |
Python | SDK_PYTHON_name |
“Tour of Beam” is a separate project that combines learning materials with runnable snippets and allows students to track their learning progress. It uses the Playground engine, and so its content is added in a similar way.
A Tour of Beam unit consists of learning materials and an optional runnable snippet. See the learning content README on how to add units and link snippets to them.
Tour of Beam snippets are checked and cached the same way as Playground examples.
Proceed the same way as with Source 1. Playground Examples Catalog, except:

- Use the /learning/tour-of-beam/learning-content directory. It is recommended to follow the directory hierarchy as described in the learning content README.
- Do not use the following metadata attributes: categories, default_example, tags.
The ID of the snippet is a function of the SDK and the name attribute from its metadata:
SDK | ID |
---|---|
Go | TB_EXAMPLES_SDK_GO_name |
Java | TB_EXAMPLES_SDK_JAVA_name |
Python | TB_EXAMPLES_SDK_PYTHON_name |
For instance, for the Go example CSV it is TB_EXAMPLES_SDK_GO_CSV.
A code snippet can be saved to the Playground using the “Share my code” button in the Playground.
This is easy and fast. It does not require any interaction with the Beam team.
Share my code considerations:
- A user-shared snippet is immutable. If you edit the code and re-share, a new snippet and a new link will be generated.
- Playground automatically applies a 3-month retention policy to shared snippets that are not used. To request a deletion of a snippet, please send an email to dev@beam.apache.org with subject: [Playground] Delete a snippet.
- Playground does not cache output or graph for user-shared snippets.
- Playground does not verify user-shared snippets.
Playground can load a snippet stored on an HTTPS server using the provided URL, including GitHub direct links to raw file content.
This is as easy and fast as using the Share my code button, but it also allows you to modify a snippet after it is published without changing the link.
Loading snippet from HTTPS URL considerations:
- Playground does not cache output or graph for HTTPS URL snippets.
- Playground does not verify HTTPS URL snippets.
For Playground to be able to load the snippet over HTTPS, the HTTPS server needs to allow access by sending the following header:
Access-Control-Allow-Origin: *
at least when requested with *.beam.apache.org as the referer.
This is related to Cross-Origin Resource Sharing (CORS). To read more about CORS, please see CORS (Cross-Origin Resource Sharing).
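As a rough way to reason about this requirement, the sketch below checks a set of response headers the way a browser would for a simple cross-origin read: the `Access-Control-Allow-Origin` value must be `*` or the exact requesting origin. The function name `cors_allows` and the default origin are assumptions for illustration; the sketch inspects headers you supply and performs no network request.

```python
def cors_allows(response_headers: dict,
                origin: str = "https://play.beam.apache.org") -> bool:
    """Return True if the headers permit a cross-origin read from `origin`."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip()
                  for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")
    return allowed in ("*", origin)


print(cors_allows({"Access-Control-Allow-Origin": "*"}))
print(cors_allows({"Content-Type": "text/plain"}))
```

You can feed it the headers returned by your own server (for example, copied from browser developer tools) to see whether a snippet URL is likely to load.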
Many prefer to host code snippets in their GitHub repositories. GitHub is known to allow cross-origin access on direct links to raw file content. An example of loading a GitHub snippet:
https://play.beam.apache.org/?sdk=go&url=https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go
The snippet can now be shown in the Playground. Choose any of the following ways.
The link contains the path to your snippet in the database. It is in the following format:
https://play.beam.apache.org/?path=SDK_JAVA_MinimalWordCount&sdk=java
A special case is the default snippet for an SDK. It can be loaded by the following link:
https://play.beam.apache.org/?sdk=python&default=true
This way if another snippet is ever made default, the links you shared will lead to the new snippet.
A link to an unlisted example can be constructed by providing your snippet ID and SDK in the following URL:
https://play.beam.apache.org/?path=<ID>&sdk=<SDK>
The ID of the snippet is a function of the SDK and the name attribute from its metadata:
SDK | ID |
---|---|
Go | SDK_GO_name |
Java | SDK_JAVA_name |
Python | SDK_PYTHON_name |
A link to a snippet can be constructed by providing your snippet ID and SDK in the following URL:
https://play.beam.apache.org/?path=<ID>&sdk=<SDK>
The ID of the snippet is a function of the SDK and the name attribute from its metadata:
SDK | ID |
---|---|
Go | TB_EXAMPLES_SDK_GO_name |
Java | TB_EXAMPLES_SDK_JAVA_name |
Python | TB_EXAMPLES_SDK_PYTHON_name |
For instance, for the Go example CSV it is TB_EXAMPLES_SDK_GO_CSV, and the link is
https://play.beam.apache.org/?path=TB_EXAMPLES_SDK_GO_CSV&sdk=go
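The ID tables and the link format can be combined into a small helper. This Python sketch follows the naming patterns shown in the tables; the function names and the `tour_of_beam` flag are assumptions, and the name is used verbatim because the source does not say how names containing spaces or other characters are normalized.

```python
from urllib.parse import urlencode


def snippet_id(sdk: str, name: str, tour_of_beam: bool = False) -> str:
    """Build a snippet ID such as SDK_JAVA_name or TB_EXAMPLES_SDK_GO_name."""
    prefix = "TB_EXAMPLES_SDK" if tour_of_beam else "SDK"
    return f"{prefix}_{sdk.upper()}_{name}"


def snippet_link(sdk: str, name: str, tour_of_beam: bool = False) -> str:
    """Build a Playground link from the SDK and the metadata name attribute."""
    query = urlencode({
        "path": snippet_id(sdk, name, tour_of_beam),
        "sdk": sdk.lower(),
    })
    return f"https://play.beam.apache.org/?{query}"


print(snippet_link("go", "CSV", tour_of_beam=True))
print(snippet_link("java", "MinimalWordCount"))
```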
You get the link when you click the “Share my code” button. It is in the following format:
https://play.beam.apache.org/?sdk=java&shared=SNIPPET_ID
Add the URL to the url parameter, for example:
https://play.beam.apache.org/?sdk=go&url=https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go
You can link to an empty editor to make your users start their snippets from scratch:
https://play.beam.apache.org/?sdk=go&empty=true
The above URLs load snippets that you want. But what happens if the user switches SDK? Normally this will be shown:
This can be changed by linking to multiple examples, up to one per SDK.
For this purpose, make a JSON array with any combination of parameters that are allowed for loading single examples, for instance:

    [
      {
        "sdk": "java",
        "path": "SDK_JAVA_AggregationMax"
      },
      {
        "sdk": "go",
        "url": "https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go"
      }
    ]

Then pass it in the examples query parameter like this:
https://play.beam.apache.org/?sdk=go&examples=[{"sdk":"java","path":"SDK_JAVA_AggregationMax"},{"sdk":"go","url":"https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go"}]
This starts with the Go example loaded from the URL. If the SDK is then switched to Java, the AggregationMax catalog example is loaded for it. If the SDK is switched to any other one, the default example for that SDK is loaded, because no override was provided.
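Writing the JSON array into a URL by hand is error-prone, so a script can build it. The Python sketch below serializes the array compactly and percent-encodes it into the `examples` parameter. The function name is an assumption, and the sketch relies on standard query-string decoding on the receiving side; the hand-written link above shows the same JSON without percent-encoding.

```python
import json
from urllib.parse import urlencode


def multi_example_link(initial_sdk: str, examples: list) -> str:
    """Build a link that overrides the example loaded for several SDKs."""
    # Compact separators keep the encoded URL as short as possible.
    payload = json.dumps(examples, separators=(",", ":"))
    query = urlencode({"sdk": initial_sdk, "examples": payload})
    return f"https://play.beam.apache.org/?{query}"


EXAMPLES = [
    {"sdk": "java", "path": "SDK_JAVA_AggregationMax"},
    {"sdk": "go",
     "url": "https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go"},
]

print(multi_example_link("go", EXAMPLES))
```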
Embedded Playground is a simplified interface of the Playground web app designed to be embedded into an <iframe> in web pages. It supports most of the Playground web app features. The embedded Playground URLs start with https://play.beam.apache.org/embedded and use the same query string parameters as the Playground web app. Additionally, the embedded Playground supports the editable=0 parameter to make the editor read-only.
To embed a snippet, replace play.beam.apache.org/?... with play.beam.apache.org/embedded?... in the snippet link, because the embedded interface is simpler, then insert the link into an <iframe> HTML element as follows:

    <iframe src="https://play.beam.apache.org/embedded?sdk=go&url=https://raw.githubusercontent.com/apache/beam-starter-go/main/main.go" width="90%" height="600px" allow="clipboard-write"></iframe>

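When many pages embed snippets, generating the `<iframe>` markup from the query parameters avoids copy-and-paste mistakes. The Python sketch below is one possible helper, not part of Playground: the function name `embed_iframe` and its defaults are assumptions, while `editable=0`, the `/embedded` path, and the iframe attributes come from the text above.

```python
from html import escape
from urllib.parse import urlencode


def embed_iframe(params: dict, editable: bool = True,
                 width: str = "90%", height: str = "600px") -> str:
    """Return iframe markup for the embedded Playground with the given parameters."""
    query = dict(params)
    if not editable:
        query["editable"] = "0"  # makes the embedded editor read-only
    src = "https://play.beam.apache.org/embedded?" + urlencode(query)
    # escape() turns "&" into "&amp;" so the URL is valid inside an attribute.
    return (
        f'<iframe src="{escape(src)}" width="{escape(width)}" '
        f'height="{escape(height)}" allow="clipboard-write"></iframe>'
    )


print(embed_iframe({"sdk": "java", "path": "SDK_JAVA_MinimalWordCount"},
                   editable=False))
```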
The Apache Beam website uses the Hugo Markdown preprocessor. Custom Hugo shortcodes were added to the Apache Beam website to embed Playground snippets. Use the custom shortcodes to embed Playground into the Apache Beam website:

- playground shortcode, see this comment for a complete example.
- playground_snippet shortcode, see this comment for all supported options.

These shortcodes generate an iframe with the URLs described above.
If your code contains named sections as described in Step 1. Prepare a code snippet, you can apply view options to those sections. Otherwise, skip this.
Add the readonly parameter with comma-separated section names:
https://play.beam.apache.org/?sdk=go&url=...&readonly=section_name
Add the unfold parameter with comma-separated section names:
https://play.beam.apache.org/?sdk=go&url=...&unfold=section_name
This folds all foldable blocks that do not overlap with any of the given sections.
Add the show parameter with a single section name:
https://play.beam.apache.org/?sdk=go&url=...&show=section_name
It is still the whole snippet that is sent for execution, although only the given section is visible.
This also makes the editor read-only so the user cannot add code that conflicts with the hidden text.
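The view options are ordinary query parameters, so they can be appended programmatically. The Python sketch below builds such a link; the function name, the snippet ID `SDK_GO_MinimalWordCount`, and the section names are hypothetical. The source does not state whether the options can be combined, so the helper simply passes through whatever it is given.

```python
from urllib.parse import urlencode


def with_view_options(base_params: dict, readonly=(), unfold=(), show=None) -> str:
    """Build a Playground link with named-section view options applied."""
    params = dict(base_params)
    if readonly:
        params["readonly"] = ",".join(readonly)  # comma-separated section names
    if unfold:
        params["unfold"] = ",".join(unfold)      # comma-separated section names
    if show is not None:
        params["show"] = show                    # a single section name
    # Keep commas unescaped so the link stays readable.
    return "https://play.beam.apache.org/?" + urlencode(params, safe=",")


base = {"sdk": "go", "path": "SDK_GO_MinimalWordCount"}  # hypothetical snippet ID
print(with_view_options(base, readonly=["imports", "setup"]))
print(with_view_options(base, show="section_name"))
```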