Database

Beam Playground uses Google Cloud Platform Datastore for storing examples and snippets. Redis is used for caching catalog reads from Datastore, so that the examples do not have to be enumerated on each catalog request.

Datastore namespaces

Playground can use a custom namespace to store entities in Datastore, which supports simultaneous deployment of several backend instances in the same GCP project. By default, the Playground namespace is used. A custom namespace can be selected by setting the DATASTORE_NAMESPACE environment variable.
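As a minimal sketch of the selection logic described above, the following Go snippet falls back to the Playground namespace when DATASTORE_NAMESPACE is unset or empty. The resolveNamespace helper and its getenv parameter are hypothetical names used for illustration; the actual backend may read the variable differently.

```go
package main

import (
	"fmt"
	"os"
)

// defaultNamespace is the namespace this document names as the default.
const defaultNamespace = "Playground"

// resolveNamespace is a hypothetical helper. It returns the value of
// DATASTORE_NAMESPACE as reported by getenv, or the default namespace
// when the variable is unset or empty.
func resolveNamespace(getenv func(string) string) string {
	if ns := getenv("DATASTORE_NAMESPACE"); ns != "" {
		return ns
	}
	return defaultNamespace
}

func main() {
	fmt.Println("Datastore namespace:", resolveNamespace(os.Getenv))
}
```

Passing the lookup function as a parameter keeps the helper easy to test without touching the process environment.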

Migrations

If a breaking change is made to the DB schema, it's advised to implement a migration procedure to handle the data change in Datastore. There are no formalized rules on when a migration should be implemented, but in general the following should be considered:

  • all examples and precompiled objects are recreated during the deployment process and are not modified during the application runtime
  • user code snippets do survive application update and may require data migration

To implement a migration, create a new Go file with a name like migration_xxx.go under the internal/db/schema/migration path. The file should contain a structure which implements the following interface:

type Version interface {
	// GetVersion returns the version string of the schema
	GetVersion() string
	// GetDescription returns the description of the schema version
	GetDescription() string
	// InitiateData initializes the data for the schema or performs a migration
	InitiateData(args *DBArgs) error
}

After implementing the migration logic inside the InitiateData() function, cover it with tests and add the new migration to the list of existing migrations in the setupDBStructure() function in the cmd/server/server.go file.
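The shape of a migration can be sketched as follows. This is an illustration only: the ExampleMigration type, its version string, and its description are hypothetical, and DBArgs is declared here as an empty placeholder because its real fields are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// DBArgs is a placeholder for the real argument struct; its actual
// fields are not described in this document.
type DBArgs struct{}

// Version mirrors the interface shown above.
type Version interface {
	GetVersion() string
	GetDescription() string
	InitiateData(args *DBArgs) error
}

// ExampleMigration is a hypothetical migration used for illustration.
type ExampleMigration struct{}

// GetVersion returns an illustrative version string.
func (m *ExampleMigration) GetVersion() string { return "example-version" }

// GetDescription returns an illustrative description of the change.
func (m *ExampleMigration) GetDescription() string {
	return "Example migration (illustrative only)"
}

// InitiateData is where the real data changes would be applied.
// This sketch only validates its input and does nothing else.
func (m *ExampleMigration) InitiateData(args *DBArgs) error {
	if args == nil {
		return errors.New("migration arguments are required")
	}
	return nil
}

// Compile-time check that ExampleMigration satisfies Version.
var _ Version = (*ExampleMigration)(nil)

func main() {
	var v Version = &ExampleMigration{}
	if err := v.InitiateData(&DBArgs{}); err != nil {
		fmt.Println("migration failed:", err)
		return
	}
	fmt.Printf("applied %s: %s\n", v.GetVersion(), v.GetDescription())
}
```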

Datastore schema

There are several entity kinds in Datastore:

| Entity kind | Description | Corresponding Go struct |
|-------------|-------------|-------------------------|
| pg_schema_versions | Schema version entity | entity.SchemaEntity |
| pg_sdks | SDK entity | entity.SDKEntity |
| pg_examples | Example entity | entity.ExampleEntity |
| pg_snippets | Snippet entity | entity.SnippetEntity |
| pg_datasets | Dataset entity | entity.DatasetEntity |
| pg_files | File entity | entity.FileEntity |
| pg_pc_objects | Precompiled object entity | entity.PrecompiledObjectEntity |

pg_schema_versions

This entity kind is used to store the schema version and a description of changes.

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Schema version | Key |
| descr | Description of changes | string |

pg_sdks

This entity kind is used to store information about each supported SDK.

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | SDK name | Key |
| defaultExample | Name of the default example for this SDK | string |

pg_examples

This entity kind is used to store example catalog items.

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Example ID. Has the form <SDK>_<Example name> | Key |
| name | Example name | string |
| sdk | SDK ID | Key |
| descr | Example description | string |
| tags | Example tags | []string |
| cats | Example categories | []string |
| path | URL of the example on GitHub | string |
| type | Type of the example. Possible values are PRECOMPILED_OBJECT_TYPE_UNSPECIFIED, PRECOMPILED_OBJECT_TYPE_EXAMPLE, PRECOMPILED_OBJECT_TYPE_KATA, PRECOMPILED_OBJECT_TYPE_UNIT_TEST | string |
| origin | PG_EXAMPLES for Playground examples, TB_EXAMPLES for Tour of Beam examples | string |
| schVer | Schema version | Key |
| urlVCS | URL of the example on GitHub | string |
| urlNotebook | URL of a Colab notebook which has the example code | string |
| alwaysRun | If true, the frontend ignores any precompiled objects associated with the example and always runs it | bool |

pg_snippets

This entity kind is used to store snippets.

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Snippet ID. For shared user code the ID is computed from the snippet content hash; for snippets containing example code the ID is the same as that of the related example | Key |
| ownerId | No usage of this field could be found | string |
| sdk | SDK ID | Key |
| pipeOpts | Pipeline options | string |
| created | Creation time | time.Time |
| lVisited | Last visit time | time.Time |
| origin | PG_SNIPPETS for Playground examples, TB_SNIPPETS for Tour of Beam examples, PG_USER for snippets with code shared by users, TB_USER for snippets created by Tour of Beam users | string |
| visitCount | Number of times the snippet was visited | int |
| schVer | Schema version | Key |
| numberOfFiles | Number of files in the snippet. Used to derive file keys. | int |
| complexity | Complexity of the snippet. Possible values are COMPLEXITY_UNSPECIFIED, COMPLEXITY_BASIC, COMPLEXITY_MEDIUM, COMPLEXITY_ADVANCED | string |
| persistenceKey | Used to track snippets created with Tour of Beam. When a Tour of Beam user saves a new snippet, all other snippets with the same persistenceKey are removed. | string |
| datasets | Array of DatasetNestedEntity objects which describe datasets and emulators associated with the snippet | []DatasetNestedEntity |

DatasetNestedEntity

| Field name | Description | Type |
|------------|-------------|------|
| config | A JSON-serialized map[string]string object which contains the emulator configuration | string |
| dataset | Dataset ID | Key |
| emulator | Emulator name. Currently only kafka is supported | string |
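Since the config field holds a JSON-serialized map[string]string, reading it back could look like the sketch below. The parseEmulatorConfig helper and the "topic" key are hypothetical and used only for illustration; the actual configuration keys are not listed in this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseEmulatorConfig is a hypothetical helper that decodes the JSON
// string stored in the config field into a map[string]string.
func parseEmulatorConfig(raw string) (map[string]string, error) {
	cfg := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return nil, fmt.Errorf("decode emulator config: %w", err)
	}
	return cfg, nil
}

func main() {
	// The "topic" key is illustrative only.
	cfg, err := parseEmulatorConfig(`{"topic": "example-topic"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("topic:", cfg["topic"])
}
```

Decoding into map[string]string rejects non-string values, so a malformed entry surfaces as an error instead of being silently dropped.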

pg_datasets

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Name of the dataset | Key |
| path | Path to the dataset file on the runner filesystem, under the path specified in the DATASETS_PATH environment variable (/opt/playground/backend/datasets by default) | string |

pg_files

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Constructed by concatenating the snippet ID, an underscore (_), and the ordinal number of the file in the snippet. For example, if snippet SDK_JAVA_Example has numberOfFiles set to 2, there will be two pg_files entities with keys SDK_JAVA_Example_0 and SDK_JAVA_Example_1. | Key |
| content | Content of the file | string |
| cntxLine | Line number on which the frontend initially places the text editor cursor when the file is displayed | int32 |
| isMain | Whether the file is the main file in the snippet. There can only be one main file in a snippet | bool |
| name | Name of the file which will be shown to the user by the frontend | string |
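The key scheme for pg_files can be sketched in a few lines of Go. The fileKeys helper is a hypothetical name; it only reproduces the naming rule described above (snippet ID, underscore, zero-based ordinal) and is not taken from the backend code.

```go
package main

import (
	"fmt"
	"strconv"
)

// fileKeys is a hypothetical helper that derives the pg_files key names
// for a snippet from its ID and its numberOfFiles value.
func fileKeys(snippetID string, numberOfFiles int) []string {
	if numberOfFiles <= 0 {
		return nil
	}
	keys := make([]string, 0, numberOfFiles)
	for i := 0; i < numberOfFiles; i++ {
		keys = append(keys, snippetID+"_"+strconv.Itoa(i))
	}
	return keys
}

func main() {
	for _, key := range fileKeys("SDK_JAVA_Example", 2) {
		fmt.Println(key)
	}
}
```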

pg_pc_objects

These entities contain pre-compiled (cached) outputs of examples. There are three types of precompiled objects:

  • OUTPUT, containing example's run output
  • LOG, containing example's log output
  • GRAPH, containing example's execution graph output

All of these precompiled objects share the same schema:

| Field name | Description | Type |
|------------|-------------|------|
| Name/ID | Key is constructed by concatenating the example's ID with the precompiled object type, e.g. SDK_GO_WordCount_OUTPUT, SDK_GO_WordCount_LOG, SDK_GO_WordCount_GRAPH | Key |
| content | Saved output of the example's run | string |
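The precompiled object key follows the same concatenation pattern. The sketch below uses a hypothetical pcObjectKey helper that accepts only the three types listed above; it illustrates the naming rule and is not the backend's own function.

```go
package main

import "fmt"

// pcObjectTypes lists the precompiled object types named in this document.
var pcObjectTypes = []string{"OUTPUT", "LOG", "GRAPH"}

// pcObjectKey is a hypothetical helper that builds a pg_pc_objects key
// from an example ID and a precompiled object type.
func pcObjectKey(exampleID, objType string) (string, error) {
	for _, t := range pcObjectTypes {
		if t == objType {
			return exampleID + "_" + objType, nil
		}
	}
	return "", fmt.Errorf("unknown precompiled object type %q", objType)
}

func main() {
	for _, t := range pcObjectTypes {
		key, err := pcObjectKey("SDK_GO_WordCount", t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(key)
	}
}
```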

Datastore indexes

Indexes are defined in the index.yaml file. The file is used during deployment to create the indexes in Datastore.

Redis

Playground uses Redis in three ways: as a cache for the examples catalog, to avoid re-enumerating all examples on each request; as temporary storage for example outputs (logs, graphs, etc.); and as a message bus to relay events such as a user request for pipeline cancellation.

Each pipeline run uses the pipeline ID as a Redis key, with the following subkeys (source):

| Key | Subkey | Description |
|-----|--------|-------------|
| Pipeline Id | STATUS | Pipeline status. Possible values can be found in api.proto in the Status enum. |
| Pipeline Id | RUN_OUTPUT | Pipeline run output. |
| Pipeline Id | RUN_ERROR | Pipeline run error message. |
| Pipeline Id | VALIDATION_OUTPUT | Pipeline validation step output. |
| Pipeline Id | PREPARATION_OUTPUT | Pipeline preparation step output. |
| Pipeline Id | COMPILE_OUTPUT | Pipeline compilation step output. |
| Pipeline Id | CANCELED | Used to signal that the user has requested pipeline cancellation. The runner periodically polls the cache to check whether this key has been set to true and cancels the pipeline if it has. |
| Pipeline Id | RUN_OUTPUT_INDEX | Index of the start of the run step's output. Upon each request for the pipeline execution logs, this value is set to the end of the returned log and used in subsequent requests to skip the already sent log fragment. |
| Pipeline Id | LOGS | Pipeline execution logs. |
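The way RUN_OUTPUT_INDEX lets repeated requests skip fragments that were already sent can be illustrated with a small sketch. It uses an in-memory map as a stand-in for Redis, and the cache and pipelineState types and the nextRunOutput method are hypothetical names; the real backend stores these values under Redis subkeys and may differ in detail.

```go
package main

import "fmt"

// pipelineState is an in-memory stand-in for two of the Redis subkeys
// described above.
type pipelineState struct {
	runOutput      string // corresponds to RUN_OUTPUT
	runOutputIndex int    // corresponds to RUN_OUTPUT_INDEX
}

// cache maps a pipeline ID to its state, standing in for Redis.
type cache map[string]*pipelineState

// nextRunOutput returns the part of the output that has not been sent
// yet and moves the index to the end of the returned fragment.
func (c cache) nextRunOutput(pipelineID string) (string, bool) {
	st, ok := c[pipelineID]
	if !ok {
		return "", false
	}
	if st.runOutputIndex > len(st.runOutput) {
		st.runOutputIndex = len(st.runOutput)
	}
	fragment := st.runOutput[st.runOutputIndex:]
	st.runOutputIndex = len(st.runOutput)
	return fragment, true
}

func main() {
	c := cache{"pipeline-1": &pipelineState{runOutput: "step 1 done\n"}}
	first, _ := c.nextRunOutput("pipeline-1")
	// The pipeline produces more output between two requests.
	c["pipeline-1"].runOutput += "step 2 done\n"
	second, _ := c.nextRunOutput("pipeline-1")
	fmt.Printf("%q %q\n", first, second)
}
```

Each call returns only what was appended since the previous call, so a client polling for output never receives the same fragment twice.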

Additionally, there are keys used globally by the Playground:

| Key | Subkey | Description |
|-----|--------|-------------|
| EXAMPLES_CATALOG | None | Used to store a cached version of the examples catalog. |
| SDKS_CATALOG | None | Used to store a cached version of the supported SDKs list, with the names of their default examples. |
| DEFAULT_PRECOMPILED_OBJECTS | Sdk | Used to store a default example's metadata in the cache. |