playground/backend/CONTRIBUTE.md - beam - Git at Google

 <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
 -->

 # Contribution Guide

 This guide consists of:

 - [Project structure](#project-structure)
 - [Generated Files](#generated-files)
 - [Code processing pipeline](#code-processing-pipeline)
 - [How to add a new supported language](#how-to-add-a-new-supported-language)

 See also:
 - [Database schema](SCHEMA.md)

 ## Project structure

 ```
 backend/
 ├── cmd
 │   ├── migration_tool           # tool to apply database migrations
 │   ├── remove_unused_snippets   # tool to remove old snippets manually
 │   └── server                   # entry point to the backend application
 ├── configs                      # config files for each SDK
 ├── containers                   # set up and build backend docker images
 ├── datasets                     # datasets for examples using Kafka emulator
 ├── internal                     # backend logic
 │   ├── api                      # generated grpc API files
 │   ├── cache                    # logic for working with cache
 │   ├── code_processing          # logic for processing the received code
 │   ├── components               # backend components
 │   ├── constants                # code constants used in the application
 │   ├── db                       # logic for working with database, e.g. the Cloud Datastore
 │   ├── emulators                # logic for starting various emulators, e.g. Kafka
 │   ├── environment              # tools for working with application environment settings
 │   ├── errors                   # custom errors
 │   ├── executors                # logic used to run the user submitted code
 │   ├── external_functions       # logic for calling Google Cloud Functions
 │   ├── fs_tool                  # logic for woking with filesystem operations during run preparation
 │   ├── logger                   # cusotm logger
 │   ├── preparers                # logic for preparing the user submitted code before execution
 │   ├── setup_tools              # logic for setting up executors
 │   ├── streaming                # implementation of run output streamer
 │   ├── tasks                    # periodic tasks scheduler
 │   ├── tests                    # common testing logic
 │   ├── utils                    # miscellaneous tools
 │   └── validators               # logic for pre-execution code validation
 ├── playground_functions         # Google Cloud Functions for write access to the database
 ├── functions.go                 # entry point for Cloud Functions
 ├── go.mod                       # Go project build configuration
 ├── logging.properties           # configuration for Java runner logger
 ├── new_scio_project.sh          # script for creating new SCIO project, used by SCIO runner
 └── properties.yaml              # application properties
 ...
 ```

 ## Generated Files

 All generated files (generated grpc API files, `go.sum`) should be published to the repository.

 ## Code processing pipeline

 ### Controller’s work

 1. Backend receives a request for the `RunCode` API method
 2. Backend checks that the SDK from the request matches the backend’s environment.
 3. Backend generates the key of the code processing (`uuid` format), saves it to the cache, and sends it back to the
    client.
 4. Backend starts a new goroutine that processes the code from the client request.

 ### Code processing goroutine

 1. Backend sets up a timeout for each code processing.
 2. Backend starts a new goroutine to check that current code processing is still actual and hasn’t been canceled by the
    client.
 3. Validation of the received code.
 4. Preparing the received code.
 5. Compilation of the received code.
 6. Execution of the received code.

 Each step (`3-6` steps) is a separate goroutine and could be stopped if code processing has been canceled, or it takes
 too much time.

 After each step (even if it ends with failure) status of the code processing changes according to a finished step, so
 the client clearly understands what is happening with the code processing at the moment.

 Status, all outputs, and all error messages are placed to the one common cache, so even if there are several instances
 it does not matter which instance process the code.

 ## How to add a new supported language

 1. Add the language to [api.proto](../api/v1/api.proto) file:

 ```
 enum Sdk {
   SDK_UNSPECIFIED = 0;
   SDK_JAVA = 1;
   SDK_GO = 2;
   SDK_PYTHON = 3;
   SDK_SCIO = 4;
 }
 ```

 2. Create a new environment for a new language as [this one](containers/java)
 3. Create a new config file for a new language as [this one](configs/SDK_JAVA.json)
 4. Update a method to create file system according to a new language [here](internal/fs_tool/fs.go) (`NewLifeCycle()`
    method)
 5. Update a method to set up a file system according to a new
    language [here](internal/setup_tools/life_cycle/life_cycle_setuper.go) (`Setup()` method)
 6. Update a method to set up code validator according to a new language[here](internal/utils/validators_utils.go)
 7. Update a method to set up code preparers according to a new language [here](internal/utils/preparators_utils.go)
 8. Update a method to set up compiler according to a new
    language [here](internal/setup_tools/builder/setup_builder.go) (`Compiler()` method)
 9. Update a method to set up runner according to a new
    language [here](internal/setup_tools/builder/setup_builder.go) (`Runner()` method)
 10. Update a method to set up test runner according to a new
     language [here](internal/setup_tools/builder/setup_builder.go) (`TestRunner()` method)
 11. Update a method to compile client's code according to a new
     language [here](internal/code_processing/code_processing.go) (`compileStep()` method)
 12. Update a method to execute client's code according to a new
     language [here](internal/code_processing/code_processing.go) (`runStep()` method)

 ## Adding an emulator-enabled example
 1. Develop an example with an appropriate dataset
 2. Put the dataset here in json or avro format as an array with objects: `playground/backend/datasets`
 3. Put the example to the Apache Beam repository
 4. Add a beam-playground comment to the example:
 ```yaml
  beam-playground:
    name: { example name }
    description: { description }
    multifile: { true | false }
    context_line: { the line where the code starts }
    categories:
      - { category }
    complexity: { BASIC | MEDIUM | ADVANCED }
    tags:
      - { tag }
    emulators:
      kafka:
         topic:
           id: { topic name }
           dataset: { dataset_1 }
    datasets:
      { dataset_1 }:
           location: local
           format: { json | avro }
 ```
 5. Create a PR to the [Apache Beam Repository](https://github.com/apache/beam)
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	# Contribution Guide

	This guide consists of:

	- [Project structure](#project-structure)
	- [Generated Files](#generated-files)
	- [Code processing pipeline](#code-processing-pipeline)
	- [How to add a new supported language](#how-to-add-a-new-supported-language)

	See also:
	- [Database schema](SCHEMA.md)

	## Project structure

	```
	backend/
	├── cmd
	│ ├── migration_tool # tool to apply database migrations
	│ ├── remove_unused_snippets # tool to remove old snippets manually
	│ └── server # entry point to the backend application
	├── configs # config files for each SDK
	├── containers # set up and build backend docker images
	├── datasets # datasets for examples using Kafka emulator
	├── internal # backend logic
	│ ├── api # generated grpc API files
	│ ├── cache # logic for working with cache
	│ ├── code_processing # logic for processing the received code
	│ ├── components # backend components
	│ ├── constants # code constants used in the application
	│ ├── db # logic for working with database, e.g. the Cloud Datastore
	│ ├── emulators # logic for starting various emulators, e.g. Kafka
	│ ├── environment # tools for working with application environment settings
	│ ├── errors # custom errors
	│ ├── executors # logic used to run the user submitted code
	│ ├── external_functions # logic for calling Google Cloud Functions
	│ ├── fs_tool # logic for woking with filesystem operations during run preparation
	│ ├── logger # cusotm logger
	│ ├── preparers # logic for preparing the user submitted code before execution
	│ ├── setup_tools # logic for setting up executors
	│ ├── streaming # implementation of run output streamer
	│ ├── tasks # periodic tasks scheduler
	│ ├── tests # common testing logic
	│ ├── utils # miscellaneous tools
	│ └── validators # logic for pre-execution code validation
	├── playground_functions # Google Cloud Functions for write access to the database
	├── functions.go # entry point for Cloud Functions
	├── go.mod # Go project build configuration
	├── logging.properties # configuration for Java runner logger
	├── new_scio_project.sh # script for creating new SCIO project, used by SCIO runner
	└── properties.yaml # application properties
	...
	```

	## Generated Files

	All generated files (generated grpc API files, `go.sum`) should be published to the repository.

	## Code processing pipeline

	### Controller’s work

	1. Backend receives a request for the `RunCode` API method
	2. Backend checks that the SDK from the request matches the backend’s environment.
	3. Backend generates the key of the code processing (`uuid` format), saves it to the cache, and sends it back to the
	client.
	4. Backend starts a new goroutine that processes the code from the client request.

	### Code processing goroutine

	1. Backend sets up a timeout for each code processing.
	2. Backend starts a new goroutine to check that current code processing is still actual and hasn’t been canceled by the
	client.
	3. Validation of the received code.
	4. Preparing the received code.
	5. Compilation of the received code.
	6. Execution of the received code.

	Each step (`3-6` steps) is a separate goroutine and could be stopped if code processing has been canceled, or it takes
	too much time.

	After each step (even if it ends with failure) status of the code processing changes according to a finished step, so
	the client clearly understands what is happening with the code processing at the moment.

	Status, all outputs, and all error messages are placed to the one common cache, so even if there are several instances
	it does not matter which instance process the code.

	## How to add a new supported language

	1. Add the language to [api.proto](../api/v1/api.proto) file:

	```
	enum Sdk {
	SDK_UNSPECIFIED = 0;
	SDK_JAVA = 1;
	SDK_GO = 2;
	SDK_PYTHON = 3;
	SDK_SCIO = 4;
	}
	```

	2. Create a new environment for a new language as [this one](containers/java)
	3. Create a new config file for a new language as [this one](configs/SDK_JAVA.json)
	4. Update a method to create file system according to a new language [here](internal/fs_tool/fs.go) (`NewLifeCycle()`
	method)
	5. Update a method to set up a file system according to a new
	language [here](internal/setup_tools/life_cycle/life_cycle_setuper.go) (`Setup()` method)
	6. Update a method to set up code validator according to a new language[here](internal/utils/validators_utils.go)
	7. Update a method to set up code preparers according to a new language [here](internal/utils/preparators_utils.go)
	8. Update a method to set up compiler according to a new
	language [here](internal/setup_tools/builder/setup_builder.go) (`Compiler()` method)
	9. Update a method to set up runner according to a new
	language [here](internal/setup_tools/builder/setup_builder.go) (`Runner()` method)
	10. Update a method to set up test runner according to a new
	language [here](internal/setup_tools/builder/setup_builder.go) (`TestRunner()` method)
	11. Update a method to compile client's code according to a new
	language [here](internal/code_processing/code_processing.go) (`compileStep()` method)
	12. Update a method to execute client's code according to a new
	language [here](internal/code_processing/code_processing.go) (`runStep()` method)

	## Adding an emulator-enabled example
	1. Develop an example with an appropriate dataset
	2. Put the dataset here in json or avro format as an array with objects: `playground/backend/datasets`
	3. Put the example to the Apache Beam repository
	4. Add a beam-playground comment to the example:
	```yaml
	beam-playground:
	name: { example name }
	description: { description }
	multifile: { true \| false }
	context_line: { the line where the code starts }
	categories:
	- { category }
	complexity: { BASIC \| MEDIUM \| ADVANCED }
	tags:
	- { tag }
	emulators:
	kafka:
	topic:
	id: { topic name }
	dataset: { dataset_1 }
	datasets:
	{ dataset_1 }:
	location: local
	format: { json \| avro }
	```
	5. Create a PR to the [Apache Beam Repository](https://github.com/apache/beam)