Contribution Guide

Project structure

backend/
├── cmd
│   ├── migration_tool           # tool to apply database migrations
│   ├── remove_unused_snippets   # tool to remove old snippets manually
│   └── server                   # entry point to the backend application
├── configs                      # config files for each SDK
├── containers                   # set up and build backend docker images
├── datasets                     # datasets for examples using Kafka emulator
├── internal                     # backend logic
│   ├── api                      # generated grpc API files
│   ├── cache                    # logic for working with cache
│   ├── code_processing          # logic for processing the received code
│   ├── components               # backend components
│   ├── constants                # code constants used in the application
│   ├── db                       # logic for working with database, e.g. the Cloud Datastore
│   ├── emulators                # logic for starting various emulators, e.g. Kafka
│   ├── environment              # tools for working with application environment settings
│   ├── errors                   # custom errors
│   ├── executors                # logic used to run the user submitted code
│   ├── external_functions       # logic for calling Google Cloud Functions
│   ├── fs_tool                  # logic for working with filesystem operations during run preparation
│   ├── logger                   # custom logger
│   ├── preparers                # logic for preparing the user submitted code before execution
│   ├── setup_tools              # logic for setting up executors
│   ├── streaming                # implementation of run output streamer
│   ├── tasks                    # periodic tasks scheduler
│   ├── tests                    # common testing logic
│   ├── utils                    # miscellaneous tools
│   └── validators               # logic for pre-execution code validation
├── playground_functions         # Google Cloud Functions for write access to the database
├── functions.go                 # entry point for Cloud Functions
├── go.mod                       # Go project build configuration
├── logging.properties           # configuration for Java runner logger
├── new_scio_project.sh          # script for creating new SCIO project, used by SCIO runner
└── properties.yaml              # application properties
...

Generated Files

All generated files (the generated gRPC API files and go.sum) should be committed to the repository.

Code processing pipeline

Controller’s work

  1. Backend receives a request for the RunCode API method.
  2. Backend checks that the SDK from the request matches the backend’s environment.
  3. Backend generates a key for the code processing (in UUID format), saves it to the cache, and sends it back to the client.
  4. Backend starts a new goroutine that processes the code from the client request.
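The controller steps above can be sketched roughly as follows. This is a minimal illustration and not the backend's actual code: the statusCache type, the runCode and newPipelineID functions, and the "received" status value are hypothetical names introduced for this sketch, and an in-memory map stands in for the real cache.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// Sdk mirrors a few values of the Sdk enum from api.proto.
type Sdk int

const (
	SdkUnspecified Sdk = iota
	SdkJava
	SdkGo
)

// statusCache is a hypothetical in-memory stand-in for the shared cache.
type statusCache struct {
	mu     sync.Mutex
	values map[string]string
}

func newStatusCache() *statusCache {
	return &statusCache{values: make(map[string]string)}
}

func (c *statusCache) set(id, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[id] = status
}

func (c *statusCache) get(id string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.values[id]
	return v, ok
}

// newPipelineID returns a random version-4 UUID string.
func newPipelineID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// runCode follows the four controller steps: check the SDK, create a key,
// store it in the cache, and hand the code to a background goroutine.
func runCode(serverSdk, requestSdk Sdk, code string, cache *statusCache, process func(id, code string)) (string, error) {
	if requestSdk != serverSdk {
		return "", fmt.Errorf("request SDK %d does not match server SDK %d", requestSdk, serverSdk)
	}
	id, err := newPipelineID()
	if err != nil {
		return "", err
	}
	cache.set(id, "received") // hypothetical initial status
	go process(id, code)      // processing continues after the key is returned
	return id, nil
}

func main() {
	cache := newStatusCache()
	done := make(chan struct{})
	id, err := runCode(SdkGo, SdkGo, "package main", cache, func(id, code string) {
		cache.set(id, "processing")
		close(done)
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	<-done // wait for the background goroutine in this demo only
	status, _ := cache.get(id)
	fmt.Println(id, status)
}
```

Returning the key before processing finishes is what lets the client poll for status and output while the code is still running.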

Code processing goroutine

  1. Backend sets up a timeout for the code processing.
  2. Backend starts a new goroutine to check that the current code processing is still relevant and hasn’t been canceled by the client.
  3. Validation of the received code.
  4. Preparing the received code.
  5. Compilation of the received code.
  6. Execution of the received code.

Each of steps 3-6 runs in a separate goroutine and can be stopped if the code processing is canceled or takes too long.

After each step, even one that ends in failure, the status of the code processing is updated to reflect the finished step, so the client always knows what stage the code processing has reached.

The status, all outputs, and all error messages are stored in one common cache, so when several backend instances are running it does not matter which instance processes the code.

How to add a new supported language

  1. Add the language to the Sdk enum in the api.proto file:
enum Sdk {
  SDK_UNSPECIFIED = 0;
  SDK_JAVA = 1;
  SDK_GO = 2;
  SDK_PYTHON = 3;
  SDK_SCIO = 4;
}
  2. Create a new environment for the new language, like this one
  3. Create a new config file for the new language, like this one
  4. Update the method that creates the file system for the new language here (NewLifeCycle() method)
  5. Update the method that sets up the file system for the new language here (Setup() method)
  6. Update the method that sets up the code validator for the new language here
  7. Update the method that sets up the code preparers for the new language here
  8. Update the method that sets up the compiler for the new language here (Compiler() method)
  9. Update the method that sets up the runner for the new language here (Runner() method)
  10. Update the method that sets up the test runner for the new language here (TestRunner() method)
  11. Update the method that compiles the client's code for the new language here (compileStep() method)
  12. Update the method that executes the client's code for the new language here (runStep() method)
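Most of the updates listed above amount to adding a case for the new Sdk value wherever the backend branches on the SDK. The sketch below shows that pattern on a deliberately small example: sourceExtension is a hypothetical helper invented for illustration, and the Go constants only mirror the enum values from api.proto rather than using the generated code.

```go
package main

import "fmt"

// Sdk mirrors the Sdk enum from api.proto for this sketch.
type Sdk int

const (
	SdkUnspecified Sdk = iota
	SdkJava
	SdkGo
	SdkPython
	SdkScio
)

// sourceExtension returns the source file extension for an SDK.
// A newly added SDK needs its own case here; without one the default
// branch reports it as unsupported instead of failing silently.
func sourceExtension(sdk Sdk) (string, error) {
	switch sdk {
	case SdkJava:
		return ".java", nil
	case SdkGo:
		return ".go", nil
	case SdkPython:
		return ".py", nil
	case SdkScio:
		return ".scala", nil
	default:
		return "", fmt.Errorf("unsupported SDK %d", sdk)
	}
}

func main() {
	for _, sdk := range []Sdk{SdkJava, SdkGo, SdkPython, SdkScio, SdkUnspecified} {
		ext, err := sourceExtension(sdk)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(sdk, ext)
	}
}
```

Returning an error from the default branch makes a missed case visible the first time the new language is used.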

Adding an emulator-enabled example

  1. Develop an example with an appropriate dataset
  2. Put the dataset, in JSON or Avro format as an array of objects, here: playground/backend/datasets
  3. Add the example to the Apache Beam repository
  4. Add a beam-playground comment to the example:
 beam-playground:
   name: { example name }
   description: { description }
   multifile: { true | false }
   context_line: { the line where the code starts }
   categories:
     - { category }
   complexity: { BASIC | MEDIUM | ADVANCED }
   tags:
     - { tag }
   emulators:
     kafka:
        topic:
          id: { topic name }
          dataset: { dataset_1 }
   datasets:
     { dataset_1 }:
          location: local
          format: { json | avro }
  5. Create a PR to the Apache Beam repository
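Before opening the PR, it can help to confirm that a JSON dataset really is an array of objects, as step 2 requires. The following is a minimal sketch of such a check, not part of the backend: validateJSONDataset is a hypothetical helper and the sample records are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateJSONDataset checks that raw is a JSON array whose elements
// are all objects and returns the number of records.
func validateJSONDataset(raw []byte) (int, error) {
	var records []map[string]json.RawMessage
	if err := json.Unmarshal(raw, &records); err != nil {
		return 0, fmt.Errorf("dataset must be a JSON array of objects: %w", err)
	}
	if records == nil {
		return 0, fmt.Errorf("dataset is null, want a JSON array of objects")
	}
	for i, r := range records {
		if r == nil {
			return 0, fmt.Errorf("element %d is null, want an object", i)
		}
	}
	return len(records), nil
}

func main() {
	// Made-up sample data for illustration only.
	good := []byte(`[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]`)
	bad := []byte(`{"id": 1, "name": "a"}`)

	n, err := validateJSONDataset(good)
	fmt.Println(n, err)

	_, err = validateJSONDataset(bad)
	fmt.Println(err != nil)
}
```

Avro datasets would need a separate check with an Avro library, which this sketch does not cover.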