blob: cc095c3df98e6ef37001d307b72aa1aee78330db [file] [log] [blame] [view]
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Contribution Guide
This guide consists of:
- [Project structure](#project-structure)
- [Generated Files](#generated-files)
- [Code processing pipeline](#code-processing-pipeline)
- [How to add a new supported language](#how-to-add-a-new-supported-language)
See also:
- [Database schema](SCHEMA.md)
## Project structure
```
backend/
├── cmd
│ ├── migration_tool # tool to apply database migrations
│ ├── remove_unused_snippets # tool to remove old snippets manually
│ └── server # entry point to the backend application
├── configs # config files for each SDK
├── containers # set up and build backend docker images
├── datasets # datasets for examples using Kafka emulator
├── internal # backend logic
│ ├── api # generated grpc API files
│ ├── cache # logic for working with cache
│ ├── code_processing # logic for processing the received code
│ ├── components # backend components
│ ├── constants # code constants used in the application
│ ├── db # logic for working with database, e.g. the Cloud Datastore
│ ├── emulators # logic for starting various emulators, e.g. Kafka
│ ├── environment # tools for working with application environment settings
│ ├── errors # custom errors
│ ├── executors # logic used to run the user submitted code
│ ├── external_functions # logic for calling Google Cloud Functions
│ ├── fs_tool # logic for woking with filesystem operations during run preparation
│ ├── logger # cusotm logger
│ ├── preparers # logic for preparing the user submitted code before execution
│ ├── setup_tools # logic for setting up executors
│ ├── streaming # implementation of run output streamer
│ ├── tasks # periodic tasks scheduler
│ ├── tests # common testing logic
│ ├── utils # miscellaneous tools
│ └── validators # logic for pre-execution code validation
├── playground_functions # Google Cloud Functions for write access to the database
├── functions.go # entry point for Cloud Functions
├── go.mod # Go project build configuration
├── logging.properties # configuration for Java runner logger
├── new_scio_project.sh # script for creating new SCIO project, used by SCIO runner
└── properties.yaml # application properties
...
```
## Generated Files
All generated files (generated grpc API files, `go.sum`) should be published to the repository.
## Code processing pipeline
### Controller’s work
1. Backend receives a request for the `RunCode` API method
2. Backend checks that the SDK from the request matches the backend’s environment.
3. Backend generates the key of the code processing (`uuid` format), saves it to the cache, and sends it back to the
client.
4. Backend starts a new goroutine that processes the code from the client request.
### Code processing goroutine
1. Backend sets up a timeout for each code processing.
2. Backend starts a new goroutine to check that current code processing is still actual and hasn’t been canceled by the
client.
3. Validation of the received code.
4. Preparing the received code.
5. Compilation of the received code.
6. Execution of the received code.
Each step (`3-6` steps) is a separate goroutine and could be stopped if code processing has been canceled, or it takes
too much time.
After each step (even if it ends with failure) status of the code processing changes according to a finished step, so
the client clearly understands what is happening with the code processing at the moment.
Status, all outputs, and all error messages are placed to the one common cache, so even if there are several instances
it does not matter which instance process the code.
## How to add a new supported language
1. Add the language to [api.proto](../api/v1/api.proto) file:
```
enum Sdk {
SDK_UNSPECIFIED = 0;
SDK_JAVA = 1;
SDK_GO = 2;
SDK_PYTHON = 3;
SDK_SCIO = 4;
}
```
2. Create a new environment for a new language as [this one](containers/java)
3. Create a new config file for a new language as [this one](configs/SDK_JAVA.json)
4. Update a method to create file system according to a new language [here](internal/fs_tool/fs.go) (`NewLifeCycle()`
method)
5. Update a method to set up a file system according to a new
language [here](internal/setup_tools/life_cycle/life_cycle_setuper.go) (`Setup()` method)
6. Update a method to set up code validator according to a new language[here](internal/utils/validators_utils.go)
7. Update a method to set up code preparers according to a new language [here](internal/utils/preparators_utils.go)
8. Update a method to set up compiler according to a new
language [here](internal/setup_tools/builder/setup_builder.go) (`Compiler()` method)
9. Update a method to set up runner according to a new
language [here](internal/setup_tools/builder/setup_builder.go) (`Runner()` method)
10. Update a method to set up test runner according to a new
language [here](internal/setup_tools/builder/setup_builder.go) (`TestRunner()` method)
11. Update a method to compile client's code according to a new
language [here](internal/code_processing/code_processing.go) (`compileStep()` method)
12. Update a method to execute client's code according to a new
language [here](internal/code_processing/code_processing.go) (`runStep()` method)
## Adding an emulator-enabled example
1. Develop an example with an appropriate dataset
2. Put the dataset here in json or avro format as an array with objects: `playground/backend/datasets`
3. Put the example to the Apache Beam repository
4. Add a beam-playground comment to the example:
```yaml
beam-playground:
name: { example name }
description: { description }
multifile: { true | false }
context_line: { the line where the code starts }
categories:
- { category }
complexity: { BASIC | MEDIUM | ADVANCED }
tags:
- { tag }
emulators:
kafka:
topic:
id: { topic name }
dataset: { dataset_1 }
datasets:
{ dataset_1 }:
location: local
format: { json | avro }
```
5. Create a PR to the [Apache Beam Repository](https://github.com/apache/beam)