{% include code-style-navbar.md %}
{% toc %}
Additional guidelines about changes in specific components.
## Configuration Changes

### Where should the config option go?
- `flink-conf.yaml`: All configuration that pertains to execution behavior that one may want to standardize across jobs. Think of it as parameters someone would set wearing an "ops" hat, or someone who provides a stream processing platform to other teams.
- `ExecutionConfig`: Parameters specific to an individual Flink application, needed by the operators during execution. Typical examples are the watermark interval, serializer parameters, and object reuse.
- `ExecutionEnvironment` (in code): Everything that is specific to an individual Flink application and is only needed to build the program/dataflow, but not needed inside the operators during execution.
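As an illustration of the split, an "ops hat" setting lands in `flink-conf.yaml` (key names here reuse the ones from the naming example in this guide; they are illustrative, not an exhaustive list):

```yaml
# Cluster-wide settings, standardized across jobs in flink-conf.yaml:
taskmanager.jvm-exit-on-oom: true
taskmanager.network.detailed-metrics: false
```

A per-application setting such as the watermark interval, by contrast, belongs in the job's `ExecutionConfig` rather than in the cluster configuration.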
### How to name config keys
- Config key names should be hierarchical. Think of the configuration as nested objects (JSON style):

  ```
  taskmanager: {
      jvm-exit-on-oom: true,
      network: {
          detailed-metrics: false,
          request-backoff: {
              initial: 100,
              max: 10000
          },
          memory: {
              fraction: 0.1,
              min: 64MB,
              max: 1GB,
              buffers-per-channel: 2,
              floating-buffers-per-gate: 16
          }
      }
  }
  ```

- The resulting config keys should hence be:
  - **not** `"taskmanager.detailed.network.metrics"`
  - but rather `"taskmanager.network.detailed-metrics"`
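To make the rule concrete, the following self-contained sketch (illustration only, not Flink API) shows how flattening such a nested tree mechanically yields the hierarchical key names; the class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration: hierarchical config keys are the flattened paths of a nested tree. */
public class ConfigKeyFlattener {

    /** Recursively joins nested map keys with '.' into flat config keys. */
    static Map<String, Object> flatten(String prefix, Map<String, Object> tree) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : tree.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                flat.putAll(flatten(key, child));
            } else {
                flat.put(key, e.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> network = new LinkedHashMap<>();
        network.put("detailed-metrics", false);
        Map<String, Object> taskmanager = new LinkedHashMap<>();
        taskmanager.put("jvm-exit-on-oom", true);
        taskmanager.put("network", network);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("taskmanager", taskmanager);

        // Yields "taskmanager.network.detailed-metrics",
        // never "taskmanager.detailed.network.metrics".
        System.out.println(flatten("", root));
    }
}
```

Note how the group ("network") always precedes the option name ("detailed-metrics") in the flattened key.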
## Connectors

Connectors are historically hard to implement and need to deal with many aspects of threading, concurrency, and checkpointing.
As part of FLIP-27 we are working on making this much simpler for sources. New sources should not have to deal with any aspect of concurrency/threading and checkpointing any more.
A similar FLIP can be expected for sinks in the near future.
## Examples

Examples should be self-contained and not require systems other than Flink to run, except for examples that show how to use specific connectors, like the Kafka connector. Sources/sinks that are fine to use are `StreamExecutionEnvironment.socketTextStream`, which should not be used in production but is quite handy for exploring how things work, and file-based sources/sinks. (For streaming, there is the continuous file source.)
Examples should also not be pure toy examples but strike a balance between real-world code and purely abstract examples. The WordCount example is quite long in the tooth by now, but it's a good showcase of simple code that highlights functionality and can do useful things.
Examples should also be heavy in comments. They should describe the general idea of the example in the class-level Javadoc and describe what is happening and what functionality is used throughout the code. The expected input data and output data should also be described.
Examples should include parameter parsing, so that you can run an example (from the JAR that is created for each example) using `bin/flink run path/to/myExample.jar --param1 … --param2 …`.
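Flink ships utilities for this, but the idea of parsing `--name value` pairs can be sketched in a few lines. The class and method below are hypothetical helpers for illustration, not the Flink API:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "--name value" argument parsing for an example job. */
public class SimpleParams {

    /** Collects consecutive "--name value" pairs from the argument array. */
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length - 1; i += 2) {
            if (args[i].startsWith("--")) {
                params.put(args[i].substring(2), args[i + 1]);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params =
                fromArgs(new String[] {"--input", "/tmp/in.txt", "--parallelism", "4"});
        // An example job would then read its settings from the parsed map.
        System.out.println(params.get("input"));       // /tmp/in.txt
        System.out.println(params.get("parallelism")); // 4
    }
}
```

With such parsing in place, the same example JAR can be pointed at different inputs without recompiling.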
## Table & SQL API

- The SQL standard should be the main source of truth.
- Discuss divergence from the standard or vendor-specific interpretations.
- Consider the Table API as a bridge between the SQL and the Java/Scala programming world.
- When contributing a function such as `SHIFT_LEFT`, make sure that the contribution is general enough: it should work not only for `INT` but also for `BIGINT` or `TINYINT`.
- Test for nullability. SQL supports `NULL` for almost every operation and has a 3-valued boolean logic.
- Avoid full integration tests.
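The first two points can be sketched together. The following is a hypothetical illustration, not Flink's actual implementation: a `SHIFT_LEFT`-style function that covers the integer family (`TINYINT` through `BIGINT`) via one shared code path and propagates SQL `NULL`, modeled here with boxed Java types:

```java
/**
 * Hypothetical sketch: a SHIFT_LEFT-style function that is general across
 * SQL integer types and NULL-safe (NULL in, NULL out).
 */
public class ShiftLeftSketch {

    // BIGINT variant; a NULL operand yields NULL, as SQL semantics require.
    static Long shiftLeft(Long value, Integer n) {
        if (value == null || n == null) {
            return null;
        }
        return value << n;
    }

    // TINYINT variant reuses the general logic instead of a separate code path.
    static Byte shiftLeft(Byte value, Integer n) {
        Long r = shiftLeft(value == null ? null : value.longValue(), n);
        return r == null ? null : r.byteValue();
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(Long.valueOf(1L), 3));       // 8
        System.out.println(shiftLeft(Byte.valueOf((byte) 1), 2)); // 4
        System.out.println(shiftLeft((Long) null, 3));            // null
    }
}
```

Note that the `TINYINT` overload delegates to the general one, so supporting an additional width does not duplicate the shifting logic or its `NULL` handling.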
- Don't introduce physical plan changes in minor releases!
- Keep Java in mind when designing interfaces.