| ################################################################################ |
| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, software |
| # distributed under the License is distributed on an "AS IS" BASIS, |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| # See the License for the specific language governing permissions and |
| # limitations under the License. |
| ################################################################################ |
| |
| """ |
| Entry point classes of Flink DataStream API: |
| |
| - :class:`StreamExecutionEnvironment`: |
| The context in which a streaming program is executed. |
| - :class:`DataStream`: |
| Represents a stream of elements of the same type. A DataStream can be transformed |
| into another DataStream by applying a transformation. |
| - :class:`KeyedStream`: |
| Represents a :class:`DataStream` where elements are partitioned by key using a |
| provided KeySelector. |
| - :class:`WindowedStream`: |
| Represents a data stream where elements are grouped by key, and for each |
| key, the stream of elements is split into windows based on a WindowAssigner. Window emission |
| is triggered based on a Trigger. |
| - :class:`ConnectedStreams`: |
| Represent two connected streams of (possibly) different data types. Connected |
| streams are useful for cases where operations on one stream directly affect the operations on |
| the other stream, usually via shared state between the streams. |
| |
| Functions used to transform a :class:`DataStream` into another :class:`DataStream`: |
| |
| - :class:`MapFunction`: |
| Performs a map transformation of a :class:`DataStream` at element wise. |
| - :class:`CoMapFunction`: |
| Performs a map transformation over two connected streams. |
| - :class:`FlatMapFunction`: |
| Performs a flatmap transformation of a :class:`DataStream` which produces zero, one, or more |
| elements for each input element. |
| - :class:`CoFlatMapFunction`: |
| Performs a flatmap transformation over two connected streams. |
| - :class:`FilterFunction`: |
| A filter function is a predicate applied individually to each record. |
| - :class:`ReduceFunction`: |
| Combines groups of elements to a single value. |
| - :class:`ProcessFunction`: |
| Similar to :class:`FlatMapFunction`, except that it could access the current timestamp and |
| watermark in :class:`ProcessFunction`. |
| - :class:`KeyedProcessFunction`: |
| Similar to :class:`ProcessFunction`, except that it was applied to a :class:`KeyedStream` and |
| could register event-time and processing-time timers. |
| - :class:`CoProcessFunction`: |
| Similar to :class:`CoFlatMapFunction`, except that it could access the current timestamp and |
| watermark in :class:`CoProcessFunction`. |
| - :class:`KeyedCoProcessFunction`: |
| Similar to :class:`CoProcessFunction`, except that it was applied to a keyed |
| :class:`ConnectedStreams` and could register event-time and processing-time timers. |
| - :class:`WindowFunction`: |
| Base interface for functions that are evaluated over keyed (grouped) windows. |
| - :class:`ProcessWindowFunction`: |
| Similar to :class:`WindowFunction`, except that it could access a context for retrieving extra |
| information such as the current timestamp, the watermark, etc. |
| - :class:`AggregateFunction`: |
| Base class for a user-defined aggregate function. |
| - :class:`RuntimeContext`: |
| Contains information about the context in which functions are executed. Each |
| parallel instance of the function will have a context through which it can access static |
| contextual information (such as the current parallelism), etc. |
| |
| Classes to define window: |
| |
| - :class:`Window`: |
| A grouping of elements into finite buckets. |
| - :class:`TimeWindow`: |
| A grouping of elements according to a time interval from start (inclusive) to end (exclusive). |
| - :class:`CountWindow`: |
| A grouping of elements according to element count from start (inclusive) to end (exclusive). |
| - :class:`WindowAssigner`: |
| Assigns zero or more :class:`Window` to an element. |
| - :class:`MergingWindowAssigner`: |
| A :class:`WindowAssigner` that can merge windows. |
| - :class:`TriggerResult`: |
| Result type for trigger methods. This determines what happens with the window, for example |
| whether the window function should be called, or the window should be discarded. |
| - :class:`Trigger`: |
| Determines when a pane of a window should be evaluated to emit the results for that |
| part of the window. |
| |
| Classes to define the behavior of checkpoint and state backend: |
| |
| - :class:`CheckpointingMode`: |
| Defines what consistency guarantees the system gives in the presence of failures. |
| - :class:`CheckpointConfig`: |
| Configuration that captures all checkpointing related settings. |
| - :class:`StateBackend`: |
| Base class of the state backends which define how the state of a streaming application is |
| stored locally within the cluster. Different state backends store their state in different |
| fashions, and use different data structures to hold the state of a running application. |
| - :class:`HashMapStateBackend`: |
| Holds the working state in the memory (JVM heap) of the TaskManagers and |
| checkpoints based on the configured :class:`CheckpointStorage`. |
| - :class:`EmbeddedRocksDBStateBackend`: |
| Stores its state in an embedded `RocksDB` instance. This state backend can store very large |
| state that exceeds memory and spills to local disk. |
| - :class:`CustomStateBackend`: |
| A wrapper of customized java state backend. |
| - :class:`JobManagerCheckpointStorage`: |
| Checkpoints state directly to the JobManager's memory (hence the name), but savepoints will |
| be persisted to a file system. |
| - :class:`FileSystemCheckpointStorage`: |
| Checkpoints state as files to a file system. Each checkpoint individually will store all its |
| files in a subdirectory that includes the checkpoint number, such as |
| `hdfs://namenode:port/flink-checkpoints/chk-17/`. |
| - :class:`CustomCheckpointStorage`: |
| A wrapper of customized java checkpoint storage. |
| |
| Classes for state operations: |
| |
| - :class:`state.ValueState`: |
| Interface for partitioned single-value state. The value can be retrieved or updated. |
| - :class:`state.ListState`: |
| Interface for partitioned list state in Operations. The state is accessed and modified by |
| user functions, and checkpointed consistently by the system as part of the distributed |
| snapshots. |
| - :class:`state.MapState`: |
| Interface for partitioned key-value state. The key-value pair can be added, updated and |
| retrieved. |
| - :class:`state.ReducingState`: |
| Interface for reducing state. Elements can be added to the state, they will be combined using |
| a :class:`ReduceFunction`. The current state can be inspected. |
| - :class:`state.AggregatingState`: |
| Interface for aggregating state, based on an :class:`AggregateFunction`. Elements that are |
| added to this type of state will be eagerly pre-aggregated using a given AggregateFunction. |
| - :class:`state.StateTtlConfig`: |
| Configuration of state TTL logic. |
| |
| Classes to define source & sink: |
| |
| - :class:`connectors.FlinkKafkaConsumer`: |
| A streaming data source that pulls a parallel data stream from Apache Kafka. |
| - :class:`connectors.FlinkKafkaProducer`: |
| A streaming data sink to produce data into a Kafka topic. |
| - :class:`connectors.FileSource`: |
| A unified data source that reads files - both in batch and in streaming mode. |
| This source supports all (distributed) file systems and object stores that can be accessed via |
| the Flink's FileSystem class. |
| - :class:`connectors.FileSink`: |
| A unified sink that emits its input elements to FileSystem files within buckets. This |
| sink achieves exactly-once semantics for both BATCH and STREAMING. |
| - :class:`connectors.NumberSequenceSource`: |
| A data source that produces a sequence of numbers (longs). This source is useful for testing |
| and for cases that just need a stream of N events of any kind. |
| - :class:`connectors.JdbcSink`: |
| A data sink to produce data into an external storage using JDBC. |
| - :class:`connectors.StreamingFileSink`: |
| Sink that emits its input elements to files within buckets. This is integrated with the |
| checkpointing mechanism to provide exactly once semantics. |
| |
| Other important classes: |
| |
| - :class:`TimeCharacteristic`: |
| Defines how the system determines time for time-dependent order and operations that depend |
| on time (such as time windows). |
| - :class:`TimeDomain`: |
| Specifies whether a firing timer is based on event time or processing time. |
| - :class:`KeySelector`: |
| The extractor takes an object and returns the deterministic key for that object. |
| - :class:`Partitioner`: |
| Function to implement a custom partition assignment for keys. |
| - :class:`SinkFunction`: |
| Interface for implementing user defined sink functionality. |
| - :class:`SourceFunction`: |
| Interface for implementing user defined source functionality. |
| """ |
| from pyflink.datastream.checkpoint_config import CheckpointConfig, ExternalizedCheckpointCleanup |
| from pyflink.datastream.checkpointing_mode import CheckpointingMode |
| from pyflink.datastream.data_stream import DataStream, KeyedStream, WindowedStream, \ |
| ConnectedStreams, DataStreamSink |
| from pyflink.datastream.execution_mode import RuntimeExecutionMode |
| from pyflink.datastream.functions import (MapFunction, CoMapFunction, FlatMapFunction, |
| CoFlatMapFunction, ReduceFunction, RuntimeContext, |
| KeySelector, FilterFunction, Partitioner, SourceFunction, |
| SinkFunction, CoProcessFunction, KeyedProcessFunction, |
| KeyedCoProcessFunction, AggregateFunction, WindowFunction, |
| ProcessWindowFunction) |
| from pyflink.datastream.slot_sharing_group import SlotSharingGroup, MemorySize |
| from pyflink.datastream.state_backend import (StateBackend, MemoryStateBackend, FsStateBackend, |
| RocksDBStateBackend, CustomStateBackend, |
| PredefinedOptions, HashMapStateBackend, |
| EmbeddedRocksDBStateBackend) |
| from pyflink.datastream.checkpoint_storage import (CheckpointStorage, JobManagerCheckpointStorage, |
| FileSystemCheckpointStorage, |
| CustomCheckpointStorage) |
| from pyflink.datastream.stream_execution_environment import StreamExecutionEnvironment |
| from pyflink.datastream.time_characteristic import TimeCharacteristic |
| from pyflink.datastream.time_domain import TimeDomain |
| from pyflink.datastream.functions import ProcessFunction |
| from pyflink.datastream.timerservice import TimerService |
| from pyflink.datastream.window import Window, TimeWindow, CountWindow, WindowAssigner, \ |
| MergingWindowAssigner, TriggerResult, Trigger |
| |
| __all__ = [ |
| 'StreamExecutionEnvironment', |
| 'DataStream', |
| 'KeyedStream', |
| 'WindowedStream', |
| 'ConnectedStreams', |
| 'DataStreamSink', |
| 'MapFunction', |
| 'CoMapFunction', |
| 'FlatMapFunction', |
| 'CoFlatMapFunction', |
| 'ReduceFunction', |
| 'FilterFunction', |
| 'ProcessFunction', |
| 'KeyedProcessFunction', |
| 'CoProcessFunction', |
| 'KeyedCoProcessFunction', |
| 'WindowFunction', |
| 'ProcessWindowFunction', |
| 'AggregateFunction', |
| 'RuntimeContext', |
| 'TimerService', |
| 'CheckpointingMode', |
| 'CheckpointConfig', |
| 'ExternalizedCheckpointCleanup', |
| 'StateBackend', |
| 'HashMapStateBackend', |
| 'EmbeddedRocksDBStateBackend', |
| 'CustomStateBackend', |
| 'MemoryStateBackend', |
| 'RocksDBStateBackend', |
| 'FsStateBackend', |
| 'PredefinedOptions', |
| 'CheckpointStorage', |
| 'JobManagerCheckpointStorage', |
| 'FileSystemCheckpointStorage', |
| 'CustomCheckpointStorage', |
| 'RuntimeExecutionMode', |
| 'Window', |
| 'TimeWindow', |
| 'CountWindow', |
| 'WindowAssigner', |
| 'MergingWindowAssigner', |
| 'TriggerResult', |
| 'Trigger', |
| 'TimeCharacteristic', |
| 'TimeDomain', |
| 'KeySelector', |
| 'Partitioner', |
| 'SourceFunction', |
| 'SinkFunction', |
| 'SlotSharingGroup', |
| 'MemorySize' |
| ] |