flink-python/docs/reference/pyflink.datastream/datastream.rst - flink - Git at Google

 .. ################################################################################
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
     limitations under the License.
    ################################################################################

 ==========
 DataStream
 ==========

 DataStream
 ----------

 A DataStream represents a stream of elements of the same type.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     DataStream.get_name
     DataStream.name
     DataStream.uid
     DataStream.set_uid_hash
     DataStream.set_parallelism
     DataStream.set_max_parallelism
     DataStream.get_type
     DataStream.get_execution_environment
     DataStream.force_non_parallel
     DataStream.set_buffer_timeout
     DataStream.start_new_chain
     DataStream.disable_chaining
     DataStream.slot_sharing_group
     DataStream.set_description
     DataStream.map
     DataStream.flat_map
     DataStream.key_by
     DataStream.filter
     DataStream.window_all
     DataStream.union
     DataStream.connect
     DataStream.shuffle
     DataStream.project
     DataStream.rescale
     DataStream.rebalance
     DataStream.forward
     DataStream.broadcast
     DataStream.process
     DataStream.assign_timestamps_and_watermarks
     DataStream.partition_custom
     DataStream.add_sink
     DataStream.sink_to
     DataStream.execute_and_collect
     DataStream.print
     DataStream.get_side_output
     DataStream.cache


 DataStreamSink
 --------------

 A Stream Sink. This is used for emitting elements from a streaming topology.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     DataStreamSink.name
     DataStreamSink.uid
     DataStreamSink.set_uid_hash
     DataStreamSink.set_parallelism
     DataStreamSink.set_description
     DataStreamSink.disable_chaining
     DataStreamSink.slot_sharing_group


 KeyedStream
 -----------

 A Stream Sink. This is used for emitting elements from a streaming topology.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     KeyedStream.map
     KeyedStream.flat_map
     KeyedStream.reduce
     KeyedStream.filter
     KeyedStream.sum
     KeyedStream.min
     KeyedStream.max
     KeyedStream.min_by
     KeyedStream.max_by
     KeyedStream.add_sink
     KeyedStream.key_by
     KeyedStream.process
     KeyedStream.window
     KeyedStream.count_window
     KeyedStream.union
     KeyedStream.connect
     KeyedStream.partition_custom
     KeyedStream.print


 CachedDataStream
 ----------------

 CachedDataStream represents a DataStream whose intermediate result will be cached at the first
 time when it is computed. And the cached intermediate result can be used in later job that using
 the same CachedDataStream to avoid re-computing the intermediate result.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     CachedDataStream.get_type
     CachedDataStream.get_execution_environment
     CachedDataStream.set_description
     CachedDataStream.map
     CachedDataStream.flat_map
     CachedDataStream.key_by
     CachedDataStream.filter
     CachedDataStream.window_all
     CachedDataStream.union
     CachedDataStream.connect
     CachedDataStream.shuffle
     CachedDataStream.project
     CachedDataStream.rescale
     CachedDataStream.rebalance
     CachedDataStream.forward
     CachedDataStream.broadcast
     CachedDataStream.process
     CachedDataStream.assign_timestamps_and_watermarks
     CachedDataStream.partition_custom
     CachedDataStream.add_sink
     CachedDataStream.sink_to
     CachedDataStream.execute_and_collect
     CachedDataStream.print
     CachedDataStream.get_side_output
     CachedDataStream.cache
     CachedDataStream.invalidate


 WindowedStream
 --------------

 A WindowedStream represents a data stream where elements are grouped by key, and for each
 key, the stream of elements is split into windows based on a WindowAssigner. Window emission
 is triggered based on a Trigger.

 The windows are conceptually evaluated for each key individually, meaning windows can trigger
 at different points for each key.

 Note that the WindowedStream is purely an API construct, during runtime the WindowedStream will
 be collapsed together with the KeyedStream and the operation over the window into one single
 operation.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     WindowedStream.get_execution_environment
     WindowedStream.get_input_type
     WindowedStream.trigger
     WindowedStream.allowed_lateness
     WindowedStream.side_output_late_data
     WindowedStream.reduce
     WindowedStream.aggregate
     WindowedStream.apply
     WindowedStream.process


 AllWindowedStream
 -----------------

 A AllWindowedStream represents a data stream where the stream of elements is split into windows
 based on a WindowAssigner. Window emission is triggered based on a Trigger.

 If an Evictor is specified it will be used to evict elements from the window after evaluation
 was triggered by the Trigger but before the actual evaluation of the window.
 When using an evictor, window performance will degrade significantly, since pre-aggregation of
 window results cannot be used.

 Note that the AllWindowedStream is purely an API construct, during runtime the AllWindowedStream
 will be collapsed together with the operation over the window into one single operation.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     AllWindowedStream.get_execution_environment
     AllWindowedStream.get_input_type
     AllWindowedStream.trigger
     AllWindowedStream.allowed_lateness
     AllWindowedStream.side_output_late_data
     AllWindowedStream.reduce
     AllWindowedStream.aggregate
     AllWindowedStream.apply
     AllWindowedStream.process


 ConnectedStreams
 ----------------

 ConnectedStreams represent two connected streams of (possibly) different data types.
 Connected streams are useful for cases where operations on one stream directly
 affect the operations on the other stream, usually via shared state between the streams.

 An example for the use of connected streams would be to apply rules that change over time
 onto another stream. One of the connected streams has the rules, the other stream the
 elements to apply the rules to. The operation on the connected stream maintains the
 current set of rules in the state. It may receive either a rule update and update the state
 or a data element and apply the rules in the state to the element.

 The connected stream can be conceptually viewed as a union stream of an Either type, that
 holds either the first stream's type or the second stream's type.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     ConnectedStreams.key_by
     ConnectedStreams.map
     ConnectedStreams.flat_map
     ConnectedStreams.process


 BroadcastStream
 ---------------

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     BroadcastStream


 BroadcastConnectedStream
 ------------------------

 A BroadcastConnectedStream represents the result of connecting a keyed or non-keyed stream, with
 a :class:`BroadcastStream` with :class:`~state.BroadcastState` (s). As in the case of
 :class:`ConnectedStreams` these streams are useful for cases where operations on one stream
 directly affect the operations on the other stream, usually via shared state between the
 streams.

 An example for the use of such connected streams would be to apply rules that change over time
 onto another, possibly keyed stream. The stream with the broadcast state has the rules, and will
 store them in the broadcast state, while the other stream will contain the elements to apply the
 rules to. By broadcasting the rules, these will be available in all parallel instances, and can
 be applied to all partitions of the other stream.

 .. currentmodule:: pyflink.datastream.data_stream

 .. autosummary::
     :toctree: api/

     BroadcastConnectedStream.process
	.. ################################################################################
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	################################################################################

	==========
	DataStream
	==========

	DataStream
	----------

	A DataStream represents a stream of elements of the same type.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	DataStream.get_name
	DataStream.name
	DataStream.uid
	DataStream.set_uid_hash
	DataStream.set_parallelism
	DataStream.set_max_parallelism
	DataStream.get_type
	DataStream.get_execution_environment
	DataStream.force_non_parallel
	DataStream.set_buffer_timeout
	DataStream.start_new_chain
	DataStream.disable_chaining
	DataStream.slot_sharing_group
	DataStream.set_description
	DataStream.map
	DataStream.flat_map
	DataStream.key_by
	DataStream.filter
	DataStream.window_all
	DataStream.union
	DataStream.connect
	DataStream.shuffle
	DataStream.project
	DataStream.rescale
	DataStream.rebalance
	DataStream.forward
	DataStream.broadcast
	DataStream.process
	DataStream.assign_timestamps_and_watermarks
	DataStream.partition_custom
	DataStream.add_sink
	DataStream.sink_to
	DataStream.execute_and_collect
	DataStream.print
	DataStream.get_side_output
	DataStream.cache


	DataStreamSink
	--------------

	A Stream Sink. This is used for emitting elements from a streaming topology.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	DataStreamSink.name
	DataStreamSink.uid
	DataStreamSink.set_uid_hash
	DataStreamSink.set_parallelism
	DataStreamSink.set_description
	DataStreamSink.disable_chaining
	DataStreamSink.slot_sharing_group


	KeyedStream
	-----------

	A Stream Sink. This is used for emitting elements from a streaming topology.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	KeyedStream.map
	KeyedStream.flat_map
	KeyedStream.reduce
	KeyedStream.filter
	KeyedStream.sum
	KeyedStream.min
	KeyedStream.max
	KeyedStream.min_by
	KeyedStream.max_by
	KeyedStream.add_sink
	KeyedStream.key_by
	KeyedStream.process
	KeyedStream.window
	KeyedStream.count_window
	KeyedStream.union
	KeyedStream.connect
	KeyedStream.partition_custom
	KeyedStream.print


	CachedDataStream
	----------------

	CachedDataStream represents a DataStream whose intermediate result will be cached at the first
	time when it is computed. And the cached intermediate result can be used in later job that using
	the same CachedDataStream to avoid re-computing the intermediate result.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	CachedDataStream.get_type
	CachedDataStream.get_execution_environment
	CachedDataStream.set_description
	CachedDataStream.map
	CachedDataStream.flat_map
	CachedDataStream.key_by
	CachedDataStream.filter
	CachedDataStream.window_all
	CachedDataStream.union
	CachedDataStream.connect
	CachedDataStream.shuffle
	CachedDataStream.project
	CachedDataStream.rescale
	CachedDataStream.rebalance
	CachedDataStream.forward
	CachedDataStream.broadcast
	CachedDataStream.process
	CachedDataStream.assign_timestamps_and_watermarks
	CachedDataStream.partition_custom
	CachedDataStream.add_sink
	CachedDataStream.sink_to
	CachedDataStream.execute_and_collect
	CachedDataStream.print
	CachedDataStream.get_side_output
	CachedDataStream.cache
	CachedDataStream.invalidate


	WindowedStream
	--------------

	A WindowedStream represents a data stream where elements are grouped by key, and for each
	key, the stream of elements is split into windows based on a WindowAssigner. Window emission
	is triggered based on a Trigger.

	The windows are conceptually evaluated for each key individually, meaning windows can trigger
	at different points for each key.

	Note that the WindowedStream is purely an API construct, during runtime the WindowedStream will
	be collapsed together with the KeyedStream and the operation over the window into one single
	operation.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	WindowedStream.get_execution_environment
	WindowedStream.get_input_type
	WindowedStream.trigger
	WindowedStream.allowed_lateness
	WindowedStream.side_output_late_data
	WindowedStream.reduce
	WindowedStream.aggregate
	WindowedStream.apply
	WindowedStream.process


	AllWindowedStream
	-----------------

	A AllWindowedStream represents a data stream where the stream of elements is split into windows
	based on a WindowAssigner. Window emission is triggered based on a Trigger.

	If an Evictor is specified it will be used to evict elements from the window after evaluation
	was triggered by the Trigger but before the actual evaluation of the window.
	When using an evictor, window performance will degrade significantly, since pre-aggregation of
	window results cannot be used.

	Note that the AllWindowedStream is purely an API construct, during runtime the AllWindowedStream
	will be collapsed together with the operation over the window into one single operation.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	AllWindowedStream.get_execution_environment
	AllWindowedStream.get_input_type
	AllWindowedStream.trigger
	AllWindowedStream.allowed_lateness
	AllWindowedStream.side_output_late_data
	AllWindowedStream.reduce
	AllWindowedStream.aggregate
	AllWindowedStream.apply
	AllWindowedStream.process


	ConnectedStreams
	----------------

	ConnectedStreams represent two connected streams of (possibly) different data types.
	Connected streams are useful for cases where operations on one stream directly
	affect the operations on the other stream, usually via shared state between the streams.

	An example for the use of connected streams would be to apply rules that change over time
	onto another stream. One of the connected streams has the rules, the other stream the
	elements to apply the rules to. The operation on the connected stream maintains the
	current set of rules in the state. It may receive either a rule update and update the state
	or a data element and apply the rules in the state to the element.

	The connected stream can be conceptually viewed as a union stream of an Either type, that
	holds either the first stream's type or the second stream's type.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	ConnectedStreams.key_by
	ConnectedStreams.map
	ConnectedStreams.flat_map
	ConnectedStreams.process


	BroadcastStream
	---------------

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	BroadcastStream


	BroadcastConnectedStream
	------------------------

	A BroadcastConnectedStream represents the result of connecting a keyed or non-keyed stream, with
	a :class:`BroadcastStream` with :class:`~state.BroadcastState` (s). As in the case of
	:class:`ConnectedStreams` these streams are useful for cases where operations on one stream
	directly affect the operations on the other stream, usually via shared state between the
	streams.

	An example for the use of such connected streams would be to apply rules that change over time
	onto another, possibly keyed stream. The stream with the broadcast state has the rules, and will
	store them in the broadcast state, while the other stream will contain the elements to apply the
	rules to. By broadcasting the rules, these will be available in all parallel instances, and can
	be applied to all partitions of the other stream.

	.. currentmodule:: pyflink.datastream.data_stream

	.. autosummary::
	:toctree: api/

	BroadcastConnectedStream.process