blob: 0f2101271f3c35b102d5cc01a5012b8280a8ceab [file] [log] [blame]
.. ################################################################################
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
################################################################################
=====
Table
=====
Table
=====
A :class:`~pyflink.table.Table` object is the core abstraction of the Table API.
Similar to how the DataStream API has DataStream,
the Table API is built around :class:`~pyflink.table.Table`.
A :class:`~pyflink.table.Table` object describes a pipeline of data transformations. It does not
contain the data itself in any way. Instead, it describes how to read data from a table source,
and how to eventually write data to a table sink. The declared pipeline can be
printed, optimized, and eventually executed in a cluster. The pipeline can work with bounded or
unbounded streams which enables both streaming and batch scenarios.
By the definition above, a :class:`~pyflink.table.Table` object can actually be considered as
a view in SQL terms.
The initial :class:`~pyflink.table.Table` object is constructed by a
:class:`~pyflink.table.TableEnvironment`. For example,
:func:`~pyflink.table.TableEnvironment.from_path` obtains a table from a catalog.
Every :class:`~pyflink.table.Table` object has a schema that is available through
:func:`~pyflink.table.Table.get_schema`. A :class:`~pyflink.table.Table` object is
always associated with its original table environment during programming.
Every transformation (i.e. :func:`~pyflink.table.Table.select`} or
:func:`~pyflink.table.Table.filter` on a :class:`~pyflink.table.Table` object leads to a new
:class:`~pyflink.table.Table` object.
Use :func:`~pyflink.table.Table.execute` to execute the pipeline and retrieve the transformed
data locally during development. Otherwise, use :func:`~pyflink.table.Table.execute_insert` to
write the data into a table sink.
Many methods of this class take one or more :class:`~pyflink.table.Expression` as parameters.
For fluent definition of expressions and easier readability, we recommend to add a star import:
Example:
::
>>> from pyflink.table.expressions import *
Check the documentation for more programming language specific APIs.
The following example shows how to work with a :class:`~pyflink.table.Table` object.
Example:
::
>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import *
>>> env_settings = EnvironmentSettings.in_streaming_mode()
>>> t_env = TableEnvironment.create(env_settings)
>>> table = t_env.from_path("my_table").select(col("colA").trim(), col("colB") + 12)
>>> table.execute().print()
.. currentmodule:: pyflink.table
.. autosummary::
:toctree: api/
Table.add_columns
Table.add_or_replace_columns
Table.aggregate
Table.alias
Table.distinct
Table.drop_columns
Table.drop_columns
Table.execute
Table.execute_insert
Table.explain
Table.fetch
Table.filter
Table.flat_aggregate
Table.flat_map
Table.full_outer_join
Table.get_schema
Table.group_by
Table.intersect
Table.intersect_all
Table.join
Table.join_lateral
Table.left_outer_join
Table.left_outer_join_lateral
Table.limit
Table.map
Table.minus
Table.minus_all
Table.offset
Table.order_by
Table.over_window
Table.print_schema
Table.rename_columns
Table.right_outer_join
Table.select
Table.to_pandas
Table.union
Table.union_all
Table.where
Table.window
GroupedTable
============
A table that has been grouped on a set of grouping keys.
.. currentmodule:: pyflink.table
.. autosummary::
:toctree: api/
GroupedTable.select
GroupedTable.aggregate
GroupedTable.flat_aggregate
GroupWindowedTable
==================
A table that has been windowed for :class:`~pyflink.table.GroupWindow`.
.. currentmodule:: pyflink.table
.. autosummary::
:toctree: api/
GroupWindowedTable.group_by
WindowGroupedTable
==================
A table that has been windowed and grouped for :class:`~pyflink.table.window.GroupWindow`.
.. currentmodule:: pyflink.table
.. autosummary::
:toctree: api/
WindowGroupedTable.select
WindowGroupedTable.aggregate
OverWindowedTable
=================
A table that has been windowed for :class:`~pyflink.table.window.OverWindow`.
Unlike group windows, which are specified in the GROUP BY clause, over windows do not collapse
rows. Instead over window aggregates compute an aggregate for each input row over a range of
its neighboring rows.
.. currentmodule:: pyflink.table
.. autosummary::
:toctree: api/
OverWindowedTable.select
AggregatedTable
===============
A table that has been performed on the aggregate function.
.. currentmodule:: pyflink.table.table
.. autosummary::
:toctree: api/
AggregatedTable.select
FlatAggregateTable
==================
A table that performs flatAggregate on a :class:`~pyflink.table.Table`, a
:class:`~pyflink.table.GroupedTable` or a :class:`~pyflink.table.WindowGroupedTable`
.. currentmodule:: pyflink.table.table
.. autosummary::
:toctree: api/
FlatAggregateTable.select