docs/index.rst - kyuubi - Git at Google

 .. Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

 ..    http://www.apache.org/licenses/LICENSE-2.0

 .. Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

 .. Kyuubi documentation master file, created by
    sphinx-quickstart on Wed Oct 28 14:23:28 2020.
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.

 Welcome
 =======

 Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless
 SQL on Data Warehouses and Lakehouses.

 Kyuubi builds distributed SQL query engines on top of various kinds of modern
 computing frameworks, e.g.,
 Apache `Spark <https://spark.apache.org/>`_,
 `Flink <https://flink.apache.org/>`_,
 `Doris <https://doris.apache.org/>`_,
 `Hive <https://hive.apache.org/>`_,
 `Trino <https://trino.io/>`_,
 and `StarRocks <https://www.starrocks.io/>`_, etc., to query massive datasets distributed
 over fleets of machines from heterogeneous data sources.

 The Kyuubi Server lane of the below swimlane divides our prospective users into
 end users and administrators. On the one hand, it hides the technical details of
 computing and storage from the end users. Thus, they can focus on their business and
 data with familiar tools. On the other hand, it hides the complexity of business
 logic from the administrators. Therefore, they can upgrade components on the server
 side with zero maintenance downtime, optimize workloads with a clear view of what
 end users are doing, ensure authentication, authorization, and auditing for both
 cluster and data security, and so forth.

 .. image:: imgs/kyuubi_layers.drawio.png
    :scale: 60
    :align: center
    :target: layers

 In general, the complete ecosystem of Kyuubi falls into the hierarchies shown in
 the above figure, with each layer loosely coupled to the other. It's a child's play
 to combine some of the components above to build a modern data stack. For example,
 you can use Kyuubi, Spark and `Iceberg <https://iceberg.apache.org/>`_ to build
 and manage Data Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytics processing(OLAP), e.g. BI.
 All workloads can be done on one platform, using one copy of data, with one SQL interface.

 A Unified Gateway
 -----------------

 The Server module plays the role of a unified gateway. The server enables simplified,
 secure access to any cluster resource through an entry point to deploy different
 workloads for end(remote) users. Behind this single entry, administrators have a single
 point for configuration, security, and control of remote access to clusters. And end
 users have an improved experience with seamless data processing with any the Kyuubi
 engine they need.

 Application Programming Interface
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 End users can use the application programming interface listed below for connectivity
 and interoperation between supported clients and a Kyuubi server. The current implementations are:

 - Hive Thrift Protocol
    - A HiveServer2-compatible interface that allows end users to use a thrift
      client(cross-language support, both tcp and http), a `Java Database Connectivity(JDBC) <client/jdbc/index.html>`_ interface over
      thrift, or an `Open Database Connectivity (ODBC) <client/odbc/index.html>`_ interface over a JDBC-to-ODBC bridge to communicate with Kyuubi.
 - `RESTful APIs <client/rest/index.html>`_
    - It provides system management APIs, including engines, sessions, operations, and miscellaneous ones.
    - It provides methods that allow clients to submit SQL queries and receive the query results, submit metadata requests and receive metadata results.
    - It enables easy submission of self-contained applications for batch processing, such as Spark jobs.
 - MySQL Protocol
    - A MySQL-compatible interface that allows end users to use MySQL Connectors, such as Connector/J, to communicate with Kyuubi.
 - We've planned to add more
    - Please join our mailing lists if you have any ideas or questions.


 Multi-tenancy
 ~~~~~~~~~~~~~

 Kyuubi supports the end-to-end multi-tenancy. On the control plane, the Kyuubi server
 provides a centralized authentication layer to reduce the risk of data and resource
 breaches. It supports various protocols, such as LDAP and Kerberos, for securing networking
 between clients and servers. On the data plane, the Kyuubi engines use the same trusted
 client identities to instantiate themselves. The resource acquirement and data
 and metadata access all happen within their own engine. Thus, cluster managers and storage
 providers can easily guarantee data and resource security. Besides, Kyuubi also provides engine
 authorization extensions to optimize the data security model to fine-grained row/column level.
 Please see the `security page <security/index.html>`_ for more information.

 High Availability
 ~~~~~~~~~~~~~~~~~

 Kyuubi is designed with High availability (HA), ensuring it operates continuously
 without failure for a designated period. HA works to provide Kyuubi that meets an
 agreed-upon operational performance level.

 - `Load balancing <deployment/high_availability_guide.html>`_
    - It becomes necessary for Kyuubi in a real-world production environment to ensure high availability because of multi-tenant access.
    - It effectively prevents single point of failures.
    - It helps achieve zero downtime for planned system maintenance
 - `Failure detectability <monitor/index.html>`_
    - Failures and system load of kyuubi server and engines are visible via metrics, logs, and so forth.

 Serverless SQL and More
 -----------------------

 Serverless SQL on Lakehouses makes it easier for end users to gain insight
 from the data universe and optimize data pipelines. It enables:

 - The same user experience as an RDBMS using familiar SQL for various workloads.
 - Extensive and secure data access capability across diverse data sources.
 - High performance on large volumes of data with scalable computing resources.

 Besides, Kyuubi also supports submissions of code snippets and self-contained applications serverlessly
 for more advanced usage.

 Ease of Use
 ~~~~~~~~~~~

 End users could have an optimized experience exploring their data universe in a
 serverless way, using either JDBC + SQL or REST + code. For most scenarios, the
 superpower of corresponding engines, such as Spark, and Flink, is no longer necessary.
 That is, most work related to deployment, runtime optimization, etc., should be done
 by professionals on the Kyuubi server side. It is suitable for the following scenarios:

 - Basic discovery and exploration
    - Quickly reason about the data in various formats (Parquet, CSV, JSON, text)
      in your data lake in cloud storage or an on-prem HDFS cluster.

 - Lakehouse formation and analytics
    - Easily build an ACID table storage layer via Hudi, Iceberg, Delta Lake or/and Paimon.

 - Logical data warehouse
    - Provide a relational abstraction on top of disparate data without ETL jobs,
      from collecting to connecting.

 Run Anywhere at Any Scale
 ~~~~~~~~~~~~~~~~~~~~~~~~~

 Most of the Kyuubi engine types have a distributed backend or can schedule distributed
 tasks at runtime. They can process data on single-node machines or clusters, such as
 YARN and Kubernetes. Besides, the Kyuubi server also supports running on bare metal or
 in a docker.

 High Performance
 ~~~~~~~~~~~~~~~~

 Query performance is one of the critical factors in implementing Serverless SQL.
 Implementing serviceability on state-of-the-art query engines for bigdata lays
 the foundation for us to achieve this goal:

 - State-of-the-art query engines
 - Multiple application for high throughput
 - Sharable execution runtime for low latency
 - Server-side global and continuous optimization
 - Auxiliary performance plugins, such as Z-Ordering,
   Query Optimizer, and so on

 Another goal of Serverless SQL is to make end users
 need not or rarely care about tricky performance
 optimization issues.

 What's Next
 -----------

 .. toctree::
    :caption: Admin Guide
    :maxdepth: 2
    :glob:

    quick_start/index
    configuration/settings
    deployment/index
    Security <security/index>
    monitor/index
    tools/index

 .. toctree::
    :caption: User Guide
    :maxdepth: 2
    :glob:

    Clients & APIs <client/index>

 .. toctree::
    :caption: Extension Guide
    :maxdepth: 2
    :glob:

    Extensions <extensions/index>

 .. toctree::
    :caption: Connectors
    :maxdepth: 2
    :glob:

    connector/index

 .. toctree::
    :caption: Kyuubi Insider
    :maxdepth: 2

    overview/index

 .. toctree::
    :caption: Contributing
    :maxdepth: 2

    contributing/code/index
    contributing/doc/index

 .. toctree::
    :caption: Appendix
    :maxdepth: 2

    appendix/index
	.. Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	.. http://www.apache.org/licenses/LICENSE-2.0

	.. Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.

	.. Kyuubi documentation master file, created by
	sphinx-quickstart on Wed Oct 28 14:23:28 2020.
	You can adapt this file completely to your liking, but it should at least
	contain the root `toctree` directive.

	Welcome
	=======

	Apache Kyuubi™ is a distributed and multi-tenant gateway to provide serverless
	SQL on Data Warehouses and Lakehouses.

	Kyuubi builds distributed SQL query engines on top of various kinds of modern
	computing frameworks, e.g.,
	Apache `Spark <https://spark.apache.org/>`_,
	`Flink <https://flink.apache.org/>`_,
	`Doris <https://doris.apache.org/>`_,
	`Hive <https://hive.apache.org/>`_,
	`Trino <https://trino.io/>`_,
	and `StarRocks <https://www.starrocks.io/>`_, etc., to query massive datasets distributed
	over fleets of machines from heterogeneous data sources.

	The Kyuubi Server lane of the below swimlane divides our prospective users into
	end users and administrators. On the one hand, it hides the technical details of
	computing and storage from the end users. Thus, they can focus on their business and
	data with familiar tools. On the other hand, it hides the complexity of business
	logic from the administrators. Therefore, they can upgrade components on the server
	side with zero maintenance downtime, optimize workloads with a clear view of what
	end users are doing, ensure authentication, authorization, and auditing for both
	cluster and data security, and so forth.

	.. image:: imgs/kyuubi_layers.drawio.png
	:scale: 60
	:align: center
	:target: layers

	In general, the complete ecosystem of Kyuubi falls into the hierarchies shown in
	the above figure, with each layer loosely coupled to the other. It's a child's play
	to combine some of the components above to build a modern data stack. For example,
	you can use Kyuubi, Spark and `Iceberg <https://iceberg.apache.org/>`_ to build
	and manage Data Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytics processing(OLAP), e.g. BI.
	All workloads can be done on one platform, using one copy of data, with one SQL interface.

	A Unified Gateway
	-----------------

	The Server module plays the role of a unified gateway. The server enables simplified,
	secure access to any cluster resource through an entry point to deploy different
	workloads for end(remote) users. Behind this single entry, administrators have a single
	point for configuration, security, and control of remote access to clusters. And end
	users have an improved experience with seamless data processing with any the Kyuubi
	engine they need.

	Application Programming Interface
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	End users can use the application programming interface listed below for connectivity
	and interoperation between supported clients and a Kyuubi server. The current implementations are:

	- Hive Thrift Protocol
	- A HiveServer2-compatible interface that allows end users to use a thrift
	client(cross-language support, both tcp and http), a `Java Database Connectivity(JDBC) <client/jdbc/index.html>`_ interface over
	thrift, or an `Open Database Connectivity (ODBC) <client/odbc/index.html>`_ interface over a JDBC-to-ODBC bridge to communicate with Kyuubi.
	- `RESTful APIs <client/rest/index.html>`_
	- It provides system management APIs, including engines, sessions, operations, and miscellaneous ones.
	- It provides methods that allow clients to submit SQL queries and receive the query results, submit metadata requests and receive metadata results.
	- It enables easy submission of self-contained applications for batch processing, such as Spark jobs.
	- MySQL Protocol
	- A MySQL-compatible interface that allows end users to use MySQL Connectors, such as Connector/J, to communicate with Kyuubi.
	- We've planned to add more
	- Please join our mailing lists if you have any ideas or questions.


	Multi-tenancy
	~~~~~~~~~~~~~

	Kyuubi supports the end-to-end multi-tenancy. On the control plane, the Kyuubi server
	provides a centralized authentication layer to reduce the risk of data and resource
	breaches. It supports various protocols, such as LDAP and Kerberos, for securing networking
	between clients and servers. On the data plane, the Kyuubi engines use the same trusted
	client identities to instantiate themselves. The resource acquirement and data
	and metadata access all happen within their own engine. Thus, cluster managers and storage
	providers can easily guarantee data and resource security. Besides, Kyuubi also provides engine
	authorization extensions to optimize the data security model to fine-grained row/column level.
	Please see the `security page <security/index.html>`_ for more information.

	High Availability
	~~~~~~~~~~~~~~~~~

	Kyuubi is designed with High availability (HA), ensuring it operates continuously
	without failure for a designated period. HA works to provide Kyuubi that meets an
	agreed-upon operational performance level.

	- `Load balancing <deployment/high_availability_guide.html>`_
	- It becomes necessary for Kyuubi in a real-world production environment to ensure high availability because of multi-tenant access.
	- It effectively prevents single point of failures.
	- It helps achieve zero downtime for planned system maintenance
	- `Failure detectability <monitor/index.html>`_
	- Failures and system load of kyuubi server and engines are visible via metrics, logs, and so forth.

	Serverless SQL and More
	-----------------------

	Serverless SQL on Lakehouses makes it easier for end users to gain insight
	from the data universe and optimize data pipelines. It enables:

	- The same user experience as an RDBMS using familiar SQL for various workloads.
	- Extensive and secure data access capability across diverse data sources.
	- High performance on large volumes of data with scalable computing resources.

	Besides, Kyuubi also supports submissions of code snippets and self-contained applications serverlessly
	for more advanced usage.

	Ease of Use
	~~~~~~~~~~~

	End users could have an optimized experience exploring their data universe in a
	serverless way, using either JDBC + SQL or REST + code. For most scenarios, the
	superpower of corresponding engines, such as Spark, and Flink, is no longer necessary.
	That is, most work related to deployment, runtime optimization, etc., should be done
	by professionals on the Kyuubi server side. It is suitable for the following scenarios:

	- Basic discovery and exploration
	- Quickly reason about the data in various formats (Parquet, CSV, JSON, text)
	in your data lake in cloud storage or an on-prem HDFS cluster.

	- Lakehouse formation and analytics
	- Easily build an ACID table storage layer via Hudi, Iceberg, Delta Lake or/and Paimon.

	- Logical data warehouse
	- Provide a relational abstraction on top of disparate data without ETL jobs,
	from collecting to connecting.

	Run Anywhere at Any Scale
	~~~~~~~~~~~~~~~~~~~~~~~~~

	Most of the Kyuubi engine types have a distributed backend or can schedule distributed
	tasks at runtime. They can process data on single-node machines or clusters, such as
	YARN and Kubernetes. Besides, the Kyuubi server also supports running on bare metal or
	in a docker.

	High Performance
	~~~~~~~~~~~~~~~~

	Query performance is one of the critical factors in implementing Serverless SQL.
	Implementing serviceability on state-of-the-art query engines for bigdata lays
	the foundation for us to achieve this goal:

	- State-of-the-art query engines
	- Multiple application for high throughput
	- Sharable execution runtime for low latency
	- Server-side global and continuous optimization
	- Auxiliary performance plugins, such as Z-Ordering,
	Query Optimizer, and so on

	Another goal of Serverless SQL is to make end users
	need not or rarely care about tricky performance
	optimization issues.

	What's Next
	-----------

	.. toctree::
	:caption: Admin Guide
	:maxdepth: 2
	:glob:

	quick_start/index
	configuration/settings
	deployment/index
	Security <security/index>
	monitor/index
	tools/index

	.. toctree::
	:caption: User Guide
	:maxdepth: 2
	:glob:

	Clients & APIs <client/index>

	.. toctree::
	:caption: Extension Guide
	:maxdepth: 2
	:glob:

	Extensions <extensions/index>

	.. toctree::
	:caption: Connectors
	:maxdepth: 2
	:glob:

	connector/index

	.. toctree::
	:caption: Kyuubi Insider
	:maxdepth: 2

	overview/index

	.. toctree::
	:caption: Contributing
	:maxdepth: 2

	contributing/code/index
	contributing/doc/index

	.. toctree::
	:caption: Appendix
	:maxdepth: 2

	appendix/index