blob: e8557997ae288b447ed7fa3eb27b432954a885e9 [file] [log] [blame]
..
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
..
.. warning:: The documentation is not up-to-date and has moved to `Apache Pinot Docs <https://docs.pinot.apache.org/>`_.
.. _code-modules:
*****************************
Code Modules and Organization
*****************************
.. contents:: Table of Contents
Before proceeding to contributing changes to Pinot, review the contents of this section.
External Dependencies
---------------------
Pinot depends on a number of external projects, the most notable ones are:
* Apache Zookeeper
* Apache Helix
* Apache Kafka
* Apache Thrift
* Netty
* Google Guava
* Yammer
*Helix* is used for ClusterManagement, and Pinot code is tightly integrated with Helix and Zookeeper interfaces.
*Kafka* is the default realtime stream provider, but can be replaced with others. See customizations section for more info.
*Thrift* is used for message exchange between broker and server components, with *Netty* providing the server functionality
for processing messages in a non-blocking fashion.
*Guava* is used for number of auxiliary components such as Caches and RateLimiters.
*Yammer* metrics is used to register and expose metrics from Pinot components.
In addition, Pinot relies on several key external libraries for some of its core functionality:
*Roaring Bitmaps*: Pinot's inverted indices are built using `RoaringBitmap <https://github.com/RoaringBitmap/RoaringBitmap>`_ library.
*t-Digest*: Pinot's digest based percentile calculations are based on `T-Digest <https://github.com/tdunning/t-digest>`_ library.
Pinot Modules
-------------
Pinot is a multi-module project, with each module providing specific functionality that helps us to build services from
a combination of modules. This helps keep clean interface contracts between different modules as well as reduce the
overall executable size for individually deployable component.
Each module has a ``src/main/java`` folder where the code resides and ``src/test/java`` where the *unit* tests corresponding to
the module's code reside.
.. _pinot-foundation:
Foundational modules
--------------------
The following figure provides a high-level overview of the foundational Pinot modules.
.. figure:: img/PinotFoundation.png
:scale: 50 %
pinot-common
^^^^^^^^^^^^
``pinot-common`` provides classes common to Pinot components. Some key classes you will find here are:
* ``config``: Definitions for various elements of Pinot's table config.
* ``metrics``: Definitions for base metrics provided by Controller, Broker and Server.
* ``metadata``: Definitions of metadata stored in Zookeeper.
* ``pql.parsers``: Code to compile PQL strings into corresponding AbstractSyntaxTrees (AST).
* ``request``: Autogenerated thrift classes representing various parts of PQL requests.
* ``response``: Definitions of response format returned by the Broker.
* ``filesystem``: provides abstractions for working with ``segments`` on local or remote filesystems. This module allows for users to plugin filesystems specific to their usecase. Extensions to the base ``PinotFS`` should ideally be housed in their specific modules so as not pull in unnecessary dependencies for all users.
pinot-core
^^^^^^^^^^
``pinot-core`` modules provides the core functionality of Pinot, specifically for handling segments, various index
structures, query execution - filters, transformations, aggregations etc and support for realtime segments.
pinot-server
^^^^^^^^^^^^
``pinot-server`` provides server specific functionality including server startup and REST APIs exposed by the server.
.. figure:: img/PinotServer.png
:scale: 50 %
pinot-controller
^^^^^^^^^^^^^^^^
``pinot-server`` houses all the controller specific functionality, including many cluster administration APIs, segment
upload (for both offline and realtime), segment assignment, retention strategies etc.
.. figure:: img/PinotController.png
:scale: 50 %
pinot-broker
^^^^^^^^^^^^
``pinot-broker`` provides broker functionality that includes wiring the broker startup sequence, building broker routing
tables, PQL request handling.
.. figure:: img/PinotBroker.png
:scale: 50 %
pinot-minion
^^^^^^^^^^^^
``pinot-minion`` provides functionality for running auxiliary/periodic tasks on a Pinot Cluster such as purging records
for compliance with regulations like GDPR.
pinot-hadoop
^^^^^^^^^^^^
``pinot-hadoop`` provides classes for segment generation jobs using Hadoop infrastructure.
.. figure:: img/PinotMinionHadoop.png
:scale: 50 %
Auxiliary modules
-----------------
In addition to the core modules described above, Pinot code provides the following modules:
* ``pinot-tools``: This module is a collection of many tools useful for setting up Pinot cluster, creating/updating segments.
It also houses the Pinot quick start guide code.
* ``pinot-perf``: This module has a collection of benchmark test code used to evaluate design options.
* ``pinot-client-api``: This module houses the Java client API. See :ref:`java-client` for more info.
* ``pinot-integration-tests``: This module holds integration tests that test functionality across multiple classes or components.
These tests typically do not rely on mocking and provide more end to end coverage for code.
.. _extension-modules:
Extension modules
-----------------
``pinot-hadoop-filesystem`` and ``pinot-azure-filesystem`` are module added to support extensions to Pinot filesystem.
The functionality is broken down into modules of their own to avoid polluting the common modules with additional large libraries.
These libraries bring in transitive dependencies of their own that can cause classpath conflicts at runtime. We would like to
avoid this for the common usage of Pinot as much as possible.