tag	ae1815fe95ecdbf7dbfd6ee8938ac33b3b9da307
tagger	Xiang Fu <fx19880617@gmail.com>	Wed Mar 25 13:46:05 2020 -0700
object	9b2dc20c07dec6cf33df08c4444d996e8202c3ba

commit	9b2dc20c07dec6cf33df08c4444d996e8202c3ba
author	Xiang Fu <fx19880617@gmail.com>	Sat Mar 14 20:59:46 2020 -0700
committer	Xiang Fu <fx19880617@gmail.com>	Sat Mar 14 20:59:46 2020 -0700
tree	eb0d04ef86f173db1b8c463bc2351856290a8182
parent	d989427ec076d4553206e9fdd062be24de50477d

README.md

Apache Pinot (incubating)

Apache Pinot is a real-time distributed OLAP datastore that delivers scalable, low-latency analytics. It can ingest data from offline sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.

These presentations give an overview of Pinot:

Looking for the ThirdEye anomaly detection and root-cause analysis platform? Check out the Pinot/ThirdEye project.

Key Features

A column-oriented database with various compression schemes, such as run-length and fixed-bit-length encoding
Pluggable indexing technologies: Sorted Index, Bitmap Index, Inverted Index, Star-Tree Index
Ability to optimize the query/execution plan based on query and segment metadata
Near-real-time ingestion from Kafka and batch ingestion from Hadoop
SQL-like language that supports selection, aggregation, filtering, group by, order by, and distinct queries on fact data
Support for multivalued fields
Horizontally scalable and fault tolerant
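
To make the compression and indexing features above more concrete, here is a minimal Python sketch of run-length encoding, fixed-bit-length sizing, and an inverted index over a single column. It is only an illustration of the general techniques, not Pinot's implementation: the function names and the `country` column are hypothetical, and Pinot's actual on-disk formats are not described in this README.

```python
from collections import defaultdict


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def bits_per_value(cardinality):
    """Bits needed per dictionary id when every id is stored at the same width."""
    return max(1, (cardinality - 1).bit_length())


def build_inverted_index(values):
    """Map each distinct value to the ascending list of row ids that hold it."""
    index = defaultdict(list)
    for row_id, value in enumerate(values):
        index[value].append(row_id)
    return dict(index)


# Hypothetical column from a six-row table, sorted by value.
country = ["CA", "CA", "UK", "US", "US", "US"]

runs = run_length_encode(country)
index = build_inverted_index(country)
width = bits_per_value(len(index))

print(runs)
print(index["US"])
print(width)
```

Run-length encoding pays off when a column is sorted, because equal values sit next to each other. The inverted index answers a filter such as `country = 'US'` by returning the matching row ids directly instead of scanning the column.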

The design choices made to achieve these goals impose certain limitations on Pinot:

Pinot is not a replacement for a database, i.e., it cannot be used as a source-of-truth store and cannot mutate data
Pinot is not a replacement for a search engine, i.e., full-text search and relevance ranking are not supported
Queries cannot span multiple tables

Pinot works very well for querying time series data with many dimensions and metrics. For example, it can query data such as profile views or ad campaign performance in an analytical fashion (who viewed this profile in the last few weeks, how many ads were clicked per campaign).
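
The "ads clicked per campaign" question is a time filter followed by a group-by count. The Python sketch below shows only that shape on a few in-memory events. It does not use Pinot's query language or any Pinot API, and the event tuples, campaign names, and `clicks_per_campaign` helper are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def clicks_per_campaign(events, since):
    """Count click events per campaign whose timestamp is at or after `since`."""
    return Counter(campaign for campaign, clicked_at in events if clicked_at >= since)


# Hypothetical click events: (campaign id, time of the click).
now = datetime(2020, 3, 14, 12, 0)
events = [
    ("spring_sale", now - timedelta(days=1)),
    ("spring_sale", now - timedelta(days=3)),
    ("brand_launch", now - timedelta(days=2)),
    ("brand_launch", now - timedelta(days=30)),  # outside the one-week window
]

counts = clicks_per_campaign(events, since=now - timedelta(weeks=1))
print(dict(counts))
```

Here the campaign id plays the role of a dimension, the click count is the metric, and the timestamp restricts the query to a recent window.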

Instructions to build Pinot

More detailed instructions can be found in the Quick Demo section of the documentation.

# Clone the repo
$ git clone https://github.com/apache/incubator-pinot.git
$ cd incubator-pinot

# Build Pinot
$ mvn clean install -DskipTests -Pbin-dist

# Run the Quick Demo
$ cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin
$ bin/quick-start-batch.sh

Deploy Pinot on Kubernetes

Please refer to the Kubernetes Readme to deploy Pinot using Helm and load the demo data set.

Pinot also provides Kubernetes integrations with the interactive query engine Presto and the data visualization tool Apache Superset.

Getting Involved

Ask questions on the Apache Pinot Slack
Join the Apache Pinot mailing lists:
dev-subscribe@pinot.apache.org (subscribe to the pinot-dev mailing list)
dev@pinot.apache.org (post to the pinot-dev mailing list)
users-subscribe@pinot.apache.org (subscribe to the pinot-user mailing list)
users@pinot.apache.org (post to the pinot-user mailing list)

Documentation

Check out the Pinot documentation for a complete description of Pinot's features.

License

Apache Pinot is licensed under the Apache License, Version 2.0.