Pinot Quickstart on Kubernetes with Helm

Prerequisite

(Optional) Set up a Kubernetes cluster on Amazon Elastic Kubernetes Service (Amazon EKS)

(Optional) Create a new k8s cluster on AWS EKS

aws configure

Note that the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY override the configuration stored in ~/.aws/credentials.
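
For example, credentials can be supplied per-shell through environment variables instead of the shared credentials file (the values below are placeholders, not real keys):

```shell
# Placeholder values -- replace with your own credentials.
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXzUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-west-2
```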

  • Create an EKS cluster

Please modify the parameters in the example command below:

eksctl create cluster \
--name pinot-quickstart \
--version 1.14 \
--region us-west-2 \
--nodegroup-name standard-workers \
--node-type t3.small \
--nodes 3 \
--nodes-min 3 \
--nodes-max 4 \
--node-ami auto

You can monitor the cluster status with the following command:

EKS_CLUSTER_NAME=pinot-quickstart
aws eks describe-cluster --name ${EKS_CLUSTER_NAME}

Once the cluster status is ACTIVE, it is ready to use.
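
If you prefer to wait in a loop rather than re-running the command by hand, a small sketch (assuming the AWS CLI is configured and the cluster name from above):

```shell
EKS_CLUSTER_NAME=pinot-quickstart
# Poll every 30 seconds until the cluster reports ACTIVE.
until [ "$(aws eks describe-cluster --name ${EKS_CLUSTER_NAME} \
    --query 'cluster.status' --output text)" = "ACTIVE" ]; do
  echo "Waiting for EKS cluster ${EKS_CLUSTER_NAME}..."
  sleep 30
done
echo "Cluster is ACTIVE."
```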

(Optional) How to connect to an existing cluster

Run the command below to fetch credentials for the cluster you just created, or for an existing cluster.

EKS_CLUSTER_NAME=pinot-quickstart
aws eks update-kubeconfig --name ${EKS_CLUSTER_NAME}

To verify the connection, run:

kubectl get nodes

(Optional) Set up a Kubernetes cluster on Google Kubernetes Engine (GKE)

(Optional) Create a new k8s cluster on GKE

  • Google Cloud SDK (https://cloud.google.com/sdk/install)
  • Enable your Google Cloud account and create a project, e.g. pinot-demo.
    • pinot-demo will be used as the example value for the ${GCLOUD_PROJECT} variable in the script examples.
    • pinot-demo@example.com will be used as the example value for ${GCLOUD_EMAIL}.

The script below will:

  • Create a GKE cluster named pinot-quickstart
  • Request 2 nodes of type n1-standard-8 for the demo.

Set both environment variables, ${GCLOUD_PROJECT} and ${GCLOUD_EMAIL}, to your gcloud project and gcloud account email before running the script below.

GCLOUD_PROJECT=[your gcloud project name]
GCLOUD_EMAIL=[Your gcloud account email]
./setup_gke.sh

E.g.

GCLOUD_PROJECT=pinot-demo
GCLOUD_EMAIL=pinot-demo@example.com
./setup_gke.sh
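
For reference, a minimal sketch of what such a setup script might run; the zone, machine type, and node count here are assumptions taken from the description above, and the real setup_gke.sh in this repo may differ:

```shell
GCLOUD_PROJECT=${GCLOUD_PROJECT:-pinot-demo}
GCLOUD_ZONE=us-west1-b            # assumed zone
GCLOUD_CLUSTER=pinot-quickstart
GCLOUD_MACHINE_TYPE=n1-standard-8

# Create the cluster, then fetch kubectl credentials for it.
gcloud container clusters create ${GCLOUD_CLUSTER} \
  --project ${GCLOUD_PROJECT} \
  --zone ${GCLOUD_ZONE} \
  --machine-type ${GCLOUD_MACHINE_TYPE} \
  --num-nodes 2
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} \
  --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}
```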

(Optional) How to connect to an existing cluster

Run the command below to fetch credentials for the cluster you just created, or for an existing cluster. Modify the environment variables ${GCLOUD_PROJECT}, ${GCLOUD_ZONE}, and ${GCLOUD_CLUSTER} accordingly.

GCLOUD_PROJECT=pinot-demo
GCLOUD_ZONE=us-west1-b
GCLOUD_CLUSTER=pinot-quickstart
gcloud container clusters get-credentials ${GCLOUD_CLUSTER} --zone ${GCLOUD_ZONE} --project ${GCLOUD_PROJECT}

(Optional) Set up a Kubernetes cluster on Microsoft Azure

(Optional) Create a new k8s cluster on Azure

az login
  • Create Resource Group
AKS_RESOURCE_GROUP=pinot-demo
AKS_RESOURCE_GROUP_LOCATION=eastus
az group create --name ${AKS_RESOURCE_GROUP} --location ${AKS_RESOURCE_GROUP_LOCATION}
  • Create an AKS cluster
AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks create --resource-group ${AKS_RESOURCE_GROUP}  --name ${AKS_CLUSTER_NAME} --node-count 3

(Optional) If the command above fails with the error MissingSubscriptionRegistration, register the default provider:

az provider register --namespace Microsoft.Network

(Optional) How to connect to an existing cluster

Run the command below to fetch credentials for the cluster you just created, or for an existing cluster.

AKS_RESOURCE_GROUP=pinot-demo
AKS_CLUSTER_NAME=pinot-quickstart
az aks get-credentials --resource-group ${AKS_RESOURCE_GROUP} --name ${AKS_CLUSTER_NAME}

To verify the connection, run:

kubectl get nodes

How to set up a Pinot cluster for the demo

Update helm dependency

helm dependency update

Start Pinot with Helm

  • For helm v3.0.0
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot .
  • For helm v2.12.1

If the cluster was just created, make sure Helm is initialized by running:

helm init --service-account tiller

Then deploy the Pinot cluster:

helm install --namespace "pinot-quickstart" --name "pinot" .

Troubleshooting (For helm v2.12.1)

  • Error: Run the commands below if you encounter the following issue:
Error: could not find tiller
  • Resolution:
kubectl -n kube-system delete deployment tiller-deploy
kubectl -n kube-system delete service/tiller-deploy
helm init --service-account tiller
  • Error: Run the command below if you encounter the following permission issue:

Error: release pinot failed: namespaces "pinot-quickstart" is forbidden: User "system:serviceaccount:kube-system:default" cannot get resource "namespaces" in API group "" in the namespace "pinot-quickstart"

  • Resolution:
kubectl apply -f helm-rbac.yaml

To check the deployment status:

kubectl get all -n pinot-quickstart
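
To block until the pods come up instead of polling by eye, something like the following can be used; waiting on all pods in the namespace is a simplification, since the chart's exact label selectors may vary:

```shell
# Wait up to 5 minutes for every pod in the namespace to become Ready.
kubectl wait --namespace pinot-quickstart \
  --for=condition=ready pod --all --timeout=300s
```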

Pinot Realtime QuickStart

Bring up a Kafka Cluster for realtime data ingestion

  • For helm v3.0.0
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install -n pinot-quickstart kafka incubator/kafka
  • For helm v2.12.1
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install --namespace "pinot-quickstart"  --name kafka incubator/kafka

Create Kafka topic

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime-avro --create --partitions 1 --replication-factor 1
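
To confirm both topics were created, you can list them with the same kafka-topics tool inside the broker pod:

```shell
kubectl -n pinot-quickstart exec kafka-0 -- \
  kafka-topics --zookeeper kafka-zookeeper:2181 --list
```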

Load data into Kafka and create Pinot schema/table

kubectl apply -f pinot-realtime-quickstart.yml

How to query pinot data

Use the script below to set up local port forwarding and open the Pinot query console in your web browser.

./query-pinot-data.sh
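
If you prefer to run the port-forward yourself, the script is roughly equivalent to the following; the service name and port are assumptions based on the chart defaults in the parameter table below:

```shell
# Forward the Pinot controller (which serves the query console) to localhost.
kubectl port-forward service/pinot-controller 9000:9000 -n pinot-quickstart
# Then open http://localhost:9000 in your browser.
```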

Configuring the Chart

This chart includes a ZooKeeper chart as a dependency (declared in its requirements.yaml) by default. The chart can be customized using the following configurable parameters:

Parameter | Description | Default
--------- | ----------- | -------
image.repository | Pinot container image repo | winedepot/pinot
image.tag | Pinot container image tag | 0.1.13-SNAPSHOT
image.pullPolicy | Pinot container image pull policy | IfNotPresent
cluster.name | Pinot cluster name | pinot-quickstart
controller.name | Name of Pinot controller | controller
controller.port | Pinot controller port | 9000
controller.replicaCount | Pinot controller replicas | 2
controller.data.dir | Pinot controller data directory; should be the same as controller.persistence.mountPath or a subdirectory of it | /var/pinot/controller/data
controller.vip.host | Pinot VIP host | pinot-controller
controller.vip.port | Pinot VIP port | 9000
controller.persistence.enabled | Use a PVC to persist Pinot controller data | true
controller.persistence.accessMode | Access mode of data volume | ReadWriteOnce
controller.persistence.size | Size of data volume | 1G
controller.persistence.mountPath | Mount path of controller data volume | /var/pinot/controller/data
controller.persistence.storageClass | Storage class of backing PVC | ""
controller.jvmOpts | Pinot controller JVM options | -Xms4G -Xmx4G
controller.log4j2ConfFile | Pinot controller log4j2 configuration file | /opt/pinot/conf/pinot-controller-log4j2.xml
controller.service.port | Service port | 9000
controller.external.enabled | If true, exposes Pinot controller externally | false
controller.external.type | Service type | LoadBalancer
controller.external.port | Service port | 9000
controller.resources | Pinot controller resource requests and limits | {}
controller.nodeSelector | Node labels for controller pod assignment | {}
controller.affinity | Affinity and anti-affinity preferences for pods, as defined in https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity | {}
controller.tolerations | List of node tolerations for the pods (https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) | []
controller.podAnnotations | Annotations to be added to controller pod | {}
controller.updateStrategy.type | StatefulSet update strategy to use | RollingUpdate
broker.name | Name of Pinot broker | broker
broker.port | Pinot broker port | 8099
broker.replicaCount | Pinot broker replicas | 2
broker.jvmOpts | Pinot broker JVM options | -Xms4G -Xmx4G
broker.log4j2ConfFile | Pinot broker log4j2 configuration file | /opt/pinot/conf/pinot-broker-log4j2.xml
broker.service.port | Service port | 8099
broker.external.enabled | If true, exposes Pinot broker externally | false
broker.external.type | External service type | LoadBalancer
broker.external.port | External service port | 8099
broker.routingTable.builderClass | Routing table builder class | random
broker.resources | Pinot broker resource requests and limits | {}
broker.nodeSelector | Node labels for broker pod assignment | {}
broker.affinity | Affinity and anti-affinity preferences for pods, as defined in https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity | {}
broker.tolerations | List of node tolerations for the pods (https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) | []
broker.podAnnotations | Annotations to be added to broker pod | {}
broker.updateStrategy.type | StatefulSet update strategy to use | RollingUpdate
server.name | Name of Pinot server | server
server.port.netty | Pinot server netty port | 8098
server.port.admin | Pinot server admin port | 8097
server.replicaCount | Pinot server replicas | 2
server.dataDir | Pinot server data directory; should be the same as server.persistence.mountPath or a subdirectory of it | /var/pinot/server/data/index
server.segmentTarDir | Pinot server segment directory; should be the same as server.persistence.mountPath or a subdirectory of it | /var/pinot/server/data/segments
server.persistence.enabled | Use a PVC to persist Pinot server data | true
server.persistence.accessMode | Access mode of data volume | ReadWriteOnce
server.persistence.size | Size of data volume | 4G
server.persistence.mountPath | Mount path of server data volume | /var/pinot/server/data
server.persistence.storageClass | Storage class of backing PVC | ""
server.jvmOpts | Pinot server JVM options | -Xms4G -Xmx4G -XX:MaxDirectMemorySize=10g
server.log4j2ConfFile | Pinot server log4j2 configuration file | /opt/pinot/conf/pinot-server-log4j2.xml
server.service.port | Service port | 8098
server.resources | Pinot server resource requests and limits | {}
server.nodeSelector | Node labels for server pod assignment | {}
server.affinity | Affinity and anti-affinity preferences for pods, as defined in https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity | {}
server.tolerations | List of node tolerations for the pods (https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) | []
server.podAnnotations | Annotations to be added to server pod | {}
server.updateStrategy.type | StatefulSet update strategy to use | RollingUpdate
zookeeper.enabled | If true, installs the Zookeeper chart | true
zookeeper.resources | Zookeeper resource requests and limits | {}
zookeeper.env | Environment variables provided to Zookeeper | {ZK_HEAP_SIZE: "1G"}
zookeeper.storage | Zookeeper persistent volume size | 2Gi
zookeeper.image.PullPolicy | Zookeeper container image pull policy | IfNotPresent
zookeeper.url | URL of Zookeeper cluster (unneeded if installing the Zookeeper chart) | ""
zookeeper.port | Port of Zookeeper cluster | 2181
zookeeper.affinity | Affinity and anti-affinity preferences for pods, as defined in https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity | {}

Specify each parameter using the --set key=value[,key=value] argument to helm install.

Alternatively, a YAML file that specifies the values for the parameters can be provided:

helm install --name pinot -f values.yaml .
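
For example, to override a couple of the parameters from the table above inline (helm v3 syntax; the chosen values are only illustrations):

```shell
kubectl create ns pinot-quickstart
helm install -n pinot-quickstart pinot . \
  --set controller.replicaCount=1,server.persistence.size=10G
```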

If you are using GKE, create a StorageClass:

kubectl apply -f gke-ssd.yaml

Or, if you want to use the pd-standard StorageClass:

kubectl apply -f gke-pd.yaml
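
For reference, gke-ssd.yaml is likely a StorageClass along these lines; this is a sketch assuming GCE persistent disks, so check the file in this repo for the authoritative version:

```shell
# Create an SSD-backed StorageClass on GKE (sketch).
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
EOF
```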

Use superset to query Pinot

Bring up Superset

kubectl apply -f superset.yaml

Set up Admin account (First time)

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'export FLASK_APP=superset:app && flask fab create-admin'

Init Superset (First time)

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'

Load Demo Data source

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_datasources -p /etc/superset/pinot_example_datasource.yaml'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset import_dashboards -p /etc/superset/pinot_example_dashboard.json'

Access Superset UI

Run the command below to open Superset in your browser, then log in with the admin credentials you created earlier.

./open-superset-ui.sh
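
The script presumably port-forwards the Superset service; doing it by hand would look roughly like the following, where the service name and port are assumptions (8088 is Superset's conventional default):

```shell
# Forward the Superset web UI to localhost.
kubectl port-forward service/superset 8088:8088 -n pinot-quickstart
# Then open http://localhost:8088 and log in with the admin account.
```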

Open the imported dashboard by clicking the Dashboards banner and then clicking AirlineStats.

Access Pinot Using Presto

Deploy Presto with Pinot Plugin

Run the command below to deploy a customized Presto build with the Pinot plugin.

kubectl apply -f presto-coordinator.yaml

Query Presto using Presto CLI

Once Presto is deployed, run the command below.

./pinot-presto-cli.sh
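
If you want to connect manually instead, the script is roughly equivalent to the following; the service name, port, and a locally installed presto-cli binary are all assumptions:

```shell
# Forward the Presto coordinator locally, then attach a Presto CLI to it.
kubectl port-forward service/presto-coordinator 8080:8080 -n pinot-quickstart &
presto-cli --server localhost:8080 --catalog pinot --schema default
```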

Sample queries to execute

  • List all catalogs
presto:default> show catalogs;
 Catalog
---------
 pinot
 system
(2 rows)

Query 20191112_050827_00003_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]

  • List All tables
presto:default> show tables;
    Table
--------------
 airlinestats
(1 row)

Query 20191112_050907_00004_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [1 rows, 29B] [1 rows/s, 41B/s]
  • Show schema
presto:default> DESCRIBE pinot.dontcare.airlinestats;
        Column        |  Type   | Extra | Comment
----------------------+---------+-------+---------
 flightnum            | integer |       |
 origin               | varchar |       |
 quarter              | integer |       |
 lateaircraftdelay    | integer |       |
 divactualelapsedtime | integer |       |
......

Query 20191112_051021_00005_xkm4g, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [80 rows, 6.06KB] [35 rows/s, 2.66KB/s]
  • Count total documents
presto:default> select count(*) as cnt from pinot.dontcare.airlinestats limit 10;
 cnt
------
 9745
(1 row)

Query 20191112_051114_00006_xkm4g, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [2 rows/s, 19B/s]

(Optional) Deploy more Presto workers

Run the command below to deploy more Presto workers if needed.

kubectl apply -f presto-worker.yaml

Then verify that the new worker nodes have been added:

presto:default> select * from system.runtime.nodes;
               node_id                |         http_uri         |      node_version      | coordinator | state
--------------------------------------+--------------------------+------------------------+-------------+--------
 38959968-6262-46a1-a321-ee0db6cbcbd3 | http://10.244.0.182:8080 | 0.230-SNAPSHOT-4e66289 | false       | active
 83851b8c-fe7f-49fe-ae0c-e3daf6d92bef | http://10.244.2.183:8080 | 0.230-SNAPSHOT-4e66289 | false       | active
 presto-coordinator                   | http://10.244.1.25:8080  | 0.230-SNAPSHOT-4e66289 | true        | active
(3 rows)

Query 20191206_095812_00027_na99c, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:00 [3 rows, 248B] [11 rows/s, 984B/s]

How to clean up Pinot deployment

kubectl delete ns pinot-quickstart
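
Deleting the namespace removes the Kubernetes resources, but it does not remove the Helm release record itself; a fuller cleanup (assuming the release name "pinot" used earlier) would be:

```shell
# helm v3: remove the release record first.
helm uninstall pinot -n pinot-quickstart
# helm v2: the release record lives in Tiller instead.
helm delete --purge pinot
# Finally remove the namespace and anything left in it.
kubectl delete ns pinot-quickstart
```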