| --- |
| title: Running on Kubernetes |
| hide_title: true |
| sidebar_position: 12 |
| version: 1 |
| --- |
| |
| ## Running on Kubernetes |
| |
| Running on Kubernetes is supported with the provided [Helm](https://helm.sh/) chart found in the official [Superset helm repository](https://apache.github.io/superset/index.yaml). |
| |
| ### Prerequisites |
| |
| - A Kubernetes cluster |
| - Helm installed |
| |
| ### Running |
| |
| 1. Add the Superset helm repository |
| |
| ```sh |
| helm repo add superset https://apache.github.io/superset |
| "superset" has been added to your repositories |
| ``` |
| |
| 2. View charts in repo |
| |
| ```sh |
| helm search repo superset |
| NAME CHART VERSION APP VERSION DESCRIPTION |
| superset/superset 0.1.1 1.0 Apache Superset is a modern, enterprise-ready b... |
| ``` |
| |
| 3. Configure your setting overrides |
| |
| Just like any typical Helm chart, you'll need to craft a `values.yaml` file that would define/override any of the values exposed into the default [values.yaml](https://github.com/apache/superset/tree/master/helm/superset/values.yaml), or from any of the dependent charts it depends on: |
| |
| - [bitnami/redis](https://artifacthub.io/packages/helm/bitnami/redis) |
| - [bitnami/postgresql](https://artifacthub.io/packages/helm/bitnami/postgresql) |
| |
| More info down below on some important overrides you might need. |
| |
| 4. Install and run |
| |
| ```sh |
| helm upgrade --install --values my-values.yaml superset superset/superset |
| ``` |
| |
| You should see various pods popping up, such as: |
| |
| ```sh |
| kubectl get pods |
| NAME READY STATUS RESTARTS AGE |
| superset-celerybeat-7cdcc9575f-k6xmc 1/1 Running 0 119s |
| superset-f5c9c667-dw9lp 1/1 Running 0 4m7s |
| superset-f5c9c667-fk8bk 1/1 Running 0 4m11s |
| superset-init-db-zlm9z 0/1 Completed 0 111s |
| superset-postgresql-0 1/1 Running 0 6d20h |
| superset-redis-master-0 1/1 Running 0 6d20h |
| superset-worker-75b48bbcc-jmmjr 1/1 Running 0 4m8s |
| superset-worker-75b48bbcc-qrq49 1/1 Running 0 4m12s |
| ``` |
| |
| The exact list will depend on some of your specific configuration overrides but you should generally expect: |
| |
| - N `superset-xxxx-yyyy` and `superset-worker-xxxx-yyyy` pods (depending on your `supersetNode.replicaCount` and `supersetWorker.replicaCount` values) |
| - 1 `superset-postgresql-0` depending on your postgres settings |
| - 1 `superset-redis-master-0` depending on your redis settings |
| - 1 `superset-celerybeat-xxxx-yyyy` pod if you have `supersetCeleryBeat.enabled = true` in your values overrides |
| |
| 1. Access it |
| |
| The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either: |
| |
| - Configure the Service as a `LoadBalancer` or `NodePort` |
| - Set up an `Ingress` for it - the chart includes a definition, but will need to be tuned to your needs (hostname, tls, annotations etc...) |
| - Run `kubectl port-forward superset-xxxx-yyyy :8088` to directly tunnel one pod's port into your localhost |
| |
| Depending how you configured external access, the URL will vary. Once you've identified the appropriate URL you can log in with: |
| |
| - user: `admin` |
| - password: `admin` |
| |
| ### Important settings |
| |
| #### Security settings |
| |
| Default security settings and passwords are included but you **SHOULD** override those with your own, in particular: |
| |
| ```yaml |
| postgresql: |
| postgresqlPassword: superset |
| ``` |
| |
| Make sure, you set a unique strong complex alphanumeric string for your SECRET_KEY and use a tool to help you generate |
| a sufficiently random sequence. |
| |
| - To generate a good key you can run, `openssl rand -base64 42` |
| |
| ```yaml |
| configOverrides: |
| secret: | |
| SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY' |
| ``` |
| |
| If you want to change the previous secret key then you should rotate the keys. |
| Default secret key for kubernetes deployment is `thisISaSECRET_1234` |
| |
| ```yaml |
| configOverrides: |
| my_override: | |
| PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY' |
| SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY' |
| init: |
| command: |
| - /bin/sh |
| - -c |
| - | |
| . {{ .Values.configMountPath }}/superset_bootstrap.sh |
| superset re-encrypt-secrets |
| . {{ .Values.configMountPath }}/superset_init.sh |
| ``` |
| |
| #### Dependencies |
| |
| Install additional packages and do any other bootstrap configuration in this script. For production clusters it's |
| recommended to build own image with this step done in CI. The following example installs the Big Query and Elasticsearch |
| database drivers so that you can connect to those datasources in your Superset installation. |
| |
| ```yaml |
| bootstrapScript: | |
| #!/bin/bash |
| pip install psycopg2==2.9.1 \ |
| redis==3.2.1 \ |
| pybigquery==2.26.0 \ |
| elasticsearch-dbapi==0.2.5 &&\ |
| if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi |
| ``` |
| |
| #### superset_config.py |
| |
| The default `superset_config.py` is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in `configOverrides`, e.g.: |
| |
| ```yaml |
| configOverrides: |
| my_override: | |
| # This will make sure the redirect_uri is properly computed, even with SSL offloading |
| ENABLE_PROXY_FIX = True |
| FEATURE_FLAGS = { |
| "DYNAMIC_PLUGINS": True |
| } |
| ``` |
| |
| Those will be evaluated as Helm templates and therefore will be able to reference other `values.yaml` variables e.g. `{{ .Values.ingress.hosts[0] }}` will resolve to your ingress external domain. |
| |
| The entire `superset_config.py` will be installed as a secret, so it is safe to pass sensitive parameters directly... however it might be more readable to use secret env variables for that. |
| |
| Full python files can be provided by running `helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py` |
| |
| #### Environment Variables |
| |
| Those can be passed as key/values either with `extraEnv` or `extraSecretEnv` if they're sensitive. They can then be referenced from `superset_config.py` using e.g. `os.environ.get("VAR")`. |
| |
| ```yaml |
| extraEnv: |
| SMTP_HOST: smtp.gmail.com |
| SMTP_USER: user@gmail.com |
| SMTP_PORT: "587" |
| SMTP_MAIL_FROM: user@gmail.com |
| |
| extraSecretEnv: |
| SMTP_PASSWORD: xxxx |
| |
| configOverrides: |
| smtp: | |
| import ast |
| SMTP_HOST = os.getenv("SMTP_HOST","localhost") |
| SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) |
| SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) |
| SMTP_USER = os.getenv("SMTP_USER","superset") |
| SMTP_PORT = os.getenv("SMTP_PORT",25) |
| SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") |
| ``` |
| |
| #### System packages |
| |
| If new system packages are required, they can be installed before application startup by overriding the container's `command`, e.g.: |
| |
| ```yaml |
| supersetWorker: |
| command: |
| - /bin/sh |
| - -c |
| - | |
| apt update |
| apt install -y somepackage |
| apt autoremove -yqq --purge |
| apt clean |
| |
| # Run celery worker |
| . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker |
| ``` |
| |
| #### Data sources |
| |
| Data source definitions can be automatically declared by providing key/value yaml definitions in `extraConfigs`: |
| |
| ```yaml |
| extraConfigs: |
| import_datasources.yaml: | |
| databases: |
| - allow_file_upload: true |
| allow_ctas: true |
| allow_cvas: true |
| database_name: example-db |
| extra: "{\r\n \"metadata_params\": {},\r\n \"engine_params\": {},\r\n \"\ |
| metadata_cache_timeout\": {},\r\n \"schemas_allowed_for_file_upload\": []\r\n\ |
| }" |
| sqlalchemy_uri: example://example-db.local |
| tables: [] |
| ``` |
| |
| Those will also be mounted as secrets and can include sensitive parameters. |
| |
| ### Configuration Examples |
| |
| #### Setting up OAuth |
| |
| ```yaml |
| extraEnv: |
| AUTH_DOMAIN: example.com |
| |
| extraSecretEnv: |
| GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com |
| GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx |
| |
| configOverrides: |
| enable_oauth: | |
| # This will make sure the redirect_uri is properly computed, even with SSL offloading |
| ENABLE_PROXY_FIX = True |
| |
| from flask_appbuilder.security.manager import AUTH_OAUTH |
| AUTH_TYPE = AUTH_OAUTH |
| OAUTH_PROVIDERS = [ |
| { |
| "name": "google", |
| "icon": "fa-google", |
| "token_key": "access_token", |
| "remote_app": { |
| "client_id": os.getenv("GOOGLE_KEY"), |
| "client_secret": os.getenv("GOOGLE_SECRET"), |
| "api_base_url": "https://www.googleapis.com/oauth2/v2/", |
| "client_kwargs": {"scope": "email profile"}, |
| "request_token_url": None, |
| "access_token_url": "https://accounts.google.com/o/oauth2/token", |
| "authorize_url": "https://accounts.google.com/o/oauth2/auth", |
| "authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")} |
| }, |
| } |
| ] |
| |
| # Map Authlib roles to superset roles |
| AUTH_ROLE_ADMIN = 'Admin' |
| AUTH_ROLE_PUBLIC = 'Public' |
| |
| # Will allow user self registration, allowing to create Flask users from Authorized User |
| AUTH_USER_REGISTRATION = True |
| |
| # The default user self registration role |
| AUTH_USER_REGISTRATION_ROLE = "Admin" |
| ``` |
| |
| #### Enable Alerts and Reports |
| |
| For this, as per the [Alerts and Reports doc](/docs/installation/email-reports), you will need to: |
| |
| ##### Install a supported webdriver in the Celery worker |
| |
| This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the `command`. Here's a working example for `chromedriver`: |
| |
| ```yaml |
| supersetWorker: |
| command: |
| - /bin/sh |
| - -c |
| - | |
| # Install chrome webdriver |
| # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx |
| apt update |
| wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb |
| apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb |
| wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip |
| unzip chromedriver_linux64.zip |
| chmod +x chromedriver |
| mv chromedriver /usr/bin |
| apt autoremove -yqq --purge |
| apt clean |
| rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip |
| |
| # Run |
| . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker |
| ``` |
| |
| ##### Run the Celery beat |
| |
| This pod will trigger the scheduled tasks configured in the alerts and reports UI section: |
| |
| ```yaml |
| supersetCeleryBeat: |
| enabled: true |
| ``` |
| |
| ##### Configure the appropriate Celery jobs and SMTP/Slack settings |
| |
| ```yaml |
| extraEnv: |
| SMTP_HOST: smtp.gmail.com |
| SMTP_USER: user@gmail.com |
| SMTP_PORT: "587" |
| SMTP_MAIL_FROM: user@gmail.com |
| |
| extraSecretEnv: |
| SLACK_API_TOKEN: xoxb-xxxx-yyyy |
| SMTP_PASSWORD: xxxx-yyyy |
| |
| configOverrides: |
| feature_flags: | |
| import ast |
| |
| FEATURE_FLAGS = { |
| "ALERT_REPORTS": True |
| } |
| |
| SMTP_HOST = os.getenv("SMTP_HOST","localhost") |
| SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True")) |
| SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False")) |
| SMTP_USER = os.getenv("SMTP_USER","superset") |
| SMTP_PORT = os.getenv("SMTP_PORT",25) |
| SMTP_PASSWORD = os.getenv("SMTP_PASSWORD","superset") |
| SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","superset@superset.com") |
| |
| SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN",None) |
| celery_conf: | |
| from celery.schedules import crontab |
| |
| class CeleryConfig(object): |
| broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" |
| imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", ) |
| result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0" |
| task_annotations = { |
| 'sql_lab.get_sql_results': { |
| 'rate_limit': '100/s', |
| }, |
| 'email_reports.send': { |
| 'rate_limit': '1/s', |
| 'time_limit': 600, |
| 'soft_time_limit': 600, |
| 'ignore_result': True, |
| }, |
| } |
| beat_schedule = { |
| 'reports.scheduler': { |
| 'task': 'reports.scheduler', |
| 'schedule': crontab(minute='*', hour='*'), |
| }, |
| 'reports.prune_log': { |
| 'task': 'reports.prune_log', |
| 'schedule': crontab(minute=0, hour=0), |
| }, |
| 'cache-warmup-hourly': { |
| 'task': 'cache-warmup', |
| 'schedule': crontab(minute='*/30', hour='*'), |
| 'kwargs': { |
| 'strategy_name': 'top_n_dashboards', |
| 'top_n': 10, |
| 'since': '7 days ago', |
| }, |
| } |
| } |
| |
| CELERY_CONFIG = CeleryConfig |
| reports: | |
| EMAIL_PAGE_RENDER_WAIT = 60 |
| WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/" |
| WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/" |
| WEBDRIVER_TYPE= "chrome" |
| WEBDRIVER_OPTION_ARGS = [ |
| "--force-device-scale-factor=2.0", |
| "--high-dpi-support=2.0", |
| "--headless", |
| "--disable-gpu", |
| "--disable-dev-shm-usage", |
| # This is required because our process runs as root (in order to install pip packages) |
| "--no-sandbox", |
| "--disable-setuid-sandbox", |
| "--disable-extensions", |
| ] |
| ``` |