---
title: Docker Builds
hide_title: true
sidebar_position: 7
version: 1
---
# Docker builds, images and tags
The Apache Superset community extensively uses Docker for development, release,
and productionizing Superset. This page details our Docker builds and tag naming
schemes to help users navigate our offerings.
Images are built and pushed to the [Superset Docker Hub repository](
https://hub.docker.com/r/apache/superset) using GitHub Actions.
Different sets of images are built and/or published at different times:
- **Published releases** (`release`): published using
tags like `5.0.0` and the `latest` tag.
- **Pull request iterations** (`pull_request`): for each pull request, we build
the Docker image to validate that it builds correctly, but for security reasons we
do not publish those images; we simply `docker build --load` them.
- **Merges to the main branch** (`push`): resulting in new SHAs, with tags
prefixed with `master` for the latest `master` version.
## Build presets
We have a set of build "presets" that each represent a combination of build
parameters, mostly pointing to a different target layer of the build and/or a
different base image.
Here are the build presets that are exposed through the `supersetbot docker` utility:
- `lean`: The default Docker image, including both frontend and backend. Tags
without a build preset suffix are lean builds (e.g., `latest`, `5.0.0`, `4.1.2`, ...). `lean`
builds do not contain database
drivers, meaning you need to install your own. That applies to analytics databases **AND
the metadata database**. You'll likely want to layer in either `mysqlclient` or `psycopg2-binary`
depending on the metadata database you choose for your installation, plus the required
drivers to connect to your analytics database(s).
- `dev`: For development, with a headless browser, dev-related utilities, and root access. This
includes some commonly used database drivers like `mysqlclient` and `psycopg2-binary`, plus
some others used for development/CI.
- `py311`: Similar to `lean` but built on a different Python version (in this example, Python 3.11).
- `ci`: For certain CI workloads.
- `websocket`: For Superset clusters supporting advanced features.
- `dockerize`: Used by Helm in initContainers to wait for database dependencies to be available.
## Key tags examples
- `latest`: The latest official release build
- `latest-dev`: The `-dev` image of the latest official release build, with a
headless browser and root access.
- `master`: The latest build from the `master` branch, implicitly using the `lean` build
preset.
- `master-dev`: Similar to `master` but includes a headless browser and root access.
- `pr-5252`: The latest commit in PR 5252.
- `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: A `dev` build for
this specific SHA, which could be from a `master` merge or a release.
- `websocket-latest`: The WebSocket image for use in a Superset cluster.
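As an illustration, pulling a few of the tags listed above looks like this (these
commands are only examples; pick the tag that matches the release, preset, and branch
you want):
```bash
docker pull apache/superset:latest        # latest official release, lean preset
docker pull apache/superset:latest-dev    # latest official release, dev preset
docker pull apache/superset:master        # latest merge to the master branch, lean preset
```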
For insights or modifications to the build matrix and tagging conventions,
check the [supersetbot docker](https://github.com/apache-superset/supersetbot)
subcommand and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml)
GitHub action.
## Building your own production Docker image
Every Superset deployment requires its own set of database drivers depending on the data
warehouse(s) it connects to, so we recommend that users build their own Docker image by extending the `lean` image.
Here's an example Dockerfile that does this. Follow the in-line comments to customize it for
your desired Superset version and database drivers. The comments also note that a certain feature flag will
have to be enabled in your config file.
You would build the image with `docker build -t mysuperset:latest .` or `docker build -t ourcompanysuperset:5.0.0 .`
```Dockerfile
# change this to apache/superset:5.0.0 or whatever version you want to build from;
# otherwise the default is the latest commit on GitHub master branch
FROM apache/superset:master
USER root
# Set environment variable for Playwright
ENV PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers
# Install packages using uv into the virtual environment
# Superset started using uv after the 4.1 branch; if you are building from apache/superset:4.1.x or an older version,
# replace the first two lines with RUN pip install \
RUN . /app/.venv/bin/activate && \
uv pip install \
# install psycopg2 for using PostgreSQL metadata store - could be a MySQL package if using that backend:
psycopg2-binary \
# add the driver(s) for your data warehouse(s), in this example we're showing for Microsoft SQL Server:
pymssql \
# package needed for using single-sign on authentication:
Authlib \
# openpyxl to be able to upload Excel files
openpyxl \
# Pillow for Alerts & Reports to generate PDFs of dashboards
Pillow \
# install Playwright for taking screenshots for Alerts & Reports. This assumes the feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled
# That feature flag will default to True starting in 6.0.0
# Playwright works only with Chrome.
# If you are still using Selenium instead of Playwright, you would instead install here the selenium package and a headless browser & webdriver
playwright \
&& playwright install-deps \
&& PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers playwright install chromium
# Switch back to the superset user
USER superset
CMD ["/app/docker/entrypoints/run-server.sh"]
```
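With the Dockerfile above saved in an empty directory, a typical build-and-run cycle
looks roughly like this (the image name, port mapping, and config wiring are
illustrative; adapt them to your deployment):
```bash
# Build the customized image
docker build -t mysuperset:latest .
# Run it locally; Superset listens on port 8088 by default
docker run -d --name superset -p 8088:8088 mysuperset:latest
```
In a real deployment you would also mount or bake in your own `superset_config.py`
(for instance, to enable the feature flag mentioned in the comments above) and point
Superset at your metadata database.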
## Key ARGs in Dockerfile
- `BUILD_TRANSLATIONS`: whether to build the translations into the image. For the
frontend build, this controls whether webpack strips out all locales other than `en` from
the `moment-timezone` library; for the backend, it controls whether the
`*.po` translation files are compiled.
- `DEV_MODE`: whether to skip the frontend build. This is used by our `docker-compose` dev setup,
where we mount the local volume and build using `webpack` in `--watch` mode, meaning that as you
alter the code on your local file system, webpack (running inside a Docker image used for this
purpose) constantly rebuilds the frontend as you go. This ARG makes the initial
`docker-compose` build take much less time and fewer resources.
- `INCLUDE_CHROMIUM`: whether to include Chromium in the backend build so that it can be
used as a headless browser for workloads related to "Alerts & Reports" and thumbnail generation
- `INCLUDE_FIREFOX`: same as above, but for Firefox
- `PY_VER`: specifies the base image for the Python backend. We don't recommend altering
this setting unless you're working on forwards or backwards compatibility.
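These ARGs are passed at build time. As a sketch (the values below are illustrative;
check the Dockerfile at the root of the repository for the authoritative list of ARGs
and their defaults), building from a local checkout of the Superset repo might look like:
```bash
# Illustrative only: build an image with translations baked in and no bundled Chromium
docker build \
  --build-arg BUILD_TRANSLATIONS=true \
  --build-arg INCLUDE_CHROMIUM=false \
  -t superset-custom:latest \
  .
```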
## Caching
To accelerate builds, we follow Docker best practices and use `apache/superset-cache` as a
registry-backed build cache.
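If you want to experiment with reusing that cache in a local build, BuildKit can be
pointed at a registry cache source; the exact cache refs and tags our CI publishes are
defined in `docker.yml`, so the ref below is only a placeholder:
```bash
# Hedged sketch: pull build cache from a registry when building locally with buildx.
# The actual cache tag(s) pushed by Superset CI may differ; see .github/workflows/docker.yml.
docker buildx build \
  --cache-from=type=registry,ref=apache/superset-cache:<cache-tag> \
  -t superset-local \
  .
```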
## About database drivers
Our Docker images ship with little to no database driver support, since
each environment requires different drivers, and maintaining a build with
wide database support would be both challenging (dozens of databases,
Python drivers, and OS dependencies) and inefficient (longer
build times, larger images, lower layer cache hit rate, ...).
For production use cases, we recommend that you derive from our `lean` image(s) and
add support for the database(s) you need.
## On supporting different platforms (namely arm64 AND amd64)
Currently all automated builds are multi-platform, supporting both `linux/arm64`
and `linux/amd64`. This enables higher level constructs like `helm` and
`docker compose` to point to these images and effectively be multi-platform
as well.
Pull request and master builds are one-image-per-platform so that they can be
parallelized, and their build matrix is sparser since we don't need to build every
build preset on every platform and can generally be more selective there.
For those builds, we suffix tags with `-arm` where it applies.
### Working with Apple silicon
Apple's current generation of computers uses ARM-based CPUs, and Docker
running on Macs seems to require `linux/arm64/v8` (at least one user's M2 was
configured that way). Setting the environment
variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to work for pulling,
running, and building upon the Superset images provided here.
```bash
export DOCKER_DEFAULT_PLATFORM=linux/amd64
```
Presumably, `linux/arm64/v8` would be more optimized for this generation
of chips, but less compatible across the ARM ecosystem.
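If you prefer not to set the variable globally, the same effect can be achieved per
command with Docker's `--platform` flag (a hedged alternative; either way, running
amd64 images on ARM hardware relies on emulation):
```bash
# Pull and run the amd64 image explicitly on an ARM machine
docker pull --platform linux/amd64 apache/superset:latest
docker run --platform linux/amd64 -p 8088:8088 apache/superset:latest
```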