| --- |
| title: Docker Builds |
| hide_title: true |
| sidebar_position: 7 |
| version: 1 |
| --- |
| |
| # Docker builds, images and tags |
| |
| The Apache Superset community extensively uses Docker for development, release, |
| and productionizing Superset. This page details our Docker builds and tag naming |
| schemes to help users navigate our offerings. |
| |
| Images are built and pushed to the [Superset Docker Hub repository]( |
| https://hub.docker.com/r/apache/superset) using GitHub Actions. |
| Different sets of images are built and/or published at different times: |
| |
| - **Published releases** (`release`): published using |
| tags like `5.0.0` and the `latest` tag. |
| - **Pull request iterations** (`pull_request`): for each pull request, while |
| we actively build the docker to validate the build, we do |
| not publish those images for security reasons, we simply `docker build --load` |
| - **Merges to the main branch** (`push`): resulting in new SHAs, with tags |
| prefixed with `master` for the latest `master` version. |
| |
| ## Build presets |
| |
| We have a set of build "presets" that each represent a combination of |
| parameters for the build, mostly pointing to either different target layer |
| for the build, and/or base image. |
| |
| Here are the build presets that are exposed through the `supersetbot docker` utility: |
| |
| - `lean`: The default Docker image, including both frontend and backend. Tags |
| without a build_preset are lean builds (ie: `latest`, `5.0.0`, `4.1.2`, ...). `lean` |
| builds do not contain database |
| drivers, meaning you need to install your own. That applies to analytics databases **AND |
| the metadata database**. You'll likely want to layer either `mysqlclient` or `psycopg2-binary` |
| depending on the metadata database you choose for your installation, plus the required |
| drivers to connect to your analytics database(s). |
| - `dev`: For development, with a headless browser, dev-related utilities and root access. This |
| includes some commonly used database drivers like `mysqlclient`, `psycopg2-binary` and |
| some other used for development/CI |
| - `py311`, e.g., Py311: Similar to lean but with a different Python version (in this example, 3.11). |
| - `ci`: For certain CI workloads. |
| - `websocket`: For Superset clusters supporting advanced features. |
| - `dockerize`: Used by Helm in initContainers to wait for database dependencies to be available. |
| |
| ## Key tags examples |
| |
| - `latest`: The latest official release build |
| - `latest-dev`: the `-dev` image of the latest official release build, with a |
| headless browser and root access. |
| - `master`: The latest build from the `master` branch, implicitly the lean build |
| preset |
| - `master-dev`: Similar to `master` but includes a headless browser and root access. |
| - `pr-5252`: The latest commit in PR 5252. |
| - `30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev`: A build for |
| this specific SHA, which could be from a `master` merge, or release. |
| - `websocket-latest`: The WebSocket image for use in a Superset cluster. |
| |
| For insights or modifications to the build matrix and tagging conventions, |
| check the [supersetbot docker](https://github.com/apache-superset/supersetbot) |
| subcommand and the [docker.yml](https://github.com/apache/superset/blob/master/.github/workflows/docker.yml) |
| GitHub action. |
| |
| ## Building your own production Docker image |
| |
| Every Superset deployment will require its own set of drivers depending on the data warehouse(s), |
| etc. so we recommend that users build their own Docker image by extending the `lean` image. |
| |
| Here's an example Dockerfile that does this. Follow the in-line comments to customize it for |
| your desired Superset version and database drivers. The comments also note that a certain feature flag will |
| have to be enabled in your config file. |
| |
| You would build the image with `docker build -t mysuperset:latest .` or `docker build -t ourcompanysuperset:5.0.0 .` |
| |
| ```Dockerfile |
| # change this to apache/superset:5.0.0 or whatever version you want to build from; |
| # otherwise the default is the latest commit on GitHub master branch |
| FROM apache/superset:master |
| |
| USER root |
| |
| # Set environment variable for Playwright |
| ENV PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers |
| |
| # Install packages using uv into the virtual environment |
| # Superset started using uv after the 4.1 branch; if you are building from apache/superset:4.1.x or an older version, |
| # replace the first two lines with RUN pip install \ |
| RUN . /app/.venv/bin/activate && \ |
| uv pip install \ |
| # install psycopg2 for using PostgreSQL metadata store - could be a MySQL package if using that backend: |
| psycopg2-binary \ |
| # add the driver(s) for your data warehouse(s), in this example we're showing for Microsoft SQL Server: |
| pymssql \ |
| # package needed for using single-sign on authentication: |
| Authlib \ |
| # openpyxl to be able to upload Excel files |
| openpyxl \ |
| # Pillow for Alerts & Reports to generate PDFs of dashboards |
| Pillow \ |
| # install Playwright for taking screenshots for Alerts & Reports. This assumes the feature flag PLAYWRIGHT_REPORTS_AND_THUMBNAILS is enabled |
| # That feature flag will default to True starting in 6.0.0 |
| # Playwright works only with Chrome. |
| # If you are still using Selenium instead of Playwright, you would instead install here the selenium package and a headless browser & webdriver |
| playwright \ |
| && playwright install-deps \ |
| && PLAYWRIGHT_BROWSERS_PATH=/usr/local/share/playwright-browsers playwright install chromium |
| |
| # Switch back to the superset user |
| USER superset |
| |
| CMD ["/app/docker/entrypoints/run-server.sh"] |
| ``` |
| |
| ## Key ARGs in Dockerfile |
| |
| - `BUILD_TRANSLATIONS`: whether to build the translations into the image. For the |
| frontend build this tells webpack to strip out all locales other than `en` from |
| the `moment-timezone` library. For the backendthis skips compiling the |
| `*.po` translation files |
| - `DEV_MODE`: whether to skip the frontend build, this is used by our `docker-compose` dev setup |
| where we mount the local volume and build using `webpack` in `--watch` mode, meaning as you |
| alter the code in the local file system, webpack, from within a docker image used for this |
| purpose, will constantly rebuild the frontend as you go. This ARG enables the initial |
| `docker-compose` build to take much less time and resources |
| - `INCLUDE_CHROMIUM`: whether to include chromium in the backend build so that it can be |
| used as a headless browser for workloads related to "Alerts & Reports" and thumbnail generation |
| - `INCLUDE_FIREFOX`: same as above, but for firefox |
| - `PY_VER`: specifying the base image for the python backend, we don't recommend altering |
| this setting if you're not working on forwards or backwards compatibility |
| |
| ## Caching |
| |
| To accelerate builds, we follow Docker best practices and use `apache/superset-cache`. |
| |
| ## About database drivers |
| |
| Our docker images come with little to zero database driver support since |
| each environment requires different drivers, and maintaining a build with |
| wide database support would be both challenging (dozens of databases, |
| python drivers, and os dependencies) and inefficient (longer |
| build times, larger images, lower layer cache hit rate, ...). |
| |
| For production use cases, we recommend that you derive our `lean` image(s) and |
| add database support for the database you need. |
| |
| ## On supporting different platforms (namely arm64 AND amd64) |
| |
| Currently all automated builds are multi-platform, supporting both `linux/arm64` |
| and `linux/amd64`. This enables higher level constructs like `helm` and |
| `docker compose` to point to these images and effectively be multi-platform |
| as well. |
| |
| Pull requests and master builds |
| are one-image-per-platform so that they can be parallelized and the |
| build matrix for those is more sparse as we don't need to build every |
| build preset on every platform, and generally can be more selective here. |
| For those builds, we suffix tags with `-arm` where it applies. |
| |
| ### Working with Apple silicon |
| |
| Apple's current generation of computers uses ARM-based CPUs, and Docker |
| running on MACs seem to require `linux/arm64/v8` (at least one user's M2 was |
| configured in that way). Setting the environment |
| variable `DOCKER_DEFAULT_PLATFORM` to `linux/amd64` seems to function in |
| term of leveraging, and building upon the Superset builds provided here. |
| |
| ```bash |
| export DOCKER_DEFAULT_PLATFORM=linux/amd64 |
| ``` |
| |
| Presumably, `linux/arm64/v8` would be more optimized for this generation |
| of chips, but less compatible across the ARM ecosystem. |