| --- |
| title: Caching |
| hide_title: true |
| sidebar_position: 3 |
| version: 1 |
| --- |
| |
| # Caching |
| |
| :::note |
| When a cache backend is configured, Superset expects it to remain available. Operations will |
| fail if the configured backend becomes unavailable rather than silently degrading. This |
| fail-fast behavior ensures operators are immediately aware of infrastructure issues. |
| ::: |
| |
| Superset uses [Flask-Caching](https://flask-caching.readthedocs.io/) for caching. |
| Flask-Caching supports various caching backends, including Redis (recommended), Memcached, |
| SimpleCache (in-memory), or the local filesystem. |
| [Custom cache backends](https://flask-caching.readthedocs.io/en/latest/#custom-cache-backends) |
| are also supported. |
| |
| Caching can be configured by providing dictionaries in |
| `superset_config.py` that comply with [the Flask-Caching config specifications](https://flask-caching.readthedocs.io/en/latest/#configuring-flask-caching). |
| |
| The following cache configurations can be customized in this way: |
| |
| - Dashboard filter state (required): `FILTER_STATE_CACHE_CONFIG` |
| - Explore chart form data (required): `EXPLORE_FORM_DATA_CACHE_CONFIG` |
| - Metadata cache (optional): `CACHE_CONFIG` |
| - Charting data queried from datasets (optional): `DATA_CACHE_CONFIG` |
| |
| For example, to configure the filter state cache using Redis: |
| |
| ```python |
| FILTER_STATE_CACHE_CONFIG = { |
|     'CACHE_TYPE': 'RedisCache', |
|     'CACHE_DEFAULT_TIMEOUT': 86400, |
|     'CACHE_KEY_PREFIX': 'superset_filter_cache', |
|     'CACHE_REDIS_URL': 'redis://localhost:6379/0' |
| } |
| ``` |
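
The same pattern extends to the other cache settings listed above. The following is a minimal sketch, assuming a single local Redis instance shared by all four caches. The `redis_cache` helper, the key prefixes, and the timeout values are illustrative choices, not Superset defaults:

```python
# superset_config.py (sketch): one Redis instance, one key prefix per cache.
REDIS_URL = "redis://localhost:6379/0"  # assumed local Redis instance


def redis_cache(prefix: str, timeout: int) -> dict:
    """Build a Flask-Caching config dict for a Redis-backed cache (hypothetical helper)."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout,  # seconds
        "CACHE_KEY_PREFIX": prefix,  # keep prefixes distinct to avoid key collisions
        "CACHE_REDIS_URL": REDIS_URL,
    }


FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_cache", 86400)
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_cache", 86400)
CACHE_CONFIG = redis_cache("superset_metadata_cache", 300)
DATA_CACHE_CONFIG = redis_cache("superset_data_cache", 86400)
```

Giving each cache its own key prefix keeps entries from different caches apart even when they share one Redis database.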
| |
| ## Dependencies |
| |
| To use a dedicated cache store, additional Python libraries must be installed: |
| |
| - For Redis: we recommend the [redis](https://pypi.python.org/pypi/redis) Python package. |
| - For Memcached: we recommend the [pylibmc](https://pypi.org/project/pylibmc/) client library, as |
| `python-memcached` does not handle storing binary data correctly. |
| |
| These libraries can be installed using pip. |
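
Before pointing a cache configuration at one of these backends, it can help to confirm that the client library is importable in the environment Superset runs in. The following is a small sketch using only the standard library; the `missing_cache_clients` helper is hypothetical and not part of Superset:

```python
import importlib.util


def missing_cache_clients(modules=("redis", "pylibmc")):
    """Return the names of cache client modules that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_cache_clients()
    if missing:
        print("Missing cache client libraries: " + ", ".join(missing))
    else:
        print("All cache client libraries are importable.")
```

Pass only the modules for the backend you actually use, for example `missing_cache_clients(("redis",))` for a Redis-only deployment.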
| |
| ## Fallback Metastore Cache |
| |
| Note that some form of Filter State and Explore caching is required. If either of these caches |
| is undefined, Superset falls back to a built-in cache that stores data in the metadata |
| database. While a dedicated cache is recommended, the built-in cache can also be used |
| to cache other data. |
| |
| For example, to use the built-in cache to store chart data, use the following config: |
| |
| ```python |
| DATA_CACHE_CONFIG = { |
|     "CACHE_TYPE": "SupersetMetastoreCache", |
|     "CACHE_KEY_PREFIX": "superset_results",  # make sure this string is unique to avoid collisions |
|     "CACHE_DEFAULT_TIMEOUT": 86400,  # 60 seconds * 60 minutes * 24 hours |
| } |
| ``` |
| |
| ## Chart Cache Timeout |
| |
| The cache timeout for charts may be overridden by the settings for an individual chart, dataset, or |
| database. Each of these configurations will be checked in order before falling back to the default |
| value defined in `DATA_CACHE_CONFIG`. |
| |
| Note that setting the cache timeout to `-1` disables caching for charting data, either |
| per chart, dataset, or database, or by default if set in `DATA_CACHE_CONFIG`. |
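
The lookup order described above can be sketched in plain Python. This is an illustration only, not Superset's implementation: the function names are hypothetical, and it assumes an unset timeout is represented as `None`:

```python
from typing import Optional

DATA_CACHE_CONFIG = {"CACHE_DEFAULT_TIMEOUT": 86400}  # example default: 24 hours


def resolve_cache_timeout(
    chart: Optional[int] = None,
    dataset: Optional[int] = None,
    database: Optional[int] = None,
) -> int:
    """Return the first timeout that is set, checking chart, dataset, then database."""
    for timeout in (chart, dataset, database):
        if timeout is not None:
            return timeout
    # Nothing set on the chart, dataset, or database: use the global default.
    return DATA_CACHE_CONFIG["CACHE_DEFAULT_TIMEOUT"]


def caching_disabled(timeout: int) -> bool:
    """A timeout of -1 means charting data is not cached."""
    return timeout == -1
```

For example, a chart with no timeout of its own whose dataset is set to 600 seconds resolves to 600, regardless of the database or default values.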
| |
| ## SQL Lab Query Results |
| |
| Caching for SQL Lab query results is used when async queries are enabled and is configured using |
| `RESULTS_BACKEND`. |
| |
| Note that this setting does not take a Flask-Caching dictionary; it requires a `cachelib` |
| object instead. |
| |
| See [Async Queries via Celery](/admin-docs/configuration/async-queries-celery) for details. |
| |
| ## Caching Thumbnails |
| |
| This is an optional feature that can be turned on by activating its [feature flag](/admin-docs/configuration/configuring-superset#feature-flags) in the configuration: |
| |
| ```python |
| FEATURE_FLAGS = { |
|     "THUMBNAILS": True, |
|     "THUMBNAILS_SQLA_LISTENERS": True, |
| } |
| ``` |
| |
| By default, thumbnails are rendered per user and fall back to the Selenium user for anonymous users. |
| To always render thumbnails as a fixed user (`admin` in this example), use the following configuration: |
| |
| ```python |
| from superset.tasks.types import FixedExecutor |
| |
| THUMBNAIL_EXECUTORS = [FixedExecutor("admin")] |
| ``` |
| |
| This feature requires a cache system and Celery workers. All thumbnails are stored in the cache |
| and are processed asynchronously by the workers. |
| |
| An example config where images are stored on S3 could be: |
| |
| ```python |
| from flask import Flask |
| from s3cache.s3cache import S3Cache |
| |
| ... |
| |
| class CeleryConfig: |
|     broker_url = "redis://localhost:6379/0" |
|     imports = ( |
|         "superset.sql_lab", |
|         "superset.tasks.thumbnails", |
|     ) |
|     result_backend = "redis://localhost:6379/0" |
|     worker_prefetch_multiplier = 10 |
|     task_acks_late = True |
| |
| |
| CELERY_CONFIG = CeleryConfig |
| |
| |
| def init_thumbnail_cache(app: Flask) -> S3Cache: |
|     return S3Cache("bucket_name", "thumbs_cache/") |
| |
| |
| THUMBNAIL_CACHE_CONFIG = init_thumbnail_cache |
| ``` |
| |
| With the above example, cache keys for dashboards will be `superset_thumb__dashboard__{ID}`. You can |
| override the base URL for Selenium using: |
| |
| ```python |
| WEBDRIVER_BASEURL = "https://superset.company.com" |
| ``` |
| |
| Additional Selenium WebDriver configuration can be set using `WEBDRIVER_CONFIGURATION`. You can |
| implement a custom function to authenticate Selenium. The default function uses the `flask-login` |
| session cookie. Here's an example of a custom function signature: |
| |
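| ```python |
| from selenium.webdriver.remote.webdriver import WebDriver |
| |
| def auth_driver(driver: WebDriver, user: "User") -> WebDriver: |
|     return driver  # authenticate the session for `user` before returning the driver |
| ``` |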
| |
| Then reference it in the configuration: |
| |
| ```python |
| WEBDRIVER_AUTH_FUNC = auth_driver |
| ``` |
| |
| ## Distributed Coordination Backend |
| |
| Superset supports an optional distributed coordination backend (`DISTRIBUTED_COORDINATION_CONFIG`) for |
| high-performance distributed operations. When configured, it enables: |
| |
| - **Distributed locking**: Moves lock operations from the metadata database to Redis, improving |
| performance and reducing metastore load |
| - **Real-time event notifications**: Enables instant pub/sub messaging for task abort signals and |
| completion notifications instead of polling-based approaches |
| |
| :::note |
| This requires Redis or Valkey specifically, because it relies on Redis-specific features (pub/sub, `SET NX EX`) |
| that are not available in general Flask-Caching backends. |
| ::: |
| |
| ### Configuration |
| |
| The distributed coordination backend uses Flask-Caching style configuration for consistency with other cache |
| backends. Configure `DISTRIBUTED_COORDINATION_CONFIG` in `superset_config.py`: |
| |
| ```python |
| DISTRIBUTED_COORDINATION_CONFIG = { |
|     "CACHE_TYPE": "RedisCache", |
|     "CACHE_REDIS_HOST": "localhost", |
|     "CACHE_REDIS_PORT": 6379, |
|     "CACHE_REDIS_DB": 0, |
|     "CACHE_REDIS_PASSWORD": "",  # Optional |
| } |
| ``` |
| |
| For Redis Sentinel deployments: |
| |
| ```python |
| DISTRIBUTED_COORDINATION_CONFIG = { |
|     "CACHE_TYPE": "RedisSentinelCache", |
|     "CACHE_REDIS_SENTINELS": [("sentinel1", 26379), ("sentinel2", 26379)], |
|     "CACHE_REDIS_SENTINEL_MASTER": "mymaster", |
|     "CACHE_REDIS_SENTINEL_PASSWORD": None,  # Sentinel password (if different) |
|     "CACHE_REDIS_PASSWORD": "",  # Redis password |
|     "CACHE_REDIS_DB": 0, |
| } |
| ``` |
| |
| For SSL/TLS connections: |
| |
| ```python |
| DISTRIBUTED_COORDINATION_CONFIG = { |
|     "CACHE_TYPE": "RedisCache", |
|     "CACHE_REDIS_HOST": "redis.example.com", |
|     "CACHE_REDIS_PORT": 6380, |
|     "CACHE_REDIS_SSL": True, |
|     "CACHE_REDIS_SSL_CERTFILE": "/path/to/client.crt", |
|     "CACHE_REDIS_SSL_KEYFILE": "/path/to/client.key", |
|     "CACHE_REDIS_SSL_CA_CERTS": "/path/to/ca.crt", |
| } |
| ``` |
| |
| ### Distributed Lock TTL |
| |
| You can configure the default lock TTL (time-to-live) in seconds. Locks automatically expire after |
| this duration to prevent deadlocks from crashed processes: |
| |
| ```python |
| DISTRIBUTED_LOCK_DEFAULT_TTL = 30 # Default: 30 seconds |
| ``` |
| |
| Individual lock acquisitions can override this value when needed. |
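
To make the expiry behavior concrete, here is a toy, single-process sketch of a lock store where each lock lapses after its TTL, loosely mirroring the "set if absent, with expiry" semantics of Redis `SET NX EX`. The `InMemoryTTLLock` class and its methods are hypothetical and are not Superset's implementation:

```python
import time
from typing import Dict, Optional

DISTRIBUTED_LOCK_DEFAULT_TTL = 30  # seconds, mirrors the setting above


class InMemoryTTLLock:
    """Toy stand-in for a lock store where each lock expires after a TTL."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def acquire(
        self, key: str, ttl: Optional[float] = None, now: Optional[float] = None
    ) -> bool:
        """Take the lock if it is free or expired; return whether it was acquired."""
        now = time.monotonic() if now is None else now
        ttl = DISTRIBUTED_LOCK_DEFAULT_TTL if ttl is None else ttl
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # lock is still held and has not expired
        self._expires_at[key] = now + ttl
        return True

    def release(self, key: str) -> None:
        """Drop the lock; releasing a lock that is not held is a no-op."""
        self._expires_at.pop(key, None)
```

If a holder crashes without calling `release`, a later `acquire` on the same key succeeds once the TTL has elapsed, which is the deadlock protection described above. The optional `ttl` argument models a per-acquisition override.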
| |
| ### Database-Only Mode |
| |
| When `DISTRIBUTED_COORDINATION_CONFIG` is not configured, Superset uses database-backed operations: |
| |
| - **Locking**: Uses the KeyValue table with periodic cleanup of expired entries |
| - **Event notifications**: Uses database polling instead of pub/sub |
| |
| While database-backed operations work reliably, the Redis backend is recommended for production |
| deployments where low latency and reduced database load are important. |
| |
| :::resources |
| - [Blog: The Data Engineer's Guide to Lightning-Fast Superset Dashboards](https://preset.io/blog/the-data-engineers-guide-to-lightning-fast-apache-superset-dashboards/) |
| - [Blog: Accelerating Dashboards with Materialized Views](https://preset.io/blog/accelerating-apache-superset-dashboards-with-materialized-views/) |
| ::: |