| --- |
| title: Async Queries via Celery |
| hide_title: true |
| sidebar_position: 4 |
| version: 1 |
| --- |
| |
| # Async Queries via Celery |
| |
| ## Celery |
| |
| On large analytic databases, it’s common to run queries that execute for minutes or hours. To enable |
| support for long running queries that execute beyond the typical web request’s timeout (30-60 |
| seconds), it is necessary to configure an asynchronous backend for Superset which consists of: |
| |
| - one or many Superset workers (which is implemented as a Celery worker), and can be started with |
| the `celery worker` command, run `celery worker --help` to view the related options. |
| - a celery broker (message queue) for which we recommend using Redis or RabbitMQ |
| - a results backend that defines where the worker will persist the query results |
| |
| Configuring Celery requires defining a `CELERY_CONFIG` in your `superset_config.py`. Both the worker |
| and web server processes should have the same configuration. |
| |
| ```python |
| class CeleryConfig(object): |
| broker_url = "redis://localhost:6379/0" |
| imports = ( |
| "superset.sql_lab", |
| "superset.tasks.scheduler", |
| ) |
| result_backend = "redis://localhost:6379/0" |
| worker_prefetch_multiplier = 10 |
| task_acks_late = True |
| task_annotations = { |
| "sql_lab.get_sql_results": { |
| "rate_limit": "100/s", |
| }, |
| } |
| |
| CELERY_CONFIG = CeleryConfig |
| ``` |
| |
| To start a Celery worker to leverage the configuration, run the following command: |
| |
| ```bash |
| celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4 |
| ``` |
| |
| To start a job which schedules periodic background jobs, run the following command: |
| |
| ```bash |
| celery --app=superset.tasks.celery_app:app beat |
| ``` |
| |
| To setup a result backend, you need to pass an instance of a derivative of from |
| from flask_caching.backends.base import BaseCache to the RESULTS_BACKEND configuration key in your superset_config.py. You can |
| use Memcached, Redis, S3 (https://pypi.python.org/pypi/s3werkzeugcache), memory or the file system |
| (in a single server-type setup or for testing), or to write your own caching interface. Your |
| `superset_config.py` may look something like: |
| |
| ```python |
| # On S3 |
| from s3cache.s3cache import S3Cache |
| S3_CACHE_BUCKET = 'foobar-superset' |
| S3_CACHE_KEY_PREFIX = 'sql_lab_result' |
| RESULTS_BACKEND = S3Cache(S3_CACHE_BUCKET, S3_CACHE_KEY_PREFIX) |
| |
| # On Redis |
| from flask_caching.backends.rediscache import RedisCache |
| RESULTS_BACKEND = RedisCache( |
| host='localhost', port=6379, key_prefix='superset_results') |
| ``` |
| |
| For performance gains, [MessagePack](https://github.com/msgpack/msgpack-python) and |
| [PyArrow](https://arrow.apache.org/docs/python/) are now used for results serialization. This can be |
| disabled by setting `RESULTS_BACKEND_USE_MSGPACK = False` in your `superset_config.py`, should any |
| issues arise. Please clear your existing results cache store when upgrading an existing environment. |
| |
| **Important Notes** |
| |
| - It is important that all the worker nodes and web servers in the Superset cluster _share a common |
| metadata database_. This means that SQLite will not work in this context since it has limited |
| support for concurrency and typically lives on the local file system. |
| |
| - There should _only be one instance of celery beat running_ in your entire setup. If not, |
| background jobs can get scheduled multiple times resulting in weird behaviors like duplicate |
| delivery of reports, higher than expected load / traffic etc. |
| |
| - SQL Lab will _only run your queries asynchronously if_ you enable **Asynchronous Query Execution** |
| in your database settings (Sources > Databases > Edit record). |
| |
| ## Celery Flower |
| |
| Flower is a web based tool for monitoring the Celery cluster which you can install from pip: |
| |
| ```bash |
| pip install flower |
| ``` |
| |
| You can run flower using: |
| |
| ```bash |
| celery --app=superset.tasks.celery_app:app flower |
| ``` |