| --- |
| sidebar_position: 9 |
| --- |
| |
| # FAQ |
| |
| ## How big of a dataset can Superset handle? |
| |
| Superset can work with even gigantic databases! Superset acts as a thin layer above your underlying |
| databases or data engines, which do all the processing. Superset simply visualizes the results of |
| the query. |
| |
| The key to achieving acceptable performance in Superset is whether your database can execute queries |
| and return results at a speed that is acceptable to your users. If you experience slow performance with |
| Superset, benchmark and tune your data warehouse. |
| |
| ## What are the computing specifications required to run Superset? |
| |
The specs of your Superset installation depend on how many users you have and what their activity is, not
on the size of your data. Superset admins in the community have reported that 8GB of RAM and 2 vCPUs are
adequate to run a moderately-sized instance. To develop Superset, e.g., compile code or build images, you
may need more power.
| |
| Monitor your resource usage and increase or decrease as needed. Note that Superset usage has a tendency |
| to occur in spikes, e.g., if everyone in a meeting loads the same dashboard at once. |
| |
Superset's application metadata does not require a very large database to store it, though
the `logs` table in the metadata database grows over time.
| |
| ## Can I join / query multiple tables at one time? |
| |
| Not in the Explore or Visualization UI. A Superset SQLAlchemy datasource can only be a single table |
| or a view. |
| |
| When working with tables, the solution would be to create a table that contains all the fields |
| needed for your analysis, most likely through some scheduled batch process. |
| |
A view is a simple logical layer that abstracts an arbitrary SQL query as a virtual table. This can
allow you to join and union multiple tables and to apply transformations using arbitrary SQL
expressions. The limitation there is your database performance, as Superset effectively will run a
query on top of your query (view). A good practice may be to limit yourself to joining your main
large table to one or more small tables only, and to avoid using _GROUP BY_ where possible, as Superset
will do its own _GROUP BY_ and doing the work twice might slow down performance.
| |
| Whether you use a table or a view, performance depends on how fast your database can deliver |
| the result to users interacting with Superset. |
| |
| However, if you are using SQL Lab, there is no such limitation. You can write SQL queries to join |
| multiple tables as long as your database account has access to the tables. |
| |
| ## How do I create my own visualization? |
| |
| We recommend reading the instructions in |
| [Creating Visualization Plugins](/docs/contributing/howtos#creating-visualization-plugins). |
| |
| ## Can I upload and visualize CSV data? |
| |
| Absolutely! Read the instructions [here](/docs/using-superset/exploring-data) to learn |
| how to enable and use CSV upload. |
| |
| ## Why are my queries timing out? |
| |
There are many possible reasons why a long-running query might time out.

For long-running queries from SQL Lab, Superset by default allows them to run for up to six hours
before they are killed by Celery. If you want to increase the time limit for running queries, you
can specify the timeout in your configuration. For example:
| |
| ``` |
| SQLLAB_ASYNC_TIME_LIMIT_SEC = 60 * 60 * 6 |
| ``` |
| |
If you are seeing timeouts (504 Gateway Time-out) when loading a dashboard or exploring a chart, you
are probably behind a gateway or proxy server (such as Nginx). If the proxy does not receive a timely
response from the Superset server (which is processing long queries), it will send a 504 status code
to the client directly. Superset has a client-side timeout limit to address this issue. If a query
doesn’t come back within the client-side timeout (60 seconds by default), Superset will display a
warning message to avoid a gateway timeout message. If you have a longer gateway timeout limit, you
can change the timeout settings in **superset_config.py**:
| |
| ``` |
| SUPERSET_WEBSERVER_TIMEOUT = 60 |
| ``` |
| |
| ## Why is the map not visible in the geospatial visualization? |
| |
| You need to register a free account at [Mapbox.com](https://www.mapbox.com), obtain an API key, and add it |
| to **.env** at the key MAPBOX_API_KEY: |
| |
| ``` |
| MAPBOX_API_KEY = "longstringofalphanumer1c" |
| ``` |
| |
## How do I limit the timed refresh on a dashboard?
| |
By default, the dashboard timed refresh feature allows you to automatically re-query every slice on
a dashboard according to a set schedule. Sometimes, however, you won’t want all of the slices to be
refreshed - especially if some data is slow-moving or runs heavy queries. To exclude specific slices
from the timed refresh process, add the `timed_refresh_immune_slices` key to the dashboard JSON
Metadata field:
| |
| ``` |
| { |
| "filter_immune_slices": [], |
| "expanded_slices": {}, |
| "filter_immune_slice_fields": {}, |
| "timed_refresh_immune_slices": [324] |
| } |
| ``` |
| |
| In the example above, if a timed refresh is set for the dashboard, then every slice except 324 will |
| be automatically re-queried on schedule. |
| |
Slice refreshes will also be staggered over the specified period. You can turn off this staggering by
setting `stagger_refresh` to `false`, and you can modify the stagger period by setting `stagger_time`
to a value in milliseconds in the JSON Metadata field:
| |
| ``` |
| { |
| "stagger_refresh": false, |
| "stagger_time": 2500 |
| } |
| ``` |
| |
| Here, the entire dashboard will refresh at once if periodic refresh is on. The stagger time of 2.5 |
| seconds is ignored. |
| |
## Why does ‘flask fab’ or Superset freeze/hang/stop responding when started (my home directory is NFS mounted)?
| |
By default, Superset creates and uses an SQLite database at `~/.superset/superset.db`. SQLite is
known to [not work well if used on NFS](https://www.sqlite.org/lockingv3.html) due to a broken
file-locking implementation on NFS.
| |
| You can override this path using the **SUPERSET_HOME** environment variable. |
| |
Another workaround is to change where Superset stores the SQLite database by adding the following to
`superset_config.py`:
| |
| ``` |
| SQLALCHEMY_DATABASE_URI = 'sqlite:////new/location/superset.db?check_same_thread=false' |
| ``` |
| |
| You can read more about customizing Superset using the configuration file |
| [here](/docs/configuration/configuring-superset). |
| |
| ## What if the table schema changed? |
| |
| Table schemas evolve, and Superset needs to reflect that. It’s pretty common in the life cycle of a |
| dashboard to want to add a new dimension or metric. To get Superset to discover your new columns, |
| all you have to do is to go to **Data -> Datasets**, click the edit icon next to the dataset |
| whose schema has changed, and hit **Sync columns from source** from the **Columns** tab. |
Behind the scenes, the new columns will get merged. Following this, you may want to re-edit the
table to configure the **Columns** tab, check the appropriate boxes, and save again.
| |
| ## What database engine can I use as a backend for Superset? |
| |
| To clarify, the database backend is an OLTP database used by Superset to store its internal |
| information like your list of users and dashboard definitions. While Superset supports a |
| [variety of databases as data *sources*](/docs/configuration/databases#installing-database-drivers), |
| only a few database engines are supported for use as the OLTP backend / metadata store. |
| |
| Superset is tested using MySQL, PostgreSQL, and SQLite backends. It’s recommended you install |
| Superset on one of these database servers for production. Installation on other OLTP databases |
| may work but isn’t tested. It has been reported that [Microsoft SQL Server does *not* |
| work as a Superset backend](https://github.com/apache/superset/issues/18961). Column-store, |
| non-OLTP databases are not designed for this type of workload. |
| |
| ## How can I configure OAuth authentication and authorization? |
| |
| You can take a look at this Flask-AppBuilder |
| [configuration example](https://github.com/dpgaspar/Flask-AppBuilder/blob/master/examples/oauth/config.py). |
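
As a starting point, here is a minimal `superset_config.py` sketch following Flask-AppBuilder's OAuth
conventions. The provider shown (Google) and the client credentials are placeholder assumptions you
will need to replace with your own:

```python
# A minimal sketch, assuming a Google OAuth provider; the client ID,
# client secret, and registration role below are placeholders.
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        "name": "google",
        "icon": "fa-google",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
            "api_base_url": "https://www.googleapis.com/oauth2/v2/",
            "client_kwargs": {"scope": "email profile"},
            "access_token_url": "https://accounts.google.com/o/oauth2/token",
            "authorize_url": "https://accounts.google.com/o/oauth2/auth",
        },
    }
]

# Optionally auto-register OAuth users with a default role.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Public"
```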
| |
| ## Is there a way to force the dashboard to use specific colors? |
| |
| It is possible on a per-dashboard basis by providing a mapping of labels to colors in the JSON |
| Metadata attribute using the `label_colors` key. |
| |
| ```json |
| { |
| "label_colors": { |
| "Girls": "#FF69B4", |
| "Boys": "#ADD8E6" |
| } |
| } |
| ``` |
| |
| ## Does Superset work with [insert database engine here]? |
| |
| The [Connecting to Databases section](/docs/configuration/databases) provides the best |
| overview for supported databases. Database engines not listed on that page may work too. We rely on |
| the community to contribute to this knowledge base. |
| |
For a database engine to be supported in Superset through the SQLAlchemy connector, it requires
having a Python-compliant [SQLAlchemy dialect](https://docs.sqlalchemy.org/en/13/dialects/) as well
as a [DBAPI driver](https://www.python.org/dev/peps/pep-0249/) defined. Databases that have limited
SQL support may work as well. For instance, it’s possible to connect to Druid through the SQLAlchemy
connector even though Druid does not support joins and subqueries. Another key element for a
database to be supported is the Superset Database Engine Specification interface. This
interface allows for defining database-specific configurations and logic that go beyond the
SQLAlchemy and DBAPI scope. This includes features like:

- date-related SQL functions that allow Superset to fetch different time granularities when running
  time-series queries
- whether the engine supports subqueries; if false, Superset may run 2-phase queries to compensate
  for the limitation
- methods around processing logs and inferring the percentage of completion of a query
- technicalities as to how to handle cursors and connections if the driver is not standard DBAPI
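
For illustration, here is a minimal sketch of a custom engine spec. The class and attribute names
follow `superset.db_engine_specs.base.BaseEngineSpec`, but the dialect name and the time-grain
expressions are hypothetical and depend on your database's SQL dialect:

```python
# A minimal sketch; "myengine" and the DATE_TRUNC templates are
# hypothetical placeholders for your database's dialect.
from superset.db_engine_specs.base import BaseEngineSpec


class MyEngineSpec(BaseEngineSpec):
    engine = "myengine"        # must match the SQLAlchemy dialect name
    engine_name = "My Engine"  # display name in the Superset UI
    allows_subqueries = True   # if False, Superset may run 2-phase queries

    # SQL templates used to truncate a time column to a given grain.
    _time_grain_expressions = {
        None: "{col}",
        "PT1M": "DATE_TRUNC('minute', {col})",
        "PT1H": "DATE_TRUNC('hour', {col})",
        "P1D": "DATE_TRUNC('day', {col})",
    }
```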
| |
Beyond the SQLAlchemy connector, it’s also possible, though much more involved, to extend Superset
and write your own connector. The only example of this at the moment is the Druid connector, which
is getting superseded by Druid’s growing SQL support and the recent availability of a DBAPI and
SQLAlchemy driver. If the database you are considering integrating has any kind of SQL support,
it’s probably preferable to go the SQLAlchemy route. Note that for a native connector to be possible,
the database needs to support running OLAP-type queries and should be able to do things that
are typical in basic SQL:
| |
| - aggregate data |
| - apply filters |
| - apply HAVING-type filters |
| - be schema-aware, expose columns and types |
| |
| ## Does Superset offer a public API? |
| |
Yes, Superset offers a public REST API, and the surface of that API is expanding steadily. You can
read more about this API and interact with it using Swagger [here](/docs/api).

The original vision for the collection of endpoints under **/api/v1** was specified in
[SIP-17](https://github.com/apache/superset/issues/7259), and constant progress has been
made to cover more and more use cases.
| |
The available API is documented using [Swagger](https://swagger.io/), and the documentation can be
made available under **/swagger/v1** by enabling the following flag in `superset_config.py`:
| |
| ``` |
| FAB_API_SWAGGER_UI = True |
| ``` |
| |
| There are other undocumented [private] ways to interact with Superset programmatically that offer no |
| guarantees and are not recommended but may fit your use case temporarily: |
| |
| - using the ORM (SQLAlchemy) directly |
| - using the internal FAB ModelView API (to be deprecated in Superset) |
| - altering the source code in your fork |
| |
| ## How can I see usage statistics (e.g., monthly active users)? |
| |
| This functionality is not included with Superset, but you can extract and analyze Superset's application |
| metadata to see what actions have occurred. By default, user activities are logged in the `logs` table |
| in Superset's metadata database. One company has published a write-up of [how they analyzed Superset |
| usage, including example queries](https://engineering.hometogo.com/monitor-superset-usage-via-superset-c7f9fba79525). |
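
For example, here is a hedged sketch of computing monthly active users from the `logs` table with
SQLAlchemy. The connection URI is a placeholder, and the column names (`user_id`, `dttm`) assume the
default metadata schema, so verify them against your own database:

```python
# A sketch of counting monthly active users from the `logs` table;
# the connection URI is a placeholder and DATE_TRUNC assumes PostgreSQL.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://superset:superset@localhost/superset")
query = text("""
    SELECT DATE_TRUNC('month', dttm) AS month,
           COUNT(DISTINCT user_id) AS monthly_active_users
    FROM logs
    GROUP BY 1
    ORDER BY 1
""")
with engine.connect() as conn:
    for month, mau in conn.execute(query):
        print(month, mau)
```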
| |
## What does Hours Offset in the Edit Dataset view do?
| |
| In the Edit Dataset view, you can specify a time offset. This field lets you configure the |
| number of hours to be added or subtracted from the time column. |
| This can be used, for example, to convert UTC time to local time. |
| |
| ## Does Superset collect any telemetry data? |
| |
| Superset uses [Scarf](https://about.scarf.sh/) by default to collect basic telemetry data upon installing and/or running Superset. This data helps the maintainers of Superset better understand which versions of Superset are being used, in order to prioritize patch/minor releases and security fixes. |
| We use the [Scarf Gateway](https://docs.scarf.sh/gateway/) to sit in front of container registries, the [scarf-js](https://about.scarf.sh/package-sdks) package to track `npm` installations, and a Scarf pixel to gather anonymous analytics on Superset page views. |
| Scarf purges PII and provides aggregated statistics. Superset users can easily opt out of analytics in various ways documented [here](https://docs.scarf.sh/gateway/#do-not-track) and [here](https://docs.scarf.sh/package-analytics/#as-a-user-of-a-package-using-scarf-js-how-can-i-opt-out-of-analytics). |
| Superset maintainers can also opt out of telemetry data collection by setting the `SCARF_ANALYTICS` environment variable to `false` in the Superset container (or anywhere Superset/webpack are run). |
| Additional opt-out instructions for Docker users are available on the [Docker Installation](/docs/installation/docker-compose) page. |
| |
| ## Does Superset have an archive panel or trash bin from which a user can recover deleted assets? |
| |
| No. Currently, there is no way to recover a deleted Superset dashboard/chart/dataset/database from the UI. However, there is an [ongoing discussion](https://github.com/apache/superset/discussions/18386) about implementing such a feature. |
| |
| Hence, it is recommended to take periodic backups of the metadata database. For recovery, you can launch a recovery instance of a Superset server with the backed-up copy of the DB attached and use the Export Dashboard button in the Superset UI (or the `superset export-dashboards` CLI command). Then, take the .zip file and import it into the current Superset instance. |
| |
| Alternatively, you can programmatically take regular exports of the assets as a backup. |