| # QueryObject Documentation |
| |
| **File**: `superset/common/query_object.py` |
| |
| The `QueryObject` class represents a single database query specification constructed on the client. It describes *what* data to fetch (columns, metrics, filters) without specifying *how* to fetch it. The datasource connector is responsible for translating the QueryObject into the appropriate query language (SQL, GraphQL, etc.). |
| |
| ## Table of Contents |
| |
| - [Core Query Specification](#core-query-specification) |
| - [Data Selection](#data-selection) |
| - [Filtering](#filtering) |
| - [Aggregation & Metrics](#aggregation--metrics) |
| - [Time/Temporal](#timetemporal) |
| - [Sorting & Limiting](#sorting--limiting) |
| - [Series Limiting (Timeseries)](#series-limiting-timeseries) |
| - [Post-Processing](#post-processing) |
| - [Annotations](#annotations) |
| - [Query Execution Control](#query-execution-control) |
| - [Deprecated Fields](#deprecated-fields) |
| |
| --- |
| |
| ## Core Query Specification |
| |
| ### `datasource` |
| |
| **Type**: `BaseDatasource | None` |
| |
| **Description**: Reference to the datasource (dataset, table, or query) from which to fetch data. This is the data source object itself, not just an identifier. |
| |
| **Default**: `None` |
| |
| **Usage**: Set automatically by the QueryContext when loading from the schema. The datasource provides metadata about available columns, metrics, and handles the actual query execution. |
| |
| **Example**: |
| ```python |
| # Automatically set during query context creation |
| query_object.datasource # <SqlaTable 'public.sales_data'> |
| ``` |
| |
| --- |
| |
| ### `columns` |
| |
| **Type**: `list[Column]` |
| |
| **Column Type**: `Union[AdhocColumn, str]` |
| |
| **Description**: List of dimensions (non-aggregated columns) to select in the query. These become GROUP BY columns if metrics are specified, or simple SELECT columns otherwise. |
| |
| **Default**: `[]` (empty list) |
| |
| **Formats Supported**: |
| 1. **String reference**: `"country"` - references a physical column |
| 2. **Adhoc column** (dict): |
| ```python |
| { |
| "label": "upper_name", |
| "sqlExpression": "UPPER(name)", |
| "hasCustomLabel": True |
| } |
| ``` |
| |
| **Usage Notes**: |
| - Formula annotations don't count as columns (filtered out) |
| - Used as the default for `series_columns` in timeseries queries |
| - Referenced in GROUP BY clauses when metrics are present |
| |
| **Related**: |
| - `column_names` property: Returns list of column labels as strings |
| - `series_columns`: Subset of columns used for series limiting |
| |
| **Example**: |
| ```python |
| query_object.columns = ["country", "state", "city"] |
| |
| # Or with adhoc columns: |
| query_object.columns = [ |
| "country", |
| { |
| "label": "year", |
| "sqlExpression": "EXTRACT(YEAR FROM created_at)", |
| "hasCustomLabel": True |
| } |
| ] |
| ``` |
| |
| --- |
| |
| ### `metrics` |
| |
| **Type**: `list[Metric] | None` |
| |
| **Metric Type**: `Union[AdhocMetric, str]` |
| |
| **Description**: List of aggregate expressions to compute. These become the SELECT clause aggregations in SQL queries. |
| |
| **Default**: `None` |
| |
| **Formats Supported**: |
| 1. **String reference**: `"count"` - references a predefined metric |
| 2. **Legacy format**: `{"label": "count"}` - references a predefined metric (converted to string) |
| 3. **Adhoc SIMPLE metric**: |
| ```python |
| { |
| "expressionType": "SIMPLE", |
| "aggregate": "SUM", |
| "column": {"column_name": "revenue"}, |
| "label": "Total Revenue", |
| "hasCustomLabel": True |
| } |
| ``` |
| 4. **Adhoc SQL metric**: |
| ```python |
| { |
| "expressionType": "SQL", |
| "sqlExpression": "SUM(price * quantity)", |
| "label": "Total Sales", |
| "hasCustomLabel": True |
| } |
| ``` |
| |
| **Usage Notes**: |
| - When metrics are specified, queries automatically include GROUP BY |
| - When `None` or empty, no aggregation is performed |
| - Legacy format `{"label": "..."}` is automatically converted to string |
| |
| **Related**: |
| - `metric_names` property: Returns list of metric labels as strings |
| - `is_rowcount`: Alternative to metrics for counting rows |
| |
| **Example**: |
| ```python |
| # Simple metric references |
| query_object.metrics = ["count", "sum__revenue"] |
| |
| # Adhoc metrics |
| query_object.metrics = [ |
| { |
| "expressionType": "SIMPLE", |
| "aggregate": "AVG", |
| "column": {"column_name": "price"}, |
| "label": "Average Price" |
| }, |
| { |
| "expressionType": "SQL", |
| "sqlExpression": "SUM(CASE WHEN status='completed' THEN 1 ELSE 0 END)", |
| "label": "Completed Orders" |
| } |
| ] |
| ``` |
| |
| --- |
| |
| ## Data Selection |
| |
| ### `is_timeseries` |
| |
| **Type**: `bool` |
| |
| **Description**: Indicates whether this query is a timeseries query (data points over time). Affects how results are processed and displayed. |
| |
| **Default**: Automatically determined - `True` if `DTTM_ALIAS` (special time column constant) is in `columns`, otherwise can be explicitly set |
| |
| **Usage Notes**: |
| - Influences `series_columns` initialization |
| - Affects post-processing behavior in some visualizations |
| - Used to determine if series limiting should apply |
| |
| **Example**: |
| ```python |
| # Automatically set to True for timeseries |
| query_object.columns = ["__timestamp", "country"] |
| query_object.is_timeseries # True |
| |
| # Explicitly set |
| query_object.is_timeseries = True |
| ``` |
| |
| --- |
| |
| ### `is_rowcount` |
| |
| **Type**: `bool` |
| |
| **Description**: When `True`, returns only the total row count instead of actual data. Used for preview operations or checking data volume. |
| |
| **Default**: `False` |
| |
| **Usage Notes**: |
| - Mutually exclusive with normal metric aggregation |
| - Result contains just a count, no actual rows |
| - Useful for pagination and data size checks |
| |
| **Example**: |
| ```python |
| query_object.is_rowcount = True |
| # Query returns: {"count": 15234} |
| ``` |
| |
| --- |
| |
| ### `result_type` |
| |
| **Type**: `ChartDataResultType | None` |
| |
| **Values**: `"full"`, `"samples"`, `"query"`, `"results"`, `"post_processed"`, `"columns"`, `"timegrains"`, `"drill_detail"` |
| |
| **Description**: Specifies what type of result to return. Controls which processing pipeline is used. |
| |
| **Default**: `None` (inherits from QueryContext if not specified) |
| |
| **Types**: |
| - `"full"`: Complete query execution with all data |
| - `"samples"`: Sample rows from the datasource (no metrics, limited rows) |
| - `"query"`: Return the query string without executing |
| - `"results"`: Like "full" but with minimal metadata |
| - `"post_processed"`: Full results for client-side post-processing |
| - `"columns"`: Return column metadata only |
| - `"timegrains"`: Return available time granularities |
| - `"drill_detail"`: Return drill-through detail rows |
| |
| **Example**: |
| ```python |
| query_object.result_type = ChartDataResultType.SAMPLES |
| # Returns sample rows without aggregation |
| ``` |
| |
| --- |
| |
| ## Filtering |
| |
| ### `filter` |
| |
| **Type**: `list[QueryObjectFilterClause]` |
| |
| **FilterClause Type**: |
| ```python |
| { |
| "col": Union[str, AdhocColumn], # Column to filter |
| "op": str, # Operator (e.g., "==", "IN", "LIKE") |
| "val": FilterValues | None, # Value(s) to compare |
| "grain": str | None, # Time grain for temporal filters |
| "isExtra": bool | None # Added by dashboard filters |
| } |
| ``` |
| |
| **Description**: List of filter conditions to apply to the query. These become WHERE clause conditions in SQL. |
| |
| **Default**: `[]` (empty list) |
| |
| **Supported Operators**: `==`, `!=`, `>`, `<`, `>=`, `<=`, `IN`, `NOT IN`, `LIKE`, `ILIKE`, `REGEX`, `TEMPORAL_RANGE`, `IS NULL`, `IS NOT NULL`, and more (see `FilterOperator` enum) |
| |
| **Usage Notes**: |
| - Filters are ANDed together |
| - `isExtra: True` indicates filter was added by dashboard/native filters |
| - Temporal filters (`TEMPORAL_RANGE`) require `grain` to be set |
| - Filters go through Jinja template processing if datasource supports it |
| - Sanitized for SQL injection before execution |
| |
| **Example**: |
| ```python |
| query_object.filter = [ |
| { |
| "col": "country", |
| "op": "IN", |
| "val": ["USA", "Canada", "Mexico"] |
| }, |
| { |
| "col": "revenue", |
| "op": ">=", |
| "val": 1000 |
| }, |
| { |
| "col": "created_at", |
| "op": "TEMPORAL_RANGE", |
| "val": "Last 30 days", |
| "grain": "P1D" |
| } |
| ] |
| ``` |
| |
| --- |
| |
| ### `extras` |
| |
| **Type**: `dict[str, Any]` |
| |
| **Description**: Additional query parameters and modifiers. This is an extensible dictionary for extra query options. |
| |
| **Default**: `{}` (empty dict) |
| |
| **Common Keys**: |
| - `"where"`: Raw SQL WHERE clause (added via AND) |
| - `"having"`: Raw SQL HAVING clause for aggregate filters |
| - `"time_grain_sqla"`: Time granularity (e.g., `"P1D"` for 1 day) |
| - `"relative_start"`: Start reference point (`"today"` or `"now"`) |
| - `"relative_end"`: End reference point (`"today"` or `"now"`) |
| - `"instant_time_comparison_range"`: For advanced time comparison features |
| |
| **Usage Notes**: |
| - WHERE and HAVING clauses support Jinja templates |
| - All SQL clauses are sanitized for security |
| - `time_grain_sqla` controls temporal aggregation level |
| - **Warning**: Direct SQL clauses (`where`, `having`) bypass some security layers |
| |
| **Example**: |
| ```python |
| query_object.extras = { |
| "where": "status = 'active' AND archived = false", |
| "having": "SUM(revenue) > 10000", |
| "time_grain_sqla": "P1W", # Weekly granularity |
| "relative_start": "now" |
| } |
| ``` |
| |
| --- |
| |
| ### `apply_fetch_values_predicate` |
| |
| **Type**: `bool` |
| |
| **Description**: When `True`, applies additional WHERE clause predicates defined in the datasource configuration for fetching filter values. |
| |
| **Default**: `False` |
| |
| **Usage Notes**: |
| - Used primarily when loading filter options |
| - Applies datasource-specific filter predicates |
| - Helps limit the domain of filter values |
| |
| **Example**: |
| ```python |
| query_object.apply_fetch_values_predicate = True |
| # Applies any predicates configured on the datasource |
| ``` |
| |
| --- |
| |
| ### `applied_time_extras` |
| |
| **Type**: `dict[str, str]` |
| |
| **Description**: Mapping of temporal extras that have been applied to the query. Used for tracking which time filters were actually used. |
| |
| **Default**: `{}` (empty dict) |
| |
| **Usage Notes**: |
| - Populated during query execution |
| - Used for displaying which time filters are active |
| - Keys are typically time column names |
| - Values are human-readable descriptions |
| |
| **Example**: |
| ```python |
| query_object.applied_time_extras = { |
| "__time_range": "1 year ago : now", |
| "__time_grain": "P1D" |
| } |
| ``` |
| |
| --- |
| |
| ## Aggregation & Metrics |
| |
| ### `groupby` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `columns` instead |
| |
| **Type**: N/A (automatically renamed to `columns`) |
| |
| **Description**: Legacy field name for grouping columns. Automatically converted to `columns` during initialization. |
| |
| **Migration**: Replace `groupby` with `columns` in all new code. |
| |
| --- |
| |
| ## Time/Temporal |
| |
| ### `granularity` |
| |
| **Type**: `str | None` |
| |
| **Description**: Name of the temporal column to use for time-based operations (filtering, grouping). This is the primary time dimension for the query. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Used for temporal filtering and aggregation |
| - Essential for timeseries queries |
| - References a datetime column in the datasource |
| - Used in time range filters |
| |
| **Related**: |
| - `extras["time_grain_sqla"]`: Controls temporal aggregation granularity |
| - `time_range`: The time range to filter by |
| - `granularity_sqla`: Deprecated alias for `granularity` |
| |
| **Example**: |
| ```python |
| query_object.granularity = "order_date" |
| # All time operations will use the order_date column |
| ``` |
| |
| --- |
| |
| ### `granularity_sqla` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `granularity` instead |
| |
| **Type**: N/A (automatically renamed to `granularity`) |
| |
| **Description**: Legacy SQL-specific field name for temporal column. Automatically converted during initialization. |
| |
| **Migration**: Replace `granularity_sqla` with `granularity` in all new code. |
| |
| --- |
| |
| ### `time_range` |
| |
| **Type**: `str | None` |
| |
| **Description**: Human-readable time range specification for filtering temporal data. Supports both relative and absolute formats. |
| |
| **Default**: `None` |
| |
| **Supported Formats**: |
| - **Relative**: `"Last 7 days"`, `"Last week"`, `"Last month"`, `"Last quarter"`, `"Last year"` |
| - **Relative with number**: `"Last 30 days"`, `"Last 6 months"`, `"Next 2 weeks"` |
| - **Absolute**: `"2023-01-01 : 2023-12-31"` (ISO 8601) |
| - **Mixed**: `"2023-01-01 : now"`, `"1 year ago : now"` |
| - **No filter**: `"No filter"` |
| |
| **Usage Notes**: |
| - Parsed into `from_dttm` and `to_dttm` datetime objects |
| - Relative times are resolved at query execution time |
| - Used for cache key generation (not the parsed datetime values) |
| - Supports parsedatetime syntax |
| |
| **Related**: |
| - `from_dttm`: Computed start datetime |
| - `to_dttm`: Computed end datetime |
| - `granularity`: Column to filter on |
| |
| **Example**: |
| ```python |
| query_object.time_range = "Last 30 days" |
| # Computed at runtime: from_dttm = now() - 30 days, to_dttm = now() |
| |
| query_object.time_range = "2023-01-01 : 2023-06-30" |
| # Explicit range |
| ``` |
| |
| --- |
| |
| ### `from_dttm` |
| |
| **Type**: `datetime | None` |
| |
| **Description**: Computed start datetime for the time range filter. Automatically calculated from `time_range`. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Set automatically during query processing |
| - Not included in cache key (time_range is used instead) |
| - May be overridden for time offset queries |
| |
| **Example**: |
| ```python |
| query_object.time_range = "Last 7 days" |
| # After processing: |
| query_object.from_dttm # datetime(2024, 1, 15, 0, 0, 0) |
| ``` |
| |
| --- |
| |
| ### `to_dttm` |
| |
| **Type**: `datetime | None` |
| |
| **Description**: Computed end datetime for the time range filter. Automatically calculated from `time_range`. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Set automatically during query processing |
| - Not included in cache key (time_range is used instead) |
| - May be overridden for time offset queries |
| |
| **Example**: |
| ```python |
| query_object.time_range = "Last 7 days" |
| # After processing: |
| query_object.to_dttm # datetime(2024, 1, 22, 23, 59, 59) |
| ``` |
| |
| --- |
| |
| ### `inner_from_dttm` |
| |
| **Type**: `datetime | None` |
| |
| **Description**: Inner time range start for nested temporal operations. Used when applying time filters to queries that don't have time as a dimension. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Used in advanced time comparison scenarios |
| - Typically same as `from_dttm` for simple queries |
| - May differ when time offsets are applied |
| |
| **Example**: |
| ```python |
| # Set during time offset processing |
| query_object.inner_from_dttm = datetime(2024, 1, 1) |
| ``` |
| |
| --- |
| |
| ### `inner_to_dttm` |
| |
| **Type**: `datetime | None` |
| |
| **Description**: Inner time range end for nested temporal operations. Used when applying time filters to queries that don't have time as a dimension. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Used in advanced time comparison scenarios |
| - Typically same as `to_dttm` for simple queries |
| - May differ when time offsets are applied |
| |
| **Example**: |
| ```python |
| # Set during time offset processing |
| query_object.inner_to_dttm = datetime(2024, 1, 31) |
| ``` |
| |
| --- |
| |
| ### `time_shift` |
| |
| **Type**: `str | None` |
| |
| **Description**: Shifts the entire time range by a specified offset. Used for comparing data across different time periods. |
| |
| **Default**: `None` |
| |
| **Supported Formats**: Any parsedatetime-compatible string (e.g., `"1 week ago"`, `"3 months ago"`, `"1 year ago"`) |
| |
| **Usage Notes**: |
| - Applied to both `from_dttm` and `to_dttm` |
| - Different from `time_offsets` (which creates separate queries) |
| - Affects the main query time range |
| |
| **Example**: |
| ```python |
| query_object.time_range = "Last 7 days" |
| query_object.time_shift = "1 week ago" |
| # Shifts the entire 7-day window back by 1 week |
| ``` |
| |
| --- |
| |
| ### `time_offsets` |
| |
| **Type**: `list[str]` |
| |
| **Description**: List of time offsets for creating comparison queries. Each offset generates an additional query with shifted time ranges, enabling time-over-time comparisons. |
| |
| **Default**: `[]` (empty list) |
| |
| **Supported Formats**: |
| 1. **Relative**: `"1 week ago"`, `"1 year ago"`, `"3 months ago"` |
| 2. **Date range** (with feature flag): `"2023-01-01 : 2023-01-31"` |
| 3. **Special**: `"inherit"` (uses the time range span) |
| 4. **Custom date**: `"2023-06-15"` (compares to this specific date) |
| |
| **Usage Notes**: |
| - Creates separate queries for each offset |
| - Results are joined with the main query results |
| - Offset metrics are renamed (e.g., `revenue` → `revenue__1 week ago`) |
| - Requires `DATE_RANGE_TIMESHIFTS_ENABLED` feature flag for date range format |
| - Used for year-over-year, month-over-month comparisons |
| |
| **Related**: |
| - Time grain is required for proper joining of offset results |
| - `series_limit` applies to the main query, not offset queries |
| |
| **Example**: |
| ```python |
| query_object.time_offsets = ["1 week ago", "1 year ago"] |
| # Generates 3 queries total: |
| # 1. Main query (this week) |
| # 2. Same query shifted back 1 week |
| # 3. Same query shifted back 1 year |
| # Results joined on time + other dimensions |
| |
| # Or with date range (requires feature flag): |
| query_object.time_offsets = ["2023-01-01 : 2023-01-31"] |
| # Compares current period to January 2023 |
| ``` |
| |
| --- |
| |
| ## Sorting & Limiting |
| |
| ### `orderby` |
| |
| **Type**: `list[OrderBy]` |
| |
| **OrderBy Type**: `tuple[Union[Metric, Column], bool]` |
| |
| **Description**: List of ordering specifications. Each tuple contains a column/metric and a boolean indicating ascending order. |
| |
| **Default**: `[]` (empty list) |
| |
| **Format**: `[(column_or_metric, is_ascending), ...]` |
| |
| **Usage Notes**: |
| - Boolean `True` = ascending order |
| - Boolean `False` = descending order |
| - Can order by both columns and metrics |
| - Applied after aggregation |
| |
| **Related**: |
| - `order_desc`: Default sort direction (deprecated in favor of explicit orderby) |
| |
| **Example**: |
| ```python |
| query_object.orderby = [ |
| ("revenue", False), # Order by revenue descending |
| ("country", True) # Then by country ascending |
| ] |
| ``` |
| |
| --- |
| |
| ### `order_desc` |
| |
| **Type**: `bool` |
| |
| **Description**: Default sort direction when orderby is not specified. Primarily affects how series are ordered. |
| |
| **Default**: `True` (descending) |
| |
| **Usage Notes**: |
| - Less flexible than `orderby` |
| - Prefer using explicit `orderby` for complex sorting |
| - Used mainly for backward compatibility |
| |
| **Example**: |
| ```python |
| query_object.order_desc = False |
| # Results sorted in ascending order |
| ``` |
| |
| --- |
| |
| ### `row_limit` |
| |
| **Type**: `int | None` |
| |
| **Description**: Maximum number of rows to return from the query. Acts as a SQL LIMIT clause. |
| |
| **Default**: `None` (uses system default) |
| |
| **Range**: `>= 0` (0 means no limit) |
| |
| **Usage Notes**: |
| - Applied after all filtering and aggregation |
| - Different from `series_limit` (which limits timeseries) |
| - System default from `config["ROW_LIMIT"]` |
| - Can be overridden per-query |
| |
| **Related**: |
| - `row_offset`: Works with row_limit for pagination |
| - `series_limit`: For limiting timeseries/series count |
| |
| **Example**: |
| ```python |
| query_object.row_limit = 1000 |
| # Returns at most 1000 rows |
| ``` |
| |
| --- |
| |
| ### `row_offset` |
| |
| **Type**: `int` |
| |
| **Description**: Number of rows to skip before returning results. Acts as a SQL OFFSET clause for pagination. |
| |
| **Default**: `0` (no offset) |
| |
| **Range**: `>= 0` |
| |
| **Usage Notes**: |
| - Used with `row_limit` for pagination |
| - Applied after ordering |
| - Useful for infinite scroll or paginated tables |
| |
| **Example**: |
| ```python |
| # Page 3 of results (20 per page) |
| query_object.row_limit = 20 |
| query_object.row_offset = 40 # Skip first 40 rows |
| ``` |
| |
| --- |
| |
| ## Series Limiting (Timeseries) |
| |
| ### `series_columns` |
| |
| **Type**: `list[Column]` |
| |
| **Description**: Subset of `columns` to use when limiting the number of series in timeseries queries. Defines which dimensions create distinct series. |
| |
| **Default**: Automatically initialized based on context: |
| - If explicitly provided, uses those columns |
| - If `is_timeseries=True` and `metrics` exist, uses all `columns` |
| - Otherwise, empty list |
| |
| **Usage Notes**: |
| - All series_columns must be present in `columns` |
| - Used with `series_limit` to control series count |
| - Validated during query validation |
| - Creates one series per unique combination of series_column values |
| |
| **Related**: |
| - `series_limit`: Maximum number of series |
| - `series_limit_metric`: Metric to use for ranking series |
| - `group_others_when_limit_reached`: Whether to group remaining series |
| |
| **Example**: |
| ```python |
| query_object.columns = ["country", "product", "date"] |
| query_object.series_columns = ["country", "product"] |
| # Creates one series per (country, product) combination |
| ``` |
| |
| --- |
| |
| ### `series_limit` |
| |
| **Type**: `int` |
| |
| **Description**: Maximum number of series to return in a timeseries query. Series are ranked by `series_limit_metric` and top N are kept. |
| |
| **Default**: `0` (no limit) |
| |
| **Usage Notes**: |
| - Only applies to timeseries queries |
| - Requires `series_columns` and `series_limit_metric` to be set |
| - Top series are selected by the specified metric |
| - See `group_others_when_limit_reached` for handling excluded series |
| |
| **Related**: |
| - `series_columns`: Dimensions that define series |
| - `series_limit_metric`: Metric used for ranking |
| - `timeseries_limit`: Deprecated alias |
| |
| **Example**: |
| ```python |
| query_object.series_limit = 10 |
| query_object.series_limit_metric = "revenue" |
| query_object.series_columns = ["country"] |
| # Returns top 10 countries by revenue |
| ``` |
| |
| --- |
| |
| ### `series_limit_metric` |
| |
| **Type**: `Metric | None` |
| |
| **Description**: The metric to use for ranking series when `series_limit` is applied. Determines which series are kept. |
| |
| **Default**: `None` |
| |
| **Usage Notes**: |
| - Required when `series_limit` is set |
| - Must be one of the metrics in the query |
| - Series are ranked in descending order by this metric |
| - Can be a string reference or adhoc metric |
| |
| **Related**: |
| - `series_limit`: Number of series to keep |
| - `timeseries_limit_metric`: Deprecated alias |
| |
| **Example**: |
| ```python |
| query_object.series_limit_metric = "sum__revenue" |
| # Ranks series by total revenue |
| ``` |
| |
| --- |
| |
| ### `group_others_when_limit_reached` |
| |
| **Type**: `bool` |
| |
| **Description**: When `True` and series limit is reached, groups all remaining series into an "Others" category. Prevents incomplete data visualization. |
| |
| **Default**: `False` |
| |
| **Usage Notes**: |
| - Only relevant when `series_limit` is set |
| - Aggregates metrics for excluded series |
| - Helps show complete totals while limiting series count |
| - "Others" category appears as a separate series |
| |
| **Example**: |
| ```python |
| query_object.series_limit = 5 |
| query_object.series_limit_metric = "count" |
| query_object.group_others_when_limit_reached = True |
| # Shows top 5 series + "Others" category with remaining aggregated |
| ``` |
| |
| --- |
| |
| ### `timeseries_limit` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `series_limit` instead |
| |
| **Type**: N/A (automatically renamed to `series_limit`) |
| |
| **Description**: Legacy field name for series limit. Automatically converted during initialization. |
| |
| **Migration**: Replace `timeseries_limit` with `series_limit` in all new code. |
| |
| --- |
| |
| ### `timeseries_limit_metric` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `series_limit_metric` instead |
| |
| **Type**: N/A (automatically renamed to `series_limit_metric`) |
| |
| **Description**: Legacy field name for series limit metric. Automatically converted during initialization. |
| |
| **Migration**: Replace `timeseries_limit_metric` with `series_limit_metric` in all new code. |
| |
| --- |
| |
| ## Post-Processing |
| |
| ### `post_processing` |
| |
| **Type**: `list[dict[str, Any]]` |
| |
| **Description**: Ordered list of post-processing operations to apply to the query results. These transformations run on the DataFrame after SQL execution. |
| |
| **Default**: `[]` (empty list) |
| |
| **Operation Format**: |
| ```python |
| { |
| "operation": str, # Operation name (from pandas_postprocessing module) |
| "options": dict # Operation-specific parameters |
| } |
| ``` |
| |
| **Available Operations**: |
| - `aggregate`: Group and aggregate data |
| - `pivot`: Pivot table transformation |
| - `rolling`: Rolling window calculations |
| - `sort`: Sort data |
| - `select`: Select/rename columns |
| - `contribution`: Calculate contribution percentages |
| - `prophet`: Time series forecasting |
| - `boxplot`: Statistical boxplot calculations |
| - `histogram`: Create histogram bins |
| - `geohash_decode`: Decode geohash to lat/lon |
| - `geohash_encode`: Encode lat/lon to geohash |
| - `geodetic_parse`: Parse geodetic coordinates |
| - And more... |
| |
| **Usage Notes**: |
| - Operations applied in sequence |
| - Each operation receives output of previous operation |
| - All operations are from `superset.utils.pandas_postprocessing` |
| - Validation ensures operation names are valid |
| - Useful for client-side transformations without re-querying |
| |
| **Example**: |
| ```python |
| query_object.post_processing = [ |
| { |
| "operation": "pivot", |
| "options": { |
| "index": ["country"], |
| "columns": ["product"], |
| "aggregates": { |
| "revenue": {"operator": "sum"} |
| } |
| } |
| }, |
| { |
| "operation": "sort", |
| "options": { |
| "columns": {"revenue": False} # Descending |
| } |
| } |
| ] |
| ``` |
| |
| **Histogram Example**: |
| ```python |
| query_object.post_processing = [ |
| { |
| "operation": "histogram", |
| "options": { |
| "column": "age", |
| "bins": 25 |
| } |
| } |
| ] |
| ``` |
| |
| --- |
| |
| ## Annotations |
| |
| ### `annotation_layers` |
| |
| **Type**: `list[dict[str, Any]]` |
| |
| **Description**: List of annotation layers to overlay on the chart. Annotations add contextual information like events, ranges, or reference data. |
| |
| **Default**: `[]` (empty list, formula annotations filtered out) |
| |
| **Layer Structure**: |
| ```python |
| { |
| "annotationType": str, # "FORMULA", "NATIVE", "line", "table" |
| "name": str, # Layer name |
| "value": Any, # Layer-specific value (ID, formula, etc.) |
| "show": bool, # Whether to show the layer |
| "sourceType": str, # "NATIVE", "line", "table", etc. |
| "color": str, # Layer color |
| "opacity": str, # "opacityLow", "opacityMedium", "opacityHigh" |
| "style": str, # "solid", "dashed", "dotted", "longDashed" |
| "width": float, # Line width |
| "showMarkers": bool, # Show markers on line annotations |
| "showLabel": bool, # Always show label |
| "hideLine": bool, # Hide line (show markers only) |
| "timeColumn": str, # Column with timestamps |
| "intervalEndColumn": str, # For interval annotations |
| "titleColumn": str, # Column for titles |
| "descriptionColumns": list, # Columns for descriptions |
| "overrides": dict # Override query properties |
| } |
| ``` |
| |
| **Annotation Types**: |
| - `"FORMULA"`: Simple formula overlays (e.g., constant line) - **filtered out, don't affect query** |
| - `"NATIVE"`: Native Superset annotations stored in DB |
| - `"line"` / `"table"`: Annotations from other charts/queries |
| |
| **Usage Notes**: |
| - Formula annotations are filtered out (don't affect payload) |
| - Other annotations trigger additional data fetches |
| - Annotations can override time range, granularity, etc. |
| - Used primarily for time series visualizations |
| |
| **Example**: |
| ```python |
| query_object.annotation_layers = [ |
| { |
| "annotationType": "NATIVE", |
| "name": "Important Events", |
| "value": 1, # annotation_layer_id |
| "show": True, |
| "sourceType": "NATIVE", |
| "color": "#ff0000" |
| }, |
| { |
| "annotationType": "line", |
| "name": "Baseline", |
| "value": 42, # chart_id |
| "show": True, |
| "sourceType": "line", |
| "style": "dashed", |
| "overrides": { |
| "time_range": "Last year" |
| } |
| } |
| ] |
| ``` |
| |
| --- |
| |
| ## Query Execution Control |
| |
| ### `url_params` |
| |
| **Type**: `dict[str, str]` (keys and values are strings) |
| |
| **Description**: Optional query parameters passed from dashboard or Explore view URLs. Used for dynamic filtering and Jinja template variables. |
| |
| **Default**: Not set (from kwargs) |
| |
| **Usage Notes**: |
| - Available in Jinja templates via `url_param` function |
| - Typically set by dashboard filters or URL parameters |
| - Keys and values are always strings |
| - Can be used for dynamic query customization |
| |
| **Example**: |
| ```python |
| query_object.url_params = { |
| "country_filter": "USA", |
| "min_date": "2024-01-01" |
| } |
| |
| # In Jinja template: |
| # WHERE country = '{{ url_param("country_filter") }}' |
| ``` |
| |
| --- |
| |
| ## Deprecated Fields |
| |
| ### `where` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `extras["where"]` instead |
| |
| **Type**: Field moved to extras during initialization |
| |
| **Description**: Raw SQL WHERE clause. Automatically moved to `extras["where"]`. |
| |
| **Migration**: Use `extras["where"]` directly instead of passing `where` parameter. |
| |
| --- |
| |
| ### `having` (DEPRECATED) |
| |
| **Status**: ⚠️ **DEPRECATED** - Use `extras["having"]` instead |
| |
| **Type**: Field moved to extras during initialization |
| |
| **Description**: Raw SQL HAVING clause. Automatically moved to `extras["having"]`. |
| |
| **Migration**: Use `extras["having"]` directly instead of passing `having` parameter. |
| |
| --- |
| |
| ## Properties |
| |
| ### `metric_names` |
| |
| **Type**: `list[str]` (read-only property) |
| |
| **Description**: Returns the metric labels as strings. Converts adhoc metrics to their labels. |
| |
| **Usage**: Accessing metric names without dealing with adhoc metric dictionaries. |
| |
| **Example**: |
| ```python |
| query_object.metrics = ["count", {"expressionType": "SQL", "label": "revenue", ...}] |
| query_object.metric_names # ["count", "revenue"] |
| ``` |
| |
| --- |
| |
| ### `column_names` |
| |
| **Type**: `list[str]` (read-only property) |
| |
| **Description**: Returns the column labels as strings. Converts adhoc columns to their labels. |
| |
| **Usage**: Accessing column names without dealing with adhoc column dictionaries. |
| |
| **Example**: |
| ```python |
| query_object.columns = ["country", {"label": "year", "sqlExpression": "...", ...}] |
| query_object.column_names # ["country", "year"] |
| ``` |
| |
| --- |
| |
| ## Methods |
| |
| ### `validate(raise_exceptions: bool = True) -> QueryObjectValidationError | None` |
| |
| **Description**: Validates the query object for correctness. Checks for duplicate labels, missing series columns, invalid time offsets, and sanitizes filters. |
| |
| **Parameters**: |
| - `raise_exceptions`: If `True`, raises exception on validation error. If `False`, returns the error object. |
| |
| **Validates**: |
| 1. No missing series columns |
| 2. No duplicate column/metric labels |
| 3. Valid time offset configurations |
| 4. Sanitizes WHERE/HAVING clauses for SQL injection |
| |
| **Returns**: `None` if valid, or `QueryObjectValidationError` if `raise_exceptions=False` |
| |
| **Example**: |
| ```python |
| try: |
| query_object.validate() |
| except QueryObjectValidationError as e: |
| print(f"Validation failed: {e.message}") |
| ``` |
| |
| --- |
| |
| ### `to_dict() -> dict[str, Any]` |
| |
| **Description**: Serializes the QueryObject to a dictionary. Used for passing to datasource connectors and caching. |
| |
| **Returns**: Dictionary representation of the query object |
| |
| **Usage**: Called internally when executing queries on datasources. |
| |
| **Example**: |
| ```python |
| query_dict = query_object.to_dict() |
| # { |
| # "columns": ["country"], |
| # "metrics": ["count"], |
| # "filters": [...], |
| # ... |
| # } |
| ``` |
| |
| --- |
| |
| ### `cache_key(**extra: Any) -> str` |
| |
| **Description**: Generates a unique cache key for this query. Uses MD5/SHA hash of query parameters. |
| |
| **Parameters**: |
| - `**extra`: Additional key-value pairs to include in cache key |
| |
| **Key Components**: |
| - All query parameters from `to_dict()` |
| - `time_range` (not `from_dttm`/`to_dttm` - for relative time caching) |
| - `datasource.uid` |
| - `result_type` |
| - `post_processing` |
| - `time_offsets` |
| - `annotation_layers` (if present) |
| - User impersonation key (if enabled) |
| |
| **Returns**: MD5/SHA hash string |
| |
| **Example**: |
| ```python |
| cache_key = query_object.cache_key( |
| time_offset="1 week ago", |
| time_grain="P1D" |
| ) |
| # "a3f5c8e9d..." |
| ``` |
| |
| --- |
| |
| ### `exec_post_processing(df: DataFrame) -> DataFrame` |
| |
| **Description**: Applies all post-processing operations to a DataFrame in sequence. |
| |
| **Parameters**: |
| - `df`: Pandas DataFrame from query execution |
| |
| **Returns**: Transformed DataFrame |
| |
| **Raises**: `InvalidPostProcessingError` if operation is invalid |
| |
| **Example**: |
| ```python |
| df = datasource.query(query_object.to_dict()).df |
| processed_df = query_object.exec_post_processing(df) |
| ``` |
| |
| --- |
| |
| ## Type Definitions |
| |
| ### `Column` |
| |
| **Type**: `Union[AdhocColumn, str]` |
| |
| **AdhocColumn Structure**: |
| ```python |
| { |
| "label": str, # Column label |
| "sqlExpression": str, # SQL expression |
| "hasCustomLabel": bool, # Whether label is custom |
| "columnType": str, # "BASE_AXIS" or "SERIES" |
| "timeGrain": str # Time grain if temporal |
| } |
| ``` |
| |
| --- |
| |
| ### `Metric` |
| |
| **Type**: `Union[AdhocMetric, str]` |
| |
| **AdhocMetric Structure**: |
| ```python |
| { |
| "expressionType": "SIMPLE" | "SQL", |
| "label": str, |
| "hasCustomLabel": bool, |
| |
| # For SIMPLE: |
| "aggregate": "SUM" | "AVG" | "COUNT" | "MIN" | "MAX" | "COUNT_DISTINCT", |
| "column": AdhocMetricColumn, |
| |
| # For SQL: |
| "sqlExpression": str |
| } |
| ``` |
| |
| --- |
| |
| ### `OrderBy` |
| |
| **Type**: `tuple[Union[Metric, Column], bool]` |
| |
| **Format**: `(column_or_metric, is_ascending)` |
| |
| **Example**: `("revenue", False)` means "ORDER BY revenue DESC" |
| |
| --- |
| |
| ## Common Usage Patterns |
| |
| ### Basic Query |
| |
| ```python |
| query_object = QueryObject( |
| columns=["country", "city"], |
| metrics=["count"], |
| row_limit=100 |
| ) |
| ``` |
| |
| ### Timeseries Query |
| |
| ```python |
| query_object = QueryObject( |
| columns=["__timestamp", "country"], |
| metrics=["sum__revenue"], |
| granularity="order_date", |
| time_range="Last 30 days", |
| extras={"time_grain_sqla": "P1D"}, |
| is_timeseries=True |
| ) |
| ``` |
| |
| ### Time Comparison Query |
| |
| ```python |
| query_object = QueryObject( |
| columns=["country"], |
| metrics=["sum__revenue"], |
| time_range="Last 7 days", |
| time_offsets=["1 week ago", "1 year ago"], |
| granularity="order_date" |
| ) |
| ``` |
| |
| ### Post-Processed Query |
| |
| ```python |
| query_object = QueryObject( |
| columns=["age"], |
| row_limit=10000, |
| post_processing=[ |
| { |
| "operation": "histogram", |
| "options": {"column": "age", "bins": 25} |
| } |
| ] |
| ) |
| ``` |
| |
| ### Filtered and Sorted Query |
| |
| ```python |
| query_object = QueryObject( |
| columns=["country", "product"], |
| metrics=["sum__revenue", "count"], |
| filter=[ |
| {"col": "status", "op": "==", "val": "completed"}, |
| {"col": "revenue", "op": ">=", "val": 100} |
| ], |
| orderby=[("sum__revenue", False)], # Descending |
| row_limit=50 |
| ) |
| ``` |