# QueryObject Documentation
**File**: `superset/common/query_object.py`
The `QueryObject` class represents a single database query specification constructed on the client. It describes *what* data to fetch (columns, metrics, filters) without specifying *how* to fetch it. The datasource connector is responsible for translating the QueryObject into the appropriate query language (SQL, GraphQL, etc.).
## Table of Contents
- [Core Query Specification](#core-query-specification)
- [Data Selection](#data-selection)
- [Filtering](#filtering)
- [Aggregation & Metrics](#aggregation--metrics)
- [Time/Temporal](#timetemporal)
- [Sorting & Limiting](#sorting--limiting)
- [Series Limiting (Timeseries)](#series-limiting-timeseries)
- [Post-Processing](#post-processing)
- [Annotations](#annotations)
- [Query Execution Control](#query-execution-control)
- [Deprecated Fields](#deprecated-fields)
- [Properties](#properties)
- [Methods](#methods)
- [Type Definitions](#type-definitions)
- [Common Usage Patterns](#common-usage-patterns)
---
## Core Query Specification
### `datasource`
**Type**: `BaseDatasource | None`
**Description**: Reference to the datasource (dataset, table, or query) from which to fetch data. This is the data source object itself, not just an identifier.
**Default**: `None`
**Usage**: Set automatically by the QueryContext when loading from the schema. The datasource provides metadata about available columns, metrics, and handles the actual query execution.
**Example**:
```python
# Automatically set during query context creation
query_object.datasource # <SqlaTable 'public.sales_data'>
```
---
### `columns`
**Type**: `list[Column]`
**Column Type**: `Union[AdhocColumn, str]`
**Description**: List of dimensions (non-aggregated columns) to select in the query. These become GROUP BY columns if metrics are specified, or simple SELECT columns otherwise.
**Default**: `[]` (empty list)
**Formats Supported**:
1. **String reference**: `"country"` - references a physical column
2. **Adhoc column** (dict):
```python
{
"label": "upper_name",
"sqlExpression": "UPPER(name)",
"hasCustomLabel": True
}
```
**Usage Notes**:
- Formula annotations don't count as columns (filtered out)
- Used as the default for `series_columns` in timeseries queries
- Referenced in GROUP BY clauses when metrics are present
**Related**:
- `column_names` property: Returns list of column labels as strings
- `series_columns`: Subset of columns used for series limiting
**Example**:
```python
query_object.columns = ["country", "state", "city"]
# Or with adhoc columns:
query_object.columns = [
"country",
{
"label": "year",
"sqlExpression": "EXTRACT(YEAR FROM created_at)",
"hasCustomLabel": True
}
]
```
---
### `metrics`
**Type**: `list[Metric] | None`
**Metric Type**: `Union[AdhocMetric, str]`
**Description**: List of aggregate expressions to compute. These become the SELECT clause aggregations in SQL queries.
**Default**: `None`
**Formats Supported**:
1. **String reference**: `"count"` - references a predefined metric
2. **Legacy format**: `{"label": "count"}` - references a predefined metric (converted to string)
3. **Adhoc SIMPLE metric**:
```python
{
"expressionType": "SIMPLE",
"aggregate": "SUM",
"column": {"column_name": "revenue"},
"label": "Total Revenue",
"hasCustomLabel": True
}
```
4. **Adhoc SQL metric**:
```python
{
"expressionType": "SQL",
"sqlExpression": "SUM(price * quantity)",
"label": "Total Sales",
"hasCustomLabel": True
}
```
**Usage Notes**:
- When metrics are specified, queries automatically include GROUP BY
- When `None` or empty, no aggregation is performed
- Legacy format `{"label": "..."}` is automatically converted to string
**Related**:
- `metric_names` property: Returns list of metric labels as strings
- `is_rowcount`: Alternative to metrics for counting rows
**Example**:
```python
# Simple metric references
query_object.metrics = ["count", "sum__revenue"]
# Adhoc metrics
query_object.metrics = [
{
"expressionType": "SIMPLE",
"aggregate": "AVG",
"column": {"column_name": "price"},
"label": "Average Price"
},
{
"expressionType": "SQL",
"sqlExpression": "SUM(CASE WHEN status='completed' THEN 1 ELSE 0 END)",
"label": "Completed Orders"
}
]
```
---
## Data Selection
### `is_timeseries`
**Type**: `bool`
**Description**: Indicates whether this query is a timeseries query (data points over time). Affects how results are processed and displayed.
**Default**: Inferred automatically when not explicitly set: `True` if `DTTM_ALIAS` (the special time-column constant) is present in `columns`, otherwise `False`
**Usage Notes**:
- Influences `series_columns` initialization
- Affects post-processing behavior in some visualizations
- Used to determine if series limiting should apply
**Example**:
```python
# Automatically set to True for timeseries
query_object.columns = ["__timestamp", "country"]
query_object.is_timeseries # True
# Explicitly set
query_object.is_timeseries = True
```
---
### `is_rowcount`
**Type**: `bool`
**Description**: When `True`, returns only the total row count instead of actual data. Used for preview operations or checking data volume.
**Default**: `False`
**Usage Notes**:
- Mutually exclusive with normal metric aggregation
- Result contains just a count, no actual rows
- Useful for pagination and data size checks
**Example**:
```python
query_object.is_rowcount = True
# Query returns: {"count": 15234}
```
---
### `result_type`
**Type**: `ChartDataResultType | None`
**Values**: `"full"`, `"samples"`, `"query"`, `"results"`, `"post_processed"`, `"columns"`, `"timegrains"`, `"drill_detail"`
**Description**: Specifies what type of result to return. Controls which processing pipeline is used.
**Default**: `None` (inherits from QueryContext if not specified)
**Types**:
- `"full"`: Complete query execution with all data
- `"samples"`: Sample rows from the datasource (no metrics, limited rows)
- `"query"`: Return the query string without executing
- `"results"`: Like "full" but with minimal metadata
- `"post_processed"`: Full results for client-side post-processing
- `"columns"`: Return column metadata only
- `"timegrains"`: Return available time granularities
- `"drill_detail"`: Return drill-through detail rows
**Example**:
```python
query_object.result_type = ChartDataResultType.SAMPLES
# Returns sample rows without aggregation
```
---
## Filtering
### `filter`
**Type**: `list[QueryObjectFilterClause]`
**FilterClause Type**:
```python
{
"col": Union[str, AdhocColumn], # Column to filter
"op": str, # Operator (e.g., "==", "IN", "LIKE")
"val": FilterValues | None, # Value(s) to compare
"grain": str | None, # Time grain for temporal filters
"isExtra": bool | None # Added by dashboard filters
}
```
**Description**: List of filter conditions to apply to the query. These become WHERE clause conditions in SQL.
**Default**: `[]` (empty list)
**Supported Operators**: `==`, `!=`, `>`, `<`, `>=`, `<=`, `IN`, `NOT IN`, `LIKE`, `ILIKE`, `REGEX`, `TEMPORAL_RANGE`, `IS NULL`, `IS NOT NULL`, and more (see `FilterOperator` enum)
**Usage Notes**:
- Filters are ANDed together
- `isExtra: True` indicates filter was added by dashboard/native filters
- Temporal filters (`TEMPORAL_RANGE`) require `grain` to be set
- Filters go through Jinja template processing if datasource supports it
- Sanitized for SQL injection before execution
**Example**:
```python
query_object.filter = [
{
"col": "country",
"op": "IN",
"val": ["USA", "Canada", "Mexico"]
},
{
"col": "revenue",
"op": ">=",
"val": 1000
},
{
"col": "created_at",
"op": "TEMPORAL_RANGE",
"val": "Last 30 days",
"grain": "P1D"
}
]
```
---
### `extras`
**Type**: `dict[str, Any]`
**Description**: Additional query parameters and modifiers. This is an extensible dictionary for extra query options.
**Default**: `{}` (empty dict)
**Common Keys**:
- `"where"`: Raw SQL WHERE clause (added via AND)
- `"having"`: Raw SQL HAVING clause for aggregate filters
- `"time_grain_sqla"`: Time granularity (e.g., `"P1D"` for 1 day)
- `"relative_start"`: Start reference point (`"today"` or `"now"`)
- `"relative_end"`: End reference point (`"today"` or `"now"`)
- `"instant_time_comparison_range"`: For advanced time comparison features
**Usage Notes**:
- WHERE and HAVING clauses support Jinja templates
- All SQL clauses are sanitized for security
- `time_grain_sqla` controls temporal aggregation level
- **Warning**: Direct SQL clauses (`where`, `having`) bypass some security layers
**Example**:
```python
query_object.extras = {
"where": "status = 'active' AND archived = false",
"having": "SUM(revenue) > 10000",
"time_grain_sqla": "P1W", # Weekly granularity
"relative_start": "now"
}
```
---
### `apply_fetch_values_predicate`
**Type**: `bool`
**Description**: When `True`, applies additional WHERE clause predicates defined in the datasource configuration for fetching filter values.
**Default**: `False`
**Usage Notes**:
- Used primarily when loading filter options
- Applies datasource-specific filter predicates
- Helps limit the domain of filter values
**Example**:
```python
query_object.apply_fetch_values_predicate = True
# Applies any predicates configured on the datasource
```
---
### `applied_time_extras`
**Type**: `dict[str, str]`
**Description**: Mapping of temporal extras that have been applied to the query, used to report which time filters actually took effect.
**Default**: `{}` (empty dict)
**Usage Notes**:
- Populated during query execution
- Used for displaying which time filters are active
- Keys are typically time column names
- Values are human-readable descriptions
**Example**:
```python
query_object.applied_time_extras = {
"__time_range": "1 year ago : now",
"__time_grain": "P1D"
}
```
---
## Aggregation & Metrics
### `groupby` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `columns` instead
**Type**: N/A (automatically renamed to `columns`)
**Description**: Legacy field name for grouping columns. Automatically converted to `columns` during initialization.
**Migration**: Replace `groupby` with `columns` in all new code.
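A minimal sketch of the rename, assuming a Superset environment where `QueryObject` can be constructed directly (the rename is performed by the class during initialization, per the description above):
```python
from superset.common.query_object import QueryObject

# Deprecated spelling: `groupby` is renamed to `columns` on construction.
legacy = QueryObject(groupby=["country"], metrics=["count"])

# Preferred spelling.
current = QueryObject(columns=["country"], metrics=["count"])

assert legacy.columns == current.columns == ["country"]
```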
---
## Time/Temporal
### `granularity`
**Type**: `str | None`
**Description**: Name of the temporal column to use for time-based operations (filtering, grouping). This is the primary time dimension for the query.
**Default**: `None`
**Usage Notes**:
- Used for temporal filtering and aggregation
- Essential for timeseries queries
- References a datetime column in the datasource
- Used in time range filters
**Related**:
- `extras["time_grain_sqla"]`: Controls temporal aggregation granularity
- `time_range`: The time range to filter by
- `granularity_sqla`: Deprecated alias for `granularity`
**Example**:
```python
query_object.granularity = "order_date"
# All time operations will use the order_date column
```
---
### `granularity_sqla` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `granularity` instead
**Type**: N/A (automatically renamed to `granularity`)
**Description**: Legacy SQL-specific field name for temporal column. Automatically converted during initialization.
**Migration**: Replace `granularity_sqla` with `granularity` in all new code.
---
### `time_range`
**Type**: `str | None`
**Description**: Human-readable time range specification for filtering temporal data. Supports both relative and absolute formats.
**Default**: `None`
**Supported Formats**:
- **Relative**: `"Last 7 days"`, `"Last week"`, `"Last month"`, `"Last quarter"`, `"Last year"`
- **Relative with number**: `"Last 30 days"`, `"Last 6 months"`, `"Next 2 weeks"`
- **Absolute**: `"2023-01-01 : 2023-12-31"` (ISO 8601)
- **Mixed**: `"2023-01-01 : now"`, `"1 year ago : now"`
- **No filter**: `"No filter"`
**Usage Notes**:
- Parsed into `from_dttm` and `to_dttm` datetime objects
- Relative times are resolved at query execution time
- Used for cache key generation (not the parsed datetime values)
- Supports parsedatetime syntax
**Related**:
- `from_dttm`: Computed start datetime
- `to_dttm`: Computed end datetime
- `granularity`: Column to filter on
**Example**:
```python
query_object.time_range = "Last 30 days"
# Computed at runtime: from_dttm = now() - 30 days, to_dttm = now()
query_object.time_range = "2023-01-01 : 2023-06-30"
# Explicit range
```
---
### `from_dttm`
**Type**: `datetime | None`
**Description**: Computed start datetime for the time range filter. Automatically calculated from `time_range`.
**Default**: `None`
**Usage Notes**:
- Set automatically during query processing
- Not included in cache key (time_range is used instead)
- May be overridden for time offset queries
**Example**:
```python
query_object.time_range = "Last 7 days"
# After processing:
query_object.from_dttm # datetime(2024, 1, 15, 0, 0, 0)
```
---
### `to_dttm`
**Type**: `datetime | None`
**Description**: Computed end datetime for the time range filter. Automatically calculated from `time_range`.
**Default**: `None`
**Usage Notes**:
- Set automatically during query processing
- Not included in cache key (time_range is used instead)
- May be overridden for time offset queries
**Example**:
```python
query_object.time_range = "Last 7 days"
# After processing:
query_object.to_dttm # datetime(2024, 1, 22, 23, 59, 59)
```
---
### `inner_from_dttm`
**Type**: `datetime | None`
**Description**: Inner time range start for nested temporal operations. Used when applying time filters to queries that don't have time as a dimension.
**Default**: `None`
**Usage Notes**:
- Used in advanced time comparison scenarios
- Typically same as `from_dttm` for simple queries
- May differ when time offsets are applied
**Example**:
```python
# Set during time offset processing
query_object.inner_from_dttm = datetime(2024, 1, 1)
```
---
### `inner_to_dttm`
**Type**: `datetime | None`
**Description**: Inner time range end for nested temporal operations. Used when applying time filters to queries that don't have time as a dimension.
**Default**: `None`
**Usage Notes**:
- Used in advanced time comparison scenarios
- Typically same as `to_dttm` for simple queries
- May differ when time offsets are applied
**Example**:
```python
# Set during time offset processing
query_object.inner_to_dttm = datetime(2024, 1, 31)
```
---
### `time_shift`
**Type**: `str | None`
**Description**: Shifts the entire time range by a specified offset. Used for comparing data across different time periods.
**Default**: `None`
**Supported Formats**: Any parsedatetime-compatible string (e.g., `"1 week ago"`, `"3 months ago"`, `"1 year ago"`)
**Usage Notes**:
- Applied to both `from_dttm` and `to_dttm`
- Different from `time_offsets` (which creates separate queries)
- Affects the main query time range
**Example**:
```python
query_object.time_range = "Last 7 days"
query_object.time_shift = "1 week ago"
# Shifts the entire 7-day window back by 1 week
```
---
### `time_offsets`
**Type**: `list[str]`
**Description**: List of time offsets for creating comparison queries. Each offset generates an additional query with shifted time ranges, enabling time-over-time comparisons.
**Default**: `[]` (empty list)
**Supported Formats**:
1. **Relative**: `"1 week ago"`, `"1 year ago"`, `"3 months ago"`
2. **Date range** (with feature flag): `"2023-01-01 : 2023-01-31"`
3. **Special**: `"inherit"` (uses the time range span)
4. **Custom date**: `"2023-06-15"` (compares to this specific date)
**Usage Notes**:
- Creates separate queries for each offset
- Results are joined with the main query results
- Offset metrics are renamed (e.g., `revenue` → `revenue__1 week ago`)
- Requires `DATE_RANGE_TIMESHIFTS_ENABLED` feature flag for date range format
- Used for year-over-year, month-over-month comparisons
**Related**:
- Time grain is required for proper joining of offset results
- `series_limit` applies to the main query, not offset queries
**Example**:
```python
query_object.time_offsets = ["1 week ago", "1 year ago"]
# Generates 3 queries total:
# 1. Main query (this week)
# 2. Same query shifted back 1 week
# 3. Same query shifted back 1 year
# Results joined on time + other dimensions
# Or with date range (requires feature flag):
query_object.time_offsets = ["2023-01-01 : 2023-01-31"]
# Compares current period to January 2023
```
---
## Sorting & Limiting
### `orderby`
**Type**: `list[OrderBy]`
**OrderBy Type**: `tuple[Union[Metric, Column], bool]`
**Description**: List of ordering specifications. Each tuple contains a column/metric and a boolean indicating ascending order.
**Default**: `[]` (empty list)
**Format**: `[(column_or_metric, is_ascending), ...]`
**Usage Notes**:
- Boolean `True` = ascending order
- Boolean `False` = descending order
- Can order by both columns and metrics
- Applied after aggregation
**Related**:
- `order_desc`: Default sort direction (prefer explicit `orderby` in new code)
**Example**:
```python
query_object.orderby = [
("revenue", False), # Order by revenue descending
("country", True) # Then by country ascending
]
```
---
### `order_desc`
**Type**: `bool`
**Description**: Default sort direction when orderby is not specified. Primarily affects how series are ordered.
**Default**: `True` (descending)
**Usage Notes**:
- Less flexible than `orderby`
- Prefer using explicit `orderby` for complex sorting
- Used mainly for backward compatibility
**Example**:
```python
query_object.order_desc = False
# Results sorted in ascending order
```
---
### `row_limit`
**Type**: `int | None`
**Description**: Maximum number of rows to return from the query. Acts as a SQL LIMIT clause.
**Default**: `None` (uses system default)
**Range**: `>= 0` (0 means no limit)
**Usage Notes**:
- Applied after all filtering and aggregation
- Different from `series_limit` (which limits timeseries)
- System default from `config["ROW_LIMIT"]`
- Can be overridden per-query
**Related**:
- `row_offset`: Works with row_limit for pagination
- `series_limit`: For limiting timeseries/series count
**Example**:
```python
query_object.row_limit = 1000
# Returns at most 1000 rows
```
---
### `row_offset`
**Type**: `int`
**Description**: Number of rows to skip before returning results. Acts as a SQL OFFSET clause for pagination.
**Default**: `0` (no offset)
**Range**: `>= 0`
**Usage Notes**:
- Used with `row_limit` for pagination
- Applied after ordering
- Useful for infinite scroll or paginated tables
**Example**:
```python
# Page 3 of results (20 per page)
query_object.row_limit = 20
query_object.row_offset = 40 # Skip first 40 rows
```
---
## Series Limiting (Timeseries)
### `series_columns`
**Type**: `list[Column]`
**Description**: Subset of `columns` to use when limiting the number of series in timeseries queries. Defines which dimensions create distinct series.
**Default**: Automatically initialized based on context:
- If explicitly provided, uses those columns
- If `is_timeseries=True` and `metrics` exist, uses all `columns`
- Otherwise, empty list
**Usage Notes**:
- All series_columns must be present in `columns`
- Used with `series_limit` to control series count
- Validated during query validation
- Creates one series per unique combination of series_column values
**Related**:
- `series_limit`: Maximum number of series
- `series_limit_metric`: Metric to use for ranking series
- `group_others_when_limit_reached`: Whether to group remaining series
**Example**:
```python
query_object.columns = ["country", "product", "date"]
query_object.series_columns = ["country", "product"]
# Creates one series per (country, product) combination
```
---
### `series_limit`
**Type**: `int`
**Description**: Maximum number of series to return in a timeseries query. Series are ranked by `series_limit_metric` and top N are kept.
**Default**: `0` (no limit)
**Usage Notes**:
- Only applies to timeseries queries
- Requires `series_columns` and `series_limit_metric` to be set
- Top series are selected by the specified metric
- See `group_others_when_limit_reached` for handling excluded series
**Related**:
- `series_columns`: Dimensions that define series
- `series_limit_metric`: Metric used for ranking
- `timeseries_limit`: Deprecated alias
**Example**:
```python
query_object.series_limit = 10
query_object.series_limit_metric = "revenue"
query_object.series_columns = ["country"]
# Returns top 10 countries by revenue
```
---
### `series_limit_metric`
**Type**: `Metric | None`
**Description**: The metric to use for ranking series when `series_limit` is applied. Determines which series are kept.
**Default**: `None`
**Usage Notes**:
- Required when `series_limit` is set
- Must be one of the metrics in the query
- Series are ranked in descending order by this metric
- Can be a string reference or adhoc metric
**Related**:
- `series_limit`: Number of series to keep
- `timeseries_limit_metric`: Deprecated alias
**Example**:
```python
query_object.series_limit_metric = "sum__revenue"
# Ranks series by total revenue
```
---
### `group_others_when_limit_reached`
**Type**: `bool`
**Description**: When `True` and the series limit is reached, groups all remaining series into an "Others" category instead of dropping them, so the visualization still reflects complete totals.
**Default**: `False`
**Usage Notes**:
- Only relevant when `series_limit` is set
- Aggregates metrics for excluded series
- Helps show complete totals while limiting series count
- "Others" category appears as a separate series
**Example**:
```python
query_object.series_limit = 5
query_object.series_limit_metric = "count"
query_object.group_others_when_limit_reached = True
# Shows top 5 series + "Others" category with remaining aggregated
```
---
### `timeseries_limit` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `series_limit` instead
**Type**: N/A (automatically renamed to `series_limit`)
**Description**: Legacy field name for series limit. Automatically converted during initialization.
**Migration**: Replace `timeseries_limit` with `series_limit` in all new code.
---
### `timeseries_limit_metric` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `series_limit_metric` instead
**Type**: N/A (automatically renamed to `series_limit_metric`)
**Description**: Legacy field name for series limit metric. Automatically converted during initialization.
**Migration**: Replace `timeseries_limit_metric` with `series_limit_metric` in all new code.
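As with `groupby`, both legacy series-limit fields are renamed during initialization; a minimal sketch under the same assumption that `QueryObject` is constructed directly:
```python
from superset.common.query_object import QueryObject

# Deprecated spellings are renamed on construction.
legacy = QueryObject(
    columns=["country"],
    metrics=["sum__revenue"],
    timeseries_limit=10,                     # becomes series_limit
    timeseries_limit_metric="sum__revenue",  # becomes series_limit_metric
)

assert legacy.series_limit == 10
assert legacy.series_limit_metric == "sum__revenue"
```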
---
## Post-Processing
### `post_processing`
**Type**: `list[dict[str, Any]]`
**Description**: Ordered list of post-processing operations to apply to the query results. These transformations run on the DataFrame after SQL execution.
**Default**: `[]` (empty list)
**Operation Format**:
```python
{
"operation": str, # Operation name (from pandas_postprocessing module)
"options": dict # Operation-specific parameters
}
```
**Available Operations**:
- `aggregate`: Group and aggregate data
- `pivot`: Pivot table transformation
- `rolling`: Rolling window calculations
- `sort`: Sort data
- `select`: Select/rename columns
- `contribution`: Calculate contribution percentages
- `prophet`: Time series forecasting
- `boxplot`: Statistical boxplot calculations
- `histogram`: Create histogram bins
- `geohash_decode`: Decode geohash to lat/lon
- `geohash_encode`: Encode lat/lon to geohash
- `geodetic_parse`: Parse geodetic coordinates
- And more...
**Usage Notes**:
- Operations applied in sequence
- Each operation receives output of previous operation
- All operations are from `superset.utils.pandas_postprocessing`
- Validation ensures operation names are valid
- Useful for client-side transformations without re-querying
**Example**:
```python
query_object.post_processing = [
{
"operation": "pivot",
"options": {
"index": ["country"],
"columns": ["product"],
"aggregates": {
"revenue": {"operator": "sum"}
}
}
},
{
"operation": "sort",
"options": {
"columns": {"revenue": False} # Descending
}
}
]
```
**Histogram Example**:
```python
query_object.post_processing = [
{
"operation": "histogram",
"options": {
"column": "age",
"bins": 25
}
}
]
```
---
## Annotations
### `annotation_layers`
**Type**: `list[dict[str, Any]]`
**Description**: List of annotation layers to overlay on the chart. Annotations add contextual information like events, ranges, or reference data.
**Default**: `[]` (empty list; `FORMULA` layers are filtered out during initialization)
**Layer Structure**:
```python
{
"annotationType": str, # "FORMULA", "NATIVE", "line", "table"
"name": str, # Layer name
"value": Any, # Layer-specific value (ID, formula, etc.)
"show": bool, # Whether to show the layer
"sourceType": str, # "NATIVE", "line", "table", etc.
"color": str, # Layer color
"opacity": str, # "opacityLow", "opacityMedium", "opacityHigh"
"style": str, # "solid", "dashed", "dotted", "longDashed"
"width": float, # Line width
"showMarkers": bool, # Show markers on line annotations
"showLabel": bool, # Always show label
"hideLine": bool, # Hide line (show markers only)
"timeColumn": str, # Column with timestamps
"intervalEndColumn": str, # For interval annotations
"titleColumn": str, # Column for titles
"descriptionColumns": list, # Columns for descriptions
"overrides": dict # Override query properties
}
```
**Annotation Types**:
- `"FORMULA"`: Simple formula overlays (e.g., constant line) - **filtered out, don't affect query**
- `"NATIVE"`: Native Superset annotations stored in DB
- `"line"` / `"table"`: Annotations from other charts/queries
**Usage Notes**:
- Formula annotations are filtered out (don't affect payload)
- Other annotations trigger additional data fetches
- Annotations can override time range, granularity, etc.
- Used primarily for time series visualizations
**Example**:
```python
query_object.annotation_layers = [
{
"annotationType": "NATIVE",
"name": "Important Events",
"value": 1, # annotation_layer_id
"show": True,
"sourceType": "NATIVE",
"color": "#ff0000"
},
{
"annotationType": "line",
"name": "Baseline",
"value": 42, # chart_id
"show": True,
"sourceType": "line",
"style": "dashed",
"overrides": {
"time_range": "Last year"
}
}
]
```
---
## Query Execution Control
### `url_params`
**Type**: `dict[str, str]`
**Description**: Optional query parameters passed from dashboard or Explore view URLs. Used for dynamic filtering and Jinja template variables.
**Default**: Not set (passed through `kwargs` when provided)
**Usage Notes**:
- Available in Jinja templates via `url_param` function
- Typically set by dashboard filters or URL parameters
- Keys and values are always strings
- Can be used for dynamic query customization
**Example**:
```python
query_object.url_params = {
"country_filter": "USA",
"min_date": "2024-01-01"
}
# In Jinja template:
# WHERE country = '{{ url_param("country_filter") }}'
```
---
## Deprecated Fields
### `where` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `extras["where"]` instead
**Type**: Field moved to extras during initialization
**Description**: Raw SQL WHERE clause. Automatically moved to `extras["where"]`.
**Migration**: Use `extras["where"]` directly instead of passing `where` parameter.
---
### `having` (DEPRECATED)
**Status**: ⚠️ **DEPRECATED** - Use `extras["having"]` instead
**Type**: Field moved to extras during initialization
**Description**: Raw SQL HAVING clause. Automatically moved to `extras["having"]`.
**Migration**: Use `extras["having"]` directly instead of passing `having` parameter.
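A sketch of the migration for both clauses, assuming direct construction (the SQL values are illustrative):
```python
from superset.common.query_object import QueryObject

# Deprecated: top-level `where` / `having` kwargs (moved into `extras` anyway).
# Preferred: put the raw SQL clauses in `extras` from the start.
query_object = QueryObject(
    columns=["country"],
    metrics=["sum__revenue"],
    extras={
        "where": "status = 'active'",
        "having": "SUM(revenue) > 10000",
    },
)
```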
---
## Properties
### `metric_names`
**Type**: `list[str]` (read-only property)
**Description**: Returns the metric labels as strings. Converts adhoc metrics to their labels.
**Usage**: Accessing metric names without dealing with adhoc metric dictionaries.
**Example**:
```python
query_object.metrics = ["count", {"expressionType": "SQL", "label": "revenue", ...}]
query_object.metric_names # ["count", "revenue"]
```
---
### `column_names`
**Type**: `list[str]` (read-only property)
**Description**: Returns the column labels as strings. Converts adhoc columns to their labels.
**Usage**: Accessing column names without dealing with adhoc column dictionaries.
**Example**:
```python
query_object.columns = ["country", {"label": "year", "sqlExpression": "...", ...}]
query_object.column_names # ["country", "year"]
```
---
## Methods
### `validate(raise_exceptions: bool = True) -> QueryObjectValidationError | None`
**Description**: Validates the query object for correctness. Checks for duplicate labels, missing series columns, invalid time offsets, and sanitizes filters.
**Parameters**:
- `raise_exceptions`: If `True`, raises exception on validation error. If `False`, returns the error object.
**Validates**:
1. No missing series columns
2. No duplicate column/metric labels
3. Valid time offset configurations
4. Sanitizes WHERE/HAVING clauses for SQL injection
**Returns**: `None` if valid, or `QueryObjectValidationError` if `raise_exceptions=False`
**Example**:
```python
try:
query_object.validate()
except QueryObjectValidationError as e:
print(f"Validation failed: {e.message}")
```
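With `raise_exceptions=False`, the error object is returned instead of raised; a minimal sketch:
```python
error = query_object.validate(raise_exceptions=False)
if error is not None:
    print(f"Validation failed: {error.message}")
```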
---
### `to_dict() -> dict[str, Any]`
**Description**: Serializes the QueryObject to a dictionary. Used for passing to datasource connectors and caching.
**Returns**: Dictionary representation of the query object
**Usage**: Called internally when executing queries on datasources.
**Example**:
```python
query_dict = query_object.to_dict()
# {
# "columns": ["country"],
# "metrics": ["count"],
# "filters": [...],
# ...
# }
```
---
### `cache_key(**extra: Any) -> str`
**Description**: Generates a unique cache key for this query. Uses MD5/SHA hash of query parameters.
**Parameters**:
- `**extra`: Additional key-value pairs to include in cache key
**Key Components**:
- All query parameters from `to_dict()`
- `time_range` (not `from_dttm`/`to_dttm` - for relative time caching)
- `datasource.uid`
- `result_type`
- `post_processing`
- `time_offsets`
- `annotation_layers` (if present)
- User impersonation key (if enabled)
**Returns**: MD5/SHA hash string
**Example**:
```python
cache_key = query_object.cache_key(
time_offset="1 week ago",
time_grain="P1D"
)
# "a3f5c8e9d..."
```
---
### `exec_post_processing(df: DataFrame) -> DataFrame`
**Description**: Applies all post-processing operations to a DataFrame in sequence.
**Parameters**:
- `df`: Pandas DataFrame from query execution
**Returns**: Transformed DataFrame
**Raises**: `InvalidPostProcessingError` if operation is invalid
**Example**:
```python
df = datasource.query(query_object.to_dict()).df
processed_df = query_object.exec_post_processing(df)
```
---
## Type Definitions
### `Column`
**Type**: `Union[AdhocColumn, str]`
**AdhocColumn Structure**:
```python
{
"label": str, # Column label
"sqlExpression": str, # SQL expression
"hasCustomLabel": bool, # Whether label is custom
"columnType": str, # "BASE_AXIS" or "SERIES"
"timeGrain": str # Time grain if temporal
}
```
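For illustration, an adhoc column acting as a monthly time axis might look like this (a sketch; the field values are examples, not requirements):
```python
# Hypothetical adhoc column: monthly buckets over a datetime expression.
month_axis = {
    "label": "month",
    "sqlExpression": "order_date",
    "hasCustomLabel": True,
    "columnType": "BASE_AXIS",  # marks the x-axis dimension
    "timeGrain": "P1M",         # ISO 8601 duration: monthly grain
}
query_object.columns = [month_axis, "country"]
```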
---
### `Metric`
**Type**: `Union[AdhocMetric, str]`
**AdhocMetric Structure**:
```python
{
"expressionType": "SIMPLE" | "SQL",
"label": str,
"hasCustomLabel": bool,
# For SIMPLE:
"aggregate": "SUM" | "AVG" | "COUNT" | "MIN" | "MAX" | "COUNT_DISTINCT",
"column": AdhocMetricColumn,
# For SQL:
"sqlExpression": str
}
```
---
### `OrderBy`
**Type**: `tuple[Union[Metric, Column], bool]`
**Format**: `(column_or_metric, is_ascending)`
**Example**: `("revenue", False)` means "ORDER BY revenue DESC"
---
## Common Usage Patterns
### Basic Query
```python
query_object = QueryObject(
columns=["country", "city"],
metrics=["count"],
row_limit=100
)
```
### Timeseries Query
```python
query_object = QueryObject(
columns=["__timestamp", "country"],
metrics=["sum__revenue"],
granularity="order_date",
time_range="Last 30 days",
extras={"time_grain_sqla": "P1D"},
is_timeseries=True
)
```
### Time Comparison Query
```python
query_object = QueryObject(
columns=["country"],
metrics=["sum__revenue"],
time_range="Last 7 days",
time_offsets=["1 week ago", "1 year ago"],
granularity="order_date"
)
```
### Post-Processed Query
```python
query_object = QueryObject(
columns=["age"],
row_limit=10000,
post_processing=[
{
"operation": "histogram",
"options": {"column": "age", "bins": 25}
}
]
)
```
### Filtered and Sorted Query
```python
query_object = QueryObject(
columns=["country", "product"],
metrics=["sum__revenue", "count"],
filter=[
{"col": "status", "op": "==", "val": "completed"},
{"col": "revenue", "op": ">=", "val": 100}
],
orderby=[("sum__revenue", False)], # Descending
row_limit=50
)
```