DataFusion uses name-based field mapping when coercing struct types across different operations. This document explains how struct coercion works, when it applies, and how to handle NULL fields.
When combining structs from different sources (e.g., in UNION, array construction, or JOINs), DataFusion matches struct fields by name rather than by position. This provides more robust and predictable behavior compared to positional matching.
-- These two structs have the same fields in different order SELECT [{a: 1, b: 2}, {b: 3, a: 4}]; -- Result: Field names matched, values unified -- [{"a": 1, "b": 2}, {"a": 4, "b": 3}]
The following query operations use name-based field mapping for struct coercion:
When creating array literals with struct elements that have different field orders:
-- Structs with reordered fields in array literal SELECT [{x: 1, y: 2}, {y: 3, x: 4}]; -- Unified type: List(Struct("x": Int32, "y": Int32)) -- Values: [{"x": 1, "y": 2}, {"x": 4, "y": 3}]
When it applies:
[{...}, {...}][[{x: 1}, {x: 2}]]When constructing arrays from table columns with different struct schemas:
CREATE TABLE t_left (s struct(x int, y int)) AS VALUES ({x: 1, y: 2}); CREATE TABLE t_right (s struct(y int, x int)) AS VALUES ({y: 3, x: 4}); -- Dynamically constructs unified array schema SELECT [t_left.s, t_right.s] FROM t_left JOIN t_right; -- Result: [{"x": 1, "y": 2}, {"x": 4, "y": 3}]
When it applies:
[col1, col2]When combining query results with different struct field orders:
SELECT {a: 1, b: 2} as s UNION ALL SELECT {b: 3, a: 4} as s; -- Result: {"a": 1, "b": 2} and {"a": 4, "b": 3}
When it applies:
When multiple CTEs produce structs with different field orders that are combined:
WITH t1 AS (SELECT {a: 1, b: 2} as s), t2 AS (SELECT {b: 3, a: 4} as s) SELECT s FROM t1 UNION ALL SELECT s FROM t2; -- Result: Field names matched across CTEs
When creating tables or temporary results with struct values in different field orders:
CREATE TABLE t AS VALUES ({a: 1, b: 2}), ({b: 3, a: 4}); -- Table schema unified: struct(a: int, b: int) -- Values: {a: 1, b: 2} and {a: 4, b: 3}
When joining tables where the JOIN condition involves structs with different field orders:
CREATE TABLE orders (customer struct(name varchar, id int)); CREATE TABLE customers (info struct(id int, name varchar)); -- Join matches struct fields by name SELECT * FROM orders JOIN customers ON orders.customer = customers.info;
When collecting structs with different field orders using aggregate functions like array_agg:
SELECT array_agg(s) FROM ( SELECT {x: 1, y: 2} as s UNION ALL SELECT {y: 3, x: 4} as s ) t GROUP BY category; -- Result: Array of structs with unified field order
When using window functions with struct expressions having different field orders:
SELECT id, row_number() over (partition by s order by id) as rn FROM ( SELECT {category: 1, value: 10} as s, 1 as id UNION ALL SELECT {value: 20, category: 1} as s, 2 as id ); -- Fields matched by name in PARTITION BY clause
When structs have different field sets, missing fields are filled with NULL values during coercion.
-- Struct in first position has fields: a, b -- Struct in second position has fields: b, c -- Unified schema includes all fields: a, b, c SELECT [ CAST({a: 1, b: 2} AS STRUCT(a INT, b INT, c INT)), CAST({b: 3, c: 4} AS STRUCT(a INT, b INT, c INT)) ]; -- Result: -- [ -- {"a": 1, "b": 2, "c": NULL}, -- {"a": NULL, "b": 3, "c": 4} -- ]
Field count must match exactly. If structs have different numbers of fields and their field names don't completely overlap, the query will fail:
-- This fails because field sets don't match: -- t_left has {x, y} but t_right has {x, y, z} SELECT [t_left.s, t_right.s] FROM t_left JOIN t_right; -- Error: Cannot coerce struct with mismatched field counts
Workaround: Use explicit CAST
To handle partial field overlap, explicitly cast structs to a unified schema:
SELECT [ CAST(t_left.s AS STRUCT(x INT, y INT, z INT)), CAST(t_right.s AS STRUCT(x INT, y INT, z INT)) ] FROM t_left JOIN t_right;
If you have existing code that relied on positional struct field matching, you may need to update it.
Old behavior (positional):
-- These would have been positionally mapped (left-to-right) SELECT [{x: 1, y: 2}, {y: 3, x: 4}]; -- Old result (positional): [{"x": 1, "y": 2}, {"y": 3, "x": 4}]
New behavior (name-based):
-- Now uses name-based matching SELECT [{x: 1, y: 2}, {y: 3, x: 4}]; -- New result (by name): [{"x": 1, "y": 2}, {"x": 4, "y": 3}]
If you need precise control over struct field order and types, use explicit CAST:
-- Guarantee specific field order and types SELECT CAST({b: 3, a: 4} AS STRUCT(a INT, b INT)); -- Result: {"a": 4, "b": 3}
When joining or combining structs, define target schemas explicitly:
-- Good: explicit schema definition SELECT CAST(data AS STRUCT(id INT, name VARCHAR, active BOOLEAN)) FROM external_source;
Prefer named struct constructors for clarity:
-- Good: field names are explicit SELECT named_struct('id', 1, 'name', 'Alice', 'active', true); -- Or using struct literal syntax SELECT {id: 1, name: 'Alice', active: true};
Always verify that field mappings work as expected:
-- Use arrow_typeof to verify unified schema SELECT arrow_typeof([{x: 1, y: 2}, {y: 3, x: 4}]); -- Result: List(Struct("x": Int32, "y": Int32))
When combining structs with partial field overlap, use explicit CAST:
-- Instead of relying on implicit coercion SELECT [ CAST(left_struct AS STRUCT(x INT, y INT, z INT)), CAST(right_struct AS STRUCT(x INT, y INT, z INT)) ];
In complex queries, document the expected struct schemas:
-- Expected schema: {customer_id: INT, name: VARCHAR, age: INT} SELECT { customer_id: c.id, name: c.name, age: c.age } as customer_info FROM customers c;
Cause: Trying to combine structs with different numbers of fields.
Solution:
-- Use explicit CAST to handle missing fields SELECT [ CAST(struct1 AS STRUCT(a INT, b INT, c INT)), CAST(struct2 AS STRUCT(a INT, b INT, c INT)) ];
Cause: Referencing a field name that doesn't exist in the struct.
Solution:
-- Verify field names match exactly (case-sensitive) SELECT s['field_name'] FROM my_table; -- Use bracket notation for access -- Or use get_field function SELECT get_field(s, 'field_name') FROM my_table;
Cause: Struct coercion added NULL for missing fields.
Solution: Check that all structs have the required fields, or explicitly handle NULLs:
SELECT COALESCE(s['field'], default_value) FROM my_table;
arrow_typeof() - Returns the Arrow type of an expressionstruct() / named_struct() - Creates struct valuesget_field() - Extracts field values from structsCAST() - Explicitly casts structs to specific schemas