docs/source/contributor-guide/specification/output-field-name-semantic.md - datafusion - Git at Google

 <!---
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.
 -->

 # Output field name semantics

 This specification documents how field names in output record batches should be
 generated based on given user queries. The filed name rules apply to
 DataFusion queries planned from both SQL queries and Dataframe APIs.

 ## Field name rules

 - All bare column field names MUST not contain relation/table qualifier.
   - Both `SELECT t1.id`, `SELECT id` and `df.select_columns(&["id"])` SHOULD result in field name: `id`
 - All compound column field names MUST contain relation/table qualifier.
   - `SELECT foo + bar` SHOULD result in field name: `table.foo PLUS table.bar`
 - Function names MUST be converted to lowercase.
   - `SELECT AVG(c1)` SHOULD result in field name: `avg(table.c1)`
 - Literal string MUST not be wrapped with quotes or double quotes.
   - `SELECT 'foo'` SHOULD result in field name: `foo`
 - Operator expressions MUST be wrapped with parentheses.
   - `SELECT -2` SHOULD result in field name: `(- 2)`
 - Operator and operand MUST be separated by spaces.
   - `SELECT 1+2` SHOULD result in field name: `(1 + 2)`
 - Function arguments MUST be separated by a comma `,` and a space.
   - `SELECT f(c1,c2)` and `df.select(vec![f.udf("f")?.call(vec![col("c1"), col("c2")])])` SHOULD result in field name: `f(table.c1, table.c2)`

 ## Appendices

 ### Examples and comparison with other systems

 Data schema for test sample queries:

 ```
 CREATE TABLE t1 (id INT, a VARCHAR(5));
 INSERT INTO t1 (id, a) VALUES (1, 'foo');
 INSERT INTO t1 (id, a) VALUES (2, 'bar');

 CREATE TABLE t2 (id INT, b VARCHAR(5));
 INSERT INTO t2 (id, b) VALUES (1, 'hello');
 INSERT INTO t2 (id, b) VALUES (2, 'world');
 ```

 #### Projected columns

 Query:

 ```
 SELECT t1.id, a, t2.id, b
 FROM t1
 JOIN t2 ON t1.id = t2.id
 ```

 DataFusion Arrow record batches output:

 | id  | a   | id  | b     |
 | --- | --- | --- | ----- |
 | 1   | foo | 1   | hello |
 | 2   | bar | 2   | world |

 Spark, MySQL 8 and PostgreSQL 13 output:

 | id  | a   | id  | b     |
 | --- | --- | --- | ----- |
 | 1   | foo | 1   | hello |
 | 2   | bar | 2   | world |

 SQLite 3 output:

 | id  | a   | b     |
 | --- | --- | ----- |
 | 1   | foo | hello |
 | 2   | bar | world |

 #### Function transformed columns

 Query:

 ```
 SELECT ABS(t1.id), abs(-id) FROM t1;
 ```

 DataFusion Arrow record batches output:

 | abs(t1.id) | abs((- t1.id)) |
 | ---------- | -------------- |
 | 1          | 1              |
 | 2          | 2              |

 Spark output:

 | abs(id) | abs((- id)) |
 | ------- | ----------- |
 | 1       | 1           |
 | 2       | 2           |

 MySQL 8 output:

 | ABS(t1.id) | abs(-id) |
 | ---------- | -------- |
 | 1          | 1        |
 | 2          | 2        |

 PostgreSQL 13 output:

 | abs | abs |
 | --- | --- |
 | 1   | 1   |
 | 2   | 2   |

 SQlite 3 output:

 | ABS(t1.id) | abs(-id) |
 | ---------- | -------- |
 | 1          | 1        |
 | 2          | 2        |

 #### Function with operators

 Query:

 ```
 SELECT t1.id + ABS(id), ABS(id * t1.id) FROM t1;
 ```

 DataFusion Arrow record batches output:

 | t1.id + abs(t1.id) | abs(t1.id \* t1.id) |
 | ------------------ | ------------------- |
 | 2                  | 1                   |
 | 4                  | 4                   |

 Spark output:

 | id + abs(id) | abs(id \* id) |
 | ------------ | ------------- |
 | 2            | 1             |
 | 4            | 4             |

 MySQL 8 output:

 | t1.id + ABS(id) | ABS(id \* t1.id) |
 | --------------- | ---------------- |
 | 2               | 1                |
 | 4               | 4                |

 PostgreSQL output:

 | ?column? | abs |
 | -------- | --- |
 | 2        | 1   |
 | 4        | 4   |

 SQLite output:

 | t1.id + ABS(id) | ABS(id \* t1.id) |
 | --------------- | ---------------- |
 | 2               | 1                |
 | 4               | 4                |

 #### Project literals

 Query:

 ```
 SELECT 1, 2+5, 'foo_bar';
 ```

 DataFusion Arrow record batches output:

 | 1   | (2 + 5) | foo_bar |
 | --- | ------- | ------- |
 | 1   | 7       | foo_bar |

 Spark output:

 | 1   | (2 + 5) | foo_bar |
 | --- | ------- | ------- |
 | 1   | 7       | foo_bar |

 MySQL output:

 | 1   | 2+5 | foo_bar |
 | --- | --- | ------- |
 | 1   | 7   | foo_bar |

 PostgreSQL output:

 | ?column? | ?column? | ?column? |
 | -------- | -------- | -------- |
 | 1        | 7        | foo_bar  |

 SQLite 3 output:

 | 1   | 2+5 | 'foo_bar' |
 | --- | --- | --------- |
 | 1   | 7   | foo_bar   |
	<!---
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	# Output field name semantics

	This specification documents how field names in output record batches should be
	generated based on given user queries. The filed name rules apply to
	DataFusion queries planned from both SQL queries and Dataframe APIs.

	## Field name rules

	- All bare column field names MUST not contain relation/table qualifier.
	- Both `SELECT t1.id`, `SELECT id` and `df.select_columns(&["id"])` SHOULD result in field name: `id`
	- All compound column field names MUST contain relation/table qualifier.
	- `SELECT foo + bar` SHOULD result in field name: `table.foo PLUS table.bar`
	- Function names MUST be converted to lowercase.
	- `SELECT AVG(c1)` SHOULD result in field name: `avg(table.c1)`
	- Literal string MUST not be wrapped with quotes or double quotes.
	- `SELECT 'foo'` SHOULD result in field name: `foo`
	- Operator expressions MUST be wrapped with parentheses.
	- `SELECT -2` SHOULD result in field name: `(- 2)`
	- Operator and operand MUST be separated by spaces.
	- `SELECT 1+2` SHOULD result in field name: `(1 + 2)`
	- Function arguments MUST be separated by a comma `,` and a space.
	- `SELECT f(c1,c2)` and `df.select(vec![f.udf("f")?.call(vec![col("c1"), col("c2")])])` SHOULD result in field name: `f(table.c1, table.c2)`

	## Appendices

	### Examples and comparison with other systems

	Data schema for test sample queries:

	```
	CREATE TABLE t1 (id INT, a VARCHAR(5));
	INSERT INTO t1 (id, a) VALUES (1, 'foo');
	INSERT INTO t1 (id, a) VALUES (2, 'bar');

	CREATE TABLE t2 (id INT, b VARCHAR(5));
	INSERT INTO t2 (id, b) VALUES (1, 'hello');
	INSERT INTO t2 (id, b) VALUES (2, 'world');
	```

	#### Projected columns

	Query:

	```
	SELECT t1.id, a, t2.id, b
	FROM t1
	JOIN t2 ON t1.id = t2.id
	```

	DataFusion Arrow record batches output:

	\| id \| a \| id \| b \|
	\| --- \| --- \| --- \| ----- \|
	\| 1 \| foo \| 1 \| hello \|
	\| 2 \| bar \| 2 \| world \|

	Spark, MySQL 8 and PostgreSQL 13 output:

	\| id \| a \| id \| b \|
	\| --- \| --- \| --- \| ----- \|
	\| 1 \| foo \| 1 \| hello \|
	\| 2 \| bar \| 2 \| world \|

	SQLite 3 output:

	\| id \| a \| b \|
	\| --- \| --- \| ----- \|
	\| 1 \| foo \| hello \|
	\| 2 \| bar \| world \|

	#### Function transformed columns

	Query:

	```
	SELECT ABS(t1.id), abs(-id) FROM t1;
	```

	DataFusion Arrow record batches output:

	\| abs(t1.id) \| abs((- t1.id)) \|
	\| ---------- \| -------------- \|
	\| 1 \| 1 \|
	\| 2 \| 2 \|

	Spark output:

	\| abs(id) \| abs((- id)) \|
	\| ------- \| ----------- \|
	\| 1 \| 1 \|
	\| 2 \| 2 \|

	MySQL 8 output:

	\| ABS(t1.id) \| abs(-id) \|
	\| ---------- \| -------- \|
	\| 1 \| 1 \|
	\| 2 \| 2 \|

	PostgreSQL 13 output:

	\| abs \| abs \|
	\| --- \| --- \|
	\| 1 \| 1 \|
	\| 2 \| 2 \|

	SQlite 3 output:

	\| ABS(t1.id) \| abs(-id) \|
	\| ---------- \| -------- \|
	\| 1 \| 1 \|
	\| 2 \| 2 \|

	#### Function with operators

	Query:

	```
	SELECT t1.id + ABS(id), ABS(id * t1.id) FROM t1;
	```

	DataFusion Arrow record batches output:

	\| t1.id + abs(t1.id) \| abs(t1.id \* t1.id) \|
	\| ------------------ \| ------------------- \|
	\| 2 \| 1 \|
	\| 4 \| 4 \|

	Spark output:

	\| id + abs(id) \| abs(id \* id) \|
	\| ------------ \| ------------- \|
	\| 2 \| 1 \|
	\| 4 \| 4 \|

	MySQL 8 output:

	\| t1.id + ABS(id) \| ABS(id \* t1.id) \|
	\| --------------- \| ---------------- \|
	\| 2 \| 1 \|
	\| 4 \| 4 \|

	PostgreSQL output:

	\| ?column? \| abs \|
	\| -------- \| --- \|
	\| 2 \| 1 \|
	\| 4 \| 4 \|

	SQLite output:

	\| t1.id + ABS(id) \| ABS(id \* t1.id) \|
	\| --------------- \| ---------------- \|
	\| 2 \| 1 \|
	\| 4 \| 4 \|

	#### Project literals

	Query:

	```
	SELECT 1, 2+5, 'foo_bar';
	```

	DataFusion Arrow record batches output:

	\| 1 \| (2 + 5) \| foo_bar \|
	\| --- \| ------- \| ------- \|
	\| 1 \| 7 \| foo_bar \|

	Spark output:

	\| 1 \| (2 + 5) \| foo_bar \|
	\| --- \| ------- \| ------- \|
	\| 1 \| 7 \| foo_bar \|

	MySQL output:

	\| 1 \| 2+5 \| foo_bar \|
	\| --- \| --- \| ------- \|
	\| 1 \| 7 \| foo_bar \|

	PostgreSQL output:

	\| ?column? \| ?column? \| ?column? \|
	\| -------- \| -------- \| -------- \|
	\| 1 \| 7 \| foo_bar \|

	SQLite 3 output:

	\| 1 \| 2+5 \| 'foo_bar' \|
	\| --- \| --- \| --------- \|
	\| 1 \| 7 \| foo_bar \|