blob: 4934bc267437527f09f945254518ef030628e27f [file] [log] [blame] [view]
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# DML
DML stands for "Data Manipulation Language" and relates to inserting
and modifying data in tables.
## COPY
Copies the contents of a table or query to file(s). Supported file
formats are `parquet`, `csv`, `json`, and `arrow`.
<pre>
COPY { <i><b>table_name</i></b> | <i><b>query</i></b> }
TO '<i><b>file_name</i></b>'
[ STORED AS <i><b>format</i></b> ]
[ PARTITIONED BY <i><b>column_name</i></b> [, ...] ]
[ OPTIONS( <i><b>option</i></b> [, ... ] ) ]
</pre>
`STORED AS` specifies the file format the `COPY` command will write. If this
clause is not specified, it will be inferred from the file extension if possible.
`PARTITIONED BY` specifies the columns to use for partitioning the output files into
separate hive-style directories. By default, columns used in `PARTITIONED BY` will be removed
from the output format. If you want to keep the columns, you should provide the option
`execution.keep_partition_by_columns true`. `execution.keep_partition_by_columns` flag can also
be enabled through `ExecutionOptions` within `SessionConfig`.
The output format is determined by the first match of the following rules:
1. Value of `STORED AS`
2. Filename extension (e.g. `foo.parquet` implies `PARQUET` format)
For a detailed list of valid OPTIONS, see [Format Options](format_options.md).
### Examples
Copy the contents of `source_table` to `file_name.json` in JSON format:
```sql
> COPY source_table TO 'file_name.json';
+-------+
| count |
+-------+
| 2 |
+-------+
```
Copy the contents of `source_table` to one or more Parquet formatted
files in the `dir_name` directory:
```sql
> COPY source_table TO 'dir_name' STORED AS PARQUET;
+-------+
| count |
+-------+
| 2 |
+-------+
```
Copy the contents of `source_table` to multiple directories
of hive-style partitioned parquet files:
```sql
> COPY source_table TO 'dir_name' STORED AS parquet, PARTITIONED BY (column1, column2);
+-------+
| count |
+-------+
| 2 |
+-------+
```
If the data contains values of `x` and `y` in column1 and only `a` in
column2, output files will appear in the following directory structure:
```text
dir_name/
column1=x/
column2=a/
<file>.parquet
<file>.parquet
...
column1=y/
column2=a/
<file>.parquet
<file>.parquet
...
```
Run the query `SELECT * from source ORDER BY time` and write the
results (maintaining the order) to a parquet file named
`output.parquet` with a maximum parquet row group size of 10MB:
```sql
> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' OPTIONS (MAX_ROW_GROUP_SIZE 10000000);
+-------+
| count |
+-------+
| 2 |
+-------+
```
## INSERT
### Examples
Insert values into a table.
<pre>
INSERT INTO <i><b>table_name</i></b> { VALUES ( <i><b>expression</i></b> [, ...] ) [, ...] | <i><b>query</i></b> }
</pre>
```sql
> INSERT INTO target_table VALUES (1, 'Foo'), (2, 'Bar');
+-------+
| count |
+-------+
| 2 |
+-------+
```