| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> |
| |
| <meta name="copyright" content="(C) Copyright 2024" /> |
| <meta name="DC.rights.owner" content="(C) Copyright 2024" /> |
| <meta name="DC.Type" content="concept" /> |
| <meta name="DC.Title" content="INSERT Statement" /> |
| <meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="prodname" content="Impala" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="version" content="Impala 3.4.x" /> |
| <meta name="DC.Format" content="XHTML" /> |
| <meta name="DC.Identifier" content="insert" /> |
| <link rel="stylesheet" type="text/css" href="../commonltr.css" /> |
| <title>INSERT Statement</title> |
| </head> |
| <body id="insert"> |
| |
| |
| <h1 class="title topictitle1" id="ariaid-title1">INSERT Statement</h1> |
| |
| |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| |
| Impala supports inserting into tables and partitions that you create with the Impala <code class="ph codeph">CREATE |
| TABLE</code> statement, or pre-defined tables and partitions created through Hive. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Syntax:</strong> |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>[<var class="keyword varname">with_clause</var>] |
| INSERT <span class="ph">[<var class="keyword varname">hint_clause</var>]</span> { INTO | OVERWRITE } [TABLE] <var class="keyword varname">table_name</var> |
| [(<var class="keyword varname">column_list</var>)] |
| [ PARTITION (<var class="keyword varname">partition_clause</var>)] |
| { |
| [<var class="keyword varname">hint_clause</var>] <var class="keyword varname">select_statement</var> |
| | VALUES (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) [, (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) ...] |
| } |
| |
| partition_clause ::= <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] [, <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] ...] |
| |
| hint_clause ::= |
| <var class="keyword varname">hint_with_dashes</var> | |
| <var class="keyword varname">hint_with_cstyle_delimiters</var> | |
| <var class="keyword varname">hint_with_brackets</var> |
| |
| hint_with_dashes ::= -- +SHUFFLE | -- +NOSHUFFLE <span class="ph">-- +CLUSTERED</span> |
| |
| hint_with_cstyle_comments ::= /* +SHUFFLE */ | /* +NOSHUFFLE */ <span class="ph">| /* +CLUSTERED */</span> |
| |
| hint_with_brackets ::= [SHUFFLE] | [NOSHUFFLE] |
| (With this hint format, the square brackets are part of the syntax.) |
| </code></pre> |
| |
| <div class="note note"><span class="notetitle">Note:</span> |
| The square bracket style of hint is now deprecated and might be removed in a future |
| release. For that reason, any newly added hints are not available with the square |
| bracket syntax. |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Appending or replacing (INTO and OVERWRITE clauses):</strong> |
| </p> |
| |
| |
| <p class="p"> |
| The <code class="ph codeph">INSERT INTO</code> syntax appends data to a table. The existing data files are left as-is, and |
| the inserted data is put into one or more new data files. |
| </p> |
| |
| |
| <p class="p"> |
| The <code class="ph codeph">INSERT OVERWRITE</code> syntax replaces the data in a table. |
| |
| |
| Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash |
| mechanism. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Complex type considerations:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| The <code class="ph codeph">INSERT</code> statement currently does not support writing data files |
| containing complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>). |
| To prepare Parquet data for such tables, you generate the data files outside Impala and then |
| use <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE</code> to associate those |
| data files with the table. Currently, such tables must use the Parquet file format. |
| See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about working with complex types. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Kudu considerations:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| Currently, the <code class="ph codeph">INSERT OVERWRITE</code> syntax cannot be used with Kudu tables. |
| </p> |
| |
| |
| <p class="p"> |
| Kudu tables require a unique primary key for each row. If an <code class="ph codeph">INSERT</code> |
| statement attempts to insert a row with the same values for the primary key columns |
| as an existing row, that row is discarded and the insert operation continues. |
| When rows are discarded due to duplicate primary keys, the statement finishes |
| with a warning, not an error. (This is a change from early releases of Kudu |
| where the default was to return in error in such cases, and the syntax |
| <code class="ph codeph">INSERT IGNORE</code> was required to make the statement succeed. |
| The <code class="ph codeph">IGNORE</code> clause is no longer part of the <code class="ph codeph">INSERT</code> |
| syntax.) |
| </p> |
| |
| |
| <p class="p"> |
| For situations where you prefer to replace rows with duplicate primary key values, |
| rather than discarding the new data, you can use the <code class="ph codeph">UPSERT</code> |
| statement instead of <code class="ph codeph">INSERT</code>. <code class="ph codeph">UPSERT</code> inserts |
| rows that are entirely new, and for rows that match an existing primary key in the |
| table, the non-primary-key columns are updated to reflect the values in the |
| <span class="q">"upserted"</span> data. |
| </p> |
| |
| |
| <p class="p"> |
| If you really want to store new rows, not replace existing ones, but cannot do so |
| because of the primary key uniqueness constraint, consider recreating the table |
| with additional columns included in the primary key. |
| </p> |
| |
| |
| <p class="p"> |
| See <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a> for more details about using Impala with Kudu. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Usage notes:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| Impala currently supports: |
| </p> |
| |
| |
| <ul class="ul"> |
| <li class="li"> |
| Copy data from another table using <code class="ph codeph">SELECT</code> query. In Impala 1.2.1 and higher, you can |
| combine <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">INSERT</code> operations into a single step with the |
| <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax, which bypasses the actual <code class="ph codeph">INSERT</code> keyword. |
| </li> |
| |
| |
| <li class="li"> |
| An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the |
| <code class="ph codeph">INSERT</code> keyword, to define a subquery referenced in the <code class="ph codeph">SELECT</code> portion. |
| </li> |
| |
| |
| <li class="li"> |
| Create one or more new rows using constant expressions through <code class="ph codeph">VALUES</code> clause. (The |
| <code class="ph codeph">VALUES</code> clause was added in Impala 1.0.1.) |
| </li> |
| |
| |
| <li class="li"> |
| <p class="p"> |
| By default, the first column of each newly inserted row goes into the first column of the table, the |
| second column into the second column, and so on. |
| </p> |
| |
| <p class="p"> |
| You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the |
| destination table, by specifying a column list immediately after the name of the destination table. This |
| feature lets you adjust the inserted columns to match the layout of a <code class="ph codeph">SELECT</code> statement, |
| rather than the other way around. (This feature was added in Impala 1.1.) |
| </p> |
| |
| <p class="p"> |
| The number of columns mentioned in the column list (known as the <span class="q">"column permutation"</span>) must match |
| the number of columns in the <code class="ph codeph">SELECT</code> list or the <code class="ph codeph">VALUES</code> tuples. The |
| order of columns in the column permutation can be different than in the underlying table, and the columns |
| of each input row are reordered to match. If the number of columns in the column permutation is less than |
| in the destination table, all unmentioned columns are set to <code class="ph codeph">NULL</code>. |
| </p> |
| |
| </li> |
| |
| |
| <li class="li"> |
| An optional hint clause immediately either before the <code class="ph codeph">SELECT</code> keyword or after the |
| <code class="ph codeph">INSERT</code> keyword, to fine-tune the behavior when doing an <code class="ph codeph">INSERT ... SELECT</code> |
| operation into partitioned Parquet tables. The hint clause cannot be specified in multiple places. |
| The hint keywords are <code class="ph codeph">[SHUFFLE]</code> and <code class="ph codeph">[NOSHUFFLE]</code>, including the square brackets. |
| Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially |
| involves many files being written to HDFS simultaneously, and separate |
| <span class="ph">large</span> memory buffers being allocated to buffer the data for each |
| partition. For usage details, see <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a>. |
| </li> |
| |
| </ul> |
| |
| |
| <div class="note note"><span class="notetitle">Note:</span> |
| <ul class="ul"> |
| <li class="li"> |
| Insert commands that partition or add files result in changes to Hive metadata. Because Impala uses Hive |
| metadata, such changes may necessitate a metadata refresh. For more information, see the |
| <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> function. |
| </li> |
| |
| |
| <li class="li"> |
| Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file |
| formats, insert the data using Hive and use Impala to query it. |
| </li> |
| |
| |
| <li class="li"> |
| As an alternative to the <code class="ph codeph">INSERT</code> statement, if you have existing data files elsewhere in |
| HDFS, the <code class="ph codeph">LOAD DATA</code> statement can move those files into a table. This statement works |
| with tables of any file format. |
| </li> |
| |
| </ul> |
| |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Statement type:</strong> DML (but still affected by |
| <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option) |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Usage notes:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| When you insert the results of an expression, particularly of a built-in function call, into a small numeric |
| column such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">TINYINT</code>, or |
| <code class="ph codeph">FLOAT</code>, you might need to use a <code class="ph codeph">CAST()</code> expression to coerce values into the |
| appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to |
| insert cosine values into a <code class="ph codeph">FLOAT</code> column, write <code class="ph codeph">CAST(COS(angle) AS FLOAT)</code> |
| in the <code class="ph codeph">INSERT</code> statement to make the conversion explicit. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">File format considerations:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| Because Impala can read certain file formats that it cannot write, |
| the <code class="ph codeph">INSERT</code> statement does not work for all kinds of |
| Impala tables. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> |
| for details about what file formats are supported by the |
| <code class="ph codeph">INSERT</code> statement. |
| </p> |
| |
| |
| <p class="p"> |
| Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in |
| the HDFS filesystem to write one block. Because Parquet data files use a block size of 1 |
| GB by default, an <code class="ph codeph">INSERT</code> might fail (even for a very small amount of |
| data) if your HDFS is running low on space. |
| </p> |
| |
| |
| <p class="p"> |
| If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> |
| session for load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query |
| option to make each DDL statement wait before returning, until the new or changed |
| metadata has been received by all the Impala nodes. See |
| <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details. |
| </p> |
| |
| |
| <div class="note important"><span class="importanttitle">Important:</span> |
| After adding or replacing data in a table used in performance-critical queries, issue a |
| <code class="ph codeph">COMPUTE STATS</code> statement to make sure all statistics are up-to-date. |
| Consider updating statistics for a table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD |
| DATA</code>, or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement in Impala, or after |
| loading data through Hive and doing a <code class="ph codeph">REFRESH |
| <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important |
| for tables that are very large, used in join queries, or both. |
| </div> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Examples:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| The following example sets up new tables with the same definition as the <code class="ph codeph">TAB1</code> table from the |
| <a class="xref" href="impala_tutorial.html#tutorial">Tutorial</a> section, using different file |
| formats, and demonstrates inserting data into the tables created with the <code class="ph codeph">STORED AS TEXTFILE</code> |
| and <code class="ph codeph">STORED AS PARQUET</code> clauses: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>CREATE DATABASE IF NOT EXISTS file_formats; |
| USE file_formats; |
| |
| DROP TABLE IF EXISTS text_table; |
| CREATE TABLE text_table |
| ( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) |
| STORED AS TEXTFILE; |
| |
| DROP TABLE IF EXISTS parquet_table; |
| CREATE TABLE parquet_table |
| ( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) |
| STORED AS PARQUET;</code></pre> |
| |
| <p class="p"> |
| With the <code class="ph codeph">INSERT INTO TABLE</code> syntax, each new set of inserted rows is appended to any existing |
| data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new |
| batches of data alongside the existing data. For example, after running 2 <code class="ph codeph">INSERT INTO TABLE</code> |
| statements with 5 rows each, the table contains 10 rows total: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>[localhost:21000] > insert into table text_table select * from default.tab1; |
| Inserted 5 rows in 0.41s |
| |
| [localhost:21000] > insert into table text_table select * from default.tab1; |
| Inserted 5 rows in 0.46s |
| |
| [localhost:21000] > select count(*) from text_table; |
| +----------+ |
| | count(*) | |
| +----------+ |
| | 10 | |
| +----------+ |
| Returned 1 row(s) in 0.26s</code></pre> |
| |
| <p class="p"> |
| With the <code class="ph codeph">INSERT OVERWRITE TABLE</code> syntax, each new set of inserted rows replaces any existing |
| data in the table. This is how you load data to query in a data warehousing scenario where you analyze just |
| the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the |
| entire set of data in one raw table, and transfer and transform certain rows into a more compact and |
| efficient form to perform intensive analysis on that subset. |
| </p> |
| |
| |
| <p class="p"> |
| For example, here we insert 5 rows into a table using the <code class="ph codeph">INSERT INTO</code> clause, then replace |
| the data by inserting 3 rows with the <code class="ph codeph">INSERT OVERWRITE</code> clause. Afterward, the table only |
| contains the 3 rows from the final <code class="ph codeph">INSERT</code> statement. |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>[localhost:21000] > insert into table parquet_table select * from default.tab1; |
| Inserted 5 rows in 0.35s |
| |
| [localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3; |
| Inserted 3 rows in 0.43s |
| [localhost:21000] > select count(*) from parquet_table; |
| +----------+ |
| | count(*) | |
| +----------+ |
| | 3 | |
| +----------+ |
| Returned 1 row(s) in 0.43s</code></pre> |
| |
| <p class="p"> |
| The <code class="ph codeph"><a class="xref" href="impala_insert.html#values">VALUES</a></code> clause lets you insert one or more |
| rows by specifying constant values for all the columns. The number, types, and order of the expressions must |
| match the table definition. |
| </p> |
| |
| |
| <div class="note note" id="insert__insert_values_warning"><span class="notetitle">Note:</span> |
| The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into |
| HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate |
| data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL |
| syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not |
| run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row each |
| time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as one |
| stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> clause, |
| and use a separate database to make cleanup easier if the operation does produce many tiny files. |
| </div> |
| |
| |
| <p class="p"> |
| The following example shows how to insert one row or multiple rows, with expressions of different types, |
| using literal values, expressions, and function return values: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp); |
| insert into val_test_1 values (100, 99.9/10, 'abc', true, now()); |
| create table val_test_2 (id int, token string); |
| insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</code></pre> |
| |
| <p class="p"> |
| These examples show the type of <span class="q">"not implemented"</span> error that you see when attempting to insert data into |
| a table with a file format that Impala currently does not write to: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>DROP TABLE IF EXISTS sequence_table; |
| CREATE TABLE sequence_table |
| ( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) |
| STORED AS SEQUENCEFILE; |
| |
| DROP TABLE IF EXISTS rc_table; |
| CREATE TABLE rc_table |
| ( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP ) |
| STORED AS RCFILE; |
| |
| [localhost:21000] > insert into table rc_table select * from default.tab1; |
| Remote error |
| Backend 0:RC_FILE not implemented. |
| |
| [localhost:21000] > insert into table sequence_table select * from default.tab1; |
| Remote error |
| Backend 0:SEQUENCE_FILE not implemented. </code></pre> |
| |
| <p class="p"> |
| The following examples show how you can copy the data in all the columns from one table to another, copy the |
| data from only some columns, or specify the columns in the select list in a different order than they |
| actually appear in the table: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>-- Start with 2 identical tables. |
| create table t1 (c1 int, c2 int); |
| create table t2 like t1; |
| |
| -- If there is no () part after the destination table name, |
| -- all columns must be specified, either as * or by name. |
| insert into t2 select * from t1; |
| insert into t2 select c1, c2 from t1; |
| |
| -- With the () notation following the destination table name, |
| -- you can omit columns (all values for that column are NULL |
| -- in the destination table), and/or reorder the values |
| -- selected from the source table. This is the "column permutation" feature. |
| insert into t2 (c1) select c1 from t1; |
| insert into t2 (c2, c1) select c1, c2 from t1; |
| |
| -- The column names can be entirely different in the source and destination tables. |
| -- You can copy any columns, not just the corresponding ones, from the source table. |
| -- But the number and type of selected columns must match the columns mentioned in the () part. |
| alter table t2 replace columns (x int, y int); |
| insert into t2 (y) select c1 from t1; |
| </code></pre> |
| |
| <p class="p"> |
| <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> |
| clause in an <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> |
| clause is ignored and the results are not necessarily sorted. An <code class="ph codeph">INSERT ... |
| SELECT</code> operation potentially creates many different data files, prepared by |
| different executor Impala daemons, and therefore the notion of the data being stored in |
| sorted order is impractical. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Concurrency considerations:</strong> Each <code class="ph codeph">INSERT</code> operation creates new data files with unique |
| names, so you can run multiple <code class="ph codeph">INSERT INTO</code> statements simultaneously without filename |
| conflicts. |
| |
| While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside |
| the data directory; during this period, you cannot issue queries against that table in Hive. If an |
| <code class="ph codeph">INSERT</code> operation fails, the temporary data file and the subdirectory could be left behind in |
| the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by |
| issuing an <code class="ph codeph">hdfs dfs -rm -r</code> command, specifying the full path of the work subdirectory, whose |
| name ends in <code class="ph codeph">_dir</code>. |
| </p> |
| |
| </div> |
| |
| |
| <div class="related-links"> |
| <div class="familylinks"> |
| <div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div> |
| </div> |
| </div><div class="topic concept nested1" aria-labelledby="ariaid-title2" id="values"> |
| |
| <h2 class="title topictitle2" id="ariaid-title2">VALUES Clause</h2> |
| |
| |
| <div class="body conbody"> |
| |
| <p class="p"> |
| The <code class="ph codeph">VALUES</code> clause is a general-purpose way to specify the columns of one or more rows, |
| typically within an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement. |
| </p> |
| |
| |
| <div class="note note"><span class="notetitle">Note:</span> |
| The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large |
| quantities of data into HDFS-based tables, because the insert operations cannot be |
| parallelized, and each one produces a separate data file. Use it for setting up small |
| dimension tables or tiny amounts of data for experimenting with SQL syntax, or with |
| HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. |
| Do not run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that |
| insert a single row each time. If you do run <code class="ph codeph">INSERT ... VALUES</code> |
| operations to load data into a staging table as one stage in an ETL pipeline, include |
| multiple row values if possible within each <code class="ph codeph">VALUES</code> clause, and use a |
| separate database to make cleanup easier if the operation does produce many tiny files. |
| </div> |
| |
| |
| <p class="p"> |
| The following examples illustrate: |
| </p> |
| |
| |
| <ul class="ul"> |
| <li class="li"> |
| How to insert a single row using a <code class="ph codeph">VALUES</code> clause. |
| </li> |
| |
| |
| <li class="li"> |
| How to insert multiple rows using a <code class="ph codeph">VALUES</code> clause. |
| </li> |
| |
| |
| <li class="li"> |
| How the row or rows from a <code class="ph codeph">VALUES</code> clause can be appended to a table through |
| <code class="ph codeph">INSERT INTO</code>, or replace the contents of the table through <code class="ph codeph">INSERT |
| OVERWRITE</code>. |
| </li> |
| |
| |
| <li class="li"> |
| How the entries in a <code class="ph codeph">VALUES</code> clause can be literals, function results, or any other kind |
| of expression. See <a class="xref" href="impala_literals.html#literals">Literals</a> for the notation to use for literal |
| values, especially <a class="xref" href="impala_literals.html#string_literals">String Literals</a> for quoting and escaping |
| conventions for strings. See <a class="xref" href="impala_operators.html#operators">SQL Operators</a> and |
| <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for other things you can include in expressions with the |
| <code class="ph codeph">VALUES</code> clause. |
| </li> |
| |
| </ul> |
| |
| |
| <pre class="pre codeblock"><code>[localhost:21000] > describe val_example; |
| Query: describe val_example |
| Query finished, fetching results ... |
| +-------+---------+---------+ |
| | name | type | comment | |
| +-------+---------+---------+ |
| | id | int | | |
| | col_1 | boolean | | |
| | col_2 | double | | |
| +-------+---------+---------+ |
| |
| [localhost:21000] > insert into val_example values (1,true,100.0); |
| Inserted 1 rows in 0.30s |
| [localhost:21000] > select * from val_example; |
| +----+-------+-------+ |
| | id | col_1 | col_2 | |
| +----+-------+-------+ |
| | 1 | true | 100 | |
| +----+-------+-------+ |
| |
| [localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3); |
| Inserted 2 rows in 0.16s |
| [localhost:21000] > select * from val_example; |
| +----+-------+-------------------+ |
| | id | col_1 | col_2 | |
| +----+-------+-------------------+ |
| | 10 | false | 32 | |
| | 50 | true | 3.333333333333333 | |
| +----+-------+-------------------+</code></pre> |
| |
| <p class="p"> |
| When used in an <code class="ph codeph">INSERT</code> statement, the Impala <code class="ph codeph">VALUES</code> clause can specify |
| some or all of the columns in the destination table, and the columns can be specified in a different order |
| than they actually appear in the table. To specify a different set or order of columns than in the table, |
| use the syntax: |
| </p> |
| |
| |
| <pre class="pre codeblock"><code>INSERT INTO <var class="keyword varname">destination</var> |
| (<var class="keyword varname">col_x</var>, <var class="keyword varname">col_y</var>, <var class="keyword varname">col_z</var>) |
| VALUES |
| (<var class="keyword varname">val_x</var>, <var class="keyword varname">val_y</var>, <var class="keyword varname">val_z</var>); |
| </code></pre> |
| |
| <p class="p"> |
| Any columns in the table that are not listed in the <code class="ph codeph">INSERT</code> statement are set to |
| <code class="ph codeph">NULL</code>. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">HDFS considerations:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| Impala physically writes all inserted files under the ownership of its default user, typically |
| <code class="ph codeph">impala</code>. Therefore, this user must have HDFS write permission in the corresponding table |
| directory. |
| </p> |
| |
| |
| <p class="p"> |
| The permission requirement is independent of the authorization performed by the Ranger framework. (If the |
| connected user is not authorized to insert into a table, Ranger blocks that operation immediately, |
| regardless of the privileges available to the <code class="ph codeph">impala</code> user.) Files created by Impala are |
| not owned by and do not inherit permissions from the connected user. |
| </p> |
| |
| |
| <p class="p"> |
| The number of data files produced by an <code class="ph codeph">INSERT</code> statement depends on the size of the |
| cluster, the number of data blocks that are processed, the partition key columns in a partitioned table, |
| and the mechanism Impala uses for dividing the work in parallel. Do not assume that an |
| <code class="ph codeph">INSERT</code> statement will produce some particular number of output files. In case of |
| performance issues with data written by Impala, check that the output files do not suffer from issues such |
| as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens |
| of megabytes are considered <span class="q">"tiny"</span>.) |
| </p> |
| |
| |
| <p class="p"> |
| The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory |
| inside the data directory of the table. Formerly, this hidden work directory was named |
| <span class="ph filepath">.impala_insert_staging</span> . In Impala 2.0.1 and later, this directory |
| name is changed to <span class="ph filepath">_impala_insert_staging</span> . (While HDFS tools are |
| expected to treat names beginning either with underscore and dot as hidden, in practice |
| names beginning with an underscore are more widely supported.) If you have any scripts, |
| cleanup jobs, and so on that rely on the name of this work directory, adjust them to use |
| the new name. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">HBase considerations:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| You can use the <code class="ph codeph">INSERT</code> statement with HBase tables as follows: |
| </p> |
| |
| |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p"> |
| You can insert a single row or a small set of rows into an HBase table with the <code class="ph codeph">INSERT ... |
| VALUES</code> syntax. This is a good use case for HBase tables with Impala, because HBase tables are |
| not subject to the same kind of fragmentation from many small insert operations as HDFS tables are. |
| </p> |
| |
| </li> |
| |
| |
| <li class="li"> |
| <p class="p"> |
| You can insert any number of rows at once into an HBase table using the <code class="ph codeph">INSERT ... |
| SELECT</code> syntax. |
| </p> |
| |
| </li> |
| |
| |
| <li class="li"> |
| <p class="p"> |
| If more than one inserted row has the same value for the HBase key column, only the last inserted row |
| with that value is visible to Impala queries. You can take advantage of this fact with <code class="ph codeph">INSERT |
| ... VALUES</code> statements to effectively update rows one at a time, by inserting new rows with the |
| same key values as existing rows. Be aware that after an <code class="ph codeph">INSERT ... SELECT</code> operation |
| copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key |
| column in the source table contained duplicate values. |
| </p> |
| |
| </li> |
| |
| |
| <li class="li"> |
| <p class="p"> |
| You cannot <code class="ph codeph">INSERT OVERWRITE</code> into an HBase table. New rows are always appended. |
| </p> |
| |
| </li> |
| |
| |
| <li class="li"> |
| <p class="p"> |
| When you create an Impala or Hive table that maps to an HBase table, the column order you specify with |
| the <code class="ph codeph">INSERT</code> statement might be different than the order you declare with the |
| <code class="ph codeph">CREATE TABLE</code> statement. Behind the scenes, HBase arranges the columns based on how |
| they are divided into column families. This might cause a mismatch during insert operations, especially |
| if you use the syntax <code class="ph codeph">INSERT INTO <var class="keyword varname">hbase_table</var> SELECT * FROM |
| <var class="keyword varname">hdfs_table</var></code>. Before inserting data, verify the column order by issuing a |
| <code class="ph codeph">DESCRIBE</code> statement for the table, and adjust the order of the select list in the |
| <code class="ph codeph">INSERT</code> statement. |
| </p> |
| |
| </li> |
| |
| </ul> |
| |
| |
| <p class="p"> |
| See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for more details about using Impala with HBase. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Amazon S3 considerations:</strong> |
| </p> |
| |
| <p class="p"> In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, |
| <code class="ph codeph">LOAD DATA</code>, and <code class="ph codeph">CREATE TABLE AS |
| SELECT</code>) can write data into a table or partition that resides |
| in S3. The syntax of the DML statements is the same as for any other |
| tables, because the S3 location for tables and partitions is specified |
| by an <code class="ph codeph">s3a://</code> prefix in the <code class="ph codeph">LOCATION</code> |
| attribute of <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER |
| TABLE</code> statements. If you bring data into S3 using the normal |
| S3 transfer mechanisms instead of Impala DML statements, issue a |
| <code class="ph codeph">REFRESH</code> statement for the table before using Impala |
| to query the S3 data. </p> |
| |
| <p class="p"> Because of differences |
| between S3 and traditional filesystems, DML operations for S3 tables can |
| take longer than for tables on HDFS. For example, both the <code class="ph codeph">LOAD |
| DATA</code> statement and the final stage of the |
| <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> |
| statements involve moving files from one directory to another. (In the |
| case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS |
| SELECT</code>, the files are moved from a temporary staging |
| directory to the final destination directory.) Because S3 does not |
| support a <span class="q">"rename"</span> operation for existing objects, in these cases |
| Impala actually copies the data files from one location to another and |
| then removes the original files. In <span class="keyword">Impala 2.6</span>, |
| the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way |
| to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and |
| partitions, with the tradeoff that a problem during statement execution |
| could leave data in an inconsistent state. It does not apply to |
| <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> |
| statements. See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option</a> for details. </p> |
| |
| <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with Amazon S3 Object Store</a> for details about reading and writing S3 data with Impala.</p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">ADLS considerations:</strong> |
| </p> |
| |
| <p class="p"> |
| In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements |
| (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and <code class="ph codeph">CREATE TABLE AS |
| SELECT</code>) can write data into a table or partition that resides in the Azure Data |
| Lake Store (ADLS). ADLS Gen2 is supported in <span class="keyword">Impala 3.1</span> and higher. |
| </p> |
| <p class="p"> |
| In the<code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements, specify |
| the ADLS location for tables and partitions with the <code class="ph codeph">adl://</code> prefix for |
| ADLS Gen1 and <code class="ph codeph">abfs://</code> or <code class="ph codeph">abfss://</code> for ADLS Gen2 in the |
| <code class="ph codeph">LOCATION</code> attribute. |
| </p> |
| <p class="p"> |
| If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala |
| DML statements, issue a <code class="ph codeph">REFRESH</code> statement for the table before using |
| Impala to query the ADLS data. |
| </p> |
| |
| <p class="p">See <a class="xref" href="impala_adls.html#adls">Using Impala with the Azure Data Lake Store (ADLS)</a> for details about reading and writing ADLS data with Impala.</p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Security considerations:</strong> |
| </p> |
| |
| <p class="p"> |
| If these statements in your environment contain sensitive literal values such as credit |
| card numbers or tax identifiers, Impala can redact this sensitive information when |
| displaying the statements in log files and other administrative contexts. See |
| <span class="xref">the documentation for your Apache Hadoop distribution</span> for details. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the |
| <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button |
| from the <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from |
| the list of in-flight queries (for a particular node) on the |
| <span class="ph uicontrol">Queries</span> tab in the Impala web UI (port 25000). |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">HDFS permissions:</strong> |
| </p> |
| |
| <p class="p"> |
| The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, |
| typically the <code class="ph codeph">impala</code> user, must have read |
| permission for the files in the source directory of an <code class="ph codeph">INSERT ... SELECT</code> |
| operation, and write permission for all affected directories in the destination table. |
| (An <code class="ph codeph">INSERT</code> operation could write files to multiple different HDFS directories |
| if the destination table is partitioned.) |
| This user must also have write permission to create a temporary work directory |
| in the top-level HDFS directory of the destination table. |
| An <code class="ph codeph">INSERT OVERWRITE</code> operation does not require write permission on |
| the original data files in the table, only on the table directories themselves. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Restrictions:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| For <code class="ph codeph">INSERT</code> operations into <code class="ph codeph">CHAR</code> or |
| <code class="ph codeph">VARCHAR</code> columns, you must cast all <code class="ph codeph">STRING</code> literals or |
| expressions returning <code class="ph codeph">STRING</code> to to a <code class="ph codeph">CHAR</code> or |
| <code class="ph codeph">VARCHAR</code> type with the appropriate length. |
| </p> |
| |
| |
| <p class="p"> |
| <strong class="ph b">Related startup options:</strong> |
| </p> |
| |
| |
| <p class="p"> |
| By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories |
| underneath a partitioned table, those subdirectories are assigned default HDFS |
| permissions for the <code class="ph codeph">impala</code> user. To make each subdirectory have the |
| same permissions as its parent directory in HDFS, specify the |
| <code class="ph codeph">‑‑insert_inherit_permissions</code> startup option for the |
| <span class="keyword cmdname">impalad</span> daemon. |
| </p> |
| |
| </div> |
| |
| </div> |
| |
| <div class="topic concept nested1" aria-labelledby="ariaid-title3" id="partition_insert"> |
| <h2 class="title topictitle2" id="ariaid-title3">Inserting Into Partitioned Tables with PARTITION Clause</h2> |
| |
| <div class="body conbody"> |
| <p class="p"> |
| For a partitioned table, the optional <code class="ph codeph">PARTITION</code> clause |
| identifies which partition or partitions the values are inserted |
| into. |
| </p> |
| |
| <p class="p"> |
| All examples in this section will use the table declared as below: |
| </p> |
| |
| <pre class="pre codeblock"><code>CREATE TABLE t1 (w INT) PARTITIONED BY (x INT, y STRING);</code></pre> |
| </div> |
| |
| |
| <div class="topic concept nested2" aria-labelledby="ariaid-title4" id="static_partition_insert"> |
| <h3 class="title topictitle3" id="ariaid-title4">Static Partition Inserts</h3> |
| |
| <div class="body conbody"> |
| <p class="p"> |
| In a static partition insert where a partition key column is given a |
| constant value, such as <code class="ph codeph">PARTITION</code> |
| <code class="ph codeph">(year=2012, month=2)</code>, the rows are inserted with the |
| same values specified for those partition key columns. |
| </p> |
| |
| <p class="p"> |
| The number of columns in the <code class="ph codeph">SELECT</code> list must equal |
| the number of columns in the column permutation. |
| </p> |
| |
| <p class="p"> |
| The <code class="ph codeph">PARTITION</code> clause must be used for static |
| partitioning inserts. |
| </p> |
| |
| <p class="p"> |
| Example: |
| </p> |
| |
| <div class="p"> |
| The following statement will insert the |
| <code class="ph codeph">some_other_table.c1</code> values for the |
| <code class="ph codeph">w</code> column, and all the rows inserted will have the |
| same <code class="ph codeph">x</code> value of <code class="ph codeph">10</code>, and the same |
| <code class="ph codeph">y</code> value of |
| <code class="ph codeph">‘a’</code>.<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=10, y='a') |
| SELECT c1 FROM some_other_table;</code></pre> |
| </div> |
| |
| </div> |
| |
| </div> |
| |
| <div class="topic concept nested2" aria-labelledby="ariaid-title5" id="dynamic_partition_insert"> |
| <h3 class="title topictitle3" id="ariaid-title5">Dynamic Partition Inserts</h3> |
| |
| <div class="body conbody"> |
| <p class="p"> |
| In a dynamic partition insert where a partition key |
| column is in the <code class="ph codeph">INSERT</code> statement but not assigned a |
| value, such as in <code class="ph codeph">PARTITION (year, region)</code>(both |
| columns unassigned) or <code class="ph codeph">PARTITION(year, region='CA')</code> |
| (<code class="ph codeph">year</code> column unassigned), the unassigned columns |
| are filled in with the final columns of the <code class="ph codeph">SELECT</code> or |
| <code class="ph codeph">VALUES</code> clause. In this case, the number of columns |
| in the <code class="ph codeph">SELECT</code> list must equal the number of columns |
| in the column permutation plus the number of partition key columns not |
| assigned a constant value. |
| </p> |
| |
| <p class="p"> |
| See <a class="xref" href="impala_partitioning.html#partition_static_dynamic">Static and Dynamic Partitioning Clauses</a> for examples and performance characteristics of static and dynamic |
| partitioned inserts. |
| </p> |
| |
| <p class="p"> |
| The following rules apply to dynamic partition |
| inserts. |
| </p> |
| |
| <ul class="ul"> |
| <li class="li"> |
| <p class="p"> |
| The columns are bound in the order they appear in the |
| <code class="ph codeph">INSERT</code> statement. |
| </p> |
| |
| <p class="p"> |
| The table below shows the values inserted with the |
| <code class="ph codeph">INSERT</code> statements of different column |
| orders. |
| </p> |
| |
| </li> |
| |
| </ul> |
| |
| |
| <div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dynamic_partition_insert__table_vyx_dp3_ldb" class="table" frame="border" border="1" rules="all"><colgroup><col /><col /><col /><col /></colgroup><tbody class="tbody"> |
| <tr class="row"> |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"> </td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;">Column <code class="ph codeph">w</code> Value</td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;">Column <code class="ph codeph">x</code> Value</td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;">Column <code class="ph codeph">y</code> Value</td> |
| |
| </tr> |
| |
| <tr class="row"> |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">INSERT INTO t1 (w, x, y) VALUES (1, 2, |
| 'c');</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">1</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">2</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">‘c’</code></td> |
| |
| </tr> |
| |
| <tr class="row"> |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">INSERT INTO t1 (x,w) PARTITION (y) VALUES (1, |
| 2, 'c');</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">2</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">1</code></td> |
| |
| <td class="entry cellrowborder" style="text-align:left;vertical-align:top;"><code class="ph codeph">‘c’</code></td> |
| |
| </tr> |
| |
| </tbody> |
| </table> |
| </div> |
| |
| <ul class="ul"> |
| <li class="li"> |
| When a partition clause is specified but the non-partition |
| columns are not specified in the <code class="ph codeph">INSERT</code> statement, |
| as in the first example below, the non-partition columns are treated |
| as though they had been specified before the |
| <code class="ph codeph">PARTITION</code> clause in the SQL. |
| <p class="p"> |
| Example: These |
| three statements are equivalent, inserting <code class="ph codeph">1</code> to |
| <code class="ph codeph">w</code>, <code class="ph codeph">2</code> to <code class="ph codeph">x</code>, |
| and <code class="ph codeph">‘c’</code> to <code class="ph codeph">y</code> |
| columns. |
| </p> |
| |
| <pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, ‘c’); |
| INSERT INTO t1 (w) PARTITION (x, y) VALUES (1, 2, ‘c’); |
| INSERT INTO t1 PARTITION (x, y='c') VALUES (1, 2);</code></pre> |
| </li> |
| |
| <li class="li"> |
| The <code class="ph codeph">PARTITION</code> clause is not required for |
| dynamic partition, but all the partition columns must be explicitly |
| present in the <code class="ph codeph">INSERT</code> statement in the column list |
| or in the <code class="ph codeph">PARTITION</code> clause. The partition columns |
| cannot be defaulted to <code class="ph codeph">NULL</code>. |
| <p class="p"> |
| Example: |
| </p> |
| |
| <p class="p">The following statements are valid because the partition |
| columns, <code class="ph codeph">x</code> and <code class="ph codeph">y</code>, are present in |
| the <code class="ph codeph">INSERT</code> statements, either in the |
| <code class="ph codeph">PARTITION</code> clause or in the column |
| list. |
| </p> |
| |
| <pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, ‘c’); |
| INSERT INTO t1 (w, x) PARTITION (y) VALUES (1, 2, ‘c’);</code></pre> |
| <p class="p"> |
| The following statement is not valid for the partitioned table as |
| defined above because the partition columns, <code class="ph codeph">x</code> |
| and <code class="ph codeph">y</code>, are not present in the |
| <code class="ph codeph">INSERT</code> statement. |
| </p> |
| |
| <pre class="pre codeblock"><code>INSERT INTO t1 VALUES (1, 2, 'c');</code></pre> |
| </li> |
| |
| <li class="li"> |
| If partition columns do not exist in the source table, you can |
| specify a specific value for that column in the |
| <code class="ph codeph">PARTITION</code> clause. |
| <p class="p"> |
| Example: The <code class="ph codeph">source</code> table only contains the column |
| <code class="ph codeph">w</code> and <code class="ph codeph">y</code>. The value, |
| <code class="ph codeph">20</code>, specified in the <code class="ph codeph">PARTITION</code> |
| clause, is inserted into the <code class="ph codeph">x</code> column. |
| </p> |
| |
| <pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=20, y) SELECT * FROM source;</code></pre> |
| </li> |
| |
| </ul> |
| |
| </div> |
| |
| </div> |
| |
| </div> |
| |
| </body> |
| </html> |