| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="struct"> |
| |
| <title>STRUCT Complex Type (<keyword keyref="impala23"/> or higher only)</title> |
| |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Impala Data Types"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Data Analysts"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| A complex data type, representing multiple fields of a single item. Frequently used as the element type of an <codeph>ARRAY</codeph> |
| or the <codeph>VALUE</codeph> part of a <codeph>MAP</codeph>. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/syntax_blurb"/> |
| |
| <codeblock><varname>column_name</varname> STRUCT &lt; <varname>name</varname> : <varname>type</varname> [COMMENT '<varname>comment_string</varname>'], ... &gt; |
| |
| type ::= <varname>primitive_type</varname> | <varname>complex_type</varname> |
| </codeblock> |
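| |
| <p> |
| As a minimal sketch of this syntax, the following statement declares a <codeph>STRUCT</codeph> column with two fields, one of which |
| carries the optional <codeph>COMMENT</codeph> clause. The table name <codeph>struct_syntax_demo</codeph> and its field names are |
| hypothetical, chosen only for illustration. |
| </p> |
| |
| <codeblock><![CDATA[-- Hypothetical table; all names are illustrative only. |
| CREATE TABLE struct_syntax_demo |
| ( |
| id BIGINT, |
| contact STRUCT < email: STRING COMMENT 'Primary email address', phone: STRING > |
| ) |
| STORED AS PARQUET; |
| ]]> |
| </codeblock> |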
| |
| <p> |
| The names and number of fields within the <codeph>STRUCT</codeph> are fixed. Each field can be a different type. A field within a |
| <codeph>STRUCT</codeph> can also be another <codeph>STRUCT</codeph>, or an <codeph>ARRAY</codeph> or a <codeph>MAP</codeph>, allowing |
| you to create nested data structures with a maximum nesting depth of 100. |
| </p> |
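| |
| <p> |
| To illustrate nesting, the following sketch declares a single <codeph>STRUCT</codeph> whose fields include a scalar value, an |
| <codeph>ARRAY</codeph>, a <codeph>MAP</codeph>, and another <codeph>STRUCT</codeph>. The table name |
| <codeph>nested_struct_demo</codeph> and all field names are hypothetical. |
| </p> |
| |
| <codeblock><![CDATA[-- Hypothetical table; all names are illustrative only. |
| CREATE TABLE nested_struct_demo |
| ( |
| id BIGINT, |
| profile STRUCT < |
| nickname: STRING, -- scalar field |
| tags: ARRAY < STRING >, -- ARRAY nested inside the STRUCT |
| settings: MAP < STRING, STRING >, -- MAP nested inside the STRUCT |
| home: STRUCT < city: STRING, country: STRING > -- STRUCT nested inside the STRUCT |
| > |
| ) |
| STORED AS PARQUET; |
| ]]> |
| </codeblock> |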
| |
| <p> |
| A <codeph>STRUCT</codeph> can be the top-level type for a column, or can itself be an item within an <codeph>ARRAY</codeph> or the |
| value part of the key-value pair in a <codeph>MAP</codeph>. |
| </p> |
| |
| <p> |
| When a <codeph>STRUCT</codeph> is used as an <codeph>ARRAY</codeph> element or a <codeph>MAP</codeph> value, you use a join clause to |
| bring the <codeph>ARRAY</codeph> or <codeph>MAP</codeph> elements into the result set, and then refer to |
| <codeph><varname>array_name</varname>.ITEM.<varname>field</varname></codeph> or |
| <codeph><varname>map_name</varname>.VALUE.<varname>field</varname></codeph>. In the case of a <codeph>STRUCT</codeph> directly inside |
| an <codeph>ARRAY</codeph> or <codeph>MAP</codeph>, you can omit the <codeph>.ITEM</codeph> and <codeph>.VALUE</codeph> pseudocolumns |
| and refer directly to <codeph><varname>array_name</varname>.<varname>field</varname></codeph> or |
| <codeph><varname>map_name</varname>.<varname>field</varname></codeph>. |
| </p> |
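| |
| <p> |
| The two notations described above can be sketched side by side. The table <codeph>orders_demo</codeph>, its columns, and the aliases |
| <codeph>o</codeph>, <codeph>li</codeph>, and <codeph>a</codeph> are hypothetical, introduced only to show the long and short forms of |
| the field references. |
| </p> |
| |
| <codeblock><![CDATA[-- Hypothetical table used only to illustrate the notation. |
| CREATE TABLE orders_demo |
| ( |
| id BIGINT, |
| line_items ARRAY < STRUCT < sku: STRING, qty: INT > >, |
| attributes MAP < STRING, STRUCT < label: STRING, weight: INT > > |
| ) |
| STORED AS PARQUET; |
| |
| -- Long form: name the ITEM and VALUE pseudocolumns explicitly. |
| SELECT o.id, li.item.sku, li.item.qty |
| FROM orders_demo o, o.line_items li; |
| |
| SELECT o.id, a.key, a.value.label |
| FROM orders_demo o, o.attributes a; |
| |
| -- Short form: omit the pseudocolumns and refer to the STRUCT fields directly. |
| SELECT o.id, li.sku, li.qty |
| FROM orders_demo o, o.line_items li; |
| |
| SELECT o.id, a.key, a.label |
| FROM orders_demo o, o.attributes a; |
| ]]> |
| </codeblock> |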
| |
| <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/> |
| |
| <p conref="../shared/impala_common.xml#common/complex_types_combo"/> |
| |
| <p> |
| A <codeph>STRUCT</codeph> is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined |
| type. When you combine two related tables into one, using complex types to minimize repetition, the typical representation is an |
| <codeph>ARRAY</codeph> of <codeph>STRUCT</codeph> elements. |
| </p> |
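| |
| <p> |
| As a sketch of that idea, suppose a customers table and an orders table are folded into one table, so that each customer name is |
| stored once rather than once per order. The table <codeph>customers_nested</codeph> and its fields are hypothetical. |
| </p> |
| |
| <codeblock><![CDATA[-- Hypothetical denormalized table: each customer row carries its own orders. |
| CREATE TABLE customers_nested |
| ( |
| id BIGINT, |
| name STRING, |
| orders ARRAY < STRUCT < order_id: BIGINT, order_date: STRING, total: DECIMAL(10,2) > > |
| ) |
| STORED AS PARQUET; |
| |
| -- Returns one result row per customer/order pair, much like joining two separate tables. |
| SELECT c.id, c.name, o.order_id, o.order_date, o.total |
| FROM customers_nested c, c.orders o; |
| ]]> |
| </codeblock> |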
| |
| <p> |
| Because a <codeph>STRUCT</codeph> has a fixed number of named fields, it typically does not make sense to have a |
| <codeph>STRUCT</codeph> as the type of a table column. In such a case, you could just make each field of the <codeph>STRUCT</codeph> |
| into a separate column of the table. The <codeph>STRUCT</codeph> type is most useful as an item of an <codeph>ARRAY</codeph> or the |
| value part of the key-value pair in a <codeph>MAP</codeph>. A nested type column with a <codeph>STRUCT</codeph> at the lowest level |
| lets you associate a variable number of row-like objects with each row of the table. |
| </p> |
| |
| <p> |
| The <codeph>STRUCT</codeph> type is straightforward to reference within a query. You do not need to include the |
| <codeph>STRUCT</codeph> column in a join clause or give it a table alias, as is required for the <codeph>ARRAY</codeph> and |
| <codeph>MAP</codeph> types. You refer to the individual fields using dot notation, such as |
| <codeph><varname>struct_column_name</varname>.<varname>field_name</varname></codeph>, without any pseudocolumn such as |
| <codeph>ITEM</codeph> or <codeph>VALUE</codeph>. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/complex_types_describe"/> |
| |
| <p conref="../shared/impala_common.xml#common/internals_blurb"/> |
| |
| <p> |
| Within the Parquet data file, the values for each <codeph>STRUCT</codeph> field are stored adjacent to each other, so that they can be |
| encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even |
| when the <codeph>STRUCT</codeph> values are part of an <codeph>ARRAY</codeph> or <codeph>MAP</codeph>. During a query, Impala avoids |
| unnecessary I/O by reading only the portions of the Parquet data file containing the requested <codeph>STRUCT</codeph> fields. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/added_in_230"/> |
| |
| <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> |
| |
| <ul conref="../shared/impala_common.xml#common/complex_types_restrictions"> |
| <li/> |
| </ul> |
| |
| <p conref="../shared/impala_common.xml#common/kudu_blurb"/> |
| <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/> |
| |
| <p conref="../shared/impala_common.xml#common/example_blurb"/> |
| |
| <note conref="../shared/impala_common.xml#common/complex_type_schema_pointer"/> |
| |
| <p> |
| The following example shows a table with various kinds of <codeph>STRUCT</codeph> columns, both at the top level and nested within |
| other complex types. Practice the <codeph>CREATE TABLE</codeph> and query notation for complex type columns using empty tables, until |
| you can visualize a complex data structure and construct corresponding SQL statements reliably. |
| </p> |
| |
| <codeblock><![CDATA[CREATE TABLE struct_demo |
| ( |
| id BIGINT, |
| name STRING, |
| |
| -- A STRUCT as a top-level column. Demonstrates how the table ID column |
| -- and the ID field within the STRUCT can coexist without a name conflict. |
| employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >, |
| |
| -- A STRUCT as the element type of an ARRAY. |
| places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>, |
| |
| -- A STRUCT as the value portion of the key-value pairs in a MAP. |
| memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>, |
| |
| -- A STRUCT where one of the fields is another STRUCT. |
| current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING > |
| ) |
| STORED AS PARQUET; |
| ]]> |
| </codeblock> |
| |
| <p> |
| The following example shows how to examine the structure of a table containing one or more <codeph>STRUCT</codeph> columns by using |
| the <codeph>DESCRIBE</codeph> statement. You can visualize each <codeph>STRUCT</codeph> as its own table, with columns named the same |
| as each field of the <codeph>STRUCT</codeph>. If the <codeph>STRUCT</codeph> is nested inside another complex type, such as |
| <codeph>ARRAY</codeph>, you can extend the qualified name passed to <codeph>DESCRIBE</codeph> until the output shows just the |
| <codeph>STRUCT</codeph> fields. |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo; |
| +-------------------+--------------------------+ |
| | name | type | |
| +-------------------+--------------------------+ |
| | id | bigint | |
| | name | string | |
| | employee_info | struct< | |
| | | employer:string, | |
| | | id:bigint, | |
| | | address:string | |
| | | > | |
| | places_lived | array<struct< | |
| | | street:string, | |
| | | city:string, | |
| | | country:string | |
| | | >> | |
| | memorable_moments | map<string,struct< | |
| | | year:int, | |
| | | place:string, | |
| | | details:string | |
| | | >> | |
| | current_address | struct< | |
| | | street_address:struct< | |
| | | street_number:int, | |
| | | street_name:string, | |
| | | street_type:string | |
| | | >, | |
| | | country:string, | |
| | | postal_code:string | |
| | | > | |
| +-------------------+--------------------------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| The top-level column <codeph>EMPLOYEE_INFO</codeph> is a <codeph>STRUCT</codeph>. Describing |
| <codeph><varname>table_name</varname>.<varname>struct_name</varname></codeph> displays the fields of the <codeph>STRUCT</codeph> as if |
| they were columns of a table: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.employee_info; |
| +----------+--------+ |
| | name | type | |
| +----------+--------+ |
| | employer | string | |
| | id | bigint | |
| | address | string | |
| +----------+--------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| Because <codeph>PLACES_LIVED</codeph> is a <codeph>STRUCT</codeph> inside an <codeph>ARRAY</codeph>, the initial |
| <codeph>DESCRIBE</codeph> shows the structure of the <codeph>ARRAY</codeph>: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.places_lived; |
| +------+------------------+ |
| | name | type | |
| +------+------------------+ |
| | item | struct< | |
| | | street:string, | |
| | | city:string, | |
| | | country:string | |
| | | > | |
| | pos | bigint | |
| +------+------------------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| Ask for the details of the <codeph>ITEM</codeph> field of the <codeph>ARRAY</codeph> to see just the layout of the |
| <codeph>STRUCT</codeph>: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.places_lived.item; |
| +---------+--------+ |
| | name | type | |
| +---------+--------+ |
| | street | string | |
| | city | string | |
| | country | string | |
| +---------+--------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| Likewise, <codeph>MEMORABLE_MOMENTS</codeph> has a <codeph>STRUCT</codeph> inside a <codeph>MAP</codeph>, which requires an extra |
| level of qualified name to see just the <codeph>STRUCT</codeph> part: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.memorable_moments; |
| +-------+------------------+ |
| | name | type | |
| +-------+------------------+ |
| | key | string | |
| | value | struct< | |
| | | year:int, | |
| | | place:string, | |
| | | details:string | |
| | | > | |
| +-------+------------------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| For a <codeph>MAP</codeph>, ask to see the <codeph>VALUE</codeph> field to see the corresponding <codeph>STRUCT</codeph> fields in a |
| table-like structure: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.memorable_moments.value; |
| +---------+--------+ |
| | name | type | |
| +---------+--------+ |
| | year | int | |
| | place | string | |
| | details | string | |
| +---------+--------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| For a <codeph>STRUCT</codeph> inside a <codeph>STRUCT</codeph>, we can see the fields of the outer <codeph>STRUCT</codeph>: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.current_address; |
| +----------------+-----------------------+ |
| | name | type | |
| +----------------+-----------------------+ |
| | street_address | struct< | |
| | | street_number:int, | |
| | | street_name:string, | |
| | | street_type:string | |
| | | > | |
| | country | string | |
| | postal_code | string | |
| +----------------+-----------------------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| Then we can use a further qualified name to see just the fields of the inner <codeph>STRUCT</codeph>: |
| </p> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo.current_address.street_address; |
| +---------------+--------+ |
| | name | type | |
| +---------------+--------+ |
| | street_number | int | |
| | street_name | string | |
| | street_type | string | |
| +---------------+--------+ |
| ]]> |
| </codeblock> |
| |
| <p> |
| The following examples show how to query the fields of the <codeph>STRUCT</codeph> columns in this table. The |
| <codeph>DESCRIBE</codeph> output is repeated first for reference. Fields of a top-level <codeph>STRUCT</codeph> are referenced with dot |
| notation and need no join clause. For a <codeph>STRUCT</codeph> nested inside an <codeph>ARRAY</codeph> or <codeph>MAP</codeph>, the |
| query names the complex type column in the <codeph>FROM</codeph> clause, and can refer to the fields through the full column name, a |
| table alias, or unqualified names. |
| </p> |
| |
| <!-- To do: See why the most verbose query form gives an error. --> |
| |
| <codeblock><![CDATA[DESCRIBE struct_demo; |
| +-------------------+--------------------------+---------+ |
| | name | type | comment | |
| +-------------------+--------------------------+---------+ |
| | id | bigint | | |
| | name | string | | |
| | employee_info | struct< | | |
| | | employer:string, | | |
| | | id:bigint, | | |
| | | address:string | | |
| | | > | | |
| | places_lived | array<struct< | | |
| | | street:string, | | |
| | | city:string, | | |
| | | country:string | | |
| | | >> | | |
| | memorable_moments | map<string,struct< | | |
| | | year:int, | | |
| | | place:string, | | |
| | | details:string | | |
| | | >> | | |
| | current_address | struct< | | |
| | | street_address:struct< | | |
| | | street_number:int, | | |
| | | street_name:string, | | |
| | | street_type:string | | |
| | | >, | | |
| | | country:string, | | |
| | | postal_code:string | | |
| | | > | | |
| +-------------------+--------------------------+---------+ |
| |
| SELECT id, employee_info.id FROM struct_demo; |
| |
| SELECT id, employee_info.id AS employee_id FROM struct_demo; |
| |
| SELECT id, employee_info.id AS employee_id, employee_info.employer |
| FROM struct_demo; |
| |
| SELECT id, name, street, city, country |
| FROM struct_demo, struct_demo.places_lived; |
| |
| SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country |
| FROM struct_demo, struct_demo.places_lived; |
| |
| SELECT id, name, pl.pos, pl.street, pl.city, pl.country |
| FROM struct_demo, struct_demo.places_lived AS pl; |
| |
| SELECT id, name, pos, street, city, country |
| FROM struct_demo, struct_demo.places_lived; |
| |
| SELECT id, name, memorable_moments.key, |
| memorable_moments.value.year, |
| memorable_moments.value.place, |
| memorable_moments.value.details |
| FROM struct_demo, struct_demo.memorable_moments |
| WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation'); |
| |
| SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details |
| FROM struct_demo, struct_demo.memorable_moments AS mm |
| WHERE mm.key IN ('Birthday','Anniversary','Graduation'); |
| |
| SELECT id, name, memorable_moments.key, memorable_moments.value.year, |
| memorable_moments.value.place, memorable_moments.value.details |
| FROM struct_demo, struct_demo.memorable_moments |
| WHERE key IN ('Birthday','Anniversary','Graduation'); |
| |
| SELECT id, name, key, value.year, value.place, value.details |
| FROM struct_demo, struct_demo.memorable_moments |
| WHERE key IN ('Birthday','Anniversary','Graduation'); |
| |
| SELECT id, name, key, year, place, details |
| FROM struct_demo, struct_demo.memorable_moments |
| WHERE key IN ('Birthday','Anniversary','Graduation'); |
| |
| SELECT id, name, |
| current_address.street_address.street_number, |
| current_address.street_address.street_name, |
| current_address.street_address.street_type, |
| current_address.country, |
| current_address.postal_code |
| FROM struct_demo; |
| ]]> |
| </codeblock> |
| |
| <p> |
| The following table uses a <codeph>STRUCT</codeph> that encodes several data values for each phone number associated with a person. |
| Each person can have a variable-length array of associated phone numbers, and queries can refer to the <codeph>category</codeph> field |
| to locate specific kinds of phone numbers, such as home, work, or mobile. |
| </p> |
| |
| <codeblock><![CDATA[CREATE TABLE contact_info_many_structs |
| ( |
| id BIGINT, name STRING, |
| phone_numbers ARRAY < STRUCT <category:STRING, country_code:STRING, area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING > > |
| ) STORED AS PARQUET; |
| ]]> |
| </codeblock> |
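| |
| <p> |
| A query against this table might look like the following sketch, which filters on the <codeph>category</codeph> field. The aliases |
| <codeph>c</codeph> and <codeph>p</codeph> and the category value <codeph>'work'</codeph> are assumptions made for illustration; the |
| table definition does not constrain which category strings are stored. |
| </p> |
| |
| <codeblock><![CDATA[-- Assumes work numbers are stored with category = 'work'. |
| SELECT c.id, c.name, p.country_code, p.area_code, p.full_number |
| FROM contact_info_many_structs c, c.phone_numbers p |
| WHERE p.category = 'work'; |
| ]]> |
| </codeblock> |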
| |
| <p> |
| Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose |
| things such as addresses: |
| </p> |
| |
| <codeblock><![CDATA[CREATE TABLE contact_info_detailed_address |
| ( |
| id BIGINT, name STRING, |
| address STRUCT < house_number:INT, street:STRING, street_type:STRING, apartment:STRING, city:STRING, region:STRING, country:STRING > |
| ) STORED AS PARQUET; |
| ]]> |
| </codeblock> |
| |
| <p> |
| In a big data context, splitting out data fields such as the number part of the address and the street name could let you analyze |
| each field independently. For example, you could determine which streets have the largest range of address numbers, examine the |
| statistical properties of the street names, or find which areas have a higher proportion of <q>Roads</q>, <q>Courts</q>, or |
| <q>Boulevards</q>. |
| </p> |
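| |
| <p> |
| One such analysis, finding the streets with the widest range of house numbers, could be sketched as follows. This is an illustrative |
| query rather than a tested recipe: it assumes the <codeph>contact_info_detailed_address</codeph> table holds Parquet data, and the |
| column alias <codeph>number_range</codeph> and the limit of 10 are arbitrary choices. |
| </p> |
| |
| <codeblock><![CDATA[-- Top-level STRUCT fields are referenced with dot notation; no join clause is needed. |
| SELECT address.street, address.street_type, |
| MAX(address.house_number) - MIN(address.house_number) AS number_range |
| FROM contact_info_detailed_address |
| GROUP BY address.street, address.street_type |
| ORDER BY number_range DESC |
| LIMIT 10; |
| ]]> |
| </codeblock> |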
| |
| <p conref="../shared/impala_common.xml#common/related_info"/> |
| |
| <p> |
| <xref href="impala_complex_types.xml#complex_types"/>, <xref href="impala_array.xml#array"/>, |
| <!-- <xref href="impala_struct.xml#struct"/>, --> |
| <xref href="impala_map.xml#map"/> |
| </p> |
| |
| </conbody> |
| |
| </concept> |