| ******* |
| Queries |
| ******* |
| |
| ======== |
| Overview |
| ======== |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...] |
| [FROM <table reference> [[AS] <table alias name>] [, ...]] |
| [WHERE <condition>] |
| [GROUP BY <expression> [, ...]] |
| [HAVING <condition>] |
| [ORDER BY <expression> [ASC|DESC] [NULLS (FIRST|LAST)] [, ...]] |
| |
| |
| |
| =========== |
| From Clause |
| =========== |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| [FROM <table reference> [[AS] <table alias name>] [, ...]] |
| |
| |
| The ``FROM`` clause specifies one or more other tables given in a comma-separated table reference list. |
| A table reference can be a relation name , or a subquery, a table join, or complex combinations of them. |
| |
| ----------------------- |
| Table and Table Aliases |
| ----------------------- |
| |
| A temporary name can be given to tables and complex table references to be used |
| for references to the derived table in the rest of the query. This is called a table alias. |
| |
| To create a a table alias, please use ``AS``: |
| |
| .. code-block:: sql |
| |
| FROM table_reference AS alias |
| |
| or |
| |
| .. code-block:: sql |
| |
| FROM table_reference alias |
| |
| The ``AS`` keyword can be omitted, and *Alias* can be any identifier. |
| |
| A typical application of table aliases is to give short names to long table references. For example: |
| |
| .. code-block:: sql |
| |
| SELECT * FROM long_table_name_1234 s JOIN another_long_table_name_5678 a ON s.id = a.num; |
| |
| ------------- |
| Joined Tables |
| ------------- |
| |
| Tajo supports all kinds of join types. |
| |
| Join Types |
| ~~~~~~~~~~ |
| |
| Cross Join |
| ^^^^^^^^^^ |
| |
| .. code-block:: sql |
| |
| FROM T1 CROSS JOIN T2 |
| |
| Cross join, also called *Cartesian product*, results in every possible combination of rows from T1 and T2. |
| |
| ``FROM T1 CROSS JOIN T2`` is equivalent to ``FROM T1, T2``. |
| |
| Qualified joins |
| ^^^^^^^^^^^^^^^ |
| |
| Qualified joins implicitly or explicitly have join conditions. Inner/Outer/Natural Joins all are qualified joins. |
| Except for natural join, ``ON`` or ``USING`` clause in each join is used to specify a join condition. |
| A join condition must include at least one boolean expression, and it can also include just filter conditions. |
| |
| **Inner Join** |
| |
| .. code-block:: sql |
| |
| T1 [INNER] JOIN T2 ON boolean_expression |
| T1 [INNER] JOIN T2 USING (join column list) |
| |
| ``INNER`` keyword is the default, and so ``INNER`` can be omitted when you use inner join. |
| |
| **Outer Join** |
| |
| .. code-block:: sql |
| |
| T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 ON boolean_expression |
| T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 USING (join column list) |
| |
| One of ``LEFT``, ``RIGHT``, or ``FULL`` must be specified for outer joins. |
| Join conditions in outer join will have different behavior according to corresponding table references of join conditions. |
| To know outer join behavior in more detail, please refer to |
| `Advanced outer join constructs <http://www.ibm.com/developerworks/data/library/techarticle/purcell/0201purcell.html>`_. |
| |
| **Natural Join** |
| |
| .. code-block:: sql |
| |
| T1 NATURAL JOIN T2 |
| |
| ``NATURAL`` is a short form of ``USING``. It forms a ``USING`` list consisting of all common column names that appear in |
| both join tables. These common columns appear only once in the output table. If there are no common columns, |
| ``NATURAL`` behaves like ``CROSS JOIN``. |
| |
| **Subqueries** |
| |
| A subquery is a query that is nested inside another query. It can be embedded in the FROM and WHERE clauses. |
| |
| Example: |
| |
| .. code-block:: sql |
| |
| FROM (SELECT col1, sum(col2) FROM table1 WHERE col3 > 0 group by col1 order by col1) AS alias_name |
| WHERE col1 IN (SELECT col1 FROM table1 WHERE col2 > 0 AND col2 < 100) AS alias_name |
| |
| For more detailed information, please refer to :doc:`joins`. |
| |
| ============ |
| Where Clause |
| ============ |
| |
| The syntax of the WHERE Clause is |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| WHERE search_condition |
| |
| ``search_condition`` can be any boolean expression. |
| In order to know additional predicates, please refer to :doc:`/sql_language/predicates`. |
| |
| ========================== |
| Groupby and Having Clauses |
| ========================== |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| SELECT select_list |
| FROM ... |
| [WHERE ...] |
| GROUP BY grouping_column_reference [, grouping_column_reference]... |
| [HAVING boolean_expression] |
| |
| The rows which passes ``WHERE`` filter may be subject to grouping, specified by ``GROUP BY`` clause. |
| Grouping combines a set of rows having common values into one group, and then computes rows in the group with aggregation functions. ``HAVING`` clause can be used with only ``GROUP BY`` clause. It eliminates the unqualified result rows of grouping. |
| |
| ``grouping_column_reference`` can be a column reference, a complex expression including scalar functions and arithmetic operations. |
| |
| .. code-block:: sql |
| |
| SELECT l_orderkey, SUM(l_quantity) AS quantity FROM lineitem GROUP BY l_orderkey; |
| |
| SELECT substr(l_shipdate,1,4) as year, SUM(l_orderkey) AS total2 FROM lineitem GROUP BY substr(l_shipdate,1,4); |
| |
| If a SQL statement includes ``GROUP BY`` clause, expressions in select list must be either grouping_column_reference or aggregation function. For example, the following example query is not allowed because ``l_orderkey`` does not occur in ``GROUP BY`` clause. |
| |
| .. code-block:: sql |
| |
| SELECT l_orderkey, l_partkey, SUM(l_orderkey) AS total FROM lineitem GROUP BY l_partkey; |
| |
| Aggregation functions can be used with ``DISTINCT`` keywords. It forces an individual aggregate function to take only distinct values of the argument expression. ``DISTINCT`` keyword is used as follows: |
| |
| .. code-block:: sql |
| |
| SELECT l_partkey, COUNT(distinct l_quantity), SUM(distinct l_extendedprice) AS total FROM lineitem GROUP BY l_partkey; |
| |
| ========================= |
| Orderby and Limit Clauses |
| ========================= |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| FROM ... ORDER BY <sort_expr> [(ASC|DESC)] [NULLS (FIRST|LAST) [,...] |
| |
| ``sort_expr`` can be a column reference, aliased column reference, or a complex expression. |
| ``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates a descending order of ``sort_expr`` values. |
| ``ASC`` is the default order. |
| |
| ``NULLS FIRST`` and ``NULLS LAST`` options can be used to determine whether nulls values appear |
| before or after non-null values in the sort ordering. By default, null values are dealt as if larger than any non-null value; |
| that is, ``NULLS FIRST`` is the default for ``DESC`` order, and ``NULLS LAST`` otherwise. |
| |
| ================ |
| Window Functions |
| ================ |
| |
| A window function performs a calculation across multiple table rows that belong to some window frame. |
| |
| *Synopsis* |
| |
| .. code-block:: sql |
| |
| SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), ...., FROM |
| |
| The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of |
| the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into |
| the same partition as the current row. |
| |
| We will briefly explain some examples using window functions. |
| |
| --------- |
| Examples |
| --------- |
| |
| Multiple window functions can be used in a SQL statement as follows: |
| |
| .. code-block:: sql |
| |
| SELECT l_orderkey, sum(l_discount) OVER (PARTITION BY l_orderkey), sum(l_quantity) OVER (PARTITION BY l_orderkey) FROM LINEITEM; |
| |
| If ``OVER()`` clause is empty as following, it makes all table rows into one window frame. |
| |
| .. code-block:: sql |
| |
| SELECT salary, sum(salary) OVER () FROM empsalary; |
| |
| Also, ``ORDER BY`` clause can be used without ``PARTITION BY`` clause as follows: |
| |
| .. code-block:: sql |
| |
| SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; |
| |
| Also, all expressions and aggregation functions are allowed in ``ORDER BY`` clause as follows: |
| |
| .. code-block:: sql |
| |
| select |
| l_orderkey, |
| count(*) as cnt, |
| row_number() over (partition by l_orderkey order by count(*) desc) |
| row_num |
| from |
| lineitem |
| group by |
| l_orderkey |
| |
| .. note:: |
| |
| Currently, Tajo does not support multiple different partition-expressions in one SQL statement. |