| ********************************* |
| Session Variables |
| ********************************* |
| |
| Once a tajo client connects to the Tajo master, it assigns a unique session. This session is kept until the client is disconnected or it is expired. |
| |
| For the sake of more convenient user configuration, Tajo provides `session variables`. |
| With session variables, different configurations are allowed for each session. |
| |
| ``tsql`` provides the meta command ``\set`` to manipulate session variables. Just ``\set`` command shows all session variables. :: |
| |
| default> \set |
| 'name1'='val1' |
| 'name2'='val2' |
| 'name3'='val3' |
| ... |
| |
| ``\set key val`` will set the session variable named *key* with the value *val*. :: |
| |
| default> \set |
| 'CURRENT_DATABASE'='default' |
| |
| default> \set key1 val1 |
| |
| default> \set |
| 'CURRENT_DATABASE'='default' |
| 'key1'='val1' |
| |
| |
| Also, ``\unset key`` will unset the session variable named *key*. |
| |
| |
| Currently, tajo provides the following session variables. |
| |
| .. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD |
| |
| A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. |
| |
| * Configuration name: :ref:`tajo.dist-query.broadcast.non-cross-join.threshold-kb` |
| * Property value: Integer |
| * Unit: KB |
| * Default value: 5120 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120 |
| |
| .. describe:: BROADCAST_CROSS_JOIN_THRESHOLD |
| |
| A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. |
| |
| * Configuration name: :ref:`tajo.dist-query.broadcast.cross-join.threshold-kb` |
| * Property value: Integer |
| * Unit: KB |
| * Default value: 1024 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set BROADCAST_CROSS_JOIN_THRESHOLD 1024 |
| |
| .. warning:: |
| In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. |
| |
| .. describe:: JOIN_TASK_INPUT_SIZE |
| |
| The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. |
| As a result, it determines the degree of the parallel processing of the join query. |
| |
| * Configuration name: :ref:`tajo.dist-query.join.task-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set JOIN_TASK_INPUT_SIZE 64 |
| |
| .. describe:: JOIN_PER_SHUFFLE_SIZE |
| |
| The repartition join is executed in two stages. When a join query is executed with the repartition join, |
| this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. |
| |
| * Configuration name: :ref:`tajo.dist-query.join.partition-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 128 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set JOIN_PER_SHUFFLE_SIZE 128 |
| |
| .. describe:: HASH_JOIN_SIZE_LIMIT |
| |
| This value provides the criterion to decide the algorithm to perform a join in a task. |
| If the input data is smaller than this value, join is performed with the in-memory hash join. |
| Otherwise, the sort-merge join is used. |
| |
| * Configuration name: :ref:`tajo.executor.join.common.in-memory-hash-threshold-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set HASH_JOIN_SIZE_LIMIT 64 |
| |
| .. warning:: |
| This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, |
| its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. |
| This value should be tuned carefully. |
| |
| .. describe:: INNER_HASH_JOIN_SIZE_LIMIT |
| |
| This value provides the criterion to decide the algorithm to perform an inner join in a task. |
| If the input data is smaller than this value, the inner join is performed with the in-memory hash join. |
| Otherwise, the sort-merge join is used. |
| |
| * Configuration name: :ref:`tajo.executor.join.inner.in-memory-hash-threshold-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set INNER_HASH_JOIN_SIZE_LIMIT 64 |
| |
| .. warning:: |
| This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, |
| its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. |
| This value should be tuned carefully. |
| |
| .. describe:: OUTER_HASH_JOIN_SIZE_LIMIT |
| |
| This value provides the criterion to decide the algorithm to perform an outer join in a task. |
| If the input data is smaller than this value, the outer join is performed with the in-memory hash join. |
| Otherwise, the sort-merge join is used. |
| |
| * Configuration name: :ref:`tajo.executor.join.outer.in-memory-hash-threshold-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set OUTER_HASH_JOIN_SIZE_LIMIT 64 |
| |
| .. warning:: |
| This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, |
| its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. |
| This value should be tuned carefully. |
| |
| .. describe:: JOIN_HASH_TABLE_SIZE |
| |
| The initial size of hash table for in-memory hash join. |
| |
| * Configuration name: :ref:`tajo.executor.join.hash-table.size` |
| * Property value: Integer |
| * Default value: 100000 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set JOIN_HASH_TABLE_SIZE 100000 |
| |
| .. describe:: SORT_TASK_INPUT_SIZE |
| |
| The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. |
| As a result, it determines the degree of the parallel processing of the sort query. |
| |
| * Configuration name: :ref:`tajo.dist-query.sort.task-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set SORT_TASK_INPUT_SIZE 64 |
| |
| .. describe:: EXTSORT_BUFFER_SIZE |
| |
| A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. |
| |
| * Configuration name: :ref:`tajo.executor.external-sort.buffer-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 200 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set EXTSORT_BUFFER_SIZE 200 |
| |
| .. describe:: SORT_LIST_SIZE |
| |
| The initial size of list for in-memory sort. |
| |
| * Configuration name: :ref:`tajo.executor.sort.list.size` |
| * Property value: Integer |
| * Default value: 100000 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set SORT_LIST_SIZE 100000 |
| |
| .. describe:: GROUPBY_MULTI_LEVEL_ENABLED |
| |
| A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. |
| Otherwise, 2-phase aggregation algorithm is used. |
| |
| * Configuration name: :ref:`tajo.dist-query.groupby.multi-level-aggr` |
| * Property value: Boolean |
| * Default value: true |
| * Example |
| |
| .. code-block:: sh |
| |
| \set GROUPBY_MULTI_LEVEL_ENABLED true |
| |
| .. describe:: GROUPBY_PER_SHUFFLE_SIZE |
| |
| The aggregation is executed in two stages. When an aggregation query is executed, |
| this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. |
| |
| * Configuration name: :ref:`tajo.dist-query.groupby.partition-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 256 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set GROUPBY_PER_SHUFFLE_SIZE 256 |
| |
| .. describe:: GROUPBY_TASK_INPUT_SIZE |
| |
| The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. |
| As a result, it determines the degree of the parallel processing of the aggregation query. |
| |
| * Configuration name: :ref:`tajo.dist-query.groupby.task-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set GROUPBY_TASK_INPUT_SIZE 64 |
| |
| .. describe:: HASH_GROUPBY_SIZE_LIMIT |
| |
| This value provides the criterion to decide the algorithm to perform an aggregation in a task. |
| If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. |
| Otherwise, the sort-based aggregation is used. |
| |
| * Configuration name: :ref:`tajo.executor.groupby.in-memory-hash-threshold-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 64 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set HASH_GROUPBY_SIZE_LIMIT 64 |
| |
| .. warning:: |
| This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, |
| its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. |
| This value should be tuned carefully. |
| |
| .. describe:: AGG_HASH_TABLE_SIZE |
| |
| The initial size of hash table for in-memory aggregation. |
| |
| * Configuration name: :ref:`tajo.executor.aggregate.hash-table.size` |
| * Property value: Integer |
| * Default value: 10000 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set AGG_HASH_TABLE_SIZE 10000 |
| |
| .. describe:: TIMEZONE |
| |
| Refer to :doc:`/time_zone`. |
| |
| * Configuration name: :ref:`tajo.timezone` |
| * Property value: Time zone id |
| * Default value: Default time zone of JVM |
| * Example |
| |
| .. code-block:: sh |
| |
| \set TIMEZONE GMT+9 |
| |
| .. describe:: DATE_ORDER |
| |
| Date order specification. |
| |
| * Configuration name: :ref:`tajo.datetime.date-order` |
| * Property value: One of YMD, DMY, MDY. |
| * Default value: YMD |
| * Example |
| |
| .. code-block:: sh |
| |
| \set DATE_ORDER YMD |
| |
| .. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED |
| |
| If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. |
| |
| * Configuration name: :ref:`tajo.partition.overwrite.even-if-no-result` |
| * Property value: Boolean |
| * Default value: false |
| * Example |
| |
| .. code-block:: sh |
| |
| \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false |
| |
| .. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE |
| |
| In Tajo, storing a partition table is executed in two stages. |
| This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. |
| |
| * Configuration name: :ref:`tajo.dist-query.table-partition.task-volume-mb` |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 256 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256 |
| |
| .. describe:: ARITHABORT |
| |
| A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. |
| |
| * Configuration name: :ref:`tajo.behavior.arithmetic-abort` |
| * Property value: Boolean |
| * Default value: false |
| * Example |
| |
| .. code-block:: sh |
| |
| \set ARITHABORT false |
| |
| .. describe:: MAX_OUTPUT_FILE_SIZE |
| |
| Maximum per-output file size. 0 means infinite. |
| |
| * Property value: Integer |
| * Unit: MB |
| * Default value: 0 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set MAX_OUTPUT_FILE_SIZE 0 |
| |
| .. describe:: SESSION_EXPIRY_TIME |
| |
| Session expiry time. |
| |
| * Property value: Integer |
| * Unit: seconds |
| * Default value: 3600 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set SESSION_EXPIRY_TIME 3600 |
| |
| .. describe:: CLI_COLUMNS |
| |
| Sets the width for the wrapped format. |
| |
| * Property value: Integer |
| * Default value: 120 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_COLUMNS 120 |
| |
| .. describe:: CLI_NULL_CHAR |
| |
| Sets the string to be printed in place of a null value. |
| |
| * Property value: String |
| * Default value: '' |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_NULL_CHAR '' |
| |
| .. describe:: CLI_PAGE_ROWS |
| |
| Sets the number of rows for paging. |
| |
| * Property value: Integer |
| * Default value: 100 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_PAGE_ROWS 100 |
| |
| .. describe:: CLI_PAGING_ENABLED |
| |
| Enable paging of result display. |
| |
| * Property value: Boolean |
| * Default value: true |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_PAGING_ENABLED true |
| |
| .. describe:: CLI_DISPLAY_ERROR_TRACE |
| |
| Enable display of error trace. |
| |
| * Property value: Boolean |
| * Default value: true |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_DISPLAY_ERROR_TRACE true |
| |
| .. describe:: CLI_FORMATTER_CLASS |
| |
| Sets the output format class to display results. |
| |
| * Property value: Class name |
| * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter |
| * Example |
| |
| .. code-block:: sh |
| |
| \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter |
| |
| .. describe:: ON_ERROR_STOP |
| |
| tsql will exit if an error occurs. |
| |
| * Property value: Boolean |
| * Default value: false |
| * Example |
| |
| .. code-block:: sh |
| |
| \set ON_ERROR_STOP false |
| |
| .. describe:: NULL_CHAR |
| |
| Null char of text file output. This value is used when the table property `text.null` is not specified. |
| |
| * Property value: String |
| * Default value: '\\N' |
| * Example |
| |
| .. code-block:: sh |
| |
| \set NULL_CHAR '\\N' |
| |
| .. describe:: DEBUG_ENABLED |
| |
| A flag to enable debug mode. |
| |
| * Property value: Boolean |
| * Default value: false |
| * Example |
| |
| .. code-block:: sh |
| |
| \set DEBUG_ENABLED false |
| |
| .. describe:: FETCH_ROWNUM |
| |
| The number of rows to be fetched from Master each time. |
| |
| * Property value: Integer |
| * Default value: 200 |
| * Example |
| |
| .. code-block:: sh |
| |
| \set FETCH_ROWNUM 200 |
| |
| |
| |