HAWQ supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This topic explains the options for table storage and how to choose the best storage model for your workload.
Note: To simplify the creation of database tables, you can specify the default values for some table storage options with the HAWQ server configuration parameter gp_default_storage_options
.
HAWQ provides storage orientation models of either row-oriented or Parquet tables. Evaluate performance using your own data and query workloads to determine the best alternatives.
Row-oriented storage: good for OLTP types of workloads with many iterative transactions and many columns of a single row needed all at once, so retrieving is efficient.
Note: Column-oriented storage is no longer available. Parquet storage should be used, instead.
Row-oriented storage provides the best options for the following situations:
SELECT
list or WHERE
clause of your queries, choose a row-oriented model.The ALTER TABLE
command changes the definition of a table. Use ALTER TABLE
to change table attributes such as column definitions, distribution policy, storage model, and partition structure (see also Maintaining Partitioned Tables). For example, to add a not-null constraint to a table column:
=> ALTER TABLE address ALTER COLUMN street SET NOT NULL;
ALTER TABLE
provides options to change a table's distribution policy . When the table distribution options change, the table data is redistributed on disk, which can be resource intensive. You can also redistribute table data using the existing distribution policy.
For partitioned tables, changes to the distribution policy apply recursively to the child partitions. This operation preserves the ownership and all other attributes of the table. For example, the following command redistributes the table sales across all segments using the customer_id column as the distribution key:
ALTER TABLE sales SET DISTRIBUTED BY (customer_id);
When you change the hash distribution of a table, table data is automatically redistributed. Changing the distribution policy to a random distribution does not cause the data to be redistributed. For example:
ALTER TABLE sales SET DISTRIBUTED RANDOMLY;
To redistribute table data for tables with a random distribution policy (or when the hash distribution policy has not changed) use REORGANIZE=TRUE
. Reorganizing data may be necessary to correct a data skew problem, or when segment resources are added to the system. For example, the following command redistributes table data across all segments using the current distribution policy, including random distribution.
ALTER TABLE sales SET WITH (REORGANIZE=TRUE);
TheDROP TABLE
command removes tables from the database. For example:
DROP TABLE mytable;
To empty a table of rows without removing the table definition, use DELETE
or TRUNCATE
. For example:
DELETE FROM mytable; TRUNCATE mytable;
DROP TABLE
always removes any indexes, rules, triggers, and constraints that exist for the target table. Specify CASCADE
to drop a table that is referenced by a view. CASCADE
removes dependent views.