| Hive MetaStore Upgrade HowTo |
| ============================ |
| |
| This document describes how to upgrade the schema of a MySQL backed |
| Hive MetaStore instance from one release version of Hive to another |
| release version of Hive. For example, by following the steps listed |
| below it is possible to upgrade a Hive 0.5.0 MetaStore schema to a |
| Hive 0.7.0 MetaStore schema. Before attempting this project we |
| strongly recommend that you read through all of the steps in this |
| document and familiarize yourself with the required tools. |
| |
| MetaStore Upgrade Steps |
| ======================= |
| |
| 1) Shutdown your MetaStore instance and restrict access to the |
| MetaStore's MySQL database. It is very important that no one else |
| accesses or modifies the contents of database while you are |
| performing the schema upgrade. |
| |
| 2) Create a backup of your MySQL metastore database. This will allow |
| you to revert any changes made during the upgrade process if |
| something goes wrong. The mysqldump utility is the easiest way to |
| create a backup of a MySQL database: |
| |
| % mysqldump --opt <metastore_db_name> > metastore_backup.sql |
| |
| Note that you may need also need to specify a hostname and username |
| using the --host and --user command line switches. |
| |
| 3) Dump your metastore database schema to a file. We use the mysqldump |
| utility again, but this time with a command line option that |
| specifies we are only interested in dumping the DDL statements |
| required to create the schema: |
| |
| % mysqldump --skip-add-drop-table --no-data <metastore_db_name> > my-schema-x.y.z.mysql.sql |
| |
| 4) The schema upgrade scripts assume that the schema you are upgrading |
| closely matches the official schema for your particular version of |
| Hive. The files in this directory with names like |
| "hive-schema-x.y.z.mysql.sql" contain dumps of the official schemas |
| corresponding to each of the released versions of Hive. You can |
| determine differences between your schema and the official schema |
| by diffing the contents of the official dump with the schema dump |
| you created in the previous step. Some differences are acceptable |
| and will not interfere with the upgrade process, but others need to |
| be resolved manually or the upgrade scripts will fail to complete. |
| |
| * Missing Tables: Hive's default configuration causes the MetaStore |
| to create schema elements only when they are needed. Some tables |
| may be missing from your MetaStore schema if you have not created |
| the corresponding Hive catalog objects, e.g. the PARTITIONS table |
| will probably not exist if you have not created any table |
| partitions in your MetaStore. You MUST create these missing tables |
| before running the upgrade scripts. The easiest way to do this is |
| by executing the official schema DDL script against your |
| schema. Each of the CREATE TABLE statements in the schema script |
| include an IF NOT EXISTS clause, so tables which already exist in |
| your schema will be ignored, and those which don't exist will get |
| created. |
| |
| * Extra Tables: Your schema may include a table named NUCLEUS_TABLES |
| or a table named SEQUENCE_TABLE. These tables are managed |
| by the DataNucleus ORM layer and will be created automatically if |
| they don't exist. No action on your part is required. |
| |
| * Reversed Column Constraint Names in the Same Table: Tables with |
| multiple constraints may have the names of the constraints |
| reversed. For example, the PARTITIONS table contains two foreign |
| key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which |
| reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in your |
| schema you may find that PARTITIONS_FK1 references TBLS.TBL_ID and |
| PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable |
| -- the only requirement is that these constraints actually exist. |
| |
| * Differences in Column/Constraint Names: Your schema may contain |
| tables with columns named "IDX" or unique keys named |
| "UNIQUE<tab_name>". If you find either of these in your schema you |
| will need to change the names to "INTEGER_IDX" and |
| "UNIQUE_<tab_name>" before running the upgrade scripts. For more |
| background on this issue please refer to HIVE-1435. |
| |
| 5) You are now ready to run the schema upgrade scripts. If you are |
| upgrading from Hive 0.5.0 to Hive 0.6.0 you need to run the |
| upgrade-0.5.0-to-0.6.0.mysql.sql script, but if you are upgrading |
| from 0.5.0 to 0.7.0 you will need to run the 0.5.0 to 0.6.0 upgrade |
| script followed by the 0.6.0 to 0.7.0 upgrade script. |
| |
| % mysql --verbose |
| mysql> use <metastore_db_name>; |
| Database changed |
| mysql> source upgrade-0.5.0-to-0.6.0.mysql.sql |
| mysql> source upgrade-0.6.0-to-0.7.0.mysql.sql |
| |
| These scripts should run to completion without any errors. If you |
| do encounter errors you need to analyze the cause and attempt to |
| trace it back to one of the preceding steps. |
| |
| 6) The final step of the upgrade process is validating your freshly |
| upgraded schema against the official schema for your particular |
| version of Hive. This is accomplished by repeating steps (3) and |
| (4), but this time comparing against the official version of the |
| upgraded schema, e.g. if you upgraded the schema to Hive 0.7.0 then |
| you will want to compare your schema dump against the contents of |
| hive-schema-0.7.0.mysql.sql |