| Hive MetaStore Upgrade HowTo |
| ============================ |
| |
| This document describes how to upgrade the schema of a SQL Server backed |
| Hive MetaStore instance from one release version of Hive to another |
| release version of Hive. For example, by following the steps listed |
| below it is possible to upgrade a Hive 0.12.0 MetaStore schema to a |
| Hive 0.14.0 MetaStore schema. Before attempting the upgrade, we |
| strongly recommend that you read through all of the steps in this |
| document and familiarize yourself with the required tools. |
| |
| MetaStore Upgrade Steps |
| ======================= |
| |
| 1) Take care to follow best practices for performing a DB schema |
| upgrade. This normally includes ensuring that no one is accessing |
| the database (the Metastore service in particular) and backing up |
| your database, including a dump of its schema, using tools specific |
| to your database type. |
| |
| 2) Upgrades starting from version 0.12.0 follow the usual sequence: |
| upgrade from the installed version to the next released version, |
| then to the one after that, until the desired version is reached, |
| in the order specified in upgrade.order.mssql. Upgrading from |
| 0.11.0 to 0.13.0 works differently; see "Upgrading from 0.11.0 to |
| 0.13.0" later in this document. |
| |
| 3) The schema upgrade scripts assume that the schema you are upgrading |
| closely matches the official schema for your particular version of |
| Hive. The files in this directory with names like |
| "hive-schema-x.y.z.mssql.sql" contain dumps of the official schemas |
| corresponding to each of the released versions of Hive. You can |
| determine differences between your schema and the official schema |
| by comparing the contents of the official dump with the schema dump |
| you created in step 1. |
| |
| Some differences are acceptable and will not interfere |
| with the upgrade process, but others need to be resolved manually |
| or the upgrade scripts will fail to complete. |
| |
| * Missing Tables: Hive's default configuration causes the MetaStore |
| to create schema elements only when they are needed. Some tables |
| may be missing from your MetaStore schema if you have not created |
| the corresponding Hive catalog objects, e.g. the PARTITIONS table |
| will probably not exist if you have not created any table |
| partitions in your MetaStore. You MUST create these missing tables |
| before running the upgrade scripts. The easiest way to do this is |
| by executing the official schema DDL script against your |
| schema. Expect most of the DDL statements to fail because the |
| corresponding tables, constraints, and indexes already exist. |
| |
| * Reversed Column Constraint Names in the Same Table: Tables with |
| multiple constraints may have the names of the constraints |
| reversed. For example, the PARTITIONS table contains two foreign |
| key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which |
| reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in your |
| schema you may find that PARTITIONS_FK1 references TBLS.TBL_ID and |
| PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable |
| -- the only requirement is that these constraints actually exist. |
| |
| * Differences in Column/Constraint Names: Your schema may contain |
| tables with columns named "IDX" or unique keys named |
| "UNIQUE<tab_name>". If you find either of these in your schema you |
| will need to change the names to "INTEGER_IDX" and |
| "UNIQUE_<tab_name>" before running the upgrade scripts. For more |
| background on this issue please refer to HIVE-1435. |
| |
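| 4) Now run the upgrade-x.y.z-to-a.b.c.mssql.sql script, where x.y.z |
| is your installed version and a.b.c is the next released version. |
| Repeat for each step in the sequence described in step 2. |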
| |
| 5) Validate the results. Use a DB-specific tool to generate a set |
| of DDL statements from your upgraded schema and compare it to |
| hive-schema-a.b.c.mssql.sql. |
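| |
| The version-by-version sequence from step 2 can be sketched in Python. |
| This is a minimal illustration, not part of the Hive tooling: it |
| assumes that upgrade.order.mssql lists one "x.y.z-to-a.b.c" entry per |
| line in release order, and the function name upgrade_scripts is |
| hypothetical. |

```python
def upgrade_scripts(order_lines, installed, target, db="mssql"):
    """Return the upgrade script names needed to go from installed to target.

    Assumes each non-empty line looks like "0.12.0-to-0.13.0" and that the
    lines appear in release order (an assumption about upgrade.order.mssql).
    """
    steps = []
    for line in order_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, sep, dst = line.partition("-to-")
        if not sep:
            raise ValueError(f"unrecognized entry: {line!r}")
        steps.append((src, dst))

    scripts = []
    current = installed
    for src, dst in steps:
        if current == target:
            break
        if src == current:
            # Script naming follows the upgrade-x.y.z-to-a.b.c.mssql.sql pattern.
            scripts.append(f"upgrade-{src}-to-{dst}.{db}.sql")
            current = dst

    if current != target:
        raise ValueError(f"no upgrade path from {installed} to {target}")
    return scripts


if __name__ == "__main__":
    order = ["0.12.0-to-0.13.0", "0.13.0-to-0.14.0"]
    for name in upgrade_scripts(order, "0.12.0", "0.14.0"):
        print(name)
```

| A request that has no matching entry, such as 0.11.0 to 0.13.0, raises |
| ValueError instead of silently returning a partial path. |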
| |
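| The "Missing Tables" check from step 3 can be approximated by |
| comparing the CREATE TABLE names in the official schema file with |
| those in your own schema dump. The sketch below is a rough text-based |
| comparison under the assumption that both files use plain |
| "CREATE TABLE <name>" statements, optionally with a schema prefix and |
| bracket or double-quote delimiters. The helper names are hypothetical. |

```python
import re

# Matches CREATE TABLE NAME, CREATE TABLE dbo.NAME and CREATE TABLE [dbo].[NAME].
_CREATE_TABLE = re.compile(
    r'CREATE\s+TABLE\s+(?:[\["]?\w+[\]"]?\.)?[\["]?(\w+)', re.IGNORECASE
)


def table_names(ddl_text):
    """Return the upper-cased table names created by the given DDL text."""
    return {m.group(1).upper() for m in _CREATE_TABLE.finditer(ddl_text)}


def missing_tables(official_ddl, your_ddl):
    """Return tables present in the official schema but absent from yours."""
    return sorted(table_names(official_ddl) - table_names(your_ddl))


if __name__ == "__main__":
    official = "CREATE TABLE TBLS\n(\n);\nCREATE TABLE PARTITIONS\n(\n);\n"
    yours = "CREATE TABLE [dbo].[TBLS] (TBL_ID bigint NOT NULL);"
    print(missing_tables(official, yours))
```

| Any table the sketch reports still has to be created before the |
| upgrade scripts are run, as described in step 3. |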
| |
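| The comparison in step 5 can be sketched as a normalized text diff. |
| This is only an approximation: it assumes both inputs are plain DDL |
| text, ignores whitespace, letter case and "--" comment lines, and |
| does not understand statement order or constraint renames. The |
| function names are hypothetical. |

```python
import difflib


def normalize(ddl_text):
    """Collapse whitespace, drop blank and '--' comment lines, upper-case."""
    lines = []
    for raw in ddl_text.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("--"):
            lines.append(line.upper())
    return lines


def schema_diff(official_ddl, your_ddl):
    """Return unified-diff lines between the official schema and yours."""
    return list(
        difflib.unified_diff(
            normalize(official_ddl),
            normalize(your_ddl),
            fromfile="official",
            tofile="yours",
            lineterm="",
        )
    )


if __name__ == "__main__":
    official = "CREATE TABLE A\n(\n  INTEGER_IDX int\n);"
    yours = "create table A\n(\n    IDX int\n);"
    print("\n".join(schema_diff(official, yours)))
```

| An empty result means the two dumps match after normalization. Any |
| reported lines still need to be judged against the acceptable |
| differences listed in step 3. |
| |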
| Upgrading from 0.11.0 to 0.13.0 |
| =============================== |
| 1) Manually execute the hive-txn-schema-0.13.0.mssql.sql script to |
| create the tables needed for ACID support. |
| |
| 2) Make sure you have |
| <property> |
| <name>datanucleus.autoCreateSchema</name> |
| <value>true</value> |
| </property> |
| in your hive-site.xml. This causes DataNucleus to create any |
| tables that are missing from your database when the metastore starts. |
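| |
| Before starting the metastore, the property can be checked with a |
| short script. The sketch below uses Python's standard |
| xml.etree.ElementTree module and assumes the usual layout of |
| <property> elements holding <name> and <value> children. The function |
| names are hypothetical. |

```python
import xml.etree.ElementTree as ET


def get_property(hive_site_xml, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def auto_create_schema_enabled(hive_site_xml):
    """True if datanucleus.autoCreateSchema is explicitly set to true."""
    value = get_property(hive_site_xml, "datanucleus.autoCreateSchema")
    return value is not None and value.lower() == "true"


if __name__ == "__main__":
    sample = (
        "<configuration>"
        "<property>"
        "<name>datanucleus.autoCreateSchema</name>"
        "<value>true</value>"
        "</property>"
        "</configuration>"
    )
    print(auto_create_schema_enabled(sample))
```

| The check only looks at the text it is given. It does not account for |
| values supplied elsewhere, such as on the command line. |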
| |
| |