Hive MetaStore Upgrade HowTo
============================
This document describes how to upgrade the schema of a SQL Server-backed
Hive MetaStore instance from one release version of Hive to another.
For example, by following the steps listed below it is possible to
upgrade a Hive 0.12.0 MetaStore schema to a Hive 0.14.0 MetaStore
schema. Before attempting an upgrade, we strongly recommend that you
read through all of the steps in this document and familiarize
yourself with the required tools.
MetaStore Upgrade Steps
=======================
1) Follow best practices for performing a database schema upgrade.
This normally includes ensuring that no one is accessing the database
(the Metastore service in particular) and backing up your database,
including a dump of its schema, using tools specific to your database
type.
2) Upgrades starting from version 0.12.0 follow the usual sequence:
upgrade from the installed version to the next released version, then
to the next, and so on until the desired version is reached, in the
order specified in upgrade.order.mssql. Upgrading from 0.11.0 to
0.13.0 works differently; see "Upgrading from 0.11.0 to 0.13.0" later
in this document.
3) The schema upgrade scripts assume that the schema you are upgrading
closely matches the official schema for your particular version of
Hive. The files in this directory with names like
"hive-schema-x.y.z.mssql.sql" contain dumps of the official schemas
corresponding to each of the released versions of Hive. You can
determine differences between your schema and the official schema
by comparing the contents of the official dump with the schema dump
you created in step 1.
Some differences are acceptable and will not interfere
with the upgrade process, but others need to be resolved manually
or the upgrade scripts will fail to complete.
* Missing Tables: Hive's default configuration causes the MetaStore
to create schema elements only when they are needed. Some tables
may be missing from your MetaStore schema if you have not created
the corresponding Hive catalog objects, e.g. the PARTITIONS table
will probably not exist if you have not created any table
partitions in your MetaStore. You MUST create these missing tables
before running the upgrade scripts. The easiest way to do this is
by executing the official schema DDL script against your
schema. Expect most of the DDL statements to fail, since the
corresponding tables, constraints, and indexes already exist.
* Reversed Column Constraint Names in the Same Table: Tables with
multiple constraints may have the names of the constraints
reversed. For example, the PARTITIONS table contains two foreign
key constraints named PARTITIONS_FK1 and PARTITIONS_FK2 which
reference SDS.SD_ID and TBLS.TBL_ID respectively. However, in your
schema you may find that PARTITIONS_FK1 references TBLS.TBL_ID and
PARTITIONS_FK2 references SDS.SD_ID. Either version is acceptable
-- the only requirement is that these constraints actually exist.
* Differences in Column/Constraint Names: Your schema may contain
tables with columns named "IDX" or unique keys named
"UNIQUE<tab_name>". If you find either of these in your schema you
will need to change the names to "INTEGER_IDX" and
"UNIQUE_<tab_name>" before running the upgrade scripts. For more
background on this issue please refer to HIVE-1435.
4) Run the upgrade-x.y.z-to-a.b.c.mssql.sql script for each version
step in the sequence described in step 2.
5) Validate the results. Use a database-specific tool to generate a set
of DDL statements from your upgraded schema and compare them to
hive-schema-a.b.c.mssql.sql.
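The version-by-version sequence from step 2 can be sketched in a few
lines of Python. This is only an illustration, not part of the Hive
tooling: the function name plan_upgrade_scripts is hypothetical, and
the sketch assumes that each line of upgrade.order.mssql has the form
"x.y.z-to-a.b.c" and that the lines are listed in release order. The
sample order lines are illustrative, not copied from the real file.

```python
from typing import List


def plan_upgrade_scripts(order_lines: List[str], installed: str, target: str) -> List[str]:
    """Return the upgrade script names to run, in order.

    Assumes (not confirmed by this document) that each order line looks
    like "x.y.z-to-a.b.c" and that lines appear in release order.
    """
    if installed == target:
        return []  # nothing to do

    scripts = []
    current = installed
    for line in order_lines:
        step = line.strip()
        if not step:
            continue  # skip blank lines
        source, sep, dest = step.partition("-to-")
        if not sep:
            raise ValueError(f"unexpected line in order file: {step!r}")
        if source != current:
            continue  # this step does not start at the version we are on
        # Script naming follows step 4: upgrade-x.y.z-to-a.b.c.mssql.sql
        scripts.append(f"upgrade-{source}-to-{dest}.mssql.sql")
        current = dest
        if current == target:
            return scripts

    raise ValueError(f"no upgrade path from {installed} to {target}")


if __name__ == "__main__":
    # Illustrative order lines for the 0.12.0 to 0.14.0 example.
    sample_order = ["0.12.0-to-0.13.0", "0.13.0-to-0.14.0"]
    for script in plan_upgrade_scripts(sample_order, "0.12.0", "0.14.0"):
        print(script)
```

The function only plans the sequence; each script it names still has
to be run against the database with your usual SQL Server client, one
after another, stopping if any of them fails.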
Upgrading from 0.11.0 to 0.13.0
===============================
1) Manually execute the hive-txn-schema-0.13.0.mssql.sql script to
create the tables needed for ACID support.
2) Make sure that your hive-site.xml contains the following property:
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
With this setting, DataNucleus creates any tables that are missing
from your database when the metastore starts.
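Before starting the metastore, it can be useful to confirm that the
property is actually set. The following Python sketch is one way to do
that; it is not part of Hive. The function name
auto_create_schema_enabled is hypothetical, and the sketch assumes the
usual hive-site.xml layout of <property> elements under a
<configuration> root. If the property appears more than once, the
sketch takes the last occurrence, which is an assumption about how
duplicates are resolved.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "datanucleus.autoCreateSchema"


def auto_create_schema_enabled(hive_site_xml: str) -> bool:
    """Return True if the XML text sets datanucleus.autoCreateSchema to true."""
    root = ET.fromstring(hive_site_xml)
    enabled = False
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == PROPERTY_NAME:
            # Assumption: a later definition overrides an earlier one.
            enabled = value.lower() == "true"
    return enabled


if __name__ == "__main__":
    sample = """
    <configuration>
      <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
      </property>
    </configuration>
    """
    print(auto_create_schema_enabled(sample))
```

To check a real file, read its contents and pass the text to the
function; the location of hive-site.xml depends on your installation.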