| .\" Licensed to the Apache Software Foundation (ASF) under one or more |
| .\" contributor license agreements. See the NOTICE file distributed with |
| .\" this work for additional information regarding copyright ownership. |
| .\" The ASF licenses this file to You under the Apache License, Version 2.0 |
| .\" (the "License"); you may not use this file except in compliance with |
| .\" the License. You may obtain a copy of the License at |
| .\" |
| .\" http://www.apache.org/licenses/LICENSE-2.0 |
| .\" |
| .\" Unless required by applicable law or agreed to in writing, software |
| .\" distributed under the License is distributed on an "AS IS" BASIS, |
| .\" WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| .\" See the License for the specific language governing permissions and |
| .\" limitations under the License. |
| .\" |
| .\" Process this file with |
| .\" groff -man -Tascii hive.1 |
| .\" |
| .TH hive 1 "October 2010" Linux "User Manuals" |
| |
| .SH NAME |
| Hive \- Data warehouse infrastructure built atop Hadoop. |
| |
| .SH SYNOPSIS |
| |
| .B hive |
| [OPTIONS] --service \fISERVICE\fR [PARAMETERS] |
| |
| .SH DESCRIPTION |
| |
| Hive is a data warehouse system for Hadoop that facilitates easy data |
| summarization, ad-hoc querying and analysis of large datasets stored |
| in Hadoop-compatible file systems. Hive provides a mechanism to put |
| structure on this data and query the data using a SQL-like language |
| called HiveQL. At the same time this language also allows traditional |
| map/reduce programmers to plug in their custom mappers and reducers |
| when it is inconvenient or inefficient to express this logic in |
| HiveQL. |
| |
| Hadoop is a batch processing system, and Hadoop jobs tend to have high |
| latency and incur substantial overhead in job submission and |
| scheduling. Consequently, latency for Hive queries is generally high |
| (minutes), even when the data sets involved are small (say, a few |
| hundred megabytes). Hive therefore cannot be compared with systems such |
| as Oracle, where analyses run on significantly smaller amounts of data |
| but proceed much more iteratively, with response times between |
| iterations of less than a few minutes. Hive aims to provide acceptable |
| (but not optimal) latency for interactive data browsing, queries over |
| small data sets, and test queries. |
| |
| Hive is not designed for online transaction processing and does not |
| support real-time queries or row-level inserts and updates. It is best |
| used for batch jobs over large sets of immutable data (such as web |
| logs). Hive's main priorities are scalability (scaling out as machines |
| are added dynamically to the Hadoop cluster), extensibility (through |
| the MapReduce framework and UDF/UDAF/UDTF), fault tolerance, and loose |
| coupling with its input formats. |
| |
| For more information about Hive, see http://hive.apache.org. |
| |
| \fISERVICE\fR may be one of the following: |
| |
| .IP cli 12 |
| The Hive shell, the default service |
| |
| .IP hiveserver 12 |
| Start the Hive server |
| |
| .IP hwi 12 |
| Hive web interface |
| |
| .IP jar 12 |
| Run a jar that uses Hadoop and Hive APIs |
| |
| .IP lineage 12 |
| Output lineage info for a query |
| |
| .IP metastore 12 |
| Start the Hive metastore |
| |
| .PP |
| To list available parameters for a service: |
| |
| .B hive |
| --service \fISERVICE\fR --help |
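| |
| .PP |
| As an illustration of the help invocation, the sketch below prints the |
| command for each service named above instead of running it, so it works |
| without a Hive installation. The helper name hive_help_cmd is |
| hypothetical and not part of Hive. |

```shell
#!/bin/sh
# Hypothetical helper: print the command that lists a service's parameters.
# It only echoes the command line; nothing is executed.
hive_help_cmd() {
    printf 'hive --service %s --help\n' "$1"
}

# Service names taken from the list in this manual page.
for service in cli hiveserver hwi jar lineage metastore; do
    hive_help_cmd "$service"
done
```

| .PP |
| Copying one of the printed lines into a shell runs the real command. |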
| |
| .SH OPTIONS |
| |
| .IP "--auxpath" |
| Auxiliary JARs |
| |
| .IP "--config" |
| Hive configuration directory |
| |
| .IP "--service" |
| Start a specific service or component; cli is the default. |
| |
| .IP "-hiveconf <x>=<y>" |
| Sets Hive configuration property "x" equal to "y". |
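| |
| .PP |
| A minimal sketch of how these options might be combined on one command |
| line. The directory, the JAR path, the helper name hive_cli_cmd, and the |
| choice of property are all placeholders assumed for illustration, and |
| the placement of -hiveconf after the service name is likewise an |
| assumption. The script prints the command rather than running it. |

```shell
#!/bin/sh
# All paths below are placeholders, not Hive defaults.
conf_dir=/etc/hive/conf
aux_jars=/opt/hive/aux/custom-serde.jar

# Hypothetical helper: assemble a cli invocation that sets one
# configuration property ($1) to a value ($2). Prints, does not execute.
hive_cli_cmd() {
    printf 'hive --config %s --auxpath %s --service cli -hiveconf %s=%s\n' \
        "$conf_dir" "$aux_jars" "$1" "$2"
}

# Illustrative property and value only.
hive_cli_cmd mapred.reduce.tasks 4
```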
| |
| .SH ENVIRONMENT |
| |
| .IP HIVE_OPT |
| Extra Java runtime options. |
| |
| .IP HADOOP_HOME |
| Optional. The Hadoop home directory to run with. |
| |
| .IP HIVE_AUX_JARS_PATH |
| Auxiliary JARs; overridden by the --auxpath command line argument. |
| |
| .IP HIVE_CONF_DIR |
| Alternate location for Hive configuration directory. |
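| |
| .PP |
| The sketch below shows one way these variables might be set before |
| starting Hive. It also models the documented precedence between |
| HIVE_AUX_JARS_PATH and --auxpath. All paths are placeholders, and the |
| helper effective_aux_path is hypothetical: it illustrates the rule |
| stated above and is not taken from the hive launcher script. |

```shell
#!/bin/sh
# Placeholder locations; adjust to the local installation.
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/etc/hive/conf
export HIVE_AUX_JARS_PATH=/opt/hive/aux/custom-serde.jar

# Hypothetical helper modelling the documented precedence:
# a non-empty --auxpath value ($1) wins over HIVE_AUX_JARS_PATH.
effective_aux_path() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$HIVE_AUX_JARS_PATH"
    fi
}

effective_aux_path ""                  # no --auxpath: environment value is used
effective_aux_path /tmp/override.jar   # --auxpath given: it takes precedence
```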
| |
| .SH COPYRIGHT |
| Copyright (C) 2010 The Apache Software Foundation. All rights reserved. |