blob: 9a1acdc6302ba4b098f8675b18bc1f2b5c7ac019 [file] [log] [blame]
////
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
////
[[external_apis]]
= Apache HBase External APIs
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:
This chapter will cover access to Apache HBase either through non-Java languages and
through custom protocols. For information on using the native HBase APIs, refer to
link:http://hbase.apache.org/apidocs/index.html[User API Reference] and the
<<hbase_apis,HBase APIs>> chapter.
== REST
Representational State Transfer (REST) was introduced in 2000 in the doctoral
dissertation of Roy Fielding, one of the principal authors of the HTTP specification.
REST itself is out of the scope of this documentation, but in general, REST allows
client-server interactions via an API that is tied to the URL itself. This section
discusses how to configure and run the REST server included with HBase, which exposes
HBase tables, rows, cells, and metadata as URL specified resources.
There is also a nice series of blogs on
link:http://blog.cloudera.com/blog/2013/03/how-to-use-the-apache-hbase-rest-interface-part-1/[How-to: Use the Apache HBase REST Interface]
by Jesse Anderson.
=== Starting and Stopping the REST Server
The included REST server can run as a daemon which starts an embedded Jetty
servlet container and deploys the servlet into it. Use one of the following commands
to start the REST server in the foreground or background. The port is optional, and
defaults to 8080.
[source, bash]
----
# Foreground
$ bin/hbase rest start -p <port>
# Background, logging to a file in $HBASE_LOGS_DIR
$ bin/hbase-daemon.sh start rest -p <port>
----
To stop the REST server, use Ctrl-C if you were running it in the foreground, or the
following command if you were running it in the background.
[source, bash]
----
$ bin/hbase-daemon.sh stop rest
----
=== Configuring the REST Server and Client
For information about configuring the REST server and client for SSL, as well as `doAs`
impersonation for the REST server, see <<security.gateway.thrift>> and other portions
of the <<security>> chapter.
=== Using REST Endpoints
The following examples use the placeholder server pass:[http://example.com:8000], and
the following commands can all be run using `curl` or `wget` commands. You can request
plain text (the default), XML , or JSON output by adding no header for plain text,
or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.
NOTE: Unless specified, use `GET` requests for queries, `PUT` or `POST` requests for
creation or mutation, and `DELETE` for deletion.
==== Cluster Information
.HBase Version
----
http://example.com:8000/version/cluster
----
.Cluster Status
----
http://example.com:8000/status/cluster
----
.Table List
----
http://example.com:8000/
----
==== Table Information
.Table Schema (GET)
To retrieve the table schema, use a `GET` request with the `/schema` endpoint:
----
http://example.com:8000/<table>/schema
----
.Table Creation
To create a table, use a `PUT` request with the `/schema` endpoint:
----
http://example.com:8000/<table>/schema
----
.Table Schema Update
To update a table, use a `POST` request with the `/schema` endpoint:
----
http://example.com:8000/<table>/schema
----
.Table Deletion
To delete a table, use a `DELETE` request with the `/schema` endpoint:
----
http://example.com:8000/<table>/schema
----
.Table Regions
----
http://example.com:8000/<table>/regions
----
==== Gets
.GET a Single Cell Value
To get a single cell value, use a URL scheme like the following:
----
http://example.com:8000/<table>/<row>/<column>:<qualifier>/<timestamp>/content:raw
----
The column qualifier and timestamp are optional. Without them, the whole row will
be returned, or the newest version will be returned.
.Multiple Single Values (Multi-Get)
To get multiple single values, specify multiple column:qualifier tuples and/or a start-timestamp
and end-timestamp. You can also limit the number of versions.
----
http://example.com:8000/<table>/<row>/<column>:<qualifier>?v=<num-versions>
----
.Globbing Rows
To scan a series of rows, you can use a `*` glob
character on the <row> value to glob together multiple rows.
----
http://example.com:8000/urls/https|ad.doubleclick.net|*
----
==== Puts
For Puts, `PUT` and `POST` are equivalent.
.Put a Single Value
The column qualifier and the timestamp are optional.
----
http://example.com:8000/put/<table>/<row>/<column>:<qualifier>/<timestamp>
http://example.com:8000/test/testrow/test:testcolumn
----
.Put Multiple Values
To put multiple values, use a false row key. Row, column, and timestamp values in
the supplied cells override the specifications on the path, allowing you to post
multiple values to a table in batch. The HTTP response code indicates the status of
the put. Set the `Content-Type` to `text/xml` for XML encoding or to `application/x-protobuf`
for protobufs encoding. Supply the commit data in the `PUT` or `POST` body, using
the <<xml_schema>> and <<protobufs_schema>> as guidelines.
==== Scans
`PUT` and `POST` are equivalent for scans.
.Scanner Creation
To create a scanner, use the `/scanner` endpoint. The HTTP response code indicates
success (201) or failure (anything else), and on successful scanner creation, the
URI is returned which should be used to address the scanner.
----
http://example.com:8000/<table>/scanner
----
.Scanner Get Next
To get the next batch of cells found by the scanner, use the `/scanner/<scanner-id>'
endpoint, using the URI returned by the scanner creation endpoint. If the scanner
is exhausted, HTTP status `204` is returned.
----
http://example.com:8000/<table>/scanner/<scanner-id>
----
.Scanner Deletion
To delete resources associated with a scanner, send a HTTP `DELETE` request to the
`/scanner/<scanner-id>` endpoint.
----
http://example.com:8000/<table>/scanner/<scanner-id>
----
[[xml_schema]]
=== REST XML Schema
[source,xml]
----
<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="RESTSchema">
<element name="Version" type="tns:Version"></element>
<complexType name="Version">
<attribute name="REST" type="string"></attribute>
<attribute name="JVM" type="string"></attribute>
<attribute name="OS" type="string"></attribute>
<attribute name="Server" type="string"></attribute>
<attribute name="Jersey" type="string"></attribute>
</complexType>
<element name="TableList" type="tns:TableList"></element>
<complexType name="TableList">
<sequence>
<element name="table" type="tns:Table" maxOccurs="unbounded" minOccurs="1"></element>
</sequence>
</complexType>
<complexType name="Table">
<sequence>
<element name="name" type="string"></element>
</sequence>
</complexType>
<element name="TableInfo" type="tns:TableInfo"></element>
<complexType name="TableInfo">
<sequence>
<element name="region" type="tns:TableRegion" maxOccurs="unbounded" minOccurs="1"></element>
</sequence>
<attribute name="name" type="string"></attribute>
</complexType>
<complexType name="TableRegion">
<attribute name="name" type="string"></attribute>
<attribute name="id" type="int"></attribute>
<attribute name="startKey" type="base64Binary"></attribute>
<attribute name="endKey" type="base64Binary"></attribute>
<attribute name="location" type="string"></attribute>
</complexType>
<element name="TableSchema" type="tns:TableSchema"></element>
<complexType name="TableSchema">
<sequence>
<element name="column" type="tns:ColumnSchema" maxOccurs="unbounded" minOccurs="1"></element>
</sequence>
<attribute name="name" type="string"></attribute>
<anyAttribute></anyAttribute>
</complexType>
<complexType name="ColumnSchema">
<attribute name="name" type="string"></attribute>
<anyAttribute></anyAttribute>
</complexType>
<element name="CellSet" type="tns:CellSet"></element>
<complexType name="CellSet">
<sequence>
<element name="row" type="tns:Row" maxOccurs="unbounded" minOccurs="1"></element>
</sequence>
</complexType>
<element name="Row" type="tns:Row"></element>
<complexType name="Row">
<sequence>
<element name="key" type="base64Binary"></element>
<element name="cell" type="tns:Cell" maxOccurs="unbounded" minOccurs="1"></element>
</sequence>
</complexType>
<element name="Cell" type="tns:Cell"></element>
<complexType name="Cell">
<sequence>
<element name="value" maxOccurs="1" minOccurs="1">
<simpleType><restriction base="base64Binary">
</simpleType>
</element>
</sequence>
<attribute name="column" type="base64Binary" />
<attribute name="timestamp" type="int" />
</complexType>
<element name="Scanner" type="tns:Scanner"></element>
<complexType name="Scanner">
<sequence>
<element name="column" type="base64Binary" minOccurs="0" maxOccurs="unbounded"></element>
</sequence>
<sequence>
<element name="filter" type="string" minOccurs="0" maxOccurs="1"></element>
</sequence>
<attribute name="startRow" type="base64Binary"></attribute>
<attribute name="endRow" type="base64Binary"></attribute>
<attribute name="batch" type="int"></attribute>
<attribute name="startTime" type="int"></attribute>
<attribute name="endTime" type="int"></attribute>
</complexType>
<element name="StorageClusterVersion" type="tns:StorageClusterVersion" />
<complexType name="StorageClusterVersion">
<attribute name="version" type="string"></attribute>
</complexType>
<element name="StorageClusterStatus"
type="tns:StorageClusterStatus">
</element>
<complexType name="StorageClusterStatus">
<sequence>
<element name="liveNode" type="tns:Node"
maxOccurs="unbounded" minOccurs="0">
</element>
<element name="deadNode" type="string" maxOccurs="unbounded"
minOccurs="0">
</element>
</sequence>
<attribute name="regions" type="int"></attribute>
<attribute name="requests" type="int"></attribute>
<attribute name="averageLoad" type="float"></attribute>
</complexType>
<complexType name="Node">
<sequence>
<element name="region" type="tns:Region"
maxOccurs="unbounded" minOccurs="0">
</element>
</sequence>
<attribute name="name" type="string"></attribute>
<attribute name="startCode" type="int"></attribute>
<attribute name="requests" type="int"></attribute>
<attribute name="heapSizeMB" type="int"></attribute>
<attribute name="maxHeapSizeMB" type="int"></attribute>
</complexType>
<complexType name="Region">
<attribute name="name" type="base64Binary"></attribute>
<attribute name="stores" type="int"></attribute>
<attribute name="storefiles" type="int"></attribute>
<attribute name="storefileSizeMB" type="int"></attribute>
<attribute name="memstoreSizeMB" type="int"></attribute>
<attribute name="storefileIndexSizeMB" type="int"></attribute>
</complexType>
</schema>
----
[[protobufs_schema]]
=== REST Protobufs Schema
[source,json]
----
message Version {
optional string restVersion = 1;
optional string jvmVersion = 2;
optional string osVersion = 3;
optional string serverVersion = 4;
optional string jerseyVersion = 5;
}
message StorageClusterStatus {
message Region {
required bytes name = 1;
optional int32 stores = 2;
optional int32 storefiles = 3;
optional int32 storefileSizeMB = 4;
optional int32 memstoreSizeMB = 5;
optional int32 storefileIndexSizeMB = 6;
}
message Node {
required string name = 1; // name:port
optional int64 startCode = 2;
optional int32 requests = 3;
optional int32 heapSizeMB = 4;
optional int32 maxHeapSizeMB = 5;
repeated Region regions = 6;
}
// node status
repeated Node liveNodes = 1;
repeated string deadNodes = 2;
// summary statistics
optional int32 regions = 3;
optional int32 requests = 4;
optional double averageLoad = 5;
}
message TableList {
repeated string name = 1;
}
message TableInfo {
required string name = 1;
message Region {
required string name = 1;
optional bytes startKey = 2;
optional bytes endKey = 3;
optional int64 id = 4;
optional string location = 5;
}
repeated Region regions = 2;
}
message TableSchema {
optional string name = 1;
message Attribute {
required string name = 1;
required string value = 2;
}
repeated Attribute attrs = 2;
repeated ColumnSchema columns = 3;
// optional helpful encodings of commonly used attributes
optional bool inMemory = 4;
optional bool readOnly = 5;
}
message ColumnSchema {
optional string name = 1;
message Attribute {
required string name = 1;
required string value = 2;
}
repeated Attribute attrs = 2;
// optional helpful encodings of commonly used attributes
optional int32 ttl = 3;
optional int32 maxVersions = 4;
optional string compression = 5;
}
message Cell {
optional bytes row = 1; // unused if Cell is in a CellSet
optional bytes column = 2;
optional int64 timestamp = 3;
optional bytes data = 4;
}
message CellSet {
message Row {
required bytes key = 1;
repeated Cell values = 2;
}
repeated Row rows = 1;
}
message Scanner {
optional bytes startRow = 1;
optional bytes endRow = 2;
repeated bytes columns = 3;
optional int32 batch = 4;
optional int64 startTime = 5;
optional int64 endTime = 6;
}
----
== Thrift
Documentation about Thrift has moved to <<thrift>>.
[[c]]
== C/C++ Apache HBase Client
FB's Chip Turner wrote a pure C/C++ client.
link:https://github.com/facebook/native-cpp-hbase-client[Check it out].
[[jdo]]
== Using Java Data Objects (JDO) with HBase
link:https://db.apache.org/jdo/[Java Data Objects (JDO)] is a standard way to
access persistent data in databases, using plain old Java objects (POJO) to
represent persistent data.
.Dependencies
This code example has the following dependencies:
. HBase 0.90.x or newer
. commons-beanutils.jar (http://commons.apache.org/)
. commons-pool-1.5.5.jar (http://commons.apache.org/)
. transactional-tableindexed for HBase 0.90 (https://github.com/hbase-trx/hbase-transactional-tableindexed)
.Download `hbase-jdo`
Download the code from http://code.google.com/p/hbase-jdo/.
.JDO Example
====
This example uses JDO to create a table and an index, insert a row into a table, get
a row, get a column value, perform a query, and do some additional HBase operations.
[source, java]
----
package com.apache.hadoop.hbase.client.jdo.examples;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Hashtable;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;
import com.apache.hadoop.hbase.client.jdo.AbstractHBaseDBO;
import com.apache.hadoop.hbase.client.jdo.HBaseBigFile;
import com.apache.hadoop.hbase.client.jdo.HBaseDBOImpl;
import com.apache.hadoop.hbase.client.jdo.query.DeleteQuery;
import com.apache.hadoop.hbase.client.jdo.query.HBaseOrder;
import com.apache.hadoop.hbase.client.jdo.query.HBaseParam;
import com.apache.hadoop.hbase.client.jdo.query.InsertQuery;
import com.apache.hadoop.hbase.client.jdo.query.QSearch;
import com.apache.hadoop.hbase.client.jdo.query.SelectQuery;
import com.apache.hadoop.hbase.client.jdo.query.UpdateQuery;
/**
* Hbase JDO Example.
*
* dependency library.
* - commons-beanutils.jar
* - commons-pool-1.5.5.jar
* - hbase0.90.0-transactionl.jar
*
* you can expand Delete,Select,Update,Insert Query classes.
*
*/
public class HBaseExample {
public static void main(String[] args) throws Exception {
AbstractHBaseDBO dbo = new HBaseDBOImpl();
//*drop if table is already exist.*
if(dbo.isTableExist("user")){
dbo.deleteTable("user");
}
//*create table*
dbo.createTableIfNotExist("user",HBaseOrder.DESC,"account");
//dbo.createTableIfNotExist("user",HBaseOrder.ASC,"account");
//create index.
String[] cols={"id","name"};
dbo.addIndexExistingTable("user","account",cols);
//insert
InsertQuery insert = dbo.createInsertQuery("user");
UserBean bean = new UserBean();
bean.setFamily("account");
bean.setAge(20);
bean.setEmail("ncanis@gmail.com");
bean.setId("ncanis");
bean.setName("ncanis");
bean.setPassword("1111");
insert.insert(bean);
//select 1 row
SelectQuery select = dbo.createSelectQuery("user");
UserBean resultBean = (UserBean)select.select(bean.getRow(),UserBean.class);
// select column value.
String value = (String)select.selectColumn(bean.getRow(),"account","id",String.class);
// search with option (QSearch has EQUAL, NOT_EQUAL, LIKE)
// select id,password,name,email from account where id='ncanis' limit startRow,20
HBaseParam param = new HBaseParam();
param.setPage(bean.getRow(),20);
param.addColumn("id","password","name","email");
param.addSearchOption("id","ncanis",QSearch.EQUAL);
select.search("account", param, UserBean.class);
// search column value is existing.
boolean isExist = select.existColumnValue("account","id","ncanis".getBytes());
// update password.
UpdateQuery update = dbo.createUpdateQuery("user");
Hashtable<String, byte[]> colsTable = new Hashtable<String, byte[]>();
colsTable.put("password","2222".getBytes());
update.update(bean.getRow(),"account",colsTable);
//delete
DeleteQuery delete = dbo.createDeleteQuery("user");
delete.deleteRow(resultBean.getRow());
////////////////////////////////////
// etc
// HTable pool with apache commons pool
// borrow and release. HBasePoolManager(maxActive, minIdle etc..)
IndexedTable table = dbo.getPool().borrow("user");
dbo.getPool().release(table);
// upload bigFile by hadoop directly.
HBaseBigFile bigFile = new HBaseBigFile();
File file = new File("doc/movie.avi");
FileInputStream fis = new FileInputStream(file);
Path rootPath = new Path("/files/");
String filename = "movie.avi";
bigFile.uploadFile(rootPath,filename,fis,true);
// receive file stream from hadoop.
Path p = new Path(rootPath,filename);
InputStream is = bigFile.path2Stream(p,4096);
}
}
----
====
[[scala]]
== Scala
=== Setting the Classpath
To use Scala with HBase, your CLASSPATH must include HBase's classpath as well as
the Scala JARs required by your code. First, use the following command on a server
running the HBase RegionServer process, to get HBase's classpath.
[source, bash]
----
$ ps aux |grep regionserver| awk -F 'java.library.path=' {'print $2'} | awk {'print $1'}
/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64
----
Set the `$CLASSPATH` environment variable to include the path you found in the previous
step, plus the path of `scala-library.jar` and each additional Scala-related JAR needed for
your project.
[source, bash]
----
$ export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64:/path/to/scala-library.jar
----
=== Scala SBT File
Your `build.sbt` file needs the following `resolvers` and `libraryDependencies` to work
with HBase.
----
resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"
resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-core" % "0.20.2",
"org.apache.hbase" % "hbase" % "0.90.4"
)
----
=== Example Scala Code
This example lists HBase tables, creates a new table, and adds a row to it.
[source, scala]
----
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection,ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes
val conf = new HBaseConfiguration()
val connection = ConnectionFactory.createConnection(conf);
val admin = connection.getAdmin();
// list the tables
val listtables=admin.listTables()
listtables.foreach(println)
// let's insert some data in 'mytable' and get the row
val table = new HTable(conf, "mytable")
val theput= new Put(Bytes.toBytes("rowkey1"))
theput.add(Bytes.toBytes("ids"),Bytes.toBytes("id1"),Bytes.toBytes("one"))
table.put(theput)
val theget= new Get(Bytes.toBytes("rowkey1"))
val result=table.get(theget)
val value=result.value()
println(Bytes.toString(value))
----
[[jython]]
== Jython
=== Setting the Classpath
To use Jython with HBase, your CLASSPATH must include HBase's classpath as well as
the Jython JARs required by your code. First, use the following command on a server
running the HBase RegionServer process, to get HBase's classpath.
[source, bash]
----
$ ps aux |grep regionserver| awk -F 'java.library.path=' {'print $2'} | awk {'print $1'}
/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64
----
Set the `$CLASSPATH` environment variable to include the path you found in the previous
step, plus the path to `jython.jar` and each additional Jython-related JAR needed for
your project.
[source, bash]
----
$ export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/lib/native:/usr/lib/hbase/lib/native/Linux-amd64-64:/path/to/jython.jar
----
Start a Jython shell with HBase and Hadoop JARs in the classpath:
$ bin/hbase org.python.util.jython
=== Jython Code Examples
.Table Creation, Population, Get, and Delete with Jython
====
The following Jython code example creates a table, populates it with data, fetches
the data, and deletes the table.
[source,jython]
----
import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor, HConstants, TableName
from org.apache.hadoop.hbase.client import HBaseAdmin, HTable, Get
from org.apache.hadoop.hbase.io import Cell, RowResult
# First get a conf object. This will read in the configuration
# that is out in your hbase-*.xml files such as location of the
# hbase master node.
conf = HBaseConfiguration()
# Create a table named 'test' that has two column families,
# one named 'content, and the other 'anchor'. The colons
# are required for column family names.
tablename = TableName.valueOf("test")
desc = HTableDescriptor(tablename)
desc.addFamily(HColumnDescriptor("content:"))
desc.addFamily(HColumnDescriptor("anchor:"))
admin = HBaseAdmin(conf)
# Drop and recreate if it exists
if admin.tableExists(tablename):
admin.disableTable(tablename)
admin.deleteTable(tablename)
admin.createTable(desc)
tables = admin.listTables()
table = HTable(conf, tablename)
# Add content to 'column:' on a row named 'row_x'
row = 'row_x'
update = Get(row)
update.put('content:', 'some content')
table.commit(update)
# Now fetch the content just added, returns a byte[]
data_row = table.get(row, "content:")
data = java.lang.String(data_row.value, "UTF8")
print "The fetched row contains the value '%s'" % data
# Delete the table.
admin.disableTable(desc.getName())
admin.deleteTable(desc.getName())
----
====
.Table Scan Using Jython
====
This example scans a table and returns the results that match a given family qualifier.
[source, jython]
----
# Print all rows that are members of a particular column family
# by passing a regex for family qualifier
import java.lang
from org.apache.hadoop.hbase import HBaseConfiguration
from org.apache.hadoop.hbase.client import HTable
conf = HBaseConfiguration()
table = HTable(conf, "wiki")
col = "title:.*$"
scanner = table.getScanner([col], "")
while 1:
result = scanner.next()
if not result:
break
print java.lang.String(result.row), java.lang.String(result.get('title:').value)
----
====