The goal is to provide an easily accessible environment for running and developing Hive:

- It isolates work on different branches (and more) by leveraging container isolation.
- X11 apps can still run like “normal” applications (I tend to run multiple Eclipse instances, one for every patch I'm actively working on).
- Full isolation makes it easier to customize everything toward the goal: all ports can be bound, etc.
- You may also run Hive inside...
- There is a prebaked image which contains some build tools in the image itself; that image is used at ci.hive.apache.org to run tests.
It can also run a given version of Hive as a standalone container. Let's launch Hive with:

```shell
docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work apache/hive-dev-box:bazaar
```

The above initializes the metastore and launches a NodeManager/ResourceManager and Hive as separate processes inside the container (in a screen session).
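If you launch the standalone container often, a small wrapper keeps the flags in one place. This is a minimal sketch assuming bash; the function name run_hive_bazaar and the HDB_PERSIST switch are hypothetical, and setting DOCKER=echo turns it into a dry run that only prints the arguments:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the standalone launch command above.
# DOCKER defaults to the real docker CLI; DOCKER=echo only prints the arguments.
run_hive_bazaar() {
  local docker_cmd="${DOCKER:-docker}"
  local args=(run --rm -d -p 10000:10000 -v hive-dev-box_work:/work)
  # HDB_PERSIST=1 (hypothetical switch) also mounts the data volume,
  # so the metastore/warehouse survive between container runs.
  if [[ "${HDB_PERSIST:-0}" == 1 ]]; then
    args+=(-v hive-dev-box_data:/data)
  fi
  args+=(apache/hive-dev-box:bazaar)
  "$docker_cmd" "${args[@]}"
}
```

For example, HDB_PERSIST=1 run_hive_bazaar starts the container with both the work and the data volume mounted.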
Add -v hive-dev-box_data:/data to the command to enable a persistent metastore/warehouse.

There are sometimes bug reports against earlier releases, but testing them out can be tricky, since running and switching between versions is a hassle. I was using a Vagrant-based box which was useful for doing this...
For the last couple of years I've been working on Hive (and sometimes on other projects), and since QA runs may only report back after 8-12 hours, I work on multiple patches simultaneously. However, working on several patches simultaneously has its own problems:

I went through all the approaches I was using earlier:

The aim of this project is to provide an easier way to test-drive Hive releases.
```shell
# build and launch the hive-dev-box container
./hdb run hive-test
# after building the container you will get a prompt inside it
# initialize the metastore with
reinit_metastore
# everything should be ready to launch hive
hive_launch
# exit with CTRL+A CTRL+\ to kill all processes
```
Every container reaches out for almost the same artifacts, so employing an artifact cache “makes sense” in this case :D
```shell
# start artifactory instance
./start_artifactory.bash
```
Once the instance is running, the start_artifactory command shows a few commands you need to execute to configure it.
After that, you can access Artifactory at http://127.0.0.1:8081/ and log in with admin/admin.
This instance is linked to the running development environment(s) automatically.
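Because Artifactory takes a while to come up, it can help to wait for it before starting a dev container. A minimal sketch, assuming curl is installed on the host and that Artifactory's system ping endpoint (/artifactory/api/system/ping, which answers OK when healthy) is reachable on the port shown above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Wait until Artifactory answers its system ping with "OK" (assumes curl is installed).
# Both the URL and the number of attempts can be overridden.
wait_for_artifactory() {
  local url="${1:-http://127.0.0.1:8081/artifactory/api/system/ping}"
  local tries="${2:-30}"
  local i reply
  for ((i = 1; i <= tries; i++)); do
    # -s: no progress output, -f: fail on HTTP error status codes
    if reply=$(curl -sf "$url") && [[ "$reply" == OK* ]]; then
      return 0
    fi
    # pause between attempts, but not after the last one
    if ((i < tries)); then
      sleep 1
    fi
  done
  echo "Artifactory did not answer at $url" >&2
  return 1
}
```

Running wait_for_artifactory && ./hdb run HIVE-12121-asd would only open the dev container once the cache answers.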
Add an export to your .bashrc (or similar), like:

```shell
# to have a shared folder between all the dev containers and also the host system:
export HIVE_DEV_BOX_HOST_DIR=$HOME/hdb
```
The dev environment assumes that you are working on upstream patches and always opens a new branch forked from master. If you skip this, things may not work and you will be left to do these steps yourself; if you are using the HIVE_SOURCES environment variable, you might not need to set it anyway.
```shell
# make sure to load the new env variables for bash
. ~/.bashrc
# and also create the host dir beforehand
mkdir -p "$HIVE_DEV_BOX_HOST_DIR"
```
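The two host-side steps (the export and the folder creation) can be bundled so they are safe to re-run. A minimal sketch assuming bash and ~/.bashrc; the function name setup_hdb_host_dir is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical one-time host setup for the shared hdb folder.
# Appends the export to ~/.bashrc only if it is not there yet, then creates the folder.
setup_hdb_host_dir() {
  local rc="$HOME/.bashrc"
  # single quotes keep $HOME literal in .bashrc, as in the export above
  local line='export HIVE_DEV_BOX_HOST_DIR=$HOME/hdb'
  touch "$rc"
  # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  # make the variable available in the current shell as well
  export HIVE_DEV_BOX_HOST_DIR="$HOME/hdb"
  # -p: no error if the folder already exists
  mkdir -p "$HIVE_DEV_BOX_HOST_DIR"
}
```

Other terminals that were already open still need to source ~/.bashrc to pick up the variable.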
Then start a development container:

```shell
# invoking with an argument names the container; it will also be the preferred name for the ws and the development branch
./hdb run HIVE-12121-asd
# when the terminal comes up,
# the following command clones the sources based on your srcs dsl
srcs hive
# enter the hive dir and create a local branch based on your requirements
cd hive
git branch "$(hostname)" apache/master
# if you need...patch the sources:
cdpd-patcher hive
# run a full rebuild
rebuild
# you may run eclipse
dev_eclipse
```
A shorter version exists for initializing upstream patch development:

```shell
./hdb run HIVE-12121-asd
# this clones the sources, creates a branch named after the container's hostname, runs a rebuild and opens eclipse
hive_patch_development
```
Beyond the “obvious” /bin and /lib folders, there are some which might make it clearer how this works:

- /work: the components kept here are not changed/active directly; the /work folder may contain a number of versions of the same component
- /active: ls -l /active gives a brief overview of the active components
- /home/dev
- /home/dev/hive: if HIVE_SOURCES is set at launch time, this folder will be mapped to that directory on the host
- /home/dev/host: the bin directory under this folder will be linked as /home/dev/bin, so that scripts can be shared between containers and the host

To install hdb on the host:

```shell
# create a symlink to hive-dev-box/hdb from an executable location, e.g. $HOME/bin
ln -s "$PWD/hdb" "$HOME/bin/hdb"
# enable bash_completion for hdb:
# add the following line to .bashrc
. <($HOME/bin/hdb bash_completion)
```
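The install steps above can be wrapped so that re-running them is harmless and ~/bin is created if missing. A minimal sketch assuming bash and that it is run from the hive-dev-box checkout (it uses $PWD/hdb, like the commands above); the function name install_hdb is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper bundling the hdb install steps; run it from the hive-dev-box checkout.
install_hdb() {
  local bin_dir="$HOME/bin"
  local rc="$HOME/.bashrc"
  # single quotes keep $HOME literal in .bashrc
  local completion='. <($HOME/bin/hdb bash_completion)'
  mkdir -p "$bin_dir"
  # -s: symbolic link, -f: replace an existing link so re-running is safe
  ln -sfn "$PWD/hdb" "$bin_dir/hdb"
  touch "$rc"
  # add the completion line only once
  grep -qxF "$completion" "$rc" || printf '%s\n' "$completion" >> "$rc"
}
```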
Switching between component versions:

```shell
# use hadoop 3.1.0
sw hadoop 3.1.0
# use hive 2.3.5
sw hive 2.3.5
# use tez 0.8.4
sw tez 0.8.4
```
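The /work and /active folders hint at how switching can work: several versions sit side by side and only one per component is active. The following is NOT how sw is implemented; it is a minimal illustration assuming a hypothetical layout where each version is unpacked as /work/&lt;component&gt;-&lt;version&gt; and /active/&lt;component&gt; is a symlink to the selected one. WORK_DIR, ACTIVE_DIR and sw_sketch are made-up names so the idea can be tried in a scratch directory:

```shell
#!/usr/bin/env bash
# Illustration only, NOT the real sw implementation.
# Hypothetical layout: $WORK_DIR/<component>-<version> holds each unpacked version,
# and $ACTIVE_DIR/<component> is a symlink to the selected one.
sw_sketch() {
  local component="$1" version="$2"
  local work="${WORK_DIR:-/work}" active="${ACTIVE_DIR:-/active}"
  local target="$work/$component-$version"
  if [[ ! -d "$target" ]]; then
    echo "not installed: $target" >&2
    return 1
  fi
  # -s: symbolic, -f: replace the existing link,
  # -n: do not descend into the directory the old link points to
  ln -sfn "$target" "$active/$component"
}
```

With this layout, ls -l on the active folder shows at a glance which version each component points to.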
Reinitializing the metastore with a specific backend:

```shell
reinit_metastore [derby|postgres|mysql]
```