CI Process | Status |
---|---|
Travis CI Build | |
Apache Release Audit Tool (RAT) | |
Coverity Static Analysis | |
Website | Wiki | Documentation | Developer Mailing List | User Mailing List | Q&A Collections | Open Defect
Apache HAWQ is a Hadoop-native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear scalability. It provides users the tools to confidently and successfully interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. More specifically, HAWQ has the following features:
Install Homebrew; see the Homebrew website for instructions.
brew install hadoop
${HADOOP_HOME}/etc/hadoop/slaves
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/slaves
localhost
${HADOOP_HOME}/etc/hadoop/core-site.xml
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/hdfs-site.xml
Attention: Replace the ${HADOOP_DATA_DIRECTORY} and ${USER_NAME} variables with your own specific values.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${HADOOP_DATA_DIRECTORY}/name</value>
    <description>Specify your dfs namenode dir path</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${HADOOP_DATA_DIRECTORY}/data</value>
    <description>Specify your dfs datanode dir path</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
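
The placeholder replacement can also be scripted. The following is a minimal sketch, assuming the XML above has been saved as a template file; the function name `render_hdfs_site` and the template file are hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print a copy of a template
# with every literal ${HADOOP_DATA_DIRECTORY} replaced by a real path.
# $1 = template file
# $2 = data directory (must not contain '|' or '&', which sed treats specially)
render_hdfs_site() {
    sed "s|\${HADOOP_DATA_DIRECTORY}|$2|g" "$1"
}

# Example (paths are placeholders):
# render_hdfs_site hdfs-site.xml.template /path/to/hadoop-data \
#     > "${HADOOP_HOME}/etc/hadoop/hdfs-site.xml"
```

The function writes to standard output so that the result can be inspected before it overwrites the real configuration file.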
touch ~/.bashrc
touch ~/.bash_profile
echo "if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi" >> ~/.bash_profile
echo "export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.1/libexec" >> ~/.bashrc
# Single quotes keep $PATH and ${HADOOP_HOME} unexpanded until ~/.bashrc is sourced
echo 'export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin' >> ~/.bashrc
source ~/.bashrc
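
The commands above append to `~/.bashrc` every time they are run, so repeating the setup leaves duplicate lines. A minimal sketch of an idempotent variant follows; the helper name `append_once` is hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present.
# $1 = file, $2 = line
append_once() {
    touch "$1"
    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example; single quotes keep the variables unexpanded in the file:
# append_once ~/.bashrc 'export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.1/libexec'
# append_once ~/.bashrc 'export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin'
```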
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Now you can ssh localhost without a passphrase. If you get a "port 22: Connection refused" error, turn on Remote Login under System Preferences->Sharing on your Mac.
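
To confirm that the key-based login works without any prompt, a check along the following lines may help. It is only a sketch; the function name `check_passwordless_ssh` is hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
# Hypothetical helper: return 0 if key-based login to the host works
# without any prompt. BatchMode=yes makes ssh fail instead of asking
# for a password or passphrase. Connect once interactively beforehand
# so that the host key is already known.
# $1 = host (defaults to localhost)
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-localhost}" true
}

# Example:
# check_passwordless_ssh localhost && echo "password-less ssh works"
```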
hdfs namenode -format
# Start HDFS (use stop-dfs.sh to stop it)
start-dfs.sh
# Do some basic tests to make sure HDFS works
hdfs dfsadmin -report
hadoop fs -ls /
If something goes wrong, check the logs and the FAQ in the wiki first.
Turn off Rootless System Integrity Protection on macOS versions newer than OS X El Capitan 10.11 if you encounter tricky LIBRARY_PATH problems, e.g. HAWQ-513, which prevent the hawq binary from finding its shared library dependencies. Steps below:
sysctl.conf
For Mac OS X 10.10 / 10.11, add the following content to /etc/sysctl.conf and then run sudo sysctl -p to activate it.
For Mac OS X 10.12+, add the following content to /etc/sysctl.conf and then run cat /etc/sysctl.conf | xargs sudo sysctl to apply and check it.
kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536
kern.corefile=/cores/core.%N.%P
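
To see whether the running kernel actually uses these values, each entry can be compared against the output of `sysctl -n`. The following is a minimal sketch; the function name `check_sysctl` is hypothetical, and it assumes one `key=value` pair per line as shown above.

```shell
# Hypothetical helper: compare each key=value line of a sysctl.conf-style
# file against the running kernel and report the keys whose value differs.
# Returns 0 if everything matches, 1 otherwise.
# $1 = conf file (defaults to /etc/sysctl.conf)
check_sysctl() {
    status=0
    while IFS='=' read -r key want; do
        # Skip blank lines and comment lines
        case "$key" in ''|'#'*) continue ;; esac
        have=$(sysctl -n "$key" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$key: want $want, have ${have:-<unset>}"
            status=1
        fi
    done < "${1:-/etc/sysctl.conf}"
    return $status
}

# Example:
# check_sysctl /etc/sysctl.conf
```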
mkdir ~/dev
git clone git@github.com:apache/hawq ~/dev/hawq
sudo mkdir -p /opt
sudo chmod a+w /opt
sudo install -o $USER -d /usr/local/hawq
Set up the toolchain and third-party dependencies
2.4.1 Add the HAWQ environment information to ~/.bashrc, then run source ~/.bashrc to make it take effect.
ulimit -c 10000000000
export CC=clang
export CXX=clang++
export DEPENDENCY_PATH=/opt/dependency/package
source /opt/dependency-Darwin/package/env.sh
2.4.2 Build HAWQ
cd ~/dev/hawq
git checkout master
ln -sf ../../commit-msg .git/hooks/commit-msg
./configure
make -j8
make -j8 install
mkdir /tmp/magma_master
mkdir /tmp/magma_segment
Feel free to use the default /usr/local/hawq/etc/hawq-site.xml. Make sure that hawq_dfs_url in that file corresponds to fs.defaultFS in ${HADOOP_HOME}/etc/hadoop/core-site.xml.
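
The correspondence between the two settings can be checked with a small script. The sketch below rests on two assumptions that are not stated above: that `hawq_dfs_url` has the form `host:port/path` without the `hdfs://` scheme, and that `<value>` directly follows `<name>` in both files. The function names `get_property` and `check_dfs_url` are hypothetical.

```shell
# Hypothetical helper: print the <value> of a named property from a
# Hadoop-style XML file. Assumes <value> directly follows <name>.
# $1 = XML file, $2 = property name
get_property() {
    tr -d '\n' < "$1" |
        sed -n "s|.*<name>$2</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Hypothetical helper: return 0 if hawq_dfs_url starts with the host:port
# part of fs.defaultFS.
# Assumption: hawq_dfs_url looks like host:port/path (no hdfs:// scheme).
# $1 = core-site.xml, $2 = hawq-site.xml
check_dfs_url() {
    default_fs=$(get_property "$1" fs.defaultFS)
    dfs_url=$(get_property "$2" hawq_dfs_url)
    authority=${default_fs#hdfs://}
    authority=${authority%/}
    if [ -z "$authority" ] || [ -z "$dfs_url" ]; then
        echo "could not read fs.defaultFS or hawq_dfs_url" >&2
        return 2
    fi
    case "$dfs_url" in
        "$authority"/*) return 0 ;;
        *) echo "mismatch: fs.defaultFS=$default_fs hawq_dfs_url=$dfs_url" >&2
           return 1 ;;
    esac
}

# Example:
# check_dfs_url "${HADOOP_HOME}/etc/hadoop/core-site.xml" /usr/local/hawq/etc/hawq-site.xml
```

A real XML parser would be more robust than `sed`; the text-based approach is used here only to keep the check free of extra dependencies.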
# Before initializing HAWQ, you need to install HDFS and make sure it works.
source /usr/local/hawq/greenplum_path.sh
# You also need to set up password-less ssh on the systems.
# If you are only installing HAWQ for development on localhost, skip this step.
# Exchange SSH keys between the hosts host1, host2, and host3:
#hawq ssh-exkeys -h host1 -h host2 -h host3
# Initialize the HAWQ cluster; this also starts HAWQ by default
hawq init cluster -a
# Now you can stop, restart or start the cluster using:
# hawq stop cluster
# hawq restart cluster
# hawq start cluster
# The init command invokes the start command automatically too.
# HAWQ master and segments are completely decoupled,
# so you can also init, start or stop the master and segments separately.
# For example, to init: hawq init master, then hawq init segment
# to stop: hawq stop master, then hawq stop segment
# to start: hawq start master, then hawq start segment
Every time you init HAWQ you need to delete some files. The directories containing the files to delete are configured in /usr/local/hawq/etc/hawq-site.xml.
- Name: hawq_dfs_url
  Description: URL for accessing HDFS
- Name: hawq_master_directory
  Description: The directory of hawq master
- Name: hawq_segment_directory
  Description: The directory of hawq segment
- Name: hawq_magma_locations_master
  Description: HAWQ magma service locations on master
- Name: hawq_magma_locations_segment
  Description: HAWQ magma service locations on segment

i.e.
hdfs dfs -rm -r /hawq*
rm -rf /Users/xxx/data/hawq/master/*
rm -rf /Users/xxx/data/hawq/segment/*
rm -rf /Users/xxx/data/hawq/tmp/magma_master/*
rm -rf /Users/xxx/data/hawq/tmp/magma_segment/*

Check whether any postgres or magma processes are running on your computer. If they are, you must kill them before you init HAWQ. For example,
ps -ef | grep postgres | grep -v grep | awk '{print $2}'| xargs kill -9
ps -ef | grep magma | grep -v grep | awk '{print $2}'| xargs kill -9
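
Because the cleanup relies on `rm -rf` with wildcards, it can be worth wrapping the local part in a guarded helper. The following is a minimal sketch; the function name `clean_dirs` is hypothetical, and the example paths are the placeholders used above.

```shell
# Hypothetical helper: empty each directory given as an argument while
# keeping the directory itself. Arguments that are not existing
# directories are skipped, and ${dir:?} aborts if dir is ever empty,
# so the command can never expand to "rm -rf /*".
clean_dirs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        rm -rf "${dir:?}"/*
    done
}

# Example:
# clean_dirs /Users/xxx/data/hawq/master /Users/xxx/data/hawq/segment \
#     /Users/xxx/data/hawq/tmp/magma_master /Users/xxx/data/hawq/tmp/magma_segment
```

Like the original commands, this leaves hidden files (names starting with a dot) in place.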
Almost the same as that on macOS, feel free to have a try.
Please see HAWQ wiki page: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
cd hawq
make feature-test
To make the output consistent, create a new database and use a specific locale.
TEST_DB_NAME="hawq_feature_test_db"
psql -d postgres -c "create database $TEST_DB_NAME;"
export PGDATABASE=$TEST_DB_NAME
psql -c "alter database $TEST_DB_NAME set lc_messages to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_monetary to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_numeric to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_time to 'C';"
psql -c "alter database $TEST_DB_NAME set timezone_abbreviations to 'Default';"
psql -c "alter database $TEST_DB_NAME set timezone to 'PST8PDT';"
psql -c "alter database $TEST_DB_NAME set datestyle to 'postgres,MDY';"
To run the normal feature tests, use the filter below:
hawq/src/test/feature/feature-test --gtest_filter=-TestHawqRegister.*:TestTPCH.TestStress:TestHdfsFault.*:TestZookeeperFault.*:TestHawqFault.*
cd hawq/src/test/feature/
mkdir -p testresult
python ./gtest-parallel --workers=4 --output_dir=./testresult --print_test_times ./feature-test --gtest_filter=-TestHawqRegister.*:TestTPCH.*:TestHdfsFault.*:TestZookeeperFault.*:TestHawqFault.*:TestQuitQuery.*:TestErrorTable.*:TestExternalTableGpfdist.*:TestExternalTableOptionMultibytesDelimiter.TestGpfdist:TETAuth.*
- TestHawqRegister is not included
- TestTPCH.TestStress is for TPCH stress test
- TestHdfsFault: HDFS fault tests
- TestZookeeperFault: ZooKeeper fault tests
- TestHawqFault: HAWQ fault tolerance tests
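
In a Google Test filter, the patterns after the leading `-` are excluded and are separated by `:`. Long exclusion lists like the ones above are easier to maintain when built from separate arguments. A minimal sketch follows; the function name `exclude_filter` is hypothetical.

```shell
# Hypothetical helper: join the given test patterns into a negative
# gtest filter. The subshell keeps the IFS change local; "$*" joins
# the arguments with the first character of IFS.
exclude_filter() (
    IFS=:
    printf '%s\n' "--gtest_filter=-$*"
)

# Example; quote the patterns so the shell does not expand the '*':
# hawq/src/test/feature/feature-test "$(exclude_filter 'TestHawqRegister.*' \
#     'TestTPCH.TestStress' 'TestHdfsFault.*' 'TestZookeeperFault.*' 'TestHawqFault.*')"
```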
This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.