This tutorial provides a quick introduction to using the current integration/hive module.
Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData.
```
cd carbondata
cat > sample.csv << EOF
id,name,scale,country,salary
1,yuhai,1.77,china,33000.1
2,runlin,1.70,china,33000.2
EOF
```
Copy the data to HDFS:
```
$HADOOP_HOME/bin/hadoop fs -put sample.csv <hdfs store path>/sample.csv
```
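Optionally, verify the upload before moving on (this check is not part of the original steps; `<hdfs store path>` is the same placeholder as above):

```
$HADOOP_HOME/bin/hadoop fs -cat <hdfs store path>/sample.csv
```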
Add the following property to hive-site.xml (in the Hive conf folder) so that the CarbonData metastore listener is registered:

```xml
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.carbondata.hive.CarbonHiveMetastoreListener</value>
</property>
```
Start the Spark shell with the CarbonData jars on the classpath:

```
./bin/spark-shell --jars <carbondata assembly jar path, carbon hive jar path>
```
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Locations for the Carbon store, Spark SQL warehouse, and metastore DB
val rootPath = "hdfs:///user/hadoop/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metaStoreDB = s"$rootPath/metastore_db"

val carbon = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.warehouse.dir", warehouse)
  .config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation)
  .getOrCreateCarbonSession(storeLocation, metaStoreDB)

// Create the table, load sample.csv into it, and verify the load
carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata")
carbon.sql("LOAD DATA INPATH '<hdfs store path>/sample.csv' INTO TABLE hive_carbon")
carbon.sql("SELECT * FROM hive_carbon").show()
```
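As an optional sanity check (not part of the original steps), you can count the loaded rows from the same session; with the two-row sample.csv this should print 2:

```scala
carbon.sql("SELECT count(*) FROM hive_carbon").show()
```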
Stage the required jars in an auxiliary directory and point Hive at it:

```
mkdir hive/auxlibs/
cp carbondata/assembly/target/scala-2.11/carbondata_2.11*.jar hive/auxlibs/
cp carbondata/integration/hive/target/carbondata-hive-*.jar hive/auxlibs/
cp $SPARK_HOME/jars/spark-catalyst*.jar hive/auxlibs/
cp $SPARK_HOME/jars/scala*.jar hive/auxlibs/
export HIVE_AUX_JARS_PATH=hive/auxlibs/
```
To fix Snappy native-library loading (the .jnilib name below is macOS-specific), copy snappy-java-xxx.jar from `<SPARK_HOME>/jars/` to `/Library/Java/Extensions` and export:

```
export HADOOP_OPTS="-Dorg.xerial.snappy.lib.path=/Library/Java/Extensions -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib -Dorg.xerial.snappy.tempdir=/Users/apple/DEMO/tmp"
```
Copy the Carbon jars to the following paths (a sketch of the copy commands follows this list):
- hive/lib/ (for the Hive server)
- yarn/lib/ (for MapReduce)
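A minimal sketch of that copy step, assuming the jars were staged in hive/auxlibs/ as above and that `$HIVE_HOME` and `$HADOOP_HOME` point at your installations (exact lib locations vary by distribution):

```
cp hive/auxlibs/carbondata*.jar $HIVE_HOME/lib/
cp hive/auxlibs/carbondata*.jar $HADOOP_HOME/share/hadoop/yarn/lib/
```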
Start the Hive client:

```
$HIVE_HOME/bin/beeline
```
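Inside beeline, connect to HiveServer2; the URL below assumes a local HiveServer2 on the default port 10000, so adjust host, port, and credentials for your setup:

```
!connect jdbc:hive2://localhost:10000
```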
Create a CarbonData table through Hive and populate it from an existing table (parquetTable here):

```sql
create table hive_carbon(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select * from parquetTable;
```
Note: Only non-transactional tables are supported when created through Hive. This means the standard CarbonData folder structure is not followed and all files are written in a flat folder structure.
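For illustration (the warehouse path below is hypothetical; actual paths depend on your configuration), listing such a table's directory shows the .carbondata and .carbonindex files directly in the table folder rather than under segment subdirectories:

```
$HADOOP_HOME/bin/hadoop fs -ls /user/hive/warehouse/hive_carbon
```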
```
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
```

These properties help Hive recursively traverse the directories to read the carbon folder structure.
- Query the table:

```sql
select * from hive_carbon;
select count(*) from hive_carbon;
select * from hive_carbon order by id;
```
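If hive_carbon holds only the two rows loaded from sample.csv, the count query should return 2, and the other two queries should return the yuhai and runlin rows.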