This tutorial provides a quick introduction to using CarbonData.
Create a sample.csv file in the carbondata directory:
$ cd carbondata
$ cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
The Carbon Spark shell is a wrapper around the Apache Spark shell. It provides a simple way to learn the API and a powerful tool for analyzing data interactively. Please visit the Apache Spark documentation for more details on the Spark shell. Start the Spark shell by running the following in the Carbon directory:
./bin/carbon-spark-shell
Note: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.
Create table
scala> cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
Load data to table
scala> import java.io.File
scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
Query data from table
scala> cc.sql("select * from test_table").show
scala> cc.sql("select city, avg(age), sum(age) from test_table group by city").show
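As a sanity check, the expected results of the group-by query can be worked out by hand from the sample data above. Here is a minimal sketch in Python (independent of CarbonData) that computes the same aggregation:

```python
import csv
import io

# The sample.csv content created earlier in this tutorial
data = """id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35"""

rows = list(csv.DictReader(io.StringIO(data)))

# Group ages by city, mirroring "group by city"
groups = {}
for r in rows:
    groups.setdefault(r["city"], []).append(int(r["age"]))

# Compute avg(age) and sum(age) per city
result = {city: (sum(ages) / len(ages), sum(ages)) for city, ages in groups.items()}
print(result)  # shenzhen: avg 29.0, sum 58; wuhan: avg 35.0, sum 35
```

The query against test_table should report the same per-city averages and sums, though the exact output formatting in the shell may differ.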
The Carbon Spark SQL CLI is a wrapper around the Apache Spark SQL CLI. It is a convenient tool for executing queries from the command line. Please visit the Apache Spark documentation for more information on the Spark SQL CLI. To start the Carbon Spark SQL CLI, run the following in the Carbon directory:
./bin/carbon-spark-sql
You can provide your own store location by passing a configuration with the --conf option, for example:
./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
Execute Queries in CLI
spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata';
spark-sql> load data inpath '../sample.csv' into table test_table;
spark-sql> select city, avg(age), sum(age) from test_table group by city;