This tutorial provides a quick introduction to using CarbonData.
Create a sample.csv file in the carbondata directory to use as input data for the examples:
cd carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
Apache Spark Shell provides a simple way to learn the API and is also a powerful tool for analyzing data interactively. See the Apache Spark documentation for more details on the Spark shell.
Start Spark shell by running the following command in the Spark directory:
./bin/spark-shell --jars <carbondata assembly jar path>
In this shell, SparkSession is readily available as ‘spark’ and Spark context is readily available as ‘sc’.
To create a CarbonSession, configure it explicitly as follows:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>")
NOTE: By default, the metastore location points to “../carbon.metastore”. You can provide your own metastore location to CarbonSession, for example: SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>", "<local metastore path>")
scala>carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
scala>carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
NOTE: Replace 'sample.csv file path' in the above command with the real path of sample.csv.
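Because LOAD DATA needs the real location of sample.csv, one option is to resolve the path in the shell instead of typing it by hand. The following is a minimal sketch, not part of CarbonData: the helper name loadStatement is hypothetical, and it assumes sample.csv sits in the directory the shell was started from. Whether a local path is accepted as-is depends on your file system configuration.

```scala
import java.nio.file.Paths

// Hypothetical helper (not a CarbonData API): builds the LOAD DATA statement
// shown above, resolving the given file path to an absolute path first.
def loadStatement(csvPath: String, table: String): String = {
  val absolute = Paths.get(csvPath).toAbsolutePath().normalize().toString
  s"LOAD DATA INPATH '$absolute' INTO TABLE $table"
}

// Assumption: sample.csv is in the current working directory.
val stmt = loadStatement("sample.csv", "test_table")
println(stmt)

// In the Spark shell, with the CarbonSession created earlier in scope:
// carbon.sql(stmt)
```

Printing the statement before running it makes it easy to confirm that the path points at the file you intended.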
scala>carbon.sql("SELECT * FROM test_table").show()
scala>carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
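As a rough cross-check of what the GROUP BY query should return for the three sample.csv rows, the same aggregation can be reproduced with plain Scala collections, without Spark or CarbonData. The case class Person and the value names are illustrative assumptions, and the row order in the actual show() output may differ.

```scala
// Rows from sample.csv, typed for illustration (Person is a hypothetical name).
case class Person(id: String, name: String, city: String, age: Int)

val rows = Seq(
  Person("1", "david", "shenzhen", 31),
  Person("2", "eason", "shenzhen", 27),
  Person("3", "jarry", "wuhan", 35)
)

// Equivalent of: SELECT city, avg(age), sum(age) FROM test_table GROUP BY city
val byCity: Map[String, (Double, Int)] =
  rows.groupBy(_.city).map { case (city, people) =>
    val total = people.map(_.age).sum
    (city, (total.toDouble / people.size, total))
  }

// Print one line per city, sorted by city name for a stable order.
byCity.toSeq.sortBy(_._1).foreach { case (city, (avgAge, sumAge)) =>
  println(s"$city avg=$avgAge sum=$sumAge")
}
// shenzhen avg=29.0 sum=58
// wuhan avg=35.0 sum=35
```

If the table query reports different averages or sums, the most likely cause is that the file was loaded more than once or from a different path.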
To use CarbonContext instead of CarbonSession (for Spark versions without SparkSession), start Spark shell by running the following command in the Spark directory:
./bin/spark-shell --jars <carbondata assembly jar path>
NOTE: In this shell, SparkContext is readily available as sc.
import org.apache.spark.sql.CarbonContext
val cc = new CarbonContext(sc)
NOTE: By default, the store location points to “../carbon.store”. You can provide your own store location to CarbonContext, for example: new CarbonContext(sc, storeLocation).
scala>cc.sql("CREATE TABLE IF NOT EXISTS test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
To see the created table:
scala>cc.sql("SHOW TABLES").show()
scala>cc.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
NOTE: Replace 'sample.csv file path' in the above command with the real path of sample.csv.
scala>cc.sql("SELECT * FROM test_table").show()
scala>cc.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()