docs/Quick-Start.md - carbondata - Git at Google

 <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
 -->

 # Getting started with Apache CarbonData

 This tutorial provides a quick introduction to using CarbonData.


 ## Install

 * Download released package of [Spark 1.5.0 or later](http://spark.apache.org/downloads.html)
 * Download and install Apache Thrift 0.9.3, make sure thrift is added to system path.
 * Download [Apache CarbonData code](https://github.com/apache/incubator-carbondata) and build it. Please visit [Building CarbonData And IDE Configuration](Installing-CarbonData-And-IDE-Configuartion.md) for more information.

 ## Interactive Data Query

 ### Prerequisite
 Create sample.csv file in carbondata directory
 ```
 $ cd carbondata
 $ cat > sample.csv << EOF
   id,name,city,age
   1,david,shenzhen,31
   2,eason,shenzhen,27
   3,jarry,wuhan,35
   EOF
 ```

 ### Carbon Spark Shell
 Carbon Spark shell is a wrapper around Apache Spark Shell, it provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit Apache Spark Documentation for more details on Spark shell.
 Start Spark shell by running the following in the Carbon directory:
 ```
 ./bin/carbon-spark-shell
 ```
 *Note*: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.

 **Create table**

 ```
 scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
 ```

 **Load data to table**
 ```
 scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
 scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
 ```

 **Query data from table**

 ```
 scala>cc.sql("select * from test_table").show
 scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show
 ```

 ### Carbon SQL CLI
 The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit Apache Spark Documentation for more information Spark SQL CLI.
 Start the Carbon Spark SQL CLI, run the following in the Carbon directory

 ```
 ./bin/carbon-spark-sql
 ```
 And you can provide your own store location by providing configuration using --conf option like:
 ```
 ./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
 ```

 **Execute Queries in CLI**
 ```
 spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'
 spark-sql> load data inpath '../sample.csv' into table test_table
 spark-sql> select city, avg(age), sum(age) from test_table group by city
 ```
	<!--
	Licensed to the Apache Software Foundation (ASF) under one
	or more contributor license agreements. See the NOTICE file
	distributed with this work for additional information
	regarding copyright ownership. The ASF licenses this file
	to you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	with the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either express or implied. See the License for the
	specific language governing permissions and limitations
	under the License.
	-->

	# Getting started with Apache CarbonData

	This tutorial provides a quick introduction to using CarbonData.


	## Install

	* Download released package of [Spark 1.5.0 or later](http://spark.apache.org/downloads.html)
	* Download and install Apache Thrift 0.9.3, make sure thrift is added to system path.
	* Download [Apache CarbonData code](https://github.com/apache/incubator-carbondata) and build it. Please visit [Building CarbonData And IDE Configuration](Installing-CarbonData-And-IDE-Configuartion.md) for more information.

	## Interactive Data Query

	### Prerequisite
	Create sample.csv file in carbondata directory
	```
	$ cd carbondata
	$ cat > sample.csv << EOF
	id,name,city,age
	1,david,shenzhen,31
	2,eason,shenzhen,27
	3,jarry,wuhan,35
	EOF
	```

	### Carbon Spark Shell
	Carbon Spark shell is a wrapper around Apache Spark Shell, it provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit Apache Spark Documentation for more details on Spark shell.
	Start Spark shell by running the following in the Carbon directory:
	```
	./bin/carbon-spark-shell
	```
	Note: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.

	Create table

	```
	scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
	```

	Load data to table
	```
	scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
	scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
	```

	Query data from table

	```
	scala>cc.sql("select * from test_table").show
	scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show
	```

	### Carbon SQL CLI
	The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit Apache Spark Documentation for more information Spark SQL CLI.
	Start the Carbon Spark SQL CLI, run the following in the Carbon directory

	```
	./bin/carbon-spark-sql
	```
	And you can provide your own store location by providing configuration using --conf option like:
	```
	./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
	```

	Execute Queries in CLI
	```
	spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'
	spark-sql> load data inpath '../sample.csv' into table test_table
	spark-sql> select city, avg(age), sum(age) from test_table group by city
	```