<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Getting started with Apache CarbonData
This tutorial provides a quick introduction to using CarbonData.
## Install
* Download a released package of [Spark 1.5.0 or later](http://spark.apache.org/downloads.html).
* Download and install Apache Thrift 0.9.3, and make sure `thrift` is on the system path.
* Download the [Apache CarbonData code](https://github.com/apache/incubator-carbondata) and build it. See [Building CarbonData And IDE Configuration](Installing-CarbonData-And-IDE-Configuartion.md) for more information.
## Interactive Data Query
### Prerequisite
Create a sample.csv file in the carbondata directory:
```
$ cd carbondata
$ cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
```
### Carbon Spark Shell
Carbon Spark Shell is a wrapper around the Apache Spark Shell. It provides a simple way to learn the API, as well as a powerful tool for analyzing data interactively. See the Apache Spark documentation for more details on the Spark shell.
Start the shell by running the following in the Carbon directory:
```
./bin/carbon-spark-shell
```
*Note*: In this shell, SparkContext is readily available as `sc` and CarbonContext is available as `cc`.
**Create table**
```
scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
```
**Load data to table**
```
scala>import java.io.File
scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
```
**Query data from table**
```
scala>cc.sql("select * from test_table").show
scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show
```
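As a sanity check, the expected result of the aggregation query above can be worked out independently of CarbonData. The following is a minimal sketch using awk, assuming the same three rows as sample.csv; the temporary file and the variable names `csv` and `result` are only for illustration.

```shell
# Recreate the tutorial's sample data in a temporary file (same rows as sample.csv).
csv="$(mktemp)"
cat > "$csv" << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

# Skip the header row, then accumulate the age sum and row count per city.
# Output columns: city, avg(age), sum(age); sorted by city for stable output.
result="$(awk -F, 'NR > 1 { sum[$3] += $4; n[$3]++ }
  END { for (c in sum) printf "%s,%.1f,%d\n", c, sum[c] / n[c], sum[c] }' "$csv" | sort)"

echo "$result"
rm -f "$csv"
```

This prints `shenzhen,29.0,58` and `wuhan,35.0,35`. The `avg(age)` and `sum(age)` columns returned by the query should carry the same values for each city, although the shell may format them differently.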
### Carbon SQL CLI
The Carbon Spark SQL CLI is a wrapper around the Apache Spark SQL CLI. It is a convenient tool for executing queries entered on the command line. See the Apache Spark documentation for more information on the Spark SQL CLI.
To start the Carbon Spark SQL CLI, run the following in the Carbon directory:
```
./bin/carbon-spark-sql
```
You can also specify your own store location by passing it with the `--conf` option, for example:
```
./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
```
**Execute Queries in CLI**
```
spark-sql> create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata';
spark-sql> load data inpath '../sample.csv' into table test_table;
spark-sql> select city, avg(age), sum(age) from test_table group by city;
```