Apache CarbonData is a new big data file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which can speed up queries over petabytes of data by an order of magnitude.
File Format Concepts: Start with the basics of the CarbonData file format and its storage structure. This will help you understand the rest of the documentation, including the deployment, programming, and usage guides.
CarbonData SQL Language Reference: CarbonData extends the Spark SQL language with several DDL and DML statements that support operations on CarbonData tables. Refer to the Reference Manual for the supported features and functions.
The Apache CarbonData community welcomes all kinds of contributions from anyone with a passion for faster data formats. Contributing to CarbonData doesn't just mean writing code: helping new users on the mailing list, testing releases, and improving documentation are also welcome. Please follow the Contributing to CarbonData guidelines before proposing a design or code change.
Compiling CarbonData: This guide explains how to compile CarbonData and generate the jars for testing.
Wiki: Read the Apache CarbonData wiki page for upcoming release plans, blogs, and training materials.
Summit: Presentations from past summits and conferences can be found here.
Blogs: Blogs by external users can be found here.
Performance reports: TPC-H performance reports can be found here.
Trainings: Training records on design and code flows can be found here.