commit | 00f516bbfaa58efa5f2e7914cfe01bddb0bcc812 | |
---|---|---|
author | ajantha-bhat <ajanthabhat@gmail.com> | Sat Dec 28 16:54:06 2019 +0800 |
committer | Jacky Li <jacky.likun@qq.com> | Sun Jan 05 14:45:44 2020 +0800 |
tree | 1dc8cb46f3e4183cbd24ffea099530ab2b1786e2 | |
parent | 5b8ac8478847ef6962f39362f8e16f30a9cb6a03 | |
[CARBONDATA-3639][CARBONDATA-3638] Fix global sort exception in load from CSV flow with binary non-sort columns

Problem: Global sort throws an exception in the load from CSV flow with binary non-sort columns. The exception is attached in JIRA and a test case is added in the PR.

Cause: For the global sort flow, we make csvRDD from the file. This String RDD is then converted again to new String Scala objects; here the binary value was exceeding 32000 length, hence the exception.

Solution: For the CSV load flow, avoid this new String object conversion, as the values are already String objects. This can also handle [CARBONDATA-3638].

This closes #3540
Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
You can find the latest CarbonData document and learn more at: http://carbondata.apache.org
The CarbonData file format is a columnar store in HDFS. It has many features that a modern columnar format has, such as being splittable, compression schema, complex data types, etc. CarbonData also has the following unique features:
CarbonData is built using Apache Maven, to build CarbonData
This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it. This guide introduces how to contribute to CarbonData.
To get involved in CarbonData:
Apache CarbonData is an open source project of The Apache Software Foundation (ASF).