commit | 3c7b33992e06d81fb47d81bf8ccf7884f845b3ff | |
---|---|---|
author | manishgupta88 <tomanishgupta18@gmail.com> | Mon Oct 08 19:38:54 2018 +0530 |
committer | ravipesala <ravi.pesala@gmail.com> | Tue Oct 09 13:38:51 2018 +0530 |
tree | ed5d5c1a50ab2b790e14dea638767094939d037a | |
parent | 19097f272fe3227c71c86338bb8bf788e87cd4aa | |

[CARBONDATA-2990] Queries slow down after some time due to broadcast issue

Problem: During consecutive query runs, queries slow down after some time, which degrades query performance. No exception is thrown in the driver or executor logs, but the logs show that the time taken to broadcast the Hadoop conf increases after every query run.

Analysis: This happens because CarbonData overrides Spark's SerializableConfiguration class. Spark registers its own class with the Kryo serializer, so computation using Kryo is fast. CarbonData does not get the same benefit because it overrides the class. Spark's internal SizeEstimator calculates the size of the object, and the overridden CarbonData class contains a few extra objects, which increases the computation time.

Solution: Use the Spark class instead of overriding the class in CarbonData.

This closes #2803

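
For readers unfamiliar with the class discussed in this commit, the following is a minimal stand-alone sketch of the general pattern: a serializable wrapper that lets a non-serializable configuration object be shipped to other processes. It is not Spark's or CarbonData's code. `PlainConf` and `SerializableConfWrapper` are hypothetical names, `PlainConf` is only a stand-in for a Hadoop configuration, and the sketch uses plain Java serialization, so it does not reproduce the Kryo registration or the size estimation described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop configuration: key/value pairs, NOT Serializable.
final class PlainConf {
    private final Map<String, String> props = new LinkedHashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }

    Map<String, String> entries() { return props; }
}

// Hypothetical wrapper: holds the configuration in a transient field and
// writes/reads its entries by hand so the wrapper itself can be serialized.
final class SerializableConfWrapper implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient PlainConf value;

    SerializableConfWrapper(PlainConf value) { this.value = value; }

    PlainConf value() { return value; }

    // Called by Java serialization: write the entry count, then each key/value pair.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(value.entries().size());
        for (Map.Entry<String, String> entry : value.entries().entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }

    // Called by Java deserialization: rebuild the configuration from the pairs.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        value = new PlainConf();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String val = in.readUTF();
            value.set(key, val);
        }
    }

    // Serializes the wrapper to bytes and back, as shipping it to another process would.
    static SerializableConfWrapper roundTrip(SerializableConfWrapper wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableConfWrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Calling `SerializableConfWrapper.roundTrip(...)` on a wrapper returns a new wrapper whose configuration holds the same entries. Any extra fields added to such a wrapper travel with every copy, which is the kind of overhead the analysis above attributes to the overridden class.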
Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
You can find the latest CarbonData document and learn more at: http://carbondata.apache.org
The CarbonData file format is a columnar store in HDFS. It has many of the features of a modern columnar format, such as splittable files, compression schemas, and complex data types. CarbonData also has the following unique features:
CarbonData is built using Apache Maven, to build CarbonData
This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it. This guide document introduces how to contribute to CarbonData.
To get involved in CarbonData:
Apache CarbonData is an open source project of The Apache Software Foundation (ASF).