commit | 3161a42011902fd7ad4e2ee86ac818d2d554d77d | |
---|---|---|
author | akashrn5 <akashnilugal@gmail.com> | Mon Jul 20 11:22:50 2020 +0530 |
committer | kunal642 <kunalkapoor642@gmail.com> | Fri Jul 24 11:11:27 2020 +0530 |
tree | a3c5842fcee5a798b5b01ebe820b53b50a95f67b | |
parent | ea6ab2836d273bebbfe5aaf2faba1fdf0d3af12d | |
[CARBONDATA-3918] Fix extra data in count(*) after multiple updates and index server running

Why is this PR needed?
Select count(*) returns extra data after multiple updates when the index server is running. Once horizontal compaction happens, it stores the index files in the cache and creates new index and data files. If the table has been updated or deleted from, the old splits are meant to be excluded after all splits have been fetched. However, when the splits come from the index server, loadmetadatadetails is null, because it is transient in the Segment object and the splits are serialized from the index server. As a result, the old IUD segments cannot be filtered out, which leads to extra data in count(*).

What changes were proposed in this PR?
Once the splits are received from the index server, get the loadmetadataDetails and readCommittedScope from the validSegments and set them on the splits, which solves this problem.

This closes #3853
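The mechanism described in the commit message can be illustrated with a minimal, self-contained Java sketch. It is not CarbonData's actual code: the `Segment` and `Split` classes, the `roundTrip` and `restoreTransientState` helpers, and the use of a plain `String` in place of the real load metadata type are all hypothetical stand-ins. Only one transient field is shown; the commit also restores `readCommittedScope` in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransientRestoreSketch {

    // Hypothetical stand-in for a segment; NOT CarbonData's real Segment class.
    static class Segment implements Serializable {
        private static final long serialVersionUID = 1L;
        final String segmentNo;
        // transient fields are skipped by Java serialization,
        // so this is null on the receiving side.
        transient String loadMetadataDetails;

        Segment(String segmentNo, String loadMetadataDetails) {
            this.segmentNo = segmentNo;
            this.loadMetadataDetails = loadMetadataDetails;
        }
    }

    // Hypothetical stand-in for an input split that belongs to a segment.
    static class Split implements Serializable {
        private static final long serialVersionUID = 1L;
        final Segment segment;
        final String filePath;

        Split(Segment segment, String filePath) {
            this.segment = segment;
            this.filePath = filePath;
        }
    }

    // Simulates splits being serialized by a remote server and read back locally.
    static List<Split> roundTrip(List<Split> splits)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(splits));
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            List<Split> copy = (List<Split>) in.readObject();
            return copy;
        }
    }

    // Copies the transient state from the locally known valid segments
    // onto the deserialized splits, matching by segment number.
    static void restoreTransientState(List<Split> splits, List<Segment> validSegments) {
        Map<String, Segment> bySegmentNo = new HashMap<>();
        for (Segment segment : validSegments) {
            bySegmentNo.put(segment.segmentNo, segment);
        }
        for (Split split : splits) {
            Segment local = bySegmentNo.get(split.segment.segmentNo);
            if (local != null) {
                split.segment.loadMetadataDetails = local.loadMetadataDetails;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Segment valid = new Segment("0", "details-for-segment-0");
        List<Split> remote = roundTrip(Arrays.asList(new Split(valid, "part-0")));
        System.out.println("after round trip: " + remote.get(0).segment.loadMetadataDetails);
        restoreTransientState(remote, Arrays.asList(valid));
        System.out.println("after restore:    " + remote.get(0).segment.loadMetadataDetails);
    }
}
```

Matching by segment number is an assumption made for this sketch; the point is only that state dropped during serialization has to be re-attached on the receiving side before any filtering that depends on it can work.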
Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
You can find the latest CarbonData documentation and learn more at: http://carbondata.apache.org
The CarbonData file format is a columnar store in HDFS. It has many features of a modern columnar format, such as being splittable, a compression schema, and complex data types, and it has the following unique features:
CarbonData is built using Apache Maven, to build CarbonData
Some features are marked as experimental because the syntax/implementation might change in the future.
This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it. This guide document introduces how to contribute to CarbonData.
To get involved in CarbonData:
Apache CarbonData is an open source project of The Apache Software Foundation (ASF).