[CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
Cause: When the datamap count is close to numOfThreadsForPruning, the
grouping code checks with '>=', so the last thread may not get any datamaps to prune.
An index out of bounds exception is thrown in this scenario.
There is no issue with a higher number of datamaps.
Solution: In this scenario, launch threads based on the number of groups the
datamaps were actually distributed into, not on the hardcoded value.
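
Illustration (a simplified sketch, not the actual TableDataMap code): the
snippet below models each datamap only by its file count and closes a group
once it holds '>=' filesPerThread files. The class PruningThreadCount and the
helpers group and threadsToLaunch are hypothetical names used only for this
example. It shows how the datamaps can fit into fewer groups than the
configured thread count, and how capping the thread count at the number of
groups avoids reading a group that does not exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the grouping step; not CarbonData code.
final class PruningThreadCount {

  // Groups datamap indexes into per-thread buckets. A bucket is closed once it
  // holds >= filesPerThread files, unless it is the last allowed bucket, which
  // absorbs everything that remains.
  static List<List<Integer>> group(int[] filesPerDatamap, int configuredThreads) {
    int totalFiles = 0;
    for (int files : filesPerDatamap) {
      totalFiles += files;
    }
    int filesPerThread = totalFiles / configuredThreads;

    List<List<Integer>> buckets = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    int filesInCurrent = 0;
    for (int i = 0; i < filesPerDatamap.length; i++) {
      current.add(i);
      filesInCurrent += filesPerDatamap[i];
      boolean lastBucket = buckets.size() == configuredThreads - 1;
      if (filesInCurrent >= filesPerThread && !lastBucket) {
        buckets.add(current);
        current = new ArrayList<>();
        filesInCurrent = 0;
      }
    }
    if (!current.isEmpty()) {
      buckets.add(current);
    }
    return buckets;
  }

  // Launch at most one thread per bucket that actually received datamaps.
  static int threadsToLaunch(int[] filesPerDatamap, int configuredThreads) {
    return Math.min(configuredThreads, group(filesPerDatamap, configuredThreads).size());
  }

  public static void main(String[] args) {
    // 3 datamaps with 2, 2 and 1 files, 4 configured threads:
    // filesPerThread = 5 / 4 = 1, so every datamap closes its own bucket
    // and only 3 buckets exist.
    System.out.println(threadsToLaunch(new int[] {2, 2, 1}, 4));
  }
}
```

With 4 configured threads and only 3 groups, a loop that runs 4 times and reads
group number 3 would fail; using the group count as the thread count does not.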
This closes #3336
diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
index 33fc3b1..ecdd586 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
@@ -207,9 +207,6 @@
*/
int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
- LOG.info(
- "Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning
- + ". total files: " + totalFiles + ". total segments: " + segments.size());
int filesPerEachThread = totalFiles / numOfThreadsForPruning;
int prev;
int filesCount = 0;
@@ -254,6 +251,15 @@
// this should not happen
throw new RuntimeException(" not all the files processed ");
}
+ if (datamapListForEachThread.size() < numOfThreadsForPruning) {
+ // If the total datamaps fitted in lesser number of threads than numOfThreadsForPruning.
+ // Launch only that many threads where datamaps are fitted while grouping.
+ LOG.info("Datamaps is distributed in " + datamapListForEachThread.size() + " threads");
+ numOfThreadsForPruning = datamapListForEachThread.size();
+ }
+ LOG.info(
+ "Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning
+ + ". total files: " + totalFiles + ". total segments: " + segments.size());
List<Future<Void>> results = new ArrayList<>(numOfThreadsForPruning);
final Map<Segment, List<ExtendedBlocklet>> prunedBlockletMap =
new ConcurrentHashMap<>(segments.size());