blob: e19d4513bb946e7838ea148238ad0b2522fff928 [file] [log] [blame]
| | Spark/Crail | Spark/Vanilla | Spark/Winner2014 | Tencent/Winner2016 |
|--------------------|---------------|---------------|------------------------|--------------------------|
| Data Size | 12.8 TB | 12.8 TB | 100 TB | 100 TB |
| Elapsed Time | 98 s | 527 s | 1406 s | 98.8 s |
| Cores | 2560 | 2560 | 6592 | 10240 |
| Nodes | 128 | 128 | 206 | 512 |
| Network | 100 Gbit/sec| 100 Gbit/s | 10 Gbit/s | 100 Gbit/s |
| Sorting rate | 7.8 TB/min | 1.4 TB/min | 4.27 TB/min | 44.78 TB/min |
| Sorting rate/core | 3.13 GB/min | 0.58 GB/min | 0.66 GB/min | 4.4 GB/min |
The natural follow-up question from these results is about a further breakdown that shows what part of the gains come from the Crail I/O, serializer and sorter. Such a breakdown cannot easily be obtained as in the built-in Spark shuffler these different phases are executed interleaved.
table, th, td {
border: 1px solid black;
}