 |                    | Spark/Crail   | Spark/Vanilla |    Spark/Winner2014    | Tencent/Winner2016       |
 |--------------------|---------------|---------------|------------------------|--------------------------|
 | Data Size          |   12.8 TB     |      12.8 TB  |         100 TB         |            100 TB        |
 | Elapsed Time       |    98 s       |       527 s   |        1406 s          |         98.8 s           |
 | Cores              |     2560      |     2560      |        6592            |         10240            |
 | Nodes              |    128        |   128         |         206            |           512            |
 | Network            |   100 Gbit/s  |  100 Gbit/s   |       10 Gbit/s        |       100 Gbit/s         |
 | Sorting rate       |  7.8 TB/min   |  1.4 TB/min   |      4.27 TB/min       |         44.78 TB/min     |
 | Sorting rate/core  |  3.13 GB/min  |  0.58 GB/min  |      0.66 GB/min       |          4.4 GB/min      |

 A natural follow-up question is how much of the gain comes from each component: the Crail I/O, the serializer, and the sorter. Such a breakdown is not easy to obtain, because the built-in Spark shuffler interleaves these phases during execution.
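
 The derived rows for the two 12.8 TB runs in the table above can be approximately reproduced from the data size, elapsed time, and core count. The following is a minimal sketch: the function names are hypothetical, and the 1024 GB per TB conversion is an assumption chosen because it matches the per-core row. The table appears to truncate some values (for example 1.4 TB/min where the computation gives about 1.46), so the printed figures agree only approximately.

```python
# Input figures come from the table above; helper names are hypothetical.
GB_PER_TB = 1024  # assumption: binary conversion, consistent with the per-core row


def sorting_rate_tb_per_min(data_tb, elapsed_s):
    """Aggregate sorting rate in TB per minute."""
    return data_tb / elapsed_s * 60


def rate_per_core_gb_per_min(data_tb, elapsed_s, cores):
    """Sorting rate divided evenly across all cores, in GB per minute."""
    return sorting_rate_tb_per_min(data_tb, elapsed_s) * GB_PER_TB / cores


# (data size in TB, elapsed time in s, cores) for the two runs on the same cluster
runs = {
    "Spark/Crail": (12.8, 98, 2560),
    "Spark/Vanilla": (12.8, 527, 2560),
}

for name, (data_tb, elapsed_s, cores) in runs.items():
    rate = sorting_rate_tb_per_min(data_tb, elapsed_s)
    per_core = rate_per_core_gb_per_min(data_tb, elapsed_s, cores)
    print(f"{name}: {rate:.2f} TB/min, {per_core:.2f} GB/min per core")

# Ratio of elapsed times for the same data size on the same cluster
speedup = runs["Spark/Vanilla"][1] / runs["Spark/Crail"][1]
print(f"Spark/Crail vs. Spark/Vanilla speedup: {speedup:.2f}x")
```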

 table, th, td {
     border: 1px solid black;
 }
