================================================================================================
Benchmark to measure CSV read/write performance
================================================================================================
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Parsing quoted values:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
One quoted string                                 38994          39382         514          0.0      779874.2       1.0X
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Wide rows with 1000 columns:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 1000 columns                              109112         110020         836          0.0      109111.7       1.0X
Select 100 columns                                44175          44228          74          0.0       44174.8       2.5X
Select one column                                 37100          37326         265          0.0       37100.3       2.9X
count()                                            7061           7116          51          0.1        7060.6      15.5X
Select 100 columns, one bad input field           66313          66524         192          0.0       66312.7       1.6X
Select 100 columns, corrupt record field          74984          75463         509          0.0       74984.5       1.5X
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Count a dataset with 10 columns:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Select 10 columns + count()                       17351          17486         118          0.6        1735.1       1.0X
Select 1 column + count()                         14258          14398         169          0.7        1425.8       1.2X
count()                                            3940           3951          10          2.5         394.0       4.4X
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Create a dataset of timestamps                     1886           1897          10          5.3         188.6       1.0X
to_csv(timestamp)                                 13634          13687          66          0.7        1363.4       0.1X
write timestamps to files                         12583          12699         175          0.8        1258.3       0.1X
Create a dataset of dates                          1933           1947          20          5.2         193.3       1.0X
to_csv(date)                                       8585           8661          96          1.2         858.5       0.2X
write dates to files                               7246           7281          46          1.4         724.6       0.3X
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read timestamp text from files                     2701           2717          24          3.7         270.1       1.0X
read timestamps from files                        32540          32782         210          0.3        3254.0       0.1X
infer timestamps from files                       66992          67194         183          0.1        6699.2       0.0X
read date text from files                          2480           2487           8          4.0         248.0       1.1X
read date from files                              10795          10872         128          0.9        1079.5       0.3X
infer date from files                             25915          26054         137          0.4        2591.5       0.1X
timestamp strings                                  3003           3046          65          3.3         300.3       0.9X
parse timestamps from Dataset[String]             35786          35938         160          0.3        3578.6       0.1X
infer timestamps from Dataset[String]             70439          70794         325          0.1        7043.9       0.0X
date strings                                       3195           3205           9          3.1         319.5       0.8X
parse dates from Dataset[String]                  13149          13200          53          0.8        1314.9       0.2X
from_csv(timestamp)                               33172          33190          15          0.3        3317.2       0.1X
from_csv(date)                                    11749          11809          63          0.9        1174.9       0.2X
OpenJDK 64-Bit Server VM 11.0.13+8-LTS on Linux 5.11.0-1022-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
w/o filters                                       22560          22661          88          0.0      225603.4       1.0X
pushdown disabled                                 22558          22647         107          0.0      225578.3       1.0X
w/ filters                                         1558           1590          29          0.1       15582.4      14.5X