commit | dc50114756b7e9067b42181a6a86f928effc6e68 | |
---|---|---|
author | Dewey Dunnington <dewey@dunnington.ca> | Mon Mar 18 09:28:33 2024 -0300 |
committer | GitHub <noreply@github.com> | Mon Mar 18 09:28:33 2024 -0300 |
tree | c1f7374f2a0127de1c136e6e95b7bd4a94b6a1a8 | |
parent | c7a123659b7e6a7e10fc849b0af1e0ec904f3666 | |
chore(dev/benchmarks): Add benchmarks for `ArrowArrayAppend()` (#401)

This PR adds a set of benchmarks for building arrays using `ArrowArrayAppendXXX()` and adds a few missing ones for `ArrowArrayView` like `ArrowArrayViewGetString()`.

(Report output in details)

<details>

# Benchmark Report

## Configurations

These benchmarks were run with the following configurations:

| preset_name | preset_description |
|:------------|:-------------------------------------------------|
| local | Uses the nanoarrow C sources from this checkout. |
| v0.4.0 | Uses the nanoarrow C sources from the 0.4.0 release. |

## Summary

A quick and dirty summary of benchmark results between this checkout and the last released version.

| benchmark_label | v0.4.0 | local | change | pct_change |
|:----------------------------------------------------------|---------:|---------:|-------:|-----------:|
| [ArrayAppendInt16](#arrayappendint16) | 2.68ms | 2.66ms | 1ns | -0.9% |
| [ArrayAppendInt32](#arrayappendint32) | 3.12ms | 3.08ms | 1ns | -1.3% |
| [ArrayAppendInt64](#arrayappendint64) | 3.79ms | 3.47ms | 1ns | -8.4% |
| [ArrayAppendInt8](#arrayappendint8) | 2.39ms | 2.38ms | 1ns | -0.1% |
| [ArrayAppendNulls](#arrayappendnulls) | 12.05ms | 12.04ms | 1ns | -0.1% |
| [ArrayAppendString](#arrayappendstring) | 8.96ms | 8.67ms | 1ns | -3.2% |
| [ArrayViewGetInt16](#arrayviewgetint16) | 628.79µs | 627.1µs | 1ns | -0.3% |
| [ArrayViewGetInt32](#arrayviewgetint32) | 634.21µs | 625.86µs | 1ns | -1.3% |
| [ArrayViewGetInt64](#arrayviewgetint64) | 672.81µs | 676.99µs | 4.18µs | 0.6% |
| [ArrayViewGetInt8](#arrayviewgetint8) | 783.55µs | 784.61µs | 1.05µs | 0.1% |
| [ArrayViewGetString](#arrayviewgetstring) | 1.26ms | 1.25ms | 1ns | -0.4% |
| [ArrayViewIsNull](#arrayviewisnull) | 1.21ms | 1.19ms | 1ns | -1.8% |
| [ArrayViewIsNullNonNullable](#arrayviewisnullnonnullable) | 938.36µs | 940.65µs | 2.28µs | 0.2% |
| [SchemaInitWideStruct](#schemainitwidestruct) | 1.02ms | 1.02ms | 1ns | -0.2% |
| [SchemaViewInitWideStruct](#schemaviewinitwidestruct) | 103.62µs | 103.53µs | 1ns | -0.1% |

## ArrowArray-related benchmarks

Benchmarks for producing ArrowArrays using the ArrowArrayXXX() functions.

### ArrayAppendString

Use ArrowArrayAppendString() to build a string array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L288-L315)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 83 | 8.67ms | 8.64ms | 115,712,019 |
| v0.4.0 | 77 | 8.96ms | 8.81ms | 113,455,364 |

### ArrayAppendInt8

Use ArrowArrayAppendInt() to build an int8 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L339-L341)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 292 | 2.38ms | 2.38ms | 420,186,810 |
| v0.4.0 | 296 | 2.39ms | 2.38ms | 419,740,272 |

### ArrayAppendInt16

Use ArrowArrayAppendInt() to build an int16 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L344-L346)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 264 | 2.66ms | 2.66ms | 376,369,150 |
| v0.4.0 | 261 | 2.68ms | 2.68ms | 373,079,925 |

### ArrayAppendInt32

Use ArrowArrayAppendInt() to build an int32 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L349-L351)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 228 | 3.08ms | 3.08ms | 324,738,215 |
| v0.4.0 | 225 | 3.12ms | 3.12ms | 320,760,473 |

### ArrayAppendInt64

Use ArrowArrayAppendInt() to build an int64 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L354-L356)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 206 | 3.47ms | 3.46ms | 289,089,536 |
| v0.4.0 | 186 | 3.79ms | 3.77ms | 265,070,543 |

### ArrayAppendNulls

Use ArrowArrayAppendNulls() to build an int32 array that contains 80% null values.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L379-L401)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 59 | 12ms | 12ms | 83,199,603 |
| v0.4.0 | 58 | 12ms | 12ms | 83,135,409 |

## ArrowArrayView-related benchmarks

Benchmarks for consuming ArrowArrays using the ArrowArrayViewXXX() functions.

### ArrayViewGetInt8

Use ArrowArrayViewGet() to consume an int8 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L118-L120)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 893 | 785µs | 784µs | 1,276,321,450 |
| v0.4.0 | 894 | 784µs | 782µs | 1,278,021,040 |

### ArrayViewGetInt16

Use ArrowArrayViewGet() to consume an int16 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L123-L125)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1114 | 627µs | 626µs | 1,597,100,560 |
| v0.4.0 | 1115 | 629µs | 628µs | 1,593,178,054 |

### ArrayViewGetInt32

Use ArrowArrayViewGet() to consume an int32 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L128-L130)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1115 | 626µs | 625µs | 1,600,061,993 |
| v0.4.0 | 1114 | 634µs | 633µs | 1,580,536,418 |

### ArrayViewGetInt64

Use ArrowArrayViewGet() to consume an int64 array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L133-L135)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 1023 | 677µs | 676µs | 1,480,375,260 |
| v0.4.0 | 1018 | 673µs | 671µs | 1,490,177,709 |

### ArrayViewIsNullNonNullable

Use ArrowArrayViewIsNull() to check for nulls while consuming an int32 array that does not contain a validity buffer.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L139-L168)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 746 | 941µs | 940µs | 1,064,112,037 |
| v0.4.0 | 745 | 938µs | 937µs | 1,066,931,705 |

### ArrayViewIsNull

Use ArrowArrayViewIsNull() to check for nulls while consuming an int32 array that contains 20% nulls.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L172-L211)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 588 | 1.19ms | 1.19ms | 842,447,913 |
| v0.4.0 | 588 | 1.21ms | 1.2ms | 830,223,525 |

### ArrayViewGetString

Use ArrowArrayViewGetStringUnsafe() to consume a string array.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/array_benchmark.cc#L214-L245)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 557 | 1.25ms | 1.25ms | 800,060,902 |
| v0.4.0 | 546 | 1.26ms | 1.25ms | 797,048,875 |

## Schema-related benchmarks

Benchmarks for producing and consuming ArrowSchema.

### SchemaInitWideStruct

Benchmark ArrowSchema creation for very wide tables. Simulates part of the process of creating a very wide table with a simple column type (integer).

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/schema_benchmark.cc#L45-L56)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 690 | 1.02ms | 1.02ms | 9,843,783 |
| v0.4.0 | 683 | 1.02ms | 1.02ms | 9,831,837 |

### SchemaViewInitWideStruct

Benchmark ArrowSchema parsing for very wide tables. Simulates part of the process of consuming a very wide table. Typically the ArrowSchemaViewInit() is done by ArrowArrayViewInit() but uses a similar pattern.

[View Source](https://github.com/paleolimbot/arrow-nanoarrow/blob/benchmarks-read-create/dev/benchmarks/c/schema_benchmark.cc#L78-L91)

| preset_name | iterations | real_time | cpu_time | items_per_second |
|:------------|-----------:|----------:|---------:|-----------------:|
| local | 6772 | 104µs | 103µs | 96,669,664 |
| v0.4.0 | 6749 | 104µs | 103µs | 96,625,343 |

</details>

---------

Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
The nanoarrow library is a set of helper functions to interpret and generate Arrow C Data Interface and Arrow C Stream Interface structures. The library is in active early development and users should update regularly from the main branch of this repository.
Whereas the current suite of Arrow implementations provides the basis for a comprehensive data analysis toolkit, this library is intended to support clients that wish to produce or interpret Arrow C Data and/or Arrow C Stream structures where linking to a higher level Arrow binding is difficult or impossible.
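For context, the structures in question are two small C structs defined by the Arrow C Data Interface specification. The sketch below populates them by hand for a three-element int32 array, without nanoarrow, to give a sense of the bookkeeping the library's helpers take care of. The struct layouts, the `"i"` format string, and the nullable flag value follow the specification; the function and callback names (`make_int32_array_by_hand`, `release_int32_array`, `release_static_schema`) are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Struct layouts as defined by the Arrow C Data Interface specification.
   In a real project these come from nanoarrow.h or the Arrow headers. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

#define ARROW_FLAG_NULLABLE 2

/* Hypothetical release callback: frees what the producer below allocated. */
static void release_int32_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer */
  free(array->buffers);           /* table of buffer pointers */
  array->release = NULL;          /* marks the struct as released */
}

/* Hypothetical release callback: the schema below owns no heap memory. */
static void release_static_schema(struct ArrowSchema* schema) {
  schema->release = NULL;
}

/* Hypothetical producer: builds the int32 array [1, 2, 3] by hand. */
int make_int32_array_by_hand(struct ArrowArray* array_out,
                             struct ArrowSchema* schema_out) {
  int32_t* values = (int32_t*)malloc(3 * sizeof(int32_t));
  const void** buffers = (const void**)malloc(2 * sizeof(const void*));
  if (values == NULL || buffers == NULL) {
    free(values);
    free(buffers);
    return ENOMEM;
  }

  values[0] = 1;
  values[1] = 2;
  values[2] = 3;
  buffers[0] = NULL;   /* validity bitmap may be omitted when null_count is 0 */
  buffers[1] = values; /* data buffer */

  array_out->length = 3;
  array_out->null_count = 0;
  array_out->offset = 0;
  array_out->n_buffers = 2;
  array_out->n_children = 0;
  array_out->buffers = buffers;
  array_out->children = NULL;
  array_out->dictionary = NULL;
  array_out->release = &release_int32_array;
  array_out->private_data = NULL;

  schema_out->format = "i"; /* format string for int32 */
  schema_out->name = "";
  schema_out->metadata = NULL;
  schema_out->flags = ARROW_FLAG_NULLABLE;
  schema_out->n_children = 0;
  schema_out->children = NULL;
  schema_out->dictionary = NULL;
  schema_out->release = &release_static_schema;
  schema_out->private_data = NULL;

  return 0;
}
```

A caller that receives these structs is expected to invoke `array.release(&array)` and `schema.release(&schema)` once it is done with them; after that, the `release` members are `NULL`.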
The nanoarrow C library is intended to be copied and vendored. This can be done using CMake or by using the bundled nanoarrow.h/nanoarrow.c distribution available in the dist/ directory in this repository. Examples of both can be found in the examples/ directory in this repository.
A simple producer example:
```c
#include "nanoarrow.h"

int make_simple_array(struct ArrowArray* array_out, struct ArrowSchema* schema_out) {
  struct ArrowError error;
  array_out->release = NULL;
  schema_out->release = NULL;

  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(array_out, NANOARROW_TYPE_INT32));

  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(array_out));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 1));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 2));
  NANOARROW_RETURN_NOT_OK(ArrowArrayAppendInt(array_out, 3));
  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(array_out, &error));

  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema_out, NANOARROW_TYPE_INT32));

  return NANOARROW_OK;
}
```
A simple consumer example:
```c
#include <stdio.h>
#include "nanoarrow.h"

int print_simple_array(struct ArrowArray* array, struct ArrowSchema* schema) {
  struct ArrowError error;
  struct ArrowArrayView array_view;
  NANOARROW_RETURN_NOT_OK(ArrowArrayViewInitFromSchema(&array_view, schema, &error));

  if (array_view.storage_type != NANOARROW_TYPE_INT32) {
    printf("Array has storage that is not int32\n");
  }

  int result = ArrowArrayViewSetArray(&array_view, array, &error);
  if (result != NANOARROW_OK) {
    ArrowArrayViewReset(&array_view);
    return result;
  }

  for (int64_t i = 0; i < array->length; i++) {
    printf("%d\n", (int)ArrowArrayViewGetIntUnsafe(&array_view, i));
  }

  ArrowArrayViewReset(&array_view);
  return NANOARROW_OK;
}
```
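The consumer example reads values through `ArrowArrayViewGetIntUnsafe()` and does not check for nulls. As a rough illustration of what element access involves at the buffer level, the sketch below reads one int32 element from raw Arrow buffers, applying the array offset and consulting the validity bitmap (Arrow bitmaps are least-significant-bit first, and a missing bitmap means every element is valid). The helper names `bit_is_set` and `read_int32` are hypothetical; this is not nanoarrow's implementation, only a minimal sketch of the layout rules it works with.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if bit i of an Arrow (LSB-first) bitmap is set, 0 otherwise. */
static int bit_is_set(const uint8_t* bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

/* Hypothetical helper: reads logical element i of an int32 array.
   validity: buffers[0] of the array, or NULL if there is no validity buffer
   data:     buffers[1] of the array
   offset:   the array's logical offset into its buffers
   Returns 1 and writes *value_out if the element is valid; returns 0 if null. */
int read_int32(const uint8_t* validity, const int32_t* data, int64_t offset,
               int64_t i, int32_t* value_out) {
  int64_t physical = offset + i;
  if (validity != NULL && !bit_is_set(validity, physical)) {
    return 0;
  }
  *value_out = data[physical];
  return 1;
}
```

In code built on nanoarrow, the same check is available through `ArrowArrayViewIsNull()`, which the benchmarks in this commit exercise.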