GH-2994: Optimize string to binary conversion in AvroWriteSupport (#2995)
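`AvroWriteSupport` now calls `Binary.fromString` for `String` values and for the `toString()` fallback, because `Binary.fromCharSequence` is about an order of magnitude slower (roughly 12x lower throughput in the JMH run below) when the input is a `String`: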
```
Benchmarks.fromCharSequence thrpt 25 5885347.328 ± 186669.738 ops/s
Benchmarks.fromString thrpt 25 71335979.492 ± 8800704.044 ops/s
```
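A possible explanation for the gap, sketched with JDK-only code: this assumes (not verified against the `Binary` sources here) that `Binary.fromString` ends up in `String.getBytes(UTF_8)`, while `Binary.fromCharSequence` goes through a generic `CharsetEncoder`. The class and method names below are hypothetical and only mirror the `instanceof` ordering introduced by this change.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of parquet-java.
public class Utf8EncodeSketch {

  // Assumed fast path: String.getBytes can use the JDK's specialized UTF-8 encoding.
  static byte[] encodeString(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  // Assumed general path: any CharSequence is wrapped and run through a CharsetEncoder.
  static byte[] encodeCharSequence(CharSequence value) {
    try {
      ByteBuffer buf = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(value));
      byte[] out = new byte[buf.remaining()];
      buf.get(out);
      return out;
    } catch (CharacterCodingException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Check String before CharSequence, since every String is also a CharSequence.
  static byte[] encode(Object value) {
    if (value instanceof String) {
      return encodeString((String) value);
    } else if (value instanceof CharSequence) {
      return encodeCharSequence((CharSequence) value);
    }
    return encodeString(value.toString());
  }
}
```

For well-formed input both paths produce the same bytes, so the ordering of the checks only affects speed, not the encoded output.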
Here is the code for the benchmarks:
```java
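import org.apache.commons.lang3.RandomStringUtils;
import org.apache.parquet.io.api.Binary;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

public class Benchmarks {
  // 100-character alphanumeric input shared by both benchmarks
  private static final String string = RandomStringUtils.randomAlphanumeric(100);

  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  public void fromCharSequence(Blackhole blackhole) {
    blackhole.consume(Binary.fromCharSequence(string));
  }

  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  public void fromString(Blackhole blackhole) {
    blackhole.consume(Binary.fromString(string));
  }
}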
```
diff --git a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
index 846fb8b..53fc3d5 100644
--- a/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
+++ b/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java
@@ -403,10 +403,12 @@
     if (value instanceof Utf8) {
       Utf8 utf8 = (Utf8) value;
       return Binary.fromReusedByteArray(utf8.getBytes(), 0, utf8.getByteLength());
+    } else if (value instanceof String) {
+      return Binary.fromString((String) value);
     } else if (value instanceof CharSequence) {
       return Binary.fromCharSequence((CharSequence) value);
     }
-    return Binary.fromCharSequence(value.toString());
+    return Binary.fromString(value.toString());
   }
 
   private static GenericData getDataModel(ParquetConfiguration conf, Schema schema) {