URL-safe Base64 encoder

Base64 encoder with URL-safe scheme

This is a simple NIF for fast encoding and decoding of URL-safe Base64 values:

1> Thing = b64url:encode("Hello, CouchDB!").
<<"SGVsbG8sIENvdWNoREIh">>
2> b64url:decode(Thing).
<<"Hello, CouchDB!">>

Performance

This implementation is significantly faster than the Erlang version it replaced in CouchDB. The benchmark.escript file contains the original implementation (using regular expressions to replace unsafe characters in the output of the base64 module) and can be used to compare the two for strings of various lengths. For example:

ERL_LIBS=_build/default/lib/b64url/ ./test/benchmark.escript 4 10 100 30
erl :       75491270 bytes /  30 seconds =     2516375.67 bps
nif :      672299342 bytes /  30 seconds =    22409978.07 bps

This invocation spawns four workers that each generate random strings between 10 and 100 bytes long and encode and decode them in a tight loop for 30 seconds. The script then reports the aggregate encoded data volume; the "bps" figures are bytes per second. In this run the NIF processed roughly 8.9 times as much data as the Erlang version. Note that the generator overhead (crypto:strong_rand_bytes/1) is included in both results, so the relative difference in encoder throughput is larger than what is reported here.
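
For readers who want to see what the regular-expression approach looks like, here is a minimal sketch built on the standard base64 and re modules. It is not the code from benchmark.escript: the module name b64url_sketch is hypothetical, and stripping the trailing "=" padding is an assumption of this sketch.

```erlang
-module(b64url_sketch).
-export([encode/1, decode/1]).

%% Hypothetical pure-Erlang URL-safe encoder, for illustration only.
%% Encodes with the stdlib base64 module, then rewrites the characters
%% that are unsafe in URLs. Dropping "=" padding is an assumption.
encode(Data) ->
    Std = base64:encode(Data),
    NoPad = re:replace(Std, "=+$", "", [{return, binary}]),
    NoSlash = re:replace(NoPad, "/", "_", [global, {return, binary}]),
    re:replace(NoSlash, "\\+", "-", [global, {return, binary}]).

%% Reverses encode/1: restores "+" and "/", re-adds padding, and decodes.
%% Input whose length is 1 modulo 4 is not valid Base64 and crashes with
%% a case_clause error.
decode(Url) ->
    Bin = iolist_to_binary(Url),
    Plus = re:replace(Bin, "-", "+", [global, {return, binary}]),
    Std = re:replace(Plus, "_", "/", [global, {return, binary}]),
    Pad =
        case byte_size(Std) rem 4 of
            0 -> <<>>;
            2 -> <<"==">>;
            3 -> <<"=">>
        end,
    base64:decode(<<Std/binary, Pad/binary>>).
```

With this sketch, base64:encode(<<255,239>>) yields <<"/+8=">> while b64url_sketch:encode(<<255,239>>) yields <<"_-8">>. Each call makes several regular-expression passes over the data, which helps explain why a single-pass NIF can be much faster.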

Timeslice Consumption

NIF implementations must take care to avoid doing lengthy work in a single invocation, because a long-running NIF blocks the scheduler thread it runs on. This library yields back to the Erlang VM as needed when operating on a large string, keeping a partial result until it can resume. The current implementation uses a conservative heuristic that estimates 64 bytes of encoding or decoding to consume 1% of a timeslice, so inputs shorter than about 6400 bytes (64 bytes × 100) should be encoded or decoded within a single invocation, and the library should not adversely affect the responsiveness of the VM.
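
The arithmetic behind that threshold can be sketched in a few lines. The module and function names below are hypothetical, and the invocation estimate assumes every call starts with a full timeslice, which the VM does not guarantee, so treat it as a rough guide rather than a description of the NIF's actual behaviour.

```erlang
-module(timeslice_sketch).
-export([bytes_per_timeslice/0, estimated_invocations/1]).

%% From the heuristic above: 64 bytes of work is estimated to cost 1%
%% of a timeslice.
-define(BYTES_PER_PERCENT, 64).

%% Bytes that fit into a full (100%) timeslice under that estimate.
bytes_per_timeslice() ->
    ?BYTES_PER_PERCENT * 100.

%% Rough number of invocations needed for an input of Size bytes,
%% assuming (hypothetically) that each invocation gets a whole timeslice.
%% Rounds up, and never reports fewer than one invocation.
estimated_invocations(Size) when is_integer(Size), Size >= 0 ->
    PerSlice = bytes_per_timeslice(),
    max(1, (Size + PerSlice - 1) div PerSlice).
```

Under these assumptions timeslice_sketch:bytes_per_timeslice() returns 6400, an input of exactly 6400 bytes is estimated at one invocation, and an input of 6401 bytes at two.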