Add an option to ignore UTF-8 encoding errors

By default, Jiffy is quite strict about what it encodes: it will not
allow invalid UTF-8 to be produced. This can cause issues when
attempting to encode JSON that was decoded by other libraries, because
UTF-8 semantics are not uniformly enforced.

This patch adds an option 'force_utf8' to the encoder. If encoding hits
an error for an invalid string it will forcefully mutate the object to
contain only valid UTF-8 and return the resulting encoded JSON.

For the most part this means it will strip any garbage data from
binaries, replacing it with the replacement codepoint U+FFFD. It will
also try to fix the common error of encoding surrogate pairs as
three-byte sequences and re-encode them into UTF-8 properly.
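
The repair described above can be sketched in plain Erlang. This is
only an illustration under stated assumptions: the module name
utf8_fix_sketch and the function fix/1 are hypothetical, and the sketch
substitutes one U+FFFD per invalid byte, which may differ from what the
C encoder in c_src/ actually does.

```erlang
-module(utf8_fix_sketch).
-export([fix/1]).

%% Hypothetical helper, not Jiffy's implementation.
%% Returns a binary that contains only valid UTF-8.
fix(Bin) when is_binary(Bin) ->
    fix(Bin, <<>>).

%% A high surrogate followed by a low surrogate, each written as a
%% three-byte sequence (ED A0..AF xx, then ED B0..BF xx). Combine the
%% pair into one code point and emit it as proper four-byte UTF-8.
fix(<<16#ED, 2#1010:4, H1:4, 2#10:2, H2:6,
      16#ED, 2#1011:4, L1:4, 2#10:2, L2:6, Rest/binary>>, Acc) ->
    High = (H1 bsl 6) + H2,          % top 10 bits of the offset
    Low = (L1 bsl 6) + L2,           % bottom 10 bits of the offset
    CP = 16#10000 + (High bsl 10) + Low,
    fix(Rest, <<Acc/binary, CP/utf8>>);
%% A well-formed code point is copied through unchanged.
fix(<<CP/utf8, Rest/binary>>, Acc) ->
    fix(Rest, <<Acc/binary, CP/utf8>>);
%% Anything else is garbage: drop one byte and substitute U+FFFD.
fix(<<_Bad, Rest/binary>>, Acc) ->
    fix(Rest, <<Acc/binary, 16#FFFD/utf8>>);
fix(<<>>, Acc) ->
    Acc.
```

For example, the six bytes ED A0 BD ED B8 80 (U+1F600 written as a
surrogate pair) come back as the four bytes F0 9F 98 80.
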
6 files changed
tree: 874dc6aeede4420393e3f8ec22449f4c9795fbe1
  1. c_src/
  2. src/
  3. test/
  4. .gitignore
  5. LICENSE
  6. Makefile
  7. README.md
  8. rebar
  9. rebar.config
README.md

Jiffy - JSON NIFs for Erlang

A JSON parser as a NIF. This is a complete rewrite of the work I did in EEP0018 that was based on Yajl. This new version is a hand-crafted state machine that does its best to be as quick and efficient as possible while not placing any constraints on the parsed JSON.

Usage

Jiffy is a simple API. The only thing that might catch you off guard is that the return type of jiffy:encode/1 is an iolist even though it returns a binary most of the time.
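
Since the result may be either a binary or a nested iolist, callers that need a flat binary can normalize it. The following is a minimal sketch, assuming a hypothetical module iolist_sketch; it uses only the standard iolist_to_binary/1 and iolist_size/1 BIFs, so it runs without Jiffy installed.

```erlang
-module(iolist_sketch).
-export([to_binary/1, byte_len/1]).

%% Hypothetical helpers, not part of Jiffy's API.

%% Accepts a flat binary or a nested iolist and returns a binary.
to_binary(Encoded) ->
    iolist_to_binary(Encoded).

%% Size in bytes, computed without flattening the iolist first.
byte_len(Encoded) ->
    iolist_size(Encoded).
```

In practice this would be called as iolist_sketch:to_binary(jiffy:encode(Doc)). Functions such as gen_tcp:send/2 and file:write/2 accept iolists directly, so flattening is only needed when a binary is strictly required.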

A quick note on Unicode: Jiffy only understands UTF-8 in binaries. End of story. There is also a jiffy:encode/2 that takes a list of options for encoding. The supported options are uescape and, with this patch, force_utf8.

Errors are raised as exceptions.

Eshell V5.8.2  (abort with ^G)
1> jiffy:decode(<<"{\"foo\": \"bar\"}">>).
{[{<<"foo">>,<<"bar">>}]}
2> Doc = {[{foo, [<<"bing">>, 2.3, true]}]}.
{[{foo,[<<"bing">>,2.3,true]}]}
3> jiffy:encode(Doc).
<<"{\"foo\":[\"bing\",2.2999999999999998224,true]}">>

Data Format

Erlang                          JSON            Erlang
==========================================================================

null                       -> null           -> null
true                       -> true           -> true
false                      -> false          -> false
"hi"                       -> [104, 105]     -> [104, 105]
<<"hi">>                   -> "hi"           -> <<"hi">>
hi                         -> "hi"           -> <<"hi">>
1                          -> 1              -> 1
1.25                       -> 1.25           -> 1.25
[]                         -> []             -> []
[true, 1.0]                -> [true, 1.0]    -> [true, 1.0]
{[]}                       -> {}             -> {[]}
{[{foo, bar}]}             -> {"foo": "bar"} -> {[{<<"foo">>, <<"bar">>}]}
{[{<<"foo">>, <<"bar">>}]} -> {"foo": "bar"} -> {[{<<"foo">>, <<"bar">>}]}
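
Decoded objects always take the {[{Key, Value}]} shape with binary keys, so a lookup is a key search over the inner list. A minimal sketch, assuming a hypothetical module ejson_sketch (these accessors are not part of Jiffy):

```erlang
-module(ejson_sketch).
-export([get_value/2, get_value/3]).

%% Hypothetical accessors for the {[{Key, Value}]} object shape.

%% Look up Key in a decoded object; returns undefined when absent.
get_value(Key, Obj) ->
    get_value(Key, Obj, undefined).

%% Same lookup, with a caller-supplied default.
get_value(Key, {Props}, Default) when is_binary(Key), is_list(Props) ->
    case lists:keyfind(Key, 1, Props) of
        {Key, Value} -> Value;
        false -> Default
    end.
```

Keys are matched as binaries because that is what decoding produces, even when the object was originally encoded from atoms.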

Improvements over EEP0018

Jiffy should be in all ways an improvement over EEP0018. It no longer imposes limits on the nesting depth, it is capable of encoding and decoding large numbers, and it does quite a bit more checking that strings contain valid UTF-8.