| commit | 414827d604b0f28974bd666f7da1068bb36b44ae | |
|---|---|---|
| author | Paul J. Davis <paul.joseph.davis@gmail.com> | Fri Jun 01 09:53:41 2012 -0500 |
| committer | Paul J. Davis <paul.joseph.davis@gmail.com> | Fri Jun 01 10:35:02 2012 -0500 |
| tree | 874dc6aeede4420393e3f8ec22449f4c9795fbe1 | |
| parent | 6f589d457673a3bbdf4329103fbe2b10b302ec87 | |
Add an option to ignore UTF-8 encoding errors

By default, Jiffy is quite strict in what it encodes: it will not allow invalid UTF-8 to be produced. This can cause issues when attempting to encode JSON that was decoded by other libraries, because UTF-8 semantics are not uniformly enforced.

This patch adds an option, 'force_utf8', to the encoder. If encoding hits an error for an invalid string, it will forcefully convert the object to contain only valid UTF-8 and return the resulting encoded JSON. For the most part this means it will strip any garbage data from binaries, replacing it with the replacement code point U+FFFD. It will also try to fix the common error of encoding surrogate pairs as three-byte sequences, re-encoding them into UTF-8 properly.
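The two repairs described above can be illustrated in pure Erlang. The sketch below is not Jiffy's implementation (which lives in C); the module name utf8_fix and the function sanitize/1 are hypothetical. It assumes a binary input, first re-joins a high/low surrogate pair that was written as two three-byte sequences, copies every valid UTF-8 sequence through unchanged, and replaces each remaining stray byte with U+FFFD. Jiffy's exact replacement policy (for example, how many U+FFFD characters a broken sequence produces) may differ.

```erlang
-module(utf8_fix).
-export([sanitize/1]).

%% Hypothetical helper, not part of Jiffy: return a binary that
%% contains only valid UTF-8.
sanitize(Bin) when is_binary(Bin) ->
    sanitize(Bin, <<>>).

%% A high surrogate (U+D800..U+DBFF, bytes ED A0..AF xx) followed by a
%% low surrogate (U+DC00..U+DFFF, bytes ED B0..BF xx): decode both
%% halves and re-encode the combined code point as four-byte UTF-8.
sanitize(<<16#ED, A, B, 16#ED, C, D, Rest/binary>>, Acc)
  when A >= 16#A0, A =< 16#AF, B >= 16#80, B =< 16#BF,
       C >= 16#B0, C =< 16#BF, D >= 16#80, D =< 16#BF ->
    Hi = 16#D000 bor ((A band 16#3F) bsl 6) bor (B band 16#3F),
    Lo = 16#D000 bor ((C band 16#3F) bsl 6) bor (D band 16#3F),
    CP = 16#10000 + ((Hi - 16#D800) bsl 10) + (Lo - 16#DC00),
    sanitize(Rest, <<Acc/binary, CP/utf8>>);
%% Any valid UTF-8 sequence is copied through unchanged (the utf8
%% segment type rejects lone surrogates and overlong forms).
sanitize(<<CP/utf8, Rest/binary>>, Acc) ->
    sanitize(Rest, <<Acc/binary, CP/utf8>>);
%% Anything else is a stray byte: emit U+FFFD and skip exactly one byte.
sanitize(<<_, Rest/binary>>, Acc) ->
    sanitize(Rest, <<Acc/binary, 16#FFFD/utf8>>);
sanitize(<<>>, Acc) ->
    Acc.
```

For example, utf8_fix:sanitize(<<"a", 16#FF, "b">>) gives <<"a", 16#EF, 16#BF, 16#BD, "b">>, where EF BF BD is the UTF-8 encoding of U+FFFD.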
A JSON parser as a NIF. This is a complete rewrite of the work I did in EEP0018 that was based on Yajl. This new version is a hand-crafted state machine that does its best to be as quick and efficient as possible while not placing any constraints on the parsed JSON.
Jiffy has a simple API. The only thing that might catch you off guard is that the return type of jiffy:encode/1 is an iolist, even though it returns a binary most of the time.
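When a caller needs a flat binary, the standard iolist_to_binary/1 BIF converts any iolist (and returns a binary unchanged). In the sketch below, the hand-written iolist is only a stand-in for an encoder result, not actual Jiffy output, and the module and function names are hypothetical.

```erlang
-module(iolist_demo).
-export([sample/0, to_binary/1]).

%% Hypothetical stand-in for what an encoder might return: nested lists
%% mixing binaries and single bytes rather than one flat binary.
sample() ->
    [${, [<<"\"foo\"">>, $:], <<"\"bar\"">>, $}].

%% Flatten any iolist (or pass a binary through) into a single binary.
to_binary(IoData) ->
    iolist_to_binary(IoData).
```

With Jiffy loaded, the equivalent call would be iolist_demo:to_binary(jiffy:encode(Doc)). Functions such as file:write_file/2 accept iodata directly, so flattening is only needed when an actual binary is required, for example to compare results with =:=.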
A quick note on Unicode: Jiffy only understands UTF-8 in binaries. End of story.

There is also a jiffy:encode/2 that takes a list of encoding options. Currently the only supported option is uescape.
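The uescape option is only named here. Assuming it escapes every non-ASCII code point in a string as a JSON \uXXXX sequence (with code points above U+FFFF written as a UTF-16 surrogate pair, as the JSON specification requires), the transformation could look like the hypothetical helper below. It is not Jiffy's code, it uses lowercase hex digits as an arbitrary choice, and it ignores the other JSON escapes (quotes, backslashes, control characters).

```erlang
-module(uescape_demo).
-export([escape/1]).

%% Hypothetical helper: rewrite every non-ASCII code point of a UTF-8
%% binary as a JSON \uXXXX escape; ASCII bytes are copied as-is.
escape(Bin) when is_binary(Bin) ->
    << <<(escape_cp(CP))/binary>> || <<CP/utf8>> <= Bin >>.

escape_cp(CP) when CP < 16#80 ->
    <<CP>>;
escape_cp(CP) when CP < 16#10000 ->
    hex4(CP);
escape_cp(CP) ->
    %% Code points above the BMP become a UTF-16 surrogate pair.
    C = CP - 16#10000,
    <<(hex4(16#D800 + (C bsr 10)))/binary,
      (hex4(16#DC00 + (C band 16#3FF)))/binary>>.

%% Format one 16-bit unit as \u followed by four zero-padded,
%% lowercase hex digits.
hex4(N) ->
    iolist_to_binary(io_lib:format("\\u~4.16.0b", [N])).
```

Under these assumptions, U+00E9 becomes \u00e9 and U+1F600 becomes \ud83d\ude00, so the output is plain 7-bit ASCII.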
Errors are raised as exceptions.
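Because errors are raised rather than returned, callers that prefer tagged tuples can wrap the call. The exact exception class and reason term Jiffy uses are not documented here, so this minimal sketch catches both throw and error; the module safe_json and the function call/2 are hypothetical.

```erlang
-module(safe_json).
-export([call/2]).

%% Hypothetical wrapper: run Fun(Arg) and turn an exception into
%% {error, Reason} instead of letting it propagate.
call(Fun, Arg) when is_function(Fun, 1) ->
    try Fun(Arg) of
        Result -> {ok, Result}
    catch
        throw:Reason -> {error, Reason};
        error:Reason -> {error, Reason}
    end.
```

With Jiffy loaded, safe_json:call(fun jiffy:decode/1, Input) would then return either {ok, Term} or {error, Reason}.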
```
Eshell V5.8.2  (abort with ^G)
1> jiffy:decode(<<"{\"foo\": \"bar\"}">>).
{[{<<"foo">>,<<"bar">>}]}
2> Doc = {[{foo, [<<"bing">>, 2.3, true]}]}.
{[{foo,[<<"bing">>,2.3,true]}]}
3> jiffy:encode(Doc).
<<"{\"foo\":[\"bing\",2.2999999999999998224,true]}">>
```
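The 2.2999999999999998224 in the encoded output is not a different number. 2.3 has no exact binary representation, and the encoder printed 20 significant digits of the nearest IEEE 754 double, more than the 17 needed to identify it uniquely. A quick check that both literals parse to the same 64-bit double (the module float_check is hypothetical):

```erlang
-module(float_check).
-export([same_double/2]).

%% Hypothetical helper: true when two floats have identical 64-bit
%% IEEE 754 bit patterns.
same_double(A, B) when is_float(A), is_float(B) ->
    <<A/float>> =:= <<B/float>>.
```

So decoding the emitted JSON yields exactly the float that went in, even though the text looks different.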
```
Erlang                        JSON              Erlang
==========================================================================
null                       -> null           -> null
true                       -> true           -> true
false                      -> false          -> false
"hi"                       -> [104, 105]     -> [104, 105]
<<"hi">>                   -> "hi"           -> <<"hi">>
hi                         -> "hi"           -> <<"hi">>
1                          -> 1              -> 1
1.25                       -> 1.25           -> 1.25
[]                         -> []             -> []
[true, 1.0]                -> [true, 1.0]    -> [true, 1.0]
{[]}                       -> {}             -> {[]}
{[{foo, bar}]}             -> {"foo": "bar"} -> {[{<<"foo">>, <<"bar">>}]}
{[{<<"foo">>, <<"bar">>}]} -> {"foo": "bar"} -> {[{<<"foo">>, <<"bar">>}]}
```
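The table can be read as a round-trip rule. The hypothetical function below (not part of Jiffy) predicts, assuming the table covers these term shapes completely, what decoding the encoded form of a term gives back: atoms other than null, true and false come back as binaries, object keys come back as binaries, Erlang strings come back as lists of integers, and binaries and numbers are returned as-is.

```erlang
-module(roundtrip).
-export([expected/1]).

%% Hypothetical: predict decode(encode(Term)) from the mapping table.
expected(Atom) when Atom =:= null; Atom =:= true; Atom =:= false ->
    Atom;
expected(Atom) when is_atom(Atom) ->
    %% Other atoms are encoded as JSON strings, decoded as binaries.
    atom_to_binary(Atom, utf8);
expected({Pairs}) when is_list(Pairs) ->
    %% Objects: keys become binaries, values are converted recursively.
    {[{key(K), expected(V)} || {K, V} <- Pairs]};
expected(List) when is_list(List) ->
    %% Lists, including Erlang strings such as "hi", are JSON arrays.
    [expected(E) || E <- List];
expected(Other) ->
    %% Binaries, integers and floats come back unchanged.
    Other.

key(K) when is_atom(K) -> atom_to_binary(K, utf8);
key(K) when is_binary(K) -> K.
```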
Jiffy should be in all ways an improvement over EEP0018. It no longer imposes limits on the nesting depth, it is capable of encoding and decoding large numbers, and it does quite a bit more checking for valid UTF-8 in strings.