| <!-- doc/src/sgml/json.sgml --> |
| |
| <sect1 id="datatype-json"> |
| <title><acronym>JSON</acronym> Types</title> |
| |
| <indexterm zone="datatype-json"> |
| <primary>JSON</primary> |
| </indexterm> |
| |
| <indexterm zone="datatype-json"> |
| <primary>JSONB</primary> |
| </indexterm> |
| |
| <para> |
| JSON data types are for storing JSON (JavaScript Object Notation) |
| data, as specified in <ulink url="https://tools.ietf.org/html/rfc7159">RFC |
| 7159</ulink>. Such data can also be stored as <type>text</type>, but |
| the JSON data types have the advantage of enforcing that each |
| stored value is valid according to the JSON rules. There are also |
| assorted JSON-specific functions and operators available for data stored |
| in these data types; see <xref linkend="functions-json"/>. |
| </para> |
| |
| <para> |
| <productname>PostgreSQL</productname> offers two types for storing JSON |
| data: <type>json</type> and <type>jsonb</type>. To implement efficient query |
| mechanisms for these data types, <productname>PostgreSQL</productname> |
| also provides the <type>jsonpath</type> data type described in |
| <xref linkend="datatype-jsonpath"/>. |
| </para> |
| |
| <para> |
| The <type>json</type> and <type>jsonb</type> data types |
| accept <emphasis>almost</emphasis> identical sets of values as |
| input. The major practical difference is one of efficiency. The |
| <type>json</type> data type stores an exact copy of the input text, |
| which processing functions must reparse on each execution; while |
| <type>jsonb</type> data is stored in a decomposed binary format that |
| makes it slightly slower to input due to added conversion |
| overhead, but significantly faster to process, since no reparsing |
| is needed. <type>jsonb</type> also supports indexing, which can be a |
| significant advantage. |
| </para> |
| |
| <para> |
| Because the <type>json</type> type stores an exact copy of the input text, it |
| will preserve semantically-insignificant white space between tokens, as |
| well as the order of keys within JSON objects. Also, if a JSON object |
| within the value contains the same key more than once, all the key/value |
| pairs are kept. (The processing functions consider the last value as the |
| operative one.) By contrast, <type>jsonb</type> does not preserve white |
| space, does not preserve the order of object keys, and does not keep |
| duplicate object keys. If duplicate keys are specified in the input, |
| only the last value is kept. |
| </para> |
| |
| <para> |
| In general, most applications should prefer to store JSON data as |
| <type>jsonb</type>, unless there are quite specialized needs, such as |
| legacy assumptions about ordering of object keys. |
| </para> |
| |
| <para> |
| <acronym>RFC</acronym> 7159 specifies that JSON strings should be encoded in UTF8. |
| It is therefore not possible for the JSON |
| types to conform rigidly to the JSON specification unless the database |
| encoding is UTF8. Attempts to directly include characters that |
| cannot be represented in the database encoding will fail; conversely, |
| characters that can be represented in the database encoding but not |
| in UTF8 will be allowed. |
| </para> |
| |
| <para> |
| <acronym>RFC</acronym> 7159 permits JSON strings to contain Unicode escape sequences |
| denoted by <literal>\u<replaceable>XXXX</replaceable></literal>. In the input |
| function for the <type>json</type> type, Unicode escapes are allowed |
| regardless of the database encoding, and are checked only for syntactic |
| correctness (that is, that four hex digits follow <literal>\u</literal>). |
| However, the input function for <type>jsonb</type> is stricter: it disallows |
| Unicode escapes for characters that cannot be represented in the database |
| encoding. The <type>jsonb</type> type also |
| rejects <literal>\u0000</literal> (because that cannot be represented in |
| <productname>PostgreSQL</productname>'s <type>text</type> type), and it insists |
| that any use of Unicode surrogate pairs to designate characters outside |
| the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes |
| are converted to the equivalent single character for storage; |
| this includes folding surrogate pairs into a single character. |
| </para> |
| |
| <note> |
| <para> |
| Many of the JSON processing functions described |
| in <xref linkend="functions-json"/> will convert Unicode escapes to |
| regular characters, and will therefore throw the same types of errors |
| just described even if their input is of type <type>json</type> |
| not <type>jsonb</type>. The fact that the <type>json</type> input function does |
| not make these checks may be considered a historical artifact, although |
| it does allow for simple storage (without processing) of JSON Unicode |
| escapes in a database encoding that does not support the represented |
| characters. |
| </para> |
| </note> |
| |
| <para> |
| When converting textual JSON input into <type>jsonb</type>, the primitive |
| types described by <acronym>RFC</acronym> 7159 are effectively mapped onto |
| native <productname>PostgreSQL</productname> types, as shown |
| in <xref linkend="json-type-mapping-table"/>. |
| Therefore, there are some minor additional constraints on what |
| constitutes valid <type>jsonb</type> data that do not apply to |
| the <type>json</type> type, nor to JSON in the abstract, corresponding |
| to limits on what can be represented by the underlying data type. |
| Notably, <type>jsonb</type> will reject numbers that are outside the |
| range of the <productname>PostgreSQL</productname> <type>numeric</type> data |
| type, while <type>json</type> will not. Such implementation-defined |
| restrictions are permitted by <acronym>RFC</acronym> 7159. However, in |
| practice such problems are far more likely to occur in other |
| implementations, as it is common to represent JSON's <type>number</type> |
| primitive type as IEEE 754 double precision floating point |
| (which <acronym>RFC</acronym> 7159 explicitly anticipates and allows for). |
| When using JSON as an interchange format with such systems, the danger |
| of losing numeric precision compared to data originally stored |
| by <productname>PostgreSQL</productname> should be considered. |
| </para> |
| |
| <para> |
| Conversely, as noted in the table there are some minor restrictions on |
| the input format of JSON primitive types that do not apply to |
| the corresponding <productname>PostgreSQL</productname> types. |
| </para> |
| |
| <table id="json-type-mapping-table"> |
| <title>JSON Primitive Types and Corresponding <productname>PostgreSQL</productname> Types</title> |
| <tgroup cols="3"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="1*"/> |
| <colspec colname="col3" colwidth="2*"/> |
| <thead> |
| <row> |
| <entry>JSON primitive type</entry> |
| <entry><productname>PostgreSQL</productname> type</entry> |
| <entry>Notes</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry><type>string</type></entry> |
| <entry><type>text</type></entry> |
| <entry><literal>\u0000</literal> is disallowed, as are Unicode escapes |
| representing characters not available in the database encoding</entry> |
| </row> |
| <row> |
| <entry><type>number</type></entry> |
| <entry><type>numeric</type></entry> |
| <entry><literal>NaN</literal> and <literal>infinity</literal> values are disallowed</entry> |
| </row> |
| <row> |
| <entry><type>boolean</type></entry> |
| <entry><type>boolean</type></entry> |
| <entry>Only lowercase <literal>true</literal> and <literal>false</literal> spellings are accepted</entry> |
| </row> |
| <row> |
| <entry><type>null</type></entry> |
| <entry>(none)</entry> |
| <entry>SQL <literal>NULL</literal> is a different concept</entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <sect2 id="json-keys-elements"> |
| <title>JSON Input and Output Syntax</title> |
| <para> |
| The input/output syntax for the JSON data types is as specified in |
| <acronym>RFC</acronym> 7159. |
| </para> |
| <para> |
| The following are all valid <type>json</type> (or <type>jsonb</type>) expressions: |
| <programlisting> |
| -- Simple scalar/primitive value |
| -- Primitive values can be numbers, quoted strings, true, false, or null |
| SELECT '5'::json; |
| |
| -- Array of zero or more elements (elements need not be of same type) |
| SELECT '[1, 2, "foo", null]'::json; |
| |
| -- Object containing pairs of keys and values |
| -- Note that object keys must always be quoted strings |
| SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json; |
| |
| -- Arrays and objects can be nested arbitrarily |
| SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json; |
| </programlisting> |
| </para> |
| |
| <para> |
| As previously stated, when a JSON value is input and then printed without |
| any additional processing, <type>json</type> outputs the same text that was |
| input, while <type>jsonb</type> does not preserve semantically-insignificant |
| details such as whitespace. For example, note the differences here: |
| <programlisting> |
| SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::json; |
| json |
| ------------------------------------------------- |
| {"bar": "baz", "balance": 7.77, "active":false} |
| (1 row) |
| |
| SELECT '{"bar": "baz", "balance": 7.77, "active":false}'::jsonb; |
| jsonb |
| -------------------------------------------------- |
| {"bar": "baz", "active": false, "balance": 7.77} |
| (1 row) |
| </programlisting> |
| One semantically-insignificant detail worth noting is that |
| in <type>jsonb</type>, numbers will be printed according to the behavior of the |
| underlying <type>numeric</type> type. In practice this means that numbers |
| entered with <literal>E</literal> notation will be printed without it, for |
| example: |
| <programlisting> |
| SELECT '{"reading": 1.230e-5}'::json, '{"reading": 1.230e-5}'::jsonb; |
| json | jsonb |
| -----------------------+------------------------- |
| {"reading": 1.230e-5} | {"reading": 0.00001230} |
| (1 row) |
| </programlisting> |
| However, <type>jsonb</type> will preserve trailing fractional zeroes, as seen |
| in this example, even though those are semantically insignificant for |
| purposes such as equality checks. |
| </para> |
| |
| <para> |
| For the list of built-in functions and operators available for |
| constructing and processing JSON values, see <xref linkend="functions-json"/>. |
| </para> |
| </sect2> |
| |
| <sect2 id="json-doc-design"> |
| <title>Designing JSON Documents</title> |
| <para> |
| Representing data as JSON can be considerably more flexible than |
| the traditional relational data model, which is compelling in |
| environments where requirements are fluid. It is quite possible |
| for both approaches to co-exist and complement each other within |
| the same application. However, even for applications where maximal |
| flexibility is desired, it is still recommended that JSON documents |
| have a somewhat fixed structure. The structure is typically |
| unenforced (though enforcing some business rules declaratively is |
| possible), but having a predictable structure makes it easier to write |
| queries that usefully summarize a set of <quote>documents</quote> (datums) |
| in a table. |
| </para> |
| <para> |
| JSON data is subject to the same concurrency-control |
| considerations as any other data type when stored in a table. |
| Although storing large documents is practicable, keep in mind that |
| any update acquires a row-level lock on the whole row. |
| Consider limiting JSON documents to a |
| manageable size in order to decrease lock contention among updating |
| transactions. Ideally, JSON documents should each |
| represent an atomic datum that business rules dictate cannot |
| reasonably be further subdivided into smaller datums that |
| could be modified independently. |
| </para> |
| </sect2> |
| |
| <sect2 id="json-containment"> |
| <title><type>jsonb</type> Containment and Existence</title> |
| <indexterm> |
| <primary>jsonb</primary> |
| <secondary>containment</secondary> |
| </indexterm> |
| <indexterm> |
| <primary>jsonb</primary> |
| <secondary>existence</secondary> |
| </indexterm> |
| <para> |
| Testing <firstterm>containment</firstterm> is an important capability of |
| <type>jsonb</type>. There is no parallel set of facilities for the |
| <type>json</type> type. Containment tests whether |
| one <type>jsonb</type> document has contained within it another one. |
| These examples return true except as noted: |
| </para> |
| <programlisting> |
| -- Simple scalar/primitive values contain only the identical value: |
| SELECT '"foo"'::jsonb @> '"foo"'::jsonb; |
| |
| -- The array on the right side is contained within the one on the left: |
| SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb; |
| |
| -- Order of array elements is not significant, so this is also true: |
| SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb; |
| |
| -- Duplicate array elements don't matter either: |
| SELECT '[1, 2, 3]'::jsonb @> '[1, 2, 2]'::jsonb; |
| |
| -- The object with a single pair on the right side is contained |
| -- within the object on the left side: |
| SELECT '{"product": "PostgreSQL", "version": 9.4, "jsonb": true}'::jsonb @> '{"version": 9.4}'::jsonb; |
| |
| -- The array on the right side is <emphasis>not</emphasis> considered contained within the |
| -- array on the left, even though a similar array is nested within it: |
| SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false |
| |
| -- But with a layer of nesting, it is contained: |
| SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb; |
| |
| -- Similarly, containment is not reported here: |
| SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"bar": "baz"}'::jsonb; -- yields false |
| |
| -- A top-level key and an empty object is contained: |
| SELECT '{"foo": {"bar": "baz"}}'::jsonb @> '{"foo": {}}'::jsonb; |
| </programlisting> |
| |
| <para> |
| The general principle is that the contained object must match the |
| containing object as to structure and data contents, possibly after |
| discarding some non-matching array elements or object key/value pairs |
| from the containing object. |
| But remember that the order of array elements is not significant when |
| doing a containment match, and duplicate array elements are effectively |
| considered only once. |
| </para> |
| |
| <para> |
| As a special exception to the general principle that the structures |
| must match, an array may contain a primitive value: |
| </para> |
| <programlisting> |
| -- This array contains the primitive string value: |
| SELECT '["foo", "bar"]'::jsonb @> '"bar"'::jsonb; |
| |
| -- This exception is not reciprocal -- non-containment is reported here: |
| SELECT '"bar"'::jsonb @> '["bar"]'::jsonb; -- yields false |
| </programlisting> |
| |
| <para> |
| <type>jsonb</type> also has an <firstterm>existence</firstterm> operator, which is |
| a variation on the theme of containment: it tests whether a string |
| (given as a <type>text</type> value) appears as an object key or array |
| element at the top level of the <type>jsonb</type> value. |
| These examples return true except as noted: |
| </para> |
| <programlisting> |
| -- String exists as array element: |
| SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar'; |
| |
| -- String exists as object key: |
| SELECT '{"foo": "bar"}'::jsonb ? 'foo'; |
| |
| -- Object values are not considered: |
| SELECT '{"foo": "bar"}'::jsonb ? 'bar'; -- yields false |
| |
| -- As with containment, existence must match at the top level: |
| SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar'; -- yields false |
| |
| -- A string is considered to exist if it matches a primitive JSON string: |
| SELECT '"foo"'::jsonb ? 'foo'; |
| </programlisting> |
| |
| <para> |
| JSON objects are better suited than arrays for testing containment or |
| existence when there are many keys or elements involved, because |
| unlike arrays they are internally optimized for searching, and do not |
| need to be searched linearly. |
| </para> |
| |
| <tip> |
| <para> |
| Because JSON containment is nested, an appropriate query can skip |
| explicit selection of sub-objects. As an example, suppose that we have |
| a <structfield>doc</structfield> column containing objects at the top level, with |
| most objects containing <literal>tags</literal> fields that contain arrays of |
| sub-objects. This query finds entries in which sub-objects containing |
| both <literal>"term":"paris"</literal> and <literal>"term":"food"</literal> appear, |
| while ignoring any such keys outside the <literal>tags</literal> array: |
| <programlisting> |
| SELECT doc->'site_name' FROM websites |
| WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}'; |
| </programlisting> |
| One could accomplish the same thing with, say, |
| <programlisting> |
| SELECT doc->'site_name' FROM websites |
| WHERE doc->'tags' @> '[{"term":"paris"}, {"term":"food"}]'; |
| </programlisting> |
| but that approach is less flexible, and often less efficient as well. |
| </para> |
| |
| <para> |
| On the other hand, the JSON existence operator is not nested: it will |
| only look for the specified key or array element at top level of the |
| JSON value. |
| </para> |
| </tip> |
| |
| <para> |
| The various containment and existence operators, along with all other |
| JSON operators and functions are documented |
| in <xref linkend="functions-json"/>. |
| </para> |
| </sect2> |
| |
| <sect2 id="json-indexing"> |
| <title><type>jsonb</type> Indexing</title> |
| <indexterm> |
| <primary>jsonb</primary> |
| <secondary>indexes on</secondary> |
| </indexterm> |
| |
| <para> |
| GIN indexes can be used to efficiently search for |
| keys or key/value pairs occurring within a large number of |
| <type>jsonb</type> documents (datums). |
| Two GIN <quote>operator classes</quote> are provided, offering different |
| performance and flexibility trade-offs. |
| </para> |
| <para> |
| The default GIN operator class for <type>jsonb</type> supports queries with |
| the key-exists operators <literal>?</literal>, <literal>?|</literal> |
| and <literal>?&</literal>, the containment operator |
| <literal>@></literal>, and the <type>jsonpath</type> match |
| operators <literal>@?</literal> and <literal>@@</literal>. |
| (For details of the semantics that these operators |
| implement, see <xref linkend="functions-jsonb-op-table"/>.) |
| An example of creating an index with this operator class is: |
| <programlisting> |
| CREATE INDEX idxgin ON api USING GIN (jdoc); |
| </programlisting> |
| The non-default GIN operator class <literal>jsonb_path_ops</literal> |
| does not support the key-exists operators, but it does support |
| <literal>@></literal>, <literal>@?</literal> and <literal>@@</literal>. |
| An example of creating an index with this operator class is: |
| <programlisting> |
| CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops); |
| </programlisting> |
| </para> |
| |
| <para> |
| Consider the example of a table that stores JSON documents |
| retrieved from a third-party web service, with a documented schema |
| definition. A typical document is: |
| <programlisting> |
| { |
| "guid": "9c36adc1-7fb5-4d5b-83b4-90356a46061a", |
| "name": "Angela Barton", |
| "is_active": true, |
| "company": "Magnafone", |
| "address": "178 Howard Place, Gulf, Washington, 702", |
| "registered": "2009-11-07T08:53:22 +08:00", |
| "latitude": 19.793713, |
| "longitude": 86.513373, |
| "tags": [ |
| "enim", |
| "aliquip", |
| "qui" |
| ] |
| } |
| </programlisting> |
| We store these documents in a table named <structname>api</structname>, |
| in a <type>jsonb</type> column named <structfield>jdoc</structfield>. |
| If a GIN index is created on this column, |
| queries like the following can make use of the index: |
| <programlisting> |
| -- Find documents in which the key "company" has value "Magnafone" |
| SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"company": "Magnafone"}'; |
| </programlisting> |
| However, the index could not be used for queries like the |
| following, because though the operator <literal>?</literal> is indexable, |
| it is not applied directly to the indexed column <structfield>jdoc</structfield>: |
| <programlisting> |
| -- Find documents in which the key "tags" contains key or array element "qui" |
| SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc -> 'tags' ? 'qui'; |
| </programlisting> |
| Still, with appropriate use of expression indexes, the above |
| query can use an index. If querying for particular items within |
| the <literal>"tags"</literal> key is common, defining an index like this |
| may be worthwhile: |
| <programlisting> |
| CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags')); |
| </programlisting> |
| Now, the <literal>WHERE</literal> clause <literal>jdoc -> 'tags' ? 'qui'</literal> |
| will be recognized as an application of the indexable |
| operator <literal>?</literal> to the indexed |
| expression <literal>jdoc -> 'tags'</literal>. |
| (More information on expression indexes can be found in <xref |
| linkend="indexes-expressional"/>.) |
| </para> |
| |
| <para> |
| Another approach to querying is to exploit containment, for example: |
| <programlisting> |
| -- Find documents in which the key "tags" contains array element "qui" |
| SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @> '{"tags": ["qui"]}'; |
| </programlisting> |
| A simple GIN index on the <structfield>jdoc</structfield> column can support this |
| query. But note that such an index will store copies of every key and |
| value in the <structfield>jdoc</structfield> column, whereas the expression index |
| of the previous example stores only data found under |
| the <literal>tags</literal> key. While the simple-index approach is far more |
| flexible (since it supports queries about any key), targeted expression |
| indexes are likely to be smaller and faster to search than a simple |
| index. |
| </para> |
| |
| <para> |
| GIN indexes also support the <literal>@?</literal> |
| and <literal>@@</literal> operators, which |
| perform <type>jsonpath</type> matching. Examples are |
| <programlisting> |
| SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @? '$.tags[*] ? (@ == "qui")'; |
| </programlisting> |
| <programlisting> |
| SELECT jdoc->'guid', jdoc->'name' FROM api WHERE jdoc @@ '$.tags[*] == "qui"'; |
| </programlisting> |
| For these operators, a GIN index extracts clauses of the form |
| <literal><replaceable>accessors_chain</replaceable> |
| = <replaceable>constant</replaceable></literal> out of |
| the <type>jsonpath</type> pattern, and does the index search based on |
| the keys and values mentioned in these clauses. The accessors chain |
| may include <literal>.<replaceable>key</replaceable></literal>, |
| <literal>[*]</literal>, |
| and <literal>[<replaceable>index</replaceable>]</literal> accessors. |
| The <literal>jsonb_ops</literal> operator class also |
| supports <literal>.*</literal> and <literal>.**</literal> accessors, |
| but the <literal>jsonb_path_ops</literal> operator class does not. |
| </para> |
| |
| <para> |
| Although the <literal>jsonb_path_ops</literal> operator class supports |
| only queries with the <literal>@></literal>, <literal>@?</literal> |
| and <literal>@@</literal> operators, it has notable |
| performance advantages over the default operator |
| class <literal>jsonb_ops</literal>. A <literal>jsonb_path_ops</literal> |
| index is usually much smaller than a <literal>jsonb_ops</literal> |
| index over the same data, and the specificity of searches is better, |
| particularly when queries contain keys that appear frequently in the |
| data. Therefore search operations typically perform better |
| than with the default operator class. |
| </para> |
| |
| <para> |
| The technical difference between a <literal>jsonb_ops</literal> |
| and a <literal>jsonb_path_ops</literal> GIN index is that the former |
| creates independent index items for each key and value in the data, |
| while the latter creates index items only for each value in the |
| data. |
| <footnote> |
| <para> |
| For this purpose, the term <quote>value</quote> includes array elements, |
| though JSON terminology sometimes considers array elements distinct |
| from values within objects. |
| </para> |
| </footnote> |
| Basically, each <literal>jsonb_path_ops</literal> index item is |
| a hash of the value and the key(s) leading to it; for example to index |
| <literal>{"foo": {"bar": "baz"}}</literal>, a single index item would |
| be created incorporating all three of <literal>foo</literal>, <literal>bar</literal>, |
| and <literal>baz</literal> into the hash value. Thus a containment query |
| looking for this structure would result in an extremely specific index |
| search; but there is no way at all to find out whether <literal>foo</literal> |
| appears as a key. On the other hand, a <literal>jsonb_ops</literal> |
| index would create three index items representing <literal>foo</literal>, |
| <literal>bar</literal>, and <literal>baz</literal> separately; then to do the |
| containment query, it would look for rows containing all three of |
| these items. While GIN indexes can perform such an AND search fairly |
| efficiently, it will still be less specific and slower than the |
| equivalent <literal>jsonb_path_ops</literal> search, especially if |
| there are a very large number of rows containing any single one of the |
| three index items. |
| </para> |
| |
| <para> |
| A disadvantage of the <literal>jsonb_path_ops</literal> approach is |
| that it produces no index entries for JSON structures not containing |
| any values, such as <literal>{"a": {}}</literal>. If a search for |
| documents containing such a structure is requested, it will require a |
| full-index scan, which is quite slow. <literal>jsonb_path_ops</literal> is |
| therefore ill-suited for applications that often perform such searches. |
| </para> |
| |
| <para> |
| <type>jsonb</type> also supports <literal>btree</literal> and <literal>hash</literal> |
| indexes. These are usually useful only if it's important to check |
| equality of complete JSON documents. |
| The <literal>btree</literal> ordering for <type>jsonb</type> datums is seldom |
| of great interest, but for completeness it is: |
| <synopsis> |
| <replaceable>Object</replaceable> > <replaceable>Array</replaceable> > <replaceable>Boolean</replaceable> > <replaceable>Number</replaceable> > <replaceable>String</replaceable> > <replaceable>Null</replaceable> |
| |
| <replaceable>Object with n pairs</replaceable> > <replaceable>object with n - 1 pairs</replaceable> |
| |
| <replaceable>Array with n elements</replaceable> > <replaceable>array with n - 1 elements</replaceable> |
| </synopsis> |
| Objects with equal numbers of pairs are compared in the order: |
| <synopsis> |
| <replaceable>key-1</replaceable>, <replaceable>value-1</replaceable>, <replaceable>key-2</replaceable> ... |
| </synopsis> |
| Note that object keys are compared in their storage order; |
| in particular, since shorter keys are stored before longer keys, this |
| can lead to results that might be unintuitive, such as: |
| <programlisting> |
| { "aa": 1, "c": 1} > {"b": 1, "d": 1} |
| </programlisting> |
| Similarly, arrays with equal numbers of elements are compared in the |
| order: |
| <synopsis> |
| <replaceable>element-1</replaceable>, <replaceable>element-2</replaceable> ... |
| </synopsis> |
| Primitive JSON values are compared using the same |
| comparison rules as for the underlying |
| <productname>PostgreSQL</productname> data type. Strings are |
| compared using the default database collation. |
| </para> |
| </sect2> |
| |
| <sect2 id="jsonb-subscripting"> |
| <title><type>jsonb</type> Subscripting</title> |
| <para> |
| The <type>jsonb</type> data type supports array-style subscripting expressions |
| to extract and modify elements. Nested values can be indicated by chaining |
| subscripting expressions, following the same rules as the <literal>path</literal> |
| argument in the <literal>jsonb_set</literal> function. If a <type>jsonb</type> |
| value is an array, numeric subscripts start at zero, and negative integers count |
| backwards from the last element of the array. Slice expressions are not supported. |
| The result of a subscripting expression is always of the jsonb data type. |
| </para> |
| |
| <para> |
| <command>UPDATE</command> statements may use subscripting in the |
| <literal>SET</literal> clause to modify <type>jsonb</type> values. Subscript |
| paths must be traversable for all affected values insofar as they exist. For |
| instance, the path <literal>val['a']['b']['c']</literal> can be traversed all |
| the way to <literal>c</literal> if every <literal>val</literal>, |
| <literal>val['a']</literal>, and <literal>val['a']['b']</literal> is an |
| object. If any <literal>val['a']</literal> or <literal>val['a']['b']</literal> |
| is not defined, it will be created as an empty object and filled as |
| necessary. However, if any <literal>val</literal> itself or one of the |
| intermediary values is defined as a non-object such as a string, number, or |
| <literal>jsonb</literal> <literal>null</literal>, traversal cannot proceed so |
| an error is raised and the transaction aborted. |
| </para> |
| |
| <para> |
| An example of subscripting syntax: |
| |
| <programlisting> |
| |
| -- Extract object value by key |
| SELECT ('{"a": 1}'::jsonb)['a']; |
| |
| -- Extract nested object value by key path |
| SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c']; |
| |
| -- Extract array element by index |
| SELECT ('[1, "2", null]'::jsonb)[1]; |
| |
| -- Update object value by key. Note the quotes around '1': the assigned |
| -- value must be of the jsonb type as well |
| UPDATE table_name SET jsonb_field['key'] = '1'; |
| |
| -- This will raise an error if any record's jsonb_field['a']['b'] is something |
| -- other than an object. For example, the value {"a": 1} has a numeric value |
| -- of the key 'a'. |
| UPDATE table_name SET jsonb_field['a']['b']['c'] = '1'; |
| |
| -- Filter records using a WHERE clause with subscripting. Since the result of |
| -- subscripting is jsonb, the value we compare it against must also be jsonb. |
| -- The double quotes make "value" also a valid jsonb string. |
| SELECT * FROM table_name WHERE jsonb_field['key'] = '"value"'; |
| </programlisting> |
| |
| <type>jsonb</type> assignment via subscripting handles a few edge cases |
| differently from <literal>jsonb_set</literal>. When a source <type>jsonb</type> |
| value is <literal>NULL</literal>, assignment via subscripting will proceed |
| as if it was an empty JSON value of the type (object or array) implied by the |
| subscript key: |
| |
| <programlisting> |
| -- Where jsonb_field was NULL, it is now {"a": 1} |
| UPDATE table_name SET jsonb_field['a'] = '1'; |
| |
| -- Where jsonb_field was NULL, it is now [1] |
| UPDATE table_name SET jsonb_field[0] = '1'; |
| </programlisting> |
| |
| If an index is specified for an array containing too few elements, |
| <literal>NULL</literal> elements will be appended until the index is reachable |
| and the value can be set. |
| |
| <programlisting> |
| -- Where jsonb_field was [], it is now [null, null, 2]; |
| -- where jsonb_field was [0], it is now [0, null, 2] |
| UPDATE table_name SET jsonb_field[2] = '2'; |
| </programlisting> |
| |
| A <type>jsonb</type> value will accept assignments to nonexistent subscript |
| paths as long as the last existing element to be traversed is an object or |
| array, as implied by the corresponding subscript (the element indicated by |
| the last subscript in the path is not traversed and may be anything). Nested |
| array and object structures will be created, and in the former case |
| <literal>null</literal>-padded, as specified by the subscript path until the |
| assigned value can be placed. |
| |
| <programlisting> |
| -- Where jsonb_field was {}, it is now {'a': [{'b': 1}]} |
| UPDATE table_name SET jsonb_field['a'][0]['b'] = '1'; |
| |
| -- Where jsonb_field was [], it is now [null, {'a': 1}] |
| UPDATE table_name SET jsonb_field[1]['a'] = '1'; |
| </programlisting> |
| |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Transforms</title> |
| |
| <para> |
| Additional extensions are available that implement transforms for the |
| <type>jsonb</type> type for different procedural languages. |
| </para> |
| |
| <para> |
| The extensions for PL/Perl are called <literal>jsonb_plperl</literal> and |
| <literal>jsonb_plperlu</literal>. If you use them, <type>jsonb</type> |
| values are mapped to Perl arrays, hashes, and scalars, as appropriate. |
| </para> |
| |
| <para> |
| The extensions for PL/Python are called <literal>jsonb_plpythonu</literal>, |
| <literal>jsonb_plpython2u</literal>, and |
| <literal>jsonb_plpython3u</literal> (see <xref |
| linkend="plpython-python23"/> for the PL/Python naming convention). If you |
| use them, <type>jsonb</type> values are mapped to Python dictionaries, |
| lists, and scalars, as appropriate. |
| </para> |
| |
| <para> |
| Of these extensions, <literal>jsonb_plperl</literal> is |
| considered <quote>trusted</quote>, that is, it can be installed by |
| non-superusers who have <literal>CREATE</literal> privilege on the |
| current database. The rest require superuser privilege to install. |
| </para> |
| </sect2> |
| |
| <sect2 id="datatype-jsonpath"> |
| <title>jsonpath Type</title> |
| |
| <indexterm zone="datatype-jsonpath"> |
| <primary>jsonpath</primary> |
| </indexterm> |
| |
| <para> |
| The <type>jsonpath</type> type implements support for the SQL/JSON path language |
| in <productname>PostgreSQL</productname> to efficiently query JSON data. |
| It provides a binary representation of the parsed SQL/JSON path |
| expression that specifies the items to be retrieved by the path |
| engine from the JSON data for further processing with the |
| SQL/JSON query functions. |
| </para> |
| |
| <para> |
| The semantics of SQL/JSON path predicates and operators generally follow SQL. |
| At the same time, to provide a natural way of working with JSON data, |
| SQL/JSON path syntax uses some JavaScript conventions: |
| </para> |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Dot (<literal>.</literal>) is used for member access. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Square brackets (<literal>[]</literal>) are used for array access. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| SQL/JSON arrays are 0-relative, unlike regular SQL arrays that start from 1. |
| </para> |
| </listitem> |
| </itemizedlist> |
| |
| <para> |
| An SQL/JSON path expression is typically written in an SQL query as an |
| SQL character string literal, so it must be enclosed in single quotes, |
| and any single quotes desired within the value must be doubled |
| (see <xref linkend="sql-syntax-strings"/>). |
| Some forms of path expressions require string literals within them. |
| These embedded string literals follow JavaScript/ECMAScript conventions: |
| they must be surrounded by double quotes, and backslash escapes may be |
| used within them to represent otherwise-hard-to-type characters. |
| In particular, the way to write a double quote within an embedded string |
| literal is <literal>\"</literal>, and to write a backslash itself, you |
| must write <literal>\\</literal>. Other special backslash sequences |
| include those recognized in JSON strings: |
| <literal>\b</literal>, |
| <literal>\f</literal>, |
| <literal>\n</literal>, |
| <literal>\r</literal>, |
| <literal>\t</literal>, |
| <literal>\v</literal> |
| for various ASCII control characters, and |
| <literal>\u<replaceable>NNNN</replaceable></literal> for a Unicode |
| character identified by its 4-hex-digit code point. The backslash |
| syntax also includes two cases not allowed by JSON: |
| <literal>\x<replaceable>NN</replaceable></literal> for a character code |
| written with only two hex digits, and |
| <literal>\u{<replaceable>N...</replaceable>}</literal> for a character |
| code written with 1 to 6 hex digits. |
| </para> |
| |
| <para> |
| A path expression consists of a sequence of path elements, |
| which can be any of the following: |
| <itemizedlist> |
| <listitem> |
| <para> |
| Path literals of JSON primitive types: |
| Unicode text, numeric, true, false, or null. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Path variables listed in <xref linkend="type-jsonpath-variables"/>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Accessor operators listed in <xref linkend="type-jsonpath-accessors"/>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| <type>jsonpath</type> operators and methods listed |
| in <xref linkend="functions-sqljson-path-operators"/>. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| Parentheses, which can be used to provide filter expressions |
| or define the order of path evaluation. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| <para> |
| For details on using <type>jsonpath</type> expressions with SQL/JSON |
| query functions, see <xref linkend="functions-sqljson-path"/>. |
| </para> |
| |
| <table id="type-jsonpath-variables"> |
| <title><type>jsonpath</type> Variables</title> |
| <tgroup cols="2"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="2*"/> |
| <thead> |
| <row> |
| <entry>Variable</entry> |
| <entry>Description</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry><literal>$</literal></entry> |
| <entry>A variable representing the JSON value being queried |
| (the <firstterm>context item</firstterm>). |
| </entry> |
| </row> |
| <row> |
| <entry><literal>$varname</literal></entry> |
| <entry> |
| A named variable. Its value can be set by the parameter |
| <parameter>vars</parameter> of several JSON processing functions; |
| see <xref linkend="functions-json-processing-table"/> for details. |
| <!-- TODO: describe PASSING clause once implemented !--> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>@</literal></entry> |
| <entry>A variable representing the result of path evaluation |
| in filter expressions. |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <table id="type-jsonpath-accessors"> |
| <title><type>jsonpath</type> Accessors</title> |
| <tgroup cols="2"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="2*"/> |
| <thead> |
| <row> |
| <entry>Accessor Operator</entry> |
| <entry>Description</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry> |
| <para> |
| <literal>.<replaceable>key</replaceable></literal> |
| </para> |
| <para> |
| <literal>."$<replaceable>varname</replaceable>"</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Member accessor that returns an object member with |
| the specified key. If the key name matches some named variable |
| starting with <literal>$</literal> or does not meet the |
| JavaScript rules for an identifier, it must be enclosed in |
| double quotes to make it a string literal. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry> |
| <para> |
| <literal>.*</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Wildcard member accessor that returns the values of all |
| members located at the top level of the current object. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry> |
| <para> |
| <literal>.**</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Recursive wildcard member accessor that processes all levels |
| of the JSON hierarchy of the current object and returns all |
| the member values, regardless of their nesting level. This |
| is a <productname>PostgreSQL</productname> extension of |
| the SQL/JSON standard. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry> |
| <para> |
| <literal>.**{<replaceable>level</replaceable>}</literal> |
| </para> |
| <para> |
| <literal>.**{<replaceable>start_level</replaceable> to |
| <replaceable>end_level</replaceable>}</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Like <literal>.**</literal>, but selects only the specified |
| levels of the JSON hierarchy. Nesting levels are specified as integers. |
| Level zero corresponds to the current object. To access the lowest |
| nesting level, you can use the <literal>last</literal> keyword. |
| This is a <productname>PostgreSQL</productname> extension of |
| the SQL/JSON standard. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry> |
| <para> |
| <literal>[<replaceable>subscript</replaceable>, ...]</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Array element accessor. |
| <literal><replaceable>subscript</replaceable></literal> can be |
| given in two forms: <literal><replaceable>index</replaceable></literal> |
| or <literal><replaceable>start_index</replaceable> to <replaceable>end_index</replaceable></literal>. |
| The first form returns a single array element by its index. The second |
| form returns an array slice by the range of indexes, including the |
| elements that correspond to the provided |
| <replaceable>start_index</replaceable> and <replaceable>end_index</replaceable>. |
| </para> |
| <para> |
| The specified <replaceable>index</replaceable> can be an integer, as |
| well as an expression returning a single numeric value, which is |
| automatically cast to integer. Index zero corresponds to the first |
| array element. You can also use the <literal>last</literal> keyword |
| to denote the last array element, which is useful for handling arrays |
| of unknown length. |
| </para> |
| </entry> |
| </row> |
| <row> |
| <entry> |
| <para> |
| <literal>[*]</literal> |
| </para> |
| </entry> |
| <entry> |
| <para> |
| Wildcard array element accessor that returns all array elements. |
| </para> |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| </sect2> |
| </sect1> |