| <!-- doc/src/sgml/charset.sgml --> |
| |
| <chapter id="charset"> |
| <title>Localization</title> |
| |
| <para> |
| This chapter describes the available localization features from the |
| point of view of the administrator. |
| <productname>PostgreSQL</productname> supports two localization |
| facilities: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Using the locale features of the operating system to provide |
| locale-specific collation order, number formatting, translated |
| messages, and other aspects. |
| This is covered in <xref linkend="locale"/> and |
| <xref linkend="collation"/>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Providing a number of different character sets to support storing text |
| in all kinds of languages, and providing character set translation |
| between client and server. |
| This is covered in <xref linkend="multibyte"/>. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| |
| <sect1 id="locale"> |
| <title>Locale Support</title> |
| |
| <indexterm zone="locale"><primary>locale</primary></indexterm> |
| |
| <para> |
| <firstterm>Locale</firstterm> support refers to an application respecting |
| cultural preferences regarding alphabets, sorting, number |
| formatting, etc. <productname>PostgreSQL</productname> uses the standard ISO |
| C and <acronym>POSIX</acronym> locale facilities provided by the server operating |
| system. For additional information refer to the documentation of your |
| system. |
| </para> |
| |
| <sect2> |
| <title>Overview</title> |
| |
| <para> |
| Locale support is automatically initialized when a database |
| cluster is created using <command>initdb</command>. |
| <command>initdb</command> will initialize the database cluster |
| with the locale setting of its execution environment by default, |
| so if your system is already set to use the locale that you want |
| in your database cluster then there is nothing else you need to |
| do. If you want to use a different locale (or you are not sure |
| which locale your system is set to), you can instruct |
| <command>initdb</command> exactly which locale to use by |
| specifying the <option>--locale</option> option. For example: |
| <screen> |
| initdb --locale=sv_SE |
| </screen> |
| </para> |
| |
| <para> |
| This example for Unix systems sets the locale to Swedish |
| (<literal>sv</literal>) as spoken |
| in Sweden (<literal>SE</literal>). Other possibilities might include |
| <literal>en_US</literal> (U.S. English) and <literal>fr_CA</literal> (French |
| Canadian). If more than one character set can be used for a |
| locale then the specifications can take the form |
| <replaceable>language_territory.codeset</replaceable>. For example, |
| <literal>fr_BE.UTF-8</literal> represents the French language (<literal>fr</literal>) as |
| spoken in Belgium (<literal>BE</literal>), with a <acronym>UTF-8</acronym> character set |
| encoding. |
| </para> |
| |
| <para> |
| Which locales are available on your |
| system, and under what names, depends on what the operating |
| system vendor provided and what was installed. On most Unix systems, the command |
| <literal>locale -a</literal> will provide a list of available locales. |
| Windows uses more verbose locale names, such as <literal>German_Germany</literal> |
| or <literal>Swedish_Sweden.1252</literal>, but the principles are the same. |
| </para> |
| |
| <para> |
| Occasionally it is useful to mix rules from several locales, e.g., |
| use English collation rules but Spanish messages. To support that, a |
| set of locale subcategories exists, each controlling only certain |
| aspects of the localization rules: |
| |
| <informaltable> |
| <tgroup cols="2"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="3*"/> |
| <tbody> |
| <row> |
| <entry><envar>LC_COLLATE</envar></entry> |
| <entry>String sort order</entry> |
| </row> |
| <row> |
| <entry><envar>LC_CTYPE</envar></entry> |
| <entry>Character classification (What is a letter? Its upper-case equivalent?)</entry> |
| </row> |
| <row> |
| <entry><envar>LC_MESSAGES</envar></entry> |
| <entry>Language of messages</entry> |
| </row> |
| <row> |
| <entry><envar>LC_MONETARY</envar></entry> |
| <entry>Formatting of currency amounts</entry> |
| </row> |
| <row> |
| <entry><envar>LC_NUMERIC</envar></entry> |
| <entry>Formatting of numbers</entry> |
| </row> |
| <row> |
| <entry><envar>LC_TIME</envar></entry> |
| <entry>Formatting of dates and times</entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </informaltable> |
| |
| The category names translate into names of |
| <command>initdb</command> options to override the locale choice |
| for a specific category. For instance, to set the locale to |
| French Canadian, but use U.S. rules for formatting currency, use |
| <literal>initdb --locale=fr_CA --lc-monetary=en_US</literal>. |
| </para> |
| |
| <para> |
| If you want the system to behave as if it had no locale support, |
| use the special locale name <literal>C</literal>, or equivalently |
| <literal>POSIX</literal>. |
| </para> |
| |
| <para> |
| Some locale categories must have their values |
| fixed when the database is created. You can use different settings |
| for different databases, but once a database is created, you cannot |
| change them for that database anymore. <literal>LC_COLLATE</literal> |
| and <literal>LC_CTYPE</literal> are these categories. They affect |
| the sort order of indexes, so they must be kept fixed, or indexes on |
| text columns would become corrupt. |
| (But you can alleviate this restriction using collations, as discussed |
| in <xref linkend="collation"/>.) |
| The default values for these |
| categories are determined when <command>initdb</command> is run, and |
| those values are used when new databases are created, unless |
| specified otherwise in the <command>CREATE DATABASE</command> command. |
| </para> |
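| |
| <para> |
| As a minimal sketch of overriding these categories for a single database, |
| one might write the following. The database name <literal>sales_sv</literal> |
| and the locale name <literal>sv_SE.UTF-8</literal> are assumptions chosen for |
| illustration; the locale must exist on the server's operating system. |
| <programlisting> |
| -- template0 is used because the locale differs from that of template1. |
| CREATE DATABASE sales_sv |
|     TEMPLATE template0 |
|     ENCODING 'UTF8' |
|     LC_COLLATE 'sv_SE.UTF-8' |
|     LC_CTYPE 'sv_SE.UTF-8'; |
| </programlisting> |
| Copying from <literal>template0</literal> is the usual choice here, because |
| the encoding and locale settings must otherwise match those of the template |
| database. |
| </para> |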
| |
| <para> |
| The other locale categories can be changed whenever desired |
| by setting the server configuration parameters |
| that have the same name as the locale categories (see <xref |
| linkend="runtime-config-client-format"/> for details). The values |
| that are chosen by <command>initdb</command> are actually only written |
| into the configuration file <filename>postgresql.conf</filename> to |
| serve as defaults when the server is started. If you remove these |
| assignments from <filename>postgresql.conf</filename> then the |
| server will inherit the settings from its execution environment. |
| </para> |
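| |
| <para> |
| For instance, the following sketch changes two of these categories for the |
| current session only. The locale names are assumptions and must be installed |
| on the server; the output of <function>to_char</function> depends on them, |
| so no result is shown. |
| <programlisting> |
| SET lc_monetary = 'en_US.UTF-8'; |
| SET lc_numeric = 'de_DE.UTF-8'; |
| |
| -- L is the locale's currency symbol; G and D are its group and decimal separators. |
| SELECT to_char(1234.56, 'L9G999D99'); |
| |
| -- Return to the defaults taken from postgresql.conf or the environment. |
| RESET lc_monetary; |
| RESET lc_numeric; |
| </programlisting> |
| </para> |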
| |
| <para> |
| Note that the locale behavior of the server is determined by the |
| environment variables seen by the server, not by the environment |
| of any client. Therefore, be careful to configure the correct locale settings |
| before starting the server. A consequence of this is that if |
| client and server are set up in different locales, messages might |
| appear in different languages depending on where they originated. |
| </para> |
| |
| <note> |
| <para> |
| When we speak of inheriting the locale from the execution |
| environment, this means the following on most operating systems: |
| For a given locale category, say the collation, the following |
| environment variables are consulted in this order until one is |
| found to be set: <envar>LC_ALL</envar>, <envar>LC_COLLATE</envar> |
| (or the variable corresponding to the respective category), |
| <envar>LANG</envar>. If none of these environment variables are |
| set then the locale defaults to <literal>C</literal>. |
| </para> |
| |
| <para> |
| Some message localization libraries also look at the environment |
| variable <envar>LANGUAGE</envar> which overrides all other locale |
| settings for the purpose of setting the language of messages. If |
| in doubt, please refer to the documentation of your operating |
| system, in particular the documentation about |
| <application>gettext</application>. |
| </para> |
| </note> |
| |
| <para> |
| To enable messages to be translated to the user's preferred language, |
| <acronym>NLS</acronym> must have been selected at build time |
| (<literal>configure --enable-nls</literal>). All other locale support is |
| built in automatically. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Behavior</title> |
| |
| <para> |
| The locale settings influence the following SQL features: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Sort order in queries using <literal>ORDER BY</literal> or the standard |
| comparison operators on textual data |
| <indexterm><primary>ORDER BY</primary><secondary>and locales</secondary></indexterm> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The <function>upper</function>, <function>lower</function>, and <function>initcap</function> |
| functions |
| <indexterm><primary>upper</primary><secondary>and locales</secondary></indexterm> |
| <indexterm><primary>lower</primary><secondary>and locales</secondary></indexterm> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Pattern matching operators (<literal>LIKE</literal>, <literal>SIMILAR TO</literal>, |
| and POSIX-style regular expressions); locales affect both case |
| insensitive matching and the classification of characters by |
| character-class regular expressions |
| <indexterm><primary>LIKE</primary><secondary>and locales</secondary></indexterm> |
| <indexterm><primary>regular expressions</primary><secondary>and locales</secondary></indexterm> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The <function>to_char</function> family of functions |
| <indexterm><primary>to_char</primary><secondary>and locales</secondary></indexterm> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| The ability to use indexes with <literal>LIKE</literal> clauses |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| |
| <para> |
| The drawback of using locales other than <literal>C</literal> or |
| <literal>POSIX</literal> in <productname>PostgreSQL</productname> is the performance |
| impact: locale-aware processing slows character handling and prevents ordinary indexes |
| from being used by <literal>LIKE</literal>. For this reason, use locales |
| only if you actually need them. |
| </para> |
| |
| <para> |
| As a workaround to allow <productname>PostgreSQL</productname> to use indexes |
| with <literal>LIKE</literal> clauses under a non-C locale, several custom |
| operator classes exist. These allow the creation of an index that |
| performs a strict character-by-character comparison, ignoring |
| locale comparison rules. Refer to <xref linkend="indexes-opclass"/> |
| for more information. Another approach is to create indexes using |
| the <literal>C</literal> collation, as discussed in |
| <xref linkend="collation"/>. |
| </para> |
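| |
| <para> |
| The two approaches described above might look as follows. This is only a |
| sketch: the table <literal>customers</literal> and the index names are |
| hypothetical. |
| <programlisting> |
| CREATE TABLE customers (name text); |
| |
| -- Operator class approach: compares character by character, ignoring locale rules. |
| CREATE INDEX customers_name_pattern_idx ON customers (name text_pattern_ops); |
| |
| -- Collation approach: an index built with the C collation. |
| CREATE INDEX customers_name_c_idx ON customers (name COLLATE "C"); |
| |
| -- A left-anchored pattern such as this one is a candidate for either index. |
| SELECT name FROM customers WHERE name LIKE 'Sm%'; |
| </programlisting> |
| Whether the planner actually chooses one of these indexes depends on the |
| table size and statistics; <command>EXPLAIN</command> shows the plan that was |
| selected. |
| </para> |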
| </sect2> |
| |
| <sect2> |
| <title>Problems</title> |
| |
| <para> |
| If locale support doesn't work according to the explanation above, |
| check that the locale support in your operating system is |
| correctly configured. To check what locales are installed on your |
| system, you can use the command <literal>locale -a</literal> if |
| your operating system provides it. |
| </para> |
| |
| <para> |
| Check that <productname>PostgreSQL</productname> is actually using the locale |
| that you think it is. The <envar>LC_COLLATE</envar> and <envar>LC_CTYPE</envar> |
| settings are determined when a database is created, and cannot be |
| changed except by creating a new database. Other locale |
| settings including <envar>LC_MESSAGES</envar> and <envar>LC_MONETARY</envar> |
| are initially determined by the environment the server is started |
| in, but can be changed on-the-fly. You can check the active locale |
| settings using the <command>SHOW</command> command. |
| </para> |
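| |
| <para> |
| A quick check along these lines might look like the following sketch. The |
| <literal>pg_database</literal> columns used here are an assumption about the |
| catalog layout of the release in use, which can differ between versions. |
| <programlisting> |
| -- Collation and character classification fixed for the current database. |
| SELECT datname, datcollate, datctype |
| FROM pg_database |
| WHERE datname = current_database(); |
| |
| -- Categories that can be changed at run time. |
| SHOW lc_messages; |
| SHOW lc_monetary; |
| </programlisting> |
| </para> |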
| |
| <para> |
| The directory <filename>src/test/locale</filename> in the source |
| distribution contains a test suite for |
| <productname>PostgreSQL</productname>'s locale support. |
| </para> |
| |
| <para> |
| Client applications that handle server-side errors by parsing the |
| text of the error message will obviously have problems when the |
| server's messages are in a different language. Authors of such |
| applications are advised to make use of the error code scheme |
| instead. |
| </para> |
| |
| <para> |
| Maintaining catalogs of message translations requires the ongoing |
| efforts of many volunteers who want to see |
| <productname>PostgreSQL</productname> speak their preferred language well. |
| If messages in your language are currently not available or not fully |
| translated, your assistance would be appreciated. If you want to |
| help, refer to <xref linkend="nls"/> or write to the developers' |
| mailing list. |
| </para> |
| </sect2> |
| </sect1> |
| |
| |
| <sect1 id="collation"> |
| <title>Collation Support</title> |
| |
| <indexterm zone="collation"><primary>collation</primary></indexterm> |
| |
| <para> |
| The collation feature allows specifying the sort order and character |
| classification behavior of data per-column, or even per-operation. |
| This alleviates the restriction that the |
| <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings |
| of a database cannot be changed after its creation. |
| </para> |
| |
| <sect2> |
| <title>Concepts</title> |
| |
| <para> |
| Conceptually, every expression of a collatable data type has a |
| collation. (The built-in collatable data types are |
| <type>text</type>, <type>varchar</type>, and <type>char</type>. |
| User-defined base types can also be marked collatable, and of course |
| a domain over a collatable data type is collatable.) If the |
| expression is a column reference, the collation of the expression is the |
| defined collation of the column. If the expression is a constant, the |
| collation is the default collation of the data type of the |
| constant. The collation of a more complex expression is derived |
| from the collations of its inputs, as described below. |
| </para> |
| |
| <para> |
| The collation of an expression can be the <quote>default</quote> |
| collation, which means the locale settings defined for the |
| database. It is also possible for an expression's collation to be |
| indeterminate. In such cases, ordering operations and other |
| operations that need to know the collation will fail. |
| </para> |
| |
| <para> |
| When the database system has to perform an ordering or a character |
| classification, it uses the collation of the input expression. This |
| happens, for example, with <literal>ORDER BY</literal> clauses |
| and function or operator calls such as <literal>&lt;</literal>. |
| The collation to apply for an <literal>ORDER BY</literal> clause |
| is simply the collation of the sort key. The collation to apply for a |
| function or operator call is derived from the arguments, as described |
| below. In addition to comparison operators, collations are taken into |
| account by functions that convert between lower and upper case |
| letters, such as <function>lower</function>, <function>upper</function>, and |
| <function>initcap</function>; by pattern matching operators; and by |
| <function>to_char</function> and related functions. |
| </para> |
| |
| <para> |
| For a function or operator call, the collation that is derived by |
| examining the argument collations is used at run time for performing |
| the specified operation. If the result of the function or operator |
| call is of a collatable data type, the collation is also used at parse |
| time as the defined collation of the function or operator expression, |
| in case there is a surrounding expression that requires knowledge of |
| its collation. |
| </para> |
| |
| <para> |
| The <firstterm>collation derivation</firstterm> of an expression can be |
| implicit or explicit. This distinction affects how collations are |
| combined when multiple different collations appear in an |
| expression. An explicit collation derivation occurs when a |
| <literal>COLLATE</literal> clause is used; all other collation |
| derivations are implicit. When multiple collations need to be |
| combined, for example in a function call, the following rules are |
| used: |
| |
| <orderedlist> |
| <listitem> |
| <para> |
| If any input expression has an explicit collation derivation, then |
| all explicitly derived collations among the input expressions must be |
| the same, otherwise an error is raised. If any explicitly |
| derived collation is present, that is the result of the |
| collation combination. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Otherwise, all input expressions must have the same implicit |
| collation derivation or the default collation. If any non-default |
| collation is present, that is the result of the collation combination. |
| Otherwise, the result is the default collation. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If there are conflicting non-default implicit collations among the |
| input expressions, then the combination is deemed to have indeterminate |
| collation. This is not an error condition unless the particular |
| function being invoked requires knowledge of the collation it should |
| apply. If it does, an error will be raised at run-time. |
| </para> |
| </listitem> |
| </orderedlist> |
| |
| For example, consider this table definition: |
| <programlisting> |
| CREATE TABLE test1 ( |
| a text COLLATE "de_DE", |
| b text COLLATE "es_ES", |
| ... |
| ); |
| </programlisting> |
| |
| Then in |
| <programlisting> |
| SELECT a &lt; 'foo' FROM test1; |
| </programlisting> |
| the <literal>&lt;</literal> comparison is performed according to |
| <literal>de_DE</literal> rules, because the expression combines an |
| implicitly derived collation with the default collation. But in |
| <programlisting> |
| SELECT a &lt; ('foo' COLLATE "fr_FR") FROM test1; |
| </programlisting> |
| the comparison is performed using <literal>fr_FR</literal> rules, |
| because the explicit collation derivation overrides the implicit one. |
| Furthermore, given |
| <programlisting> |
| SELECT a &lt; b FROM test1; |
| </programlisting> |
| the parser cannot determine which collation to apply, since the |
| <structfield>a</structfield> and <structfield>b</structfield> columns have conflicting |
| implicit collations. Since the <literal>&lt;</literal> operator |
| does need to know which collation to use, this will result in an |
| error. The error can be resolved by attaching an explicit collation |
| specifier to either input expression, thus: |
| <programlisting> |
| SELECT a &lt; b COLLATE "de_DE" FROM test1; |
| </programlisting> |
| or equivalently |
| <programlisting> |
| SELECT a COLLATE "de_DE" &lt; b FROM test1; |
| </programlisting> |
| On the other hand, the structurally similar case |
| <programlisting> |
| SELECT a || b FROM test1; |
| </programlisting> |
| does not result in an error, because the <literal>||</literal> operator |
| does not care about collations: its result is the same regardless |
| of the collation. |
| </para> |
| |
| <para> |
| The collation assigned to a function or operator's combined input |
| expressions is also considered to apply to the function or operator's |
| result, if the function or operator delivers a result of a collatable |
| data type. So, in |
| <programlisting> |
| SELECT * FROM test1 ORDER BY a || 'foo'; |
| </programlisting> |
| the ordering will be done according to <literal>de_DE</literal> rules. |
| But this query: |
| <programlisting> |
| SELECT * FROM test1 ORDER BY a || b; |
| </programlisting> |
| results in an error, because even though the <literal>||</literal> operator |
| doesn't need to know a collation, the <literal>ORDER BY</literal> clause does. |
| As before, the conflict can be resolved with an explicit collation |
| specifier: |
| <programlisting> |
| SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR"; |
| </programlisting> |
| </para> |
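| |
| <para> |
| To see which collation was derived for an expression, the function |
| <function>pg_collation_for</function> can help. The sketch below assumes the |
| table <literal>test1</literal> from the examples above and that the |
| <literal>fr_FR</literal> collation exists on the system; the column aliases |
| are illustrative. |
| <programlisting> |
| SELECT pg_collation_for(a) AS column_collation, |
|        pg_collation_for(a || 'foo') AS implicit_from_a, |
|        pg_collation_for(a || b COLLATE "fr_FR") AS explicit_override |
| FROM test1 |
| LIMIT 1; |
| </programlisting> |
| </para> |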
| </sect2> |
| |
| <sect2 id="collation-managing"> |
| <title>Managing Collations</title> |
| |
| <para> |
| A collation is an SQL schema object that maps an SQL name to locales |
| provided by libraries installed in the operating system. A collation |
| definition has a <firstterm>provider</firstterm> that specifies which |
| library supplies the locale data. One standard provider name |
| is <literal>libc</literal>, which uses the locales provided by the |
| operating system C library. These are the locales that most tools |
| provided by the operating system use. Another provider |
| is <literal>icu</literal>, which uses the external |
| ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can only be |
| used if support for ICU was configured when PostgreSQL was built. |
| </para> |
| |
| <para> |
| A collation object provided by <literal>libc</literal> maps to a |
| combination of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> |
| settings, as accepted by the <literal>setlocale()</literal> system library call. (As |
| the name would suggest, the main purpose of a collation is to set |
| <symbol>LC_COLLATE</symbol>, which controls the sort order. But |
| it is rarely necessary in practice to have an |
| <symbol>LC_CTYPE</symbol> setting that is different from |
| <symbol>LC_COLLATE</symbol>, so it is more convenient to collect |
| these under one concept than to create another infrastructure for |
| setting <symbol>LC_CTYPE</symbol> per expression.) Also, |
| a <literal>libc</literal> collation |
| is tied to a character set encoding (see <xref linkend="multibyte"/>). |
| The same collation name may exist for different encodings. |
| </para> |
| |
| <para> |
| A collation object provided by <literal>icu</literal> maps to a named |
| collator provided by the ICU library. ICU does not support |
| separate <quote>collate</quote> and <quote>ctype</quote> settings, so |
| they are always the same. Also, ICU collations are independent of the |
| encoding, so there is always only one ICU collation of a given name in |
| a database. |
| </para> |
| |
| <sect3> |
| <title>Standard Collations</title> |
| |
| <para> |
| On all platforms, the collations named <literal>default</literal>, |
| <literal>C</literal>, and <literal>POSIX</literal> are available. Additional |
| collations may be available depending on operating system support. |
| The <literal>default</literal> collation selects the <symbol>LC_COLLATE</symbol> |
| and <symbol>LC_CTYPE</symbol> values specified at database creation time. |
| The <literal>C</literal> and <literal>POSIX</literal> collations both specify |
| <quote>traditional C</quote> behavior, in which only the ASCII letters |
| <quote><literal>A</literal></quote> through <quote><literal>Z</literal></quote> |
| are treated as letters, and sorting is done strictly by character |
| code byte values. |
| </para> |
| |
| <para> |
| Additionally, the SQL standard collation name <literal>ucs_basic</literal> |
| is available for encoding <literal>UTF8</literal>. It is equivalent |
| to <literal>C</literal> and sorts by Unicode code point. |
| </para> |
| </sect3> |
| |
| <sect3> |
| <title>Predefined Collations</title> |
| |
| <para> |
| If the operating system provides support for using multiple locales |
| within a single program (<function>newlocale</function> and related functions), |
| or if support for ICU is configured, |
| then when a database cluster is initialized, <command>initdb</command> |
| populates the system catalog <literal>pg_collation</literal> with |
| collations based on all the locales it finds in the operating |
| system at the time. |
| </para> |
| |
| <para> |
| To inspect the currently available collations, use the query <literal>SELECT |
| * FROM pg_collation</literal>, or the command <command>\dOS+</command> |
| in <application>psql</application>. |
| </para> |
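| |
| <para> |
| A narrower query is often easier to read. The following sketch lists only |
| collations whose names begin with <literal>de</literal> followed by an |
| underscore or hyphen; the name pattern is an assumption chosen for |
| illustration, and the set of columns in <literal>pg_collation</literal> |
| varies between releases. |
| <programlisting> |
| SELECT collname, collprovider, collencoding |
| FROM pg_collation |
| WHERE collname ~ '^de[_-]' |
| ORDER BY collname, collencoding; |
| </programlisting> |
| In <structfield>collprovider</structfield>, <literal>c</literal> stands for |
| libc and <literal>i</literal> for ICU. A |
| <structfield>collencoding</structfield> of <literal>-1</literal> marks a |
| collation that works with any encoding. |
| </para> |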
| |
| <sect4> |
| <title>libc Collations</title> |
| |
| <para> |
| For example, the operating system might |
| provide a locale named <literal>de_DE.utf8</literal>. |
| <command>initdb</command> would then create a collation named |
| <literal>de_DE.utf8</literal> for encoding <literal>UTF8</literal> |
| that has both <symbol>LC_COLLATE</symbol> and |
| <symbol>LC_CTYPE</symbol> set to <literal>de_DE.utf8</literal>. |
| It will also create a collation with the <literal>.utf8</literal> |
| tag stripped off the name. So you could also use the collation |
| under the name <literal>de_DE</literal>, which is less cumbersome |
| to write and makes the name less encoding-dependent. Note that, |
| nevertheless, the initial set of collation names is |
| platform-dependent. |
| </para> |
| |
| <para> |
| The default set of collations provided by <literal>libc</literal> maps |
| directly to the locales installed in the operating system, which can be |
| listed using the command <literal>locale -a</literal>. In case |
| a <literal>libc</literal> collation is needed that has different values |
| for <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>, or if new |
| locales are installed in the operating system after the database system |
| was initialized, then a new collation may be created using |
| the <xref linkend="sql-createcollation"/> command. |
| New operating system locales can also be imported en masse using |
| the <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link> function. |
| </para> |
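| |
| <para> |
| Both cases can be sketched as follows. The collation name |
| <literal>german_mixed</literal> and the locale names are assumptions and must |
| match locales installed on the server; the import function can only be run |
| by a superuser. |
| <programlisting> |
| -- A libc collation whose LC_COLLATE and LC_CTYPE differ. |
| CREATE COLLATION german_mixed ( |
|     provider = libc, |
|     lc_collate = 'de_DE.utf8', |
|     lc_ctype = 'en_US.utf8' |
| ); |
| |
| -- Import locales added to the operating system since initdb was run; |
| -- returns the number of new collation objects created. |
| SELECT pg_import_system_collations('pg_catalog'); |
| </programlisting> |
| </para> |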
| |
| <para> |
| Within any particular database, only collations that use that |
| database's encoding are of interest. Other entries in |
| <literal>pg_collation</literal> are ignored. Thus, a stripped collation |
| name such as <literal>de_DE</literal> can be considered unique |
| within a given database even though it would not be unique globally. |
| Use of the stripped collation names is recommended, since it will |
| make one fewer thing you need to change if you decide to change to |
| another database encoding. Note however that the <literal>default</literal>, |
| <literal>C</literal>, and <literal>POSIX</literal> collations can be used regardless of |
| the database encoding. |
| </para> |
| |
| <para> |
| <productname>PostgreSQL</productname> considers distinct collation |
| objects to be incompatible even when they have identical properties. |
| Thus for example, |
| <programlisting> |
| SELECT a COLLATE "C" &lt; b COLLATE "POSIX" FROM test1; |
| </programlisting> |
| will draw an error even though the <literal>C</literal> and <literal>POSIX</literal> |
| collations have identical behaviors. Mixing stripped and non-stripped |
| collation names is therefore not recommended. |
| </para> |
| </sect4> |
| |
| <sect4> |
| <title>ICU Collations</title> |
| |
| <para> |
| With ICU, it is not sensible to enumerate all possible locale names. ICU |
| uses a particular naming system for locales, but there are many more ways |
| to name a locale than there are actually distinct locales. |
| <command>initdb</command> uses the ICU APIs to extract a set of distinct |
| locales to populate the initial set of collations. Collations provided by |
| ICU are created in the SQL environment with names in BCP 47 language tag |
| format, with a <quote>private use</quote> |
| extension <literal>-x-icu</literal> appended, to distinguish them from |
| libc locales. |
| </para> |
| |
| <para> |
| Here are some example collations that might be created: |
| |
| <variablelist> |
| <varlistentry> |
| <term><literal>de-x-icu</literal></term> |
| <listitem> |
| <para>German collation, default variant</para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>de-AT-x-icu</literal></term> |
| <listitem> |
| <para>German collation for Austria, default variant</para> |
| <para> |
| (There are also, say, <literal>de-DE-x-icu</literal> |
| or <literal>de-CH-x-icu</literal>, but as of this writing, they are |
| equivalent to <literal>de-x-icu</literal>.) |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>und-x-icu</literal> (for <quote>undefined</quote>)</term> |
| <listitem> |
| <para> |
| ICU <quote>root</quote> collation. Use this to get a reasonable |
| language-agnostic sort order. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| </para> |
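| |
| <para> |
| As a self-contained sketch, the following queries sort the same literal |
| values once under the ICU root collation and once under <literal>C</literal>. |
| The word list is illustrative, and the first query requires a server built |
| with ICU support. |
| <programlisting> |
| SELECT word |
| FROM (VALUES ('Zebra'), ('apple'), ('Banana'), ('cherry')) AS t(word) |
| ORDER BY word COLLATE "und-x-icu"; |
| |
| SELECT word |
| FROM (VALUES ('Zebra'), ('apple'), ('Banana'), ('cherry')) AS t(word) |
| ORDER BY word COLLATE "C"; |
| </programlisting> |
| In the second query the upper-case words sort first, because the |
| <literal>C</literal> collation compares byte values. The root ICU collation |
| is expected to interleave upper-case and lower-case words alphabetically. |
| </para> |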
| |
| <para> |
| Some (less frequently used) encodings are not supported by ICU. When the |
| database encoding is one of these, ICU collation entries |
| in <literal>pg_collation</literal> are ignored. Attempting to use one |
| will draw an error along the lines of <quote>collation "de-x-icu" for |
| encoding "WIN874" does not exist</quote>. |
| </para> |
| </sect4> |
| </sect3> |
| |
| <sect3 id="collation-create"> |
| <title>Creating New Collation Objects</title> |
| |
| <para> |
| If the standard and predefined collations are not sufficient, users can |
| create their own collation objects using the SQL |
| command <xref linkend="sql-createcollation"/>. |
| </para> |
| |
| <para> |
| The standard and predefined collations are in the |
| schema <literal>pg_catalog</literal>, like all predefined objects. |
| User-defined collations should be created in user schemas. This also |
| ensures that they are saved by <command>pg_dump</command>. |
| </para> |
| |
| <sect4> |
| <title>libc Collations</title> |
| |
| <para> |
| New libc collations can be created like this: |
| <programlisting> |
| CREATE COLLATION german (provider = libc, locale = 'de_DE'); |
| </programlisting> |
| The exact values that are acceptable for the <literal>locale</literal> |
| clause in this command depend on the operating system. On Unix-like |
| systems, the command <literal>locale -a</literal> will show a list. |
| </para> |
| |
| <para> |
| Since the predefined libc collations already include all collations |
| defined in the operating system when the database instance is |
| initialized, it is not often necessary to manually create new ones. |
| Reasons might be if a different naming system is desired (in which case |
| see also <xref linkend="collation-copy"/>) or if the operating system has |
| been upgraded to provide new locale definitions (in which case see |
| also <link linkend="functions-admin-collation"><function>pg_import_system_collations()</function></link>). |
| </para> |
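| |
| <para> |
| For the renaming case, a sketch might copy an existing collation under a new |
| name and then use it in a column definition. The names |
| <literal>german_copy</literal> and <literal>products</literal> are |
| hypothetical, and the example assumes that a <literal>de_DE</literal> |
| collation already exists in the database. |
| <programlisting> |
| -- Copy an existing collation under a different name. |
| CREATE COLLATION german_copy FROM "de_DE"; |
| |
| -- Use the new name like any other collation. |
| CREATE TABLE products ( |
|     title text COLLATE german_copy |
| ); |
| </programlisting> |
| </para> |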
| </sect4> |
| |
| <sect4> |
| <title>ICU Collations</title> |
| |
| <para> |
| ICU allows collations to be customized beyond the basic language+country |
| set that is preloaded by <command>initdb</command>. Users are encouraged |
| to define their own collation objects that make use of these facilities to |
| suit the sorting behavior to their requirements. |
| See <ulink url="https://unicode-org.github.io/icu/userguide/locale/"></ulink> |
| and <ulink url="https://unicode-org.github.io/icu/userguide/collation/api.html"></ulink> for |
| information on ICU locale naming. The set of acceptable names and |
| attributes depends on the particular ICU version. |
| </para> |
| |
| <para> |
| Here are some examples: |
| |
| <variablelist> |
| <varlistentry> |
| <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');</literal></term> |
| <term><literal>CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');</literal></term> |
| <listitem> |
| <para>German collation with phone book collation type</para> |
| <para> |
| The first example selects the ICU locale using a <quote>language |
| tag</quote> per BCP 47. The second example uses the traditional |
| ICU-specific locale syntax. The first style is preferred going |
| forward, but it is not supported by older ICU versions. |
| </para> |
| <para> |
| Note that you can name the collation objects in the SQL environment |
| anything you want. In this example, we follow the naming style that |
| the predefined collations use, which in turn also follows BCP 47, but |
| that is not required for user-defined collations. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');</literal></term> |
| <term><literal>CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');</literal></term> |
| <listitem> |
| <para> |
| Root collation with Emoji collation type, per Unicode Technical Standard #51 |
| </para> |
| <para> |
| Observe how in the traditional ICU locale naming system, the root |
| locale is selected by an empty string. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');</literal></term> |
| <term><literal>CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');</literal></term> |
| <listitem> |
| <para> |
| Sort Greek letters before Latin ones. (The default is Latin before Greek.) |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');</literal></term> |
| <term><literal>CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');</literal></term> |
| <listitem> |
| <para> |
| Sort upper-case letters before lower-case letters. (The default is |
| lower-case letters first.) |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');</literal></term> |
| <term><literal>CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');</literal></term> |
| <listitem> |
| <para> |
| Combines both of the above options. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');</literal></term> |
| <term><literal>CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');</literal></term> |
| <listitem> |
| <para> |
| Numeric ordering: sequences of digits are sorted by their numeric value, |
| for example <literal>A-21</literal> &lt; <literal>A-123</literal> |
| (also known as natural sort). |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| See <ulink url="https://www.unicode.org/reports/tr35/tr35-collation.html">Unicode |
| Technical Standard #35</ulink> |
| and <ulink url="https://tools.ietf.org/html/bcp47">BCP 47</ulink> for |
| details. The list of possible collation types (<literal>co</literal> |
| subtag) can be found in |
| the <ulink url="https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml">CLDR |
| repository</ulink>. |
| </para> |
| |
| <para> |
| Note that while this system allows creating collations that <quote>ignore |
| case</quote> or <quote>ignore accents</quote> or similar (using the |
| <literal>ks</literal> key), in order for such collations to act in a |
| truly case- or accent-insensitive manner, they also need to be declared as not |
| <firstterm>deterministic</firstterm> in <command>CREATE COLLATION</command>; |
| see <xref linkend="collation-nondeterministic"/>. |
| Otherwise, any strings that compare equal according to the collation but |
| are not byte-wise equal will be sorted according to their byte values. |
| </para> |
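| |
| <para> |
| As a minimal sketch of the difference (the collation names |
| <literal>ks_det</literal> and <literal>ks_nondet</literal> are hypothetical, |
| and a server built with ICU support is assumed), the following defines the |
| same <literal>ks</literal> setting twice and compares two strings that |
| differ only in case: |
| <programlisting> |
| -- Same ICU locale in both cases; only the deterministic flag differs. |
| CREATE COLLATION ks_det (provider = icu, locale = 'und-u-ks-level2'); |
| CREATE COLLATION ks_nondet (provider = icu, locale = 'und-u-ks-level2', deterministic = false); |
| |
| SELECT 'abc' = 'ABC' COLLATE ks_det AS det_equal, |
| 'abc' = 'ABC' COLLATE ks_nondet AS nondet_equal; |
| </programlisting> |
| The first comparison should come out false, because a deterministic |
| collation treats only byte-wise identical strings as equal; only the |
| second comparison should come out true. |
| </para> |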
| |
| <note> |
| <para> |
| By design, ICU will accept almost any string as a locale name and match |
| it to the closest locale it can provide, using the fallback procedure |
| described in its documentation. Thus, there will be no direct feedback |
| if a collation specification is composed using features that the given |
| ICU installation does not actually support. It is therefore recommended |
| to create application-level test cases to check that the collation |
| definitions satisfy one's requirements. |
| </para> |
| </note> |
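| |
| <para> |
| One way to write such a check, shown here only as a sketch, is a query that |
| asserts an ordering the collation is expected to produce. It assumes that |
| the <literal>numeric</literal> collation from the examples above has been |
| created; the column alias is arbitrary: |
| <programlisting> |
| -- Should return true when the ICU library honors the kn (numeric) key. |
| -- A result of false would suggest that the attribute was silently ignored. |
| SELECT 'A-21' &lt; 'A-123' COLLATE "numeric" AS numeric_ordering_ok; |
| </programlisting> |
| </para> |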
| </sect4> |
| |
| <sect4 id="collation-copy"> |
| <title>Copying Collations</title> |
| |
| <para> |
| The command <xref linkend="sql-createcollation"/> can also be used to |
| create a new collation from an existing collation. This is useful for |
| giving applications operating-system-independent collation names, for |
| creating compatibility names, or for using an ICU-provided collation |
| under a more readable name. For example: |
| <programlisting> |
| CREATE COLLATION german FROM "de_DE"; |
| CREATE COLLATION french FROM "fr-x-icu"; |
| </programlisting> |
| </para> |
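| |
| <para> |
| A copied collation is then used like any other. The following sketch |
| assumes that the <literal>german</literal> collation created above exists |
| (which in turn requires the operating system to provide |
| <literal>de_DE</literal>); the inline sample values are made up for |
| illustration: |
| <programlisting> |
| -- Sort a few sample words using the copied collation. |
| SELECT word |
| FROM (VALUES ('zebra'), ('Apfel'), ('apfel')) AS t(word) |
| ORDER BY word COLLATE german; |
| </programlisting> |
| </para> |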
| </sect4> |
| </sect3> |
| |
| <sect3 id="collation-nondeterministic"> |
| <title>Nondeterministic Collations</title> |
| |
| <para> |
| A collation is either <firstterm>deterministic</firstterm> or |
| <firstterm>nondeterministic</firstterm>. A deterministic collation uses |
| deterministic comparisons, which means that it considers strings to be |
| equal only if they consist of the same byte sequence. Nondeterministic |
| comparison may determine strings to be equal even if they consist of |
| different bytes. Typical situations include case-insensitive comparison, |
| accent-insensitive comparison, as well as comparison of strings in |
| different Unicode normal forms. It is up to the collation provider to |
| actually implement such insensitive comparisons; the deterministic flag |
| only determines whether ties are to be broken using bytewise comparison. |
| See also <ulink url="https://www.unicode.org/reports/tr10">Unicode Technical |
| Standard 10</ulink> for more information on the terminology. |
| </para> |
| |
| <para> |
| To create a nondeterministic collation, specify the property |
| <literal>deterministic = false</literal> to <command>CREATE |
| COLLATION</command>, for example: |
| <programlisting> |
| CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false); |
| </programlisting> |
| This example would use the standard Unicode collation in a |
| nondeterministic way. In particular, this would allow strings in |
| different normal forms to be compared correctly. More interesting |
| examples make use of the ICU customization facilities explained above. |
| For example: |
| <programlisting> |
| CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); |
| CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false); |
| </programlisting> |
| </para> |
| |
| <para> |
| All standard and predefined collations are deterministic, and all |
| user-defined collations are deterministic by default. While |
| nondeterministic collations give a more <quote>correct</quote> behavior, |
| especially when considering the full power of Unicode and its many |
| special cases, they also have some drawbacks. Foremost, their use leads |
| to a performance penalty. Note, in particular, that B-tree indexes that |
| use a nondeterministic collation cannot use deduplication. Also, |
| certain operations are not possible with nondeterministic collations, |
| such as pattern matching operations. Therefore, they should be used |
| only in cases where they are specifically wanted. |
| </para> |
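| |
| <para> |
| The restriction on pattern matching can be illustrated with the |
| <literal>case_insensitive</literal> collation defined above. This is a |
| sketch only; the table <literal>ci_demo</literal> is hypothetical: |
| <programlisting> |
| CREATE TABLE ci_demo (name text COLLATE case_insensitive); |
| INSERT INTO ci_demo VALUES ('Alice'), ('ALICE'), ('bob'); |
| |
| -- Equality follows the column's collation, so both spellings of "alice" match. |
| SELECT count(*) FROM ci_demo WHERE name = 'alice'; |
| |
| -- Pattern matching is not available with the nondeterministic collation, |
| -- so a deterministic collation is applied explicitly for this expression. |
| SELECT name FROM ci_demo WHERE name COLLATE "C" LIKE 'A%'; |
| </programlisting> |
| Note that the explicit <literal>COLLATE "C"</literal> makes the pattern |
| match byte-wise and therefore case-sensitive again, which may or may not |
| be what the application wants. |
| </para> |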
| |
| <tip> |
| <para> |
| To deal with text in different Unicode normalization forms, you can also |
| use the function <function>normalize</function> and the expression |
| <literal>is normalized</literal> to preprocess or check the strings, |
| instead of using nondeterministic collations. Each approach has its own |
| trade-offs. |
| </para> |
| </tip> |
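| |
| <para> |
| As a brief illustration of the normalization approach (a sketch that |
| assumes a <literal>UTF8</literal> database and a deterministic default |
| collation), the following compares a decomposed and a precomposed spelling |
| of the same character: |
| <programlisting> |
| -- U+0061 U+0308 is "a" followed by a combining diaeresis; |
| -- U+00E4 is the precomposed form of the same character. |
| SELECT U&amp;'\0061\0308' = U&amp;'\00E4' AS raw_equal, |
| normalize(U&amp;'\0061\0308', NFC) = U&amp;'\00E4' AS nfc_equal, |
| U&amp;'\0061\0308' IS NFC NORMALIZED AS already_nfc; |
| </programlisting> |
| Only <literal>nfc_equal</literal> is expected to be true: the raw strings |
| differ byte-wise, and the decomposed spelling is not in form NFC. |
| </para> |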
| </sect3> |
| </sect2> |
| </sect1> |
| |
| <sect1 id="multibyte"> |
| <title>Character Set Support</title> |
| |
| <indexterm zone="multibyte"><primary>character set</primary></indexterm> |
| |
| <para> |
| The character set support in <productname>PostgreSQL</productname> |
| allows you to store text in a variety of character sets (also called |
| encodings), including |
| single-byte character sets such as the ISO 8859 series and |
| multiple-byte character sets such as <acronym>EUC</acronym> (Extended Unix |
| Code), UTF-8, and Mule internal code. All supported character sets |
| can be used transparently by clients, but a few are not supported |
| for use within the server (that is, as a server-side encoding). |
| The default character set is selected while |
| initializing your <productname>PostgreSQL</productname> database |
| cluster using <command>initdb</command>. It can be overridden when you |
| create a database, so you can have multiple |
| databases each with a different character set. |
| </para> |
| |
| <para> |
| An important restriction, however, is that each database's character set |
| must be compatible with the database's <envar>LC_CTYPE</envar> (character |
| classification) and <envar>LC_COLLATE</envar> (string sort order) locale |
| settings. For <literal>C</literal> or |
| <literal>POSIX</literal> locale, any character set is allowed, but for other |
| libc-provided locales there is only one character set that will work |
| correctly. |
| (On Windows, however, UTF-8 encoding can be used with any locale.) |
| If you have ICU support configured, ICU-provided locales can be used |
| with most but not all server-side encodings. |
| </para> |
| |
| <sect2 id="multibyte-charset-supported"> |
| <title>Supported Character Sets</title> |
| |
| <para> |
| <xref linkend="charset-table"/> shows the character sets available |
| for use in <productname>PostgreSQL</productname>. |
| </para> |
| |
| <table id="charset-table"> |
| <title><productname>PostgreSQL</productname> Character Sets</title> |
| <tgroup cols="7"> |
| <colspec colname="col1" colwidth="3*"/> |
| <colspec colname="col2" colwidth="2*"/> |
| <colspec colname="col3" colwidth="2*"/> |
| <colspec colname="col4" colwidth="1.25*"/> |
| <colspec colname="col5" colwidth="1*"/> |
| <colspec colname="col6" colwidth="1*"/> |
| <colspec colname="col7" colwidth="2*"/> |
| <thead> |
| <row> |
| <entry>Name</entry> |
| <entry>Description</entry> |
| <entry>Language</entry> |
| <entry>Server?</entry> |
| <entry>ICU?</entry> |
| <!-- |
| The Bytes/Char field is populated by looking at the values returned |
| by pg_wchar_table.mblen function for each encoding. |
| --> |
| <entry>Bytes/&zwsp;Char</entry> |
| <entry>Aliases</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry><literal>BIG5</literal></entry> |
| <entry>Big Five</entry> |
| <entry>Traditional Chinese</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–2</entry> |
| <entry><literal>WIN950</literal>, <literal>Windows950</literal></entry> |
| </row> |
| <row> |
| <entry><literal>EUC_CN</literal></entry> |
| <entry>Extended UNIX Code-CN</entry> |
| <entry>Simplified Chinese</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>EUC_JP</literal></entry> |
| <entry>Extended UNIX Code-JP</entry> |
| <entry>Japanese</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| <entry>Extended UNIX Code-JP, JIS X 0213</entry> |
| <entry>Japanese</entry> |
| <entry>Yes</entry> |
| <entry>No</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>EUC_KR</literal></entry> |
| <entry>Extended UNIX Code-KR</entry> |
| <entry>Korean</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>EUC_TW</literal></entry> |
| <entry>Extended UNIX Code-TW</entry> |
| <entry>Traditional Chinese, Taiwanese</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>GB18030</literal></entry> |
| <entry>National Standard</entry> |
| <entry>Chinese</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–4</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>GBK</literal></entry> |
| <entry>Extended National Standard</entry> |
| <entry>Simplified Chinese</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–2</entry> |
| <entry><literal>WIN936</literal>, <literal>Windows936</literal></entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry>ISO 8859-5, <acronym>ECMA</acronym> 113</entry> |
| <entry>Latin/Cyrillic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_6</literal></entry> |
| <entry>ISO 8859-6, <acronym>ECMA</acronym> 114</entry> |
| <entry>Latin/Arabic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_7</literal></entry> |
| <entry>ISO 8859-7, <acronym>ECMA</acronym> 118</entry> |
| <entry>Latin/Greek</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_8</literal></entry> |
| <entry>ISO 8859-8, <acronym>ECMA</acronym> 121</entry> |
| <entry>Latin/Hebrew</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>JOHAB</literal></entry> |
| <entry><acronym>JOHAB</acronym></entry> |
| <entry>Korean (Hangul)</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–3</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><acronym>KOI</acronym>8-R</entry> |
| <entry>Cyrillic (Russian)</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>KOI8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>KOI8U</literal></entry> |
| <entry><acronym>KOI</acronym>8-U</entry> |
| <entry>Cyrillic (Ukrainian)</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN1</literal></entry> |
| <entry>ISO 8859-1, <acronym>ECMA</acronym> 94</entry> |
| <entry>Western European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO88591</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN2</literal></entry> |
| <entry>ISO 8859-2, <acronym>ECMA</acronym> 94</entry> |
| <entry>Central European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO88592</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN3</literal></entry> |
| <entry>ISO 8859-3, <acronym>ECMA</acronym> 94</entry> |
| <entry>South European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO88593</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN4</literal></entry> |
| <entry>ISO 8859-4, <acronym>ECMA</acronym> 94</entry> |
| <entry>North European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO88594</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN5</literal></entry> |
| <entry>ISO 8859-9, <acronym>ECMA</acronym> 128</entry> |
| <entry>Turkish</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO88599</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN6</literal></entry> |
| <entry>ISO 8859-10, <acronym>ECMA</acronym> 144</entry> |
| <entry>Nordic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO885910</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN7</literal></entry> |
| <entry>ISO 8859-13</entry> |
| <entry>Baltic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO885913</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN8</literal></entry> |
| <entry>ISO 8859-14</entry> |
| <entry>Celtic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO885914</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN9</literal></entry> |
| <entry>ISO 8859-15</entry> |
| <entry>LATIN1 with Euro and accents</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ISO885915</literal></entry> |
| </row> |
| <row> |
| <entry><literal>LATIN10</literal></entry> |
| <entry>ISO 8859-16, <acronym>ASRO</acronym> SR 14111</entry> |
| <entry>Romanian</entry> |
| <entry>Yes</entry> |
| <entry>No</entry> |
| <entry>1</entry> |
| <entry><literal>ISO885916</literal></entry> |
| </row> |
| <row> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry>Mule internal code</entry> |
| <entry>Multilingual Emacs</entry> |
| <entry>Yes</entry> |
| <entry>No</entry> |
| <entry>1–4</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>SJIS</literal></entry> |
| <entry>Shift JIS</entry> |
| <entry>Japanese</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–2</entry> |
| <entry><literal>Mskanji</literal>, <literal>ShiftJIS</literal>, <literal>WIN932</literal>, <literal>Windows932</literal></entry> |
| </row> |
| <row> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| <entry>Shift JIS, JIS X 0213</entry> |
| <entry>Japanese</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–2</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>SQL_ASCII</literal></entry> |
| <entry>unspecified (see text)</entry> |
| <entry><emphasis>any</emphasis></entry> |
| <entry>Yes</entry> |
| <entry>No</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>UHC</literal></entry> |
| <entry>Unified Hangul Code</entry> |
| <entry>Korean</entry> |
| <entry>No</entry> |
| <entry>No</entry> |
| <entry>1–2</entry> |
| <entry><literal>WIN949</literal>, <literal>Windows949</literal></entry> |
| </row> |
| <row> |
| <entry><literal>UTF8</literal></entry> |
| <entry>Unicode, 8-bit</entry> |
| <entry><emphasis>all</emphasis></entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1–4</entry> |
| <entry><literal>Unicode</literal></entry> |
| </row> |
| <row> |
| <entry><literal>WIN866</literal></entry> |
| <entry>Windows CP866</entry> |
| <entry>Cyrillic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ALT</literal></entry> |
| </row> |
| <row> |
| <entry><literal>WIN874</literal></entry> |
| <entry>Windows CP874</entry> |
| <entry>Thai</entry> |
| <entry>Yes</entry> |
| <entry>No</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1250</literal></entry> |
| <entry>Windows CP1250</entry> |
| <entry>Central European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1251</literal></entry> |
| <entry>Windows CP1251</entry> |
| <entry>Cyrillic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>WIN</literal></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1252</literal></entry> |
| <entry>Windows CP1252</entry> |
| <entry>Western European</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1253</literal></entry> |
| <entry>Windows CP1253</entry> |
| <entry>Greek</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1254</literal></entry> |
| <entry>Windows CP1254</entry> |
| <entry>Turkish</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1255</literal></entry> |
| <entry>Windows CP1255</entry> |
| <entry>Hebrew</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1256</literal></entry> |
| <entry>Windows CP1256</entry> |
| <entry>Arabic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1257</literal></entry> |
| <entry>Windows CP1257</entry> |
| <entry>Baltic</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry></entry> |
| </row> |
| <row> |
| <entry><literal>WIN1258</literal></entry> |
| <entry>Windows CP1258</entry> |
| <entry>Vietnamese</entry> |
| <entry>Yes</entry> |
| <entry>Yes</entry> |
| <entry>1</entry> |
| <entry><literal>ABC</literal>, <literal>TCVN</literal>, <literal>TCVN5712</literal>, <literal>VSCII</literal></entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <para> |
| Not all client <acronym>API</acronym>s support all the listed character sets. For example, the |
| <productname>PostgreSQL</productname> |
| JDBC driver does not support <literal>MULE_INTERNAL</literal>, <literal>LATIN6</literal>, |
| <literal>LATIN8</literal>, and <literal>LATIN10</literal>. |
| </para> |
| |
| <para> |
| The <literal>SQL_ASCII</literal> setting behaves considerably differently |
| from the other settings. When the server character set is |
| <literal>SQL_ASCII</literal>, the server interprets byte values 0–127 |
| according to the ASCII standard, while byte values 128–255 are taken |
| as uninterpreted characters. No encoding conversion will be done when |
| the setting is <literal>SQL_ASCII</literal>. Thus, this setting is not so |
| much a declaration that a specific encoding is in use, as a declaration |
| of ignorance about the encoding. In most cases, if you are |
| working with any non-ASCII data, it is unwise to use the |
| <literal>SQL_ASCII</literal> setting because |
| <productname>PostgreSQL</productname> will be unable to help you by |
| converting or validating non-ASCII characters. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Setting the Character Set</title> |
| |
| <para> |
| <command>initdb</command> defines the default character set (encoding) |
| for a <productname>PostgreSQL</productname> cluster. For example, |
| |
| <screen> |
| initdb -E EUC_JP |
| </screen> |
| |
| sets the default character set to |
| <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You |
| can use <option>--encoding</option> instead of |
| <option>-E</option> if you prefer longer option strings. |
| If no <option>-E</option> or <option>--encoding</option> option is |
| given, <command>initdb</command> attempts to determine the appropriate |
| encoding to use based on the specified or default locale. |
| </para> |
| |
| <para> |
| You can specify a non-default encoding at database creation time, |
| provided that the encoding is compatible with the selected locale: |
| |
| <screen> |
| createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean |
| </screen> |
| |
| This will create a database named <literal>korean</literal> that |
| uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>. |
| Another way to accomplish this is to use this SQL command: |
| |
| <programlisting> |
| CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0; |
| </programlisting> |
| |
| Notice that the above commands specify copying the <literal>template0</literal> |
| database. When copying any other database, the encoding and locale |
| settings cannot be changed from those of the source database, because |
| that might result in corrupt data. For more information see |
| <xref linkend="manage-ag-templatedbs"/>. |
| </para> |
| |
| <para> |
| The encoding for a database is stored in the system catalog |
| <literal>pg_database</literal>. You can see it by using the |
| <command>psql</command> <option>-l</option> option or the |
| <command>\l</command> command. |
| |
| <screen> |
| $ <userinput>psql -l</userinput> |
| List of databases |
| Name | Owner | Encoding | Collation | Ctype | Access Privileges |
| -----------+----------+-----------+-------------+-------------+------------------------------------- |
| clocaledb | hlinnaka | SQL_ASCII | C | C | |
| englishdb | hlinnaka | UTF8 | en_GB.UTF8 | en_GB.UTF8 | |
| japanese | hlinnaka | UTF8 | ja_JP.UTF8 | ja_JP.UTF8 | |
| korean | hlinnaka | EUC_KR | ko_KR.euckr | ko_KR.euckr | |
| postgres | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 | |
| template0 | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 | {=c/hlinnaka,hlinnaka=CTc/hlinnaka} |
| template1 | hlinnaka | UTF8 | fi_FI.UTF8 | fi_FI.UTF8 | {=c/hlinnaka,hlinnaka=CTc/hlinnaka} |
| (7 rows) |
| </screen> |
| </para> |
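| |
| <para> |
| The same information can be read from the catalog directly. A minimal |
| sketch (the column alias is arbitrary): |
| <programlisting> |
| -- pg_database stores the encoding as a numeric ID; |
| -- pg_encoding_to_char() translates it into the encoding name. |
| SELECT datname, |
| pg_encoding_to_char(encoding) AS encoding, |
| datcollate, |
| datctype |
| FROM pg_database |
| ORDER BY datname; |
| </programlisting> |
| </para> |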
| |
| <important> |
| <para> |
| On most modern operating systems, <productname>PostgreSQL</productname> |
| can determine which character set is implied by the <envar>LC_CTYPE</envar> |
| setting, and it will enforce that only the matching database encoding is |
| used. On older systems it is your responsibility to ensure that you use |
| the encoding expected by the locale you have selected. A mistake in |
| this area is likely to lead to strange behavior of locale-dependent |
| operations such as sorting. |
| </para> |
| |
| <para> |
| <productname>PostgreSQL</productname> will allow superusers to create |
| databases with <literal>SQL_ASCII</literal> encoding even when |
| <envar>LC_CTYPE</envar> is not <literal>C</literal> or <literal>POSIX</literal>. As noted |
| above, <literal>SQL_ASCII</literal> does not enforce that the data stored in |
| the database has any particular encoding, and so this choice poses risks |
| of locale-dependent misbehavior. Using this combination of settings is |
| deprecated and may someday be forbidden altogether. |
| </para> |
| </important> |
| </sect2> |
| |
| <sect2> |
| <title>Automatic Character Set Conversion Between Server and Client</title> |
| |
| <para> |
| <productname>PostgreSQL</productname> supports automatic character |
| set conversion between server and client for many combinations of |
| character sets (<xref linkend="multibyte-conversions-supported"/> |
| shows which ones). |
| </para> |
| |
| <para> |
| To enable automatic character set conversion, you have to |
| tell <productname>PostgreSQL</productname> the character set |
| (encoding) you would like to use in the client. There are several |
| ways to accomplish this: |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Using the <command>\encoding</command> command in |
| <application>psql</application>. |
| <command>\encoding</command> allows you to change client |
| encoding on the fly. For |
| example, to change the encoding to <literal>SJIS</literal>, type: |
| |
| <programlisting> |
| \encoding SJIS |
| </programlisting> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| <application>libpq</application> (<xref linkend="libpq-control"/>) has functions to control the client encoding. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Using <command>SET client_encoding TO</command>. |
| |
| Setting the client encoding can be done with this SQL command: |
| |
| <programlisting> |
| SET CLIENT_ENCODING TO '<replaceable>value</replaceable>'; |
| </programlisting> |
| |
| You can also use the standard SQL syntax <literal>SET NAMES</literal> |
| for this purpose: |
| |
| <programlisting> |
| SET NAMES '<replaceable>value</replaceable>'; |
| </programlisting> |
| |
| To query the current client encoding: |
| |
| <programlisting> |
| SHOW client_encoding; |
| </programlisting> |
| |
| To return to the default encoding: |
| |
| <programlisting> |
| RESET client_encoding; |
| </programlisting> |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Using <envar>PGCLIENTENCODING</envar>. If the environment variable |
| <envar>PGCLIENTENCODING</envar> is defined in the client's |
| environment, that client encoding is automatically selected |
| when a connection to the server is made. (This can |
| subsequently be overridden using any of the other methods |
| mentioned above.) |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Using the configuration variable <xref |
| linkend="guc-client-encoding"/>. If the |
| <varname>client_encoding</varname> variable is set, that client |
| encoding is automatically selected when a connection to the |
| server is made. (This can subsequently be overridden using any |
| of the other methods mentioned above.) |
| </para> |
| </listitem> |
| |
| </itemizedlist> |
| </para> |
| |
| <para> |
| If the conversion of a particular character is not possible |
| — suppose you chose <literal>EUC_JP</literal> for the |
| server and <literal>LATIN1</literal> for the client, and some |
| Japanese characters are returned that do not have a representation in |
| <literal>LATIN1</literal> — an error is reported. |
| </para> |
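| |
| <para> |
| A similar failure can be provoked on the server with an explicit |
| conversion. This sketch assumes a database whose encoding is |
| <literal>UTF8</literal>: |
| <programlisting> |
| -- U+65E5 and U+672C are kanji with no LATIN1 representation, |
| -- so this statement is expected to raise a conversion error. |
| SELECT convert_to(U&amp;'\65E5\672C', 'LATIN1'); |
| </programlisting> |
| </para> |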
| |
| <para> |
| If the client character set is defined as <literal>SQL_ASCII</literal>, |
| encoding conversion is disabled, regardless of the server's character |
| set. (However, if the server's character set is |
| not <literal>SQL_ASCII</literal>, the server will still check that |
| incoming data is valid for that encoding; so the net effect is as |
| though the client character set were the same as the server's.) |
| Just as for the server, use of <literal>SQL_ASCII</literal> is unwise |
| unless you are working with all-ASCII data. |
| </para> |
| </sect2> |
| |
| <sect2 id="multibyte-conversions-supported"> |
| <title>Available Character Set Conversions</title> |
| |
| <para> |
| <productname>PostgreSQL</productname> allows conversion between any |
| two character sets for which a conversion function is listed in the |
| <link linkend="catalog-pg-conversion"><structname>pg_conversion</structname></link> |
| system catalog. <productname>PostgreSQL</productname> comes with |
| some predefined conversions, as summarized in |
| <xref linkend="multibyte-translation-table"/> and shown in more |
| detail in <xref linkend="builtin-conversions-table"/>. You can |
| create a new conversion using the SQL command |
| <xref linkend="sql-createconversion"/>. (To be used for automatic |
| client/server conversions, a conversion must be marked |
| as <quote>default</quote> for its character set pair.) |
| </para> |
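| |
| <para> |
| To see which default conversions a particular installation provides, the |
| catalog can be queried directly. A minimal sketch (the column aliases are |
| arbitrary): |
| <programlisting> |
| -- List the conversions marked as default, with readable encoding names. |
| SELECT conname, |
| pg_encoding_to_char(conforencoding) AS source_encoding, |
| pg_encoding_to_char(contoencoding) AS destination_encoding |
| FROM pg_conversion |
| WHERE condefault |
| ORDER BY source_encoding, destination_encoding; |
| </programlisting> |
| </para> |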
| |
| <table id="multibyte-translation-table"> |
| <title>Built-in Client/Server Character Set Conversions</title> |
| <tgroup cols="2"> |
| <colspec colname="col1" colwidth="1*"/> |
| <colspec colname="col2" colwidth="3*"/> |
| <thead> |
| <row> |
| <entry>Server Character Set</entry> |
| <entry>Available Client Character Sets</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry><literal>BIG5</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>EUC_CN</literal></entry> |
| <entry><emphasis>EUC_CN</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>EUC_JP</literal></entry> |
| <entry><emphasis>EUC_JP</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>SJIS</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| <entry><emphasis>EUC_JIS_2004</emphasis>, |
| <literal>SHIFT_JIS_2004</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>EUC_KR</literal></entry> |
| <entry><emphasis>EUC_KR</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>EUC_TW</literal></entry> |
| <entry><emphasis>EUC_TW</emphasis>, |
| <literal>BIG5</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>GB18030</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>GBK</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><emphasis>ISO_8859_5</emphasis>, |
| <literal>KOI8R</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal>, |
| <literal>WIN866</literal>, |
| <literal>WIN1251</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_6</literal></entry> |
| <entry><emphasis>ISO_8859_6</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_7</literal></entry> |
| <entry><emphasis>ISO_8859_7</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>ISO_8859_8</literal></entry> |
| <entry><emphasis>ISO_8859_8</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>JOHAB</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><emphasis>KOI8R</emphasis>, |
| <literal>ISO_8859_5</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal>, |
| <literal>WIN866</literal>, |
| <literal>WIN1251</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>KOI8U</literal></entry> |
| <entry><emphasis>KOI8U</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN1</literal></entry> |
| <entry><emphasis>LATIN1</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN2</literal></entry> |
| <entry><emphasis>LATIN2</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal>, |
| <literal>WIN1250</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN3</literal></entry> |
| <entry><emphasis>LATIN3</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN4</literal></entry> |
| <entry><emphasis>LATIN4</emphasis>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN5</literal></entry> |
| <entry><emphasis>LATIN5</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN6</literal></entry> |
| <entry><emphasis>LATIN6</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN7</literal></entry> |
| <entry><emphasis>LATIN7</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN8</literal></entry> |
| <entry><emphasis>LATIN8</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN9</literal></entry> |
| <entry><emphasis>LATIN9</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>LATIN10</literal></entry> |
| <entry><emphasis>LATIN10</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><emphasis>MULE_INTERNAL</emphasis>, |
| <literal>BIG5</literal>, |
| <literal>EUC_CN</literal>, |
| <literal>EUC_JP</literal>, |
| <literal>EUC_KR</literal>, |
| <literal>EUC_TW</literal>, |
| <literal>ISO_8859_5</literal>, |
| <literal>KOI8R</literal>, |
| <literal>LATIN1</literal> to <literal>LATIN4</literal>, |
| <literal>SJIS</literal>, |
| <literal>WIN866</literal>, |
| <literal>WIN1250</literal>, |
| <literal>WIN1251</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>SJIS</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>SQL_ASCII</literal></entry> |
| <entry><emphasis>any (no conversion will be performed)</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>UHC</literal></entry> |
| <entry><emphasis>not supported as a server encoding</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>UTF8</literal></entry> |
| <entry><emphasis>all supported encodings</emphasis> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN866</literal></entry> |
| <entry><emphasis>WIN866</emphasis>, |
| <literal>ISO_8859_5</literal>, |
| <literal>KOI8R</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal>, |
| <literal>WIN1251</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN874</literal></entry> |
| <entry><emphasis>WIN874</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1250</literal></entry> |
| <entry><emphasis>WIN1250</emphasis>, |
| <literal>LATIN2</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><emphasis>WIN1251</emphasis>, |
| <literal>ISO_8859_5</literal>, |
| <literal>KOI8R</literal>, |
| <literal>MULE_INTERNAL</literal>, |
| <literal>UTF8</literal>, |
| <literal>WIN866</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1252</literal></entry> |
| <entry><emphasis>WIN1252</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1253</literal></entry> |
| <entry><emphasis>WIN1253</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1254</literal></entry> |
| <entry><emphasis>WIN1254</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1255</literal></entry> |
| <entry><emphasis>WIN1255</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1256</literal></entry> |
| <entry><emphasis>WIN1256</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1257</literal></entry> |
| <entry><emphasis>WIN1257</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| <row> |
| <entry><literal>WIN1258</literal></entry> |
| <entry><emphasis>WIN1258</emphasis>, |
| <literal>UTF8</literal> |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <table id="builtin-conversions-table"> |
| <title>All Built-in Character Set Conversions</title> |
| <tgroup cols="3"> |
| <colspec colname="col1" colwidth="2*"/> |
| <colspec colname="col2" colwidth="1*"/> |
| <colspec colname="col3" colwidth="1*"/> |
| <thead> |
| <row> |
| <entry>Conversion Name |
| <footnote> |
| <para> |
| The conversion names follow a standard naming scheme: the |
| official name of the source encoding, with all |
| non-alphanumeric characters replaced by underscores, followed |
| by <literal>_to_</literal>, followed by the similarly processed |
| destination encoding name. These names therefore sometimes |
| deviate from the customary encoding names shown in |
| <xref linkend="charset-table"/>. |
| </para> |
| </footnote> |
| </entry> |
| <entry>Source Encoding</entry> |
| <entry>Destination Encoding</entry> |
| </row> |
| </thead> |
| |
| <tbody> |
| <row> |
| <entry><literal>big5_to_euc_tw</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| </row> |
| <row> |
| <entry><literal>big5_to_mic</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>big5_to_utf8</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_cn_to_mic</literal></entry> |
| <entry><literal>EUC_CN</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_cn_to_utf8</literal></entry> |
| <entry><literal>EUC_CN</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_jp_to_mic</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_jp_to_sjis</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_jp_to_utf8</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_kr_to_mic</literal></entry> |
| <entry><literal>EUC_KR</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_kr_to_utf8</literal></entry> |
| <entry><literal>EUC_KR</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_tw_to_big5</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_tw_to_mic</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_tw_to_utf8</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>gb18030_to_utf8</literal></entry> |
| <entry><literal>GB18030</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>gbk_to_utf8</literal></entry> |
| <entry><literal>GBK</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_10_to_utf8</literal></entry> |
| <entry><literal>LATIN6</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_13_to_utf8</literal></entry> |
| <entry><literal>LATIN7</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_14_to_utf8</literal></entry> |
| <entry><literal>LATIN8</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_15_to_utf8</literal></entry> |
| <entry><literal>LATIN9</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_16_to_utf8</literal></entry> |
| <entry><literal>LATIN10</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_1_to_mic</literal></entry> |
| <entry><literal>LATIN1</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_1_to_utf8</literal></entry> |
| <entry><literal>LATIN1</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_2_to_mic</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_2_to_utf8</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_2_to_windows_1250</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_3_to_mic</literal></entry> |
| <entry><literal>LATIN3</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_3_to_utf8</literal></entry> |
| <entry><literal>LATIN3</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_4_to_mic</literal></entry> |
| <entry><literal>LATIN4</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_4_to_utf8</literal></entry> |
| <entry><literal>LATIN4</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_5_to_koi8_r</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_5_to_mic</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_5_to_utf8</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_5_to_windows_1251</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_5_to_windows_866</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_6_to_utf8</literal></entry> |
| <entry><literal>ISO_8859_6</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_7_to_utf8</literal></entry> |
| <entry><literal>ISO_8859_7</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_8_to_utf8</literal></entry> |
| <entry><literal>ISO_8859_8</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>iso_8859_9_to_utf8</literal></entry> |
| <entry><literal>LATIN5</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>johab_to_utf8</literal></entry> |
| <entry><literal>JOHAB</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_r_to_iso_8859_5</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_r_to_mic</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_r_to_utf8</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_r_to_windows_1251</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_r_to_windows_866</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| </row> |
| <row> |
| <entry><literal>koi8_u_to_utf8</literal></entry> |
| <entry><literal>KOI8U</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_big5</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_euc_cn</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>EUC_CN</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_euc_jp</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_euc_kr</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>EUC_KR</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_euc_tw</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_iso_8859_1</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>LATIN1</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_iso_8859_2</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_iso_8859_3</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>LATIN3</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_iso_8859_4</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>LATIN4</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_iso_8859_5</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_koi8_r</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_sjis</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_windows_1250</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_windows_1251</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| </row> |
| <row> |
| <entry><literal>mic_to_windows_866</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| </row> |
| <row> |
| <entry><literal>sjis_to_euc_jp</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| </row> |
| <row> |
| <entry><literal>sjis_to_mic</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>sjis_to_utf8</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1258_to_utf8</literal></entry> |
| <entry><literal>WIN1258</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>uhc_to_utf8</literal></entry> |
| <entry><literal>UHC</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_big5</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>BIG5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_euc_cn</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>EUC_CN</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_euc_jp</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>EUC_JP</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_euc_kr</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>EUC_KR</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_euc_tw</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>EUC_TW</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_gb18030</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>GB18030</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_gbk</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>GBK</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_1</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN1</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_10</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN6</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_13</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN7</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_14</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_15</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN9</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_16</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN10</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_2</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_3</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN3</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_4</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN4</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_5</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_6</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>ISO_8859_6</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_7</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>ISO_8859_7</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_8</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>ISO_8859_8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_iso_8859_9</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>LATIN5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_johab</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>JOHAB</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_koi8_r</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_koi8_u</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>KOI8U</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_sjis</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>SJIS</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1258</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1258</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_uhc</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>UHC</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1250</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1251</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1252</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1252</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1253</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1253</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1254</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1254</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1255</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1255</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1256</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1256</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_1257</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN1257</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_866</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_windows_874</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>WIN874</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1250_to_iso_8859_2</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| <entry><literal>LATIN2</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1250_to_mic</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1250_to_utf8</literal></entry> |
| <entry><literal>WIN1250</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1251_to_iso_8859_5</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1251_to_koi8_r</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1251_to_mic</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1251_to_utf8</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1251_to_windows_866</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1252_to_utf8</literal></entry> |
| <entry><literal>WIN1252</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_1256_to_utf8</literal></entry> |
| <entry><literal>WIN1256</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_866_to_iso_8859_5</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| <entry><literal>ISO_8859_5</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_866_to_koi8_r</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| <entry><literal>KOI8R</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_866_to_mic</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| <entry><literal>MULE_INTERNAL</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_866_to_utf8</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_866_to_windows_1251</literal></entry> |
| <entry><literal>WIN866</literal></entry> |
| <entry><literal>WIN1251</literal></entry> |
| </row> |
| <row> |
| <entry><literal>windows_874_to_utf8</literal></entry> |
| <entry><literal>WIN874</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_jis_2004_to_utf8</literal></entry> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_euc_jis_2004</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| </row> |
| <row> |
| <entry><literal>shift_jis_2004_to_utf8</literal></entry> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| </row> |
| <row> |
| <entry><literal>utf8_to_shift_jis_2004</literal></entry> |
| <entry><literal>UTF8</literal></entry> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| </row> |
| <row> |
| <entry><literal>euc_jis_2004_to_shift_jis_2004</literal></entry> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| </row> |
| <row> |
| <entry><literal>shift_jis_2004_to_euc_jis_2004</literal></entry> |
| <entry><literal>SHIFT_JIS_2004</literal></entry> |
| <entry><literal>EUC_JIS_2004</literal></entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
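| |
| <para> |
| The naming scheme described in the footnote of the table above can be |
| sketched as a small string transformation. The following Python |
| snippet is only an illustration: the function name |
| <literal>conversion_name</literal> is hypothetical, the official |
| encoding names passed to it are assumptions, and the lowercasing step |
| is inferred from the names in the table rather than stated by the |
| scheme. It does not model abbreviations that the table uses, such as |
| <literal>mic</literal> for <literal>MULE_INTERNAL</literal>. |
| </para> |

```python
import re


def conversion_name(source_official, dest_official):
    """Build a conversion name from two official encoding names.

    Hypothetical helper: lowercases each name and replaces every
    character outside 0-9 and a-z with an underscore, then joins
    the two results with "_to_".
    """
    def normalize(name):
        return re.sub(r"[^0-9a-z]", "_", name.lower())

    return f"{normalize(source_official)}_to_{normalize(dest_official)}"


# The official names passed here are assumptions made for illustration.
print(conversion_name("ISO-8859-5", "KOI8-R"))         # iso_8859_5_to_koi8_r
print(conversion_name("Windows-866", "Windows-1251"))  # windows_866_to_windows_1251
```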
| </sect2> |
| |
| <sect2> |
| <title>Further Reading</title> |
| |
| <para> |
| The following are good starting points for learning about the various |
| kinds of encoding systems. |
| |
| <variablelist> |
| <varlistentry> |
| <term><citetitle>CJKV Information Processing: Chinese, Japanese, Korean &amp; Vietnamese Computing</citetitle></term> |
| |
| <listitem> |
| <para> |
| Contains detailed explanations of <literal>EUC_JP</literal>, |
| <literal>EUC_CN</literal>, <literal>EUC_KR</literal>, and |
| <literal>EUC_TW</literal>. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><ulink url="https://www.unicode.org/"></ulink></term> |
| |
| <listitem> |
| <para> |
| The web site of the Unicode Consortium. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><ulink url="https://tools.ietf.org/html/rfc3629">RFC 3629</ulink></term> |
| |
| <listitem> |
| <para> |
| <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation |
| Format) is defined here. |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| </para> |
| </sect2> |
| |
| </sect1> |
| |
| </chapter> |