| <!-- doc/src/sgml/nls.sgml --> |
| |
| <chapter id="nls"> |
| <title>Native Language Support</title> |
| |
| <sect1 id="nls-translator"> |
| <title>For the Translator</title> |
| |
| <para> |
| <productname>PostgreSQL</productname> |
| programs (server and client) can issue their messages in |
| your favorite language — if the messages have been translated. |
| Creating and maintaining translated message sets needs the help of |
| people who speak their own language well and want to contribute to |
| the <productname>PostgreSQL</productname> effort. You do not have to be a |
| programmer at all |
| to do this. This section explains how to help. |
| </para> |
| |
| <sect2> |
| <title>Requirements</title> |
| |
| <para> |
| We won't judge your language skills — this section is about |
| software tools. Theoretically, you only need a text editor. But |
| this is only in the unlikely event that you do not want to try out |
| your translated messages. When you configure your source tree, be |
| sure to use the <option>--enable-nls</option> option. This will |
| also check for the <application>libintl</application> library and the |
| <filename>msgfmt</filename> program, which all end users will need |
| anyway. To try out your work, follow the applicable portions of |
| the installation instructions. |
| </para> |
| |
| <para> |
| If you want to start a new translation effort or want to do a |
| message catalog merge (described later), you will need the |
| programs <filename>xgettext</filename> and |
| <filename>msgmerge</filename>, respectively, in a GNU-compatible |
| implementation. Later, we will try to arrange it so that if you |
| use a packaged source distribution, you won't need |
| <filename>xgettext</filename>. (If working from Git, you will still need |
| it.) <application>GNU Gettext 0.10.36</application> or later is currently recommended. |
| </para> |
| |
| <para> |
| Your local gettext implementation should come with its own |
| documentation. Some of that is probably duplicated in what |
| follows, but for additional details you should look there. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Concepts</title> |
| |
| <para> |
| The pairs of original (English) messages and their (possibly) |
| translated equivalents are kept in <firstterm>message |
| catalogs</firstterm>, one for each program (although related |
| programs can share a message catalog) and for each target |
| language. There are two file formats for message catalogs: The |
| first is the <quote>PO</quote> file (for Portable Object), which |
| is a plain text file with special syntax that translators edit. |
| The second is the <quote>MO</quote> file (for Machine Object), |
| which is a binary file generated from the respective PO file and |
| is used while the internationalized program is run. Translators |
| do not deal with MO files; in fact hardly anyone does. |
| </para> |
| |
| <para> |
| The extension of the message catalog file is to no surprise either |
| <filename>.po</filename> or <filename>.mo</filename>. The base |
| name is either the name of the program it accompanies, or the |
| language the file is for, depending on the situation. This is a |
| bit confusing. Examples are <filename>psql.po</filename> (PO file |
| for psql) or <filename>fr.mo</filename> (MO file in French). |
| </para> |
| |
| <para> |
| The file format of the PO files is illustrated here: |
| <programlisting> |
| # comment |
| |
| msgid "original string" |
| msgstr "translated string" |
| |
| msgid "more original" |
| msgstr "another translated" |
| "string can be broken up like this" |
| |
| ... |
| </programlisting> |
| The msgid lines are extracted from the program source. (They need not |
| be, but this is the most common way.) The msgstr lines are |
| initially empty and are filled in with useful strings by the |
| translator. The strings can contain C-style escape characters and |
| can be continued across lines as illustrated. (The next line must |
| start at the beginning of the line.) |
| </para> |
| |
| <para> |
| The # character introduces a comment. If whitespace immediately |
| follows the # character, then this is a comment maintained by the |
| translator. There can also be automatic comments, which have a |
| non-whitespace character immediately following the #. These are |
| maintained by the various tools that operate on the PO files and |
| are intended to aid the translator. |
| <programlisting> |
| #. automatic comment |
| #: filename.c:1023 |
| #, flags, flags |
| </programlisting> |
| The #. style comments are extracted from the source file where the |
| message is used. Possibly the programmer has inserted information |
| for the translator, such as about expected alignment. The #: |
| comments indicate the exact locations where the message is used |
| in the source. The translator need not look at the program |
| source, but can if there is doubt about the correct |
| translation. The #, comments contain flags that describe the |
| message in some way. There are currently two flags: |
| <literal>fuzzy</literal> is set if the message has possibly been |
| outdated because of changes in the program source. The translator |
| can then verify this and possibly remove the fuzzy flag. Note |
| that fuzzy messages are not made available to the end user. The |
| other flag is <literal>c-format</literal>, which indicates that |
| the message is a <function>printf</function>-style format |
| template. This means that the translation should also be a format |
| string with the same number and type of placeholders. There are |
| tools that can verify this, which key off the c-format flag. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Creating and Maintaining Message Catalogs</title> |
| |
| <para> |
| OK, so how does one create a <quote>blank</quote> message |
| catalog? First, go into the directory that contains the program |
| whose messages you want to translate. If there is a file |
| <filename>nls.mk</filename>, then this program has been prepared |
| for translation. |
| </para> |
| |
| <para> |
| If there are already some <filename>.po</filename> files, then |
| someone has already done some translation work. The files are |
| named <filename><replaceable>language</replaceable>.po</filename>, |
| where <replaceable>language</replaceable> is the |
| <ulink url="https://www.loc.gov/standards/iso639-2/php/English_list.php"> |
| ISO 639-1 two-letter language code (in lower case)</ulink>, e.g., |
| <filename>fr.po</filename> for French. If there is really a need |
| for more than one translation effort per language then the files |
| can also be named |
| <filename><replaceable>language</replaceable>_<replaceable>region</replaceable>.po</filename> |
| where <replaceable>region</replaceable> is the |
| <ulink url="https://www.iso.org/iso-3166-country-codes.html"> |
| ISO 3166-1 two-letter country code (in upper case)</ulink>, |
| e.g., |
| <filename>pt_BR.po</filename> for Portuguese in Brazil. If you |
| find the language you wanted you can just start working on that |
| file. |
| </para> |
| |
| <para> |
| If you need to start a new translation effort, then first run the |
| command: |
| <programlisting> |
| make init-po |
| </programlisting> |
| This will create a file |
| <filename><replaceable>progname</replaceable>.pot</filename>. |
| (<filename>.pot</filename> to distinguish it from PO files that |
| are <quote>in production</quote>. The <literal>T</literal> stands for |
| <quote>template</quote>.) |
| Copy this file to |
| <filename><replaceable>language</replaceable>.po</filename> and |
| edit it. To make it known that the new language is available, |
| also edit the file <filename>nls.mk</filename> and add the |
| language (or language and country) code to the line that looks like: |
| <programlisting> |
| AVAIL_LANGUAGES := de fr |
| </programlisting> |
| (Other languages can appear, of course.) |
| </para> |
| |
| <para> |
| As the underlying program or library changes, messages might be |
| changed or added by the programmers. In this case you do not need |
| to start from scratch. Instead, run the command: |
| <programlisting> |
| make update-po |
| </programlisting> |
| which will create a new blank message catalog file (the pot file |
| you started with) and will merge it with the existing PO files. |
| If the merge algorithm is not sure about a particular message it |
| marks it <quote>fuzzy</quote> as explained above. The new PO file |
| is saved with a <filename>.po.new</filename> extension. |
| </para> |
| </sect2> |
| |
| <sect2> |
| <title>Editing the PO Files</title> |
| |
| <para> |
| The PO files can be edited with a regular text editor. The |
| translator should only change the area between the quotes after |
| the msgstr directive, add comments, and alter the fuzzy flag. |
| There is (unsurprisingly) a PO mode for Emacs, which I find quite |
| useful. |
| </para> |
| |
| <para> |
| The PO files need not be completely filled in. The software will |
| automatically fall back to the original string if no translation |
| (or an empty translation) is available. It is no problem to |
| submit incomplete translations for inclusions in the source tree; |
| that gives room for other people to pick up your work. However, |
| you are encouraged to give priority to removing fuzzy entries |
| after doing a merge. Remember that fuzzy entries will not be |
| installed; they only serve as reference for what might be the right |
| translation. |
| </para> |
| |
| <para> |
| Here are some things to keep in mind while editing the |
| translations: |
| <itemizedlist> |
| <listitem> |
| <para> |
| Make sure that if the original ends with a newline, the |
| translation does, too. Similarly for tabs, etc. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If the original is a <function>printf</function> format string, the translation |
| also needs to be. The translation also needs to have the same |
| format specifiers in the same order. Sometimes the natural |
| rules of the language make this impossible or at least awkward. |
| In that case you can modify the format specifiers like this: |
| <programlisting> |
| msgstr "Die Datei %2$s hat %1$u Zeichen." |
| </programlisting> |
| Then the first placeholder will actually use the second |
| argument from the list. The |
| <literal><replaceable>digits</replaceable>$</literal> needs to |
| follow the % immediately, before any other format manipulators. |
| (This feature really exists in the <function>printf</function> |
| family of functions. You might not have heard of it before because |
| there is little use for it outside of message |
| internationalization.) |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If the original string contains a linguistic mistake, report |
| that (or fix it yourself in the program source) and translate |
| normally. The corrected string can be merged in when the |
| program sources have been updated. If the original string |
| contains a factual mistake, report that (or fix it yourself) |
| and do not translate it. Instead, you can mark the string with |
| a comment in the PO file. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| Maintain the style and tone of the original string. |
| Specifically, messages that are not sentences (<literal>cannot |
| open file %s</literal>) should probably not start with a |
| capital letter (if your language distinguishes letter case) or |
| end with a period (if your language uses punctuation marks). |
| It might help to read <xref linkend="error-style-guide"/>. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If you don't know what a message means, or if it is ambiguous, |
| ask on the developers' mailing list. Chances are that English |
| speaking end users might also not understand it or find it |
| ambiguous, so it's best to improve the message. |
| </para> |
| </listitem> |
| |
| </itemizedlist> |
| </para> |
| </sect2> |
| |
| </sect1> |
| |
| |
| <sect1 id="nls-programmer"> |
| <title>For the Programmer</title> |
| |
| <sect2 id="nls-mechanics"> |
| <title>Mechanics</title> |
| |
| <para> |
| This section describes how to implement native language support in a |
| program or library that is part of the |
| <productname>PostgreSQL</productname> distribution. |
| Currently, it only applies to C programs. |
| </para> |
| |
| <procedure> |
| <title>Adding NLS Support to a Program</title> |
| |
| <step> |
| <para> |
| Insert this code into the start-up sequence of the program: |
| <programlisting> |
| #ifdef ENABLE_NLS |
| #include <locale.h> |
| #endif |
| |
| ... |
| |
| #ifdef ENABLE_NLS |
| setlocale(LC_ALL, ""); |
| bindtextdomain("<replaceable>progname</replaceable>", LOCALEDIR); |
| textdomain("<replaceable>progname</replaceable>"); |
| #endif |
| </programlisting> |
| (The <replaceable>progname</replaceable> can actually be chosen |
| freely.) |
| </para> |
| </step> |
| |
| <step> |
| <para> |
| Wherever a message that is a candidate for translation is found, |
| a call to <function>gettext()</function> needs to be inserted. E.g.: |
| <programlisting> |
| fprintf(stderr, "panic level %d\n", lvl); |
| </programlisting> |
| would be changed to: |
| <programlisting> |
| fprintf(stderr, gettext("panic level %d\n"), lvl); |
| </programlisting> |
| (<symbol>gettext</symbol> is defined as a no-op if NLS support is |
| not configured.) |
| </para> |
| |
| <para> |
| This tends to add a lot of clutter. One common shortcut is to use: |
| <programlisting> |
| #define _(x) gettext(x) |
| </programlisting> |
| Another solution is feasible if the program does much of its |
| communication through one or a few functions, such as |
| <function>ereport()</function> in the backend. Then you make this |
| function call <function>gettext</function> internally on all |
| input strings. |
| </para> |
| </step> |
| |
| <step> |
| <para> |
| Add a file <filename>nls.mk</filename> in the directory with the |
| program sources. This file will be read as a makefile. The |
| following variable assignments need to be made here: |
| |
| <variablelist> |
| <varlistentry> |
| <term><varname>CATALOG_NAME</varname></term> |
| |
| <listitem> |
| <para> |
| The program name, as provided in the |
| <function>textdomain()</function> call. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><varname>AVAIL_LANGUAGES</varname></term> |
| |
| <listitem> |
| <para> |
| List of provided translations — initially empty. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><varname>GETTEXT_FILES</varname></term> |
| |
| <listitem> |
| <para> |
| List of files that contain translatable strings, i.e., those |
| marked with <function>gettext</function> or an alternative |
| solution. Eventually, this will include nearly all source |
| files of the program. If this list gets too long you can |
| make the first <quote>file</quote> be a <literal>+</literal> |
| and the second word be a file that contains one file name per |
| line. |
| </para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry> |
| <term><varname>GETTEXT_TRIGGERS</varname></term> |
| |
| <listitem> |
| <para> |
| The tools that generate message catalogs for the translators |
| to work on need to know what function calls contain |
| translatable strings. By default, only |
| <function>gettext()</function> calls are known. If you used |
| <function>_</function> or other identifiers you need to list |
| them here. If the translatable string is not the first |
| argument, the item needs to be of the form |
| <literal>func:2</literal> (for the second argument). |
| If you have a function that supports pluralized messages, |
| the item should look like <literal>func:1,2</literal> |
| (identifying the singular and plural message arguments). |
| </para> |
| </listitem> |
| </varlistentry> |
| </variablelist> |
| </para> |
| </step> |
| |
| </procedure> |
| |
| <para> |
| The build system will automatically take care of building and |
| installing the message catalogs. |
| </para> |
| </sect2> |
| |
| <sect2 id="nls-guidelines"> |
| <title>Message-Writing Guidelines</title> |
| |
| <para> |
| Here are some guidelines for writing messages that are easily |
| translatable. |
| |
| <itemizedlist> |
| <listitem> |
| <para> |
| Do not construct sentences at run-time, like: |
| <programlisting> |
| printf("Files were %s.\n", flag ? "copied" : "removed"); |
| </programlisting> |
| The word order within the sentence might be different in other |
| languages. Also, even if you remember to call <function>gettext()</function> on |
| each fragment, the fragments might not translate well separately. It's |
| better to duplicate a little code so that each message to be |
| translated is a coherent whole. Only numbers, file names, and |
| such-like run-time variables should be inserted at run time into |
| a message text. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| For similar reasons, this won't work: |
| <programlisting> |
| printf("copied %d file%s", n, n!=1 ? "s" : ""); |
| </programlisting> |
| because it assumes how the plural is formed. If you figured you |
| could solve it like this: |
| <programlisting> |
| if (n==1) |
| printf("copied 1 file"); |
| else |
| printf("copied %d files", n): |
| </programlisting> |
| then be disappointed. Some languages have more than two forms, |
| with some peculiar rules. It's often best to design the message |
| to avoid the issue altogether, for instance like this: |
| <programlisting> |
| printf("number of copied files: %d", n); |
| </programlisting> |
| </para> |
| |
| <para> |
| If you really want to construct a properly pluralized message, |
| there is support for this, but it's a bit awkward. When generating |
| a primary or detail error message in <function>ereport()</function>, you can |
| write something like this: |
| <programlisting> |
| errmsg_plural("copied %d file", |
| "copied %d files", |
| n, |
| n) |
| </programlisting> |
| The first argument is the format string appropriate for English |
| singular form, the second is the format string appropriate for |
| English plural form, and the third is the integer control value |
| that determines which plural form to use. Subsequent arguments |
| are formatted per the format string as usual. (Normally, the |
| pluralization control value will also be one of the values to be |
| formatted, so it has to be written twice.) In English it only |
| matters whether <replaceable>n</replaceable> is 1 or not 1, but in other |
| languages there can be many different plural forms. The translator |
| sees the two English forms as a group and has the opportunity to |
| supply multiple substitute strings, with the appropriate one being |
| selected based on the run-time value of <replaceable>n</replaceable>. |
| </para> |
| |
| <para> |
| If you need to pluralize a message that isn't going directly to an |
| <function>errmsg</function> or <function>errdetail</function> report, you have to use |
| the underlying function <function>ngettext</function>. See the gettext |
| documentation. |
| </para> |
| </listitem> |
| |
| <listitem> |
| <para> |
| If you want to communicate something to the translator, such as |
| about how a message is intended to line up with other output, |
| precede the occurrence of the string with a comment that starts |
| with <literal>translator</literal>, e.g.: |
| <programlisting> |
| /* translator: This message is not what it seems to be. */ |
| </programlisting> |
| These comments are copied to the message catalog files so that |
| the translators can see them. |
| </para> |
| </listitem> |
| </itemizedlist> |
| </para> |
| </sect2> |
| </sect1> |
| |
| </chapter> |