|  |  | 
|  | Issues (and their resolutions) when using gettext for message translation | 
|  |  | 
|  | Contents | 
|  | ======== | 
|  |  | 
|  | * Windows issues | 
|  | * Automatic characterset conversion | 
|  | * Translations on the client | 
|  | * No translations on the server | 
|  |  | 
|  |  | 
|  |  | 
|  | Windows issues | 
|  | ============== | 
|  |  | 
|  | On Windows, Subversion is linked against a modified version of GNU gettext. | 
|  | This resolves several issues: | 
|  |  | 
|  | - Eliminated need to link against libiconv (which would be the second | 
|  | iconv library, since we already link against apr-iconv) | 
|  | - No automatic charset conversion (guaranteed UTF-8 strings returned by | 
|  | gettext() calls without performance penalties) | 
|  |  | 
|  | More in the paragraphs below... | 
|  |  | 
|  |  | 
|  | Automatic characterset conversion | 
|  | ================================= | 
|  |  | 
|  | Some gettext implementations automatically convert the strings in the | 
|  | message catalogue to the active system characterset.  The source encoding | 
|  | is stored in the "" message id.  The message string looks somewhat like | 
|  | a mime header and contains a "Content-Encoding" line. It's typically GNU's | 
|  | gettext which does this. | 
|  |  | 
|  | Subversion uses UTF-8 to encode strings internally, which may not be the | 
|  | systems default character encoding.  To prevent internal corruption, | 
|  | libsvn_subr:svn_cmdline_init2() explicitly tells gettext to return UTF-8 | 
|  | encoded strings if it has bind_textdomain_codeset(). | 
|  |  | 
|  | Some gettext implementations don't contain automatic string recoding.  In | 
|  | order to work with both recoding and non-recoding implementations, the | 
|  | source strings must be UTF-8 encoded.  This is achieved by requiring .po | 
|  | files to be UTF-8 encoded.  [Note: a pre-commit hook has been installed to | 
|  | ensure this.] | 
|  |  | 
|  | On Windows Subversion links against a version of GNU gettext, which has | 
|  | been modified not to do character conversions.  This eliminates the | 
|  | requirement to link against libiconv which would mean Subversion being | 
|  | linked against 2 iconv libraries (apr_iconv as well as libiconv). | 
|  |  | 
|  |  | 
|  | Translations on the client | 
|  | ========================== | 
|  |  | 
|  | The translation effort is to translate most error messages generated on | 
|  | the system on which the user has invoked his subversion command (svnadmin, | 
|  | svnlook, svndumpfilter, svnversion or svn). | 
|  |  | 
|  | This means that in all layers of the libraries strings have been marked for | 
|  | translation, either with _(), N_() or Q_(). | 
|  |  | 
|  | Parameters are sprintf-ed straight into errorstrings at the time they are | 
|  | added to the error structure, so most strings are marked with _() and | 
|  | translated directly into the language for which the client was set up. | 
|  | [Note: The N_() macro marks strings for delayed translation.] | 
|  |  | 
|  |  | 
|  |  | 
|  | Translations on the server | 
|  | ========================== | 
|  |  | 
|  | On systems which define the LC_MESSAGES constant, setlocale() can be used | 
|  | to set string translation for all (error) strings even those outside | 
|  | the Subversion domain. | 
|  |  | 
|  | Windows doesn't define LC_MESSAGES.  Instead GNU gettext uses the environ- | 
|  | ment variables LANGUAGE, LC_ALL, LC_MESSAGES and LANG (in that order) to | 
|  | find out what language to translate to.  If none of these are defined, the | 
|  | system and user default locales are queried.  Though setting one of | 
|  | the aforementioned variables before starting the server will avoid | 
|  | localization by Subversion to the default locale, messages generated | 
|  | by the system itself are likely to still be in its default locale | 
|  | (they are on Windows). | 
|  |  | 
|  | While systems which have the LC_MESSAGES flag (or setenv() - of which | 
|  | Windows has neither) allow languages to be switched at run time, this cannot | 
|  | be done portably. | 
|  |  | 
|  | Any attempt to use setlocale() in an Apache environment may conflict | 
|  | with settings other modules expect to be setup (even when using a | 
|  | prefork MPM).  On the svnserve side, having no portable way to change | 
|  | languages dynamically means that the environment has to be set up | 
|  | correctly from the start.  Futhermore, the svnserve protocol doesn't | 
|  | yet support content negotiation. | 
|  |  | 
|  | In other words, there is no way -- programmatically -- to ensure that | 
|  | messages are served in any specific language using a traditional | 
|  | gettext implementation.  Current consensus is that gettext must be | 
|  | replaced on the server side with a more flexible implementation. | 
|  |  | 
|  | Server requirement(s): | 
|  | - Language negotiation on a per-client session basis.  For a | 
|  | stateless protocol like HTTP, this means per-request.  For a | 
|  | stateful protocol like the one used by svnserve, this means | 
|  | per-connection. | 
|  | - Avoid contamination of environment used by other code (e.g. other | 
|  | Apache modules running in the same server as mod_dav_svn). | 
|  | - Allow for propagation of the language to use to hook scripts. | 
|  | - Continue to inter-op with generic HTTP/DAV clients, and stay | 
|  | compatible with SVN clients of various versions (as per existing | 
|  | compatibility rules). | 
|  |  | 
|  | I18N module requirement(s): | 
|  | - Cross-platform. | 
|  | - Interoperable with gettext tools (e.g. for .po files). | 
|  | - Non-viral license which allows for any necessary modifications. | 
|  | - gettext-like API (needn't be an exact match). | 
|  |  | 
|  | Implementation guidelines: | 
|  | - The L10N API will be uniform across all libraries, clients, and | 
|  | servers.  Server-negotiated language will be recorded in either a | 
|  | context baton (e.g. apr_pool_t.userdata), or in thread-local | 
|  | storage (TLS). | 
|  | - Implemented on top of a new gettext-like module with per-struct or | 
|  | per-thread locale mutator functions and storage for name/value | 
|  | pairs (a glorified apr_hash_t).  (See implementation from Nicolás | 
|  | Lichtmaier noted below.) | 
|  | - Language chosen by the server will be negotiated based on a ranked | 
|  | list of preferences provided by the client. | 
|  | - Language used by httpd/mod_dav_svn will be derived from the | 
|  | Accept-Language HTTP header, and setup by mod_negotiation (when | 
|  | available), or by mod_dav_svn on a per-request basis. | 
|  | - Language used by svnserve derived from additions to the protocol | 
|  | which allow for HTTP-style content negotiation on a per-connection | 
|  | basis.  The protocol extension would use the same sort of q-value | 
|  | list found in the Accept-Language header to specify user language | 
|  | preferences. | 
|  |  | 
|  | Investigation: A brief canvasing of developers (on IRC) indicated that | 
|  | no thorough investigation of existing solutions which might meet the | 
|  | above requirements has been done.  This incomplete canvasing may not | 
|  | paint an accurate picture, however. | 
|  |  | 
|  | A branch <http://svn.apache.org/repos/asf/subversion/branches/server-l10n/> | 
|  | has been created to explore a solution to the above requirements.  While | 
|  | the L10N module is important, how that module is applied to both the | 
|  | server-side and client-side is possibly even more so; an | 
|  | implementation which meets the requirements should not dramatically | 
|  | impact the solution used across the code base for the general L10N | 
|  | API, nor the necessary server-side machinations. | 
|  |  | 
|  | Nicolás Lichtmaier wrote something along the lines of the module | 
|  | referenced in the "Possible implementation" section | 
|  | <http://svn.haxx.se/dev/archive-2004-04/0788.shtml>, which has been | 
|  | committed to the server-l10n branch.  However, it depends upon the GNU | 
|  | gettext .mo format, and the GNU implementation may not be available on | 
|  | all platforms (unless re-implemented).  This module will need to be | 
|  | enhanced or replaced, ideally completely obviating the need for | 
|  | linkage against a platform's own gettext implementation. | 
|  |  | 
|  | Whether to use TLS or a context baton for the L10N API is under | 
|  | discussion.  TLS can provide a more friendly API (albeit somewhat | 
|  | underhanded), while use of a context baton more resilient to change | 
|  | (e.g. if httpd someday allowed more than one thread to service a | 
|  | request).  Here's a sample: | 
|  | - No localization:                          "A message to localize" | 
|  | - Localization w/ TLS or global preference: _("A message to localize") | 
|  | - Localization w/ a context baton:          _("A message to localize", pool) | 
|  |  | 
|  | Historical note: Original consensus indicated that messages from the | 
|  | server side should stay untranslated for transmission to the client. | 
|  | However, client side localization is not an option, because by then | 
|  | the parameter values have been inserted into the string, meaning that | 
|  | it can't be looked up in the messages catalogue anymore.  So any | 
|  | localization must occur on the server, or significantly increase the | 
|  | complexity of marshalling messages from the server as | 
|  | unlocalized/unformatted data structures and localizing them on the | 
|  | client side using some additional wrapper APIs to handle the | 
|  | unmarshalling and message formatting.  Additionally, client and server | 
|  | versions may not match up, meaning that message keys and format string | 
|  | values provided by the server may not correspond to what's available | 
|  | on the client. | 
|  |  | 
|  | Paul Querna suggested a variation on this scheme involving requesting | 
|  | (once) and caching the localizations (to the local disk) for each | 
|  | server version, along with sending the message key (for lookup of | 
|  | localized text) and an already formatted text (to use as the default | 
|  | when no localization bundle is available).  In addition to the | 
|  | complications mentioned previously, this has the downside of crippling | 
|  | the localization of server-generated messages when no write access to | 
|  | the local disk is available to the client. |