| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
| <HTML> |
| <HEAD> |
| <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1"> |
| <TITLE>How to add a new locale to the i18n framework</TITLE> |
| <META NAME="GENERATOR" CONTENT="OpenOffice.org 1.9.55 (Linux)"> |
| <META NAME="CREATED" CONTENT="20020214;18353600"> |
| <META NAME="CHANGEDBY" CONTENT="Eike Rathke"> |
| <META NAME="CHANGED" CONTENT="20041004;14060600"> |
| <STYLE> |
| <!-- |
| @page { size: 21cm 29.7cm; margin: 2cm } |
| TD P { margin-bottom: 0.2cm } |
| TH P { margin-bottom: 0.2cm; font-style: italic } |
| --> |
| </STYLE> |
| </HEAD> |
| <BODY LANG="en-US" DIR="LTR"> |
| <H2>How to add a new locale to the i18n framework</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Overview</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| i18n framework offers full-featured i18n functionality for a wide range of |
| regions. Besides Western and Eastern European locales, it covers East Asia |
| (CJK), South Asia and South-East Asia (Indian, Thai), and West Asia and the |
| Middle East (Arabic, Hebrew), including the so-called CTL (Complex Text |
| Layout) and BiDi (bidirectional) script types. The i18n framework is built on |
| the UNO component model, which makes adding new i18n components |
| easy.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| following language- and locale-specific attributes are supported:</FONT></P> |
| <TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#000000" CELLPADDING=5 CELLSPACING=0> |
| <COL WIDTH=85*> |
| <COL WIDTH=85*> |
| <COL WIDTH=85*> |
| <THEAD> |
| <TR VALIGN=TOP> |
| <TH WIDTH=33%> |
| <P>i18n Attribute Name</P> |
| </TH> |
| <TH WIDTH=33%> |
| <P>Feature/Consumer</P> |
| </TH> |
| <TH WIDTH=33%> |
| <P>Location in Source</P> |
| </TH> |
| </TR> |
| </THEAD> |
| <TBODY> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Locale Data</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>Provide all locale sensitive data, like |
| date/time/number/currency format, calendar information etc.</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/localedata/data</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Character Classification</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>Provide API to implement features such as switching case, |
| capitalization, punctuation and so on.</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/characterclassification</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Calendar</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P STYLE="font-weight: medium"><FONT SIZE=3>Provide the ability |
| to support a variety of calendaring systems</FONT></P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/calendar</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Break Iterator</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P STYLE="font-weight: medium"><FONT SIZE=3>Provide |
| language/script specific Cursor placement, Word, Line, and |
| Sentence breaking</FONT></P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/breakiterator</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Collator</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P STYLE="font-weight: medium"><FONT SIZE=3>Provide the ability |
| to perform sorting and indexing according to local conventions</FONT></P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/collator</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Transliteration</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>Used by many features, including searching and input, with further |
| applications for Indian languages</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/transliteration</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Index entry</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P STYLE="font-weight: medium"><FONT SIZE=3>Support indexing |
| feature</FONT></P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/indexentry</P> |
| </TD> |
| </TR> |
| <TR VALIGN=TOP> |
| <TD WIDTH=33%> |
| <P>Search &amp; Replace</P> |
| </TD> |
| <TD WIDTH=33%> |
| <P STYLE="font-weight: medium"><FONT SIZE=3>Support the |
| Find/Change feature</FONT></P> |
| </TD> |
| <TD WIDTH=33%> |
| <P>i18npool/source/search</P> |
| </TD> |
| </TR> |
| </TBODY> |
| </TABLE> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Locale Data</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm">For most locales this is the |
| only thing you need to implement. Follow the instructions outlined |
| in the <A HREF="LocaleData.html">excerpt from the Developer's Guide |
| I18n chapter</A>.</P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Depending |
| on the locale, it might also be necessary to implement some of the |
| following components. Please also refer to the Developer's Guide: the |
| information given there might be more up to date, because this document |
| has not been synchronized with it yet.</FONT></P> |
| <H2>CharacterClassification</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| component provides toUpper()/toLower()/toTitle() and retrieves various |
| character attributes defined by Unicode. These functions are implemented by |
| the cclass_unicode class. If you have language-specific requirements for |
| these functions, you can derive a language-specific class |
| cclass_&lt;locale_name&gt; from cclass_unicode and override the relevant |
| methods. In most cases these attributes are well defined by Unicode and you |
| don't need to create your own class.</FONT></P> |
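| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| following is a minimal, self-contained sketch of the derive-and-override |
| idea. CClassDefault and CClassTurkish are hypothetical stand-ins, not the |
| real cclass_unicode interface, and the default class handles only ASCII to |
| keep the example short. Turkish is used because its upper-case mapping of |
| 'i' differs from the default one.</FONT></P> |

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for cclass_unicode: default, locale-independent mapping.
// Only ASCII letters are handled here to keep the sketch short.
class CClassDefault {
public:
    virtual ~CClassDefault() = default;

    // Maps a single code point to upper case.
    virtual char32_t toUpperChar(char32_t c) const {
        return (c >= U'a' && c <= U'z') ? c - (U'a' - U'A') : c;
    }

    // Non-virtual driver: applies the per-character mapping to a whole string.
    std::u32string toUpper(const std::u32string& text) const {
        std::u32string out;
        out.reserve(text.size());
        for (char32_t c : text) out.push_back(toUpperChar(c));
        return out;
    }
};

// Hypothetical locale-specific class: Turkish maps 'i' to U+0130
// (LATIN CAPITAL LETTER I WITH DOT ABOVE) instead of 'I'.
class CClassTurkish : public CClassDefault {
public:
    char32_t toUpperChar(char32_t c) const override {
        if (c == U'i') return char32_t(0x0130);
        if (c == char32_t(0x0131)) return U'I';   // dotless i maps to plain I
        return CClassDefault::toUpperChar(c);     // everything else: default
    }
};
```

| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Only |
| the characters that differ are overridden; every other character is |
| delegated to the default class.</FONT></P> |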
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| class also provides a number parser. If a particular language needs its |
| own number parsing, you will need to derive a class and override the |
| method cclass_unicode::parsePredefinedToken(). Number parsing is typically |
| needed to accept date and calendar information.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>A |
| manager class, CharacterClassificationImpl, loads the language-specific |
| implementation of CharacterClassification on the fly. If no |
| language-specific implementation is provided, it falls back to the class |
| cclass_unicode.</FONT></P> |
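| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| load-by-name-with-fallback behaviour can be pictured with the sketch below. |
| It is an illustration only: CharClassManager, its registry of factories and |
| the locale code "xx" are hypothetical, and the real manager resolves UNO |
| services rather than entries in a std::map.</FONT></P> |

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common interface of all character classification classes.
struct CharClass {
    virtual ~CharClass() = default;
    virtual std::string implementationName() const = 0;
};

// Stand-in for the Unicode default implementation.
struct UnicodeCharClass : CharClass {
    std::string implementationName() const override { return "cclass_unicode"; }
};

// Stand-in for a language-specific implementation of an invented locale "xx".
struct ExampleCharClass : CharClass {
    std::string implementationName() const override { return "cclass_xx"; }
};

// Hypothetical manager: looks up "cclass_<locale>" and falls back to the default.
class CharClassManager {
public:
    using Factory = std::function<std::unique_ptr<CharClass>()>;

    void registerService(const std::string& locale, Factory factory) {
        factories_["cclass_" + locale] = std::move(factory);
    }

    std::unique_ptr<CharClass> load(const std::string& locale) const {
        auto it = factories_.find("cclass_" + locale);
        if (it != factories_.end()) return it->second();
        return std::unique_ptr<CharClass>(new UnicodeCharClass);   // fallback
    }

private:
    std::map<std::string, Factory> factories_;
};
```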
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Calendar</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| component provides a calendar service. All calendar implementations |
| are managed by the front-end class CalendarImpl, which dynamically |
| calls a locale-specific implementation.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Calendar_gregorian |
| is a wrapper to the ICU Calendar class. </FONT> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>If |
| you need to implement a locale-specific calendar, you can choose to |
| either derive your class from Calendar_gregorian or write one from |
| scratch.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>There |
| are three steps to create a locale-specific calendar:</FONT></P> |
| <OL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Name |
| your calendar &lt;name&gt; (for example, 'gengou' for the Japanese |
| calendar) and add it to the localedata XML file with the proper |
| day/month/era names.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Derive |
| a class from either Calendar_gregorian or XCalendar and name it |
| Calendar_&lt;name&gt;. CalendarImpl loads this class when the |
| calendar is specified.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Add |
| your new calendar as a service in |
| i18npool/source/registerservices/registerservices.cxx.</FONT> |
| </P> |
| </OL> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>If |
| you plan to derive from the Gregorian calendar, you need to know the |
| mapping between your new calendar and the Gregorian calendar. For |
| example, the Japanese Emperor Era calendar has, for each era, a |
| starting-year offset relative to the Gregorian calendar. You will need to |
| override the method Calendar_gregorian::convertValue to map the |
| Era/Year/Month/Day from the Gregorian calendar to the calendar for your |
| language.</FONT></P> |
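| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>As |
| an illustration of such a mapping, the sketch below converts between a |
| Gregorian year and an era/year pair using a per-era starting year. The |
| types and function names are hypothetical and are not the signature of |
| convertValue. The sketch also simplifies by working on whole years, whereas |
| real eras begin on a specific day, and the era table is illustrative rather |
| than complete.</FONT></P> |

```cpp
#include <cassert>

// Hypothetical era table entry.
struct Era {
    const char* name;
    int startYear;   // Gregorian year that counts as year 1 of the era
};

// Sorted by start year. Simplification: real eras begin on a specific day,
// not on 1 January, so a real implementation must compare month and day too.
static const Era kEras[] = {
    {"Meiji",  1868},
    {"Taisho", 1912},
    {"Showa",  1926},
    {"Heisei", 1989},
};

struct EraYear {
    int era;    // index into kEras, or -1 if the year precedes the first era
    int year;   // year within the era, starting at 1
};

// Gregorian year -> era and year within that era.
EraYear gregorianToEra(int gregorianYear) {
    const int count = static_cast<int>(sizeof(kEras) / sizeof(kEras[0]));
    for (int i = count - 1; i >= 0; --i) {
        if (gregorianYear >= kEras[i].startYear)
            return {i, gregorianYear - kEras[i].startYear + 1};
    }
    return {-1, gregorianYear};
}

// Era and year within that era -> Gregorian year.
int eraToGregorian(const EraYear& v) {
    return v.era < 0 ? v.year : kEras[v.era].startYear + v.year - 1;
}
```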
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>BreakIterator</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>This |
| component provides Character (Cell), Word, Sentence and Line break |
| services to its users; that is, the BreakIterator component provides the |
| APIs to iterate over a string by character, word, line and sentence. The |
| interface of this component is used by the output layer for the following |
| operations:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <UL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Cursor |
| positioning and selection: since a character or cell can take |
| more than one code point, the cursor cannot be moved simply by |
| incrementing or decrementing the index.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Complex |
| Text Layout languages: in CTL languages (such as Thai, |
| Hebrew, Arabic and Indian languages), multiple characters may combine to |
| form a display cell. Cursor movement must traverse a whole display cell |
| instead of a single character.</FONT></P> |
| </UL> |
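| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| cell-based cursor movement described above can be sketched as follows. This |
| is a deliberately simplified model, not the BreakIterator implementation: it |
| assumes that a cell is one base character followed by combining marks from a |
| single Unicode block, and the function names are hypothetical.</FONT></P> |

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified assumption: a "cell" is one base character followed by any number
// of combining marks from the Combining Diacritical Marks block (U+0300..U+036F).
// Real CTL scripts need per-script rules rather than this single range.
static bool isCombiningMark(char32_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Returns the index of the next cell boundary after pos.
std::size_t nextCell(const std::u32string& text, std::size_t pos) {
    if (pos >= text.size()) return text.size();
    ++pos;                                   // step over the base character
    while (pos < text.size() && isCombiningMark(text[pos])) ++pos;
    return pos;
}

// Returns the index of the previous cell boundary before pos.
std::size_t previousCell(const std::u32string& text, std::size_t pos) {
    if (pos == 0) return 0;
    if (pos > text.size()) pos = text.size();
    --pos;
    while (pos > 0 && isCombiningMark(text[pos])) --pos;   // back up to the base
    return pos;
}
```

| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>For |
| the three code points 'e', U+0301 (combining acute accent) and 'a', moving |
| forward from index 0 lands on index 2, not index 1.</FONT></P> |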
| <P>Line breaking must be highly configurable in desktop publishing |
| applications. The line breaking algorithm should be able to find a |
| line break with or without a hyphenator. Additionally, it should be |
| able to parse special characters that are illegal if they occur at |
| the end or beginning of a line. |
| </P> |
| <P ALIGN=LEFT STYLE="font-weight: medium"><FONT SIZE=3>Both of the above |
| are locale-sensitive.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| BreakIterator components are managed by the class BreakIteratorImpl, |
| which dynamically loads the language-specific component registered under |
| the service name BreakIterator_&lt;language&gt;.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| base break iterator class BreakIterator_Unicode is a wrapper around the |
| ICU BreakIterator class. While this class meets the requirements of |
| Western languages, it does not do so for other languages such as those of |
| East Asia (CJK), South Asia and South-East Asia (Indian, Thai) and |
| West Asia and the Middle East (Arabic, Hebrew), which require the |
| enhanced functionality described above.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Thus |
| the current BreakIterator base class has two derived classes, |
| BreakIterator_CJK and BreakIterator_CTL, both derived from |
| BreakIterator_Unicode. The first provides dictionary-based word breaking |
| for Chinese and Japanese; the second provides a more specific definition |
| of Character/Cell/Cluster for languages such as Thai and Arabic.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Use |
| the following steps to create a language-specific BreakIterator |
| service:</FONT></P> |
| <OL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Derive |
| a class from either BreakIterator_CJK or BreakIterator_CTL and name it |
| BreakIterator_&lt;language&gt;.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Add |
| the new service in registerservices.cxx.</FONT></P> |
| </OL> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>There |
| are three methods for word breaking: nextWord(), previousWord() and |
| getWordBoundary(). You can override them to implement the rules of your |
| own language.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>BreakIterator_CJK |
| provides input string caching and longest-match dictionary searching. |
| You may provide a sorted dictionary (the encoding must be UTF-8) by |
| creating the following file: |
| i18npool/source/breakiterator/data/dict_&lt;language&gt;.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| utility 'gendict' converts it to C code, which is compiled |
| into a shared library for dynamic loading.</FONT> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>All |
| dictionary searching and loading is performed in the xdictionary class. |
| The only thing you need to do is derive your class from BreakIterator_CJK, |
| create an instance of xdictionary with the language name, and pass |
| it to the parent class.</FONT> |
| </P> |
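| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>To |
| illustrate what longest-match searching against a sorted dictionary means, |
| here is a small self-contained sketch. SortedDictionary and longestMatchEnd |
| are hypothetical names and do not reflect the xdictionary interface or the |
| data format produced by gendict. ASCII strings stand in for CJK words.</FONT></P> |

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the dictionary lookup: a sorted word list searched
// with binary search.
class SortedDictionary {
public:
    explicit SortedDictionary(std::vector<std::u32string> words)
        : words_(std::move(words)) {
        std::sort(words_.begin(), words_.end());
        for (const auto& w : words_) maxLen_ = std::max(maxLen_, w.size());
    }
    bool contains(const std::u32string& w) const {
        return std::binary_search(words_.begin(), words_.end(), w);
    }
    std::size_t maxLength() const { return maxLen_; }

private:
    std::vector<std::u32string> words_;
    std::size_t maxLen_ = 0;
};

// Returns the end index of the longest dictionary word starting at pos.
// Falls back to a single character when no word of two or more characters matches.
std::size_t longestMatchEnd(const SortedDictionary& dict,
                            const std::u32string& text, std::size_t pos) {
    if (pos >= text.size()) return text.size();
    std::size_t limit = std::min(dict.maxLength(), text.size() - pos);
    for (std::size_t len = limit; len > 1; --len) {
        if (dict.contains(text.substr(pos, len))) return pos + len;
    }
    return pos + 1;
}
```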
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Collation</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>There |
| are two types of collation: single-level and multiple-level collation.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Most |
| European and English locales need multiple level collation. We use |
| the ICU collator to cover this need.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Most |
| CJK languages need only single-level collation. We have created a |
| two-step table lookup to do the collation for these languages. If you have |
| a new language or algorithm in this category, you can derive a new |
| service from Collator_CJK and provide index and weight tables. Here |
| is a sample implementation:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;collator_CJK.hxx&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>static sal_uInt16 index[] = {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>...</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>};</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>static sal_uInt16 weight[] = {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>...</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>};</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>sal_Int32 SAL_CALL Collator_zh_CN_pinyin::compareSubstring (</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const ::rtl::OUString&amp; str1, sal_Int32 off1, sal_Int32 len1,</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const ::rtl::OUString&amp; str2, sal_Int32 off2, sal_Int32 len2)</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;throw (::com::sun::star::uno::RuntimeException) {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;return compare(str1, off1, len1, str2, off2, len2, index, weight);</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>}</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>sal_Int32 SAL_CALL Collator_zh_CN_pinyin::compareString (</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const ::rtl::OUString&amp; str1,</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const ::rtl::OUString&amp; str2)</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;throw (::com::sun::star::uno::RuntimeException) {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;return compare(str1, 0, str1.getLength(), str2, 0, str2.getLength(), index, weight);</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>}</FONT></P> |
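| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>How |
| a two-step table lookup can drive a single-level comparison is sketched |
| below. The table layout is an assumption made for this example (the high |
| byte of a UTF-16 code unit selects a 256-entry block, the low byte selects |
| the weight within it) and may differ from the layout that Collator_CJK |
| actually expects. lookupWeight, compareByWeight and kNoBlock are |
| hypothetical names.</FONT></P> |

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed layout (not taken from the original tables): idxTable[] is addressed
// by the high byte of a UTF-16 code unit and holds the start offset of a
// 256-entry block in weightTable[], or kNoBlock when the block has no weights.
constexpr std::uint16_t kNoBlock = 0xFFFF;

std::uint16_t lookupWeight(char16_t c, const std::uint16_t* idxTable,
                           const std::uint16_t* weightTable) {
    std::uint16_t block = idxTable[c >> 8];            // step 1: find the block
    if (block == kNoBlock) return static_cast<std::uint16_t>(c);
    return weightTable[block + (c & 0xFF)];            // step 2: find the weight
}

// Single-level comparison by weight: returns -1, 0 or 1.
int compareByWeight(const std::u16string& a, const std::u16string& b,
                    const std::uint16_t* idxTable,
                    const std::uint16_t* weightTable) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::uint16_t wa = lookupWeight(a[i], idxTable, weightTable);
        std::uint16_t wb = lookupWeight(b[i], idxTable, weightTable);
        if (wa != wb) return wa < wb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;               // shorter string first
}
```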
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| front-end implementation Collator dynamically loads and caches the |
| language-specific service named Collator_&lt;locale&gt;.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| steps to add a new service are:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <OL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Derive |
| a new service from Collator_CJK.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Provide |
| the index and weight tables.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Register |
| the new service in registerservices.cxx.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Add |
| the new service to the collation section of the localedata file.</FONT></P> |
| </OL> |
| <P><BR><BR> |
| </P> |
| <H2>Transliteration</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Transliteration |
| is the service for string conversion. The front-end implementation |
| TransliterationImpl dynamically loads and caches specific transliteration |
| services, selected either by the enum defined in XTransliteration.idl or |
| by implementation name.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>We |
| have defined three categories of transliteration: Ignore, OneToOne |
| and Numeric. All of them are derived from |
| transliteration_commonclass.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| Ignore service is for ignoring case, half/full width, katakana/hiragana |
| differences and so on. You can derive your new service from it and |
| override the folding/transliteration methods.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| OneToOne service is for one-to-one mappings, for example converting lower |
| case to upper case. The class provides two further services that take a |
| mapping table or a mapping function to do the folding/transliteration. |
| You can derive a class from it and provide a table or function that the |
| parent class uses to do the transliteration.</FONT></P> |
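| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| table-driven variant can be sketched as follows. OneToOneTable, Mapping and |
| FullWidthToHalfWidth are hypothetical names chosen for this example and are |
| not the actual OneToOne classes; the three-entry table is only a |
| demonstration.</FONT></P> |

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// One entry of a hypothetical one-to-one mapping table, sorted by `from`.
struct Mapping {
    char32_t from;
    char32_t to;
};

// Hypothetical parent class: derived classes only supply the table.
class OneToOneTable {
public:
    OneToOneTable(const Mapping* table, std::size_t size)
        : table_(table), size_(size) {}
    virtual ~OneToOneTable() = default;

    // Maps every character through the table; unmapped characters pass through.
    std::u32string transliterate(const std::u32string& in) const {
        std::u32string out;
        out.reserve(in.size());
        for (char32_t c : in) out.push_back(map(c));
        return out;
    }

private:
    // Binary search in the sorted table.
    char32_t map(char32_t c) const {
        const Mapping* end = table_ + size_;
        const Mapping* it = std::lower_bound(
            table_, end, c,
            [](const Mapping& m, char32_t key) { return m.from < key; });
        return (it != end && it->from == c) ? it->to : c;
    }

    const Mapping* table_;
    std::size_t size_;
};

// Demonstration table: full-width Latin capitals A..C (U+FF21..U+FF23)
// to their ASCII counterparts.
static const Mapping kFullToHalf[] = {
    {0xFF21, U'A'}, {0xFF22, U'B'}, {0xFF23, U'C'},
};

class FullWidthToHalfWidth : public OneToOneTable {
public:
    FullWidthToHalfWidth() : OneToOneTable(kFullToHalf, 3) {}
};
```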
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| Numeric service is used to convert numbers to number strings in specific |
| languages. It can be used to format date strings, for example.</FONT></P> |
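| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| simplest form of such a conversion is replacing ASCII digits with the |
| digits of another decimal digit set, as in the sketch below. The function |
| toNativeDigits is hypothetical; languages that spell numbers with words or |
| multiplier characters need considerably more logic than this.</FONT></P> |

```cpp
#include <cassert>
#include <string>

// Replaces ASCII digits with the digits of another decimal digit set.
// zeroDigit is the code point of that set's zero, e.g. U+0660 for
// ARABIC-INDIC DIGIT ZERO or U+0E50 for THAI DIGIT ZERO.
std::u32string toNativeDigits(const std::u32string& number, char32_t zeroDigit) {
    std::u32string out;
    out.reserve(number.size());
    for (char32_t c : number) {
        if (c >= U'0' && c <= U'9')
            out.push_back(zeroDigit + (c - U'0'));
        else
            out.push_back(c);   // separators and signs pass through unchanged
    }
    return out;
}
```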
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>To |
| add a new transliteration:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <OL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Derive |
| a new class from one of the three classes above.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Override |
| the folding/transliteration methods or provide a table that the parent |
| class uses to do the transliteration.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Register |
| the new service in registerservices.cxx.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Add |
| the new service to the transliteration section of the localedata file.</FONT></P> |
| </OL> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Indexing</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Indexing |
| provides a service for generating index pages. The main method of |
| the service is getIndexCharacter(). The front-end implementation |
| IndexEntrySupplier dynamically loads and caches language-specific services |
| based on the name IndexEntrySupplier_&lt;locale&gt;.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>We |
| have divided languages into two sets.</FONT> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| first set is the Latin1 languages, which can be covered by 256 Unicode |
| code points. We use a one-step table lookup to generate the index |
| character. We have generated alphabetic and numeric tables that cover most |
| Latin1 languages. If you need another algorithm or the table conflicts |
| with your language, you can create your own table and derive a |
| new class from IndexEntrySupplier_Euro. Here is a sample |
| implementation:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;sal/types.h&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;indexentrysupplier_euro.hxx&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;indexdata_alphanumeric.h&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>OUString SAL_CALL i18n::IndexEntrySupplier_alphanumeric::getIndexCharacter(</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const OUString&amp; rIndexEntry, const lang::Locale&amp; rLocale,</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const OUString&amp; rSortAlgorithm ) throw (uno::RuntimeException) {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;return getIndexString(rIndexEntry, idxStr);</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>}</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>where |
| idxStr is the table.</FONT></P> |
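| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>A |
| one-step lookup of this kind can be pictured with the following sketch. The |
| table layout (256 entries, one index key per Latin-1 code point, 0 meaning |
| "no key") is an assumption, and Latin1Table, makeDemoTable and indexKey are |
| hypothetical names; the real idxStr table and getIndexString may be |
| organized differently.</FONT></P> |

```cpp
#include <array>
#include <cassert>
#include <string>

using Latin1Table = std::array<char16_t, 256>;

// Builds a small demonstration table. A real table would cover every Latin-1
// letter according to the conventions of the target languages.
Latin1Table makeDemoTable() {
    Latin1Table t{};                          // all entries start as 0 (no key)
    for (char16_t c = u'A'; c <= u'Z'; ++c) {
        t[c] = c;                             // capitals index under themselves
        t[c + 0x20] = c;                      // lower case folds to the capital
    }
    for (char16_t c = 0x00C0; c <= 0x00C5; ++c) {
        t[c] = u'A';                          // A with grave .. A with ring above
        t[c + 0x20] = u'A';                   // their lower-case forms
    }
    return t;
}

// One-step lookup: the first character of the entry selects the key directly.
// Returns an empty string when the character is outside the table or has no key.
std::u16string indexKey(const std::u16string& entry, const Latin1Table& table) {
    if (entry.empty() || entry[0] > 0xFF) return std::u16string();
    char16_t key = table[entry[0]];
    return key ? std::u16string(1, key) : std::u16string();
}
```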
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>For |
| languages that cannot be covered by the first case, such as CJK, we |
| use a two-step table lookup. Here is a sample implementation:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;indexentrysupplier_cjk.hxx&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>#include &lt;indexdata_zh_pinyin.h&gt;</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>OUString SAL_CALL i18n::IndexEntrySupplier_zh_pinyin::getIndexCharacter(</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const OUString&amp; rIndexEntry, const lang::Locale&amp; rLocale,</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;const OUString&amp; rSortAlgorithm ) throw (uno::RuntimeException) {</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>&nbsp;&nbsp;&nbsp;&nbsp;return getIndexString(rIndexEntry, idxStr, idx1, idx2);</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>}</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>where |
| idx1 and idx2 are the two-step lookup tables and idxStr contains all index |
| keys that can be returned. If you have a new language or algorithm, you |
| can derive a new service from IndexEntrySupplier_CJK and provide tables |
| that the parent class uses to generate the index.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Note |
| that the index depends heavily on collation: each index algorithm |
| should have a collation algorithm to support it.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>To |
| add a new service:</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <OL> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Derive |
| a new service from one of the above classes.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Provide |
| the table or tables for the lookup.</FONT></P> |
| <LI><P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Register |
| the new service in registerservices.cxx.</FONT></P> |
| </OL> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <H2>Search and Replace</H2> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Search |
| and replace is also locale dependent because some search options are |
| available only for a particular locale. For instance, if |
| &ldquo;Asian languages support&rdquo; is enabled, you'll see an |
| additional option &ldquo;Sounds like (Japanese)&rdquo; in the |
| &ldquo;Find &amp; Replace&rdquo; dialog box. With this option, you |
| can turn certain Japanese-specific options on or off in the search and |
| replace process.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Search |
| and replace relies on the transliteration modules for various search |
| options. The transliteration modules will be loaded and the search |
| string will be converted before the search process.</FONT></P> |
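| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>The |
| convert-then-search idea can be sketched as follows. foldCase is a |
| hypothetical stand-in for a transliteration module and only folds ASCII |
| case. The sketch also assumes that the text being searched is converted in |
| the same way as the search string, which is what makes the two |
| comparable.</FONT></P> |

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical folding step standing in for a transliteration module:
// ASCII case folding only. Being one-to-one, it keeps string offsets stable.
std::u32string foldCase(const std::u32string& s) {
    std::u32string out(s);
    for (char32_t& c : out)
        if (c >= U'A' && c <= U'Z') c += U'a' - U'A';
    return out;
}

// Converts both strings first, then runs a plain substring search.
// Returns the match offset, or std::u32string::npos if there is none.
std::size_t searchFolded(const std::u32string& text, const std::u32string& pattern) {
    return foldCase(text).find(foldCase(pattern));
}
```

| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Because |
| the folding here is one-to-one, an offset found in the converted text is |
| also valid in the original text. A conversion that changes the string |
| length would need an offset map.</FONT></P> |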
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm; font-weight: medium"><FONT SIZE=3>Search |
| supports regular expressions. The regular expression implementation |
| uses the transliteration service available for the locale to perform |
| case-insensitive searches.</FONT></P> |
| <P ALIGN=LEFT STYLE="margin-bottom: 0cm"><BR> |
| </P> |
| </BODY> |
| </HTML> |