
OOC Library and Locales



Hi!

Now that we have Unicode (i.e., LONGCHAR) support available in OOC, I
would like to start a discussion on Locales.

Note that the OOC Library already provides support for locales through
the modules Locales, LocStrings, LocNumConv, LocNumStr, LocTextRider,
LocText, and Calendar.  These are similar to the facilities provided by
Standard C: A program's locale is a global property, and it must be
reset every time the program needs to process a different locale.

These C-like facilities are minimally sufficient, but are nowhere near
what Java and C++ provide.  And AFAIK, these modules do not yet support
LONGCHAR.

In order to move forward, we need to agree on what exactly constitutes
"complete" support for locales.  That is, what facilities do application
developers need in order to easily create internationalized software?
Once these requirements are defined, the necessary classes can be
defined along with their corresponding module interfaces.  (Note: There
are no interfaces proposed yet.)

The following proposed requirements are based to a large degree on the
facilities provided by Java.  As usual, please feel free to make
comments and suggestions.


Thanks,
Eric

---

  Locales 

    A locale is simply an identifier for a particular combination of
    language, country, and variant (which form a ``region'').  By
    differentiating locales, a program can adapt to a region's
    conventions for text, numbers, dates, currency, and other
    user-defined objects.
        
    Locale-specific attributes are maintained by each locale-sensitive
    class, rather than the locale itself.  If a specific locale has not
    been set for a locale-sensitive object, a default locale setting is
    available (probably based on some global setting).

    The Locales module provides the means to list all available locales
    and create Locale objects (based on language, country, and variant).
    A Locale object has accessor methods for getting its language,
    country, and variant settings.

    Locale-sensitive classes use the Locale (and its various settings) to
    perform customized operations (e.g., formatting data for display).

    Supported Locales are consistent across platforms (and should
    probably match fairly closely with those available in Java).
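
    As an illustration only, here is how the Java model that these
    requirements follow exposes a locale as a plain identifier with
    accessors.  This is Java's own java.util.Locale, not a proposed
    OOC interface; the class name LocaleSketch is made up for the
    example.

```java
import java.util.Locale;

final class LocaleSketch {
    // Build a locale identifier from a language code and a country code.
    static Locale canadianFrench() {
        return new Locale.Builder().setLanguage("fr").setRegion("CA").build();
    }

    public static void main(String[] args) {
        Locale loc = canadianFrench();
        // Accessors for the parts of the identifier.
        System.out.println("language: " + loc.getLanguage());
        System.out.println("country:  " + loc.getCountry());
        System.out.println("variant:  '" + loc.getVariant() + "'"); // empty here
        // Listing the available locales and reading the global default.
        System.out.println(Locale.getAvailableLocales().length + " locales available");
        System.out.println("default:  " + Locale.getDefault());
    }
}
```

    Note that the object carries no formatting data itself; it only
    names the region, which matches the requirement that
    locale-specific attributes live in the locale-sensitive classes.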


  Character Classification and Conversion

    Facilities for handling Unicode characters provide operations both
    for classification (including access to Unicode attributes such as a
    decimal value, an uppercase equivalent, a lowercase equivalent,
    and/or a titlecase equivalent) and for conversion (mainly between
    upper-, lower-, and titlecase).
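
    A small sketch of what such operations look like in Java, using
    java.lang.Character.  The helper names (CharSketch, decimalValue,
    titleCase) are invented for the example; an OOC version would
    presumably operate on LONGCHAR instead.

```java
final class CharSketch {
    // Decimal value of a digit in any script, or -1 if c is not a decimal digit.
    static int decimalValue(char c) {
        return Character.digit(c, 10);
    }

    // Titlecase equivalent of c (differs from uppercase for a few digraphs).
    static char titleCase(char c) {
        return Character.toTitleCase(c);
    }

    public static void main(String[] args) {
        char arabicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE
        char dz = '\u01C6';          // LATIN SMALL LETTER DZ WITH CARON

        // Classification.
        System.out.println(Character.isDigit(arabicThree));
        System.out.println(decimalValue(arabicThree));
        System.out.println(Character.isLetter(dz));

        // Conversion: upper- and titlecase are distinct code points here.
        System.out.printf("upper: U+%04X%n", (int) Character.toUpperCase(dz));
        System.out.printf("title: U+%04X%n", (int) titleCase(dz));
    }
}
```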


  Formatting and Parsing

    Numbers, dates, times, and messages all may require locale-specific
    formatting when they are displayed to a user.  Formatting facilities
    require support for standard locale formats and user-defined
    formats.  Parsing of data from these formats is also required.

    * Calendar and Time Zones

       Facilities are provided to represent time values (i.e., "time
       stamps" and "time intervals") and calendar dates.  Operations for
       formatting and parsing of date-time values provide all the fields
       needed to implement the date-time format of a particular language
       and calendar style.  Dates and time values are stored internally
       in a locale-independent way, but can be formatted in a
       locale-sensitive manner.

       (Calendar and Time Zone Support is available in the OOC Library
       via the modules `Time' and `Calendar', but they lack support for
       Unicode strings.  Also, because `Calendar' is dependant on
       Locales, it will most likely have to be updated for any new
       Locales modules.  This might also require that other changes be
       made to the `Calendar' interface.)
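
       To make the "stored locale-independently, formatted
       locale-sensitively" idea concrete, here is a sketch using
       Java's java.time package.  It says nothing about how the OOC
       `Time' and `Calendar' modules work; DateSketch and its format
       method are names invented for the example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class DateSketch {
    // The Instant is a locale-independent time stamp; the locale and the
    // time zone only matter at the moment the value is formatted.
    static String format(Instant t, Locale locale, ZoneId zone) {
        return DateTimeFormatter.ofPattern("d MMMM yyyy", locale)
                .withZone(zone)
                .format(t);
    }

    public static void main(String[] args) {
        Instant epoch = Instant.EPOCH; // 1970-01-01T00:00:00Z
        System.out.println(format(epoch, Locale.ENGLISH, ZoneOffset.UTC));
        System.out.println(format(epoch, Locale.FRENCH, ZoneOffset.UTC));
        // Same time stamp, five hours west of UTC: still the previous day.
        System.out.println(format(epoch, Locale.ENGLISH, ZoneOffset.ofHours(-5)));
    }
}
```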


    * Numbers

       Numeric formatting and parsing is provided for both monetary and
       non-monetary formatting and includes support for the following:

       * decimal point character 
       * digit group separator
       * digit groupings
       * positive and negative sign characters
       * currency symbols (monetary formats only)
       * number of fractional digits displayed (monetary formats only)
       * positioning information for signs and currency symbols
         (preceding or following)
       * percentages
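
       The non-monetary items in this list can be sketched with
       Java's DecimalFormat.  To keep the result independent of any
       installed locale data, the symbols are set by hand rather than
       looked up; the chosen symbols (comma as decimal point, period
       as group separator) and the names NumberSketch, plain,
       accounting, and parse are assumptions of the example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

final class NumberSketch {
    // Hand-picked symbols: decimal point, digit group separator, minus sign.
    private static DecimalFormatSymbols symbols() {
        DecimalFormatSymbols s = new DecimalFormatSymbols(Locale.ROOT);
        s.setDecimalSeparator(',');
        s.setGroupingSeparator('.');
        s.setMinusSign('-');
        return s;
    }

    // Groups of three digits, two fractional digits, leading minus sign.
    static String plain(double value) {
        return new DecimalFormat("#,##0.00", symbols()).format(value);
    }

    // Same digits, but negative values are wrapped in parentheses.
    static String accounting(double value) {
        return new DecimalFormat("#,##0.00;(#,##0.00)", symbols()).format(value);
    }

    // Parsing reverses the formatting, using the same symbols.
    static double parse(String text) {
        try {
            return new DecimalFormat("#,##0.00", symbols()).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plain(1234567.891));
        System.out.println(plain(-1234.5));
        System.out.println(accounting(-1234.5));
        System.out.println(parse("1.234,50"));
    }
}
```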


    * Messages

       Message formatting facilities are able to create message strings
       based on a locale-specific pattern, with data such as numbers or
       dates being inserted in appropriate places.  In this way, a
       message can vary at run-time and still be applicable to the
       current locale.

       For example, consider the text of a message that gives the number
       of files on a disk drive (in English):

          The disk G contains 3 files.

       This message is built from two parameters: "3" and "G", but in
       another language (in this case French) those parameters can be in
       different positions:

          Il y a 3 fichiers sur le disque G.
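
       The two messages above can be produced from per-language
       patterns with numbered placeholders.  The sketch uses Java's
       java.text.MessageFormat; the class MessageSketch and the
       choice of {0} for the file count and {1} for the disk name are
       assumptions of the example.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageSketch {
    // One pattern per language; {0} is the file count, {1} the disk name.
    static final String EN = "The disk {1} contains {0} files.";
    static final String FR = "Il y a {0} fichiers sur le disque {1}.";

    // The caller always passes the arguments in the same order;
    // the pattern decides where each one appears in the text.
    static String render(String pattern, Locale locale, int fileCount, String disk) {
        return new MessageFormat(pattern, locale)
                .format(new Object[] { fileCount, disk });
    }

    public static void main(String[] args) {
        System.out.println(render(EN, Locale.ENGLISH, 3, "G"));
        System.out.println(render(FR, Locale.FRENCH, 3, "G"));
    }
}
```

       The calling code is identical for both languages; only the
       pattern (which would normally come from a localized resource)
       changes.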


  Locale-Sensitive String Operations 

    These provide locale-sensitive string comparison and collation
    (providing means for searching and sorting) and determination of
    logical boundaries in text (such boundaries include potential line
    breaks, sentence breaks, word breaks, and character breaks).
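
    Both kinds of operation can be sketched with Java's java.text
    classes: Collator for locale-sensitive ordering and BreakIterator
    for text boundaries.  The class and method names (CollationSketch,
    sorted, words) are invented for the example.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CollationSketch {
    // Sort a copy of the input using the collation rules of the locale.
    static String[] sorted(String[] input, Locale locale) {
        String[] copy = input.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Split text at word boundaries, keeping only pieces that start
    // with a letter or digit (punctuation and spaces are dropped).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] names = { "banana", "apple", "Cherry" };
        String[] byCodePoint = names.clone();
        Arrays.sort(byCodePoint); // raw character order puts "Cherry" first
        System.out.println(Arrays.toString(byCodePoint));
        System.out.println(Arrays.toString(sorted(names, Locale.ENGLISH)));
        System.out.println(words("Hello, world.", Locale.ENGLISH));
    }
}
```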


  Character Set Conversion 

    Because OOC uses Latin-1 (for CHAR) and Unicode (for LONGCHAR) as
    its native character encodings, a means to handle text data in other
    encodings is provided.  Classes that map from other encodings to
    Unicode or Latin-1 (as appropriate), and vice versa, are supported.
    These classes are built as extensions to the existing I/O Subsystem
    (Channels, Readers, Writers, etc.).
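
    The "conversion as an I/O layer" idea looks like this in Java,
    where a reader or writer wraps a byte stream and is told which
    encoding to use.  This is only an analogy for how OOC's Channels,
    Readers, and Writers might be extended; CharsetSketch and its
    methods are names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {
    // Read bytes in the given encoding and return them as Unicode text.
    static String decode(byte[] data, Charset encoding) {
        StringBuilder text = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
            int c;
            while ((c = r.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Write Unicode text out as bytes in the given encoding.
    static byte[] encode(String text, Charset encoding) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, encoding)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 }; // "cafe" with e-acute
        String text = decode(latin1, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println(text.length() + " characters, " + utf8.length + " UTF-8 bytes");
    }
}
```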


  Localized Resources 

    Locale-sensitive classes need access to locale specific resources,
    which should be packaged and stored in an easily accessible manner.
    The most common use of this mechanism is to provide text strings
    such as error messages, menu items, titles, and so forth in the
    appropriate language.  However, other resources, such as images,
    should also be able to take advantage of this resource-packaging
    mechanism.
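
    One possible shape for such a mechanism is a table of named
    resources per locale, with lookups falling back from the most
    specific locale to a root table.  The fallback order is an
    assumption borrowed from Java's ResourceBundle, not something
    stated above, and ResourceSketch is a hypothetical class written
    for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ResourceSketch {
    // Tables keyed by "language_COUNTRY", "language", or "" (the root).
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    void put(String localeKey, String name, String value) {
        tables.computeIfAbsent(localeKey, k -> new HashMap<>()).put(name, value);
    }

    // Look up a resource, trying e.g. "fr_CA", then "fr", then the root.
    // Returns null if no table defines the name.
    String get(Locale locale, String name) {
        List<String> keys = new ArrayList<>();
        if (!locale.getCountry().isEmpty()) {
            keys.add(locale.getLanguage() + "_" + locale.getCountry());
        }
        keys.add(locale.getLanguage());
        keys.add("");
        for (String key : keys) {
            Map<String, String> table = tables.get(key);
            if (table != null && table.containsKey(name)) {
                return table.get(name);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ResourceSketch res = new ResourceSketch();
        res.put("", "quit", "Quit");
        res.put("fr", "quit", "Quitter");
        System.out.println(res.get(Locale.CANADA_FRENCH, "quit")); // found under "fr"
        System.out.println(res.get(Locale.GERMANY, "quit"));       // falls back to root
    }
}
```

    Strings are used here for brevity; the same lookup would work for
    images or other objects if the value type were generalized.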


  Visual Components

    Although Visual Oberon is not an official part of OOC, the facilities
    listed above should be usable by and extensible to visual
    components.  Each component should have a "locale attribute," which,
    if not set directly, should match the locale of its parent object.
    An object with no parent has its locale set to the default, when not
    set directly.
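
    The inheritance rule for the locale attribute can be sketched in a
    few lines.  ComponentSketch is a hypothetical stand-in for a visual
    component and is not taken from Visual Oberon.

```java
import java.util.Locale;

final class ComponentSketch {
    private final ComponentSketch parent; // null for a top-level object
    private Locale locale;                // null means "not set directly"

    ComponentSketch(ComponentSketch parent) {
        this.parent = parent;
    }

    void setLocale(Locale locale) {
        this.locale = locale;
    }

    // Own setting first, then the parent's, then the global default.
    Locale getLocale() {
        if (locale != null) {
            return locale;
        }
        if (parent != null) {
            return parent.getLocale();
        }
        return Locale.getDefault();
    }

    public static void main(String[] args) {
        ComponentSketch window = new ComponentSketch(null);
        ComponentSketch button = new ComponentSketch(window);
        System.out.println(button.getLocale()); // the default locale
        window.setLocale(Locale.JAPAN);
        System.out.println(button.getLocale()); // inherited from the window
        button.setLocale(Locale.FRANCE);
        System.out.println(button.getLocale()); // set directly
    }
}
```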

    Developers of visual components may also have to create ``Input
    Methods''.  The reason for this is that some languages (e.g.,
    Chinese, Japanese, and Korean) have large character sets that do not
    easily map to a keyboard.  So normally, when text entry is
    required, a special utility called an ``input method'' is used.  An
    input method allows characters to be composed (i.e., a single
    character is composed by typing multiple keys).  Once a character
    has been composed, the input method can forward it to the
    appropriate application.
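
    A toy sketch of the composition step follows.  Real input methods
    are far more involved (candidate lists, dictionaries, editing of
    pending text); the two-entry table and the class InputMethodSketch
    are purely illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class InputMethodSketch {
    // Key sequences and the single character each one composes.
    private final Map<String, String> table = new HashMap<>();
    private final StringBuilder pending = new StringBuilder();   // keys typed so far
    private final StringBuilder committed = new StringBuilder(); // text forwarded on

    InputMethodSketch() {
        table.put("ka", "\u304B"); // HIRAGANA LETTER KA
        table.put("ki", "\u304D"); // HIRAGANA LETTER KI
    }

    // Collect keys until they match a table entry, then commit the
    // composed character and start a new composition.
    void keyTyped(char key) {
        pending.append(key);
        String composed = table.get(pending.toString());
        if (composed != null) {
            committed.append(composed);
            pending.setLength(0);
        }
    }

    String committedText() {
        return committed.toString();
    }

    public static void main(String[] args) {
        InputMethodSketch im = new InputMethodSketch();
        for (char key : "kaki".toCharArray()) {
            im.keyTyped(key);
        }
        // Four key presses have been composed into two characters.
        System.out.println(im.committedText().length());
    }
}
```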