OOC Library and Locales
Hi!
Now that we have Unicode (i.e., LONGCHAR) support available in OOC, I
would like to start a discussion on Locales.
Note that the OOC Library already provides support for locales through
the modules Locales, LocStrings, LocNumConv, LocNumStr, LocTextRider,
LocText, and Calendar. These are similar to the facilities provided by
Standard C: A program's locale is a global property, and it must be
reset every time the program needs to process a different locale.
These C-like facilities are minimally sufficient, but are nowhere near
what Java and C++ provide. And AFAIK, these modules do not yet support
LONGCHAR.
In order to move forward, we need to agree on what exactly constitutes
"complete" support for locales. That is, what facilities do application
developers need in order to easily create internationalized software?
Once these requirements are defined, the necessary classes can be
defined along with their corresponding module interfaces. (Note: There
are no interfaces proposed yet.)
The following proposed requirements are based to a large degree on the
facilities provided by Java. As usual, please feel free to make
comments and suggestions.
Thanks,
Eric
---
Locales
A locale is simply an identifier for a particular combination of
language, country, and variant (which form a ``region''). By
differentiating locales, a program can adapt to a region's
conventions for text, numbers, dates, currency, and other
user-defined objects.
Locale-specific attributes are maintained by each locale-sensitive
class, rather than by the locale itself. If a specific locale has not
been set for a locale-sensitive object, a default locale setting is
available (probably based on some global setting).
The Locales module provides the means to list all available locales
and create Locale objects (based on language, country, and variant).
A Locale object has accessor methods for getting its language,
country, and variant settings.
Locale-sensitive classes use the Locale (and its various settings) to
perform customized operations (e.g., formatting data for display).
Supported Locales are consistent across platforms (and should
probably match fairly closely with those available in Java).
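
To make the idea concrete, here is a rough sketch in Python (not Oberon-2, and not a proposed interface). The names `Locale`, `get_default`, and `NumberFormatter` are made up for illustration. The point is only that the locale is a plain identifier, while the locale-sensitive class holds its own settings and falls back to a default when none is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locale:
    """Identifier only: carries no formatting rules of its own."""
    language: str
    country: str = ""
    variant: str = ""

    def name(self) -> str:
        # Join the non-empty parts, e.g. "fr_CA".
        parts = (self.language, self.country, self.variant)
        return "_".join(p for p in parts if p)


# Hypothetical global default, used when an object has no locale of its own.
_default_locale = Locale("en", "US")


def get_default() -> Locale:
    return _default_locale


class NumberFormatter:
    """A locale-sensitive class that keeps its own locale setting."""

    def __init__(self, locale: Optional[Locale] = None):
        # Fall back to the default locale when none is set explicitly.
        self.locale = locale if locale is not None else get_default()
```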
Character Classification and Conversion
Facilities for handling Unicode characters provide both operations
for classification (including access to Unicode attributes such as a
decimal value, an uppercase equivalent, a lowercase equivalent,
and/or a titlecase equivalent) and conversion (which is mainly
converting between upper-, lower-, and titlecase).
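
For comparison, Python's standard `unicodedata` module and string methods expose roughly this set of operations. The helper `describe` below is my own name, and the sketch only shows the kind of queries meant here, not how OOC would spell them.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect classification and case-conversion data for one character."""
    return {
        "category": unicodedata.category(ch),       # e.g. "Nd", "Ll", "Lt"
        "decimal": unicodedata.decimal(ch, None),   # None if not a decimal digit
        "upper": ch.upper(),
        "lower": ch.lower(),
        "title": ch.title(),
    }


# U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit with value 3.
digit_info = describe("\u0663")

# U+01C6 (the "dz with caron" digraph) has distinct upper- and titlecase forms.
digraph_info = describe("\u01c6")
```

The digraph is the usual example of why titlecase needs to be a separate mapping: its uppercase form is U+01C4 but its titlecase form is U+01C5.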
Formatting and Parsing
Numbers, dates, times, and messages all may require locale-specific
formatting when they are displayed to a user. Formatting facilities
must support both standard locale formats and user-defined
formats. Parsing of data from these formats is also required.
* Calendar and Time Zones
Facilities are provided to represent time values (i.e., "time
stamps" and "time intervals") and calendar dates. Operations for
formatting and parsing of date-time values provide all the fields
needed to implement the date-time format of a particular language
and calendar style. Dates and time values are stored internally
in a locale-independent way, but can be formatted in a
locale-sensitive manner.
(Calendar and time zone support is available in the OOC Library
via the modules `Time' and `Calendar', but they lack support for
Unicode strings. Also, because `Calendar' is dependent on
Locales, it will most likely have to be updated for any new
Locales modules. This might also require that other changes be
made to the `Calendar' interface.)
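
A small Python sketch of the "store once, format per locale" idea follows. The `DATE_PATTERNS` table and `format_date` are hypothetical, and the three patterns are only illustrative numeric formats; a real implementation would take its patterns from locale resources.

```python
from datetime import datetime, timezone

# Hypothetical per-locale date patterns (illustrative values only).
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",
    "de_DE": "%d.%m.%Y",
    "ja_JP": "%Y/%m/%d",
}


def format_date(stamp: datetime, locale_name: str) -> str:
    # The stored value is the same; only the pattern differs per locale.
    return stamp.strftime(DATE_PATTERNS[locale_name])


# One locale-independent time stamp, held in UTC.
stamp = datetime(1999, 3, 14, 12, 0, tzinfo=timezone.utc)
```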
* Numbers
Numeric formatting and parsing are provided for both monetary and
non-monetary values and include support for the following:
  * decimal point character
  * digit group separator
  * digit groupings
  * positive and negative sign characters
  * currency symbols (monetary formats only)
  * number of fractional digits displayed (monetary formats only)
  * positioning information for signs and currency symbols
    (preceding or following)
  * percentages
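
To show how these settings might fit together, here is a simplified Python sketch. `NumberConventions` and `format_amount` are names I made up, and the sketch assumes a single positive group size and a leading negative sign, which is less than the list above asks for.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN


@dataclass(frozen=True)
class NumberConventions:
    """Hypothetical bundle of per-locale number settings."""
    decimal_point: str = "."
    group_separator: str = ","
    group_size: int = 3          # must be positive in this sketch
    negative_sign: str = "-"
    currency_symbol: str = ""    # empty for non-monetary formats
    symbol_precedes: bool = True
    frac_digits: int = 2


def format_amount(value: Decimal, conv: NumberConventions) -> str:
    # Round the magnitude to the requested number of fractional digits.
    quantum = Decimal(1).scaleb(-conv.frac_digits)
    rounded = abs(value).quantize(quantum, rounding=ROUND_HALF_EVEN)
    whole, _, frac = format(rounded, "f").partition(".")

    # Split the integer part into groups, counting from the right.
    groups = []
    while len(whole) > conv.group_size:
        groups.insert(0, whole[-conv.group_size:])
        whole = whole[:-conv.group_size]
    groups.insert(0, whole)

    text = conv.group_separator.join(groups)
    if frac:
        text += conv.decimal_point + frac
    if conv.currency_symbol:
        if conv.symbol_precedes:
            text = conv.currency_symbol + text
        else:
            text = text + " " + conv.currency_symbol
    return (conv.negative_sign if value < 0 else "") + text


# Two illustrative convention sets (values chosen for the example).
US_MONEY = NumberConventions(currency_symbol="$")
DE_MONEY = NumberConventions(decimal_point=",", group_separator=".",
                             currency_symbol="EUR", symbol_precedes=False)
```

With these settings, `format_amount(Decimal("1234567.891"), US_MONEY)` gives `$1,234,567.89`, and the same value with `DE_MONEY` gives `1.234.567,89 EUR`.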
* Messages
Message formatting facilities are able to create message strings
based on a locale-specific pattern, with data such as numbers or
dates being inserted in appropriate places. In this way, a
message can vary at run-time and still be applicable to the
current locale.
For example, consider the text of a message that gives the number
of files on a disk drive (in English):
The disk G contains 3 files.
This message is built from two parameters: "3" and "G", but in
another language (in this case French) those parameters can be in
different positions:
Il y a 3 fichiers sur le disque G.
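
In Python terms, the disk example can be sketched with positional placeholders. The `PATTERNS` catalog and `format_message` are hypothetical names; the only point is that the pattern, not the calling code, decides where each parameter lands.

```python
# Hypothetical message catalog: {0} is the file count, {1} is the disk name.
PATTERNS = {
    "en": "The disk {1} contains {0} files.",
    "fr": "Il y a {0} fichiers sur le disque {1}.",
}


def format_message(locale_name: str, *args) -> str:
    # The caller always passes (count, disk); the pattern orders them.
    return PATTERNS[locale_name].format(*args)


english = format_message("en", 3, "G")
french = format_message("fr", 3, "G")
```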
Locale-Sensitive String Operations
These provide locale-sensitive string comparison and collation
(providing means for searching and sorting) and determination of
logical boundaries in text (such boundaries include potential line
breaks, sentence breaks, word breaks, and character breaks).
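
As a toy illustration of why comparison has to be locale-sensitive, the Python sketch below sorts the same words under two hand-written orderings. The key functions are my own simplifications (real collation rules are far more detailed): Swedish places å, ä, ö after z, while German dictionary ordering treats ä like a.

```python
# Illustrative tailorings only; not complete collation rules.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str):
    # Rank each letter by its position in the Swedish alphabet.
    # (Raises ValueError for characters outside this small alphabet.)
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


def german_key(word: str):
    # Compare on the folded form first, then on the original to break ties.
    lowered = word.lower()
    return (lowered.translate(GERMAN_FOLD), lowered)


words = ["zebra", "äpple", "apple"]
swedish_order = sorted(words, key=swedish_key)
german_order = sorted(words, key=german_key)
```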
Character Set Conversion
Because OOC uses Latin-1 (for CHAR) and Unicode (for LONGCHAR) as
its native character encodings, a means to handle text data in other
encodings is provided. Classes that map other encodings to
Unicode or Latin-1 (as appropriate), and vice versa, are supported.
These classes are built as extensions to the existing I/O Subsystem
(Channels, Readers, Writers, etc.).
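
For a feel of what such classes do, Python's standard codecs can serve as a stand-in. The helpers `transcode` and `open_decoding_reader` are names invented for this sketch; the second one wraps a byte stream in a decoding reader, which is roughly the "extension to the I/O subsystem" shape described above.

```python
import io


def transcode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    # Decode to Unicode text first, then encode into the target character set.
    return raw.decode(source_encoding).encode(target_encoding)


def open_decoding_reader(channel: io.BufferedIOBase, encoding: str) -> io.TextIOWrapper:
    # Wrap a byte channel so that reads yield Unicode text.
    return io.TextIOWrapper(channel, encoding=encoding)


# In code page 437 the byte 0x82 is "é"; in Latin-1 the same letter is 0xE9.
latin1_bytes = transcode(b"caf\x82", "cp437", "latin-1")
reader = open_decoding_reader(io.BytesIO(b"caf\x82"), "cp437")
text = reader.read()
```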
Localized Resources
Locale-sensitive classes need access to locale specific resources,
which should be packaged and stored in an easily accessible manner.
The most common use of this mechanism is to provide text strings
such as error messages, menu items, titles, and so forth in the
appropriate language. However, other resources, such as images,
should also be able to take advantage of this resource-packaging
mechanism.
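
One possible lookup scheme, loosely modelled on the fallback idea in Java's resource bundles, is sketched below in Python. `BUNDLES`, `candidates`, and `lookup` are hypothetical; values are strings here, but nothing prevents them from being images or other objects.

```python
# Hypothetical resource bundles keyed by locale name; "" is the root bundle.
BUNDLES = {
    "": {"ok": "OK", "cancel": "Cancel", "colour": "Color"},
    "en_GB": {"colour": "Colour"},
    "fr": {"cancel": "Annuler", "colour": "Couleur"},
}


def candidates(locale_name: str):
    # From most specific to least: "fr_CA" -> ["fr_CA", "fr", ""].
    parts = locale_name.split("_") if locale_name else []
    names = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    return names + [""]


def lookup(locale_name: str, key: str):
    # Return the first match along the fallback chain.
    for name in candidates(locale_name):
        bundle = BUNDLES.get(name, {})
        if key in bundle:
            return bundle[key]
    raise KeyError(key)
```

With this table, a request for "cancel" under "fr_CA" falls back to the "fr" bundle, and a request for "ok" falls all the way back to the root bundle.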
Visual Components
Although Visual Oberon is not an official part of OOC, the facilities
listed above should be usable by and extensible to visual
components. Each component should have a "locale attribute," which,
if not set directly, should match the locale of its parent object.
An object with no parent falls back to the default locale when its
own locale is not set directly.
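
The inheritance rule can be sketched in a few lines of Python. The `Component` class and `DEFAULT_LOCALE` are made up for the example and say nothing about how Visual Oberon is actually structured.

```python
from typing import Optional

DEFAULT_LOCALE = "en_US"  # stand-in for the global default locale


class Component:
    def __init__(self, parent: Optional["Component"] = None,
                 locale: Optional[str] = None):
        self.parent = parent
        self._locale = locale  # None means "not set directly"

    @property
    def locale(self) -> str:
        if self._locale is not None:
            return self._locale          # set directly on this component
        if self.parent is not None:
            return self.parent.locale    # inherit from the parent chain
        return DEFAULT_LOCALE            # no parent: use the default


window = Component(locale="ja_JP")
panel = Component(parent=window)
button = Component(parent=panel)
```

Here `button.locale` resolves to "ja_JP" through two levels of parents, while a parentless component with no locale of its own reports the default.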
Developers of visual components may also have to create ``Input
Methods''. The reason for this is that some languages (e.g.,
Chinese, Japanese, and Korean) have large character sets that do not
easily map to a keyboard. Normally, when text must be entered, a
special utility called an ``input method'' is used. An
input method allows characters to be composed (i.e., a single
character is composed by typing multiple keys). Once a character
has been composed, the input method can forward it to the
appropriate application.
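
A very reduced Python sketch of the composing step is given below. The `InputMethod` class and its two-entry romaji table are invented for illustration; real input methods also offer candidate selection, editing of the pending keys, and much larger tables.

```python
class InputMethod:
    """Buffers key presses and forwards a character once one is composed."""

    def __init__(self, table, commit):
        self.table = table      # maps key sequences to composed characters
        self.commit = commit    # callback that delivers text to the application
        self.buffer = ""

    def key(self, ch: str) -> None:
        self.buffer += ch
        if self.buffer in self.table:
            # A full sequence was typed: forward the composed character.
            self.commit(self.table[self.buffer])
            self.buffer = ""


# Tiny illustrative table: two hiragana syllables typed as Latin letters.
ROMAJI = {"ka": "か", "ki": "き"}

received = []
method = InputMethod(ROMAJI, received.append)
for pressed in "kaki":
    method.key(pressed)
```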