Revised LONGCHAR proposal
Here is the revised proposal for LONGCHAR.
If this is acceptable, I believe Michael van Acken will implement the
compiler changes and probably module LongStrings as well, and I'll do
the documentation. Can I get some volunteers to work on the rest of
the library changes?
Thanks,
Eric
---
In order to support the Unicode character set, OOC adds the type
LONGCHAR and introduces the concept of long strings. The `character
types' are now CHAR and LONGCHAR, and the `string types' are String
and LongString.
I. Language
The basic character types are as follows:
  * CHAR      the characters of the ISO-Latin-1 (i.e., ISO-8859-1)
              character set (0X..0FFX)
  * LONGCHAR  the characters of the Unicode character set
              (0X..0FFFFX)
The character type LONGCHAR includes the values of type CHAR
according to the following hierarchy:
LONGCHAR >= CHAR
Character constants are denoted by the ordinal number of the
character in hexadecimal notation followed by the letter X. The type
of a character constant is the minimal type to which the constant
value belongs. (That is, if the constant value is in the range
`0X..0FFX', its type is CHAR; otherwise, it is LONGCHAR.)
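The minimal-type rule for character constants can be modelled in a few lines. This is only a Python sketch of the rule as stated above, not OOC code; the helper name `char_constant_type` is invented for illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def char_constant_type(literal: str) -> str:
    """Return "CHAR" or "LONGCHAR" for a constant such as "41X" or "0C0ACX".

    Hypothetical helper that models the proposed rule; it is not part of OOC.
    """
    digits = literal[:-1]
    if not literal.endswith("X") or not digits:
        raise ValueError("expected hexadecimal digits followed by X")
    if any(d not in HEX_DIGITS for d in digits):
        raise ValueError("invalid hexadecimal digit in character constant")
    value = int(digits, 16)  # ordinal number of the character
    if value <= 0xFF:
        return "CHAR"  # fits the ISO-Latin-1 range 0X..0FFX
    if value <= 0xFFFF:
        return "LONGCHAR"  # needs the range 0X..0FFFFX
    raise ValueError("ordinal outside the LONGCHAR range 0X..0FFFFX")
```

For example, `char_constant_type("0FFX")` yields "CHAR", while `char_constant_type("100X")` yields "LONGCHAR".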
Constant strings which consist solely of characters in the range
`0X..0FFX' and strings stored in an ARRAY OF CHAR are of type String;
all others are of type LongString.
Constant strings can be built using the string concatenation
operator `+' and a combination of character or string constants.
For example, the following is of type LongString:
  CONST
    aLongString = 0C0ACX + 0C6A9X + " " + 0C2E4X + 0D328X;
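To make the String/LongString distinction concrete, here is a small Python model of the rule (a sketch only; `string_constant_type` is a made-up name, and Python's `chr` stands in for the character constants):

```python
def string_constant_type(text: str) -> str:
    """Classify a constant string under the proposed rule (illustrative model)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the LONGCHAR range 0X..0FFFFX")
    # String if every character is in 0X..0FFX, LongString otherwise.
    return "String" if all(ord(ch) <= 0xFF for ch in text) else "LongString"


# Python counterpart of the aLongString example: `+' concatenates
# character and string constants into one constant string.
a_long_string = chr(0xC0AC) + chr(0xC6A9) + " " + chr(0xC2E4) + chr(0xD328)
```

Here `string_constant_type(a_long_string)` gives "LongString", because at least one character lies above 0FFX, whereas a constant such as "abc" stays a String.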
The following predeclared function procedures support these
additional operations:
  Name        Argument type   Result type   Function
  LONG(x)     CHAR            LONGCHAR      identity
              String          LongString    identity
  LONGCHR(x)  integer type    LONGCHAR      long character with
                                            ordinal value x
  ORD(x)      LONGCHAR        LONGINT       ordinal value of x
  SHORT(x)    LONGCHAR        CHAR          projection
              LongString      String        projection
Please Note:
SHORT(x), where x is of type LONGCHAR, can result in overflow, which
triggers a compilation or run-time error. The result of an operation
that causes an overflow, but is not detected as such, is undefined.
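The behaviour of LONG, LONGCHR, and SHORT on single characters can be sketched as follows. This is a Python model under one assumption: overflow in SHORT is detected and reported as a run-time error (the proposal also allows a compile-time error, and leaves undetected overflow undefined). The function names are invented for the sketch.

```python
def longchr(x: int) -> str:
    """Model of LONGCHR(x): the long character with ordinal value x."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("ordinal outside the LONGCHAR range 0X..0FFFFX")
    return chr(x)


def long_char(x: str) -> str:
    """Model of LONG(x) for a CHAR argument: identity."""
    if len(x) != 1 or ord(x) > 0xFF:
        raise ValueError("argument is not a CHAR value")
    return x


def short_char(x: str) -> str:
    """Model of SHORT(x) for a LONGCHAR argument: projection onto CHAR.

    Assumption: overflow is detected and raised as a run-time error.
    """
    if len(x) != 1:
        raise ValueError("argument is not a single character")
    if ord(x) > 0xFF:
        raise OverflowError("SHORT: value does not fit in CHAR")
    return x
```

With these definitions, `short_char(long_char("A"))` returns "A", while `short_char(longchr(0x100))` raises `OverflowError`.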
The predeclared procedure COPY(x, v) also supports LongStrings.
  Name        Argument type                 Function
  COPY(x, v)  x: character array, string    v := x
              v: character array
Note that COPY(x, v) is invalid if x is of type ARRAY OF CHAR and v
is of type LongString or ARRAY OF LONGCHAR.
String types are assignment compatible as follows:
An expression e of type Te is assignment compatible with a variable
v of type Tv if one of the following conditions holds:
  1. Tv is an array of LONGCHAR, Te is LongString or String, and
     LEN(e) < LEN(v);
  2. Tv is an array of CHAR, Te is String, and LEN(e) < LEN(v).
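The two conditions can be written as a simple predicate. The following Python sketch only models the rule; the function and parameter names are hypothetical, and the comment about the terminating 0X is my reading of why the comparison is strict.

```python
def assignment_compatible(target_elem: str, target_len: int,
                          expr_type: str, expr_len: int) -> bool:
    """Model of the string assignment rule (illustrative, not OOC code).

    target_elem: element type of the array v, "CHAR" or "LONGCHAR"
    target_len:  LEN(v)
    expr_type:   type of e, "String" or "LongString"
    expr_len:    LEN(e)
    """
    # Strict inequality: presumably leaves room for the terminating 0X.
    fits = expr_len < target_len
    if target_elem == "LONGCHAR":
        return expr_type in ("String", "LongString") and fits  # condition 1
    if target_elem == "CHAR":
        return expr_type == "String" and fits  # condition 2
    return False
```

So a String of length 3 can be assigned to an array of 4 LONGCHARs, but a LongString can never be assigned to an array of CHAR, whatever its length.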
String types are array compatible as follows:
An actual parameter a of type Ta is array compatible with a formal
parameter f of type Tf if
1. Tf is an open array of LONGCHAR and Ta is LongString, or
2. Tf is an open array of CHAR and Ta is String.
Character and string types are expression compatible as follows:
  Operator         First operand    Second operand   Result type
  = # < <= > >=    character type   character type   BOOLEAN
                   string type      string type      BOOLEAN
II. Library
The following modules would be added to support LONGCHAR and
LongStrings:
LongStrings
LongRider [ABSTRACT]
UnicodeRider (or is UnicodeMapper a better name?)
(I removed the addition of `LongCharClass' from this proposal because
character classification of LONGCHARs is always locale sensitive. I
will bring this up again in later discussions on Locales.)
The following modules would have to be modified:
  BinaryRider  (add Reader: ReadLChar, ReadLString;
                    Writer: WriteLChar, WriteLString.
               Note that the above would be affected by calls
               to SetByteOrder. Also, an option needs to be
               added to set how a BinaryRider interprets (and
               possibly discards) a leading U+FEFF byte order
               mark.)
  Calendar     (procedures TimeToLStr and LStrToTime)
  Exception    (procedures LRaise and GetLMessage)
CSTRING would also need to be changed to deal with
LongStrings. (Or is it that we need to add CWIDESTRING?)
(The following could also be changed/added to support LONGCHAR,
but these aren't absolutely necessary:
Integers (procedures ConvertFromLString and
ConvertToLString)
Integer/LongString Conversions
Real/LongString Conversions)
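To illustrate the byte-order point raised for BinaryRider, here is a Python sketch of reading 16-bit characters from a byte sequence. It is one possible policy, not a proposed interface: the function name, the `big_endian` and `discard_bom` options, and the choice to swap byte order when the mark arrives reversed (as 0FFFEX) are all assumptions made for this example.

```python
import struct
from typing import List

BOM = 0xFEFF  # U+FEFF byte order mark


def read_lchars(data: bytes, big_endian: bool = True,
                discard_bom: bool = True) -> List[int]:
    """Decode 16-bit character codes from data (illustrative model only)."""
    if len(data) % 2 != 0:
        raise ValueError("byte count must be a multiple of 2")
    count = len(data) // 2
    fmt = (">" if big_endian else "<") + "H" * count
    values = list(struct.unpack(fmt, data))
    # A leading 0xFFFE means the mark was read with the wrong byte order,
    # so swap the two bytes of every value (assumed policy).
    if values and values[0] == 0xFFFE:
        values = [((v & 0xFF) << 8) | (v >> 8) for v in values]
    # Optionally drop the mark itself so callers only see text.
    if discard_bom and values and values[0] == BOM:
        values = values[1:]
    return values
```

With this policy, both `b"\xfe\xff\xc0\xac"` and `b"\xff\xfe\xac\xc0"` decode to the single character code 0C0ACX, and passing `discard_bom=False` keeps the mark in the result.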
III. Miscellaneous
Other mapper classes can be added (as time permits) to handle
additional 8- and 16-bit encodings. These classes map from
another encoding (e.g., "KSC5601", a standard Korean character
encoding) to Unicode or Latin-1 (as appropriate), and vice
versa. Here "encoding" means both the encoding of n-bit values
in byte streams, and translation of character codes between the
two standards.
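As a rough picture of what an 8-bit mapper does, the Python sketch below builds translation tables in both directions for code page 037. It borrows Python's built-in "cp037" codec to fill the table; the class name and its methods are hypothetical and say nothing about how an OOC mapper would be written.

```python
class EightBitMapper:
    """Table-driven translation between an 8-bit encoding and Latin-1/Unicode.

    Illustrative model only; the tables are filled from a Python codec.
    """

    def __init__(self, codec_name: str) -> None:
        # to_code[b] is the character code for byte value b in the
        # foreign encoding. For cp037 every byte is expected to map
        # into the range 0X..0FFX.
        self.to_code = [ord(bytes([b]).decode(codec_name)) for b in range(256)]
        # Reverse table for writing.
        self.from_code = {code: b for b, code in enumerate(self.to_code)}

    def decode(self, raw: bytes) -> str:
        """Translate bytes in the foreign encoding to characters."""
        return "".join(chr(self.to_code[b]) for b in raw)

    def encode(self, text: str) -> bytes:
        """Translate characters back to bytes in the foreign encoding."""
        return bytes(self.from_code[ord(ch)] for ch in text)


cp037 = EightBitMapper("cp037")
```

For instance, `cp037.decode(b"\xc1")` gives "A", and encoding the decoded form of any byte sequence returns the original bytes. A 16-bit mapper would differ in that the byte-stream step also has to deal with byte order.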
A possible class hierarchy is as follows:
                    Rider [ABSTRACT]
                     /          \
                    /            \
                   /              \
            TextRider          LongRider [ABSTRACT]
               /\                   |
              /  \                  |
             /    \                 |
    Cp037Rider   (other        UnicodeRider
                  8-bit             /\
                  encodings)       /  \
                                  /    \
                          KSCRider    (other 16-bit encodings)
`Rider' is the abstract class that defines the Reader, Writer,
and Scanner interfaces as currently implemented in `TextRider'.
Modules In, Out, and Err would be defined relative to this
class.
`LongRider' adds LONGCHAR and LongString support.
Input/output of TextRider is ISO-Latin-1, and likewise the I/O
of UnicodeRider is Unicode.