Re: LONGCHAR proposal
>The following is a proposal to add the type LONGCHAR to OOC, which has been
>discussed here very briefly before. Please feel free to critique it.
I think this would be a useful addition to OOC. Currently, there is no good
mapping for Unicode data types under Microsoft Windows. LONGCHAR would
solve this problem.
[...]
>Constant strings which consist solely of characters in the range
>`0X..0FFX' and strings stored in an array of CHAR are of type String,
>all others are of type LongString.
>
>(LongString constants need to have a means of representing Unicode
> character values. Java does it using escape sequences like,
> "\uc0ac\uc6a9". What would be an equivalent Oberon-like way to do
> this? How does Component Pascal handle this? I'd guess that, since
> CP has a string concatenation operator `+', you could write
> 0C0ACX+0C6A9X to represent the above Java string.)
Yes, this is correct. While concatenation is generally used for operating
on variables, it can also be used to build strings containing special
characters (e.g. control characters, Unicode characters). It would be useful
to introduce a limited form of string concatenation that works for
constants. This would also help when calling many Windows functions. For
example, MessageBoxA is used to display informational or error messages in
a window. Without string concatenation, one needs to build strings at
run-time:
(*
** Usage - explain correct usage for program
*)
PROCEDURE Usage;
VAR
  nl : ARRAY 3 OF CHAR;
  buffer : ARRAY 256 OF CHAR;
  resI : LONGINT;
BEGIN
  (* nl = carriage return + line feed, terminated by 0X *)
  nl[0] := 0DX;
  nl[1] := 0AX;
  nl[2] := 0X;
  (* COPY initialises buffer; appending to an uninitialised local is undefined *)
  COPY("Usage:", buffer); Strings.Append(nl, buffer);
  Strings.Append(" capture -f <name> - capture to file <name>", buffer);
  Strings.Append(nl, buffer);
  Strings.Append(" capture -p <name> - capture to pipe \\.\pipe\<name>",
                 buffer);
  Strings.Append(nl, buffer);
  Strings.Append(" capture -o - capture to standard output", buffer);
  resI := W.MessageBoxA(0, buffer, "Error!", W.MB_OK);
END Usage;
Using string concatenation, the equivalent is:
PROCEDURE Usage;
CONST
  nl = 0DX + 0AX;
VAR
  resI : LONGINT;
BEGIN
  resI := W.MessageBoxA(0,
    "Usage:" + nl +
    " capture -f <name> - capture to file <name>" + nl +
    " capture -p <name> - capture to pipe \\.\pipe\<name>" + nl +
    " capture -o - capture to standard output",
    "Error!", W.MB_OK);
END Usage;
An alternative approach might be to implement a Channel that writes to a
string buffer, and use TextRiders to do the formatting. However, this still
requires work at run-time which is not really necessary.
[...]
>The following predeclared function procedures support these
>additional operations:
>
>  Name      Argument type  Result type  Function
>  LONG(x)   CHAR           LONGCHAR     identity
>            String         LongString   identity
>
>  SHORT(x)  LONGCHAR       CHAR         projection
>            LongString     String       projection
I would suggest in addition:
   ORD(x)      LONGCHAR       INTEGER      ordinal value of x
   LONGCHR(x)  integer type   LONGCHAR     long character with ordinal value x
[...]
>The following modules would have to be modified:
>
> Channel (add Channel: LErrorDescr;
> Reader: LErrorDescr;
> Writer: LErrorDescr)
> Files (add LErrorDescr, LNew, LOld, LTmp, LSetModTime,
I don't think the underlying run-time system will allow LONGCHARs in file
names. This means that longstrings will need to be converted to strings,
possibly signalling an error if non-representable characters are included
in file names.
> LGetModTime, LExists;
> File: LErrorDescr,
> Reader: LErrorDescr;
> Writer: LErrorDescr)
>
> BinaryRider (add Reader: ReadLChar, LErrorDescr, ReadLString;
> Writer: WriteLChar, LErrorDescr, WriteLString)
>
> TextRider (This will require the most significant changes,
> and will need at least some discussion as to
> what should be done. Some things to think
> about:
>
> 1) We could just add methods like
> reader.ReadLChar and reader.ReadLString, but
> what about things like reader.ReadInt, how
> would they be set to read from Unicode
> streams? Should there be methods like
> reader.LongReadInt?
>
> 2) Or should there be a separate module
> LongTextRider? And if so, how would that
> affect modules In, Out, Err, and Log? Could
> we make it so that TextRider.reader is
> interchangeable with LongTextRider.reader?)
I think approach (1) is probably the simpler and would be easier to
maintain, since it would not require duplication of the majority of the code.
We could regard the "character width" as simply an attribute of the text.
Thus, there would be only one TextRider class, but it would deal with
either 1-byte or 2-byte characters. Character types should be able to be
used interchangeably, provided that the characters are representable.
Thus, to define the Text type, we add:

  reader.SetType(x)      x=1: ISO-Latin, x=2: Unicode
  writer.SetType(x)

For Unicode text (type=2) streams, all character operations read/write
2-byte characters.

  writer.WriteLChar(x)   writes x
  writer.WriteChar(x)    writes LONG(x)
  reader.ReadLChar()     returns c, where c is the next LONGCHAR in the input
  reader.ReadChar()      returns SHORT(c), where c is the next LONGCHAR in
                         the input, or an error if c > MAX(CHAR)

For ISO-Latin text (type=1) streams, all character operations read/write
1-byte characters.

  writer.WriteLChar(x)   writes SHORT(x) where x <= MAX(CHAR), but returns
                         an error if x > MAX(CHAR)
  writer.WriteChar(x)    writes x
  reader.ReadLChar()     returns LONG(c), where c is the next CHAR in the
                         input
  reader.ReadChar()      returns c, where c is the next CHAR in the input

Functions like ReadInt would continue to use ReadChar, but should check the
error status in case a non-ISO-Latin character was returned.
Read/WriteString would use Read/WriteChar, and Read/WriteLString would use
Read/WriteLChar. WriteLString and ReadString would need to check the error
state in case non-representable characters are involved.
It ought to be possible to make a clever TextRider.reader that
automatically detects the width of the text stream, since a 0 byte is not
a valid character in ISO-Latin text but is the high-order byte of the
majority of plain-text characters when they are represented in 2-byte
Unicode.
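As a rough illustration of this detection idea, the sketch below scans a
sample of raw bytes from the start of a stream and guesses the width. It
uses only standard Oberon-2; the module name, the procedure Guess and the
two constants are hypothetical, chosen to match the SetType values above.

```oberon
MODULE DetectWidth;

CONST
  isoLatin* = 1;   (* 1-byte characters, as in SetType(1) *)
  unicode*  = 2;   (* 2-byte characters, as in SetType(2) *)

(* Guess the character width from the first `len` bytes held in `buf`.
   A 0X byte cannot occur in ISO-Latin text, so finding one suggests
   that the stream holds 2-byte characters. *)
PROCEDURE Guess* (VAR buf: ARRAY OF CHAR; len: LONGINT): SHORTINT;
  VAR i: LONGINT;
BEGIN
  IF len > LEN(buf) THEN len := LEN(buf) END;
  i := 0;
  WHILE i < len DO
    IF buf[i] = 0X THEN RETURN unicode END;
    INC(i)
  END;
  RETURN isoLatin
END Guess;

END DetectWidth.
```

This heuristic is only as good as the sample: a Unicode text whose sampled
characters all have a non-zero high-order byte would be misread as
ISO-Latin, so an explicit SetType call would still be needed as an override.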
- Stewart
PS. CSTRING would also need to be changed to deal with LongStrings.