Re: Revised LONGCHAR proposal



> From: "Eric W. Nikitin" <enikitin@apk.net>
> Date: Fri, 5 Mar 1999 13:55:32 -0500 (EST)
> 
> Here is the revised proposal for LONGCHAR.  
> 
> If this is acceptable, I believe Michael van Acken will implement the
> compiler changes and probably module LongStrings as well,

I have.  See companion mail.

> and I'll do the documentation.  Can I get some volunteers to work on
> the rest of the library changes?
> 
> 
> Thanks,
> Eric
> ---
> 
> In order to support the Unicode character set, OOC adds the type
> LONGCHAR and introduces the concept of long strings.  The `character
> types' are now CHAR and LONGCHAR, and the `string types' are String
> and LongString.

This definition of "String"/"LongString" is a bit problematic, because
further down it is said, e.g., that "LONG(String)" is a legal
expression.  It _is_ legal, as long as the argument is a string
constant, but it is not legal if it is a variable of type array of
CHAR.

Does anyone know if the O2 language report uses the term "string" with
a different meaning than "string constant"?  I would prefer to reserve
"string" for constants, not for ARRAY values (even if they happen to
be terminated by a 0X).  But we must find a way to distinguish the
different kinds of string constants, maybe string(CHAR) and
string(LONGCHAR), with simple "string" referring to either of them?

> I.  Language
> 
> The basic character types are as follows:
> 
>     * CHAR      the characters of the ISO-Latin-1 (i.e., ISO-8859-1)
>                 character set (0X..0FFX)
> 
>     * LONGCHAR  the characters of the Unicode character set
>                 (0X..0FFFFX)
> 
> The character type LONGCHAR includes the values of type CHAR
> according to the following hierarchy:
> 
>     LONGCHAR >= CHAR

And

      LongString >= String

That is, a string constant composed of CHAR can also be used in place
of a string constant composed of LONGCHAR.  The usual implicit type
conversion rules (as known from integer types) apply to character
values and string constants, too.
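
A minimal sketch of these conversion rules, written in Python rather
than Oberon-2 purely for illustration.  Character values are modelled
as ordinals.  The names `long_char` and `short_char` are hypothetical
stand-ins for LONG and SHORT applied to character values, as listed in
the proposal's table, and raising OverflowError is only one possible
way to model the overflow case the proposal mentions.

```python
CHAR_MAX = 0xFF        # upper bound of CHAR (0X..0FFX)
LONGCHAR_MAX = 0xFFFF  # upper bound of LONGCHAR (0X..0FFFFX)


def long_char(code: int) -> int:
    """Model of LONG(x) for x of type CHAR: identity, always safe."""
    if not 0 <= code <= CHAR_MAX:
        raise ValueError("argument is not a CHAR value")
    return code


def short_char(code: int) -> int:
    """Model of SHORT(x) for x of type LONGCHAR: projection onto CHAR.

    Values above CHAR_MAX do not fit; this sketch reports them as an
    overflow instead of returning an undefined result.
    """
    if not 0 <= code <= LONGCHAR_MAX:
        raise ValueError("argument is not a LONGCHAR value")
    if code > CHAR_MAX:
        raise OverflowError("LONGCHAR value does not fit into CHAR")
    return code
```

Widening never loses information, which is why it can happen
implicitly; narrowing has to be requested explicitly because it can
fail.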

> Character constants are denoted by the ordinal number of the
> character in hexadecimal notation followed by the letter X.  The type
> of a character constant is the minimal type to which the constant
> value belongs.  (i.e., If the constant value is in the range
> `0X..0FFX', its type is CHAR; otherwise, it is LONGCHAR).
> 
> 
> Constant strings which consist solely of characters in the range
> `0X..0FFX' and strings stored in an ARRAY OF CHAR are of type String,
> all others are of type LongString.
> 
> Constant strings can be represented using the string concatenation
> operator `+' and a combination of characters or string constants.
> For example, the following is of type LongString:
> 
>   CONST
>      aLongString = 0C0ACX + 0C6A9X + " " + 0C2E4X + 0D328X; 

Which is a string of Hangul syllables as far as I am able to determine
:-]
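
The claim can be checked with a small Python sketch (an illustration,
not OOC code).  `string_const_type` is a hypothetical helper that
applies the proposal's rule for string constants to a list of ordinals;
the character names come from Python's standard `unicodedata` module.

```python
import unicodedata

CHAR_MAX = 0xFF  # upper bound of CHAR (0X..0FFX)


def string_const_type(ordinals):
    """Type of a string constant under the proposal's rule:
    String if every character lies in 0X..0FFX, else LongString."""
    if all(0 <= o <= CHAR_MAX for o in ordinals):
        return "String"
    return "LongString"


# Ordinals of the constant aLongString from the proposal.
a_long_string = [0xC0AC, 0xC6A9, ord(" "), 0xC2E4, 0xD328]

# Unicode character names, one per element of the constant.
names = [unicodedata.name(chr(o)) for o in a_long_string]

if __name__ == "__main__":
    print(string_const_type(a_long_string))
    for name in names:
        print(name)
```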

> The following predeclared function procedures support these
> additional operations:
> 
>      Name        Argument type        Result type  Function
>      LONG(x)     CHAR                 LONGCHAR     identity 
>                  String               LongString   identity 
> 
>      LONGCHR(x)  integer type         LONGCHAR     long character with
>                                                    ordinal value x
> 
>      ORD(x)      LONGCHAR             LONGINT      ordinal value of x
> 
>      SHORT(x)    LONGCHAR             CHAR         projection 
>                  LongString           String       projection 
> 
> Please Note: 
> 
> SHORT(x), where x is of type LONGCHAR, can result in overflow, which
> triggers a compilation or run-time error.  The result of an operation
> that causes an overflow, but is not detected as such, is undefined.

Also:
       CAP(x)      CHAR                 CHAR

  Maps lower case letters from ISO-Latin-1 to their capital counterparts,
  identity for all other characters.  Exceptions: U+00DF (LATIN SMALL
  LETTER SHARP S), whose uppercase version is the two letter sequence
  "SS", and U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS), whose capital
  version is outside the ISO-Latin-1 range (it has the code U+0178).
  These two characters are also mapped onto themselves.
  [Gosh, I am being pedantic today!  This should appear in the "Language 
  Specifications" section of the manual.]

       CAP(x)      LONGCHAR             LONGCHAR

  Restricted to the range of CHAR, this function is equivalent to
  CAP(CHAR).  Outside this range it is equivalent to identity.  [I am
  adding this for symmetry reasons.]

MIN(LONGCHAR) and MAX(LONGCHAR) are also defined as expected.
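
The CAP behaviour described above can be sketched in Python (an
illustration only, not the compiler's implementation).  The function
`cap` is a hypothetical model that leans on Python's Unicode case
mapping and keeps a result only if it is a single character inside the
ISO-Latin-1 range.  One assumption goes beyond the text above: U+00B5
(MICRO SIGN), whose Unicode uppercase also lies outside ISO-Latin-1,
falls under the same fallback and maps onto itself.

```python
CHAR_MAX = 0xFF  # upper bound of CHAR (0X..0FFX)


def cap(ch: str) -> str:
    """Model of CAP for both CHAR and LONGCHAR arguments."""
    if ord(ch) > CHAR_MAX:
        # Outside the range of CHAR: identity.
        return ch
    upper = ch.upper()
    if len(upper) == 1 and ord(upper) <= CHAR_MAX:
        # A single capital letter that still fits into ISO-Latin-1.
        return upper
    # U+00DF ("SS") and U+00FF (U+0178) end up here: identity.
    return ch
```

For example, `cap("\u00e9")` yields U+00C9, while `cap("\u00df")` and
`cap("\u00ff")` return their argument unchanged.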

> The predeclared procedure COPY(x, v) also supports LongStrings.
> 
>      Name            Argument type                 Function
>      COPY(x, v)      x: character array, string    v := x
>                      v: character array
> 
> Note that, COPY(x, v) is invalid if x is of type ARRAY OF CHAR, and v
> is of type LongString or ARRAY OF LONGCHAR.
> 
> 
> String types are assignment compatible as follows:
> 
>   An expression e of type Te is assignment compatible with a variable
>   v of type Tv if one of the following conditions holds:
> 
>    1. Tv is an array of LONGCHAR, Te is LongString or String, and 
>       LEN(e) < LEN(v); 
>    2. Tv is an array of CHAR, Te is String, and LEN(e) < LEN(v);
> 
> 
> String types are array compatible as follows:
> 
>   An actual parameter a of type Ta is array compatible with a formal
>   parameter f of type Tf if
> 
>    1. Tf is an open array of LONGCHAR and Ta is LongString, or 
>    2. Tf is an open array of CHAR and Ta is String. 

Based on a (not up to date) copy of the language report, the rule (3)
from "Array compatible" is probably better rephrased as

  An actual parameter a of type Ta is array compatible with a formal
  parameter f of type Tf if

    3a. f is a value parameter of type ARRAY OF CHAR and a is a String,
        or
    3b. f is a value parameter of type ARRAY OF LONGCHAR and a is a
        String or LongString.
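
As a sketch, rules 3a and 3b can be written as a small Python predicate
(a model for illustration, not part of any compiler).  The function
name and the use of plain strings to name the types are assumptions of
this example.

```python
def array_compatible(formal_elem: str, actual: str) -> bool:
    """Check rules 3a/3b for a value parameter f of open array type.

    formal_elem: element type of f, "CHAR" or "LONGCHAR"
    actual:      kind of the actual parameter a, "String" or "LongString"
    """
    if formal_elem == "CHAR":
        # 3a: only a string constant composed of CHAR is accepted.
        return actual == "String"
    if formal_elem == "LONGCHAR":
        # 3b: a String is accepted as well, by implicit conversion.
        return actual in ("String", "LongString")
    return False
```

The only rejected combination of the four is a LongString passed to an
ARRAY OF CHAR value parameter.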

> Character and string types are expression compatible as follows:
> 
>        Operator        First operand   Second operand  Result type
>        = # < <= > >=   character type  character type  BOOLEAN
>                        string type     string type     BOOLEAN

Just to make this clear: implicit type conversion rules apply to both
character values and string constants.  They do _not_ apply to
character arrays.

> II.  Library
> 
> [...]
> 	CSTRING would also need to be changed to deal with
> 	LongStrings.  (Or is it that we need to add CWIDESTRING?)

This is just a type flag governing assignment rules of array variables.
So sticking to CSTRING is sufficient IMO.

-- mva