Re: LONGCHAR proposal



> > > >      SHORT(x)    LONGCHAR             CHAR         projection 
> > > 
> > > This will lose the most significant part of the 2-byte Unicode
> > > character.
> > 
> > True.  But are you saying that this simply needs to be noted in the
> > proposal?  Or that SHORT shouldn't be defined to operate on LONGCHAR
> > and LongStrings?  
> 
> There are two ways to deal with this loss of information.
> 
> 1) Truncation.  Disadvantage: The character mapping to latin-1 is
>    quite arbitrary.
> 
> 2) Mapping of 0100X..FFFFX onto a single character, e.g. "?".
>    Advantage: More deterministic, and the shortened string can be
>    readable if the original Unicode text uses mostly latin-1
>    characters, e.g. if it is an English text with a few special
>    characters like quotes, hyphens, and the like.  The effect would be
>    pretty much like viewing an HTML document produced by an MS product
>    with a non-Windows browser.
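
For illustration, the two options quoted above might look roughly like
this.  It is only a sketch in Python rather than Oberon-2; the function
names short_truncate and short_replace are made up for this example
and are not part of the proposal.

```python
def short_truncate(s: str) -> str:
    """Option 1: keep only the low byte of each code point."""
    return "".join(chr(ord(c) & 0xFF) for c in s)


def short_replace(s: str, fallback: str = "?") -> str:
    """Option 2: map every code point above 00FFX to one fallback character."""
    return "".join(c if ord(c) <= 0xFF else fallback for c in s)


# Mostly latin-1 text with curly quotes (201CX, 201DX) around it.
text = "\u201cna\u00efve\u201d"

truncated = short_truncate(text)  # quotes become control characters 1CX and 1DX
replaced = short_replace(text)    # quotes become "?", the rest stays readable
```

With truncation the curly quotes turn into unrelated control characters,
while the replacement variant keeps the latin-1 part of the text intact.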

Silly me.  SHORT(LONGCHAR) is integer arithmetic, and integer
arithmetic can overflow.  If it overflows, it will trigger a
compile-time or run-time error.  The result of an operation that
overflows without the overflow being detected is undefined.  This, of
course, is The Right Way(tm) to deal with SHORT(LONGCHAR).
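
As a rough sketch of the checked behaviour (in Python, not Oberon-2; the
name short_checked and the choice of OverflowError are assumptions made
for this example, not anything the proposal defines):

```python
def short_checked(ch: str) -> str:
    """Narrow one character to the CHAR range, failing loudly on overflow."""
    code = ord(ch)
    if code > 0xFF:
        # Stands in for the run-time error of an overflow check.
        raise OverflowError(f"SHORT: {code:04X}X does not fit in CHAR")
    return chr(code)


def try_short(ch: str) -> str:
    """Helper for the example: report the outcome instead of propagating the error."""
    try:
        return short_checked(ch)
    except OverflowError:
        return "overflow"
```

Here try_short("\u00e9") returns the character unchanged, while
try_short("\u20ac") reports an overflow instead of silently producing
some other latin-1 character.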

-- mva