Re: LONGCHAR proposal




Michael van Acken wrote:
> It will make CAP slower, but I think we can live
> with the slowdown.  Now, does anything have a mapping from lower to
> upper case characters for the upper half of the iso-latin-1 character
> set?  Otherwise this discussion is quite pointless.

These are available as part of the data provided by the Unicode
Consortium (www.unicode.org).  These are straightforward, unlike some
of the other Unicode case mappings, which may or may not map to
single characters, and may or may not be locale specific.
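To make this concrete, here is a rough sketch of a CAP-like function for
the full ISO Latin-1 range. It is written in Python rather than Oberon-2
purely for illustration, and the name `cap` and its behaviour are
assumptions, not the library's actual CAP. Python's `str.upper` uses the
Unicode case mappings bundled with the interpreter. The few Latin-1
letters whose upper-case form falls outside Latin-1 or is not a single
character (for example `ÿ`, `µ`, and `ß`) are returned unchanged here,
which is one possible policy among several.

```python
def cap(ch: str) -> str:
    """Upper-case one ISO Latin-1 character, staying inside Latin-1.

    Hypothetical helper: characters with no single-character Latin-1
    upper-case form are returned unchanged.
    """
    if len(ch) != 1 or ord(ch) > 0xFF:
        raise ValueError("expected a single ISO Latin-1 character")
    upper = ch.upper()  # Unicode case mapping shipped with Python
    # Keep the mapping only if it is one character and still 8-bit.
    if len(upper) == 1 and ord(upper) <= 0xFF:
        return upper
    return ch


if __name__ == "__main__":
    for c in "aé÷ßÿ":
        print(repr(c), "->", repr(cap(c)))
```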


> I don't think LONGCHAR should appear in the module interface of
> `TextRider'.  It is much easier to duplicate its functionality in a
> separate module that deals exclusively with LONGCHAR, instead of
> kludging it into `TextRider'.
> 
> And without LONGCHAR in `TextRider' it is impossible to have a common
> "UnicodeMapper" (or whatever) super class that includes any string
> functions.  A question: Could we derive `KSCRider' from a concrete
> class `UnicodeRider'?  The answer would be "yes" if we define
> `KSCRider' to just map from KSC encoding to Unicode and vice versa.
> Here "encoding" means both the encoding of n bit values in byte
> streams, and translation of character codes between the two standards.

I think the answer is yes.  That's pretty much how Java handles
non-Unicode encodings.
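
A minimal sketch of that idea follows, again in Python only for
illustration. The class and method names are hypothetical, the choice of
UTF-16 (big-endian) as the default external form is an assumption, and
Python's `euc_kr` codec stands in for the KSC encoding. The point is
only that the derived class changes the byte-level mapping and nothing
else; clients always see Unicode characters.

```python
import io


class UnicodeRider:
    """Hypothetical reader that delivers Unicode characters from bytes."""

    encoding = "utf-16-be"  # assumed external form for the base class

    def __init__(self, stream):
        # Decode everything up front; a real rider would decode lazily.
        self._text = stream.read().decode(self.encoding)
        self._pos = 0

    def read_char(self) -> str:
        """Return the next Unicode character, or '' at end of input."""
        if self._pos >= len(self._text):
            return ""
        ch = self._text[self._pos]
        self._pos += 1
        return ch


class KSCRider(UnicodeRider):
    """Same interface; only the mapping between bytes and Unicode differs."""

    encoding = "euc_kr"
```

A writer would mirror this, encoding Unicode characters back into the
external byte form on output.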

As a first attempt at a class hierarchy, consider the following
(please feel free to suggest different names, along with other
suggestions):

  
                Rider [ABSTRACT]
                /    \
              /        \
            /            \
       TextRider        LongRider [ABSTRACT]
          /\                 |
        /    \               |
      /        \             |
  Cp037Rider (other       UnicodeRider
              8-bit           /\
              encodings)    /    \
                          /        \
                      KSCRider    (other 16-bit encodings)


(Cp037 is US EBCDIC, I think.)

`Rider' is the abstract class, which defines the Reader, Writer, and
Scanner interfaces as currently implemented in `TextRider'.

`LongRider' adds LONGCHAR and LongString support to these.  (The
parent-child relationship between Rider and LongRider probably isn't
necessary.  Any arguments for or against having this relationship?)
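
One way to look at the question is to sketch what the relationship would
buy. The following is a rough Python illustration, not a proposed
interface: every name and signature is hypothetical, only the reading
side is shown, and narrowing a non-8-bit character to "?" is an
arbitrary assumption. With LongRider derived from Rider, code written
against the 8-bit interface also accepts a 16-bit rider; without the
relationship, `count_chars` below would need a second version.

```python
from abc import ABC, abstractmethod


class Rider(ABC):
    """Abstract 8-bit interface (hypothetical, reading side only)."""

    @abstractmethod
    def read_char(self) -> str:
        """Return the next 8-bit character, or '' at end of input."""


class LongRider(Rider):
    """Adds 16-bit (LONGCHAR-like) access on top of Rider."""

    @abstractmethod
    def read_long_char(self) -> str:
        """Return the next 16-bit character, or '' at end of input."""


class TextRider(Rider):
    def __init__(self, data: bytes):
        self._chars = data.decode("latin-1")
        self._pos = 0

    def read_char(self) -> str:
        ch = self._chars[self._pos:self._pos + 1]
        self._pos += len(ch)
        return ch


class UnicodeRider(LongRider):
    def __init__(self, text: str):
        self._chars = text
        self._pos = 0

    def read_long_char(self) -> str:
        ch = self._chars[self._pos:self._pos + 1]
        self._pos += len(ch)
        return ch

    def read_char(self) -> str:
        # Assumption: characters that do not fit in 8 bits become "?".
        ch = self.read_long_char()
        return ch if ch == "" or ord(ch) <= 0xFF else "?"


def count_chars(rider: Rider) -> int:
    """Client code that only knows the 8-bit Rider interface."""
    n = 0
    while rider.read_char() != "":
        n += 1
    return n
```

The cost is visible in `UnicodeRider.read_char`: every long rider must
decide what the 8-bit operations mean for characters that do not fit.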



Eric