
Re: LONGCHAR proposal



> From: "Eric W. Nikitin" <enikitin@apk.net>
> Date: Wed, 3 Mar 1999 15:48:31 -0500 (EST)
>
> Michael van Acken wrote:
> > It will make CAP slower, but I think we can live
> > with the slowdown.  Now, does anything have a mapping from lower to
> > upper case characters for the upper half of the iso-latin-1 character
> > set?  Otherwise this discussion is quite pointless.
>
> These are available as part of the data provided by the Unicode
> Consortium (www.unicode.org).  These are straightforward, unlike some
> of the other Unicode case mappings, which may or may not map to
> single characters, and may or may not be locale specific.
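A minimal sketch of how a CAP table for the upper half of ISO-Latin-1 could be derived from that data. It assumes the JDK's Character.toUpperCase, which follows the Unicode simple case mappings, as a stand-in for reading the Unicode Consortium files directly. The class Latin1Cap and its method cap are hypothetical names.

```java
// Hypothetical sketch: a 256-entry CAP table for ISO-Latin-1, built from
// Java's Character.toUpperCase (Unicode simple case mappings).
final class Latin1Cap {
    // CAP[c] is the upper-case form of c, or c itself when c has no
    // upper-case form or that form lies outside Latin-1.
    static final char[] CAP = new char[256];

    static {
        for (int c = 0; c < 256; c++) {
            int u = Character.toUpperCase(c);
            CAP[c] = (u <= 0xFF) ? (char) u : (char) c;
        }
    }

    // Characters beyond Latin-1 are returned unchanged.
    static char cap(char c) {
        return c <= 0xFF ? CAP[c] : c;
    }
}
```

Three Latin-1 lower-case characters have no upper-case partner inside Latin-1: U+00B5 (micro sign) maps to U+039C, U+00FF maps to U+0178, and U+00DF has no single-character upper-case form. The sketch leaves all three unchanged; everything else in 0xE0..0xFE (except 0xF7, the division sign) maps 0x20 lower.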
>
>
> > I don't think LONGCHAR should appear in the module interface of
> > `TextRider'.  It is much easier to duplicate its functionality in a
> > separate module that deals exclusively with LONGCHAR, instead of
> > kludging it into `TextRider'.
> >
> > And without LONGCHAR in `TextRider' it is impossible to have a common
> > "UnicodeMapper" (or whatever) super class that includes character and string
> > functions.  A question: Could we derive `KSCRider' from a concrete
> > class `UnicodeRider'?  The answer would be "yes" if we define
> > `KSCRider' to just map from KSC encoding to Unicode and vice versa.
> > Here "encoding" means both the encoding of n-bit values in byte
> > streams, and translation of character codes between the two standards.
>
> I think the answer is yes.  That's pretty much how Java handles
> non-Unicode encodings.
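For illustration, a minimal sketch of the Java approach: a Reader built on a Charset turns bytes in some external encoding into Unicode chars, so the program never sees the external codes. Below, the same text stored as ISO-Latin-1 and as UTF-16BE decodes to the same String. A KSC rider in this style would only pass a different Charset (for example "EUC-KR", which many JDKs provide but which is not among the charsets every Java platform is required to support). DecodeDemo and readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class DecodeDemo {
    // Decode a complete byte sequence with the given charset. Whatever the
    // external encoding, the result is a sequence of Unicode chars.
    static String readAll(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but Reader.read declares it.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```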
>
> As a first attempt at a class hierarchy, consider the following
> (please feel free to suggest different names, along with other
> suggestions):
>
>
>                 Rider [ABSTRACT]
>                 /    \
>               /        \
>             /            \
>        TextRider        LongRider [ABSTRACT]
>           /\                 |
>         /    \               |
>       /        \             |
>   Cp037Rider (other       UnicodeRider
>               8-bit           /\
>               encodings)    /    \
>                           /        \
>                       KSCRider    (other 16-bit encodings)
>
>
> (Cp037 is US EBCDIC, I think.)
>
> `Rider' is the abstract class, which defines the Reader, Writer, and
> Scanner interfaces as currently implemented in `TextRider'.
>
> `LongRider' adds LONGCHAR and LongString support to these.  (The
> parent-child relationship between Rider and LongRider probably isn't
> necessary.  Any arguments for or against having this relationship?)
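To make the Rider/LongRider question concrete, a minimal Java rendering of the diagram, writing side only. The class names come from the diagram; the methods writeChar, writeLongChar, encode and bytes are hypothetical. It assumes CHAR is an 8-bit Latin-1 code and LONGCHAR a 16-bit Unicode code, so LongRider can implement the inherited writeChar by widening (Latin-1 is the first 256 code points of Unicode).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: only the class names are taken from the proposal.
abstract class Rider {
    // Shared 8-bit interface (CHAR: codes 0..255).
    abstract void writeChar(int ch);
    abstract byte[] bytes();
}

abstract class LongRider extends Rider {
    // Adds LONGCHAR support (codes 0..0xFFFF).
    abstract void writeLongChar(int ch);

    // A CHAR is written as the LONGCHAR with the same code.
    @Override
    void writeChar(int ch) {
        writeLongChar(ch & 0xFF);
    }
}

final class TextRider extends Rider {
    // ISO-Latin-1 output: one byte per character.
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    @Override
    void writeChar(int ch) {
        out.write(ch & 0xFF);
    }

    @Override
    byte[] bytes() {
        return out.toByteArray();
    }
}

class UnicodeRider extends LongRider {
    // 16-bit Unicode output, big-endian. A KSCRider subclass would
    // override encode() to translate codes and emit the KSC byte form.
    protected final ByteArrayOutputStream out = new ByteArrayOutputStream();

    protected void encode(int ch) {
        out.write((ch >> 8) & 0xFF);
        out.write(ch & 0xFF);
    }

    @Override
    void writeLongChar(int ch) {
        encode(ch & 0xFFFF);
    }

    @Override
    byte[] bytes() {
        return out.toByteArray();
    }
}
```

With LongRider extending Rider, any code written against Rider also accepts a UnicodeRider, which is the practical argument for keeping the relationship.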

It is a bit of overkill, but it has the advantage of being
the cleaner design.  With an abstract `Rider' class we can define In,
Out, and Err in terms of this abstract class.  The obvious extension
to this is to base `Long(In|Out|Err)' on `LongRider'.  But I don't
think we need the `Long(In|Out|Err)' modules yet.
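A minimal sketch of defining Out in terms of the abstract class: the module holds a writer typed as the abstract Rider, so any concrete rider can be installed. Only Out itself is shown; In and Err would follow the same pattern. The names setWriter, string, ln and BufferRider are hypothetical (string and ln loosely mirror Oberon's Out.String and Out.Ln).

```java
// Hypothetical sketch: Out depends only on the abstract Rider.
abstract class Rider {
    abstract void writeChar(int ch);
}

// A trivial concrete rider that collects output in memory. It keeps only
// the low byte of each code, i.e. it treats characters as Latin-1.
final class BufferRider extends Rider {
    final StringBuilder buf = new StringBuilder();

    @Override
    void writeChar(int ch) {
        buf.append((char) (ch & 0xFF));
    }
}

final class Out {
    private static Rider writer;

    private Out() {}

    // Install the rider that all output goes through.
    static void setWriter(Rider r) {
        writer = r;
    }

    static void string(String s) {
        for (int i = 0; i < s.length(); i++) {
            writer.writeChar(s.charAt(i));
        }
    }

    static void ln() {
        writer.writeChar('\n');
    }
}
```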

Let's say the input/output of TextRider is ISO-Latin-1, and likewise
the I/O of UnicodeRider is (guess what) Unicode.  Then one could still
add I/O modules without these requirements in their specifications
under `Rider' or `LongRider', respectively.  Assuming, of course, that these
modules do not enforce a specific encoding.  This whole argumentation
is a weeny bit academic, though :-/
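One consequence of fixing TextRider's I/O to ISO-Latin-1: a LONGCHAR string can go through it only when every character is at most U+00FF. A minimal sketch of that check using the JDK's Latin-1 encoder; Latin1Check and fitsLatin1 are hypothetical names.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class Latin1Check {
    // True if every character of s can be written by a rider whose
    // external encoding is ISO-Latin-1 (all code points <= U+00FF).
    static boolean fitsLatin1(CharSequence s) {
        // A fresh encoder per call, since canEncode must not run while
        // another encoding operation is in progress on the same encoder.
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        return enc.canEncode(s);
    }
}
```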

-- mva