Re: LONGCHAR proposal
> From: "Eric W. Nikitin" <enikitin@apk.net>
> Date: Wed, 3 Mar 1999 15:48:31 -0500 (EST)
>
> Michael van Acken wrote:
> > It will make CAP slower, but I think we can live
> > with the slowdown. Now, does anything have a mapping from lower to
> > upper case characters for the upper half of the iso-latin-1 character
> > set? Otherwise this discussion is quite pointless.
>
> These are available as part of the data provided by the Unicode
> Consortium (www.unicode.org). These are straightforward, unlike some
> of the other Unicode case mappings, which may or may not map to
> single characters, and may or may not be locale specific.
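A minimal sketch of such a CAP for the full 8-bit range, written in Python purely for illustration (it is not Oberon-2 and not part of any existing module). The name `cap_latin1' is hypothetical, and the sketch assumes that Python's built-in `str.upper()', which is driven by the Unicode case-mapping data, is an acceptable stand-in for a hand-built table. Characters whose upper-case form is not a single ISO-Latin-1 character, such as 0xDF (sharp s) and 0xFF (y with diaeresis), are left unchanged.

```python
def cap_latin1(ch: str) -> str:
    """Upper-case one ISO-Latin-1 character, staying within Latin-1.

    Hypothetical helper: relies on Python's Unicode-based str.upper().
    """
    if len(ch) != 1 or ord(ch) > 0xFF:
        raise ValueError("expected a single ISO-Latin-1 character")
    upper = ch.upper()
    # Keep the character unchanged when its upper-case form is either
    # several characters long or lies outside the 8-bit range.
    if len(upper) == 1 and ord(upper) <= 0xFF:
        return upper
    return ch


if __name__ == "__main__":
    # A few upper-half characters: e-acute, sharp s, y-diaeresis, micro sign.
    for ch in "\xe9\xdf\xff\xb5":
        full = [hex(ord(u)) for u in ch.upper()]
        print(hex(ord(ch)), "Unicode upper:", full,
              "CAP:", hex(ord(cap_latin1(ch))))
```

The range check after the lookup is the extra cost compared with an ASCII-only CAP; a precomputed 256-entry table would avoid repeating it on every call.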
>
>
> > I don't think LONGCHAR should appear in the module interface of
> > `TextRider'. It is much easier to duplicate its functionality in a
> > separate module that deals exclusively with LONGCHAR, instead of
> > kludging it into `TextRider'.
> >
> > And without LONGCHAR in `TextRider' it is impossible to have a common
> > "UnicodeMapper" (or whatever) super class that includes and string
> > functions. A question: Could we derive `KSCRider' from a concrete
> > class `UnicodeRider'? The answer would be "yes" if we define
> > `KSCRider' to just map from KSC encoding to Unicode and vice versa.
> > Here "encoding" means both the encoding of n bit values in byte
> > streams, and translation of character codes between the two standards.
>
> I think the answer is yes. That's pretty much how Java handles
> non-Unicode encodings.
>
> As a first attempt at a class hierarchy, consider the following
> (please feel free to suggest different names, along with other
> suggestions):
>
>
>                    Rider [ABSTRACT]
>                    /            \
>                   /              \
>                  /                \
>          TextRider              LongRider [ABSTRACT]
>             /\                      |
>            /  \                     |
>           /    \                    |
>  Cp037Rider   (other           UnicodeRider
>                8-bit               /\
>              encodings)           /  \
>                                  /    \
>                          KSCRider    (other 16-bit encodings)
>
>
> (Cp037 is US EBCDIC, I think.)
>
> `Rider' is the abstract class, which defines the Reader, Writer, and
> Scanner interfaces as currently implemented in `TextRider'.
>
> `LongRider' adds LONGCHAR and LongString support to these. (The
> parent-child relationship between Rider and LongRider probably isn't
> necessary. Any arguments for or against having this relationship?)
It is a little bit of an overkill, but it has the advantage of being
the cleaner design. With an abstract `Rider' class we can define In,
Out, and Err in terms of this abstract class. The obvious extension
to this is to base `Long(In|Out|Err)' on `LongRider'. But I don't
think we need the `Long(In|Out|Err)' modules yet.
Let's say the input/output of TextRider is ISO-Latin-1, and likewise
the I/O of UnicodeRider is (guess what) Unicode. Then one could still
add I/O modules without these requirements in their specifications
under `Rider' or `LongRider', respectively. Assuming, of course, that
these modules do not enforce a specific encoding. This whole argument
is a weeny bit academic, though :-/
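
To make the proposed hierarchy concrete, here is a rough sketch in Python (chosen only for illustration; the real thing would be Oberon-2 modules). Everything beyond the class names from the diagram is an assumption: the single `read_string' method, the `encoding' attribute, and the use of Python's built-in codecs `latin-1', `cp037', `utf-16-be', and `euc_kr' as stand-ins for the byte-level encodings. The abstract classes fix no encoding, so other I/O modules could still be hung directly under `Rider' or `LongRider'.

```python
import io
from abc import ABC, abstractmethod


class Rider(ABC):
    """Abstract base: wraps a byte stream, fixes no encoding."""

    def __init__(self, stream):
        self.stream = stream

    @abstractmethod
    def read_string(self) -> str:
        """Read the rest of the stream as a string."""


class TextRider(Rider):
    """8-bit characters; I/O is ISO-Latin-1."""
    encoding = "latin-1"

    def read_string(self) -> str:
        return self.stream.read().decode(self.encoding)


class Cp037Rider(TextRider):
    """Maps the Cp037 (EBCDIC) byte encoding to 8-bit characters."""
    encoding = "cp037"


class LongRider(Rider):
    """Abstract: adds LONGCHAR support, still without an encoding."""


class UnicodeRider(LongRider):
    """Concrete LONGCHAR rider; assumes big-endian 16-bit units."""
    encoding = "utf-16-be"

    def read_string(self) -> str:
        return self.stream.read().decode(self.encoding)


class KSCRider(UnicodeRider):
    """Only translates between the KSC encoding and Unicode."""
    encoding = "euc_kr"


if __name__ == "__main__":
    text = "\ud55c\uae00"  # two Hangul syllables
    rider = KSCRider(io.BytesIO(text.encode("euc_kr")))
    print(rider.read_string() == text)
    print(Cp037Rider(io.BytesIO(b"\xc1")).read_string())
```

In this sketch `KSCRider' overrides nothing but the encoding, which matches the idea that it just maps from KSC to Unicode and back. In, Out, and Err could then be declared against `Rider' alone.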
-- mva