
Re: 64-bit extensions



So many messages, so little time! :) I've put different messages
together, so forgive me if I mis-attribute a quote...

Quotes are from Eric Nikitin (EN -- i.e., me), Mike Griebling (MG),
and Tim Teulings (TT).

EN > Bytes are computer specific entities; *not* a data abstraction like
EN > the other basic types.
>
MG > Actually bytes are no more computer specific than SHORTINTs.  The true
MG > mathematical abstraction would be an unbounded INTEGER.
>
MG > From that point of view, a BYTE as an abstraction of the machine
MG > makes sense as do SHORTINT, INTEGER, and LONGINT.

Using typical (not necessarily computer specific) definitions (from
WWWebster online dictionary):

byte: a group of eight binary digits processed as a unit by a computer
and used especially to represent an alphanumeric character.

integer: any of the natural numbers, the negatives of these numbers,
or zero.

A byte is not a data abstraction; it is a unit of data representation.
A byte can be used to represent whatever you want it to be -- it *can*
be used to represent integer values, but it is not necessarily, or
even typically, required to.  Therefore, using BYTE to describe an
integer type is a mistake.

SHORTINT, INTEGER, and LONGINT are data abstractions.  This doesn't
mean they are required to strictly conform to a "true" mathematical
definition of an infinite set of values.  It does mean they express
*some* of the essential characteristics of that mathematical
definition, which they do even though they have restricted ranges.

EN > Again I'm confused, what language do you want it to be?  Java with
EN > Oberon-like syntax?  Component Pascal?  Or ???
>
MG > I want Oberon-3.  Sort of like Component Pascal except with the
MG > extensions that I like.  :-)

I think we all want that, but...  There was a quote, which I don't
have an exact reference to, associated with the Modula-2 ISO
standardization process that went something like "We all agreed that
if we added just one more feature, Modula-2 would be the perfect
language, the problem was that no one could agree what that one
feature was."

EN > In the current state of things, Java (the actual language) doesn't
[snip some stuff on the Java Core API]
>
MG > Well, it sounds like you're intimately familiar with this stuff,
MG > do you have any interface specifications?

The Java IO package is a mess -- it contains around 45 classes.  Most
of them have very specific, and limited, uses.  I haven't thought
about this very much, but basically, I think we would only have to
modify BinaryRider slightly, adding ReadLChar, ReadLString, and the
corresponding write methods, and add a UniTextRider module.  It
depends on whether you want to add translations to and from all the
various international character encoding standards.  Which brings me
to...

MG > I just hope we won't need UTF support
MG > in the future so we'll have to invent another entity.  Since a type
MG > inclusion thingie would have   CHAR >= LONGCHAR(UNICHAR) >= UTFCHAR
MG > we may have to introduce a HUGECHAR to support UTF.
>
TT > However it seems like there is Unicode
TT > (16 bit), UCS (31bit), utf-8 (pack in 8bits portions) etc... One has to
TT > look which is most common on the difernets OSs and which is covered by
TT > f.e. posix, C language definition to do the implementation of UNICHAR,
TT > HUGECHAR or whatever its name will be right from the beginning.
>
MG > The UTF format is actually a variable-width coding with from 8- to
MG > 24-bit characters.  The BeOS uses it.  Unicode seems to be more
MG > prevalent but doesn't have the range or the encoding efficiency of UTF.

Because I'm not sure that everyone understands what all of these
things are, let me try to clarify.

Unicode defines only character encoding; it does *not* require a
specific data representation.  But, because of the total number of
characters it supports, it does require a minimum of 16 bits to
represent all of these characters.

This is similar to ASCII encoding.  ASCII requires a minimum of 7 bits
to represent its 128 values.  But I could have 7-bit ASCII, or 8-bit
ASCII, or 9-bit ASCII...

UTF, UCS, and the like are data formats that are typically used to
represent Unicode.  UTF is a variable length format, whereas UCS is a
fixed length format.  There are also varieties of these: UTF8 uses a
variable number of 8-bit chunks to represent character values --
Unicode values in the ASCII range use only 8-bits, and other Unicode
values use 16, 24, or 32 bits.  UCS2 is a 2-byte (8 bits per byte)
fixed length format: *all* character values occupy 2 bytes -- even
those in the ASCII range.

UTF8 is good if your data is primarily in the ASCII range, but isn't
so good in other cases.  For instance, Korean Hangul characters need 3
bytes in UTF8 rather than 2 bytes in UCS2.
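
To make the size trade-off concrete, here is a small sketch in Python
(not OOC code; the helper name encoded_sizes is made up for this
example).  It assumes that Python's "utf-16-be" codec produces the same
bytes as UCS2 for the characters shown, which holds because all of them
fit in 16 bits.

```python
def encoded_sizes(ch):
    """Return (UTF8 byte count, UCS2 byte count) for one character.

    "utf-16-be" stands in for UCS2 here; the two agree for any
    character whose code fits in 16 bits.
    """
    utf8_len = len(ch.encode("utf-8"))
    ucs2_len = len(ch.encode("utf-16-be"))
    return utf8_len, ucs2_len


samples = [
    ("A", "ASCII letter"),
    ("\u00e9", "Latin small e with acute"),
    ("\ud55c", "Korean Hangul syllable HAN"),
]

for ch, description in samples:
    utf8_len, ucs2_len = encoded_sizes(ch)
    print(description, "UTF8:", utf8_len, "UCS2:", ucs2_len)
```

The ASCII letter takes 1 byte in UTF8 and 2 in UCS2, while the Hangul
syllable takes 3 bytes in UTF8 and 2 in UCS2.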

Java uses only UCS2 internally, but through its I/O and sql packages,
it can translate to UTF8.  (Java also provides translations between
other international character standards -- for example, KSC-5601 is a
Korean standard encoding.)

Following Java's lead, OOC wouldn't have to supply a built-in UTFCHAR
type, but could supply translations in library modules.  
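
As a rough illustration of "translation in a library, not in the
language", here is a Python sketch.  The function transcode is
hypothetical, and Python's "euc_kr" codec is used as a stand-in for
KSC-5601; none of this is an actual OOC or Java interface.

```python
def transcode(data, source, target):
    """Re-encode bytes from one character encoding to another.

    The text passes through a single internal character type (Python's
    str), so no per-encoding character type is needed.
    """
    return data.decode(source).encode(target)


hangul = "\ud55c\uae00"            # the word "Hangul" in Hangul
ksc_bytes = hangul.encode("euc_kr")
utf8_bytes = transcode(ksc_bytes, "euc_kr", "utf-8")

print(len(ksc_bytes), len(utf8_bytes))
```

The internal string type stays the same throughout; only the bytes at
the I/O boundary change.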

MG > Of course all these fancy encodings are useless without some fonts
MG > which actually use these encodings.  I am ignorant of font technologies.
MG > Do standard Postscript fonts or ATM fonts support these encodings?

There are Unicode fonts, which can handle international characters.
There are also language specific fonts, for instance, Gulim is a
Korean font.  

Java doesn't guarantee that fonts are available.  The assumption seems
to be that, if you're interested in displaying international
characters, then you'll have those fonts available.

(In reference to Oberon/F -- BBCP w/CP):
MG > Is there a reason you stopped using it? 

Mainly because it's not Oberon-2; it's now a proprietary language.
For the past year and a half, I've been working on an introductory
programming book using Oberon-2.  I wasn't going to change my book to
use CP, and XDS has a perfectly fine Oberon-2 compiler that runs on
Windows.  I'm not writing large applications, nor am I doing any GUI
stuff using Oberon-2 (yet), so BBCP w/CP has no advantage for me.  OOC
and XDS serve my needs nicely right now.

MG > I wasn't aware that backwards compatibility was a requirement for
MG > the name Oberon-3.

Technically, it's not.  However, the name implies it, and many people
require backwards compatibility before they'd switch.  Also, as I said
last time, there *are* other Oberon variants, which may have
contributions to an official Oberon-3.

---
Eric