
Re: 64-bit extensions



Tim wrote:

> Hello!
> 
> > One thing I like about this definition is that the BYTE comes
> > out of hiding from the SYSTEM and is part of language definition.
> 
> BYTE stands for programming close to the hardware and signals something
> like "possibly not portable". If you want a small number, use SHORTINT
> instead, or design something like INT8. BYTE belongs to SYSTEM, and
> possibly INT8 too, since it is not portable in the sense of the language
> definition.

A byte is possibly _the_ most portable construct around.  Every micro
and computer that I know of has a byte data type.  So putting it into
SYSTEM really has no basis in reality -- it's just a whim of the language
designer, since no operations were defined on a byte (well, they were on
SHORTINTs, which is essentially what BYTE would become).
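To make the point concrete, here is a minimal sketch, assuming the usual
oo2c meaning of SYSTEM.BYTE as opaque 8-bit storage:

  MODULE ByteDemo;
  IMPORT SYSTEM;
  VAR
    raw: ARRAY 4 OF SYSTEM.BYTE;  (* opaque storage; no operations defined *)
    buf: ARRAY 4 OF SHORTINT;     (* the same 8-bit cells, but a language type *)
    i: INTEGER;
  BEGIN
    FOR i := 0 TO 3 DO
      buf[i] := SHORT(i)          (* arithmetic and assignment work on SHORTINT *)
    END;
    (* only via SYSTEM can the opaque bytes be touched at all *)
    SYSTEM.MOVE(SYSTEM.ADR(buf), SYSTEM.ADR(raw), 4)
  END ByteDemo.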

> > Almost everyone is also talking about using a Unicode character
> > set which are basically 16-bit characters.  I noticed the Component
> > Pascal has defined the following:
> > 
> >   SHORTCHAR = 0X - 0FFX    (perhaps in SYSTEM)
> >   CHAR      = 0X - 0FFFFX
> 
> > What are people's thoughts about extending the CHAR type as
> > well?  Personally, this is not a big deal for me, but our
> > Asian colleagues might definitely be interested in something
> > like this.
> 
> I don't like this idea. It wastes memory. You have to use an
> unconventional SHORTCHAR to interface with the OS, since at least all
> OSes I know of use 8-bit characters for nearly everything. Linux, for
> example, has only a few conversion functions, if I remember correctly.
> There is no support for Unicode when using filenames, etc. This results
> in conversions in many places, too, and a very clumsy interface for
> file handling.
> However, something like UNICHAR or LONGCHAR is nice with a module like
> LongStrings and a conversion module.

There are some advantages to having this support in the language:

1) Automatic support for UNICODE strings and conversion from standard
   constant strings to UNICODE strings.
2) Standard comparison and other operators would work on UNICHAR strings
   just as they do on ordinary strings (see the sketch below).
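For instance, with a hypothetical UNICHAR type (the automatic widening
of constant strings here is assumed, not current OOC behavior):

  VAR s, t: ARRAY 32 OF UNICHAR;  (* hypothetical 16-bit character type *)
    less: BOOLEAN;
  BEGIN
    s := "Hallo";                 (* 8-bit constant string widened automatically *)
    t := "Welt";
    less := s < t                 (* comparison built in, as for CHAR arrays *)
  END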

If, as you say, a separate module defines this new data type, then we
will start to have a proliferation of modules that deal with strings
over all sorts of different character bases.  There is also the UTF-8
format, which represents a character in one to three bytes.  The main
reason for selecting the UNICODE character set is that it is simpler
to manipulate, since character widths would be fixed.
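To see why fixed width is simpler, consider what it takes just to find
where one character ends in a UTF-8 string (a sketch; the procedure name
is mine):

  PROCEDURE CharLen (first: CHAR): INTEGER;
  (* Bytes occupied by the UTF-8 character starting with `first'. *)
  BEGIN
    IF ORD(first) < 80H THEN RETURN 1        (* 0xxxxxxx: plain ASCII *)
    ELSIF ORD(first) >= 0E0H THEN RETURN 3   (* 1110xxxx: 3-byte form *)
    ELSIF ORD(first) >= 0C0H THEN RETURN 2   (* 110xxxxx: 2-byte form *)
    ELSE RETURN 0                            (* continuation byte, not a start *)
    END
  END CharLen;

With a fixed-width UNICHAR, s[i] simply is the i-th character; with
UTF-8 you have to scan from the start of the string to find it.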

As far as interfacing to existing OSes goes, that would still be
possible, with the final conversion consisting of a simple SHORT() on
the UNICODE string (sketched below).  Eventually, OSes will start using
Unicode and the interface will be a non-issue.
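For now, the narrowing might look like this, assuming SHORT() were
extended to take UNICHAR to CHAR (the procedure is my sketch, not an
existing API):

  PROCEDURE ToOS (VAR in: ARRAY OF UNICHAR; VAR out: ARRAY OF CHAR);
    VAR i: INTEGER;
  BEGIN
    i := 0;
    WHILE (in[i] # 0X) & (i < LEN(out)-1) DO
      out[i] := SHORT(in[i]);  (* drop the high byte; fine for Latin-1 data *)
      INC(i)
    END;
    out[i] := 0X
  END ToOS;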

I agree, if you're dealing with a memory-limited computer, such as an
old-generation Intel processor, then memory is an issue.  But with
modern OSes and processors, virtual memory is usually used, so memory
is not as big an issue anymore.  What is more important is how good
the code generation is in terms of speed to the end user.  Most people
wouldn't care whether a string takes 80 bytes or 160 bytes.  Modern
processors also incur an overhead in unpacking bytes from memory
devices which are inherently 32 or 64 bits wide.

Of course, I don't see mva changing the current definition.  He has
already indicated that a HUGEINT 64-bit type is the way he would go
in the existing compiler.  I don't think a 16-bit character type would
be implemented either.

In my original proposal I was thinking more of a language change,
which, to be accurate, could no longer be called Oberon-2. 

> > 4) The world speaks Unicode and now so would OOC.
> 
> But it need not speak it natively. Search the manpages of Linux or
> other Unixes to see how well a huge part of the world speaks Unicode :-|

I believe these are stop-gap measures.  Eventually OSes will speak
Unicode and that will drive the languages of the future.
 
> > 5) We're ready for humongous graphical databases with
> >    up to 9,223,372,036,854,775,808 bytes. This number
> >    courtesy UnixCalc :-). 
> 
> This is no problem with support for LONGLONGINT, for example.

Actually, that would be HUGEINT according to mva.  Besides, LONGLONGINT
is rather clumsy.
 
> About the other points I'm not sure. There is Juice instead of Java, and
> I see no need for portability to Component Pascal, since I don't like
> the language extensions either, and they have definitely left the Oberon
> community by changing their name ;-)

Juice runs on two targets (Mac OS and Windows 95).  That leaves a lot of
operating systems which aren't supported.  Java byte code VMs can be found
almost everywhere or will be very soon -- even the BeOS has a Java VM.
If we want Oberon-2 to have a place in the future, eventually there will
have to be a Java back-end.  I agree that Juice is the better technology
but we all know where that gets us.  Java has the better PR so it will
win -- no doubt about it.  As for Component Pascal leaving the Oberon
community: I would say any language which is not C/C++ is at least a
higher-level language and I would gladly switch if they ever got enough
support for different targets.  As it is, any enemy of C/C++ is a friend
of Oberon.  Besides, I happen to like many of the extensions they have
made--they are long overdue in Oberon-2:

1) Non-system conversions between integers and SETs via BITSET.  Why
   would this be system dependent?  It's just a paradigm shift between
   one mathematical abstraction of sets and another of numbers.  (A
   sketch follows this list.)
2) Support for underscores and foreign characters in identifiers.
3) Built-in support for string concatenation via `+'.
4) IN and OUT keywords in place of VAR.
5) MIN/MAX can be applied to numerical arguments as well as types.
6) A finalization method is added to the base pointer type for all
   types.
7) Modules have finalization sections.
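For items 1 and 3, the Component Pascal forms look roughly like this
(BITS is CP's integer-to-set conversion; I am sketching the idea, not
proposing exact OOC syntax):

  VAR s: SET; x: INTEGER; msg: ARRAY 64 OF CHAR;
  BEGIN
    s := BITS(6);            (* {1, 2}: the same bits, viewed as a set *)
    x := ORD(s + {0});       (* back to an integer (7), without SYSTEM *)
    msg := "error " + "42"   (* string concatenation built into `+' *)
  END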

Of course, they have some silly extensions too, such as keywords added
just to be used as pragmas.  OOC's method is better IMHO.
 
> > Of course the disadvantages would be:
> 
> This ends in the classic discussion we already had, if I remember
> correctly: Should LONGINT always be 32 bits, or should it stand for the
> native word size of the processor, e.g. 32 bits for 32-bit processors
> and 64 bits for 64-bit processors? Or should we just define lower limits
> and make everything implementation-defined?

It seems like this is a non-issue, as mva has already decided that
64 bits will be HUGEINT (in SYSTEM?).  Personally, I would go with
the Component Pascal approach, but this isn't my compiler, so what
mva says goes.

> Personally, I see some important drawbacks to simply extending LONGINT
> to 64 bits: it is much, and unnecessarily, slower on 32-bit processors,
> as you stated, and they are still very common ;-) and the memory problem
> is also real.

No need to use the 64-bit compiler on such systems.  There still would
be the 32-bit compiler.  Eventually everyone will have 64-bit processors,
so the switch will then be painless.  By that time memory will be even
cheaper than today, so that will be a non-issue as well.  Besides, after
removing Windows 2000, you'll have an extra couple of gigabytes of RAM
to play with. :-)
 
> Another problem is the interfacing: with a huge gap between INTEGER and
> LONGINT, you will have difficulty mapping the variable sizes of a 32-bit
> OS, like C.long, which is 32 bits under Linux.

I don't understand this point.  An Oberon-2 INTEGER gets mapped to a
C.long.
 
> In my opinion the best way would be to do it the way C does (and I mean:
> use the same bit sizes a C compiler on the same system uses for the
> corresponding datatypes: int -> INTEGER, long -> LONGINT, short ->
> SHORTINT). This will result in huge portability (minimal sizes are
> guaranteed), optimal interfacing to the OS (which is mainly written in C
> today; this would be especially nice for the current version of oo2c,
> since it bases its code generation on a C compiler) and sensible
> use of processor features and memory.

I disagree.  This would cause confusion, with different compilers
supporting different notions of what an `int' is.  To prevent this
problem, most portable C software tries to define 8-, 16-, and 32-bit
integers.  Oberon-2 does this already, so in a 64-bit Oberon-2 compiler
the numerical types get mapped to the best fit for a 64-bit processor
(sketched below).  As I said, with mva's HUGEINT concept, this is a
non-issue.  Even in a 64-bit compiler, interfacing to a 32-bit OS can
be done optimally by selecting the appropriate data type for the
interface.  IMHO this interface has nothing to do with the integer
representation.
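The kind of best-fit mapping I mean, written out as type aliases (the
sizes are my assumption for a 64-bit target, not anything mva has
committed to):

  TYPE
    Int8  = SHORTINT;  (*  8 bits *)
    Int16 = INTEGER;   (* 16 bits *)
    Int32 = LONGINT;   (* 32 bits *)
    Int64 = HUGEINT;   (* 64 bits; the native word on a 64-bit processor *)

A 32-bit OS interface then simply declares its fields as Int32,
independent of what the processor's native word happens to be.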
 
> Note, oo2c is not built for its own OS (as Java somehow is, and as is
> implied in the historical datatype ranges of Oberon) but is designed as
> a standalone compiler which must interface nicely with the underlying
> OS.

I think the Java data types were designed for future OSes.  Oberon-2's
data types were designed for current OSes.

Michael