
Re: TextRider bug on last token



>>IR> 2. A small change request for Channel.Mod.  Please add the
>>following field:
>>IR> 
>>IR> eolType-: INTEGER;
>>IR> (* Type of end-of-line char(s) this channel has.
>>IR> 0=lf, 1=cr/lf, 2=cr.
>>IR> *)
>>
>>I am sorry, but this is not possible.  A channel is defined as a
>>sequence of uninterpreted bytes.  It has no concept of "lines",
>>nor of "end of line" characters.  The same is true for all
>>implementations of Channel and all binary riders.  Therefore I
>>cannot introduce this alien concept at this level.
>>
>>The field must be made an attribute of text riders.
>
>If this is so then we need to fix Files.C and OakFiles.C. The

For what it's worth, POSIX does not define O_TEXT or O_BINARY.
Therefore Files.c and OakFiles.c are not broken and do not need
to be fixed.

>underlying assumption in all C compilers that I have met is that
>files are opened as "text" unless otherwise specified. Under Unix
>it makes no difference, since text and binary files have the same
>representation. Under DOS and Windows (and other operating
>systems?) if you don't explicitly request "binary" mode, the native
>end-of-line representation is automatically translated to / from
>the "C" end-of-line representation (LF terminator). For Windows,
>this means that CR-LF is translated to LF on all file reads, and
>LF is translated to CR-LF on all file writes. This is done at the
>level of the C run-time library. Under Unix, "fopen" is part of
>the C runtime and "open" is part of the OS API. On other operating
>systems, "open" is also part of the C run-time and will do this
>kind of translation. Unless you hard-wire O_BINARY into all
>file-open functions, any attempt to do binary I/O will fail on
>these types of systems. 
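
To make the translation described above concrete, here is a minimal C sketch of what a DOS/Windows C runtime does to the bytes of a text-mode file. The function names crlf_to_lf and lf_to_crlf are hypothetical; real runtimes do this inside their own I/O functions, and their details (buffer boundaries, ^Z handling) vary.

```c
#include <stddef.h>

/* Read side: collapse each CR-LF pair into a single LF, in place.
   A lone CR is kept as data.  Returns the new length.  (A real
   runtime must also handle a CR that ends one buffer and an LF that
   starts the next; this sketch ignores that case.) */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;          /* drop the CR; the LF is copied next */
        buf[out++] = buf[in];
    }
    return out;
}

/* Write side: expand each LF into CR-LF.  dst must have room for
   2 * len bytes.  Returns the number of bytes written. */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t i, out = 0;
    for (i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

A 10-byte "one\r\ntwo\r\n" on disk thus reaches the program as 8 bytes, which is exactly why a channel of uninterpreted bytes must not go through this layer.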

Inserting O_BINARY will break these files on all POSIX-compliant
systems, which do not define that flag.  Obviously, I will not
hardwire it into Files.c if it breaks oo2c on my Linux box ;-)

>This is what I had to do to get OO2C running under Windows. Since
>we are agreed (?) that Channels are uninterpreted, any channel
>MUST be opened as O_BINARY (for Unix this should have no effect
>anyway).

As pointed out above, "no effect" is a severe understatement.
Anyway, I agree with you that for certain systems the low-level
file modules need to be adjusted, but the current implementation
(oo2c for Unix) is valid.  This is nothing new.
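
One common way to adjust the low-level file modules without breaking POSIX systems is to fall back to a zero-valued O_BINARY where the system headers do not provide it. A minimal sketch, assuming a POSIX-style open(); the helper name open_channel is hypothetical and not part of Files.c, and header names may differ on DOS/Windows compilers.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>   /* read(), write(), close() for channel users */

/* POSIX does not define O_BINARY; DOS/Windows C libraries do.
   Defining it as 0 where it is missing turns the flag into a no-op,
   so the same open() call compiles everywhere and requests
   untranslated (binary) I/O where the distinction exists. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: open a file as an uninterpreted byte channel. */
int open_channel(const char *path, int flags, mode_t mode)
{
    return open(path, flags | O_BINARY, mode);
}
```

On a POSIX system the `#define` makes O_BINARY zero, so the behaviour of Files.c there would stay exactly as it is now.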

The question remains how different end-of-line conventions can be
selected for the TextRider module.  The current implementation
hardwires a single end-of-line character (ASCII.lf, I believe) into
the module.  I think we must make the eol convention an attribute
of a rider instance instead.  Then, to deal with MS-DOG's CR/LF,
the mechanism to "unget" characters must be extended to handle 2
or more characters.  I believe MS-DOG also has an end-of-file
character (^Z).  Do we need to tackle this, too?
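
A minimal C sketch of the rider design proposed above, assuming a stdio FILE stands in for the byte channel. Everything here (text_reader, read_line, at_eol, the ctrl_z_is_eof switch) is hypothetical and not the TextRider API; the eol values only mirror the proposed eolType numbering. at_eol shows why a one-character unget is not enough: to check for CR/LF without consuming it, the rider has to read two bytes and push both back.

```c
#include <stdio.h>

/* End-of-line conventions, numbered like the proposed eolType field:
   0 = LF, 1 = CR/LF, 2 = CR. */
enum eol_type { EOL_LF = 0, EOL_CRLF = 1, EOL_CR = 2 };

/* Hypothetical text reader: the eol convention is a per-instance
   attribute, and up to two bytes can be pushed back ("ungot"). */
struct text_reader {
    FILE *src;               /* stands in for the underlying byte channel */
    enum eol_type eol;
    int ctrl_z_is_eof;       /* treat ^Z (0x1A) as end of file? */
    int at_eof;              /* sticky end-of-input flag */
    int pushback[2];         /* LIFO stack of ungot bytes */
    int npushed;
};

void text_reader_init(struct text_reader *r, FILE *src,
                      enum eol_type eol, int ctrl_z_is_eof)
{
    r->src = src;
    r->eol = eol;
    r->ctrl_z_is_eof = ctrl_z_is_eof;
    r->at_eof = 0;
    r->npushed = 0;
}

static int next_byte(struct text_reader *r)
{
    int c;
    if (r->npushed > 0)
        return r->pushback[--r->npushed];
    if (r->at_eof)
        return EOF;
    c = getc(r->src);
    if (c == EOF || (r->ctrl_z_is_eof && c == 0x1A)) {
        r->at_eof = 1;       /* ^Z ends input even if bytes follow it */
        return EOF;
    }
    return c;
}

static void unget_byte(struct text_reader *r, int c)
{
    if (c != EOF && r->npushed < 2)
        r->pushback[r->npushed++] = c;
}

/* Non-consuming test for an end-of-line sequence.  For CR/LF it must
   read two bytes and then unget both. */
int at_eol(struct text_reader *r)
{
    int c = next_byte(r), d, hit = 0;
    if (c == EOF)
        return 0;
    if (r->eol == EOL_LF)
        hit = (c == '\n');
    else if (r->eol == EOL_CR)
        hit = (c == '\r');
    else if (c == '\r') {
        d = next_byte(r);
        hit = (d == '\n');
        unget_byte(r, d);    /* pushed first, so it is read second */
    }
    unget_byte(r, c);
    return hit;
}

/* Read one line into buf without its terminator.  Returns the line
   length, or -1 if input ended before any byte was read. */
int read_line(struct text_reader *r, char *buf, int size)
{
    int len = 0, got_any = 0, c, d;
    while ((c = next_byte(r)) != EOF) {
        got_any = 1;
        if ((r->eol == EOL_LF && c == '\n') ||
            (r->eol == EOL_CR && c == '\r'))
            break;
        if (r->eol == EOL_CRLF && c == '\r') {
            d = next_byte(r);
            if (d == '\n')
                break;         /* whole CR/LF pair consumed */
            unget_byte(r, d);  /* lone CR: keep it as data */
        }
        if (len < size - 1)
            buf[len++] = (char)c;  /* bytes beyond the buffer are dropped */
    }
    buf[len] = '\0';
    return got_any ? len : -1;
}
```

In this sketch ^Z handling is a separate switch rather than being implied by the CR/LF setting, since a file can use CR/LF lines without carrying a ^Z terminator.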

-- mva