Re: TextRider bug on last token
Here is my proposal for the eol implementation, based on recent
postings to this mailing list:
TYPE
  Reader* = POINTER TO ReaderDesc;
  ReaderDesc* = RECORD
    [...]
    (* The end of line marker may contain the character 0X, which means its
       length must be stored in a separate field.  The eol marker can be
       empty, but if it is not empty, it must start with a control character
       with an ASCII code in 00X..1FX.  An `eolLen' of -1 means auto detect. *)
    eol-: ARRAY maxEOL+1 OF CHAR;  (* character sequence of end of line marker *)
    eolLen-: INTEGER;  (* number of characters in `eol' *)
  END;
PROCEDURE (r: Reader) SetEOL* (marker: ARRAY OF CHAR; markerLen: INTEGER);
(* Sets a new end of line marker.  If the passed string `marker' does not fit
   into the field `eol', or if it does not start with a control character,
   then `r.Res()' is set to `invalidFormat'.  A value of `markerLen = -1' means
   that the reader should auto detect the end of line convention used by the
   file.  This only works for a fixed set of patterns:
     LF     used by Unix
     CR     used by the Mac OS
     CR/LF  used by MS-DOS and Windows
   pre: (r.Res() = done) & (-1 <= markerLen < LEN (marker)) &
        ((markerLen <= 0) OR (marker[0] < 20X)) *)
PROCEDURE (w: Writer) SetEOL* (marker: ARRAY OF CHAR; markerLen: INTEGER);
(* Sets a new end of line marker.  If the passed string `marker' does not fit
   into the field `eol', then `w.Res()' is set to `invalidFormat'.
   pre: (w.Res() = done) & (0 <= markerLen < LEN (marker)) *)
PROCEDURE (s: Scanner) SetEOL* (marker: ARRAY OF CHAR; markerLen: INTEGER);
(* See Reader.SetEOL *)
BEGIN
  s. r. SetEOL (marker, markerLen)
END SetEOL;
Defaults: the Reader auto detects the end of line convention; the Writer
uses CharClass.eol (which is always one character).
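To make the proposed `Reader.SetEOL' rules concrete, here is a minimal Python sketch of the checks it performs. It is not oo2c code: `MAX_EOL' (standing in for `maxEOL', whose value the post does not give) and the string result codes are assumptions. Note that the post lists `marker[0] < 20X' both as a precondition and as a reason for `invalidFormat'; the sketch follows the prose and sets `invalidFormat'.

```python
# Sketch of the proposed Reader.SetEOL checks. MAX_EOL's value and the
# string result codes are assumptions; the post does not define them.
MAX_EOL = 2          # hypothetical value of maxEOL (long enough for CR/LF)
DONE = "done"
INVALID_FORMAT = "invalidFormat"
AUTO_DETECT = -1     # `markerLen = -1' requests auto detection


class Reader:
    """Models only the `eol'/`eolLen' fields of ReaderDesc and SetEOL."""

    def __init__(self):
        self.eol = ""                # stands in for eol: ARRAY maxEOL+1 OF CHAR
        self.eol_len = AUTO_DETECT   # proposed default: auto detect
        self.res = DONE

    def set_eol(self, marker: str, marker_len: int) -> None:
        # Preconditions: Res() = done, and -1 <= markerLen <= length of marker.
        assert self.res == DONE
        assert -1 <= marker_len <= len(marker)
        if marker_len == AUTO_DETECT:
            self.eol, self.eol_len = "", AUTO_DETECT
        elif marker_len > MAX_EOL or (marker_len > 0 and ord(marker[0]) >= 0x20):
            # Marker does not fit into `eol', or does not start with a
            # control character in 00X..1FX.
            self.res = INVALID_FORMAT
        else:
            # The length comes from marker_len, not from a terminating 0X,
            # so a marker containing "\x00" is stored intact.
            self.eol, self.eol_len = marker[:marker_len], marker_len
```

For example, `set_eol("\r\n", 2)' stores CR/LF, `set_eol("\x00", 1)' stores a lone 0X (the case that forces a separate length field), and `set_eol("ab", 2)' leaves the reader with `res = "invalidFormat"'.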
I believe everyone will be happy with this proposal. If anyone is not,
speak up now or be silent forever.
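The auto detection mode proposed for the Reader (LF, CR, or CR/LF) could be sketched in Python as follows. The rule that the first CR or LF decides, and that the sample is complete, are assumptions of this sketch; the post does not say how detection should proceed.

```python
def detect_eol(sample: str):
    """Guess the end of line marker from the first CR or LF in `sample'.

    Assumptions (not from the post): the first terminator decides, and
    `sample' is complete, so a CR at its very end counts as a CR-only
    marker. Returns None if `sample' contains neither CR nor LF."""
    for i, ch in enumerate(sample):
        if ch == "\n":
            return "\n"                            # LF: Unix
        if ch == "\r":
            if i + 1 < len(sample) and sample[i + 1] == "\n":
                return "\r\n"                      # CR/LF: MS-DOS, Windows
            return "\r"                            # CR on its own
    return None
```

On a channel that cannot be positioned, a reader cannot first scan a sample and then rewind, so in practice the same decision would have to be made while reading: at the first CR, one more character of lookahead tells CR apart from CR/LF.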
Summary of problems with the input part of TextRider:
TextRider must not assume that the channel is positionable.
Most tokens require a lookahead of 1. For example, procedure
`ReadIdentifier' needs to inspect the next character to decide whether
it has reached the end of the identifier. The lookahead character
must be available for the next read operation.
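A minimal Python sketch of a one-character lookahead that needs no positioning: the source is a plain iterator that is never rewound, and the character that ends an identifier stays buffered for the next read. The identifier rule (letter followed by letters or digits) is an assumption modelled on Oberon, not taken from TextRider.

```python
class LookaheadReader:
    """One character of lookahead over a non-positionable source.

    The source is any iterator of characters; the reader never seeks or
    rewinds it, it only keeps the next character in `_la' (None at end)."""

    def __init__(self, chars):
        self._src = iter(chars)
        self._la = next(self._src, None)

    def peek(self):
        return self._la

    def read_char(self):
        ch = self._la
        self._la = next(self._src, None)
        return ch

    def read_identifier(self):
        # Assumed rule, Oberon-style: letter {letter | digit}.
        if self._la is None or not self._la.isalpha():
            return None                  # not an identifier; nothing consumed
        ident = ""
        while self._la is not None and self._la.isalnum():
            ident += self.read_char()
        # The character that ended the identifier is still in `_la', so the
        # next read operation sees it.
        return ident
```

With `LookaheadReader(c for c in "abc1+x")', `read_identifier()' yields "abc1", and the following `read_char()' yields the "+" that terminated it.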
The lookahead character may cause a read error, e.g. when reading
past the end of a file. This does _not_ imply a read error for the
scanned token! Example: if the characters of an identifier have been
read and the lookahead reports `readAfterEnd', then the identifier is
valid, but the next read operation has to report the error.
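The rule that a failed lookahead must not spoil the current token can be sketched in Python by recording the outcome of fetching the lookahead separately from the reader's result, and copying it into `res' only when an operation actually needs that character. The class name and the string result codes are hypothetical.

```python
DONE = "done"
READ_AFTER_END = "readAfterEnd"


class DeferringReader:
    """Lookahead reader that defers a lookahead error to the next read.

    `_la_res' records the outcome of fetching the lookahead character; it
    is copied into `res' only when some operation needs that character."""

    def __init__(self, chars):
        self._src = iter(chars)
        self.res = DONE
        self._fetch()

    def _fetch(self):
        try:
            self._la, self._la_res = next(self._src), DONE
        except StopIteration:
            self._la, self._la_res = None, READ_AFTER_END

    def read_char(self):
        if self.res != DONE:
            return None
        if self._la_res != DONE:
            self.res = self._la_res      # the deferred error is reported now
            return None
        ch = self._la
        self._fetch()
        return ch

    def read_identifier(self):
        if self.res != DONE:
            return None
        if self._la_res != DONE:
            self.res = self._la_res      # not even the first character exists
            return None
        if not self._la.isalpha():
            return None                  # not an identifier; nothing consumed
        ident = self.read_char()
        # A failed lookahead ends the identifier but does not invalidate it.
        while self._la_res == DONE and self._la.isalnum():
            ident += self.read_char()
        return ident
```

Reading "abc" returns the identifier "abc" with `res' still "done"; only the following `read_char()' sets `res' to "readAfterEnd".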
Some systems have an end of line marker longer than one character.
This means that a lookahead of 1 is no longer sufficient.
E.g. r.Eol() needs to check all characters of the marker. Note that
a proper prefix of the marker must not be detected as an end of line:
CR on its own is a normal character, while CR/LF is an eol.
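A Python sketch of an eol test that buffers up to `len(eol)' characters of lookahead, so that only the complete marker counts as an end of line and a lone CR is returned as ordinary text. `EolReader', `at_eol' and `read_line' are hypothetical names, not TextRider procedures.

```python
from collections import deque


class EolReader:
    """Lookahead of up to len(eol) characters over a non-positionable source."""

    def __init__(self, chars, eol="\r\n"):
        self._src = iter(chars)
        self._buf = deque()    # characters fetched but not yet consumed
        self.eol = eol

    def _fill(self, n):
        # Pull characters until n are buffered or the source is exhausted.
        while len(self._buf) < n:
            ch = next(self._src, None)
            if ch is None:
                break
            self._buf.append(ch)

    def at_eol(self):
        # True only if the *whole* marker is next; a proper prefix is not.
        n = len(self.eol)
        if n == 0:
            return False
        self._fill(n)
        return len(self._buf) >= n and all(
            self._buf[i] == self.eol[i] for i in range(n))

    def read_char(self):
        self._fill(1)
        return self._buf.popleft() if self._buf else None

    def read_line(self):
        """Return the next line without its marker, or None at end of input."""
        self._fill(1)
        if not self._buf:
            return None
        line = []
        while not self.at_eol():
            ch = self.read_char()
            if ch is None:
                return "".join(line)   # last line has no marker
            line.append(ch)
        for _ in self.eol:
            self._buf.popleft()        # consume the complete marker
        return "".join(line)
```

With the CR/LF marker, the input "a\rb\r\nc" yields the lines "a\rb" and "c": the first CR is kept as a normal character because it is not followed by LF.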
A per-system selection of the eol character is not sufficient.
While an MS-DOS file may use CR/LF, input from a terminal will use the
code for RETURN as its end of line marker. (Can someone verify this?)
No rider (reader, writer, or scanner) is permitted to do anything if
the rider's `Res()' is not `done'.
Error handling of the readers must be precise. Either the last
operation was successful, consuming a number of characters and
returning a result based on these characters, or an error is
signalled, potentially leaving the reader in an undefined state.
Silently dropping characters from the input stream is not permitted,
of course.
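Both error rules above can be illustrated with a small Python sketch of an integer reader: every operation is a no-op once `res' is not "done", and `read_int' either returns a value built from exactly the digits it consumed or signals an error. The "valueOutOfRange" result code and the 32-bit limit are assumptions for illustration only.

```python
DONE = "done"
READ_AFTER_END = "readAfterEnd"
INVALID_FORMAT = "invalidFormat"
OUT_OF_RANGE = "valueOutOfRange"   # hypothetical result code
MAX_INT = 2**31 - 1                # assumed 32-bit integer range


def _is_digit(ch):
    return ch is not None and "0" <= ch <= "9"


class IntReader:
    def __init__(self, text):
        self._src = iter(text)
        self._la = next(self._src, None)   # one character of lookahead
        self.res = DONE

    def _advance(self):
        ch = self._la
        self._la = next(self._src, None)
        return ch

    def read_char(self):
        if self.res != DONE:
            return None                    # do nothing once res != done
        if self._la is None:
            self.res = READ_AFTER_END
            return None
        return self._advance()

    def read_int(self):
        """Return the value of the digits consumed, or signal an error.

        The character that ends the number stays in the lookahead, so it
        is not silently dropped."""
        if self.res != DONE:
            return None                    # do nothing once res != done
        if self._la is None:
            self.res = READ_AFTER_END
            return None
        if not _is_digit(self._la):
            self.res = INVALID_FORMAT      # nothing consumed
            return None
        value = 0
        while _is_digit(self._la):
            value = value * 10 + int(self._advance())
            if value > MAX_INT:
                self.res = OUT_OF_RANGE    # error; reader state now undefined
                return None
        return value
```

Reading "123+" returns 123 and leaves "+" for the next `read_char()'; an overflowing number sets the error, after which all further reads do nothing.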
Mike's implementation of TextRider addresses only some of these
problems. In particular, the implementation of the lookahead mechanism
is inadequate. I will rewrite the reader parts of TextRider over the
weekend :-( I do not have any application using TextRider (or In, for
that matter) beyond Mike's InOutTest, so I will need some feedback from
users, preferably _before_ the module gets into the next release of
oo2c.
Btw, the reference manual fails to mention that the field
`Scanner.pos' holds `Channel.noPosition' if the scanner is attached
to a channel that does not permit positioning.
-- mva