Re: New TextRider & oo2c patch
Sorry that it took so long to respond to this; I've been really busy. Now
that MvA is going to join the world of full time employment, he'll better
appreciate this excuse :)
Michael van Acken wrote:
> While hacking away at TextRider, I noticed a small number of minor
> inconsistencies of the reference manual.
>
> Quote OOC RM:
> > When attempting to read, and if the value is not properly formatted
> >for its type, `r.Res()' returns `invalidFormat'. The reader remains
> >positioned at the character which caused the `invalidFormat' error, but
> >further reading can not take place until the error is cleared.
>
> The bit about the reader position after `invalidFormat' errors does
> not hold after a failed ReadBool. In this case, the position is after the
> invalid identifier.
The way I read it, the RM is correct. Assume a `ReadBool' is done on each
of the following (^ indicates rider position after read attempt):
TRUE
    ^ (* r.Res() => done *)
FALSE
     ^ (* r.Res() => done *)
TRUUE
   ^ (* r.Res() => invalidFormat, expects an `E' here not `U' *)
FALS
    ^ (* r.Res() => invalidFormat, expects an `E' here not whitespace *)
FALES
   ^ (* r.Res() => invalidFormat, expects an `S' here not `E' *)
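To make that reading concrete, here is a small Python model of the ReadBool cases above. It is only a sketch of the behaviour as I read the RM, not the TextRider code: `read_bool`, its return tuple, and the whitespace skipping are assumptions made for illustration.

```python
DONE = "done"
INVALID_FORMAT = "invalidFormat"


def read_bool(text, pos=0):
    """Model of the ReadBool cases: return (value, res, pos).

    `pos` in the result is where the (hypothetical) rider is left.
    """
    # Assumption: leading whitespace is skipped before the literal.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    # The first character decides which literal is expected.
    for literal, value in (("TRUE", True), ("FALSE", False)):
        if text.startswith(literal[0], pos):
            for expected in literal:
                if pos >= len(text) or text[pos] != expected:
                    # Stay AT the offending character (or at end of text).
                    return None, INVALID_FORMAT, pos
                pos += 1
            return value, DONE, pos
    # Neither `T' nor `F': the very first character is the culprit.
    return None, INVALID_FORMAT, pos


print(read_bool("TRUUE"))  # (None, 'invalidFormat', 3)
```

Under this model the rider never moves past the first character that breaks the expected spelling, which is the property the RM sentence describes.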
ReadString and ReadSet work exactly the same way:
Doing ReadSet on
{1, 2, 3, Q}
          ^ (* `Q' causes invalidFormat *)
Doing ReadString on
"No close quote
               ^ (* EOL causes invalidFormat *)
When reading numbers, invalidFormat can only occur at the beginning (first
or second character). After that, "invalid" characters signal end-of-input
for a number. ReadInt on each of the following:
-123A
    ^ (* r.Res() => done *)
-A
 ^ (* r.Res() => invalidFormat *)
A123
^ (* r.Res() => invalidFormat *)
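The number rule can be modelled the same way. This Python sketch is an illustration of the three ReadInt cases above, not the library code: `read_int` is a hypothetical name, accepting a leading `+` is an assumption, and `valueOutOfRange` is deliberately left out.

```python
DIGITS = "0123456789"


def read_int(text, pos=0):
    """Model of the ReadInt cases: return (value, res, pos)."""
    start = pos
    # Optional sign (accepting `+' is an assumption of this sketch).
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    first_digit = pos
    # Once at least one digit is read, any other character just ends the number.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == first_digit:
        # No digit at all: the error is at the first or second character.
        return None, "invalidFormat", pos
    return int(text[start:pos]), "done", pos


print(read_int("-123A"))  # (-123, 'done', 4)
```

So `invalidFormat` can only be reported with the rider on the first character (no sign) or the second (after a sign); after that, a stray character simply terminates the number.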
Should all of this be explained in the RM? Or is there a better short
explanation for this than what is already there?
> `valueOutOfRange' is signaled after the whole integer number is read,
> but for strings the reading procedure may return with this error "in
> between", without scanning to the end of the string first.
I'll add a note about this to the RM.
> The method Scanner.SkipSpaces is gone.
Gone from the RM too.
> OOC RM:
> > Note that, unlike most other reader types, a scanner will continue
> > to read (via calls to `Scan') even after it has scanned an invalid
> > token.
>
> Why? I have made the error condition "sticky" again. Once a scanner
> signals an error (but not `invalid'), ClearError has to be called
> explicitly to continue things. I will not revert this unless someone
> gives me a very good reason for this.
When an `error' occurs, making the programmer explicitly clear the error
makes sense.
> Also: The RM does not explain the difference between the types `error'
> and `invalid'. Lacking any further information, I am using `invalid'
> as an equivalent to `undefined', which can only happen after
> `InitScanner' or `ClearError'.
The difference should be that `invalid' (for Scanners) is equivalent to
`invalidFormat' OR `valueOutOfRange' (for Readers) -- i.e., problems
interpreting tokens. On the other hand, an `error' is when an error occurs
on the underlying Reader (i.e., `s.r.byteReader.res#done'); `error' is
therefore used to determine when you've reached end-of-text. So normally,
you would write something like
  s.Scan;
  WHILE s.type # TextRider.error DO
    IF s.type = TextRider.string THEN
      ...
    ELSIF s.type = TextRider.invalid THEN
      (* Do something special with a bad token *)
      ...
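For anyone who wants to play with the distinction, here is a toy Python model of that loop. Everything in it is hypothetical (`ToyScanner`, `collect`, the token rules, the `None` initial state); it only mimics the split I am describing: `invalid` marks a bad token and scanning continues, while `error` marks a failure of the underlying reader (here, end of text) and stays set until cleared.

```python
STRING, IDENT, INVALID, ERROR = "string", "ident", "invalid", "error"


class ToyScanner:
    """Hypothetical single-line scanner; not the TextRider implementation."""

    def __init__(self, text):
        self.text = text
        self.type = None   # nothing scanned yet (assumed initial state)
        self.string = ""   # value of the last `string' token
        self.pos = 0       # start of the most recently scanned token
        self._next = 0     # where the next scan begins

    def clear_error(self):
        self.type = None

    def scan(self):
        if self.type == ERROR:
            return  # sticky: nothing happens until clear_error()
        t, i = self.text, self._next
        while i < len(t) and t[i].isspace():
            i += 1
        self.pos = i
        if i >= len(t):
            # The underlying reader has nothing left: report `error'.
            self.type, self._next = ERROR, i
        elif t[i] == '"':
            j = t.find('"', i + 1)
            if j < 0:
                # No closing quote: a bad token, not a reader error.
                self.type, self._next = INVALID, len(t)
            else:
                self.type, self.string, self._next = STRING, t[i + 1:j], j + 1
        else:
            j = i
            while j < len(t) and not t[j].isspace():
                j += 1
            self.type = IDENT if t[i:j].isalpha() else INVALID
            self._next = j


def collect(text):
    """Scan until `error', recording strings and where other tokens start."""
    s = ToyScanner(text)
    seen = []
    s.scan()
    while s.type != ERROR:
        if s.type == STRING:
            seen.append((STRING, s.string))
        else:
            seen.append((s.type, s.pos))
        s.scan()
    return seen


print(collect('"a" $x "b"'))  # [('string', 'a'), ('invalid', 4), ('string', 'b')]
```

The bad token `$x` does not stop the loop; only the end-of-text `error` does, and its recorded start position is what a caller would use to attempt a re-scan.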
Note that allowing attempts to re-scan `invalid' tokens is built into the
Scanner:
From the RM:
: Field: pos-: `LONGINT'
: Starting position of the most recently scanned token. Note
: that this is *not* the same as the value returned by the
: `Pos()' method.
:
: This value may be useful when an `invalid' token is scanned,
: as it will point to the start of the `invalid' token (whereas
: `Pos()' would be positioned *after* the invalid token). You
: could, for example, reset the scanner options and re-position
: the scanner back at the invalid token to attempt a re-scan.
In the past, I have argued that the invalid token should be placed in the
scanner's `string' field for the programmer's convenience, but I was voted
down (string buffer overruns would be a problem here, as MvA points out
for other cases below).
> Btw: I did _not_ fix the potential string buffer overruns in ReadLReal
> and Scanner.ReadNum.
What should we do about this? Make it POINTER TO ARRAY OF CHAR? Or ADT
Lib's dynamic String type?
Eric