Re: New TextRider & oo2c patch




Sorry that it took so long to respond to this; I've been really busy.  Now
that MvA is going to join the world of full-time employment, he'll better
appreciate this excuse :)


Michael van Acken wrote:
> While hacking away at TextRider, I noticed a small number of minor
> inconsistencies of the reference manual.
> 
> Quote OOC RM:
> >   When attempting to read, and if the value is not properly formatted
> >for its type, `r.Res()' returns `invalidFormat'.  The reader remains
> >positioned at the character which caused the `invalidFormat' error, but
> >further reading can not take place until the error is cleared.
> 
> The bit about the reader position after `invalidFormat' errors does
> not hold after a failed ReadBool.  In this case, the position is after
> the invalid identifier.

The way I read it, the RM is correct.  Assume a `ReadBool' is done on each
of the following (^ indicates rider position after read attempt):

TRUE
    ^   (* r.Res() => done *)

FALSE
     ^  (* r.Res() => done *)

TRUUE
   ^    (* r.Res() => invalidFormat, expects an `E' here not `U' *)

FALS
    ^   (* r.Res() => invalidFormat, expects an `E' here not whitespace *)

FALES
   ^    (* r.Res() => invalidFormat, expects an `S' here not `E' *)
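
The ReadBool cases above can be reproduced with a small Python model.  This
is only a sketch of the positioning rule as I read it from the RM, not OOC's
implementation: `read_bool' is a hypothetical helper, it does no whitespace
skipping, and it does not look at characters that follow a complete TRUE or
FALSE.

```python
def read_bool(text, start=0):
    """Toy model of ReadBool: return (result, position after the attempt)."""
    pos = start
    # Decide which literal to expect from the first character.
    if pos < len(text) and text[pos] == "T":
        word = "TRUE"
    elif pos < len(text) and text[pos] == "F":
        word = "FALSE"
    else:
        return ("invalidFormat", pos)
    # Match character by character; stop AT the first offending character.
    for expected in word:
        if pos >= len(text) or text[pos] != expected:
            return ("invalidFormat", pos)
        pos += 1
    return ("done", pos)


for sample in ["TRUE", "FALSE", "TRUUE", "FALS ", "FALES"]:
    print(repr(sample), read_bool(sample))
```

The returned position is the column of the `^' in each case above.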


ReadString and ReadSet work exactly the same way:

Doing ReadSet on

{1, 2, 3, Q}
          ^   (* `Q' causes invalidFormat *)

Doing ReadString on

"No close quote
               ^  (* EOL causes invalidFormat *)


When reading numbers, invalidFormat can only occur at the beginning (first
or second character).  After that, "invalid" characters signal end-of-input
for a number.  ReadInt on each of the following:

-123A
    ^  (* r.Res() => done *)

-A
 ^     (* r.Res() => invalidFormat *)

A123
^      (* r.Res() => invalidFormat *)
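
The same kind of toy model works for the number rule.  Again this is a
sketch under assumptions, not the library code: `read_int' is a hypothetical
helper, it accepts an optional leading `+' or `-', skips no whitespace, and
ignores range checking (`valueOutOfRange') entirely.

```python
DIGITS = "0123456789"


def read_int(text, start=0):
    """Toy model of ReadInt: return (result, position after the attempt)."""
    pos = start
    # An optional sign may occupy the first character.
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    # The first character after the optional sign must be a digit; this is
    # the only place where invalidFormat can be reported.
    if pos >= len(text) or text[pos] not in DIGITS:
        return ("invalidFormat", pos)
    # From here on, any non-digit simply ends the number.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return ("done", pos)


for sample in ["-123A", "-A", "A123"]:
    print(repr(sample), read_int(sample))
```
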


Should all of this be explained in the RM?  Or is there a better short
explanation for this than what is already there?

> `valueOutOfRange' is signaled after the whole integer number is read,
> but for strings the reading procedure may return with this error "in
> between", without scanning to the end of the string first.

I'll add a note about this to the RM.

> The method Scanner.SkipSpaces is gone.

Gone from the RM too.

> OOC RM:
> >     Note that, unlike most other reader types, a scanner will continue
> >     to read (via calls to `Scan') even after it has scanned an invalid
> >     token.
> 
> Why?  I have made the error condition "sticky" again.  Once a scanner
> signals an error (but not `invalid'), ClearError has to be called
> explicitly to continue things.  I will not revert this unless someone
> gives me a very good reason for this.

When an `error' occurs, making the programmer explicitly clear the error
makes sense.  

> Also: The RM does not explain the difference between the types `error'
> and `invalid'.  Lacking any further information, I am using `invalid'
> as an equivalent to `undefined', which can only happen after
> `InitScanner' or `ClearError'.

The difference should be that `invalid' (for Scanners) is equivalent to
`invalidFormat' OR `valueOutOfRange' (for Readers) -- i.e., problems
interpreting tokens.  An `error', on the other hand, means that an error
occurred on the underlying Reader (i.e., `s.r.byteReader.res#done'), so
`error' is also how you detect that you've reached end-of-text.  So normally,
you would write something like


  s.Scan;
  WHILE s.type # TextRider.error DO
     IF s.type = TextRider.string THEN
  ...
     ELSIF s.type = TextRider.invalid THEN
        (* Do something special with a bad token *)
  ...
     END;
     s.Scan   (* advance to the next token *)
  END;

Note that allowing attempts to re-scan `invalid' tokens is built into the
Scanner:

From the RM:
:    Field: pos-: `LONGINT'
:          Starting position of the most recently scanned token.  Note
:          that this is *not* the same as the value returned by the
:          `Pos()' method.
:
:          This value may be useful when an `invalid' token is scanned,
:          as it will point to the start of the `invalid' token (whereas
:          `Pos()' would be positioned *after* the invalid token).  You
:          could, for example, reset the scanner options and re-position
:          the scanner back at the invalid token to attempt a re-scan.
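
To make the `pos' versus `Pos()' distinction concrete, here is a toy scanner
in Python.  Everything in it is an assumption made for illustration:
`ToyScanner', its `accept_words' option, and `set_pos' are hypothetical names
and are not the TextRider interface.  It only shows the re-scan idea the RM
describes: remember where the bad token started, change an option, and
reposition there.

```python
class ToyScanner:
    """Toy stand-in for a text scanner; not the TextRider API."""

    def __init__(self, text, accept_words=False):
        self.text = text
        self.accept_words = accept_words  # hypothetical scanner option
        self.cursor = 0                   # what Pos() would report
        self.pos = 0                      # start of the last scanned token
        self.type = "undefined"
        self.token = ""

    def scan(self):
        text = self.text
        # Skip whitespace between tokens.
        while self.cursor < len(text) and text[self.cursor].isspace():
            self.cursor += 1
        if self.cursor >= len(text):
            self.type = "error"           # end of text shows up as `error'
            return
        self.pos = self.cursor
        while self.cursor < len(text) and not text[self.cursor].isspace():
            self.cursor += 1
        raw = text[self.pos:self.cursor]
        if all(c in "0123456789" for c in raw):
            self.type, self.token = "int", raw
        elif self.accept_words:
            self.type, self.token = "string", raw
        else:
            self.type, self.token = "invalid", ""  # bad token is not stored

    def set_pos(self, new_pos):
        self.cursor = new_pos


s = ToyScanner("12 abc 7")
s.scan()                      # scans the integer 12
s.scan()                      # "abc" is invalid under the default options
if s.type == "invalid":
    s.accept_words = True     # reset the options ...
    s.set_pos(s.pos)          # ... go back to the start of the bad token
    s.scan()                  # ... and re-scan it
print(s.type, s.token)
```

After the invalid scan, `s.pos' is at the start of "abc" while `s.cursor' is
already past it, which is why the re-scan has to reposition explicitly.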

In the past, I have argued that the invalid token should be placed in the
scanner's `string' field for the programmer's convenience, but I was voted
down (string buffer overruns would be a problem here, as MvA points out
for other cases below).

> Btw: I did _not_ fix the potential string buffer overruns in ReadLReal
> and Scanner.ReadNum.

What should we do about this?  Make it POINTER TO ARRAY OF CHAR?  Or ADT
Lib's dynamic String type?


Eric