Re: TextRider bug on last token



> From: Michael van Acken <mvacken@t-online.de>
> 
> Here is my proposal of the eol implementation, based on recent
> postings on this mailing list:
> 
> TYPE
>   Reader* = POINTER TO ReaderDesc;
>   ReaderDesc* = RECORD
>     [...]
>     
>     (* the end of line marker may contain the character 0X, which means
>        its length must be stored in a separate field; the eol marker can
>        be empty, but if it is not empty, it is required to start with a
>        control character with an ASCII code in 00X..1FX; an `eolLen' of
>        -1 means auto detect *)
>     eol-: ARRAY maxEOL+1 OF CHAR;(* character sequence of end of line marker *)
>     eolLen-: INTEGER;            (* number of characters in `eol' *)
>   END;
> ..snip..

Nice.  Because the fields are exported (read-only), a TextRider client can
determine which eol was auto-detected, but only after the first line has
been read.  Hopefully eolLen is initially 0 until auto-detection finds a
marker.

> PROCEDURE (r: Reader) SetEOL* (marker: ARRAY OF CHAR; markerLen: INTEGER);
> (* Sets new end of line marker.  If the passed string marker does not fit
>    into the field `eol', or it does not start with a control character,
>    then `r.Res()' is set to `invalidFormat'.  A value `markerLen=-1' means
>    that the reader should auto detect the end of line convention used by
>    the file.  This only works for a fixed number of patterns:
>      LF      used by Unix
>      CR      used by ???
>      CR/LF   used by MS-DOS and Windows
>    pre: (r.Res() = done) & (-1 <= markerLen < LEN (marker)) &
>         ((markerLen <= 0) OR (marker[0] < 20X)) *)
>..snip.. 
> Default: Reader does auto detect, Writer uses CharClass.eol (which is
> always one character)
> 
> I believe everyone will be happy with this.  If someone is not, do
> speak out or be silent forever.
> ..snip..

Forever is a long time! :)  This looks great.  My only reservation is
auto-detection.  I assume that "only works for a fixed number of patterns"
means that only those three are searched for -- I don't want bell characters
(ASCII 7) or some other control character confusing the auto-detection.  If
that detail is handled, then I am happy.
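
To make the concern concrete, here is a minimal sketch of detection limited
to the three patterns.  The module EolSketch and the procedure DetectEOL are
hypothetical names made up for illustration; they are not part of TextRider,
and a real reader would scan its own buffer.  The sketch follows the proposed
convention that an `eolLen' of -1 means "not detected yet".

```oberon
MODULE EolSketch;
(* Hypothetical sketch, not part of TextRider: restrict eol auto-detection
   to the three patterns LF, CR, and CR/LF. *)

CONST
  lf = 0AX;
  cr = 0DX;

PROCEDURE DetectEOL* (buf: ARRAY OF CHAR; len: LONGINT;
                      VAR eol: ARRAY OF CHAR; VAR eolLen: INTEGER);
(* Scans buf[0..len-1] for the first LF or CR.  On success, stores the
   marker in `eol' and its length in `eolLen'.  If neither occurs,
   `eolLen' is set to -1 (still undetected).  Other control characters,
   such as BEL (07X) or TAB (09X), are skipped as ordinary data.
   pre: (0 <= len <= LEN (buf)) & (LEN (eol) >= 2) *)
  VAR
    i: LONGINT;
  BEGIN
    eolLen := -1;
    i := 0;
    WHILE (i < len) & (buf[i] # lf) & (buf[i] # cr) DO
      INC (i)
    END;
    IF i < len THEN
      eol[0] := buf[i];
      eolLen := 1;
      (* a CR immediately followed by LF is the two-character marker *)
      IF (buf[i] = cr) & (i+1 < len) & (buf[i+1] = lf) THEN
        eol[1] := lf;
        eolLen := 2
      END
    END
  END DetectEOL;

END EolSketch.
```

One limitation of the sketch: if a CR is the last character of the scanned
buffer, a following LF in the next buffer is not seen, so a real
implementation would have to look one character ahead before deciding
between CR and CR/LF.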

--Ian