
Re: TextRider bug on last token



> From: Mark K. Gardner <mkgardne@rtsl3.cs.uiuc.edu>
> Stewart Smith <ssmith@murdoch.edu.au> wrote:
> > ..snip..
> ..snip..
> My favorite solution, based on experience gained by porting many
> programs between Unix, Macintosh, and MS-DOS / Windows, is to always
> read files in binary mode (using #ifdef's as appropriate). I then
> convert CR-LF or CR sequences to LF before I use the data as text.
> (Most usually, I have a compiler-like scanner anyway so accepting CR,
> LF, or CR-LF is not a problem in my software.)
> ..snip..
> 
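
As a rough illustration of the binary-mode approach Mark describes, here is a minimal C sketch that rewrites CR-LF and lone CR to LF in place. The function name normalize_eol is hypothetical, and the sketch assumes the whole file is already in one buffer (a CR-LF pair split across two separately read chunks is not handled).

```c
#include <stddef.h>

/* Hypothetical helper: convert CR-LF and lone CR to LF in place.
   buf holds len bytes read in binary mode; returns the new length. */
size_t normalize_eol(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in++];
        if (c == '\r') {
            /* CR-LF collapses to a single LF; a lone CR becomes LF */
            if (in < len && buf[in] == '\n') {
                in++;
            }
            c = '\n';
        }
        buf[out++] = c;
    }
    return out;
}
```

The output is never longer than the input, so the conversion needs no second buffer.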

Here's a proposal for eol support in TextRider.  Its goals are:
- eol handling can be set on a per-TextRider-object basis
- OOC would support each platform's native eol format (POSIX, DOS, Mac)
- Ott (my text i/o library) could adjust the eol format on a per-channel basis
- TextRider.Mod stays as Oberon-2 code (no C code)


----Text End Of Line Handling Proposal----

1. Add a new module that would be implemented in C so that it could
be platform-specific using #ifdef.

MODULE TextEOL;
(*
    TextEOL -- defines end-of-line handling for text data streams.
*)

IMPORT Ascii;

CONST
(* eolTypes -- add more if necessary; exported so clients can pass them to Force *)
 isLF* = 0;
 isCRLF* = 1;
 isCR* = 2;

TYPE
  EndOfLine* = POINTER TO EndOfLineDesc;
  EndOfLineDesc* = RECORD
    type-: SHORTINT;   (* one of the isXXX constants *)
    prefix-: CHAR;     (* leading eol char, e.g. cr; undefined if length # 2 *)
    eol-: CHAR;        (* final or only eol char *)
    length-: SHORTINT; (* 1 or 2 *)
  END;

PROCEDURE Init*(VAR eol: EndOfLine);
(* init to platform's eol format *)
BEGIN

(* the following is presented in C *)
eol.type = isLF;
#ifdef _MSDOS_
eol.type = isCRLF;
#endif
/* handle Mac here with ifdef too... */

Force(eol, eol.type);
END Init;

PROCEDURE Force*(VAR eol: EndOfLine; type: SHORTINT);
(* useful when channel-specific eol handling is required *)
BEGIN
(* the following is presented in C *)
eol.type = type;
switch(eol.type) {
	default:
	case isLF:	eol.prefix = 0X;
		eol.eol = Ascii.lf;
		eol.length = 1;
		break;

	case isCRLF:	eol.prefix = Ascii.cr;
		eol.eol = Ascii.lf;
		eol.length = 2;
		break;

	case isCR:	eol.prefix = 0X;
		eol.eol = Ascii.cr;
		eol.length = 1;
		break;
}
END Force;


END TextEOL;
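
For reference, the C side of such a module might look roughly like the sketch below. It is only an illustration under assumptions: the TextEOL_ prefixed names and the struct layout are hypothetical, the platform macros tested (_WIN32, __MSDOS__, _MSDOS) depend on the compiler in use, and the Mac case is left out, as in the outline above. Unlike the outline, an unknown type is normalized to isLF and no longer stored as given.

```c
/* Hypothetical C counterpart of TextEOL; names mirror the proposal. */
enum { isLF = 0, isCRLF = 1, isCR = 2 };

typedef struct {
    int  type;    /* one of the isXXX constants */
    char prefix;  /* leading eol char; meaningful only if length == 2 */
    char eol;     /* final or only eol char */
    int  length;  /* 1 or 2 */
} EndOfLine;

/* Set the eol format explicitly, e.g. from channel-specific information. */
void TextEOL_Force(EndOfLine *e, int type)
{
    switch (type) {
    case isCRLF:
        e->type = isCRLF; e->prefix = '\r'; e->eol = '\n'; e->length = 2;
        break;
    case isCR:
        e->type = isCR; e->prefix = '\0'; e->eol = '\r'; e->length = 1;
        break;
    default: /* isLF, and any unknown value falls back to LF */
        e->type = isLF; e->prefix = '\0'; e->eol = '\n'; e->length = 1;
        break;
    }
}

/* Initialize to the platform's native eol format. */
void TextEOL_Init(EndOfLine *e)
{
#if defined(_WIN32) || defined(__MSDOS__) || defined(_MSDOS)
    TextEOL_Force(e, isCRLF);
#else
    TextEOL_Force(e, isLF);
#endif
}
```

A caller would run TextEOL_Init once per reader or writer and call TextEOL_Force afterwards only when a channel is known to use a different format.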

2.  Then change TextRider to use the new eol type

 IMPORT TextEOL, ...

  Reader* = POINTER TO ReaderDesc;
  ReaderDesc* = RECORD
	eol-: TextEOL.EndOfLine;	
	...
  END;

  Writer* = POINTER TO WriterDesc;
  WriterDesc* = RECORD
	eol-: TextEOL.EndOfLine;	
	...
  END;

 and ConnectReader would do

    r: Reader; t: Channel.Reader;
  BEGIN
    t:=ch.NewReader();
    IF t=NIL THEN RETURN NIL END;
    NEW(r);
    TextEOL.Init(r.eol);   (* <==== added *)
    ...

  ConnectWriter would do the same...

 Note: after calling Init, my portable text i/o library version of
TextRider could call Force based on channel-specific information.

 Then TextRider would use the eol object in several places.
a) ReadLine would now end:

	IF (cnt > 0) & (r.eol.length = 2) & (s[cnt-1] = r.eol.prefix) THEN
		DEC(cnt);
	END;

    s[cnt]:=0X (* terminate string *)
  END ReadLine;
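
To show the whole read path and not only its tail, here is a small C sketch of the same idea working on an in-memory buffer: scan up to the final eol char, consume it, then drop a trailing prefix when the eol is two characters long. The function read_line and its parameter list are hypothetical and not part of TextRider. The sketch assumes cap >= 1, and when a line is truncated to fit dst, the prefix check applies to the last character actually stored.

```c
#include <stddef.h>

/* Hypothetical sketch: copy one line from src[*pos..len) into dst
   (capacity cap, cap >= 1) and return its length without the eol. */
size_t read_line(const char *src, size_t len, size_t *pos,
                 char *dst, size_t cap,
                 int eol_length, char prefix, char eol)
{
    size_t cnt = 0;

    /* copy up to, but not including, the final eol char */
    while (*pos < len && src[*pos] != eol) {
        if (cnt + 1 < cap) {
            dst[cnt++] = src[*pos];
        }
        (*pos)++;
    }
    if (*pos < len) {
        (*pos)++;  /* consume the final eol char */
    }

    /* same test as at the end of ReadLine: drop the leading eol char */
    if (cnt > 0 && eol_length == 2 && dst[cnt - 1] == prefix) {
        cnt--;
    }
    dst[cnt] = '\0';  /* terminate string */
    return cnt;
}
```

With a one-character eol the prefix test is skipped, so a CR inside an LF-terminated line is kept as data.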

b) WriteLn becomes
PROCEDURE (w: Writer) WriteLn*, NEW;
(* Write a newline *)
  BEGIN
	IF w.eol.length = 2 THEN
	   w.WriteChar(w.eol.prefix);
	END;
    w.WriteChar(w.eol.eol)
  END WriteLn;

c) I haven't added the UngetChar() eol handling.  It'll be a bit ticklish,
but better done here in TextRider than in Channel, which is more position
sensitive.

3.  Ctrl-Z handling for DOS is not, IMO, required.  It hasn't been commonly
used since about DOS 2 or DOS.  Certainly none of the Windows code I've
written in the last 10 years ever used it.

I hope this is not overkill.  It should cover the bases.

--Ian