
XML Parser for OOC



I uploaded the first version of an XML parser for OOC to

  http://pweb.de.uu.net/vanacken.do/ooc/libxml-19990808.tar.gz

The file is approx. 70K in size.  The test suite weighs 500K and can
be found at

  http://pweb.de.uu.net/vanacken.do/ooc/libxml-test-19990808.tar.gz


Here is the README:

This is a non-validating XML parser for OOC.  It should conform to the
XML 1.0 specification in all aspects.  The package requires oo2c-1.4.6.

From the beginning, this parser was intended to be validating as well.
Due to lack of time on my part, this feature will be included later.
The module XML:EntityResolver does not know about the full scope of
URIs yet.  This is also something that needs to be addressed in the
future.  As it is, the parser can only deal with plain file names in
system ids.

I used this library as a test-bed to evaluate hierarchical module
names.  This means that the module names look like `XML:Builder',
which tells the compiler to look for file `Builder.Mod' in the
subdirectory `XML'.  Likewise, `XML:Codec:UTF8' is mapped to
`XML/Codec/UTF8.Mod'. 
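
The name-to-file mapping described above can be sketched in a few lines.
This is only an illustration in Python, not code from oo2c; the function
name `module_to_path' and the default suffix argument are assumptions
made for the example.

```python
from pathlib import PurePosixPath


def module_to_path(module_name: str, suffix: str = ".Mod") -> str:
    """Map a hierarchical module name to a relative file path.

    'XML:Codec:UTF8' -> 'XML/Codec/UTF8.Mod'
    """
    parts = module_name.split(":")
    if not all(parts):
        # Reject names such as '', 'XML:' or 'XML::Codec'.
        raise ValueError(f"empty component in module name: {module_name!r}")
    *dirs, base = parts
    # All components but the last name directories; the last names the file.
    return str(PurePosixPath(*dirs, base + suffix))


print(module_to_path("XML:Builder"))     # XML/Builder.Mod
print(module_to_path("XML:Codec:UTF8"))  # XML/Codec/UTF8.Mod
print(module_to_path("XMLTest"))         # XMLTest.Mod
```

A name without a colon, such as `XMLTest', maps to a file in the current
directory.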

Installation is simple: Unpack the tar file and do
  ./configure
  make
  make install

If you want to use this library, you should inspect the sample
application `XMLTest'.  Basically, you write yourself a concrete
implementation of `XML:Builder', get yourself a parser with
`XML:Parser.New', and call `ParseDocument' on the parser.
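
The control flow of that recipe (client supplies a builder, parser drives
it) might look roughly like the sketch below.  It is written in Python
purely to show the pattern.  Every name in it (`Builder', `ToyParser',
`start_element', and so on) is hypothetical and does not reflect the
actual interface of `XML:Builder' or `XML:Parser'; see `XMLTest.Mod' for
the real thing.  The toy parser only understands plain start tags, end
tags, and text.

```python
import re
from abc import ABC, abstractmethod


class Builder(ABC):
    """Hypothetical stand-in for the role XML:Builder plays:
    the parser calls these methods as it recognizes pieces of input."""

    @abstractmethod
    def start_element(self, name: str) -> None: ...

    @abstractmethod
    def end_element(self, name: str) -> None: ...

    @abstractmethod
    def characters(self, text: str) -> None: ...


class ToyParser:
    """NOT a conforming XML parser; it only handles <a>, </a>, and text."""

    _TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")

    def __init__(self, source: str, builder: Builder) -> None:
        self.source = source
        self.builder = builder

    def parse_document(self) -> None:
        pos = 0
        while pos < len(self.source):
            m = self._TOKEN.match(self.source, pos)
            if m is None:
                raise ValueError(f"unsupported input at offset {pos}")
            closing, name, text = m.groups()
            if text is not None:
                self.builder.characters(text)
            elif closing:
                self.builder.end_element(name)
            else:
                self.builder.start_element(name)
            pos = m.end()


class EchoBuilder(Builder):
    """Concrete builder that re-serializes the events it receives."""

    def __init__(self) -> None:
        self.out: list[str] = []

    def start_element(self, name: str) -> None:
        self.out.append(f"<{name}>")

    def end_element(self, name: str) -> None:
        self.out.append(f"</{name}>")

    def characters(self, text: str) -> None:
        self.out.append(text)


builder = EchoBuilder()
ToyParser("<doc>hello<b>world</b></doc>", builder).parse_document()
print("".join(builder.out))
```

The point of the split is that the parser knows nothing about what the
client does with the events; swapping in a different builder changes the
output without touching the parser.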

Attached is an overview of the modules, and a list of porting issues.

-- Michael van Acken <acken@informatik.uni-kl.de>


------------------------------------------------------------------------

List of modules:

XML/Codec.Mod
XML/Codec/*.Mod
  XML:Codec defines an abstract class for converting to and from
  Unicode.  The modules XML:Codec:* implement the encoding
  schemes.  Currently available: US-ASCII, ISO-8859-1, UTF-8, and
  UTF-16.
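
  As an illustration of the "abstract codec plus one implementation per
  encoding" layout, here is a small Python sketch.  The class and method
  names are assumptions for the example and are not taken from
  XML:Codec; the actual conversions are delegated to Python's standard
  `codecs' module.

```python
import codecs
from abc import ABC, abstractmethod


class Codec(ABC):
    """Abstract converter between an external byte encoding and Unicode."""

    @abstractmethod
    def decode(self, data: bytes) -> str: ...

    @abstractmethod
    def encode(self, text: str) -> bytes: ...


class NamedCodec(Codec):
    """Concrete codec backed by a codec from Python's standard registry."""

    def __init__(self, name: str) -> None:
        # codecs.lookup raises LookupError for unknown encoding names.
        self._info = codecs.lookup(name)

    def decode(self, data: bytes) -> str:
        text, _consumed = self._info.decode(data)
        return text

    def encode(self, text: str) -> bytes:
        data, _consumed = self._info.encode(text)
        return data


# The four encodings listed above, keyed by their usual names.
CODECS = {
    name: NamedCodec(name)
    for name in ("US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16")
}

for name, codec in CODECS.items():
    raw = codec.encode("abc")
    print(name, len(raw), codec.decode(raw))
```

  A parser built this way can pick the codec from the encoding
  declaration and work on Unicode characters internally.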

XML/Config.Mod
  Defines some (very) basic types for the library.

XML/Msg.Mod
XML/MsgBoard.Mod
XML/ErrBoard.Mod
  These modules comprise the parser's error reporting facilities.
  XML:Msg defines the concept of messages, message attributes, and a
  way to convert messages to text.  XML:MsgBoard implements a list of
  messages.  XML:ErrBoard contains helper functions to create and
  write error messages.
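
  The division of labour between the three modules could be pictured as
  follows.  This Python sketch is only a guess at the general shape; all
  names, fields, and the message format are hypothetical and are not
  taken from XML:Msg, XML:MsgBoard, or XML:ErrBoard.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """A message: a code, a text template, and named attributes."""
    code: int
    template: str
    attributes: dict = field(default_factory=dict)

    def to_text(self) -> str:
        # Fill the template's {placeholders} from the attributes.
        return self.template.format(**self.attributes)


class MsgBoard:
    """An ordered list of messages."""

    def __init__(self) -> None:
        self.messages: list[Msg] = []

    def add(self, msg: Msg) -> None:
        self.messages.append(msg)


def report_error(board: MsgBoard, code: int, template: str, **attributes) -> Msg:
    """Helper that creates an error message and posts it on the board."""
    msg = Msg(code, template, dict(attributes))
    board.add(msg)
    return msg


def write_errors(board: MsgBoard) -> list[str]:
    """Helper that renders every message on the board as one line of text."""
    return [f"error {m.code}: {m.to_text()}" for m in board.messages]


board = MsgBoard()
report_error(board, 1, "unexpected character {char} at line {line}",
             char="<", line=3)
for line in write_errors(board):
    print(line)
```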

XML/EntityResolver.Mod
  The class `Resolver' defined in this module is responsible for URI
  management.  Currently, the module only knows about plain file names.

XML/SymbolTable.Mod
XML/Stream.Mod
XML/Scanner.Mod
XML/Parser.Mod 
  The parser itself, distributed over several modules.  The module
  XML:Parser exports a constructor function `New', and a method
  `ParseDocument'.  Check `XMLTest.Mod' for an example of how they
  are used.

XML/Builder.Mod 
XML/Locator.Mod
XML/CanonicalUTF8.Mod
  The class XML:Builder defines the construction interface of the
  parser to its clients.  The parser calls methods defined in this
  module whenever it has parsed an XML entity.  XML:CanonicalUTF8 is an
  example implementation of a builder.  It constructs a canonical
  UTF-8 encoded XML document.  Module XML:Locator defines the
  mechanism to associate XML entities with documents and document
  positions.

XMLTest.Mod
  A simple XML parser.  It reads an XML document and writes its
  canonical representation to stdout, encoded in UTF-8.  Options:
    -i  only parse internal subset
    -p  include document positions of entities in output
    -v  validate (currently not supported)

------------------------------------------------------------------------

List of non-standard compiler features used by this library:

  o hierarchical module names (remove with "oocn --strip-mident")

  o abstract classes (remove with "oocn --strip-system-flags")

  o pragmas (remove with "oocn --strip-pragmas")

  o Unicode characters (read: the library uses type LONGCHAR)

The first three points are non-issues when porting this library to
another Oberon-2 compiler.  You can get standard Oberon-2 code simply
by running "make convert-to-std-o2", which will write modified module
files into the subdirectory `obj'.  LONGCHAR will probably be a major
obstacle for any porting efforts.