XML Parser for OOC
I uploaded the first version of an XML parser for OOC to
http://pweb.de.uu.net/vanacken.do/ooc/libxml-19990808.tar.gz
The file is approx. 70K in size. The test suite weighs 500K and can
be found at
http://pweb.de.uu.net/vanacken.do/ooc/libxml-test-19990808.tar.gz
Here is the README:
This is a non-validating XML parser for OOC. It should conform to the
XML 1.0 specification in all aspects. The package requires oo2c-1.4.6.
From the beginning, this parser was intended to be validating as well.
Due to lack of time on my part, this feature will be included later.
The module XML:EntityResolver does not know about the full scope of
URIs yet. This is also something that needs to be addressed in the
future. As it is, the parser can only deal with plain file names in
system ids.
I used this library as a test-bed to evaluate hierarchical module
names. This means that the module names look like `XML:Builder',
which tells the compiler to look for file `Builder.Mod' in the
subdirectory `XML'. Likewise, `XML:Codec:UTF8' is mapped to
`XML/Codec/UTF8.Mod'.
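
The mapping from module names to files described above can be sketched
in a few lines. This is only an illustration of the naming rule, written
in Python rather than Oberon-2; the function name `module_to_path' is
hypothetical, and the compiler's actual lookup (search paths and so on)
is not modelled here.

```python
from pathlib import PurePosixPath


def module_to_path(module_name: str) -> str:
    """Map a hierarchical module name to a relative file path.

    Hypothetical helper: every component but the last becomes a
    directory, and the last component becomes `<name>.Mod`.
    """
    parts = module_name.split(":")
    if not all(parts):
        # Reject empty components such as "XML::Builder" or ":XML".
        raise ValueError(f"malformed module name: {module_name!r}")
    *dirs, leaf = parts
    return str(PurePosixPath(*dirs, leaf + ".Mod"))


if __name__ == "__main__":
    for name in ("XML:Builder", "XML:Codec:UTF8", "XMLTest"):
        print(name, "->", module_to_path(name))
```

A name without any colon, such as `XMLTest', simply maps to a file in
the top-level directory.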
Installation is simple: Unpack the tar file and do
  ./configure
  make
  make install
If you want to use this library, you should inspect the sample
application `XMLTest'. Basically, you write yourself a concrete
implementation of `XML:Builder', get yourself a parser with
`XML:Parser.New', and call `ParseDocument' on the parser.
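
For readers unfamiliar with this callback style, here is a rough analogy
using Python's standard xml.sax module. It is not the OOC API: the
handler class plays the role of a concrete `XML:Builder', and
xml.sax.parseString stands in for creating a parser and calling
`ParseDocument'. The names `ElementCollector' and
`collect_element_names' are made up for this sketch.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Concrete handler: the parser calls back into it while parsing."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called by the parser each time it has read a start tag.
        self.names.append(name)


def collect_element_names(document: bytes) -> list:
    """Parse `document` and return element names in document order."""
    handler = ElementCollector()
    xml.sax.parseString(document, handler)
    return handler.names


if __name__ == "__main__":
    print(collect_element_names(b"<a><b/><c/></a>"))
```

The division of labour is the same as in the library: the parser owns
the traversal of the document, and the client only supplies the object
that receives the events.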
Attached is an overview of the modules, and a list of porting issues.
-- Michael van Acken <acken@informatik.uni-kl.de>
------------------------------------------------------------------------
List of modules:
XML/Codec.Mod
XML/Codec/*.Mod
XML:Codec defines an abstract class for converting from or to
Unicode encoding. The modules XML:Codec:* implement the encoding
schemes. Currently available: US-ASCII, ISO-8859-1, UTF-8, and
UTF-16.
XML/Config.Mod
Defines some (very) basic types for the library.
XML/Msg.Mod
XML/MsgBoard.Mod
XML/ErrBoard.Mod
These modules comprise the parser's error reporting facilities.
XML:Msg defines the concept of messages, message attributes, and a
way to convert messages to text. XML:MsgBoard implements a list of
messages. XML:ErrBoard contains helper functions to create and
write error messages.
XML/EntityResolver.Mod
The class `Resolver' defined in this module is responsible for URI
management. Currently, the module only knows about plain file names.
XML/SymbolTable.Mod
XML/Stream.Mod
XML/Scanner.Mod
XML/Parser.Mod
The parser itself, distributed over several modules. The module
XML:Parser exports a constructor function `New', and a method
`ParseDocument'. Check `XMLTest.Mod' for an example of how they are
used.
XML/Builder.Mod
XML/Locator.Mod
XML/CanonicalUTF8.Mod
The class XML:Builder defines the construction interface of the
parser to its clients. The parser calls methods defined in this
module whenever it has parsed an XML entity. XML:CanonicalUTF8 is an
example implementation of a builder. It constructs a canonical
UTF-8 encoded XML document. Module XML:Locator defines the
mechanism to associate XML entities with documents and document
positions.
XMLTest.Mod
A simple XML parser. It reads an XML document and writes its
canonical representation to stdout, encoded in UTF-8. Options:
  -i  only parse internal subset
  -p  include document positions of entities in output
  -v  validate (currently not supported)
------------------------------------------------------------------------
List of non-standard compiler features used by this library:
o hierarchical module names (remove with "oocn --strip-mident")
o abstract classes (remove with "oocn --strip-system-flags")
o pragmas (remove with "oocn --strip-pragmas")
o Unicode characters (read: the library uses type LONGCHAR)
The first three points are non-issues when porting this library to
another Oberon-2 compiler. You can get standard Oberon-2 code simply
by running "make convert-to-std-o2", which will write modified module
files into the subdirectory `obj'. LONGCHAR will probably be a major
obstacle for any porting efforts.