
Re: ADT Lib: Persistence



Your sketch of the persistence mechanism looks pretty good to me. Here are
a couple of additional points that might be useful...

>PROCEDURE ReadObject* (r: BinaryRider.Reader; VAR obj: Object);
>  VAR
>    mark: LONGINT;
>    name: ARRAY 256(*should be unlimited*) OF CHAR;
>    module: Kernel.Module;
>    type: Types.Type;
>  BEGIN
>    r. ReadNum (mark);
>    IF (mark <= 0) THEN
>      obj := nodeTab[mark]
>    ELSE
>      r. ReadString (name);
>      module := Kernel.modules;
>      WHILE (module. name^ # name) DO
>        module := module. next
>      END;
>      (* for now: trap if `name' does not exist *)
>      r. ReadString (name);
>      type := Types.This (module, name);
>      (* for now: trap if `name' does not exist *)

It may be worth considering something similar to the Blackbox "Alien"
mechanism. When Blackbox finds an object for which it has no associated
module, it loads the object as type "Alien". Alien objects can be saved
again but cannot be otherwise interpreted. Since OOC does not have a
dynamic loading mechanism, it is quite possible that a program may
encounter objects created by modules that have not been linked into its
binary.
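To make the Alien idea concrete, here is a minimal sketch. It is in Python rather than Oberon-2 so that it runs standalone. The record layout (a 2-byte name length, the type name, a 4-byte data length, then the data), the `REGISTRY` table and the `Point`/`Alien` classes are all hypothetical. The only point is that an unregistered type name yields an `Alien` holding the raw bytes, which can be written back unchanged. For simplicity, each object's data is buffered in memory to learn its length.

```python
import io
import struct

# Hypothetical registry: qualified type name -> class that can load it.
REGISTRY = {}

def register(type_name):
    def deco(cls):
        REGISTRY[type_name] = cls
        return cls
    return deco

class Alien:
    """Stands in for an object whose module is not linked into this program."""
    def __init__(self, type_name, data):
        self.type_name = type_name
        self.data = data  # raw bytes, never interpreted

    def store(self, out):
        out.write(self.data)  # an Alien can only be written back verbatim

@register("Demo.Point")
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def load(cls, data):
        return cls(*struct.unpack("<ii", data))

    def store(self, out):
        out.write(struct.pack("<ii", self.x, self.y))

def write_object(out, type_name, obj):
    body = io.BytesIO()
    obj.store(body)                      # buffer the data to learn its length
    data = body.getvalue()
    name = type_name.encode("utf-8")
    out.write(struct.pack("<H", len(name)) + name)
    out.write(struct.pack("<I", len(data)) + data)

def read_object(inp):
    (n,) = struct.unpack("<H", inp.read(2))
    type_name = inp.read(n).decode("utf-8")
    (size,) = struct.unpack("<I", inp.read(4))
    data = inp.read(size)
    cls = REGISTRY.get(type_name)
    if cls is None:                      # unknown type: keep it as an Alien
        return Alien(type_name, data)
    return cls.load(data)
```

Reading a file that mixes a known `Demo.Point` with an `Other.Thing` from some other program gives a `Point` and an `Alien`. Writing both back reproduces the original bytes.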

Inside an Alien object is a copy of its type name and some reference to the
position and extent of the object in the original file. For this to work,
the length of the object's data must be stored at the start of each
record. Since the length is not known in advance, WriteObject should leave a
gap after the type header and fill it in after the object's Store procedure
has returned. Note that this also means that the underlying Channel must be
positionable (is this too strong an assumption?). Blackbox also provides a
method TurnIntoAlien on Stores.Reader, which converts a partially
internalised object into an Alien. A Load procedure can use this if it
finds that a critical component is Alien (for example, the Model of a
View).
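Here is a sketch of the gap-and-backpatch step, again in Python and with a hypothetical layout (2-byte name length, name, 4-byte data length). `write_with_length` and `skip_object` are illustrative names, not OOC or Blackbox procedures. `io.BytesIO` stands in for a positionable Channel, and `skip_object` returns the position and extent that an Alien would record.

```python
import io
import struct

def write_with_length(out, type_name, store):
    """Write a type header, a 4-byte length gap, then the object's data,
    and finally seek back to fill in the gap. Requires a seekable stream."""
    name = type_name.encode("utf-8")
    out.write(struct.pack("<H", len(name)) + name)
    gap = out.tell()                    # remember where the length goes
    out.write(b"\x00\x00\x00\x00")      # placeholder: length not yet known
    start = out.tell()
    store(out)                          # the object's Store procedure
    end = out.tell()
    out.seek(gap)
    out.write(struct.pack("<I", end - start))  # backpatch the real length
    out.seek(end)                       # continue writing after the object

def skip_object(inp):
    """Return (type_name, position, length) without interpreting the data."""
    (n,) = struct.unpack("<H", inp.read(2))
    type_name = inp.read(n).decode("utf-8")
    (length,) = struct.unpack("<I", inp.read(4))
    pos = inp.tell()
    inp.seek(pos + length)              # jump over the uninterpreted data
    return type_name, pos, length
```

The writer never needs to know the length up front. It only needs `tell` and `seek`, which is exactly the positionability requirement mentioned above.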

There is a further issue if we want to compress the type names. Blackbox
only ever writes a type name once per store (the first time it is used).
Thereafter it substitutes a more compact (presumably integer) identifier.
This is fairly easy to do, and a procedure is outlined in the ETH report
that you mention. Without compression, the type names may well be much
larger than the objects themselves, which wastes time and space in encoding
and decoding. However, the situation gets complicated if a type name _first_
appears inside an Alien object. You can't simply skip the contents of an
Alien object, since you might miss a type-name-to-identifier mapping that
will be needed later. The implication is that the components of Aliens must
be interpretable, even if the object itself is of unknown format. Again,
this is not too hard to achieve: after each type header, you write the
offset to the next type header (this effectively gives a linked list of all
the objects in a file). When you encounter an Alien object, you actually
need to load its components (you can't simply copy them), since the
type-name-to-identifier mappings may be different the next time the object
is written. This would be caused _not_ by the object itself, but by other
objects that may have been written earlier in the same writing process.
Parts of the Alien that occur between its components can be copied
verbatim.
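The type-name compression could look roughly like the following Python sketch. The encoding is hypothetical: a 4-byte id, where 0 means a full name follows and k > 0 refers to the k-th name already seen. The sketch also shows why a reader cannot skip headers. The id table is rebuilt only by reading every header in order, including those nested inside an Alien.

```python
import io
import struct

class TypeNameWriter:
    """Writes each type name in full only the first time it appears in a
    store; later occurrences use a small integer id (ids start at 1)."""
    def __init__(self, out):
        self.out = out
        self.ids = {}

    def write_type(self, name):
        tid = self.ids.get(name)
        if tid is not None:
            self.out.write(struct.pack("<I", tid))       # compact reference
        else:
            self.ids[name] = len(self.ids) + 1
            raw = name.encode("utf-8")
            self.out.write(struct.pack("<IH", 0, len(raw)) + raw)  # 0 = new name

class TypeNameReader:
    """Rebuilds the same id -> name table; headers must be read in order."""
    def __init__(self, inp):
        self.inp = inp
        self.names = []

    def read_type(self):
        (tid,) = struct.unpack("<I", self.inp.read(4))
        if tid == 0:
            (n,) = struct.unpack("<H", self.inp.read(2))
            name = self.inp.read(n).decode("utf-8")
            self.names.append(name)
            return name
        return self.names[tid - 1]   # fails if an earlier header was skipped
```

Because ids are assigned in order of first use within one store, the same object can receive different ids in a different store. This is why an Alien's components have to be re-externalized rather than copied.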

You can guess what Blackbox is doing from the following definition in
module Stores:

  AlienComp = POINTER TO LIMITED RECORD
    next-: AlienComp
  END;

  AlienPiece = POINTER TO LIMITED RECORD (AlienComp)
    pos-, len-: INTEGER
  END;

  AlienPart = POINTER TO LIMITED RECORD (AlienComp)
    store-: Store
  END;

  Alien = POINTER TO LIMITED RECORD (Store)
    path-: TypePath;
    cause-: INTEGER;
    file-: Files.File;
    comps-: AlienComp
  END;
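Re-writing an Alien along the lines of these definitions might look like the following. This is a Python sketch, not Blackbox code: a list replaces the `next` chain, `length` replaces `len`, and `write_component` is a hypothetical callback for whatever routine externalizes a loaded store. Routing components through that callback lets the current writer assign their type ids.

```python
import io
from dataclasses import dataclass
from typing import List, Union

@dataclass
class AlienPiece:
    pos: int      # offset of an uninterpreted byte run in the original file
    length: int

@dataclass
class AlienPart:
    store: object  # a component that was loaded (it may itself be an Alien)

@dataclass
class Alien:
    path: str                 # type name that could not be resolved
    file: io.BytesIO          # the original file the pieces refer to
    comps: List[Union[AlienPiece, AlienPart]]

def write_alien_body(out, alien, write_component):
    for comp in alien.comps:
        if isinstance(comp, AlienPiece):
            alien.file.seek(comp.pos)
            out.write(alien.file.read(comp.length))  # opaque bytes: copy verbatim
        else:
            write_component(out, comp.store)         # re-externalize the component
```

For example, an Alien whose original bytes are `HEADxxTAIL`, with the two `x` bytes standing for an embedded component, is rewritten as the two verbatim pieces around a freshly written component.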

>3. The algorithm requires that every instance of `Object' has a field
>`mark'.  Alternative: like reading, writing could use a object table
>and do table lookups to determine the marker.  This would keep
>`Object' small, but takes longer because of the table lookups.

Hmmm. I always wondered how this was supposed to work. Thanks for the
reference to the ETH report. The problem as I saw it was that a mark field
will always (?) be initialised to zero by the run-time when an object is
created. Thus, you can write an object structure once, but not a second
time, since all of its objects are already marked. The only way I could see
of clearing the mark fields was to do a "fake" WriteObject (with a dummy
BinaryWriter) that clears the marks before doing the "real" WriteObject.
There is no other formal way for an object to enumerate all its components.
I guess the other alternative is to use a "virtual clock" as described in
Josef Templ's article. Granted, the object-table approach does not suffer
from this problem. I still vote for the "mark fields" method, because I
would like to see the whole procedure be as fast as possible.
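For comparison, here is one reading of the "virtual clock" idea. Each store ticks a global counter, and an object counts as marked only if its stamp equals the current tick. Stale marks from earlier stores therefore never need clearing. This is my own Python sketch, and Templ's article may differ in detail. The `Obj` class, `_clock` and the record tuples are hypothetical. The approach costs one extra field per object but needs no table lookups.

```python
import itertools

class Obj:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.stamp = 0   # clock value of the last store that visited this object
        self.mark = 0    # index assigned during that store

_clock = itertools.count(1)  # hypothetical global "virtual clock"

def store(root):
    """Return ("new", name, index) the first time an object is met in this
    store and ("ref", index) afterwards. Marks left by earlier stores are
    ignored because their stamp differs, so nothing has to be cleared."""
    now = next(_clock)
    records = []
    counter = 0

    def visit(obj):
        nonlocal counter
        if obj.stamp == now:                 # already written in this store
            records.append(("ref", obj.mark))
            return
        counter += 1
        obj.stamp, obj.mark = now, counter
        records.append(("new", obj.name, counter))
        for child in obj.children:
            visit(child)

    visit(root)
    return records
```

Storing the same shared structure twice produces identical output, with no "fake" clearing pass in between. Cycles are handled by the same check.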

- Stewart