Re: ADT Lib: Persistence



>> When encountering an Alien object, you actually need to load its
>> components (you can't simply copy them) since the type-name-to-identifier
>> mappings may be different next time the object is written. This would be
>> a result _not_ of the object itself, but of other objects that may have
>> previously been written during the whole writing process. Parts of the
>> Alien that occur between its components can be copied verbatim.
>
>When you say "components", do you mean pointers (references to other
>objects) embedded in the alien?  If this is the case, this would
>answer my question from above.  

Yes. I do mean references. Sorry for the confusion. I forgot to mention the
most important reason for having to interpret objects contained within
Aliens. This is the fact that they may also be referenced from elsewhere in
the file, so you really do have to load them.

>Is it correct, that internalizing an alien object means to store in
>memory a sequence of bytes, with pointers interspersed somewhere in
>between?  And externalizing the alien means to store the byte
>sequences "as is", but rewrite the interspersed pointers to fit the
>newly written file?

This is my idea of how it should work. I have no idea whether Blackbox
actually implements it as I have described it. Rephrasing the Blackbox
definition in OOC terms, we would have the following. An Alien is
represented as a piece chain. Elements of the chain are either literal data
(AlienPiece) or a pointer to an Object (AlienPart).

  (* base type for Alien components *)
  AlienComp = POINTER TO AlienCompDesc;
  AlienCompDesc = RECORD
    next-: AlienComp
  END;

  (* data from alien object copied verbatim from source file *)
  AlienPiece = POINTER TO AlienPieceDesc;
  AlienPieceDesc = RECORD (AlienCompDesc) (* base must be the record type *)
    data-: POINTER TO ARRAY OF SYSTEM.BYTE; (* or similar *)
  END;

  (* object referenced by Alien object *)
  AlienPart = POINTER TO AlienPartDesc;
  AlienPartDesc = RECORD (AlienCompDesc)
    object-: Object
  END;

  Alien = POINTER TO AlienDesc;
  AlienDesc = RECORD (StdTypes.Object)
    path-: TypePath;
    cause-: INTEGER;
    comps-: AlienComp
  END;
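
To make the externalizing step concrete, here is a rough sketch in Python
(not Oberon-2, and not taken from Blackbox or OOC). It assumes a reference is
written as a 4-byte little-endian identifier, and that the writer hands out
identifiers in order of first encounter. IdTable and externalize_alien are
hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlienPiece:
    """Literal bytes copied verbatim from the source file."""
    data: bytes


@dataclass
class AlienPart:
    """Reference to another object that was loaded normally."""
    obj: object


class IdTable:
    """Hypothetical per-write table: objects get ids in order of first use."""

    def __init__(self):
        self._ids = {}

    def __call__(self, obj):
        # Keyed on object identity; objects must stay alive during the write.
        return self._ids.setdefault(id(obj), len(self._ids) + 1)


def externalize_alien(comps, ref_id_of):
    """Write pieces unchanged; re-encode each reference for this write."""
    out = bytearray()
    for comp in comps:
        if isinstance(comp, AlienPiece):
            out += comp.data
        else:
            # Assumed encoding: 4-byte little-endian identifier.
            out += ref_id_of(comp.obj).to_bytes(4, "little")
    return bytes(out)
```

The same chain can produce different bytes for its references on different
writes, depending on what was written before it, while the literal pieces
never change.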

>Note that I am talking about copying the parts into memory, instead of
>just keeping track of file intervals.  After all, a file may disappear
>after it has been internalized.

That would make sense.

>I don't know what a `Store' is for Blackbox, and how it maps to `Files
>a la OOC'.
>
>>   Alien = POINTER TO LIMITED RECORD (Store)
>>   END;

A Blackbox Stores.Store is equivalent to a StdTypes.Object. In Blackbox,
Stores are accessed via their own special riders: Stores.Reader and
Stores.Writer. In most respects they work like BinaryReaders and
BinaryWriters, except that they always write their data in a
platform-independent byte order. The functions equivalent to
StdTypes.ReadObject and StdTypes.WriteObject are type-bound to riders in
Blackbox (i.e. Reader.ReadStore and Writer.WriteStore). The Load and Store
procedures of StdTypes.Object correspond to the Internalize and Externalize
procedures of a Blackbox Store.

Special features provided by Stores.Reader are:
- PROCEDURE (VAR rd : Reader) TurnIntoAlien (cause : INTEGER)
  This causes the store currently being internalized to become an Alien.
- PROCEDURE (VAR rd : Reader) ReadVersion (min, max : INTEGER;
                                           OUT version : INTEGER)
  This reads a version identifier and cancels the current Internalize if it
  is not between <min> and <max>.
- a boolean variable "cancelled", which indicates that an Internalize has
  been cancelled. A smart Internalize procedure will periodically check its
  reader's cancelled status to see if it should exit without completing the
  entire process. Normally this is caused by (1) an inappropriate version
  number, or (2) the Store being turned into an Alien.
- a boolean variable "readAlien", which is true if an Alien has been read
  since the Reader was connected to a File.

Special functions provided by Stores.Writer are:
- PROCEDURE (VAR wr : Writer) WriteVersion (version : INTEGER)
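
The version check and the "cancelled" flag can be sketched roughly as
follows. This is Python, not Oberon-2, and the Reader class is a toy stand-in,
not the Blackbox API. It assumes a one-byte version and 4-byte little-endian
integers. load_base and load_derived are hypothetical and only show how a
cancel in an inner load reaches the outer one.

```python
import io
import struct


class Reader:
    """Toy reader carrying a cancelled flag (not the Blackbox API)."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.cancelled = False

    def read_int(self):
        # Assumed encoding: 4-byte little-endian signed integer.
        return struct.unpack("<i", self._f.read(4))[0]

    def read_version(self, lo, hi):
        # Assumed encoding: a single version byte.
        version = self._f.read(1)[0]
        if not lo <= version <= hi:
            self.cancelled = True  # the caller is expected to stop reading
        return version


def load_base(rd, obj):
    rd.read_version(0, 0)
    if rd.cancelled:
        return
    obj["id"] = rd.read_int()


def load_derived(rd, obj):
    load_base(rd, obj)  # plays the role of the super-call
    if rd.cancelled:
        return
    rd.read_version(0, 1)
    if rd.cancelled:
        return
    obj["size"] = rd.read_int()
```

Because the flag lives in the reader, the outer load needs no extra
parameter or return value to find out that the inner one gave up.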

It probably also makes sense to have special riders in StdTypes. If you
want to support an efficient "cancel" for Load(), there needs to be some way
of signalling it. You could use a global variable which is set by
TurnIntoAlien and cleared when appropriate by ReadObject, but then you
might have problems later if you want to support threads. In simple
situations, a Load() can simply return after calling TurnIntoAlien.
However, if Load() is invoked via a super-call, there has to be a way of
notifying the caller that a "cancel" has occurred. In Blackbox, the process
looks something like:

TYPE
  ExamplePart = POINTER TO RECORD
    next : ExamplePart;
    x, y : INTEGER;
    part : Stores.Store;
  END;

  Example = POINTER TO RECORD (some_subtype_of_Stores.Store)
    nParts : INTEGER;
    parts : ExamplePart;  
  END;

PROCEDURE (e : Example) Internalize (VAR rd : Stores.Reader);
CONST
  min=0; max=0;
VAR 
  count, version : INTEGER;
  ep : ExamplePart;
BEGIN
  (* super-call may cancel; if it did, stop reading *)
  e.Internalize^(rd);
  IF rd.cancelled THEN RETURN END;

  (* read-version may cancel *)
  rd.ReadVersion(min,max,version);

  IF ~rd.cancelled THEN
    rd.ReadInt(count);
    WHILE (count > 0) & ~rd.cancelled DO
      NEW(ep);
      rd.ReadInt(ep.x);
      rd.ReadInt(ep.y);
      rd.ReadStore(ep.part);
      IF (ep.part = NIL) OR (ep.part IS Stores.Alien) THEN
        (* become Alien if a part is Alien *)
        rd.TurnIntoAlien(Stores.alienComponent)
      ELSE
        ep.next := e.parts;
        e.parts := ep;
        INC(e.nParts);
      END;
      DEC(count);
    END
  END
END Internalize;

The alternative to using a special rider would be to have a special context
object that is allocated locally in ReadStore and passed to Load(). E.g.:

PROCEDURE (e : Example) Load(VAR rd : BinaryReader; VAR c : StdTypes.Context);
...
BEGIN
  e.Load^(rd, c);
  IF ~c.cancelled THEN
    rd.ReadInt(count);
    ...
  END;
END Load;

Having to access the rider through a context is probably too cumbersome. 

>This could be better done the other way round: after writing the whole
>graph, the same algorithm that was used to set `mark' can be used to
>clear it again.  This would establish the invariant, that `mark' is 0
>during the entire lifetime of an object, except temporarily when
>writing a data structure.

Yes, that would also work.
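
A rough sketch of that mark-then-clear scheme, in Python rather than
Oberon-2. Node and write_graph are hypothetical names, and "writing" an
object is reduced to recording its name. It assumes `mark' doubles as the
identifier assigned during the write.

```python
class Node:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.mark = 0  # invariant: 0 except while a write is in progress


def write_graph(root):
    """Return node names in write order; all marks are 0 again on return."""
    order = []

    def set_marks(node):
        if node.mark != 0:
            return  # already written during this pass
        node.mark = len(order) + 1
        order.append(node.name)
        for ref in node.refs:
            set_marks(ref)

    def clear_marks(node):
        if node.mark == 0:
            return  # already cleared (or never reached)
        node.mark = 0
        for ref in node.refs:
            clear_marks(ref)

    set_marks(root)
    clear_marks(root)  # same traversal, restoring the invariant
    return order
```

Both passes stop at an object whose mark is already in the target state, so
cycles and shared objects are handled the same way when setting and when
clearing.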

>One question remains: Should we also tackle `Libraries' a la Gadgets?
>I have no experience with Gadgets whatsoever, do we have any Gadgets
>users on this list who can provide some hard information?
>
>I believe, a Gadget `Library' is a file containing a number of named,
>persistent objects.  References can be made to objects of other
>libraries by quoting the library name and the item name (similar to
>qualified import in O2).  The differences between `Library' and the
>persistence model discussed here, is primarily the ability to
>distribute persistent objects over multiple files.  Our "simple"
>persistence can only dump a whole data structure into a single file.

Sorry. I've read about Gadgets Libraries, but don't have any experience
with them.

I have a simple (i.e. naive) heap manager that allows a large number of
objects to be stored in and retrieved from a memory-mapped file. This is the
basis of a simple database kernel that runs under Blackbox. There is no fancy
naming scheme: each object is "named" by an integer identifier. This
remains constant over the life of the object, even if it is relocated
within the file (due to changes in the size of its stored representation).
Mapping an identifier to a virtual address is by simple table lookup (the
translation table is also stored in the memory-mapped file), so it is very
fast. Currently, the largest database I have built is somewhere over 1.2
million objects, but there is no reason why it can't go higher. The use of
memory-mapping may be a problem. Windows NT has MapViewOfFile, and many
Unix systems have mmap(). How widely available is mmap() under Unix?

This could probably be used as a basis for a 'library' system: A library
consists of a collection of StdTypes.Objects stored in a single file.
Internal references (within the library) are simply represented as
(integer) object identifiers. Cross references (between libraries) involve
storing an additional library name with the object identifier. The only
problem would be keeping track of which 'library' an object belongs to.
I'll have a think about how this should be done and post some more later...
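
As a first rough sketch of the idea (Python, not Oberon-2, and not my
existing heap manager): a library maps integer identifiers to locations
through a translation table, and a cross reference is a (library name,
identifier) pair. Library and resolve are hypothetical names. A bytearray
stands in for the memory-mapped file, and relocation simply appends the new
representation.

```python
class Library:
    """Objects named by integer ids, located through a translation table."""

    def __init__(self, name):
        self.name = name
        self.store = bytearray()  # stand-in for the memory-mapped file
        self.table = {}           # identifier -> (offset, length)
        self.next_id = 1

    def put(self, payload):
        oid = self.next_id
        self.next_id += 1
        self.update(oid, payload)
        return oid

    def update(self, oid, payload):
        # Relocate the object; its identifier stays the same.
        self.table[oid] = (len(self.store), len(payload))
        self.store += payload

    def get(self, oid):
        offset, length = self.table[oid]
        return bytes(self.store[offset:offset + length])


def resolve(ref, current, libraries):
    """An int is an internal reference; a (name, id) pair crosses libraries."""
    if isinstance(ref, int):
        return current.get(ref)
    lib_name, oid = ref
    return libraries[lib_name].get(oid)
```

This leaves open the question of how an object in memory knows which
library it belongs to.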

-Stewart