[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Sets: semantics and implementation (was: Re: ADT Lib (4))



Hi all!

At 7:04 PM -0400 6/5/98, Michael Griebling wrote:

>1) A "Set" module which represents the mathematical concept of a set
>    with an arbitrary number of unique elements.  Something like this is
>useful in
>    compiler register allocation algorithms and other places.  Every
>    Ada ADT has one of these.  Sets contain unique elements of
>    something.  In Oberon-2 we're probably limited to integers although
>    Ada allows enumerated types.

   I have recently worked on an implementation for the Sets abstraction and
there are a number of issues to consider:

   - Is the set sparse or dense?
   - What happens if you insert and element that is already there?
   - Is there some kind of "key" associated with the elements or do you use
     "object identity" all the way?
   - What do objects to be put into a set have to support?

   The Sets abstraction I have implemented is targetted at a "insert all
elements, freeze, make many (many!) queries" approach which explains my
particular choices which are:

   - Values are INTEGERs (actually LONGINTs) that are sparse (potentially very
     sparse since they can be addresses in the Lagoona runtime system :-).
   - If a different element (object identity) with the same key is inserted,
     this results in a trap.
   - Elements (actually keys) need to support "equal" and "less than".

   The implementation is based on an open array that is resized whenever
needed using a two-thirds fill heuristic (if out of space -> double size,
if less than 2/3 is used -> shrink size).

   Sets are initialized and filled with elements. As long as they are not
frozen, no other operation is legal. Freezing actually sorts the array. Now
you can query set membership of elements (binary search) and perform all
the set operations. Set operations actually produce new sets (eg. s3 :=
s1.Add (s2)) which causes some excess garbage to be generated, but is
otherwise very flexible and efficient: all usual operations take
O(|s1|+|s2|) time and space in the worst case (eg. union, difference,
subset).

   I currently think about generalizing the implementation to support
arbitrary keys that can be compared. I will try to produce an
implementation like this for our ADT library. Stay tuned for first attempts
at an interface that integrates nicely with the current proposals.

   I would also like to do the Hashtable (since I just *love* Hashtables)
but I think I don't have enough time for this (sad, sad, sad).

   Anyway, time for me to go to bed soon. Good night! :-)

PS: Keep those comments rolling in if you have some. I'd be interested what
you think about this approach to implementing sets. I've never read about
it in any book on algorithms (maybe because its so very simple?), so...

By(T)e...        /  Department of Information and Computer Science  \
        Peter... \ University of California - Irvine, CA 92697-3425 /