'Re: Thoughts on Records and Field Names'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       haskell
Subject:    Re: Thoughts on Records and Field Names
From:       dlb () severn ! wash ! inmet ! com (David Barton)
Date:       1995-11-21 16:38:09
Message-ID: 199511211638.RAA15777 () animal ! cs ! chalmers ! se
[Download RAW message or body]

Return-Path: haskell-list-request@dcs.gla.ac.uk
Resent-Message-Id: <199511211638.RAA15777@animal.cs.chalmers.se>
From: dlb@severn.wash.inmet.com (David Barton)
Subject: Re: Thoughts on Records and Field Names
Date: Tue, 21 Nov 1995 11:31:41 -0500 (EST)
To: haskell-dist@dcs.gla.ac.uk
Old-Resent-From: haskell-request@dcs.gla.ac.uk
Errors-To: haskell-request@dcs.gla.ac.uk
Approved: haskell@dcs.gla.ac.uk
Resent-Date:  Tue, 21 Nov 1995 16:33:51 +0000
Resent-From: kh@dcs.gla.ac.uk
Resent-To: haskell-list-dist@dcs.gla.ac.uk


After having posted on records, I decided to give them a try with a
real example.  So I constructed a balanced binary tree (given the
recent questions).  I decided to extend a bit into the "unionized"
record territory by at least marking the null tree with a constructor,
and see how things went.

I also decided to drive my implementation as far as possible towards
the "implementation independent" side of things.  Therefore, I wanted
to make all of my functions totally independent of the actual
implementation of the record, and only use functions provided by the
"record class" and the "field name classes".  The result was
interesting, and is recorded below.

A tree is either nil or a full record.  The record looks like:

        key: a;
        info: b;
        lchild: tree;
        rchild: tree;
        balance: Flag;

Here, I reproduce the fieldname class from my last post:

> class FieldName a f b where
>       set:: a -> f -> b -> a
>       get:: a -> f -> b

And the type of my balance flag

> data BalFlag = Balanced | Lheavy | Rheavy
> instance Eq BalFlag where
>   Balanced == Balanced = True
>   Lheavy == Lheavy = True
>   Rheavy == Rheavy = True
>   _ == _ = False

In addition to this, I will create classes for each of the fields in
the record:

> data Key = Key        deriving Text
> data Info = Info      deriving Text
> data Lchild = Lchild  deriving Text
> data Rchild = Rchild  deriving Text
> data Bflag = Bflag    deriving Text

And now we create a class for each field name in the record (all
pretty mechanical, so I don't care too much about the time for the
implementation):

> class FieldName a Key b => KeyField a b where
>    key:: a -> b
>    key d = get d Key
>
> class FieldName a Info b => InfoField a b where
>    info:: a -> b
>    info d = get d Info
>
> class FieldName a Lchild b => LchildField a b where
>    lchild:: a -> b
>    lchild d = get d Lchild
>
> class FieldName a Rchild b => RchildField a b where
>    rchild:: a -> b
>    rchild d = get d Rchild
>
> class FieldName a Bflag b => BflagField a b where
>    bflag:: a -> b
>    bflag d = get d Bflag
>

And the record class definition is:

> class (KeyField (c a b) a,
>        InfoField (c a b) b,
>        LchildField (c a b) (c a b),
>        RchildField (c a b) (c a b),
>        BflagField (c a b) BalFlag) => BalTree c a b where
>           nulTree:: c a b
>           isNulTree:: c a b -> Bool
>           isBalNode:: c a b -> Bool

This record definition dissatisfies me somewhat.  The presence of the
boolean functions "isNulTree" and "isBalNode" are perilously close to
actual data constructors.  The "nulTree" function to provide an
uninitialized record is also rather dissatisfying.  However, I cannot
really get along without them.

And now, the type which is our implementation of the tree:

> data BalTreeRec a b = NulTree | 
>                       BalNode a b (BalTreeRec a b) (BalTreeRec a b) BalFlag

Before I go into the field instances, I discover I need to create a
"zero" element for some of the fields.  This corresponds to the
"uninitialized" value that John Peterson has referred to in his papers
on structures.

> class HasZero a where
>   getZero:: a

And each of the field instances, which I intend to write with an Emacs
macro off an initial sample:

> instance (HasZero a, HasZero b) => FieldName (BalTreeRec a b) Key a where
>         set (NulTree) Key x = BalNode x getZero NulTree NulTree getZero
>         set (BalNode _ f1 f2 f3 f4) Key x = BalNode x f1 f2 f3 f4
>         get (BalNode x _  _  _  _ ) Key   = x
> instance (HasZero a, HasZero b) => KeyField (BalTreeRec a b) a
>
> instance (HasZero a, HasZero b) => FieldName (BalTreeRec a b) Info b where
>         set (NulTree) Info x = BalNode getZero x NulTree NulTree getZero
>         set (BalNode f1 _ f2 f3 f4) Info x = BalNode f1 x f2 f3 f4
>         get (BalNode _  x _  _  _ ) Info   = x
> instance (HasZero a, HasZero b) => InfoField (BalTreeRec a b) b
>
> instance (HasZero a, HasZero b) => 
>       FieldName (BalTreeRec a b) Lchild (BalTreeRec a b) where
>         set (NulTree) Lchild x = BalNode getZero getZero x NulTree getZero
>         set (BalNode f1 f2 _ f3 f4) Lchild x = BalNode f1 f2 x f3 f4
>         get (BalNode _  _  x _  _ ) Lchild   = x
> instance (HasZero a, HasZero b) => LchildField (BalTreeRec a b) 
>                                                (BalTreeRec a b)
>
> instance (HasZero a, HasZero b) => 
>       FieldName (BalTreeRec a b) Rchild (BalTreeRec a b) where
>         set (NulTree) Rchild x = BalNode getZero getZero NulTree x getZero
>         set (BalNode f1 f2 f3 _ f4) Rchild x = BalNode f1 f2 f3 x f4
>         get (BalNode _  _  _  x _ ) Rchild   = x
> instance (HasZero a, HasZero b) => RchildField (BalTreeRec a b)
>                                                (BalTreeRec a b)
>
> instance (HasZero a, HasZero b) => 
>       FieldName (BalTreeRec a b) Bflag BalFlag where
>         set (NulTree) Bflag x = BalNode getZero getZero NulTree NulTree x
>         set (BalNode f1 f2 f3 f4 _) Bflag x = BalNode f1 f2 f3 f4 x
>         get (BalNode _  _  _  _  x) Bflag   = x
> instance (HasZero a, HasZero b) => BflagField (BalTreeRec a b) BalFlag
>
> instance BalTree BalTreeRec a b where
>          nulTree = NulTree
>          isNulTree (NulTree) = True
>          isNulTree (BalNode _ _ _ _ _) = False
>          isBalNode (NulTree) = False
>          isBalNode (BalNode _ _ _ _ _) = True

First of all, note the instance declaration for each of the "-Field"
classes.  I left those out of my original post, to my chagrin.
Apologies to all.

Second, I am again dissatisfied with the necessity for the "set"
function with the "NulTree" data constructor.  It is harder to turn
this out "mechanically", as it were.  The writer has to know that this
is going to be the null case, and "do the right thing".  I really
dislike that necessity.

And now, having implemented our record (a lengthy, but not overly
time-consuming process), we are ready to begin actually implementing
the functions that go along with it.

First of all, we need an insert function (we allow duplicates in our
tree):

> insTree:: (BalTree t a b, Ord a) => t a b -> a -> b -> (t a b)
> insTree t k x = if isNulTree t then nt1
>                 else
>                    if k <= (key t) then lt1 else rt1
>         where nt1 = set nt2 Key k
>               nt2 = set nt3 Info x
>               nt3 = set nt4 Lchild (asTypeOf nulTree t)
>               nt4 = set nt5 Rchild (asTypeOf nulTree t)
>               nt5 = set nulTree Bflag Balanced
>               lt1 = ltilt lt2
>               lt2 = set t Lchild (insTree lc k x)
>               lc  = lchild t
>               rt1 = rtilt rt2
>               rt2 = set t Rchild (insTree rc k x)
>               rc  = rchild t

The need for the "asTypeOf" functions for "nulTree" again dissatisfies
me.  It is made necessary by the implementation independence of the
setup (without it, there is nothing to prevent the "nulTree" function
from returning a *different* implementation from that of t).

The repeated calls to "set" in the construction of the first record
strike me as really turgid and unnecessary.  I would much prefer to
simply construct the record.  It is therefore not clear to me that
this level of implementation independence is worth the effort.

> 
> ltilt:: (BalTree t a b, Ord a) => t a b -> t a b
> ltilt t = if ((bflag t) == Rheavy) then (set t Bflag Balanced)
>           else if ((bflag t) == Balanced) then (set t Bflag Lheavy)
>                else ladjust t
>
> ladjust:: (BalTree t a b, Ord a) => t a b -> t a b
> ladjust t = nt
>             where nt  = set nt1 Bflag Balanced
>                   nt1 = set nt2 Rchild (insTree (rchild t) (key t) (info t))
>                   nt2 = set r1 Lchild r2
>                   (r1, r2) = rmostChild (lchild t)
>
> rmostChild:: (BalTree t a b, Ord a) => t a b -> (t a b, t a b)
> rmostChild t = if (isNulTree (asTypeOf (rchild t) t)) then (t, (lchild t))
>                else (r1, (ltilt (set t Rchild r2)))
>            where (r1, r2) = rmostChild (rchild t)
>
> rtilt:: (BalTree t a b, Ord a) => t a b -> t a b
> rtilt t = if ((bflag t) == Lheavy) then (set t Bflag Balanced)
>           else if ((bflag t) == Balanced) then (set t Bflag Rheavy)
>                else radjust t
>
> radjust:: (BalTree t a b, Ord a) => t a b -> t a b
> radjust t = nt
>             where nt  = set nt1 Bflag Balanced
>                   nt1 = set nt2 Lchild (insTree (lchild t) (key t) (info t))
>                   nt2 = set l1 Rchild l2
>                   (l1, l2) = lmostChild (rchild t)
>
> lmostChild:: (BalTree t a b, Ord a) => t a b -> (t a b, t a b)
> lmostChild t = if (isNulTree (asTypeOf (lchild t) t)) then (t, (rchild t))
>                else (l1, (rtilt (set t Lchild l2)))
>            where (l1, l2) = lmostChild (lchild t)

On the other hand, the use of the "set" functions worked out very well
here, as did the use of the fieldname functions in the if statements.
All this flowed fairly obviously and nicely, and I had no complaints.

Next, we need to find a key in the list and return the result.  Of
course, we must also signal that something is not found; therefore, we
need an appropriate response type.

> data SearchResp a = NotFound | Found a
> 
> balFind:: (BalTree t a b, Ord a) => t a b -> a -> SearchResp b
> balFind t k = if isNulTree t then NotFound
>               else if key(t) == k then (Found (info t))
>                    else if key (t) < k then balFind (lchild t) k
>                         else balFind (rchild t) k

At this time, I see no need for list deletion (although it would not
be tremendously difficult to implement; perhaps later, as an
exercise).  However, I do want to extract a list of the key -
information pairs from the tree, in order to work with it.  I will
take the trouble to avoid the append call, and pass around a right
hand built list in order to keep things efficient (or at least a
little efficient).

> balFetch:: BalTree t a b => t a b -> [ (a, b) ]
> balFetch t = tbalFetch t []
> 
> tbalFetch:: BalTree t a b => t a b -> [ (a, b) ] -> [ (a, b) ]
> tbalFetch t l = if isNulTree t then l
>                    else tbalFetch (lchild t) 
>                         (((key t), (info t)):(tbalFetch (rchild t) l))

And finally a quick routine to take a list of key-info pairs and
insert them into a balanced tree list.  Since we may want to add the
elements of a list to the pair, I will go ahead and make the routine
take an initial balanced tree; passing it nulTree is not a big deal
(even if we have to type the thing).

> buildTree:: (BalTree t a b, Ord a) => t a b -> [ (a, b) ] -> t a b
> buildTree t [] = t
> buildTree t ((k, i):ls) = buildTree (insTree t k i) ls

And now for some testing.  First implementations of HasZero for Int,
String, and BalFlag:

> instance HasZero Int where
>    getZero = 0
> 
> instance HasZero String where
>    getZero = ""
>
> instance HasZero BalFlag where
>    getZero = Balanced

And a test case (we'll make it balanced, just to be interesting):

> inp = [ (2, "two"), (7, "seven"), (5, "five"), (1, "one"),
>         (4, "four"),(3, "three"), (6, "six") ]

> ttree:: BalTreeRec Int String
> ttree = buildTree (nulTree::BalTreeRec Int String) inp

Return all permutations of a given list.  I stole this from a previous
post by Phil Wadler; many thanks, Phil!

> perm :: [a] -> [[a]]
> perm xs  =  foldr (\x yss -> [ zs | ys <- yss, zs <- permins x ys]) [[]] xs

Insert element x into list xs in all possible ways.

> permins :: a -> [a] -> [[a]]
> permins x ys  =  [ take n ys++[x]++drop n ys | n <- [0..length ys]]

Test that a list of values are sorted.

> testSort:: Ord a =>  [a] -> Bool
> testSort [] = True
> testSort (l:ls) = testSort1 l ls
> testSort1 :: Ord a => a -> [ a ] -> Bool
> testSort1 x [] = True
> testSort1 x (l:ls) = if x <= l then (testSort1 l ls) else False

Test that building the tree yields a sorted list for all permuted
values of inp.

> test :: [ (Int,String) ] -> Bool
> test l = all (testSort . fst . unzip . balFetch . 
>               (buildTree (nulTree :: BalTreeRec Int String))) (perm l)
> llist = map (testSort . fst . unzip . balFetch . 
>              buildTree (nulTree::BalTreeRec Int String)) (perm inp)

And run a quick, back-of-the-envelope that the trees are actually
balanced (we can examine ttree by hand):

> eqTree:: BalTreeRec Int String -> BalTreeRec Int String -> Bool
> eqTree t1 t2 = (isNulTree t1 && isNulTree t2) ||
>                (((key t1)::Int) == (key t2) &&
>		  ((info t1)::String) == (info t2) &&
>		  ((bflag t1)::BalFlag) == (bflag t2) &&
>		  eqTree (lchild t1) (lchild t2) &&
>		  eqTree (rchild t1) (rchild t2))

> testbal:: [ (Int, String) ] -> Bool
> testbal l = all ((eqTree ttree) . buildTree (nulTree::BalTreeRec Int String))
>                 (perm inp)

Running the tests provides reassurance:

? ttree
BalNode 4 "four" (BalNode 2 "two" (BalNode 1 "one" NulTree NulTree
Balanced) (BalNode 3 "three" NulTree NulTree Balanced) Balanced)
(BalNode 6 "six" (BalNode 5 "five" NulTree NulTree Balanced) (BalNode
7 "seven" NulTree NulTree Balanced) Balanced) Balanced 
(471 reductions, 2194 cells)
? test inp
True
(2912369 reductions, 10921660 cells, 177 garbage collections)
? testbal inp
True
(3655886 reductions, 11298445 cells, 182 garbage collections)
? 

So I haven't done that badly.

Looking back on the experiment, I am not at all convinced of the
benefits of "implementation independence"; For example, the need for
the typed expressions in "eqTree" dismays me.  I don't know if doing
this more "in the system" would help such problems.  I suspect not,
but would be glad to hear more informed opinion.

Of course, comments or questions would be welcome.

					Dave Barton <*>
					dlb@wash.inmet.com )0(
                                        http://www.inmet.com/~dlb



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic