'[Hs-Generics] Inlining and generic programming'

[prev in list] [next in list] [prev in thread] [next in thread] 

List: haskell-generics
Subject: [Hs-Generics] Inlining and generic programming
From: José_Pedro_Magalhães <jpm () cs ! uu ! nl>
Date: 2012-02-22 14:32:32
Message-ID: CA+ZFbn8f=q5WnJiLU9S2zYomRu5j-i8dKLtdD1Xev97Y34+ydQ () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

Hello all,

First of all, I'm sorry that this email is so absurdly long. But it's not
easy to explain the problem at hand, so I took a step-by-step approach. The
executive summary is: GHC can do a *great *job with inlining, but often it
doesn't, and I don't understand why. So I have some questions, which are
highlighted in the text below. In general, any insights regarding inlining
or improving the performance of generics are welcome. My final goal is to
be able to state that generic functions (in particular using GHC.Generics)
will have no runtime overhead whatsoever when compared to a handwritten
type-specific version.

The setting

Generic programming is based on representing datatypes in a uniform way
using a small set of representation types. Functions defined on those
representation types can then be applied to all datatypes, because we can
convert between datatypes and their representations.
However, generic functions tend to be slower than their specialised
counterparts, because they have to deal with the conversions. But clever
inlining (together with other compiler optimisations) can completely remove
this overhead. The problem I'm tackling is how to tell GHC exactly what it
should in the particular case of optimisation of generic code.

Simplified example

I'll focus on the problem of optimising a non-trivial function for generic
enumeration of terms. My experience shows that GHC does quite good at
optimising simple functions, especially consumers (like generic equality).
But producers are trickier.

First, we'll need some auxiliary functions:

-- | Interleave elements from two lists. Similar to (++), but swap left and
> -- right arguments on every recursive application.
> --
> -- From Mark Jones' talk at AFP2008
> {-# NOINLINE (|||) #-}
> (|||) :: [a] -> [a] -> [a]
> []     ||| ys = ys
> (x:xs) ||| ys = x : ys ||| xs
>
>
> -- | Diagonalization of nested lists. Ensure that some elements from every
> -- sublist will be included. Handles infinite sublists.
> --
> -- From Mark Jones' talk at AFP2008
> {-# NOINLINE diag #-}
> diag :: [[a]] -> [a]
> diag = concat . foldr skew [] . map (map (\x -> [x]))
>
> skew :: [[a]] -> [[a]] -> [[a]]
> skew []     ys = ys
> skew (x:xs) ys = x : combine (++) xs ys
>
> combine :: (a -> a -> a) -> [a] -> [a] -> [a]
> combine _ xs     []     = xs
> combine _ []     ys     = ys
> combine f (x:xs) (y:ys) = f x y : combine f xs ys
>

The particular implementation of these functions doesn't really matter.
What's important is that we have a way to interleave lists (|||) and a way
to diagonalise a matrix into a list (diag). We mark these functions as
NOINLINE because inlining them will only make the core code more
complicated (and may prevent rules from firing).

Suppose we have a type of Peano natural numbers:

data Nat = Ze | Su Nat deriving Eq
>

Implementing enumeration on this type is simple:

enumNat :: [Nat]
> enumNat = [Ze] ||| map Su enumNat
>

Now, a generic representation of Nat in terms of sums and products could
look something like this:

type RepNat = Either () Nat
>

That is, either a singleton (for the Ze case) or a Nat (for the Su case).
Note that I am building a shallow representation, since at the leaves we
have Nat, and not RepNat. This mimics the situation with current generic
programming libraries (in particular GHC.Generics).

We'll need a way to convert between RepNat and Nat:

toNat :: RepNat -> Nat
> toNat (Left ()) = Ze
> toNat (Right n) = Su n
>
> fromNat :: Nat -> RepNat
> fromNat Ze = Left ()
> fromNat (Su n) = Right n
>

(In fact, since we're only dealing with a generic producer we won't need
the fromNat function.)

To get an enumeration for RepNat, we first need to know how to enumerate
units and sums:

enumU :: [()]
> enumU = [()]
>
> enumEither :: [a] -> [b] -> [Either a b]
> enumEither ea eb = map Left ea ||| map Right eb
>

Now we can define an enumeration for RepNat:

enumRepNat :: [RepNat]
> enumRepNat = enumEither enumU enumNatFromRep
>

With the conversion function toNat, we can use enumRepNat to get an
enumeration for Nat directly:

enumNatFromRep :: [Nat]
> enumNatFromRep = map toNat enumRepNat
>

First, convince yourself that enumNatFromRep and enumNat are equivalent
functions:

take 100 enumNat == take 100 enumNatFromRep
>

Now, what I want is that enumNatFromRep generates the same core code as
enumNat. That should be possible; here are the necessary steps:

  map toNat enumRepNat
>
> == { inline enumRepNat }
>
>   map toNat (enumEither enumU enumNatFromRep)
>
> == { inline enumEither }
>
>   map toNat (map Left enumU ||| map Right enumNatFromRep)
>
> == { inline enumU }
>
>   map toNat (map Left [()] ||| map Right enumNatFromRep)
>
> == { inline map }
>
>   map toNat ([Left ()] ||| map Right enumNatFromRep)
>
> == { free theorem (|||): forall f a b. map f (a ||| b) = map f a ||| map f
> b }
>
>   map toNat [Left ()] ||| map toNat (map Right enumNatFromRep)
>
> == { inline map }
>
>   [toNat (Left ())] ||| map toNat (map Right enumNatFromRep)
>
> == { definition of toNat (or inline toNat + case of constant) }
>
>   [Ze] ||| map toNat (map Right enumNatFromRep)
>
> == { functor composition law: forall f g l. map f (map g l) = map (f . g)
> l }
>
>   [Ze] ||| map (toNat . Right) enumNatFromRep
>
> == { definition of toNat (or inline toNat + case of constant) }
>
>   [Ze] ||| map Su enumNatFromRep
>

Now let's see what the compiler generates. I'm using GHC-7.4.1. Let's
compile with -O1 and use -ddump-simpl to see the final simplifier output
(core code) for enumNatFromRep:

EnumAlone.enumNatFromRep :: [EnumAlone.Nat]
> [GblId,
>  Str=DmdType,
>  Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
>          ConLike=False, Cheap=False, Expandable=False,
>          Guidance=IF_ARGS [] 30 0}]
> EnumAlone.enumNatFromRep =
>   GHC.Base.map
>     @ EnumAlone.RepNat
>     @ EnumAlone.Nat
>     EnumAlone.toNat
>     EnumAlone.enumRepNat
>
> EnumAlone.enumRepNat [Occ=LoopBreaker] :: [EnumAlone.RepNat]
> [GblId, Str=DmdType]
> EnumAlone.enumRepNat =
>   EnumAlone.|||
>     @ (Data.Either.Either () EnumAlone.Nat) lvl4_rvV lvl5_rvW
>

Ah, it didn't even inline enumRepNat because it made it a loop breaker. We
certainly want to inline it, so let's add a pragma:

{-# INLINE enumRepNat #-}
>

Recompiling, we get:

EnumAlone.enumRepNat [InlPrag=INLINE (sat-args=0)]
>   :: [EnumAlone.RepNat]
> [GblId,
>  Str=DmdType,
>  Unf=Unf{Src=InlineStable, TopLvl=True, Arity=0, Value=False,
>          ConLike=False, Cheap=False, Expandable=False,
>          Guidance=ALWAYS_IF(unsat_ok=False,boring_ok=False)
>          Tmpl= EnumAlone.enumEither
>                  @ () @ EnumAlone.Nat EnumAlone.enumU
> EnumAlone.enumNatFromRep}]
> EnumAlone.enumRepNat =
>   EnumAlone.|||
>     @ (Data.Either.Either () EnumAlone.Nat) lvl4_rvV lvl5_rvW
>
> EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]
> [GblId, Str=DmdType]
> EnumAlone.enumNatFromRep =
>   GHC.Base.map
>     @ EnumAlone.RepNat
>     @ EnumAlone.Nat
>     EnumAlone.toNat
>     EnumAlone.enumRepNat
>

So no real difference in the generated code, other than the reassignment of
loop breakers. For some reason enumRepNat still doesn't get inlined.

*Question: why won't GHC inline enumRepNat, even when I tell it to do so
with an INLINE pragma?*

Well, let's inline it ourselves, then. We redefine enumNatFromRep to:

enumNatFromRep = map toNat (enumEither enumU enumNatFromRep)
>

This however doesn't help much. We get the following core:

lvl6_rw5 :: [Data.Either.Either () EnumAlone.Nat]
> [GblId]
> lvl6_rw5 =
>   EnumAlone.|||
>     @ (Data.Either.Either () EnumAlone.Nat) lvl4_rw3 lvl5_rw4
>
> EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]
> [GblId, Str=DmdType]
> EnumAlone.enumNatFromRep =
>   GHC.Base.map
>     @ EnumAlone.RepNat @ EnumAlone.Nat EnumAlone.toNat lvl6_rw5
>

GHC is really keen on floating that (|||) out. Let's be very explicit about
inlining:

{-# INLINE toNat #-}
> {-# INLINE enumU #-}
> {-# INLINE enumEither #-}
>

Also, maybe it's floating it out because it doesn't have anything else to
do to it. Let's add the free theorem of (|||) as a rule:

{-# RULES "ft |||" forall f a b. map f (a ||| b) = map f a ||| map f b #-}
>

We needed this in our manual derivation, so GHC should need it too.
Recompiling, we see we've made some progress:

lvl5_ryv :: [Data.Either.Either () EnumAlone.Nat]
> [GblId]
> lvl5_ryv =
>   GHC.Base.map
>     @ EnumAlone.Nat
>     @ (Data.Either.Either () EnumAlone.Nat)
>     (Data.Either.Right @ () @ EnumAlone.Nat)
>     EnumAlone.enumNatFromRep
>
> lvl6_ryw :: [EnumAlone.Nat]
> [GblId]
> lvl6_ryw =
>   GHC.Base.map
>     @ (Data.Either.Either () EnumAlone.Nat)
>     @ EnumAlone.Nat
>     EnumAlone.toNat
>     lvl5_ryv
>
> EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]
> [GblId, Str=DmdType]
> EnumAlone.enumNatFromRep =
>   EnumAlone.||| @ EnumAlone.Nat lvl4_ryu lvl6_ryw
>

enumNatFromRep finally starts with (|||) directly. But its second argument,
lvl6_ryw, is a map of lvl5_ryv, which is itself a map! At this stage I
expected GHC to be aware of the fusion law for map, but it seems that it
isn't.

*Question: why is map fusion not happening automatically?*

Let's add it as a rule:

{-# RULES "map/map1" forall f g l. map f (map g l) = map (f . g) l #-}
>

And now we're in a much better situation:

lvl3_ryD :: [Data.Either.Either () EnumAlone.Nat]
> [GblId, Caf=NoCafRefs]
> lvl3_ryD =
>   GHC.Types.:
>     @ (Data.Either.Either () EnumAlone.Nat)
>     EnumAlone.fromNat1
>     (GHC.Types.[] @ (Data.Either.Either () EnumAlone.Nat))
>
> lvl4_ryE :: [EnumAlone.Nat]
> [GblId]
> lvl4_ryE =
>   GHC.Base.map
>     @ (Data.Either.Either () EnumAlone.Nat)
>     @ EnumAlone.Nat
>     EnumAlone.toNat
>     lvl3_ryD
>
> lvl5_ryF :: [EnumAlone.Nat]
> [GblId]
> lvl5_ryF =
>   GHC.Base.map
>     @ EnumAlone.Nat
>     @ EnumAlone.Nat
>     EnumAlone.Su
>     EnumAlone.enumNatFromRep
>
> EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]
> [GblId, Str=DmdType]
> EnumAlone.enumNatFromRep =
>   EnumAlone.||| @ EnumAlone.Nat lvl4_ryE lvl5_ryF
>

Note how toNat is entirely gone from the second part of the enumeration
(lvl5_ryF). Strangely enough, the enumerator for Ze (lvl4_ryE) is still
very complicated: map toNat ([Left ()]). Why doesn't GHC simplify this to
just [Ze]? Apparently because GHC doesn't simplify map over a single
element list.

*Question: why doesn't GHC optimise map f [x] to [f x]?*

Let's tell it to do so:

{-# RULES "map/map2" forall f x. map f (x:[]) = (f x):[] #-}
>

Now we're finally where we wanted:

lvl_ryA :: [EnumAlone.Nat]
> [GblId, Caf=NoCafRefs]
> lvl_ryA =
>   GHC.Types.:
>     @ EnumAlone.Nat EnumAlone.Ze (GHC.Types.[] @ EnumAlone.Nat)
>
> lvl3_ryD :: [EnumAlone.Nat]
> [GblId]
> lvl3_ryD =
>   GHC.Base.map
>     @ EnumAlone.Nat
>     @ EnumAlone.Nat
>     EnumAlone.Su
>     EnumAlone.enumNatFromRep
>
> EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]
> [GblId, Str=DmdType]
> EnumAlone.enumNatFromRep =
>   EnumAlone.||| @ EnumAlone.Nat lvl_ryA lvl3_ryD
>

This is what I wanted: no more representation types (Either or ()), and the
code looks exactly like what is generated for the handwritten enumNat.

More realistic generic programming

Now let's see if we can transport this to the setting of a generic
programming library. I'll use a bare-bones version of GHC.Generics:

infixr 5 :+:
> infixr 6 :*:
>
> data U          = U              deriving (Show, Read)
> data a :+: b    = L a | R b      deriving (Show, Read)
> data a :*: b    = a :*: b        deriving (Show, Read)
> newtype Var a   = Var a          deriving (Show, Read)
> newtype Rec a   = Rec a          deriving (Show, Read)
>
> class Representable a where
>   type Rep a
>   to   :: Rep a -> a
>   from :: a -> Rep a
>

Let's represent Nat in this library:

instance Representable Nat where
>   type Rep Nat = U :+: (Rec Nat)
>   from Ze            = L U
>   from (Su n)        = R (Rec n)
>   to (L U)       = Ze
>   to (R (Rec n)) = Su n
>

(Note, in particular, that we do not need INLINE pragmas on the from/to
methods. This might just be because GHC thinks these are small and inlines
them anyway, but in general we want to make sure they are inlined, so we
typically use pragmas there.)

Now we need to implement enumeration generically. We do this by giving an
instance for each representation type:

class GEnum' a where
>   genum' :: [a]
>
> instance GEnum' U where
>   {-# INLINE genum' #-}
>   genum' = [U]
>
> instance (GEnum a) => GEnum' (Rec a) where
>   {-# INLINE genum' #-}
>   genum' = map Rec genum
>
> instance (GEnum a) => GEnum' (Var a) where
>   {-# INLINE genum' #-}
>   genum' = map Var genum
>
> instance (GEnum' f, GEnum' g) => GEnum' (f :+: g) where
>   {-# INLINE genum' #-}
>   genum' = map L genum' ||| map R genum'
>
> instance (GEnum' f, GEnum' g) => GEnum' (f :*: g) where
>   {-# INLINE genum' #-}
>   --genum' = diag [ [ x :*: y | y <- genum' ] | x <- genum' ]
>   genum' = diag (map (\x -> map (\y -> x :*: y) genum') genum')
>

We explicitly tell GHC to inline each case, as before. Note that for
products I'm not using the more natural list comprehension syntax because I
don't quite understand how that gets translated into core.

In the cases for Var and Rec we use genum from the GEnum class:

class GEnum a where
>   genum :: [a]
>   {-# INLINE genum #-}
>   default genum :: (Representable a, GEnum' (Rep a)) => [a]
>   genum = map to genum'
>

GEnum' is the class used for instantiating the generic representation
types, and GEnum is used for user types. We use a default signature to
provide a default method that can be used when we have a Representable
instance for the type in question. This makes instantiating Nat very easy:

instance GEnum Nat
>

Unfortunately, the core code generated in this situation (with the same
RULES as before) is not nice at all:

Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat]
> [GblId, Str=DmdType]
> Main.$fGEnumNat_$cgenum =
>   GHC.Base.map
>     @ (Base.Rep Base.Nat)
>     @ Base.Nat
>     Base.$fRepresentableNat_$cto
>     (lvl37_r79y
>      `cast` (Sym (GEnum.NTCo:GEnum') <Base.U Base.:+: (Base.Rec Base.Nat)>
> ;
>                                        (GEnum.GEnum' (Sym
> (Base.TFCo:R:RepNat)) ;
>                                          GEnum.NTCo:GEnum' <Base.Rep
> Base.Nat>)
>              :: [Base.C Base.Nat_Ze_ Base.U
>                  Base.:+: Base.C Base.Nat_Su_ (Base.Rec Base.Nat)]
>                   ~#
>                 [Base.Rep Base.Nat]))
>

We see a map of the `to` function, which is definitely not what we want.
Oddly enough, if we give an explicit definition of genum for Nat, with the
inlined default...

instance GEnum Nat where genum = map to genum'
>

... then we get the optimised code we want:

lvl34_r79p :: [Base.Nat]
> [GblId, Caf=NoCafRefs]
> lvl34_r79p =
>   GHC.Types.: @ Base.Nat Base.Ze (GHC.Types.[] @ Base.Nat)
>
> lvl35_r79q :: [Base.Nat]
> [GblId]
> lvl35_r79q =
>   GHC.Base.map @ Base.Nat @ Base.Nat Base.Su Main.$fGEnumNat_$cgenum
>
> Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat]
> [GblId, Str=DmdType]
> Main.$fGEnumNat_$cgenum =
>   GEnum.||| @ Base.Nat lvl34_r79p lvl35_r79q
>

Again, no representation types, no `to`, just the same code that is
generated for enumNat. Perfect. But we had to avoid using the default
definition, which is a pity.

*Question: why won't GHC inline the default method of a class, even when I
have a pragma telling it to do so?*

Let's look at one more datatype, because Nat does not use products. So
let's consider trees:

data Tree a = Leaf | Bin a (Tree a) (Tree a)
>
> instance Representable (Tree a) where
>   type Rep (Tree a) = U :+: (Var a :*: Rec (Tree a) :*: Rec (Tree a))
>   from (Bin x l r) = R (Var x :*: Rec l :*: Rec r)
>   from Leaf        = L U
>   to (R (Var x :*: (Rec l) :*: (Rec r))) = Bin x l r
>   to (L U)                               = Leaf
>

We give a GEnum instance using the same trick as before:

instance GEnum (Tree Int) where genum = map to genum'
>

(For simplicity only for trees of integers.) The generated code for trees
is unfortunately not as nice:

a2_r79M
>   :: [Base.Rec (Base.Tree GHC.Types.Int)
>       Base.:*: Base.Rec (Base.Tree GHC.Types.Int)]
> [GblId, Str=DmdType]
> a2_r79M =
>   GEnum.diag
>     @ (Base.Rec (Base.Tree GHC.Types.Int)
>        Base.:*: Base.Rec (Base.Tree GHC.Types.Int))
>     lvl8_r79L
>
> lvl9_r79N :: [Base.Tree GHC.Types.Int]
> [GblId]
> lvl9_r79N =
>   GHC.Base.map
>     @ (Base.Rec (Base.Tree GHC.Types.Int)
>        Base.:*: Base.Rec (Base.Tree GHC.Types.Int))
>     @ (Base.Tree GHC.Types.Int)
>     lvl5_r79H
>     a2_r79M
>
> Main.$fGEnumTree_$cgenum [Occ=LoopBreaker]
>   :: [Base.Tree GHC.Types.Int]
> [GblId, Str=DmdType]
> Main.$fGEnumTree_$cgenum =
>   GEnum.||| @ (Base.Tree GHC.Types.Int) lvl4_r79G lvl9_r79N
>

Note how lvl9_r79N is a map over a2_r79M, and a2_r79M is a `diag`. Ok, we
need the free theorem of `diag` to tell GHC how to commute the `diag` with
map:

{-# RULES "ft/diag" forall f l. map f (diag l) = diag (map (map f) l) #-}
>

Unfortunately this doesn't change the generated core code. With some more
debugging looking at the generated code at each simplifier iteration, I
believe that this is because a2_r79M got lifted out too soon, prevent the
rule from applying. With some imagination I decided to try the
-fno-full-laziness flag to prevent let-floating. I'm not sure this is a
good idea in general, but in this particular case it gives much better
results:

a1_r72i :: [Base.Rec (Base.Tree GHC.Types.Int)]
> [GblId, Str=DmdType]
> a1_r72i =
>   GHC.Base.map
>     @ (Base.Tree GHC.Types.Int)
>     @ (Base.Rec (Base.Tree GHC.Types.Int))
>     ((\ (tpl_B1 :: Base.Tree GHC.Types.Int) -> tpl_B1)
>      `cast` (<Base.Tree GHC.Types.Int>
>              -> Sym (Base.NTCo:Rec <Base.Tree GHC.Types.Int>)
>              :: (Base.Tree GHC.Types.Int -> Base.Tree GHC.Types.Int)
>                   ~#
>                 (Base.Tree GHC.Types.Int -> Base.Rec (Base.Tree
> GHC.Types.Int))))
>     Main.$fGEnumTree_$cgenum
>
> Main.$fGEnumTree_$cgenum [Occ=LoopBreaker]
>   :: [Base.Tree GHC.Types.Int]
> [GblId, Str=DmdType]
> Main.$fGEnumTree_$cgenum =
>   GEnum.|||
>     @ (Base.Tree GHC.Types.Int)
>     (GHC.Types.:
>        @ (Base.Tree GHC.Types.Int)
>        (Base.Leaf @ GHC.Types.Int)
>        (GHC.Types.[] @ (Base.Tree GHC.Types.Int)))
>     (GEnum.diag
>        @ (Base.Tree GHC.Types.Int)
>        (GHC.Base.map
>           @ (Base.Rec (Base.Tree GHC.Types.Int))
>           @ [Base.Tree GHC.Types.Int]
>           (\ (x_a1yQ :: Base.Rec (Base.Tree GHC.Types.Int)) ->
>              GHC.Base.map
>                @ (Base.Rec (Base.Tree GHC.Types.Int))
>                @ (Base.Tree GHC.Types.Int)
>                (\ (x1_X1zB :: Base.Rec (Base.Tree GHC.Types.Int)) ->
>                   Base.Bin
>                     @ GHC.Types.Int
>                     a_r72h
>                     (x_a1yQ
>                      `cast` (Base.NTCo:Rec <Base.Tree GHC.Types.Int>
>                              :: Base.Rec (Base.Tree GHC.Types.Int) ~#
> Base.Tree GHC.Types.Int))
>                     (x1_X1zB
>                      `cast` (Base.NTCo:Rec <Base.Tree GHC.Types.Int>
>                              :: Base.Rec (Base.Tree GHC.Types.Int) ~#
> Base.Tree GHC.Types.Int)))
>                a1_r72i)
>           a1_r72i))
>

Note how our enum is now of the shape `[Leaf] ||| diag y`, which is good.
The only catch is that there are some `Rec`s still laying around, with
their associated newtype coercions, and a function a1_r72i that basically
wraps the recursive enumeration in a Rec, only to be unwrapped in the body
of `$fGEnumTree_$cgenum`. I don't know how to get GHC to simplify this code
any further.

*Question: why do I need -fno-full-laziness for the ft/diag rule to apply?*

*Question: why is GHC not getting rid of the Rec newtype in this case?*

I have also played with -O2, in particular because of the SpecConstr
optimisation, but found that it does not affect these particular examples
(perhaps it only becomes important with larger datatypes). I have also
experimented with phase control in the rewrite rules and the inline
pragmas, but didn't find it necessary for this example. In general, anyway,
my experience with the inliner is that it is extremely fragile, especially
across different GHC versions, and it's hard to get any guarantees of
optimisation. I have also played with the -funfolding-* options before,
with mixed results. [1] It's also a pity that certain flags are not
explained in detail in the user's manual [2,3], like -fliberate-case, and
-fspec-constr-count and threshold, for instance.

Thank you for reading this. Any insights are welcome. In particular, I'm
wondering if I might be missing some details regarding strictness.

Cheers,
Pedro

[1] http://dreixel.net/research/pdf/ogie.pdf
[2]
http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.html
[3]
http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f

[Attachment #5 (text/html)]

Hello all, First of all, I&#39;m sorry that this email is so absurdly long. \
But it&#39;s not easy to explain the problem at hand, so I took a step-by-step \
approach. The executive summary is: GHC can do a great job with inlining, but \
often it doesn&#39;t, and I don&#39;t understand why. So I have some questions, which \
are highlighted in the text below. In general, any insights regarding inlining or \
improving the performance of generics are welcome. My final goal is to be able to \
state that generic functions (in particular using GHC.Generics) will have no runtime \
overhead whatsoever when compared to a handwritten type-specific version.

The setting Generic programming is based on \
representing datatypes in a uniform way using a small set of representation types. \
Functions defined on those representation types can then be applied to all datatypes, \
because we can convert between datatypes and their representations.

However, generic functions tend to be slower than their specialised counterparts, \
because they have to deal with the conversions. But clever inlining (together with \
other compiler optimisations) can completely remove this overhead. The problem \
I&#39;m tackling is how to tell GHC exactly what it should in the particular case of \
optimisation of generic code.

Simplified example I&#39;ll focus on the problem \
of optimising a non-trivial function for generic enumeration of terms. My experience \
shows that GHC does quite good at optimising simple functions, especially consumers \
(like generic equality). But producers are trickier.

First, we&#39;ll need some auxiliary functions: <blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">-- | Interleave elements from two lists. Similar to (++), but \
swap left and

-- right arguments on every recursive application.<br>--<br>-- From Mark Jones&#39; \
talk at AFP2008<br>{-# NOINLINE (|||) #-}<br>(|||) :: [a] -&gt; [a] -&gt; [a]<br>[]   \
||| ys = ys<br>(x:xs) ||| ys = x : ys ||| xs<br> <br>
<br>-- | Diagonalization of nested lists. Ensure that some elements from every<br>-- \
sublist will be included. Handles infinite sublists.<br>--<br>-- From Mark Jones&#39; \
talk at AFP2008<br>{-# NOINLINE diag #-}<br>diag :: [[a]] -&gt; [a]<br>

diag = concat . foldr skew [] . map (map (\x -&gt; [x]))<br><br>skew :: [[a]] -&gt; \
[[a]] -&gt; [[a]]<br>skew []     ys = ys<br>skew (x:xs) ys = x : combine (++) xs \
ys<br><br>combine :: (a -&gt; a -&gt; a) -&gt; [a] -&gt; [a] -&gt; [a]<br>

combine _ xs [] = xs combine _ [] ys = ys combine f (x:xs) \
(y:ys) = f x y : combine f xs ys </blockquote> The particular implementation of \
these functions doesn&#39;t really matter. What&#39;s important is that we have a way \
to interleave lists (|||) and a way to diagonalise a matrix into a list (diag). We \
mark these functions as NOINLINE because inlining them will only make the core code \
more complicated (and may prevent rules from firing).

<br>Suppose we have a type of Peano natural numbers:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">data Nat = Ze | Su Nat deriving Eq<br>

</blockquote><br>Implementing enumeration on this type is simple:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

enumNat :: [Nat]<br>enumNat = [Ze] ||| map Su enumNat<br></blockquote><br>Now, a \
generic representation of Nat in terms of sums and products could look something like \
this:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

type RepNat = Either () Nat </blockquote> That is, either a singleton (for the \
Ze case) or a Nat (for the Su case). Note that I am building a shallow \
representation, since at the leaves we have Nat, and not RepNat. This mimics the \
situation with current generic programming libraries (in particular \
GHC.Generics).

<br>We&#39;ll need a way to convert between RepNat and Nat:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

toNat :: RepNat -&gt; Nat<br>toNat (Left ()) = Ze<br>toNat (Right n) = Su \
n<br><br>fromNat :: Nat -&gt; RepNat<br>fromNat Ze = Left ()<br>fromNat (Su n) = \
Right n<br></blockquote><br>(In fact, since we&#39;re only dealing with a generic \
producer we won&#39;t need the fromNat function.)<br>

<br>To get an enumeration for RepNat, we first need to know how to enumerate units \
and sums:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

enumU :: [()]<br>enumU = [()]<br><br>enumEither :: [a] -&gt; [b] -&gt; [Either a \
b]<br>enumEither ea eb = map Left ea ||| map Right eb<br></blockquote><br>Now we can \
define an enumeration for RepNat:<br><br><blockquote style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

enumRepNat :: [RepNat]<br>enumRepNat = enumEither enumU \
enumNatFromRep<br></blockquote><br>With the conversion function toNat, we can use \
enumRepNat to get an enumeration for Nat directly:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

enumNatFromRep :: [Nat]<br>enumNatFromRep = map toNat \
enumRepNat<br></blockquote><br>First, convince yourself that enumNatFromRep and \
enumNat are equivalent functions:<br><br><blockquote style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

take 100 enumNat == take 100 enumNatFromRep </blockquote> Now, what I want is \
that enumNatFromRep generates the same core code as enumNat. That should be possible; \
here are the necessary steps: <blockquote style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

  map toNat enumRepNat <br><br>== { inline enumRepNat }<br><br>  map toNat \
(enumEither enumU enumNatFromRep) <br><br>== { inline enumEither }<br><br>  map toNat \
(map Left enumU ||| map Right enumNatFromRep)<br><br>== { inline enumU }<br>

<br>  map toNat (map Left [()] ||| map Right enumNatFromRep)<br><br>== { inline map \
}<br><br>  map toNat ([Left ()] ||| map Right enumNatFromRep)<br><br>== { free \
theorem (|||): forall f a b. map f (a ||| b) = map f a ||| map f b }<br>

<br>  map toNat [Left ()] ||| map toNat (map Right enumNatFromRep)<br><br>== { inline \
map }<br><br>  [toNat (Left ())] ||| map toNat (map Right enumNatFromRep)<br><br>== { \
definition of toNat (or inline toNat + case of constant) }<br>

<br>  [Ze] ||| map toNat (map Right enumNatFromRep)<br><br>== { functor composition \
law: forall f g l. map f (map g l) = map (f . g) l }<br><br>  [Ze] ||| map (toNat . \
Right) enumNatFromRep<br><br>== { definition of toNat (or inline toNat + case of \
constant) }<br>

<br>  [Ze] ||| map Su enumNatFromRep<br></blockquote><br>Now let&#39;s see what the \
compiler generates. I&#39;m using GHC-7.4.1. Let&#39;s compile with -O1 and use \
-ddump-simpl to see the final simplifier output (core code) for enumNatFromRep:<br>

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">EnumAlone.enumNatFromRep :: [EnumAlone.Nat]<br>[GblId,<br> \
Str=DmdType,<br>

 Unf=Unf{Src=&lt;vanilla&gt;, TopLvl=True, Arity=0, Value=False,<br>         \
ConLike=False, Cheap=False, Expandable=False,<br>         Guidance=IF_ARGS [] 30 \
0}]<br>EnumAlone.enumNatFromRep =<br>  GHC.Base.map<br>    @ EnumAlone.RepNat<br>

    @ EnumAlone.Nat<br>    EnumAlone.toNat<br>    \
EnumAlone.enumRepNat<br><br>EnumAlone.enumRepNat [Occ=LoopBreaker] :: \
[EnumAlone.RepNat]<br>[GblId, Str=DmdType]<br>EnumAlone.enumRepNat =<br>  \
EnumAlone.|||<br>    @ (Data.Either.Either () EnumAlone.Nat) lvl4_rvV lvl5_rvW<br>

</blockquote> Ah, it didn&#39;t even inline enumRepNat because it made it a loop \
breaker. We certainly want to inline it, so let&#39;s add a \
pragma: <blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex" class="gmail_quote">

<span style="font-family:courier new,monospace">{-# INLINE enumRepNat \
#-}</span><br></blockquote><br>Recompiling, we get:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

EnumAlone.enumRepNat [InlPrag=INLINE (sat-args=0)]<br>  :: \
[EnumAlone.RepNat]<br>[GblId,<br> Str=DmdType,<br> Unf=Unf{Src=InlineStable, \
TopLvl=True, Arity=0, Value=False,<br>         ConLike=False, Cheap=False, \
Expandable=False,<br>

         Guidance=ALWAYS_IF(unsat_ok=False,boring_ok=False)<br>         Tmpl= \
EnumAlone.enumEither<br>                 @ () @ EnumAlone.Nat EnumAlone.enumU \
EnumAlone.enumNatFromRep}]<br>EnumAlone.enumRepNat =<br>  EnumAlone.|||<br>

    @ (Data.Either.Either () EnumAlone.Nat) lvl4_rvV \
lvl5_rvW<br><br>EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: \
[EnumAlone.Nat]<br>[GblId, Str=DmdType]<br>EnumAlone.enumNatFromRep =<br>  \
GHC.Base.map<br>    @ EnumAlone.RepNat<br>

    @ EnumAlone.Nat<br>    EnumAlone.toNat<br>    \
EnumAlone.enumRepNat<br></blockquote><br>So no real difference in the generated code, \
other than the reassignment of loop breakers. For some reason enumRepNat still \
doesn&#39;t get inlined.<br>

Question: why won&#39;t GHC inline enumRepNat, even when I tell it to do so \
with an INLINE pragma? Well, let&#39;s inline it ourselves, then. We \
redefine enumNatFromRep to: <blockquote style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

enumNatFromRep = map toNat (enumEither enumU enumNatFromRep) </blockquote> This \
however doesn&#39;t help much. We get the following core: <blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

lvl6_rw5 :: [Data.Either.Either () EnumAlone.Nat]<br>[GblId]<br>lvl6_rw5 =<br>  \
EnumAlone.|||<br>    @ (Data.Either.Either () EnumAlone.Nat) lvl4_rw3 \
lvl5_rw4<br><br>EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: [EnumAlone.Nat]<br>

[GblId, Str=DmdType] EnumAlone.enumNatFromRep = GHC.Base.map @ \
EnumAlone.RepNat @ EnumAlone.Nat EnumAlone.toNat lvl6_rw5 </blockquote> GHC is \
really keen on floating that (|||) out. Let&#39;s be very explicit about \
inlining:

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">{-# INLINE toNat #-}<br>{-# INLINE enumU #-}<br>{-# INLINE \
enumEither #-}<br>

</blockquote> Also, maybe it&#39;s floating it out because it doesn&#39;t have \
anything else to do to it. Let&#39;s add the free theorem of (|||) as a \
rule: <blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

{-# RULES &quot;ft |||&quot; forall f a b. map f (a ||| b) = map f a ||| map f b \
#-} </blockquote> We needed this in our manual derivation, so GHC should need \
it too. Recompiling, we see we&#39;ve made some progress:

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">lvl5_ryv :: [Data.Either.Either () EnumAlone.Nat]<br>[GblId]<br>

lvl5_ryv =<br>  GHC.Base.map<br>    @ EnumAlone.Nat<br>    @ (Data.Either.Either () \
EnumAlone.Nat)<br>    (Data.Either.Right @ () @ EnumAlone.Nat)<br>    \
EnumAlone.enumNatFromRep<br><br>lvl6_ryw :: [EnumAlone.Nat]<br>[GblId]<br>

lvl6_ryw =<br>  GHC.Base.map<br>    @ (Data.Either.Either () EnumAlone.Nat)<br>    @ \
EnumAlone.Nat<br>    EnumAlone.toNat<br>    lvl5_ryv<br><br>EnumAlone.enumNatFromRep \
[Occ=LoopBreaker] :: [EnumAlone.Nat]<br>[GblId, Str=DmdType]<br>

EnumAlone.enumNatFromRep = EnumAlone.||| @ EnumAlone.Nat lvl4_ryu \
lvl6_ryw </blockquote> enumNatFromRep finally starts with (|||) directly. But \
its second argument, lvl6_ryw, is a map of lvl5_ryv, which is itself a map! At this \
stage I expected GHC to be aware of the fusion law for map, but it seems that it \
isn&#39;t.

<br><b>Question: why is map fusion not happening automatically?</b><br><br>Let&#39;s \
add it as a rule:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px \
solid rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

{-# RULES &quot;map/map1&quot; forall f g l. map f (map g l) = map (f . g) l \
#-}<br></blockquote><br>And now we&#39;re in a much better \
situation:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

lvl3_ryD :: [Data.Either.Either () EnumAlone.Nat]<br>[GblId, \
Caf=NoCafRefs]<br>lvl3_ryD =<br>  GHC.Types.:<br>    @ (Data.Either.Either () \
EnumAlone.Nat)<br>    EnumAlone.fromNat1<br>    (GHC.Types.[] @ (Data.Either.Either \
() EnumAlone.Nat))<br>

<br>lvl4_ryE :: [EnumAlone.Nat]<br>[GblId]<br>lvl4_ryE =<br>  GHC.Base.map<br>    @ \
(Data.Either.Either () EnumAlone.Nat)<br>    @ EnumAlone.Nat<br>    \
EnumAlone.toNat<br>    lvl3_ryD<br><br>lvl5_ryF :: [EnumAlone.Nat]<br>

[GblId]<br>lvl5_ryF =<br>  GHC.Base.map<br>    @ EnumAlone.Nat<br>    @ \
EnumAlone.Nat<br>    EnumAlone.Su<br>    \
EnumAlone.enumNatFromRep<br><br>EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: \
[EnumAlone.Nat]<br>[GblId, Str=DmdType]<br>

EnumAlone.enumNatFromRep = EnumAlone.||| @ EnumAlone.Nat lvl4_ryE \
lvl5_ryF </blockquote> Note how toNat is entirely gone from the second part of \
the enumeration (lvl5_ryF). Strangely enough, the enumerator for Ze (lvl4_ryE) is \
still very complicated: map toNat ([Left ()]). Why doesn&#39;t GHC simplify this to \
just [Ze]? Apparently because GHC doesn&#39;t simplify map over a single element \
list.

<br><b>Question: why doesn&#39;t GHC optimise map f [x] to [f \
x]?</b><br><br>Let&#39;s tell it to do so:<br><br><blockquote style="margin:0pt 0pt \
0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

{-# RULES &quot;map/map2&quot; forall f x. map f (x:[]) = (f x):[] \
#-}<br></blockquote><br>Now we&#39;re finally where we wanted:<br><br><blockquote \
style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

lvl_ryA :: [EnumAlone.Nat]<br>[GblId, Caf=NoCafRefs]<br>lvl_ryA =<br>  \
GHC.Types.:<br>    @ EnumAlone.Nat EnumAlone.Ze (GHC.Types.[] @ \
EnumAlone.Nat)<br><br>lvl3_ryD :: [EnumAlone.Nat]<br>[GblId]<br>lvl3_ryD =<br>  \
GHC.Base.map<br>

    @ EnumAlone.Nat<br>    @ EnumAlone.Nat<br>    EnumAlone.Su<br>    \
EnumAlone.enumNatFromRep<br><br>EnumAlone.enumNatFromRep [Occ=LoopBreaker] :: \
[EnumAlone.Nat]<br>[GblId, Str=DmdType]<br>EnumAlone.enumNatFromRep =<br>

  EnumAlone.||| @ EnumAlone.Nat lvl_ryA lvl3_ryD<br></blockquote><br>This is what I \
wanted: no more representation types (Either or ()), and the code looks exactly like \
what is generated for the handwritten enumNat.<br><br>

More realistic generic programming Now let&#39;s see \
if we can transport this to the setting of a generic programming library. I&#39;ll \
use a bare-bones version of GHC.Generics: <blockquote style="margin:0pt 0pt \
0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

infixr 5 :+:<br>infixr 6 :*:<br><br>data U          = U              deriving (Show, \
Read)<br>data a :+: b    = L a | R b      deriving (Show, Read)<br>data a :*: b    = \
a :*: b        deriving (Show, Read)<br>newtype Var a   = Var a          deriving \
(Show, Read)<br>

newtype Rec a   = Rec a          deriving (Show, Read)<br><br>class Representable a \
where<br>  type Rep a<br>  to   :: Rep a -&gt; a<br>  from :: a -&gt; Rep \
a<br></blockquote><br>Let&#39;s represent Nat in this library:<br>

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">instance Representable Nat where<br>  type Rep Nat = U :+: (Rec \
Nat)<br>

from Ze = L U from (Su n) = R (Rec n) to (L U) = \
Ze to (R (Rec n)) = Su n </blockquote> (Note, in particular, that we do \
not need INLINE pragmas on the from/to methods. This might just be because GHC thinks \
these are small and inlines them anyway, but in general we want to make sure they are \
inlined, so we typically use pragmas there.)

Now we need to implement enumeration generically. We do this by giving an \
instance for each representation type: <blockquote style="margin:0pt 0pt 0pt \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex;font-family:courier \
new,monospace" class="gmail_quote">

class GEnum&#39; a where<br>  genum&#39; :: [a]<br><br>instance GEnum&#39; U \
where<br>  {-# INLINE genum&#39; #-}<br>  genum&#39; = [U]<br><br>instance (GEnum a) \
=&gt; GEnum&#39; (Rec a) where<br>  {-# INLINE genum&#39; #-}<br>

  genum&#39; = map Rec genum<br><br>instance (GEnum a) =&gt; GEnum&#39; (Var a) \
where<br>  {-# INLINE genum&#39; #-}<br>  genum&#39; = map Var genum<br><br>instance \
(GEnum&#39; f, GEnum&#39; g) =&gt; GEnum&#39; (f :+: g) where<br>

  {-# INLINE genum&#39; #-}<br>  genum&#39; = map L genum&#39; ||| map R \
genum&#39;<br><br>instance (GEnum&#39; f, GEnum&#39; g) =&gt; GEnum&#39; (f :*: g) \
where<br>  {-# INLINE genum&#39; #-}<br>  --genum&#39; = diag [ [ x :*: y | y &lt;- \
genum&#39; ] | x &lt;- genum&#39; ]<br>

genum&#39; = diag (map (\x -&gt; map (\y -&gt; x :*: y) genum&#39;) \
genum&#39;) </blockquote> We explicitly tell GHC to inline each case, as \
before. Note that for products I&#39;m not using the more natural list comprehension \
syntax because I don&#39;t quite understand how that gets translated into core.

<br>In the cases for Var and Rec we use genum from the GEnum \
class:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

class GEnum a where genum :: [a] {-# INLINE genum #-} default genum :: \
(Representable a, GEnum&#39; (Rep a)) =&gt; [a] genum = map to \
genum&#39; </blockquote> GEnum&#39; is the class used for instantiating the \
generic representation types, and GEnum is used for user types. We use a default \
signature to provide a default method that can be used when we have a Representable \
instance for the type in question. This makes instantiating Nat very easy:

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">instance GEnum Nat<br></blockquote><br>Unfortunately, the core \
code generated in this situation (with the same RULES as before) is not nice at \
all:<br>

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: \
[Base.Nat]<br>[GblId, Str=DmdType]<br>

Main.$fGEnumNat_$cgenum =<br>  GHC.Base.map<br>    @ (Base.Rep Base.Nat)<br>    @ \
Base.Nat<br>    Base.$fRepresentableNat_$cto<br>    (lvl37_r79y<br>     `cast` (Sym \
(GEnum.NTCo:GEnum&#39;) &lt;Base.U Base.:+: (Base.Rec Base.Nat)&gt; ; <br>

                                       (GEnum.GEnum&#39; (Sym (Base.TFCo:R:RepNat)) \
;<br>                                         GEnum.NTCo:GEnum&#39; &lt;Base.Rep \
Base.Nat&gt;)<br>             :: [Base.C Base.Nat_Ze_ Base.U<br>

                 Base.:+: Base.C Base.Nat_Su_ (Base.Rec Base.Nat)]<br>                \
~#<br>                [Base.Rep Base.Nat]))<br></blockquote><br>We see a map of the \
`to` function, which is definitely not what we want. Oddly enough, if we give an \
explicit definition of genum for Nat, with the inlined default...<br>

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">instance GEnum Nat where genum = map to \
genum&#39;<br></blockquote>

... then we get the optimised code we want: <blockquote style="margin:0pt \
0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">lvl34_r79p :: [Base.Nat]

[GblId, Caf=NoCafRefs]<br>lvl34_r79p =<br>  GHC.Types.: @ Base.Nat Base.Ze \
(GHC.Types.[] @ Base.Nat)<br><br>lvl35_r79q :: [Base.Nat]<br>[GblId]<br>lvl35_r79q \
=<br>  GHC.Base.map @ Base.Nat @ Base.Nat Base.Su Main.$fGEnumNat_$cgenum<br>

Main.$fGEnumNat_$cgenum [Occ=LoopBreaker] :: [Base.Nat] [GblId, \
Str=DmdType] Main.$fGEnumNat_$cgenum = GEnum.||| @ Base.Nat lvl34_r79p \
lvl35_r79q </blockquote> Again, no representation types, no `to`, just the same \
code that is generated for enumNat. Perfect. But we had to avoid using the default \
definition, which is a pity.

Question: why won&#39;t GHC inline the default method of a class, even when I \
have a pragma telling it to do so? Let&#39;s look at one more datatype, \
because Nat does not use products. So let&#39;s consider trees:

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">data Tree a = Leaf | Bin a (Tree a) (Tree a)<br><br>instance \
Representable (Tree a) where<br>

  type Rep (Tree a) = U :+: (Var a :*: Rec (Tree a) :*: Rec (Tree a))<br>  from (Bin \
x l r) = R (Var x :*: Rec l :*: Rec r)<br>  from Leaf        = L U<br>  to (R (Var x \
:*: (Rec l) :*: (Rec r))) = Bin x l r<br>  to (L U)                               = \
Leaf<br>

</blockquote><br>We give a GEnum instance using the same trick as \
before:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

instance GEnum (Tree Int) where genum = map to genum&#39;<br></blockquote><br>(For \
simplicity only for trees of integers.) The generated code for trees is unfortunately \
not as nice:<br><br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

a2_r79M<br>  :: [Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>      Base.:*: Base.Rec (Base.Tree \
<a href="http://GHC.Types.Int">GHC.Types.Int</a>)]<br>[GblId, Str=DmdType]<br>a2_r79M \
=<br>  GEnum.diag<br>

    @ (Base.Rec (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>      \
Base.:*: Base.Rec (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>   \
lvl8_r79L<br><br>lvl9_r79N :: [Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>]<br>

[GblId]<br>lvl9_r79N =<br>  GHC.Base.map<br>    @ (Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>       Base.:*: Base.Rec (Base.Tree \
<a href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>    @ (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>

    lvl5_r79H<br>    a2_r79M<br><br>Main.$fGEnumTree_$cgenum [Occ=LoopBreaker]<br>  \
:: [Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>]<br>[GblId, \
Str=DmdType]<br>Main.$fGEnumTree_$cgenum =<br>  GEnum.||| @ (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>) lvl4_r79G lvl9_r79N<br>

</blockquote> Note how lvl9_r79N is a map over a2_r79M, and a2_r79M is a `diag`. \
Ok, we need the free theorem of `diag` to tell GHC how to commute the `diag` with \
map: <blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">

{-# RULES &quot;ft/diag&quot; forall f l. map f (diag l) = diag (map (map f) l) \
#-} </blockquote> Unfortunately this doesn&#39;t change the generated core \
code. With some more debugging looking at the generated code at each simplifier \
iteration, I believe that this is because a2_r79M got lifted out too soon, prevent \
the rule from applying. With some imagination I decided to try the -fno-full-laziness \
flag to prevent let-floating. I&#39;m not sure this is a good idea in general, but in \
this particular case it gives much better results:

<br><blockquote style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex;font-family:courier new,monospace" \
class="gmail_quote">a1_r72i :: [Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)]<br>

[GblId, Str=DmdType]<br>a1_r72i =<br>  GHC.Base.map<br>    @ (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>    @ (Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>    ((\ (tpl_B1 :: Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>) -&gt; tpl_B1)<br>

     `cast` (&lt;Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>&gt;<br>   \
-&gt; Sym (Base.NTCo:Rec &lt;Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>&gt;)<br>             :: (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a> -&gt; Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>

                  ~#<br>                (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a> -&gt; Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>))))<br>    \
Main.$fGEnumTree_$cgenum<br><br>Main.$fGEnumTree_$cgenum [Occ=LoopBreaker]<br>

  :: [Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>]<br>[GblId, \
Str=DmdType]<br>Main.$fGEnumTree_$cgenum =<br>  GEnum.|||<br>    @ (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>    (GHC.Types.:<br>

       @ (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>       \
(Base.Leaf @ <a href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>       \
(GHC.Types.[] @ (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>)))<br>

    (GEnum.diag<br>       @ (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>       (GHC.Base.map<br>          @ \
(Base.Rec (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>          \
@ [Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>]<br>

          (\ (x_a1yQ :: Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)) -&gt;<br>             \
GHC.Base.map<br>               @ (Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>

               @ (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>)<br>     \
(\ (x1_X1zB :: Base.Rec (Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>)) \
-&gt;<br>                  Base.Bin<br>

                    @ <a href="http://GHC.Types.Int">GHC.Types.Int</a><br>            \
a_r72h<br>                    (x_a1yQ<br>                     `cast` (Base.NTCo:Rec \
&lt;Base.Tree <a href="http://GHC.Types.Int">GHC.Types.Int</a>&gt;<br>

                             :: Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>) ~# Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>))<br>                    (x1_X1zB<br>   \
`cast` (Base.NTCo:Rec &lt;Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>&gt;<br>

                             :: Base.Rec (Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>) ~# Base.Tree <a \
href="http://GHC.Types.Int">GHC.Types.Int</a>)))<br>               a1_r72i)<br>       \
a1_r72i))<br> </blockquote>
<br>Note how our enum is now of the shape `[Leaf] ||| diag y`, which is good. The \
only catch is that there are some `Rec`s still laying around, with their associated \
newtype coercions, and a function a1_r72i that basically wraps the recursive \
enumeration in a Rec, only to be unwrapped in the body of `$fGEnumTree_$cgenum`. I \
don&#39;t know how to get GHC to simplify this code any further.<br>

<br><b>Question: why do I need -fno-full-laziness for the ft/diag rule to \
apply?</b><br><br><b>Question: why is GHC not getting rid of the Rec newtype in this \
case?</b><br><br>I have also played with -O2, in particular because of the SpecConstr \
optimisation, but found that it does not affect these particular examples (perhaps it \
only becomes important with larger datatypes). I have also experimented with phase \
control in the rewrite rules and the inline pragmas, but didn&#39;t find it necessary \
for this example. In general, anyway, my experience with the inliner is that it is \
extremely fragile, especially across different GHC versions, and it&#39;s hard to get \
any guarantees of optimisation. I have also played with the -funfolding-* options \
before, with mixed results. [1] It&#39;s also a pity that certain flags are not \
explained in detail in the user&#39;s manual [2,3], like -fliberate-case, and \
-fspec-constr-count and threshold, for instance.<br>

Thank you for reading this. Any insights are welcome. In particular, I&#39;m \
wondering if I might be missing some details regarding \
strictness. Cheers, Pedro [1] <a \
href="http://dreixel.net/research/pdf/ogie.pdf">http://dreixel.net/research/pdf/ogie.pdf</a>

[2] <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.ht \
ml">http://www.haskell.org/ghc/docs/latest/html/users_guide/flag-reference.html</a><br>[3] \
<a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html \
#options-f">http://www.haskell.org/ghc/docs/latest/html/users_guide/options-optimise.html#options-f</a><br>

_______________________________________________
Generics mailing list
Generics@haskell.org
http://www.haskell.org/mailman/listinfo/generics

[prev in list] [next in list] [prev in thread] [next in thread]