July 7, 2016

While very concise and powerful, GHC’s intermediate language, Core is very rich and hard to tame. This post is about my quest to understand and manipulate Core while implementing an abstract machine to interpret the language (more on which *maybe* later).

I will first describe Core in its latest incarnation and motivate the need for understanding and using it. Then I will discuss my endeavours while using the language. Finally I will go over some of the difficulties I encountered along the way.

Like many modern compilers the Glasgow Haskell compiler (GHC) is built as a pipeline of consecutive program transformations. While we won’t go over the compiler in details, GHC’s pipeline parses, typechecks, desugars and finally simplifies the code. When desugaring the code, GHC converts all plain Haskell source code into a much simpler language, GHC Core.

Core is an explicitly typed functional language based on System Fc, a “variant of System Fw with equality constraints and coercions. In GHC, Core is indeed a very simple language:

```
type CoreExpr = Expr Var
data Expr b -- "b" for the type of binders,
= Var Id
| Lit Literal
| App (Expr b) (Arg b)
| Lam b (Expr b)
| Let (Bind b) (Expr b)
| Case (Expr b) b Type [Alt b]
| Cast (Expr b) Coercion
| Tick (Tickish Id) (Expr b)
| Type Type
type Arg b = Expr b
type Alt b = (AltCon, [b], Expr b)
data AltCon = DataAlt DataCon | LitAlt Literal | DEFAULT
data Bind b = NonRec b (Expr b) | Rec [(b, (Expr b))]
```

Then, together with a simple type for binders

```
data Bind b = NonRec b (Expr b)
| Rec [(b, (Expr b))]
```

a Haskell program gets desugared into a `CoreProgram`

which is simply a synonym for `[CoreBind]`

, which in turn is a synonym for `Bind CoreBndr`

where `type CoreBndr = Var`

.

The importance of this language lies in its simplicity: since it is much simpler than pure Haskell, applying transformations to a Core program should be a much simpler task. So simple, in fact, that GHC provides an API to enable developers to write their own compiler plugins. This is a great deal given that we can insert our plugin at any stage of the optimization pipeline to apply the transformation(s) we want.

There are several good examples of plugins such as the Herbie plugin that “improves the numerical stability of Haskell programs”, or this plugin which enables to make functions (previously annotated) strict.

Despite its simplicity, taming Core can be hard.

First of all, while there is an API, the documentation is poor and there is a generalized lack of examples. This is understandable (up to a point) given that writing compiler plugins is not a common task and something that not even the most experienced Haskell programmers would do on a normal day (I guess).

The most up to date description of the language, as far as I am aware is “An external representation of GHC Core”, a description of Core as used in GHC 6.10. Considering that we are already in GHC 8, this is not a promising outlook given that outdated documentation makes it even harder to read and understand a Core program such as produced by the flag `ddump-ds`

.

Take for example the following simple program:

```
-- Example.hs
module Example where
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n-1)
```

Compiling with `ghc -c -ddump-ds Example.hs`

gives us the following:

```
==================== Desugar (after optimization) ====================
Result size of Desugar (after optimization)
= {terms: 20, types: 8, coercions: 0}
Rec {
factorial [Occ=LoopBreaker] :: Integer -> Integer
[LclIdX, Str=DmdType]
factorial =
\ (ds_dKE :: Integer) ->
case ==
@ Integer
integer-gmp-1.0.0.0:GHC.Integer.Type.$fEqInteger
ds_dKE
(__integer 0)
of _ [Occ=Dead] {
False ->
(\ _ [Occ=Dead, OS=OneShot] ->
* @ Integer
GHC.Num.$fNumInteger
ds_dKE
(factorial
(- @ Integer GHC.Num.$fNumInteger ds_dKE (__integer 1))))
GHC.Prim.void#;
True -> __integer 1
}
end Rec }
```

This code dump contains a lot of information about the Haskell source code but, the untrained developer will have a hard time deciphering all the bits and pieces of this text. Since my attempt here is not to teach you the “basics” of Core but rather, understanding how to manipulate it, you can check Stephen Diehl’s introduction to Core to get you started.

We are now ready to play with the language.

Core is defined in the CoreSyn module. An easy way to get started is to simply import the GhcPlugins which exports all the necessary bits and pieces for us to play with Core and build actual Core plugins.

Manipulating `CoreExprs`

is not difficult. Most of the time we can simply pattern match on the expressions we want to manipulate. For example, I am building a (very rudimentary) abstract machine that interprets `CoreExprs`

so I can check if an expression is a value (i.e. a Lambda expression or a Literal) as follows:

```
isCoreValue :: CoreExpr -> Bool
isCoreValue x = case x of
(Lam _ _) -> True
(Lit _) -> True
_ -> False
```

Or I can perform substitution inside expressions:

```
subst :: CoreExpr -> Id -> CoreExpr -> CoreExpr
subst y x t =
case t of
Var _x -> if x == _x then y else t -- base case for substitution
Lam z _t -> Lam z $ subst y x _t
App m arg -> App (subst y x m) (subst y x arg)
Case exp b t alts -> Case (subst y x exp) b t (map sub alts)
Tick tickish exp -> Tick tickish (subst y x exp)
_ -> t
where sub (alt, bs, exp) = (alt, bs, subst y x exp)
```

While the above examples are straightforward, there might be times when we need to create a `CoreExpr`

from scratch.

For the implementation of my abstract machine, I need to prove certain algebra laws. I could write the laws in Haskell, generate the Core and try to interpret them but it would be much easier if I could write a `CoreExpr`

in a file, say

`core_id = (Lam x (Var x))`

and interpret that file directly without the need to go through Haskell.

It turns out that around 2008 there was a Core interpreter in GHC, but for some reasons (probably not enough interest in the area? too hard to maintain?) it seized to exist soon after.

So, we either need to go through Haskell in order to get `CoreExprs`

we can manipulate, or we can attempt to use the API to create some for us. I did just that, and while I think there is probably a better/easier way, it gets the job done, for now at least. Here is we could recreate the above `core_id`

function:

```
-- (\x -> x)
core_id :: CoreExpr
core_id = Lam (buildId "x") (Var $ buildId "x")
-- | Builds an Id with varName
buildId varName = mkGlobalVar VanillaId name typ vanillaIdInfo
where
name = mkInternalName dunique (varOccName varName) noSrcSpan
dunique = mkUnique '_' 0
varOccName var = mkVarOcc var
typ = mkTyVarTy tyvar
tyvar = mkTyVar name anyKind
```

The above is more complicated that I thought it would be. As we saw above, a Core `Var`

has an `Id`

which contains a lot of information generated by the compiler. Here is its declaration:

```
-- | Essentially a typed 'Name', that may also contain some additional information
-- about the 'Var' and it's use sites.
data Var
= TyVar { -- Type and kind variables
-- see Note [Kind and type variables]
varName :: !Name,
realUnique :: FastInt, -- Key for fast comparison
-- Identical to the Unique in the name,
-- cached here for speed
varType :: Kind -- ^ The type or kind of the 'Var' in question
}
| TcTyVar { -- Used only during type inference
-- Used for kind variables during
-- inference, as well
varName :: !Name,
realUnique :: FastInt,
varType :: Kind,
tc_tv_details :: TcTyVarDetails }
| Id {
varName :: !Name,
realUnique :: FastInt,
varType :: Type,
idScope :: IdScope,
id_details :: IdDetails, -- Stable, doesn't change
id_info :: IdInfo } -- Unstable, updated by simplifier
```

So, by browsing through the API and testing pretty much all the functions (since, as I said above the documentation is almost nonexistent) I managed to build an `Id`

with the minimum amount of effort required to prove the laws in my algebra, not without hitting some problems first.

In fact, every Name contains a `Unique`

identifier that is “used in many places in GHC for fast ordering and equality tests”. This means that if we are not careful we could end up with different `CoreExprs`

being equal during an equality check! To solve this we need to generate the `Uniques`

from a `UniqSupply`

value, complicating things even more.

All the above reflects what I have been doing for the past month or so. Dealing with Core can be hard and complicated but it is also a lot of fun really trying to understand what happens inside GHC.

I am not sure if there are easier/better ways to deal with Core but so far as I know there are not many projects that attempt to generate Core from scratch, other than the discontinued interpreter I’ve only come across projects (plugins) that manipulate Core programs that are generated directly by the compiler.