« Enumerating linear inhabitants » Call for contributions: second edition of the Typeclassopedia

Binders Unbound

Posted on March 29, 2011
Tagged , , , , ,

Stephanie Weirich, Tim Sheard and I recently submitted a paper to ICFP entitled Binders Unbound. (You can read a draft here.) It’s about our kick-ass, I mean, expressive and flexible library, unbound (note: GHC 7 required), for generically dealing with names and binding structure when writing programs (compilers, interpreters, refactorers, proof assistants…) that work with syntax. Let’s look at a small example of representing untyped lambda calculus terms. This post is working Literate Haskell, feel free to save it to a .lhs file and play around with it yourself!

First, we need to enable lots of wonderful GHC extensions:

> {-# LANGUAGE MultiParamTypeClasses
>            , TemplateHaskell
>            , ScopedTypeVariables
>            , FlexibleInstances
>            , FlexibleContexts
>            , UndecidableInstances
>   #-}

Now to import the library and a few other things we’ll need:

> import Unbound.LocallyNameless
> 
> import Control.Applicative
> import Control.Arrow ((+++))
> import Control.Monad
> import Control.Monad.Trans.Maybe
> 
> import Text.Parsec hiding ((<|>), Empty)
> import qualified Text.Parsec.Token as P
> import Text.Parsec.Language (haskellDef)

We now declare a Term data type to represent lambda calculus terms.

> data Term = Var (Name Term)
>           | App Term Term
>           | Lam (Bind (Name Term) Term)
>   deriving Show

The App constructor is straightforward, but the other two constructors are worth discussing in more detail. First, the Var constructor holds a Name Term. Name is an abstract type for representing names, provided by Unbound. Names are indexed by the sorts of things to which they can refer (or more precisely, the sorts of things which can be substituted for them). Here, a variable is simply a name for some Term, so we use the type Name Term.

Lambdas are where names are bound, so we use the special Bind combinator, also provided by the library. Something of type Bind p b represents a pair consisting of a pattern p and a body b. The pattern may bind names which occur in b. Here is where the power of generic programming comes into play: we may use (almost) any types at all as patterns and bodies, and Unbound will be able to handle it with very little extra guidance from us. In this particular case, a lambda simply binds a single name, so the pattern is just a Name Term, and the body is just another Term.

Now we use Template Haskell to automatically derive a generic representation for Term:

> $(derive [''Term])

There are just a couple more things we need to do. First, we make Term an instance of Alpha, which provides most of the methods we will need for working with the variables and binders within Terms.

> instance Alpha Term

What, no method definitions? Nope! In this case (and in most cases) the default implementations, implemented in terms of automatically-derived generic representations, work just fine.

We also need to provide a Subst Term Term instance. In general, an instance for Subst b a means that we can use the subst function to substitute things of type b for Names occurring in things of type a. We override the isvar method so the library knows which constructor(s) of our type represent variables which can be substituted for.

> instance Subst Term Term where
>   isvar (Var v) = Just (SubstName v)
>   isvar _       = Nothing

OK, now that we’ve got the necessary preliminaries set up, what can we do with this? Here’s a little lambda-calculus evaluator:

> done :: MonadPlus m => m a
> done = mzero
> 
> step :: Term -> MaybeT FreshM Term
> step (Var _) = done
> step (Lam _) = done
> step (App (Lam b) t2) = do
>   (x,t1) <- unbind b
>   return $ subst x t2 t1
> step (App t1 t2) =
>       App <$> step t1 <*> pure t2
>   <|> App <$> pure t1 <*> step t2
> 
> tc :: (Monad m, Functor m) => (a -> MaybeT m a) -> (a -> m a)
> tc f a = do
>   ma' <- runMaybeT (f a)
>   case ma' of
>     Just a' -> tc f a'
>     Nothing -> return a
> 
> eval :: Term -> Term
> eval x = runFreshM (tc step x)

Note how we use unbind to take bindings apart safely, using the the FreshM monad (also provided by Unbound) for generating fresh names. We also get to use subst for capture-avoiding substitution. All without ever having to touch a de Bruijn index!

OK, but does it work? First, a little Parsec parser:

> lam :: String -> Term -> Term
> lam x t = Lam $ bind (string2Name x) t
> 
> var :: String -> Term
> var = Var . string2Name
> 
> lexer    = P.makeTokenParser haskellDef
> parens   = P.parens lexer
> brackets = P.brackets lexer
> ident    = P.identifier lexer
> 
> parseTerm = parseAtom `chainl1` (pure App)
> 
> parseAtom = parens parseTerm
>         <|> var <$> ident
>         <|> lam <$> (brackets ident) <*> parseTerm
> 
> runTerm :: String -> Either ParseError Term
> runTerm = (id +++ eval) . parse parseTerm ""

Now let’s try some arithmetic:

*Main> runTerm "([m][n][s][z] m s (n s z)) 
                ([s] [z] s (s z)) 
                ([s] [z] s (s (s z))) 
                s z"
Right (App (Var s) (App (Var s) (App (Var s) 
        (App (Var s) (App (Var s) (Var z))))))

2 + 3 is still 5, and all is right with the world.

This blog post has only scratched the surface of what’s possible. There are several other combinators other than just Bind for expressing binding structure: for example, nested bindings, recursive bindings and embedding terms within patterns are all supported. There are also other operations provided, such as free variable calculation, simultaneous unbinding, and name permutation. To learn more, see the package documentation, or read the paper!