« Announcing diagrams-0.4 » Wanted GHC feature: warn about unused constraints

More counting lambda terms

Posted on October 24, 2011
Tagged haskell

Yesterday someone submitted a comment to an old post of mine about counting linear lambda terms, asking the following question:

I am interested in a similar problem: given a natural number N, what is the number of different symply-typed closed lambda terms with size smaller than N?

They don’t need to be observationally different. They should be different modulo renaming of variables and types, though. So:

lambda x^A.x and lambda y^B.y

should be considered the same…
Do you know where I can find the answer? Answers using any kind of size measurement would be fine for me…

It’s an interesting question, and I thought it would make for a good way to illustrate one of my favorite techniques when trying to research a combinatorial question like this. Namely, write a program to compute the first few values by brute force, then look it up in the Online Encyclopedia of Integer Sequences and follow the references.

Let’s start with untyped lambda calculus terms.

> module CountSTLC where
> 
> type Nat = Int  -- just pretend

A type for deBruijn-indexed lambda terms.

> data Tm = Var Nat
>         | App Tm Tm
>         | Lam Tm
>   deriving (Eq, Ord, Show)

We’ll define the size of a term as the number of constructors used by its representation in the above data type.

> size :: Tm -> Nat
> size (Var _)     = 1
> size (App t1 t2) = 1 + size t1 + size t2
> size (Lam t)     = 1 + size t

Here’s a function to generate all the closed Tms of a given size. We pass along an index giving the current level of nesting (so we know what variables are available).

> genAll :: Nat -> [Tm]
> genAll = genAll' 0
>   where

There are no terms of size zero (or smaller).

>     genAll' _   n | n <= 0 = []

The only terms of size 1 are variables. We can choose to refer to any of the currently enclosing lambdas.

>     genAll' ctx 1 = map Var [0 .. ctx-1]

Otherwise, we could have an application (splitting the available size between the two sides in all possible ways) or a lambda (remembering to increment the nesting level).

>     genAll' ctx n = [ App t1 t2 
>                     | n1 <- [1 .. n-2]
>                     , t1 <- genAll' ctx n1
>                     , t2 <- genAll' ctx (n - n1 - 1)
>                     ]
>                  ++ (map Lam (genAll' (succ ctx) (n-1)))

Let’s see what we get:

*CountSTLC> genAll 4
[ Lam (App (Var 0) (Var 0))
, Lam (Lam (Lam (Var 0)))
, Lam (Lam (Lam (Var 1)))
, Lam (Lam (Lam (Var 2)))
]

Looks reasonable: there are four closed lambda terms of size 4. Now let’s count:

*CountSTLC> map (length . genAll) [1..10]
[0,1,2,4,13,42,139,506,1915,7558]

Searching for this sequence on OEIS turns up something promising, with a bit of information including a formula for computing the counts directly. It also has a link to someone’s research but unfortunately it seems dead. But it does have the email address of the person who submitted this sequence, and emailing him might be a good start!

This is all well and good, but the commenter actually asked for simply typed lambda calculus terms. The first size-4 lambda term above is not well-typed, since it involves a self-application. Can we modify our program to only generate well-typed lambda terms? This seems much more difficult. I may write about it in a future post. For now I’ll leave it as an exercise for the interested and motivated reader!