« foldr is made of monoids » Teaching abstraction

Combinatorial species definition

Posted on November 20, 2012
Tagged , , , ,

Continuing from my previous post, recall that the goal of species is to have a unified theory of containers with labeled1 locations. So, how do we actually specify such things (leaving aside for the moment the question of how we compute with them)?

We might imagine specifying them by:

On the face of it this seems quite natural (at least, it does to me). However, it works better to instead use a function from sets of labels to the subset of all structures containing precisely those labels.

In my experience teaching people about species, this often seems to be a source of confusion—it seems “backwards”. More generally, when thinking about a set \(B\) indexed by some other set \(A\) (in this case, structures indexed by their sets of labels), one might think to model this by a function \(B \to A\) (which tells us the index), but it actually works better to model it by a function \(A \to B\), which takes each “index” to the set of all things indexed by it.2 Hopefully as we work through the rest of the definition you’ll get a sense for why it works better this way. For now, I think the best advice is don’t assign computational significance to these functions from labels to structures. Just think of them as a convenient technical device to keep track of shapes indexed by labels.

In any case, the first half of the definition is:

(I deliberately chose the word mapping instead of function to emphasize, again, that we don’t particularly want to assign it computational significance.) Of course, the fact that a species takes sets “of labels” as input and outputs sets “of structures” doesn’t matter; any sets will do, so we might as well just say that a species maps sets to sets. We write \(F[U]\) for the species \(F\) applied to a set of labels \(U\), and call \(F[U]\) the set of “\(F\)-structures with labels drawn from \(U\)”, or simply “\(F\)-structures on \(U\)”, or even (when \(U\) is clear from context) just “\(F\)-structures”.

So far, however, this is rather uninteresting, and moreover it fails to adequately capture our intuition for what “structures” are. Intuitively, the labels are incidental, just like the variable names used in lambda terms are incidental: we must use them to be able to distinguish locations, but the precise objects we use as labels really “shouldn’t matter”. That is, given two sets of labels of the same size, we ought to have “the same” family of structures indexed by each. Of course they can’t be literally the same, because they have different labels! But they should be the same “up to relabeling”. We want to rule out the ability to have two same-size sets of labels indexing wildly different sets of structures: a species shouldn’t be able to “look at” the individual labels in order to “decide” what sort of structures to produce, just like a polymorphic type in Haskell can’t “look at” its type argument. The major difference is that species are allowed to “look at” the size of the label set.

Making this intuition precise is the clever part, and is really the pivotal point around which the whole theory revolves. Here’s how we do it. We don’t work with sizes of label sets directly; instead we work with bijections between label sets. (Of course, if there is a bijection between two finite sets then they are necessarily the same size.)

Given two label sets \(U\) and \(V\) which are related by the bijection \(\sigma\) (sometimes referred to as a relabeling), there must be a relationship between \(F[U]\) and \(F[V]\)—in particular they must also be in bijection. Here, then, is the second part of the definition:

(Note that we’re recycling notation here, using \(F[-]\) for the action of species on both label sets and bijections.) However, this still isn’t quite enough: we don’t want \(F[\sigma]\) to be just any bijection between \(F[U]\) and \(F[V]\). It really should be the specific bijection that “applies” \(\sigma\) to the labels contained within the structures in \(F[U]\). For example, it would be weird if the identity relabeling, when lifted through \(F\), resulted in some nontrivial reshuffling of the structures in \(F[U]\). It would also be strange if \(F\) didn’t respect composition, that is, if there were some \(\sigma, \tau\) such that \(F[\sigma] \circ F[\tau] \neq F[\sigma \circ \tau]\), since intuitively “applying \(\tau\) then applying \(\sigma\)” ought to be the same as “applying \((\sigma \circ \tau)\)”. So we add these as conditions:

Of course, all of this may look rather familiar if you know some basic category theory. Consider the category \(\mathbb{B}\) whose objects are sets and whose morphisms are bijections. Then all of the above can be summed up by the pithy

A species is an endofunctor on \(\mathbb{B}\).

Whew! We finally made it to the definition. However, working directly with the definition is not very convenient. In my next post I’ll begin explaining the more usual algebraic approach.

At this point I should mention that species were first introduced by André Joyal in his thesis (1981). Unfortunately it is in French, which I cannot read. Fortunately Bergeron, Labelle, and Leroux (1998) wrote an excellent reference text on the subject of species. Unfortunately it is in French too. Fortunately, Margaret Readdy translated it into English!

References

Bergeron, F., G. Labelle, and P. Leroux. 1998. Combinatorial species and tree-like structures. Trans. Margaret Readdy. Encyclopedia of Mathematics and its Applications. Cambridge: Cambridge University Press.

Joyal, André. 1981. “Une Théorie Combinatoire des Séries Formelles.” Advances in Mathematics 42: 1–82.


  1. A note on spelling: generally, “labeled” is the American spelling and “labelled” British (though “labelled” is also in common American usage, according to Merriam-Webster). I try to consistently use the American spelling, but will probably slip up occasionally, and you should use whichever spelling makes you happiest.↩︎

  2. I’ve seen this pattern show up multiple times in different category-theoretic contexts, but I don’t feel qualified to comment on it more generally. If you have any pointers to more general discussion of this idea/phenomenon I’d appreciate it.↩︎