The generic-random library, part 1: simple generic Arbitrary instances
(Note, 9/21/2017: The generic-random
package has been changed and updated quite a bit since I wrote this post. For more up-to-date information, see e.g. the tutorial included with the package.)
In a previous post I pointed out that we know all the theory to make nice, principled, practical random generators for recursive algebraic data types; someone just needed to step up and do the work. Well, Li-yao Xia took up the challenge and produced a brilliant package, generic-random, available on Hackage right now for you to use!
However, although the package does include some Haddock documentation, it is probably difficult for someone with no experience or background in this area to navigate. So I thought it would be worth writing a few blog posts by way of a tutorial and introduction to the package.
> {-# LANGUAGE GADTSyntax #-}
> {-# LANGUAGE DeriveGeneric #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE UndecidableInstances #-}
>
> import GHC.Generics
> import Test.QuickCheck
>
> import Generic.Random.Generic
The problem
First, a quick recap of the problem we are trying to solve: the obvious, naive way of generating random instances of some recursive algebraic data type often produces really terrible distributions. For example, one might generate really tiny structures most of the time and then occasionally generate a humongous one. For more background on the problem, see this post or this one.
A first example: generating generic Arbitrary
instances
As a first example, consider the following algebraic data type:
> data Foo where
> Bar :: Char -> Int -> String -> Foo
> Baz :: Bool -> Bool -> Foo
> Quux :: [Woz] -> Foo
> deriving (Show, Generic)
>
> data Woz where
> Wiz :: Int -> Woz
> Waz :: Bool -> Woz
> deriving (Show, Generic)
You have probably noticed by now that this is not recursive (well, except for the embedded lists). Patience! We’ll get to recursive ADTs in due time, but it turns out the library has some nice things to offer for non-recursive ADTs as well, and it makes for an easier introduction.
Now, suppose we wanted to use QuickCheck to test some properties of a function that takes a Foo
as an argument. We can easily make our own instances of Arbitrary
for Foo
and Woz
, like so:
instance Arbitrary Foo where
arbitrary = oneof
[ Bar <$> arbitrary <*> arbitrary <*> arbitrary
, Baz <$> arbitrary <*> arbitrary
, Quux <$> arbitrary
]
instance Arbitrary Woz where
arbitrary = oneof
[ Wiz <$> arbitrary
, Waz <$> arbitrary
]
This works reasonably well:
λ> sample (arbitrary :: Gen Foo)
Baz True True
Baz False True
Baz True True
Quux []
Baz False True
Bar '<' 3 "zy\\\SOHpO_"
Baz False True
Bar '\SOH' 0 "\"g\NAKm"
Bar 'h' (-9) "(t"
Quux [Wiz (-2),Waz False]
Baz False True
The only problem is that writing those instances is quite tedious. There is no thought required at all. Isn’t this exactly the sort of thing that is supposed to be automated with generic programming?
Why yes, yes it is. And the generic-random
package can do exactly that. Notice that we have derived Generic
for Foo
and Woz
. We can now use the genericArbitrary
function from Generic.Random.Generic
to derive completely standard Arbitrary
instances, just like the ones we wrote above:
> instance Arbitrary Foo where
> arbitrary = genericArbitrary
>
> instance Arbitrary Woz where
> arbitrary = genericArbitrary
λ> sample (arbitrary :: Gen Foo)
Quux []
Bar '\159' (-2) ""
Baz True True
Baz False False
Baz True True
Baz True False
Quux [Wiz 9,Wiz 7,Waz True,Waz True,Waz False]
Quux [Wiz (-10),Waz False,Waz False,Waz True,Waz True,Wiz (-14),Wiz 13,Waz True,Wiz (-8),Wiz 12,Wiz (-13)]
Bar '\130' 10 "FN\222j?\b=\237(\NULW\231+ts\245"
Bar 'n' 14 ""
Bar '\205' 4 "\SYN"
Seems about the same, except we wrote way less code! Huzzah!
If we want certain constructors to occur more frequently, we can also control that using genericArbitraryFrequency
, which takes a list of Int
s (each Int
specifies the weight for one constructor).
A few notes:
-
Using the
Generic.Random.Generic
module is the quickest and simplest way to generate random instances of your data type, if it works for your use case. -
It has some limitations, namely:
-
It only generates
Arbitrary
instances for QuickCheck. It can’t create more general random generators. -
It probably won’t work very well for recursive data types.
-
However, these limitations are addressed by other parts of the library. Intrigued? Read on!
Recursive types, the simple way
Let’s now consider a simple recursive type:
> data Tree a where
> Leaf :: a -> Tree a
> Branch :: Tree a -> Tree a -> Tree a
> deriving (Show, Generic)
>
> treeSize :: Tree a -> Int
> treeSize (Leaf _) = 1
> treeSize (Branch l r) = 1 + treeSize l + treeSize r
We can try using genericArbitrary
:
instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = genericArbitrary
The problem is that this tends to generate some tiny trees and some enormous trees, with not much in between:
λ> map treeSize replicateM 50 (generate (arbitrary :: Gen (Tree Int)))
[1,1,1,269,1,1,1,1,1,11,7,3,5,1,1,1,7,1,1,1,3,3,83,5,1,1,3,111,265,47,1,3,19,1,11,1,5,3,15,15,1,91,1,13,4097,119,1,15,5,3]
And this is not a problem specific to trees; this kind of thing is likely to happen for any recursive type.
Before we get to more interesting/complicated tools, it’s worth noting that random-generics
provides a simple mechanism to limit the size of the generated structures: the genericArbitrary’
function works like genericArbitrary
but uses QuickCheck’s sized
mechanism to cut off the recursion when it gets too big. The available size is partitioned among recursive calls, so it does not suffer from the exponential growth you might see if only the depth was limited. When the size counter reaches zero, the generator tries to terminate the recursion by picking some finite, non-recursive value(s). The parameter to genericArbitrary’
is a natural number specifying how deep the finite, recursion-terminating values can be. Z
(i.e zero) means the generator will only be willing to terminate the recursion with nullary constructors. In our case, Tree
does not have any nullary constructors, so we should not use Z
: if we do, the generator will be unable to terminate the recursion when the size reaches zero and we will get the same behavior as genericArbitrary
. Instead, we should use S Z
, which means it will be able to pick the depth-1 term Leaf x
(for some arbitrary x
) to terminate the recursion.
Let’s try it:
> instance (Arbitrary a, Generic a, BaseCases Z (Rep a)) => Arbitrary (Tree a) where
> arbitrary = genericArbitrary' (S Z)
λ> sample (arbitrary :: Gen (Tree Int))
Leaf 0
Branch (Leaf 0) (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0)))
Branch (Leaf (-1)) (Leaf 1)
Leaf (-3)
Leaf 7
Branch (Leaf (-4)) (Branch (Branch (Leaf 1) (Leaf (-1))) (Leaf (-1)))
Branch (Leaf (-2)) (Branch (Leaf 1) (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0))))
Leaf 14
Branch (Branch (Leaf 2) (Leaf 2)) (Branch (Branch (Branch (Leaf 1) (Branch (Branch (Leaf 0) (Branch (Leaf 0) (Leaf 0))) (Branch (Leaf 0) (Leaf 0)))) (Branch (Branch (Branch (Leaf 0) (Leaf 0)) (Leaf 0)) (Leaf 0))) (Leaf (-3)))
Leaf 4
Leaf 9
Ah, that’s much better.
Finally, genericArbitraryFrequency’
is the same as genericArbitraryFrequency
but limits the recursion depth as genericArbitrary’
does.
If you have a recursive data type you want to use with QuickCheck, it’s worth trying this, since it is quick and simple. The main problem with this approach is that it does not generate a uniform distribution of values. (Also, it is limited in that it is specifically tied to QuickCheck.) In this example, although you can’t necessarily tell just by looking at the sample random trees, I guarantee you that some kinds of trees are much more likely to be generated than others. (Though I couldn’t necessarily tell you which kinds.) This can be bad if the specific trees that will trigger a bug are in fact unlikely to be generated.
Next time, we’ll look at how we can actually have efficient, size-limited, uniform random generators using Boltzmann samplers.