Lightweight, efficiently sampleable enumerations in Haskell
For another project I’m working on, I needed a way to enumerate and randomly sample values from various potentially infinite collections. There are plenty of packages in this space, but none of them quite fit my needs:
-
universe
(and related packages) is very nice, but it’s focused on enumerating values of Haskell data types, not arbitrary sets: since it uses type classes, you have to make a new Haskell type for each thing you want to enumerate. It also uses actual Haskell lists of values, which doesn’t play nicely with sampling. -
enumerable
has not been updated in a long time and seems to be superseded byuniverse
. -
enumerate
is likewise focused on generating values of Haskell data types, with accompanying generic deriving machinery. -
size-based
is used as the basis for the venerabletesting-feat
library, but these are again focused on generating values of Haskell data types. I’m also not sure I need the added complexity of size-indexed enumerations. -
enumeration
looks super interesting, and I might be able to use it for what I want, but (a) I’m not sure whether it’s maintained anymore, and (b) it seems rather more complex than I need.
I really want something like Racket’s nice data/enumerate
package, but nothing like that seems to exist in Haskell. So, of course, I made my own! For now you can find it on GitHub.1 Here’s the package in a nutshell:
-
Enumerations are represented by the parameterized type
Enumeration
, which is an instance ofFunctor
,Applicative
, andAlternative
(but notMonad
). - Enumerations keep track of their cardinality, which could be either countably infinite or a specific natural number.
- Enumerations are represented as functions from index to value, so they can be efficiently indexed (which also enables efficient random sampling).
- The provided combinators will always do something sensible so that every value in the resulting enumeration occurs at a finite index. For example, if you take the disjoint union of two infinite enumerations, the resulting enumeration will alternate between values from the two inputs.
I wrote about something similar a few years ago. The main difference is that in that post I limited myself to only finite enumerations. There’s a lot more I could say but for now I think I will just show some examples:
>>> enumerate empty
[]
>>> enumerate unit
[()]
>>> enumerate $ empty <|> unit <|> unit
[(),()]
>>> enumerate $ finite 4 >< finiteList [27,84,17]
[(0,27),(0,84),(0,17),(1,27),(1,84),(1,17),(2,27),(2,84),(2,17),(3,27),(3,84),(3,17)]
>>> select (finite 4000000000000 >< finite 123456789) 0
(0,0)
>>> select (finite 4000000000000 >< finite 123456789) 196598723084073
(1592449,82897812)
>>> card (finite 4000000000000 >< finite 123456789)
Finite 493827156000000000000
>>> :set -XTypeApplications
>>> enumerate $ takeE 26 . dropE 65 $ boundedEnum @Char
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> take 10 . enumerate $ nat >< nat
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
>>> take 10 . enumerate $ cw
[1 % 1,1 % 2,2 % 1,1 % 3,3 % 2,2 % 3,3 % 1,1 % 4,4 % 3,3 % 5]
>>> take 15 . enumerate $ listOf nat
[[],[0],[0,0],[1],[0,0,0],[1,0],[2],[0,1],[1,0,0],[2,0],[3],[0,0,0,0],[1,1],[2,0,0],[3,0]]
data Tree = L | B Tree Tree
deriving (Eq, Show)
trees :: Enumeration Tree
trees = infinite $ singleton L <|> B <$> trees <*> trees
>>> take 3 . enumerate $ trees
[L,B L L,B L (B L L)]
>>> select trees 87239862967296
B (B (B (B (B L L) (B (B (B L L) L) L)) (B L (B L (B L L)))) (B (B (B L (B L (B L L))) (B (B L L) (B L L))) (B (B L (B L (B L L))) L))) (B (B L (B (B (B L (B L L)) (B L L)) L)) (B (B (B L (B L L)) L) L))
treesOfDepthUpTo :: Int -> Enumeration Tree
treesOfDepthUpTo 0 = singleton L
treesOfDepthUpTo n = singleton L <|> B <$> t' <*> t'
where t' = treesOfDepthUpTo (n-1)
>>> card (treesOfDepthUpTo 0)
Finite 1
>>> card (treesOfDepthUpTo 1)
Finite 2
>>> card (treesOfDepthUpTo 3)
Finite 26
>>> card (treesOfDepthUpTo 10)
Finite
14378219780015246281818710879551167697596193767663736497089725524386087657390556152293078723153293423353330879856663164406809615688082297859526620035327291442156498380795040822304677
>>> select (treesOfDepthUpTo 10) (2^50)
B L (B L (B L (B (B L (B (B L (B (B L L) L)) (B (B (B (B L L) (B L L)) (B L (B L L))) (B (B (B L L) L) (B (B L L) L))))) (B (B (B (B (B (B L L) L) (B (B L L) L)) (B L L)) (B (B (B (B L L) L) (B L (B L L))) (B (B (B L L) (B L L)) L))) (B (B (B (B L L) (B L L)) (B (B (B L L) L) L)) (B (B L L) (B (B (B L L) L) (B (B L L) L))))))))
Comments, questions, suggestions for additional features, etc. are all very welcome!
-
I chose the name
enumeration
before I realized there was already a package of that name on Hackage! So now I have to come up with another name that’s not already taken. Suggestions welcome…↩︎