Dynamic programming in Haskell: lazy immutable arrays
This is part 1 of a promised multi-part series on dynamic programming in Haskell. As a reminder, we’re using Zapis as a sample problem. In this problem, we are given a sequence of opening and closing brackets (parens, square brackets, and curly braces) with question marks, and have to compute the number of different ways in which the question marks could be replaced by brackets to create valid, properly nested bracket sequences.
Last time, we developed a recurrence for this problem and saw some naive, directly recursive Haskell code for computing it. Although this naive version is technically correct, it is much too slow, so our goal is to implement it more efficiently.
Mutable arrays?
Someone coming from an imperative background might immediately reach for some kind of mutable array, e.g. STUArray
. Every time we call the function, we check whether the corresponding array index has already been filled in. If so, we simply return the stored value; if not, we compute the value recursively, and then fill in the array before returning it.
This would work, but there is a better way!
Immutable arrays
While mutable arrays occasionally have their place, we can surprisingly often get away with immutable arrays, where we completely define the array up front and then only use it for fast lookups afterwards.
-
If the type of the array elements is suitable, and we can initialize the array elements all at once from a list using some kind of formula, map, scan, etc., we should use
UArray
since it is much faster thanArray
. -
However,
UArray
is strict in the elements, and the elements must be of a type that can be stored unboxed. If we need a more complex element type, or we need to compute the array recursively (where some elements depend on other elements), we can useArray
.
What about the vector
library, you ask? Well, it’s a very nice library, and quite fast, but unfortunately it is not available on many judging platforms, so I tend to stick to array
to be safe. However, if you’re doing something like Advent of Code or Project Euler where you get to run the code on your own machine, then you should definitely reach for vector
.
Lazy, recursive, immutable arrays
In my previous post on topsort we already saw the basic idea: since Array
s are lazy in their elements, we can define them recursively; the Haskell runtime then takes care of computing the elements in a suitable order. Previously, we saw this applied to automatically compute a topological sort, but more generally, we can use it to fill out a table of values for any recurrence.
So, as a first attempt, let’s just replace our recursive c
function from last time with an array. I’ll only show the solve
function for now; the rest of the code remains the same. (Spoiler alert: this solution works, but it’s ugly. We’ll develop much better solutions later.)
solve :: String -> Integer
solve str = c!(0,n)
where
n = length str
s = listArray (0,n-1) str
c :: Array (Int, Int) Integer
c = array ((0,0),(n,n)) $
[ ((i,i), 1) | i <- [0..n] ]
++
[ ((i,j),0) | i <- [0..n], j <- [0..n], even i /= even j ]
++
[ ((i,j),v)
| i <- [0..n], j <- [0..n], i /= j, even i == even j
, let v = sum [ m (s!i) (s!k) * c!(i+1,k) * c!(k+1,j) | k <- [i+1, i+3 .. j-1]]
]
We use the array
function to create an array, which takes first a pair of indices specifying the index range, and then a list of (index, value) pairs. (The listArray
function can also be particularly useful, when we have a list of values which are already in index order, as in the definition of s
.)
This solution is accepted, and it’s quite fast (0.04s for me). However, it’s really ugly, and although it’s conceptually close to our directly recursive function from before, the code is almost unrecognizably different. It’s ugly that we have to repeat conditions like i /= j
and even i == even j
, and binders like i <- [0..n]
; the multiple list comprehensions and nested pairs like ((i,j),v)
are kind of ugly, and the fact that this is implementing a recurrence is completely obscured.
However, I included this solution as a first step because for a long time, after I learned about using lazy immutable arrays to implement dynamic programming in Haskell, this was the kind of solution I wrote! Indeed, if you just think about the idea of creating a recursively defined array, this might be the kind of thing you come up with: we define an array c
using the array
function, then we have to list all its elements, and we get to refer to c
along the way.
Mutual recursion to the rescue
Most of the ugliness comes from losing sight of the fact that there is a function mapping indices to values: we simply listed out all the function’s input/output pairs without getting to use any of Haskell’s very nice facilities for defining functions! So we can clean up the code considerably if we make a mutually recursive pair of an array and a function: the array values are defined using the function, and the function definition can look up values in the array.
solve :: String -> Integer
solve str = cA!(0,n)
where
n = length str
s = listArray (0,n-1) str
cA :: Array (Int, Int) Integer
cA = array ((0,0),(n,n)) $
[ ((i,j), c (i,j)) | i <- [0 .. n], j <- [0 .. n] ]
c :: (Int, Int) -> Integer
c (i,j)
| i == j = 1
| even i /= even j = 0
| otherwise = sum
[ m (s!i) (s!k) * cA ! (i+1,k) * cA ! (k+1,j)
| k <- [i+1, i+3 .. j-1]
]
Much better! The c
function looks much the same as our naive version from before, with the one difference that instead of calling itself recursively, it looks up values in the array cA
. The array, in turn, is simply defined as a lookup table for the outputs of the function.
Generalized tabulation
One nice trick we can use to simplify the code a bit more is to use the range
function to generate the list of all valid array indices, and then just map
the c
function over this. This also allows us to use the listArray
function, since we know that the range
will generate the indices in the right order.
cA :: Array (Int, Int) Integer
cA = listArray rng $ map c (range rng)
where
rng = ((0,0), (n,n))
In fact, we can abstract this into a useful little function to create a lookup table for a function:
tabulate :: Ix i => (i,i) -> (i -> a) -> Array i a
tabulate rng f = listArray rng (map f $ range rng)
(We can generalize this even more to make it work for UArray
as well as Array
, but I’ll stop here for now. And yes, I intentionally named this to echo the tabulate
function from the adjunctions
package; Array i
is indeed a representable functor, though it’s not really possible to express without dependent types.)
The solution so far
Putting it all together, here’s our complete solution so far. It’s pretty good, and in fact it’s organized in a very similar way to Soumik Sarkar’s dynamic programming solution to Chemist’s Vows. (However, there’s an even better solution coming in my next post!)
import Control.Arrow
import Data.Array
main = interact $ lines >>> last >>> solve >>> format
format :: Integer -> String
format = show >>> reverse >>> take 5 >>> reverse
tabulate :: Ix i => (i,i) -> (i -> a) -> Array i a
tabulate rng f = listArray rng (map f $ range rng)
solve :: String -> Integer
solve str = cA!(0,n)
where
n = length str
s = listArray (0,n-1) str
cA :: Array (Int, Int) Integer
cA = tabulate ((0,0),(n,n)) c
c :: (Int, Int) -> Integer
c (i,j)
| i == j = 1
| even i /= even j = 0
| otherwise = sum
[ m (s!i) (s!k) * cA ! (i+1,k) * cA ! (k+1,j)
| k <- [i+1, i+3 .. j-1]
]
m '(' ')' = 1
m '[' ']' = 1
m '{' '}' = 1
m '?' '?' = 3
m b '?' | b `elem` "([{" = 1
m '?' b | b `elem` ")]}" = 1
m _ _ = 0
Coming up next: automatic memoization!
So what’s not to like about this solution? Well, I still don’t like the fact that we have to define a mutually recursive array and function. Conceptually, I want to name them both c
(or whatever) since they are really isomorphic representations of the exact same mathematical function. It’s annoying that I have to make up a name like cA
or c’
or whatever for one of them. I also don’t like that we have to remember to do array lookups instead of recursive calls in the function—and if we forget, Haskell will not complain! It will just be really slow.
Next time, we’ll see how to use some clever ideas from Conal Elliot’s MemoTrie
package (which themselves ultimately came from a paper by Ralf Hinze) to solve these remaining issues and end up with some really beautiful code!