Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)
In a previous post, we saw one way to implement our BFS API, but I claimed that it is not fast enough to solve Modulo Solitaire. Today, I want to demonstrate a faster implementation. (It’s almost certainly possible to make it faster still; I welcome suggestions!)
Once again, the idea is to replace the HashMap
s from last time with mutable arrays, but in such a way that we get to keep the same pure API—almost. In order to allow arbitrary vertex types, while storing the vertices efficiently in a mutable array, we will require one extra argument to our bfs
function, namely, an Enumeration
specifying a way to map back and forth between vertices and array indices.
So why not instead just restrict vertices to some type that can be used as keys of a mutable array? That would work, but would unnecessarily restrict the API. For example, it is very common to see competitive programming problems that are “just” a standard graph algorithm, but on a non-obvious graph where the vertices are conceptually some more complex algebraic type, or on a graph where the vertices are specified as strings. Typically, competitive programmers just implement a mapping between vertices to integers on the fly—using either some math or some lookup data structures on the side—but wouldn’t it be nicer to be able to compositionally construct such a mapping and then have the graph search algorithm automatically handle the conversion back and forth? This is exactly what the Enumeration
abstraction gives us.
This post is literate Haskell; you can obtain the source from the darcs repo. The source code (without accompanying blog post) can also be found in my comprog-hs repo.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Graph where
import Enumeration
import Control.Arrow ((>>>))
import Control.Monad
import Control.Monad.ST
import qualified Data.Array.IArray as IA
import Data.Array.ST
import Data.Array.Unboxed (UArray)
import qualified Data.Array.Unboxed as U
import Data.Array.Unsafe (unsafeFreeze)
import Data.Sequence (Seq (..), ViewL (..), (<|), (|>))
import qualified Data.Sequence as Seq
infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}
exhaustM
is like exhaust
from the last post, but in the context of an arbitrary Monad
. Each step will now be able to have effects (namely, updating mutable arrays) so needs to be monadic.
exhaustM :: Monad m => (a -> m (Maybe a)) -> a -> m a
exhaustM f = go
where
go a = do
ma <- f a
maybe (return a) go ma
The BFSResult
type is the same as before.
data BFSResult v =
BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }
Instead of using HashMap
s in our BFSState
as before, we will use STUArray
s.1 These are unboxed, mutable arrays which we can use in the ST
monad. Note we also define V
as a synonym for Int
, just as a mnemonic way to remind ourselves which Int
values are supposed to represent vertices.
type V = Int
data BFSState s =
BS { level :: STUArray s V Int, parent :: STUArray s V V, queue :: Seq V }
To initialize a BFS state, we allocate new mutable level and parent arrays (initializing them to all \(-1\) values), and fill in the level
array and queue with the given start vertices. Notice how we need to be explicitly given the size of the arrays we should allocate; we will get this size from the Enumeration
passed to bfs
.
initBFSState :: Int -> [V] -> ST s (BFSState s)
initBFSState n vs = do
l <- newArray (0,n-1) (-1)
p <- newArray (0,n-1) (-1)
forM_ vs $ \v -> writeArray l v 0
return $ BS l p (Seq.fromList vs)
The bfs’
function implements the BFS algorithm itself. Notice that it is not polymorphic in the vertex type; we will fix that with a wrapper function later. If you squint, the implementation looks very similar to the implementation of bfs
from my previous post, with the big difference that everything has to be in the ST
monad now.
bfs' :: Int -> [V] -> (V -> [V]) -> (V -> Bool) -> ST s (BFSState s)
bfs' n vs next goal = do
st <- initBFSState n vs
exhaustM bfsStep st
where
bfsStep st@BS{..} = case Seq.viewl queue of
EmptyL -> return Nothing
v :< q'
| goal v -> return Nothing
| otherwise -> v >$> next >>> filterM (fmap not . visited st)
>=> foldM (upd v) (st{queue=q'}) >>> fmap Just
upd p b@BS{..} v = do
lp <- readArray level p
writeArray level v (lp + 1)
writeArray parent v p
return $ b{queue = queue |> v}
visited :: BFSState s -> V -> ST s Bool
visited BS{..} v = (/= -1) <$> readArray level v
{-# INLINE visited #-}
The bfs
function is a wrapper around bfs’
. It presents the same API as before, with the exception that it requires an extra Enumeration v
argument, and uses it to convert vertices to integers for the inner bfs’
call, and then back to vertices for the final result. It also handles freezing the mutable arrays returned from bfs’
and constructing level and parent lookup functions that index into them. Note, the use of unsafeFreeze
seems unavoidable, since runSTUArray
only allows us to work with a single mutable array; in any case, it is safe for the same reason the use of unsafeFreeze
in the implementation of runSTUArray
itself is safe: we can see from the type of toResult
that the s
parameter cannot escape, so the type system will not allow any further mutation to the arrays after it completes.
bfs :: forall v. Enumeration v -> [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs Enumeration{..} vs next goal
= toResult $ bfs' card (map locate vs) (map locate . next . select) (goal . select)
where
toResult :: (forall s. ST s (BFSState s)) -> BFSResult v
toResult m = runST $ do
st <- m
(level' :: UArray V Int) <- unsafeFreeze (level st)
(parent' :: UArray V V) <- unsafeFreeze (parent st)
return $
BFSR
((\l -> guard (l /= -1) >> Just l) . (level' IA.!) . locate)
((\p -> guard (p /= -1) >> Just (select p)) . (parent' IA.!) . locate)
Incidentally, instead of adding an Enumeration v
argument, why don’t we just make a type class Enumerable
, like this?
class Enumerable v where
enumeration :: Enumeration v
bfs :: forall v. Enumerable v => [v] -> ...
This would allow us to keep the same API for BFS, up to only different type class constraints on v
. We could do this, but it doesn’t particularly seem worth it. It would typically require us to make a newtype
for our vertex type (necessitating extra code to map in and out of the newtype
) and to declare an Enumerable
instance; in comparison, the current approach with an extra argument to bfs
requires us to do nothing other than constructing the Enumeration
itself.
Using this implementation, bfs
is finally fast enough to solve Modulo Solitaire, like this:
main = C.interact $ runScanner tc >>> solve >>> format
data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)
tc :: Scanner TC
tc = do
m <- int
n <- int
TC m <$> int <*> n >< (Move <$> int <*> int)
type Output = Maybe Int
format :: Output -> ByteString
format = maybe "-1" showB
solve :: TC -> Output
solve TC{..} = getLevel res 0
where
res = bfs (finiteE m) [s0] (\v -> map (step m v) moves) (==0)
step :: Int -> Int -> Move -> Int
step m v (Move a b) = (a*v + b) `mod` m
{-# INLINE step #-}
It’s pretty much unchanged from before, except for the need to pass an Enumeration
to bfs
(in this case we just use finiteE m
, which is the identity on the interval \([0 .. m)\)).
Some remaining questions
This is definitely not the end of the story.
-
Submitting all this code (BFS,
Enumeration
, and the above solution itself) as a single file gives a 2x speedup over submitting them as three separate modules. That’s annoying—why is that? -
Can we make this even faster? My solution to Modulo Solitaire runs in 0.57s. There are faster Haskell solutions (for example, Anurudh Peduri has a solution that runs in 0.32s), and there are Java solutions as fast as 0.18s, so it seems to me there ought to be ways to make it much faster. If you have an idea for optimizing this code I’d be very interested to hear it! I am far from an expert in Haskell optimization.
-
Can we generalize this nicely to other kinds of graph search algorithms (at a minimum, DFS and Dijkstra)? I definitely plan to explore this question in the future.
For next time: Breaking Bad
Next time, I want to look at a few other applications of this BFS code (and perhaps see if we can improve it along the way); I challenge you to solve Breaking Bad.
-
Why not use
Vector
, you ask? It’s probably even a bit faster, but thevector
library is not supported on as many platforms.↩︎