Competitive programming in Haskell: topsort via laziness
In my previous post, I challenged you to solve Letter Optimiztion. In this problem, we have a directed acyclic graph where each vertex represents a person, and there is an edge p -> q
when person p
sends their finished envelopes to person q
. Also:
- Some people may send their envelopes to multiple other people, in which case they send a certain percentage of their output to each.
- Each person has a maximum speed at which they are able to process envelopes, measured in envelopes per second.
- The people with no inputs are assumed to have an infinite stack of envelopes and therefore work at their maximum speed.
- There are guaranteed to be no cycles.
The problem is to figure out which people are actually working at their maximum speed. Of course, the reason this is interesting is that the rate at which a person can work is partially determined by the rate at which envelopes are coming to them, which depends on the rates at which people before them in the pipeline are working, and so on.
The typical way to solve this would be to first topologically sort the people (e.g. using a DFS or Kahn’s Algorithm), then fill in the speed of each person in order of the topological sort. That way, when we calculate each person’s rate, we already know the rates of anyone that sends them input. This can also be thought of as a particularly simple form of dynamic programming.
Get rid of topological sort with this one neat trick
However, there is a nice trick we can use in Haskell to save ourselves a bunch of work: instead of doing an explicit topological sort, we can simply define a lazy, recursive array or map with the final values we want; laziness will take care of evaluating the array or map entries in the correct order. Essentially, we are co-opting the Haskell runtime into doing a topological sort for us!
Let’s see some code! First, some pragmas, imports, and boring utility functions. (For an explanation of the Scanner
import, see this post and also this one.)
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE TupleSections #-}
import Control.Arrow (second, (>>>))
import Data.Array
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Set (Set)
import qualified Data.Set as S
import ScannerBS
infixl 0 >$>
(>$>) = flip ($)
showB :: Show a => a -> C.ByteString
showB = show >>> C.pack
Now for some data types to represent the input, and some code to parse it.
data Person = Person { maxSpeed :: Double, sends :: [(Int, Double)] }
deriving (Eq, Show)
data TC = TC { n :: Int, people :: Array Int Person }
deriving (Eq, Show)
tc :: Scanner TC
tc = do
n <- int
people <- listArray (1,n) <$> (n >< (Person <$> double <*> numberOf ((,) <$> int <*> ((/100) <$> double))))
return TC{..}
As an aside, notice how I use a record wildcard to create the output TC
value. I find this a quick, simple, and consistent way to structure my scanning code, without having to come up with multiple names for the same thing. I don’t know whether I would ever use it in production code; I’ll leave that to others for debate.
To solve the problem, we take an array production
holding the computed production speeds for each person (we’ll see how to build it in a minute), and extract the people who are working at their max speed.
main = C.interact $ runScanner tc >>> solve >>> map showB >>> C.unwords
solve :: TC -> [Int]
solve TC{..} =
production >$> assocs >>>
filter (\(p,u) -> abs (u - maxSpeed (people!p)) < 0.0001) >>>
map fst
How do we compute the array of production speeds? First, we build a map from each person to their set of inputs:
where
-- inputMap!p = set of people from whom p gets input, with percentage for each
inputMap :: Array Int (Set (Int,Double))
inputMap = accumArray (flip S.insert) S.empty (1,n) (concatMap getInputs (assocs people))
getInputs :: (Int, Person) -> [(Int, (Int, Double))]
getInputs (p, Person _ ss) = map (second (p,)) ss
Now we create a lazy, recursive Array
that maps each person to their production speed. Notice how the definition of production
refers to itself: this works because the Array
type is lazy in the values stored in the array. The values are not computed until we actually demand the value stored for a particular index; the Haskell runtime then goes off to compute it, which may involve demanding the values at other indices, and so on.
production :: Array Int Double
production = array (1,n)
[ (p,u)
| p <- [1 .. n]
, let m = maxSpeed (people!p)
i = (inputMap!p) >$> S.toList >>> map (\(x,pct) -> pct * (production!x)) >>> sum
u = if S.null (inputMap!p) then m else min i m
]
For each person p
, m
is their maximum speed, i
is the sum of all the production coming from their inputs (depending on their inputs’ own production speeds), and the person’s production speed u
is the minimum of their input and their maximum speed (or simply their maximum speed if they have no inputs).
For next time
For next time, I challenge you to solve Alien Math!