Competitive programming in Haskell: building unordered trees
Tagged equality, Kattis, multiset, parsec, parsing, set, shape, subway, tree, competitive programming, haskell
In my previous post I challenged you to solve Subway Tree System, which encodes trees by recording sequences of steps taken away from and towards the root while exploring the whole tree, and asks whether two such recordings denote the same tree. There are two main difficulties here: the first is how to do the parsing; second, how to compare two trees when we don’t care about the order of children at each node. Thanks to all of you who posted your solutions—I learned a lot. I often feel like my solution is obviously the “only” solution, but then when I see how others solve a problem I realize that the solution space is much larger than I thought!
My solution
Here’s my solution, with some commentary interspersed. First, some pragmas and imports and such:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TupleSections     #-}
import           Control.Arrow
import           Data.Bool
import qualified Data.ByteString.Lazy.Char8  as C
import           Data.Function
import           Data.List
import           Data.List.Split
import           Data.Map                    (Map)
import qualified Data.Map                    as M
import           Text.Parsec
import           Text.Parsec.ByteString.Lazy
import           Text.Parsec.Char
My main then looks like this:
main = C.interact $
  C.lines >>> drop 1 >>> chunksOf 2 >>>
  map (solve >>> bool "different" "same") >>> C.unlines
The use of ByteString instead of String isn’t really necessary for this problem, just habit. I split the input into lines, group them in twos using Data.List.Split.chunksOf, solve each test case, and turn the output into different or same appropriately. (Data.Bool.bool is the fold/case analysis for the Bool type; I never use it in any other Haskell code but am unreasonably fond of it for this particular use case.) It would also be possible to use the Scanner abstraction instead of lines, drop, and chunksOf, as commenter blaisepascal2014 did. In some ways that would actually be nicer, but I often default to using these more basic tools in simple cases.
Parsing
Now for parsing the trees. The parsing is not too bad, and several commenters essentially did it manually with a recursive function manipulating a stack and so on; the most creative used a tree zipper to literally walk around the tree being constructed, just like you are supposedly walking around a subway in the problem. However, the parsec package is available in the Kattis environment, so the easiest thing is to actually whip up a proper little parser. (I know of several other Kattis problems which can be nicely solved using parser combinators but would be annoying otherwise, for example, Calculator and Otpor. A rather fiendish but fun parsing puzzle is Learning to Code.)
readTree :: C.ByteString -> Tree
readTree = parse parseTree "" >>> either undefined id
  where
    parseTree    = Node     <$> parseForest
    parseForest  = fromList <$> many parseSubtree
    parseSubtree = char '0' *> parseTree <* char '1'
Of course I haven’t actually shown the definition of Tree, Node, or fromList yet, but hopefully you get the idea. either undefined id is justified here since the input is guaranteed to be well-formed, so the parser will never actually fail with a Left.
Unordered trees
The other difficulty is how to compare trees up to reordering children. Trying all permutations of the children at each node and seeing whether any match is obviously going to be much too slow! The key insight, and what this problem had in common with the one from my previous post, is that we can use an (automatically-derived) Ord instance to sort the children at each node into a canonical order. We don’t really need to know or care what order they end up in, which depends on the precise details of how the derived Ord instance works. The point is that sorting into some consistent order allows us to efficiently test whether two lists are permutations of each other.
I think everyone who posted a solution created some kind of function to “canonicalize” a tree, by first canonicalizing all subtrees and then sorting them. When I first solved this problem, however, I approached it along slightly different lines, hinted at by commenter Globules: can we define the Tree type in such a way that there is only a single representation for each tree-up-to-reordering?
My first idea was to use a Data.Set of children at each node, but this is subtly wrong, since it gets rid of duplicates! We don’t actually want a set of children at each node, but rather a bag (aka multiset). So I made a little Bag abstraction out of a Map. The magical thing is that GHC can still derive an Ord instance for my recursive tree type containing a newtype containing a Map containing trees! (OK, OK, it’s not really magic, but it still feels magic…)
Now, actually, I no longer think this is the best solution, but it’s interesting, so I’ll leave it. Later on I will show what I think is an even better solution.
newtype Tree = Node (Bag Tree)
  deriving (Eq, Ord)
newtype Bag a = Bag (Map a Int)
  deriving (Eq, Ord)
fromList :: Ord a => [a] -> Bag a
fromList = map (,1) >>> M.fromListWith (+) >>> Bag
The final piece is the solve function, which simply calls readTree on the two strings and compares the resulting (canonical!) Tree values for equality.
solve :: [C.ByteString] -> Bool
solve [t1,t2] = ((==) `on` readTree) t1 t2A better way
I still think it’s a nice idea to have canonical-by-construction trees, rather than building ordered trees and then calling a separate function to canonicalize them afterwards. But inspired by several commenters’ solutions, I realized that rather than my complex Bag type, it’s much nicer to simply use a sorted list as the canonical representation of a Node’s bag of subtrees, and to use a smart constructor to build them:
newtype Tree = Node [Tree]
  deriving (Eq, Ord)
mkNode :: [Tree] -> Tree
mkNode = Node . sort
Then we just use mkNode instead of Node in the parser, and voilà! The canonicalization happens on the fly while parsing the tree. By contrast, if we write a separate canonicalization function, like
canonical :: Tree -> Tree
canonical (Node ts) = Node (map canonical (sort ts))it is actually possible to get it wrong. In fact, I deliberately introduced a bug into the above function: can you see what it is?
All told, then, here is the (in my opinion) nicest solution that I know of:
{-# LANGUAGE OverloadedStrings #-}
import           Control.Arrow
import           Data.Bool
import qualified Data.ByteString.Lazy.Char8 as C
import           Data.Function
import           Data.List
import           Text.Parsec
import           ScannerBS                  hiding (many)
main = C.interact $
  runScanner (numberOf (two str)) >>>
  map (solve >>> bool "different" "same") >>> C.unlines
solve :: [C.ByteString] -> Bool
solve [t1,t2] = ((==) `on` readTree) t1 t2
newtype Tree = Node [Tree] deriving (Eq, Ord)
readTree :: C.ByteString -> Tree
readTree = parse parseTree "" >>> either undefined id
  where
    parseTree    = (Node . sort) <$> many parseSubtree
    parseSubtree = char '0' *> parseTree <* char '1'Next problem
For Tuesday, I invite you to solve The Power of Substitution. Don’t let the high difficulty rating scare you; in my estimation it should be quite accessible if you know a bit of math and have been following along with some of my previous posts (YMMV). However, it’s not quite as obvious what the nicest way to write it in Haskell is.