Collecting unstructured information with the monoid of partial knowledge
In my last post, I described what I’m calling the “monoid of partial knowledge”, a way of creating a monoid over sets of elements from a preorder, which is a generalization of the familiar monoid \((S,\max)\) over a set with a total order and a smallest element.
There’s actually one situation where a special case of this monoid is commonly used in Haskell. Suppose you have a record type which contains several fields, and you would like to parse some input to create a value of this type. The problem is that the input is not very nice: the bits of input which designate values for various fields are not in any particular order; some occur more than once; some might even be missing. How to deal with this?
One common solution is to wrap the type of each field in the record with Maybe, and create a Monoid instance which allows you to combine two partial records into one (hopefully less partial) record. Using this framework, you can just parse each bit of input into a mostly-blank record, with one (or more) fields filled in, then combine all the records (with mconcat) into one summary record. For example:
import Data.Monoid import Control.Monad -- for the MonadPlus instance of Maybe data Record = R { name :: Maybe String, age :: Maybe Int } deriving (Eq, Show) instance Monoid Record where mempty = R Nothing Nothing mappend r1 r2 = R { name = name r1 `mplus` name r2 , age = age r1 `mplus` age r2 }
The mplus function, from the MonadPlus instance for Maybe, simply takes the first Just value that it finds. This is nice and simple, and works just great:
> mempty :: Record R {name = Nothing, age = Nothing} > mconcat [mempty { name = Just "Brent" }] R {name = Just "Brent", age = Nothing} > mconcat [mempty { name = Just "Brent" }, mempty { age = Just 26 }] R {name = Just "Brent", age = Just 26} > mconcat [mempty { name = Just "Brent" }, mempty { age = Just 26 }, mempty { age = Just 23 }] R {name = Just "Brent", age = Just 26}
Note how the appending is left-biased, because mplus is left-biased: after seeing the first age, all subsequent ages are ignored.
Now, really what we’re doing here is using the monoid (as in my previous post) induced by the preorder which says that any name is \(\lesssim\) any other name, and any age is \(\lesssim\) any other age, and names and ages can’t be compared!
Some code is in order. First, we create a class for preorders, and a newtype to contain sets of elements (there’s already a Monoid instance for Set, so we need a newtype to give a different semantics). Then we translate the specification from my previous post directly into a Monoid instance.
import qualified Data.Set as S -- laws: -- if x == y, then x <~ y -- if x <~ y and y <~ z, then x <~ z class (Eq p) => Preorder p where (<~) :: p -> p -> Bool newtype PStar p = PStar { unPStar :: (S.Set p) } deriving Show instance (Preorder p, Ord p) => Monoid (PStar p) where mempty = PStar S.empty mappend (PStar ss) (PStar ts) = PStar $ S.filter (\s -> all (\t -> s == t || t <~ s || not (s <~ t)) $ S.toList ts) ss `S.union` S.filter (\t -> all (\s -> s == t || not (t <~ s)) $ S.toList ss) ts
Pretty straightforward! Of course, there are probably more efficient ways to do this, but I don’t care about that at the moment: let’s get some working proof-of-concept code, and worry about efficiency later. Now, one thing you may notice is that our Monoid instance requires an Ord instance for p! “But I thought we only needed a preorder?” I hear you cry. Well, the Ord is just there so that we can use an efficient implementation of Set. In particular, note that the Ord instance need have nothing at all to do with the Preorder instance, and won’t affect the semantics of this Monoid instance in any way. If we wanted, we could do away with the Ord entirely and use lists of unique elements (or something like that) instead of Sets.
So, how do we translate our record example from above into this new framework? Easy, we just create a data type to represent the different kinds of facts we want to represent, along with a convenience method or two, and make it an instance of Preorder:
data Fact = Name String | Age Int deriving (Eq,Show,Ord) fact :: Fact -> PStar Fact fact f = PStar (S.singleton f) instance Preorder Fact where (Name _) <~ (Name _) = True (Age _) <~ (Age _) = True _ <~ _ = False
Let’s try it!
> mempty :: PStar Fact PStar {unPStar = fromList []} > mconcat . map fact $ [Name "Brent"] PStar {unPStar = fromList [Name "Brent"]} > mconcat . map fact $ [Name "Brent", Age 26] PStar {unPStar = fromList [Name "Brent",Age 26]} > mconcat . map fact $ [Name "Brent", Age 26, Age 23] PStar {unPStar = fromList [Name "Brent",Age 26]}
Neato! But the really cool thing is all the extra power we get now: we can easily tweak the semantics of the Monoid instance by altering the Preorder instance. For example, suppose we want the first name that we see, but the oldest age. Easy peasy:
instance Preorder Fact where (Name _) <~ (Name _) = True (Age m) <~ (Age n) = m <= n _ <~ _ = False > mconcat . map fact $ [Age 23, Name "Brent"] PStar {unPStar = fromList [Name "Brent",Age 23]} > mconcat . map fact $ [Age 23, Name "Brent", Age 24] PStar {unPStar = fromList [Name "Brent",Age 24]} > mconcat . map fact $ [Age 23, Name "Brent", Age 24, Name "Joe", Age 26] PStar {unPStar = fromList [Name "Brent",Age 26]}
Of course, we could have done this with the Record model, but it wouldn’t be terribly elegant. But we’re not done: let’s say we want to remember the oldest age we find, and the first name, unless the age is over 50, in which case we don’t want to remember the name (I admit this is a bit contrived)? That’s not too hard either:
instance Preorder Fact where (Name _) <~ (Name _) = True (Age m) <~ (Age n) = m <= n (Name _) <~ (Age n) = n > 50 _ <~ _ = False > mconcat . map fact $ [Name "Brent", Age 26] PStar {unPStar = fromList [Name "Brent",Age 26]} > mconcat . map fact $ [Name "Brent", Age 26, Age 45] PStar {unPStar = fromList [Name "Brent",Age 45]} > mconcat . map fact $ [Name "Brent", Age 26, Age 45, Age 53] PStar {unPStar = fromList [Age 53]}
This would have been a huge pain to do with the Record model! Now, this isn’t an unqualified improvement; there are several things we can’t do here. One is if we want to be able to combine facts into larger compound facts: we can do that fairly straightforwardly with the Record-of-Maybes model, but not with the preorder-monoid model. We also can’t easily choose to have some fields be left-biased and some right-biased (the Monoid instance for PStar has left-bias built in!). But it’s certainly an interesting approach.
Now, one thing we do have to be careful of is that our Preorder instances really do define a preorder! For example, at first I tried using \(n < 18\) in the above Preorder instance instead of \(n > 50\), and was confused by the weird results I got. But such a Preorder instance violates transitivity, so no wonder I was getting weird semantics. =) It would be interesting to reformulate this in a dependently typed language like Agda, where creating a Preorder could actually require a proof that it satisfied the preorder axioms.
Thanks to Conal Elliott for some suggestions on making the formulation in the previous post more elegant – we’ll see what comes of it!