« Competitive programming in Haskell: Kadane’s algorithm » Competitive programming in Haskell: Codeforces Educational Round 114

Automatically updated, cached views with lens

Posted on September 17, 2021
Tagged , , ,

Recently I discovered a nice way to deal with records where certain fields of the record cache some expensive function of other fields, using the lens library. I very highly doubt I am the first person to ever think of this, but I don’t think I’ve seen it written down anywhere. I’d be very happy to be learn of similar approaches elsewhere.

The problem

Suppose we have some kind of record data structure, and an expensive-to-calculate function which computes some kind of “view”, or summary value, for the record. Like this:

data Record = Record
  { field1 :: A, field2 :: B, field3 :: C }

expensiveView :: A -> B -> C -> D
expensiveView = ...

(Incidentally, I went back and forth on whether to put real code or only pseudocode in this post; in the end, I decided on pseudocode. Hopefully it should be easy to apply in real situations.)

If we need to refer to the summary value often, we might like to cache the result of the expensive function in the record:

data Record = Record
  { field1 :: A, field2 :: B, field3 :: C, cachedView :: D }

expensiveView :: A -> B -> C -> D
expensiveView = ...

However, this has several drawbacks:

  1. Every time we produce a new Record value by updating one or more fields, we have to remember to also update the cached view. This is easy to miss, especially in a large codebase, and will most likely result in bugs that are very difficult to track down.

  2. Actually, it gets worse: what if we already have a large codebase that is creating updated Record values in various places? We now have to comb through the codebase looking for such places and modifying them to update the cachedExpensive field too. Then we cross our fingers and hope we didn’t miss any.

  3. Finally, there is nothing besides comments and naming conventions to prevent us from accidentally modifying the cachedExpensive field directly.

The point is that our Record type now has an associated invariant, and invariants which are not automatically enforced by the API and/or type system are Bad (tm).

Lens to the rescue

If you don’t want to use lens, you can stop reading now. (Honestly, given the title, I’m not even sure why you read this far.) In my case, I was already using it heavily, and I had a lightbulb moment when I realized how I could leverage it to add a safe cached view to a data type without modifying the rest of my codebase at all!

The basic idea is this:

  1. Add a field to hold the cached value as before.
  2. Don’t use lens’s TemplateHaskell utilites to automatically derive lenses for all the fields. Instead, declare them manually, such that they automatically update the cached field on every set operation.
  3. For the field with the cached value itself, declare a Getter, not a Lens.
  4. Do not export the constructor or field projections for your data type; export only the type and the lenses.

In pseudocode, it looks something like this:

module Data.Record
  (Record, field1, field2, field3, cachedView)
  where

import Control.Lens

data Record = Record
  { _field1 :: A, _field2 :: B, _field3 :: C, _cachedView :: D }

expensiveView :: A -> B -> C -> D
expensiveView = ...

recache :: Record -> Record
recache r = r { _cachedView = expensiveView (_field1 r) (_field2 r) (_field3 r) }

cachingLens :: (Record -> a) -> (Record -> a -> Record) -> Lens' Record a
cachingLens get set = lens get (\r a -> recache $ set r a)

field1 :: Lens' Record A
field1 = cachingLens _field1 (\r x -> r { _field1 = x })

field2 :: Lens' Record B
field2 = cachingLens _field2 (\r x -> r { _field2 = x })

field3 :: Lens' Record C
field3 = cachingLens _field3 (\r x -> r { _field3 = x })

cachedView :: Getter Record D
cachedView = to _cachedView

This solves all the problems! (1) We never have to remember to update the cached field; using a lens to modify the value of another field will automatically cause the cached view to be recomputed as well. (3) We can’t accidentally set the cached field, since it only has a Getter, not a Lens. In fact, this even solves (2), the problem of having to update the rest of our codebase: if we are already using lens to access fields in the record (as I was), then the rest of the codebase doesn’t have to change at all! And if we aren’t using lens already, then the typechecker will infallibly guide us to all the places we have to fix; once our code typechecks again, we know we have caught every single access to the record in the codebase.

Variant for only a few fields

What if we have a large record, and the cached summary value only depends on a few of the fields? In that case, we can save a bit of work for ourselves by getting lens to auto-generate lenses for the other fields, and only handcraft lenses for the fields that are actually involved. Like this:

{-# LANGUAGE TemplateHaskell #-}

data Record = Record
  { _field1 :: A, _field2 :: B, _cachedView :: C, ... }

expensiveView :: A -> B -> C
expensiveView = ...

let exclude = ['_field1, '_field2, '_cachedView] in
  makeLensesWith
    (lensRules & lensField . mapped . mapped %~ \fn n ->
      if n `elem` exclude then [] else fn n)
  ''Record

field1 :: Lens' Record A
field1 = ... similar to before ...

field2 :: Lens' Record B
field2 = ...

cachedView :: Getter Record C
cachedView = to _cachedView

But what about the lens laws?

You might worry that having a lens for one field automatically update the value of another field might break the lens laws somehow, but it’s perfectly legal, as we can check.

  1. view l (set l v s) ≡ v clearly holds: setting the cachedView on the side doesn’t change the fact that we get back out whatever we put into, say, field1.
  2. set l v’ (set l v s) ≡ set l v’ s also clearly holds. On the left-hand side, the cached summary value will simply get overwritten in the same way that the other field does.
  3. set l (view l s) s ≡ s is actually a bit more subtle. If we view the value of field1, then set it with the same value again, how do we know the value of the overall record s doesn’t change? In particular, could we end up with a different cachedView even though field1 is the same? But in fact, in this specific scenario (putting the same value back into a field that we just read), the value of the cachedView won’t change. This depends on two facts: first, that the expensiveView is a deterministic function which always returns the same summary value for the same input record. Of course this is guaranteed by the fact that it’s a pure function. Second, we must maintain the invariant that the cachedView is always up-to-date, so that recomputing the summary value after setting a field to the same value it already had will simply produce the same summary value again, because we know the summary value was correct to begin with. And of course, maintaining this invariant is the whole point; it’s guaranteed by the way we only export the lenses (and only a Getter for the cachedView) and not the record constructor.

And that’s it! I’ve been using this approach very successfully in a current project (the same project that got me to implement Hindley-Milner with unification-fd—watch this space for an announcement soon!). If you know of similar approaches that have been written about elsewhere, or if you end up using this technique in your own project, I’d love to hear about it.