Competitive programming in Haskell: sparse tables
Tagged monoid, semigroup, idempotent, range, query, sum, sparse, table, Haskell, competitive programming
Continuing a series of posts on techniques for calculating range queries, today I will present the sparse table data structure, for doing fast range queries on a static sequence with an idempotent combining operation.
Motivation
In my previous post, we saw that if we have a static sequence and a binary operation with a group structure (i.e. every element has an inverse), we can precompute a prefix sum table in \(O(n)\) time, and then use it to answer arbitrary range queries in \(O(1)\) time.
What if we don’t have inverses? We can’t use prefix sums, but can we
do something else that still allows us to answer range queries in
\(O(1)\)? One thing we could always do would be to construct an \(n \times n\) table storing the answer to every possible range
query—that is, \(Q[i,j]\) would store the value of the range \(a_i \diamond \dots \diamond a_j\). Then we could just look up the answer to
any range query in \(O(1)\). Naively computing the value of each
\(Q[i,j]\) would take \(O(n)\) time, for a total of \(O(n^3)\) time to fill
in each of the entries in the tableWe only have to fill in \(Q[i,j]\)
where \(i < j\), but this is still about \(n^2/2\) entries.
, though it’s not
too hard to fill in the table in \(O(n^2)\) total time, spending only
\(O(1)\) to fill in each entry—I’ll leave this to you as an exercise.
However, \(O(n^2)\) is often too big. Can we do better? More generally, we are looking for a particular subset of range queries to precompute, such that the total number is asymptotically less than \(n^2\), but we can still compute the value of any arbitrary range query by combining some (constant number of) precomputed ranges. In the case of a group structure, we were able to compute the values for only prefix ranges of the form \(1 \dots k\), then compute the value of an arbitrary range using two prefixes, via subtraction.
A sparse table is exactly such a scheme for precomputing a subset of
ranges.In fact, I believe, but do not know for sure, that this is
where the name “sparse table” comes from—it is “sparse” in the sense
that it only stores a sparse subset of range values.
Rather than only
a linear number of ranges, as with prefix sums, we have to compute
\(O(n \lg n)\) of them, but that’s still way better than \(O(n^2)\). Note,
however, that a sparse table only works when the combining operation
is idempotent, that is, when \(x \diamond x = x\) for all \(x\). For
example, we can use a sparse table with combining operations such as
\(\max\) or \(\gcd\), but not with \(+\) or \(\times\). Let’s see how it works.
Sparse tables
The basic idea behind a sparse table is that we precompute a series of “levels”, where level \(i\) stores values for ranges of length \(2^i\). So level \(0\) stores “ranges of length \(1\)”—that is, the elements of the original sequence; level \(1\) stores ranges of length \(2\); level \(2\) stores ranges of length \(4\); and so on. Formally, \(T[i,j]\) stores the value of the range of length \(2^i\) starting at index \(j\). That is,
\[T[i,j] = a_j \diamond \dots \diamond a_{j+2^i-1}.\]
We can see that \(i\) only needs to go from \(0\) up to \(\lfloor \lg n \rfloor\); above that and the stored ranges would be larger than the entire sequence. So this table has size \(O(n \lg n)\).
Two important questions remain: how do we compute this table in the first place? And once we have it, how do we use it to answer arbitrary range queries in \(O(1)\)?
Computing the table is easy: each range on level \(i\), of length \(2^i\), is the combination of two length-\(2^{i-1}\) ranges from the previous level. That is,
\[T[i,j] = T[i-1, j] \diamond T[i-1, j+2^{i-1}]\]
The zeroth level just consists of the elements of the original sequence, and we can compute each subsequent level using values from the previous level, so we can fill in the entire table in \(O(n \lg n)\) time, doing just a single combining operation for each value in the table.
Once we have the table, we can compute the value of an arbitrary range \([l,r]\) as follows:
Compute the biggest power of two that fits within the range, that is, the largest \(k\) such that \(2^k \leq r - l + 1\). We can compute this simply as \(\lfloor \lg (r - l + 1) \rfloor\).
Look up two range values of length \(2^k\), one for the range which begins at \(l\) (that is, \(T[k, l]\)) and one for the range which ends at \(r\) (that is, \(T[k, r - 2^k + 1]\)). These two ranges overlap; but because the combining operation is idempotent, combining the values of the ranges yields the value for our desired range \([l,r]\).
This is why we require the combining operation to be idempotent: otherwise the values in the overlap would be overrepresented in the final, combined value.
Haskell code
Let’s write some Haskell code! First, a little module for idempotent
semigroups. Note that we couch everything in terms of semigroups,
not monoids, because we have no particular need of an identity
element; indeed, some of the most important examples like \(\min\) and
\(\max\) don’t have an identity element. The IdempotentSemigroup
class has no methods, since as compared to Semigroup
it only adds a
law. However, it’s still helpful to signal the requirement. You
might like to convince yourself that all the instances listed below
really are idempotent.
module IdempotentSemigroup where
import Data.Bits
import Data.Semigroup
-- | An idempotent semigroup is one where the binary operation
-- satisfies the law @x <> x = x@ for all @x@.
class Semigroup m => IdempotentSemigroup m
instance Ord a => IdempotentSemigroup (Min a)
instance Ord a => IdempotentSemigroup (Max a)
instance IdempotentSemigroup All
instance IdempotentSemigroup Any
instance IdempotentSemigroup Ordering
instance IdempotentSemigroup ()
instance IdempotentSemigroup (First a)
instance IdempotentSemigroup (Last a)
instance Bits a => IdempotentSemigroup (And a)
instance Bits a => IdempotentSemigroup (Ior a)
instance (IdempotentSemigroup a, IdempotentSemigroup b) => IdempotentSemigroup (a,b)
instance IdempotentSemigroup b => IdempotentSemigroup (a -> b)
Now, some code for sparse tables. First, a few imports.
{-# LANGUAGE TupleSections #-}
module SparseTable where
import Data.Array (Array, array, (!))
import Data.Bits (countLeadingZeros, finiteBitSize, (!<<.))
import IdempotentSemigroup
The sparse table data structure itself is just a 2D array over some
idempotent semigroup m
. Note that UArray
would be more efficient,
but (1) that would make the code for building the sparse table more
annoying (more on this later), and (2) it would require a bunch of
tedious additional constraints on m
.
newtype SparseTable m = SparseTable (Array (Int, Int) m)
deriving (Show)
We will frequently need to compute rounded-down base-two logarithms,
so we define a function for it. A straightforward implementation
would be to repeatedly shift right by one bit and count the number of
shifts needed to reach zero; however, there is a better way, using
Data.Bits.countLeadingZeros
. It has a naive default implementation
which counts right bit shifts, but in most cases it compiles down to
much more efficient machine instructions.
-- | Logarithm base 2, rounded down to the nearest integer. Computed
-- efficiently using primitive bitwise instructions, when available.
lg :: Int -> Int
= finiteBitSize n - 1 - countLeadingZeros n lg n
Now let’s write a function to construct a sparse table, given a
sequence of values. Notice how the sparse table array st
is defined
recursively.
This works because the Array
type is lazy in the stored values, with
the added benefit that only the array values we end up actually
needing will be computed. However, this comes with a decent amount of
overhead. If we wanted to use an unboxed array instead, we wouldn’t
be able to use
the recursive definition trick; instead, we would have to use an
STUArray
and fill in the values in a specific order. The code for this would
be longer and much more tedious, but could be faster if we end up
needing all the values in the array anyway.
-- | Construct a sparse table which can answer range queries over the
-- given list in $O(1)$ time. Constructing the sparse table takes
-- $O(n \lg n)$ time and space, where $n$ is the length of the list.
fromList :: IdempotentSemigroup m => [m] -> SparseTable m
= SparseTable st
fromList ms where
= length ms
n = lg n
lgn
=
st 0, 0), (lgn, n - 1)) $
array ((zip ((0,) <$> [0 ..]) ms
++ [ ((i, j), st ! (i - 1, j) <> st ! (i - 1, j + 1 !<<. (i - 1)))
| i <- [1 .. lgn]
<- [0 .. n - 1 !<<. i]
, j ]
Finally, we can write a function to answer range queries.
-- | \$O(1)$. @range st l r@ computes the range query which is the
-- @sconcat@ of all the elements from index @l@ to @r@ (inclusive).
range :: IdempotentSemigroup m => SparseTable m -> Int -> Int -> m
range (SparseTable st) l r = st ! (k, l) <> st ! (k, r - (1 !<<. k) + 1)
where
= lg (r - l + 1) k
Applications
Most commonly, we can use a sparse table to find the minimum or
maximum values on a range, \(\min\) and \(\max\) being the quintessential
idempotent operations. For example, this plays a key role in a
solution to the (quite tricky) problem
Ograda.At first it
seemed like that problem should be solvable with some kind of sliding
window approach, but I couldn’t figure out how to make it work!
What if we want to find the index of the minimum or maximum value in
a given range (see, for example, Worst Weather)? We can easily accomplish this using the semigroup Min (Arg m i)
(or Max (Arg m i)
), where m
is the type of the values and i
is
the index type. Arg
, from Data.Semigroup
, is just a pair which uses only the first value
for its Eq
and Ord
instances, and carries along the second value
(which is also exposed via Functor
, Foldable
, and Traversable
instances). In the example below, we can see that the call to range st 0 3
returns both the max value on the range (4
) and its index
(2
) which got carried along for the ride:
λ> :m +Data.Semigroup
λ> st = fromList (map Max (zipWith Arg [2, 3, 4, 2, 7, 4, 9] [0..]))
λ> range st 0 3
Max {getMax = Arg 4 2}
Finally, I will mention that being able to compute range minimum queries is one way to compute lowest common ancestors for a (static, rooted) tree. First, walk the tree via a depth-first search and record the depth of each node encountered in sequence, a so-called Euler tour (note that you must record every visit to a node—before visiting any of its children, in between each child, and after visiting all the children). Now the minimum depth recorded between visits to any two nodes will correspond to their lowest common ancestor.
Here are a few problems that involve computing least common ancestors in a tree, though note there are also other techniques for computing LCAs (such as binary jumping) which I plan to write about eventually.