Competitive programming in Haskell: Infinite 2D array, Levels 2 and 3
In a previous post, I challenged you to solve Infinite 2D Array using Haskell. As a reminder, the problem specifies a two-parameter recurrence \(F_{x,y}\), given by
- \(F_{0,0} = 0\)
- \(F_{0,1} = F_{1,0} = 1\)
- \(F_{i,0} = F_{i-1,0} + F_{i-2,0}\) for \(i \geq 2\)
- \(F_{0,i} = F_{0,i-1} + F_{0,i-2}\) for \(i \geq 2\)
- \(F_{i,j} = F_{i-1,j} + F_{i,j-1}\) for \(i,j \geq 1\).
Last time, we derived a formula for \(F_{x,y}\) that involves only a linear number of terms:
\(\displaystyle F_{x,y} = \left(\sum_{1 \leq k \leq x} F_k \binom{x-k+y-1}{x-k}\right) + \left(\sum_{1 \leq k \leq y} F_k \binom{y-k+x-1}{y-k}\right) \pmod{P}\)
While the number of terms may be linear, it can still be on the order of a million terms, so computing each term is going to have to be pretty quick in order to fit the whole thing within the one second time limit.
Fibonacci numbers modulo a prime
Computing Fibonacci numbers modulo a prime is not hard, especially since we want all the Fibonacci numbers from 1 up to \(\max(x,y)\): just compute each one by adding the previous two modulo \(P\). We could also precompute a table of Fibonacci numbers mod \(P\) this way. And any of the fast methods for computing individual Fibonacci numbers (for example, using 2x2 matrix exponentiation) also work just fine if you reduce everything modulo \(P\) at each step, since they only involve addition, subtraction, and multiplication.
Binomial coefficients modulo a prime
What about binomial coefficients? Since \(n\) and \(k\) are not too large, and in particular since they will both be smaller than \(P\), we can use the usual formula:
\(\displaystyle \binom n k = \frac{n!}{k!(n-k)!}\)
(If \(n\) and \(k\) could be much larger, or if they could be larger than \(P\), we would have to use something like Lucas’s Theorem or other techniques; that might make for another interesting blog post sometime.) But how do we handle division in modular arithmtic? Since we’re working modulo a prime, every value \(a\) other than zero must have a modular inverse, that is, a value \(a^{-1}\) such that \(a \cdot a^{-1} \equiv 1 \pmod p\) (this is a corollary of Bézout’s Theorem). To compute the modular inverse for a given \(a\), we have a couple options. One simple way is to use Fermat’s Little Theorem: if \(a\) is not divisible by a prime \(p\), then \(a^{p-2} \cdot a = a^{p-1} \equiv 1 \pmod p\), hence \(a^{p-2}\) is the modular inverse of \(a\) modulo \(p\), and we can compute it efficiently using repeated squaring modulo \(p\). Another option is to use the extended Euclidean algorithm to find the \(x\) and \(y\) (guaranteed to exist by Bézout’s Theorem) such that \(ax + py = 1\); then \(x\) is the inverse of \(a\) modulo \(p\).
Both of these methods take \(O(\lg p)\) time. In my experience, computing the \(p-2\) power is easier to code (especially in Haskell where we get exponentiation by repeated squaring for free!), but using the extended Euclidean algorithm can be a bit faster when it’s well-optimized. (Note the extended Euclidean algorithm can be faster when \(a\) is small, but raising to the \(p-2\) power always takes the same number of steps no matter what \(a\) is.)
Factorials modulo a prime
Since we’re going to be repeatedly using the same factorials, one thing we absolutely must do is precompute a table of factorials mod \(P\), from \(0\) up to some maximum. In this case, since our formula involves things like \(\binom {x-k+y-1}{x-k}\), we may need factorials up to \(x + y\), so a table of size \(2 \times 10^6\) will do (\(x\) and \(y\) can be up to \(10^6\)).
We could also precompute a table of modular inverses of factorials; to compute the inverse of \(k!\), we just find the inverse of each \(k\) and multiply it by the (previously computed) inverse of \((k-1)!\). (Or we could just invert the value for \(k!\) stored in the other table.) Making a table of inverse factorials like this turns out not to help too much for this particular problem, but it can be an important optimization in some cases.
The end?
So we can compute each additional Fibonacci number in \(O(1)\); we can also now compute binomial coefficients modulo \(P\) in \(O(\lg P)\), with a few \(O(1)\) table lookups for factorials and an \(O(\lg P)\) inversion operation. (Again, we could achieve \(O(1)\) if we also stored a table of inverse factorials, but for this problem it seems the additional time needed to construct the table in the first place outweighs the time saved computing binomial coefficients.) In theory, we have everything we need to solve this problem efficiently.
However, for this problem, constant factors matter! There’s still quite a bit of nontrivial work I had to do to get my code fast enough. In my next and final post on this problem, we’ll walk through a few different ideas for implementing this concretely in Haskell.