Data structure challenge: solutions
Tagged balanced, challenge, data, empty, full, log, rank, rightmost, select, slot, structure, tree, union-find
In my previous post I challenged you to find a way to keep track of a sequence of slots in such a way that we can quickly (in \(O(\lg n)\) or better) either mark any empty slot as full, or find the rightmost empty slot prior to a given index. When I posted it, I had two solutions in mind; thanks to all the excellent comments I now know of many more!
-
There were quite a few answers which in some way or another boiled down to using a balanced tree structure.
-
Joachim Breitner suggested to use something like
Data.IntMap
. -
Roman Cheplyaka suggested using
Data.Set
. -
An anonymous commenter suggested using some kind of self-balancing tree like a red-black tree.
-
Commenter Albert suggested keeping two balanced trees, one containing all the full slots and one containing the empty slots (though it is not clear to me what the benefit would be of having both).
In all these cases, the idea is that the tree stores the sorted indices of all empty slots. We can mark a slot full or empty by deleting or inserting from the tree structure, and we can find the rightmost empty slot not exceeding a given index by searching for the index and returning highest value in the tree which is less than the index being searched. It is well-known that all these operations can be done in \(O(\lg n)\) time.
-
-
David Barbour suggested something along similar lines, but somewhat more general: keep a finger tree with cached monoidal annotations representing both the total number of elements of each subtree, as well as the number of elements satisfying some proposition (such as the number of empty slots). This also allows performing the operations in \(O(\lg n)\) time. This is in some sense similar to the previous suggestions, but it generalizes much more readily, since we can use this scheme to track any kind of monoidal annotation. I had thought of using a segment tree where slot \(i\) stores the value \(i\) when it is full, and \(0\) when it is empty, and each node caches the max value contained in its subtree, allowing updates and queries to happen in \(O(\lg n)\) time. This could also track arbitrary monoidal annotations, but using a finger tree is strictly more expressive since it also supports insertion and deletion (although that is not required for my original formulation of the problem).
-
Albert also suggested using a van Emde Boas tree to achieve \(O(\lg \lg n)\) performance. Van Emde Boas trees directly support a “predecessor” operation which finds the largest key smaller than a given value.
-
Roman Cheplyaka suggested using some sort of dynamic rank/select: if we think of the sequence of slots as a bit vector, and represent empty slots by 1s and full slots by 0s, we can find the rightmost empty slot up to a given index by first finding the rank of the index, then doing a select operation on that rank. (I smell some kind of adjunction here: the composition of rank then select is a sort of idempotent closure operator that “rounds down” to the index of the rightmost preceding 1 bit. Maybe one of my readers can elaborate?) The tricky part, apparently, is doing this in such a way that we can dynamically update bits; apparently it can be done so the operations are still \(O(\lg n)\) (Roman linked to this paper) but it seems complicated.
-
One of my favorite solutions (which I also independently came up with) was suggested by Julian Beaumont: use a disjoint-set data structure (aka union-find) which stores a set for each contiguous (possibly empty) block of full slots together with its one empty predecessor (we can create a dummy slot on the left end to act as the “empty predecessor” for the first block of full slots). Each set keeps track of the index of its leftmost, empty, slot, which is easy to do: any time two sets are unioned we simply take the minimum of their empty slot indices (more generally, we can annotate the sets of a disjoint-set structure with values from any arbitrary monoid). To mark an empty slot as full, we simply union its set with the set of the slot to the left. To find the rightmost empty slot left of a given index, just look up the stored leftmost index corresponding to the set of the given index. Both these operations can thus be implemented in amortized \(O(\alpha(n))\) time (where \(\alpha\) is the inverse Ackermann function, hence essentially constant). Intuitively, I doubt it is possible to do any better than this. Curiously, however, unlike the other solutions, this solution depends crucially on the fact that we can never revert a full slot to empty!
-
Apparently, this is known as the predecessor problem, and can also be solved with something called fusion trees which I had never heard of before.