Binary search over floating point representations
I got some good feedback on my last post about binary search, and thought it was worth a follow-up post.
An important fix
First things first: commenter Globules pointed out that doing (l+r)
can overflow; my initial, glib answer was “sure, you have to be careful when doing arithmetic with fixed-sized integers”, but then I realized that it was easily fixable in this case! So I replaced that formula with div
2l + (r-l)
which won’t overflow, even when searching for values near the top of the range of a fixed-sized integer type.
div
2
An entirely frivolous distraction that I spent way too much time on
With that out of the way, let’s get to the more interesting discussion. Several commenters took issue with my use of unsafeCoerce
to convert Double
into Word64
and do binary search over bit representations for floating-point numbers, and they raised some good points:
-
Even if, in the context of competitive programming, we’re not particularly concerned with maintainability or safety, there’s no guarantee that
unsafeCoerce
will behave the same on our local machine as on the judging machine! For example, the endianness might be different. - Worse, it just plain doesn’t work: my worry about the bit representation of negative numbers looking bigger than that of positive numbers, because of the sign bit, was actually a very valid worry. It just so happens to work when searching for positive numbers, but when searching for negative numbers it goes into an infinite loop!
In my defense, I got this idea from Jules Jacobs’s original article… but looking at it again, there is a key footnote that I hadn’t noticed before:
I’m assuming that f2b
respects ordering, that is, comparing f2b(x) < f2b(y)
gives the same result as comparing the floats x < y
. Depending on the bit representation of floats, one would have to shuffle the mantissa and exponent and sign bits around to ensure this.
Oops, of course! It turns out there are a lot of things about the IEEE-754 standard which make this work nicely for positive numbers: the exponent is stored first, with a bias so we don’t have to deal with signed exponents, and the mantissa is always understood to have a leading 1 bit which is not stored. For positive floating-point numbers \(x\) and \(y\), it’s already the case that \(x < y\) if and only if their IEEE-754 representations, considered as unsigned integers, are in the same order! This is very clever, and it seems it was done this way on purpose, so that hardware comparison of floating-point numbers can be fast. And this is what made my example in the previous post work at all.
However, for negative numbers this doesn’t quite work. First of all, the high-order bit is the sign bit, so negative numbers all appear larger when interpreted as unsigned integers. Interpreting them as signed integers doesn’t work either, because they are just stored as a sign bit and a magnitude, as opposed to signed integers which are typically stored using 2’s complement, so negative floating point numbers are “backwards” compared to the interpretation of their bit pattern as (signed or unsigned) integers. But this is not hard to fix; commenter babel linked to a reference explaining exactly how to do the required bit-twiddling. Essentially, we always flip the sign bit, and flip all the other bits too if the sign bit was set.
So I could have just done this bit-twiddling on the result of unsafeCoerce
. However, goaded by some other commenters, I wanted to try using encodeFloat
/decodeFloat
instead of unsafeCoerce
to make it a little more platform-independent. I ended up spending many hours on this. I fixed about 17 gazillion bugs and gave up several times in frustration, only to realize another bug later and come back to it. In the end, though, I got it to work! You can see my f2b :: Double -> Word64
and b2f :: Word64 -> Double
functions here. I do make some assumptions about the size of Double
values, so it’s not completely platform-independent, but at least it should be independent of the endianness.
How do I know it’s correct? Well, I can’t be 100% sure, but I verified each of the following properties by running QuickCheck on a million random inputs (and I used these properties to find lots of bugs!):
-
f2b
is monotonic: for allx :: Double
,x < y
if and only iff2b x < f2b y
. -
b2f
is left inverse tof2b
: for allx :: Double
,b2f (f2b x) == x
. -
b2f
is almost a right inverse tof2b
; this direction is made more complicated by the fact that someWord64
values correspond toInfinity
orNaN
. Also, there are someWord64
values that correspond to really tiny floating-point values on the very edge of what is representable, wheref2b (b2f w)
is one more or less than the originalw
. -
It’s actually not enough that
x < y
impliesf2b x < f2b y
; we also need the fact that the midpoint betweenf2b x
andf2b y
will correspond to a floating-point number betweenx
andy
. I was worried about this for a while, until I finally understood the fact that the mantissa is always assumed to have a leading1
which is not stored. That makes everything work out nicely, and I checked this property with QuickCheck as well.
So, let’s see it in action! We can search for negative values now, or values that don’t exist, etc.
λ> search floating (> (-3.2934)) (-100) 100
(-3.2934,-3.2933999999999997)
λ> search floating (\x -> x**7 >= 1e34) (-1e100) (1e100)
(71968.56730011519,71968.5673001152)
λ> search floating (\x -> x**2 >= 150) 0 100
(12.247448713915889,12.24744871391589)
λ> search floating (\x -> x**2 >= (-150)) (-1e308) (1e308)
(-1.0e308,-9.999999999999998e307)
So, was it worth it? From a competitive programming point of view, probably not! I can think of one or two times I’ve really struggled with precision issues where this might have helped. But 99.9% of the time you can just use a normal binary search on floating-point values until you get within the required tolerance. Overall, though, despite the extreme frustration, this was a fun detour through some things I didn’t understand very well before. I now know a lot more about IEEE-754 encoding and Haskell’s support for floating-point values!