Liam Healy ([info]lhealy) wrote,
@ 2008-01-26 13:29:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Entry tags:floating point, lisp

Comparison of floating point numbers in Common Lisp

Floating point numbers are a computer representation of the real numbers. Unlike the real numbers, there are a finite number of them. So there is a smallest and largest floating point number, and all others have a predecessor and successor.

Because different compilers and platforms can reorder a calculation and optimize in a way that is approximated differently and so do not necessarily produce the same floating point number, it is difficult to compare two floating point numbers and conclude that they represent the same result. For the purposes of regression (or unit) testing, we would like to do exactly this. Bruce Dawson addressed this problem in "Comparing Floating Point Numbers." He makes the point that the best way to do this correctly is to interpret each floating point number as an integer. By taking advantage of the IEEE 754 standard for representation of floating point numbers, we can construct a function that maps the floating point numbers to the integers. The genius of the standard's inventor W. Kahan is that a mapping derived from the standard, call it i(x), satisfies three properties:

  • If two floats a<b, then i(a)<i(b),
  • if two floats are adjacent and a<b, then i(b)=i(a)+1,
  • and finally i(0.0)=0.

Dawson provides some clever C constructs to read a floating point number as an integer, and instruction on how to prevent the compiler from complaining about your trickery in doing so. Common Lisp instead provides us with functions with which we can properly construct our own integers. As a side benefit, we don't care what the actual representation of the floating point number is; we will build our own IEEE-like representation. We don't exactly want the full IEEE754 word though; we leave off the most significant bit, which is a sign bit, and instead make the sign of the integer agree with the sign of the float.

What we end up with is an enumeration of the floats. That is, for every single precision float, there is one integer in the range [-2139095039, 2139095039], and vice versa, with the exception that both positive and negative zero (allowed by the standard) map to zero. Likewise, there is a one-to-one mapping of the double precision floats to [-9218868437227405311,9218868437227405311]. Floats with special values (positive and negative infinity, and NaN) that are required by the standard do not have integer values.

I have written the following functions in Common Lisp:

  • float-as-integer which is the function i(x);
  • integer-as-float which is the inverse function (this isn't necessary but can be useful) and also returns the rational form of the float;
  • decode-IEEE754 (used by other functions) that returns five values: significand, exponent, sign, bits in significand, bits in exponent, all as integers;
  • format-IEEE754-bits which prints out the binary form of the IEEE word, separated into the three parts (this isn't necessary but is nice for comparing with bit expansions shown in references like the Wikipedia page).

Here are some interesting floats:

(float-as-integer most-negative-single-float)
-2139095039
(float-as-integer least-negative-single-float)
-1
(float-as-integer -0.0f0)
0
(float-as-integer 0.0f0)
0
(float-as-integer least-positive-single-float)
1
(float-as-integer (- 1.0f0 single-float-negative-epsilon))
1065353215
(float-as-integer 1.0f0)
1065353216
(float-as-integer (+ 1.0f0 single-float-epsilon))
1065353217
(float-as-integer most-positive-single-float)
2139095039

A regression test would record not the floating point number, but the integer produced by float-as-integer. Since integers can be unambiguously formatted to and read from a text file in a unique way, a subsequent recomputation would provide a clear indication of how close the floats are. Of course, we must decide how much error we're going to allow, because a correct calculation may produce slightly different integers. As an added bonus, these functions can be used to identify (in languages other than Lisp) when a positive single float has been interpreted as a double float.




(10 comments) - (Post a new comment)


[info]jp_larocque
2008-01-27 06:17 am UTC (link)
First of all, I don't think CL mandates IEEE 754—not from what I've gathered looking at the Hyperspec some time ago.

Second of all, why not (PRINT (RATIONAL n))? This will express the exact value of n.

(Reply to this) (Thread)


[info]lhealy
2008-01-27 03:20 pm UTC (link)
You're absolutely correct that the standard doesn't mandate IEEE, and that was my point: these algorithms don't care what the internal representation of the numbers is, they compute the same integer as is computed in the IEEE spec.

Printing rational is a good way to distinguish numbers, but it's harder to tell how far apart numbers are when they're different. Dawson's premise is that "Units in the last place" is a significant measure of error no matter what the actual numbers. It seems possible to me that taking the difference of two rationals will yield greater acceptable errors (roundoff, etc.) near zero (because of subnormalized numbers) than away from it, but I'm not sure.

(Reply to this) (Parent)

float->int won't help
(Anonymous)
2008-01-27 11:33 pm UTC (link)
I'm not an expert, but I do know something of IEEE 754, and it does seem to me that transforming floating-point numbers into integers won't solve the first problem you stated: "Because different compilers and platforms can reorder a calculation and optimize in a way that is approximated differently and so do not necessarily produce the same floating point number..."

First, if you take two different floating-point numbers, convert them to integers, and compare the integers, they won't be equal. Furthermore, converting the significand to an integer can potentially lose the sign bit information, which matters for things like signed zero. (Kahan wrote a nice paper on the importance of signed zero for correctness of complex functions.)

Fortran and C compilers usually have options that control the compiler's freedom to reorder floating-point expressions. The default should always be not to reorder. In that case, two different compilers that profess IEEE 754 compliance, running on a machine that professes IEEE 754 compliance, should always give bitwise identical answers (not counting transcendental functions, though ideally the same would be true in that case). If they don't, there's a bug in one or both of the compilers.

Feel free to post a comment on my blog (http://hilbertastronaut.blogspot.com/) if you'd like to chat more about this.

mfh

(Reply to this) (Thread)

Re: float->int won't help
[info]lhealy
2008-01-28 03:01 am UTC (link)
Sorry my point wasn't clear. I wasn't trying to imply that the integers would prevent reordering etc. (how could they?) but rather that when two results are obtained, you could find out how many floats different your answers were, and make a judgement on whether that was acceptable or not. I agree with your signed zero point. Though I use Fortran and C a lot, I don't recall seeing a compiler option to prevent reordering, but that's most probably because I haven't looked. Nevertheless, here I'm using Common Lisp and if there is an option to prevent reordering, it would be very compiler dependent, and I'd still be at the mercy of different platforms. I'm think library functions might be different even with IEEE compliance, though like you I'd like to think that wouldn't be the case.

(Reply to this) (Parent)(Thread)

Re: float->int won't help
(Anonymous)
2008-01-29 12:33 am UTC (link)
Compilers may not have an option that explicitly says "allow / forbid reordering," but often have an option like "fast-math" (I'm thinking specifically of the Intel C compiler, but there are others) which relax floating-point ordering and other accuracy constraints, and default to a strict ordering. (Some compilers have an option like "IEEE-fp" which enforce strict floating-point semantics, like ordering and accuracy of division.)

I think I understand the purpose of your library now -- you'd like to measure the difference between two implementations' answers in terms of units in the last place (ulp). Wouldn't it be more practical to measure the relative difference between answers? That would also account for different significand (and exponent) lengths -- e.g., Clisp has LONG-FLOATs which are true arbitrary-precision bigfloats, so how do you compare its LONG-FLOAT answers as integers with those from, say, SBCL?

(Reply to this) (Parent)(Thread)

Re: float->int won't help
[info]lhealy
2008-01-29 02:11 am UTC (link)
OK, I have seen "fast-math". Never used it though, but I now know what it means, thanks. These are C or Fortran compilers, I don't know if any of the CL compilers have such an option.

I would not so much like to measure two implementations' answers, but measure the difference between two versions of my library, to insure things are still correct. In other words, the standard reason for a regression test. Other people, using other compilers, would be running it too, so I'd like their answers to come out close enough by some standard. For reasons why "relative difference between the answers" has troubles, see Dawson's article (short answer: near zero, that doesn't work). Every comparison of a result of a given type will be with another result of the same type (i.e. single float to single float, double float to double float). Regardless, none of the calculations are done in long float.

(Reply to this) (Parent)

Re: float->int won't help
(Anonymous)
2008-01-29 12:40 am UTC (link)
Also, I didn't mean to say that restricting reordering would be enough to ensure bitwise equality of two different implementations' calculations, but it's a minimum necessity of any test. If your CL compiler pretends that floats are integers and reorders them freely, then your tests won't help you; you may not even be able to estimate machine precision correctly, as the compiler might optimize the necessary calculation into a zero constant. Usually, it's recommended that you compile C and Fortran floating-point tests (such as the infamous and very illuminating test "Paranoia," and the machine parameter estimation tests in LAPACK) with all compiler optimizations turned off; I've heard of cases in which the LAPACK tests failed when the optimization flags were set too high, because the compiler did some boneheaded things.

(Reply to this) (Parent)(Thread)

Re: float->int won't help
[info]lhealy
2008-01-29 02:15 am UTC (link)
I don't know what you mean; what CL compiler pretends floats are integers? I trust that the compiler is correct, what I want to test is whether I've made mistakes in my library.
I'm not writing a compiler, I'm testing results. And I don't really need to know that results are exactly the same when repeated, I need to know any change due to rewriting is insignificant in a floating-point math sense. In Lisp, not C or Fortran.

(Reply to this) (Parent)(Thread)

Re: float->int won't help
(Anonymous)
2008-02-07 04:51 am UTC (link)
Sorry about the delay in responding!

By "pretend floats are integers," I mean "assume that floating-point numbers are associative." (They aren't.) If you turn the optimization flags high enough, your compiler may assume that they are. Then your results may (or may not) differ significantly from what you intended them to be. This is true for any language's compiler.

What I'm saying is that this problem is harder than you think, because even something like the optimization flags used to compile a library function could affect the result.

(Reply to this) (Parent)(Thread)

Re: float->int won't help
[info]lhealy
2008-03-05 03:54 pm UTC (link)
Oh yes, I absolutely agree that it's a hard problem. The integer representation can help when a formatted floating point representation is not reliable. It turns out though that ~g should print sufficient digits that when read back in, the number is exactly the same.

(Reply to this) (Parent)


(10 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…