Home

Advertisement

Customize

Liam Healy

Mar. 8th, 2008

02:16 pm - Never a NaN or Inf

In very few cases is it acceptable to me that a floating-point calculation would produce NaN or Inf; it almost always means that there is an error somewhere in my program. Following the principle that it is best to detect an error as soon as possible, I would like to know immediately whenever either of these is created, by trapping the exception and signaling an error. Depending on the language and implementation, this doesn't always happen by default. Here are some different languages and compilers, and how to set to make this happen:


  • Lisp: SBCL: (sb-int:set-floating-point-modes :traps '(:invalid :divide-by-zero :overflow)) and
    CMUCL: (ext:set-floating-point-modes :traps '(:invalid :divide-by-zero :overflow)).
    Also GSLL provides #'set-floating-point-modes.
  • Fortran: for gfortran, use the compiler flag -ffpe-trap=invalid,zero,overflow.
  • C/C++: use
    #include <fenv.h>
    trapfpe () {
      feenableexcept(FE_INVALID|FE_DIVBYZERO|FE_OVERFLOW);
    }
    

    In C, this will need to be included: void trapfpe();
    In C++, this will need to be included (also works for C):
    #ifndef _TRAPFPE_H_
    #define _TRAPFPE_H_
    
    #include <sys/cdefs.h>
    
    __BEGIN_DECLS
    void trapfpe(void);
    __END_DECLS
    
    #endif /* _TRAPFPE_H_ */
    

    and then in either, include trapfpe(); at the beginning. This should be portable to C99 compilers/libc but I've used only gcc and glibc. Also, GSL has gsl_ieee_env_setup.

Jan. 26th, 2008

01:29 pm - Comparison of floating point numbers in Common Lisp

Floating point numbers are a computer representation of the real numbers. Unlike the real numbers, there are a finite number of them. So there is a smallest and largest floating point number, and all others have a predecessor and successor.

Because different compilers and platforms can reorder a calculation and optimize in a way that is approximated differently and so do not necessarily produce the same floating point number, it is difficult to compare two floating point numbers and conclude that they represent the same result. For the purposes of regression (or unit) testing, we would like to do exactly this. Bruce Dawson addressed this problem in "Comparing Floating Point Numbers." He makes the point that the best way to do this correctly is to interpret each floating point number as an integer. By taking advantage of the IEEE 754 standard for representation of floating point numbers, we can construct a function that maps the floating point numbers to the integers. The genius of the standard's inventor W. Kahan is that a mapping derived from the standard, call it i(x), satisfies three properties:

  • If two floats a<b, then i(a)<i(b),
  • if two floats are adjacent and a<b, then i(b)=i(a)+1,
  • and finally i(0.0)=0.

Dawson provides some clever C constructs to read a floating point number as an integer, and instruction on how to prevent the compiler from complaining about your trickery in doing so. Common Lisp instead provides us with functions with which we can properly construct our own integers. As a side benefit, we don't care what the actual representation of the floating point number is; we will build our own IEEE-like representation. We don't exactly want the full IEEE754 word though; we leave off the most significant bit, which is a sign bit, and instead make the sign of the integer agree with the sign of the float.

What we end up with is an enumeration of the floats. That is, for every single precision float, there is one integer in the range [-2139095039, 2139095039], and vice versa, with the exception that both positive and negative zero (allowed by the standard) map to zero. Likewise, there is a one-to-one mapping of the double precision floats to [-9218868437227405311,9218868437227405311]. Floats with special values (positive and negative infinity, and NaN) that are required by the standard do not have integer values.

I have written the following functions in Common Lisp:

  • float-as-integer which is the function i(x);
  • integer-as-float which is the inverse function (this isn't necessary but can be useful) and also returns the rational form of the float;
  • decode-IEEE754 (used by other functions) that returns five values: significand, exponent, sign, bits in significand, bits in exponent, all as integers;
  • format-IEEE754-bits which prints out the binary form of the IEEE word, separated into the three parts (this isn't necessary but is nice for comparing with bit expansions shown in references like the Wikipedia page).

Here are some interesting floats:

(float-as-integer most-negative-single-float)
-2139095039
(float-as-integer least-negative-single-float)
-1
(float-as-integer -0.0f0)
0
(float-as-integer 0.0f0)
0
(float-as-integer least-positive-single-float)
1
(float-as-integer (- 1.0f0 single-float-negative-epsilon))
1065353215
(float-as-integer 1.0f0)
1065353216
(float-as-integer (+ 1.0f0 single-float-epsilon))
1065353217
(float-as-integer most-positive-single-float)
2139095039

A regression test would record not the floating point number, but the integer produced by float-as-integer. Since integers can be unambiguously formatted to and read from a text file in a unique way, a subsequent recomputation would provide a clear indication of how close the floats are. Of course, we must decide how much error we're going to allow, because a correct calculation may produce slightly different integers. As an added bonus, these functions can be used to identify (in languages other than Lisp) when a positive single float has been interpreted as a double float.