Liam Healy ([info]lhealy) wrote,
@ 2008-03-08 14:16:00
Previous Entry  Add to memories!  Tell a Friend!  Next Entry
Entry tags:c, floating point, fortran, lisp

Never a NaN or Inf

In very few cases is it acceptable to me that a floating-point calculation would produce NaN or Inf; it almost always means that there is an error somewhere in my program. Following the principle that it is best to detect an error as soon as possible, I would like to know immediately whenever either of these is created, by trapping the exception and signaling an error. Depending on the language and implementation, this doesn't always happen by default. Here are some different languages and compilers, and how to set to make this happen:


  • Lisp: SBCL: (sb-int:set-floating-point-modes :traps '(:invalid :divide-by-zero :overflow)) and
    CMUCL: (ext:set-floating-point-modes :traps '(:invalid :divide-by-zero :overflow)).
    Also GSLL provides #'set-floating-point-modes.
  • Fortran: for gfortran, use the compiler flag -ffpe-trap=invalid,zero,overflow.
  • C/C++: use
    #include <fenv.h>
    trapfpe () {
      feenableexcept(FE_INVALID|FE_DIVBYZERO|FE_OVERFLOW);
    }
    

    In C, this will need to be included: void trapfpe();
    In C++, this will need to be included (also works for C):
    #ifndef _TRAPFPE_H_
    #define _TRAPFPE_H_
    
    #include <sys/cdefs.h>
    
    __BEGIN_DECLS
    void trapfpe(void);
    __END_DECLS
    
    #endif /* _TRAPFPE_H_ */
    

    and then in either, include trapfpe(); at the beginning. This should be portable to C99 compilers/libc but I've used only gcc and glibc. Also, GSL has gsl_ieee_env_setup.



(Post a new comment)

Don't use libraries...
[info]ejr [cs.berkeley.edu]
2008-03-08 10:14 pm UTC (link)
At least LAPACK includes code in eigenvalue routines that explicitly uses Inf and NaN semantics for performance. Rather than checking each subexpression for all its possible problems, you run through a calculation quickly and check the result once. That optimizes the vastly more common case while still being safe.

There are many general cases where you want to try a fast and sloppy algorithm that may fail, detect possible failures, and run a slower, more careful algorithm. Forcing traps on all the code you call disallows that option. You're stuck always running the slower algorithm. Since platforms don't include floating-point mode settings in their ABIs, you never know which library will have a silent dependency on particular settings. And unfortunately changing mode settings can be incredibly expensive and (as you note) very difficult to port, so library developers cannot enforce a mode sanely on every library call.

Yeah, it's a mess. In general, traps are bad news unless their use is very tightly focused. If there were a clean way to specify trapping modes with lexical extent, life would be much simpler.

(Reply to this) (Thread)

Re: Don't use libraries...
[info]lhealy
2008-03-08 10:34 pm UTC (link)
Interesting point; I don't use LAPACK so I wasn't aware of that usage. It certainly makes sense to me. I view the traps as a debugging technique. I happen to encounter this problem in a C++ program where I slowly was backing through the code with a debugger trying to find where the NaNs were coming from until I thought "I want to know where a NaN was first made, because that's where my problem is." And indeed, when I found out about feenableexcept and put it to work, I instantly found out that I was computing the unit vector for a vector that was sometimes 0. Unfortunately, it took me a while to get that trapfpe() code right, and that's why I've posted the synopsis here.

I think this is where Fortran compilers are better than C compilers typically. With flags like -ffpe-trap and -fbounds-check you can nip a whole lot of problems in the bud, then remove them from your makefile in a flash for the "real" work. C compilers don't usually have these options. If you're using a library like LAPACK this shouldn't be a problem because that would usually be compiled separately; isn't that effectively the lexical extent you're looking for?

(Reply to this) (Parent)(Thread)

Re: Don't use libraries...
[info]ejr [cs.berkeley.edu]
2008-03-08 11:58 pm UTC (link)
Nope. If you set up traps in caller code, the traps still are on in the library. The extent is dynamic, not lexical. It's a tricky problem. On the flip side, there are times when you want to switch into, say, "read-only" mode and have any library calls fail if they attempt to write to a file. There's no one size fits all solution, but currently languages and libraries provide only one size.

Also note that there's no guarantee that the trap will correspond with the operation that generates it, and again no guarantee that the FPU operation will be correctly associated with a line of code... There are some notable exceptions in the higher end of the POWER processor family.

So trap-style debugging involves seriously low-level implementation details... It ain't for the faint of heart. A friend (David Bindel) has a C library that abstracts some of the details on x87, and Sun has similar functionality in their numerical libraries (written by Doug Priest, iirc). Any attempt to standardize any of the interfaces gets shot down or bogged in uberfeaturitis.

(Reply to this) (Parent)(Thread)

Re: Don't use libraries...
[info]ejr [cs.berkeley.edu]
2008-03-09 12:02 am UTC (link)
And as a quick addendum, rounding modes behave the same way. If you want to experience pain, change the rounding modes and call transcendental functions. Some no longer converge...

(Reply to this) (Parent)(Thread)

(Reply from suspended user)
Re: Don't use libraries...
[info]lhealy
2008-03-11 02:10 am UTC (link)
Yes, I realized that after I posted, despite flags the real work is being done at runtime. Still, I'm lucky in that I've managed to avoid those libraries that use mode setting as an optimization technique, and the pitfalls you mention, and find it handy to debug this way.

I don't suppose your friend is making his library available?

(Reply to this) (Parent)

CMUCL ext:with-float-traps-masked
[info]lhealy
2008-03-22 01:31 pm UTC (link)
I just found that CMUCL has ext:with-float-traps-masked which implies that it at least has the ability to mask the traps dynamically. This seems like it would be handy when calling those libraries that rely on masked FPEs.

(Reply to this) (Parent)

(Reply from suspended user)
__BEGIN_DECLS & __END_DECLS
(Anonymous)
2008-03-10 06:11 pm UTC (link)
N.B. In order to use the __BEGIN_DECLS and __END_DECLS macros, you must include their definition, in <sys/cdefs.h>. Also, using them means that you don't have to type 'extern "C"' yourself. In C++, __BEGIN_DECLS inserts 'extern "C" {' and __END_DECLS inserts the matching '}'; in C, they're no-ops.

So, a complete header file for trapfpe(), usable in both C and C++, could look like:

#ifndef _TRAPFPE_H_
#define _TRAPFPE_H_

#include <sys/cdefs.h>

__BEGIN_DECLS
void trapfpe(void);
__END_DECLS

#endif /* _TRAPFPE_H_ */

(Reply to this) (Thread)

Re: __BEGIN_DECLS & __END_DECLS
[info]lhealy
2008-03-11 02:06 am UTC (link)
Thanks, anonymous stranger! I've updated my blog.

(Reply to this) (Parent)


[info]draknek
2008-08-31 03:10 pm UTC (link)
Thanks a lot, this post was an absolute lifesaver!

One note: it's still possible to generate NaN with the above code; you also want to catch FE_UNDERFLOW.

I actually spent another couple of hours after I found this post, wondering how I was still getting NaN values but with no exceptions raised. It turns out somewhere I was getting the difference of two very small numbers, and the result was too small to store in a float.

(Reply to this) (Thread)

Underflow
[info]lhealy
2008-08-31 05:12 pm UTC (link)
Glad you had some success. I'm not sure how an underflow can lead to a NaN; my experience is that it leads to a zero. In any case, be aware that trapping underflows could give masses of uninteresting traps; ordinary calculations underflow all the time, and correctly produce zero as a result.

(Reply to this) (Parent)(Thread)

Re: Underflow
[info]draknek
2008-08-31 06:46 pm UTC (link)
Yes, you're right. I was just looking at this again trying to make sense of it and realised the mistake. The two very small numbers were very small because I hadn't initialised their memory -- the underflow exception alerted me to this and I jumped to the conclusion that the underflow was creating a NaN value.

The actual explanation is of course that if you have enough objects which don't initialise their data, eventually you'll get one randomly initialised to NaN (since there about 2^24 distinct NaN values).

Moral of the story: always, always initialise your member data, even if you don't think you have to (the field in question used to be a different type that didn't need initialising).

(Reply to this) (Parent)


Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…