Some Causes of Differences in Floating-Point Results

Last reviewed: July 17, 1997
Article ID: Q46749

6.00 6.00a 6.00ax 7.00 | 1.00 1.50
MS-DOS                 | WINDOWS
kbtool

The information in this article applies to:
SUMMARY

This article discusses some reasons why programs might produce different floating-point results when compiled with different compiler options. The program below produces different results when compiled using

   cl -AM -FPi prog.c

than when using the following:

   cl -AM -FPa prog.c

Part of the reason for the different results is that /FPa and /FPi generate math routines that do not work the same way. /FPi math emulates the 80x87, to the point of actually converting 8-byte doubles to the 10-byte internal format and doing the math in that format. /FPa uses an 8-byte format for calculations; therefore, it is less accurate. This often accounts for differences in results.
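
The effect of the width of the intermediate format can be sketched in portable C. The following is a minimal illustration only, not the code that either math package uses: the function names are hypothetical, "long double" merely stands in for the 10-byte internal format, and on compilers where long double is the same size as double the two functions are expected to agree.

```c
#include <float.h>

/* Hypothetical helper: reports whether long double carries more
   mantissa bits than double on this compiler. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Computes (big + small) - big with the intermediate sum rounded to an
   8-byte double. The volatile store forces the sum out of any wider
   register before it is used again. */
double residue_via_double(double big, double small)
{
    volatile double sum = big + small;
    return sum - big;
}

/* Computes the same expression, but keeps the intermediate sum in
   long double (assumed here to play the role of the 10-byte format). */
double residue_via_long_double(double big, double small)
{
    volatile long double sum = (long double)big + (long double)small;
    return (double)(sum - (long double)big);
}
```

For example, with big = 1.0e16 and small = 1.0, the sum 1.0e16 + 1 does not fit in the 53-bit mantissa of a double, so residue_via_double returns 0. Where long double is wider than double, residue_via_long_double keeps the extra bit and returns 1.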
MORE INFORMATION

The second number printed in the /FPi case is smaller than DBL_MIN, as defined in FLOAT.H. This result is also correct because DBL_MIN is the smallest possible NORMALIZED value. (Normalized means that the high-order bit of the mantissa is a one.) "Denormals" (numbers where there are zeros in some of the high-order bits of the mantissa), however, can represent numbers "x" in the ranges +DBL_MIN > x > 0 and 0 > x > -DBL_MIN. Although this is an unusual situation, it is not an error. A denormal is less precise than a normalized number; however, a denormal is still more precise than 0 (zero), which is the next best representation. Allowing denormal numbers therefore makes the floating-point result slightly more accurate. The alternate math library (/FPa) represents denormal numbers as 0 (zero).

Another possible cause of differences in floating-point results is the inclusion or omission of the /Op option. When /Op is omitted, the compiler may skip storing intermediate results as 64-bit objects in memory, leaving them instead in the 80-bit registers of the 80x87 (or emulator package). This increases the speed and accuracy of the calculation. However, it can decrease the consistency of the calculations because other intermediate results may have been stored in 64-bit objects in memory anyway. Including /Op forces all intermediate results to be stored in memory, giving more consistent results. This option is often useful in programs involving complicated floating-point calculations.

The program and its output follow:
Sample Code
#include <stdio.h>      // START OF PROG.C
#include <float.h>

void main(void)
{
    double a, b, c, prod1, prod2;

    _fpreset();

    a = 9.5788979e-283;
    b = 8.050847e-1;
    c = 9.5588526e-28;

    prod1 = a * b;
    printf("\n product1 = %1.15le \n", prod1);

    prod2 = c * prod1;
    printf("\n product2 = %1.15le \n", prod2);
}                       // END OF PROG.C
Results
// RESULTS OBTAINED USING CL -AM -FPi PROG.C

 product1 = 7.711824142152130e-283
 product2 = 7.371619025195353e-310    // This value is less than DBL_MIN

// RESULTS OBTAINED USING CL -AM -FPa PROG.C

 product1 = 7.711824142152130e-283
 product2 = 0.000000000000000e+000
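
The denormal range described above can also be tested directly against DBL_MIN. The following is a minimal sketch, assuming a compiler with IEEE-style doubles and denormal support enabled. All function names are hypothetical, and flush_denormal_to_zero only mimics the behavior this article attributes to the alternate math library; it is not that library's implementation.

```c
#include <float.h>

/* Returns 1 if x lies in +DBL_MIN > x > 0 or 0 > x > -DBL_MIN,
   that is, if x is nonzero but smaller in magnitude than the
   smallest normalized double. */
int is_denormal(double x)
{
    return x != 0.0 && x < DBL_MIN && x > -DBL_MIN;
}

/* Illustration only: replaces a denormal with 0 (zero), the result
   the article reports for the /FPa case. */
double flush_denormal_to_zero(double x)
{
    return is_denormal(x) ? 0.0 : x;
}

/* Recomputes product2 from PROG.C. The volatile qualifiers force each
   value to be stored as an 8-byte double between steps. */
double sample_product2(void)
{
    volatile double a = 9.5788979e-283;
    volatile double b = 8.050847e-1;
    volatile double c = 9.5588526e-28;
    volatile double prod1 = a * b;
    volatile double prod2 = c * prod1;
    return prod2;
}
```

Under those assumptions, is_denormal(sample_product2()) is expected to return 1, and flush_denormal_to_zero(sample_product2()) returns 0, which corresponds to the two product2 lines shown in the results.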
Additional reference words: kbinf 1.00 1.50 6.00 6.00a 6.00ax 7.00 8.00
© 1998 Microsoft Corporation. All rights reserved.