PRB: Inaccurate Representation of Large Double Values

ID Number: Q59407

5.10 6.00 6.00a 6.00ax 7.00 | 5.10 6.00 6.00a

MS-DOS | OS/2

Summary:

SYMPTOMS

In Microsoft C versions 5.1, 6.0, 6.0a, 6.0ax, and C/C++ version 7.0,
subtracting double values greater than or equal to 1.0E+025 may
return inaccurate results.

CAUSE

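This is expected behavior and follows from the finite precision of
binary floating-point math. A double holds a fixed number of
significant bits, so the absolute size of a rounding error grows
with the magnitude of the operands. With large numbers, both the
stored constants and the results of arithmetic on them may be
rounded, because the exact value often has no exact representation
in binary format.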

More Information:

Because double values are precise to only about 15 significant
decimal digits, simple subtraction of two large numbers can give
unexpected results. Double values less than 1.0E+25 are rounded in
the same way, but the absolute error is proportionally smaller and
may not be noticeable. The following sample code demonstrates this
behavior.

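Sample Code
-----------

/* Demonstrates accumulated rounding error when subtracting large doubles. */
#include <stdio.h>

double a = 1E+28, tmp = 9E+28;

int main (void)
{
    printf ("a = %e tmp = %e\n", a, tmp);

    /* In exact arithmetic tmp would reach 0 after nine subtractions. */
    while (tmp >= 1E+25) {
        tmp -= a;
        printf ("a = %e tmp = %e\n", a, tmp);
    }

    return 0;
}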

The above sample code produces the following output:

a = 1.000000e+028 tmp = 9.000000e+028
a = 1.000000e+028 tmp = 8.000000e+028
a = 1.000000e+028 tmp = 7.000000e+028
a = 1.000000e+028 tmp = 6.000000e+028
a = 1.000000e+028 tmp = 5.000000e+028
a = 1.000000e+028 tmp = 4.000000e+028
a = 1.000000e+028 tmp = 3.000000e+028
a = 1.000000e+028 tmp = 2.000000e+028
a = 1.000000e+028 tmp = 1.000000e+028
a = 1.000000e+028 tmp = 1.319414e+013

In exact arithmetic the last value of tmp would be 0. The remaining
1.319414e+013 is accumulated rounding error, about 1 part in 10^15
of the 1E+28 operands.

Additional reference words: 5.10 6.00 6.00a 6.00ax 7.00