
Different numerical behaviour on i86 vs other CPUs

Discussion in 'Intel' started by Matthieu Dazy, Aug 8, 2003.

  1. Hi,

I recently came across a problem with computations involving float operands
    and a double result.

    On the test code below, i86 CPUs have a consistent behaviour OS- and
    compiler-wise (tested on Linux + gcc 3.2 and Windows + VS6), and other CPU
    types (Sparc, MIPS and Intel ia64) exhibit different results, but still
    consistent OS- and compiler-wise (tested on Solaris + Sun CC 5.2, Irix +
    MIPSpro CC 7.3, Linux64 + gcc 3.2, Linux64 + Intel CC 7.1).

    It looks like i86s implicitly cast individual float operands to double,
    thus preventing potential float underflows/overflows that may change the
    final result.

    This is quite problematic for some of our code that is required to have
    strictly identical behaviour on all platforms. I have tried changing the
    precision setting of the i86 FPU, but the three computations of the test
    code still give the same single result, whereas other CPUs give three
    slightly different results.

    Is there a solution, either at the CPU/FPU or the compiler level, that
    would spare us the trouble of identifying and manually fixing all
    potentially problematic constructs in our (rather large) code?

    Thanks a lot,


    --+-- Test code

    #include <iostream>
    using namespace std;

    int main(int argc, char** argv) {
        float x = 41200.0f;
        float y = 0.0f;
        float z = 1257.53f;

        // this evaluates to 1699021440 on Sparc, MIPS and ia64 CPUs
        double a = x * x + y * y + z * z;

        // this evaluates to 1699021381.75 on Sparc, MIPS and ia64 CPUs
        double b = double(x * x) + double(y * y) + double(z * z);

        // this evaluates to 1699021381.774583 on Sparc, MIPS and ia64 CPUs
        double c = double(x) * double(x) + double(y) * double(y) +
                   double(z) * double(z);

        // on i86 CPUs, all three expressions evaluate to 1699021381.774583

        cout.precision(16);
        cout << "a : " << a << endl;
        cout << "b : " << b << endl;
        cout << "c : " << c << endl;

        return 0;
    }
     
    Matthieu Dazy, Aug 8, 2003
    #1
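    A note on "changing the precision setting of the i86 FPU": with the
    MSVC runtime this is done through _controlfp() from <float.h>. A
    minimal sketch, assuming MSVC (gcc would need inline fldcw assembly):

    #include <float.h>
    #include <stdio.h>

    int main() {
        // Save the current control word, then request 24-bit
        // (single-precision) rounding of the x87 significand.
        unsigned int old = _controlfp(0, 0);
        _controlfp(_PC_24, _MCW_PC);

        float x = 41200.0f, z = 1257.53f;
        double a = x * x + z * z;   // still evaluated in x87 registers

        printf("a : %.16g\n", a);
        _controlfp(old & _MCW_PC, _MCW_PC);   // restore precision control
        return 0;
    }

    As the post above reports, this alone does not make the three
    expressions differ on i86: precision control narrows only the
    significand, while the register format keeps its wider exponent range
    (a point taken up later in the thread).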

  2. Oliver S. (Guest)

    Matthieu Dazy wrote:
    | Is there a solution, either at the CPU/FPU or the compiler level, that
    | would spare us the trouble of identifying and manually fixing all
    | potentially problematic constructs in our (rather large) code?

    With the MS compiler there's an option called "improve floating-point
    consistency" (/Op[-]), and the Intel compilers have something similar.
     
    Oliver S., Aug 8, 2003
    #2
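    For reference, with the VC6 command-line driver that option would look
    something like this (the trailing '-' form turns it back off):

        cl /Op /O2 test.cpp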

  3. This is an essentially impossible requirement. You would have to code
    your own math routines.

    DS
     
    David Schwartz, Aug 8, 2003
    #3
  4. Yousuf Khan (Guest)

    You mean that there is actually a switch to decrease precision?

    Yousuf Khan
     
    Yousuf Khan, Aug 10, 2003
    #4
  5. Oliver S. (Guest)

    Yousuf Khan wrote:
    | You mean that there is actually a switch to decrease precision?

    No, the switch enforces the appropriate precision. If you did a float *
    float * double multiplication, the result of the first two operations
    would be chopped by storing it into memory as a float and re-loading
    it.
     
    Oliver S., Aug 11, 2003
    #5
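    In source form, that chopping is what you get by spilling the
    intermediate to a named float yourself; a small sketch (volatile is
    used here to force a real store even when the consistency switch is
    off):

    #include <cstdio>

    int main() {
        float f1 = 41200.0f, f2 = 41200.0f;
        double d = 1.5;

        // Registers only: f1 * f2 may be kept at extended precision,
        // so nothing is rounded to float before the multiply by d.
        double fast = f1 * f2 * d;

        // Forced store: the float product is rounded to 24 bits by a
        // store/reload before the double multiply, which is the same
        // effect the consistency option has on every intermediate.
        volatile float chopped = f1 * f2;
        double consistent = double(chopped) * d;

        std::printf("fast       : %.17g\n", fast);
        std::printf("consistent : %.17g\n", consistent);
        return 0;
    }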
  6. Oliver S. (Guest)

    | You're close, but it's actually more than just a double. They cast it

    This isn't true for most runtime libraries, because the startup code of
    all CRTs I've seen so far sets the x86 FPU control word to a precision
    of 64 bits.
    The newest compilers even support SSE2 by using the SSE registers for
    all calculations (library functions like sin() of course still use the
    x87 FPU), although their ability to exploit the possible parallelism by
    using the packed SIMD SSE instructions instead of their scalar
    counterparts is moderate (Intel C++) to nearly nonexistent
    (VC++ .NET 2003).
    The Athlon XPs also support SSE1, the Opterons SSE2, and SSE2 has full
    double-precision support.
     
    Oliver S., Aug 11, 2003
    #6
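    That points at another route to the consistency the original post asks
    for: on an SSE2-capable i86, scalar SSE arithmetic rounds every
    operation to the declared float or double type, just like the Sparc and
    MIPS FPUs do. Assuming a gcc recent enough to know the option, and
    VC++ .NET 2003 on the Microsoft side:

        g++ -msse2 -mfpmath=sse test.cpp      (gcc on i86)
        cl /arch:SSE2 test.cpp                (VC++ .NET 2003)

    The trade-off is that the resulting binary then requires an
    SSE2-capable CPU.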
  7. Even with 64-bit precision, doesn't it still maintain the 15-bit
    exponent? If so, the overflow properties would be different.

    -- glen
     
    Glen Herrmannsfeldt, Aug 18, 2003
    #7
  8. Yousuf Khan (Guest)

    Here are the formatting differences between single- and double-precision
    floats:

    http://www.psc.edu/general/software/packages/ieee/ieee.html

    Basically, single-precision exponents are 8 bits long and
    double-precision exponents are 11 bits long.

    Yousuf Khan
     
    Yousuf Khan, Aug 18, 2003
    #8
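    Those widths can also be read back from the compiler rather than
    memorized; a small check using std::numeric_limits (digits counts the
    mantissa bits including the implied leading one):

    #include <iostream>
    #include <limits>
    using namespace std;

    int main() {
        // Prints 24 and 53: mantissa bits including the hidden bit.
        cout << "float  mantissa bits: " << numeric_limits<float>::digits << endl;
        cout << "double mantissa bits: " << numeric_limits<double>::digits << endl;

        // Prints 128 and 1024, the largest power-of-two exponents the
        // 8-bit and 11-bit exponent fields allow.
        cout << "float  max exponent : " << numeric_limits<float>::max_exponent << endl;
        cout << "double max exponent : " << numeric_limits<double>::max_exponent << endl;
        return 0;
    }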
  9. Yes, but the internal format has a 15-bit exponent. When you lower the
    precision, I don't think it also decreases the exponent bits. The
    overflow properties will not be the same as if the result had been
    stored and then reloaded.

    Actually, I haven't looked at them for a while. Wouldn't the precision
    attribute specify the number of mantissa bits? So it would be 53 if you
    wanted to match double, and 24 if you wanted to match float?

    -- glen
     
    Glen Herrmannsfeldt, Aug 19, 2003
    #9
  10. Oliver S. (Guest)

    Glen Herrmannsfeldt wrote:
    | Even with 64-bit precision, doesn't it still maintain the 15-bit
    | exponent?

    The "IA-32 Architecture Software Developer’s Manual - Volume 1" suggests
    that only the precision of the mantissa is affected by the FPU control
    word. I checked this with the following program:

    #include <stdio.h>

    typedef unsigned __int16 WORD;

    int main()
    {
        double dbl = 1.0;
        static float const mult = 2.0;

        WORD wCtl, wCtlNew;

        __asm
        {
            fstcw WORD PTR wCtl      // save the control word and
            mov ax, wCtl             // clear the precision-control
            and ax, 0xFCFF           // bits (PC = 00 selects 24-bit,
            mov wCtlNew, ax          // single-precision rounding)
            fldcw WORD PTR wCtlNew

            fld QWORD PTR dbl        // load 1.0 and double it 128
            mov ecx, 128             // times, reaching 2^128, which
                                     // exceeds the single-precision
        doubledouble:                // exponent range
            fmul DWORD PTR mult
            sub ecx, 1
            jnz doubledouble

            fstp QWORD PTR dbl

            fldcw WORD PTR wCtl      // restore the old precision
        }

        printf("%f\n", dbl);
        return 0;
    }

    And the result shows that the precision-control part of the FPU control
    word doesn't affect the exponent; the exponent keeps its full 15-bit
    width. So is it correct to consider the x87 FPU IEEE-754-compliant?
     
    Oliver S., Aug 19, 2003
    #10
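    The same experiment can be written without inline assembly by driving
    the control word through _controlfp() from <float.h> (again
    MSVC-specific; the conclusion is the same):

    #include <float.h>
    #include <stdio.h>

    int main() {
        unsigned int old = _controlfp(0, 0);
        _controlfp(_PC_24, _MCW_PC);   // 24-bit (single) precision control

        double dbl = 1.0;
        for (int i = 0; i < 128; ++i)
            dbl *= 2.0;                // reaches 2^128

        _controlfp(old & _MCW_PC, _MCW_PC);

        // 2^128 (about 3.4e38) lies just past the single-precision range,
        // yet no overflow occurred: precision control narrows the
        // mantissa, not the 15-bit register exponent.
        printf("%g\n", dbl);
        return 0;
    }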
  11. Yousuf Khan (Guest)

    Yes, you're right, I misunderstood you. It does convert everything to
    the internal "extended precision" format, which has a 15-bit exponent.
    The loading and storing between single- or double-precision format and
    extended is done internally by the FPU; the compiler has no say in this
    conversion, it happens entirely in hardware. Well, actually the stored
    mantissa would be 52 and 23 bits, respectively, because one bit is
    reserved for the sign of the number.

    An interesting thing about floating point is that there doesn't seem to
    be a standard extended precision format, or if there is a standard,
    it's quite ad hoc. The single- and double-precision formats are locked
    tight and well-defined, but not the extended one. If you'll notice,
    there are two separate extended precision formats in use, the Intel
    format and the Sparc format. The Intel format has an 80-bit structure
    with a 15-bit exponent and a 64-bit mantissa. The other format has a
    128-bit structure with a 112-bit mantissa, but surprisingly still a
    15-bit exponent like the Intel extended format, despite the fact that
    it's got 48 extra bits. It seems as if the split between exponent bits
    and mantissa bits is chosen somewhat arbitrarily.

    http://www.dcs.ed.ac.uk/home/SUNWspro/3.0/common-tools/numerical_comp_guide/ncg_math.doc.html#866

    or

    http://tinyurl.com/kk5c

    Yousuf Khan
     
    Yousuf Khan, Aug 20, 2003
    #11
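    Which extended format a platform actually uses is easy to probe, since
    compilers map it to long double; a quick check (expect 64 mantissa bits
    on x87, 113 on Sparc-style quad, 53 where long double is plain double):

    #include <iostream>
    #include <limits>
    using namespace std;

    int main() {
        cout << "long double mantissa bits: "
             << numeric_limits<long double>::digits << endl;

        // 16384 for both the 80-bit and the 128-bit extended formats,
        // i.e. a 15-bit exponent in each case.
        cout << "long double max exponent : "
             << numeric_limits<long double>::max_exponent << endl;

        cout << "sizeof(long double)      : " << sizeof(long double) << endl;
        return 0;
    }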
  12. I did check this one: the precision-control options are 64, 53, and
    24 bits.

    Yes, but the 32-bit and 64-bit formats have a hidden one bit. Since a
    normalized, non-zero binary floating point number must have a one as
    its most significant mantissa bit, they don't need to store it. The
    80-bit format does store it, and I believe it allows unnormalized
    numbers.

    15 bits allows numbers bigger and smaller than most uses for such
    numbers. There are a few algorithms where a large exponent is useful,
    but not so many. The Intel 80-bit format started on the 8087, when they
    were somewhat limited in what they could do. I might even say that 15
    is too many. One source I saw once explained that the volume of the
    universe in cubic fermis (a cubic fermi is about the volume of an
    atomic nucleus) is 1e136. The actual number of protons and neutrons in
    the universe is about 1e80. Why would anyone need numbers larger than
    that?

    The IBM extended precision format, starting with the 360/85, is base 16
    with a 112-bit mantissa and a 7-bit base-16 exponent. (Single, Double,
    and Extended all have the same 7-bit base-16 exponent.)

    -- glen
     
    Glen Herrmannsfeldt, Aug 20, 2003
    #12
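    The hidden bit is easy to see in the stored pattern of 1.0f: the 23-bit
    fraction field is all zeros because the leading one is implied rather
    than stored. A small sketch:

    #include <cstdio>
    #include <cstring>

    int main() {
        float one = 1.0f;
        unsigned int bits;                    // assumes 32-bit unsigned int
        std::memcpy(&bits, &one, sizeof bits);

        // Prints 0x3f800000: sign 0, biased exponent 0x7f (127),
        // fraction all zero; the leading significand bit of 1.0 is
        // implied, not stored.
        std::printf("1.0f bits: 0x%08x\n", bits);
        return 0;
    }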
  13. | 15 bits allows numbers bigger and smaller than most uses for such
    | numbers. There are a few algorithms where a large exponent is useful,
    | but not so many. The Intel 80-bit format started on the 8087, when
    | they were somewhat limited in what they could do. I might even say
    | that 15 is too many. One source I saw once explained that the volume
    | of the universe in cubic fermis (a cubic fermi is about the volume of
    | an atomic nucleus) is 1e136. The actual number of protons and
    | neutrons in the universe is about 1e80. Why would anyone need numbers
    | larger than that?

    Think small... sometimes you want very small numbers not to go to zero.
    If they do, your calculations will not converge, or will do so slowly.

    | The IBM extended precision format, starting with the 360/85, is base
    | 16 with a 112-bit mantissa and a 7-bit base-16 exponent. (Single,
    | Double, and Extended all have the same 7-bit base-16 exponent.)

    I thought the 80-bit format was part of the IEEE standard, called
    something like "double extended", meant for intermediate results. gcc
    calls it long double IIRC.

    --
    Bill Davidsen <> CTO, TMR Associates
    As we enjoy great advantages from inventions of others, we should be
    glad of an opportunity to serve others by any invention of ours; and
    this we should do freely and generously.
    -Benjamin Franklin (who would have liked open source)
     
    bill davidsen, Aug 21, 2003
    #13
  14.  
    Glen Herrmannsfeldt, Aug 27, 2003
    #14
