floating point to fixed point conversion

Discussion in 'Embedded' started by riya, Feb 21, 2006.

  1. riya

    riya Guest

    Hello guys,

    I need some help. I am doing a DSP project, and for it I need to write
    some C code that converts sample data from floating point
    representation to fixed point representation.
    The sample data is in floating point, like


    My DSP algorithm is implemented in C and is supposed to use fixed
    point representation.
    The above data is to be converted to fixed point integer format. Please
    help me out with this conversion. I would be very glad if you could
    give me some hints or algorithms for it.
    riya, Feb 21, 2006
  2. riya

    Tim Wescott Guest

    If you must post the same question to multiple newsgroups please cross-post.


    Tim Wescott
    Wescott Design Services

    Posting from Google? See http://cfaj.freeshell.org/google/
    Tim Wescott, Feb 21, 2006
  3. I will use single precision.

    As you may or may not know the IEEE754 format is as follows:


    S = Sign bit (0 = +, 1 = -)
    E = 8-bit biased exponent (Bias = 0x7F)
    M = 23-bit fractional portion of the significand (mantissa)

    IEEE754 has a single implied integer bit of 1 (which is not stored in
    the mantissa).

    Really, conversion from FP to fixed point comes down to shifting, and
    maybe negation as well (only whole part, when negating, DO NOT touch the
    fractional portion of the fixed point value).

    Basically, add the implied integer bit (mask off the mantissa and OR the
    implied bit into it). Zero extend the result to 32 bits (ideally larger,
    since you risk losing some integer bits; if the values that you have
    given represent the range of FP values expected, then 32 bits will be
    sufficient).

    Shift this value left, decrementing the exponent with each shift, if
    the unbiased exponent is positive.

    Shift this value right, incrementing the exponent with each shift, if
    the unbiased exponent is negative.
    Repeat until the exponent = 0 (remember to remove the bias first).
    Take bits 31-24 as your integer portion and the bits below that as
    your fraction.

    Bits 23-0 will be your fractional portion.
    Say you are using 16.16 fixed point: you will have to truncate the
    fractional portion, so the least significant byte of the fractional
    portion will have to be discarded.

    Last you will need to test the sign value to determine if negation of
    the whole portion should take place.

    I don't know how efficient this algorithm is or if there are any
    mistakes, hopefully someone will point this out. If I was given this
    task, that is how I would attack it.
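
    The steps above can be sketched in C roughly as follows. This is only
    a sketch under assumptions that are not in the post: the target is
    signed two's complement Q16.16, the name float_bits_to_q16_16 is made
    up, and the net shift is computed in one go instead of looping one bit
    at a time. It places the implied bit at bit 23 and negates the whole
    32-bit word (in two's complement the fraction bits change too), which
    differs from the "whole part only" advice above. Out-of-range values,
    Inf and NaN simply saturate; zero and denormals give 0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: IEEE 754 single -> signed Q16.16, truncating
 * toward zero. Assumes float is a 32-bit IEEE 754 single. */
int32_t float_bits_to_q16_16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                   /* read the raw bit pattern */

    uint32_t sign = bits >> 31;                       /* S: 1 = negative */
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu);  /* E: biased exponent */
    uint32_t man  = bits & 0x007FFFFFu;               /* M: 23 stored fraction bits */

    if (exp == 0)
        return 0;                 /* zero or denormal: far below Q16.16 resolution */

    man |= 0x00800000u;           /* add the implied integer bit (bit 23) */

    /* value = man * 2^(exp - 127 - 23); Q16.16 stores value * 2^16 */
    int32_t shift = exp - 127 - 23 + 16;

    uint32_t mag;
    if (shift >= 8)
        mag = 0x7FFFFFFFu;        /* too large (also Inf/NaN): saturate */
    else if (shift >= 0)
        mag = man << shift;
    else if (shift > -24)
        mag = man >> -shift;      /* bits below 2^-16 are truncated */
    else
        mag = 0;                  /* all 24 significand bits shifted out */

    return sign ? -(int32_t)mag : (int32_t)mag;
}
```

    For example, float_bits_to_q16_16(1.0f) gives 65536 (0x00010000) and
    float_bits_to_q16_16(-0.5f) gives -32768.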

    Isaac Bosompem, Feb 22, 2006
  4. riya

    Charles Oram Guest

    Depending on what precision you need, the simplest way is just to multiply
    the floating point numbers by an integer constant, then do all your maths
    processing in integers. To convert back to the floating point values just
    divide by that same integer constant. Think of it like changing your units
    of measurement, e.g. doing your calculations in millimetres instead of
    metres.
    For example, you could use 32-bit integers as your fixed point numbers -
    multiply your floating point numbers by 65536 (or shift 16 bits) to get the
    fixed point numbers, then divide by 65536 to get back.
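
    A minimal sketch of that scaling in C, assuming a 32-bit Q16.16 format
    and a compiler that can do the float multiply (in hardware or in a
    software library). The names Q16_ONE, float_to_q16_16 and
    q16_16_to_float are made up for illustration. The input has to lie in
    [-32768, 32768); outside that range the cast is undefined.

```c
#include <stdint.h>

#define Q16_ONE 65536.0f   /* 2^16, the scale factor of the assumed Q16.16 format */

/* Float -> Q16.16 by scaling; the cast truncates toward zero. */
int32_t float_to_q16_16(float x)
{
    return (int32_t)(x * Q16_ONE);
}

/* Q16.16 -> float by dividing the scale back out. A float has only 24
 * significand bits, so large Q16.16 values lose their lowest bits here. */
float q16_16_to_float(int32_t q)
{
    return (float)q / Q16_ONE;
}
```

    With this, float_to_q16_16(1.5f) is 98304 and q16_16_to_float(98304)
    is 1.5f again.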

    - Charles
    Charles Oram, Feb 22, 2006
  5. This will work and will be a lot easier if your target has the
    capability (or instruction) to convert a fp value to an integer value
    (like x86).
    Isaac Bosompem, Feb 22, 2006
  6. It's not that simple. If you start with metres and you want to calculate 1m
    x 1m, the result is obvious. If you now convert 1m to 1000mm first and
    simply do 1000mm x 1000mm, the result is not quite what you want.
    DSPs solve this by switching the MAC into either integer or fractional
    mode, where the latter shifts the result one bit left after each multiply.

    So you have to follow some rules of thinking. If you convert your floats to,
    say 1.15 fixed point and you multiply two of these, the result is 2.30,
    possibly truncated to 2.14 (16 bits). As long as you keep this in mind, you
    can multiply anything like this.
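
    As a rough illustration of the 1.15 x 1.15 case in plain C (a sketch,
    not how any particular DSP does it; the name q15_mul is made up): the
    32-bit product is in 2.30 format, and dropping 15 fractional bits
    brings it back to 1.15. The only product that does not fit is
    -1.0 x -1.0, which is saturated here.

```c
#include <stdint.h>

/* Multiply two Q1.15 values and return a Q1.15 result. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q2.30 product */

    if (p == 0x40000000)                   /* -1.0 * -1.0 = +1.0: not representable */
        return INT16_MAX;

    return (int16_t)(p / 32768);           /* drop 15 fraction bits, truncating toward zero */
}
```

    For example, q15_mul(16384, 16384) (0.5 x 0.5) returns 8192, which is
    0.25 in 1.15 format.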

    If you look at the numbers the OP gave:
    2.296968 can be represented as 2.14 and -0.448350 in 1.15. The result of the
    multiplication will be in 3.29 format. Additions do not have this effect.
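
    The bookkeeping for mixed formats can be sketched like this, assuming
    signed 16-bit operands (so the 2.14 operand covers [-2, 2)); the
    function names are made up. The fractional bit counts simply add:
    14 + 15 = 29.

```c
#include <stdint.h>

/* Multiply a Q2.14 value by a Q1.15 value. The 32-bit product has
 * 14 + 15 = 29 fractional bits, i.e. it is in Q3.29 format. */
int32_t mul_q2_14_by_q1_15(int16_t a_q2_14, int16_t b_q1_15)
{
    return (int32_t)a_q2_14 * (int32_t)b_q1_15;
}

/* Interpret a Q3.29 value as a real number (for checking results only). */
double q3_29_to_double(int32_t p)
{
    return (double)p / 536870912.0;   /* 2^29 */
}
```

    For example, 1.5 in 2.14 is 24576 and -0.5 in 1.15 is -16384; their
    product is -402653184, which q3_29_to_double turns back into -0.75.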

    So as long as you keep the resulting format in mind, you can indeed multiply
    each float by, for instance, 2^16 to convert them into fixed-point numbers.
    No, you must divide by 65536 * 65536
    Shift the input 16 bits left, shift the output *17* bits right.
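
    To make the scaling concrete: when two values that were each scaled by
    2^16 are multiplied, the product carries a scale of 2^32. A sketch in
    C, assuming Q16.16 operands and a 64-bit intermediate (the name
    q16_16_mul is made up), divides one factor of 2^16 back out so the
    result is Q16.16 again:

```c
#include <stdint.h>

/* Multiply two Q16.16 values and return a Q16.16 result.
 * The 64-bit intermediate is scaled by 2^16 * 2^16 = 2^32, so one
 * factor of 2^16 is divided out. Overflow of the result is not checked. */
int32_t q16_16_mul(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* scaled by 2^32 */
    return (int32_t)(wide / 65536);           /* back to a 2^16 scale, truncating toward zero */
}
```

    For example, q16_16_mul(98304, 131072) (1.5 x 2.0) returns 196608,
    which is 3.0 in Q16.16.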

    Meindert Sprang, Feb 22, 2006
  7. riya

    Charles Oram Guest

    You're dead right - I got confused because I was working on a calculation at
    the time where only one of the numbers being multiplied had been scaled by
    16 bits :(
    The difficult part about doing it yourself (with no support from your
    processor) is that you have to go through each calculation and check that
    you are scaling the output correctly each time and that your calculations
    are not overflowing the integer size.
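
    One common way to guard against overflow, sketched here for addition
    on 32-bit fixed point values, is to do the sum in a wider type and
    clamp it. This is just an illustration with a made-up name
    (fixed_add_sat); it works for any fixed point format as long as both
    operands use the same one.

```c
#include <stdint.h>

/* Add two 32-bit fixed point values of the same format, clamping the
 * result to the int32_t range instead of letting it wrap. */
int32_t fixed_add_sat(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */

    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```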

    - Charles
    Charles Oram, Feb 24, 2006
  8. But the good part of this is that once you get the hang (and the discipline) of
    it, you can do the calculations in any fixed point representation you like.
    You can even treat numbers in one format at a certain stage in the
    calculation and then move on, just "thinking" of them in a different format in
    the next stage.
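
    A small sketch of that idea, with a made-up helper name: the stored
    integer never changes, only the number of fractional bits you decide
    it has.

```c
#include <stdint.h>

/* Interpret a raw integer as fixed point with frac_bits fractional bits.
 * frac_bits must be less than 63. */
double fixed_to_double(int32_t raw, unsigned frac_bits)
{
    return (double)raw / (double)((int64_t)1 << frac_bits);
}
```

    The same raw value 0x4000 reads as 0.5 with 15 fractional bits (1.15)
    and as 1.0 with 14 fractional bits (2.14).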

    Meindert Sprang, Feb 24, 2006
