Homework - Conversion of Reals to Machine Floating Point


During a lecture I asked you to convert the following base-ten numbers to the machine representation of a 4-byte floating-point number:
  1. 28.050
  2. -16.25
  3. -0.0025

Machine Numbers:

The 4 bytes have the following template.  The leftmost bit is the sign bit: zero if the number is non-negative, one if it is negative.  The next 8 bits are the exponent of the base-two representation of the number, biased by 127 (base ten).  The remaining 23 bits are the mantissa.
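To make the template concrete, here is a small sketch that packs the three fields into one 32-bit word and reinterprets that word as a float. The names pack_fields and word_to_float are mine, not from the book, and the sketch assumes your compiler stores float in the 4-byte layout described above (true on essentially every current machine).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack the fields left to right: 1 sign bit, 8 exponent bits, 23 mantissa bits.
// stored_exponent is the biased value (true exponent + 127).
std::uint32_t pack_fields(std::uint32_t sign, std::uint32_t stored_exponent,
                          std::uint32_t mantissa) {
    assert(sign <= 1u);
    assert(stored_exponent <= 0xFFu);   // must fit in 8 bits
    assert(mantissa <= 0x7FFFFFu);      // must fit in 23 bits
    return (sign << 31) | (stored_exponent << 23) | mantissa;
}

// Reinterpret a 32-bit word as a float.
// Assumption: float is a 4-byte IEEE 754 single on this machine.
float word_to_float(std::uint32_t word) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 4 bytes");
    float value;
    std::memcpy(&value, &word, sizeof value);
    return value;
}
```

For example, word_to_float(pack_fields(0, 127, 0)) is 1.0, since a stored exponent of 127 means a true exponent of 0 and an all-zero mantissa means 1.000...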

Solution method:

  1. Convert the number to its base-two representation.
  2. Factor out the power of two that shifts the "decimal" point so that only a single 1 remains to its left.  That power of two is the exponent.
  3. Add the bias of 127 (base ten) to the previous exponent to obtain the value that is stored.
  4. If "chopping" is used, the first 23 digits to the right of the "decimal" point become the mantissa (the leading 1 is not stored).  If "rounding" is used, which is the standard practice, look at the 24th bit: if it is 1, round up.
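The four steps can be sketched in C++ as follows. This is only an illustration under some assumptions of mine: the number is given as an exact fraction num/den (so 28.050 is 561/20), it is nonzero and of ordinary size (no overflow, no denormals), and rounding follows the simple "24th bit" rule of step 4 rather than the full IEEE tie-breaking rule. The names Fields and encode are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Fields {
    int sign;              // 0 if non-negative, 1 if negative
    int stored_exponent;   // true exponent + 127
    std::string mantissa;  // 23 characters, each '0' or '1'
};

// Encode the fraction num/den by following steps 1 through 4.
Fields encode(std::int64_t num, std::int64_t den, bool round) {
    assert(den > 0 && num != 0);
    Fields f;
    f.sign = num < 0 ? 1 : 0;
    std::uint64_t p = static_cast<std::uint64_t>(num < 0 ? -num : num);
    std::uint64_t q = static_cast<std::uint64_t>(den);

    // Steps 1 and 2: find e with 1 <= (p/q) / 2^e < 2 by doubling q or p.
    int e = 0;
    while (p >= 2 * q) { q *= 2; ++e; }
    while (p < q)      { p *= 2; --e; }

    // Step 3: add the bias.
    f.stored_exponent = e + 127;

    // Step 4: binary long division. r/q is what is left after the leading 1.
    std::uint64_t r = p - q;
    std::uint32_t m = 0;
    for (int i = 0; i < 23; ++i) {
        r *= 2;
        m <<= 1;
        if (r >= q) { m |= 1u; r -= q; }
    }
    // The 24th bit is 1 exactly when 2r >= q.
    if (round && 2 * r >= q) {
        ++m;
        if (m == (1u << 23)) { m = 0; ++f.stored_exponent; }  // carry into exponent
    }

    f.mantissa.assign(23, '0');
    for (int i = 0; i < 23; ++i)
        if (m & (1u << (22 - i))) f.mantissa[i] = '1';
    return f;
}
```

Calling encode(-65, 4, true) for -16.25 gives sign 1, stored exponent 131 and mantissa 00000100000000000000000. Working with an exact fraction instead of a double is a deliberate choice here: it keeps every generated bit exact, so the repeating patterns come out right.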

The Problems:

28.050 = 11100.00(0011) in base two, where the () denote a repeating string.  Normalized, this is 1.110000(0011) x 2^4.
From this we obtain that the sign bit is 0.  The stored exponent is 4 + 127 = 131, which in base two using 8 bits is 10000011.  Finally, the mantissa is 11000000110011001100110; the 24th bit is 0, so chopping and rounding give the same result.

-16.25 in base ten is -10000.01 in base two, which is -1.000001 x 2^4.  This means the sign bit is 1.  The stored exponent is again 10000011.  The mantissa is 00000100000000000000000.

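This last one is tough, huh!  Converted directly with infinite precision, 0.0025 = 0.0000(00001010001111010111) in base two, in which the digits in the () repeat.  The first 1 sits in the ninth place after the point, so 0.0025 = 1.0100011110101110000101... x 2^-9.  The number is negative, so the sign bit is 1.  For the exponent we take 127 + (-9) = 118, whose 8-bit representation is 01110110.  Finally, the mantissa is 01000111101011100001010; the 24th bit is 0, so nothing is rounded up.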
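If you want to check your own answers against the machine, a sketch like the following prints the sign, exponent and mantissa bits of a float. The function names are mine, and it assumes that float on your machine is the 4-byte format described in this handout and that the compiler rounds the decimal literals to the nearest float.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Return the bits of x as "s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm".
std::string float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 4 bytes");
    std::uint32_t word;
    std::memcpy(&word, &x, sizeof word);  // copy the raw bytes, no conversion
    const std::string bits = std::bitset<32>(word).to_string();
    return bits.substr(0, 1) + " " + bits.substr(1, 8) + " " + bits.substr(9);
}

// Compare the three hand-derived representations with what is actually stored.
void check_hand_answers() {
    assert(float_bits(28.050f)  == "0 10000011 11000000110011001100110");
    assert(float_bits(-16.25f)  == "1 10000011 00000100000000000000000");
    assert(float_bits(-0.0025f) == "1 01110110 01000111101011100001010");
}
```

If check_hand_answers() returns without tripping an assertion, the hand conversion and the machine agree on all 32 bits of each number.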

Hope these agree with yours!


For fun (yes, I'm sick this way), I wrote a program to implement the algorithm in the book, Eqn (22), page 18.  It is in C++.  Feel free to download it and play with it.  It takes a fraction and converts it to a chopped string of binary digits.  It is how I found the 20-digit repeating pattern given above.  The program is frac2.cpp.
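In case you cannot get at the file, here is a minimal sketch of the usual doubling method for the binary digits of a fraction. It is not frac2.cpp itself, and I am not claiming it matches Eqn (22) line for line. It also makes one assumption of its own: the fraction is passed as an exact ratio num/den (so 0.0025 is 1/400), which keeps every digit exact. The name fraction_bits is made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// First `count` binary digits of num/den after the point, chopped.
// Requires 0 <= num/den < 1.
std::string fraction_bits(std::uint64_t num, std::uint64_t den, int count) {
    assert(den > 0 && num < den);
    std::string digits;
    for (int i = 0; i < count; ++i) {
        num *= 2;                 // shift the point one place to the right
        if (num >= den) {         // integer part is 1: emit it and remove it
            digits += '1';
            num -= den;
        } else {                  // integer part is 0
            digits += '0';
        }
    }
    return digits;
}
```

For 0.0025, fraction_bits(1, 400, 24) returns 000000001010001111010111: four leading zeros followed by the 20-digit block 00001010001111010111. Asking for 44 digits shows that same block a second time, which is how the repetition becomes visible.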