Sahithyan's S1 — Programming Fundamentals

Floating-point Numbers

Floating-point numbers are represented according to the IEEE 754 standard.

2 commonly used formats:

  • single precision
  • double precision

Single precision

Uses 32 bits.

  • sign bit - 1 bit
  • exponent - 8 bits
  • mantissa - 23 bits
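The layout above can be inspected directly. The following is a minimal sketch, assuming Python's standard `struct` module is used to reinterpret the bytes; the helper name `float32_fields` is hypothetical and chosen only for illustration.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) of x rounded to single precision."""
    # Pack as a big-endian 32-bit float, then read the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # top 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits (stored, still biased)
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152)
```

Here -2.5 is -1.25 × 2^1, so the stored exponent is 1 + 127 = 128 and the mantissa field holds 0.25 × 2^23 = 2097152.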

Sign bit

0 if positive (including +0). 1 if negative (including -0).

Exponent

Exponent field range - [0, 255]. Within this range, [1, 254] is used for normal numbers. 0 is reserved for signed zeros and subnormal numbers. 255 is reserved for infinities and NaN.
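How the reserved exponent values are told apart can be sketched as follows. This is only an illustration, assuming the usual IEEE 754 rule that the mantissa field distinguishes zero from subnormal (exponent 0) and infinity from NaN (exponent 255); the function name `classify_float32` is hypothetical.

```python
import struct

def classify_float32(x):
    """Classify x (rounded to single precision) by its exponent and mantissa fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        # Reserved: signed zero when the mantissa is 0, subnormal otherwise.
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 255:
        # Reserved: infinity when the mantissa is 0, NaN otherwise.
        return "infinity" if mantissa == 0 else "NaN"
    return "normal"  # exponent field in [1, 254]

for value in (0.0, -0.0, 1e-40, 1.5, float("inf"), float("nan")):
    print(value, classify_float32(value))
```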

To support negative exponents, a bias of 127 (half of 254) is subtracted from the stored exponent. This maps [1, 254] to [-126, 127], which is the range of exponents representable by normal numbers.

Mantissa

In scientific notation, the mantissa is the part that doesn’t contain the base and the power. For example, 1.5 in 1.5 × 10^3.

In binary scientific notation, a normal number always has exactly one 1 bit before the binary point. So the initial 1 is not stored in the mantissa.
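Putting the three fields together, the value of a normal single-precision number can be rebuilt by restoring the implicit leading 1 and removing the exponent bias. The sketch below handles only normal numbers; `decode_float32` is a hypothetical helper name, and the bit pattern `0x40490FDB` is the single-precision value closest to pi.

```python
import struct

def decode_float32(bits):
    """Decode the 32-bit pattern of a normal single-precision number."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if not 1 <= exponent <= 254:
        raise ValueError("only normal numbers are handled in this sketch")
    significand = 1 + mantissa / 2**23      # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

value = decode_float32(0x40490FDB)
# Cross-check against the decoding done by the struct module.
(expected,) = struct.unpack(">f", bytes.fromhex("40490FDB"))
print(value, value == expected)
```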

Double precision

Uses 64 bits.

  • sign bit - 1 bit
  • exponent - 11 bits
  • mantissa - 52 bits
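The 64-bit layout can be inspected the same way. This is a minimal sketch, assuming Python's `struct` module with the `d` (64-bit float) and `Q` (64-bit unsigned integer) format codes; `float64_fields` is a hypothetical helper name.

```python
import struct

def float64_fields(x):
    """Return (sign, exponent, mantissa) of a double-precision number."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # top 1 bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits (stored, still biased)
    mantissa = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa

print(float64_fields(1.0))     # (0, 1023, 0)
print(float64_fields(-0.75))   # (1, 1022, 2251799813685248)
```

Here -0.75 is -1.5 × 2^-1, so the stored exponent is -1 + 1023 = 1022 and the mantissa field holds 0.5 × 2^52 = 2^51.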

Sign bit

0 if positive (including +0). 1 if negative (including -0).

Exponent

Exponent field range - [0, 2047]. Within this range, [1, 2046] is used for normal numbers. 0 is reserved for signed zeros and subnormal numbers. 2047 is reserved for infinities and NaN.

To support negative exponents, a bias of 1023 (half of 2046) is subtracted from the stored exponent. This maps [1, 2046] to [-1022, 1023], which is the range of exponents representable by normal numbers.
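The two ends of the exponent range can be checked against the smallest and largest normal doubles. The sketch assumes that Python's `float` is an IEEE 754 double (true on common platforms) and uses `sys.float_info.min` and `sys.float_info.max`; `unbiased_exponent` is a hypothetical helper name.

```python
import struct
import sys

BIAS = 1023  # double-precision exponent bias

def unbiased_exponent(x):
    """Return the stored exponent field of x minus the bias."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return ((bits >> 52) & 0x7FF) - BIAS

print(unbiased_exponent(sys.float_info.min))  # -1022 (smallest positive normal double)
print(unbiased_exponent(1.0))                 # 0
print(unbiased_exponent(sys.float_info.max))  # 1023 (largest finite double)
```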

Mantissa

In scientific notation, the mantissa is the part that doesn’t contain the base and the power. For example, 1.5 in 1.5 × 10^3.

In binary scientific notation, a normal number always has exactly one 1 bit before the binary point. So that leading 1 is not stored in the mantissa.
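One consequence of the implicit leading 1 is that a double carries 53 significant bits even though only 52 are stored. The short sketch below illustrates this, again assuming Python's `float` is an IEEE 754 double; it uses the built-in `float.hex()` method, which prints the leading 1 explicitly followed by the 52 stored bits as 13 hexadecimal digits.

```python
# 52 stored mantissa bits plus the implicit leading 1 give 53 significant bits,
# so every integer up to 2**53 is exact, but 2**53 + 1 is not representable.
print(float(2**53) == 2**53)            # True
print(float(2**53 + 1) == 2**53 + 1)    # False

# float.hex() writes the leading 1 before the point, then the 52 stored bits.
print((1.0).hex())   # 0x1.0000000000000p+0
print((0.1).hex())   # 0x1.999999999999ap-4
```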