#1,091 – Subnormal Floating Point Numbers
May 7, 2014
32-bit binary floating point numbers are normally stored in a normalized form, that is:
1.d × 2^e
where d is the fractional part of the mantissa, consisting of 23 binary digits and e is the exponent, represented by 8 bits.
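As a rough illustration of this layout, the sketch below pulls the stored fields out of a 32-bit float. It assumes the standard single-precision arrangement of 1 sign bit, 8 exponent bits and 23 fraction bits, with the exponent stored with an offset of 127 (so e = -126 is stored as 1). The helper name `float32_fields` is made up for this example.

```python
import struct

def float32_fields(x):
    """Round x to a 32-bit float and return (sign, stored_exponent, fraction)."""
    # Pack as a 32-bit float, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit (assumed standard layout)
    stored_exponent = (bits >> 23) & 0xFF  # 8 bits, e + 127
    fraction = bits & 0x7FFFFF             # 23 bits: the digits d
    return sign, stored_exponent, fraction

# 0.15625 is 1.01 (binary) x 2^-3, so d starts with "01" and e is -3.
sign, stored, fraction = float32_fields(0.15625)
print(sign, stored - 127, format(fraction, "023b"))
```

Subtracting 127 from the stored exponent recovers e, and the 23-character binary string is the fractional part d of the mantissa.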
In this form, the minimum allowed value for e is -126, which is stored in the 8-bit exponent as a value of 1. Because the leading 1 is implicit, this means that the minimum positive value in normalized form is:
1.0 × 2^-126 = 2^-126 (about 1.18 × 10^-38)
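The smallest normalized value, 2^-126, can be checked by building its bit pattern directly. This is a minimal sketch that assumes the standard single-precision layout (sign bit, then 8 exponent bits, then 23 fraction bits). The helper `float32_from_bits` is a name invented for this example.

```python
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as a 32-bit float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

# Sign 0, stored exponent 1 (e = -126), all 23 fraction digits 0.
smallest_normal = float32_from_bits(1 << 23)

print(smallest_normal == 2.0 ** -126)  # compares against the exact power of two
print(smallest_normal)
```

Python's own floats are 64-bit, so 2^-126 is represented exactly there and the comparison is a fair check on the 32-bit pattern.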
We could use the 8-bit value of 0 in the exponent to represent an exponent of -127, but that would only gain us a single power of two, extending the range down to just 2^-127.
Instead, a value of 0 stored in the 8-bit exponent is a signal to drop the leading 1 in the mantissa. This allows storing a set of much smaller numbers, known as subnormal numbers, of the form:
0.d × 2^-126
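We can now use all 23 digits in the mantissa without a leading 1. The smallest nonzero mantissa has only its last digit set, giving 2^-23, which allows us to store positive numbers as small as 2^-23 × 2^-126 = 2^-149.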
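To make the two forms concrete, here is a small sketch that decodes a 32-bit pattern by hand, using 1.d × 2^e when the stored exponent is nonzero and 0.d × 2^-126 when it is 0. It assumes the standard single-precision layout with a sign bit and an exponent offset of 127, and it ignores the stored exponent value 255, which is reserved for infinities and NaN. The function names are invented for this example.

```python
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as a 32-bit float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

def decode(bits):
    """Compute the value of a finite 32-bit float from its fields."""
    sign = -1.0 if bits >> 31 else 1.0
    stored = (bits >> 23) & 0xFF
    fraction = (bits & 0x7FFFFF) / 2 ** 23  # 0.d as a number in [0, 1)
    if stored == 0:
        # Subnormal: no implicit leading 1, exponent fixed at -126.
        return sign * fraction * 2.0 ** -126
    # Normalized: implicit leading 1, e = stored - 127.
    return sign * (1.0 + fraction) * 2.0 ** (stored - 127)

smallest_subnormal = decode(0x00000001)  # only the last mantissa digit set
largest_subnormal = decode(0x007FFFFF)   # all 23 mantissa digits set

print(smallest_subnormal == 2.0 ** -149)
print(largest_subnormal == float32_from_bits(0x007FFFFF))
print(largest_subnormal < 2.0 ** -126)
```

The largest subnormal sits just below the smallest normalized value 2^-126, so the subnormal numbers fill the gap between 0 and 2^-126 in equal steps of 2^-149.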