#1,091 – Subnormal Floating Point Numbers

May 7, 2014

32-bit binary floating point numbers are normally stored in a normalized form, that is:

    1.d × 2^e

where d is the fractional part of the mantissa, consisting of 23 binary digits and e is the exponent, represented by 8 bits.
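As a rough illustration of the layout described above, the following sketch pulls the sign, exponent, and fraction fields out of a float. It assumes C# 9 top-level statements; the helper name ToBits and the sample value 0.15625 are chosen for illustration and are not from the original post.

```csharp
using System;

// Reinterpret the float's 4 bytes as a 32-bit integer so the fields can be masked out.
// ToBits is a hypothetical helper name used only in this sketch.
int ToBits(float value) => BitConverter.ToInt32(BitConverter.GetBytes(value), 0);

int bits = ToBits(0.15625f);               // 0.15625 = 1.01 (binary) x 2^-3
int sign = (bits >> 31) & 0x1;             // 1 sign bit
int storedExponent = (bits >> 23) & 0xFF;  // 8 exponent bits, stored with a bias of 127
int fraction = bits & 0x7FFFFF;            // 23 fraction bits (d)

// Prints: sign=0 e=-3 d=0x200000
Console.WriteLine($"sign={sign} e={storedExponent - 127} d=0x{fraction:X6}");
```

The fraction 0x200000 has only its second-highest bit set, which corresponds to the binary digits .01 following the implicit leading 1.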

In this form, the minimum allowed value for e is -126, which is stored in the 8-bit exponent as a value of 1 (the stored exponent is offset by 127). Because the leading 1 is implicit, this means that the minimum positive normalized floating point value is:

    1.0 × 2^-126 (approximately 1.18 × 10^-38)

We could use the 8-bit value of 0 in the exponent to represent an exponent of -127, but that would only gain us a single power of two, extending the range down to 2^-127.
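The minimum normalized value can be checked by building it directly from its bit pattern. This is a minimal sketch, assuming C# 9 top-level statements; FromBits is a hypothetical helper name, not part of the original post.

```csharp
using System;

// Reinterpret a 32-bit integer's bytes as a float. FromBits is a hypothetical helper name.
float FromBits(int bits) => BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);

// Stored exponent 1 (e = -126), all 23 fraction bits 0, so the value is 1.0 x 2^-126.
float minNormal = FromBits(0x00800000);

// Build 2^-126 by halving 1.0 a total of 126 times; each halving is exact.
float expected = 1f;
for (int i = 0; i < 126; i++)
{
    expected /= 2f;
}

Console.WriteLine(minNormal == expected);  // True
Console.WriteLine(minNormal);              // roughly 1.18E-38; exact formatting varies by .NET version
```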

Instead, a value of 0 stored in the 8-bit exponent is a signal to drop the leading 1 in the mantissa, while the exponent stays fixed at -126. This allows storing a set of much smaller numbers, known as subnormal numbers, of the form:

    0.d × 2^-126

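We can now use all 23 digits in the mantissa to hold leading zeros. The smallest positive value sets only the last of those digits, giving 2^-23 × 2^-126, so we can store numbers as low as 2^-149.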
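The subnormal range can be explored the same way, by constructing values whose exponent field is 0. This is a sketch under the same assumptions as before (C# 9 top-level statements, with FromBits as a hypothetical helper name); float.Epsilon is the framework's constant for the smallest positive float.

```csharp
using System;

// Reinterpret a 32-bit integer's bytes as a float. FromBits is a hypothetical helper name.
float FromBits(int bits) => BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);

float smallestSubnormal = FromBits(0x00000001); // exponent field 0, only the lowest fraction bit set
float largestSubnormal = FromBits(0x007FFFFF);  // exponent field 0, all 23 fraction bits set
float smallestNormal = FromBits(0x00800000);    // exponent field 1, fraction 0

// 0.d x 2^-126 with d = 2^-23 is 2^-149, which C# exposes as float.Epsilon.
Console.WriteLine(smallestSubnormal == float.Epsilon);  // True

// Every subnormal value lies below the smallest normalized value.
Console.WriteLine(largestSubnormal < smallestNormal);   // True
```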

About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.

2,000 Things You Should Know About C#

Sean Sexton