#1,097 – Summary of how Floating Point Numbers Are Stored in .NET

Here’s a complete summary of how 32-bit and 64-bit floating point numbers are represented in .NET, including special values.  For more background, look here and here.

Floating point numbers are stored in .NET according to the IEEE 754 standard:

  • Normalized values (1.bb x 2^bb)
    • Sign bit – positive/negative
    • Mantissa – normalized binary number, does not store leading 1  (23 or 52 bits)
    • Exponent – biased, add 127 (or 1023) to exponent before storing  (8 or 11 bits)
  • Subnormal numbers
    • Sign bit – positive/negative
    • Mantissa – non-normalized, no implied leading 1  (23 or 52 bits)
    • Exponent – 0
  • Positive/negative zero
    • Sign bit – positive/negative
    • Mantissa – 0
    • Exponent – 0
  • Positive/negative infinity
    • Sign bit – positive/negative
    • Mantissa – 0
    • Exponent – FF (hex) or 7FF (hex)
  • NaN
    • Sign bit – not defined (implementation dependent)
    • Mantissa – some non-zero value
    • Exponent – FF (hex) or 7FF (hex)

Ranges for (normalized) numbers represented as 32- and 64-bit floating point numbers:

  • 32-bit float: -3.4 x 10^38 to 3.4 x 10^38
  • 64-bit double: -1.7 x 10^308 to 1.7 x 10^308

#1,096 – Floating Point NaN Values

We’ve seen that floating point numbers can represent approximations of real number values, positive and negative zero values, and positive and negative infinity.

A floating point variable or memory location can also represent a value that is “Not a Number”, normally denoted by the keyword NaN.  A NaN value is a floating point value that is a result of a calculation that leads to a value that is not a real number or a positive or negative infinity value.

NaN values are used when it’s useful to capture the fact that a calculation led to a value that is not a valid numerical result.

Below are some examples of calculations that can lead to NaN values.  Note that we can use float.IsNaN to check for this value.

            float zeroOverZero = 0.0f / 0.0f;
            float zeroTimesInfinity = 0.0f * float.PositiveInfinity;
            float InfinityCalc = float.PositiveInfinity + float.NegativeInfinity;

            double rootNegOne = Math.Sqrt(-1.0);

1096-001

#1,095 – How Floating Point Infinity Values Are Stored

.NET floating point types can represent both positive and negative infinity floating point values.  They are stored as 32-bit floating point values as follows:

  • +Infinity : sign bit 0, mantissa 0 (23 bits), exponent FF (hex) (8 bits)
  • -Infinity : sign bit 1, mantissa 0 (23 bits), exponent FF (hex) (8 bits)

Positive infinity is therefore stored as 7F800000 (hex) and negative zero as FF800000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

We can now look at these values in memory.

Positive infinity is 7F800000 (stored little-endian).

1095-001

Negative infinity is FF800000 (stored little-endian).

1095-002

#1,094 – Positive and Negative Infinity

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative infinity.  The two values of infinity are special values that can be represented by the 32-bit float type or the 64-bit double.  Mathematically, infinity is a concept that represents a value greater than any other real number (positive infinity), or smaller than any other real number (negative infinity).

You can specify a value of infinity using the PositiveInfinity and NegativeInfinity static properties of the float class.

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

In Visual Studio, the debugger will list these values as Infinity or -Infinity.

1094-001

You can also generate a positive or negative infinity value as a result of dividing a positive or negative number by zero.  Doing these calculations does not result in an exception.

            float posInfinity = 1.0f / 0;
            float negInfinity = -1.0f / 0;

#1,093 – How Positive and Negative Zero Values Are Stored

.NET can represent both positive and negative zero floating point values.  They are stored as 32-bit floating point values as follows:

  • +0 : sign bit 0, mantissa 0 (23 bits), exponent 0 (8 bits)
  • -0: sign bit 1, mantissa 0 (23 bits), exponent 0 (8 bits)

A 32-bit positive zero is therefore stored as 00000000 (hex) and negative zero as 80000000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float zero = 0.0f;
            float negZero = -0.0f;

Positive zero:

1093-001

Negative zero:

1093-002

 

See You at TechEd

I’ll Be at TechEd North America in Houston next week (12-15 May, 2014).  If you’re a fan of the 2,000 Things blogs and will be in Houston, look me up.  (E.g. DM @spsexton on Twitter).  We can go grab a beer and sing the praises of C#, WPF and all things .NET.

#1,092 – Positive and Negative Zero

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative zero values.  (+0.0 and -0.0).

Mathematically, +0.0 is equal to -0.0 and an equality check in C# will return a true result.  However, although the values are considered equal, either value can be represented in C# and they are stored differently in memory.

            float zero = 0.0f;
            float negZero = -0.0f;

            bool theyAreEqual = zero == negZero;   // true

            // 00-00-00-00
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(zero)));

            // 00-00-00-80
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(negZero)));

            float sum1 = zero + 1.0f;
            float sum2 = negZero + 1.0f;
            bool sumsEqual = sum1 == sum2;    // true

1092-001

 

You can think of  a floating point representation of zero as being either zero or a very small positive number that rounds to zero when stored as a floating point.  If the value was a tiny bit above zero before rounding, it’s stored as +0.0.  If it was a bit below zero before rounding, it’s stored as -0.0.

#1,091 – Subnormal Floating Point Numbers

32-bit binary floating point numbers are normally stored in a normalized form, that is:

1091-001

where d is the fractional part of the mantissa, consisting of 23 binary digits and e is the exponent, represented by 8 bits.

In this form, the minimum allowed value for e is -126, which is stored in the 8-bit exponent as a value of 1.  Because the leading 1 is implicit, this means that the  minimum positive floating point value is:

1091-002

We could use the 8-bit value of 0 in the exponent to represent an exponent of -127, but that would only gain us a single power of two, or one more value that we could store.

Instead, a value of of 0 stored in the 8-bit exponent is a signal to drop the leading 1 in the mantissa.  This allows storing a set of much smaller numbers, known as subnormal numbers, of the form:

1091-003

We can now use all 23 digits in the mantissa, allowing us to store numbers as low as 2^-149.

#1,090 – Using Visual Studio to Verify How Floating Point Numbers Are Stored

Recall that floating point numbers are stored in memory by storing the sign bit, exponent and mantissa.

We showed that the decimal value of 7.25, stored as a 32-bit floating point value, is stored as the binary value 0x40E80000.

1089-001

We can verify this in Visual Studio by assigning a float to contain the value 7.25 and then looking at that value in memory.

1090-001

Notice that the bytes appear to be backwards, relative to their order as written above.  This is because Intel is a little-endian platform (bytes at “little” end of 32-bit word are stored first).

#1,089 – How 32-Bit Floating Point Numbers Are Stored in .NET, part II

(part I)

We store the sign, exponent, and mantissa of a binary floating point number in memory as follows.

The sign is stored using a single bit, 0 for positive, 1 for negative.

The exponent is stored using 8 bits.  The exponent can be positive or negative, but we store it as a positive value by adding an offset (bias) of 127.  We add 127 to the desired exponent and then store the resulting value.  Exponents in the range of -126 to 127 are therefore stored as values in the range 1 to 254.  (Stored exponent values of 0 and 255 have special meaning).

Because we normalize the binary floating point number, it has the form 1.xxx.  Since it always includes the leading 1, we don’t store it, but use 23 bits to store up to 23 digits after the decimal point.

For example:

1089-001