#1,098 – Floating Point Overflow

When you perform an arithmetic operation on a floating point value and the result has a magnitude that is too large to be represented by the floating point data type, an overflow condition occurs.  This is equivalent to trying to store a value whose exponent is larger than the maximum allowed exponent value.

If the result of the calculation is greater than the maximum representable positive floating point value, the result of the calculation is +Infinity.  If the result of the calculation is less than the largest representable negative floating point value, the result of the calculation is -Infinity.

            // Create a big floating point number
            float bigNumber = float.MaxValue;
            Console.WriteLine(bigNumber);  // 3.402823E+38

            // Adding small number doesn't change large number
            float newBig = bigNumber + 1.0f;
            Console.WriteLine(newBig);     // 3.402823E+38

            // But doubling the original number leads to overflow
            float reallyBig = bigNumber * 2.0f;
            Console.WriteLine(reallyBig);

1098-001

#1,097 – Summary of how Floating Point Numbers Are Stored in .NET

Here’s a complete summary of how 32-bit and 64-bit floating point numbers are represented in .NET, including special values.  For more background, look here and here.

Floating point numbers are stored in .NET according to the IEEE 754 standard:

  • Normalized values (1.bb x 2^bb)
    • Sign bit – positive/negative
    • Mantissa – normalized binary number, does not store leading 1  (23 or 52 bits)
    • Exponent – biased, add 127 (or 1023) to exponent before storing  (8 or 11 bits)
  • Subnormal numbers
    • Sign bit – positive/negative
    • Mantissa – non-normalized, no implied leading 1  (23 or 52 bits)
    • Exponent – 0
  • Positive/negative zero
    • Sign bit – positive/negative
    • Mantissa – 0
    • Exponent – 0
  • Positive/negative infinity
    • Sign bit – positive/negative
    • Mantissa – 0
    • Exponent – FF (hex) or 7FF (hex)
  • NaN
    • Sign bit – not defined (implementation dependent)
    • Mantissa – some non-zero value
    • Exponent – FF (hex) or 7FF (hex)

Ranges for (normalized) numbers represented as 32- and 64-bit floating point numbers:

  • 32-bit float: -3.4 x 10^38 to 3.4 x 10^38
  • 64-bit double: -1.7 x 10^308 to 1.7 x 10^308

#1,096 – Floating Point NaN Values

We’ve seen that floating point numbers can represent approximations of real number values, positive and negative zero values, and positive and negative infinity.

A floating point variable or memory location can also represent a value that is “Not a Number”, normally denoted by the keyword NaN.  A NaN value is a floating point value that is a result of a calculation that leads to a value that is not a real number or a positive or negative infinity value.

NaN values are used when it’s useful to capture the fact that a calculation led to a value that is not a valid numerical result.

Below are some examples of calculations that can lead to NaN values.  Note that we can use float.IsNaN to check for this value.

            float zeroOverZero = 0.0f / 0.0f;
            float zeroTimesInfinity = 0.0f * float.PositiveInfinity;
            float InfinityCalc = float.PositiveInfinity + float.NegativeInfinity;

            double rootNegOne = Math.Sqrt(-1.0);

1096-001

#1,095 – How Floating Point Infinity Values Are Stored

.NET floating point types can represent both positive and negative infinity floating point values.  They are stored as 32-bit floating point values as follows:

  • +Infinity : sign bit 0, mantissa 0 (23 bits), exponent FF (hex) (8 bits)
  • -Infinity : sign bit 1, mantissa 0 (23 bits), exponent FF (hex) (8 bits)

Positive infinity is therefore stored as 7F800000 (hex) and negative zero as FF800000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

We can now look at these values in memory.

Positive infinity is 7F800000 (stored little-endian).

1095-001

Negative infinity is FF800000 (stored little-endian).

1095-002

#1,094 – Positive and Negative Infinity

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative infinity.  The two values of infinity are special values that can be represented by the 32-bit float type or the 64-bit double.  Mathematically, infinity is a concept that represents a value greater than any other real number (positive infinity), or smaller than any other real number (negative infinity).

You can specify a value of infinity using the PositiveInfinity and NegativeInfinity static properties of the float class.

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

In Visual Studio, the debugger will list these values as Infinity or -Infinity.

1094-001

You can also generate a positive or negative infinity value as a result of dividing a positive or negative number by zero.  Doing these calculations does not result in an exception.

            float posInfinity = 1.0f / 0;
            float negInfinity = -1.0f / 0;

#1,093 – How Positive and Negative Zero Values Are Stored

.NET can represent both positive and negative zero floating point values.  They are stored as 32-bit floating point values as follows:

  • +0 : sign bit 0, mantissa 0 (23 bits), exponent 0 (8 bits)
  • -0: sign bit 1, mantissa 0 (23 bits), exponent 0 (8 bits)

A 32-bit positive zero is therefore stored as 00000000 (hex) and negative zero as 80000000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float zero = 0.0f;
            float negZero = -0.0f;

Positive zero:

1093-001

Negative zero:

1093-002

 

#1,092 – Positive and Negative Zero

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative zero values.  (+0.0 and -0.0).

Mathematically, +0.0 is equal to -0.0 and an equality check in C# will return a true result.  However, although the values are considered equal, either value can be represented in C# and they are stored differently in memory.

            float zero = 0.0f;
            float negZero = -0.0f;

            bool theyAreEqual = zero == negZero;   // true

            // 00-00-00-00
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(zero)));

            // 00-00-00-80
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(negZero)));

            float sum1 = zero + 1.0f;
            float sum2 = negZero + 1.0f;
            bool sumsEqual = sum1 == sum2;    // true

1092-001

 

You can think of  a floating point representation of zero as being either zero or a very small positive number that rounds to zero when stored as a floating point.  If the value was a tiny bit above zero before rounding, it’s stored as +0.0.  If it was a bit below zero before rounding, it’s stored as -0.0.