#1,095 – How Floating Point Infinity Values Are Stored

.NET floating point types can represent both positive and negative infinity floating point values.  They are stored as 32-bit floating point values as follows:

  • +Infinity : sign bit 0, mantissa 0 (23 bits), exponent FF (hex) (8 bits)
  • -Infinity : sign bit 1, mantissa 0 (23 bits), exponent FF (hex) (8 bits)

Positive infinity is therefore stored as 7F800000 (hex) and negative infinity as FF800000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

We can now look at these values in memory.

Positive infinity is 7F800000.  Because the value is stored little-endian, it appears in memory as the bytes 00 00 80 7F.

Negative infinity is FF800000, which appears in memory as the bytes 00 00 80 FF.
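If you'd rather not dig through memory in the debugger, the same bit patterns can be printed from code.  Below is a minimal sketch; the InfinityBits class and its ToHex helper are names made up for this example, not part of .NET.

```csharp
using System;

public static class InfinityBits
{
    // Reinterprets the 4 bytes of a float as a 32-bit integer
    // and formats that integer as 8 hex digits.
    public static string ToHex(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        return bits.ToString("X8");
    }

    public static void Main()
    {
        Console.WriteLine(ToHex(float.PositiveInfinity));   // 7F800000
        Console.WriteLine(ToHex(float.NegativeInfinity));   // FF800000
    }
}
```

Because the bytes are converted back to an integer on the same machine, the hex string reads most significant byte first, regardless of the byte order used in memory.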

#1,094 – Positive and Negative Infinity

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative infinity.  The two values of infinity are special values that can be represented by the 32-bit float type or the 64-bit double.  Mathematically, infinity is a concept that represents a value greater than any other real number (positive infinity), or smaller than any other real number (negative infinity).

You can specify a value of infinity using the PositiveInfinity and NegativeInfinity constants of the float type.

            float posInfinity = float.PositiveInfinity;
            float negInfinity = float.NegativeInfinity;

In Visual Studio, the debugger will list these values as Infinity or -Infinity.


You can also generate a positive or negative infinity value by dividing a positive or negative floating point number by zero.  These calculations do not throw an exception.  (Integer division by zero, by contrast, throws a DivideByZeroException.)

            float posInfinity = 1.0f / 0;
            float negInfinity = -1.0f / 0;
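The sketch below shows one way to check the results of such a division, and contrasts it with integer division.  The InfinityChecks class and its Divide method are hypothetical names used only for this example.

```csharp
using System;

public static class InfinityChecks
{
    // Floating point division: a zero divisor yields an infinity
    // (or NaN for 0/0) rather than throwing.
    public static float Divide(float numerator, float divisor)
    {
        return numerator / divisor;
    }

    public static void Main()
    {
        float pos = Divide(1.0f, 0.0f);
        float neg = Divide(-1.0f, 0.0f);

        Console.WriteLine(float.IsPositiveInfinity(pos));   // True
        Console.WriteLine(float.IsNegativeInfinity(neg));   // True
        Console.WriteLine(pos > float.MaxValue);            // True

        try
        {
            int zero = 0;
            Console.WriteLine(1 / zero);   // integer division by zero throws
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("Integer division by zero throws");
        }
    }
}
```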

#1,093 – How Positive and Negative Zero Values Are Stored

.NET can represent both positive and negative zero floating point values.  They are stored as 32-bit floating point values as follows:

  • +0 : sign bit 0, mantissa 0 (23 bits), exponent 0 (8 bits)
  • -0 : sign bit 1, mantissa 0 (23 bits), exponent 0 (8 bits)

A 32-bit positive zero is therefore stored as 00000000 (hex) and negative zero as 80000000 (hex).  We can verify this by looking at these values in memory from within Visual Studio.

Assuming the following code:

            float zero = 0.0f;
            float negZero = -0.0f;

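Positive zero appears in memory as the bytes 00 00 00 00.

Negative zero appears in memory as the bytes 00 00 00 80, because the value is stored little-endian.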

#1,092 – Positive and Negative Zero

Because .NET uses the IEEE 754 standard to represent floating point numbers, it allows representing both positive and negative zero values (+0.0 and -0.0).

Mathematically, +0.0 is equal to -0.0 and an equality check in C# will return a true result.  However, although the values are considered equal, either value can be represented in C# and they are stored differently in memory.

            float zero = 0.0f;
            float negZero = -0.0f;

            bool theyAreEqual = zero == negZero;   // true

            // 00-00-00-00
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(zero)));

            // 00-00-00-80
            Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(negZero)));

            float sum1 = zero + 1.0f;
            float sum2 = negZero + 1.0f;
            bool sumsEqual = sum1 == sum2;    // true


You can think of a floating point representation of zero as being either exactly zero or a very small positive or negative number that rounds to zero when stored as a floating point value.  If the value was a tiny bit above zero before rounding, it’s stored as +0.0.  If it was a tiny bit below zero before rounding, it’s stored as -0.0.
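Since an equality check can't tell the two zeros apart, one way to detect negative zero is to divide by it and look at the sign of the resulting infinity.  This is a minimal sketch; the ZeroSign class and IsNegativeZero helper are hypothetical names for this example.

```csharp
using System;

public static class ZeroSign
{
    // True only for -0.0f: dividing 1 by negative zero gives negative infinity.
    public static bool IsNegativeZero(float value)
    {
        return value == 0.0f && float.IsNegativeInfinity(1.0f / value);
    }

    public static void Main()
    {
        Console.WriteLine(IsNegativeZero(0.0f));    // False
        Console.WriteLine(IsNegativeZero(-0.0f));   // True

        // A negative result far too small to store rounds to -0.0, not +0.0.
        float tinyNegative = -1e-30f * 1e-30f;
        Console.WriteLine(IsNegativeZero(tinyNegative));   // True
    }
}
```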

#1,091 – Subnormal Floating Point Numbers

32-bit binary floating point numbers are normally stored in a normalized form, that is:

            ±1.d × 2^e

where d is the fractional part of the mantissa, consisting of 23 binary digits, and e is the exponent, represented by 8 bits.

In this form, the minimum allowed value for e is -126, which is stored in the 8-bit exponent as a value of 1.  Because the leading 1 is implicit, this means that the minimum positive normalized floating point value is:

            1.0 × 2^-126

We could use the 8-bit value of 0 in the exponent to represent an exponent of -127, but that would only extend the range by a single power of two.

Instead, a value of 0 stored in the 8-bit exponent is a signal to drop the leading 1 in the mantissa.  This allows storing a set of much smaller numbers, known as subnormal numbers, of the form:

            ±0.d × 2^-126

We can now use all 23 digits in the mantissa, allowing us to store positive numbers as small as 2^-149 (a 1 in only the last of the 23 digits, or 2^-23 × 2^-126).
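To see these boundaries in code, we can build floats directly from their bit patterns.  The sketch below assumes a platform that does not flush subnormal values to zero; SubnormalDemo and FromBits are hypothetical names for this example.

```csharp
using System;

public static class SubnormalDemo
{
    // Builds a float directly from a 32-bit pattern.
    public static float FromBits(int bits)
    {
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    public static void Main()
    {
        float smallestNormal = FromBits(0x00800000);      // exponent field 1, fraction 0
        float largestSubnormal = FromBits(0x007FFFFF);    // exponent field 0, fraction all ones
        float smallestSubnormal = FromBits(0x00000001);   // exponent field 0, fraction 1

        Console.WriteLine(smallestNormal == (float)Math.Pow(2, -126));      // True
        Console.WriteLine(smallestSubnormal == (float)Math.Pow(2, -149));   // True
        Console.WriteLine(smallestSubnormal == float.Epsilon);              // True
        Console.WriteLine(largestSubnormal < smallestNormal);               // True
    }
}
```

Note that float.Epsilon is the smallest positive subnormal value, not the smallest normalized one.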

#1,090 – Using Visual Studio to Verify How Floating Point Numbers Are Stored

Recall that floating point numbers are stored in memory by storing the sign bit, exponent and mantissa.

We showed that the decimal value of 7.25, stored as a 32-bit floating point value, is stored as the hex value 0x40E80000.

            sign = 0, exponent = 10000001, mantissa = 11010000000000000000000
            0100 0000 1110 1000 0000 0000 0000 0000 = 40E80000 (hex)

We can verify this in Visual Studio by assigning the value 7.25 to a float and then looking at that value in memory, where it appears as the bytes 00 00 E8 40.

Notice that the bytes appear to be backwards, relative to their order as written above.  This is because Intel processors are little-endian: the byte at the “little” (least significant) end of the 32-bit word is stored first.
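The byte order can also be checked from code, without the debugger.  This is a small sketch; EndianDemo and MemoryBytes are hypothetical names for this example.

```csharp
using System;

public static class EndianDemo
{
    // Returns the bytes of the float in the order they sit in memory.
    public static string MemoryBytes(float value)
    {
        return BitConverter.ToString(BitConverter.GetBytes(value));
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.IsLittleEndian);   // True on Intel x86/x64
        Console.WriteLine(MemoryBytes(7.25f));            // 00-00-E8-40 on a little-endian machine
    }
}
```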

#1,089 – How 32-Bit Floating Point Numbers Are Stored in .NET, part II

(part I)

We store the sign, exponent, and mantissa of a binary floating point number in memory as follows.

The sign is stored using a single bit, 0 for positive, 1 for negative.

The exponent is stored using 8 bits.  The exponent can be positive or negative, but we store it as a positive value by adding an offset (bias) of 127.  We add 127 to the desired exponent and then store the resulting value.  Exponents in the range of -126 to 127 are therefore stored as values in the range 1 to 254.  (Stored exponent values of 0 and 255 have special meaning).

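Because we normalize the binary floating point number, it has the form 1.xxx.  Since it always includes the leading 1, we don’t store it, but use 23 bits to store up to 23 digits after the binary point.

For example, the decimal value 7.25 is 1.1101 × 2^2 in binary scientific notation:

            sign:     0 (positive)
            exponent: 2 + 127 = 129 = 10000001
            mantissa: 1101 followed by 19 zeros (leading 1 not stored)

            0 10000001 11010000000000000000000 = 40E80000 (hex)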
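The three fields can be pulled back out of a float with a few shifts and masks.  Below is a minimal sketch of that idea; FloatFields and Decode are hypothetical names for this example.

```csharp
using System;

public static class FloatFields
{
    // Splits a float into its stored sign bit, biased exponent, and 23-bit fraction.
    public static void Decode(float value, out int sign, out int storedExponent, out int fraction)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        sign = (bits >> 31) & 0x1;              // 1 bit
        storedExponent = (bits >> 23) & 0xFF;   // 8 bits, bias of 127 still included
        fraction = bits & 0x7FFFFF;             // 23 bits, implicit leading 1 not stored
    }

    public static void Main()
    {
        int sign, storedExponent, fraction;
        Decode(7.25f, out sign, out storedExponent, out fraction);

        Console.WriteLine(sign);                                             // 0
        Console.WriteLine(storedExponent);                                   // 129
        Console.WriteLine(storedExponent - 127);                             // 2
        Console.WriteLine(Convert.ToString(fraction, 2).PadLeft(23, '0'));   // 11010000000000000000000
    }
}
```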

 

 

 

#1,088 – How 32-Bit Floating Point Numbers Are Stored in .NET, part I

Floating point numbers in .NET (on Intel-based PCs) are stored using the IEEE 754 standard, which defines how to store both 32-bit (float) and 64-bit (double) floating point values.

A floating point number is stored in memory as a binary floating point value expressed in (binary) scientific notation.

For example, to store a decimal value of 7.25:

            7.25 (decimal) = 111.01 (binary) = 1.1101 × 2^10 (binary)

The exponent is expressed as binary, so it has a value of 2.

We can now store this floating point number in memory by storing three things:

  • The sign of the number (positive)
  • The mantissa (1.1101)
  • The exponent (10)

On Intel-based PCs, 32-bit floating point numbers are stored as follows:

  • 1 bit to store the sign (0 for positive numbers, 1 for negative numbers)
  • 8 bits to store the exponent
  • 23 bits to store the mantissa

More coming in part II

#1,087 – Representing Binary Floating Point Numbers Using Scientific Notation

We can represent decimal floating point numbers using scientific notation, using the form:

            a × 10^b

(where a and b are both decimal values).

We can also represent binary floating point numbers using scientific notation.  The basic form is:

            a × 2^b

(where a and b are both binary values).

Below is an example.  Assume that we have a binary floating point value of 101.101 (5.625 decimal).

            101.101 = 1.01101 × 2^10

Starting from the binary value 1.01101, we need to shift the binary point two places to the right to get our desired value of 101.101.  This is equivalent to multiplying by 4, or 2 raised to the power of 2.  We write the exponent in binary, so the power of 2 is written as “10”.
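The same normalization can be sketched in code by repeatedly halving or doubling a value until it falls between 1 and 2, counting the shifts along the way.  BinaryScientific and Normalize are hypothetical names for this example, and the sketch only handles positive, finite values.

```csharp
using System;
using System.Globalization;

public static class BinaryScientific
{
    // Rewrites a positive, finite value as mantissa × 2^exponent with 1 <= mantissa < 2.
    public static void Normalize(double value, out double mantissa, out int exponent)
    {
        if (!(value > 0.0) || double.IsInfinity(value))
            throw new ArgumentOutOfRangeException("value", "Value must be positive and finite.");

        mantissa = value;
        exponent = 0;
        while (mantissa >= 2.0) { mantissa /= 2.0; exponent++; }   // shift the binary point left
        while (mantissa < 1.0) { mantissa *= 2.0; exponent--; }    // shift the binary point right
    }

    public static void Main()
    {
        double mantissa;
        int exponent;
        Normalize(5.625, out mantissa, out exponent);

        Console.WriteLine(mantissa.ToString(CultureInfo.InvariantCulture));   // 1.40625 (1.01101 in binary)
        Console.WriteLine(exponent);                                          // 2
        Console.WriteLine(Convert.ToString(exponent, 2));                     // 10
    }
}
```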

 

#1,086 – Converting Decimal Floating Point to Binary Floating Point

We can represent a particular floating point value as either a decimal floating point number or a binary floating point number.

Below is an example of how we would convert a decimal floating point value (3.25) to its equivalent binary floating point value.

            3    = 2 + 1 = 11 (binary)
            0.25 = 1/4   = 0.01 (binary)
            3.25         = 11.01 (binary)

This particular example was rather easy, because the decimal value 0.25 represents 1/4, so it can be represented as a binary fraction with only two digits after the binary point.

The decimal value 1.1 is a bit more difficult to calculate.

At each step, we find the largest fractional power of two (e.g. 1/2, 1/4, 1/8) that is no larger than the remaining fractional value.  We then subtract that value and continue.

            0.1     - 1/16  (0.0625)     = 0.0375      ->  1.0001
            0.0375  - 1/32  (0.03125)    = 0.00625     ->  1.00011
            0.00625 - 1/256 (0.00390625) = 0.00234375  ->  1.00011001

We could continue this process, adding one digit of precision at each step and reducing the error (the difference between the binary representation and the value 1.1).  In the end, we can’t exactly represent this value with a binary floating point number, because its binary expansion (1.000110011001...) repeats forever.
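The subtraction process can be sketched in code by testing each fractional power of two in turn.  The sketch uses the decimal type so that the remainders are exact, which holds for roughly the first 28 binary digits.  BinaryFraction and ToBinary are hypothetical names for this example, and only non-negative values are handled.

```csharp
using System;
using System.Text;

public static class BinaryFraction
{
    // Converts a non-negative decimal value to a binary string with
    // maxFractionDigits digits after the binary point (truncated, not rounded).
    public static string ToBinary(decimal value, int maxFractionDigits)
    {
        decimal whole = Math.Floor(value);
        decimal remaining = value - whole;
        var digits = new StringBuilder(Convert.ToString((long)whole, 2));
        digits.Append('.');

        decimal power = 0.5m;   // 1/2, then 1/4, 1/8, ...
        for (int i = 0; i < maxFractionDigits; i++)
        {
            if (remaining >= power)
            {
                digits.Append('1');   // this power of two fits, so subtract it
                remaining -= power;
            }
            else
            {
                digits.Append('0');
            }
            power /= 2;
        }
        return digits.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToBinary(3.25m, 2));   // 11.01
        Console.WriteLine(ToBinary(1.1m, 12));   // 1.000110011001
    }
}
```

Asking for more digits of 1.1 just extends the repeating 0011 pattern; the remainder never reaches zero.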