#1,000 – UTF-8 and ASCII

UTF-8 is a character encoding scheme for Unicode character data that uses 1 to 4 bytes to represent each character, depending on the character's code point.
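As a rough illustration of the variable length, the Python sketch below encodes one character from each size class and counts the resulting bytes. The helper name `utf8_length` and the sample characters are chosen here for illustration only; they are not from the original post.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

# One sample per size class, written as escapes so the source stays pure ASCII.
samples = [
    "A",           # U+0041, LATIN CAPITAL LETTER A
    "\u00e9",      # U+00E9, LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC, EURO SIGN
    "\U0001F600",  # U+1F600, GRINNING FACE
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```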

In Unicode, the code point of each ASCII character is equal to that character's ASCII code. For example, the letter "A" is 65 (0x41) in ASCII and U+0041 in Unicode.

This mapping holds for all 128 ASCII codes (0 through 127).
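One quick way to convince yourself of this is to decode every possible ASCII byte and compare the result with the Unicode character at the same code point. This is only a small sanity check written in Python, and the variable name `matches` is made up for the example.

```python
# Decode each ASCII byte value 0..127 and compare it with the Unicode
# character that has the same code point.
matches = all(
    bytes([code]).decode("ascii") == chr(code)
    for code in range(128)
)
print(matches)  # True

# The letter "A": ASCII code 65 (0x41), Unicode code point U+0041.
print(ord("A"), hex(ord("A")))  # 65 0x41
```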

UTF-8 encodes each of these first 128 Unicode code points as a single byte whose value is the code point itself. Because of this:

Characters included in the ASCII character set that are present in a stream of UTF-8 encoded character data will appear the same as if they were encoded as ASCII.

This means that a UTF-8 encoded stream of ASCII characters will be identical to an ASCII-encoded stream of the same characters. That is, for text containing only ASCII characters (unaccented English letters, digits, and common punctuation), UTF-8 is byte-for-byte identical to ASCII.
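The equivalence is easy to check directly. The following Python sketch encodes the same ASCII-only string both ways and compares the bytes, then shows that a string with a non-ASCII character no longer fits in ASCII at all. The sample strings and variable names are illustrative assumptions, not part of the original post.

```python
text = "Hello, World!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text the two encodings produce exactly the same bytes,
# and every byte is below 0x80.
print(ascii_bytes == utf8_bytes)           # True
print(all(b < 0x80 for b in utf8_bytes))   # True

# A non-ASCII character (U+00E9) takes two bytes in UTF-8
# and cannot be encoded as ASCII.
accented = "caf\u00e9"
print(accented.encode("utf-8"))            # b'caf\xc3\xa9'

try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                            # False
```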
