#1,000 – UTF-8 and ASCII

UTF-8 is a character encoding scheme for Unicode character data that uses one to four bytes to represent each character, depending on the character's code point.
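As a rough illustration of the one-to-four-byte range, the Python sketch below encodes a few sample characters and reports how many bytes UTF-8 uses for each. The sample characters and the helper name `utf8_length` are chosen here for illustration and are not part of the original post.

```python
# Sample characters (illustrative assumptions), one for each UTF-8 length:
# U+0041 'A', U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

Lower code points take fewer bytes, so the ASCII letter takes one byte and the emoji takes four.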

In Unicode, the code point of each ASCII character is equal to that character's ASCII code.  For example, the letter A has ASCII code 65 and Unicode code point U+0041, which is also 65.

This equivalence holds for all 128 ASCII codes (0 through 127).
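One way to check this equivalence is to walk through every ASCII code and compare it with the Unicode code point of the decoded character. This is a minimal sketch; the function name `ascii_matches_unicode` is a hypothetical helper introduced for the example.

```python
def ascii_matches_unicode() -> bool:
    """Check that each ASCII code 0..127 equals the Unicode code point
    of the character it decodes to."""
    for code in range(128):
        ch = bytes([code]).decode("ascii")  # character for this ASCII code
        if ord(ch) != code:                 # ord() gives the Unicode code point
            return False
    return True


print(ascii_matches_unicode())
```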

UTF-8 encodes each of these first 128 Unicode code points as a single byte whose value is the code point.  Because of this:

Characters included in the ASCII character set that are present in a stream of UTF-8 encoded character data will appear the same as if they were encoded as ASCII.

This means that a UTF-8 encoded stream of ASCII characters will be identical, byte for byte, to an ASCII-encoded stream of the same characters.  That is, for text made up only of ASCII characters, such as unaccented English letters, digits, and basic punctuation, UTF-8 is identical to ASCII.
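The byte-for-byte equivalence can be sketched in Python by encoding the same ASCII-only string both ways and comparing the results. The sample strings and variable names below are assumptions made for the example.

```python
# ASCII-only sample text (illustrative).
text = "Hello, world!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text the two encodings produce the same bytes.
print(ascii_bytes == utf8_bytes)

# Every one of the 128 ASCII characters encodes to a single byte equal to its code.
all_single_byte = all(chr(c).encode("utf-8") == bytes([c]) for c in range(128))
print(all_single_byte)

# A non-ASCII character (U+00E9) needs more than one byte in UTF-8,
# so the equivalence applies only to the ASCII range.
mixed = "caf\u00e9"
print(mixed.encode("utf-8"))  # b'caf\xc3\xa9'
```

The last line shows where the equivalence stops: the ASCII letters still take one byte each, while the accented letter takes two.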

About Sean
Software developer in the Twin Cities area, passionate about .NET technologies. Equally passionate about my own personal projects related to family history and preservation of family stories and photos.
