#994 – Unicode Basics
December 12, 2013
To store alphabetic characters or other written characters on a computer system, either in memory or on disk, we need to encode each character as a numeric value that represents it. The numeric values are, ultimately, just bit patterns, where each bit pattern represents a particular character.
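As a quick illustration (a minimal C# sketch, not from the original post), casting a char to an int reveals the numeric value stored for that character:

```csharp
using System;

class CharAsNumber
{
    static void Main()
    {
        // Each character is stored as a numeric value; casting to int
        // reveals the value behind the character.
        char letter = 'A';
        char accented = 'é';

        Console.WriteLine((int)letter);    // 65
        Console.WriteLine((int)accented);  // 233

        // The same value, written out as the underlying bit pattern
        Console.WriteLine(Convert.ToString(letter, 2).PadLeft(8, '0'));  // 01000001
    }
}
```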
Unicode is a standard that specifies methods for encoding the characters of all of the world's written languages. Its code space allows for more than 1 million unique characters (1,114,112 code points). Unicode is the standard used for text on the web (HTML and XML) and for storing character data on most modern operating systems (e.g. Windows, OS X, Unix).
The Unicode standard defines a number of different character encodings. The most common, demonstrated in the sketch after this list, are:
- UTF-8 – Variable-length encoding that uses from 1 to 4 bytes per character. ASCII characters, including unaccented English letters, use only 1 byte.
- UTF-16 – Uses 2 bytes for the most common characters (those in the Basic Multilingual Plane) and 4 bytes for everything else.
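To see the size difference in practice, here is a minimal sketch (again, not from the original post) using the standard System.Text.Encoding classes; note that Encoding.Unicode is .NET's name for UTF-16:

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        string english = "Hello";   // ASCII-only text
        string greek = "σπίτι";     // non-ASCII text ("house" in Greek)

        // Encoding.Unicode is .NET's name for UTF-16 (little-endian)
        byte[] utf8English = Encoding.UTF8.GetBytes(english);
        byte[] utf16English = Encoding.Unicode.GetBytes(english);

        Console.WriteLine("UTF-8:  {0} bytes", utf8English.Length);   // 5  (1 byte per char)
        Console.WriteLine("UTF-16: {0} bytes", utf16English.Length);  // 10 (2 bytes per char)

        byte[] utf8Greek = Encoding.UTF8.GetBytes(greek);
        byte[] utf16Greek = Encoding.Unicode.GetBytes(greek);

        Console.WriteLine("UTF-8:  {0} bytes", utf8Greek.Length);   // 10 (2 bytes per char)
        Console.WriteLine("UTF-16: {0} bytes", utf16Greek.Length);  // 10 (2 bytes per char)
    }
}
```

For ASCII-only text, UTF-8 is half the size of UTF-16; for text outside the ASCII range, the sizes converge or UTF-16 can even be smaller.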