#994 – Unicode Basics

December 12, 2013

To store alphabetic characters or other written characters on a computer system, either in memory or on disk, we need to encode each character, that is, represent it as a numeric value that can be stored. Ultimately, these numeric values are just bit patterns, where each bit pattern represents a particular character.
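
As a small illustration of the idea, here is a minimal sketch in C#, where each `char` holds a 16-bit numeric value. Converting the `char` to `int` exposes that number, and `Convert.ToString(value, 2)` shows the underlying bit pattern. The helper `BitPattern` is a hypothetical name chosen for this example, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;

// Hypothetical helper: formats the numeric value of a char as a 16-bit binary string.
static string BitPattern(char c) => Convert.ToString((int)c, 2).PadLeft(16, '0');

char letter = 'A';
int codeValue = letter;          // implicit char -> int conversion exposes the stored number

Console.WriteLine($"'{letter}' is stored as {codeValue} (0x{codeValue:X4})");
Console.WriteLine($"Bit pattern: {BitPattern(letter)}");

// Going the other way: casting a number to char yields the character it represents.
char fromNumber = (char)0x00E9;  // U+00E9, LATIN SMALL LETTER E WITH ACUTE
Console.WriteLine($"0x00E9 represents '{fromNumber}'");
```

The first two lines of output should be `'A' is stored as 65 (0x0041)` and `Bit pattern: 0000000001000001`: the letter is nothing more than the number 65 stored as bits.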

Unicode is a standard that specifies how to encode the characters used in virtually all of the world's written languages. Its code space contains 1,114,112 code points (U+0000 through U+10FFFF), so it can represent more than 1 million unique characters. Unicode is the standard character set for web content (HTML and XML) and is used for storing character data on most modern operating systems (e.g., Windows, OS X, Unix).
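
Unicode identifies each character by a numeric code point, written as `U+` followed by hexadecimal digits. The sketch below, assuming C# 9 or later, computes the size of the code space and round-trips a few sample code points (chosen only for illustration) through a string using `char.ConvertFromUtf32` and `char.ConvertToUtf32`. One sample, U+1F600, lies beyond the 65,536 values that fit in 16 bits.

```csharp
using System;

// Unicode code points range from U+0000 to U+10FFFF.
const int MaxCodePoint = 0x10FFFF;
int codeSpaceSize = MaxCodePoint + 1;   // 1,114,112 possible code points

// Sample code points: 'A', 'é', the euro sign, and U+1F600 (GRINNING FACE).
int[] samples = { 0x0041, 0x00E9, 0x20AC, 0x1F600 };

Console.WriteLine($"Code space size: {codeSpaceSize}");
foreach (int cp in samples)
{
    // ConvertFromUtf32 builds a string for the code point; ConvertToUtf32 reads it back.
    string s = char.ConvertFromUtf32(cp);
    int roundTrip = char.ConvertToUtf32(s, 0);
    Console.WriteLine($"U+{cp:X4} -> \"{s}\" -> U+{roundTrip:X4}");
}
```

Every sample comes back as the same code point it started from, including U+1F600, even though that value does not fit in a single 16-bit `char`.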

The Unicode standard defines a number of different character encodings. The most common are:

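UTF-8 – Uses a variable number of bytes per character, from 1 to 4. ASCII characters, which include the unaccented letters used in English, use only 1 byte.
UTF-16 – Uses 2 bytes for characters in the Basic Multilingual Plane (U+0000 through U+FFFF), which includes most common characters, and 4 bytes (a surrogate pair) for all other characters.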
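
To see these size differences concretely, the following sketch (assuming C# 9 or later) asks .NET how many bytes each encoding needs for a few sample characters. `Encoding.UTF8` is UTF-8, and `Encoding.Unicode` is .NET's name for UTF-16 (little-endian). Since a C# `string` is itself a sequence of UTF-16 code units, `Length` reports 2 for the character outside the Basic Multilingual Plane.

```csharp
using System;
using System.Text;

// Samples: ASCII 'A', 'é' (U+00E9), the euro sign (U+20AC), and U+1F600,
// which lies outside the Basic Multilingual Plane.
string[] samples = { "A", "\u00E9", "\u20AC", char.ConvertFromUtf32(0x1F600) };

foreach (string s in samples)
{
    int utf8Bytes = Encoding.UTF8.GetByteCount(s);     // UTF-8: 1 to 4 bytes per character
    int utf16Bytes = Encoding.Unicode.GetByteCount(s); // UTF-16: 2 or 4 bytes per character
    int code = char.ConvertToUtf32(s, 0);              // the code point the string holds
    // s.Length counts UTF-16 code units, so a surrogate pair counts as 2.
    Console.WriteLine($"U+{code:X4}: chars = {s.Length}, UTF-8 bytes = {utf8Bytes}, UTF-16 bytes = {utf16Bytes}");
}
```

The expected counts are 1 and 2 bytes for `A`, 2 and 2 for `é`, 3 and 2 for the euro sign, and 4 and 4 for U+1F600 (UTF-8 and UTF-16, respectively). UTF-8 is more compact for ASCII text, while UTF-16 is more compact for many characters in the U+0800 to U+FFFF range.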

Filed under Basics Tagged with Basics, C#, Unicode, UTF-16, UTF-8

About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.

2,000 Things You Should Know About C#

Sean Sexton
