#996 – UTF-16 Encoding, Part I

December 16, 2013

Unicode maps each character to a corresponding code point, i.e. a numeric value that represents that character. A character encoding scheme then dictates how each code point is represented as a series of bits so that it can be stored in memory or on disk.
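As a minimal sketch of the character-to-code-point mapping, the Python snippet below uses the built-in `ord()` and `chr()` functions. The helper name `code_point_label` is hypothetical and exists only for this illustration.

```python
def code_point_label(ch: str) -> str:
    """Return the code point of a single character in the usual U+XXXX notation."""
    # ord() gives the numeric code point; format it as at least 4 hex digits.
    return f"U+{ord(ch):04X}"


for ch in ["A", "é", "€"]:
    print(ch, code_point_label(ch), ord(ch))

# chr() goes the other way, from a code point back to the character.
print(chr(0x20AC))
```

Note that nothing here says how the code point is stored; that is the job of the encoding scheme.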

UTF-16 is one of the more common character encodings used to represent Unicode characters. UTF-16 uses either 2 or 4 bytes to represent each code point.
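The 2-or-4-byte behavior can be observed with a short Python sketch. It assumes the `utf-16-be` codec (big-endian, no byte order mark) so that only the bytes of the character itself are counted; the helper name `utf16_length` is made up for this example.

```python
def utf16_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-16."""
    # "utf-16-be" avoids the 2-byte byte order mark that plain "utf-16" prepends.
    return len(ch.encode("utf-16-be"))


print(utf16_length("A"))           # U+0041, within 0 to FFFF
print(utf16_length("€"))           # U+20AC, within 0 to FFFF
print(utf16_length("\U0001F600"))  # U+1F600, above FFFF
```

The first two characters take 2 bytes each, while the last one, whose code point lies above FFFF, takes 4.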

All code points in the range 0000 to FFFF (hex) are represented directly as a single 2-byte value. This set of code points is known as the Basic Multilingual Plane (BMP).
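"Represented directly" means the 2-byte value is simply the code point itself. The sketch below illustrates this under one assumption not stated above: big-endian byte order. The function name `utf16_bmp_bytes` is hypothetical.

```python
def utf16_bmp_bytes(ch: str) -> bytes:
    """Write a BMP character's code point as a 2-byte big-endian value."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("code point is outside the BMP")
    # The code point is stored as-is in 2 bytes (big-endian assumed here).
    return code_point.to_bytes(2, "big")


for ch in ["A", "€"]:
    direct = utf16_bmp_bytes(ch)
    # Compare against Python's own big-endian UTF-16 encoder.
    print(ch, direct.hex(), direct == ch.encode("utf-16-be"))
```

For both characters the hand-built bytes match what the built-in encoder produces.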

Code points in the BMP are defined only within the following two ranges (hex values):

0000 – D7FF
E000 – FFFF

This results in a total of 63,488 code points in the BMP that can represent characters (55,296 in the first range plus 8,192 in the second). In actuality, only around 55,000 of these code points are defined as of this writing.
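The total of 63,488 follows from the sizes of the two ranges listed above, and can be checked with a few lines of Python. The variable names are arbitrary.

```python
# The two ranges listed above, as inclusive hex bounds.
low_range = range(0x0000, 0xD7FF + 1)
high_range = range(0xE000, 0xFFFF + 1)

total = len(low_range) + len(high_range)
# Values in 0000 to FFFF that fall in the gap between the two ranges.
excluded = 0x10000 - total

print(len(low_range), len(high_range), total, excluded)
```

This gives 55,296 and 8,192 for the two ranges, 63,488 in total, with the remaining 2,048 values (D800 to DFFF) falling in the gap between them.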

Filed under Basics Tagged with Basics, C#, Code Points, Unicode, UTF-16, UTF16

About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.

2,000 Things You Should Know About C#

Sean Sexton
