#1,001 – Representing Unicode Surrogate Pairs

UTF-16 encodes Unicode code points above U+FFFF using surrogate pairs that take up 4 bytes.

You can specify a surrogate pair within a string literal by inserting the character directly into the string (provided that you have a keyboard that can insert the character):

            string myString = "𠈓";   // CJK Ideograph

You can also represent the surrogate pair within a string literal using the \Unnnnnnnn (4 byte) syntax to specify the Unicode code point or the \unnnn\unnnn syntax to specify the encoded surrogate pair value.

            string s1 = "\U00020213";    // Codepoint E+20213
            string s2 = "\uD840\uDE13";  // Surrogate pair

Note that because a surrogate pair requires more then 2 bytes, you cannot represent a surrogate pair within a single character (System.Char) literal.


About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: