#773 – Reversing a String that Contains Unicode Characters Expressed as Surrogate Pairs

You can use the Reverse method to reverse the characters in a .NET-based string.  This method works if the string contains Unicode characters that can be expressed as 2-byte UTF16 code points.  This subset of Unicode is known as the Basic Multilingual Plane (BMP) and is able to represent 65,536 unique code points.

UTF16 can represent Unicode code points outside the BMP through the use of surrogate pairs.  Within a series of 16-bit characters, a 32-bit character can appear, stored as a pair of normal UTF16 words.

In practice, it’s quite rare to encounter Unicode characters outside of the BMP, given that this plane can represent characters from most living languages.

To reverse a string that contains surrogate pairs, you can use the Microsoft.VisualBasic.Strings.StrReverse method.

            // 8 byte string includes surrogate pair
            string s = "A𠈓C";

            // Won't handle surrogate pair
            string s2 = new string(s.Reverse().ToArray());

            // Will handle surrogate pair
            string s3 = Microsoft.VisualBasic.Strings.StrReverse(s);

773-001

Advertisements

About Sean
Software developer in the Twin Cities area, passionate about .NET technologies. Equally passionate about my own personal projects related to family history and preservation of family stories and photos.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: