#773 – Reversing a String that Contains Unicode Characters Expressed as Surrogate Pairs

You can use the Reverse method to reverse the characters in a .NET-based string.  This method works if the string contains Unicode characters that can be expressed as 2-byte UTF16 code points.  This subset of Unicode is known as the Basic Multilingual Plane (BMP) and is able to represent 65,536 unique code points.

UTF16 can represent Unicode code points outside the BMP through the use of surrogate pairs.  Within a series of 16-bit characters, a 32-bit character can appear, stored as a pair of normal UTF16 words.

In practice, it’s quite rare to encounter Unicode characters outside of the BMP, given that this plane can represent characters from most living languages.

To reverse a string that contains surrogate pairs, you can use the Microsoft.VisualBasic.Strings.StrReverse method.

            // 8 byte string includes surrogate pair
            string s = "A𠈓C";

            // Won't handle surrogate pair
            string s2 = new string(s.Reverse().ToArray());

            // Will handle surrogate pair
            string s3 = Microsoft.VisualBasic.Strings.StrReverse(s);

773-001

About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.

Leave a comment