#773 – Reversing a String that Contains Unicode Characters Expressed as Surrogate Pairs
February 5, 2013
You can use the Reverse method (the LINQ Enumerable.Reverse extension method) to reverse the characters in a .NET string. This works as long as every character in the string can be expressed as a single 16-bit (2-byte) UTF-16 code unit. This subset of Unicode is known as the Basic Multilingual Plane (BMP) and spans 65,536 code points.
UTF-16 can represent Unicode code points outside the BMP through the use of surrogate pairs. Within a series of 16-bit code units, a character outside the BMP is stored as a pair of 16-bit code units (a high surrogate followed by a low surrogate), occupying 32 bits in total.
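To make the difference between code units and code points concrete, here is a minimal sketch. The SurrogateDemo class and its CountCodePoints helper are hypothetical names introduced for illustration; char.IsSurrogatePair is the standard .NET method.

```csharp
using System;

static class SurrogateDemo
{
    // Hypothetical helper (not from the original post): counts Unicode
    // code points rather than UTF-16 code units.
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            // A high surrogate followed by a low surrogate forms one code point,
            // so skip the low surrogate after counting the pair once.
            if (char.IsSurrogatePair(text, i))
            {
                i++;
            }
            count++;
        }
        return count;
    }
}
```

For the string "A𠈓C", the Length property reports 4 (UTF-16 code units), while SurrogateDemo.CountCodePoints returns 3, because the middle character occupies two code units.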
In practice, it’s quite rare to encounter Unicode characters outside of the BMP, given that this plane can represent characters from most living languages.
To reverse a string that contains surrogate pairs, you can use the Microsoft.VisualBasic.Strings.StrReverse method.
// 8-byte string (4 UTF-16 code units) that includes a surrogate pair
string s = "A𠈓C";

// Won't handle the surrogate pair: reverses the 16-bit code units one by one (needs using System.Linq)
string s2 = new string(s.Reverse().ToArray());

// Will handle the surrogate pair
string s3 = Microsoft.VisualBasic.Strings.StrReverse(s);
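If you would rather not depend on Microsoft.VisualBasic, one possible alternative (not the approach used in this post) is to walk the string by text element using System.Globalization.StringInfo and reverse those elements. The sketch below assumes that approach; TextElementReverser and ReverseTextElements are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TextElementReverser
{
    // Hypothetical helper (not from the original post): reverses a string
    // one text element at a time, so a surrogate pair stays together.
    public static string ReverseTextElements(string input)
    {
        var elements = new List<string>();

        // Each text element is returned as a string; a surrogate pair
        // comes back as a single two-char string.
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(input);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }

        elements.Reverse();               // in-place List<T>.Reverse
        return string.Concat(elements);   // join the reversed elements
    }
}
```

Calling TextElementReverser.ReverseTextElements("A𠈓C") returns "C𠈓A", with the surrogate pair intact. Because StringInfo works on text elements, this sketch also keeps a base character together with any combining marks that follow it.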