#970 – Checking for Valid Characters in a String

You can use a regular expression to check a string to see if the string contains only characters within a specified set of characters.

For example, to check whether a string is limited to a single line containing only alphabetic characters (e.g. a..z, A..Z), you can use the regular expression shown below.

        private bool IsAlphabetic(string s)
        {
            Regex r = new Regex(@"^[a-zA-Z]+$");

            return r.IsMatch(s);
        }

The regular expression defines a pattern and the IsMatch method checks the string to see if it matches the pattern.  You can read the regular expression as follows:

  • ^ – start of the string
  • [a..zA..Z] – single character that is a lowercase or uppercase letter
  • + – repeat previous item one or more times (alphabetic character)
  • $ – end of the string

A string will therefore match if it is a single line of text containing at least one alphabetic character and is limited to alphabetic characters.

946-001

946-002

#773 – Reversing a String that Contains Unicode Characters Expressed as Surrogate Pairs

You can use the Reverse method to reverse the characters in a .NET-based string.  This method works if the string contains Unicode characters that can be expressed as 2-byte UTF16 code points.  This subset of Unicode is known as the Basic Multilingual Plane (BMP) and is able to represent 65,536 unique code points.

UTF16 can represent Unicode code points outside the BMP through the use of surrogate pairs.  Within a series of 16-bit characters, a 32-bit character can appear, stored as a pair of normal UTF16 words.

In practice, it’s quite rare to encounter Unicode characters outside of the BMP, given that this plane can represent characters from most living languages.

To reverse a string that contains surrogate pairs, you can use the Microsoft.VisualBasic.Strings.StrReverse method.

            // 8 byte string includes surrogate pair
            string s = "A𠈓C";

            // Won't handle surrogate pair
            string s2 = new string(s.Reverse().ToArray());

            // Will handle surrogate pair
            string s3 = Microsoft.VisualBasic.Strings.StrReverse(s);

773-001

#479 – Identical String Literals Are Stored in the Same Object

The C# compiler will store equivalent string literals in the same underlying System.String object.

In the example below, s1 and s2 refer to the same string literal, so they will share the same underlying System.String object.  s3 will get the same string value assigned to it at runtime, but will be stored in a different object.

            string s1 = "Popeye";    // s1 & s2 will refer to the same
            string s2 = "Popeye";    //   underlying object

            // Enter "Popeye" for s3.  It will be stored in a
            //   different object, though equivalent to s1 & s2.
            string s3 = Console.ReadLine();

            object o1 = s1;
            object o2 = s2;
            object o3 = s3;

            bool bEqual = (o1 == o2);    // True, same object
            bEqual = (o1 == o3);         // False, different objects

We don’t normally care about whether two equivalent strings are stored in the same or different objects, because the == operator for System.String always checks for string equivalence, rather than object identity.

#478 – Verbatim String Literals Can Span Multiple Lines

A verbatim string literal is one in which no escape sequences are processed.  The string contains exactly the characters entered.  Verbatim strings can also span multiple lines.

            string multilineGuy1 = @"This
is
a
multiline
string with embedded \t escape sequences";


In fact, you can only include a multiline string in your source code if it’s declared as a verbatim string.

            string multilineGuy1 = "Let me not to the marriage of true minds
Admit impediments. Love is not love";

            string multilineGuy1 = @"Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,";
            Console.WriteLine(multilineGuy1);

#422 – How ReferenceEquals Behaves When Comparing Strings

The Object.ReferenceEquals method uses reference equality semantics, returning true if two variables refer to the same object in memory.

When comparing strings, you typically want value equality semantics, so you would not use ReferenceEquals.  In the example below, the string values are the same, but the strings are stored in different memory locations, so ReferenceEquals returns false.

            string s1 = "Bouffant";
            StringBuilder sb2 = new StringBuilder("Bouffant");
            bool compare = ReferenceEquals(s1, sb2.ToString());  // false

Because of the way that the compiler stores strings, ReferenceEquals might return true for two equivalent string constants stored in two different variables.  In the example below, the compiler stores just one copy of the string “Galoshes”.

            string s1 = "Galoshes";
            string s2 = "Galoshes";
            bool compare = ReferenceEquals(s1, s2);  // true

Even though ReferenceEquals returns true in this case, you should not rely on this behavior, but use Equals or the == operator to compare two strings.

#392 – Reversing a String Using the Reverse Method

Starting with .NET 3.5, you can use a number of extension methods that are part of the System.Linq namespace to act upon any type that implements IEnumerable<T>.  This includes the string type in C# (System.String), which implements IEnumerable<char>.

One of the most useful extension methods that comes with Linq is the Reverse method (Enumerable.Reverse<TSource>).  Because this is an extension method, you can use it on an instance of a string as if it were an instance method.

            string funnyMan = "Roscoe Arbuckle";

            string backwardsGuy = new string(funnyMan.Reverse().ToArray());

Because Reverse returns an object of type ReverseIterator<char>, we need to convert it back to an array and then use the array to instantiate a new string.

#106 – Using String.Split to Parse A String into Substrings

If you have a longer string consisting of substrings separated by a common delimiter, you can break the string into substrings using the Split method.

 string names = "John,Mary,Elvis,Ringo";
 string[] nameList = names.Split(new char[] {','});

 Console.WriteLine(nameList[0]);    // John
 Console.WriteLine(nameList[1]);    // Mary
 Console.WriteLine(nameList[2]);    // Elvis
 Console.WriteLine(nameList[3]);    // Ringo

Notice that because Split takes an array of characters, you can specify more than one character to use as the delimiter.

You can also split based on delimiters that are strings, rather than single characters.  (We also have to add a parameter indicating whether we want the function to return empty strings).

 string names = "John - Mary - Elvis - Ringo";

 // Same result as before - we get four names, without spaces or dash
 string[] nameList = names.Split(new string[] { " - " }, StringSplitOptions.RemoveEmptyEntries);

#105 – Chaining String Functions Together

Because many functions that operate on strings return a modified string, you can then pass that resulting string into another function that will operate on the new string.  In this way, you can chain together several string operations on the same line of code.

 char[] braces = new char[] {'{', '}'};
 string s = "{This|That|Such}";
 s = s.Replace("|", " and ").Trim(braces).Insert(0, "=> ").ToLower();
 Console.WriteLine(s);       // => this and that and such

#104 – Functions to Trim Leading and Trailing Characters from A String

You can use the Trim function to trim leading and trailing whitespace characters from a string.

 string s = "  The core phrase ";  // 2 leading spaces, 1 trailing
 s = s.Trim();     // s = "The core phrase"

Once again, the function does not change the original string, so you have to assign the result to some string.

You can also give Trim a list of characters that you want trimmed:

 string s = "  {The core phrase,} ";
 s = s.Trim(new char[] {' ','{',',','}'});     // s = "The core phrase"
 s = " {Doesn't {trim} internal stuff }";
 s = s.Trim(new char[] {' ', '{', '}'});      // s = "Doesn't {trim} internal stuff"

Finally, you can trim stuff from the start or end of the string with TrimStart and TrimEnd.

 string s = "{Name}";
 char[] braces = new char[] {'{', '}'};
 string s2 = s.TrimStart(braces);    // s2 = "Name}"
 s2 = s.TrimEnd(braces);             // s2 = "{Name"

#103 – Inserting and Removing Substrings

You can use the String.Insert method to insert a substring into the middle of an existing string.

 string s = "John Adams";
 int n = s.IndexOf("Adams");
 s = s.Insert(n, "Quincy ");    // s now "John Quincy Adams"

Remember that a string is immutable.  Invoking Insert on a string but not assigning the result to anything will have no effect on the string.

 string s = "John Adams";
 s.Insert(5, "Quincy ");    // Allowed, but s is not changed

You can use the String.Remove method to remove a specified number of characters from a string, starting at specified 0-based index.  (Again, you must assign the resulting string to something).

 string s = "OHOLEne";
 s = s.Remove(1, 4);      // Start at position 1, remove 4 characters
 Console.WriteLine(s);    // "One"