#1,005 – Replacing a Substring with a New Substring

You can use the string.Replace method to replace every occurrence of a substring within a longer string with a new substring.  Replace is an instance method.  Because strings are immutable, it leaves the original string unchanged and returns a new string.

In the example below, we replace every occurrence of “race” with “class”, returning the new string.

            string quote =
@"There's a race of men that don't fit in,
A race that can't sit still;";

            string newQuote = quote.Replace("race", "class");

            Console.WriteLine(string.Format("ORIGINAL:\n{0}", quote));
            Console.WriteLine(string.Format("NEW:\n{0}", newQuote));

You can use Replace to remove instances of a particular substring (replacing them with an empty string).

            string quote = "Four awesome score and seven awesome years ago";

            string newQuote = quote.Replace("awesome ", "");

            Console.WriteLine(string.Format("ORIGINAL:\n{0}", quote));
            Console.WriteLine(string.Format("NEW:\n{0}", newQuote));
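Replace(string, string) performs an ordinal, case-sensitive search, so “Race” is not matched by “race”.  The block below is a minimal sketch of one way to ignore case, assuming a regular-expression approach is acceptable.  The ReplaceIgnoreCase helper is a hypothetical name used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Hypothetical helper: replaces every occurrence of oldValue, ignoring case.
    // Regex.Escape keeps the search text literal; doubling '$' keeps the
    // replacement text literal as well.
    public static string ReplaceIgnoreCase(string input, string oldValue, string newValue)
    {
        return Regex.Replace(
            input,
            Regex.Escape(oldValue),
            newValue.Replace("$", "$$"),
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string quote = "A Race of men, a race that can't sit still";

        // Case-sensitive: only the lowercase "race" is replaced
        Console.WriteLine(quote.Replace("race", "class"));

        // Case-insensitive: both occurrences are replaced
        Console.WriteLine(ReplaceIgnoreCase(quote, "race", "class"));
    }
}
```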

#1,004 – Converting a String to Uppercase or Lowercase

You can use the ToLower and ToUpper methods of the string type to convert a particular string to all lowercase or all uppercase.  In either case, a new string is returned.

            string peye = "Guy Noir";

            string peyeLower = peye.ToLower();
            string peyeUpper = peye.ToUpper();

            Console.WriteLine(peye);
            Console.WriteLine(peyeLower);
            Console.WriteLine(peyeUpper);
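ToLower and ToUpper apply the casing rules of the current culture, and some cultures (Turkish is the usual example) treat the letter i differently.  When the result should not depend on the user's culture, for instance when normalizing identifiers or keys, the invariant variants may be a better fit.  A short sketch:

```csharp
using System;

public static class CaseDemo
{
    public static void Main()
    {
        string peye = "Guy Noir";

        // Culture-independent conversions; each returns a new string
        Console.WriteLine(peye.ToLowerInvariant());   // guy noir
        Console.WriteLine(peye.ToUpperInvariant());   // GUY NOIR

        // The original string is unchanged
        Console.WriteLine(peye);                      // Guy Noir
    }
}
```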

#1,003 – Accessing Underlying Bytes in a String for Different Encodings

Strings in .NET are stored in memory as Unicode character data, using the UTF-16 encoding (2 bytes per character, or 4 bytes for characters encoded as surrogate pairs).

If you want to get at the underlying data for a string, you can use one of the methods listed below, choosing the encoding to use when converting the Unicode data to a byte array.  If you use Encoding.Unicode (UTF-16, little-endian byte order), you’ll get the data as it is stored in memory for the String type.

  • System.Text.Encoding.Unicode.GetBytes – UTF-16
  • System.Text.Encoding.UTF8.GetBytes – UTF-8

In the example below, notice the different byte sequences used to encode the CJK character.

            string ideograph = "𠈓";
            byte[] utf16 = Encoding.Unicode.GetBytes(ideograph);
            byte[] utf8 = Encoding.UTF8.GetBytes(ideograph);
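To actually see the two byte sequences, the arrays can be printed as hexadecimal.  The sketch below uses BitConverter.ToString for that; the ToHex wrapper is a hypothetical convenience name, and no particular byte values are assumed here.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: formats bytes as hyphen-separated hex pairs
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        string ideograph = "𠈓";

        byte[] utf16 = Encoding.Unicode.GetBytes(ideograph);
        byte[] utf8 = Encoding.UTF8.GetBytes(ideograph);

        Console.WriteLine("UTF-16 ({0} bytes): {1}", utf16.Length, ToHex(utf16));
        Console.WriteLine("UTF-8  ({0} bytes): {1}", utf8.Length, ToHex(utf8));

        // Decoding with the matching encoding recovers the original string
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == ideograph);
        Console.WriteLine(Encoding.Unicode.GetString(utf16) == ideograph);
    }
}
```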

#970 – Checking for Valid Characters in a String

You can use a regular expression to check whether a string contains only characters from a specified set.

For example, to check whether a string is limited to a single line containing only alphabetic characters (a-z, A-Z), you can use the regular expression shown below.

        private bool IsAlphabetic(string s)
        {
            Regex r = new Regex(@"^[a-zA-Z]+$");

            return r.IsMatch(s);
        }

The regular expression defines a pattern and the IsMatch method checks the string to see if it matches the pattern.  You can read the regular expression as follows:

  • ^ – start of the string
  • [a-zA-Z] – single character that is a lowercase or uppercase letter
  • + – repeat previous item (an alphabetic character) one or more times
  • $ – end of the string (or just before a newline at the very end of the string)

A string will therefore match if it is a single line of text containing at least one alphabetic character and is limited to alphabetic characters.
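The sketch below exercises the pattern against a few inputs.  It also shows one edge case: in .NET, $ matches just before a final newline, so a string ending in "\n" still passes.  IsAlphabeticStrict is a hypothetical variant that uses \z, which matches only at the very end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphabeticDemo
{
    public static bool IsAlphabetic(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z]+$");
    }

    // Hypothetical stricter variant: \z does not allow a trailing newline
    public static bool IsAlphabeticStrict(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z]+\z");
    }

    public static void Main()
    {
        Console.WriteLine(IsAlphabetic("Popeye"));           // True
        Console.WriteLine(IsAlphabetic("Guy Noir"));         // False (contains a space)
        Console.WriteLine(IsAlphabetic("Popeye2"));          // False (contains a digit)
        Console.WriteLine(IsAlphabetic(""));                 // False (needs at least one letter)
        Console.WriteLine(IsAlphabetic("Popeye\n"));         // True  ($ matches before the final newline)
        Console.WriteLine(IsAlphabeticStrict("Popeye\n"));   // False
    }
}
```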

#773 – Reversing a String that Contains Unicode Characters Expressed as Surrogate Pairs

You can use the LINQ Reverse extension method to reverse the characters in a .NET-based string.  This method works if every character in the string can be expressed as a single 2-byte UTF-16 code unit.  This subset of Unicode is known as the Basic Multilingual Plane (BMP) and is able to represent 65,536 unique code points.

UTF-16 can represent Unicode code points outside the BMP through the use of surrogate pairs.  Within a series of 16-bit code units, a code point outside the BMP is stored as a pair of 16-bit code units (32 bits in total).

The BMP can represent characters from most living languages, so many strings never leave it, but characters outside the BMP do occur (for example, emoji and some CJK ideographs).

To reverse a string that contains surrogate pairs, you can use the Microsoft.VisualBasic.Strings.StrReverse method, which requires a reference to the Microsoft.VisualBasic assembly.

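            // 8-byte string (4 UTF-16 code units); the middle
            //   character is stored as a surrogate pair
            string s = "A𠈓C";

            // Won't handle surrogate pair: reverses individual chars,
            //   so the two halves of the pair end up swapped
            //   (requires using System.Linq)
            string s2 = new string(s.Reverse().ToArray());

            // Will handle surrogate pair
            string s3 = Microsoft.VisualBasic.Strings.StrReverse(s);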
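If taking a dependency on Microsoft.VisualBasic is not desirable, a similar result can be sketched with System.Globalization.StringInfo, which reports where each text element (including a surrogate pair) starts.  ReverseTextElements is a hypothetical helper name, and this is one possible approach rather than the method StrReverse itself uses.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class ReverseDemo
{
    // Hypothetical helper: reverses a string one text element at a time,
    // so the two chars of a surrogate pair stay together and in order.
    public static string ReverseTextElements(string s)
    {
        // Start index of each text element within s
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        var sb = new StringBuilder(s.Length);

        for (int i = starts.Length - 1; i >= 0; i--)
        {
            int end = (i + 1 < starts.Length) ? starts[i + 1] : s.Length;
            sb.Append(s, starts[i], end - starts[i]);
        }

        return sb.ToString();
    }

    public static void Main()
    {
        string s = "A𠈓C";
        string reversed = ReverseTextElements(s);

        Console.WriteLine(reversed);

        // The surrogate pair is still intact at index 1 of the result
        Console.WriteLine(char.IsSurrogatePair(reversed, 1));   // True
    }
}
```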

#479 – Identical String Literals Are Stored in the Same Object

The C# compiler will store equivalent string literals in the same underlying System.String object, a technique known as string interning.

In the example below, s1 and s2 refer to the same string literal, so they will share the same underlying System.String object.  s3 will get the same string value assigned to it at runtime, but will be stored in a different object.

            string s1 = "Popeye";    // s1 & s2 will refer to the same
            string s2 = "Popeye";    //   underlying object

            // Enter "Popeye" for s3.  It will be stored in a
            //   different object, though equivalent to s1 & s2.
            string s3 = Console.ReadLine();

            object o1 = s1;
            object o2 = s2;
            object o3 = s3;

            bool bEqual = (o1 == o2);    // True, same object
            bEqual = (o1 == o3);         // False, different objects
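The same behavior can be reproduced without console input by building the second string at runtime.  The sketch below also uses string.Intern, which returns the shared (interned) instance for a given value, to show how a runtime string can be mapped back to the object used by the literal.

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        string s1 = "Popeye";

        // Built at runtime, so it is a separate object from the literal
        string s3 = new string(new[] { 'P', 'o', 'p', 'e', 'y', 'e' });

        Console.WriteLine(s1 == s3);                         // True  (same value)
        Console.WriteLine(object.ReferenceEquals(s1, s3));   // False (different objects)

        // string.Intern returns the interned instance with the same value
        string s4 = string.Intern(s3);
        Console.WriteLine(object.ReferenceEquals(s1, s4));   // True  (same object)
    }
}
```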

We don’t normally care about whether two equivalent strings are stored in the same or different objects, because the == operator for System.String always checks for string equivalence, rather than object identity.

#478 – Verbatim String Literals Can Span Multiple Lines

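A verbatim string literal, prefixed with @, is one in which no escape sequences are processed.  The string contains exactly the characters entered, with one exception: a doubled quotation mark ("") represents a single quotation mark.  Verbatim strings can also span multiple lines.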

            string multilineGuy1 = @"This
is
a
multiline
string with embedded \t escape sequences";


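In fact, a regular (quoted) string literal cannot span multiple lines in your source code.  A multiline literal must be declared as a verbatim string (or, in C# 11 and later, as a raw string literal).

            // Compile-time error: a regular string literal
            //   cannot contain a line break
            string multilineGuy2 = "Let me not to the marriage of true minds
Admit impediments. Love is not love";

            // OK: verbatim string literal
            string multilineGuy3 = @"Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,";
            Console.WriteLine(multilineGuy3);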
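The difference in escape handling between regular and verbatim literals can be checked directly.  A small sketch, with variable names chosen only for illustration:

```csharp
using System;

public static class VerbatimDemo
{
    public static void Main()
    {
        string regular = "a\tb";     // \t is processed as a single tab character
        string verbatim = @"a\tb";   // backslash and 't' are kept as two characters

        Console.WriteLine(regular.Length);    // 3
        Console.WriteLine(verbatim.Length);   // 4

        // Inside a verbatim literal, "" stands for one quotation mark
        string quoted = @"He said ""hi""";
        Console.WriteLine(quoted);            // He said "hi"

        // A line break typed inside a verbatim literal becomes part of the string
        string multi = @"one
two";
        Console.WriteLine(multi.Contains("\n"));   // True
    }
}
```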