#100 – Using IndexOf to Search for Characters Within A String

You can search for the following within a string:

  • A single character
  • One of a set of characters
  • A substring

You can search for a single character in a string using the String.IndexOf method, which returns the 0-based index of the first match, or -1 if the character is not found.

 string s = "Thomas Paine";
 int n = s.IndexOf('a');     // 4 (1st 'a')

You can also specify a starting position for the search.  So we can find the next occurrence of ‘a’ this way:

 n = s.IndexOf('a', n+1);    // 8 (start search after 1st 'a')

You can search for the first occurrence of one of a set of characters using the IndexOfAny method.

 string s = "Thomas Paine";
 char[] vowels = new char[] {'a','e','i','o','u'};
 int n = s.IndexOfAny(vowels);     // 2
 n = s.IndexOfAny(vowels, n + 1);  // 4

You can also use IndexOf to search for substrings within a string.

 string s = "A man, a plan, a canal";
 int n = s.IndexOf("an");       // 3
 n = s.IndexOf("an", n + 1);    // 11
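Because IndexOf returns -1 when there are no further matches, you can combine it with the starting-position overload to walk through every occurrence. The sketch below shows one way to do that; AllIndexesOf is a hypothetical helper name, not part of the String class.

```csharp
using System;
using System.Collections.Generic;

static class IndexOfDemo
{
    // Hypothetical helper: returns the index of every occurrence of 'target'
    // in 's', resuming the search just after each match.
    public static List<int> AllIndexesOf(string s, char target)
    {
        var result = new List<int>();
        int n = s.IndexOf(target);            // -1 if not found
        while (n != -1)
        {
            result.Add(n);
            n = s.IndexOf(target, n + 1);     // start searching after the previous match
        }
        return result;
    }

    static void Main()
    {
        string s = "Thomas Paine";
        Console.WriteLine(string.Join(", ", AllIndexesOf(s, 'a')));  // 4, 8
        Console.WriteLine(s.IndexOf('z'));                            // -1
    }
}
```

Passing n + 1 is safe even when the last match is the final character, since IndexOf accepts a starting index equal to the string's length and simply returns -1.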

#99 – Use StringInfo to Get Specific Characters From A UTF32 String

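We saw that you cannot use the normal string indexer [] to get individual characters from a string containing UTF32 characters (characters outside the Basic Multilingual Plane, which are stored in the string as surrogate pairs of two chars).  Instead, you need to use the System.Globalization.StringInfo class.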

In the example below, we first get the starting index of each of the three characters (text elements) in our string.  We then extract each character separately.

 string s = "A𠈓C";
 int n = s.Length;     // 4: the middle character is a surrogate pair (2 chars)

 // Get locations of text elements
 int[] indexes = StringInfo.ParseCombiningCharacters(s);  // 0, 1 and 3

 // Retrieve single element
 string nextChar = StringInfo.GetNextTextElement(s, 0);   // A
 nextChar = StringInfo.GetNextTextElement(s, 1);          // 𠈓
 nextChar = StringInfo.GetNextTextElement(s, 3);          // C
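If you just want to walk through every character in order, StringInfo.GetTextElementEnumerator saves you from tracking indexes yourself, and StringInfo.LengthInTextElements gives the number of characters rather than the number of chars. A minimal sketch, assuming the same "A𠈓C" string; TextElements is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TextElementDemo
{
    // Hypothetical helper: splits a string into text elements, so that a
    // surrogate pair such as 𠈓 comes back as one element, not two chars.
    public static List<string> TextElements(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        return elements;
    }

    static void Main()
    {
        string s = "A𠈓C";
        Console.WriteLine(s.Length);                                 // 4 (chars)
        Console.WriteLine(new StringInfo(s).LengthInTextElements);   // 3 (characters)
        foreach (string element in TextElements(s))
        {
            Console.WriteLine(element);                              // A, then 𠈓, then C
        }
    }
}
```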

#98 – Using an Indexer to Get a Specific Character In a String

In C#, you can use the indexer operator [ ] to get a specified character in a string.

The indexer takes a zero-based integer as an index into the string.  0 returns the first character and n-1 (where n is the length of the string) returns the last character.

 string s = "ABCDE";
 char c = s[0];   // A
 c = s[2];        // C (3rd char)
 c = s[4];        // E

Using a negative index, or an index greater than or equal to the length of the string, results in an IndexOutOfRangeException being thrown.

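Note that the indexer returns a single char, which is one 16-bit (2-byte) UTF-16 code unit.  It can therefore retrieve any character in the Basic Multilingual Plane, such as €, but not a UTF32 character outside it, which is stored in the string as a surrogate pair of two chars (4 bytes in total).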

 string s = "A€C";
 char c = s[1];         // Works: €

 s = "A𠈓C";
 c = s[1];       // Doesn't work: returns only the first half (high surrogate) of 𠈓
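To detect this case, you can check whether the char at an index is a high surrogate and, if so, combine it with the following char using char.ConvertToUtf32. The sketch below uses a hypothetical helper named CodePointAt; it returns an int code point, since a character outside the Basic Multilingual Plane does not fit in a single char.

```csharp
using System;

static class CodePointDemo
{
    // Hypothetical helper: returns the Unicode code point that starts at 'index'.
    // If s[index] is the high half of a surrogate pair, both halves are combined.
    public static int CodePointAt(string s, int index)
    {
        if (char.IsHighSurrogate(s[index]))
        {
            return char.ConvertToUtf32(s, index);   // throws if the pair is malformed
        }
        return s[index];                            // BMP character: one char is enough
    }

    static void Main()
    {
        string s = "A𠈓C";
        Console.WriteLine(char.IsHighSurrogate(s[1]));               // True: only half a character
        Console.WriteLine(CodePointAt(s, 1).ToString("X"));          // code point of 𠈓, in hex
        Console.WriteLine(char.ConvertFromUtf32(CodePointAt(s, 1))); // 𠈓 (font permitting)
    }
}
```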

#97 – String Comparisons Using Other Cultures

Each language has different rules for sorting strings, based on the alphabetical ordering of the characters used in that language.  In .NET, information about a language/region combination is captured in an instance of the CultureInfo class.  In turn, the CultureInfo object has a CompareInfo property that refers to an instance of the CompareInfo class, which defines the rules for comparing and sorting strings in that culture.

By default, the String.Compare method uses the sorting rules of the current culture (CultureInfo.CurrentCulture).  You can override this by calling an overload of Compare that takes a CultureInfo parameter.

To get a CultureInfo for a specific culture, create one from a culture name.  The name indicates both a language and a region; for example, "nn-NO" is Norwegian (Nynorsk) as used in Norway.

 int n;
 n = string.Compare("åb", "bb", true);  // -1 (å < b when the current culture is English)
 n = string.Compare("åb", "bb", true, new CultureInfo("nn-NO"));    // 1  (å > b in Norwegian)
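The same culture-specific rules apply when sorting a whole list. StringComparer.Create wraps a culture's comparison rules in a comparer that List<T>.Sort can use, and CompareInfo.Compare exposes them directly with explicit CompareOptions. A sketch, assuming the "en-US" and "nn-NO" cultures are available on your machine; SortForCulture is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CultureSortDemo
{
    // Hypothetical helper: returns a sorted copy of 'words', ordered by the
    // case-insensitive comparison rules of 'culture'.
    public static List<string> SortForCulture(IEnumerable<string> words, CultureInfo culture)
    {
        var sorted = new List<string>(words);
        sorted.Sort(StringComparer.Create(culture, ignoreCase: true));
        return sorted;
    }

    static void Main()
    {
        string[] words = { "zz", "bb", "åb" };

        // English treats å as a variant of a, so "åb" should sort first.
        Console.WriteLine(string.Join(" ", SortForCulture(words, new CultureInfo("en-US"))));

        // Norwegian places å after z, so "åb" should sort last.
        Console.WriteLine(string.Join(" ", SortForCulture(words, new CultureInfo("nn-NO"))));

        // CompareInfo exposes the same rules directly, with explicit options.
        CompareInfo norwegian = new CultureInfo("nn-NO").CompareInfo;
        Console.WriteLine(norwegian.Compare("åb", "bb", CompareOptions.IgnoreCase) > 0);
    }
}
```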

#96 – Comparing String Values

The < and > operators are not overloaded for the System.String type, which means that you can’t compare strings using the relational operators.

Instead, you can use the static System.String.Compare method, which takes two strings and returns an integer value.  If the first string sorts before the second, a negative number is returned.  If the first string sorts after the second, a positive number is returned.  If the strings are equal, the return value is zero.

 int n = string.Compare("Sean", "Steinbeck");    // -1
 n = string.Compare("Sean", "Bozo");             // 1
 n = string.Compare("Sean", "Sean");             // 0
 n = string.Compare("Sean", "sean");             // 1 (culture rules sort "s" before "S")

You can also use the CompareTo instance method of the string type:

 int n = "Sean".CompareTo("Giotto");   // 1 (Sean > Giotto)

You can also ignore case during the comparison:

 n = string.Compare("Sean", "sean", true);         // 0
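Since you can't write a < b for strings, one option is a small helper that wraps String.Compare. The sketch below (LessThan and LessThanOrdinal are hypothetical names) also shows an ordinal comparison, which compares the numeric char values and ignores culture rules, so 'S' (83) sorts before 's' (115).

```csharp
using System;

static class CompareDemo
{
    // Hypothetical helper: string has no < operator, so wrap String.Compare.
    // This uses the current culture's rules, like string.Compare(a, b).
    public static bool LessThan(string a, string b) => string.Compare(a, b) < 0;

    // Hypothetical helper: ordinal comparison of the numeric char values,
    // so 'S' (83) sorts before 's' (115) regardless of culture.
    public static bool LessThanOrdinal(string a, string b) =>
        string.Compare(a, b, StringComparison.Ordinal) < 0;

    static void Main()
    {
        Console.WriteLine(LessThan("Sean", "Steinbeck"));      // True
        Console.WriteLine(LessThan("Sean", "Bozo"));           // False
        Console.WriteLine(LessThanOrdinal("Sean", "sean"));    // True
        Console.WriteLine(string.Compare("Sean", "sean", StringComparison.OrdinalIgnoreCase)); // 0
    }
}
```

Ordinal comparison is a reasonable choice for identifiers, keys and file formats, where you want the same answer on every machine; culture-sensitive comparison suits text shown to users.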

#95 – ToString() Called Automatically When Doing String Concatenation

When you concatenate strings, whether with the + operator or with the String.Concat or String.Format methods, you can include objects that are not strings.  .NET converts each of these objects to a string by calling its ToString method before doing the concatenation.

Here’s an example:

 string s1 = "Ten: " + 10;   // Ten: 10

This is equivalent to:

 int n = 10;
 string s1 = "Ten: " + n.ToString();

This works for any object:

 DateTime dt = DateTime.Now;
 string s2 = "Date and time: " + dt;

This causes the DateTime.ToString method to be called.  The exact format depends on the current culture; with US English settings, the result looks like:

Date and time: 9/16/2010 4:01:44 PM
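Because concatenation calls ToString, you control how your own types appear in a concatenated string by overriding ToString. A sketch using a hypothetical Point class; without the override, the type name would be used instead. Note also that a null reference is treated as an empty string.

```csharp
using System;

// Hypothetical type, used only to show that concatenation calls ToString.
class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    // Without this override, concatenation would produce the type name ("Point").
    public override string ToString() => $"({X}, {Y})";
}

static class ConcatDemo
{
    static void Main()
    {
        var p = new Point(3, 4);
        Console.WriteLine("Point: " + p);                   // Point: (3, 4)
        Console.WriteLine(string.Concat("Point: ", p));     // Point: (3, 4)
        Console.WriteLine(string.Format("Point: {0}", p));  // Point: (3, 4)

        Point missing = null;
        Console.WriteLine("Point: " + missing + "!");       // Point: ! (null becomes "")
    }
}
```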

#71 – StringBuilder Capacity

You can use a StringBuilder object without worrying about how much memory it has allocated internally for the strings that it stores.  The StringBuilder class will automatically allocate enough memory to store the strings that it is working with.

The Capacity property indicates the maximum number of characters that can be stored in the memory currently allocated by a StringBuilder object.  If an operation results in a string requiring more memory, additional memory is allocated automatically and Capacity increases.

The Length property indicates the length of the string stored in the StringBuilder object.

By default, Capacity starts out at 16.  When more characters are required, Capacity grows; in the example below, it doubles each time it is exceeded.

 StringBuilder sb1 = new StringBuilder();    // Len=0, Cap=16
 sb1.Append("1234567890123456");             // Len=16, Cap=16
 sb1.Append("z");                            // Len=17, Cap=32
 sb1.Append("1234567890123456");             // Len=33, Cap=64

You can also explicitly specify capacity when you instantiate a StringBuilder:

 StringBuilder sb2 = new StringBuilder(100);
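You can watch Capacity grow by printing it after each append. In the sketch below (Describe is a hypothetical helper), EnsureCapacity reserves space up front when you know roughly how large the final string will be. The growth policy is an implementation detail, so the sketch prints the values after growth rather than assuming them.

```csharp
using System;
using System.Text;

static class CapacityDemo
{
    // Hypothetical helper: formats Length and Capacity like the comments above.
    public static string Describe(StringBuilder sb) => $"Len={sb.Length}, Cap={sb.Capacity}";

    static void Main()
    {
        var sb = new StringBuilder();
        Console.WriteLine(Describe(sb));          // starts at the default capacity
        sb.Append("1234567890123456");
        Console.WriteLine(Describe(sb));
        sb.Append("z");                           // needs more room than the initial capacity
        Console.WriteLine(Describe(sb));

        // Reserve space up front when the final size is roughly known,
        // so the builder does not have to grow repeatedly.
        var big = new StringBuilder(100);
        Console.WriteLine(Describe(big));         // Len=0, Cap=100
        big.EnsureCapacity(500);                  // Capacity is now at least 500
        Console.WriteLine(big.Capacity >= 500);   // True
    }
}
```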

#70 – The StringBuilder Class

For more efficient string manipulation, you can use the StringBuilder class, which has methods that allow you to modify its internal character data without allocating a new string for each operation.

A StringBuilder instance holds a mutable sequence of Unicode characters and allows you to modify that sequence in different ways.

StringBuilder can be found in the System.Text namespace.

Constructing a StringBuilder:

 StringBuilder sb1 = new StringBuilder();    // Empty string
 StringBuilder sb2 = new StringBuilder("Sean");

Modifying internal string:

 sb2.Append(" was here");
 sb2.AppendFormat(" on {0:d}", DateTime.Today);
 sb2.Replace("Sean", "Kilroy");
 sb2.Insert(0, "Mr. ");          // Insert at start of string

Other things that you can do with a StringBuilder object:

 char third = sb2[2];            // 3rd character
 string s = sb2.ToString();      // Convert to string
 int len = sb2.Length;           // # chars
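StringBuilder pays off most when you build a string up piece by piece in a loop. A sketch, with JoinNumbers as a hypothetical helper; it also shows Remove, and that a StringBuilder's indexer, unlike a string's, can be assigned to.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Hypothetical helper: builds "1, 2, 3, ..." by appending to one
    // StringBuilder rather than creating a new string on every iteration.
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 1; i <= count; i++)
        {
            if (i > 1)
            {
                sb.Append(", ");
            }
            sb.Append(i);                      // Append(int) formats the number
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinNumbers(5));     // 1, 2, 3, 4, 5

        var sb = new StringBuilder("Kilroy was here");
        sb.Remove(0, 7);                       // delete "Kilroy " (7 chars)
        sb[0] = 'W';                           // the indexer is writable, unlike string's
        Console.WriteLine(sb);                 // Was here
    }
}
```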

#69 – Strings Are Immutable

In C#, strings are immutable which means that they cannot be changed after they are created.  More generally, this is the case for the System.String class in .NET.

Syntactically, however, it appears that you can change the contents of a string (e.g. replace some of its characters):

 string s1 = "AGORA";
 s1 = s1.Replace('A', 'Z');   // Replace A's with Z's

But in this case, the original string is not modified.  Replace allocates a new string containing the result, and the s1 variable is set to refer to the new string.  The original string is left unchanged.

In practice, it doesn’t matter much to the programmer that C# strings are immutable, since you can “change” them syntactically, as shown above.  Immutability matters mainly for performance: each repeated operation on a string (e.g. appending in a loop) allocates a new string.
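You can see that Replace leaves the original string alone by keeping a second reference to it. A sketch, with ReplaceKeepingOriginal as a hypothetical helper that returns both strings as a tuple:

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: performs a Replace while keeping a second reference
    // to the string that existed before the call.
    public static (string Result, string Original) ReplaceKeepingOriginal(
        string s, char oldChar, char newChar)
    {
        string original = s;                // second reference to the same string object
        s = s.Replace(oldChar, newChar);    // Replace returns a NEW string; s now refers to it
        return (s, original);               // original still refers to the unchanged string
    }

    static void Main()
    {
        var (result, original) = ReplaceKeepingOriginal("AGORA", 'A', 'Z');
        Console.WriteLine(result);                                    // ZGORZ
        Console.WriteLine(original);                                  // AGORA
        Console.WriteLine(object.ReferenceEquals(result, original));  // False
    }
}
```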

#68 – String Equality

The default behavior for reference type equality dictates that two variables are equal only if they refer to the same object.  However, the System.String class (string) changes this: it overrides the Equals method and overloads the == operator, so that two string variables are equal if the strings they refer to are equal (i.e. they have the same value).

 string s1 = "Popeye";
 string s2 = Console.ReadLine();   // Enter "Popeye" here

 bool b = (s1 == s2);    // True because contents of strings are equal

So, although s1 and s2 point to two different string objects in memory, the equality operator returns true because the values of the two strings are equal.
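To confirm that s1 and s2 really are different objects, you can compare references explicitly with object.ReferenceEquals. The sketch below builds the second string with new string(char[]) instead of reading it from the console, so it runs without user input; MakeCopy is a hypothetical helper name.

```csharp
using System;

static class EqualityDemo
{
    // Hypothetical helper: builds an equal string at run time as a separate
    // object (new string(char[]) allocates a new instance for non-empty input).
    public static string MakeCopy(string s) => new string(s.ToCharArray());

    static void Main()
    {
        string s1 = "Popeye";
        string s2 = MakeCopy(s1);

        Console.WriteLine(s1 == s2);                         // True: string's == compares values
        Console.WriteLine(s1.Equals(s2));                    // True: Equals is overridden too
        Console.WriteLine(object.ReferenceEquals(s1, s2));   // False: two different objects
        Console.WriteLine((object)s1 == (object)s2);         // False: object's == compares references
    }
}
```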
