#1,002 – Specifying Character Encoding when Writing to a File

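In .NET, string data is stored in memory as Unicode data encoded as UTF-16. Each char is a 2-byte code unit, so most characters take 2 bytes, while characters outside the Basic Multilingual Plane take 4 bytes, stored as a surrogate pair of two chars.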
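To see this in-memory representation directly, here is a small sketch that prints the UTF-16 code units of the four strings used in the example below. It assumes C# 9 or later (it uses top-level statements). The last string is a single character outside the Basic Multilingual Plane, so its Length is 2 and it occupies 4 bytes.

```csharp
using System;
using System.Text;

// The same four strings used in the StreamWriter example.
string[] samples = { "A", "\u00e9", "\u0100", "\U00020213" };

foreach (string s in samples)
{
    // Length counts UTF-16 code units (chars); GetByteCount gives the UTF-16 size in bytes (no BOM).
    Console.Write($"Length {s.Length}, {Encoding.Unicode.GetByteCount(s)} bytes:");
    foreach (char c in s)
    {
        Console.Write($" U+{(int)c:X4}");
    }
    Console.WriteLine();
}

// The last string is one character stored as a high/low surrogate pair.
string cjk = samples[3];
bool isPair = char.IsSurrogatePair(cjk[0], cjk[1]);
int codePoint = char.ConvertToUtf32(cjk, 0);
Console.WriteLine($"Surrogate pair: {isPair}, code point U+{codePoint:X}");
```

The first three strings report one char and 2 bytes each; the CJK ideograph reports the two code units U+D840 and U+DE13, which char.ConvertToUtf32 combines back into the single code point U+20213.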

When you persist string data to a file, however, you must be aware of which encoding is being used. In the example below, we use a StreamWriter to write string data to a file. By default, StreamWriter uses UTF-8 encoding, without a byte order mark (BOM).

            // Requires: using System.IO;
            string s1 = "A";             // U+0041
            string s2 = "\u00e9";        // U+00E9 e with acute accent
            string s3 = "\u0100";        // U+0100 capital A with macron
            string s4 = "\U00020213";    // U+20213 CJK ideograph (surrogate pair D840 DE13)

            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt"))
            {
                // No Encoding argument, so the file is written as UTF-8 without a BOM
                sw.WriteLine(s1);
                sw.WriteLine(s2);
                sw.WriteLine(s3);
                sw.WriteLine(s4);
            }
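Because the default encoding does not appear anywhere in the code above, it can help to look at the bytes it actually produces. The sketch below is an illustration rather than part of the original example: it writes the same four strings through a StreamWriter with no Encoding argument, but into a MemoryStream instead of a file, and prints the bytes in hex. WriteWithDefaultEncoding is a hypothetical helper name, the code assumes C# 9 top-level statements, and NewLine is pinned to "\r\n" (the Windows default) so the output does not depend on the OS.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes each line with StreamWriter's default encoding and returns the raw bytes.
static byte[] WriteWithDefaultEncoding(params string[] lines)
{
    using var ms = new MemoryStream();
    using (var sw = new StreamWriter(ms))   // no Encoding argument: UTF-8 without a BOM
    {
        sw.NewLine = "\r\n";                 // fixed line ending so the bytes are the same on every OS
        foreach (string line in lines)
        {
            sw.WriteLine(line);
        }
    }                                        // disposing the writer flushes it and closes ms
    return ms.ToArray();                     // ToArray still works on a closed MemoryStream
}

byte[] bytes = WriteWithDefaultEncoding("A", "\u00e9", "\u0100", "\U00020213");
Console.WriteLine(BitConverter.ToString(bytes));
```

In UTF-8, "A" stays a single byte (41), é becomes C3 A9, Ā becomes C4 80, and the CJK ideograph becomes the four bytes F0 A0 88 93. Nothing precedes the first 41, because the default encoding writes no BOM.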


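We could also explicitly specify a UTF-16 encoding (Encoding.Unicode, which is little-endian UTF-16) when creating the StreamWriter object. Each character then takes at least 2 bytes, and the file begins with the byte order mark FF FE.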

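            // Requires using System.IO and System.Text; false means "don't append", so an existing file is overwritten
            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt", false, Encoding.Unicode))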
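Switching to Encoding.Unicode changes both the size and the start of the output. This sketch again assumes C# 9 top-level statements, uses a MemoryStream in place of the file, and relies on a hypothetical WriteWith helper. It prints the UTF-16 bytes and then reads them back with a StreamReader that is given no encoding, relying on its default byte order mark detection.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes each line with the given encoding and returns the raw bytes.
static byte[] WriteWith(Encoding encoding, params string[] lines)
{
    using var ms = new MemoryStream();
    using (var sw = new StreamWriter(ms, encoding))   // writes the encoding's preamble (BOM) first
    {
        sw.NewLine = "\r\n";                           // fixed line ending for OS-independent output
        foreach (string line in lines)
        {
            sw.WriteLine(line);
        }
    }
    return ms.ToArray();
}

string[] lines = { "A", "\u00e9", "\u0100", "\U00020213" };
byte[] utf16 = WriteWith(Encoding.Unicode, lines);
Console.WriteLine(BitConverter.ToString(utf16));

// StreamReader(Stream) looks for a BOM by default, so it switches to UTF-16 here.
using var reader = new StreamReader(new MemoryStream(utf16));
string roundTrip = reader.ReadToEnd();
string expected = string.Join("\r\n", lines) + "\r\n";
Console.WriteLine($"{reader.CurrentEncoding.WebName}, round trip ok: {roundTrip == expected}");
```

Each char now takes two bytes in little-endian order (A becomes 41 00), the surrogate pair is written as 40 D8 13 DE, and the stream begins with FF FE. That BOM is what lets the reader recover the encoding; a reader that assumed UTF-8 without checking would decode this data incorrectly.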



About Sean
Software developer in the Twin Cities area, passionate about .NET technologies. Equally passionate about my own personal projects related to family history and preservation of family stories and photos.
