#1,002 – Specifying Character Encoding when Writing to a File

In .NET, string data is stored in memory as Unicode data encoded as UTF-16 (2 bytes per character in the Basic Multilingual Plane, or 4 bytes for characters that require a surrogate pair).
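
As a quick illustration of the in-memory representation, here is a minimal sketch (the class name Utf16InMemoryDemo and the helper CodeUnits are made up for this example).  It relies on the fact that string.Length counts UTF-16 code units, not characters, so a character stored as a surrogate pair has a length of 2.

```csharp
using System;

public static class Utf16InMemoryDemo
{
    // Hypothetical helper: number of UTF-16 code units (2 bytes each) in s.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    public static void Main()
    {
        string bmp = "\u00e9";           // U+00E9, one code unit
        string astral = "\U00020213";    // U+20213, stored as a surrogate pair

        Console.WriteLine(CodeUnits(bmp));                      // 1
        Console.WriteLine(CodeUnits(astral));                   // 2
        Console.WriteLine(char.IsSurrogatePair(astral, 0));     // True

        // Recombine the pair into the original code point and print it as hex.
        Console.WriteLine(char.ConvertToUtf32(astral, 0).ToString("X"));  // 20213
    }
}
```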

When you persist string data to a file, however, you need to know which encoding is being used.  In the example below, we use a StreamWriter to write string data to a file.  By default, StreamWriter uses UTF-8 as the encoding (without a byte order mark).

            string s1 = "A";             // U+0041
            string s2 = "\u00e9";        // U+00E9 accented e
            string s3 = "\u0100";        // U+0100 capital A with macron
            string s4 = "\U00020213";    // U+20213 CJK ideograph (d840, de13 surrogate pair)

            // Requires: using System.IO;
            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt"))
            {
                sw.Write(s1);
                sw.Write(s2);
                sw.Write(s3);
                sw.Write(s4);
            }
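
To see which bytes UTF-8 produces for each of these strings without opening the file in a hex editor, something like the following sketch may help.  The class name Utf8Demo and the helper ToUtf8Hex are made up for this example.  It calls Encoding.UTF8.GetBytes directly rather than going through StreamWriter, so it shows the encoded characters only.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical helper: the UTF-8 bytes of s, formatted as hex pairs.
    public static string ToUtf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }

    public static void Main()
    {
        Console.WriteLine(ToUtf8Hex("A"));            // 41          (1 byte)
        Console.WriteLine(ToUtf8Hex("\u00e9"));       // C3-A9       (2 bytes)
        Console.WriteLine(ToUtf8Hex("\u0100"));       // C4-80       (2 bytes)
        Console.WriteLine(ToUtf8Hex("\U00020213"));   // F0-A0-88-93 (4 bytes)
    }
}
```

The four strings take 1, 2, 2 and 4 bytes respectively, so the character data written by the default StreamWriter should come to 9 bytes.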


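We could also explicitly specify a UTF-16 encoding (Encoding.Unicode, which is little-endian UTF-16) when creating the StreamWriter object.  The second argument, false, tells the StreamWriter to overwrite the file rather than append to it.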

            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt", false, Encoding.Unicode))
                sw.Write(s1 + s2 + s3 + s4);    // Requires: using System.IO; using System.Text;
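
For comparison, here is a rough sketch of the bytes that Encoding.Unicode produces for the same four strings.  The class name Utf16Demo and the helper ToUtf16Hex are made up for this example.  Each 16-bit code unit appears low byte first, because Encoding.Unicode is little-endian.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Hypothetical helper: the UTF-16 (little-endian) bytes of s, as hex pairs.
    public static string ToUtf16Hex(string s)
    {
        return BitConverter.ToString(Encoding.Unicode.GetBytes(s));
    }

    public static void Main()
    {
        Console.WriteLine(ToUtf16Hex("A"));            // 41-00
        Console.WriteLine(ToUtf16Hex("\u00e9"));       // E9-00
        Console.WriteLine(ToUtf16Hex("\u0100"));       // 00-01
        Console.WriteLine(ToUtf16Hex("\U00020213"));   // 40-D8-13-DE

        // The byte order mark that Encoding.Unicode reports as its preamble.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetPreamble()));  // FF-FE
    }
}
```

When a StreamWriter created with Encoding.Unicode writes to a new file, it emits that preamble first, so the file starts with FF FE, followed by the character data.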


About Sean
Software developer in the Twin Cities area, passionate about software development and sailing.
