Monday, December 06, 2004

Lessons Learned #2: Character Encoding

Since I've spent the last few work days dealing with character encoding issues, I thought I would write up some useful points to remember.

1) Internally, Java handles character encoding well: a String is always a sequence of UTF-16 code units, no matter where the data came from, so you don't have to worry about encoding when working with Strings and chars within your application.

2) Character encoding becomes an issue at the boundaries, where you are sending or receiving data:

- When using streams to read or write data over a network connection or a file, declare the encoding you want to use, e.g. "UTF-8". Otherwise Java falls back to its default charset, which may differ between machines and may not match what the other side expects.
- Ensure that any databases you interact with are correctly configured for the character encoding that you want to use. For example, in SQL Server you will need to make sure that all string columns are either nvarchar or ntext, not the usual varchar and text types. You may also have to configure the database as a whole to ensure that connections use the correct encoding.
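As a rough illustration of the first bullet, here is a minimal sketch of writing and reading a file with the encoding stated explicitly. The class and method names (EncodingRoundTrip, writeUtf8, readUtf8) are made up for this example, and it assumes Java 7 or later for StandardCharsets and try-with-resources; on older versions you would pass the charset name "UTF-8" as a string instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class, not part of any library.
public class EncodingRoundTrip {

    // Encode the text as UTF-8 on the way out instead of relying on the default charset.
    public static void writeUtf8(Path file, String text) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text);
        }
    }

    // Decode the bytes as UTF-8 on the way in, matching what the writer used.
    public static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        String original = "caf\u00e9 \u65e5\u672c\u8a9e"; // accented Latin plus Japanese
        writeUtf8(tmp, original);
        System.out.println("Round trip intact: " + original.equals(readUtf8(tmp)));
        Files.delete(tmp);
    }
}
```

The same pattern applies to socket streams: wrap the InputStream or OutputStream in a reader or writer that names the charset, and make sure both ends agree on it.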

3) Swing components all support Unicode, but you will need to ensure that the font you are using has glyphs for the Unicode ranges you need. The default fonts delivered with Java support most of the Unicode set, but this does not include Japanese, Chinese and Korean glyphs. Many MS Windows machines have "Arial Unicode MS" installed, however you may be better off using code to find a font that supports the Unicode ranges that you need.
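One way to find such a font in code is to ask each installed font whether it can display a sample of the text you care about. The sketch below is only an illustration: FontFinder and findFontFor are hypothetical names, and the sample string is an arbitrary choice. It relies on GraphicsEnvironment.getAllFonts() and Font.canDisplayUpTo(String), which returns -1 when every character in the string can be displayed.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;

// Hypothetical helper class, not part of Swing or AWT.
public class FontFinder {

    // Returns the first installed font that can render every character in
    // the sample, resized to the requested point size, or null if none can.
    public static Font findFontFor(String sample, float size) {
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        for (Font font : installed) {
            // canDisplayUpTo returns -1 when the whole string is displayable.
            if (font.canDisplayUpTo(sample) == -1) {
                // getAllFonts returns 1-point instances, so derive a usable size.
                return font.deriveFont(size);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "\u65e5\u672c\u8a9e"; // Japanese sample text
        Font font = findFontFor(sample, 12f);
        System.out.println(font == null
                ? "No installed font covers the sample"
                : "Using " + font.getFontName());
    }
}
```

The result can then be passed to a component's setFont method. Checking a short sample is a shortcut; if you need to cover a whole Unicode range, the sample has to include representative characters from across that range.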
