Wednesday, December 08, 2004

Lessons Learned #3: Unicode in SQL...

We are continuing our adventures in ensuring that Unicode is fully supported throughout our application and a colleague of mine solved the last of our issues. We were loosing the Unicode character during the round trip to the database, which in this case is MS SQLServer 2000.

SQLServer 2000 is fully Unicode compliant so it all should have worked. All the columns that might contain Unicode character were set to the n* types (nchar, nvarchar and ntext) and yet we were still loosing the information.

Apparently the solution is the prefix the string literal with a "N" character, as in;

update bob set surname=N'unicode surname'

This prefix is SQL92 Intermediary and SQL99 & SQL2003 Optional. Here is an extract from the only place in the SQLServer 2000 help files that I could find this vital information;

"Unicode strings have a format similar to character strings but are preceded by an N identifier (N stands for National Language in the SQL-92 standard). The N prefix must be uppercase. For example, 'Michél' is a character constant while N'Michél' is a Unicode constant. Unicode constants are interpreted as Unicode data, and are not evaluated using a code page. Unicode constants do have a collation, which primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation. Unicode data is stored using two bytes per character, as opposed to one byte per character for character data."

Hope this helps, I know that we're very glad we found it.