INF: Storing UTF-8 Data in SQL Server
ID: Q232580
|
The information in this article applies to:
-
Microsoft SQL Server versions 6.5, 7.0
SUMMARY
Some applications (especially those that are Web based) must deal with Unicode data that is encoded with the UTF-8 encoding method. SQL Server 7.0 uses a different Unicode encoding (UCS-2) and does not recognize UTF-8 as valid character data. This article discusses some options for dealing with this situation.
MORE INFORMATION
Unicode data can be encoded in many different ways. UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. SQL Server, Windows NT, and the SQL Server ODBC driver only deal with Unicode data that is encoded as UCS-2.
The options for using SQL Server 7.0 as a back-end for an application that sends and receives Unicode data encoded with UTF-8 include:
- Translate to and from UCS-2 or UTF-8 as appropriate within the application. Sample code for this type of conversion is located at the Unicode Consortium's site:
ftp site ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/
A high-level description of the algorithm to convert UCS-2 to UTF-8 can be found in the Internet Request For Comments document RFC2279.
- Modify the application to use UCS-2 instead of UTF-8 encoding.
- Store the actual UTF-8 data on the server using the BINARY/VARBINARY/IMAGE columns. Storing UTF-8 data on SQL Server means that you can not use SQL Server to sort or find ranges of these values as if the data were valid character data. The types of operations on columns containing UTF-8 data that would not return expected results include "ORDER BY", greater-than ">" and less-than "<" comparisons, and the built-in SQL Server string manipulation functions such as SUBSTRING(). Equality comparisons, however, continue to work. Note that if you store UTF-8 data in SQL Server you should not use character columns (CHAR/NCHAR/VARCHAR and so forth). To SQL Server, UTF-8 is not valid character data, and non-character data should not be stored in character columns.
Keep in mind that if you ever need to access this data outside of a Web browser (from a non Web based ODBC application, for example) you need to do a conversion from UTF-8 to UCS-2 within this application since ODBC, Win32 API calls, and common string manipulation functions do not work with UTF-8 data. This moves the burden of translation to a different application.
- If your requirements do not include the need to store data from a combination of languages that cannot be satisfied by a single code page, you may not need to use Unicode.
SQL Server 6.5 does not support the storing of Unicode data. If you are using this version of SQL Server your only options are in step 3 and step 4.
Additional query words:
UTF-16 UTF16 international
Keywords : SSrvProg kbSQLServ650 kbSQLServ700
Version : winnt:6.5,7.0
Platform : winnt
Issue type : kbinfo
|