Developing DBCS Applications

Windows developers targeting the Asian marketplace can build database applications for Microsoft SQL Server 6.5 using a variety of DBCS character sets (Chinese, Japanese, or Korean). There are a many special considerations, coding techniques, and database issues that affect the development of DBCS applications that use SQL Server for database operations.

This paper addresses SQL Server-specific issues only and is not a comprehensive guide about how to create DBCS-enabled Windows applications. For a comprehensive guide on how to build DBCS-enabled applications, see Developing International Software for Windows 95 and Windows NT.

This section divides the topic of building DBCS-enabled SQL Server applications into client application design issues and database administration issues.

DBCS Client Application Design Issues

Windows application developers building database applications using ODBC must be aware of several critical issues when building DBCS-enabled applications that communicate with Microsoft SQL Server. These issues include DBCS string handling, using Input Method Editors, sizing ODBC buffers, and using certain ODBC canonical functions.

Handling DBCS Strings

DBCS-enabled applications must be coded to ensure proper handling of DBCS strings. If your code doesn't handle DBCS strings properly, you can separate a lead-byte from a trail-byte and mistakenly treat a single character as two separate characters. Any code that manipulates strings must check for double-byte pairs including search, edit, move, replace, delete, and insert operations. Display operations must be programmed properly to handle DBCS strings, including cursor placement, backspacing, and character selection operations. For more information about building DBCS-enabled Windows applications, see Developing International Software for Windows 95 and Windows NT.

Handling DBCS Input

A standard 101-key keyboard cannot represent all the characters in a DBCS character set, so Windows-based DBCS and Unicode applications use an Input Method Editor (IME) to enable input of extended characters. For more information about IME coding techniques, see Developing International Software for Windows 95 and Windows NT.

Sizing ODBC Data Buffers

ODBC uses a variety of buffers when communicating with the SQL Server ODBC driver. All of these buffers are sized in bytes, not characters. Your buffers must be large enough to handle the total bytes required for a DBCS string to avoid truncation. If a DBCS string is truncated you can inadvertently separate a lead-byte from a trail-byte, which would corrupt your data.

It is impossible to determine the number of characters in a DBCS string without processing it first. Therefore, using an ODBC variable-length character data type such as SQL_VARCHAR or SQL_LONGVARCHAR is recommended when working with columns that contain DBCS strings.

Use of ODBC LENGTH Canonical Function

If you use ODBC canonical syntax to call functions in SQL statement text by way of SQLPrepare or SQLExecDirect, it is important to understand how the ODBC canonical LENGTH function works against a SQL Server running a DBCS code page. This function maps to the Transact-SQL DATALENGTH function that returns the length in bytes, not characters, of the string upon which it operates.

DBCS Database Administration Issues

The SQL Server 6.5 engine is fully DBCS-enabled. Several SQL Server-specific DBCS issues that apply to database administrators are discussed in this section. These topics include DBCS object names, size of character columns, and use of Transact-SQL functions when manipulating DBCS strings.

DBCS Object Names

All Transact-SQL statements, functions, and system stored procedures support the use of DBCS strings for object names. This means that you can name tables, views, and all other SQL Server objects using DBCS strings. Be aware that object names are limited to 30 bytes.

Sizing Character Columns

SQL Server table columns are sized in bytes not characters. Size your columns to be large enough to handle the total bytes required for a DBCS string to avoid truncation. If a DBCS string is truncated you can inadvertently separate a lead-byte from a trail-byte, which could corrupt your string.

For example, a DBCS string that contains 20 characters may consume 22 bytes of storage because the last two characters are full-width characters. If you attempt to store this string in a column of type char(21) the trailing byte of the last full-width character would be truncated, which would corrupt the string.

Transact-SQL String Functions

Most Transact-SQL functions that manipulate character strings have been fully DBCS-enabled. This table describes each string function from a DBCS perspective. For a detailed description of each of these functions, see the Microsoft SQL Server Transact-SQL Reference.

Function DBCS-enabled Comments
CONVERT ü Truncates DBCS-strings by ensuring that if a truncation occurs involving a full-width character, both bytes of the full-width character are truncated not just the trailing byte.
ASCII ü Returns the integer value representing the code point for single-byte (usually ASCII) characters only. If you try to use it on a double-byte character it returns an error.
CHAR ü Returns the corresponding single-byte (usually ASCII) character using the integer value representing the code-point for the corresponding single-byte character. This function returns an error if you pass the integer code-point for a double-byte character.
CHARINDEX ü Searches a DBCS string for the first occurrence of a character (either single- or double-byte) and returns an integer value representing its logical (not byte) position in the string.
LOWER / UPPER ü Returns the lower- or upper-case equivalent of a single-byte character. Returns the same character if there is no lower- or upper-case equivalent or if it is a double-byte character.
LTRIM / RTRIM ü Removes leading or trailing spaces in DBCS strings.
PATINDEX Partially This function is DBCS-enabled when working with character data types (same as CHARINDEX). It is not DBCS-enabled when searching a text data type, however.
REPLICATE ü Replicates either single-byte or double-byte characters.
REVERSE ü Reverses DBCS strings.
RIGHT ü Extracts the rightmost n characters (not bytes) from a DBCS-string.
STUFF ü Inserts a DBCS-string into another DBCS-string using characters (not bytes) as the index.
SUBSTRING ü Removes a DBCS-string from another DBCS-string using characters (not bytes) as the index and counter.