The information in this article applies to:
- Microsoft SQL Server, version 6.0
- Microsoft Open Database Connectivity (ODBC), version 2.5
SUMMARY
The purpose of this article is to answer general questions and provide more
background information regarding how the SQL Server Driver is DBCS-enabled.
The article is divided into the following sections:
- What is DBCS?
- What does "DBCS-enabled" imply?
- Questions and Answers
MORE INFORMATION
What is DBCS?
Double-byte Character Set (DBCS) is a character encoding scheme that
accommodates the ideographic characters used in Far Eastern languages. Unlike
Single-byte Character Sets (SBCS), which can represent at most 256
characters in one byte, a DBCS can address a character with a 16-bit
notation, that is, with two bytes. With 16-bit notation, you can
represent up to 65,536 (2^16) characters.
DBCS code pages contain both single and double-byte characters. The DBCS
single-byte characters conform to the 8-bit national standards for each
country and correspond closely to the ASCII character set.
In a double-byte character set, certain ranges of code points are
designated as lead bytes. A lead byte, together with the byte that follows
it, represents a single character. This second byte is called the
trailing byte or trail byte. Each DBCS has a different set of lead-byte
ranges and trail-byte ranges. Unlike lead bytes, trail bytes in some
DBCS can overlap with the 7-bit ASCII character set.
For example, the Shift JIS (Japanese Industrial Standard) character set
has trail-byte ranges of 0x40-0x7E and 0x80-0xFC. That means a byte holding
the value 0x7D can represent the second half of a Kanji character,
not necessarily a closing brace character (}).
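The lead-byte and trail-byte distinction can be sketched in a few lines of Python. This is only an illustration, not the driver's implementation: the function names are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are assumed from the Shift JIS layout used by Windows code page 932.

```python
# Assumption: lead-byte ranges of Shift JIS as used by Windows code page 932.
SJIS_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b):
    """Return True if byte value b falls in an assumed Shift JIS lead-byte range."""
    return any(lo <= b <= hi for lo, hi in SJIS_LEAD_RANGES)


def classify_bytes(data):
    """Label every byte of data as 'single', 'lead', or 'trail'."""
    roles = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            # A lead byte claims the next byte as its trail byte,
            # whatever value that byte holds.
            roles.extend(["lead", "trail"])
            i += 2
        else:
            roles.append("single")
            i += 1
    return roles


# 0x7B is '{', 0x83 0x7D is one double-byte character, the last 0x7D is '}'.
sample = b"{\x83\x7d}"
print(classify_bytes(sample))  # ['single', 'lead', 'trail', 'single']
```

The same byte value, 0x7D, is labeled as a trail byte in one position and as a single-byte closing brace in another; only its position relative to a lead byte tells them apart.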
What does "DBCS-enabled" imply?
A program that is DBCS-enabled meets the following conditions when it
runs on a DBCS platform:
- It can distinguish a trail byte from an ASCII character. For example,
it can determine whether 0x7D is the trail byte of a Kanji character or a
closing brace when it runs on Japanese versions of Windows or Windows NT.
- It differentiates character-based semantics from byte-based
semantics. For example, a function such as "CharCount" should return
the number of characters in a DBCS string rather than the number of
bytes; a function such as "CharNext" should move to the next
character rather than the next byte in a DBCS string.
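The difference between character-based and byte-based semantics can be illustrated with a minimal Python sketch. The names char_next and char_count are hypothetical stand-ins for functions such as "CharNext" and "CharCount", not the Windows API, and the lead-byte ranges are assumed from the Shift JIS layout of code page 932.

```python
# Assumption: lead-byte ranges of Shift JIS as used by Windows code page 932.
SJIS_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b):
    """Return True if byte value b falls in an assumed Shift JIS lead-byte range."""
    return any(lo <= b <= hi for lo, hi in SJIS_LEAD_RANGES)


def char_next(data, pos):
    """Return the byte offset of the character after the one starting at pos."""
    if pos >= len(data):
        return len(data)
    # Step over two bytes for a double-byte character, one byte otherwise.
    step = 2 if is_lead_byte(data[pos]) and pos + 1 < len(data) else 1
    return pos + step


def char_count(data):
    """Count characters, not bytes, by walking the string with char_next."""
    count = 0
    pos = 0
    while pos < len(data):
        pos = char_next(data, pos)
        count += 1
    return count


data = b"A\x83\x7dB"          # 'A', one double-byte character, 'B'
print(len(data))              # 4 (bytes)
print(char_count(data))       # 3 (characters)
print(char_next(data, 1))     # 3 (skips both bytes of the double-byte character)
```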
Questions and Answers
The following answers are based on connections to the English version
of Microsoft SQL Server version 6.0.
- CAN I PUT A DBCS STRING INTO CHAR OR VARCHAR COLUMNS? CAN I RETRIEVE
A DBCS STRING FROM THE SQL SERVER AND DISPLAY IT?
Yes. The English version of SQL Server 6.0 provides no DBCS code pages,
so the server treats the bytes of any DBCS string as characters in one of
three code pages (ISO 8859-1, CP850, or CP437), whichever was selected
during the installation of SQL Server. No data is lost during the
insertion or retrieval of the data.
In order to display DBCS strings, however, your client application
should run on a DBCS platform, such as the Japanese version of
Windows. As soon as you fetch a DBCS string from the SQL Server, the
Japanese version of Windows can display these characters for you.
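The round trip described in this answer can be modeled in Python. This is only a sketch of the idea, not of how SQL Server stores data: it assumes the server's view is equivalent to reading each byte as one ISO 8859-1 character, and it uses Python's built-in "iso-8859-1" and "shift_jis" codecs as stand-ins for the server code page and the Japanese client platform.

```python
# Two double-byte Shift JIS characters (0x83 0x7D and 0x83 0x5C), four bytes total.
dbcs_bytes = b"\x83\x7d\x83\x5c"

# Model of the server's view: each byte is one single-byte character.
server_view = dbcs_bytes.decode("iso-8859-1")

# Model of retrieval: the same bytes come back unchanged.
fetched = server_view.encode("iso-8859-1")

print(len(server_view))        # 4: the server sees four single-byte characters
print(fetched == dbcs_bytes)   # True: no byte was lost or altered

# Only a client that interprets the bytes as Shift JIS sees two characters.
client_text = fetched.decode("shift_jis")
print(len(client_text))        # 2
```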
- CAN I USE A DBCS CHARACTER OR STRING IN A LIKE CLAUSE?
Yes. Because the driver is DBCS-enabled, it parses trail bytes
correctly. For example, it will not interpret a trail byte whose value
matches the percent sign (%) or underscore character (_) as a wildcard,
and it will not treat a trail byte whose value matches the single
quotation mark (') or closing brace character (}) as that punctuation.
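A trail-byte-aware scan of a LIKE pattern might look like the following Python sketch. It is an illustration under stated assumptions, not the driver's parser: wildcard_offsets is a hypothetical helper, and the lead-byte ranges are assumed from the Shift JIS layout of code page 932.

```python
# Assumption: lead-byte ranges of Shift JIS as used by Windows code page 932.
SJIS_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b):
    """Return True if byte value b falls in an assumed Shift JIS lead-byte range."""
    return any(lo <= b <= hi for lo, hi in SJIS_LEAD_RANGES)


def wildcard_offsets(pattern):
    """Return byte offsets of real % and _ wildcards, skipping trail bytes."""
    offsets = []
    i = 0
    while i < len(pattern):
        if is_lead_byte(pattern[i]) and i + 1 < len(pattern):
            i += 2  # skip the whole double-byte character, trail byte included
            continue
        if pattern[i] in b"%_":
            offsets.append(i)
        i += 1
    return offsets


# 0x83 0x5F is one double-byte character whose trail byte equals '_' (0x5F);
# it is followed by a real '_' and a real '%'.
pattern = b"\x83\x5f_%"
naive = [i for i, b in enumerate(pattern) if b in b"%_"]
print(naive)                      # [1, 2, 3]: a byte-wise scan miscounts
print(wildcard_offsets(pattern))  # [2, 3]
```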
ODBC provides two wildcards in a LIKE clause: the percent sign matches
zero or more of any character, and the underscore character matches any
one character. When you connect to the English version of SQL Server
version 6.0, however, the underscore character actually matches one byte,
so matching one double-byte character takes two underscores.
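The effect of byte-wise matching can be modeled with a short Python sketch. It is not SQL Server's matching code: like_bytes is a hypothetical helper that translates a LIKE pattern into a regular expression over bytes, and the patterns used here are assumed to contain no double-byte characters of their own.

```python
import re


def like_bytes(value, pattern):
    """Match value against a LIKE pattern where '_' is one byte and '%' is any run of bytes."""
    parts = []
    for b in pattern:
        if b == ord("%"):
            parts.append(b".*")
        elif b == ord("_"):
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([b])))
    # DOTALL lets '.' match every byte value, including 0x0A.
    regex = re.compile(b"".join(parts), re.DOTALL)
    return regex.fullmatch(value) is not None


value = b"A\x83\x7d"               # 'A' followed by one double-byte character
print(like_bytes(value, b"A_"))    # False: one underscore covers only one byte
print(like_bytes(value, b"A__"))   # True: two underscores cover both bytes
print(like_bytes(value, b"A%"))    # True
```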
- CAN I USE DBCS CHARACTERS TO NAME MY TABLES, COLUMNS AND OTHER OBJECTS?
Yes, because SQL Server 6.0 treats any DBCS characters as characters
in one of its SBCS code pages. Remember to use double quotation marks to
enclose your DBCS identifier, in order to avoid syntax error messages
from the SQL Server.
- HOW DO YOU DEFINE SORT ORDERS FOR DBCS IN SQL SERVER?
Currently, the English version of SQL Server 6.0 has some predefined
sort orders based on Single-Byte Character Sets. There is no
easy way to plug in a customized DBCS-based sort order in the
current SQL Server. As previously mentioned, the server treats any DBCS
characters as characters in the code page it is currently using.
- I AM TOLD THAT DBCS ISSUES WILL BE ADDRESSED IN THE ODBC 3.0 TIME FRAME.
SINCE THE SQL SERVER DRIVER 2.50 HAS ALREADY BEEN DBCS-ENABLED, WHAT
WILL BE NEW IN ODBC 3.0?
ODBC 3.0 will address DBCS issues from the specification's perspective.
For example, in Kyle Geiger's book, "Inside ODBC," Chapter 9, section
"ODBC 3.0", page 453, you can see two fields in a descriptor record:
LENGTH and OCTET_LENGTH. Here, LENGTH specifies the number of characters
in the column and OCTET_LENGTH gives the length of the column in bytes.