INF: How to Manipulate DBCS Data in a Text Field
ID: Q172308
|
The information in this article applies to:
-
Microsoft SQL Server version 6.5
SUMMARY
This article describes how to correctly manipulate double-byte character
set (DBCS) data in text fields.
MORE INFORMATION
UPDATETEXT and WRITETEXT are powerful tools that allow users to modify
existing text data through a text pointer. While using a text pointer, it
is very common for users to accidentally split a double-byte character in
half. This may result in "data corruption" from user's perspective. The
following scripts demonstrate this common problem.
USE pubs
GO
SET NOCOUNT ON
DROP TABLE textTable
GO
CREATE TABLE textTable
(
id SMALLINT
IDENTITY(1,1)
PRIMARY KEY CLUSTERED,
hexValues VARBINARY(20) NOT NULL,
textField TEXT NOT NULL
)
GO
DECLARE @hexValues VARBINARY(20)
SELECT @hexValues = 0xa540a541a542
INSERT textTable VALUES
(@hexValues,
CONVERT (VARCHAR(20), @hexValues))
INSERT textTable VALUES
(@hexValues,
CONVERT (VARCHAR(20), @hexValues))
GO
SELECT * FROM textTable
DECLARE @hexValues VARBINARY(20)
SELECT @hexValues = 0xa543
DECLARE @string VARCHAR(20)
SELECT @string = CONVERT(VARCHAR(20), @hexValues)
DECLARE @textPointer VARBINARY(16)
SELECT @textPointer = TEXTPTR(textField)
FROM textTable WHERE id = 2
UPDATETEXT textTable.textField @textPointer 1 2 @string
UPDATE textTable
SET hexValues =
CONVERT (VARBINARY(20), CONVERT(VARCHAR(20), textField))
WHERE id = 2
As the execution result indicates, the text data becomes invalid because a
double-byte character has been split. It is important for users to check
whether the target offset from the start of the existing text is a lead
byte or not, so that incorrect results can be avoided. The following
sample scripts demonstrate checking the target offset of a text field in a
SQL Server that was installed with code page 950. Please note the lead
byte ranges may vary in the other DBCS code pages. For more information
about lead byte ranges in different code pages, please see "Developing
International Software for Windows 95 and Windows NT" (ISBN 1-55615-840-8).
CREATE PROCEDURE IsThisLeadByteInCP950
@string VARCHAR(255), @offset SMALLINT
/* @string: is the target string that will be checked */
/* @offset: is the 1-based starting position, */
/* specified as the number of bytes (from */
/* the start of the existing text value) to */
/* skip. */
AS
SELECT @string = RIGHT(REVERSE(@string), @offset)
DECLARE @i SMALLINT
SELECT @i = ASCII(@string)
IF @i >= 129 and @i <= 254
/* lead byte range: 0x81 to 0xFE */
RETURN 1
ELSE
RETURN 0
GO
DECLARE @hexValues VARBINARY(20)
SELECT @hexValues = 0xa540a541a542
DECLARE @string VARCHAR(20)
SELECT @string = CONVERT(VARCHAR(20), @hexValues)
DECLARE @i SMALLINT
EXEC @i = IsThisLeadByteInCP950 @string, 1
IF (@i = 1)
PRINT "It is a lead byte in code page 950, command aborted."
ELSE
PRINT "It is not a lead byte in code page 950, command continue..."
Additional query words:
JIS Wansung Johab Hangul KB BIG-5 932 936 949
950 traditional Chinese Japanese Korean codepage cp
Keywords : kbusage SSrvGen
Version : WINDOWS:6.5
Platform : WINDOWS
Issue type : kbhowto