INF: How to Manipulate DBCS Data in a Text Field

ID: Q172308


The information in this article applies to:
  • Microsoft SQL Server version 6.5


SUMMARY

This article describes how to correctly manipulate double-byte character set (DBCS) data in text fields.


MORE INFORMATION

UPDATETEXT and WRITETEXT are powerful tools that allow users to modify existing text data through a text pointer. While using a text pointer, it is very common for users to accidentally split a double-byte character in half. This may result in "data corruption" from user's perspective. The following scripts demonstrate this common problem.


   USE pubs
   GO
   SET NOCOUNT ON
   DROP TABLE textTable
   GO
   CREATE TABLE textTable
   (
      id SMALLINT
         IDENTITY(1,1)
         PRIMARY KEY CLUSTERED,
      hexValues VARBINARY(20) NOT NULL,
      textField TEXT NOT NULL
   )
   GO
   DECLARE @hexValues VARBINARY(20)
   SELECT @hexValues = 0xa540a541a542
   INSERT textTable VALUES
      (@hexValues,
         CONVERT (VARCHAR(20), @hexValues))
   INSERT textTable VALUES
      (@hexValues,
         CONVERT (VARCHAR(20), @hexValues))
   GO
   SELECT * FROM textTable
   DECLARE @hexValues VARBINARY(20)
   SELECT @hexValues = 0xa543
   DECLARE @string VARCHAR(20)
   SELECT @string = CONVERT(VARCHAR(20), @hexValues)
   DECLARE @textPointer VARBINARY(16)
   SELECT @textPointer = TEXTPTR(textField)
      FROM textTable WHERE id = 2
   UPDATETEXT textTable.textField @textPointer 1 2 @string
   UPDATE textTable
      SET hexValues =
         CONVERT (VARBINARY(20), CONVERT(VARCHAR(20), textField))
      WHERE id = 2 

As the execution result indicates, the text data becomes invalid because a double-byte character has been split. It is important for users to check whether the target offset from the start of the existing text is a lead byte or not, so that incorrect results can be avoided. The following sample scripts demonstrate checking the target offset of a text field in a SQL Server that was installed with code page 950. Please note the lead byte ranges may vary in the other DBCS code pages. For more information about lead byte ranges in different code pages, please see "Developing International Software for Windows 95 and Windows NT" (ISBN 1-55615-840-8).

   CREATE PROCEDURE IsThisLeadByteInCP950
      @string VARCHAR(255), @offset SMALLINT

   /* @string: is the target string that will be checked */ 
   /* @offset: is the 1-based starting position,         */ 
   /*          specified as the number of bytes (from    */ 
   /*          the start of the existing text value) to  */ 
   /*          skip.                                     */ 
   AS
      SELECT @string = RIGHT(REVERSE(@string), @offset)
      DECLARE @i SMALLINT
      SELECT @i = ASCII(@string)
      IF @i >= 129 and @i <= 254
      /* lead byte range: 0x81 to 0xFE */ 
         RETURN 1
      ELSE
         RETURN 0
   GO
   DECLARE @hexValues VARBINARY(20)
   SELECT @hexValues = 0xa540a541a542
   DECLARE @string VARCHAR(20)
   SELECT @string = CONVERT(VARCHAR(20), @hexValues)
   DECLARE @i SMALLINT
   EXEC @i = IsThisLeadByteInCP950 @string, 1
   IF (@i = 1)
      PRINT "It is a lead byte in code page 950, command aborted."
   ELSE
      PRINT "It is not a lead byte in code page 950, command continue..." 

Additional query words: JIS Wansung Johab Hangul KB BIG-5 932 936 949
950 traditional Chinese Japanese Korean codepage cp

Keywords : kbusage SSrvGen
Version : WINDOWS:6.5
Platform : WINDOWS
Issue type : kbhowto


Last Reviewed: April 16, 1999
© 2000 Microsoft Corporation. All rights reserved. Terms of Use.