Using Unicode Data

The Unicode specification defines a single encoding scheme for practically all characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will correctly translate the bit patterns into characters.

One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.

Microsoft® SQL Server™ translates the bit patterns in char, varchar, and text columns to characters using the definitions in the code page installed with SQL Server. Client computers use the code page installed with the operating system to interpret character bit patterns. There are many different code pages. Some characters appear on some code pages, but not on others. Some characters are defined with one bit pattern on some code pages, and with a different bit pattern on other code pages. When you build international systems that must handle different languages, it becomes difficult to pick code pages for all the computers that meet the language requirements of multiple countries. It is also difficult to ensure that every computer performs the correct translations when interfacing with a system using a different code page.

The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another.

In Microsoft SQL Server, these data types support Unicode data:

nchar
nvarchar
ntext

Note The n prefix for these data types comes from the SQL-92 standard for National (Unicode) data types.

Use of nchar, nvarchar, and ntext is the same as char, varchar, and text, respectively, except:

Unicode supports a wider range of characters.
More space is needed to store Unicode characters.
The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
Unicode constants are specified with a leading N: N’A Unicode string’.

Data Types	UNICODE
NCHAR	Unicode Data

Using Unicode Data

See Also