Code Pages and Sort Orders

The physical storage of character strings in Microsoft® SQL Server™ is controlled by the code page and sort order selected during installation.

While the code page and sort order control the format of how data is stored in each database, the specification of the code page and sort order are global to the server. Every database attached to a specific SQL Server installation uses the same code page and sort order. The code page and sort order are specified during SQL Server Setup.

Code Pages

Software interprets the data in character strings using code pages. The eight bits in one byte can be arranged in 256 different patterns, so code pages that use one byte per character can only support 256 characters. A code page defines which bit patterns in a byte represent each character. Code pages define bit patterns for upper and lowercase characters, digits, symbols, and special characters such as !, @, #, or %.

There are several code pages. The default code page for a server running SQL Server is the 1252 - ISO character set, which contains the common characters from most languages that originated in Western Europe. The bit patterns can be represented by decimal numbers, the first bit pattern is 0, the second 1, and so on to the last bit pattern, which is represented by the value 255. The bit patterns from 32 to 126 represent the same characters on all code pages, while the characters represented by the bit values 0 through 31 and 127 through 255 vary between code pages. The values 32 through 126 cover the characters, digits, and special characters typically used in United States English text. The values 0 through 31 and 127 through 255 are known as the extended character set and represent characters typically used in languages other than U.S. English. Items that vary between countries, such as currency symbols, are also in the extended character set. Systems that store data that does not use any of the extended characters have no concern about what code page is used in the different system components. Systems using extended characters must ensure that all computers use the same code page, or must be able to successfully translate data stored with one code page onto a computer running another code page without losing any of the extended characters.

When you pick a code page during the setup of SQL Server, the data in all character columns in the database, and in all character variables and parameters in Transact-SQL statements, is stored and interpreted using the bit patterns from the indicated code page. SQL Server can be installed with a different code page than the code page used by the server operating system. If the data in a server running SQL Server contains extended characters, care must be used in determining the code pages used in the database and on the clients. If both the database and all the clients are running the same code page, there are no translation issues. If the server is running one code page and clients are using another code page, then the clients may need to turn on options in the SQL Server ODBC driver and OLE DB provider to support the proper conversion of extended characters. In SQL Server version 7.0, the SQL Server ODBC driver, and the OLE DB Provider for SQL Server handle this conversion automatically. You only need to specify it manually when using earlier versions of SQL Server or the SQL Server ODBC driver, or when using DB-Library.

International Data and Unicode

Storing data in multiple languages within one database is difficult to manage when using only character data and code pages. It is difficult to find one code page for the database that can store all the required language-specific characters. It is also difficult to ensure the proper translation of special characters when being read or updated by different clients running various code pages.

For example, a database of customers in North America has to handle three major languages:

When using only character columns and code pages, care has to be taken to ensure the database is installed with a code page that will handle the characters of all three languages. More care must be taken to ensure the proper translation of characters from one of the languages when read by clients running a code page for another languages.

A new feature in SQL Server 7.0 is support for Unicode data types, which eliminates the problem in converting characters. Unicode stores character data using two bytes for each character rather than one byte. There are 65,536 different bit patterns in two bytes, so Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have large numbers of characters. Data stored in Unicode columns is not affected by code pages. Programming languages also support Unicode data types, such as the C WCHAR data type. If an application uses Unicode variables to hold the data received from or sent to Unicode columns, there is no need for character translations.

It is easier to manage international databases if you:

The fact that Unicode data needs twice as much storage space is offset by eliminating the need to convert extended characters between code pages.

SQL Server 7.0 stores all textual system catalog data in columns having Unicode data types. The names of database objects such as tables, views, and stored procedures are stored in Unicode columns. This allows applications to be developed using only Unicode, which avoids all issues with code page conversions.

Sort Order

The sort order is another option specified during the Setup program. The sort order specifies the rules used by SQL Server to collate, compare, and present character data. It also specifies whether SQL Server is case-sensitive.

SQL Server 7.0 uses two sort orders, one for the character code page and one for Unicode data. Both sort orders are specified during setup.

The sort orders are different for each SQL Server code page. For example, the sort orders available with code page 1252 are:

The following table shows how the sort order affects comparisons and sorting.

Sort order Comparison examples Sorting example
Dictionary order, case-insensitive A = a, Ä = ä, Å = å,
a ¹ à ¹ á ¹ â ¹ ä ¹ å,
A ¹ Ä ¹ Å
A, a, à, á, â, ä, Ä, Å, å
(See Note 1)
Binary (See Note 2) (See Note 2)
Dictionary order, case-sensitive A ¹ a, Ä ¹ ä, Å ¹ å,
a ¹ à ¹ á ¹ â ¹ ä ¹ å,
A ¹ Ä ¹ Å
A, a, à, á, â, Ä, ä, Å, å
Dictionary order, case-insensitive, uppercase preference A = a, Ä = ä, Å = å,
a ¹ à ¹ á ¹ â ¹ ä ¹ å,
A ¹ Ä ¹ Å
A, a, à, á, â, Ä, ä, Å, å
Dictionary order, case-insensitive, accent-insensitive A = a = à = á = â = Ä =
ä = Å = å
A, a, à, á, â, Ä, ä, Å, å
1    This is just one example of how the characters might be sorted. Because this sort order does not have uppercase preference, you cannot predict whether an uppercase character will come before or after its corresponding lowercase character.

2    Comparisons and sorting for this sort order are based on the numeric values of the characters in the installed character set.


See Also
Character Set Using Unicode Data
Advanced Topics Sort Order
Unicode Data  

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.