INF: How to Obtain Case-Sensitive Query Results with a Case-Insensitive Sort Order

ID: Q171299


The information in this article applies to:
  • Microsoft SQL Server versions 6.0, 6.5


SUMMARY

A sort order is a set of rules that determines how SQL Server collates and presents data in response to database queries. The sort order determines the order in which data is presented in response to SQL Server statements involving GROUP BY, ORDER BY, and DISTINCT clauses. The sort order also determines how certain queries are resolved, such as queries involving WHERE and DISTINCT clauses.

During installation, SQL Server allows you to select a character set and a sort order that the server will use. Several sort orders and character sets are available. The default character set is ANSI (ISO 8859-1), and the default sort order is dictionary order, case-insensitive.

If a case-insensitive sort order is chosen, SQL Server considers character values that differ only in case as equal in value. That is, 'John' = 'john'. With a case-insensitive sort order, it may often be necessary to obtain query results with case-sensitivity considered. This article explains techniques to accomplish this.


MORE INFORMATION

To examine the ramifications of a case-insensitive sort order on query results, consider the following scenario:


   create table T1(col1 char(5) NOT NULL, col2 int NOT NULL)
   go
   insert T1 values('John',1)
   insert T1 values('John',1)
   insert T1 values('JOhn',1)
   insert T1 values('JOhn',1)
   insert T1 values('JoHn',1)
   insert T1 values('JoHn',1)
   insert T1 values('John',1)
   insert T1 values('JohN',1)
   insert T1 values('JohN',1)
   insert T1 values('john',1)
   insert T1 values('john',1)
   insert T1 values('john',1)
   go 

Executing the following query on a server with a case-insensitive sort order returns the following results:

   select DISTINCT(col1) from T1
   go 


   col1
   -----
   John

   (1 row(s) affected) 

However, executing the same query on a server with a case-sensitive sort order returns the following results:

   col1
   -----
   John
   JOhn
   JoHn
   JohN
   john

   (5 row(s) affected) 

Regarding the usage of GROUP BY, ORDER BY, and WHERE clauses, observe the results of these queries that are run on a server with a case-insensitive sort order, and compare them with the modified queries below. Note that in all cases, all of the values in col1 are treated as one distinct value:

   SELECT sum(col2)'col2 Group Totals' from T1 group by col1
   go 

   col2 Group Totals
   -----------------
   12

   (1 row(s) affected) 


   SELECT * from T1 order by col1
   go 

   col1  col2
   ----- -----------
   John  1
   John  1
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   John  1
   JohN  1
   JohN  1
   john  1
   john  1
   john  1

   (12 row(s) affected) 


   SELECT * from T1 where col1='john'
   go 

   col1  col2
   ----- -----------
   John  1
   John  1
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   John  1
   JohN  1
   JohN  1
   john  1
   john  1
   john  1

   (12 row(s) affected) 

The ability to obtain case-sensitive results with the DISTINCT function and the GROUP BY, ORDER BY, and WHERE clauses relies on the fact that the actual data is stored in binary form, with each character represented by a unique hexadecimal value. You can use the CONVERT function to convert the data from character format to a string of hexadecimal values. The results of the conversion can be operated on by the DISTINCT function and the GROUP BY, ORDER BY, and WHERE clauses without regard to case-sensitivity.

Consider the results of the following query, which displays the original character data and its corresponding hexadecimal representation:

   select col1,'=',CONVERT(binary(5),col1)'hex values' from T1 

   col1    hex values
   ----- - ------------
   John  = 0x4a6f686e20
   John  = 0x4a6f686e20
   JOhn  = 0x4a4f686e20
   John  = 0x4a4f686e20
   JoHn  = 0x4a6f486e20
   JoHn  = 0x4a6f486e20
   John  = 0x4a6f686e20
   JohN  = 0x4a6f684e20
   JohN  = 0x4a6f684e20
   john  = 0x6a6f686e20
   john  = 0x6a6f686e20
   john  = 0x6a6f686e20

   (12 row(s) affected) 

Note that in the results, each unique value of col1 has a distinct hexadecimal representation. To get the distinct values in character format, run the next set of statements. It produces the same results as a query using the DISTINCT function on a case-sensitive server.

   select DISTINCT(CONVERT(binary(5),col1))'col1'
   into #t1
   from T1
   go
   select CONVERT(char(5),col1)'distinct character results' from #t1
   go

   (5 row(s) affected) 

   -----
   JOhn
   JoHn
   JohN
   John
   john

   (5 row(s) affected) 

For queries using GROUP BY and/or ORDER BY clauses, consider the following examples:

   select DISTINCT(CONVERT(binary(5),col1))'col1',sum(col2)'col2'
   into #t1
   from T1
   group by CONVERT(binary(5),col1)
   go
   select CONVERT(char(5),col1)'distinct character results',col2'Group
   Totals'
   from #t1
   go 

   (5 row(s) affected)

   distinct character results Group Totals
   -------------------------- ------------
   JOhn                       2
   JoHn                       2
   JohN                       2
   John                       3
   john                       3

   (5 row(s) affected) 


   SELECT *
   from T1
   order by CONVERT(binary(5),col1)
   go 

   col1  col2
   ----- -----------
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   JohN  1
   JohN  1
   John  1
   John  1
   John  1
   john  1
   john  1
   john  1

   (12 row(s) affected) 

For a query using the WHERE clause, use the CONVERT function to convert the constant and/or column values to comparable hexadecimal strings, as in the following example:

   SELECT *
   from T1
   where CONVERT(binary(5),col1)=CONVERT(binary(5),'john ')
   go 

   col1  col2
   ----- -----------
   john  1
   john  1
   john  1

   (3 row(s) affected) 

It is important to understand that the results returned from queries can be dramatically affected by the choice of sort order. Due consideration must be given to the importance of case-sensitivity when planning the installation and configuration of SQL Server. For further information on this topic, see "Character Sets And Sort Orders" in the SQL Server Books Online and the Administrator's Companion.

Additional query words: hex

Keywords : kbusage SSrvGen
Version : WINDOWS NT:6.0 6.5
Platform : winnt
Issue type : kbhowto kbinfo


Last Reviewed: April 16, 1999
© 2000 Microsoft Corporation. All rights reserved. Terms of Use.