INF: How to Obtain Case-Sensitive Query Results with a Case-Insensitive Sort Order
ID: Q171299
|
The information in this article applies to:
-
Microsoft SQL Server versions 6.0, 6.5
SUMMARY
A sort order is a set of rules that determines how SQL Server collates and
presents data in response to database queries. The sort order determines
the order in which data is presented in response to SQL Server statements
involving GROUP BY, ORDER BY, and DISTINCT clauses. The sort order also
determines how certain queries are resolved, such as queries involving
WHERE and DISTINCT clauses.
During installation, SQL Server allows you to select a character set and a
sort order that the server will use. Several sort orders and character sets
are available. The default character set is ANSI (ISO 8859-1), and the
default sort order is dictionary order, case-insensitive.
If a case-insensitive sort order is chosen, SQL Server considers character
values that differ only in case as equal in value. That is, 'John' =
'john'. With a case-insensitive sort order, it may often be necessary to
obtain query results with case-sensitivity considered. This article
explains techniques to accomplish this.
MORE INFORMATION
To examine the ramifications of a case-insensitive sort order on query
results, consider the following scenario:
create table T1(col1 char(5) NOT NULL, col2 int NOT NULL)
go
insert T1 values('John',1)
insert T1 values('John',1)
insert T1 values('JOhn',1)
insert T1 values('JOhn',1)
insert T1 values('JoHn',1)
insert T1 values('JoHn',1)
insert T1 values('John',1)
insert T1 values('JohN',1)
insert T1 values('JohN',1)
insert T1 values('john',1)
insert T1 values('john',1)
insert T1 values('john',1)
go
Executing the following query on a server with a case-insensitive sort
order returns the following results:
select DISTINCT(col1) from T1
go
col1
-----
John
(1 row(s) affected)
However, executing the same query on a server with a case-sensitive sort
order returns the following results:
col1
-----
John
JOhn
JoHn
JohN
john
(5 row(s) affected)
Regarding the usage of GROUP BY, ORDER BY, and WHERE clauses, observe the
results of these queries that are run on a server with a case-insensitive
sort order, and compare them with the modified queries below. Note that in
all cases, all of the values in col1 are treated as one distinct value:
SELECT sum(col2)'col2 Group Totals' from T1 group by col1
go
col2 Group Totals
-----------------
12
(1 row(s) affected)
SELECT * from T1 order by col1
go
col1 col2
----- -----------
John 1
John 1
JOhn 1
JOhn 1
JoHn 1
JoHn 1
John 1
JohN 1
JohN 1
john 1
john 1
john 1
(12 row(s) affected)
SELECT * from T1 where col1='john'
go
col1 col2
----- -----------
John 1
John 1
JOhn 1
JOhn 1
JoHn 1
JoHn 1
John 1
JohN 1
JohN 1
john 1
john 1
john 1
(12 row(s) affected)
The ability to obtain case-sensitive results with the DISTINCT function and
the GROUP BY, ORDER BY, and WHERE clauses relies on the fact that the
actual data is stored in binary form, with each character represented by a
unique hexadecimal value. You can use the CONVERT function to convert the
data from character format to a string of hexadecimal values. The results
of the conversion can be operated on by the DISTINCT function and the GROUP
BY, ORDER BY, and WHERE clauses without regard to case-sensitivity.
Consider the results of the following query, which displays the original
character data and its corresponding hexadecimal representation:
select col1,'=',CONVERT(binary(5),col1)'hex values' from T1
col1 hex values
----- - ------------
John = 0x4a6f686e20
John = 0x4a6f686e20
JOhn = 0x4a4f686e20
John = 0x4a4f686e20
JoHn = 0x4a6f486e20
JoHn = 0x4a6f486e20
John = 0x4a6f686e20
JohN = 0x4a6f684e20
JohN = 0x4a6f684e20
john = 0x6a6f686e20
john = 0x6a6f686e20
john = 0x6a6f686e20
(12 row(s) affected)
Note that in the results, each unique value of col1 has a distinct
hexadecimal representation. To get the distinct values in character format,
run the next set of statements. It produces the same results as a query
using the DISTINCT function on a case-sensitive server.
select DISTINCT(CONVERT(binary(5),col1))'col1'
into #t1
from T1
go
select CONVERT(char(5),col1)'distinct character results' from #t1
go
(5 row(s) affected)
-----
JOhn
JoHn
JohN
John
john
(5 row(s) affected)
For queries using GROUP BY and/or ORDER BY clauses, consider the following
examples:
select DISTINCT(CONVERT(binary(5),col1))'col1',sum(col2)'col2'
into #t1
from T1
group by CONVERT(binary(5),col1)
go
select CONVERT(char(5),col1)'distinct character results',col2'Group
Totals'
from #t1
go
(5 row(s) affected)
distinct character results Group Totals
-------------------------- ------------
JOhn 2
JoHn 2
JohN 2
John 3
john 3
(5 row(s) affected)
SELECT *
from T1
order by CONVERT(binary(5),col1)
go
col1 col2
----- -----------
JOhn 1
JOhn 1
JoHn 1
JoHn 1
JohN 1
JohN 1
John 1
John 1
John 1
john 1
john 1
john 1
(12 row(s) affected)
For a query using the WHERE clause, use the CONVERT function to convert the
constant and/or column values to comparable hexadecimal strings, as in the
following example:
SELECT *
from T1
where CONVERT(binary(5),col1)=CONVERT(binary(5),'john ')
go
col1 col2
----- -----------
john 1
john 1
john 1
(3 row(s) affected)
It is important to understand that the results returned from queries can be
dramatically affected by the choice of sort order. Due consideration must
be given to the importance of case-sensitivity when planning the
installation and configuration of SQL Server. For further information on
this topic, see "Character Sets And Sort Orders" in the SQL Server Books
Online and the Administrator's Companion.
Additional query words:
hex
Keywords : kbusage SSrvGen
Version : WINDOWS NT:6.0 6.5
Platform : winnt
Issue type : kbhowto kbinfo