Using Summary Queries

You can summarize data by using the GROUP BY and HAVING clauses. GROUP BY organizes data into groups; HAVING sets conditions on which groups to include in the results. These clauses should be used together; HAVING used without GROUP BY can produce confusing results.

Aggregate functions return summary values, either for the whole table or for groups within the table. For this reason they are used with GROUP BY. Aggregate functions can appear in a select list or in a HAVING clause, but not in a WHERE clause.

You can use the following aggregate functions with GROUP BY (expression in this table is almost always a column name):

Aggregate function Result
SUM([ALL | DISTINCT] expression) Total of the values in the numeric expression, either all or distinct
AVG([ALL | DISTINCT] expression) Average of the values in the numeric expression, either all or distinct
COUNT([ALL | DISTINCT] expression) Number of values in the expression, either all or distinct
COUNT(*) Number of selected rows
MAX(expression) Highest value in the expression
MIN(expression) Lowest value in the expression

SUM and AVG work only with numeric values. SUM, AVG, COUNT, MAX, and MIN ignore null values, but COUNT(*) does not.

Note You cannot use GROUP BY or HAVING on columns that have the text or image datatype.

The following query finds the total year-to-date sales of each publisher in the database:

SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
pub_id
total

------
-----

0736
28286

0877
44219

1389
24941




(3 row(s) affected)


Because of the GROUP BY clause, only one row is returned for each publisher, and it contains the sum of all sales for that publisher.

Important The exact results of a GROUP BY clause depend on the sort order chosen during installation. For information about the effects of different sort orders, see Microsoft SQL Server Setup.

The HAVING clause sets conditions on the GROUP BY clause similarly to the way the WHERE clause interacts with the SELECT clause. The HAVING clause syntax is exactly like the WHERE clause syntax except that it can contain aggregate functions.

If, for example, you want to see the publishers who have had year-to-date sales greater than $40,000, use this query:

SELECT pud_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING SUM(ytd_sales) > 40000

If, for example, you want to make sure that there are at least six books involved in the calculations for each publisher, use HAVING COUNT(*) > 5 to eliminate the publishers that return totals for fewer than six books. The query looks like this:

SELECT pub_id, total = SUM(ytd_sales)
FROM titles
GROUP BY pub_id
HAVING COUNT(*) > 5
pub_id
total
------
-----
0877
44219
1389
24941



(2 row(s) affected)

With this statement, two rows are returned. New Moon Books (0736) is eliminated.

For a detailed description of GROUP BY, HAVING, and aggregate functions, see Displaying Totals with GROUP BY or COMPUTE.