January 2000

SQL Service

Create Powerful Queries

Combining SQL Server's aggregate functions appropriately can help you create powerful, complex queries.

by Dianne Siebold

Reprinted with permission from Visual Basic Programmer's Journal, January 2000, Volume 10, Issue 1, Copyright 2000, Fawcette Technical Publications, Palo Alto, CA, USA. To subscribe, call 1-800-848-5523, 650-833-7100, visit www.vbpj.com, or visit The Development Exchange.

SQL Server includes ten simple but powerful aggregate functions that prove particularly useful in reports that require summary information (see Table 1). However, the power of these functions derives not from their individual functionality, which is modest if considered in isolation, but from the way you can combine them to create complex queries.

What you need:

Microsoft SQL Server 7.0

Northwind Database

In this column, I'll explain what each of SQL Server's aggregate functions does, as well as how to use each one. All the examples use the SQL Server Northwind database, so you can cut and paste any of the code and test it yourself. The ten aggregate functions include six general functions and four statistical functions (see Table 1). These functions are simple in themselves, but how you implement them and how you handle nulls and distinct values can vary your results.

Before diving into the functions themselves, let's go over the language you use to implement them. You use Structured Query Language (SQL) to communicate with a SQL Server database; Transact-SQL (T-SQL) is Microsoft's version of SQL. For example, you might use SQL commands to communicate with a database through an application, perhaps using ActiveX Data Objects (ADO) to send a SQL statement to query a database. Or, you might use T-SQL interactively by calling a stored procedure from inside SQL Server's Query Analyzer. T-SQL is a full-featured programming language, and its syntax contains the standard SELECT, INSERT, UPDATE, and DELETE commands. This language also includes conditional logic statements and syntax to create cursors. T-SQL's functions fall into one of three categories: rowset, aggregate, and scalar (see Table 2). I'll concentrate solely on aggregate functions in this article.

Aggregate functions have two characteristics: They operate on multiple values or rows, and they return a single summarizing value. For example, the AVG function takes the values from a column and returns the mean value. You can use aggregate functions in a COMPUTE BY clause, a HAVING clause, or in the return list of a SELECT statement. You cannot use aggregate functions in a WHERE clause, although you can create a WHERE clause to restrict the number of rows the function acts upon. The AVG function returns a mean value in a standard SELECT statement. For example, this code returns the average UnitPrice of all records in the Products table:

SELECT AVG(UnitPrice)

FROM Products

This function includes all rows in its computation by default, but the DISTINCT keyword limits the values AVG acts on to only the unique instances of a value. For example, this query returns the average price of all distinct beverages in the Products table:

SELECT AVG(DISTINCT UnitPrice)

FROM Products

WHERE CategoryID = 1

---------------------

48.4687

This query returns the mean price of all beverages in the Products table:

SELECT AVG(UnitPrice)

FROM Products

WHERE CategoryID = 1

---------------------

37.9791

DISTINCT Can Alter Function Behavior

Using DISTINCT in the first SELECT statement changes the nature of the query completely. Also, some functions don't support DISTINCT when you use the CUBE and ROLLUP commands. You can use the AVG function only on numeric columns; these columns have a defined datatype of int, smallint, tinyint, decimal, float, real, numeric, money, or smallmoney. The AVG function's returned datatype value depends on the expression's datatype. For example, any integer expression returns a value of type int. But you get a different result if the expression contains two datatype values:

SELECT AVG(UnitPrice * Quantity)

FROM [Order Details]

---------------------

628.5190

The UnitPrice column has a datatype of money, while the Quantity column has a datatype of smallint. SQL Server follows the rules of data-type precedence in this case. Datatype precedence converts the second value to the value type with the greatest precedence (see Table 3). Thus, the return value is of type money in this example.

Aggregate functions become more powerful when you use them with the GROUP BY clause, because you can employ them to return multiple summary figures. The GROUP BY clause restricts the rows returned in a resultset, permitting only a single row to be returned for each group value. Consider another query, where you group the Order Details table by OrderID, so only a single row is returned for each OrderID. The aggregate functions provide a summary for each group. This query returns the OrderID and the average order amount for each order:

SELECT OrderID, 'OrderAvgPrice' =

AVG(UnitPrice * Quantity)

FROM [Order Details]

GROUP BY OrderID

OrderIDOrderAvgPrice

--------  -------------

10248146.6666

10249931.7000

10250604.3333

10251223.6000

102521243.3333

Note that you must calculate the order amount for each order from the UnitPrice and Quantity columns before you average the order amount.

Another consideration when using T-SQL functions is how they handle columns with nulls in them. You might expect that a function returns null if the expression column contains null values, but all aggregate functions except for COUNT ignore nulls. This is a tricky issue. If you use an aggregate function with the results from the COUNT function, your results might vary because of the way they each handle nulls.

The AVG function serves as a good model for other aggregate functions. For example, you implement T-SQL's MAX, MIN, and SUM functions much as you do AVG. MAX returns the greatest column value in a range of records:

SELECT MAX(Quantity)

FROM [Order Details]

------

You can use the DISTINCT keyword with MAX, but Microsoft left this keyword in only for backward compatibility; it doesn't change how the MAX function works. You can use MAX with columns of any datatype except bit, text, and image. Using MAX with character columns prompts it to find the greatest value alphabetically. For example, MAX returns def if you have the values abc and def. Note that this function is case insensitive.

Return Smallest Column Value

Conversely, MIN returns the smallest column value in a range of records. You can use it with numeric, character, or datetime columns, and it also ignores nulls. The SUM function returns a total of all the expression values. Specifying DISTINCT causes SUM to return the total of all unique expression values. Unlike MAX and MIN, SUM works only with numeric columns. You can use SUM and a table join to return totals for each customer (see Listing 1).

You can use all four functions discussed so far together. For example, you can create a query that finds the largest order amount, the smallest order amount, and the average order amount for each employee (see Listing 2).

The COUNT function behaves slightly differently than other aggregate functions. The COUNT function evaluates a given expression and returns the number of items in a group. You can use COUNT in several different ways. For example, using COUNT(*) returns all the items in a group; unlike other aggregate functions, it includes null values and duplicate values. COUNT(*) doesn't take an expression because it returns all rows in a table that match your criteria:

SELECT COUNT(*)

FROM Orders

-----------

SELECT COUNT(*)

FROM Orders

WHERE EmployeeID = 4

-----------

You can use the DISTINCT keyword with COUNT to return the number of records that have non-null, unique values. For example, this query returns the number of unique EmployeeIDs in the Orders table:

SELECT COUNT(DISTINCT EmployeeID)

FROM Orders

-----------

You can also combine the COUNT function with other aggregate functions in this manner:

SELECT COUNT(*) AS Orders, SUM(Freight) AS

FreightTotal

FROM Orders

WHERE CustomerID = 'ALFKI'

OrdersFreightTotal

------    ----------

6225.5800

Like other aggregate functions, COUNT can be used with the GROUP BY clause:

SELECT ShipCity, COUNT(*) AS OrderCount

FROM Orders

GROUP BY ShipCity

ORDER BY ShipCity

ShipCityOrderCount

-----------  ----------

NULL1

Aachen6

Albuquerque18

Anchorage10

Århus11

Barcelona5

The aggregate functions for returning statistical standard deviations are STDEV and STDEVP. Standard deviation is the average amount that a number varies from the average number in a series of numbers. STDEVP returns the standard deviation for all a given population's expression values. You can use both functions only with numeric columns.

The other two statistical functions, VAR and VARP, return the variance of all expression values and the variance for all a given population's expression values, respectively. A variance is the square of a standard deviation.

You can use all aggregate functions in stored procedures because you write them in T-SQL. The same holds true for triggers, a type of stored procedure that executes when you perform an INSERT, DELETE, or UPDATE operation on a table. For example, you might want to restrict the value in the Freight column so it's always greater than the average freight amount when the user updates the table:

CREATE TRIGGER TU_Orders ON [Orders]

FOR UPDATE

AS

declare @ValidCount int

--Check to see if this col is updated

if update(Freight)

begin

--Get a count of records

select @ValidCount = COUNT(*)

from inserted

--where the new value is greater

--than the table average

where inserted.freight >

(select avg(freight) from Orders)

--If the new value isn't greater

--than the average, roll back

if @ValidCount <>1

begin

rollback transaction

end

end

Figure 1 Fine-Tune Your Queries. Click here.

One thing remains: You might want to run the Index Tuning Wizard if you have complex queries that use aggregate functions or return large resultsets. Run this wizard by opening the query in the Query Analyzer and selecting Query, then Perform Index Analysis. The wizard suggests table indexes that can optimize a query's performance. Of course, it's a judgment call as to whether you should go ahead and create the index, because it might affect other indexes or queries. For example, compare this query's performance when run with and without indexes (see Figure 1):

SELECT ShipCity, COUNT(*) AS OrderCount

FROM Orders

GROUP BY ShipCity

ORDER BY ShipCity

For example, compare this query's performance when run with and without indexes (see Figure 1).

About the Author: Dianne Siebold is a consultant and author specializing in Visual Basic and SQL Server programming. Reach her by e-mail at dsiebold@earthlink.net.