How Joins Are Processed

Knowing how joins are processed will help you to understand them and to figure out why incorrectly stated joins can give unexpected results. This section explains conceptually how joins are processed; the actual SQL Server procedure is more sophisticated.

The first step in processing a join is to form the Cartesian product of the tables, all the possible combinations of the rows from each of the tables. The number of rows in a Cartesian product of two tables is equal to the number of rows in the first table times the number of rows in the second table.

The Cartesian product of the authors table and the publishers table is 184 (23*8) (23 authors multiplied by 8 publishers). You can see the Cartesian product with any query that includes columns from more than one table in the select list, more than one table in the FROM clause, and no WHERE clause. For example, if you omit the WHERE clause from the join of the authors and publishers tables, SQL Server combines each of the 23 authors with each of the 8 publishers and returns all 184 rows. But this Cartesian product usually does not contain any particularly useful information (unless you are explicitly looking to see every possible combination). In fact, it is misleading, since it seems to imply that every author in the database has a relationship with every publisher in the database ¾ which is not true.

That's why a join must include a WHERE clause, which specifies the columns to be matched and the basis on which to match them. (It can also include other restrictions.) Once the Cartesian product has been formed, the rows that do not satisfy the join are eliminated, using the conditions in the WHERE clause. For example, the WHERE clause, WHERE authors.city = publishers.city, eliminates from the results all rows in which the author's city is not the same as the publisher's city.