INF: Using the "*=" and "=*" (Outer Join) Operators

ID Number: Q67753

1.00 1.10 1.11 4.20

OS/2

Summary:

An "outer join" is defined as:

A join that includes all rows from the outer table, regardless of

whether there is a matching row in the inner table. If there is

no matching row in the inner table, or if the join condition is not

met, the second table generates a "NULL" result as a match for that

row.

The outer join operators are "*=" and "=*". The "*=" operator

indicates that the first table is the outer table, and the second

table is the inner table. The "=*" operator indicates the reverse: the

first table is the inner table, and the second table is the outer

table. In any join, restrictions other than the join condition can be

placed on the outer table, but no restrictions beyond the join

condition can be placed on the inner table.

Example

-------

Consider the following operations on these two simple tables:

Table1 Table2

---------------------- --------------------------

| Col1 | Col2 | | Col3 | Col4 |

---------------------- --------------------------

1 | 1 1 | 11

2 | 2 3 | 13

3 | 3

4 | 4

A simple outer join that tests from equality between Table1.Col1 and

Table2.Col3, designating Table1 as the outer table, and Table2 as the

inner table, would look like the following:

SELECT * FROM TABLE1, TABLE2

WHERE COL1 *= COL3

The result set from this operation will be:

Col1 Col2 Col3 Col4

---------------------------------

1 | 1 | 1 | 11

2 | 2 | NULL | NULL

3 | 3 | 3 | 13

4 | 4 | NULL | NULL

Note that rows 2 and 4 from Table1 are "joined" with NULL, due to the

fact that there where no matching rows in Table2.

Reversing the direction of the join produces a different result set,

and no rows joined with "NULL":

SELECT * FROM TABLE1, TABLE1

WHERE COL1 =* COL3

The following result set is produced:

Col1 Col2 Col3 Col4

---------------------------------

1 | 1 | 1 | 11

3 | 3 | 3 | 13

The behavior of outer joins is fairly straightforward, as long as no

restrictions beyond the outer join restriction is placed on the inner

table. When restrictions are placed on the inner table of an outer

join, the results may appear unpredictable. Actually, they are not

unpredictable (within the context of a single implementation of SQL),

and conform exactly to the rules governing outer joins. Consider the

first example, with an additional restriction:

SELECT * FROM TABLE1, TABLE2

WHERE COL1 *= COL3

AND COL4 = NULL

Considering that the result set without the additional restriction is

as follows

Col1 Col2 Col3 Col4

---------------------------------

1 | 1 | 1 | 11

2 | 2 | NULL | NULL

3 | 3 | 3 | 13

4 | 4 | NULL | NULL

it seems logical that the restricted result set would be the following

Col1 Col2 Col3 Col4

---------------------------------

2 | 2 | NULL | NULL

4 | 4 | NULL | NULL

(and it is, in some implementations of SQL; see "Further Explanation,"

below).

However, the definition of an outer join specifies that ALL rows from

the outer table will appear in the result set, so the result set is

actually:

Col1 Col2 Col3 Col4

---------------------------------

1 | 1 | 1 | NULL

2 | 2 | NULL | NULL

3 | 3 | 3 | NULL

4 | 4 | NULL | NULL

Further Explanation

-------------------

Using additional restrictions on the inner table of an outer join is

considered "undefined" behavior. All SQL engines will handle this

syntax in a slightly different manner, and the behavior is only

predictable with respect to a particular implementation. In the case

of Microsoft SQL Server, the additional restrictions are actually

incorporated into the base select statement generating the result set.

Therefore, if you impose an additional restriction that creates a

condition that is met, it will be displayed in the result set. For

example, if you change the last example to test for the value "13" in

column 4, rather than NULL, the result set will be:

Col1 Col2 Col3 Col4

---------------------------------

1 | 1 | 1 | NULL

2 | 2 | NULL | NULL

3 | 3 | 3 | 13

4 | 4 | NULL | NULL

The other popular implementation is to take a "temp result table" from

the join condition, and then apply the further restriction to it. This

result set violates the definition of an outer join, because not all

rows from the outer table will be displayed. However, since this is

undefined behavior according to the ANSI standard, both positions can

be defended. The result set that you get from the following test query

SELECT * FROM TABLE1, TABLE2

WHERE COL1 *= COL3

AND COL4 = NULL

is the following "intuitive" result set:

Col1 Col2 Col3 Col4

---------------------------------

2 | 2 | NULL | NULL

4 | 4 | NULL | NULL