The NOT IN() Subquery Trap

Ron Talmage

Most people view issues associated with NULL as being about as exciting as the debate between two-valued and three-valued logic. However, not understanding how SQL Server's NOT IN() function handles NULL can bite you, so make sure you read Ron's article.

Danger lurks in your Transact-SQL (T-SQL) SELECT statements when you use the NOT IN() function with subqueries. Just a minor change to the data can dramatically affect a query's results; in particular, adding a single NULL to the data can cause a SELECT command to suddenly return no rows.

See for yourself

Let's use two familiar tables (Publishers and Titles) from the pubs database. Suppose we want a list of all publishers who don't have titles. An easy and natural way to get this is with the following SQL SELECT statement:

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN 

      (SELECT pub_id FROM Titles)

This yields the following (expected) results:

pub_id 

------ 

1622   

1756   

9901   

9952   

9999

But change a pub_id column to NULL in Titles, and the query returns no rows at all. To see this for yourself, execute the following batch in ISQL/W:

BEGIN TRANSACTION

PRINT '------- Before -------'

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN 

      (SELECT pub_id FROM Titles)

UPDATE Titles

SET pub_id = NULL

   WHERE title_id = 'BU2075'

PRINT '------- After -------'

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN 

      (SELECT pub_id FROM Titles)

ROLLBACK TRANSACTION

The preceding transaction executes the query, changes one of the Titles.pub_id values to NULL, and then re-executes the query. This way, you can see both a "before" and "after" image of the data. Then the ROLLBACK statement puts the data back into its original state. The results of running the batch are:

------- Before -------

pub_id 

------ 

1622   

1756   

9901   

9952   

9999   

------- After -------

pub_id 

------

One or more NULLs for pub_id in the Titles table causes the query to no longer return any rows! Of course, you can do this outside a transaction and vary the UPDATE statement, but the query results will be the same: no rows returned. Incidentally, this result occurs with ANSI_NULLS either ON or OFF. (Just a reminder -- any setting of ANSI_DEFAULTS also causes ANSI_NULLS to take the same setting. In the remainder of this article, I'll just refer to ANSI_NULLS.)

We've seen what IN with a subquery does; now what about the equivalent IN function with an explicit list of values? After NULL is added to the Titles table, the SELECT query will return three pub_id values ("1389", "0736", "0877"), plus NULL. We can substitute a list of these values in place of the subquery:

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN ('1389', '0736', '0877', NULL)

Executing this in pubs using ISQL/W results in:

pub_id 

------ 

1622   

1756   

9901   

9952   

9999

Now this is a surprise. Why does this equivalent version of the query return the correct rows? (This query does change behavior with ANSI_NULLS settings, and I'll explore why soon.) Let's summarize the problem as two issues:

The "NOT IN() Subquery Trap" -- Why does a NOT IN(subquery) suddenly return no rows when NULL is added to the subquery's results set?
The "NOT IN(list) Anomaly" -- Why does the NOT IN(list) return the correct rows, when a NOT IN(subquery) returns no rows?

Evaluating NOT IN() queries

To explore the NOT IN() Subquery Trap, let's find out how to evaluate NULL in a NOT IN() function. Recently, I asked SQL guru Joe Celko about this issue on his CompuServe CASEFORUM section. In a lively exchange, he showed me how to translate the ANSI SQL IN() function into a series of equality tests, and why NULL can have such a dramatic effect.

To evaluate an IN() function, we can translate it into a series of OR statements. For example, the original subquery without the NOT:

SELECT pub_id FROM Publishers

   WHERE pub_id IN ('1389', '0736', '0877')

becomes, using OR:

SELECT pub_id FROM Publishers

   WHERE (pub_id = '1389' OR pub_id = '0736' 

      OR pub_id = '0877')

To evaluate the NOT IN(list) subquery, we follow the same rules, just adding the NOT:

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN ('1389', '0736', '0877')

which becomes, using OR:

SELECT pub_id FROM Publishers

   WHERE NOT (pub_id = '1389' OR pub_id = '0736' 

      OR pub_id = '0877')

But this time, we can distribute the NOT into the grouping, changing OR to AND:

SELECT pub_id FROM Publishers

   WHERE (pub_id != '1389' AND pub_id != '0736' 

      AND pub_id != '0877')

Now what happens when we add NULL to the list?

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN ('1389', '0736', '0877', NULL)

becomes

SELECT pub_id FROM Publishers

   WHERE NOT(pub_id = '1389' OR pub_id = '0736' 

      OR pub_id = '0877' OR pub_id = NULL)

and then

SELECT pub_id FROM Publishers

   WHERE (pub_id != '1389' AND pub_id != '0736' 

      AND pub_id != '0877' AND pub_id != NULL)

How do we interpret "pub_id != NULL"? Clearly, this is the crux of the issue.

Into the unknown

In T-SQL, the behavior of "pub_id != NULL" changes depending on the setting of ANSI_NULLS. The following script shows the change. Try it in ISQL/W, and vary the setting of ANSI_NULLS from ON to OFF:

SET ANSI_NULLS ON

GO

DECLARE @x int 

SELECT @x = NULL

IF @x = NULL

   PRINT '@x=null is true'

ELSE

   IF NOT @x = NULL

      PRINT '@x=null is false'

   ELSE

      PRINT '@x=null is unknown'

The result, when ANSI_NULLS is ON, is:

@x=null is unknown

But the result, when ANSI_NULLS is OFF, is:

@x=null is true

When ANSI_NULLS is ON, NULL isn't equal to any value, including NULL. Nor can NULL be unequal to any value. So the expression "= NULL" takes a third logical value, "Unknown," which is neither True nor False. So does "!= NULL", because the negation of something Unknown is also Unknown. On the other hand, when ANSI_NULLS is OFF, "= NULL" behaves the same as "IS NULL" and evaluates to either True or False.

Well, back to the current question: How do we translate a query with NULL in the list? Recall the translated query:

SELECT pub_id FROM Publishers

   WHERE (pub_id != '1389' AND pub_id != '0736' 

      AND pub_id != '0877' AND pub_id != NULL)

Aha! See it? The last conjunction, "pub_id != NULL", will always be Unknown when ANSI_NULLS is ON. Now any expression that's either True or False becomes Unknown when ANDed with an Unknown, so the entire expression in the WHERE clause evaluates to Unknown. In ANSI SQL, there's one more rule to apply: A WHERE clause must evaluate to True in order to return any rows. Since the WHERE clause never evaluates to True in our example, the query will never return any rows!

What about the NOT IN(list) Anomaly? The reason that the NOT IN(list) didn't behave like the NOT IN(subquery) in our original example was that ANSI_NULLS was OFF. In fact, ANSI_NULLS defaults to OFF in ISQL/W. Let's try the IN(list) subquery again:

SET ANSI_NULLS ON

GO

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN ('1389', '0736', '0877', NULL)

Now the query returns no rows, just as the ANSI rules specify, and it matches its NOT IN(subquery) equivalent. With ANSI_NULLS ON, the NOT IN(list) Anomaly just disappears.

Resolving the problems

As to the first issue -- the NOT IN() Subquery Trap -- why does the NOT IN(subquery) suddenly return no rows when NULL is added to the subquery results set? Because the WHERE clause evaluates to an Unknown state. The presence of NULL in the subquery results set causes the WHERE clause to evaluate to an Unknown state, and therefore no rows are returned. This might be unexpected or unintuitive, but it's standard ANSI SQL, and Transact-SQL follows it.

As for the NOT IN(list) Anomaly, why does the NOT IN(list) return the correct rows, when a NOT IN(subquery) doesn't? We now have our answer -- we only missed it originally because ISQL/W defaults to ANSI_NULLS being OFF, and we didn't see that the NOT IN(list) also returns no rows when ANSI_NULLS are ON. It's a little troubling, though, to see that the NOT IN() function behaves consistently between subquery and list when ANSI NULLS are ON, but not when they're OFF.

Avoiding the NOT IN() subquery trap

How can we protect our NOT IN() queries against unexpected or unintuitive results caused by NULL? One way is to rewrite the original query and explicitly remove NULL from the subquery results set:

SELECT pub_id FROM Publishers

   WHERE pub_id NOT IN 

      (SELECT pub_id FROM Titles

         WHERE pub_id IS NOT NULL)

By adding the explicit "WHERE pub_id IS NOT NULL" clause in the subquery, we remove NULL from the result set of the subquery, and the WHERE clause evaluates to True or False. Now the outer query returns the expected rows, irrespective of NULL in the subquery results set.

A second, and perhaps better, way is to rewrite the query using NOT EXISTS, correlating the subquery to the outer table:

SELECT p.pub_id FROM Publishers p

   WHERE NOT EXISTS 

      (SELECT * FROM Titles t

         WHERE t.pub_id = p.pub_id)

Because the SELECT * subquery either returns some or no rows, the NOT EXISTS will always evaluate to either True or False, and never Unknown.

Interestingly, SQL Showplan reveals that both of the preceding queries resolve to the same result, namely a NOT EXISTS query. It seems that Transact-SQL actually translates IN() and NOT IN() subqueries into EXISTS and NOT EXISTS equivalents in order to execute them.

Bottom line

What lessons can we draw from the NOT IN() Subquery Trap? First, we need to make sure that a WHERE clause in a subquery can't accidentally evaluate to Unknown based on NULL in the results set. In the case of NOT IN(subquery), that rule implies we should explicitly exclude NULL from subquery results sets or use NOT EXISTS.

Next, we should keep ANSI_NULLS ON, especially in ISQL/W, because when ANSI_NULLS is ON, the NOT IN(subquery) and NOT IN(list) both treat NULL the same. But when ANSI NULLS is OFF, the NOT IN(list) treats "= NULL" the same as "IS NULL", and its results are no longer consistent with the NOT IN(subquery).

View sample code for this article here

Ron Talmage is a senior database technologist with ST Labs who specializes in Microsoft SQL Server database testing and instruction. He has over 15 years' experience in software design and programming, including eight years of teaching computing at the college level. He has developed database applications in real estate software, computer telephony, and public health research, and he also developed and currently teaches ST Labs' "Introduction to Microsoft SQL Server 6.5 for Testers" course. Ron is active in local database SIGs and is writing a book on SQL Server 7.0 for Prima Publishing. ron@stlabs.com, 70274.1224@compuserve.com.

Resources

·CompuServe CASEFORUM, Section 18. Celko on SQL, Messages 178221ff.
·For more information about three-valued logic in SQL, see C. J. Date's An Introduction to Database Systems, 6th Ed. (Addison-Wesley, 1995), Chapter 20.