INTERSECT

Removing Duplicates

Consider EXCEPT(S1, S2) where:

S1 = {Kansas, Buffalo, Buffalo, Topeka, Topeka, Wichita, Canada, BC}

S2 = {Buffalo, Topeka, Topeka, Canada, Quebec}

  1. As shown in "Literal Sets," make two tables — S1 and S2 — that contain the fully qualified member names in the sets S1 and S2, respectively.

  2. The following query does the intersect:
    SELECT Name, NewRank AS Rank
    FROM 
             (((SELECT Name, Rank FROM S1
             WHERE S1.Name IN (SELECT Name FROM S2))
          RANKTOP 1 LEVELS RANKORDER BY Rank RANKGROUP BY Name)
       RANK ROWS AS NewRank RANKORDER BY Rank)
    ORDER BY Rank
    

    In English, the query:

    1. Selects the rows of S1 that are also in S2.

    2. The RANKTOP operation groups duplicates into separate groups by using the GROUP BY clause and orders each group by Rank and picks up the first one. This ensures that duplicates are eliminated from the tail.

    3. The RANK operation orders the filtered rows by Rank and assigns a new rank to each row so that input ordering is preserved.

Note   You cannot use the INTERSECT operation of SQL-92 because duplicate elimination is not guaranteed to remove duplicates from the tail.

Retaining Duplicates

Consider INTERSECT(S1, S2, ALL). As in the previous section, tables are generated for S1 and S2. Then you use the following query:

SELECT Name, NewRank AS Rank
FROM 
      ((SELECT Name, Rank FROM S1
      WHERE S1.Name IN (SELECT Name FROM S2))
   RANK ROWS AS NewRank RANKORDER BY Rank)
ORDER BY Rank

This is simpler because there is no need to remove duplicates. After the query that finds the intersect, the RANK operation generates a rank value for each of the rows of the resulting table. The outer SELECT emits this table, ordering it by the NewRank (renamed as Rank) column.

Note   You cannot use INTERSECT ALL from SQL-92 because its semantics are as follows: If a row r appears n times in S1 and m times in S2, then r appears exactly p times in the result, where p is the lesser of  m and n. Moreover, the order of these p rows is not preserved.

Generalizing for Sets with Arbitrary Dimensionality

Each tuple has more than one element for these sets. Replace the NAME column in the select list of each SELECT statement above by a list of columns (equivalent to the number of dimensions in the input sets). Add as many IN clauses in the WHERE as there are columns, one for each column.