UNION

Removing Duplicates

Consider UNION(S1, S2) where:

S1 = {Kansas, Buffalo, Topeka, USA, Canada}

S2 = {Kansas, Topeka, Wichita, Canada}

  1. As shown in “Literal Sets,” make two tables — S1 and S2 — that contain the fully qualified member names in the sets S1 and S2 respectively.

  2. The following query does the union:
    SELECT Name, Newrank AS Rank
    FROM 
             (((SELECT Name, "1" AS Src, Rank FROM S1
             UNION ALL
             SELECT Name, "2" AS Src, Rank FROM S2)
          RANKTOP 1 ROWS RANKORDER BY Src, Rank RANKGROUP BY Name)
       RANK ROWS AS NewRank RANKORDER BY Src, Rank)
    ORDER BY Rank
    

    In English, the query:

    1. Makes a union of S1 and S2, with the Src column tagging which set a row came from.

    2. The RankTop operation groups duplicates into separate groups by using the GROUP BY clause and orders each group by Src, Rank, and picks up the first one. This ensures that duplicates are eliminated from the tail.

    3. The Rank orders the filtered rows by Src and Rank and assigns a new rank to each row so that input ordering is preserved.

Retaining Duplicates

Consider UNION(S1, S2, ALL). As in the previous section, tables are generated for S1 and S2. Then you use the following query:

SELECT Name, Newrank AS Rank
FROM 
      ((SELECT Name, "1" AS Src, Rank FROM S1
      UNION ALL
      SELECT Name, "2" AS Src, Rank FROM S2)
   RANK ROWS AS NewRank RANKORDER BY Src, Rank)
ORDER BY Rank

This is simpler because you need not remove duplicates. After the union, the Rank operation generates a rank value for each of the rows of the table joined by union. The outer SELECT emits this table, ordering it by the NewRank (renamed as Rank) column.

Generalizing for Sets with Arbitrary Dimensionality

Each tuple has more than one element for these sets. Replace the Name column in the select list of each SELECT statement above by a list of columns (which is equivalent to the number of dimensions in the input sets).