Operators on Tables

[This is preliminary documentation and subject to change.]

DBOP_alias
The alias facilities for a table. DBOP_alias takes one mandatory and one optional inputs. The first is a table-valued subtree, often a table_name, representing the table being aliased. The alias name for the table is provided in the pwszValue of the DBOP_alias command. A second optional input is a column_list_anchor containing column elements representing aliases to be assigned to the columns of the input table. SQL examples: "SELECT a.* FROM table AS a" and "SELECT x, z FROM table AS a (x, y, z)." The output table is of the same table type as the input table. The new name(s) replaces the old name(s) for the output of this operator. If the second input is provided, then the size of the column_list must be the same as the number of columns in the first input.
DBOP_cross_join
Represents the Cartesian product of two tables, i.e., SQL's cross join. This operator takes two tables T1 and T2 as input and produces a table R. The columns of R are all the columns of T1 and T2. Let t1, t2, and r denote a row in T1, T2, and R, respectively. Let r[Ti] denote the projection of row r on the columns of Ti, i=1,2. Each row r in R is formed by taking each row t1 in T1 and concatenating it with each row t2 in T2, so that r[T1]=t1 and r[T2]=t2.The cardinality of R is the product of the cardinalities of T1 and T2. R is a unique table if both T1 and T2 are unique tables; otherwise, it is a table.
DBOP_union_join
Represents SQL's union join. This operator takes two tables T1 and T2 as input and produces a table R. The columns of R are all the columns of T1 and T2. Let ti_null denote a row of Ti, i=1,2, containing NULL values in every column. Each row r in R is formed by taking every row from T1 and T2 as follows: (a) r[T1]= t in T1 and r[T2] = t2_null, or (b) r[T1] = t1_null, and r[T2] = t in T2. The cardinality of R is the sum of the cardinalities of T1 and T2.

The output is a unique table if both inputs are unique tables; otherwise, it is a table.

DBOP_inner_join, DBOP_left_semi_join, DBOP_right_semi_join, DBOP_left_anti_semi_join, DBOP_right_anti_semi_join, DBOP_left_outer_join, DBOP_right_outer_join, DBOP_full_outer_join
All join operators take two required table inputs, T1 and T2, and produce a table R as output. A third required input is a Boolean-valued expression specifying the join condition (i.e, SQL's ON or WHERE clause). The columns of R are specified as follows:

The rows in R are specified as follows:

The output R is a table in general; it is a unique table (1) for inner join and all outer joins, if both inputs are unique tables, (2) for left semi-join and left anti-semi-join, if the left input is a unique table, and (3) for right semi-join and right anti-semi-join, if the right input is a unique table.

DBOP_natural_join, DBOP_natural_left_outer_join, DBOP_natural_right_outer_join, DBOP_natural_full_outer_join
All natural join operators take two required table inputs, T1 and T2, and produce a table R as output. A third optional input is a column_list specifying the columns on which to test the join predicate (i.e, SQL's USING clause). Natural joins are different to the join operations described above in that they have an implicit projection and an implicit join condition. Let CC represent the columns that are common to both T1 and T2, T1C represent the columns of T1 not in CC, and T2C represent the columns of T2 not in CC. The columns of R are T1C, CC, and T2C. Producing the set of columns CC only once in R represents the implied projection. The implied join condition is the conjunction of terms T1.Ci = T2.Ci, for every Ci in CC. If the column_list input of these operators is included, then the join condition is restricted to the terms involving only the columns in the column list, which can only reference columns in CC. Other than the implied projection and join condition, the semantics of the natural join operators is exactly as their non-natural counterparts. That is, the cardinality of R is exactly the cardinality of the non-natural join operators. If CC is the empty set, then the implied join predicate is TRUE and R contains the cross join of T1 and T2.
DBOP_set_intersection, DBOP_set_union, DBOP_set_left_difference, DBOP_set_right_difference, DBOP_set_anti_difference
Set operations, requiring "union compatible" input tables. These operators take two required inputs T1 and T2 and produce a result table R. Let A1, A2,..., An denote the columns of T1, and B1, B2,..., Bm denote the columns of T2. T1 and T2 are union compatible if: (a) n=m, and (b) the types of Ai and Bi, i=1,..,n, are compatible. The rules determining when the types of two columns are compatible and what the type of the resulting columns will be are provider-specific. If Ai,Bi, have the same name, that name is propagated to the output; otherwise, the resulting name is provider-specific. Left-difference is the usual difference operator; right-difference is the opposite (T1 right-difference T2 == T2 left-difference T1). The anti-difference is the union of the left- and right-differences. For SQL's "union corresponding," the tree node requires a third optional input, which must be a non-empty column_list_anchor. If the third input is present, the input tables are projected on the columns named in that list before the set operation proceeds. If the third input is not included and the Boolean-valued field fValue of the DBCOMMANDTREE node is set to TRUE, then the command processor decides the maximal set of columns from the two inputs that are union-compatible and should therefore be preserved. The output is a unique table.
DBOP_bag_intersection, DBOP_bag_union, DBOP_bag_left_difference, DBOP_bag_right_difference, DBOP_bag_anti_difference
Multi-set (bag) operations, requiring "union compatible" input tables. The number of occurrences of a row in the output table is the minimum (intersection), sum (union), or difference respectively of that row's number of occurrences in the two input tables. Variants are as for set operations. DBOP_bag_union represents SQL's UNION ALL operator. Two or more table inputs, the output is a table.

Given table A with 2 occurrences of row X and table B with 1 occurrence of row X, then:

DBOP_division
Represents the relational division operator. This operator takes two required table inputs T1 and T2, and produces a unique table R as result. The first input, T1, is the dividend table, the second input, T2, is the divisor table. The set of columns of T2 must be a subset of the columns of T1. Let CC be the columns that are common to T1 and T2, and T1C be the columns in T1 not in T2. Table R has all the columns in T1C. Let t1 denote a row in T1 and t2 a row in T2. The result R contains the set of rows r such that for every row t1 in T1, there is a row t2 in T2 with r=t1[T1C] and t1[T2]= t2.
DBOP_relative_sampling
A random sampling of rows from a table based on a reduction factor (given in percent, must be an integer between 0 and 100). Randomness is guaranteed to the provider's ability. The only required input is a table. The integer between 0 and 100 is represented as an integer in the ulValue field of the DBCOMMANDTREE node. If the input table is unique, so is the output table; otherwise, the output is a table.
DBOP_absolute_sampling
A random sampling of rows from a table with a known, fixed output cardinality. The output cardinality is given as a positive integer stored in the ulValue field of the DBCOMMANDTREE node (values larger than the input cardinality raise an error). Randomness is guaranteed to the provider's ability. The only required input is a table. If the input table is unique, so is the output table; otherwise, the output is a table.
DBOP_transitive_closure
The transitive closure operator performs recursive computations that involve traversal of all paths in a directed graph starting from a set of initial nodes as defined by Agrawal [IEEE TSE, 14:7, 1988]. Suppose the input table T is of the form T(K1, K2, P1, P2, ..., Pm), where K1 and K2 are columns belonging to the same domain. K1 and K2 can consist of more than one, but an equal number of columns. T can be represented as a directed graph in which each row t in T is represented as an link (arc) from node t[K1] to node t[K2]. The values t[Pi], i=1,...,m, represent link properties. Semantically, the transitive_closure operator computes a table representing all paths in that graph as the following iterative program:
let R = T;
do
  let R = select * from T
          union
          select R.K1, T.K2 from R inner join T on R.K2 = T.K1;
while "R changes";
 

The operator takes 4 mandatory inputs. First, a table T representing directed links in a graph. Second, a join predicate to determine when a link and a path can be concatenated (e.g., R.K2=T.K1). Third, a project_list representing the columns to be preserved after such a concatenation, each column may be the result of a scalar computation (including aggregate functions). In the above program, it is represented by R.K1, T.K2, but it may also include Pi columns from R and T. Fourth, a predicate to select result tuples based on their derivation history (see below). A fifth optional input represents a selection condition on the initial table. The derivation path that led to the existence of a row in the transitive_closure of T is captured in an additional chaptered column, called Delta, that gets implicitly attached to each result row. Thus, the operator's output is a hierarchical table R, having the same columns as T plus the chaptered column. For example, if a relation T is defined as:
T K1 K2
a b
b c
c d

The result R of the transitive closure of T would be:
R K1 K2 Delta
A b {<a,b>}
B c {<b,c>}
C d {<c,d>}
A c {<a,b>, <b,c>}
B d {<b,c>, <c,d>}
A d {<a,b>, <b,c>, <c,d>}

Thus, the fourth input to this operator is a predicate involving Delta.

DBOP_recursive_union
As required by SQL-3 (but not SQL-2), the recursive union starts with an input table and adds to that table (in the result, not in the stored database) rows that can be computed by a predicate from rows already in the table. When viewed as a computation on graphs, this operation starts with a set of nodes and adds nodes reachable by one or more steps in the graph. The starting points are specified in the input table, and the steps are specified by the predicate. Alternatively, the starting table's rows can represent a set of arcs in a graph, and the predicate can express how links connect and can be combined into new links (paths).

 

DBOP_aggregate
This operator combines grouping of rows from an input table T into equivalence classes, and aggregate function computation within each group to produce a table R. The operator first builds equivalence classes based on equal values in a set of (grouping) expressions (handling NULLs as required in SQL), then collapses each equivalence class into a single row by computing an aggregate function on the rows in the class. The operator takes 3 required inputs. The first input is a table. The second is a list of grouping expressions represented by a project_list_anchor (as for the "project" operator). The grouping expressions form the unique key for the output table. If the list of grouping expressions is empty, the entire input is presumed to be a single equivalence class and the output is a table with a single row. The list of grouping expressions may include all columns of the input table. The aggregation expressions, third input, are passed as a second project_list_anchor (similar to "project" operator), with the requirement that references to input columns other than those in the grouping list must be nested inside of aggregate functions (e.g., "sum"). Note that both lists are "project" lists, thus permitting arithmetic and assignment of new column names (expressions in the grouping list require a name!). The new name(s) replaces the old name(s) for the output of this operator. The table R contains all the columns mentioned in the two project lists. The output is a unique table.