Operators for Hierarchical Tables

[This is preliminary documentation and subject to change.]

DBOP_navigate
given two tables, one or both of which may be hierarchical tables, compute their join. however, instead of creating large rows with columns from both join inputs, attach to each row of the first table all matching rows from the second table, thus creating a hierarchy of rowsets. Also adds expression-valued columns to both the new parent and the new child rowset.

There are six inputs:

DBOP_nesting
Given a flat table, create a hierarchical table. First, classify the table's rows as in a grouping operation. For each resulting group, produce a single output row to become part of the parent table. This output row contains columns defined in the parent list plus one nested column. The nested column represents a sub-table, which contains columns defined in the child list. The grouping list is a project list, i.e., includes an column name for each element. The parent list is also a project list; its values are computed from the grouping list and aggregates over the input (same grouping). The child list is also a project list, whose values are computed from the input columns. In addition to these three lists, the inputs include an input table and a name for the column representing the sub-table.

given a hierarchical table, create an additional hierarchical column some of the columns. note that a sub-table already existing in the input may become part of the new top level or the new lower level. in the former case, set equality is required for two input rows to become members of the same group. in the latter case, multi-level nesting is created.

If the input to nesting is an ordered table, it does not affect the behavior of nesting. DBOP_nesting will produce what is specified by the semantics of nesting regardless of the order of the input. An optimizer might take the ordering into consideration when defining the execution strategy for DBOP_nesting to reduce the cost of the grouping, but to the consumer it does not make any difference.

the inputs to this operator are a flat or hierarchical input table, a column name for the nested column (sub-rowset) to be created (the alias name for child levels is mandatory and must be unique across the hierarchy), and the three lists of columns described above.

The order of the parent and child tables produced as output of DBOP_nesting is undefined. This allows for more flexibility of implementation by providers (e.g., the grouping can be done by hashing, and the computation of the chapters can be based on traversing some index or computing some parameterized query.

NULLs are handled during the grouping as in SQL "GROUP BY" clauses.

DBOP_unnesting
given a hierarchical table, flatten one nested column by repeating the top-level row as many times as there are values in the nested column (rows in the sub-rowset)

the inputs are a hierarchical table and the column to be flattened.

DBOP_nested_apply
given a hierarchical table, apply any table operation to a nested column, and replace the nested column with the resulting value (in the query result, not in the underlying database), or add the query result as a new column. if multiple levels of nesting exist, the operation to be applied may be "nested apply" itself. the operations "nesting" and "unnesting" are also permitted, as well as operations that result in a scalar, e.g., "aggregate" and "exists."

the inputs are a hierarchical table and a table operation, a column to which the table operation is to be applied, and (optionally) a column name for a new column. the output is a table, which may be nested or not, depending on the input table and the result type of the table operation applied.

DBOP_cross_tab
This node takes as an input a command tree that specifies a hierarchy of rowset chapters. It outputs an augmented result that contains additional rowsets that generate 'column membership' and 'column aggregates' that are useful for cross-tabulation reports.

To specify the crosstab, an alias is given to name the top level of the input hierarchy. Also, two lists of aliases are given: RowHierarchy and ColumnHierarchy. The RowHierarchy followed by the ColumnHierarchy must indicate a continuous descending path. The first element of the RowHierarchy is required to be the top level of the input hierarchy.

The crosstab node augments the input hierarchy such that each level of the RowHierarchy contains a nested rowset that is an instance of the ColumnHierarchy. The augmented hierarchy is then nested inside a 'top' rowset. This 'top' rowset also contains a nested rowset corresponding to the ColumnHierarchy and a list of aggregate values that are computed over the entire input tree. Aggregates are specified via a 'project_list' structure and follow the same binding rules as specified by the nesting node.

Fields of a given instance of a level of the ColumnHierarchy are defined as follows:

The exact list of parameters for this node are: a command tree, an alias for the input, a list of aliases for the RowHierarchy, a list of aliases for the ColumnHierarchy, and an optional project_list of aggregates.