SQL—Defining and Querying the Database

SQL, the Structured Query Language, is an industry standard language for querying relational databases. It provides a data manipulation language (DML) used to query the data.

However, SQL is not just a query language it is also a data definition language (DDL). DDL statements are used to create the tables, views, keys and constraints that define the database.

Data definition includes the name of the table and each of the fields, the contents and restrictions on the fields (field type, field length, restrictions on whether the field can be NULL, etc.). DDL can be used to define primary and secondary keys and to establish referential integrity through database constraints.

Referential integrity is the requirement that the various tables in a database are consistent in their representation of objects. A failure in referential integrity might allow a customer record to show that a given rental is outstanding, while the tape shows itself as returned.

Queries are structured to return an answer set or a view. An answer set is a set of rows in a table, and a view is a "virtual database", which looks to the user like a standalone database, but which represents a subset of the real database based on the results of the query. The answer set may be arbitrarily large. A placeholder, called a cursor, is used to iterate through the answer set, marking which rows have been examined already.

SQL is, as its name implies, structured. You can, for example, search the database using a SELECT statement. You might write:

SELECT CustomerID, CustomerName, CustomerAddress FROM Customer WHERE MemberDate > 'January 1 1997'

This statement will select every row from Customer that meets the search criteria and report the results in rows, consisting of the three columns that you requested (CustomerID, CustomerName and CustomerAddress). The SELECT statement includes a simple WHERE clause, which restricts the answer set. You can build much more complicated WHERE statements, which can even include sub-queries. The SELECT statement also allows you to sort the results arbitrarily (though it will run faster if you sort on an indexed field), and you can eliminate duplicate rows from your answer.

The real power in queries to a relational database, however, comes from joins.

Joins

The very act of normalizing the database limits the utility of your queries, unless you can join two or more tables together. Joining the tables allows you to treat the tables as if their data were shared in a single table. There are various ways to join tables. The first and most common is an equi-join or natural join. You perform an equi-join on two tables which share a column in common. In the example shown earlier, both Table A and Table B share the customer's ID (in Table A's Who Rented column and in Table B's ID column). You join these columns in the query statement itself, using the WHERE clause. Thus you might write:

SELECT TableA.TapeName, TableB.Name FROM TableA, TableB WHERE TableA.WhoRented = TableB.ID

It is the WHERE clause that creates the equi-join. Note that the FROM clause lists two tables. If you leave out the equi-join and just join the tables by listing two or more in the FROM clause, that is called a cross-join.

A variant on an equi-join is a theta-join. A theta-join is just like an equi-join, except that rather than equating the two columns, you use another relational operator.

You can generalize these statements into inner-join statements by explicitly naming the table columns to match. Note that with an inner join, you do not have to compare the same columns in the two tables. Thus you can write:

SELECT TableA.TapeName
FROM TableA inner join TableB ON TableA.WhoRented = TableB.ID 
AND TableA.ReleaseDate > January 1 1997 AND TableB.Club = y

This will produce a set of records with the TapeName for every tape that has a release date during or after 1997 and was rented by someone who is a member of a club. In effect, you've joined these tables together and are searching against criteria in each.

If there is an inner-join, you can guess that there is an outer-join. An outer join examines two tables and returns the records of one table when there is a matching record in the second table. This is a way to say, "Show me all the records in Table A and also all the records in Table B which have matches in A."

Outer joins are handed — either left or right handed. A left handed outer join looks like this:

SELECT * FROM TableA left outer join TableB ON TableA.WhoRented = TableB.ID

This will return every record from Table A (the left table) and all the records in Table B which match the criteria.

For more detailed information on SQL, check out Instant SQL Programming (Joe Celko, Wrox Press, 1-874416-50-8).