Defining the Data

As you learn more about your application's data structures, you can eventually group selected data items together and assign certain information details describing the data's characteristics and relationships.

The general approach to defining data includes the following:

Defining the tables, rows, and columns.
Inserting index keys.
Creating table relations.
Assigning data types.

Defining the Tables, Rows, and Columns

Regardless of how your application's data is physically stored, the data is typically organized into multiple tables (or files), each having a set of rows (or records) and columns (or fields), similar to the rows and columns of a spreadsheet. Each row in the table contains all of the information about a particular thing, person, product, item, customer, or asset.

For example, the following Authors table stores the name, address, and phone number for several authors.

Row	Name	Address	Phone
1	Arnie Baldwin	xxxxx xxxxxxxx	(xxx) xxx-xxxx
2	Don Hall	xxxxx xxxxxxxx	(xxx) xxx-xxxx
3	Teresa Atkinson	xxxxx xxxxxxxx	(xxx) xxx-xxxx
4	David Simpson	xxxxx xxxxxxxx	(xxx) xxx-xxxx

Inserting Index Keys

A key is a special field that provides an index for fast retrieval. A key can be unique or non-unique, depending on whether duplicates are allowed. A key can be designated as the primary key, making it the unique identifier for each row of the table. You should use keys where your application needs direct access to certain rows.

For example, in the following table the author's identification number (au_id) is added as the table's primary key, because au_id uniquely identifies one and only one author. A query using au_id will provide very fast retrieval of that author's information.

Authors table

au_id (key)

au_name

au_address

au_phone

Creating Table Relations

A database is usually composed of more than one table, and the tables are often related to one another in various ways. For instance, a Titles table might list the International Scientific Book Number (ISBN), title, and year the book was published. It would also be useful to identify the publisher for each title. Rather than repeating all of the publisher information for each title in the Titles table, you could simply establish a relationship with the Publishers table by including the publisher's identification (pu_id) in the Titles table.

In the following example, the Titles table is related to the Publishers table.

Titles table	Publishers table
ti_isbn (key)	pu_id (key)
ti_title	pu_name
ti_yearpublished	pu_address
pu_id (foreign key)	pu_phone

This is called a one-to-many relationship, because a single row in the Titles table is related to just one publisher in the Publishers table, but a single row in the Publishers table can be related to one or more records in the Titles table. You can also create relations that are one-to-one and many-to-many.

It's worth noting that you have only identified that a relationship exists between the Titles and Publishers tables — you have made no commitment to how that relationship will be managed. Depending on your final implementation, you might use the table joins and foreign key constraints available with Microsoft^® SQL Server™, or you might write custom code to read the file structures directly and handle referential integrity inside your application.

Assigning Data types

A data type is a named category of data characterized by a set of values, a way to denote the values, and some implied operations that can interpret and manipulate the values. Data types can be either intrinsic or derived.

An intrinsic data type is one that is provided by your database. For example, SQL Server provides intrinsic data types such as integer, datetime, bit, char, and varchar.

Derived data types are defined using the Data Modeling Language (DML) provided by your database. A derived data type is built from the available intrinsic data types or previously defined derived data types. You typically provide a name and a structure for the derived data type. With derived data types, you can assure consistent use of special data types for selected columns, variables, and parameters.

Data types are important because they assure that the assigned data value is of the correct type and within the acceptable range of values. Different data storage technologies and development languages support a variety of data types, including the following.

Boolean
Integer
Float
Datetime
Tinyint
DISPLAY
COMP-3
Binary
String
Character

While assigning data types, you want to be sure that the range provided by the data type fits the data which will be stored and — hopefully — anticipates future changes. For example, if you choose "tinyint" as a data type for customer identification, your application can handle a maximum of only 255 customers. On the other hand, if you choose the "integer" data type, you can have over two billion customers. As another example, if you use a single character to indicate customer service code, an expansion to two characters will create maintenance havoc.

You can save space in your database and improve join operations by choosing appropriate data types for fields. As a general rule, you should choose the smallest data type that's appropriate for the data in the field.

While you're assigning data types, some things to think about are:

Allowable maximums and minimums.
Default values.
Empty (or NULL) values.
Anticipated growth.
Expected and (to the extent possible) unexpected changes.

In a relational database environment, data types help enforce the business rules. For example, you can't add dollars and colors and get a useful answer. While you would never program this process intentionally, a relational database will identify the data type mismatch and automatically deny your query.

Note If you are using Microsoft SQL Server, you can easily define the data types, create queries, and populate your database using Microsoft Visual Database Tools. For more information, search online for "Microsoft Visual Database Tools Overview" in MSDN Library Visual Studio 6.0.