Abstraction

“Abstraction” is the process of ignoring details in order to concentrate on essential characteristics. A programming language is traditionally considered “high-level” if it supports a high degree of abstraction. For example, consider two programs that perform the same task, one written in assembly language, one in C. The assembly-language program contains a very detailed description of what the computer does to perform the task, but programmers usually aren't concerned with what happens at that level. The C program gives a much more abstract description of what the computer does, and that abstraction makes the program clearer and easier to understand.

While traditional languages support abstraction, object-oriented languages provide much more powerful abstraction mechanisms. To understand how, consider the different types of abstraction.

Procedural Abstraction

The most common form of abstraction is “procedural abstraction,” which lets you ignore details about processes.

There are many levels of procedural abstraction. For example, it's possible to describe what a program does in even greater detail than assembly language does, by listing the individual steps that the CPU performs when executing each assembly language instruction. On the other hand, a program written in the macro language of an application program can describe a given task on a much higher level than C does.

When you write a program in a given language, you aren't restricted to using the level of abstraction that the language itself provides. Most languages allow you to write programs at a higher level of procedural abstraction, by supporting user-defined functions (also known as procedures or subroutines). By writing your own functions, you define new terms to express what your program does.

As a simple example of procedural abstraction, consider a program that frequently has to check whether two strings are the same, ignoring case:

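while (*s != '\0')
{
    /* Characters match exactly, or differ only in case (assumes ASCII). */
    if ( (*s == *t) ||
         ((*s >= 'A') && (*s <= 'Z') && ((*s + 32) == *t)) ||
         ((*t >= 'A') && (*t <= 'Z') && ((*t + 32) == *s)) )
    {
        s++; t++;
    }
    else break;
}
/* Equal only if both strings ended at the same position. */
if ( (*s == '\0') && (*t == '\0') )
    printf("equal \n");
else
    printf("not equal \n");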

By writing a program this way, you are constantly reminded of the comparisons that the program performs to check whether two strings are equal. An alternative way to write this program is to place the string comparison in a function, such as the stricmp function that some C run-time libraries provide (it is not part of standard C):

/* stricmp returns 0 when the strings match, ignoring case. */
if ( !stricmp( s, t ) )
    printf("equal \n");
else
    printf("not equal \n");

The use of stricmp does more than save you a lot of typing. It also makes the program easier to understand, because it hides details that can distract you. The precise steps performed by the function aren't important. What's important is that a case-insensitive string comparison is being performed.
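To make the idea concrete, here is a minimal sketch of how such a function might be written. The name equal_ignore_case is hypothetical, chosen for illustration; it is not the library's stricmp, and it returns 1 for a match rather than 0. It relies only on the standard tolower function from ctype.h.

```c
#include <ctype.h>   /* tolower */

/* Hypothetical helper (not a library function): returns 1 if s and t
   are equal when case is ignored, and 0 otherwise. */
int equal_ignore_case(const char *s, const char *t)
{
    while (*s != '\0' && *t != '\0')
    {
        /* Cast to unsigned char: tolower is undefined for negative values. */
        if (tolower((unsigned char)*s) != tolower((unsigned char)*t))
            return 0;
        s++;
        t++;
    }
    /* Equal only if both strings ended together. */
    return *s == '\0' && *t == '\0';
}
```

A caller would simply write if ( equal_ignore_case( s, t ) ) and never see the character-by-character loop, which is the point of procedural abstraction.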

Functions make large programs easier to design by letting you think in terms of logical operations, rather than in terms of specific statements of the programming language.

Data Abstraction

Another type of abstraction is “data abstraction,” which lets you ignore details of how a data type is represented.

For example, all computer data can be viewed as hexadecimal or binary numbers. However, since most programmers prefer to think in terms of decimal numbers, most languages support integer and floating-point data types. You can simply type “3.1416” rather than some hexadecimal bytes. Similarly, Basic provides a string data type, which lets you perform operations on strings intuitively, ignoring the details of how they're represented. On the other hand, C does not support the abstraction of strings, since the language requires you to manipulate strings as series of characters occupying consecutive memory locations.

Data abstraction always involves some degree of procedural abstraction as well. When you perform operations on variables of a given data type, you don't need to know the format of the data, so you can also ignore the details of how operations are performed on that data. How floating-point arithmetic is performed in binary is, thankfully, something C programmers don't have to worry about.

Compared to their capacity for procedural abstraction, most languages have very limited support for creating new levels of data abstraction. C supports user-defined data types through structures and typedefs. Most programmers use structures as no more than aggregates of variables. For example:

struct PersonInfo
{
    char name[30];
    long phone;
    char address1[30];
    char address2[30];
};

Such a user-defined type is convenient because it lets you manipulate several pieces of information as a unit instead of individually. However, this type doesn't provide any conceptual advantage. There's no point in thinking about the structure without thinking about the pieces of information it contains: the name, the phone number, and the address.

A better example of data abstraction is the FILE type defined in STDIO.H:

typedef struct _iobuf
{
    char __far *_ptr;
    int _cnt;
    char __far *_base;
    char _flag;
    char _file;
} FILE;

A FILE structure is conceptually much more than the fields contained within it. You can think about FILEs without knowing how they're represented. You simply use a FILE pointer with various library functions, and let them handle the details.
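The following sketch shows what using a FILE purely through library functions can look like. The helper name roundtrip_line is hypothetical; the calls it makes (tmpfile, fputs, rewind, fgets, fclose) are standard C library functions. Nothing in it reads or writes a field of the structure.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to a temporary file, reads it back
   into out (capacity out_size), and returns 1 on success, 0 on failure. */
int roundtrip_line(const char *text, char *out, int out_size)
{
    FILE *fp = tmpfile();      /* the library creates and fills in the FILE */
    int ok = 0;

    if (fp == NULL)
        return 0;

    if (fputs(text, fp) != EOF)
    {
        rewind(fp);            /* reposition without touching any field */
        if (fgets(out, out_size, fp) != NULL)
            ok = 1;
    }
    fclose(fp);                /* the library releases the FILE */
    return ok;
}
```

The code would keep working if the library changed the layout of the structure, because it depends only on the operations, not on the representation.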

Notice that it's possible to declare a structure without declaring the functions needed to use the structure. The C language lets you view data abstraction and procedural abstraction as two distinct techniques, when in fact they're integrally linked.

Classes

This is where object-oriented programming comes in. Object-oriented languages combine procedural and data abstraction, in the form of classes. When you define a class, you describe everything about a high-level entity at once. When using an object of that class, you can ignore the built-in types contained in the class and the procedures used to manipulate them.

Consider a simple class: polygonal shapes. You might think of a polygon as a series of points, which can be stored as a series of paired numbers. However, a polygon is conceptually much more than the sum of its vertices. A polygon has a perimeter, an area, and a characteristic shape. You might want to move one, rotate it, or reflect it. Given two polygons, you might want to find their intersection or their union, or see if their shapes are identical. All of these properties and operations are perfectly meaningful without reference to any low-level entities that might make up a polygon. You can think about polygons without thinking about the numbers that might be stored in a polygon object, and without thinking about the algorithms for manipulating them.

With support for combined data abstraction and procedural abstraction, object-oriented languages make it easy for you to create an additional layer of separation between your program and the computer. The high-level entities you define have the same advantage that floating-point numbers and printf statements have when compared to bytes and MOV instructions: they make it easier to write long and complex applications.

Classes can also represent entities that you usually wouldn't consider data types. For example, a class can represent a binary tree. Each object is not simply a node in a tree, the way a C structure is; each object is a tree in itself. It's just as easy to create multiple binary trees as it is to create one. More importantly, you can ignore all the nonessential details of a binary tree. What features of a binary tree are you really interested in? The ability to quickly search for an item, to add or delete items, and to enumerate all the items in sorted order. It really doesn't matter what data structure you use, as long as you can perform the same set of operations on it. It might be a tree implemented with nodes and pointers, or a tree implemented with an array, or some data structure you've never heard of.

Such a class shouldn't be called BinaryTree, since that name implies a particular implementation. Based on the operations that can be performed on it, the class should be called SortedList or something similar.
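C has no classes, so the following is only an approximation of the idea, written in C to match the earlier examples. It declares a SortedList type together with the operations on it, and then supplies one possible implementation, a sorted array that grows as needed. All of the names (sl_create, sl_add, and so on) are hypothetical, the items are assumed to be ints, and deletion is omitted to keep the sketch short.

```c
#include <stdlib.h>

/* --- Interface: all that callers need to know (hypothetical names) --- */
typedef struct SortedList SortedList;   /* representation not needed by callers */

SortedList *sl_create(void);
void        sl_destroy(SortedList *list);
int         sl_add(SortedList *list, int item);            /* 1 on success */
int         sl_contains(const SortedList *list, int item);
int         sl_count(const SortedList *list);
int         sl_item_at(const SortedList *list, int index); /* ascending order */

/* --- One possible implementation: a sorted, growable array --- */
struct SortedList
{
    int *items;
    int  count;
    int  capacity;
};

SortedList *sl_create(void)
{
    SortedList *list = malloc(sizeof *list);
    if (list != NULL)
    {
        list->items = NULL;
        list->count = 0;
        list->capacity = 0;
    }
    return list;
}

void sl_destroy(SortedList *list)
{
    if (list != NULL)
    {
        free(list->items);
        free(list);
    }
}

int sl_add(SortedList *list, int item)
{
    int i;
    if (list->count == list->capacity)
    {
        int new_capacity = (list->capacity == 0) ? 8 : list->capacity * 2;
        int *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return 0;
        list->items = grown;
        list->capacity = new_capacity;
    }
    /* Shift larger items right to keep the array sorted. */
    for (i = list->count; i > 0 && list->items[i - 1] > item; i--)
        list->items[i] = list->items[i - 1];
    list->items[i] = item;
    list->count++;
    return 1;
}

int sl_contains(const SortedList *list, int item)
{
    int lo = 0, hi = list->count - 1;
    while (lo <= hi)                     /* binary search */
    {
        int mid = lo + (hi - lo) / 2;
        if (list->items[mid] == item)
            return 1;
        if (list->items[mid] < item)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return 0;
}

int sl_count(const SortedList *list) { return list->count; }
int sl_item_at(const SortedList *list, int index) { return list->items[index]; }
```

Code that uses only the six functions would not need to change if the array were replaced by a tree built from nodes and pointers. In C this separation is a matter of convention; a class makes it part of the type itself.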

By designing your program around abstract entities that have their own set of operations, rather than using data structures made of built-in types, you make your program more independent from implementation details. This leads to another advantage of object-oriented programming: encapsulation.