Unified Modeling Language 101

Jefferey A. Donnici

The Unified Modeling Language is a powerful new tool for developers to use when working on object-oriented systems. Because its purpose is to document and model a software system using a language-independent methodology, software designers can more easily communicate their designs to other designers and to those who ultimately implement the software. This month's Best Practices column provides an introduction to the Unified Modeling Language and an explanation of a couple of types of diagrams that can be used immediately by Visual FoxPro developers.

I know what you're thinking. It's something along the lines of, "What is he doing now, with this beginning Unified language stuff? I already know a computer language, and that's why I subscribe to FoxTalk in the first place!" Well, give me a minute before you turn the page or toss this issue into that stack on the bookshelf.

Over the past year or so, I think we've pretty well established in this space that documenting the analysis, design, and implementation of a system must be a top priority for any development project. Without the requirements and specifications, there can be no design, and without the design documentation to lead the programmers, there can be no implementation. The implementation must be documented through the liberal use of comments, or the long-term maintenance of the system will suffer with each new revision. This month's column focuses on a new approach to documenting the design phase of an object-oriented software project.

If you go to any well-stocked bookstore's Computer Science section, you're likely to find dozens of books that propose different methodologies to use when approaching an object-oriented design. Flipping through many of them will reveal a variety of odd-looking diagrams and a ton of text that's more likely to further confuse you than to make your life any easier. All of them will give you a way to document your designs, and most will provide an approach to coming up with the design, but which one should you be using? And if you often work with other developers, how do you know that your colleagues will understand the method you use for documenting your designs?

Asking around to find out which software design methodology is "best" has long been a sure way to start a programming Holy War. When reading magazine articles or books about the craft of designing software, it's common for the text to include diagrams that explain the relationships and actions within the software. The lack of a "standard methodology" for documenting a design means that these references require learning yet another approach to diagramming and notation.

Enter the Unified Modeling Language.

The birth of the Unified Modeling Language
Because of the varying methodologies for object-oriented software construction, there was a high demand for some sort of industry standards. In response to this demand, the authors of two of the more popular methodologies got together and decided to come up with a "unified" approach. These men, Grady Booch (author of the Booch method) and Jim Rumbaugh (author of the OMT, or Object Modeling Technique), were later joined by Ivar Jacobson (author of the OOSE, or Object-Oriented Software Engineering, method). The result of this collaboration is the Unified Modeling Language (UML). Within the past year, the "production" version of the UML has been officially released, but earlier versions were available on the Web so that feedback from software designers could be solicited. That initial availability has proven to be a wise decision because many development and design tools now support the UML despite its relative youth.

The purpose of the UML, as described in the specification documents, is to provide "a language for specifying, visualizing, and constructing the artifacts of software systems." In other words, it's meant as a common language for developers to use when explaining how the parts of a system are supposed to interact to form the system whole. When using the UML, you're actually using a specified "notation," just as with music or electronics, to present the "schematic" for a system that's being developed using object-oriented principles. The UML is completely independent of any one computer programming language, so it can be used to document the design of systems implemented in Visual FoxPro, Java, Smalltalk, or any other object-oriented language. The UML specification doesn't tell you how to design a system; it just lets you communicate your design to others for documentation purposes.

For this month's column, I'll be introducing the basic notation and concepts behind two of the more common types of diagrams in the UML -- the Class and Component diagrams. The Class diagram is also referred to as a Static Structure diagram because it diagrams static relationships that exist at design time and not actions, processes, or optional runtime relationships and states. It's my belief that these two diagram types can be quickly put to use by most VFP developers. However, a basic understanding of the notation for other diagram types is important, especially as magazines, books, and development tools increasingly adopt the UML as an industry standard. To learn more about the UML specification, or to study the notation for other diagram types, check out the references mentioned at the end of this month's column.

The view from up high
Within the UML, a Component diagram provides a higher-level view of the interaction and relationships between the components in the system. A component can be any logical "piece" of the system and allows the whole to be viewed in smaller, more understandable parts without the need for the code- and class-level details. The diagram in Figure 1 shows an example Component diagram for an end user query mechanism. A shape that includes an ellipse and two horizontal rectangles represents each component. A single, continuous line with no arrows or patterns is used to provide a connection between the top of the ellipse and the bottom of the rectangle. This line also forms a "box" that contains the name and, optionally, the filename or library of the component. The dashed lines indicate the dependency relationship that exists between the different components. This line is used in other types of diagrams -- anywhere that a dependency relationship needs to be indicated.
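One way to think about the dashed dependency lines is as a simple directed graph. The sketch below is only an illustration, written in Python because it's compact; the component names are hypothetical and aren't taken from Figure 1. It lists everything a given component ultimately depends on by following the dependency lines.

```python
# Hypothetical components for an end user query mechanism. These names are
# illustrative assumptions, not the components drawn in Figure 1.
DEPENDS_ON = {
    "QueryForm": ["QueryBuilder"],
    "QueryBuilder": ["DataAccess"],
    "ReportOutput": ["DataAccess"],
    "DataAccess": [],
}

def all_dependencies(component, graph=DEPENDS_ON):
    """Return every component reachable by following dependency lines."""
    found = set()
    stack = list(graph[component])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph[current])
    return found
```

Calling all_dependencies("QueryForm") on this made-up graph yields both QueryBuilder and DataAccess, which is the same conclusion you'd reach by tracing the dashed lines on paper.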

You might have noticed the rectangular box with its corner "folded down," which is the UML notation for a "note" object. You can insert a Note into any of the UML diagrams whenever you want to make annotations about something in the diagram.

The relationships
A Class diagram is slightly more complex than a Component diagram, simply because the relationships of classes are more complex than the relatively nebulous nature of components. The diagram in Figure 2 is a simple Class diagram showing the common relationships between a family and its vehicles. For this diagram, no details are provided for the individual classes so that the diagram can be more easily understood.

In the Class diagram, a standard aggregation exists between the "family" class and the "persons" class. This type of relationship is often referred to as the "contains" or "has-a" relationship, but it doesn't imply strong ownership between the family and its family members. That is, any member of the family could leave and still be a person, and the family would continue to be a family. This relationship is represented by the hollow diamond on the "whole" side of the "whole-part" association, with a straight line (or lines) leading to the "parts."

The "2..*" annotation next to the connection between the family members and their relationship to the family indicates the relationship's "multiplicity." A family, for the purpose of this diagram, consists of two or more persons, and the asterisk represents a generic indication of "many." This multiplicity can be annotated using any continuous series of numbers, such as "5..10" to indicate from 5 to 10 parts in the whole, or as a comma-delimited list of valid numbers (for example, "3,5,7" to indicate that any of these values is valid for the relationship). When no multiplicity is indicated on an aggregate or associative relationship, it's generally assumed to mean "1".
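To make the multiplicity rules concrete, here's a minimal sketch of a checker that decides whether a given number of parts satisfies a multiplicity string. It's written in Python purely for illustration, and the function name allows is my own invention, not part of the UML or any tool.

```python
def allows(multiplicity, count):
    """Return True if count satisfies a UML multiplicity string.

    Handles ranges ("2..*", "5..10"), comma-delimited lists ("3,5,7"),
    single numbers ("1"), and a bare asterisk (zero or more).
    """
    for part in multiplicity.split(","):
        part = part.strip()
        if ".." in part:
            low, high = part.split("..")
            if int(low) <= count and (high == "*" or count <= int(high)):
                return True
        elif part == "*":
            if count >= 0:
                return True
        elif int(part) == count:
            return True
    return False
```

With this sketch, allows("2..*", 1) is False because a family needs at least two persons, while allows("3,5,7", 5) is True.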

The relationship between the family and its vehicles is an associative, or "uses," relationship. In other words, the family uses one or more vehicles as needed, but continues to be a family even if no vehicles are present. The flip side of that coin is that a vehicle is still a vehicle, whether a family is driving it or not. In the case of an association, a label is typically applied to indicate the nature of the relationship; in this case, that's represented by the word "drives." An associative relationship will also usually have a multiplicity indicator, so this relationship tells us that a family may drive "from zero to many" vehicles.
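The aggregation and association described so far might look something like the following in code. This is a sketch under my own assumptions: it's written in Python only for brevity, and the attribute names members and drives, along with the sample data, are hypothetical. Notice that neither relationship implies ownership, so the persons and the vehicle exist independently of the family.

```python
class Person:
    def __init__(self, name):
        self.name = name

class Vehicle:
    def __init__(self, model):
        self.model = model

class Family:
    def __init__(self, members):
        if len(members) < 2:
            raise ValueError("a family needs two or more persons (2..*)")
        self.members = list(members)   # aggregation: has-a, no strong ownership
        self.drives = []               # association "drives", multiplicity 0..*

ann, bob, cal = Person("Ann"), Person("Bob"), Person("Cal")
wagon = Vehicle("wagon")

family = Family([ann, bob, cal])   # no vehicles yet, but still a family
family.drives.append(wagon)        # the family now drives one vehicle
family.members.remove(cal)         # Cal leaves and is still a person
```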

The next relationship is a "generalization" or "is-a" relationship, which refers to the principle of inheritance. In this diagram, "car," "sport utility," and "mini-van" are each a specialization of the more general "vehicle" superclass. This relationship is represented with lines running from the superclass to its subclasses, with a hollow arrow pointing to the superclass. There's no need for a multiplicity indicator with a generalization relationship.

The final relationship in this diagram is another type of aggregation called "composite aggregation." This relationship is still a "contains" or "has-a" relationship, but it implies a stronger ownership between the whole and its parts. Usually, this means that the parts are created and destroyed with the whole and that the entire aggregate is viewed externally as a single entity. A simple example of a composite aggregation would be the controls that exist inside a VFP container class. When the class is instantiated, the controls are as well; when the class is destroyed, the container takes its controls with it. This type of relationship is indicated by a solid diamond on the "whole" side of the "whole-part" composite and a straight line leading from the diamond to the "parts." In the preceding example, this notation is used to indicate that the "sport utility" class has a "ski rack" on it and that there's strong ownership by the vehicle of the ski rack (that is, when the vehicle goes, so does the ski rack). The multiplicity indicator in this relationship indicates that a single sport utility vehicle may contain "zero or one" ski rack.
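Generalization and composite aggregation translate fairly directly into code. The following is a rough Python sketch, not a definitive implementation; the with_ski_rack parameter and the ski_rack attribute are names I've made up for the example. The subclasses inherit from the vehicle superclass, and the sport utility creates its own ski rack rather than being handed one.

```python
class SkiRack:
    pass

class Vehicle:
    """The general superclass in the 'is-a' relationship."""

class Car(Vehicle):
    pass

class MiniVan(Vehicle):
    pass

class SportUtility(Vehicle):
    def __init__(self, with_ski_rack=False):
        # Composite aggregation, multiplicity 0..1: the rack is created
        # with the vehicle and, as long as nothing else keeps a reference
        # to it, goes away when the vehicle does.
        self.ski_rack = SkiRack() if with_ski_rack else None

plain = SportUtility()
equipped = SportUtility(with_ski_rack=True)
```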

The pieces and parts of a class
When providing the details for the design of a single class, a number of elements are added to the class entity on the diagram to indicate the various attributes of the class. The Class diagram shown in Figure 3 is an example of a more detailed visual description of two classes.

In this example, the "FileOutput" class is the superclass for the "SpreadsheetOutput" and "TextFileOutput" subclasses. In each case, the rectangle representing a class is divided into three different parts: the class name at the top, the class's attributes (properties) in the middle, and the class's operations (methods) at the bottom. You might have noticed that the name of the FileOutput class has been italicized. This indicates that the class is designed as an abstract class, meaning that its purpose is to provide a known interface that's shared by its subclasses and that those subclasses provide the implementation. You might occasionally see the word "{abstract}", including the braces, below the italicized class name, but it's an optional notation. In this example, the only method that's abstract is the superclass CreateFile method. It can't be implemented in the FileOutput class because the steps required to create a file are dependent on the specific type of file to create. The other methods aren't italicized, indicating that they've been implemented in the FileOutput class definition.
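For readers who want to see the idea of an abstract class in code, here's a minimal sketch in Python, using its standard abc module. Only the FileOutput and CreateFile names come from the diagram; the tcExportFile parameter is described later in this column, and the rest is my own scaffolding. In this sketch the language itself refuses to instantiate the abstract class, which may or may not be true of the language you work in.

```python
from abc import ABC, abstractmethod

class FileOutput(ABC):
    """Abstract superclass (its name is italicized in the diagram)."""

    @abstractmethod
    def CreateFile(self, tcExportFile):
        """Abstract operation: the steps depend on the type of file."""

try:
    FileOutput()          # an abstract class can't be instantiated
    instantiated = True
except TypeError:
    instantiated = False
```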

To the left of each method and property name is a symbol indicating the "visibility" of that class member. A "+" sign indicates that the class member is public to external objects at runtime. The "#" sign is the notation for "protected," and a hyphen ("-"), which isn't shown on this diagram, is used to indicate "private" visibility (or "hidden," as it's called in VFP).
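How the three visibility levels show up in code depends on the language. As one illustration, the Python sketch below uses that language's naming conventions; the class and attribute names are hypothetical. Python doesn't enforce "protected" at all (the single underscore is only a convention), and it approximates "private" by mangling names that start with two underscores.

```python
class Sample:
    def __init__(self):
        self.public_value = 1       # "+" public
        self._protected_value = 2   # "#" protected, by convention only
        self.__private_value = 3    # "-" private; Python mangles the name

    def total(self):
        # Inside the class, all three members are reachable by name.
        return self.public_value + self._protected_value + self.__private_value

s = Sample()
```

From outside the class, s.public_value works as expected, while s.__private_value raises an AttributeError because the attribute is actually stored under a mangled name.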

Following the name of each property is a colon and a data type indicator. This is used to identify the type of value that the property will contain. Sometimes, when a property contains a reference to another object in the system, the name of the object or class definition is used. This makes the purpose of the property much clearer than if the word "object" alone were to be used. In the case of the "nCopies" property, an "=" sign and the number 1 follow the data type ("= 1"). This indicates that the property has a default value assigned to it at design time.
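The property notation is regular enough that it can be picked apart mechanically. Here's a small Python sketch that splits a property entry into its visibility, name, data type, and optional default value. The function name and the sample strings are my own assumptions; in particular, the "+" visibility and "Integer" type shown for nCopies in the usage note are guesses for the sake of the example rather than values read from Figure 3.

```python
import re

VISIBILITY = {"+": "public", "#": "protected", "-": "private"}

# visibility symbol, name, colon, data type, then an optional "= default"
ATTRIBUTE = re.compile(
    r"^\s*(?P<vis>[+#-])\s*(?P<name>\w+)\s*:\s*(?P<type>\w+)"
    r"(?:\s*=\s*(?P<default>.+?))?\s*$"
)

def parse_attribute(text):
    """Split a UML attribute string into visibility, name, type, default."""
    match = ATTRIBUTE.match(text)
    if match is None:
        raise ValueError("not a recognized attribute: " + text)
    return {
        "visibility": VISIBILITY[match.group("vis")],
        "name": match.group("name"),
        "type": match.group("type"),
        "default": match.group("default"),   # None when no "= value" is given
    }
```

For instance, parse_attribute("+ nCopies : Integer = 1") reports a public property named nCopies of type Integer with a default of "1".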

A similar notation set is used for describing a class's methods and the attributes of each method. For example, the colon and data type that follow three of the methods in FileOutput indicate that those methods have a logical return value (to indicate the method's success). This is another instance where an object or class name can be used if the purpose of a method is to return an object reference. In the case of the CreateFile method, there's a single parameter, called "tcExportFile." Parameter notation also uses the colon-data type to indicate the expected data type of the parameter (in this case, a string). If a parameter has a default value, the same notation that's used with a property's default value is used (the = sign plus the default value).
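Going the other direction, the method notation can be assembled from its pieces. The Python sketch below formats a method entry from a visibility symbol, a name, a list of parameters, and an optional return type. The helper's name is hypothetical, and the "+" visibility and the type names "String" and "Logical" in the usage note are assumptions on my part, chosen to match the description of CreateFile above.

```python
def uml_operation(visibility, name, params, returns=None):
    """Format an operation in UML notation.

    params is a list of (name, type, default) tuples; default may be None.
    """
    parts = []
    for pname, ptype, default in params:
        text = f"{pname} : {ptype}"
        if default is not None:
            text += f" = {default}"      # same notation as a property default
        parts.append(text)
    signature = f"{visibility}{name}({', '.join(parts)})"
    if returns is not None:
        signature += f" : {returns}"
    return signature
```

Passing "+", "CreateFile", a single ("tcExportFile", "String", None) parameter, and "Logical" produces the entry "+CreateFile(tcExportFile : String) : Logical".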

Moving down to the subclass definitions, you'll notice that the names of the subclasses are no longer italicized, indicating that these are concrete classes. They implement the missing functionality from the superclass interface and, while they can be subclassed further, their purpose isn't necessarily to be extended or specialized. Also, a default value has been assigned to the "cDefaultFileExt" property to indicate the default file extension for each of these classes. Since the purpose of the SpreadsheetOutput class is to create spreadsheets, it's been assigned ".XLS" as its default extension, and the TextFileOutput class is using ".txt" as its default.

Finally, the CreateFile method is no longer italicized in the subclass definitions. This is because each subclass provides the implementation for the abstract method defined in the superclass. Again, these classes could be further subclassed and extended, but their purpose is to supply the behavior behind the method they inherit on their interface.
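Pulling the detailed diagram together, the hierarchy might be sketched in code as follows. This is an illustration in Python under a few assumptions of mine: that cDefaultFileExt is declared on the superclass without a useful default, and that CreateFile simply reports success. The method bodies are placeholders, since the diagram describes the interface and not the file-writing logic.

```python
from abc import ABC, abstractmethod

class FileOutput(ABC):
    cDefaultFileExt = ""     # assumed: each subclass assigns its own default

    @abstractmethod
    def CreateFile(self, tcExportFile):
        """Abstract in the superclass; returns True on success."""

class SpreadsheetOutput(FileOutput):
    cDefaultFileExt = ".XLS"

    def CreateFile(self, tcExportFile):
        # Placeholder: a real version would write the spreadsheet here.
        return True

class TextFileOutput(FileOutput):
    cDefaultFileExt = ".txt"

    def CreateFile(self, tcExportFile):
        # Placeholder: a real version would write the text file here.
        return True

# Code written against the FileOutput interface works with either subclass.
outputs = [SpreadsheetOutput(), TextFileOutput()]
extensions = [o.cDefaultFileExt for o in outputs]
```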

Learning more about the UML
As you might imagine, there's a growing set of resources and tools available to go in-depth with the UML specification. Should you research this powerful new tool on your own, you'll find that I've just barely scratched the surface in this column. However, even if you don't get any further into it, I think the notation presented and explained here can be put to use immediately as you work to document your system designs or collaborate with others to come up with a design.

Two of the books I've found to be especially helpful in learning the basics of the UML are:

UML Toolkit, by Hans-Erik Eriksson and Magnus Penker (John Wiley & Sons, Inc., ISBN 0-471-19161-2).
Applying UML and Patterns, by Craig Larman (Prentice Hall, ISBN 0-13-748880-7).

The first book, in particular, does a good job of teaching the basics of the UML without getting bogged down in the very detailed specifics that can be found within the specification. The second book covers the UML and design patterns at once, so it doesn't go into quite the same detail about the specifications as the first book. Both books assume a basic understanding of object-oriented principles and theories.

You should also go to Rational Software Corporation's Web site at http://www.rational.com. The authors behind the creation of the UML work for Rational, and the company's Web site is the "official" location of the UML specifications. You can download Adobe Acrobat files, get links to other UML sites, or view UML tutorials online. The company also makes a software-modeling product called Rational Rose that's worth looking at. It's fairly expensive, but it's an amazingly powerful tool for creating system models and even turning those models into "shells" of classes that you implement. It will do this for C++ and Visual Basic by default, but you can get import/export wizards for Visual FoxPro that let you model a component, and it creates the .VCX and class definitions according to the model (sadly, you still have to write some code). A trimmed-down version of Rational Rose is available in the form of Microsoft's Visual Modeler tool. This can be found at http://www.microsoft.com and is included with the Visual Studio development suite. While it's nowhere near as powerful as Rational Rose, you might still find it useful, and it's worth checking out.

As for creating diagrams for documentation purposes, I've found Visio Professional, by Visio Corp., to be an excellent tool for UML diagrams. While it's a general-purpose business diagramming package, it comes with a variety of templates and wizards for software diagrams. When creating a new class definition, you can go into a simple wizard-like interface to describe the members of the class, data types, default values, and visibility. Using the Visio interface, you don't have to remember all the semantics of the UML notation because the wizard tools will create the visual elements based on the input you provide. There's also a UML Navigator for finding your way through the various levels of granularity in your diagrams, as well as a UML Semantics Checker that lets you know when something in your design doesn't make sense with regard to the specifications. You can find out more about this product at http://www.visio.com.

Now go draw some pictures
Once you've accepted as a given that you need to document your designs, I think you'll find the UML to be an incredibly powerful tool for that purpose. Imagine trying to write a textual description of some of the diagrams presented in this month's column. It would take a lot more time than it should, and it probably wouldn't be as clear to everyone who reads it as a simple diagram would be.

Do yourself a favor and spend some time reading about the UML or checking out the various online resources. The people who need to decipher the designs in your next project will thank you. I'll be back next month with yet another design-oriented topic, but feel free to let me know if there are things you'd like to read about, or if you have suggestions and comments for the Best Practices column.

Jefferey A. Donnici is the senior Internet developer at Resource Data International, Inc., in Boulder, CO. Because he has no inner child, drawing software diagrams gives him a twisted sense of creativity he never got out of those finger-painting classes. Jeff is a Microsoft Certified Professional and a four-time Microsoft Developer Most Valuable Professional. 303-444-7788, fax 303-444-1286, jdonnici@compuserve.com, jdonnici@resdata.com.