Unified Modeling Language 101
Jefferey A. Donnici
The Unified Modeling Language
is a powerful new tool for developers to use when working on object-oriented
systems. Because its purpose is to document and model a software system
using a language-independent methodology, software designers can more easily
communicate their designs to other designers and to those who ultimately
implement the software. This month's Best Practices column provides an introduction
to the Unified Modeling Language and an explanation of a couple of types of
diagrams that can be used immediately by Visual FoxPro developers.
I know what you're thinking. It's something along the lines of, "What
is he doing now, with this beginning Unified language stuff? I already know
a computer language, and that's why I subscribe to FoxTalk in the
first place!" Well, give me a minute before you turn the page or toss
this issue into that stack on the bookshelf.
Over the past year or so, I think we've pretty well established in this
space that documenting the analysis, design, and implementation of a system
must be a top priority for any development project. Without the requirements
and specifications, there can be no design, and without the design documentation
to lead the programmers, there can be no implementation. The implementation
must be documented through the liberal use of comments, or the long-term
maintenance of the system will suffer with each new revision. This month's
column focuses on a new approach to documenting the design phase of an object-oriented
software project.
If you go to any well-stocked bookstore's Computer Science section, you're
likely to find dozens of books that propose different methodologies to use
when approaching an object-oriented design. Flipping through many of them
will reveal a variety of odd-looking diagrams and a ton of text that's more
likely to further confuse you than to make your life any easier. All of
them will give you a way to document your designs, and most will provide
an approach to coming up with the design, but which one should you be using?
And if you often work with other developers, how do you know that your colleagues
will understand the method you use for documenting your designs?
Asking around to find out which software design methodology is "best"
has long been a sure way to start a programming Holy War. When reading magazine
articles or books about the craft of designing software, it's common for
the text to include diagrams that explain the relationships and actions
within the software. The lack of a "standard methodology" for
documenting a design means that these references require learning yet another
approach to diagramming and notation.
Enter the Unified Modeling Language.
The birth of the Unified Modeling Language
Because of the varying methodologies for
object-oriented software construction, there was a high demand for some
sort of industry standard. In response to this demand, the authors of two
of the more popular methodologies got together and decided to come up with
a "unified" approach. These men, Grady Booch (author of the Booch
method) and Jim Rumbaugh (author of the OMT, or Object Modeling Technique),
were later joined by Ivar Jacobson (author of the OOSE, or Object-Oriented
Software Engineering, method). The result of this collaboration is the Unified
Modeling Language (UML). Within the past year, the "production"
version of the UML has been officially released, but earlier versions were
available on the Web so that feedback from software designers could be solicited.
That initial availability has proven to be a wise decision because many
development and design tools now support the UML despite its relative youth.
The purpose of the UML, as described in the specification documents, is
to provide "a language for specifying, visualizing, and constructing
the artifacts of software systems." In other words, it's meant as a
common language for developers to use when explaining how the parts of a
system are supposed to interact to form the system whole. When using the
UML, you're actually using a specified "notation," just as with
music or electronics, to present the "schematic" for a system
that's being developed using object-oriented principles. The UML is completely
independent of any one computer programming language, so it can be used
to document the design of systems implemented in Visual FoxPro, Java, Smalltalk,
or any other object-oriented language. The UML specification doesn't tell
you how to design a system; it just lets you communicate your design
to others for documentation purposes.
For this month's column, I'll be introducing the basic notation and concepts
behind two of the more common types of diagrams in the UML -- the Class
and Component diagrams. The Class diagram is also referred to as a Static
Structure diagram because it diagrams static relationships that exist at
design time and not actions, processes, or optional runtime relationships
and states. It's my belief that these two diagram types can be quickly put
to use by most VFP developers. However, a basic understanding of the notation
for other diagram types is important, especially as magazines, books, and
development tools increasingly adopt the UML as an industry standard. To
learn more about the UML specification, or to study the notation for other
diagram types, check out the references mentioned at the end of this month's
column.
The view from up high
Within the UML, a Component diagram provides
a higher-level view of the interaction and relationships between the
components in the system. A component can be any logical "piece"
of the system and allows the whole to be viewed in smaller, more understandable
parts without the need for the code- and class-level details. The diagram
in Figure 1 shows an example Component diagram for an end-user query mechanism. A shape that includes
an ellipse and two horizontal rectangles represents each component. A single,
continuous line with no arrows or patterns is used to provide a connection
between the top of the ellipse and the bottom of the rectangle. This line
also forms a "box" that contains the name and, optionally, the
filename or library of the component. The dashed lines indicate the dependency
relationship that exists between the different components. This line is
used in other types of diagrams -- anywhere that a dependency relationship
needs to be indicated.
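One way to picture what those dashed dependency lines record is as a simple dependency map. The following is only a sketch, written in Python for illustration; the component names are hypothetical (they aren't taken from Figure 1), and ordering the components is just one assumed use of such a map, not something the UML itself prescribes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical components for an end-user query mechanism.
# Each key depends on the components in its set (one dashed line per entry).
dependencies = {
    "QueryForm": {"QueryBuilder"},
    "QueryBuilder": {"DataAccess"},
    "ReportOutput": {"QueryBuilder", "DataAccess"},
    "DataAccess": set(),
}


def build_order(deps):
    """Order the components so each one comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
```

Reading the map this way makes it easy to see which components can be built or tested first, which is the kind of question a Component diagram is meant to answer at a glance.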
You might have noticed the rectangular box with its corner "folded
down," which is the UML notation for a "note" object. You
can insert a Note into any of the UML diagrams whenever you want to make
annotations about something in the diagram.
The relationships
A Class diagram is slightly more complex
than a Component diagram, simply because the relationships of classes are
more complex than the relatively nebulous nature of components. The diagram
in Figure 2 is a simple Class diagram showing the common relationships between a family and its vehicles.
For this diagram, no details are provided for the individual classes so
that the diagram can be more easily understood.
In the Class diagram, a standard aggregation exists between the "family"
class and the "persons" class. This type of relationship is often
referred to as the "contains" or "has-a" relationship,
but it doesn't imply strong ownership between the family and its family
members. That is, any member of the family could leave and still be a person,
and the family would continue to be a family. This relationship is represented
by the hollow diamond on the "whole" side of the "whole-part"
association, with a straight line (or lines) leading to the "parts."
The "2..*" annotation next to the connection between the family
members and their relationship to the family indicates the relationship's
"multiplicity." A family, for the purpose of this diagram,
consists of two or more persons, and the asterisk represents a generic indication
of "many." This multiplicity can be annotated using any continuous
series of numbers, such as "5..10" to indicate from 5 to 10 parts
in the whole, or as a comma-delimited list of valid numbers (for example, "3,5,7"
to indicate that any of these values is valid for the relationship). When
no multiplicity is indicated on an aggregate or associative relationship,
it's generally assumed to mean "1".
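To make the multiplicity rules concrete, here's a minimal sketch in Python (chosen only for illustration) that reads the forms described above and checks whether a given count is allowed. The function names are my own, and treating a bare "*" as "zero or more" is an assumption added for completeness.

```python
def parse_multiplicity(text):
    """Turn "2..*", "5..10", "3,5,7", or "1" into a list of (low, high) ranges.

    A high value of None stands for "*" (many).
    """
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if ".." in part:
            low_text, high_text = part.split("..")
        else:
            low_text = high_text = part
        # Assumption: a bare "*" is read as "zero or more".
        low = 0 if low_text == "*" else int(low_text)
        high = None if high_text == "*" else int(high_text)
        ranges.append((low, high))
    return ranges


def allows(text, count):
    """Return True if count satisfies the multiplicity written as text."""
    return any(
        low <= count and (high is None or count <= high)
        for low, high in parse_multiplicity(text)
    )
```

With this sketch, allows("2..*", 1) is False because a family needs at least two persons, while allows("3,5,7", 5) is True.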
The relationship between the family and its vehicles is an associative,
or "uses," relationship. In other words, the family uses one or
more vehicles as needed, but continues to be a family even if no vehicles
are present. The flip side of that coin is that a vehicle is still a vehicle,
whether a family is driving it or not. In the case of an association, a
label is typically applied to indicate the nature of the relationship; in
this case, that's represented by the word "drives." An associative
relationship will also usually have a multiplicity indicator, so this relationship
tells us that a family may drive "from zero to many" vehicles.
The next relationship is a "generalization" or "is-a"
relationship, which refers to the principle of inheritance. In this diagram,
"car," "sport utility," and "mini-van" are
each a specialization of the more general "vehicle" superclass.
This relationship is represented with lines running from the superclass
to its subclasses, with a hollow arrow pointing to the superclass. There's
no need for a multiplicity indicator with a generalization relationship.
The final relationship in this diagram is another type of aggregation called
"composite aggregation." This relationship is still a "contains"
or "has-a" relationship, but it implies a stronger ownership between
the whole and its parts. Usually, this means that the parts are created
and destroyed with the whole and that the entire aggregate is viewed externally
as a single entity. A simple example of a composite aggregation would be
the controls that exist inside a VFP container class. When the class is
instantiated, the controls are as well; when the class is destroyed, the
container takes its controls with it. This type of relationship is indicated
by a solid diamond on the "whole" side of the "whole-part"
composite and a straight line leading from the diamond to the "parts."
In the preceding example, this notation is used to indicate that the "sport
utility" class has a "ski rack" on it and that there's strong
ownership by the vehicle of the ski rack (that is, when the vehicle goes,
so does the ski rack). The multiplicity indicator in this relationship indicates
that a single sport utility vehicle may contain "zero or one"
ski rack.
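The generalization and composite aggregation relationships could be sketched as follows. Again, this is Python used only for illustration, and the with_ski_rack argument is a hypothetical way of expressing the "zero or one" multiplicity.

```python
class SkiRack:
    pass


class Vehicle:
    def __init__(self, label):
        self.label = label


# Generalization ("is-a"): each subclass is a specialized Vehicle.
class Car(Vehicle):
    pass


class MiniVan(Vehicle):
    pass


class SportUtility(Vehicle):
    def __init__(self, label, with_ski_rack=False):
        super().__init__(label)
        # Composite aggregation (0..1): the rack is created by the vehicle
        # and is only reachable through it, so it goes when the vehicle goes.
        self.ski_rack = SkiRack() if with_ski_rack else None


plain = SportUtility("family SUV")
equipped = SportUtility("ski trip SUV", with_ski_rack=True)
```

The difference from the earlier aggregation is where the part is created: the SkiRack is built inside the SportUtility rather than passed in from outside.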
The pieces and parts of a class
When providing the details for the design
of a single class, a number of elements are added to the class entity on
the diagram to indicate the various attributes of the class. The Class diagram
shown in Figure 3 is an example of a more detailed visual description of a class and its two subclasses.
In this example, the "FileOutput" class is the superclass for
the "SpreadsheetOutput" and "TextFileOutput" subclasses.
In each case, the rectangle representing a class is divided into three different
parts: the class name at the top, the class's attributes (properties) in the
middle, and the class's operations (methods) at the bottom. You might
have noticed that the name of the FileOutput class has been italicized.
This indicates that the class is designed as an abstract class, meaning
that its purpose is to provide a known interface that's shared by its subclasses
and that those subclasses provide the implementation. You might occasionally
see the word "{abstract}", including the braces, below the italicized
class name, but it's an optional notation. In this example, the only method
that's abstract is the superclass's CreateFile method. It can't be implemented
in the FileOutput class because the steps required to create a file are
dependent on the specific type of file to create. The other methods aren't
italicized, indicating that they've been implemented in the FileOutput class
definition.
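For readers who think better in code, here's a rough sketch of what "abstract" means in practice, using Python's abc module for illustration. The CreateFile, tcExportFile, and nCopies names come from the diagram; the Describe method is a hypothetical stand-in for the methods that are implemented in the superclass.

```python
from abc import ABC, abstractmethod


class FileOutput(ABC):
    """Abstract superclass: defines the interface its subclasses share."""

    def __init__(self):
        self.nCopies = 1

    @abstractmethod
    def CreateFile(self, tcExportFile):
        """Italicized in the diagram: every concrete subclass must implement it."""

    def Describe(self):
        # Hypothetical concrete method, implemented once in the superclass.
        return f"{type(self).__name__} x{self.nCopies}"


class TextFileOutput(FileOutput):
    def CreateFile(self, tcExportFile):
        # Placeholder body; a real version would write the text file here.
        return True


try:
    FileOutput()  # the abstract class can't be instantiated directly
    instantiated = True
except TypeError:
    instantiated = False
```

Here the language enforces what the italics only document: the superclass can't be instantiated until a subclass fills in CreateFile.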
To the left of each method and property name is a symbol indicating the
"visibility" of that class member. A "+" sign indicates that
the class member is public to external objects at runtime. The "#"
sign is the notation for "protected," and a hyphen ("-"),
which isn't shown on this diagram, is used to indicate "private"
visibility (or "hidden," as it's called in VFP).
Following the name of each property is a colon and a data type indicator.
This is used to identify the type of value that the property will contain.
Sometimes, when a property contains a reference to another object in the
system, the name of the object or class definition is used. This makes the
purpose of the property much clearer than if the word "object"
alone were to be used. In the case of the "nCopies" property,
an "=" sign and the number 1 follow the data type ("= 1").
This indicates that the property has a default value assigned to it at design
time.
A similar notation set is used for describing a class's methods and the
attributes of each method. For example, the colon and data type that follow
three of the methods in FileOutput indicate that those methods have a logical
return value (to indicate the method's success). This is another instance
where an object or class name can be used if the purpose of a method is
to return an object reference. In the case of the CreateFile method, there's
a single parameter, called "tcExportFile." Parameter notation
also uses the colon-data type to indicate the expected data type of the
parameter (in this case, a string). If a parameter has a default value,
the same notation that's used with a property's default value is used (the
= sign plus the default value).
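The notation rules for properties and methods are regular enough that they can be generated mechanically. The sketch below, in Python for illustration, builds the text for a class member from its parts. The function names are mine, and the type names ("Integer," "String," "Logical") and the visibility chosen for each member are assumptions rather than values read from Figure 3.

```python
VISIBILITY = {"public": "+", "protected": "#", "private": "-"}


def attribute_text(name, data_type, visibility="public", default=None):
    """Build the text for a property: symbol, name, colon-type, optional default."""
    text = f"{VISIBILITY[visibility]}{name}: {data_type}"
    if default is not None:
        text += f" = {default}"
    return text


def operation_text(name, params=(), return_type=None, visibility="public"):
    """Build the text for a method.

    params is a sequence of (name, type) or (name, type, default) tuples.
    """
    parts = []
    for pname, ptype, *rest in params:
        part = f"{pname}: {ptype}"
        if rest:
            part += f" = {rest[0]}"
        parts.append(part)
    text = f"{VISIBILITY[visibility]}{name}({', '.join(parts)})"
    if return_type:
        text += f": {return_type}"
    return text


copies = attribute_text("nCopies", "Integer", default=1)
create = operation_text("CreateFile", [("tcExportFile", "String")], "Logical")
```

Under these assumptions, copies holds "+nCopies: Integer = 1" and create holds "+CreateFile(tcExportFile: String): Logical".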
Moving down to the subclass definitions, you'll notice that the names of
the subclasses are no longer italicized, indicating that these are concrete
classes. They implement the missing functionality from the superclass interface
and, while they can be subclassed further, their purpose isn't necessarily
to be extended or specialized. Also, a default value has been assigned to
the "cDefaultFileExt" property to indicate the default file extension
for each of these classes. Since the purpose of the SpreadsheetOutput class
is to create spreadsheets, it's been assigned ".XLS" as its default
extension, and the TextFileOutput class is using ".txt" as its
default.
Finally, the CreateFile method is no longer italicized in the subclass definitions.
This is because each subclass provides the implementation for the method that
the superclass declares but leaves abstract. Again, these classes could be further
subclassed and extended, but their purpose is to supply concrete behavior for
the method they inherit as part of the superclass interface.
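As a rough illustration of the subclass side of the diagram, the sketch below shows two concrete classes that each override the default extension and implement CreateFile. It's Python, used only for illustration; the ResolveFileName helper and the cFileName property are hypothetical, and the CreateFile bodies are placeholders that record a file name instead of writing a file.

```python
import os


class FileOutput:
    cDefaultFileExt = ""  # each concrete subclass assigns its own default

    def CreateFile(self, tcExportFile):
        # Stand-in for the abstract method: only subclasses know how to create a file.
        raise NotImplementedError

    def ResolveFileName(self, tcExportFile):
        """Hypothetical helper: append the default extension when none is given."""
        root, ext = os.path.splitext(tcExportFile)
        return tcExportFile if ext else root + self.cDefaultFileExt


class SpreadsheetOutput(FileOutput):
    cDefaultFileExt = ".XLS"

    def CreateFile(self, tcExportFile):
        self.cFileName = self.ResolveFileName(tcExportFile)
        return True  # a real implementation would write the spreadsheet here


class TextFileOutput(FileOutput):
    cDefaultFileExt = ".txt"

    def CreateFile(self, tcExportFile):
        self.cFileName = self.ResolveFileName(tcExportFile)
        return True  # a real implementation would write the text file here


sheet = SpreadsheetOutput()
sheet.CreateFile("report")
```

After the last line, sheet.cFileName is "report.XLS"; the same call on a TextFileOutput would produce "report.txt".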
Learning more about the UML
As you might imagine, there's a growing
set of resources and tools available to go in-depth with the UML specification.
Should you research this powerful new tool on your own, you'll find that
I've just barely scratched the surface in this column. However, even if
you don't get any further into it, I think the notation presented and explained
here can be put to use immediately as you work to document your system designs
or collaborate with others to come up with a design.
Two of the books I've found to be especially helpful in learning the basics
of the UML are:
- UML Toolkit,
by Hans-Erik Eriksson and Magnus Penker (John Wiley & Sons, Inc., ISBN
0-471-19161-2).
- Applying UML and Patterns, by Craig Larman (Prentice Hall, ISBN 0-13-748880-7).
The first book, in particular, does a good job of teaching the basics of
the UML without getting bogged down in the very detailed specifics that
can be found within the specification. The second book covers the UML and
design patterns at once, so it doesn't go into quite the same detail about
the specifications as the first book. Both books assume a basic understanding
of object-oriented principles and theories.
You should also go to Rational Software Corporation's Web site at http://www.rational.com.
The authors behind the creation of the UML work for Rational, and the company's
Web site is the "official" location of the UML specifications.
You can download Adobe Acrobat files, get links to other UML sites, or view
UML tutorials online. The company also makes a software-modeling product
called Rational Rose that's worth looking at. It's fairly expensive, but
it's an amazingly powerful tool for creating system models and even turning
those models into "shells" of classes that you implement. It will
do this for C++ and Visual Basic by default, but you can get import/export
wizards for Visual FoxPro that let you model a component, and it creates
the .VCX and class definitions according to the model (sadly, you still
have to write some code). A trimmed-down version of Rational Rose is available
in the form of Microsoft's Visual Modeler tool. This can be found at http://www.microsoft.com
and is included with the Visual Studio development suite. While it's nowhere
near as powerful as Rational Rose, you might still find it useful, and it's
worth checking out.
As for creating diagrams for documentation purposes, I've found Visio Professional,
by Visio Corp., to be an excellent tool for UML diagrams. While it's a general-purpose
business diagramming package, it comes with a variety of templates and wizards
for software diagrams. When creating a new class definition, you can go
into a simple wizard-like interface to describe the members of the class,
data types, default values, and visibility. Using the Visio interface, you
don't have to remember all the semantics of the UML notation because the
wizard tools will create the visual elements based on the input you provide.
There's also a UML Navigator for finding your way through the various levels
of granularity in your diagrams, as well as a UML Semantics Checker that
lets you know when something in your design doesn't make sense with regard
to the specifications. You can find out more about this product at http://www.visio.com.
Now go draw some pictures
Once you've accepted as a given that you
need to document your designs, I think you'll find the UML to
be an incredibly powerful tool for that purpose. Imagine trying to write
a textual description of some of the diagrams presented in this month's
column. It would take a lot more time than it should, and it probably wouldn't
be as clear to everyone who reads it as a simple diagram would be.
Do yourself a favor and spend some time reading about the UML or checking
out the various online resources. The people who need to decipher the designs
in your next project will thank you. I'll be back next month with yet another
design-oriented topic, but feel free to let me know if there are things
you'd like to read about, or if you have suggestions and comments for the
Best Practices column.
Jefferey A. Donnici is the senior
Internet developer at Resource Data International, Inc., in Boulder, CO.
Because he has no inner child, drawing software diagrams gives him a twisted
sense of creativity he never got out of those finger-painting classes. Jeff
is a Microsoft Certified Professional and a four-time Microsoft Developer
Most Valuable Professional. 303-444-7788, fax 303-444-1286, jdonnici@compuserve.com,
jdonnici@resdata.com.