Profile Your Source Code

Put away your stopwatch. With source code profiling, you uncover the code's architecture and probe it for soft spots.

by Ash Rofail

Reprinted with permission from Visual Basic Programmer's Journal, 6/98, Volume 8, Issue 7, Copyright 1998, Fawcette Technical Publications, Palo Alto, CA, USA. To subscribe, call 1-800-848-5523, 650-833-7100, visit www.vbpj.com, or visit The Development Exchange at www.devx.com.

Where software development is concerned, everyone likes a fresh start. Instead, you often wind up taking over existing projects or revisiting old ones. And when you do, chances are you suffer a lot of down time identifying the pieces and trying to recall, or puzzle out, why things were done the way they were. Only a masochist would relish doing this kind of reverse engineering.

I'd like to lend a hand by showing you how to profile source code and analyze a project effectively. In this process, you deconstruct an existing project into its functional components, identify areas of weakness, assemble important information about the source code, and build an architectural sense of the project as a whole. Furthermore, you inoculate new projects against chaos by enforcing development standards.

You can pull statistical information out of every VB project, such as the number of files and lines of code, along with source size (see Figure 1). You can also extract the number of procedures, local and nonlocal variables, and constants, as well as the names of the source files that are oldest and newest, and shortest and longest. You can also get optimization information such as the identity and location of dead procedures, along with dead variables, constants, types and enums, untyped variables, and name-shadowing information (see Figure 2). Furthermore, you can get project design information, one of the more important types of information available, despite how often it's overlooked.
Here you find informational complexity, nested loops, structural fan-out, informational fan-in times fan-out, and cyclomatic complexity of procedures (measuring the number of linearly independent paths through a program module). For procedures, "structural fan-in" measures the number of procedures that use a given procedure or variable; "structural fan-out" measures the number of procedures called by a given procedure. For modules, structural fan-in measures the number of modules that use variables, constants, or procedures in a given module; structural fan-out measures the number of modules whose variables, constants, or procedures are needed by the given module. In testing terms, "cyclomatic complexity" is the number of test cases you need to exercise every linearly independent path through a procedure. Informational fan-in/fan-out adds the global variables and parameters a procedure references to the calculation. Later on I'll have more to say about these concepts.

You also can derive metrics indicating code understandability, including comments-to-code ratio, white-space-to-code ratio, nested conditionals, and reusability. You'll have trouble tracking reusability if you don't have a tool that detects structural fan-in per number of procedures. I'll discuss this in detail later, along with some of the more exotic terminology I'm using.

I've provided a profiler that performs all the tasks I've listed, along with its source code (available from the free, Registered Level of The Development Exchange; see the Code Online box for details). The profiler doesn't write any information back into your VB project, so you can use it freely. Profiling your projects helps produce tight, organized, optimized source with high understandability (for other VB source code profilers, see the sidebar, "Commercial Code Profiler Products").
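
As a rough illustration of structural fan-in and fan-out for procedures as defined above, here is a minimal Python sketch. It is not the profiler's source code: the call graph, the procedure names, and the helper functions are all hypothetical, and it assumes something else has already extracted "who calls whom" from the project.

```python
from collections import defaultdict

# Hypothetical call graph: each procedure maps to the procedures it calls.
# All names are invented for illustration.
CALLS = {
    "Main": ["LoadProject", "ShowReport"],
    "LoadProject": ["ParseFile", "LogError"],
    "ShowReport": ["FormatLine", "LogError"],
    "ParseFile": ["LogError"],
    "FormatLine": [],
    "LogError": [],
    "OldExport": ["FormatLine"],  # nothing calls this one
}


def structural_metrics(calls):
    """Return {procedure: (fan_in, fan_out)} for a call graph."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in set(callees):
            callers[callee].add(caller)
    procedures = set(calls) | set(callers)
    return {
        name: (len(callers[name]), len(set(calls.get(name, []))))
        for name in procedures
    }


def dead_procedures(calls, entry_points):
    """Procedures with fan-in 0 that are not known entry points."""
    metrics = structural_metrics(calls)
    return sorted(
        name
        for name, (fan_in, _fan_out) in metrics.items()
        if fan_in == 0 and name not in entry_points
    )


if __name__ == "__main__":
    for name, (fan_in, fan_out) in sorted(structural_metrics(CALLS).items()):
        print(f"{name:12} fan-in={fan_in} fan-out={fan_out}")
    print("dead candidates:", dead_procedures(CALLS, entry_points={"Main"}))
```

Treating "fan-in of zero" as dead code is a simplification. In a real VB project, event handlers and other procedures invoked by the runtime have no callers in the source, so they would need to be listed as entry points before anything is deleted.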
Begin by extracting statistical information (see Listing 1). Query the project to reveal the number of lines and size of your code. This query helps evaluate the quality of the software when you compare code size with the number of defects found in the system. It also helps estimate the cost of developing a new system based on the number of programmer hours per thousand lines of code.

OPTIMIZE CODE FOR CLARITY

After extracting these metrics, you can start figuring out how to improve them. First, eliminate the overhead incurred by unused variables and dead code. Under the pressure of getting software out the door, developers tend to shortchange what's under the hood in favor of a glitzy exterior. It's like polishing and waxing your car every weekend but never changing the oil.

And optimization means a lot more than speed-tuning your code. Code also needs lucidity, so other developers (or you yourself, six months from now) and reviewers can understand your code quickly. The design-quality report in my profiler weighs your code in this light, with numbers such as comment-to-code and white-space-to-code ratios. Commenting isn't a substitute for making the code inherently clear, but it adds clarity. The same goes for white space, which confers legibility.

To achieve clarity, you also need to clear out all the deadwood. Developers often err by declaring their own module-level procedures and variables without first checking whether a procedure has already been declared and used elsewhere. This makes code both obese and hard to understand. Don't use VB's forgiving nature as a license to write loose, sloppy code. And you can't depend on VB's compiler. It halts only at compile errors. You don't get warnings about redundant declarations, as you do with the VC++ compiler.

Next, tackle the code's complexity. You can use a number of techniques to measure this directly from source. Most of the techniques involve either control flow or some measure of program size.
For example, McCabe's cyclomatic method deals with knot calculations. Knot counts measure the number of excursions from a sequential execution of processing nodes. Suppose node y can be entered directly from node x only on the determination of an intervening predicate that forces control to a node beyond y. In this case, one node is added to the total count, and your cyclomatic complexity goes up. The higher the number, the more complex the procedure, and the harder it is to maintain.

For VB procedures, cyclomatic complexity equals the number of branches plus 1. Branches grow from If, Select Case, Do...Loop, and While...Wend statements. "Normal" values for cyclomatic complexity range from 1 (simple) to 9 (moderately complex). If cyclomatic complexity exceeds 10, consider splitting the procedure. You should carefully test procedures with high cyclomatic complexity values. You'll often get high values for procedures with long Select Case statements.

Also, the nested-conditionals metric is related to cyclomatic complexity. Whereas cyclomatic complexity deals with the absolute number of branches, the number of nested conditionals shows how deeply nested these branches are. Try to split up procedures with deeply nested conditionals. Such procedures can be hard to understand and error-prone. Also, counting nested loops helps you estimate a procedure's mathematical complexity. If you have lots of nested loops, they'll probably take lots of time to execute.

MEASURE FAN-IN AND FAN-OUT

You can also gauge complexity with structural fan-in/fan-out, derived from structured design, one of L.L. Constantine's most widely used strategies. Structured design focuses on the couplings of modules that make them interdependent, the self-sufficiency of modules (called cohesion) that promotes their independence, and the architectural relationships among modules.

For procedures, a high structural fan-in number means the procedure is called (that is, reused) many times, which is a positive sign.
A high structural fan-out indicates how much the procedure depends on other procedures, which suggests high complexity. For modules, a high structural fan-in denotes reusable code. All these metrics suggest a procedure's or module's complexity, but they don't prove it. For example, a procedure might access global variables and be quite complex, yet not call many other procedures.

You also need to look at informational fan-in/fan-out and informational complexity. Informational fan-in equals procedures called plus parameters referenced, plus global variables referenced. This estimates the information a procedure reads. Informational fan-out equals procedures that call a given procedure plus [ByRef] parameters assigned to it plus global variables assigned to it. This estimates the information a procedure returns. Combined, these give a new metric: informational fan-in times fan-out. This metric helps predict the effort needed for implementing a procedure, but it doesn't help much with predicting complexity. For that we need a new metric: informational complexity, which equals lines of code times (informational fan-in times informational fan-out).

I've described what you can profile. But when should you profile your code? My answer: at all stages. Say you've just created a new project. You don't have a lot of information to analyze yet. But profiling will give you a baseline measure you can build on, like the marks you put on the back of a door to track how your kids are growing. Thereafter, run the profiler at every engineering or code milestone, watching for sudden changes in the project, raising red flags when necessary.

And you should run the profiler on projects that have been developed over time. If you can, start to clean up these projects. You'll be amazed at how many controls are no longer in use and how many large modules contain only one function in use. Clearing out all that stuff not only shrinks your code's footprint, but it also makes it execute faster.
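
To make the complexity formulas from this section concrete (cyclomatic complexity as branches plus 1, informational fan-in and fan-out, and informational complexity), here is a minimal Python sketch. It is an illustration only, not the profiler's implementation: the `ProcedureStats` class, its field names, and the sample counts are hypothetical, and it assumes the raw counts have already been gathered for one procedure.

```python
from dataclasses import dataclass


@dataclass
class ProcedureStats:
    """Raw counts for one procedure (field names are illustrative)."""
    lines_of_code: int
    branches: int               # If, Select Case, Do...Loop, While...Wend
    procedures_called: int
    parameters_read: int
    globals_read: int
    callers: int                # procedures that call this one
    byref_params_assigned: int
    globals_assigned: int

    @property
    def cyclomatic(self) -> int:
        # Number of branches plus 1.
        return self.branches + 1

    @property
    def info_fan_in(self) -> int:
        # Estimates the information the procedure reads.
        return self.procedures_called + self.parameters_read + self.globals_read

    @property
    def info_fan_out(self) -> int:
        # Estimates the information the procedure returns.
        return self.callers + self.byref_params_assigned + self.globals_assigned

    @property
    def info_fan_in_x_out(self) -> int:
        return self.info_fan_in * self.info_fan_out

    @property
    def informational_complexity(self) -> int:
        # Lines of code times (informational fan-in times fan-out).
        return self.lines_of_code * self.info_fan_in_x_out

    @property
    def consider_splitting(self) -> bool:
        # Threshold taken from the article: complexity above 10.
        return self.cyclomatic > 10


if __name__ == "__main__":
    # Invented sample counts for a single procedure.
    stats = ProcedureStats(
        lines_of_code=40, branches=4,
        procedures_called=3, parameters_read=2, globals_read=1,
        callers=2, byref_params_assigned=1, globals_assigned=1,
    )
    print("cyclomatic complexity:", stats.cyclomatic)
    print("informational fan-in x fan-out:", stats.info_fan_in_x_out)
    print("informational complexity:", stats.informational_complexity)
    print("consider splitting:", stats.consider_splitting)
```

Recording these numbers at each milestone gives the baseline described above: a sudden jump in any of them between two runs is the kind of change worth a red flag.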
Some developers run the profiler and become discouraged; others get similar results but just shrug. It's hard to avoid one extreme or the other as your deadlines approach. "If it runs, it ships" is the guiding light for many shops. And VB lets you get away with it, to some extent, as I've said. But this leads to poor design and mismanaged projects.

One of the most common errors made in development involves multiple developers on the same program. I'm not knocking team development, but rather the process of one developer starting something and abandoning it half-documented, followed by another developer taking over in a hurry and plowing ahead without taking time to grasp what has already been done. This situation leads to gobs of unnecessary code. Run the profiler's Dead Code detection procedure on such projects and you'll see. Then make sure every developer takes the time to document the program's design.

I know I'm giving hard advice to follow. Perhaps you've been working on your program for months, or maybe you've even delivered it to your users. But do it anyway. Otherwise, the problems will only get worse as you wait and continue development in the same program. My Call Tree option, which can be run by form or by module, can help you document the flow of logic and information in your project (see Figure 3). Give every developer working on the project a printout of the resulting diagram. Have each pin it to his or her corkboard next to the family photos. Afterward, adding new pieces or modifying existing ones will happen with relative ease.

CREATE STANDARDS, THEN ENFORCE THEM

Even when a shop develops standards, they tend to fall by the wayside during the crush of development. And often the standards are unrealistically complex, giving everyone an excuse to abandon them after the first flush of enthusiasm.

Once I was asked to provide different coding standards for a group to implement.
Six months later, I asked the group leader how well his group was keeping up with the standards. He said, "We're still evaluating which one to use." Needless to say, don't do that.

If you aren't enforcing a standard now, develop a simple one and build on it as you go. But simple doesn't mean slack. Don't just limit how developers name variables and functions; also mandate a maximum ratio of lines of code to lines of comments, and how (and how often) code should be checked into revision control systems. Standards should also cover formatting code for legibility. Above all, you should implement code review sessions. I can't recall how many times I've sat in on such sessions and heard developers talk about how they're trying to create a routine, only to discover that another developer in the room had already written it. These sessions are priceless.

The profiler also tells you a lot about design quality: both the quality of the given project and of the process that produced the project. The process affects numerous projects, so that's what you should focus on. The profiler's design-quality report doesn't explicitly detect process deficiencies, but it uncovers patterns of poor design quality. These could point the way to areas where the process could be improved. Lack of specs and bad architecture certainly contribute to bad code design. If you find these warning signs, take a time-out to review how you make design decisions. I know, you thought being a programmer meant programming. Well, it does, and a lot more.

You might think these procedures are overkill when it comes to managing information in the project, let alone managing program logic errors. But after applying these techniques and using the free profiler program, you can quickly start working on your projects and dramatically improve their quality as well as their performance. Start right now. Install my profiler and attack that nasty project you've been avoiding.
You can even hoist me by my own petard: try using the profiler to profile its own source code. I profiled Microsoft's VisData and had a lot of fun with it (see the sidebar, "Profiling Microsoft's VisData Program").

Ash Rofail is the principal software engineer and UI architect at Best Software. He is a frequent contributor to VBPJ. Ash specializes in VB, C++, Java, SQL Server, and COM. You can reach him by e-mail at Ash_Rofail@Bestsoftware.com.