Know the difference between what you can fix, and what you can’t

Benefits

Reduces application errors

Description

Self-repair and install-on-demand are two of the cool new features of the Microsoft® Windows® Installer, and they can dramatically lower the total cost of ownership (TCO) of your application. These features can really improve the resiliency and robustness of your program by letting you carry out repairs dynamically, but they’re not going to solve all your problems. It’s important to know what you can’t fix.

Imagine a typical scenario: A user tries to start your application or use a feature and it doesn’t work. They’re given an error message that they don’t understand and then spend the next half-day trying to get the application working again. For people to use your application comfortably, it needs to be reliable. Using the application isn’t the point in itself; the application is supposed to help users do their jobs. So, how do you identify and classify problems so your application can repair itself quickly and easily?

A good starting point is to isolate the components of your application (and thus its problems) as much as possible. Be able to load just your core product with little else, and then check for items as you need them. Avoid checking and loading more than is required. If a user wants to run a spell check, there’s no sense in also checking whether clip art is working!
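One way to picture checking on demand is the small sketch below. It is only an illustration in Python, not part of the Windows Installer; the class name FeatureChecks and its methods register and is_usable are hypothetical.

```python
from typing import Callable, Dict


class FeatureChecks:
    """Runs a feature's health check only when that feature is first requested.

    Hypothetical helper for illustration; not a Windows Installer API.
    """

    def __init__(self) -> None:
        self._checks: Dict[str, Callable[[], bool]] = {}
        self._results: Dict[str, bool] = {}

    def register(self, feature: str, check: Callable[[], bool]) -> None:
        # Registering is cheap: nothing is checked or loaded yet.
        self._checks[feature] = check

    def is_usable(self, feature: str) -> bool:
        # Check only the requested feature, and only once per session.
        if feature not in self._results:
            self._results[feature] = self._checks[feature]()
        return self._results[feature]
```

If both a spell check and a clip art check are registered, calling is_usable("spell_check") runs only the spell check; the clip art check stays untouched until someone actually asks for clip art.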

Methods for checking components range from the simplest (is it physically there?) to the complex (is the checksum for the component correct?). As much as possible, a component should be responsible for checking the components it uses and its own support files. For example, if a spell checking DLL has a supporting dictionary file, the main application would check for the spell checking DLL, but the spell checking DLL would have to make sure the dictionary was there and in good working order.
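The two ends of that range can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name check_component, the choice of SHA-256 as the checksum, and the three result strings are all hypothetical, and the known-good digest is assumed to come from wherever your setup records it.

```python
import hashlib
from pathlib import Path
from typing import Optional


def check_component(path: str, expected_sha256: Optional[str] = None) -> str:
    """Return "missing", "corrupt", or "ok" for one component file."""
    file = Path(path)
    # Simplest check: is the file physically there?
    if not file.is_file():
        return "missing"
    # No known-good digest supplied, so presence is all that is verified.
    if expected_sha256 is None:
        return "ok"
    # More thorough check: compare the file's digest with the known-good value.
    digest = hashlib.sha256()
    with file.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "ok" if digest.hexdigest() == expected_sha256.lower() else "corrupt"
```

In the spell checking example, the main application might call this on the DLL alone, while the spell checker itself would call it on its dictionary file.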

Your application doesn’t have to check its components before it attempts to use them; doing so assumes failure. Instead, you could assume success and then trap any subsequent errors.

For instance, if you try to load your DLL and LoadLibrary or GetProcAddress fails, you know you need to do a repair.
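The assume-success approach might look like the sketch below. Python's ctypes module is used here as a stand-in for calling LoadLibrary and GetProcAddress directly: a failed load raises OSError and a missing export raises AttributeError. The function name load_entry_point and the repair callback are hypothetical; how a repair is actually triggered is left to your application.

```python
import ctypes
from typing import Any, Callable, Optional


def load_entry_point(
    library_path: str,
    symbol_name: str,
    repair: Callable[[str], bool],
) -> Optional[Any]:
    """Load a function from a shared library, attempting one repair on failure."""
    for attempt in range(2):
        try:
            library = ctypes.CDLL(library_path)   # OSError if the load fails
            return getattr(library, symbol_name)  # AttributeError if the export is missing
        except (OSError, AttributeError):
            # Only after a real failure is the cost of a repair paid.
            # Give up if this was already the retry or the repair reports failure.
            if attempt == 1 or not repair(library_path):
                return None
    return None
```

No check runs on the normal path, so a healthy installation pays nothing extra. A return value of None means the repair did not help and the caller should report the problem rather than retry again.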

Remember, though, that most of the time things aren’t broken, so make sure you’re aware of any negative impact on performance. If you can check quickly and easily (or only when something has already failed, but before reporting it to the user), then do so. If checking makes your application’s startup take an extra minute, you should probably rethink your strategy.

The bottom line is: don’t go overboard. Know when to say, “something’s wrong, but I can’t fix it,” then give the user a good error message and degrade gracefully.
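As a rough illustration of degrading gracefully, the Python sketch below turns off a feature that cannot be fixed and keeps everything else running. The class name Features, the wording of the message, and the use of OSError to signal an unfixable failure are all assumptions made for this example.

```python
from typing import Callable, Dict, Tuple


class Features:
    """Tracks features that failed beyond repair so the rest of the app keeps running."""

    def __init__(self) -> None:
        self.disabled: Dict[str, str] = {}

    def run(self, name: str, action: Callable[[], str]) -> Tuple[bool, str]:
        # A feature already known to be broken is not retried on every use.
        if name in self.disabled:
            return False, self.disabled[name]
        try:
            return True, action()
        except OSError as error:
            # Not fixable here: say what failed and why, then carry on.
            message = (
                f"{name} is unavailable: {error}. "
                "Other features are not affected."
            )
            self.disabled[name] = message
            return False, message
```

The caller gets a flag plus text it can show to the user, so one broken feature produces a single understandable message instead of a failure that takes the whole application down.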
