Genetic Algorithms

Evolve Simple Solutions to Complex Problems

Use simple, iterative algorithms modeled on genetics to solve the Traveling Salesperson problem.

By Jonny Anderson

Reprinted with permission from Visual Basic Programmer's Journal, January 2000, Volume 10, Issue 1, Copyright 2000, Fawcette Technical Publications, Palo Alto, CA, USA. To subscribe, call 1-800-848-5523, 650-833-7100, visit www.vbpj.com, or visit The Development Exchange.

The rule seems to be that complex problems require complex solutions. For example, take the act of delivering something across a complex network. The properties that determine route efficiency tend to be cross-linked and codependent. It's often extremely difficult to obtain efficient routes in such an environment, except through "guesswork, mitigated by experience," as the saying goes. Typically, you achieve such solutions using only human experts, but I'll show you how to employ simple, iterative algorithms to solve just such a complex problem.

For example, consider the computationally difficult Traveling Salesperson Problem (TSP). This describes a salesperson and his or her need to get from one city to another city by the shortest possible route. Each city on the map has many roads that connect it to other cities, so there are many possible routes between any two cities. The trick: deciding which route is shortest.

The TSP is a simple variant of the network-efficiency problem where the shorter a route, the more efficient it's considered to be. The interesting thing: You can solve for any route parameter or combination of parameters if you can successfully solve for short routes. Therefore, the TSP serves as a good template for many common but defiantly tricky problems typified by interconnected causal webs.

Genetic algorithms (GAs) are programs that simulate the logic of Darwinian selection. Understanding a GA means understanding the simple, iterative processes that underpin evolutionary change. GAs generate solutions that are good enough from large populations of continually changing solutions, where only the best members of a given population can be selected for replication. From a GA's perspective, the best members of a population are those that conform most closely to some user-defined ideal. Consider the TSP: The shorter the route, the closer it is to an unknown ideal shortest route, and so the better or more fit it is. This means GAs don't work out the solution, but a solution-exactly what you need to crack multiconstraint problems such as the TSP. The sample application's GA, GAweb, shows you a general approach to solving multiconstraint problems while specifically solving the TSP. You can download the full source for GAweb from the VBPJ Web site; see the Go Online box for details.

GAweb solves the TSP by finding short routes between two nodes sitting somewhere in a randomly interconnected web-this is a generic way of expressing the TSP. GAweb requires only two fundamental objects to model the TSP environment of cities-interconnected-by-roads: a Link (road) object and a Node (city) object. Each Link object needs to know which two nodes it links together and its own length, which it derives from the nodes' positions. Each Node object needs to know its position (X, Y, Z properties) and must contain a collection of references to the links that radiate from it (see Figure 1).

The simple shortest-route version of the TSP needs only the Length property, but you could create new properties for the nodes and links and make them stand for anything you like. For example, you might want to add a Gradient property to each link, enabling routes to be both short and mostly downhill. GAweb begins by building a 3-D web to accurately model a real-life map. Remember that map contour lines normally show the Height dimension on a 2-D sheet of paper. Doing this requires that GAweb generate random X-, Y-, and Z-coordinates for all the nodes, scattering them throughout 3-D space. Next, it needs to link them up.

Link Nodes Together

You create a web by repeatedly linking pairs of nodes together. A link is created when you send two randomly selected nodes to the AddLink procedure (see Listing 1). In this procedure, a new Link object is instantiated and the two nodes are passed on to its LinkUp method. This method calculates the link's length by comparing the X-,Y-, and Z-properties of its two terminating nodes (see Listing 1). The new link is also added to the links collections of both nodes, making the link bidirectional. The web grows as pairs of nodes are linked in this manner until a user-defined number of links is attained. You can now use the web of interlinked nodes to generate routes from one node to another.

GAweb creates routes by selecting a start node and picking a link from its list of links; it uses this link to jump to the node at the other end. A route is complete if this new node turns out to be the preselected target node. If not, GAweb consults the new node's list of links, selects a new link, jumps to the next node, and so on, ad infinitum or until it finds the target node.

In this way GAweb creates a population of routes that connect the start and target nodes; this gives you the entire population of valid routes. GAweb stores and maintains this population by providing a Routes collection class. The Route class contains a reference to its start and target nodes-it's really just a giant Link object in this respect-and collects another type of object called Section. A Section object is the atomic unit of a route and is a container for one node reference and one link reference. Laying several Section objects end-to-end-links to the left, nodes to the right like little tadpoles-creates a route that starts with a dangling link to the left and terminates in the target node at the far right. This arrangement lacks a start node on the far left-hand side (see Figure 1), but the Route object contains a reference to the start node, which means the route is complete from start to target.

Repeating the routing process for the same start and target nodes and saving the results into the Routes collection gives you a population of valid routes of differing sizes. The longer routes might look bad, but they might contain sections of inspired shortcut taking. It is the GA's job to find and recombine those shortcuts with others, amplifying the population's existing shortcuts to create the brand new super-short routes required to solve the TSP. It is here that you harness the simple, iterative power of evolution.

Life's Bare Necessities

The primary actions for any GA are replication (of the fit over the unfit), mutation, and crossover. Replication provides the ratchet, mutation the novelty, and crossover the mixing. Only the best mutations survive long enough to be churned and dispersed by crossover, and only the best of those can hope to be amplified by the replication of their host route.

Most GAs include a loop that drives the iterative evolutionary process; GAweb is no exception:

Public Sub Evolve(Generations As Long)
For Index = 1 to Generations
If RndNum(0, ProbBase) < ProbMutate Then _
GA_Mutate
If RndNum(0, ProbBase) < ProbCross Then _ 
GA_CrossOver
Next
End Sub

The RndNum function in this code returns a random number between zero and the value of ProbBase, a user-defined parameter accessible through the GAweb Properties dialog. This parameter usually has a value of around 1,000. ProbMutate and ProbCross are also user-defined parameters, with typical values of 1 and 600, respectively. These parameters and the RndNum function mean that the first IF statement has a one in 1,000 (0.1%) chance of being True. In other words, a mutation occurs roughly once in every 1,000 generations. The second IF statement has a 600 in 1,000 (60%) chance of being True, so about 600 crossovers occur every 1,000 generations. However, you can never tell whether the IFs will evaluate to True or False, even if the probability variables remain unchanged. You know only that they have a tendency one way or the other. I call these simple logical devices fuzzy switches, and they are great for keeping the system fluttering around the "perfect temperature" (see the sidebar, "Poised on the Edge of Chaos").

The first fuzzy switch controls mutation. Mutating a given route requires GAweb to select a section randomly from somewhere along its length (the Sections collection), dump all the sections after it, then reconnect the route using the same routing logic as before. The reconnection takes place from the mutation point onward, effectively replacing all the dumped sections (see Figure 2).

Taking a closer look at the GA_Mutate sub reveals that GAweb selects a route for mutation (MutRt) from the population of routes using the RndObj function (see Listing 2). This sub uses same function to return a randomly selected section (MutSc) from that route. This gives the sub the route and the section within the route to begin mutating from. Next, GAweb instantiates a new, temporary mini-route (MiniRt), setting its start node to MutSc's node reference and setting its target node to MutRt's target reference.

GAweb then calls MiniRt's Join method, forcing it to try to create a new, valid route between its start node, which is actually the mutation point, and the target node. Once the MiniRt joins the mutation point to the target node, all MutRt's sections after the mutation point are replaced with those from MiniRt.

Keep Advantageous Changes Around

This method of mutation is so flexible because a single mutation can trigger changes that affect the entire route or just the last link. The profundity of a mutation depends on its position along the route, which is decided stochastically. You hope that some of the replaced sections will represent a shortcut, thus conferring an advantage on the host route. Any advantage, no matter how tiny, increases the route's (and the selfish sections') chances of survival and replication.

However, it's more likely that the new route will be longer, and therefore eventually pressured out of the population. This would be unfortunate if the new route, although longer, contained sections or combinations of sections that were, in themselves, marvelous shortcuts. Avoiding such waste requires another process, one that shuffles rather than invents. If a new, longer route can survive long enough to be shuffled with another route, then the good sections have an opportunity to hop into a new host route and set up home where their benefits can be recognized properly. This shuffling process is called crossover; the simple, beaded-string structure of GAweb's route makes the whole process relatively easy to code (see Listing 2).

Two routes must share at least one node other than their start and target nodes to make crossovers possible. Imagine two routes strung together so they form a figure eight. The crossover point is the node they share (see Figure 3). If both the routes are split at this shared point, they can safely swap all the sections after this point to create two brand-new, but still valid, routes.

The GA_CrossOver routine accomplishes this by selecting two candidate routes and comparing the first route's nodes with those of the second until it finds a shared node. If the two routes share a node, the routine sends the routes and the crossover node to the Swapsections routine, which exchanges-between the two routes-all the sections after the mutation point (see Listing 2).

The population is now ready to have its more fit member replicated at the expense of its less fit member which you determine with a simple recalculation of fitness values. You can see calls to GA_Replicate toward the end of both the GA_Mutate and GA_CrossOver routines:

Private Sub GA_Replicate()
LoInd = FindHiLo(
With routes
.Add Hiroute, Hiroute.Name
.Item(LoInd).Clear
routes.Remove LoInd
End With
End Sub

The replication routine effectively amplifies the presence of shorter routes at the expense of longer ones. It does so by deleting the route with the lowest fitness value and replacing it with a route with the highest fitness value. The FindHiLo function establishes the highest and the lowest routes. Note that the best replaces the worst, so birth and death are effectively synchronized, and GAweb always maintains a constant population size.

Select for a Higher Fitness Value

A more fit route might be created once a route has been mutated or crossed-over. The fitness test, GA_FitnessTest, resides in frmGAweb and directs the solution's overall shape. GA_FitnessTest calculates the fitness value for a given route, defining the selection pressure and deciding what it means to be fit. Remember that a shorter route means a higher fitness value for the TSP. You can see this process at work in this code:

Public Sub GA_FitnessTest(Rt As _
GAwebLib.route)
With Rt
.Fitness = (.Length - WebLength) * -1
End With
End Sub

The only complication: The GAweb must operate over webs of any size. To this end, the fitness test calculation incorporates the WebLength, a variable that is the sum of all the link's lengths in the current web.

That's about it. Putting the fitness test in place gives you all the components you need to crack the TSP: an interlinked network of nodes; a population of different routes connecting the same start and end points; and a GA that can replicate, eliminate, chop, and swap those routes to make them ever shorter.

You can see all these elements in action by downloading the GAweb demo project (see the Go Online box for details). This project provides you with the complete source required to solve the Traveling Salesperson Problem (see the sidebar, "Getting Started with the GAweb Demo"). I've no doubt that you'll quickly find more interesting uses for this technique once you crack the TSP. For example, you might want to set it up on powerful computers that manage large dynamic networks through the continual adaptation of the most efficient routes.

You might also use this same web structure to model other multiconstraint problems. It might seem strange, but the Internet was not my first thought when I initially fished for problems to challenge this GA with. My first idea was another classic problem: the school timetable. This problem illustrates the scope and potential of GAs operating within a webbed-data environment. The ability of the sample to crack such multiparameter scheduling indicates that this approach has incredible practical potential. For example, you might adapt the example's logic to construct an Internet search engine that always returns a fuzzy set of links, one that does not necessarily conform to the search string, but to the spirit of the search string.

My suggestions have had one thing in common: The data held in the web is passive. It just sits around, waiting to be included in a route and assessed by the fitness test. But there's no reason why the nodes or links can't represent actions to be taken along some manufacturing, mathematical, or financial process. GAs can evolve more efficient processes as easily as they can evolve efficient outcomes. Try imagining a dollar bill that speeds around an interlinked web, where the links represent financial conversions and nodes represent financial states-the fitness test might select for routes that minimize the tax burden. Or perhaps you want to solve the Rubik's Cube quickly from any starting position. The possibilities are as endless as they are exciting.

About the Author: Jonny feels convinced that the need to work with real-life dynamic systems is emerging as a major software challenge. It is his belief that a ready-made algorithm for managing complexity already exists and that the entire science of biology serves to describe it. Thus he looks forward to the day when feedback and fractal complexity become the new paradigm. Jonny is Scottish. Reach him by e-mail at jonny@chaotics.fsnet.co.uk or visit his Web site at www.chaotics.fsnet.co.uk/

This article's code includes the listings and source code for the GAweb demo described in this article. You will also need to install an IE5 compliant VRML plug-in to take advantage of GAweb's stunning 3-D interface. Two such plug-ins can be found at these sites:

Cosmo Player 2.1: http://www.platinum.com/cosmo/win95nt.htm*
Microsoft VRML 2.0 www.microsoft.com/vrml/default.htm

*Recommended