By Rod Stephens
Sorting is an important and common programming task. Almost any data set is easier to understand and manipulate if it is properly sorted. For example, a dialog box might contain a ListBox control that provides choices for the user to select. If the list of choices is long, it will be much easier for the user to make a selection if the choices are sorted.
The ListBox control in Visual Basic has a Sorted property that makes ordering choices easy. Unfortunately, the ListBox that MSForms allows you to use with UserForm dialog boxes doesn't have this property; you must sort the choices before you add them to the list.
Sorting algorithms also demonstrate important programming techniques that you can use in other VBA subroutines.
This article describes two sorting algorithms that are useful under different circumstances: Selectionsort and Quicksort. These algorithms demonstrate several important programming concepts, including randomized behavior, recursion, and the divide-and-conquer strategy.
If you are an experienced Microsoft Office programmer, you probably know that VBA doesn't provide sorting routines. It does, however, provide certain objects that can perform sorting. Word's Range object, for example, has a Sort method. One way to sort a list of items would be to create a new Word document, write the list into the document, sort the document, and read the results back into the original list. Similarly, you could create a table using Access and use a query to recover the records in sorted order.
These methods are slow and cumbersome, however. If you are using some other application, such as PowerPoint, you would also need to open a connection to Word or Access, adding further complication. Fortunately, there are several relatively easy methods for sorting using VBA code alone.
The Selectionsort algorithm begins by searching the list for the smallest item. It then swaps the smallest item with the item in the first position in the list. The algorithm then searches the remaining items for the second smallest. It swaps that item with the item in the second position of the list. The algorithm continues this process, searching for the smallest remaining item and swapping it into its correct position in the list, until the list is completely sorted.
FIGURE 1 shows the Selectionsort subroutine. It takes as parameters an array of strings to sort, and two integers giving the array's lower and upper bounds. The bound parameters are declared with the ByVal keyword so the subroutine cannot accidentally modify their values for the calling routine (see the sidebar "Coding Tips for Bug-Free VB" on page XX). When the Selectionsort subroutine is finished, the list array contains the strings in sorted order.
FIGURE 1: The Selectionsort algorithm implemented as a VBA Sub procedure.
' Sort items in the values array with bounds min and max.
Sub Selectionsort(values() As String, _
ByVal min As Long, _
ByVal max As Long)
Dim i As Long
Dim j As Long
Dim smallest_value As String
Dim smallest_j As Long
For i = min To max - 1
' Find the smallest remaining value in entries
' i through num.
smallest_value = values(i)
smallest_j = i
For j = i + 1 To max
' See if values(j) is smaller.
If values(j) < smallest_value Then
' Save the new smallest value.
smallest_value = values(j)
smallest_j = j
End If
Next j
If smallest_j <> i Then
' Swap items i and smallest_j.
values(smallest_j) = values(i)
values(i) = smallest_value
End If
Next i
End Sub
FIGURE 2 shows the SortListBox subroutine. This routine copies the choices in a ListBox control into an array, uses Selectionsort to sort the choices, then copies the sorted list back into the ListBox. You can use this subroutine to sort ListBox choices in dialog boxes you have created. First, fill the ListBox with choices as usual. Then, before you show the dialog box, call SortListBox for any ListBox control you want sorted.
Selectionsort is a fairly simple algorithm, which makes it easy to debug and maintain. It's also quite fast for small lists. On a 133 MHz Pentium, this subroutine can sort 1,000 items in under one second — which is more than fast enough for sorting items in a ListBox. For larger lists, however, Selectionsort can be much slower.
FIGURE 2: Using the Selectionsort Sub procedure to sort the items in a ListBox.
' Sort the items in the ListBox.
Sub SortListBox(list_box As ListBox)
Dim values() As String
Dim num_items As Integer
Dim i As Integer
' Put the list choices in a string array.
num_items = list_box.ListCount
ReDim values(1 To num_items)
For i = 1 To num_items
values(i) = list_box.List(i - 1)
Next i
' Sort the list.
Selectionsort values, 1, num_items
' Put the items back in the ListBox.
list_box.Clear
For i = 1 To num_items
list_box.AddItem values(i)
Next i
End Sub
The algorithm begins by looking through its list to find the smallest value. If the list contains N items, the algorithm takes N steps to find the smallest item. It then takes N - 1 steps to find the second smallest, N - 2 steps to find the third smallest, and so forth.
In total, Selectionsort uses N + (N - 1) + ... + 1 steps to arrange the list. If you add those steps you get N * (N + 1) / 2 steps. When N is large, this functions grows like the square of the number N. If you double the number of items in the list, the algorithm takes four times as long. If you increase the number of items by a factor of 10, the algorithm takes 100 times as long.
While Selectionsort can reasonably sort 1,000 items, a list containing 10,000 or 100,000 is simply too big to sort in a reasonable amount of time. For sorting the items in a ListBox, Selectionsort is fine. However, more complicated routines that manage larger lists need a different solution.
Quicksort is a slightly more complex algorithm that, for large lists, is much faster than Selectionsort. Quicksort uses a divide-and-conquer strategy to break the list into smaller and smaller pieces, until the pieces are small enough to handle easily.
The basic idea is to divide the items in the list into two sublists. The subroutine begins by selecting a dividing value. It places all the values smaller than the dividing value in one sublist, and all the other values in another sublist. Quicksort then calls itself to sort the two sublists. It updates its bound parameters min and max to tell the new copy of Quicksort which items to sort.
The process of a subroutine calling itself like this is called recursion. At first, recursion is a strange concept — it's not the way people naturally think. However, it's a powerful programming technique because it lets one routine break a problem into pieces small enough to handle. In this case, the Selectionsort algorithm breaks up the list until the sublists each hold only a single item. Because a list containing a single item must be sorted, the algorithm stops dividing the list at that point.
There are many ways in which the routine might pick a dividing value. For example, it could use the first item in the list. If the items are arranged randomly, this value will probably belong somewhere near the middle of the sorted list. In that case, the routine will divide the list into two sublists of roughly equal size. The sublists will be roughly half the size of the original list, so the algorithm will make good progress in its goal of cutting the sublists down to size.
On the other hand, if the list is initially sorted, the first item in the list belongs before all the other items. If the algorithm uses that item for a dividing value, it will place none of the items in one sublist and all the items in the other. The algorithm will have made little progress, and will be extremely slow.
An alternative strategy for choosing a dividing value is to pick one randomly. In that case, no matter how the list is arranged, it is unlikely the algorithm will pick an item that belongs at the front of the list. The sublists will be significantly smaller than the original list, and the algorithm will make reasonable progress. FIGURE 3 shows the Quicksort algorithm implemented in VBA.
Quicksort is much faster than Selectionsort for sorting large lists. On a 133 MHz Pentium, Selectionsort can sort 1,000 items in under one second. In about the same time, Quicksort can arrange 10,000 items. Even more impressively, Quicksort can sort 100,000 items in about nine seconds, while Selectionsort would take more than two hours.
FIGURE 3: A VBA implementation of the Quicksort algorithm.
' Sort the items in array values() with bounds min and max.
Sub Quicksort(values() As String, _
ByVal min As Long, _
ByVal max As Long)
Dim med_value As String
Dim hi As Long
Dim lo As Long
Dim i As Long
' If the list has only 1 item, it's sorted.
If min >= max Then Exit Sub
' Pick a dividing item randomly.
i = min + Int(Rnd(max - min + 1))
med_value = values(i)
' Swap the dividing item to the front of the list.
values(i) = values(min)
' Separate the list into sublists.
lo = min
hi = max
Do
' Look down from hi for a value < med_value.
Do While values(hi) >= med_value
hi = hi - 1
If hi <= lo Then Exit Do
Loop
If hi <= lo Then
' The list is separated.
values(lo) = med_value
Exit Do
End If
' Swap the lo and hi values.
values(lo) = values(hi)
' Look up from lo for a value >= med_value.
lo = lo + 1
Do While values(lo) < med_value
lo = lo + 1
If lo >= hi Then Exit Do
Loop
If lo >= hi Then
' The list is separated.
lo = hi
values(hi) = med_value
Exit Do
End If
' Swap the lo and hi values.
values(hi) = values(lo)
Loop ' Loop until the list is separated.
' Recursively sort the sublists.
Quicksort values, min, lo - 1
Quicksort values, lo + 1, max
End Sub
Despite this, Selectionsort has its place. Its greater simplicity makes it faster for sorting small lists. Selectionsort can sort 100 items in about 0.008 seconds, while Quicksort takes 0.014 seconds. Of course, if you need to sort 100 items, it doesn't matter much which algorithm you use. If you need to sort 1,000 lists containing 100 items each, using Selectionsort instead of Quicksort may mean the difference between waiting eight seconds and waiting 14 seconds.
The CountWords subroutine, shown in FIGURE 4, uses the Quicksort algorithm to count the distinct words in a Word document. First, it fills an array with the selected words. If no words are selected, it fills the array with every word in the document. The routine uses only words that do not have the _code style set and that begin with a letter. Copying the words into the array is by far the slowest step in the process.
Next, CountWords uses Quicksort to sort the words. It then searches the array, skipping duplicates. For example, if the word "the" appears 100 times in the document, the routine skips all but the first occurrence. CountWords finishes by displaying the distinct words and the word count in a new document.
FIGURE 4: Using the Quicksort Sub procedure to list the distinct words in a document.
Sub CountWords()
Dim values() As String
Dim doc_range As Range
Dim word_object As Object
Dim word_text As String
Dim num_words As Long
Dim txt As String
Dim ch As String
Dim i As Long
Dim j As Long
Dim doc As Document
' Make sure there is a document open.
If Selection Is Nothing Then
MsgBox "No document is open."
Exit Sub
End If
' Get the selected range.
If Selection.Type = wdSelectionIP Then
' If there' s no selection, use everything.
Set doc_range = ActiveDocument.Content
Else
' Otherwise use the selection.
Set doc_range = Selection.Range
End If
' Copy the selected words into the _words array.
' Copy only words that do not have the _code style,
' and that begin with a letter.
ReDim values(1 To doc_range.Words.Count)
For Each word_object In doc_range.Words
If word_object.Style <> "_code" Then
' Remove spaces and convert to lower case.
word_text = LCase$(Trim$(word_object.Text))
' See if it begins with a letter.
ch = Left$(word_text, 1)
If ch >= "a" And ch <= "z" Then
num_words = num_words + 1
values(num_words) = word_text
End If
End If
Next word_object
' Sort the words.
Quicksort values(), 1, num_words
' Merge duplicates.
j = 1 ' The last position used.
For i = 2 To num_words
If values(i) <> values(j) Then
j = j + 1
values(j) = values(i)
End If
Next i
num_words = j
' Display the results in a new document.
Set doc = Documents.Add
For i = 1 To num_words
Selection.TypeText values(i) & vbCrLf
Next i
Selection.TypeText "----------" & vbCrLf
Selection.TypeText num_words & " words." & vbCrLf
End Sub
While VBA doesn't provide tools for sorting, you can easily build your own. Using Selectionsort for small lists and Quicksort for large lists, you should be able to handle most of your sorting needs.
And, in case you are wondering, this article contains 417 distinct words.
The procedures referenced in this article are available for download from the Informant Web site at http://www.informant.com/mod/modnewupl.htm. File name: MOD9803RS.ZIP.
While it's nearly impossible to guarantee a program is completely bug free, there are several steps you can take to improve your chances.
Use Option Explicit. Normally, VBA creates a new variable when you first reference a variable it hasn't seen. This can quickly lead to confusion — especially if you misspell a variable name or try to use one routine's variable in another routine. For example, the following code always displays the value 1, although that's clearly not the intent:
Dim i As Integer
Dim the_num As Integer
For i = 1 To 10
the_num = teh_num + 1
Next I
MsgBox the_num
If you begin the Declares section with the statement:
Option Explicit
VBA will generate an error message when it encounters teh_num — instead of a mystery.
Type every variable. If you declare a variable, but don't specify a type, VBA makes the variable a Variant. Variants take more memory and longer to process than other data types. The easy way they can switch data types also makes them confusing. To get the best performance and prevent confusion, give every variable an explicit type. If you really want to use a Variant, say so:
Dim i As Integer
Dim v As Variant
Declare variables on separate lines. If you declare two variables on the same line, each must have its own type declaration, or it defaults to Variant. For example, the following line of code declares two variables: variable v2 is an Integer, but v1 is a Variant:
Dim v1, v2 As Integer
This statement shows the correct way to allocate two integers on the same line:
Dim w1 As Integer, w2 As Integer
To avoid confusion, declare variables on separate lines.
Pass arguments by value when appropriate. By default, VBA passes an argument to a subroutine by reference. This means that if the subroutine modifies the argument's value, the value is changed in the calling subroutine's variable as well. Obviously, this isn't always desirable. To prevent changes from affecting the calling subroutine, you can make VBA pass an argument by value using the ByVal keyword. In the following code, argument A is passed by reference and argument B is passed by value. The MsgBox statement reports the value of i as 100, and j as 2:
Sub Caller()
Dim i As Integer
Dim j As Integer
i = 1
j = 2
Callee i, j
MsgBox "i = " & i & ", j = " & j
End Sub
Sub Callee(A As Integer, ByVal B As Integer)
A = A * 100
B = B * 100
End Sub
To prevent confusion in the calling routine, pass arguments using ByVal whenever you don't need to modify their values.
Use constants. Use constants to make the meaning of values clear. In large programs, a well named constant is easier to understand than a "magic number." If you later need to change the constant's value, it is also easier to change one definition than to change all the places you use the value:
Function CalculateTax(ByVal price As Single) As Single
Const TaxRate = 0.09
CalculateTax = price * TaxRate
End Sub
Test in the Immediate window. You can easily test pieces of your code using the Immediate window. Use the View | Immediate Window command to make the window appear. Enter a command in the window, just as you would in VBA code, and press J. Use ? to make the window print values. You can even invoke VBA subroutines and functions:
?CalculateTax(100.00)
9
This is particularly helpful for testing routines that are only used by other routines and are hard to run directly as macros.
— Rod Stephens
Rod Stephens is the author of several Visual Basic books, including Visual Basic Algorithms and Advanced Visual Basic Techniques, both from John Wiley & Sons. He writes an algorithm column in Visual Basic Developer, and some of the material presented here has appeared there in a different guise. You can reach him at RodStephens@compuserve.com, or see what else he's doing at http://www.vb-helper.com.