Brian Haslam
Microsoft Corporation
February 1997
The ThrdCode sample application accompanies this technical article.
Since the advent of the Microsoft® Win32® application programming interface (API), multithreading has been widely available and used by C/C++ developers writing applications for the Windows® operating system. Now this technology can be employed to produce scalable programs using Microsoft Visual Basic® version 5.0.
This article discusses the following issues:
This article is principally of interest to Visual Basic developers who want to learn more about the performance gains that are possible when multithreaded Visual Basic software components are used in place of their single-threaded counterparts.
This article also fills a gap in the available multithreading literature, which deals largely with synchronization issues coupled with thread safety. These articles almost exclusively use sample code written in C/C++. While synchronization issues cannot be entirely dismissed in Visual Basic, many of these topics are often simply not relevant. Therefore this article provides useful information on ways the Visual Basic developer can employ multithreaded components. The article takes into account the limitations and benefits of multithreaded components in Visual Basic, and all code presented is written solely with Visual Basic version 5.0.
It is possible to classify at least three separate categories of improvements that multithreaded components may introduce, all of which fit under the umbrella of improved performance: increased computation speed, enhanced scalability, and improved user interface responsiveness.
Increases in computation speed largely depend on the type of problem being solved and the type of computer on which those problems are tackled. In his technical article “Win32 Multithreading Performance,” Ruediger Asche divides programming problems into CPU-bound and I/O-bound categories. CPU-bound calculations use up most of the available processor cycles to perform a computation (for example, sorting an array or multiplying matrices). I/O-bound programs typically wait for a request to finish (for example, waiting for a serial port event or waiting for data to travel across a network). Asche notes that any substantial programming task will most probably contain elements of both CPU-bound and I/O-bound problems, but these two extreme categories serve to bound the range of programming tasks.
The classification of these computations is important because any CPU-bound computation will generally take longer in a multithreaded case on a single-processor machine than the same computation in a single thread of execution. The multithreaded case takes longer than the single-threaded case because the machine must perform the same work for the calculation in both cases, but the multithreaded program must also deal with the overhead of extra instructions for thread context switches.
In contrast, a series of I/O-bound requests in a program could benefit substantially from conversion to a multithreaded architecture even on a single-processor machine. If each request does not have to rely on the results of earlier requests, then the results can be processed on individual threads as soon as they are received.
Assuming computations lend themselves to being broken up and performed in parallel, each can run on a different thread and be performed concurrently. In these cases, it is often possible to significantly increase program speed by running the programs on multiprocessor machines instead of single-processor machines of the same speed. Multithreaded programs therefore have more potential to be scaled to increase the throughput when placed on symmetric multiprocessor machines. The obvious corollary of increasing the throughput is that the response time of requests by multiple client machines will be less—that is, better performance.
Traditionally, to keep the user interface lively on a single-processor machine during a CPU-bound calculation, a developer interlaced DoEvents statements within the commands for the calculation—especially in nested loops. This method remains valid in Visual Basic, and it is easy to implement. However, two drawbacks with the method are listed below:
In a client/server application where synchronous function calls are being made to an ActiveX server, a Visual Basic client application may seem unresponsive while making the function calls. The cause for this sluggishness may be that the client program has no easy way to achieve concurrent background processing. Shelling out multiple processes to do the work is one solution, but management of processes carries a high performance penalty. In this case, a multithreaded server could be employed to perform the work of calculations while the client concentrates on keeping the user interface responsive.
Remember that if you are going to write both the client and multithreaded server in Visual Basic, the multithreaded server will have to be an ActiveX EXE in order to gain the benefits of multiple threads. You have to use an ActiveX EXE because only the client application can create and control threads of a multithreaded DLL, and Visual Basic 5.0 programs cannot explicitly create new threads.
Executing computational work in parallel can lead to better performance of programs on symmetric multiprocessor machines (SMPs). Although it is not possible to explicitly create new threads in Visual Basic, it is possible to perform work in parallel using multithreaded servers. By coupling multithreaded servers with events (also a new feature in Visual Basic version 5.0), the servers can perform asynchronous tasks concurrently on multiple threads. A Visual Basic client can call an asynchronous method in a server component many times in succession using multiple objects. If the server is on a “thread-per-object” model, then each call will consequently be executed on a different thread. This is the technique introduced here to get multiple threads working at the same time, even though in this case we only have a single-threaded Visual Basic client application.
To illustrate this technique, a program was developed using a Visual Basic ActiveX EXE server to concurrently sort arrays of data. Although a sorting computation is studied here, this method can clearly be applied to any general programming problem where it would be advantageous to perform background work on multiple threads. The server project is attached as MultSort.vbp, and the sample client project is attached as TrdTest.vbp. This scenario involves several new features to Visual Basic version 5.0:
In order to perform the sorting tasks asynchronously, the server caches the unsorted data array and a reference to the sort object itself in public variables declared in a standard module. The asynchronous sort routine then enables a timer and returns from the method. The code in clsSort of the server is shown below:
'This is the event raised when the sort computation is complete.
Public Event SortComplete(vntSortedData As Variant)

'Below is the method called by the client to perform the asynchronous sort.
Public Sub Asynchronous_SelectionSort(vntUnsortedData As Variant)
    'Cache the array to be sorted.
    g_vntUnsortedData = vntUnsortedData
    'Cache the instance of this object.
    Set g_objSort = Me
    'Enable the timer to fire in 100 ms; the work is done in the timer callback.
    EnableOneShot 100
End Sub

'This procedure is called by the module-level timer callback, where
'the sort actually takes place.
Friend Sub FireSortComplete(vntSortedData As Variant)
    RaiseEvent SortComplete(vntSortedData)
End Sub
The code in the server relies on an important facet of Visual Basic multithreaded components: each thread maintains its own copy of module-declared data. In this multithreaded server, the thread-per-object model is used, so each object essentially has its own thread and, therefore, its own module-declared data. In contrast, the asynchronous call implementation used here does not work for a single-threaded server, for the following reason: if a client called the Asynchronous_SelectionSort method twice in succession, the second call would overwrite the array and the object reference cached by the first call, and only the second call would have a resulting event fired with the result.
After enabling the timer, and returning from the Asynchronous_SelectionSort routine, the Visual Basic client thread will regain control, and it can continue servicing other user interface requests or carry on with further code in the procedure. In this case, code in the client asks the server to sort a different array of data using a different object (and therefore a different thread). Meanwhile, in the server, the timer fires and starts to perform the work that the client asked it to do. When the work for that data is finished, the server raises an event, and passes the resulting sorted array as a parameter in the SortComplete event. The code in the module of the server is shown below:
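'Win32 timer functions used below (standard declarations; they are
'not part of the listing as originally printed).
Private Declare Function SetTimer Lib "user32" (ByVal hWnd As Long, _
    ByVal nIDEvent As Long, ByVal uElapse As Long, _
    ByVal lpTimerFunc As Long) As Long
Private Declare Function KillTimer Lib "user32" (ByVal hWnd As Long, _
    ByVal nIDEvent As Long) As Long

Public g_vntUnsortedData As Variant
Public g_objSort As clsSort
Public g_TimerID As Long

Public Sub EnableOneShot(ByVal ulTime As Long)
    g_TimerID = SetTimer(0, 0, ulTime, AddressOf TimerCallback)
End Sub

Public Sub TimerCallback(ByVal hWnd As Long, ByVal uMsg As Long, _
        ByVal idEvent As Long, ByVal dwTime As Long)
    'Kill the timer - we do not need it any more.
    KillTimer 0, g_TimerID
    'Use selection sort to sort the global array.
    SelectionSort g_vntUnsortedData
    'Raise the event to indicate sort is finished.
    g_objSort.FireSortComplete g_vntUnsortedData
    'Cleanup and exit.
    Set g_objSort = Nothing
End Sub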
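The server module calls SelectionSort, which is not reproduced in this article. A minimal sketch of such a routine is given here, assuming a one-dimensional Variant array that is sorted in place in ascending order; the routine in MultSort.vbp may differ in detail.

```vb
'Sketch only: an in-place selection sort over a one-dimensional
'Variant array. The actual routine in the sample may differ.
Public Sub SelectionSort(vntData As Variant)
    Dim i As Long
    Dim j As Long
    Dim lngMin As Long
    Dim vntTemp As Variant

    For i = LBound(vntData) To UBound(vntData) - 1
        'Find the smallest remaining element.
        lngMin = i
        For j = i + 1 To UBound(vntData)
            If vntData(j) < vntData(lngMin) Then lngMin = j
        Next j
        'Swap it into position i if it is not already there.
        If lngMin <> i Then
            vntTemp = vntData(i)
            vntData(i) = vntData(lngMin)
            vntData(lngMin) = vntTemp
        End If
    Next i
End Sub
```

Because the parameter is passed by reference (the Visual Basic default), the array held in g_vntUnsortedData is sorted in place, so the same variable can be handed to FireSortComplete afterward.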
One of the principal benefits of this server implementation is that it should be very scalable—that is, if the server is ported over to a multiprocessor machine, the work of sorting multiple arrays can take place concurrently, and therefore the results can be obtained faster. A second benefit is that the work is being performed on different threads than the thread that maintains the user interface of the client. Using a multithreaded server to perform background work allows the user interface to be more responsive.
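The client side of this arrangement is not listed in the article. The following form code is a hedged sketch of how a client might consume the SortComplete event. It assumes the client project references the server under the library name MultSort, that the server is compiled with the thread-per-object option, and that the form has a command button named cmdSort; the button and the MakeRandomArray helper are hypothetical.

```vb
Option Explicit

'WithEvents variables cannot be arrays, so each sort object
'gets its own variable. (Library name MultSort is assumed.)
Private WithEvents m_objSortA As MultSort.clsSort
Private WithEvents m_objSortB As MultSort.clsSort

Private Sub cmdSort_Click()
    'Each externally created object runs on its own server thread
    'when the server uses the thread-per-object model.
    Set m_objSortA = New MultSort.clsSort
    Set m_objSortB = New MultSort.clsSort

    'Both calls return quickly; the sorts run in the server.
    m_objSortA.Asynchronous_SelectionSort MakeRandomArray(1000)
    m_objSortB.Asynchronous_SelectionSort MakeRandomArray(1000)
End Sub

Private Sub m_objSortA_SortComplete(vntSortedData As Variant)
    Debug.Print "Sort A done; smallest value:"; _
        vntSortedData(LBound(vntSortedData))
End Sub

Private Sub m_objSortB_SortComplete(vntSortedData As Variant)
    Debug.Print "Sort B done; smallest value:"; _
        vntSortedData(LBound(vntSortedData))
End Sub

'Hypothetical helper: builds an array of random Longs in a Variant.
Private Function MakeRandomArray(ByVal lngCount As Long) As Variant
    Dim alngData() As Long
    Dim i As Long

    ReDim alngData(0 To lngCount - 1)
    For i = 0 To lngCount - 1
        alngData(i) = CLng(Rnd * 100000)
    Next i
    MakeRandomArray = alngData
End Function
```

The click handler finishes as soon as both requests have been handed to the server, which is what leaves the client thread free to service the user interface while the sorts run.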
To test the performance of this program, the time to sort four arrays of random numbers (longs) using one object (and therefore one thread) was compared to the case with four objects on four different threads. Each case was tested on both a single-processor and a symmetric multiprocessor machine. The first machine used was a single-processor Pentium 166 MHz with 32 megabytes (MB) RAM, and the second machine was a quad-processor Pentium 166 MHz with 320 MB RAM. Since both programs were small relative to the amount of free RAM (the EXE client and server together were less than 30 KB, and the Visual Basic run time is about 1.3 MB), and since the two machines performed very similarly on comparable single-threaded programming tasks, the comparisons between machines seem valid. In any case, results for the single-thread and multithread cases on any one type of machine can be compared to each other.
Both machines were running Microsoft Windows NT® 4.0 with Service Pack 2. The tests called the Windows API GetTickCount function to measure how long the computation ran, and the computation was run one hundred times. The mean average time of the one hundred trials was calculated using Microsoft Excel 97.
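The measurement approach described above can be sketched as follows. This is an illustration rather than the test harness used for the article: RunOneTrial is a hypothetical stand-in for the work being timed, and the mean is computed in code here, whereas the article computed it in Microsoft Excel.

```vb
Option Explicit

'Returns the number of milliseconds since Windows was started.
Private Declare Function GetTickCount Lib "kernel32" () As Long

Private Const TRIAL_COUNT As Long = 100

'Hypothetical placeholder for the computation being measured.
Private Sub RunOneTrial()
    Dim i As Long
    Dim dblSum As Double

    For i = 1 To 100000
        dblSum = dblSum + Sqr(i)
    Next i
End Sub

'Runs the trial TRIAL_COUNT times and returns the mean
'elapsed time in milliseconds.
Public Function MeanElapsedMs() As Double
    Dim lngTrial As Long
    Dim lngStart As Long
    Dim dblTotal As Double

    For lngTrial = 1 To TRIAL_COUNT
        lngStart = GetTickCount()
        RunOneTrial
        dblTotal = dblTotal + (GetTickCount() - lngStart)
    Next lngTrial

    MeanElapsedMs = dblTotal / TRIAL_COUNT
End Function
```

GetTickCount has a coarse resolution (on the order of 10 ms), which is one reason to time a computation that runs for a comparatively long period and to average over many trials.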
The results of running the tests with the Visual Basic client and EXE server on the single-processor and multiprocessor machines are shown in Figure 1.
Figure 1: Mean Average Time to complete the sorting of four arrays
It is interesting to note that multiple threads for the same calculation on a single-processor machine caused a small slowdown, most likely due to the added thread management instructions described earlier. As expected, the single-threaded case on the multiprocessor machine was only slightly faster, probably because operating system instructions could be performed by other processors while the sorting operations in the Visual Basic server took place. However, the significant result is that the multithreaded component completed the sorts much faster on the multiprocessor machine than the single-threaded component did on the same machine.
A ratio, appropriately named Speedup (see Christopher Lazou, Supercomputers and their Use, Oxford Science Publications, 1988), is defined as:
Speedup = S = (Execution Time for Uniprocessor / Execution Time for P Processors)
In this case, the Speedup factor was therefore measured at ~3.6.
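As a worked check, the measured value can be compared with the ideal for a quad-processor machine. Only the figures stated in this article are used (S ≈ 3.6 on P = 4 processors); the efficiency E is a standard derived quantity that the article itself does not report.

```latex
S = \frac{T_{1}}{T_{P}}, \qquad
E = \frac{S}{P} \approx \frac{3.6}{4} = 0.9
```

In other words, the four-thread sort achieved roughly 90 percent of the ideal fourfold gain.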
The results of this case were expected because the number of arrays to be sorted was equal to the number of processors. The case clearly illustrates a specific design goal: for CPU-bound calculations, the optimal number of threads for maximum performance is approximately equal to the number of processors on the machine. It shows the improvements you can gain by porting a program to an SMP machine, if that program contains CPU-intensive operations, and if the calculations lend themselves to being calculated in parallel in a multithreaded architecture.
However, this case also shows that the developer should be especially cautious when considering whether or not to introduce multithreading in a CPU-bound computation. If the program is unlikely ever to be run on a multiprocessor machine, multithreading will probably not improve the speed of these computations.
If a single-threaded component is interacting with other multithreaded components on a server machine, all requests to the single-threaded component have to be queued—a process that could cost a severe performance penalty and become the throughput bottleneck for this and similar architectures. The problem of a single-threaded component being a throughput bottleneck is one of the reasons why multithreading in Visual Basic servers can be considered important for enterprise-level solutions.
In order to study the potential for performance gains in these scenarios, the following case was designed. A single-threaded and a multithreaded Visual Basic DLL server were compared for performance in conjunction with Active Server Pages and Microsoft Internet Information Server (IIS) 3.0. A case involving IIS 3.0 was chosen because IIS makes full use of multithreading to allow concurrent processing of client requests. Thus, by introducing multithreaded versus single-threaded components to interact with scripts run by IIS, you could potentially see large speed gains, particularly when run on multiprocessor machines.
This scenario also has real-world relevance since there are several reasons why using ActiveX components in conjunction with Visual Basic Scripting Edition (VBScript) makes sense.
Since Visual Basic 5.0 can compile native code, it seems reasonable that any intensive calculations are likely to run faster than equivalent code in VBScript.
By encapsulating common routines into servers, the code is in a much more reusable format.
Balanced against these reasons, it should also be remembered that there will be a performance cost of launching the server components and overhead associated with making calls into the server. Therefore, these components will most likely only pay dividends when the time to perform methods of the components is significantly larger than the time to instantiate an object from the server.
Figure 2: A schematic architecture of a typical arrangement of clients making requests to a server. Having a multithreaded server component allows IIS to create objects on different threads in order to increase the "throughput" of Web pages delivered (and therefore decrease the response time when multiple clients are making requests from the server).
For this case, a Visual Basic DLL (see the attached project CPUTest.vbp) was created containing code to make a call into a database, retrieve a result set, and then perform an intensive CPU-bound nested loop calculation on that result set. A DLL was used for this case because the calls to in-process servers are known to give better performance than EXE servers—OLE does not have to marshal the data across process boundaries. Some of the VBScript (embedded in the ASP page) is shown below. This code instantiates and then calls methods of the Visual Basic component.
<%
Dim objServer
Dim vntResultSetArray
Dim vntAverage
'Create the Object.
Set objServer = Server.CreateObject("DBTest.clsDatabaseOp")
'Retrieve the data from the Access database into an array.
vntResultSetArray = objServer.GetData("Select UnitPrice, Quantity From [Order Details]")
'Have the server calculate an average in a CPU-bound loop.
vntAverage = objServer.CalculatePriceQuantityAverage(vntResultSetArray)
'Let go of the server reference.
Set objServer = Nothing
%>
In this instance, the CalculatePriceQuantityAverage function contained in the Visual Basic server component is CPU-intensive—it loops through a 2,000-row result set about 20 times. By calling this function, the calculation was deliberately made CPU-bound. Admittedly, the numerical result calculated in this example does not have a practical use. However, it does serve to demonstrate a typical type of CPU-bound computation—one where a custom result must be calculated from the fields of a database, and where SQL alone does not provide sufficient syntax to achieve the results.
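The component method itself is not listed in the article. The sketch below shows one way a deliberately CPU-bound loop of this kind could be written. It assumes the result set arrives as a two-dimensional Variant array indexed as (field, row), which is the layout produced by a recordset's GetRows method, with UnitPrice in the first field and Quantity in the second. The pass count of 20 follows the description above; everything else is illustrative and may differ from CPUTest.vbp.

```vb
'Sketch only: repeats the same pass over the result set to make
'the call CPU-bound. Array layout (field, row) is an assumption.
Public Function CalculatePriceQuantityAverage( _
        vntResultSet As Variant) As Variant
    Const PASS_COUNT As Long = 20
    Dim lngPass As Long
    Dim lngRow As Long
    Dim lngFirstField As Long
    Dim lngRowCount As Long
    Dim dblTotal As Double

    lngFirstField = LBound(vntResultSet, 1)
    lngRowCount = UBound(vntResultSet, 2) - LBound(vntResultSet, 2) + 1

    For lngPass = 1 To PASS_COUNT
        dblTotal = 0
        For lngRow = LBound(vntResultSet, 2) To UBound(vntResultSet, 2)
            'UnitPrice * Quantity for this row.
            dblTotal = dblTotal + _
                vntResultSet(lngFirstField, lngRow) * _
                vntResultSet(lngFirstField + 1, lngRow)
        Next lngRow
    Next lngPass

    CalculatePriceQuantityAverage = dblTotal / lngRowCount
End Function
```

Each pass recomputes the same total, so the repetition adds processor load without changing the result, which matches the stated intent of making the call CPU-bound.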
To measure performance, a Visual Basic client program (attached as WebTest.vbp) was constructed to continually hit the information server with requests for the ASP Web page. It uses the Microsoft Internet Transfer control (msinet.ocx). The time immediately before the request for the Web page was taken with the GetTickCount API, and the time immediately after the call returned was also measured. These times were then sent in subsequent requests to the ASP page, which passed the results to the Visual Basic component. The Visual Basic component logged them to a Microsoft Access database. The time taken to log the result was small relative to the time taken for the operations performed by the component, but the CPU-bound operations did involve database access, which was one of the issues being studied in this case.
The server applications (consisting of IIS combined with the ASP page and the Visual Basic component) were run on the multiprocessor machine described earlier, and the client machines were all Pentium 200s and Pentium 166s with 32 MB of RAM. The speed difference between these client machines did not appear to affect the measured response time in any significant way. The response times versus a varying number of client machines for both the single-threaded and multithreaded DLL component cases are shown in Figure 3.
Figure 3: Mean Average Time for returning Web pages versus number of clients with Visual Basic DLLs performing intensive CPU-bound calculations
The results indicate that for this case, replacing the single-threaded Visual Basic DLL with a multithreaded one caused the throughput to increase on the multiprocessor machine. Another way of stating this is that the response time was proportionally smaller with the multithreaded DLL, instead of the single-threaded DLL component, when adding more client machines. The single-threaded component case exhibited a near linear relationship between response time and number of client machines. In the multithreaded component case, step behavior can be seen for small numbers of clients (<5), but the multithreaded case line seemed to approximate linear behavior as more client machines were added. In this case, the step behavior is most probably related to the fact that the server was a quad-processor machine.
In this computation, there was not a huge gain in throughput, although it must be remembered that the calculation was made CPU-bound. In a CPU-bound computation, the maximum gain possible would be four (when all four processors are working on the problem in parallel). Since the operating system still had to run, and the IIS service needed CPU cycles to deal with pending requests, the gain could only be less than this theoretical maximum.
The problem with CPU-bound calculations being performed on the server machine that is accepting and processing requests is that the server can be bogged down with calculations, and therefore is delayed from processing further requests from clients. A better architecture is often to have a third tier perform the work required for calculations, thus leaving the second tier free to deal with incoming requests. A three-tier architecture will give better performance in the sense of being more scalable, thus reducing response time (increasing throughput) for more client machines. This concept hinges on the second tier not being a bottleneck for incoming requests, and, in turn, that may hinge on all components in the second tier being fully multithreaded.
It is important to note that this is only going to give better performance if the third tier is more capable of performing the work in parallel. In the case of CPU-bound calculations, it may be reasonable to use COM in a distributed environment (DCOM) or Remote OLE to have the work distributed across a series of servers. (For more information in the area of load balancing, see “Use Visual Basic 4.0 to Distribute the Load with Remote Automation” in the MSDN Library, Developer Network News, Volume 4, Number 6.) A three-tier architecture can transform the CPU-bound calculations on the second-tier server into I/O requests for the work to be done somewhere else, and could mean that much better gains may be expected if the work is spread out.
To study the performance benefits of employing a multithreaded component in a three-tier architecture, a similar arrangement to the two-tier architecture case studied was designed. Again a solution employing Internet Information Server 3.0 was used, but the principles should extend to any solution which involves multithreaded server components interacting with multithreaded server software.
A Visual Basic DLL server (see the attached project IOTest.vbp) was created, and it contained a method which simulated an I/O-bound request to the third tier by calling the Sleep API function. In practice, the methods in the software component would really be of the form of I/O request(s) to a third-tier machine(s) to perform work, perhaps via Remote OLE or DCOM, or perhaps more commonly as requests to a SQL server. In all these cases, the work is not performed by the second tier, so while waiting for a request to return the results, the server can be setting up more requests.
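A method that simulates an I/O-bound request in this way can be very small. The sketch below assumes a public method on the component's class; the method name SimulateIORequest and its return value are hypothetical, and the two-second delay matches the figure quoted for this test.

```vb
Option Explicit

'Suspends the calling thread for the given number of milliseconds.
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

'Sketch only: stands in for a blocking request to a third tier.
Public Function SimulateIORequest() As Variant
    'The thread consumes no CPU time while it sleeps, much as it
    'would while waiting on a remote server.
    Sleep 2000
    SimulateIORequest = "Request complete"
End Function
```

Sleep blocks only the calling thread, so in a multithreaded component other objects can continue on their own threads, whereas in a single-threaded component every request waits behind the sleeping one.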
Figure 4: Schematic of a three-tier architecture that uses Active Server Pages, which make calls into a Visual Basic component
Since the machine running IIS is not performing the back-end work, it is left free to deal with pending requests. This is where having a multithreaded component becomes pivotal to increasing performance. If the Visual Basic component being called on the second-tier machine were single-threaded, the requests for the objects would be serially queued. The wait time in this single-threaded queue will define the performance penalty. If you make the Visual Basic component multithreaded, each object can wait for the request in its own thread. Since these objects are not doing any work, but rather waiting for work to be done on another machine(s), the CPU is free to give time slices to the process that accepts more pending requests from clients.
Using the same Web client application described in the two-tier case (WebTest.vbp), the effect of multiple clients was measured on the server response time. The case was performed with a single-threaded and a multithreaded Visual Basic component. The performance results are shown in Figure 5.
Figure 5: Mean Average Time for Web page request versus number of client machines making requests. The Visual Basic DLL component simulates an I/O-bound request by sleeping for 2 seconds in a method called by Active Server Pages.
These results show that use of the multithreaded component instead of the equivalent single-threaded component caused the throughput to be almost three times larger in the multithreaded case, even though the server components were all on a single-processor machine. It must be remembered that the full speed increase would only be realized if the backend server which performs the actual work of the requests showed no slowdown with multiple requests—something that could probably only be realized with a large multiprocessor machine, or perhaps via DCOM/Remote OLE and multiple third-tier servers. Figure 5 also shows interesting non-linear behavior for small numbers of clients. For only two or three clients in the multithreaded component case, there was no noticeable slowdown in the throughput compared to the one client case.
Visual Basic version 5.0 has introduced the capability of having multithreaded ActiveX servers. Intelligent use of multithreaded servers in place of single-threaded ones can yield significant performance gains by increasing the speed of computations, enhancing scalability of applications (that is, keeping response time low when more clients make requests to a server), and improving user interface “responsiveness.”
In this article I have presented a method of using multiple threads to perform work when using a Visual Basic client and EXE server. On a multiprocessor machine, CPU-bound computations can be increased in speed by a factor approaching the number of processors on that machine. However, a CPU-bound computation can only reasonably be expected to increase in speed when ported over to a multiprocessor machine.
I have also shown that multithreaded Visual Basic ActiveX servers can function as faster back-end components than their single-threaded counterparts in two-tier and three-tier architectures. This is most true when these components serve in conjunction with other multithreaded software (such as Microsoft Internet Information Server). Employing multithreaded components instead of single-threaded servers in a three-tier architecture can achieve large decreases in response time when multiple clients are making requests, even when the second tier consists of a single-processor machine.
The cases studied in this article involve use of Visual Basic components in conjunction with Microsoft Internet Information Server 3.0 and server-side VBScript. However, the principles discussed in this article can be extended to programs that use DCOM in conjunction with, or instead of, the Intranet applications studied here.
Special thanks go to Pete Dussin, Ivo Salmre, and Mark Chace for advice. I am also grateful to Mike Willard for advice and time on the symmetric multiprocessor machine (the “Heater,” as it is known locally).