Programming Best Practices with Microsoft Message Queuing Services (MSMQ)

Charles Sterling
Microsoft Corporation

June 1999

Summary: Discusses the best practices for building, troubleshooting, and testing distributed applications with Microsoft® Message Queuing Services (MSMQ).

Introduction
Eleven Guidelines for Writing Better MSMQ Applications
Troubleshooting Common MSMQ Problems

Introduction

Microsoft Message Queuing Services (MSMQ) enables applications running at different times to communicate across heterogeneous networks and systems that may be temporarily offline. To support these scenarios, MSMQ offers many ways of sending and receiving messages. That flexibility, however, makes it easy to write inefficient code. This article outlines efficient ways to write MSMQ applications, test their use of MSMQ, and troubleshoot problems encountered during development.

This article assumes familiarity with MSMQ and some MSMQ programming experience. All the sample code is in Microsoft Visual Basic®, but the principles apply to other languages as well.

Eleven Guidelines for Writing Better MSMQ Applications

Here is a quick summary of the programming guidelines in this section. You should follow them when writing an MSMQ application.

Do only local receives.
Avoid functions that query the Message Queue Information Store (MQIS).
Implement timeouts.
Understand the limits of asynchronous notification.
Know when and where to use transactions.
Know when to use persistable COM objects.
Understand what security context to use.
Implement smart queue usage.
Request acknowledgements or nonacknowledgements.
Remember case sensitivity.
Test your application with a full reboot while offline.

Do Only Local Receives

MSMQ allows programmers to write code with complete location independence: an application can receive messages from a queue on any computer, not only its own. This feature is powerful and useful, but remote receives can be very costly to performance, and they fail whenever the receiving computer is disconnected from the computer hosting the queue, because MSMQ cannot receive messages while disconnected from the host computer or the site controller.

The performance difference between sending and receiving remotely is a result of two factors:

MSMQ sends messages without doing any work other than queuing the message, which it can then batch very efficiently to either the network or a hard drive.
The action of MSMQ sending a message is based on TCP/IP, while the action of MSMQ receiving a message is based on RPC, which is typically bound to TCP/IP and so adds another layer of abstraction.

For more information on MSMQ performance please see "Optimizing Performance in a Microsoft Message Queue Server Environment," available at http://www.microsoft.com/ntserver/appservice/deployment/planguide/msmqperformance.asp.

Another reason to do only local receives: in MSMQ 1.0, you cannot receive from a remote queue inside a transaction. For details, see the chapter "Transactional Remote Receive Semantics in MSMQ" in the article "Microsoft Message Queuing Services (MSMQ) Tips," available at http://msdn.microsoft.com/library/backgrnd/html/msmqtips.htm.
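A receiving application can enforce this guideline by refusing to open a queue for receive access unless the queue's path name names the local computer. The following Python sketch shows the check as a hypothetical helper (`is_local_queue` is not part of MSMQ). It assumes path names of the form Machine_Name\Queue_name, assumes that "." may be used for the local computer, and compares computer names case-insensitively, as NetBIOS names are.

```python
def is_local_queue(path_name: str, local_machine: str) -> bool:
    """Return True if an MSMQ path name ("Machine\\Queue") refers to this computer.

    Hypothetical helper: call it before opening a queue for receive access.
    """
    machine, sep, queue_name = path_name.partition("\\")
    if not sep or not machine or not queue_name:
        raise ValueError("expected Machine_Name\\Queue_name, got " + repr(path_name))
    # "." stands for the local computer; NetBIOS names are not case sensitive.
    return machine == "." or machine.upper() == local_machine.upper()


# Usage: decide whether a receive is allowed under this guideline.
for path in (".\\orders", "EASTWAY\\test", "westway\\test"):
    print(path, "local" if is_local_queue(path, "eastway") else "remote, avoid receiving")
```

A remote queue flagged this way is a candidate for redesign, for example by having the other computer send to a queue that is local to the receiver.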

Avoid Functions that Query the MQIS

All information about public queues is stored in a data repository named MQIS. (In MSMQ version 1.0, this repository is housed in a SQL Server™ database.)

Most of the functions that can be used to open queues also query the data repository to verify that the queue exists or to validate permissions for the type of access being requested. (For example, by default a queue allows everyone "send" access, while only the owner has "receive" access.)

Some functions need to access the data repository and others do not; the key is to minimize MQIS traffic for the functionality you need. The following strategies for opening queues are listed with the cost, advantages, and disadvantages of each.

Use the queue GUID to reference a queue.

(Syntax queuinfo.FormatName = "public=228B7F89-EB76-11D2-8A55-0080C7E276C0")

Cost

One round-trip to the site controller to validate the existence and permissions.
Advantages
Works offline.
Note: If this strategy does not work for you, please upgrade to Windows NT® 4.0 Service Pack 4.
If the computer is online, MSMQ verifies that the GUID does exist.
Disadvantages
If the GUID is hard-coded, you can never rebuild the enterprise or re-create the queue, because a re-created queue gets a new GUID.
If you use a cached GUID derived from a path name, the application must be online the first time it runs, to look up the GUID.

Use PathName to reference a queue.

(Syntax queuinfo.PathName = "Machine_Name\Queue_name" )

Cost

Two round-trips to the site controller: one to determine the GUID and the second to verify existence and permissions.
Advantages
Dynamic queue rediscovery.
Disadvantages
Will not work offline.
The simpler code costs an extra network round-trip.

Use the direct format name to send to a queue.

(Syntax queuinfo.FormatName = "Direct=OS:Machine_Name\Queue_name" )

Cost

Zero trips to the site controller.
Advantages
Has dynamic queue rediscovery.
Can send to queues in other enterprises.
Disadvantages
There is no cost-based routing.
The source and destination computers must eventually be online at the same time.
MSMQ will never verify that the destination machine exists.
Cannot be used to receive.
There is no intermediate store and forward.
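The three strategies differ only in the string the application supplies, so the choice is easy to centralize. The following Python sketch builds each form and records the site-controller round-trips listed above. The helper names are hypothetical; in Visual Basic the resulting strings would be assigned to MSMQQueueInfo.PathName or MSMQQueueInfo.FormatName, as in the syntax lines shown, and the format-name keywords are written here in uppercase.

```python
import uuid


def path_name(machine: str, queue_name: str) -> str:
    """Path name: MSMQ must look up the queue's GUID in the MQIS first."""
    return machine + "\\" + queue_name


def public_format_name(queue_guid: str) -> str:
    """Public format name: references a queue by its GUID."""
    # uuid.UUID raises ValueError for a malformed GUID, catching typos early.
    return "PUBLIC=" + str(uuid.UUID(queue_guid)).upper()


def direct_os_format_name(machine: str, queue_name: str) -> str:
    """Direct format name: no MQIS lookup at all (send only)."""
    return "DIRECT=OS:" + machine + "\\" + queue_name


# Site-controller round-trips per strategy, as listed in the text.
MQIS_ROUND_TRIPS = {"PathName": 2, "PUBLIC=": 1, "DIRECT=OS:": 0}
```

Keeping these builders in one place also makes it easier to switch a sender from a path name to a cached GUID later without touching every call site.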

Implement Timeouts

The default timeouts in MSMQ are infinite, which can lead to disastrous effects. At first glance this seems an easy issue to identify, but infinite timeouts can cause problems in subtle ways. These timeouts include:

ReceiveTimeout parameter to the receive function
The default is infinite (truly infinite). This parameter specifies how long an application will wait for a message to be received from a queue, and the application stops responding while it waits. (Even pressing CTRL+BREAK in Visual Basic won't release control, since the Visual Basic main thread is blocked waiting for that message.) The easiest way to get out of this situation without losing any code is simply to stop the MSMQ service. Doing so causes a run-time error that gives you the opportunity to break into the program and add a timeout.
Time-to-reach-queue
Default is infinite (defined as 90 days). It is strongly recommended that you specify a nondefault timeout value and make that value as small as possible. If the application does not specify a timeout, and the destination is unreachable (the destination computer no longer exists, for example), the message stays alive and holds resources for three months.
Time-to-be-received
Default is infinite (truly infinite). The message stays in the queue until it is dequeued. It is very easy to get into a situation where this timeout can be an issue—particularly with messages you didn't explicitly send (for example: acknowledgements or journal messages).
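The fix is to supply an explicit, finite value every time. The following Python sketch uses the standard library's queue module as a stand-in for an MSMQ queue to show the pattern: a receive that gives up after a bounded wait and returns None, plus explicit message lifetimes. In the MSMQ COM object model the corresponding settings are, as far as we know, the ReceiveTimeout argument of MSMQQueue.Receive (in milliseconds) and the MaxTimeToReachQueue and MaxTimeToReceive properties of MSMQMessage (in seconds); the constant values below are hypothetical.

```python
import queue

# Lifetimes in seconds; hypothetical values for an application whose
# messages are worthless after an hour.
MAX_TIME_TO_REACH_QUEUE_S = 5 * 60
MAX_TIME_TO_RECEIVE_S = 60 * 60

# For comparison: the "infinite" time-to-reach-queue default of 90 days.
DEFAULT_TIME_TO_REACH_QUEUE_S = 90 * 24 * 60 * 60


def receive(q: queue.Queue, receive_timeout_ms: int):
    """Wait at most receive_timeout_ms for a message; return None on timeout
    instead of blocking the calling thread forever."""
    try:
        return q.get(timeout=receive_timeout_ms / 1000.0)
    except queue.Empty:
        return None  # the caller decides whether to retry, log, or give up
```

Returning None rather than waiting forever keeps the decision about what to do next in the application, where it can also keep the user interface responsive.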

One symptom of an incorrectly set timeout (for either time to reach the queue or time to be received) is a computer that responds increasingly slowly over time. Depending on the type of messages, the situation may or may not improve after a reboot. If the computer does not improve after a reboot, the problem involves recoverable or transactional messages, which survive a restart. In that case the MSMQ service also takes much longer to start, since it must initialize more messages on each restart. To confirm the problem, check the counters of the MSMQ Queue object in Performance Monitor: verify that the queue is holding messages, and compare the number of messages with what should be there.

Understand the Limits of Asynchronous Notification

Asynchronous notifications via WithEvents in Visual Basic can be a powerful feature. The idea of running code only in response to an event is quite attractive. The MSMQ issues with this method are minor, but noteworthy:

Events can and will get lost. When this happens, there is little you can do other than periodically re-enable notification. The article "BUG: UserControl Event Is Not Raised from a Modal Form" (article Q177996, available at http://support.microsoft.com/support/default.asp) describes the problem as affecting only modal forms. Unfortunately, that is not the case: any situation that interrupts the delivery of Microsoft Windows® messages to the window procedure (WinProc()) can run into this problem.
Multiple clients will be notified of a single message. When several clients enable notification on the same queue, all of them are notified when one message arrives, but only one of them can receive it; the others block in their receive calls, and the application stops responding to user input. This problem is very common, but it is easy to fix: simply ensure that every receive that follows a notification has ReceiveTimeout set. (See Implement Timeouts in this section.)
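The following Python sketch models the multiple-client problem with the standard queue module (it does not use MSMQ itself): two clients are notified about one message, only one of them gets it, and because each handler receives with a short timeout, the other returns instead of hanging. The handler also re-arms its notification before returning; in the MSMQ COM object model this corresponds, to our understanding, to calling MSMQQueue.EnableNotification again from the Arrived event handler, since a notification fires only once. The class and its names are hypothetical.

```python
import queue


class NotifiedClient:
    """Model of a client that enables notification on a shared queue.

    `armed` stands in for an enabled notification: a notification fires
    once, so the handler must re-enable it for the next message.
    """

    def __init__(self, q: queue.Queue, receive_timeout_ms: int = 100):
        self.q = q
        self.receive_timeout_ms = receive_timeout_ms
        self.armed = True
        self.received = []

    def on_arrived(self):
        self.armed = False  # the notification has been consumed
        try:
            # Another client may already have taken the message, so wait
            # only briefly instead of blocking forever.
            self.received.append(self.q.get(timeout=self.receive_timeout_ms / 1000.0))
        except queue.Empty:
            pass
        finally:
            self.armed = True  # re-enable notification for the next message


shared = queue.Queue()
clients = [NotifiedClient(shared), NotifiedClient(shared)]
shared.put("order-42")
for client in clients:  # one message, but every client is notified
    client.on_arrived()
```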

If you find additional problems within multithreaded Visual Basic applications, you should upgrade to Windows NT 4.0 Service Pack 4, which includes enhancements specific to this area.

Know When and Where to Use Transactions

There are two types of transactions in MSMQ:

Those provided by the Microsoft Distributed Transaction Coordinator (DTC), which are needed to participate in transactions with other resource managers, such as SQL Server.
Those provided internally by MSMQ, which have a performance advantage but cannot enlist in external transactions.

Transactions in MSMQ can be quite useful: they guarantee that only one instance of a message is delivered. If multiple messages are sent in a single transaction, those messages remain in order, and failures are always logged to the XactDeadletter queue.

The primary disadvantage of transactions in MSMQ is performance. A secondary consideration is that, in MSMQ 1.0, you cannot do a remote receive inside a transaction.
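To make the all-or-nothing, in-order behavior concrete, the following Python sketch models what a receiver can observe from a transactional send. It is a simplified model, not MSMQ code: messages sent inside a transaction become visible together, in the order they were sent, only when the transaction commits, and an aborted transaction delivers nothing. In the MSMQ COM object model, an internal transaction is, to our knowledge, started with MSMQTransactionDispenser.BeginTransaction (MSMQCoordinatedTransactionDispenser for DTC transactions) and ended with the transaction's Commit or Abort method.

```python
from collections import deque


class TransactionalSender:
    """Simplified model of sending several messages in one transaction."""

    def __init__(self, destination: deque):
        self.destination = destination
        self.pending = None  # None means no transaction is active

    def begin(self):
        self.pending = []

    def send(self, body):
        if self.pending is None:
            raise RuntimeError("send called outside a transaction")
        self.pending.append(body)

    def commit(self):
        if self.pending is None:
            raise RuntimeError("no active transaction")
        self.destination.extend(self.pending)  # all messages at once, in order
        self.pending = None

    def abort(self):
        self.pending = None  # nothing reaches the destination queue
```

The model leaves out what the real guarantee costs; that cost is the performance penalty described above.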

For more information, review the chapter "Transactional Remote Receive Semantics in MSMQ" in the article "Microsoft Message Queuing Services (MSMQ) Tips" (http://msdn.microsoft.com/library/backgrnd/html/msmqtips.htm) and "Optimizing Performance in a Microsoft Message Queue Server Environment," available at http://www.microsoft.com/ntserver/appservice/deployment/planguide/msmqperformance.asp.

Know When to Use Persistable COM Objects

Sending COM objects can be convenient, but this convenience can also be expensive. A general-purpose persistable COM object (such as an ADO recordset or a Microsoft Word document) has no idea what the receiver will need, so its persistence code must write out enough information for every possible circumstance. The following sample sends data from an ADO recordset that has only one column and one row. Sending the recordset itself produces a 394-byte message; sending the same information as text produces only 22 bytes.

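Private Sub Form_Load()
    Dim con As ADODB.Connection
    Dim strQuery As String
    Dim rs As ADODB.Recordset
    Dim msg As New MSMQMessage
    Dim q As MSMQQueue
    Dim qi As New MSMQQueueInfo

    '*********** ADO code ***********
    Set con = New ADODB.Connection
    ' A client-side cursor is required so the recordset supports
    ' persistence (IPersist), which sending it as a message body requires.
    con.CursorLocation = adUseClient
    con.Open "Driver={SQL Server};Server=eastway;Database=pubs;Uid=sa;Pwd="
    strQuery = "select max(au_id) from authors"
    Set rs = con.Execute(strQuery)

    '*********** MSMQ code ***********
    qi.FormatName = "direct=os:eastway\test"
    Set q = qi.Open(MQ_SEND_ACCESS, MQ_DENY_NONE)
    ' Sending the recordset itself produces a 394-byte message:
    'msg.Body = rs
    ' Sending the single field value as text produces a 22-byte message:
    msg.Body = "" & rs.Fields(0)
    msg.Send q

    ' Release resources.
    q.Close
    rs.Close
    con.Close
End Sub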
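Why the object is so much larger can be illustrated outside MSMQ. The following Python sketch compares a plain text body with a self-describing body that carries the same value plus the kind of metadata a general-purpose object must include. The JSON layout is invented for illustration and is not ADO's persistence format, and the field value is hypothetical; the sketch assumes that a Visual Basic string body is sent as 2-byte Unicode characters, which is consistent with the 22-byte figure for an 11-character value.

```python
import json

value = "998-72-3567"  # hypothetical 11-character field value

# Plain text body: just the characters, 2 bytes each in UTF-16.
text_body = value.encode("utf-16-le")

# Self-describing body: the value plus metadata a generic object must carry
# because it cannot know what the receiver needs. Illustrative layout only.
object_body = json.dumps({
    "fields": [{"name": "", "type": "char", "definedSize": 11, "nullable": True}],
    "cursor": {"location": "client", "lockType": "readOnly"},
    "rows": [[value]],
}).encode("utf-16-le")

print(len(text_body), len(object_body))
```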

Understand What Security Context to Use

MSMQ validates permissions based on the security context in which the work is being done. This typically affects services, Microsoft Transaction Server (MTS) objects, ASP scripts, and users who inadvertently log on to their computer with a local account rather than an authenticated domain account.

By default, all services run in the context of the LocalSystem account, which is valid only on the computer hosting the service. With this default, MSMQ permission checks fail for all receive operations on queues located on remote computers. Unfortunately, sends still work, because the default for send access is "Everyone." The overhead of attempting to validate an invalid user on the domain shows up as huge delays before sent messages reach the destination queue. The classic symptom of this problem is that sends succeed immediately, but the messages leave the sending computer only at 30-second intervals (viewable in Performance Monitor).

By default, Active Server Pages (ASP) run in the context of Internet Information Server (IIS), which runs under the LocalSystem account. (This setting is not configurable in IIS 4.0 and was ignored in IIS 3.0.) Because ASP code therefore runs as LocalSystem, it has the same problem described above. There are a couple of solutions:

The easiest solution is to place all MSMQ code in an MTS component, set the identity of the package to a fixed, validated domain user account, and call the MTS component as an out-of-process ActiveX server.
All the other methods are IIS specific and are outlined in "HOWTO: Accessing Network Files from IIS Applications," (article Q207671, available at http://support.microsoft.com/support/default.asp).

The default identity for an MTS package is "Interactive User." This default works well when the MTS package is on the same computer as the client application calling the MTS component. However, in the classic Windows DNA architecture, business objects are placed on a dedicated computer: the client computer calls the business-object computer, which in turn calls the server computer that hosts the queue.

Because Windows NT 4.0 does not support delegation, the business-object computer cannot pass the caller's credentials on to the computer hosting the queue. This model therefore has the same problems as running in the context of LocalSystem: the security identifier that is passed is the same as that of the LocalSystem account.

Logging on to a workstation as the local administrator is a frequent issue for developers who are used to services, such as SQL Server, that offer "standard security" (logins that are independent of the Windows domain). MSMQ installed into a domain has no such concept. An attempt to access a site controller generates the error "No connection with this site's controller(s)" (C00E0013L). Mirrored accounts (local accounts on each computer that use the same name and password) can work around this issue to a certain degree, but they are not recommended, for the following reasons:

Keeping a password static creates a security risk.
Keeping password changes synchronized across computers does not scale and is prone to failure.
MSMQ associates a user's domain SID (security identifier) with every object that the user creates. A SID from a mirrored account is unknown to the MSMQ enterprise, so the object shows up with the owner "Unknown," causing problems with MSMQ default permissions.
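A developer can catch the local-logon case before MSMQ reports error C00E0013L. The following Python sketch relies on a common Windows heuristic, not an MSMQ API: when you log on with a local account, the USERDOMAIN environment variable contains the computer name rather than a domain name. The function name is hypothetical.

```python
import os


def is_local_logon(user_domain: str, computer_name: str) -> bool:
    """True when the logon authority is the computer itself (a local account),
    which the MSMQ enterprise cannot authenticate."""
    return user_domain.upper() == computer_name.upper()


if __name__ == "__main__":
    # On Windows, USERDOMAIN names the logon authority and COMPUTERNAME the machine.
    domain = os.environ.get("USERDOMAIN", "")
    computer = os.environ.get("COMPUTERNAME", "")
    if domain and computer and is_local_logon(domain, computer):
        print("Warning: logged on with a local account; MSMQ calls may fail.")
```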

Smart Queue Usage

Creating a queue has all of the overhead of other MQIS calls (see Avoid Functions that Query the MQIS) and, in some circumstances, must also wait for intersite replication (the default is 10 seconds).

After a restart, MSMQ actively finds the closest site controller. This site controller serves that client for information requests such as queue existence and access. However, only the client's original site controller services object creation.

An example of the logic flow:

Start an application by creating a queue.
Open that queue for either doing "sends" or "receives."

If the site controller used for information retrieval is the original site controller, this process always succeeds. If the client has been moved and now uses a different site controller, the open fails until the queue-creation information has been replicated from the original site controller to the one now answering information requests. Therefore, you should build retry logic into all opens.
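The following Python sketch shows one form of such retry logic, using exponential backoff. It is a language-neutral illustration: `open_queue` is a hypothetical callable standing in for code that opens the queue, and the exceptions to retry on are an assumption (in MSMQ the failure would presumably surface as a queue-not-found error). The `sleep` parameter exists so the waits can be observed in a test.

```python
import time


def open_with_retry(open_queue, attempts=5, initial_delay_s=1.0, backoff=2.0,
                    retry_on=(LookupError,), sleep=time.sleep):
    """Call open_queue() until it succeeds, waiting longer after each failure.

    open_queue: hypothetical zero-argument callable that opens the queue and
    raises one of the retry_on exceptions while the queue is not yet known.
    """
    delay = initial_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return open_queue()
        except retry_on:
            if attempt == attempts:
                raise  # give up and let the caller see the last error
            sleep(delay)
            delay *= backoff


# Simulated opener that fails twice, as if replication were still in progress.
failures = [LookupError("queue not found"), LookupError("queue not found")]


def fake_open():
    if failures:
        raise failures.pop(0)
    return "queue handle"


waits = []
handle = open_with_retry(fake_open, sleep=waits.append)
```

With the defaults, the waits before the final attempt total 1 + 2 + 4 + 8 = 15 seconds, longer than the 10-second default replication delay quoted above; choose the attempt count and delays so the total comfortably covers your own replication settings.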

Request Acknowledgements or Nonacknowledgements

By default, MSMQ does not report either the success or the failure of message delivery. This is fine for expendable messages, but for messages whose delivery must be verified, you need to request acknowledgements.

Transactional messages are the exception: they always request notification of failure. As a result, every transactional message failure (but no success) is reported to the XactDeadletter queue, and without the programmer's knowledge, that queue can accumulate a large number of messages.
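In the MSMQ COM object model, acknowledgements are requested per message by setting the MSMQMessage.Ack property and naming an administration queue in MSMQMessage.AdminQueueInfo, to which MSMQ sends the acknowledgement messages. Ack takes bit flags; the following Python sketch reproduces the flag values as we recall them from MQ.h (verify them against your SDK headers) and shows how the combined levels are built. Like the XactDeadletter queue, the administration queue must be read and emptied, or it too will accumulate messages.

```python
from enum import IntFlag


class Ack(IntFlag):
    """Acknowledgement flags (values assumed from MQ.h)."""
    NONE = 0x00
    POS_ARRIVAL = 0x01   # message reached the destination queue
    POS_RECEIVE = 0x02   # message was received from the destination queue
    NEG_ARRIVAL = 0x04   # message did not reach the destination queue
    NEG_RECEIVE = 0x08   # message was not received (for example, it expired first)


# Combined levels: report failures only, or both successes and failures.
NACK_REACH_QUEUE = Ack.NEG_ARRIVAL
FULL_REACH_QUEUE = Ack.NEG_ARRIVAL | Ack.POS_ARRIVAL
NACK_RECEIVE = Ack.NEG_ARRIVAL | Ack.NEG_RECEIVE
FULL_RECEIVE = Ack.NEG_ARRIVAL | Ack.NEG_RECEIVE | Ack.POS_RECEIVE


def reports_success(level: Ack) -> bool:
    """True if the level requests positive acknowledgements (successes)."""
    return bool(level & (Ack.POS_ARRIVAL | Ack.POS_RECEIVE))
```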

Remember Case Sensitivity

This is more of a heads-up warning than a programming guideline: MSMQ queue names can be case sensitive. An error caused by case sensitivity can be doubly difficult to track down, because MSMQ Explorer shows all queue names in lowercase.
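Because MSMQ Explorer displays every name in lowercase, queue names that differ only in case look identical there. A quick defensive check is to scan the queue names your code and configuration use for such near-duplicates; the following Python sketch is a hypothetical helper for that, and the sample names are invented.

```python
from collections import defaultdict


def case_collisions(queue_names):
    """Group names that are identical except for letter case."""
    groups = defaultdict(set)
    for name in queue_names:
        groups[name.lower()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


print(case_collisions(["eastway\\Orders", "eastway\\orders", "eastway\\billing"]))
```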

Test Your Application with a Full Reboot while Offline

This, too, is a heads-up warning rather than a programming guideline. Because both MSMQ and Microsoft Windows NT 4.0 cache information, it is always a good idea to test your application after a full reboot in disconnected mode, to verify that the program will behave as expected in production. Also note that the MSMQ service typically takes longer than normal to come online in a disconnected state. (Keep in mind that only some ways of specifying a queue work for sends while offline, and remote receives do not work offline at all.)

Troubleshooting Common MSMQ Problems

Many MSMQ problems can be isolated and resolved with a few simple tests. The problem categories below are listed in order of how many support cases they account for at Microsoft, with connectivity representing the largest proportion.

Connectivity
Security
MQIS Connectivity
Slowness/Resource Depletion
Miscellaneous Problems and other Issues

Connectivity

Whatever the connectivity symptom or problem, it is always a good idea to test the computer having problems with the ping utility. The time a ping takes to respond can indicate a problem, as can a ping that succeeds only intermittently. (Intermittent success can indicate such issues as failing name resolution, which makes the computer fall back to broadcasts for name resolution, or network saturation.)

When using ping to test a computer for MSMQ, you must use the Network Basic Input/Output System (NetBIOS) computer name, not a fully qualified DNS name or the IP address, because MSMQ 1.0 uses only the NetBIOS name.

The ping utility is based on the ICMP protocol of TCP/IP. ICMP is a poor choice to verify firewall issues for two reasons:

It is not session-based, so it doesn't validate the ability to establish a TCP session.
ICMP is rarely allowed through firewalls because its primary usage is to control networking devices.

For firewall and port issues, Telnet is a very good tool: use it to try to establish sessions with the host computer on ports 135, 1801, 2101, and 3527 (for example, telnet Machine_Name 1801). For more information on the ports required for firewalls, see the article "HOWTO: Configure a Firewall for MSMQ Access" (article Q183293, available at http://support.microsoft.com/support/default.asp) and "Using Distributed COM with Firewalls," available at http://www.microsoft.com/com/wpaper/dcomfw.asp.
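The Telnet test can also be scripted. The following Python sketch does what Telnet does for this purpose, attempting a TCP session on each port and reporting success or failure; the function names are hypothetical and the port numbers are those listed above. A TCP connect exercises only TCP listeners, so if MSMQ uses any of these ports over UDP (see Q183293), that port cannot be tested this way.

```python
import socket

PORTS = (135, 1801, 2101, 3527)  # ports listed in the text


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Try to open a TCP session, like 'telnet host port'; True on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_msmq_ports(host: str) -> dict:
    """Map each listed port to whether a TCP session could be established."""
    return {port: tcp_port_open(host, port) for port in PORTS}
```

As with ping, pass the NetBIOS name of the computer, for example check_msmq_ports("Machine_Name").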

Additional connectivity problems can be isolated with a Network Monitor (NetMon) trace. A NetMon trace can show which computer MSMQ is attempting to establish a session with and which part of the connectivity process is failing. NetMon can also reveal situations in which the connection between the two computers succeeds but validation with a domain controller fails.

Security

Many security problems occur when users log on locally and try to use MSMQ across nontrusting domains. (See Understand What Security Context to Use for more information.)

The easiest test to verify proper security is to connect to the default admin share on the other computer:

\\Machine_Name\C$

If you are prompted for credentials to make this connection, MSMQ will not be able to connect. Unlike SQL Server, MSMQ does not use the credentials you supply to establish that connection. Note that making this connection is not by itself a valid connectivity test, since a share may be accessed by some protocol other than TCP/IP.

MQIS Connectivity

MQIS connectivity problems affect only MSMQ servers, since these are the only computers that have an MQIS data store. However, this connectivity seems to be fragile on many development computers. Here are several key points for troubleshooting MQIS connectivity issues:

During setup, use both ODBC trace and SQL Trace (known as SQL Server Profiler in SQL Server 7.0), since MSMQ uses both ODBC and DB-Library.
After setup, ODBC trace is a much better tool than SQL Trace.
SQL Server 7.0 and MSMQ are fully compatible, with two provisos:
If you have both SQL Server 6.5 and 7.0 installed, run MSMQ setup with 6.5 running and then upgrade the MQIS to 7.0. When you set up in this order, MSMQ works with both.
You cannot cluster SQL Server 7.0 and MSMQ servers.

Slowness and Resource Depletion

For slowness and resource depletion issues, the MSMQ counters in the Performance Monitor (PerfMon) utility are extremely useful, since they can reveal the following problems:

Messages pending to be sent. This is very important, as pending messages are not detectable by any other means.
Messages piling up in journal queues or acknowledgement queues. Customers often have no idea that they have left journaling turned on.
Memory use or depletion, which is a common issue when sending COM objects.
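Because journal and acknowledgement queues grow silently, comparing two samples of the per-queue message counts is often enough to spot them. The following Python sketch assumes you have already collected the counts (for example, by reading the MSMQ Queue object's counters in Performance Monitor at two points in time); the function name, queue names, numbers, and threshold are hypothetical.

```python
def growing_queues(before: dict, after: dict, threshold: int = 0) -> dict:
    """Return {queue: growth} for queues whose message count rose by more than
    `threshold` between two samples of the counters."""
    growth = {name: count - before.get(name, 0) for name, count in after.items()}
    return {name: delta for name, delta in growth.items() if delta > threshold}


# Hypothetical samples taken an hour apart.
before = {"orders": 10, "orders_journal": 500, "admin_acks": 40}
after = {"orders": 9, "orders_journal": 900, "admin_acks": 95}
print(growing_queues(before, after, threshold=20))
```

A queue that grows steadily while nothing reads it, such as a journal queue or an administration queue for acknowledgements, is the pattern to look for.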

Miscellaneous Problems and other Issues

For other miscellaneous problems and unknown issues, the debug version of MSMQ is an invaluable troubleshooting tool. Every MSMQ error condition must be reported as one of the errors defined in MQ.h, so some errors are too generic to be useful. (For example, one actual error has the message text "GenericError.") In the debug version of MSMQ, however, you can see the actual causes of errors rather than only the predefined error categories from MQ.h, including comments from the original MSMQ developers.

For more information on using the MSMQ debug version, see the article "HOWTO: Use Debug DLLs to Troubleshoot MSMQ Issues" (article Q195141, available at http://support.microsoft.com/support/default.asp).
