With Microsoft Mail, users connected to a network can exchange messages, files, and programs electronically and efficiently with one another. These users have access to a collective mail-drop facility called a Postoffice (PO). The PO is built on a Shared File System (SFS) design and resides on a file server, where many users and processes can access it simultaneously and perform many different operations.
The Message Transfer Agent (MTA) for Microsoft Mail is a program called EXTERNAL.EXE. The MTA has two main purposes: to transfer mail between two or more Microsoft Mail POs and to provide connectivity to remote mail users. There are both MS-DOS® and OS/2 versions of the MTA. The OS/2 version is called the Multitasking MTA (MMTA), and a Windows NT version of the MMTA is in development and will soon be in Beta. The MMTA uses OS/2 (and soon, Windows NT) to extend the capabilities of Microsoft Mail to multiple External instances, Dispatch instances (a function of Directory Synchronization, explained later), and SchDist instances (a function of Schedule Distribution for Microsoft Schedule+, explained later) on a single machine. The MMTA may also be configured as a modem pool supporting many remote clients from a central hub. Additional External MTAs/MMTAs may be added as required to increase performance as the network grows and to provide greater remote access support.
Directory Synchronization (DirSync) is the automatic, fault-tolerant process of keeping a Global Address List (GAL), which contains all mail addresses defined on the network, available to all users on the Microsoft Mail network. This GAL is used by both the Microsoft Mail and Microsoft Schedule+ systems. DirSync is performed by an application called Dispatch; DISPATCH.EXE is included in the base Microsoft Mail Server. The directory synchronization architecture consists of one DirSync Server (DSS) postoffice and one or more DirSync Requestor (DSR) postoffices. There is only one DSS for synchronizing directories in an organization, and all postoffices participating in DirSync, including the DSS postoffice itself, are defined as DSRs.
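The arrangement can be pictured as a simple hub and spoke (the requestor names below are illustrative only; the flow matches the T2 processing described later in this article):

    DSR PO (e.g., SALES)  --- address updates --->
    DSR PO (e.g., MFG)    --- address updates --->   DSS PO (maintains the master transaction list)
    DSR PO (e.g., ADMIN)  --- address updates --->
                          <--- GAL updates sent back to every DSR ---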
Make use of the MTA's wide-area network (WAN) configuration option. On all physical links slower than typical LAN speeds, MTA/MMTAs should be located on both sides of the WAN link and should be configured to use the WAN option. To explain how this improves performance, a discussion on the tasks an MTA/MMTA must perform is in order. When an MTA/MMTA services a postoffice, it has three tasks to perform:
Look for mail to send out of the postoffice. It checks its queues for all outgoing mail and moves it to the MTA/MMTA's home PO.
Send mail into the postoffice. The MTA/MMTA puts all of the incoming messages going into the PO in a single file.
Deliver the incoming mail to the correct recipients. The MTA/MMTA takes all of the messages it put in the single file and distributes them to the individual recipients.
With the WAN option, the MTA/MMTA performs only the second task across the WAN connection; the first and third tasks are performed by the MTA/MMTA on the local side of the link, which means they run at LAN speeds instead of WAN speeds. This not only improves performance, it also reduces the possibility of mail database corruption, since files are kept open for shorter periods of time.
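As a concrete but purely illustrative sketch of where this is configured: the WAN option is controlled by the DrivesWAN parameter in EXTERNAL.INI, which is mentioned again in the Hub PO example later in this article. The fragment below assumes the parameter takes the list of drive letters that are reached across the WAN link; the drive letters themselves are hypothetical, and the exact placement and value syntax should be verified against the EXTERNAL.INI documentation for the Mail version in use.

    ; EXTERNAL.INI fragment (illustrative sketch only)
    ; Assumption: M and N are the drives on which the postoffices
    ; across the WAN link are mounted for this MTA/MMTA instance.
    DrivesWAN=MN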
Group people who communicate most often on the same PO. Since mail clients work directly with the PO to deliver mail to other users on the same PO, the MTA process never gets involved in sending those messages. Reducing the number of messages an MTA must route will increase your system throughput.
Limit the number of active users on a PO. One of the most important directories on the SFS PO is the global (GLB) directory. This directory contains all of the system and configuration files used by all of the Microsoft Mail clients and processes. A few files in that directory are accessed very frequently. Even though these files are very small, because of the number of times they are accessed, they can become the bottleneck for the PO. A PO has a hard limit of 500 users, but a more realistic number is between 200 and 300 LAN users because of the number of file accesses that must occur. One could have more users than this recommendation if some of them are remote mail users who aren't connected to the PO continuously. To size a PO, one must take into consideration the performance of the network operating system for file access, the physical connectivity on the network (such as 2 Mbps wireless LAN connections compared to 10 Mbps Ethernet compared to 100 Mbps FDDI), and the speed of the disk access where the PO resides.
Consolidate POs. Having just stated that it is important to limit the number of active users on a PO, it should also be stated that one shouldn't go overboard and allocate only 50 users per PO, as some companies have done. Again, a good rule of thumb is to stay between 200 and 300 users per PO. If a company has a lot of smaller POs, then consolidation of POs should be a consideration. Consolidating POs does the following:
Collapses the mail routing topology, thus reducing the number of hops a mail message must take before being delivered.
Results in less mail for the MTA/MMTAs to deliver, since they do not get involved in routing messages that stay on the same PO.
Reduces directory updating since there are fewer POs to update.
Connect users to a local PO. Due to the SFS architecture of the PO and the amount of file I/O that occurs between a client and the PO, it is always better for a client to connect to a PO on the LAN rather than across the WAN. Even if an office only has 50, 20, or even 10 users, it should have its own PO. This is one of the few times when having a PO with less than 200 users on it is justified.
Plan for 10–15 MB per user for mail storage on the PO. It is important to size the PO correctly. Where mail is used regularly as a part of normal business operations, experience has shown that users may consume up to 10–15 MB of disk space for mail storage. It has also been found that limiting individual user message stores (MMF files) to around 10–15 MB reduces the likelihood of corruption. Should corruption occur, an MMF file in the 10–15 MB range has a much better chance of having its integrity restored. Users should be taught how to archive folders and perform simple backups of their MMF files through the mail menu so that they can keep those files in the 10–15 MB range or lower. In most cases, customers average between 5 and 10 MB per user, but planning for 10–15 MB provides a buffer for heavy use.
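As a rough sizing illustration (the user count is only an example): a PO sized at the recommended 250 LAN users, planned at 15 MB per user, needs roughly 250 x 15 MB = 3,750 MB (about 3.7 GB) of disk space for mail storage, plus free-space headroom on the volume.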
Use Hub POs to minimize routing requirements and ease administration. A Hub PO is a dedicated mail routing PO. This PO should be used at strategic points where there are heavy mail routing needs. Basically, a Hub PO is exactly the same as a user PO except that there are no users assigned to this PO and the file server on which it runs does not have any other responsibilities such as file and print sharing. Its sole purpose is to shuttle mail back and forth between other POs. By isolating the Hub POs from the user POs, mail routing throughput is increased and administration of mail routing is centralized. It improves throughput because the user POs only need to have a single direct connection defined back to a Hub PO. All other external user POs are then defined as indirect via the Hub PO. This means that the user PO only has to keep up with one queue for its one direct connection.
Limit the number of modems per MMTA to handle Remote Mail Users. Asynchronous connections in the MMTA generate lots of interrupts and experience has shown that a 486 CPU with 16 MB RAM running OS/2 1.3 cannot handle more than 4 simultaneously connected asynchronous sessions. A Pentium processor with 16 MB of memory running under OS/2 1.3 may be able to handle 5 modem connections. Limiting your modem connections will allow the CPU to respond more efficiently to the modem sessions. This is really driven by memory and the version of OS/2 utilized on the MMTA. Since OS/2 2.x can work effectively with more than 16 MB of RAM, some customers have successfully used 8 modems on OS/2 2.1 systems with 20–24 MB of RAM. Utilizing intelligent communication boards with high-speed UARTs and buffers will also increase the number of modems that a single MMTA machine can support.
Utilize the MailerDisable and DispatchDisable EXTERNAL.INI parameters in asynchronous instances. EXTERNAL.EXE really has only two functions for mail processing: dispatch and mailer. The dispatch function checks external postoffice mailbags to see whether any mail needs to go out and then builds the P1 headers on that postoffice. External uses the P1 directory for outbound message transfer in each postoffice database it is processing. Once all the P1s are built, External sets up the connection to the destination postoffice. The P1 files include only message header information, not actual messages or attachment data. External writes all the P1 data to the INQUEUE3 file and writes all the message text and attachments to the \MAI and \ATT directories, respectively. This ends the dispatch function. The mailer function reads the INQUEUE3 file (INQUEUE3 handles all Microsoft Mail 3.x messages; INQUEUE handles mail from Mac, 2.x, OS/2 Presentation Manager, and MS-DOS–based clients) and updates the users' MBG files with pointers to the appropriate MAI and ATT files. One can disable one or both of these functions on an External using the MailerDisable or DispatchDisable INI commands. This is helpful when you have several MTA or MMTA sessions and you want to tune them. For example, one may want to disable both of these functions on all MMTA sessions handling dial-in for remote users, which provides the greatest availability and performance for remote users. Before doing this, make sure there are other sessions running on the MMTA system that are able to do all the dispatch and mailer work.
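The fragment below is only a sketch of how a dedicated dial-in instance might be configured with these two parameters. The parameter names come from this article; the value syntax shown (=1) and the exact placement within EXTERNAL.INI are assumptions that should be verified against the Mail documentation for the version in use.

    ; EXTERNAL.INI fragment for an MMTA session dedicated to dial-in users
    ; (illustrative sketch; verify the exact value syntax for your Mail version)
    ; With dispatch disabled, this session does not build P1s for outbound mail;
    ; with mailer disabled, it does not deliver INQUEUE3 contents to users' MBG files.
    DispatchDisable=1
    MailerDisable=1
    ; At least one other session on this MMTA machine must run with both
    ; functions enabled so that the dispatch and mailer work still gets done.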
For large networks, dedicate a PO to act as the DirSync server PO. DirSync is divided into three different stages that are called "times" and are numbered T1 to T3. T2 is the DirSync server's "process updates" time. This is when the server takes all the updates sent in by the requestor POs, adds them to the master transaction list, and sends GAL updates out to each requestor PO. The DirSync T2 machine should run against its own PO, preferably SUBSTituted to a local hard drive to minimize network traffic. This local hard drive should be large and fast. Using a separate PO and machine from the other POs minimizes the impact on mail throughput if problems occur with DirSync. Having the PO local can speed up DirSync by a factor of 10 (which means T2 can run in 30 minutes compared to 5 hours over the net). This can be a significant benefit when servicing distributed POs. This machine does not have to be a file server. Rather, it can copy the PO down to the local drive, run SUBST M: C:\MAILDATA, perform T2 for DirSync, copy the files back up to the server, and finish much faster than running the whole process over the network. Likewise, the local DirSync machine should be a separate machine from the local External PC. That way, if DirSync fails for any reason, at least External will continue to run to service mail customers. When DirSync comes back on line, the GAL will be usable without shutting down External.
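The copy-down sequence described above can be sketched as a simple batch procedure. Everything here is illustrative: the drive letters and paths are assumptions (N: stands in for the network drive where the DirSync server PO normally resides), only standard XCOPY and SUBST commands are used, and the T2 run itself is left as a placeholder because Dispatch should be started exactly as it is normally started against this PO.

    REM Copy the DirSync server PO from the file server to the local drive
    XCOPY N:\MAILDATA C:\MAILDATA /S /E
    REM Map the drive letter the Mail programs expect to the local copy
    SUBST M: C:\MAILDATA
    REM ... run the Dispatch T2 cycle against drive M: here, exactly as it
    REM ... would normally be run against the DirSync server PO
    REM Remove the substitution and copy the updated PO back to the server
    SUBST M: /D
    XCOPY C:\MAILDATA N:\MAILDATA /S /E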
Limit the number of Network names. Microsoft Mail uses a three-layered naming convention consisting of Network/Postoffice/Mailbox. Even though the DirSync process will work with any combination, having a uniform Network name does speed processing. The reason is that DirSync, in order to speed user search times, reindexes the GAL every time changes are made to it, and this indexing must occur across all three layers of the Microsoft Mail naming convention. By using only a couple of Network names, DirSync has less indexing to do across the Network name and really only has to index across the Postoffice and Mailbox parts of the Microsoft Mail addresses.
Use more, high-performance hardware to increase performance. Many accounts use 286 and 386 class machines as their MTA/MMTAs. Depending on other factors such as link speeds, faster hardware (such as 486/33 or greater class machines) could significantly increase the throughput of mail. The cost effectiveness of using faster MTA/MMTAs to improve mail performance should not be underestimated. An advantage of using high-performance MTA/MMTAs is that they can be applied to the system in a highly tactical manner. Bottlenecks on only a small subset of the total number of links can have a big effect on the overall system performance. Increasing throughput on key links can have a substantial impact on overall system performance. When trying to determine where to apply high-performance MTA/MMTAs consider the following:
Apply faster hardware selectively on links where traffic volume is greatest. By selectively applying faster machines on links that move a lot of mail, one should be able to better balance the loads across the system.
Faster hardware should have a greater impact on MTA/MMTAs that are not link constrained. For example, on 10 Mbps links, the MTA/MMTAs are not link constrained to the degree that a WAN segment or link imposes; at 2 Mbps they are marginally link constrained. Therefore, the faster the link, the greater the marginal benefit of using fast hardware.
If used in conjunction with the WAN option, faster hardware has a greater impact. This is because when the MTA/MMTAs perform the additional delivery work on unconstrained links, the impact on total link load and network traffic is negligible. The WAN option disables the delivery portion of MTA/MMTA processing across the WAN, which minimizes the link impact due specifically to mail processing traffic. Remember, a local MTA must be running on both sides of the link in order for this to work, or mail will not be distributed at the remote postoffice.
The use of fewer high-performance MTA/MMTAs is more efficient than the use of multiple low-performance MTA/MMTAs. Whether configured with a single MTA/MMTA polling remote postoffices, or with the WAN option, the marginal increase in throughput is much greater if a single fast machine is used as compared to multiple slower machines.
Match the Mail network configuration to the physical network. To ensure efficient WAN use, there must be a close correlation between the location of Hub POs (the mail backbone), the MTA/MMTAs, and physical network devices. When adjusting the mail system configuration, carefully consider the characteristics and locations of the various physical links. It is often difficult to determine the optimal location for a particular postoffice or device. The following are a few guidelines.
Locate MTAs as closely as possible to the Hub POs. Plan appropriately so that the MTAs servicing the Hub POs are not unnecessarily located on the far side of network devices such as bridges and routers.
The mail routing topology should inherently control traffic levels through slow or routed links. Analyze this closely, as packet traffic across some links and devices could be substantially reduced. The best way to do this is to have a Hub PO on each side (or one designated user PO on each side) as the only POs with a direct connection to each other across the WAN link. All other POs should define the POs across the link as indirect via this Hub PO. This is best explained through an example. Let's say one site has user POs A, B, and C and another site has user POs 1, 2, and 3. When defining the connectivity between the POs, don't give each user PO a direct connection definition for every other user PO. Instead, do the following:
Choose one PO to be the Hub PO at each site. At larger sites, it is better to install a dedicated Hub PO with no users on it in order to handle the mail routing needs. At smaller sites, one can designate one of the user POs to act as the Hub PO. In our example, let's choose PO A and PO 1.
Have each Hub PO define direct connections to the other POs at its site and to the other site's Hub PO, and indirect connections to the other site's remaining POs. In our example, PO A would have direct connection definitions to POs B, C, and 1 and indirect connection definitions to POs 2 and 3 via PO 1. PO 1 would have direct connection definitions to POs 2, 3, and A and indirect connection definitions to POs B and C via PO A.
The non-Hub POs at each site then have a direct connection definition to the Hub PO at their site and, at a minimum, an indirect connection definition via their Hub PO to all other user POs across the WAN. The other user POs at the same site can be either defined as direct or indirect. At larger sites it is definitely better to define all other user POs at the same site as indirect. At smaller sites, either way should be fine. In our example, PO B would have a direct connection definition to PO A and indirect connection definitions to POs C, 1, 2, and 3 via PO A. PO 3 would have a direct connection definition to PO 1, and indirect connection definitions to POs 2, A, B, and C via PO 1.
Utilize the DrivesWAN EXTERNAL.INI parameter to shuttle mail between the two Hub POs.
By following the above, the MTA/MMTA only has to send mail across the WAN link between POs A and 1. All other message routing will stay local to the site.
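For reference, the complete set of connection definitions in this example works out as follows, using the larger-site convention of defining same-site non-Hub POs as indirect (the entries for POs C and 2 follow by symmetry from those given above):

    PO   Direct connection definitions   Indirect connection definitions
    A    B, C, 1                         2, 3 via PO 1
    1    2, 3, A                         B, C via PO A
    B    A                               C, 1, 2, 3 via PO A
    C    A                               B, 1, 2, 3 via PO A
    2    1                               3, A, B, C via PO 1
    3    1                               2, A, B, C via PO 1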
The number of MTA/MMTAs that are configured to deliver mail over a particular physical link should be correlated closely with the bandwidth of the link. This is important. Once a link is taxed to saturation, its overall performance is likely to degrade substantially. An account should be careful that its mail topology doesn't result in some specific links being overused.
The MTA/MMTAs use more of the bandwidth of the slower links. Some of the performance analysis data indicates that the MTA/MMTAs are able to use a larger percentage of the total bandwidth of slower links. The implication is that while it may be possible to have many MTA/MMTAs simultaneously operating across 2 Mbps links, it would not be wise to configure many MTA/MMTAs to simultaneously move mail across 48 Kbps links, because link overload will occur. Test to determine the link saturation level relative to the total number of simultaneously operating MTA/MMTAs in order to find the optimal count.
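As a rough back-of-the-envelope check using the link speeds mentioned above (the 1 MB message size is only an example): a 48 Kbps link carries at most about 6 KB per second, so a single 1 MB message with attachments occupies that link for roughly three minutes before any protocol overhead. Several MTA/MMTAs transferring simultaneously would saturate such a link almost immediately, whereas a 2 Mbps link offers roughly 40 times that capacity.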
Distribute the processing for Directory Synchronization. As explained earlier, there are three stages, called "times" and numbered T1 through T3, that make up the entire DirSync process. Times T1 and T3 run against all requestor POs, and time T2 runs only against the DirSync server PO. By distributing the processing of DirSync, one increases the performance of the DirSync process by having multiple machines execute the T1 and T3 cycles against the different requestor POs. Just as one distributes processing for the MTA/MMTA function of Microsoft Mail, one should also distribute the processing for DirSync, especially when the connectivity involves WAN links. At all sites where there is at least one MTA/MMTA process running, there should also be at least one DirSync (Dispatch) process running; this can be put on the same physical box as the MTA/MMTA when performance requirements allow it. The time to add more machines to run the DirSync process is when the process takes longer to complete than the company can tolerate. For example, let's say a company schedules DirSync to run overnight when the mail network is not heavily utilized and wants it to be complete before the next morning when activity picks up again. The company sets the schedule so that T1 starts at 7:00pm, T2 at 11:00pm, and T3 at 3:00am every Monday, Wednesday, and Friday. The company wants the process to be completed by 7:00am the following day, as that is when mail activity increases. If the company determines that the T3 process is not being completed against all POs by 7:00am, then the company should consider increasing the number of Dispatch machines so that each has fewer POs to service.
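The example schedule can be summarized as follows (the times are from the example above, not a recommendation; in practice each cycle needs to finish before the next one begins):

    Cycle   Runs against              Example start   Must finish before
    T1      All requestor POs         7:00pm          T2 starts (11:00pm)
    T2      DirSync server PO only    11:00pm         T3 starts (3:00am)
    T3      All requestor POs         3:00am          7:00am, when mail activity resumes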