Microsoft Site Server 3.0 Dynamic Directory Capacity and Performance Analysis

November 1999

Microsoft Corporation

Definition of Terms

Dynamic Directory–Specific Term    Meaning
User                               An individual user connected to a service.
Add Record                         Adding a user to the Dynamic Directory database.
Concurrent Users                   Simultaneous users active on the system; a subset of total users calculated based on the user profile.
Least Squares                      Method of fitting a polynomial equation to a non-linear data set.
Attributes                         Specific information associated with a file or an object.

General Term                         Meaning
Pentium Pro equivalent MHz (PPEM)    A unit of measure for processor work. A 200 MHz Pentium Pro processor delivers 200 PPEM. A computer with two 200 MHz Pentium Pro processors delivers 400 PPEM.


Chapter 1 Overview

This document evaluates the performance and scalability characteristics of the Microsoft® Site Server 3.0 Dynamic Directory and demonstrates procedures for identifying these characteristics. Using these procedures, administrators can determine how user load affects hardware resources and which resources are likely to become performance bottlenecks. This information can be used to calculate the maximum capacity of a particular hardware configuration, to assess the value of adding resources, and to identify which resources can satisfy greater capacity needs.

The specific implementation of the Dynamic Directory for Microsoft® NetMeeting™ is discussed in a different report.

Analyzing the Dynamic Directory and LDAP Services

The Dynamic Directory is a central data repository that stores user records containing personal profile information used in Personalization & Membership (P&M). The Dynamic Directory stores session information and information about users who are currently logged on, and also includes user identifiers and Internet Protocol (IP) addresses. The Dynamic Directory may also contain per-user application information, such as items in a shopping cart or pages visited.

The Lightweight Directory Access Protocol (LDAP) Service manages the Dynamic Directory. The contents of the Dynamic Directory are stored in a RAM database. Dynamic Directory contents are obtained using the LDAP Service, which provides platform-independent access to the Dynamic Directory. To obtain performance and scalability information, the Dynamic Directory and the LDAP Service were analyzed in separate tests using various server configurations.

Projecting Total Resource Cost

A model is designed in which total resource cost is calculated for a variable number of concurrent users. The model is as follows:

C = n × K

where C is the resource cost, n is the number of concurrent users, and K is the sum of the costs of each user operation.

Separate equations are created for each type of resource (CPU, memory, disk and network). It is possible to reasonably predict the maximum number of users a particular hardware configuration can support, or conversely, to determine the hardware resources required for a given number of users.

Verification Testing

The accuracy of these calculations has been confirmed by comparing the results of the calculations with a series of verification tests. A test script is created to simulate typical user behavior that is defined in the User Profile. The predicted resource costs calculated from the model are charted against actual resource costs generated by running the verification script.

Scalability

Given a model for predicting resource requirements, it is important to test scalability. For example, does investing in a multiple-processor system enhance the performance of the Dynamic Directory? How does replicating the Dynamic Directory across multiple computers affect overall performance?

System Configuration

In the test scenarios, the following computer configuration was used:

Processor Scaling

Dynamic Directory Server:
CPU: 2 and 4 x 200-MHz Pentium Pro
Memory: 512 MB of RAM
Disk: 4.3-GB SCSI
Network: 100BaseT
Software: Microsoft® Windows NT® version 4.0, Option Pack v. 622 (final)

Dynamic Directory Description

Introduction to the Dynamic Directory

Tested Transactions

Transaction                 Description
Modify Add Attribute        Add one attribute to an existing user
Modify Replace Attribute    Change the value of one attribute in an existing user
Search Base                 LDAP base-level search for a user by common name (CN)
Search Onelevel             LDAP one-level search for a user by name
Add Record                  Add a 20-attribute user
Delete                      Delete one user

User Profile

The following profile captures users’ typical Dynamic Directory transactions. The Dynamic Directory can contain 10,000 users on a single computer. Each 100 users share a single connection to the Dynamic Directory. The ratio of operations and the frequency of each transaction are described in the following table:

Typical Dynamic Directory Transactions

Transaction        Operations in user life cycle    Frequency
Add                3%                               0.00048 per second
Modify Replace     11%                              0.00194 per second
Search by CN       74%                              0.0123 per second
Search Onelevel    9%                               0.00146 per second
Delete             3%                               0.00048 per second

With this profile, the calculated scalability should read:

Summary of Scalability and Performance

Based on the data collected in this document, the following assertions can be made about scaling and performance for the Dynamic Directory:

The system is an in-memory database. If there is not sufficient memory, then the system does not function properly.

There is a background worker thread that eliminates timed-out users from the Dynamic Directory. When this thread is active, it can affect transaction response time.

A large number of open LDAP Service connections causes context switching, which can reduce throughput. Therefore, it is recommended that concurrent connections be limited to 5,000 per computer.

Performing an LDAP bind as administrator greatly reduces the context switching overhead in the security layer, which can occur at higher transaction rates.

The use of the Dynamic Directory as a directory service for NetMeeting is discussed in a separate paper.

Chapter 2 Detailed Discussion of Scalability and Performance

Processor Usage

The PPEM costs of the transactions for the Dynamic Directory are not constant; they vary with the transaction rate. In some cases, when the transaction rate is very low, costs tend to be higher. The following tables give the least squares approximation of the PPEM versus transaction rate curves that were measured in our test environment. The relative accuracy of these measurements is confirmed in the verification section of Appendix A.

The PPEM at a given transaction rate r (transactions per second) on a given processor configuration is r x C1, where C1 is the per-transaction coefficient listed in the tables.

Four processors

Transaction                 Valid range for r    PPEM C1
Modify Add Attribute        1–480                1.1
Modify Replace Attribute    1–605                0.7106
Search Base                 1–531                0.59375
Search Onelevel             1–500                0.65
Add Record                  1–106                2.5
Delete                      1–137                2.29

Two processors

Transaction                 Valid range for r    PPEM C1
Modify Add Attribute        1–257                0.71
Modify Replace Attribute    1–250                0.8007
Search Base                 1–440                0.55
Search Onelevel             1–400                0.49
Add Record                  1–100                2
Delete                      1–100                2
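The relationship PPEM = r x C1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original test tooling: the dictionary layout and the function name ppem_cost are assumptions, and the coefficients are simply copied from the two tables above. The function also reports whether the rate falls inside the measured range, because the coefficients were only fitted within that range.

```python
# (maximum measured rate, PPEM C1) per transaction, copied from the tables above.
# The data layout and the function name are illustrative assumptions.
PPEM_TABLE = {
    4: {
        "Modify Add Attribute": (480, 1.1),
        "Modify Replace Attribute": (605, 0.7106),
        "Search Base": (531, 0.59375),
        "Search Onelevel": (500, 0.65),
        "Add Record": (106, 2.5),
        "Delete": (137, 2.29),
    },
    2: {
        "Modify Add Attribute": (257, 0.71),
        "Modify Replace Attribute": (250, 0.8007),
        "Search Base": (440, 0.55),  # the base-level search row of the two-processor table
        "Search Onelevel": (400, 0.49),
        "Add Record": (100, 2.0),
        "Delete": (100, 2.0),
    },
}


def ppem_cost(processors, transaction, rate):
    """Return (PPEM cost, within_measured_range) for a rate in transactions/sec."""
    max_rate, c1 = PPEM_TABLE[processors][transaction]
    within_range = 1 <= rate <= max_rate
    return rate * c1, within_range


cost, in_range = ppem_cost(4, "Add Record", 2.425)
print(f"{cost:.4f} PPEM, within measured range: {in_range}")
```

A caller would typically treat a result with within_range equal to False as an extrapolation rather than a measured cost.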

Memory Usage

The base memory cost for running the Dynamic Directory is 850 KB.

There is an additional cost of 11 KB per active connection.

There is also a cost of approximately 1,500 bytes per Dynamic Directory object with five attributes. The memory cost will vary depending upon the attributes.
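The three memory figures above combine into a simple additive estimate. The following Python sketch is illustrative only: the function name is hypothetical, one kilobyte is assumed to be 1,024 bytes, and the per-object figure applies only to objects with five attributes, so real usage will differ with larger objects.

```python
KB = 1024  # assumption: one kilobyte is taken as 1,024 bytes

BASE_BYTES = 850 * KB            # base cost of running the Dynamic Directory
PER_CONNECTION_BYTES = 11 * KB   # additional cost per active connection
PER_OBJECT_BYTES = 1500          # approximate cost of an object with five attributes


def estimate_memory_bytes(objects, connections):
    """Additive memory estimate built from the three figures quoted above."""
    return (BASE_BYTES
            + connections * PER_CONNECTION_BYTES
            + objects * PER_OBJECT_BYTES)


users = 10_000
connections = users // 100  # user profile: each 100 users share one connection
total = estimate_memory_bytes(users, connections)
print(f"{total:,} bytes ({total / (1024 * 1024):.1f} MB)")
```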

Network Usage

This table describes a typical network resource usage of the transactions that were measured.

Bit usage for typical transactions

Transaction        Size in bits
Modify Add         1048
Modify Replace     1024
Search Base        3880
Search Onelevel    3880
Add                2288
Delete             160

Disk Usage

Because the database is stored in memory, there is no measurable disk utilization on the Dynamic Directory computer for each transaction.

Profile Calculations

Before looking at specific components of the CPU, disk, and network calculations, the user profile will be examined. Using the number of concurrent users to determine maximum capacity is difficult to achieve if user behavior is not considered.

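The transaction rate T (transactions per second) generated by n concurrent users, each performing a transaction with frequency f (transactions per user per second), is:

T = n x f

For the Add transaction, the equation becomes:

T = n x 0.000485 per second

To find how many Add transactions per second are generated by 5,000 concurrent users, the equation becomes:

T = 5,000 x 0.000485 = 2.425 per second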

The following is an expansion of the entire table:

Transaction rates in user profile

Concurrent users Add per sec. Modify per sec. Base search per sec. One Level Search per sec. Delete per sec.
500 0.24 0.97 6.15 0.73 0.24
1,000 0.49 1.94 12.30 1.46 0.49
1,500 0.73 2.91 18.45 2.18 0.73
2,000 0.97 3.88 24.60 2.91 0.97
2,500 1.21 4.85 30.74 3.64 1.21
5,000 2.43 9.71 61.49 7.28 2.43
7,500 3.64 14.56 92.23 10.92 3.64
10,000 4.85 19.42 122.98 14.56 4.85
20,000 9.71 38.83 245.95 29.13 9.71
25,000 12.14 48.54 307.44 36.41 12.14
30,000 14.56 58.25 368.93 43.69 14.56
50,000 24.27 97.09 614.89 72.82 24.27
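The table can be regenerated from T = n x f. The Python sketch below is a minimal illustration; the function name is hypothetical, and the per-user frequencies are the ones quoted in the site configuration example later in this document. Results may differ from the table in the last digit because the table appears to have been computed from unrounded frequencies.

```python
# Per-user frequencies (transactions per user per second), as quoted in the
# site configuration example later in this document.
FREQUENCY = {
    "Add": 0.000485,
    "Modify Replace": 0.001942,
    "Search Base": 0.012298,
    "Search Onelevel": 0.001456,
    "Delete": 0.000485,
}


def transaction_rates(concurrent_users):
    """Apply T = n x f to every transaction in the user profile."""
    return {name: concurrent_users * f for name, f in FREQUENCY.items()}


for users in (500, 5_000, 50_000):
    rates = transaction_rates(users)
    print(users, {name: round(rate, 2) for name, rate in rates.items()})
```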

Processor Calculations

Calculate the total CPU cost at 5,000 users for all five transactions in the user profile. Using the Add transaction again and arbitrarily choosing the 4-processor numbers:

PPEM(Add) = Rate x PPEM C1

PPEM(Add) = 2.425 per sec x 2.5 PPEM = 6.0625 PPEM

This is 6.0625 PPEM / 800 PPEM capacity = 0.758% CPU on a 4-processor Pentium Pro 200 MHz server.

The following is an expansion of the entire table for a 4-processor system.

Conc. users Add PPEM Modify PPEM Base search PPEM One Level Search PPEM Delete PPEM PPEM total % CPU
500 0.61 0.69 3.65 0.47 0.56 5.98 0.75
1,000 1.21 1.38 7.30 0.95 1.11 11.95 1.49
1,500 1.82 2.07 10.95 1.42 1.67 17.93 2.24
2,000 2.43 2.76 14.60 1.89 2.22 23.91 2.99
2,500 3.03 3.45 18.25 2.37 2.78 29.88 3.74
5,000 6.07 6.90 36.51 4.73 5.56 59.77 7.47
7,500 9.10 10.35 54.76 7.10 8.34 89.65 11.21
10,000 12.14 13.80 73.02 9.47 11.12 119.53 14.94
20,000 24.27 27.60 146.04 18.93 22.23 239.07 29.88
25,000 30.34 34.50 182.54 23.67 27.79 298.84 37.35
30,000 36.41 41.40 219.05 28.40 33.35 358.60 44.83
50,000 60.68 68.99 365.09 47.33 55.58 597.67 74.71

The following is an expansion of the entire table for a 2-processor system.

Conc. users Add PPEM Modify PPEM Base search PPEM One Level Search PPEM Delete PPEM PPEM total % CPU
500 0.49 0.78 3.38 0.36 0.49 5.49 1.37
1,000 0.97 1.55 6.76 0.71 0.97 10.97 2.74
1,500 1.46 2.33 10.15 1.07 1.46 16.46 4.12
2,000 1.94 3.11 13.53 1.43 1.94 21.95 5.49
2,500 2.43 3.89 16.91 1.78 2.43 27.43 6.86
5,000 4.85 7.77 33.82 3.57 4.85 54.87 13.72
7,500 7.28 11.66 50.73 5.35 7.28 82.30 20.58
10,000 9.71 15.55 67.64 7.14 9.71 109.74 27.43
20,000 19.42 31.10 135.28 14.27 19.42 219.48 54.87
25,000 24.27 38.87 169.09 17.84 24.27 274.35 68.59
30,000 29.13 46.64 202.91 21.41 29.13 329.22 82.30
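The two tables above follow from summing rate x C1 over the five profile transactions and dividing by the PPEM capacity of the server. The Python sketch below shows one way to do this; the names are hypothetical, the frequencies are those quoted in the site configuration example, and the C1 values are the ones listed under Processor Usage (the profile's Modify transaction is assumed to be Modify Replace).

```python
# Per-user frequencies (transactions per user per second).
FREQUENCY = {
    "Add": 0.000485,
    "Modify Replace": 0.001942,
    "Search Base": 0.012298,
    "Search Onelevel": 0.001456,
    "Delete": 0.000485,
}

# PPEM C1 coefficients from the Processor Usage tables, keyed by processor count.
C1 = {
    4: {"Add": 2.5, "Modify Replace": 0.7106, "Search Base": 0.59375,
        "Search Onelevel": 0.65, "Delete": 2.29},
    2: {"Add": 2.0, "Modify Replace": 0.8007, "Search Base": 0.55,
        "Search Onelevel": 0.49, "Delete": 2.0},
}

MHZ_PER_PROCESSOR = 200  # Pentium Pro 200 MHz, as in the test configuration


def cpu_cost(concurrent_users, processors):
    """Return (total PPEM, percent CPU) for the whole user profile."""
    total_ppem = sum(concurrent_users * FREQUENCY[name] * C1[processors][name]
                     for name in FREQUENCY)
    capacity = processors * MHZ_PER_PROCESSOR
    return total_ppem, 100.0 * total_ppem / capacity


for processors in (4, 2):
    ppem, percent = cpu_cost(5_000, processors)
    print(f"{processors} processors: {ppem:.2f} PPEM, {percent:.2f}% CPU")
```

Small differences from the tabulated values in the second decimal place are expected, since the tables appear to use unrounded frequencies.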

Network Calculations

Example: Calculate the total network bits per second at 5,000 users for all five transactions in the user profile. Using the Add transaction again:

bits/sec (Add) = 2.425 per sec x 2,288 bits = 5,548 bits/sec

Conc. users Add bits/sec Modify bits/sec Base search bits/sec One-Level bits/sec Delete bits/sec Total bits/sec
500 555 994 23,858 2,825 39 28,271
1,000 1,111 1,988 47,715 5,650 78 56,542
1,500 1,666 2,983 71,573 8,476 117 84,814
2,000 2,221 3,977 95,430 11,301 155 113,085
2,500 2,777 4,971 119,288 14,126 194 141,356
5,000 5,553 9,942 238,576 28,252 388 282,712
7,500 8,330 14,913 357,864 42,379 583 424,068
10,000 11,107 19,883 477,152 56,505 777 565,424
20,000 22,214 39,767 954,304 113,010 1,553 1,130,848
25,000 27,767 49,709 1,192,880 141,262 1,942 1,413,560
30,000 33,320 59,650 1,431,456 169,515 2,330 1,696,272
50,000 55,534 99,417 2,385,761 282,524 3,883 2,827,120
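The network table can be reproduced in the same way as the processor tables, by multiplying each transaction rate by its size in bits. The following Python sketch is illustrative: the names are hypothetical, the frequencies come from the site configuration example, and the comparison against link capacity assumes the 100BaseT network of the test configuration carries 100,000,000 bits per second with no protocol overhead accounted for.

```python
# Per-user frequencies (transactions per user per second).
FREQUENCY = {
    "Add": 0.000485,
    "Modify Replace": 0.001942,
    "Search Base": 0.012298,
    "Search Onelevel": 0.001456,
    "Delete": 0.000485,
}

# Size of each transaction in bits, from the network usage table.
BITS_PER_TRANSACTION = {
    "Add": 2288,
    "Modify Replace": 1024,
    "Search Base": 3880,
    "Search Onelevel": 3880,
    "Delete": 160,
}

LINK_BITS_PER_SEC = 100_000_000  # nominal 100BaseT capacity (assumed, no overhead)


def network_bits_per_sec(concurrent_users):
    """Total bits/sec generated by the user profile."""
    return sum(concurrent_users * FREQUENCY[name] * BITS_PER_TRANSACTION[name]
               for name in FREQUENCY)


bits = network_bits_per_sec(5_000)
print(f"{bits:,.0f} bits/sec, {100 * bits / LINK_BITS_PER_SEC:.2f}% of the link")
```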

Sample Configuration

Site Configuration Example

Based on the same user profile used throughout this document, the following example illustrates how to determine an optimal configuration for a given number of users.

Profile Description

Example: The site has 200,000 total users. On average, each user connects for 20 minutes and performs 20 transactions. The distribution of transactions is the same as in the user profile.

At peak time, 2 percent of total users are online simultaneously, which gives 4,000 concurrent users.

The number of requests per second is given by:

N x F

where the number of concurrent users N is computed as stated earlier and the frequency F is the number of queries per user per second.

Total Add per second during peak time is 4,000 users x 0.000485 per sec = 1.94 per sec.

Total Modify per second during peak time is 4,000 users x 0.001942 per sec = 7.77 per sec.

Total Base Search per second during peak time is 4,000 users x 0.012298 per sec = 49.19 per sec.

Total Onelevel Search per second during peak time is 4,000 users x 0.001456 per sec = 5.83 per sec.

Total Delete per second during peak time is 4,000 users x 0.000485 per sec = 1.94 per sec.

Sample Processor Calculations

For each of the transactions, compute the expected CPU cost. Computing the cost of the individual transactions constitutes the PPEM model. The model assumes that the total cost of running all the transactions together will not bring the system into the area of context switching. Note that context switching on multi-processor systems is very expensive at higher transaction rates.

It is also important to note that PPEM costs predict utilization accurately only up to approximately 80 percent of processor capacity. Above this level, it is recommended that you run a computer with greater CPU capacity, where the best choice is a higher clock speed. Another solution is to add a server to distribute the load.

Furthermore, each of the PPEM equations is only valid up to the maximum range measured.
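The two limits just described (the 80 percent utilization guideline and the measured rate ranges) can be turned into a rough ceiling on concurrent users. The Python sketch below is one possible reading of those constraints for a 4-processor server, not a procedure taken from the test results: the function name is hypothetical, and the coefficients and ranges are copied from the four-processor table under Processor Usage. It returns the user count at which the model reaches the CPU guideline and the user count at which the first transaction leaves its measured range; beyond the smaller of the two, the model is extrapolating.

```python
# Per-user frequencies (transactions per user per second).
FREQUENCY = {
    "Add": 0.000485,
    "Modify Replace": 0.001942,
    "Search Base": 0.012298,
    "Search Onelevel": 0.001456,
    "Delete": 0.000485,
}

# (maximum measured rate, PPEM C1) for the four-processor server.
FOUR_PROCESSOR = {
    "Add": (106, 2.5),
    "Modify Replace": (605, 0.7106),
    "Search Base": (531, 0.59375),
    "Search Onelevel": (500, 0.65),
    "Delete": (137, 2.29),
}


def user_limits(table, capacity_mhz, cpu_limit_percent=80.0):
    """Return (users at the CPU guideline, users at the first range limit)."""
    ppem_per_user = sum(FREQUENCY[name] * c1 for name, (_, c1) in table.items())
    cpu_limit = int(cpu_limit_percent / 100.0 * capacity_mhz / ppem_per_user)
    range_limit = int(min(max_rate / FREQUENCY[name]
                          for name, (max_rate, _) in table.items()))
    return cpu_limit, range_limit


cpu_limit, range_limit = user_limits(FOUR_PROCESSOR, capacity_mhz=4 * 200)
print("Users at 80% CPU:", cpu_limit)
print("Users at first measured-range limit:", range_limit)
print("Model is validated up to:", min(cpu_limit, range_limit))
```

With this profile the base search rate is the first to reach the end of its measured range, which is why the validated ceiling can be lower than the CPU-based one.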

To calculate the cost of each transaction on a 4-processor system:

PPEM(Add) = Rate x PPEM C1 = 1.94 x 2.50 PPEM = 4.85 PPEM

PPEM(Modify) = Rate x PPEM C1 = 7.77 x 0.71 PPEM = 5.52 PPEM

PPEM(Base Search) = Rate x PPEM C1 = 49.19 x 0.59 PPEM = 29.21 PPEM

PPEM(Onelevel Search) = Rate x PPEM C1 = 5.83 x 0.65 PPEM = 3.79 PPEM

PPEM(Delete) = Rate x PPEM C1 = 1.94 x 2.29 PPEM = 4.45 PPEM

The total is 47.82 PPEM, or 5.98% CPU utilization on a 4-processor system.

Appendix A:  Testing Methodology

Calculating the per transaction cost

To calculate the PPEM from observed data, the load was simulated on a per transaction basis, in this case, Base Object Dynamic Search. An automated script was run that cycles through a changing user load. The values were captured in a perfmon.exe log file. This log file was exported to a tab-delimited text file and then imported into Microsoft® Excel. Using an Excel macro, the log was distributed and averaged. Thus, each row of data is a reasonable representation of average transaction load. Many other values are captured, but these are the most important for calculating PPEM. To establish a valid range of samples, context switches must be observed. The context switch rate must remain below a certain threshold to be considered running optimally. For example, in the case of a two-processor 200-MHz system, the threshold is 10,000 context switches/sec. In addition, the processor should be exercised across a full range, from 1 percent to 80 percent CPU utilization.

To calculate PPEM in this case, multiply the total processor time by the Clock capacity of the system, which (in this example) is 4 x 200, and then divide by the number of Dynamic Searches per second.

Sample:

Connection current Dynamic search % Total processor Context switches PPEM
3.00 3.83 0.42 329.36 0.88
5.00 7.70 0.54 348.67 0.56
9.00 15.31 1.26 394.67 0.66
13.00 22.79 1.83 435.38 0.64
17.00 30.22 2.50 469.48 0.66
25.00 44.91 3.44 546.55 0.61
32.29 51.56 3.82 594.10 0.59
40.71 60.35 4.50 668.12 0.60
48.71 65.31 4.70 659.47 0.58
64.57 80.69 5.80 780.76 0.57
95.57 113.51 8.13 1,104.85 0.57
98.00 116.09 8.43 1,071.36 0.58
197.00 345.00 19.00 3,730.00 0.44
295.00 350.00 19.30 4,024.00 0.44
393.00 531.00 37.00 10,849.00 0.56
493.00 680.00 48.00 14,266.00 0.56

The overall PPEM is calculated by taking a least squares approximation of the PPEM value at the transaction rate. This generates the coefficients presented in Chapter 2 for each transaction.

In this case, PPEM = .59375
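The per-row PPEM column and a least squares coefficient can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the report does not say which rows or which form of fit produced the published coefficient, so the sketch fits a straight line through the origin (total PPEM = C1 x rate) over all sample rows, and its result should not be expected to match the published value exactly. The function names are hypothetical.

```python
# (Dynamic searches per second, % Total processor) from the sample table above.
SAMPLES = [
    (3.83, 0.42), (7.70, 0.54), (15.31, 1.26), (22.79, 1.83),
    (30.22, 2.50), (44.91, 3.44), (51.56, 3.82), (60.35, 4.50),
    (65.31, 4.70), (80.69, 5.80), (113.51, 8.13), (116.09, 8.43),
    (345.00, 19.00), (350.00, 19.30), (531.00, 37.00), (680.00, 48.00),
]

CAPACITY_MHZ = 4 * 200  # clock capacity of the four-processor test server


def ppem_per_transaction(rate, cpu_percent, capacity_mhz=CAPACITY_MHZ):
    """PPEM for one sample row: CPU fraction x capacity / transaction rate."""
    return cpu_percent / 100.0 * capacity_mhz / rate


def fit_c1(samples, capacity_mhz=CAPACITY_MHZ):
    """Least squares slope of total PPEM versus rate, forced through the origin.

    Assumption: a through-origin linear fit; the original fit may have differed.
    """
    numerator = sum(rate * (cpu / 100.0 * capacity_mhz) for rate, cpu in samples)
    denominator = sum(rate * rate for rate, _ in samples)
    return numerator / denominator


for rate, cpu in SAMPLES[:3]:
    print(f"rate {rate:7.2f}/sec -> {ppem_per_transaction(rate, cpu):.2f} PPEM")
print(f"fitted C1 (through origin, all rows): {fit_c1(SAMPLES):.3f}")
```

Restricting SAMPLES to rows below the context switch threshold before fitting is a natural variation, since the text requires samples to stay under that threshold to be considered valid.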

Verifying Actual vs. Calculated

The following analysis shows how the accuracy of CPU calculations can reflect actual utilization.

This table shows a similar result for the 2-processor results.

Appendix B:  Scaling Summary

Listed below are the results of the capacity and analysis testing.

Dynamic Directory Scaling

Every user that accesses the Dynamic Directory increases the hardware utilization on that computer. This table shows the maximum rate of the transactions on computers with four and two Pentium Pro 200-MHz processors, respectively.

Maximum Transaction Rates

Transaction           Maximum transaction rate
                      4 processors    2 processors
Modify add            480             460
Modify replace        1000            560
Search base object    1000            1000
Search OneLevel       -               -
Add                   106             100
Delete                137             -

Replication of Dynamic Directory Data

The Dynamic Directory can also be scaled by adding other computers running the Dynamic Directory independently and then turning on Dynamic Directory replication. The following table shows that adding more computers can increase the total number of transactions per second that a group of computers can handle. All three of these computers were handling requests against one contiguous in-memory database. The costs were linear.

Analysis:

CPU utilization on the computer that receives the request is much higher than it is on the replicated computers for Add and Modify commands.

Dynamic Directory replication does not impose major overhead on the replicated computers.

Test    Xact/sec    Computer 1 %CPU    Computer 1 PPEM (1)    Computer 2 %CPU    Computer 2 PPEM    Computer 3 %CPU    Computer 3 PPEM
Base Search of 100,000-user Dynamic Directory at nominal throughput    137.8    31.8    .92    -    -    -    -
Delete from a 100,000-user Dynamic Directory    373.7    86.4    .93    4.46    .05    4.36    .05
Modify add 1 attribute    444.5    47.96    .43    2.57    .02    2.61    .02
Modify replace 1 attribute    436.2    43.1    .4    5.55    .05    2.48    .02
Add a random 5-attribute user to the 100,000-user Dynamic Directory    196.8    53.2    1.08    10.39    .21    10.26    .21

1. Cost expressed in Pentium Pro Equivalent MHz (PPEM). This is the total MHz on the server multiplied by percent CPU and divided by Xact/sec, which gives the cost of each transaction. The tabulated PPEM values correspond to a total of 400 MHz (2 processors x 200 MHz each).

Appendix C:  Critical Monitoring Counters

All counters noted can be found in the Microsoft® Windows NT® Performance Monitor. These counters will be distributed among the machines in the P&M service group. The counters in the System and Memory objects can be used to monitor capacity.

Site Server LDAP Service

System Object

Context Switches/sec should be less than 15,000.

% Total Processor Time should be less than 80 percent.

Processor Object

% Processor Time (average) should be less than 80 percent (for each processor)

Memory Object

Available Bytes should be greater than 4 MB.

Pages/sec should be less than 1 per second.
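The thresholds above lend themselves to an automated check once counter values have been exported from Performance Monitor. The Python sketch below is illustrative only: the counter keys, the function name, and the sample readings are hypothetical, it does not read counters from Windows itself, and one megabyte is assumed to be 1,048,576 bytes.

```python
import operator

MB = 1024 * 1024  # assumption: one megabyte is 1,048,576 bytes

# Counter name -> (comparison that must hold, limit), from the list above.
THRESHOLDS = {
    "System\\Context Switches/sec": (operator.lt, 15_000),
    "System\\% Total Processor Time": (operator.lt, 80),
    "Processor\\% Processor Time": (operator.lt, 80),
    "Memory\\Available Bytes": (operator.gt, 4 * MB),
    "Memory\\Pages/sec": (operator.lt, 1),
}


def out_of_range(readings):
    """Return the names of the counters whose readings violate a threshold."""
    violations = []
    for name, value in readings.items():
        if name not in THRESHOLDS:
            continue  # counters without a documented threshold are ignored
        holds, limit = THRESHOLDS[name]
        if not holds(value, limit):
            violations.append(name)
    return violations


# Hypothetical readings, for illustration only.
sample = {
    "System\\Context Switches/sec": 16_200,
    "System\\% Total Processor Time": 42.0,
    "Memory\\Available Bytes": 64 * MB,
    "Memory\\Pages/sec": 0.2,
}
print(out_of_range(sample))
```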

Appendix D:  Dynamic Directory Transactions

Scripts

Dynamic Directory add script

CONNECT
BINDSIMPLE ANONYMOUS
LOOP 10
ADD dn=cn=RANDALPHA(10), ou=dynamic, o=microsoft; objectclass=member\dynamicObject;guid=RANDNUMERIC(32);c=US;language=RANDALPHA(15);userComment=RANDNUMERIC(20);
SLEEP 100
ENDLOOP
QUIT

Dynamic Directory modify replace script

CONNECT 
LOOP RANDNUMBER(15,30)
MODIFY postalcode=RANDNUMERIC(9):R;dn=cn=testRANDNUMBER(1,100000), ou=dynamic, o=microsoft;
SLEEP RANDNUMBER(200,350)
ENDLOOP
QUIT

Dynamic Directory modify add script

CONNECT
LOOP 15
  MODIFY userComment=RANDALPHA(10): A;dn=cn=testRANDNUMBER(1,100000), ou=dynamic, o=microsoft;
SLEEP 100
ENDLOOP
QUIT


Dynamic Directory search script

CONNECT
LOOP 15
  SEARCH BASE RETURN ALL dn=cn=testRANDNUMBER(1,100000),ou=Dynamic,o=microsoft;filter=(objectclass=*);
  SLEEP 100
ENDLOOP
QUIT

Verification Script

CONNECT
  BINDSIMPLE dn=cn=administrator,ou=members,o=microsoft;password=pw;
  LOOP 1000
    %29 SKIP 2
      SEARCH BASE RETURN ALL dn=cn=tstRANDNUMBER(1,10000),ou=dynamic,o=microsoft; filter=(objectclass=*);
      SKIP 10
    %92 SKIP 2
      SEARCH ONELEVEL RETURN ALL dn=ou=dynamic,o=microsoft;filter=(cn=tstRANDNUMBER(1,10000));
      SKIP 7
    %97 SKIP 4
      ADD dn=cn=DavidSEQUNUMBER(1,10000,xxx),ou=dynamic,o=microsoft;objectClass=member\dynamicObject;guid=RANDNUMERIC(32);telephoneNumber=RANDNUMERIC(10);street=RANDALPHA(60);postOfficeBox=RANDNUMERIC(6);postalCode=RANDNUMERIC(9);postalAddress=RANDNUMERIC(60);st=RANDALPHA(2);givenName=RANDALPHA(40);sn=RANDNUMERIC(9);c=RANDALPHA(2);language=RANDALPHA(5);userComment=RANDNUMERIC(20);homepage=RANDALPHA(8);description=RANDALPHA(8);userpassword=password;mail=RANDALPHA(20)@corp.com;mailerNoEmail=0;mailerNumNDR=0;mailerEmailInvalid=0;
      SLEEP RANDNUMBER(20000,35000)
      DELETE cn=DavidSEQUNUMBER(1,10000,xxx),ou=dynamic,o=microsoft;
      SKIP 2
    %89 SKIP 1
      MODIFY postalcode=RANDNUMERIC(9):R;dn=cn=tstRANDNUMBER(1,10000), ou=dynamic,o=microsoft;
      SLEEP RANDNUMBER(20000,35000)
  ENDLOOP
QUIT

Information in this document, including URL and other Internet web site references, is subject to change without notice.  The entire risk of the use or the results of the use of this resource kit remains with the user.  This resource kit is not supported and is provided as is without warranty of any kind, either express or implied.  The example companies, organizations, products, people and events depicted herein are fictitious.  No association with any real company, organization, product, person or event is intended or should be inferred.  Complying with all applicable copyright laws is the responsibility of the user.  Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.  Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 1999-2000 Microsoft Corporation.  All rights reserved.

Microsoft, Windows and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries/regions.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.