Dynamic Directory–Specific Terms | Meaning |
User | An individual user connected to a service. |
Add Record | Adding a user to the Dynamic Directory database. |
Concurrent Users | Simultaneous users active on the system; a subset of total users calculated based on user profile. |
Least Squares | Method of fitting a polynomial equation to a non-linear data set. |
Attributes | Specific information associated with a file or an object. |
General Terms | Meaning |
Pentium Pro equivalent MHz (PPEM) | A unit of measure for processor work. A 200 MHz Pentium Pro processor delivers 200 Pentium Pro equivalent MHz. A computer with two 200 MHz Pentium Pro processors delivers 400 Pentium Pro equivalent MHz. |
This document evaluates the performance and scalability characteristics of the Microsoft® Site Server 3.0 Dynamic Directory and demonstrates procedures for identifying these characteristics. Using these procedures, administrators can determine how user load impacts hardware resources, and which resources are likely to become performance bottlenecks. This information can be used to calculate the maximum capacity of a particular hardware configuration, to assess the value of adding resources, and to identify which resources can satisfy greater capacity needs.
The specific implementation of the Dynamic Directory for Microsoft® NetMeeting™ is discussed in a different report.
The Dynamic Directory is a central data repository that stores user records containing personal profile information used in Personalization & Membership (P&M). The Dynamic Directory stores session information and information about users who are currently logged on, and also includes user identifiers and Internet Protocol (IP) addresses. The Dynamic Directory may also contain per-user application information, such as items in a shopping cart or pages visited.
The Lightweight Directory Access Protocol (LDAP) Service manages the Dynamic Directory. The contents of the Dynamic Directory are stored in a RAM database. Dynamic Directory contents are obtained using the LDAP Service, which provides platform-independent access to the Dynamic Directory. To obtain performance and scalability information, the Dynamic Directory and the LDAP Service were analyzed in separate tests using various server configurations.
A model is designed in which total resource cost is calculated for a variable number of concurrent users. The model is as follows:
C = n × K
Here C is the resource cost, n is the number of concurrent users, and K is the sum of the costs of each user operation.
Separate equations are created for each type of resource (CPU, memory, disk and network). It is possible to reasonably predict the maximum number of users a particular hardware configuration can support, or conversely, to determine the hardware resources required for a given number of users.
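One way to read the model is sketched below. The symbols f_i (per-user frequency of operation i), c_i (resource cost of one occurrence of operation i), and C_max (capacity of the resource) are introduced here for illustration only and are not part of the original notation.

```latex
K = \sum_i f_i \, c_i, \qquad
C = n \times K = n \sum_i f_i \, c_i, \qquad
n_{\max} = \frac{C_{\max}}{K}
```

Read left to right, the first two expressions give the resource cost for a known number of users, and the last one is the same equation solved for the number of users a given capacity can carry.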
The accuracy of these calculations has been confirmed by comparing the results of the calculations with a series of verification tests. A test script is created to simulate typical user behavior that is defined in the User Profile. The predicted resource costs calculated from the model are charted against actual resource costs generated by running the verification script.
Given a model for predicting resource requirements, it is important to test scalability. For example, does investing in a multiple-processor system enhance the performance of the Dynamic Directory? How does replicating the Dynamic Directory across multiple computers affect overall performance?
In the test scenarios, this computer was used in the following configuration:
Dynamic Directory Server: | ||
CPU: | 2 or 4 × 200-MHz Pentium Pro | |
Memory: | 512 MB of RAM | |
Disk: | 4.3-GB SCSI | |
Network: | 100BaseT | |
Software: | Microsoft® Windows NT® version 4.0 Option Pack v. 622 (final) |
Tested Transactions
Transaction | Description |
Modify Add Attribute | Add one attribute to an existing user |
Modify Replace Attribute | Change the value of one attribute in an existing user |
Search Base | LDAP base-level search for a user by common name (CN) |
Search Onelevel | LDAP one-level search for a user by name |
Add Record | Add a 20-attribute user |
Delete | Delete one user |
The following profile captures users’ typical Dynamic Directory transactions. The Dynamic Directory can contain 10,000 users on a single computer. Each group of 100 users shares a single connection to the Dynamic Directory. The ratio of operations and the frequency of each transaction are described in the following table:
Typical Dynamic Directory Transactions
Transaction | Operations in user life cycle | Frequency |
Add | 3% | 0.00048 per second |
Modify Replace | 11% | 0.00194 per second |
Search by CN | 74% | 0.0123 per second |
Search Onelevel | 9% | 0.00146 per second |
Delete | 3% | 0.00048 per second |
With this profile, the calculated scalability should read:
Based on the data collected in this document, the following assertions can be made about scaling and performance for the Dynamic Directory:
The system is an in-memory database. If there is not sufficient memory, then the system does not function properly.
There is a background worker thread that eliminates timed-out users from the Dynamic Directory. When this thread is active, it can affect transaction response time.
A large number of open LDAP Service connections causes context switching, which can impact throughput. Therefore, it is recommended that concurrent connections be limited to 5,000 per computer.
Performing an LDAP bind as administrator greatly reduces the context switching overhead in the security layer, which can occur at higher transaction rates.
The use of the Dynamic Directory as a directory service for NetMeeting is discussed in a separate paper.
The PPEM costs of the transactions for the Dynamic Directory are not constants. They vary with the rate of the transaction. In some cases, when the transaction rate is very low, costs tend to be higher. This table represents the least squares approximation of PPEM versus Transaction Rate curves that were measured in our test environment. The relative accuracy of these measurements is confirmed in the following verification section.
The PPEM at a given rate r on a given processor configuration is r × C1, where C1 is the cost constant listed for the transaction.
Four processors
Transaction | Valid range for r | PPEM C1 |
Modify Add Attribute | 1–480 | 1.1 |
Modify Replace Attribute | 1–605 | 0.7106 |
Search Base | 1–531 | 0.59375 |
Search Onelevel | 1–500 | 0.65 |
Add Record | 1–106 | 2.5 |
Delete | 1–137 | 2.29 |
Two processors
Transaction | Valid range for r | PPEM C1 |
Modify Add Attribute | 1–257 | 0.71 |
Modify Replace Attribute | 1–250 | 0.8007 |
Search Base | 1–440 | 0.55 |
Search Onelevel | 1–400 | 0.49 |
Add Record | 1–100 | 2 |
Delete | 1–100 | 2 |
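The two tables can be applied mechanically. The following is a minimal sketch, not a Microsoft tool: the dictionary layout and the function name `ppem_cost` are illustrative, and the two-processor "Search" row is assumed to be the base search. The range check reflects the fact that each constant was only measured up to the listed rate.

```python
# (max measured rate, C1) per transaction, copied from the tables above.
# Assumption: the two-processor "Search" row is the base-level search.
PPEM_TABLE = {
    4: {
        "Modify Add Attribute": (480, 1.1),
        "Modify Replace Attribute": (605, 0.7106),
        "Search Base": (531, 0.59375),
        "Search Onelevel": (500, 0.65),
        "Add Record": (106, 2.5),
        "Delete": (137, 2.29),
    },
    2: {
        "Modify Add Attribute": (257, 0.71),
        "Modify Replace Attribute": (250, 0.8007),
        "Search Base": (440, 0.55),
        "Search Onelevel": (400, 0.49),
        "Add Record": (100, 2.0),
        "Delete": (100, 2.0),
    },
}


def ppem_cost(processors, transaction, rate):
    """Return r x C1, refusing rates outside the measured range."""
    max_rate, c1 = PPEM_TABLE[processors][transaction]
    if not 1 <= rate <= max_rate:
        raise ValueError(
            f"{transaction}: rate {rate}/sec is outside the measured range 1-{max_rate}"
        )
    return rate * c1


# 61.49 base searches per second on four processors: about 36.5 PPEM
print(round(ppem_cost(4, "Search Base", 61.49), 2))
```

Raising an error outside the measured range is a design choice of this sketch; extrapolating the constants beyond those rates is not supported by the measurements.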
The base memory cost for running the Dynamic Directory is 850K.
There is an additional cost of 11K per active connection.
There is also a cost of approximately 1,500 bytes per Dynamic Directory object with five attributes. The memory cost will vary depending upon the attributes.
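The three memory figures combine into a simple estimate. The sketch below assumes that "K" means 1,024 bytes, that every user is stored as one five-attribute object, and that 100 users share one connection as in the user profile; the function name is hypothetical.

```python
KB = 1024  # assumption: "K" in the text means 1,024 bytes


def estimate_memory_bytes(connections, objects,
                          base=850 * KB,
                          per_connection=11 * KB,
                          per_object=1500):
    """Base cost + per-connection cost + per-object cost (five attributes each)."""
    return base + connections * per_connection + objects * per_object


users = 10_000
connections = users // 100  # 100 users per connection, as in the user profile
total = estimate_memory_bytes(connections, users)
print(f"{total:,} bytes, about {total / (1024 * 1024):.1f} MB")
```

Objects with more or larger attributes will cost more than 1,500 bytes each, so this should be treated as a lower bound for richer records.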
This table describes a typical network resource usage of the transactions that were measured.
Network usage for typical transactions
Transaction | Size in bits |
Modify Add | 1048 |
Modify Replace | 1024 |
Search Base | 3880 |
Search Onelevel | 3880 |
Add | 2288 |
Delete | 160 |
Because the database is stored in memory, there is no measurable disk utilization on the Dynamic Directory computer for each transaction.
Before looking at specific components of the CPU, disk, and network calculations, the user profile will be examined. Using the number of concurrent users to determine maximum capacity is difficult to achieve if user behavior is not considered.
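The rate T of a transaction, in transactions per second, is the number of concurrent users n multiplied by the per-user frequency f of that transaction:
T = n × f
For the Add transaction, the equation becomes:
T = n × 0.000485 per second
To find out how many Add transactions are generated per second for 5,000 concurrent users, the equation becomes:
T = 5,000 × 0.000485 = 2.425 per second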
The following is an expansion of the entire table:
Transaction rates in user profile
Concurrent users | Add per sec. | Modify per sec. | Base search per sec. | One Level Search per sec. | Delete per sec. |
500 | 0.24 | 0.97 | 6.15 | 0.73 | 0.24 |
1,000 | 0.49 | 1.94 | 12.30 | 1.46 | 0.49 |
1,500 | 0.73 | 2.91 | 18.45 | 2.18 | 0.73 |
2,000 | 0.97 | 3.88 | 24.60 | 2.91 | 0.97 |
2,500 | 1.21 | 4.85 | 30.74 | 3.64 | 1.21 |
5,000 | 2.43 | 9.71 | 61.49 | 7.28 | 2.43 |
7,500 | 3.64 | 14.56 | 92.23 | 10.92 | 3.64 |
10,000 | 4.85 | 19.42 | 122.98 | 14.56 | 4.85 |
20,000 | 9.71 | 38.83 | 245.95 | 29.13 | 9.71 |
25,000 | 12.14 | 48.54 | 307.44 | 36.41 | 12.14 |
30,000 | 14.56 | 58.25 | 368.93 | 43.69 | 14.56 |
50,000 | 24.27 | 97.09 | 614.89 | 72.82 | 24.27 |
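The table is the equation T = n × f evaluated for each transaction in the profile. A minimal sketch follows, using the unrounded per-user frequencies that appear in the peak-time example later in this document; the names `FREQUENCY` and `transaction_rates` are illustrative. Small differences in the last digit compared with the table come from rounding of the frequencies.

```python
# Per-user frequency (transactions per second) for each profile transaction.
FREQUENCY = {
    "Add": 0.000485,
    "Modify": 0.001942,
    "Base search": 0.012298,
    "One Level Search": 0.001456,
    "Delete": 0.000485,
}


def transaction_rates(concurrent_users):
    """Return T = n x f for every transaction in the user profile."""
    return {name: concurrent_users * f for name, f in FREQUENCY.items()}


for n in (500, 5_000, 50_000):
    row = transaction_rates(n)
    print(f"{n:>6,}", "  ".join(f"{rate:8.2f}" for rate in row.values()))
```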
Calculate the total CPU cost at 5,000 users for all five transactions in the user profile. Using the Add transaction again and arbitrarily choosing the 4-processor numbers:
PPEM(Add) = Rate × C1
PPEM(Add) = 2.425 per second × 2.5 = 6.0625 PPEM
This is 6.0625 PPEM / 800 PPEM capacity = 0.758% CPU on a 4-processor Pentium Pro 200-MHz server.
The following is an expansion of the entire table for a 4-processor system.
Conc. users | Add PPEM | Modify PPEM | Base search PPEM | One Level Search PPEM | Delete PPEM | PPEM total | % CPU |
500 | 0.61 | 0.69 | 3.65 | 0.47 | 0.56 | 5.98 | 0.75 |
1,000 | 1.21 | 1.38 | 7.30 | 0.95 | 1.11 | 11.95 | 1.49 |
1,500 | 1.82 | 2.07 | 10.95 | 1.42 | 1.67 | 17.93 | 2.24 |
2,000 | 2.43 | 2.76 | 14.60 | 1.89 | 2.22 | 23.91 | 2.99 |
2,500 | 3.03 | 3.45 | 18.25 | 2.37 | 2.78 | 29.88 | 3.74 |
5,000 | 6.07 | 6.90 | 36.51 | 4.73 | 5.56 | 59.77 | 7.47 |
7,500 | 9.10 | 10.35 | 54.76 | 7.10 | 8.34 | 89.65 | 11.21 |
10,000 | 12.14 | 13.80 | 73.02 | 9.47 | 11.12 | 119.53 | 14.94 |
20,000 | 24.27 | 27.60 | 146.04 | 18.93 | 22.23 | 239.07 | 29.88 |
25,000 | 30.34 | 34.50 | 182.54 | 23.67 | 27.79 | 298.84 | 37.35 |
30,000 | 36.41 | 41.40 | 219.05 | 28.40 | 33.35 | 358.60 | 44.83 |
50,000 | 60.68 | 68.99 | 365.09 | 47.33 | 55.58 | 597.67 | 74.71 |
The following is an expansion of the entire table for a 2-processor system.
Conc. users | Add PPEM | Modify PPEM | Base search PPEM | One Level Search PPEM | Delete PPEM | PPEM total | % CPU |
500 | 0.49 | 0.78 | 3.38 | 0.36 | 0.49 | 5.49 | 1.37 |
1,000 | 0.97 | 1.55 | 6.76 | 0.71 | 0.97 | 10.97 | 2.74 |
1,500 | 1.46 | 2.33 | 10.15 | 1.07 | 1.46 | 16.46 | 4.12 |
2,000 | 1.94 | 3.11 | 13.53 | 1.43 | 1.94 | 21.95 | 5.49 |
2,500 | 2.43 | 3.89 | 16.91 | 1.78 | 2.43 | 27.43 | 6.86 |
5,000 | 4.85 | 7.77 | 33.82 | 3.57 | 4.85 | 54.87 | 13.72 |
7,500 | 7.28 | 11.66 | 50.73 | 5.35 | 7.28 | 82.30 | 20.58 |
10,000 | 9.71 | 15.55 | 67.64 | 7.14 | 9.71 | 109.74 | 27.43 |
20,000 | 19.42 | 31.10 | 135.28 | 14.27 | 19.42 | 219.48 | 54.87 |
25,000 | 24.27 | 38.87 | 169.09 | 17.84 | 24.27 | 274.35 | 68.59 |
30,000 | 29.13 | 46.64 | 202.91 | 21.41 | 29.13 | 329.22 | 82.30 |
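Both CPU tables follow from summing rate × C1 over the five profile transactions and dividing by the clock capacity of the server. The sketch below reproduces that calculation; the constants are copied from the PPEM tables, the Modify column is assumed to use the Modify Replace Attribute constant, and the function name `cpu_load` is illustrative.

```python
MHZ_PER_PROCESSOR = 200

# Per-user frequency per second for each profile transaction.
FREQUENCY = {
    "Add": 0.000485,
    "Modify": 0.001942,
    "Base search": 0.012298,
    "One Level Search": 0.001456,
    "Delete": 0.000485,
}

# C1 constants by processor count (Modify = Modify Replace Attribute, an assumption).
C1 = {
    4: {"Add": 2.5, "Modify": 0.7106, "Base search": 0.59375,
        "One Level Search": 0.65, "Delete": 2.29},
    2: {"Add": 2.0, "Modify": 0.8007, "Base search": 0.55,
        "One Level Search": 0.49, "Delete": 2.0},
}


def cpu_load(concurrent_users, processors):
    """Return (total PPEM, percent CPU) for the whole user profile."""
    total_ppem = sum(
        concurrent_users * FREQUENCY[t] * C1[processors][t] for t in FREQUENCY
    )
    capacity = processors * MHZ_PER_PROCESSOR
    return total_ppem, 100 * total_ppem / capacity


for processors in (4, 2):
    ppem, percent = cpu_load(5_000, processors)
    print(f"{processors} processors: {ppem:.2f} PPEM, {percent:.2f}% CPU")
```

The results agree with the 5,000-user rows to within rounding of the frequencies.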
Example: Calculate the total network bits per second at 5,000 users for all five transactions in the user profile. Use the Add transaction again and arbitrarily choose the 4-processor numbers.
bits/sec(Add) = 2.425 per second × 2,288 bits = 5,548 bits/sec
Conc. users | Add bits/sec | Modify bits/sec | Base search bits/sec | One-Level bits/sec | Delete bits/sec | Total bits/sec |
500 | 555 | 994 | 23,858 | 2,825 | 39 | 28,271 |
1,000 | 1,111 | 1,988 | 47,715 | 5,650 | 78 | 56,542 |
1,500 | 1,666 | 2,983 | 71,573 | 8,476 | 117 | 84,814 |
2,000 | 2,221 | 3,977 | 95,430 | 11,301 | 155 | 113,085 |
2,500 | 2,777 | 4,971 | 119,288 | 14,126 | 194 | 141,356 |
5,000 | 5,553 | 9,942 | 238,576 | 28,252 | 388 | 282,712 |
7,500 | 8,330 | 14,913 | 357,864 | 42,379 | 583 | 424,068 |
10,000 | 11,107 | 19,883 | 477,152 | 56,505 | 777 | 565,424 |
20,000 | 22,214 | 39,767 | 954,304 | 113,010 | 1,553 | 1,130,848 |
25,000 | 27,767 | 49,709 | 1,192,880 | 141,262 | 1,942 | 1,413,560 |
30,000 | 33,320 | 59,650 | 1,431,456 | 169,515 | 2,330 | 1,696,272 |
50,000 | 55,534 | 99,417 | 2,385,761 | 282,524 | 3,883 | 2,827,120 |
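The network table can be reproduced the same way, and the total can be compared with the link speed. In the sketch below, the 100 Mbit/s figure is the nominal rate of the 100BaseT network listed in the test configuration; treating it as the usable capacity is a simplifying assumption, and the function names are illustrative.

```python
# Per-user frequency per second and size in bits for each profile transaction.
FREQUENCY = {
    "Add": 0.000485,
    "Modify": 0.001942,
    "Base search": 0.012298,
    "One Level Search": 0.001456,
    "Delete": 0.000485,
}
BITS = {
    "Add": 2288,
    "Modify": 1024,  # Modify Replace
    "Base search": 3880,
    "One Level Search": 3880,
    "Delete": 160,
}
LINK_BITS_PER_SEC = 100_000_000  # nominal 100BaseT rate (assumed fully usable)


def network_bits_per_sec(concurrent_users):
    """Sum of rate x size over all profile transactions."""
    return sum(concurrent_users * FREQUENCY[t] * BITS[t] for t in FREQUENCY)


def link_utilization_percent(concurrent_users):
    return 100 * network_bits_per_sec(concurrent_users) / LINK_BITS_PER_SEC


n = 5_000
print(f"{network_bits_per_sec(n):,.0f} bits/sec, "
      f"{link_utilization_percent(n):.2f}% of the link")
```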
Based on the same user profile used throughout this document, the following example illustrates how to determine an optimal configuration for a given number of users.
Example: The site has 200,000 total users. On average, each user connects for 20 minutes and performs 20 transactions. The distribution is:
At peak time, 2 percent of total users are online simultaneously, which gives 4,000 concurrent users.
The number of requests per second is given by:
N × F
where the number of concurrent users N is computed as stated earlier and the frequency F is the number of queries per user per second.
Total Add per second during peak time is 4,000 users × 0.000485 = 1.94 per second.
Total Modify per second during peak time is 4,000 users × 0.001942 = 7.77 per second.
Total Base Search per second during peak time is 4,000 users × 0.012298 = 49.19 per second.
Total Onelevel Search per second during peak time is 4,000 users × 0.001456 = 5.83 per second.
Total Delete per second during peak time is 4,000 users × 0.000485 = 1.94 per second.
For each of the transactions, compute the expected CPU cost. Computing the cost of the individual transactions constitutes the PPEM model. The model assumes that the total cost of running all the transactions together will not bring the system into the area of context switching. Note that context switching on multi-processor systems is very expensive at higher transaction rates.
It is also important to note that PPEM costs are accurate only up to approximately 80 percent processor utilization. Above this level, it is recommended that you run a computer with greater CPU capacity; the best choice is to increase the clock speed. Another solution is to add a server to distribute the load.
Furthermore, each of the PPEM equations is only valid up to the maximum range measured.
To calculate the cost of each transaction on a 4-processor system:
PPEM(Add) = Rate × C1 = 1.94 × 2.50 = 4.85 PPEM
PPEM(Modify) = Rate × C1 = 7.77 × 0.7106 = 5.52 PPEM
PPEM(Base Search) = Rate × C1 = 49.19 × 0.59375 = 29.21 PPEM
PPEM(Onelevel Search) = Rate × C1 = 5.83 × 0.65 = 3.79 PPEM
PPEM(Delete) = Rate × C1 = 1.94 × 2.29 = 4.44 PPEM
The total is 47.81 PPEM, or 5.98% CPU utilization on a 4-processor system.
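The sizing example can be run end to end, and the same numbers can be turned around to ask how many concurrent users the model can describe at all. The sketch below uses the four-processor C1 constants and measured rate ranges from the PPEM tables; the 80 percent ceiling and the range limits are the two validity conditions stated above. All function and variable names are illustrative.

```python
# transaction: (per-user frequency per second, 4-processor C1, max measured rate)
PROFILE = {
    "Add": (0.000485, 2.5, 106),
    "Modify": (0.001942, 0.7106, 605),
    "Base search": (0.012298, 0.59375, 531),
    "One Level Search": (0.001456, 0.65, 500),
    "Delete": (0.000485, 2.29, 137),
}
CAPACITY_PPEM = 4 * 200  # four 200-MHz Pentium Pro processors
CPU_CEILING = 0.80       # PPEM costs are only accurate up to about 80% CPU


def peak_concurrent_users(total_users, peak_fraction):
    return round(total_users * peak_fraction)


def total_ppem(concurrent_users):
    return sum(concurrent_users * f * c1 for f, c1, _ in PROFILE.values())


def max_users_within_model():
    """Largest user count that keeps CPU under the ceiling and every
    transaction rate inside its measured range."""
    cpu_limit = CPU_CEILING * CAPACITY_PPEM / total_ppem(1)
    range_limit = min(max_rate / f for f, _, max_rate in PROFILE.values())
    return int(min(cpu_limit, range_limit))


n = peak_concurrent_users(200_000, 0.02)
ppem = total_ppem(n)
print(n, f"{ppem:.2f} PPEM", f"{100 * ppem / CAPACITY_PPEM:.2f}% CPU")
print("model valid up to about", max_users_within_model(), "concurrent users")
```

For the example site this gives roughly 6 percent CPU. Under these assumptions, the measured range of the base search rate is reached before the 80 percent CPU ceiling, so it is the range, not the processor, that bounds how far the model can be extended.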
To calculate the PPEM from observed data, the load was simulated on a per transaction basis, in this case, Base Object Dynamic Search. An automated script was run that cycles through a changing user load. The values were captured in a perfmon.exe log file. This log file was exported to a tab-delimited text file and then imported into Microsoft® Excel. Using an Excel macro, the log was distributed and averaged. Thus, each row of data is a reasonable representation of average transaction load. Many other values are captured, but these are the most important for calculating PPEM. To establish a valid range of samples, context switches must be observed. The context switch rate must remain below a certain threshold to be considered running optimally. For example, in the case of a two-processor 200-MHz system, the threshold is 10,000 context switches/sec. In addition, the processor should be exercised across a full range, from 1 percent to 80 percent CPU utilization.
To calculate PPEM in this case, multiply the % Total Processor Time (expressed as a fraction) by the clock capacity of the system, which in this example is 4 × 200 MHz = 800 MHz, and then divide by the number of Dynamic Searches per second.
Current connections | Dynamic searches/sec | % Total Processor Time | Context switches/sec | PPEM |
3.00 | 3.83 | 0.42 | 329.36 | 0.88 |
5.00 | 7.70 | 0.54 | 348.67 | 0.56 |
9.00 | 15.31 | 1.26 | 394.67 | 0.66 |
13.00 | 22.79 | 1.83 | 435.38 | 0.64 |
17.00 | 30.22 | 2.50 | 469.48 | 0.66 |
25.00 | 44.91 | 3.44 | 546.55 | 0.61 |
32.29 | 51.56 | 3.82 | 594.10 | 0.59 |
40.71 | 60.35 | 4.50 | 668.12 | 0.60 |
48.71 | 65.31 | 4.70 | 659.47 | 0.58 |
64.57 | 80.69 | 5.80 | 780.76 | 0.57 |
95.57 | 113.51 | 8.13 | 1,104.85 | 0.57 |
98.00 | 116.09 | 8.43 | 1,071.36 | 0.58 |
197.00 | 345.00 | 19.00 | 3,730.00 | 0.44 |
295.00 | 350.00 | 19.30 | 4,024.00 | 0.44 |
393.00 | 531.00 | 37.00 | 10,849.00 | 0.56 |
493.00 | 680.00 | 48.00 | 14,266.00 | 0.56 |
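The PPEM column can be recomputed from the other columns. A short sketch, using three rows of the table as input; the function name is illustrative.

```python
def ppem_per_transaction(percent_cpu, transactions_per_sec,
                         processors=4, mhz_per_processor=200):
    """(% CPU as a fraction) x clock capacity / transaction rate."""
    capacity = processors * mhz_per_processor
    return (percent_cpu / 100) * capacity / transactions_per_sec


# (Dynamic searches/sec, % Total Processor Time) taken from the table above
samples = [(3.83, 0.42), (7.70, 0.54), (113.51, 8.13)]
for rate, cpu in samples:
    print(f"{rate:7.2f}/sec  {ppem_per_transaction(cpu, rate):.2f} PPEM")
```

The first row shows the effect described earlier: at very low transaction rates, the fixed overhead makes the per-transaction cost look higher.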
The overall PPEM is calculated by taking a least squares approximation of the PPEM value at the transaction rate. This generates the formulas presented in Chapter 1 for each transaction.
In this case, PPEM = 0.59375.
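The exact fitting procedure is not spelled out here. One simple possibility, assumed purely for illustration, is a least-squares line through the origin that relates the MHz consumed to the transaction rate, so that the slope plays the role of C1. The sketch below applies that fit to a few rows of the measurement table; it is not claimed to reproduce the published constant.

```python
def fit_c1(rates, mhz_used):
    """Least-squares slope of a line through the origin: sum(r*y) / sum(r*r)."""
    numerator = sum(r * y for r, y in zip(rates, mhz_used))
    denominator = sum(r * r for r in rates)
    return numerator / denominator


# (Dynamic searches/sec, % Total Processor Time) from a few rows of the table
rows = [(15.31, 1.26), (30.22, 2.50), (60.35, 4.50), (113.51, 8.13)]
rates = [rate for rate, _ in rows]
mhz_used = [cpu / 100 * 800 for _, cpu in rows]  # 4 x 200 MHz capacity

c1 = fit_c1(rates, mhz_used)
print(f"fitted C1 is roughly {c1:.3f} PPEM per search")
```

Because the fit weights high-rate samples more heavily, the choice of which rows to include (for example, only those below the context-switch threshold) changes the result.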
The following analysis shows how the accuracy of CPU calculations can reflect actual utilization.
This table shows a similar result for the 2-processor results.
Listed below are the results of the capacity and analysis testing.
Every user who accesses the Dynamic Directory increases the hardware utilization on that computer. This table shows the maximum rate of the transactions on computers with four and two Pentium Pro 200-MHz processors, respectively.
Maximum Transaction Rates
Transaction | Maximum transaction rate, 4 processors | Maximum transaction rate, 2 processors |
Modify add | 480 | 460 |
Modify replace | 1000 | 560 |
Search base object | 1000 | 1000 |
Search OneLevel | | |
Add | 106 | 100 |
Delete | 137 | |
The Dynamic Directory can also be scaled by adding other computers running the Dynamic Directory independently and then turning on Dynamic Directory replication. The following table shows that adding more computers can increase the total number of transactions per second that a group of computers can handle. All three of these computers were handling requests against one contiguous in-memory database. The costs were linear.
Analysis:
CPU utilization on the computer that receives the request is much higher than it is on the replicated computers for Add and Modify commands.
Dynamic Directory replication does not incur major overhead onto the replicated computers.
Test | Xact/sec | Computer 1 %CPU | Computer 1 PPEM (see note 1) | Computer 2 %CPU | Computer 2 PPEM | Computer 3 %CPU | Computer 3 PPEM |
Base Search of 100,000-user Dynamic Directory at Nominal throughput | 137.8 | 31.8 | .92 | - | - | - | - |
Delete from a 100,000-user Dynamic Directory | 373.7 | 86.4 | .93 | 4.46 | .05 | 4.36 | .05 |
Modify add 1 attribute | 444.5 | 47.96 | .43 | 2.57 | .02 | 2.61 | .02 |
Modify replace 1 attribute | 436.2 | 43.1 | .4 | 5.55 | .05 | 2.48 | .02 |
Add a random 5-attribute user to the 100,000-user Dynamic Directory | 196.8 | 53.2 | 1.08 | 10.39 | .21 | 10.26 | .21 |
1. Cost expressed in Pentium Pro Equivalent MHz (PPEM). This is the total MHz on this server (2 processors × 200 MHz each = 400 MHz) multiplied by percent CPU and divided by transactions per second, which gives the cost of each transaction.
All counters noted can be found in the Microsoft® Windows NT® Performance Monitor. These counters will be distributed among the machines in the P&M service group. The counters in the System and Memory objects can be used to monitor capacity.
Context Switches/sec should be less than 15,000.
% Total Processor Time should be less than 80 percent.
% Processor Time (average) should be less than 80 percent (for each processor).
Available Bytes should be greater than 4 MB.
Pages/sec should be less than 1 per second.
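These guidelines lend themselves to an automated check once counter values have been collected. The sketch below only evaluates values that are already in hand; how they are read from Performance Monitor is outside its scope, and the sample readings are hypothetical.

```python
import operator

# counter name: (comparison that must hold, limit) from the guidelines above
GUIDELINES = {
    "Context Switches/sec": (operator.lt, 15_000),
    "% Total Processor Time": (operator.lt, 80),
    "% Processor Time": (operator.lt, 80),
    "Available Bytes": (operator.gt, 4 * 1024 * 1024),
    "Pages/sec": (operator.lt, 1),
}


def violations(sample):
    """Return the names of counters in `sample` that break a guideline."""
    return [
        name
        for name, (holds, limit) in GUIDELINES.items()
        if name in sample and not holds(sample[name], limit)
    ]


# Hypothetical readings, for illustration only
sample = {
    "Context Switches/sec": 16_200,
    "% Total Processor Time": 62.0,
    "Available Bytes": 48 * 1024 * 1024,
    "Pages/sec": 2.5,
}
print(violations(sample))
```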
CONNECT
BINDSIMPLE ANONYMOUS
LOOP 10
ADD dn=cn=RANDALPHA(10), ou=dynamic, o=microsoft; objectclass=member\dynamicObject;guid=RANDNUMERIC(32);c=US;language=RANDALPHA(15);userComment=RANDNUMERIC(20);
SLEEP 100
ENDLOOP
QUIT
CONNECT
LOOP RANDNUMBER(15,30)
MODIFY postalcode=RANDNUMERIC(9):R;dn=cn=testRANDNUMBER(1,100000), ou=dynamic, o=microsoft;
SLEEP RANDNUMBER(200,350)
ENDLOOP
QUIT
CONNECT
LOOP 15
MODIFY userComment=RANDALPHA(10): A;dn=cn=testRANDNUMBER(1,100000), ou=dynamic, o=microsoft;
SLEEP 100
ENDLOOP
QUIT
CONNECT
LOOP 15
SEARCH BASE RETURN ALL dn=cn=testRANDNUMBER(1,100000),ou=Dynamic,o=microsoft;filter=(objectclass=*);
SLEEP 100
ENDLOOP
QUIT
CONNECT
BINDSIMPLE dn=cn=administrator,ou=members,o=microsoft;password=pw;
LOOP 1000
%29 SKIP 2
SEARCH BASE RETURN ALL dn=cn=tstRANDNUMBER(1,10000),ou=dynamic,o=microsoft; filter=(objectclass=*);
SKIP 10
%92 SKIP 2
SEARCH ONELEVEL RETURN ALL dn=ou=dynamic,o=microsoft;filter=(cn=tstRANDNUMBER(1,10000));
SKIP 7
%97 SKIP 4
ADD dn=cn=DavidSEQUNUMBER(1,10000,xxx),ou=dynamic,o=microsoft;objectClass=member\dynamicObject;guid=RANDNUMERIC(32);telephoneNumber=RANDNUMERIC(10);street=RANDALPHA(60);postOfficeBox=RANDNUMERIC(6);postalCode=RANDNUMERIC(9);postalAddress=RANDNUMERIC(60);st=RANDALPHA(2);givenName=RANDALPHA(40);sn=RANDNUMERIC(9);c=RANDALPHA(2);language=RANDALPHA(5);userComment=RANDNUMERIC(20);homepage=RANDALPHA(8);description=RANDALPHA(8);userpassword=password;mail=RANDALPHA(20)@corp.com;mailerNoEmail=0;mailerNumNDR=0;mailerEmailInvalid=0;
SLEEP RANDNUMBER(20000,35000)
DELETE cn=DavidSEQUNUMBER(1,10000,xxx),ou=dynamic,o=microsoft;
SKIP 2
%89 SKIP 1
MODIFY postalcode=RANDNUMERIC(9):R;dn=cn=tstRANDNUMBER(1,10000), ou=dynamic,o=microsoft;
SLEEP RANDNUMBER(20000,35000)
ENDLOOP
QUIT
© 1999-2000 Microsoft Corporation. All rights reserved.