Microsoft Site Server 3.0 Usage Analysis Capacity and Performance Analysis

August 1999

Microsoft Corporation

Definition of Terms

| Term | Meaning |
|---|---|
| Import requests per second (1) | Total number of requests processed during log file import divided by the total time required for log file import. |
| Import bytes per second | Total size of the log file imported divided by the total time required for log file import. |

(1) This performance metric does not include requests that are explicitly excluded (such as JPEGs and GIFs) when configuring a Site in Usage Import.
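Both definitions reduce to a single division by the total import time. The following Python sketch computes them from a list of request paths, a log file size in bytes, and an elapsed import time. The function name `import_rates` and the extensions in `EXCLUDED_EXTENSIONS` are illustrative assumptions; actual exclusions are whatever is configured for the Site in Usage Import.

```python
from typing import Iterable, Tuple

# Hypothetical exclusion list; real exclusions are configured per Site in Usage Import.
EXCLUDED_EXTENSIONS = (".jpg", ".jpeg", ".gif")


def import_rates(request_paths: Iterable[str], log_bytes: int,
                 import_seconds: float) -> Tuple[float, float]:
    """Return (import requests per second, import bytes per second)."""
    if import_seconds <= 0:
        raise ValueError("import time must be positive")
    # Explicitly excluded requests (for example, images) are not counted.
    counted = sum(1 for p in request_paths
                  if not p.lower().endswith(EXCLUDED_EXTENSIONS))
    # Bytes per second uses the full size of the imported log file.
    return counted / import_seconds, log_bytes / import_seconds


print(import_rates(["/index.htm", "/logo.gif", "/a.asp", "/photo.JPG"], 1000, 2))
# (1.0, 500.0)
```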

Overview

This document discusses the performance and scalability characteristics of the Microsoft® Site Server version 3.0 Analysis feature. Analysis consists of four main tools: Usage Import, Report Writer, Custom Import, and Content Analyzer. This capacity and performance analysis focuses on the Usage Import tool. This tool’s primary function is to import log files that contain usage data about your site into the Analysis database.

The information in this document was generated by a series of tests run on two servers. The tests targeted performance bottlenecks and scalability issues that arise when importing log files into the Analysis database.

Note   Your performance numbers may vary from those provided here based on your hardware and software platform.

The potential performance bottlenecks associated with log file importing are enumerated here to assist you in capacity planning.

Processor and memory scalability are profiled in terms of import requests per second and bytes imported per second. This information should help you select the appropriate hardware configuration for your Analysis deployment.

The two servers used in these tests were configured as follows.

Usage Analysis Server:  
CPU: 1, 2, or 4 x 200 MHz Pentium Pro
Memory: 64, 128, 256, or 512 MB of RAM
Disk: 5 x 4.3 GB SCSI
File System: NTFS
Network: Intel EtherExpress Pro/100+ on 100 Mbps switched Ethernet
Software: Microsoft® Windows NT® Server operating system version 4.0, SP3, Microsoft® Internet Information Services (IIS) version 4.0, Microsoft® Site Server version 3.0

Microsoft SQL Server:  
CPU: 4 x 200 MHz Pentium Pro
Memory: 256 MB of RAM
Disk: 2 x 4.3 GB SCSI
File System: 2 GB FAT (C:), 6 GB NTFS (D:)
Network: Intel EtherExpress Pro/100+ on 100 Mbps switched Ethernet
Software: Windows NT Server 4.0, SP3, SQL Server 6.5

Usage Analysis Description

Analysis uses the log files generated by Web servers to analyze the usage and content of a site. The data in these log files falls into five categories:

Usage Import imports these log files into the Analysis database, where Report Writer can be used to analyze the log file content. For this capacity and performance analysis, a Microsoft® SQL Server™ database was used for the Analysis database.

Critical Program Execution Phases

First, the log file is parsed: its entries are read into memory and placed in a cache called the hits buffer. Next, the inference engine takes each hit from the hits buffer and matches it to a server, a site, and a user, creating state data; simple aggregations on the hit data are also performed in this phase. Once a user is identified, the inference engine adds the request to that user's current visit. Meanwhile, a purge process runs periodically and examines each user. It identifies users who have not made any request in more than 30 minutes (that is, users who have left the site) and flushes their data to the database. Each user's data consists of the user information itself, plus the current visit and the requests that make up that visit.
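The phases above can be sketched in Python as follows. This is a simplified model, not the Usage Import implementation: the log line format, the buffer size, the rule of purging after each buffered batch, and the names `Hit`, `UserState`, `InferenceEngine`, and `import_log` are all hypothetical, and the database write is replaced by a `flush` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

IDLE_LIMIT_SECONDS = 30 * 60  # no request for 30 minutes: the user has left the site

Key = Tuple[str, str, str]  # (server, site, user)


@dataclass
class Hit:
    timestamp: int  # seconds since the start of the log
    server: str
    site: str
    user: str
    url: str


@dataclass
class UserState:
    last_seen: int
    visit: List[str] = field(default_factory=list)  # requests in the current visit


def parse_line(line: str) -> Hit:
    # Hypothetical log format: "<seconds> <server> <site> <user> <url>"
    ts, server, site, user, url = line.split()
    return Hit(int(ts), server, site, user, url)


class InferenceEngine:
    def __init__(self, flush: Callable[[Key, UserState], None]) -> None:
        self.users: Dict[Key, UserState] = {}
        self.flush = flush  # stands in for writing user data to the database

    def add(self, hit: Hit) -> None:
        # Match the hit to server, site, and user state, then extend the visit.
        key = (hit.server, hit.site, hit.user)
        state = self.users.setdefault(key, UserState(last_seen=hit.timestamp))
        state.visit.append(hit.url)
        state.last_seen = hit.timestamp

    def purge(self, now: int) -> None:
        # Flush every user who has been idle longer than the limit.
        idle = [k for k, s in self.users.items()
                if now - s.last_seen > IDLE_LIMIT_SECONDS]
        for key in idle:
            self.flush(key, self.users.pop(key))

    def flush_all(self) -> None:
        for key in list(self.users):
            self.flush(key, self.users.pop(key))


def import_log(lines: Iterable[str], flush: Callable[[Key, UserState], None],
               buffer_size: int = 1000) -> None:
    engine = InferenceEngine(flush)
    hits_buffer: List[Hit] = []

    def drain() -> None:
        for hit in hits_buffer:
            engine.add(hit)
        if hits_buffer:
            engine.purge(now=hits_buffer[-1].timestamp)  # periodic purge
        hits_buffer.clear()

    for line in lines:
        hits_buffer.append(parse_line(line))  # parse into the hits buffer
        if len(hits_buffer) >= buffer_size:
            drain()
    drain()
    engine.flush_all()  # end of log: flush users whose visits are still open


flushed = []
import_log(["0 s1 w1 alice /a", "10 s1 w1 alice /b",
            "20 s1 w1 bob /a", "4000 s1 w1 bob /c"],
           lambda key, state: flushed.append((key[2], state.visit)),
           buffer_size=2)
print(flushed)  # [('alice', ['/a', '/b']), ('bob', ['/a', '/c'])]
```

In this sketch, every user stays in memory until the purge finds them idle, which is one way to see why the number of open visits matters in the tests that follow.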

Log File Profiles

All tests described in this document were run using pre-generated log files. Pre-generated log files are maximally dense: they do not contain entries, such as image file entries, that would be filtered out of the input stream before import processing. A conservative estimate of the ratio of actual log file size to pre-generated log file size is 5 to 1. This ratio can be applied to the import performance metrics, import requests per second and import bytes per second. For example, if 300 requests per second are observed with a pre-generated log file for a given hardware configuration and log file profile, it is reasonable to expect approximately 1,500 requests per second with actual log files.
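The projection is a single multiplication. A minimal Python sketch, in which the function name is hypothetical and the default ratio is this document's conservative 5:1 estimate; the second call reproduces the (Actual) import bytes per second value reported in Appendix A for the 128 MB, one-processor configuration.

```python
# Conservative ratio of actual to pre-generated log size, as estimated in this document.
ACTUAL_TO_PREGENERATED = 5


def project_to_actual(pregenerated_rate: float,
                      ratio: float = ACTUAL_TO_PREGENERATED) -> float:
    """Scale a rate measured on a pre-generated log to an actual-log estimate."""
    return pregenerated_rate * ratio


print(project_to_actual(300))    # 1500
print(project_to_actual(99146))  # 495730
```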

The Usage Import tests were run using a variety of pre-generated logs, including those in Table 1.

Table 1  Usage Import Test Suite

| Entries in pre-generated log | Projected entries in actual log | Log file size (MB) | Users | Visits per user | Requests per visit |
|---|---|---|---|---|---|
| 1,000 | 5,000 | 0.28 | 10 | 10 | 10 |
| 10,000 | 50,000 | 2.73 | 100 | 10 | 10 |
| 100,000 | 500,000 | 27 | 1,000 | 10 | 10 |
| 1 million | 5 million | 274 | 1,000 | 100 | 10 |
| 1 million | 5 million | 318 | 10,000 | 10 | 10 |
| 10 million | 50 million | 3196 | 100,000 | 10 | 10 |
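The columns of Table 1 are related by simple arithmetic: each pre-generated entry is one request, so the entry count is the product of users, visits per user, and requests per visit, and the projected count applies the 5:1 ratio. The following Python check (the helper name `row_is_consistent` is hypothetical) confirms this for every row.

```python
# Rows of Table 1: (pre-generated entries, projected actual entries,
#                   users, visits per user, requests per visit)
TABLE_1 = [
    (1_000, 5_000, 10, 10, 10),
    (10_000, 50_000, 100, 10, 10),
    (100_000, 500_000, 1_000, 10, 10),
    (1_000_000, 5_000_000, 1_000, 100, 10),
    (1_000_000, 5_000_000, 10_000, 10, 10),
    (10_000_000, 50_000_000, 100_000, 10, 10),
]


def row_is_consistent(entries, projected, users, visits, requests, ratio=5):
    # Each entry is one request: users x visits per user x requests per visit.
    # The projected actual-log count applies the 5:1 ratio.
    return entries == users * visits * requests and projected == ratio * entries


print(all(row_is_consistent(*row) for row in TABLE_1))  # True
```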

Summary of Scalability and Performance

Based on the data collected in this document, the following assertions can be made about scaling and performance for Analysis:

Factors Affecting Usage Analysis Performance

The tests in this document used two classes of pre-generated log files: those with a fixed number of users and those with a variable number of users.

Log File with Fixed Number of Users

The tests in this section used a 273 MB log file consisting of one million entries: 1,000 users, 100 visits per user, and ten requests per visit.

Memory

Charts 1, 2, and 3 show how total import time, import requests per second, and import bytes per second scaled in this test as a function of physical RAM. On systems with 64 MB of RAM, these metrics degraded substantially, and paging rose to 135 or more pages out per second (see Appendix A). All Analysis executables together consume approximately 44 MB of private bytes, which does not leave enough physical RAM for Microsoft® Windows NT® and other required services.
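The memory argument can be checked against the Appendix A measurements. This Python sketch computes the RAM left over after the approximately 44 MB of Analysis private bytes and flags configurations whose pages out per second exceed a threshold. The threshold of 20 pages out per second is a hypothetical cut-off chosen for illustration, not a Microsoft guideline; any value between the 128 MB and 64 MB measurements separates the two groups.

```python
# Pages out per second by (MB of RAM, number of processors), from Appendix A.
PAGES_OUT = {
    (64, 1): 135, (64, 2): 199, (64, 4): 142,
    (128, 1): 5.8, (128, 2): 1.3, (128, 4): 2.1,
    (256, 1): 0.7, (256, 2): 1, (256, 4): 2.4,
    (512, 1): 1.7, (512, 2): 1.5, (512, 4): 1,
}
ANALYSIS_PRIVATE_MB = 44   # approximate total for all Analysis executables
PAGING_THRESHOLD = 20      # hypothetical cut-off, not a Microsoft guideline


def headroom_mb(ram_mb: int) -> int:
    """Physical RAM left for Windows NT and other services."""
    return ram_mb - ANALYSIS_PRIVATE_MB


def memory_bound(config) -> bool:
    return PAGES_OUT[config] > PAGING_THRESHOLD


print(headroom_mb(64))  # 20
print(sorted(c for c in PAGES_OUT if memory_bound(c)))
# [(64, 1), (64, 2), (64, 4)]
```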

CPU

Charts 1, 2, and 3 also show how total import time, import requests per second, and import bytes per second scaled as a function of the number of processors.

On systems configured with at least 128 MB of RAM, memory is not a bottleneck, and the performance metrics change only marginally as the number of processors varies. Compared with one-processor systems, two-processor systems improve import request rates and import byte rates by between six and 17 percent.
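The six-to-17 percent range follows from the Appendix A measurements. This Python sketch (the helper name `gain_percent` is hypothetical) computes the two-processor improvement over one processor for each RAM size of at least 128 MB.

```python
# Import requests and bytes per second (pre-generated log), from Appendix A,
# keyed by MB of RAM, then by number of processors.
REQUESTS = {128: {1: 345, 2: 405}, 256: {1: 316, 2: 336}, 512: {1: 338, 2: 386}}
BYTES = {128: {1: 99146, 2: 116380}, 256: {1: 90636, 2: 96709},
         512: {1: 97101, 2: 110897}}


def gain_percent(table, ram_mb):
    """Percentage improvement of two processors over one at a given RAM size."""
    one, two = table[ram_mb][1], table[ram_mb][2]
    return 100 * (two - one) / one


for ram in (128, 256, 512):
    print(ram, round(gain_percent(REQUESTS, ram), 1), round(gain_percent(BYTES, ram), 1))
# 128 17.4 17.4
# 256 6.3 6.7
# 512 14.2 14.2
```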

Chart 1  Import time scaling

Chart 2  Import requests per second scaling (actual log file)

Chart 3  Import bytes per second scaling (actual log file)

Log Files with Variable Number of Users

The next set of tests measured how performance changes as the number of users, and therefore the number of open visits, increases. Because adding a second processor improved import performance only marginally, and because insufficient memory was shown to be a substantial bottleneck, these measurements were collected on a one-processor system with 512 MB of RAM.

Figure 1 shows that the import time is a linear function of the number of open visits.

Figure 1

Table 2 shows the import time as a function of the projected number of entries in an actual log file that consists of a variable number of users, with ten visits per user and ten requests per visit.

Table 2

| Users | Entries in pre-generated log | Projected entries in actual log | Import time |
|---|---|---|---|
| 10 | 1,000 | 5,000 | 6 sec. |
| 100 | 10,000 | 50,000 | 44 sec. |
| 1,000 | 100,000 | 500,000 | 5 min. 19 sec. |
| 10,000 | 1 million | 5 million | 1 hr. 37 sec. |
| 100,000 | 10 million | 50 million | 15 hr. 9 min. 32 sec. |
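Dividing each entry count in Table 2 by its import time gives the import request rate for each log. This Python sketch parses the printed durations (the helper name `to_seconds` is hypothetical) and takes each time literally as printed, including "1 hr. 37 sec." for the 10,000-user log; it also applies the 5:1 projection to estimate actual-log rates.

```python
import re

# Table 2 rows: (users, entries in pre-generated log, import time as printed)
TABLE_2 = [
    (10, 1_000, "6 sec."),
    (100, 10_000, "44 sec."),
    (1_000, 100_000, "5 min. 19 sec."),
    (10_000, 1_000_000, "1 hr. 37 sec."),
    (100_000, 10_000_000, "15 hr. 9 min. 32 sec."),
]
UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}


def to_seconds(text: str) -> int:
    """Convert a duration such as '5 min. 19 sec.' to seconds."""
    return sum(int(n) * UNIT_SECONDS[unit]
               for n, unit in re.findall(r"(\d+)\s*(hr|min|sec)", text))


for users, entries, elapsed in TABLE_2:
    rate = entries / to_seconds(elapsed)  # pre-generated requests per second
    print(f"{users:>7} users: {rate:6.0f} requests/s, ~{5 * rate:6.0f} actual")
```

With the times read literally, the computed rate peaks at the 1,000-user log (about 313 requests per second) and falls for the 10,000-user and 100,000-user logs.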

Performance Degradation Regimes

Figures 2 and 3 show that import requests per second and import bytes per second began to decrease once the number of users reached about 10,000. Figure 4 shows the total private bytes consumed by all Analysis executables, which did not exceed 294 MB for up to 100,000 users. This left sufficient RAM for Windows NT and other required services, which suggests that insufficient memory did not contribute to the performance degradation in this case.

Figure 2

Figure 3

Figure 4

Heavy Consumption of Analysis Server CPU and SQL Server Disk

The performance degradation may be explained by heavy consumption of Analysis server processor and SQL Server disk resources during the first 26 percent of the total import time for the log file with 10,000 users. During the first six percent of the import time, processor utilization is approximately 100 percent; this period extends beyond the time required to create the first 10,000 open visits. The next 20 percent of the import time is dominated by heavy SQL Server disk activity: disk utilization averages approximately 48 percent overall, with sustained periods of 100 percent utilization. After this first 26 percent of the import time, all hardware resource consumption remains within reasonable bounds.
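The phase percentages above translate directly into wall-clock intervals. This Python sketch (the names `PHASES` and `phase_boundaries` are hypothetical) converts them into start and end times; the 3,637-second total is an assumption that reads the Table 2 entry for 10,000 users ("1 hr. 37 sec.") literally.

```python
# Fractions of total import time for the 10,000-user log, as described above.
PHASES = [
    ("processor-bound (about 100 percent CPU)", 0.06),
    ("SQL Server disk-bound", 0.20),
    ("resource use within reasonable bounds", 0.74),
]


def phase_boundaries(total_seconds: float):
    """Return (label, start, end) in seconds for each phase."""
    start = 0.0
    bounds = []
    for label, fraction in PHASES:
        end = start + fraction * total_seconds
        bounds.append((label, start, end))
        start = end
    return bounds


# Assumption: 3,637 seconds, reading the Table 2 time "1 hr. 37 sec." literally.
for label, start, end in phase_boundaries(3637):
    print(f"{start:7.0f}-{end:7.0f} s  {label}")
```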

System Recommendations

The results of the tests discussed in this document suggest that for best performance, you follow these recommendations:

Appendix A:  Scalability and Performance Data

Processor Usage

Analysis Server Processor Scaling

| MB of RAM | Number of processors | Total CPU utilization (%) | Context switches per second | Import requests per second | Import bytes per second | (Actual) import requests per second | (Actual) import bytes per second |
|---|---|---|---|---|---|---|---|
| 64 | 1 | 15 | 811 | 72 | 20771 | 360 | 103855 |
| 64 | 2 | 11 | 1463 | 90 | 25904 | 450 | 129520 |
| 64 | 4 | 4 | 1028 | 72 | 20658 | 360 | 103290 |
| 128 | 1 | 54 | 494 | 345 | 99146 | 1725 | 495730 |
| 128 | 2 | 35 | 2885 | 405 | 116380 | 2025 | 581900 |
| 128 | 4 | 17 | 3741 | 382 | 109753 | 1910 | 548765 |
| 256 | 1 | 51 | 321 | 316 | 90636 | 1580 | 453180 |
| 256 | 2 | 25 | 2245 | 336 | 96709 | 1680 | 483545 |
| 256 | 4 | 16 | 4214 | 332 | 95360 | 1660 | 476800 |
| 512 | 1 | 55 | 559 | 338 | 97101 | 1690 | 485505 |
| 512 | 2 | 33 | 3743 | 386 | 110897 | 1930 | 554485 |
| 512 | 4 | 16 | 2741 | 378 | 108551 | 1890 | 542755 |

Memory Usage

Analysis Server Memory Scaling

| MB of RAM | Number of processors | Total Analysis private MB | Total system available bytes | Pages in per second | Pages out per second |
|---|---|---|---|---|---|
| 64 | 1 | Approx. 44 | 2221439 | 110 | 135 |
| 64 | 2 | | 3083006 | 140 | 199 |
| 64 | 4 | | 2683335 | 110 | 142 |
| 128 | 1 | | 10634085 | 38 | 5.8 |
| 128 | 2 | | 12616466 | 28 | 1.3 |
| 128 | 4 | | 11877117 | 38 | 2.1 |
| 256 | 1 | | 99938472 | 22 | 0.7 |
| 256 | 2 | | 130668896 | 20 | 1 |
| 256 | 4 | | 122526624 | 44 | 2.4 |
| 512 | 1 | | 358771712 | 42 | 1.7 |
| 512 | 2 | | 365349088 | 38 | 1.5 |
| 512 | 4 | | 369011232 | 25 | 1 |

Disk Usage

SQL Server Disk Scaling

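| Analysis Server MB of RAM | Analysis Server number of processors | SQL Server disk utilization (%) | SQL Server disk reads per second | SQL Server disk writes per second |
|---|---|---|---|---|
| 64 | 1 | 0.9 | 0.5 | 4.1 |
| 64 | 2 | 1 | 0.6 | 5 |
| 64 | 4 | 0.7 | 0.3 | 3.8 |
| 128 | 1 | 4.9 | 4.1 | 18.2 |
| 128 | 2 | 5 | 2.9 | 21.7 |
| 128 | 4 | 6.7 | 6.1 | 21.5 |
| 256 | 1 | 5.6 | 5.4 | 17.4 |
| 256 | 2 | 3.7 | 2 | 17.4 |
| 256 | 4 | 3.9 | 2.2 | 17.6 |
| 512 | 1 | 4.1 | 2.5 | 18.8 |
| 512 | 2 | 5.1 | 3.2 | 21.3 |
| 512 | 4 | 6 | 5.3 | 20.9 |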

Appendix B:  Critical Monitoring Counters

All of the counters listed here are available in Microsoft® Windows NT® Performance Monitor and can be used to detect potential performance bottlenecks caused by excessive hardware resource utilization.

The counters in the Usage Analysis server memory, process, and system objects and the SQL Server physical disk object can be used to monitor capacity.

Memory

Available bytes

Committed bytes

Pages in per second

Pages out per second

Process

Private bytes (UAS, Uimport)

Working set (UAS, Uimport)

System

Percentage of total processor

Physical Disk

Percentage of disk time

Disk reads per second

Disk writes per second

Average disk queue length
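When setting up a monitoring session, it can help to list the counters above as full Performance Monitor paths of the form \\computer\object(instance)\counter. The following Python sketch builds them for an Analysis server and a SQL Server computer. The mapping to English counter names such as "Pages Input/sec", "Pages Output/sec", and "% Total Processor Time" is an assumption to verify in Performance Monitor on your system; the `_Total` disk instance and the function name `counter_paths` are also assumptions, and the process instance names are taken as given in this appendix.

```python
# Assumed Performance Monitor counter names corresponding to this appendix.
MEMORY = ["Available Bytes", "Committed Bytes", "Pages Input/sec", "Pages Output/sec"]
PROCESS = ["Private Bytes", "Working Set"]
PROCESS_INSTANCES = ["UAS", "Uimport"]   # process names as given in this appendix
SYSTEM = ["% Total Processor Time"]
DISK = ["% Disk Time", "Disk Reads/sec", "Disk Writes/sec", "Avg. Disk Queue Length"]
DISK_INSTANCE = "_Total"  # assumption: aggregate all disks on the SQL Server computer


def counter_paths(analysis_server: str, sql_server: str) -> list:
    r"""Build \\computer\object(instance)\counter paths for both servers."""
    a = f"\\\\{analysis_server}"
    paths = [f"{a}\\Memory\\{c}" for c in MEMORY]
    paths += [f"{a}\\Process({p})\\{c}" for p in PROCESS_INSTANCES for c in PROCESS]
    paths += [f"{a}\\System\\{c}" for c in SYSTEM]
    paths += [f"\\\\{sql_server}\\PhysicalDisk({DISK_INSTANCE})\\{c}" for c in DISK]
    return paths


for path in counter_paths("uas1", "sql1"):
    print(path)
```

Memory, Process, and System counters are collected on the Analysis server, and PhysicalDisk counters on the SQL Server computer, matching the division described at the start of this appendix.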

Information in this document, including URL and other Internet web site references, is subject to change without notice.  The entire risk of the use or the results of the use of this resource kit remains with the user.  This resource kit is not supported and is provided as is without warranty of any kind, either express or implied.  The example companies, organizations, products, people and events depicted herein are fictitious.  No association with any real company, organization, product, person or event is intended or should be inferred.  Complying with all applicable copyright laws is the responsibility of the user.  Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document.  Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

©1998-2000 Microsoft Corporation.  All rights reserved.

Microsoft, Windows and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries/regions.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.