Brian Lawton and Don Awalt
RDA Consultants Limited
October 1999
Summary: This article examines the Microsoft SQL Server 7.0 DTS object model and provides a scenario in which a Microsoft Visual Basic 6.0 application uses the DTS COM interfaces to perform its data transformations. (16 printed pages)
Introduction
DTS Object Model
Transformations
The Interfaces
Implementing a Solution for Data Mart Population
About the Authors
For More Information
This document was previously published in SQL Server Magazine as “Unleashing the Power of DTS.” Data Transformation Services (DTS) is a Microsoft® SQL Server™ 7.0 component that provides import, export, and data-manipulation capabilities between OLE DB, ODBC, and ASCII data stores. Like many SQL Server tools, the DTS Package Designer and wizards are essentially wrappers that graphically expose an object model. Although these tools offer an extensive array of options that you can use to develop elaborate transformations, they do not expose the complete functionality and power of the DTS object model. To expand this power, you must interact with DTS using its native COM interfaces. This article examines the DTS object model and provides a scenario in which a Microsoft Visual Basic® 6.0 application uses the DTS COM interfaces to perform its data transformations.
The DTS object model offers a complete set of components that let you move data among multiple data stores. Figure 1 provides a summary of the major collections that compose the Package object model: Steps, Connections, GlobalVariables, and Tasks. Each collection plays an equally important role in supporting a Package.
Figure 1. Package object model
The Steps collection contains all the Task workflow information in a package. You can associate each Step object with a collection of Task objects; however, if you want to execute a Task, it must have at least one Step. Unlike the DTS Package Designer, which implicitly creates any needed steps, applications using the DTS COM interfaces must explicitly define the steps.
Whether a Step object executes depends on its PrecedenceConstraint objects; a Step can execute in parallel with other steps, serially to them, or conditionally based on their execution. A precedence constraint defines the conditions that prior Step object must meet before the current Step can execute. Thus, a Step with more than one precedence constraint cannot execute until it satisfies all its precedence constraints.
The three types of precedence constraints are:
The Connections collection contains all the OLE DB data provider information in a package. Each Connection object describes the data provider. With connection pooling, you can then reuse these connections across multiple Steps or Tasks collections, allowing you to do more work with fewer resources. Connection pooling is a provider-supplied function that enables an application to reuse an existing connection rather than physically establishing a new connection for each use.
The GlobalVariables collection provides an internal storage area where you can store data and share it across different Steps collections within a Package object. The GlobalVariable object provides a variant data type for storage. Each Package object contains a collection of GlobalVariable objects that you can dynamically add or remove during execution.
The Tasks collection, as shown in Figure 2, contains all the defined Task objects in a Package object. Each Task object contains information describing a unit of work that the Package object must perform as part of the transformation process. The CustomTask object is necessary to implement any Task. Using CustomTask, you can extend DTS by creating Task objects to supplement the built-in Task objects that ship with SQL Server. CustomTask objects can include their own customized user interface and dialog boxes, which you can add to the DTS Designer. The DTS interfaces also let you use CustomTask objects directly in COM applications.
Figure 2. Tasks collection
The following table lists the eight built-in Task objects that ship with DTS. This document focuses on the three Task objects related most directly to performing transformations: DataPumpTask, Data-DrivenQueryTask, and ExecuteSQLTask.
Built-In DTS Task Objects | |
Task Objects | Description |
ExecuteSQLTask | Executes one or more SQL statements. |
DataPumpTask | Copies and transforms data from a source to a destination data store. |
ActiveXScriptTask | Executes an ActiveX® script. These tasks can perform any operation that the language supports, including interaction with ADO and invocation of any other object supporting OLE Automation. |
ExecuteProcessTask | Executes an external program. |
BulkInsertTask | Performs a SQL Server/Transact-SQL BULK INSERT operation. |
SendMailTask | Sends an e-mail or page using SQL Mail. |
DataDrivenQueryTask | Performs advanced data transformations between data stores. |
TransferSQLServerObjectsTask | Transfers complete objects from one SQL Server 7.0 database to another. |
The DataPumpTask object defines the information necessary to create and execute an instance of the DTS data pump, which transforms data as it is moved from the source to the destination data store. The data pump, as shown in Figure 3, is a multithreaded, high-speed, in-process COM server that moves and transforms rowsets between a source and a destination data store. ActiveX scripts can perform the transformations as the data pump moves the data. If your transformation is too complicated for the ActiveX scripting language, you can develop Custom transformations with any COM-compliant language. Note that DTS applies transformations row by row as the data passes through the pump. Therefore, when designing custom transformations, you must evaluate and minimize any performance hits.
Figure 3. Data pump
The DataDrivenQueryTask object, shown in Figure 4, defines the information necessary to move data between the source and destination data stores using data-driven queries. DataDrivenQueryTask differs from DataPumpTask in the way Task object applies the data to the destination data store.
Figure 4. DataDrivenQueryTask object
DTS provides two methods of interaction with the destination data store:
When designing your application, keep in mind that these two methods are mutually exclusive. Therefore, apply the following rule when choosing between the two. If moving data into the destination data store is an insert-based scenario (always a complete refresh or add) implement the transformation using DataPumpTask. If moving data into the destination data store is a query-based scenario (requiring UPDATE, DELETE, or stored procedures) implement the transformation with DataDrivenQueryTask. Unless you implement DTS in complete refresh scenarios, most of your development work will use DataDrivenQueryTask.
Both DataPumpTask and DataDrivenQueryTask support the Lookup object, with which you can perform queries that use one or more named parameterized queries. The Lookup object lets a transformation retrieve data from locations other than the immediate source or destination row being transformed. For example, you might use a lookup if your source and destination data stores are SQL Server databases, but one of the values needed at the destination resides in an alternative location (for example, Oracle, DB2, or Microsoft Excel). For each source row, Lookup performs a query against the third-party location to retrieve the needed value.
Two dynamic link library(.dll) files that allow custom applications to directly access DTS packages and the data pump are provided:
Although both the DTSPkg and DTSPump interfaces let you create packages and CustomTask objects, you can implement custom transformations using C or C++ only by using Dtspump.h. Because DTS processes transformations one row at a time, the performance overhead of an OLE Automation interface is too overwhelming. Therefore, the development of custom transformations is restricted to the native OLE DB interface that Dts.pump.h includes.
This section presents a data mart population scenario and implements a solution using Visual Basic 6.0 that illustrates Connections, Steps, PrecedenceConstraints, DataPumpTask, DataDrivenQueryTask, and ExecuteSQLTask.
A large corporation must combine the employee data from two subsidiary companies into one data mart and refresh that data weekly. To simulate the subsidiary companies, this example uses the Employee tables from two SQL Server sample databases, Northwind and Pubs. A new database called HRMart represents the parent company’s data mart. Listing 1 shows the HRMart schema.
Listing 1. The HRMart schema
use master
go
create database HRMart
go
use HRMart
go
set quoted_identifier OFF
GO
if exists (select * from sysobjects
where id = object_id(N’[dbo].[Employee]’)
and OBJECTPROPERTY(id, N’IsUserTable’) = 1)
drop table [dbo].[Employee]
GO
if exists (select * from sysobjects
where id = object_id(N’[dbo].[Site]’)
and OBJECTPROPERTY(id, N’IsUserTable’) = 1)
drop table [dbo].[Site]
GO
CREATE TABLE [dbo].[Employee] (
[SiteId] [int] NOT NULL ,
[EmployeeId] [char] (9) NOT NULL ,
[FirstName] [char] (10) NOT NULL ,
[MiddleName] [char] (10) NULL ,
[LastName] [char] (10) NOT NULL ,
[DateOfBirth] [datetime] NULL ,
[DateOfHire] [datetime] NULL ,
[Address] [varchar] (60) NULL ,
[City] [varchar] (60) NULL ,
[ZipCode] [char] (9) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Site] (
[SiteId] [int] NOT NULL ,
[Name] [varchar] (50) NOT NULL
) ON [PRIMARY]
GO
To begin, you must create a new Visual Basic project and add a reference to the DTS Package Object Library. Create a new Standard EXE project called HRMartExample, and then add the object reference, as shown in Figure 5.
Figure 5. DTS Package Object Library
To enhance the readability of the Visual Basic code by using all available DTS enumeration constants, add the Microsoft DTSDataPump Scripting Object Library. Because the HRMartExample application does not have a user interface, remove the default form from the project and replace it with a new module. Call the module HRMartPackage; it will contain all your code.
Listing 2 shows the module’s declarations and main subroutine. Starting with the declarations, use the moPackage variable to reference the Package object throughout the module. Next, establish an enumeration of the connection IDs to use throughout the module, EConnectionIds. The ID uniquely identifies each connection. Using the enumeration lets you enforce the uniqueness and still provide a meaningful name for referencing the connections. The final declarations serve as placeholders for SQL Server logon information. Customize these constants to match your SQL Server environment.
Listing 2. Module declarations and main subroutine
Option
Explicit
‘The Package object to be used throughout the
example
Private moPackage As DTS.Package
‘Unique connection IDs that will be referenced throughout the package
Private Enum EConnectionIds
Pubs = 1
Northwind = 2
HRMart = 3
End Enum
‘SQL Server login information
Private Const msServerName As String = “Acorn”
Private Const msLogonId As String = “sa”
Private Const msPassword As String = “password”
Private Sub Main()
Dim bStatus As Boolean
Dim 11 As Long
‘Create and initialize the package.
Call CreatePackage
‘Establish the connections for the package.
Call AddConnection(EConnectionIds.Pubs, “PubsData”, “pubs”, “SQLOLEDB.1”)
Call AddConnection(EConnectionIds.Northwind, “NorthwindData”, “Northwind”, “SQLOLEDB.1”)
Call AddConnection(EConnectionIds.HRMart, “HRMartData”, “HRMart”, “SQLOLEDB.1”)
‘Establish the tasks for the package.
Call TaskHRMartPurge
Call TaskPubsToHRMart
Call TaskNorthwindToHRMart
‘Establish the steps for the package.
Call AddStep(“HRMartPurgeStep”, “The HRMart Purge
Step”, “HRMartPurgeTask”)
Call AddStep(“PubsToHRMartStep”, “The Pubs to HRMart Step”, “PubsToHRMartTask”)
Call AddStep(“NorthwindToHRMartStep”, “The Northwind to HRMart Step”, “NorthwindToHRMartTask”)
‘Establish the precedence constraints (workflow) for the package.
Call AddConstraint(“HRMartPurgeStep”, “PubsToHRMartStep”, DTSStepPrecedenceBasis_ExecResult, DTSStepExecResult_Success)
Call AddConstraint(“HRMartPurgeStep”, “NorthwindToHRMartStep”, DTSStepPrecedenceBasis_ExecResult, DTSStepExecResult_Success)
‘Execute the package.
moPackage.Execute
‘Save the package in SQL Server.
‘Call moPackage.SaveToSQLServer(msServerName, msLogonId, msPassword).
‘Check for errors and report the status.
bStatus = True
For 11 = 1 To moPackage.Steps.Count
If moPackage.Steps(11).ExecutionResult = DTSStepExecResult_Failure Then
bStatus = False
MsgBox “The “““ & moPackage.Steps(ll).Name + “““ step in the package “““ & _
moPackage.Name + “““ failed.”
End If
Next 11
If bStatus Then
MsgBox “The execution of package “““ & moPackage.Name + “““ was successful!”
End If
Set moPackage = Nothing
End Sub
In addition, Listing 2 shows the Main subroutine, the shell in which you assemble and execute the package. To begin assembling the package, create and initialize a new Package object.
To start, create, and initialize a Package object, you must call the CreatePackage subroutine, and define the Package, as shown in Listing 3. Next, assign the Package a name. For the purposes of this example, use HRMartRefresh. The Package lets you log its completion status in the Windows NT Event Log using the WriteCompletionStatusTo-NTEventLog property. You can adjust and adapt the Package to a specific execution environment by tuning the Package-PriorityClass and MaxConcurrentSteps properties. The PackagePriorityClass property establishes the Win32® process priority of the Package. You can use the DTSPriorityClass enumeration constants to set this priority.
The MaxConcurrentSteps property controls the number of steps that the Package can execute concurrently. As you tune this value, remember that each Step runs in its own thread, so setting this value larger than the number of CPUs in the computer will not improve performance.
The LineageOptions and RepositoryMetadataOptions properties control Package interaction with Microsoft Repository. The LineageOptions property tells the Package how and whether to use Microsoft Repository, and the DTSLineage enumeration constants control the property’s values. The RepositoryMetadataOptions property controls how a Package is saved to Microsoft Repository, and the DTSReposMetadataOptions enumeration constants control the property’s values.
Listing 3. Package creation and initialization
Private Sub CreatePackage()
‘Create and initialize the package.
Set moPackage = New DTS.Package
With moPackage
.Name = “HRMartRefresh”
.Description = “Refresh the HRMart employee data.”
.WriteCompletionStatusToNTEventLog = False
.FailOnError = False ‘Default = FALSE
.PackagePriorityClass = DTSPriorityClass_Normal ‘Win32 Process Priority
.MaxConcurrentSteps = 4 ‘Default = 4
.LineageOptions = DTSLineage_None ‘How to use Repository Lineage
.UseTransaction = True
.TransactionIsolationLevel = DTSIsoLevel_ReadCommitted ‘Default = ReadCommitted
.AutoCommitTransaction = True
.RepositoryMetadataOptions = DTSReposMetadata_Default
.UseOLEDBServiceComponents = True ‘Default = TRUE
End With
End Sub
Another function of the Package object is to handle the transaction model. The UseTransaction property controls transaction usage. You control the TransactionIsolationLevel property of a Package by setting that property to one of the enumeration constants contained in DTSIsoLevel; its default value is Read-Committed. To define transactional control, use the AutoCommitTransaction property. This property tells the Package whether to implicitly or explicitly commit an active transaction upon completing execution. Next, you can create the connections to the various data stores.
In this scenario, three connections are needed: one each to represent the parent and two subsidiary companies. The connections are named PubsData, NorthwindData, and HRMartData and have ConnectionIds of 1, 2, and 3, respectively. To create the new connections and add them to the Package, the Main subroutine in Listing 2 calls the AddConnection subroutine in Listing 4. The Connection object gives the Package the definition of the OLE DB service provider. The Connections collection contains all the Connection objects accessible to the Package. To create a new Connection object, supply the class name of the OLE DB provider, and then assign the ID property a unique numeric value.
Listing 4. Adding connections
Private Sub AddConnection (ByVal lConnectionId As Long, _
ByVal sConnectionName As String, _
ByVal sCatalog As String, _
ByVal sOLEDBProvider As String)
Dim oConnection As DTS.Connection
Set oConnection = moPackage.Connections.New(sOLEDBProvider)
With oConnection
.Name = sConnectionName
.ID = lConnectionId ‘The unique connection ID
.Reusable = True ‘The default is FALSE.
.ConnectImmediate = False ‘The default is FALSE.
.DataSource = msServerName
.UserID = msLogonId
.Password = msPassword
.UseTrustedConnection = False ‘Support for integrated security
.ConnectionTimeout = 30 ‘The default time-out value is 60 seconds
.Catalog = sCatalog ‘The database name
End With
moPackage.Connections.Add oConnection
Set oConnection = Nothing
End Sub
The Connection object also contains several other attributes. The Reusable property determines whether multiple steps can share the Connection. The ConnectImmediate property controls the Connection initiation. When the value is TRUE, the connection to the data store occurs when the Package starts running. When the value is FALSE, the connection does not start until a Step referencing it executes.
Other Connection properties describe the data store’s environment information. With the Microsoft OLE DB Provider for SQL Server, the DataSource property identifies the server that SQL Server is on. For other providers, this value may differ. For example, for Oracle or Informix providers, this property may refer to the instance name. The UserID, Password, and UseTrustedConnection properties provide connection security information.
The last property to implement is the Catalog property, which identifies the database that the connection initially occurs in. The ConnectionProperties property specifies and initializes additional attributes that are identifiable to the OLE DB provider. The example relies on the default settings of the SQL Server provider.
After you initialize the Connection object, add it to the package’s Connections collection. After you’ve added all three connections, create the tasks and transformations the package will perform. The first Task to add is an ExecuteSQLTask object to delete the HRMart..Employee data.
An ExecuteSQLTask object is a specialized CustomTask object that lets a package execute a SQL statement on a given connection. To create a new Task, execute the New method of the Tasks collection and pass it the class name of the CustomTask to create. This task clears the employee file before performing the transformation.
To add the ExecuteSQLTask to the package, call the TaskHRMartPurge subroutine, shown in Listing 5. First, create a new ExecuteSQLTask by calling the Tasks.New method and passing it the class name, DTSExecuteSQLTask. Next, set the properties; the name of the Task is HRMartPurgeTask. The SQLStatement property contains the SQL statement for the Task to perform. In this case, the SQL statement is DELETE FROM Employee. A Task needs a Connection in order to execute. HRMartPurgeTask uses the HRMartData connection that you defined using the enumeration constant EConnectionIds. Next, set the time-out value to establish how long the Task will wait for the SQL statement to complete execution.
Finally, using the Tasks collection’s Add method, add ExecuteSQLTask to the Package. The next Task to add is the DataPumpTask, which transforms the Pubs..Employee data and inserts it into HRMart..Employee table.
Listing 5. Adding an ExecuteSQLTask
Private Sub TaskHRMartPurge()
Dim oTask As DTS.Task
Dim oCustomTask As DTS.ExecuteSQLTask
‘Create an ExecuteSQLTask and initialize the CustomTask object
Set oTask = moPackage.Tasks.New(“DTSExecuteSQLTask”)
Set oCustomTask = oTask.CustomTask
With oCustomTask
.Name = “HRMartPurgeTask”
.Description = “Delete the Employee data from the HRMart”
.SQLStatement = “DELETE FROM Employee” ‘SQL code to be executed
.ConnectionID = EConnectionIds.HRMart ‘Connection on which to execute
.CommandTimeout = 0 ‘Length of time to wait
End With
moPackage.Tasks.Add oTask
Set oCustomTask = Nothing
Set oTask = Nothing
End Sub
Brian Lawton (lawton@RDAConsultants.com) is a consultant with RDA Consultants Limited and a contributing editor for SQL Server Magazine. Since 1993, Brian has been focusing primarily on design, implementation, and deployment issues related to database integration within distributed multitiered applications. During this time, he has served as a consultant, mentor, and technical lead on numerous OLTP and Data Warehouse/Data Mart systems in the health care and financial services industries. In addition, Brian maintains Certified Systems Engineer (MCSE), Certified Database Administrator (MCDBA), and Certified Solutions Developer (MCSD) statuses. Recently, Brian has been focusing on implementation issues relating to SQL Server 7.0 and has spoken about SQL Server’s potential at several events including the COMPASS ‘99, the 1998 Windows NT Professionals Conference, the Capital Area and Baltimore Area SQL Server User Groups, and numerous internal client briefings. In addition, he is the author of the upcoming DTS Developer’s Guide, ISBN: 0-07-212411-3 (Osborne/McGraw-Hill).
Don Awalt (awalt@RDAConsultants.com) is President of RDA Consultants Limited, a Solutions Provider Partner focusing exclusively on custom applications development. Founded in 1988, RDA is an Inc.500 award-winning company with offices in Philadelphia, Baltimore, Washington DC, Atlanta, and Chicago. Don maintains Certified Solutions Developer, Certified Systems Engineer, and MCP + Internet status, and has spoken at many Microsoft events including Developer Days, TechEd, PASS, MSDN Events, Explorer98, and other regional events. Don is also a Contributing Editor for SQL Server Magazine and also contributes articles to their Web site (www.sqlmag.com/).
For more information, see www.microsoft.com/sql/.
For a risk-free subscription to SQL Server Magazine, see www.sqlmag.com/.
To subscribe to SQL Server UPDATE, a free weekly e-mail newsletter with timesaving tips and important product information, see www.sqlmag.com/.
--------------------------------------------
This is a preliminary document and may be changed substantially prior to final commercial release. This document is provided for informational purposes only and Microsoft makes no warranties, either express or implied, in this document. Information in this document is subject to change without notice. The entire risk of the use or the results of the use of this document remains with the user. The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unpublished work. © 1999 Microsoft Corporation. All rights reserved.
Microsoft, ActiveX, Visual Basic, and Win32 are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.