DTS transformation definitions are stored in Microsoft Repository, in SQL Server, or in COM-structured storage files. Relational and nonrelational data sources are accessed using OLE DB. The data pump opens a rowset on the data source and pulls each row into the pump. It then executes Microsoft ActiveX scripting functions (written in Microsoft Visual Basic, Scripting Edition; JScript; or PerlScript) to copy, validate, or transform data on its way from the source to the destination. Custom transform objects can be created for advanced data scrubbing. The new values for the destination are returned to the pump and sent to the destination by means of high-speed data transfers. Destinations can be OLE DB or ODBC data stores, fixed-field or delimited ASCII files, or HTML files.
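The row-at-a-time flow described above can be sketched in a few lines of Python. This is only an illustration of the pull, transform, and send sequence, not the DTS object model: the names `pump` and `clean_customer`, the use of dictionaries as rows, and the convention that a transform returns `None` to reject a row are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def pump(source: Iterable[Row],
         transform: Callable[[Row], Optional[Row]],
         destination: List[Row]) -> int:
    """Pull rows one at a time, transform each, and append the result.

    A transform that returns None is treated as a failed validation and the
    row is skipped (an assumption of this sketch). Returns the rows written.
    """
    written = 0
    for row in source:                # pull each row from the source rowset
        new_row = transform(row)      # copy, validate, or transform
        if new_row is None:
            continue
        destination.append(new_row)   # send the new values to the destination
        written += 1
    return written


def clean_customer(row: Row) -> Optional[Row]:
    """Hypothetical transform: reject rows without an id, tidy the name."""
    if row.get("id") is None:
        return None
    return {"id": row["id"], "name": str(row["name"]).strip().title()}


source_rows = [
    {"id": 1, "name": "  ada lovelace "},
    {"id": None, "name": "missing id"},
    {"id": 2, "name": "GRACE HOPPER"},
]
dest: List[Row] = []
count = pump(source_rows, clean_customer, dest)
```

In this sketch the destination is an in-memory list; a real destination would be an OLE DB provider or a file writer, and the pump would batch rows rather than append them one by one.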
In the DTS architecture, data is pulled from the data source with an OLE DB data pump, and optionally transformed before being sent to OLE DB destinations.
Complex transformation and data validation logic can be implemented using ActiveX scripting. These scripts can invoke methods from any OLE object to modify or validate the value of a column. Advanced developers can create reusable COM transformation objects that provide advanced scrubbing capabilities. Custom tasks can be created that transfer files by means of FTP or launch external processes.
ISVs and consultants can create new data sources and destinations by providing OLE DB interfaces. The data pump will query the OLE DB interface for any provider to determine whether high-speed data loading is supported; if not, then standard loading mechanisms will be used.
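The capability check described above (ask the provider whether it supports high-speed loading, and fall back if it does not) might look like the following Python sketch. The classes `BasicDestination` and `BulkDestination` and the method names `insert_row` and `bulk_load` are hypothetical stand-ins; the actual mechanism is an OLE DB interface query, which this example only imitates with an attribute lookup.

```python
from typing import Dict, List

Row = Dict[str, object]


class BasicDestination:
    """Hypothetical provider that supports only row-at-a-time inserts."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.calls = 0                # number of calls made to the provider

    def insert_row(self, row: Row) -> None:
        self.calls += 1
        self.rows.append(row)


class BulkDestination(BasicDestination):
    """Hypothetical provider that also exposes a bulk-load entry point."""

    def bulk_load(self, rows: List[Row]) -> None:
        self.calls += 1
        self.rows.extend(rows)


def load(destination: BasicDestination, rows: List[Row]) -> str:
    """Use the fast path when the provider offers it; otherwise fall back."""
    bulk = getattr(destination, "bulk_load", None)   # probe for the capability
    if callable(bulk):
        bulk(rows)
        return "bulk"
    for row in rows:                                 # standard loading mechanism
        destination.insert_row(row)
    return "standard"
```

Both paths leave the same rows in the destination; the difference is the number of calls made to the provider, which is where the performance gain would come from.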
Although standards like SQL-92 have improved interoperability between relational database engines, vendors still differentiate themselves in the marketplace by adding useful but proprietary extensions to SQL-92. SQL Server offers a simple programming language known as Transact-SQL that provides basic conditional processing and simple repetition control. Oracle Corporation, Informix Software, Inc., and other vendors offer similar SQL extensions.
The pass-through SQL architecture of the DTS Transformation Engine helps guarantee that most of the functionality of the source and destination is available to customers using the engine. Customers can reuse scripts and stored procedures that they have already developed and tested by simply invoking them from the DTS Transformation Engine. The pass-through architecture dramatically simplifies development and testing, because DTS does not modify or interpret the SQL statement being executed. Any statement that works through the native interface of the DBMS will work exactly the same way during a transformation.
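The pass-through idea can be illustrated with a small Python sketch. SQLite, through the standard `sqlite3` module, stands in here for whatever DBMS is actually on the other end, and `execute_passthrough` is a hypothetical helper name; the point is only that the statement text reaches the engine unchanged, so anything that works through the native interface works the same way here.

```python
import sqlite3


def execute_passthrough(conn: sqlite3.Connection, statement: str) -> list:
    """Send the statement to the engine exactly as written.

    The text is not parsed, rewritten, or interpreted on the way through.
    """
    cursor = conn.execute(statement)
    rows = cursor.fetchall()          # empty list for statements with no result
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")
execute_passthrough(conn, "CREATE TABLE sales (region TEXT, amount INTEGER)")
execute_passthrough(
    conn, "INSERT INTO sales VALUES ('east', 10), ('west', 5), ('east', 7)")
# A statement written and tested elsewhere, passed through untouched.
totals = execute_passthrough(
    conn, "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
```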
DTS records and documents the lineage of each transformation in the repository so customers can know where their data came from. Data lineage can be tracked at both the table and row levels. This provides a complete audit trail for the information in the data warehouse. Data lineage is shared across vendor products. DTS packages and data lineage can be stored centrally in Microsoft Repository. This includes transformation definitions, Visual Basic and JScript scripts, and package execution history. Integration with Microsoft Repository allows third parties to build on the infrastructure provided by the DTS Transformation Engine. DTS packages can be scheduled for execution through an integrated calendar, executed interactively, or run in response to system events.
The DTS package is a complete description of all the work to be performed as part of the transformation process. Each package defines one or more tasks to be executed in a coordinated sequence. A DTS package can be created interactively using the graphical user interface or any language that supports OLE Automation. The DTS package can be stored in Microsoft Repository, in SQL Server, or as a COM-structured storage file. After being retrieved from the repository or structured storage file, the package can be executed in the same way as a DTS package that was created interactively.
A DTS package can contain multiple tasks, and each task can be as simple as a table-to-table mapping or as complex as invoking an external data cleansing process.
A task defines a piece of work to be performed as part of the transformation process. A task can move and transform heterogeneous data from an OLE DB source to an OLE DB destination using the DTS Data Pump, and can execute ActiveX scripting or launch an external program. Tasks are executed by step objects.
Step objects coordinate the flow of control and execution of tasks in the DTS package. Some tasks must be executed in a certain sequence. For example, a database must be created successfully (Task A) before a table can be created (Task B). This is an example of a finish-start relationship between Task A and Task B; Task B, therefore, has a precedence constraint on Task A.
Each task is executed when all of its precedence constraints have been satisfied. Tasks can be executed conditionally based on run-time conditions. Multiple tasks can be executed in parallel to improve performance. For example, a package can load data simultaneously from Oracle and DB2 into separate tables. The step object also controls the priority of a task: the priority of a step determines the priority of the Win32 thread that runs the task.
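The finish-start relationship between Task A and Task B can be sketched as follows. This is a simplified, single-threaded illustration in Python, not the DTS step object: the `Step` class, the `run_package` function, and the convention that a task returns `True` on success are assumptions of the example, and it does not attempt the parallel execution or thread priorities mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Step:
    """Hypothetical step: a task plus the steps that must succeed first."""
    task: Callable[[], bool]                        # returns True on success
    after: List[str] = field(default_factory=list)  # precedence constraints


def run_package(steps: Dict[str, Step]) -> List[str]:
    """Run each step once all of its predecessors have succeeded.

    Returns the step names in the order they ran. A step whose predecessor
    failed or never ran is left unexecuted.
    """
    succeeded: Set[str] = set()
    attempted: Set[str] = set()
    order: List[str] = []
    progressed = True
    while progressed:                 # repeat until no further step can start
        progressed = False
        for name, step in steps.items():
            if name in attempted:
                continue
            if all(dep in succeeded for dep in step.after):
                attempted.add(name)
                order.append(name)
                if step.task():
                    succeeded.add(name)
                progressed = True
    return order


# Task B (create the table) is constrained on Task A (create the database),
# so it waits even though it is listed first.
package = {
    "create_table": Step(lambda: True, after=["create_database"]),
    "create_database": Step(lambda: True),
}
order = run_package(package)
```

If the database step fails in this sketch, the table step is never started, which mirrors the requirement that the database be created successfully before the table.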
The DTS Data Pump is an OLE DB service provider that provides the infrastructure to import, export, and transform data between heterogeneous data stores. The OLE DB strategic data access interface provides access to the broadest possible range of relational and nonrelational data stores. The DTS Data Pump is a high-speed, in-process COM server that moves and transforms OLE DB rowsets.
A transformation is a set of procedural operations that must be applied to the source rowset before it can be stored in the desired destination. The DTS Data Pump provides an extensible, COM-based architecture that allows complex data validations and transformations as the data moves from the source to the destination. The DTS Data Pump makes the full power of ActiveX scripting available to the DTS package, allowing complex procedural logic to be expressed as simple, reusable ActiveX scripts. These scripts, written in the scripting language of the developer's choice, can validate, convert, or transform column values as they move from the source, through the DTS Data Pump, to the destination. New values can be calculated easily from one or more columns in the source rowset. A single source field can also be decomposed into multiple destination columns. ActiveX scripts can invoke and use the services of any COM object that supports automation.
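The two kinds of column transformation mentioned above, calculating a new value from several source columns and decomposing one source field into several destination columns, can be shown in a short Python sketch. The function name `transform_order` and the column names are hypothetical, and Python stands in for the ActiveX scripting language that a real package would use.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def transform_order(source: Row) -> Row:
    """Hypothetical column mapping applied to one source row."""
    # Decompose one source field into two destination columns.
    first, _, last = str(source["full_name"]).partition(" ")
    # Calculate a new value from two source columns.
    total = source["quantity"] * source["unit_price"]
    return {"first_name": first, "last_name": last, "total": total}


result = transform_order(
    {"full_name": "Ada Lovelace", "quantity": 3, "unit_price": 4})
```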