MDAC 2.5 SDK - Technical Articles
Providing a common interface to diverse data stores containing mission-critical data is not an easy task. Each data store supports different sets of functionality, making it difficult to define a useful level of interaction between a generic application and diverse stores. Defining the level of interaction to be only the common functionality among all stores does not provide a useful set of functionality for the application. Further, this approach limits the ability of the data store to expose innovative extensions. On the other hand, requiring a rich level of functionality would force a simple data store to implement, in full or in part, a relational engine on top of its data. Such barriers prevent many types of simple data from participating in data access.
The Microsoft Universal Data Access solution is to define a rich set of functionality, but to factor that functionality so that data stores are able to expose only the subset that makes sense for their data. Thus, the data store is not limited in the functionality it can expose, but at the same time is not forced to implement extended database features in order to provide common access to its data.
This makes OLE DB attractive for the data store, but stopping there would amount to a least-common-denominator approach for the application. To be generic, the application would have to be filled with extensive conditional code to verify whether a particular set of functionality was supported. In many cases, the application would be forced to implement extensions internally to compensate for data stores that could not support extended functionality natively.
To solve this problem, Universal Data Access provides a component architecture that allows individual, specialized components to implement discrete sets of database functionality, or “services,” on top of less capable stores. Thus, rather than forcing each data store to provide its own implementation of extended functionality, or forcing generic applications to implement database functionality internally, service components provide a common implementation that any application can use when accessing any data store. The fact that some functionality is implemented natively by the data store, and some through generic components, is transparent to the application. To understand how OLE DB components are combined to deliver this rich functionality, it is necessary to understand COM interface factoring and aggregation.
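Interface factoring means a consumer can probe at run time for the optional interfaces a provider chose to implement. The following C++ sketch, which assumes pSession is a session object already obtained from a provider, uses ordinary COM QueryInterface to discover whether the provider supports command objects:

#include <oledb.h>

// Returns true if the session supports creating command objects, an
// optional piece of OLE DB functionality. Simple providers that expose
// only the base interfaces fail this QueryInterface call with
// E_NOINTERFACE, and the consumer can react accordingly.
bool SupportsCommands(IUnknown *pSession)
{
    IDBCreateCommand *pCreateCommand = NULL;
    HRESULT hr = pSession->QueryInterface(IID_IDBCreateCommand,
                                          (void **)&pCreateCommand);
    if (SUCCEEDED(hr))
    {
        pCreateCommand->Release();   // Provider supports commands.
        return true;
    }
    return false;                    // Base provider only.
}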
If the data store you want to access has additional functionality, such as the ability to execute a SQL query, then you will probably want a common way to access that extended functionality. The OLE DB specification, as well as specifying the base interfaces and objects that everyone implements, defines some interesting common extensions to expose additional functionality that specific data stores might have. The key concept in OLE DB is componentization. Rather than dictating that a provider must support everything or nothing, Microsoft split the interfaces into groups: some are supported by everyone, and some are extensions for different areas, such as SQL data, ISAM data, and so on.
Componentization lets an OLE DB provider expose exactly its native functionality; then a separate component can be added on to implement the additional incremental functionality required by a consumer. For example, say you implemented just the minimum set of object types and behavior described above in your OLE DB provider. You don’t support querying, sorting, or filtering; you just allow consumers to read through your data. Now let’s say that a consumer wants to execute a SQL query on data exposed by your OLE DB provider. Instead of your having to write that code, a service component that can execute a query over anyone’s simple data can be used to query yours. Now when a consumer wants to read through your data, they can talk directly to your OLE DB provider. When the consumer wants to do a query, the query processor can be invoked without the consumer’s knowledge to add that incremental functionality. So what you have is a world of cooperating components, where every component natively exposes and consumes OLE DB interfaces, and individual database components can be incrementally added on to a native data store in order to add functionality.
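As one illustration of this composition in C++, a consumer can ask the MDAC service components, through IDataInitialize, to layer services over a provider at connection time. The "OLE DB Services=-1" keyword enables all applicable services; "SampleProv" below is a placeholder ProgID for a simple provider.

#include <oledb.h>
#include <msdasc.h>   // IDataInitialize, CLSID_MSDAINITIALIZE

HRESULT GetServicedDataSource(IDBInitialize **ppIDBInitialize)
{
    IDataInitialize *pIDataInit = NULL;
    HRESULT hr = CoCreateInstance(CLSID_MSDAINITIALIZE, NULL,
                                  CLSCTX_INPROC_SERVER,
                                  IID_IDataInitialize,
                                  (void **)&pIDataInit);
    if (FAILED(hr))
        return hr;

    // The service components parse the initialization string and can
    // wrap the provider with whatever services the consumer requires.
    hr = pIDataInit->GetDataSource(NULL, CLSCTX_INPROC_SERVER,
             L"Provider=SampleProv;OLE DB Services=-1;Data Source=mydata",
             IID_IDBInitialize, (IUnknown **)ppIDBInitialize);
    pIDataInit->Release();
    return hr;
}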
OLE DB service providers and services include the Remoting Provider, Persistence Provider, Data Shaping Service, Cursor Service, Synchronization Service, and Trace Provider.
The Remoting Provider is the next generation of the Microsoft Remote Data Service (RDS) component, formerly known as the Advanced Data Connector. It delivers the promise of Universal Data Access to "stateless" environments such as the Internet and the World Wide Web, creating a framework that permits easy and efficient interaction with OLE DB data sources on corporate intranets and over the Internet. The Remoting Provider offers client-side caching of data results, updatability, and support for data-aware ActiveX controls, and it brings the rich OLE DB/ADO programming model for manipulating data to browser-hosted applications.
The Remoting Provider allows the client to retrieve actual data rather than a textual HTML page. This is important if the client wants to use the data as something other than formatted text. The client can manipulate the local data through ADO or OLE DB directly, and RDS supports hosting standard Visual Basic–style data-bound controls in Microsoft Internet Explorer (IE), where they consume the remoted OLE DB data. The fact that these remote data services are invoked is invisible to both the data consumer and the data provider.
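A minimal ADO sketch in C++ of what this looks like to the consumer (server name, database, and query are placeholders): the recordset is opened through the "MS Remote" provider, which runs the query on the Web server and marshals the resulting rows to a client-side cache.

#import "msado15.dll" no_namespace rename("EOF", "adoEOF")

void FetchRemoteData()
{
    CoInitialize(NULL);
    {
        _RecordsetPtr pRst(__uuidof(Recordset));
        // The Remoting Provider contacts the Web server over HTTP, runs
        // the query through the server-side data provider, and caches
        // the result rows on the client.
        pRst->Open("SELECT * FROM Customers",
                   "Provider=MS Remote;Remote Server=http://webserver;"
                   "Remote Provider=SQLOLEDB;Data Source=pubs",
                   adOpenStatic, adLockBatchOptimistic, adCmdText);
        // ... scroll and edit locally; changes are buffered client-side ...
        pRst->Close();
    }
    CoUninitialize();
}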
The Remoting Provider goes beyond the current generation of Web data access tools by allowing clients to update the data they see. Using ActiveX data controls, such as grids, lists, and combo boxes, developers can deploy sophisticated user interfaces that allow end users to view and change data with a minimum of programming. End users are no longer restricted to staring at a static HTML results table. With the Remoting Provider, they can now alter, add, and delete data they have queried and retrieved. In addition, all changes are buffered locally, and can be submitted to the server for inspection, processing, and storage in the database.
The benefits of traditional client/server technology have migrated to the Web, and the read-only, static client is outdated. Because data is cached locally, the end user can navigate through large data sets without costly server round trips.
The Remoting Provider also allows remote objects to be invoked over HTTP and DCOM, enabling programmers to develop distributed Web applications that partition application logic between Microsoft Visual Basic Scripting Edition code on the client and Automation objects on the server. Automation objects written in Visual Basic can expose services to client-side applications while protecting business logic and data from distribution. Developers are no longer restricted to choosing between "thin" or "fat" clients and servers. They can make an informed choice and partition their data and business logic accordingly.
The OLE DB Persistence Provider makes it possible to save to a local file the rowset data obtained from a data store through a data provider, the accompanying metadata describing the columns of the rowset, and the rowset's state. Persisted data can be accessed even if the connection to the data store has been broken.
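A brief ADO sketch of the Persistence Provider at work (the file path is illustrative): Save writes the rows, column metadata, and pending-change state to disk, and the file can later be reopened with no connection to the original data store.

#import "msado15.dll" no_namespace rename("EOF", "adoEOF")

void PersistAndReload(_RecordsetPtr pRst)
{
    // Persist the rowset, its metadata, and its state to a local file
    // in Advanced Data TableGram (ADTG) format.
    pRst->Save("c:\\data\\orders.adtg", adPersistADTG);
    pRst->Close();

    // Later, perhaps while disconnected, reload directly from the file
    // through the Persistence Provider.
    _RecordsetPtr pLocal(__uuidof(Recordset));
    pLocal->Open("c:\\data\\orders.adtg", "Provider=MSPersist",
                 adOpenStatic, adLockBatchOptimistic, adCmdFile);
}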
The Data Shaping Service provides a hierarchical view of data and the reshaping capabilities that user interfaces and reporting tools require. This service works on top of the Cursor Service; it can create new hierarchical rowsets or aggregate existing rowsets into hierarchies. It implements three types of hierarchies, based on relations, parameters, and grouping. Hierarchical rowsets offer a flexible view of data. For example, an application could initially display the top level of a hierarchy, then let the user drill down into one or more levels of detail. Also, the same hierarchy can then be reshaped locally to offer a different perspective on the data.
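For example (table and column names are illustrative), a relation-based hierarchy is expressed with the SHAPE command language and opened through the MSDataShape provider. Each row of the parent rowset carries a chapter column, here named rsOrders, containing that customer's orders:

#import "msado15.dll" no_namespace rename("EOF", "adoEOF")

void OpenHierarchy(_ConnectionPtr pConn)
{
    // pConn is assumed to be open with a connection string such as:
    //   Provider=MSDataShape;Data Provider=SQLOLEDB;...
    _RecordsetPtr pRst(__uuidof(Recordset));
    pRst->Open(
        "SHAPE {SELECT CustomerID, CompanyName FROM Customers} "
        "APPEND ({SELECT CustomerID, OrderID, OrderDate FROM Orders} "
        "RELATE CustomerID TO CustomerID) AS rsOrders",
        _variant_t((IDispatch *)pConn),
        adOpenStatic, adLockReadOnly, adCmdText);
}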
An important challenge for a multitier, distributed application is to find a balance between keeping data current, or live, and reducing network traffic. Live data produces high network traffic and is impractical for most applications. It requires every change at the application level to be immediately propagated to the data store, and concurrent changes to the data store to be immediately pushed to the client application. In addition, every time the application scrolls beyond its local buffer, new data must be fetched to ensure dynamic membership of the data set. Another side effect of using live data is that it virtually kills application scalability. An Internet application where thousands of client requests need to be served from a data store simply does not work with live data.
The Cursor Service provides a good solution to this challenge. It supplies a client cache for data sets (or cursors) and implements smart synchronization mechanisms (see “The Synchronization Service”). Using these services dramatically reduces network traffic while giving you full control over data synchronization. The Cursor Service can be installed on each tier of a multitier application, making it a great solution for scalability.
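In ADO terms, a consumer opts into the Cursor Service simply by choosing a client-side cursor location before opening the recordset. A minimal sketch, assuming pConn is an open connection:

#import "msado15.dll" no_namespace rename("EOF", "adoEOF")

void OpenClientCursor(_ConnectionPtr pConn)
{
    _RecordsetPtr pRst(__uuidof(Recordset));
    pRst->CursorLocation = adUseClient;   // invoke the Cursor Service
    pRst->Open("SELECT * FROM Products", _variant_t((IDispatch *)pConn),
               adOpenStatic, adLockBatchOptimistic, adCmdText);

    // The rows are now cached locally; the recordset can even be
    // disconnected, so scrolling causes no server round trips.
    pRst->PutRefActiveConnection(NULL);
}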
For applications using client rowsets over SQL data, the Synchronization Service component provides the ability to send changes back to the provider and to refresh the data in local rowsets with current data from the provider. Both the update and refresh operations can be applied to one or more rows simultaneously and can be coupled together in a single user command. Both operations ensure consistency of multitable rowsets if the structure of the join relation between the participating tables is known.
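A short sketch of both operations in ADO, assuming pRst is a client-side, batch-optimistic recordset like the one opened above:

#import "msado15.dll" no_namespace rename("EOF", "adoEOF")

void Synchronize(_RecordsetPtr pRst)
{
    pRst->UpdateBatch(adAffectAll);               // send buffered changes
    pRst->Resync(adAffectAll, adResyncAllValues); // refresh local rows
}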
Your application interacts with a data source by calling interfaces supported by the provider. The Trace Provider tool transparently intercepts those calls and outputs a log containing the call, return value, values of relevant input and output parameters, and the elapsed time for the call.