Developing Applications That Employ Replication

This section identifies design issues in developing replicated applications and further isolates specific constraints in the current SQL Server 6.5 implementation of transactional replication. While you can successfully set up replication on many existing Microsoft SQL Server 6.5 applications, you can increase flexibility, ease deployment, and minimize development problems by considering replication early in your application design process.

Network and Security Requirements

Replication requires integrated security for server-to-server operations. SQL Server 6.5 replication uses SQL Server remote stored procedure calls for communicating between publisher, distributor, and subscriber.

All replication tasks rely on SQL Executive scheduling engine to provide scheduling and execution facilities. Replication requires that each replication task connect to SQL Server by impersonating the domain account. As a result, the SQLExecutive service must be configured to run under a Windows NT Domain account during installation. However, this setting can be modified at any time by accessing the Services Configuration utility in Windows NT Control Panel and altering the SQL Executive startup account.

Additionally, named pipes or multiprotocol network support must be installed to support login impersonation of the domain account and name resolution for remote server connections. While you can choose to establish anything as the default network protocol, it is recommended that named pipes not be removed as a supported network library during server setup.

Beyond SQL Server configuration issues, it should be noted that certain replication configuration activities are limited by user type. Public users of a database can view publication and subscription information, but most operations involving replication installation and configuration must be performed by the DBO or SA. Only an SA can install publishing to enable replication, and public users of a database cannot create publications or add subscriptions.

Internationalization and Localization

It is recommended that all servers involved in replication are configured to use the same code page. While it is possible to create custom stored procedures that handle data translation for each INSERT, UPDATE, or DELETE statement delivered to a subscriber, this is costly for performance and introduces unnecessary complexity to the enterprise application.

It is also recommended, though not required, that you establish the same default sort order for all SQL Servers involved in replication. This ensures that the same content and ordering of query result sets are returned from all servers.

Data Type Support

All data types are supported with Microsoft SQL Server 6.5 transactional replication. Prior to SQL Server version 6.5, replication of text and image data was limited to snapshot replication (scheduled table refresh) and timestamp columns were always translated to binary. User-defined data types are supported as well. You should consider, however, the costs incurred with replicating text, image, or user-defined data types.

Updating text and image data is costly for storage and performance. You should try to minimize the amount and frequency of text and image replication in your application. You can establish the maximum size of text data that can be replicated by the application by setting the SQL Server configuration value max text repl size. By default this value is limited to 64K, however, it can be changed using SQL Enterprise Manager or sp_configure. The same setting controls the maximum size of image data that can be added to a replicated column.

When setting up replication, you can choose to maintain custom data types across all servers or have the publisher translate or "map" those data types to a standard data type. This is particularly useful when replicating data for inclusion in existing databases in the enterprise do not support the user-defined data types you want to include in your development project. Be advised, however, as with any data type mapping or translation, a minor performance penalty is incurred.

Primary Key Requirements

Microsoft SQL Server 6.5 replication requires that all published tables have a declared primary key. To add replication to an existing application, you must ensure that all tables to be published have a primary key. You can add a primary key to a table by using the ALTER TABLE statement:

ALTER TABLE [database.[owner].]table_name 
[WITH NOCHECK]
[ADD
   {col_name column_properties [column_constraints]
   | [[,] table_constraint]}
      [, {next_col_name | next_table_constraint}]...]
| 

where constraint is

[CONSTRAINT constraint_name]
   PRIMARY KEY [CLUSTERED | NONCLUSTERED]
      (col_name [, col_name2 [..., col_name16]])
      [ON segment_name] 

New tables you plan to publish should be designed and created with a declared primary key:

CREATE TABLE [database.[owner].]table_name 
(
   {col_name column_properties [constraint [constraint [...constraint]]]
   | [[,] constraint]}
      [[,] {next_col_name | next_constraint}...]
)
[ON segment_name]

where constraint is

[CONSTRAINT constraint_name]
   PRIMARY KEY [CLUSTERED | NONCLUSTERED]
      (col_name [, col_name2 [..., col_name16]])
      [ON segment_name] 

Representing Identity Columns

When transactions are propagated through replication, they cannot be delivered to a subscribing table containing an identity column. By default, any identity column in a published table is converted to a simple integer during the schema creation portion of initial synchronization. However, if you precreate your schema at the subscription database, identity columns must be represented as integers only. Not doing so results in failure to replicate data when the distribution task executes.

Using Check Constraints

Check constraints are frequently used to enforce validation of data entry on a table. For example, a check constraint can be created in conjunction with a preseeded identity column so that a specific server in the enterprise is allocated a restricted set of identifiers for use as primary keys. Where such a mechanism is used to restrict updates on two servers that replicate data to one another, careful application of such constraints is necessary.

Assume that one server restricts its set of valid keys to the range 1-100000 and a second server restricts primary keys to the range 100001-200000. Neither server can replicate data to the other server without generating a constraint violation unless the constraints are created with the NOT FOR REPLICATION option.

ALTER TABLE [database.[owner].]table_name 
 [ADD
   {col_name column_properties [column_constraints]
   | [[,] table_constraint]}
      [, {next_col_name | next_table_constraint}]...]
| 

where constraint is

[CONSTRAINT constraint_name]
CHECK [NOT FOR REPLICATION] (expression) 

When NOT FOR REPLICATION is employed, the check constraint is ignored when the distribution task applies transactions to the target table on the subscriber.

Partitioning Data

It is recommended that you publish only the minimum data that your application requires. You can limit the set of data published by partitioning when establishing an article or by strategically fragmenting tables according to a site-specific attribute.

Fully Replicated Database Designs

Fully replicated database designs publish all tables in a database to remote locations. A suitable application environment might be one that provides reference materials to customers for remote sites where the data need not be edited locally and the data set is small enough to be stored on each remote server. The advantage of this approach is simplicity. The initial database design can be established at a single location. Data at the central location can be published for all tables in the database. Empty databases can be created at remote locations on each target server and subscriptions can be established to a single publication or a small number of publications that reference the data set to be replicated.

The replication initial synchronization process can be used to transfer the initial schema from the central location to each of the remote sites and data replication can begin. Subscribers in this scenario could include doctors' offices that require read-only access to current insurance benefit details for patients or active content servers for web sites.

Partially Replicated Database Designs

Partially replicated database designs publish a subset of the tables in a database to remote locations. This is often done to reduce the amount of storage required at subscribers or to restrict the availability of certain data to specific subscribers. Subsets of the database can be logically constructed at schema design time through fragmentation of table structures or through partitioning when you set up replication.

Horizontal Partitioning

Horizontal partitioning, or filtering the data by sending only a subset of the rows in a published table, is useful when subscribers are limited to smaller servers not capable of storing the entire published data set. Horizontal partitioning makes sense when only certain sites or regions in the replicated topology need access to certain rows in the database.

Horizontal partitioning can be used to send only specific rows from one departmental database to another departmental database or to a corporate warehouse.

Horizontal partitioning is also useful if you want to re-publish a table at the subscriber so local edits can be made. This can be useful if the data is already located at regional sites that edit their own data. In this situation, the application must be developed carefully to avoid conflicts in data updates.

SQL Server replication supports this scenario by allowing any server in the topology to play the roles of subscriber and publisher simultaneously. By carefully limiting the data set and allocating the privilege to modify data to the region that "owns" each subset of rows, update conflicts can be avoided. Even if you consider another replication scheme in which multi-site updates to a single row are permissible, it is best to design your database up-front to avoid conflicts. Conflict resolution is inherently costly because it adds another step to the process of transmitting changes to each subscriber.

Regardless of the motivating factor behind the decision to use horizontal partitioning, there are trade-offs that must be considered. Horizontal partitions can degrade replication performance because the filter clause on the article is evaluated for each row in the log that is marked for replication to determine whether it should be included in the distribution database's transaction and commands tables. Horizontal partitions should likely be avoided where each site can support the full data load, the overall data set is reasonably small, and the number of insert, update, and delete transactions per day is relatively high.

Horizontal partitioning is convenient because it can be readily applied to existing deployed applications where a site-specific attribute is present in the table to be published or in one of its related tables. Additionally, only a single copy of the metadata associated with any table need exist on each database in a distributed system. This is a significant advantage over the same restrictions incurred in fragmenting (data allocation) your database design.

Vertical Partitioning

Unlike horizontal partitions, vertical partitions alter the construction of the column list for a publication. This feature does not pose a performance impact and can, in some cases, reduce the amount of time it takes to insert replicated commands at the subscriber.

Vertical partitioning can be used to limit the columns replicated as part of an article definition. It can decrease the volume of data traffic across the network or the size of storage at each site by eliminating large text or image columns that may be unnecessary at subscribers.

Though views can work better for this purpose, the combination of a security need and a desire to provide greater autonomy for each site can make vertical partitioning a valuable technique for an application. Vertical partitioning can be used to limit the view of data at a remote location by excluding privileged data that is desirable only at the central publisher.

Limiting Data Sets Through Data Allocation

An alternative to complex query evaluation for the application of a filter clause to each publication created on a table, is to rely on horizontal fragmentation of the data tables during the design phase. For example, if your data can be partitioned on regional boundaries, you can physically store the data in discrete tables specific to each region. Replicating the data in these tables does not require costly article filters and greatly simplifies the logical view of the data model. The drawback is the need to maintain metadata in multiple locations for each table. That is, essentially the same table (but a region-specific version) must be modified in the event a schema change is called for system-wide. For this reason, it is critical that the attributes of these tables be well thought out.

When you partially replicate a database, you must not include a single table in multiple publications targeted for a single site. This redundancy can cause synchronization and blocking problems when replicated data is applied to the target subscriber. The general rule for a subscriber is that you should make only one transactional subscription to any table in the database.