Transactional Replication

You can use Microsoft® SQL Server™ transactional replication to replicate two distinct types of objects: tables and stored procedures. You can select all or part of a table to be replicated as an article within a publication. Similarly, you can select one or more stored procedures to be replicated as an article within the same or a different publication.

Transactional replication uses the transaction log to capture changes that were made to data in an article. SQL Server monitors INSERT, UPDATE, and DELETE statements, or other modifications made to the data, and stores those changes in the distribution database, which acts as a reliable queue. Changes are then sent to Subscribers and applied in the same order.

With transactional replication, any given data element has a Publisher. Changes made at the Publisher flow continuously or at scheduled intervals out to subscribing sites. Changes are usually propagated in near real-time; typically, with a latency of seconds. Because changes to the data must be made at the publishing site (or both the subscribing and publishing site if there are immediate-updating Subscribers), update conflicts are avoided. This guarantees transactional consistency. Ultimately, all the subscribing sites will achieve the same values as the Publisher, which is where the updates were made.

If Subscribers need to receive data changes in near real-time, then they need a network connection to the Publisher. Transactional replication in a well-connected environment can provide very low latency to Subscribers. Push Subscribers usually receive changes from the Publisher within a minute or sooner, provided that the network link remains available and adequate processing resources are available.

However, the Subscriber could also pull those changes down as needed. A traveling sales representative could be a Subscriber and nightly pull down just the incremental changes to the price list, which is modified only at the corporate office. The use of transactional replication for disconnected users can be very effective for read-only data.

As illustrated in the diagram, transactional replication is carried out by the Snapshot Agent, Log Reader Agent, and Distribution Agent. The Snapshot Agent prepares snapshot files containing schema and data of published tables, stores the files in the snapshot folder on the Distributor, and records synchronization jobs in the distribution database on the Distributor. The Log Reader Agent monitors the transaction log of each database set up for replication and copies the transactions marked for replication from the transaction log into the distribution database. The Distribution Agent moves the transactions and initial snapshot jobs held in the distribution database tables to Subscribers.

Snapshot Agent

Before it can receive incremental changes from a Publisher, the new Subscriber needs a starting point. The Subscriber must contain tables with the same schema and data as the tables at the Publisher. Copying the complete current publication from the Publisher to the Subscriber is called applying the initial synchronization. Unless the Subscriber has specifically opted to ship the initial synchronization, transactional replication occurs only after SQL Server ensures that the Subscriber has a current snapshot of the table schema and data.

Note If you subscribe to a transactional publication and choose not to initialize the subscription, any custom stored procedures automatically generated at the Publisher are not applied to the Subscriber. Instead, you must manually create the stored procedures at the Subscriber.

When snapshots are distributed and applied to Subscribers, only those Subscribers waiting for initial snapshots are affected. Other Subscribers to that publication (those that are already receiving inserts, updates, deletes, or other modifications to the published data) are unaffected.

Share locks are held while the snapshot is generated so a full, logical, and consistent set of data is produced. This means that while the data can be queried, it cannot be updated during the time it takes to generate the snapshot. To minimize any inconvenience to your operations, always plan to generate a snapshot when updates are minimal.

The procedures by which the Snapshot Agent implements the initial snapshot in transactional replication are the same procedures used in snapshot replication.

When setting up a subscription, you can load the initial snapshot on the Subscriber manually instead of sending it over the network; this is referred to as “manual synchronization.” If the publication is very large, it may be more efficient to load the snapshot from a tape or other storage device. For example, if you have a 20 GB database, it might be easier and faster to dump the database to tape, express courier the tapes to the Subscriber location, and reload the database instead of sending the files over a slow network. If you opt to load the snapshot this way, you can avoid running the Snapshot Agent and SQL Server will not synchronize the published articles with the destination tables. SQL Server then assumes that the Publisher and Subscriber are already synchronized, and immediately starts sending inserts, updates, deletes, or other modifications to the published data.

Note In the case of manual synchronization, it is the responsibility of the user setting up replication to ensure that the table schema and data are identical for the published article and the destination table. The advantage of this option is that it allows changes to replicated data to be distributed immediately to Subscribers, without incurring the system overhead associated with the initial snapshot. It is intended for experienced users who will be implementing a custom replication solution.

Log Reader Agent

The Log Reader Agent runs either continuously or according to a schedule you establish at the time the publication is created. When executing, the Log Reader Agent first reads the publication’s transaction log (the same database log used for transaction tracking and recovery during normal SQL Server operations) and identifies any INSERT, UPDATE, and DELETE statements, or other modifications made to the data transactions that have been marked for replication. Next, the agent batch copies those transactions to the distribution database at the Distributor. The Log Reader Agent uses the system stored procedure sp_replcmds to get the next set of commands marked for replication from the log. The distribution database then becomes the store-and-forward queue from which changes are sent to Subscribers. Only committed transactions are sent to the distribution database.

There is a one-to-one correspondence between transactions on the Publisher and replication transactions in the distribution database. A single transaction stored in MSrepl_transactions can consist of many commands and each command can be broken up along a 500-Unicode-character boundary in the MSrepl_commands table. After the entire batch of transactions has been written successfully to the distribution database, it is committed. Following the commit of each batch of commands to the Distributor, the Log Reader Agent calls sp_repldone to mark where replication was last completed. Finally, the agent marks which rows in the transaction log are ready to be truncated. Rows still waiting to be replicated are not truncated. The transaction log on the Publisher can be dumped without interfering with replication, because only transactions not marked for replication are purged.

The Log Reader Agent runs under SQL Server Agent and can be administered directly by using SQL Server Enterprise Manager. The Log Reader Agent executes on the Distributor.

Distribution Agent

Transaction commands are stored in the distribution database until the Distribution Agent “pushes” them to all the Subscribers (or a Distribution Agent at the Subscriber “pulls” the changes). The distribution database is used only by replication and does not contain any user tables. Under no circumstances should you create other objects in the distribution database. The actions that change data at the Publisher flow to the Subscriber where they change the data in the exact same way. This ensures that the Subscribers receive transactions in the same order in which they were applied at the Publisher.

The procedures by which the Distribution Agent moves the commands to Subscribers are the same procedures used in snapshot replication.

Cleaning Up Transactional Replication

When the distribution database is created, SQL Server adds three tasks automatically to SQL Server Agent at the Distributor to purge the data no longer required:

Agent checkup
Transaction cleanup
History cleanup

These tasks help replication to function effectively in a long-running environment. The cleanup tasks retain transactions for each publication for a defined period after all Subscribers have received them. The retention period setting determines the period of time that transactions are guaranteed to be retained within the distribution database after they have been distributed. Setting a retention period in conjunction with scheduling backups can be used to ensure that information required to recover a destination database automatically is available within the distribution database.

For example, if a Subscriber has scheduled a transaction log dump of a destination database every 24 hours, you could set the retention period to 48 hours. Then, even if the Subscriber experiences a catastrophic failure immediately before a scheduled backup, all transactions necessary to restore the replicated tables automatically will still be available to the distribution process of the Distributor.

To set the retention period for a distribution database

MSrepl_commands	sp_replcmds
MSrepl_transactions	sp_repldone