Snapshot Replication

As its name implies, snapshot replication takes a picture, or snapshot, of the published data in the database at a moment in time. Snapshot replication requires less constant processor overhead than transactional replication because it does not require continuous monitoring of data changes on source servers. Instead of copying INSERT, UPDATE, and DELETE statements (characteristic of transactional replication), or data modifications (characteristic of merge replication), Subscribers are updated by a total refresh of the data set. Hence, snapshot replication sends all the data to the Subscriber instead of sending just the changes. If the article is very large, it can require substantial network resources to transmit. In deciding if snapshot replication is appropriate, you must balance the size of the entire data set against the volatility of change to the data.

Snapshot replication is the simplest type of replication, and it guarantees latent transactional consistency between the Publisher and Subscriber. Snapshot replication is often used by groups needing to look up data such as price lists, or needing data for decision support, where the most current data is not essential. This type of replication is a good solution for read-only Subscribers that do not require the most recent data. Such Subscribers can be disconnected totally if they are not updating. Snapshot replication is also a good solution for immediate-updating Subscribers that make changes to a database infrequently.

As illustrated in the diagram, snapshot replication is carried out by the Snapshot Agent and the Distribution Agent. The Snapshot Agent prepares snapshot files containing schema and data of published tables, stores the files in the snapshot folder on the Distributor, and records synchronization jobs in the distribution database on the Distributor. The Distribution Agent moves the snapshot jobs held in the distribution database tables to the destination tables at the Subscribers. The distribution database is used only by replication and does not contain any user tables.

Snapshot Agent

Each time the Snapshot Agent runs, it creates schema and data files to be sent to Subscribers. The agent does this in several steps:

  1. The agent establishes a connection from the Distributor to the Publisher and sets a share-lock on all tables (articles) included in the publication. The share-lock ensures a totally consistent snapshot of data. Because the locks will prevent all other users from updating the tables, the Snapshot Agent should be scheduled to execute during off-peak database activity.
  2. The agent establishes a connection from the Publisher back to the Distributor and writes a copy of the table schema for each article to an .sch file on the Distributor. The file is stored in a subdirectory in the working directory of the distribution database. If you request that indexes and declarative referential integrity be included, the agent scripts out the selected indexes to an .idx file on the Distributor.
  3. The agent takes a “snapshot” of the data in the published table on the Publisher and writes the data to a file on the Distributor. The file is stored in a subdirectory in the working directory of the distribution database. If all Subscribers are Microsoft SQL Serve™r installations, the snapshot is stored as a native bulk copy program (.bcp) file. If one or more Subscribers is a heterogeneous data source, the snapshot is stored as a character mode (.txt) file. The .sch and .bcp files are the synchronization set that represent the table at a single point in time. There is a synchronization set for each article within a publication.
  4. The agent appends rows to the MSrepl_commands and MSrepl_transactions tables in the distribution database. The entries in the MSrepl_commands tables are commands indicating the location of the synchronization set (.sch and .bcp files) and ordered references to any specified pre-creation scripts. The entries in the MSrepl_transactions table are commands referencing the Subscriber’s synchronization task.
  5. Finally, the agent releases its share-locks on each published table and finishes writing in the log history file.
Distribution Agent

Each time the Distribution Agent runs for a snapshot publication, it moves the schema and data to Subscribers. The agent does this in several steps:

  1. The agent establishes a connection from the server where the agent is located to the Distributor. For push subscriptions, the Distribution Agent is located on the Distributor, but for pull subscriptions, the Distribution Agent is located on the Subscriber.
  2. The agent examines the MSrepl_commands and MSrepl_transactions tables in the distribution database on the Distributor. The agent reads the location of the synchronization set from the first table and the Subscriber’s synchronization commands from these tables.
  3. The agent applies the schema and commands to the subscription database. If the Subscriber is not a SQL Server database, the agent converts the data types as necessary. All articles of a publication are synchronized, preserving transactional and referential integrity between the underlying tables.

When handling a large number of Subscribers, running the Distribution Agent at the Subscriber’s computer (that is, as a pull subscription) can save valuable processing resources on the Distributor.

Snapshots can be applied either when the subscription is created or according to a schedule set at the time the publication is created. When the scheduled time arrives, only those Subscribers who have not been synchronized after the last scheduled synchronization event occurred (therefore, not all Subscribers to that publication) get synchronized. This minimizes the effect on the Publisher.


Note For agents running at the Distributor, scheduled synchronization is based on the date and time at the Distributor (not the date and time at the Subscribers). Otherwise, the schedule is based on the date and time at the Subscriber.


Because automatic synchronization of databases or even individual tables requires increased system overhead, a benefit of scheduling automatic synchronization for less frequent intervals is that it allows the synchronization snapshot to be scheduled for a period of low activity on the Publisher.

Cleaning Up Snapshot Replication

When the distribution database is created, SQL Server adds three tasks automatically at the Distributor:

These tasks help replication to function effectively in a long-running environment. After the snapshot is delivered to all Subscribers, replication cleanup deletes the associated .bcp file for the initial snapshots.

See Also
MSrepl_commands Transferring Data with bcp
MSrepl_transactions  

  


(c) 1988-98 Microsoft Corporation. All Rights Reserved.