SQL Server replication has been designed to require minimal user interaction. In most cases, once a subscription has been successfully established, and maintenance routines such as scheduled backup and replication cleanup have been configured, little or no user intervention is required.
However, if your particular replication configuration experiences problems during initial setup or while running replication, review the following troubleshooting information. While not inclusive, this troubleshooting overview contains helpful hints and techniques for isolating and correcting problems that may arise during data replication.
The following situations are discussed:
The best way to troubleshoot a situation where no subscribers are receiving changes is to determine which of these two replication components has a problem:
To determine which component has a problem, first check whether the changes have been stored in the distribution database of the distribution server. The commands themselves can be viewed by examining the command column of the MSjob_commands table.
Note For security reasons, distribution tables are only accessible to the SA.
In the distribution database, execute the following Transact-SQL statement:
SELECT command FROM MSjob_commands
If the changes to the replicated tables are not stored in the table, then the problem resides in the replication log reader task. If the changes are stored in the command column, then the problem resides in the distribution task.
This result set can be further restricted by specifying in the SELECT statement both the publisher_id and publisher_db columns of the table.
The Task History is displayed for that log reader task. Review the history, then choose Close. The Task Scheduling dialog box returns.
Note If the log reader task is running in Auto Start mode, recent status may not be available.
The Edit Task dialog box appears.
The Task Schedule dialog box appears.
To do this, click the right mouse button on the SQL Executive icon top, and then from the drop-down menu that appears choose Stop. Repeat this process, and choose Start.
This terminates the currently running autostart log reader task. The log reader will now run once every minute and record the appropriate status messages in the Task History (see step 2) to help you determine why it is not moving table changes to the distribution database.
To troubleshoot the distribution task, follow the same steps outlined above for obtaining status for the log reader task, but focus on a distribution task instead.
If subscribers are all failing on distribution of a replicated transaction, this can be repaired as follows:
To do this, open the Task Scheduling window, select the appropriate distribution task, and choose the Edit Task button. In the Edit Task dialog box that appears, change the -c parameter in the Command window to -c1.
Once the distribution task commits each individual transaction, it will fail on the specific transaction that is causing your problem.
This provides you with the maximum job_id that was successfully distributed. (This job_id is also available in the MSlast_job_info table on the subscription server.)
This result will provide you with the job_id of the transaction that is failing.
SELECT command FROM MSjob_commands WHERE publisher_id = 'publisher_id' AND publisher_db = 'publisher_db' AND job_id = 'job_id'
This should only be attempted by an individual well-versed in Transact-SQL. Applying an incorrect update to the distribution tables can result in unexpected data loss.
sp_MSkill_job job_id, publisher, publisher_db
This allows further distribution to then continue automatically.
If only one subscription server is not receiving changes, there can be a problem with the distribution task. Troubleshoot the distribution task following the steps described in "No Subscription Servers Receiving Changes", earlier in this chapter.
If a single subscription server is not receiving changes, also check for (and if necessary correct) the following error conditions:
Check the stoplight status indicator of this server in the Server Manager window of SQL Enterprise Manager, or in SQL Service Manager. Green indicates the SQL Server is running.
Another way to verify server availability is to establish a connection from the distribution server to the subscription server, using a utility such as ISQL/w.
If for some reason the repl_publisher login ID has been removed from the subscription server, the distribution task will experience a login failure.
Check the MSlast_job_info table in the subscription database of the subscription server. If this table contains entries in the publication, article and description columns, the distribution task is paused until a manual synchronization is accomplished.
If this condition occurs, ensure that the file named in the description column of the MSlast_job_info table is applied to the subscription server, then use SQL Enterprise Manager to acknowledge the completion of the manual synchronization. This will enable the distribution task to begin sending changes to the subscription server. For more information about applying and acknowledging a manual synchronization, see Chapter 14, Setting Up Replication.
If the transaction log of a publication database is filling up there is a problem with the log reader task. Troubleshoot the log reader task following the instructions provided in "No Subscription Servers Receiving Changes", earlier in this chapter.
You can use the sp_repltrans and sp_replcmds stored procedures to help you determine whether there are replicated transactions residing within the transaction log that have not been moved into the tables of the distribution database.
If sp_repltrans returns no results, then there are no transactions available within the publication database transaction log for the log reader to move into the distribution database.
For more information about sp_repltrans and sp_replcmds, see the Microsoft SQL Server Transact-SQL Reference.
If you are unable to resolve a problem with the log reader task, the transaction log of the publication database can become completely full. Replicated transactions that have not been moved into the distribution database may not be truncated from the transaction log. This can prevent you from truncating the transaction log of the publication database. If this occurs, two solutions can allow you to continue using the publication database.
If you have available disk space, increase the space allocated to the log. Use SQL Enterprise Manager to expand the database onto additional log space, or use the ALTER DATABASE command or sp_extendsegment procedure to increase the size of the segment. For more information see Chapter 6, Managing Databases. Also see the Microsoft SQL Server Transact SQL Reference.
If more transaction log space is not available, and if normal processing must continue within the publication database, then replicated transactions that have not been moved into the distribution database may be manually removed from the transaction log of the publication database.
Note If non-distributed transactions are removed from the transaction log of the publication database, they will not be delivered to subscribing servers. All servers subscribing to publications within this database may need to re-synchronize to the published tables by unsubscribing and resubscribing.
To remove non-distributed transactions from the transaction log of the publication database, the system administrator must execute the following stored procedure, using the parameters shown:
sp_repldone 0, 0, NULL, 0, 0, 1
This marks all replicated transactions as distributed. The transaction log may then be truncated by executing this statement:
DUMP TRANSACTION database_name WITH NO_LOG
This allows normal processing to continue within the publication database.
Since you don't know which subscribers expect to receive the replicated data that was just truncated from the log, normally you should unsubscribe and resubscribe each subscriber to any publications in that database. This will force new initial synchronization events to occur.
The best way to troubleshoot a situation where the initial synchronization has not been applied at a subscriber is to determine which of these two replication components has a problem:
To determine which component has a problem, check to see if the sync events have been stored in the distribution database of the distribution server. The commands themselves can be viewed by examining the command column of the MSjob_commands table.
If the sync events are not stored in the command column, then the problem resides in the sync task. If the changes are stored in the command column, then the problem resides in the distribution task. (Sync commands appears as Sync, followed by table and file information.)
The following actions can help you troubleshoot a sync task:
This can be checked in SQL Enterprise Manager by selecting the server, choosing Task Scheduling from the Tools menu, selecting the task, and choosing the Task History button.
Select the appropriate sync task from the Task Scheduling window, and choose the Run Task button.
Sync events are often scheduled to occur infrequently and during quiet processing periods. Scheduling information is displayed under the Frequency column in the Task List of the Task Scheduling window. For more detailed information, select the task and choose the Task Edit button.
The initial sync task can fail when it does not have access to the replication working directory. This is a directory where all bcp image synchronization files for a publisher are stored.
The working directory can be reviewed and modified in the SQL Enterprise Manager Replication Publishing dialog box. To access this dialog box, select the publication server from the Server Manager window, and from the Server menu, choose Replication Configuration/Publishing.
To troubleshoot the distribution of an initial synchronization, you can perform the following tasks:
Because an initial synchronization is often the first attempt by the distribution task to communicate with the subscription server, it can reveal login permission problems. By default, replication uses the login ID repl_publisher to log into the subscriber.
You can set the subscription server to audit both successful and failed logins. To do this, select the subscription server in the Server Manager window; from the Server menu, choose Configurations; and in the Server Configuration window that appears, choose Security Options. Under Audit Level, select the check boxes for Successful Login and Failed Login.
Information concerning user logins will be written to the Windows NT Server application log and/or the SQL Server error log (depending on how logging has been configured on that server).
You can determine if the bcp portion of the distribution task has a problem by checking to see if the replicated table has been created without the expected rows. If this is the case, check in the working directory to ensure that the appropriate .TMP file has been created by the sync task.
This allows the distribution task to bypass this operation and begin to apply the individual changes to the replicated table.
If you have set up an article to be replicated using scheduled table refresh, how can you determine whether the destination table on the subscriber has been updated? This can be checked by monitoring the MSlast_job_info table in the destination database.
A table refresh will distribute at least two and sometimes three jobs. If the existing table is being deleted from or truncated, there are two jobs:
If you monitor MSlast_job_info on the subscriber and job_id increments by two, the table refresh was successful.
If the existing table is being dropped, then there are three jobs:
If you monitor MSlast_job_info on the subscriber and job_id increments by three, the table refresh was successful.
Under some conditions, deadlocks can occur between the log reader and distribution processes due to log reader page locks being escalated to an exclusive table lock. To avoid this, you can:
For more information about using the log reader -c commitbatchsize option and the distribution -b jobbatchsize option, see Chapter 16, Scheduling Tasks.
You can also customize the lock escalation level by increasing the LE threshold maximum setting of sp_configure, allowing more page locks. This change is primarily useful on remote distribution servers. However, be aware that changing this option affects the entire server (not just the replication processes).
If replication between two servers is not functioning properly you can use the ODBCPING.EXE utility to check whether ODBC is properly installed (by connecting to a server using the ODBC SQL Server Driver). This utility is a 32-bit application that is stored in the \SQL60\BINN directory.
odbcping /Sservername /Ulogin_id /Ppassword
where
If the ODBC connection is established, this message is displayed:
CONNECTED TO SQL SERVER
If the ODBC connection cannot be established, this message is displayed:
COULD NOT CONNECT TO SQL SERVER