The information in this article applies to:
SUMMARY
Microsoft SQL Server tables should never contain duplicate rows, nor
non-unique primary keys. For brevity, we will sometimes refer to primary
keys as "key" or "PK" in this article, but this will always denote "primary
key." Duplicate PKs are a violation of entity integrity, and should be
disallowed in a relational system. SQL Server has various mechanisms for
enforcing entity integrity, including indexes, UNIQUE constraints,
PRIMARY KEY constraints, and triggers. This article discusses how to locate and remove duplicate primary keys from a table. However you should closely examine the process which allowed the duplicates to happen in order to preven a recurrence. MORE INFORMATIONFor this example, we will use the following table with duplicate PK values. In this table the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint since two rows have duplicate PKs. This procedure illustrates how to identify and remove the duplicates.
The first step is to identify which rows have duplicate primary key values:
This will return one row for each set of duplicate PK values in the table.
The last column in this result is the number of duplicates for the
particular PK value.
If there are only a few sets of duplicate PK values, the best procedure is to delete these manually on an individual basis. For example:
The rowcount value should be n-1 the number of duplicates for a given key value. In this example, there are 2 duplicates so rowcount is set to 1. The col1/col2 values are taken from the above GROUP BY query result. If the GROUP BY query returns multiple rows, the "set rowcount" query will have to be run once for each of these rows. Each time it is run, set rowcount to n-1 the number of duplicates of the particular PK value.Before deleting the rows, you should verify that the entire row is duplicate. While unlikely, it is possible that the PK values are duplicate, yet the row as a whole is not. An example of this would be a table with Social Security Number as the primary key, and having two different people (or rows) with the same number, each having unique attributes. In such a case whatever malfunction caused the duplicate key may have also caused valid unique data to be placed in the row. This data should copied out and preserved for study and possible reconciliation prior to deleting the data. If there are many distinct sets of duplicate PK values in the table, it may be too time-consuming to remove them individually. In this case the following procedure can be used:
Additional query words: sql6 dedupe entity intg
Keywords : kbusage SSrvProg |
Last Reviewed: April 10, 1999 © 2000 Microsoft Corporation. All rights reserved. Terms of Use. |