Duplicate records can occur numerous ways, such as loading source files too many times, keying the same data more than once, or from just bad database coding. Having a primary key on your table (and you always should have one) can will in the removal of the duplicate records, but even a primary key it is never a fun task to have handed to you to complete.
I will demonstrate how you can use a common-table expression (CTE) in 2005 to easily remove duplicate entries from a table:
;WITH PendingCalculationCTE(EmployeeId, Start, Finish, [Timestamp], TransactionId, LockFlag, Ranking)
SELECT EmployeeId, Start, Finish, [Timestamp], TransactionId, LockFlag,
Ranking = DENSE_RANK()
OVER(PARTITION BY EmployeeId, Start, Finish, [Timestamp], TransactionId, LockFlag
ORDER BY NEWID() ASC)
DELETE FROM PendingCalculationCTE WHERE Ranking > 1
The script above defines my CTE. I am using a windowing function named DENSE_RANK to group the records together based on the EmployeeId, Start, Finish, [Timestamp], TransactionId, LockFlag fields, and assign them a sequential value randomly. This means that if I have two records with the exact same EmployeeId, Start, Finish, [Timestamp], TransactionId, LockFlag values, the first record will be ranked as 1, the second as 2, and so on.
Because a CTE acts as a virtual table, I am able to process data modification statements against it, and the underlying table will be affected. In this case, I am removing any record from the PendingCalculationCTE that is ranked higher than 1. This will remove all of my duplicate records.
DENSE_RANK – Returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.