Last Updated: 3rd March 2025
One of the most efficient ways to identify and eliminate duplicate rows in a SQL Server table is to use a Common Table Expression (CTE). In this guide, I'll demonstrate how to use a CTE to remove duplicate rows from your SQL Server database.
Why Are Duplicate Rows Problematic?
Duplicate rows in tables can lead to data inconsistencies, posing challenges for DBAs and developers. By following this simple method, you can ensure your database remains accurate and reliable.
Why Database Design Matters: Preventing Duplicate Records
Technically, a table should never accept duplicate records to begin with. Ensuring data integrity is a fundamental responsibility of a Database Administrator (DBA). By carefully designing the database and its objects, DBAs can prevent duplicates and other data inconsistencies from arising.
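As a minimal sketch of that kind of preventive design, the table below rejects duplicates at insert time. The table name dbo.EmployeeStrict and both constraint names are hypothetical, and the rule that the combination of EmpName and Designation must be unique is an assumption made only for illustration; your own uniqueness rule may differ.

```sql
-- Hypothetical table and constraint names, used only for illustration.
-- Assumption: no two employees may share the same name and designation.
CREATE TABLE dbo.EmployeeStrict
(
    EmpID       INT         NOT NULL CONSTRAINT PK_EmployeeStrict PRIMARY KEY,
    EmpName     VARCHAR(50) NOT NULL,
    Designation VARCHAR(50) NULL,
    CONSTRAINT UQ_EmployeeStrict_Name_Designation UNIQUE (EmpName, Designation)
);

-- The first insert succeeds.
INSERT INTO dbo.EmployeeStrict (EmpID, EmpName, Designation)
VALUES (2, 'MIKE PEARL', 'SENIOR ACCOUNTANT');

-- The second insert is rejected by the UNIQUE constraint,
-- because the same name and designation already exist.
INSERT INTO dbo.EmployeeStrict (EmpID, EmpName, Designation)
VALUES (5, 'MIKE PEARL', 'SENIOR ACCOUNTANT');
```

With constraints like these in place, the cleanup described in the rest of this guide would not be needed for this table.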
Now let us assume we have found duplicates in our tables and need to eliminate them. The process begins with identifying the duplicate records using the ROW_NUMBER() function. Once the duplicates are identified, a CTE makes it straightforward to remove them.
ROW_NUMBER() function
The ROW_NUMBER() function in SQL Server assigns a sequential integer to each row within a partition of a result set. The numbering starts at 1 for the first row in each partition and increments by 1 for each subsequent row. Without a PARTITION BY clause, the whole result set is treated as a single partition. It is particularly useful for identifying duplicates, implementing pagination, or ordering results in a specific manner.
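To see the numbering on its own, here is a small self-contained sketch that needs no table. The inline values and the column name Item are made up for illustration.

```sql
-- Number three inline values in alphabetical order.
-- The values and the column name Item are illustrative only.
SELECT  v.Item,
        ROW_NUMBER() OVER (ORDER BY v.Item) AS RowNumber
FROM    (VALUES ('pear'), ('apple'), ('fig')) AS v (Item)
ORDER BY RowNumber;
```

The ORDER BY inside OVER() decides which row receives which number, while the final ORDER BY only controls the order in which the rows are displayed.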
Now let's see how we can find duplicates in a table.
First, create a table and add a few rows, including one pair of duplicate rows.
CREATE TABLE dbo.Employee
(
    EmpID       INT         NOT NULL,
    EmpName     VARCHAR(50) NOT NULL,
    Designation VARCHAR(50) NULL
);
Now add rows to the newly created Employee table. Assume I accidentally inserted two records with the same name and designation for the employee MIKE PEARL (the second and fifth rows).
INSERT INTO dbo.Employee (EmpID, EmpName, Designation)
VALUES (1, 'GORDEN', 'LAB ASSISTANT'),
       (2, 'MIKE PEARL', 'SENIOR ACCOUNTANT'),
       (3, 'ARUN', 'SR. PROGRAMMER'),
       (4, 'DAVID', 'ACCOUNTANT'),
       (5, 'MIKE PEARL', 'SENIOR ACCOUNTANT');
Note: You can use this sample database to test various functions at the front end as well as at the database level. This will save you some time.
Find Duplicate Rows using ROW_NUMBER()
Syntax
ROW_NUMBER () OVER ( [ <partition_by_clause> ] <order_by_clause> )
SELECT EmpName,
       ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY EmpID) AS RowNumber
FROM dbo.Employee;
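The PARTITION BY clause is optional, but in this scenario we need it. It restarts the numbering at 1 for each distinct EmpName, so any name that appears more than once receives a RowNumber greater than 1.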
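The effect of PARTITION BY is easiest to see side by side. The following sketch runs against the dbo.Employee sample table and numbers the rows twice, once without and once with the partition. The column aliases RowNumberOverall and RowNumberPerName are names I chose for this illustration.

```sql
-- Compare numbering across the whole table with numbering per employee name.
SELECT  EmpID,
        EmpName,
        -- No PARTITION BY: one running sequence across all rows.
        ROW_NUMBER() OVER (ORDER BY EmpID) AS RowNumberOverall,
        -- PARTITION BY EmpName: the sequence restarts for each name.
        ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY EmpID) AS RowNumberPerName
FROM    dbo.Employee
ORDER BY EmpID;
```

With the five sample rows, RowNumberOverall simply counts from 1 to 5, which says nothing about duplicates. RowNumberPerName is 1 for every row except the second MIKE PEARL row (EmpID 5), which gets 2.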
Result:
As you can see, the query returns two rows for the employee MIKE PEARL, with RowNumber values 1 and 2. The row with RowNumber 2 is the duplicate that I inserted accidentally.
We have found the duplicate rows. The next step is to eliminate every row whose RowNumber is greater than 1.
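If you want a quick cross-check before deleting anything, a plain GROUP BY with HAVING lists each duplicated name and how often it occurs. This is a different technique from ROW_NUMBER(), shown here only as a sanity check; the alias Occurrences is my own.

```sql
-- Cross-check: list every EmpName that appears more than once.
SELECT  EmpName,
        COUNT(*) AS Occurrences
FROM    dbo.Employee
GROUP BY EmpName
HAVING  COUNT(*) > 1;
```

Against the sample data this returns a single row for MIKE PEARL with a count of 2. Unlike ROW_NUMBER(), it tells you which names are duplicated but not which individual row to remove.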
Remove Duplicate Rows using CTE
Microsoft introduced Common Table Expressions (CTEs) in SQL Server 2005.
I have read a beautiful article on Code Project about CTEs, and I would recommend reading it in full to get a better understanding of CTEs and their usage.
The CTE in our example query will hold the record set (rows) returned by the query that uses the ROW_NUMBER() function.
;WITH CTE AS
(
    SELECT EmpName,
           ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY EmpID) AS RowNumber
    FROM dbo.Employee
)
SELECT *
FROM CTE
WHERE RowNumber > 1;
Result:
This query returns only the row that I'll remove: the second MIKE PEARL record. Now let's remove it. Instead of SELECT, we will use the DELETE command to finish it off.
;WITH CTE AS
(
    SELECT EmpName,
           ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY EmpID) AS RowNumber
    FROM dbo.Employee
)
DELETE FROM CTE
WHERE RowNumber > 1;
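On a real table you may prefer to rehearse the delete before making it permanent. The sketch below is one cautious way to do that, not a required part of the method: it wraps the same DELETE in a transaction, reports how many rows were affected, and rolls back. The alias RowsDeleted is my own.

```sql
-- Rehearse the delete inside a transaction so it can be undone.
BEGIN TRANSACTION;

WITH CTE AS
(
    SELECT EmpName,
           ROW_NUMBER() OVER (PARTITION BY EmpName ORDER BY EmpID) AS RowNumber
    FROM   dbo.Employee
)
DELETE FROM CTE
WHERE  RowNumber > 1;

-- Must be read immediately after the DELETE.
SELECT @@ROWCOUNT AS RowsDeleted;

-- Inspect what would remain in the table.
SELECT EmpID, EmpName, Designation
FROM   dbo.Employee
ORDER BY EmpID;

-- Undo the rehearsal. Replace with COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;
```

Because the CTE must follow a terminated statement, the semicolon after BEGIN TRANSACTION plays the same role as the leading semicolon in ;WITH.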
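Because the CTE is built on a single table, the DELETE removes the matching rows from dbo.Employee itself. The whole cleanup runs as one set-based statement, so the same query works unchanged on tables with thousands of rows. To test this, re-run the INSERT statement repeatedly until the table contains thousands of rows, then run the DELETE again.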
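One way to build such a test table is a simple loop. This is a rough sketch: the iteration count of 1000 and the variable name @i are arbitrary choices, and each pass adds the same five sample rows.

```sql
-- Bulk up dbo.Employee with repeated copies of the five sample rows.
SET NOCOUNT ON;

DECLARE @i INT = 1;

WHILE @i <= 1000   -- arbitrary count: 1,000 passes add 5,000 rows
BEGIN
    INSERT INTO dbo.Employee (EmpID, EmpName, Designation)
    VALUES (1, 'GORDEN', 'LAB ASSISTANT'),
           (2, 'MIKE PEARL', 'SENIOR ACCOUNTANT'),
           (3, 'ARUN', 'SR. PROGRAMMER'),
           (4, 'DAVID', 'ACCOUNTANT'),
           (5, 'MIKE PEARL', 'SENIOR ACCOUNTANT');

    SET @i += 1;
END;

-- Compare the total row count with the number of distinct names.
SELECT COUNT(*)                AS TotalRows,
       COUNT(DISTINCT EmpName) AS DistinctNames
FROM   dbo.Employee;
```

After running the CTE-based DELETE against this data, the two counts should match, because the query keeps exactly one row per EmpName.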
In summary, finding and removing duplicate rows in a table comes down to three main steps.
• Identifying Duplicate Rows: Using the ROW_NUMBER() function to number the rows within each group of duplicates.
• Viewing Duplicates: Wrapping that query in a Common Table Expression (CTE) and selecting the rows with a RowNumber greater than 1.
• Removing Duplicates: Running DELETE against the same CTE to remove those rows.