SQL
Efficiently Identify and Delete Duplicate Rows in SQL
Discover how to find and remove duplicate records from a SQL table, ensuring that only unique entries remain, using subqueries for precise control without Common Table Expressions.
-- Step 1: Identify duplicate rows
SELECT
    column1,
    column2,
    COUNT(*) AS duplicate_count
FROM
    your_table
GROUP BY
    column1,
    column2
HAVING
    COUNT(*) > 1;
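-- Optional review step (a sketch, not part of the original snippet):
-- list every row that belongs to a duplicate group, including its id,
-- so the affected rows can be inspected before anything is deleted.
-- Assumes the same placeholder names (your_table, id, column1, column2)
-- and a database with window-function support.
SELECT
    id,
    column1,
    column2
FROM (
    SELECT
        id,
        column1,
        column2,
        COUNT(*) OVER (PARTITION BY column1, column2) AS group_size
    FROM
        your_table
) AS counted
WHERE group_size > 1
ORDER BY
    column1,
    column2,
    id;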
-- Step 2: Delete duplicate rows, keeping the row with the smallest id in each group
-- (assumes 'id' is a unique primary key and the database supports ROW_NUMBER())
DELETE FROM your_table
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM
            your_table
    ) AS subquery
    WHERE rn > 1
);
-- Alternative for databases that do not support ROW_NUMBER() (e.g. MySQL before 8.0),
-- written in MySQL's multi-table DELETE syntax:
-- DELETE t1 FROM your_table t1
-- INNER JOIN your_table t2
--     ON t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.id > t2.id;
-- This approach keeps the row with the smallest id among duplicates.
-- Note: '=' never matches NULL, so rows with NULL in column1 or column2 are not deleted by this join.
How it works: This snippet finds and removes duplicate rows without Common Table Expressions. The first query groups rows by `column1` and `column2` and uses `HAVING COUNT(*) > 1` to return only the value combinations that appear more than once. For deletion, a subquery assigns a `ROW_NUMBER()` to each row, partitioned by the identifying columns and ordered by `id`, so the row with the smallest `id` in each group gets `rn = 1`. Every row with `rn > 1` is a surplus copy, and its `id` is passed to the `DELETE` statement. The commented-out `INNER JOIN` alternative is for databases that lack `ROW_NUMBER()` or restrict subqueries in `DELETE` operations. It deletes any row for which a matching row with a smaller `id` exists, so it also keeps the first occurrence in each duplicate set (based on `id`).
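As a worked illustration of the steps described above, the following sketch runs the same two queries against a small throwaway table. The table name `contacts`, its columns, and the sample rows are hypothetical and chosen only for this example. The statements assume a database with window-function support (for example PostgreSQL, SQLite 3.25+, or MySQL 8.0+).

```sql
-- Hypothetical sample table; names and data are illustrative only.
CREATE TABLE contacts (
    id         INTEGER PRIMARY KEY,
    first_name VARCHAR(50),
    email      VARCHAR(100)
);

INSERT INTO contacts (id, first_name, email) VALUES
    (1, 'Ana',  'ana@example.com'),
    (2, 'Ben',  'ben@example.com'),
    (3, 'Ana',  'ana@example.com'),   -- duplicate of id 1
    (4, 'Ana',  'ana@example.com'),   -- duplicate of id 1
    (5, 'Cleo', 'cleo@example.com');

-- Step 1: one result row per duplicated (first_name, email) combination.
SELECT first_name, email, COUNT(*) AS duplicate_count
FROM contacts
GROUP BY first_name, email
HAVING COUNT(*) > 1;
-- Returns a single row: Ana, ana@example.com, 3

-- Step 2: delete every copy except the one with the smallest id.
DELETE FROM contacts
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY first_name, email ORDER BY id) AS rn
        FROM contacts
    ) AS ranked
    WHERE rn > 1
);

-- Verify: the Step 1 query now returns no rows,
-- and the table holds ids 1, 2, and 5.
SELECT id, first_name, email FROM contacts ORDER BY id;
```

On a real table, one cautious option is to run the `DELETE` inside a transaction and check the affected-row count before committing.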