SQL
Deleting Duplicate Rows While Keeping One
Efficiently remove duplicate records from your database table, preserving one unique entry based on a chosen ordering, vital for data integrity.
-- For PostgreSQL, SQL Server, Oracle (using CTE and ROW_NUMBER)
WITH DuplicateCTE AS (
SELECT
id,
column1,
column2,
ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id) as rn
FROM
your_table
)
DELETE FROM your_table
WHERE id IN (
SELECT id
FROM DuplicateCTE
WHERE rn > 1
);
-- For MySQL (using a self-join)
-- This method might be slower on very large tables than the CTE approach
DELETE t1 FROM your_table t1
INNER JOIN your_table t2
WHERE
t1.id > t2.id AND
t1.column1 = t2.column1 AND
t1.column2 = t2.column2;
How it works: This snippet provides two common methods to delete duplicate rows while ensuring at least one unique record remains. The first method, suitable for PostgreSQL, SQL Server, and Oracle, uses a Common Table Expression (CTE) with the `ROW_NUMBER()` window function. It partitions data by the columns that define uniqueness (`column1`, `column2`) and assigns a row number. Records with `rn > 1` are duplicates and are then deleted. The second method, tailored for MySQL, uses a self-join to identify and delete duplicates where one record's `id` is greater than another's, effectively keeping the one with the smaller `id`.