SQL
Identify and Delete Duplicate Rows While Keeping One
Learn to find duplicate records in a table based on specific columns and then delete all but one occurrence, maintaining data integrity.
-- Step 1: Identify duplicate rows
SELECT
    column1, column2, COUNT(*)
FROM
    your_table
GROUP BY
    column1, column2
HAVING
    COUNT(*) > 1;

-- Step 2: Delete duplicate rows, keeping the one with the minimum ID
DELETE FROM
    your_table
WHERE
    id IN (
        SELECT
            id
        FROM
            (
                SELECT
                    id,
                    ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
                FROM
                    your_table
            ) AS subquery
        WHERE
            rn > 1
    );
-- Replace 'id' with your primary key column and 'column1, column2' with the columns defining uniqueness.
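To see both steps working end to end, here is a minimal sketch that runs the two queries above against an in-memory database through Python's built-in `sqlite3` module. The sample rows are made up, the table and column names simply reuse the placeholders `your_table`, `column1`, `column2` and `id`, and an `ORDER BY` has been added to the Step 1 query only so its output is deterministic. It assumes the SQLite library linked into your Python is version 3.25 or newer, since older versions lack window functions.

```python
import sqlite3

# Hypothetical sample table using the placeholder names from the snippet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
# ids 1..6 are assigned automatically in insertion order.
conn.executemany(
    "INSERT INTO your_table (column1, column2) VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "y"), ("c", "z")],
)

# Step 1: combinations that occur more than once.
# ORDER BY is an addition so the result order is stable.
find_dupes = """
    SELECT column1, column2, COUNT(*)
    FROM your_table
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
    ORDER BY column1, column2
"""
dupes_before = conn.execute(find_dupes).fetchall()

# Step 2: delete every row except the lowest-id row in each group.
conn.execute("""
    DELETE FROM your_table
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
            FROM your_table
        ) AS subquery
        WHERE rn > 1
    )
""")
conn.commit()

dupes_after = conn.execute(find_dupes).fetchall()
remaining = conn.execute(
    "SELECT id, column1, column2 FROM your_table ORDER BY id"
).fetchall()

print(dupes_before)  # [('a', 'x', 3), ('b', 'y', 2)]
print(dupes_after)   # []
print(remaining)     # [(1, 'a', 'x'), (3, 'b', 'y'), (6, 'c', 'z')]
```

Running the Step 1 query again after the delete and getting no rows back is a quick way to confirm the cleanup worked.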
How it works: This snippet handles duplicate rows in two steps. The first query identifies which combinations of `column1` and `column2` (the columns that define uniqueness for your data) appear more than once. The second, more important step uses a subquery with the `ROW_NUMBER()` window function, which numbers the rows within each partition (each distinct `column1`, `column2` pair) sequentially from 1, ordered by `id`. Because a window function cannot be referenced directly in a `WHERE` clause, the numbering is computed in an inner derived table and then filtered on `rn > 1` in the enclosing query. Deleting those rows removes every duplicate while keeping the row with the smallest `id` in each group, which is normally the earliest inserted record when `id` is an auto-incrementing primary key. Adjust `column1`, `column2`, and `id` to match your table's schema.
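To make the `rn` column concrete, the sketch below runs only the inner query and collects the number each row receives. It again uses Python's `sqlite3` with an in-memory database; the rows and their explicit `id` values are hypothetical, and SQLite 3.25 or newer is assumed for window-function support. The rows with `rn > 1` are exactly the ones Step 2 would delete.

```python
import sqlite3

# Hypothetical rows with explicit ids so the numbering is easy to follow.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (id, column1, column2) VALUES (?, ?, ?)",
    [(10, "a", "x"), (11, "b", "y"), (12, "a", "x"), (13, "a", "x"), (14, "b", "y")],
)

# The inner query from Step 2, plus an outer ORDER BY for readable output.
rows = conn.execute("""
    SELECT id, column1, column2,
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM your_table
    ORDER BY column1, column2, id
""").fetchall()

for row in rows:
    print(row)
# (10, 'a', 'x', 1)
# (12, 'a', 'x', 2)
# (13, 'a', 'x', 3)
# (11, 'b', 'y', 1)
# (14, 'b', 'y', 2)

# Ids that the DELETE in Step 2 would remove (rn > 1).
to_delete = [row[0] for row in rows if row[3] > 1]
print(to_delete)  # [12, 13, 14]
```

Changing `ORDER BY id` inside `OVER (...)` to `ORDER BY id DESC` would flip which row gets `rn = 1`, so the highest `id` in each group would be kept instead.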