SQL
Identify and Remove Duplicate Rows in SQL
Discover SQL techniques to efficiently find duplicate entries in your database table based on one or more columns and safely remove them, keeping only one unique record.
-- Identify duplicates (e.g., based on email and name)
SELECT
email,
first_name,
COUNT(*) AS duplicate_count
FROM
users
GROUP BY
email, first_name
HAVING
COUNT(*) > 1;
-- Delete all but one duplicate (PostgreSQL, SQL Server, Oracle)
DELETE FROM users
WHERE user_id IN (
SELECT user_id
FROM (
SELECT
user_id,
ROW_NUMBER() OVER (PARTITION BY email, first_name ORDER BY user_id) as rn
FROM
users
) AS subquery
WHERE rn > 1
);
-- Delete all but one duplicate (MySQL using JOIN)
-- This assumes a unique ID column like 'user_id'
DELETE t1 FROM users t1
INNER JOIN users t2
WHERE
t1.user_id < t2.user_id AND
t1.email = t2.email AND
t1.first_name = t2.first_name;
How it works: This snippet provides methods to handle duplicate records. First, it shows how to identify duplicates using `GROUP BY` and `HAVING COUNT(*) > 1`. Then, it demonstrates how to delete redundant rows, keeping only one unique entry. For databases supporting window functions (`ROW_NUMBER()`), a subquery identifies rows to delete based on a partition. For MySQL, a self-join comparison can be used to achieve the same result.