Identify and Remove Duplicate Records in SQL
Learn essential SQL techniques to find duplicate rows based on specific columns and safely delete the redundant copies while keeping one record from each group.
-- 1. Find duplicate records (here, rows that share the same email and name)
SELECT email, name, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email, name
HAVING COUNT(*) > 1;
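-- 2. Delete duplicate records, keeping the row with the lowest id in each group
DELETE FROM customers
WHERE id NOT IN (
    -- The derived table (temp) lets MySQL materialize the list of ids to keep
    -- before the DELETE modifies customers
    SELECT MIN_ID FROM (
        SELECT MIN(id) AS MIN_ID
        FROM customers
        GROUP BY email, name
    ) AS temp
);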
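To try both queries before running them on real data, the script below applies them to a throwaway table. It is a sketch written for SQLite; the table definition and sample rows are hypothetical, chosen so that one (email, name) combination appears three times and a second customer shares an email but not a name. The derived table is aliased keep_ids here instead of temp; the logic is the same as in the snippet.

```sql
-- Hypothetical schema and sample data, assumed for illustration only.
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    name  TEXT NOT NULL
);

INSERT INTO customers (id, email, name) VALUES
    (1, 'ana@example.com', 'Ana'),
    (2, 'ben@example.com', 'Ben'),
    (3, 'ana@example.com', 'Ana'),       -- duplicate of id 1
    (4, 'ana@example.com', 'Ana'),       -- duplicate of id 1
    (5, 'ben@example.com', 'Benjamin');  -- same email as id 2, different name

-- Step 1: report duplicated (email, name) groups.
-- Returns one row: ana@example.com | Ana | 3
SELECT email, name, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email, name
HAVING COUNT(*) > 1;

-- Step 2: keep the lowest id in each group and delete the rest (ids 3 and 4).
DELETE FROM customers
WHERE id NOT IN (
    SELECT min_id FROM (
        SELECT MIN(id) AS min_id
        FROM customers
        GROUP BY email, name
    ) AS keep_ids
);

-- Remaining rows: ids 1, 2 and 5.
SELECT id, email, name FROM customers ORDER BY id;
```

Row 5 survives even though its email matches row 2, because the grouping uses both columns; add or remove columns in GROUP BY to change what counts as a duplicate.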
How it works: The snippet handles duplicates in two steps. The first query groups rows on the columns that define a duplicate (here, email and name) and uses HAVING to keep only groups with more than one row, so it reports each duplicated combination and how many copies exist. The second query deletes every row whose id is not the smallest id in its group, leaving exactly one row per combination. If id is an auto-incrementing key, MIN(id) keeps the oldest entry; use MAX(id) instead to keep the newest. The extra derived table (temp) is needed in MySQL, which rejects a DELETE whose subquery selects directly from the table being modified (error 1093); wrapping the subquery in a derived table makes MySQL materialize it first. Databases such as PostgreSQL and SQLite accept the subquery without the wrapper.
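The sketch below combines the two variations described in that explanation: it keeps the newest row with MAX(id) and uses the direct subquery without the derived-table wrapper. It assumes SQLite (PostgreSQL accepts the same statements) and a hypothetical customers table whose ids increase with insertion order; in MySQL, keep the wrapper from the original snippet.

```sql
-- Hypothetical sample data; ids are assumed to grow with insertion time.
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    name  TEXT NOT NULL
);

INSERT INTO customers (id, email, name) VALUES
    (1, 'ana@example.com', 'Ana'),
    (2, 'ana@example.com', 'Ana'),
    (3, 'ana@example.com', 'Ana'),
    (4, 'ben@example.com', 'Ben');

-- Keep the highest (newest) id per (email, name) group.
-- This direct form runs in SQLite and PostgreSQL; MySQL rejects it
-- with error 1093 and needs the derived-table wrapper instead.
DELETE FROM customers
WHERE id NOT IN (
    SELECT MAX(id)
    FROM customers
    GROUP BY email, name
);

-- Remaining rows: ids 3 and 4.
SELECT id, email, name FROM customers ORDER BY id;
```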