SQL

Identify and Remove Duplicate Rows in SQL

Discover SQL techniques to efficiently find duplicate entries in your database table based on one or more columns and safely remove them, keeping only one unique record.

-- Identify duplicates (e.g., based on email and name)
SELECT
    email,
    first_name,
    COUNT(*) AS duplicate_count
FROM
    users
GROUP BY
    email, first_name
HAVING
    COUNT(*) > 1;

-- Delete all but one duplicate (PostgreSQL, SQL Server, Oracle)
DELETE FROM users
WHERE user_id IN (
    SELECT user_id
    FROM (
        SELECT
            user_id,
            ROW_NUMBER() OVER (PARTITION BY email, first_name ORDER BY user_id) as rn
        FROM
            users
    ) AS subquery
    WHERE rn > 1
);

-- Delete all but one duplicate (MySQL using JOIN)
-- This assumes a unique ID column like 'user_id'
DELETE t1 FROM users t1
INNER JOIN users t2
WHERE
    t1.user_id < t2.user_id AND
    t1.email = t2.email AND
    t1.first_name = t2.first_name;

How it works: This snippet provides methods to handle duplicate records. First, it shows how to identify duplicates using `GROUP BY` and `HAVING COUNT(*) > 1`. Then, it demonstrates how to delete redundant rows, keeping only one unique entry. For databases supporting window functions (`ROW_NUMBER()`), a subquery identifies rows to delete based on a partition. For MySQL, a self-join comparison can be used to achieve the same result.

Identify and Remove Duplicate Rows in SQL

Related SQL Snippets

Query Hierarchical Data with Recursive CTE

Query and Manipulate JSON Data in SQL

Rank Query Results Using SQL Window Functions

Need help integrating this into your project?