SQL

Deleting Duplicate Rows While Keeping One

Efficiently remove duplicate records from your database table, preserving one unique entry based on a chosen ordering, vital for data integrity.

-- For PostgreSQL, SQL Server, Oracle (using CTE and ROW_NUMBER)
WITH DuplicateCTE AS (
    SELECT
        id,
        column1,
        column2,
        ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id) as rn
    FROM
        your_table
)
DELETE FROM your_table
WHERE id IN (
    SELECT id
    FROM DuplicateCTE
    WHERE rn > 1
);

-- For MySQL (using a self-join)
-- This method might be slower on very large tables than the CTE approach
DELETE t1 FROM your_table t1
INNER JOIN your_table t2
WHERE
    t1.id > t2.id AND
    t1.column1 = t2.column1 AND
    t1.column2 = t2.column2;
How it works: This snippet provides two common methods to delete duplicate rows while ensuring at least one unique record remains. The first method, suitable for PostgreSQL, SQL Server, and Oracle, uses a Common Table Expression (CTE) with the `ROW_NUMBER()` window function. It partitions data by the columns that define uniqueness (`column1`, `column2`) and assigns a row number. Records with `rn > 1` are duplicates and are then deleted. The second method, tailored for MySQL, uses a self-join to identify and delete duplicates where one record's `id` is greater than another's, effectively keeping the one with the smaller `id`.

Need help integrating this into your project?

Our team of expert developers can help you build your custom application from scratch.

Hire DigitalCodeLabs