SQL

Deleting Duplicate Rows While Keeping One

Efficiently remove duplicate records from your database table, preserving one unique entry based on a chosen ordering, vital for data integrity.

-- For PostgreSQL, SQL Server, Oracle (using CTE and ROW_NUMBER)
WITH DuplicateCTE AS (
    SELECT
        id,
        column1,
        column2,
        ROW_NUMBER() OVER(PARTITION BY column1, column2 ORDER BY id) as rn
    FROM
        your_table
)
DELETE FROM your_table
WHERE id IN (
    SELECT id
    FROM DuplicateCTE
    WHERE rn > 1
);

-- For MySQL (using a self-join)
-- This method might be slower on very large tables than the CTE approach
DELETE t1 FROM your_table t1
INNER JOIN your_table t2
WHERE
    t1.id > t2.id AND
    t1.column1 = t2.column1 AND
    t1.column2 = t2.column2;

How it works: This snippet provides two common methods to delete duplicate rows while ensuring at least one unique record remains. The first method, suitable for PostgreSQL, SQL Server, and Oracle, uses a Common Table Expression (CTE) with the `ROW_NUMBER()` window function. It partitions data by the columns that define uniqueness (`column1`, `column2`) and assigns a row number. Records with `rn > 1` are duplicates and are then deleted. The second method, tailored for MySQL, uses a self-join to identify and delete duplicates where one record's `id` is greater than another's, effectively keeping the one with the smaller `id`.

Deleting Duplicate Rows While Keeping One

Related SQL Snippets

Implementing SQL Pagination with OFFSET and LIMIT

Performing Upsert (Insert or Update) Operations in SQL

Calculating Running Totals and Cumulative Sums in SQL

Need help integrating this into your project?