Efficiently Identify and Delete Duplicate Rows in SQL

Find and remove duplicate records from a SQL table so that exactly one row per duplicate group remains, using a subquery with `ROW_NUMBER()` rather than a Common Table Expression.

-- Step 1: Identify duplicate rows
SELECT
    column1,
    column2,
    COUNT(*) AS duplicate_count
FROM
    your_table
GROUP BY
    column1,
    column2
HAVING
    COUNT(*) > 1;

-- Step 2: Delete duplicate rows, keeping one (assuming 'id' is a unique primary key)
DELETE FROM your_table
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM
            your_table
    ) AS subquery
    WHERE rn > 1
);

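-- Alternative for databases without ROW_NUMBER() (e.g. MySQL before 8.0).
-- This is MySQL-style multi-table DELETE syntax and is not portable to every database:
-- DELETE t1 FROM your_table t1
-- INNER JOIN your_table t2 ON t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.id > t2.id;
-- This approach keeps the row with the smallest ID among duplicates.
-- Note: '=' never matches NULL, so duplicates with NULL in column1 or column2 are not removed by this join.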
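
The join-based alternative above is written in MySQL's multi-table `DELETE` syntax, which PostgreSQL does not accept. A rough PostgreSQL equivalent, assuming the same placeholder names (`your_table`, `column1`, `column2`, `id`), might use `DELETE ... USING`:

```sql
-- Sketch for PostgreSQL: self-join via USING; keeps the smallest id per duplicate group.
-- As with the MySQL form, '=' does not match NULLs, so groups containing NULLs are left untouched.
DELETE FROM your_table AS t1
USING your_table AS t2
WHERE t1.column1 = t2.column1
  AND t1.column2 = t2.column2
  AND t1.id > t2.id;
```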

How it works: This snippet handles duplicate rows without Common Table Expressions. The first query finds duplicates by grouping on `column1` and `column2` and keeping only the groups where `COUNT(*) > 1`. The deletion uses a subquery that assigns a `ROW_NUMBER()` within each `(column1, column2)` partition, ordered by `id`, so the row with the smallest `id` in each group gets `rn = 1`. Because a window function result cannot be filtered in the `WHERE` clause of the query that computes it, the numbering is wrapped in a derived table, and the outer `DELETE` removes every `id` whose `rn > 1`. The commented-out `INNER JOIN` alternative, written in MySQL's multi-table `DELETE` syntax, targets databases without `ROW_NUMBER()` and likewise keeps the row with the smallest `id` in each duplicate set.
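
To make the behavior concrete, here is a small worked example. The table definition and sample rows are hypothetical and only illustrate the technique. It assumes a database with window functions and multi-row `VALUES` (for example PostgreSQL, MySQL 8.0+, or SQLite 3.25+).

```sql
-- Hypothetical sample table; names and data are illustrative only.
CREATE TABLE your_table (
    id      INTEGER PRIMARY KEY,
    column1 VARCHAR(50),
    column2 VARCHAR(50)
);

INSERT INTO your_table (id, column1, column2) VALUES
    (1, 'alice', 'paris'),
    (2, 'bob',   'berlin'),
    (3, 'alice', 'paris'),
    (4, 'alice', 'paris'),
    (5, 'bob',   'berlin'),
    (6, 'carol', 'rome');

-- Preview the rows that the DELETE would remove, before running it.
SELECT id, column1, column2
FROM (
    SELECT
        id,
        column1,
        column2,
        ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM your_table
) AS ranked
WHERE rn > 1
ORDER BY id;
-- Returns ids 3, 4, and 5.

-- Delete the duplicates, keeping the smallest id in each group.
DELETE FROM your_table
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM your_table
    ) AS ranked
    WHERE rn > 1
);

-- Verify: one row per (column1, column2) pair remains.
SELECT id, column1, column2
FROM your_table
ORDER BY id;
-- Returns ids 1, 2, and 6.
```

Running the preview `SELECT` first is a cheap way to confirm the partition columns are correct before any rows are removed. On a production table, wrapping the `DELETE` in a transaction gives the same safety net.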
