SQL

Identify and Remove Duplicate Rows While Preserving One

Find and delete duplicate records from a table with SQL, keeping exactly one row per group of duplicates (here, the row with the minimum ID).

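-- MySQL / MariaDB (SQL Server also accepts DELETE ... FROM ... JOIN):
-- delete every row whose id is greater than the smallest id in its duplicate group.
DELETE T1 FROM your_table T1
JOIN (
    SELECT column1, column2, MIN(id) AS min_id
    FROM your_table
    GROUP BY column1, column2
    HAVING COUNT(*) > 1
) AS T2
ON T1.column1 = T2.column1
AND T1.column2 = T2.column2
AND T1.id > T2.min_id;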

-- For PostgreSQL, SQL Server, or SQLite, an equivalent IN-subquery form:
-- DELETE FROM your_table
-- WHERE id IN (
--     SELECT t_inner.id
--     FROM your_table t_inner
--     JOIN (
--         SELECT column1, column2, MIN(id) as min_id
--         FROM your_table
--         GROUP BY column1, column2
--         HAVING COUNT(*) > 1
--     ) AS t_min
--     ON t_inner.column1 = t_min.column1
--     AND t_inner.column2 = t_min.column2
--     AND t_inner.id > t_min.min_id
-- );
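How it works: This snippet removes duplicate rows, treating rows as duplicates when they share the same combination of `column1` and `column2`, and keeps the row with the smallest `id` in each group. A derived table (the subquery aliased `T2`) groups the rows by `(column1, column2)`, keeps only groups with more than one row, and computes `MIN(id)` for each. The `DELETE` then joins the table back to that result and removes every row whose `id` is greater than its group's `min_id`, so exactly one row per group survives; if `id` is an auto-incrementing key, that survivor is the earliest inserted row. Because `=` never matches NULL, rows with a NULL in `column1` or `column2` are not treated as duplicates and are left in place. The approach avoids window functions such as `ROW_NUMBER()`, which older engines like MySQL before 8.0 do not support, although the `DELETE ... JOIN` syntax of the main statement is itself specific to MySQL and SQL Server, which is why the commented alternative is included.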
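To see the effect on concrete data, here is a small self-contained script. It is a sketch assuming SQLite or PostgreSQL, and it uses the portable `IN`-subquery form from the commented block above. The sample rows and column types are invented for illustration only.

```sql
-- Hypothetical sample table; column types are assumptions for this demo.
CREATE TABLE your_table (
    id      INTEGER PRIMARY KEY,
    column1 TEXT,
    column2 TEXT
);

-- Hypothetical sample rows containing two duplicate groups.
INSERT INTO your_table (id, column1, column2) VALUES
    (1, 'a', 'x'),
    (2, 'a', 'x'),  -- duplicate of id 1
    (3, 'b', 'y'),
    (4, 'a', 'x'),  -- duplicate of id 1
    (5, 'b', 'y'),  -- duplicate of id 3
    (6, 'c', 'z');  -- no duplicates, never touched

-- Delete every row whose id is above its group's MIN(id).
DELETE FROM your_table
WHERE id IN (
    SELECT t_inner.id
    FROM your_table t_inner
    JOIN (
        SELECT column1, column2, MIN(id) AS min_id
        FROM your_table
        GROUP BY column1, column2
        HAVING COUNT(*) > 1
    ) AS t_min
      ON t_inner.column1 = t_min.column1
     AND t_inner.column2 = t_min.column2
     AND t_inner.id > t_min.min_id
);

-- Remaining rows: ids 1, 3 and 6, one per (column1, column2) pair.
SELECT id, column1, column2 FROM your_table ORDER BY id;
```

The groups `('a', 'x')` and `('b', 'y')` each keep only their lowest `id`, while `('c', 'z')`, which never had a duplicate, is excluded by `HAVING COUNT(*) > 1` and left untouched.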
