SQL

Identifying and Deleting Duplicate Rows in SQL

Find and remove duplicate records from a database table, where rows count as duplicates when they match on one or more chosen columns, keeping exactly one row from each duplicate group.

-- Step 1: Identify duplicate rows (optional, for verification)
SELECT
    column1,
    column2,
    COUNT(*)
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;
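
-- Step 1b (optional): preview the exact rows that Step 2 will delete.
-- A minimal sketch: it runs the same ROW_NUMBER() subquery as Step 2, but as a
-- plain SELECT, so nothing is modified. On the example data at the bottom of
-- this snippet (ids 1-6), it lists ids 4 and 6.
SELECT id, column1, column2, rn
FROM (
    SELECT
        id,
        column1,
        column2,
        ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM your_table
) AS T
WHERE rn > 1
ORDER BY column1, column2, id;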

-- Step 2: Delete duplicate rows, keeping one instance
DELETE FROM your_table
WHERE id IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM your_table
    ) AS T
    WHERE rn > 1
);

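-- Notes:
-- 'id' must uniquely identify each row (e.g., the primary key).
-- Replace 'column1', 'column2' with the columns that define a duplicate set.
-- The ORDER BY inside ROW_NUMBER() decides which row in each group is kept:
--   ORDER BY id keeps the lowest id; ORDER BY id DESC keeps the highest.
-- ROW_NUMBER() needs window-function support (e.g., MySQL 8.0+, PostgreSQL,
--   SQL Server, SQLite 3.25+).
-- GROUP BY and PARTITION BY treat NULLs as equal, so rows with NULLs in the
--   same key columns are counted as duplicates of each other.
-- The derived table (AS T) is needed because the window alias 'rn' cannot be
--   filtered in the WHERE clause of the query that defines it; in MySQL it also
--   avoids the error raised when a DELETE's subquery reads the target table directly.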
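
-- Variant (read-only preview): the ORDER BY inside ROW_NUMBER() can use any
-- column, not just id. This sketch assumes a numeric column like the example
-- table's 'value'; it keeps the row with the largest value in each group and
-- breaks ties by the lowest id so the choice is deterministic. It only lists
-- the rows that would be deleted; put the same ORDER BY into Step 2 to delete them.
SELECT id, column1, column2, value
FROM (
    SELECT
        id,
        column1,
        column2,
        value,
        ROW_NUMBER() OVER (
            PARTITION BY column1, column2
            ORDER BY value DESC, id
        ) AS rn
    FROM your_table
) AS T
WHERE rn > 1;
-- On the example data, before Step 2 runs, this lists ids 1 and 3.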

-- Example 'your_table' structure (MySQL syntax):
-- CREATE TABLE your_table (
--     id INT PRIMARY KEY AUTO_INCREMENT,
--     column1 VARCHAR(50),
--     column2 VARCHAR(50),
--     value INT
-- );
-- INSERT INTO your_table (column1, column2, value) VALUES
-- ('A', 'X', 10), ('A', 'Y', 20), ('B', 'X', 30),
-- ('A', 'X', 15), -- Duplicate of the first row (same column1, column2)
-- ('C', 'Z', 40),
-- ('B', 'X', 35); -- Duplicate of the third row (same column1, column2)

How it works: Step 1 groups rows by the columns that define a duplicate and lists every combination that appears more than once, along with its count. Step 2 uses the `ROW_NUMBER()` window function to number the rows within each partition of those columns, starting at 1 in the order given by `ORDER BY id`. The first row of each partition gets 1; every row numbered above 1 is a duplicate and is deleted, so exactly one row remains for each distinct combination of the specified columns.
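
To see the mechanism end to end, the script below loads the same sample rows into a throwaway table, prints the row number each row receives, applies the Step 2 delete, and lists what survives. It is a sketch assuming a database with window-function support (for example SQLite 3.25+, MySQL 8.0+ or PostgreSQL); the table name `dup_demo` is hypothetical, and the ids are written out explicitly instead of relying on `AUTO_INCREMENT` so the same script runs on all three. Run it in a scratch database.

```sql
-- Throwaway demo table; 'dup_demo' is a hypothetical name.
-- Explicit ids keep the script portable (no AUTO_INCREMENT needed).
CREATE TABLE dup_demo (
    id      INTEGER PRIMARY KEY,
    column1 VARCHAR(50),
    column2 VARCHAR(50),
    value   INT
);

INSERT INTO dup_demo (id, column1, column2, value) VALUES
    (1, 'A', 'X', 10), (2, 'A', 'Y', 20), (3, 'B', 'X', 30),
    (4, 'A', 'X', 15), (5, 'C', 'Z', 40), (6, 'B', 'X', 35);

-- Show the number each row gets within its (column1, column2) partition.
SELECT
    id,
    column1,
    column2,
    ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
FROM dup_demo
ORDER BY column1, column2, id;
-- id | column1 | column2 | rn
--  1 | A       | X       |  1
--  4 | A       | X       |  2   <- deleted below
--  2 | A       | Y       |  1
--  3 | B       | X       |  1
--  6 | B       | X       |  2   <- deleted below
--  5 | C       | Z       |  1

-- Same DELETE as Step 2, pointed at the demo table.
DELETE FROM dup_demo
WHERE id IN (
    SELECT id FROM (
        SELECT id, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM dup_demo
    ) AS T
    WHERE rn > 1
);

-- One row per (column1, column2) remains: ids 1, 2, 3, 5.
SELECT id, column1, column2, value FROM dup_demo ORDER BY id;
```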
