SQL
Identify and Remove Duplicate Rows (Keeping One)
Learn how to find duplicate records based on specific columns and then delete all but one occurrence of each, keeping the data in your database consistent.
-- 1. Identify duplicates:
SELECT
    column1, column2, COUNT(*)
FROM
    your_table
GROUP BY
    column1, column2
HAVING
    COUNT(*) > 1;
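-- 2. Delete duplicates, keeping the row with the minimum id in each group
--    (works on MySQL, PostgreSQL, and SQL Server; MySQL rejects a subquery that
--    reads directly from the table being deleted from, so the ids to keep are
--    wrapped in a derived table):
DELETE FROM
    your_table
WHERE
    id NOT IN (
        SELECT
            id
        FROM (
            SELECT
                MIN(id) AS id
            FROM
                your_table
            GROUP BY
                column1, column2
        ) AS keep_ids
    );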
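How it works: This snippet handles duplicate records in two steps. The first query groups rows by `column1` and `column2` and returns every combination that occurs more than once, together with its count, so you can review the duplicates before changing anything. The second query deletes every row whose `id` is not the smallest `id` in its group, so exactly one row per combination (the one with the minimum `id`) is kept. This assumes `id` is a unique, non-NULL key such as the primary key; if the subquery ever returned NULL, `NOT IN` would match no rows and nothing would be deleted. Because the DELETE is destructive, back up the table or run it inside a transaction you can roll back. The technique is especially useful for maintaining uniqueness after data imports or during clean-up operations.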
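To see both steps end to end, here is a small self-contained script. It is a sketch under stated assumptions: the `customers` table, its `email` and `city` columns (standing in for `column1` and `column2`), and the sample rows are all hypothetical, invented for illustration. It should run as-is on SQLite and PostgreSQL, and because the DELETE uses the derived-table form, the same statement should also be accepted by MySQL.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email VARCHAR(100),
    city  VARCHAR(50)
);

INSERT INTO customers (id, email, city) VALUES
    (1, 'a@example.com', 'Oslo'),
    (2, 'b@example.com', 'Rome'),
    (3, 'a@example.com', 'Oslo'),  -- duplicate of id 1
    (4, 'a@example.com', 'Oslo'),  -- duplicate of id 1
    (5, 'b@example.com', 'Rome'),  -- duplicate of id 2
    (6, 'c@example.com', 'Lima');  -- unique

-- Step 1: returns (a@example.com, Oslo, 3) and (b@example.com, Rome, 2).
SELECT
    email, city, COUNT(*) AS copies
FROM
    customers
GROUP BY
    email, city
HAVING
    COUNT(*) > 1
ORDER BY
    email;

-- Step 2: keep the lowest id for each (email, city) pair; deletes ids 3, 4 and 5.
DELETE FROM
    customers
WHERE
    id NOT IN (
        SELECT
            id
        FROM (
            SELECT
                MIN(id) AS id
            FROM
                customers
            GROUP BY
                email, city
        ) AS keep_ids
    );

-- Verify: only ids 1, 2 and 6 remain.
SELECT id, email, city FROM customers ORDER BY id;
```

Running the step 1 query again after the DELETE should return no rows, which is a quick way to confirm the clean-up worked.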