SQL

Identify and Delete Duplicate Rows While Keeping One

Find duplicate records in a table based on specific columns, then delete all but one occurrence of each so that every combination of those column values appears exactly once.

-- Step 1: Identify duplicate rows
SELECT
  column1, column2, COUNT(*) AS duplicate_count
FROM
  your_table
GROUP BY
  column1, column2
HAVING
  COUNT(*) > 1;

-- Step 2: Delete duplicate rows, keeping the one with the minimum ID
DELETE FROM
  your_table
WHERE
  id IN (
    SELECT
      id
    FROM
      (
        SELECT
          id,
          ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
        FROM
          your_table
      ) AS subquery
    WHERE
      rn > 1
  );
-- Replace 'id' with your primary key column and 'column1, column2' with the columns defining uniqueness.
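
Before running the DELETE, it can help to preview the rows it would remove. The query below is a minimal sketch that reuses the same placeholder names (`your_table`, `column1`, `column2`, `id`); the aliases `t` and `ranked` are illustrative only. It changes no data.

```sql
-- Preview only: lists the rows the Step 2 DELETE would remove (rn > 1).
SELECT
  *
FROM
  (
    SELECT
      t.*,
      ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS rn
    FROM
      your_table AS t
  ) AS ranked
WHERE
  rn > 1
ORDER BY
  column1, column2, id;
```

If the result contains rows you want to keep, adjust the `PARTITION BY` columns or the `ORDER BY` before deleting anything.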

How it works: This snippet handles duplicate rows in two steps. The first query lists each combination of `column1` and `column2` (the columns that define uniqueness for your data) that appears more than once, along with how many times it appears. The second query does the cleanup. Its inner subquery uses the `ROW_NUMBER()` window function to number the rows within each partition (each distinct `column1`, `column2` combination) sequentially, starting at 1 and ordered by `id`. The row with the lowest `id` in each group gets `rn = 1`; every additional copy gets `rn > 1` and is deleted. If `id` is an auto-incrementing primary key, the surviving row is the earliest inserted record. The query requires a database that supports window functions (for example PostgreSQL, SQL Server, MySQL 8.0 or later, or SQLite 3.25 or later). Adjust `your_table`, `column1`, `column2`, and `id` to match your table's schema.
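
To make the behavior concrete, here is a small worked example. The `customers` table, its columns, and the sample rows are hypothetical and exist only for illustration; uniqueness is assumed to be defined by `email` and `city`.

```sql
-- Hypothetical table for illustration; these names are not part of the snippet above.
CREATE TABLE customers (
  id INTEGER PRIMARY KEY,
  email VARCHAR(255),
  city VARCHAR(100)
);

INSERT INTO customers (id, email, city) VALUES
  (1, 'ann@example.com', 'Oslo'),
  (2, 'bob@example.com', 'Rome'),
  (3, 'ann@example.com', 'Oslo'),  -- duplicate of id 1
  (4, 'ann@example.com', 'Oslo'),  -- duplicate of id 1
  (5, 'bob@example.com', 'Lima');  -- same email as id 2, different city: not a duplicate

-- Same pattern as Step 2, with the placeholders filled in.
DELETE FROM customers
WHERE id IN (
  SELECT id
  FROM (
    SELECT
      id,
      ROW_NUMBER() OVER (PARTITION BY email, city ORDER BY id) AS rn
    FROM customers
  ) AS ranked
  WHERE rn > 1
);

-- Rows 3 and 4 are removed; ids 1, 2, and 5 remain.
SELECT id, email, city FROM customers ORDER BY id;
```

Row 5 survives because the partition is the full (`email`, `city`) pair, so the choice of `PARTITION BY` columns decides what counts as a duplicate.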
