SQL

Identify and Remove Duplicate Rows from a SQL Table

Find duplicate records based on specific columns and safely remove them, keeping one row per duplicate group, using SQL window functions or self-joins.

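-- Step 1: Identify duplicates (every row returned here is removed in Step 2;
-- drop the outer WHERE to also see the kept rows, which have rn = 1)
SELECT id, email, username, created_at, rn
FROM (
  SELECT
    id, email, username, created_at,
    -- id (assumed unique) breaks ties between rows with identical created_at
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
  FROM
    users
) AS ranked
WHERE rn > 1;

-- Step 2: Delete duplicates (keeping the oldest entry by created_at)
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT
      id,
      -- Must use the same PARTITION BY / ORDER BY as Step 1
      ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
    FROM
      users
  ) AS subquery
  WHERE rn > 1
);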
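How it works: This snippet shows a common pattern for handling duplicate rows. The first query uses the `ROW_NUMBER()` window function to number the rows within each partition defined by the `email` column. Because each partition is ordered by `created_at`, the earliest entry for an email gets `rn = 1` and every later entry with the same email gets `rn > 1`. The second statement applies the same logic inside a `DELETE`, removing every `id` whose `rn` is greater than 1, so only the first (oldest) row remains for each email. Two caveats: if several rows share the same `created_at`, their order is arbitrary unless the `ORDER BY` also includes a unique column such as `id`, and `PARTITION BY` places all rows with a `NULL` email in one group, so those rows are treated as duplicates of each other. Run the first query and review its output before running the `DELETE`.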
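
The description above also mentions self-joins, which can help on engines without window function support. The following is a minimal sketch rather than a definitive implementation. It assumes the same `users` table, that `id` is unique, and that `id` is an acceptable tie-breaker when two rows share a `created_at`. The aliases `newer`, `older`, and `dupes` are illustrative names, not part of the original snippet.

```sql
-- Self-join variant: a row is a duplicate if an older row with the same email exists.
-- Preview the rows that would be removed.
SELECT DISTINCT newer.id, newer.email, newer.created_at
FROM users AS newer
JOIN users AS older
  ON older.email = newer.email
 AND (older.created_at < newer.created_at
      -- Assumed tie-breaker: on equal timestamps, the lower id counts as older
      OR (older.created_at = newer.created_at AND older.id < newer.id));

-- Delete those rows, keeping the oldest row per email.
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT newer.id
    FROM users AS newer
    JOIN users AS older
      ON older.email = newer.email
     AND (older.created_at < newer.created_at
          OR (older.created_at = newer.created_at AND older.id < newer.id))
  ) AS dupes
);
```

Unlike the `PARTITION BY` version, the `=` comparison never matches `NULL`, so rows with a `NULL` email are left untouched here. Support for subqueries that read from the table being deleted varies by database, so it is worth trying the statement inside a transaction on a copy of the data first.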