Identify and Remove Duplicate Rows from a SQL Table

Efficiently find duplicate records based on specific columns and safely remove them, keeping only one unique entry using SQL window functions or self-joins.

```sql
-- Step 1: Identify duplicates (rows with rn > 1 are the extra copies)
SELECT
  id, email, username, created_at,
  ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
FROM
  users
ORDER BY
  email, rn;
```
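To see only the extra copies instead of every row, wrap the window in a subquery and filter on `rn > 1`. A `GROUP BY ... HAVING COUNT(*) > 1` query gives a per-email summary. The sketch below is a minimal, self-contained example. The table definition and sample rows are hypothetical, chosen to match the columns the snippet uses. It assumes an engine with window-function support, such as PostgreSQL or SQLite 3.25+, and adds `id` as a tiebreaker in the window's `ORDER BY`.

```sql
-- Hypothetical sample data so the sketch runs on its own.
CREATE TABLE users (
  id         INTEGER PRIMARY KEY,
  email      TEXT,
  username   TEXT,
  created_at TIMESTAMP
);

INSERT INTO users (id, email, username, created_at) VALUES
  (1, 'ana@example.com', 'ana',  '2024-01-01 09:00:00'),
  (2, 'ana@example.com', 'ana2', '2024-02-01 09:00:00'),
  (3, 'bo@example.com',  'bo',   '2024-01-05 09:00:00'),
  (4, 'ana@example.com', 'ana3', '2024-03-01 09:00:00');

-- Summary: which emails are duplicated, and how many rows each one has.
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Preview: only the rows a DELETE filtered on rn > 1 would remove.
SELECT id, email, username, created_at
FROM (
  SELECT
    id, email, username, created_at,
    ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
  FROM users
) AS ranked
WHERE rn > 1
ORDER BY email, created_at;
```

With this sample data, the preview lists the two newer `ana@example.com` rows (ids 2 and 4) and deletes nothing.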

```sql
-- Step 2: Delete duplicates (keeping the oldest entry per email; id breaks ties)
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT
      id,
      ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
    FROM
      users
  ) AS subquery
  WHERE rn > 1
);
```
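A committed `DELETE` cannot be undone. One cautious way to run Step 2 is to wrap it in a transaction and confirm that no email still appears more than once before you commit. If the check returns rows, use `ROLLBACK` instead. The sketch below uses hypothetical sample data and targets PostgreSQL or SQLite 3.25+. The optional unique index at the end is an addition not in the original snippet. Its name, `users_email_unique`, is illustrative. Once it exists, the database rejects new rows that repeat an existing email.

```sql
-- Hypothetical sample data mirroring the columns used in the snippet.
CREATE TABLE users (
  id         INTEGER PRIMARY KEY,
  email      TEXT,
  username   TEXT,
  created_at TIMESTAMP
);

INSERT INTO users (id, email, username, created_at) VALUES
  (1, 'ana@example.com', 'ana',  '2024-01-01 09:00:00'),
  (2, 'ana@example.com', 'ana2', '2024-02-01 09:00:00'),
  (3, 'bo@example.com',  'bo',   '2024-01-05 09:00:00'),
  (4, 'ana@example.com', 'ana3', '2024-03-01 09:00:00');

BEGIN;

-- The Step 2 delete, with id as a tiebreaker for identical timestamps.
DELETE FROM users
WHERE id IN (
  SELECT id FROM (
    SELECT
      id,
      ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at, id) AS rn
    FROM users
  ) AS ranked
  WHERE rn > 1
);

-- Sanity check: this should return no rows.
-- In an interactive session, run ROLLBACK instead of COMMIT if it does.
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

COMMIT;

-- Optional: reject future duplicate emails (index name is illustrative).
CREATE UNIQUE INDEX users_email_unique ON users (email);
```

Here the delete removes ids 2 and 4, which leaves one row per email.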

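How it works: This snippet follows a common pattern for handling duplicate rows. The first query uses the `ROW_NUMBER()` window function to number the rows within each partition defined by the `email` column. Because of `ORDER BY created_at`, the earliest row for each email gets `rn = 1`, and later copies get 2, 3, and so on. The second query reuses the same window inside a `DELETE` statement. It removes every `id` whose row number is greater than 1, leaving only the first (oldest) row for each email.

The pattern assumes that `id` uniquely identifies a row. If two rows for the same email share the same `created_at`, `ORDER BY created_at` alone leaves their order undefined. Adding a unique column such as `id` as a secondary sort key makes the surviving row deterministic. `PARTITION BY` also places all rows with a `NULL` email in a single partition, so the `DELETE` keeps only one of them. If those rows should be left alone, add `WHERE email IS NOT NULL` to the innermost query. Before running the `DELETE`, run the first query to review exactly which rows will be removed.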
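The same "keep the oldest row per email" rule can also be written with a self-reference instead of a window function, using a correlated `EXISTS`. A row is deleted when another row with the same email is older, or equally old with a smaller `id`. The sketch below targets PostgreSQL and SQLite, and its sample rows are hypothetical. Unlike the `ROW_NUMBER()` version, it never matches rows with a `NULL` email, because `NULL = NULL` does not evaluate to true, so those rows are kept.

```sql
-- Hypothetical sample data, including a timestamp tie and two NULL emails.
CREATE TABLE users (
  id         INTEGER PRIMARY KEY,
  email      TEXT,
  username   TEXT,
  created_at TIMESTAMP
);

INSERT INTO users (id, email, username, created_at) VALUES
  (1, 'ana@example.com', 'ana',  '2024-01-01 09:00:00'),
  (2, 'ana@example.com', 'ana2', '2024-02-01 09:00:00'),
  (3, 'ana@example.com', 'ana3', '2024-01-01 09:00:00'),  -- same time as id 1
  (4, NULL,              'x',    '2024-01-02 09:00:00'),
  (5, NULL,              'y',    '2024-01-03 09:00:00');

-- Delete a row if an "older" row with the same email exists:
-- older = earlier created_at, or the same created_at with a smaller id.
DELETE FROM users
WHERE EXISTS (
  SELECT 1
  FROM users AS older
  WHERE older.email = users.email
    AND (older.created_at < users.created_at
         OR (older.created_at = users.created_at AND older.id < users.id))
);
```

With this data, ids 2 and 3 are deleted. Id 1 survives as the oldest `ana@example.com` row, and both `NULL`-email rows remain. MySQL does not allow a subquery to select from the table being deleted from, so there the join-based equivalent is usually written as a multi-table delete (`DELETE u FROM users u JOIN users older ON ...`). PostgreSQL also supports a join form through `DELETE ... USING`.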
