SQL
Identifying Duplicate Records Using Window Functions
Detect duplicate rows in a table, based on one or more columns, using a SQL window function such as ROW_NUMBER(). Useful for data cleaning and integrity checks.
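-- Number the rows within each group of identical email values, oldest first.
WITH RankedItems AS (
    SELECT
        id,
        email,
        username,
        ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM
        users
)
-- Keep only the second and later rows of each group: the duplicates.
SELECT
    id,
    email,
    username
FROM
    RankedItems
WHERE
    rn > 1;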
How it works: This query finds duplicate rows in the `users` table based on the `email` column. `ROW_NUMBER() OVER(PARTITION BY email ORDER BY created_at)` numbers the rows within each group of identical `email` values, starting at 1 and following `created_at` order. The earliest row in each group gets `rn = 1` and is treated as the original; every row with `rn > 1` is a later duplicate of it. If two rows in a group share the same `created_at`, which of them gets `rn = 1` is not guaranteed, so add a unique column such as `id` to the `ORDER BY` when that matters.
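The description mentions matching on more than one column, while the query above partitions on `email` alone. The sketch below shows one way the multi-column case might look. The table definition and sample rows are hypothetical and exist only to make the example self-contained. The extra `id` in the `ORDER BY` is an added tie-breaker that is not part of the original query. The syntax assumes a PostgreSQL- or SQLite-style dialect, so column types and the multi-row `INSERT` may need adjusting elsewhere.

```sql
-- Hypothetical sample table; column names follow the query above.
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    email      VARCHAR(255),
    username   VARCHAR(100),
    created_at TIMESTAMP
);

-- Hypothetical sample rows.
INSERT INTO users (id, email, username, created_at) VALUES
    (1, 'ann@example.com', 'ann',  '2024-01-01 09:00:00'),
    (2, 'bob@example.com', 'bob',  '2024-01-02 09:00:00'),
    (3, 'ann@example.com', 'ann',  '2024-01-03 09:00:00'),
    (4, 'ann@example.com', 'anna', '2024-01-04 09:00:00');

-- A row counts as a duplicate only if BOTH email and username match.
WITH RankedItems AS (
    SELECT
        id,
        email,
        username,
        ROW_NUMBER() OVER (
            PARTITION BY email, username
            ORDER BY created_at, id   -- id breaks ties on created_at (assumption)
        ) AS rn
    FROM
        users
)
SELECT
    id,
    email,
    username
FROM
    RankedItems
WHERE
    rn > 1;
-- With the sample rows above, only id 3 is returned.
```

Partitioning on `email` alone would flag ids 3 and 4 here, because all three `ann@example.com` rows would fall into one group. Adding `username` to the `PARTITION BY` narrows the definition of a duplicate, so id 4 (`anna`) forms its own group and is no longer reported.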