PYTHON

Deduplicate List of Dictionaries by a Specific Key

Efficiently remove duplicate dictionaries from a list based on the value of a specified key, keeping the first dictionary encountered for each value.

records = [
    {"id": 101, "name": "Apple", "price": 1.0},
    {"id": 102, "name": "Banana", "price": 0.5},
    {"id": 101, "name": "Apple", "price": 1.2}, # Duplicate ID, different price
    {"id": 103, "name": "Cherry", "price": 2.5},
    {"id": 102, "name": "Banana", "price": 0.6}, # Duplicate ID, different price
    {"id": 104, "name": "Date", "price": 3.0}
]

def deduplicate_list_of_dicts(data, key_to_deduplicate_by):
    """
    Deduplicates a list of dictionaries based on the value of a specified key.
    Retains the first occurrence of each key value; dictionaries that lack the
    key (or map it to None) are omitted from the result.

    Args:
        data (list): A list of dictionaries.
        key_to_deduplicate_by (str): The key whose values determine uniqueness.

    Returns:
        list: A new list with duplicate dictionaries removed.
    """
    seen_keys = set()
    deduplicated_data = []
    for item in data:
        unique_value = item.get(key_to_deduplicate_by)
        if unique_value is not None and unique_value not in seen_keys:
            deduplicated_data.append(item)
            seen_keys.add(unique_value)
    return deduplicated_data

deduplicated_records = deduplicate_list_of_dicts(records, "id")

print("Original Records:")
for record in records:
    print(record)

print("\nDeduplicated Records (by 'id'):")
for record in deduplicated_records:
    print(record)

# Expected contents of deduplicated_records:
# [{'id': 101, 'name': 'Apple', 'price': 1.0},
#  {'id': 102, 'name': 'Banana', 'price': 0.5},
#  {'id': 103, 'name': 'Cherry', 'price': 2.5},
#  {'id': 104, 'name': 'Date', 'price': 3.0}]

How it works: The function walks the list once, using a `set` (`seen_keys`) to record every value of the designated key (`key_to_deduplicate_by`) encountered so far. When an item's value has not been seen before, the item is appended to the result list and the value is added to `seen_keys`; later items with the same value are skipped. The result therefore keeps the *first* occurrence of each key value, in its original order. This is useful for cleaning up data where several entries share an identifier but differ in other attributes: here, the second Apple and Banana records, with their different prices, are discarded. Because set lookups take O(1) time on average, the whole pass is O(n), but the key values must be hashable. Note also that dictionaries lacking the key, or whose value for it is `None`, are dropped from the output rather than kept.
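The same seen-set pattern generalizes if the single key name is replaced by a function that computes the identity marker. The sketch below is a hypothetical helper (`deduplicate_by` is not part of the original snippet or any library) that accepts such a function, which makes composite keys easy: `operator.itemgetter("id", "name")` returns a tuple, and a tuple is hashable as long as its elements are. The sample data is made up for illustration.

```python
from operator import itemgetter


def deduplicate_by(data, key_func):
    """
    Hypothetical generalization: keep the first item for each value of key_func(item).

    Args:
        data (iterable): Items to deduplicate (e.g. dictionaries).
        key_func (callable): Maps an item to a hashable marker of its identity.

    Returns:
        list: Items in their original order, first occurrence per marker.
    """
    seen = set()
    result = []
    for item in data:
        marker = key_func(item)
        if marker not in seen:  # average O(1) set membership test
            seen.add(marker)
            result.append(item)
    return result


# Hypothetical sample data
sample = [
    {"id": 101, "name": "Apple", "price": 1.0},
    {"id": 101, "name": "Apple", "price": 1.2},
    {"id": 101, "name": "Apricot", "price": 2.0},
    {"id": 102, "name": "Banana", "price": 0.5},
]

# Single key: equivalent to deduplicating by "id"
by_id = deduplicate_by(sample, itemgetter("id"))

# Composite key: itemgetter with two names returns an ("id", "name") tuple
by_id_and_name = deduplicate_by(sample, itemgetter("id", "name"))

for record in by_id_and_name:
    print(record)
```

One design difference to be aware of: unlike the original function, this sketch does not skip items whose key is missing. `itemgetter` raises `KeyError` for a missing key, and a `key_func` such as `lambda d: d.get("id")` would treat `None` as an ordinary marker and keep the first item that lacks the key.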
