PYTHON
Group Data by Key Using Python's `defaultdict`
Group a list of dictionaries or objects by a specific key into a dictionary of lists using Python's `collections.defaultdict`. This is a common aggregation step when processing API responses or database results.
from collections import defaultdict
# Sample data: A list of users, each with a 'city' key
users_data = [
    {'id': 1, 'name': 'Alice', 'city': 'New York'},
    {'id': 2, 'name': 'Bob', 'city': 'London'},
    {'id': 3, 'name': 'Charlie', 'city': 'New York'},
    {'id': 4, 'name': 'David', 'city': 'Paris'},
    {'id': 5, 'name': 'Eve', 'city': 'London'}
]
# Group users by city using a regular dictionary (more verbose)
users_by_city_regular = {}
for user in users_data:
    city = user['city']
    if city not in users_by_city_regular:
        users_by_city_regular[city] = []
    users_by_city_regular[city].append(user)
print(f"Grouped (regular dict): {users_by_city_regular}")
# Group users by city using defaultdict (more concise)
users_by_city_defaultdict = defaultdict(list)
for user in users_data:
    city = user['city']
    users_by_city_defaultdict[city].append(user)
print(f"Grouped (defaultdict): {dict(users_by_city_defaultdict)}")
# default_factory can be any zero-argument callable, not just list
# For instance, use set to collect the unique names in each city
users_by_city_unique = defaultdict(set)
for user in users_data:
    city = user['city']
    users_by_city_unique[city].add(user['name'])  # Add only the name (dicts are unhashable)
print(f"Grouped (defaultdict with set): {dict(users_by_city_unique)}")
How it works: This snippet groups data (e.g., a list of user dictionaries) by a specific key using `collections.defaultdict`. When you index a key that doesn't exist in a `defaultdict`, it calls the `default_factory` (here `list` or `set`), stores the result under that key, and returns it. This removes the need for explicit `if key not in dict:` checks, so the grouping loop is shorter and harder to get wrong when aggregating data from sources like API responses or database queries.
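The automatic creation of entries is easiest to see in isolation. The sketch below is illustrative only: the names `groups` and `stats` and the sample cities are made up for this example. It shows that indexing with `[]` inserts a missing key, that `.get()` and `in` do not, and that a lambda can serve as `default_factory` when each group needs a richer starting value.

```python
from collections import defaultdict

groups = defaultdict(list)

# Indexing a missing key calls default_factory and stores the result.
groups['Tokyo'].append('Hana')

# A plain read with [] also inserts an (empty) entry as a side effect.
_ = groups['Berlin']
print('Berlin' in groups)   # True

# .get() and `in` do not trigger default_factory, so nothing is inserted.
print(groups.get('Oslo'))   # None
print('Oslo' in groups)     # False

# default_factory can be any zero-argument callable, including a lambda.
# Here each city starts with a fresh counter and an empty list of names.
stats = defaultdict(lambda: {'count': 0, 'names': []})
for name, city in [('Alice', 'New York'), ('Bob', 'London'), ('Charlie', 'New York')]:
    stats[city]['count'] += 1
    stats[city]['names'].append(name)

print(dict(stats))
```

Because a bare `[]` lookup adds keys, prefer `.get()` or an `in` check when you only want to test for a group without creating it.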