PYTHON
Group Data by Key Using defaultdict in Python
Group a list of dictionaries or objects by a specific key into a dictionary of lists using Python's `collections.defaultdict`.
from collections import defaultdict
data = [
    {'id': 1, 'category': 'fruit', 'item': 'apple'},
    {'id': 2, 'category': 'vegetable', 'item': 'carrot'},
    {'id': 3, 'category': 'fruit', 'item': 'banana'},
    {'id': 4, 'category': 'vegetable', 'item': 'celery'},
    {'id': 5, 'category': 'fruit', 'item': 'grape'},
]
# Group the records by their 'category' value
grouped_data = defaultdict(list)
for item in data:
    group_key = item['category']
    # A missing key gets a new empty list before append() runs
    grouped_data[group_key].append(item)
print(f"Grouped by category: {dict(grouped_data)}")
# Example with a different key: 'order_id' (to show flexibility)
data_with_duplicates = [
    {'order_id': 'A1', 'product': 'Laptop', 'price': 1200},
    {'order_id': 'A2', 'product': 'Mouse', 'price': 25},
    {'order_id': 'A1', 'product': 'Keyboard', 'price': 75},
    {'order_id': 'A3', 'product': 'Laptop', 'price': 1200},
]
grouped_by_order = defaultdict(list)
for record in data_with_duplicates:
    group_key = record['order_id']
    grouped_by_order[group_key].append(record)
print(f"\nGrouped by order ID: {dict(grouped_by_order)}")
How it works: `collections.defaultdict` removes the bookkeeping from grouping. When you index a key that doesn't exist (for example `grouped_data[group_key]`), the `defaultdict` calls the factory function provided during initialization (`list` for `defaultdict(list)`), stores the result under that key, and returns it. You can therefore append to a group without first checking whether the key exists, which keeps the code shorter and less error-prone. Wrapping the result in `dict(...)`, as the `print` calls do, gives a plain dictionary for display. This pattern is common when processing lists of records and organizing them by a shared attribute.
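The grouping loop can also be wrapped in a small reusable function. The sketch below is one possible way to do that, not part of the original snippet: `group_by` is a hypothetical helper name, and the `orders` list is illustrative sample data. It also builds the same result with a plain `dict` and `setdefault()` to show the step that `defaultdict` performs for you.

```python
from collections import defaultdict


def group_by(records, key):
    """Group an iterable of dicts into {key value: [records]}.

    `group_by` is a hypothetical helper, not a standard-library function.
    """
    groups = defaultdict(list)
    for record in records:
        # Indexing a missing key calls list() and stores the new empty list,
        # so append() always has a list to work on.
        groups[record[key]].append(record)
    # Return a plain dict so that later lookups of unknown keys raise
    # KeyError instead of silently creating empty groups.
    return dict(groups)


# Illustrative sample data (an assumption for this sketch)
orders = [
    {'order_id': 'A1', 'product': 'Laptop'},
    {'order_id': 'A2', 'product': 'Mouse'},
    {'order_id': 'A1', 'product': 'Keyboard'},
]

by_order = group_by(orders, 'order_id')
print(list(by_order))       # ['A1', 'A2']
print(len(by_order['A1']))  # 2

# The same grouping with a plain dict needs setdefault() on every access.
plain = {}
for record in orders:
    plain.setdefault(record['order_id'], []).append(record)
print(plain == by_order)    # True
```

Returning `dict(groups)` is a design choice: it keeps callers from accidentally adding empty groups just by reading a key that was never seen.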