PYTHON
Group Data Efficiently with Python's `collections.defaultdict`
Learn to simplify data grouping tasks in Python using `collections.defaultdict`, automatically handling missing keys and streamlining code for categorization.
from collections import defaultdict
# Example 1: Grouping a list of dictionaries by a key
sales_data = [
    {'product': 'Laptop', 'category': 'Electronics', 'price': 1200},
    {'product': 'Mouse', 'category': 'Electronics', 'price': 25},
    {'product': 'Shirt', 'category': 'Apparel', 'price': 30},
    {'product': 'Keyboard', 'category': 'Electronics', 'price': 75},
    {'product': 'Pants', 'category': 'Apparel', 'price': 50},
]
grouped_by_category = defaultdict(list)
for item in sales_data:
    grouped_by_category[item['category']].append(item)
print(f"Grouped by category: {grouped_by_category}")
# Accessing groups
print(f"Electronics items: {grouped_by_category['Electronics']}")
# Example 2: Counting occurrences (similar to Counter, but showing defaultdict usage)
word_list = ["apple", "banana", "apple", "orange", "banana"]
word_counts = defaultdict(int) # Default value for int is 0
for word in word_list:
    word_counts[word] += 1
print(f"Word counts (defaultdict): {word_counts}")
# Example 3: Grouping by the first letter of a string
names = ["Alice", "Bob", "Anna", "Charlie", "David"]
grouped_by_first_letter = defaultdict(list)
for name in names:
    grouped_by_first_letter[name[0].upper()].append(name)
print(f"Grouped by first letter: {grouped_by_first_letter}")
How it works: `collections.defaultdict` is a `dict` subclass that takes a factory function and calls it to supply a value for a missing key. When you look up an absent key with `d[key]`, `defaultdict` calls the factory, stores the result under that key (an empty list for `defaultdict(list)`, `0` for `defaultdict(int)`), and returns it. Lookups through `.get()` or `in` do not trigger the factory, so they never insert keys. This removes the explicit key checks otherwise needed for tasks like grouping data, accumulating sums, or building indices.
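
To make the mechanism concrete, here is a minimal sketch that accumulates sums per category, first with a plain `dict` and an explicit key check, then with `defaultdict(int)`. The `orders` data and the variable names are hypothetical and exist only for illustration. The last few lines show which kinds of lookup insert a missing key and which do not.

```python
from collections import defaultdict

# Hypothetical sample data, used only for this illustration.
orders = [("Electronics", 1200), ("Apparel", 30), ("Electronics", 25)]

# Plain dict: every new key needs an explicit check before it can be updated.
totals_plain = {}
for category, price in orders:
    if category not in totals_plain:
        totals_plain[category] = 0
    totals_plain[category] += price

# defaultdict(int): a missing key is created with int() == 0 on first access.
totals = defaultdict(int)
for category, price in orders:
    totals[category] += price

print(dict(totals) == totals_plain)  # both approaches give the same sums

# Only subscript access (d[key]) calls the factory and inserts the key.
print("Toys" in totals)    # False: a membership test does not insert
print(totals.get("Toys"))  # None: .get() does not insert either
print(len(totals))         # 2
print(totals["Toys"])      # 0: subscript access inserts "Toys"
print(len(totals))         # 3
```

Because a read through `d[key]` also inserts the key, `.get()` or `in` is the safer choice when you only want to inspect a `defaultdict` without growing it.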