PYTHON

Grouping Data Efficiently with collections.defaultdict

Simplify data aggregation in Python with `collections.defaultdict`. It initializes the value for a missing key automatically on first access, which makes grouping items straightforward when processing API responses or user data.

from collections import defaultdict

# Imagine a list of user activities from a log or database
activities = [
    {"user_id": "u1", "action": "login", "timestamp": "2023-01-01"},
    {"user_id": "u2", "action": "view_page", "timestamp": "2023-01-01"},
    {"user_id": "u1", "action": "add_to_cart", "timestamp": "2023-01-02"},
    {"user_id": "u3", "action": "login", "timestamp": "2023-01-02"},
    {"user_id": "u2", "action": "purchase", "timestamp": "2023-01-03"},
    {"user_id": "u1", "action": "logout", "timestamp": "2023-01-03"}
]

# Group activities by user_id using a standard dictionary (verbose)
user_activities_standard = {}
for activity in activities:
    user_id = activity["user_id"]
    if user_id not in user_activities_standard:
        user_activities_standard[user_id] = []
    user_activities_standard[user_id].append(activity)
print(f"Standard dictionary grouping:\n{user_activities_standard}\n")

# Group activities by user_id using defaultdict (concise and cleaner)
user_activities_default = defaultdict(list)
for activity in activities:
    user_activities_default[activity["user_id"]].append(activity)
print(f"Defaultdict grouping:\n{dict(user_activities_default)}\n")

# Another example: counting item types
items = ["apple", "banana", "apple", "orange", "banana", "apple"]
item_counts = defaultdict(int)
for item in items:
    item_counts[item] += 1
print(f"Item counts with defaultdict:\n{dict(item_counts)}")
How it works: `collections.defaultdict` is a subclass of the standard dictionary that supplies a default value for a key that does not exist. When you look up a missing key with square brackets, `defaultdict` calls the factory function passed to its constructor (e.g., `list`, `int`, `set`) with no arguments, stores the result under that key, and returns it. This removes the need for explicit key checks and initialization, so code for grouping, aggregation, or frequency counting becomes shorter and avoids `KeyError` exceptions when processing dynamic data like API responses or user events. Note that only square-bracket lookup triggers the factory; `.get()` and the `in` operator behave as they do on a regular dictionary.
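As a minimal sketch of the factory behavior described above, the example below uses `set` as the factory and contrasts square-bracket lookup with `.get()`. The `events` data and the `u9` key are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (user_id, action) pairs, not taken from a real log.
events = [("u1", "login"), ("u2", "login"), ("u1", "login"), ("u1", "logout")]

# With `set` as the factory, each new key starts as an empty set,
# so repeated actions for the same user collapse into one entry.
actions_by_user = defaultdict(set)
for user_id, action in events:
    actions_by_user[user_id].add(action)

# .get() does not call the factory, so the mapping is left unchanged.
missing_via_get = actions_by_user.get("u9")
keys_after_get = sorted(actions_by_user)

# Square-bracket lookup does call the factory and stores the new empty set.
created = actions_by_user["u9"]
keys_after_index = sorted(actions_by_user)

print(keys_after_get)    # ['u1', 'u2']
print(keys_after_index)  # ['u1', 'u2', 'u9']
```

Because a plain read with square brackets inserts the key, `.get()` or `in` is usually the safer choice when you only want to check for a key without growing the dictionary.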
