Grouping Iterables by Key with `itertools.groupby`
Learn how to use `itertools.groupby` in Python to group consecutive elements of an iterable that share the same key, a building block for data aggregation and structured processing.
```python
from itertools import groupby

# Grouping a list of dictionaries by a shared key (requires sorted data)
data = [
    {"city": "New York", "name": "Alice"},
    {"city": "London", "name": "Bob"},
    {"city": "New York", "name": "Charlie"},
    {"city": "London", "name": "David"},
    {"city": "New York", "name": "Eve"},
]

# groupby only merges adjacent items, so sort by the same key used for grouping
data.sort(key=lambda x: x["city"])
print(f"Sorted data: {data}\n")

grouped_data = {}
for key, group in groupby(data, key=lambda x: x["city"]):
    # Materialize each group before groupby advances to the next key
    grouped_data[key] = list(group)
print(f"Grouped by city (using groupby): {grouped_data}")

# Example: Grouping file paths by extension
file_paths = ["doc1.txt", "image.png", "report.pdf", "doc2.txt", "logo.png"]
file_paths.sort(key=lambda x: x.split(".")[-1])  # Sort by extension
print(f"Sorted file paths: {file_paths}\n")

grouped_by_extension = {}
for extension, files in groupby(file_paths, key=lambda x: x.split(".")[-1]):
    grouped_by_extension[extension] = list(files)
print(f"Grouped by extension: {grouped_by_extension}")
```
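The loops above store each group as a list, but a group iterator can also be reduced directly, which is where `groupby` helps with aggregation. The sketch below assumes hypothetical order records with `city` and `amount` fields (not part of the original example) and computes a total and a count per city. It uses `operator.itemgetter` so that sorting and grouping share one key function, and `sorted()` instead of `.sort()` so the input list is left unchanged.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical order records; the "city" and "amount" fields are assumptions for this sketch
orders = [
    {"city": "London", "amount": 20},
    {"city": "New York", "amount": 15},
    {"city": "London", "amount": 5},
    {"city": "New York", "amount": 30},
]

# One key function, reused for both sorting and grouping so they cannot disagree
by_city = itemgetter("city")

# sorted() returns a new list, so `orders` keeps its original order
totals = {
    city: sum(order["amount"] for order in group)
    for city, group in groupby(sorted(orders, key=by_city), key=by_city)
}

# Counting a group without building a list
counts = {
    city: sum(1 for _ in group)
    for city, group in groupby(sorted(orders, key=by_city), key=by_city)
}

print(totals)  # {'London': 25, 'New York': 45}
print(counts)  # {'London': 2, 'New York': 2}
```

Reusing a single key object is a small design choice that avoids a subtle bug: if the sort key and the `groupby` key ever drift apart, items with equal grouping keys may no longer be adjacent.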
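How it works: `itertools.groupby` groups runs of consecutive elements that share the same key. The key is computed by the optional `key` function, or is the element itself when no `key` is given. It returns an iterator that yields `(key, group)` pairs, where each `group` is itself an iterator over one run of elements. A crucial point is that `groupby` only groups *consecutive* elements: if items with equal keys are separated by other items, the same key shows up again as a new group. Therefore, the input iterable must first be sorted by the same key used for grouping, so that all items with that key are adjacent. Each `group` also shares the underlying iterator with `groupby`, so it must be consumed (here with `list(group)`) before moving on to the next key. This snippet demonstrates grouping a list of dictionaries by a common key and grouping file paths by their extensions after sorting.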
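The two rules described above (sort first, and consume each group before advancing) can be checked on a small string. This sketch uses a hypothetical input named `letters` and relies only on documented `groupby` behaviour: with no `key`, each element acts as its own key.

```python
from itertools import groupby

letters = "aabbbaa"  # hypothetical input with two separate runs of "a"

# Unsorted input: the second run of "a" becomes a separate group
unsorted_keys = [key for key, _ in groupby(letters)]
print(unsorted_keys)  # ['a', 'b', 'a']

# Sorting first makes equal items adjacent, so each key appears once
sorted_keys = [key for key, _ in groupby(sorted(letters))]
print(sorted_keys)  # ['a', 'b']

# Consuming each group immediately (with list) captures every run correctly
runs = [(key, list(group)) for key, group in groupby(letters)]
print(runs)  # [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('a', ['a', 'a'])]

# Storing the group iterators and reading them later does not work:
# by then groupby has advanced past them, so they no longer yield their items
stale = [(key, list(group)) for key, group in list(groupby(letters))]
print(stale)
```

The last case is why the snippet calls `list(group)` inside the loop rather than keeping the raw group iterators for later use.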