PYTHON
Efficiently Find N Smallest/Largest Elements with heapq
Learn to use Python's heapq module to efficiently manage priority queues, making it easy to find the N smallest or largest elements in a collection without full sorting.
import heapq
def find_n_smallest(data, n):
"""
Finds the N smallest elements in a list using heapq.nsmallest.
For largest, use heapq.nlargest.
"""
if not isinstance(data, list):
return []
if not isinstance(n, int) or n < 0:
raise ValueError("N must be a non-negative integer.")
# heapq.nsmallest returns a list of the n smallest elements
# It's more efficient than sorting the entire list if n is much smaller than len(data)
return heapq.nsmallest(n, data)
def find_n_largest(data, n):
"""
Finds the N largest elements in a list using heapq.nlargest.
"""
if not isinstance(data, list):
return []
if not isinstance(n, int) or n < 0:
raise ValueError("N must be a non-negative integer.")
return heapq.nlargest(n, data)
# Example usage:
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
n_val = 3
smallest_three = find_n_smallest(numbers, n_val)
largest_three = find_n_largest(numbers, n_val)
# print(f"Numbers: {numbers}")
# print(f"{n_val} Smallest: {smallest_three}") # Expected: [1, 1, 2]
# print(f"{n_val} Largest: {largest_three}") # Expected: [9, 6, 5]
# For a custom priority, use a key function (like sorted or min/max)
people = [
{'name': 'Alice', 'age': 30},
{'name': 'Bob', 'age': 24},
{'name': 'Charlie', 'age': 35},
{'name': 'David', 'age': 28}
]
youngest_two = heapq.nsmallest(2, people, key=lambda p: p['age'])
# print(f"Youngest two people: {youngest_two}")
How it works: This snippet demonstrates the use of the `heapq` module for efficient retrieval of the N smallest or largest elements from a collection. Unlike sorting the entire list (which takes O(N log N) time), `heapq.nsmallest` and `heapq.nlargest` perform these operations in O(M log N) time, where M is the total number of elements and N is the number of elements to retrieve, making them significantly faster when N is much smaller than M. This is achieved by internally managing a min-heap or max-heap. It's ideal for priority queue implementations or when you only need a subset of ranked data.