PYTHON

Performing Set Operations for Data Comparison and Deduplication

Use Python sets to compare data efficiently: find the elements common to several collections (intersection), the elements present in one collection but not another (difference), and the combined unique items from multiple collections (union).

# Example data: user IDs from different systems
system_a_users = {101, 102, 103, 104, 105}
system_b_users = {104, 105, 106, 107}
system_c_users = {103, 107, 108, 109}

# Union: All unique users across systems
all_unique_users = system_a_users.union(system_b_users, system_c_users)
# Alternatively: all_unique_users = system_a_users | system_b_users | system_c_users
print(f"All unique users: {all_unique_users}")

# Intersection: Users present in both system A and system B
common_ab_users = system_a_users.intersection(system_b_users)
# Alternatively: common_ab_users = system_a_users & system_b_users
print(f"Users in both A and B: {common_ab_users}")

# Difference: Users in system A but not in system B
a_only_users = system_a_users.difference(system_b_users)
# Alternatively: a_only_users = system_a_users - system_b_users
print(f"Users in A only (not B): {a_only_users}")

# Symmetric Difference: Users in exactly one of system A or system B (not in both)
unique_to_a_or_b = system_a_users.symmetric_difference(system_b_users)
# Alternatively: unique_to_a_or_b = system_a_users ^ system_b_users
print(f"Users unique to A or B: {unique_to_a_or_b}")

# Checking for subsets and supersets
set_small = {101, 102}
print(f"Is {set_small} a subset of {system_a_users}? {set_small.issubset(system_a_users)}")
print(f"Is {system_a_users} a superset of {set_small}? {system_a_users.issuperset(set_small)}")
How it works: This snippet uses Python's built-in set type to compare collections of unique items. A union collects every element that appears in any of the sets, an intersection keeps only the elements common to all of them, a difference keeps the elements of one set that are absent from another, and a symmetric difference keeps the elements that appear in exactly one of the two sets. issubset() and issuperset() test whether one set is wholly contained in another. Because sets are hash-based, membership checks take constant time on average, which makes these operations well suited to data deduplication, finding discrepancies between data sources, and managing unique identifiers. Sets are unordered, so the order of elements in the printed output is not guaranteed.
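The deduplication and discrepancy use cases mentioned above can be sketched as follows. This is a minimal illustration, not part of the original snippet: the email lists and the dedupe_keep_order helper are hypothetical names invented for the example. It assumes the input arrives as lists with repeats, and it relies on the fact that the method forms such as difference() accept any iterable, whereas the operator forms require both operands to be sets.

```python
# Hypothetical exports from two systems; the values are illustrative only.
crm_emails = ["ana@example.com", "bo@example.com", "ana@example.com", "cy@example.com"]
billing_emails = ["bo@example.com", "dee@example.com", "bo@example.com"]


def dedupe_keep_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    seen = set()  # tracks values already emitted
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Deduplication: a plain set() would also drop repeats, but would lose the order.
unique_crm = dedupe_keep_order(crm_emails)

# Discrepancies: difference() takes the other list directly, no conversion needed.
missing_from_billing = set(crm_emails).difference(billing_emails)
missing_from_crm = set(billing_emails).difference(crm_emails)

print(f"Unique CRM emails: {unique_crm}")
print(f"In CRM but not billing: {sorted(missing_from_billing)}")
print(f"In billing but not CRM: {sorted(missing_from_crm)}")
```

Wrapping the results in sorted() before printing gives a stable display order, since sets themselves do not define one.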
