Set

Back Home

Category

Category: Hash-based

Topics

Topics: membership testing, deduplication, graph visited sets, set theory operations

Definition

An unordered collection of unique hashable elements, implemented as a hash table with only keys (no values), providing O(1) average membership testing.

Use cases

Attributes

Common operations

Operation Average Worst Description
add(x) O(1) O(n) Add element
remove(x) O(1) O(n) Remove element, raises KeyError if missing
discard(x) O(1) O(n) Remove element if present
x in s O(1) O(n) Check membership
len(s) O(1) O(1) Number of elements
s | t (union) O(n+m) O(n+m) Elements in either set
s & t (intersect) O(min) O(n*m) Elements in both sets
s - t (diff) O(n) O(n) Elements in s but not t
s ^ t (sym diff) O(n+m) O(n+m) Elements in exactly one set

In code

# Creation
s = {1, 2, 3}
s = set([1, 2, 2, 3])  # {1, 2, 3} - duplicates removed

# Empty set (not {} which creates dict)
empty = set()

# Add/Remove
s.add(4)
s.remove(1)      # KeyError if missing
s.discard(10)    # No error if missing
s.pop()          # Remove and return arbitrary element

# Membership - O(1) average
if 2 in s:
    print("Found!")

# Set operations
a = {1, 2, 3}
b = {2, 3, 4}

union = a | b           # {1, 2, 3, 4}
intersection = a & b    # {2, 3}
difference = a - b      # {1}
symmetric_diff = a ^ b  # {1, 4}

# Set comparisons
a.issubset(b)      # Is a contained in b?
a.issuperset(b)    # Does a contain b?
a.isdisjoint(b)    # No common elements?

# Set comprehension
evens = {x for x in range(10) if x % 2 == 0}

# Common patterns: deduplication
duplicated_list = [1, 2, 2, 3, 3, 3]
unique = list(set(duplicated_list))  # [1, 2, 3]

# Graph visited set pattern
visited = set()
def dfs(node, graph):
    if node in visited:
        return
    visited.add(node)
    for neighbor in graph.get(node, []):
        dfs(neighbor, graph)

Time complexity

Space complexity

O(n) for n elements. Like dictionaries, Python sets over-allocate for fast access. Each element stores the object reference and its hash.

Trade-offs

Pros:

Cons:

Variants

When to use vs avoid