Skip to main content

🎯 Sets

Collections with no duplicates

The Guest List Analogy

At a party with a guest list:

  • Each person can only appear once on the list
  • Checking if someone is invited is fast (just scan the list)
  • Order doesn't matter - just "are they on the list or not?"

Sets work like this - they store unique elements with O(1) lookup, and duplicates are automatically ignored.


What Makes Sets Special

Uniqueness

Adding to a set:
  add("Alice") → Set: {Alice}
  add("Bob")   → Set: {Alice, Bob}
  add("Alice") → Set: {Alice, Bob}  ← Alice not added twice!

Duplicates are automatically ignored.

Fast Membership Testing

Is "Alice" in the set?

Array: Check each element → O(n)
Set: Hash lookup → O(1)

For 1 million elements:
  Array: 1,000,000 comparisons
  Set: 1 comparison!

Set Operations

Core Operations

ADD - Add element (O(1))
  Set: {1, 2}
  add(3) → {1, 2, 3}
  add(2) → {1, 2, 3}  (no change, 2 already exists)

REMOVE - Remove element (O(1))
  Set: {1, 2, 3}
  remove(2) → {1, 3}

CONTAINS - Check membership (O(1))
  Set: {1, 2, 3}
  contains(2) → true
  contains(5) → false

SIZE - Count elements (O(1))
  Set: {1, 2, 3}
  size() → 3

Mathematical Set Operations

A = {1, 2, 3}
B = {2, 3, 4}

UNION (A ∪ B): All elements in either set
  {1, 2, 3, 4}

INTERSECTION (A ∩ B): Elements in both sets
  {2, 3}

DIFFERENCE (A - B): Elements in A but not B
  {1}

SYMMETRIC DIFFERENCE: Elements in one but not both
  {1, 4}

Visual Example

Removing duplicates from: [1, 2, 2, 3, 1, 4, 3, 5]

Add each to set:
  1 → {1}
  2 → {1, 2}
  2 → {1, 2}        (duplicate ignored)
  3 → {1, 2, 3}
  1 → {1, 2, 3}     (duplicate ignored)
  4 → {1, 2, 3, 4}
  3 → {1, 2, 3, 4}  (duplicate ignored)
  5 → {1, 2, 3, 4, 5}

Result: {1, 2, 3, 4, 5} - all unique!

Where Sets Are Used

1. Remove Duplicates

Input: [1, 2, 2, 3, 1]
unique = set(input)
Output: {1, 2, 3}

2. Fast Lookup Tables

Banned words: {"spam", "scam", "phishing"}

For each comment:
  if any word in banned_words:
    flag comment

3. Tracking Visited Items

BFS/DFS traversal:
  visited = set()

  if node in visited:
    skip
  else:
    visited.add(node)
    process node

4. Finding Common Elements

Friends of Alice: {"Bob", "Carol", "Dave"}
Friends of Eve: {"Carol", "Frank", "Dave"}

Mutual friends = Alice ∩ Eve = {"Carol", "Dave"}

5. Set Theory in Queries

Users who like Python AND JavaScript:
  python_users ∩ javascript_users

Users who like Python OR JavaScript:
  python_users ∪ javascript_users

Classic Set Problems

1. Contains Duplicate

Array: [1, 2, 3, 1]
Has duplicate? Yes!

Approach:
  Add each element to set.
  If element already in set → duplicate found!

2. Intersection of Two Arrays

nums1 = [1, 2, 2, 1]
nums2 = [2, 2]

set1 = {1, 2}
set2 = {2}
intersection = {2}

3. Single Number

Every element appears twice except one.
Array: [2, 2, 1]
Answer: 1

Approach:
  XOR all elements (a XOR a = 0)
  Or use set: add if not present, remove if present

4. Happy Number

Repeatedly sum squares of digits until 1 or cycle.

Use set to detect cycles:
  If sum seen before → cycle → not happy
  If sum = 1 → happy!

Set Implementations

Hash Set

Based on hash table:

Pros:
  - O(1) add, remove, contains
  - Fast!

Cons:
  - Unordered
  - Uses more memory

Tree Set (Sorted Set)

Based on balanced tree:

Pros:
  - Elements maintained in sorted order
  - Range queries possible

Cons:
  - O(log n) operations
  - More complex

Which to Use?

NeedUse
Fast lookupHash Set
Sorted orderTree Set
Min/max elementTree Set
Simple dedupHash Set

Set vs Other Structures

OperationSetList/ArrayHash Map
ContainsO(1)O(n)O(1)
AddO(1)O(1)O(1)
RemoveO(1)O(n)O(1)
Get by indexO(1)
Allows duplicates❌ for keys

Set = Hash Map without values (keys only).


Common Mistakes

1. Expecting Order

Most sets are unordered. Don't rely on iteration order!

2. Unhashable Elements

Lists and dicts can't be in sets (mutable). Use tuples or frozensets.

3. Modifying During Iteration

Don't add/remove while looping over a set!

4. Forgetting O(n) for Set Operations

Union/intersection are O(n), not O(1).


FAQ

Q: Set vs list for uniqueness?

Set: O(1) lookup, unique by design, unordered. List + manual check: O(n) lookup, preserves order.

Q: Can sets contain sets?

No (sets are mutable). Use frozenset for nested sets.

Q: Why is order not preserved?

Hash sets use hash positions, not insertion order. Python 3.7+ dicts preserve order, but sets may not.

Q: When is a list better than a set?

When you need: ordering, duplicates, or index access.


Summary

A set is a collection of unique elements with O(1) membership testing.

Key Takeaways:

  • Unique elements only - duplicates ignored
  • O(1) add, remove, contains
  • Unordered (hash set) or ordered (tree set)
  • Set operations: union, intersection, difference
  • Used for: deduplication, fast lookup, visited tracking
  • Elements must be hashable (immutable)

Sets answer "is this element in the collection?" instantly!

Leave a Comment

Comments (0)

Be the first to comment on this concept.

Comments are approved automatically.