Hash Map

Back Home

Category

Category: Hash-based

Topics

Topics: key-value storage, caching, indexing, frequency counting

Definition

A data structure that stores key-value pairs using a hash function to compute an index into an array of buckets, providing average O(1) lookup, insertion, and deletion.

Use cases

Attributes

Common operations

Operation Average Worst Description
Get O(1) O(n) Retrieve value by key
Put O(1) O(n) Insert or update key-value pair
Delete O(1) O(n) Remove key-value pair
Contains O(1) O(n) Check if key exists
Resize O(n) O(n) Rehash all elements

In code

# Python dict IS a hash map
cache = {}

# Insert - O(1) average
cache["key1"] = "value1"
cache["key2"] = "value2"

# Get - O(1) average
val = cache.get("key1", "default")

# Check existence - O(1) average
if "key1" in cache:
    print("Found!")

# Delete - O(1) average
del cache["key1"]

# Simple hash map implementation (chaining)
class HashMap:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]
        self.load_factor = 0.75

    def _hash(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        if self.size / self.capacity >= self.load_factor:
            self._resize()
        idx = self._hash(key)
        for i, (k, v) in enumerate(self.buckets[idx]):
            if k == key:
                self.buckets[idx][i] = (key, value)
                return
        self.buckets[idx].append((key, value))
        self.size += 1

    def get(self, key):
        idx = self._hash(key)
        for k, v in self.buckets[idx]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for key, value in bucket:
                self.put(key, value)

Time complexity

Load factor: Python dict resizes at ~66% capacity. Higher load factor = more collisions, lower = more memory waste.

Space complexity

O(n) for n key-value pairs. Additional space for empty buckets depends on load factor and capacity. With separate chaining, O(m + n) where m is bucket array size.

Trade-offs

Pros:

Cons:

Variants

When to use vs avoid