Hash Tables

by Justin Skycak (@justinskycak) on March 08, 2021

Under the hood, dictionaries are hash tables.

This post is part of the book Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. Suggested citation: Skycak, J. (2021). Hash Tables. In Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. https://justinmath.com/hash-tables/

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.

Under the hood, dictionaries are hash tables.

The most elementary (and inefficient) version of a hash table would be a list of tuples. For example, if we wanted to implement the dictionary


{
    'a': [0 ,1],
    'b': 'abcd',
    'c': 3.14
}

then we’d have the following list of tuples:


[
    ('a', [0, 1]),
    ('b', 'abcd'),
    ('c', 3.14  )
]

To add a new key-value pair to the dictionary, we’d just append the corresponding tuple, and to look up the value for some key, we’d just loop through the tuples until we get to the tuple with the key we wanted (and then we’d return the corresponding value).

Exercise: Buckets and Hash Functions

Searching through a long array is slow. So, to be more efficient, we use several lists of tuples. We call those lists buckets, and we use a hash function to tell us which bucket to put the new key-value pair in.

Complete the code below to implement a special case of an elementary hash table. We’ll expand on this example soon, but let’s start with something simple.


array = [[], [], [], [], []] # has 5 empty buckets

def hash(string):
    # Return the sum of character indices in the string 
    # (where 'a' has index 0, 'b' has index 1, ..., 'z' has index 25)
    # modulo 5.

    # We'll assume the string consists of lowercase
    # letters with no other characters or spaces.

def insert(array, key, value):
    # Apply the hash function to the key to get the bucket index.
    # then append the (key, value) pair to the bucket.

def find(array, key):
    # Apply the hash function to the key to get the bucket index.
    # Then, loop through the bucket until you get to the tuple with
    # the desired key, and return the corresponding value.

Here’s an example of how the hash table will work:


>>> print(array)
array = [[], [], [], [], []]

>>> insert(array, 'a', [0,1])
>>> insert(array, 'b', 'abcd')
>>> insert(array, 'c', 3.14)
>>> print(array)
[
    [('a', [0, 1])],
    [('b', 'abcd')],
    [('c', 3.14  )],
    [],
    []
]

>>> insert(array, 'd', 0)
>>> insert(array, 'e', 0)
>>> insert(array, 'f', 0)
>>> print(array)
[
    [('a', [0, 1]), ('f', 0)],
    [('b', 'abcd')],
    [('c', 3.14  )],
    [('d', 0     )],
    [('e', 0     )]
]
Test your code as follows:

alphabet = 'abcdefghijklmnopqrstuvwxyz'
for i, char in enumerate(alphabet):
    key = 'someletters'+char
    value = [i, i**2, i**3]
    insert(array, key, value)

for i, char in enumerate(alphabet):
    key = 'someletters'+char
    output_value = find(array, key)
    desired_value = [i, i**2, i**3]
    assert output_value == desired_value

Exercise: Hash Table

Write a class HashTable that generalizes the hash table you previously wrote. This class should store an array of buckets, and the hash function should add up the alphabet indices of the input string and mod the result by the number of buckets.


>>> ht = HashTable(num_buckets = 3)
>>> ht.buckets
[[], [], []]
>>> ht.hash('cabbage')
2    (because 2+0+1+1+0+6+4 mod 3 = 14 mod 3 = 2)

>>> ht.insert('cabbage', 5)
>>> ht.buckets
[
    [],
    [],
    [('cabbage', 5)]
]

>>> ht.insert('cab', 20)
>>> ht.buckets
[
    [('cab',    20)],
    [],
    [('cabbage', 5)]
]

>>> ht.insert('c', 17)
>>> ht.buckets
[
    [('cab',    20)],
    [],
    [('cabbage', 5), ('c', 17)]
]

>>> ht.insert('ac', 21)
>>> ht.buckets
[
    [('cab',    20)],
    [],
    [('cabbage', 5), ('c', 17), ('ac', 21)]
]

>>> ht.find('cabbage')
5
>>> ht.find('cab')
20
>>> ht.find('c')
17
>>> ht.find('ac')
21

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.