This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 2: Set

Set elements

Josiah Wang

What happens when you run the following? Make a guess, and verify with an interactive prompt!

>>> numbers = {1, 2, 3}
>>> print(numbers[1])
???

If you ran the code above (you did run it, didn’t you?), you will find that Python says TypeError: 'set' object is not subscriptable. So, unlike a list (accessed by index) or a dictionary (accessed by key), you cannot access an element of a set using square brackets. Why? Hint: Look at the second half of my definition of a set (“A set does not have duplicate elements, nor should the ordering be important.”)

If you need to access individual elements, you will need to convert the set to a list, or use it as part of a for loop.

>>> number_set = {1, 2, 3}
>>> number_list = list(number_set)
>>> print(number_list[1])
2
>>> for number in number_set:
...     print(number)
...
1
2
3
>>>
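
Because a set makes no promise about ordering, it is safer not to rely on where an element ends up in list(number_set). If you need a predictable position, one option is to sort the elements first. A minimal sketch (the variable names are only for illustration):

```python
number_set = {3, 1, 2}

# sorted() accepts any iterable and always returns a new list in ascending order
sorted_numbers = sorted(number_set)
print(sorted_numbers[1])  # the second smallest element

# min() and max() work directly on a set, so no conversion is needed for those
print(min(number_set), max(number_set))
```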

So why do you need sets when you cannot even access their elements? Why not just use a list?

The power of sets comes when you need the elements to be unique.
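
As a quick illustration of uniqueness, converting a list to a set drops any repeated values. This is a small sketch with made-up data (the names list is hypothetical):

```python
names = ["alice", "bob", "alice", "carol", "bob"]

unique_names = set(names)  # duplicates collapse into a single element
print(len(names))          # 5
print(len(unique_names))   # 3

# Adding an element that is already present leaves the set unchanged
unique_names.add("alice")
print(len(unique_names))   # still 3
```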

Sets are also really fast at checking whether an element is present. The following piece of code compares the speed of membership checks using lists and using sets. If you run the code, you will find that the set is much, much faster. Run the code multiple times to check that the pattern is consistent. (If you run out of memory, reduce the 1000000 passed to range() to something smaller.)

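import time

numbers = range(1000000)

number_list = list(numbers)
start_time = time.perf_counter()  # perf_counter() is a high-resolution clock meant for timing
print(51232422 in number_list)    # not in the list, so every element gets checked
end_time = time.perf_counter()
print(f"list took {end_time - start_time} seconds")

number_set = set(numbers)
start_time = time.perf_counter()
print(51232422 in number_set)     # resolved with a hash lookup instead of a scan
end_time = time.perf_counter()
print(f"set took {end_time - start_time} seconds")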

In a single run on my computer, list took 0.00965 seconds and set took 4.935e-05 seconds (or 0.000049 seconds).
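
A single timing can be noisy. One way to get a steadier comparison is the standard-library timeit module, which runs a statement repeatedly. The sketch below is one possible setup, not the benchmark used above; the smaller collection size, the missing value and the repetition count of 100 are arbitrary choices to keep it quick.

```python
import timeit

numbers = range(100_000)
number_list = list(numbers)
number_set = set(numbers)

missing = -1  # in neither collection, so the list has to be scanned to the end

# timeit.timeit() accepts a callable and returns the total seconds for `number` calls
list_time = timeit.timeit(lambda: missing in number_list, number=100)
set_time = timeit.timeit(lambda: missing in number_set, number=100)

print(f"list: {list_time:.6f} seconds for 100 checks")
print(f"set:  {set_time:.6f} seconds for 100 checks")
```

The exact figures will differ from machine to machine, but the gap between the two should stay large.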

A set is fast because its elements are stored and retrieved via hashing, just like dict keys, so a membership check does not have to scan through every element the way a list does. So always use sets for membership checks with the in operator when you can!
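
One consequence of hashing is that set elements must be hashable, the same restriction that applies to dict keys. The sketch below shows that an immutable tuple is accepted while a mutable list raises a TypeError; the variable names are illustrative.

```python
points = set()

points.add((1, 2))       # a tuple of hashable items is itself hashable
print((1, 2) in points)  # True

try:
    points.add([3, 4])   # lists are mutable and therefore unhashable
except TypeError as error:
    print(f"TypeError: {error}")

# Equal objects have equal hashes, so 1 and 1.0 count as the same element
print(len({1, 1.0}))     # 1
```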