This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

pickle

If you need to save complex Python data structures and only expect to load them back into Python in the future, consider using Python’s pickle module.

Pickle

Image by Alina Kuptsova from Pixabay

The pickle module is used for serialising and deserialising Python object structures.

It is useful for saving data. For example, you could use it to save the Machine Learning model that you have spent the whole week training.

You pickle your Python objects onto the disk as a binary file (serialising), and you unpickle them from the disk into memory (deserialising).
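To see what serialising actually produces, here is a minimal sketch using pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load() that work with a bytes object instead of a file. The scores dictionary is just made-up example data.

```python
import pickle

scores = {"alice": [90, 85], "bob": [70, 88]}  # made-up example data

# Serialise: Python object -> bytes (pickle.dump does the same but writes to a file)
data = pickle.dumps(scores)
print(type(data))  # <class 'bytes'>

# Deserialise: bytes -> a new Python object with the same contents
restored = pickle.loads(data)
print(restored == scores)  # True
```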

You can pickle None, booleans, integers, floats, strings, tuples, lists, sets and dictionaries (as long as they only contain objects that can be pickled), functions and classes defined at the top level of a module, and instances of such classes. [No pickled gherkins, sorry!]
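The list of picklable types can be checked with a short sketch. The data dictionary is made-up example data, and math.sqrt stands in for a top-level function: functions are pickled by reference (module and name), so loading gives back the very same function object. A lambda has no importable top-level name, so pickling it fails. Several exception types are caught because the exact error raised may vary between Python versions.

```python
import math
import pickle

# A nested structure built only from picklable types (made-up example data)
data = {
    "flags": (True, False, None),
    "numbers": [1, 2.5],
    "tags": {"python", "pickle"},
    "sqrt": math.sqrt,  # functions are pickled by reference (module + name)
}

restored = pickle.loads(pickle.dumps(data))
print(restored == data)               # True
print(restored["sqrt"] is math.sqrt)  # True: the same function object

# A lambda has no top-level name that pickle can look up, so this fails
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError) as err:
    lambda_pickled = False
    print("Cannot pickle a lambda:", type(err).__name__)
```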

Health warnings!

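  • pickle is specific to Python, so it is not recommended if you expect to share your data with programs written in other languages
  • Be careful when mixing Python versions. Newer versions of Python can read pickles created by older ones, but a file written with a newer pickle protocol cannot be loaded by an older Python, and pickled objects may fail to load if the classes or libraries they depend on have changed in between
  • Never unpickle data from untrusted sources: unpickling can execute arbitrary (possibly malicious) code embedded in the file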
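To see why the last warning matters, here is a deliberately harmless sketch. The Sneaky class is made up for illustration; it defines __reduce__, a real hook that tells pickle how to rebuild an object as "call this callable with these arguments". Here the callable is just print(), but a malicious file could name any importable callable (os.system, for example), and it would run the moment you call pickle.loads() or pickle.load().

```python
import pickle

class Sneaky:
    """Hypothetical class: unpickling it calls print() instead of rebuilding it."""

    def __reduce__(self):
        # pickle records "call print with this argument" in the payload
        return (print, ("This ran during pickle.loads()!",))

payload = pickle.dumps(Sneaky())

# Merely loading the bytes calls print(); no Sneaky object is created
result = pickle.loads(payload)
print(result)  # None, i.e. the return value of print()
```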

Pickling time!

Just like with json, you pickle with pickle.dump(obj, file) and unpickle with pickle.load(file).

import pickle

courses = {558: {"lecturer": "Josiah Wang", "title": "Python Programming"},
           556: {"lecturer": "Robert Craven", "title": "Symbolic AI"}}

# Save courses to disk. Note binary mode!
with open("courses.pkl", "wb") as f:
    pickle.dump(courses, f)

# Load courses from disk. Again, it is a binary file!
with open("courses.pkl", "rb") as f: 
    pickled_courses = pickle.load(f)

print(pickled_courses)
## {558: {'lecturer': 'Josiah Wang', 'title': 'Python Programming'}, 
## 556: {'lecturer': 'Robert Craven', 'title': 'Symbolic AI'}} 

print(type(pickled_courses)) ## <class 'dict'> 

print(courses == pickled_courses)  ## True
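Relating this back to the version warning, pickle.dump() also accepts an optional protocol argument. A minimal sketch, assuming you want the file to stay readable by older Python 3 releases: protocol 4 has been available since Python 3.4, whereas the newer protocol 5 requires Python 3.8 or later. The filename courses_p4.pkl is arbitrary. The last two lines also show that unpickling gives you an equal but brand-new object.

```python
import pickle

courses = {558: {"lecturer": "Josiah Wang", "title": "Python Programming"},
           556: {"lecturer": "Robert Craven", "title": "Symbolic AI"}}

# Both values depend on the Python version you are running
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Pin protocol 4 so that older Python 3 versions (3.4+) can also read the file
with open("courses_p4.pkl", "wb") as f:
    pickle.dump(courses, f, protocol=4)

with open("courses_p4.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded == courses)  # True: same contents
print(loaded is courses)  # False: unpickling builds a new object
```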

Here is another example, this time pickling a list of objects of a custom class:

import pickle

class Vector: 
    def __init__(self, a, b): 
        self.a = a 
        self.b = b

    def __str__(self): 
        return f"Vector ({self.a}, {self.b})"

    def __repr__(self):
        """ Make the official string representation of an instance
            (used e.g. when a list of Vectors is printed) more readable
        """
        return str(self)

v1 = Vector(2, 3) 
v2 = Vector(4, 3) 
v = [v1, v2] 

# Save v to disk.
with open('vectors.pkl', 'wb') as f: 
    pickle.dump(v, f)

# Load pickled file from disk
with open('vectors.pkl', 'rb') as f: 
    pickled_vectors = pickle.load(f)

print(pickled_vectors)  ## [Vector (2, 3), Vector (4, 3)]

print(type(pickled_vectors))  ## <class 'list'>
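Note that pickle stores an instance's attributes plus a reference to its class by name, not the class's code. This is why the class definition must be available when you unpickle. The sketch below uses a cut-down Vector (redefined so the block runs on its own) with a hypothetical init_calls counter to show that unpickling does not call __init__, and that the loaded object uses whatever the class looks like at load time.

```python
import pickle

class Vector:
    init_calls = 0  # hypothetical counter: how many times __init__ has run

    def __init__(self, a, b):
        Vector.init_calls += 1
        self.a = a
        self.b = b

    def __str__(self):
        return f"Vector ({self.a}, {self.b})"

data = pickle.dumps(Vector(2, 3))
print(b"Vector" in data)  # True: the class is recorded by name only

# Change the class *after* pickling
Vector.__str__ = lambda self: f"<{self.a}, {self.b}>"

restored = pickle.loads(data)
print(Vector.init_calls)  # 1: loads() did not call __init__ again
print(restored.__dict__)  # {'a': 2, 'b': 3}
print(restored)           # <2, 3>: uses the current definition of __str__
```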