pickle
If you need to save complex Python data structures, and only expect to load them back into Python in the future, you can consider using Python’s pickle module.
The pickle module is used for serialising and deserialising Python object structures.
It is useful for saving data. For example, you may use it to save a Machine Learning model that you have spent the whole week training.
You pickle your Python objects onto the disk as a binary file (serialising), and you unpickle them from the disk into memory (deserialising).
You can pickle integers, floats, booleans, strings, tuples, lists, sets and dictionaries (as long as they contain only objects that can be pickled), as well as classes defined at the top level of a module. [No pickled gherkins, sorry!]
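As a rough illustration of the list above, the sketch below round-trips a nested structure in memory and then tries something that cannot be pickled. It uses pickle.dumps and pickle.loads, the in-memory counterparts of the file-based functions. The names data, restored and lambda_pickled are made up for this example.

```python
import pickle

# A nested structure built only from picklable types.
data = {
    "count": 3,
    "ratio": 0.5,
    "flag": True,
    "name": "pickle",
    "pair": (1, 2),
    "items": [1, 2, 3],
    "unique": {1, 2, 3},
}

# dumps() returns the pickled bytes instead of writing them to a file.
blob = pickle.dumps(data)
restored = pickle.loads(blob)
print(restored == data)  ## True

# A lambda has no importable top-level name, so pickle refuses it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_pickled = False
    print(f"Cannot pickle a lambda: {err}")
```

The exact wording of the error message differs between Python versions, so the example prints it rather than relying on it.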
Health warnings!
- pickle is specific to Python, so it is not recommended if you expect to share your data across different programming languages
- Make sure you use the same Python version. It is not guaranteed that pickle will work with different versions of Python
- Do not unpickle data from untrusted sources, as you may execute malicious code inside the file when unpickling
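To see why the last warning matters, here is a deliberately harmless sketch. A class can define __reduce__ to tell pickle which callable to invoke when the object is rebuilt, and pickle.loads calls it without asking. The class name Sneaky is hypothetical, and the built-in sum stands in for whatever function the creator of a malicious file would choose.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Ask pickle to "rebuild" this object by calling sum([1, 2, 3]).
        # A malicious file could name any importable callable here instead.
        return (sum, ([1, 2, 3],))

blob = pickle.dumps(Sneaky())

# Loading runs the callable chosen by whoever created the bytes.
result = pickle.loads(blob)
print(result)                      ## 6
print(isinstance(result, Sneaky))  ## False
```

Nothing in the loading code mentions sum, yet it ran. That is the reason to unpickle only files you created yourself or fully trust.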
Pickling time!
Just like json, you pickle with pickle.dump(obj, file), and unpickle with pickle.load(file).
import pickle
courses = {558: {"lecturer": "Josiah Wang", "title": "Python Programming"},
           556: {"lecturer": "Robert Craven", "title": "Symbolic AI"}}
# Save courses to disk. Note binary mode!
with open("courses.pkl", "wb") as f:
    pickle.dump(courses, f)
# Load courses from disk. Again, it is a binary file!
with open("courses.pkl", "rb") as f:
    pickled_courses = pickle.load(f)
print(pickled_courses)
## {558: {'lecturer': 'Josiah Wang', 'title': 'Python Programming'},
## 556: {'lecturer': 'Robert Craven', 'title': 'Symbolic AI'}}
print(type(pickled_courses)) ## <class 'dict'>
print(courses == pickled_courses) ## True
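Related to the version warning earlier: pickle.dump and pickle.dumps accept an optional protocol argument that selects the format version. The sketch below is only an illustration. Choosing protocol 2 is an assumption made for the example, not a recommendation, and the shortened courses dictionary is made up. An older protocol can be read by older Python versions, but it does not remove the other caveats.

```python
import pickle

courses = {558: "Python Programming", 556: "Symbolic AI"}

# The protocol used by default, and the newest one this interpreter supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Explicitly request an older protocol.
blob = pickle.dumps(courses, protocol=2)

# Pickles written with protocol 2 or later start with a marker byte (0x80)
# followed by the protocol number.
print(blob[0] == 0x80, blob[1])  ## True 2

restored = pickle.loads(blob)
print(restored == courses)  ## True
```

pickle.load works out the protocol from the file itself, so the loading code does not change.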
Here is another example, this time pickling a list of objects of a custom class:
import pickle
class Vector:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __str__(self):
        return f"Vector ({self.a}, {self.b})"
    def __repr__(self):
        """ This makes the unique string representation
        of the object instance look more readable
        """
        return str(self)
v1 = Vector(2, 3)
v2 = Vector(4, 3)
v = [v1, v2]
# Save v to disk.
with open('vectors.pkl', 'wb') as f:
    pickle.dump(v, f)
# Load pickled file from disk
with open('vectors.pkl', 'rb') as f:
    pickled_vectors = pickle.load(f)
print(pickled_vectors) ## [Vector (2, 3), Vector (4, 3)]
print(type(pickled_vectors)) ## <class 'list'>
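One thing to keep in mind with custom classes: unpickling builds a new object with the same attributes, not the original object itself. The sketch below uses a hypothetical Point class (not part of the example above) and round-trips it in memory with pickle.dumps and pickle.loads.

```python
import pickle

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(2, 3)
q = pickle.loads(pickle.dumps(p))

print(q is p)                    ## False
print((q.x, q.y) == (p.x, p.y))  ## True
print(q == p)                    ## False
```

The last line is False because Point does not define __eq__, so == falls back to comparing identity. This differs from the dictionary example, where the loaded copy compares equal to the original.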