The pickle module in the standard library provides tools for serializing Python objects into a sequence of bytes.
A serialized object can be stored, for example in a file, and later reconstructed into the equivalent Python object. This makes it possible to persist data across different Python processes.
To use the module, we first need to import it in our Python program:
import pickle
Basic Usage
The pickle.dumps() function is the basic tool for data serialization. It takes the object to be serialized as its argument and returns the equivalent bytes object.
pickle.dumps(obj)
Serialize a dictionary
import pickle
data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines'} # a dictionary
serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)
To deserialize the bytes back into the equivalent Python object, we use the pickle.loads() function. It takes the serialized data as its argument and returns the reconstructed Python object.
pickle.loads(serialized_data)
Reconstruct a serialized object
import pickle
data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines'} # a dictionary
serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)
reconstructed_object = pickle.loads(serialized_data)
print(reconstructed_object)
Note that the reconstructed object is equal to the original but is not the same object: it is a new, independent copy.
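The difference between "equal" and "the same" can be checked with the == and is operators. A minimal sketch (the variable name restored is chosen here only for illustration):

```python
import pickle

data = {'Tokyo': 'Japan', 'Moscow': 'Russia'}
restored = pickle.loads(pickle.dumps(data))

print(restored == data)  # True: the contents are equal
print(restored is data)  # False: it is a separate object in memory

# Changing the reconstructed object leaves the original untouched
restored['Nairobi'] = 'Kenya'
print('Nairobi' in data)  # False
```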
In the above examples we used a dictionary, but most types of Python objects can be serialized and reconstructed, including instances of custom classes. Consider the following example, which uses a custom object.
import pickle
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __str__(self):
        return f'Point({self.x}, {self.y})'
p = Point(-2, 3)
# serialize p
serialized_p = pickle.dumps(p)
print(serialized_p)
# reconstruct
reconstructed_p = pickle.loads(serialized_p)
print(reconstructed_p)
print(reconstructed_p.x)
print(reconstructed_p.y)
b'\x80\x04\x95-\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Point\x94\x93\x94)\x81\x94}\x94(\x8c\x01x\x94J\xfe\xff\xff\xff\x8c\x01y\x94K\x03ub.'
Point(-2, 3)
-2
3
Data persistence
A serialized object can be written into a file for permanent storage. The data can then be loaded and used even after the current program instance has been closed.
You can use file methods such as write() and read() to write serialized data to a file and read it back. However, as we will see shortly, the module provides more convenient functions for the same purpose.
Write the serialized data to a file
import pickle
data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines', 'Nairobi': 'Kenya'} # a dictionary
serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)
# write serialized data to cities.dump
with open('cities.dump', 'wb') as file:
    file.write(serialized_data)
b'\x80\x04\x95Q\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x05Tokyo\x94\x8c\x05Japan\x94\x8c\x06Moscow\x94\x8c\x06Russia\x94\x8c\x07Manilla\x94\x8c\x0bPhilippines\x94\x8c\x07Nairobi\x94\x8c\x05Kenya\x94u.'
The 'wb' mode passed to the open() function opens the file for writing in binary mode. This is required because pickle.dumps() returns bytes rather than a text string.
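To see why binary mode matters, the following sketch tries both modes. It assumes a scratch file in a temporary directory; the file name example.dump is made up for illustration.

```python
import os
import pickle
import tempfile

payload = pickle.dumps({'Tokyo': 'Japan'})
print(type(payload))  # <class 'bytes'>

text_mode_error = None
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.dump')  # hypothetical scratch file

    # Text mode only accepts str, so writing bytes raises TypeError
    try:
        with open(path, 'w') as file:
            file.write(payload)
    except TypeError as error:
        text_mode_error = error
        print('text mode failed:', error)

    # Binary mode accepts the bytes as they are
    with open(path, 'wb') as file:
        written = file.write(payload)

print(written == len(payload))  # True
```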
We can then read the bytes back from the file and call pickle.loads() to reconstruct the object.
Read serialized object and reconstruct it
import pickle
# read data from cities.dump
with open('cities.dump', 'rb') as file:
    serialized_data = file.read()
reconstructed_object = pickle.loads(serialized_data)
print(reconstructed_object)
{'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines', 'Nairobi': 'Kenya'}
Use dump() and load() instead
As mentioned earlier, the pickle module offers functions that conveniently write serialized data to a stream and read it back. The two functions are dump() and load().
dump() serializes an object and writes it to a file in a single step. It has the following basic syntax:
pickle.dump(obj, file)
It serializes obj and writes the result into file, which must be open for writing in binary mode.
Using dump()
import pickle
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __str__(self):
        return f'Point({self.x}, {self.y})'
p1 = Point(-2, 3)
p2 = Point(3, 4)
p3 = Point(5, 6)
points = [p1, p2, p3] # a list with points to serialize
with open('points.dump', 'wb') as file:
    pickle.dump(points, file)
The load() function reads serialized data from the given file and reconstructs the object. The file must be open for reading in binary mode. It has the following basic syntax:
pickle.load(file)
It returns the reconstructed object.
import pickle
# the Point class must be defined before its instances can be reconstructed
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __str__(self):
        return f'Point({self.x}, {self.y})'
    def __repr__(self):
        return f'Point({self.x}, {self.y})'
with open('points.dump', 'rb') as file:
    reconstructed_points = pickle.load(file)
print(reconstructed_points)
p1, p2, p3 = reconstructed_points
print(p1)
print(p2)
print(p3)
[Point(-2, 3), Point(3, 4), Point(5, 6)]
Point(-2, 3)
Point(3, 4)
Point(5, 6)
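dump() and load() are not limited to files on disk; they accept any binary file-like object. The sketch below uses io.BytesIO as an in-memory stand-in for a file (a choice made here for illustration) and shows that each load() call reads back one object, in the order the objects were written.

```python
import io
import pickle

stream = io.BytesIO()  # in-memory stand-in for a file opened in binary mode

# Each dump() call appends one serialized object to the stream
pickle.dump({'Tokyo': 'Japan'}, stream)
pickle.dump([1, 2, 3], stream)

stream.seek(0)  # rewind to the start before reading

# Each load() call reconstructs exactly one object, in the order written
first = pickle.load(stream)
second = pickle.load(stream)

print(first)   # {'Tokyo': 'Japan'}
print(second)  # [1, 2, 3]
```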
Unpicklable objects
Unpicklable objects are those that cannot be serialized using the pickle module.
Generally, objects that depend on operating system resources cannot be serialized. Such objects include open files, database connections, and sockets.
Trying to serialize such objects using pickle will result in errors.
import pickle
file = open('file.txt')
pickle.dumps(file)
TypeError: cannot pickle 'TextIOWrapper' instances
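If an object contains unpicklable attributes, we can define the __getstate__() and __setstate__() methods on its class. __getstate__() returns only the state that should be serialized, and __setstate__() receives that state when the object is reconstructed.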
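As a rough sketch of this technique, the hypothetical LogWriter class below holds an open file, which cannot be pickled. The class name, its attributes, and the scratch file path are all assumptions made for illustration. __getstate__() leaves the file out of the pickled state, and __setstate__() reopens it after reconstruction.

```python
import os
import pickle
import tempfile

class LogWriter:
    """Hypothetical class that holds an open file (an unpicklable attribute)."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, 'a')

    def __getstate__(self):
        # Copy the instance attributes and drop the open file
        state = self.__dict__.copy()
        del state['file']
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then reopen the file
        self.__dict__.update(state)
        self.file = open(self.path, 'a')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.log')  # hypothetical scratch file
    writer = LogWriter(path)

    pickled_state = writer.__getstate__()
    print(sorted(pickled_state))  # ['path']

    restored = pickle.loads(pickle.dumps(writer))
    print(restored.path == writer.path)  # True
    print(restored.file.closed)          # False: a new file handle was opened

    # Close both handles before the temporary directory is removed
    writer.file.close()
    restored.file.close()
```

Without the two methods, pickle.dumps(writer) would fail with a TypeError because of the open file stored in the file attribute.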