The pickle module in the standard library provides tools for serializing Python objects into a sequence of bytes.

A serialized object can be stored on disk or sent over a network, and later reconstructed into an equivalent Python object. This makes it possible to persist data across different Python processes.

To use the module, we first need to import it in our Python program:

import pickle

Basic Usage

The pickle.dumps() function is the basic tool for serialization. It takes the object to serialize as its argument and returns the equivalent sequence of bytes.

pickle.dumps(obj)

Serialize a dictionary

import pickle

data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines'} # a dictionary

serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)

To reconstruct (deserialize) a serialized object, we use the pickle.loads() function. It takes the serialized bytes as its argument and returns the reconstructed Python object.

pickle.loads(serialized_data)

Reconstruct a serialized object

import pickle

data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines'} # a dictionary

serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)

reconstructed_object = pickle.loads(serialized_data)
print(reconstructed_object)

Note that the reconstructed object is equal to the original (==) but is not the same object (is): it is a new copy with the same contents.
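
The difference between equality and identity can be checked directly. The following is a minimal sketch; the variable name copy is chosen only for illustration.

```python
import pickle

data = {'Tokyo': 'Japan', 'Moscow': 'Russia'}

# Round-trip the dictionary through pickle.
copy = pickle.loads(pickle.dumps(data))

print(copy == data)  # True: same contents
print(copy is data)  # False: a separate object in memory

# The two objects are independent, so changing one leaves the other untouched.
copy['Nairobi'] = 'Kenya'
print('Nairobi' in data)  # False
```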

The examples above used a dictionary, but most Python objects can be serialized and reconstructed, including instances of custom classes. Consider the following example, which uses a custom class.

import pickle

class Point:
	def __init__(self, x = 0, y = 0):
		self.x = x 
		self.y = y 
	def __str__(self):
		return f'Point({self.x}, {self.y})'

p = Point(-2, 3)

# serialize p
serialized_p = pickle.dumps(p)
print(serialized_p)

#reconstruct 
reconstructed_p = pickle.loads(serialized_p)
print(reconstructed_p)
print(reconstructed_p.x)
print(reconstructed_p.y)

b'\x80\x04\x95-\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x05Point\x94\x93\x94)\x81\x94}\x94(\x8c\x01x\x94J\xfe\xff\xff\xff\x8c\x01y\x94K\x03ub.'
Point(-2, 3)
-2
3

Data persistence

A serialized object can be written into a file for permanent storage. The data can then be loaded and used even after the current program instance has been closed. 

You can use file methods such as write() and read() to store and retrieve serialized data. However, as we will see shortly, the module provides more convenient functions for the same purpose.

Write the serialized data to a file

import pickle

data = {'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines', 'Nairobi': 'Kenya'} # a dictionary

serialized_data = pickle.dumps(data) #serialize the dictionary
print(serialized_data)

# write serialized data to cities.dump
with open('cities.dump', 'wb') as file:
    file.write(serialized_data)

b'\x80\x04\x95Q\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x05Tokyo\x94\x8c\x05Japan\x94\x8c\x06Moscow\x94\x8c\x06Russia\x94\x8c\x07Manilla\x94\x8c\x0bPhilippines\x94\x8c\x07Nairobi\x94\x8c\x05Kenya\x94u.'

The 'wb' mode in the open() call opens the file for writing in binary mode. This is required because pickle.dumps() returns bytes, not a string.
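
To see why binary mode matters, the sketch below checks the type returned by pickle.dumps() and shows that writing it to a file opened in text mode fails. The file name demo.dump and the use of a temporary directory are assumptions made only for this illustration.

```python
import os
import pickle
import tempfile

serialized = pickle.dumps({'Tokyo': 'Japan'})
print(type(serialized))  # <class 'bytes'>

# 'demo.dump' is a hypothetical file name, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.dump')

    # Text mode ('w') expects str, so writing bytes raises TypeError.
    try:
        with open(path, 'w') as file:
            file.write(serialized)
    except TypeError as error:
        text_mode_error = error
        print('text mode failed:', error)

    # Binary mode ('wb') accepts the bytes unchanged.
    with open(path, 'wb') as file:
        written = file.write(serialized)

print(written == len(serialized))  # True
```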

We can then read the bytes back from the file and call pickle.loads() to reconstruct the object.

Read serialized object and reconstruct it

import pickle

# read data from cities.dump
with open('cities.dump', 'rb') as file:
    serialized_data = file.read()

reconstructed_object = pickle.loads(serialized_data)
print(reconstructed_object)

{'Tokyo': 'Japan', 'Moscow': 'Russia', 'Manilla': 'Philippines', 'Nairobi': 'Kenya'}

Use dump() and load() instead

As mentioned earlier, the pickle module offers functions that conveniently write data to and read data from a stream. The two functions are dump() and load().

dump() serializes an object and writes it to a file in a single step. It has the following basic syntax:

pickle.dump(obj, file)

It serializes obj and writes the result to file, which must be open for writing in binary mode.

Using dump()

import pickle

class Point:
	def __init__(self, x = 0, y = 0):
		self.x = x 
		self.y = y 
	def __str__(self):
		return f'Point({self.x}, {self.y})'

p1 = Point(-2, 3)
p2 = Point(3, 4)
p3 = Point(5, 6)
points = [p1, p2, p3] # a list with points to serialize

with open('points.dump', 'wb') as file:
    pickle.dump(points, file)

The load() function reads serialized data from the given file and reconstructs the object. It has the following basic syntax:

pickle.load(file)

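It returns the reconstructed object. The file must be open for reading in binary mode. For instances of custom classes, the class must also be defined in (or importable by) the program that loads the data, because pickle stores a reference to the class rather than its code.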

import pickle

class Point:
	def __init__(self, x = 0, y = 0):
		self.x = x 
		self.y = y 
	def __str__(self):
		return f'Point({self.x}, {self.y})'
	def __repr__(self):
		return f'Point({self.x}, {self.y})'

with open('points.dump', 'rb') as file:
    reconstructed_points = pickle.load(file)
    print(reconstructed_points)

p1, p2, p3 = reconstructed_points
print(p1)
print(p2)
print(p3)

[Point(-2, 3), Point(3, 4), Point(5, 6)]
Point(-2, 3)
Point(3, 4)
Point(5, 6) 
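
dump() and load() are not tied to files on disk; they should work with any binary stream. The sketch below assumes an in-memory io.BytesIO buffer as the stream and uses plain tuples in place of Point objects to stay short.

```python
import io
import pickle

points = [(-2, 3), (3, 4), (5, 6)]  # plain tuples stand in for Point objects

buffer = io.BytesIO()        # in-memory binary stream
pickle.dump(points, buffer)  # write the pickled bytes into the stream

buffer.seek(0)               # rewind to the start before reading
restored = pickle.load(buffer)

print(restored)  # [(-2, 3), (3, 4), (5, 6)]
```

The seek(0) call is needed because the stream position sits at the end of the data after dump() has written to it.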

Unpicklable objects

Unpicklable objects are those that cannot be serialized using the pickle module.

Generally, objects that depend on resources held by the operating system cannot be serialized. Such objects include open files, database connections, sockets, etc.

Trying to serialize such objects using pickle will result in errors.

import pickle

file = open('file.txt')

pickle.dumps(file)

TypeError: cannot pickle 'TextIOWrapper' instances
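
When it is not known in advance whether an object can be pickled, one option is to attempt the serialization and catch the error. The following is a rough sketch: is_picklable() is a hypothetical helper, not part of the pickle module, and it only handles the two exception types listed in the code.

```python
import os
import pickle
import tempfile

def is_picklable(obj):
    """Hypothetical helper: return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True

print(is_picklable({'Tokyo': 'Japan'}))  # True

# An open file in a temporary directory stands in for any OS-backed object.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'file.txt'), 'w') as file:
        file_result = is_picklable(file)

print(file_result)  # False
```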

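If an object contains unpicklable attributes, we can define the __getstate__() and __setstate__() methods. __getstate__() returns only the state that should be serialized, and __setstate__() receives that state when the object is reconstructed and restores the instance from it.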
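
As an illustration, the sketch below uses a hypothetical Counter class whose lock attribute cannot be pickled. A threading.Lock is assumed here as the unpicklable attribute; an open file or a socket could be handled the same way.

```python
import pickle
import threading

class Counter:
    """Hypothetical class with one picklable and one unpicklable attribute."""

    def __init__(self, value=0):
        self.value = value
        self.lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dictionary and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the pickled attributes, then create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

counter = Counter(5)
restored = pickle.loads(pickle.dumps(counter))

print(restored.value)                 # 5
print(restored.lock is counter.lock)  # False: a new lock was created
```

Copying __dict__ before deleting the key keeps the original object intact; deleting from self.__dict__ directly would remove the lock from the live instance as well.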