Binary data is information represented as a sequence of bytes, where each byte is made up of eight bits (0s and 1s). The information stored can be of various types, such as plain text, images (e.g., .jpeg), audio (e.g., .mp3), video (e.g., .mp4), zip files, and more.

Similar to plain text files, binary files also have opening modes that determine how they can be accessed and manipulated. The following are the common opening modes for binary files:

  • "rb" (read binary): This mode is used to open a binary file for reading. It indicates that the file will be read in binary format, allowing you to retrieve the raw data from the file. 

  • "wb" (write binary): This mode is used to open a binary file for writing. It indicates that you will be writing binary data to the file, allowing you to create or overwrite the file with new binary data.

  • "ab" (append binary): This mode is similar to "wb", but it appends binary data to the end of the file instead of overwriting it. If the file does not exist, it is created. It allows you to add new data at the end of the file without modifying the existing contents.

  • "rb+" (read and write binary): This mode allows you to open an existing binary file for both reading and writing. The file is not truncated, so you can read the existing binary data and modify it in place. If the file does not exist, a FileNotFoundError is raised.
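
The four modes listed above can be tried out on raw bytes, without any serialization involved. The following is a minimal sketch; the file name modes_demo.bin and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.bin")

# "wb": create the file (or truncate it) and write raw bytes.
with open(path, "wb") as f:
    f.write(b"\x01\x02\x03")

# "ab": add bytes at the end; the existing content is untouched.
with open(path, "ab") as f:
    f.write(b"\x04\x05")

# "rb+": read and modify in place; the file must already exist.
with open(path, "rb+") as f:
    first = f.read(1)   # b'\x01'
    f.seek(0)           # move back to the start of the file
    f.write(b"\xff")    # overwrite only the first byte

# "rb": read the raw bytes back.
with open(path, "rb") as f:
    content = f.read()

print(content)  # b'\xff\x02\x03\x04\x05'
```

Note that "rb+" overwrote a single byte and left the rest of the file intact, whereas reopening the file with "wb" would have discarded everything.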

The open() function supports both text and binary modes. In binary mode, the function reads and writes raw bytes, without applying any text encoding or decoding. The syntax for using the open() function is as follows:

open(path, mode)

The path argument indicates the location of the file you want to open. The mode argument determines how the file will be opened, whether for reading or writing, and whether it should be treated as a plain text file or a binary file.
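
To illustrate how the mode argument changes what open() hands back, the sketch below reads the same file once in text mode and once in binary mode. The file name mode_demo.txt and the sample string are assumptions chosen for the example.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")

# Write four characters; the last one needs two bytes in UTF-8.
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9")

# Text mode: the bytes are decoded and returned as a str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: the bytes are returned exactly as stored on disk.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 4
print(type(as_bytes).__name__, len(as_bytes))  # bytes 5
```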

A binary file stores bytes, so to save Python objects in one we need a way to convert them to and from a binary representation. The pickle module serves this purpose: it can serialize Python objects (convert them into bytes) and deserialize them (convert the bytes back into Python objects). It is part of the standard library and thus requires no additional installation.
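
Serialization and deserialization can be seen in isolation, without touching a file, using pickle.dumps() and pickle.loads(), which work on bytes objects in memory. This is a small sketch; the sample dictionary is made up for illustration.

```python
import pickle

# Made-up sample object.
original = {"name": "Ada", "scores": [90, 85]}

blob = pickle.dumps(original)   # serialize: Python object -> bytes
restored = pickle.loads(blob)   # deserialize: bytes -> new Python object

print(type(blob).__name__)      # bytes
print(restored == original)     # True: same content
print(restored is original)     # False: a separate copy was rebuilt
```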

Writing binary data to a file: the dump() function

import pickle
with open("demo.bin", 'wb') as file:
    pickle.dump('Hello, World!', file)

Pickle's dump() function is used to serialize an object to binary before writing it to a file.

As shown above, the dump() function takes the value to write and a file object as its parameters. In the previous example, we used a string ('Hello, World!') as the value, but the value can be any picklable object, which includes most built-in types such as numbers, strings, lists, tuples, and dictionaries. For instance, in the following example, we pass a dictionary to the dump() function.

import pickle
data = {'Tokyo': 'Japan', 'Ottawa':'Canada', 'Helsinki':'Finland', 'Nairobi':'Kenya', 'Manilla':'Philippines'}
with open("demo.bin", 'wb') as file:
    pickle.dump(data, file)

Binary data is not human readable. If you open the file that we just created (demo.bin) in a text editor, you will see a jumble of seemingly random characters and symbols. A binary representation is designed to store data in a format that is optimized for machine processing rather than human interpretation.

Reading binary data from a file: the load() function

import pickle
with open("demo.bin", 'rb') as file:
    data = pickle.load(file)
    print(data)

# Output: {'Tokyo': 'Japan', 'Ottawa': 'Canada', 'Helsinki': 'Finland', 'Nairobi': 'Kenya', 'Manilla': 'Philippines'}

The load() function in the pickle module is used to deserialize and read binary data from a file, reconstructing the original Python objects. It takes a file object as a parameter from which it reads the binary data.

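It is important to note that the load() function expects the binary data to be in the pickle format, as generated by the dump() function. Attempting to load data that was not serialized with pickle typically raises an exception such as pickle.UnpicklingError, or EOFError if the file is empty. Because unpickling can execute arbitrary code, you should only load pickle data from sources you trust.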
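
The following sketch shows one way to guard a load() call against files that do not contain valid pickle data. The helper try_load() and the three sample files are hypothetical names introduced for this example. The broad except clause is a deliberate simplification, since the exact exception type depends on the bytes encountered.

```python
import os
import pickle
import tempfile

def try_load(path):
    """Return (True, obj) on success, or (False, error) if unpickling fails."""
    try:
        with open(path, "rb") as f:
            return True, pickle.load(f)
    except Exception as exc:  # broad on purpose: the error type varies with the input
        return False, exc

folder = tempfile.mkdtemp()
samples = {
    "good.bin": pickle.dumps([1, 2, 3]),  # a real pickle
    "empty.bin": b"",                     # nothing to read
    "junk.bin": b"\xff\xff\xff",          # bytes that are not valid pickle data
}

results = {}
for name, payload in samples.items():
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(payload)
    ok, value = try_load(path)
    results[name] = ok
    print(name, ok, type(value).__name__)
```

Only good.bin loads successfully; for the other two files the helper reports the failure instead of letting the exception propagate.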

We can also use the file object methods read(), readline(), and readlines() to read data from a binary file. However, these methods only return the raw binary content as bytes, without any interpretation or deserialization. For example:

with open("demo.bin", 'rb') as file:
    data = file.read()
    print(data)

# Output: b'\x80\x04\x95f\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x05Tokyo\x94\x8c\x05Japan\x94\x8c\x06Ottawa\x94\x8c\x06Canada\x94\x8c\x08Helsinki\x94\x8c\x07Finland\x94\x8c\x07Nairobi\x94\x8c\x05Kenya\x94\x8c\x07Manilla\x94\x8c\x0bPhilippines\x94u.'
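
The raw bytes returned by read() are the same bytes that pickle understands, so they can still be turned back into an object afterwards with pickle.loads(). The sketch below assumes a scratch file in a temporary directory and a shortened dictionary, both chosen only for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file and sample data.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
data = {"Tokyo": "Japan", "Ottawa": "Canada"}

with open(path, "wb") as f:
    pickle.dump(data, f)

with open(path, "rb") as f:
    raw = f.read()             # raw bytes, not yet interpreted

restored = pickle.loads(raw)   # deserialize the bytes held in memory

print(isinstance(raw, bytes))  # True
print(restored == data)        # True
```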

Appending to a binary file

We can use the 'ab' mode to append data to a binary file without overwriting the existing data. Example:

import pickle
more_data = {'cow':'moo', 'cat':'Purrr', 'dog':'woof'}
with open("demo.bin", 'ab') as file:
    pickle.dump(more_data, file)  # dump() returns None, so there is nothing to assign

Each call to pickle.load() reads exactly one object from the file and leaves the file position at the start of the next one. To read more than one object, call pickle.load() repeatedly until an EOFError is raised. The EOFError indicates that there is no more data to unpickle.

import pickle
with open("demo.bin", 'rb') as file:
    while True:
        try:
            print(pickle.load(file))
        except EOFError:
            break

# Output:
# {'Tokyo': 'Japan', 'Ottawa': 'Canada', 'Helsinki': 'Finland', 'Nairobi': 'Kenya', 'Manilla': 'Philippines'}
# {'cow': 'moo', 'cat': 'Purrr', 'dog': 'woof'}
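
The read-until-EOFError loop can be wrapped in a reusable generator. The following is a minimal sketch; the function name load_all() and the file name multi.bin are assumptions introduced here, not part of the pickle module.

```python
import os
import pickle
import tempfile

def load_all(path):
    """Yield every pickled object stored in the file at `path`, in order."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # no more objects in the file

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "multi.bin")

with open(path, "wb") as f:   # first object: create the file
    pickle.dump({"a": 1}, f)
with open(path, "ab") as f:   # second object: append to it
    pickle.dump([1, 2, 3], f)

objects = list(load_all(path))
print(objects)  # [{'a': 1}, [1, 2, 3]]
```

Using a generator means the objects are loaded one at a time, which can help when the file holds many large objects.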