Zip files are commonly used to compress large files and directories, reducing their size. By convention, zip files have a .zip extension.

Extracting a zip file means copying its contents out of the compressed archive into a directory on the computer. Once extracted, the files and directories can be used just like any others.

The zipfile module

The zipfile module in Python's standard library is used to manipulate zip files. It contains classes and functions that allow you to create, read, append to, and extract files from zip archives.

The is_zipfile() function in the module can be used to check whether a given file is a valid zip file.

Syntax:

zipfile.is_zipfile(filename)

The filename may be a path to the file or a file-like object.

The function returns True if filename is a valid zip file, and False otherwise.

import zipfile

print(zipfile.is_zipfile('example.zip'))
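Because is_zipfile() also accepts a file-like object, it can be tried without touching the disk. The sketch below builds a small archive in an io.BytesIO buffer (the member name 'hello.txt' is only an illustration) and checks both a real archive and some arbitrary bytes.

```python
import io
import zipfile

# Build a small archive entirely in memory (no files on disk).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('hello.txt', 'Hello, world!')

print(zipfile.is_zipfile(buffer))                    # True
print(zipfile.is_zipfile(io.BytesIO(b'not a zip')))  # False
```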

The most useful class in the zipfile module is the ZipFile class, which represents a zip archive and can be used for various operations on zip files. Such operations include:

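  • Creating new zip files.
  • Reading from zip files.
  • Extracting the contents of an existing zip file.
  • Adding files to an existing zip file.
  • Reading encrypted zip files by supplying a password (the module can decrypt archives that use traditional PKWARE encryption, but it cannot create encrypted archives).

Removing, renaming, or updating existing entries is not supported directly; it is done by writing a new archive that contains the desired entries.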
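The zipfile module has no method for deleting an entry from an archive, so a common workaround is to copy every other entry into a new archive. The sketch below assumes the archive fits in memory as bytes; remove_member is a hypothetical helper name, not part of the zipfile API.

```python
import io
import zipfile

def remove_member(archive_bytes: bytes, name_to_remove: str) -> bytes:
    """Hypothetical helper: return a copy of the archive without one entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as src, \
         zipfile.ZipFile(out, 'w') as dst:
        for info in src.infolist():
            if info.filename == name_to_remove:
                continue  # skip the entry being "removed"
            # Keep the original name, timestamp, compression method and attributes.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()

# Usage with a small in-memory archive
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('keep.txt', 'keep me')
    zf.writestr('drop.txt', 'drop me')

smaller = remove_member(buf.getvalue(), 'drop.txt')
with zipfile.ZipFile(io.BytesIO(smaller)) as zf:
    remaining = zf.namelist()

print(remaining)  # ['keep.txt']
```

Renaming works the same way: write the copied data under a new name instead of skipping the entry.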

Opening a zip file

To open a zip file, create a ZipFile object, passing the path of the zip file as the first argument. The typical steps are:

from zipfile import ZipFile, ZIP_STORED

zf = ZipFile(file, mode='r', compression=ZIP_STORED)  # open the zip file
# do something with the zip file
zf.close()  # release the underlying file


The file argument specifies the location of the zip file you want to open, or the name of the zip file you want to create.

The mode argument specifies how the zip file is opened: 'r' to read an existing archive, 'w' to write a new archive, 'x' to create a new archive exclusively, or 'a' to append to an existing archive.

  • The default mode is 'r', meaning the zip file is opened read-only.
  • The 'w' mode writes a new archive; if the file already exists, its existing contents are discarded.
  • The 'x' mode is similar to 'w' but raises FileExistsError if a file with the same name already exists.
  • The 'a' mode appends new entries to an existing zip file without overwriting its existing content.

All the above modes except for the 'r' mode will create an empty zip file if the file does not exist.
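The following sketch exercises the four modes on a throwaway archive inside a temporary directory; the name 'demo.zip' and the member names are placeholders chosen for the example.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.zip')  # hypothetical archive name

    with ZipFile(path, 'w') as zf:   # 'w' creates the archive
        zf.writestr('a.txt', 'first')

    with ZipFile(path, 'a') as zf:   # 'a' keeps a.txt and adds b.txt
        zf.writestr('b.txt', 'second')

    with ZipFile(path) as zf:        # default mode 'r'
        after_append = zf.namelist()

    try:
        ZipFile(path, 'x')           # 'x' refuses because demo.zip exists
        x_refused = False
    except FileExistsError:
        x_refused = True

    with ZipFile(path, 'w') as zf:   # 'w' truncates: a.txt and b.txt are discarded
        zf.writestr('c.txt', 'third')

    with ZipFile(path) as zf:
        after_overwrite = zf.namelist()

print(after_append)     # ['a.txt', 'b.txt']
print(x_refused)        # True
print(after_overwrite)  # ['c.txt']
```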

The compression argument specifies the compression method used for the files written to the archive. The default is zipfile.ZIP_STORED (no compression). The available options are:

  • zipfile.ZIP_STORED: This option indicates that the file should be stored without any compression. It essentially adds the files to the archive as they are, without reducing their size. This is the fastest option but does not provide any compression benefits.

  • zipfile.ZIP_DEFLATED: This option uses the DEFLATE algorithm to compress the files. DEFLATE is a widely used compression algorithm that provides a good balance between compression ratio and speed. It is suitable for compressing most types of data.

  • zipfile.ZIP_BZIP2: This option uses the BZIP2 compression algorithm. BZIP2 typically provides higher compression ratios than DEFLATE but is slower in terms of compression and decompression speed. It can be useful when you need better compression for certain types of data.

  • zipfile.ZIP_LZMA: This option uses the LZMA (Lempel-Ziv-Markov chain algorithm) compression algorithm. LZMA usually achieves the highest compression ratio of these options, but compressing with it is considerably slower than with DEFLATE.
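To compare the options, the sketch below compresses the same highly repetitive sample data with each method and records the resulting compress_size. The sample data is made up for illustration, so real-world ratios will differ; ZIP_BZIP2 and ZIP_LZMA also need the bz2 and lzma modules, which some Python builds lack (a RuntimeError is raised in that case).

```python
import io
import zipfile

data = b'zipfile ' * 10_000  # 80,000 bytes of highly repetitive sample data

methods = {
    'ZIP_STORED': zipfile.ZIP_STORED,
    'ZIP_DEFLATED': zipfile.ZIP_DEFLATED,
    'ZIP_BZIP2': zipfile.ZIP_BZIP2,
    'ZIP_LZMA': zipfile.ZIP_LZMA,
}

sizes = {}
for name, method in methods.items():
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=method) as zf:
        zf.writestr('sample.txt', data)
    with zipfile.ZipFile(buffer) as zf:
        sizes[name] = zf.getinfo('sample.txt').compress_size

for name, size in sizes.items():
    print(f'{name:13} {size:>6} bytes')
```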

The preferred way to use the ZipFile class is as a context manager, that is, with the with statement, as shown below:

from zipfile import ZipFile

with ZipFile('file_path.zip', 'r') as zf:
    pass  # perform some operations on zf here

Using the with statement where possible is considered good practice because it ensures that resources are properly managed: the zip file is closed automatically when the block finishes, even if the code inside the block raises an exception. This eliminates the need to call the close() method explicitly.

List all names (files and directories) in a zip file

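The ZipFile class defines the namelist() method, which returns a list with the name (relative path) of every member in the archive. A directory appears in the list only if the archive contains an explicit entry for it; such names end with '/'.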

Syntax:

from zipfile import ZipFile

with ZipFile('file_path.zip') as zf:
    names = zf.namelist()  # list of str

The namelist() method takes no argument.
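The sketch below builds a small in-memory archive (all member names are illustrative) to show what namelist() returns, including the fact that 'notes/' is not listed because no explicit entry was written for it.

```python
import io
from zipfile import ZipFile

buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('docs/', '')                   # explicit directory entry
    zf.writestr('docs/readme.txt', 'read me')
    zf.writestr('main.py', 'print("hi")')
    zf.writestr('notes/todo.txt', 'buy milk')  # no entry for 'notes/' itself

with ZipFile(buffer) as zf:
    names = zf.namelist()

print(names)  # ['docs/', 'docs/readme.txt', 'main.py', 'notes/todo.txt']
```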

Get file information

The getinfo() method of the ZipFile object takes a member name and returns a ZipInfo object containing information about that member of the archive. It raises KeyError if the archive has no member with that name. Useful ZipInfo attributes include:

  • filename: The name of the file, including any relative directories to the file.
  • file_size: The uncompressed size of the file in bytes.
  • compress_size: The compressed size of the file in bytes.
  • date_time: A tuple containing the year, month, day, hour, minute, and second of the last modification time of the file.
  • comment: Any comment associated with the file.

Syntax:

import zipfile

with zipfile.ZipFile(zip_file_path, 'r') as zf:
    file_info = zf.getinfo(file_name)

Example:

import zipfile

with zipfile.ZipFile('my_file.zip', 'r') as zf:
    file_info = zf.getinfo('example.txt')
    name = file_info.filename
    size = file_info.file_size
    last_modified = file_info.date_time
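Here is a self-contained variant of the example above that creates 'example.txt' in an in-memory archive first, so it runs without an existing my_file.zip. It also shows the KeyError raised for a name that is not in the archive ('missing.txt' is a placeholder).

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('example.txt', 'Lorem ipsum ' * 100)  # 1,200 bytes of text

missing_raises = False
with zipfile.ZipFile(buffer) as zf:
    file_info = zf.getinfo('example.txt')
    print(file_info.filename, file_info.file_size, file_info.compress_size)
    print(file_info.date_time)  # (year, month, day, hour, minute, second)

    try:
        zf.getinfo('missing.txt')
    except KeyError:
        missing_raises = True
```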

To retrieve information on all the files in the archive at once, use the infolist() method of the ZipFile object. It returns a list of ZipInfo objects, one for each member of the archive, in the same order as the entries in the archive.

Syntax:

import zipfile

with zipfile.ZipFile(zip_file_path, 'r') as zf:
    bulk_info = zf.infolist()
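A typical use of infolist() is summarising an archive. This sketch (with made-up member names and contents) skips directory entries using ZipInfo.is_dir() and adds up the uncompressed sizes of the files.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes/', '')               # directory entry
    zf.writestr('notes/a.txt', 'a' * 1000)
    zf.writestr('notes/b.txt', 'b' * 500)

total = 0
with zipfile.ZipFile(buffer) as zf:
    for info in zf.infolist():
        if info.is_dir():
            continue  # directory entries have no content
        total += info.file_size
        print(f'{info.filename}: {info.file_size} -> {info.compress_size} bytes')

print('Total uncompressed size:', total)  # Total uncompressed size: 1500
```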

Test the files in the archive 

The testzip() method of the ZipFile object performs a self-test on the contents of a ZIP archive: it reads every member and checks its CRC (cyclic redundancy check) value and file header. If the archive passes the test, the method returns None; otherwise, it returns the name of the first corrupted file encountered.

Syntax:

import zipfile

with zipfile.ZipFile(zip_file_path, 'r') as zf:
    first_bad_file = zf.testzip()  # None if every member is intact
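To see both outcomes, the sketch below stores one uncompressed member, then simulates corruption by changing a single byte of its stored data so the CRC no longer matches. Editing raw bytes this way is only a test trick that relies on ZIP_STORED keeping the text verbatim.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_STORED) as zf:
    zf.writestr('greeting.txt', 'hello world')

raw = buffer.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    good_result = zf.testzip()

# Change one character of the stored (uncompressed) data to simulate corruption.
damaged = raw.replace(b'hello world', b'jello world')
with zipfile.ZipFile(io.BytesIO(damaged)) as zf:
    bad_result = zf.testzip()

print(good_result)  # None
print(bad_result)   # greeting.txt
```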

Open a file in the archive

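To open a member of a zip archive for reading or writing, use the open() method of the ZipFile object. Its syntax resembles the built-in open() function, with two differences: mode can only be 'r' (the default) or 'w', and the returned file-like object always reads and writes bytes. Opening a member with 'w' requires the archive itself to be opened with mode 'w', 'x', or 'a'.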

Syntax:

from zipfile import ZipFile

with ZipFile('my_file.zip') as zf:
    with zf.open('name_in_archive.txt', 'r') as f:
        data = f.read()  # do something with the opened file

The nested with statement closes the member file before the enclosing archive is closed.

Example:

from zipfile import ZipFile

with ZipFile('my_file.zip', 'a') as zf:  # 'a' allows writing to the archive
    with zf.open('examples/example1.txt', 'w') as f:
        f.write(b'Ipsum, Lorem')         # members are written as bytes
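Since ZipFile.open() only works with bytes, one way to read a member as text is to wrap it in io.TextIOWrapper with an explicit encoding. The sketch below assumes the member is UTF-8 encoded and builds the archive in memory so it runs on its own.

```python
import io
from zipfile import ZipFile

buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    with zf.open('examples/example1.txt', 'w') as f:
        f.write('first line\nsecond line\n'.encode('utf-8'))

with ZipFile(buffer) as zf:
    # TextIOWrapper decodes the binary member stream line by line.
    with io.TextIOWrapper(zf.open('examples/example1.txt'), encoding='utf-8') as text:
        lines = [line.rstrip('\n') for line in text]

print(lines)  # ['first line', 'second line']
```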

Read a file in the archive directly

The read() method of the ZipFile object returns the content of a given member without having to open it explicitly.

from zipfile import ZipFile

with ZipFile('myzip.zip') as zf:
    file_content = zf.read('demo.txt')           # bytes
    file_content = file_content.decode('utf-8')  # str

The contents are returned as a bytes object, which can be decoded into text using an appropriate encoding (e.g., UTF-8).

Make a new directory in the zip archive

The mkdir() method of the ZipFile object, available since Python 3.11, creates a directory entry in the archive. Its argument is the directory's path within the archive; a trailing '/' is added if it is missing.

Syntax:

from zipfile import ZipFile

with ZipFile('my_file.zip', 'a') as zf:
    zf.mkdir('path/to/the/dir')

If an entry with the same path already exists, a 'Duplicate name' warning is issued.

Note: the archive must be opened with mode 'w', 'x', or 'a'.

Add external files to the archive

The write() method of the ZipFile object adds a file from the file system to a ZIP archive. It takes two main parameters, which are described after the syntax example.

Syntax:

import zipfile

# Create a new ZIP archive
with zipfile.ZipFile('archive.zip', 'w') as myzip:
    # Add a file to the archive
    myzip.write('file.txt', 'file_in_archive.txt')

The first parameter of the write() method is the path of the file to add. The second, optional parameter, arcname (archive name), specifies the name under which the file is stored in the archive; if it is omitted, the file's own path is used, with any drive letter and leading path separators removed.
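The effect of arcname can be checked with a temporary file. In this sketch the same file is written twice: once without arcname (so its absolute path, minus the leading separator, becomes the entry name) and once under 'file_in_archive.txt'. All paths are temporary placeholders.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'file.txt')
    with open(source, 'w', encoding='utf-8') as f:
        f.write('some text')

    archive = os.path.join(tmp, 'archive.zip')
    with ZipFile(archive, 'w') as myzip:
        myzip.write(source)                         # name derived from the path
        myzip.write(source, 'file_in_archive.txt')  # name given by arcname
        names = myzip.namelist()

print(names)
```

In practice, passing an explicit arcname is usually preferable, since it keeps machine-specific directory names out of the archive.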

Extract files from the archive

The extract() method of the ZipFile object extracts a single member from a ZIP archive and returns the path of the extracted file.

Syntax:

from zipfile import ZipFile

with ZipFile('archive.zip', 'r') as zip_file:
    # Extract a specific file from the archive
    zip_file.extract('path/to/file', 'destination_folder')

The member is written into the 'destination_folder' directory, keeping its path inside the archive (here, destination_folder/path/to/file). Missing directories are created automatically, and a file with the same name already at that location is overwritten. If no destination is given, the current working directory is used.
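This sketch confirms that behaviour with temporary paths: the destination folder does not exist beforehand, and the returned path includes the member's directories from inside the archive.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'archive.zip')
    with ZipFile(archive, 'w') as zf:
        zf.writestr('path/to/file.txt', 'data')

    destination = os.path.join(tmp, 'destination_folder')  # does not exist yet
    with ZipFile(archive) as zf:
        extracted = zf.extract('path/to/file.txt', destination)

    rel = os.path.relpath(extracted, tmp)
    exists = os.path.isfile(extracted)

print(rel)     # destination_folder/path/to/file.txt (on POSIX systems)
print(exists)  # True
```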

To extract all files from the archive, use the extractall() method.

from zipfile import ZipFile

with ZipFile('archive.zip', 'r') as zip_file:
    zip_file.extractall('destination_folder')
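extractall() also accepts a members argument, a subset of the names returned by namelist(). The sketch below uses it to extract only the .txt and .csv members of a small temporary archive; the filter and file names are illustrative.

```python
import os
import tempfile
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'archive.zip')
    with ZipFile(archive, 'w') as zf:
        zf.writestr('report.txt', 'text')
        zf.writestr('data.csv', 'a,b\n1,2\n')
        zf.writestr('images/logo.png', b'\x89PNG')

    destination = os.path.join(tmp, 'destination_folder')
    with ZipFile(archive) as zf:
        # Extract only the .txt and .csv members.
        wanted = [n for n in zf.namelist() if n.endswith(('.txt', '.csv'))]
        zf.extractall(destination, members=wanted)

    extracted = sorted(os.listdir(destination))

print(extracted)  # ['data.csv', 'report.txt']
```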

Create a zip archive for an entire directory

To create a zip file from a directory, you can walk the directory tree with os.walk() and call the ZipFile.write() method for each file found. The example below shows how to achieve this:

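import os
from zipfile import ZipFile, ZIP_DEFLATED

def zipdir(path, zip_file):
    """Add every file under path to zip_file, keeping the folder name as prefix."""
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            # Store paths relative to the parent of `path`, so entries
            # start with the directory's own name (e.g. 'directory/a.txt').
            arcname = os.path.relpath(file_path, os.path.join(path, '..'))
            zip_file.write(file_path, arcname)

with ZipFile('Example.zip', 'w', ZIP_DEFLATED) as zip_file:
    zipdir('target/directory/', zip_file)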

Replace 'target/directory/' with the actual path of the directory you want to zip. The code then creates a ZIP archive named 'Example.zip' containing every file in that directory and its subdirectories. Empty subdirectories are not included, because only files are written.
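As an alternative to a hand-written walk, the standard library's shutil.make_archive() can build a similar archive in one call. In this sketch a small sample tree is created in a temporary directory (the names 'directory', 'sub', and 'Example' are placeholders); root_dir is the parent folder and base_dir the folder name, so entries start with 'directory/'. Unlike the os.walk() approach above, make_archive() also stores entries for the directories themselves.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a sample tree: tmp/directory/a.txt and tmp/directory/sub/b.txt
    target = os.path.join(tmp, 'directory')
    os.makedirs(os.path.join(target, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(target, rel), 'w', encoding='utf-8') as f:
            f.write(rel)

    # make_archive appends '.zip' to base_name and returns the archive path.
    archive = shutil.make_archive(os.path.join(tmp, 'Example'), 'zip',
                                  root_dir=tmp, base_dir='directory')

    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```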