glob module in Python

glob is one of the modules in Python's standard library. It is used to find all pathnames that match a specified pattern. For example, instead of writing custom code to find all files with a certain extension, you can get them with a single glob call.

The module supports wildcards and patterns as defined by UNIX shell rules.

The primary tool offered in the module is the glob.glob() function which returns a list of path names matching a specified pattern. The function has the following syntax.

glob.glob(pathname, *, recursive=False, include_hidden=False)

`pathname`	The pattern to be matched; it may contain shell-style wildcards.
`recursive`	If True, the pattern '**' matches any files and zero or more directories and subdirectories.
`include_hidden`	If True, the '**' pattern also matches hidden directories (available from Python 3.11).

The function returns a list containing the paths that matched the pattern.

For illustration purposes, the following sections assume that these files and directories exist in the current working directory.

project
project/file.py
project/file1.py
project/file2.py
project/file3.py
project/tests
project/tests/test.py
project/tests/test1.py
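To follow along, the layout above can be recreated in a throwaway directory. The snippet below is a minimal sketch: `make_sample_tree` is a hypothetical helper written for this article, not part of the glob module, and the files it creates are empty.

```python
import glob
import os
import tempfile


def make_sample_tree(root):
    """Hypothetical helper: create the sample layout under `root`."""
    tests_dir = os.path.join(root, "project", "tests")
    os.makedirs(tests_dir)
    for name in ("file.py", "file1.py", "file2.py", "file3.py"):
        open(os.path.join(root, "project", name), "w").close()
    for name in ("test.py", "test1.py"):
        open(os.path.join(tests_dir, name), "w").close()


with tempfile.TemporaryDirectory() as root:
    make_sample_tree(root)
    # List the direct children of "project". sorted() is used because the
    # order returned by glob depends on the file system.
    children = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(root, "project", "*"))
    )
    print(children)
```

The temporary directory is deleted when the with block ends, so nothing is left behind on disk.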

glob() with wildcards

As mentioned earlier, the glob module supports wildcard characters:

An asterisk (*) matches zero or more characters of any type.
A question mark (?) matches exactly one character.

match zero or more characters(`*`)

use an asterisk for matching

import glob

matches = glob.glob('project/*')
for p in matches:
    print(p)

project\file.py
project\file1.py
project\file2.py
project\file3.py
project\tests

In the above example, only paths that are direct children of the "project" directory are returned, i.e. those that match the pattern "project/*". Note that this includes the "tests" directory itself but not the files inside it. To include paths in sub-directories, the sub-directory must also be included in the pattern.

import glob

matches = glob.glob('project/tests/*')
for p in matches:
    print(p)

project\tests\test.py
project\tests\test1.py

In the above example, we explicitly included the name of the sub-directory. We can also rely on a wildcard to find the directory, as shown below.

import glob

matches = glob.glob('project/*/*')
for p in matches:
    print(p)

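While the previous approach matches only the paths under the "tests" subdirectory, using a wildcard for the directory name matches the contents of every subdirectory of "project". Files that sit directly in "project", such as "project/file.py", are not matched because the pattern requires two levels below "project".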
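The effect of a wildcard in the directory position is easier to see with more than one subdirectory. The sketch below rebuilds part of the sample layout in a temporary directory and adds a second subdirectory, "docs", which is a hypothetical addition for illustration only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "project")
    # "docs" and "docs/index.py" are hypothetical; they are not in the listing above.
    for sub in ("tests", "docs"):
        os.makedirs(os.path.join(project, sub))
    for rel in ("file.py", "tests/test.py", "tests/test1.py", "docs/index.py"):
        open(os.path.join(project, *rel.split("/")), "w").close()

    matches = glob.glob(os.path.join(project, "*", "*"))
    # Show paths relative to "project", using "/" on every platform.
    nested = sorted(
        os.path.relpath(p, project).replace(os.sep, "/") for p in matches
    )
    print(nested)
```

The entries of both subdirectories are returned, while "file.py" is skipped because it is only one level below "project".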

match single character(`?`)

A question mark in the pattern will match a single character in the specified position.

import glob

matches = glob.glob('project/file?.py')
for p in matches:
    print(p)

project\file1.py
project\file2.py
project\file3.py

As you can see in the above example, only paths with exactly one character in the position of the ? are returned. "project/file.py" is not matched because it has no character in that position.

Instead of a question mark, we can use a character range to match a single character from a limited set. For example, [0-9] in the pattern matches only the digits 0 to 9 rather than any character.

import glob

matches = glob.glob('project/file[0-9].py')
for p in matches:
    print(p)

project\file1.py
project\file2.py
project\file3.py
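With the sample files, both patterns happen to return the same paths. The difference shows up once a file has a non-digit in that position. The sketch below uses a temporary directory and a hypothetical extra file, "fileA.py", to make that visible; `names` is a small helper defined only for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # "fileA.py" is a hypothetical file added for this comparison.
    for name in ("file.py", "file1.py", "file2.py", "fileA.py"):
        open(os.path.join(root, name), "w").close()

    def names(pattern):
        """Return the sorted base names of the paths matching `pattern` in root."""
        found = glob.glob(os.path.join(root, pattern))
        return sorted(os.path.basename(p) for p in found)

    any_char = names("file?.py")     # any single character
    digits = names("file[0-9].py")   # a single digit only
    print(any_char)
    print(digits)
```

Here "fileA.py" is matched by the question mark but not by the [0-9] range, and "file.py" is matched by neither.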

Matching files recursively

By default, the glob() function matches paths only at the directory levels spelled out in the pattern; it does not descend into sub-directories on its own. We can, however, set the recursive parameter to True so that glob traverses all sub-directories of the specified directory.

We use a double-star wildcard (**) in the pattern to mark where the recursive search should take place.

Consider the following example.

import glob

for p in glob.glob("project/**/*.py", recursive=True):
    print(p)

project\file.py
project\file1.py
project\file2.py
project\file3.py
project\tests\test.py
project\tests\test1.py

In the above example, we recursively searched the "project" directory for all files with a ".py" extension. With recursive=True, the double-star wildcard '**' matches zero or more directory levels, so the pattern covers "project" itself as well as all of its subdirectories.
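The '**' wildcard only has this meaning when recursive=True is passed; otherwise it behaves like a single '*'. The sketch below compares the two on a reduced copy of the sample layout in a temporary directory. The helper `rel` is defined only for this example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "project")
    os.makedirs(os.path.join(project, "tests"))
    for parts in (("file.py",), ("tests", "test.py")):
        open(os.path.join(project, *parts), "w").close()

    def rel(paths):
        """Sorted paths relative to "project", using "/" on every platform."""
        return sorted(
            os.path.relpath(p, project).replace(os.sep, "/") for p in paths
        )

    pattern = os.path.join(project, "**", "*.py")
    with_recursive = rel(glob.glob(pattern, recursive=True))
    without_recursive = rel(glob.glob(pattern))  # '**' acts like '*'
    print(with_recursive)
    print(without_recursive)
```

Without recursive=True the pattern is equivalent to "project/*/*.py", so "file.py" at the top level is missed.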

Recursive searching can be expensive if the directory tree is large. In such cases the iglob() function can be used instead of glob(). It works just like glob() except that it returns an iterator instead of a list, so the full list of results is never built up front and each path is produced only when it is requested. This is useful when you only need the first few matches or want to process paths one at a time.

import glob

paths = glob.iglob("project/**/*.py", recursive = True)
print(next(paths))
print(next(paths))
print(next(paths))

project\file.py
project\file1.py
project\file2.py

As you can see in the above example, the iglob() function does not return all the paths at once; instead, it yields them one by one, as they are requested.
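In practice the iterator is usually consumed with a for loop or a tool such as itertools.islice rather than repeated next() calls. The following is a minimal sketch that assumes a temporary directory containing five hypothetical files, file0.py to file4.py.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Five hypothetical files: file0.py ... file4.py
    for i in range(5):
        open(os.path.join(root, f"file{i}.py"), "w").close()

    paths = glob.iglob(os.path.join(root, "*.py"))
    is_list = isinstance(paths, list)  # iglob returns an iterator, not a list

    # Take only the first two matches from the iterator.
    first_two = list(itertools.islice(paths, 2))

    # The iterator remembers its position; count what is left.
    remaining = sum(1 for _ in paths)
    print(is_list, len(first_two), remaining)
```

Because an iterator can be consumed only once, call iglob() again if the matches are needed a second time.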
