The os module in the standard library includes the path submodule, which provides utilities for working with file and directory paths.

To use the utilities offered by the path module, we first have to import it into our program, as shown below.

#import the module
from os import path

#do something with the module
print(path)

Parsing Paths

Parsing refers to breaking down data into meaningful tokens which can then be used for further processing. 

The path module offers functions for parsing a string that represents a path. The functions do not validate whether the given path actually exists; they simply operate on the paths as plain strings.

The parsing functions depend on some os variables in order to function correctly. These variables are outlined below:

  • os.sep - The character used to separate the parts of a path. It is a forward slash (/) on Unix-based systems and a backslash (\) on Windows-based systems.
  • os.extsep - The character used to separate an extension from the rest of a filename. This is usually a period (.).
  • os.curdir - The string used to represent the current directory. It is usually a period (.).
  • os.pardir - The string used to refer to the parent directory. It is usually two periods (..).

import os

#the variables
print(os.sep)
print(os.extsep)
print(os.curdir)
print(os.pardir)
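
The values printed above depend on the platform the program runs on. As a minimal sketch for comparing them side by side, the standard library also exposes the two implementations behind os.path as the posixpath and ntpath modules, which can be imported on any system:

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the Unix implementation of os.path

# Print the four variables for each implementation.
for flavour in (posixpath, ntpath):
    print(flavour.__name__,
          repr(flavour.sep),
          repr(flavour.extsep),
          repr(flavour.curdir),
          repr(flavour.pardir))
```

Only the separator differs between the two; the extension separator, current directory and parent directory strings are the same.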

path.split() 

The split() function breaks a path into two parts: the directory path and the base filename. It simply splits the string at the last position where the os.sep character appears.

path.split(p)

 The function returns a tuple which contains the directory and the filename.

from os import path

p = "C:/Users/admin/desktop/example.mp3"

parts = path.split(p)
print(parts)

#unpack the tuple (avoid the name dir, which shadows a built-in function)
dirname, fname = parts
print('directory: ', dirname)
print('filename: ', fname)

If the given string ends with the os.sep character, the second value in the returned tuple will be an empty string.

from os import path

p = "C:/Users/admin/desktop/"

parts = path.split(p)
print(parts)

#unpack the tuple (avoid the name dir, which shadows a built-in function)
dirname, fname = parts
print('directory: ', dirname)
print('filename: ', fname)
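
A few more edge cases may help to show how the splitting rule behaves. The sketch below uses the posixpath module (the Unix implementation of os.path) so that the results are the same on every platform; the paths themselves are made up for illustration.

```python
import posixpath

# No separator at all: the directory part is empty.
no_separator = posixpath.split('example.mp3')

# Only the root: the filename part is empty.
root_only = posixpath.split('/')

# An ordinary nested path.
nested = posixpath.split('/home/admin/example.mp3')

print(no_separator)  # ('', 'example.mp3')
print(root_only)     # ('/', '')
print(nested)        # ('/home/admin', 'example.mp3')
```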

path.basename() 

The basename() function returns a value equivalent to the second value in the tuple returned by the split() function, that is, the text that comes after the last os.sep character.

path.basename(p)

from os import path

p = 'C:/Users/admin/desktop/example.png'

print(path.basename(p))

If the input string ends with the os.sep character, the basename() function returns an empty string.

from os import path

paths = [
'/styles/css/base.css',
'/scripts/js/index.js',
'static/media/home.png',
'static/files/'
]

for p in paths:
    print(path.basename(p))
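
When the last component of a path with a trailing separator is needed, one possible approach (an illustration, not the only way) is to normalize the path first so that the trailing separator is removed. The sketch uses posixpath so that it behaves the same on every platform.

```python
import posixpath

p = 'static/files/'

# basename() alone returns an empty string for a trailing separator.
plain = posixpath.basename(p)

# normpath() strips the trailing separator, so basename() then
# returns the last directory name.
last = posixpath.basename(posixpath.normpath(p))

print(repr(plain))  # ''
print(repr(last))   # 'files'
```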

path.dirname()

The dirname() function returns a string equivalent to the first value of the tuple returned by the split() function. The returned value represents the directory name for the given path. It is simply all the characters up to the last os.sep character in the path.

path.dirname(p)

from os import path

p = 'C:/Users/john/desktop/test.py'

print(path.dirname(p))
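
Since dirname() returns a path as well, it can be applied repeatedly to walk up the directory tree. The following sketch, using posixpath and a made-up path, also checks that dirname() and basename() together match the tuple returned by split().

```python
import posixpath

p = '/home/john/desktop/test.py'

parent = posixpath.dirname(p)            # one level up
grandparent = posixpath.dirname(parent)  # two levels up

# dirname() and basename() are the two halves of split().
same = (posixpath.dirname(p), posixpath.basename(p)) == posixpath.split(p)

print(parent)       # /home/john/desktop
print(grandparent)  # /home/john
print(same)         # True
```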

path.splitext()

The splitext() function works just like the split() function except that it uses the os.extsep character to split the path. It returns a tuple containing the root and the file extension.

path.splitext(p)

The function splits the input string at the last occurrence of os.extsep character. 

from os import path

paths = ['example.txt',
'city.png',
'song.mp3',
'project.py']

for p in paths:
    print(path.splitext(p))
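
Some filenames are less obvious than the ones above. The sketch below, with made-up names, shows how splitext() treats multiple extensions, a leading period, a missing extension, and a period that appears only in a directory name.

```python
import posixpath

double_ext = posixpath.splitext('archive.tar.gz')   # only the last extension is split off
hidden = posixpath.splitext('.bashrc')              # a leading period is not an extension
no_ext = posixpath.splitext('README')               # no extension at all
dotted_dir = posixpath.splitext('dir.v2/notes')     # the period is in the directory name

print(double_ext)  # ('archive.tar', '.gz')
print(hidden)      # ('.bashrc', '')
print(no_ext)      # ('README', '')
print(dotted_dir)  # ('dir.v2/notes', '')
```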

path.commonprefix() 

The commonprefix() function takes an iterable of paths as an argument and returns the longest prefix shared by all of them. The comparison is done character by character, so the result is not guaranteed to be a valid path.

path.commonprefix(m)

from os import path

paths = ['project/static/css/',
'project/static/js',
'project/static/media/image.png',
'project/static/files/song.mp3'
]

print(path.commonprefix(paths))
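
Because commonprefix() compares the strings one character at a time, its result can stop in the middle of a directory name. The sketch below contrasts it with path.commonpath(), which compares whole path components; the example paths are made up, and posixpath is used so that the output is the same on every platform.

```python
import posixpath

paths = ['/usr/lib/python3', '/usr/local/lib']

# Character-by-character comparison: stops inside a component.
by_characters = posixpath.commonprefix(paths)

# Component-by-component comparison: always a valid sub-path.
by_components = posixpath.commonpath(paths)

print(by_characters)  # /usr/l
print(by_components)  # /usr
```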

Creating Paths

It is common to create paths from existing strings. The path module offers various functions that can be used for this purpose.

path.join()

The join() function is a common  tool for creating a path by joining two or more segments.

path.join(path, *paths)
path - The base path.
*paths - A series of arbitrary additional paths to be joined to the base path.

 The join() function uses the os.sep variable to join the elements of the given path.

from os import path

segments = ('dir1', 'dir2', 'dir3', 'dir4')

full_path = path.join('/basedir', *segments)

print(full_path)

In the above example:

  • We imported the path module from os.
  • We created a tuple containing the segments to be joined to the base path.
  • We called path.join() with '/basedir' as the base path and the segments as the remaining arguments. The unpacking operator (*) passes each segment to the function as a separate argument.

If an argument to join() begins with the os.sep character, it is regarded as an absolute path and is treated as the new base path; all the arguments that precede it are therefore discarded.

from os import path

paths = [('dir1', 'dir2', 'dir3', 'dir4'),
('dir1', '/dir2', 'dir3', 'dir4'),
('dir1', 'dir2', '/dir3', 'dir4'),
('dir1', 'dir2', 'dir3', '/dir4')
]

for p in paths:
    full_path = path.join('basedir', *p)
    print(full_path)

In the above example, the first tuple contains no absolute segment, so all of its segments are joined to 'basedir'. In each of the other iterations, the arguments preceding the one that begins with the os.sep character (/) are discarded, and that argument becomes the start of the returned path.
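
To make the discarding rule concrete, here is the same loop written with posixpath (the Unix implementation of os.path), so that the separator is always a forward slash and the results can be listed. The last line shows one more behavior worth knowing: an empty final segment produces a trailing separator.

```python
import posixpath

paths = [('dir1', 'dir2', 'dir3', 'dir4'),
         ('dir1', '/dir2', 'dir3', 'dir4'),
         ('dir1', 'dir2', '/dir3', 'dir4'),
         ('dir1', 'dir2', 'dir3', '/dir4')]

# Join each tuple of segments to the same relative base path.
results = [posixpath.join('basedir', *p) for p in paths]
for r in results:
    print(r)
# basedir/dir1/dir2/dir3/dir4
# /dir2/dir3/dir4
# /dir3/dir4
# /dir4

# An empty last segment adds a trailing separator.
with_trailing = posixpath.join('a', 'b', '')
print(with_trailing)  # a/b/
```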

path.expanduser()

The expanduser() function expands a pathname that starts with a tilde (~) character, which represents the user's home directory. It replaces the ~ and ~user constructs at the beginning of a path with the absolute path to the home directory of the current user or of the named user, respectively.

path.expanduser(p)

from os import path

home = path.expanduser('~')
print(home)

C:\Users\John

We can also pass the expanduser() function a longer path that begins with a tilde to create an absolute path.

from os import path

my_path = "~\\desktop\\project.py"

print(path.expanduser(my_path))

C:\Users\John\desktop\project.py 
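
The expanduser() and join() functions can also be combined, which avoids hard-coding a separator in the path string. This is a minimal sketch; the 'desktop' and 'project.py' names are only placeholders, and the printed result depends on the user running the program.

```python
from os import path

# Expand the home directory first, then join the remaining segments.
home = path.expanduser('~')
project = path.join(home, 'desktop', 'project.py')
print(project)

# A path that does not start with a tilde is returned unchanged.
unchanged = path.expanduser('desktop/project.py')
print(unchanged)
```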

path.expandvars()

The expandvars() function is more general than the expanduser() function in that it expands any environment variables in the given path.

path.expandvars(path)

The function replaces substrings of the form $var or ${var} (and, on Windows, %var%) with the values of the corresponding environment variables. It does not validate whether the resulting path actually exists.

import os
from os import path

os.environ['DESKTOP'] = 'C:\\Users\\Admin\\Desktop'
os.environ['TEST_PATH'] = 'project\\tests.py'

abs_test_path = path.expandvars('$DESKTOP\\$TEST_PATH')

print(abs_test_path)

C:\Users\Admin\Desktop\project\tests.py
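
Two further details are sketched below: the ${var} form, and what happens when a variable is not defined. The variable names PROJECT_ROOT and UNSET_DEMO_VAR are hypothetical and exist only for this illustration; posixpath is used so that the result does not depend on the platform.

```python
import os
import posixpath

# Hypothetical variables, defined only for this example.
os.environ['PROJECT_ROOT'] = '/srv/app'
os.environ.pop('UNSET_DEMO_VAR', None)  # make sure this one is not set

# The ${var} form is expanded just like $var.
braces = posixpath.expandvars('${PROJECT_ROOT}/logs/app.log')

# References to variables that do not exist are left unchanged.
missing = posixpath.expandvars('$UNSET_DEMO_VAR/logs')

print(braces)   # /srv/app/logs/app.log
print(missing)  # $UNSET_DEMO_VAR/logs
```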

Path Normalization

Normalizing paths is the process of converting a given path into a canonical form. This may involve:

  • Removing any redundant elements from the path, such as duplicate separators (e.g. //).
  • Resolving any relative references (e.g. '.' or '..').
  • Converting a path to the platform-specific format (e.g. forward slashes for Linux, backslashes for Windows).
  • Resolving any symbolic links.
  • Removing trailing slashes.

The main aim of path normalization is to ensure that all elements of the path are consistent and unambiguous. This can be especially necessary when the path has been generated using the join() function.

path.normpath()

The normpath() function provides an easy way to normalize paths. It converts a path to its simplest form by collapsing redundant separators and references to the current and parent directories. It works purely on the string and does not resolve symbolic links.

The normpath() function makes the path easier to read and more consistent across systems. On every platform, it collapses redundant separators such as '//' and removes '.' and '..' references. On Windows, it also replaces forward slashes '/' with backslashes '\'.

On Linux

from os import path

#the path to normalize
mypath = 'desktop//./project//tests.py'

#normalize the path
normalized = path.normpath(mypath)
print(normalized)

In the above example, mypath is an inconsistently formatted path. The normpath() function in this case transforms the path into a valid Linux path. Running the same program on Windows will result in the forward slashes being replaced with backslashes.

On Windows

from os import path

#the path to normalize
mypath = 'desktop//./project//tests.py'

#normalize the path
normalized = path.normpath(mypath)
print(normalized)

desktop\project\tests.py
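
Both behaviors can be compared on a single machine by calling the Unix and Windows implementations of os.path directly through the posixpath and ntpath modules. The sketch below also includes a '..' reference and a trailing slash; the paths are made up for illustration.

```python
import ntpath
import posixpath

mypath = 'desktop//./project//tests.py'

# The same input normalized under each set of rules.
on_unix = posixpath.normpath(mypath)
on_windows = ntpath.normpath(mypath)

# A '..' reference cancels the directory that precedes it.
parent_ref = posixpath.normpath('desktop/project/../tests.py')

# A trailing slash is removed.
trailing = posixpath.normpath('desktop/project/')

print(on_unix)     # desktop/project/tests.py
print(on_windows)  # desktop\project\tests.py
print(parent_ref)  # desktop/tests.py
print(trailing)    # desktop/project
```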

path.abspath()

The abspath() function is used to get the absolute path of a given relative path. It returns the full path of a file or directory, starting from the root of the file tree. A relative path is resolved against the current working directory, and the result is normalized.

path.abspath(p)

On Windows, with 'Desktop' as the working directory:

from os import path

relative = 'media/images/me.jpg'

absolute = path.abspath(relative)

print(absolute)

 C:\Users\John\Desktop\media\images\me.jpg
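
On most platforms, calling abspath() gives the same result as joining the path to the current working directory and normalizing it. The following sketch checks this with a made-up relative path; the printed value depends on the directory the program is run from.

```python
import os
from os import path

relative = 'media/images/../images/me.jpg'

# The built-in way.
absolute = path.abspath(relative)

# The same steps done by hand: join to the working directory, then normalize.
manual = path.normpath(path.join(os.getcwd(), relative))

print(absolute)
print(absolute == manual)
```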

Retrieve file and directory properties

The path module contains functions that return file or directory properties, such as when it was last modified, when it was created, and the amount of data it contains. Unlike the previous functions, these functions depend on the file actually existing on the file system.

get file properties

import os, time
from os import path

p = path.abspath(os.getcwd())

print(path.getsize(p)) #the amount of data stored in the file in bytes
print(time.ctime(path.getctime(p))) #the time it was created (on Unix, the time of the last metadata change)
print(time.ctime(path.getmtime(p))) #the time it was last modified
print(time.ctime(path.getatime(p))) #the time it was last accessed

In the above example:

  • We defined the variable p to contain the path of the current working directory.
  • The path.getsize() function returns the size in bytes.
  • The path.getctime() function returns a timestamp for when it was created on Windows; on Unix-based systems it is the time of the last metadata change.
  • The path.getmtime() function returns a timestamp for when it was last modified.
  • The path.getatime() function returns a timestamp for when it was last accessed.
  • We used the time.ctime() function to convert the timestamps into a human-friendly format.
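
The sketch below applies the same functions to a regular file instead of a directory, and shows what happens when the path does not exist. It creates its own temporary file so that it can run anywhere; the file names are placeholders.

```python
import tempfile
import time
from os import path

with tempfile.TemporaryDirectory() as tmp:
    sample = path.join(tmp, 'sample.txt')
    with open(sample, 'w', encoding='ascii') as f:
        f.write('hello world')  # 11 ASCII characters, so 11 bytes

    size = path.getsize(sample)
    modified = path.getmtime(sample)  # seconds since the epoch, as a float
    print(size)
    print(time.ctime(modified))

    # These functions raise OSError when the path does not exist.
    try:
        path.getsize(path.join(tmp, 'missing.txt'))
        missing_raised = False
    except OSError:
        missing_raised = True
    print(missing_raised)
```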

Testing files and directories

The module contains various functions which can be used to check whether some property of a path is True or False, such as whether a path is a file or a directory, whether a path is absolute or relative, whether a file exists or not, etc.

check if path is a file/directory

import os
from os import path

p = path.abspath(os.getcwd())

print(path.isfile(p)) #is it a file?
print(path.isdir(p)) #is it a directory?

In the above example:

  • The p variable holds the path of the current working directory.
  • The path.isfile() function checks whether the input path is a file. Returns True if it is a file and False otherwise.
  • The path.isdir() function checks whether the input path is a directory. Returns True if it is a directory, False otherwise.

Check if path exists

import os
from os import path

p = path.abspath(os.getcwd())

print(path.exists(p)) #does the path exist
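
To see the three checks side by side, the following sketch runs them against a directory, a file, and a path that does not exist. It works inside a temporary directory, and the file names are placeholders.

```python
import tempfile
from os import path

def check(p):
    """Return (exists, isfile, isdir) for the given path."""
    return (path.exists(p), path.isfile(p), path.isdir(p))

with tempfile.TemporaryDirectory() as tmp:
    file_path = path.join(tmp, 'notes.txt')
    missing_path = path.join(tmp, 'missing.txt')
    with open(file_path, 'w') as f:
        f.write('demo')

    checks = {
        'directory': check(tmp),
        'file': check(file_path),
        'missing': check(missing_path),
    }

for name, result in checks.items():
    print(name, result)
# directory (True, False, True)
# file (True, True, False)
# missing (False, False, False)
```

Note that none of these functions raises an error for a missing path; they simply return False.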

All of the testing functions can be summarized as shown below:

Function     Usage
isabs(p)     Checks whether path p is an absolute path.
isfile(p)    Checks whether path p is an existing regular file.
isdir(p)     Checks whether path p is an existing directory.
islink(p)    Checks whether path p is a symbolic link.
ismount(p)   Checks whether path p is a mount point.
exists(p)    Checks whether path p exists on the file system. Returns False for broken symbolic links.
lexists(p)   Checks whether path p exists. Unlike exists(), it returns True for broken symbolic links.
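
Unlike the other functions in the summary, isabs() only inspects the string and does not touch the file system. The sketch below shows how the answer depends on the platform's rules by calling the Unix and Windows implementations directly; the paths are made up.

```python
import ntpath
import posixpath

# Unix rules: an absolute path starts with a forward slash.
unix_abs = posixpath.isabs('/usr/bin')
unix_rel = posixpath.isabs('usr/bin')

# Windows rules: an absolute path starts with a drive and a separator.
win_abs = ntpath.isabs('C:\\Users\\John')
win_rel = ntpath.isabs('Users\\John')

print(unix_abs, unix_rel)  # True False
print(win_abs, win_rel)    # True False
```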