The os module in the standard library includes the path submodule, which provides utilities for working with file and directory paths.
To use the utilities offered in the path module, we first have to import it in our program.
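A minimal sketch of the import, using the same form as the later examples in this article. The print call is only an illustration and is not required:

```python
# Import the path submodule from the os module.
from os import path

# os.path resolves to a platform-specific implementation module
# (posixpath on Unix-like systems, ntpath on Windows).
print(path.__name__)
```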
Parsing Paths
Parsing refers to breaking down data into meaningful tokens which can then be used for further processing.
The path module offers functions for parsing a string that represents a path. The functions do not check whether the given path actually exists; they simply operate on the paths as plain strings.
The parsing functions depend on a few os variables in order to work correctly. These variables are outlined below:
- os.sep - The character used to separate the parts of a path. It is a forward slash (/) on Unix-based systems and a backslash (\) on Windows-based systems.
- os.extsep - The character used to separate an extension from the rest of a filename. This is usually a period (.).
- os.curdir - The string used to represent the current directory. It is usually a period (.).
- os.pardir - The string used to go up a directory. It is usually two periods (..).
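To see the values these variables hold on the current platform, a short sketch like the following can be used. The output differs between Unix-based systems and Windows:

```python
import os

# Print the values that the parsing functions rely on.
print('os.sep    =', repr(os.sep))
print('os.extsep =', repr(os.extsep))
print('os.curdir =', repr(os.curdir))
print('os.pardir =', repr(os.pardir))
```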
path.split()
The split() function breaks a path into two parts: the directory path and the base filename. It simply splits the string at the last position where the os.sep character appears.
path.split(p)
The function returns a tuple which contains the directory and the filename.
If the given string ends with the os.sep character, the second value in the returned tuple will be an empty string.
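A minimal sketch of both cases. The paths are hypothetical and do not need to exist, since split() only works on the string:

```python
from os import path

# Split a hypothetical path into its directory and filename.
head, tail = path.split('/home/user/notes.txt')
print(head)  # /home/user
print(tail)  # notes.txt

# A trailing separator leaves the second element empty.
head2, tail2 = path.split('/home/user/')
print(head2)        # /home/user
print(repr(tail2))  # ''
```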
path.basename()
The basename() function returns a value equivalent to the second value in the tuple returned by the split() function. It returns the text that comes after the last os.sep character.
path.basename(p)
If the input string ends with the os.sep character, the basename() function returns an empty string.
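As a short illustration, again with hypothetical paths:

```python
from os import path

# The text after the last separator is returned.
name = path.basename('/home/user/notes.txt')
print(name)  # notes.txt

# With a trailing separator there is nothing after it.
empty = path.basename('/home/user/')
print(repr(empty))  # ''
```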
path.dirname()
The dirname() function returns a string equivalent to the first value of the tuple returned by the split() function. The returned value represents the directory name for the given path. It is simply all the characters up to the last os.sep character in the path.
path.dirname(p)
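The relationship with split() can be sketched as follows, using a hypothetical path:

```python
from os import path

p = '/home/user/notes.txt'  # hypothetical path, it need not exist

directory = path.dirname(p)
print(directory)  # /home/user

# dirname() matches the first element returned by split().
print(directory == path.split(p)[0])  # True
```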
path.splitext()
The splitext() function works like the split() function, except that it uses the os.extsep character to split the path. It returns a tuple containing the root and the file extension.
path.splitext(p)
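The function splits the input string at the last occurrence of the os.extsep character. The extension in the returned tuple includes the leading period, and it is an empty string if the path has no extension.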
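A few hypothetical filenames illustrate how the split is made:

```python
from os import path

root, ext = path.splitext('/home/user/notes.txt')
print(root)  # /home/user/notes
print(ext)   # .txt

# Only the last extension is split off.
multi = path.splitext('archive.tar.gz')
print(multi)  # ('archive.tar', '.gz')

# No extension gives an empty second element.
plain = path.splitext('README')
print(plain)  # ('README', '')
```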
path.commonprefix()
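The commonprefix() function takes a list of paths as an argument and returns the longest prefix that is common to all of them. The comparison is done character by character, so the result may end in the middle of a directory name and is not guaranteed to be a valid path. When a valid common sub-path is needed, path.commonpath() can be used instead.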
path.commonprefix(m)
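The following sketch, with hypothetical paths, shows what the function returns for paths that share the start of a directory name:

```python
from os import path

# Hypothetical paths used only for illustration.
paths = ['/usr/lib/python3', '/usr/local/lib', '/usr/libexec']

# The comparison is made on plain strings, one character at a time.
prefix = path.commonprefix(paths)
print(prefix)  # /usr/l
```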
Creating Paths
It is common to create paths from existing strings. The path module offers various functions that can be used for this purpose.
path.join()
The join() function is a common tool for creating a path by joining two or more segments.
path.join(path, *paths)
parameter | description |
---|---|
path | The base path. |
*paths | A series of arbitrary additional paths to be joined to the base path. |
The join() function uses the os.sep variable to join the elements of the given path.
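A minimal sketch of such a call, assuming 'basedir' as the base path and hypothetical segment names:

```python
import os
from os import path

# Hypothetical segments to be joined to the base path.
segments = ('project', 'src', 'main.py')

# The * operator unpacks the tuple into separate arguments.
joined = path.join('basedir', *segments)
print(joined)

# join() places os.sep between the segments.
print(joined == os.sep.join(('basedir',) + segments))  # True
```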
In the above example:
- We imported the path module from os.
- We instantiated a tuple containing the segments to be joined to the base path.
- We called path.join() with 'basedir' as the base path and the segments as the rest of the parameters. The unpacking operator (*) ensures that the segments are passed to the function as separate arguments.
If an argument to join() begins with the os.sep character, it is regarded as an absolute path and is treated as the base path; all the arguments that precede it are therefore discarded.
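A sketch of this behavior, looping over a few hypothetical groups of segments:

```python
import os
from os import path

# Hypothetical segments; some arguments begin with os.sep.
examples = [
    ('one', 'two', 'three'),
    (os.sep + 'one', 'two', 'three'),
    ('one', os.sep + 'two', 'three'),
]

results = []
for parts in examples:
    # Arguments before the one starting with os.sep are dropped.
    results.append(path.join(*parts))
    print(parts, '->', results[-1])
```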
In the above example, in each iteration, the arguments preceding the one that begins with the os.sep character (/) are disregarded, and the argument beginning with the os.sep character becomes the start of the returned path.
path.expanduser()
The expanduser() function expands a pathname that starts with a tilde (~) character, which represents a user's home directory. It replaces a leading ~ with the absolute path to the home directory of the current user, and a leading ~user with the home directory of the named user.
path.expanduser(p)
    from os import path

    home = path.expanduser('~')
    print(home)
Output:
C:\Users\John
We can use the expanduser() function together with the join() function to create absolute paths.
    from os import path

    # Build the path with join(), then expand the leading ~
    my_path = path.join('~', 'desktop', 'project.py')
    print(path.expanduser(my_path))
Output:
C:\Users\John\desktop\project.py
path.expandvars()
The expandvars() function is more general than the expanduser() function in that it expands any environment variables in the given path.
path.expandvars(path)
The function replaces substrings of the form $var or ${var} (and %var% on Windows) with the values of the corresponding environment variables. References to variables that are not set are left unchanged. It does not check whether the resulting path actually exists.
    import os
    from os import path

    # Set two environment variables for this process
    os.environ['DESKTOP'] = 'C:\\Users\\Admin\\Desktop'
    os.environ['TEST_PATH'] = 'project\\tests.py'

    # Expand both variables inside the path string
    abs_test_path = path.expandvars('$DESKTOP\\$TEST_PATH')
    print(abs_test_path)
Output:
C:\Users\Admin\Desktop\project\tests.py
Path Normalization
Normalizing paths is the process of converting a given path into a canonical form. This may involve:
- Removing any redundant elements from the path, such as duplicate separators (e.g. //).
- Resolving any relative references (e.g. '.' or '..').
- Converting a path to the platform-specific format (e.g. forward slashes for Linux, backslashes for Windows).
- Resolving any symbolic links.
- Removing trailing slashes.
The main aim of path normalization is to ensure that all elements of the path are consistent and unambiguous. This can be especially necessary when the path has been generated using the join() function.
path.normpath()
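The normpath() function provides an easy way to normalize paths. It converts a path to its simplest form by eliminating redundant separators and references to current and parent directories. It works purely on the string and does not resolve symbolic links; path.realpath() can be used when links need to be resolved.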
The normpath() function makes the path easier to read and more consistent. On all systems it collapses duplicate separators such as '//' and removes '.' and '..' references where possible. On Windows, it also replaces forward slashes '/' with backslashes '\'.
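A minimal sketch of such a normalization. The input string is assumed to be the same inconsistently formatted path used in the Windows run further down:

```python
from os import path

# The path to normalize, with duplicate separators and a '.' reference.
mypath = 'desktop//./project//tests.py'

# Normalize the path.
normalized = path.normpath(mypath)
print(normalized)  # on a Unix-based system: desktop/project/tests.py
```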
In the above example, mypath is an inconsistently formatted path. The normpath() function in this case transforms the path into a valid Linux path. Running the same program on Windows results in the forward slashes being replaced with backslashes.
On Windows:

    from os import path

    # the path to normalize
    mypath = 'desktop//./project//tests.py'

    # normalize the path
    normalized = path.normpath(mypath)
    print(normalized)
Output:
desktop\project\tests.py
path.abspath()
The abspath() function is used to get the absolute path for a given relative path. It returns the full path of a file or directory, that is, the complete path starting from the root of the file tree. A relative path is resolved against the current working directory, and the result is normalized.
path.abspath(p)
On Windows, with 'Desktop' as the working directory:

    from os import path

    relative = 'media/images/me.jpg'
    absolute = path.abspath(relative)
    print(absolute)
Output:
C:\Users\John\Desktop\media\images\me.jpg
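On most platforms, the same result can be built by hand by joining the current working directory with the relative path and normalizing the result. The sketch below assumes a hypothetical relative path and only illustrates that equivalence:

```python
import os
from os import path

relative = 'media/images/me.jpg'  # hypothetical file, it need not exist
absolute = path.abspath(relative)

# Build the same path by hand from the current working directory.
manual = path.normpath(path.join(os.getcwd(), relative))

print(absolute)
print(absolute == manual)
print(path.isabs(absolute))  # True
```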
Retrieve file and directory properties
The path module contains functions that return file or directory properties, such as when it was last modified, when it was created and the amount of data it contains. Unlike the previous functions, these functions depend on the file actually existing on the file system.
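A minimal sketch of these functions, assuming the current working directory as the path to inspect. The variable names are illustrative:

```python
import os
import time
from os import path

# p holds the path of the current working directory.
p = os.getcwd()

size = path.getsize(p)       # size in bytes
created = path.getctime(p)   # creation time on Windows, metadata change time on Unix
modified = path.getmtime(p)  # time of last modification
accessed = path.getatime(p)  # time of last access

# time.ctime() turns a timestamp into a readable string.
print('Size:', size, 'bytes')
print('Created:', time.ctime(created))
print('Modified:', time.ctime(modified))
print('Accessed:', time.ctime(accessed))
```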
In the above example:
- We defined the p variable to contain the path of the current working directory.
- The path.getsize() function returns the size in bytes.
- The path.getctime() function returns a timestamp for when it was created on Windows; on Unix-based systems it is the time of the last metadata change.
- The path.getmtime() function returns a timestamp for when it was last modified.
- The path.getatime() function returns a timestamp for when it was last accessed.
- We used the time.ctime() function to convert the various times into a human-friendly format.
Testing files and directories
The module contains various functions which can be used to check whether some property of a path is True or False, such as checking whether a path is a file or a directory, whether a path is absolute or relative, whether a file exists, etc.
Check if path is a file/directory
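A minimal sketch of both checks, assuming the current working directory as the path to test:

```python
import os
from os import path

# p holds the path of the current working directory.
p = os.getcwd()

is_file = path.isfile(p)
is_dir = path.isdir(p)

print(is_file)  # False
print(is_dir)   # True
```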
In the above example:
- The p variable holds the path of the current working directory.
- The path.isfile() function checks whether the input path is a file. It returns True if it is a file and False otherwise.
- The path.isdir() function checks whether the input path is a directory. It returns True if it is a directory and False otherwise.
Check if path exists
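A short sketch using path.exists(). The filename in the second check is hypothetical and is assumed not to exist in the working directory:

```python
import os
from os import path

# The current working directory exists.
p = os.getcwd()
cwd_exists = path.exists(p)
print(cwd_exists)  # True

# Hypothetical filename, assumed not to exist.
missing = path.join(p, 'no_such_file_for_this_example.txt')
print(path.exists(missing))
```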
All of the testing functions can be summarized as shown below:
function | usage |
---|---|
isabs(p) | Checks whether path p is an absolute path. |
isfile(p) | Checks whether path p is a file. |
isdir(p) | Checks whether path p is a directory. |
islink(p) | Checks whether path p is a symbolic link. |
ismount(p) | Checks whether path p is a mount point. |
exists(p) | Checks whether path p exists on the file system. |
lexists(p) | Checks whether path p exists, returning True even for broken symbolic links. |