A string is a sequence of characters that represents text-based information. A character is a single unit of data that represents a letter, digit, or any other symbol from a character set such as Unicode. Examples of characters include the alphanumeric characters (a-z, A-Z, 0-9) and special characters such as #, &, !, ^, %, and @.
In languages such as C and C++, a string can be stored in memory simply as an array of characters terminated with a special character known as the null character, and it is possible to interact with strings at an extremely low level. Python, on the other hand, abstracts strings so that a programmer does not need to care about how they are stored and represented in memory.
Strings are among the most commonly used data types in Python. Some common use cases of strings are:
- Storing and manipulating textual data, such as user input, file contents, and database records.
- Formatting output to display data in a readable and organized manner, such as with print statements or logging messages.
- Processing and analyzing text data, such as with natural language processing or regular expressions.
- Developing and testing software, where strings can be used for debugging, logging, and unit testing.
How are strings represented in Python?
In Python, a string is any text enclosed in single quotes (' '), double quotes (" "), or triple single or double quotes (''' ''', """ """).
Examples:
'Hello, World!'
"Hello, World!"
'''Hello, World!'''
"""Hello, World!"""
The simplest string in Python is a pair of single or double quotes with nothing between them ('' or ""). This represents an empty string.
Python does not differentiate between single and double quotes when defining a string. This means that 'Hello, World!' and "Hello, World!" are exactly the same thing. However, you cannot mix the two formats: for example, 'Hello, World!" will result in a SyntaxError. Whichever quote style opens the string must also close it.
Single and double quotes are the most commonly used string formats. Triple quotes are, in most cases, used only when a string spans multiple lines:
'''This is an example
of a string
that spans multiple lines'''
In some languages, such as C and C++, characters and strings are two distinct data types. In Python there is no separate data type for single characters; a character is simply a string of length one. Examples of characters in Python:
'a'
"b"
'4'
"7"
'@'
Escaping special characters
Suppose we want to use a string that contains an apostrophe, e.g. john's. If we use single quotes, as in 'john's', the string will be closed prematurely, leading to a SyntaxError.
To solve this issue, we can use the double-quote format. Similarly, if a string contains double quotes, we can enclose it in single quotes to avoid premature termination of the string.
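A brief sketch of both cases; the variable names and sample sentences are illustrative only:

```python
# An apostrophe inside a double-quoted string needs no special handling.
owner = "john's"

# Double quotes inside a single-quoted string are fine too.
quote = 'She said "hello" to everyone.'

print(owner)  # john's
print(quote)  # She said "hello" to everyone.
```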
In cases where we want to use both single and double quotes inside a string, we can use the triple quotes formats as follows:
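For instance, assuming a made-up sentence that contains both kinds of quotes:

```python
# Triple quotes let both quote styles appear unescaped in one string.
message = '''John's dog is called "Rex".'''

print(message)  # John's dog is called "Rex".
```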
The backslash (\), also known as the escape character, is used to allow special characters in a string. Special characters are characters that have a special purpose, such as the backslash, the single quote, the double quote, and the newline character.
For example, to use the single-quote format for the string John's, we can write it as follows:
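A minimal illustration (the variable name is ours):

```python
# \' places a literal single quote inside a single-quoted string.
owner = 'John\'s'

print(owner)       # John's
print(len(owner))  # 6, the backslash itself is not part of the string
```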
Our other examples above can also be written using the backslash character:
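A short sketch with illustrative sentences, one containing double quotes and one containing both quote styles:

```python
# Escaped double quotes inside a double-quoted string.
quote = "She said \"hello\" to everyone."

# An escaped apostrophe lets single quotes enclose both quote styles.
message = 'John\'s dog is called "Rex".'

print(quote)    # She said "hello" to everyone.
print(message)  # John's dog is called "Rex".
```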
The backslash character before the single or double quotes tells Python to treat them as literal characters within the string.
We can add a new line in a string by using the backslash followed by the letter n (\n):
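For example, with two city names chosen purely for illustration:

```python
cities = "Denver\nTokyo"

print(cities)
# Denver
# Tokyo
```

The \n counts as a single character, so len(cities) is 12.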
The backslash must itself be escaped to occur as a literal character in the string, as in:
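A small sketch, using a made-up path as the sample text:

```python
# Two backslashes in the source produce one backslash in the string.
path = "C:\\temp"

print(path)       # C:\temp
print(len(path))  # 7
```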
The following table lists some escape sequences and their meanings:
Escape Sequence | Meaning |
\\ | Backslash |
\' | Single quote |
\" | Double quote |
\a | Alert |
\b | Backspace |
\f | Formfeed |
\n | Newline |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\0 | Null |
Raw Strings
Raw strings are string literals in which the backslash is not treated as an escape character. To create a raw string, we add the letter "r" before the opening quote of the string.
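A quick comparison of a normal string and a raw string with the same source text (variable names are illustrative):

```python
normal = "Denver\nTokyo"
raw = r"Denver\nTokyo"

print(normal)    # prints Denver and Tokyo on two separate lines
print(raw)       # Denver\nTokyo
print(len(raw))  # 13, one more than the normal string
```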
For example, the "\n" in r"Denver\nTokyo" is not interpreted as a newline character; it is treated as two separate characters (a backslash and the letter "n").
Consider a string that represents a path to a directory on Windows. Without a raw string, we have to use double backslashes (\\), for example to open a file:
open('C:\\users\\user\\desktop\\text.dat', 'w')
If we use a raw string, a single backslash is enough, as shown below:
open(r'C:\users\user\desktop\text.dat', 'w')
In raw strings, escape sequences are treated as ordinary characters. However, a string literal, whether raw or not, cannot end with a single backslash or, to be more precise, an odd number of backslashes. This is because the trailing backslash escapes the closing quote character.
In this case, we cannot simply use two backslashes as we would in a normal (non-raw) string, because both of them will be included:
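A minimal sketch, assuming a shortened version of the Windows path used earlier:

```python
# In a raw string, both trailing backslashes end up in the result.
path = r'C:\users\\'

print(path)       # C:\users\\
print(len(path))  # 10
```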
If a raw string needs to end with a backslash, we can write two backslashes and then slice off the last one, as follows:
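One way this might look, again with an illustrative path:

```python
# [:-1] drops the final character, leaving a single trailing backslash.
path = r'C:\users\\'[:-1]

print(path)  # C:\users\
```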
Or just use a normal string with the escape character:
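For example, with the same illustrative path written as a normal string:

```python
# Each \\ in a normal string becomes one backslash.
path = 'C:\\users\\'

print(path)  # C:\users\
```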
Operations on Strings
We can perform operations on strings in order to manipulate them or extract information from them. Some common operations on strings are:
String Length
The length of a string is the number of characters it contains, including whitespace and special characters. The built-in len() function is used to get the length of a given string.
An empty string has a length of 0.
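A few illustrative calls (the sample strings are ours):

```python
greeting = "Hello, World!"

print(len(greeting))  # 13
print(len("a b"))     # 3, the space is counted
print(len(""))        # 0
```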
String Concatenation and Repetition
Concatenation means creating a new string by joining two or more separate strings. In Python, this operation is achieved using the "+" operator.
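A minimal example with illustrative variable names:

```python
first = "Hello"
second = "World"

# The + operator builds a new string; first and second are unchanged.
greeting = first + ", " + second + "!"

print(greeting)  # Hello, World!
```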
Both operands in string concatenation must be strings. For example, using an integer as one of the operands will raise a TypeError.
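The sketch below triggers the error on purpose and then recovers by converting the integer with the built-in str() function. The conversion step is our addition, not something the text above prescribes:

```python
try:
    result = "Total: " + 5        # str + int is not allowed
except TypeError:
    result = "Total: " + str(5)   # convert the integer explicitly

print(result)  # Total: 5
```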
The "*" operator is used for string repetition. The operator takes two operands: the string to repeat and an integer giving the number of times the string will be repeated.
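For instance, with sample strings of our own choosing:

```python
echo = "ab" * 3
line = "-" * 10

print(echo)  # ababab
print(line)  # ----------
```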
If both operations are used in the same expression, repetition is done before concatenation.
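A small sketch of the difference; parentheses can be used to force the concatenation to happen first:

```python
result = "a" + "b" * 3     # "b" is repeated first, then "a" is prepended
grouped = ("a" + "b") * 3  # parentheses change the order

print(result)   # abbb
print(grouped)  # ababab
```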
Indexing and Slicing
A string is a sequence of characters, and we can access individual characters in a given string by indexing. Each character in a string has an integer index that defines its position in the string. The first character has index 0, the second has index 1, the third has index 2, and so forth. For example, in the string "Hello, World!", H is at index 0, e is at index 1, and the final ! is at index 12.
If you are new to programming, avoid the mistake of assuming that indices start at 1. In Python's other sequence types, such as lists and tuples, the first index is also 0, and this is the case in most other programming languages.
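One way to see the positions is to pair each character with its index. The sketch below uses the built-in enumerate() function for this, which is our choice of helper rather than something the text relies on:

```python
text = "Hello, World!"

# Build a list of (index, character) pairs.
positions = list(enumerate(text))

print(positions[:3])  # [(0, 'H'), (1, 'e'), (2, 'l')]
print(positions[-1])  # (12, '!')
```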
To get the character at a given position/index we use the [] operator with the following syntax:
<string>[<index>]
Since the first character has an index of 0, the last character in any non-empty string always has an index equal to the total number of characters minus 1, or simply the string's length minus 1. For example, the string "Hello, World!" has a length of 13; therefore, the last character (!) has index 12.
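A few illustrative lookups on the same string:

```python
text = "Hello, World!"

first = text[0]
middle = text[7]
last = text[len(text) - 1]

print(first)   # H
print(middle)  # W
print(last)    # !
```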
Negative Indexing
Python also allows indexing from back to front using negative indices. The last character in the string has index -1, the second-to-last has index -2, the third-to-last has index -3, and so forth, until we reach the first character, whose index is the negative of the string's length.
Combining the two approaches, the valid indices of a string start at the negative of its length and end at its length minus 1. For example, in the string "Hello, World!", which has a length of 13, the valid index values are -13 to 12. Trying to use an index outside this range will raise an IndexError.
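For example, on the same sample string:

```python
text = "Hello, World!"

last = text[-1]
second_last = text[-2]
first = text[-13]  # same character as text[0]

print(last)         # !
print(second_last)  # d
print(first)        # H
```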
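A minimal sketch that catches the error for an index one past the end; the in_range flag is a hypothetical name used only to record the outcome:

```python
text = "Hello, World!"

try:
    text[13]          # valid indices are -13 to 12
    in_range = True
except IndexError:
    in_range = False

print(in_range)  # False
```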
Slicing
While indexing is used to get the character at a specific index, slicing is used to get a substring containing the characters in a given range of indices. The [] operator is still used in slicing, but with a different syntax in order to capture a range. The syntax is as shown below:
<string>[start : stop : step]
The start indicates the index at which the range begins, while the stop indicates the index where the range ends; the stop itself is not included in the range. For example, to get the first 5 characters, the value of start is 0 and the value of stop is 5, which selects the characters at indices 0, 1, 2, 3, and 4.
The step is optional; if it is omitted, 1 is used as the default value. The step is the jump between consecutive indices in the range, for example:
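A short sketch of this, with a second slice added for comparison:

```python
text = "Hello, World!"

first_five = text[0:5]  # indices 0, 1, 2, 3, 4
word = text[7:12]       # indices 7 through 11

print(first_five)  # Hello
print(word)        # World
```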
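An illustrative slice that takes every second character:

```python
text = "Hello, World!"

every_second = text[0:13:2]  # indices 0, 2, 4, 6, 8, 10, 12
default_step = text[0:5:1]   # a step of 1 behaves like text[0:5]

print(every_second)  # Hlo ol!
print(default_step)  # Hello
```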
Without step, the syntax is:
<string>[start : stop]
In slicing, unlike in indexing, values outside the valid range can be used without raising an IndexError. If the value of start is smaller than the smallest valid index, the slice begins at the first character, and if the value of stop is larger than the largest valid index, the slice ends at the last character in the string.
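A few slices with out-of-range or negative bounds (the specific values are arbitrary):

```python
text = "Hello, World!"

clipped_start = text[-100:5]  # start is clipped to the first character
clipped_stop = text[7:100]    # stop is clipped to the end of the string
mixed = text[-6:12]           # a negative start with a positive stop

print(clipped_start)  # Hello
print(clipped_stop)   # World!
print(mixed)          # World
```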
We can mix negative and positive indices as the values of start and stop. For example, to get all the characters from index 5 up to and including the second-to-last character, we can use 5 as the start and -1 as the stop.
If start refers to a position at or after stop (with the default step), the range is empty and an empty string is returned.
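On the sample string this looks as follows:

```python
text = "Hello, World!"

# Index 5 is the comma; -1 excludes the final "!".
middle = text[5:-1]

print(middle)  # , World
```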
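For instance, with arbitrary bounds where the start comes after the stop:

```python
text = "Hello, World!"

empty = text[8:3]

print(repr(empty))  # ''
print(len(empty))   # 0
```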
Shorthand Slicing
Suppose we want to get the characters from a certain index up to the last character in the string. In this case, we can use the string's length as the stop value, or any other number larger than the string's length. For example:
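Both forms below give the same result; the value 100 is just an arbitrary number larger than the length:

```python
text = "Hello, World!"

with_length = text[7:len(text)]
with_large_stop = text[7:100]

print(with_length)      # World!
print(with_large_stop)  # World!
```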
With shorthand slicing, the same result is written as follows:
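A minimal sketch on the sample string:

```python
text = "Hello, World!"

tail = text[7:]  # no stop value after the colon

print(tail)  # World!
```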
We simply omit the stop, and the slice runs up to the last character.
We can also omit the start, and the slice will begin from index 0.
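For example:

```python
text = "Hello, World!"

head = text[:5]  # same as text[0:5]

print(head)  # Hello
```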
Omitting both start and stop gives back exactly the same string:
If the step needs to be included in the shorthand syntax, two colons must be used.
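A quick check of this on the sample string:

```python
text = "Hello, World!"

whole = text[:]

print(whole)          # Hello, World!
print(whole == text)  # True
```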
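A few illustrative combinations, each with a step of 2:

```python
text = "Hello, World!"

all_stepped = text[::2]   # no start, no stop
from_seven = text[7::2]   # start only: indices 7, 9, 11
up_to_five = text[:5:2]   # stop only: indices 0, 2, 4

print(all_stepped)  # Hlo ol!
print(from_seven)   # Wrd
print(up_to_five)   # Hlo
```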
Reversing a String
Shorthand slicing with no start or stop and with -1 as the step can be used to easily reverse a string.
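Applied to the sample string, this might look like:

```python
text = "Hello, World!"

# A step of -1 walks the string from the last character to the first.
reversed_text = text[::-1]

print(reversed_text)  # !dlroW ,olleH
```

The original string is left unchanged, since slicing always produces a new string.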