A string is a sequence of characters that represents text-based information. A character is a single unit of data that represents a letter, digit, or any other symbol from a character set, such as Unicode. Examples of characters include the alphanumeric characters (a-z, A-Z, 0-9) and special characters such as #, &, !, ^, %, and @.

In some languages, such as C, a string is represented in memory simply as an array of characters terminated with a special character known as the null character, and in C and C++ it is possible to interact with strings at an extremely low level. Python, on the other hand, abstracts strings so that a programmer does not necessarily need to care about how they are stored and represented in memory.

Strings are among the most commonly used data types in Python. Some common use cases of strings are:

  1. Storing and manipulating textual data, such as user input, file contents, and database records.
  2. Formatting output to display data in a readable and organized manner, such as with print statements or logging messages.
  3. Processing and analyzing text data, such as with natural language processing or regular expressions.
  4. Developing and testing software, where strings can be used for debugging, logging, and unit testing.

How are strings represented in Python?

In Python, a string literal is any text enclosed in single quotes (' '), double quotes (" "), or triple single or double quotes (''' ''', """ """).

Examples:

'Hello, World!'
"Hello, World!"
'''Hello, World!'''
"""Hello, World!"""

The simplest string in Python is a pair of single or double quotes with nothing inside them ('', ""). This represents an empty string.

Python does not differentiate between single and double quotes when defining a string, which means that 'Hello, World!' and "Hello, World!" are exactly the same thing. However, you cannot mix the two formats: whichever quote character opens the string must also close it. For example, 'Hello, World!" will result in a SyntaxError.

'Hello, World!"
//SyntaxError: unterminated string literal (detected at line 1)

Single and double quotes are the most commonly used string formats. Triple quotes are, in most cases, used only when a string spans multiple lines. Example:

'''Hello,
   World!'''
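
As a minimal sketch of what this means in practice, the line break and the indentation written inside the triple quotes become part of the string itself. The variable names below are only illustrative:

```python
multi_line = '''Hello,
   World!'''

# splitlines() breaks the string at its line breaks, so we get two pieces.
lines = multi_line.splitlines()
print(lines)            # ['Hello,', '   World!']

# The line break and the 3 leading spaces are counted as characters.
print(len(multi_line))  # 16
```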

In some languages, such as C and C++, characters and strings are two distinct data types. Python has no separate data type for single characters; a character is simply a string of length one. Examples of characters in Python:

'a'
"b"
'4'
"7"
'@'
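
A quick way to confirm this is to inspect the type and length of a one-character string. The sketch below uses an illustrative variable named letter:

```python
letter = 'a'

# A single character has the same type as any other string.
print(type(letter))                  # <class 'str'>
print(type(letter) is type("abc"))   # True
print(len(letter))                   # 1
```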

Escaping special characters

Suppose we want to write a string that contains an apostrophe, such as John's. If we use single quotes, as in 'John's', the string will be closed prematurely, leading to a SyntaxError.

'John's'
//SyntaxError: unterminated string literal (detected at line 1)

To solve this issue, we can enclose the string in double quotes. Likewise, if a string contains double quotes, we can enclose it in single quotes to avoid premature termination of the string.

Examples:

print( "John's" )
//John's
print( '"Hello, World!"' )
//"Hello, World!"

In cases where we want to use both single and double quotes inside a string, we can use the triple-quote format as follows:

my_string = '''She said, "Don't forget to bring the umbrella".'''
print(my_string)
//She said, "Don't forget to bring the umbrella".

The backslash (\), also known as the escape character, is used to allow special characters in a string. Special characters are characters that have a special purpose, such as the backslash itself, single and double quotes, and the newline character.

For example, to use the single-quote format for the string John's, we can escape the apostrophe as follows:

print( 'John\'s' )
//John's

Our other examples above, rewritten using the backslash character:

print("\"Hello, World!\"")
//"Hello, World!"
my_string = "She said, \"Don\'t forget to bring the umbrella\"."
print(my_string)
//She said, "Don't forget to bring the umbrella".

The backslash character before the single or double quotes tells Python to treat them as literal characters within the string.
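
The different quoting styles only affect how the literal is written, not the resulting string. A small sketch, with illustrative variable names, comparing three spellings of the same text:

```python
with_double = "John's"       # double quotes, no escaping needed
with_escape = 'John\'s'      # single quotes with an escaped apostrophe
with_triple = '''John's'''   # triple quotes

# All three literals produce an identical string value.
print(with_double == with_escape == with_triple)  # True
print(len(with_escape))                           # 6 (the backslash is not stored)
```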

We can add a new line in a string by using a backslash followed by the letter n (\n):

print("Denver\nNairobi\nBogota\nManilla\nTokyo")
//Denver
//Nairobi
//Bogota
//Manilla
//Tokyo

To appear as a literal character in a string, the backslash must itself be escaped, as in:

print("C:\\Python\\")
//C:\Python\ 

The following table shows some escape sequences and their meanings:

Escape sequence    Meaning
\\                 Backslash
\'                 Single quote
\"                 Double quote
\a                 Alert (bell)
\b                 Backspace
\f                 Form feed
\n                 Newline
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\0                 Null
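
As a rough illustration of a few of these, note that each escape sequence is written with two characters in the source code but is stored as a single character in the string:

```python
print("Name\tCity")        # the \t is printed as a tab between the two words

# Each escape sequence counts as one character.
print(len("a\nb"))         # 3
print(len("\\"))           # 1
print('It\'s' == "It's")   # True
```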

Raw Strings

Raw strings are string literals in which escape sequences are not processed: a backslash is treated as an ordinary character. To create a raw string, we add the letter "r" before the opening quote of the string. For example:

print(r'Denver\nTokyo')
//Denver\nTokyo
print(r'John\'s')
//John\'s

As you can see, the "\n" character in r"Denver\nTokyo" is not interpreted as a newline character in the output, but instead it is treated as two separate characters (a backslash and the letter "n").

Consider a string that represents a path on Windows. Without a raw string, we have to use double backslashes (\\). For example, to open a file:

open('C:\\users\\user\\desktop\\text.dat', 'w')

If we use a raw string, one backslash will be okay as shown below:

open(r'C:\users\user\desktop\text.dat', 'w')

In raw strings, escape sequences are treated as normal characters. However, a string literal, whether raw or not, cannot end with a single backslash or, to be more precise, an odd number of backslashes. This is because the trailing backslash will escape the closing quote character.

print( r"Hello\" )
//SyntaxError: unterminated string literal (detected at line 1)

In this case, two backslashes cannot be used the way they would be in a normal (non-raw) string, because both of them will be included:

print( r"Hello\\" )
//Hello\\

If a raw string needs to end with a backslash, we can write two backslashes and then slice off the last one as follows:

my_string = r"Hello\\"[0:-1]
print(my_string)
//Hello\

Or just use a normal string with the escape character:

my_string = "Hello\\"
print(my_string)
//Hello\
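
To tie the two notations together, the sketch below compares a raw literal with its escaped equivalent. The path is the same illustrative one used above, and nothing is actually opened:

```python
raw_path = r"C:\users\user\desktop\text.dat"
escaped_path = "C:\\users\\user\\desktop\\text.dat"

# Both literals describe exactly the same sequence of characters.
print(raw_path == escaped_path)   # True

# In a normal string \n is one newline character;
# in a raw string it is a backslash followed by the letter n.
print(len("\n"), len(r"\n"))      # 1 2
```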

Operations on Strings

We can perform operations on strings in order to  manipulate them or extract information from them. Some common operations on strings are:

String Length

The length of a string is the number of characters that make it up, including whitespace and special characters. The built-in len() function is used to get the length of a given string.

Examples:

len("a")
//1
len("abc")
//3
len("Hello, World!")
//13
len("Welcome to Pynerds!")
//19

An empty string has a length of 0:

len("")
//0

String concatenation and Repetition

Concatenation means creating a new string by joining two or more separate strings. In Python, this operation is achieved using the "+" operator.

Examples:

"Hello, " + "World!"
//'Hello, World!'
"John" + " Doe"
//'John Doe'
"String " + "Concatenation " + "in " + "Python"
//'String Concatenation in Python'

Both operands in string concatenation must be strings. For example, using an integer as one of the operands will raise a TypeError.

"3" + 3
//TypeError: can only concatenate str (not "int") to str
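
One common way around this error is to convert one operand explicitly, depending on whether text or arithmetic is intended. A minimal sketch using the built-in str() and int() functions, with illustrative variable names:

```python
count = 3

# Convert the integer to a string to concatenate text.
as_text = "3" + str(count)
print(as_text)     # 33

# Convert the string to an integer to add numbers instead.
as_number = int("3") + count
print(as_number)   # 6
```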

The  "*" operator is used for string repetition. The operator takes two operands,  the string to repeat and an integer for the number of times the string will be repeated. 

Examples:

"Hello" * 3
//'HelloHelloHello'
"Hello " * 3
//'Hello Hello Hello '
"33" * 3
//'333333'

If both operations are used in the same expression, repetition is performed before concatenation, because "*" has higher precedence than "+".

Examples:

"5" + "4" * 3
//'5444'
"5" + "4" * 3 + "3" + "2" * 3
//'54443222'
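
If concatenation should happen first, parentheses can be used to change the order of evaluation, just as in arithmetic. A short sketch:

```python
# Without parentheses, only "4" is repeated.
print("5" + "4" * 3)      # 5444

# With parentheses, the concatenated string "54" is repeated.
grouped = ("5" + "4") * 3
print(grouped)            # 545454

# The integer may also appear on the left of the * operator.
print(3 * "ab")           # ababab
```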

Indexing and Slicing 

A string is a sequence of characters, and we can access individual characters in a given string by indexing. Each character in a string has an integer index which defines its position in the string. The first character has index 0, the second character has index 1, the third character has index 2, and so forth. For example, in the string "Hello, World!", the characters are positioned as shown below:

[Figure: How characters are arranged in a Python string]

If you are new to programming, avoid the mistake of assuming that indices start at 1. In all the other sequence-based data types, such as lists and tuples, the first index is also 0, and this is the case in most other programming languages as well.
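
One way to see the positions for yourself is to print every character next to its index. The sketch below uses the built-in enumerate() function, which yields (index, item) pairs starting from 0; repr() is used only so that the space character is visible in the output:

```python
text = "Hello, World!"

# Print each index alongside the character stored at that position.
for index, char in enumerate(text):
    print(index, repr(char))

positions = list(enumerate(text))
print(positions[0])    # (0, 'H')
print(positions[-1])   # (12, '!')
```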

To get the character at a given position/index we use the [] operator with the following syntax:

<string>[<index>]

For example:

"Hello, World!"[0]
//'H'
"Hello, World!"[1]
//'e'
"Hello, World!"[4]
//'o'
"Hello, World!"[12]
//'!'
"Hello, World"[8]
//'o'
"Hello, World!"[9]
//'r'

Since the first character has an index of 0, the last character in any non-empty string always has an index equal to the string's length minus 1. For example, the string "Hello, World!" has a length of 13, therefore the last character (!) has index 12.

Example:

my_string = "Hello, World!"
my_string[len(my_string) - 1]
//'!'

Negative Indexing

Python also allows indexing from back to front using negative indices. The last character in the string has index -1, the second last has index -2, the third last has index -3, and so forth, until we reach the first character, whose index is the negative of the string's length.

Examples:

"Hello, World!"[-1]
//'!'
"Hello, World!"[-2]
//'d'
"Hello, World!"[-13]
//'H'
"Hello, World!"[-12]
//'e'

Combining the two indexing approaches, the valid indices for a string start at the negative of the string's length and end at the string's length minus 1. For example, in the string "Hello, World!", which has a length of 13, the valid index values are -13 to 12. Trying to use an index outside this range will raise an IndexError.

Examples:

"Hello, World!"[13]
//IndexError: string index out of range
"Hello, World!"[-16]
//IndexError: string index out of range
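
The relationship between the two kinds of index can be checked directly: for a string of length n, index i and index i - n refer to the same character. A minimal sketch, with illustrative variable names:

```python
text = "Hello, World!"
n = len(text)  # 13

# Every non-negative index i has a negative twin, i - n.
pairs_match = all(text[i] == text[i - n] for i in range(n))
print(pairs_match)  # True

# One step past either end of the valid range raises IndexError.
for bad_index in (n, -n - 1):
    try:
        text[bad_index]
    except IndexError as error:
        print(bad_index, "->", error)
```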

Slicing

While indexing is used to get the character at a specific index, slicing is used to get a substring containing the characters in a given range of indices. The [] operator is still used in slicing, but with a different syntax in order to capture a range. The syntax is as shown below:

<string>[start : stop : step]

The start indicates the index at which the range begins, while the stop indicates the index where the range ends. The stop itself is not included in the range. For example, to get the first 5 characters, the value of start will be 0 and the value of stop will be 5, which selects the characters at indices 0, 1, 2, 3, and 4.

The step is optional; if it is not included, 1 is used as the default value. The value given as step is used as the jump between consecutive indices in the range, for example:

"Hello, World!"[0 : 10 : 1]
//'Hello, Wor'
"Hello, World!"[0 : 10 : 2]
//'Hlo o'
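
One way to think about start, stop, and step is that they select the same indices that range(start, stop, step) would produce. The sketch below checks this for one case; it assumes start and stop are non-negative and within the string, since slices treat negative and out-of-range values differently from range():

```python
text = "Hello, World!"
start, stop, step = 0, 10, 2

by_slice = text[start:stop:step]

# Collect the characters at indices 0, 2, 4, 6, 8 one at a time.
by_range = "".join(text[i] for i in range(start, stop, step))

print(by_slice)               # Hlo o
print(by_slice == by_range)   # True
```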

Without step, the syntax is: 

<string>[start : stop]

Examples:

"Hello, World!"[0 : 5]
//'Hello'
"Hello, World!"[0 : 10]
//'Hello, Wor'
"Hello, World!"[1 : 5]
//'ello'
"Hello, World!"[5 : 10]
//', Wor'
"Hello, World!"[7 : 12]
//'World'

In slicing, unlike in indexing, values outside the valid range can be used without raising an IndexError. If the value of start is smaller than the smallest valid index, the slicing begins at the first character, and if the value of stop is larger than the largest valid index, the slicing ends at the last character in the string.

Examples:

"Hello, World!"[5 : 100]
//', World!'
"Hello, World!"[-50 : 8]
//'Hello, W'
"Hello, World!"[-50 : 100]
//'Hello, World!'
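
To see how out-of-range values are adjusted, we can build a slice object with the built-in slice() and ask it which indices it would use for a string of a given length. This is only an illustrative sketch; the variable names are made up for the example:

```python
text = "Hello, World!"
requested = slice(-50, 100)

# indices() reports the (start, stop, step) actually used for this length.
start, stop, step = requested.indices(len(text))
print(start, stop, step)   # 0 13 1

# Slicing with the adjusted values gives the same result.
print(text[requested] == text[start:stop:step])   # True
```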

As shown above, we can mix negative and positive indices as the values of start or stop. For example, to get all the characters from index 5 up to and including the second-last character, we can use 5 as the start and -1 as the stop.

"Hello, World!"[5 : -1]
//', World'
"Hello, World!"[7 : -1]
//'World'
"Hello, World!"[0 : -8]
//'Hello'
"Hello, World!"[-6 : -1]
//'World'

Using a start value that is larger than the stop value (with the default positive step) results in an empty range, and an empty string is returned.

"Hello, World!"[10 : 0]
//''

Shorthand Slicing

Suppose we want to get the characters from a certain index up to the last character in the string. In this case, we can use the string's length as the stop value, or any other number larger than the string's length. For example:

my_string = "Hello, World!"
my_string[7 : len(my_string)]
//'World!'
my_string[0 : len(my_string)]
//'Hello, World!'
my_string[5 : len(my_string)]
//', World!'

The above examples, written in shorthand slicing, are as follows:

my_string = "Hello, World!"
my_string[7 :]
//'World!'
my_string[0 :]
//'Hello, World!'
my_string[5 : ]
//', World!'

We simply omit the stop, and the slicing goes on up to the last index.

We can also omit the start, and the slicing will begin from index 0. Examples:

my_string = "Hello, World!"
my_string[ : 5]
//'Hello'
my_string[ : 10]
//'Hello, Wor'
my_string[ : 8]
//'Hello, W'

Omitting both start and stop results in a string with exactly the same contents:

"Hello, World!"[ : ]
//'Hello, World!'

If the step needs to be included in the shorthand syntax, two colons must be used.

Examples:

"Hello, World!"[ 5 :: 2]
//',Wrd'
"Hello, World!"[ : 10 : 2 ]
//'Hlo o'

Reversing a String

Shorthand slicing with no start and stop, and with -1 as the step, can be used to easily reverse a string as shown below:

"Hello, World!"[ :: -1 ]
//'!dlroW ,olleH'
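
A negative step makes the slice walk from right to left, so an explicit start should then be larger than the stop, and the stop index is still excluded. The sketch below shows this, and also compares the slicing idiom with one possible alternative built from the built-in reversed() function and str.join():

```python
text = "Hello, World!"

# Walk backwards from index 10 down to index 1 (index 0 is excluded).
backwards_part = text[10:0:-1]
print(backwards_part)   # lroW ,olle

# reversed() yields the characters from last to first; join() glues them together.
reversed_text = "".join(reversed(text))
print(reversed_text == text[::-1])   # True
```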