In Python, the global interpreter lock, commonly known as the GIL, is a mechanism that ensures only a single thread executes Python bytecode at a time.
A brief overview of threads and multithreading.
A thread can be thought of as a smaller unit of a process/program that is executed independently of other threads within the same process. A single process can have multiple threads running concurrently. All the threads running in a single process share the same resources and can directly access common data.
Multithreading is the execution of two or more threads of a single process in overlapping periods of time. It offers a lightweight and convenient way to execute multiple tasks concurrently.
The following example shows a basic multithreading program.
import threading
import time

# Print capital letters
def task1():
    for letter in "ABCDE":
        print(letter)
        time.sleep(0.5)  # cause a small delay

# Print digits
def task2():
    for digit in "12345":
        print(digit)
        time.sleep(0.5)  # cause a small delay

if __name__ == "__main__":
    # Create one thread for each function/task
    thread1 = threading.Thread(target=task1)
    thread2 = threading.Thread(target=task2)

    # Start the threads
    thread1.start()
    thread2.start()

    # The main thread waits for both threads to finish
    thread1.join()
    thread2.join()

Output:
A
1
B
2
C
3
D
4
E
5
If you observe the above output, you can see that the two functions are executed concurrently, with their output interleaved, rather than one after the other.
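One way to check that the two tasks really overlap is to time them. The following is a minimal sketch, not part of the original example: `wait_task` and `run_in_threads` are hypothetical helper names, and the tasks only sleep. If the threads ran one after the other, the total time would be roughly the sum of the delays; because the waits overlap, it should be close to the longest single delay.

```python
import threading
import time

def wait_task(delay):
    # Hypothetical stand-in for a task that spends its time waiting
    time.sleep(delay)

def run_in_threads(delays):
    """Run one wait_task per delay, each in its own thread.

    Returns the total elapsed wall-clock time in seconds.
    """
    threads = [threading.Thread(target=wait_task, args=(d,)) for d in delays]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_in_threads([0.5, 0.5])
    print(f"Two 0.5 s waits took {elapsed:.2f} s in total")
```

The exact figure will vary from machine to machine, so treat the printed time as an indication rather than a benchmark.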
Why do we need the GIL?
The fact that threads directly share resources and data makes communication between them easier and more efficient, leading to quicker execution of simultaneous tasks. However, this also comes with its own challenges. Without proper synchronization of the tasks, it can easily lead to data inconsistency and corruption. For example, this would happen if one thread deletes a resource that is being used by other threads.
To avoid situations like the one described above, the GIL ensures that only a single thread gets executed by the interpreter at a given time. Multiple threads are executed in turns by switching very rapidly between them. The switching is fast enough to not cause any noticeable delays to the end user. However, it still isn't true parallelism, as the two tasks are never actually executing at the same instant.
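Note that the GIL serializes the interpreter's own work. It does not by itself guarantee that a multi-step operation in your program, such as `counter += 1`, is atomic, so shared data that several threads modify is usually still protected explicitly. The sketch below shows one common way to do that with `threading.Lock`. The names `counter`, `add_many` and `run` are hypothetical and chosen only for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        # The lock makes the read-modify-write of `counter` a single unit
        with counter_lock:
            counter += 1

def run(num_threads=4, n=10_000):
    """Start num_threads threads that each add n to the counter; return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())
```

With the lock in place, the final count equals `num_threads * n` regardless of how the interpreter interleaves the threads.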
GIL's Limitations
The Global Interpreter Lock is designed to protect the integrity of the interpreter's internal state in a multi-threaded program. It helps to avoid race conditions and potential corruption that could occur if multiple threads were allowed to modify the interpreter's shared data, such as object reference counts, simultaneously.
However, despite its role in maintaining data integrity, the GIL also introduces certain limitations and trade-offs.
The two main limitations of the GIL are:
- Limited parallelism: The GIL limits true parallelism, particularly in CPU-bound tasks. While multiple threads can run concurrently, they are not able to fully utilize multiple CPU cores simultaneously within the same process.
- Performance bottleneck: In scenarios where threads spend significant time executing Python code rather than waiting, the GIL can become a performance bottleneck. CPU-bound tasks may not see the expected performance improvements from threading due to the serialization caused by the GIL.
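Both limitations can be observed with a small timing experiment. The sketch below is an illustration under assumptions, not a rigorous benchmark: `count_down`, `time_sequential` and `time_threaded` are hypothetical names, and the loop is just a stand-in for CPU-bound work. On a CPython build with the GIL, the threaded run is typically no faster than the sequential one, although exact numbers depend on the machine and Python version.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, never waits on I/O
    while n > 0:
        n -= 1

def time_sequential(n, repeats=2):
    """Run count_down `repeats` times, one after the other; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start

def time_threaded(n, num_threads=2):
    """Run count_down in `num_threads` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000
    print(f"sequential: {time_sequential(n):.2f} s")
    print(f"threaded:   {time_threaded(n):.2f} s")
```

Both functions perform the same total amount of work, so any difference between the two timings comes from how the threads are scheduled.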
Dealing with GIL's Limitations
To deal with the limitations imposed by the Global Interpreter Lock (GIL), especially in scenarios where true parallelism is needed, one can consider the following approaches:
Using Multiprocessing:
With multiprocessing, separate processes can be created, each with its own instance of the interpreter. Each process operates independently with its own resources and data. Processes are not affected by the GIL and this allows for true parallelism, as each process can utilize a separate CPU core.
Multiprocessing in Python is achieved using the multiprocessing module in the standard library. The module has an interface that is very similar to that of the threading module. This makes it an attractive option for developers who are familiar with threading but need to achieve true parallelism and avoid the limitations of the GIL.
Using multiprocessing, our previous example would look like:
import multiprocessing
import time

# Print capital letters
def task1():
    for letter in "ABCDE":
        print(letter)
        time.sleep(0.5)  # cause a small delay

# Print digits
def task2():
    for digit in "12345":
        print(digit)
        time.sleep(0.5)  # cause a small delay

if __name__ == "__main__":
    # Create one process for each function/task
    process1 = multiprocessing.Process(target=task1)
    process2 = multiprocessing.Process(target=task2)

    # Start the processes
    process1.start()
    process2.start()

    # The main process waits for both processes to finish
    process1.join()
    process2.join()

Output:
A
1
B
2
C
3
D
4
E
5
As you can see from the above example, the multiprocessing interface will feel very familiar if you already know threading, and vice versa.
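The letter-printing tasks spend most of their time sleeping, so they do not show the benefit of multiprocessing for CPU-bound work. The following sketch, under the assumption of a hypothetical CPU-heavy function `count_primes`, uses `multiprocessing.Pool` to spread such work across several worker processes. The pool size and the input limits are arbitrary values picked for illustration.

```python
import multiprocessing

def count_primes(limit):
    """Count the primes below `limit` by trial division (deliberately CPU-heavy)."""
    count = 0
    for candidate in range(2, limit):
        # candidate is prime if no divisor up to its square root divides it
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    # Each worker is a separate process with its own interpreter and its own GIL
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    for limit, result in zip(limits, results):
        print(f"primes below {limit}: {result}")
```

The `if __name__ == "__main__":` guard matters here: on platforms that start worker processes by importing the main module, it keeps the workers from creating pools of their own. Keep in mind that arguments and results are copied between processes, so this approach tends to pay off when each task does a substantial amount of computation.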
Using alternative implementations:
While CPython, the reference and most widely used implementation of Python, has the GIL, other implementations like Jython (Python on the Java Virtual Machine) and IronPython (Python on .NET) don't have a GIL. If applicable, one might explore using these alternative implementations, keeping in mind that they may lag behind CPython in language version and library support.
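When code may run on more than one implementation, it can help to check at runtime which interpreter is in use. A minimal sketch using the standard library's `platform` module follows; `describe_interpreter` is a hypothetical helper name.

```python
import platform

def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    return f"{platform.python_implementation()} {platform.python_version()}"

if __name__ == "__main__":
    print(describe_interpreter())
```

`platform.python_implementation()` reports names such as "CPython", "Jython" or "IronPython", which a program could use to decide between a threading-based and a multiprocessing-based strategy.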