Iterators are objects used to traverse a collection of elements, such as a list, tuple, range, or set.
The most obvious way to create an iterator is through the built-in iter() function.
A simple example of an iterator
data = ['Python', 'Java', 'C++', 'PHP', 'Swift']
iterator = iter(data)
# retrieve the list items one at a time with next()
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))
In the above example, we used the built-in iter() function to create an iterator for the list data.
The built-in next() function is used with iterators to retrieve the next item from the target collection.
By themselves, iterators do not actually hold any data; instead, they provide a way to access it. They keep track of their current position in the given iterable and allow traversing its elements one at a time. In their basic form, iterators are merely tools for scanning through the elements of a given container. Consider the following example:
data = ['Python', 'Java', 'C++']
iterator = iter(data)
print(len(iterator))
As demonstrated above, calling len() on an iterator raises a TypeError. An iterator has no length because it does not store any data itself.
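To illustrate that an iterator tracks only a position rather than the data, the following minimal sketch creates two iterators over the same list. The names first, second and remaining are only illustrative.

```python
data = ['Python', 'Java', 'C++']

# Two iterators over the same list, each with its own position
first = iter(data)
second = iter(data)

print(next(first))   # Python
print(next(first))   # Java
print(next(second))  # Python (second has not advanced yet)

# An iterator has no length, but consuming it yields whatever remains
remaining = list(first)
print(remaining)     # ['C++']
```

Advancing one iterator does not affect the other, and neither changes the list itself.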
Iterators can be used with loops for efficient traversal of the elements. By using an iterator together with a for loop, we can process each element in turn until the elements are exhausted. This is just an automated way of calling the next() function on the iterator until we reach the end of the target container.
data = range(10)
for i in iter(data):
    print(i)
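As a rough sketch of what the for loop automates, the loop can be written by hand with next(). The end of the container is signalled by the StopIteration exception, which is covered in the next part. The variable collected is only there for illustration.

```python
data = range(3)
iterator = iter(data)
collected = []

while True:
    try:
        # Ask the iterator for the next element
        item = next(iterator)
    except StopIteration:
        # The iterator is exhausted, so leave the loop
        break
    collected.append(item)
    print(item)
```

This prints 0, 1 and 2, just as `for item in data: print(item)` would.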
Another important thing to note about iterators is that they cannot traverse backwards. Once an iterator has moved past an element, it cannot return to it. So when an iterator reaches the end of the target container, it is of no further use. This leads us to the next part of the discussion, the StopIteration exception.
The StopIteration exception
When an iterator has traversed all the elements of the target iterable, or the iterable is empty, it will raise a StopIteration exception on an attempt to access the next element.
data = ['Python', 'Java', 'C++']
iterator = iter(data)
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))
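There are two common ways to deal with the exception when calling next() by hand. The sketch below shows both: catching StopIteration explicitly, and passing a default value as the second argument to next(). The names seen, empty_iterator and fallback are illustrative.

```python
data = ['Python', 'Java', 'C++']
iterator = iter(data)

# Option 1: catch the exception explicitly
seen = []
try:
    while True:
        seen.append(next(iterator))
except StopIteration:
    print('No more items after', seen[-1])

# Option 2: pass a default so next() returns it instead of raising
empty_iterator = iter([])
fallback = next(empty_iterator, None)
print(fallback)  # None
```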
When the target iterable is mutable, the iterator can still access elements that are added after its creation. This is because the iterator tracks a reference to the original object rather than a copy of its contents.
data = ['Java', 'Python']
iterator = iter(data)
print(next(iterator))
#add an item to the list
data.append('C++')
print(next(iterator))
print(next(iterator))
However, once an iterator raises a StopIteration exception, it becomes unusable from then on, and any subsequent attempt to access the next element will also raise StopIteration.
data = ['Java', 'Python']
iterator = iter(data)
print(next(iterator))
print(next(iterator))
print(next(iterator))
data.append('C++')
print(next(iterator))
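An exhausted iterator cannot be restarted, but the underlying container is unaffected, so a new iterator can always be requested from it. A minimal sketch, with illustrative variable names:

```python
data = ['Java', 'Python']
iterator = iter(data)

# Exhaust the first iterator
first_pass = list(iterator)
print(first_pass)   # ['Java', 'Python']

data.append('C++')

# The exhausted iterator stays exhausted even though the list grew
leftover = list(iterator)
print(leftover)     # []

# A fresh iterator starts from the beginning and sees all three items
second_pass = list(iter(data))
print(second_pass)  # ['Java', 'Python', 'C++']
```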
The iterator protocol
We have looked at how iterators work at an operational level. In this part, we will look at the technical details behind iterators.
An iterator is defined by two special methods, __iter__() and __next__(). Together, these two methods are known as the iterator protocol.
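The __iter__() method is called when the built-in iter() function is applied to the object, for example at the start of a for loop. For an iterator, it simply returns the iterator object itself.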
The __next__() method is responsible for returning the next element in the iteration. It is the method that is called in the background when we call the next() function on an iterator.
When a custom class implements the two methods it becomes an iterator and can be used with the built-in functions and constructs that expect an iterator.
class Square:
    def __init__(self, start, stop):
        self.stop = stop
        self.current = start

    # define the __iter__ method: an iterator returns itself
    def __iter__(self):
        return self

    # define the __next__ method
    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        # remember the current value before advancing, so start is not skipped
        value = self.current
        self.current += 1
        return value ** 2

# create an iterator
sq = Square(10, 20)

# use the iterator
for i in sq:
    print(i)
In the above example, we created a Square iterator which, on each step, returns the square of the next number between the start and stop parameters. The __next__() method raises a StopIteration exception once current reaches stop. The method is also responsible for incrementing the current value so that on the following iteration, the square of the next value in the sequence is returned.
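Because a class that implements the protocol is a genuine iterator, it works with next() and with built-in functions that consume iterators, such as list() and sum(). The Countdown class below is a hypothetical example written for this sketch; it is not part of the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


counter = Countdown(3)
print(iter(counter) is counter)  # True
print(next(counter))             # 3
print(list(counter))             # [2, 1]
print(sum(Countdown(4)))         # 10
```

Note that list(counter) only returns the elements that have not been consumed yet, since the iterator keeps its position between calls.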
The itertools module
As the name suggests, the itertools module in the standard library provides a number of tools for working with iterators. In essence, the module provides solutions to common problems that arise when iterating over iterables. We will only discuss some useful applications of the tools provided by the module; see the itertools module for a full discussion.
Merging iterators
The itertools module provides the chain() function, which can be used to iterate through multiple iterables without actually copying their contents. The function creates an iterator that iterates through the elements of the first iterable until they are exhausted, then continues through the subsequent iterables until they are all exhausted.
from itertools import chain
data1 = ['Python', 'Java', 'C++']
data2 = ['Ruby', 'Swift']
data3 = ['HTML', 'JavaScript', 'PHP']
for i in chain(data1, data2, data3):
    print(i)
In the above example we have three lists, data1, data2, and data3. We used the chain() function to iterate through their elements in a row. This is more efficient than first concatenating the lists, because in the latter case an entirely new list would be created, taking up additional memory. With the chain() function, the elements can be accessed without having to create a new list.
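chain() is not limited to lists. The sketch below, using made-up sample values, chains a list, a tuple and a range, and also shows chain.from_iterable(), which accepts a single iterable of iterables instead of separate arguments.

```python
from itertools import chain

languages = ['Python', 'Java']
versions = (3, 21)
digits = range(2)

# chain() accepts any iterables, not only lists
combined = chain(languages, versions, digits)
print(next(combined))  # Python
rest = list(combined)
print(rest)            # ['Java', 3, 21, 0, 1]

# chain.from_iterable() takes a single iterable of iterables
nested = [['a', 'b'], ['c'], []]
flat = list(chain.from_iterable(nested))
print(flat)            # ['a', 'b', 'c']
```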
Grouping data
The groupby()
function is used in the case where we need to group element from a collection based on a similar characteristic. The function returns an iterator of tuples in which each tuple contains a key and the corresponding group of elements from the iterable.
from itertools import groupby
data = [
    {'name': 'John', 'age': 25},
    {'name': 'Tim', 'age': 20},
    {'name': 'Sally', 'age': 30},
    {'name': 'Paul', 'age': 25},
    {'name': 'Jane', 'age': 20},
]
sorted_data = sorted(data, key=lambda x: x['age'])
grouped_data = groupby(sorted_data, key=lambda x: x['age'])
for key, group in grouped_data:
    agemates = ', '.join(person['name'] for person in group)
    print(f'Age {key}: {agemates}')
In the above example we have a variable data, which is a list of dictionaries where each dictionary contains the name and the age of a particular person.
We sorted the dictionaries by age using the built-in sorted() function and then called the groupby() function on the sorted data. Note that groupby() only groups consecutive elements that share a key, so the input should be sorted by that key before grouping; otherwise, the same key may appear in several separate groups.
We then used a for loop to get each key and its group of elements before printing them.
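The effect of skipping the sorting step can be seen in a small sketch. The list ages below is sample data that reuses the ages from the example above, and no key function is passed, so each element is its own key.

```python
from itertools import groupby

ages = [25, 20, 30, 25, 20]

# Without sorting, equal keys that are not adjacent form separate groups
unsorted_groups = [(key, list(group)) for key, group in groupby(ages)]
print(unsorted_groups)
# [(25, [25]), (20, [20]), (30, [30]), (25, [25]), (20, [20])]

# After sorting, each key appears exactly once
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(ages))]
print(sorted_groups)
# [(20, [20, 20]), (25, [25, 25]), (30, [30])]
```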
Why use Iterators?
Since we can still access the elements of an iterable directly without using iterators, one might wonder what their advantage is. The following points indicate the advantages of using iterators over the traditional iteration approach.
- Iterators allow for efficient iteration over large datasets as they do not have to store all values in memory.
- Iterators provide a concise way to scan over datasets or collections.
- Iterators allow for lazy evaluation, which means that values are computed on demand instead of pre-computing all values in advance.
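Lazy evaluation can be observed with a short sketch. It relies on the built-in map(), which returns an iterator, and on itertools.islice() to take only the first few results. The function slow_square and the list calls are hypothetical helpers used to record how much work is actually done.

```python
from itertools import islice

calls = []

def slow_square(n):
    # Record each call so we can see how much work was actually done
    calls.append(n)
    return n * n

# map() returns an iterator; nothing is computed at this point
squares = map(slow_square, range(1_000_000))
print(len(calls))    # 0

# Take only the first three results
first_three = list(islice(squares, 3))
print(first_three)   # [0, 1, 4]
print(len(calls))    # 3
```

Only three of the million possible values are ever computed, and none of them needs to be held in memory beyond the three that were requested.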
Did you know that you can have iterator functions? They are called generator functions. These are functions whose statements are executed in chunks instead of all at once. See the discussion on generator functions for more details.
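As a brief taste, here is a minimal sketch of a generator function. The function squares is a hypothetical example; calling it returns a generator object, which follows the same iterator protocol described earlier.

```python
def squares(start, stop):
    """Hypothetical generator yielding the squares of start up to stop - 1."""
    for number in range(start, stop):
        # Execution pauses here until the next value is requested
        yield number ** 2

gen = squares(1, 4)
print(iter(gen) is gen)  # True
print(next(gen))         # 1
print(list(gen))         # [4, 9]
```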