Iterators in Python

Iterators are objects that can be used to traverse a collection of elements, such as a list, tuple, range, or set.

The most obvious way to create an iterator is through the builtin iter() function.

A simple example of an iterator

data = ['Python', 'Java', 'C++', 'PHP', 'Swift']

iterator = iter(data)

#use the iterator to loop through the list items
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))

In the above example, we used the builtin iter() function to create an iterator for the list data.

The builtin next() function is used with iterators to move to the next item in the target collection.

By themselves, iterators do not actually hold any data; instead, they provide a way to access it. They keep track of their current position in the given iterable and allow traversing its elements one at a time. So in their basic form, iterators are merely tools whose purpose is to scan through the elements of a given container. Consider the following example:

data = ['Python', 'Java', 'C++']

iterator = iter(data)

print(len(iterator))  #raises TypeError: object of type 'list_iterator' has no len()

As demonstrated above, it is not possible to obtain the length of an iterator: calling len() on one raises a TypeError, because iterators do not store any data themselves.
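The position tracking described above can be sketched as follows. This is a minimal illustration, assuming two iterators created over the same list; the names first and second are only for the example. Each iterator advances independently, while the list itself is untouched and still reports its length.

```python
data = ['Python', 'Java', 'C++']

# Two iterators over the same list; each one tracks its own position.
first = iter(data)
second = iter(data)

a = next(first)    # first moves to 'Python'
b = next(first)    # first moves on to 'Java'
c = next(second)   # second is still at the start, so it yields 'Python'

print(a, b, c)
print(len(data))   # the list, unlike its iterators, knows its length
```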

Iterators support use with loops for efficient traversal of the elements. By using an iterator together with a for loop, we can process each element in turn until the elements are exhausted. This is just an automated way of calling the next() function on the iterator until we reach the end of the target container.

data = range(10)

for i in iter(data):
    print(i)
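To make the "automated way of calling next()" concrete, here is a rough sketch of what a for loop does behind the scenes. It is an approximation for illustration, not the interpreter's actual implementation; the collected list is only there to record what was seen.

```python
data = range(3)

iterator = iter(data)   # the for loop first asks the iterable for an iterator
collected = []

while True:
    try:
        item = next(iterator)   # fetch the next element
    except StopIteration:
        break                   # exhaustion ends the loop quietly
    collected.append(item)
    print(item)
```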

Another important thing to note about iterators is that they cannot traverse backwards. Once an iterator has been used to move through a sequence, it cannot be used to move backwards through the same sequence. So when an iterator reaches the end of the target container, it effectively becomes useless. This leads us to the next part of the discussion, the StopIteration exception.

The StopIteration exception

When an iterator has traversed all the elements of the target iterable, or the iterable is empty, it raises a StopIteration exception on any attempt to access the next element.

data = ['Python', 'Java', 'C++']

iterator = iter(data)

print(next(iterator))
print(next(iterator))
print(next(iterator))
print(next(iterator))  #raises StopIteration, the iterator is exhausted
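When calling next() by hand, the exception usually needs to be handled. The sketch below shows two common options: catching StopIteration with try/except, and passing a default value as the second argument to next(), which is returned instead of raising. The variable names and the 'no more items' string are illustrative only.

```python
data = ['Python', 'Java', 'C++']
iterator = iter(data)

# Consume all three elements.
for _ in range(3):
    next(iterator)

# Option 1: catch the exception explicitly.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: give next() a default to return instead of raising.
fallback = next(iterator, 'no more items')

print(exhausted, fallback)
```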

When the target iterable is mutable, the iterator can still access elements that are added after its creation, provided it has not yet been exhausted. This is because the iterator tracks a reference to the object rather than a copy of its contents.

data = ['Java', 'Python']

iterator = iter(data)

print(next(iterator))

#add an item to the list
data.append('C++')

print(next(iterator))
print(next(iterator))

However, once an iterator raises a StopIteration exception, it becomes unusable from then on, and any subsequent attempt to access the next element will also raise StopIteration.

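data = ['Java', 'Python']

iterator = iter(data)

print(next(iterator))
print(next(iterator))

#the iterator is now exhausted, so this call raises StopIteration
try:
    print(next(iterator))
except StopIteration:
    print('StopIteration raised')

#adding an item does not revive the exhausted iterator
data.append('C++')
try:
    print(next(iterator))
except StopIteration:
    print('StopIteration raised again')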
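Since an exhausted iterator cannot be rewound or revived, the usual remedy is to ask the iterable for a fresh one. The following sketch assumes a list as the underlying iterable; the names first_pass, second_pass and third_pass are only for illustration.

```python
data = ['Java', 'Python']
iterator = iter(data)

first_pass = list(iterator)     # consumes every element

data.append('C++')
second_pass = list(iterator)    # still exhausted, so nothing is produced

# A new iterator starts from the beginning and sees the appended item.
third_pass = list(iter(data))

print(first_pass, second_pass, third_pass)
```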

The iterator protocol

We have looked at how iterators work at an operational level. In this part we will look at the technical details about iterators.

An iterator is defined by two special methods, __iter__() and __next__(). These two methods are together known as the Iterator protocol.

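The __iter__() method is called when the builtin iter() function is applied to the object, for example at the start of a for loop. For an iterator, it returns the iterator object itself.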

The __next__() method is responsible for returning the next element in the iteration. It is actually the method that is being called in the background when we call the next() function on an iterator.
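The relationship between the builtin functions and the two special methods can be sketched with an ordinary list iterator. Calling the double-underscore methods directly is done here purely for illustration; in everyday code the builtin iter() and next() functions are preferred.

```python
data = ['Python', 'Java']

iterator = iter(data)    # equivalent to data.__iter__()

# An iterator's own __iter__() returns the iterator itself.
same_object = iter(iterator) is iterator

first = next(iterator)         # calls iterator.__next__() behind the scenes
second = iterator.__next__()   # the same call, written out explicitly

print(same_object, first, second)
```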

When a custom class implements the two methods it becomes an iterator and can be used with the built-in functions and constructs that expect an iterator.

class Square:
    def __init__(self, start, stop):
        self.stop = stop
        self.current = start

    #define the __iter__ method
    def __iter__(self):
        return self

    #define the __next__ method
    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration

        #square the current number first, then move to the next one
        value = self.current ** 2
        self.current += 1
        return value

#create an iterator
sq = Square(10, 20)

#use the iterator
for i in sq:
    print(i)

In the above example, we created a square iterator which on each step returns the square of the next number in the range defined by the start and stop parameters. The __next__() method ensures that once the current value reaches stop, a StopIteration exception is raised. The method is also responsible for incrementing the current value so that on the following iteration, the square of the next value in the sequence is returned.
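Because a class that implements both methods is a genuine iterator, it also works with builtin functions that consume iterators, not only with for loops. The sketch below uses a separate, hypothetical Countdown class (not part of the example above) so that it is self-contained.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

as_list = list(Countdown(3))   # collect every value into a list
total = sum(Countdown(4))      # 4 + 3 + 2 + 1
largest = max(Countdown(5))
a, b = Countdown(2)            # unpacking also relies on the protocol

# Like any iterator, an instance can only be traversed once.
counter = Countdown(2)
first_run = list(counter)
second_run = list(counter)

print(as_list, total, largest, a, b, first_run, second_run)
```

Note that each call above creates a new Countdown instance; reusing one instance, as with counter, yields nothing the second time.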

The itertools module

As the name suggests, the itertools module in the standard library provides a number of tools for working with iterators. In essence, the module provides solutions to common iterative problems which arise when dealing with iterables. We will discuss only some useful applications of the tools provided by the module; see the discussion on the itertools module for full coverage.

Merging iterators

The itertools module provides the chain() function, which can be used to iterate through multiple iterables without actually copying their contents. The function creates an iterator that iterates through the elements of the first iterable until they are exhausted, then continues through the subsequent iterables until they are all exhausted.

from itertools import chain

data1 = ['Python', 'Java', 'C++']
data2 = ['Ruby', 'Swift']
data3 = ['HTML', 'JavaScript', 'PHP']

for i in chain(data1, data2, data3):
    print(i)

In the above example we have three lists, data1, data2, and data3. We used the chain() function to iterate through their elements in a row. This is more efficient than first concatenating the lists, because in the latter case an entirely new list would be created, taking up additional memory. With the chain() function, the elements can be accessed without having to create a new list.
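The chain() function is not limited to lists. As a brief sketch, the example below chains a list, a tuple and a range, and also uses chain.from_iterable(), which takes a single iterable of iterables. The variable names are illustrative only.

```python
from itertools import chain

letters = ['a', 'b']
numbers = (1, 2)
more = range(3, 5)

combined = chain(letters, numbers, more)   # nothing is copied at this point
first = next(combined)                     # elements are produced on demand
rest = list(combined)

# chain.from_iterable() flattens one level of nesting.
nested = [['Python', 'Java'], ['Ruby'], []]
flattened = list(chain.from_iterable(nested))

print(first, rest, flattened)
```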

Grouping data

The groupby() function is used when we need to group elements from a collection based on a shared characteristic. The function returns an iterator of tuples in which each tuple contains a key and the corresponding group of elements from the iterable.

from itertools import groupby

data = [
    {'name': 'John', 'age': 25},
    {'name': 'Tim', 'age': 20},
    {'name': 'Sally', 'age': 30},
    {'name': 'Paul', 'age': 25},
    {'name': 'Jane', 'age': 20},
]

sorted_data = sorted(data, key=lambda x: x['age']) 

grouped_data = groupby(sorted_data, key = lambda x: x['age']) 

for key, group in grouped_data:
    agemates = ', '.join(person['name'] for person in group)
    print(f'Age {key}: {agemates}')

In the above example we have a variable data, which is a list of dictionaries where each dictionary contains the name and the age of a particular person.

We sorted the dictionaries by age using the builtin sorted() function and then called the groupby() function on the sorted data. Note that groupby() only groups consecutive elements that share the same key, so the input should be sorted by that key before it is passed for grouping; otherwise, the same key may appear in several separate groups.

We then used a for loop to get each key and its group of elements before printing them.
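The effect of skipping the sort can be seen in a small sketch. It uses a plain list of ages (illustrative data, with no key function, so each element is its own key) and compares the keys that groupby() produces with and without sorting.

```python
from itertools import groupby

ages = [25, 20, 30, 25, 20]

# Without sorting, equal values that are not adjacent land in separate groups.
unsorted_keys = [key for key, _ in groupby(ages)]

# After sorting, each distinct value appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(ages))]

print(unsorted_keys)
print(sorted_keys)
```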

Why use Iterators?

Since we can still access the elements of an iterable directly without using iterators, one might wonder what their advantage is. The following points indicate the advantages of using iterators over the traditional iteration approach.

Iterators allow for efficient iteration over large datasets as they do not have to store all values in memory.
Iterators provide a concise way to scan over datasets or collections.
Iterators allow for lazy evaluation, which means that values are computed on demand instead of pre-computing all values in advance.
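The memory and lazy evaluation points above can be illustrated with a rough sketch. It assumes CPython, where sys.getsizeof() reports the size of the object itself; exact numbers vary between versions and platforms, so only the comparison matters. The second part uses count() and islice() from itertools to take a few values from an endless sequence.

```python
import sys
from itertools import count, islice

numbers = range(1_000_000)

as_list = list(numbers)       # stores a reference to every element
as_iterator = iter(numbers)   # stores only its current position

list_size = sys.getsizeof(as_list)
iterator_size = sys.getsizeof(as_iterator)
print(list_size > iterator_size)

# Lazy evaluation: count(1) never ends, yet only five squares are computed.
squares = map(lambda n: n * n, count(1))
first_squares = list(islice(squares, 5))
print(first_squares)
```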

Did you know that you can have iterator functions? They are called generator functions. These are functions whose statements are executed in chunks instead of all at once. See the discussion on generator functions for more.
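As a brief taste, here is a minimal sketch of a generator function. The squares() function is a hypothetical example written for this illustration; it pauses at each yield and resumes from there when the next value is requested.

```python
def squares(start, stop):
    """Hypothetical generator yielding the squares of start up to stop - 1."""
    current = start
    while current < stop:
        yield current ** 2   # execution pauses here until next() is called again
        current += 1

gen = squares(1, 4)

is_iterator = iter(gen) is gen   # a generator object follows the iterator protocol
first = next(gen)
remaining = list(gen)

print(is_iterator, first, remaining)
```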
