The itertools module in the standard library offers a number of tools for conveniently iterating through iterables.
The groupby() function in the module groups consecutive elements of an iterable that share the same key. It does not sort the data itself.
groupby(iterable, key=None)
iterable | Required. An iterable (list, set, tuple, range, etc.) containing the elements to be grouped.
key | Optional. A function that computes the grouping key for each element. If omitted or None, each element is used as its own key.
The groupby() function returns an iterator of (key, group) tuples, where key is the grouping key and group is itself an iterator over the consecutive elements of the iterable that share that key.
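As a small illustration of the default behaviour, the sketch below calls groupby() without a key function, so each element acts as its own key. The sample string and the variable names are made up for this example.

```python
from itertools import groupby

# With no key function, each element is its own key, so every run of
# equal adjacent characters becomes one group.
letters = "aaabbbcca"

# Record each key together with the length of its run.
runs = [(key, len(list(group))) for key, group in groupby(letters)]

print(runs)  # [('a', 3), ('b', 3), ('c', 2), ('a', 1)]
```

Note that 'a' appears twice in the result: the final 'a' is not adjacent to the first run, so it starts a new group.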
from itertools import groupby

# Each tuple is (category, name); items of the same category are already adjacent.
bodies = [("planet", "Jupiter"), ("planet", "Earth"), ("planet", "Mars"), ("galaxy", "Andromeda"), ("galaxy", "Milky way"), ("moon", "Europa"), ("moon", "Lunar")]

# Group on the first item of each tuple (the category).
for key, group in groupby(bodies, lambda x: x[0]):
    similar_bodies = ", ".join([body[1] for body in group])
    print(key + "(s): " + similar_bodies + ".")
In the above example, bodies is a list of tuples, each describing a celestial body: the first item is the body's category and the second is its name.
We passed lambda x: x[0] as the key argument to tell the groupby() function to use the first item in each tuple as the grouping key.
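In the for loop, groupby() yields three (key, group iterator) pairs, one for each run of adjacent items with the same key. Because the list is already arranged by category, each category appears exactly once. You can use the returned iterator to iterate over each individual item in that group.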
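Each group iterator shares the underlying iterable with groupby() itself, so a group is no longer available once the loop advances to the next key. If the groups are needed later, one option is to copy each into a list while it is current. The following is a minimal sketch using a shortened copy of the list; the groups dictionary is a name introduced for this example.

```python
from itertools import groupby

# A shortened copy of the sample data, used only for illustration.
bodies = [("planet", "Jupiter"), ("planet", "Earth"), ("moon", "Europa")]

# Copy each group's names into a list while that group is still current,
# so the results remain usable after the loop has moved on.
groups = {}
for key, group in groupby(bodies, key=lambda x: x[0]):
    groups[key] = [name for _, name in group]

print(groups)  # {'planet': ['Jupiter', 'Earth'], 'moon': ['Europa']}
```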
Let us see another example:
Group people by age
from itertools import groupby
data = [
    {'name': 'John', 'age': 25}, {'name': 'Tim', 'age': 20},
    {'name': 'Sally', 'age': 30}, {'name': 'Paul', 'age': 25},
]
# Sort by age first so that people of the same age are adjacent.
sorted_data = sorted(data, key=lambda x: x['age'])
grouped_data = groupby(sorted_data, key=lambda x: x['age'])
for key, group in grouped_data:
    agemates = ', '.join(person['name'] for person in group)
    print(f'Age {key}: {agemates}')
In the above example, data is a list of dictionaries, each containing the name and the age of a particular person.
We sorted the dictionaries by age using the built-in sorted() function and then called the groupby() function on the sorted data. Note that groupby() only groups adjacent elements, so the input should already be sorted by the same key before it is passed for grouping; otherwise, elements that share a key but are not adjacent end up in separate groups.
We then used a for loop to get each key and its group of elements before printing them.
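To see why sorting matters, the sketch below runs groupby() on the same ages with and without sorting and collects only the keys. The ages list and the variable names are assumptions made for this illustration.

```python
from itertools import groupby

ages = [25, 20, 30, 25]

# Unsorted input: the two 25s are not adjacent, so 25 shows up as two groups.
unsorted_keys = [key for key, _ in groupby(ages)]

# Sorted input: equal ages are adjacent and fall into a single group.
sorted_keys = [key for key, _ in groupby(sorted(ages))]

print(unsorted_keys)  # [25, 20, 30, 25]
print(sorted_keys)    # [20, 25, 30]
```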