Counting element occurrences

# import the Counter class from the collections module
from collections import Counter

# an iterable with elements to count
data = 'aabbbccccdeefff'

# create the Counter object
c = Counter(data)

print(c)  # Counter({'c': 4, 'b': 3, 'f': 3, 'a': 2, 'e': 2, 'd': 1})

# get the count of a specific element
print(c['f'])  # 3

 

The collections module in the standard library provides the Counter class, a container that stores elements as dictionary keys and their counts as dictionary values. Counter is a subclass of dict, so its objects share most of the functionality of standard dictionaries.

from collections import Counter

print(issubclass(Counter, dict))  # True

Counter objects are typically used when you need to keep track of how many times each element appears in an iterable such as a list, tuple, or string, for example when performing analysis on a dataset.
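As a small illustration of that use case, the sketch below counts word frequencies in a made-up sentence. Splitting on whitespace with str.split() is a simplifying assumption; real text would usually also need punctuation and case handling.

```python
from collections import Counter

# Made-up sample text; any iterable of hashable items works the same way
text = "the quick brown fox jumps over the lazy dog the end"

# str.split() breaks the text on whitespace, giving an iterable of words
word_counts = Counter(text.split())

print(word_counts['the'])  # 3
print(len(word_counts))    # 9 distinct words
```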

Instantiating Counter objects

Counter objects can be initialized either by passing in an iterable such as a list, tuple, or string, or by supplying the initial counts directly.

Counter(iterable=None, **kwargs)

The class works only with hashable elements: if the iterable contains an unhashable element, such as a list, a TypeError is raised.
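The sketch below shows both cases: tuples and strings are hashable and can be counted, while a list of lists triggers the TypeError. The sample values are arbitrary.

```python
from collections import Counter

# Tuples and strings are hashable, so they can be counted
ok = Counter([('a', 1), ('a', 1), 'b'])
print(ok[('a', 1)], ok['b'])  # 2 1

# Lists are unhashable, so counting them raises TypeError
error_message = None
try:
    Counter([[1, 2], [1, 2]])
except TypeError as exc:
    error_message = str(exc)

print("TypeError:", error_message)
```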

Instantiate a Counter object from an iterable object.

from collections import Counter

data = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']

c = Counter(data)

print(c)

Alternatively, we can set the initial counts directly by passing in a dictionary that maps each element to its count.

from collections import Counter

c = Counter({'Python': 10, 'Java': 8, 'C++': 12, 'PHP': 9})

print(c)
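Keyword arguments are another way to set initial counts, as the Counter(iterable=None, **kwargs) signature suggests. They only work for element names that are valid Python identifiers, so a name like 'C++' has to go in a mapping; the two forms can be combined in one call.

```python
from collections import Counter

# Each keyword argument becomes an element, and its value becomes the count
c1 = Counter(Python=10, Java=8, PHP=9)
print(c1['Java'])  # 8

# 'C++' is not a valid identifier, so it is passed in a mapping,
# while 'Python' is still given as a keyword argument
c2 = Counter({'C++': 12}, Python=10)
print(c2['C++'], c2['Python'])  # 12 10
```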

Updating the dataset

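The update() method adds counts to a Counter after it has been instantiated. Unlike dict.update(), which replaces existing values, Counter.update() adds the new counts to the existing ones.

Updating the counts of an empty Counter after instantiation: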

from collections import Counter

c = Counter()

data = ['Python', 'C++', 'Java', 'Python', 'C++', 'Python']
#update the counter
c.update(data)

print(c)
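The sketch below shows that repeated update() calls accumulate counts, and that update() also accepts a mapping whose values are treated as counts to add. The data is made up for illustration.

```python
from collections import Counter

c = Counter(['Python', 'C++', 'Python'])
print(c['Python'])  # 2

# Counting more elements adds to the existing counts
c.update(['Python', 'Java'])
print(c['Python'])  # 3

# A mapping supplies counts to add, not elements to count
c.update({'Java': 5})
print(c['Java'])    # 6
```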

Counter methods

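Since Counter is a subclass of dict, it inherits the dict methods, although two of them behave differently: update() adds counts instead of replacing them, and fromkeys() is not implemented for Counter objects. Counter also defines additional methods, including: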

Method                               Description
most_common(n)                       Returns a list of the n most common elements and their counts as (element, count) tuples. If n is omitted, all elements are returned.
elements()                           Returns an iterator over the elements, repeating each one as many times as its count. Elements with a count below one are skipped.
subtract(iterable=None, **kwargs)    Subtracts the counts from the given iterable or mapping from the current counts. Counts can drop to zero or below.
update(iterable=None, **kwargs)      Adds the counts from the given iterable or mapping to the current counts.
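elements() and subtract() do not get their own examples below, so here is a short sketch of both on a made-up Counter. sorted() is used only to make the printed order predictable.

```python
from collections import Counter

c = Counter({'Python': 3, 'Java': 2, 'PHP': 1})

# elements() repeats each element as many times as its count
print(sorted(c.elements()))
# ['Java', 'Java', 'PHP', 'Python', 'Python', 'Python']

# subtract() lowers counts in place; a count may become negative
c.subtract(['Python', 'PHP', 'PHP'])
print(c['Python'], c['PHP'])  # 2 -1

# elements() now skips PHP because its count is below one
print(sorted(c.elements()))   # ['Java', 'Java', 'Python', 'Python']
```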

Get the n most common elements

The most_common() method returns the n most common elements, where n is the integer argument. The result is a list of (element, count) tuples in descending order of count; elements with equal counts are listed in the order they were first encountered.

most_common(n)
from collections import Counter

data = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']

c = Counter(data)

print(c.most_common(3))

In the above example, the most_common() method is used to get the 3 most common elements from the list. 
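Calling most_common() without an argument returns every element, and reversing that list is a simple way to get the least common ones. A sketch using the same data:

```python
from collections import Counter

data = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']
c = Counter(data)

# With no argument, every element is returned, most common first
all_items = c.most_common()
print(all_items)     # [('Java', 4), ('Python', 3), ('C++', 2), ('PHP', 2)]

# Reversing the list puts the least common elements first
least_common = all_items[::-1][:2]
print(least_common)  # [('PHP', 2), ('C++', 2)]
```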

Accessing Counts

Since Counter objects are dictionaries, we can access the count of a particular element with the usual dictionary indexing syntax, i.e. counter['x'].

from collections import Counter

data = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']

c = Counter(data)

print(c['Java'])
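A Counter differs from a regular dictionary when the element is not present: indexing returns 0 instead of raising a KeyError, and the lookup does not add the element. A small sketch with a made-up list:

```python
from collections import Counter

c = Counter(['Java', 'Python', 'Java'])

# A missing element has a count of 0; no KeyError is raised
print(c['Rust'])    # 0

# The lookup does not insert 'Rust' into the Counter
print('Rust' in c)  # False
```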

We can also use the get() method for the same purpose.

from collections import Counter

data = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']

c = Counter(data)

print(c.get('Python'))
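Because get() is inherited unchanged from dict, it behaves differently from indexing for a missing element: it returns None unless a default is passed. A short sketch with a made-up list:

```python
from collections import Counter

c = Counter(['Java', 'Python', 'Java'])

print(c.get('Java'))     # 2

# get() comes from dict, so a missing element gives None
print(c.get('Rust'))     # None

# Passing a default makes get() match the indexing behaviour
print(c.get('Rust', 0))  # 0
```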

Operations on Counter objects

Counter objects support several arithmetic and set-like operations, including the following. Each one returns a new Counter, and elements whose resulting count is zero or negative are left out of the result.

Operation    Description
c1 + c2      Addition: adds the counts of each element in c1 and c2.
c1 - c2      Subtraction: subtracts the count of each element in c2 from its count in c1.
c1 & c2      Intersection: keeps the minimum of the two counts for each element, so only elements present in both Counters remain.
c1 | c2      Union: keeps the maximum of the two counts for each element, so every element from either Counter appears.

Addition

from collections import Counter

data1 = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']
data2 = ['PHP', 'Python', 'Java', 'Python']

c1 = Counter(data1)
c2 = Counter(data2)

c3 = c1 + c2 
print(c3)

Intersection

from collections import Counter

data1 = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']
data2 = ['PHP', 'Python', 'Java', 'Python']

c1 = Counter(data1)
c2 = Counter(data2)

c3 = c1 & c2  # keep the minimum of each element's two counts
print(c3)
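Subtraction and union follow the same pattern. The sketch below reuses the two lists from the examples above and shows that subtraction drops any element whose count would fall to zero or below.

```python
from collections import Counter

data1 = ['Java', 'Python', 'Java', 'C++', 'Python', 'PHP', 'C++', 'Java', 'PHP', 'Java', 'Python']
data2 = ['PHP', 'Python', 'Java', 'Python']

c1 = Counter(data1)  # Java: 4, Python: 3, C++: 2, PHP: 2
c2 = Counter(data2)  # PHP: 1, Python: 2, Java: 1

# Subtraction: counts are subtracted and only positive results are kept
difference = c1 - c2
print(difference)  # Counter({'Java': 3, 'C++': 2, 'Python': 1, 'PHP': 1})

# Every count in c2 is smaller than in c1, so this result is empty
print(c2 - c1)     # Counter()

# Union: the maximum of the two counts for each element
union = c1 | c2
print(union)       # Counter({'Java': 4, 'Python': 3, 'C++': 2, 'PHP': 2})
```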