You may be familiar with how aggregation functions work in SQL. In this article you will learn how to create such functions from a Python object for an SQLite database with the sqlite3 module.

An aggregate functions takes in multiple values as argument, processes them and only returns a single value that represents something significant about the input values. Example of aggregate functions in SQL include max(), min() avg() and count().

In sqlite3, an aggregate SQL function is defined as a class that supports two methods; step() and finalize(). The step() method is called with each value in the input data, while the finalize() function is called at the end to return the resulting aggregate value. 

An aggregate function is registered using the create_aggregate() method of connection objects.

Basic example

Consider if we want to get the most occurring value in a given dataset.

We can define the Mode class as shown below:

import sqlite3
import collections

class Mode:
   def __init__(self):
      self.counts =collections.Counter()

   def step(self, val):
      self.counts[val] += 1

   def finalize(self):
      return self.counts.most_common(1)[0][0]

In the above example, we are keeping the count of each distinct value in a collections.Counter() object in the self.counts variable. With each call to step(), the count of the current value is incremented by 1. In the finalize() method, the most occurring element is returned.

Note that the Counter.most_common(n) returns a list of tuples for the n most common elements, with each list element being in form of (value, count).  in the above example, we called the method as most_common(1)[0][0], this way, the first value in the first tuple will be returned.

Register the function

After creating the class to represent the aggregate function, we can then register it using the conn.create_aggregate() function. The function has the following syntax:

create_aggregate(name, nargs, cls)

Where the name parameter is the name of the aggregate as it will appear in an SQL statement. nargs is the number of arguments taken by the step() method of the corresponding class. and cls is the class with the step() and finalize() methods that is to be used in the aggregation. 

The following example shows how to register the mode() aggregate function.

import sqlite3

#... definition of the Mode class is omitted. Refer to the previous section

with sqlite3.connct('example.db') as conn:
   conn.create_aggregate('mode', 1, Mode) #create the aggregate

A Working example

In the following example, we open a connection with a students.db database and create a table called student with two fields; name and age. We then insert several rows in the student table.

import sqlite3

with sqlite3.connect('students.db') as conn:
  try:

  #create student table
    conn.execute('''
        CREATE TABLE student(
            id iNTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            age INTEGER

          );
      ''')
    print('Table student created successfully.')

    #insert some rows
    conn.executescript('''
        INSERT INTO student (name, age) VALUES ('John', 17);
        INSERT INTO student (name, age) VALUES ('Jane', 16);
        INSERT INTO student (name, age) VALUES ('Mary', 17);
        INSERT INTO student (name, age) VALUES ('Mike', 17);
        INSERT INTO student (name, age) VALUES ('Hannah', 15);
        INSERT INTO student (name, age) VALUES ('Joe', 19);
        INSERT INTO student (name, age) VALUES ('Hans', 17);
        INSERT INTO student (name, age) VALUES ('Margret', 17);
        INSERT INTO student (name, age) VALUES ('Like', 16);
        INSERT INTO student (name, age) VALUES ('Reagan', 17);
        INSERT INTO student (name, age) VALUES ('Albert', 15);
        INSERT INTO student (name, age) VALUES ('Frank', 17);
        INSERT INTO student (name, age) VALUES ('Steve', 17);
        INSERT INTO student (name, age) VALUES ('Bob', 16);

      ''')
    print('Rows inserted successfully.')
  except:
    print('Something went wrong.')

Table student created successfully.
Rows inserted successfully.

In the following example, we register the mode() aggregate and then use it to check the most occurring age  for the students.

import sqlite3
import collections

class Mode:
   def __init__(self):
      self.counts =collections.Counter()

   def step(self, val):
      self.counts[val] += 1

   def finalize(self):
      return self.counts.most_common(1)[0][0]


with sqlite3.connect('students.db') as conn:
  conn.create_aggregate('mode', 1, Mode)
  cursor = conn.cursor()

  cursor.execute('''
       SELECT mode(age) FROM student;
    ''')

  mode =  cursor.fetchone()
  print(mode)
  print(mode[0])

(17,)
17 

As you can see from the above example, 17 is the most occurring age for students in the database.