Basic example

my_list = ['Tokyo', 'Moscow', 'Tokyo', 'Nairobi', 'Helsinki', 'Tokyo', 'Moscow', 'Nairobi']

my_set = set(my_list)

unique_list = list(my_set)

print(unique_list)

The list is the most versatile core data type in Python. It represents an ordered collection of elements in which duplicates are allowed.

Sometimes we may need to remove all the duplicate values from a list. We can achieve this either by creating a new list that contains only the unique values or by actually removing the duplicates from the existing list. Each approach has its own advantages and disadvantages. If we create a new list, we preserve the original list unmodified, so no information is lost. The downside is the extra memory for the new list and the time spent checking each item from the original list for uniqueness. If, on the other hand, we remove the duplicates from the existing list, we avoid allocating a second list, but the modifications cannot be undone and the original list is permanently altered.

There are several ways to remove duplicates from a list efficiently:

  • Using a secondary list
  • Using list comprehension
  • Using the count() and remove() methods with a while loop
  • Using collections.OrderedDict keys
  • Using an intermediate set

Using a secondary list

In this approach, we create a new list, then loop through the elements of the original list and append each of its elements to the new list once. The resulting secondary list will contain the unique elements while the original list remains unchanged.

my_list = ['Python', 'Java', 'Python', 'C++', 'Ruby', 'Python']

# create the secondary list
unique_list = []

for item in my_list:
    if item not in unique_list:
        unique_list.append(item)

print(unique_list)

If necessary, you can re-assign the secondary list to the original name, in which case the original list will contain the same items as the secondary list.

my_list = ['Python', 'Java', 'Python', 'C++', 'Ruby', 'Python']

unique_list = []

for item in my_list:
    if item not in unique_list:
        unique_list.append(item)

my_list = unique_list

print(my_list)

Using list comprehension

List comprehension is a convenient and powerful syntax that combines for loops, sequence building, and conditional logic into one concise expression.

We can use list comprehension to create a new list containing only the unique elements of the original list.

my_list = ['Japan', 'China', 'China', 'Japan', 'India', 'Japan']

unique_list = []

[unique_list.append(item) for item in my_list if item not in unique_list]

print(unique_list)
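Note that the comprehension above is used purely for its side effect (calling append), a pattern many Python style guides discourage. A more idiomatic sketch, assuming the elements are hashable, tracks already-seen items in a helper set so the comprehension actually produces the result list:

```python
my_list = ['Japan', 'China', 'China', 'Japan', 'India', 'Japan']

seen = set()
# keep an element only the first time we encounter it;
# set.add() returns None, so the "or" clause records the item without keeping it out
unique_list = [item for item in my_list if not (item in seen or seen.add(item))]

print(unique_list)  # ['Japan', 'China', 'India']
```

This preserves the original order and performs membership tests against a set, which is faster than the list lookups used above.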

Using the count() and remove() methods with a while loop

The previous approaches resulted in the creation of a new list rather than altering the original one. We can instead use the relevant list methods to remove duplicate elements from the original list itself.

The list.count() method returns the number of times a specified element appears in a list. We will use this method to tell whether an element appears more than once in the list.

my_list = ['Python', 'Java', 'Java']

# use the count() method
print(my_list.count('Java'))

The remove() method, on the other hand, removes the first occurrence of a specified element from the list.

my_list = ['Python', 'Java','C++']

my_list.remove('Java')
print(my_list)

We can combine the two methods with a while loop to remove duplicate elements from a list. We will see in a moment why the while loop is better suited than the for loop in this scenario.

my_list = [87, 94, 45, 94, 94, 41, 65, 94, 41, 99, 94, 94, 94]

i = 0
while i < len(my_list):
    current = my_list[i]
    if my_list.count(current) > 1:
        my_list.remove(current)
    else:
        i += 1      

print(my_list)

In the above case we only increment i when no element has been removed from the list. After a removal, the next element shifts left into the current index, so i must stay put.

The while loop lets us keep a counter that we can increment or decrement at will. This is in contrast to the for loop, whose internal index advances automatically on every iteration and is largely beyond our control once the loop starts.
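To see the problem concretely, here is a small sketch (my own, not from the original) of what can go wrong when we remove elements inside a plain for loop. The loop's hidden index keeps advancing past elements that have shifted left, so duplicates can survive:

```python
my_list = [1, 1, 1, 1]

for item in my_list:
    if my_list.count(item) > 1:
        # removing shifts the remaining elements left,
        # but the for loop's internal index still advances
        my_list.remove(item)

print(my_list)  # [1, 1] -- two duplicates slipped through
```

The loop stops after two removals because its internal index runs past the shortened list, leaving duplicates behind.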

As noted earlier, the remove() method removes the leftmost occurrence of an element from the list. This means that the occurrence that survives in the deduplicated list is the rightmost one.

This approach can also be far less efficient, because count() traverses the whole list for every element, giving roughly quadratic running time. It is therefore better suited to small and medium-sized lists.
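If in-place removal is required but the list is large, one alternative sketch (my own, assuming hashable elements) is to track seen elements in a set and pop() duplicates as we go. Unlike the count()/remove() version, this keeps the leftmost occurrence and avoids the repeated full-list scans:

```python
my_list = [87, 94, 45, 94, 94, 41, 65, 94, 41, 99, 94, 94, 94]

seen = set()
i = 0
while i < len(my_list):
    if my_list[i] in seen:
        my_list.pop(i)       # drop the duplicate in place
    else:
        seen.add(my_list[i])  # first occurrence: remember it and move on
        i += 1

print(my_list)  # [87, 94, 45, 41, 65, 99]
```

Each pop(i) still shifts the tail of the list, so the worst case remains quadratic, but the per-element count() scans are eliminated.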

Using collections.OrderedDict keys

The collections module in the standard library provides the OrderedDict class, whose instances behave just like a standard dictionary except that the order in which keys are inserted is preserved. This feature, combined with the fact that dictionary keys are unique, can help us remove duplicate elements from a list.

OrderedDict, just like the standard dict, provides the keys() method, which iterates over all the keys of the dictionary. We will use it to build the unique list from the keys.

from collections import OrderedDict

my_list = ['orange', 'apple', 'orange', 'pear', 'pear', 'apple', 'orange']

my_dict = OrderedDict(zip(my_list, (None,) * len(my_list)))

unique_list = list(my_dict.keys())

print(unique_list)

In the above approach we call the built-in zip() function with the original list and a tuple of Nones of the same length. zip() yields (element, None) pairs, so in the resulting OrderedDict each element becomes a key and None its value.

We then used the built-in list() function to convert the keys of the OrderedDict, my_dict, into a list. The resulting list contains only the unique elements from the original list.
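Since Python 3.7, regular dictionaries also preserve insertion order, so the same idea can be written more compactly with dict.fromkeys(), which builds the (key, None) pairs for us:

```python
my_list = ['orange', 'apple', 'orange', 'pear', 'pear', 'apple', 'orange']

# dict keys are unique and (since Python 3.7) ordered by insertion
unique_list = list(dict.fromkeys(my_list))

print(unique_list)  # ['orange', 'apple', 'pear']
```

On modern Python this is the usual one-liner for order-preserving deduplication.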

While this approach works, remember that dictionary keys must be hashable. This unfortunately means the approach will fail if the target list contains unhashable objects such as other lists, sets, or dictionaries.

from collections import OrderedDict

# a list of sets, which are unhashable
my_list = [{2}, {1}, {1}]

# raises TypeError: unhashable type: 'set'
my_dict = OrderedDict(zip(my_list, (None,) * len(my_list)))

Using an intermediate set

Sets are known for their uniqueness property: an item can appear only once in a set. However, sets are also unordered, which means that in the resulting unique list the order of the elements is not guaranteed to match the original list. And just as with dictionary keys, set elements must be hashable, so this approach will also fail if the original list contains unhashable elements.

my_list = ['Tokyo', 'Moscow', 'Tokyo', 'Nairobi', 'Helsinki', 'Tokyo', 'Moscow', 'Nairobi']

my_set = set(my_list)

unique_list = list(my_set)

print(unique_list)
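If you want the speed of a set but still need the original order, one common pattern (a sketch on my part, not from the original) is to sort the set by each element's first position in the original list:

```python
my_list = ['Tokyo', 'Moscow', 'Tokyo', 'Nairobi', 'Helsinki', 'Tokyo', 'Moscow', 'Nairobi']

# sort the unique elements by their first index in the original list
unique_list = sorted(set(my_list), key=my_list.index)

print(unique_list)  # ['Tokyo', 'Moscow', 'Nairobi', 'Helsinki']
```

Note that each index() call scans the list, so this is quadratic in the worst case; for large lists the dict.fromkeys() approach is the better order-preserving choice.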

Conclusion

The first two approaches are the better fit if you want to create a new list rather than alter the original, while the count() and remove() approach suits cases where we want to modify the original list in place. However, the count() and remove() approach can be less efficient than the first two and is best reserved for lists that are not very large; it also keeps the rightmost occurrence of each element rather than the leftmost, which may not always be desirable.

The OrderedDict and set approaches are only applicable if we are sure that the list contains hashable elements. The OrderedDict approach has the advantage of keeping the elements in the order they appeared in the original list, something we cannot guarantee with sets.