Need to de-duplicate a list of items?
>>> all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]
How can you do this in Python?
Let's take a look at two approaches for de-duplicating: one when we don't care about the order of our items and one when we do.
Using set to de-duplicate
You can use the built-in set constructor to de-duplicate the items in a list (or in any iterable):
>>> unique_colors = set(all_colors)
>>> unique_colors
{'blue', 'pink', 'green', 'purple', 'red'}
This only works for lists of hashable values, but that includes quite a few values: strings, numbers, and tuples that contain only hashable values are all hashable in Python.
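To see what the hashability requirement means in practice, here's a small sketch. The rows list is made-up data for illustration: passing a list of lists to set raises a TypeError, and converting each inner list to a tuple first is one possible workaround.

```python
# Hypothetical data: each row is a list, and lists are not hashable.
rows = [[1, 2], [3, 4], [1, 2]]

try:
    set(rows)
except TypeError as error:
    # The exact wording varies by Python version, but it mentions "unhashable".
    error_message = str(error)
    print(error_message)

# Tuples of hashable values are hashable, so convert each row first.
unique_rows = set(tuple(row) for row in rows)
print(unique_rows)
```

Note that the result holds tuples rather than lists, so this only fits cases where that conversion is acceptable.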
You might have noticed that the order of the original items was lost once they were converted to a set:
>>> all_colors
['blue', 'purple', 'green', 'red', 'green', 'pink', 'blue']
>>> unique_colors
{'blue', 'pink', 'green', 'purple', 'red'}
Even if we convert the items back to a list, that original order won't be maintained:
>>> unique_colors = list(set(all_colors))
>>> unique_colors
['blue', 'pink', 'green', 'purple', 'red']
What if we want to maintain the order of our items while de-duplicating?
To de-duplicate while maintaining relative item order, we can use dict.fromkeys:
>>> unique_colors = dict.fromkeys(all_colors)
>>> unique_colors
{'blue': None, 'purple': None, 'green': None, 'red': None, 'pink': None}
Python's dict class has a fromkeys class method which accepts an iterable and makes a new dictionary where the keys are the items from the given iterable.
Since dictionaries can't have duplicate keys, this also de-duplicates the given items! Dictionaries also maintain the insertion order of their items (guaranteed by the language as of Python 3.7, and already true in CPython 3.6), so the resulting dictionary will have its keys ordered based on the first time each value was seen.
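If fromkeys feels a bit magical, it may help to compare it with a dictionary comprehension that builds the same mapping by hand. This is just an illustrative sketch (the from_keys and by_hand names are made up here), not a suggestion to replace fromkeys:

```python
all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]

# dict.fromkeys gives every key the value None by default.
from_keys = dict.fromkeys(all_colors)

# A dictionary comprehension that builds the same mapping by hand.
by_hand = {color: None for color in all_colors}

print(from_keys == by_hand)              # True
print(list(from_keys) == list(by_hand))  # True: same key order too
```

In both cases a repeated key just overwrites the existing entry, which keeps its original position.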
Okay, we have a dictionary now, but how can we use it?
Well, dictionaries have a keys method which we could use to get an iterable of just the keys:
>>> unique_colors = dict.fromkeys(all_colors).keys()
>>> unique_colors
dict_keys(['blue', 'purple', 'green', 'red', 'pink'])
And we could even convert those keys to a list:
>>> unique_colors = list(dict.fromkeys(all_colors).keys())
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']
But dictionaries are also iterables (looping over a dictionary provides the keys), so we could simply pass the dictionary to the built-in list constructor:
>>> unique_colors = list(dict.fromkeys(all_colors))
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']
That might look a bit odd, but it works.
If you prefer to be more explicit by calling the keys method, you're welcome to.
I don't have a strong preference between these two approaches: being explicit is nice but so is brevity.
One last thing to note: if you just need to loop over the unique items right away, there's no need to convert back to a list. This works fine:
>>> for color in dict.fromkeys(all_colors):
...     print(color)
...
blue
purple
green
red
pink
That works because all forms of iteration are the same in Python: whether you're using the list constructor, a for loop, or a list comprehension, it all works the same way.
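Here's a quick sketch of a few other forms of iteration applied to the same dictionary. Each one loops over the keys in first-seen order; the variable names are just for illustration.

```python
all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]
unique = dict.fromkeys(all_colors)

as_tuple = tuple(unique)     # the tuple constructor loops over the keys
unpacked = [*unique]         # so does * unpacking into a list
joined = ", ".join(unique)   # and the string join method
first, *rest = unique        # and tuple unpacking

print(joined)  # blue, purple, green, red, pink
print(first)   # blue
```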
You might be wondering whether a list and a for loop would work well for de-duplicating.
>>> unique_colors = []
>>> for color in all_colors:
...     if color not in unique_colors:
...         unique_colors.append(color)
...
>>> unique_colors
['blue', 'purple', 'green', 'red', 'pink']
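This does work, but if you have many values to de-duplicate this could be very slow, because the in operator on a list has to scan the list item by item, while the in operator on a set is a hash lookup that typically takes about the same time no matter how big the set is.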
Watch List containment checks for more on that.
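If you do prefer an explicit loop, one way to sidestep the slow list containment check is to track the items you've already seen in a set. This is a minimal sketch that assumes every item is hashable; unique_in_order is a hypothetical name, not a built-in or standard library function.

```python
def unique_in_order(iterable):
    """Yield each item the first time it appears (items must be hashable)."""
    seen = set()
    for item in iterable:
        if item not in seen:  # set containment check, not a list scan
            seen.add(item)
            yield item


all_colors = ["blue", "purple", "green", "red", "green", "pink", "blue"]
print(list(unique_in_order(all_colors)))
# ['blue', 'purple', 'green', 'red', 'pink']
```

For most cases list(dict.fromkeys(...)) is shorter, but a generator like this can hand back unique items lazily, one at a time.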
The next time you need to de-duplicate items in your list (or in any iterable), try out Python's set constructor.
>>> unique_items = set(original_items)
If you need to de-duplicate while maintaining the order of your items, use dict.fromkeys instead:
>>> unique_items = list(dict.fromkeys(original_items))
That will de-duplicate your items while keeping them in the order that each item was first seen.
If you'd like practice de-duplicating list items, try out the uniques_only Python Morsels exercise.
The bonuses include some twists that weren't discussed above. 😉