How to find duplicates in a collection?

Sometimes we need to find duplicates in a collection. I ran into this task a couple of days ago and my first thought was “Ha! What a trivial task!”. But then I realised that Java offers a wide range of tools for it, and that the right one depends on how much information about the duplicates we need to deliver. To decide, we only need to ask ourselves, or look at the acceptance criteria of the task in Jira.

Does a collection contain duplicates?

The easiest way to check whether a collection contains duplicates is to put everything into a HashSet and compare the sizes of the two collections, before and after the conversion to a Set. But remember, the element type needs properly implemented hashCode() and equals() methods, otherwise a HashSet cannot recognise equal elements.

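// Imports used by the examples in this post
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

List<String> data = List.of(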
        "Lexus",
        "Lexus",
        "BMW",
        "Toyota",
        "Toyota",
        "Porsche",
        "Audi",
        "Ferrari"
);

boolean containsDuplicates = new HashSet<>(data).size() != data.size();

System.out.println(data);
System.out.println("Does the collection contain duplicates? Answer: " + containsDuplicates);

// Output:
// [Lexus, Lexus, BMW, Toyota, Toyota, Porsche, Audi, Ferrari]
// Does the collection contain duplicates? Answer: true
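
The remark about hashCode() and equals() is easier to see with a small sketch. The Car record and the RawCar class below are hypothetical types invented for this illustration, and the sketch assumes Java 16 or newer for records. A record gets both methods generated from its components, while a plain class without overrides inherits the identity-based versions from Object, so two objects holding the same brand are not treated as duplicates.

```java
import java.util.HashSet;
import java.util.List;

class EqualsHashCodeDemo {

    // Hypothetical value type: a record gets equals() and hashCode() generated from its components.
    record Car(String brand) {}

    // Hypothetical class WITHOUT overrides: it inherits identity-based equals() and hashCode() from Object.
    static final class RawCar {
        final String brand;

        RawCar(String brand) {
            this.brand = brand;
        }
    }

    // The same size comparison as in the post, written for any element type.
    static <T> boolean containsDuplicates(List<T> items) {
        return new HashSet<>(items).size() != items.size();
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(new Car("Lexus"), new Car("Lexus"), new Car("BMW"));
        List<RawCar> rawCars = List.of(new RawCar("Lexus"), new RawCar("Lexus"), new RawCar("BMW"));

        System.out.println(containsDuplicates(cars));    // true: equal records collapse into one set entry
        System.out.println(containsDuplicates(rawCars)); // false: every RawCar instance is distinct
    }
}
```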

How to find which items are duplicated?

The next level is to use the result of Set.add(), which is inherited from the Collection interface and has one small but handy feature. Have you ever checked what it returns for a Set? I hadn’t, and I overlooked it many times, but now I find it very useful. It returns true if the Set did not already contain the specified element, and false otherwise. So we can use this result to keep only the elements for which add() returned false while the new set was being built. In other words, we tried to add an element but nothing changed, because the element was already there. It is as easy as it sounds! Let’s check the example below, which uses the same data as before.

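// container remembers every element seen so far
Set<String> container = new HashSet<>();
// add() returns false for an element that is already in container, so only repeats pass the filter.
// The filter relies on this side effect, so the stream has to stay sequential.
Set<String> duplicates = data.stream()
        .filter(item -> !container.add(item))
        .collect(Collectors.toSet());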

System.out.println("Found duplicates: " + duplicates);

// Output:
// Found duplicates: [Lexus, Toyota]
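
For comparison, here is a minimal sketch of the same idea written as a plain loop, which avoids a lambda with a side effect. The class and method names (DuplicateFinder, findDuplicates) are made up for this sketch, and using a LinkedHashSet for the result is my own choice, so that the duplicates come out in the order in which they were first repeated.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {

    // Returns every element that occurs more than once, in the order of its first repeated occurrence.
    static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new LinkedHashSet<>();
        for (T item : items) {
            // add() returns false when the element was already in the set
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> data = List.of(
                "Lexus", "Lexus", "BMW", "Toyota", "Toyota", "Porsche", "Audi", "Ferrari");

        System.out.println("Found duplicates: " + findDuplicates(data));
        // Found duplicates: [Lexus, Toyota]
    }
}
```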

How to find and count duplicates?

What if we have to not only find the duplicated elements but also count them? This is the approach I use most often. To solve this problem I am going to combine two collectors that work well together: Collectors.groupingBy() and Collectors.counting().

The grouping operation produces a Map whose key and value types can be adjusted to our needs. Instead of counting the elements and storing the count as the value for each key, we could store, for example, a list of the raw or transformed elements.

Map<String, Long> counted = data.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

System.out.println("Elements count: " + counted);

// Output (the order of the keys is not guaranteed):
// Elements count: {Ferrari=1, Lexus=2, Toyota=2, Audi=1, Porsche=1, BMW=1}
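
The map above contains every element, including those that occur only once. Below is a minimal sketch of two variations, with hypothetical names (GroupingDemo, duplicateCounts, groupByLength): the first keeps only the entries with a count greater than one, the second swaps Collectors.counting() for Collectors.toList() to store the grouped elements themselves. Grouping by string length and collecting into a TreeMap are assumptions I made only to get a small example with a predictable key order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingDemo {

    // Counts all elements, then keeps only those that occur more than once.
    static Map<String, Long> duplicateCounts(List<String> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                // TreeMap keeps the keys sorted, so the printed order is stable
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, TreeMap::new));
    }

    // Same grouping operation, but the value is the list of grouped elements instead of a count.
    static Map<Integer, List<String>> groupByLength(List<String> items) {
        return items.stream()
                .collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> data = List.of(
                "Lexus", "Lexus", "BMW", "Toyota", "Toyota", "Porsche", "Audi", "Ferrari");

        System.out.println("Duplicates count: " + duplicateCounts(data));
        // Duplicates count: {Lexus=2, Toyota=2}

        System.out.println("Grouped by length: " + groupByLength(data));
        // Grouped by length: {3=[BMW], 4=[Audi], 5=[Lexus, Lexus], 6=[Toyota, Toyota], 7=[Porsche, Ferrari]}
    }
}
```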

How to count the frequency of an item?

Sometimes we do not need to count all duplicated items; we only need the frequency of a single item. Collections.frequency() fits this case perfectly, and to be honest I was pleasantly surprised when I discovered this method of the Collections class.

int count = Collections.frequency(List.of("A", "B", "B"), "B");
System.out.println("Count: " + count);

// Output:
// Count: 2

If you are interested in Java topics, you can find more of my articles here.

Other useful links:

All examples from this post are available on my GitHub here

For any programming problem there are as many solutions as ideas. Which one should we implement? It depends on our needs, but we should certainly take complexity and clarity into consideration. I hope you found something useful here.

The same programmers as us work everywhere.
Oskar K. Bogacz
