How to find duplicates in a collection?

Sometimes we need to find duplicates in a collection. I met this challenge a couple of days ago and my first thought was “Ha! What a trivial task!”. But then I realised that Java offers a wide range of tools for it, and which one leads to a solution we can be proud of depends on one question: how much information about the found duplicates do we need to deliver? We only need to ask ourselves, or look at the acceptance criteria of the task in Jira.

Does a collection contain duplicates?

The easiest way to check whether a collection contains duplicates is to put all its elements into a HashSet and compare the sizes of both collections, before and after the conversion to a Set. A Set keeps only one copy of equal elements, so a smaller size means there is at least one duplicate. But remember, the elements must implement hashCode() and equals() properly to use the full power of this type of Java collection.

import java.util.HashSet;
import java.util.List;

List<String> data = List.of(
        "Lexus",
        "Lexus",
        "BMW",
        "Toyota",
        "Toyota",
        "Porsche",
        "Audi",
        "Ferrari"
);

boolean containsDuplicates = new HashSet<>(data).size() != data.size();

System.out.println(data);
System.out.println("Does the collection contain duplicates? Answer: " + containsDuplicates);

// Output:
// [Lexus, Lexus, BMW, Toyota, Toyota, Porsche, Audi, Ferrari]
// Does the collection contain duplicates? Answer: true
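
To illustrate why hashCode() and equals() matter here, the sketch below compares a class that does not override them with a record, for which the compiler generates both methods from its components. The class names PlainCar and Car and the helper containsDuplicates are hypothetical, and records assume Java 16 or newer.

```java
import java.util.HashSet;
import java.util.List;

public class EqualsHashCodeDemo {

    // Hypothetical class WITHOUT equals()/hashCode(): Object's identity-based versions are used
    static final class PlainCar {
        final String brand;

        PlainCar(String brand) {
            this.brand = brand;
        }
    }

    // Hypothetical record: equals()/hashCode() are generated from the 'brand' component
    record Car(String brand) {}

    // Same size comparison as above, made generic
    static <T> boolean containsDuplicates(List<T> items) {
        return new HashSet<>(items).size() != items.size();
    }

    public static void main(String[] args) {
        List<PlainCar> plain = List.of(new PlainCar("Lexus"), new PlainCar("Lexus"));
        List<Car> records = List.of(new Car("Lexus"), new Car("Lexus"));

        // Two distinct PlainCar objects are never equal, so the HashSet keeps both
        System.out.println("PlainCar duplicates? " + containsDuplicates(plain));     // false
        // Two Car records with the same brand are equal, so the HashSet keeps one
        System.out.println("Car record duplicates? " + containsDuplicates(records)); // true
    }
}
```

Without proper equals() and hashCode(), the HashSet check silently reports "no duplicates" even though both objects describe the same car.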

How to find which items are duplicated?

The second level of searching for duplicates is to use the result of Set.add(), a method inherited from the Collection interface that has one small but handy feature. Have you ever checked what it returns for a Set? I hadn't, so I skipped over it many, many times, but now I think it is very useful. It returns true if the Set did not already contain the specified element, and false otherwise. So we can use the result of this operation to keep only the elements for which add() returned false while building a new set. In other words, we tried to add an element but nothing changed, because this element was already there. It is as easy as it sounds! Let's check the example below, which uses the same data collection as before.

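import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Set.add() returns false when the element is already in the set
Set<String> container = new HashSet<>();
Set<String> duplicates = data.stream()
        .filter(item -> !container.add(item)) // keep items that were seen before
        .collect(Collectors.toSet());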

System.out.println("Found duplicates: " + duplicates);

// Output:
// Found duplicates: [Lexus, Toyota]
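
The stream version relies on a lambda that modifies container, an external HashSet. That works on a sequential stream, but the Stream API documentation advises against stateful behavioural parameters, and with .parallel() the non-thread-safe HashSet could produce wrong results. A plain loop expresses the same Set.add() idea without that caveat. The sketch below is one way to write it; the class and method names are hypothetical, and collecting into a LinkedHashSet is an assumption made only to get a predictable order. The second method shows that the same trick can stop at the first duplicate instead of scanning the whole collection.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SetAddDuplicates {

    // Collects every element whose add() returned false, i.e. it had already been seen
    static <T> Set<T> findDuplicates(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        Set<T> duplicates = new LinkedHashSet<>(); // keeps the order in which duplicates were found
        for (T item : items) {
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    // Returns as soon as add() reports an element that is already present
    static <T> boolean hasDuplicate(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> data = List.of("Lexus", "Lexus", "BMW", "Toyota", "Toyota", "Porsche", "Audi", "Ferrari");
        System.out.println("Found duplicates: " + findDuplicates(data)); // Found duplicates: [Lexus, Toyota]
        System.out.println("Has duplicate? " + hasDuplicate(data));      // Has duplicate? true
    }
}
```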

How to find and count duplicates?

What if we need not only to find the duplicated elements but also to count them? This is the method I use most often. To solve this problem, I combine two collectors that work great together: Collectors.groupingBy() and Collectors.counting().

The grouping operation produces a Map whose keys and values can be adjusted to our needs. Instead of counting the elements and putting the result as the value for each key, we can, for example, store a raw or transformed list of the elements.

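import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Group equal elements together and count how many fall into each group
Map<String, Long> counted = data.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));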

System.out.println("Elements count: " + counted);

// Output:
// Elements count: {Ferrari=1, Lexus=2, Toyota=2, Audi=1, Porsche=1, BMW=1}
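
The map above also contains elements that occur only once, and its order depends on the underlying HashMap. A minimal sketch, assuming we want only the real duplicates in encounter order: pass LinkedHashMap::new as the map factory of groupingBy() and then drop the entries below 2. The same pattern with Collectors.toList() as the downstream collector stores the raw elements instead of a count, as mentioned above. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountDuplicates {

    // Counts every element in encounter order, then keeps only elements seen more than once
    static Map<String, Long> countDuplicates(List<String> items) {
        Map<String, Long> counted = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        // Removing from the values() view also removes the matching keys
        counted.values().removeIf(count -> count < 2);
        return counted;
    }

    // Stores the raw elements of each group instead of their count
    static Map<String, List<String>> groupDuplicates(List<String> items) {
        Map<String, List<String>> grouped = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.toList()));
        grouped.values().removeIf(group -> group.size() < 2);
        return grouped;
    }

    public static void main(String[] args) {
        List<String> data = List.of("Lexus", "Lexus", "BMW", "Toyota", "Toyota", "Porsche", "Audi", "Ferrari");
        System.out.println("Duplicates count: " + countDuplicates(data)); // Duplicates count: {Lexus=2, Toyota=2}
        System.out.println("Grouped: " + groupDuplicates(data));          // Grouped: {Lexus=[Lexus, Lexus], Toyota=[Toyota, Toyota]}
    }
}
```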

How to count the frequency of an item?

Sometimes we do not need to count all duplicated items, only to check the frequency of a single one. Collections.frequency() is a perfect fit for this task, and to be honest, I was pleasantly surprised when I discovered this method of the Collections class.

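import java.util.Collections;
import java.util.List;

// Number of elements equal to "B" in the list
int count = Collections.frequency(List.of("A", "B", "B"), "B");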
System.out.println("Count: " + count);

// Output:
// Count: 2
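
Collections.frequency() compares elements with equals() and walks the whole collection on every call. It can also be used to list all duplicated items, as sketched below (the class and method names are hypothetical), but calling it once per distinct element makes the check quadratic, so for larger collections the single-pass groupingBy() approach from the previous section is likely the better choice.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class FrequencyDuplicates {

    // Keeps distinct elements that occur more than once; each frequency() call scans the whole list
    static List<String> duplicatesByFrequency(List<String> items) {
        return items.stream()
                .distinct()
                .filter(item -> Collections.frequency(items, item) > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = List.of("Lexus", "Lexus", "BMW", "Toyota", "Toyota", "Porsche", "Audi", "Ferrari");
        System.out.println("Lexus count: " + Collections.frequency(data, "Lexus")); // Lexus count: 2
        System.out.println("Duplicates: " + duplicatesByFrequency(data));          // Duplicates: [Lexus, Toyota]
    }
}
```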
