Java Stream distinct() Function to Remove Duplicates

Published on August 3, 2022
author

Pankaj

The Java Stream distinct() method returns a new stream consisting of the distinct elements of the original stream. It’s useful for removing duplicate elements from a collection before processing them.

Java Stream distinct() Method

  • The elements are compared using the equals() method, so the stream elements need a proper implementation of equals().
  • If the stream is ordered, the encounter order is preserved. It means that the element occurring first will be present in the distinct elements stream.
  • If the stream is unordered, no stability guarantee is made: the resulting elements can be in any order, and which of several equal elements is kept is not specified.
  • Stream distinct() is a stateful intermediate operation.
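  • Using distinct() on an ordered parallel stream can perform poorly, because preserving the encounter order requires significant buffering. If the order doesn’t matter, removing the ordering constraint with unordered() can help; otherwise, go with sequential stream processing.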
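The ordering points above can be illustrated with a small sketch. The class and method names here are made up for illustration; the stream calls themselves (distinct(), unordered(), parallelStream()) are standard API. The second method collects into a Set because no order is promised once the ordering constraint is dropped.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctOrderingSketch {

    // Ordered source: distinct() keeps the first occurrence of each element,
    // in encounter order.
    static List<String> firstOccurrences(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    // Parallel pipeline with the ordering constraint removed via unordered().
    // The same distinct elements come out, but in no promised order,
    // so they are collected into a Set.
    static Set<String> distinctUnorderedParallel(List<String> input) {
        return input.parallelStream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> words = List.of("b", "a", "b", "c", "a");
        System.out.println(firstOccurrences(words));          // [b, a, c]
        System.out.println(distinctUnorderedParallel(words)); // same elements, order not guaranteed
    }
}
```

Whether the unordered parallel version is actually faster depends on the data size and the machine, so it is worth measuring before relying on it.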

Remove Duplicate Elements using distinct()

Let’s see how to use the Stream distinct() method to remove duplicate elements from a collection.

jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]

jshell> List<Integer> distinctInts = list.stream().distinct().collect(Collectors.toList());
distinctInts ==> [1, 2, 3, 4]
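Outside jshell, the same call needs explicit imports. The sketch below wraps it in a small class (the class and method names are hypothetical) and, for comparison only, shows a LinkedHashSet-based approach that also drops duplicates while keeping insertion order. That second method is an alternative technique, not something distinct() is documented to do internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

class DistinctBasicsSketch {

    // Same pipeline as the jshell example: stream, de-duplicate, collect.
    static List<Integer> viaDistinct(List<Integer> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    // Comparison only: a LinkedHashSet rejects duplicates and remembers
    // the order in which elements were first inserted.
    static List<Integer> viaLinkedHashSet(List<Integer> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
        System.out.println(viaDistinct(list));      // [1, 2, 3, 4]
        System.out.println(viaLinkedHashSet(list)); // [1, 2, 3, 4]
    }
}
```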

Processing only Unique Elements using Stream distinct() and forEach()

Since distinct() is an intermediate operation, it needs a terminal operation after it. We can chain the forEach() method to process only the unique elements.

jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]

jshell> list.stream().distinct().forEach(x -> System.out.println("Processing " + x));
Processing 1
Processing 2
Processing 3
Processing 4
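Because distinct() is an intermediate operation, it can also sit in a longer pipeline alongside other intermediate operations. The following is a minimal sketch with a hypothetical helper method: it removes duplicates, keeps the even values, and squares them, with collect() as the terminal operation.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctPipelineSketch {

    // distinct() is chained with filter() and map(); nothing is evaluated
    // until the terminal collect() call runs.
    static List<Integer> squaresOfUniqueEvens(List<Integer> input) {
        return input.stream()
                .distinct()               // drop repeated values first
                .filter(x -> x % 2 == 0)  // keep even numbers
                .map(x -> x * x)          // square each remaining value
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
        System.out.println(squaresOfUniqueEvens(list)); // [4, 16]
    }
}
```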

Stream distinct() with custom objects

Let’s look at a simple example of using distinct() to remove duplicate elements from a list of custom objects.

package com.journaldev.java;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class JavaStreamDistinct {

	public static void main(String[] args) {
		List<Data> dataList = new ArrayList<>();
		dataList.add(new Data(10));
		dataList.add(new Data(20));
		dataList.add(new Data(10));
		dataList.add(new Data(20));

		System.out.println("Data List = "+dataList);

		List<Data> uniqueDataList = dataList.stream().distinct().collect(Collectors.toList());

		System.out.println("Unique Data List = "+uniqueDataList);
	}

}

class Data {
	private int id;

	Data(int i) {
		this.setId(i);
	}

	public int getId() {
		return id;
	}

	public void setId(int id) {
		this.id = id;
	}

	@Override
	public String toString() {
		return String.format("Data[%d]", this.id);
	}
}

Output:

Data List = [Data[10], Data[20], Data[10], Data[20]]
Unique Data List = [Data[10], Data[20], Data[10], Data[20]]

The distinct() method didn’t remove the duplicate elements. That’s because the Data class doesn’t override equals(), so the equals() method inherited from the Object superclass was used to identify equal elements. The Object class equals() implementation is:

public boolean equals(Object obj) {
    return (this == obj);
}
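The effect of this reference comparison can be seen in a small sketch. Point is a hypothetical class that, like Data above, does not override equals(); distinctCount is a helper introduced only for this illustration.

```java
import java.util.stream.Stream;

class IdentityEqualsSketch {

    // Hypothetical class that does not override equals() or hashCode().
    static class Point {
        final int x;
        Point(int x) { this.x = x; }
    }

    // Counts the elements that distinct() lets through.
    static long distinctCount(Point... points) {
        return Stream.of(points).distinct().count();
    }

    public static void main(String[] args) {
        Point a = new Point(10);
        Point b = new Point(10);

        System.out.println(a.equals(b));            // false: same x, different objects
        System.out.println(a.equals(a));            // true: same reference
        System.out.println(distinctCount(a, b, a)); // 2: only the repeated reference is dropped
    }
}
```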

The Data objects had the same ids, but they were different object instances, so they were considered not equal. That’s why it’s very important to implement the equals() method if you are planning to use the stream distinct() method with custom objects. Note that hash-based collection classes use both the hashCode() and equals() methods to check whether two objects are equal, and equal objects must return the same hash code. So it’s better to provide an implementation for both of them.

@Override
public int hashCode() {
	final int prime = 31;
	int result = 1;
	result = prime * result + id;
	return result;
}

@Override
public boolean equals(Object obj) {
	System.out.println("Data equals method");
	if (this == obj)
		return true;
	if (obj == null)
		return false;
	if (getClass() != obj.getClass())
		return false;
	Data other = (Data) obj;
	if (id != other.id)
		return false;
	return true;
}

Tip: You can easily generate the equals() and hashCode() methods using the “Eclipse > Source > Generate equals() and hashCode()” menu option.

The output after adding the equals() and hashCode() implementations is:

Data List = [Data[10], Data[20], Data[10], Data[20]]
Data equals method
Data equals method
Unique Data List = [Data[10], Data[20]]
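For reference, here is a compact, self-contained variant of the same idea. It is a sketch, not the tutorial’s Data class: Item is a hypothetical immutable class, and it uses java.util.Objects.hash() and an instanceof check in place of the generated code shown earlier. The debugging println in equals() is left out.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctWithEqualsSketch {

    // Hypothetical value class, modelled loosely on Data.
    static final class Item {
        private final int id;

        Item(int id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;             // same reference
            if (!(obj instanceof Item)) return false; // null or another type
            return id == ((Item) obj).id;             // compare by id
        }

        @Override
        public int hashCode() {
            return Objects.hash(id); // equal ids give equal hash codes
        }

        @Override
        public String toString() { return "Item[" + id + "]"; }
    }

    static List<Item> unique(List<Item> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(10), new Item(20), new Item(10), new Item(20));
        System.out.println(unique(items)); // [Item[10], Item[20]]
    }
}
```

Making the id field final is a design choice in this sketch: if a field used by hashCode() changes while the object is stored in a hash-based collection, lookups can fail.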

Reference: Stream distinct() API Doc

JournalDev
DigitalOcean Employee
November 18, 2019

Do you know how the line ‘dataList.stream().distinct().collect(Collectors.toList());’ actually works? I need to check whether data from two arrays match, where one array contains 80K entries, and the two nested loops are taking too much time. So I need to know of any mechanism for looping through two ArrayLists that reduces the time. Thank you.

- Taslima
