The Java Stream distinct() method returns a stream consisting of the distinct elements of the original stream, where equality is determined by Object.equals(Object). It's useful for removing duplicate elements from a collection before processing them.
Let's see how to use the stream distinct() method to remove duplicate elements from a collection.
jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]
jshell> List<Integer> distinctInts = list.stream().distinct().collect(Collectors.toList());
distinctInts ==> [1, 2, 3, 4]
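Here is the same behavior with strings in a small standalone program. The class and method names are illustrative and not from the original. According to the distinct() Javadoc, on an ordered stream such as one created from a List, the first occurrence of each duplicate is kept and the encounter order is preserved.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctOrderExample {

    // Returns the unique elements of the list, keeping the first occurrence
    // of each one and preserving the list's encounter order.
    static List<String> unique(List<String> words) {
        return words.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "pear", "fig", "apple");
        System.out.println(unique(words)); // prints [pear, apple, fig]
    }
}
```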
Since distinct() is an intermediate operation, we can chain a terminal operation such as forEach() after it to process only the unique elements.
jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]
jshell> list.stream().distinct().forEach(x -> System.out.println("Processing " + x));
Processing 1
Processing 2
Processing 3
Processing 4
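Because distinct() is an intermediate operation, it can also sit between other intermediate operations. In the program below, which uses hypothetical names, map() normalizes case before distinct() so that "Java" and "java" count as the same word. The terminal operation count() is what actually runs the pipeline.

```java
import java.util.List;
import java.util.Locale;

public class DistinctPipelineExample {

    // Counts case-insensitive unique words: map() lowercases each word,
    // distinct() drops repeats, and count() is the terminal operation.
    static long countUniqueIgnoringCase(List<String> words) {
        return words.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "java", "Stream", "JAVA", "stream");
        System.out.println(countUniqueIgnoringCase(words)); // prints 2
    }
}
```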
Now let's use distinct() to remove duplicate elements from a list of custom objects.
package com.journaldev.java;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class JavaStreamDistinct {

    public static void main(String[] args) {
        List<Data> dataList = new ArrayList<>();
        dataList.add(new Data(10));
        dataList.add(new Data(20));
        dataList.add(new Data(10));
        dataList.add(new Data(20));
        System.out.println("Data List = " + dataList);

        List<Data> uniqueDataList = dataList.stream().distinct().collect(Collectors.toList());
        System.out.println("Unique Data List = " + uniqueDataList);
    }
}

class Data {
    private int id;

    Data(int i) {
        this.setId(i);
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return String.format("Data[%d]", this.id);
    }
}
Output:
Data List = [Data[10], Data[20], Data[10], Data[20]]
Unique Data List = [Data[10], Data[20], Data[10], Data[20]]
The distinct() method didn't remove the duplicate elements. That's because the Data class doesn't override equals() (or hashCode()), so it inherits the equals() method of the superclass Object, which is implemented as:
public boolean equals(Object obj) {
    return (this == obj);
}
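To see this reference comparison directly, the program below uses a hypothetical Item class. Like Data, it does not override equals().

```java
public class IdentityEqualsExample {

    // Hypothetical class with the same shape as Data. It does not override
    // equals() or hashCode(), so it inherits Object.equals(), which compares references.
    static class Item {
        private final int id;

        Item(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }
    }

    public static void main(String[] args) {
        Item a = new Item(10);
        Item b = new Item(10);
        System.out.println(a.getId() == b.getId()); // true: same id value
        System.out.println(a.equals(b));            // false: different objects
        System.out.println(a.equals(a));            // true: same reference
    }
}
```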
Although the Data objects had the same ids, they were different object instances, so Object.equals() considered them not equal. That's why you must override equals() if you plan to use the stream distinct() method with custom objects. The JDK implementation of distinct() also tracks the elements it has already seen in a hash-based set, and hash-based collections such as HashSet and HashMap use both hashCode() and equals() to decide whether two objects are equal. So override both methods consistently, for example by adding the following to the Data class:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + id;
    return result;
}

@Override
public boolean equals(Object obj) {
    // Debug output, to show when distinct() calls equals()
    System.out.println("Data equals method");
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Data other = (Data) obj;
    if (id != other.id)
        return false;
    return true;
}
Tip: You can generate the equals() and hashCode() methods with the "Eclipse > Source > Generate equals() and hashCode()" menu option. The output after adding the equals() and hashCode() implementations is:
Data List = [Data[10], Data[20], Data[10], Data[20]]
Data equals method
Data equals method
Unique Data List = [Data[10], Data[20]]
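Here is a self-contained version that puts the fix into one file. The nested Data class mirrors the article's class without the debug print. Its hashCode() uses java.util.Objects.hash() in place of the hand-written prime formula, and its equals() checks the same conditions more compactly. The class name DistinctWithObjectsHelpers is hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class DistinctWithObjectsHelpers {

    // Variant of the article's Data class with equals() and hashCode() overridden.
    static final class Data {
        private final int id;

        Data(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            return id == ((Data) obj).id;
        }

        @Override
        public int hashCode() {
            // Consistent with equals(): equal ids always produce equal hash codes.
            return Objects.hash(id);
        }

        @Override
        public String toString() {
            return String.format("Data[%d]", id);
        }
    }

    // Removes duplicates using the overridden equals()/hashCode().
    static List<Data> unique(List<Data> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Data> dataList = List.of(new Data(10), new Data(20), new Data(10), new Data(20));
        // prints Unique Data List = [Data[10], Data[20]]
        System.out.println("Unique Data List = " + unique(dataList));
    }
}
```

On Java 16 and later, a record such as `record Data(int id) {}` gets equals() and hashCode() generated from its components. That means distinct() would work on it without hand-written methods.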
Reference: Stream distinct() API Doc