By Pankaj Kumar
The pandas drop_duplicates() function removes duplicate rows from a DataFrame and, by default, returns the result as a new DataFrame. Its syntax is:
drop_duplicates(self, subset=None, keep="first", inplace=False)
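Let’s look into some examples of dropping duplicate rows from a DataFrame object. The subset parameter limits which columns are compared, keep controls which duplicate survives ('first', 'last', or False to drop all of them), and inplace modifies the DataFrame directly.
By default, when no arguments are passed, all columns are compared and the first occurrence of each duplicate row is kept.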
import pandas as pd
d1 = {'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]}
source_df = pd.DataFrame(d1)
print('Source DataFrame:\n', source_df)
# keep first duplicate row
result_df = source_df.drop_duplicates()
print('Result DataFrame:\n', result_df)
Output:
Source DataFrame:
    A  B  C
0  1  2  3
1  1  2  3
2  1  2  4
3  2  3  5
Result DataFrame:
    A  B  C
0  1  2  3
2  1  2  4
3  2  3  5
The source DataFrame rows 0 and 1 are duplicates. The first occurrence is kept and the rest of the duplicates are deleted.
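To check which rows pandas treats as duplicates before dropping anything, the DataFrame.duplicated() method returns a boolean mask that follows the same keep rules. The following is a minimal sketch that reuses the sample data from this article; the names mask and kept are only illustrative.

```python
import pandas as pd

source_df = pd.DataFrame({'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]})

# True marks a row that repeats an earlier row (keep='first' is the default)
mask = source_df.duplicated()
print(mask.tolist())

# Filtering with the inverted mask selects the same rows as drop_duplicates()
kept = source_df[~mask]
print(kept.equals(source_df.drop_duplicates()))
```

Only row 1 is flagged here, because it is the only row that matches an earlier row in every column.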
# keep last duplicate row
result_df = source_df.drop_duplicates(keep='last')
print('Result DataFrame:\n', result_df)
Output:
Result DataFrame:
    A  B  C
1  1  2  3
2  1  2  4
3  2  3  5
The row at index 0 is dropped and the last duplicate, the row at index 1, is kept in the output.
# drop all duplicate rows
result_df = source_df.drop_duplicates(keep=False)
print('Result DataFrame:\n', result_df)
Output:
Result DataFrame:
    A  B  C
2  1  2  4
3  2  3  5
Both duplicate rows, at index 0 and 1, are dropped from the result DataFrame.
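The three keep options can be compared side by side by looking only at which index labels survive. This is a small sketch on the same sample data; the surviving dictionary is a hypothetical helper used just for the comparison.

```python
import pandas as pd

source_df = pd.DataFrame({'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]})

# Collect the index labels that remain for each keep option
surviving = {}
for keep in ('first', 'last', False):
    surviving[keep] = source_df.drop_duplicates(keep=keep).index.tolist()
    print(repr(keep), '->', surviving[keep])
```

Because the original index labels are preserved, the result shows directly which of the duplicate rows 0 and 1 was kept in each case.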
import pandas as pd
d1 = {'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]}
source_df = pd.DataFrame(d1)
print('Source DataFrame:\n', source_df)
# use only columns A and B to identify duplicate rows
result_df = source_df.drop_duplicates(subset=['A', 'B'])
print('Result DataFrame:\n', result_df)
Output:
Source DataFrame:
    A  B  C
0  1  2  3
1  1  2  3
2  1  2  4
3  2  3  5
Result DataFrame:
    A  B  C
0  1  2  3
3  2  3  5
Only the columns ‘A’ and ‘B’ are compared when identifying duplicate rows, so rows 0, 1, and 2 count as duplicates even though row 2 differs in column ‘C’. The first occurrence (row 0) is kept, and rows 1 and 2 are removed from the output.
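The subset and keep parameters can be combined, and subset also accepts a single column label instead of a list. The sketch below uses the same sample data; by_last and by_c are illustrative names only.

```python
import pandas as pd

source_df = pd.DataFrame({'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]})

# Compare only A and B, but keep the last row of each duplicate group
by_last = source_df.drop_duplicates(subset=['A', 'B'], keep='last')
print(by_last.index.tolist())

# A single column label works as well: compare only column C
by_c = source_df.drop_duplicates(subset='C')
print(by_c.index.tolist())
```

With keep='last', row 2 survives from the group of rows 0, 1, and 2, so the value 4 in column ‘C’ is retained instead of 3.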
# modify source_df itself instead of creating a new DataFrame
source_df.drop_duplicates(inplace=True)
print(source_df)
Output:
   A  B  C
0  1  2  3
2  1  2  4
3  2  3  5
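Note that the surviving rows keep their original index labels (0, 2, 3), whether or not inplace is used. If a fresh 0..n-1 index is preferred, one option is the ignore_index parameter, which is not part of the syntax shown at the top of this article; the sketch below assumes pandas 1.0 or later, where that parameter is available.

```python
import pandas as pd

d1 = {'A': [1, 1, 1, 2], 'B': [2, 2, 2, 3], 'C': [3, 3, 4, 5]}

# inplace=True changes the DataFrame itself; the original labels are kept
source_df = pd.DataFrame(d1)
source_df.drop_duplicates(inplace=True)
print(source_df.index.tolist())

# ignore_index=True (assumes pandas >= 1.0) relabels the remaining rows 0..n-1
renumbered = pd.DataFrame(d1).drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())
```

On older pandas versions, calling reset_index(drop=True) on the result should have the same effect.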