While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.
Pandas DataFrame dropna() function is used to remove rows and columns with Null/NaN values. By default, this function returns a new DataFrame and the source DataFrame remains unchanged. We can create null values using None, pandas.NaT, and numpy.nan variables. The dropna() function syntax is:
dropna(self, axis=0, how="any", thresh=None, subset=None, inplace=False)
Let’s look at some examples of using dropna() function.
This is the default behavior of dropna() function.
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'], 'ID': [1, 2, 3, 4], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': ['CEO', None, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
# drop all rows with any NaN and NaT values
df1 = df.dropna()
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 None
2 David 3 NaN NaT
3 Lisa 4 NaT NaT
Name ID Salary Role
0 Pankaj 1 100 CEO
We can pass axis=1
to drop columns with the missing values.
df1 = df.dropna(axis=1)
print(df1)
Output:
Name ID
0 Pankaj 1
1 Meghna 2
2 David 3
3 Lisa 4
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT], 'ID': [1, 2, 3, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(how='all')
print(df1)
df1 = df.dropna(how='all', axis=1)
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David 3 NaN NaT
3 NaT NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David 3 NaN NaT
Name ID Salary
0 Pankaj 1 100
1 Meghna 2 200
2 David 3 NaN
3 NaT NaT NaT
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT], 'ID': [1, 2, pd.NaT, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(thresh=2)
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David NaT NaN NaT
3 NaT NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
The rows with 2 or more null values are dropped.
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'], 'ID': [1, 2, 3, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': ['CEO', np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(subset=['ID'])
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 NaN
2 David 3 NaN NaT
3 Lisa NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 NaN
2 David 3 NaN NaT
We can specify the index values in the subset when dropping columns from the DataFrame.
df1 = df.dropna(subset=[1, 2], axis=1)
print(df1)
Output:
Name ID
0 Pankaj 1
1 Meghna 2
2 David 3
3 Lisa NaT
The ‘ID’ column is not dropped because the missing value is looked only in index 1 and 2.
We can pass inplace=True
to change the source DataFrame itself. It’s useful when the DataFrame size is huge and we want to save some memory.
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna'], 'ID': [1, 2], 'Salary': [100, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df.dropna(inplace=True)
print(df)
Output:
Name ID Salary
0 Pankaj 1 100.0
1 Meghna 2 NaN
Name ID Salary
0 Pankaj 1 100.0
Join our DigitalOcean community of over a million developers for free! Get help and share knowledge in our Questions & Answers section, find tutorials and tools that will help you grow as a developer and scale your project or business, and subscribe to topics of interest.
Sign up
Thank u bro, well explained in very simple way
- KHAJA MOINUDDIN KHAN
thats very comprehensive. out of all drop explanation … this is the best thank you
- johny