Let’s understand how to update rows and columns using Python pandas. In the real world, we rarely get ready-to-analyze datasets. There can be many inconsistencies, invalid values, improper labels, and much more. That being said, it is necessary to update these values to achieve uniformity in the data. In this tutorial, we will focus on how to update rows and columns in Python using pandas. Without spending much time on the intro, let’s dive in!
Throughout this tutorial, we will use a dataframe that we are going to create now. This will give you an idea of how updating operations work on the data. After this, you can apply these methods to your own data.
To create a dataframe, pandas offers a function named pd.DataFrame, which helps you create a dataframe out of some data. Let’s see how it works.
import pandas as pd

#create a dictionary: each key is a column name, each list holds that column's values
fruit_data = {"Fruit": ['Apple', 'Avocado', 'Banana', 'Strawberry', 'Grape'],
              "Color": ['Red', 'Green', 'Yellow', 'Pink', 'Green'],
              "Price": [45, 90, 60, 37, 49]
              }
fruit_data
Here, we have created a Python dictionary with some data values in it. Next, we will turn this dictionary into a pandas dataframe.
#Dataframe
data = pd.DataFrame(fruit_data)
data
That’s it! Using the pd.DataFrame function, you can easily turn a dictionary into a pandas dataframe. Our dataset is now ready for the operations that follow.
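As a quick sanity check, the sketch below rebuilds the same data and inspects what pd.DataFrame produced: each dictionary key becomes a column label, and the rows get a default integer index starting at 0. The string columns show up as object (or as the newer string dtype, depending on your pandas version), while Price is inferred as an integer column.

```python
import pandas as pd

# Same fruit data as above: each key becomes a column label
fruit_data = {
    "Fruit": ["Apple", "Avocado", "Banana", "Strawberry", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Pink", "Green"],
    "Price": [45, 90, 60, 37, 49],
}
data = pd.DataFrame(fruit_data)

# Column labels come from the dictionary keys, in insertion order
print(list(data.columns))  # ['Fruit', 'Color', 'Price']
# Rows receive a default integer index 0..4
print(list(data.index))    # [0, 1, 2, 3, 4]
# Price is inferred as an integer column
print(data.dtypes)
```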
Sometimes, the column (feature) names are inconsistent, for example in their letter casing. A uniform naming scheme helps us work effectively with the features.
So, as a first step, we will see how we can update/change the column or feature names in our data.
#update the column name
data.rename(columns = {'Fruit':'Fruit Name'})
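That’s it. As simple as shown above. Note that rename() returns a new dataframe and leaves data unchanged; to keep the new names, assign the result back (data = data.rename(...)) or pass inplace=True. You can also rename multiple columns at once: just add more old-name/new-name pairs, separated by commas, inside the curly braces.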
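To see that rename() leaves the original dataframe untouched until you assign the result, here is a small sketch using the same fruit data.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Strawberry", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Pink", "Green"],
    "Price": [45, 90, 60, 37, 49],
})

# rename() builds and returns a new DataFrame; `data` keeps its old labels
renamed = data.rename(columns={"Fruit": "Fruit Name"})
print(list(renamed.columns))  # ['Fruit Name', 'Color', 'Price']
print(list(data.columns))     # ['Fruit', 'Color', 'Price']

# To keep the change, assign the result back (or pass inplace=True)
data = data.rename(columns={"Fruit": "Fruit Name"})
print(list(data.columns))     # ['Fruit Name', 'Color', 'Price']
```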
#multiple column update
data.rename(columns = {'Fruit':'Fruit Name','Color':'Colour','Price':'Cost'})
Just like this, you can update all your columns at the same time.
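One detail worth knowing: by default, rename() silently skips any key that does not match an existing column, so a misspelled key (for example 'Colour' when the column is actually 'Color') changes nothing and raises no error. The sketch below shows this and uses the errors="raise" parameter of DataFrame.rename to turn the silent skip into a KeyError.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Color": ["Red", "Yellow"],
    "Price": [45, 60],
})

mapping = {"Fruit": "Fruit Name", "Color": "Colour", "Price": "Cost"}
# All three keys exist, so every column is renamed in one call
multi = data.rename(columns=mapping)
print(list(multi.columns))  # ['Fruit Name', 'Colour', 'Cost']

# A key that matches no column ('Colour' here) is skipped by default
unchanged = data.rename(columns={"Colour": "Color"})
print(list(unchanged.columns))  # ['Fruit', 'Color', 'Price']

# errors="raise" reports labels that were not found instead of ignoring them
raised = False
try:
    data.rename(columns={"Colour": "Color"}, errors="raise")
except KeyError as exc:
    raised = True
    print("Not found:", exc)
```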
You may encounter inconsistent casing in column names, especially when working with datasets that have many columns.
In our data, you can see that all the column names have their first letter capitalized. It is advisable to use a common casing for all your column names.
You can convert them to either upper case or lower case.
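#lower case: returns the lower-cased labels but does not modify data
data.columns.str.lower()
This returns all the column names in lower case. To apply the change to the dataframe itself, assign the result back with data.columns = data.columns.str.lower(). The examples below keep the original capitalized names, such as Price.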
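Here is a minimal sketch of actually applying the new casing, done on a copy so the original labels remain available. It also shows the upper-case variant, passing the built-in str.upper to rename() as the mapping function.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Color": ["Red", "Yellow"],
    "Price": [45, 60],
})

# Work on a copy so the original labels stay available for later steps
lower = data.copy()
# str.lower() returns a new Index; assigning it to .columns applies it
lower.columns = lower.columns.str.lower()
print(list(lower.columns))  # ['fruit', 'color', 'price']

# The same idea with upper case, via rename() and a function
upper = data.rename(columns=str.upper)
print(list(upper.columns))  # ['FRUIT', 'COLOR', 'PRICE']

# The original frame is unchanged
print(list(data.columns))   # ['Fruit', 'Color', 'Price']
```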
Updating row values is just as simple as updating columns. First locate the row, then assign new values to it.
You can use the pandas loc indexer to locate rows by their index label.
#updating rows
data.loc[3]
Fruit Strawberry
Color Pink
Price 37
Name: 3, dtype: object
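Because loc looks rows up by index label rather than by position, data.loc[3] means "the row labelled 3". With the default index that label happens to be the fourth row, but the sketch below shows the difference once a row has been dropped.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Strawberry", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Pink", "Green"],
    "Price": [45, 90, 60, 37, 49],
})

# With the default index, label 3 is the fourth row
print(data.loc[3, "Fruit"])  # Strawberry

# After dropping the row labelled 0, label 3 still refers to Strawberry,
# even though that row now sits in the third position
trimmed = data.drop(index=0)
print(trimmed.loc[3, "Fruit"])     # Strawberry
print(trimmed.index.get_loc(3))    # 2
```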
We have located the row with index label 3 (the fourth row, since the index starts at 0), which holds the details of the fruit Strawberry. Now, we will replace this row with a new fruit named Pineapple and its details.
Let’s roll!
#update: values are assigned in column order (Fruit, Color, Price)
data.loc[3] = ['Pineapple', 'Yellow', 48]
data
That’s it. I hope you also find it easy to update row values. Note that the price is given as the number 48 rather than the string '48', so the Price column stays numeric. Now, let’s assume that you need to update only a few details in a row and not the entire row. What’s your approach?
#update specific values
data.loc[3, ['Price']]
Price 48
Name: 3, dtype: int64
We want to update only the price of the fruit at index label 3. Its current price is 48, but we have to update it to 65. Let’s do that.
#updating
data.loc[3, ['Price']] = [65]
data
Awesome :P
We have updated the price of Pineapple to 65 with just one line of Python code. That’s how it works. Simple.
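When you update exactly one cell, you can also pass a scalar column label instead of a list, or use DataFrame.at, the pandas accessor for a single value by row and column label. The sketch below assumes the data as it stands after the Pineapple row was added; the color value "Golden" is purely illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Pineapple", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Yellow", "Green"],
    "Price": [45, 90, 60, 48, 49],
})

# A scalar column label selects exactly one cell
data.loc[3, "Price"] = 65
print(data.loc[3, "Price"])  # 65

# .at is the single-value accessor, also keyed by row and column label
data.at[3, "Price"] = 70
print(data.at[3, "Price"])  # 70

# Several columns of one row can be updated together with a list of labels
data.loc[3, ["Color", "Price"]] = ["Golden", 75]
print(data.loc[3, "Color"], data.loc[3, "Price"])  # Golden 75
```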
Next, we are going to update row values based on a condition. After all, we want meaningful values that are helpful for our analysis.
Let’s define our condition.
#Condition
updated = data['Price'] > 60
updated
Here, we want to mark the price of every fruit that costs more than 60 as Expensive. The condition returns a boolean Series with one entry per row:
0 False
1 True
2 False
3 True
4 False
Name: Price, dtype: bool
Based on the output, 2 fruits cost more than 60. Let’s label those fruits as Expensive in the data.
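Before changing anything, you can use the same boolean mask to check which rows it selects. This sketch assumes the prices as they stand after the Pineapple update (45, 90, 60, 65, 49).

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Pineapple", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Yellow", "Green"],
    "Price": [45, 90, 60, 65, 49],
})

updated = data["Price"] > 60  # boolean Series, True where Price > 60

# The mask lists the rows it will affect before we change them
selected = data.loc[updated, "Fruit"].tolist()
print(selected)  # ['Avocado', 'Pineapple']

# Summing a boolean Series counts the True values
count = int(updated.sum())
print(count)  # 2
```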
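#Updating
#Price holds numbers; convert it to object so it can also hold text labels
data['Price'] = data['Price'].astype(object)
data.loc[updated, 'Price'] = 'Expensive'
data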
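Writing 'Expensive' into Price mixes text with numbers, so that column can no longer be used for arithmetic. If you want to keep the numeric prices, one alternative is to store the label in a separate column with the same loc-and-mask technique. The column name Category and the default label Affordable below are hypothetical choices for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "Fruit": ["Apple", "Avocado", "Banana", "Pineapple", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Yellow", "Green"],
    "Price": [45, 90, 60, 65, 49],
})

updated = data["Price"] > 60

# Hypothetical 'Category' column: start every row with a default label...
data["Category"] = "Affordable"
# ...then overwrite it only where the condition holds
data.loc[updated, "Category"] = "Expensive"

print(data["Category"].tolist())
# ['Affordable', 'Expensive', 'Affordable', 'Expensive', 'Affordable']

# Price stays numeric, so it can still be aggregated
print(data["Price"].mean())  # 61.8
```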
Trust me, you are awesome :).
You did it perfectly. Each step in this tutorial took only a line or two of code. The best suggestion I can give is to learn as much pandas as you can. It is a robust library that offers many one-liners that get the job done.
Updating rows and columns is one of the primary things we should focus on before any analysis. With simple functions and code, we can make the data much more meaningful, and in the process we also gain insight into data quality and any further requirements. If we get our data right, trust me, you can uncover many precious untold stories.
I hope you find this tutorial useful in one way or another, and don’t forget to apply these practices in your analysis work.
That’s all for now. Happy Python!!!
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
While we believe that this content benefits our community, we have not yet thoroughly reviewed it. If you have any suggestions for improvements, please let us know by clicking the “report an issue“ button at the bottom of the tutorial.