By Prajwal CN
Missing data, or missing values, occur when no value is recorded for a variable in a data record. Left untreated, they can cause serious issues in the data modeling process, and many algorithms cannot handle missing data directly.
There are many ways to handle missing data in R. You can drop the affected records, but keep in mind that you are discarding information when you do so and may lose a potential edge in modeling. Alternatively, you can impute the missing data with the mean or median of the observed values. In this article, we will look at filling missing values in R using the tidyr package.
Tidyr is an R package that offers many functions to help you tidy your data. The greater the data quality, the better the model!
To install the tidyr package, run the code below in R.
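As a rough illustration of the dropping and imputing options mentioned above, here is a minimal sketch in base R. The `scores` vector is a hypothetical example, not data from this article.

```r
# Hypothetical scores with two missing values (illustration only)
scores <- c(60, NA, 90, NA, 96)

# Option 1: drop the missing records (information is lost)
dropped <- scores[!is.na(scores)]

# Option 2: replace each NA with the mean of the observed values
mean_imputed <- ifelse(is.na(scores), mean(scores, na.rm = TRUE), scores)

# Option 3: replace each NA with the median of the observed values
median_imputed <- ifelse(is.na(scores), median(scores, na.rm = TRUE), scores)

print(dropped)
print(mean_imputed)
print(median_imputed)
```

Note that dropping shortens the vector, while both imputations keep its length and only change the missing positions.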
#Install tidyr package
install.packages('tidyr')
#Load the library
library(tidyr)
package ‘tidyr’ successfully unpacked and MD5 sums checked
You will get a confirmation message like the one shown above after tidyr has been installed successfully.
First, let's create a simple sample data frame that has missing values. We will use it to demonstrate the fill function of tidyr.
#Create a dataframe
a <- c('A','B','C','D','E','F','G','H','I','J')
b <- c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph')
c <- c(86,NA,NA,NA,88,NA,NA,86,NA,NA)
df <- data.frame(a,b,c)
df

   a       b  c
1  A   Roger 86
2  B   Carlo NA
3  C    Durn NA
4  D   Jessy NA
5  E Mounica 88
6  F    Rack NA
7  G    Rony NA
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA
Well, we have our data frame, but with a lot of missing values. In cases like this, where a column has many missing values, you can use the fill function to replace each missing value with a neighboring value from the same column.
You can fill in the data using one of two approaches: up, where each missing value takes the next non-missing value below it, or down, where each missing value takes the last non-missing value above it.
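Before filling anything, it can help to check how many values are missing and where. The following is a minimal sketch in base R; it re-creates the same data frame so that it runs on its own, and the names `na_counts` and `na_rows` are only illustrative.

```r
# Re-create the sample data frame from this article
df <- data.frame(
  a = c('A','B','C','D','E','F','G','H','I','J'),
  b = c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph'),
  c = c(86, NA, NA, NA, 88, NA, NA, 86, NA, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df))

# Row positions of the missing values in column c
na_rows <- which(is.na(df$c))

print(na_counts)
print(na_rows)
```

Here column c has 7 missing values out of 10, which is why a neighbor-based fill is worth considering.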
Didn’t get it?
Don’t worry. We will be going through some examples to illustrate the same and you will get to know how things work.
In this example, we have a data frame with 3 columns and 10 data records. Before using the fill function to handle the missing data, make sure the following assumption holds for your data:
Sometimes when data is collected, people enter a single value to represent a run of records, because the value was the same for all of them.
Example: when collecting ages, if there were 10 people aged 25, the value 25 may be written only against the last person to indicate that all 10 people are 25.
Please note that this is not the most common situation you will face. The point is that when your data was recorded this way, you can use the fill function to deal with it.
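The age scenario described above can be sketched as follows. The `survey` data frame and its values are hypothetical, made up only to show the idea; `fill` and its `.direction` argument come from tidyr.

```r
library(tidyr)

# Hypothetical survey: the age is written only against the last person
# of each run of people who share that age
survey <- data.frame(
  person = c('P1', 'P2', 'P3', 'P4', 'P5'),
  age    = c(NA, NA, 25, NA, 30)
)

# Fill upward: each NA takes the next recorded age below it
filled <- fill(survey, age, .direction = 'up')
print(filled)
```

If the value had instead been written against the first person of each run, the 'down' direction would be the matching choice.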
#Create a new dataframe by filling missing values (Up)
df1 <- df %>% fill(c, .direction = 'up')
df1

   a       b  c
1  A   Roger 86
2  B   Carlo 88
3  C    Durn 88
4  D   Jessy 88
5  E Mounica 88
6  F    Rack 86
7  G    Rony 86
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA
You can observe that the fill function filled the missing values in the 'up' direction (bottom-up): each NA took the next non-missing value below it. Rows 9 and 10 are still NA because no recorded value comes after them.
Next, we will use the 'down' method to fill the missing values in the data. Keep in mind the assumption from the earlier section, so that you understand what you are doing and what the outcome will be.
#Create a new dataframe by filling missing values (Down) - (Top-Down approach)
df1 <- df %>% fill(c, .direction = 'down')
df1

   a       b  c
1  A   Roger 86
2  B   Carlo 86
3  C    Durn 86
4  D   Jessy 86
5  E Mounica 88
6  F    Rack 88
7  G    Rony 88
8  H    Saly 86
9  I   Kelly 86
10 J  Joseph 86
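In the 'up' output shown earlier, rows 9 and 10 stayed NA because nothing follows them. tidyr's `fill` also accepts `.direction = 'updown'` and `.direction = 'downup'`, which fill in the first direction and then in the other to catch such leftovers. A minimal sketch, re-creating the same data frame so it runs on its own (the names `df_updown` and `df_downup` are only illustrative):

```r
library(tidyr)

# Re-create the sample data frame from this article
df <- data.frame(
  a = c('A','B','C','D','E','F','G','H','I','J'),
  b = c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph'),
  c = c(86, NA, NA, NA, 88, NA, NA, 86, NA, NA)
)

# Fill up first, then fill any remaining NAs down
df_updown <- fill(df, c, .direction = 'updown')

# Fill down first, then fill any remaining NAs up
df_downup <- fill(df, c, .direction = 'downup')

print(df_updown)
print(df_downup)
```

Whether a two-way fill is appropriate depends on how the data was recorded, so treat this as an option rather than a default.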
Filling missing values in R is an important step when you are analyzing data that contains NA values. Things may seem a bit hard at first, but go through the article once or twice and the idea will become clear. It's not a hard cake to digest!
I hope this method will come to your assistance in your future assignments. That’s all for now. Happy R!!! :)
Further reading: Fill function in R