Replacing values in R is straightforward with the replace() function.
In data analysis, you will often have to deal with missing, negative, or inaccurate values in a dataset. Left as they are, these values can distort the results of the analysis.
To avoid misleading results, you can use the replace() function in R to swap the invalid values for appropriate ones.
The syntax of replace() is simple. It takes three arguments: the vector (x), an index vector that selects the elements to replace (list), and the replacement values (values), as shown below.
replace(x, list, values)
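As a minimal sketch of how the three arguments fit together, the example below passes the `list` argument first as a position and then as a logical condition. The vector `scores` and its values are made up for illustration and are not part of the tutorial's data.

```r
# Hypothetical example vector (not part of the tutorial's data)
scores <- c(10, 25, 40, 55)

# 'list' given as a position: replace the 3rd element with 0
by_position <- replace(scores, 3, 0)
by_position

# 'list' given as a logical vector: cap every element greater than 30 at 30
by_condition <- replace(scores, scores > 30, 30)
by_condition
```

Both forms select the elements to overwrite; the logical form is the one used later for NA and negative values.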
This section shows how to replace a value in a vector.
The examples below follow the syntax of replace() directly: the first argument is the vector, the second is the index of the value to replace, and the third is the replacement value.
df <- c('apple', 'orange', 'grape', 'banana')
df
Output=> "apple" "orange" "grape" "banana"
Let’s replace the 2nd item in the vector.
dy <- replace(df, 2, 'blueberry')
dy
Output=> "apple" "blueberry" "grape" "banana"
Now, we’ll replace the 4th item in the vector.
dx <- replace(dy, 4, 'cranberry')
dx
Output=> "apple" "blueberry" "grape" "cranberry"
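The two single replacements can also be done in one call by passing a vector of positions and a matching vector of replacement values. The sketch below uses a new variable name, `fruits`, chosen here for illustration. It also shows that replace() returns a modified copy and leaves the original vector unchanged.

```r
fruits <- c('apple', 'orange', 'grape', 'banana')

# Replace the 2nd and 4th elements in a single call
updated <- replace(fruits, c(2, 4), c('blueberry', 'cranberry'))
updated

# replace() returns a modified copy; the original vector is unchanged
fruits
```

Because the original is left alone, the result has to be assigned to a variable for the change to be kept.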
In this section, we are going to replace the NA values in a data frame with 0. The input is R's built-in airquality data frame, which contains NA values.
Replacing the NA values with 0 takes a single line of code, as shown below.
# defines the data frame
df <- airquality

# replaces the NA values with 0
df[is.na(df)] <- 0
df
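The same result can be reached with replace() itself when only one column should be changed. The sketch below assumes that only the Ozone column is of interest and leaves the other columns untouched.

```r
df <- airquality

# Replace NA with 0 in the Ozone column only, using replace()
df$Ozone <- replace(df$Ozone, is.na(df$Ozone), 0)

# Count the remaining NA values per column
colSums(is.na(df))
```

Working one column at a time makes it easier to choose a different replacement value for each column.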
In data analysis, dropping an entire row or column because of one or more NA values is often a bad idea. Removing that much data can hurt the accuracy of your results.
One way around this is to replace the NA values with the mean of the remaining values in the column. This keeps every row in the data set while filling the gaps with a reasonable central value.
The code below loads the airquality data set, which contains NA values, and replaces the missing Ozone readings with the column mean.
df <- airquality
df
df$Ozone[is.na(df$Ozone)] <- mean(df$Ozone, na.rm = TRUE)
round(df, digits = 0)
In the output, the NA values in the Ozone column are replaced by the mean of the remaining Ozone values, which is about 42 after rounding.
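The same idea can be extended to every column that contains NA values. The sketch below uses a helper named `fill_with_mean`, which is a hypothetical name introduced here and not a built-in R function. It assumes every column is numeric, which holds for airquality.

```r
df <- airquality

# Hypothetical helper: fill NA in a numeric vector with that vector's mean
fill_with_mean <- function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}

# Apply the helper to every column; df[] keeps the data frame structure
df[] <- lapply(df, fill_with_mean)

# No NA values should remain
sum(is.na(df))
```

Each column is filled with its own mean, so a missing Solar.R reading is not filled with an Ozone value.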
This section shows how to replace the negative values in a data frame with 0 or NA.
This is useful when negative values are not valid for the quantity being measured, since leaving them in can skew the analysis and produce misleading results.
The code below illustrates both replacements.
# reads the csv file
df <- read.csv('negetivevalues.csv')
df

# replaces the negative numbers with zeros
data <- replace(df$entry2, df$entry2 < 0, 0)
data
Output=> 0 654 345 876 34 98 0 98 67 0 45 761

# replaces the negative values with NAs
data1 <- replace(df$entry2, df$entry2 < 0, NA)
data1
Output=> NA 654 345 876 34 98 NA 98 67 NA 45 761
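If the CSV file is not at hand, the same technique can be tried on a small data frame built in code. In the sketch below, the values and the `entry1` column are made up for illustration; only the `entry2` column name comes from the example above. It also applies the replacement to the whole data frame at once by indexing with a logical condition, rather than calling replace() on a single column.

```r
# Hypothetical stand-in for the CSV data (values are made up)
df <- data.frame(entry1 = c(5, -3, 8, 12),
                 entry2 = c(-1, 654, -20, 98))

# Replace negative values with 0 in every column
df_zero <- df
df_zero[df_zero < 0] <- 0
df_zero

# Replace negative values with NA instead
df_na <- df
df_na[df_na < 0] <- NA
df_na
```

Working on copies (`df_zero`, `df_na`) keeps the original data frame available for comparison.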
Replacing values in a data frame is a handy technique for data analysis in R. Using replace(), you can swap NA, 0, and negative values for appropriate ones to clean up large datasets before analysis.
Congratulations, you learned to replace the values in R. Keep going! If you want to learn to take a sample of the dataset, have a look at our previous tutorial on the sample() method in R.