By Safa Mulani
Hello, readers! In this article, we will look at three easy ways to normalize data in R.
So, let us begin!! :)
Feature scaling is an essential step before modeling when solving prediction problems in data science. Many machine learning algorithms work better when the variables share a small, standard scale.
This is where normalization comes into the picture. Normalization techniques bring the variables onto a comparable scale, so that no variable dominates a model simply because of the units it is measured in.
In the subsequent sections, we will look at some of the techniques for normalizing data values.
In real-world scenarios, we often come across datasets that are unevenly distributed. That is, the values are either skewed or span very different orders of magnitude.
In such cases, the simplest way to bring the values onto a comparable scale is to take the logarithm of each value.
rm(list = ls())

data = c(1200,34567,3456,12,3456,985,1211)
summary(data)

# Take the natural log of every value and print the result
log_scale = log(as.data.frame(data))
log_scale
       data
1  7.090077
2 10.450655
3  8.147867
4  2.484907
5  8.147867
6  6.892642
7  7.099202
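One practical caveat is worth a quick sketch: log() is only finite for positive numbers, so a zero in the data turns into -Inf. The vector x below is a hypothetical example (it is not the dataset used above), and log1p() and expm1() are base R functions that compute log(1 + x) and its inverse. Treat this as one possible workaround, not the only one.

```r
# Hypothetical values including a zero (not from the article's dataset)
x <- c(0, 12, 985, 1200, 34567)

# Plain log() turns the zero into -Inf
plain_log <- log(x)

# log1p(x) computes log(1 + x), which stays finite at zero
shifted_log <- log1p(x)

# expm1() inverts log1p(), recovering the original values
recovered <- expm1(shifted_log)

print(plain_log)
print(shifted_log)
print(recovered)
```

Because expm1() undoes log1p(), predictions made on the log scale can be mapped back to the original units.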
Another efficient way of normalizing values is the Min-Max Scaling method.
With Min-Max Scaling, we rescale the data so that the smallest value maps to 0 and the largest maps to 1, with every other value falling in between. As a result, the scaled values have a much smaller standard deviation than the original data. Keep in mind that the minimum and maximum determine the mapping, so a single extreme value can squeeze the remaining values into a narrow band.
In the example below, we use the 'caret' library to pre-process and scale the data. The preProcess() function scales the values to a range of 0 to 1 when method = c('range') is passed as an argument. The predict() method then applies the transformation computed by preProcess() to the entire data frame, as shown below.
rm(list = ls())

data = c(1200,34567,3456,12,3456,985,1211)
summary(data)

library(caret)
# Learn the min and max of the column, then rescale it to the range 0 to 1
process <- preProcess(as.data.frame(data), method=c("range"))
norm_scale <- predict(process, as.data.frame(data))
norm_scale
        data
1 0.03437997
2 1.00000000
3 0.09966720
4 0.00000000
5 0.09966720
6 0.02815801
7 0.03469831
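For readers who want to see the arithmetic behind the 'range' method, here is a minimal base R sketch of the usual min-max formula, (x - min) / (max - min). The helper name min_max() is hypothetical (it is not part of caret or base R), and the result should agree with the caret output above up to rounding.

```r
# Same values as in the example above
data <- c(1200, 34567, 3456, 12, 3456, 985, 1211)

# min_max() is a hypothetical helper, not a function from caret or base R
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

manual_scale <- min_max(data)
print(round(manual_scale, 8))
```

Note that this simple version divides by zero when every value in x is identical, so a constant column would need separate handling.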
In Standard Scaling, also known as standardization, we scale the data values so that every variable has a mean of zero and unit variance.
The scale() function enables us to standardize the data values, as it centers and scales them.
rm(list = ls())

data = c(1200,34567,3456,12,3456,985,1211)
summary(data)

# Center to a mean of 0 and scale to a standard deviation of 1
scale_data <- as.data.frame(scale(data))
scale_data
summary(scale_data)
As seen below, the mean of the data before scaling is 6412, whereas after scaling the mean is zero.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     12    1092    1211    6412    3456   34567

          V1
1 -0.4175944
2  2.2556070
3 -0.2368546
4 -0.5127711
5 -0.2368546
6 -0.4348191
7 -0.4167131

       V1
 Min.   :-0.5128
 1st Qu.:-0.4262
 Median :-0.4167
 Mean   : 0.0000
 3rd Qu.:-0.2369
 Max.   : 2.2556
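To make the calculation behind scale() explicit, here is a small sketch that computes the same z-scores by hand: subtract the mean, then divide by the sample standard deviation from sd(). The variable names manual_z and scaled are only illustrative.

```r
# Same values as in the example above
data <- c(1200, 34567, 3456, 12, 3456, 985, 1211)

# Manual z-score: subtract the mean, divide by the sample standard deviation
manual_z <- (data - mean(data)) / sd(data)

# scale() returns a one-column matrix; flatten it for comparison
scaled <- as.numeric(scale(data))

print(all.equal(manual_z, scaled))
print(mean(manual_z))
print(sd(manual_z))
```

The mean may print as a tiny number such as 1e-17 instead of an exact zero because of floating-point rounding.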
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions. For more posts related to R programming, stay tuned!
Till then, Happy Learning!! :)