By Safa Mulani
Hello, readers! In this article, we will focus on one of the most important pre-processing techniques in Python: standardization using StandardScaler from the sklearn library.
So, let us begin!!
Before getting into Standardization, let us first understand the concept of Scaling.
Feature scaling is an essential step when fitting models to datasets. The data used for modeling is usually gathered through a variety of means.
So, the resulting data contains features with very different units and ranges. Such differences in scale can adversely affect a model.
They can bias the predictions, which shows up in misclassification errors and accuracy rates, because features with larger numeric ranges tend to dominate the others. Thus, it is necessary to scale the data prior to modeling.
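To see why mixed scales can be a problem, here is a minimal sketch using plain Euclidean distance. The three samples and their feature meanings (age in years, income in dollars) are hypothetical values made up for illustration; they are not from any dataset used in this article.

```python
import math

# Hypothetical samples: (age in years, annual income in dollars)
a = (25, 50_000)
b = (60, 51_000)   # very different age, similar income
c = (26, 58_000)   # almost the same age, somewhat different income


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


dist_ab = euclidean(a, b)
dist_ac = euclidean(a, c)

# Income differences are numerically far larger than age differences,
# so income decides which sample looks "closer".
print(dist_ab, dist_ac)
print("a is closer to b than to c:", dist_ab < dist_ac)
```

On the raw values, the 35-year age gap between a and b barely registers next to the income column, which is the kind of imbalance that scaling is meant to remove.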
This is where standardization comes into the picture.
Standardization is a scaling technique that makes the data scale-free by transforming each feature so that its distribution has a mean of 0 and a standard deviation of 1.
By this, every feature in the data set is rescaled to zero mean and unit variance.
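The arithmetic behind this is the usual z-score, z = (x - mean) / standard deviation, applied to each feature separately. Here is a small sketch of that calculation using only the Python standard library. The list of values is made up for illustration, and the population standard deviation is used because that is what StandardScaler computes.

```python
import statistics

# Hypothetical feature values, chosen so the numbers come out round
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(values)   # arithmetic mean of the feature
std = statistics.pstdev(values)   # population standard deviation

# z-score: subtract the mean, then divide by the standard deviation
z = [(x - mean) / std for x in values]

print(z)
print(statistics.fmean(z))    # mean of the standardized values
print(statistics.pstdev(z))   # standard deviation of the standardized values
```

After the transformation, the values are expressed as "how many standard deviations away from the mean", which is why the original unit no longer matters.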
Let us now try to implement the concept of Standardization in the upcoming sections.
Python's sklearn library offers the StandardScaler class to standardize data values.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit_transform(data)
According to the above syntax, we first create an instance of the StandardScaler class. We then call fit_transform() on that instance to fit it to the data and standardize the data in a single step.
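fit_transform() combines two steps, fit() and transform(), that can also be called separately. The sketch below shows one common reason to do that: learning the statistics from one array and reusing them on another. The train and test arrays are hypothetical toy data, and the split itself is an assumption added for illustration rather than part of the syntax above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: rows are samples, columns are features
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

scaler = StandardScaler()
scaler.fit(train)                       # learns per-column mean and standard deviation
train_scaled = scaler.transform(train)  # applies (x - mean) / std to each column
test_scaled = scaler.transform(test)    # reuses the statistics learned from train

print(scaler.mean_)    # per-feature means learned from train
print(scaler.scale_)   # per-feature standard deviations learned from train
print(train_scaled)
print(test_scaled)
```

Note that the test row is scaled with the training statistics, so its values do not have to end up with zero mean or unit variance.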
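Note: Standardization can be applied to any numeric feature, but it is most meaningful when the data values roughly follow a normal distribution. It shifts and rescales the values without changing the shape of their distribution.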
Have a look at the below example!
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

dataset = load_iris()
scaler = StandardScaler()

# Splitting the independent and dependent variables
i_data = dataset.data
response = dataset.target

# Standardization: fit the scaler on the features and transform them
scale = scaler.fit_transform(i_data)
print(scale)
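The printed array is hard to judge by eye, so here is a short sketch of one way to check the result: compute the mean and standard deviation of each standardized column, and undo the scaling with inverse_transform(). The variable names are chosen for this illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

features = load_iris().data          # 150 samples, 4 numeric features

scaler = StandardScaler()
scaled = scaler.fit_transform(features)

# Each column should now have a mean close to 0 and a standard deviation close to 1
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print(col_means)
print(col_stds)

# inverse_transform() maps the standardized values back to the original scale
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, features))
```

The column means will typically be tiny floating-point values rather than exact zeros, which is expected.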
By this, we have come to the end of this topic. Feel free to comment below if you have any questions.
For more posts related to Python, Stay tuned @ Python with JournalDev and till then, Happy Learning!! :)