K-Means Clustering and Feature Scaling

Prakash Singh Rajpurohit
4 min read · Aug 10, 2020

Unsupervised Learning: Training a model without y (the dependent feature) is called unsupervised learning.

Example: A baby learns how to sit, walk, and run through multiple hit-and-trial attempts. After gaining some experience, the baby understands how to sit, walk, and run. There is no one to supervise the baby. This kind of learning is what we call unsupervised learning.

Unsupervised learning is a way to find hidden patterns in a dataset. This is also known as feature learning. A cluster is a concept we use in unsupervised learning: based on the characteristics of the feature values, it puts common data points into one cluster.

There are 2 different types of clustering: soft clustering and hard clustering.

In hard clustering we use K-Means clustering. K-Means clustering groups similar feature values into one cluster. The K-Means algorithm normally works in an unsupervised setting. In unsupervised learning we fit the whole dataset into the model, because here we only have the X features.
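As a minimal sketch, assuming scikit-learn and a made-up two-feature array X, fitting K-Means looks like this; note that we pass only X, never y:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy dataset with only X features (no y); values are made up for illustration
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Unsupervised: we fit the whole dataset; there is no dependent feature y
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

print(kmeans.labels_)           # cluster assigned to each row
print(kmeans.cluster_centers_)  # the mean/centroid of each cluster
```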

So let’s go deeper into K-Means clustering. Before we fit our data into any machine learning model, the most important pre-processing step must be done, and that is feature scaling. If we don’t do feature scaling, the model will give biased predictions.

Q. How does the machine give biased predictions if we don’t do feature scaling?

Ans: In machine learning, weights are most important: a weight tells you how much a feature (X) is correlated with the dependent feature (y).

Machines understand only numbers. So when the machine finds weights for the features (X), it ends up giving a larger weight to the feature whose raw values are smaller. In our case we can see that the Price feature has large values, so here the machine will give a small weight to the Price feature and a large weight to the Bed feature.

Let’s check the mathematics the machine does behind the scenes. The machine normally uses a linear function to find the weights.

Finding the weights for Obs №3 (Bed = 4, Price = 500, y = 1):

y = w1·x1 + w2·x2

1 = (1/8 × 4) + (1/1000 × 500)

w1 = 1/8 = 0.125

w2 = 1/1000 = 0.001

In reality we can see that price plays an important role in buying a bed. But in the equation above, according to the machine, Bed is a more important feature than Price, which is why the machine gave Bed the larger weight; this is what we call bias. The model will still give a prediction, but that prediction won’t be correct. So to solve this issue we have to apply feature scaling before model creation.
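Here is a minimal sketch of that effect, assuming scikit-learn and an invented Bed/Price dataset: fit a linear model on the raw features and on standardized features, and compare the coefficient magnitudes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Invented Bed (count) and Price columns, just to show the scale gap
X = np.array([[2, 250], [3, 400], [4, 500], [5, 800], [6, 900]], dtype=float)
y = np.array([0.4, 0.7, 1.0, 1.5, 1.7])

raw = LinearRegression().fit(X, y)
print("raw coefficients:", raw.coef_)  # Price's coefficient looks tiny only
                                       # because its raw values are large

scaled = LinearRegression().fit(StandardScaler().fit_transform(X), y)
print("scaled coefficients:", scaled.coef_)  # magnitudes are now comparable
```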

We can do feature scaling in 2 ways according to the requirement (see the sketch after the list):

1) Standardization feature scaling [Range from -3 to +3]

We use standardization feature scaling in almost every use case; roughly 99% of the time, standardization is what we use.

2) Normalization feature scaling [Range from 0 to 1]

We use normalization feature scaling when the data is not normally distributed, or when we need every feature squeezed into a fixed [0, 1] range.
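As a sketch with scikit-learn (the Bed/Price values below are invented), the two options look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up Bed and Price columns
X = np.array([[4, 500], [2, 250], [6, 900]], dtype=float)

# 1) Standardization: mean 0, std 1 (values typically land around -3 to +3)
print(StandardScaler().fit_transform(X))

# 2) Normalization (min-max): squeezes every column into [0, 1]
print(MinMaxScaler().fit_transform(X))
```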

When and How to do feature scaling?

First, split the dataset into train and test sets. Before model creation, always do feature scaling on the training data and then train the model. Scale the test data separately: don’t fit the scaler on the test data, use the transform function for scaling. Otherwise the test data will be on one scale and the training data on another, which will give wrong predictions. The scale of the train data should be equal to the scale of the test data. We don’t scale dummy (one-hot encoded) features or y (the dependent feature).
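A minimal sketch of that workflow, assuming scikit-learn and a made-up dataset: fit the scaler on the train split only, then reuse the same fitted scaler to transform the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix and target, only to show the scaling order
X = np.array([[2, 250], [3, 400], [4, 500], [5, 800], [6, 900]], dtype=float)
y = np.array([0, 0, 0, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the train split only
X_test = scaler.transform(X_test)        # reuse the same scale; never re-fit here
# y_train / y_test stay unscaled, as would any dummy (one-hot) columns
```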

How does K-Means clustering work behind the scenes?

Suppose we give k=2. Then K-Means clustering starts with two centroids at random initial positions. Both points go hunting for data points with similar features, which is why we call it feature learning. Internally, these 2 points use the Euclidean distance formula: whichever data point is closest to the 1st centroid comes under it, and the 2nd centroid does the same. When all the points have come under one of the two centroids, K-Means takes the mean of the points in each group and moves the centroid there, finally forming the clusters. So for finding clusters it uses the mean/centroid. Now the important point: when the initializer starts from random points, it can fall into the random initialization trap. It means that if the centroids start at unlucky random positions, K-Means can find the wrong mean/centroid, and because of that it makes the wrong clusters, which makes the prediction wrong.
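To see the trap in action, here is a sketch assuming scikit-learn: run plain random initialization a few times with n_init=1 and watch the final within-cluster sum of squares (exposed as inertia_) change from run to run.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three invented blobs of 2-D points
X = np.vstack([rng.normal(center, 0.5, size=(30, 2))
               for center in ([0, 0], [5, 5], [0, 5])])

for seed in range(3):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    # Different seeds can land on different (sometimes worse) clusterings
    print(f"seed={seed}  WCSS={km.inertia_:.2f}")
```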

So to come out of the random initialization trap, K-Means internally uses a smarter seeding scheme called K-Means++, which spreads the initial centroids apart instead of placing them blindly.

To find out how many clusters we require for the current dataset, we use a quantity called WCSS (Within-Cluster Sum of Squares): for each cluster, sum the squared distance between every point and its cluster’s centroid, then add these sums over all clusters.

It shows how much the WCSS value changes as we increase the number of clusters. When we see only a small change in the WCSS value, we can say it has become stable, and that number of clusters is good for the particular dataset. We also call this method the Elbow method, because when we plot these values on a graph, the curve looks like an elbow.
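A sketch of the Elbow method, assuming scikit-learn and matplotlib (the blob data is invented; X can be any feature matrix):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Invented blob data; replace X with your own feature matrix
X = np.vstack([rng.normal(center, 0.5, size=(30, 2))
               for center in ([0, 0], [5, 5], [0, 5])])

wcss = []
for k in range(1, 11):
    # k-means++ seeding avoids the random initialization trap
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the WCSS for this value of k

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()
```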

You can connect with me on LinkedIn: Prakash Singh

Thank you for reading.

If you find this article helpful, I would appreciate it if you could give it a clap.

