In the fast-moving world of big data, it is important to work efficiently without giving up accuracy. Huge datasets demand huge resources (RAM, CPU, GPU, disk, etc.) and investment. Cloud computing is one solution, but deep learning also offers ways to work on large problems with fewer resources, and one of them is transfer learning.

In transfer learning we reuse the weights of a pre-trained model to train our own model. Some well-known pre-trained models are VGG, MobileNet and ResNet.

You might ask: why use someone else's pre-trained model if we can create our own?

Yes, we can create our own model, but why invest so much time in training to find millions of weights that someone has already provided? It is better to use the pre-trained model's weights, save time and spend it on other work. If your use case is very different from what the pre-trained model learned, you can still create your own model.

How to use Transfer Learning?

A typical deep learning architecture for images (a convolutional neural network) contains an input layer, convolution layers, hidden (fully connected) layers and an output layer.

Input layer: The data that we feed to the machine to create the model.

Convolution layer: It does two things, convolving and pooling.

1. Convolve: Takes the image from the input layer and performs feature extraction. Feature extraction here means sliding small filters over the image so that each patch of neighbouring pixels is summarized as one feature value, such as the strength of an edge. A ReLU activation function is applied to the result, between the convolve step and the pooling layer.

2. Pooling layer: Shrinks the feature map (squeezes the image) without losing the important features.
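To make the convolve, ReLU and pooling steps concrete, here is a minimal pure-Python sketch. The 5x5 image, the 2x2 edge filter and the function names are made up for illustration; a real network learns its filter values during training and uses an optimized library instead of nested loops.

```python
# One convolution -> ReLU -> max-pooling stage on a tiny grayscale image.
# All values below are illustrative, not taken from MobileNet.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like deep learning libraries, this does not flip the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Dark pixels (0) on the left, bright pixels (1) on the right: a vertical edge.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# This filter responds where the right pixel is brighter than the left one.
kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu(convolve2d(image, kernel))  # 4x4 feature map
pooled = max_pool(features)                 # 2x2 after pooling

print(features)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(pooled)    # [[2, 0], [2, 0]]
```

The feature map is non-zero only in the column where the edge sits, and pooling halves each dimension while keeping that strong response.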

Flatten layer: The fully connected layers expect a flat list of numbers, so this layer converts the 2D feature maps into a 1D vector.
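Flattening is simple enough to show in a few lines. This is only a sketch with made-up values; `flatten` and `pooled_maps` are hypothetical names, not part of any library.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one 1D list, row by row."""
    return [value for fmap in feature_maps for row in fmap for value in row]

# Two illustrative 2x2 feature maps, as they might come out of pooling.
pooled_maps = [
    [[2, 0], [2, 0]],
    [[1, 3], [0, 5]],
]

vector = flatten(pooled_maps)
print(vector)  # [2, 0, 2, 0, 1, 3, 0, 5]
```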

Hidden/Fully connected layer: Here the neural network learns the best parameters (weights) as the model trains.

Output layer: It maps what it receives from the fully connected layer to the classes and makes the prediction. If the prediction does not match the true class, the error is sent back through the network and gradient descent adjusts the parameters; this is called backpropagation. The process repeats until the weights give the desired result. For the output layer we are going to use the softmax activation function.
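The sketch below shows softmax, the loss and one gradient descent step on made-up scores for 4 classes. For brevity it updates the raw scores directly; in a real network backpropagation carries the same error signal further back to the weights of every trainable layer. The scores, the learning rate and the variable names are assumptions for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Loss is small when the true class gets a high probability."""
    return -math.log(probs[true_class])

logits = [2.0, 1.0, 0.1, -1.0]  # illustrative raw scores for 4 classes
true_class = 1                  # the correct label for this example
learning_rate = 0.5             # illustrative step size

probs = softmax(logits)
loss_before = cross_entropy(probs, true_class)

# For softmax with cross-entropy, the gradient with respect to the scores
# is simply (predicted probabilities - one-hot true label).
grads = [p - (1.0 if k == true_class else 0.0) for k, p in enumerate(probs)]

# One gradient descent step: move each score against its gradient.
logits = [z - learning_rate * g for z, g in zip(logits, grads)]
loss_after = cross_entropy(softmax(logits), true_class)

print(loss_before > loss_after)  # True: the step reduced the loss
```

The model initially favours class 0, the wrong answer, so the step raises the score of class 1 and lowers the others.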

For this practical I am using the MobileNet architecture. I feed our data into its input layer. I completely freeze the convolution layers and fully connected layer of MobileNet. I removed its softmax output layer and added new dense layers after MobileNet's layers, which are now frozen. Finally, I added a new softmax output layer.

We freeze the convolution and fully connected layers because we don't want to train MobileNet again; it has already been trained. This gives us a model where one part is already trained and the other part is not. So we train only the new fully connected layers, on top of MobileNet's weights. We are using the experience (weights) of the MobileNet model to train our model, which is why this is called transfer learning.
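The idea of "frozen base, trainable head" can be illustrated without any framework. In this toy sketch a tiny fixed feature extractor stands in for the pre-trained network and a single logistic unit stands in for the new layers. The weights, data and learning rate are all made up; this is not MobileNet, just the same training pattern at a very small scale.

```python
import math

# "Pre-trained" feature extractor: these weights are frozen (never updated).
frozen_weights = [[0.5, -0.2], [0.1, 0.8]]

def extract_features(x):
    """Frozen layer: weighted sum followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]

# New trainable head: one logistic unit on top of the frozen features.
head_weights = [0.0, 0.0]
head_bias = 0.0

def predict(x):
    features = extract_features(x)
    z = sum(w * f for w, f in zip(head_weights, features)) + head_bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative two-class data: (input, label).
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 0.5], 1), ([0.2, 1.5], 0)]

def mean_loss():
    total = 0.0
    for x, y in data:
        p = predict(x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

frozen_before = [row[:] for row in frozen_weights]
loss_before = mean_loss()

learning_rate = 0.5
for epoch in range(200):
    for x, y in data:
        features = extract_features(x)
        error = predict(x) - y  # gradient of the loss w.r.t. the head's score
        # Only the head is updated; frozen_weights is never touched.
        for k in range(len(head_weights)):
            head_weights[k] -= learning_rate * error * features[k]
        head_bias -= learning_rate * error

loss_after = mean_loss()
print(loss_after < loss_before)         # True: the head learned
print(frozen_weights == frozen_before)  # True: the base stayed frozen
```

Because only the small head is trained, each update is cheap, which is the reason transfer learning needs far less compute than training the whole network.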

For the practical I used sketches. I took 1200 pictures of each sketch with my webcam: 900 for training and 300 for testing (a 75:25 ratio). I fed the sketch images into MobileNet's input layer. I added 3 new fully connected layers after the frozen layers, and finally an output layer with a softmax function, with the number of categories set to 4.

1. Download the MobileNet model without its output layer and freeze all layers except the input layer.

2. You can see the output layer is gone and the layers are frozen. Next, create a function that adds 3 fully connected layers and 1 output layer after the frozen layers.

3. You can see that the 3 fully connected layers and the output layer with softmax activation have been added on top of the frozen MobileNet layers.

4. I had very little input data, so I pre-process (augment) the images to get more variety for training and testing. After this step there are 3700 training images and 1200 testing images. I use batches to speed up processing, with a batch (group) of 12 images going through at a time.

5. Use an optimizer to update the weights based on the loss. Even with only 3 epochs, training took 20 minutes. Let us check how much accuracy we get.
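Steps 1 to 3 and the optimizer setup in step 5 could look roughly like the following Keras sketch. It is an illustration, not the exact code of this project: the function name `build_model`, the hidden layer sizes, the 224x224 input size, the pooling layer before the dense layers and the RMSprop optimizer are all assumptions, since the article only states that MobileNet is loaded without its output layer, frozen, and extended with 3 fully connected layers and a 4-class softmax output.

```python
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

NUM_CLASSES = 4          # four sketch categories, as stated in the article
IMAGE_SIZE = (224, 224)  # assumption: MobileNet's default input size

def build_model(weights="imagenet", hidden_units=(1024, 1024, 512)):
    """Frozen MobileNet base plus a new trainable head.

    hidden_units is a hypothetical choice; the article only says that
    three fully connected layers were added.
    """
    # Load MobileNet without its original classification (output) layers.
    base = MobileNet(weights=weights, include_top=False,
                     input_shape=IMAGE_SIZE + (3,))

    # Freeze every layer of the pre-trained base.
    for layer in base.layers:
        layer.trainable = False

    # New head: pool the feature maps, then three dense layers and softmax.
    x = GlobalAveragePooling2D()(base.output)
    for units in hidden_units:
        x = Dense(units, activation="relu")(x)
    outputs = Dense(NUM_CLASSES, activation="softmax")(x)

    model = Model(inputs=base.input, outputs=outputs)

    # Assumed optimizer and loss for a multi-class problem.
    model.compile(optimizer=RMSprop(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use, and `model.summary()` then lists the frozen weights as non-trainable parameters. Training would follow with `model.fit(...)` on batches of 12 images for 3 epochs, as described in steps 4 and 5.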

Accuracy: You can see we got above 92% accuracy. Now you can imagine the power of transfer learning.


Out of 20 test predictions, the model got 3 wrong, which is 17 of 20 correct, or 85% on this small sample. That is still powerful, and remember that we trained for only 3 epochs; with more epochs the accuracy might increase.


Click here to get complete code of this project.

Thank-you for reading.


Cloud Stack Developer. Working on Machine Learning, DevOps, Cloud and Big Data.