A guide to transfer learning with Keras using DenseNet201

Khawlajlassi
5 min read · Feb 21, 2021


Abstract:

In this short blog post I will demonstrate that high validation accuracies (here >88%) can be obtained by using a pre-trained (“frozen”) base model instance (DenseNet201) topped by a few carefully crafted Dense layers, trained on the CIFAR-10 dataset (50,000 training images, 10,000 test images).

Introduction:

Transfer learning consists of taking features learned on one problem and leveraging them on a new, similar problem. For instance, features from a model that has learned to identify raccoons may be useful to kick-start a model meant to identify tanukis.

“Human learners appear to have inherent ways to transfer knowledge between tasks. That is, we recognize and apply relevant knowledge from previous learning experience when we encounter new tasks. The more related a new task is to our previous experience, the more easily we can master it.”

Transfer learning can be explained intuitively through a simple example: imagine a person who wants to learn to play the piano; it will be easier for her if she already knows how to play the guitar, because she can use her knowledge of music to learn the new instrument.

Materials and Methods

Setting our environment

I’m going to use Keras, an open-source neural-network library written in Python. I will run it on top of TensorFlow in Google Colab, a Jupyter notebook environment that runs in the cloud.

The first thing I do is import the needed libraries with the lines of code below.

import tensorflow as tf
import tensorflow.keras as K

Training a model uses a lot of resources, so I recommend enabling a GPU runtime in Google Colab. This will speed up the process and allow more experimentation.
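
As a quick sanity check, you can confirm that TensorFlow actually sees the GPU (a minimal sketch; an empty list means the runtime type is not set to GPU):

import tensorflow as tf
# List the GPU devices visible to TensorFlow; an empty list means
# the Colab runtime is not configured with a GPU accelerator.
print(tf.config.list_physical_devices('GPU'))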

Load the CIFAR10 dataset:

CIFAR-10 (Canadian Institute For Advanced Research) is a dataset of 60,000 32x32 colour images grouped in 10 classes, which means 6,000 images per class. It is split into 50,000 training images and 10,000 test images, labeled over the 10 categories.

[Image: sample images from the CIFAR-10 dataset]

The categories are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. We can take advantage of the fact that these categories, and many more, are included in the ImageNet collection.

To load the dataset with Keras, we use:


(x_train, y_train), (x_test, y_test) = K.datasets.cifar10.load_data()
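
A quick look at the shapes confirms the split described above:

# 50,000 training and 10,000 test images, 32x32 pixels, 3 colour channels
print(x_train.shape)  # (50000, 32, 32, 3)
print(y_train.shape)  # (50000, 1)
print(x_test.shape)   # (10000, 32, 32, 3)
print(y_test.shape)   # (10000, 1)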

Preprocess:

In the preprocessing stage, I define a function that prepares the data loaded from CIFAR-10 to be fed to the DenseNet201 Keras model. It scales the images with the application’s preprocess_input method, then uses one-hot encoding to convert the categorical labels (10 categories) to numerical variables. Neural nets work with numerical data, not categorical.

The preprocess_data function:
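
A minimal sketch of what this function can look like, assuming it applies DenseNet’s preprocess_input to the images and one-hot encodes the labels (the comments in the next snippet describe exactly these two steps):

def preprocess_data(X, Y):
    """Pre-process CIFAR-10 data for DenseNet201.
    X: numpy array of shape (m, 32, 32, 3) containing the images
    Y: numpy array of shape (m, 1) containing the labels
    """
    # scale the pixel values the way DenseNet expects (ImageNet statistics)
    X_p = K.applications.densenet.preprocess_input(X)
    # one-hot encode the 10 class labels
    Y_p = K.utils.to_categorical(Y, 10)
    return X_p, Y_p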

Next, I’m going to call this function with the parameters loaded from the CIFAR10 database.

# load the CIFAR-10 dataset, 50,000 training images and 10,000 test images
# (here used as validation data)
(X, Y), (x_test, y_test) = K.datasets.cifar10.load_data()
# preprocess the data using the application's preprocess_input method
# and convert the labels to one-hot encodings
X_p, Y_p = preprocess_data(X, Y)
x_t, y_t = preprocess_data(x_test, y_test)

Pre-trained network:

Before you pick a model, I recommend searching for benchmarks; the Keras documentation also contains some interesting data on each pre-trained model’s accuracy and size.

I chose DenseNet201: I loaded it omitting the head classifier (include_top=False) and with the weights obtained during pre-training on the ImageNet database.

But before that, we must resize the images to the image size on which the network was pre-trained.
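
A minimal sketch of these two steps, assuming we upscale the 32x32 CIFAR-10 images to 224x224 (the resolution DenseNet201 was pre-trained on) with a Lambda layer:

# input for the original 32x32 CIFAR-10 images
inputs = K.Input(shape=(32, 32, 3))
# upscale to 224x224 inside the model with a Lambda layer
resized = K.layers.Lambda(lambda x: tf.image.resize(x, (224, 224)))(inputs)
# DenseNet201 base: ImageNet weights, head classifier omitted
base_model = K.applications.DenseNet201(include_top=False,
                                        weights='imagenet',
                                        input_shape=(224, 224, 3))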

Then I “freeze” the base model’s layers. Freezing, in the context of neural networks, is about controlling how the weights are updated: when a layer is frozen, its weights can no longer be modified. This technique, as simple as it sounds, cuts down the computational time for training while losing very little on the accuracy side.

for layer in base_model.layers:
    layer.trainable = False
# or simply
base_model.trainable = False

Our top layers:

Next, I flatten the processed input and pass it through 3 Dense layers.

Each Dense layer has batch normalization beforehand, and a dropout layer sits just before the last layer, which uses softmax with 10 neurons (one per CIFAR-10 class), as sketched below.
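
A minimal sketch, continuing from the snippet above (inputs, resized, and base_model); the layer widths and the dropout rate here are my assumptions:

# run the frozen base in inference mode so its batch-norm statistics stay fixed
x = base_model(resized, training=False)
x = K.layers.Flatten()(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.Dense(256, activation='relu')(x)
x = K.layers.BatchNormalization()(x)
x = K.layers.Dense(128, activation='relu')(x)
# dropout just before the final classifier
x = K.layers.Dropout(0.4)(x)
outputs = K.layers.Dense(10, activation='softmax')(x)
model = K.Model(inputs, outputs)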

Compile and save the model:

Finally, once the model is defined, we compile it, specifying the optimization function, the cost (loss) function, and the metric to use.

In this case, we will use K.optimizers.Adam() for the optimization, “categorical_crossentropy” for the loss function, and “accuracy” for the metrics.
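
Putting it together (a minimal sketch; the batch size, number of epochs, and file name are my assumptions):

model.compile(optimizer=K.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# train the new head on the preprocessed data, validating on the test split
history = model.fit(X_p, Y_p,
                    batch_size=128,
                    epochs=10,
                    validation_data=(x_t, y_t))
# save the trained model to a file
model.save('cifar10.h5')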

Results:

After running the model, I obtained the results below for each epoch; the last two lines show the accuracy and loss results (model.evaluate()) on the validation data:

Discussion:

In summary, this article shows that training accuracies >88% and validation accuracies >89% can be obtained by transfer learning from a pre-trained DenseNet201 instance taken from the Keras applications collection, without having to retrain the DenseNet base. It is also worth noting that the accuracy values reached in this demonstration are notably close to the Top-5 accuracy value given by Keras as reference for DenseNet201: 93.6%.

Literature Cited:

https://www.youtube.com/watch?v=FQM13HkEfBk&index=20&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF

https://towardsdatascience.com/a-comprehensive-hands-on-guide-to-transfer-learning-with-real-world-applications-in-deep-learning-212bf3b2f27a

https://intranet.hbtn.io/rltoken/094hW_tsJrotSljWeiCSSA
