The IPython Notebook for this tutorial is available here.

The examples in this notebook assume that you are familiar with the theory of neural networks. To learn more about neural networks, you can refer to the resources mentioned here.

In this notebook, we will learn to:

  • import the MNIST dataset and visualize some example images
  • define neural network models with a single hidden layer as well as multiple hidden layers
  • train the models and plot the accuracy and loss at each epoch
  • study the effect of varying the learning rate, batch size, and number of epochs
  • use the SGD and Adam optimizers
  • save model weights every 10 epochs
  • resume training by loading a saved model
  • stop training early if there is negligible improvement in performance

Import modules

# Use GPU for Theano, comment to use CPU instead of GPU
# Tensorflow uses GPU by default
import os
os.environ["THEANO_FLAGS"] = "mode=FAST_RUN, device=gpu, floatX=float32"
# If using tensorflow, set image dimensions order
from keras import backend as K
if K.backend()=='tensorflow':
    K.set_image_dim_ordering("th")
Using TensorFlow backend.
import time
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils
%matplotlib inline
np.random.seed(2017)

Load MNIST dataset

from keras.datasets import mnist
(train_features, train_labels), (test_features, test_labels) = mnist.load_data()
_, img_rows, img_cols =  train_features.shape
num_classes = len(np.unique(train_labels))
num_input_nodes = img_rows*img_cols
print "Number of training samples: %d"%train_features.shape[0]
print "Number of test samples: %d"%test_features.shape[0]
print "Image rows: %d"%train_features.shape[1]
print "Image columns: %d"%train_features.shape[2]
print "Number of classes: %d"%num_classes
Number of training samples: 60000
Number of test samples: 10000
Image rows: 28
Image columns: 28
Number of classes: 10

Show Examples from Each Class

fig = plt.figure(figsize=(8,3))
for i in range(num_classes):
    ax = fig.add_subplot(2, 5, 1 + i, xticks=[], yticks=[])
    features_idx = train_features[train_labels[:]==i,:]
    ax.set_title("Num: " + str(i))
    plt.imshow(features_idx[1], cmap="gray")
plt.show()

(figure: one sample image from each of the 10 digit classes)

Pre-processing

# reshape images to column vectors
train_features = train_features.reshape(train_features.shape[0], img_rows*img_cols)
test_features = test_features.reshape(test_features.shape[0], img_rows*img_cols)
# convert class labels to binary class labels
train_labels = np_utils.to_categorical(train_labels, num_classes)
test_labels = np_utils.to_categorical(test_labels, num_classes)
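
Under the hood, np_utils.to_categorical turns each integer label into a one-hot row vector. A minimal NumPy equivalent (illustrative only, not the Keras implementation):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    # each integer label becomes a row with a single 1 at that index
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot

# the digit 3 becomes a 10-element vector with a 1 in position 3
print(to_one_hot([3], 10))
```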

Define a Neural Network Model with a Single Hidden Layer

def simple_nn():
    # initialize model
    model = Sequential()
    # add an input layer and a hidden layer
    model.add(Dense(100, input_dim = num_input_nodes))
    # add activation layer to add non-linearity
    model.add(Activation('sigmoid'))
    # to use ReLU instead of sigmoid: model.add(Activation('relu'))
    # the two layers above can be combined: model.add(Dense(100, input_dim=784, activation='sigmoid'))
    # add output layer
    model.add(Dense(num_classes))
    # add softmax layer 
    model.add(Activation('softmax'))
    return model
# define model
model = simple_nn()
# define optimizer
sgd = SGD(lr=0.01)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# print model information
model.summary()
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dense_1 (Dense)                  (None, 100)           78500       dense_input_1[0][0]              
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 100)           0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 10)            1010        activation_1[0][0]               
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 10)            0           dense_2[0][0]                    
====================================================================================================
Total params: 79,510
Trainable params: 79,510
Non-trainable params: 0
____________________________________________________________________________________________________
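
The parameter counts in the summary are just weights plus biases: a Dense layer with n_in inputs and n_out outputs learns n_in × n_out weights and n_out biases. Checking the numbers above:

```python
def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

print(dense_params(784, 100))   # dense_1: 78500
print(dense_params(100, 10))    # dense_2: 1010
print(dense_params(784, 100) + dense_params(100, 10))   # total: 79510
```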

Train the model

start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=64, \
                       nb_epoch=10, verbose=2, validation_split=0.2)
end = time.time()
print "Model took %0.2f seconds to train"%(end - start)
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
1s - loss: 0.0866 - acc: 0.2679 - val_loss: 0.0783 - val_acc: 0.3748
Epoch 2/10
1s - loss: 0.0726 - acc: 0.4483 - val_loss: 0.0665 - val_acc: 0.5164
Epoch 3/10
1s - loss: 0.0629 - acc: 0.5445 - val_loss: 0.0584 - val_acc: 0.5817
Epoch 4/10
1s - loss: 0.0561 - acc: 0.6055 - val_loss: 0.0523 - val_acc: 0.6432
Epoch 5/10
1s - loss: 0.0506 - acc: 0.6636 - val_loss: 0.0473 - val_acc: 0.6973
Epoch 6/10
1s - loss: 0.0459 - acc: 0.7166 - val_loss: 0.0426 - val_acc: 0.7530
Epoch 7/10
1s - loss: 0.0417 - acc: 0.7669 - val_loss: 0.0389 - val_acc: 0.7914
Epoch 8/10
1s - loss: 0.0383 - acc: 0.7967 - val_loss: 0.0356 - val_acc: 0.8141
Epoch 9/10
1s - loss: 0.0355 - acc: 0.8121 - val_loss: 0.0331 - val_acc: 0.8273
Epoch 10/10
1s - loss: 0.0333 - acc: 0.8237 - val_loss: 0.0311 - val_acc: 0.8387
Model took 12.27 seconds to train

Plot Accuracy and Loss as a Function of the Number of Epochs

def plot_model_history(model_history):
    fig, axs = plt.subplots(1,2,figsize=(15,5))
    # summarize history for accuracy
    axs[0].plot(range(1,len(model_history.history['acc'])+1),model_history.history['acc'])
    axs[0].plot(range(1,len(model_history.history['val_acc'])+1),model_history.history['val_acc'])
    axs[0].set_title('Model Accuracy')
    axs[0].set_ylabel('Accuracy')
    axs[0].set_xlabel('Epoch')
    axs[0].set_xticks(np.arange(1, len(model_history.history['acc'])+1, len(model_history.history['acc'])/10))
    axs[0].legend(['train', 'val'], loc='best')
    # summarize history for loss
    axs[1].plot(range(1,len(model_history.history['loss'])+1),model_history.history['loss'])
    axs[1].plot(range(1,len(model_history.history['val_loss'])+1),model_history.history['val_loss'])
    axs[1].set_title('Model Loss')
    axs[1].set_ylabel('Loss')
    axs[1].set_xlabel('Epoch')
    axs[1].set_xticks(np.arange(1, len(model_history.history['loss'])+1, len(model_history.history['loss'])/10))
    axs[1].legend(['train', 'val'], loc='best')
    plt.show()
plot_model_history(model_info)

(figure: training/validation accuracy and loss vs. epoch)

Test the Model

def accuracy(test_x, test_y, model):
    result = model.predict(test_x)
    predicted_class = np.argmax(result, axis=1)
    true_class = np.argmax(test_y, axis=1)
    num_correct = np.sum(predicted_class == true_class) 
    accuracy = float(num_correct)/result.shape[0]
    return (accuracy * 100)
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)
Accuracy on test data is: 83.54

Observation:

Both training and validation accuracy increase as the number of epochs increases: with each pass over the data, the model keeps refining its weights.

Vary the Learning Rate

# decrease the learning rate
# define model
model = simple_nn()
# define optimizer
sgd = SGD(lr=0.001)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=64, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch for lr = 0.001)

Model took 11.39 seconds to train
Accuracy on test data is: 27.29
# increase the learning rate
# define model
model = simple_nn()
# define optimizer
sgd = SGD(lr=0.1)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=64, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch for lr = 0.1)

Model took 11.47 seconds to train
Accuracy on test data is: 92.43

Observation:

If the learning rate is decreased, each update takes a smaller step, so more epochs are required to learn a good model. If the learning rate is increased, each update takes a larger step and fewer epochs are needed, although too large a rate can make training unstable. When using SGD, the learning rate needs to be chosen empirically for a given dataset.
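
The SGD update itself is a step against the gradient, scaled by the learning rate. A toy sketch on a one-dimensional quadratic (not the network above) makes the effect of the step size visible:

```python
def sgd_step(w, grad, lr):
    # one vanilla SGD update: move against the gradient
    return w - lr * grad

# minimize f(w) = w**2 (gradient 2w), starting from w = 1.0
for lr in (0.001, 0.01, 0.1):
    w = 1.0
    for _ in range(100):
        w = sgd_step(w, 2 * w, lr)
    print("lr %.3f: w after 100 steps = %.6f" % (lr, w))
```

With lr=0.001 the iterate has barely moved after 100 steps, while lr=0.1 brings it essentially to the minimum, mirroring the accuracy gap seen above.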

Use Adam optimizer instead of SGD

# Define model
model = simple_nn()
# define optimizer, loss function
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=64, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch with the Adam optimizer)

Model took 13.49 seconds to train
Accuracy on test data is: 92.23

Observation:

With the Adam optimizer, we don’t need to tune the learning rate by hand (Keras uses a sensible default). However, the training time increases. Refer to this tutorial for an interesting comparison of optimizers.
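
Adam does still use a learning rate (0.001 by default in Keras); what it adds is per-parameter scaling based on running estimates of the gradient's first and second moments. A scalar sketch of the update rule, using the standard defaults:

```python
from math import sqrt

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # running (biased) estimates of the gradient mean and squared magnitude
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    # correct the bias from zero-initializing m and v
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # effective step is lr scaled by the moment ratio
    return w - lr * m_hat / (sqrt(v_hat) + eps), m, v

# minimize f(w) = w**2 from w = 1.0
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print("w after 500 steps = %.4f" % w)
```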

Vary the Batch Size

# increase the batch size
# define model
model = simple_nn()
# define optimizer
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch with batch size 128)

Model took 6.99 seconds to train
Accuracy on test data is: 92.94
# decrease the batch size
# define model
model = simple_nn()
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=32, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch with batch size 32)

Model took 24.76 seconds to train
Accuracy on test data is: 91.38

Observation:

Increasing the batch size decreases the training time, since there are fewer weight updates per epoch, but it also reduces how much the model learns in each epoch.
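
The arithmetic behind this: with the 80/20 split, 48,000 samples are used for training, and the batch size determines how many weight updates happen in each epoch:

```python
train_samples = 48000  # 60,000 samples minus the 20% validation split

for batch_size in (32, 64, 128):
    updates_per_epoch = train_samples // batch_size
    print("batch size %3d: %4d updates per epoch" % (batch_size, updates_per_epoch))
```

Batch size 128 gives 375 updates per epoch versus 1500 for batch size 32, so each epoch runs faster but the weights are adjusted fewer times.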

Change the Cost Function to Categorical Crossentropy

# define model
model = simple_nn()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=10, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch with categorical crossentropy loss)

Model took 7.76 seconds to train
Accuracy on test data is: 93.27

Observation:

Changing the cost function to categorical crossentropy improves the accuracy reached in the same number of epochs, i.e. the model learns faster. The speed-up in learning is especially significant when using SGD for this experiment.
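
One reason cross-entropy pairs well with a softmax output is the shape of the loss: a wrong prediction is penalized much more heavily than under MSE, which gives stronger gradients early in training. A small NumPy comparison of the two losses on the same (uniform, i.e. maximally uncertain) prediction:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

target = np.zeros(10)
target[3] = 1.0                  # one-hot label, true class is 3
pred = softmax(np.ones(10))      # uniform prediction: 0.1 for each class

mse = np.mean((pred - target) ** 2)
crossentropy = -np.sum(target * np.log(pred))
print("mse = %.4f, crossentropy = %.4f" % (mse, crossentropy))
# mse = 0.0900, crossentropy = 2.3026
```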

Increase the Number of Epochs

# define model
model = simple_nn()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=100, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch over 100 epochs)

Model took 86.18 seconds to train
Accuracy on test data is: 95.39

Observation:

As the number of epochs increases, the model keeps learning. Both training and validation accuracy rise and then stabilize.

Deep Neural Network

def deep_nn():
    # Define a deep neural network
    model = Sequential()
    model.add(Dense(512, input_dim=num_input_nodes))
    model.add(Activation('relu'))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    return model
model = deep_nn()
# Define optimizer
sgd = SGD(lr=0.1)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=100, verbose=0, validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
print "Model took %0.2f seconds to train"%(end - start)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)

(figure: accuracy and loss vs. epoch for the deep network)

Model took 94.52 seconds to train
Accuracy on test data is: 97.16

Observation:

Adding more hidden layers increases the training time as well as the model's capacity to learn in each epoch. It helps to improve performance on complex tasks, but may not help significantly on relatively simple datasets such as MNIST.
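
The same weights-plus-biases count as before shows how much bigger this model is, roughly seven times the single-hidden-layer network:

```python
def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights plus biases

layer_sizes = [784, 512, 256, 128, 64, 10]
total = sum(dense_params(n_in, n_out)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 575050 trainable parameters
```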

Save model every 10th epoch

from keras.callbacks import ModelCheckpoint
import os

# define a deep neural network
model = deep_nn()
# define optimizer
sgd = SGD(lr=0.1)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# checkpoint
outputFolder = './output-mnist'
if not os.path.exists(outputFolder):
    os.makedirs(outputFolder)
filepath=outputFolder+"/weights-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, \
                             save_best_only=False, save_weights_only=True, \
                             mode='auto', period=10)
callbacks_list = [checkpoint]
# train the model
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=80, callbacks=callbacks_list, verbose=0, \
                       validation_split=0.2)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)
Epoch 00009: saving model to ./output-mnist/weights-09-0.92.hdf5
Epoch 00019: saving model to ./output-mnist/weights-19-0.96.hdf5
Epoch 00029: saving model to ./output-mnist/weights-29-0.95.hdf5
Epoch 00039: saving model to ./output-mnist/weights-39-0.96.hdf5
Epoch 00049: saving model to ./output-mnist/weights-49-0.97.hdf5
Epoch 00059: saving model to ./output-mnist/weights-59-0.97.hdf5
Epoch 00069: saving model to ./output-mnist/weights-69-0.97.hdf5
Epoch 00079: saving model to ./output-mnist/weights-79-0.97.hdf5
Accuracy on test data is: 97.45
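
The saved filenames come from the {epoch:02d} and {val_acc:.2f} placeholders in filepath, which Keras fills in with ordinary Python string formatting:

```python
filepath = "./output-mnist/weights-{epoch:02d}-{val_acc:.2f}.hdf5"
print(filepath.format(epoch=9, val_acc=0.9234))
# ./output-mnist/weights-09-0.92.hdf5
```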

Resume training by loading a saved model

# define model
model = deep_nn()
# load weights
import os, glob
epoch_num = 79
outputFolder = './output-mnist'
file_ini = outputFolder+'/weights-'+ str(epoch_num)+'*'
filenames = glob.glob(file_ini)
if filenames and os.path.isfile(filenames[0]):
    model.load_weights(filenames[0])
else:
    print "No weights file matching %s was found"%file_ini
# define optimizer
sgd = SGD(lr=0.1)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# checkpoint
outputFolder = './output-mnist'
if not os.path.exists(outputFolder):
    os.makedirs(outputFolder)
filepath=outputFolder+"/weights-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, \
                             save_best_only=False, save_weights_only=True,\
                             mode='auto', period=10)
callbacks_list = [checkpoint]
# train the model
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=100, callbacks=callbacks_list, verbose=0, \
                       validation_split=0.2, initial_epoch = epoch_num+1)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)
Epoch 00089: saving model to ./output-mnist/weights-89-0.97.hdf5
Epoch 00099: saving model to ./output-mnist/weights-99-0.97.hdf5
Accuracy on test data is: 97.32
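
Hard-coding epoch_num is brittle if the run was interrupted at an unknown point. A small helper (hypothetical, not part of Keras, but matching the filename pattern used above) can pick out the newest checkpoint:

```python
import os
import re

def latest_checkpoint(folder):
    # return (epoch, path) for the highest-numbered weights file, or None
    best = None
    for name in os.listdir(folder):
        match = re.match(r"weights-(\d+)-", name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[0]:
                best = (epoch, os.path.join(folder, name))
    return best
```

After the runs above, latest_checkpoint('./output-mnist') would return the epoch-99 weights file.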

Early Stopping

from keras.callbacks import EarlyStopping
#define model
model = deep_nn()
# define optimizer
sgd = SGD(lr=0.1)
model.compile(optimizer=sgd, loss='mse', metrics=['accuracy'])
# define early stopping callback
earlystop = EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=5, \
                          verbose=1, mode='auto')
callbacks_list = [earlystop]
# train the model
start = time.time()
model_info = model.fit(train_features, train_labels, batch_size=128, \
                       nb_epoch=100, callbacks=callbacks_list, verbose=0, \
                       validation_split=0.2)
end = time.time()
# plot model history
plot_model_history(model_info)
# compute test accuracy
print "Accuracy on test data is: %0.2f"%accuracy(test_features, test_labels, model)
Epoch 00054: early stopping

(figure: accuracy and loss vs. epoch with early stopping)

Accuracy on test data is: 96.19
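
The callback's rule is simple to state: stop once the monitored quantity has failed to improve by at least min_delta for patience consecutive epochs. A pure-Python sketch of that logic (illustrative; Keras's internal bookkeeping differs slightly):

```python
def early_stop_epoch(val_accs, min_delta=0.0001, patience=5):
    # return the epoch index at which training would stop, or None
    best = -float("inf")
    wait = 0
    for epoch, acc in enumerate(val_accs):
        if acc > best + min_delta:   # meaningful improvement: reset the counter
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# accuracy plateaus at 0.96 after epoch 3, so training stops 5 epochs later
print(early_stop_epoch([0.90, 0.93, 0.95, 0.96] + [0.96] * 10))  # 8
```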