3. Convolutional Neural Networks

This chapter teaches us one of the famous deep learning architecture, Convolutional Neural Networks(CNNs).

This chapter includes not only how to construct CNNs using PyTorch but also transfer learning and autoencoders.

[toc]

Official course description

Convolutional networks have achieved state of the art results in computer vision. These types of networks can detect and identify objects in images. You’ll learn how to build convolutional networks in PyTorch.

You’ll also get the second project, where you’ll build a convolutional network to classify dog breeds in pictures.

Structure of a convolutional neural network.

You’ll also use convolutional networks to build an autoencoder, a network architecture used for image compression and denoising. Then, you’ll use a pre-trained neural network, to classify images the network has never seen before, a technique known as transfer learning.

Convolutional Neural Networks(CNNs)

Basic structure of CNN

Convolutional layer

The convolutional layer projects the input on output using a filter.

source: https://www.researchgate.net/figure/Outline-of-the-convolutional-layer_fig1_323792694

source: https://www.quora.com/What-are-convolutional-layers

Pooling layer

To avoid over-fitting and reduce features, extracts max(or in) value in the feature map.

source: https://kevinthegrey.tistory.com/142

Code Samples

Define Network

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer
        # output shape is (20, 16, 32, 32)
        # 32+(2x2) - (5-1) = 32 + 4 - 4 = 32
        self.conv1 = nn.Conv2d(3, 16, 5, padding=2)
        # max pooling layer
        # output shape is (20, 16, 16, 16)
        self.pool1 = nn.MaxPool2d(2, 2)
        # convolutional layer
        # output shape is (20, 32, 16, 16)
        # 16+(2x2) - (5-1) = 16 + 4 - 4 = 16
        self.conv2 = nn.Conv2d(16, 32, 5, padding=2)
        # max pooling layer
        # output shape is (20, 32, 8, 8)
        self.pool2 = nn.MaxPool2d(2, 2)
        
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool3 = nn.MaxPool2d(2, 2)
        
        self.fc1 = nn.Linear(64*4*4, 100)
        self.fc2 = nn.Linear(100, 10)
        self.dropout = nn.Dropout(0.3)
        
        
    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool3(F.relu(self.conv3(x)))
        x = x.view(-1, 64*4*4)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

Training

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        start = time.time()        
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            optimizer.zero_grad()
            output = model(data)
            
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data - train_loss))
            
        ######################    
        # validate the model #
        ######################
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## update the average validation loss
            output = model(data)
            loss = criterion(output, target)
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data - valid_loss))

    # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        print('Time consumed %.4f seconds' % (time.time() - start))
        
        ## TODO: save the model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print('Valid loss decreased (%.6f --> %.6f).' % (valid_loss_min, valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
            
    # return trained model
    return model
  
model = train(50, loaders_scratch, model_scratch, optimizer_scratch, criterion_scratch, use_cuda, 'model_scratch.pt')

Testing

def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
    
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)

Transfer Learning

Transfer learning involves taking a pre-trained neural network and adapting the neural network to a new, different data set.

Depending on both:

The size of the new data set, and
The similarity of the new data set to the original data set

The approach for using transfer learning will be different. There are four main cases:

New data set is small, new data is similar to original training data.
New data set is small, new data is different from original training data.
New data set is large, new data is similar to original training data.
New data set is large, new data is different from original training data.

A large data set might have one million images. A small data could have two-thousand images. The dividing line between a large data set and small data set is somewhat subjective. Overfitting is a concern when using transfer learning with a small data set.

Images of dogs and images of wolves would be considered similar; the images would share common characteristics. A data set of flower images would be different from a data set of dog images.

Each of the four transfer learning cases has its own approach. In the following sections, we will look at each case one by one.

The graph below displays what approach is recommended for each of the four main cases.

Autoencoder

Convolutional Autoencoder consists of the encoder part and the decoder part. The encoder part is a typical convolutional network. This part compresses information of original input. The decoder part is a reverse of the typical convolutional network. The decoder uses Transpose Convolution layers to convert from compressed vector to original information.

Style transfer project is a good example using autoencoder. Decoder trained by other style images can convert compressed information using the encoder. In this case, the original image converts the same image but different styles.

source: https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694

[Project] Dog-Breed Classifier

Welcome to the Convolutional Neural Networks (CNN) project! In this project, you will learn how to build a pipeline to process real-world, user-supplied images. Given an image of a dog, your algorithm will identify an estimate of the canine’s breed. If supplied an image of a human face, the code will identify the resembling dog breed.

Along with exploring state-of-the-art CNN models for classification, you will make important design decisions about the user experience for your app. By completing this lab, you demonstrate your understanding of the challenges involved in piecing together a series of models designed to perform various tasks in a data processing pipeline.

Each model has its strengths and weaknesses, and engineering a real-world application often involves solving many problems without a perfect answer. Your imperfect solution will nonetheless create a fun user experience!

Nanodegree Deep Learning