[Deep Learning Nanodegree Chapter 2] Neural Networks

2. Neural Networks

In this chapter, instructors teach us the foundations of deep learning and neural networks.

For the basic concepts of deep learning, the instructors explain the perceptron, loss functions, activation functions, and gradient descent. They also teach useful techniques for training neural networks, such as early stopping, regularization, and dropout.

This chapter includes a mini-project, Sentiment Analysis, with Andrew Trask, the author of Grokking Deep Learning, and the main project, Predicting Bike-Sharing Patterns.

[toc]

Official course description

In this part, you’ll learn how to build a simple neural network from scratch using Python. We’ll cover the algorithms used to train networks, such as gradient descent and backpropagation.

The first project is also available this week. In this project, you’ll predict bike ridership using a simple neural network.

[image: Multi-layer neural network with some inputs and a single output. Image from Stanford’s cs231n course.]

Sentiment Analysis

You’ll also learn about model evaluation and validation, important techniques for training and assessing neural networks. We also have guest instructor Andrew Trask, author of Grokking Deep Learning, developing a neural network for processing text and predicting sentiment. The exercises in each chapter of his book are also available in his GitHub repository.

Deep Learning with PyTorch

The last lesson will be all about using the deep learning framework PyTorch. Here, you’ll learn how to use the Tensor datatype and gain the foundational knowledge you’ll need to define and train your own deep learning models in PyTorch!

Foundations of Deep Learning

Perceptron

As the simplest feed-forward network, the perceptron determines its output from the input values and their weights. It can be expressed as an equation like $\hat{y} = f\left(\sum_{i} w_i x_i + b\right)$, where $f$ is a step function that outputs 1 if the weighted sum is above 0 and 0 otherwise.

[image: perceptron diagram]
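To make the equation concrete, here is a minimal NumPy sketch of a perceptron’s forward pass; the AND-gate weights and bias are illustrative assumptions, not values from the course:

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron forward pass: weighted sum followed by a step function."""
    weighted_sum = np.dot(w, x) + b
    return 1 if weighted_sum > 0 else 0

# Illustrative example: a perceptron that acts like a logical AND gate.
w = np.array([1.0, 1.0])  # weights (assumed for illustration)
b = -1.5                  # bias chosen so only the input (1, 1) fires

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))
```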

Loss Functions

Cross-Entropy

In classification problems, cross-entropy is usually used as the loss function:

$$\text{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$$

where $y$ is the actual value (1 or 0) and $\hat{y}$ is the predicted probability (between 0 and 1).

So, if the actual value is 1, the cross-entropy is $-\log(\hat{y})$. And if the actual value is 0, the cross-entropy is $-\log(1 - \hat{y})$.
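A minimal NumPy sketch of binary cross-entropy; the clipping epsilon is my addition to avoid taking log(0):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels (0/1) and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # keep probabilities away from 0 and 1
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.4])
print(binary_cross_entropy(y_true, y_pred))  # small loss for confident correct predictions
```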

MSE (Mean Squared Error)

In regression problems, MSE is usually used as the loss function. It is a very simple loss function: the mean of the squared differences between the actual and predicted values.

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

where $y$ is the actual value and $\hat{y}$ is the predicted value.
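A matching NumPy sketch of MSE; the sample values are made up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mse(y_true, y_pred))  # 0.375
```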

Activation Functions

Sigmoid

The sigmoid function is used in binary-classification problems. It returns a value between 0 and 1:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

[image: sigmoid function]

source: http://krisbolton.com/a-quick-introduction-to-artificial-neural-networks-part-2
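A minimal NumPy sketch of the sigmoid and the (0, 1) range it produces:

```python
import numpy as np

def sigmoid(x):
    """Squash any real value into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.119..., 0.5, 0.880...]
```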

Softmax

The softmax function is used in multi-class classification problems. Each output is a value between 0 and 1, and the sum of the outputs is 1:

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

[image: softmax function]

source: http://krisbolton.com/a-quick-introduction-to-artificial-neural-networks-part-2
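A minimal NumPy sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick I added, not something from the course:

```python
import numpy as np

def softmax(x):
    """Convert raw scores into probabilities that sum to 1."""
    exps = np.exp(x - np.max(x))  # subtract max for numerical stability
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # e.g. [0.659, 0.242, 0.099]
print(softmax(scores).sum())  # 1.0
```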

ReLU(Rectified Linear activation function)

One of the purposes of ReLU is to mitigate the vanishing gradient problem. When sigmoid is used as the activation function, the derivatives become too small to update the weights of deep layers. However, because ReLU’s derivative is either 0 or 1, the gradients are backpropagated well. ReLU is defined as $\text{ReLU}(x) = \max(0, x)$.

[image: ReLU function]

source: http://krisbolton.com/a-quick-introduction-to-artificial-neural-networks-part-2
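A minimal NumPy sketch of ReLU and its derivative:

```python
import numpy as np

def relu(x):
    """Pass positive values through unchanged; zero out negatives."""
    return np.maximum(0, x)

def relu_derivative(x):
    """Derivative is 1 for positive inputs and 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))             # [0. 0. 0. 2.]
print(relu_derivative(x))  # [0. 0. 0. 1.]
```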

Gradient Descent

To find the weights that minimize the cost, a neural network repeatedly adjusts its weights in the direction that decreases the cost. This algorithm is called gradient descent. Each step updates the weights as $w \leftarrow w - \alpha \frac{\partial J}{\partial w}$, where $\alpha$ is the learning rate and $J$ is the cost function.

[image: gradient descent on a cost surface]

source: http://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization/
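A minimal sketch of gradient descent on a one-dimensional quadratic cost; the cost function, starting weight, and learning rate are illustrative assumptions:

```python
def cost(w):
    """Illustrative cost: J(w) = (w - 3)^2, minimized at w = 3."""
    return (w - 3) ** 2

def gradient(w):
    """Analytic derivative of the cost: dJ/dw = 2 * (w - 3)."""
    return 2 * (w - 3)

w = 0.0             # starting weight (assumed)
learning_rate = 0.1

for step in range(50):
    w -= learning_rate * gradient(w)  # move against the gradient

print(w, cost(w))  # w converges toward 3, cost toward 0
```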

Sentiment Analysis

In this mini-project, I build a simple fully connected neural network (or feed-forward network, FFN) with a single hidden layer of size 100.

[image: sentiment analysis network architecture]

The input data is IMDB review text. Before training, every review is pre-processed: each word is counted by how many times it appears in positive reviews and in negative reviews, and the ratio of positive to negative occurrences is calculated and used as the input data.
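A rough sketch of that pre-processing idea, assuming whitespace tokenization and a log of the positive-to-negative count ratio; the toy reviews and the +1 smoothing are my assumptions:

```python
from collections import Counter
import numpy as np

# Illustrative toy data; the real mini-project uses the IMDB review dataset.
reviews = ["great movie loved it", "terrible movie hated it"]
labels = ["POSITIVE", "NEGATIVE"]

positive_counts, negative_counts = Counter(), Counter()
for review, label in zip(reviews, labels):
    for word in review.split():
        if label == "POSITIVE":
            positive_counts[word] += 1
        else:
            negative_counts[word] += 1

# Positive-to-negative ratio per word (+1 in the denominator avoids division by zero).
pos_neg_ratios = {
    word: np.log(positive_counts[word] / (negative_counts[word] + 1.0))
    for word in positive_counts
}
print(pos_neg_ratios)
```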

The activation function is sigmoid, and the loss function is the simple error ($y - \hat{y}$).
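To make the setup concrete, here is a minimal sketch of a forward pass through such a network; the input size, random initialization, and the linear hidden layer are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, output_size = 10, 100, 1  # hidden size 100 from the project; input size assumed

# Small random weights (illustrative initialization).
weights_0_1 = rng.normal(0, 0.1, (input_size, hidden_size))
weights_1_2 = rng.normal(0, 0.1, (hidden_size, output_size))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Forward pass: hidden layer (kept linear here for simplicity), sigmoid output."""
    hidden = x @ weights_0_1
    return sigmoid(hidden @ weights_1_2)

x = rng.random(input_size)  # one pre-processed review vector (toy)
print(forward(x))           # predicted sentiment probability
```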

The following image is the result of the training:

[image: training result]

[Project] Predicting Bike-Sharing Patterns

In this project, you’ll get to build a neural network from scratch to carry out a prediction problem on a real dataset! By building a neural network from the ground up, you’ll have a much better understanding of gradient descent, backpropagation, and other concepts that are important to know before we move to higher-level tools such as PyTorch. You’ll also get to see how to apply these networks to solve real prediction problems!

The data comes from the UCI Machine Learning Repository.