一:What Is a Gradient
1、Clarification
Three related notions are easy to confuse: the derivative (of a scalar function of one variable), the partial derivative (differentiating a multivariate function along one coordinate), and the gradient (the vector collecting all partial derivatives).

2、What does grad mean?
For f: Rⁿ → R, the gradient stacks all partial derivatives into a vector, ∇f = (∂f/∂x₁, …, ∂f/∂xₙ). It points in the direction of steepest ascent, and its length measures how fast f grows along that direction.
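
A quick autograd check of the definition (the toy function f(x, y) = x² + 2y is my own example, not from the notes):

```python
import torch

# Gradient of f(x, y) = x**2 + 2*y at (x, y) = (1, 2).
# Analytically: df/dx = 2x = 2 and df/dy = 2.
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
f = x ** 2 + 2 * y
f.backward()                     # fills x.grad and y.grad
print(x.grad, y.grad)            # tensor(2.) tensor(2.)
```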


3、How to search for minima?
Walk against the gradient: θ′ = θ − lr · ∇f(θ). Iterating this update moves the parameters downhill until the gradient, and hence the step, becomes negligible.
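
A minimal gradient-descent sketch (the target function and learning rate are my choice for illustration):

```python
import torch

# Minimize f(x) = (x - 3)**2 from x = 0 with the update x <- x - lr * grad.
x = torch.tensor(0.0, requires_grad=True)
lr = 0.1
for step in range(100):
    f = (x - 3) ** 2
    f.backward()                 # populate x.grad with df/dx
    with torch.no_grad():
        x -= lr * x.grad         # theta' = theta - lr * grad(theta)
    x.grad.zero_()               # backward() accumulates, so clear it
print(x.item())                  # ~3.0, the minimizer
```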


4、Learning process
Training repeats a two-step loop: (1) run the model forward and compute the loss; (2) compute the loss's gradient and update the parameters against it.
5、Convex function
A convex function curves upward everywhere, so any local minimum is the global one and gradient descent with a sensible learning rate will find it. Deep-network losses, unfortunately, are not convex.

6、Local Minima
On a non-convex surface, gradient descent can settle into a local minimum: a point that beats its neighborhood but may be far worse than the global optimum. Which minimum is reached depends on where you start and how you step.

7、ResNet-56
Real loss surfaces are extreme versions of this: the widely reproduced visualization of ResNet-56's loss landscape (Li et al., "Visualizing the Loss Landscape of Neural Nets", 2018) shows a jagged terrain of hills and valleys, and the same paper shows that skip connections smooth it considerably.


8、Saddle point
A saddle point has zero gradient but is no minimum: the surface rises along some directions and falls along others, like the center of a horse saddle. In high dimensions saddle points vastly outnumber local minima, and plain gradient descent slows to a crawl near them.


9、Optimizer Performance
How fast gradient descent converges, and how good a minimum it finds, depends on several factors, each illustrated in the sections below:
▪ initialization status
▪ learning rate
▪ momentum
▪ etc.
10、Initialization
The starting point decides which basin of the surface the trajectory falls into, so different initializations can end in very different minima. Principled schemes such as Xavier or Kaiming initialization also keep activations well scaled from layer to layer.
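
A sketch of the common initializers via torch.nn.init (the layer sizes are arbitrary):

```python
import torch.nn as nn

layer = nn.Linear(784, 256)
nn.init.kaiming_normal_(layer.weight)   # He init, suited to ReLU nets
nn.init.zeros_(layer.bias)

layer2 = nn.Linear(256, 10)
nn.init.xavier_uniform_(layer2.weight)  # Xavier init, suited to sigmoid/tanh
```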



11、Learning rate
Too large a step overshoots the valley floor and oscillates or diverges; too small a step converges painfully slowly. Common practice is to start small (e.g. 1e-3), tune from there, and decay the rate as training progresses.
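
A toy comparison (the function and both rates are my choice): the same descent on f(x) = x² converges for lr = 0.1 but diverges for lr = 1.1:

```python
import torch

for lr in (0.1, 1.1):
    x = torch.tensor(5.0, requires_grad=True)
    for _ in range(20):
        f = x ** 2
        f.backward()
        with torch.no_grad():
            x -= lr * x.grad     # here x <- x * (1 - 2 * lr)
        x.grad.zero_()
    print(f"lr={lr}: x={x.item():.3f}")   # ~0.058 vs ~191.7 (diverged)
```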


12、Escape minima
Momentum, an exponentially decaying average of past gradients, gives the update inertia: the trajectory can coast through shallow local minima and flat stretches instead of stopping at the first zero-gradient point.
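
A sketch of momentum with the built-in optimizer (the parameter shape and the value 0.9 are my choice):

```python
import torch
from torch import optim

w = torch.randn(10, requires_grad=True)
optimizer = optim.SGD([w], lr=0.01, momentum=0.9)  # keeps a running velocity

loss = (w ** 2).sum()
loss.backward()
optimizer.step()          # update uses the momentum-smoothed gradient
optimizer.zero_grad()
```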

二:Activation Functions
1、Activation Functions
An activation function adds a non-linearity after each linear transformation; without one, stacked linear layers collapse into a single linear map. The earliest choice, the step function of the original perceptron, has zero derivative everywhere except at 0 (where it is not differentiable at all), so it offers no gradient to descend; the smooth activations below were adopted instead.




2、Sigmoid
σ(x) = 1 / (1 + e^{-x}) squashes any real input into (0, 1), a natural range for probabilities.

Derivative
σ′(x) = σ(x)(1 − σ(x)): cheap to compute from the forward value, but it vanishes for large |x|, which can stall learning (the vanishing-gradient problem).

torch.sigmoid
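
A minimal sketch of torch.sigmoid with its derivative recovered through autograd:

```python
import torch

x = torch.linspace(-5, 5, 5, requires_grad=True)
y = torch.sigmoid(x)
y.sum().backward()
print(y)                  # all values in (0, 1)
print(x.grad)             # equals y * (1 - y), tiny at the two ends
```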



3、Tanh
tanh(x) = (e^x − e^{-x}) / (e^x + e^{-x}), a scaled and shifted sigmoid, squashes inputs into (−1, 1) and is zero-centered; it is popular in RNNs.

Derivative
tanh′(x) = 1 − tanh²(x), again computable from the forward value, and again vanishing for large |x|.

torch.tanh
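
The same check for torch.tanh:

```python
import torch

x = torch.linspace(-3, 3, 5, requires_grad=True)
y = torch.tanh(x)
y.sum().backward()
print(y)                  # all values in (-1, 1)
print(x.grad)             # equals 1 - y ** 2
```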

4、Rectified Linear Unit
ReLU(x) = max(0, x): zero for negative inputs, the identity for positive ones. It is the default activation in modern deep networks.

Derivative
ReLU′(x) = 0 for x < 0 and 1 for x > 0 (undefined at 0; frameworks use 0 there). Because the gradient is exactly 1 on the active side, it neither vanishes nor explodes as it flows backward, which is a large part of why ReLU trains deep networks so well.

F.relu
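
And for F.relu, whose gradient is exactly 0 or 1:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
y = F.relu(x)
y.sum().backward()
print(y)                  # tensor([0.0000, 0.0000, 0.5000, 2.0000])
print(x.grad)             # tensor([0., 0., 1., 1.])
```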

三:Loss and Its Gradient
1、Typical Loss
▪ Mean Squared Error
▪ Cross Entropy Loss
  ▪ binary
  ▪ multi-class
  ▪ usually paired with softmax
  ▪ details are left to the Logistic Regression part
2、MSE
loss = Σ (y − ŷ)² for prediction ŷ (for a linear model, ŷ = xw + b). Note that F.mse_loss averages rather than sums by default, and that the L2 norm ‖y − ŷ‖₂ is the square root of the summed version.

Derivative
∂loss/∂θ = 2 Σ (ŷ − y) · ∂ŷ/∂θ by the chain rule. PyTorch derives this automatically through either of two APIs:

autograd.grad
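
A minimal sketch of torch.autograd.grad on the MSE above (the data and shapes are my choice):

```python
import torch
import torch.nn.functional as F

x, y = torch.ones(3), torch.zeros(3)
w = torch.full([1], 2.0, requires_grad=True)
loss = F.mse_loss(x * w, y)                 # mean((x*w - y)**2) = 4
(dw,) = torch.autograd.grad(loss, [w])      # returns grads as a tuple
print(dw)                                   # tensor([4.]) = mean(2*(xw - y)*x)
```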

loss.backward
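
The same gradient via loss.backward(), which writes into w.grad instead of returning it:

```python
import torch
import torch.nn.functional as F

x, y = torch.ones(3), torch.zeros(3)
w = torch.full([1], 2.0, requires_grad=True)
loss = F.mse_loss(x * w, y)
loss.backward()                             # accumulates into .grad
print(w.grad)                               # tensor([4.]), same as above
```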

Gradient API
torch.autograd.grad(loss, [w1, w2, …]) returns the gradients explicitly, which is convenient for one-off use; loss.backward() instead writes them into each parameter's .grad attribute, which is what optimizers read. Since backward() accumulates, clear .grad between iterations.

3、Softmax
A "soft version of max": p_i = e^{a_i} / Σ_j e^{a_j} converts arbitrary logits into a probability distribution (every p_i ∈ (0, 1) and Σ_i p_i = 1) while widening the gap between the largest logit and the rest.

Derivative
∂p_i/∂a_j = p_i (1 − p_j) when i = j, and −p_i p_j when i ≠ j; compactly, ∂p_i/∂a_j = p_i (δ_ij − p_j). The Jacobian is positive only on the diagonal.

F.softmax
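
A minimal sketch of F.softmax and one row of its Jacobian (the 3-logit setup is my choice):

```python
import torch
import torch.nn.functional as F

a = torch.rand(3, requires_grad=True)
p = F.softmax(a, dim=0)
print(p.sum())                              # tensor(1.)

# Row i=1 of the Jacobian dp/da: p1*(1-p1) at j=1, -p1*pj for j != 1.
(grad,) = torch.autograd.grad(p[1], [a])
print(grad)                                 # positive only at index 1
```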

四:Gradient Derivation for the Perceptron
1、Derivative
For a single-output perceptron O₀ = σ(Σ_j x_j w_j0) with target t and loss E = ½(O₀ − t)², the chain rule with σ′ = σ(1 − σ) gives

∂E/∂w_j0 = (O₀ − t) · O₀ (1 − O₀) · x_j

so each weight's gradient involves only the overall output error and that weight's own input x_j.
2、Perceptron
The derivation can be checked against autograd, as in the sketch below.
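
A sketch of the single-output perceptron (the 10-input shape is my choice):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
w = torch.randn(1, 10, requires_grad=True)
o = torch.sigmoid(x @ w.t())                # O0, shape [1, 1]
loss = F.mse_loss(o, torch.ones(1, 1))
loss.backward()
print(w.grad.shape)                         # torch.Size([1, 10]): one grad per w_j0
```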



3、Multi-output Perceptron
With K sigmoid outputs O_k = σ(Σ_j x_j w_jk) and loss E = ½ Σ_k (O_k − t_k)², each weight w_jk feeds exactly one output node.

4、Derivative
Because w_jk affects only O_k, the sum over outputs collapses in the chain rule:

∂E/∂w_jk = (O_k − t_k) · O_k (1 − O_k) · x_j

which is the single-output formula indexed by the output node k.

5、Multi-output Perceptron
The same check with autograd, now with a full weight matrix (see below).
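
The multi-output version (again, the shapes are my choice):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
w = torch.randn(4, 10, requires_grad=True)
o = torch.sigmoid(x @ w.t())                # shape [1, 4], one O_k per output
loss = F.mse_loss(o, torch.ones(1, 4))
loss.backward()
print(w.grad.shape)                         # torch.Size([4, 10]): one grad per w_jk
```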

