Optimization Algorithms - Gradient Descent

Stochastic Gradient Descent

In earlier chapters we used stochastic gradient descent throughout training without explaining why it works. This section describes stochastic gradient descent in more detail.

%matplotlib inline
import math
import torch
from d2l import torch as d2l

1 - Stochastic Gradient Updates

def f(x1, x2):  # Objective function
    return x1 ** 2 + 2 * x2 ** 2

def f_grad(x1, x2):  # Gradient of the objective function
    return 2 * x1, 4 * x2

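def sgd(x1, x2, s1, s2, f_grad):
    g1, g2 = f_grad(x1, x2)
    # Simulate a noisy gradient by adding zero-mean, unit-variance Gaussian noise
    g1 += torch.normal(0.0, 1, (1,))
    g2 += torch.normal(0.0, 1, (1,))
    # eta and lr are globals defined below; lr() returns the current multiplier
    eta_t = eta * lr()
    # The last two values are the unused state variables s1 and s2
    return (x1 - eta_t * g1, x2 - eta_t * g2, 0, 0)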

def constant_lr():
    return 1

eta = 0.1
lr = constant_lr  # Constant learning rate
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=50, f_grad=f_grad))

epoch 50, x1: 0.022305, x2: 0.014646
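
The same experiment can be sketched without the `d2l` helpers. The version below is only an illustrative stand-in, not the code that produced the output above: `noisy_sgd_trace` is a hypothetical name, the noise comes from Python's `random.gauss` instead of `torch.normal`, and the start point (-5, -2) is assumed to match the one used by `d2l.train_2d`.

```python
import random

def f_grad(x1, x2):
    """Gradient of f(x1, x2) = x1**2 + 2 * x2**2."""
    return 2 * x1, 4 * x2

def noisy_sgd_trace(eta=0.1, steps=50, start=(-5.0, -2.0), seed=0):
    """Run SGD with unit-variance Gaussian noise added to each gradient component.

    The start point and seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    x1, x2 = start
    trace = [(x1, x2)]
    for _ in range(steps):
        g1, g2 = f_grad(x1, x2)
        # Simulate a noisy gradient
        g1 += rng.gauss(0.0, 1.0)
        g2 += rng.gauss(0.0, 1.0)
        # Constant learning rate: the step size never shrinks
        x1, x2 = x1 - eta * g1, x2 - eta * g2
        trace.append((x1, x2))
    return trace

trace = noisy_sgd_trace()
final_x1, final_x2 = trace[-1]
print(f"step {len(trace) - 1}, x1: {final_x1:.6f}, x2: {final_x2:.6f}")
```

Because the step size stays fixed, the iterates keep jittering around (0, 0) instead of settling on it, and the exact final values depend on the random seed.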



2 - Dynamic Learning Rate

def exponential_lr():
    # t is a global variable defined outside this function and updated inside it
    global t
    t += 1
    return math.exp(-0.1 * t)

t = 1
lr = exponential_lr
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=1000, f_grad=f_grad))

epoch 1000, x1: -0.866794, x2: 0.028221



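With exponential decay, the variance of the parameters is greatly reduced, as expected. However, this comes at the cost of failing to converge to the optimal solution x = (0, 0). Even after 1000 iterations we are still far from the optimum. In fact, the algorithm fails to converge at all. On the other hand, if we use polynomial decay, where the learning rate decays with the inverse square root of the number of steps, convergence is already better after only 50 iterations.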
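
One way to see why the exponential schedule stalls is to drop the noise and look only at how far the step sizes can carry the iterate. The sketch below is an illustration under stated assumptions: the schedule functions take `t` explicitly (hypothetical variants of the global-`t` functions in this post), `t` runs from 2 because the global counter starts at 1 and is incremented before use, and the start value x1 = -5 is assumed to match `d2l.train_2d`.

```python
import math

def exponential_lr(t, eta0=0.1, lam=0.1):
    """Learning rate eta0 * exp(-lam * t)."""
    return eta0 * math.exp(-lam * t)

def polynomial_lr(t, eta0=0.1, beta=0.1, alpha=0.5):
    """Learning rate eta0 * (1 + beta * t) ** (-alpha)."""
    return eta0 * (1 + beta * t) ** (-alpha)

def noise_free_x1(schedule, steps, x1=-5.0):
    """Run gradient descent on x1**2 (gradient 2 * x1) with no gradient noise."""
    for t in range(2, steps + 2):
        x1 -= schedule(t) * 2 * x1
    return x1

def total_step_size(schedule, steps):
    """Sum of all learning rates used over the run."""
    return sum(schedule(t) for t in range(2, steps + 2))

exp_total = total_step_size(exponential_lr, 1000)
poly_total = total_step_size(polynomial_lr, 50)
exp_x1 = noise_free_x1(exponential_lr, 1000)
poly_x1 = noise_free_x1(polynomial_lr, 50)

print(f"exponential: total step size {exp_total:.3f}, x1 {exp_x1:.3f}")
print(f"polynomial:  total step size {poly_total:.3f}, x1 {poly_x1:.3f}")
```

Under these assumptions the exponential step sizes sum to roughly 0.86 however many steps are taken, so even the noise-free iterate stops near x1 ≈ -0.8, in line with the stalled value reported above. The polynomial step sizes keep accumulating, which lets the iterate get much closer to 0 within 50 steps.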

def polynomial_lr():
    # t is a global variable defined outside this function and updated inside it
    global t
    t += 1
    return (1 + 0.1 * t) ** (-0.5)

t = 1
lr = polynomial_lr
d2l.show_trace_2d(f, d2l.train_2d(sgd, steps=50, f_grad=f_grad))

epoch 50, x1: 0.064155, x2: 0.037703



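There are many more ways to set the learning rate. For example, we could start with a small rate, ramp it up quickly, and then decrease it again, albeit more slowly. We could even alternate between smaller and larger learning rates. Such schedules come in a wide variety.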
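
As a rough illustration of the "start small, ramp up, then decrease slowly" idea, the sketch below defines a hypothetical `warmup_then_decay_lr` multiplier. The name, the linear ramp, and the parameter values are assumptions chosen for this example; unlike the `lr` functions in this post, it takes the step `t` as an explicit argument instead of reading a global counter.

```python
def warmup_then_decay_lr(t, warmup_steps=10, beta=0.1, alpha=0.5):
    """Multiplier that ramps linearly up to 1, then decays polynomially.

    All parameter values here are illustrative assumptions.
    """
    if t < warmup_steps:
        # Warm-up phase: 1/warmup_steps, 2/warmup_steps, ..., 1
        return (t + 1) / warmup_steps
    # Decay phase: same polynomial form as polynomial_lr, counted from the end of warm-up
    return (1 + beta * (t - warmup_steps + 1)) ** (-alpha)

multipliers = [warmup_then_decay_lr(t) for t in range(50)]
for t in (0, 4, 9, 10, 49):
    print(f"t={t:2d}  multiplier={multipliers[t]:.3f}")
```

To try it with the `sgd` function from this post, it would have to be wrapped in a zero-argument function that keeps its own step counter, as `exponential_lr` and `polynomial_lr` do.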

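For now, let us focus on learning rate schedules for which a comprehensive theoretical analysis is possible, namely learning rates in the convex setting. For general nonconvex problems it is very hard to obtain meaningful convergence guarantees, since minimizing nonlinear nonconvex problems is in general NP-hard.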

3 - Convergence Analysis for Convex Objectives

4 - Stochastic Gradients and Finite Samples

5 - Summary

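For convex problems, we can prove that stochastic gradient descent converges to the optimal solution for a wide range of learning rate choices.
For deep learning this is generally not the case. Nevertheless, the analysis of convex problems gives useful insight into how to approach optimization: reduce the learning rate progressively, but not too quickly.
Problems arise when the learning rate is too small or too large. In practice, a suitable learning rate is often found only after multiple experiments.
When the training dataset contains more examples, each iteration of gradient descent costs more to compute, so stochastic gradient descent is preferred in these cases.
Optimality guarantees for stochastic gradient descent are in general not available in nonconvex cases, since the number of local minima that need to be checked may be exponential.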
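
The cost argument in the summary can be made concrete with a toy objective. The sketch below assumes a hypothetical one-dimensional problem f(x) = (1/n) Σ 0.5 (x - a_i)², which does not appear in this post: the full gradient touches all n samples, while a stochastic gradient touches one uniformly sampled term and still matches the full gradient on average.

```python
import random

def full_gradient(x, data):
    """Gradient of (1/n) * sum_i 0.5 * (x - a_i)**2; cost grows with len(data)."""
    return sum(x - a for a in data) / len(data)

def stochastic_gradient(x, data, rng):
    """Gradient of one uniformly sampled term; cost does not depend on len(data)."""
    a = rng.choice(data)
    return x - a

rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # synthetic samples
x = 2.0

exact = full_gradient(x, data)
# Averaging many single-sample gradients approximates the full gradient
num_draws = 10_000
estimate = sum(stochastic_gradient(x, data, rng) for _ in range(num_draws)) / num_draws

print(f"full gradient: {exact:.4f}, averaged stochastic gradient: {estimate:.4f}")
```

Each individual stochastic gradient is noisy, which is the role played by the added Gaussian noise in the `sgd` function earlier in this post.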


Original article: https://blog.csdn.net/mynameisgt/article/details/126860805