• Hyperopt: Distributed Asynchronous Hyperparameter Optimization


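    1. Overview

    When training deep learning models, tuning hyperparameters is tedious. It is usually done with grid search or manual search, so the process can feel a bit like alchemy: more art than science.
    Is there a tool that can do the tuning automatically? Yes: Hyperopt, the subject of this article, is built for exactly that.
    Hyperopt is a Python library for serial and parallel optimization over complex search spaces, which may include real-valued, discrete, and conditional dimensions.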

    Hyperopt currently implements three algorithms:
    Random Search
    Tree of Parzen Estimators (TPE)
    Adaptive TPE
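
    To make the simplest of these three concrete, here is a minimal plain-Python sketch of random search. It does not use Hyperopt and is not its implementation: the function name random_search, the toy quadratic objective, and the fixed seed are all assumptions made for illustration.

```python
import random

def random_search(objective, low, high, max_evals, seed=0):
    """Plain random search: sample uniformly, keep the best point seen so far."""
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(max_evals):
        x = rng.uniform(low, high)   # draw a candidate from the search space
        loss = objective(x)          # evaluate it
        if loss < best_loss:         # remember the best candidate
            best_x, best_loss = x, loss
    return best_x, best_loss

# Hypothetical toy objective with its minimum at x = 3.
best_x, best_loss = random_search(lambda x: (x - 3) ** 2, -10, 10, max_evals=100)
print(best_x, best_loss)
```

    TPE differs in that it uses the losses already observed to decide where to sample next, rather than drawing every candidate independently.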

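    Hyperopt was designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but these are not implemented yet. All of the algorithms can be parallelized in two ways:
    Apache Spark
    MongoDB
    The first is a big-data processing engine; the second is a distributed database.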

    2. Installing hyperopt

    Install with pip (as before, adding the Douban mirror is recommended):

    pip3 install --user hyperopt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
    

    The following packages were installed:

    Successfully built future
    Installing collected packages: zipp, numpy, importlib-resources, decorator, tqdm, scipy, py4j, networkx, future, cloudpickle, hyperopt
    Successfully installed cloudpickle-2.2.1 decorator-4.4.2 future-0.18.3 hyperopt-0.2.7 importlib-resources-5.4.0 networkx-2.5.1 numpy-1.19.5 py4j-0.10.9.7 scipy-1.5.4 tqdm-4.64.1 zipp-3.6.0
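
    One way to confirm which version ended up installed is to ask the package metadata. The sketch below assumes Python 3.8 or newer, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("hyperopt")
print("hyperopt:", version if version else "not installed")
```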

    3. Testing

    3.1 hyperopt_test.py

    With the installation done, let's try an example:

    gedit hyperopt_test.py
    from hyperopt import fmin, tpe, space_eval, hp

    def objective(args):
        case, val = args
        if case == 'case 1':
            return val
        else:
            return val ** 2

    # define a search space
    space = hp.choice('a',
        [
            ('case 1', 1 + hp.lognormal('c1', 0, 1)),
            ('case 2', hp.uniform('c2', -10, 10))
        ])

    # minimize the objective over the space
    best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
    print(best)
    print(space_eval(space, best))

    best2 = fmin(fn=lambda x: x ** 2,
                 space=hp.uniform('x', -8, -2),
                 algo=tpe.suggest,
                 max_evals=200)
    print(best2)

    Run it:

    python3 hyperopt_test.py
    '''
    100%|██████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 269.38trial/s, best loss: 6.787702954398033e-05]
    {'a': 1, 'c2': -0.008238751698162794}
    ('case 2', -0.008238751698162794)
    100%|██████████████████████████████████████████████████████████████████████████████| 200/200 [00:00<00:00, 335.54trial/s, best loss: 4.000953693453848]
    {'x': -2.000238409153731}
    '''

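    Here objective is the objective function. fmin searches for the arguments that minimize it, and those arguments are the best parameters found.
    The result is returned as a dictionary. For a parameter defined with hp.choice, the dictionary holds the index of the selected option (here 'a': 1, the second branch), and space_eval converts it back to the actual value, ('case 2', -0.008...). A progress bar is shown during the search (pass verbose=False to disable it), reporting the number of evaluations and the best loss so far: the first call (best) runs 100 evaluations and the second (best2) runs 200. In the second search x is restricted to [-8, -2], so the smallest attainable loss is 4, at x=-2, which matches the output.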
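
    The search space in hyperopt_test.py is conditional: a branch is chosen first, and only that branch's parameter is drawn. The plain-Python sketch below mimics that shape with the standard random module. It is an illustration of the idea, not how Hyperopt samples internally, and the helper name sample_conditional_space is hypothetical.

```python
import random

def sample_conditional_space(rng):
    """Draw one point from a two-branch conditional space."""
    case = rng.choice(["case 1", "case 2"])
    if case == "case 1":
        # 1 + lognormal(0, 1): never drops below 1
        return case, 1 + rng.lognormvariate(0, 1)
    # uniform on [-10, 10]
    return case, rng.uniform(-10, 10)

def objective(args):
    case, val = args
    return val if case == "case 1" else val ** 2

rng = random.Random(0)
points = [sample_conditional_space(rng) for _ in range(1000)]
best_point = min(points, key=objective)
print(best_point, objective(best_point))
```

    Because the 'case 1' branch can never produce a loss below 1 while 'case 2' can get arbitrarily close to 0, the best point comes from 'case 2', which is also what the fmin run above reports.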

    3.2 The fmin function

    The key piece here is the fmin function. Its documentation can be viewed with help(fmin):

    fmin(fn, space, algo=None, max_evals=None, timeout=None, loss_threshold=None, trials=None, rstate=None, allow_trials_fmin=True, pass_expr_memo_ctrl=None, catch_eval_exceptions=False, verbose=True, return_argmin=True, points_to_evaluate=None, max_queue_len=1, show_progressbar=True, early_stop_fn=None, trials_save_file='')
        Minimize a function over a hyperparameter space.

        More realistically: *explore* a function over a hyperparameter space
        according to a given algorithm, allowing up to a certain number of
        function evaluations. As points are explored, they are accumulated in
        `trials`

        Parameters
        ----------
        fn : callable (trial point -> loss)
            This function will be called with a value generated from `space`
            as the first and possibly only argument. It can return either
            a scalar-valued loss, or a dictionary. A returned dictionary must
            contain a 'status' key with a value from `STATUS_STRINGS`, must
            contain a 'loss' key if the status is `STATUS_OK`. Particular
            optimization algorithms may look for other keys as well. An
            optional sub-dictionary associated with an 'attachments' key will
            be removed by fmin its contents will be available via
            `trials.trial_attachments`. The rest (usually all) of the returned
            dictionary will be stored and available later as some 'result'
            sub-dictionary within `trials.trials`.
        space : hyperopt.pyll.Apply node or "annotated"
            The set of possible arguments to `fn` is the set of objects
            that could be created with non-zero probability by drawing randomly
            from this stochastic program involving involving hp_ nodes
            (see `hyperopt.hp` and `hyperopt.pyll_utils`).
            If set to "annotated", will read space using type hint in fn. Ex:
            (`def fn(x: hp.uniform("x", -1, 1)): return x`)
        algo : search algorithm
            This object, such as `hyperopt.rand.suggest` and
            `hyperopt.tpe.suggest` provides logic for sequential search of the
            hyperparameter space.
        max_evals : int
            Allow up to this many function evaluations before returning.
        timeout : None or int, default None
            Limits search time by parametrized number of seconds.
            If None, then the search process has no time constraint.
        loss_threshold : None or double, default None
            Limits search time when minimal loss reduced to certain amount.
            If None, then the search process has no constraint on the loss,
            and will stop based on other parameters, e.g. `max_evals`, `timeout`
        trials : None or base.Trials (or subclass)
            Storage for completed, ongoing, and scheduled evaluation points. If
            None, then a temporary `base.Trials` instance will be created. If
            a trials object, then that trials object will be affected by
            side-effect of this call.
        rstate : numpy.random.Generator, default numpy.random or `$HYPEROPT_FMIN_SEED`
            Each call to `algo` requires a seed value, which should be different
            on each call. This object is used to draw these seeds via `randint`.
            The default rstate is
            `numpy.random.default_rng(int(env['HYPEROPT_FMIN_SEED']))`
            if the `HYPEROPT_FMIN_SEED` environment variable is set to a non-empty
            string, otherwise np.random is used in whatever state it is in.
        verbose : bool
            Print out some information to stdout during search. If False, disable
            progress bar irrespectively of show_progressbar argument
        allow_trials_fmin : bool, default True
            If the `trials` argument
        pass_expr_memo_ctrl : bool, default False
            If set to True, `fn` will be called in a different more low-level
            way: it will receive raw hyperparameters, a partially-populated
            `memo`, and a Ctrl object for communication with this Trials
            object.
        return_argmin : bool, default True
            If set to False, this function returns nothing, which can be useful
            for example if it is expected that `len(trials)` may be zero after
            fmin, and therefore `trials.argmin` would be undefined.
        points_to_evaluate : list, default None
            Only works if trials=None. If points_to_evaluate equals None then the
            trials are evaluated normally. If list of dicts is passed then
            given points are evaluated before optimisation starts, so the overall
            number of optimisation steps is len(points_to_evaluate) + max_evals.
            Elements of this list must be in a form of a dictionary with variable
            names as keys and variable values as dict values. Example
            points_to_evaluate value is [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 2.0}]

        Returns
        -------
        argmin : dictionary
            If return_argmin is True returns `trials.argmin` which is a dictionary. Otherwise
            this function returns the result of `hyperopt.space_eval(space, trails.argmin)` if there
            were successfull trails. This object shares the same structure as the space passed.
            If there were no successfull trails, it returns None.
        max_queue_len : integer, default 1
            Sets the queue length generated in the dictionary or trials. Increasing this
            value helps to slightly speed up parallel simulatulations which sometimes lag
            on suggesting a new trial.
        show_progressbar : bool or context manager, default True (or False is verbose is False).
            Show a progressbar. See `hyperopt.progress` for customizing progress reporting.
        early_stop_fn: callable ((result, *args) -> (Boolean, *args)).
            Called after every run with the result of the run and the values returned by the function previously.
            Stop the search if the function return true.
            Default None.
        trials_save_file: str, default ""
            Optional file name to save the trials object to every iteration.
            If specified and the file already exists, will load from this file when
            trials=None instead of creating a new base.Trials object
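
    The three stopping parameters described above (max_evals, timeout, loss_threshold) can be pictured as checks inside a simple evaluation loop. The sketch below is plain Python and only an interpretation of the docstring, not Hyperopt's code: the function minimize_with_limits and the fixed candidate list are hypothetical, and treating the threshold as "stop once the best loss is at or below it" is an assumption.

```python
import time

def minimize_with_limits(fn, candidates, max_evals=None, timeout=None, loss_threshold=None):
    """Evaluate candidates in order until one of the limits is reached."""
    start = time.monotonic()
    best_x, best_loss, n_evals = None, float("inf"), 0
    for x in candidates:
        if max_evals is not None and n_evals >= max_evals:
            break                                  # evaluation budget used up
        if timeout is not None and time.monotonic() - start >= timeout:
            break                                  # time budget used up
        loss = fn(x)
        n_evals += 1
        if loss < best_loss:
            best_x, best_loss = x, loss
        if loss_threshold is not None and best_loss <= loss_threshold:
            break                                  # good enough, stop early
    return best_x, best_loss, n_evals

candidates = [-4, 0, 2, 3, 5, 9]
print(minimize_with_limits(lambda x: (x - 3) ** 2, candidates, max_evals=2))       # (0, 9, 2)
print(minimize_with_limits(lambda x: (x - 3) ** 2, candidates, loss_threshold=0))  # (3, 0, 4)
```

    With all three limits left as None the loop simply runs through every candidate, which is why in practice at least one limit (usually max_evals) is set.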

    3.3 Visualizing the function

    Let's look at another example, y=(x-3)². Plotting the function first makes it easier to see what is going on:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-10, 16)
    y = (x - 3) ** 2
    plt.xlabel('x')
    plt.ylabel('y')
    plt.plot(x, y, 'r--', label='(x-3)**2')
    plt.title("y=(x-3)**2")
    #plt.legend()
    plt.show()

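    [Figure: plot of y=(x-3)**2]

    For more plotting techniques, see: Plotting in Python (histograms, multiple subplots, 2D plots, 3D plots, and inset plots)

    The plot shows that the function is minimized at x=3, which is of course obvious even without the plot. Now let's test it: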

    from hyperopt import fmin, tpe, hp

    best = fmin(
        fn=lambda x: (x-3)**2,
        space=hp.uniform('x', -10, 10),
        algo=tpe.suggest,
        max_evals=100)
    print(best)
    #{'x': 2.967563715953902}

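    Try raising max_evals, the maximum number of evaluations, to 1000 and compare the result: it should land closer to 3.

    3.4 hp value ranges

    space is the search space. The hp module provides the following ways to define a parameter: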

    'choice', 'lognormal', 'loguniform', 'normal', 'pchoice', 'qlognormal', 'qloguniform', 'qnormal', 'quniform', 'randint', 'uniform', 'uniformint'

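    Note that hp.normal draws from a normal distribution, so its values cannot be restricted to a range. Let's compare it with the bounded distributions: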

    from hyperopt import hp
    import hyperopt.pyll.stochastic

    space = {
        'x': hp.uniform('x', 0, 1),
        'y': hp.normal('y', 0, 1),
        'z': hp.randint('z', 0, 10),
        'c': hp.choice('City', ['GuangZhou', 'ShangHai', 'BeiJing']),
    }

    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'GuangZhou', 'x': 0.38603237555669656, 'y': -0.19782139601114704, 'z': array(1)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'ShangHai', 'x': 0.7838648171908386, 'y': 0.43014722187588245, 'z': array(8)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'BeiJing', 'x': 0.5137264208587933, 'y': -0.10021079359026988, 'z': array(4)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'BeiJing', 'x': 0.7201793839228087, 'y': 0.11571302115909506, 'z': array(0)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'GuangZhou', 'x': 0.21906317438496536, 'y': -1.645732195658909, 'z': array(0)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'ShangHai', 'x': 0.17319873908122796, 'y': -0.7472225692827178, 'z': array(4)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'GuangZhou', 'x': 0.4376348587045986, 'y': 0.7303201600143362, 'z': array(7)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'BeiJing', 'x': 0.43311251571433906, 'y': 1.216596288611056, 'z': array(1)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'BeiJing', 'x': 0.17755989388617366, 'y': 0.3168677593459059, 'z': array(4)}
    >>> print(hyperopt.pyll.stochastic.sample(space))
    {'c': 'GuangZhou', 'x': 0.6058631246917083, 'y': -0.2849664724345445, 'z': array(1)}

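    In the sampled output, y (the normal distribution) is unbounded and takes negative values as well as values above 1, while every other parameter stays within its specified range.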
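
    Several of the hp names listed earlier are documented as simple transformations of a uniform or normal draw: quniform as round(uniform(low, high) / q) * q and loguniform as exp(uniform(low, high)). The plain-Python sketch below reproduces those formulas with the standard random module to show which draws are bounded. The helper functions are stand-ins written for this illustration, not Hyperopt's own code.

```python
import math
import random

rng = random.Random(0)

def uniform(low, high):
    return rng.uniform(low, high)

def quniform(low, high, q):
    # round(uniform(low, high) / q) * q: values snap to multiples of q
    return round(rng.uniform(low, high) / q) * q

def loguniform(low, high):
    # exp(uniform(low, high)): the bounds are given in log space
    return math.exp(rng.uniform(low, high))

def normal(mu, sigma):
    # unbounded: any real value is possible
    return rng.gauss(mu, sigma)

uniform_draws = [uniform(0, 1) for _ in range(1000)]
quniform_draws = [quniform(0, 10, 0.5) for _ in range(1000)]
loguniform_draws = [loguniform(-3, 0) for _ in range(1000)]
normal_draws = [normal(0, 1) for _ in range(1000)]

print("uniform range:   ", min(uniform_draws), max(uniform_draws))
print("quniform values: ", sorted(set(quniform_draws))[:5], "...")
print("loguniform range:", min(loguniform_draws), max(loguniform_draws))
print("normal range:    ", min(normal_draws), max(normal_draws))
```

    When a bounded, roughly bell-shaped parameter is needed, one common workaround is to clip the value inside the objective function; that choice is outside what the original example covers.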

    3.5 Tracking with Trials

    A Trials object records what happened at each step of the search. Here is an example:

    from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

    fspace = {
        'x': hp.uniform('x', -5, 5)
    }

    def f(params):
        x = params['x']
        val = (x-3)**2
        return {'loss': val, 'status': STATUS_OK}

    trials = Trials()
    best = fmin(fn=f, space=fspace, algo=tpe.suggest, max_evals=50, trials=trials)
    print(best)
    #{'x': 2.842657137743265}

    for trial in trials.trials[:5]:
        print(trial)
    '''
    {'state': 2, 'tid': 0, 'spec': None, 'result': {'loss': 12.850632865897229, 'status': 'ok'}, 'misc': {'tid': 0, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [0]}, 'vals': {'x': [-0.5847779381570106]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 615000)}
    {'state': 2, 'tid': 1, 'spec': None, 'result': {'loss': 23.862240347848957, 'status': 'ok'}, 'misc': {'tid': 1, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [1]}, 'vals': {'x': [-1.884899215730961]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000)}
    {'state': 2, 'tid': 2, 'spec': None, 'result': {'loss': 42.84157056715999, 'status': 'ok'}, 'misc': {'tid': 2, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [2]}, 'vals': {'x': [-3.545347245728067]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 616000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
    {'state': 2, 'tid': 3, 'spec': None, 'result': {'loss': 0.8412634189024095, 'status': 'ok'}, 'misc': {'tid': 3, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [3]}, 'vals': {'x': [3.9172041315336568]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 617000)}
    {'state': 2, 'tid': 4, 'spec': None, 'result': {'loss': 30.580983627886543, 'status': 'ok'}, 'misc': {'tid': 4, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [4]}, 'vals': {'x': [-2.5300075612865616]}}, 'exp_key': None, 'owner': None, 'version': 0, 'book_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000), 'refresh_time': datetime.datetime(2023, 9, 12, 5, 24, 57, 618000)}
    '''
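
    Each trial record is an ordinary dictionary, so picking out the best one is a plain min() over the list. The sketch below works on hand-trimmed copies of the five records printed above, keeping only the fields it uses, so it runs without Hyperopt; with a real run the same expressions would be applied to trials.trials.

```python
# Trimmed copies of the five printed trial records (only the fields used here).
records = [
    {'tid': 0, 'result': {'loss': 12.850632865897229, 'status': 'ok'},
     'misc': {'vals': {'x': [-0.5847779381570106]}}},
    {'tid': 1, 'result': {'loss': 23.862240347848957, 'status': 'ok'},
     'misc': {'vals': {'x': [-1.884899215730961]}}},
    {'tid': 2, 'result': {'loss': 42.84157056715999, 'status': 'ok'},
     'misc': {'vals': {'x': [-3.545347245728067]}}},
    {'tid': 3, 'result': {'loss': 0.8412634189024095, 'status': 'ok'},
     'misc': {'vals': {'x': [3.9172041315336568]}}},
    {'tid': 4, 'result': {'loss': 30.580983627886543, 'status': 'ok'},
     'misc': {'vals': {'x': [-2.5300075612865616]}}},
]

# Keep only successful trials, then take the one with the smallest loss.
ok_records = [r for r in records if r['result']['status'] == 'ok']
best_record = min(ok_records, key=lambda r: r['result']['loss'])

# Every entry in 'vals' is a one-element list; unwrap it.
best_params = {name: vals[0] for name, vals in best_record['misc']['vals'].items()}
print(best_record['tid'], best_params, best_record['result']['loss'])
# 3 {'x': 3.9172041315336568} 0.8412634189024095
```

    Among these first five trials, trial 3 (x close to 3.92) is the best, which is consistent with the true minimum at x=3.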

    In the same way, we can plot this per-trial information to make it easier to read:

    import matplotlib.pyplot as plt

    # each entry in 'vals' is a one-element list, so take element 0
    x = [t['misc']['vals']['x'][0] for t in trials.trials]
    y = [t['result']['loss'] for t in trials.trials]
    plt.xlabel('x')
    plt.ylabel('y')
    plt.scatter(x, y, c='r')
    plt.show()

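    [Figure: scatter plot of the loss against x for each trial]

    The smallest function values are found near x=3.
    For more on scatter plots, see: Scatter plots in Python (plt.scatter)

    A side note: Trials is easy to mistake for a misspelling of Trails, but it is the intended word. Each trial is one attempt, that is, one evaluation of the objective function, and a Trials object holds the record of all of them.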

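    4. Practical applications

    With that groundwork in place, let's see how Hyperopt performs in practice. We start with a K-nearest-neighbors example on the iris dataset (150 samples in three classes: setosa, versicolor, virginica):

    4.1 K-nearest neighbors (KNN)

    First install the required library (skip this if it is already installed):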

    pip3 install --user scikit-learn -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com

    Now the code:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score
    from hyperopt import hp, STATUS_OK, Trials, fmin, tpe

    iris = load_iris()
    X = iris.data
    y = iris.target

    def hyperopt_train(params):
        clf = KNeighborsClassifier(**params)
        # mean accuracy over the cross-validation folds
        return cross_val_score(clf, X, y).mean()

    space_knn = {'n_neighbors': hp.choice('n_neighbors', range(1, 50))}

    def f(params):
        acc = hyperopt_train(params)
        # fmin minimizes, so the negated accuracy is used as the loss
        return {'loss': -acc, 'status': STATUS_OK}

    trials = Trials()
    best = fmin(f, space_knn, algo=tpe.suggest, max_evals=100, trials=trials)
    print(best)
    #{'n_neighbors': 6}  (an index into range(1, 50), not the value itself)
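
    For hp.choice parameters, fmin reports the index of the chosen option rather than the option itself, so the printed 6 is a position in range(1, 50). The plain-Python sketch below shows the lookup by hand; the helper decode_choice is hypothetical, and in a real script hyperopt.space_eval(space_knn, best), used earlier in hyperopt_test.py, does this conversion.

```python
# Options in the same order they were passed to hp.choice in the KNN example.
n_neighbors_options = list(range(1, 50))

best = {'n_neighbors': 6}  # the dictionary printed by fmin above

def decode_choice(best, choices):
    """Replace hp.choice indices with the options they point to.

    Parameters that are not listed in `choices` (for example hp.uniform
    values) are passed through unchanged.
    """
    return {name: choices[name][value] if name in choices else value
            for name, value in best.items()}

decoded = decode_choice(best, {'n_neighbors': n_neighbors_options})
print(decoded)  # {'n_neighbors': 7}
```

    So the run above actually selected n_neighbors=7. The same applies to 'kernel', 'criterion', 'max_depth', and 'max_features' in the later examples, all of which are defined with hp.choice.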

    Again, let's plot the results to get a feel for them:

    import matplotlib.pyplot as plt

    # 'vals' stores the hp.choice index as a one-element list;
    # adding 1 maps the index back to the value in range(1, 50)
    x = [t['misc']['vals']['n_neighbors'][0] + 1 for t in trials.trials]
    y = [-t['result']['loss'] for t in trials.trials]
    plt.xlabel('n_neighbors')
    plt.ylabel('cross_val_score')
    plt.scatter(x, y, c='r')
    plt.show()

    4.2 Support vector classification (SVC)

    Next, let's see how the same iris dataset fares with a support vector classifier:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from hyperopt import hp, STATUS_OK, Trials, fmin, tpe
    from sklearn.svm import SVC

    iris = load_iris()
    X = iris.data
    y = iris.target

    def hyperopt_train_test(params):
        clf = SVC(**params)
        return cross_val_score(clf, X, y).mean()

    space_svm = {
        'C': hp.uniform('C', 0, 20),
        'kernel': hp.choice('kernel', ['linear', 'sigmoid', 'poly', 'rbf']),
        'gamma': hp.uniform('gamma', 0, 20),
    }

    def f(params):
        acc = hyperopt_train_test(params)
        return {'loss': -acc, 'status': STATUS_OK}

    trials = Trials()
    best = fmin(f, space_svm, algo=tpe.suggest, max_evals=100, trials=trials)
    print(best)
    #{'C': 0.8930681939735963, 'gamma': 8.379245134714441, 'kernel': 0}  ('kernel': 0 is the index of 'linear')

    And again a plot of the results:

    from matplotlib import pyplot as plt

    parameters = ['C', 'kernel', 'gamma']
    cols = len(parameters)
    # named fig rather than f so the objective function f is not overwritten
    fig, axes = plt.subplots(1, cols)
    for i, val in enumerate(parameters):
        # for 'kernel' the stored value is the hp.choice index
        xs = [t['misc']['vals'][val][0] for t in trials.trials]
        ys = [-t['result']['loss'] for t in trials.trials]
        axes[i].scatter(xs, ys, c="g")
        axes[i].set_title(val)
        axes[i].set_ylim([0.9, 1.0])
    plt.show()

    [Figure: cross-validation accuracy plotted against C, kernel, and gamma]

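    4.3 Decision tree (DecisionTree)

    Finally, the decision tree. The code is almost the same, with SVC replaced by DecisionTreeClassifier; here we tune the tree depth and a few other parameters: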

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from hyperopt import hp, STATUS_OK, Trials, fmin, tpe
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X = iris.data
    y = iris.target

    def hyperopt_train_test(params):
        clf = DecisionTreeClassifier(**params)
        return cross_val_score(clf, X, y).mean()

    space_dt = {
        'max_depth': hp.choice('max_depth', range(1, 20)),
        'max_features': hp.choice('max_features', range(1, 5)),
        'criterion': hp.choice('criterion', ["gini", "entropy"]),
    }

    def f(params):
        acc = hyperopt_train_test(params)
        return {'loss': -acc, 'status': STATUS_OK}

    trials = Trials()
    best = fmin(f, space_dt, algo=tpe.suggest, max_evals=100, trials=trials)
    print(best)
    #{'criterion': 0, 'max_depth': 17, 'max_features': 1}  (all three are hp.choice indices)

    And again a plot of the results:

    from matplotlib import pyplot as plt

    parameters = ['max_depth', 'max_features', 'criterion']
    cols = len(parameters)
    # named fig rather than f so the objective function f is not overwritten
    fig, axes = plt.subplots(1, cols)
    for i, val in enumerate(parameters):
        # the stored values are hp.choice indices, one per trial
        xs = [t['misc']['vals'][val][0] for t in trials.trials]
        ys = [-t['result']['loss'] for t in trials.trials]
        axes[i].scatter(xs, ys, c="g")
        axes[i].set_title(val)
        axes[i].set_ylim([0.9, 1.0])
    plt.show()

    [Figure: cross-validation accuracy plotted against max_depth, max_features, and criterion]

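    5. Summary

    With Hyperopt we can search for good hyperparameters efficiently in later work. The core is fmin(): give it the loss function to minimize and the search space, and it iterates to find the best values. We can also pass a Trials() object to record every evaluation and then plot the results, which makes the search easier to inspect.

    Related articles on techniques for finding optimal parameters and hyperparameters:
    Neural network techniques: methods for finding optimal parameters
    Neural network techniques: methods for finding optimal parameters (continued)
    Neural network techniques: finding optimal hyperparameters
    github: https://github.com/hyperopt/hyperopt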

  • Original article: https://blog.csdn.net/weixin_41896770/article/details/132868806