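(Framework) Building an automated training framework for DeepRacer

1. Background

Training models with AWS DeepRacer is simply too expensive: more than 20 yuan per hour, which a hobbyist like me cannot afford. I tried running it locally under WSL, but performance was poor and it crashed from time to time. The professional tracks in particular need more compute and memory than WSL on my PC could provide, and the model usually died after a few episodes. As far as I understand, DeepRacer runs on Ubuntu, so in theory an Ubuntu cloud server should work too. I therefore tried deploying DeepRacer on Alibaba Cloud and, after a lot of trial and error, got it working. It turns out that 2 cores and 8 GB of memory are enough, and performance is quite good. Since I am not a new customer, an annual subscription would cost more than 2,000 yuan, so I use preemptible instances instead. (I later registered with other cloud vendors as a new user, but some of their servers performed far worse at the same specification, so I had to switch back to Alibaba Cloud.) That way I can train all day for only seven or eight yuan. The catch is that you have to keep watching how far training has progressed and keep editing parameters and running commands by hand; otherwise training stops when a run finishes and the server sits idle. I don't have that much time to spend on it and I don't want to waste money either (I finally understand that time is money), so I built an automated training framework out of the commands I use most.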

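2. Framework overview

The framework trains the model automatically from a preconfigured parameter list, iterates automatically, and sends an email notification each time an iteration completes.
Architecture diagram:
(image not included)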

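3. Results

Track: 2022 re:Invent Warm Up

This is the track for this month's open race.
(image not included)

Email notifications

Yesterday:
(image not included)
Today:

As you can see, the model has reached iteration 1007 and the 3-lap total is about 73 s, which is roughly top 5%. Total training time was 8 hours.
(image not included)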

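3. Framework walkthrough

3.1 Email module

The token below (mail_pass) is the SMTP authorization code of your QQ Mail account, not its login password.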

import os
import smtplib
from datetime import datetime
from email.header import Header
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

mail_host = 'smtp.qq.com'                   # mail server DNS name
mail_user = '846058904@qq.com'              # mail address
mail_pass = 'xxxxxxxxxxxxxxxx'              # token (authorization code)
sender = '846058904@qq.com'                 # sender
receivers = ['846058904@qq.com',]           # receiver list

def sendmail(subject, content, user=mail_user, password=mail_pass, sender=sender, to=sender, receivers=receivers, host=mail_host):
    # Build a plain-text mail and send it over SMTP with SSL.
    message = MIMEMultipart()
    message['From'] = sender
    message['To'] = to
    message['Subject'] = Header(subject)
    message.attach(MIMEText(content))
    try:
        smtpobj = smtplib.SMTP_SSL(host)                          # connect to the mail server
        smtpobj.login(user, password)                             # log in
        smtpobj.sendmail(sender, receivers, message.as_string())  # send mail
        smtpobj.quit()                                            # log off
        log(4, f'[SendMail.py]:subject=[{subject}],content=[{content}], send successful.')
    except smtplib.SMTPException:
        log(2, f'[SendMail.py]:subject=[{subject}],content=[{content}], send failed.')

def saveFile(filename, text):
    # Not shown in the original post; assumed here to append text to the file.
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(text)

def log(level, logString, LEVEL=3, folder='log', *argv):
    # Always write to log/<date>.log; also print when level <= LEVEL.
    os.makedirs(folder, exist_ok=True)
    timestamp = str(datetime.now())
    filename = f'{folder}/{timestamp[:10]}.log'
    preString = f'[{timestamp}]-[{level}]: '
    if isinstance(logString, list):
        logString = ','.join(logString)
    # Prefix every line of a multi-line message (the original dropped the newline here).
    logString = preString + logString.replace('\n', '\n' + preString) + '\n'
    saveFile(filename, logString)
    if level <= LEVEL:
        print(logString)
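
To show how a notification could be assembled before it is handed to sendmail, here is a minimal sketch. The helper name build_iteration_mail and the metrics dictionary are assumptions made for illustration; only the subject and content layout are modeled on the "[SendMail.py]" lines in the control-module log later in this post.

```python
def build_iteration_mail(model_name, metrics):
    """Return (subject, content) for an iteration summary mail.

    Hypothetical helper: the layout imitates the mail recorded in the
    log, but the framework's own code is not published in this post.
    """
    subject = f"deepracer - {model_name}"
    content = ",".join(f"{key}={value}" for key, value in metrics.items())
    return subject, content


subject, content = build_iteration_mail(
    "2022-reinvent-champ-500-1005",
    {"totalTime": 83.752, "fastTime": 26.44, "totalReset": 1},
)
print(subject)
print(content)
# The pair can then be passed on: sendmail(subject, content)
```

Keeping the formatting separate from the SMTP call makes it possible to check the mail text without a network connection.
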

3.2 Execution module

Training is carried out by running the various dr-* commands, and a bash script controls which one runs.

#!/bin/bash

################################
#00: dr-upload-custom-files    #
#01: dr-start-training -w      #
#02: dr-stop-training          #
#03: dr-increment-training     #
#04: dr-start-evaluation       #
#05: dr-stop-evaluation        #
################################
# setsid ./monitor.sh>>log/monitor.log

# Create the log directory on first run.
if [ ! -d "log/" ]; then
  mkdir log
fi

# Print the last-modified time of status.txt (seconds since the epoch).
function get_file_last_modify_timestamp(){
    stat -c %Y status.txt
}

# Append one timestamped line to the monitor log.
function log_line(){
    echo "[$(date +%Y-%m-%d%t%X.%N)]: $1">>log/monitor.log
}

export timeSleep=10
export cur=$(get_file_last_modify_timestamp)

while true
do
    LAST_MODIFY_TIMESTAMP=$(get_file_last_modify_timestamp)
    export status=$(cat status.txt)
    if [ "$cur" != "$LAST_MODIFY_TIMESTAMP" ]; then # status.txt was modified since the last check
        log_line "status is [$status] ,status has changed."
        # Run the script that matches the status code.
        case $status in
            "00")
            log_line "run [setsid ./upload-custom-files.sh]"
            setsid ./upload-custom-files.sh;;
            "01")
            log_line "run [setsid ./start-training.sh]"
            setsid ./start-training.sh;;
            "02")
            log_line "run [setsid ./stop-training.sh]"
            setsid ./stop-training.sh;;
            "03")
            log_line "run [setsid ./increment-training.sh]"
            setsid ./increment-training.sh;;
            "04")
            log_line "run [setsid ./start-evaluation.sh]"
            setsid ./start-evaluation.sh;;
            "05")
            log_line "run [setsid ./stop-evaluation.sh]"
            setsid ./stop-evaluation.sh;;
            *)
            log_line "Invalid status code [$status]! sleep [$timeSleep]s"
            sleep $timeSleep;;
        esac
    else
        log_line "status code is [$status], not change, it will be sleep [$timeSleep]s."
    fi
    export cur=$(get_file_last_modify_timestamp)
    sleep $timeSleep
done

How it works

State transitions are driven by the contents of a shared file, status.txt.
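
The same polling idea can be sketched in Python. This is only an illustration of the mechanism, not the framework's code: the function poll_once is hypothetical, the code-to-command table is copied from the comment header of the bash script, and the demo uses a temporary file rather than the real status.txt. It compares nanosecond modification times, whereas the bash script uses whole seconds from stat.

```python
import os
import tempfile

# Status codes taken from the comment header of the bash script.
COMMANDS = {
    "00": "dr-upload-custom-files",
    "01": "dr-start-training -w",
    "02": "dr-stop-training",
    "03": "dr-increment-training",
    "04": "dr-start-evaluation",
    "05": "dr-stop-evaluation",
}


def poll_once(path, last_mtime):
    """Return (command or None, current mtime) for one polling step."""
    mtime = os.stat(path).st_mtime_ns
    if mtime == last_mtime:
        return None, mtime            # file untouched: nothing to do
    with open(path) as handle:
        code = handle.read().strip()
    return COMMANDS.get(code), mtime  # None for an unknown code


# Demo on a temporary file instead of the real status.txt.
with tempfile.TemporaryDirectory() as folder:
    status_file = os.path.join(folder, "status.txt")
    with open(status_file, "w") as handle:
        handle.write("04")
    command, seen = poll_once(status_file, last_mtime=None)
    print(command)
    unchanged, _ = poll_once(status_file, last_mtime=seen)
    print(unchanged)
```

Because the trigger is the file's modification time and not its content, writing the same code again also counts as a new request.
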

Log

The log shows how the state changes. In effect, the bash script above simply orchestrates the dr-* commands to automate the workflow:

[2022-11-08	03:01:35 AM.437286281]: status code is [04], not change, it will be sleep [10]s.
[2022-11-08	03:01:45 AM.441117250]: status code is [04], not change, it will be sleep [10]s.
[2022-11-08	03:01:55 AM.444974459]: status code is [04], not change, it will be sleep [10]s.
[2022-11-08	03:02:05 AM.448929952]: status is [05] ,status has changed.
[2022-11-08	03:02:05 AM.449525797]: run [setsid ./stop-evaluation.sh]
[2022-11-08	03:02:20 AM.349379879]: status code is [05], not change, it will be sleep [10]s.
[2022-11-08	03:02:30 AM.354222402]: status code is [05], not change, it will be sleep [10]s.
[2022-11-08	03:02:40 AM.358674263]: status is [00] ,status has changed.
[2022-11-08	03:02:40 AM.359368923]: run [setsid ./upload-custom-files.sh]
[2022-11-08	03:02:55 AM.580158173]: status code is [00], not change, it will be sleep [10]s.
[2022-11-08	03:03:05 AM.402226063]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:03:15 AM.406244132]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:03:25 AM.418525804]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:03:35 AM.425738852]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:03:45 AM.429564953]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:03:55 AM.433331543]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:05 AM.437099544]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:15 AM.440827802]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:25 AM.444667428]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:35 AM.448684906]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:45 AM.452606476]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:04:55 AM.456495505]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:05:05 AM.460277403]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:05:15 AM.464175017]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:05:25 AM.467920367]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:05:35 AM.471651339]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:05:45 AM.475433648]: status is [01] ,status has changed.
[2022-11-08	03:05:45 AM.476036716]: run [setsid ./start-training.sh]
[2022-11-08	03:06:05 AM.482743409]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:06:15 AM.518810346]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:06:25 AM.524512837]: status code is [01], not change, it will be sleep [10]s.
[2022-11-08	03:06:35 AM.531110784]: status code is [01], not change, it will be sleep [10]s.
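
To pull only the transitions out of a monitor log like the one above, a small parser is enough. This is a sketch under the assumption that the line format stays exactly as shown (date, a tab, then the time); the names CHANGE and status_changes are hypothetical.

```python
import re

# Matches lines such as:
# [2022-11-08<TAB>03:02:05 AM.448929952]: status is [05] ,status has changed.
CHANGE = re.compile(
    r"^\[(?P<when>[^\]]+)\]: status is \[(?P<code>\d{2})\] ,status has changed\."
)


def status_changes(lines):
    """Yield (timestamp, code) for every 'status has changed' line."""
    for line in lines:
        match = CHANGE.match(line)
        if match:
            yield match.group("when"), match.group("code")


sample = [
    "[2022-11-08\t03:01:55 AM.444974459]: status code is [04], not change, it will be sleep [10]s.",
    "[2022-11-08\t03:02:05 AM.448929952]: status is [05] ,status has changed.",
    "[2022-11-08\t03:02:40 AM.358674263]: status is [00] ,status has changed.",
]
changes = list(status_changes(sample))
codes = [code for _, code in changes]
print(codes)
```
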

3.3 Control module

#!/bin/bash
export PATH=/root/.virtualenvs/deepracer/bin:$PATH 
python AutomaticIterate.py>>log/AutomaticIterate.log

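How it works

The control module works out which state the DeepRacer model is currently in and what to do next, then writes the corresponding code to status.txt for the execution module to pick up.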
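
A heavily simplified sketch of that decision step follows. AutomaticIterate.py itself is not shown in this post, so the names ROUND, next_code and request are hypothetical, and the fixed order is only my reading of the numbered steps in the log in this section (upload files, train, stop training, evaluate, stop evaluation). The waiting periods and repeated evaluations visible in the log are left out.

```python
import os
import tempfile

# One training round in the order suggested by the control log.
ROUND = ["00", "01", "02", "04", "05"]


def next_code(current):
    """Return the code that follows `current`, wrapping to a new round."""
    position = ROUND.index(current)
    return ROUND[(position + 1) % len(ROUND)]


def request(code, path):
    """Write the code to the shared file for the execution module."""
    with open(path, "w") as handle:
        handle.write(code)


# Demo on a temporary file instead of the real status.txt.
with tempfile.TemporaryDirectory() as folder:
    status_file = os.path.join(folder, "status.txt")
    request(next_code("01"), status_file)
    with open(status_file) as handle:
        written = handle.read()
print(written)
```
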

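Parameter control

The parameter list can be modified freely to suit your needs.
(image not included)

Updating hyperparameters and the reward function

The reward function is rendered from the parameter list with format.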
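
As an illustration of rendering a reward function from parameters, here is a minimal sketch using str.format. The template, the placeholder names min_speed and speed_bonus, and the reward logic are all made up for this example and are not the author's reward function; all_wheels_on_track and speed are standard DeepRacer reward inputs.

```python
# Hypothetical template: placeholder names and reward logic are
# illustrative only.
REWARD_TEMPLATE = '''
def reward_function(params):
    reward = 1e-3
    if params["all_wheels_on_track"]:
        reward = 1.0
    if params["speed"] >= {min_speed}:
        reward += {speed_bonus}
    return float(reward)
'''


def render_reward(min_speed, speed_bonus):
    """Fill the template with one set of parameters via str.format."""
    return REWARD_TEMPLATE.format(min_speed=min_speed, speed_bonus=speed_bonus)


source = render_reward(min_speed=1.2, speed_bonus=0.5)
namespace = {}
exec(source, namespace)  # only to check that the rendered code is valid
result = namespace["reward_function"]({"all_wheels_on_track": True, "speed": 2.0})
print(result)
```

One thing to watch with this approach: any literal braces in the template, such as a dict literal, must be doubled ({{ and }}) so that format does not treat them as placeholders.
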

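Iteration

When the specified condition is met, the model moves on to the next parameter list. At the end of each iteration an email notification is sent, which is what the screenshots earlier in this post show.
(image not included)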
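
The exact trigger is not spelled out, but the log in this section suggests one plausible reading: the framework advances once an evaluation's mean_avg is within meanValue and its singleTime is within timesAvgValue. The sketch below encodes that reading. It is an inference from the log, not the author's code, and whether the comparison is strict is also a guess.

```python
def should_advance(mean_avg, single_time, mean_value, times_avg_value):
    """True when both measurements are within their thresholds.

    Inferred from the log excerpt, not taken from the author's code.
    """
    return mean_avg <= mean_value and single_time <= times_avg_value


# Rounded values from the log for model ...-1005
# (meanValue=0.23, timesAvgValue=28.5).
first_try = should_advance(0.2119, 29.601, 0.23, 28.5)   # lap time too slow
later_try = should_advance(0.1923, 27.917, 0.23, 28.5)   # both within limits
print(first_try, later_try)
```

In the log, the second case is the evaluation that is followed by "8. Update to next iteration, and exit."
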

Log

[2022-11-07 06:04:01.961217]-[3]: model:2022-reinvent-champ-500-1005,racename:2022_reinvent_champ,evalTimes:3,iterCount=5
[2022-11-07 06:04:01.961289]-[3]: skip 0
[2022-11-07 06:04:01.961321]-[3]: skip 1
[2022-11-07 06:04:01.961345]-[3]: skip 2
[2022-11-07 06:04:01.961368]-[3]: skip 3
[2022-11-07 06:04:01.961389]-[3]: skip 4
[2022-11-07 06:04:01.961417]-[3]: Param - [(20, 4, 12, 1.2, 2.0, 0.6, 30, -28, 256, 0.0003, 60, 0.23, 28.5)]
[2022-11-07 06:04:01.961963]-[3]: 2.upload-custom-files
[2022-11-07 06:04:21.978827]-[4]: {'batch_size': 256, 'beta_entropy': 0.01, 'discount_factor': 0.995, 'e_greedy_value': 0.05, 'epsilon_steps': 20000, 'exploration_type': 'categorical', 'loss_type': 'huber', 'lr': 0.0003, 'num_episodes_between_training': 20, 'num_epochs': 10, 'stack_size': 1, 'term_cond_avg_score': 350.0, 'term_cond_max_episodes': 60, 'sac_alpha': 0.2}
[2022-11-07 06:04:21.978896]-[4]: {'action_space': {'steering_angle': {'high': 30, 'low': -28}, 'speed': {'high': 2.0, 'low': 0.6}}, 'sensor': ['FRONT_FACING_CAMERA'], 'neural_network': 'DEEP_CONVOLUTIONAL_NETWORK_SHALLOW', 'training_algorithm': 'clipped_ppo', 'action_space_type': 'continuous', 'version': '4'}
[2022-11-07 06:04:21.978930]-[3]: max_episodes=[60],speed=[{'high': 2.0, 'low': 0.6}],angle:[{'high': 30, 'low': -28}]
[2022-11-07 06:04:21.982142]-[3]: 2.1 code is [09]
[2022-11-07 06:04:21.982194]-[4]: 2.1 No training task
[2022-11-07 06:04:21.982221]-[3]: 3.Preparation Training
[2022-11-07 06:05:22.060617]-[3]: 3. code is [01]
[2022-11-07 06:05:22.060736]-[4]: 3.start-training -w running status [normal]
[2022-11-07 06:05:22.060782]-[3]: 3.1 training 2400s, in sleep
[2022-11-07 06:45:22.159645]-[4]: Training Completed. jsFile:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/TrainingMetrics.json/xl.meta
[2022-11-07 06:45:22.162616]-[3]: 3.1 [Training completed] at [1/60], handle data will be start after 300s
[2022-11-07 06:50:22.267060]-[4]: center_len = 33.27526530080891
[2022-11-07 06:50:22.267198]-[4]: race_len = 27.327370301905848
[2022-11-07 06:50:23.750073]-[3]: K1 K2 list:[20, 22.58, -0.51, 1.12]
[2022-11-07 06:50:23.750221]-[3]: K1 K2 list:[40, 26.64, -0.51, 1.17]
[2022-11-07 06:50:23.750273]-[3]: K1 K2 list:[60, 26.37, -0.45, 1.15]
[2022-11-07 06:50:23.750890]-[3]: whole: k1 = [25.29],k2 = [-0.49],speed = [1.15],
[2022-11-07 06:50:23.751218]-[4]: find json file [['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/TrainingMetrics.json/xl.meta']]
[2022-11-07 06:50:23.751737]-[3]: Train time [9.052]
[2022-11-07 06:50:25.404650]-[3]: 4 code is [01]
[2022-11-07 06:50:25.404731]-[4]: 4. stop-training, please waiting-[1/30]
[2022-11-07 06:50:50.434640]-[3]: 4 code is [01]
[2022-11-07 06:50:50.434735]-[4]: 4. stop-training, please waiting-[2/30]
[2022-11-07 06:51:15.461886]-[3]: 4 code is [09]
[2022-11-07 06:51:15.461966]-[3]: 4. training task completed.
[2022-11-07 06:51:15.464681]-[3]: 5 code is [09]
[2022-11-07 06:51:15.464732]-[3]: 5-[1/3].start-evaluation, then sleep 300s
[2022-11-07 06:56:15.569662]-[3]: 5 code is [04]
[2022-11-07 06:56:15.569743]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 06:56:45.601392]-[3]: 5 code is [09]
[2022-11-07 06:56:45.601471]-[4]: 6.stop-evaluation task completed.
[2022-11-07 06:56:53.606790]-[3]: 6.handle [4-12-1.2-0] evaluation.
[2022-11-07 06:56:53.609046]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 06:56:53.609093]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 06:56:53.609167]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1005/4-12-1.2-0
[2022-11-07 06:56:53.609446]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/EvaluationMetrics-20221107065120.json/xl.meta
[2022-11-07 06:56:53.621811]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107065120/evaluation-simtrace/0-iteration.csv/807ca005-9b55-47a3-b1a0-fccc37d1d3df/part.1']
[2022-11-07 06:56:53.995542]-[4]: 7.[4-12-1.2-0]:totalTime = 88.802,mean_avg:0.21186306752681827,singleTime = 29.60066666666667,fastTime = 27.264,totalReset = 2,minLength=29.656,minStep=408
[2022-11-07 06:56:53.995619]-[4]: 7.[4-12-1.2-0]:slowTime = 30.845,maxLength = 30.3248,avgLength = 29.9601,maxStep=461,avgStep=440.3333333333333
[2022-11-07 06:56:53.995654]-[4]: 7.meanValue=0.23,timesAvgValue=28.5
[2022-11-07 06:56:53.998814]-[3]: 5 code is [09]
[2022-11-07 06:56:53.998871]-[3]: 5-[2/3].start-evaluation, then sleep 300s
[2022-11-07 07:01:54.103418]-[3]: 5 code is [04]
[2022-11-07 07:01:54.103501]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 07:02:24.133816]-[3]: 5 code is [09]
[2022-11-07 07:02:24.133905]-[4]: 6.stop-evaluation task completed.
[2022-11-07 07:02:32.140035]-[3]: 6.handle [4-12-1.2-1] evaluation.
[2022-11-07 07:02:32.142578]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 07:02:32.142627]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 07:02:35.146732]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1005/4-12-1.2-0
[2022-11-07 07:02:35.146901]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/EvaluationMetrics-20221107065705.json/xl.meta
[2022-11-07 07:02:35.159559]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107065705/evaluation-simtrace/0-iteration.csv/808f3661-082d-49b6-8c46-af7476accfbd/part.1', '/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107065120/evaluation-simtrace/0-iteration.csv/807ca005-9b55-47a3-b1a0-fccc37d1d3df/part.1']
[2022-11-07 07:02:35.479653]-[4]: 7.[4-12-1.2-1]:totalTime = 91.892,mean_avg:0.2235271375929391,singleTime = 30.630666666666666,fastTime = 26.615,totalReset = 3,minLength=29.3159,minStep=397
[2022-11-07 07:02:35.479728]-[4]: 7.[4-12-1.2-1]:slowTime = 34.236,maxLength = 31.5246,avgLength = 30.2256,maxStep=512,avgStep=453.6666666666667
[2022-11-07 07:02:35.479762]-[4]: 7.meanValue=0.23,timesAvgValue=28.5
[2022-11-07 07:02:35.482937]-[3]: 5 code is [09]
[2022-11-07 07:02:35.482986]-[3]: 5-[3/3].start-evaluation, then sleep 300s
[2022-11-07 07:07:35.587574]-[3]: 5 code is [04]
[2022-11-07 07:07:35.587661]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 07:08:05.618282]-[3]: 5 code is [09]
[2022-11-07 07:08:05.618362]-[4]: 6.stop-evaluation task completed.
[2022-11-07 07:08:13.626152]-[3]: 6.handle [4-12-1.2-2] evaluation.
[2022-11-07 07:08:13.628527]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 07:08:13.628573]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 07:08:16.630159]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1005/4-12-1.2-0
[2022-11-07 07:08:16.630334]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/EvaluationMetrics-20221107070245.json/xl.meta
[2022-11-07 07:08:16.643429]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107070245/evaluation-simtrace/0-iteration.csv/ffb81acc-8de8-4fa1-af00-2be4c6c1c716/part.1', '/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107065705/evaluation-simtrace/0-iteration.csv/808f3661-082d-49b6-8c46-af7476accfbd/part.1', '/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107065120/evaluation-simtrace/0-iteration.csv/807ca005-9b55-47a3-b1a0-fccc37d1d3df/part.1']
[2022-11-07 07:08:16.975983]-[4]: 7.[4-12-1.2-2]:totalTime = 109.377,mean_avg:0.3096560485376241,singleTime = 36.459,fastTime = 26.69,totalReset = 8,minLength=29.0818,minStep=396
[2022-11-07 07:08:16.976059]-[4]: 7.[4-12-1.2-2]:slowTime = 41.526,maxLength = 33.6375,avgLength = 32.0097,maxStep=619,avgStep=541.6666666666666
[2022-11-07 07:08:16.976094]-[4]: 7.meanValue=0.23,timesAvgValue=28.5
[2022-11-07 07:08:16.976583]-[3]: 2.upload-custom-files
[2022-11-07 07:08:36.995268]-[4]: {'batch_size': 256, 'beta_entropy': 0.01, 'discount_factor': 0.995, 'e_greedy_value': 0.05, 'epsilon_steps': 20000, 'exploration_type': 'categorical', 'loss_type': 'huber', 'lr': 0.0003, 'num_episodes_between_training': 20, 'num_epochs': 10, 'stack_size': 1, 'term_cond_avg_score': 350.0, 'term_cond_max_episodes': 60, 'sac_alpha': 0.2}
[2022-11-07 07:08:36.995337]-[4]: {'action_space': {'steering_angle': {'high': 30, 'low': -28}, 'speed': {'high': 2.0, 'low': 0.6}}, 'sensor': ['FRONT_FACING_CAMERA'], 'neural_network': 'DEEP_CONVOLUTIONAL_NETWORK_SHALLOW', 'training_algorithm': 'clipped_ppo', 'action_space_type': 'continuous', 'version': '4'}
[2022-11-07 07:08:36.995383]-[3]: max_episodes=[60],speed=[{'high': 2.0, 'low': 0.6}],angle:[{'high': 30, 'low': -28}]
[2022-11-07 07:08:36.998750]-[3]: 2.1 code is [09]
[2022-11-07 07:08:36.998800]-[4]: 2.1 No training task
[2022-11-07 07:08:36.998840]-[3]: 3.Preparation Training
[2022-11-07 07:09:37.075541]-[3]: 3. code is [01]
[2022-11-07 07:09:37.075657]-[4]: 3.start-training -w running status [normal]
[2022-11-07 07:09:37.075711]-[3]: 3.1 training 2400s, in sleep
[2022-11-07 07:49:37.174656]-[4]: Training Completed. jsFile:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/TrainingMetrics.json/xl.meta
[2022-11-07 07:49:37.177869]-[3]: 3.1 [Training completed] at [1/60], handle data will be start after 300s
[2022-11-07 07:54:37.281866]-[4]: center_len = 33.27526530080891
[2022-11-07 07:54:37.281967]-[4]: race_len = 27.327370301905848
[2022-11-07 07:54:41.657382]-[3]: K1 K2 list:[20, 26.77, -0.53, 1.18]
[2022-11-07 07:54:41.657531]-[3]: K1 K2 list:[40, 21.45, -0.12, 1.15]
[2022-11-07 07:54:41.657591]-[3]: K1 K2 list:[60, 24.55, 0.0, 1.18]
[2022-11-07 07:54:41.658217]-[3]: whole: k1 = [24.47],k2 = [-0.23],speed = [1.17],
[2022-11-07 07:54:41.658574]-[4]: find json file [['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/TrainingMetrics.json/xl.meta']]
[2022-11-07 07:54:41.659060]-[3]: Train time [8.851]
[2022-11-07 07:54:43.511598]-[3]: 4 code is [01]
[2022-11-07 07:54:43.511682]-[4]: 4. stop-training, please waiting-[1/30]
[2022-11-07 07:55:08.542381]-[3]: 4 code is [01]
[2022-11-07 07:55:08.542475]-[4]: 4. stop-training, please waiting-[2/30]
[2022-11-07 07:55:33.569796]-[3]: 4 code is [09]
[2022-11-07 07:55:33.569897]-[3]: 4. training task completed.
[2022-11-07 07:55:33.573248]-[3]: 5 code is [09]
[2022-11-07 07:55:33.573301]-[3]: 5-[1/3].start-evaluation, then sleep 300s
[2022-11-07 08:00:33.676360]-[3]: 5 code is [04]
[2022-11-07 08:00:33.676442]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 08:01:03.710027]-[3]: 5 code is [09]
[2022-11-07 08:01:03.710133]-[4]: 6.stop-evaluation task completed.
[2022-11-07 08:01:11.718158]-[3]: 6.handle [4-12-1.2-0] evaluation.
[2022-11-07 08:01:11.720734]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 08:01:11.720787]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 08:01:14.722765]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1005/4-12-1.2-0
[2022-11-07 08:01:14.722971]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/metrics/EvaluationMetrics-20221107075543.json/xl.meta
[2022-11-07 08:01:14.735812]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1005/evaluation-20221107075543/evaluation-simtrace/0-iteration.csv/19f3b7a9-f70a-4684-b7ef-8d8da2bb2e55/part.1']
[2022-11-07 08:01:15.052338]-[4]: 7.[4-12-1.2-0]:totalTime = 83.752,mean_avg:0.19228465989638896,singleTime = 27.917333333333332,fastTime = 26.44,totalReset = 1,minLength=29.3873,minStep=394
[2022-11-07 08:01:15.052415]-[4]: 7.[4-12-1.2-0]:slowTime = 30.004,maxLength = 30.2902,avgLength = 29.7114,maxStep=444,avgStep=415.0
[2022-11-07 08:01:15.052452]-[4]: 7.meanValue=0.23,timesAvgValue=28.5
[2022-11-07 08:01:15.052492]-[3]: 8. Update to next iteration, and exit.
[2022-11-07 08:01:15.698728]-[4]: [SendMail.py]:subject=[deepracer - 2022-reinvent-champ-500-1005],content=[totalTime=83.752,mean_avg:0.19228465989638896,singleTime = 27.917333333333332,fastTime = 26.44,totalReset = 1,avgLength = 29.7114,avgStep=415.0], send successful.
[2022-11-07 08:02:02.565812]-[3]: model:2022-reinvent-champ-500-1006,racename:2022_reinvent_champ,evalTimes:3,iterCount=6
[2022-11-07 08:02:02.565900]-[3]: skip 0
[2022-11-07 08:02:02.566467]-[3]: skip 1
[2022-11-07 08:02:02.566522]-[3]: skip 2
[2022-11-07 08:02:02.566552]-[3]: skip 3
[2022-11-07 08:02:02.566596]-[3]: skip 4
[2022-11-07 08:02:02.566622]-[3]: skip 5
[2022-11-07 08:02:02.566652]-[3]: Param - [(20, 4, 12, 1.2, 2.2, 0.6, 30, -28, 256, 0.0003, 60, 0.23, 26.5)]
[2022-11-07 08:02:02.567194]-[3]: 2.upload-custom-files
[2022-11-07 08:02:22.587463]-[4]: {'batch_size': 256, 'beta_entropy': 0.01, 'discount_factor': 0.995, 'e_greedy_value': 0.05, 'epsilon_steps': 20000, 'exploration_type': 'categorical', 'loss_type': 'huber', 'lr': 0.0003, 'num_episodes_between_training': 20, 'num_epochs': 10, 'stack_size': 1, 'term_cond_avg_score': 350.0, 'term_cond_max_episodes': 60, 'sac_alpha': 0.2}
[2022-11-07 08:02:22.587554]-[4]: {'action_space': {'steering_angle': {'high': 30, 'low': -28}, 'speed': {'high': 2.2, 'low': 0.6}}, 'sensor': ['FRONT_FACING_CAMERA'], 'neural_network': 'DEEP_CONVOLUTIONAL_NETWORK_SHALLOW', 'training_algorithm': 'clipped_ppo', 'action_space_type': 'continuous', 'version': '4'}
[2022-11-07 08:02:22.587590]-[3]: max_episodes=[60],speed=[{'high': 2.2, 'low': 0.6}],angle:[{'high': 30, 'low': -28}]
[2022-11-07 08:02:22.591045]-[3]: 2.1 code is [09]
[2022-11-07 08:02:22.591098]-[4]: 2.1 No training task
[2022-11-07 08:02:22.591126]-[3]: 3.Preparation Training
[2022-11-07 08:03:22.666526]-[3]: 3. code is [01]
[2022-11-07 08:03:22.666653]-[4]: 3.start-training -w running status [normal]
[2022-11-07 08:03:22.666721]-[3]: 3.1 training 2400s, in sleep
[2022-11-07 08:43:22.767096]-[4]: Training Completed. jsFile:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/metrics/TrainingMetrics.json/xl.meta
[2022-11-07 08:43:22.770387]-[3]: 3.1 [Training completed] at [1/60], handle data will be start after 300s
[2022-11-07 08:48:22.874184]-[4]: center_len = 33.27526530080891
[2022-11-07 08:48:22.874283]-[4]: race_len = 27.327370301905848
[2022-11-07 08:48:24.190171]-[3]: K1 K2 list:[20, 24.01, -0.22, 1.23]
[2022-11-07 08:48:24.190299]-[3]: K1 K2 list:[40, 22.33, -1.14, 1.26]
[2022-11-07 08:48:24.190350]-[3]: K1 K2 list:[60, 23.19, -0.97, 1.22]
[2022-11-07 08:48:24.190993]-[3]: whole: k1 = [23.25],k2 = [-0.74],speed = [1.24],
[2022-11-07 08:48:24.191317]-[4]: find json file [['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/metrics/TrainingMetrics.json/xl.meta']]
[2022-11-07 08:48:24.191921]-[3]: Train time [8.635]
[2022-11-07 08:48:25.786807]-[3]: 4 code is [01]
[2022-11-07 08:48:25.786891]-[4]: 4. stop-training, please waiting-[1/30]
[2022-11-07 08:48:50.817776]-[3]: 4 code is [01]
[2022-11-07 08:48:50.817875]-[4]: 4. stop-training, please waiting-[2/30]
[2022-11-07 08:49:15.841611]-[3]: 4 code is [09]
[2022-11-07 08:49:15.841718]-[3]: 4. training task completed.
[2022-11-07 08:49:15.844635]-[3]: 5 code is [09]
[2022-11-07 08:49:15.844692]-[3]: 5-[1/3].start-evaluation, then sleep 300s
[2022-11-07 08:54:15.949670]-[3]: 5 code is [04]
[2022-11-07 08:54:15.949756]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 08:54:45.983334]-[3]: 5 code is [09]
[2022-11-07 08:54:45.983428]-[4]: 6.stop-evaluation task completed.
[2022-11-07 08:54:53.991355]-[3]: 6.handle [4-12-1.2-0] evaluation.
[2022-11-07 08:54:53.993732]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 08:54:53.993779]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 08:54:53.993887]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1006/4-12-1.2-0
[2022-11-07 08:54:53.994247]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/metrics/EvaluationMetrics-20221107084924.json/xl.meta
[2022-11-07 08:54:54.006976]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/evaluation-20221107084924/evaluation-simtrace/0-iteration.csv/dd745098-751f-4ccd-97b2-2826aa0e8163/part.1']
[2022-11-07 08:54:54.351587]-[4]: 7.[4-12-1.2-0]:totalTime = 90.008,mean_avg:0.2687694652776692,singleTime = 30.002666666666666,fastTime = 25.348,totalReset = 4,minLength=29.6702,minStep=376
[2022-11-07 08:54:54.351680]-[4]: 7.[4-12-1.2-0]:slowTime = 39.024,maxLength = 32.2085,avgLength = 30.5168,maxStep=574,avgStep=444.0
[2022-11-07 08:54:54.351716]-[4]: 7.meanValue=0.23,timesAvgValue=26.5
[2022-11-07 08:54:54.355084]-[3]: 5 code is [09]
[2022-11-07 08:54:54.355135]-[3]: 5-[2/3].start-evaluation, then sleep 300s
[2022-11-07 08:59:54.460652]-[3]: 5 code is [04]
[2022-11-07 08:59:54.460760]-[3]: 6.stop-evaluation, please waiting 30s-[1/30]
[2022-11-07 09:00:24.494063]-[3]: 5 code is [09]
[2022-11-07 09:00:24.494177]-[4]: 6.stop-evaluation task completed.
[2022-11-07 09:00:32.502164]-[3]: 6.handle [4-12-1.2-1] evaluation.
[2022-11-07 09:00:32.504556]-[4]: [handleEvaluation]:center_len = 33.27526530080891
[2022-11-07 09:00:32.504603]-[4]: [handleEvaluation]:race_len = 27.327370301905848
[2022-11-07 09:00:35.508658]-[4]: [handleEvaluation]:create folder 2022-reinvent-champ-500-1006/4-12-1.2-0
[2022-11-07 09:00:35.508902]-[4]: [handleEvaluation]:evalMetrics:/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/metrics/EvaluationMetrics-20221107085505.json/xl.meta
[2022-11-07 09:00:35.521858]-[4]: ['/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/evaluation-20221107085505/evaluation-simtrace/0-iteration.csv/66de3496-6f7c-4891-bea4-ee665917c4c3/part.1', '/home/rambo/deepracer-for-cloud/data/minio/bucket/2022-reinvent-champ-500-1006/evaluation-20221107084924/evaluation-simtrace/0-iteration.csv/dd745098-751f-4ccd-97b2-2826aa0e8163/part.1']
[2022-11-07 09:00:35.843629]-[4]: 7.[4-12-1.2-1]:totalTime = 77.09,mean_avg:0.1805505599161034,singleTime = 25.69666666666667,fastTime = 25.187,totalReset = 0,minLength=29.2983,minStep=376
[2022-11-07 09:00:35.843712]-[4]: 7.[4-12-1.2-1]:slowTime = 26.089,maxLength = 29.5451,avgLength = 29.4581,maxStep=390,avgStep=383.0
[2022-11-07 09:00:35.843748]-[4]: 7.meanValue=0.23,timesAvgValue=26.5
[2022-11-07 09:00:35.843777]-[3]: 8. Update to next iteration, and exit.
[2022-11-07 09:00:36.638000]-[4]: [SendMail.py]:subject=[deepracer - 2022-reinvent-champ-500-1006],content=[totalTime=77.09,mean_avg:0.1805505599161034,singleTime = 25.69666666666667,fastTime = 25.187,totalReset = 0,avgLength = 29.4581,avgStep=383.0], send successful.

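3.4 Scheduler module

Ubuntu has crontab, which is similar to Task Scheduler (taskschd) on Windows. Add the execution-module script and the control-module script above to the crontab. For how to write crontab entries, please consult the relevant documentation.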

  GNU nano 4.8                                                               /tmp/crontab.7ENtM7/crontab
# For more information see the manual pages of crontab(5) and cron
# m h  dom mon dow   command
*/3 * * * * /home/rambo/deepracer-for-cloud/keepMonitorSetsid.sh
*/2 * * * * /home/rambo/deepracer-for-cloud/AutoTrainTools.sh

The two crontab entries above mean:

Run keepMonitorSetsid.sh every 3 minutes
Run AutoTrainTools.sh every 2 minutes
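
To make the "*/n" notation concrete, the following sketch expands a minute field into the minutes of the hour at which cron fires. The helper minutes_for_step is hypothetical and handles only this one form of the field.

```python
def minutes_for_step(field):
    """Expand a '*/n' minute field into the minutes at which it fires."""
    if not field.startswith("*/"):
        raise ValueError("only the '*/n' form is handled in this sketch")
    step = int(field[2:])
    return list(range(0, 60, step))


every_three = minutes_for_step("*/3")
every_two = minutes_for_step("*/2")
print(every_three[:4], len(every_three))
print(every_two[:4], len(every_two))
```

So the first entry runs 20 times per hour and the second 30 times per hour, always aligned to minute 0.
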

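4. Key points

What differs from my earlier local iteration setup is a new parameter, meanValue.

4.1 What does this parameter do, and how is it computed?

It is the mean of the squared distance by which the car deviates from the optimal racing line.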
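
Written out, that is meanValue = (d_1^2 + ... + d_N^2) / N, where d_i is the distance from the car's i-th recorded position to the optimal line. Below is a minimal sketch of one way to compute it, assuming the optimal line is given as a list of (x, y) waypoints and the distance is measured to the nearest line segment. The function names and the toy data are assumptions; the post does not show how the framework measures the distance.

```python
import math


def point_to_segment(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_squared_deviation(positions, racing_line):
    """Mean of squared distances from each position to the racing line."""
    total = 0.0
    for p in positions:
        nearest = min(
            point_to_segment(p, a, b)
            for a, b in zip(racing_line, racing_line[1:])
        )
        total += nearest ** 2
    return total / len(positions)


# Toy data: a straight optimal line and three recorded car positions.
line = [(0.0, 0.0), (10.0, 0.0)]
driven = [(1.0, 0.5), (5.0, -0.5), (9.0, 0.0)]
value = mean_squared_deviation(driven, line)
print(round(value, 4))
```

Squaring the distances penalizes occasional large excursions more than a small constant offset.
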

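4.2 What does it mean?

It indicates how far the car's actual path deviates from the optimal line: the smaller the value, the closer the car stays to the optimal line. It also removes the randomness of judging a model by completion time alone, which makes the model more stable. My guess is that such a model should also do well in physical races.

The real data confirm that cars with a smaller meanValue do follow a path closer to the optimal line, and their three lap times differ by less than about 1 s, which is very stable.
(image not included)

The bash scripts and Python code above were all battle-tested on Alibaba Cloud. So far the code has run stably for 2 days without crashing.
You can even watch training and evaluation live on my Alibaba Cloud server (the IP of a preemptible instance may change after I release it):
Training: http://8.134.146.175:8080/stream_viewer?topic=/racecar/deepracer/kvs_stream
Evaluation: http://8.134.146.175:8180/stream_viewer?topic=/racecar/deepracer/kvs_stream
(image not included)

By training on Alibaba Cloud servers of different specifications, I found that the minimum is 2 cores and 8 GB; memory usage shows that 4 GB is not enough. I am currently on that minimum configuration. If your budget allows, I recommend 4 cores and 16 GB or more: with preemptible instances I found that, for the same number of iterations, a higher configuration takes less time.
(image not included)
I can sell or share the Alibaba Cloud image for a fee (after all, I spent money on this research). If you need it, contact me on WeChat: fbl4869
The image runs as soon as you get it, and the code can be taken from the image as well.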

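Summary

Building this framework forced me to brush up on Linux bash scripting and to learn Docker (I'm getting pretty fluent with some docker commands now, just kidding). The automated training framework is finally up and has run stably for 2 days. From now on I only need to write the parameter list and the reward_function template, and it trains by itself. The groundwork for local training is basically done; the next step is to read and analyze the DeepRacer source code.

Had I known it would run this well on Alibaba Cloud, I would not have bothered setting up WSL locally. The performance gap is large, and local runs are less reliable as well. I suspect that installing Ubuntu directly on local hardware would behave much like Alibaba Cloud.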

Original article: https://blog.csdn.net/qq_37608398/article/details/127743152
