• A bandit-agent board-game AI (written in Python) using reinforcement learning (RL) and numpy


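    PS: First, a disclaimer: this is a school assignment = = I call it BetaGo (forgive my shamelessness), because I have always found AlphaGo impressive but far out of my reach. Driven by this assignment, I finally tried writing a presentable AI for the first time, and I'm pretty happy that it gets a decent win rate.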

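    Main text

    Win rate against a random agent

    Each test plays 100 games and records wins/losses/draws plus the AI's total thinking time (the third test is tic-tac-toe).

    Author's CPU: i5-12500H, 12 cores

    Test cases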

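    [--size SIZE]  (boardsize) board size
    [--games GAMES]  (number of games) how many games to play
    [--iterations ITERATIONS] (number of iterations allowed by the agent) raising this lets the agent evaluate each move more thoroughly, but it plays more slowly
    [--print-board {all,final}] used when debugging
    [--parallel PARALLEL] number of threads; my machine could actually use 12, but the instructor's commands use 8 and I didn't bother to change it = =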
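
    As a rough illustration of how these options could be parsed, here is a minimal `argparse` sketch. The real `main.py` is not shown in this post, so the defaults, the help strings, and the name `shapes` for the positional file argument are all assumptions; only the flag names and the `{all,final}` choices come from the usage text above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage text; not the course's main.py."""
    parser = argparse.ArgumentParser(description="Play games between agents")
    # Defaults below are assumptions chosen only so the sketch runs.
    parser.add_argument("--size", type=int, default=3, help="board size")
    parser.add_argument("--games", type=int, default=1, help="number of games")
    parser.add_argument("--iterations", type=int, default=100,
                        help="number of iterations allowed by the agent")
    parser.add_argument("--print-board", choices=["all", "final"], default=None,
                        help="print every board or only the final one")
    parser.add_argument("--parallel", type=int, default=1,
                        help="number of games to run in parallel")
    # The test commands pass a shapes file (e.g. shapes1.txt) as a positional
    # argument; the name 'shapes' is made up for this sketch.
    parser.add_argument("shapes", help="file describing the winning shapes")
    return parser


args = build_parser().parse_args(
    ["--games", "100", "--size", "5", "--iterations", "100",
     "--parallel", "8", "shapes1.txt"]
)
print(args.size, args.games, args.iterations, args.parallel, args.shapes)
```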

    1. python main.py --games 100 --size 5 --iterations 100 --parallel 8 shapes1.txt >> results.txt # two in a row
    2. python main.py --games 100 --size 10 --iterations 100 --parallel 8 shapes1.txt >> results.txt # two in a row large
    3. python main.py --games 100 --size 3 --iterations 1000 --parallel 8 shapes2.txt >> results.txt # tic-tac-toe
    4. python main.py --games 100 --size 8 --iterations 1000 --parallel 8 shapes3.txt >> results.txt # plus
    5. python main.py --games 100 --size 8 --iterations 1000 --parallel 8 shapes4.txt >> results.txt # circle
    6. python main.py --games 100 --size 8 --iterations 100 --parallel 8 shapes4.txt >> results.txt # circle fast
    7. python main.py --games 100 --size 10 --iterations 1000 --parallel 8 shapes5.txt >> results.txt # disjoint

    Approach / pseudocode

    1. Get every possible move

    2. Simulate games for each possible move

    3. Calculate the reward for each possible move

    4. Return move choice for the real game
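
    The four steps above can be sketched on a toy game that needs no external modules. This is a simplified illustration, not the assignment code: it uses plain tic-tac-toe, visits the candidate moves round-robin instead of using an epsilon schedule, and all names (`winner`, `random_playout`, `choose_move`) are made up for the example.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end; return the winner (0 = draw)."""
    board = list(board)
    while True:
        w = winner(board)
        free = [i for i, v in enumerate(board) if v == 0]
        if w or not free:
            return w
        board[rng.choice(free)] = to_move
        to_move = 3 - to_move  # players are numbered 1 and 2


def choose_move(board, player, iterations, rng):
    free = [i for i, v in enumerate(board) if v == 0]   # 1. every possible move
    reward = [0.0] * len(free)
    count = [0] * len(free)
    for k in range(iterations):
        arm = k % len(free)                             # round-robin over moves
        trial = list(board)
        trial[free[arm]] = player
        result = random_playout(trial, 3 - player, rng)  # 2. simulate a game
        if result == player:
            reward[arm] += 1
        elif result != 0:
            reward[arm] -= 1
        count[arm] += 1
    # 3. mean reward per move; moves never simulated rank last
    means = [r / c if c else float("-inf") for r, c in zip(reward, count)]
    # 4. return the move with the best mean reward
    return free[max(range(len(free)), key=means.__getitem__)]


# Player 1 holds cells 0 and 1, so cell 2 wins on the spot.
board = [1, 1, 0,
         2, 2, 0,
         0, 0, 0]
move = choose_move(board, player=1, iterations=200, rng=random.Random(0))
print(move)
```

    In this position every playout that starts with cell 2 is an immediate win, so that move's mean reward is the maximum possible value of 1.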

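    The code

    It cannot be run as-is, since it imports the `game` and `random_agent` modules, which are not included here. The point is the approach, and I have commented it in detail.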

    from random_agent import RandomAgent
    from game import Game
    import numpy as np
    import copy
    import random


    class Agent:
        def __init__(self, iterations, id):
            self.iterations = iterations
            self.id = id

        def make_move(self, game):
            iter_cnt = 0
            # per-position statistics, indexed in the same order as free_positions
            free_positions = game.board.free_positions()
            freeposnum = len(free_positions)
            pos_winrate = np.zeros(freeposnum)
            pos_reward = np.zeros(freeposnum)
            pos_cnt = np.zeros(freeposnum)
            # each simulation starts from a deep copy, so the real board is never modified
            while iter_cnt < self.iterations:
                # create a deep copy
                board = copy.deepcopy(game.board)
                # dynamic epsilon: grows from 0 (exploration) to 1 (exploitation) as iterations pass
                epsilon = iter_cnt / self.iterations
                # draw a fresh random number every iteration, so the
                # explore/exploit decision is made per simulation
                rand = np.random.random()
                # exploration & exploitation
                if rand > epsilon:
                    # pointer = game.board.random_free()
                    pointer = random.randrange(0, len(free_positions))
                else:
                    pointer = np.argmax(pos_winrate)
                # make the move on the copy and play the rest of the game with random agents
                finalmove = free_positions[pointer]
                board.place(finalmove, self.id)
                # attention: player 2 moves next (this assumes this agent is player 1)
                deepcopy_players = [RandomAgent(2), RandomAgent(1)]
                deepcopy_game = game.from_board(board, game.objectives, deepcopy_players, game.print_board)
                if deepcopy_game.victory(finalmove, self.id):
                    winner = self
                else:
                    winner = deepcopy_game.play()
                # assign the reward by outcome: +1 win, -1 loss, 0 draw
                if winner:
                    if winner.id == self.id:
                        pos_reward[pointer] += 1
                    else:
                        pos_reward[pointer] -= 1
                else:
                    pos_reward[pointer] += 0
                # visit count + 1
                pos_cnt[pointer] += 1
                # update the mean reward ("win rate") of this position
                pos_winrate[pointer] = pos_reward[pointer] / pos_cnt[pointer]
                # next iteration
                iter_cnt += 1
            # back to the real match: pick the position with the highest win rate
            highest_winrate_pos = np.argmax(pos_winrate)
            # take the shot
            finalmove = free_positions[highest_winrate_pos]
            return finalmove

        def __str__(self):
            return f'Player {self.id} (betago agent)'
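
    One detail of the bookkeeping above: `pos_winrate` starts at zero, so a position that was never simulated still scores 0 and can outrank visited positions whose mean reward is negative. Whether that matters depends on the board size and iteration budget. The sketch below shows one possible variation, not what the agent above does: it masks unvisited positions with negative infinity before taking the argmax. The function name `mean_rewards` and the sample numbers are invented for the illustration.

```python
import numpy as np


def mean_rewards(pos_reward, pos_cnt):
    """Mean reward per position; positions with zero visits get -inf."""
    pos_reward = np.asarray(pos_reward, dtype=float)
    pos_cnt = np.asarray(pos_cnt, dtype=float)
    means = np.full(pos_reward.shape, -np.inf)
    # Divide only where a position was visited; other entries keep -inf.
    np.divide(pos_reward, pos_cnt, out=means, where=pos_cnt > 0)
    return means


# Made-up statistics: position 1 was never simulated.
rewards = np.array([-2.0, 0.0, -1.0])
counts = np.array([4.0, 0.0, 5.0])

# Zero-initialised averages, as in the agent above.
naive = np.zeros(3)
visited = counts > 0
naive[visited] = rewards[visited] / counts[visited]
best_naive = int(np.argmax(naive))

# Masked averages: the unvisited position can no longer win the argmax.
best_masked = int(np.argmax(mean_rewards(rewards, counts)))

print(best_naive, best_masked)
```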

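    PPS: Honestly, I'm also a bit lazy and didn't post screenshots of all the test runs, but my time and energy really are limited; for example, I still have other assignments to finish = =

    I just hope this still helps someone (haha).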

  • Original article: https://blog.csdn.net/weixin_42189468/article/details/127255829