• Fundamentals of Computer Systems: Week 6 Homework


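    Problem 1: Writing an assembler

    Write a RISC-V assembler that translates the RISC-V assembly instructions listed below into RISC-V machine code. The requirements are:

    1) The assembler must support these instructions: lui, sw, addi, add.

    2) How the assembler is run: create an assembly source file named source.s, pick any four test statements using lui, sw, addi, and add, and write them into source.s. Run the assembler on that file and write the generated machine code to a second text file named binary.txt. Finally, compare the generated machine code with the machine code in Attachment 1 to verify the assembler (binary.txt holds the machine code in hexadecimal).

    3) Functional requirements:

    a) Syntax checking: for example, the input luik  a3, 0x200 must produce an error that points at the offending token, luik.

    b) Range checking of operands: for example, the input lui  a3, 0x2000000 must produce a warning that 0x2000000 is out of range.

    c) Compressed instructions are out of scope, so every instruction encodes to 4 bytes (32 bits).

    4) Any implementation language is allowed. This write-up uses Python.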


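    The rest of this section defines the Assembler class, the core of the assembler, which has one method per supported instruction. It starts with initialization, then defines encode (which takes one assembly statement as a string and calls the matching encoder based on the instruction type), and then the individual encoders for lui, sw, addi, and add. Each encoder handles the translation to machine code, the syntax check, and the operand range check.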


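    1. Initialization: register names and numbers

    The register names and numbers follow the conventions of the RISC-V RV32I base integer instruction set. The dictionary keys are the RISC-V register names and the values are the corresponding 5-bit binary register numbers.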

    def __init__(self):
        # Register names and their 5-bit register numbers (x0..x31)
        self.registers = {
            "zero": "00000",
            "ra": "00001",
            "sp": "00010",
            "gp": "00011",
            "tp": "00100",
            "t0": "00101",
            "t1": "00110",
            "t2": "00111",
            "s0": "01000",
            "s1": "01001",
            "a0": "01010",
            "a1": "01011",
            "a2": "01100",
            "a3": "01101",
            "a4": "01110",
            "a5": "01111",
            "a6": "10000",
            "a7": "10001",
            "s2": "10010",
            "s3": "10011",
            "s4": "10100",
            "s5": "10101",
            "s6": "10110",
            "s7": "10111",
            "s8": "11000",
            "s9": "11001",
            "s10": "11010",
            "s11": "11011",
            "t3": "11100",
            "t4": "11101",
            "t5": "11110",
            "t6": "11111"
        }
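
    The same table can also be generated instead of typed out. The sketch below is not part of the original assembler: it assumes the standard x0..x31 ordering of the ABI names, and build_register_table (including its optional xN aliases) is a hypothetical helper.

```python
# ABI register names in architectural order x0..x31.
ABI_NAMES = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
]


def build_register_table():
    """Map each ABI name (and its xN alias) to a 5-bit binary string."""
    table = {}
    for number, name in enumerate(ABI_NAMES):
        bits = format(number, "05b")
        table[name] = bits
        table[f"x{number}"] = bits  # assumption: also accept x0..x31
    return table


registers = build_register_table()
print(registers["a5"], registers["x15"])  # both print 01111
```

    Generating the table removes the risk of a typo in one of the 32 hand-written bit strings.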

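    Each of these registers has a conventional role in RISC-V. For example:

    • zero (x0): hard-wired to 0; any write to it is ignored.
    • ra (x1): return address.
    • sp (x2): stack pointer.
    • gp (x3): global pointer.
    • tp (x4): thread pointer.

    Of the remaining registers, t0-t6 are temporaries for general use, a0-a7 carry function arguments (a0 and a1 also carry return values), and s0-s11 are saved registers whose values must be preserved across calls.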


    2. The dispatch method: encode

    encode takes one assembly statement as a string and calls the encoder that matches the instruction type.

    def encode(self, instruction):
        # Dispatch on the mnemonic (the first token) rather than on a
        # substring match, so that a typo such as "luik" is reported by name.
        tokens = instruction.split()
        mnemonic = tokens[0] if tokens else ""
        if mnemonic == "lui":
            return self.encode_lui(instruction)
        elif mnemonic == "sw":
            return self.encode_sw(instruction)
        elif mnemonic == "addi":
            return self.encode_addi(instruction)
        elif mnemonic == "add":
            return self.encode_add(instruction)
        else:
            raise ValueError(f"Unsupported instruction '{mnemonic}' in: {instruction}")

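    The instruction argument is the assembly statement that will be passed in later. This is only a first-pass check: a statement that does not name one of lui, sw, addi, or add is rejected up front as malformed.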
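
    A table keyed on the mnemonic is one common way to write this kind of dispatch. The sketch below is illustrative only: dispatch, demo_encoders, and the placeholder encoders are assumed names, not the article's code.

```python
def dispatch(instruction, encoders):
    """Pick an encoder by the first token (the mnemonic) of the statement.

    `encoders` is a hypothetical dict mapping mnemonic -> callable.
    """
    tokens = instruction.split()
    if not tokens:
        raise ValueError("Empty instruction")
    mnemonic = tokens[0]
    try:
        encoder = encoders[mnemonic]
    except KeyError:
        # Name the offending token so the user can see where the error is.
        raise ValueError(f"Unknown mnemonic '{mnemonic}' in: {instruction}") from None
    return encoder(instruction)


# Placeholder encoders that only report which branch was taken.
demo_encoders = {
    name: (lambda text, name=name: name)
    for name in ("lui", "sw", "addi", "add")
}

print(dispatch("addi a5, a5, 12", demo_encoders))  # addi
try:
    dispatch("luik a3, 0x200", demo_encoders)
except ValueError as error:
    print(error)  # Unknown mnemonic 'luik' in: luik a3, 0x200
```

    Because the lookup compares the whole mnemonic, the order of the entries no longer matters, and addi cannot be mistaken for add.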


    3. Encoders for each instruction

    def encode_lui(self, instruction):
        # Check the syntax: lui rd, 0x<hex immediate>
        if not re.match(r'^lui\s+[a-z0-9]+,\s*0x[a-fA-F0-9]+$', instruction):
            raise ValueError(f"Syntax error in instruction: {instruction}")
        # Split on whitespace and commas so "lui a5,0x20000" also tokenizes
        tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
        rd = self.registers.get(tokens[1])
        if rd is None:
            raise ValueError(f"Unknown register: {tokens[1]}")
        # U-type immediate: 20 bits, so anything longer is out of range
        imm = bin(int(tokens[2], 16))[2:].zfill(20)
        if len(imm) > 20:
            raise ValueError(f"Immediate value out of bounds: {tokens[2]}")
        opcode = "0110111"
        return imm + rd + opcode

    def encode_sw(self, instruction):
        # Check the syntax: sw rs2, offset(rs1), with a decimal offset
        if not re.match(r'^sw\s+[a-z0-9]+,\s*-?[0-9]+\([a-z0-9]+\)$', instruction):
            raise ValueError(f"Syntax error in instruction: {instruction}")
        tokens = [token for token in re.split(r'[\s,()]+', instruction) if token]
        rs2 = self.registers.get(tokens[1])
        rs1 = self.registers.get(tokens[3])
        if rs1 is None or rs2 is None:
            raise ValueError(f"Unknown register in instruction: {instruction}")
        # The offset is decimal (not hex) and is a 12-bit signed value
        value = int(tokens[2])
        if not -2048 <= value <= 2047:
            raise ValueError(f"Offset out of bounds: {tokens[2]}")
        offset = format(value & 0xFFF, '012b')  # two's complement
        opcode = "0100011"
        funct3 = "010"
        # Split the offset for the S-type instruction format
        imm_11_5 = offset[:7]
        imm_4_0 = offset[7:]
        return imm_11_5 + rs2 + rs1 + funct3 + imm_4_0 + opcode

    def encode_addi(self, instruction):
        # Check the syntax: addi rd, rs1, <decimal immediate>
        if not re.match(r'^addi\s+[a-z0-9]+,\s*[a-z0-9]+,\s*-?[0-9]+$', instruction):
            raise ValueError(f"Syntax error in instruction: {instruction}")
        tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
        rd = self.registers.get(tokens[1])
        rs1 = self.registers.get(tokens[2])
        if rd is None or rs1 is None:
            raise ValueError(f"Unknown register in instruction: {instruction}")
        # I-type immediate: 12-bit signed, stored in two's complement
        value = int(tokens[3])
        if not -2048 <= value <= 2047:
            raise ValueError(f"Immediate value out of bounds: {tokens[3]}")
        imm = format(value & 0xFFF, '012b')
        opcode = "0010011"
        funct3 = "000"
        return imm + rs1 + funct3 + rd + opcode

    def encode_add(self, instruction):
        # Check the syntax: add rd, rs1, rs2
        if not re.match(r'^add\s+[a-z0-9]+,\s*[a-z0-9]+,\s*[a-z0-9]+$', instruction):
            raise ValueError(f"Syntax error in instruction: {instruction}")
        tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
        rd = self.registers.get(tokens[1])
        rs1 = self.registers.get(tokens[2])
        rs2 = self.registers.get(tokens[3])
        if rd is None or rs1 is None or rs2 is None:
            raise ValueError(f"Unknown register in instruction: {instruction}")
        opcode = "0110011"
        funct3 = "000"
        funct7 = "0000000"
        return funct7 + rs2 + rs1 + funct3 + rd + opcode

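    These methods all follow the same pattern: first a regular expression checks the syntax, and any statement that does not match is rejected with an error; then the statement is split into tokens so the operands are easy to pick out; finally the tokens are translated into the bit fields of the machine word. The field layouts follow the instruction formats of the RISC-V RV32I base integer instruction set.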
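
    The immediates of addi and sw are signed 12-bit fields in RV32I, so a negative value has to be stored in two's complement. A minimal sketch of that step follows; encode_immediate is a hypothetical helper name, not a method of the Assembler class.

```python
def encode_immediate(value, bits=12):
    """Return `value` as a two's-complement bit string of width `bits`."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"Immediate {value} out of range [{low}, {high}]")
    # Masking keeps the low `bits` bits, which is the two's-complement
    # representation for negative values.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


print(encode_immediate(12))  # 000000001100
print(encode_immediate(-1))  # 111111111111
```

    Checking the numeric range before formatting avoids relying on the length of a bit string, which breaks down as soon as a minus sign is involved.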


    4. Reading source.s and writing binary.txt

    This part is ordinary Python file I/O: read every line of source.s into instructions, run each line through the dispatch method encode(), and write each result to binary.txt in hexadecimal.

    def from_file(self, input_file="source.s"):
        with open(input_file, 'r') as file:
            instructions = file.readlines()
        # Encode every non-blank line into a 32-bit binary string
        binary_codes = [self.encode(instruction.strip())
                        for instruction in instructions if instruction.strip()]
        with open("binary.txt", 'w') as out_file:
            for code in binary_codes:
                # Write each word as 8 hex digits, one word per line
                out_file.write(hex(int(code, 2))[2:].zfill(8) + '\n')

    Everything above belongs to the Assembler class, that is, to the assembler itself. To use it, create an instance named assembler and call from_file():

    assembler = Assembler()
    assembler.from_file()
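
    When source.s contains a bad line, it helps to know which line it was. The following sketch wraps any encoder so that errors carry a line number. assemble_lines and its encode parameter are illustrative names, and fake_encode is a stand-in for the demo, not a real RISC-V encoder.

```python
def assemble_lines(lines, encode):
    """Encode each non-blank line; prefix any error with its line number.

    `encode` is any callable that takes a statement and returns a 32-bit
    binary string, or raises ValueError for bad input.
    """
    words = []
    for number, line in enumerate(lines, start=1):
        statement = line.strip()
        if not statement:
            continue  # skip blank lines
        try:
            code = encode(statement)
        except ValueError as error:
            raise ValueError(f"line {number}: {error}") from error
        words.append(format(int(code, 2), "08x"))
    return words


def fake_encode(statement):
    """Stand-in encoder for the demo: accepts only 'add ...' statements."""
    if not statement.startswith("add "):
        raise ValueError(f"Unsupported instruction: {statement}")
    return "0" * 32


try:
    assemble_lines(["add a4, a4, a5\n", "\n", "luik a3, 0x200\n"], fake_encode)
except ValueError as error:
    print(error)  # line 3: Unsupported instruction: luik a3, 0x200
```

    Passing the encoder in as a parameter keeps the line-number bookkeeping separate from the encoding rules.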

    To check the result, prepare source.s with the following content:

    lui a5, 0x20000
    addi a5, a5, 12
    sw a4, 4(a5)
    add a4, a4, a5

    Running the assembler produces binary.txt with the following content:

    200007b7
    00c78793
    00e7a223
    00f70733
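
    One way to sanity-check a word from binary.txt is to split it back into its fields. The sketch below extracts only the fixed field positions used by the RV32I formats involved here; decode_fields is a hypothetical helper, not part of the assembler.

```python
def decode_fields(hex_word):
    """Split a 32-bit instruction word (hex string) into common RV32I fields."""
    word = int(hex_word, 16)
    return {
        "opcode": word & 0x7F,         # bits 6:0
        "rd": (word >> 7) & 0x1F,      # bits 11:7 (imm[4:0] in S-type)
        "funct3": (word >> 12) & 0x7,  # bits 14:12
        "rs1": (word >> 15) & 0x1F,    # bits 19:15
        "rs2": (word >> 20) & 0x1F,    # bits 24:20
        "upper20": word >> 12,         # bits 31:12 (the lui immediate)
    }


lui = decode_fields("200007b7")
print(hex(lui["opcode"]), lui["rd"], hex(lui["upper20"]))  # 0x37 15 0x20000

add = decode_fields("00f70733")
print(hex(add["opcode"]), add["rd"], add["rs1"], add["rs2"])  # 0x33 14 14 15
```

    Register 15 is a5 and register 14 is a4, so both words decode back to the operands written in source.s.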

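    Comparing this output with that of a real assembler shows that the assembler written here produces the correct result, which completes Problem 1. The complete source follows: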

    import re


    class Assembler:
        def __init__(self):
            # Register names and their 5-bit register numbers (x0..x31)
            self.registers = {
                "zero": "00000",
                "ra": "00001",
                "sp": "00010",
                "gp": "00011",
                "tp": "00100",
                "t0": "00101",
                "t1": "00110",
                "t2": "00111",
                "s0": "01000",
                "s1": "01001",
                "a0": "01010",
                "a1": "01011",
                "a2": "01100",
                "a3": "01101",
                "a4": "01110",
                "a5": "01111",
                "a6": "10000",
                "a7": "10001",
                "s2": "10010",
                "s3": "10011",
                "s4": "10100",
                "s5": "10101",
                "s6": "10110",
                "s7": "10111",
                "s8": "11000",
                "s9": "11001",
                "s10": "11010",
                "s11": "11011",
                "t3": "11100",
                "t4": "11101",
                "t5": "11110",
                "t6": "11111"
            }

        def encode(self, instruction):
            # Dispatch on the mnemonic (the first token) rather than on a
            # substring match, so that a typo such as "luik" is reported by name.
            tokens = instruction.split()
            mnemonic = tokens[0] if tokens else ""
            if mnemonic == "lui":
                return self.encode_lui(instruction)
            elif mnemonic == "sw":
                return self.encode_sw(instruction)
            elif mnemonic == "addi":
                return self.encode_addi(instruction)
            elif mnemonic == "add":
                return self.encode_add(instruction)
            else:
                raise ValueError(f"Unsupported instruction '{mnemonic}' in: {instruction}")

        def encode_lui(self, instruction):
            # Check the syntax: lui rd, 0x<hex immediate>
            if not re.match(r'^lui\s+[a-z0-9]+,\s*0x[a-fA-F0-9]+$', instruction):
                raise ValueError(f"Syntax error in instruction: {instruction}")
            # Split on whitespace and commas so "lui a5,0x20000" also tokenizes
            tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
            rd = self.registers.get(tokens[1])
            if rd is None:
                raise ValueError(f"Unknown register: {tokens[1]}")
            # U-type immediate: 20 bits, so anything longer is out of range
            imm = bin(int(tokens[2], 16))[2:].zfill(20)
            if len(imm) > 20:
                raise ValueError(f"Immediate value out of bounds: {tokens[2]}")
            opcode = "0110111"
            return imm + rd + opcode

        def encode_sw(self, instruction):
            # Check the syntax: sw rs2, offset(rs1), with a decimal offset
            if not re.match(r'^sw\s+[a-z0-9]+,\s*-?[0-9]+\([a-z0-9]+\)$', instruction):
                raise ValueError(f"Syntax error in instruction: {instruction}")
            tokens = [token for token in re.split(r'[\s,()]+', instruction) if token]
            rs2 = self.registers.get(tokens[1])
            rs1 = self.registers.get(tokens[3])
            if rs1 is None or rs2 is None:
                raise ValueError(f"Unknown register in instruction: {instruction}")
            # The offset is decimal (not hex) and is a 12-bit signed value
            value = int(tokens[2])
            if not -2048 <= value <= 2047:
                raise ValueError(f"Offset out of bounds: {tokens[2]}")
            offset = format(value & 0xFFF, '012b')  # two's complement
            opcode = "0100011"
            funct3 = "010"
            # Split the offset for the S-type instruction format
            imm_11_5 = offset[:7]
            imm_4_0 = offset[7:]
            return imm_11_5 + rs2 + rs1 + funct3 + imm_4_0 + opcode

        def encode_addi(self, instruction):
            # Check the syntax: addi rd, rs1, <decimal immediate>
            if not re.match(r'^addi\s+[a-z0-9]+,\s*[a-z0-9]+,\s*-?[0-9]+$', instruction):
                raise ValueError(f"Syntax error in instruction: {instruction}")
            tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
            rd = self.registers.get(tokens[1])
            rs1 = self.registers.get(tokens[2])
            if rd is None or rs1 is None:
                raise ValueError(f"Unknown register in instruction: {instruction}")
            # I-type immediate: 12-bit signed, stored in two's complement
            value = int(tokens[3])
            if not -2048 <= value <= 2047:
                raise ValueError(f"Immediate value out of bounds: {tokens[3]}")
            imm = format(value & 0xFFF, '012b')
            opcode = "0010011"
            funct3 = "000"
            return imm + rs1 + funct3 + rd + opcode

        def encode_add(self, instruction):
            # Check the syntax: add rd, rs1, rs2
            if not re.match(r'^add\s+[a-z0-9]+,\s*[a-z0-9]+,\s*[a-z0-9]+$', instruction):
                raise ValueError(f"Syntax error in instruction: {instruction}")
            tokens = [token for token in re.split(r'[\s,]+', instruction) if token]
            rd = self.registers.get(tokens[1])
            rs1 = self.registers.get(tokens[2])
            rs2 = self.registers.get(tokens[3])
            if rd is None or rs1 is None or rs2 is None:
                raise ValueError(f"Unknown register in instruction: {instruction}")
            opcode = "0110011"
            funct3 = "000"
            funct7 = "0000000"
            return funct7 + rs2 + rs1 + funct3 + rd + opcode

        def from_file(self, input_file="source.s"):
            with open(input_file, 'r') as file:
                instructions = file.readlines()
            # Encode every non-blank line into a 32-bit binary string
            binary_codes = [self.encode(instruction.strip())
                            for instruction in instructions if instruction.strip()]
            with open("binary.txt", 'w') as out_file:
                for code in binary_codes:
                    # Write each word as 8 hex digits, one word per line
                    out_file.write(hex(int(code, 2))[2:].zfill(8) + '\n')


    # Example run
    assembler = Assembler()
    assembler.from_file()

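    Problem 2: Hand assembly

    Given &a = 0x20000000, &b = 0x20000004, and &c = 0x20000008, write by hand the machine code of the assembly for the statement c = a + b + 0x6000. (Note: the assembly implementation is not unique; any sequence that implements the statement is acceptable.)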


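    Assembly code

    lui a4, 0x20000 # a4 = 0x20000000 = &a

    lw a5, 4(a4) # a5 = b

    lw a6, 0(a4) # a6 = a

    add a6, a6, a5 # a6 = a + b

    lui a7, 0x6 # a7 = 0x6000

    add a6, a6, a7 # a6 = a + b + 0x6000

    sw a6, 8(a4) # c = a6

    In lui a7, 0x6 the immediate 0x6 stands for the 20-bit value 0x00006; as the instructor explained two lectures ago, the leading zeros can be omitted. lui is needed here because the immediate of addi is a 12-bit signed value (-2048 to 2047) and 0x6000 = 24576 is outside that range. lui places 0x6 in bits 31:12 of the register, which gives 0x6 << 12 = 0x6000.


    Machine code

    For each instruction, the first line is the assembly statement, the second line is the instruction format, and the third line is the resulting machine code. (a4 is register x14; likewise a5, a6, and a7 are x15, x16, and x17.)

    lui a4, 0x20000 (load upper immediate)

    imm[31:12] rd opcode

    0010,0000,0000,0000,0000,0111,0011,0111

    lw a5, 4(a4)

    imm[11:0] rs1 funct3 rd opcode

    0000,0000,0100,0111,0010,0111,1000,0011

    Likewise:

    The machine code for lw a6, 0(a4) is:

    0000,0000,0000,0111,0010,1000,0000,0011

    add a6, a6, a5 # note that rs2 is a5 and rs1 is a6

    funct7 rs2 rs1 funct3 rd opcode

    0000,0000,1111,1000,0000,1000,0011,0011

    lui a7, 0x6

    imm[31:12] rd opcode

    0000,0000,0000,0000,0110,1000,1011,0111

    add a6, a6, a7

    funct7 rs2 rs1 funct3 rd opcode

    0000,0001,0001,1000,0000,1000,0011,0011

    sw a6, 8(a4)

    imm[11:5] rs2 rs1 funct3 imm[4:0] opcode

    0000,0001,0000,0111,0010,0100,0010,0011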
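
    Hand-assembled words are easy to get wrong by a single bit, so a short script that packs the same fields with shifts is a useful cross-check. This is a sketch under the standard RV32I field layouts; the helper names (u_type, i_type, r_type, grouped) and the register constants are illustrative.

```python
A4, A5, A6, A7 = 14, 15, 16, 17  # a4..a7 are x14..x17


def u_type(imm20, rd, opcode):
    """Pack imm[31:12] | rd | opcode."""
    return (imm20 << 12) | (rd << 7) | opcode


def i_type(imm12, rs1, funct3, rd, opcode):
    """Pack imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def grouped(word):
    """Format a 32-bit word as comma-separated 4-bit groups."""
    bits = format(word, "032b")
    return ",".join(bits[i:i + 4] for i in range(0, 32, 4))


print(grouped(u_type(0x20000, A4, 0x37)))           # lui a4, 0x20000
print(grouped(i_type(4, A4, 0b010, A5, 0x03)))      # lw a5, 4(a4)
print(grouped(u_type(0x6, A7, 0x37)))               # lui a7, 0x6
print(grouped(r_type(0, A7, A6, 0b000, A6, 0x33)))  # add a6, a6, a7
```

    The opcodes used are 0x37 (lui), 0x03 (loads), and 0x33 (register-register arithmetic); a wrong opcode shows up immediately in the last two 4-bit groups of the output.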

  • Original article: https://blog.csdn.net/qq_65052774/article/details/133814186