• 深入理解计算机系统——第三章 Machine-Level Representation of Programs


    这章没有细看,涉及很多汇编内容,看以下博客讲解帮助理解下

    图解系统
    程序的机器级表示(一)
    程序的机器级表示(二)
    程序的机器级表示(三)
    程序的机器级表示(四)
    深入理解计算机系统学习笔记
    程序的机器级表示
    深入理解计算机系统(CSAPP)全书学习笔记(详细)
    深入理解计算机系统 课程视频

    看完后体会:第一次看时感觉记不住各种寄存器的含义等,但往后看多了例子后就逐渐记住和理解一些了,看的时候把里面的重要概念或者命令的图打印出来,方面看到代码的时候查看命令,看不懂的地方网上搜下别人的笔记讲解,或者视频课程讲解,然后继续往下看,发现后面经常会提到前面的知识,或者有示例帮助巩固理解,继续看完后发现前面一些地方比之前容易懂了。

    3.2 Program Encoding

    C 程序从源文件生成可执行文件的过程:

    1. The C Preprocessor expands the source code to include any files specified with #include commands and to expand any macros, specified with #define declarations.

    2. The compiler generates assembly code versions of the source files (*.s).

    3. The assembler converts the assembly code into binary object-code files (*.o). Object code is one form of machine code—it contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.

    4. The linker merges these object-code files along with code implementing library functions (e.g., printf) and generates the final executable code file (as specified by the command-line directive -o p). Executable code is the second form of machine code we will consider—it is the exact form of code that is executed by the processor.

    3.2.1 Machine-Level Code

    1. Te format and behavior of a machine-level program is defined by the instruction set architecture, or ISA, defining the processor state, the format of the instructions, and the effect each of these instructions will have on the state.

    2. The memory addresses used by a machine-level program are virtual addresses, providing a memory model that appears to be a very large byte array.

    Whereas C provides a model in which objects of different data types can be declared and allocated in memory, machine code views the memory as simply a large byte-addressable array.

    指令集介绍:

    CPU 指令集
    What Is an Instruction Set Architecture?

    虚拟地址介绍:

    Windows内存体系(1) – 虚拟地址空间
    virtual address

    3.3 Data Formats

    Due to its origins as a 16-bit architecture that expanded into a 32-bit one, Intel uses the term “word” to refer to a 16-bit data type. Based on this, they refer to 32- bit quantities as “double words,” and 64-bit quantities as “quad words.”
    Sizes of C data types in x86-64

    3.4 Accessing Information

    An x86-64 central processing unit (CPU) contains a set of 16 general-purpose registers storing 64-bit values. These registers are used to store integer data as well as pointers.
    Integer registers

    整数寄存器,用来存放整型指针,有6个寄存器用来存放6个参数,如果参数超过6个,则放在上。

    3.4.1 Operand Specifiers

    x86-64 支持操作数形式如下:
    Operand forms
    源操作数有三种:立即数(常数),寄存器,内存
    目标操作数:寄存器或者内存; 结果存放的位置

    立即数(immediate):常数值,对于 ATT 形式的汇编代码,用 $ 跟着一个整型(C 标准格式),如:$-577$0x1F

    寄存器(register):表示寄存器的值。

    内存引用(memory reference):根据有效地址获取内存的位置。

    3.4.2 Data Movement Instructions

    MOV 指令:These instructions copy data from a source location to a destination location, without any transformation.

    四种形式:movb, movw, movl, and movq,效果相同,只是数据尺寸不同,分别为
    1,2,4,8 字节:
    Simple data movement instructions

    MOV 指令源操作数和目标操作数不能同时为内存

    For most cases, the mov instructions will only update the specific register bytes or memory locations indicated by the destination operand.
    The only exception is that when movl has a register as the destination, it will also set the high-order 4 bytes of the register to 0.

    当使用 movl 指令时,且目标操作数为寄存器,那么寄存器的高4个字节会填充 0

    示例:

    1 movl $0x4050,%eax Immediate--Register, 4 bytes
    2 movw %bp,%sp Register--Register, 2 bytes
    3 movb (%rdi,%rcx),%al Memory--Register, 1 byte
    4 movb $-17,(%esp) Immediate--Memory, 1 byte
    5 movq %rax,-12(%rbp) Register--Memory, 8 bytes
    
    • 1
    • 2
    • 3
    • 4
    • 5

    The regular movq instruction can only have immediate source operands that can be represented as 32-bit two’s-complement numbers. This value is then sign extended to produce the 64-bit value for the destination.

    The movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

    当目标操作数尺寸大于源操作数,可以在高位进行零扩展或者符号位扩展

    零扩展MOVZ 指令:Instructions in the movz class fill out the remaining bytes of the destination with zeros, while those in the movs class fill them out by sign extension, replicating copies of the most significant bit of the source operand.

    指令:
    Zero-extending data movement instructions注意:这里没有 movzlq,因为当使用 movl 指令时,且目标操作数为寄存器,那么寄存器的高4个字节会填充 0

    要求:源操作数位寄存器或者内存地址,目标操作数位寄存器。

    符号位扩展 MOVS
    Sign-extending data movement instructions注意:for 64-bit destinations, moving with sign extension is supported for all three source types, and moving with zero extension is supported for the two smaller source types.

    示例:

    long exchange(long *xp, long y)
    {
    long x = *xp;
    *xp = y;
    return x;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6

    汇编指令:

    long exchange(long *xp, long y)
    xp in %rdi, y in %rsi
    1 exchange:
    2 movq (%rdi), %rax		Get x at xp. Set as return value.
    3 movq %rsi, (%rdi)		Store y at xp.
    4 ret					Return.
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6

    第一个和第二个参数分别存在寄存 %rdi%rsi 中。
    指令 2:根据 xp 地址从内存中取数值,存到寄存器 %rax 中。
    指令3:将 %rsi ,即 y 的值存到指针 xp 指向的数值对应的内存地址处。
    最后寄存器 %rax 的内容,即 x 的数值将作为返回值返回。

    上述的局部变量保存在寄存器而非内存中,寄存器比内存的访问速度更快。

    3.4.4 Pushing and Popping Stack Data

    Push 和 Pop 指令:
    Push and pop instructions
    栈操作介绍:
    Illustration of stack operation栈:程序用栈来管理过程调用与返回的状态,有先进后出的特性。
    在栈中,程序传递潜在信息、控制信息和数据,并分配本地数据。
    ×86-64栈:初始地址很高,向里面添加数据时,栈的地址向低扩展,寄存器 %rsp 包含栈顶的地址。
    当使用 pop 弹出栈顶的数据时,实际数据仍在该地址,只是栈顶的指针指向的位置变了(向高地址移动8)。

    3.5 Arithmetic and Logical Operations

    Each of the instruction classes shown has instructions for operating on four
    different sizes of data
    .

    For example, the instruction class add consists of four addition instructions: addb, addw, addl, and addq, adding bytes, words, double words, and quad words, respectively.

    The operations are divided into four groups: load effective address, unary, binary, and shifts.

    3.5.1 Load Effective Address

    The load effective address instruction leaq is actually a variant of the movq instruction.
    It has the form of an instruction that reads from memory to a register, but it does not reference memory at all.
    Integer arithmetic operations

    注意:leaq 不是直接读地址,而是复制地址到目标位置。

    3.5.5 Special Arithmetic Operations

    特殊的算数操作:
    Special Arithmetic Operations

    3.6 Control

    3.6.1 Condition Codes

    In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

    These registers can then be tested to perform conditional branches.
    condition codesCF: 进位标志

    比较和测试指令
    Comparison and test instructions
    比较指令:如果相等则标志位为0
    测试指令:和 AND 指令使用相同,但不会修改内容

    3.6.2 Accessing the Condition Codes

    SET 指令:
    The set instructions

    3.6.3 Jump Instructions

    jump 指令:
    The jump instructions

    When the machine encounters a conditional jump (referred to as a “branch”), it cannot determine which way the branch will go until it has evaluated the branch condition.
    跳转指令效率低:机器遇到条件跳转时,无法知道跳转到哪条分支,因此会左分支预测来猜测可能会跳转到哪条分支,如果预测失败,则需要舍弃之前的工作然后执行正确的分支。

    Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test.
    条件传送指令无需预测,效率更高。

    条件传送指令:
    The conditional move instructions

    条件转移示例:
    如果要做如下判断:

    v = test-expr ? then-expr : else-expr;

    写成如下形式:

    v = then-expr;
    ve = else-expr;
    t = test-expr;
    if (!t) v = ve;
    
    • 1
    • 2
    • 3
    • 4

    注意:
    1、不是所有情况都能用条件转移,如:

    long cread(long *xp) {
    return (xp ? *xp : 0);
    }
    
    • 1
    • 2
    • 3

    如果用条件转移:

    long cread(long *xp)
    Invalid implementation of function cread
    xp in register %rdi
    1 cread:
    2 movq (%rdi), %rax v = *xp
    3 testq %rdi, %rdi Test x
    4 movl $0, %edx Set ve = 0
    5 cmove %rdx, %rax If x==0, v = ve
    6 ret Return v
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9

    那么当 xp 为空指针时,也会获取其值,将会出错。

    2、不是所有用条件转移的情况效率都会更高
    如果条件判断语句中有大量的计算过程,那么会浪费大量的工作在错误的分支计算中。

    因此,只有当条件表达式的计算较简单时采用条件转移

    3.6.7 Loops

    do-whilewhilefor 通过条件判断和跳转实现循环控制。

    3.6.8 Switch Statements

    A switch statement provides a multiway branching capability based on the value of an integer index.

    Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table.

    A jump table is an array where entry i is the address of a code segment implementing the action the program should take when the switch index equals i.

    The code performs an array reference into the jump table using the switch index to determine the target for a jump instruction.

    The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases.

    Jump tables are used when there are a number of cases (e.g., four or more) and they span a small range of values.

    switch 比用大量的 if-else 效率高。

    3.7 Procedures

    Well-designed software uses procedures as an abstraction mechanism, hiding the detailed implementation of some action while providing a clear and concise interface definition of what values will be computed and what effects the procedure will have on the program state.

    Procedures come in many guises in different programming languages—functions, methods, subroutines, handlers, and so on—but they all share a general set of features.
    过程是一个抽象,隐藏实现的细节,提供接口使用。对于不同的编程语言,其叫法不同,但本质特征一样。

    示例:

    long mult2(long, long);
    void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5

    汇编代码:

    multstore:
    pushq %rbx
    movq %rdx, %rbx
    
    call mult2
    movq %rax, (%rbx)
    popq %rbx
    ret
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8

    P 调用过程 Q,将执行以下操作:
    1、传递控制 Passing control
    调用 call 指令,首先减小指针地址(8位),然后将这条调用指令之后的指令地址写入栈顶(调用返回后执行的指令地址),程序计数器将被设置为被调用函数的首地址(该地址存在 call 指令中),该指令结合了 jumppush 的功能。

    当被调用的过程 Q 执行完成,会执行 ret 指令(或 retq),该指令就是逆转 call 指令的效果。它会假设栈顶有一个想要跳转的地址,然后将栈顶的地址弹出(pop指令,弹出后栈顶的指针会增加,而该地址的内容不会消失,只是不属于栈的一部分),然后将程序计数器设置为弹出的地址,因此程序会回到原来的地方继续执行。

    2、传递数据 Passing data
    存放参数的寄存器有6个:%rdi, %rsi, %rdx, %rcx, %r8, %r9,存放返回值的寄存器位 %rax
    上述这些寄存器只能存放整型和指针,如果参数超过6个,则参数会被放入栈中。(参数放在寄存器中比栈中访问速度更快)

    3、管理和释放内存 Allocating and deallocation memory
    被调用的函数 Q 必须位局部变量分配空间,并且在返回前释放存储空间

    栈结构:
    General stack frame structure

    3.7.1 The Run-Time Stack

    Using our example of procedure P calling procedure Q, we can see that while Q is executing, P, along with any of the procedures in the chain of calls up to P, is temporarily suspended.
    Q 正在被执行时,P 以及与其相关的调用 P 的过程都处于挂起状态。

    执行特定函数时,只需要引用该函数内部的数据或者已传递给它的值,而其他函数处于冻结状态,这种为单线程运行模式。

    栈帧(stack frame):栈上用于特定 call 的每个内存块成为栈帧(It is a frame for a particular instance of a procedure, a particular call to a procedure)。

    我们需要在栈中为每个被调用且未返回的过程保留一个栈帧。

    通常一个栈帧由两个指针分隔,一个是栈指针(指向栈顶),另一个是基指针(base pointer),由寄存器 %rbp 保存,基指针是可选的。

    用递归的弊端:会不断增加栈帧,需要很多空间,而大多数系统会限制栈的深度。

    P 调用 Q 时,会将 Q 返回后地址(返回后要执行的位置)放在栈上,该返回地址也被认为是 P 栈帧的一部分。

    大多数过程的栈帧有固定的尺寸,在过程开始执行时就分配好空间,但有些情况需要可变大小的栈帧,如过程中传递的参数数目大于6个时,会在 Q 被调用前将多余的参数存放在 P 的栈帧上。
    Some procedures, however, require variable-size frames. Procedure P can pass up to six integral values (i.e., pointers and integers) on the stack, but if Q requires more arguments, these can be stored by P within its stack frame prior to the call.

    并非所以的函数都会用到栈帧,当一个函数的全部局部变量都存在寄存器中,且函数内部不需要调用其他函数,不需要栈帧。

    3.7.4 Local Storage on the Stack

    At times, however, local data must be stored in memory. Common cases of this include these:

    • There are not enough registers to hold all of the local data.
    • The address operator & is applied to a local variable, and hence we must be able to generate an address for it.
    • Some of the local variables are arrays or structures and hence must be accessed by array or structure references. We will discuss this possibility when we describe how arrays and structures are allocated.

    3.7.5 Local Storage in Registers

    Although only one procedure can be active at a given time, we must make sure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later.

    当一个过程调用另一个过程时,必须保证被调用者不会覆盖掉调用者以后需要使用的寄存器。

    By convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee saved registers.

    P 调用 Q时,Q 必须保证那些在调用完后 P 仍需使用的寄存器的值在调用前后保持不变(如寄存器 %rdi (第一个参数的值)),可以有两种方式:
    1、保证 Q 不会改变这些寄存器的值
    2、在 Q 被调用前先将这些寄存器的值压入栈中,然后在调用结束后弹出这些值。这种方式在压入栈中是在栈帧中创建的区域被标记为 被保存的寄存器(Saved register)

    临时存放寄存器的值有两种:

    • Caller Saved
      调用者在调用前将临时值存放在它的栈帧中
    • Callee Saved
      被调用者在执行前将临时值存在它的栈帧中,然后在返回到调佣处前回复这些值

    寄存器 %r10%r11Caller Saved 寄存器,用于存放任何可以被函数修改的临时值

    寄存器 %rbx, %rbp%r12–%r15Callee Saved 寄存器,当一个函数要改变这些寄存器的值时,必须先压入栈中保存再在返回时从栈中弹出恢复数据

    示例:
    Code demonstrating use of callee-saved registers

    上述代码使用 callee-saved 寄存器 %rbp来存放 x 的值,用寄存器 %rbx 来存放 Q(y) 计算的结果。在函数的最开始,先将这两个寄存器的内容压入栈中,最后在函数的结果再弹出栈的内容到寄存器中。
    注意压入栈和弹出栈时的顺序相反(先进后出)。

    3.8 Array Allocation and Access

    机器代码里没有数组的概念,数组即为连续的字节集合。

    3.8.1 Basic Principles

    For data type T and integer constant N, consider a declaration of the form
    T A[N];

    标注数组起始地址为 x A x_{A} xA,该数组将分配一块连续的内存空间,大小为 L ⋅ N L \cdot N LN,其中 L 为数组元素类型 T 的大小(bytes),N 为数组元素的个数。

    标志符 A 将被用作一个指向数组起始地址的指针,数组元素的索引在 0N-1 区间,第 i 个元素的索引为 x A + L ⋅ i x_{A} + L \cdot i xA+Li

    The memory referencing instructions of x86-64 are designed to simplify array access.

    假设 E 是一个元素为 int 型的数组,如果想获取 E[i],那么 E 的地址存在寄存器 %rdx 中,而索引 i 则存在寄存器 %rcx 中,如下指令:

    movl (%rdx,%rcx,4),%eax
    
    • 1

    将实现 x E + 4 i x_{E} + 4i xE+4i,从该地址的内存处读取数据,然后存在寄存器 %eax 中(%eax 是 32 位存返回值的寄存器,%rax 是64位 存放返回值寄存器)。

    The allowed scaling factors of 1, 2, 4, and 8 cover the sizes of the common primitive data types. (上述例子中为 4)

    3.8.2 Pointer Arithmetic

    C allows arithmetic on pointers, where the computed value is scaled according to the size of the data type referenced by the pointer.
    That is, if p p p is a pointer to data of type T T T , and the value of p p p is x p x_{p} xp, then the expression p + i p+i p+i has value x p + L ⋅ i x_{p} + L \cdot i xp+Li, where L L L is the size of data type T T T .

    示例:
    Examples
    上述例子返回值的存放:the result being stored in either register %eax (for data) or register %rax (for pointers).

    注意:上述返回结果为 int 类型的数组值时,用 4 字节的操作 movl 和寄存器 %reax ,而返回值为 int * 指针时,其为 8 字节操作 leaqleaq 指令不是引用内存而是复制地址),用的寄存器位 %rax

    最后一个例子展示计算有相同结构的两个指针相减,返回结果类型为 long,数值等于两个地址的差值除以元素类型的大小,即为两个地址相差的元素个数。

    3.8.3 Nested Arrays

    多维数组:对于数组 T D[R][C],数组元素 D[i][j] 在内存中的地址为:
    & D [ i ] [ j ] D[i][j] D[i][j] = x D + L ( C ⋅ i + j ) x_{D} + L(C \cdot i + j) xD+L(Ci+j)

    其中 L 时数据类型 T 的尺寸(bytes)

    数组的结构如下图:
    Nested Arrays

    3.8.4 Fixed-Size Arrays

    示例:

    #define N 16  //通过 #define 来声明常量,便于修改尺寸
    //typedef 声明 fix_matrix 为一个二维数组,有 N 行,N列,元素类型为 int
    typedef int fix_matrix[N][N];  
    
    //(a) Original C code
    /* Compute i,k of fixed matrix product */
    int fix_prod_ele (fix_matrix A, fix_matrix B, long i, long k) {
    long j;
    int result = 0;
    for (j = 0; j < N; j++)
    result += A[i][j] * B[j][k];
    return result;
    }
    
    //(b) Optimized C code
    1 /* Compute i,k of fixed matrix product */
    2 int fix_prod_ele_opt(fix_matrix A, fix_matrix B, long i, long k) {
    3 int *Aptr = &A[i][0]; /* Points to elements in row i of A */
    4 int *Bptr = &B[0][k]; /* Points to elements in column k of B */
    5 int *Bend = &B[N][k]; /* Marks stopping point for Bptr */
    6 int result = 0;
    7 do { /* No need for initial test */
    8 result += *Aptr * *Bptr; /* Add next product to sum */
    9 Aptr ++; /* Move Aptr to next column */
    10 Bptr += N; /* Move Bptr to next row */
    11 } while (Bptr != Bend); /* Test for stopping point */
    12 return result;
    13 }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28

    上述代码展示了优化前和优化后的两种实现方式,优化后的汇编代码如下:
    在这里插入图片描述
    在这里插入图片描述

    salq 为左移指令,对于 & A [ i ] [ 0 ] A[i][0] A[i][0],根据公式 & D [ i ] [ j ] D[i][j] D[i][j] = x D + L ( C ⋅ i + j ) x_{D} + L(C \cdot i + j) xD+L(Ci+j),可得到 x A + 64 i x_{A} + 64i xA+64i,而 i 左移 6 位即为 i ∗ 2 6 i * 2^6 i26

    13 行处比较的结果会保存在 ZF 标志中,相等则为0,然后根据 jne 指令,即判断 ZF 标志,不为 0,即不相等就跳转到标签 L7 处。

    3.8.5 Variable-Size Arrays

    Historically, C only supported multidimensional arrays where the sizes (with the possible exception of the first dimension) could be determined at compile time.

    Programmers requiring variable-size arrays had to allocate storage for these arrays using functions such as malloc or calloc, and they had to explicitly encode the mapping of multidimensional arrays into single-dimension ones via row-major indexing, as expressed in Equation 3.1. ISO C99 introduced the capability of having array dimension expressions that are computed as the array is being allocated.

    可变大小的多维数组 int A[expr1][expr2] ,定义如下函数:

    int var_ele(long n, int A[n][n], long i, long j) {
    return A[i][j];
    }
    
    • 1
    • 2
    • 3

    注意 n 要在 A[n][n] 前声明,汇编代码如下:
    referencing function这里计算 n ⋅ i n \cdot i ni 用到 imulq 乘法指令,而非用左移指令。

    示例:

    //(a) Original C code
    1 /* Compute i,k of variable matrix product */
    2 int var_prod_ele(long n, int A[n][n], int B[n][n], long i, long k) {
    3 long j;
    4 int result = 0;
    5 6
    for (j = 0; j < n; j++)
    7 result += A[i][j] * B[j][k];
    8 9
    return result;
    10 }
    
    //(b) Optimized C code
    /* Compute i,k of variable matrix product */
    int var_prod_ele_opt(long n, int A[n][n], int B[n][n], long i, long k) {
    int *Arow = A[i];
    int *Bptr = &B[0][k];
    int result = 0;
    long j;
    for (j = 0; j < n; j++) {
    result += Arow[j] * *Bptr;
    Bptr += n;
    }
    return result;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25

    优化后的汇编代码:
    assembly code

    3.9 Heterogeneous Data Structures

    3.9.1 Structures

    The different components of a structure are referenced by names.

    The implementation of structures is similar to that of arrays in that all of the components of a structure are stored in a contiguous region of memory and a pointer to a structure is the address of its first byte.

    例如对以下结构体:

    struct rec {
    int i;
    int j;
    int a[2];
    int *p;
    };
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6

    其结构如下:
    structure
    可见数组是嵌入在结构体中。
    例如变量 r 的类型为 rec *,获取结构体中成员的汇编代码:

    Registers: r in %rdi
    1 movl (%rdi), %eax 		Get r->i
    2 movl %eax, 4(%rdi) 		Store in r->j
    
    • 1
    • 2
    • 3

    上述代码第一条为获取 i 的数值,然后存到返回值寄存器 %eax 中;
    第二条为将寄存器 %eax 的内容写入到变量 r 地址加 4 后的地址所在的内存处,即 j 的数值,因此将 j 的数值设置为 i 的数值。

    To generate a pointer to an object within a structure, we can simply add the field’s offset to the structure address.
    For example, we can generate the pointer &(r->a[1]) by adding offset 8 + 4 ⋅ 1 = 12 8 + 4 \cdot 1= 12 8+41=12.
    For pointer r in register %rdi and long integer variable i in register %rsi, we can generate the pointer value &(r->a[i]) with the single instruction:

    Registers: r in %rdi, i %rsi
    1 leaq 8(%rdi,%rsi,4), %rax 		Set %rax to &r->a[i]
    
    • 1
    • 2

    The selection of the different fields of a structure is handled completely at compile time.

    3.9.2 Unions

    共用体:allowing a single object to be referenced according to multiple types.

    与结构体的区别:Rather than having the different fields reference different blocks of memory, they all reference the same block.

    共用体的大小为共用体中最大类型的变量的大小。
    The overall size of a union equals the maximum size of any of its fields.

    共用体的使用场景:
    1、One application is when we know in advance that the use of two different fields in a data structure
    will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
    提前知道需要使用几种不同的类型,且不同类型是互斥的使用。

    2、Unions can also be used to access the bit patterns of different data types.
    例如:需要做如下类型转换

    unsigned long u = (unsigned long) d;
    
    • 1

    通过以下方式实现:

    unsigned long double2bits(double d) {
    union {
    double d;
    unsigned long u;
    } temp;
    temp.d = d;
    return temp.u;
    };
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8

    The result will be that u will have the same bit representation as d, including fields for the sign bit, the exponent, and the significand.

    注意事项:
    When using unions to combine data types of different sizes, byte-ordering issues can become important.

    3.9.3 Data Alignment

    数据对齐:基于硬件的需求
    Alignment restrictions simplify the design of the hardware forming the interface between the processor and the memory system.

    使用数据对齐能提升内存系统的性能。

    对齐的规则:
    Their alignment rule is based on the principle that any primitive object of K bytes must have an address that is a multiple of K. We can see that this rule leads to the following alignments:
    alignment rule

    具体 K 的大小是多少,由结构体中尺寸最大的类型对应的 K 决定。

    如:

    struct S1 {
    int i;
    char c;
    int j;
    };
    
    • 1
    • 2
    • 3
    • 4
    • 5

    对齐后如下:
    alignment

    有时编译器可能需要在结构体末尾填充,如:

    struct S2 {
    int i;
    int j;
    char c;
    };
    
    • 1
    • 2
    • 3
    • 4
    • 5

    对齐后:
    alignment

    3.10 Combining Control and Data in Machine-Level Programs

    3.10.3 Out-of-Bounds Memory References and Buffer Overflow

    GDB 调试的命令:
    Example gdb commands

    3.10.4 Thwarting Buffer Overflow Attacks (?)

    未看
    缓冲区溢出解决方案:

    1. Stack Randomization
    2. Stack Corruption Detection
    3. Limiting Executable Code Regions

    3.10.5 Supporting Variable-Size Stack Frames (?)

    未看
    Some functions, however, require a variable amount of local storage.

    This can occur, for example, when the function calls alloca, a standard library function that can allocate an arbitrary number of bytes of storage on the stack.

    It can also occur when the code declares a local array of variable size.

    示例:

    long vframe(long n, long idx, long *q) {
    long i;
    long *p[n];
    p[0] = &i;
    for (i = 1; i < n; i++)
    	p[i] = q;
    	
    return *p[idx];
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9

    上述代码包含可变尺寸的数组,数组 p 为包含 n 个指向 long 的指针,不同调用函数可能传递的 n 的值不同,而该数组需要在栈上分配 8n 字节,因此编译器无法知道为该函数的栈帧分配多少空间。

    此外,该函数用到了对局部变量 i 的地址的引用,因此该变量必须存在栈上。(?)

    To manage a variable-size stack frame, x86-64 code uses register %rbp to serve as a frame pointer (sometimes referred to as a base pointer, and hence the letters bp in %rbp).

    栈帧的结构如下:
    Stack frame structure
    We see that the code must save the previous version of %rbp on the stack, since it is a callee-saved register.
    It then keeps %rbp pointing to this position throughout the execution of the function, and it references fixed-length local variables, such as i, at offsets relative to %rbp.

    汇编代码如下:
    Function requiring the use of a frame pointer
    首先保存当前 %rbp 寄存器的值到栈中,然后将 %rbp 寄存器的值设置为栈指针位置 %rsp
    然后将栈指针位置向下扩展16 字节。
    5 - 11 行为数组 p 分配空间 (?)

    In earlier versions of x86 code, the frame pointer was used with every function call. With x86-64 code, it is used only in cases where the stack frame may be of variable size, as is the case for function vframe.

    3.11 Floating-Point Code

    未看
    存放浮点数据的寄存器:
    Media registers

    Floating-point movement instructions:
    Floating-point movement instructions

    3.11.1 Floating-Point Movement and Conversion Operations

    浮点型转换操作:
    floating-point conversion operations

    3.11.6 Floating-Point Comparison Operations

    浮点型比较:
    Floating-Point Comparison Operations

  • 相关阅读:
    LQ0264 鲁卡斯队列【精度计算】
    vue怎样封装接口
    python经典百题之判断星期几
    Discrod账号为什么被封?怎么解封?这些是关键
    记录为小电机安装一个5012编码器(多摩川协议)的过程
    这个 MySQL bug 99% 的人会踩坑!
    软考通过率低吗?怎么备考?
    3.11-程序基本的控制语句 3.12-表达式 3.13-数据类型 3.14-常量/变量 3.15-标识符
    133道Java面试题及答案(面试必看)
    含文档+PPT+源码等]精品基于SpringCloud实现的商品服务系统-微服务毕业设计项目源码-分布式毕设项目[包运行成功]
  • 原文地址:https://blog.csdn.net/Lee567/article/details/126115738