这章没有细看,涉及很多汇编内容,看以下博客讲解帮助理解下
图解系统
程序的机器级表示(一)
程序的机器级表示(二)
程序的机器级表示(三)
程序的机器级表示(四)
深入理解计算机系统学习笔记
程序的机器级表示
深入理解计算机系统(CSAPP)全书学习笔记(详细)
深入理解计算机系统 课程视频
看完后体会:第一次看时感觉记不住各种寄存器的含义等,但往后看多了例子后就逐渐记住和理解一些了,看的时候把里面的重要概念或者命令的图打印出来,方面看到代码的时候查看命令,看不懂的地方网上搜下别人的笔记讲解,或者视频课程讲解,然后继续往下看,发现后面经常会提到前面的知识,或者有示例帮助巩固理解,继续看完后发现前面一些地方比之前容易懂了。
C 程序从源文件生成可执行文件的过程:
The C Preprocessor
expands the source code to include any files specified with #include
commands and to expand any macros
, specified with #define
declarations.
The compiler generates assembly code versions of the source files (*.s).
The assembler converts the assembly code into binary object-code files (*.o). Object code is one form of machine code—it contains binary representations of all of the instructions, but the addresses of global values are not yet filled in.
The linker merges these object-code files along with code implementing library functions (e.g., printf
) and generates the final executable code file (as specified by the command-line directive -o p). Executable code is the second form of machine code we will consider—it is the exact form of code that is executed by the processor.
Te format and behavior of a machine-level program is defined by the instruction set architecture
, or ISA
, defining the processor state, the format of the instructions, and the effect each of these instructions will have on the state.
The memory addresses used by a machine-level program are virtual addresses
, providing a memory model that appears to be a very large byte array.
Whereas C provides a model in which objects of different data types can be declared and allocated in memory, machine code views the memory as simply a large byte-addressable array.
指令集介绍:
虚拟地址介绍:
Due to its origins as a 16-bit architecture
that expanded into a 32-bit
one, Intel uses the term “word” to refer to a 16-bit
data type. Based on this, they refer to 32- bit
quantities as “double words,” and 64-bit
quantities as “quad words.”
An x86-64 central processing unit (CPU) contains a set of 16 general-purpose registers storing 64-bit values. These registers are used to store integer data as well as pointers.
整数寄存器,用来存放整型
和指针
,有6个寄存器用来存放6个参数,如果参数超过6个,则放在栈
上。
x86-64
支持操作数形式如下:
源操作数有三种:立即数(常数),寄存器,内存
目标操作数:寄存器或者内存; 结果存放的位置
立即数(immediate):常数值,对于 ATT 形式的汇编代码,用 $
跟着一个整型(C 标准格式),如:$-577
,$0x1F
。
寄存器(register):表示寄存器的值。
内存引用(memory reference):根据有效地址获取内存的位置。
MOV
指令:These instructions copy data from a source location to a destination location, without any transformation.
四种形式:movb, movw, movl, and movq
,效果相同,只是数据尺寸不同,分别为
1,2,4,8
字节:
MOV
指令源操作数和目标操作数不能同时为内存
For most cases, the mov
instructions will only update the specific register
bytes or memory
locations indicated by the destination operand
.
The only exception is that when movl
has a register
as the destination
, it will also set the high-order 4 bytes of the register to 0.
当使用 movl
指令时,且目标操作数为寄存器,那么寄存器的高4个字节会填充 0
。
示例:
1 movl $0x4050,%eax Immediate--Register, 4 bytes
2 movw %bp,%sp Register--Register, 2 bytes
3 movb (%rdi,%rcx),%al Memory--Register, 1 byte
4 movb $-17,(%esp) Immediate--Memory, 1 byte
5 movq %rax,-12(%rbp) Register--Memory, 8 bytes
The regular movq
instruction can only have immediate
source operands that can be represented as 32-bit two’s-complement numbers
. This value is then sign
extended to produce the 64-bit value
for the destination
.
The movabsq
instruction can have an arbitrary 64-bit immediate
value as its source
operand and can only have a register as a destination.
当目标操作数尺寸大于源操作数,可以在高位进行零扩展
或者符号位扩展
。
零扩展MOVZ
指令:Instructions in the movz
class fill out the remaining bytes of the destination
with zeros
, while those in the movs
class fill them out by sign extension
, replicating copies of the most significant bit
of the source
operand.
指令:
注意:这里没有 movzlq
,因为当使用 movl
指令时,且目标操作数为寄存器,那么寄存器的高4个字节会填充 0
。
要求:源操作数位寄存器或者内存地址,目标操作数位寄存器。
符号位扩展 MOVS
:
注意:for 64-bit destinations
, moving with sign extension
is supported for all three source types, and moving with zero extension
is supported for the two smaller source types.
示例:
long exchange(long *xp, long y)
{
long x = *xp;
*xp = y;
return x;
}
汇编指令:
long exchange(long *xp, long y)
xp in %rdi, y in %rsi
1 exchange:
2 movq (%rdi), %rax Get x at xp. Set as return value.
3 movq %rsi, (%rdi) Store y at xp.
4 ret Return.
第一个和第二个参数分别存在寄存 %rdi
和 %rsi
中。
指令 2:根据 xp
地址从内存中取数值,存到寄存器 %rax
中。
指令3:将 %rsi
,即 y
的值存到指针 xp
指向的数值对应的内存地址处。
最后寄存器 %rax
的内容,即 x
的数值将作为返回值返回。
上述的局部变量保存在寄存器而非内存中,寄存器比内存的访问速度更快。
Push
和 Pop 指令:
栈操作介绍:
栈:程序用栈来管理过程调用与返回的状态,有先进后出的特性。
在栈中,程序传递潜在信息、控制信息和数据,并分配本地数据。
×86-64
栈:初始地址很高,向里面添加数据时,栈的地址向低扩展,寄存器 %rsp
包含栈顶的地址。
当使用 pop
弹出栈顶的数据时,实际数据仍在该地址,只是栈顶的指针指向的位置变了(向高地址移动8)。
Each of the instruction classes shown has instructions for operating on four
different sizes of data.
For example, the instruction class add consists of four addition instructions: addb
, addw
, addl
, and addq
, adding bytes
, words
, double words
, and quad words
, respectively.
The operations are divided into four groups: load effective address, unary, binary, and shifts.
The load effective address
instruction leaq
is actually a variant of the movq
instruction.
It has the form of an instruction that reads from memory to a register, but it does not reference memory at all.
注意:leaq
不是直接读地址,而是复制地址到目标位置。
特殊的算数操作:
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
These registers can then be tested to perform conditional branches.
CF: 进位标志
比较和测试指令
比较指令:如果相等则标志位为0
测试指令:和 AND
指令使用相同,但不会修改内容
SET
指令:
jump
指令:
When the machine encounters a conditional jump (referred to as a “branch”), it cannot determine which way the branch will go until it has evaluated the branch condition.
跳转指令效率低:机器遇到条件跳转时,无法知道跳转到哪条分支,因此会左分支预测来猜测可能会跳转到哪条分支,如果预测失败,则需要舍弃之前的工作然后执行正确的分支。
Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test.
条件传送指令无需预测,效率更高。
条件传送指令:
条件转移示例:
如果要做如下判断:
v = test-expr ? then-expr : else-expr;
写成如下形式:
v = then-expr;
ve = else-expr;
t = test-expr;
if (!t) v = ve;
注意:
1、不是所有情况都能用条件转移,如:
long cread(long *xp) {
return (xp ? *xp : 0);
}
如果用条件转移:
long cread(long *xp)
Invalid implementation of function cread
xp in register %rdi
1 cread:
2 movq (%rdi), %rax v = *xp
3 testq %rdi, %rdi Test x
4 movl $0, %edx Set ve = 0
5 cmove %rdx, %rax If x==0, v = ve
6 ret Return v
那么当 xp
为空指针时,也会获取其值,将会出错。
2、不是所有用条件转移的情况效率都会更高
如果条件判断语句中有大量的计算过程,那么会浪费大量的工作在错误的分支计算中。
因此,只有当条件表达式的计算较简单时采用条件转移。
do-while
,while
和 for
通过条件判断和跳转实现循环控制。
A switch
statement provides a multiway branching capability based on the value of an integer index.
Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table
.
A jump table
is an array
where entry i
is the address of a code segment implementing the action the program should take when the switch
index equals i
.
The code performs an array reference into the jump table
using the switch
index to determine the target for a jump instruction.
The advantage of using a jump table
over a long sequence of if-else
statements is that the time taken to perform the switch
is independent of the number of switch cases
.
Jump tables
are used when there are a number of cases (e.g., four or more) and they span a small range of values.
用 switch
比用大量的 if-else
效率高。
Well-designed software uses procedures
as an abstraction mechanism
, hiding the detailed implementation of some action while providing a clear and concise interface definition of what values will be computed and what effects the procedure will have on the program state.
Procedures
come in many guises in different programming languages—functions, methods, subroutines, handlers, and so on—but they all share a general set of features.
过程是一个抽象,隐藏实现的细节,提供接口使用。对于不同的编程语言,其叫法不同,但本质特征一样。
示例:
long mult2(long, long);
void multstore(long x, long y, long *dest) {
long t = mult2(x, y);
*dest = t;
}
汇编代码:
multstore:
pushq %rbx
movq %rdx, %rbx
call mult2
movq %rax, (%rbx)
popq %rbx
ret
当 P
调用过程 Q
,将执行以下操作:
1、传递控制 Passing control
调用 call
指令,首先减小指针地址(8位),然后将这条调用指令之后的指令地址写入栈顶(调用返回后执行的指令地址),程序计数器将被设置为被调用函数的首地址(该地址存在 call
指令中),该指令结合了 jump
和 push
的功能。
当被调用的过程 Q
执行完成,会执行 ret
指令(或 retq
),该指令就是逆转 call
指令的效果。它会假设栈顶有一个想要跳转的地址,然后将栈顶的地址弹出(pop
指令,弹出后栈顶的指针会增加,而该地址的内容不会消失,只是不属于栈的一部分),然后将程序计数器设置为弹出的地址,因此程序会回到原来的地方继续执行。
2、传递数据 Passing data
存放参数的寄存器有6个:%rdi, %rsi, %rdx, %rcx, %r8, %r9
,存放返回值的寄存器位 %rax
。
上述这些寄存器只能存放整型和指针,如果参数超过6个,则参数会被放入栈中。(参数放在寄存器中比栈中访问速度更快)
3、管理和释放内存 Allocating and deallocation memory
被调用的函数 Q
必须位局部变量分配空间,并且在返回前释放存储空间
栈结构:
Using our example of procedure P
calling procedure Q
, we can see that while Q
is executing, P
, along with any of the procedures in the chain of calls up to P
, is temporarily suspended.
当 Q
正在被执行时,P
以及与其相关的调用 P
的过程都处于挂起状态。
执行特定函数时,只需要引用该函数内部的数据或者已传递给它的值,而其他函数处于冻结状态,这种为单线程运行模式。
栈帧(stack frame):栈上用于特定 call
的每个内存块成为栈帧(It is a frame for a particular instance of a procedure, a particular call to a procedure)。
我们需要在栈中为每个被调用且未返回的过程保留一个栈帧。
通常一个栈帧由两个指针分隔,一个是栈指针(指向栈顶),另一个是基指针(base pointer),由寄存器 %rbp
保存,基指针是可选的。
用递归的弊端:会不断增加栈帧,需要很多空间,而大多数系统会限制栈的深度。
当 P
调用 Q
时,会将 Q
返回后地址(返回后要执行的位置)放在栈上,该返回地址也被认为是 P
栈帧的一部分。
大多数过程的栈帧有固定的尺寸,在过程开始执行时就分配好空间,但有些情况需要可变大小的栈帧,如过程中传递的参数数目大于6个时,会在 Q
被调用前将多余的参数存放在 P
的栈帧上。
Some procedures, however, require variable-size frames. Procedure P
can pass up to six integral values (i.e., pointers and integers) on the stack, but if Q
requires more arguments, these can be stored by P
within its stack frame
prior to the call.
并非所以的函数都会用到栈帧,当一个函数的全部局部变量都存在寄存器中,且函数内部不需要调用其他函数,不需要栈帧。
At times, however, local data
must be stored in memory
. Common cases of this include these:
&
is applied to a local variable, and hence we must be able to generate an address for it.Although only one procedure
can be active at a given time, we must make sure that when one procedure
(the caller
) calls another (the callee
), the callee
does not overwrite some register value that the caller
planned to use later.
当一个过程调用另一个过程时,必须保证被调用者不会覆盖掉调用者以后需要使用的寄存器。
By convention, registers %rbx, %rbp, and %r12–%r15
are classified as callee saved registers
.
当 P
调用 Q
时,Q
必须保证那些在调用完后 P
仍需使用的寄存器的值在调用前后保持不变(如寄存器 %rdi
(第一个参数的值)),可以有两种方式:
1、保证 Q
不会改变这些寄存器的值
2、在 Q
被调用前先将这些寄存器的值压入栈中,然后在调用结束后弹出这些值。这种方式在压入栈中是在栈帧中创建的区域被标记为 被保存的寄存器(Saved register)
临时存放寄存器的值有两种:
寄存器 %r10
和 %r11
为 Caller Saved
寄存器,用于存放任何可以被函数
修改的临时值。
寄存器 %rbx, %rbp
和 %r12–%r15
为 Callee Saved
寄存器,当一个函数要改变这些寄存器的值时,必须先压入栈中保存再在返回时从栈中弹出恢复数据
示例:
上述代码使用 callee-saved
寄存器 %rbp
来存放 x
的值,用寄存器 %rbx
来存放 Q(y)
计算的结果。在函数的最开始,先将这两个寄存器的内容压入栈中,最后在函数的结果再弹出栈的内容到寄存器中。
注意压入栈和弹出栈时的顺序相反(先进后出)。
机器代码里没有数组的概念,数组即为连续的字节集合。
For data type T
and integer constant N
, consider a declaration of the form
T A[N]
;
标注数组起始地址为
x
A
x_{A}
xA,该数组将分配一块连续的内存空间,大小为
L
⋅
N
L \cdot N
L⋅N,其中 L
为数组元素类型 T
的大小(bytes),N
为数组元素的个数。
标志符 A
将被用作一个指向数组起始地址的指针,数组元素的索引在 0
到 N-1
区间,第 i
个元素的索引为
x
A
+
L
⋅
i
x_{A} + L \cdot i
xA+L⋅i。
The memory referencing instructions of x86-64
are designed to simplify array access.
假设 E
是一个元素为 int
型的数组,如果想获取 E[i]
,那么 E
的地址存在寄存器 %rdx
中,而索引 i
则存在寄存器 %rcx
中,如下指令:
movl (%rdx,%rcx,4),%eax
将实现
x
E
+
4
i
x_{E} + 4i
xE+4i,从该地址的内存处读取数据,然后存在寄存器 %eax
中(%eax
是 32 位存返回值的寄存器,%rax
是64位 存放返回值寄存器)。
The allowed scaling factors of 1, 2, 4, and 8
cover the sizes of the common primitive data types. (上述例子中为 4)
C allows arithmetic on pointers, where the computed value is scaled according to the size of the data type referenced by the pointer
.
That is, if
p
p
p is a pointer to data of type
T
T
T , and the value of
p
p
p is
x
p
x_{p}
xp, then the expression
p
+
i
p+i
p+i has value
x
p
+
L
⋅
i
x_{p} + L \cdot i
xp+L⋅i, where
L
L
L is the size of data type
T
T
T .
示例:
上述例子返回值的存放:the result being stored in either register %eax
(for data) or register %rax
(for pointers).
注意:上述返回结果为 int
类型的数组值时,用 4 字节的操作 movl
和寄存器 %reax
,而返回值为 int *
指针时,其为 8 字节操作 leaq
(leaq
指令不是引用内存而是复制地址),用的寄存器位 %rax
。
最后一个例子展示计算有相同结构的两个指针相减,返回结果类型为 long
,数值等于两个地址的差值除以元素类型的大小,即为两个地址相差的元素个数。
多维数组:对于数组 T D[R][C]
,数组元素 D[i][j]
在内存中的地址为:
&
D
[
i
]
[
j
]
D[i][j]
D[i][j] =
x
D
+
L
(
C
⋅
i
+
j
)
x_{D} + L(C \cdot i + j)
xD+L(C⋅i+j)
其中 L
时数据类型 T
的尺寸(bytes)
数组的结构如下图:
示例:
#define N 16 //通过 #define 来声明常量,便于修改尺寸
//typedef 声明 fix_matrix 为一个二维数组,有 N 行,N列,元素类型为 int
typedef int fix_matrix[N][N];
//(a) Original C code
/* Compute i,k of fixed matrix product */
int fix_prod_ele (fix_matrix A, fix_matrix B, long i, long k) {
long j;
int result = 0;
for (j = 0; j < N; j++)
result += A[i][j] * B[j][k];
return result;
}
//(b) Optimized C code
1 /* Compute i,k of fixed matrix product */
2 int fix_prod_ele_opt(fix_matrix A, fix_matrix B, long i, long k) {
3 int *Aptr = &A[i][0]; /* Points to elements in row i of A */
4 int *Bptr = &B[0][k]; /* Points to elements in column k of B */
5 int *Bend = &B[N][k]; /* Marks stopping point for Bptr */
6 int result = 0;
7 do { /* No need for initial test */
8 result += *Aptr * *Bptr; /* Add next product to sum */
9 Aptr ++; /* Move Aptr to next column */
10 Bptr += N; /* Move Bptr to next row */
11 } while (Bptr != Bend); /* Test for stopping point */
12 return result;
13 }
上述代码展示了优化前和优化后的两种实现方式,优化后的汇编代码如下:
salq
为左移指令,对于 &
A
[
i
]
[
0
]
A[i][0]
A[i][0],根据公式 &
D
[
i
]
[
j
]
D[i][j]
D[i][j] =
x
D
+
L
(
C
⋅
i
+
j
)
x_{D} + L(C \cdot i + j)
xD+L(C⋅i+j),可得到
x
A
+
64
i
x_{A} + 64i
xA+64i,而 i
左移 6
位即为
i
∗
2
6
i * 2^6
i∗26。
13 行处比较的结果会保存在 ZF
标志中,相等则为0,然后根据 jne
指令,即判断 ZF
标志,不为 0
,即不相等就跳转到标签 L7
处。
Historically, C only supported multidimensional arrays where the sizes (with the possible exception of the first dimension) could be determined at compile time.
Programmers requiring variable-size arrays had to allocate storage for these arrays using functions such as malloc
or calloc
, and they had to explicitly encode the mapping of multidimensional arrays into single-dimension ones via row-major indexing, as expressed in Equation 3.1. ISO C99 introduced the capability of having array dimension expressions that are computed as the array is being allocated.
可变大小的多维数组 int A[expr1][expr2]
,定义如下函数:
int var_ele(long n, int A[n][n], long i, long j) {
return A[i][j];
}
注意 n
要在 A[n][n]
前声明,汇编代码如下:
这里计算
n
⋅
i
n \cdot i
n⋅i 用到 imulq
乘法指令,而非用左移指令。
示例:
//(a) Original C code
1 /* Compute i,k of variable matrix product */
2 int var_prod_ele(long n, int A[n][n], int B[n][n], long i, long k) {
3 long j;
4 int result = 0;
5 6
for (j = 0; j < n; j++)
7 result += A[i][j] * B[j][k];
8 9
return result;
10 }
//(b) Optimized C code
/* Compute i,k of variable matrix product */
int var_prod_ele_opt(long n, int A[n][n], int B[n][n], long i, long k) {
int *Arow = A[i];
int *Bptr = &B[0][k];
int result = 0;
long j;
for (j = 0; j < n; j++) {
result += Arow[j] * *Bptr;
Bptr += n;
}
return result;
}
优化后的汇编代码:
The different components of a structure are referenced by names.
The implementation of structures is similar to that of arrays in that all of the components of a structure are stored in a contiguous region of memory and a pointer to a structure is the address of its first byte.
例如对以下结构体:
struct rec {
int i;
int j;
int a[2];
int *p;
};
其结构如下:
可见数组是嵌入在结构体中。
例如变量 r
的类型为 rec *
,获取结构体中成员的汇编代码:
Registers: r in %rdi
1 movl (%rdi), %eax Get r->i
2 movl %eax, 4(%rdi) Store in r->j
上述代码第一条为获取 i
的数值,然后存到返回值寄存器 %eax
中;
第二条为将寄存器 %eax
的内容写入到变量 r
地址加 4
后的地址所在的内存处,即 j
的数值,因此将 j
的数值设置为 i
的数值。
To generate a pointer to an object within a structure, we can simply add the field’s offset to the structure address.
For example, we can generate the pointer &(r->a[1])
by adding offset
8
+
4
⋅
1
=
12
8 + 4 \cdot 1= 12
8+4⋅1=12.
For pointer r
in register %rdi
and long integer variable i
in register %rsi
, we can generate the pointer value &(r->a[i])
with the single instruction:
Registers: r in %rdi, i %rsi
1 leaq 8(%rdi,%rsi,4), %rax Set %rax to &r->a[i]
The selection of the different fields of a structure is handled completely at compile time.
共用体:allowing a single object to be referenced according to multiple types.
与结构体的区别:Rather than having the different fields reference different blocks of memory, they all reference the same block.
共用体的大小为共用体中最大类型的变量的大小。
The overall size of a union
equals the maximum size
of any of its fields.
共用体的使用场景:
1、One application is when we know in advance that the use of two different fields in a data structure
will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
提前知道需要使用几种不同的类型,且不同类型是互斥的使用。
2、Unions
can also be used to access the bit patterns of different data types.
例如:需要做如下类型转换
unsigned long u = (unsigned long) d;
通过以下方式实现:
unsigned long double2bits(double d) {
union {
double d;
unsigned long u;
} temp;
temp.d = d;
return temp.u;
};
The result will be that u
will have the same bit representation as d
, including fields for the sign bit, the exponent, and the significand.
注意事项:
When using unions to combine data types of different sizes, byte-ordering issues can become important.
数据对齐:基于硬件的需求
Alignment restrictions simplify the design of the hardware forming the interface between the processor
and the memory system
.
使用数据对齐能提升内存系统的性能。
对齐的规则:
Their alignment rule is based on the principle that any primitive object of K
bytes must have an address that is a multiple of K. We can see that this rule leads to the following alignments:
具体 K
的大小是多少,由结构体中尺寸最大的类型对应的 K
决定。
如:
struct S1 {
int i;
char c;
int j;
};
对齐后如下:
有时编译器可能需要在结构体末尾填充,如:
struct S2 {
int i;
int j;
char c;
};
对齐后:
GDB
调试的命令:
未看
缓冲区溢出解决方案:
未看
Some functions, however, require a variable amount of local storage.
This can occur, for example, when the function calls alloca
, a standard library function that can allocate an arbitrary number of bytes of storage on the stack.
It can also occur when the code declares a local array of variable size
.
示例:
long vframe(long n, long idx, long *q) {
long i;
long *p[n];
p[0] = &i;
for (i = 1; i < n; i++)
p[i] = q;
return *p[idx];
}
上述代码包含可变尺寸的数组,数组 p
为包含 n
个指向 long
的指针,不同调用函数可能传递的 n
的值不同,而该数组需要在栈上分配 8n
字节,因此编译器无法知道为该函数的栈帧分配多少空间。
此外,该函数用到了对局部变量 i
的地址的引用,因此该变量必须存在栈上。(?)
To manage a variable-size stack frame
, x86-64
code uses register %rbp
to serve as a frame pointer
(sometimes referred to as a base pointer
, and hence the letters bp
in %rbp
).
栈帧的结构如下:
We see that the code must save the previous version of %rbp
on the stack
, since it is a callee-saved register
.
It then keeps %rbp
pointing to this position throughout the execution of the function, and it references fixed-length local variables
, such as i
, at offsets relative to %rbp
.
汇编代码如下:
首先保存当前 %rbp
寄存器的值到栈中,然后将 %rbp
寄存器的值设置为栈指针位置 %rsp
。
然后将栈指针位置向下扩展16 字节。
5 - 11 行为数组 p
分配空间 (?)
In earlier versions of x86 code
, the frame pointer was used with every function call. With x86-64 code
, it is used only in cases where the stack frame
may be of variable size, as is the case for function vframe
.
未看
存放浮点数据的寄存器:
Floating-point movement instructions:
浮点型转换操作:
浮点型比较: