llvm
gcc
优化器
llvm
更只能, 更加用户友好, 编译速度快; 且不断开发迭代;gcc
用户不太友好, 但是普遍更优;https://llvm.org/docs/Vectorizers.html
优化数量未知数量的循环
void test(int *a, int len) { for(int i = 0 ; i < len ; ++i) { a[i] += 5; } } int main(int argc,char**argv) { int* a = new int[argc<<10]; test(a,argc<<10); return a[0]; }
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
void test(int *a, int len) { int i; for(i = 0 ; i < len - 3 ; i += 4) { a[i] += 5; a[i + 1] += 5; a[i + 2] += 5; a[i + 3] += 5; } for(int j = i ; j < len ; j++) { a[j] += 5; } }
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
小结
为什么
restrict
案例分析
void updatePtrs(size_t *ptrA, size_t *ptrB, size_t *val) { *ptrA += *val; *ptrB += *val; }
- 1
- 2
- 3
- 4
- 5
不能进行优化, 如果
ptra, ptrb
指向同一个地址, 那么这个函数的意思就是给同一个加2 * *val
如果不是同一个, 则是各自加
*val
;使用
restrict
限制表示开发者知道, 传入的指针不可能相同, 则进行其他优化; 正确性由开发者自行保障;- https://en.wikipedia.org/wiki/Restrict
- https://zhuanlan.zhihu.com/p/349726808
; Hypothetical RISC Machine. ldr r12, [val] ; Load memory at val to r12. ldr r3, [ptrA] ; Load memory at ptrA to r3. add r3, r3, r12 ; Perform addition: r3 = r3 + r12. str r3, [ptrA] ; Store r3 to memory location ptrA, updating the value. ldr r3, [ptrB] ; 'load' may have to wait until preceding 'store' completes. ldr r12, [val] ; Have to load a second time to ensure consistency. add r3, r3, r12 str r3, [ptrB] ldr r12, [val] ; Note that val is now only loaded once. ldr r3, [ptrA] ; Also, all 'load's in the beginning ... ldr r4, [ptrB] add r3, r3, r12 add r4, r4, r12 str r3, [ptrA] ; ... all 'store's in the end. str r4, [ptrB]
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
小结
reduction variable, Induction Variables, Pointer Induction Variables
解释概念
案例分析
int foo(int *A, int *B, int n) { unsigned sum = 0; for (int i = 0; i < n; ++i) sum += A[i] + 5; return sum; }
- 1
- 2
- 3
- 4
- 5
- 6
SIMD
怎么操作sum
? 同时操作? 同时操作会不会出问题?这种变量会进行优化: 拆分成若干独立的变量
result = var1 op var2;
, 最后按照op
将拆分的result
进行合并即可;一般结合
-ffast-math
使用;int foo(int *A, int *B, int n) { unsigned sum0 = 0; unsigned sum1 = 0; unsigned sum2 = 0; unsigned sum3 = 0; for (int i = 0; i < n - 3; i += 4) { sum0 += A[i] + 5; sum1 += A[i + 1] + 5; sum2 += A[i + 2] + 5; sum3 += A[i + 3] + 5; } return sum0 + sum1 + sum2 + sum3; }
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
指针类型的
induction
int baz(int *A, int n) { return std::accumulate(A, A + n, 0); }
- 1
- 2
- 3
小结
if-conversion
简介
尝试用算法去掉
if else
甚至goto
; 将分支依赖变成数据依赖,即所有都执行。这样会导致任务变多,可能会变慢;参考MIT 6.172 3. bit hacks
其他
scatter/gather
说明
- https://en.wikipedia.org/wiki/Vectored_I/O
llvm
的loop vectorizer
可以将代码中零散内存中获取数据行为的也进行优化;案例
int foo(int * A, int * B, int n) { for (intptr_t i = 0; i < n; ++i) A[i] += B[i * 4]; }
- 1
- 2
- 3
- 4
vectorize
unroll when vectorization