• [DRAFT] An Analysis of How LLVM ThinLTO Works


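    In "Paper Reading: ThinLTO: Scalable and Incremental LTO" we covered the main ideas of the ThinLTO paper. Here we look at how LLVM actually implements ThinLTO. This post is organized into the following parts: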

    • What does an LLVM ThinLTO object contain?
    • How does LLVM ThinLTO perform its optimizations?
    • Which optimizations does LLVM ThinLTO enable?

    What do LLVM ThinLTO objects contain?

    We keep using the example from "Example of link time optimization". In "LLVM full LTO Study Notes" we used the magic number as an entry point for a brief analysis of the full LTO pipeline. We follow the same approach here.

    $ clang -flto=thin -c a.c -o a_lto.o
    $ clang -flto=thin -c main.c -o main_lto.o
    $ hexdump a_lto.o | head
    0000000 4342 dec0 1435 0000 0005 0000 0c62 2430
    0000010 594d 66be fb8d 4fb4 c81b 4424 3201 0005
    0000020 0c21 0000 0266 0000 020b 0021 0002 0000
    0000030 0016 0000 8107 9123 c841 4904 1006 3932
    0000040 0192 0c84 0525 1908 041e 628b 1080 0245
    0000050 9242 420b 1084 1432 0838 4b18 320a 8842
    0000060 7048 21c4 4423 8712 108c 9241 6402 08c8
    0000070 14b1 4320 8846 c920 3201 8442 2a18 2a28
    0000080 3190 b07c 915c c420 00c8 0000 2089 0000
    0000090 000e 0000 2232 0908 6220 0046 2b21 9824
    

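    The magic number is 4342 dec0. hexdump prints 16-bit little-endian words by default, so the leading bytes in the file are 42 43 C0 DE, that is, 'BC' followed by 0xC0DE, which is the LLVM bitcode magic. ThinLTO objects are therefore still bitcode files. The ThinLTO documentation in fact already describes this in detail: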
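
To make the magic-number check concrete, here is a minimal Python sketch that turns the first two hexdump words back into file bytes and compares them with the raw bitcode magic. The helper names `hexdump_words_to_bytes` and `is_raw_bitcode` are invented for this example, and the sketch assumes hexdump's default 16-bit word output on a little-endian host; it does not handle wrapped bitcode.

```python
# Raw LLVM bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE.
BITCODE_MAGIC = b"BC\xc0\xde"


def hexdump_words_to_bytes(words):
    """Undo hexdump's default view: each 4-digit word is a 16-bit
    little-endian value, so "4342" stands for the bytes 0x42 0x43."""
    out = bytearray()
    for word in words:
        value = int(word, 16)
        out.append(value & 0xFF)         # low byte comes first in the file
        out.append((value >> 8) & 0xFF)  # high byte comes second
    return bytes(out)


def is_raw_bitcode(data):
    """Return True if the buffer begins with the raw bitcode magic."""
    return data[:4] == BITCODE_MAGIC


# The first two words of the hexdump output of a_lto.o.
header = hexdump_words_to_bytes(["4342", "dec0"])
print(header, is_raw_bitcode(header))
```

The same `is_raw_bitcode` check could be applied to the first four bytes read directly from an object file, which avoids the word-swapping step entirely.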

    In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the compile phase. The ThinLTO bitcode is augmented with a compact summary of the module. During the link step, only the summaries are read and merged into a combined summary index, which includes an index of function locations for later cross-module function importing. Fast and efficient whole-program analysis is then performed on the combined summary index.
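
The merge step described in the quote can be pictured with a toy model. The following Python sketch is only an illustration, not LLVM's data structure: it merges per-module summaries into a single map from GUID to the modules that define that symbol. The a_lto.o entries use the GUIDs printed by llvm-dis later in this post; the assumption that main_lto.o defines foo4 comes from the LTO example and is not shown in this article, and `build_combined_index` is a made-up name.

```python
from collections import defaultdict

# Per-module summaries: module path -> {guid: name} for the symbols that
# module defines. The a_lto.o entries are copied from the llvm-dis output;
# the main_lto.o entry is an assumption made for this sketch.
per_module = {
    "a_lto.o": {
        2494702099028631698: "foo2",
        2708120569957007488: "i",
        7682762345278052905: "foo1",
        17367728344439303071: "foo3",
    },
    "main_lto.o": {
        11564431941544006930: "foo4",
    },
}


def build_combined_index(summaries):
    """Merge per-module summaries into one guid -> [module paths] map."""
    combined = defaultdict(list)
    for path, defs in summaries.items():
        for guid in defs:
            combined[guid].append(path)
    return dict(combined)


combined_index = build_combined_index(per_module)

# foo3 in a_lto.o calls foo4 (guid 11564431941544006930); the combined
# index tells us which module holds the definition.
print(combined_index[11564431941544006930])
```

The point of the model is that only the small summaries need to be read to answer "where is this function defined?", which is what makes cross-module importing cheap to plan.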

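    Running llvm-dis a_lto.o produces readable IR. Comparing it with the IR produced for full LTO shows that the two differ very little; the differences are mainly in the summary section at the end. The ThinLTO and full LTO output for a_lto.o compare as follows.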

    // ---------------- Thin LTO ----------------//
    !llvm.module.flags = !{!0, !1, !2, !3}
    !llvm.ident = !{!4}
    
    !0 = !{i32 1, !"wchar_size", i32 4}
    !1 = !{i32 7, !"uwtable", i32 1}
    !2 = !{i32 7, !"frame-pointer", i32 2}
    !3 = !{i32 1, !"EnableSplitLTOUnit", i32 0}
    !4 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}
    
    ^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708))
    ^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (writeonly ^2)))) ; guid = 2494702099028631698
    ^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
    ^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (readonly ^2)))) ; guid = 7682762345278052905
    ^4 = gv: (name: "foo4") ; guid = 11564431941544006930
    ^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
    ^6 = blockcount: 5
    
    // ---------------- Full LTO ----------------//
    !llvm.module.flags = !{!0, !1, !2, !3, !4}
    !llvm.ident = !{!5}
    
    !0 = !{i32 1, !"wchar_size", i32 4}
    !1 = !{i32 7, !"uwtable", i32 1}
    !2 = !{i32 7, !"frame-pointer", i32 2}
    !3 = !{i32 1, !"ThinLTO", i32 0}
    !4 = !{i32 1, !"EnableSplitLTOUnit", i32 1}
    !5 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}
    
    ^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0))
    ^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (^2)))) ; guid = 2494702099028631698
    ^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
    ^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (^2)))) ; guid = 7682762345278052905
    ^4 = gv: (name: "foo4") ; guid = 11564431941544006930
    ^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
    ^6 = flags: 8
    ^7 = blockcount: 5
    

    The key differences are highlighted below:

    | Difference | Thin LTO | Full LTO |
    |---|---|---|
    | Module Flags | (no ThinLTO flag) | !3 = !{i32 1, !"ThinLTO", i32 0} |
    | Global Value Summary module ^0 | ^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708)) | ^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0)) |
    | Global Value Summary foo2 ^1 | notEligibleToImport: 0; refs: (writeonly ^2) | notEligibleToImport: 1; refs: (^2) |
    | Global Value Summary i ^2 | notEligibleToImport: 0 | notEligibleToImport: 1 |
    | Global Value Summary foo1 ^3 | notEligibleToImport: 0; refs: (readonly ^2) | notEligibleToImport: 1; refs: (^2) |
    | Global Value Summary foo3 ^5 | notEligibleToImport: 0 | notEligibleToImport: 1 |
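
A comparison like this can also be done mechanically. The Python sketch below extracts `notEligibleToImport` and `refs` from a `^N = gv:` line with regular expressions and reports the fields that differ. The function names are invented for the example, the regular expressions only cover the fields discussed here, and the two sample lines are shortened copies of the foo2 entries from the dumps.

```python
import re

GV_RE = re.compile(r'\^(\d+) = gv: \(name: "([^"]+)"')
ELIGIBLE_RE = re.compile(r"notEligibleToImport: (\d)")
REFS_RE = re.compile(r"refs: \(([^)]*)\)")


def parse_gv(line):
    """Pull the id, name and the two fields of interest out of a gv line."""
    m = GV_RE.search(line)
    if m is None:
        return None
    flag = ELIGIBLE_RE.search(line)
    refs = REFS_RE.search(line)
    return {
        "id": int(m.group(1)),
        "name": m.group(2),
        "notEligibleToImport": int(flag.group(1)) if flag else None,
        "refs": refs.group(1) if refs else None,
    }


def diff_entries(thin, full):
    """Return {field: (thin value, full value)} for fields that differ."""
    return {k: (thin[k], full[k]) for k in thin if thin[k] != full[k]}


# Shortened copies of the foo2 summary lines (most flags omitted).
THIN_FOO2 = (
    '^1 = gv: (name: "foo2", summaries: (function: (module: ^0, '
    "flags: (linkage: external, notEligibleToImport: 0, live: 0), "
    "insts: 2, refs: (writeonly ^2)))) ; guid = 2494702099028631698"
)
FULL_FOO2 = (
    '^1 = gv: (name: "foo2", summaries: (function: (module: ^0, '
    "flags: (linkage: external, notEligibleToImport: 1, live: 0), "
    "insts: 2, refs: (^2)))) ; guid = 2494702099028631698"
)

foo2_diff = diff_entries(parse_gv(THIN_FOO2), parse_gv(FULL_FOO2))
print(foo2_diff)
```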

    The Metadata documentation tells us that '!' introduces metadata, while '^' introduces a global value summary entry.

    All metadata are identified in syntax by an exclamation point (‘!’).
    Compiling with ThinLTO causes the building of a compact summary of the module that is emitted into the bitcode. The summary is emitted into the LLVM assembly and identified in syntax by a caret (‘^’).

    The Module Flags Metadata documentation explains !3 = !{i32 1, !"ThinLTO", i32 0}. Module flags metadata is a set of triplets:

    • The first element is a behavior flag, which specifies the behavior when two (or more) modules are merged together.
    • The second element is a metadata string that is a unique ID for the metadata.
    • The third element is the value of the flag.
    !3 = !{i32 1, !"ThinLTO", i32 0}
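
The triplet structure is easy to check by hand. As a rough sketch, the Python snippet below splits a module flag line into its behavior, key, and value. The behavior names are the ones listed in the LLVM Language Reference for codes 1 to 7; `parse_module_flag` is a name made up for this example, and the regular expression only handles `i32` values like the flags shown in this post.

```python
import re

# Behavior codes 1-7 as listed in the LLVM Language Reference.
BEHAVIORS = {
    1: "Error",
    2: "Warning",
    3: "Require",
    4: "Override",
    5: "Append",
    6: "AppendUnique",
    7: "Max",
}

MODULE_FLAG_RE = re.compile(r'!\{i32 (\d+), !"([^"]+)", i32 (\d+)\}')


def parse_module_flag(text):
    """Return (behavior name, flag key, flag value) for one flag line."""
    m = MODULE_FLAG_RE.search(text)
    if m is None:
        raise ValueError("not an i32 module flag triplet: " + text)
    behavior = BEHAVIORS.get(int(m.group(1)), "unknown")
    return behavior, m.group(2), int(m.group(3))


print(parse_module_flag('!3 = !{i32 1, !"ThinLTO", i32 0}'))
print(parse_module_flag('!1 = !{i32 7, !"uwtable", i32 1}'))
```

Behavior 1 (Error) means that merging two modules with conflicting values for the flag is reported as an error.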
    

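    A ThinLTO flag value of 0 means the module is not ThinLTO. A second indicator of ThinLTO versus full LTO is the summary block ID: GLOBALVAL_SUMMARY_BLOCK is the default and denotes ThinLTO, while FULL_LTO_GLOBALVAL_SUMMARY_BLOCK denotes full LTO.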

    $ llvm-bcanalyzer -dump a_full_lto.o
      Block ID #24 (FULL_LTO_GLOBALVAL_SUMMARY_BLOCK):
          Num Instances: 1
             Total Size: 789b/98.62B/24W
        Percent of file: 3.4924%
          Num SubBlocks: 0
            Num Abbrevs: 6
            Num Records: 7
        Percent Abbrevs: 57.1429%
    
    	Record Histogram:
    		  Count    # Bits     b/Rec   % Abv  Record Kind
    		      3       218      72.7  100.00  PERMODULE
    		      1        22                    BLOCK_COUNT
    		      1        22                    FLAGS
    		      1        22                    VERSION
    		      1        38            100.00  PERMODULE_GLOBALVAR_INIT_REFS
    $ llvm-bcanalyzer -dump a_thin_lto.o
      Block ID #20 (GLOBALVAL_SUMMARY_BLOCK):
          Num Instances: 1
             Total Size: 789b/98.62B/24W
        Percent of file: 3.4727%
          Num SubBlocks: 0
            Num Abbrevs: 6
            Num Records: 7
        Percent Abbrevs: 57.1429%
    
    	Record Histogram:
    		  Count    # Bits     b/Rec   % Abv  Record Kind
    		      3       218      72.7  100.00  PERMODULE
    		      1        22                    BLOCK_COUNT
    		      1        22                    FLAGS
    		      1        22                    VERSION
    		      1        38            100.00  PERMODULE_GLOBALVAR_INIT_REFS
    

    When a global value summary is present, the module defaults to ThinLTO unless the ThinLTO module flag is set to 0.

    /// Emit the per-module summary section alongside the rest of
    /// the module's bitcode.
    void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {
      // By default we compile with ThinLTO if the module has a summary, but the
      // client can request full LTO with a module flag.
      bool IsThinLTO = true;
      if (auto *MD =
              mdconst::extract_or_null<ConstantInt>(M.getModuleFlag("ThinLTO")))
        IsThinLTO = MD->getZExtValue();
      Stream.EnterSubblock(IsThinLTO ? bitc::GLOBALVAL_SUMMARY_BLOCK_ID
                                     : bitc::FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID,
                           4);
      // ...
    }
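
The decision made in this function can be restated in a few lines. The Python sketch below mirrors only the block-ID selection, not the bitcode writer itself; the numeric IDs 20 and 24 are taken from the llvm-bcanalyzer dumps above, and `summary_block_id` and the plain dict of module flags are assumptions made for the example.

```python
# Block IDs as reported by llvm-bcanalyzer in the dumps above.
GLOBALVAL_SUMMARY_BLOCK_ID = 20
FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID = 24


def summary_block_id(module_flags):
    """Pick the summary block ID for a module.

    module_flags is a plain dict of flag name -> integer value, standing
    in for the module flags metadata.
    """
    is_thin_lto = True  # default when the "ThinLTO" flag is absent
    if "ThinLTO" in module_flags:
        is_thin_lto = bool(module_flags["ThinLTO"])
    if is_thin_lto:
        return GLOBALVAL_SUMMARY_BLOCK_ID
    return FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID


# Flags of the ThinLTO build: no "ThinLTO" flag at all.
print(summary_block_id({"EnableSplitLTOUnit": 0}))
# Flags of the full LTO build: "ThinLTO" is explicitly 0.
print(summary_block_id({"ThinLTO": 0, "EnableSplitLTOUnit": 1}))
```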
    

    RFC

    https://lists.llvm.org/pipermail/llvm-dev/2015-May/085526.html
    https://sites.google.com/site/llvmthinlto/

    Patches

    https://reviews.llvm.org/D13107?id=35761

    Function Importer

    https://reviews.llvm.org/D14914
    https://reviews.llvm.org/D18343

    Related to llvm-opt2/llvm-opt

    Discussion of SyntheticCount

    • https://lists.llvm.org/pipermail/llvm-dev/2017-December/119701.html
    • https://reviews.llvm.org/D43521?id=135117#inline-388028
    /// Compute synthetic function entry counts.
    void computeSyntheticCounts(ModuleSummaryIndex &Index);
    

    Related terms

    • BFI, block frequency information
    • BPI, branch probability information
    • CGSCC, call graph SCC (strongly connected component) analysis, https://lists.llvm.org/pipermail/llvm-dev/2016-June/100792.html
  • Original article: https://blog.csdn.net/dashuniuniu/article/details/122807374