• Performance of a small compute-intensive program on different CPUs


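    This post compares how several CPUs perform on the same test program. The tests ran on compute instances in Oracle's public cloud (OCI), each allocated 1 OCPU with the default amount of memory; memory size does not matter for this test program.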

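    This post only lists the results and makes no assessment. In the table below, the last column is the average elapsed time over 5 runs of the test program.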

    OCI shape              CPU model                   Base frequency (GHz)   Average run time (s)
    VM.Standard3.Flex      Intel Xeon Platinum 8358    2.6                    135.084
    VM.Optimized3          Intel Xeon 6354             3.0                    123.65
    VM.Standard.E4.Flex    AMD EPYC 7J13               2.55                   62.766
    VM.Standard.E5.Flex    AMD EPYC 9J14               2.4                    53.22
    VM.Standard.A1.Flex    Ampere Altra Q80-30         3.0                    107.206

    Test program:

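    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
            double r = 0.0;   /* running sum */
            int i, j;

            /* 100000 x 100000 = 10^10 iterations, two sqrt() calls each */
            for (i = 0; i < 100000; i++)
                    for (j = 0; j < 100000; j++)
                            r = r + sqrt(sqrt(i));

            return 0;
    }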
    

    Compile (no optimization flags; -lm goes after the source file so the linker resolves sqrt):

    cc a.c -lm
    

    test.sh runs a.out 5 times:

    for i in 1 2 3 4 5; do
            time -p ./a.out
    done
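
    The loop above prints the timings to the terminal. A minimal sketch of a variant that takes the run count as a parameter and appends the timings to a log file is shown here. The name run_timed and its arguments are hypothetical, and the sketch assumes bash, where time is a shell keyword that writes its report to the shell's stderr.

```shell
# run_timed (hypothetical helper): run a command N times and append the
# `time -p` report of each run to a log file.
# Usage: run_timed RUNS LOGFILE COMMAND [ARGS...]
run_timed() {
    runs=$1
    log=$2
    shift 2
    : > "$log"                          # start with an empty log
    i=0
    while [ "$i" -lt "$runs" ]; do
        # The timing report goes to stderr, so redirect stderr of the group.
        { time -p "$@"; } 2>> "$log"
        i=$((i + 1))
    done
}
```

    For example, run_timed 5 /tmp/1 ./a.out would repeat the five runs and leave the timings in /tmp/1.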
    

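    To compute the average, save the output above to a temporary file, for example /tmp/1 (time writes its report to stderr, so redirect with ./test.sh 2> /tmp/1), and then run: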

    cat /tmp/1|grep real|sed  's/real //'|awk '{s+=$1} END {print s/5}'
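
    The pipeline above divides by a hard-coded 5. As a minimal sketch, the same average can be computed for any number of runs by letting awk count the real lines itself. The function name avg_real is an assumption made for this illustration and is not part of the original scripts; the sample input is the five real values measured on the Xeon Platinum 8358.

```shell
# avg_real (hypothetical helper): read `time -p` output on stdin and
# print the mean of all "real" values, however many runs there are.
avg_real() {
    awk '$1 == "real" { sum += $2; n++ }
         END { if (n > 0) print sum / n }'
}

# Sample input: the five "real" values from the Xeon Platinum 8358 run.
printf 'real 135.08\nreal 135.04\nreal 135.14\nreal 135.10\nreal 135.06\n' | avg_real
# prints 135.084
```

    With a saved log this becomes avg_real < /tmp/1.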
    

    Intel Xeon Platinum 8358

    $ lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              2
    On-line CPU(s) list: 0,1
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               106
    Model name:          Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
    Stepping:            6
    CPU MHz:             2594.024
    BogoMIPS:            5188.04
    Virtualization:      VT-x
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            4096K
    L3 cache:            16384K
    NUMA node0 CPU(s):   0,1
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid md_clear arch_capabilities
    

    Test results:

    $ ./test.sh
    real 135.08
    user 134.69
    sys 0.00
    real 135.04
    user 134.67
    sys 0.00
    real 135.14
    user 134.67
    sys 0.02
    real 135.10
    user 134.68
    sys 0.00
    real 135.06
    user 134.69
    sys 0.00
    

    Filtering with grep real|sed 's/real //' gives all the real times:

    135.08
    135.04
    135.14
    135.10
    135.06
    

    The average can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 135.084.

    Intel Xeon 6354

    $ lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              2
    On-line CPU(s) list: 0,1
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               106
    Model name:          Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
    Stepping:            6
    CPU MHz:             2993.064
    BogoMIPS:            5986.12
    Virtualization:      VT-x
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            4096K
    L3 cache:            16384K
    NUMA node0 CPU(s):   0,1
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
    

    Test results:

    $ ./test.sh
    real 123.69
    user 123.40
    sys 0.00
    real 123.66
    user 123.37
    sys 0.00
    real 123.65
    user 123.38
    sys 0.00
    real 123.62
    user 123.38
    sys 0.00
    real 123.63
    user 123.38
    sys 0.01
    

    Filtering with grep real|sed 's/real //' gives all the real times:

    123.69
    123.66
    123.65
    123.62
    123.63
    

    The average can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 123.65.

    AMD EPYC 7J13

    Test results:

    $ ./test.sh
    real 60.26
    user 60.25
    sys 0.00
    real 60.51
    user 60.50
    sys 0.00
    real 64.45
    user 64.44
    sys 0.00
    real 67.76
    user 66.29
    sys 0.13
    real 60.85
    user 60.80
    sys 0.00
    
    

    Filtering with grep real|sed 's/real //' gives all the real times:

    60.26
    60.51
    64.45
    67.76
    60.85
    

    The average can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 62.766.

    AMD EPYC 9J14

    $ lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              2
    On-line CPU(s) list: 0,1
    Thread(s) per core:  2
    Core(s) per socket:  1
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           AuthenticAMD
    CPU family:          25
    Model:               17
    Model name:          AMD EPYC 9J14 96-Core Processor
    Stepping:            1
    CPU MHz:             2596.100
    BogoMIPS:            5192.20
    Virtualization:      AMD-V
    Hypervisor vendor:   KVM
    Virtualization type: full
    L1d cache:           64K
    L1i cache:           64K
    L2 cache:            512K
    L3 cache:            16384K
    NUMA node0 CPU(s):   0,1
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good avx512_bf16 clzero xsaveerptr wbnoinvd arat npt nrip_save avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid arch_capabilities
    
    

    Test results:

    $ ./test.sh
    real 52.63
    user 52.62
    sys 0.00
    real 53.29
    user 53.19
    sys 0.00
    real 52.13
    user 52.12
    sys 0.00
    real 52.28
    user 52.27
    sys 0.00
    real 55.77
    user 54.79
    sys 0.01
    
    

    Filtering with grep real|sed 's/real //' gives all the real times:

    52.63
    53.29
    52.13
    52.28
    55.77
    

    The average can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 53.22.

    Ampere Altra Q80-30

    $ lscpu
    Architecture:        aarch64
    Byte Order:          Little Endian
    CPU(s):              1
    On-line CPU(s) list: 0
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           ARM
    Model:               1
    Model name:          Neoverse-N1
    Stepping:            r3p1
    BogoMIPS:            50.00
    L1d cache:           unknown size
    L1i cache:           unknown size
    L2 cache:            unknown size
    NUMA node0 CPU(s):   0
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
    

    Test results:

    $ ./test.sh
    real 113.46
    user 103.23
    sys 0.23
    real 103.77
    user 103.02
    sys 0.03
    real 109.15
    user 103.01
    sys 0.14
    real 105.11
    user 103.29
    sys 0.02
    real 104.54
    user 103.06
    sys 0.02
    
    

    Filtering with grep real|sed 's/real //' gives all the real times:

    113.46
    103.77
    109.15
    105.11
    104.54
    
    

    The average can be computed directly with grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}', which gives 107.206.

    References

    • https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#vm-standard
  • Original article: https://blog.csdn.net/stevensxiao/article/details/140038907