本文比较了几款CPU对同一测试程序的比较结果,用的是Oracle公有云OCI上的计算实例,均分配的1 OCPU,内存用的默认值,不过内存对此测试程序运行结果不重要。
本文只列结果,不做任何评价。下表中,最后一列为测试程序运行5次的平均耗时。
OCI shape名称 | CPU 型号 | 基本频率(GHz) | 测试程序运行耗时平均值(秒) |
---|---|---|---|
VM.Standard3.Flex | Intel Xeon Platinum 8358 | 2.6 | 135.084 |
VM.Optimized3 | Intel Xeon 6354 | 3.0 | 123.65 |
VM.Standard.E4.Flex | AMD EPYC 7J13 | 2.55 | 62.766 |
VM.Standard.E5.Flex | AMD EPYC 7J13 | 2.4 | 53.22 |
VM.Standard.A1.Flex | Ampere Altra Q80-30 | 3.0 | 107.206 |
测试程序:
#include
#include
void main()
{
double r;
int i, j;
for (i=0; i< 100000; i++)
for (j=0; j< 100000; j++)
r = r + sqrt(sqrt(i));
}
编译:
cc -lm a.c
test.sh运行a.out 5次:
for i in 1 2 3 4 5; do
time -p ./a.out
done
求平均值可以将以上输出存于临时文件,例如/tmp/1,然后运行一下:
cat /tmp/1|grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Stepping: 6
CPU MHz: 2594.024
BogoMIPS: 5188.04
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cm ov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm consta nt_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq v mx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invp cid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid e pt_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdse ed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveop t xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx 512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 r dpid md_clear arch_capabilities
测试结果:
$ ./test.sh
real 135.08
user 134.69
sys 0.00
real 135.04
user 134.67
sys 0.00
real 135.14
user 134.67
sys 0.02
real 135.10
user 134.68
sys 0.00
real 135.06
user 134.69
sys 0.00
通过grep real|sed 's/real //'
可以得到所有real time统计:
135.08
135.04
135.14
135.10
135.06
直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
, 因此平均值为135.084。
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6354 CPU @ 3.00GHz
Stepping: 6
CPU MHz: 2993.064
BogoMIPS: 5986.12
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good wbnoinvd arat vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid fsrm md_clear arch_capabilities
测试结果:
$ ./test.sh
real 123.69
user 123.40
sys 0.00
real 123.66
user 123.37
sys 0.00
real 123.65
user 123.38
sys 0.00
real 123.62
user 123.38
sys 0.00
real 123.63
user 123.38
sys 0.01
通过grep real|sed 's/real //'
可以得到所有real time统计:
123.69
123.66
123.65
123.62
123.63
直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
, 因此平均值为123.65。
$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00
测试结果:
$ ./test.sh
real 60.26
user 60.25
sys 0.00
real 60.51
user 60.50
sys 0.00
real 64.45
user 64.44
sys 0.00
real 67.76
user 66.29
sys 0.13
real 60.85
user 60.80
sys 0.00
通过grep real|sed 's/real //'
可以得到所有real time统计:
60.26
60.51
64.45
67.76
60.85
直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
, 因此平均值为62.766。
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9J14 96-Core Processor
Stepping: 1
CPU MHz: 2596.100
BogoMIPS: 5192.20
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves nt_good avx512_bf16 clzero xsaveerptr wbnoinvd arat npt nrip_save avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid arch_capabilities
测试结果:
$ ./test.sh
real 52.63
user 52.62
sys 0.00
real 53.29
user 53.19
sys 0.00
real 52.13
user 52.12
sys 0.00
real 52.28
user 52.27
sys 0.00
real 55.77
user 54.79
sys 0.01
通过grep real|sed 's/real //'
可以得到所有real time统计:
52.63
53.29
52.13
52.28
55.77
直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
, 因此平均值为53.22。
$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: ARM
Model: 1
Model name: Neoverse-N1
Stepping: r3p1
BogoMIPS: 50.00
L1d cache: unknown size
L1i cache: unknown size
L2 cache: unknown size
NUMA node0 CPU(s): 0
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
测试结果:
$ ./test.sh
real 113.46
user 103.23
sys 0.23
real 103.77
user 103.02
sys 0.03
real 109.15
user 103.01
sys 0.14
real 105.11
user 103.29
sys 0.02
real 104.54
user 103.06
sys 0.02
通过grep real|sed 's/real //'
可以得到所有real time统计:
113.46
103.77
109.15
105.11
104.54
直接求平均值可以用grep real|sed 's/real //'|awk '{s+=$1} END {print s/5}'
, 因此平均值为107.206。