assembly - Why does "setne %al" use "a lot of cycles" in perf annotate?

Question

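I was confused when I saw this perf report. I tried several times, and this setne instruction always takes the largest share of the function. The function is large, so only a small part of it is shown below.

The profile was recorded with: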

perf record ./test

I then viewed the results with:

perf report --showcpuutilization

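I opened the annotation for one of my most expensive functions. It is large, so only a small piece is shown in the screenshot:

From it, we can see that the setne instruction (about the 10th line from the top, shown in red) accounts for about 9% of the cycles.

Could someone help me? I don't understand why this "simple instruction" takes so much time. Maybe it is related to pipeline ordering and a dependency on other instructions? Thanks in advance!

By the way, the program was compiled on an x86_64 machine with the following command: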

gcc -g -pg -m32 -o test test.c

Here is the CPU information:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping        : 2
microcode       : 0x1
cpu MHz         : 2494.222
cache size      : 16384 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear spec_ctrl intel_stibp
bogomips        : 4988.44
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

score 0 · Accepted Answer

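Just offering a rough, not rigorous, answer here:

"perf" works by sampling. At each sample, it reads the current EIP value and records it.
The percentage shown for an instruction is simply the number of samples in which EIP pointed at that address, relative to the total number of samples in the range. When the previous instruction is slow, EIP tends to sit on the instruction after it, so that later instruction gets the blame.
On some modern CPUs, the reported hotspot can be a few instructions further along than the real "blocking point". So it is usually best to look back at the preceding instructions and check whether one of them could be causing the delay.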
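
To make the attribution effect concrete, here is a minimal toy simulation. It is a sketch under loud assumptions: the instruction stream and latencies are invented for illustration (they are not taken from the screenshot or measured on the Xeon above), execution is modeled as strictly in-order, and every sample is charged to the instruction `skid` positions after the one that was actually executing. Real hardware is out-of-order and its skid is less regular.

```python
from collections import Counter

# Hypothetical instruction stream: (mnemonic, cycles it occupies).
# Latencies are made up; the cmpl is pretended to miss in cache.
STREAM = [
    ("mov (%edx),%eax", 1),
    ("cmpl $0x0,(%ecx)", 20),
    ("setne %al", 1),
    ("movzbl %al,%eax", 1),
]

def sample_counts(stream, period, iterations, skid=1):
    """Replay `stream` `iterations` times, sampling every `period` cycles.

    Each sample is charged to the instruction `skid` positions after the
    one that was executing when the sample fired (skid=0 means no skid).
    """
    counts = Counter()
    cycle = 0
    next_sample = period
    for _ in range(iterations):
        for i, (_, latency) in enumerate(stream):
            cycle += latency
            # Every sample that fired while instruction i was executing.
            while next_sample <= cycle:
                blamed = stream[(i + skid) % len(stream)][0]
                counts[blamed] += 1
                next_sample += period
    return counts

if __name__ == "__main__":
    counts = sample_counts(STREAM, period=7, iterations=700, skid=1)
    total = sum(counts.values())
    for name, n in counts.most_common():
        print(f"{100.0 * n / total:6.2f}%  {name}")
```

In this toy model the slow `cmpl` occupies 20 of the 23 cycles per iteration, yet with `skid=1` roughly 87% of the samples land on `setne %al`. Calling it with `skid=0` puts the same share back on the `cmpl`, which is the "look back at earlier instructions" advice in numeric form.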

Reference: https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record

