作为分析几个 golang 服务的一部分,我们看到所有服务都在 runtime.futex 函数中花费了 55% 到 70% 的时间。
请注意,这些服务涉及使用 goroutines、锁和通道来实现 goroutines 之间的同步和通信。
这是意料之中的吗?我们应该只关心在非“运行时”函数上花费的时间,或者这可能是 goroutines/locks/channels 的错误使用造成的,导致 futex 调用中的这种开销?
(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
flat flat% sum% cum cum%
31.05s 56.89% 56.89% 31.05s 56.89% runtime.futex
3.55s 6.50% 63.39% 4.51s 8.26% runtime.cgocall
2.25s 4.12% 67.52% 2.25s 4.12% runtime.usleep
1.14s 2.09% 69.60% 1.53s 2.80% runtime.scanobject
0.73s 1.34% 70.94% 1.56s 2.86% syscall.Syscall
0.71s 1.30% 72.24% 2.27s 4.16% runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
flat flat% sum% cum cum%
9.69s 74.42% 74.42% 9.69s 74.42% runtime.futex
1.25s 9.60% 84.02% 1.63s 12.52% runtime.cgocall
0.64s 4.92% 88.94% 0.64s 4.92% runtime.usleep
0.17s 1.31% 90.25% 0.22s 1.69% runtime.timeSleepUntil
0.11s 0.84% 91.09% 0.17s 1.31% runtime.scanobject
0.08s 0.61% 91.71% 0.08s 0.61% runtime.nanotime1
0.08s 0.61% 92.32% 0.26s 2.00% runtime.retake
0.07s 0.54% 92.86% 0.07s 0.54% runtime.lock2
0.07s 0.54% 93.39% 0.13s 1% runtime.traceEventLocked
0.06s 0.46% 93.86% 0.36s 2.76% runtime.notetsleep_internal
0.05s 0.38% 94.24% 0.13s 1% runtime.mallocgc