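I have a simple Monte Carlo Pi calculation program. I ran it on 2 different machines (same hardware, slightly different kernel versions). On one of them I see a significant performance drop (about 2x). Without threads, performance is basically the same. Profiling the program shows that the slower run spends more time per futex call (8 vs. 5 usecs/call in the syscall summaries below).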
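- Is this related to any kernel parameter?
- Can CPU flags affect futex performance? /proc/cpuinfo shows slightly different CPU flags on the two machines.
- Is this related to the Python version?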
Linux(3.10.0-123.20.1 (Red Hat 4.4.7-16)) Python 2.6.6
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   53.229549           5  10792796   5385605 futex
Profile Output
==============
256 function calls in 26.189 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    39   26.186    0.671   26.186    0.671 :0(acquire)
Linux(3.10.0-514.26.2 (Red Hat 4.8.5-11)) Python 2.7.5
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.69   94.281979           8  11620358   5646413 futex
Profile Output
==============
259 function calls in 53.448 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    38   53.445    1.406   53.445    1.406 :0(acquire)
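The figures in the two outputs above can be cross-checked with a few lines of arithmetic. This is only a minimal sketch that reuses the numbers already shown; the names `per_call_us` and `runs` are introduced here for illustration and are not part of the test program.

```python
from __future__ import print_function


def per_call_us(total_seconds, calls):
    """Average time per call, in microseconds."""
    return total_seconds / calls * 1e6


# (label, futex seconds, futex calls, futex errors, cumulative acquire seconds)
# All values are copied from the two outputs above.
runs = [
    ('3.10.0-123.20.1 / Python 2.6.6', 53.229549, 10792796, 5385605, 26.186),
    ('3.10.0-514.26.2 / Python 2.7.5', 94.281979, 11620358, 5646413, 53.445),
]

for label, futex_s, calls, errors, acquire_s in runs:
    print('%s: %.2f us per futex call, %.1f%% of calls returned an error, '
          'acquire=%.3f s' % (label, per_call_us(futex_s, calls),
                              100.0 * errors / calls, acquire_s))

# Ratio of the time spent in acquire between the two machines.
print('acquire time ratio: %.2f' % (runs[1][4] / runs[0][4]))
```

The averages come out at roughly 4.9 and 8.1 microseconds per futex call, in line with the 5 and 8 in the usecs/call column, and the time spent in `acquire` differs by a factor of about 2.04.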
Test program
import random
import math
import time
import threading
import sys
import profile


def find_pi(tid, n):
    # Estimate Pi from n random points in the unit square.
    t0 = time.time()
    in_circle = 0
    for i in range(n):
        x = random.random()
        y = random.random()
        dist = math.sqrt(pow(x, 2) + pow(y, 2))
        if dist < 1:
            in_circle += 1
    pi = 4.0 * (float(in_circle)/float(n))
    # Python 2 print statement (the runs above used 2.6.6 and 2.7.5).
    print 'Pi=%s - thread(%s) time=%.3f sec' % (pi, tid, time.time() - t0)
    return pi


def main():
    if len(sys.argv) > 1:
        n = int(sys.argv[1])
    else:
        n = 6000000
    t0 = time.time()
    threads = []
    num_threads = 5
    print 'n =', n
    # Start all worker threads, then wait for each one to finish.
    for tid in range(num_threads):
        t = threading.Thread(target=find_pi, args=(tid,n,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()


#main()
profile.run('main()')
#profile.run('find_pi(1, 6000000)')
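To compare the threaded and unthreaded cases on each machine without the profiler overhead, a small timing harness may help. This is a minimal sketch, not the program measured above: the function names `count_in_circle`, `time_sequential` and `time_threaded` and the reduced sample count are assumptions made for illustration, and the distance test uses `x*x + y*y < 1`, which is equivalent to the `sqrt` comparison in the original. It is written to run on both Python 2.6+ and Python 3.

```python
from __future__ import print_function

import random
import threading
import time


def count_in_circle(n):
    """Return how many of n random points fall inside the unit quarter circle."""
    in_circle = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x * x + y * y < 1.0:
            in_circle += 1
    return in_circle


def time_sequential(num_tasks, n):
    """Run num_tasks estimates one after another in the main thread."""
    t0 = time.time()
    for _ in range(num_tasks):
        count_in_circle(n)
    return time.time() - t0


def time_threaded(num_tasks, n):
    """Run num_tasks estimates concurrently, one thread per estimate."""
    threads = [threading.Thread(target=count_in_circle, args=(n,))
               for _ in range(num_tasks)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0


if __name__ == '__main__':
    n = 200000  # hypothetical small sample count so the run stays short
    seq = time_sequential(5, n)
    thr = time_threaded(5, n)
    print('sequential: %.3f s  threaded: %.3f s  ratio: %.2f'
          % (seq, thr, thr / seq))
```

Running this on both machines gives one threaded/sequential ratio per machine. If the sequential times match and only the threaded times differ, that would be consistent with the observation that the gap appears only when threads are used.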