我正在尝试测量folly
hashmap 中并发插入的性能。此处提供了用于此类插入的程序的简化版本:
#include <folly/concurrency/ConcurrentHashMap.h>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
const int kNumMutexLocks = 2003;
std::unique_ptr<std::mutex[]> mutices(new std::mutex[kNumMutexLocks]);
__inline__ void
concurrentInsertion(unsigned int threadId, unsigned int numInsertionsPerThread,
unsigned int numInsertions, unsigned int numUniqueKeys,
folly::ConcurrentHashMap<int, int> &follyMap) {
int base = threadId * numInsertionsPerThread;
for (int i = 0; i < numInsertionsPerThread; i++) {
int idx = base + i;
if (idx >= numInsertions)
break;
int val = idx;
int key = val % numUniqueKeys;
mutices[key % kNumMutexLocks].lock();
auto found = follyMap.find(key);
if (found != follyMap.end()) {
int oldVal = found->second;
if (oldVal < val) {
follyMap.assign(key, val);
}
} else {
follyMap.insert(key, val);
}
mutices[key % kNumMutexLocks].unlock();
}
}
void func(unsigned int numInsertions, float keyValRatio) {
const unsigned int numThreads = 12; // Simplified just for this post
unsigned int numUniqueKeys = numInsertions * keyValRatio;
unsigned int numInsertionsPerThread = ceil(numInsertions * 1.0 / numThreads);
std::vector<std::thread> insertionThreads;
insertionThreads.reserve(numThreads);
folly::ConcurrentHashMap<int, int> follyMap;
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < numThreads; i++) {
insertionThreads.emplace_back(std::thread([&, i] {
concurrentInsertion(i, numInsertionsPerThread, numInsertions,
numUniqueKeys, follyMap);
}));
}
for (int i = 0; i < numThreads; i++) {
insertionThreads[i].join();
}
auto end = std::chrono::steady_clock::now();
auto diff = end - start;
float insertionTimeMs =
std::chrono::duration<double, std::milli>(diff).count();
std::cout << "i: " << numInsertions << "\tj: " << keyValRatio
<< "\ttime: " << insertionTimeMs << std::endl;
}
int main() {
std::vector<float> js = {0.5, 0.25};
for (auto j : js) {
std::cout << "-------------" << std::endl;
for (int i = 2048; i < 4194304 * 8; i *= 2) {
func(i, j);
}
}
}
问题是在 main 中使用这个循环会突然增加func
函数中的测量时间。也就是说,如果我直接从 main 调用函数而没有任何循环(如下所示),某些情况下的测量时间会突然减少 100 倍以上。
int main() {
func(2048, 0.25); // ~ 100X faster now that the loop is gone.
}
可能的原因
- 我在构建 hasmap 时分配了大量内存。我相信当我在循环中运行代码时,在执行循环的第二次迭代时,计算机正忙于为第一次迭代释放内存。因此,程序变得慢得多。如果是这种情况,如果有人能提出一个改变,我可以通过循环获得相同的结果,我将不胜感激。
更多细节
请注意,如果我在 main 中展开循环,我也会遇到同样的问题。也就是说,下面的程序也有同样的问题:
int main() {
performComputation(input A);
...
performComputation(input Z);
}
样本输出
第一个程序的输出如下所示:
i: 2048 j: 0.5 time: 1.39932
...
i: 16777216 j: 0.5 time: 3704.33
-------------
i: 2048 j: 0.25 time: 277.427 <= sudden increase in execution time
i: 4096 j: 0.25 time: 157.236
i: 8192 j: 0.25 time: 50.7963
i: 16384 j: 0.25 time: 133.151
i: 32768 j: 0.25 time: 8.75953
...
i: 2048 j: 0.25 time: 162.663
func
单独运行main并i=2048
产生j=0.25
:
i: 2048 j: 0.25 time: 1.01
非常感谢任何评论/见解。