optimization - Pipelining vs Batching in Stackexchange.Redis

Question

I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives:

1) Pipelining:

List<Task> addTasks = new List<Task>();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

2) Batching:

List<Task> addTasks = new List<Task>();
IBatch batch = redisDB.CreateBatch();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
batch.Execute();
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching.

Reading from the documentation on pipelining,

"Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)."

To me, this sounds a lot like batching behaviour. I wonder whether there is any big difference between the two behind the scenes, because a simple check with procmon shows almost the same number of TCP Sends for both versions.

score 33 · Accepted Answer

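Behind the scenes, SE.Redis does quite a bit of work to avoid packet fragmentation, so it isn't surprising that the two are quite similar in your case. The main differences between batching and flat pipelining are:

A batch will never be interleaved with competing operations on the same multiplexer (although it may be interleaved at the server; to avoid that you need to use a multi/exec transaction or a Lua script).
A batch will always avoid the chance of undersized packets, because it knows about all the data ahead of time.
But at the same time, the entire batch must be completed before anything can be sent, so this requires more in-memory buffering and may artificially introduce latency.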
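
To illustrate the multi/exec option mentioned in the first point, here is a minimal sketch using a transaction. The connection string, key name, and values are assumptions for illustration only, and a transaction is meant for a small group of related commands rather than a 250K-row bulk load.

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

class TransactionSketch
{
    static void Main()
    {
        // Assumption: a Redis server is reachable at localhost:6379.
        using ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase redisDB = muxer.GetDatabase();

        // Commands queued on the transaction are sent wrapped in MULTI/EXEC,
        // so the server does not interleave other clients' commands between them.
        ITransaction tran = redisDB.CreateTransaction();
        Task<bool> first = tran.SetAddAsync("example:1", 10);   // hypothetical key and values
        Task<bool> second = tran.SetAddAsync("example:1", 20);

        // Nothing is sent until Execute is called; it returns true if the transaction committed.
        bool committed = tran.Execute();

        if (committed)
        {
            // Each queued task carries the result of its own command.
            Console.WriteLine($"added: {first.Result}, {second.Result}");
        }
    }
}
```

As with a batch, the whole transaction is buffered locally before being sent, so the memory and latency trade-off described above still applies.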

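In most cases, you will do better by avoiding batching, since SE.Redis achieves most of what batching does automatically when you simply add work.

As a final note, if you want to avoid local overhead, one last approach might be: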

redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")),
    row.Field<int>("Value"), flags: CommandFlags.FireAndForget);

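This sends everything down the wire, neither waiting for responses nor allocating incomplete Tasks to represent future values. You might want to do something like a Ping at the end, without fire-and-forget, to check that the server is still talking to you. Note that using fire-and-forget does mean that you won't notice any server errors that get reported.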
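
Putting the fire-and-forget call and the closing Ping together might look like the sketch below. The class and method names are hypothetical; `redisDB`, `table`, `keyFormat`, and the "Id"/"Value" columns are taken from the question's code.

```csharp
using System;
using System.Data;
using StackExchange.Redis;

static class FireAndForgetSketch
{
    // Hypothetical helper: the parameters mirror the variables used in the question.
    public static void InsertAll(IDatabase redisDB, DataTable table, string keyFormat)
    {
        foreach (DataRow row in table.Rows)
        {
            // No Task is allocated and no reply is awaited for these calls.
            redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")),
                row.Field<int>("Value"), flags: CommandFlags.FireAndForget);
        }

        // A normal (not fire-and-forget) Ping blocks until the server replies,
        // which confirms the connection is still alive after the bulk send.
        TimeSpan latency = redisDB.Ping();
        Console.WriteLine($"Server replied in {latency.TotalMilliseconds} ms");
    }
}
```

The Ping only shows that the server is still responding; it does not surface errors from the individual SetAdd calls, which fire-and-forget discards.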
