python - What is the best way to speed up searches in MongoDB using multiple indexes?

Question

I have a MongoDB collection with 50k+ documents, for example:

{
    "_id" : ObjectId("5a42190e806c3210acd3fa82"),
    "start_time" : ISODate("2017-12-25T02:29:00.000Z"),
    "system" : "GL",
    "region" : "NY",
    "order_id" : 3,
    "task_type" : "px2",
    "status" : false
}

When I add a new order, a Python script searches the database for an existing order with the same start_time and task_type, for example:

    tasks_base.find_one({
        "$and": [{
            "start_time": plan_date.astimezone(pytz.UTC)
        }, {
            "task_type": px
        }]
    })

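It works, but every new document in the collection slows it down (more documents to check, and so on).

As a solution, I want to add task_type and start_time as indexes on the collection. But I have some concerns (a date as an index looks a bit unnatural). So I need advice on how to do this properly (or other ideas for speeding up the search). Any advice is appreciated :)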

score 0 · Accepted Answer

I solved it in 3 steps:

First, I created a unique compound index:

import pymongo

tasks.create_index([("start_time", pymongo.ASCENDING), ("task_proxy", pymongo.ASCENDING)], unique=True)

Then I adjusted the query to filter on and return only the indexed fields (a covered query):

all_tasks = tasks.find({
    "$and": [{
        "start_time": {
            "$gte": plan_date.astimezone(pytz.UTC),
            "$lt": plan_date.astimezone(pytz.UTC) + timedelta(hours=1)
        }
    }, {
        "task_proxy": px
    }]
}, {"_id": 0, "start_time": 1, "task_proxy": 1})
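For readers who want to reuse this, here is a minimal sketch of the same filter and projection built in one place. The helper name build_hour_query is hypothetical, and it uses the standard library's timezone.utc in place of pytz.UTC; it also assumes plan_date is a timezone-aware datetime. Top-level fields in a MongoDB filter are combined with an implicit AND, so the explicit "$and" is not required.

```python
from datetime import datetime, timedelta, timezone


def build_hour_query(plan_date, px):
    """Return (filter, projection) for one hour of tasks starting at plan_date.

    Hypothetical helper; assumes plan_date is timezone-aware.
    """
    start = plan_date.astimezone(timezone.utc)
    end = start + timedelta(hours=1)
    # Both conditions must match: top-level fields are ANDed implicitly.
    query = {
        "start_time": {"$gte": start, "$lt": end},
        "task_proxy": px,
    }
    # Return only indexed fields and exclude _id, so the index alone
    # can answer the query (covered query).
    projection = {"_id": 0, "start_time": 1, "task_proxy": 1}
    return query, projection
```

It would be used as `query, projection = build_hour_query(plan_date, px)` followed by `tasks.find(query, projection)`. To check that the query is really covered, `tasks.find(query, projection).explain()` should show an index scan with no FETCH stage in the winning plan.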

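Finally (same code as above), I widened the time window of the query from 1 minute to 1 hour, so I do 1 database operation instead of 60. I can do most of the data processing in the Python script, so the load on the database is much lower :)

UPD: rewrote 80% of the code. I used to run 1 query per order; now I run 1 query per hour, find the free time slots in the result, and pack orders into the free cells (if there are not enough cells, the remaining orders move to another collection). I still use the compound index and the covered query, and script execution time went from 15-17 seconds to 0.6 seconds.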
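The packing step is not shown in the answer, so the following is only a rough sketch of how it might look. The names free_slots and pack_orders are hypothetical, and the one-minute slot size is an assumption taken from the "1 minute to 1 hour" remark above.

```python
from datetime import datetime, timedelta, timezone


def free_slots(hour_start, taken, step=timedelta(minutes=1)):
    """List slot start times in [hour_start, hour_start + 1h) not present in taken.

    Assumes hour_start and the values in taken are all timezone-aware
    (or all naive); mixed values would never compare equal.
    """
    taken = set(taken)
    end = hour_start + timedelta(hours=1)
    slots = []
    t = hour_start
    while t < end:
        if t not in taken:
            slots.append(t)
        t += step
    return slots


def pack_orders(orders, slots):
    """Pair orders with free slots in order; return (assigned, overflow)."""
    assigned = list(zip(orders, slots))
    # Orders beyond the number of free slots need to go somewhere else.
    overflow = orders[len(slots):]
    return assigned, overflow
```

Here taken would be the start_time values returned by the hourly query, and overflow corresponds to the orders that move to another collection.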
