elasticsearch - Improve ES agg query - getting circuit_breaking_exception

Question

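I'm running an aggregation over 2 indices: idx-2020-07-21 and idx-2020-07-22. Goal: get all documents, but if an id is duplicated (about 50% are), take only the copy from the latest index, as determined by the index name.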

This is the query I'm running:

{
  "size": 0,
  "aggregations": {
    "latest_item": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "product": {
              "terms": {
                "field": "_id",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "max_date": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "version": false,
            "explain": false,
            "sort": [
              {
                "_index": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
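The aggregation above groups hits by `_id` and, inside each bucket, keeps the single hit with the greatest `_index` value (top_hits, size 1, sorted on `_index` desc). To make that per-id rule concrete, here is a minimal in-memory Java sketch of the same selection. It illustrates the rule only, not how Elasticsearch executes it. `Hit` and `latestById` are hypothetical names, and the sketch assumes index names sort chronologically as plain strings, which holds for zero-padded date suffixes such as idx-2020-07-21 < idx-2020-07-22.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestById {

    // Hypothetical hit type: a document _id and the index it was found in.
    static final class Hit {
        final String id;
        final String index;

        Hit(String id, String index) {
            this.id = id;
            this.index = index;
        }
    }

    // For each id, keep the hit whose index name is lexicographically greatest,
    // mirroring a top_hits of size 1 sorted on _index desc inside an _id bucket.
    // Assumes index names sort chronologically as strings.
    static Map<String, Hit> latestById(List<Hit> hits) {
        Map<String, Hit> latest = new HashMap<>();
        for (Hit hit : hits) {
            latest.merge(hit.id, hit,
                    (kept, candidate) -> kept.index.compareTo(candidate.index) >= 0 ? kept : candidate);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", "idx-2020-07-21"),
                new Hit("a", "idx-2020-07-22"),  // duplicate id, newer index
                new Hit("b", "idx-2020-07-21")); // present only in the older index
        Map<String, Hit> latest = latestById(hits);
        System.out.println(latest.get("a").index); // idx-2020-07-22
        System.out.println(latest.get("b").index); // idx-2020-07-21
    }
}
```

If the naming scheme changed (for example, non-padded dates), the string ordering would no longer match the chronological ordering, and neither this sketch nor the `_index` sort would pick the newest copy.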

Each index is about 8 GB, with roughly 1M documents. ES version 7.5.

The aggregation takes about 8 minutes, and most of the time I get:

{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [32933676058/30.6gb], which is larger than the limit of [32641751449/30.3gb].
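The two byte counts in the message can be checked against the parent breaker's limit. A minimal sketch, assuming the Elasticsearch 7.x default `indices.breaker.total.limit` of 95% of the JVM heap (the default when real-memory accounting is enabled) and a 32 GiB heap. The heap size is an assumption inferred from the numbers, not something stated in the question. Under those assumptions the computed limit equals the reported 32641751449 bytes, and the request overshoots it by about 278 MiB.

```java
public class BreakerLimit {

    // Assumption: ES 7.x default parent breaker limit of 95% of the heap
    // (indices.breaker.total.limit with real-memory accounting enabled).
    static final double PARENT_LIMIT_RATIO = 0.95;

    static long parentLimitBytes(long heapBytes) {
        return (long) (heapBytes * PARENT_LIMIT_RATIO);
    }

    public static void main(String[] args) {
        long heap = 32L * 1024 * 1024 * 1024;  // assumed 32 GiB heap
        long limit = parentLimitBytes(heap);
        long wouldBe = 32_933_676_058L;        // value reported in the exception
        System.out.println(limit);             // 32641751449
        System.out.println(wouldBe - limit);   // 291924609 bytes, about 278 MiB
    }
}
```

The message shows the limit as 30.3gb rather than 30.4gb because 32641751449 bytes is just under 30.4 GiB.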

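Is there a better way to write this query?
How should I handle this exception?
I run a Java job that queries ES every 10 minutes, and I noticed the exception happens much more often on the second run. Do I need to release some resources or something? I'm using restHighLevelClient.searchAsync() with a listener that calls it again with the next after key until I get null.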
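The paging loop described above can be sketched without the Elasticsearch client. This is a minimal synchronous sketch: `drain` issues one request at a time, feeds the previous page's after key into the next request, and stops when the after key is null. `Page`, `drain` and `fakeSource` are hypothetical stand-ins; in the real high-level client the key would roughly come from the composite aggregation's `afterKey()` and be passed back through `CompositeAggregationBuilder.aggregateAfter(...)`. The fake source assumes the composite source is named `product`, as in the query above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

public class CompositePager {

    // Hypothetical stand-in for one response: bucket keys plus the key to resume from
    // (null once there is nothing left). Not an Elasticsearch class.
    static final class Page {
        final List<String> bucketKeys;
        final Map<String, Object> afterKey;

        Page(List<String> bucketKeys, Map<String, Object> afterKey) {
            this.bucketKeys = bucketKeys;
            this.afterKey = afterKey;
        }
    }

    // Request pages one at a time, passing each page's after key into the next request,
    // until the after key comes back null. Returns the number of requests issued.
    static int drain(Function<Map<String, Object>, Page> fetch, Consumer<String> sink) {
        int requests = 0;
        Map<String, Object> after = null;      // the first request has no "after"
        do {
            Page page = fetch.apply(after);    // exactly one request in flight
            requests++;
            page.bucketKeys.forEach(sink);
            after = page.afterKey;             // resume point for the next request
        } while (after != null);
        return requests;
    }

    // Hypothetical in-memory "cluster": serves sorted ids in pages of `size`, returning
    // {"product": lastId} as the after key, or null when a page comes back empty.
    static Function<Map<String, Object>, Page> fakeSource(List<String> sortedIds, int size) {
        return after -> {
            int start = 0;
            if (after != null) {
                start = sortedIds.indexOf((String) after.get("product")) + 1;
            }
            int end = Math.min(start + size, sortedIds.size());
            List<String> keys = sortedIds.subList(start, end);
            Map<String, Object> next = null;
            if (!keys.isEmpty()) {
                next = Map.of("product", keys.get(keys.size() - 1));
            }
            return new Page(keys, next);
        };
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int requests = drain(fakeSource(List.of("a", "b", "c", "d", "e"), 2), seen::add);
        System.out.println(seen);     // [a, b, c, d, e]
        System.out.println(requests); // 4 (three non-empty pages plus one empty page)
    }
}
```

An async version built on a listener is equivalent as long as each callback issues exactly one follow-up request; the loop form just makes that "one page in flight" invariant explicit.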

The cluster has 3 nodes, each with 32 GB.

I tried adjusting the bucket size, but it didn't help much.

谢谢！
