elasticsearch - Sampler aggregation in Elasticsearch 2.0.0

Question

{
    "query": {
        "match": {
            "text": query
        }
    },
    "size": 5,
    "aggs": {
        "bestDocs": {
            "sampler": {
                "field": "cluster",
                "shard_size": 1
            },
            "aggs": {
                "bestBuckets": {
                    "terms": {
                        "field": "cluster",
                        "size": 5
                    }
                }
            }
        }
    }
}

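With the simple query, I get a bunch of documents, each of which has a bucket ID (the "cluster" field). I am trying to use the sampler aggregation to get the bucket IDs in the order in which they appear in the general query results.

But when I run the query above, I get the buckets in ascending order, and they are not even the buckets I got from the general query.

For example: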

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 959,
        "max_score": 1.841992,
        "hits": [
            {
                "_source": {
                    "cluster": "22570",
                    "text": "about 1.5 million veteran families live at the federal poverty level, including 634,000 below 50 percent of the federal poverty"
                }
            },
            {
                "_source": {
                    "cluster": "22570",
                    "text": "about 1.5 million veteran families live at the federal poverty level, including 634,000 below 50 percent of the federal poverty"
                }
            },
            {
                "_source": {
                    "cluster": "22570",
                    "text": "about 1.5 million veteran families live at the federal poverty level, including 634,000 below 50 percent of the federal poverty"
                }
            },
            {
                "_source": {
                    "cluster": "22570",
                    "text": "about 1.5 million veteran families live at the federal poverty level, including 634,000 below 50 percent of the federal poverty"
                }
            },
            {
                "_source": {
                    "cluster": "12239",
                    "text": "veterans and their families.&quot;</p><p>The Veterans&#39; Compensation Cost-of-Living Adjustment Act of 2011 directs the Secretary of Veterans Affairs to increase the rates of veterans",
                }
            }
        ]
    },
    "aggregations": {
        "bestDocs": {
            "doc_count": 5,
            "bestBuckets": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": 22185,
                        "doc_count": 1
                    },
                    {
                        "key": 22570,
                        "doc_count": 1
                    },
                    {
                        "key": 29615,
                        "doc_count": 1
                    },
                    {
                        "key": 32784,
                        "doc_count": 1
                    },
                    {
                        "key": 43351,
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}

As you can see, the aggregation is not what I want. How can I get [22570, 12239], in that order?

score 0 · Accepted Answer

"sampler": {
    "field": "cluster",
    "shard_size": 1  // <-- It could be the potential culprit
},

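The IDs 22570 and 12239 may both belong to the same shard. Since you have specified a shard size of 1, the aggregation may collect only one document from each shard, which would explain why 12239 is omitted from the aggregation.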
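
To make that reasoning concrete, here is a minimal sketch in plain Python; it does not call Elasticsearch. The shard placement and scores in `docs` are invented for illustration, and `sample_per_shard`, `terms_buckets`, and `distinct_in_hit_order` are hypothetical helpers, not Elasticsearch APIs. It assumes the sampler keeps the top-scoring documents on each shard, limited by `shard_size` and by at most `max_docs_per_value` documents per "cluster" value, and that the terms aggregation orders buckets by document count and breaks ties by ascending key, which is consistent with the response shown in the question (5 shards, 5 sampled documents, keys in ascending order).

```python
from collections import Counter

# Hypothetical documents as (shard, cluster, score); placement and scores are made up.
docs = [
    (0, 22570, 1.84),
    (0, 22570, 1.84),
    (0, 12239, 1.70),
    (1, 22185, 0.90),
    (2, 29615, 0.80),
    (3, 32784, 0.70),
    (4, 43351, 0.60),
]


def sample_per_shard(docs, shard_size, max_docs_per_value=1):
    """Approximate the sampler: per shard, keep the best-scoring documents,
    at most shard_size in total and max_docs_per_value per cluster."""
    by_shard = {}
    for doc in docs:
        by_shard.setdefault(doc[0], []).append(doc)

    sampled = []
    for shard in sorted(by_shard):
        ranked = sorted(by_shard[shard], key=lambda d: d[2], reverse=True)
        kept, seen = [], Counter()
        for doc in ranked:
            if len(kept) >= shard_size:
                break
            if seen[doc[1]] >= max_docs_per_value:
                continue  # this cluster already has enough documents in the sample
            seen[doc[1]] += 1
            kept.append(doc)
        sampled.extend(kept)
    return sampled


def terms_buckets(sampled, size):
    """Approximate the terms aggregation: count per cluster, ordered by
    count descending, then key ascending (assumed tie-break)."""
    counts = Counter(cluster for _, cluster, _ in sampled)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


def distinct_in_hit_order(clusters):
    """Client-side alternative: distinct cluster IDs in the order the hits arrive."""
    return list(dict.fromkeys(clusters))


if __name__ == "__main__":
    # shard_size=1: one document per shard, so 12239 never reaches the terms aggregation.
    print(terms_buckets(sample_per_shard(docs, shard_size=1), size=5))
    # shard_size=2: 12239 is sampled, but buckets are still not in hit order.
    print(terms_buckets(sample_per_shard(docs, shard_size=2), size=5))
    # Deduplicating the "cluster" values of the returned hits preserves hit order.
    print(distinct_in_hit_order(["22570", "22570", "22570", "22570", "12239"]))
```

Under these assumptions, raising `shard_size` lets 12239 into the sample, but the bucket order still follows count and key rather than relevance. If the goal is only the distinct cluster IDs of the top hits in hit order, deduplicating the "cluster" values from the hits on the client side may be the simpler route.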
