python - ElasticSearch's Percolate API never returns a match if the registered query contains terms

Question

I am trying to use the Percolator in Elasticsearch, but I have run into a small problem.

Suppose our document looks like this:

{
    "doc": {
        "full_name": "Pacman",
        "company": "Arcade Game LTD",
        "occupation": "hunter", 
        "tags": ["Computer Games"]
    }
}

The query we registered looks like this:

{
    "query": {
        "bool": {
            "must": [
               {
                   "match_phrase":{
                       "occupation":  "hunter"
                   }
               },
               {
                   "terms": {
                       "tags":  [
                           "Computer Games",
                           "Electronic Sports"
                           ],
                       "minimum_match": 1
                   }
               }
            ]
        }
    }
}

I get:

{
   "took": 3,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "total": 0,
   "matches": []
}

I have no idea what I am doing wrong: if I remove terms from the registered query and match only on occupation, it works as expected and I get a match.

Any hints?

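Update 1

OK, I think @Slam's solution is a step in the right direction, but I still have some problems.

I updated the mapping for tags, and it now looks like this: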

"tags": {
    "store": True,
    "analyzer": "snowball",
    "type": "string",
    "index": "analyzed",
    "fields": {
        "raw": {
           "type": "string",
           "index": "not_analyzed"
       }
    }
}

The new document to percolate:

{
    "doc": {
        "full_name": "Pacman",
        "company": "Arcade Game LTD",
        "occupation": "hunter", 
        "tags.raw": ["Computer Games"]
    }
}

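When I try to match the document above against tags.raw, still no match is found. I ran an analysis on the tags.raw field, and it looks like it still produces the tokens computer, games and running.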

score 0 · Accepted Answer

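I guess you are using an implicit mapping (the default analyzer) or some other analyzer for your tags field. That means the data (in your case "Computer Games") is broken into tokens and is no longer available for a terms search, because it is now represented in the index as something like computer + game.

To be able to run terms matching on strings, you need to map them as not analyzed (to prevent them from being split into tokens), e.g.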
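To make the point concrete, here is a small pure-Python sketch. It does not talk to Elasticsearch: toy_analyze is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters (a real snowball analyzer would also stem, e.g. games to game), and terms_match imitates the exact, unanalyzed comparison a terms query performs.

```python
import re

def toy_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def terms_match(indexed_terms, wanted):
    """Imitate a terms query: each query value is compared verbatim to indexed terms."""
    return any(value in indexed_terms for value in wanted)

query_values = ["Computer Games", "Electronic Sports"]

# What ends up in the index for an analyzed field vs. a not_analyzed field.
analyzed_terms = set(toy_analyze("Computer Games"))
not_analyzed_terms = {"Computer Games"}

print(terms_match(analyzed_terms, query_values))      # False
print(terms_match(not_analyzed_terms, query_values))  # True
```

The analyzed field only holds computer and games, so the literal value "Computer Games" from the terms query has nothing to match against.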

PUT so/pacman/_mapping
{
  "pacman": {
    "properties": {
      "tags": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}

Or make your tags field a multi-field, like

PUT so/pacman/_mapping
{
  "pacman": {
    "properties": {
      "tags": {
        "type": "string",
        "index": "analyzed",
        "fields": {
          "raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

and query the documents with

GET so/pacman/_search
{
  "query": {
    "terms": {
      "tags.raw": [
        "Computer Games",
        "Running"
      ],
      "minimum_match": 1
    }
  }
}

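This approach lets you run both full-text searches and terms searches.

Regarding your Update 1: once you have put the correct mapping and percolator in place, for example: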
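Since the mapping in the question is written as a Python dict, the request bodies above can also be built in Python. This is only a sketch of the JSON payloads: the helper names multi_field_mapping, exact_terms_query and text_query are made up for illustration, and sending the bodies to a cluster is left out.

```python
import json

def multi_field_mapping(doc_type, field):
    """Mapping body: analyzed main field plus a not_analyzed `raw` sub-field."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",
                    "index": "analyzed",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }

def exact_terms_query(field, values):
    """Terms query aimed at the not_analyzed sub-field (exact values)."""
    return {"query": {"terms": {field + ".raw": list(values)}}}

def text_query(field, text):
    """Match query aimed at the analyzed main field (full-text)."""
    return {"query": {"match": {field: text}}}

mapping_body = multi_field_mapping("pacman", "tags")
terms_body = exact_terms_query("tags", ["Computer Games", "Running"])
match_body = text_query("tags", "computer games")

# Print the payload that would go to so/pacman/_search.
print(json.dumps(terms_body, indent=2))
```

The two query helpers differ only in which field they target: tags.raw for exact values, tags for analyzed text.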

PUT so/.percolator/1
{
  "query": {
    "terms": {
      "tags.raw": [
        "Computer Games",
        "Maze running"
      ]
    }
  }
}

you need to index/percolate documents in the following format

GET so/pacman/_percolate
{
  "doc": {
    "full_name": "Pacman",
    "company": "Arcade Game LTD",
    "occupation": "hunter", 
    "tags": ["Computer Games"]
  }
}

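Here is what happens. You index/percolate a document with the field tags (with no mention of raw or any other multi-field you have). ES takes this field from the JSON, adds tags.raw to the index (as the whole string), and at the same time breaks the value into analyzed tokens and puts them into the tags field (the real process is far more complicated, but let's skip that here for simplicity). So you do not need to manage any of this field's internals in the document; you already took care of that in the mapping.

When the percolator runs, it looks up the tags.raw field in the index (because you registered a terms query on that "sub-field"), while the analyzed field is left untouched.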
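The flow described above can be imitated in a few lines of Python. This is a toy model under stated assumptions, not how Elasticsearch is implemented: toy_analyze, index_terms and percolate are hypothetical helpers, the analyzer only lowercases and splits, and registered queries are reduced to a (field, values) pair.

```python
import re

def toy_analyze(text):
    """Hypothetical analyzer stand-in: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def index_terms(doc):
    """Expand the single `tags` key of the source into both indexed fields."""
    values = doc.get("tags", [])
    return {
        "tags": {tok for value in values for tok in toy_analyze(value)},
        "tags.raw": set(values),  # whole strings, untouched
    }

def percolate(doc, registered):
    """Return ids of registered terms queries that match the document."""
    indexed = index_terms(doc)
    return [
        query_id
        for query_id, (field, wanted) in registered.items()
        if indexed.get(field, set()) & set(wanted)
    ]

registered = {
    "on_raw": ("tags.raw", ["Computer Games", "Maze running"]),
    "on_analyzed": ("tags", ["Computer Games", "Maze running"]),
}
doc = {"full_name": "Pacman", "occupation": "hunter", "tags": ["Computer Games"]}

print(percolate(doc, registered))  # ['on_raw']
```

The document only carries tags, yet the query registered on tags.raw is the one that matches, because the mapping (here index_terms) derives the raw sub-field from the same source value.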
