solr - Solr suggester - context filtering with DocumentDictionaryFactory returns the entire field

Question

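I am developing an autocomplete feature for an application that provides full-text search of books.

I am trying to configure a Solr (v.7.4.0) suggester with context filtering (for example, limiting results to text from the pages of a single book) so that it returns the terms matching the supplied query, but it returns the entire contents of the field instead.

In the searchComponent definition in solrconfig.xml, using FuzzyLookupFactory works as expected (it returns individual words), but that lookup implementation does not support context filtering. When I switch to AnalyzingInfixLookupFactory combined with DocumentDictionaryFactory to get context filtering (see the Solr documentation), I get the whole field back.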

Example field value:

{
   "id":"abc1234",
   "ocrtext":"In choosing Colors for candy, certain qualifications are necessary. First, they must not fade or change"
}

In response to a query like this:

http://127.0.0.1:8983/solr/iiif_suggest?wt=json&q=col&suggest.cfq=456789

What I want is:

{
    "responseHeader": {
        "status": 0,
        "QTime": 1
    },
    "suggest": {
        "iiifSuggester": {
            "col": {
                "numFound": 1,
                "suggestions": [
                    {
                        "term": "colors",
                        "weight": 0,
                        "payload": ""
                    }]}}}
}

But what I get is:

{
    "responseHeader": {
        "status": 0,
        "QTime": 1
    },
    "suggest": {
        "iiifSuggester": {
            "col": {
                "numFound": 1,
                "suggestions": [
                    {
                        "term": "In choosing Colors for candy, certain qualifications are necessary. First, they must not fade or change",
                        "weight": 0,
                        "payload": ""
                    }]}}}
}

Here are the relevant solrconfig.xml settings:

<searchComponent name="iiif_suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="suggestAnalyzerFieldType">ocrtext_suggest</str>
    <str name="contextField">is_page_of_ssim</str>
    <str name="field">ocrtext</str>
  </lst>
</searchComponent>

Here is the field definition in schema.xml:

<fieldType name="ocrtext_suggest" class="solr.TextField" positionIncrementGap="100">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="ocrtext" type="ocrtext_suggest" indexed="true" stored="true" multiValued="false" />

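Essentially, ocrtext_suggest is modeled on the default Solr textSpell field type definition. However, I have observed that the field must have stored="true" for any results to be returned at all.

When I look at the ocrtext field in the Schema Browser of the Solr admin GUI and click Load Term Info, the field appears to be tokenized into individual terms. I don't understand how DocumentDictionaryFactory ends up with the complete field value.

Any suggestions would be greatly appreciated!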

score 0 · Accepted Answer

Try replacing it with <str name="lookupImpl">FreeTextLookupFactory</str>, which should fit your requirement.
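
For reference, the swap might look roughly like the fragment below. This is only a sketch built from the searchComponent in the question, not a tested configuration. It assumes that FreeTextLookupFactory takes its analyzer from suggestFreeTextAnalyzerFieldType rather than suggestAnalyzerFieldType, and that the question's ocrtext_suggest field type can be reused for it. The contextField line is carried over unchanged from the question.

```xml
<searchComponent name="iiif_suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- Swapped in as the answer suggests; the question used AnalyzingInfixLookupFactory here -->
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <!-- Assumption: reuse the question's field type as the free-text analyzer -->
    <str name="suggestFreeTextAnalyzerFieldType">ocrtext_suggest</str>
    <!-- Carried over from the question; may have no effect with this lookup -->
    <str name="contextField">is_page_of_ssim</str>
    <str name="field">ocrtext</str>
  </lst>
</searchComponent>
```

One caveat worth checking before relying on this: the Solr Reference Guide describes context filtering (suggest.cfq) as supported only by AnalyzingInfixLookupFactory and BlendedInfixLookupFactory, so with this lookup the suggestions may come back as single terms but unfiltered by book.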

Answered 2020-07-09T04:18:10.180