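Lucene has several overloads of IndexSearcher.Search. Some take a "top n hits" parameter and some do not (the latter are obsolete and will be removed in Lucene.NET 3.0).

The overloads that take a "top n" parameter actually preallocate memory for the whole range of possible results. So when you cannot even estimate how many results will come back, the only option is to pass an arbitrarily large number to make sure every query result is returned. Because of large object heap (LOH) fragmentation, this causes severe memory pressure and leaks.

Is there an official, non-obsolete way to search without passing a "top n" parameter?

Thanks in advance.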

1 Answer

I'm using Lucene.NET 2.9.2 as reference point for this answer.

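You could build a custom collector and pass it to one of the Search overloads that accept a Collector. The collector below stores the id of each matching document as it is reported, so no "top n" value is needed.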

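using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Collects the ids of all matching documents; no "top n" is required.
public class AwesomeCollector : Collector {
    private readonly List<Int32> _docIds = new List<Int32>();
    // Minimum score (inclusive) a hit needs in order to be collected.
    private readonly Single _lowerInclusiveScore;
    private Scorer _scorer;
    private Int32 _docBase;

    // Collects every hit, whatever its score.
    public AwesomeCollector() : this(Single.NegativeInfinity) { }

    public AwesomeCollector(Single lowerInclusiveScore) {
        _lowerInclusiveScore = lowerInclusiveScore;
    }

    public IEnumerable<Int32> DocumentIds {
        get { return _docIds; }
    }

    public override void SetScorer(Scorer scorer) {
        _scorer = scorer;
    }

    public override void Collect(Int32 doc) {
        // doc is relative to the current segment reader;
        // adding _docBase yields the index-wide document id.
        var score = _scorer.Score();
        if (_lowerInclusiveScore <= score)
            _docIds.Add(_docBase + doc);
    }

    public override void SetNextReader(IndexReader reader, Int32 docBase) {
        // Remember the offset of the segment that is about to be searched.
        _docBase = docBase;
    }

    public override bool AcceptsDocsOutOfOrder() {
        // Ids are only appended to a list, so the order of collection does not matter.
        return true;
    }
}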
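
For illustration, here is a minimal sketch of how such a collector might be used. It assumes you already have an open IndexSearcher and a Query; the helper name PrintAllHits and the stored field "title" are hypothetical and only there to make the example complete.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class CollectorUsage {
    // Hypothetical helper: runs the query and prints one stored field per hit.
    public static void PrintAllHits(IndexSearcher searcher, Query query) {
        var collector = new AwesomeCollector();

        // Search(Query, Collector) reports every match to the collector,
        // so no "top n" argument is passed.
        searcher.Search(query, collector);

        foreach (var docId in collector.DocumentIds) {
            // Load the stored document for each collected id.
            Document doc = searcher.Doc(docId);
            // "title" is an assumed field name; replace it with one from your index.
            Console.WriteLine(doc.Get("title"));
        }
    }
}
```

Note that the ids come back in collection order, not sorted by score. If you need ranking, you would have to store the score next to each id and sort afterwards.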

Answered 2011-01-21T12:52:56.677