sql-server - Fastest way to record all DocIds and FileNames from dtSearch in a SQL database

Question

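I'm using dtSearch together with a SQL database and want to maintain a table containing every DocId and its associated file name. I'll then add a column with my foreign key so that I can combine text searches with database searches.

I have code that simply returns every record in the index and adds them to the database one at a time. However, it takes forever, and it doesn't address how to append only the new records as they are added to the index. Just in case it helps, here it is: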

MyDatabaseContext db = new StateScapeEntities();
IndexJob ij = new dtSearch.Engine.IndexJob();

ij.IndexPath = @"d:\myindex";

IndexInfo indexInfo = dtSearch.Engine.IndexJob.GetIndexInfo(@"d:\myindex");

bool jobDone = ij.Execute();

SearchResults sr = new SearchResults();

uint n = indexInfo.DocCount;

// add DocIds 1..DocCount to the result set
// (if any DocIds have been cancelled, this range may not cover every live DocId)
for (int i = 1; i <= n; i++)
{
    sr.AddDoc(ij.IndexPath, i, null);
}

for (int i = 1; i <= n; i++)
{
    sr.GetNthDoc(i - 1);
    // IndexDocument is defined elsewhere
    IndexDocument id = new IndexDocument();
    id.DocId = sr.CurrentItem.DocId;
    id.FilePath = sr.CurrentItem.Filename;

    if (id.FilePath != null)
    {
        db.IndexDocuments.Add(id);
        db.SaveChanges();   // one database round trip per document
    }
}
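Most of the time in a loop like this likely goes to calling `db.SaveChanges()` once per document. A minimal sketch of the usual alternative, assuming the (DocId, FileName) pairs have already been read from the `SearchResults`: buffer rows and flush once per batch. `BatchedSave`, `SaveInBatches`, the `save` callback and the batch size are all hypothetical names for illustration, not part of dtSearch or Entity Framework.

```csharp
using System;
using System.Collections.Generic;

public static class BatchedSave
{
    // Buffers (DocId, FileName) pairs and calls `save` once per full batch,
    // plus once more for any remainder. Entries with a null file name are
    // skipped, mirroring the null check in the question's loop.
    // Returns how many times `save` was called.
    public static int SaveInBatches(
        IEnumerable<(int DocId, string FileName)> docs,
        int batchSize,
        Action<IReadOnlyList<(int DocId, string FileName)>> save)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var buffer = new List<(int DocId, string FileName)>(batchSize);
        int saveCalls = 0;

        foreach (var doc in docs)
        {
            if (doc.FileName == null) continue;
            buffer.Add(doc);
            if (buffer.Count == batchSize)
            {
                save(buffer);
                saveCalls++;
                // start a fresh list so the batch handed to `save` is never mutated
                buffer = new List<(int DocId, string FileName)>(batchSize);
            }
        }

        if (buffer.Count > 0)
        {
            save(buffer);
            saveCalls++;
        }
        return saveCalls;
    }
}
```

In the question's code, `save` could map each pair to an `IndexDocument`, pass them to `db.IndexDocuments.AddRange(...)` and then call `db.SaveChanges()` once per batch instead of once per row.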

score 2 · Accepted Answer

To keep DocIds stable in the index, you must set the dtsIndexKeepExistingDocIds flag on the IndexJob.

The dtSearch Text Retrieval Engine Programmer's Reference also explains when a DocId changes:

When a document is added to an index, it is assigned a DocId, and DocIds are always numbered sequentially.
When a document is reindexed, the old DocId is cancelled and a new DocId is assigned.
When an index is compressed, all DocIds in the index are renumbered to remove the cancelled DocIds unless the dtsIndexKeepExistingDocIds flag is set in IndexJob.
When an index is merged into another index, DocIds in the target index are never changed. The documents merged into the target index are all assigned new, sequentially numbered DocIds, unless (a) the dtsIndexKeepExistingDocIds flag is set in IndexJob and (b) the indexes have non-overlapping ranges of DocIds.
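To see why a DocId-to-file-name table can go stale, here is a toy in-memory model of the add, reindex and compress rules quoted above (merging is not modeled). It is not the dtSearch engine or its API; `ToyIndex`, `Add`, `Reindex` and `Compress` are hypothetical names, and renumbering from 1 in sorted order is an assumption of the toy.

```csharp
using System.Collections.Generic;

// Toy model of the quoted DocId rules; NOT the dtSearch API.
public class ToyIndex
{
    // DocId -> file name for live (non-cancelled) documents, kept in DocId order.
    private readonly SortedDictionary<int, string> live = new SortedDictionary<int, string>();
    private int lastDocId = 0;

    public IReadOnlyDictionary<int, string> Live => live;

    // A newly added document gets the next sequential DocId.
    public int Add(string fileName)
    {
        lastDocId++;
        live[lastDocId] = fileName;
        return lastDocId;
    }

    // Reindexing cancels the old DocId and assigns a new one.
    public int Reindex(int oldDocId)
    {
        string fileName = live[oldDocId];
        live.Remove(oldDocId);
        return Add(fileName);
    }

    // Compression renumbers live documents 1..n to remove cancelled DocIds,
    // unless keepExistingDocIds (standing in for dtsIndexKeepExistingDocIds) is set.
    public void Compress(bool keepExistingDocIds)
    {
        if (keepExistingDocIds) return;
        var names = new List<string>(live.Values);
        live.Clear();
        for (int i = 0; i < names.Count; i++)
            live[i + 1] = names[i];
        lastDocId = names.Count;
    }
}
```

For example, after adding a.txt (1) and b.txt (2) and reindexing a.txt (now 3), compressing without the flag moves b.txt to 1 and a.txt to 2. A SQL row that still says "3 = a.txt" then points at nothing, and after the next add it would point at a different file.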

score 1 · Accepted Answer

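So I used part of user2172986's response, but combined it with some additional code to solve my problem. I did have to set the dtsIndexKeepExistingDocIds flag in my index update routine. From there, I only wanted to add the newly created DocIds to my SQL database. To do this, I used the following code: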

// Namespaces used: System, System.Data, System.Data.SqlClient, System.Linq, dtSearch.Engine
string indexPath = @"d:\myindex";

using (IndexJob ij = new dtSearch.Engine.IndexJob())
{
    // make sure the updated index doesn't change existing DocIds
    ij.IndexingFlags = IndexingFlags.dtsIndexKeepExistingDocIds;
    ij.IndexPath = indexPath;
    ij.ActionAdd = true;
    ij.FoldersToIndex.Add(indexPath + "<+>");   // "<+>" = include subfolders
    ij.IncludeFilters.Add("*");
    bool jobDone = ij.Execute();
}

// create a DataTable to hold results
DataTable newIndexDoc = MakeTempIndexDocTable(); // custom method not included in this example; just creates a DataTable with DocId and FileName columns

// connect to the DB
MyDataBase db = new MyDataBase(); // again, custom code not included: an EntityFramework context

// get the last DocId in the DB (0 when the table is still empty,
// where FirstOrDefault().DocId would throw a NullReferenceException)
int lastDbDocId = db.IndexDocuments.Select(d => (int?)d.DocId).Max() ?? 0;

// get the last DocId in the index
IndexInfo indexInfo = dtSearch.Engine.IndexJob.GetIndexInfo(indexPath);
uint latestIndexDocId = indexInfo.LastDocId;

// create a search filter
dtSearch.Engine.SearchFilter sf = new SearchFilter();
int indexId = sf.AddIndex(indexPath);

// only select new records: from one greater than the last DocId in the DB
// to the last DocId in the index itself
sf.SelectItems(indexId, lastDbDocId + 1, (int)latestIndexDocId, true);

using (SearchJob sj = new dtSearch.Engine.SearchJob())
{
    sj.SetFilter(sf);
    // return every document in the selected range (xfirstword matches every document)
    sj.Request = "xfirstword";
    // specify the path to the index to search here
    sj.IndexesToSearch.Add(indexPath);

    // additional flags and limits redacted for clarity

    sj.Execute();

    // store the error message in the status
    // redacted for clarity

    SearchResults results = sj.Results;
    int endIdx = results.Count;
    if (endIdx == 0)
        return; // nothing new since the last run

    for (int i = 0; i < endIdx; i++)
    {
        results.GetNthDoc(i);

        IndexDocument id = new IndexDocument();
        id.DocId = results.CurrentItem.DocId;
        id.FileName = results.CurrentItem.Filename;

        if (id.FileName != null)
        {
            DataRow row = newIndexDoc.NewRow();
            row["DocId"] = id.DocId;
            row["FileName"] = id.FileName;
            newIndexDoc.Rows.Add(row);
        }
    }

    newIndexDoc.AcceptChanges();

    // SqlBulkCopy: one bulk insert instead of one INSERT per row
    using (SqlConnection connection =
           new SqlConnection(db.Database.Connection.ConnectionString))
    {
        connection.Open();

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.IndexDocument";
            // map by name so an extra column in the table (e.g. the foreign key)
            // doesn't shift the ordinal mapping
            bulkCopy.ColumnMappings.Add("DocId", "DocId");
            bulkCopy.ColumnMappings.Add("FileName", "FileName");

            try
            {
                // write from the source to the destination
                bulkCopy.WriteToServer(newIndexDoc);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }

    newIndexDoc.Clear();
    db.UpdateIndexDocument();
}
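The incremental part of this approach boils down to computing which DocIds are new. A minimal sketch of just that arithmetic, assuming DocIds are only ever appended (dtsIndexKeepExistingDocIds set) and that an empty table means "start from DocId 1". `DocIdRange.NewDocIdRange` is a hypothetical helper, not part of dtSearch.

```csharp
public static class DocIdRange
{
    // Returns the inclusive range of DocIds that exist in the index but are not
    // yet in the SQL table, or null when there is nothing new.
    // lastDbDocId:    highest DocId already stored (null when the table is empty).
    // lastIndexDocId: the index's last DocId (IndexInfo.LastDocId in the code above).
    public static (int First, int Last)? NewDocIdRange(int? lastDbDocId, uint lastIndexDocId)
    {
        long first = (lastDbDocId ?? 0) + 1L;   // long avoids overflow at int.MaxValue
        long last = lastIndexDocId;
        if (last > int.MaxValue)
            throw new System.OverflowException("LastDocId does not fit in an int");
        if (first > last) return null;           // DB is already up to date
        return ((int)first, (int)last);
    }
}
```

The result would feed `sf.SelectItems(indexId, range.First, range.Last, true)`. Note that, per the rules quoted in the first answer, a reindexed document gets a new DocId, so this step adds the new row but does not remove the old row whose DocId was cancelled; that cleanup would need a separate pass.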

score 1 · Accepted Answer

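To speed things up, you can search for the word "xfirstword", which retrieves every document in the index.

You can also look at the FAQ entry "How to retrieve all documents in an index".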

score 0 · Accepted Answer

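Here is my new solution, using the AddDoc method of the SearchResults interface:

First get StartingDocID and LastDocID from IndexInfo, then loop over that range, calling a function like this for each DocId: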

function GetFilename(paDocID: Integer): String;
var
  lCOMSearchResults:       ISearchResults;
  lSearchResults_Count:    Integer;
begin
  Result := '';  // returned when no document is found for paDocID
  if Assigned(prCOMServer) then
  begin
    lCOMSearchResults := prCOMServer.NewSearchResults as ISearchResults;
    lCOMSearchResults.AddDoc(GetIndexPath(prIndexContent), paDocID, 0);
    lSearchResults_Count := lCOMSearchResults.Count;

    if lSearchResults_Count = 1 then
    begin
      lCOMSearchResults.GetNthDoc(0);
      Result := lCOMSearchResults.DocDetailItem['_Filename'];
    end;
  end;
end;
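The loop this answer describes (StartingDocID to LastDocID, one lookup per DocId) can be sketched in C# with the lookup abstracted as a delegate, so the sketch does not depend on the COM or .NET dtSearch API. `DocIdScan.Collect` is a hypothetical name, and `getFilename` stands in for a function like `GetFilename` above; it is assumed to return an empty string or null when a DocId has no document.

```csharp
using System;
using System.Collections.Generic;

public static class DocIdScan
{
    // Visits every DocId from startingDocId to lastDocId (inclusive) and keeps
    // those for which getFilename returns a non-empty name. Gaps, such as
    // cancelled DocIds, simply produce no entry.
    public static List<(int DocId, string FileName)> Collect(
        int startingDocId, int lastDocId, Func<int, string> getFilename)
    {
        var found = new List<(int DocId, string FileName)>();
        // long loop variable so lastDocId == int.MaxValue cannot loop forever
        for (long d = startingDocId; d <= lastDocId; d++)
        {
            int docId = (int)d;
            string name = getFilename(docId);
            if (!string.IsNullOrEmpty(name))
                found.Add((docId, name));
        }
        return found;
    }
}
```

This still makes one lookup per DocId in the range, but collecting the pairs first means they can be written with a single bulk insert (such as the SqlBulkCopy step in an earlier answer) rather than one `SaveChanges()` per row.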
