python - QA query system over a corpus

Question

We have a corpus of question-answer pairs, as shown below:

Q: Why did Lincoln issue the Emancipation Proclamation? 
A: The goal was to weaken the rebellion, which was led and controlled by slave owners.

Q: Who is most noted for his contributions to the theory of molarity and molecular weight?  
A: Amedeo Avogadro

Q: When did he drop John from his name? 
A: upon graduating from college

Q: What do beetles eat? 
A: Some are generalists, eating both plants and animals. Other beetles are highly specialised in their diet.

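Treat the questions as queries and the answers as documents.
We have to build a system that, for a given query (semantically similar to one of the questions in the question corpus), retrieves the correct document (the corresponding answer in the answer corpus).
Can anyone suggest an algorithm or a good approach for building it?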

score 3 · Accepted Answer

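Your question is too broad, and the task you are attempting is challenging. However, I suggest you read about IR-based Factoid Question Answering. This document references many state-of-the-art techniques, and reading it should give you several ideas.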

Note that IR-based factoid QA and knowledge-based QA call for different approaches. First, decide which type of QA system you want to build.

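Finally, I believe a simple question-to-document matching technique will not be enough on its own. But you can try the simple Lucene-based approach that @Debasis suggested and see whether it works well for you.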

score 0 · Accepted Answer

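Treat a question and its answer (assuming there is only one) as a single document in Lucene. Lucene supports a fielded view of documents, so when you construct the document, make the question the searchable field. Once you have retrieved the top-ranked questions for a given query question, use the get method of the Document class to return their answers.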

Code skeleton (fill in the rest yourself):

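// Imports (Lucene 5+ API)
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Index
IndexWriterConfig iwcfg = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(...);
....
Document doc = new Document();
// The question is analyzed and indexed, so it is the searchable field.
doc.add(new TextField("FIELD_QUESTION", questionBody, Field.Store.YES));
// The answer is only stored; it is read back with Document.get(), never searched.
doc.add(new StoredField("FIELD_ANSWER", answerBody));
...
...
// Search
// IndexReader is abstract, so open it through DirectoryReader.
IndexReader reader = DirectoryReader.open(..);
IndexSearcher searcher = new IndexSearcher(reader);
...
...
QueryParser parser = new QueryParser("FIELD_QUESTION", new StandardAnalyzer());
Query q = parser.parse(queryQuestion);
...
...
TopDocs topDocs = searcher.search(q, 10); // top-10 retrieved
// Accumulate the answers from the retrieved questions which
// are similar to the query (new) question.
StringBuilder buff = new StringBuilder();
for (ScoreDoc sd : topDocs.scoreDocs) {
    Document retrievedDoc = reader.document(sd.doc);
    buff.append(retrievedDoc.get("FIELD_ANSWER")).append("\n");
}
System.out.println("Generated answer: " + buff.toString());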
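
Since the question is tagged python, here is a rough sketch of the same idea without Lucene: index only the questions, rank them against the incoming query, and return the answer stored with the best match. It swaps in plain TF-IDF weighting with cosine similarity, which is not Lucene's scoring, and the names QA_PAIRS, tokenize, build_index, cosine and answer are made up for illustration. Like the skeleton above it matches words rather than meaning, so it only works when the query shares vocabulary with a stored question.

```python
import math
import re
from collections import Counter

# Toy corpus copied from the question; a real system would load many more pairs.
QA_PAIRS = [
    ("Why did Lincoln issue the Emancipation Proclamation?",
     "The goal was to weaken the rebellion, which was led and controlled by slave owners."),
    ("Who is most noted for his contributions to the theory of molarity and molecular weight?",
     "Amedeo Avogadro"),
    ("When did he drop John from his name?",
     "upon graduating from college"),
    ("What do beetles eat?",
     "Some are generalists, eating both plants and animals. "
     "Other beetles are highly specialised in their diet."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pairs):
    """Build TF-IDF vectors over the questions only (the searchable field)."""
    docs = [Counter(tokenize(question)) for question, _ in pairs]
    n_docs = len(docs)
    # Document frequency: in how many questions each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumption of this sketch) so every weight stays positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(query, pairs, idf, vectors, top_k=1):
    """Return the answers attached to the top_k questions most similar to query."""
    counts = Counter(tokenize(query))
    # Terms never seen in the corpus carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    ranked = sorted(range(len(pairs)),
                    key=lambda i: cosine(q_vec, vectors[i]),
                    reverse=True)
    return [pairs[i][1] for i in ranked[:top_k]]


if __name__ == "__main__":
    idf, vectors = build_index(QA_PAIRS)
    print(answer("What food do beetles eat?", QA_PAIRS, idf, vectors)[0])
```

If lexical overlap turns out to be too weak, which is the concern raised in the other answer, the ranking step in answer is the only part that would need to change.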
