
We have a question-answer corpus like the one shown below:

Q: Why did Lincoln issue the Emancipation Proclamation? 
A: The goal was to weaken the rebellion, which was led and controlled by slave owners.

Q: Who is most noted for his contributions to the theory of molarity and molecular weight?  
A: Amedeo Avogadro

Q: When did he drop John from his name? 
A: upon graduating from college

Q: What do beetles eat? 
A: Some are generalists, eating both plants and animals. Other beetles are highly specialised in their diet.


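Treat the questions as queries and the answers as documents.
We have to build a system that, for a given query (semantically similar to one of the questions in the question corpus), retrieves the correct document (an answer from the answer corpus).
Can anyone suggest an algorithm or a good approach for building it?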


2 Answers


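Your question is too broad, and the task you are attempting is challenging. However, I suggest you read about IR-based Factoid Question Answering. That document references many state-of-the-art techniques, and reading it should give you several ideas.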

Note that IR-based factoid QA and knowledge-based QA call for different approaches. First decide which type of QA system you want to build.

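Finally, I believe a simple question-to-document matching technique will not be enough. But you can try the simple Lucene approach suggested by @Debasis and see whether it works well.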
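As a rough illustration of why plain term matching may fall short, here is a minimal sketch in plain Java (no Lucene) that scores two questions by the Jaccard overlap of their word sets. The class name `LexicalOverlapDemo`, the tokenizer, and the paraphrased query are assumptions made up for this example; only the beetle question comes from the corpus above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class LexicalOverlapDemo {

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity: size of the shared token set divided by
    // the size of the combined token set.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String indexed = "What do beetles eat?";
        // Near-verbatim query: shares 4 of 5 distinct words.
        System.out.println(jaccard(indexed, "What do beetles usually eat?"));
        // Hypothetical paraphrase with the same meaning but no shared words.
        System.out.println(jaccard(indexed, "Which foods make up a beetle's diet?"));
    }
}
```

The near-verbatim query scores 0.8 while the paraphrase scores 0.0, even though both ask the same thing. Stemming or synonym handling would help in this particular case, but the general gap between wording and meaning is what makes the task hard.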

answered 2017-02-26T19:11:54.703

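Treat a question and its answer (assuming there is only one) as a single document in Lucene. Lucene supports a fielded view of documents, so when constructing each document, make the question the searchable field. Once you have retrieved the top-ranked questions for a given query question, use the get method of the Document class to return their answers.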

Code skeleton (fill in the rest yourself):

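import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Index
IndexWriterConfig iwcfg = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(...);
....
Document doc = new Document();
// TextField is analyzed and indexed; it is the current equivalent of the
// deprecated Field(..., Field.Index.ANALYZED) constructor.
doc.add(new TextField("FIELD_QUESTION", questionBody, Field.Store.YES));
doc.add(new TextField("FIELD_ANSWER", answerBody, Field.Store.YES));
writer.addDocument(doc);
...
...
// Search (commit or close the writer first so the reader sees the documents)
// IndexReader is abstract, so open it through DirectoryReader.
IndexReader reader = DirectoryReader.open(..);
IndexSearcher searcher = new IndexSearcher(reader);
...
...
QueryParser parser = new QueryParser("FIELD_QUESTION", new StandardAnalyzer());
// Escape the raw question: characters such as '?' are query syntax otherwise.
Query q = parser.parse(QueryParser.escape(queryQuestion));
...
...
TopDocs topDocs = searcher.search(q, 10); // top-10 retrieved
// Accumulate the answers from the retrieved questions which
// are similar to the query (new) question.
StringBuilder buff = new StringBuilder();
for (ScoreDoc sd : topDocs.scoreDocs) {
    Document retrievedDoc = reader.document(sd.doc);
    buff.append(retrievedDoc.get("FIELD_ANSWER")).append("\n");
}
System.out.println("Generated answer: " + buff.toString());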
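
If you want to try the idea before setting up Lucene, the same flow (index only the question field, keep the answer as a stored value, return the answer of the best-matching question) can be sketched in plain Java. This is a toy stand-in, not Lucene's actual ranking: the class `TinyQaIndex`, its methods, and the simple IDF-weighted term overlap score are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class TinyQaIndex {
    // Stored field: the answer of each document, addressed by document id.
    private final List<String> answers = new ArrayList<>();
    // Inverted index over the question field: term -> ids of documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Index the question terms; the answer is stored but never searched.
    void add(String question, String answer) {
        int id = answers.size();
        answers.add(answer);
        for (String term : tokens(question)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    // Returns the stored answer of the best-matching question,
    // or null if no query term occurs in any indexed question.
    String bestAnswer(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokens(query)) {
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                continue;
            }
            // Terms that occur in fewer questions get a larger weight.
            double idf = Math.log(1.0 + (double) answers.size() / docs.size());
            for (int id : docs) {
                scores.merge(id, idf, Double::sum);
            }
        }
        int best = -1;
        double bestScore = 0.0;
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best < 0 ? null : answers.get(best);
    }

    public static void main(String[] args) {
        TinyQaIndex index = new TinyQaIndex();
        index.add("What do beetles eat?",
                "Some are generalists, eating both plants and animals. "
                        + "Other beetles are highly specialised in their diet.");
        index.add("Who is most noted for his contributions to the theory of molarity and molecular weight?",
                "Amedeo Avogadro");
        index.add("Why did Lincoln issue the Emancipation Proclamation?",
                "The goal was to weaken the rebellion, which was led and controlled by slave owners.");
        // A new question that shares words with the first indexed question.
        System.out.println(index.bestAnswer("What food do beetles eat?"));
    }
}
```

Like the Lucene skeleton, this only works when the new question shares words with an indexed one; a query with no overlapping terms returns null here.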
answered 2017-02-25T22:28:42.563