python - Extracting sentences using Python

Question

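I want to extract the exact sentence if a specific word is present in that sentence. Can anyone tell me how to do this in Python? I used concordance(), but it only prints the lines where the word matches.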

score 4 · Accepted Answer

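A quick note: sentence splitting is actually a fairly complicated problem. The period rule has exceptions, such as "Mr." or "Dr.", and there are several kinds of sentence-ending punctuation. But the exceptions have exceptions too (for example, "Dr." can end a sentence if the next word is capitalized and is not a proper noun).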

If you're interested in this (it's a natural language processing topic), you could look at the punkt module of the Natural Language Toolkit (nltk).
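To make the abbreviation problem concrete, here is a minimal sketch using only the standard library. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace, and it skips boundaries that follow a small, hypothetical `ABBREVIATIONS` set; `split_sentences` is an illustrative helper, not part of any library. It deliberately ignores the "exception to the exception" case, so a sentence that really ends in "Dr." gets merged with the next one, which is exactly why a trained model such as punkt is the better tool.

```python
import re

# Hypothetical abbreviation list: a period after these words does NOT end a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text at ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    # Candidate boundaries: a run of end punctuation followed by whitespace.
    for m in re.finditer(r"[.!?]+(?=\s)", text):
        # The last word before this boundary, e.g. "Dr."
        words = text[start:m.end()].split()
        if words and words[-1].lower() in ABBREVIATIONS:
            continue  # abbreviation, not a sentence end
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    # Whatever remains, including a final sentence that ends at the end of the text.
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


print(split_sentences("Dr. Smith is here. He saw Mr. Jones! Is it good?"))
# prints: ['Dr. Smith is here.', 'He saw Mr. Jones!', 'Is it good?']
```

To answer the original question, filter the result, e.g. `[s for s in split_sentences(text) if "good" in s]`. With nltk installed, `nltk.sent_tokenize(text)` can take the place of `split_sentences`, once the punkt model data has been downloaded with `nltk.download`.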

score 1 · Accepted Answer

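If you have each sentence in its own string, you can call find() with your word on each one and return the sentence if it is found. Otherwise you can use a regular expression, something like this: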

import re

pattern = r"\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, yourwholetext)
if match is not None:
    sentence = match.group("sentence")

I haven't tested this, but something along those lines.

My test:

import re
text = "muffins are good, cookies are bad. sauce is awesome, veggies too. fmooo mfasss, fdssaaaa."
pattern = r"\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, text)
if match is not None:
    print(match.group("sentence"))  # prints: muffins are good, cookies are bad
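The find() route mentioned at the start of this answer can be sketched as follows, assuming the text has already been split into one string per sentence. Here the split is a naive `text.split(". ")`, and `sentences_containing` is a hypothetical helper name. Unlike the regex above, it cannot pull an earlier sentence into the result: because `.*?` also matches periods, a search starting at position 0 on "cookies are bad. muffins are good." captures both sentences.

```python
def sentences_containing(sentences, word):
    """Return every sentence (one string per sentence) that contains `word`."""
    # `word in s` is the idiomatic equivalent of `s.find(word) != -1`.
    return [s for s in sentences if word in s]


text = "muffins are good, cookies are bad. sauce is awesome, veggies too. fmooo mfasss, fdssaaaa."
# Naive split: assumes every ". " marks the end of a sentence.
sentences = text.split(". ")
print(sentences_containing(sentences, "good"))
# prints: ['muffins are good, cookies are bad']
```

Note that a plain substring test also matches inside longer words ("good" is found in "goodbye").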

score 0 · Accepted Answer

dutt answered this well. I just wanted to add a few things:

import re

text = "go directly to jail. do not cross go. do not collect $200."
pattern = r"\.(?P<sentence>.*?(go).*?)\."
match = re.search(pattern, text)
if match is not None:
    # sentence == " do not cross go": the first sentence is skipped because no period precedes it.
    sentence = match.group("sentence")

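Obviously, you need to import the regular expression library (import re) before you start. Here is a breakdown of what the regular expression actually does (more information can be found on the Python re library page):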

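\. # looks for a period preceding the sentence.
(?P<sentence>...) # stores the captured text in the named group "sentence".
.*? # selects all text (non-greedy) up to the word "go".
(go) # the word being searched for (it also matches inside words such as "good").
.*?\. # selects the rest of the sentence (non-greedy) up to the next period.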

Again, the re library reference page is the key resource.
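The patterns in these answers have three quirks: the mandatory leading `\.` means a sentence at the very start of the text can never match, `.*?` can stretch across earlier periods, and `(go)` also matches inside words like "good". Here is a minimal sketch that works around all three, assuming `.`, `!` and `?` always end a sentence (so abbreviations like "Dr." will still split incorrectly); `find_sentences` is a hypothetical helper name.

```python
import re


def find_sentences(text, word):
    """Return every sentence in `text` that contains `word` as a whole word."""
    # [^.!?]* cannot cross a sentence-ending mark, so a match never spans two sentences.
    # \b on both sides rejects partial matches such as "go" inside "good".
    pattern = r"[^.!?]*\b" + re.escape(word) + r"\b[^.!?]*[.!?]?"
    # findall returns the whole match because the pattern has no capturing groups.
    return [m.strip() for m in re.findall(pattern, text)]


text = "go directly to jail. do not cross go. do not collect $200."
print(find_sentences(text, "go"))
# prints: ['go directly to jail.', 'do not cross go.']
```

`re.findall` returns every matching sentence rather than only the first, and `re.escape` keeps a search word containing regex metacharacters from being interpreted as part of the pattern.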
