8

Please take a look at the following.

String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");

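This is how I tried to split a paragraph into sentences. But there is a problem. My paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2. They all got split by the code above. So basically, this code splits on a lot of "dots", whether or not they are actually full stops.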
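To make the problem concrete, here is a minimal sketch that runs the same split pattern on two short samples. The class name QuestionRegexDemo, the method name splitLikeQuestion and the sample strings are made up for illustration; only the regex comes from the code above.

```java
import java.util.Arrays;

class QuestionRegexDemo {
    // Same pattern as in the question: a newline, a dot not followed by a digit,
    // or a dot not preceded by a digit.
    static String[] splitLikeQuestion(String text) {
        return text.split("\n|\\.(?!\\d)|(?<!\\d)\\.");
    }

    public static void main(String[] args) {
        // The dot after "Jan" is followed by a space, so it counts as a split point.
        System.out.println(Arrays.toString(splitLikeQuestion("Jan. 13, 2014")));
        // The dot in "U.S" is followed by a letter, so it is a split point too.
        System.out.println(Arrays.toString(splitLikeQuestion("U.S and")));
    }
}
```

Both samples come back in two pieces ("Jan" / " 13, 2014" and "U" / "S and"), which is the unwanted behaviour described above.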

I also tried String[] sentenceHolder = titleAndBodyContainer.split(".\n"); and String[] sentenceHolder = titleAndBodyContainer.split("\\.");. Both failed.

How do I split a paragraph into sentences "properly"?


3 Answers

19

You can try this:

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";

// Needs java.util.regex.Pattern and java.util.regex.Matcher.
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
while (reMatcher.find()) {
    System.out.println(reMatcher.group());
}

Output:

This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.
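
If the sentences are needed as a list rather than printed, the matching loop can be wrapped in a small helper. This is only a sketch: the class name SentenceMatcher and the method name sentences are hypothetical, and Pattern.COMMENTS is left out here on the assumption that it is not needed, since the pattern contains no whitespace or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceMatcher {
    // Pattern copied from the answer: a sentence runs until a . ! or ?
    // that is followed by whitespace or the end of the input.
    private static final Pattern SENTENCE = Pattern.compile(
            "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
            Pattern.MULTILINE);

    // Collects every match of the sentence pattern, in order.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sentences("Hello there. It costs 2.2 dollars.")) {
            System.out.println(s);
        }
    }
}
```

One caveat: the pattern treats any dot followed by whitespace as the end of a sentence. The sample above uses "Jan.13" without a space; with "Jan. 13, 2014" as written in the question, the text would still break after "Jan.".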
answered 2014-01-29T12:12:42.337
1
String[] sentenceHolder = titleAndBodyContainer.split("(?i)(?<=[.?!])\\s+(?=[a-z])");

Try this; it works for me.
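
A small sketch of this approach, assuming the intent is to split on whitespace (\\s+) that comes right after . ? or ! and right before a letter. The class name WhitespaceSplitDemo, the method name splitOnSentenceGaps and the sample text are made up for illustration.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Assumed variant: the split point is the whitespace itself, so the
    // punctuation stays attached to the sentence in front of it.
    static String[] splitOnSentenceGaps(String text) {
        return text.split("(?i)(?<=[.?!])\\s+(?=[a-z])");
    }

    public static void main(String[] args) {
        String text = "There is a problem. The date is Jan. 13, 2014. They split.";
        // "Jan. 13" stays together because the whitespace is followed by a digit.
        System.out.println(Arrays.toString(splitOnSentenceGaps(text)));
    }
}
```

Abbreviations followed by a word, such as "Mr. Smith", would still be split by this pattern, so it is a heuristic rather than a complete solution.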

answered 2014-01-29T12:10:38.367
0

This splits the paragraph on . ? !

String[] a = str.split("\\.|\\?|\\!");

You can put any symbol you want to split on after the \\ and use | to separate each condition.
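
As a quick check of what this split produces, here is a minimal sketch. The class name PunctuationSplitDemo, the method name splitOnPunctuation and the sample strings are made up for illustration.

```java
import java.util.Arrays;

class PunctuationSplitDemo {
    // Splits on every literal '.', '?' or '!'; the delimiter is dropped from the result.
    static String[] splitOnPunctuation(String text) {
        return text.split("\\.|\\?|\\!");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPunctuation("Hi. Ok? Yes!")));
        // The dot inside a number is a split point as well.
        System.out.println(Arrays.toString(splitOnPunctuation("It costs 2.2 dollars.")));
    }
}
```

Two things to keep in mind: the punctuation is removed from the pieces and leading spaces are kept, and every dot counts, so "2.2" ends up as "2" and "2".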

answered 2017-03-06T10:57:43.647