I'm very new to Python.
I want to change a sentence if it contains repeated words.
Correct:
- Ex. "this just so so so nice" -> "this just so nice"
- Ex. "this is just is is" -> "this is just is"
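Right now I'm using this regex, but it also changes letters inside words. Ex. "My friend and i is happy" -> "My friend and is happy" (it removes the "i" and the space), which is wrong.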
text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row
How can I make the same change, but have it check whole words instead of letters?
A non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row
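\b matches the empty string, but only at the beginning or end of a word.
\b: matches a word boundary
\w: any word character
\1: backreference to the first capture group; inside the pattern it matches the same word again, and in the replacement it substitutes the whole match with that single word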
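To see what the word boundaries change, the two patterns from this thread can be run side by side. This is a minimal comparison on a made-up sample string, not a full test of either pattern.

```python
import re

text = "so so so happy"  # sample input chosen for illustration

# Without \b and without a separator, the backreference matches inside a
# word: the "pp" in "happy" is seen as "p" followed by "p".
loose = re.sub(r'(\w+)\1', r'\1', text)

# With \b on both sides, only whole words repeated after a space match.
strict = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

print(loose)   # so so so hapy
print(strict)  # so happy
```

The loose pattern both damages "happy" and leaves "so so so" alone, because it has no way to skip the space between the repeated words.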
import re
def Remove_Duplicates(Test_string):
    # A word, followed one or more times by a non-word character and the same word
    Pattern = r"\b(\w+)(?:\W\1\b)+"
    return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)
Test_string1 = "Good bye bye world world"
Test_string2 = "Ram went went to to his home"
Test_string3 = "Hello hello world world"
print(Remove_Duplicates(Test_string1))
print(Remove_Duplicates(Test_string2))
print(Remove_Duplicates(Test_string3))
Result:
Good bye world
Ram went to his home
Hello world
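Note that \W matches exactly one non-word character, so repeats separated by several spaces or a tab are not collapsed by the pattern above. If that matters, one possible variant uses \s+ instead. This is a sketch, the function name remove_duplicates_ws is hypothetical, and it assumes that any run of whitespace should count as a separator.

```python
import re

def remove_duplicates_ws(text):
    # \s+ allows any run of whitespace (spaces, tabs, newlines) between repeats.
    pattern = r"\b(\w+)(?:\s+\1\b)+"
    # IGNORECASE makes the backreference case-insensitive; the first spelling is kept.
    return re.sub(pattern, r"\1", text, flags=re.IGNORECASE)

print(remove_duplicates_ws("Good bye   bye world\tworld"))  # Good bye world
print(remove_duplicates_ws("Hello hello world"))            # Hello world
```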