regex - Match all occurrences of a regular expression

Question

This is as far as I have gotten:

instance_name(.+)(?=instance_name)

My test string:

instance_name DEDUP

iops,other,1

instance_name USERSPACE_APPS

iops,read,158534981

iops,write,168514545

iops,other,1557566878

total_latency,read,38774076988

total_latency,write,36596756500

total_latency,other,96023066014

time

It only matches:

DEDUP

iops,other,1

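I know there is no instance_name at the end. I want to match all the data after each instance_name, up to the next instance_name. The last occurrence has no instance_name after it, and I want that block as well.

I am using Python. Any suggestions?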

Edit

Expected output:

Match1:

DEDUP

    iops,other,1

Match2: 
USERSPACE_APPS

    iops,read,158534981

    iops,write,168514545

    iops,other,1557566878

    total_latency,read,38774076988

    total_latency,write,36596756500

    total_latency,other,96023066014

score 0 · Accepted Answer

The answer Jan gave did not work for me, and I don't think it would handle any further 'instance_name' values added to the text, but this does:

(?:(?<=instance_name\s)(?<value>(?:.|\s)*?)(?=instance_name\s|$))*

Explanation (from the inside out):

(?<value>(?:.|\s)*?)

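This is the group you want to capture. The ?<value> part can be removed; I only added it so that I can refer to the group in this explanation. Note that Python's re module writes named groups as (?P<value>...), so in Python the name has to be rewritten that way or dropped.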

(?<=instance_name\s)

Match only if the group 'value' is preceded by the literal string 'instance_name' followed by one whitespace character (space, tab, or newline).

(?=instance_name\s|$)

Match only if the group 'value' is followed by the literal string 'instance_name' and a whitespace character, or by the end of the string.

(?: <all regex from above> )*

Wrap the match conditions above in a new non-capturing group, since we may want to repeat the search for multiple 'instance_name' entries in the string.
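
For Python specifically, the pattern above needs two small adjustments, so here is a minimal sketch rather than a drop-in copy. It assumes the named group is spelled (?P<value>...) and leaves out the outer (?: ... )* wrapper, relying on re.finditer to repeat the search instead. The sample text is a shortened stand-in for the question's test string.

```python
import re

# Shortened sample in the same shape as the question's test string.
text = (
    "instance_name DEDUP\n"
    "iops,other,1\n"
    "instance_name USERSPACE_APPS\n"
    "iops,read,158534981\n"
    "time"
)

# (?P<value>...) is Python's named-group syntax. The outer (?: ... )* is
# omitted here because finditer already repeats the search over the string.
pattern = re.compile(
    r"(?<=instance_name\s)(?P<value>(?:.|\s)*?)(?=instance_name\s|$)"
)

# Collect each block and trim the surrounding whitespace.
values = [m.group("value").strip() for m in pattern.finditer(text)]
print(values)
```

The lookbehind works in Python because instance_name\s has a fixed width. Note that the trailing "time" line ends up inside the last block, since nothing in the pattern excludes it.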

Hope this helps :)

score 0 · Accepted Answer

Change it to

instance_name(.+?)(?=instance_name|\Z)

This adds an alternation (|), where \Z means the very end of the string (posted from a mobile device, hence a bit short).
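
As a rough illustration of how this pattern could be used from Python, the sketch below runs it through re.findall. It assumes the re.DOTALL flag is set so that . also matches newlines; the answer does not state this, but without it the lazy group cannot span several lines. The sample text is a shortened stand-in for the question's data.

```python
import re

# Shortened sample in the same shape as the question's test string.
text = (
    "instance_name DEDUP\n"
    "iops,other,1\n"
    "instance_name USERSPACE_APPS\n"
    "iops,read,158534981\n"
    "iops,write,168514545\n"
    "time"
)

# Lazy group up to the next instance_name, or up to the end of the string (\Z).
# re.DOTALL is an assumption made here so that "." can cross line breaks.
pattern = re.compile(r"instance_name(.+?)(?=instance_name|\Z)", re.DOTALL)

# With a single capturing group, findall returns the group contents directly.
blocks = [block.strip() for block in pattern.findall(text)]
print(blocks)
```

The lazy quantifier .+? is what keeps each match from running on to the last instance_name, and the \Z alternative is what lets the final block match even though no instance_name follows it.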

score 0 · Accepted Answer

I think what you really want to do here is just split your string:

>>> s = '''instance_name DEDUP
... 
... iops,other,1
... 
... instance_name USERSPACE_APPS
... 
... iops,read,158534981
... 
... iops,write,168514545
... 
... iops,other,1557566878
... 
... total_latency,read,38774076988
... 
... total_latency,write,36596756500
... 
... total_latency,other,96023066014
... 
... time'''
>>> s.split('instance_name')
['',
 ' DEDUP\n\niops,other,1\n\n',
 ' USERSPACE_APPS\n\niops,read,158534981\n\niops,write,168514545\n\niops,other,1557566878\n\ntotal_latency,read,38774076988\n\ntotal_latency,write,36596756500\n\ntotal_latency,other,96023066014\n\ntime']

If you want to remove the empty string and the surrounding whitespace:

>>> list(filter(bool, (chunk.strip() for chunk in s.split('instance_name'))))
['DEDUP\n\niops,other,1',
 'USERSPACE_APPS\n\niops,read,158534981\n\niops,write,168514545\n\niops,other,1557566878\n\ntotal_latency,read,38774076988\n\ntotal_latency,write,36596756500\n\ntotal_latency,other,96023066014\n\ntime']

If instance_name is not a fixed string in your particular case but a pattern, you can use re.split().
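
A minimal sketch of the re.split() variant follows. The separator pattern is hypothetical and only chosen for illustration: it assumes the marker always starts a line and may be followed by any amount of whitespace (a space in one place, a tab in another).

```python
import re

# Sample where the marker is followed by a space in one place and a tab in another.
s = (
    "instance_name DEDUP\n"
    "iops,other,1\n"
    "instance_name\tUSERSPACE_APPS\n"
    "iops,read,158534981\n"
)

# Hypothetical separator: the marker at the start of a line plus trailing whitespace.
separator = re.compile(r"^instance_name\s+", re.MULTILINE)

# Split on the separator, then drop empty pieces and surrounding whitespace.
chunks = [chunk.strip() for chunk in separator.split(s) if chunk.strip()]
print(chunks)
```

Anchoring the marker with ^ under re.MULTILINE avoids splitting on an instance_name that happens to appear in the middle of a data line; drop the anchor if that is not a concern for your input.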
