python - Deleting 2 consecutive lines

Question

I have data in the following format:

#@ <id_wxyz_1>
A line written after this.

#@ <id_123>
A line written after this one also.

#@ <id_wxyz_2>
One more line.

#@ <id_yex_9>
Another line.

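Now I want to delete 2 lines at a time: every `#@ <...>` line that contains "wxyz", together with the line that follows it. The output I want for this example is: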

#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

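Is there a Linux command that can do this, or is there an efficient way to do it in Python? I know I can selectively delete a single line with grep, sed, etc., but is it possible to selectively delete 2 consecutive lines with Linux commands?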

EDIT: The answers given are very good, but they do not work for input of the following form:

#@ <id_wxyz_1>
A line written after this.

#@ <id_wxyz_2>
A line written after this.

#@ <id_wxyz_3>
A line written after this.

#@ <id_wxyz_4>
A line written after this.

#@ <id_wxyzadded5>
A line written after this.

For the input above, I should get no output lines at all.

EDIT again: another set of input I have is:

#@ <id_wxyz0>
Line 1.
#@ <id_wxyz1>
line 2.
#@ <id_wxyz2> 
line 3.
#@ <id_wxyz3> 
line 4.
#@ <id_6>
line 5.

The output should be

#@ <id_6>
line 5.

score 4 · Accepted Answer

You can do this with sed, for example:

/^#@ <.*wxyz.*>/ {
   N        # add the next line to the pattern space
   s/.*//   # clear the pattern space
   N        # read another line
   /^\n$/ d # if that line was blank, delete and start the next cycle (reading again)
   D        # otherwise, delete up to the newline and start the next cycle with the rest
}

Note: for the second case, it still actually outputs one blank line.
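Since the question also asks about Python, here is a minimal sketch of the same idea in Python: drop each "wxyz" header, the line after it, and one blank separator line if there is one. The names `MARKER` and `drop_marked` are made up for this illustration, and the input is assumed to be a list of lines with the newlines already stripped.

```python
import re

# Hypothetical helper for illustration; same header pattern as the sed script.
MARKER = re.compile(r"^#@ <.*wxyz.*>")

def drop_marked(lines):
    """Remove every 'wxyz' header line together with the line after it.

    One blank separator line directly after a removed pair is dropped as well.
    """
    kept = []
    i = 0
    while i < len(lines):
        if MARKER.match(lines[i]):
            i += 2  # skip the header and the line that follows it
            if i < len(lines) and lines[i].strip() == "":
                i += 1  # also skip a single blank separator line
        else:
            kept.append(lines[i])
            i += 1
    return kept

sample = [
    "#@ <id_wxyz_1>", "A line written after this.", "",
    "#@ <id_123>", "A line written after this one also.",
]
print("\n".join(drop_marked(sample)))
```

Because the loop re-examines the line after a removed pair instead of skipping it blindly, back-to-back "wxyz" blocks with no blank line between them are handled too.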

score 2 · Accepted Answer

You can also simply use grep.

Example: given your input

$ cat t
#@ <id_wxyz_1>
A line written after this.

#@ <id_123>
A line written after this one also.

#@ <id_wxyz_2>
One more line.

#@ <id_yex_9>
Another line.

#@ <id_wxyz_1>
A line written after this.

#@ <id_wxyz_2>
A line written after this.

#@ <id_wxyz_3>
A line written after this.

#@ <id_wxyz_4>
A line written after this.

#@ <id_wxyzadded5>
A line written after this.

#@ <id_wxyz0>
Line 1.
#@ <id_wxyz1>
line 2.
#@ <id_wxyz2> 
line 3.
#@ <id_wxyz3> 
line 4.
#@ <id_6>
line 5.

you can run

$ grep -A1  --group-separator=""  -P '#[^_]*((?!wxyz).)*$' t
#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

#@ <id_6>
line 5.

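The regular expression uses Perl-like syntax (hence the `-P` argument) to match lines that start with `#` and do not contain `wxyz`. `-A1` adds the one line after each match to the output. The undocumented option `--group-separator=""` replaces the default `--` that normally separates groups of lines when the `-A` (or `-B` or `-C`) option is used. Note that the latter option is not available in all implementations.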
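For comparison, the "keep the good headers plus one line of context" approach can be approximated in Python. This is only a sketch: `keep_unmarked` and its `needle` parameter are hypothetical names, the header test is a plain substring check rather than the Perl regex, and unlike grep it always puts a blank line between groups, even when two groups are adjacent in the input.

```python
def keep_unmarked(lines, needle="wxyz"):
    """Collect each '#' header that lacks `needle`, plus the line after it."""
    groups = []
    for i, line in enumerate(lines):
        if line.startswith("#") and needle not in line:
            # The header and one line of trailing context, like grep -A1.
            groups.append(lines[i:i + 2])
    return groups

def format_groups(groups):
    """Join the groups with an empty line between them."""
    return "\n\n".join("\n".join(group) for group in groups)

sample = [
    "#@ <id_wxyz_1>", "A line written after this.", "",
    "#@ <id_123>", "A line written after this one also.", "",
    "#@ <id_6>", "line 5.",
]
print(format_groups(keep_unmarked(sample)))
```

Selecting what to keep, rather than what to delete, means the blank lines around removed blocks never have to be dealt with separately.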

score 1 · Accepted Answer

Using awk, you could say:

awk '/^#@ <.*wxyz.*>/{getline;getline}1' filename

EDIT: Based on your revised question, you could say:

sed '/^#@ <id_wxyz.*/,/^$/d' filename

score 1 · Accepted Answer

You can also use awk. When a line matches, it calls getline twice to consume the following two lines and then uses next to avoid printing them.

awk '/^#@[[:blank:]]+<.*wxyz.*>/ { getline; getline; next } { print }' infile

It yields:

#@ <id_123>
A line written after this one also.

#@ <id_yex_9>
Another line.

UPDATE with a solution for the OP's new edit:

awk  '
    BEGIN { RS = "#@" } 
    $1 ~ /[^[:space:]]/ && $1 !~ /<.*wxyz.*>/ { 
        sub(/\n[[:blank:]]*$/, "")
        print RS, $0 
    }
' infile

For your last example, it yields:

#@  <id_6>
line 5.
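The record-based idea (treat `#@` as the record separator and keep only records whose header lacks "wxyz") might look roughly like this in Python. It is a sketch under assumptions: `filter_records` is a hypothetical name, the whole input is assumed to fit in memory as one string, and `#@` is assumed never to appear inside a body line.

```python
def filter_records(text, marker="#@", needle="wxyz"):
    """Split `text` into records at `marker` and drop those whose first line has `needle`."""
    records = []
    for chunk in text.split(marker):
        if not chunk.strip():
            continue  # the empty piece before the first marker
        header = chunk.split("\n", 1)[0]
        if needle in header:
            continue  # unwanted record: skip header and body together
        # Put the marker back and trim trailing blank lines and whitespace.
        records.append(marker + chunk.rstrip())
    return records

text = "#@ <id_wxyz0>\nLine 1.\n#@ <id_wxyz1>\nline 2.\n#@ <id_6>\nline 5.\n"
print("\n\n".join(filter_records(text)))
```

Because each record carries its own body, it does not matter whether blocks are separated by blank lines or follow each other directly.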
