python - How to find all comments with Beautiful Soup

Question

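This question was asked four years ago, but the answers are now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 treats each comment as a special type of navigable string, I thought this code would work: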

for comments in soup.find_all('comment'):
     comments.decompose()

It doesn't work. How do I find all comments using BS4?

score 25 · Accepted Answer

You can pass a function to find_all() so that it checks whether each string is a Comment.

For example, I have the following HTML:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
   <!-- test comment here -->
   <div class="block_content">
      <a href="https://www.google.com">Google</a>
   </div>
</body>

Code:

from bs4 import BeautifulSoup as BS
from bs4 import Comment
....
soup = BS(html, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()

The output will be:

Branding and main navigation 
============
test comment here
============

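By the way, I think the reason find_all('Comment') doesn't work is (from the BeautifulSoup documentation):

Pass in a value for name and you'll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don't match.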
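
To make that difference concrete, here is a minimal sketch. The HTML snippet, including the literal <comment> tag, is made up for illustration and is not taken from the question. It assumes bs4 4.4.0 or later, where the string argument is available.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical input: one HTML comment and one element literally named <comment>.
html = "<body><!-- note --><p>Just a brand</p><comment>a tag</comment></body>"
soup = BeautifulSoup(html, "html.parser")

# A string passed as `name` is compared against tag names only, so this
# returns the <comment> element and never the <!-- note --> node.
by_name = soup.find_all("comment")

# A function passed as `string` is called on text nodes, and Comment
# objects are text nodes, so this returns the actual comment.
by_type = soup.find_all(string=lambda text: isinstance(text, Comment))

print(by_name)
print(by_type)
```

So a name filter can only ever return tags, while the isinstance check picks out the comment nodes.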

score 11 · Accepted Answer

There were two things I needed to do:

First, when importing Beautiful Soup:

from bs4 import BeautifulSoup, Comment

Second, here is the code to extract the comments:

for comments in soup.findAll(text=lambda text:isinstance(text, Comment)):
    comments.extract()
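
Putting the two steps together, a self-contained version might look like the sketch below. The helper name strip_comments and the sample HTML are assumptions for illustration. It uses find_all and string=, which are the current spellings of findAll and text= in BS4.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str) -> str:
    """Return the markup with every HTML comment removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so removing nodes while looping over it is safe.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()  # detach the comment node from the tree
    return str(soup)


sample = '<div><!-- test comment here --><a href="https://www.google.com">Google</a></div>'
cleaned = strip_comments(sample)
print(cleaned)
```

The comment is gone from the serialized result while the surrounding tags are left untouched.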
