
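I have been using the Python Reddit API Wrapper (PRAW) to collect specific comments from Reddit, and one of the functions I use most often is replace_more_comments(), which collects all of a thread's comments.

Some of these threads are very large (10,000 comments, for example), and it takes a while to collect all of the comments. Is there a way to show a progress bar for replace_more_comments()?

Here is a minimal working code example: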

import praw
r = praw.Reddit('MSU vs Nebraska game')
submission = r.get_submission(submission_id='3rxx3y')
flat_comments = praw.helpers.flatten_tree(submission.comments)
submission.replace_more_comments(limit=None, threshold=0)
all_comments = submission.comments
flat_comments = praw.helpers.flatten_tree(submission.comments)

1 Answer


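The built-in implementation of replace_more_comments does not support this, but you can write your own version, using the original implementation as a reference.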

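I don't know how to draw an actual progress bar; you will have to write update_progress_bar yourself. I also haven't tested this code, so it may not work at all.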

from heapq import heappop, heappush


# Standalone function: pass the submission in as `post`.
def replace_more_comments(post):
    """Update the comment tree by replacing instances of MoreComments."""
    if post._replaced_more:
        return

    more_comments = post._extract_more_comments(post.comments)

    # Estimate the total number of comments
    count = 0
    for item in more_comments:
        count += item.count

    update_progress_bar(0, count)

    num_loaded = 0

    while more_comments:
        item = heappop(more_comments)

        # Fetch new comments; skip this item if nothing was returned
        new_comments = item.comments(update=False)
        if new_comments is None:
            continue

        # Re-add new MoreComment objects to the heap of more_comments
        for more in post._extract_more_comments(new_comments):
            more._update_submission(post)  # pylint: disable=W0212
            heappush(more_comments, more)
        # Increase progress bar
        num_loaded += len(new_comments)
        update_progress_bar(num_loaded, count)
        # Insert the new comments into the tree
        for comment in new_comments:
            post._insert_comment(comment)

    post._replaced_more = True
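
The code above calls update_progress_bar without defining it. As a rough sketch, assuming a plain-text bar redrawn on a single terminal line is good enough, it could look something like this. The helper format_progress_bar and the width and stream parameters are names made up here for illustration; none of them come from PRAW. Since count is only an estimate, the bar is clamped so it never runs past 100%.

```python
import sys


def format_progress_bar(num_loaded, total, width=40):
    """Return a one-line text bar such as '[####----] 50% (5/10)'."""
    # The total is only an estimate, so clamp the fraction to [0, 1].
    # A total of zero means there is nothing to load, shown as complete.
    if total <= 0:
        fraction = 1.0
    else:
        fraction = min(max(num_loaded / float(total), 0.0), 1.0)
    filled = int(round(fraction * width))
    bar = "#" * filled + "-" * (width - filled)
    return "[{0}] {1}% ({2}/{3})".format(
        bar, int(fraction * 100), num_loaded, total)


def update_progress_bar(num_loaded, total, stream=sys.stdout):
    """Redraw the bar in place on the current terminal line."""
    # '\r' moves the cursor back to the start of the line, so each
    # call overwrites the bar drawn by the previous call.
    stream.write("\r" + format_progress_bar(num_loaded, total))
    stream.flush()
```

Keeping the formatting separate from the writing makes the bar easy to check without a terminal, and the stream argument lets you send it to stderr if stdout is being piped to a file.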
answered 2015-11-24T18:09:36.567