python - Simple Python script using too much CPU

Question

My VPS provider recently told me that my Python script was using too much CPU (apparently the script had been using an entire core for several hours).

My script uses the twython library to stream tweets:

def on_success(self, data):

    if 'text' in data:
        self.counter += 1
        self.tweetDatabase.save(Tweet(data))

        #we only want to commit when we have a batch
        if self.counter >= 1000:
            print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
            self.counter = 0
            self.tweetDatabase.commit()

Tweet is a class whose job is to discard the tweet metadata I don't need:

class Tweet():

    def __init__(self, json):

        self.user = {"id" : json.get('user').get('id_str'), "name" : json.get('user').get('name')}
        self.timeStamp = datetime.datetime.strptime(json.get('created_at'), '%a %b %d %H:%M:%S %z %Y')
        self.coordinates  = json.get('coordinates')
        self.tweet = {
                        "id" : json.get('id_str'),
                        "text" : json.get('text').split('#')[0],
                        "entities" : json.get('entities'),
                        "place" :  json.get('place')
                     }

        self.favourite = json.get('favorite_count')
        self.reTweet = json.get('retweet_count')

It also has a __str__ method that returns a very compact string representation of the object.

tweetDatabase.commit() just writes the tweets to a file, while tweetDatabase.save() just appends the tweet to a list:

def save(self, tweet):
    self.tweets.append(tweet.__str__())

def commit(self):
    with open(self.path, mode='a', encoding='utf-8') as f:
        f.write('\n'.join(self.tweets))

    self.tweets = []

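What is the best way to keep CPU usage low? If I sleep, I will lose tweets, because that is time the program spends not listening to the Twitter API. Even so, I tried sleeping for one second after the program writes to the file, and that did not bring the CPU usage down. For the record, a save to file every 1000 tweets happens only about once a minute.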

Many thanks

score 1 · Accepted Answer

You can try profiling the script with cProfile:

import cProfile
command = """<whatever line that starts your program>"""
cProfile.runctx( command, globals(), locals(), filename="OpenGLContext.profile" )

Then open OpenGLContext.profile with RunSnakeRun ( http://www.vrplumber.com/programming/runsnakerun/ ).

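The larger a block in the display, the more CPU time that function takes. This will help you pinpoint exactly which part of your program is using so much CPU.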
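If a GUI viewer is not available on the VPS, the same profile data can be read as text with the standard library's pstats module. The following is only a minimal sketch: busy_work and profile_call are hypothetical names, and busy_work merely stands in for whatever call starts the real streaming loop.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the call that starts the real program."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(busy_work, 100_000)
    print(report)
```

The functions at the top of the report are the ones to look at first. A streaming script that never returns would need to be interrupted, or run for a bounded time, before the report can be produced.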

score 1 · Accepted Answer

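Try checking first, in on_success(), whether a commit is needed. Then check whether the tweet contains the data you want to save. You may also want to consider a race condition on the self.counter variable; the update to self.counter should probably be wrapped in a mutex or something similar.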
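The locking suggestion could look roughly like the sketch below. It assumes on_success() may be called from more than one thread, which the question does not establish. TweetBuffer is a hypothetical class that stands in for both the streamer and tweetDatabase, and the committed list stands in for the file on disk.

```python
import threading


class TweetBuffer:
    """Hypothetical stand-in for the streamer plus tweetDatabase."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self.tweets = []
        self.committed = []  # stands in for the file on disk
        self._lock = threading.Lock()

    def on_success(self, data):
        if 'text' not in data:
            return
        # Update the buffer and decide whether to commit under one lock,
        # so two threads can never both see a full batch.
        with self._lock:
            self.tweets.append(data['text'])
            if len(self.tweets) >= self.batch_size:
                batch, self.tweets = self.tweets, []
            else:
                batch = None
        # Do the slow part outside the lock.
        if batch is not None:
            self.commit(batch)

    def commit(self, batch):
        self.committed.append(batch)
```

Swapping the full list out while holding the lock keeps the critical section short, so other callers are not blocked while a batch is being written.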
