
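I am doing sentiment analysis on tweets with Python. While cleaning the tweets, I want to extract the words from hashtags. I found that the wordsegment library does this job very well. My problem is that wordsegment takes a long time when I run df['tweet'].apply(lambda x: segment(x)) on whole tweets. I think I can reduce this time by applying segment() only to the hashtags. To do that, I first created the following function: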

def extract_words(hashtags):
    words = " ".join(segment(hashtags))
    return words

Then I tried using re.sub:

df['tweet'] = df['tweet'].apply(lambda x: re.sub(r'#(\w)+', extract_words, x))

This code does not work and gives me an error. What should I do to apply the segmentation only to the hashtags?


1 Answer


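As an alternative, you can use re.findall inside the extract_words function to collect every hashtag in a tweet into a list. The regex should be changed to (#\w+), which places both the hash sign and the one-or-more quantifier inside the capturing group; this simplifies the replacement step that follows. From there, replace each hashtag found with the result of calling segment on it.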
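To see why the pattern change matters, here is a small sketch comparing the two regexes with re.findall. The sample string and the variable names are made up for illustration.

```python
import re

tweet = "Launch day! #CountdownToMars #Tesla"  # made-up sample text

# The group wraps a single \w and is then repeated, so only the
# last character of each hashtag ends up in the group.
last_char_only = re.findall(r"#(\w)+", tweet)
print(last_char_only)  # ['s', 'a']

# The group wraps '#' and the whole \w+ run, so the full hashtag is captured.
full_tags = re.findall(r"(#\w+)", tweet)
print(full_tags)  # ['#CountdownToMars', '#Tesla']
```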

Input sample.csv

tweets
"RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 #CountdownToMars"
"RT @HodderBooks: With the delectable #ActsOfLove publishing tomorrow, here's @TalulahRiley herself to tell you about her debut novel: https?"
"RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! #futureisclean https://xxxxxxx/JCvKTDBVZx"
"RT @TeslaRoadTrip: #TeslaRoadTrip All - thanks so much for following our twitter feed.  The trip was a success and everyone has diverted ..."
"RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. #Tesla #Michigan http://xxxxxxx/fT1JHjMpzg"

import pandas as pd
import re
import wordsegment as ws
ws.load()

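def extract_words(tweet):
    # Collect every hashtag (the '#' plus the word characters after it)
    hashtags = re.findall(r"(#\w+)", tweet)
    for hs in hashtags:
        # segment() lowercases and drops the '#', e.g. ['countdown', 'to', 'mars']
        words = " ".join(ws.segment(hs))
        # Swap the raw hashtag for its space-separated words
        tweet = tweet.replace(hs, words)
    return tweet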

df = pd.read_csv("sample.csv")
print(df)

df['NewTweet'] = df['tweets'].apply(extract_words)
print(df)

Output from NewTweet

RT @NatGeo: Watch: When humans live on Mars, what exactly will they call home? https://xxxxxxx/QlJBuB6FX2 countdown to mars
RT @HodderBooks: With the delectable acts of love publishing tomorrow, here's @TalulahRiley herself to tell you about her debut novel: https?
RT @solarimpulse: BREAKING: we flew 40'000km without fuel. It's a first for energy, take it further! future is clean https://xxxxxxx/JCvKTDBVZx
RT @TeslaRoadTrip: tesla road trip All - thanks so much for following our twitter feed.  The trip was a success and everyone has diverted ...
RT @TeslaMotors: Agreed. @FTC affirms States to allow consumers to choose how they buy their cars. tesla michigan http://xxxxxxx/fT1JHjMpzg
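
The re.sub approach from the question can also be made to work. re.sub passes its callback a Match object rather than a string, which is likely why the original attempt raised an error. Below is a minimal sketch of that variant. The names segment_hashtags, replace_tag and camel_split are hypothetical, and camel_split is only a stand-in segmenter (it splits on CamelCase boundaries) so the example runs without wordsegment installed.

```python
import re

def camel_split(tag):
    # Stand-in segmenter (an assumption for this sketch): splits on
    # CamelCase boundaries only, unlike wordsegment's corpus-based split.
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", tag)]

def segment_hashtags(tweet, segmenter=camel_split):
    def replace_tag(match):
        # re.sub hands the callback a Match object, not a string,
        # so pull the hashtag text out with group() before segmenting.
        return " ".join(segmenter(match.group(1)))
    # Each hashtag is replaced in place, at the position where it matched.
    return re.sub(r"#(\w+)", replace_tag, tweet)

result = segment_hashtags("Cars: #Tesla and #TeslaRoadTrip")
print(result)  # Cars: tesla and tesla road trip
```

With wordsegment, after calling ws.load(), passing segmenter=ws.segment should give the dictionary-based segmentation instead. Because each match is replaced where it was found, a short hashtag such as #Tesla cannot overwrite the start of a longer one such as #TeslaRoadTrip, which can happen with str.replace when the shorter tag appears first.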
Answered 2020-09-12T21:23:54.353