I have a pandas DataFrame df that looks like this:
column1
0 apple is a fruit
1 fruit sucks
2 apple tasty fruit
3 fruits what else
4 yup apple map
5 fire in the hole
6 that is true
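I want to generate a column2 that holds, for each row, a list of the words in that row paired with each word's total count across the entire column. So the output would look something like this: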
column1 column2
0 apple is a fruit [('apple', 3),('is', 2),('a', 1),('fruit', 3)]
1 fruit sucks [('fruit', 3),('sucks', 1)]
I tried using sklearn but could not get the result above. Any help would be appreciated.
from sklearn.feature_extraction.text import CountVectorizer
# Build a bag-of-words count matrix from the text column (column1 in the frame above)
v = CountVectorizer()
x = v.fit_transform(df['column1'])
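
For reference, here is a minimal sketch of one way the desired column2 could be built without sklearn, using collections.Counter and a plain whitespace split. It assumes the column is named column1 as in the frame above, and that words are separated by whitespace with no punctuation or case normalization needed; the names tokens and counts are only for illustration. One thing worth noting: with its default token pattern, CountVectorizer ignores single-character tokens such as 'a', so it would not reproduce ('a', 1) unless that setting is changed.

```python
from collections import Counter

import pandas as pd

# Sample frame rebuilt from the question (column name assumed to be "column1").
df = pd.DataFrame({"column1": [
    "apple is a fruit",
    "fruit sucks",
    "apple tasty fruit",
    "fruits what else",
    "yup apple map",
    "fire in the hole",
    "that is true",
]})

# Split each row on whitespace, then count every word across the whole column.
tokens = df["column1"].str.split()
counts = Counter(word for row in tokens for word in row)

# Pair each word in a row with its column-wide total, keeping the row's word order.
df["column2"] = tokens.apply(lambda row: [(word, counts[word]) for word in row])

print(df.head(2))
```

Because the split is on whitespace only, 'fruit' and 'fruits' are counted as different words, which matches the totals in the expected output.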