python - Applying Snowballstemmer to each word of a Pandas dataframe

Translated from: https://stackoverflow.com/questions/66371722 2021-02-25T15:43:06.197


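I want to apply stemming with Snowballstemmer to a column ("unstemmed") of a dataframe so that I can then run a classification algorithm on it.

My code looks like this: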

df = pd.read_excel(...)
df["content"] = df['column2'].str.lower()
stopword_list = nltk.corpus.stopwords.words('dutch')
df['unstemmed'] = df['content'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stopword_list)]))
df["unstemmed"] = df["unstemmed"].str.replace(r"[^a-zA-Z ]+", " ", regex=True).str.strip()
df["unstemmed"] = df["unstemmed"].replace(r'\s+', ' ', regex=True)

df['unstemmed'] = df['unstemmed'].str.split()
df['stemmed'] = df['unstemmed'].apply(lambda x : [stemmer.stem(y) for y in x])

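So first I convert all uppercase letters to lowercase and remove all Dutch stopwords. Next I remove all special characters and then split each entry into words. I checked that all columns have dtype "object".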

I get the following error: stem() missing 1 required positional argument: 'token'

How can I fix this?
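
The definition of `stemmer` is not shown in the question, so the following is only a guess at the cause. This error message is what Python produces when `stem` is called on the `SnowballStemmer` class itself rather than on an instance, for example after `stemmer = SnowballStemmer` without `("dutch")`. A minimal sketch of that situation and of the instantiated version, using a small hypothetical dataframe in place of the Excel file:

```python
import pandas as pd
from nltk.stem.snowball import SnowballStemmer

# Hypothetical sample data standing in for the Excel sheet in the question.
df = pd.DataFrame({"unstemmed": ["de katten lopen", "mooie huizen gebouwd"]})
df["unstemmed"] = df["unstemmed"].str.split()

# Calling stem() on the class binds the word to `self`,
# so `token` stays unfilled and Python raises TypeError.
try:
    SnowballStemmer.stem("katten")
    class_call_error = None
except TypeError as exc:
    class_call_error = str(exc)
print(class_call_error)

# Creating an instance for a language gives a stem() that takes one word.
stemmer = SnowballStemmer("dutch")
df["stemmed"] = df["unstemmed"].apply(
    lambda words: [stemmer.stem(w) for w in words]
)
print(df)
```

If `stemmer` was already created with a language argument, the cause lies elsewhere and this sketch does not apply.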
