I am having trouble with my k-means clustering results in Alteryx. I am trying to do topic modelling on a data set of around 5000 text descriptions. After cleaning and parsing the data and removing stop words and common words, I created a Document Term Matrix of 20 words by around 5000 documents.

When I run K-Means Clustering in Alteryx, no matter how many clusters I specify, every cluster except one contains only 1 document, and the remaining cluster contains all the rest. For example:

2 Clusters

  • Cluster 1: 19 words
  • Cluster 2: 1 word

3 Clusters

  • Cluster 1: 18 words
  • Cluster 2: 1 word
  • Cluster 3: 1 word

5 Clusters

  • Cluster 1: 16 words
  • Cluster 2: 1 word
  • Cluster 3: 1 word
  • Cluster 4: 1 word
  • Cluster 5: 1 word

This happens for every number of clusters I try. Does this mean my data has problems, or did I not use the correct settings? Any help shedding light on this would be appreciated.

Thanks in advance!

1 Answer

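Have you looked at your data after preprocessing?

It may be that many documents are now empty, or contain only a single word.

If so, there is not much left to find beyond the frequent words.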
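One way to check this, as a minimal sketch: count how many of the 20 terms each document still contains after preprocessing. The snippet assumes the document-term matrix has been exported from Alteryx into plain Python as a list of rows (one row per document, one column per term). The function name `term_count_profile` and the toy numbers are made up for illustration and are not taken from the question.

```python
from collections import Counter


def term_count_profile(dtm):
    """Count how many documents contain 0, 1, 2, ... distinct terms.

    `dtm` is a document-term matrix given as a list of rows (one row per
    document, one column per term) holding raw term counts.
    """
    return Counter(sum(1 for count in row if count > 0) for row in dtm)


# Toy matrix: 6 documents x 4 terms (hypothetical values for illustration).
toy_dtm = [
    [0, 0, 0, 0],  # empty after preprocessing
    [0, 0, 0, 0],  # empty after preprocessing
    [1, 0, 0, 0],  # only one term left
    [0, 0, 2, 0],  # only one term left
    [0, 0, 0, 0],  # empty after preprocessing
    [1, 1, 0, 3],  # the rare document with several terms
]

profile = term_count_profile(toy_dtm)
for n_terms in sorted(profile):
    print(f"{n_terms} distinct terms: {profile[n_terms]} documents")
```

If most documents fall into the 0 or 1 bucket, the rows of the matrix are nearly identical, which would be consistent with one large cluster plus a few singletons.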

Answered 2018-10-18T08:13:15.387