
I am applying K-means clustering to a pandas DataFrame. The cluster assignment function is as follows:

def assign_to_cluster(row):
    lowest_distance = -1
    closest_cluster = -1

    for cluster_id, centroid in centroids_dict.items():
        df_row = [row['PPG'],row['ATR']]
        euclidean_distance = calculate_distance(centroids, df_row)

        if lowest_distance == -1:
            lowest_distance = euclidean_distance
            closest_cluster = cluster_id
        elif euclidean_distance < lowest_distance:
            lowest_distance = euclidean_distance
            closest_cluster = cluster_id
    return closest_cluster

point_guards['CLUSTER'] = point_guards.apply(lambda row: assign_to_cluster(row), axis=1)

However, when I apply it with the lambda function I get the following error:

   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()

KeyError: (0, 'occurred at index 0')

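Can anyone explain what causes this error and how to fix it? If you need more information, please reply to this post. Apologies for the formatting; this is my first question on StackOverflow.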

1 Answer

It turned out I had made a simple typo: I passed the wrong variable to 'calculate_distance'. Instead of passing 'centroid', the loop variable unpacked from 'centroids_dict.items()':

for cluster_id, centroid in centroids_dict.items():
    df_row = [row['PPG'],row['ATR']]
    euclidean_distance = calculate_distance(centroid, df_row)
....

I was passing 'centroids':

for cluster_id, centroid in centroids_dict.items():
    df_row = [row['PPG'],row['ATR']]
    euclidean_distance = calculate_distance(centroids, df_row)

Changing it to 'centroid' resolved the error.
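For reference, here is a minimal runnable sketch of the corrected assignment. The original post does not show 'calculate_distance', 'centroids_dict', or the data, so the sample 'point_guards' values, the centroid coordinates, and the Euclidean helper below are all assumptions made for illustration only.

```python
import math

import pandas as pd

# Hypothetical sample data; the real point_guards frame is not shown in the post.
point_guards = pd.DataFrame({
    'PPG': [10.0, 25.0, 12.0],
    'ATR': [2.0, 3.5, 2.2],
})

# Hypothetical centroids: cluster id -> [PPG, ATR] coordinates.
centroids_dict = {0: [11.0, 2.1], 1: [24.0, 3.4]}


def calculate_distance(centroid, player_values):
    # Assumed helper: Euclidean distance between two equal-length sequences.
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(centroid, player_values)))


def assign_to_cluster(row):
    # The row's coordinates do not change per centroid, so build them once.
    df_row = [row['PPG'], row['ATR']]
    lowest_distance = float('inf')
    closest_cluster = None

    for cluster_id, centroid in centroids_dict.items():
        # Pass the loop variable `centroid`, not some other object named `centroids`.
        distance = calculate_distance(centroid, df_row)
        if distance < lowest_distance:
            lowest_distance = distance
            closest_cluster = cluster_id
    return closest_cluster


# The function can be passed to apply directly; no lambda wrapper is needed.
point_guards['CLUSTER'] = point_guards.apply(assign_to_cluster, axis=1)
print(point_guards)
```

Starting from float('inf') instead of -1 removes the special case for the first centroid, since any real distance compares lower than infinity.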

Answered 2017-02-21T15:35:33.350