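I am new to array programming and am finding it hard to interpret the label_ranking_average_precision_score function in sklearn.metrics. I need your help to understand how it is calculated, and I would appreciate any tips for learning NumPy array programming.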
In general, I know that precision is
(True Positives) / (True Positives + False Positives)
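For reference, the ratio above can be computed with boolean NumPy arrays. This is a minimal sketch; `y_true` and `y_pred` are made-up data for illustration and are not taken from the competition.

```python
import numpy as np

# Hypothetical ground truth and binary predictions for six items.
y_true = np.array([1, 0, 1, 1, 0, 0], dtype=bool)
y_pred = np.array([1, 1, 1, 0, 0, 0], dtype=bool)

# Element-wise boolean logic replaces an explicit loop over items.
true_positives = np.sum(y_pred & y_true)    # predicted positive, actually positive
false_positives = np.sum(y_pred & ~y_true)  # predicted positive, actually negative

precision = true_positives / (true_positives + false_positives)
print(precision)
```

Here two of the three positive predictions are correct, so the precision is 2/3.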
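I am asking because I stumbled upon the Kaggle competition for audio tagging and came across this post, which says they use the LWRAP function to compute the score when a response has more than one correct label. I started reading to understand how this score is calculated but found it hard to interpret. My two difficulties are:
1) Interpreting the math function from the documentation: I am not sure how ranks are used in the score calculation
2) Interpreting the NumPy array operations in the code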
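On the first difficulty, here is a minimal sketch of how ranks can enter such a score, written as a plain loop rather than with array operations. It assumes there are no tied scores, and the helper name `per_label_precisions` and the example data are hypothetical. The idea, as I understand it from the scikit-learn description, is: sort the classes by score, walk down the list, and for each true class record (number of true classes seen so far) / (current rank).

```python
import numpy as np

def per_label_precisions(scores, truth):
    """For each true class, return (hits at or above its rank) / rank.

    Hypothetical helper for illustration; assumes no tied scores.
    """
    order = np.argsort(scores)[::-1]  # class indices, highest score first
    precisions = {}
    hits = 0
    for rank, cls in enumerate(order, start=1):  # rank is 1-based
        if truth[cls]:
            hits += 1
            precisions[int(cls)] = hits / rank
    return precisions

scores = np.array([0.1, 0.8, 0.3, 0.5])        # made-up classifier scores
truth = np.array([False, True, True, False])   # classes 1 and 2 are correct

result = per_label_precisions(scores, truth)
print(result)
print(np.mean(list(result.values())))
```

Class 1 is ranked first, so its precision is 1/1. Class 2 is ranked third with two true classes at or above it, so its precision is 2/3. Averaging these per-class values over the true classes of one sample gives that sample's ranking average precision.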
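The function I am reading comes from a Google Colab document. I then tried reading the sklearn documentation but could not understand it properly.
The code for the one-sample calculation is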
import numpy as np

# Core calculation of label precisions for one test sample.
def _one_sample_positive_class_precisions(scores, truth):
    """Calculate precisions for each true class for a single sample.

    Args:
      scores: np.array of (num_classes,) giving the individual classifier scores.
      truth: np.array of (num_classes,) bools indicating which classes are true.

    Returns:
      pos_class_indices: np.array of indices of the true classes for this sample.
      pos_class_precisions: np.array of precisions corresponding to each of those
        classes.
    """
    num_classes = scores.shape[0]
    pos_class_indices = np.flatnonzero(truth > 0)
    # Only calculate precisions if there are some true classes.
    if not len(pos_class_indices):
        return pos_class_indices, np.zeros(0)
    # Retrieval list of classes for this sample.
    retrieved_classes = np.argsort(scores)[::-1]
    # class_rankings[top_scoring_class_index] == 0 etc.
    # (np.int, np.bool and np.float were removed in NumPy 1.24; use the builtins.)
    class_rankings = np.zeros(num_classes, dtype=int)
    class_rankings[retrieved_classes] = range(num_classes)
    # Which of these is a true label?
    retrieved_class_true = np.zeros(num_classes, dtype=bool)
    retrieved_class_true[class_rankings[pos_class_indices]] = True
    # Num hits for every truncated retrieval list.
    retrieved_cumulative_hits = np.cumsum(retrieved_class_true)
    # Precision of retrieval list truncated at each hit, in order of pos_labels.
    precision_at_hits = (
        retrieved_cumulative_hits[class_rankings[pos_class_indices]] /
        (1 + class_rankings[pos_class_indices].astype(float)))
    return pos_class_indices, precision_at_hits
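To make the array operations easier to follow, the same steps can be run one at a time on a tiny example. This is a sketch with made-up `scores` and `truth` values (no tied scores); the comments show the value each step produces for this input.

```python
import numpy as np

scores = np.array([0.1, 0.8, 0.3, 0.5])        # made-up classifier scores
truth = np.array([False, True, True, False])   # classes 1 and 2 are correct
num_classes = scores.shape[0]

# Indices of the true classes.
pos_class_indices = np.flatnonzero(truth)                 # [1, 2]

# Class indices sorted from highest to lowest score.
retrieved_classes = np.argsort(scores)[::-1]              # [1, 3, 2, 0]

# Invert that permutation: class_rankings[c] is the 0-based rank of class c.
class_rankings = np.zeros(num_classes, dtype=int)
class_rankings[retrieved_classes] = np.arange(num_classes)  # [3, 0, 2, 1]

# Mark which positions in the ranked list hold a true class.
retrieved_class_true = np.zeros(num_classes, dtype=bool)
retrieved_class_true[class_rankings[pos_class_indices]] = True  # [T, F, T, F]

# Running count of true classes seen down the ranked list.
retrieved_cumulative_hits = np.cumsum(retrieved_class_true)     # [1, 1, 2, 2]

# Hits at each true class's position divided by its 1-based rank.
ranks_of_true = class_rankings[pos_class_indices]               # [0, 2]
precision_at_hits = retrieved_cumulative_hits[ranks_of_true] / (1 + ranks_of_true)
print(precision_at_hits)                                        # 1/1 and 2/3
```

The two idioms worth noting are the fancy-index assignment `class_rankings[retrieved_classes] = ...`, which turns "which class is at rank r" into "which rank does class c have", and `np.cumsum` on a boolean array, which counts hits for every prefix of the ranked list in one call.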