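I'm trying to reduce the dimensionality of a dataset with LDA. I expected accuracy to drop on the reduced dataset. However, depending on the random seed, the reduced version sometimes gives me higher accuracy.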
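```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1000 samples, 50 features (10 informative), 20 classes
X, y = make_classification(1000, 50, n_informative=10, n_classes=20)
X1, X2, y1, y2 = train_test_split(X, y)

# Baseline: LDA classifier on all 50 features
lda = LDA()
lda.fit(X1, y1)
predicted = lda.predict(X2)
full_accuracy = accuracy_score(y2, predicted)

# Project onto 5 discriminant components, then refit the classifier there
reduction = LDA(n_components=5)
X1red = reduction.fit_transform(X1, y1)
X2red = reduction.transform(X2)
lda.fit(X1red, y1)
predicted = lda.predict(X2red)
reduced_accuracy = accuracy_score(y2, predicted)
print(full_accuracy, reduced_accuracy, reduced_accuracy / full_accuracy)
# one run gave: 0.132 0.16 1.21212121212
```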
Do you know why my accuracy is higher after dimensionality reduction?
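Since the result changes with the random seed, it may help to repeat the same full-vs-reduced comparison over several fixed seeds and look at the spread instead of a single run. The following is a minimal sketch, assuming a recent scikit-learn (LDA in `sklearn.discriminant_analysis`, `train_test_split` in `sklearn.model_selection`); the helper name `compare_once` and the choice of 20 seeds are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_once(seed):
    """Hypothetical helper: run the full-vs-reduced comparison for one seed."""
    X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                               n_classes=20, random_state=seed)
    X1, X2, y1, y2 = train_test_split(X, y, random_state=seed)

    # Baseline: LDA classifier on all 50 features
    full_acc = accuracy_score(y2, LDA().fit(X1, y1).predict(X2))

    # Fit the 5-component projection on the training split only,
    # then train a fresh LDA classifier in the projected space
    reduction = LDA(n_components=5).fit(X1, y1)
    clf = LDA().fit(reduction.transform(X1), y1)
    reduced_acc = accuracy_score(y2, clf.predict(reduction.transform(X2)))
    return full_acc, reduced_acc


# Collect (full, reduced) accuracy pairs for seeds 0..19
results = np.array([compare_once(seed) for seed in range(20)])
full, reduced = results[:, 0], results[:, 1]

print("mean full accuracy:   ", full.mean(), "+/-", full.std())
print("mean reduced accuracy:", reduced.mean(), "+/-", reduced.std())
print("runs where reduced > full:", int((reduced > full).sum()), "of", len(full))
```

If the per-run differences are small compared with the standard deviations, a single run where the reduced model wins says little on its own.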