
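I am working on a project that uses the Huggingface BERT implementation and the KTrain library in Python. I am trying to implement a binary classifier with 6 labels (German political parties). While training BERT, the loss and accuracy are reported for each epoch. I noticed that the accuracy is very low in every epoch, only 10-20%. On the other hand, if I compute the accuracy with scikit-learn's accuracy_score, I get values of 75-80% (computing the accuracy for each label and then averaging all 6 values).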

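My guess is that the library computes the accuracy over all 6 labels at once (e.g., all six labels must be predicted correctly for a sample to count as correct). Reading the documentation for the library and for BERT, I could not determine how the accuracy is calculated. Does anyone know or have more information?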
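To make the two ways of counting concrete, here is a minimal sketch on made-up data (the arrays and helper names are hypothetical, not taken from my dataset or from KTrain). It only illustrates how far per-label accuracy and "all six labels correct" accuracy can diverge. It does not show how KTrain or Keras actually computes the reported metric.

```python
import numpy as np

# Made-up ground truth and thresholded predictions: 4 samples x 6 labels.
y_true = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
])
y_pred = np.array([
    [1, 0, 0, 0, 0, 0],  # one label wrong
    [0, 1, 0, 0, 0, 1],  # all six labels correct
    [0, 0, 1, 0, 1, 1],  # one label wrong
    [1, 0, 0, 0, 0, 0],  # one label wrong
])

def per_label_accuracy(y_true, y_pred):
    """Fraction of correct predictions for each label (column) separately."""
    return (y_true == y_pred).mean(axis=0)

def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples for which every label is predicted correctly."""
    return (y_true == y_pred).all(axis=1).mean()

label_acc = per_label_accuracy(y_true, y_pred)
mean_label_acc = label_acc.mean()
subset_acc = exact_match_accuracy(y_true, y_pred)

print(f"mean per-label accuracy: {mean_label_acc:.3f}")  # 0.875
print(f"exact-match accuracy:    {subset_acc:.3f}")      # 0.250
```

Here three of the four samples have only a single wrong label, so the per-label average stays high while the exact-match score drops to 0.25. For reference, scikit-learn's accuracy_score computes this exact-match (subset) accuracy when it is given a full 2D label-indicator array, rather than one column at a time as in my loop.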

My data: (image not available)

My code:

import ktrain
from ktrain import text

# model_name = "dbmdz/bert-base-german-cased"
model_name = "deepset/gbert-base"
t = text.Transformer(model_name, maxlen=64, class_names=parties)

trn = t.preprocess_train(df_train["text"].to_list(), df_train[parties].to_numpy())
val = t.preprocess_test(df_val["text"].to_list(), df_val[parties].to_numpy())

model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=64)

history_model = learner.autofit(1e-5, epochs=10)

begin training using triangular learning rate policy with max lr of 1e-05...
Epoch 1/10
38/38 [==============================] - 93s 2s/step - loss: 0.6947 - accuracy: 0.1353 - val_loss: 0.6873 - val_accuracy: 0.0965
Epoch 2/10
38/38 [==============================] - 72s 2s/step - loss: 0.6891 - accuracy: 0.1382 - val_loss: 0.6837 - val_accuracy: 0.1555
Epoch 3/10
38/38 [==============================] - 71s 2s/step - loss: 0.6850 - accuracy: 0.1378 - val_loss: 0.6777 - val_accuracy: 0.1378
Epoch 4/10
38/38 [==============================] - 72s 2s/step - loss: 0.6770 - accuracy: 0.1408 - val_loss: 0.6689 - val_accuracy: 0.1437
Epoch 5/10
38/38 [==============================] - 72s 2s/step - loss: 0.6680 - accuracy: 0.1386 - val_loss: 0.6585 - val_accuracy: 0.1555
Epoch 6/10
38/38 [==============================] - 72s 2s/step - loss: 0.6591 - accuracy: 0.1601 - val_loss: 0.6471 - val_accuracy: 0.1713
Epoch 7/10
38/38 [==============================] - 72s 2s/step - loss: 0.6429 - accuracy: 0.1673 - val_loss: 0.6340 - val_accuracy: 0.1575
Epoch 8/10
38/38 [==============================] - 72s 2s/step - loss: 0.6263 - accuracy: 0.1580 - val_loss: 0.6211 - val_accuracy: 0.1417
Epoch 9/10
38/38 [==============================] - 72s 2s/step - loss: 0.6018 - accuracy: 0.1618 - val_loss: 0.6008 - val_accuracy: 0.1634
Epoch 10/10
38/38 [==============================] - 72s 2s/step - loss: 0.5703 - accuracy: 0.1551 - val_loss: 0.5822 - val_accuracy: 0.1693

import pandas as pd
from sklearn.metrics import accuracy_score

predictor = ktrain.get_predictor(learner.model, preproc=t)

# Predicted probabilities, one column per party.
predictions = predictor.predict(df_val.text.tolist(), return_proba=True)
df_pred = pd.DataFrame(predictions, columns=parties, index=df_val.index)

for party in parties:
    # Pair the true labels with the predicted probabilities for this party.
    df_compare = pd.DataFrame({party: df_val[party], "predictions": df_pred[party]})
    # Skip rows whose true label is 0.5, and rows with missing values.
    df_compare = df_compare[df_compare[party] != 0.5].dropna()
    # Round the probabilities to hard 0/1 predictions.
    df_compare["predictions"] = df_compare["predictions"].round()
    score = accuracy_score(df_compare[party], df_compare["predictions"])
    print(f"{party} accuracy score is: {round(score, 3)}")

cdu/csu accuracy score is: 0.778
spd accuracy score is: 0.802
grüne accuracy score is: 0.762
fdp accuracy score is: 0.745
die linke accuracy score is: 0.794
afd accuracy score is: 0.782
