I'm new to pytorch_lightning. My training runs fine, but for some reason training_epoch_end is called after a few steps rather than at the end of the epoch.

This is my output:

GPU available: False, used: False

TPU available: None, using: 0 TPU cores

Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]

| Name | Type | Params



Epoch 0: 0%| | 0/13 [00:00<?, ?it/s]

Epoch 0: 23%|██▎ | 3/13 [01:38<05:27, 32.75s/it, loss=4.73, v_num=7]

//training_epoch_end: outputs = [{'loss': tensor(6.4593)}, {'loss': tensor(5.7653)}, {'loss': tensor(1.9642)}]

Validating: 0it [00:00, ?it/s]

Validating: 0%| | 0/10 [00:00<?, ?it/s]

Epoch 0: 38%|███▊ | 5/13 [01:48<02:54, 21.78s/it, loss=4.73, v_num=7]

Epoch 0: 46%|████▌ | 6/13 [01:59<02:19, 19.91s/it, loss=4.73, v_num=7]

Epoch 0: 54%|█████▍ | 7/13 [02:10<01:51, 18.58s/it, loss=4.73, v_num=7]

Epoch 0: 62%|██████▏ | 8/13 [02:20<01:27, 17.60s/it, loss=4.73, v_num=7]

Epoch 0: 69%|██████▉ | 9/13 [02:31<01:07, 16.83s/it, loss=4.73, v_num=7]

Epoch 0: 77%|███████▋ | 10/13 [02:42<00:48, 16.21s/it, loss=4.73, v_num=7]

Epoch 0: 85%|████████▍ | 11/13 [02:52<00:31, 15.71s/it, loss=4.73, v_num=7]

Epoch 0: 92%|█████████▏| 12/13 [03:04<00:15, 15.34s/it, loss=4.73, v_num=7]

Epoch 0: 100%|██████████| 13/13 [03:15<00:00, 15.00s/it, loss=4.73, v_num=7]

Epoch 0: 100%|██████████| 13/13 [03:16<00:00, 15.15s/it, loss=4.73, v_num=7]

Epoch 1: 23%|██▎ | 3/13 [01:42<05:42, 34.24s/it, loss=3.39, v_num=7]

//training_epoch_end: outputs = [{'loss': tensor(2.6766)}, {'loss': tensor(2.3010)}, {'loss': tensor(1.1722)}]

Epoch 1: 31%|███ | 4/13 [01:48<04:04, 27.22s/it, loss=3.39, v_num=7]

Validating: 0it [00:00, ?it/s]

Epoch 1: 38%|███▊ | 5/13 [02:02<03:15, 24.42s/it, loss=3.39, v_num=7]

Completed 6.8 MiB/327.9 MiB (48.7 KiB/s) with 2 file(s) remaining

Epoch 1: 100%|██████████| 13/13 [03:48<00:00, 17.54s/it, loss=3.39, v_num=7]

Epoch 2: 23%|██▎ | 3/13 [01:44<05:47, 34.72s/it, loss=2.72, v_num=7]

//training_epoch_end: outputs = [{'loss': tensor(1.2504)}, {'loss': tensor(1.4905)}, {'loss': tensor(1.4158)}]

Epoch 2: 31%|███ | 4/13 [01:49<04:07, 27.48s/it, loss=2.72, v_num=7]

Validating: 0it [00:00, ?it/s]

Epoch 2: 100%|██████████| 13/13 [03:50<00:00, 17.75s/it, loss=2.72, v_num=7]

Epoch 3: 23%|██▎ | 3/13 [01:43<05:46, 34.62s/it, loss=2.27, v_num=7]

training_epoch_end: 3 outputs = [{'loss': tensor(0.6632)}, {'loss': tensor(0.9215)}, {'loss': tensor(1.1396)}]

Epoch 3: 31%|███ | 4/13 [01:49<04:06, 27.41s/it, loss=2.27, v_num=7]

Validating: 0it [00:00, ?it/s]

Does anyone know why this happens?

The number of workers or GPUs doesn't affect the order.

Thanks!!!

1 Answer

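Well, that was entirely my fault!

I was confused by the 0..13 counter printed for each epoch and didn't realize that only three of those steps are training batches; the other ten are validation batches.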
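
To make the arithmetic concrete, here is a minimal sketch in plain Python. It assumes, as the log in the question suggests, that the main progress bar counts training and validation batches together. The helper names are hypothetical and are not part of pytorch_lightning.

```python
def progress_bar_total(num_train_batches: int, num_val_batches: int) -> int:
    """Steps shown on the main epoch bar, assuming it counts both loops."""
    return num_train_batches + num_val_batches


def training_epoch_end_step(num_train_batches: int) -> int:
    """Bar position at which training_epoch_end would fire:
    right after the last training batch, before validation starts."""
    return num_train_batches


# Numbers taken from the log: 3 training batches, 10 validation batches.
total = progress_bar_total(num_train_batches=3, num_val_batches=10)
hook_at = training_epoch_end_step(num_train_batches=3)

print(f"bar total: {total}")                                # bar total: 13
print(f"training_epoch_end at step {hook_at} of {total}")   # step 3 of 13
```

So the hook firing at 3/13 is consistent with the training part of the epoch being finished; the remaining 10 steps on the bar are the validation loop.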

Answered 2021-06-01T09:37:22.963