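I am trying to build a GUI to control a deep learning program. Training runs in a second thread, implemented with QThread, and a stop button is supposed to stop that training thread. After I press the stop button the thread does finish, but the GPU memory is not released. The code behind my two buttons: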
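import time

# Assumption: PySide6 bindings (the post does not name the Qt binding)
from PySide6.QtCore import QThread
from PySide6.QtWidgets import QApplication

# The two button handlers (methods of SolarCellForm; rest of the class omitted)
def start_training(self):
    self.train_thread = run_training()
    self.train_thread.start()

# The thread does stop, but the GPU memory held by PyTorch is not returned to the OS
def stop_training(self):
    self.train_thread.cfg['training'] = False  # ask the training loop to exit
    time.sleep(2)  # give the loop time to notice the flag
    print('is running:', self.train_thread.isRunning())
    print('finished:', self.train_thread.isFinished())

class run_training(QThread):
    def __init__(self):
        super().__init__()
        self.cfg = {'training': True}  # shared flag read by training()

    def run(self):
        training(self.cfg)

if __name__ == '__main__':
    app = QApplication()
    scf = SolarCellForm()
    scf.main_ui.show()
    app.exec()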
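The stop handler above waits a fixed two seconds and then checks the thread state. For comparison, here is a minimal sketch of the same cooperative-stop idea that blocks until the worker has really ended. It swaps in the standard-library `threading.Thread` and `threading.Event` for QThread so that it runs without Qt; `TrainingWorker`, `stop_requested` and the sleep-based loop body are hypothetical placeholders, not part of my program.

```python
import threading
import time


class TrainingWorker(threading.Thread):
    """Stand-in for the QThread subclass; the loop body is a placeholder."""

    def __init__(self):
        super().__init__()
        self.stop_requested = threading.Event()  # plays the role of cfg['training']
        self.steps = 0

    def run(self):
        # Check the flag once per iteration, like `while cfg['training']`.
        while not self.stop_requested.is_set():
            self.steps += 1
            time.sleep(0.01)  # placeholder for one training step

    def stop(self, timeout=5.0):
        """Ask the loop to exit, then block until the thread has ended."""
        self.stop_requested.set()
        self.join(timeout)
        return not self.is_alive()
```

With QThread, the counterpart of `join` would be `wait()`, which removes the need for `time.sleep(2)`. This only changes how the stop is detected; it does not by itself affect GPU memory.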
The training routine:
import torch

def training(cfg):
    net = ResNet(depth=50, num_classes=10).cuda().train()
    # Run forward passes until the stop button clears the flag
    while cfg['training']:
        inp = torch.randn((1, 3, 512, 512), dtype=torch.float32).cuda()
        target = torch.tensor([1], dtype=torch.int64).cuda()
        out = net(inp, target)
After I press the stop button, the thread is reported as finished, but the GPU memory that training required is still occupied.
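One thing that may be relevant here: PyTorch's caching allocator keeps freed blocks reserved for reuse, so tools such as nvidia-smi can still show them as used after the tensors are gone. Below is a minimal sketch of a cleanup step that could run once the thread has finished, assuming nothing else still holds a reference to the model or its tensors. `release_cuda_cache` is a hypothetical helper name, and the import guard is only there so the sketch also loads on a machine without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # assumption: allow the sketch to load without PyTorch
    torch = None


def release_cuda_cache():
    """Collect unreachable tensors, then release unused cached CUDA blocks.

    Returns (allocated_bytes, reserved_bytes) measured after the cleanup,
    or None when PyTorch or a CUDA device is not available.
    """
    gc.collect()  # drop objects that are only kept alive by reference cycles
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()  # hand unoccupied cached memory back to the driver
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()
```

This could be called from `stop_training` after the thread has ended. Even then, the CUDA context itself stays allocated until the process exits, so the usage shown by nvidia-smi is not expected to drop all the way to zero.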