I have a dataset like this:
index | tag | feature1 | feature2 | target |
---|---|---|---|---|
1 | tag1 | 1.4342 | 88.4554 | 0.5365 |
2 | tag1 | 2.5656 | 54.5466 | 0.1263 |
3 | tag2 | 5.4561 | 845.556 | 0.8613 |
4 | tag3 | 6.5546 | 8.52545 | 0.7864 |
5 | tag3 | 8.4566 | 945.456 | 0.4646 |
The number of entries per tag is not always the same.
My goal is to load only the data with a specific tag or tags, so that each mini-batch contains entries from a single tag: with batch_size=1 I would get a mini-batch of tag1 entries, then a mini-batch of tag2 entries, and so on. With batch_size=2 I would get, for example, two tag1 entries in one batch and two tag2 entries in another. So far, my code completely ignores the tag column and just samples batches randomly.
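One way to sketch the desired behavior is to first group the row indices by tag and then cut each group into fixed-size chunks, so no chunk ever mixes tags. The helper below (`same_tag_batches` is a hypothetical name, not part of the question's code) shows that grouping logic in plain Python; the resulting list of index lists can later be handed to a DataLoader.

```python
import random
from collections import defaultdict

def same_tag_batches(tags, batch_size, shuffle=True):
    """Group dataset indices by tag, then cut each group into
    fixed-size batches; the last batch per tag may be smaller."""
    by_tag = defaultdict(list)
    for idx, tag in enumerate(tags):
        by_tag[tag].append(idx)

    batches = []
    for indices in by_tag.values():
        if shuffle:
            random.shuffle(indices)          # shuffle within a tag
        for i in range(0, len(indices), batch_size):
            batches.append(indices[i:i + batch_size])
    if shuffle:
        random.shuffle(batches)              # shuffle batch order across tags
    return batches

# Row tags from the table above (indices 0..4):
tags = ["tag1", "tag1", "tag2", "tag3", "tag3"]
for batch in same_tag_batches(tags, batch_size=2, shuffle=False):
    print(batch)
```

With `shuffle=False` the batches come out in tag order: the two tag1 rows together, the lone tag2 row alone, the two tag3 rows together.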
I build my dataset like this:
# features is a matrix with all the features columns through all rows
# target is a vector with the target column through all rows
featuresTrain, targetTrain = projutils.get_data(train=True, config=config)
train = torch.utils.data.TensorDataset(featuresTrain, targetTrain)
train_loader = make_loader(train, batch_size=config.batch_size)
My loader (generically) looks like this:
def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size,
                                         shuffle=True,
                                         pin_memory=True,
                                         num_workers=8)
    return loader
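For comparison, DataLoader also accepts a `batch_sampler` argument that takes an iterable of index lists and yields exactly those batches, which is one standard way to get tag-pure mini-batches. The sketch below uses hypothetical toy tensors mirroring the table (the question's real data comes from `projutils.get_data`); `batch_sampler` is mutually exclusive with `batch_size` and `shuffle`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data mirroring the table: 5 rows, 2 feature columns.
features = torch.randn(5, 2)
target = torch.randn(5)
tags = ["tag1", "tag1", "tag2", "tag3", "tag3"]

# Build fixed-size index batches that never mix tags.
by_tag = {}
for idx, tag in enumerate(tags):
    by_tag.setdefault(tag, []).append(idx)
batch_size = 2
batches = [indices[i:i + batch_size]
           for indices in by_tag.values()
           for i in range(0, len(indices), batch_size)]

# batch_sampler replaces batch_size/shuffle: the loader yields exactly
# these index lists, so every mini-batch contains a single tag.
loader = DataLoader(TensorDataset(features, target), batch_sampler=batches)
for feats, tgt in loader:
    print(feats.shape, tgt.shape)
```

Shuffling, if wanted, would be done on `batches` (and within each tag's index list) before constructing the loader.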
Then I train like this:
for epoch in range(config.epochs):
    for features, target in train_loader:
        loss = train_batch(features, target, model, optimizer, criterion)
And train_batch:
def train_batch(features, target, model, optimizer, criterion):
    features, target = features.to(device), target.to(device)
    # Forward pass ➡
    outputs = model(features)
    loss = criterion(outputs, target)
    return loss
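As posted, `train_batch` stops at computing the loss; a complete training step usually also clears old gradients, runs the backward pass, and updates the parameters. The sketch below shows that typical shape with a tiny stand-in model (the `nn.Linear`, optimizer, and shapes are illustrative assumptions, not the question's actual model).

```python
import torch
from torch import nn

device = torch.device("cpu")

def train_batch(features, target, model, optimizer, criterion):
    features, target = features.to(device), target.to(device)
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(features)          # forward pass
    loss = criterion(outputs, target)
    loss.backward()                    # backward pass
    optimizer.step()                   # parameter update
    return loss.item()

# Hypothetical usage: 2 input features -> 1 regression target.
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
loss = train_batch(torch.randn(4, 2), torch.randn(4, 1),
                   model, optimizer, criterion)
print(loss)
```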