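I would like to create a custom Dataset class and DataLoader in PyTorch that use preprocessed data from a pandas DataFrame with n rows (observations) and m columns (features).
What I specifically want is a data loader that yields tensors with tensor.shape = torch.Size([1, num_features, num_sequence]), where num_features is the number of features m and num_sequence is the length w of a time window, in rows. Furthermore, if I choose a number x for batch_size, the data loader should return several tensors such that: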
BatchIndex 1, tensor.size([1, num_feat, time_window with rows 1 - w])
BatchIndex 2, tensor.size([1, num_feat, time_window with rows w+1 - 2w])
...
BatchIndex X, tensor.size([1, num_feat, time_window with rows n-w - n])
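So far I have only managed to create a class that loads one feature at a time, where each sample is shifted by one entry relative to the previous one, so that: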
BatchIndex1: Tensor([1,2,3], [2,3,4], [3,4,5])
BatchIndex2: Tensor([4,5,6], [5,6,7], [7,8,9])
etc.
using the following code:
    import torch
    from torch.utils.data import Dataset, DataLoader

    class Training_Prep(Dataset):
        def __init__(self, df_train):
            # Keep only the single "value" column.
            self.mytraindata = df_train[["value"]]

        def __len__(self):
            # Number of start indices used for 60-row windows.
            return len(self.mytraindata) - 60

        def __getitem__(self, index):
            # Build one window of 60 consecutive rows starting at `index`.
            training_data = torch.zeros(60, 1)
            for i in range(0, 60):
                training_data[i] = torch.tensor(self.mytraindata.iloc[index + i, 0])
            return training_data

    def setup_data_loader(batch_size, use_cuda=False):
        kwargs = {"num_workers": 0, "pin_memory": use_cuda}
        # trainset is a pandas DataFrame defined elsewhere.
        traindata = Training_Prep(df_train=trainset)
        train_loader = DataLoader(traindata,
                                  batch_size=batch_size,
                                  shuffle=False,
                                  drop_last=True,
                                  **kwargs)
        for index, data in enumerate(train_loader):
            print('BatchIndex {}, data.shape {}'.format(index, data.shape))
        return train_loader
Does anyone know how to solve this?
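For reference, here is a minimal sketch of the behaviour described above, under a few assumptions that may not match the real setup: the windows do not overlap, rows that do not fill a complete window are dropped, and the leading 1 in the shape is the batch dimension (batch_size = 1). The class name WindowedFrameDataset, the toy DataFrame, and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class WindowedFrameDataset(Dataset):
    """Hypothetical helper: non-overlapping windows of `window` rows, all columns."""

    def __init__(self, df, window):
        # Convert once to a float tensor of shape (n_rows, n_features).
        self.values = torch.tensor(df.to_numpy(dtype=np.float32))
        self.window = window

    def __len__(self):
        # Assumption: trailing rows that do not fill a full window are dropped.
        return self.values.shape[0] // self.window

    def __getitem__(self, index):
        start = index * self.window
        chunk = self.values[start:start + self.window]  # (window, n_features)
        # Transpose so features come first: (n_features, window).
        return chunk.T.contiguous()


# Toy data: n = 10 rows, m = 3 features (illustrative values only).
df = pd.DataFrame(
    np.arange(30, dtype=np.float32).reshape(10, 3), columns=["a", "b", "c"]
)
loader = DataLoader(WindowedFrameDataset(df, window=4), batch_size=1, shuffle=False)

shapes = []
for index, data in enumerate(loader, start=1):
    shapes.append(tuple(data.shape))
    print("BatchIndex {}, data.shape {}".format(index, tuple(data.shape)))
```

On this toy frame the loader yields two batches of shape (1, 3, 4), covering rows 0 to 3 and rows 4 to 7; the last two rows are dropped. If overlapping windows were wanted instead, the start index in __getitem__ would advance by a smaller step than the window length.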