
I want to create a custom dataset class and DataLoader in PyTorch that preprocess data from a pandas DataFrame with n rows (observations) and m columns (features).

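What I specifically want is a dataloader that loads tensors with tensor.shape = torch.Size([1, num_features, num_sequence]), where num_features corresponds to the number of features m and num_sequence is a window length w that I choose. In addition, if I choose a number x as the batch_size, the dataloader should return several tensors, for example: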

BatchIndex 1, tensor.size([1, num_feat, time_window with rows 1 - w])
BatchIndex 2, tensor.size([1, num_feat, time_window with rows w+1 - 2w])
...
BatchIndex X, tensor.size([1, num_feat, time_window with rows n-w - n])
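
To make the target concrete, the window boundaries listed above can be written out as a short sketch. It assumes non-overlapping windows of length w, 0-based row indices with an exclusive stop, and that a trailing partial window is dropped; the helper name `window_bounds` is hypothetical.

```python
def window_bounds(n_rows, w):
    """Return (start, stop) row ranges for non-overlapping windows of length w."""
    # Assumption: a trailing partial window (fewer than w rows) is dropped.
    return [(start, start + w) for start in range(0, n_rows - w + 1, w)]


# Toy example: n = 10 rows, w = 3 gives three complete windows.
bounds = window_bounds(n_rows=10, w=3)
print(bounds)
```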

So far I have only managed to create a class that loads one feature at a time, and where batch_size shifts the first entry by one, so that:

BatchIndex1: Tensor([1,2,3], [2,3,4], [3,4,5]) 
BatchIndex2: Tensor([4,5,6], [5,6,7], [6,7,8])
etc.

This is the code I am using:

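import torch
from torch.utils.data import Dataset

class Training_Prep(Dataset):
    def __init__(self, df_train):
        # Only the single "value" column is kept, so one feature is loaded.
        self.mytraindata = df_train[["value"]]

    def __len__(self):
        # Each sample is a 60-row window that starts at `index`.
        return len(self.mytraindata) - 60

    def __getitem__(self, index):
        training_data = torch.zeros(60, 1)
        for i in range(0, 60):
            # Positional (row, column) lookup of one scalar per step.
            training_data[i] = torch.tensor(self.mytraindata.iloc[index + i, 0])
        return training_data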

def setup_data_loader(batch_size, use_cuda=False):
    kwargs = {"num_workers": 0, "pin_memory": use_cuda}
    # `trainset` is the pandas DataFrame with the training data, defined elsewhere.
    traindata = Training_Prep(df_train=trainset)

    train_loader = torch.utils.data.DataLoader(traindata,
                                               batch_size=batch_size,
                                               shuffle=False,
                                               drop_last=True,
                                               **kwargs)

    # Print the shape of every batch as a quick check.
    for index, data in enumerate(train_loader):
        print('BatchIndex {}, data.shape {}'.format(index, data.shape))

    return train_loader

Does anyone know how to solve this?
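
One possible direction, offered as a sketch under stated assumptions rather than a confirmed solution: convert all feature columns to a tensor once, then let each dataset index select one non-overlapping window and transpose it to (num_features, w). The class name `WindowDataset`, the toy DataFrame, and the choice to drop a trailing partial window are all assumptions made for illustration; every column is assumed to be numeric.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class WindowDataset(Dataset):
    """Hypothetical dataset: non-overlapping windows over all DataFrame columns."""

    def __init__(self, df, window):
        # Shape (n_rows, n_features); assumes every column is numeric.
        self.values = torch.tensor(df.to_numpy(dtype=np.float32))
        self.window = window

    def __len__(self):
        # Number of complete windows; a trailing partial window is dropped.
        return self.values.shape[0] // self.window

    def __getitem__(self, index):
        start = index * self.window
        chunk = self.values[start:start + self.window]  # (window, n_features)
        return chunk.transpose(0, 1)  # (n_features, window)


# Toy frame with n = 12 rows and m = 3 features, purely for illustration.
df = pd.DataFrame(np.arange(36, dtype=np.float32).reshape(12, 3),
                  columns=["a", "b", "c"])
loader = DataLoader(WindowDataset(df, window=4), batch_size=1, shuffle=False)

shapes = [tuple(batch.shape) for batch in loader]
print(shapes)
```

With batch_size = 1 each batch has shape (1, num_features, w). A larger batch_size of x would stack x consecutive windows into a tensor of shape (x, num_features, w), which may or may not match the layout intended in the question.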
