nlp - Session crashes after using all available RAM when encoding a dataset for HuggingFace

Translated from: https://stackoverflow.com/questions/66948996 2021-04-05T06:20:36.183

Viewed 119 times

I am trying to train my model with HuggingFace Transformers. My dataset is about 1.5 GB in size.

When I encode the dataset like this:

# Tokenize the 'content' column; truncation=True cuts off inputs that are too long
def preprocess_function(examples):
    return tokenizer(examples['content'], truncation=True)

# all strings to be encoded

# Keep only the first element of the 'type' field of each example
def convert_type_to_int(example):
    example['type'] = example['type'][0]
    return example

encoded_dataset = fake_news_ds.map(preprocess_function, batched=True)
encoded_dataset = encoded_dataset.map(convert_type_to_int)

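Colab runs out of memory, and I get this message:

Session crashed after using all available RAM

I know it is not possible to add more RAM in Colab, but is there anything I can do in my code to reduce RAM usage?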
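To make the question more concrete, here is a minimal pure-Python sketch of the kind of change I have in mind. It is not the HuggingFace API, and I have not confirmed that it would fix the crash. It encodes one small batch at a time, truncates every example to a fixed maximum length, drops the raw text once it is tokenized, and writes the results to disk instead of keeping them in RAM. `toy_tokenize`, `batches`, `encode_to_disk`, `demo` and `MAX_LENGTH` are hypothetical names made up for illustration; `toy_tokenize` is only a stand-in for the real tokenizer.

```python
import json
import os
import tempfile
from itertools import islice
from typing import Dict, Iterable, Iterator, List

MAX_LENGTH = 8  # hypothetical cap for the demo; a real model would use a larger value


def toy_tokenize(text: str, max_length: int = MAX_LENGTH) -> List[int]:
    """Hypothetical stand-in for the real tokenizer: one id per word, truncated."""
    return [len(word) for word in text.split()][:max_length]


def batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size rows without materializing the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def encode_to_disk(rows: Iterable[Dict], out_path: str, batch_size: int = 2) -> int:
    """Encode rows batch by batch and write them to a JSON Lines file.

    Only one batch is held in memory at a time, and the raw 'content'
    string is not carried over into the encoded output.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for batch in batches(rows, batch_size):
            for row in batch:
                encoded = {
                    "input_ids": toy_tokenize(row["content"]),
                    "type": row["type"][0],  # same conversion as convert_type_to_int
                }
                out.write(json.dumps(encoded) + "\n")
                written += 1
    return written


def demo() -> List[Dict]:
    """Encode three made-up rows from a lazy generator and read the result back."""
    rows = ({"content": "word " * n, "type": ["fake"]} for n in (3, 20, 5))
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = os.path.join(tmp_dir, "encoded.jsonl")
        encode_to_disk(rows, out_path)
        with open(out_path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle]


if __name__ == "__main__":
    for record in demo():
        print(record)
```

As far as I understand, the closest equivalents in the real libraries would be passing an explicit `max_length` to the tokenizer call, and passing a smaller `batch_size` plus `remove_columns` to `Dataset.map`. I am not sure whether those are enough on their own.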
