The library expects UTF-8. I tried to convert my US-ASCII file to UTF-8 using:

iconv -f us-ascii -t utf-8 src.csv > target.csv

When I then run:

file -I target.csv

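it still reports the charset as us-ascii. I then learned that US-ASCII is a subset of UTF-8, and that the file command only guesses the encoding from the file's contents.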
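A quick way to see why iconv appears to do nothing: for text that contains only ASCII characters, the US-ASCII and UTF-8 encodings produce exactly the same bytes, so the converted file is identical and the file command has no reason to report anything but us-ascii. A minimal sketch, using a made-up sample string rather than the real CSV:

```python
# A pure-ASCII sample (hypothetical CSV content); every byte is below 0x80.
sample = "id,text\n1,hello world\n"

ascii_bytes = sample.encode("us-ascii")
utf8_bytes = sample.encode("utf-8")

# For ASCII-only text both encodings yield identical bytes,
# so an ASCII -> UTF-8 conversion leaves the file unchanged.
print(ascii_bytes == utf8_bytes)  # True


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


print(is_pure_ascii(utf8_bytes))               # True
print(is_pure_ascii("café".encode("utf-8")))  # False: 'é' needs two bytes in UTF-8
```

Such a file is already valid UTF-8, so the encoding conversion by itself should not change how the library reads it.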

However, if I pass src.csv as the input to the library's TextLMDataBunch.from_csv(), it works. But if I do:

cat src.csv > target.csv

and then pass target.csv to the same function, it fails with the following error:
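Since cat src.csv > target.csv copies the bytes unchanged, one way to rule out an encoding difference is to compare the two files byte for byte. A minimal standard-library sketch; the temporary files and their contents are stand-ins for the real src.csv and target.csv:

```python
import filecmp
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the raw bytes of a file (no text decoding involved)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_identical(a: Path, b: Path) -> bool:
    """Byte-for-byte comparison; shallow=False forces a content check."""
    return filecmp.cmp(a, b, shallow=False)


# Demo in a temporary directory; in practice, point these at src.csv and target.csv.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.csv"
    target = Path(tmp) / "target.csv"
    src.write_bytes(b"text\nhello world\n")  # hypothetical sample content
    # Same effect as `cat src.csv > target.csv`: copy the bytes unchanged.
    shutil.copyfile(src, target)
    identical = files_identical(src, target)
    same_hash = sha256_of(src) == sha256_of(target)

print(identical, same_hash)  # True True
```

If the real files compare as identical, the problem lies in how the file is passed to or parsed by the library, not in its encoding.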

   TypeError                                 Traceback (most recent call last)
<ipython-input-118-44bc7147d2a4> in <module>()
----> 1 data_lm = TextLMDataBunch.from_csv(sample_p, 'voila.csv')

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in from_csv(cls, path, csv_name, valid_pct, test, tokenizer, vocab, classes, header, text_cols, label_cols, label_delim, **kwargs)
    180         test_df = None if test is None else pd.read_csv(Path(path)/test, header=header)
    181         return cls.from_df(path, train_df, valid_df, test_df, tokenizer, vocab, classes, text_cols,
--> 182                            label_cols, label_delim, **kwargs)
    183 
    184     @classmethod

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in from_df(cls, path, train_df, valid_df, test_df, tokenizer, vocab, classes, text_cols, label_cols, label_delim, **kwargs)
    165         src = ItemLists(path, TextList.from_df(train_df, path, cols=text_cols, processor=processor),
    166                         TextList.from_df(valid_df, path, cols=text_cols, processor=processor))
--> 167         src = src.label_for_lm() if cls==TextLMDataBunch else src.label_from_df(cols=label_cols, classes=classes, sep=label_delim)
    168         if test_df is not None: src.add_test(TextList.from_df(test_df, path, cols=text_cols))
    169         return src.databunch(**kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _inner(*args, **kwargs)
    356         assert isinstance(fv, Callable)
    357         def _inner(*args, **kwargs):
--> 358             self.train = ft(*args, **kwargs)
    359             assert isinstance(self.train, LabelList)
    360             self.valid = fv(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in label_for_lm(self, **kwargs)
    285         "A special labelling method for language models."
    286         self.__class__ = LMTextList
--> 287         return self.label_const(0, label_cls=LMLabel)
    288 
    289     def reconstruct(self, t:Tensor):

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_const(self, const, **kwargs)
    211     def label_const(self, const:Any=0, **kwargs)->'LabelList':
    212         "Label every item with `const`."
--> 213         return self.label_from_func(func=lambda o: const, **kwargs)
    214 
    215     def label_empty(self):

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
    219     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    220         "Apply `func` to every input to get its label."
--> 221         return self.label_from_list([func(o) for o in self.items], **kwargs)
    222 
    223     def label_from_folder(self, **kwargs)->'LabelList':

TypeError: iteration over a 0-d array
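The final message comes from NumPy: it is raised when code tries to loop over a 0-dimensional (scalar) array. In the last frame that loop is [func(o) for o in self.items], which suggests, though the traceback alone does not prove it, that self.items ended up as a single scalar value rather than a 1-D array of texts. A small sketch reproducing the message with made-up values:

```python
import numpy as np

items_1d = np.array(["first text", "second text"])  # a normal 1-D array of texts
items_0d = np.array("only one value")                # a 0-d (scalar) array

# Iterating a 1-D array works: one label per item.
labels = [0 for o in items_1d]
print(labels)  # [0, 0]

# Iterating a 0-d array raises the same TypeError as in the traceback.
try:
    [0 for o in items_0d]
except TypeError as err:
    msg = str(err)

print(msg)  # iteration over a 0-d array
```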

Can anyone tell me what is wrong? I'm running this on Google Colab, and I have tried the encoding conversion both on Colab and on my Mac, with no success.
