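I have a Keras model with a ModelCheckpoint callback.
When I set the path in the callback to the tmp folder, it works fine, but when I set it to another folder called kaggle, I get an error.
The error is long; this is the last part of it: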
21/22 [===========================>..] - ETA: 0s - loss: 0.7804 - acc: 0.50482020-04-28 17:36:20.771950: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]]
2020-04-28 17:36:20.778527: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]]
[[dense_1_target/_2]]
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/DIRECTORY2/train.py", line 76, in <module>
Train(args)
File "/DIRECTORY2/train.py", line 28, in __init__
Train.train(params.read(configs))
File "/DIRECTORY2/train.py", line 69, in train
verbose = 1)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
batch_outs = batch_function(*batch_data)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3443, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in __call__
return self._call_flat(args)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 660, in _call_flat
outputs = self._inference_function.call(ctx, args)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 434, in call
ctx=ctx)
File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,12] = 11086 is not in [0, 11086)
[[{{node embedding/embedding_lookup}}]] [Op:__inference_keras_scratch_graph_3082]
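Read literally, the last line of the traceback says that an embedding lookup received the index 11086 at batch position [5, 12], while the valid range is [0, 11086). Below is a minimal sketch of how a batch of token ids could be scanned for such values before it reaches the model. The helper name `find_out_of_range` and the dummy batch are hypothetical, and the sketch assumes 2-D integer batches; nothing here is taken from train.py.

```python
import numpy as np


def find_out_of_range(batch, input_dim):
    """Return (row, col, value) for every index outside [0, input_dim).

    Assumes `batch` is a 2-D array of integer token ids.
    """
    batch = np.asarray(batch)
    bad = np.argwhere((batch < 0) | (batch >= input_dim))
    return [(int(r), int(c), int(batch[r, c])) for r, c in bad]


# Hypothetical batch: 6 sequences of 13 token ids, one id equal to input_dim.
batch = np.zeros((6, 13), dtype=np.int64)
batch[5, 12] = 11086

print(find_out_of_range(batch, input_dim=11086))  # [(5, 12, 11086)]
```

An empty list would mean every id in the batch fits the embedding table of the assumed size.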
Edit (1):
The directory that causes the error was transferred to my Linux user from another Windows machine using WinSCP, while the other one (tmp) was created locally on Linux.
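Edit (2):
I deleted the directory that caused the error and created the same directory locally, and the error went away! I am fairly sure the error was caused by the directory's permissions, but I don't know where they came from.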
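One way to test the permissions guess would be to compare the mode, owner, and write access of the two directories. This is only a sketch: `describe_dir` is a hypothetical helper, the relative paths `tmp` and `kaggle` are placeholders for the real locations, and it does not establish that permissions caused the error.

```python
import os
import stat


def describe_dir(path):
    """Collect mode, owner ids, and write access for a directory."""
    st = os.stat(path)
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "uid": st.st_uid,
        "gid": st.st_gid,
        # Creating files in a directory needs write and execute permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


# Placeholder paths; replace with the real checkpoint directories.
for path in ("tmp", "kaggle"):
    if os.path.isdir(path):
        print(describe_dir(path))
    else:
        print(f"{path}: not found")
```

If the two directories report the same mode, owner, and write access, the cause is probably somewhere else.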