
I have a Keras model with a ModelCheckpoint callback.

When I set the callback's path to the tmp folder it works fine, but when I set it to another folder called kaggle, I get an error.
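One quick way to check whether the checkpoint folder itself is the problem is to try writing to it before training starts. The sketch below assumes the folder names tmp and kaggle from the question. The helper name `dir_accepts_writes` is hypothetical. It creates and removes a real file instead of trusting the permission bits alone:

```python
import os
import tempfile


def dir_accepts_writes(path):
    """Return True if a file can be created (and removed) inside `path`.

    Hypothetical helper: it performs an actual write rather than relying
    on os.access(), which can disagree with what the OS really allows.
    """
    if not os.path.isdir(path):
        return False
    try:
        # NamedTemporaryFile creates a file in `path` and deletes it on close.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:  # includes PermissionError
        return False
    return True


if __name__ == "__main__":
    for candidate in ("tmp", "kaggle"):  # folder names from the question
        print(candidate, dir_accepts_writes(candidate))
```

If both folders report True, the same path is what gets passed as `filepath` to `tf.keras.callbacks.ModelCheckpoint`. In that case, the write itself is probably not what fails.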

The error is long; here is the last part of it:

    21/22 [===========================>..] - ETA: 0s - loss: 0.7804 - acc: 0.5048
    2020-04-28 17:36:20.771950: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
             [[{{node embedding/embedding_lookup}}]]
    2020-04-28 17:36:20.778527: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: indices[5,12] = 11086 is not in [0, 11086)
             [[{{node embedding/embedding_lookup}}]]
             [[dense_1_target/_2]]
    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/DIRECTORY2/train.py", line 76, in <module>
        Train(args)
      File "/DIRECTORY2/train.py", line 28, in __init__
        Train.train(params.read(configs))
      File "/DIRECTORY2/train.py", line 69, in train
        verbose = 1)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
        steps_name='steps_per_epoch')
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
        batch_outs = batch_function(*batch_data)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
        outputs = self.train_function(ins)  # pylint: disable=not-callable
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3443, in __call__
        outputs = self._graph_fn(*converted_inputs)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 561, in __call__
        return self._call_flat(args)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 660, in _call_flat
        outputs = self._inference_function.call(ctx, args)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 434, in call
        ctx=ctx)
      File "/DIRECTORY/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,12] = 11086 is not in [0, 11086)
             [[{{node embedding/embedding_lookup}}]] [Op:__inference_keras_scratch_graph_3082]

I printed the permissions of both folders, and they appear to be identical!
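Mode bits are only part of a directory's metadata. Owner and group can also differ, for example after a copy from another machine. The sketch below uses only the standard `os` and `stat` modules to list the fields that differ between two directories. The helper names are hypothetical:

```python
import os
import stat


def dir_metadata(path):
    """Collect the metadata most likely to differ between two directories."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "uid": st.st_uid,                   # numeric owner id
        "gid": st.st_gid,                   # numeric group id
    }


def compare_dirs(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    meta_a, meta_b = dir_metadata(a), dir_metadata(b)
    return {k: (meta_a[k], meta_b[k]) for k in meta_a if meta_a[k] != meta_b[k]}


if __name__ == "__main__":
    # Folder names from the question; replace with the real paths.
    print(compare_dirs("tmp", "kaggle"))
```

An empty result means mode, owner and group all match.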

Edit (1):

The directory that causes the error was transferred to my Linux user from another Windows machine using WinSCP, while the other one (tmp) was created locally on Linux.

Edit (2):

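I deleted the directory that caused the error and created the same directory locally, and the error went away! I'm fairly sure the error is caused by the directory's permissions, but I don't know what the source of the problem is.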
