tensorflow - How to continue training an Inception model from a checkpoint in TensorFlow

Question

I have loaded a pre-trained Inception model:

    if FLAGS.pretrained_model_checkpoint_path:
        assert tf.gfile.Exists(FLAGS.pretrained_model_checkpoint_path)
        variables_to_restore = tf.get_collection(
            slim.variables.VARIABLES_TO_RESTORE)
        restorer = tf.train.Saver(variables_to_restore)
        restorer.restore(sess, FLAGS.pretrained_model_checkpoint_path)
        print('%s: Pre-trained model restored from %s' %
              (datetime.now(), FLAGS.pretrained_model_checkpoint_path))

and trained the model on my own data using flowers_train.py.

When training finished, the loss was about 1.0 and the model was saved in the specified directory.

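Now I want to continue training, so I restore the model:

    if FLAGS.checkpoint_dir is not None:
        # restoring from the checkpoint file
        ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
        tf.train.Saver().restore(sess, ckpt.model_checkpoint_path)

and continue training. However, the loss at the first step is about 6.5, which in practice means the model was not initialized from the checkpoint at all.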
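One quick sanity check before restoring is to confirm which checkpoint the directory actually points to. TensorFlow 1.x savers write a small text file named `checkpoint` next to the model files, and `tf.train.get_checkpoint_state` reads its `model_checkpoint_path` entry. The sketch below parses that file without TensorFlow; the helper names are hypothetical, and it assumes the plain, unescaped text layout that the saver normally writes.

```python
import os
import re
import tempfile


def latest_checkpoint_from_state_file(checkpoint_dir):
    """Return the path recorded as model_checkpoint_path, or None.

    Hypothetical helper: reads the text file named `checkpoint` that a
    TensorFlow 1.x Saver writes into the training directory.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file) as f:
        for line in f:
            # Matches only the "latest" entry, not all_model_checkpoint_paths.
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if match:
                path = match.group(1)
                # Relative entries are resolved against the checkpoint dir.
                if not os.path.isabs(path):
                    path = os.path.join(checkpoint_dir, path)
                return path
    return None


def step_from_checkpoint_path(path):
    """Extract the trailing step number from a name like model.ckpt-100."""
    match = re.search(r"-(\d+)$", os.path.basename(path))
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Build a throwaway directory that mimics a training directory.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "model.ckpt-100"\n')
        f.write('all_model_checkpoint_paths: "model.ckpt-100"\n')
    latest = latest_checkpoint_from_state_file(demo_dir)
    print(latest, step_from_checkpoint_path(latest))
```

If this returns `None`, or a path from a different run than expected, the restore is not loading the fine-tuned weights, whatever the rest of the training script does.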

Here is the full inception_train.py, which I modified from this inception_train.py.

The first training run was:

    bazel-bin/inception/flowers_train \
      --train_dir="{$TRAIN_DIR}" \
      --data_dir="{$DATA_DIR}" \
      --fine_tune=True \
      --initial_learning_rate=0.001 \
      --input_queue_memory_factor=1 \
      --batch_size=64 \
      --max_steps=100 \
      --pretrained_model_checkpoint_path="/home/tensorflow/inception-v3/model.ckpt-157585"

I tried to continue training with this command:

    bazel-bin/inception/flowers_train \
      --train_dir="{$TRAIN_NEW_DIR}" \
      --data_dir="{$DATA_DIR}" \
      --fine_tune=False \
      --initial_learning_rate=0.001 \
      --input_queue_memory_factor=1 \
      --batch_size=64 \
      --max_steps=2000 \
      --checkpoint_dir="{$TRAIN_DIR}"

Can anyone explain what I am doing wrong when initializing the trained model?

score 0 · Accepted Answer

I solved it by using the correct arg_scope, as follows:

    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
        logits, _ = inception_v3.inception_v3(
            eval_inputs, num_classes=1001, is_training=False)
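The arg_scope matters because, in slim's Inception V3, it configures the convolution layers (for example, attaching batch normalization), so the set of variables the graph creates depends on it. A way to check that the graph and the checkpoint agree is to diff their variable names and shapes. The sketch below is plain Python with a hypothetical helper name and illustrative variable lists. In a real TensorFlow 1.x session the two lists could probably be obtained from `tf.train.list_variables(checkpoint_path)` and from `[(v.op.name, v.shape.as_list()) for v in tf.global_variables()]`, but availability of those calls depends on the TensorFlow version, so treat that as an assumption.

```python
def diff_variables(checkpoint_vars, graph_vars):
    """Compare (name, shape) pairs from a checkpoint and from the graph.

    Hypothetical helper, not part of TensorFlow. Both arguments are
    iterables of (name, shape) pairs.
    """
    ckpt = {name: tuple(shape) for name, shape in checkpoint_vars}
    graph = {name: tuple(shape) for name, shape in graph_vars}
    shared = set(ckpt) & set(graph)
    return {
        # In the graph but absent from the checkpoint: cannot be restored.
        "missing_in_checkpoint": sorted(set(graph) - set(ckpt)),
        # Saved in the checkpoint but not created by the graph.
        "unused_in_graph": sorted(set(ckpt) - set(graph)),
        # Same name, different shape (for example a different num_classes).
        "shape_mismatch": sorted(n for n in shared if ckpt[n] != graph[n]),
    }


if __name__ == "__main__":
    # Illustrative names and shapes only, not read from a real checkpoint.
    checkpoint_vars = [
        ("InceptionV3/Conv2d_1a_3x3/weights", [3, 3, 3, 32]),
        ("InceptionV3/Conv2d_1a_3x3/BatchNorm/beta", [32]),
        ("InceptionV3/Logits/Conv2d_1c_1x1/weights", [1, 1, 2048, 1001]),
    ]
    graph_vars = [
        ("InceptionV3/Conv2d_1a_3x3/weights", [3, 3, 3, 32]),
        ("InceptionV3/Conv2d_1a_3x3/biases", [32]),
        ("InceptionV3/Logits/Conv2d_1c_1x1/weights", [1, 1, 2048, 6]),
    ]
    for kind, names in diff_variables(checkpoint_vars, graph_vars).items():
        print(kind, names)
```

An empty result in all three categories suggests the graph was built with the same scope and class count as the checkpoint; any entries point at the layers whose construction differs.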
