Out of memory in the call to tf.run() when running a model trained for object detection
2018-06-26 18:32:16.914049: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:17.393037: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.31GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:23.825495: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.31GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:24.659582: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.11GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:29.902840: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.20GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:30.955526: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:37.434223: W tensorflow/core/framework/op_kernel.cc:1328] OP_REQUIRES failed at where_op.cc:286 : Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: too many resources requested for launch
Is there some kind of model-training process that ensures the model won't need a large amount of RAM for inference?
Is there any way to convert my model so that it uses less memory?
I have tried some graph transforms, but they don't seem to do much. I also limited the GPU to 40% of memory, but that didn't help either.
I should have roughly 4 GB to 5 GB of memory available.
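For reference, a typical invocation of TensorFlow's Graph Transform tool to shrink a frozen graph looks like the sketch below. The file paths and the input/output node names are placeholders (the node names shown are the usual ones for Object Detection API models; substitute your own):

```shell
# Sketch: shrink a frozen graph with the Graph Transform tool.
# Requires the tool to be built first:
#   bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_inference_graph.pb \
  --out_graph=optimized_graph.pb \
  --inputs='image_tensor' \
  --outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
  --transforms='
    strip_unused_nodes
    fold_constants(ignore_errors=true)
    fold_batch_norms
    quantize_weights'
```

`quantize_weights` stores weights as 8-bit values, which shrinks the file on disk but is decompressed at load time, so it mainly helps model size rather than peak runtime memory.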
These are the main issues I think I might be running into:
1) The model was trained from Inception V3 rather than from a mobile model.
2) The images that were labeled and used for transfer learning were fairly large.
EDIT: This appears to be caused by poor memory allocation by TensorFlow and CUDA on the ARM architecture.
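On Jetson-class ARM boards the CPU and GPU share the same physical memory, so letting TensorFlow allocate GPU memory on demand instead of reserving a fixed fraction up front is often suggested for these allocator failures. A minimal TF 1.x configuration sketch (the session body is a placeholder):

```python
import tensorflow as tf

# Allocate GPU memory lazily as tensors are created, rather than
# pre-allocating a large pool; on shared-memory ARM boards this can
# avoid BFC-allocator "ran out of memory" warnings.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # load the frozen graph and run inference here
```

`allow_growth` can be combined with `per_process_gpu_memory_fraction` if you still want a hard cap on top of on-demand growth.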