Out of memory in the call to tf.run() when running a model trained for object detection
2018-06-26 18:32:16.914049: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:17.393037: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.31GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:23.825495: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.31GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:24.659582: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.11GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:29.902840: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.20GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:30.955526: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-06-26 18:32:37.434223: W tensorflow/core/framework/op_kernel.cc:1328] OP_REQUIRES failed at where_op.cc:286 : Internal: WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true / nonzero indices. temp_storage_bytes: 1, status: too many resources requested for launch
Is there some kind of model-training process that ensures the model won't need a large amount of RAM for inference?
Is there any way to convert my model so that it uses less memory?
I have tried some graph transforms, but they don't seem to do much. I also limited the GPU to 40% of memory, but that didn't help either.
I should have roughly 4 GB to 5 GB of memory available.
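For reference, a typical invocation of TensorFlow's Graph Transform tool to shrink a frozen graph looks like the sketch below. The file paths and the input/output node names are placeholders (the node names shown are the usual ones for Object Detection API models; substitute your own):

```shell
# Sketch: shrink a frozen graph with the Graph Transform tool.
# Requires the tool to be built first:
#   bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_inference_graph.pb \
  --out_graph=optimized_graph.pb \
  --inputs='image_tensor' \
  --outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
  --transforms='
    strip_unused_nodes
    fold_constants(ignore_errors=true)
    fold_batch_norms
    quantize_weights'
```

`quantize_weights` stores weights as 8-bit values, which shrinks the file on disk but is decompressed at load time, so it mainly helps model size rather than peak runtime memory.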
These are the main issues I think I might be running into:
1) The model was trained from Inception V3 rather than from a mobile model.
2) The images that were labeled and used for transfer learning were fairly large.
EDIT: This appears to be caused by poor memory allocation by TensorFlow and CUDA on the ARM architecture.
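On Jetson-class ARM boards the CPU and GPU share the same physical memory, so letting TensorFlow allocate GPU memory on demand instead of reserving a fixed fraction up front is often suggested for these allocator failures. A minimal TF 1.x configuration sketch (the session body is a placeholder):

```python
import tensorflow as tf

# Allocate GPU memory lazily as tensors are created, rather than
# pre-allocating a large pool; on shared-memory ARM boards this can
# avoid BFC-allocator "ran out of memory" warnings.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # load the frozen graph and run inference here
```

`allow_growth` can be combined with `per_process_gpu_memory_fraction` if you still want a hard cap on top of on-demand growth.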