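I am using Caffe, a framework for convolutional neural networks that runs on GPU (or CPU). It mostly uses CUDA 6.0. I am training a CNN on a large image dataset (ImageNet, 1.2 million images), which requires a lot of memory. For now, however, I am running small experiments on subsets of the original dataset (which also require a lot of memory). I am also working on a GPU cluster. This is the output of the command $ nvidia-smi: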

+------------------------------------------------------+                       
| NVIDIA-SMI 331.62     Driver Version: 331.62         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2050         Off  | 0000:08:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |   1585MiB /  2687MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2050         Off  | 0000:09:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2050         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2050         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M2050         Off  | 0000:16:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M2050         Off  | 0000:19:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M2050         Off  | 0000:1A:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M2050         Off  | 0000:1B:00.0     Off |                    0 |
| N/A   N/A    P1    N/A /  N/A |      6MiB /  2687MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     10242  ../../../build/tools/train_net.bin                  1577MiB |
+-----------------------------------------------------------------------------+

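But when I try to run multiple processes (for example, the same train_net.bin on different datasets), they fail because they all run on the same GPU (GPU 0 in the output above, while GPUs 1 to 7 sit idle). I would like to know how to force a process to use another GPU. I would appreciate any help.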
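For reference, one common way to pin a CUDA process to a specific device is the CUDA_VISIBLE_DEVICES environment variable, which the CUDA runtime reads at startup. The sketch below is only an illustration under that assumption: the helper names env_for_gpu and launch_on_gpu are hypothetical, and the demo launches a small child Python process instead of train_net.bin so that it runs anywhere. Note that a process restricted this way sees the chosen GPU as device 0, and the index numbering may not match the nvidia-smi ordering on every machine.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU.

    Hypothetical helper: it does not touch the GPU itself, it only sets
    CUDA_VISIBLE_DEVICES for a child process to inherit.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(command, gpu_index):
    """Start `command` (a list of arguments) restricted to one GPU."""
    return subprocess.Popen(command, env=env_for_gpu(gpu_index))


if __name__ == "__main__":
    # Stand-in child process that prints what it was given. In practice,
    # replace this list with the training command and its own arguments,
    # e.g. the ../../../build/tools/train_net.bin binary from the question.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    # One process per idle GPU (1 and 2 are idle in the nvidia-smi output).
    procs = [launch_on_gpu(child, gpu) for gpu in (1, 2)]
    for proc in procs:
        proc.wait()
```

The same effect is available directly from the shell by prefixing the command with the variable assignment, e.g. CUDA_VISIBLE_DEVICES=1 followed by the usual training command.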