
I have access to an HPC node with 3 GPUs and a maximum of 38 CPUs. I have a transformer model that I currently run on a single GPU, and I want to utilize all the GPUs and CPUs. I have seen a couple of tutorials on DataParallel and DistributedDataParallel, but they only mention how to use multiple GPUs.

My questions are:

  1. Should I use DataParallel or DistributedDataParallel?
  2. How do I adapt my code to run on the GPUs and CPUs simultaneously? A tutorial link would be appreciated.
  3. How do I get the device IDs?
Fhunmie

1 Answer

  1. I used DistributedDataParallel. According to the PyTorch documentation, DataParallel is usually slower than DistributedDataParallel, so DistributedDataParallel is recommended; it also works for both single- and multi-machine training (see the minimal sketch after this list).

  2. Tutorial: Comparison between DataParallel and DistributedDataParallel

  3. Another tutorial: Multi-GPU Examples

  4. Solution: LITDataScience's answer to "How to find the nvidia GPU IDs for pytorch cuda run setup?" (a short device-ID sketch also follows at the end of this answer).
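
Here is a minimal single-node DistributedDataParallel sketch, assuming one process per GPU. `MyModel` and `MyDataset` are placeholders for your own model and dataset, and the batch size, worker count, and epoch count are only illustrative. The `num_workers` argument of the `DataLoader` is how the spare CPU cores are typically put to work (for data loading) alongside the GPUs:

```python
# Minimal single-node DDP sketch; MyModel/MyDataset are placeholders,
# hyperparameters are illustrative only.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train(rank, world_size):
    # One process per GPU: rank identifies this process, world_size = number of GPUs.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = MyModel().to(rank)                  # placeholder: your transformer
    model = DDP(model, device_ids=[rank])

    dataset = MyDataset()                       # placeholder: your dataset
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # num_workers > 0 uses extra CPU cores in each process for data loading.
    loader = DataLoader(dataset, batch_size=32, sampler=sampler, num_workers=8)

    optimizer = torch.optim.Adam(model.parameters())
    for epoch in range(10):
        sampler.set_epoch(epoch)                # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(rank), y.to(rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()      # 3 on this node
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```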
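
For the device IDs specifically, a short sketch of what PyTorch exposes (assuming CUDA is visible on the node):

```python
# List the CUDA device IDs and names PyTorch can see.
import torch

print(torch.cuda.is_available())             # True if CUDA GPUs are visible
print(torch.cuda.device_count())             # e.g. 3 on this node
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))  # integer device ID and GPU model name
```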

Lynn