Try to reduce CPU-GPU data transfers during training.
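A minimal PyTorch sketch of the idea (the model and batch shapes here are made up for illustration): stage host data in pinned memory, copy it once with `non_blocking=True`, and keep everything on the device for the whole step instead of bouncing tensors back to the CPU.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

batch = torch.randn(32, 64)           # host-side batch (hypothetical shape)
if device == "cuda":
    batch = batch.pin_memory()        # page-locked memory allows async H2D copy
batch = batch.to(device, non_blocking=True)

model = torch.nn.Linear(64, 10).to(device)  # weights stay on the device
out = model(batch)                    # avoid intermediate .cpu()/.item() calls
```

In a real training loop the same effect comes from `DataLoader(..., pin_memory=True)` and doing all logging/metric reads at the end of the step rather than per-kernel.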
Try Nsight Systems to profile one iteration (both forward and backward) and see whether there are many idle gaps between GPU kernels. Idle gaps mean GPU utilization is low because many operations are being done on the CPU side.
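A sketch of the profiler invocation, assuming `nsys` is on your PATH and `train.py` (a hypothetical script name) runs a single iteration:

```shell
# Capture CUDA kernels, NVTX ranges, and OS runtime calls for one iteration
nsys profile -o one_iter --trace=cuda,nvtx,osrt python train.py
# Open one_iter.nsys-rep in the Nsight Systems GUI and look for gaps
# between kernel launches on the CUDA timeline.
```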
If you are using TensorFlow you can enable XLA to accelerate training. PyTorch has a similar DL compiler path (torch.compile). You can also enable AMP (automatic mixed precision / FP16) to accelerate.
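A hedged sketch of AMP in PyTorch via `torch.autocast` (the toy linear model and shapes are assumptions; FP16 is used on CUDA, BF16 on CPU since CPU autocast does not support FP16):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(64, 10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(32, 64, device=device)
y = torch.randn(32, 10, device=device)

amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = torch.nn.functional.mse_loss(model(x), y)  # forward in mixed precision
loss.backward()  # backward outside the autocast region, as recommended
opt.step()
```

For real FP16 training on CUDA you would normally pair this with a `torch.cuda.amp.GradScaler` to avoid gradient underflow; it is omitted here to keep the sketch short.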
Kon-kkk t1_iuj7nyz wrote
Reply to [D] When the GPU is NOT the bottleneck...? by alexnasla