Strong scaling seems to degrade performance, i.e., it takes longer to reach the same model performance even though more GPUs are being used.
Mini-batch training on even one GPU converges significantly faster.