We recently ran a small internal benchmark of scvi-tools speed on two GPUs, and the H100 was considerably slower than the A100 on the same task with the same code.
Have you seen something similar?
Do you have any ideas about what might contribute to this or how to investigate?
Thanks. If you don’t mind, could you try one of the built-in Lightning profilers to see where the bottleneck is? You can pass it directly to the train method. Feel free to report the results here and I can take a look.
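In case it helps, here is a rough sketch of what I mean. It assumes the train method forwards extra keyword arguments to the Lightning Trainer, so that a profiler name such as "simple" can be passed as `profiler`. The helper names `profiler_train_kwargs` and `profile_scvi_training` are made up for illustration, and `adata` stands for whatever AnnData object your benchmark already uses.

```python
# Profiler names that Lightning's Trainer accepts as plain strings.
LIGHTNING_PROFILERS = {"simple", "advanced", "pytorch"}


def profiler_train_kwargs(kind="simple", max_epochs=5):
    """Build keyword arguments for a short profiled run (hypothetical helper)."""
    if kind not in LIGHTNING_PROFILERS:
        raise ValueError(f"unknown profiler: {kind!r}")
    # A few epochs are usually enough to see where the time goes.
    return {"max_epochs": max_epochs, "profiler": kind}


def profile_scvi_training(adata, kind="simple"):
    """Train an SCVI model briefly with a Lightning profiler attached.

    Assumption: model.train(...) passes `profiler` through to the Trainer.
    """
    import scvi  # imported lazily so the helper above works without scvi-tools

    scvi.model.SCVI.setup_anndata(adata)
    model = scvi.model.SCVI(adata)
    model.train(**profiler_train_kwargs(kind))
    return model
```

Running `profile_scvi_training(adata)` once on each GPU and comparing the two profiler summaries should show whether the extra time is spent in data loading, the training step itself, or somewhere else.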
Also, what is your setup like? Are both GPUs connected to the same motherboard, or are they on different nodes? If they’re on different nodes, do they have different CPUs/data interconnects?
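If it is easier than describing the machines by hand, something like the following could be run on each node and pasted here. It is only a sketch: `describe_setup` is a hypothetical helper, it reports what Python and PyTorch can see (host, CPU, PyTorch and CUDA versions, GPU names), and it says nothing about the interconnect itself.

```python
import os
import platform


def describe_setup():
    """Collect basic host and GPU details for a benchmark report (hypothetical helper)."""
    info = {
        "hostname": platform.node(),
        # platform.processor() can be empty on some systems, so fall back to the architecture.
        "cpu": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),
        "gpus": [],
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch we can only report the host details.
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_setup().items():
        print(f"{key}: {value}")
```

If the two reports differ in more than the GPU name (for example a different CPU or PyTorch/CUDA version), those differences are worth ruling out before blaming the GPU.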