scvi-tools performance on NVIDIA H100 vs A100

We recently ran a small internal benchmark of scvi-tools training speed on two GPUs, and the H100 was considerably slower than the A100 on the same task with the same code.
Have you seen something similar?
Do you have any ideas about what might cause this or how to investigate it?

I used scvi-tools==1.0.4 with torch==2.1.1

Thank you

Hi, would you be able to share your benchmark code?

Sure, here it is. I used our internal dataset, but I'm pretty sure it's reproducible with PBMC 3k.

import time

import scanpy as sc
import scvi
import torch

# Load the dataset (internal; path elided)
adata = sc.read_h5ad('…')

# Select the top 1,000 highly variable genes, batch-aware per patient
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat_v3",
    n_top_genes=1000,
    subset=True,
    batch_key="Patient",
)

# Register raw counts, the patient batch, and the chemistry covariate
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    batch_key="Patient",
    categorical_covariate_keys=["Chemistry"],
)

model = scvi.model.SCVI(adata, n_layers=2, dropout_rate=0.2, n_latent=10)

# Time a fixed 100-epoch run with early stopping disabled
train_start = time.time()
model.train(
    max_epochs=100,
    use_gpu=True,
    check_val_every_n_epoch=2,
    early_stopping=False,
)
print(f'Training on {torch.cuda.get_device_name()} took {time.time() - train_start:.1f}s')
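
To run it on PBMC 3k instead of our internal data, something like the following stand-in should work ("Patient" and "Chemistry" don't exist in that dataset, so I'd add dummy single-level columns):

import scanpy as sc

adata = sc.datasets.pbmc3k()                # public 10x PBMC 3k raw counts
adata.layers["counts"] = adata.X.copy()     # setup_anndata above expects a "counts" layer
adata.obs["Patient"] = "patient_0"          # dummy batch column
adata.obs["Chemistry"] = "v2"               # dummy covariate column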

Thanks. If you don’t mind, could you try one of the built-in Lightning profilers to see where the bottleneck is? You can pass it directly to the train method. Feel free to report the results here and I can take a look.
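
I believe extra keyword arguments to train are forwarded to the underlying Lightning Trainer, so the string shortcut for the simple profiler should work. A sketch:

model.train(
    max_epochs=10,            # a few epochs are enough for profiling
    use_gpu=True,
    check_val_every_n_epoch=2,
    early_stopping=False,
    profiler="simple",        # forwarded to the Lightning Trainer; prints per-hook timings at the end of the run
)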

Also, what is your setup like? Are both GPUs connected to the same motherboard, or are they on different nodes? If they’re on different nodes, do they have different CPUs/data interconnects?
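
A quick way to capture the relevant host info on each machine (just a diagnostic sketch, adjust as needed):

import os
import platform

import torch

print("host:", platform.node())
print("cpu:", platform.processor(), "| cores:", os.cpu_count())
print("gpu:", torch.cuda.get_device_name())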

Also, which CUDA version was your torch build compiled against?
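
The H100 is compute capability 9.0 (sm_90), which needs a CUDA 11.8+ build; if the installed wheel predates sm_90 support, kernels may be missing or JIT-compiled from PTX, which can hurt performance. You can check what your install was built with:

import torch

print(torch.__version__)                    # e.g. "2.1.1+cu118" or "2.1.1+cu121"
print(torch.version.cuda)                   # CUDA toolkit the build targets
print(torch.cuda.get_device_capability())   # (9, 0) on H100, (8, 0) on A100
print(torch.cuda.get_arch_list())           # architectures compiled into the build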