Error when training model on M3 Max MPS

Hi there,

I ran into an error when trying to train a model on an M3 Max MacBook Pro using the MPS backend. I'm using Python 3.10 with a nightly PyTorch build:

ValueError: Expected parameter loc (Tensor of shape (128, 10)) of distribution Normal(loc: torch.Size([128, 10]), scale: torch.Size([128, 10])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], device='mps:0',
       grad_fn=)
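The message itself comes from `torch.distributions` argument validation: `Normal` checks that `loc` satisfies the `real` constraint, and NaN fails that check. A minimal sketch with toy tensors (not the model's actual encoder output) showing that the distribution only reports NaNs that were already produced earlier in the forward pass:

```python
import torch
from torch.distributions import Normal

# Toy stand-ins for the encoder's mean/scale outputs (shape matches the traceback)
loc = torch.zeros(128, 10)
scale = torch.ones(128, 10)

# Finite parameters: constructing the distribution succeeds
ok = Normal(loc, scale, validate_args=True)

# A single NaN in loc is enough to trigger the same ValueError
loc_bad = loc.clone()
loc_bad[0, 0] = float("nan")
try:
    Normal(loc_bad, scale, validate_args=True)
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

print(raised)  # True
```

So the real question is which upstream computation turns the values into NaN, not the `Normal` call itself.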

I would like to know whether this issue is caused by the nightly (pre-release) PyTorch build or by something wrong in my script. I am new to single-cell sequencing (sc-seq) analysis in Python, so I apologize in advance if this question is trivial.
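One way to separate a backend problem from a script problem is to find the first layer whose output becomes non-finite, then rerun the same input on the CPU. A minimal sketch using PyTorch forward hooks; the toy `nn.Sequential` encoder and the `install_nan_hooks` helper are hypothetical stand-ins (not part of scvi-tools), and the NaN is injected deliberately just to show the hook firing:

```python
import torch
from torch import nn


def install_nan_hooks(model: nn.Module, report: list) -> list:
    """Register forward hooks that record the names of modules whose output has NaN/Inf."""
    handles = []
    for name, module in model.named_modules():
        if name == "":
            continue  # skip the root container; only inspect its children

        def hook(mod, inputs, output, name=name):  # bind name now, not at call time
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                report.append(name)

        handles.append(module.register_forward_hook(hook))
    return handles


# Hypothetical toy encoder standing in for the real model's encoder
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
encoder = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 10)).to(device)

bad_layers = []
handles = install_nan_hooks(encoder, bad_layers)

x = torch.randn(128, 20, device=device)
x[0, 0] = float("nan")  # deliberate NaN so the hooks have something to report
out = encoder(x)

for h in handles:
    h.remove()  # detach the hooks once the diagnosis is done

print(bad_layers[0])  # name of the first layer that produced non-finite values
```

With a real scvi-tools model, the hooks would go on its underlying `torch.nn.Module` (assumed here to be the model's `module` attribute); if the reported layer differs between `mps` and `cpu` on the same input, that points at the backend rather than the script.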

Thank you for your time!

Hi, we’ve had some issues with NaNs in MultiVI and gimVI. Are you using either of those models by chance? If so, I’d try downgrading to scvi-tools==0.20.3 and checking whether the issue persists.
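After downgrading (for example with `pip install scvi-tools==0.20.3`), it is worth confirming which versions the running interpreter actually picks up, since a notebook kernel can point at a different environment than the shell you installed into. A small stdlib-only sketch; `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the packages relevant to this thread as seen by the current interpreter
for name in ("scvi-tools", "torch"):
    print(name, installed_version(name) or "not installed")
```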

I’ll also note that our models are not fully compatible with the PyTorch MPS backend, since the lgamma kernel is not yet supported there. Based on the traceback, though, that doesn’t seem to be the source of the error here.
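PyTorch offers the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable, which makes operators without an MPS kernel run on the CPU instead of raising an error; it must be set before `torch` is imported. A minimal sketch using `torch.lgamma` as the example op. Whether `lgamma` still lacks an MPS kernel depends on your PyTorch version, and this fallback does not address the NaN error in this thread:

```python
import os

# Must be set before torch is imported, otherwise it has no effect
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.linspace(0.5, 5.0, steps=10)

# With the fallback enabled, an op missing on MPS is computed on the CPU
y = torch.lgamma(x.to(device)).cpu()
reference = torch.lgamma(x)
print(torch.allclose(y, reference, atol=1e-4))
```

The fallback trades speed for coverage: every unsupported op copies data between the GPU and CPU.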

Thank you for the response! I am only using scvi.model.SCVI, and I do not get any errors when I train on the CPU instead. Also, scvi-tools==0.20.3 gives the same error as before. I suppose this is an MPS compatibility issue 🙁
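Since training on the CPU works, a conservative choice is to default to the CPU and opt into MPS only explicitly. A minimal sketch; `pick_device` is a hypothetical helper, not an scvi-tools function:

```python
import torch


def pick_device(allow_mps: bool = False) -> torch.device:
    """Prefer CUDA, use MPS only when explicitly allowed, otherwise fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if allow_mps and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
print(device)
```

How the device is passed to scvi-tools depends on the version: it is assumed here that `SCVI.train` takes `use_gpu` in the 0.20 series and `accelerator` (e.g. `accelerator="cpu"`) in 1.x, so check the `train` docstring of your installed version.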

Thanks again for your time!

Sorry about that! We’re also looking forward to PyTorch adding full MPS support, and we’ll keep testing on our end to see when it becomes available.

I have the same issue.

Please refer to the following thread: "Macbook M1 M2 mps acceleration with scVI" (post #4 by martinkim0).