Using low-precision matrix multiplication to boost performance

How do I track training so that I can tell whether switching to low precision worked? What would you look for?
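One way to frame "did it work" is to train the same model twice, once at default precision and once with lower-precision matmuls. In PyTorch that switch is usually `torch.set_float32_matmul_precision("high")` or `"medium"`. Then compare three things: does the low-precision loss stay finite (no NaN/inf), do the per-epoch loss curves track each other closely, and how much wall-clock time did you save. The sketch below assumes you have already pulled the per-epoch training loss out of each run as a plain list. In scvi-tools this would likely come from something like `model.history["elbo_train"]`, but check the keys your version exposes. `compare_runs` and `RunComparison` are hypothetical helpers, not part of any library.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunComparison:
    """Summary of a baseline run vs. a low-precision run (hypothetical helper)."""
    diverged: bool         # True if the low-precision run produced NaN/inf losses
    final_rel_diff: float  # relative gap in the last-epoch loss
    max_rel_diff: float    # largest relative gap over all epochs
    speedup: float         # baseline time / low-precision time (>1 means faster)


def compare_runs(
    baseline_loss: Sequence[float],
    lowprec_loss: Sequence[float],
    baseline_seconds: float,
    lowprec_seconds: float,
) -> RunComparison:
    """Compare per-epoch losses and timings of two otherwise identical runs."""
    if not baseline_loss or len(baseline_loss) != len(lowprec_loss):
        raise ValueError("need two non-empty loss histories of equal length")

    # A low-precision run that blows up is the clearest sign it "didn't work".
    diverged = not all(math.isfinite(x) for x in lowprec_loss)

    # Relative difference per epoch; guard against a zero baseline loss.
    rel = [
        abs(low - base) / max(abs(base), 1e-12)
        for base, low in zip(baseline_loss, lowprec_loss)
    ]

    return RunComparison(
        diverged=diverged,
        final_rel_diff=rel[-1],
        max_rel_diff=max(rel),
        speedup=baseline_seconds / lowprec_seconds,
    )


if __name__ == "__main__":
    # Made-up numbers purely to show the call shape.
    result = compare_runs([1000.0, 800.0, 700.0], [1001.0, 802.0, 707.0], 120.0, 80.0)
    print(result)
```

A run that stays finite, ends within a small relative gap of the baseline loss, and trains noticeably faster is a reasonable sign the switch paid off. What counts as "small" (say 1%) is your own call. It is also worth checking that downstream results, such as the latent embedding or the integration quality, look comparable, since a close loss does not guarantee that.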

Also, in the thread "Increase scVI integration speed" (post #2 by Valentine_Svensson), @Valentine_Svensson didn't see any significant speed-up from changing batch_size. Is that still the case?