To answer your question: you need to train the model on the paired and single-modality data together. Once the model is trained, you can call the routines that calculate differential accessibility/expression.
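In case it helps, here is a rough sketch of that order of operations. I am assuming the model in question is `scvi.model.MULTIVI` and that `adata_mvi` was already combined with `scvi.data.organize_multiome_anndatas` (which adds a `modality` column to `obs`). The function name `train_and_compare` and its arguments are made up for illustration, so please check everything against the scvi-tools docs for your version.

```python
def train_and_compare(adata_mvi, n_genes, n_regions, groupby, group1, group2):
    """Train once on paired + single-modality cells, then run both tests.

    Hypothetical helper: assumes the model is MULTIVI and that
    adata_mvi.obs has a "modality" column.
    """
    import scvi  # imported here so the sketch can be defined without scvi installed

    # Register the combined AnnData, using modality as the batch covariate.
    scvi.model.MULTIVI.setup_anndata(adata_mvi, batch_key="modality")

    # The model needs to know how many features are genes vs. regions.
    model = scvi.model.MULTIVI(adata_mvi, n_genes=n_genes, n_regions=n_regions)
    model.train()

    # Both routines are only meaningful after training has finished.
    de = model.differential_expression(groupby=groupby, group1=group1, group2=group2)
    da = model.differential_accessibility(groupby=groupby, group1=group1, group2=group2)
    return model, de, da
```

The point is only the ordering: set up, train on everything together, then call the differential routines on the trained model.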
We will be updating the tutorial soon, as we have many new features now. Thanks for asking this question! We will try to clarify it in the tutorial.
Please let me know if it is clear now!
Regarding your other questions: I haven’t used this particular model, but the tutorial takes 36 minutes to train on 12k cells with a GPU. So if you have ~120k cells and are also using a GPU, it sounds reasonable. Without a GPU, training will be slower (I don’t know how much slower).
With the RNA-seq scVI models, my experience is that you can use fewer epochs when you have more cells, which cuts down training time. My typical workflow is to run quick training with fewer epochs while I explore hyperparameters/settings. Then, when I think I understand the variation in the data, I start a longer training run and save the model so I can just load it when I want to use it for some analysis in the future.
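To make the "fewer epochs for more cells" idea concrete, here is a small sketch. The rule itself is my assumption, not something taken from the tutorial: it scales epochs inversely with the number of cells relative to a reference size and caps the result. The name `suggest_max_epochs` and the reference values (20,000 cells, 400 epochs) are placeholders you should tune.

```python
def suggest_max_epochs(n_cells, reference_cells=20_000, reference_epochs=400):
    """Hypothetical rule of thumb: scale epochs inversely with dataset size.

    Keeps the total number of cells seen during training roughly constant
    for large datasets, and caps small datasets at reference_epochs.
    """
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    scaled = round(reference_epochs * reference_cells / n_cells)
    # Never go below one epoch or above the cap.
    return max(1, min(reference_epochs, scaled))


# Compare a tutorial-sized dataset with one ten times larger.
for n in (12_000, 120_000):
    print(n, suggest_max_epochs(n))
```

The resulting number can be passed as `max_epochs` to `model.train(...)`. For the quick exploratory runs I would go lower still, and only use the full value for the run I intend to save.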
Regarding the UMAP: UMAP training is non-deterministic. In the tutorial a manual random seed is set for scvi, but it doesn’t seem like one is set for the UMAP training. The resulting UMAP will therefore look different from run to run, but the general structure in the plot (number of clusters, overlap between labels, etc.) should be consistent between UMAP runs.
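If you want the embedding to be as repeatable as possible, you can pass the seed explicitly. This sketch assumes you are using scanpy and that the latent representation is stored in `adata.obsm` under a key such as `"X_latent"` (a placeholder name, as is the helper `seeded_umap`).

```python
def seeded_umap(adata, latent_key="X_latent", seed=0):
    """Compute neighbors and UMAP on a latent space with an explicit seed.

    Hypothetical helper: latent_key is a placeholder for wherever the
    model's latent representation was stored in adata.obsm.
    """
    import scanpy as sc  # imported here so the sketch can be defined without scanpy installed

    # The neighbor graph is approximate, so seed it as well.
    sc.pp.neighbors(adata, use_rep=latent_key, random_state=seed)
    # Seed the UMAP layout optimisation.
    sc.tl.umap(adata, random_state=seed)
    return adata.obsm["X_umap"]
```

Even with fixed seeds, the picture can still differ across machines or library versions, and it will change whenever the latent space itself changes (for example after retraining the model).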