Suggestion on parameters for training scvi model

Thank you very much for your fast reply! This helped me a lot.

I also have another question: I would like to integrate my samples but instead of using only the top 2000 highly variable genes, I would like to use all the genes.
However, the batches do not all contain the same set of genes, so when creating the concatenated matrix I would do:

import os
import anndata as ad

anndata_dir = 'C:/Users/Martina/Desktop/CAR-T Atlas Data/AnnData'
list_files = os.listdir(anndata_dir)
anndata_list = []

# Read one AnnData object per file in the directory
for filename in list_files:
    file_path = os.path.join(anndata_dir, filename)
    anndata_obj = ad.read_h5ad(file_path)
    anndata_list.append(anndata_obj)

# The outer join keeps the union of genes across all batches
concatenated_anndata = ad.concat(anndata_list, axis=0, join='outer')

so that I keep all the genes; for cells from batches that lack a given gene, the ad.concat function adds 0 counts.

I would like to do this in order to better integrate the data taking into account all the possible variability and then run the differential gene expression over all the genes to better characterise all the cells.

Would you recommend doing this? Does it remove batch effects efficiently? Or should I use a different approach? If so, which approach would you recommend?

What makes me doubt this approach is the differential gene expression step. Please correct me if I'm wrong, but I would think that cells from batches lacking some genes, which get padded with zeros by the ad.concat function through the join='outer' setting, would bias the DGE results: I added those 0 counts myself, and the genes might actually be expressed, but I don't have that information.

Thank you so much for your help!!