Hi to everyone and thank you in advance for any help. I’m relatively new to bioinformatic and I’m focusing to learn scRNA analysis. I have some problems of DEG results’ inconsistency after loading a saved .h5ad file. These are my scripts to load and integrate the dataset:
adata = sc.read('./Peng/output/Peng_concatenated.h5ad') #first save the raw data, not normalized, not log in the adata layer called 'counts', that will be used by SCVI adata.layers['counts']= adata.X.copy() #normalize in a way that each cell has a total count of 10000 UMI sc.pp.normalize_total(adata, target_sum=1e4) #convert to log sc.pp.log1p(adata) #freeze data as it is now adata.raw = adata #integration based on sample AND Condition as categorical keys scvi.model.SCVI.setup_anndata(adata, layer = "counts", categorical_covariate_keys=["Sample","Condition"], #you can add "batch" for different bachtes or "technology" for different technology sequencing continuous_covariate_keys=['pct_counts_mt', 'total_counts', 'pct_counts_ribo']) model = scvi.model.SCVI(adata) model.train() model.get_latent_representation() #output of the model adata.obsm['X_scVI'] = model.get_latent_representation() #we will use the model to cluster, save intoadata.obsm adata.layers['scvi_normalized'] = model.get_normalized_expression(library_size = 1e4) #normalized data, save into adata.layer to not damage original raw data
Then I cluster using scvi model:
sc.pp.neighbors(adata, use_rep = 'X_scVI') #calculate neighbours on scvi model sc.tl.umap(adata) sc.tl.leiden(adata, resolution = 1) sc.pl.umap(adata, color = ['leiden', 'Sample','Condition'], frameon = False)
Then I save my integrated and cluster dataset using:
The problem arises here: if I go on and calculate differential expression between clusters in the same session I obtain some results:
markers_scvi=model.differential_expression(groupby = 'leiden') #get DEG but using scvi markers_scvi = markers_scvi[(markers_scvi['is_de_fdr_0.05']) & (markers_scvi.lfc_mean > .5)]
If I close the session and load again the clustered dataset that I saved before and run again differential expression, I get other results (similar, but not identical).
That’s the code:
adata = sc.read('./Peng/output/Peng_concatenated_clustered_NEW_copy.h5ad') sc.pl.umap(adata, color = ['leiden', 'Sample','Condition'], frameon = False) #clustering works perfectly model=scvi.model.SCVI.load('model.Peng_all_cells_all_genes_NEW', adata) markers_scvi=model.differential_expression(groupby = 'leiden') #get DEG but using scvi
I get some of the same genes with slightly different values but also other genes.
Anyone has any idea about what I’m doing wrong when I’m loading the saved h5ad file? I think maybe I’m missing some normalization step or maybe something else related to the raw slot…
Thank you in advance for any help