Different DEG results after loading .h5ad

Hi to everyone and thank you in advance for any help. I’m relatively new to bioinformatic and I’m focusing to learn scRNA analysis. I have some problems of DEG results’ inconsistency after loading a saved .h5ad file. These are my scripts to load and integrate the dataset:

adata = sc.read('./Peng/output/Peng_concatenated.h5ad')
#first save the raw data, not normalized, not log in the adata layer called 'counts', that will be used by SCVI
adata.layers['counts']= adata.X.copy()
#normalize in a way that each cell has a total count of 10000 UMI
sc.pp.normalize_total(adata, target_sum=1e4)
#convert to log
#freeze data as it is now
adata.raw = adata

#integration based on sample AND Condition as categorical keys

scvi.model.SCVI.setup_anndata(adata, layer = "counts",
                             categorical_covariate_keys=["Sample","Condition"], #you can add "batch" for different bachtes or "technology" for different technology sequencing
                             continuous_covariate_keys=['pct_counts_mt', 'total_counts', 'pct_counts_ribo'])
model = scvi.model.SCVI(adata)
model.get_latent_representation() #output of the model
adata.obsm['X_scVI'] = model.get_latent_representation() #we will use the model to cluster, save intoadata.obsm
adata.layers['scvi_normalized'] = model.get_normalized_expression(library_size = 1e4) #normalized data, save into adata.layer to not damage original raw data

Then I cluster using scvi model:

sc.pp.neighbors(adata, use_rep = 'X_scVI') #calculate neighbours on scvi model
sc.tl.leiden(adata, resolution = 1)
sc.pl.umap(adata, color = ['leiden', 'Sample','Condition'], frameon = False)

Then I save my integrated and cluster dataset using:


The problem arises here: if I go on and calculate differential expression between clusters in the same session I obtain some results:

markers_scvi=model.differential_expression(groupby = 'leiden') #get DEG but using scvi
markers_scvi = markers_scvi[(markers_scvi['is_de_fdr_0.05']) & (markers_scvi.lfc_mean > .5)]

If I close the session and load again the clustered dataset that I saved before and run again differential expression, I get other results (similar, but not identical).
That’s the code:

adata = sc.read('./Peng/output/Peng_concatenated_clustered_NEW_copy.h5ad')
sc.pl.umap(adata, color = ['leiden', 'Sample','Condition'], frameon = False) #clustering works perfectly
model=scvi.model.SCVI.load('model.Peng_all_cells_all_genes_NEW', adata)
markers_scvi=model.differential_expression(groupby = 'leiden') #get DEG but using scvi

I get some of the same genes with slightly different values but also other genes.

Anyone has any idea about what I’m doing wrong when I’m loading the saved h5ad file? I think maybe I’m missing some normalization step or maybe something else related to the raw slot…

Thank you in advance for any help

1 Like