Convert Scanpy (h5ad) to Seurat (rds)

Hi Everyone,

I am trying to convert my h5ad to a Seurat rds so that I can run R-based pseudotime algorithms (monocle, slingshot, etc.). However, I keep running into errors with the commonly posted methods. Does anyone have any advice or experience on how to effectively read a scanpy h5ad in R?



I’ve had luck converting Seurat objects to AnnData objects in memory using sceasy::convertFormat, as demonstrated in our R tutorial "Integrating datasets with scVI in R" (scvi-tools). You could try using it in the inverse direction via the from and to arguments.

P.S. Would be best to categorize this kind of question in the future under the “AnnData” tag.

Hi @Justin_Hong , thank you for the tip! I found a lot of good information from the link you provided.

I have run into an error, though. Have you seen this one before?

ad <- anndata::read_h5ad('Results/celltype_assigned_raw.h5ad')
sceasy::convertFormat(ad, from="anndata", to="seurat", outFile='file.rds')

Error in path.expand(inFile): invalid ‘path’ argument

  1. sceasy::convertFormat(ad, from = "anndata", to = "seurat", outFile = "file.rds")
  2. func(obj, outFile = outFile, main_layer = main_layer, ...)
  3. path.expand(inFile)

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] sp_1.4-7 SeuratObject_4.1.0
[3] Seurat_4.1.1 anndata_0.7.5.3
[5] sceasy_0.0.6 reticulate_1.25
[7] MAST_1.22.0 plyr_1.8.7
[9] clusterExperiment_2.16.0 gam_1.20.1
[11] foreach_1.5.2 monocle_2.24.0
[13] DDRTree_0.1.5 irlba_2.3.5
[15] VGAM_1.1-6 ggplot2_3.3.6
[17] Matrix_1.4-1 slingshot_2.4.0
[19] TrajectoryUtils_1.4.0 princurve_2.1.6
[21] RColorBrewer_1.1-3 scran_1.24.0
[23] scuttle_1.6.0 SingleCellExperiment_1.18.0
[25] SummarizedExperiment_1.26.1 Biobase_2.56.0
[27] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
[29] IRanges_2.30.0 S4Vectors_0.34.0
[31] BiocGenerics_0.42.0 MatrixGenerics_1.8.0
[33] matrixStats_0.62.0 jsonlite_1.8.0
[35] formatR_1.12

loaded via a namespace (and not attached):
[1] pbdZMQ_0.3-7 scattermore_0.8
[3] pkgmaker_0.32.2 tidyr_1.2.0
[5] bit64_4.0.5 DelayedArray_0.22.0
[7] rpart_4.1.16 data.table_1.14.2
[9] KEGGREST_1.36.0 RCurl_1.98-1.6
[11] doParallel_1.0.17 generics_0.1.2
[13] ScaledMatrix_1.4.0 leidenbase_0.1.11
[15] cowplot_1.1.1 RSQLite_2.2.14
[17] RANN_2.6.1 combinat_0.0-8
[19] future_1.25.0 bit_4.0.4
[21] phylobase_0.8.10 spatstat.data_2.2-0
[23] xml2_1.3.3 httpuv_1.6.5
[25] assertthat_0.2.1 viridis_0.6.2
[27] hms_1.1.1 evaluate_0.15
[29] promises_1.2.0.1 fansi_1.0.3
[31] progress_1.2.2 igraph_1.3.1
[33] DBI_1.1.2 htmlwidgets_1.5.4
[35] sparsesvd_0.2 spatstat.geom_2.4-0
[37] purrr_0.3.4 ellipsis_0.3.2
[39] dplyr_1.0.9 annotate_1.74.0
[41] gridBase_0.4-7 deldir_1.0-6
[43] locfdr_1.1-8 sparseMatrixStats_1.8.0
[45] vctrs_0.4.1 here_1.0.1
[47] ROCR_1.0-11 abind_1.4-5
[49] cachem_1.0.6 withr_2.5.0
[51] progressr_0.10.0 sctransform_0.3.3
[53] prettyunits_1.1.1 goftest_1.2-3
[55] softImpute_1.4-1 cluster_2.1.3
[57] ape_5.6-2 IRdisplay_1.1
[59] lazyeval_0.2.2 crayon_1.5.1
[61] genefilter_1.78.0 edgeR_3.38.1
[63] pkgconfig_2.0.3 slam_0.1-50
[65] nlme_3.1-157 rlang_1.0.2
[67] globals_0.15.0 lifecycle_1.0.1
[69] miniUI_0.1.1.1 registry_0.5-1
[71] rsvd_1.0.5 rprojroot_2.0.3
[73] polyclip_1.10-0 lmtest_0.9-40
[75] rngtools_1.5.2 IRkernel_1.3
[77] Rhdf5lib_1.18.0 zoo_1.8-10
[79] base64enc_0.1-3 ggridges_0.5.3
[81] pheatmap_1.0.12 png_0.1-7
[83] viridisLite_0.4.0 bitops_1.0-7
[85] rncl_0.8.6 KernSmooth_2.23-20
[87] rhdf5filters_1.8.0 Biostrings_2.64.0
[89] blob_1.2.3 DelayedMatrixStats_1.18.0
[91] stringr_1.4.0 zinbwave_1.18.0
[93] spatstat.random_2.2-0 parallelly_1.31.1
[95] beachmat_2.12.0 scales_1.2.0
[97] memoise_2.0.1 magrittr_2.0.3
[99] ica_1.0-2 howmany_0.3-1
[101] zlibbioc_1.42.0 compiler_4.2.0
[103] HSMMSingleCell_1.16.0 dqrng_0.3.0
[105] fitdistrplus_1.1-8 cli_3.3.0
[107] ade4_1.7-19 XVector_0.36.0
[109] listenv_0.8.0 patchwork_1.1.1
[111] pbapply_1.5-0 mgcv_1.8-40
[113] MASS_7.3-57 tidyselect_1.1.2
[115] stringi_1.7.6 BiocSingular_1.12.0
[117] locfit_1.5-9.5 ggrepel_0.9.1
[119] grid_4.2.0 tools_4.2.0
[121] future.apply_1.9.0 parallel_4.2.0
[123] uuid_1.1-0 bluster_1.6.0
[125] RNeXML_2.4.7 metapod_1.4.0
[127] gridExtra_2.3 Rtsne_0.16
[129] digest_0.6.29 rgeos_0.5-9
[131] shiny_1.7.1 qlcMatrix_0.9.7
[133] Rcpp_1.0.8.3 later_1.3.0
[135] RcppAnnoy_0.0.19 httr_1.4.3
[137] AnnotationDbi_1.58.0 kernlab_0.9-30
[139] colorspace_2.0-3 tensor_1.5
[141] XML_3.99-0.9 uwot_0.1.11
[143] statmod_1.4.36 spatstat.utils_2.3-1
[145] plotly_4.10.0 xtable_1.8-4
[147] R6_2.5.1 pillar_1.7.0
[149] htmltools_0.5.2 mime_0.12
[151] NMF_0.24.0 glue_1.6.2
[153] fastmap_1.1.0 BiocParallel_1.30.2
[155] BiocNeighbors_1.14.0 codetools_0.2-18
[157] utf8_1.2.2 spatstat.sparse_2.1-1
[159] lattice_0.20-45 tibble_3.1.7
[161] leiden_0.4.2 survival_3.3-1
[163] limma_3.52.1 repr_1.1.4
[165] docopt_0.7.1 fastICA_1.2-3
[167] munsell_0.5.0 rhdf5_2.40.0
[169] GenomeInfoDbData_1.2.8 iterators_1.0.14
[171] HDF5Array_1.24.0 reshape2_1.4.4
[173] gtable_0.3.0 spatstat.core_2.4-2

Looking at their code, it looks like converting from AnnData requires you to pass in an input file path rather than a loaded object. It seems they do this to ensure the anndata package is loaded correctly for their code to work.

Try doing this instead:

ad_path <- "Results/celltype_assigned_raw.h5ad"
sceasy::convertFormat(ad_path, from="anndata", to="seurat", outFile="file.rds")

Hey @Justin_Hong, thank you for catching that! You’re right, it needed a string as a parameter instead of an object :smile:

Unfortunately, the conversion still did not complete:

ad_path <- "Results/celltype_assigned_hv.h5ad"
sceasy::convertFormat(ad_path, from="anndata", to="seurat", outFile="file.rds", use_seurat = FALSE, main_layer = "counts")

X -> counts

Error in match(x, table, nomatch = 0L): ‘match’ requires vector arguments

  1. sceasy::convertFormat(ad_path, from = "anndata", to = "seurat",
    . outFile = "file.rds", use_seurat = FALSE, main_layer = "counts")
  2. func(obj, outFile = outFile, main_layer = main_layer, ...)
  3. sapply(embed_names, function(x) reticulate::py_to_r(ad$obsm[x]),
    . simplify = FALSE, USE.NAMES = TRUE)
  4. lapply(X = X, FUN = FUN, ...)
  5. FUN(X[[i]], ...)
  6. reticulate::py_to_r(ad$obsm[x])
  7. ad$obsm[x]
  8. [$obsm, x)
  9. name %in% x$keys()

I found a similar unresolved error on GitHub: "Anndata to Seurat Object, Error in match" (Issue #54, cellgeni/sceasy).
I tried the gzip approach shared in the issue comments, but I have not been able to resolve the error yet.
Thanks again for your help!

Hey @PEB,

You can also try using ReadH5AD() from MuDataSeurat.
There might be rough edges still but at least we can fix them quickly!

It is also a native R reader so no need for the Python environment and reticulate.
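
For reference, a minimal sketch of what that could look like. The wrapper name h5ad_to_rds is hypothetical, the file paths are the ones used earlier in this thread, and the only MuDataSeurat call assumed here is ReadH5AD() with a file path as its argument:

```r
# Sketch: read an .h5ad file with MuDataSeurat and save the resulting
# Seurat object as .rds. The wrapper name is hypothetical.
h5ad_to_rds <- function(h5ad_path, rds_path) {
  # Fail early with a clear message if the input file is missing.
  stopifnot(file.exists(h5ad_path))
  seu <- MuDataSeurat::ReadH5AD(h5ad_path)
  saveRDS(seu, rds_path)
  invisible(seu)
}

# Example call (paths taken from earlier in the thread):
# seu <- h5ad_to_rds("Results/celltype_assigned_raw.h5ad", "file.rds")
# sce <- Seurat::as.SingleCellExperiment(seu)  # e.g. as input for slingshot
```

The function returns the Seurat object invisibly, so it can be used directly in the same session without reading the rds back in.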

Hey @gtca,

Thanks for reaching out and suggesting MuDataSeurat!
Can you specify what kind of rough edges you’re referring to? Data loss?

I’ve been trying out MuDataSeurat and it’s been working pretty well. I do get some warnings right away:

Warning in read_layers_to_assay(h5) :
  Only a subset of mod//raw/X is loaded, variables (features) that are not present in mod//X are discarded.
Warning: Keys should be one or more alphanumeric characters followed by an underscore, setting key from rna to rna_
Warning: No columnames present in cell embeddings, setting to 'pca_1:50'
Warning: No columnames present in cell embeddings, setting to 'tsne_1:2'
Warning: No columnames present in cell embeddings, setting to 'umap_1:2'

However, I do not think these affected the conversion.
I am getting some errors when using the converted object for pseudotime analysis, though. I am not sure whether the cause is the converted matrix or the algorithm itself.
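
One quick check that might help tell the two apart, assuming the pseudotime tool expects raw integer counts (an assumption; that is not confirmed anywhere in this thread): test whether the converted matrix still looks like counts. The helper name looks_like_raw_counts and the object name seu are hypothetical.

```r
# Sketch: return TRUE if a dense or sparse matrix holds only
# non-negative whole numbers, i.e. it plausibly contains raw counts.
looks_like_raw_counts <- function(m) {
  # For sparse Matrix objects, only the stored non-zero values need checking.
  vals <- if (methods::is(m, "sparseMatrix")) m@x else as.vector(m)
  !anyNA(vals) && all(vals >= 0) && all(vals == round(vals))
}

# Example with a converted Seurat object `seu` (Seurat 4 syntax):
# counts <- Seurat::GetAssayData(seu, slot = "counts")
# dim(counts)                    # genes x cells, compare with the AnnData shape
# looks_like_raw_counts(counts)
```

If this returns FALSE, the counts slot probably holds normalized or scaled values, which would point at the conversion rather than at the algorithm.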

In any case, thanks! MuDataSeurat is much better than the other methods I’ve been testing.