Hi,
I processed my RNA-seq data and annotated it using scVI, and I would like to integrate the data in R. Specifically, this is my AnnData structure:
AnnData object with n_obs × n_vars = 16712 × 10494
obs: 'sex', 'region', 'subcluster', 'cluster', 'nCount_RNA', 'nFeature_RNA', 'leiden', 'dataset', 'orig_ident', 'broad_names', 'simple_name', 'integrated_snn_res_1', 'integrated_snn_res_0_8', 'integrated_snn_res_1_2', 'broad_names2', '_scvi_batch', '_scvi_labels', 'cell type'
var: 'ensembl_common_ID'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
It contains the two datasets I want to integrate (labelled in the obs column ‘dataset’).
I am importing this AnnData object into R as follows:
library(Seurat)
library(SeuratData)
library(SeuratDisk)
library(rhdf5)
# Load my data ------------------------------------------------------------
Convert("C:/Users/data/concatenated_filtered_CSR.h5ad", dest = "h5seurat", overwrite = TRUE)
# Load Seurat object
mouse_combined <- Connect("C:/Users/data/concatenated_filtered_CSR.h5seurat", mode = "r")
metadata <- h5read("C:/Users/data/concatenated_filtered_CSR.h5seurat",
"/meta.data")
# Extract the counts data
counts_data <- mouse_combined[["assays"]][["counts"]][["data"]]
# Handle to the 'dataset' metadata under /meta.data (overwritten further down by the h5read() result)
dataset_info <- mouse_combined[["meta.data"]][["dataset"]]
# Convert counts_data into a matrix
counts_matrix <- as.matrix(counts_data)
# Create a Seurat object (the assay name defaults to "RNA")
mouse_combined_seurat <- CreateSeuratObject(counts = counts_matrix)
# Read the cell names from the h5seurat file and assign them to the Seurat object
# cell_names <- colnames(mouse_combined_seurat)
cell_names <- h5read("C:/Users/concatenated_filtered_CSR.h5seurat",
"/cell.names")
colnames(mouse_combined_seurat) <- cell_names
# Extract dataset and cell type info and ensure it's named according to cell names
dataset_info <- metadata[["dataset"]]
cell_type <- metadata[["cell type"]]
dataset_categories <- dataset_info$categories[dataset_info$codes + 1]
expanded_cell_type <- cell_type$categories[cell_type$codes + 1]
names(dataset_categories) <- cell_names # Assign cell names as names of the vector
names(expanded_cell_type) <- cell_names
# Replace empty values in expanded_cell_type with "none"
expanded_cell_type[expanded_cell_type == ""] <- "none"
# Add dataset metadata to Seurat object
mouse_combined_seurat$dataset <- dataset_categories
mouse_combined_seurat$cell_type <- expanded_cell_type
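One possible cause worth checking is the `codes + 1` indexing above. Categorical columns written from AnnData use the code -1 for missing values, which becomes index 0 in R, and R silently drops zero indices, so the decoded vector can end up shorter than the number of cells or shifted relative to them. Below is a minimal sketch of a decoder that guards against this, assuming the `codes`/`categories` layout used above. The helper name `decode_categorical` is hypothetical, and the toy vectors stand in for the real metadata.

```r
# Hypothetical helper: decode 0-based categorical codes into labels.
# Assumes `codes` is an integer vector and `categories` a character vector,
# as in the /meta.data groups read above; a negative code is treated as missing.
decode_categorical <- function(codes, categories) {
  codes <- as.integer(codes)
  idx <- codes + 1L                        # shift to R's 1-based indexing
  idx[which(codes < 0L)] <- NA_integer_    # keep missing cells as NA instead of dropping them
  labels <- as.character(categories)[idx]
  stopifnot(length(labels) == length(codes))  # one label per cell, original order kept
  labels
}

# Toy data (not the real metadata): the second cell has no annotation
toy_codes <- c(0L, -1L, 1L, 0L)
toy_categories <- c("neuron", "astrocyte")

toy_labels <- decode_categorical(toy_codes, toy_categories)
toy_labels[is.na(toy_labels)] <- "none"
print(toy_labels)

# For comparison: naive indexing drops the zero index and returns only 3 labels
naive_labels <- toy_categories[toy_codes + 1L]
print(length(naive_labels))
```

If the real `codes` vector contains no negative values, this behaves the same as the original indexing, so the misalignment would have to come from somewhere else (for example the order of `cell_names`).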
As a sanity check before the integration, I want to show the annotations from one of the two datasets (stored in the AnnData obs column ‘cell type’, which is filled in for only one of the datasets) on the UMAP plot. However, the cell type labels appear all over the plot (left), including on cells from the wrong dataset. The plot on the right shows the labels of the two datasets I want to integrate on the same UMAP. So I suspect something is wrong in the way I am importing the ‘cell type’ annotation.
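The same check can also be done numerically, without relying on the plot. This is a minimal sketch, assuming per-cell vectors like `dataset_categories` and `expanded_cell_type` from the code above. The toy vectors and the dataset names "A" and "B" are hypothetical stand-ins. If the import is aligned correctly, real annotations should only be counted in the dataset that carries them.

```r
# Toy stand-ins for dataset_categories and expanded_cell_type (hypothetical values)
toy_dataset <- c("A", "A", "B", "B", "B")
toy_cell_type <- c("neuron", "astrocyte", "none", "none", "none")

# Cross-tabulate: rows are datasets, columns are cell type labels
tab <- table(dataset = toy_dataset, cell_type = toy_cell_type)
print(tab)

# Number of annotated (non-"none") cells per dataset;
# the unannotated dataset should show zero here
annotated_per_dataset <- tapply(toy_cell_type != "none", toy_dataset, sum)
print(annotated_per_dataset)
```

On the real object, the two vectors could be taken from the metadata columns `dataset` and `cell_type` added above.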
Does anyone have a suggestion for the correct way to import the obs annotations into the Seurat object?