Hi, I have a basic question about the procedure to run scvi models.
When a model is created and trained, we run three instructions: 1) setup_anndata, 2) instantiate the model, 3) model.train().
My understanding is that when we run setup_anndata, some fields are registered in the AnnData and an AnnDataManager is stored in the AnnDataManager store.
So, my questions are:
A) Let's say that I run step 1, setup_anndata. Where is the AnnDataManager stored?
B) Can I access the AnnDataManager without going to step 2 and creating a model?
C) After step 1, we have the fields encoded and written to adata.obs, but where is the state registry?
Thanks
Hi,
The AnnDataManager is hidden at this point, but you can retrieve it via the pointer that setup_anndata added to adata (the scvi_uuid, a.k.a. the manager UUID), like so: SCVI._get_most_recent_anndata_manager(adata).
Then, a state registry exists for each field included in the AnnDataManager's registry, under registry['field_registries'], e.g.: AnndataManager.registry['field_registries']['batch']['state_registry'], and so on.
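To make the nesting concrete, here is a sketch of what that registry layout might look like. The keys mirror the path given above, but the values (field names, category labels) are invented for illustration; the actual contents depend on your data and scvi-tools version:

```python
# Hypothetical illustration of the nested registry layout described above.
# Keys mirror AnnDataManager.registry['field_registries'][...]['state_registry'];
# the values here are made up for the example.
registry = {
    "field_registries": {
        "batch": {
            "state_registry": {
                # e.g. the categorical mapping computed during setup_anndata
                "categorical_mapping": ["batch_0", "batch_1"],
                "original_key": "batch",
            },
        },
        "labels": {
            "state_registry": {
                "categorical_mapping": ["cell_type_a", "cell_type_b"],
                "original_key": "labels",
            },
        },
    },
}

# Drill down exactly as in the answer above:
state = registry["field_registries"]["batch"]["state_registry"]
print(state["categorical_mapping"])  # ['batch_0', 'batch_1']
```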
The AnnDataManager store is a dictionary initialized per model type (e.g. SCVI) which holds those UUIDs. See _setup_adata_manager_store and _per_instance_manager_store in the base model class (which every model inherits from). The store is not exposed directly during setup_anndata, only the UUID mappings are (but you can fetch it, see the previous answer). It's a kind of lazy initialization for the model.
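The class-level store pattern can be sketched in plain Python. This is a simplified mock, not the actual scvi-tools code: the real store maps UUIDs stamped into the AnnData during setup_anndata to AnnDataManager instances, and the class name and dict values here are invented.

```python
import uuid

class MockModel:
    # Class variable: it exists and is writable before any instance is
    # created, which is why setup_anndata can register a manager even
    # though no model object exists yet.
    _setup_adata_manager_store = {}

    @classmethod
    def setup_anndata(cls, adata):
        # Stamp the adata with a pointer (a uuid) and file the manager away.
        manager_uuid = str(uuid.uuid4())
        adata["uns"] = {"scvi_uuid": manager_uuid}
        cls._setup_adata_manager_store[manager_uuid] = {"adata": adata}
        return manager_uuid

    @classmethod
    def _get_most_recent_anndata_manager(cls, adata):
        # Follow the pointer stored in the adata back into the class store.
        return cls._setup_adata_manager_store[adata["uns"]["scvi_uuid"]]

adata = {"obs": {}}             # stand-in for an AnnData object
MockModel.setup_anndata(adata)  # no instance of MockModel exists yet
manager = MockModel._get_most_recent_anndata_manager(adata)
print(manager["adata"] is adata)  # True: the store holds a reference
```

This also illustrates the point raised later in the thread: Python class attributes live on the class object itself, so setup_anndata can write to them without instantiating the model.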
So it's there in memory, and yes, the AnnDataManager links to that adata (a reference, not a copy by value). In addition, each instance of the model holds its own copy when initialized. That is a duplication, and the reason is that each model can run its own scenario and add more information on top of the duplicated adata (like a latent layer), to be saved and used elsewhere later.
As for AnnTorchDataset, yes, the adata is also copied by value there, and its structure is changed before being fed into a torch model during data loading. However, this is train-time duplication and does not waste memory.
Having said all of that, I did not design this whole data-registration mechanism. Deeper questions might be better referred to Adam, Can, Martin, and Ilan.
Hope I helped.
Hi Ori,
thank you so much.
A) This is clear now. During my step 1, setup_anndata, scvi writes class variables even when the class is not instantiated. This is a Python behavior I was not aware of.
B) This is also clear. During my step 2, the anndata is duplicated, so in principle I can remove the first one from memory without affecting the model.
Two more questions,
For AnnTorchDataset, why do you say that the data duplication is "train-time duplication" and does not consume memory? Is this because it happens on a batch of data?
Can you give me a really brief overview of how you are thinking of handling the AnnDataManager and the in-class data duplication with a custom dataloader?
Thanks !
Oh, it does consume memory, but only a batch of data at a time, which is then released (unlike the previous duplication).
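A minimal sketch of what "train-time duplication" means here (this is hypothetical illustration, not the AnnTorchDataset code): only one batch is copied and restructured at a time, and each copy becomes garbage as soon as the consumer moves to the next batch, so peak extra memory is one batch, not the whole dataset.

```python
import numpy as np

def iter_batches(data, batch_size):
    """Yield per-batch copies; only one copied batch is alive at a time."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size].copy()  # train-time copy
        # ...restructure the batch for the model here (dtype, layout, etc.)
        yield batch.astype(np.float32)
        # 'batch' is dropped once the consumer requests the next item,
        # so the duplicated memory is freed rather than accumulated.

data = np.arange(10)
shapes = [b.shape[0] for b in iter_batches(data, batch_size=4)]
print(shapes)  # [4, 4, 2]
```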
With custom dataloaders, by which I believe you mean we don't use adata, we don't run setup_anndata, and the model is initialized without adata.
Instead, we use a pre-defined registry that suits the custom dataloader and the model of interest, and it is part of the custom dataloader class initialization. So there is no lazy init here.
Then we use the registry to init the model, and the dataloader to create batches and run the training. It's a parallel, bypass mechanism to the AnnDataManager.
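The bypass mechanism can be sketched like this (all class names, keys, and arguments here are invented for illustration; the real implementation lives in the custom dataloaders branch):

```python
# Hypothetical sketch: the custom dataloader carries a pre-defined registry,
# and the model is initialized from that registry instead of from adata.
class CustomDataloader:
    def __init__(self, registry, batches):
        # The registry is fixed up front: no setup_anndata, no lazy init.
        self.registry = registry
        self._batches = batches

    def __iter__(self):
        # Batches are produced directly, bypassing the AnnDataManager.
        return iter(self._batches)

class Model:
    def __init__(self, registry):
        # Initialized without adata: everything the model needs to size
        # its layers comes from the registry.
        batch_state = registry["field_registries"]["batch"]["state_registry"]
        self.n_batch = len(batch_state["categorical_mapping"])

registry = {
    "field_registries": {
        "batch": {"state_registry": {"categorical_mapping": ["b0", "b1"]}},
    },
}
loader = CustomDataloader(registry, batches=[[1, 2], [3, 4]])
model = Model(loader.registry)   # model init bypasses the AnnDataManager
print(model.n_batch)             # 2
```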
Actually, this is something we are now adding officially to the next release of scvi-tools (we have a custom dataloader for LaminAI and one for Census data based on TileDB).
Hi Ori
Do you have an example of the custom dataloader? Is it in the branch?
Best
Yes, it's in the custom dataloaders registry branch.
Hi Ori,
I gave it a try and made some comments on the branch; there was an error in the tutorial. I want to help with this, so is there any chat between you and Can about it? Also, I compared the TileDB and the regular anndata loaders head to head, and TileDB is 50% slower.
One more thing… how do I call get_latent() with the TileDB data module?
Last thing. I tested the TileDB dataloader in regular and DDP mode, and what is causing the delay is slow data access. The GPU peaks and processes super fast, but between batches there is a long waiting time.