SOLO usage - batch, training, predicting

bjstewart1 · March 25, 2021, 5:46pm

Hello - great set of tools, extremely useful.

I was hoping to try using SOLO for doublet detection and can see this is now incorporated into scvi tools.
I’m not finding the usage of this particularly clear.
Seems that the guidance is that it should be run on a single droplet sequencing lane at a time, and it’s not possible to run after training an scvi model with batch correction. Is that correct?

After training then predicting I am getting a numpy array which is a little difficult to interpret - the dimensions are larger than the original anndata object - so perhaps it is the simulated doublets and the native cells together. It’s just a little unclear.
Any chance you could clarify how best to use this tool?

Thanks,

adamgayoso · March 25, 2021, 11:54pm

Thank you for raising this. Indeed, it looks like the predict() method returns predictions for both the real cells and the simulated doublets.

And yes, Solo should be run on a single lane at a time; this is how Solo was designed, as doublets are generated within a specific lane.

Follow the workflow in the documentation examples; in addition, here is a workaround to process the output of the predict function:

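import pandas as pd

def process_predict_output(output, solo_model):
    """Drop simulated doublets from predict() output and name the columns."""
    # Real cells are labeled "singlet" in this column; simulated doublets are not.
    label = solo_model.adata.obs["_solo_doub_sim"].values.ravel()
    preds = output[label == "singlet"]
    # Class names as registered by scvi-tools for the labels field.
    cols = solo_model.adata.uns["_scvi"]["categorical_mappings"]["_scvi_labels"]["mapping"]
    return pd.DataFrame(preds, columns=cols)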

This will give you the predictions in the same order as the input anndata, with named columns, provided you ran solo.predict(soft=True).
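
To make the filtering step concrete, here is a toy sketch in plain Python that needs no scvi-tools install. The scores, the row layout (real cells first, simulated doublets after) and the column order are all made-up assumptions for illustration; keep_real_cells is a hypothetical helper, not part of scvi-tools.

```python
# Toy stand-in for the predict() output: 3 real cells + 3 simulated doublets.
# All values below are invented for illustration only.
labels = ["singlet", "singlet", "singlet", "doublet", "doublet", "doublet"]
output = [
    [0.9, 0.1],  # real cell 0
    [0.2, 0.8],  # real cell 1
    [0.7, 0.3],  # real cell 2
    [0.1, 0.9],  # simulated doublet
    [0.3, 0.7],  # simulated doublet
    [0.2, 0.8],  # simulated doublet
]
columns = ["singlet", "doublet"]  # assumed column order


def keep_real_cells(output, labels, columns):
    """Keep rows whose label is "singlet" (the real cells) and name the scores."""
    return [
        dict(zip(columns, row))
        for row, label in zip(output, labels)
        if label == "singlet"
    ]


real = keep_real_cells(output, labels, columns)
# Hard call per real cell: the class with the highest soft score.
calls = [max(scores, key=scores.get) for scores in real]
```

Note that "singlet" in the label column only marks a row as a real cell; whether that cell is actually a doublet is what the scores tell you.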

We will update the code so this is more straightforward.

bjstewart1 · March 26, 2021, 9:31am

Thanks for your quick and helpful response Adam.
I’m just conscious that with multiple droplet lanes, this requires recomputing a model for each lane which would be very time consuming for big experiments.
I wonder if it would be possible to compute a model overall for the experiment and then run solo off that for subsets corresponding to individual lanes? Or do you think this would break some important assumptions of the tool?

davek44 · March 28, 2021, 1:34am

Yes, it’s totally fine to fit one scVI model and then run Solo independently for each lane, seeded from that model. The --seed option in our CLI demonstrates that workflow: solo/solo.py at master · calico/solo · GitHub

adamgayoso · March 28, 2021, 4:28pm

Hmmm I think we might need to change the scvi-tools Solo API a bit to allow this. I can do this relatively soon.

bjstewart1 · March 28, 2021, 5:04pm

Thanks for your thoughts on this @davek44 & @adamgayoso
I think modifying the API to allow for this approach would be extremely useful

adamgayoso · March 28, 2021, 5:49pm

PR is up.
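
For anyone landing here later, the one-model, per-lane workflow discussed above might be sketched roughly as follows. This assumes your installed scvi-tools version lets scvi.external.SOLO.from_scvi_model take a restrict_to_batch argument; please check the API docs for your version, since that detail is not spelled out in this thread. solo_per_lane and its from_scvi_model parameter are hypothetical names used only in this sketch.

```python
def solo_per_lane(scvi_model, lanes, from_scvi_model=None):
    """Train one Solo classifier per lane, all seeded from one trained scVI model.

    `lanes` holds the batch categories the scVI model was set up with.
    `from_scvi_model` is a hypothetical hook so the factory can be swapped out;
    by default the scvi-tools one is used.
    """
    if from_scvi_model is None:
        import scvi  # imported lazily; only needed when running for real

        from_scvi_model = scvi.external.SOLO.from_scvi_model

    predictions = {}
    for lane in lanes:
        # Assumption: restrict_to_batch limits Solo to the cells of this lane.
        solo = from_scvi_model(scvi_model, restrict_to_batch=lane)
        solo.train()
        predictions[lane] = solo.predict(soft=True)
    return predictions
```

Usage would be something like solo_per_lane(vae, ["lane1", "lane2"]), where vae is your trained scVI model and the lane names are placeholders for your own batch categories. Depending on the version, the predictions may still include the simulated doublets, in which case the workaround earlier in the thread still applies.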

malonzm1 · May 8, 2024, 12:39am

Hi,

I used scvi.external.SOLO.from_scvi_model. How do I find output and solo_model?

Thanks.

malonzm1 · May 8, 2024, 2:16am

This is resolved. Thanks.
