Dear scVI team,
I hope this message finds you well. I am currently using scVI in my research and have some questions about data input and analysis, particularly for differential expression (DE) analysis. Your insights would greatly help me understand and apply the tool correctly.
- Data Input for DE Analysis: When running DE analysis with scVI, what type of data should be passed to the model: raw counts, data normalized within the model, or some other standardized form? The tutorials do not state the input type explicitly, so I am unsure what exactly the DE analysis is based on.
- Interpretation of DE Results: When a DE result has a positive mean_log2FC value above 0.5 for a comparison between two groups, say Group A and Group B, how should it be interpreted? Does a positive value mean that the gene is more highly expressed in Group A than in Group B, or the reverse?
- Inclusion of Batch Parameters: In community discussions there is often debate about whether to include a ‘batch’ parameter in the model, and about how setting it to True or False affects the results. Could you advise when this parameter should be included and when it is unnecessary?
- Data for Visualization: When visualizing markers or DE results, should we use the same form of data as in the DE analysis, or can we use normalized or scVI-normalized data?
- Scanpy Integration: For DE analysis, does scanpy typically work on normalized data, whereas scVI uses a different form of the data?
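To make the first two questions concrete, here is a toy NumPy sketch of my current understanding, which I would like you to confirm or correct. It starts from raw counts, applies a simple per-cell library-size normalization, and computes log2FC as log2(mean in A) - log2(mean in B), so a positive value would mean higher expression in Group A. This is only a naive calculation on normalized counts that I wrote myself, not scVI's model-based estimate; the data, the pseudocount, and the helper names (normalize_per_cell, log2_fold_change) are my own assumptions.

```python
import numpy as np

# Hypothetical raw count matrix: rows are cells, columns are genes.
counts = np.array([
    [10, 0, 5],
    [20, 2, 8],
    [1, 9, 4],
    [2, 12, 6],
], dtype=float)
groups = np.array(["A", "A", "B", "B"])


def normalize_per_cell(x, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    return x / x.sum(axis=1, keepdims=True) * target_sum


def log2_fold_change(x, groups, group1, group2, pseudocount=1e-4):
    """Naive per-gene log2FC: log2(mean in group1) - log2(mean in group2).

    Under this convention a positive value means higher expression in group1.
    """
    mean1 = x[groups == group1].mean(axis=0)
    mean2 = x[groups == group2].mean(axis=0)
    return np.log2(mean1 + pseudocount) - np.log2(mean2 + pseudocount)


norm = normalize_per_cell(counts)
lfc = log2_fold_change(norm, groups, "A", "B")
print(np.round(lfc, 2))
```

In this toy data, gene 0 is higher in Group A (positive value), gene 1 is higher in Group B (negative value), and gene 2 is nearly unchanged.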
I apologize for the number of questions, but your expertise would clarify these key points and help me apply and interpret scVI more accurately in my work.
Thank you very much for your time and assistance.
Best regards,