I'm opening this topic because a colleague and I had an idea, and I'd like to hear what the community suggests.
While building my ecosystem package, I added quite a few functions that would be really handy for preprocessing and analysis if they were methods on the AnnData class rather than internal functions in the package. At first I considered subclassing AnnData, but that would probably come at the cost of a heavy compatibility loss. Then I wondered whether the functionality could be added temporarily by attaching methods to already-initialized AnnData objects (i.e., ‘equipping’ the object with the methods only for as long as they are needed). Once the process that needs the extra functionality is done, the AnnData objects shed the extra functionality (the equipment) and continue on as regular AnnData objects.
What are your thoughts on this? Please be as constructively critical as possible! I am new to software development, so maybe this is already addressed by some routine practice I am unaware of.
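A minimal sketch of the ‘equip, then shed’ idea, assuming the target object accepts new instance attributes (true for ordinary Python classes that do not use `__slots__`). The helper name `equipped` is hypothetical and not part of anndata. It uses `types.MethodType` to bind plain functions to a single instance and deletes them again when the `with` block exits, so neither the class nor other instances are modified.

```python
import types
from contextlib import contextmanager


@contextmanager
def equipped(obj, **methods):
    """Hypothetical helper: temporarily bind functions to `obj` as instance methods."""
    # Refuse to shadow anything the object already has (e.g. AnnData's own attributes).
    clashes = [name for name in methods if hasattr(obj, name)]
    if clashes:
        raise AttributeError(f"refusing to shadow existing attributes: {clashes}")
    applied = []
    try:
        for name, func in methods.items():
            # MethodType passes `obj` as the first argument, like a normal method call.
            setattr(obj, name, types.MethodType(func, obj))
            applied.append(name)
        yield obj
    finally:
        # "Shed" the equipment, even if the body raised, so obj is ordinary again.
        for name in applied:
            delattr(obj, name)
```

With an AnnData object `adata`, usage might look like `with equipped(adata, n_groups=n_groups): adata.n_groups("cell_type")`, where `n_groups` is a hypothetical function such as `lambda self, key: self.obs[key].nunique()` and `"cell_type"` is a placeholder column. One caveat: the bound methods live only on that one instance, so a copy made inside the block (e.g. `adata.copy()`) would most likely not carry them.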
Sorry about the late response! Conference + vacation season.
This sounds interesting, though I’m not quite sure what it would look like. Are there any APIs you use which do something similar? Examples would be very useful in explaining your idea here.
An alternative here would be pandas-style extension methods.
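For comparison, pandas lets packages register an accessor namespace on DataFrame via `pandas.api.extensions.register_dataframe_accessor`, so the extra methods are called as `df.<name>.method()` instead of living on the class itself. The stdlib-only sketch below mimics that pattern for an arbitrary class; `register_accessor` and `_CachedAccessor` are hypothetical names, not an existing anndata API. Unlike the ‘equip’ idea, registration here is permanent for the class, but it keeps every package method under one namespace so nothing collides with AnnData's own attributes.

```python
class _CachedAccessor:
    """Non-data descriptor: builds the accessor on first access, then caches it on the instance."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Looked up on the class itself: hand back the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Instance attributes take precedence over non-data descriptors,
        # so later lookups return this same cached accessor object.
        object.__setattr__(obj, self._name, accessor)
        return accessor


def register_accessor(target_cls, name):
    """Hypothetical decorator: expose `accessor_cls(obj)` as `obj.<name>` on every instance."""

    def decorator(accessor_cls):
        if hasattr(target_cls, name):
            # pandas only warns in this case; this sketch refuses to shadow an attribute.
            raise AttributeError(f"{target_cls.__name__} already has an attribute {name!r}")
        setattr(target_cls, name, _CachedAccessor(name, accessor_cls))
        return accessor_cls

    return decorator
```

Applied to AnnData this would read `@register_accessor(anndata.AnnData, "eco")` on a class whose `__init__` takes the AnnData object, followed by calls like `adata.eco.some_method()`. The namespace `eco` is a placeholder, and whether AnnData tolerates the extra class attribute and instance caching cleanly is an assumption worth testing.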
No worries at all. @adamgayoso, yes, I hadn’t seen this yet, but this is exactly the concept I was envisioning.
The functionality I would like to add includes the following:
- the ability to perform groupby operations (as in pandas DataFrame.groupby) on X/obsm/obs, to apply functions in batches
- downsampling datapoints by unique groups of a label in obs/obsm
- bootstrapping datapoints by unique groups of a label in obs/obsm
- collapsing the AnnData object by unique groups of a label in obs/obsm
- an alternate constructor for reading from my file type
- check functions for internal use
- getter functions that extract metadata/labels alongside the data
- getter functions for slicing specific features
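Several of these items (groupby, downsampling, bootstrapping, collapsing) reduce to the same step: turning a label column into per-group row positions, which can then be used to slice the object. Below is a stdlib-only sketch under that assumption. All function names are hypothetical, labels are passed as a plain list, and `collapse_by_group` averages a dense list-of-lists matrix, whereas real X matrices would normally be handled with NumPy or SciPy instead.

```python
import random
from collections import defaultdict


def group_indices(labels):
    """Map each unique label to the row positions carrying it (cf. pandas groupby(...).indices)."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return dict(groups)  # groups appear in first-seen order


def downsample_by_group(labels, n, seed=None):
    """Draw at most `n` rows per group without replacement; returns sorted positions."""
    rng = random.Random(seed)
    picked = []
    for idx in group_indices(labels).values():
        picked.extend(rng.sample(idx, min(n, len(idx))))
    return sorted(picked)


def bootstrap_by_group(labels, n=None, seed=None):
    """Resample each group with replacement; n=None keeps each group's original size."""
    rng = random.Random(seed)
    picked = []
    for idx in group_indices(labels).values():
        k = len(idx) if n is None else n
        picked.extend(rng.choices(idx, k=k))
    return picked


def collapse_by_group(rows, labels):
    """Average the rows of a dense matrix (list of lists) within each group."""
    out = {}
    for label, idx in group_indices(labels).items():
        n_cols = len(rows[idx[0]])
        out[label] = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_cols)]
    return out
```

With AnnData, the label list could come from something like `adata.obs["cell_type"].tolist()` (the column name is a placeholder), and the returned positions should be usable to index the object directly, e.g. `adata[downsample_by_group(labels, 100, seed=0)].copy()`. Bootstrapped positions contain repeats, so the resulting obs_names will be duplicated; `adata.obs_names_make_unique()` can fix that afterwards.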
I have already written the above functionality into a subclass of AnnData for my own package. I realized that the compatibility loss is not really an issue, since the kinds of analyses my package requires are fairly specialized. Additionally, I believe I can always recover a regular AnnData object from my subclass by saving it to a file and reading it back in as a regular AnnData.
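A rough sketch of the subclass route, using a tiny dataclass `Container` as a hypothetical stand-in for AnnData so the example runs without anndata installed. `EcoContainer.from_records` shows an alternate constructor written as a classmethod, `check_aligned` an internal check, and `to_base` converts back to the plain base class in memory, without a file round trip. The record format and all names are assumptions for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Container:
    """Hypothetical stand-in for AnnData: a data matrix plus per-row label columns."""
    X: list
    obs: dict = field(default_factory=dict)


class EcoContainer(Container):
    """Hypothetical subclass adding package-specific helpers."""

    @classmethod
    def from_records(cls, records, label_key):
        # Alternate constructor: build X and one obs column from a list of dicts.
        X = [r["values"] for r in records]
        obs = {label_key: [r[label_key] for r in records]}
        return cls(X=X, obs=obs)

    def check_aligned(self):
        # Internal check: every obs column must have one entry per row of X.
        bad = [k for k, v in self.obs.items() if len(v) != len(self.X)]
        if bad:
            raise ValueError(f"obs columns misaligned with X: {bad}")
        return True

    def to_base(self):
        # Rebuild a plain Container from the shared fields (references, not copies).
        return Container(**{f.name: getattr(self, f.name) for f in fields(Container)})
```

For real AnnData objects, the in-memory counterpart of `to_base` would be passing the relevant fields (X, obs, var, obsm, layers, uns and so on) to the `anndata.AnnData` constructor; the file round trip via `write_h5ad` and `anndata.read_h5ad` also returns a plain AnnData but costs disk I/O. Note that `to_base` here shares the underlying objects rather than copying them.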