How to set DataLoader(drop_last=True) for a model?

mkarikom · September 19, 2022, 11:27pm

Question:

How do I set torch.utils.data.DataLoader(drop_last=True)?
For my use case, setting it either at runtime or at the model level would work equally well.

Background:

I need to do some latent space arithmetic that requires all minibatches to be the same size.
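For reference, drop_last=True in a PyTorch DataLoader discards the final batch when the dataset size is not a multiple of the batch size. The pure-Python sketch below (the helper batch_sizes is hypothetical, not part of PyTorch or scvi-tools) mirrors that rule for a sequential sampler, which shows where the one uneven batch comes from:

```python
def batch_sizes(n_samples: int, batch_size: int, drop_last: bool) -> list[int]:
    """Sizes of the sequential batches a loader would yield over n_samples items."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    # A trailing partial batch exists only when n_samples is not a multiple of batch_size.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


sizes = batch_sizes(1050, 100, drop_last=False)
print(len(sizes), sizes[-1])  # 11 50
sizes = batch_sizes(1050, 100, drop_last=True)
print(len(sizes), sizes[-1])  # 10 100
```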

mkarikom · September 20, 2022, 6:35pm

In the meantime, I was able to simulate DataLoader(drop_last=True) by subsampling the AnnData object so that its number of observations is a multiple of minibatch_size, which makes every batch exactly minibatch_size:

import numpy as np
import scanpy as sc
minibatch_size = 100
sc.pp.subsample(adata, n_obs=adata.shape[0] - np.mod(adata.shape[0], minibatch_size))  # keeps a multiple of minibatch_size cells, in place
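The arithmetic behind this workaround is n_keep = n_obs - (n_obs mod minibatch_size). As a library-free sketch (the helper trim_to_multiple is hypothetical; sc.pp.subsample does its own random selection), the same trimming on plain row indices could look like this:

```python
import random


def trim_to_multiple(n_obs: int, minibatch_size: int, seed: int = 0) -> list[int]:
    """Randomly keep n_obs - (n_obs % minibatch_size) row indices, in sorted order."""
    n_keep = n_obs - n_obs % minibatch_size
    rng = random.Random(seed)
    # Sample without replacement, then sort so the original row order is preserved.
    return sorted(rng.sample(range(n_obs), n_keep))


kept = trim_to_multiple(1050, 100)
print(len(kept), len(kept) % 100)  # 1000 0
```

With an AnnData object, such indices could be applied as adata[kept].copy(). Note that this only guarantees equal batches if every retained cell ends up in the training set; if a fraction of cells is later held out for validation, the training subset can again leave a remainder.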

adamgayoso · September 21, 2022, 5:17am

You would need to write your own DataSplitter. See here:

https://github.com/scverse/scvi-tools/blob/87630ad766968023454a9a80a0807212b340b199/scvi/dataloaders/_data_splitting.py#L56

    if n_train == 0:
        raise ValueError(
            "With n_samples={}, train_size={} and validation_size={}, the "
            "resulting train set will be empty. Adjust any of the "
            "aforementioned parameters.".format(n_samples, train_size, validation_size)
        )

    return n_train, n_val


class DataSplitter(pl.LightningDataModule):
    """
    Creates data loaders ``train_set``, ``validation_set``, ``test_set``.

    If ``train_size + validation_set < 1`` then ``test_set`` is non-empty.

    Parameters
    ----------
    adata_manager
        :class:`~scvi.data.AnnDataManager` object that has been created via ``setup_anndata``.
    train_size

You would then use it in your own custom train function via the TrainRunner.
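As a framework-free illustration of what such a custom splitter needs to guarantee, the sketch below shuffles row indices, holds out a validation fraction, and yields only full-size training batches, which is the effect drop_last=True would have on the training loader. The class EvenBatchSplitter and its parameters are hypothetical and not part of scvi-tools:

```python
import random
from typing import Iterator


class EvenBatchSplitter:
    """Hypothetical splitter: random train/validation split, equal-size training batches.

    Not scvi-tools' DataSplitter; it only illustrates the guarantee a custom
    splitter would need (the equivalent of drop_last=True on the train loader).
    """

    def __init__(self, n_obs: int, batch_size: int, train_size: float = 0.9, seed: int = 0):
        indices = list(range(n_obs))
        random.Random(seed).shuffle(indices)
        n_train = int(round(train_size * n_obs))
        self.train_indices = indices[:n_train]
        self.val_indices = indices[n_train:]
        self.batch_size = batch_size

    def train_batches(self) -> Iterator[list[int]]:
        # Yield only full batches; a trailing partial batch (if any) is dropped.
        n_full = len(self.train_indices) // self.batch_size
        for b in range(n_full):
            start = b * self.batch_size
            yield self.train_indices[start:start + self.batch_size]


splitter = EvenBatchSplitter(n_obs=1050, batch_size=100)
sizes = [len(batch) for batch in splitter.train_batches()]
print(len(splitter.train_indices), len(sizes), set(sizes))  # 945 9 {100}
```

The split is done before batching so that the equal-size guarantee applies to the training subset actually seen by the model, not just to the full dataset.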

mkarikom · September 21, 2022, 11:04pm

Thanks @adamgayoso!

Following your comment, I’ve opened a pull request that adds functionality to DataSplitter and SemiSupervisedDataSplitter so that defaults can be defined for all data_loader_kwargs (including the existing drop_last=3), while keeping AnnDataLoader transparent to the PyTorch DataLoader API with respect to parameters like drop_last.
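For context, AnnDataLoader in scvi-tools accepted either a bool or an int for drop_last. As I understand its docstring, an int k drops the last batch only when it contains fewer than k samples, so the existing value of 3 guards against tiny trailing batches rather than enforcing equal batch sizes. The pure-Python sketch below (the helper keep_last_batch is hypothetical) spells out that assumed rule:

```python
from typing import Union


def keep_last_batch(n_samples: int, batch_size: int, drop_last: Union[bool, int]) -> bool:
    """Return True if the trailing batch would be yielded under the bool/int drop_last rule."""
    remainder = n_samples % batch_size
    if remainder == 0:
        return True  # no partial batch: the last batch is full and always kept
    # Check bool first, since bool is a subclass of int in Python.
    if isinstance(drop_last, bool):
        return not drop_last  # True drops any partial batch, False keeps it
    return remainder >= drop_last  # int: drop the partial batch only if it has < drop_last samples


print(keep_last_batch(1002, 100, 3))     # False
print(keep_last_batch(1050, 100, 3))     # True
print(keep_last_batch(1050, 100, True))  # False
```

With drop_last=3 and a batch size of 100, 1,002 samples yield ten full batches (the 2-sample remainder is dropped), whereas 1,050 samples still end with a 50-sample batch, which is why drop_last=True is needed for strictly equal batches.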
