Ah yes, my guess is that this is occurring because the size of your dataset modulo the default batch size (128) is 1, so the last minibatch during an epoch only has one cell. Thus, the batch norm layer is complaining since it can’t compute normalization statistics on just one observation.
The simplest fix for this would be to pass some batch size other than 128 to `train` — specifically, any value for which `dataset_size % batch_size != 1`, so the last minibatch never ends up with a single observation. Hope this helps!
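As a minimal sketch of what's going on (using PyTorch's `nn.BatchNorm1d` as a stand-in for the batch norm layer in your model — your actual layer may differ):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(10)
bn.train()  # training mode: normalization stats come from the batch itself

# A minibatch with more than one observation works fine.
ok = bn(torch.randn(2, 10))

# A minibatch of size 1 has zero variance across the batch dimension,
# so PyTorch refuses to compute batch statistics and raises an error.
failed = False
try:
    bn(torch.randn(1, 10))
except ValueError:
    failed = True
```

So if `dataset_size % batch_size == 1`, the final minibatch of each epoch hits exactly this case; picking a batch size that avoids a remainder of 1 (or dropping the last incomplete batch, if your training loop supports it) sidesteps the error.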