BM-is-OP

BM-is-OP t1_jccin4h wrote

When dealing with an imbalanced dataset, I have been taught to oversample on only the train samples and not the entire dataset to avoid overfitting, however this was for structured text based data in pandas using simple models from sklearn. However is this still the case for image based datasets that will be trained on a CNN? I have been trying to oversample only the train data by applying augmentations to the images. However, for some reason I get a train accuracy of 1.0 and a validation accuracy of 0.25 which does not make sense to me on the very first epoch, where the numbers dont really change as the epochs progress which doesn't make sense to me. Should the image augmentations via oversamping be applied to the entire dataset? (fyi I am using PyTorch)

2