How to deal with imbalanced datasets?
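Accuracy is not the best metric for evaluating a model trained on an imbalanced dataset, as it can be very misleading: a model that always predicts the majority class can score high accuracy while never identifying a single minority-class example. It's better to evaluate with metrics that focus on the minority class (such as precision and recall) and to try resampling techniques like the ones below.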
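To see why, here is a minimal sketch in plain Python (no libraries assumed). It uses hypothetical toy labels, 95 majority-class and 5 minority-class examples, and scores a trivial "model" that always predicts the majority class. The `accuracy` and `recall` helpers are written for illustration and are not from any particular library.

```python
# Hypothetical toy labels: 95 negatives (majority, 0) and 5 positives (minority, 1).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found (true positive rate)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.95
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```

Accuracy looks excellent, yet the model never detects a single minority example, which recall exposes immediately.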
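Always split the data into training and test sets BEFORE applying any resampling technique, and apply resampling ONLY to the training set. Otherwise, copies or synthetic variants of test rows can end up in the training data, and the test score will be overly optimistic.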
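A minimal sketch of that order of operations in plain Python. `stratified_split` and `random_oversample` are hypothetical helpers written for illustration (in practice a library such as scikit-learn or imbalanced-learn would normally be used), and the data are toy values.

```python
import random
from collections import Counter


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Split (X, y) into train/test, keeping each class's proportion roughly equal."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class (with replacement) until
    every class has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        extra = [rng.choice(rows) for _ in range(target - n)]
        X_out += extra
        y_out += [label] * len(extra)
    return X_out, y_out


# Hypothetical toy data: 90 majority rows (label 0), 10 minority rows (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

# 1) Split FIRST, so the test set never shares rows with the training data.
(X_train, y_train), (X_test, y_test) = stratified_split(X, y)

# 2) Resample ONLY the training set; the test set keeps its natural imbalance.
X_train_bal, y_train_bal = random_oversample(X_train, y_train)

print(Counter(y_train), Counter(y_train_bal), Counter(y_test))
```

Because the split happens first, every test row is an original, unseen observation; only the training set receives duplicated rows.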
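Oversampling: add more copies of minority-class observations to the training set.
Undersampling: remove observations from the majority class.
SMOTE (Synthetic Minority Oversampling Technique): uses k-nearest neighbors (kNN) within the minority class to generate new synthetic minority observations, each interpolated between an existing minority sample and one of its neighbors, which we can then use to train the model.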
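The sketch below illustrates random undersampling and a simplified SMOTE-style generator in plain Python. Both functions are hypothetical helpers for illustration. `smote_like` follows the core SMOTE idea (pick a minority sample, pick one of its k nearest minority neighbors, and place a new point at a random position on the segment between them), but it omits details of the published algorithm and of library implementations such as the `SMOTE` class in imbalanced-learn. As with any resampling, apply it only to the training split.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, majority, n_keep, seed=0):
    """Keep n_keep randomly chosen majority-class rows and every non-majority row."""
    rng = random.Random(seed)
    majority_idx = [i for i, t in enumerate(y) if t == majority]
    other_idx = [i for i, t in enumerate(y) if t != majority]
    kept = sorted(rng.sample(majority_idx, n_keep) + other_idx)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        x = X_min[i]
        # k nearest neighbours of x among the *other* minority samples (Euclidean).
        others = [p for j, p in enumerate(X_min) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical toy training data: 90 majority rows (0), 10 minority rows (1).
X_train = [[float(i), float(i % 7)] for i in range(100)]
y_train = [0] * 90 + [1] * 10

# Undersampling: shrink the majority class down to the minority class size.
X_under, y_under = random_undersample(X_train, y_train, majority=0, n_keep=10)
print(Counter(y_under))

# SMOTE-style oversampling: add 80 synthetic minority rows (90 vs 90).
X_min = [x for x, t in zip(X_train, y_train) if t == 1]
X_new = smote_like(X_min, n_new=80, k=5)
X_smote = X_train + X_new
y_smote = y_train + [1] * len(X_new)
print(Counter(y_smote))
```

Undersampling balances the classes by discarding majority data, while the SMOTE-style approach keeps all data and creates new minority points that are not exact duplicates.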