How do you manage datasets where there is a vast difference in the number of instances between the two classes in binary classification?

Question

Anonymous · Accepted Answer

I'd use tree-based models. XGBoost and LightGBM can handle imbalanced
datasets natively through class-weighting parameters such as scale_pos_weight.
Random Forest can handle imbalanced datasets too, but it's slower in
real-time scoring when it has many trees.
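
To make the class-weighting idea concrete, here is a minimal sketch of one common heuristic: weight the positive class by the ratio of negative to positive examples. The helper name `positive_class_weight` and the toy labels are hypothetical, and the sketch assumes labels are coded as 0 (majority) and 1 (minority).

```python
from collections import Counter


def positive_class_weight(labels):
    """Return n_negative / n_positive for binary labels coded as 0 and 1."""
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0:
        raise ValueError("no positive examples in labels")
    return n_neg / n_pos


# Hypothetical toy labels: 95 negatives and 5 positives.
y_train = [0] * 95 + [1] * 5
weight = positive_class_weight(y_train)
print(weight)  # 19.0
```

A ratio like this is typically what gets passed as `scale_pos_weight` to `xgboost.XGBClassifier` or `lightgbm.LGBMClassifier`. For scikit-learn's `RandomForestClassifier`, `class_weight="balanced"` plays a similar role. Treat the ratio as a starting point to tune against a metric suited to imbalance, such as precision-recall AUC, rather than a fixed rule.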
