
Handle Imbalanced Dataset

source link: https://blog.feelyou.top/posts/2490200157.html

2018-11-30

In common cases such as ad click-through logs, the dataset is extremely imbalanced. This can affect the training of an XGBoost model, and there are two ways to handle it, depending on what you care about.
• If you care only about the ranking order (AUC) of your predictions:
◦ Balance the positive and negative weights via scale_pos_weight;
◦ Use AUC for evaluation.
• If you care about predicting the right probability:
◦ You cannot re-balance the dataset, since re-weighting the classes shifts the predicted probabilities away from the true rates;
◦ Setting the parameter max_delta_step to a finite number (say 1) will help convergence.
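
The two cases above can be sketched as two parameter dictionaries. This is only an illustration, not a tuned configuration: the helper names ranking_params and probability_params are hypothetical, the toy label counts are made up, and the choice of logloss as the evaluation metric in the second case is an assumption. The value negatives / positives for scale_pos_weight is a commonly suggested starting point rather than a rule.

```python
from collections import Counter


def ranking_params(labels):
    """Parameters for the case where only the ranking order (AUC) matters."""
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    return {
        "objective": "binary:logistic",
        "eval_metric": "auc",
        # Commonly suggested starting value: negatives / positives.
        "scale_pos_weight": n_neg / n_pos,
    }


def probability_params():
    """Parameters for the case where the predicted probability matters."""
    return {
        "objective": "binary:logistic",
        # Assumption: logloss as a probability-sensitive metric.
        "eval_metric": "logloss",
        # No scale_pos_weight here: the class balance is left untouched.
        "max_delta_step": 1,
    }


# Toy click log: 2 clicks out of 100 impressions (made-up numbers).
labels = [1] * 2 + [0] * 98
rank_cfg = ranking_params(labels)
prob_cfg = probability_params()
print(rank_cfg)
print(prob_cfg)
```

Either dictionary would then be passed as the params argument of xgboost.train together with a DMatrix built from your own data.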

