+3 votes
in Programming Languages by (40.5k points)
I am using XGBoost on a dataset with class imbalance, i.e., the number of class 0 records is more than 100 times the number of class 1 records. Without using scale_pos_weight, I am getting poor results for true positives (TP). What value of scale_pos_weight should I use?

1 Answer

+3 votes
by (349k points)
Best answer

The short answer to this question is "it depends on the data": no single value is suitable for all datasets.

According to XGBoost's documentation, for a binary classification problem,

scale_pos_weight = number of negative class records / number of positive class records.

In your case, scale_pos_weight = number of class 0 records/number of class 1 records.

However, if your data is highly imbalanced, the above formula might not give you the best results. Sometimes sqrt(number of class 0 records / number of class 1 records) provides better results.
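For concreteness, here is a minimal sketch of both heuristics, assuming label is a NumPy array of 0/1 class labels:

import numpy as np

neg, pos = np.bincount(label)   # counts of class 0 and class 1 records
spw = neg / pos                 # ratio suggested by the XGBoost docs
spw_sqrt = np.sqrt(neg / pos)   # damped variant for extreme imbalance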

In my opinion, one should run a grid search (e.g., scikit-learn's GridSearchCV) to find the optimal value of scale_pos_weight. Without scale_pos_weight, when the number of class 0 records is very high compared to the number of class 1 records, you get poor recall [recall = tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives]. So use recall as the scoring parameter; the grid search will then find the value of scale_pos_weight that maximizes recall.

Here is a template for the GridSearchCV code:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# `data` is the feature matrix and `label` the 0/1 target vector.
neg, pos = np.bincount(label)   # class 0 and class 1 counts
max_spw = int(neg / pos)        # documented neg/pos heuristic as the upper bound

model = xgb.XGBClassifier()
xgb_grid_params = {
    'scale_pos_weight': list(range(1, max_spw + 1, 5))
}
gs = GridSearchCV(model, param_grid=xgb_grid_params, scoring="recall", cv=5, verbose=1)
gs.fit(data, label)
print(gs.best_params_)
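By default, GridSearchCV refits the best configuration on the full data, so the tuned model can be used directly afterwards; for example:

best_spw = gs.best_params_['scale_pos_weight']
preds = gs.best_estimator_.predict(data)   # model refit with the best scale_pos_weight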

