I am using XGBoost to classify my data. With the default parameter values I get a decent ROC AUC. Is there a way to find better values for the XGBoost parameters so that I can improve the ROC AUC?

1 Answer

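You can use GridSearchCV from the scikit-learn library. It performs an exhaustive search over the parameter values you specify, scoring every combination with cross-validation, and afterwards exposes the best combination it found. The number of model fits is the product of the lengths of all the value lists times the number of folds, so keep the lists short. Here is an example using GridSearchCV.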

import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

parameters = {
    'n_estimators': [50,100,150,250,300,400,500,1000],
    'max_depth': [5,6,7,8,9,10],
    'max_delta_step': [0,1,2,3,4,5,6,7,8,9,10],
    'min_child_weight': [1,2,3,4,5],
    'subsample': [0.5,0.6,0.7,0.8,0.9,1],
    'colsample_bytree': [0.2,0.3,0.4,0.5,0.6,0.7,0.8],
    'colsample_bylevel': [0.2,0.3,0.4,0.5,0.6,0.7,0.8],
    'learning_rate': [0.002,0.005,0.007,0.008,0.01,0.05,0.07,0.1,0.25,0.5]
    }
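bst = xgb.XGBClassifier()
# n_jobs=-1 uses all available CPU cores; scoring='roc_auc' ranks combinations by cross-validated ROC AUC
clf = GridSearchCV(bst, parameters, n_jobs=-1, scoring='roc_auc', cv=5, verbose=2)
clf.fit(X, Y)  # X is your feature matrix and Y holds the class labels
print(clf.best_params_)     # best parameter combination found
print(clf.best_estimator_)  # estimator refit on the full data with those parameters
print(clf.best_score_)      # mean cross-validated ROC AUC of the best combination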
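
Before launching a search like this, it may help to check how much work the grid implies. The following is a minimal sketch in plain Python (no XGBoost needed) that counts the combinations in the grid above and the resulting number of fits with cv=5. The names n_combinations, cv_folds, and n_fits are illustrative and not part of any library.

```python
from math import prod

# Same grid as in the example above.
parameters = {
    'n_estimators': [50, 100, 150, 250, 300, 400, 500, 1000],
    'max_depth': [5, 6, 7, 8, 9, 10],
    'max_delta_step': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 2, 3, 4, 5],
    'subsample': [0.5, 0.6, 0.7, 0.8, 0.9, 1],
    'colsample_bytree': [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    'colsample_bylevel': [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    'learning_rate': [0.002, 0.005, 0.007, 0.008, 0.01, 0.05, 0.07, 0.1, 0.25, 0.5],
}

# An exhaustive grid search tries every combination, so the count is the
# product of the lengths of all value lists.
n_combinations = prod(len(values) for values in parameters.values())

# Each combination is fitted once per cross-validation fold.
cv_folds = 5
n_fits = n_combinations * cv_folds

print(n_combinations)  # 7761600
print(n_fits)          # 38808000
```

With roughly 7.8 million combinations, the full grid is unlikely to finish in a reasonable time, so in practice you would search a few parameters at a time with two or three candidate values each. Another option is RandomizedSearchCV from sklearn.model_selection, which samples a fixed number of combinations (its n_iter parameter) instead of trying them all.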

...