Python: Using Matthews Correlation Coefficient (MCC) as Evaluation Metric in XGBoost

In the XGBoost classifier, if you do not specify the value of the parameter ‘eval_metric’, a default is chosen according to the objective function (e.g. rmse for regression, error for classification in older releases and logloss since XGBoost 1.3, and mean average precision for ranking). The Matthews correlation coefficient (MCC), which is used as a measure of the quality of binary classifications, is not among the built-in values of the parameter ‘eval_metric’.

The Matthews correlation coefficient is regarded as one of the best measures when the classes are of very different sizes, i.e. when the data set is imbalanced. To use MCC as eval_metric, you need to define a function and pass that function as the value. Look at the following sample code. This may not be the best implementation, but it will give you an idea of how to use any user-defined function as eval_metric.

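import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

def evalmcc(preds, dtrain):
    # Custom metric: receives the predictions and the DMatrix being evaluated,
    # and returns a (name, value) pair.
    THRESHOLD = 0.5
    labels = dtrain.get_label()
    return 'MCC', matthews_corrcoef(labels, preds >= THRESHOLD)

if __name__ == "__main__":
    bc = load_breast_cancer()
    X = bc.data    # assumption: feature matrix of the loaded dataset
    y = bc.target  # assumption: class labels of the loaded dataset

    clf = xgb.XGBClassifier()
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
    # Note: newer XGBoost releases expect eval_metric in the XGBClassifier
    # constructor rather than in fit(); this call follows the older API.
    clf.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_test, y_test)], eval_metric=evalmcc, verbose=False)
    print(accuracy_score(y_test, clf.predict(x_test)))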
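
To make the point about imbalanced classes concrete, here is a minimal sketch in plain Python that computes MCC directly from the confusion-matrix counts and compares it with accuracy. The helper names (confusion_counts, mcc, accuracy), the toy labels, and the convention of returning 0.0 when the denominator is zero are assumptions made for this illustration; they are not taken from XGBoost or from the sample code above.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the four confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# A useless classifier that always predicts the majority class.
always_negative = [0] * 100

# A classifier with 2 false alarms that finds 4 of the 5 positives.
better = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

print("always negative:", accuracy(y_true, always_negative), mcc(y_true, always_negative))
print("better model:   ", accuracy(y_true, better), mcc(y_true, better))
```

The always-negative classifier reaches an accuracy of 0.95 yet has an MCC of 0.0, while the second classifier is only slightly more accurate but has a clearly positive MCC. This is the behaviour that makes MCC attractive when one class is rare.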

The values that you can use as eval_metric without defining any function are listed below:

rmse: root mean square error
mae: mean absolute error
logloss: negative log-likelihood
error: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
error@t: a binary classification threshold other than 0.5 can be specified by providing a numerical value through ‘t’.
merror: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
mlogloss: Multiclass logloss.
auc: Area under the curve
ndcg: Normalized Discounted Cumulative Gain
map: Mean average precision
ndcg@n, map@n: ‘n’ can be assigned as an integer to cut off the top positions in the lists for evaluation.
ndcg-, map-, ndcg@n-, map@n-: In XGBoost, NDCG and MAP evaluate the score of a list without any positive samples as 1. Adding “-” to the evaluation metric makes XGBoost evaluate these scores as 0, to be consistent under some conditions.
poisson-nloglik: negative log-likelihood for Poisson regression
gamma-nloglik: negative log-likelihood for gamma regression
cox-nloglik: negative partial log-likelihood for Cox proportional hazards regression
gamma-deviance: residual deviance for gamma regression
tweedie-nloglik: negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)
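
As an illustration of the error and error@t entries in the list above, the following plain-Python sketch computes a binary classification error rate at a chosen threshold. The function name error_at and the sample scores are hypothetical; the code only mirrors the definition quoted in the list (#(wrong cases)/#(all cases), with prediction values larger than the threshold counted as positive) and is not XGBoost's own implementation.

```python
def error_at(labels, scores, t=0.5):
    """Fraction of misclassified instances when scores > t count as positive."""
    wrong = sum(1 for y, s in zip(labels, scores) if (s > t) != (y == 1))
    return wrong / len(labels)

# Hypothetical labels and predicted probabilities.
labels = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.7, 0.9]

print(error_at(labels, scores))       # threshold 0.5, analogous to "error"
print(error_at(labels, scores, 0.3))  # threshold 0.3, analogous to "error@0.3"
```

With these sample values, lowering the threshold from 0.5 to 0.3 recovers the positive instance scored at 0.4, so the error rate drops from two wrong cases out of five to one out of five.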

