Definition

The Matthews Correlation Coefficient (MCC) is a statistic used to evaluate the quality of binary classifications. It measures the correlation between the observed and predicted classifications, resulting in a value between -1 and +1. Unlike accuracy or the F1-score, MCC is a balanced measure that remains reliable even when classes are of significantly different sizes because it considers all four cells of the confusion matrix.

Formula

The MCC is calculated using all four values of the confusion matrix: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN):

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

If any of the four sums in the denominator is zero, the MCC is conventionally defined as 0.
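As a minimal sketch, the formula can be written directly in Python (the function name `mcc` is our own, not from any library):

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from the four confusion-matrix cells."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0
```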

Example

Let’s consider a medical screening test for a rare disease. We test 100 patients. The disease is only present in 10 of them.

                      Predicted: Positive   Predicted: Negative   Row total
Actual: Positive      7 (TP)                3 (FN)                10
Actual: Negative      5 (FP)                85 (TN)               90
Column total          12                    88                    100

1. Accuracy

Accuracy = (TP + TN) / Total = (7 + 85) / 100 = 0.92. At 92%, accuracy is misleadingly high because it is dominated by the majority (healthy) class.
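As a quick check of the 92% figure, using the counts from the confusion matrix above:

```python
tp, fn, fp, tn = 7, 3, 5, 85  # counts from the confusion matrix above
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.92
```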

2. The F1-score Paradox

The primary flaw of the F1-Score is that it is asymmetric: it changes depending on which class you define as “Positive.”

Let’s illustrate this by first calculating the F1-score of correctly predicting sick patients (our base example):

F1 = 2TP / (2TP + FP + FN) = (2 × 7) / (2 × 7 + 5 + 3) = 14 / 22 ≈ 0.64

Now, let’s assume we simply swap our perspective and define healthy patients as Positive. The prediction task hasn’t changed, only how we label outcomes. The new confusion matrix is:

                      Predicted: Positive   Predicted: Negative   Row total
Actual: Positive      85 (TP)               5 (FN)                90
Actual: Negative      3 (FP)                7 (TN)                10
Column total          88                    12                    100
and the new F1-score is:

F1 = (2 × 85) / (2 × 85 + 3 + 5) = 170 / 178 ≈ 0.96

The F1-score gives two completely different evaluations (0.64 vs. 0.96) for the exact same predictions, simply because we changed which class is called “Positive”!
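The asymmetry is easy to reproduce with a small helper based on the standard identity F1 = 2TP / (2TP + FP + FN) (the helper and variable names are ours):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1-score from confusion-matrix counts for the chosen positive class."""
    return 2 * tp / (2 * tp + fp + fn)

# Sick patients as "Positive": TP=7, FP=5, FN=3.
f1_sick = f1(7, 5, 3)      # ≈ 0.64
# Healthy patients as "Positive": TP=85, FP=3, FN=5.
f1_healthy = f1(85, 3, 5)  # ≈ 0.96
```

Same predictions, two very different scores.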

3. Calculate MCC

Using the original confusion matrix (TP = 7, TN = 85, FP = 5, FN = 3):

MCC = (7 × 85 − 5 × 3) / √(12 × 10 × 90 × 88) = 580 / √950400 ≈ 0.60
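The same value can be obtained from raw labels with scikit-learn, assuming it is installed; the label vectors below are our reconstruction of the confusion matrix:

```python
from sklearn.metrics import matthews_corrcoef

# 10 actually sick patients (7 caught, 3 missed), then 90 healthy (5 false alarms).
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 85

print(round(matthews_corrcoef(y_true, y_pred), 2))  # ≈ 0.6
```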

Conclusion

While the accuracy was 0.92, the MCC is ~0.60. This provides a more realistic view of the model’s performance, showing that while it is good, it is not nearly as “perfect” as the 92% accuracy might suggest for this imbalanced dataset.

Usage

  • Interpretation of the Score:

    • +1: Represents a perfect prediction.
    • 0: No better than a random prediction.
    • -1: Indicates total disagreement between prediction and observation.
  • Imbalanced Datasets: MCC is much more reliable than accuracy or F1-score when class sizes vary significantly. Because it considers all four cells of the confusion matrix equally, a model must perform well on both the majority and minority classes to achieve a high score.

  • Symmetry: Unlike metrics like Precision, Recall or F1-score, MCC is symmetric. If you swap the “Positive” and “Negative” definitions, the MCC value remains unchanged.

  • Comparison with Cohen’s Kappa: While both account for chance, MCC is a direct correlation coefficient. In modern ML research, MCC is often preferred because it is more mathematically robust to extreme class imbalances than Kappa.
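The symmetry property above is easy to verify numerically: swapping the “Positive” and “Negative” definitions exchanges TP with TN and FP with FN. A self-contained sketch, reusing the screening-test counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """MCC from the four confusion-matrix cells (0 if a marginal sum is zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Sick = positive vs. healthy = positive: the MCC is identical.
assert math.isclose(mcc(7, 85, 5, 3), mcc(85, 7, 3, 5))
```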