We have a binary classification task with two classes, 0 and 1.

We build a 2x2 matrix:

predicted \ actual |  0  |  1
-------------------+-----+-----
                 0 |  a  |  b
                 1 |  c  |  d

and we have a test set as follows:

X1   Y1   Y^1
X2   Y2   Y^2
...
Xn   Yn   Y^n

where X1, X2, ..., Xn are the data points, Y1, Y2, ..., Yn are the actual labels, and Y^1, Y^2, ..., Y^n are the predicted class labels.

The confusion matrix cannot work with probability scores, so Yi and Y^i must be binary: each data point belongs either to class 0 or to class 1.
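As a quick sketch in plain Python (the probability values and the 0.5 threshold are illustrative assumptions, not part of the original setup), probability scores can be converted into the binary labels the confusion matrix needs:

```python
# Hypothetical probability outputs from some model; 0.5 is an
# illustrative cutoff, not a fixed rule.
probs = [0.9, 0.2, 0.7, 0.4]
y_hat = [1 if p >= 0.5 else 0 for p in probs]
print(y_hat)  # [1, 0, 1, 0]
```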

Now, given this data: how many points are actually 0 and also predicted 0?

a: number of points such that Yi = 0 and Y^i = 0.

b: number of points such that Yi = 1 and Y^i = 0.

c: number of points such that Yi = 0 and Y^i = 1.

d: number of points such that Yi = 1 and Y^i = 1.
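The four counts above can be computed directly from the two label lists; here is a minimal sketch in plain Python (the toy labels are made up for illustration):

```python
def confusion_counts(y_true, y_pred):
    # a: actual 0, predicted 0    b: actual 1, predicted 0
    # c: actual 0, predicted 1    d: actual 1, predicted 1
    a = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 0)
    b = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    c = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    d = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    return a, b, c, d

y_true = [0, 0, 1, 1, 0, 1]   # actual labels (toy data)
y_pred = [0, 1, 1, 0, 0, 1]   # predicted labels (toy data)
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```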

This matrix is called the confusion matrix because it lays out all four possibilities for a binary classification task.

For multi-class classification, suppose you have c classes.

You write a c x c matrix:

predicted \ actual |  0  |  1  | ... | c-1
                 0 |     |     |     |
                 1 |     |     |     |
               ... |     |     |     |
               c-1 |     |     |     |

If the model is sensible, not dumb, a and d should be high, and b and c should be small.

Similarly, in the multi-class case, the principal-diagonal elements should be high and all the off-diagonal elements should be small if the model is good.
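A minimal sketch of building the c x c matrix in plain Python, using the predicted-rows / actual-columns layout above (the toy labels are made up for illustration):

```python
def confusion_matrix(y_true, y_pred, c):
    # M[i][j] = number of points predicted as class i whose actual class is j
    M = [[0] * c for _ in range(c)]
    for y, yh in zip(y_true, y_pred):
        M[yh][y] += 1
    return M

y_true = [0, 1, 2, 2, 1]   # actual labels (toy data)
y_pred = [0, 1, 2, 1, 1]   # predicted labels (toy data)
for row in confusion_matrix(y_true, y_pred, 3):
    print(row)
# [1, 0, 0]
# [0, 2, 1]
# [0, 0, 1]
```

The diagonal holds the 4 correct predictions out of 5 points; the single off-diagonal entry is the one mistake (a class-2 point predicted as class 1).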

In binary classification, each cell has a special name.

a: True Negative

b: False Negative

c: False Positive

d: True Positive

This terminology is confusing, so there is a trick to remember it.

For TP, T means the prediction is correct and P is the predicted label.

The second letter is always what we are predicting: in TN and FN we are predicting Negative, and in TP and FP we are predicting Positive.

The first letter tells you whether the prediction matches the actual value. Reading each cell as (predicted, actual): (0, 0) is TN, (0, 1) is FN, (1, 0) is FP, and (1, 1) is TP.

Whenever you are confused, draw the confusion matrix and you will eventually work it out.

Now, FN + TP = total number of positives (P),

TN + FP = total number of negatives (N),

and N + P = total number of points (n).

1. True Positive Rate (TPR) = TP/P

2. True Negative Rate (TNR) = TN/N

3. False Positive Rate (FPR) = FP/N

4. False Negative Rate (FNR) = FN/P

These 4 rates are very important.
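The four rates follow directly from the four cell counts; here is a small plain-Python sketch (the cell values passed in are arbitrary illustrations):

```python
def rates(tn, fn, fp, tp):
    P = tp + fn  # total actual positives
    N = tn + fp  # total actual negatives
    return {"TPR": tp / P, "TNR": tn / N,
            "FPR": fp / N, "FNR": fn / P}

# Illustrative counts: 100 actual positives, 100 actual negatives.
print(rates(tn=90, fn=5, fp=10, tp=95))
# {'TPR': 0.95, 'TNR': 0.9, 'FPR': 0.1, 'FNR': 0.05}
```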

Let’s see an example:

In a test set, we have 900 negative and 100 positive points: an imbalanced dataset.

predicted \ actual |   0  |   1
-------------------+------+-----
                 0 |  850 |   6
                 1 |   50 |  94

P = 100

N = 900

n = 1000

Now let’s look at our 4 rates:

TPR = TP/P = 94/100 = 94%

TNR = TN/N = 850/900 ≈ 94.4%

FPR = FP/N = 50/900 ≈ 5.6%

FNR = FN/P = 6/100 = 6%

A model is good if TPR and TNR are high, and FPR and FNR are low. Even with an imbalanced dataset, the confusion matrix and these rates are a good measure of model performance, whereas accuracy alone is not.
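Plugging the example matrix into the rate formulas confirms the numbers above (a quick plain-Python check):

```python
# Cells from the example matrix: TN, FN, FP, TP.
tn, fn, fp, tp = 850, 6, 50, 94
P, N = tp + fn, tn + fp          # 100 positives, 900 negatives
print(f"TPR = {tp / P:.1%}")     # TPR = 94.0%
print(f"TNR = {tn / N:.1%}")     # TNR = 94.4%
print(f"FPR = {fp / N:.1%}")     # FPR = 5.6%
print(f"FNR = {fn / P:.1%}")     # FNR = 6.0%
```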

Now, which of the four rates is the best one to consider?

This is domain-specific. For example, if we are diagnosing cancer, we need a very high TPR and a very low FNR, because we cannot miss someone who has cancer by predicting not-cancer; that is far more dangerous than predicting cancer for a healthy person. The latter will go through more tests and be fine, but missing a patient carries a human cost.