Accuracy is a performance metric that measures how often a model's predictions are correct.
Accuracy = (number of correctly classified points) / (total number of points in the test set)
Accuracy lies between 0 and 1, where 0 is the worst and 1 is the best. It is a very easy measure to understand.
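For example, suppose the test set has 100 points, of which 60 are +ve and 40 are -ve. Our model correctly predicts 53 of the +ve points and 35 of the -ve points. So 60 - 53 = 7 +ve points are misclassified as -ve, and 40 - 35 = 5 -ve points are misclassified as +ve.
In total there are 12 errors (7 + 5) and 88 correctly classified points (53 + 35), so the accuracy is 88/100 = 0.88, or 88%.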
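The arithmetic above can be checked with a few lines of code. This is a minimal sketch, not a library implementation: the function name `accuracy` and the encoding of +ve as 1 and -ve as 0 are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label equals the actual label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Rebuild the worked example: 60 +ve points (label 1) and 40 -ve points (label 0).
y_true = [1] * 60 + [0] * 40
# 53 +ve points predicted correctly and 7 predicted as -ve,
# then 35 -ve points predicted correctly and 5 predicted as +ve.
y_pred = [1] * 53 + [0] * 7 + [0] * 35 + [1] * 5

print(accuracy(y_true, y_pred))  # 0.88
```

The 12 mismatched positions are exactly the 7 + 5 errors counted in the example.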
Problems with accuracy:
1. Imbalanced data: Suppose 90% of the test data belongs to the -ve class and 10% to the +ve class. Consider a “dumb” model M that always returns -ve. Its accuracy is 90% (0.9), purely because the data is imbalanced, even though it never identifies a single +ve point. So accuracy should not be used when the data is imbalanced.
2. Suppose we have test data with 4 points.
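| X  | Y | M1   | M2   | Y1^ | Y2^ |
|----|---|------|------|-----|-----|
| X1 | 1 | 0.9  | 0.6  | 1   | 1   |
| X2 | 1 | 0.8  | 0.65 | 1   | 1   |
| X3 | 0 | 0.1  | 0.45 | 0   | 0   |
| X4 | 0 | 0.15 | 0.48 | 0   | 0   |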
Models M1 and M2 each return a probability score.
That is, given a query point Xq, each model outputs P(Yq = 1).
Which of the two models is better?
For X1, M1 gives 0.9, which is a confident score, whereas M2 gives only 0.6. Similarly for X2. For X3 and X4, which both models label 0, M1 gives 0.1 and 0.15, whereas M2 gives 0.45 and 0.48, which are barely below one half.
Here Y^ denotes a predicted class label: Y1^ is the label predicted by M1 and Y2^ the label predicted by M2.
So the predicted class labels are exactly the same for M1 and M2, yet judging by the probability values, M1 is better than M2.
Unfortunately, accuracy cannot make use of probability scores. If we use accuracy, M1 and M2 get the same score, because accuracy looks only at the predicted labels Y1^ and Y2^.
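Both problems can be reproduced in a short sketch. Several things here are assumptions rather than part of the original example: +ve is encoded as 1 and -ve as 0, scores are turned into labels with a 0.5 cutoff, the actual labels of X3 and X4 are taken to be 0 (which is what the comparison between M1 and M2 implies), and `mean_score_gap` is a hypothetical helper used only to show that the scores differ in quality. It is not a standard metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of points whose predicted label equals the actual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def to_labels(scores, threshold=0.5):
    """Turn probability scores into 0/1 labels (the 0.5 cutoff is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def mean_score_gap(y_true, scores):
    """Hypothetical helper: average distance between a score and the actual label."""
    return sum(abs(t - s) for t, s in zip(y_true, scores)) / len(y_true)


# Problem 1: 90% of the test points are -ve (0) and 10% are +ve (1).
imbalanced_y = [0] * 90 + [1] * 10
dumb_pred = [0] * len(imbalanced_y)  # model M always answers -ve
print(accuracy(imbalanced_y, dumb_pred))  # 0.9

# Problem 2: scores from the table; actual labels of X3 and X4 assumed to be 0.
y_true = [1, 1, 0, 0]
m1_scores = [0.9, 0.8, 0.1, 0.15]
m2_scores = [0.6, 0.65, 0.45, 0.48]

y1_hat = to_labels(m1_scores)
y2_hat = to_labels(m2_scores)
print(y1_hat == y2_hat)  # True: both models predict the same labels
print(accuracy(y_true, y1_hat), accuracy(y_true, y2_hat))  # 1.0 1.0

# The scores themselves tell the two models apart, but accuracy never sees them.
print(mean_score_gap(y_true, m1_scores) < mean_score_gap(y_true, m2_scores))  # True
```

Because accuracy is computed only from the thresholded labels, any two models that agree on every label receive the same accuracy, however different their confidence is.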