I am reading about the metrics used in sklearn and I find myself pretty confused by the following:
In the documentation, sklearn provides an example of its usage as follows:
    import numpy as np
    from sklearn.metrics import accuracy_score

    y_pred = [0, 2, 1, 3]
    y_true = [0, 1, 2, 3]
    accuracy_score(y_true, y_pred)  # 0.5
As I understand it, sklearn computes the metric as follows:
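For reference, the definition of accuracy given in the scikit-learn user guide can be written in LaTeX as below. Here \hat{y}_i is the predicted value of the i-th sample, y_i is the corresponding true value, and 1(x) is the indicator function.

```latex
\[
\text{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)
\]
```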
I am not sure about the process, and I would appreciate it if someone could explain this result step by step, since I am studying it and found it hard to understand. In order to understand it better, I tried the following case:
    import numpy as np
    from sklearn.metrics import accuracy_score

    y_pred = [0, 2, 1, 3, 0]
    y_true = [0, 1, 2, 3, 0]
    print(accuracy_score(y_true, y_pred))  # 0.6
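And I supposed that the correct computation would be the following:
But I am not sure about it. I would like to see if someone can support me with the computation rather than copying and pasting sklearn's documentation.
I also have a doubt about whether the i in the summation is the same i as in the formula inside the parentheses. It is unclear to me, and I don't know whether the number of elements in the summation is related to the number of elements in the sample or whether it depends on the number of classes.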
A simple way to understand the calculation of accuracy is:
Given two lists, y_pred and y_true, for every position index i, compare the i-th element of y_pred with the i-th element of y_true, then perform the following calculation:
- Count the number of matches.
- Divide it by the number of samples.
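The two steps above can be sketched in plain Python. This is only an illustration of the idea, not sklearn's actual implementation; the helper name `accuracy` is hypothetical and is used here just to make the steps explicit.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions i where y_pred[i] equals y_true[i] (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    # Step 1: count the positions where the prediction matches the true label.
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Step 2: divide by the number of samples.
    return matches / len(y_true)


# The example from the documentation: matches at indices 0 and 3, so 2/4.
print(accuracy([0, 1, 2, 3], [0, 2, 1, 3]))  # 0.5
```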
So, using your own example:
    y_pred = [0, 2, 1, 3, 0]
    y_true = [0, 1, 2, 3, 0]
We see matches at indices 0, 3, and 4. Thus:
    number of matches = 3
    number of samples = 5
Finally, the accuracy calculation:
    accuracy = matches / samples
    accuracy = 3 / 5
    accuracy = 0.6
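The same calculation can be written as a sum with one term per sample, which may make the link to the formula in the documentation easier to see. In this sketch, 1(·) is the indicator function (1 when the condition holds, 0 otherwise), \hat{y}_i is the i-th element of y_pred, and y_i is the i-th element of y_true.

```latex
\[
\text{accuracy}
= \frac{1}{5} \sum_{i=0}^{4} 1(\hat{y}_i = y_i)
= \frac{1 + 0 + 0 + 1 + 1}{5}
= \frac{3}{5}
= 0.6
\]
```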
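As for your question about the index i: it is the sample index, and it is the same index in both the summation and the y/ŷ subscripts. The number of terms in the summation is therefore the number of samples; it does not depend on the number of classes.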
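A small sketch of this point, again in plain Python rather than sklearn's own code: the list `terms` holds one indicator value per sample index i, so it has as many entries as there are samples, regardless of how many distinct classes appear. The variable names are chosen here for illustration only.

```python
y_true = [0, 1, 2, 3, 0]
y_pred = [0, 2, 1, 3, 0]

n_samples = len(y_true)

# One indicator term per sample index i: 1 if y_pred[i] == y_true[i], else 0.
terms = [int(y_pred[i] == y_true[i]) for i in range(n_samples)]
acc = sum(terms) / n_samples

# Number of distinct labels seen; it plays no role in the accuracy above.
n_classes = len(set(y_true) | set(y_pred))

print(terms)      # [1, 0, 0, 1, 1]
print(n_samples)  # 5
print(n_classes)  # 4
print(acc)        # 0.6
```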