
I am assessing the accuracy of my classification model. I performed 4-fold cross-validation and obtained the following overall accuracies: OA = (0.910, 0.920, 0.880, 0.910), so the average OA is 0.905. My dataset contains 120 samples, so in each fold I used 90 samples for training (3/4) and 30 samples for validation (1/4).

Now I want to calculate the 95% confidence interval around the mean. I am thinking of using the following formula for a symmetric interval around the average (the confidence interval of a binomial proportion):

interval = z * sqrt ((accuracy * (1 - accuracy)) / n)

where z is the critical value of the standard Gaussian distribution and n is the number of samples; for a 95% C.I., z = 1.96. But:

What value of n should I use: 120, 30, or 4?

Is there a better way to calculate it?
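
In code, the calculation I have in mind looks roughly like this (a minimal Python sketch of the formula above; which n to plug in is exactly my question):

```python
import math

def ci_half_width(accuracy, n, z=1.96):
    """Half-width of the normal-approximation CI for a binomial proportion."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)

mean_oa = 0.905  # average overall accuracy over the 4 folds

# The same formula evaluated with each candidate value of n:
for n in (120, 30, 4):
    print(f"n = {n:3d}: interval = +/- {ci_half_width(mean_oa, n):.3f}")
```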

sermomon
  • Which classification model are you using? What do you mean by OA? – moth Jan 04 '23 at 14:21
  • It is a combination of a convolutional neural network and a Random Forest classifier. OA is the Overall Accuracy. I'm computing the following accuracy metrics: OA, Cohen's Kappa index, and precision and recall by class. – sermomon Jan 04 '23 at 16:57
  • The standard CI formula will not work. A modified version is proposed in "Cross-validation Confidence Intervals for Test Error" by Bayle et al. – Brian Spiering Jan 04 '23 at 19:12

1 Answer


Here $n = 120$ is the right answer, because you are estimating the proportion over the total number of trials performed across all folds.

The confidence interval here is over a series of binary trials, which in this case are the per-data-point classifications.

So n is the total number of trials across the k validation folds, which is

$$30 \times 4 = 120.$$

From Wikipedia:

> Using the normal approximation, the success probability $p$ is estimated as
> $$ \hat{p} \pm z\sqrt{\frac{\hat{p}\left(1-\hat{p}\right)}{n}}, $$
> or the equivalent
> $$ \frac{n_S}{n} \pm \frac{z}{n\sqrt{n}}\sqrt{n_S n_F}, $$
> ... measured with $n$ trials.

In this case, each trial is a classification: each prediction on a validation data point is a Bernoulli trial (correct or incorrect), with $n_S$ successes (correct classifications) and $n_F$ failures (incorrect classifications).
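
As a rough illustration in Python (plain normal approximation, using the per-fold accuracies from the question; since every fold has 30 validation samples, the simple mean of the fold accuracies equals the pooled proportion):

```python
import math

fold_oa = [0.910, 0.920, 0.880, 0.910]   # per-fold overall accuracy
n = 30 * 4                               # total validation predictions across the 4 folds

p_hat = sum(fold_oa) / len(fold_oa)      # pooled accuracy estimate = 0.905
z = 1.96                                 # two-sided 95% critical value

half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: {p_hat:.3f} +/- {half_width:.3f} "
      f"= ({p_hat - half_width:.3f}, {p_hat + half_width:.3f})")
# prints roughly: 95% CI: 0.905 +/- 0.052 = (0.853, 0.957)
```

If you prefer a library routine, `statsmodels.stats.proportion.proportion_confint` gives the same normal-approximation interval (`method='normal'`) from the count of correct predictions and n, and also offers better-behaved alternatives such as the Wilson interval.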

Dan