
I am building a Neural Network for a binary classification problem where the Bayes error (lowest possible error rate) is probably close to 50%.

What makes the task easier is that I don't need to make a prediction for every observation in the test sample. I only want to make a prediction for the observations where the model has fairly high confidence. However, a higher prediction rate is still preferable to a lower one.

So far, I have used a standard neural network (feed-forward, cross-entropy loss, L2 regularization and sigmoid activation on final node). In the testing sample, I only take into account the observations for which the final node's value $(\hat{Y}_i)$ is outside of an interval of low confidence: $$\text{predicted class}_i = \begin{cases} 1 &\text{ if } \hat{Y}_i > 0.5 + a \\ 0 &\text{ if } \hat{Y}_i < 0.5 - a \\ \text{NA} &\text{else} \end{cases} \\ \text{where } a\in [0, 0.5] \text{ indicates the level of confidence required}$$
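The thresholding rule above can be sketched as follows (a minimal NumPy version; the function name is illustrative, and abstentions are encoded as `NaN` to stand in for NA):

```python
import numpy as np

def selective_predict(y_hat, a):
    """Map sigmoid outputs to {1, 0, NA} given a confidence margin a in [0, 0.5].

    y_hat : array of predicted probabilities from the final sigmoid node.
    Returns an array containing 1.0, 0.0, or NaN (abstain).
    """
    preds = np.full(y_hat.shape, np.nan)   # NA by default (abstain)
    preds[y_hat > 0.5 + a] = 1.0           # confident positive
    preds[y_hat < 0.5 - a] = 0.0           # confident negative
    return preds

# Example: margin a = 0.2 keeps only outputs outside (0.3, 0.7)
scores = np.array([0.95, 0.55, 0.10, 0.65])
print(selective_predict(scores, 0.2))      # [ 1. nan  0. nan]
```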

To tune the hyperparameters (including $a$), I have designed a metric that depends positively on:

  • Test-sample accuracy (only counting predictions different from NA)
  • Percentage of predictions that are different from NA.
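One plausible form for such a metric (the question does not specify the exact trade-off, so the `beta` weighting below is an assumption) is selective accuracy multiplied by a power of the coverage:

```python
import numpy as np

def selective_metric(preds, y_true, beta=0.5):
    """Toy tuning metric: selective accuracy weighted by coverage.

    preds  : outputs of the thresholding rule (1.0, 0.0, or NaN for NA)
    y_true : true binary labels
    beta   : how strongly to reward a high prediction rate (assumption;
             the exact trade-off is not specified in the question)
    """
    answered = ~np.isnan(preds)
    coverage = answered.mean()             # share of non-NA predictions
    if coverage == 0:
        return 0.0                         # no predictions made at all
    accuracy = (preds[answered] == y_true[answered]).mean()
    return accuracy * coverage ** beta     # increases with both factors

preds = np.array([1.0, np.nan, 0.0, np.nan])
y = np.array([1, 0, 0, 1])
print(selective_metric(preds, y))  # accuracy 1.0 on 2 answered, coverage 0.5
```

Tuning `a` then amounts to maximizing this score on a validation set, alongside the other hyperparameters.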

I am not yet satisfied with the performance achieved with this approach, and I am sure that there are smarter ways to approach this, for example a custom loss function. Advice, links to articles, or even related search keywords are welcome.

Green Falcon
Pierre Cattin
    You have to add new features to hope in the new feature space you diminish the Bayes error. – Green Falcon May 23 '18 at 00:39
  • Thanks! Feature engineering is a good idea to decrease Bayes error. I'd also like to better exploit the fact that I don't need to make a prediction for all the observations. – Pierre Cattin May 23 '18 at 09:10

1 Answer


A Bayes error rate this high means you can learn almost nothing from the current features. You have to add extra features and then check whether the data has a small Bayes error in the new feature space. Currently, it is worse than a disaster. A large Bayes error means that you have patterns, i.e. input vectors, with exactly the same feature values but different labels. Take a look here.
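The point about identical inputs with different labels can be checked empirically: even a perfect model must pick one label per distinct input vector, so the minority-label copies of each vector put a floor on the error rate. A sketch (function name is illustrative):

```python
from collections import Counter, defaultdict

def bayes_error_lower_bound(X, y):
    """Empirical lower bound on the error rate over the sample:
    for each distinct input vector, any classifier misclassifies
    at least the minority-label copies of that vector."""
    label_counts = defaultdict(Counter)
    for x, label in zip(X, y):
        label_counts[tuple(x)][label] += 1
    errors = sum(sum(c.values()) - max(c.values())
                 for c in label_counts.values())
    return errors / len(y)

# Two identical inputs with opposite labels force at least one error
X = [[0, 1], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 0, 1]
print(bayes_error_lower_bound(X, y))  # 0.25
```

If adding features separates such conflicting pairs, this bound drops, which is exactly the goal of the feature engineering suggested above.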

Green Falcon