
I am building a Neural Network for a binary classification problem where the Bayes error (lowest possible error rate) is probably close to 50%.

What makes the task easier is that I don't need to make a prediction for every observation in the test sample. I only want to make a prediction for the observations where the model has fairly high confidence. However, a higher prediction rate is still preferable to a lower one.

So far, I have used a standard neural network (feed-forward, cross-entropy loss, L2 regularization and sigmoid activation on final node). In the testing sample, I only take into account the observations for which the final node's value $(\hat{Y}_i)$ is outside of an interval of low confidence: $$\text{predicted class}_i = \begin{cases} 1 &\text{ if } \hat{Y}_i > 0.5 + a \\ 0 &\text{ if } \hat{Y}_i < 0.5 - a \\ \text{NA} &\text{else} \end{cases} \\ \text{where } a\in [0, 0.5] \text{ indicates the level of confidence required}$$
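The thresholding rule above can be sketched as follows (a minimal NumPy version; the function name is illustrative, and abstentions are encoded as `NaN` to stand in for NA):

```python
import numpy as np

def selective_predict(y_hat, a):
    """Map sigmoid outputs to {1, 0, NA} given a confidence margin a in [0, 0.5].

    y_hat : array of predicted probabilities from the final sigmoid node.
    Returns an array containing 1.0, 0.0, or NaN (abstain).
    """
    preds = np.full(y_hat.shape, np.nan)   # NA by default (abstain)
    preds[y_hat > 0.5 + a] = 1.0           # confident positive
    preds[y_hat < 0.5 - a] = 0.0           # confident negative
    return preds

# Example: margin a = 0.2 keeps only outputs outside (0.3, 0.7)
scores = np.array([0.95, 0.55, 0.10, 0.65])
print(selective_predict(scores, 0.2))      # [ 1. nan  0. nan]
```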

To tune the hyperparameters (including $a$), I have designed a metric that depends positively on:

  • Test-sample accuracy (only counting predictions different from NA)
  • Percentage of predictions that are different from NA.
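One plausible form for such a metric (the question does not specify the exact trade-off, so the `beta` weighting below is an assumption) is selective accuracy multiplied by a power of the coverage:

```python
import numpy as np

def selective_metric(preds, y_true, beta=0.5):
    """Toy tuning metric: selective accuracy weighted by coverage.

    preds  : outputs of the thresholding rule (1.0, 0.0, or NaN for NA)
    y_true : true binary labels
    beta   : how strongly to reward a high prediction rate (assumption;
             the exact trade-off is not specified in the question)
    """
    answered = ~np.isnan(preds)
    coverage = answered.mean()             # share of non-NA predictions
    if coverage == 0:
        return 0.0                         # no predictions made at all
    accuracy = (preds[answered] == y_true[answered]).mean()
    return accuracy * coverage ** beta     # increases with both factors

preds = np.array([1.0, np.nan, 0.0, np.nan])
y = np.array([1, 0, 0, 1])
print(selective_metric(preds, y))  # accuracy 1.0 on 2 answered, coverage 0.5
```

Tuning `a` then amounts to maximizing this score on a validation set, alongside the other hyperparameters.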

I am not yet satisfied with the performance achieved with this approach, and I am sure that there are smarter ways to approach this, for example a custom loss function. Advice, links to articles, or even related search keywords are welcome.

Green Falcon
Pierre Cattin
    You have to add new features to hope in the new feature space you diminish the Bayes error. – Green Falcon May 23 '18 at 00:39
  • Thanks! Feature engineering is a good idea to decrease Bayes error. I'd also like to better exploit the fact that I don't need to make a prediction for all the observations. – Pierre Cattin May 23 '18 at 09:10

1 Answer


A Bayes error rate this high means you can learn almost nothing from the current features. You have to add extra features and then check whether the data has a small Bayes error in the new feature space. Currently, it is worse than a disaster. A large Bayes error means that you have patterns, i.e. input vectors, with exactly the same feature values but different labels. Take a look here.
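The point about identical inputs with different labels can be checked empirically: even a perfect model must pick one label per distinct input vector, so the minority-label copies of each vector put a floor on the error rate. A sketch (function name is illustrative):

```python
from collections import Counter, defaultdict

def bayes_error_lower_bound(X, y):
    """Empirical lower bound on the error rate over the sample:
    for each distinct input vector, any classifier misclassifies
    at least the minority-label copies of that vector."""
    label_counts = defaultdict(Counter)
    for x, label in zip(X, y):
        label_counts[tuple(x)][label] += 1
    errors = sum(sum(c.values()) - max(c.values())
                 for c in label_counts.values())
    return errors / len(y)

# Two identical inputs with opposite labels force at least one error
X = [[0, 1], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 0, 1]
print(bayes_error_lower_bound(X, y))  # 0.25
```

If adding features separates such conflicting pairs, this bound drops, which is exactly the goal of the feature engineering suggested above.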

Green Falcon