
What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value lies on an arbitrary scale where only the relative ordering between values is significant (e.g. predicting which product size a customer will order: 'small' (coded as 0), 'medium' (coded as 1), 'large' (coded as 2) or 'extra-large' (coded as 3))? I'm trying to figure out whether there are better alternatives than quadratic loss (modeling the problem as a 'vanilla' regression) or cross-entropy loss (modeling the problem as classification).

xboard

1 Answer


Another approach was suggested in this paper for face age estimation:

https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Niu_Ordinal_Regression_With_CVPR_2016_paper.pdf

The authors use a set of binary classifiers, each predicting whether a data point is larger than a given threshold, and repeat this for multiple thresholds. In your case the network would have three binary outputs corresponding to

  • larger than 0
  • larger than 1
  • larger than 2.

For example, for 'large (2)' the ground-truth would be [1 1 0]. The final cost function is a weighted sum of the individual cross-entropy cost functions for each binary classifier.

This has the advantage of inherently penalizing larger errors more, because more of the individual cross-entropy terms will be violated. Plain categorical classification of the ordered outcomes doesn't inherently have this property.
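Below is a minimal sketch of this loss in PyTorch (my own illustration, not code from the paper): the network architecture, the `ordinal_loss` helper, and the uniform per-threshold weights are assumptions for the example; only the encoding ('large' → [1, 1, 0]) and the weighted sum of per-threshold binary cross-entropy terms follow the scheme described above.

```python
# Sketch of ordinal regression via K-1 "larger than k" binary classifiers.
# Illustrative only; not the authors' exact architecture or weighting.
import torch
import torch.nn as nn

NUM_CLASSES = 4                    # small, medium, large, extra-large
NUM_THRESHOLDS = NUM_CLASSES - 1   # "larger than 0", "larger than 1", "larger than 2"

def ordinal_targets(labels: torch.Tensor) -> torch.Tensor:
    """Encode integer labels 0..K-1 as K-1 binary 'larger than k' targets.
    E.g. label 2 ('large') -> [1, 1, 0]."""
    thresholds = torch.arange(NUM_THRESHOLDS, device=labels.device)
    return (labels.unsqueeze(1) > thresholds).float()

class OrdinalNet(nn.Module):
    """Toy network with one logit per threshold (sizes are arbitrary)."""
    def __init__(self, in_features: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_features, 32), nn.ReLU())
        self.head = nn.Linear(32, NUM_THRESHOLDS)

    def forward(self, x):
        return self.head(self.body(x))  # raw logits, shape (batch, K-1)

def ordinal_loss(logits, labels, weights=None):
    """Weighted sum of the per-threshold binary cross-entropy terms."""
    targets = ordinal_targets(labels)
    per_threshold = nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")        # shape (batch, K-1)
    if weights is None:
        weights = torch.ones(NUM_THRESHOLDS, device=logits.device)
    return (per_threshold * weights).sum(dim=1).mean()

def predict(logits):
    """Recover a rank by counting how many thresholds are exceeded."""
    return (torch.sigmoid(logits) > 0.5).sum(dim=1)

if __name__ == "__main__":
    model = OrdinalNet(in_features=10)
    x = torch.randn(8, 10)
    y = torch.randint(0, NUM_CLASSES, (8,))
    loss = ordinal_loss(model(x), y)
    loss.backward()
    print(loss.item(), predict(model(x)))
```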

Chrigi