
How do you treat statistical uncertainties coming from non-convex optimization problems?

More specifically, suppose you have a neural network. It is well known that the loss is not convex; the optimization procedure with any stochastic optimizer, together with the random weight initialization, introduces randomness into the training process, which translates into different "optimal" regions being reached at the end of training. Now, supposing that any minimum of the loss is an acceptable solution, there is no guarantee that those minima correspond to the same model performance (i.e., the same scores).
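
As an illustration, here is a minimal sketch (assuming PyTorch; the toy data, architecture, and hyperparameters are all hypothetical) of how runs that differ only in the random seed, on identical data, can end in minima with different final losses:

```python
import torch
import torch.nn as nn

def train_once(seed: int) -> float:
    torch.manual_seed(seed)               # controls init and minibatch sampling
    g = torch.Generator().manual_seed(0)  # data generator: the data is the SAME for every run
    X = torch.rand(256, 1, generator=g) * 6 - 3
    y = torch.sin(X) + 0.1 * torch.randn(256, 1, generator=g)

    model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()

    for _ in range(500):
        idx = torch.randint(0, 256, (32,))  # stochastic minibatches
        loss = loss_fn(model(X[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()

    return loss_fn(model(X), y).item()      # final full-data loss

# Five runs, five different seeds, same data: typically five different "optima".
print([round(train_once(s), 4) for s in range(5)])
```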

Ideally, one would repeat the same optimization N times and look at the distribution of results, but practically speaking, with a large neural network you cannot afford a reasonably large number of replicas and a statistical approach. Moreover, even looking at frequency histograms, it would not be trivial to model such a distribution and quote an expected value and variance (of course one can select some percentiles, but that is not a formally correct approach).
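
Continuing the sketch above (reusing the hypothetical `train_once` and an arbitrary N = 20), the replica approach would summarize the score distribution without assuming a parametric form, for instance via percentiles or a bootstrap confidence interval:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = np.array([train_once(seed) for seed in range(20)])  # N = 20 replicas

print("mean =", scores.mean(), "std =", scores.std(ddof=1))
print("2.5th/97.5th percentiles:", np.percentile(scores, [2.5, 97.5]))

# Bootstrap CI for the mean score, with no distributional assumption.
boot = rng.choice(scores, size=(10_000, scores.size), replace=True).mean(axis=1)
print("95% bootstrap CI for the mean:", np.percentile(boot, [2.5, 97.5]))
```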

Notice: I am not changing the data; I am talking about the performance variance introduced by the non-convex optimization problem. Clearly, changing the data (for instance with some cross-validation) would change the loss and consequently introduce another source of variance into the game, so I am not interested in that.

Dave
  • In my experience, this is something that is simply tolerated: almost any minimum found by training will do and is simply used. – Nikos M. May 27 '22 at 08:08
  • I agree that it is tolerated, but it is usually not estimated. There might be large fluctuations that are not taken into account. In the recent literature, the majority of papers don't quote those variances, which can be of the same size as the improvement the authors report over their benchmarks. – Dave May 27 '22 at 08:15
  • One can always re-train and change the performance. As far as published results are concerned, I doubt the effect is significant enough (at least on average) to discredit the publication. – Nikos M. May 27 '22 at 08:18
  • In my experience, restricted to NLP, the quoted improvement in classification performance is often sub-percent, while the variance across training runs (I am thinking, for instance, of a BERT-based sentence classification model) is at least one order of magnitude larger. Honestly, this worries me a bit about the reliability of research in this field. – Dave May 27 '22 at 08:27

0 Answers