It's fairly common to see people stacking different models when chasing marginal gains, e.g. in Kaggle competitions or the Netflix Prize. I would like to understand the mathematics behind this. When stacking experts, under the assumption that they are independent and that each one brings new information (signal) to the stack, what is the expected convergence rate of the ensemble's error, measured in KL-divergence, MSE, or binary cross-entropy, as a function of the number of experts? Are there universal behaviours or known convergence rates in this setting?
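
To make the setting concrete, here is a toy simulation of the idealized case I have in mind (all names and parameters here are my own, purely illustrative): each expert predicts the true target plus independent zero-mean Gaussian noise, and the "stack" is a plain average. In that case the MSE should decay like $\sigma^2 / n$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: each "expert" predicts the true target plus
# independent zero-mean Gaussian noise with the same variance sigma^2.
n_samples = 10_000
sigma = 1.0
y = rng.normal(size=n_samples)  # true targets

for n_experts in [1, 2, 4, 8, 16, 32]:
    # Independent expert errors: shape (n_experts, n_samples)
    errors = rng.normal(scale=sigma, size=(n_experts, n_samples))
    preds = y + errors             # each expert's prediction
    ensemble = preds.mean(axis=0)  # simple averaging "stack"
    mse = np.mean((ensemble - y) ** 2)
    print(f"n={n_experts:3d}  MSE={mse:.4f}  sigma^2/n={sigma**2 / n_experts:.4f}")
```

My question is essentially whether this $1/n$ behaviour carries over (or how it degrades) for losses like KL-divergence or binary cross-entropy, and for learned stacking weights rather than a plain average.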

