
I have been using the epsilon-greedy action selection strategy and recently came across the Boltzmann (softmax) action selection strategy. One thing I am not clear about in Boltzmann exploration is the temperature parameter. How should this parameter be defined? Should it be held constant, or decreased over the course of training? And how do I decide on its absolute value?
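To make the question concrete, here is a minimal sketch of Boltzmann selection as it is usually defined: action `a` is chosen with probability proportional to `exp(Q(a) / T)`, where `T` is the temperature. The function names, the exponential decay schedule, and its default values (`t_start`, `t_min`, `decay`) are assumptions made up for illustration, not recommended settings.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Return softmax probabilities over Q-values for a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the max keeps exp() from overflowing and does not
    # change the resulting probabilities.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def decayed_temperature(step, t_start=1.0, t_min=0.05, decay=0.995):
    """Hypothetical schedule: exponential decay with a lower bound."""
    return max(t_min, t_start * decay ** step)
```

In this sketch a large `T` makes the probabilities nearly uniform (more exploration), while a small `T` concentrates them on the highest-valued action (close to greedy). Because `T` divides the Q-values, a sensible absolute value presumably depends on the scale of the Q-values themselves.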

Thanks

chink

0 Answers