
Consider applying reinforcement learning (the dynamic programming method, i.e. value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its current value and the current values of the cells that the actions from that state lead to. Now:

  1. How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the current value function is very small? I don't understand how to implement the stopping mechanism in the grid-world scenario (no discount).

  2. Is the value function the set of values of all the cells in the grid world?
Martin Thoma
girl101

1 Answer


1 - You should set a threshold (a hyperparameter) that tells you when to quit the loop.

Let V be the values for all states s, and V' the new values after one sweep of value iteration.

If $\sum_s |V(s) - V'(s)| \le \text{threshold}$, quit.
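The stopping rule above can be sketched in Python. This is a minimal sketch under assumptions that are not in the question: a hypothetical 4x4 grid, the top-left cell as the only terminal state, a reward of -1 per move, deterministic moves, and moves off the grid leaving the agent in place. `value_iteration`, `threshold` and `max_sweeps` are illustrative names.

```python
# Minimal sketch: undiscounted value iteration on a small deterministic grid world.
# Assumptions (not from the question): 4x4 grid, terminal goal at (0, 0) with
# value 0, reward -1 per move, moves off the grid keep the agent in place.
ROWS, COLS = 4, 4
TERMINAL = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0


def step(cell, action):
    """Return the next cell for a deterministic move, staying put at walls."""
    r, c = cell[0] + action[0], cell[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return cell


def value_iteration(threshold=1e-6, max_sweeps=10_000):
    # V holds one value per cell: this whole table is the value function.
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for sweep in range(1, max_sweeps + 1):
        V_new = {}
        for cell in V:
            if cell == TERMINAL:
                V_new[cell] = 0.0
                continue
            # Bellman optimality backup with no discount (gamma = 1).
            V_new[cell] = max(STEP_REWARD + V[step(cell, a)] for a in ACTIONS)
        # Stopping rule: sum over states of |V(s) - V'(s)|.
        delta = sum(abs(V[s] - V_new[s]) for s in V)
        V = V_new
        if delta <= threshold:
            return V, sweep
    return V, max_sweeps
```

With these assumptions, `value_iteration()` stops after 7 sweeps, and each cell's value is minus its Manhattan distance to the goal. Without discounting, convergence relies on the terminal state and the negative step reward; in a grid world where the agent could collect positive reward forever, the undiscounted values would keep growing and never meet the threshold, which is why `max_sweeps` caps the loop.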

2 - Yes, V is a function over every cell in the grid: it assigns a value to each cell, which is why you need to update every cell.
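To make "V is a function" concrete: once the cells are enumerated, V can be stored as a 2-D list and evaluating V(s) becomes an array lookup. The values below are hypothetical (the undiscounted values of a 3x3 grid with the goal in the top-left corner and a reward of -1 per move), and `v` and `all_values` are illustrative names.

```python
# Sketch: the value function stored as a 2-D list, one entry per grid cell.
# Hypothetical values: undiscounted values of a 3x3 grid whose goal is the
# top-left cell, with a reward of -1 per move.
V = [
    [0.0, -1.0, -2.0],
    [-1.0, -2.0, -3.0],
    [-2.0, -3.0, -4.0],
]


def v(state):
    """Evaluate V(s) for a cell s = (row, col) with a plain array lookup."""
    row, col = state
    return V[row][col]


# The value function as a whole is the collection of values of every cell.
all_values = {(r, c): v((r, c)) for r in range(len(V)) for c in range(len(V[0]))}
```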

Hope it helps.

Neil Slater
Dref360
  • How do I set a threshold? What I am doing is updating the value of each cell with respect to the cells that the agent can move to from the current cell. What do you mean by saying V is a function? – girl101 Aug 06 '15 at 04:00
  • $V(s)$ is a function that returns the utility of that state. In a computer program, where you have enumerated the states, you may well end up modelling $V$ as a simple array and treating it as an array lookup. – Neil Slater Aug 06 '15 at 08:53
  • How do I set the threshold? – girl101 Aug 10 '15 at 05:08
  • Run some tests to see what works best for you. Typically a threshold of 0 gives the optimal solution: it means there is no better solution than the current one. Since it's a hyperparameter, you can learn it via a neural network. – Dref360 Aug 10 '15 at 20:55
  • @Dref360 I want to learn it via dynamic programming; I don't want to learn it via a neural network. – girl101 Aug 11 '15 at 03:53
  • @Dref360 What is a hyperparam? I googled it and got the term hyperparameter; is hyperparam short for that? – girl101 Aug 11 '15 at 03:54
  • @Dref360 Can I stop learning when I notice no new updates in any of the states? – girl101 Aug 11 '15 at 04:19
  • @Rishika Yes, hyperparam == hyperparameter; examples in a neural network are the number of layers and the number of hidden neurons. And yes, you can stop learning when no state gets updated any more. That means there is no better solution. – Dref360 Aug 11 '15 at 21:19
  • okay, got it :) – girl101 Aug 12 '15 at 03:45
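The "stop when no state is updated" rule from the comments amounts to a threshold of 0. A minimal sketch, assuming a hypothetical 1-D corridor of five cells with the terminal goal at index 0 and a reward of -1 per move; unlike a two-table version, this one updates `V` in place during the sweep, which is a design choice rather than something the thread prescribes. `value_iteration_until_stable` is an illustrative name.

```python
# Minimal sketch of "stop when no state's value changes" (threshold = 0).
# Assumptions: hypothetical 1-D corridor of N cells, terminal goal at index 0,
# reward -1 per move, actions left/right, walls keep the agent in place.
N = 5
STEP_REWARD = -1.0


def value_iteration_until_stable(max_sweeps=1_000):
    V = [0.0] * N
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for s in range(1, N):  # cell 0 is terminal and stays at 0
            left, right = max(s - 1, 0), min(s + 1, N - 1)
            new_v = max(STEP_REWARD + V[left], STEP_REWARD + V[right])
            if new_v != V[s]:
                changed = True
            V[s] = new_v  # in-place (Gauss-Seidel style) update
        if not changed:  # a full sweep with no updates: stop
            return V, sweep
    return V, max_sweeps
```

Here the loop stops after 5 sweeps with `V == [0.0, -1.0, -2.0, -3.0, -4.0]`. Exact-equality stopping works in this example because the values become exact after finitely many sweeps; when values only approach their limit (for instance with discounting), they may keep changing by tiny amounts in floating point, so a small positive threshold plus a sweep cap is the safer choice.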