There are about five different retirement plans available in my country, and people can choose freely among them. I would like to build a solution that predicts the best plan(s) for a given set of initial conditions (representing a person's current situation).
My plan was to use Q-learning for this. However, the problem I have run into is that future rewards seem to depend on past actions:
- Some benefits (for example, a reduced tax rate during "cash-out" at the beginning of retirement) will be available only if the agent has chosen to invest in a particular plan at least X times in the past.
- Investment return will depend on how much was invested in a particular plan in the past.
I believe I could solve the second problem by calculating the reward based only on the last decision, but I see no way around the first case.
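To make the two conditions concrete, here is a minimal toy sketch of the payoff structure I have in mind. All names and numbers (`MIN_CONTRIBUTIONS` standing in for X, the two tax rates, the flat 5% return, the `History` class) are made up for illustration and do not come from any real plan.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical constants, chosen only for illustration.
N_PLANS = 5               # "about five" plans
MIN_CONTRIBUTIONS = 3     # the "X times" threshold for the tax benefit
REDUCED_TAX = 0.10        # cash-out tax if the threshold was reached
STANDARD_TAX = 0.20       # cash-out tax otherwise
ANNUAL_RETURN = 0.05      # same flat return for every plan (a simplification)


@dataclass
class History:
    """Everything the final payoff depends on besides the last action."""
    counts: List[int] = field(default_factory=lambda: [0] * N_PLANS)
    balances: List[float] = field(default_factory=lambda: [0.0] * N_PLANS)


def invest(history: History, plan: int, amount: float) -> None:
    """One decision step: existing balances grow, then `amount` goes into `plan`."""
    for i in range(N_PLANS):
        # Condition 2: the return depends on how much was invested before.
        history.balances[i] *= 1.0 + ANNUAL_RETURN
    history.balances[plan] += amount
    history.counts[plan] += 1


def cash_out(history: History) -> float:
    """Final payoff at retirement, after tax."""
    total = 0.0
    for count, balance in zip(history.counts, history.balances):
        # Condition 1: the tax rate depends on how often the plan was chosen.
        tax = REDUCED_TAX if count >= MIN_CONTRIBUTIONS else STANDARD_TAX
        total += balance * (1.0 - tax)
    return total


def simulate(plans: List[int], amount: float = 100.0) -> float:
    """Apply a fixed sequence of plan choices and return the cash-out value."""
    history = History()
    for plan in plans:
        invest(history, plan, amount)
    return cash_out(history)


focused = simulate([0, 0, 0])  # same plan three times: reaches the threshold
spread = simulate([0, 1, 2])   # three different plans: never reaches it
print(focused, spread)
```

In this sketch both sequences end with the same total balance and the same last-step behaviour, yet `focused` pays out more than `spread` purely because of the earlier choices. That is the dependence on past actions I am worried about.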
My question is: am I correct that Q-learning, or any other reinforcement learning algorithm, won't be able to take those conditions into account (especially the first one)? If so, could you suggest an AI algorithm that could find the optimal solution in such a scenario?
I highly appreciate your help!