"Learning" algorithm to use when the future depends on past events (Markov property not met)

Question

There are around 5 different retirement plans available in my country, and people can pick from them freely. I would like to create a solution that tries to predict the best plan(s) given a particular set of initial conditions (representing a person's current situation).

My plan was to use Q-learning for this. However, the problem I have encountered is that future rewards seem to be dependent on past actions:

Some benefits (for example, a reduced tax rate during "cash-out" at the beginning of retirement) will be available only if the agent has chosen to invest in a particular plan at least X times in the past.
The investment return will depend on how much was invested in a particular plan in the past.

I believe that I could solve the second problem by calculating a reward based only on the last decision, but I see no way around the first one.

My question is: am I correct that Q-learning, or any other reinforcement learning algorithm, won't be able to take those conditions into account (especially the first one)? If so, would you have any suggestion for an AI algorithm that could find the optimal solution in such a scenario?

I highly appreciate your help!

One approach to the first problem would be to create a new feature that aggregates the past decisions and measures how many times the person has invested in a plan; this count can then be used as a new feature. - Nikos M., Apr 30 '21 at 10:26
My understanding of RL algorithms is still limited, so maybe I'm missing something, but wouldn't such a feature have to be translated somehow into a reward at the end of each turn? If so, it should affect the reward only if the algorithm picked the plan at least X times during exploration in this episode, right? - White_Raven, Apr 30 '21 at 12:38
Hmm, no, I don't think so. It can be a new feature, you see, independent of the others. On the other hand, I am not an expert in RL. - Nikos M., Apr 30 '21 at 15:00
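
The counting idea from the first comment can be sketched as tabular Q-learning over an augmented state. If the state holds the current year plus the number of times each plan has been chosen so far, the cash-out benefit depends only on the current state, so the Markov property holds again and the benefit can be paid as an ordinary reward on the final step. Everything below is a hypothetical toy, not a model of any real retirement scheme: two plans instead of five, a 5-year horizon, and the constants `MIN_YEARS` (the "X" from the question), `BONUS`, and the yearly return of plan 1 are all made-up assumptions.

```python
import random
from collections import defaultdict

# All constants are hypothetical values chosen for illustration only.
N_PLANS = 2      # toy setting: two plans instead of five
HORIZON = 5      # number of yearly investment decisions
MIN_YEARS = 3    # "X": picks of plan 0 needed to unlock the cash-out benefit
BONUS = 10.0     # value of the cash-out benefit

def initial_state():
    # State = (year, counts); counts[i] = how often plan i was chosen so far.
    return (0, (0,) * N_PLANS)

def step(state, action):
    """Apply one yearly decision and return (next_state, reward, done)."""
    year, counts = state
    counts = tuple(c + (i == action) for i, c in enumerate(counts))
    year += 1
    done = year == HORIZON
    reward = 1.0 if action == 1 else 0.0   # plan 1 pays a small yearly return
    if done and counts[0] >= MIN_YEARS:
        reward += BONUS                    # benefit read off the state itself
    return (year, counts), reward, done

def train(episodes=20000, alpha=0.2, gamma=1.0, epsilon=0.3, seed=0):
    """Plain tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)                 # q[(state, action)] -> estimate
    for _ in range(episodes):
        state, done = initial_state(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_PLANS)
            else:
                action = max(range(N_PLANS), key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in range(N_PLANS))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

def greedy_rollout(q):
    """Follow the learned greedy policy once; return (actions, total reward)."""
    state, done, total, actions = initial_state(), False, 0.0, []
    while not done:
        action = max(range(N_PLANS), key=lambda a: q[(state, a)])
        actions.append(action)
        state, reward, done = step(state, action)
        total += reward
    return actions, total

if __name__ == "__main__":
    print(greedy_rollout(train()))
```

In this toy, the best achievable return is 12: plan 0 three times to unlock the benefit and plan 1 in the other two years. The second condition from the question (returns depending on how much was invested earlier) could be handled the same way, by adding a discretized per-plan balance to the state. The cost is a larger state space, so with five plans and long horizons a function approximator may be more practical than a table.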

