
How can I introduce bias for a decision tree model while building an ML application?

e.g. If I am building a stock trading recommendation algorithm, I would want to recommend a stock only when the model detects a probability of a swing (upturn or downturn). But for a set of stocks that I have defined as volatile, I would like the model to recommend them only when the probability of a swing is above a certain value. Can I define this as bias? How can I introduce this in a model?

Can I:

  • Introduce a categorical variable that defines a certain stock as volatile and then fit (see the sketch after this list)?

or

  • Assign such a stock a categorical value and then fit?
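
For instance, a minimal sketch of the first option, assuming scikit-learn's `DecisionTreeClassifier` and entirely hypothetical feature names and values:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: made-up technical features plus a
# label for whether a swing followed (1) or not (0).
tickers = ["AAPL", "TSLA", "GME", "MSFT", "AAPL"]
df = pd.DataFrame({
    "momentum":   [0.2, -0.1, 0.5, 0.3, -0.4],
    "volume_chg": [1.1,  0.9, 1.4, 1.0,  0.8],
    "swing":      [1,    0,   1,   1,    0],
})

# Flag the stocks I have defined as volatile with a 0/1
# categorical feature and let the tree split on it.
volatile_tickers = {"TSLA", "GME"}
df["is_volatile"] = [int(t in volatile_tickers) for t in tickers]

X = df[["momentum", "volume_chg", "is_volatile"]]
y = df["swing"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)
```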

Apologies, I am not able to explain my question better, but essentially I want to introduce bias into a model. What is the correct approach to doing it?

PyNoob
  • Please note that the term *bias* might be confusing here since in ML it has different meanings and often relates to the *bias-variance tradeoff*. See, for example, the first part of [this answer](https://datascience.stackexchange.com/a/97273/84891). – Jonathan Jul 13 '21 at 07:06

1 Answer


In general it's a bad idea to try to force a model to do something: ML is supposed to be data-driven, so if the data doesn't represent the particular desirable pattern then either there's a good reason for that (i.e. the pattern is not as relevant as one thinks it is) or the data is not suitable for the task (or noisy, incomplete...).

You don't give any detail about the current model, so there's no way to know whether introducing a variable will change the model the way you want; it depends on how the ranking is calculated (assuming there's a ranking involved).

Keep in mind that there's no reason to make the model do everything itself, especially if it's not based on the data. It might make sense to do some rule-based pre- or post-processing. In the case you mention it would be simple to post-process the prediction: if the stock is volatile and the probability is lower than the threshold then ignore this stock.
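
For example, a minimal sketch of such a post-processing rule (the thresholds, feature handling, and `predict_proba`-style model are placeholders for whatever your actual pipeline uses):

```python
# A rule-based post-processing step, kept outside the model.
DEFAULT_THRESHOLD = 0.5    # ordinary decision threshold
VOLATILE_THRESHOLD = 0.8   # stricter threshold required for volatile stocks

def recommend(model, features, ticker, volatile_tickers):
    """Return True if the stock should be recommended."""
    # Probability of the positive ("swing") class from any
    # scikit-learn-style classifier.
    p_swing = model.predict_proba([features])[0][1]
    threshold = VOLATILE_THRESHOLD if ticker in volatile_tickers else DEFAULT_THRESHOLD
    return p_swing >= threshold
```

The advantage of this design is that the "bias" is an explicit, auditable rule you can tune independently, rather than something buried inside the trained tree.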

Erwan
  • Yes, I would completely agree that it's a bad idea to force a model to provide a desired output; however, if need be, introducing bias into a model raises interesting possibilities. Can you help me with an example if it has been done before? – PyNoob Jul 13 '21 at 13:09
  • @PyNoob what you're describing is a deterministic outcome based on some variables. The way to make it happen *in the model* (I insist that it doesn't have to be part of the model itself; it's usually not efficient this way) is to train the model with data which satisfies exactly this condition. But you didn't give any detail about the current design of your system, so I can't be more specific about your case. – Erwan Jul 13 '21 at 13:23
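
A sketch of what this last comment describes, i.e. encoding the condition in the training labels rather than in the model; the column names (`swing_magnitude`, `is_volatile`) and thresholds are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical history: how big the observed swing was, and
# whether the stock is on the user-defined volatile list.
df = pd.DataFrame({
    "swing_magnitude": [0.6, 0.9, 0.4, 0.85],
    "is_volatile":     [0,   1,   1,   1],
})

# Encode the condition into the labels: volatile stocks are only
# labelled "recommend" (1) when the swing clears a higher bar.
df["label"] = [
    int(m >= (0.8 if v else 0.5))
    for m, v in zip(df["swing_magnitude"], df["is_volatile"])
]
# A tree trained on these labels learns the stricter rule for
# volatile stocks directly from the data.
```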