
Most advanced supervised learning techniques are non-deterministic by construction: the final model usually depends on random parts of the learning process (random weight initialization for neural networks, or variable selection / splits for gradient-boosted trees). This phenomenon can be observed by plotting the predictions obtained with one random seed against the predictions obtained with another seed: the predictions are usually correlated but don't coincide exactly.
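For illustration, here is a minimal sketch of how one might observe this, assuming scikit-learn and a synthetic dataset (neither of which is from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same data, same hyperparameters; only the seed differs.
# subsample < 1.0 makes the fit stochastic, so the seed matters.
preds = []
for seed in (1, 2):
    model = GradientBoostingClassifier(subsample=0.8, random_state=seed)
    model.fit(X_train, y_train)
    preds.append(model.predict_proba(X_test)[:, 1])

# Strongly correlated, but not identical: some individuals end up on
# different sides of the decision boundary depending on the seed.
print("correlation:", np.corrcoef(preds[0], preds[1])[0, 1])
print("flipped decisions:", np.mean((preds[0] > 0.5) != (preds[1] > 0.5)))
```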

Generally speaking, this is often not a problem. When trying to separate green tomatoes from red ones, only the overall performance of the classifier matters; individual predictions don't, as no tomato will be upset or sue you. However, for more advanced problems, specifically those relating to people (education, work, loan applications...), variance in individual scores due to the non-deterministic learning process might become a problem. Some people might get a life-impacting result with a given seed and the opposite result had you used another seed. This doesn't seem very fair or ethical to me. Choosing a seed sort of feels like a trolley problem...

Setting aside the techniques that might be used to reduce this 'seed dependence' (regularization, ensembling and so on), I would be interested in the ethical aspects of this variance in outputs related to the random seed. But I can't find any resource on the ethical matter. (I can barely find any resources on the topic of seed dependence itself; I suspect this phenomenon is not widely disclosed, as it might deter people from wanting to use 'Artificial Intelligence'.)

In the context of models impacting people's lives, have the ethical consequences of non-deterministic learning processes been formalized / evaluated?

Nikos M.
Lucas Morin
  • Interesting question and a +1 from me, though I do wonder if this would be a better fit on the AI Stack. – Dave Jun 23 '21 at 11:55
  • I wondered about it too, but couldn't find a strong argument for the AI SE. – Lucas Morin Jun 23 '21 at 11:58
  • This is a great question! In summary, there is no (current) way to build "ethical" requirements into machine learning. So the next best thing is to take the results only as indications and have humans make the final evaluation. IMO – Nikos M. Jun 23 '21 at 14:42
  • Interpretability of deep models also plays a role: if the model is NOT interpretable, then the error cannot be traced back to its causes, and there is no way to extract meaning from the model (thus the result cannot be justified ethically). – Nikos M. Jun 23 '21 at 14:45
  • My previous comment also means that if interpretable AI succeeds, then ethical reasoning about the machine learning result is at least not impossible. – Nikos M. Jun 23 '21 at 14:47
  • The random-seed phenomenon is not discussed much (I presume) because on average it doesn't matter that much: when training many models, the random effects are usually of no consequence on average. That does not mean a concrete model won't suffer from them, and where people's lives are concerned this can have serious ethical consequences as well. Plus, I agree that some shortcomings of AI are not discussed, this being one of them. – Nikos M. Jun 23 '21 at 16:57
  • You may want to check out [this researcher](https://sites.google.com/site/zliobaite/non-discriminatory) who works on fairness and discrimination in AI. – Nikos M. Oct 23 '21 at 10:29
  • Cynthia Rudin has done some really nice work in this area, please check out her lectures and papers. Some are available on youtube. – Rajiv Sambasivan Feb 07 '22 at 03:34

2 Answers


You could measure this effect, and mitigate possible ethical issues, by training several models, each with its own random seed.

When making a prediction that may affect someone's life, don't make the decision automatically if the variance of the actions one would take as a result of these predictions is above a chosen threshold.

If the decisions must be made without a human in the loop, one could use the variance of the predictions as part of the decision criteria when choosing amongst the different models (referring to different model architectures, not seeds) at your disposal.
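A rough sketch of the procedure described above, assuming scikit-learn, a synthetic dataset, and an illustrative review threshold (none of which come from this answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, y_train = X[:-100], y[:-100]
X_new = X[-100:]  # stand-in for new cases that need a decision

# Train several copies of the same model, differing only in their seed.
probs = np.stack([
    GradientBoostingClassifier(subsample=0.8, random_state=seed)
    .fit(X_train, y_train)
    .predict_proba(X_new)[:, 1]
    for seed in range(10)
])

mean_prob = probs.mean(axis=0)  # ensemble prediction per case
spread = probs.std(axis=0)      # seed-induced disagreement per case

THRESHOLD = 0.05  # illustrative cut-off; to be chosen per application
for i, (p, s) in enumerate(zip(mean_prob[:5], spread[:5])):
    if s > THRESHOLD:
        print(f"case {i}: seed-dependent (std={s:.3f}) -> human review")
    else:
        print(f"case {i}: stable (std={s:.3f}) -> "
              f"{'accept' if p > 0.5 else 'reject'}")
```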

Michael Higgins

You seem to be throwing several things into one bag here that should each be evaluated individually, and that actually are being researched intensively at the moment:

  • Random aspects of model input
  • Unfair aspects of model input
  • Biased aspects of model input
  • Random aspects of the process (learning behaviour)
  • Unfair aspects of the process
  • Biased aspects of the process
  • Random aspects of the output
  • Unfair aspects of the output
  • Biased aspects of the output
  • Explainability (XAI) as a sub-field of its own
  • Result reproducibility
  • Result consensus (Do two or more independent AI research teams come to the same conclusions for the same data set? If not, how much do their results differ, and in what patterns?)

Some practical issues influencing ethical application of AI methodology arise in real-world circumstances, too:

  • Outdatedness of models trained on old data
  • Reduced potential for humans to stop machine-made decisions (even when humans are kept in the loop to make the final decision, a reduced headcount means they might only oversee a fraction of all decisions, e.g. edge cases, and overlook the others; so where should this threshold be set?)
  • Potential job losses for the humans previously involved in the decision process (this influences the social aspects of ethics, with the main question being "Is it more ethical to pay a human for a job than to pay for a machine's electricity?")

So I would object to viewing this as something undisclosed, and rather view it as an ongoing debate and research topic that is currently gaining traction.

And that is exactly what the model-transparency initiatives in particular are working towards, by aiming to replace black-box models with explainable AI.

For example, new research currently in review, which came out after you asked the question, indicates that in the future random seeding may not be necessary at all (which still doesn't solve the other bullet points mentioned above).

This means that, as far as the non-deterministic learning process and model initialization are concerned, this randomness likely has a rather negligible effect compared to other ethical considerations. That might explain why it is overlooked in favour of more pressing ethical dilemmas, despite being an absolutely valid ethical consideration in its own right.

ABC