I am facing a fairly unusual problem: predicting which floor of a building an audio recording was made on. I have tried many machine learning approaches, but so far none of them gives significant and stable results, despite applying almost every tweak I found online.

This makes me wonder whether the problem itself is learnable. The cause may be the data I have available, but the question still stands for the general case.

I therefore have two questions.

  • How can I tell when a problem is not solvable by machine learning? Is there a rule of thumb to check at the beginning?

  • How can I tell when the performance obtained is close to the best achievable? You can see that in my case, the human level is not very useful because it is close to random for this problem.

nprime496
  • [How to know that your machine learning problem is hopeless](https://stats.stackexchange.com/questions/222179/how-to-know-that-your-machine-learning-problem-is-hopeless) – Dave Jul 25 '22 at 15:45
  • Only problems where a probability can be learned. ML can make probabilistic predictions about the future. Fractals and cellular automata may be learned by AI to make predictions about patterns in a complex tapestry. – Golden Lion Jul 26 '22 at 15:14

2 Answers


The short answer is that not all problems are solvable by machine learning, and no one knows which problems are currently solvable.

That said, there are common heuristics that offer clues about solvability. One heuristic is: can a human solve it? If a human can do the task, there is a greater chance that a machine learning solution exists. Another heuristic is incremental learning: does adding more data or more training increase the performance of the machine learning system? If performance increases, there is a better chance that, with enough data and training, the task is solvable.
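As a rough illustration of the incremental-learning heuristic, here is a minimal sketch that trains the same model on growing subsets of the training data and records held-out accuracy. Everything in it is an assumption made for illustration: the synthetic two-class data, the `signal` parameter (the gap between class means, where 0 means the labels are pure noise), and the nearest-centroid classifier standing in for whatever model you actually use.

```python
import random
from statistics import mean

def make_data(n, signal, rng):
    """Hypothetical 2-class data; `signal` is the gap between class means (0 = pure noise)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = (rng.gauss(label * signal, 1.0), rng.gauss(label * signal, 1.0))
        data.append((x, label))
    return data

def fit_centroids(train):
    """Nearest-centroid classifier: store one mean vector per class."""
    centroids = {}
    for label in {y for _, y in train}:
        points = [x for x, y in train if y == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*points))
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, test):
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in test)

def learning_curve(train, test, sizes):
    """Held-out accuracy after training on the first n examples, for each n in sizes."""
    return [(n, accuracy(fit_centroids(train[:n]), test)) for n in sizes]

rng = random.Random(0)
for signal in (2.0, 0.0):  # learnable case vs. pure-noise case
    train, test = make_data(2000, signal, rng), make_data(500, signal, rng)
    print("signal =", signal, learning_curve(train, test, [20, 200, 2000]))
```

If the curve stays flat at chance level no matter how much data is added, as it should in the `signal = 0.0` case, that is a hint (not a proof) that the features carry little information about the label.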

Specifically for your problem, there is a good chance it is not solvable. You state that humans are unable to perform above chance: "human level is not very useful because it is close to random for this problem." This limits possible solutions, because there will not be enough of the high-quality training labels that supervised machine learning requires.

Brian Spiering

I completely agree with Brian's answer; I just want to add a couple of points:

  • When starting on a completely new problem, it is important to research whether any similar problem has been studied before. If some system has been tested on this problem (or a related one), how did it perform? If it's very hard to find anything remotely related to the problem, either this is an exciting opportunity to break new ground... or, more likely, it's not doable.
  • If there's no system available to compare against, it's useful to start by designing a baseline system. Of course there's the very crude option of a random or majority baseline classifier, but usually it's not too much effort to design a simple system (e.g. a simple decision tree). This serves several goals:
    • If the baseline doesn't work at all (not better than random), this is a sign that there's a serious issue in the design of the problem.
    • If it works, it gives an indication about the level of performance that can be expected.
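A minimal sketch of that baseline comparison, under loud assumptions: the toy data below is synthetic (one informative feature, one noise feature), and the "simple decision tree" is reduced to a hand-written depth-1 tree (a decision stump) so the example has no dependencies. The helper names `make_toy_data`, `majority_baseline`, and `fit_stump` are hypothetical.

```python
import random
from collections import Counter

def make_toy_data(n, rng):
    """Hypothetical data: feature 0 depends on the label, feature 1 is pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

def majority_baseline(train_labels):
    """Predictor that always outputs the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda x: majority

def fit_stump(X, y):
    """Depth-1 decision tree: choose the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no valid split exists: fall back to the majority label
        return majority_baseline(y)
    _, j, t, l_lab, r_lab = best
    return lambda x: l_lab if x[j] <= t else r_lab

def accuracy(predict, X, y):
    return sum(predict(x) == lab for x, lab in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = make_toy_data(200, rng)
X_test, y_test = make_toy_data(300, rng)
baseline = majority_baseline(y_train)
stump = fit_stump(X_train, y_train)
print("majority baseline:", accuracy(baseline, X_test, y_test))
print("decision stump:   ", accuracy(stump, X_test, y_test))
```

On real data, the gap between the two numbers is what matters: a simple model that cannot beat the majority baseline on held-out data suggests a problem with the features, the labels, or the task itself.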
Erwan