Rigor in Theory. I want to learn the scientific method and how to apply it in machine learning. Specifically, I want to know how to verify that a model has captured the pattern in the data, and how to reach conclusions rigorously from well-justified empirical evidence.
Verification in Practice. My colleagues in both academia and industry tell me that measuring the model's accuracy on test data is sufficient, but I am not confident that this criterion is enough.
Data Science Books. I have picked up several data science books, such as Skiena's manual, Dell EMC's book, and Waikato's data mining book. Although each has a section on diagnosing the model and measuring results, my instinct is that these are heuristics rather than rigorous methods.
Scientific Method Books. Searching for books on the scientific method, I found Statistics and Scientific Method: An Introduction for Students and Researchers and Principles of Scientific Methods, which seem to address the crux of my question. I plan to study both of them.
My Questions. Here are a couple of questions on which I hope to get guidance from your wonderful community.
- Is it feasible to apply the scientific method rigorously in machine learning applications such as recommendation engines or the social sciences, or has our scientific and technological progress not yet reached that degree of maturity, so that the best we can hope for is heuristic approximations?
- Is it feasible to practice machine learning in industry by applying the scientific method, or do industry leaders prefer cheap heuristics to minimize a project's costs?
- Are the scientific method books I mentioned above useful for improving my own machine learning skills? Are they worth the time and effort?
- Do you know of better resources for learning the scientific method? Are there more helpful courses or recorded lectures?
- Do you have any recommendations or advice, while studying the scientific method, for someone who is mainly motivated by industrial machine learning applications such as recommendation engines and logistics optimization?