
As software engineers we are familiar with the concept of testing (unit, integration, e2e). Tests give us a level of confidence in our code and in changes to it. For ML, it looks like the "code" is the data the model was trained on, and unfortunately data is not as deterministic as source code.

If I consider data to be a kind of code for ML:

  • What techniques and tools can be used for verifying / testing the data? My expectation is a tool like TFX's data validation, but more generic (for instance, usable with PyTorch). But I haven't found any generic tools that automate/simplify this challenge. See the sketch below for the kind of check I have in mind.

  • If there is no generic and robust tool, would it be worth starting an OSS project for it?

Thanks
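
To make the question concrete, here is a minimal sketch of the kind of check I would want such a tool to automate: comparing per-feature distributions between the training data and a fresh batch of production data with a two-sample Kolmogorov-Smirnov test. The `detect_drift` helper and the synthetic data are my own illustration, not part of any existing tool:

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def detect_drift(train_df: pd.DataFrame, live_df: pd.DataFrame,
                 alpha: float = 0.01) -> dict:
    """Two-sample Kolmogorov-Smirnov test per numeric column.

    A small p-value suggests that the live distribution of a feature
    has drifted away from the training distribution.
    """
    report = {}
    for col in train_df.select_dtypes(include=np.number).columns:
        stat, pvalue = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        report[col] = {"ks_stat": stat, "p_value": pvalue,
                       "drifted": pvalue < alpha}
    return report

# Synthetic usage: the "age" feature drifts, "income" does not.
rng = np.random.default_rng(0)
train = pd.DataFrame({"age": rng.normal(40, 10, 5000),
                      "income": rng.normal(50_000, 8_000, 5000)})
live = pd.DataFrame({"age": rng.normal(48, 10, 5000),  # shifted mean
                     "income": rng.normal(50_000, 8_000, 5000)})
print(detect_drift(train, live))
```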

  • The closest thing to unit tests in ML is a series of tests of a trained model on a subset of real-world data, from which you obtain estimates of future performance. Remember that ML is part of statistics, and one reasons with probabilities, not certainties – Nikos M. Jun 28 '21 at 19:43
  • I guess this doesn't cover the full scope of my question. The model's probability is a matter of the quality of the real-world data, but that is a one-shot action at training time; it doesn't cover how the real world evolves over time. Unit testing is really a health check indicating that my changes won't break previous behavior. With an ML model, after we train it we can only measure its accuracy in production, and when it starts to degrade... investigate. I was interested in tools that can help me measure the influence of real-world changes, like metrics over features – BogdanSnisar Jun 30 '21 at 07:36
  • One can keep a series of test datasets, and re-run model tests on these past datasets to ensure the previous behaviour is preserved by the new model – Nikos M. Jun 30 '21 at 11:57
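
For reference, a minimal sketch of what the last comment describes: freeze a set of historical datasets and re-run the (new) model against each of them as a regression test. The `regression_test` and `make_snapshot` helpers and the accuracy threshold are illustrative assumptions, not an established API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def regression_test(model, datasets, min_accuracy=0.9):
    """Re-run a trained model on frozen historical datasets and
    fail loudly if performance on any of them degrades."""
    for name, (X, y) in datasets.items():
        acc = accuracy_score(y, model.predict(X))
        assert acc >= min_accuracy, f"{name}: accuracy {acc:.3f} < {min_accuracy}"
        print(f"{name}: accuracy {acc:.3f} OK")

# Synthetic stand-ins for past real-world snapshots.
rng = np.random.default_rng(1)
def make_snapshot(n=1000):
    X = rng.normal(size=(n, 4))
    y = (X.sum(axis=1) > 0).astype(int)
    return X, y

X_train, y_train = make_snapshot()
model = LogisticRegression().fit(X_train, y_train)

# Each snapshot acts like a pinned test case for the model.
regression_test(model, {"2021-Q1": make_snapshot(),
                        "2021-Q2": make_snapshot()})
```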

0 Answers