
Cross-posted on Reddit ML.

Should a Feature Store be part of an enterprise data catalog?

To me, a feature store looks like a highly specialized data catalog, but one that lacks many of the benefits of an enterprise data catalog or data discovery tool. What I need is for generated features to be discoverable when searching for data.

For example, if datasets A and B are used to generate a feature set AB', I would want to see that relationship whenever I search for, or come across, dataset A or B in my data catalog.
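A minimal sketch of the lookup I have in mind: each feature set records the datasets it was built from, and the catalog keeps a reverse index so that searching for a dataset also surfaces the feature sets derived from it. The `FeatureSet` and `Catalog` classes and the feature names `f1`/`f2` are hypothetical and are not the API of any real catalog or feature store; the sketch only follows one hop of lineage.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeatureSet:
    """A named group of features plus the datasets it was generated from (hypothetical schema)."""
    name: str
    features: list[str]
    sources: list[str]


@dataclass
class Catalog:
    """Toy in-memory catalog; not the API of any real catalog or feature store."""
    feature_sets: dict[str, FeatureSet] = field(default_factory=dict)
    # Reverse lineage index: dataset name -> names of feature sets built from it.
    derived_from: dict[str, set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def register(self, fs: FeatureSet) -> None:
        self.feature_sets[fs.name] = fs
        for src in fs.sources:
            self.derived_from[src].add(fs.name)

    def search_dataset(self, dataset: str) -> list[str]:
        # Feature sets generated directly from `dataset` (one hop of lineage only).
        return sorted(self.derived_from.get(dataset, set()))


catalog = Catalog()
catalog.register(FeatureSet(name="AB'", features=["f1", "f2"], sources=["A", "B"]))
print(catalog.search_dataset("A"))  # → ["AB'"]
print(catalog.search_dataset("C"))  # → []
```

A real catalog would presumably walk the lineage graph transitively (A → AB' → anything built on AB'), but the reverse index is the core of the "find AB' when I land on A" behaviour.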

Along with that, it would be useful to link each feature set to the code (or git commit) that generated it.
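As a rough illustration, the metadata a catalog could store per feature set might combine lineage with code provenance. The `FeatureSetMetadata` class, its field names, the `example.com` repository URL, and the `<commit-sha>` value are all placeholders assumed for this sketch, not taken from any particular product.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class FeatureSetMetadata:
    """Hypothetical metadata record for one feature set; field names are illustrative."""
    name: str
    features: list[str]
    source_datasets: list[str]  # lineage: the data the features were generated from
    git_repo: str               # where the generating code lives
    git_commit: str             # exact revision that produced this feature set

    def to_json(self) -> str:
        # Serialize the record so a catalog could index it or return it from an API.
        return json.dumps(asdict(self), indent=2)


meta = FeatureSetMetadata(
    name="AB'",
    features=["f1", "f2"],
    source_datasets=["A", "B"],
    git_repo="https://example.com/team/feature-pipelines.git",  # placeholder URL
    git_commit="<commit-sha>",  # placeholder; fill in from the pipeline run
)
print(meta.to_json())
```

In practice the pipeline that builds the features could fill `git_commit` from `git rev-parse HEAD` at run time, so the catalog entry always points at the exact code that produced the data.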

Am I missing something?

  • Welcome to DataScienceSE. Can you please clarify what such a feature store would contain? Would it be like a human-readable description of what the feature represents? A list of datasets which have this feature maybe? – Erwan Mar 22 '21 at 11:22
  • Thank you. Similar to existing feature stores, the basic starting point is a group of features that are tagged as part of a feature set with feature lineage: the original data where the features were generated from. Metadata about the feature store features and feature sets would of course be available in both human-readable description in a UI or returned as part of the API. – Pouya Barrach-Yousefi Mar 22 '21 at 21:08
  • I'm not really knowledgeable about this topic, but it makes me think of [linked data](https://en.wikipedia.org/wiki/Linked_data); there might be a connection there. It might also be worth looking into how some institutional sites with lots of data deal with this, for example [World Bank Open Data](https://data.worldbank.org/) or the [OECD Statistics](https://stats.oecd.org/). I assume they have some kind of feature store so that people can search for a particular indicator. – Erwan Mar 24 '21 at 11:42
