Is my idea of a Feature Store wrong?

Question

Should a Feature Store be part of an enterprise data catalog?

To me, a feature store seems to be a highly niche data catalog that lacks many of the benefits of an enterprise data catalog / data discovery tool. What I need is for generated features to be discoverable when searching for data.

For example, if datasets A and B are used to generate a feature set AB', I would want that information to surface whenever I search for, or come across, dataset A or B in my data catalog.
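To make the requirement concrete, here is a minimal sketch of the lookup I have in mind, assuming a toy in-memory catalog. `FeatureSet`, `LineageCatalog`, `register` and `downstream` are hypothetical names, not the API of any real feature store or catalog product. The point is the reverse index from a source dataset to the feature sets derived from it, so that finding A in the catalog also surfaces AB'.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeatureSet:
    """Hypothetical catalog entry for a generated feature set."""
    name: str
    sources: list[str]  # upstream datasets the features were derived from
    features: list[str] = field(default_factory=list)


class LineageCatalog:
    """Toy catalog (hypothetical) that indexes feature sets by their upstream datasets."""

    def __init__(self) -> None:
        self._feature_sets: dict[str, FeatureSet] = {}
        # Reverse index: dataset name -> names of feature sets derived from it.
        self._derived_from: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, fs: FeatureSet) -> None:
        self._feature_sets[fs.name] = fs
        for source in fs.sources:
            self._derived_from[source].add(fs.name)

    def downstream(self, dataset: str) -> list[str]:
        """Feature sets generated directly from `dataset`, sorted by name."""
        # .get() avoids inserting empty entries for unknown datasets.
        return sorted(self._derived_from.get(dataset, set()))


catalog = LineageCatalog()
catalog.register(FeatureSet("AB'", sources=["A", "B"]))
print(catalog.downstream("A"))  # ["AB'"]
```

This only follows one hop of lineage; a real catalog would likely need to walk the graph transitively (features built from other features).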

It would also be useful to link each feature set to the code / git commit that generated it.
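As a rough illustration, the provenance could be captured at generation time and stored as catalog metadata. This is a sketch under assumptions: the record's field names and the script path `features/build_ab.py` are hypothetical, and the only external call is `git rev-parse HEAD`, which returns `None` here if the working directory is not a git repository or git is unavailable.

```python
from __future__ import annotations

import json
import subprocess
from datetime import datetime, timezone


def current_git_commit(repo_dir: str = ".") -> str | None:
    """Return the HEAD commit SHA of `repo_dir`, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # Not a git repo, no commits yet, or git not installed.
        return None
    return result.stdout.strip()


def provenance_record(feature_set: str, sources: list[str], script: str) -> dict:
    """Build a hypothetical catalog metadata entry tying features to their code."""
    return {
        "feature_set": feature_set,
        "sources": sources,  # upstream datasets (lineage)
        "script": script,  # path of the code that generated the features
        "git_commit": current_git_commit(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


record = provenance_record("AB'", ["A", "B"], "features/build_ab.py")
print(json.dumps(record, indent=2))
```

Storing this alongside the lineage entry would let a search hit on dataset A lead not only to AB' but also to the exact commit that produced it.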

Am I missing something?

Welcome to DataScienceSE. Can you please clarify what such a feature store would contain? Would it be like a human-readable description of what the feature represents? A list of datasets which have this feature maybe? — Erwan, Mar 22 '21 at 11:22
Thank you. Similar to existing feature stores, the basic starting point is a group of features tagged as part of a feature set, together with feature lineage: the original data from which the features were generated. Metadata about the features and feature sets would, of course, be available both as a human-readable description in a UI and as part of the API response. — Pouya Barrach-Yousefi, Mar 22 '21 at 21:08
I'm not really knowledgeable about this topic but it makes me think of [linked data](https://en.wikipedia.org/wiki/Linked_data), there might be a link there. Also it might be worth looking into how some institutional sites with lots of data deal with this, for example [World Bank Open Data](https://data.worldbank.org/) or the [OECD Statistics](https://stats.oecd.org/), I assume that they have some kind of feature store so that people can search for a particular indicator. — Erwan, Mar 24 '21 at 11:42