I'm having a hard time understanding why people treat any vector they happen to have as a suitable candidate for a recommender system.
In my mind, a recommender system requires a space in which distance represents similarity. Before you can construct such a space, you first need to settle on the type of distance you want to use (Euclidean, angular, or something else). Then you need a model (assuming we're talking about ML) that maps your input (an image, text, or anything else) to a point in that space. One major requirement of this model is that it is aware of the distance we've chosen: if no notion of that distance ever enters the model's training, there's no reason to expect its output to have the property that "distance means similarity".
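For concreteness, here is a minimal sketch of what I mean by a model that is "aware" of the distance (PyTorch; the toy encoder, layer sizes, and fake batch are made up purely for illustration): the training objective is written in terms of the very metric that will later be used to retrieve recommendations.

```python
import torch
import torch.nn as nn

# Toy encoder standing in for whatever maps raw inputs to the embedding space.
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# The loss is expressed in the *same* metric we plan to use at recommendation
# time (Euclidean here, since p=2), so training explicitly pushes similar items
# close together under that metric.
loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)

# Fake batch: anchor items, items known to be similar, items known to be dissimilar.
anchor, positive, negative = (torch.randn(8, 128) for _ in range(3))

loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```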
I'm asking because I've seen people take whatever vectors they have on hand and build a recommender system on top of them. Here's an example of using a VAE's latent vectors for recommendations:
I've also seen people use fastText word embeddings in the same way. I understand that all of these embeddings/latent vectors form clusters in their spaces, with some interesting patterns. But I don't think that's enough to assume that the "distance represents similarity" requirement for a recommender system is satisfied.
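To illustrate what I mean by "the same way", the pattern I usually see is roughly the sketch below (gensim; the model name is one of the pre-trained fastText models distributed via gensim-data, and the query word is arbitrary): take off-the-shelf vectors and serve their nearest neighbours under cosine similarity as recommendations, even though the vectors were trained on a word-prediction objective that never references that metric.

```python
import gensim.downloader as api

# Pre-trained fastText vectors, trained for word prediction rather than with
# any item-similarity objective in mind.
vectors = api.load("fasttext-wiki-news-subwords-300")

# "Recommend" the nearest neighbours under cosine similarity and hope that
# proximity in this space actually means the items are related.
print(vectors.most_similar("guitar", topn=5))
```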
Please let me know if I'm missing anything here.