
I am using Word2Vec for text vectorization. It does a good job, but in some cases it fails. For example, "turn the computer off and on" and "restart the computer" do not get a very good similarity score, even though they mean the same thing. Doc2Vec does not do a good job either, as my inputs are usually a couple of sentences rather than a document.

Can anyone please suggest an approach that would give a good similarity score between "turn on and off" and "restart", and for other combinations like that?

Shamy
  • Use pre-trained embeddings, and form the document embeddings through [averaging the tf-idf scores](https://openreview.net/forum?id=SyK00v5xx) or concatenation of summary statistics (min, max, mean, std); a rough sketch of the tf-idf-weighted averaging follows after these comments. – Emre Aug 22 '17 at 06:04
  • This question is related to this other question: https://datascience.stackexchange.com/questions/22536/detect-related-sentences/22948#22948 – Brian Spiering Sep 20 '17 at 16:54
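As suggested in the first comment, one way to get sentence vectors from pre-trained word embeddings is to weight each word vector by its tf-idf score before averaging. The sketch below assumes gensim 4.x and scikit-learn; the model name "glove-wiki-gigaword-100" and all variable names are illustrative, not from the comment itself.

```python
import numpy as np
import gensim.downloader as api
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["turn the computer off and on", "restart the computer"]

wv = api.load("glove-wiki-gigaword-100")      # pre-trained word vectors
tfidf = TfidfVectorizer().fit(sentences)       # idf weights from the corpus
vocab = tfidf.vocabulary_

def sentence_vector(text):
    """Average the word vectors, weighting each word by its tf-idf score."""
    weights = tfidf.transform([text]).toarray()[0]
    tokens = [t for t in text.lower().split() if t in wv and t in vocab]
    if not tokens:
        return np.zeros(wv.vector_size)
    vecs = np.array([wv[t] * weights[vocab[t]] for t in tokens])
    return vecs.mean(axis=0)

a, b = sentence_vector(sentences[0]), sentence_vector(sentences[1])
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))   # cosine similarity
```

The concatenation of summary statistics (min, max, mean, std) mentioned in the same comment works analogously: stack the word vectors of a sentence and concatenate those statistics along the embedding dimension.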

2 Answers


If you are training your word2vec model yourself, then you should increase your training dataset; you can easily get a Wikipedia dump for that. If you are using a pre-trained model, you can always fine-tune it with additional data.
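A minimal sketch of continuing training with extra data, assuming gensim 4.x; the two corpora here are tiny placeholders, not the asker's actual data, so the resulting similarities are only illustrative.

```python
from gensim.models import Word2Vec

base_corpus = [["restart", "the", "computer"],
               ["turn", "the", "computer", "off", "and", "on"]]
extra_corpus = [["reboot", "the", "machine"],
                ["power", "cycle", "the", "device"]]

# Train on the original data (min_count=1 only because the corpus is tiny).
model = Word2Vec(base_corpus, vector_size=100, window=5, min_count=1, workers=4)

# Add the new sentences to the vocabulary and keep training on them.
model.build_vocab(extra_corpus, update=True)
model.train(extra_corpus, total_examples=len(extra_corpus), epochs=model.epochs)

print(model.wv.similarity("restart", "reboot"))
```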

HatemB

One approach you could take is to build sentence vectors from the vectors generated for the individual words.

This post covers the different techniques you could use to achieve this.
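A minimal sketch of the simplest such technique: averaging the word vectors of each sentence and comparing the means, here via gensim's `KeyedVectors.n_similarity`. The pre-trained model name "glove-wiki-gigaword-100" is an assumption for illustration, not something named in the answer.

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

def tokens(text):
    """Keep only the tokens that are in the embedding vocabulary."""
    return [t for t in text.lower().split() if t in wv]

s1 = tokens("turn the computer off and on")
s2 = tokens("restart the computer")
print(wv.n_similarity(s1, s2))   # cosine similarity of the averaged vectors
```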

Ethan
Nischal Hp