Is there a reference dataset for contextual similarity?

Question

I'm running some experiments with word embeddings to try to capture context-aware similarity. For example, the word pair apple - hardware should be very dissimilar in the context of a fruit store, but very similar in an IT context.

My question is whether there is a benchmark dataset for this task. I've been looking, but I can't find anything.

Thanks in advance.

score 2 · Answer 1 · answered Mar 03 '23 at 18:14

I think some datasets used for word sense disambiguation (WSD) would be an option.

WSD is the task of assigning an ambiguous word to its correct meaning given the context it appears in. For instance, "apple" could have meaning 1, the fruit, and meaning 2, the tech company. As a consequence, a labelled WSD dataset tells you, for each occurrence of the word, which meaning its context calls for.

I don't know of any specific dataset, but I assume that state-of-the-art WSD papers mention and use them.
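
To make the idea a bit more concrete, here is a rough sketch of how sense-labelled occurrences could be turned into a contextual similarity test. Everything in it is an assumption for illustration: the three example sentences are hand-written toy data (not taken from any real WSD dataset), and `build_pairs` and `accuracy` are hypothetical helper names. The idea is simply that two occurrences of the same word count as "similar in context" when their sense labels match.

```python
from itertools import combinations

# Toy, hand-written sense-labelled occurrences (hypothetical data,
# standing in for the records of a real WSD dataset).
occurrences = [
    {"word": "apple", "sense": "fruit",
     "context": "She bought an apple at the fruit store."},
    {"word": "apple", "sense": "company",
     "context": "Apple released new hardware this week."},
    {"word": "apple", "sense": "fruit",
     "context": "This pie needs one more apple."},
]


def build_pairs(items):
    """Pair occurrences of the same word; gold label is True if senses match."""
    pairs = []
    for a, b in combinations(items, 2):
        if a["word"] == b["word"]:
            pairs.append((a["context"], b["context"], a["sense"] == b["sense"]))
    return pairs


def accuracy(pairs, predict_same_sense):
    """Fraction of pairs where the model's yes/no decision matches the gold label."""
    correct = sum(predict_same_sense(c1, c2) == gold for c1, c2, gold in pairs)
    return correct / len(pairs)


pairs = build_pairs(occurrences)

# Trivial baseline that always answers "different sense"; a real experiment
# would plug in a decision based on the contextual embeddings instead.
baseline_score = accuracy(pairs, lambda c1, c2: False)
```

In an actual experiment, `predict_same_sense` would be replaced by something like thresholding the cosine similarity between the two context-dependent embeddings of the target word.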
