
Hi all, I'm fairly up to date with the NLP tasks out there (nlpprogress.com, paperswithcode.com) and with great tools such as nltk, flair and huggingface. I want to take a single word and predict a similar word, a little like the old "Google Sets" feature, except extrapolating from a single example. I'm thinking GPT-3 might be the best bet, with some seed text like

here is a list of similar things: banana, 

and ask it to predict the next word.

transformer.huggingface.co is promising enough (though hilariously inadequate in itself) that I'm thinking GPT-3 indeed may well be the answer.
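For anyone who wants to try the seed-text idea locally, here is a minimal sketch. It assumes the `transformers` package is installed and uses GPT-2 as a stand-in for GPT-3; the helper names (`build_prompt`, `first_list_item`, `suggest_set_member`) are made up for illustration and the output quality will likely be well below GPT-3.

```python
def build_prompt(seed_word: str) -> str:
    """Build the seed text from the question, ending on an open list slot."""
    # No trailing space: GPT-style tokenizers usually attach the space
    # to the start of the next token.
    return f"here is a list of similar things: {seed_word},"


def first_list_item(continuation: str) -> str:
    """Keep only the first comma-separated item of the model's continuation."""
    return continuation.split(",")[0].split("\n")[0].strip(" .")


def suggest_set_member(seed_word: str, model_name: str = "gpt2") -> str:
    """Ask a local causal language model to continue the list.

    Assumption: `transformers` is installed and `model_name` can be
    downloaded. GPT-2 is only a stand-in for GPT-3 here.
    """
    from transformers import pipeline  # lazy import: heavy dependency

    generator = pipeline("text-generation", model=model_name)
    prompt = build_prompt(seed_word)
    output = generator(prompt, max_new_tokens=8, num_return_sequences=1)
    # By default the generated text includes the prompt, so strip it off.
    generated = output[0]["generated_text"]
    return first_list_item(generated[len(prompt):])


print(build_prompt("banana"))
```

Calling `suggest_set_member("banana")` would then return whatever the model puts in the next list slot, which is not guaranteed to be a member of the same set.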

But the alternative is to navigate a treebank, through "type of" relationships… much, much faster and cheaper.

I've tagged this "semantic similarity" but really I don't want the relationship to be "similar", rather "is part of same set of".

Thoughts most appreciated from actual practitioners in this space rather than hobbyists like me :)

Julian H

1 Answer


But the alternative is to navigate a treebank, through "type of" relationships… much, much faster and cheaper.

WordNet provides exactly this: it is a lexical database in which words are grouped into sets of synonyms (synsets), with several types of relations between these groups, in particular hypernyms/hyponyms (more general/more specific).

The database can be downloaded, and nltk provides an interface for querying it.
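As a rough sketch of the "is part of same set of" lookup: go up one "type of" step to the hypernym, then collect its other hyponyms (the siblings). The function names and the toy taxonomy below are hypothetical and only illustrate the traversal; `wordnet_co_hyponyms` assumes nltk is installed and the corpus has been fetched once with `nltk.download("wordnet")`.

```python
from typing import Dict, List, Set


def co_hyponyms(word: str, parent_of: Dict[str, str]) -> List[str]:
    """Return the other words sharing `word`'s direct parent ("type of")."""
    parent = parent_of.get(word)
    if parent is None:
        return []
    return sorted(w for w, p in parent_of.items() if p == parent and w != word)


def wordnet_co_hyponyms(word: str) -> List[str]:
    """Same idea on the real WordNet graph via nltk.

    Assumption: nltk is installed and nltk.download("wordnet") was run.
    """
    from nltk.corpus import wordnet as wn  # lazy import: needs corpus data

    siblings: Set[str] = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        for hypernym in synset.hypernyms():       # one step up
            for sibling in hypernym.hyponyms():   # back down to the siblings
                if sibling != synset:
                    siblings.update(sibling.lemma_names())
    siblings.discard(word)
    return sorted(siblings)


# Hypothetical toy taxonomy, only to show the traversal without the corpus.
TOY_PARENT_OF = {
    "banana": "fruit",
    "apple": "fruit",
    "cherry": "fruit",
    "carrot": "vegetable",
    "fruit": "food",
    "vegetable": "food",
}

print(co_hyponyms("banana", TOY_PARENT_OF))  # ['apple', 'cherry']
```

On the real database a word usually has several senses, so `wordnet_co_hyponyms` pools the siblings of every noun sense; picking the intended sense first would give a tighter set.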

Erwan
  • OMG thanks Erwan, I’ve used WordNet once before, a couple of years ago, but I’m not an active practitioner in this space, so many thanks for the reminder! One question: last time I checked it wasn’t maintained. What do people use if they want a WordNet-style capability that reflects the vernacular of the 2020s? – Julian H Mar 10 '21 at 11:12
  • @JulianH I think you're right. As far as I know WordNet covers standard English, and I doubt they would even try to cover anything specific to a period of time anyway. But I'm not aware of anything else remotely similar to WordNet which would cover your needs. So, as you mentioned in the question, I think that people would otherwise just use similarity measures with word embeddings, with the disadvantage that it's not specific to any particular semantic relationship between words. – Erwan Mar 10 '21 at 14:26
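
To illustrate the embedding route and its drawback, here is a small sketch of cosine-similarity lookup. The three-dimensional vectors are invented toy values, not real embeddings; in practice they would come from a pretrained model. Note how "peel" ranks close to "banana" even though it is related rather than a member of the same set.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(
    word: str, vectors: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k words whose vectors are most similar to `word`'s vector."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented toy vectors, purely for illustration.
TOY_VECTORS = {
    "banana": [0.9, 0.1, 0.0],
    "apple": [0.8, 0.2, 0.1],
    "peel": [0.7, 0.0, 0.6],
    "car": [0.0, 0.1, 0.9],
}

print([w for w, _ in nearest_words("banana", TOY_VECTORS)])  # ['apple', 'peel']
```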