
I want to answer two questions:

  1. Which Wikipedia articles could be interesting to me, based on a list of keywords generated from the search terms I normally use in Google (obtained via Google Takeout)?
  2. Which Wikipedia articles could be interesting to me, based on what is not on a list of keywords generated from the search terms I normally use in Google (obtained via Google Takeout)?

I am looking for a way to do a context search on Wikipedia articles using the set of keywords mentioned above, preferably via an API so that I don't have to download and process terabytes of Wikipedia articles.
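To make the idea concrete, here is a minimal sketch of the kind of API-based keyword search I have in mind. It assumes the public MediaWiki Action API at `https://en.wikipedia.org/w/api.php` with `action=query` and `list=search`. The helper names `build_search_params` and `search_wikipedia`, the User-Agent string, and the example keywords are all hypothetical. The `-term` exclusion syntax is my understanding of Wikipedia's search syntax, not something I have verified for every case.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia endpoint of the MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_params(include, exclude=(), limit=10):
    """Build query parameters for a full-text search (hypothetical helper)."""

    def quote(term):
        # Wrap multi-word keywords in quotes so they are searched as a phrase.
        return f'"{term}"' if " " in term else term

    parts = [quote(term) for term in include]
    # Assumption: a leading "-" excludes a term in Wikipedia's search syntax.
    parts += ["-" + quote(term) for term in exclude]
    return {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(parts),
        "srlimit": str(limit),
        "format": "json",
    }


def search_wikipedia(include, exclude=(), limit=10):
    """Return the titles of articles matching the keywords (network call)."""
    params = build_search_params(include, exclude, limit)
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "keyword-search-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]


if __name__ == "__main__":
    # Example keywords only; the real list would come from Google Takeout.
    for title in search_wikipedia(["machine learning"], exclude=["python"]):
        print(title)
```

Since several space-separated terms narrow the result set rather than widen it, as far as I understand, sending one keyword (or a small group) per request and merging the titles afterwards may be closer to what question 1 needs.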

  • First it's useful to know that Wikipedia is big but not *that* big: the English Wikipedia dump containing all the articles with "current revisions only, no talk or user pages" is around 20 GB compressed, 80 GB uncompressed. Please follow the [instructions](https://en.wikipedia.org/wiki/Wikipedia:Database_download#Where_do_I_get_it?) if you want to use it. – Erwan Apr 02 '21 at 15:38
  • Second, sorry, but your questions are not totally clear to me. Question 1 is probably close to what happens in a regular query: it tries to match the keywords you give. Question 2 is strange: it looks as if you want to give only keywords of unrelated articles, but that would correspond to the whole of Wikipedia minus a few articles. Or did you mean that you have a combination of some keywords for topics that you want and some keywords for topics to exclude? – Erwan Apr 02 '21 at 15:42
