Wikipedia is a free online encyclopedia whose content is created and maintained by its users. In data science it is widely used as a text corpus for NLP and text-processing projects.
Questions tagged [wikipedia]
9 questions
1
vote
0 answers
Search for similar Wikipedia articles based on a set of keywords
I want to solve two questions:
Which Wikipedia articles could be interesting to me based on a list of keywords generated from the search terms I normally use in Google (exported via Google Takeout)?
Which Wikipedia articles could be…
Pascal Widmann
- 23
- 3
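One possible direction for a question like this: rank article texts by TF-IDF cosine similarity against the keyword list. A minimal sketch, assuming the article texts are already available locally; the titles, texts, and keywords below are placeholders:

```python
# Rank Wikipedia articles by TF-IDF cosine similarity to a keyword query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "Machine learning": "Machine learning is the study of algorithms that improve through experience.",
    "Football": "Football is a family of team sports played with a ball.",
}  # placeholder texts; in practice, fetch these from a dump or the API

keywords = ["machine", "learning", "algorithms"]  # e.g. from Google Takeout

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(articles.values())
query_vec = vectorizer.transform([" ".join(keywords)])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
for title, score in sorted(zip(articles, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {title}")
```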
1
vote
2 answers
IterativeImputer Evaluation
I am having a hard time evaluating my imputation model.
I used an iterative imputer to fill in the missing values in all four columns.
As the estimator for the iterative imputer I am using a random forest model; here is my code for…
StarGit
- 13
- 3
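A common way to evaluate imputation when the real gaps have no ground truth: artificially mask entries whose values are known, impute them, and score the error. A minimal sketch with scikit-learn's IterativeImputer and a random forest estimator; the data here is synthetic:

```python
# Mask known values, impute them, and compare against the ground truth.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 4))        # stand-in for the four columns

# Artificially hide 10% of the entries so the true values are known.
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan

imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50),
                           random_state=0)
X_imputed = imputer.fit_transform(X_missing)

rmse = np.sqrt(np.mean((X_imputed[mask] - X_true[mask]) ** 2))
print(f"RMSE on masked entries: {rmse:.3f}")
```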
1
vote
0 answers
doc2vec - paragraph or article as document
I'm trying to train a doc2vec model on the German wiki corpus. While looking for best practices, I've found several different ways to create the training data.
Should I split every Wikipedia article by natural paragraph into several…
jonas
- 143
- 4
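Either choice maps onto gensim's API the same way, since a TaggedDocument can hold a whole article or a single paragraph. A minimal sketch with article-level tags, assuming the paragraphs have already been extracted from the dump; the toy German text below is a placeholder. For paragraph-level training, emit one TaggedDocument per paragraph instead:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

articles = {
    "Berlin": ["Berlin ist die Hauptstadt Deutschlands.",
               "Die Stadt hat rund 3,7 Millionen Einwohner."],
}  # placeholder; in practice, parsed from the German wiki dump

# One document per article, tagged with its title.
docs = [TaggedDocument(words=simple_preprocess(" ".join(paras)), tags=[title])
        for title, paras in articles.items()]

model = Doc2Vec(vector_size=100, min_count=1, epochs=20)
model.build_vocab(docs)
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
```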
1
vote
1 answer
Minimum number of features for Naïve Bayes model
I keep reading that Naive Bayes needs fewer features than many other ML algorithms. But what is the minimum number of features you actually need to get good results (90% accuracy) with a Naive Bayes model? I know there is no objective answer to…
E. Turok
- 11
- 1
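Since there is no general rule, one practical approach is to measure accuracy as a function of the number of features kept. A minimal sketch using chi-squared feature selection on a public text dataset; the choice of categories and k values is arbitrary:

```python
# Accuracy of Multinomial Naive Bayes vs. number of selected features.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

for k in (10, 100, 1000, 5000):
    pipe = make_pipeline(CountVectorizer(),
                         SelectKBest(chi2, k=k),
                         MultinomialNB())
    score = cross_val_score(pipe, data.data, data.target, cv=5).mean()
    print(f"k={k:>5}: accuracy={score:.3f}")
```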
1
vote
1 answer
How can I use the Wikipedia2vec model to embed my articles' named entities when 40% of the entities are not in Wikipedia?
I have news articles in my dataset containing named entities. I want to use the Wikipedia2vec model to encode the articles' named entities, but some of the entities (around 40%) from our dataset's articles are not present in Wikipedia.
Please suggest…
sajankar9
- 11
- 2
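One common fallback for out-of-vocabulary entities is to average the word vectors of the entity's surface tokens. A minimal sketch with the wikipedia2vec package, assuming a pretrained model file; the file name is a placeholder, and missing lookups are assumed to raise KeyError:

```python
import numpy as np
from wikipedia2vec import Wikipedia2Vec

model = Wikipedia2Vec.load("enwiki_20180420_100d.pkl")  # placeholder model file

def embed_entity(name):
    try:
        return model.get_entity_vector(name)   # entity is in Wikipedia
    except KeyError:
        vecs = []
        for token in name.lower().split():
            try:
                vecs.append(model.get_word_vector(token))
            except KeyError:
                pass                           # token also out of vocabulary
        return np.mean(vecs, axis=0) if vecs else None
```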
0
votes
0 answers
Can a dataset built upon another have a more restrictive license?
I found a dataset built on top of a Wikipedia dump, available in the Huggingface Datasets library. The Wikipedia dump is licensed under CC BY-SA and the Huggingface Datasets library under Apache-2.0, but there is no license specified for the dataset I…
Agata
- 1
0
votes
0 answers
Wikipedia corpus for NLP - Cleaned sentences
I can see many Wikipedia dumps out there.
I am looking for a corpus built from Wikipedia in which every line is one sentence, without any Wikipedia meta tags.
Nathan B
- 241
- 1
- 2
- 5
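If no ready-made corpus fits, one can be built: strip the markup with a tool such as wikiextractor, then split the plain text into one sentence per line. A minimal sketch of the splitting step with NLTK; file names are placeholders, and newer NLTK versions may need the "punkt_tab" resource instead of "punkt":

```python
import nltk
nltk.download("punkt", quiet=True)
from nltk.tokenize import sent_tokenize

with open("wiki_plaintext.txt", encoding="utf-8") as src, \
     open("wiki_sentences.txt", "w", encoding="utf-8") as dst:
    for paragraph in src:
        paragraph = paragraph.strip()
        if not paragraph:
            continue                    # skip blank lines between articles
        for sentence in sent_tokenize(paragraph):
            dst.write(sentence + "\n")  # one cleaned sentence per line
```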
0
votes
1 answer
Correlation between Wikipedia translated pages and number of in-links looks weird (scatterplot)?
I'm trying to find a correlation measure for the number of Wikipedia pages an entity (an article) has been translated into vs the number of links that point to that page (both measures that can indicate the popularity of a page).
For instance I have
Work,…
Idkwhatnomeis
- 3
- 2
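One likely cause of the "weird" scatterplot is that both counts are heavy-tailed, so most points pile up near the origin on linear axes. A minimal sketch with synthetic data showing how log-log axes can make the relationship visible:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
translations = rng.pareto(2.0, 500) * 5 + 1              # synthetic heavy-tailed counts
inlinks = translations * rng.lognormal(0, 0.5, 500) * 10

plt.scatter(translations, inlinks, s=8, alpha=0.5)
plt.xscale("log")                    # log axes spread out the crowded low end
plt.yscale("log")
plt.xlabel("number of translations")
plt.ylabel("number of in-links")
plt.show()
```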
0
votes
1 answer
What correlation measure for Wikipedia translated pages vs number of in-links?
I'm trying to find a correlation measure for the number of Wikipedia pages an entity (an article) has been translated into vs the number of links that point to that page (both measures that can indicate the popularity of a page). Is it possible to…
Idkwhatnomeis
- 3
- 2
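For heavy-tailed counts like these, Spearman's rank correlation is a reasonable first choice, since it does not assume a linear relationship. A minimal sketch with scipy; the counts below are made up:

```python
from scipy.stats import spearmanr

translations = [3, 12, 45, 2, 88, 7]    # made-up per-article counts
inlinks = [15, 60, 400, 9, 1200, 30]

rho, p_value = spearmanr(translations, inlinks)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```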