Highest Voted Questions - Data Science Stack Exchange

8 votes · 2 answers

Text similarity with sentence embeddings

I'm trying to calculate similarity between texts of various lengths. My current approach is the following: Using Universal Sentence Encoder, I convert text to a set of vectors. I average these vectors to create the final feature vector. I compare…

Tags: word-embeddings, similarity, similar-documents

asked Sep 19 '19 at 20:04 by Kertis van Kertis (133 reputation)
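A minimal sketch of the averaging approach described in this question. It assumes the sentence embeddings have already been produced (the arrays below are hypothetical stand-ins for Universal Sentence Encoder output) and that "compare" means cosine similarity, which the truncated excerpt does not confirm:

```python
import numpy as np

def document_vector(sentence_vectors):
    # Average per-sentence embeddings (shape: n_sentences x dim) into one vector
    return np.mean(np.asarray(sentence_vectors, dtype=float), axis=0)

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 1.0 means same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical encoder output: a 3-sentence text and a 2-sentence text, 4 dims each
doc_a = [[0.1, 0.3, 0.5, 0.1], [0.2, 0.1, 0.4, 0.3], [0.0, 0.2, 0.6, 0.2]]
doc_b = [[0.1, 0.2, 0.5, 0.2], [0.2, 0.2, 0.4, 0.2]]

sim = cosine_similarity(document_vector(doc_a), document_vector(doc_b))
print(round(sim, 3))
```

Because the mean is taken per text, texts with different numbers of sentences still map to vectors of the same dimension, which is what makes the comparison possible.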

8 votes · 4 answers

How to learn spam email detection?

I want to learn how a spam email detector is done. I'm not trying to build a commercial product, it'll be a serious learning exercise for me. Therefore, I'm looking for resources, such as existing projects, source code, articles, papers, etc. that I…

Tags: machine-learning, classification, text-mining

asked Jun 01 '15 at 12:36 by ABCD (3,510 reputation)

8 votes · 2 answers

Is There a Way to Re-Calibrate Predicted Probabilities After Using Class Weights?

I have classification data with far more negative instances than positive instances. I have used class weights in my models and have achieved the discrimination I want but the predicted probabilities from the models do not match the actual…

Tags: python, prediction, class-imbalance

asked Sep 03 '19 at 20:36 by from keras import michael (360 reputation)
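If the class weights used in training are known, one possible correction, assuming the model minimised a weighted log loss and would otherwise be well calibrated, is to divide the predicted odds by the weight ratio w_pos / w_neg (under that assumption the weighted model's odds are the true odds multiplied by that ratio). The weights below are hypothetical. When the assumption is doubtful, fitting a separate calibrator (for example scikit-learn's CalibratedClassifierCV) on unweighted held-out data is the more general route.

```python
import numpy as np

def unweight_probabilities(p_weighted, w_neg, w_pos):
    """Map probabilities from a model trained with class weights back to the
    unweighted scale. Assumes the model minimised a weighted log loss, so its
    predicted odds equal the true odds times w_pos / w_neg."""
    p = np.asarray(p_weighted, dtype=float)
    return w_neg * p / (w_neg * p + w_pos * (1.0 - p))

# Hypothetical setup: positives were up-weighted 9x during training
p_model = np.array([0.5, 0.9, 0.1])
print(unweight_probabilities(p_model, w_neg=1.0, w_pos=9.0))  # approx [0.1, 0.5, 0.0122]
```

A weighted-model output of 0.5 corresponds, under this assumption, to a true positive rate of only 0.1, which matches the kind of overestimate the question describes.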

8 votes · 2 answers

Time-series prediction: Model & data assumptions in AI/ML models vs conventional models

I was wondering if there is a good paper that discusses model and data assumptions in AI/ML approaches. For example, if you look at time series modelling (estimation or prediction) with linear models or (G)ARCH/ARMA processes, there…

Tags: machine-learning, neural-network, time-series, linear-regression

asked Aug 29 '19 at 06:45 by Maeaex1 (578 reputation)

8 votes · 4 answers

Why is there a difference between predicting on Validation set and Test set?

I have an XGBoost model trying to predict whether a currency will go up or down in the next period (5 min). I have a dataset from 2004 to 2018. I randomly split the data into 95% train and 5% validation, and the accuracy on the validation set is up to 55%. When…

Tags: machine-learning, xgboost

asked Aug 24 '19 at 20:10 by DBSE (221 reputation)

8 votes · 1 answer

Complex Chunking with NLTK

I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures. Let's start with this phrase: "adventure movies between 2000…

Tags: python, nlp, nltk

asked May 16 '15 at 00:15 by grill (234 reputation)

8 votes · 1 answer

Gensim LDA model: return keywords based on relevance (λ - lambda) value

I am using the gensim library for topic modeling, more specifically LDA. I created my corpus, my dictionary, and my LDA model. With the help of the pyLDAvis library I visualized the results. When I print the words with the highest probability on…

Tags: python, topic-model, lda, gensim

asked Aug 21 '19 at 17:40 by Tasos Lytos (81 reputation)
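pyLDAvis ranks a topic's terms by relevance, defined as λ·log p(w|t) + (1 - λ)·log(p(w|t) / p(w)), where p(w) is the term's overall probability in the corpus. A numpy sketch of that ranking, assuming a topic-word matrix of p(w|t) values such as the one gensim's LdaModel.get_topics() returns; the vocabulary and probabilities below are hypothetical:

```python
import numpy as np

def top_terms_by_relevance(topic_word, term_prob, topic, lam, n=10):
    # topic_word[k, w] = p(w | topic k); term_prob[w] = marginal p(w) in the corpus
    phi = np.asarray(topic_word, dtype=float)[topic]
    lift = phi / np.asarray(term_prob, dtype=float)
    relevance = lam * np.log(phi) + (1.0 - lam) * np.log(lift)
    # Indices of the n most relevant terms, highest relevance first
    return np.argsort(-relevance)[:n]

vocab = ["film", "actor", "oscar", "the"]  # hypothetical vocabulary
topic_word = np.array([[0.4, 0.3, 0.2, 0.1],
                       [0.1, 0.2, 0.3, 0.4]])
term_prob = np.array([0.2, 0.1, 0.2, 0.5])

for lam in (1.0, 0.0):
    idx = top_terms_by_relevance(topic_word, term_prob, topic=0, lam=lam, n=4)
    print(lam, [vocab[i] for i in idx])
# 1.0 ['film', 'actor', 'oscar', 'the']
# 0.0 ['actor', 'film', 'oscar', 'the']
```

With λ = 1 the ranking is plain p(w|t); lowering λ promotes terms that are frequent in the topic relative to the corpus as a whole.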

8 votes · 1 answer

Which classification algorithms to try for classifying text data into 300 categories

I have 40,000 rows of text data from the health care domain. The data has one column for text (2-5 sentences) and one column for its category. I want to classify it into 300 categories. Some categories are independent, while some are somewhat related.…

Tags: machine-learning, classification, nlp, text-mining

asked May 07 '15 at 08:52 by Alok Nayak (191 reputation)

8 votes · 2 answers

How to use Graph Neural Network to predict relationships between nodes with pytorch_geometric?

Let's say I have a partly connected graph that represents members of many unrelated communities. I would like to predict the possible friendships between members of the same community: on a sliding scale from 0 to 10, how likely would they like…

Tags: pytorch-geometric

asked Jul 31 '19 at 16:38 by Soerendip (724 reputation)

8 votes · 5 answers

What is the state of the art in question generation with NLP?

I was trying out various projects available for question generation on GitHub, namely NQG, question-generation and a lot of others, but I don't see good results from them: either they have very bad question formation or the questions generated are…

Tags: machine-learning, deep-learning, nlp

asked Jul 27 '19 at 07:39 by Sundeep Pidugu (108 reputation)

8 votes · 2 answers

Why is taking the gradient of the average error in SGD not correct, but rather the average of the gradients of single errors?

I am a little confused about taking averages in cost functions and SGD. So far I always thought in SGD you would compute the average error for a batch and then backpropagate it. But then I was told in a comment on this question that that was wrong.…

Tags: machine-learning, optimization, gradient-descent, mini-batch-gradient-descent

asked Jul 25 '19 at 21:13 by lo tolmencre (235 reputation)
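The excerpt leaves open what "average error" means. The sketch below, for a hypothetical linear model with squared loss on random data, contrasts two readings: averaging the per-example losses before differentiating, which gives exactly the mean of the per-example gradients because differentiation is linear, and averaging the raw errors (residuals) before squaring, which in general gives a different gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # hypothetical mini-batch: 8 examples, 3 features
y = rng.normal(size=8)
w = rng.normal(size=3)       # weights of a linear model y_hat = X @ w

def per_example_grads(w, X, y):
    # Gradient of 0.5 * (x_i @ w - y_i)**2 for each example i, shape (8, 3)
    r = X @ w - y
    return r[:, None] * X

def grad_of_mean_loss(w, X, y):
    # Gradient of the mean of the per-example losses
    r = X @ w - y
    return X.T @ r / len(y)

def grad_of_mean_error_squared(w, X, y):
    # Gradient of 0.5 * (mean_i (x_i @ w - y_i))**2, i.e. errors averaged first
    r_bar = np.mean(X @ w - y)
    return r_bar * X.mean(axis=0)

avg_of_grads = per_example_grads(w, X, y).mean(axis=0)
print(np.allclose(avg_of_grads, grad_of_mean_loss(w, X, y)))  # True
print(np.allclose(avg_of_grads, grad_of_mean_error_squared(w, X, y)))
```

The second comparison generally does not hold: mean(r_i * x_i) equals mean(r) * mean(x) only when the residuals and features are uncorrelated across the batch.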

8 votes · 1 answer

How does class_weight work in a decision tree?

The scikit-learn implementation of DecisionTreeClassifier has a parameter called class_weight. As per the documentation: Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. and The…

Tags: scikit-learn, decision-trees, class-imbalance

asked Jul 23 '19 at 14:29 by Supratim Haldar (279 reputation)
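scikit-learn documents that class_weight values are multiplied with the sample weights, so they change how much each sample counts when the tree evaluates a split's impurity. A pure-Python sketch of that effect on the Gini impurity of a single node (an illustration, not scikit-learn's actual code), including the documented "balanced" heuristic n_samples / (n_classes * count_per_class):

```python
from collections import Counter

def balanced_class_weights(labels):
    # "balanced" heuristic: n_samples / (n_classes * count of each class)
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_gini(labels, class_weight=None):
    # Gini impurity where each sample counts with the weight of its class
    class_weight = class_weight or {}
    totals = Counter()
    for c in labels:
        totals[c] += class_weight.get(c, 1.0)
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())

node = [0] * 9 + [1]                    # a node with 9 negatives and 1 positive
print(weighted_gini(node))              # approx 0.18 without weights
weights = balanced_class_weights(node)  # {0: approx 0.556, 1: 5.0}
print(weighted_gini(node, weights))     # approx 0.5: the node now looks maximally mixed
```

Since splits are chosen to reduce weighted impurity, up-weighting the minority class makes the tree work harder to isolate it.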

8 votes · 2 answers

Which classification algorithms are negatively affected by class imbalances?

I've seen a few posts and papers floating around the web (mostly those related to over/undersampling, SMOTE, and cost-sensitive training) that, when discussing class imbalance, specify that certain algorithms are negatively impacted by class…

Tags: machine-learning, classification, predictive-modeling, multilabel-classification, class-imbalance

asked Jul 03 '19 at 19:45 by Danny David Leybzon (180 reputation)

8 votes · 3 answers

Isolation forest sklearn contamination param

I am working on an unsupervised anomaly detection task on time series data using an isolation forest algorithm. I am developing it in Python, more specifically using scikit-learn. I found a lot of examples on this, but what is not very clear is how to…

Tags: python, scikit-learn, unsupervised-learning, anomaly-detection, outlier

asked Jul 01 '19 at 19:58 by Giordano (325 reputation)
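The scikit-learn documentation describes a numeric contamination as the expected proportion of outliers, "used when fitting to define the threshold on the scores of the samples". A numpy sketch of that thresholding step, assuming the threshold is the contamination-th percentile of the training scores and that lower scores mean more abnormal (the convention of IsolationForest.score_samples); the scores below are hypothetical rather than produced by a real forest:

```python
import numpy as np

def contamination_threshold(scores, contamination):
    # Cut-off at the contamination-th percentile of the training scores
    return np.percentile(scores, 100.0 * contamination)

def label_outliers(scores, threshold):
    # -1 for outliers, 1 for inliers, the labels IsolationForest.predict uses
    return np.where(scores < threshold, -1, 1)

rng = np.random.default_rng(42)
train_scores = rng.uniform(-0.7, -0.3, size=1000)  # hypothetical anomaly scores
thr = contamination_threshold(train_scores, contamination=0.05)
labels = label_outliers(train_scores, thr)
print((labels == -1).mean())  # about 0.05
```

In this reading, contamination does not change the scores themselves, only where the line between "normal" and "anomalous" is drawn, so choosing it amounts to stating how many anomalies you expect in the training data.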

8 votes · 4 answers

What is the term for when a model acts on the thing being modeled and thus changes the concept?

I'm trying to see if there is a conventional term for this concept to help me in my literature research and writing. When a machine learning model causes an action to be taken in the real world that affects future instances, what is that called? …

Tags: machine-learning, terminology

asked Apr 02 '15 at 23:52 by jsmith54 (83 reputation)
