Highest Voted Questions - Data Science Stack Exchange

31

votes

1 answer

What is a LB score in machine learning?

I was going through an article on the Kaggle blog. The author repeatedly mentions 'LB score' and 'LB fit' as metrics for the effectiveness of machine learning (along with the cross-validation (CV) score). Searching for the meaning of 'LB', I spent…

machine-learning accuracy

asked May 08 '17 at 05:13

user345394

505
1
4
8

31

votes

3 answers

Neural Network for Multiple Output Regression

I have a dataset containing 34 input columns and 8 output columns. One way to solve the problem is to take the 34 inputs and build an individual regression model for each output column. I am wondering if this problem can be solved using just one model…

neural-network regression tensorflow

asked Feb 10 '17 at 23:17

sjishan

411
1
4
6

31

votes

8 answers

How to count the number of missing values in each row in Pandas dataframe?

How can I get the number of missing values in each row of a Pandas dataframe? I would like to split the dataframe into different dataframes whose rows have the same number of missing values. Any suggestions?

python pandas

asked Jul 07 '16 at 10:26

Kaggle

2,877
5
13
8

30

votes

3 answers

What is difference between text classification and topic models?

I know the difference between clustering and classification in machine learning, but I don't understand the difference between text classification and topic modeling for documents. Can I use topic modeling over documents to identify a topic? Can I…

classification text-mining topic-model

asked Aug 12 '14 at 03:50

Ali

361
1
4
6

30

votes

8 answers

Purpose of visualizing high dimensional data?

There are many techniques for visualizing high-dimensional datasets, such as t-SNE, Isomap, PCA, supervised PCA, etc. And we go through the motions of projecting the data down to a 2D or 3D space, so we have "pretty pictures". Some of these…

machine-learning dimensionality-reduction visualization

asked Nov 26 '15 at 04:28

hlin117

675
1
8
11

30

votes

3 answers

What is a better input for Word2Vec?

This is more of a general NLP question. What is the appropriate input for training a word embedding, namely Word2Vec? Should all sentences belonging to an article be a separate document in a corpus? Or should each article be a document in said…

nlp word-embeddings

asked Nov 08 '15 at 04:17

wacax

3,370
4
22
45

30

votes

2 answers

How to interpret classification report of scikit-learn?

As you can see, it is about a binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this? And two other questions: what does…

classification metric binary

asked Dec 08 '19 at 23:17

user77241

30

votes

7 answers

Can machine learning learn a function like finding maximum from a list?

I have an input which is a list, and the output is the maximum of the elements of the input list. Can machine learning learn such a function, one that always selects the maximum of the elements present in the input? This might seem like a pretty…

machine-learning deep-learning

asked Jul 31 '19 at 11:06

user78739

309
1
3
3

30

votes

2 answers

How to feed LSTM with different input array sizes?

If I want to write an LSTM network and feed it inputs of different array sizes, how can I do that? For example, I want to get voice messages or text messages in a different language and translate them. So the first input may be "hello" but the…

keras lstm

asked Apr 07 '19 at 08:04

user3486308

1,260
5
16
27

30

votes

6 answers

What is the reason behind taking log transformation of few continuous variables?

I have been working on a classification problem and I have read many people's code and tutorials. One thing I've noticed is that many people take np.log or log of continuous variables like loan_amount or applicant_income, etc. I just want to understand…

machine-learning python classification scikit-learn

asked Oct 23 '18 at 13:08

Sai Kumar

601
1
8
14

30

votes

1 answer

How is a splitting point chosen for continuous variables in decision trees?

I have two questions related to decision trees: If we have a continuous attribute, how do we choose the splitting value? Example: Age=(20,29,50,40....) Imagine that we have a continuous attribute $f$ that has values in $R$. How can I write an…

classification data decision-trees

asked Nov 03 '17 at 21:45

WALID BELRHALMIA

411
1
4
5

30

votes

4 answers

Is pandas now faster than data.table?

Here is the GitHub link to the most recent data.table benchmark. The data.table benchmarks have not been updated since 2014. I heard somewhere that Pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used…

python r pandas data data-table

asked Oct 25 '17 at 02:43

xiaodai

620
1
5
12

30

votes

3 answers

Why do we convert skewed data into a normal distribution

I was going through a solution to the House Prices competition on Kaggle (Human Analog's Kernel on House Prices: Advanced Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This…

regression feature-extraction feature-engineering kaggle feature-scaling

asked Jul 07 '17 at 11:35

Abhijay Ghildyal

785
2
9
10

30

votes

6 answers

How to fill missing value based on other columns in Pandas dataframe?

Suppose I have a 5*3 data frame in which the third column contains missing values:

1 2 3
4 5 NaN
7 8 9
3 2 NaN
5 6 NaN

I hope to fill each missing value using the rule that it equals the first column times the second column:

1 2 3
4 5 20 <-- 4*5
7 8 9
3 2 6 <-- 3*2
5 6 30…

pandas

asked Mar 22 '17 at 12:57

KyL

419
1
4
5

30

votes

3 answers

How to get p-value and confidence interval in LogisticRegression with sklearn?

I am building a multinomial logistic regression with sklearn (LogisticRegression). But after it finishes, how can I get a p-value and confidence interval for my model? It appears that sklearn only provides the coefficients and intercept. Thank you a…

scikit-learn logistic-regression

asked Nov 28 '16 at 17:10

hminle

401
1
4
4
