Most Popular
1500 questions
31
votes
1 answer
What is an LB score in machine learning?
I was going through an article on the Kaggle blog. The author repeatedly mentions 'LB score' and 'LB fit' as metrics for the effectiveness of machine learning (along with the cross-validation (CV) score).
While searching for the meaning of 'LB' I spent…
user345394
- 505
- 1
- 4
- 8
31
votes
3 answers
Neural Network for Multiple Output Regression
I have a dataset containing 34 input columns and 8 output columns.
One way to solve the problem is to take the 34 inputs and build an individual regression model for each output column.
I am wondering if this problem can be solved using just one model…
sjishan
- 411
- 1
- 4
- 6
31
votes
8 answers
How to count the number of missing values in each row in Pandas dataframe?
How can I get the number of missing values in each row of a Pandas dataframe?
I would like to split the dataframe into separate dataframes whose rows all have the same number of missing values.
Any suggestions?
Kaggle
- 2,877
- 5
- 13
- 8
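One possible approach to the question above, as a minimal sketch: it assumes missing values are stored as NaN, and the toy frame and its column names (`a`, `b`, `c`) are hypothetical, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0],
    "c": [9.0, 10.0, 11.0, np.nan],
})

# isna() flags missing cells; summing across columns (axis=1)
# gives the number of missing values in each row.
missing_per_row = df.isna().sum(axis=1)

# Grouping by that count yields one sub-frame per distinct count.
frames_by_count = {count: group for count, group in df.groupby(missing_per_row)}
```

Here `frames_by_count[2]` would hold the rows with exactly two missing values. Whether a dictionary of frames is the right container depends on what is done with the pieces afterwards.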
30
votes
3 answers
What is the difference between text classification and topic models?
I know the difference between clustering and classification in machine learning, but I don't understand the difference between text classification and topic modeling for documents. Can I use topic modeling over documents to identify a topic? Can I…
Ali
- 361
- 1
- 4
- 6
30
votes
8 answers
Purpose of visualizing high dimensional data?
There are many techniques for visualizing high-dimensional datasets, such as t-SNE, Isomap, PCA, supervised PCA, etc. And we go through the motions of projecting the data down to a 2D or 3D space, so we have "pretty pictures". Some of these…
hlin117
- 675
- 1
- 8
- 11
30
votes
3 answers
What is a better input for Word2Vec?
This is more like a general NLP question.
What is the appropriate input for training a word embedding, namely Word2Vec? Should each sentence belonging to an article be a separate document in a corpus? Or should each article be a document in said…
wacax
- 3,370
- 4
- 22
- 45
30
votes
2 answers
How to interpret classification report of scikit-learn?
As you can see, this is a binary classification with LinearSVC. Class 1 has a higher precision than class 0 (+7%), but class 0 has a higher recall than class 1 (+11%). How would you interpret this?
And two other questions: what does…
user77241
30
votes
7 answers
Can machine learning learn a function like finding maximum from a list?
I have an input which is a list, and the output is the maximum of the elements of the input list.
Can machine learning learn a function that always selects the maximum of the elements present in the input?
This might seem like a pretty…
user78739
- 309
- 1
- 3
- 3
30
votes
2 answers
How to feed LSTM with different input array sizes?
If I want to write an LSTM network and feed it inputs of different array sizes, how is that possible?
For example, I want to get voice messages or text messages in a different language and translate them. So the first input may be "hello" but the…
user3486308
- 1,260
- 5
- 16
- 27
30
votes
6 answers
What is the reason behind taking the log transformation of a few continuous variables?
I have been working on a classification problem and have read many people's code and tutorials. One thing I've noticed is that many people take np.log or log of continuous variables like loan_amount or applicant_income.
I just want to understand…
Sai Kumar
- 601
- 1
- 8
- 14
30
votes
1 answer
How is a splitting point chosen for continuous variables in decision trees?
I have two questions related to decision trees:
If we have a continuous attribute, how do we choose the splitting value?
Example: Age=(20,29,50,40....)
Imagine that we have a continuous attribute $f$ that takes values in $\mathbb{R}$. How can I write an…
WALID BELRHALMIA
- 411
- 1
- 4
- 5
30
votes
4 answers
Is pandas now faster than data.table?
Here is the GitHub link to the most recent data.table benchmark.
The data.table benchmark has not been updated since 2014. I heard somewhere that Pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used…
xiaodai
- 620
- 1
- 5
- 12
30
votes
3 answers
Why do we convert skewed data into a normal distribution?
I was going through a solution to the Housing prices competition on Kaggle (Human Analog's Kernel on House Prices: Advanced Regression Techniques) and came across this part:
# Transform the skewed numeric features by taking log(feature + 1).
# This…
Abhijay Ghildyal
- 785
- 2
- 9
- 10
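To make the `log(feature + 1)` transform quoted in the question concrete, here is a small sketch using NumPy. The sample values are made up for illustration and are not from the kernel being discussed.

```python
import numpy as np

# Hypothetical right-skewed values (not from the Kaggle kernel).
values = np.array([0.0, 9.0, 99.0, 999.0])

# np.log1p computes log(1 + x); unlike a plain log, it is defined at x = 0.
transformed = np.log1p(values)

# np.expm1 computes exp(x) - 1 and undoes the transform.
restored = np.expm1(transformed)
```

The transform compresses large values far more than small ones, which is why it tends to reduce right skew. It does not by itself guarantee a normal distribution.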
30
votes
6 answers
How to fill missing value based on other columns in Pandas dataframe?
Suppose I have a 5*3 data frame in which the third column contains missing values:
1 2 3
4 5 NaN
7 8 9
3 2 NaN
5 6 NaN
I would like to fill each missing value using the rule: first column multiplied by second column.
1 2 3
4 5 20 <--4*5
7 8 9
3 2 6 <-- 3*2
5 6 30…
KyL
- 419
- 1
- 4
- 5
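One possible way to apply the rule described above, as a minimal sketch: the column names `c1`, `c2`, `c3` are hypothetical, since the question does not name its columns.

```python
import numpy as np
import pandas as pd

# The 5*3 frame from the question; column names are assumed for illustration.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]],
    columns=["c1", "c2", "c3"],
)

# fillna accepts a Series and aligns it on the index, so only the
# missing entries of c3 are replaced by the product c1 * c2.
df["c3"] = df["c3"].fillna(df["c1"] * df["c2"])
```

Rows that already have a value in the third column are left untouched. Only the NaN entries take the product.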
30
votes
3 answers
How to get p-value and confidence interval in LogisticRegression with sklearn?
I am building a multinomial logistic regression with sklearn (LogisticRegression). But after it finishes, how can I get the p-values and confidence intervals of my model? It appears that sklearn only provides the coefficients and intercept.
Thank you a…
hminle
- 401
- 1
- 4
- 4