Highest Voted 'predictive-modeling' Questions

63 votes, 6 answers

Should a model be re-trained if new observations are available?

So far, I have not been able to find any literature on this subject, but it seems worth some thought: what are the best practices for model training and optimization when new observations become available? Is there any way to determine the…

asked Jul 13 '16 at 11:03 by yad (1,773 reputation)
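Some models make the "update or retrain?" question concrete. The fit of ordinary least squares depends only on a few running sums (sufficient statistics). For such a model, folding in new observations gives exactly the same coefficients as refitting on all the data. The sketch below assumes a single feature; the class name `IncrementalSimpleOLS` is hypothetical. Most models, such as tree ensembles, have no such shortcut. Whether to retrain at all also depends on whether the data distribution has drifted.

```python
class IncrementalSimpleOLS:
    """Hypothetical helper: fits y ~ a + b*x from running sums, so new data can be
    added without revisiting old observations."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Accumulate the sufficient statistics of simple linear regression.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form OLS solution computed from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b


batched = IncrementalSimpleOLS()
batched.update([0, 1], [1, 3])   # initial data
batched.update([2, 3], [5, 7])   # new observations arrive later
full = IncrementalSimpleOLS()
full.update([0, 1, 2, 3], [1, 3, 5, 7])  # refit on everything at once
print(batched.coefficients(), full.coefficients())
```

Both fits print `(1.0, 2.0)`, because the running sums are identical either way.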

56 votes, 8 answers

Why Is Overfitting Bad in Machine Learning?

Logic often states that by overfitting a model, its capacity to generalize is limited, though this might only mean that overfitting stops a model from improving after a certain complexity. Does overfitting cause models to become worse regardless of…

machine-learning predictive-modeling

asked May 14 '14 at 18:09 by blunders (1,922 reputation)
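A tiny, fully deterministic illustration of why overfitting hurts follows. The 1-D data set is hypothetical, and one training label is deliberately flipped to act as noise. A 1-nearest-neighbour model memorizes every training point, including the noisy one. A one-parameter threshold rule cannot fit that point. The example is a sketch, not a general proof.

```python
# Hypothetical toy data: the true rule is y = 1 iff x > 5.
# The training label at x = 2 is flipped on purpose to play the role of noise.
TRAIN_X = [1, 2, 3, 4, 6, 7, 8, 9]
TRAIN_Y = [0, 1, 0, 0, 1, 1, 1, 1]
TEST_X = [1.4, 2.2, 2.4, 3.3, 6.4, 8.4]
TEST_Y = [0, 0, 0, 0, 1, 1]


def one_nn(x):
    """Flexible 'memorizing' model: copy the label of the nearest training point."""
    nearest = min(range(len(TRAIN_X)), key=lambda i: abs(TRAIN_X[i] - x))
    return TRAIN_Y[nearest]


def fit_threshold(xs, ys):
    """Simple model y = 1 iff x > t; choose the midpoint t with the fewest training errors."""
    pts = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates, key=lambda t: sum((x > t) != y for x, y in zip(xs, ys)))


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


t = fit_threshold(TRAIN_X, TRAIN_Y)


def threshold_model(x):
    return int(x > t)


for name, model in [("1-NN", one_nn), ("threshold", threshold_model)]:
    print(name, accuracy(model, TRAIN_X, TRAIN_Y), accuracy(model, TEST_X, TEST_Y))
```

Here 1-NN scores 1.0 on the training data but only about 0.67 on the test points, because it reproduces the noisy label. The threshold rule scores 0.875 on training and 1.0 on test.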

35 votes, 1 answer

Time Series prediction using LSTMs: Importance of making time series stationary

In this link on stationarity and differencing, it is mentioned that models like ARIMA require a stationarized time series for forecasting, because its statistical properties, such as mean, variance, and autocorrelation, are then constant over time. Since…

deep-learning predictive-modeling time-series forecast lstm

asked Nov 16 '17 at 07:57 by Abhijay Ghildyal (785 reputation)
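For reference, differencing and its inverse look like the sketch below. This is a generic preprocessing step, not code from the linked article. One common pattern is to train a model, LSTM or otherwise, on the differenced series and then add predictions back with the inverse transform. The helper names are hypothetical.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag]; the result is `lag` elements shorter than the input."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Invert `difference`, given the first `lag` values of the original series."""
    out = list(first_values)
    for d in diffs:
        out.append(out[-lag] + d)  # out[-lag] is the value `lag` steps back
    return out


trend = [3 + 2 * t for t in range(6)]   # linear trend: non-stationary mean
diffs = difference(trend)               # constant series after one difference
print(diffs, undifference(trend[:1], diffs))
```

A `lag` equal to the seasonal period (for example `lag=2` for the series `[1, 5, 1, 5, ...]`) removes a repeating seasonal pattern in the same way.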

28 votes, 2 answers

Predicting a word using Word2vec model

Given a sentence: "When I open the ?? door it starts heating automatically" I would like to get the list of possible words in ?? with a probability. The basic concept used in word2vec model is to "predict" a word given surrounding context. Once the…

nlp predictive-modeling word-embeddings

asked Jan 14 '16 at 07:13 by DED (345 reputation)
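The scoring idea behind "predict a word from its context" can be sketched as a CBOW-style forward pass, using the textbook full softmax. First, average the input vectors of the context words. Then score each candidate by its dot product with that average and normalize with a softmax. The two-dimensional vectors and the three-word vocabulary below are hypothetical; a trained model would supply real ones. If you use gensim, its `Word2Vec` class has a `predict_output_word` method that does something similar; check its documentation for the training settings it requires.

```python
import math

# Hypothetical toy embeddings; a real word2vec model learns these during training.
INPUT_VECS = {"open": [1.0, 0.0], "door": [0.0, 1.0], "heating": [0.5, 0.5]}
OUTPUT_VECS = {"oven": [2.0, 2.0], "front": [1.0, -1.0], "car": [0.5, 1.5]}


def predict_middle_word(context, input_vecs, output_vecs):
    """Rank candidate words for the blank, CBOW-style, with a full softmax."""
    dim = len(next(iter(input_vecs.values())))
    # Hidden layer: average of the context words' input vectors.
    h = [sum(input_vecs[w][i] for w in context) / len(context) for i in range(dim)]
    scores = {w: sum(a * b for a, b in zip(v, h)) for w, v in output_vecs.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return sorted(((w, e / z) for w, e in exps.items()), key=lambda p: p[1], reverse=True)


ranking = predict_middle_word(["open", "door", "heating"], INPUT_VECS, OUTPUT_VECS)
print(ranking)
```

With these toy vectors the ranking is `oven`, then `car`, then `front`, and the probabilities sum to 1.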

28 votes, 3 answers

What does "baseline" mean in the context of machine learning?

What does "baseline" mean in the context of machine learning and data science? Someone wrote me: Hint: An appropriate baseline will give an RMSE of approximately 200. I don't get this. Does he mean that if my predictive model on the training data…

machine-learning regression predictive-modeling terminology

asked Apr 26 '18 at 23:17 by Meiiso (411 reputation)
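A baseline is commonly a trivial reference model that a real model should beat. For regression, one frequent choice is to ignore the features and always predict the training mean. The question does not say which baseline the hint refers to, and its "approximately 200" depends on the asker's data, so the numbers below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mean_baseline_rmse(y_train, y_test):
    """RMSE of a 'model' that always predicts the mean of the training targets."""
    mu = sum(y_train) / len(y_train)
    return rmse(y_test, [mu] * len(y_test))


print(mean_baseline_rmse([100, 200, 300], [100, 300]))
```

The call prints `100.0`. A model whose test RMSE is not clearly below its baseline's value has not learned much from the features.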

25 votes, 4 answers

How to predict probabilities in xgboost using R?

The predict function below also returns negative values, so its output cannot be probabilities. param <- list(max.depth = 5, eta = 0.01, objective="binary:logistic",subsample=0.9) bst <- xgboost(param, data = x_mat, label = y_mat,nround = 3000) pred_s <-…

machine-learning r predictive-modeling decision-trees xgboost

asked Sep 08 '15 at 03:14 by GeorgeOfTheRF (2,018 reputation)
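With `objective = "binary:logistic"`, xgboost models the log-odds of the positive class. Its raw scores ("margins") can be negative, and the logistic (sigmoid) function maps them into (0, 1). Negative predictions therefore suggest that you are seeing margins or the output of a regression objective rather than probabilities. In the snippet, `param` is passed positionally, so it is worth checking that it actually reaches the `params` argument (for example by writing `params = param`). The Python helper below is a hypothetical sketch of the margin-to-probability mapping only.

```python
import math


def margin_to_probability(margin):
    """Map a raw log-odds score to a probability with the logistic function.

    The two branches avoid overflow in math.exp for large |margin|.
    """
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)


for m in (-2.0, 0.0, 2.0):
    print(m, margin_to_probability(m))
```

A margin of 0 maps to 0.5, and margins of equal size but opposite sign map to probabilities that sum to 1.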

18 votes, 5 answers

Merging sparse and dense data in machine learning to improve the performance

I have sparse features that are predictive, and I also have some dense features that are predictive. I need to combine these features to improve the overall performance of the classifier. Now, the thing is when I try to combine these…

machine-learning classification predictive-modeling scikit-learn supervised-learning

asked Apr 06 '16 at 05:14 by Sagar Waghmode (231 reputation)
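Mechanically, combining the two kinds of features means appending the dense columns after the sparse block while keeping zeros implicit. With real matrices in a scikit-learn workflow, `scipy.sparse.hstack` is the usual tool for this. The pure-Python sketch below shows the same idea for a single row stored as an `{index: value}` dict; the helper name is hypothetical. Dense features on very different scales may also need scaling before the combined matrix is useful.

```python
def hstack_row(sparse_row, dense_row, n_sparse):
    """Append dense features after a sparse block of width n_sparse.

    sparse_row: dict mapping column index (< n_sparse) to a non-zero value.
    dense_row: list of dense values, placed at columns n_sparse, n_sparse + 1, ...
    """
    if any(i >= n_sparse for i in sparse_row):
        raise ValueError("sparse index outside the sparse block")
    combined = dict(sparse_row)
    for j, value in enumerate(dense_row):
        if value != 0:  # zeros stay implicit, as in a sparse matrix
            combined[n_sparse + j] = value
    return combined


print(hstack_row({0: 1.0, 3: 2.0}, [0.5, 0.0, 4.0], n_sparse=5))
```

The call prints `{0: 1.0, 3: 2.0, 5: 0.5, 7: 4.0}`. The zero dense value at column 6 is not stored.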

16 votes, 1 answer

Train Accuracy vs Test Accuracy vs Confusion matrix

After I developed my predictive model using Random Forest I get the following metrics: Train Accuracy :: 0.9764634601043997 Test Accuracy :: 0.7933284397683713 Confusion matrix [[28292 1474] …

python predictive-modeling accuracy confusion-matrix classifier

asked Feb 28 '18 at 21:07 by Pedro Alves (367 reputation)
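The three outputs are linked. Test accuracy should equal the diagonal of the test-set confusion matrix divided by its total, so checking this is a quick way to confirm that both were computed on the same data. The sketch assumes scikit-learn's layout for `confusion_matrix` (rows are true classes, columns are predictions). The asker's matrix is truncated, so the counts below are hypothetical.

```python
def metrics_from_confusion(cm):
    """cm[i][j] = number of samples with true class i predicted as class j (binary case).

    Returns accuracy plus precision and recall for class 1.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


acc, prec, rec = metrics_from_confusion([[50, 10], [5, 35]])
print(acc, prec, rec)
```

Here accuracy is 0.85, precision is 35/45 and recall is 0.875. Precision and recall reveal class-specific errors that a single accuracy number hides.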

16 votes, 3 answers

Why are ensembles so unreasonably effective?

It seems to have become axiomatic that an ensemble of learners leads to the best possible model results - and it is becoming far rarer, for example, for single models to win competitions such as Kaggle. Is there a theoretical explanation for why…

machine-learning data-mining predictive-modeling

asked May 25 '16 at 13:08 by Robert de Graaf (899 reputation)
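One classic piece of the theoretical answer is the Condorcet jury theorem argument. Suppose each of n classifiers is independently correct with probability p > 0.5. Then a majority vote is correct more often than any single member, and the advantage grows with n. The independence assumption is the key caveat: real ensemble members make correlated errors, so actual gains are smaller.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """P(a majority of n independent classifiers is correct), each correct with prob p."""
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 11):
    print(n, majority_vote_accuracy(n, 0.7))
```

For p = 0.7, three voters give 0.343 + 0.441 = 0.784, and eleven voters do better still.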

15 votes, 3 answers

Is feature selection necessary?

I would like to run a machine learning model such as random forest, gradient boosting, or SVM on my dataset. There are more than 200 predictor variables in my dataset, and my target is a binary variable. Do I need to run feature selection…

machine-learning predictive-modeling feature-selection random-forest

asked Jan 04 '17 at 08:46 by LUSAQX (783 reputation)

14 votes, 2 answers

How to train a model to predict events 30 minutes in advance from a multi-dimensional time series

Experts in my field are capable of predicting the likelihood of an event (binary spike in yellow) 30 minutes before it occurs. The sampling interval here is 1 second, and this view represents a few hours' worth of data. I have circled in black where "malicious" pattern…

machine-learning python predictive-modeling time-series scikit-learn

asked Apr 20 '17 at 13:24 by William D (143 reputation)
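One way to frame "predict 30 minutes in advance" as supervised learning is to relabel each time step t. The label is 1 if an event occurs within the next `horizon` steps, where a horizon of 1800 corresponds to 30 minutes at 1 sample per second. A classifier is then trained on features computed only from data up to t. The helper below is a hypothetical sketch of that labelling step. For a fair evaluation, the train/test split should also respect time order.

```python
def make_early_warning_labels(events, horizon):
    """label[t] = 1 if any event occurs in steps t+1 .. t+horizon, else 0.

    events: list of 0/1 flags, one per time step.
    horizon: look-ahead window in steps (e.g. 1800 for 30 minutes at 1 s per step).
    """
    n = len(events)
    prefix = [0]  # prefix[i] = number of events in events[0:i]
    for e in events:
        prefix.append(prefix[-1] + e)
    labels = []
    for t in range(n):
        lo, hi = t + 1, min(n, t + horizon + 1)
        labels.append(1 if prefix[hi] - prefix[lo] > 0 else 0)
    return labels


print(make_early_warning_labels([0, 0, 0, 0, 1, 0, 0], horizon=2))
```

With a horizon of 2 steps, the two steps just before the event are labelled positive: `[0, 0, 1, 1, 0, 0, 0]`.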

13 votes, 5 answers

In industry, what type of new data science algorithms does one develop?

I've seen several job descriptions for data science which include developing a novel algorithm to be a part of production environments. Can you give some input of what could be meant here exactly? Would they mean an algorithm that behaves somewhat…

predictive-modeling algorithms

asked Jan 17 '20 at 19:02 by Mariah (328 reputation)

12 votes, 2 answers

Machine Learning Best Practices for Big Dataset

I am about to graduate from my Master's program, during which I learned about machine learning and carried out research projects with it. I wonder about the best practices in industry when performing machine learning tasks with big datasets (like 100s GB or…

machine-learning predictive-modeling bigdata

asked Sep 07 '16 at 22:40 by iLoeng (243 reputation)

12 votes, 1 answer

Hashing Trick - what actually happens

When ML algorithms, e.g. Vowpal Wabbit or some of the factorization machines winning click-through rate competitions (Kaggle), mention that features are 'hashed', what does that actually mean for the model? Let's say there is a variable that…

machine-learning predictive-modeling kaggle

asked Oct 10 '14 at 03:48 by B_Miner (702 reputation)
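For the model, hashing means that each raw feature, such as the string `"user=42"`, is never stored in a vocabulary. Instead, a hash function maps it straight to one of a fixed number of columns, and collisions are accepted. The sketch below uses MD5 only because it is stable across runs (Python's built-in `hash()` on strings is randomized). It also uses one hash bit as a sign, so collisions cancel out in expectation rather than always adding up. Vowpal Wabbit and scikit-learn's `FeatureHasher` use a different hash function (MurmurHash), and the helper name is hypothetical.

```python
import hashlib


def hash_features(tokens, n_buckets):
    """Hashing trick: map string features into a fixed-length vector."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")
        index = h % n_buckets                        # column chosen by the hash
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0   # top bit picks the sign
        vec[index] += sign
    return vec


print(hash_features(["user=42", "ad=7", "site=news"], n_buckets=16))
```

The model only ever sees the 16 columns. Two features that land in the same column become indistinguishable, and that is the price paid for a fixed, vocabulary-free feature space.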

11 votes, 3 answers

Can regression trees predict continuously?

Suppose I have a smooth function like $f(x, y) = x^2+y^2$. I have a training set $D \subsetneq \{((x, y), f(x,y)) | (x,y) \in \mathbb{R}^2\}$ and, of course, I don't know $f$ although I can evaluate $f$ wherever I want. Are regression trees capable…

predictive-modeling regression decision-trees

asked Dec 16 '15 at 11:39 by Martin Thoma (18,630 reputation)
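A standard CART-style regression tree stores a constant in each leaf, so its prediction is a step function. It can approximate a smooth f more closely as leaves are added, but it never varies continuously within a leaf and stays constant beyond the training range. The hypothetical sketch below fits a depth-2 tree in one dimension to f(x) = x², a 1-D simplification of the question's x² + y², and counts the distinct outputs on a fine grid.

```python
def _sse(values):
    """Sum of squared errors around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth):
    """Tiny 1-D regression tree: a leaf mean or a (threshold, left, right) tuple."""
    if depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        sse = _sse([y for _, y in pairs[:i]]) + _sse([y for _, y in pairs[i:]])
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    _, thr, i = best
    lx, ly = zip(*pairs[:i])
    rx, ry = zip(*pairs[i:])
    return (thr, fit_tree(list(lx), list(ly), depth - 1),
            fit_tree(list(rx), list(ry), depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree


xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
tree = fit_tree(xs, ys, depth=2)
grid = [-2 + 4 * k / 999 for k in range(1000)]
print(len({predict(tree, g) for g in grid}))  # at most 4 distinct outputs
```

However finely the grid is sampled, a depth-2 tree can return at most four values. Deeper trees give finer steps, but the output is still piecewise constant.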
