Questions tagged [predictive-modeling]

Statistical techniques used for predicting outcomes.

Predictive modelling is a set of statistical techniques used for predicting outcomes. Each such technique is called a predictive model.

Major areas of application of predictive modelling are health care, customer relationship management, and marketing.

1175 questions
63 votes
6 answers

Should a model be re-trained if new observations are available?

So, I have not been able to find any literature on this subject but it seems like something worth giving a thought: What are the best practices in model training and optimization if new observations are available? Is there any way to determine the…
yad
56 votes
8 answers

Why Is Overfitting Bad in Machine Learning?

Logic often states that by overfitting a model, its capacity to generalize is limited, though this might only mean that overfitting stops a model from improving after a certain complexity. Does overfitting cause models to become worse regardless of…
blunders
35 votes
1 answer

Time Series prediction using LSTMs: Importance of making time series stationary

In this link on stationarity and differencing, it is mentioned that models like ARIMA require a stationarized time series for forecasting, as its statistical properties like mean, variance, autocorrelation, etc. are constant over time. Since…
28 votes
2 answers

Predicting a word using Word2vec model

Given a sentence: "When I open the ?? door it starts heating automatically" I would like to get the list of possible words for ?? with a probability for each. The basic concept used in the word2vec model is to "predict" a word given its surrounding context. Once the…
DED
28 votes
3 answers

What does "baseline" mean in the context of machine learning?

What does "baseline" mean in the context of machine learning and data science? Someone wrote me: Hint: An appropriate baseline will give an RMSE of approximately 200. I don't get this. Does he mean that if my predictive model on the training data…
25 votes
4 answers

How to predict probabilities in xgboost using R?

The predict function below also returns negative values, so they cannot be probabilities. param <- list(max.depth = 5, eta = 0.01, objective="binary:logistic",subsample=0.9) bst <- xgboost(param, data = x_mat, label = y_mat,nround = 3000) pred_s <-…
18 votes
5 answers

Merging sparse and dense data in machine learning to improve the performance

I have sparse features that are predictive, and I also have some dense features that are predictive. I need to combine these features to improve the overall performance of the classifier. Now, the thing is when I try to combine these…
16 votes
1 answer

Train Accuracy vs Test Accuracy vs Confusion matrix

After I developed my predictive model using Random Forest I get the following metrics: Train Accuracy :: 0.9764634601043997 Test Accuracy :: 0.7933284397683713 Confusion matrix [[28292 1474] …
16 votes
3 answers

Why are ensembles so unreasonably effective

It seems to have become axiomatic that an ensemble of learners leads to the best possible model results - and it is becoming far rarer, for example, for single models to win competitions such as Kaggle. Is there a theoretical explanation for why…
15 votes
3 answers

Is feature selection necessary?

I would like to run a machine learning model such as random forest, gradient boosting, or SVM on my dataset. There are more than 200 predictor variables in my dataset and my target is a binary variable. Do I need to run feature selection…
14 votes
2 answers

How to train a model to predict events 30 minutes prior, from multi-dimensional time series

Experts in my field are capable of predicting the likelihood of an event (binary spike in yellow) 30 minutes before it occurs. Frequency here is 1 sec, this view represents a few hours' worth of data, and I have circled in black where "malicious" pattern…
13 votes
5 answers

In industry, what type of new data science algorithms does one develop?

I've seen several job descriptions for data science which include developing a novel algorithm to be part of production environments. Can you give some input on what exactly could be meant here? Would they mean an algorithm that behaves somewhat…
Mariah
12 votes
2 answers

Machine Learning Best Practices for Big Dataset

I am about to graduate from my Master's program, where I learnt about machine learning and carried out research projects with it. I wonder about the best practices in the industry when performing machine learning tasks with big datasets (like 100s of GB or…
iLoeng
12 votes
1 answer

Hashing Trick - what actually happens

When ML algorithms, e.g. Vowpal Wabbit or some of the factorization machines winning click-through rate competitions (Kaggle), mention that features are 'hashed', what does that actually mean for the model? Let's say there is a variable that…
B_Miner
11 votes
3 answers

Can regression trees predict continuously?

Suppose I have a smooth function like $f(x, y) = x^2+y^2$. I have a training set $D \subsetneq \{((x, y), f(x,y)) | (x,y) \in \mathbb{R}^2\}$ and, of course, I don't know $f$ although I can evaluate $f$ wherever I want. Are regression trees capable…
Martin Thoma