Questions tagged [regression]

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

"Regression" is a general term for a wide variety of techniques to analyze the relationship between one (or more) dependent variables and independent variables. Typically the dependent variables are modeled with probability distributions whose parameters are assumed to vary (deterministically) with the independent variables.

Ordinary least squares (OLS) regression affords a simple example in which the expectation of one dependent variable is assumed to depend linearly on the independent variables. The unknown coefficients in the assumed linear function are estimated by choosing values for them that minimize the sum of squared differences between the values of the dependent variable and the corresponding fitted values.
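
As a minimal sketch of the OLS idea described above, the following pure-Python example fits the simplest case: one independent variable plus an intercept. The function names (`fit_simple_ols`, `sum_squared_residuals`) and the data are made up for illustration and do not come from any particular library. The closed-form estimates used here are the standard ones that minimize the sum of squared differences for a single predictor.

```python
def fit_simple_ols(x, y):
    """Fit y ~ intercept + slope * x by minimizing the sum of squared residuals.

    Hypothetical helper for illustration: handles one independent variable only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data, roughly following y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

intercept, slope = fit_simple_ols(x, y)
ssr = sum_squared_residuals(x, y, intercept, slope)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, SSR={ssr:.4f}")
```

Nudging either coefficient away from the fitted value can only increase the sum of squared residuals, which is what "least squares" refers to. With several independent variables the same criterion is usually solved with a linear-algebra routine rather than by hand.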

1575 questions
31 votes, 6 answers

Validation loss is not decreasing

I am trying to train an LSTM model. Is this model suffering from overfitting? Here is the training and validation loss graph:
DukeLover
  • 561
  • 1
  • 6
  • 14
31 votes, 3 answers

How can I check the correlation between features and target variable?

I am trying to build a regression model, and I am looking for a way to check whether there is any correlation between the features and the target variable. This is my sample dataset: Loan_ID Gender Married Dependents Education Self_Employed…
Jeeth
  • 911
  • 2
  • 10
  • 18
31 votes, 3 answers

Neural Network for Multiple Output Regression

I have a dataset containing 34 input columns and 8 output columns. One way to solve the problem is to take the 34 inputs and build an individual regression model for each output column. I am wondering if this problem can be solved using just one model…
sjishan
  • 411
  • 1
  • 4
  • 6
30 votes, 3 answers

Why do we convert skewed data into a normal distribution

I was going through a solution to the House Prices competition on Kaggle (Human Analog's Kernel on House Prices: Advanced Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This…
28 votes, 3 answers

What does "baseline" mean in the context of machine learning?

What does "baseline" mean in the context of machine learning and data science? Someone wrote me: Hint: An appropriate baseline will give an RMSE of approximately 200. I don't get this. Does he mean that if my predictive model on the training data…
26 votes, 2 answers

Why do we need to discard one dummy variable?

I have learned that, for creating a regression model, we have to take care of categorical variables by converting them into dummy variables. As an example, if, in our data set, there is a variable like location: Location…
19 votes, 2 answers

Multivariate linear regression in Python

I'm looking for a Python package that implements multivariate linear regression. (Terminological note: multivariate regression deals with the case where there is more than one dependent variable, while multiple regression deals with the case where…
Franck Dernoncourt
  • 5,573
  • 9
  • 40
  • 75
16 votes, 4 answers

What does "linear in parameters" mean?

The model of linear regression is linear in parameters. What does this actually mean?
Albert Gao
  • 263
  • 1
  • 2
  • 5
15 votes, 3 answers

Modelling Unevenly Spaced Time Series

I have a continuous variable, sampled over a period of a year at irregular intervals. Some days have more than one observation per hour, while other periods have nothing for days. This makes it particularly difficult to detect patterns in the time…
doublebyte
  • 420
  • 3
  • 9
15 votes, 3 answers

Predict the best time of call

I have a dataset including a set of customers in different cities of California, time of calling for each customer, and the status of call (True if customer answers the call and False if customer does not answer). I have to find an appropriate time…
14 votes, 2 answers

What to do when testing data has fewer features than training data?

Let's say we are predicting the sales of a shop and my training data has two sets of features: one about the store sales with the dates (the field "Store" is not unique), and one about the store types (the field "Store" is unique here). So the matrix…
14 votes, 1 answer

Stratify on regression

I have worked on classification problems, and stratified cross-validation is one of the most useful and simple techniques I've found. In that case, what it means is to build a training and validation set that have the same proportions of classes of…
David Masip
  • 5,981
  • 2
  • 23
  • 61
13 votes, 2 answers

MAD vs RMSE vs MAE vs MSLE vs R²: When to use which?

In regression problems, you can use various metrics to check how well your model is doing: Mean Absolute Deviation (MAD): in $[0, \infty)$, the smaller the better. Root Mean Squared Error (RMSE): in $[0, \infty)$, the smaller the…
Martin Thoma
  • 18,630
  • 31
  • 92
  • 167
13 votes, 2 answers

Interpreting the Root Mean Squared Error (RMSE)!

I read all about the pros and cons of RMSE vs. other absolute errors, namely mean absolute error (MAE). See the following references: MAE and RMSE — Which Metric is Better? What's the bottom line? How to compare models. Or this nice blog post, or this…
TwinPenguins
  • 4,157
  • 3
  • 17
  • 53
13 votes, 1 answer

How to do stepwise regression using sklearn?

I could not find a way to do stepwise regression in scikit-learn. I have checked all other posts on Stack Exchange on this topic. The answers to all of them suggest using f_regression. But f_regression does not do stepwise regression; it only gives F-score…