What issues arise when dealing with a highly skewed variable in a supervised learning problem? Which machine learning algorithms suffer most from skewness in the data, and what are the solutions to this problem?
-
Are you asking about dependent or independent variables? As it stands, the question is far too general, and possible duplicates (https://ai.stackexchange.com/questions/4283/does-data-skew-matter-in-classification-problem and https://datascience.stackexchange.com/questions/20237/why-do-we-convert-skewed-data-into-a-normal-distribution) already exist. If neither of these answers your question, please reframe it and I will be more than happy to discuss further and share my input. Skewness is most often discussed for the target variable in classification problems. – TwinPenguins May 01 '18 at 11:06
-
No, the linked question is about skewness across classes. I am asking about the skewness of a continuous distribution. – David Masip May 01 '18 at 12:25
-
I see, and I had guessed you meant a continuous distribution, but your question still does not make clear whether it concerns the independent or the dependent variable. If it is the dependent variable (the target), almost the same principle as for classes applies, and it is strongly recommended to transform it toward a normal distribution, especially in linear regression models, where the assumption of normally distributed residuals (among others) has to hold. If it is about the independent variables, we could approach it very differently. – TwinPenguins May 01 '18 at 14:05
-
You are right, I was thinking about both cases – David Masip May 01 '18 at 15:57
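
To make the suggestion about transforming a skewed target concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the synthetic log-normal target `y`, the helper `sample_skewness`, and the choice of `log1p` as the transform (one common option for non-negative, right-skewed data, not the only one).

```python
import numpy as np


def sample_skewness(x):
    """Biased sample skewness: the third standardized moment."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


# Hypothetical right-skewed target, drawn from a log-normal distribution.
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# log1p computes log(1 + y); it is defined for y >= 0 and compresses the long right tail.
y_log = np.log1p(y)

skew_raw = sample_skewness(y)
skew_log = sample_skewness(y_log)
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")

# A model fitted on y_log predicts on the log scale;
# expm1 (the inverse of log1p) maps values back to the original scale.
y_back = np.expm1(y_log)
```

If a model is trained on `y_log`, its predictions need to be passed through `np.expm1` before being compared with the original target. The same transform can be applied to a skewed independent variable, although whether that helps depends on the algorithm.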