Questions tagged [anova]

15 questions
2
votes
1 answer

Score of ANOVA in selected features

I selected features using ANOVA (because I have numerical data as input and categorical data as target): anova = SelectKBest(score_func=f_classif, k='all') anova.fit(X_train, y_train.values.argmax(1)) # y_train.values.argmax(1) because I already…
Mimi
  • 45
  • 7
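
A minimal sketch of the setup described in this question, assuming the target is one-hot encoded (which is what the `argmax(1)` call suggests, although the excerpt is cut off). The synthetic data and the names `informative`, `noise` and `y_onehot` are made up for illustration and are not from the original post.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 300

# Hypothetical data: three classes, one feature that depends on the
# class and one that is pure noise.
y = rng.integers(0, 3, size=n)
informative = y + rng.normal(scale=0.5, size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])

# Assumption: the target is stored one-hot, so argmax recovers the labels.
y_onehot = np.eye(3)[y]

anova = SelectKBest(score_func=f_classif, k="all")
anova.fit(X, y_onehot.argmax(axis=1))

# scores_ holds one F statistic per feature, pvalues_ the matching p-values.
print(anova.scores_)
print(anova.pvalues_)
```

A larger F score means the class means of that feature differ more relative to the within-class spread, so the scores are mainly useful for ranking features against each other.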
2
votes
1 answer

What conclusion can I draw when a variable is influenced by others but there isn't any correlation?

I am doing an exploratory analysis. If the target is a continuous variable and the attributes are all categorical (discrete values), in order to know whether each attribute has any influence on the target I am doing the ANOVA test…
1
vote
0 answers

Levene test for equal variance

I would like to run a one-way ANOVA test on my data. I saw that one of the assumptions for one-way ANOVA is homogeneity of variances. I have run the test on different data sets. I find sometimes my p-values are larger than…
Reut
  • 349
  • 2
  • 13
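
One possible way to check the equal-variance assumption before a one-way ANOVA, sketched with `scipy.stats.levene`. The helper `levene_then_anova`, the 0.05 threshold and the synthetic groups are illustrative assumptions. A Levene p-value above the threshold only means there is no evidence against equal variances; it does not prove they are equal.

```python
import numpy as np
from scipy import stats

def levene_then_anova(groups, alpha=0.05):
    """Run Levene's test and run one-way ANOVA only if it does not reject."""
    lev_stat, lev_p = stats.levene(*groups, center="median")
    report = {
        "levene_stat": float(lev_stat),
        "levene_p": float(lev_p),
        # True when Levene's test gives no evidence of unequal variances.
        "equal_var_plausible": bool(lev_p > alpha),
        "anova": None,
    }
    if report["equal_var_plausible"]:
        report["anova"] = stats.f_oneway(*groups)
    return report

# Hypothetical groups: the third one has a much larger spread.
rng = np.random.default_rng(0)
unequal = [rng.normal(0.0, scale, 200) for scale in (1.0, 1.0, 10.0)]
report = levene_then_anova(unequal)
print(report)
```

When Levene's test rejects, the sketch simply skips the ANOVA; which alternative to use in that case is left open here.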
1
vote
1 answer

When should mutual information be used for feature selection over other feature selection methods like correlation, ANOVA, etc.?

I have a data set with categorical and continuous/ordinal explanatory variables and a continuous target variable. I tried to filter features using one-way ANOVA for categorical variables and using Spearman's correlation coefficient for…
1
vote
1 answer

Question on ANOVA and Correlation/Association

I've been working on examining statistical relationships between variables: Pearson's and Spearman's for continuous variables; Kendall's Tau and Cramér's V for ordinal/nominal variables. I know there are many more ways. Recently I read about ANOVA and…
rocksNwaves
  • 309
  • 1
  • 10
1
vote
2 answers

Are Chi-square and ANOVA (f_classif) appropriate for selecting the best features?

I have a binary classification problem (target 0 or 1), and I have both continuous and categorical variables as features. I understood that with Chi-square I can only evaluate categorical features. What about ANOVA (f_classif)? It's the…
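
A rough sketch of one common split, assuming scikit-learn: score continuous features with `f_classif` and non-negative encoded categorical features with `chi2`. The synthetic columns `x_cont` and `x_cat` are made up for illustration, and this is not necessarily how the accepted answers approach it.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, size=n)  # binary target (0 or 1)

# Hypothetical continuous feature whose mean shifts with the class.
x_cont = y * 1.5 + rng.normal(size=n)
# Hypothetical binary categorical feature agreeing with y about 80% of the time.
x_cat = np.where(rng.random(n) < 0.8, y, 1 - y)

X_cont = x_cont.reshape(-1, 1)
X_cat = np.eye(2)[x_cat]  # one-hot encoding keeps values non-negative for chi2

f_scores, f_p = f_classif(X_cont, y)   # ANOVA F-test for the continuous column
chi_scores, chi_p = chi2(X_cat, y)     # chi-square for the encoded categories

print(f_scores, f_p)
print(chi_scores, chi_p)
```

The two scores are on different scales, so they are easier to compare through their p-values or by ranking within each group of features.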
0
votes
1 answer

Pass variable-length arguments to mstats.kruskalwallis

I am trying to run the Kruskal-Wallis test on multiple columns of my data. For that I wrote a function: var=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'] def kruskawallis_test(column): …
Ayush Ranjan
  • 401
  • 1
  • 4
  • 13
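
A small sketch of passing a variable number of samples to `mstats.kruskalwallis` with argument unpacking (`*`). The dictionary `data` and the three column names stand in for the real columns, which the excerpt does not show; they are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import mstats

rng = np.random.default_rng(7)

# Hypothetical columns: 'c' is shifted so the groups differ.
data = {
    "a": rng.normal(0.0, 1.0, 30),
    "b": rng.normal(0.0, 1.0, 30),
    "c": rng.normal(3.0, 1.0, 30),
}
names = ["a", "b", "c"]

# Build the list of samples, then unpack it into positional arguments.
samples = [data[name] for name in names]
h_stat, p_value = mstats.kruskalwallis(*samples)

print(h_stat, p_value)
```

Because the list is unpacked at call time, the same line works for 3 or 26 columns without changing the call.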
0
votes
1 answer

What does it mean to have 1 degree of freedom in ANOVA test?

So I used Python to run a multi-factorial ANOVA on a data set. I first used ols.fit() and then the anova_lm function. I noticed that for the variables I am analyzing the degrees of freedom are 1. Does that mean only 1 value out of my data is…
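
As a hedged illustration of where a value of 1 can come from: in a one-way layout the between-groups degrees of freedom equal the number of levels minus one, so a two-level factor has 1. A numeric predictor entered as a single term also uses 1. The helper `one_way_anova_df` is hypothetical and does not reproduce the `anova_lm` table from the question.

```python
def one_way_anova_df(groups):
    """Return (between-groups df, within-groups df) for a one-way ANOVA."""
    k = len(groups)                        # number of factor levels
    n_total = sum(len(g) for g in groups)  # total number of observations
    return k - 1, n_total - k

# A factor with two levels and three observations per level.
two_levels = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(one_way_anova_df(two_levels))  # (1, 4)
```

So a degree of freedom of 1 describes how many parameters the term needs, not how many data values were used.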
0
votes
0 answers

How to check for correlations using ANOVA between the selected feature columns and the target feature column

Suppose I have a set of feature columns that I select to check for correlations with the target feature 'is_Star', which has binary values, while the feature columns consist of numeric values. Why do I end up getting an array of NaN values? Also, I have…
meKafka
  • 1
  • 1
0
votes
0 answers

Huynh-Feldt vs Growth Curve Analysis

Are there cases where the Huynh-Feldt and Greenhouse-Geisser corrected F tests are less conservative than the Conservative Growth Curve approach for ANOVA analysis? The textbook Design of Experiments in R has a problem in which we find that…
Lyle
  • 1
0
votes
0 answers

Why do you need more subjects than levels in RM ANOVA?

Honestly, I'm hoping I even asked this question appropriately. It's one of those things I never quite grasped, but just accepted to be true, that when running a within-subjects ANOVA, you have to have a lot of subjects or otherwise you can't run the…
0
votes
0 answers

Statistical significance on aggregate data to show that the groups are different?

I am working with performance data for three groups for each region. The denominator for the groups is the number of people who are identified as low performers. For region A, Group-1 low performer % = 40%, Group-2 = 30%, Group-3 low performer = 30%.…
user728148
  • 21
  • 1
  • 3
0
votes
0 answers

Different runs of One Way ANOVA give different results in Python (on the same data, using stats.f_oneway)

I wrote the following code: The data I am running this code on is randomly generated, but I use fixed seeding. Each time I run the code this function gives different results for F statistics and pvalue. What is the reason for this? Is this normal?…
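
The code from this question is not included in the excerpt, so the following is only a sketch of a seeded setup under stated assumptions: a local `numpy.random.default_rng(seed)` generator is created inside the function, and `scipy.stats.f_oneway` is called on the generated groups. The function name `simulate_anova` and the group means are made up.

```python
import numpy as np
from scipy import stats

def simulate_anova(seed):
    """Generate three groups from a seeded generator and run one-way ANOVA."""
    rng = np.random.default_rng(seed)  # fresh generator on every call
    groups = [rng.normal(loc=mu, scale=1.0, size=40) for mu in (0.0, 0.2, 0.5)]
    return stats.f_oneway(*groups)

first = simulate_anova(123)
second = simulate_anova(123)

# Same seed, same data, so both runs report identical statistics.
print(first.statistic, first.pvalue)
print(second.statistic, second.pvalue)
```

If results still change between runs, one thing worth checking is whether the seed is set before every data generation step or only once at import time.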
0
votes
2 answers

What model should I use to predict monthly sales by products?

I am trying to predict monthly sales by product based on a plethora of variables. There are 4 predictors. One is categorical (month) and the other three are numerical. One of the variables is just part sales. The data I am trying to predict is…
0
votes
0 answers

Create box plot with anova model and different p value

I have created a box plot which looks like this. Code: df = pd.read_csv("file.csv") import plotly.express as px fig = px.box(df, x="Treatment", y="Fresh weight (g)", color='Treatment', facet_col='Crop', title='Fresh Weight', height=750) fig.show() I…
Jhon Patric
  • 213
  • 1
  • 8