Questions tagged [chi-square-test]

27 questions
2 votes, 1 answer

What is the best alternative to Fisher's exact test for contingency tables that are NOT 2x2?

I am a newbie to data mining. I am trying to find associations between two categorical variables. Since more than 20% of my expected frequencies are less than 5, I wanted to use Fisher's exact test, but it turns out it is generally used for contingency…
wilma297
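R's `fisher.test` does accept r × c tables, but exact computation can become slow as the table grows. A common alternative is a Monte Carlo p-value for the chi-square statistic, which is the approach behind R's `chisq.test(..., simulate.p.value = TRUE)`. The sketch below is a plain-Python illustration of that idea. The function names are hypothetical. It shuffles one variable's labels, which keeps both sets of margins fixed, rather than reproducing R's sampling algorithm exactly.

```python
import random
from collections import Counter


def chi2_stat(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            if expected > 0:  # skip empty rows/columns
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def monte_carlo_independence_test(table, n_sim=10_000, seed=0):
    """Return (observed chi-square, Monte Carlo p-value) for independence.

    Shuffling the column labels against the row labels keeps both margins
    fixed, so each shuffle yields a random table under independence.
    """
    rng = random.Random(seed)
    # Expand the table into one (row, column) label pair per observation.
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    observed = chi2_stat(table)
    n_rows, n_cols = len(table), len(table[0])
    at_least_as_extreme = 0
    for _ in range(n_sim):
        rng.shuffle(cols)
        counts = Counter(zip(rows, cols))
        sim = [[counts[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
        if chi2_stat(sim) >= observed - 1e-12:  # tolerance for float ties
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)
```

The add-one correction in the last line matches the form R uses for its simulated p-values, (1 + count) / (B + 1).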
2 votes, 1 answer

Interpreting the results of a Granger causality test

I am trying to use the Granger causality test (https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.grangercausalitytests.html) to assess whether "positivity score" affects value. Here is the code I am using: # Applying…
Darcey BM
2 votes, 2 answers

Does the chi-square test help assess a non-linear correlation between two categorical variables?

I have two categorical variables: sports level (1, 2, 3 and 4) and use of supplements (Yes and No). I tested whether they are independent with the χ² test, and their association was significant. I would like to know whether the chi-squared statistic in…
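The chi-square test of independence does not measure correlation, linear or otherwise. It tests whether the two variables are independent, and it ignores any ordering of the categories, so sports levels 1 to 4 are treated as unordered labels. If a measure of strength is wanted, Cramér's V rescales the statistic to the range 0 to 1. Here is a minimal sketch. It assumes the table is a list of rows (one per sports level) with one column per supplement answer (Yes, No) and no empty rows or columns. The function name is hypothetical.

```python
import math


def cramers_v(table):
    """Return (chi-square statistic, Cramér's V) for an r x c table of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    # V = sqrt(chi2 / (n * (k - 1))), where k is the smaller table dimension.
    return chi2, math.sqrt(chi2 / (n * (k - 1)))
```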
2 votes, 2 answers

Does a t-test require the sample standard deviation for its calculation?

This might be a novice question, but as I understand it, the main difference between a t-test and a z-test is that the z-test requires the population SD, whereas in a t-test we work with the SD of the sample itself when…
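Yes. A one-sample t statistic has the same form as the z statistic. The difference is that it uses the sample standard deviation s, computed with an n − 1 denominator, in place of the known population σ. Its p-value then comes from a t distribution with n − 1 degrees of freedom. The sketch below computes only the two statistics so the difference is visible. The function names are illustrative.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0, using the sample SD."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


def one_sample_z(sample, mu0, sigma):
    """z statistic: same form, but sigma is the known population SD."""
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
```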
1 vote, 0 answers

Should I remove features such as gender and birth month before drawing the heatmap because they are categorical?

I am working on a dataset that has both categorical and numerical (continuous and discrete) features (26 columns, 30244 rows). Target is categorical (1, 2, 3) and I am performing EDA on this dataset. The categorical features with numerical values…
1 vote, 0 answers

How to get the correlation between the categories of two categorical variables?

I have a categorical variable "Health" with 2 categories ('healthy', 'not_healthy') and another categorical variable "country" with 5 categories ("english", "eua", "Australia", "spain", "Germany"). I want to check if there is any relation…
bonaqua
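To see which pairs of categories drive an association, one option is to look at the per-cell Pearson residuals from the chi-square test of independence, (observed − expected) / √expected. Their squares sum to the chi-square statistic. Here is a minimal sketch. It assumes the counts are arranged with the "Health" categories as rows and the countries as columns. The function name and any counts passed to it are hypothetical.

```python
import math


def pearson_residuals(table, row_labels, col_labels):
    """Per-cell Pearson residuals (observed - expected) / sqrt(expected).

    Positive values: the pair of categories co-occurs more often than
    independence predicts; negative values: less often.
    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = {}
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n
            residuals[(row_labels[i], col_labels[j])] = (
                (table[i][j] - expected) / math.sqrt(expected)
            )
    return residuals
```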
1 vote, 0 answers

Chi-square goodness-of-fit test

I want to use a chi-square test, but I'm unsure whether I'm using it correctly. The KickStarter website shows the frequency of projects in each main category, updated once a day. I got a data set of KickStarter projects from 2009-2016. I wanted to filter the…
Laurent
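For a goodness-of-fit test, the statistic compares the observed count in each category with the count expected under the hypothesised proportions. Here those would presumably be the category shares shown on the website. The sketch below assumes both lists are in the same category order, and it rescales the expected proportions to the sample size. The p-value would then come from a chi-square distribution with k − 1 degrees of freedom, for example via `scipy.stats.chisquare(f_obs, f_exp)`. The function name is hypothetical.

```python
def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: counts per category.
    expected_props: hypothesised proportions (or shares) in the same order.
    """
    n = sum(observed)
    total = sum(expected_props)
    # Rescale so the expected counts sum to the observed sample size.
    expected = [n * p / total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1
```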
1 vote, 2 answers

Linear regression with a fixed intercept and everything is in log

I have a set of values for a surface (in pixels) that grows exponentially over time. The surface consists of cells that divide over time. After doing some modelling, I came up with the following…
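The model in the question is cut off. This sketch assumes exponential growth y = y0 · e^(b·t) with a known starting value y0. Taking logs gives the straight line log y = log y0 + b·t, with a fixed intercept. With the intercept fixed, minimising the squared error over b alone gives the closed form b = Σ tᵢ (log yᵢ − a) / Σ tᵢ². If time is also log-transformed, pass log t as `t`. The function name is hypothetical.

```python
import math


def slope_with_fixed_intercept(t, y, intercept):
    """Least-squares slope b for log(y) = intercept + b * t, intercept held fixed.

    Minimising sum((log(y_i) - intercept - b * t_i) ** 2) over b gives
    b = sum(t_i * (log(y_i) - intercept)) / sum(t_i ** 2).
    """
    z = [math.log(v) for v in y]
    num = sum(ti * (zi - intercept) for ti, zi in zip(t, z))
    den = sum(ti * ti for ti in t)
    return num / den
```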
1 vote, 0 answers

Using a Subset of Categories in a Categorical Column

I have an XGBoost model and I'm going to retrain it by adding new features. There is a column in my data about the customers' professions, and it has 60 categories. I suppose there is no need to convert them to dummy variables because tree…
1 vote, 2 answers

Can chi-square and ANOVA (f_classif) be used to select the best features?

I have a binary classification problem (target 0 or 1), and I have both continuous and categorical variables as features. I understand that with chi-square I can evaluate only categorical features. What about ANOVA (f_classif)? It's the…
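For reference, scikit-learn's `f_classif` computes a one-way ANOVA F statistic for each feature against the class labels, so it suits continuous features. `chi2` expects non-negative features such as counts or frequencies. The sketch below computes the F statistic for a single feature by hand to show what is being scored. It is an illustration, not scikit-learn's implementation, and the function name is hypothetical.

```python
def anova_f(feature, target):
    """One-way ANOVA F statistic of one continuous feature across target classes."""
    groups = {}
    for x, label in zip(feature, target):
        groups.setdefault(label, []).append(x)
    n, k = len(feature), len(groups)
    grand_mean = sum(feature) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Between-group variability: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    # Within-group variability: spread of values around their own class mean.
    ss_within = sum((x - means[label]) ** 2 for label, g in groups.items() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```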
1 vote, 1 answer

Why do I get this result with a chi-square test?

I have a question about the chi-squared independence test. I'm working on a dataset and I'm interested in finding the link between product category and gender. I plotted my contingency table (contingency_table). I found that the p-value…
1 vote, 1 answer

Low p-value in chi-squared test but low coefficient in logistic regression

I ran a chi-squared test on multiple features and also used these features to build a binary classifier using logistic regression. The feature with the lowest p-value (~0.1) had a low coefficient (=0), whereas the feature with a higher p-value…
user16584277
0 votes, 1 answer

Multiple hypotheses in Python

I want to write a method to test multiple hypotheses for a pair of schools (say TAMU and UT Austin). I want to consider all possible pairs of words (Research, Thesis, Proposal, AI, Analytics) and test the hypothesis that the word counts differ…
stacky
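With many pairwise tests, the p-values need a multiple-comparison correction. One common choice, assumed here, is the Holm-Bonferroni step-down procedure, which controls the family-wise error rate. The sketch enumerates the word pairs with `itertools.combinations` and applies Holm to a dict of p-values. How each p-value is computed (for example, a chi-square test on the two schools' word counts) is left out. The function names are hypothetical.

```python
from itertools import combinations


def word_pairs(words):
    """All unordered pairs of words: one hypothesis per pair."""
    return list(combinations(words, 2))


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    p_values: dict mapping hypothesis -> p-value.
    Returns the set of hypotheses rejected at family-wise error rate alpha.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for rank, (hypothesis, p) in enumerate(ordered):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p <= alpha / (m - rank):
            rejected.add(hypothesis)
        else:
            break  # once one test fails, all larger p-values are retained
    return rejected
```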
0 votes, 1 answer

Chi-Squared test: ok for selecting significant features?

I have a question about a contingency table and its results. I was running this analysis on names starting with symbols as a possible feature and got the following values (columns are Label 0.0 and 1.0): with_symb 1584, 241; without_symb …
0 votes, 1 answer

Chi-square test - how can I say if attributes are correlated?

I am trying out a course's theoretical content on this dataset. After data cleaning, I am trying to use a chi-square test. I wrote the following code: chisq.test(chocolate$CompanyMaker, chocolate$Rating, simulate.p.value =…
user96624