I need to run a chi-square test on two categorical variables in my dataset. These two variables have basically the same meaning but come from two different sources, so my idea is to use a chi-square test to see how "similar", or correlated, they really are. To do so, I've written some code in Python, but the p-value I get from it is exactly 0, which seems a little strange to me.

The code is:

from scipy.stats import chi2_contingency
import pandas as pd

df = pd.read_csv('data/data_understanding_output.csv')

cont = pd.crosstab(df['sentiment'], df['valence_cat'])
c,p,dof,ex = chi2_contingency(cont)

My contingency table is:

          Class 0  Class 1  Class 2
Class 0       315       37        2
Class 1       665     2661      665
Class 2         3       49      285

When I print my results like this, I get:

print(f"{c}\n{p}\n{dof}\n{ex}")

1954.0385481800377
0.0
4
[[  74.32336608  207.69713798   71.97949594]
 [ 837.92246903 2341.57988039  811.49765058]
 [  70.75416489  197.72298163   68.52285348]]

So my question is: did I do anything wrong? Is it normal to get a p-value that is exactly zero?

Subhash C. Davar

3 Answers


Your results are based on a cross-tabulation of three categories. You have a single variable with three categories. Your contingency table should be a one-way tabulation. Rewrite your contingency table and then compute the p-value. It is unlikely to be close to zero.

Subhash C. Davar

Is it normal to have p equal to absolute zero?

I don't know about "normal", but it is completely possible, and in your case it makes sense. Your observed frequencies differ vastly from the expected ones across the classes, so under the null hypothesis of independence this result would be extremely unusual.

I'll repeat this test in R:

ct=rbind(
  c(315,37,2),
  c(665,2661,665),
  c(3,49,285)
)

chisq.test(ct)

    Pearson's Chi-squared test

data:  ct
X-squared = 1954, df = 4, p-value < 2.2e-16

Same result: a p-value of practically 0.

Note: the chi-square test has some assumptions. One of them, as a rule of thumb, is:

  • No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater
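
As a rough illustration, the rule of thumb above can be checked directly on the table from the question. This is only a sketch: the variable names (`observed`, `share_below_5`, `rule_ok`) are made up for the example, and the expected counts are recomputed from the row and column totals rather than taken from `chi2_contingency`.

```python
# Observed counts from the question (rows: sentiment, columns: valence_cat).
observed = [
    [315, 37, 2],
    [665, 2661, 665],
    [3, 49, 285],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

cells = [e for row in expected for e in row]
share_below_5 = sum(e < 5 for e in cells) / len(cells)
min_expected = min(cells)

# Rule of thumb: at most 20% of expected counts below 5, and none below 1.
rule_ok = share_below_5 <= 0.20 and min_expected >= 1

print(f"share of expected counts below 5: {share_below_5:.0%}")
print(f"smallest expected count: {min_expected:.2f}")
print(f"rule of thumb satisfied: {rule_ok}")
```

For this table every expected count is well above 5, so the rule of thumb is not a concern here.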
user2974951
  • So you are basically saying that this result is to be expected, since the distribution of the frequencies across the various classes is very different from the expected one, which is the one I've printed. And from that, can I "safely" assume that the two variables are, in fact, correlated? – Michele Papucci Jan 18 '22 at 12:29
  • @MichelePapucci Yes to the first part, no to the second. You cannot claim anything about correlation from this test; that's a different subject. A chi-square test only tests for differences in the distributions. – user2974951 Jan 18 '22 at 12:51
  • OK, so a more precise thing I could say is that I can safely reject the null hypothesis that the distributions of the two variables are similar only by chance? – Michele Papucci Jan 18 '22 at 13:02
  • @MichelePapucci You can safely reject the null hypothesis that "the two categorical variables are independent" or "there is no relationship between the categorical variables". – user2974951 Jan 18 '22 at 13:24

A p-value of exactly 0 is theoretically possible but very rare in practice. Any data collected for a study are subject to error, at least from chance (random) causes, so a real data set will essentially never produce a p-value of exactly 0. The p-value can, however, be very small in some cases.
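
To see how small "very small" is here, the p-value can be computed on a log scale instead of as a plain float. The sketch below is my own illustration, not part of the original code: it uses the closed form of the chi-square survival function for an even number of degrees of freedom, sf(x) = exp(-x/2) · Σ_{j<dof/2} (x/2)^j / j!, and the helper name `chi2_log_sf_even_df` is hypothetical.

```python
import math

def chi2_log_sf_even_df(x, dof):
    """Natural log of P(X >= x) for a chi-square variable with even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only holds for even, positive dof")
    half = x / 2.0
    # sf(x) = exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
    series = sum(half ** j / math.factorial(j) for j in range(dof // 2))
    return -half + math.log(series)

chi2_stat = 1954.0385481800377  # statistic reported in the question
dof = 4

log_p = chi2_log_sf_even_df(chi2_stat, dof)
log10_p = log_p / math.log(10)

print(f"log10(p) = {log10_p:.2f}")
# Converting back to a float underflows, which is why 0.0 gets printed.
print(math.exp(log_p))
```

The exponent comes out far below what a double-precision float can represent (the smallest positive double is about 5e-324), so the printed `0.0` is a rounding artifact of an extremely small but nonzero p-value.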

Let's look at the interpretation. The p-value is the probability of getting an outcome as extreme as, or more extreme than, the observed outcome, ASSUMING THE NULL HYPOTHESIS IS TRUE. If the p-value is small, this weighs against the null hypothesis, because it says that the observed outcome would be quite rare, and therefore unlikely, if the null hypothesis held. A large p-value gives no evidence against the null hypothesis, because it says that the observed outcome is pretty much what the null hypothesis said you would see.

So in your case, the very small p-value indicates a strong reason to reject the null hypothesis.

Ashwiniku918