Scipy kstest problem

Asked Mar 10 '21 at 15:16

Active Mar 11 '21 at 12:22

Viewed 63 times

I am fitting mixture models to data and assessing how well mixtures with more or fewer components fit the data. To do this, I plot the CDF of the empirical data alongside the CDF of my mixture model with k components. As an example, here is the CDF of the empirical data plotted beside a mixture of lognormal distributions with 2 components.

My question is: how do I use SciPy's kstest to assess the goodness of fit of the mixture model to the empirical data?

import scipy.stats as ss
ss.kstest(Y, y_cdf)

Above is the code I tried, where Y is the data I used both to fit the model and to build the empirical CDF, and y_cdf is the CDF of the mixture model.
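For the one-sample test, scipy.stats.kstest(rvs, cdf) expects cdf to be a callable (or the name of a SciPy distribution). If an array is passed instead, SciPy treats it as a second sample and runs a two-sample test. So if y_cdf holds CDF values rather than a function, D can come out far larger than the fit deserves. Below is a minimal sketch, assuming a 2-component lognormal mixture with made-up weights, shapes, and scales (all hypothetical), and simulated data standing in for Y.

```python
import numpy as np
from scipy import stats

# Hypothetical 2-component lognormal mixture parameters (illustration only)
weights = np.array([0.4, 0.6])
sigmas = np.array([0.5, 0.3])   # lognormal shape parameter s
scales = np.array([1.0, 5.0])   # scale = exp(mu)

def mixture_cdf(x):
    """CDF of the mixture: sum over components of w_k * F_k(x)."""
    x = np.asarray(x, dtype=float)
    return sum(w * stats.lognorm.cdf(x, s, scale=sc)
               for w, s, sc in zip(weights, sigmas, scales))

# Simulated stand-in for the observed data Y: pick a component, then draw from it
rng = np.random.default_rng(0)
n = 500
comp = rng.choice(len(weights), size=n, p=weights)
Y = stats.lognorm.rvs(sigmas[comp], scale=scales[comp], size=n, random_state=rng)

# One-sample KS test: pass the CDF as a function, not an array of CDF values
result = stats.kstest(Y, mixture_cdf)
print(result.statistic, result.pvalue)
```

One caveat: the KS p-value assumes the model is fully specified in advance. When the mixture parameters are estimated from the same Y, the reported p-value tends to be too large (the test is conservative), so it is better treated as a rough guide than an exact probability.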

I am unsure whether this is correct, because the D statistic it returns seems quite high.
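The KS statistic D is the largest vertical gap between the empirical CDF and the model CDF, so it always lies between 0 and 1. Computing it by hand is one way to sanity-check the number kstest reports and to see where the gap occurs. This sketch uses a hypothetical single lognormal as the model and simulated data; ks_statistic is an illustrative helper, not a SciPy function.

```python
import numpy as np
from scipy import stats

def ks_statistic(data, cdf):
    """Return (D, x) where D is the largest gap between the ECDF of data and cdf,
    and x is the data point at which that gap occurs."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    F = cdf(x)
    ecdf_hi = np.arange(1, n + 1) / n   # ECDF value just after each point
    ecdf_lo = np.arange(0, n) / n       # ECDF value just before each point
    gaps = np.maximum(ecdf_hi - F, F - ecdf_lo)
    i = np.argmax(gaps)
    return gaps[i], x[i]

# Hypothetical data and model
rng = np.random.default_rng(1)
Y = stats.lognorm.rvs(0.5, scale=2.0, size=300, random_state=rng)
model_cdf = lambda x: stats.lognorm.cdf(x, 0.5, scale=2.0)

D_manual, x_worst = ks_statistic(Y, model_cdf)
D_scipy = stats.kstest(Y, model_cdf).statistic
print(D_manual, D_scipy, x_worst)
```

Whether D is "high" depends on the sample size: for large n and a fully specified model, the 5% critical value is roughly 1.36 / sqrt(n), which for n = 300 is about 0.079.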

Edited Mar 11 '21 at 12:22 by desertnaut

Asked Mar 10 '21 at 15:16 by Eglantine46

You should use ks_2samp instead. – Multivac Mar 10 '21 at 18:24
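scipy.stats.ks_2samp compares two samples, so applying it here would mean drawing a sample from the fitted model and comparing Y against it. With a large simulated sample, the two-sample D approaches the one-sample D against the model CDF, so when the model CDF can be evaluated directly, kstest with a callable is the more direct route. A sketch, assuming a hypothetical fitted model that is simplified to a single lognormal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical observed data and a hypothetical fitted model
Y = stats.lognorm.rvs(0.5, scale=2.0, size=400, random_state=rng)
model = stats.lognorm(0.45, scale=2.1)

# One-sample test against the model CDF
one = stats.kstest(Y, model.cdf)

# Two-sample test against a large sample simulated from the same model
sim = model.rvs(size=200_000, random_state=rng)
two = stats.ks_2samp(Y, sim)

print(one.statistic, two.statistic)
```

The two statistics differ by at most the gap between the simulated sample's ECDF and the true model CDF, which shrinks as the simulated sample grows.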

@JulioJesus This seems to be a goodness-of-fit test to a population CDF, not a comparison of two data sets. // Note, however, that a large sample size is all but guaranteed to result in a rejection, even if the deviation is of no practical concern. This is not a fault of the test or a type I error. It just means that the test is sensitive to differences below the threshold of mattering (and that threshold is subjective and depends on the context). – Dave Mar 11 '21 at 14:01
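The sample-size point can be seen in a small simulation. This sketch assumes a standard normal model and data whose mean is shifted by a small, hypothetical amount (0.05). The shift itself never changes, but the test goes from not rejecting to rejecting decisively as n grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Model: standard normal. Data: normal with a small shift that may be practically negligible.
model_cdf = stats.norm(loc=0.0, scale=1.0).cdf
shift = 0.05

results = {}
for n in (100, 1_000, 100_000):
    data = rng.normal(loc=shift, scale=1.0, size=n)
    results[n] = stats.kstest(data, model_cdf)
    print(f"n={n:>7}  D={results[n].statistic:.4f}  p={results[n].pvalue:.3g}")
```

As n grows, D settles near the true maximum CDF gap (about 0.02 for this shift) while the p-value falls toward zero. That is why D itself, read as a distance between the CDFs, can be more informative than the p-value when comparing fits.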
Hi all, all I'm trying to do is find out how well the orange mixture-model CDF fits the empirical-data CDF. Is there an easier way to do this? – Eglantine46 Mar 12 '21 at 16:07


0 Answers