I am fitting mixture models to data and assessing how well mixtures with more or fewer components fit. To do this, I plot the empirical CDF of the data alongside the CDF of my mixture model with k components. As an example, here is the empirical CDF plotted beside the CDF of a mixture of lognormal distributions with 2 components.

[Figure: empirical CDF of the data plotted beside the CDF of a 2-component lognormal mixture]

My question is: how do I use scipy's kstest to determine the goodness of fit for the mixture model on the empirical data?

import scipy.stats as ss  # assuming ss is the scipy.stats alias
ss.kstest(Y, y_cdf)

Above is the code I tried, where Y is the data I used to fit the model and build the empirical CDF, and y_cdf is the CDF of the mixture model.

I am unsure whether this is correct, as the returned value of D seems quite high.
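One possible cause of a large D, offered as a guess rather than a diagnosis: in recent SciPy versions, kstest treats an array passed as its second argument as a second sample and runs a two-sample test, so an array of CDF values (all between 0 and 1) would be compared against the raw data. For a one-sample goodness-of-fit test, the second argument can instead be a callable that evaluates the model CDF. The sketch below assumes a 2-component lognormal mixture; the weights, sigmas, and scales values, the mixture_cdf helper, and the synthetic Y are all hypothetical stand-ins, not values from the question.

```python
import numpy as np
import scipy.stats as ss

# Hypothetical mixture parameters (placeholders, not fitted values).
weights = np.array([0.4, 0.6])   # mixing proportions, must sum to 1
sigmas = np.array([0.5, 0.3])    # lognormal shape parameter of each component
scales = np.array([1.0, 5.0])    # scale = exp(mu) of each component


def mixture_cdf(x):
    """Mixture CDF: weighted sum of the component lognormal CDFs."""
    x = np.asarray(x, dtype=float)
    parts = [
        w * ss.lognorm.cdf(x, s=s, scale=sc)
        for w, s, sc in zip(weights, sigmas, scales)
    ]
    return np.sum(parts, axis=0)


# Synthetic stand-in for Y, drawn from the same hypothetical mixture.
rng = np.random.default_rng(0)
n = 500
labels = rng.choice(2, size=n, p=weights)
Y = rng.lognormal(mean=np.log(scales[labels]), sigma=sigmas[labels])

# One-sample KS test: pass the callable, not an array of CDF values.
result = ss.kstest(Y, mixture_cdf)
print(result.statistic, result.pvalue)
```

Note that the p-value assumes the model was specified independently of the data. If the mixture parameters were estimated from Y itself, the reported p-value tends to be too large, so D is better read as a descriptive distance in that case.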

  • You should use ks_2samp instead. – Multivac Mar 10 '21 at 18:24
  • @JulioJesus This seems to be a goodness-of-fit test to a population CDF, not a comparison of two data sets. // Note, however, that a large sample size is all but guaranteed to result in a rejection, even if the deviation is of no practical concern. This is not a fault of the test or a type I error. It just means that the test is sensitive to differences below the threshold of mattering (and that threshold is subjective and depends on the context). – Dave Mar 11 '21 at 14:01
  • Hi, all I'm trying to do is find out how well the orange mixture model CDF fits the empirical data CDF. Is there an easier way to do this? – Eglantine46 Mar 12 '21 at 16:07
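For reading the plot directly, the KS statistic D is simply the largest vertical gap between the empirical CDF and the model CDF, so it can be computed by hand and compared with the two curves. The following is a minimal sketch under stated assumptions: ks_distance is a hypothetical helper, and a single lognormal with made-up parameters stands in for the mixture to keep the example short.

```python
import numpy as np
import scipy.stats as ss


def ks_distance(data, cdf):
    """Largest vertical gap between the empirical CDF of `data` and `cdf`."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    model = cdf(x)
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
    # so both sides of each jump are checked.
    d_plus = np.max(np.arange(1, n + 1) / n - model)
    d_minus = np.max(model - np.arange(0, n) / n)
    return max(d_plus, d_minus)


# Hypothetical stand-in data and model (not from the question).
rng = np.random.default_rng(1)
Y = rng.lognormal(mean=0.0, sigma=0.5, size=300)


def model_cdf(x):
    return ss.lognorm.cdf(x, s=0.5, scale=1.0)


d = ks_distance(Y, model_cdf)
print(d, ss.kstest(Y, model_cdf).statistic)
```

The two printed values should agree, which gives a quick check that kstest is being called with a callable CDF and is measuring the gap visible in the plot.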
