Questions tagged [goodness-of-fit]

7 questions
3
votes
1 answer

SAS Studio seems to imply that apparently non-normal data is normal

I have some data I'm trying to analyze in SAS Studio (University Edition). I am using the Distribution Analysis feature to test it for normality. It gives me the following histogram: Skewness is approximately 2.934 and kurtosis is…
2
votes
0 answers

Multi-dimensional Euclidean R^2 - reasonable?

I have a high-dimensional space, say $\mathbb{R}^{1000}$, and I have samples $y_1, \ldots , y_n \in \mathbb{R}^{1000}$ and $\hat{y}_1, \ldots , \hat{y}_n \in \mathbb{R}^{1000}$. Would $$ R^2 = 1 - \frac{\sum_i || y_i - \hat{y}_i||^2}{\sum_i || y_i -…
2
votes
1 answer

Why Should There Be Multiple Columns in Train Labels for One Model?

I am going through a notebook for the well-known Kaggle competition on Favorita sales forecasting. One puzzle: after the data is split into train and test sets, y_train seems to have two columns, unit_sales and transactions, both of which are being…
1
vote
1 answer

Does statsmodels compute R2 and other metrics on a validation or test set?

Does statsmodels compute R2 and other metrics on a validation set? I am using OLS from statsmodels.api; when printing the summary, an r2 and r2_adjusted are presented. I did not trust the reported 0.88 and computed my own adjusted R2 with scikit-learn…
0
votes
0 answers

SciPy kstest problem

I am fitting mixture models to data and assessing how well mixtures with more or fewer components fit the data. To do this, I am going to plot the CDF of the empirical data and the CDF of my mixture model with k components. As an example, here is a…
0
votes
0 answers

Interpreting chi-square statistic values and also scipy.stats.chisquare giving unreasonable value

I have some daily returns data from the S&P 500. I'm not sure if I can show my graph, as I will be using it in my undergraduate paper, but it looks essentially the same as the histogram here: Just like in the figure above, I am attempting to fit…
0
votes
1 answer

Goodness of fit on test or train set?

I split my data set into train (80%) and test (20%) sets and trained a logistic regression model on the train set. Now I want to check the goodness of fit using the chi-square likelihood omnibus test. Which data set should I apply it to: test or…