Can I conduct an independent t-test when the data has many outliers, and how do I interpret the t-statistic?

Question

I am working on a two-sample independent t-test. I have run the analysis on a test group vs. a control group and have to write a report, but I have a few questions.

Do we have to take out the outliers before performing the t-test?
Once I have performed the t-test, can anybody explain the output? The explanation should avoid statistical jargon so that a non-technical business person can also understand it. I need a simple explanation of the confidence interval and of the difference in means of the two samples.
What kind of charts can we draw to represent our results?

If your data is not normally distributed, then use a non-parametric test such as the Mann-Whitney U test: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test – dmb, Mar 24 '16 at 21:36

ABCD · Answer 1 · 2016-03-26T06:39:56.293

It's fine to run a t-test with unequal sample sizes; however, for the same total number of observations the power is lower than it would be with equal group sizes.
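To make the power remark concrete, here is a minimal sketch comparing the standard error of the difference in means for an unbalanced and a balanced split of the same 440 observations. The group sizes 40 and 400 are taken from the asker's comments; the common standard deviation of 1.0 and the helper name `se_diff` are assumptions made purely for illustration.

```python
import math


def se_diff(sd, n1, n2):
    """Standard error of the difference between two group means,
    assuming both groups share the same standard deviation sd."""
    return sd * math.sqrt(1 / n1 + 1 / n2)


# Same total of 440 observations, split two ways (sd = 1.0 is an assumption).
unbalanced = se_diff(1.0, 40, 400)
balanced = se_diff(1.0, 220, 220)

print(f"unbalanced 40/400 : SE = {unbalanced:.3f}")
print(f"balanced  220/220 : SE = {balanced:.3f}")
```

A larger standard error means a wider confidence interval and a smaller t-statistic for the same true difference, which is what lower power looks like in practice.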

1) Yes or no; it's impossible to say without plotting the outliers. More importantly, can you assume your data are normally distributed? Have you checked a Q-Q plot and a histogram? Do they look close to a normal distribution? While the t-test is robust to non-normal data as long as the sample size is sufficiently large, your data shouldn't stray too far from normal.

When you think about outliers, ask yourself the following questions:

How many outliers are there? If you have many, a t-test is probably not appropriate.
Why are the outliers there? If they come from random error (you were just unlucky), you can include them in the t-test. If they come from a systematic error, stop the test, go back and check your data.
How do you define an outlier?
Do the outliers look symmetric? If so, you might assume your sample comes from a normal population. You can check the skewness of your data.

You have to understand those outliers to come up with a decision.
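As a rough sketch of two of the checks above, the snippet below flags outliers with Tukey's 1.5 × IQR rule (one common definition, and the one box plots usually use) and computes a moment-based skewness. The rule, the function names and the sample numbers are assumptions for illustration, not something prescribed in this answer.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences
    (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def sample_skewness(values):
    """Moment-based skewness: about 0 for symmetric data,
    positive when the long tail is on the right."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up data with one suspiciously large value.
data = [10, 11, 12, 12, 13, 13, 14, 15, 16, 48]

print("flagged as outliers:", tukey_outliers(data))
print("skewness:", round(sample_skewness(data), 2))
```

Counting the flagged values answers "how many", and a skewness far from zero suggests the outliers sit mostly on one side rather than symmetrically.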

2) You can simply say "the difference in means is (or isn't) statistically significant".

3) You should draw a box plot for each group.

Thank you so much. I conducted non-parametric tests because the data was not very normal and had many outliers. I tried removing outliers to get a normal distribution, but I was losing data, so I carried out a non-parametric test instead. Regarding the box plot: what can I explain about each one? What are the important points to list when explaining box plots? – pinky, Mar 26 '16 at 18:04

score 0 · Answer 2 · answered Mar 24 '16 at 20:37

1) Maybe. Remember that you are assuming normal distributions; if your data do not satisfy that assumption, you are not running a valid test.

2) You are testing whether or not the difference in means is zero, i.e. whether zero (no difference) lies inside the confidence interval for the difference.

3) Bar charts are the easiest to understand because you can see the difference directly. Box plots provide more information but are better suited to a technical audience.
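The "is zero inside the interval?" reading can be sketched as follows. This computes Welch's t-statistic and an approximate 95% confidence interval for the difference in means. Two things here are assumptions: the data are randomly generated stand-ins for a 40-observation test group and a 400-observation control group, and the critical value 1.96 is the large-sample normal approximation (a t-based critical value would be slightly larger for a group of 40).

```python
import math
import random
import statistics


def welch_summary(test, control, z=1.96):
    """Difference in means, Welch t-statistic and an approximate
    95% confidence interval (normal critical value z, an approximation)."""
    diff = statistics.fmean(test) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(test) / len(test)
                   + statistics.variance(control) / len(control))
    t_stat = diff / se
    ci = (diff - z * se, diff + z * se)
    return diff, t_stat, ci


# Made-up data: not the asker's real measurements.
rng = random.Random(0)
test = [rng.gauss(52, 5) for _ in range(40)]
control = [rng.gauss(50, 5) for _ in range(400)]

diff, t_stat, (low, high) = welch_summary(test, control)
print(f"difference in means: {diff:.2f}, t = {t_stat:.2f}")
print(f"approx. 95% CI: ({low:.2f}, {high:.2f})")

if low <= 0 <= high:
    print("Zero is inside the interval: no clear evidence of a difference.")
else:
    print("Zero is outside the interval: the groups likely differ.")
```

For a non-technical reader, the interval can be described as the range of plausible values for the true difference; if that range includes zero, "no difference" remains a plausible explanation.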

Ryan

I have just 40 observations in one sample and 400 in the other. There are many outliers in the first sample. If I remove them, I will have very little data, so I ran the t-test on the original dataset. 2) If box plots have to be explained to non-technical people, what should we tell them? 3) Can you suggest a better way to represent confidence intervals? – pinky Mar 24 '16 at 21:21
For this particular study, I have 6 samples of data: 3 test and 3 control. The test groups have very little data (40 observations each), but the control groups have around 400 observations each. I have run a t-test for each test group against its respective control group. Now I need to show the confidence intervals from these t-tests in a chart in R. How can I do that? – pinky Mar 24 '16 at 22:47
What is the difference between the 3 test (/treatment) sets? – TBSRounder Mar 25 '16 at 14:02
You can add error bars to your bar chart to stand in for your confidence intervals. – Ryan Mar 25 '16 at 14:42
