As a junior researcher I noticed that there are different opinions on when to choose nonparametric tests (like the Mann-Whitney or Kruskal-Wallis test) over parametric tests (like the independent samples t-test or ANOVA). Most researchers know that this decision should be made based on the distribution of the data: parametric tests for normally distributed outcomes, nonparametric tests for non-normal data. Therefore, every beginner's course on statistics discusses different ways to assess normality (histograms, QQ-plots, the Kolmogorov-Smirnov test, and the Shapiro-Wilk test).
In basic research I often see parametric tests used in groups that are smaller than 10, or even smaller than 5. How to test normality of the outcomes would be an interesting question on its own; however, in my opinion it is strange to consider data from such small samples as normally distributed, regardless of the outcome of such tests. What is your opinion on this topic?
Before I answer this question, I first have to declare a potential ‘conflict of interest’. Why? Well, I obtained my PhD in nonparametric statistics, and as a consequence my education might bias my answer a bit.
The biggest advantage of nonparametric statistics is that nothing is assumed about the shape of the underlying (true but unknown) distribution of the outcome measure. Note that normality is assumed at the population level and not at the sample level! The lack of such assumptions makes nonparametric methods more widely applicable than the corresponding parametric methods. Moreover, they are more robust and may be easier to use. But there is also some statistical theory (the central limit theorem) that tells us that we may assume normality, even if the data are a little bit skewed, as long as our sample is large enough. Why do we need this theory and use parametric methods, if nonparametric tests might be preferred (at least by me)? Well, when the assumptions of the parametric test hold, nonparametric tests have less power than parametric tests, i.e. the probability of rejecting the null hypothesis when this hypothesis is indeed false is larger for parametric tests than for nonparametric tests. The higher the power of a test, the better it is, and hence the more it is to be preferred.
But what is large enough for this theory to be valid, and to use parametric tests over nonparametric tests? Well, for sure, a sample of size 5 or 10 is not large enough! Actually, I doubt a sample of size 5 or 10 is a preferable sample size anyway, and not just because of the normality of the data. But in many situations it is too expensive, too unethical or too difficult (e.g. rare disease) to obtain larger samples, and you have to deal with these small samples. In that case, my advice would be to use the nonparametric test instead of the parametric test.
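To make the role of sample size more concrete, here is a minimal simulation sketch in Python (standard library only). It is an illustration and not part of the original answer: it assumes a skewed, exponentially distributed outcome and checks how skewed the distribution of the sample mean still is for samples of size 5, 10 and 50. The function names, the choice of distribution, the number of repetitions and the seed are all assumptions made for this example.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean of the cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def skewness_of_sample_means(sample_size, repetitions=10000, seed=1):
    """Draw many samples from a skewed (exponential) outcome and
    return the skewness of the resulting sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return skewness(means)


# Assumed sample sizes, chosen to match the sizes discussed in the text.
skew_by_n = {n: skewness_of_sample_means(n) for n in (5, 10, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample mean = {skew:.2f}")
```

A skewness near 0 indicates a roughly symmetric, bell-like shape. For this particular outcome the theoretical skewness of the sample mean is 2 divided by the square root of n, so it shrinks only slowly as the sample grows. Other outcome distributions will behave differently, so this should be read as an illustration and not as a rule.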
In search of a good rule of thumb to quantify ‘large enough’ to use parametric tests over nonparametric tests, I unfortunately did not find any good reference. Some say 20 is large enough, whereas others say that you need at least 50. If your sample is smaller than 50, my (pragmatic) advice would be to perform both the nonparametric and the parametric test. Hopefully, these tests will yield the same conclusion. If not, some more advanced statistical methods may be applied (of course in consultation with a statistician).
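The pragmatic advice to run both tests could look roughly as follows in Python. This is a sketch under assumptions, not a prescribed workflow: it assumes SciPy is installed, it uses `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu` as the parametric and nonparametric test for two independent groups, and the helper name `compare_tests`, the significance level of 0.05 and the measurements are made up for illustration.

```python
from scipy import stats


def compare_tests(group_a, group_b, alpha=0.05):
    """Run a parametric and a nonparametric two-sample test on the same data
    and report whether they lead to the same conclusion at level alpha."""
    # Parametric: independent samples t-test.
    t_result = stats.ttest_ind(group_a, group_b)
    # Nonparametric: two-sided Mann-Whitney test.
    u_result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

    t_rejects = bool(t_result.pvalue < alpha)
    u_rejects = bool(u_result.pvalue < alpha)
    return {
        "t_pvalue": float(t_result.pvalue),
        "mann_whitney_pvalue": float(u_result.pvalue),
        "agree": t_rejects == u_rejects,
    }


# Hypothetical measurements from two small groups (made-up numbers).
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9, 5.1, 4.6]
treated = [6.2, 7.1, 6.8, 7.5, 6.4, 6.9, 7.3, 6.6]

result = compare_tests(control, treated)
print(result)
```

If `agree` is false, the two tests point in different directions, which is the situation where a statistician should be consulted.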
Statistical analysis highlighted
Both normal data (whose histogram is shaped like an old clock or ‘bell’) and non-normal data are rather common in medical science. Typical examples of normal outcomes are age, BMI, and cholesterol level, whereas typical examples of non-normal outcomes are triglyceride level, time since treatment, pack or unit years, and number of swollen joints.
Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
Question by Wessel Fuijkschot, Department of Internal Medicine, VU University Medical Center, The Netherlands