Researchers usually present the baseline characteristics of the participants in each group in a table. This table is often the first table in a paper and is hence called Table 1. It gives the reader an overview of the study participants and helps the reader judge whether the participants are similar to the patients he or she encounters. The reader can also use the information in the table to judge whether the participants in the two groups were comparable. Sometimes the two groups differ with respect to relevant demographic and clinical characteristics. In that case, it is important to adjust for these differences in further analyses and to take them into account when interpreting the results of the study.
I am analyzing data from a small randomized controlled clinical trial with two arms. Should I test whether there are differences in baseline characteristics between the two groups and present the p-values in Table 1 of my manuscript? What do the results of these tests mean for further analysis that I carry out?
In a clinical trial with two independent arms (groups of patients), differences between the two groups can be tested using a t-test for normally distributed continuous variables. When using such tests, the null hypothesis is usually that there is no difference between the two groups of patients. Statistical software usually reports a p-value for these tests. A p-value denotes the probability of observing a difference between the two groups at least as extreme as the one actually observed, if the null hypothesis were true. In clinical research, a p-value less than 0.05 is conventionally viewed as statistically significant.
The probability of observing a p-value less than 0.05 when a difference of a given size truly exists is known as the power of a study. Researchers performing clinical studies often aim for a minimum power of 0.8. The power of a study depends on multiple factors. Three important factors are: 1) the magnitude of the difference in the variable of interest between the arms of the study; 2) the amount of variation, often expressed as the standard deviation, in the population being examined; and 3) the number of subjects in each of the arms. Here we are particularly interested in the effect of the third factor.
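The influence of these three factors can be illustrated with a minimal sketch. It assumes a two-sided test at the 5% level, equal numbers of subjects per arm and a common standard deviation, and it uses a normal approximation to the t-test rather than the exact calculation, so the values are slightly optimistic for very small arms. The function name `approx_power` and the input values are illustrative assumptions, not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(diff, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Assumption: normal approximation to the t-test, equal arm sizes,
    common standard deviation `sd` in both arms.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    # Expected size of the test statistic if the true difference is `diff`
    shift = abs(diff) / (sd * sqrt(2 / n_per_arm))
    # Probability that the statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical baseline scenario: difference 5, standard deviation 10, 30 per arm
base = approx_power(diff=5, sd=10, n_per_arm=30)
print(f"baseline:            {base:.2f}")
print(f"larger difference:   {approx_power(8, 10, 30):.2f}")   # power goes up
print(f"more variation:      {approx_power(5, 15, 30):.2f}")   # power goes down
print(f"more subjects/arm:   {approx_power(5, 10, 60):.2f}")   # power goes up
```

When the true difference is zero, the function returns the significance level itself (0.05), which is the probability of a false-positive result.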
Imagine that you have performed a randomized controlled clinical trial with a placebo arm and a treatment arm. There are equal numbers of 18-year-old Dutch men in each arm. One of the baseline characteristics you measured was height in centimeters. Consider four scenarios: the mean height was 182.6cm in the placebo arm and 3cm, 4cm, 5cm or 6cm less in the treatment arm. The standard deviation of height was 7.2cm in both groups. After you have verified that the height of your subjects follows a normal distribution, you can test whether there is a statistically significant difference in height between the two groups using a t-test.
The relationship between the number of men in each arm and the power to detect each of the differences is presented in Figure 1. The power increases as the number of men in each arm increases. The power to detect a difference in height between the arms is at least 0.8 if there are at least 92 men in each arm for a difference of 3cm, at least 52 men in each arm for a difference of 4cm, at least 34 men in each arm for a difference of 5cm and at least 24 men in each arm for a difference of 6cm. This means that even if the actual difference in height between the two arms is substantial, say 6cm, the probability that the p-value of the statistical test will be greater than 0.05 is more than 0.2 if there are fewer than 24 men in each arm.
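The numbers of men per arm quoted above can be approximated with a short calculation. The sketch below is not necessarily the method used to produce Figure 1: it assumes a two-sided test at the 5% level, uses the standard normal-approximation sample size formula, and adds a small-sample correction term (z²/4 per arm, often attributed to Guenther) to account for the t-distribution. The function name `n_per_arm` is an illustrative assumption.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(diff, sd, power=0.8, alpha=0.05):
    """Approximate number of subjects per arm for a two-sample t-test.

    Assumption: normal-approximation formula plus a z^2/4 correction
    for the t-distribution; equal arm sizes and a common `sd`.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired power
    effect = diff / sd                  # standardized difference
    n = 2 * (z_alpha + z_beta) ** 2 / effect ** 2 + z_alpha ** 2 / 4
    return ceil(n)


# Height example from the text: standard deviation 7.2cm, differences of 3 to 6cm
for diff_cm in (3, 4, 5, 6):
    print(f"difference {diff_cm}cm: {n_per_arm(diff_cm, 7.2)} men per arm")
```

Under these assumptions the calculation yields 92, 52, 34 and 24 men per arm for differences of 3cm, 4cm, 5cm and 6cm, in line with the values given in the text.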
Hence, you can test whether there are differences in baseline characteristics between the two groups and present the p-values in Table 1 of your manuscript. However, you should be aware that even if a p-value is greater than 0.05, there may still be clinically relevant differences between the participants in the two arms of a small trial. Depending on the aims of your study, it may be important to adjust for these differences in further analyses, even if the differences are not statistically significant.
Figure 1: The probability (or power) of detecting the difference between the two arms as statistically significant at the 5% level for two to 100 men per arm
Clinical Research Unit and Department of Medical Informatics, Academic Medical Center, Amsterdam, The Netherlands.
- Petrie A, Sabin C. Medical statistics at a glance. Third edition. Wiley-Blackwell. 2009. ISBN 978-1-4051-8051-1.
- Fredriks AM, van Buuren S, Burgmeijer RJ, Meulmeester JF, Beuker RJ, Brugman E, Roede MJ, Verloove-Vanhorick SP, Wit JM. Continuing positive secular growth change in The Netherlands 1955-1997. Pediatr Res. 2000 Mar;47(3):316-23.