Review
Size of Treatment Effects and Their Importance to Clinical Research and Practice
Statistical and Clinical Significance, Power, and Meta-Analysis
As statistical hypothesis testing is typically performed, a “statistically significant” result with p < .05 means that the data indicate that something nonrandom is going on. When p < .01, the evidence is more convincing, and p = 10⁻⁶ very convincing indeed. However, the p value is a comment on how convincing the data are against the null hypothesis of randomness; the conclusion is always “something nonrandom is going on.” Such a conclusion gives no clue as to the size or importance of the
Cohen’s d
When an RCT outcome measure is scaled, the most common effect size is Cohen’s d (Cooper and Hedges 1994, Hedges and Olkin 1985), the difference between the T and C group means, divided by the within-group standard deviation. This effect size was designed for the situation in which the responses in T and C have normal distributions with equal standard deviations.
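As a minimal sketch of the definition above, the following computes Cohen's d from two samples. The function name `cohens_d`, the use of the pooled standard deviation as the "within-group standard deviation," and the sample scores are illustrative assumptions, not taken from the article.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference between group means divided by the pooled within-group SD.

    Assumes (as the effect size itself does) roughly normal responses with
    equal standard deviations in the two groups.
    """
    n_t, n_c = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical outcome scores for illustration only.
t_scores = [12.0, 15.0, 11.0, 14.0, 13.0]
c_scores = [10.0, 12.0, 9.0, 11.0, 13.0]
d = cohens_d(t_scores, c_scores)
print(f"Cohen's d = {d:.3f}")
```

With these made-up scores, the group means differ by 2 points and both groups have a sample variance of 2.5, so d is roughly 1.26.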
The population parameter estimated by Cohen’s d ranges across the real line, with zero indicating no difference between T and C,
Number Needed to Treat
The effect size that seems to best reflect clinical significance is one proposed in the context of evidence-based medicine for binary (success/failure) outcomes: number needed to treat, or NNT (Altman and Andersen 1999, Cook and Sackett 1995). Number needed to treat is defined as the number of patients one would expect to treat with T to have one more success (or one fewer failure) than if the same number were treated with C. For a binary outcome (success/failure), the success rate difference (SRD) is defined as
Confidence Intervals and Effect Sizes
In every report of an RCT, we recommend that each p value be accompanied by NNT (for interpretability) and SRD with its standard error and confidence interval (for computations). The difficulty is that the correct computation of the confidence interval and the standard error of SRD depends on the distribution of the data underlying that effect size.
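For the simplest case, a binary outcome, the recommended quantities can be sketched as follows. This assumes the standard definitions (SRD as the difference between the T and C success proportions, NNT as its reciprocal) and uses a large-sample Wald interval for SRD; the function name `srd_summary` and the counts are hypothetical, and other data distributions would need a different standard error.

```python
import math


def srd_summary(success_t, n_t, success_c, n_c, z=1.96):
    """SRD, its Wald standard error and confidence interval, and NNT.

    Assumption: binary outcome, independent groups, samples large enough
    for the normal approximation to hold.
    """
    p_t = success_t / n_t
    p_c = success_c / n_c
    srd = p_t - p_c
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    ci = (srd - z * se, srd + z * se)
    # NNT is undefined when there is no difference; report infinity.
    nnt = 1 / srd if srd != 0 else math.inf
    return {"srd": srd, "se": se, "ci": ci, "nnt": nnt}


# Hypothetical trial: 60/100 successes with T, 40/100 with C.
result = srd_summary(60, 100, 40, 100)
print(result)
```

With these illustrative counts, SRD is 0.2 and NNT is about 5: one would expect to treat five patients with T to see one more success than with C. A negative SRD yields a negative NNT, indicating that C outperformed T.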
In those circumstances in which Cohen’s d is appropriate (normal distributions, equal variances), the exact distribution of Cohen’s d is known (
Discussion: The Threshold of Clinical Significance
To summarize, we propose that for any RCT, along with reporting the p value comparing T with C, researchers report NNT and SRD, as well as the standard error and a confidence interval for SRD. If effect sizes were so reported, they could then be used to facilitate consideration of what the threshold of clinical significance might be for design of subsequent related studies.
Here we have attempted to take the first major step, recommending an effect size that is clinically interpretable and
References (45)
- The case for confidence intervals in controlled clinical trials. Control Clin Trials (1994)
- Hypothesis testing and effect size estimation in clinical trials. Ann Allergy Asthma Immunol (1997)
- Tightening the clinical trial: Nonrelevancy of power calculations after the fact (Appendix 1). Control Clin Trials (1993)
- Acion L, Peterson JJ, Temple S, Arndt S (in press): Probabilistic index: an intuitive non-parametric approach to...
- et al. Calculating the number needed to treat for trials where the outcome is time to an event. Br Med J (1999)
- The shift from significance testing to effect size estimation
- Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol Bull (1993)
- The cost of dichotomization. Appl Psychol Measurement (1983)
- Statistical Power Analysis for the Behavioral Sciences (1988)
- The earth is round (p<.05). Am Psychol (1995)