Almost all studies are prone to error because they use samples drawn from a population to estimate what is occurring, or what might occur, in the whole population. These errors can broadly be divided into two types: random error and systematic error. Random error is the play of chance and results in an estimate of effect (for example, relative risk) being equally likely to be above or below the true value. Its role is assessed with statistical measures such as p values and confidence intervals. Systematic error is called bias, and leads to the estimate being consistently above or consistently below the true value. Systematic error can be further divided into information bias, which relates to the misclassification of data, and selection bias, which is the focus of this article.
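The distinction between random and systematic error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it estimates a simple exposure prevalence rather than a relative risk, and the true prevalence, sample size, number of simulated studies and selection probability are all hypothetical values chosen for the example.

```python
import random


def sample_proportion(rng, true_prevalence, n, p_select_unexposed=1.0):
    """Proportion exposed among n recruited people.

    Exposed people are always recruited; unexposed people are recruited
    with probability p_select_unexposed (1.0 means no selection bias).
    """
    exposed_count = 0
    recruited = 0
    while recruited < n:
        is_exposed = rng.random() < true_prevalence
        if is_exposed or rng.random() < p_select_unexposed:
            recruited += 1
            exposed_count += is_exposed
    return exposed_count / n


def mean_estimate(seed, true_prevalence, n, n_studies, p_select_unexposed=1.0):
    """Average estimate over many simulated studies of the same size."""
    rng = random.Random(seed)
    estimates = [
        sample_proportion(rng, true_prevalence, n, p_select_unexposed)
        for _ in range(n_studies)
    ]
    return sum(estimates) / n_studies


TRUE_PREVALENCE = 0.30  # hypothetical true value in the population

# Random error only: single studies scatter, but their average sits near 0.30.
unbiased_mean = mean_estimate(1, TRUE_PREVALENCE, n=100, n_studies=500)

# Selection bias: unexposed people are half as likely to be recruited,
# so the estimates are shifted upwards however many studies are run.
biased_mean = mean_estimate(
    1, TRUE_PREVALENCE, n=100, n_studies=500, p_select_unexposed=0.5
)

print(f"true value: {TRUE_PREVALENCE:.2f}")
print(f"mean estimate, random error only: {unbiased_mean:.3f}")
print(f"mean estimate, with selection bias: {biased_mean:.3f}")
```

Under these assumptions, repeating or enlarging the study narrows the scatter caused by chance but leaves the shift caused by biased selection in place.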
Bias
Any trend in the collection, analysis, interpretation, publication or review of data that can lead to conclusions that are systematically different from the truth. (Last J. A dictionary of epidemiology, 2001)
Bias limits the conclusions that can be drawn from an analysis. It is particularly problematic because, unlike confounding, little can be done to “allow” or “control” for it once the data have been collected. As such it is in many ways an issue of study design, planning and practice.
Selection bias in epidemiological studies occurs when there is a systematic difference between the characteristics of those selected for the study and those who are not. It also occurs in intervention studies when there are systematic differences between comparison groups in response to treatment or prognosis. Intervention studies are especially susceptible to selection bias unless particular efforts are made to minimise it. The most effective method is random allocation to treatment and control groups. Randomisation cannot be applied to observational studies and the effects of selection bias on these will now be considered.
CASE-CONTROL STUDIES
Avoiding selection bias is a particular challenge in the design of case-control studies. It occurs when the exposure status of cases or controls influences the likelihood that they are entered into the study. All cases should have an equal chance of being included in a case-control study. If cases (or controls) are included in, or excluded from, a study on criteria that are related to exposure to the risk factor under investigation, the estimate of effect will be biased. In particular, cases exposed to the factor under investigation should be as likely to be included as non-exposed cases. If, for example, exposure to the risk factor is associated with increased hospital attendance, and the cases are selected from a hospital population, the estimate of effect will be inflated.
Selection bias is also a consideration in the selection of controls. The control population exists to represent “the population from which the cases were drawn”. Recruiting hospital-based controls carries the risk that hospital patients may have greater exposure to the factors of interest than the population which provided the cases. This would lead to an underestimate of the effect size.
Bias is less likely when cases are selected from a defined population and controls are a random sample of unaffected individuals from that population.
If hospital controls must be used, they should not include patients with the same type of disease as the cases, nor should they include patient groups with illnesses associated with the exposure of interest.
Example of potential selection bias in a case-control study
Consider a case-control study in which major depression is the outcome of interest and alcohol use is the exposure of interest. If cases are selected from in-patients at a general hospital, alcohol use is likely to be higher than among equivalent cases from the general population, because alcohol use is associated with hospital attendance (for example through accidents or gastrointestinal bleeds). This becomes a problem for the study in either of the following scenarios:
1. The use of non-depressed controls from the general population, as this will result in an overestimation of the effect of alcohol in depression.
2. The use of non-depressed in-patient controls if they have illnesses that may be associated with alcohol (the exposure of interest). For example, if the controls are orthopaedic patients, they may have high rates of alcohol use (because trauma and alcohol use are associated) leading to an underestimation of the effect of alcohol in depression.
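The two scenarios can be sketched numerically. This is a minimal illustration, not an analysis of real data: the exposure prevalences and the selection ratios (how many times more likely an alcohol user is to end up in each hospital group than a non-user) are hypothetical, and the function names are invented for the example.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


def odds_ratio(p_exposed_cases, p_exposed_controls):
    """Odds ratio from the exposure prevalence in cases and in controls."""
    return odds(p_exposed_cases) / odds(p_exposed_controls)


def exposure_after_selection(p_exposed, selection_ratio):
    """Exposure prevalence in a group once exposed people are
    `selection_ratio` times as likely to be selected as unexposed people."""
    weighted_exposed = p_exposed * selection_ratio
    return weighted_exposed / (weighted_exposed + (1 - p_exposed))


# Hypothetical population values (assumptions for illustration only).
P_ALCOHOL_DEPRESSED = 0.30      # alcohol use among all depressed people
P_ALCOHOL_NOT_DEPRESSED = 0.20  # alcohol use among all non-depressed people
GENERAL_HOSPITAL_RATIO = 2.0    # alcohol users twice as likely to be in-patients
ORTHOPAEDIC_RATIO = 3.0         # stronger link between alcohol and trauma wards

true_or = odds_ratio(P_ALCOHOL_DEPRESSED, P_ALCOHOL_NOT_DEPRESSED)

# Cases recruited from general hospital in-patients.
hospital_cases = exposure_after_selection(
    P_ALCOHOL_DEPRESSED, GENERAL_HOSPITAL_RATIO
)

# Scenario 1: hospital cases compared with general population controls.
or_population_controls = odds_ratio(hospital_cases, P_ALCOHOL_NOT_DEPRESSED)

# Scenario 2: hospital cases compared with orthopaedic in-patient controls.
orthopaedic_controls = exposure_after_selection(
    P_ALCOHOL_NOT_DEPRESSED, ORTHOPAEDIC_RATIO
)
or_orthopaedic_controls = odds_ratio(hospital_cases, orthopaedic_controls)

print(f"true odds ratio: {true_or:.2f}")
print(f"scenario 1 (population controls): {or_population_controls:.2f}")
print(f"scenario 2 (orthopaedic controls): {or_orthopaedic_controls:.2f}")
```

With these made-up figures the true odds ratio is about 1.7, scenario 1 gives about 3.4 (an overestimate) and scenario 2 gives about 1.1 (an underestimate). The underestimate in scenario 2 depends on the assumption that alcohol use is more strongly linked to selection into the control group than into the case group.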
COHORT STUDIES
Selection bias is less problematic in cohort studies when, at recruitment, the outcome of interest has yet to take place. However, it may be introduced if individuals in one exposure category are less likely to be followed up than those in another and if the reasons for loss to follow-up are associated with the outcome of interest. For example, more symptomatic individuals may be more likely to drop out. If these symptoms are related to their exposure status, then the comparison between exposed and unexposed ceases to be a fair one. It may be tempting to omit hard-to-trace individuals, but it is probable that they differ systematically from those who are easy to contact. Again this is an issue of study design, and plans should be made to track losses to follow-up. There is no absolute figure, but studies with less than 80% follow-up should be viewed cautiously.
Example of potential selection bias in a cohort study
Cannabis use and expression of mania in the general population. Henquet et al, 2006¹ (see page 61, Evidence-Based Mental Health, volume 10, May 2007).
In this study 69.7% of those approached for inclusion agreed to participate. If non-participation was associated with cannabis use, bias may have been introduced at this point. At the second time point, 4848 out of 7076 (68.5%) participants provided follow-up data. If, for example, those who had been lost to follow-up were more likely to be manic and to be using cannabis, the estimate of effect calculated in the study could be an underestimate.
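The follow-up arithmetic, and the way differential loss to follow-up can pull an estimate towards the null, can be sketched as follows. Only the counts 4848 and 7076 come from the text above. The cohort sizes, risks and retention probabilities in the second half are hypothetical values chosen for illustration and are not taken from the Henquet study.

```python
# Follow-up proportion reported in the example (figures from the text).
ENROLLED = 7076
FOLLOWED_UP = 4848
follow_up_proportion = FOLLOWED_UP / ENROLLED
print(f"follow-up: {follow_up_proportion:.1%}")
print("below the 80% rule of thumb:", follow_up_proportion < 0.80)


def observed_risk(n, true_risk, retain_cases, retain_non_cases):
    """Risk seen among those still in the study, given the probability
    of retaining people who did and did not develop the outcome."""
    cases = n * true_risk * retain_cases
    non_cases = n * (1 - true_risk) * retain_non_cases
    return cases / (cases + non_cases)


# Hypothetical cohort: 1000 exposed and 1000 unexposed participants.
TRUE_RISK_EXPOSED = 0.10
TRUE_RISK_UNEXPOSED = 0.05
true_rr = TRUE_RISK_EXPOSED / TRUE_RISK_UNEXPOSED

# Assumption: exposed people who develop the outcome drop out more often
# (50% retained) than everyone else (80% retained).
risk_exposed = observed_risk(
    1000, TRUE_RISK_EXPOSED, retain_cases=0.5, retain_non_cases=0.8
)
risk_unexposed = observed_risk(
    1000, TRUE_RISK_UNEXPOSED, retain_cases=0.8, retain_non_cases=0.8
)
observed_rr = risk_exposed / risk_unexposed

print(f"true relative risk: {true_rr:.2f}")
print(f"observed relative risk: {observed_rr:.2f}")
```

In this sketch equal retention in the unexposed group leaves its risk unchanged, while selective loss of exposed participants with the outcome lowers the observed relative risk below the true value.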
CROSS-SECTIONAL STUDIES
Participation in cross-sectional studies is never 100%. The decision to take part (or not) is not random. Some of the factors associated with non-response have been elucidated and include male sex, younger age, lower socioeconomic status and problems with alcohol and drugs. If either the exposure or outcome of interest is associated with any of these factors it is likely that the sample being studied is not truly representative of the background population. Furthermore, non-responders in an epidemiological study are generally more likely to be sick than those who agree to participate. Consequently, most surveys will underestimate the prevalence of disease. In order to minimise bias, response rates should be as high as possible, with as much information recorded on non-participants as is feasible and permitted. This allows comparisons to be made between responders and non-responders to determine whether there is a systematic difference between groups.
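The underestimation of prevalence can be illustrated with a short calculation. This is a sketch under assumed values: the true prevalence and the response probabilities for sick and healthy people are hypothetical, and the function names are invented for the example.

```python
def observed_prevalence(true_prevalence, response_sick, response_healthy):
    """Prevalence among responders when sick and healthy people
    respond to the survey with different probabilities."""
    responding_sick = true_prevalence * response_sick
    responding_healthy = (1 - true_prevalence) * response_healthy
    return responding_sick / (responding_sick + responding_healthy)


def overall_response_rate(true_prevalence, response_sick, response_healthy):
    """Proportion of the whole sample that responds."""
    return (
        true_prevalence * response_sick
        + (1 - true_prevalence) * response_healthy
    )


# Hypothetical values (assumptions for illustration only).
TRUE_PREVALENCE = 0.10
RESPONSE_SICK = 0.50     # sick people respond less often
RESPONSE_HEALTHY = 0.70

survey_prevalence = observed_prevalence(
    TRUE_PREVALENCE, RESPONSE_SICK, RESPONSE_HEALTHY
)
response_rate = overall_response_rate(
    TRUE_PREVALENCE, RESPONSE_SICK, RESPONSE_HEALTHY
)

print(f"overall response rate: {response_rate:.0%}")
print(f"true prevalence: {TRUE_PREVALENCE:.1%}")
print(f"prevalence among responders: {survey_prevalence:.1%}")
```

If the two response probabilities are equal, the function returns the true prevalence whatever the response rate, which is why comparing responders with non-responders is informative about the likely direction of bias.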
Even studies that are well conducted may be prone to a degree of bias. Therefore, in appraising research, it is important not only to identify potential selection bias, but also to consider the extent to which it may affect results and in what direction.
REFERENCE