HCR-20 shows poor field validity in clinical forensic psychiatry settings
  1. John Tully
  1. Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, London, UK; john.tully{at}kcl.ac.uk

ABSTRACT FROM: Jeandarme I, Pouls C, De Laender J, et al. Field validity of the HCR-20 in forensic medium security units in Flanders. Psychology, Crime & Law 2017;23:305–22.

What is already known about this topic

Assessment and prediction of violence risk are central to forensic psychiatry practice. From the 1970s, increasing awareness of the limitations of clinical judgement alone meant services began to employ actuarial risk assessment tools, which were shown to significantly improve predictive accuracy.1 However, these tools soon came under criticism for their own limitations, including a focus on relatively static, immutable factors and a lack of applicability to specific real-world scenarios.1 This led to the development of structured professional judgement (SPJ) tools, which have largely replaced purely actuarial tools as the gold standard of risk assessment and management. Such instruments combine actuarial and dynamic components and are thought to be practically relevant and to encourage use of professional discretion, while retaining a solid empirical basis.2

The most widely used SPJ tool in forensic psychiatry settings in the UK is the Historical Clinical Risk Management-20 (HCR-20), which is currently in its third version (HCR-20V3).3 Both retrospective and prospective research has suggested that the HCR-20 is predictive of future violence across a range of settings.3–5 Hence, the HCR-20 has become the first-choice instrument for risk assessment of violence in forensic psychiatry practice in the UK. In recent years, however, the basis of this practice has been challenged. Further studies have suggested that SPJ tools, including the HCR-20, lack validity for certain key diagnoses in forensic psychiatry such as schizophrenia6 and psychopathy7 and that the standard method of reporting validity of the HCR-20, the area under the curve (AUC), is flawed if used without other performance measures.6 Other work has suggested that most of its individual factors do not predict violence8 and that SPJ tools may be better employed to identify low-risk individuals, for example in general adult psychiatry populations.9

Furthermore, authors have questioned the applicability of such studies to real-world practice. One study, examining the predictive validity of the HCR-20 when applied to clinical practice among 109 male mentally disordered offenders in a high secure forensic hospital, showed that the HCR-20 did not predict future violence regardless of setting (community vs inpatient) or time frame (short vs long term), except for serious incidents.10 Another study, examining prerelease HCR-20 assessments over 6 years at a US forensic hospital, revealed that none of the scales or subscales predicted recidivism better than chance.11 Hence, a recent study in a Belgian medium secure forensic psychiatry setting12 is of particular interest. Its main finding, that the HCR-20 did not predict violent recidivism during and after treatment, warrants appraisal by all forensic psychiatrists, psychologists and others working in forensic mental health.

How the study was conducted and what it showed

The study was conducted at the three forensic medium security units (MSUs) in Flanders. The study sample was 91.7% male and the mean age at admission was 36.1 years. The most common diagnoses were personality disorders (76.1%), substance use disorders (61%) and psychotic disorders (43.4%). The most common index offences were violent offences (75.6%) and property offences (18%). The mean number of prior convictions was 6.8, and only a small minority (12.2%) were first offenders.

HCR-20s were completed prospectively as part of clinical practice. Those conducted within 1 year of MSU admission (n=189) and within 1 year before discharge (n=132) from the MSU between 2001 and 2010 were included in the study. Scoring was mainly performed by criminologists, sometimes in collaboration with a psychologist. Nearly all clinicians had the requisite training and professional credentials. In most cases, HCR-20 items were completed after team discussions. Inter-rater reliability (IRR) was evaluated using intraclass correlation coefficients (ICC). Fleiss’ critical values for single measures were used: ICC ≥0.75=excellent, ICC ≥0.60=good, ICC ≥0.40=moderate and ICC <0.40=poor. Predictive validity was calculated through receiver operating characteristic curves, producing AUC values.
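
As a reading aid, the AUC can be interpreted as the probability that a randomly chosen recidivist received a higher score than a randomly chosen non-recidivist, so that 0.5 corresponds to chance. The sketch below computes it by direct pairwise comparison. It is not the study's analysis code, and the scores are hypothetical values invented for illustration.

```python
def auc(scores_recidivists, scores_non_recidivists):
    """AUC as the share of (recidivist, non-recidivist) pairs in which the
    recidivist has the higher score; tied pairs count as half."""
    wins = 0.0
    for r in scores_recidivists:
        for n in scores_non_recidivists:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(scores_recidivists) * len(scores_non_recidivists))


# Hypothetical HCR-20 total scores, invented for illustration only.
recidivists = [28, 25, 22]
non_recidivists = [27, 24, 22, 20]

example_auc = auc(recidivists, non_recidivists)
print(f"AUC = {example_auc:.2f}")
```

An AUC close to 0.5, as implied by the non-significant values reported in the study, means the scores barely separate the two groups.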

For each HCR-20, a total numerical score was calculated. Participants were also classified based on SPJ rating: those at high risk were compared with participants at low or medium risk. On admission, the mean HCR-20 total score was 24.8 (SD=5.1, range=10.5–36); 18.2% of the patients were classified as low risk, 43.5% as medium and 38.3% as high risk. On discharge, 12.9% of the patients were classified as low risk, 64.4% as medium and 22.8% as high risk. At both time points, more recidivists were found in the high and medium risk levels compared with the low-risk level, but these differences were not significant at either time point (p=0.23 for admission, p=0.10 for discharge). For the total HCR-20 score, IRR was 0.74. Reliability was highest for the historical (H) scale (ICC=0.84) and lower for the clinical (C) and risk management (R) scales (0.64 and 0.58, respectively). The relationship between recidivism and length of follow-up after discharge was not significant (p=0.37).
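
To make the reliability figures easier to read, the short sketch below applies the Fleiss cut-offs quoted in the methods to the ICC values reported above. The function name `fleiss_label` is a hypothetical helper introduced only for this illustration.

```python
def fleiss_label(icc):
    """Map an ICC to the Fleiss categories quoted in the methods."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"
    return "poor"


# ICC values as reported in the study summary above.
reported_icc = {"total": 0.74, "H scale": 0.84, "C scale": 0.64, "R scale": 0.58}

labels = {scale: fleiss_label(value) for scale, value in reported_icc.items()}
for scale, label in labels.items():
    print(f"{scale}: ICC={reported_icc[scale]:.2f} ({label})")
```

On these thresholds the total score falls just short of 'excellent', and only the H scale reaches it.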

Based on admission assessment, only the individual items ‘Personality disorder’ (H9) and ‘Impulsivity’ (C4) were able to discriminate recidivists from non-recidivists (significant AUCs with small to moderate effect sizes). The HCR-20 was not useful in prospectively identifying who was likely to reoffend (positive predictive value (PPV)=22.0% and 27.1% for total score and SPJ rating, respectively), but it did identify low-risk patients more accurately (negative predictive value (NPV)=81.4% and 84.2%). Based on discharge assessment, only the individual items ‘Early maladjustment’ (H8) and ‘Impulsivity’ (C4) were able to discriminate recidivists from non-recidivists (significant AUCs with moderate to large effect sizes). Furthermore, AUCs for both the total numerical score and SPJ rating at discharge were non-significant. Again, low-risk individuals were identified with higher accuracy (NPV=76.9% and 80.8% for total score and SPJ rating, respectively) than high-risk patients (PPV=29.6% and 39.1%). Number needed to detain was 3, meaning that three people with a high-risk profile need to be detained in order to prevent one individual from recidivism. Conversely, number safely discharged was 3 and 4 (according to the numerical score and SPJ, respectively), meaning that three to four patients considered to be at low risk could be safely discharged, prior to an offence occurring.
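
The number needed to detain and number safely discharged can be recovered from the predictive values. The sketch below assumes the definitions commonly used in the risk assessment literature, NND = 1/PPV and NSD = NPV/(1 − NPV); the summary above does not state the formulas, so these are an assumption. Applied to the discharge figures, they give the rounded values reported.

```python
def number_needed_to_detain(ppv):
    """Assumed definition: NND = 1 / PPV (PPV as a proportion)."""
    return 1.0 / ppv


def number_safely_discharged(npv):
    """Assumed definition: NSD = NPV / (1 - NPV) (NPV as a proportion)."""
    return npv / (1.0 - npv)


# Discharge predictive values as reported above (total score, SPJ rating).
ppv_at_discharge = {"total score": 0.296, "SPJ rating": 0.391}
npv_at_discharge = {"total score": 0.769, "SPJ rating": 0.808}

nnd = {k: number_needed_to_detain(v) for k, v in ppv_at_discharge.items()}
nsd = {k: number_safely_discharged(v) for k, v in npv_at_discharge.items()}

for method in ppv_at_discharge:
    print(f"{method}: NND = {nnd[method]:.1f}, NSD = {nsd[method]:.1f}")
```

Under the same assumed formula, an NPV of 90% would correspond to nine low-risk patients discharged for each one who goes on to offend, which shows how quickly the number rises as NPV approaches 100%.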

Interpretation of the findings

In keeping with two recent studies suggesting limited predictive validity for the HCR-20 in real-world clinical forensic settings,10 11 the study by Jeandarme et al12 concludes that the HCR-20 was not effective as a predictive tool for violence in a medium secure forensic setting. Following analysis of HCR-20s on admission and at discharge, AUCs for the prediction of violent recidivism during and after treatment were non-significant. Furthermore, although NPVs were higher than PPVs, the HCR-20 was not particularly accurate in identifying low-risk individuals. NPVs above 90% would likely be expected if the HCR-20 is to be used as a tool to identify low risk of violent offending.

The conclusions appear robust, and the analyses attempted to account for confounding factors. For example, the relationship between recidivism and the length of follow-up after discharge was not significant, and no significant differences could be found in assessments at discharge compared with those on admission. While reduced violence as a result of effective HCR-20 management plans may itself negatively affect the predictive validity of the tool,10 this would not appear to be the case in Jeandarme's study, as no significant differences in recidivism were found between patients assessed with the HCR-20 and those without an HCR-20 assessment. Though a power calculation is not provided, the sample consists of almost 40% of admissions to the MSU over a 10-year period and is similar or superior to that used in many previous studies examining predictive validity of the HCR-20.13

There are many practical reasons why the HCR-20 may be of limited benefit as a predictive tool in clinical practice. Although the HCR-20 has been repeatedly associated with moderate to good predictive power,3 this is largely based on assessments completed for research. In clinical settings, completion of an accurate and thorough HCR-20 is time-consuming (up to 14 hours on average).14 In the reality of clinical practice, this is not always feasible, resulting in substandard HCR-20s that omit some items such as personality disorder and base rates of violence.10 In many clinical settings, including in the Jeandarme study, not all staff receive adequate training, and the document may also be completed by individuals with little clinical experience.

Another factor limiting the predictive validity of the HCR-20 in clinical settings may be IRR. While this should be achievable for historical (‘H’ scale) items using thorough review of patient records, it may be more difficult to achieve for dynamic risk factors, such as ‘active symptoms of mental illness’ and ‘stress’. These factors are more transient and dependent on subjective assessment, making them more difficult to rate and evidence in a pressured clinical environment. In keeping with this, the study found that the IRR of the HCR-20 was ‘excellent’ for the H scale, but only ‘good’ for the C scale and ‘moderate’ for the R scale (according to Fleiss’ criteria for single measures).

These limitations can be overcome by good clinical leadership and organisation, however resource intensive. Perhaps of greater concern to forensic practitioners, however, is that Jeandarme has replicated large-scale findings that suggest that the HCR-20, and SPJ tools generally, may have more inherent limitations. In 2012, a comprehensive meta-analysis of 24 847 participants from 13 countries showed that risk assessment tools produced low to moderate PPVs (median 41%; IQR 27%–60%) for violent offending while NPVs were much higher (91%; 81%–95%).9 The present results support this, demonstrating that the HCR-20 had higher accuracy at identifying individuals at low risk of further violence, though still considerably lower than would be useful in clinical practice.

The study's findings also support the conclusions of another large-scale study examining violent reconvictions following release among 1353 male prisoners in England and Wales.8 This showed that the predictive validity of the HCR-20 was based on a small number of items and that some of the individual items were not independently predictive. Similarly, in the current study, only ‘Personality disorder’ (H9) and ‘Impulsivity’ (C4) on admission assessments and ‘Early maladjustment’ (H8) and ‘Impulsivity’ (C4) on discharge were predictive of recidivism.

Clinical and research implications

Taken alongside two recent studies with similar outcomes,10 11 the findings of this study call into question the utility of the HCR-20 in clinical settings. Specifically, this study adds to the evidence base suggesting that SPJ tools do not have predictive utility for violence. While they may be of more use in identifying low-risk individuals, for example in non-forensic populations, the NPVs demonstrated in this study were all less than 85%, which suggests caution is required here also. The figures for number needed to detain from this and other studies suggest that current practice is leading to detention of low-risk individuals due to poor specificity of the HCR-20.

Furthermore, given the replication of the findings of a large-scale study showing that a limited number of items are independently predictive of violence,8 the findings of this study suggest that a simpler tool with fewer items may be beneficial. This might address difficulties in achieving consistent implementation of the HCR-20, without adversely affecting the tool's predictive validity. A brief risk prediction tool, generating a ‘score’ for violence similar to the Q-RISK score for heart attack and stroke, may be useful.15 Meanwhile, the HCR-20 could perhaps be employed more selectively in clinical settings, where it is also used to develop risk management plans, particularly for more complex individuals. Indeed, many clinicians believe this is where its greatest utility lies.

Finally, this study highlights the need to make research on risk assessment tools more clinically valid. Studies on the HCR-20 to date have been overly dependent on assessments completed by researchers, which do not appear to reflect clinical reality. Achieving a balance between robust study design and an accurate assessment of practice in clinical settings poses a considerable challenge that nonetheless must be addressed if this field is to progress.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.

Footnotes

  • Competing interests None declared.

  • Provenance and peer review Commissioned; internally peer reviewed.