The article [1] reports the development of a new dementia risk score, claiming superior area under the curve (AUC) statistics compared with previously published risk scores. However, the representation and use of at least one of those prior risk scores is highly inaccurate, and this raises concerns about the overall integrity of the publication.
1. The authors incorrectly state that the ANU-ADRI risk index[2] was ‘developed in cohorts in Australia’ (abstract and page 2). This is wrong: the ANU-ADRI was not developed directly from any cohorts. Rather, as described in the original publication,[2] it was developed using an evidence-based medicine approach that collated the effect sizes of risk factors drawn from systematic reviews. Those systematic reviews drew on the wider literature, with most contributing cohorts coming from North America, the UK, and Europe. The tool was then validated in three external cohort studies. Data from Australia were rarely included in the meta-analyses from which the risk score was derived.[2]
2. The authors say that the ANU-ADRI ‘was developed for older individuals (60+), …however our sensitivity analysis also performed poorly when restricting our cohort to an age range matching its development sample’.
There are two problems with this sentence:
a. There was no development cohort for the ANU-ADRI, so the described sensitivity analysis could not have been undertaken.
b. Most cohort studies that contributed to the meta-analyses from which the ANU-ADRI was developed included samples over the age of 70. In addition, the ANU-ADRI validation studies, which were the Rush Memory and Aging Project, the Kungsholmen Project and the US Cardiovascular Health Study, had baseline mean ages of 79.8, 81.5 and 72.3 respectively.[3] Hence the ANU-ADRI was developed and validated on studies comprising much older participants than the UK Biobank (UKB), which had a mean baseline age of 59.97, and Whitehall II, which had an age range of 35 to 55.
3. The authors report on page 5 that the UKBDRS performed similarly to the ANU-ADRI and DRS when evaluated in a sensitivity analysis of the Whitehall sample aged over 60. As both the DRS and the ANU-ADRI were developed on older cohorts, this is the more appropriate comparison (even though not ideal, as the age structures of the samples are still not directly comparable). However, instead of being reported as the main analysis and in the abstract, it was relegated to supplementary material. The authors do not report the mean and distribution of age in the comparison samples, or compare the samples on mean age of incident dementia.
4. The comparison of the UKBDRS with the ANU-ADRI is highly biased by differences in the weighting of age in each risk score. The comparison is not valid because the UKBDRS uses weights for age estimated from the dataset, whereas the ANU-ADRI assigns a weight of 0 to participants aged <65 (almost 82% of the data). Age is the strongest predictor of dementia and alone yielded a very high AUC in the UKB data (0.77 for age only, compared with 0.80 for the UKBDRS). The comparison of the UKBDRS with the ANU-ADRI would be valid if the two scores were compared without age in both models; we expect the results would then have been similar.
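The effect of zero-weighting age below 65 on discrimination can be illustrated with a small simulation. All numbers here are assumptions for illustration only (synthetic ages, an invented risk function, and an arbitrary cohort size); nothing is drawn from the UKB data:

```python
import math, random

def auc(scores, labels):
    """Wilcoxon-Mann-Whitney estimate of the AUC:
    P(score of a random case > score of a random non-case), ties counted half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(42)
ages = [random.uniform(40, 75) for _ in range(4000)]
# hypothetical dementia risk rising steeply with age
cases = [random.random() < 0.001 * math.exp(0.12 * (a - 40)) for a in ages]

full_age = ages                            # age entered with its full weight
zeroed = [max(0.0, a - 65) for a in ages]  # weight 0 below age 65
print(auc(full_age, cases) > auc(zeroed, cases))  # True: zero-weighting discards discrimination
```

Because the zero-weighted score ties every participant under 65 at the same value, all case/control pairs in that (large) age band contribute nothing to discrimination, so a score that uses dataset-estimated age weights starts with a built-in AUC advantage.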
References
[1] Anaturk M, Patel R, Ebmeier KP, Georgiopoulos G, Newby D, Topiwala A, de Lange A-M, Cole JH, Jansen MG, Singh-Manoux A, Kivimäki M, Suri S. Development and validation of a dementia risk score in the UK Biobank and Whitehall II cohorts. BMJ Ment Health. 2023;26.
[2] Anstey KJ, Cherbuin N, Herath PM. Development of a new method for assessing global risk of Alzheimer's disease for use in population health approaches to prevention. Prev Sci. 2013;14:411–21.
[3] Anstey KJ, Cherbuin N, Herath P, Qiu C, Kuller LH, Lopez OL, Wilson RS, Fratiglioni L. A self-report risk index to predict occurrence of dementia in three independent cohorts of older adults: The ANU-ADRI. PLoS One. 2014;9:e86141.
Ruuska et al.’s [1] analysis incorrectly concluded that medical gender reassignment (GR) did not reduce suicide mortality because they had assessed that “the suicide mortality of both those [presenting with gender dysphoria (GD)] who proceeded and did not proceed to GR did not statistically significantly differ from that of controls.” On the contrary, by conventional criteria, the suicide mortality of the non-GR group was significantly higher than that of controls while that of the GR group was not. Ruuska et al. [1] reported: "Adjusted HRs [hazard ratios] for suicide mortality were 3.2 [for non-GR] (95% CI 1.0 to 10.2; p=0.05) and 0.8 [for GR] (95% CI 0.2 to 4.0; p=0.8), respectively." By this finding, those dysphorics who had undergone GR were no more likely to have committed suicide than were general population controls, while those who had not undergone GR were more than three times as likely to have committed suicide. The latter difference is reported with a p-value of .05 and a 95% confidence interval that does not extend below 1.0.
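As a check on the reported figures, the p-values can be recovered (approximately) from the hazard ratios and confidence intervals alone. This is a minimal sketch, assuming a normal approximation on the log-HR scale and that the reported interval is a standard Wald 95% CI:

```python
import math

def p_from_hr_ci(hr, lo, hi):
    """Approximate two-sided p-value recovered from a hazard ratio and its
    95% CI, assuming normality on the log-HR scale."""
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE of log(HR)
    z = abs(math.log(hr)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 2 * (1 - CDF)

# Non-GR group: HR 3.2 (95% CI 1.0 to 10.2) -> right at the 0.05 boundary
print(round(p_from_hr_ci(3.2, 1.0, 10.2), 3))  # ~0.05
# GR group: HR 0.8 (95% CI 0.2 to 4.0) -> clearly non-significant
print(round(p_from_hr_ci(0.8, 0.2, 4.0), 2))   # ~0.77
```

The recovered values reproduce the published p=0.05 and p=0.8, confirming that the non-GR result sits exactly at the conventional significance boundary rather than comfortably inside it.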
By the prevailing standards of scientific inference, and in virtually any other study, such a finding would be assessed as statistically significant (or perhaps, depending on rounding, trivially short of significance), but Ruuska et al. [1] judged it not to be so. They achieved this anomaly by fiat, announcing: “In order to avoid type 1 error due to multiple testing and the large data size, the cut-off for statistical significance was set at a p<0.01” [1] rather than the usual standard of p<0.05. This decision is highly questionable, for two reasons. First, the dataset analyzed is not nearly large enough to justify it: 18,726 cases (122,358 person-years). Preference for a .01 criterion usually involves data numbering in the hundreds of thousands, at least, and even then rarely precludes recognition of substantively important findings with p-values between .01 and .05. Of the ten most recent studies using similar data—extractions from Finnish registers as large as or larger than that of Ruuska et al. [1]—with numbers of cases ranging from 24,341 to 5.6 million (the entire Finnish population), or 41.9 million person-years, none rejected findings between .05 and .01 as not being statistically significant. (See Endnote 1) The most methodologically similar study of the same subject, Dhejne et al.’s [11] 2011 examination of the Swedish population register, also used the conventional .05 standard. None of these large-data register studies rejected as insignificant a finding with p below .05 but above .01, or a 95% confidence interval that did not include nullity; only Ruuska et al. [1] did so. Co-authors of the Ruuska et al. [1] study also imposed a restrictive .01 standard for significance on an earlier register study of gender reassignment, [12] but did not do so for similarly sized register studies of other topics, for example schizophrenia (17,112 cases) [13] and depression (16,842 cases). [14]
More importantly, second, adjusting the significance criterion to reduce the prospect of one type of error inescapably biases, by the same amount, the analysis toward the opposite error. In this case, it perversely increases the risk of the very type of error the stricter criterion is trying to avoid. This is because adopting a stricter criterion to reduce the risk of falsely affirming a difference (a Type I error) raises, by the same amount, the risk of falsely affirming the absence of a difference (a Type II error); and the inference that GR did not reduce suicide is based on perceived statistical insignificance. If the goal is to avoid affirming too easily substantive findings that may be due to measurement imprecision, then for conclusions that depend on similarity between groups rather than difference, the criterion for assessing true difference should be made less strict (perhaps .10), not more strict. A better practice, followed in many studies, would be to refrain from unilaterally adjusting the conventional scientific standard of inference, and instead present findings as more or less certain with reference to that standard depending on their reported actual p-value.
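The trade-off can be quantified with a back-of-envelope power calculation. The sketch below assumes, purely for illustration, that the true effect equals the observed one (log-HR of 3.2 with the standard error implied by the reported Wald 95% CI):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def miss_prob(z_effect, alpha):
    """Approximate probability of declaring 'not significant' when a real
    effect of standardized size z_effect exists (1 - power, one-tail approx)."""
    z_crit = {0.05: 1.96, 0.01: 2.576}[alpha]
    return norm_cdf(z_crit - z_effect)

# SE of log(HR) implied by the reported CI: (log(10.2) - log(1.0)) / (2 * 1.96)
se = (math.log(10.2) - math.log(1.0)) / (2 * 1.96)
z = math.log(3.2) / se
print(round(miss_prob(z, 0.05), 2))  # ~0.50 under the conventional criterion
print(round(miss_prob(z, 0.01), 2))  # ~0.73 under the stricter criterion
```

Under these assumptions, moving the cut-off from .05 to .01 raises the chance of missing a real tripling of suicide risk from roughly one in two to roughly three in four.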
Recognition that suicide risk may be lower with GR than without it in this analysis complicates, but does not negate, Ruuska et al.’s [1] substantive observation that this comparison “does not support the claims that GR is necessary in order to prevent suicide.” It complicates it because a similar previous analysis of the same register data, involving two of the same authors, [12] already found that, when adjusted for year of initial contact with Finland’s gender identity service, follow-up contact with psychiatric care was emphatically no different by GR status. In that study, the hazard ratio (HR) for those gender dysphorics who had undergone GR—3.8 (95% CI 3.6-4.1)—was virtually identical to that of those who had not done so—3.9 (95% CI 3.6-4.2). Both of these HRs differed significantly from controls but did not differ significantly from each other. Unlike the findings of Ruuska et al., [1] these results provided strong support for the earlier study’s conclusion that they “do not suggest that medical GR interventions resolve psychiatric morbidity among people experiencing gender distress.”[12]
This does not mean that Ruuska et al.’s observation is incorrect. There are at least two plausible explanations for the apparent contradiction of lower suicide mortality following GR despite psychiatric morbidity as high as that of GD registrants who did not receive GR. The first reflects the time trajectory of suicide mortality. Dhejne et al. (2011), in the only other national registry study of completed suicide, observed that “survival of transsexual persons [undergoing medical gender reassignment] started to diverge from that of matched controls after about 10 years of follow-up.” [11] Up until 10 years post GR, the suicide mortality of the Swedish registrants in the Dhejne et al. (2011) study did not differ from that of the general population controls, which is exactly what is observed, at an average follow-up of only 6.5 years, among the Finnish registrants in the Ruuska et al. [1] study. After 10 years, however, Dhejne et al. (2011) found that suicide mortality among Swedish GR recipients increased dramatically, a possibility that is not at all foreclosed for their Finnish counterparts by the findings of Ruuska et al. [1]
A second explanation is suggested by the fact that, unlike in many jurisdictions, in Finland only 38% of GD diagnosed persons in treatment proceeded to medical GR. This suggests the presence of assessment, screening and monitoring processes that may inhibit some of the negative consequences of psychiatric morbidity, for example by ensuring better social support or medication compliance, than may be the case in other settings.
These two explanations are not mutually exclusive; both may be the case to some extent. Neither of them, moreover, impairs in any way the main conclusion of Ruuska et al.’s [1] study, that apart from attendant psychiatric morbidity, gender dysphoria in itself does not predict suicide mortality. Both also strongly support the important clinical implication of the study, that it is “of utmost importance to identify and appropriately treat mental disorders in adolescents experiencing gender dysphoria to prevent suicide.”
Endnote 1: The studies are: Böckerman et al.’s study of opioid use and employment rates (41.9 million person-years); [2] Sillanpää et al.’s estimation of childhood epilepsy incidence (entire population rates); [3] Vaajala et al.’s (2024) study of fear of childbirth (comparing 211,202 cases from the Finnish Birth Register); [4] Ax et al.’s (2024) study of hand surgery and sick leave (24,341 cases); [5] Valtanen et al.’s (2024) comparison of psychiatric medication use by treatment strategy (44,685 cases); [6] Raudasoja et al. (2024), examining corrective surgery after treatment of distal radial fractures (41,418 cases); [7] Terho et al.’s (2024) study of frozen embryo transfer child growth rates (35,894 cases); [8] Tamlander et al.’s (2024) analysis of implications of polygenic risk for colorectal cancer screening (453,733 cases); [9] Kolari et al. (2024), examining ADHD medication use among Finnish children (41,920 cases); [10] and Holm et al.’s (2024) study of nonaffective psychosis diagnosis, examining 49,164 persons.
References
1 Ruuska S-M, Tuisku K, Holttinen T, et al. All-cause and suicide mortalities among adolescents and young adults who contacted specialised gender identity services in Finland in 1996–2019: a register study. BMJ Ment Health. 2024;27:e300940.
2 Böckerman P, Haapanen M, Hakulinen C, et al. Prescription opioid use and employment: A nationwide Finnish register study. Drug and Alcohol Dependence. 2021;227:108967.
3 Sillanpää ML, Camfield P, Löyttyniemi E. The changing incidence of childhood epilepsy in Finland. Seizure. 2024;117:20–7.
4 Vaajala M, Mattila VM, Kuitunen I. Fear of childbirth prolongs interpregnancy interval: A nationwide register-based quantile logistic regression analysis. European Journal of Obstetrics & Gynecology and Reproductive Biology: X. 2024;21:100281.
5 Ax M, Palola V, Ponkilainen V, et al. Duration of sick leave after operated and non-operated distal radial fracture: a Finnish cohort study of 19,995 patients. J Hand Surg Eur Vol. 2024;49:316–21.
6 Valtanen K, Seikkula J, Kurtti M, et al. Ten-year patterns of psychiatric medications dispensed to adolescents in Finland: Open dialogue-informed practice in Western Lapland as compared to practice in other Finnish regions. Personalized Medicine in Psychiatry. 2024;43–44:100117.
7 Raudasoja L, Vastamäki H, Aspinen S, et al. Distal radial fractures: a nationwide register study on corrective osteotomies after malunion. J Hand Surg Eur Vol. 2024;49:329–33.
8 Terho AM, Tiitinen A, Salo J, et al. Growth of singletons born after frozen embryo transfer until early adulthood: a Finnish register study. Hum Reprod. 2024;39:604–11.
9 Tamlander M, Jermy B, Seppälä TT, et al. Genome-wide polygenic risk scores for colorectal cancer have implications for risk-based screening. Br J Cancer. 2024;130:651–9.
10 Kolari TA, Vuori M, Rättö H, et al. Incidence of ADHD medication use among Finnish children and adolescents in 2008–2019: a need for practice changes? Scand J Public Health. 2024;14034948231219826.
11 Dhejne C, Lichtenstein P, Boman M, et al. Long-Term Follow-Up of Transsexual Persons Undergoing Sex Reassignment Surgery: Cohort Study in Sweden. PLoS One. 2011;6. doi: 10.1371/journal.pone.0016885
12 Kaltiala R, Holttinen T, Tuisku K. Have the psychiatric needs of people seeking gender reassignment changed as their numbers increase? A register study in Finland. Eur Psychiatry. 2023;66:e93.
13 Holttinen T, Pirkola S, Kaltiala R. Schizophrenia among young people first admitted to psychiatric inpatient care during early and middle adolescence. Schizophr Res. 2023;252:103–9.
14 Niemi J, Holttinen T, Kaltiala R. Adolescents with severe depressive and anxiety – a nationwide register study of inpatient-treated adolescents 1980–2010. Psychiatria Fennica. 2023;54:66–79.
We appreciate the interest in understanding the health and well-being of transgender persons and their unique care needs, particularly youth and adolescents. There are, however, several methodological missteps in the recent article by Ruuska et al. published in BMJ Mental Health: the authors have fallen into a number of mistakes and fallacies that render untenable their conclusion that gender-affirming interventions have no effect on suicide mortality.
First, the authors have not shared sufficient data to support their conclusion that gender-affirming interventions do not reduce suicide. A properly reported analysis must show the events and characteristics of all transgender persons referred for care, as well as of the sub-groups (hormonal and/or surgical interventions vs. no interventions). Similarly, with respect to the shortfalls of their analytic methodology, the authors have not demonstrated that they checked the proportional hazards assumption on which their Cox models rely. Given the rapidly changing political and social environments for transgender people in countries around the world, including Finland, the assumption that the hazards are proportional over time must be examined and explained. The authors also depart from standard practice by not showing Kaplan-Meier curves for each of the outcomes of interest, alongside the rates of all-cause mortality and suicide in each risk group discussed.
Second, with only seven suicides among all transgender persons referred for care (medical interventions vs. none), the authors have likely presented over-fitted adjusted Cox models, given the number of covariates included. This is particularly evident in the very wide confidence intervals for many of the hazard ratios presented. With so few events, the authors should at most have adjusted for one variable at a time.
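The concern can be made concrete with the familiar events-per-variable (EPV) rule of thumb. The figure of roughly ten events per candidate predictor is a convention from the prediction-modelling literature, assumed here for illustration; it is not a number taken from the article:

```python
# Rule-of-thumb check of events-per-variable (EPV) for a Cox model.
# The ~10 EPV guideline is an assumed convention, not a figure from the article.
def max_adjusted_covariates(n_events: int, epv: int = 10) -> int:
    """Largest number of covariates the EPV rule of thumb would allow."""
    return n_events // epv

suicides = 7  # suicide events reported among referred transgender persons
print(max_adjusted_covariates(suicides))  # 0: even one adjusted covariate strains the data
```

By this heuristic, seven events support essentially no multivariable adjustment at all, which is consistent with the very wide confidence intervals the models produced.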
Third, the authors have not presented a theoretical framework for the variables they include in their models. A directed acyclic graph (DAG) would provide important grounding in how each variable is handled and accounted for in relation to the outcome of suicide. In particular, including mental health visits—which are a component of gender-affirming care—is a fatal over-adjustment in the models that purport to show no relationship between receipt of gender-affirming care and death by suicide.
Lastly, adjusted hazard ratios for suicide mortality were 3.2 (95% CI: 1.0 to 10.2; p=0.05) among referred persons who had not received medical and surgical interventions and 0.8 (95% CI: 0.2 to 4.0; p=0.8) among referred persons who did receive interventions, each of these compared with cisgender controls but not with each other. The clinically meaningful sizes of the hazard ratios themselves suggest a substantial likelihood that receipt of gender-affirming care is associated with a decrease in the risk of suicide mortality. Meanwhile, the enormous confidence intervals demonstrate a lack of precision within the models, which suggests insufficient statistical power notwithstanding the large overall sample size examined. The authors here succumb to the fallacy that a lack of statistical difference in a testing context equates to a lack of any difference at all.
Given the thankfully small number of suicides within the sample from Finland (7 out of 2083 identified transgender persons), it may be that a future study should pool multiple samples across comparable settings (e.g., Sweden, Norway, Netherlands, Denmark, etc.) and perform a meta-analysis. This would be understandably difficult as treatment models have not been uniform across time and geography, but this could provide sufficient power to truly explore differences in rates of suicide among transgender persons who receive medical and surgical interventions versus transgender persons who do not receive such interventions.
The authors should exercise appropriate restraint based on available data and the methods available to analyze them.
The comments by Quinlivan and colleagues provide an opportunity to respond to some common misunderstandings of suicide risk assessment tools, and more broadly, prediction modelling. First, their comments are based on the mistaken assumption that all suicide prediction tools invariably have to classify individuals into low-risk versus high-risk groups. Unlike the earlier tools referred to in the response (all of which are classifiers, i.e. stratify people into risk categories), OxSATS provides probabilistic estimates of suicide risk. The benefits of probability estimation over classification have been discussed widely in the methodological literature,[1,2] and models which produce continuous risk scores are routinely used in other areas of medicine (such as the Framingham and QRISK models for cardiovascular disease risk prediction).
Second, Quinlivan and colleagues have compared the area under the curve (AUC) of OxSATS to earlier tools and highlighted the discrepancy in the interpretation of the findings. However, this misses the methodological point that what is considered good discrimination performance for a prediction model depends on the clinical area and available alternatives. While very high AUC values (e.g. above 0.90) can be reported for diagnostic prediction,[3] such values are rare in prognostic modelling, where AUC values in the 0.70s are found for best-performing models for incident cardiovascular disease[4] and adverse health outcomes (including mortality) in COVID-19.[5]
Third, their comments fail to note that two models can have similar AUC values despite very different calibration performance. While OxSATS was reasonably well calibrated in the external validation sample, ‘first-generation scales’ cannot even be assessed for calibration because they have not been tested in this respect (and cannot be, as they do not produce probability scores). Calibration is especially important when a model is intended to support clinical decision-making, as poorly calibrated predictions can make a model clinically useless or even harmful.[6]
Finally, it is misleading, in our view, to interpret the positive predictive value (PPV) of a suicide prediction model without considering the specific clinical context. This is because, like other measures of classification, PPV is dependent on the chosen risk cut-off. Usually, there is not one optimal risk threshold for a given prediction model, as the choice of cut-off depends on the specific clinical decision that the model is intended to inform and the relative costs and benefits of true and false positive classifications in that context.[1] For instance, health economics evidence from US primary care has shown that, depending on the target intervention, prediction models for suicide deaths may be cost-effective with PPV values as low as 0.07%, in part due to substantial cost savings associated with preventing one suicide death.[7]
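The dependence of PPV on the outcome base rate follows directly from Bayes' rule. A minimal sketch, with purely illustrative numbers (none drawn from the studies under discussion):

```python
def ppv(sens, spec, prev):
    """Positive predictive value from sensitivity, specificity and prevalence
    (Bayes' rule); all inputs are proportions in [0, 1]."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

# Hypothetical values: with a very rare outcome, even a model with decent
# sensitivity and specificity yields a PPV of a fraction of a percent.
print(round(100 * ppv(sens=0.50, spec=0.90, prev=0.0002), 2))  # ~0.1 (percent)
```

Because suicide death is so rare, a sub-1% PPV says more about the base rate than about the model, which is why the threshold and the costs of each classification outcome, not the raw PPV, should drive interpretation.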
References:
1. Wynants L, van Smeden M, McLernon DJ, Timmerman D, Steyerberg EW, Van Calster B, et al. Three myths about risk thresholds for prediction models. BMC Med. 2019;17(1):192.
2. van den Goorbergh R, van Smeden M, Timmerman D, Van Calster B. The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression. J Am Med Inform Assoc. 2022.
3. Van Calster B, Van Hoorde K, Valentin L, Testa AC, Fischerova D, Van Holsbeke C, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ. 2014;349:g5920.
4. Damen JA, Pajouheshnia R, Heus P, Moons KGM, Reitsma JB, Scholten RJPM, et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis. BMC Med. 2019;17(1):109.
5. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.
6. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230.
7. Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy Requirements for Cost-effective Suicide Risk Prediction Among Primary Care Patients in the US. JAMA Psychiatry. 2021;78(6):642–50.
While the published study provides valuable insights into the effectiveness of culturally adapted counselling (CAC) for ethnic minorities, two critical aspects warrant further discussion: the reliance on self-reported measures and the training and supervision of counsellors.
Firstly, the primary outcome measures in the study were self-reported by participants. While self-reporting is a common practice in psychological research, it is not without its limitations. Self-reported data are susceptible to biases, such as social desirability bias, where participants may provide responses they believe are more socially acceptable rather than their true feelings or experiences. Additionally, response bias can occur, particularly in longitudinal studies where participants might answer questions based on their memory of previous answers rather than their current state. These biases could significantly influence the study's findings, potentially overestimating the effectiveness of the CAC intervention. To enhance the robustness of future research, incorporating objective measures or third-party assessments could provide a more comprehensive and unbiased evaluation of the intervention's effectiveness.
Secondly, the study involved training counsellors in the culturally adapted intervention. However, the depth and effectiveness of this training, as well as the consistency of its application across counsellors, are not extensively discussed. The quality and uniformity of training are crucial factors in intervention studies, as they directly impact the intervention's fidelity and effectiveness. Inconsistent training or application could lead to variations in how counsellors deliver the intervention, potentially affecting the study's outcomes. Further details about the training process, the supervision provided to counsellors, and measures taken to ensure intervention fidelity would significantly contribute to the credibility and replicability of the findings.
In conclusion, while the study contributes to our understanding of mental health interventions for ethnic minorities, addressing these methodological concerns would greatly enhance the validity and generalisability of the findings. Future research could benefit from incorporating objective measures and providing more comprehensive details on counsellor training and supervision.
We were delighted to read the publication of a new dementia risk score for the prediction of dementia up to 14 years. We congratulate the authors for incorporating almost all of the modifiable factors identified by the 2020 Lancet Commission on dementia prevention, intervention and care. Since the publication of the European Brain Health Guidelines earlier this year, memory clinic professionals in the UK have been desperately looking for a home-grown tool. We hope to see online training for using this clinical tool in the near future.
Recent systematic reviews, clinical guidelines, and suicide prevention strategies suggest we should abandon the endeavour of using risk assessment to predict suicide and instead focus on clinical need. (1, 2) Does the recent carefully conducted paper by Fazel and colleagues mean that this advice should be overturned? We think not.
At the turn of the century several of us were involved in developing risk tools for self-harm. Nearly two decades later, colleagues have used larger samples, novel methodology, and painstaking analysis to produce OxSATS. Of course, interpretation of how good tools are depends on multiple diagnostic accuracy statistics and clinical context. However, one accepted measure of overall performance is the ‘area under the curve’ (AUC).
What is striking is that the AUC for predicting suicide in the 6 months after self-harm was nearly identical for OxSATS and the first-generation scales (0.75 vs 0.71). (3, 4) What is perhaps even more striking is the very different interpretation of the findings. The authors of the recent study suggest OxSATS accurately predicts the risk of suicide, whereas Steeg et al. concluded the opposite and suggested scales should not be used to determine treatment.
What should we make of this discrepancy? In the end it perhaps comes down to what different researchers mean by ‘accurate’. A commonly used measure which may reflect real-world utility is the positive predictive value (PPV): of those rated as at high risk, how many go on to have the outcome of interest. In supplemental table 8 of the OxSATS paper the PPV for suicide within six months is 2%. In other words, of 100 people rated as at high risk, only 2 go on to die by suicide within six months. We acknowledge that the PPV is not the only way to judge a tool’s performance, but we would argue that it is one that matters to clinicians.
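The arithmetic behind this point can be made explicit. A minimal sketch, using the 2% PPV quoted above and a hypothetical group of 100 flagged individuals:

```python
# Hypothetical group of 100 people rated as high risk by the tool.
# With the reported PPV of 2%, only 2 of them go on to die by suicide
# within six months; the remaining 98 are false positives.
flagged_high_risk = 100
deaths_within_six_months = 2

ppv = deaths_within_six_months / flagged_high_risk
false_positives = flagged_high_risk - deaths_within_six_months

print(f"PPV = {ppv:.0%}")                      # 2%
print(f"False positives = {false_positives}")  # 98
```

The sketch shows why a tool can have a respectable AUC yet flag 98 people without the outcome for every 2 with it when the outcome is rare.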
Where should we go from here? Listening to those with lived experience is essential. Service users report that risk scales may detract from therapeutic engagement and lead to exclusion and iatrogenic harms. (5) For many, stigma and poor experiences of care are the more pressing concerns.(6) In the end, there are probably no short-cuts to careful assessment and no easy way to identify those who go on to have adverse outcomes. However, we can agree with Fazel and colleagues on the bigger issue of making high quality, evidence-based interventions available and accessible.
1. National Institute for Health and Care Excellence. Self-harm: assessment, management and preventing recurrence. NICE Guideline, 2022.
2. Department of Health and Social Care. Suicide prevention in England: 5-year cross-sector strategy. https://www.gov.uk/government/publications/suicide-prevention-strategy-f..., 2023.
3. Fazel S, Vazquez-Montes MDLA, Molero Y, et al. Risk of death by suicide following self-harm presentations to healthcare: development and validation of a multivariable clinical prediction rule (OxSATS). BMJ Mental Health 2023;26(1) doi: 10.1136/bmjment-2023-300673
4. Steeg S, Quinlivan L, Nowland R, et al. Accuracy of risk scales for predicting repeat self-harm and suicide: a multicentre, population-level cohort study using routine clinical data. BMC Psychiatry 2018;18
5. Graney J, Hunt IM, Quinlivan L, et al. Suicide risk assessment in UK mental health services: a national mixed-methods study. Lancet Psychiatry 2020;7(12):1046-53. doi: 10.1016/s2215-0366(20)30381-3
6. Quinlivan LM, Gorman L, Littlewood DL, et al. 'Relieved to be seen'-patient and carer experiences of psychosocial assessment in the emergency department following self-harm: qualitative analysis of 102 free-text survey responses. BMJ Open 2021;11(5):e044434. doi: 10.1136/bmjopen-2020-044434 [published Online First: 2021/05/25]
I read this paper with great interest. The finding that 29% of somatic deaths were alcohol-related warrants further investigation, especially since, as the authors state, alcohol contributes to other somatic causes of death (e.g., cancer, CVD) that, in this methodology, were not classified as alcohol-related.
I would encourage the authors to refrain from using terms such as “alcohol abuse” as was used in this publication. While this term is ubiquitous in the alcohol literature, it perpetuates stigma toward individuals who have alcohol use disorder and/or drink alcohol at high-risk levels. Indeed, evidence has demonstrated that using words like “substance abuser” or “abuse” can lead to feelings that those who use alcohol and/or other drugs are to blame for their situation (1). Alternative terms such as “those drinking at high-risk levels” would be preferred (2).
It is regrettable that BMJ Mental Health marks its transition from the journal Evidence-Based Mental Health with the publication of a paper that could, at best, be judged evidence-informed rather than evidence-based. The authors of the O’Driscoll et al (2023) paper make no acknowledgement of possible publication bias, yet they work either for NHS trusts or for IAPT, and NHS Trusts operate the IAPT services. They make no critical appraisal of their use of IAPT’s chosen metric of recovery. There is no acknowledgement of works that cast serious doubt on the Service’s claimed 50% recovery rate (Capobianco et al, 2023; Scott, 2018).
The O’Driscoll et al (2023) paper claims that CBT may be preferred to counselling for clients who have anxiety symptoms comorbid with depression. But the conclusions are built on sand in that:
a) there can be no certainty that the subjects studied were depressed, as there was no ‘gold standard’ diagnostic interview conducted. Instead, reliance was placed on a psychometric test, the PHQ-9
b) there can be no certainty about comorbidity because of the absence of a diagnostic interview
c) no fidelity checks were carried out to establish whether therapists were conducting CBT or counselling. Reliance was instead placed on therapists’ claims.
d) no blind-raters were used to assess outcome
e) there can be no certainty that the observed changes would not have happened anyway because of the absence of a credible attention control condition
f) there can be no certainty that the observed changes were clinically meaningful or that the changes endured: there was a 6-point improvement in the CBT group and a 5-point improvement in the counselling-for-depression group.
g) the study was restricted to patients who attended 5 or more treatment sessions, but these are unrepresentative of IAPT clients, only half of whom have 2 or more treatment sessions (defined by IAPT as ‘treatment’). The mean number of IAPT treatment sessions is 7, but the mean number of treatment sessions in the O’Driscoll et al (2023) study was 10 in counselling for depression and 11 in CBT. Further, the third of IAPT clients who undergo a low-intensity intervention alone were excluded. Generalisation from this study is fraught with difficulties.
Does the emergence of BMJ Mental Health signal the demise of evidence-based mental health? I hope not.
Capobianco, L., Verbist, I., Heal, C., Huey, D., & Wells, A. (2023). Improving access to psychological therapies: Analysis of effects associated with remote provision during COVID-19. The British journal of clinical psychology, 62(1), 312–324. https://doi.org/10.1111/bjc.12410
O'Driscoll C, Buckman JEJ, Saunders R, et al Symptom-specific effects of counselling for depression compared to cognitive–behavioural therapy BMJ Ment Health 2023;26:e300621.
Scott M. J. (2018). Improving Access to Psychological Therapies (IAPT) - The Need for Radical Reform. Journal of health psychology, 23(9), 1136–1147. https://doi.org/10.1177/1359105318755264
In our meta-analysis, we synthesised evidence on risk factors for suicide based on psychological autopsy studies [1]. We included data from 37 case-control studies and examined associations for 40 risk factors in 12,734 adults. Novel aspects are the inclusion of a wide range of risk factors across four domains – sociodemographic, family history, clinical, and life events – and quantitative methods to examine sources of heterogeneity.
In their response, Soper and Large question one interpretation of the findings (rather than the methods, analyses, or reporting), stating that consideration of risk factors and risk assessment has limited clinical utility. We think that this is a misreading of the evidence.
First, assessing the risk of suicide and linking assessment to preventative measures is a central component of clinical care. We suggest that prediction models can assist in stratifying an individual’s suicide risk. One advantage of empirically derived prediction models over subjective clinical judgment is that they attempt to incorporate the relative strength of multiple risk factors and their interactions. In addition, subjective clinical judgement tends to be optimistic with an over-reliance on recent events [2]. Furthermore, risk assessment tools can improve consistency within and between clinical services. They can also raise the ceiling of expertise, particularly where high staff turnover and variations in training experience exist, and anchor decision-making in evidence.
Second, research has identified limitations in current approaches [3], which commonly draw on tools and checklists developed for assessing depression and suicidal ideation rather than for predicting future risk of suicide [4]. Thus, discounting the potential value of risk assessment on the basis of tools designed for other purposes is unfounded. Focusing on single factors in isolation is another misreading – prediction models incorporate multiple factors. For example, in a longitudinal UK study [5], of 20,230 women aged 15–24 who self-harmed, 14 died by suicide in the subsequent year (incidence rate 111/100,000). In men aged 55 and older, 2766 self-harmed and 29 died by suicide (incidence rate 1874/100,000). This is a >15-fold difference based on two factors. Incorporating others, for which our meta-analysis provides an empirical basis [1], will improve assessment.
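The size of that difference follows directly from the two quoted incidence rates. A minimal sketch using only the figures reported above:

```python
# Incidence rates of suicide after self-harm, per 100,000, as quoted
# from the longitudinal UK study: women aged 15-24 versus men aged 55+.
rate_women_15_24 = 111
rate_men_55_plus = 1874

rate_ratio = rate_men_55_plus / rate_women_15_24
print(f"Rate ratio: {rate_ratio:.1f}")  # roughly 16.9, i.e. >15-fold
```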
Another problem with risk assessment critiques is that they do not compare prediction models with current approaches (eg, unstructured clinical approaches) nor do they consider a full range of performance measures, including negative predictive value (NPV) and calibration [6]. NPV is potentially important as identifying true negatives can preserve resources by screening out persons who do not need further assessment and treatment [7]. Calibration, whether a tool predicts a risk level that is close to the observed risk, is also a key metric.
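As an illustration of why NPV matters for preserving resources, consider a hypothetical screened cohort (the numbers below are invented for illustration and are not taken from any of the cited studies):

```python
# Hypothetical cohort: of 10,000 people screened, 9,500 are rated
# low risk ("negatives"); 9,490 of those do not go on to have the
# outcome (true negatives).
rated_low_risk = 9500
true_negatives = 9490

npv = true_negatives / rated_low_risk
print(f"NPV = {npv:.3f}")  # close to 1: low-risk ratings are rarely wrong
```

A high NPV means a low-risk rating can be relied on to screen people out of further assessment, which is the resource-preserving role described above.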
More recently, prognostic models to predict suicidal behaviour in high-risk groups such as individuals with psychiatric disorders have been developed with good discrimination and calibration [8]. Similarly, the Oxford Mental Illness and Suicide (OxMIS) tool [9] showed good performance in predicting suicide in people with severe mental illness (schizophrenia-spectrum disorders and bipolar disorder). Risk scores from OxMIS or other tools can be used to assist clinical decision-making. The Oxford Suicide after Self-harm (OxSATS) tool is another recent prediction model developed with high-quality methods [10].
Using risk prediction tools in combination with clinical judgment can improve care by identifying those at higher risk earlier and lead to more targeted management, which could be cost saving [11] and improve allocation of limited clinical resources. Decision curve analyses will help inform appropriate cut-offs for clinical use [12].
Our meta-analysis [1] was not about the potential role of risk assessment. Rather, by outlining the range and magnitude of individual risk factors, it underscores the rationale for combining multiple predictors when assessing risk.
References
1. Favril L, Yu R, Uyar A, Sharpe M, Fazel S. Risk factors for suicide in adults: systematic review and meta-analysis of psychological autopsy studies. Evid Based Ment Health 2022; 25: 148-55.
2. Pease JL, Forster JE, Davidson CL, et al. How Veterans Health Administration Suicide Prevention Coordinators assess suicide risk. Clin Psychol Psychother 2017; 24: 401-10.
3. Wolf A, Fazel, S. Overstating the lack of evidence on suicide risk assessment. Br J Psychiatry 2017; 210: 369.
4. Fazel S, Runeson B. Suicide. N Engl J Med 2020; 382: 266-74.
5. Geulayov G, Casey D, Bale L, et al. Suicide following presentation to hospital for non-fatal self-harm in the Multicentre Study of Self-harm: a long-term follow-up study. Lancet Psychiatry 2019; 6: 1021-30.
6. Whiting D, Fazel S. How accurate are suicide risk prediction models? Asking the right questions for clinical practice. Evid Based Ment Health 2019; 22: 125-8.
7. Bolton JM, Gunnell D, Turecki G. Suicide risk assessment and intervention in people with mental illness. BMJ 2015; 351: h4978.
8. Chen Q, Zhang-James YL, Barnett EJ, et al. Predicting suicide attempt or suicide death following a visit to psychiatric specialty care: a machine learning study using Swedish national registry data. PLoS Med 2020; 17: e1003416.
9. Fazel S, Wolf A, Larsson H, Mallett S, Fanshawe TR. The prediction of suicide in severe mental illness: development and validation of a clinical prediction rule (OxMIS). Transl Psychiatry 2019; 9: 98.
10. Fazel S, Vazquez-Montes M, Molero Y, et al. Risk of death by suicide following self-harm presentations to healthcare: development and validation of a multivariable clinical prediction rule. BMJ Ment Health; in press.
11. Botchway S, Tsiachristas A, Pollard J, Fazel S. Cost-effectiveness of implementing a suicide prediction tool (OxMIS) in severe mental illness: economic modeling study. Eur Psychiatry 2022; 66: e6.
12. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA 2015; 313: 409-10.
The article [1] reports the development of a new dementia risk score, claiming superior area under the curve (AUC) statistics compared with previously published risk scores. However, the representation and use of at least one of those prior risk scores is highly inaccurate, and this raises concerns about the overall integrity of the publication.
1. The authors incorrectly state that the ANU-ADRI risk index [2] was ‘developed in cohorts in Australia’ (abstract and page 2). This is wrong: it was not developed directly from any cohorts. Rather, as described in the original publication [2], it was developed using an evidence-based medicine approach that collated the effect sizes of risk factors drawn from systematic reviews. Those systematic reviews drew on the wider literature, with most cohorts being from North America, the UK, and Europe. The tool was validated in three external cohort studies. Data from Australia were rarely included in the meta-analyses from which the risk score was derived [2].
2. The authors say that the ANU-ADRI ‘was developed for older individuals (60+), ….however our sensitivity analysis also performed poorly when restricting our cohort to an age range matching its development sample’.
There are two problems with this sentence:
a. There was no development cohort for the ANU-ADRI so it could not have been possible for the described sensitivity analysis to have been undertaken.
b. Most cohort studies that contribut...
Ruuska et al.’s [1] analysis incorrectly concluded that medical gender reassignment (GR) did not reduce suicide mortality because they had assessed that “the suicide mortality of both those [presenting with gender dysphoria (GD)] who proceeded and did not proceed to GR did not statistically significantly differ from that of controls.” On the contrary, by conventional criteria, the suicide mortality of the non-GR group was significantly higher than controls while that of the GR group was not. Ruuska et al. [1] reported: "Adjusted HRs [hazard ratios] for suicide mortality were 3.2 [for non-GR] (95% CI 1.0 to 10.2; p=0.05) and 0.8 [for GR] (95% CI 0.2 to 4.0; p=0.8), respectively." By this finding, those dysphorics who had undergone GR were no more likely to have committed suicide than were general population controls, while those who had not undergone GR were more than three times as likely to have committed suicide. The latter difference is reported with a p-value of .05 and a 95% confidence interval that does not extend below 1.0.
By the prevailing standards of scientific inference, and in virtually any other study, such a finding would be assessed as statistically significant (or perhaps, depending on rounding error, trivially below significance), but Ruuska et al. [1] judged it not to be so. They achieved this anomaly by fiat, announcing: “In order to avoid type 1 error due to multiple testing and the large data size, the cut-off for statistical sign...
We appreciate the interest in understanding the health and well-being of transgender persons and their unique care needs, particularly youth and adolescents. There are, however, several methodological missteps in the recent article by Ruuska et al. published in BMJ Mental Health. The authors have fallen into a number of methodological mistakes and fallacies that make their conclusion that gender-affirming interventions have no effect on suicide mortality quite untenable.
First, the authors have not shared sufficient data to support their conclusions that gender-affirming interventions do not reduce suicide. A properly reported analysis must show the events and characteristics of all transgender persons referred for care, as well as the sub-groups (hormonal and/or surgical interventions vs. no interventions). Similarly, with respect to the shortfalls of their analytic methodology, the authors have not demonstrated that they checked the proportional hazards assumption on which their Cox models rely. Given the rapidly changing political and social environments for transgender people in countries around the world, including Finland, the assumption that the hazards are proportional over time must be examined and explained. The authors also violate standard practice by not showing Kaplan-Meier curves for each of the outcomes of interest, in addition to providing the rates of all-cause mortality and suicide in each risk group discussed.
Second, with onl...
The comments by Quinlivan and colleagues provide an opportunity to respond to some common misunderstandings of suicide risk assessment tools, and more broadly, prediction modelling. First, their comments are based on the mistaken assumption that all suicide prediction tools invariably have to classify individuals into low-risk versus high-risk groups. Unlike the earlier tools referred to in the response (all of which are classifiers, i.e. stratify people into risk categories), OxSATS provides probabilistic estimates of suicide risk. The benefits of probability estimation over classification have been discussed widely in the methodological literature,[1,2] and models which produce continuous risk scores are routinely used in other areas of medicine (such as the Framingham and QRISK models for cardiovascular disease risk prediction).
Second, Quinlivan and colleagues have compared the area under the curve (AUC) of OxSATS to earlier tools and highlighted the discrepancy in the interpretation of the findings. However, this misses the methodological point that what is considered good discrimination performance for a prediction model depends on the clinical area and available alternatives. While very high AUC values (e.g. above 0.90) can be reported for diagnostic prediction,[3] such values are rare in prognostic modelling, where AUC values in the 0.70s are found for best-performing models for incident cardiovascular disease[4] and adverse health outcomes (including mortal...
References
(1) https://www.sciencedirect.com/science/article/pii/S0955395909001546?via%...
(2) https://journals.sagepub.com/doi/10.1177/17579139221093163?icid=int.sj-f...