We appreciate the feedback from Streed et al., Sullins, Vesterinen, and Meyerowitz-Katz.
There has been discussion regarding whether suicide mortality differs between gender-referred adolescents who proceeded to gender reassignment (GR) and those who did not. We did not examine this. Our key finding was that suicides are rare among gender-referred adolescents, primarily explained by severe psychiatric morbidity. We consider the division of the gender-referred group into those who proceeded to GR (GR+) and those who did not (GR-) as additional information. No difference in suicide mortality was found between either GR group and matched controls in our final model. More detailed subgroup analyses could not be presented due to data security regulations by Statistics Finland, given the small number of suicides. For the same reason, we could not present the Kaplan-Meier curve requested by Streed et al., nor conduct analyses one variable at a time. Special permission was required to compare the GR subgroups with the controls in the final model.
There has also been debate about our use of p<0.01 as the threshold for statistical significance and the wide confidence intervals in suicide mortality between the GR subgroups. We consider this threshold justified given the sample size and use of multivariate models to minimise the role of chance. Even if p<0.05 were used, the finding would still be only borderline significant, and the wide confidence intervals of the hazard ratio indicate that the small number of suicides permits no claims of a real difference between the GR- group and the controls. Wide confidence intervals are expected with so few suicides. However, Statistics Finland permitted us to share that, in a Cox regression comparing suicide mortality between the GR+ and GR- groups without other variables, there was no significant difference (p=0.3). In a simple cross-tabulation, the p-value was even higher (p>0.3). Thus, no clear evidence of a difference in suicide mortality between the groups emerged even in a direct comparison.
Streed et al. called for testing the proportional hazards assumption of the Cox regression. We visually inspected Schoenfeld residuals related to the model’s variables and observed no detrimental change over time. However, with so few suicides, detecting clear changes in estimates over time is difficult.
Streed and Vesterinen suggest that using psychiatric specialist care visits as a control variable may constitute overcontrol. However, earlier research indicates that many individuals with gender dysphoria (GD) already have severe psychiatric disorders before their contact with gender identity clinics [1] or evidence of transgender status [2]. Unfortunately, we cannot reliably determine the chronological order of these conditions from registry data. Nonetheless, regardless of whether individuals proceed with GR treatment, the need for psychiatric care remains significant and often persists [1,3,4]. Previous research has also shown that psychiatric care needs may even increase, both among those who undergo GR and among those who do not [1]. Given the small number of suicides in this study, analysing detailed relationships between variables is difficult.
Lastly, Vesterinen expresses concern that suicide mortality in GD youth before 2011 may not be represented in the data, as minors were not generally referred to gender identity clinics before that year. However, minors were assessed at the clinics before 2011, and the overall suicide mortality rate among those under 18 in Finland is very low [5].
In summary, the small number of suicides complicates statistical analysis. However, this study remains the first to directly compare suicide mortality among gender-referred adolescents to a matched control population. Our primary finding is that suicide mortality was, fortunately, very low. We welcome further research on this topic and support the call for future meta-analyses.
1. Kaltiala R, Holttinen T, Tuisku K. Have the psychiatric needs of people seeking gender reassignment changed as their numbers increase? A register study in Finland. Eur Psychiatry 2023;66(1):e93. doi: 10.1192/j.eurpsy.2023.2471.
2. Becerra-Culqui TA, Liu Y, Nash R, et al. Mental health of transgender and gender nonconforming youth compared with their peers. Pediatrics 2018;141(5):e20173845. doi: 10.1542/peds.2017-3845.
3. Dhejne C, Lichtenstein P, Boman M, et al. Long-term follow-up of transsexual persons undergoing sex reassignment surgery: cohort study in Sweden. PLOS One 2011;6:e16885. doi: 10.1371/journal.pone.0016885.
4. Hisle-Gorman E, Schvey NA, Adirim TA, et al. Mental healthcare utilization of transgender youth before and after affirming treatment. J Sex Med 2021;18:1444–54. doi: 10.1016/j.jsxm.2021.05.014
I eagerly write in response to the perspective piece entitled “Perfect storm: emotionally based school avoidance in the post-COVID-19 pandemic context.”1 The authors provide a timely and crucial analysis of the rising trend in emotionally based school avoidance (EBSA) following the COVID-19 pandemic. However, while their overview of the issue and proposed interventions are commendable, I believe there are several critical points that warrant further inquiry and discussion.
The authors rightly identify the complex array of factors contributing to EBSA, including school, family, and child-based risk factors. However, their analysis would benefit from a more nuanced investigation of the socioeconomic disparities exacerbated by the pandemic. Research has previously shown that children from lower-income families were disproportionately affected by school closures and faced greater challenges in accessing remote learning resources.2 Such a preexisting inequality may have further exacerbated EBSA patterns among vulnerable populations—this deserves greater emphasis in developing targeted interventions.
The authors acknowledge the need for multi-component approaches across education, health, and social care sectors, and their call for early intervention that does not impose strict absenteeism criteria is laudable. My only worry is that this approach may present challenges in terms of resource allocation and identifying those most in need of support. There needs to be a more detailed discussion of how to balance early intervention with targeted support for those at highest risk.
While the authors briefly mention the importance of addressing bullying and relationship issues in schools, I believe this point deserves more attention. Recent research has highlighted the significant role of peer relationships and school climate in influencing school attendance and mental health outcomes.3 A more in-depth analysis of school-based interventions that focus on improving social dynamics and fostering a supportive school environment would likely enhance the authors’ proposed multi-component approach.
Additionally, the authors’ emphasis on parental engagement is well-founded, but their discussion would benefit from first addressing potential barriers to parental involvement. Factors such as work schedules, language barriers, and cultural differences can significantly impact parental engagement in school-based interventions.4 Addressing these challenges should be a key consideration in developing effective EBSA interventions.
Lastly, while the authors mention the need for contextually-relevant interventions, there is limited discussion on how to tailor these interventions to diverse cultural and ethnic backgrounds. Given the increasing diversity in many school systems,5 culturally adapted interventions that consider varying attitudes towards mental health and education across different communities are essential.6
Lester and Michelson’s perspective piece offers a convincing springboard for addressing the challenge of EBSA in the postpandemic era. Their work illuminates the pressing need for a paradigm shift in our approach to school attendance and mental health, in order to create more equitable educational ecosystems. The challenges posed by EBSA offer an opportunity to reimagine our evidence-based educational structures and support systems, potentially catalyzing structural changes that extend beyond the immediate issue at hand. In this light, as we continue to refine our understanding and approaches, we may find that the solutions to EBSA contribute to a broader reimagining of education that better serves the diverse needs of 21st-century learners.
References
1. Lester KJ, Michelson D. Perfect storm: emotionally based school avoidance in the post-COVID-19 pandemic context. BMJ Ment Health. 2024;27(1):e300944. Published 2024 Apr 5. doi:10.1136/bmjment-2023-300944
2. Engzell P, Frey A, Verhagen MD. Learning loss due to school closures during the COVID-19 pandemic. Proc Natl Acad Sci U S A. 2021;118(17):e2022376118. doi:10.1073/pnas.2022376118
3. Arseneault L. Annual Research Review: The persistent and pervasive impact of being bullied in childhood and adolescence: implications for policy and practice. J Child Psychol Psychiatry. 2018;59(4):405-421. doi:10.1111/jcpp.12841
4. Reardon T, Harvey K, Baranowska M, O'Brien D, Smith L, Creswell C. What do parents perceive are the barriers and facilitators to accessing psychological treatment for mental health problems in children and adolescents? A systematic review of qualitative and quantitative studies. Eur Child Adolesc Psychiatry. 2017;26(6):623-647. doi:10.1007/s00787-016-0930-6
5. Hussar WJ, Bailey TM. Projections of Education Statistics to 2026. National Center for Education Statistics; 2018. Washington, DC.
6. Rathod S, Gega L, Degnan A, et al. The current status of culturally adapted mental health interventions: a practice-focused review of meta-analyses. Neuropsychiatr Dis Treat. 2018;14:165-178. Published 2018 Jan 4. doi:10.2147/NDT.S138430
Thank you for your article, it is always good to see more visual techniques to display research data. I would like to suggest some improvements for your consideration.
I am a little concerned that the scale of your suggested plots is linear, which means that larger segments have a disproportionately larger area. The human visual processing system tends to judge by area when comparing objects, something that was well understood by Florence Nightingale when she created her rose diagrams. Common practice in data visualisation is to use area for circles and arcs for this reason. This could be addressed by using a square root based scale rather than a linear one.
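To illustrate the point about area, here is a minimal sketch in Python. It assumes each wedge is a circular sector of fixed angle whose radius encodes the value; the function names, the eight-wedge layout, and the example values are hypothetical and are not taken from the article. With a linear radius scale, halving the value quarters the area, whereas a square root scale keeps area proportional to the value.

```python
import math


def wedge_radius_linear(value, max_value, max_radius=1.0):
    """Radius grows in direct proportion to the value (the scale being questioned)."""
    return max_radius * value / max_value


def wedge_radius_sqrt(value, max_value, max_radius=1.0):
    """Radius grows with the square root of the value, so area tracks the value."""
    return max_radius * math.sqrt(value / max_value)


def wedge_area(radius, angle):
    """Area of a circular sector with the given radius and angle (radians)."""
    return 0.5 * angle * radius ** 2


# Hypothetical layout: eight equal wedges, values on a 0-100 scale.
ANGLE = 2 * math.pi / 8
MAX_VALUE = 100

for value in (25, 50, 100):
    lin = wedge_area(wedge_radius_linear(value, MAX_VALUE), ANGLE)
    sqr = wedge_area(wedge_radius_sqrt(value, MAX_VALUE), ANGLE)
    full = wedge_area(1.0, ANGLE)
    print(f"value={value:3d}  linear area share={lin / full:.2f}  sqrt area share={sqr / full:.2f}")
```

Under the square root scale the printed area share equals the value's share of the maximum, which is the property a reader judging by area would expect.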
I also wonder whether all of the wedges having the same scale is appropriate? I would think that some outcomes (such as mortality) might have very small differences, but these would be of great consequence, compared with a minor adverse event which would look visually quite a bit more important. Having a different scale for each wedge - perhaps based on clinically important differences - could be more intuitive.
Also, a minor point, but red-green colour scales can be challenging to interpret for some people with different colour perception.
I am also concerned that it would be challenging to compare many different plots for large numbers of interventions. I wonder if the use of parallel coordinates, an established technique for multivariate comparisons, might address some of these issues?
The authors present a retrospective cohort study of mostly adults who were referred to clinics in Finland for the treatment of gender dysphoria. However, one of the most important findings in this paper seems to have been missed in the discussion section.
The authors report that suicide risk was not statistically different between people referred for treatment and a matched cohort, with a hazard ratio of 1.8 (0.6-4.8) for people referred for gender dysphoria when compared to the control in the fully adjusted model (Table 3). However, the authors also conducted this analysis using only people who had accessed gender-affirming medical interventions (categorized as "Hormonal or surgical gender reassignment interventions" in Table 1) and those who hadn't (GR+ and GR-). In many ways, this is the more important analysis, as it addresses the question of medical treatment rather than medical referral.
The authors do note in their conclusion that there were no statistically significant differences in all-cause mortality when the data is split up into these groups, with GR- having a HR of 1.4 (0.6-3.3) and GR+ 0.7 (0.2-2). However, the results also show that the adjusted suicide mortality HRs for the GR- and GR+ groups compared to the matched control were 3.2 (1-10.2) and 0.8 (0.2-4) respectively. While the authors do not present an adjusted analysis of suicide mortality comparing these two groups directly, this implies a statistically significant associated reduction in risk of suicide of roughly 50% for people referred to gender clinics in Finland and who had treatment when compared to those who were referred but did not access treatment.
The authors report in their discussion that "the suicide mortality of both those who proceeded and did not proceed to GR did not statistically significantly differ from that of controls", indicating that presumably the p-value of the increased risk for GR- individuals was marginally above the threshold for significance (i.e. 0.051-0.0544). However, this arbitrary distinction is not useful. A more useful point is to note that the risk of suicide is substantially elevated in the GR- group, but with so few events the 95% CI reflects quite a bit of uncertainty as to the true rate. These results are consistent with everything from minimal difference in this cohort to a 10x increase.
This result appears to undercut the authors' stated conclusion that their findings do "not support the claims that GR is necessary in order to prevent suicide". While the study has significant limitations, noted by the authors, their results do seem to support the argument that GR treatment is associated with a reduced risk of suicide for people with clinical gender dysphoria in Finland when compared to people with gender dysphoria who do not receive treatment and a matched control. These results are both uncertain due to the low sample size, but imply that gender affirming care may be linked to lower suicide rates in people with gender dysphoria while a lack of such care may increase suicides.
I am writing here to express my concern that this paper might have some serious flaws that have apparently passed the peer review and editorial processes.
I am a doctoral candidate in Classics and unfortunately lack sophisticated skills at statistics, but due to my background both as a doctoral researcher in another field and as a former patient in a gender identity policlinic in Finland, I do believe I am capable of raising one question about the methodology and another about the analysis of the results.
The paper compares the all-cause mortality and suicide rates between individuals referred to gender identity clinics in Finland between 1996 and 2019 and a control group. The age limit is <23. The methodological problem that I perceive is that the paper fails to take into consideration that before 2011, minors were generally not granted referrals to gender identity clinics (see e.g. https://yle.fi/a/3-10707095 (in Finnish); https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396787/). Consequently, any mortalities among the gender-incongruent youth under 18 years old would not be associated in the statistics with referrals to gi clinics but might be included in the control group. For example, had I succeeded in my suicide attempt at the age of 15 in 2008, this would not be classified as a gender-dysphoria-linked death in this paper despite my several attempts to get a referral to a gi clinic prior to attempting suicide. This considerable limitation in the dataset is not mentioned at all in the paper.
The problem that I perceive in the analysis of the evidence is that the authors of the paper interpret the elevated suicide mortality in the gi-referred group as a consequence of mental health problems and not of (untreated) gender dysphoria. They draw the conclusion that gender dysphoria is not predictive of an elevated suicide risk in young gender-incongruent individuals. However, this conclusion seems to require that the authors completely reject the quite likely scenario that the mental health problems and suicide could both be a consequence of (untreated) gender dysphoria. On the other hand – as someone with no medical expertise –, I would consider it rather unexpected if suicides caused by untreated gender dysphoria did tend to occur without the presence of mental health issues such as major depression, anxiety-related conditions, eating disorders (for example, to control menstruation and body shape), or previous suicide attempts. However, this is what the authors of the paper seem to expect in the analysis.
The conclusions drawn in the paper and the medical implications made at the end (i.e., stressing the treatment of mental health problems rather than gender dysphoria that may be causing them) are likely to be taken into consideration when the treatment options of gender-incongruent youth are evaluated and reshaped both in Finland and elsewhere. This is why I am deeply concerned that the shortcomings pointed out above have the potential of resulting in detrimental consequences for young people seeking medical assistance with gender dysphoria in the near future. I do believe that treatment should be based on research and not opinion, but it is of utmost importance that the research is of approvable quality and there can be no suspicion of bias.
I would be most pleased if the authors, reviewers, or the editorial board would consider my concerns and, hopefully, demonstrate that they are unsubstantiated after all.
With kind regards
Jamie Vesterinen
Doctoral researcher
Doctoral Programme in History and Cultural Heritage
University of Helsinki
The article [1] reports the development of a new dementia risk score, leveraging off superior area under the curve (AUC) statistics compared with previously published risk scores. However, the representation and use of at least one of those prior risk scores is highly inaccurate and this raises concerns about the overall integrity of the publication.
1. The authors incorrectly state that the ANU-ADRI risk index[2] was ‘developed in cohorts in Australia’ (abstract and page 2). This is wrong: it was not developed directly from any cohorts. Rather, as described in the original publication[2], it was developed using an evidence-based medicine approach that collated the effect sizes of risk factors drawn from systematic reviews. The systematic reviews draw from the wider literature, with most cohorts being from North America, the UK, and Europe. The tool was validated in three external cohort studies. Data from Australia were rarely included in the meta-analyses from which the risk score was derived [2].
2. The authors say that the ANU-ADRI ‘was developed for older individuals (60+), ….however our sensitivity analysis also performed poorly when restricting our cohort to an age range matching its development sample’.
There are two problems with this sentence:
a. There was no development cohort for the ANU-ADRI so it could not have been possible for the described sensitivity analysis to have been undertaken.
b. Most cohort studies that contributed to the meta-analyses from which the ANU-ADRI was developed included samples over the age of 70. In addition, the ANU-ADRI validation studies, which were the Rush Memory and Ageing Study, the Kungsholmen Project and the US Cardiovascular Health Study, had baseline mean ages of 79.8, 81.5 and 72.3 respectively[3]. Hence the ANU-ADRI was developed and validated on studies that comprised much older participants than the UK Biobank (UKB), which had a mean baseline age of 59.97, and the Whitehall II study, which had an age range of 35 to 55.
3. The authors report on page 5 that the UKBDRS performed similarly to the ANU-ADRI and DRS when evaluated in sensitivity analysis in the sample aged over 60 in the Whitehall study. As both the DRS and the ANU-ADRI were developed on older cohorts, this is the more appropriate comparison (even though it is not ideal, as the age structure of the samples is still not directly comparable). However, instead of being reported as the main analysis and in the abstract, it was relegated to supplementary material. The authors do not report the mean and distribution of age in the comparison samples, or compare samples on mean age of incident dementia.
4. The comparison between the ANU-ADRI and the UKBDRS is highly biased by differences in the weights given to age in each risk score. The comparison is not valid because the UKBDRS uses age weights obtained from the dataset, whereas the ANU-ADRI assigns a weight of 0 to participants aged <65 (almost 82% of the data). Age is the most significant predictor of dementia and alone provided a very high ROC in the UKB data (0.77 for age only, compared with 0.80 for the UKBDRS). The comparison of the UKBDRS with the ANU-ADRI would be valid if they were compared without age in both models. We expect the results would have been similar for the UKBDRS and the ANU-ADRI if they had been compared without age.
References
[1] Anaturk M, Patel R, Ebmeier KP, Georgiopoulos G, Newby D, Topiwala A, de Lange A-M, Cole JH, Jansen MG, Singh-Manoux A, Kivimäki M, Suri S. Development and validation of a dementia risk score in the UK Biobank and Whitehall II cohorts. BMJ Ment Health. 2023;26.
[2] Anstey KJ, Cherbuin N, Herath PM. Development of a New Method for Assessing Global Risk of Alzheimer's Disease for Use in Population Health Approaches to Prevention. Prevention Science. 2013;14:411-21.
[3] Anstey KJ, Cherbuin N, Herath P, Qui C, Kuller LH, Lopez OL, Wilson RS, Fratiglioni L. A self report risk index to predict occurrence of dementia in three independent cohorts of older adults: The ANU-ADRI. PLoS One. 2014;9:e86141.
Ruuska et al.’s [1] analysis incorrectly concluded that medical gender reassignment (GR) did not reduce suicide mortality because they had assessed that “the suicide mortality of both those [presenting with gender dysphoria (GD)] who proceeded and did not proceed to GR did not statistically significantly differ from that of controls.” On the contrary, by conventional criteria, the suicide mortality of the non-GR group was significantly higher than controls while that of the GR group was not. Ruuska et al. [1] reported: "Adjusted HRs [hazard ratios] for suicide mortality were 3.2 [for non-GR] (95% CI 1.0 to 10.2; p=0.05) and 0.8 [for GR] (95% CI 0.2 to 4.0; p=0.8), respectively." By this finding, those dysphorics who had undergone GR were no more likely to have committed suicide than were general population controls, while those who had not undergone GR were more than three times as likely to have committed suicide. The latter difference is reported with a p-value of .05 and a 95% confidence interval that does not extend below 1.0.
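As a rough cross-check of the figures quoted above, the p-value implied by a hazard ratio and its 95% confidence interval can be approximated on the log scale. The sketch below is illustrative only: it assumes a normal approximation for the log hazard ratio, takes the rounded values as quoted, and the function name `p_from_ci` is ours rather than anything used by Ruuska et al.

```python
import math

def p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p-value implied by a ratio and its 95% CI.

    Assumes the log of the ratio is approximately normal, so the CI is
    symmetric on the log scale. Inputs are rounded published values, so the
    result is only approximate.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Values as quoted from Ruuska et al. [1]
z_non_gr, p_non_gr = p_from_ci(3.2, 1.0, 10.2)  # non-GR vs controls
z_gr, p_gr = p_from_ci(0.8, 0.2, 4.0)           # GR vs controls

print(f"non-GR: z = {z_non_gr:.2f}, p = {p_non_gr:.3f}")
print(f"GR:     z = {z_gr:.2f}, p = {p_gr:.3f}")
```

Because the published bounds are rounded to one decimal place, the recovered p-values can only be expected to agree with the reported ones to about one significant figure.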
By the prevailing standards of scientific inference, and in virtually any other study, such a finding would be assessed as statistically significant (or perhaps, depending on rounding error, trivially below significance), but Ruuska et al. [1] judged it not to be so. They achieved this anomaly by fiat, announcing: “In order to avoid type 1 error due to multiple testing and the large data size, the cut-off for statistical significance was set at a p<0.01” [1] rather than the usual standard of p<0.05. This decision is highly questionable, for two reasons. First, the size of the data analyzed is not nearly large enough to justify it: 18,726 cases (122,358 person-years). Preference for a .01 criterion usually involves data numbering in the hundreds of thousands, at least, and even then rarely precludes recognition of substantively important findings that may have a higher p-value below .05. Of the ten most recent studies using similar data (extractions from Finnish registers that are as large as or larger than that of Ruuska et al. [1]), with numbers of cases ranging from 24,341 to 5.6 million (the entire Finnish population), or 41.9 million person-years, none rejected findings between .05 and .01 as not being statistically significant (see Endnote 1). The most methodologically similar study of the same subject as that of Ruuska et al. [1], Dhejne et al.’s [11] 2011 examination of the Swedish population register, also used the conventional .05 standard. None of these large-data register studies rejected as insignificant a finding with p less than .05 but greater than .01, or a 95% confidence interval that did not include nullity, except for Ruuska et al. [1]. Co-authors of the Ruuska et al. [1] study also imposed a restrictive .01 standard for significance on an earlier register study of gender reassignment, [12] but did not do so for similarly-sized register studies of other topics, for example schizophrenia (17,112 cases) [13] and depression (16,842 cases). [14]
More importantly, second, adjusting the significance criterion to reduce the prospect of one type of error inescapably imbalances or biases, by the same amount, the analysis toward the opposite error. In this case, it perversely increases the risk of the very type of error the stricter criterion is trying to avoid. This is because adopting a stricter significance criterion to reduce the risk of Type I errors due to perceived statistical significance raises, by the same amount, the risk of Type I errors due to perceived lack of statistical significance; and the inference that GR did not reduce suicide is based on perceived statistical insignificance. If the goal were not to affirm too easily substantive findings that may be due to measurement imprecision, then for conclusions that depend on similarity between groups rather than difference, the criterion for assessing true difference should be made less strict (perhaps .10), not more strict. A better practice, followed in many studies, might be to avoid unilaterally adjusting the conventional scientific standard of inference, but present findings as more or less certain with reference to that standard depending on their reported actual p-value.
Recognition that suicide risk may be lower with GR than without it in this analysis complicates, but does not negate, Ruuska et al.’s [1] substantive observation that this comparison “does not support the claims that GR is necessary in order to prevent suicide.” It complicates it because a similar previous analysis of the same register data involving two of the same authors [12] already found that, when adjusted for year of initial contact with Finland’s gender identity service, follow-up contact with psychiatric care was emphatically no different by medical gender reassignment (GR) status. In that study, the hazard ratio (HR) for those gender dysphorics who had undergone GR, 3.8 (95% CI 3.6-4.1), was virtually identical to that of those who had not done so, 3.9 (95% CI 3.6-4.2). Both of these HRs differed significantly from controls but did not differ significantly from each other. Unlike Ruuska et al., [1] they provided strong support for the earlier study’s conclusion that these findings “do not suggest that medical GR interventions resolve psychiatric morbidity among people experiencing gender distress.”[12]
This does not mean that Ruuska et al.’s observation is incorrect. There are at least two plausible explanations for the apparent contradiction of lower suicide mortality following GR despite similar high psychiatric morbidity, compared to GD registrants who did not receive GR. The first explanation reflects the time trajectory of suicide mortality. Dhejne et al. (2011), in the only other national registry study of completed suicide, observed that “survival of transsexual persons [undergoing medical gender reassignment] started to diverge from that of matched controls after about 10 years of follow-up.” [11] Up until 10 years post GR, the suicidal mortality of the Swedish registrants in the Dhejne et al. (2011) study did not differ from that of the general population controls, which is exactly what is observed, at an average follow-up of only 6.5 years, among the Finnish registrants in the Ruuska et al. [1] study. After 10 years, however, Dhejne et al. (2011) found that suicide mortality among Swedish GR recipients increased dramatically, a possibility that is not at all foreclosed for their Finnish counterparts by the findings of Ruuska et al. [1] .
A second explanation is suggested by the fact that, unlike in many jurisdictions, in Finland only 38% of GD diagnosed persons in treatment proceeded to medical GR. This suggests the presence of assessment, screening and monitoring processes that may inhibit some of the negative consequences of psychiatric morbidity, for example by ensuring better social support or medication compliance, than may be the case in other settings.
These two explanations are not mutually exclusive; both may be the case to some extent. Neither of them, moreover, impairs in any way the main conclusion of Ruuska et al.’s [1] study, that apart from attendant psychiatric morbidity, gender dysphoria in itself does not predict suicide mortality. Both also strongly support the important clinical implication of the study, that it is “of utmost importance to identify and appropriately treat mental disorders in adolescents experiencing gender dysphoria to prevent suicide.”
Endnote 1: The studies are: Böckerman et al.’s study of opioid use and employment rates (41.9 million person-years) [2]; Sillanpää et al.’s estimation of childhood epilepsy incidence (entire population rates); [3] Vaajala et al.’s (2024) study of fear of childbirth (comparing 211,202 cases from the Finnish Birth Register); [4] Ax et al.’s (2024) study of hand surgery and sick leave (24,341 cases); [5] Valtanen et al.’s (2024) comparison of psychiatric medication use by treatment strategy (44,685 cases); [6] Raudasoja et al. (2024), examining corrective surgery after treatment of distal radial fractures (41,418 cases); [7] Terho et al.’s (2024) study of frozen embryo transfer child growth rates (35,894); [8] Tamlander et al.’s (2024) analysis of implications of polygenic risk for colorectal cancer screening (453,733 cases); [9] Kolari et al. (2024), examining ADHD medication use among Finnish children (41,920 cases); [10] and Holm et al.’s (2024) study of nonaffective psychosis diagnosis, examining 49,164 persons.
References
1 Ruuska S-M, Tuisku K, Holttinen T, et al. All-cause and suicide mortalities among adolescents and young adults who contacted specialised gender identity services in Finland in 1996–2019: a register study. BMJ Ment Health. 2024;27:e300940.
2 Böckerman P, Haapanen M, Hakulinen C, et al. Prescription opioid use and employment: A nationwide Finnish register study. Drug and Alcohol Dependence. 2021;227:108967.
3 Sillanpää ML, Camfield P, Löyttyniemi E. The changing incidence of childhood epilepsy in Finland. Seizure (London, England). 2024;117:20–7.
4 Vaajala M, Mattila VM, Kuitunen I. Fear of childbirth prolongs interpregnancy interval: A nationwide register-based quantile logistic regression analysis. European Journal of Obstetrics & Gynecology and Reproductive Biology: X. 2024;21:100281.
5 Ax M, Palola V, Ponkilainen V, et al. Duration of sick leave after operated and non-operated distal radial fracture: a Finnish cohort study of 19,995 patients. The Journal of hand surgery, European volume. 2024;49:316–21.
6 Valtanen K, Seikkula J, Kurtti M, et al. Ten-year patterns of psychiatric medications dispensed to adolescent in Finland: Open dialogue-informed practice in Western Lapland as compared to practice in other Finnish regions. Personalized Medicine in Psychiatry. 2024;43–44:100117.
7 Raudasoja L, Vastamäki H, Aspinen S, et al. Distal radial fractures: a nationwide register study on corrective osteotomies after malunion. J Hand Surg Eur Vol. 2024;49:329–33.
8 Terho AM, Tiitinen A, Salo J, et al. Growth of singletons born after frozen embryo transfer until early adulthood: a Finnish register study. Human reproduction (Oxford). 2024;39:604–11.
9 Tamlander M, Jermy B, Seppälä TT, et al. Genome-wide polygenic risk scores for colorectal cancer have implications for risk-based screening. British journal of cancer. 2024;130:651–9.
10 Kolari TA, Vuori M, Rättö H, et al. Incidence of ADHD medication use among Finnish children and adolescents in 2008–2019: a need for practice changes? Scand J Public Health. 2024;14034948231219826.
11 Dhejne C, Lichtenstein P, Boman M, et al. Long-Term Follow-Up of Transsexual Persons Undergoing Sex Reassignment Surgery: Cohort Study in Sweden. PLoS One. 2011;6. doi: 10.1371/journal.pone.0016885
12 Kaltiala R, Holttinen T, Tuisku K. Have the psychiatric needs of people seeking gender reassignment changed as their numbers increase? A register study in Finland. Eur Psychiatry. 2023;66:e93.
13 Holttinen T, Pirkola S, Kaltiala R. Schizophrenia among young people first admitted to psychiatric inpatient care during early and middle adolescence. Schizophrenia research. 2023;252:103–9.
14 Niemi J, Holttinen T, Kaltiala R. Adolescents with severe depressive and anxiety – a nationwide register study of inpatient-treated adolescents 1980–2010. Psychiatria fennica (Online). 2023;54:66–79.
We appreciate the interest in understanding the health and well-being of transgender persons and their unique care needs, particularly youth and adolescents. There are, however, several methodological missteps in the recent article by Ruuska et al. that has been published in BMJ Mental Health. The authors have fallen into a number of methodological mistakes and fallacies that make quite untenable their conclusions that gender-affirming interventions have no effect on suicide mortality.
First, the authors have not shared sufficient data to support their conclusions that gender-affirming interventions do not reduce suicide. A properly reported analysis must show the events and characteristics of all transgender persons referred for care, as well as the sub-groups (hormonal and/or surgical interventions vs. no interventions). Similarly, with respect to the shortfalls of their analytic methodology, the authors have not demonstrated that they checked the proportional hazards assumption on which their Cox models rely. Given the rapidly changing political and social environments for transgender people in countries around the world, including Finland, the assumption that the hazards are proportional over time must be examined and explained. The authors also violate standard practice by not showing Kaplan-Meier curves for each of the outcomes of interest, in addition to providing the rates of all-cause mortality and suicide in each risk group discussed.
Second, with only seven suicides among all transgender persons referred for care (medical interventions vs. none), the authors have likely presented over-fitted adjusted Cox models given the number of covariates that were included. This is particularly evident in the very wide confidence intervals for many of the hazard ratios presented. With so few events, the authors would perhaps be reasonably expected to adjust for one variable at a time.
Third, the authors have not presented a theoretical framework for the variables they include in their models. A directed acyclic graph (DAG) would provide important grounding in how each variable is handled and accounted for in relation to the outcome of suicide. In particular, including mental health visits—which are a component of gender-affirming care—is a fatal over-adjustment in the models that purport to show no relationship between receipt of gender-affirming care and death by suicide.
Lastly, adjusted hazard ratios for suicide mortality were 3.2 (95% CI: 1.0 to 10.2; p=0.05) among referred persons who had not received medical and surgical interventions and 0.8 (95% CI: 0.2 to 4.0; p=0.8) among referred persons who did receive interventions, each of these compared to cisgender controls, but not each other. The clinically meaningful sizes of the hazard ratios themselves demonstrate substantial likelihood that receipt of gender-affirming care is associated with a decrease in the risk of suicide mortality. Meanwhile, the enormous confidence intervals demonstrate a lack of precision within the models, which suggest insufficient statistical power notwithstanding the large overall sample sizes examined. The authors here succumb to the fallacy that a lack of statistical difference in a testing context equates with a lack of any difference at all.
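The two subgroup hazard ratios quoted above were each estimated against controls, not against each other. One common back-of-envelope way to compare two such estimates is to take their ratio on the log scale and combine the standard errors recovered from the reported confidence intervals. The sketch below illustrates that technique under loudly stated assumptions: it treats the two estimates as independent (which is only approximate here, since both subgroups are compared with the same controls), it relies on rounded published values, and the helper names are hypothetical. It is not a substitute for a direct adjusted comparison on the underlying data.

```python
import math

def log_se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(ratio) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def ratio_of_hazard_ratios(hr_a, ci_a, hr_b, ci_b, z_crit=1.96):
    """Ratio of two hazard ratios with an approximate 95% CI and two-sided p-value.

    Assumes the two estimates are independent and log-normally distributed.
    """
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.hypot(log_se_from_ci(*ci_a), log_se_from_ci(*ci_b))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    interval = (math.exp(diff - z_crit * se), math.exp(diff + z_crit * se))
    return math.exp(diff), interval, p

# Rounded values as quoted in the letter: no interventions vs interventions
ratio, (ci_low, ci_high), p_value = ratio_of_hazard_ratios(
    3.2, (1.0, 10.2),
    0.8, (0.2, 4.0),
)
print(f"ratio of HRs = {ratio:.1f}, 95% CI {ci_low:.2f} to {ci_high:.1f}, p = {p_value:.2f}")
```

Any reading of the output should keep the independence assumption and the very small number of events in mind.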
Given the thankfully small number of suicides within the sample from Finland (7 out of 2083 identified transgender persons), it may be that a future study should pool multiple samples across comparable settings (e.g., Sweden, Norway, Netherlands, Denmark, etc.) and perform a meta-analysis. This would be understandably difficult as treatment models have not been uniform across time and geography, but this could provide sufficient power to truly explore differences in rates of suicide among transgender persons who receive medical and surgical interventions versus transgender persons who do not receive such interventions.
The authors should exercise appropriate restraint based on available data and the methods available to analyze them.
The comments by Quinlivan and colleagues provide an opportunity to respond to some common misunderstandings of suicide risk assessment tools, and more broadly, prediction modelling. First, their comments are based on the mistaken assumption that all suicide prediction tools invariably have to classify individuals into low-risk versus high-risk groups. Unlike the earlier tools referred to in the response (all of which are classifiers, i.e. stratify people into risk categories), OxSATS provides probabilistic estimates of suicide risk. The benefits of probability estimation over classification have been discussed widely in the methodological literature,[1,2] and models which produce continuous risk scores are routinely used in other areas of medicine (such as the Framingham and QRISK models for cardiovascular disease risk prediction).
Second, Quinlivan and colleagues have compared the area under the curve (AUC) of OxSATS to earlier tools and highlighted the discrepancy in the interpretation of the findings. However, this misses the methodological point that what is considered good discrimination performance for a prediction model depends on the clinical area and available alternatives. While very high AUC values (e.g. above 0.90) can be reported for diagnostic prediction,[3] such values are rare in prognostic modelling, where AUC values in the 0.70s are found for best-performing models for incident cardiovascular disease[4] and adverse health outcomes (including mortality) in COVID-19.[5]
Third, their comments fail to note that two models can have similar AUC values despite very different calibration performance. While OxSATS was reasonably well-calibrated in the external validation sample, ‘first-generation scales’ cannot even be assessed on their calibration performance because they have not been tested (and cannot without probability scores). Calibration is especially important when a model is intended to support clinical decision-making, as poorly calibrated predictions can make a model clinically useless or even harmful.[6]
Finally, it is misleading, in our view, to interpret the positive predictive value (PPV) of a suicide prediction model without considering the specific clinical context. This is because, like other measures of classification, PPV is dependent on the chosen risk cut-off. Usually, there is not one optimal risk threshold for a given prediction model, as the choice of cut-off depends on the specific clinical decision that the model is intended to inform and the relative costs and benefits of true and false positive classifications in that context.[1] For instance, health economics evidence from US primary care has shown that, depending on the target intervention, prediction models for suicide deaths may be cost-effective with PPV values as low as 0.07%, in part due to substantial cost savings associated with preventing one suicide death.[7]
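To illustrate how PPV moves with the chosen cut-off and with outcome prevalence, the following minimal sketch computes PPV from sensitivity, specificity and prevalence using Bayes' rule. All numbers in it are hypothetical operating points chosen for illustration; they are not OxSATS results or figures from any cited study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive classifications at this operating point")
    return true_pos / (true_pos + false_pos)

# Hypothetical values: a rare outcome and two cut-offs on the same model.
HYPOTHETICAL_PREVALENCE = 0.001
HYPOTHETICAL_CUTOFFS = {
    "low cut-off": (0.90, 0.70),   # (sensitivity, specificity)
    "high cut-off": (0.40, 0.98),
}

ppv_by_cutoff = {
    name: positive_predictive_value(sens, spec, HYPOTHETICAL_PREVALENCE)
    for name, (sens, spec) in HYPOTHETICAL_CUTOFFS.items()
}

for name, ppv in ppv_by_cutoff.items():
    print(f"{name}: PPV = {ppv:.2%}")
```

The same model yields different PPVs at different cut-offs, which is why a single PPV figure says little without the clinical decision it is meant to inform.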
References:
1. Wynants L, Smeden M van, McLernon DJ, Timmerman D, Steyerberg EW, Calster BV, et al. Three myths about risk thresholds for prediction models. BMC Med. 2019;17(1):192.
2. Goorbergh R van den, Smeden M van, Timmerman D, Calster BV. The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression. J Am Med Inform Assn. 2022.
3. Calster BV, Hoorde KV, Valentin L, Testa AC, Fischerova D, Holsbeke CV, et al. Evaluating the risk of ovarian cancer before surgery using the ADNEX model to differentiate between benign, borderline, early and advanced stage invasive, and secondary metastatic tumours: prospective multicentre diagnostic study. BMJ 2014;349:g5920.
4. Damen JA, Pajouheshnia R, Heus P, Moons KGM, Reitsma JB, Scholten RJPM, et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis. BMC Med. 2019;17(1):109.
5. Wynants L, Calster BV, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328.
6. Calster BV, McLernon DJ, Smeden M van, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230.
7. Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy Requirements for Cost-effective Suicide Risk Prediction Among Primary Care Patients in the US. JAMA Psychiatry. 2021;78(6):642–50.
While the published study provides valuable insights into the effectiveness of culturally adapted counselling (CAC) for ethnic minorities, there are two critical aspects that warrant further discussion: the reliance on self-reported measures and the training and supervision of counsellors.
Firstly, the primary outcome measures in the study were self-reported by participants. While self-reporting is a common practice in psychological research, it is not without its limitations. Self-reported data are susceptible to biases, such as social desirability bias, where participants may provide responses they believe are more socially acceptable rather than their true feelings or experiences. Additionally, response bias can occur, particularly in longitudinal studies where participants might answer questions based on their memory of previous answers rather than their current state. These biases could significantly influence the study's findings, potentially overestimating the effectiveness of the CAC intervention. To enhance the robustness of future research, incorporating objective measures or third-party assessments could provide a more comprehensive and unbiased evaluation of the intervention's effectiveness.
Secondly, the study involved training counsellors in the culturally adapted intervention. However, the depth and effectiveness of this training, as well as the consistency of its application across counsellors, are not extensively discussed. The quality and uniformity of training are crucial factors in intervention studies, as they directly impact the intervention's fidelity and effectiveness. Inconsistent training or application could lead to variations in how counsellors deliver the intervention, potentially affecting the study's outcomes. Further details about the training process, the supervision provided to counsellors, and measures taken to ensure intervention fidelity would significantly contribute to the credibility and replicability of the findings.
In conclusion, while the study contributes to our understanding of mental health interventions for ethnic minorities, addressing these methodological concerns would greatly enhance the validity and generalisability of the findings. Future research could benefit from incorporating objective measures and providing more comprehensive details on counsellor training and supervision.
Dear Editor,
I eagerly write in response to the perspective piece entitled “Perfect storm: emotionally based school avoidance in the post-COVID-19 pandemic context.”1 The authors provide a timely and crucial analysis of the rising trend in emotionally based school avoidance (EBSA) following the COVID-19 pandemic. However, while their overview of the issue and proposed interventions are commendable, I believe there are several critical points that warrant further inquiry and discussion.
The authors rightly identify the complex array of factors contributing to EBSA, including school, family, and child-based risk factors. However, their analysis would benefit from a more nuanced investigation of the socioeconomic disparities exacerbated by the pandemic. Research has previously shown that children from lower-income families were disproportionately affected by school closures and faced greater challenges in accessing remote learning resources.2 Such a preexisting inequality may have further exacerbated EBSA patterns among vulnerable populations—this deserves greater emphasis in developing targeted interventions.
The authors acknowledge the need for multi-component approaches across education, health, and social care sectors, and their call for early intervention that does not impose strict absenteeism criteria is laudable. My only worry is that this approach may present challenges in terms of resource allocation and identifying those most in need of support...
Thank you for your article; it is always good to see more visual techniques to display research data. I would like to suggest some improvements for your consideration.
I am a little concerned that the scale of your suggested plots is linear, which means that larger segments have a disproportionately larger area. The human visual processing system tends to judge by area when comparing objects, something that was well understood by Florence Nightingale when she created her rose diagrams. Common practice in data visualisation is to use area for circles and arcs for this reason. This could be addressed by using a square root based scale rather than a linear one.
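The square-root suggestion can be sketched as follows. This is a minimal illustration, assuming each segment is a wedge of fixed angle whose radius encodes the value; the function names and parameters are hypothetical and not taken from the article's plotting code.

```python
import math

def radius_for_value(value, max_value, max_radius=1.0):
    """Map a value to a wedge radius so that wedge AREA is proportional to the value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)

def wedge_area(radius, angle_rad):
    """Area of a circular sector with the given radius and angle in radians."""
    return 0.5 * angle_rad * radius ** 2

angle = math.pi / 4  # every wedge spans the same angle in this sketch
small = wedge_area(radius_for_value(4, 16), angle)
large = wedge_area(radius_for_value(16, 16), angle)
print(f"value ratio 4:16, area ratio {small / large:.2f}")
```

With a linear radius scale the same two values would give an area ratio of 1:16 rather than 1:4, which is the distortion described above.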
I also wonder whether all of the wedges having the same scale is appropriate? I would think that some outcomes (such as mortality) might have very small differences, but these would be of great consequence, compared with a minor adverse event which would look visually quite a bit more important. Having a different scale for each wedge - perhaps based on clinically important differences - could be more intuitive.
Also, a minor point, but red-green colour scales can be challenging to interpret for some people with different colour perception.
I am also concerned that it would be challenging to compare many different plots for large numbers of interventions. I wonder if the use of parallel coordinates, an established technique for multivariate comparisons, might address some of these issues?...
The authors present a retrospective cohort study of mostly adults who were referred to clinics in Finland for the treatment of gender dysphoria. However, one of the most important findings in this paper seems to have been missed in the discussion section.
The authors report that suicide risk was not statistically different between people referred for treatment and a matched cohort, with a hazard ratio of 1.8 (0.6-4.8) for people referred for gender dysphoria when compared to the control in the fully adjusted model (Table 3). However, the authors also conducted this analysis using only people who had accessed gender-affirming medical interventions (categorized as "Hormonal or surgical gender reassignment interventions" in Table 1) and those who hadn't (GR+ and GR-). In many ways, this is the more important analysis, as it addresses the question of medical treatment rather than medical referral.
The authors do note in their conclusion that there were no statistically significant differences in all-cause mortality when the data is split up into these groups, with GR- having a HR of 1.4 (0.6-3.3) and GR+ 0.7 (0.2-2). However, the results also show that the adjusted suicide mortality HRs for the GR- and GR+ groups compared to the matched control were 3.2 (1-10.2) and 0.8 (0.2-4) respectively. While the authors do not present an adjusted analysis of suicide mortality comparing these two groups directly, this implies a statistically significant associ...
Hi
I am writing here to express my concern that this paper might have some serious flaws that have apparently passed the peer review and editorial processes.
I am a doctoral candidate in Classics and unfortunately lack sophisticated statistical skills, but given my background both as a doctoral researcher in another field and as a former patient in a gender identity policlinic in Finland, I do believe I am capable of raising one question about the methodology and another about the analysis of the results.
The paper compares the all-cause mortality and suicide rates between individuals referred to gender identity clinics in Finland between 1996 and 2019 and a control group. The age limit is <23. The methodological problem that I perceive is that the paper fails to take into consideration that before 2011, minors were generally not granted referrals to gender identity clinics (see e.g. https://yle.fi/a/3-10707095 (in Finnish); https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4396787/). Consequently, any mortalities among the gender-incongruent youth under 18 years old would not be associated in the statistics with referrals to gi clinics but might be included in the control group. For example, had I succeeded in my suicide attempt at the age of 15 in 2008, this would not be classified as a gender-dysphoria-linked death in this paper despite my...
The article [1] reports the development of a new dementia risk score, leveraging off superior area under the curve (AUC) statistics compared with previously published risk scores. However, the representation and use of at least one of those prior risk scores is highly inaccurate and this raises concerns about the overall integrity of the publication.
1. The authors incorrectly state that the ANU-ADRI risk index[2] was ‘developed in cohorts in Australia’ (abstract and page 2). This is wrong: it was not developed directly from any cohorts. Rather, as described in the original publication[2], it was developed using an evidence-based medicine approach that collated the effect sizes of risk factors drawn from systematic reviews. The systematic reviews draw from the wider literature, with most cohorts being from North America, the UK, and Europe. The tool was validated in three external cohort studies. Data from Australia were rarely included in the meta-analyses from which the risk score was derived [2].
2. The authors say that the ANU-ADRI ‘was developed for older individuals (60+), ….however our sensitivity analysis also performed poorly when restricting our cohort to an age range matching its development sample’.
There are two problems with this sentence:
a. There was no development cohort for the ANU-ADRI so it could not have been possible for the described sensitivity analysis to have been undertaken.
b. Most cohort studies that contribut...
Ruuska et al.’s [1] analysis incorrectly concluded that medical gender reassignment (GR) did not reduce suicide mortality because they had assessed that “the suicide mortality of both those [presenting with gender dysphoria (GD)] who proceeded and did not proceed to GR did not statistically significantly differ from that of controls.” On the contrary, by conventional criteria, the suicide mortality of the non-GR group was significantly higher than controls while that of the GR group was not. Ruuska et al. [1] reported: "Adjusted HRs [hazard ratios] for suicide mortality were 3.2 [for non-GR] (95% CI 1.0 to 10.2; p=0.05) and 0.8 [for GR] (95% CI 0.2 to 4.0; p=0.8), respectively." By this finding, those dysphorics who had undergone GR were no more likely to have committed suicide than were general population controls, while those who had not undergone GR were more than three times as likely to have committed suicide. The latter difference is reported with a p-value of .05 and a 95% confidence interval that does not extend below 1.0.
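As a rough check on how close to the 0.05 boundary these figures sit, the sketch below back-calculates an approximate two-sided p-value from each reported hazard ratio and its 95% confidence interval. It assumes a normal approximation on the log scale, and it works from the rounded published values, so the results are only indicative; the helper name `p_from_hr_ci` is hypothetical.

```python
import math

def p_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI.

    Assumes the interval is symmetric on the log scale (normal approximation).
    """
    # Standard error of log(HR), recovered from the width of the interval.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value for a standard normal test statistic.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Rounded values as quoted above (non-GR and GR groups versus controls).
for label, hr, lo, hi in (("non-GR", 3.2, 1.0, 10.2), ("GR", 0.8, 0.2, 4.0)):
    z, p = p_from_hr_ci(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the non-GR estimate gives a p-value very close to 0.05 and the GR estimate a p-value near 0.8, in line with the values quoted from the paper; because the inputs are rounded, the calculation cannot settle on which side of 0.05 the exact value falls.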
By the prevailing standards of scientific inference, and in virtually any other study, such a finding would be assessed as statistically significant (or perhaps, depending on rounding error, trivially below significance), but Ruuska et al. [1] judged it not to be so. They achieved this anomaly by fiat, announcing: “In order to avoid type 1 error due to multiple testing and the large data size, the cut-off for statistical sign...
We appreciate the interest in understanding the health and well-being of transgender persons and their unique care needs, particularly youth and adolescents. There are, however, several methodological missteps in the recent article by Ruuska et al. published in BMJ Mental Health. The authors have fallen into a number of methodological mistakes and fallacies that make their conclusions that gender-affirming interventions have no effect on suicide mortality quite untenable.
First, the authors have not shared sufficient data to support their conclusions that gender-affirming interventions do not reduce suicide. A properly reported analysis must show the events and characteristics of all transgender persons referred for care, as well as the sub-groups (hormonal and/or surgical interventions vs. no interventions). Similarly, with respect to the shortfalls of their analytic methodology, the authors have not demonstrated that they checked the proportional hazards assumption on which their Cox models rely. Given the rapidly changing political and social environments for transgender people in countries around the world, including Finland, the assumption that the hazards are proportional over time must be examined and explained. The authors also violate standard practice by not showing Kaplan-Meier curves for each of the outcomes of interest, in addition to providing the rates of all-cause mortality and suicide in each risk group discussed.
Second, with onl...
The comments by Quinlivan and colleagues provide an opportunity to respond to some common misunderstandings of suicide risk assessment tools, and more broadly, prediction modelling. First, their comments are based on the mistaken assumption that all suicide prediction tools invariably have to classify individuals into low-risk versus high-risk groups. Unlike the earlier tools referred to in the response (all of which are classifiers, i.e. stratify people into risk categories), OxSATS provides probabilistic estimates of suicide risk. The benefits of probability estimation over classification have been discussed widely in the methodological literature,[1,2] and models which produce continuous risk scores are routinely used in other areas of medicine (such as the Framingham and QRISK models for cardiovascular disease risk prediction).
Second, Quinlivan and colleagues have compared the area under the curve (AUC) of OxSATS to earlier tools and highlighted the discrepancy in the interpretation of the findings. However, this misses the methodological point that what is considered good discrimination performance for a prediction model depends on the clinical area and available alternatives. While very high AUC values (e.g. above 0.90) can be reported for diagnostic prediction,[3] such values are rare in prognostic modelling, where AUC values in the 0.70s are found for best-performing models for incident cardiovascular disease[4] and adverse health outcomes (including mortal...
While the published study provides valuable insights into the effectiveness of culturally adapted counselling (CAC) for ethnic minorities, there are two critical aspects that warrant further discussion: the reliance on self-reported measures and the training and supervision of counsellors.
Firstly, the primary outcome measures in the study were self-reported by participants. While self-reporting is a common practice in psychological research, it is not without its limitations. Self-reported data are susceptible to biases, such as social desirability bias, where participants may provide responses they believe are more socially acceptable rather than their true feelings or experiences. Additionally, response bias can occur, particularly in longitudinal studies where participants might answer questions based on their memory of previous answers rather than their current state. These biases could significantly influence the study's findings, potentially overestimating the effectiveness of the CAC intervention. To enhance the robustness of future research, incorporating objective measures or third-party assessments could provide a more comprehensive and unbiased evaluation of the intervention's effectiveness.
Secondly, the study involved training counsellors in the culturally adapted intervention. However, the depth and effectiveness of this training, as well as the consistency of its application across counsellors, are not extensively discussed. The quality a...