Abstract
Background More knowledge on the cost-effectiveness of various depression treatment programmes can promote efficient treatment allocation and improve the quality of depression care.
Objective This study aims to use routine care data to compare the real-world cost-effectiveness of an algorithm-guided programme focused on remission with that of a predefined duration, patient preference-centred treatment programme focused on response.
Methods A naturalistic study (n=6295 in the raw dataset) was used to compare the costs and outcomes of two programmes in terms of quality-adjusted life years (QALY) and depression-free days (DFD). Analyses were performed from a healthcare system perspective over a 2-year time horizon. Incremental cost-effectiveness ratios were calculated, and the uncertainty of results was assessed using bootstrapping and sensitivity analysis.
Findings Per client, the algorithm-guided treatment programme yielded more DFDs (12) and more QALYs (0.013) at a higher cost (€3070) than the predefined duration treatment programme. The incremental cost-effectiveness ratios (ICERs) were around €256/DFD and €236 154/QALY for the algorithm-guided compared with the predefined duration treatment programme. At a threshold value of €50 000/QALY gained, the algorithm-guided programme had a probability of <10% of being considered cost-effective. Sensitivity analyses confirmed the robustness of these findings.
Conclusions The algorithm-guided programme led to larger health gains than the predefined duration treatment programme, but it was considerably more expensive, and hence not cost-effective at current Dutch thresholds. Depending on the preferences and budgets available, each programme has its own benefits.
Clinical implication This study provides valuable information to decision-makers for optimising treatment allocation and enhancing quality of care cost-effectively.
- depression & mood disorders
Data availability statement
The data used in this research concerns linked pseudonymised patient level data, suitable for use by researchers, after permission from the members of the IMPROVE consortium. However, due to binding legislation and institutional policy sharing of these data to third parties is not possible.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
Prior research has established the effectiveness of algorithm-guided programmes for depression in primary care settings, but there is limited research on their cost-effectiveness in specialised mental healthcare settings.
WHAT THIS STUDY ADDS
Using a cost-effectiveness analysis over a 2-year time horizon, the analysis found that the algorithm-guided programme was more effective and more expensive than the predefined duration treatment programme.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
This study provides a comparative analysis of two treatment programmes (as opposed to single treatments), emphasising the balance between cost and health outcomes in the short term.
The results can support patients and healthcare providers in treatment decisions, especially in settings where resources are limited.
Background
Major depressive disorder (MDD) is a debilitating mental health condition that affects millions of people worldwide.1 This condition imposes substantial disability and economic burdens on individuals and society at large.2 While there has been an expansion in available MDD treatment options in recent years, including the development of more selective antidepressants and the use of various medication and psychotherapy combinations, this increase in options has concurrently complicated the task of identifying the most effective treatment sequence for each individual. That is, while the effectiveness and partly also cost-effectiveness of separate treatments is known, there is less evidence on how to best combine treatments in practice, for instance, when to stop a certain treatment and switch to another treatment, or when to consider treatment is complete. Therefore, a challenge emerges in the need to organise sequential treatments in a cost-effective manner in routine practice.
In response to this challenge, algorithm-guided treatment (AGT) programmes, primarily using the stepped care model,3 have gained recognition as models for optimising treatment delivery and preventing chronic depression.4 These structured disease management models enhance client outcomes, reduce treatment resistance and increase the quality of care, thus earning the endorsement of physicians, client organisations and health insurance companies.5 Programmes such as the German Algorithm Project6 7 and the Texas Medication Algorithm Project8 have demonstrated more favourable outcomes than standard care.
Nevertheless, possibly lengthy treatment trajectories associated with adherence to algorithms can place a substantial strain on healthcare providers, leading to increased waiting times for new clients. Additionally, the longer and more intensive the treatment, the more costly it is. Consequently, predefined duration treatment (PDT) programmes have emerged as alternative strategies. These programmes, characterised by predefined treatment durations, aim to induce a therapeutic response in patients, rather than continuing treatment until a full recovery is achieved. Also, they allow a large degree of flexibility in initial treatment, aligning with individual client needs and preferences.
Studies evaluating treatment programmes have mostly examined only their effectiveness, and offer no information regarding cost. A limited number of investigations have analysed the cost-effectiveness of different treatment programmes (as opposed to separate treatments) for MDD in primary care settings.9–15 These studies have shown that several programmes demonstrated superior outcomes compared with usual care, although with a moderate increase in costs, yielding favourable incremental cost-effectiveness ratios (ICERs), within randomised controlled trials.
Few studies have evaluated cost-effectiveness in real practice using routine data. While this approach presents a higher risk of bias compared with an experimental setting, it has the advantage of providing insights into the real-world implementation of programmes. Within the Dutch context, specifically in the northern province of Friesland, a unique dataset from routine care allows for a comparison of two programmes implemented in recent history within the same region and with very comparable patient populations. This opportunity arose due to the systematic implementation of the AGT programme by a large specialised mental healthcare provider for several years (2012–2014). Subsequently, this programme was discontinued and replaced by PDT, which was offered by a separate, outsourced organisation from 2014 to 2019. It is worth noting that these two providers differed in terms of size and organisational structure, suggesting varied operational capacities and flexibilities. Yet, both programmes targeted the same group, individuals with MDD too severe for primary care treatment, and both were available in the same region. For stakeholders like clients and policy decision-makers, understanding the comparative cost-effectiveness of such programmes supports informed decision-making.
Objective
In light of this, our study aimed to use routine care data to assess the cost-effectiveness and cost-utility of an AGT programme for depression in a specialised mental healthcare setting in the Netherlands, in comparison to a PDT programme with a preset shorter treatment duration.
Methods
Study setting and data source
A comprehensive administrative mental healthcare database was used. This database contained patient-level information for, among others, clients of specialised mental health providers in the province of Friesland (with about 0.65 million inhabitants in 2019) from January 2010 to December 2019. The dataset includes demographic information (age and sex), principal diagnoses according to the Diagnostic and Statistical Manual of Mental Disorders (DSM), Fourth Edition, results of routine outcome monitoring (ROM) questionnaires completed by clients and a detailed breakdown of mental health service usage. Of note, ROM habits differed over time and between the organisations managing the AGT and PDT programmes.
The AGT programme, launched in 2012, aimed to improve depression care quality and treatment outcomes. It used specialised psychiatric personnel and followed established clinical guidelines, with special attention to remission. An organisational shift replaced the AGT programme with the PDT programme 2 years later. The PDT programme features a predetermined treatment duration and client-centred treatment selection that considers client preferences, and aims primarily at response.
Participants
The study enrolled 6295 outpatients diagnosed with unipolar depression according to the DSM, Fourth Edition16 criteria at intake. Clients who started treatment under the AGT programme were included if they started treatment between January 2012 and December 2013, while clients using the PDT programme were included if they started treatment in 2014, to ensure full implementation of the respective treatment programmes. Clients participating in both programmes were excluded and all individuals were followed up for a duration of 2 years.
Only the clients’ first treatment episode was used, and the episode had to be concluded within a 2-year period after its start. Inclusion also required pre-treatment and post-treatment ROM scores within a 30-day time-window around start and end of treatment. Clients were informed that their medical data, after being anonymised, could be used for research purposes aimed at enhancing the quality of healthcare. Clients could opt-out and have their data excluded from the database.
To mitigate selection bias and potential confounding, we employed propensity score matching via the R package MatchIt17 to match clients participating in the AGT programme to those in the PDT programme. The propensity scores were estimated using a logistic regression model, factoring age, gender and baseline symptom score. The matching was conducted in a 3:1 ratio to maximise the sample size and statistical power while preserving the balance of covariates. We ensured the validity of our matching process by thoroughly checking the balance of matched variables across the two treatment programmes.18
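The text does not say which balance diagnostic was used after matching. One common choice is the standardised mean difference (SMD) per covariate. The sketch below assumes that diagnostic; the function name and the ages in the example are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD of one covariate between two groups, using the pooled standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Invented ages for illustration only (not study data).
agt_ages = [34, 41, 47, 52, 58]
pdt_ages = [33, 42, 46, 53, 57]
smd_age = standardized_mean_difference(agt_ages, pdt_ages)
```

An absolute SMD below roughly 0.1 is often read as adequate balance. That cut-off is a convention rather than something stated in the study.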
Outcome measures
Both AGT and PDT programmes employed the OQ45 questionnaire for treatment evaluation. This 45-item self-report tool has a 180-point scale and assesses a client’s symptom severity and functioning level throughout treatment.19 Higher scores signify greater symptom distress, difficulties in interpersonal functioning and inadequacy in fulfilling social roles. It features three subscales and a validated cut-off score of 55 to differentiate mental disorders in the Dutch population.20
Cost assessment
Costs were analysed from the perspective of the healthcare system, capturing all client treatment activities. This was done by quantifying and valuing the duration of healthcare professionals’ involvement in client treatment, considering both direct time spent in consultations and indirect time, such as in meetings related to clients’ treatment planning. The cost of treatment was calculated using the Dutch Costing Manual for economic evaluations21 by multiplying the duration of various healthcare professionals’ services with their respective unit costs. All costs were expressed in euros at the 2019 price level.
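The costing rule described above (professional time multiplied by a unit cost, with direct and indirect time tracked separately) can be sketched as follows. The unit costs, profession names and minutes are hypothetical placeholders, not values from the Dutch Costing Manual or the study data.

```python
# Hypothetical unit costs in euros per hour (placeholders only).
UNIT_COST_PER_HOUR = {"psychiatrist": 120.0, "psychologist": 90.0, "nurse": 60.0}


def treatment_cost(activities, include_indirect=True):
    """Sum time x unit cost over one client's activities.

    Each activity is a tuple (profession, direct_minutes, indirect_minutes).
    Setting include_indirect=False mirrors an analysis on direct contact time only.
    """
    total = 0.0
    for profession, direct_min, indirect_min in activities:
        minutes = direct_min + (indirect_min if include_indirect else 0)
        total += minutes / 60.0 * UNIT_COST_PER_HOUR[profession]
    return total


# Invented activity log for one client.
client = [("psychiatrist", 90, 30), ("psychologist", 600, 120), ("nurse", 120, 0)]
cost_all = treatment_cost(client)                             # direct + indirect time
cost_direct = treatment_cost(client, include_indirect=False)  # direct time only
```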
Effectiveness assessment
The primary effectiveness measures used were depression-free days (DFDs) and quality-adjusted life years (QALYs). DFDs were calculated using t-scores derived from the OQ45 questionnaire (see the online supplemental material for details).22 QALYs were estimated using clients’ DFDs. Previous research shows full depression remission improved the quality of life by 0.4 on a 0–1 scale.23 Depressed clients typically score 0.6 on this utility scale.9 Two-year QALYs were calculated as 2 × ((DFDs / total number of days in the 2-year time horizon) × 0.4 + 0.6). The calculation assumes stable outcomes during follow-up after therapy ends, unless a recurrence occurs.
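As a minimal sketch of the QALY formula above, assuming a 2-year horizon of 730 days (the text does not give the exact day count) and a hypothetical function name:

```python
def two_year_qalys(dfd, horizon_days=730, utility_depressed=0.6, remission_gain=0.4):
    """QALYs over the horizon from depression-free days (DFDs).

    Implements 2 x ((DFD / total days) x 0.4 + 0.6) for the default arguments.
    horizon_days=730 is an assumption (2 x 365).
    """
    years = horizon_days / 365.0
    return years * (dfd / horizon_days * remission_gain + utility_depressed)


# A difference of 12 DFDs translates into roughly 0.013 QALYs,
# whatever the starting number of DFDs (100 is an arbitrary example).
qaly_gain = two_year_qalys(112) - two_year_qalys(100)
```

With these weights a client with no depression-free days accrues 1.2 QALYs over 2 years and a client who is depression-free throughout accrues 2.0.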
Recurrence checking
Given the exceptionally low (near negligible) incidence of recurrences in the propensity score-matched population, we used the recurrence rate from the unprocessed dataset in a sensitivity analysis. This unprocessed dataset was neither selected on availability of ROM scores nor matched. We applied these observed recurrence rates to the study dataset to assess the potential impact of recurrence on the outcomes (QALYs and DFD) and associated costs, considering that recurrence would escalate the costs while diminishing the effects. Our analysis evaluated the incidence of recurrences over two follow-up periods: 2 years and 3 years after end of treatment. Given the temporal constraints of our dataset, which extends only up to December 2019, we excluded clients who commenced treatment late and therefore could not be followed up for the full 2-year or 3-year period before the end-date. This led to final sample sizes of 3010 for the 2-year follow-up cohort and 1591 for the 3-year follow-up cohort. To gain further insights into the recurrence rates, we constructed Kaplan-Meier (KM) survival curves. The observed recurrence rates for both follow-up periods were then used in a range of sensitivity analyses to evaluate how different recurrence rates would affect the cost-effectiveness of the two programmes. Recurrence was defined as the start of a new treatment episode at least 6 months after the end date of the previous treatment episode.24
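The recurrence definition can be illustrated with a short sketch. It assumes that 6 months is approximated as 183 days and that follow-up is counted from the end of the first episode; the function name and the dates are hypothetical.

```python
from datetime import date


def recurred(first_end, later_starts, follow_up_days=730, min_gap_days=183):
    """True if a new episode starts at least min_gap_days after first_end
    and no later than follow_up_days after first_end.

    min_gap_days=183 approximates the 6-month rule (an assumption).
    """
    for start in later_starts:
        gap = (start - first_end).days
        if min_gap_days <= gap <= follow_up_days:
            return True
    return False


end_of_treatment = date(2013, 1, 1)
too_soon = recurred(end_of_treatment, [date(2013, 3, 1)])        # within 6 months
within_2y = recurred(end_of_treatment, [date(2013, 9, 1)])       # counts as recurrence
only_in_3y = recurred(end_of_treatment, [date(2015, 6, 1)], follow_up_days=1095)
```

Under this definition a relapse within the first 6 months after treatment is not counted.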
Statistical methods
Cost-effectiveness analysis
The cost-effectiveness analysis (CEA) assesses the value for money of the AGT in comparison with the PDT programme in terms of the incremental cost required to gain incremental improvement in health over 2 years from a healthcare perspective. ICERs were calculated, dividing differences in costs by differences in health benefits, for both DFDs and QALYs as a health outcome. Costs and effects were not discounted, given the limited time horizon of 2 years. Analyses followed the Consolidated Health Economic Evaluation Reporting Standards checklist.25
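The ICER calculation can be checked against the per-client increments reported in this article (€3070, 12 DFDs, 0.013 QALYs). The function name is illustrative.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effect difference is zero")
    return delta_cost / delta_effect


# Increments of AGT over PDT as reported in the abstract and findings.
icer_per_dfd = icer(3070, 12)       # about 256 euros per DFD
icer_per_qaly = icer(3070, 0.013)   # about 236 154 euros per QALY
```

Because the QALY difference is so small, small changes in the denominator move the ratio by tens of thousands of euros.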
Non-parametric bootstrapping was used to assess sampling uncertainty. ICERs were estimated for each of 25 000 bootstrap resamples and results were summarised in a cost-effectiveness plane and a cost-effectiveness acceptability curve, showing the probability of either programme being cost-effective at different willingness-to-pay (WTP) threshold values. For this study, we used the common Dutch WTP threshold for QALYs (€50 000/QALY).26
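A minimal sketch of a non-parametric bootstrap that feeds a cost-effectiveness acceptability curve is given below. It assumes each programme is resampled separately with replacement and that cost-effectiveness at a given WTP is judged by a positive incremental net monetary benefit (WTP × Δeffect − Δcost). The data are deliberately artificial so the result is predictable; none of it comes from the study.

```python
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def bootstrap_ceac(agt, pdt, wtp_values, n_boot=1000, seed=0):
    """Probability that AGT is cost-effective versus PDT at each WTP value.

    agt, pdt: lists of (cost, effect) tuples, one per client.
    """
    rng = random.Random(seed)
    wins = {wtp: 0 for wtp in wtp_values}
    for _ in range(n_boot):
        # Resample clients with replacement within each programme.
        a = rng.choices(agt, k=len(agt))
        p = rng.choices(pdt, k=len(pdt))
        d_cost = _mean(cost for cost, _e in a) - _mean(cost for cost, _e in p)
        d_effect = _mean(eff for _c, eff in a) - _mean(eff for _c, eff in p)
        for wtp in wtp_values:
            # Positive incremental net monetary benefit counts as cost-effective.
            if wtp * d_effect - d_cost > 0:
                wins[wtp] += 1
    return {wtp: wins[wtp] / n_boot for wtp in wtp_values}


# Artificial clients: AGT costs 3000 euros more and gains 0.1 QALY (ICER 30 000).
agt_clients = [(5000.0, 1.5)] * 20
pdt_clients = [(2000.0, 1.4)] * 60
ceac = bootstrap_ceac(agt_clients, pdt_clients, [20000, 50000], n_boot=200)
```

With real, varying client data the probabilities fall between 0 and 1, and plotting them against WTP gives the acceptability curve.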
Sensitivity analyses
Multiple sensitivity analyses were conducted to validate the robustness of our study’s conclusions, with specific attention for assumptions regarding recurrences and utility weights. A first analysis considered a worst-case recurrence rate scenario, assuming reduction of health benefits immediately post-treatment for those with a recurrence and compared results for different baseline recurrence rates in the two programmes. Second, additional ICERs were calculated using a range of utility weights (0.6–0.8) for depression, and a third sensitivity analysis excluded indirect time costs.
Findings
Characteristics of participants
The study included 2467 clients, comprising 2181 in the PDT programme and 286 in the AGT programme (figure 1). Baseline evaluation showed significant age and baseline symptom score differences between the two groups (table 1). Clients in the PDT programme were on average younger (t(356.21)=−3.31, p<0.001) and had a lower baseline symptom score (t(323.23)=−5.57, p<0.001) than the clients in the AGT programme. Postmatching, the clients in both groups were comparable regarding age and baseline symptom scores. Notably, the client population in the AGT programme was smaller than that of the PDT programme, partly due to the AGT programme being in effect for only a brief period of 2 years before its conversion to PDT.
Figure 1 illustrates the recurrence rates in various stages of the data selection process. Without further selection, the AGT programme displayed a 2.20% recurrence rate, whereas it was 1.50% for the PDT programme. Online supplemental figure S2 shows plotted KM survival curves of time to recurrence. On extending the follow-up period to 3 years after the end of treatment, the recurrence rate was 4.52% in the AGT programme and a higher 8.86% in the PDT programme. None of the clients in the final matched cohort experienced a recurrence.
Cost-effectiveness analysis
Table 2 shows the deterministic analysis results. The health outcomes achieved through the AGT programme were higher than those obtained through the PDT programme, both in terms of DFD (12 more) and QALYs (0.013 more) per client.
The incremental cost-utility ratio of the AGT programme compared with the PDT programme was €236 154/QALY gained (healthcare perspective), or €140 000/QALY gained when only direct contact time was included. The ICER was highly sensitive to minor fluctuations in incremental QALYs due to the small difference in QALYs between the two programmes. With DFDs as the outcome, the ICER was €256/DFD when overhead time was counted and €152/DFD when it was not.
Figure 2 graphically displays the results of the uncertainty analysis: the coloured centre represents point estimates of incremental costs and effects, and the dots represent results of the bootstrap analyses. From a healthcare perspective, most ICERs are in the northeast quadrant, indicating that the AGT programme increased costs and improved health outcomes when using QALY as the outcome. Excluding the overhead treatment costs resulted in a slight downward shift in the position of the clouds (figure 2A,B). In the CEA with DFD as the outcome measure, the dots were situated in the northeast and northwest quadrants. The northeast quadrant signifies greater effectiveness linked with higher costs, while the northwest quadrant indicates that the AGT programme is less effective and more expensive (figure 2C,D) and hence would be dominated by the PDT programme.
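The reading of the cost-effectiveness plane can be sketched as a small classifier over bootstrap replicates. The replicate values are invented, and treating a difference of exactly zero as "south" or "west" is an arbitrary choice made here.

```python
from collections import Counter


def quadrant(delta_cost, delta_effect):
    """Name the plane quadrant for one (incremental cost, incremental effect) pair."""
    vertical = "north" if delta_cost > 0 else "south"    # north: AGT more expensive
    horizontal = "east" if delta_effect > 0 else "west"  # east: AGT more effective
    return vertical + horizontal


def quadrant_shares(replicates):
    """Share of bootstrap replicates falling in each quadrant."""
    counts = Counter(quadrant(dc, de) for dc, de in replicates)
    return {name: n / len(replicates) for name, n in counts.items()}


# Invented replicates of (incremental cost in euros, incremental DFDs).
example = [(3100.0, 14.0), (2900.0, 9.0), (3300.0, -2.0), (3000.0, 11.0)]
shares = quadrant_shares(example)
```

Replicates in the northwest quadrant are those in which AGT is dominated by PDT.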
Whether the increase in utility and effectiveness, as well as the associated rise in expenses, justifies the recommendation to implement the AGT programme depends on the WTP. For a willingness to pay of €50 000/QALY, the likelihood that the AGT programme is cost-effective was <10% (figure 3), indicating a low probability that the extra costs are considered worthwhile, given the relatively limited health gains. Similarly, for a willingness to pay of €125/DFD, the probability of the AGT programme being cost-effective over the PDT programme was around 25%. This probability increased to around 50% at a willingness to pay of €250/DFD.
Sensitivity and uncertainty analysis
In the univariate (one-way) sensitivity analysis, different parameter values impacted the magnitude of the results, but in general, the direction of the findings remained unchanged (see online supplemental tables S1-S4). The worst-case scenario of recurrence showed increased costs and decreased QALYs, with an ICER of €206 466/QALY (see online supplemental table S1). For the AGT programme to become cost-effective at a €50 000/QALY threshold, the recurrence rate in the PDT programme would need to be at least 7-fold higher for a baseline recurrence rate of 10% in the AGT programme, or 4.5-fold higher for a baseline recurrence rate of 20% (see online supplemental figure S1 and online supplemental table S1). Using lower utility weights resulted in smaller QALY gains and a lower probability of cost-effectiveness (online supplemental table S2). In all scenarios, the AGT programme showed higher costs and a small health benefit compared with the PDT programme.
Discussion
To the best of our knowledge, this is the first CEA comparing an AGT programme with a PDT programme focused on clients’ preferences in treatment of depression in specialised care conducted in the Dutch healthcare system. Higher costs were accompanied by only a modest improvement in effects, leading to an ICER of €140 000/QALY and €152/DFD. After factoring in the overhead costs related to the treatment of depression, the cost-effectiveness ratios of the AGT programme would be €236 154/QALY and €256/DFD. The study indicates that the AGT programme is not cost-effective compared with the PDT programme at a WTP threshold of €50 000/QALY gained.
The result suggests that the utilisation of healthcare resources in the implementation of the AGT programme is relatively less efficient, leading to higher per-client costs. The higher costs observed in the AGT programme may be attributed, at least in part, to the more frequent use of multidisciplinary treatment approaches. This multidisciplinary treatment typically involves a team of healthcare professionals with different specialisations, such as psychiatrists, psychologists, social workers and nurses, working together to provide comprehensive and integrated care to the client. This collaborative approach, thus, may require additional resources, such as time for coordination among team members, more frequent client visits and specialised training for the staff. These factors can contribute to an overall increase in the cost of providing healthcare services.
An additional factor contributing to the higher cost of the AGT programme is its time-extensive nature. The programme's focus on preventing recurrence in clients who have already achieved remission requires a sustained effort over a longer period, often involving ongoing monitoring, follow-up visits and supportive interventions. This extended duration of treatment and associated resource utilisation may result in higher costs, but it could also bring more favourable long-term outcomes. The latter could not be analysed in our current study.
The AGT programme exhibits a relatively lower rate of recurrence as compared with the PDT programme in the raw data. However, overall the recurrence rates observed seem very low when compared with the literature.27 Two plausible explanations exist for the low recurrence rates observed in both programmes. First, clients may opt to seek treatment from general practice following a recurrence, which would not be observable in our data. Second, our study used a strict definition of recurrence, identifying it only when a new treatment episode occurred after a 6-month period without treatment, potentially leading to an under-representation of the actual recurrence rate by missing the recurrences within 6 months after end of treatment. Almost no instances of recurrence were detected in the two final matched cohorts. The clients who had experienced recurrence in the unprocessed data were excluded from the analysis. For the majority this was due to a missing ROM score, while a small part was excluded due to matching. Therefore, no information about the cost-effectiveness of treating clients who experience recurrence was obtained. However, the sensitivity analyses showed that for a range of assumptions regarding recurrences our conclusion would remain unchanged. Only if the AGT programme resulted in recurrence rates more than sevenfold lower than those of the PDT programme would it become cost-effective.
Using a naturalistic data sample has the advantage of yielding important information about real-world cost-effectiveness. It is, however, accompanied by several limitations. First, a substantial proportion of clients were excluded due to missing symptom scores at the end of treatment. This exclusion might introduce selection bias into the analysis, but was present for both programmes. The reasons for the lack of end-of-treatment ROM results are both client and clinician related: once treatment has finished, clients are possibly less inclined to complete the rather extensive questionnaires, while clinicians do not use the results and have no reason or opportunity to remind clients about completing the questionnaire. The second limitation concerns the assumptions made for calculating the DFDs. Clients were assumed to keep their status after the treatment is complete. However, in practice, there may be further improvement or deterioration in the client’s health status. Thus, we may overestimate or underestimate the DFDs and this might even differ for the two programmes, affecting the ICER. Third, there is no information available on antidepressant dosage and drug type for clients receiving pharmacotherapy in the dataset. Therefore, we could not include cost of medication. As a result, the total cost may be underestimated. However, compared with costs of medical professionals, medication costs will be small. Additionally, client comorbidities were not taken into consideration while matching clients for both programmes, and only age, gender and baseline symptom scores were used due to the unavailability of corresponding data.
The health providers implementing these two programmes differed in size and organisation. The provider that implemented the PDT programme is smaller and more flexible than the provider that implemented the AGT programme, explaining some of the substantial differences in the indirect treatment costs (t(827.39)=16, p<0.001). This could result in long-term benefits that were not included in the current analysis. These differences are not so much related to the programme as to the organisation, and may indicate that it is better to compare the programmes using the ICER based on direct costs only.
Clinical implications
The primary objective of this CEA was to assist in the decision-making process regarding the general application of the PDT or AGT programme for the treatment of depression. In this regard, it is even more important to interpret the results based on local determinants than to compare them with other studies. The decision to implement the programme is strongly influenced by the willingness to pay. It is <10% likely that the AGT programme will be cost-effective at a threshold of €50 000/QALY in the Netherlands.26 These findings may suggest that the AGT programme is a less efficient use of resources in terms of achieving desired health outcomes, but it requires additional data to ascertain recurrence differences. Clients could still be informed about the potential benefits and limitations of AGTs, including their time-extensive nature and potential for higher costs. Our comparative analyses of distinct treatment programmes can be useful in aiding decision-making processes, particularly for individuals and stakeholders with limited capacity.
Ethics statements
Patient consent for publication
Ethics approval
The study was based on pseudonymised administrative healthcare data for which no ethical approval was needed.
Acknowledgments
Thanks to Corneel Bouman and Xinyu Li for their constructive comments and invaluable assistance which significantly contributed to the refinement of this research. Corneel Bouman performed an initial analysis of these data as part of his master’s degree.
Footnotes
Correction notice This paper has been corrected since it was first published. The authors noted some errors in the statements concerning patient consent for publication, ethics approval and the data availability statement.
Contributors FL, TF, FJ and EV contributed to the design of the study. EV, TF, FJ, BG and SOdV coordinated the data acquisition and data interpretation. FL and MB performed the statistical analysis. TF is the guarantor.
Funding EV, TF and FJ were supported by a grant from ‘Stichting De Friesland’ during the period 2015–2021 in the IMPROVE project, which aimed among others to develop a data infrastructure for accessing, preprocessing and organising the routine data that were used for the current study. FL was awarded a personal scholarship from the China Scholarship Council (CSC) under the file number 202006050038, while conducting this work.
Competing interests SOdV was involved in the design of the algorithm-guided treatment programme investigated in this study. BG was involved in the running of the predefined duration treatment programme investigated in this study. All other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.