## Abstract

**Objective** A quantitative synthesis of evidence via standard pair-wise meta-analysis lies at the top of the hierarchy for evaluating the relative effectiveness or safety of two interventions. In most healthcare problems, however, there is a plethora of competing interventions. Network meta-analysis makes it possible to rank competing interventions and evaluate their relative effectiveness even if they have not been compared in an individual trial. The aim of this paper is to explain and discuss the main features of this statistical technique.

**Methods** We present the key assumptions underlying network meta-analysis and the graphical methods to visualise results and information in the network. We used one illustrative example that compared the relative effectiveness of 15 antimanic drugs and placebo in acute mania.

**Results** A network plot visualises how information flows in the network and reveals important information about network geometry. Discrepancies between direct and indirect evidence can be detected using inconsistency plots. Relative effectiveness or safety of competing interventions can be presented in a league table. A contribution plot reveals the contribution of each direct comparison to each network estimate. A comparison-adjusted funnel plot is an extension of the simple funnel plot to network meta-analysis. A rank probability matrix can be estimated to present the probabilities of all interventions assuming each rank and can be represented using rankograms and cumulative probability plots.

**Conclusions** Network meta-analysis is very helpful in comparing the relative effectiveness and acceptability of competing treatments. Several issues, however, still need to be addressed when conducting a network meta-analysis for the results to be valid and correctly interpreted.

## Introduction

Evidence-based practices are crucial in informing healthcare decisions as they provide evidence on the effectiveness and adverse effects of the available treatment options. A quantitative synthesis of research findings from randomised controlled trials (RCTs) via meta-analysis lies at the top of evidence based methods.1 The benefits from meta-analysis are well established and include increased power, more precise effect estimates, and ability to generalise research findings and identify factors that modify the effect of an intervention (effect modifiers). In mental health, several meta-analyses have identified interventions that help people with mental disorders to attain better outcomes in terms of symptoms, functional status and quality of life. Examples of such interventions include psychosocial, psychological and pharmacological interventions.

Even within a class of interventions, there is a plethora of available options and they are not necessarily all equal. For example, Leucht *et al*2 found that second generation antipsychotics differ in many ways and should not be treated as a homogeneous class of drugs. It is expected that in most mental disorders a systematic review would find many trials comparing different interventions, and the clinical interest would not lie only in comparing a pair of them (eg, via traditional meta-analysis) or in grouping them into large classes of possibly heterogeneous interventions (eg, active interventions vs placebo, psychosocial vs pharmacological interventions). The main question is which intervention is the best or the worst (eg, in terms of efficacy) and under what circumstances (eg, for whom). A relatively new statistical technique, called network meta-analysis (NMA), can be employed to address these issues.3–5 It is an extension of traditional pair-wise meta-analysis that allows the synthesis of studies comparing different interventions, as long as these interventions form a connected network of evidence in which information flows not only directly but also indirectly. NMA yields summary estimates for the relative effectiveness between any pair of interventions by synthesising both direct and indirect evidence, and ranks them according to the outcome measured (eg, efficacy or safety). Systematic reviews that employ NMA are becoming more popular as initial doubt about the method fades away and user-friendly software becomes available.6 Statistical methodology is evolving and there are many review papers that provide guidelines on how to apply NMA, and how to present and interpret results.7–10 An increasing number of NMAs conducted in mental health assess the comparative efficacy and tolerability of competing treatments for various disorders.11–14

## Basic concepts and assumptions in NMA

A fundamental concept in NMA is that of an indirect comparison. If two treatments, A and B, have both been compared with a common treatment, say C, in two different sets of trials (A vs C and B vs C), then the relative effectiveness between A and B can be estimated indirectly via the common comparator C.15 For illustrative purposes, we will consider three active antipsychotics, namely haloperidol (H), olanzapine (O) and risperidone (R). If there are only studies comparing risperidone or olanzapine with haloperidol, the summary estimates are $\mu^{\text{direct}}_{RH}$ and $\mu^{\text{direct}}_{OH}$, where the upper index refers to the source of evidence (*direct* in this case, but in theory it could be *direct* or *indirect*) and the lower index refers to the treatment comparison. It is then possible to obtain the indirect evidence for the relative effectiveness between olanzapine and risperidone by subtracting the two summary estimates: $\mu^{\text{indirect}}_{OR} = \mu^{\text{direct}}_{OH} - \mu^{\text{direct}}_{RH}$. Hence, even if there are no studies directly comparing olanzapine and risperidone (that is, we cannot estimate $\mu^{\text{direct}}_{OR}$), we can still get an indirect estimate of the relative effectiveness between them. Bucher *et al*15 developed the idea of an indirect estimate and gave details on how to quantify its uncertainty and compute CIs. In a larger network of treatments, there is not necessarily only one indirect path from one intervention to another, as in the example we showed. If we had four antipsychotics, for example, adding paliperidone (P), we could get an indirect estimate for olanzapine versus risperidone through paliperidone (O-P-R) or through both haloperidol and paliperidone (O-H-P-R). Another advantage of NMA is that it is possible to synthesise both indirect evidence and direct evidence into one pooled estimate, which is called the *mixed estimate*. If the network also included studies directly comparing olanzapine with risperidone (giving $\mu^{\text{direct}}_{OR}$), this direct estimate would be used along with the indirect estimate to get a NMA estimate for the relative effectiveness between the two antipsychotics.
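The arithmetic behind an indirect comparison and a mixed estimate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in the article: the indirect estimate follows the subtraction described above with variances added, and the mixed estimate is a plain inverse-variance average of one direct and one indirect estimate, which is only a reasonable simplification for a single loop. The function names and all numbers are hypothetical.

```python
import math

def indirect_estimate(d_oh, se_oh, d_rh, se_rh):
    """Indirect O vs R via the common comparator H.

    The point estimate is the difference of the two direct estimates;
    the variances add because the two sets of trials are independent.
    """
    d_or = d_oh - d_rh
    se_or = math.sqrt(se_oh ** 2 + se_rh ** 2)
    return d_or, se_or

def mixed_estimate(d_direct, se_direct, d_indirect, se_indirect):
    """Inverse-variance average of a direct and an indirect estimate.

    Assumption: a single loop with independent direct and indirect evidence.
    """
    w_direct = 1.0 / se_direct ** 2
    w_indirect = 1.0 / se_indirect ** 2
    pooled = (w_direct * d_direct + w_indirect * d_indirect) / (w_direct + w_indirect)
    se_pooled = math.sqrt(1.0 / (w_direct + w_indirect))
    return pooled, se_pooled

def ci95(estimate, se):
    """Normal-approximation 95% CI."""
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical standardised mean differences versus haloperidol.
d_or, se_or = indirect_estimate(d_oh=-0.5, se_oh=0.3, d_rh=-0.2, se_rh=0.4)
print("indirect O vs R:", d_or, ci95(d_or, se_or))

# Hypothetical direct O vs R evidence combined with the indirect estimate.
print("mixed O vs R:", mixed_estimate(-0.1, 0.25, d_or, se_or))
```

Note that the indirect estimate is always less precise than either of the direct estimates it is built from, because its variance is their sum.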

Indirect evidence is plausibly valid and accurate if the unit of analysis is measured without uncertainty (or with uncertainty caused only by random variation). If we have three people, A, B and C, and we know that B is 5 cm taller than A and C is 8 cm taller than A, we immediately know that C is 3 cm taller than B. In practice, trials usually report relative differences that are subject not only to random variation but also to variation due to clinical and methodological aspects of the trials. For example, it has been shown that antidepressants are more effective in severely depressed people.16 If two active antidepressants, A and B, are compared with placebo, but A is tested only in trials with more severely depressed patients and B is tested in trials with patients suffering from mild depression, the indirect estimate for A versus B will give biased results and would probably show a spurious advantage of A over B (see below for more explanations about this).
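The antidepressant example can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the benefit over placebo increases linearly with mean baseline severity and that drugs A and B are equally effective by construction; the severity scores, the base effect and the slope are all hypothetical.

```python
def effect_vs_placebo(base_effect, mean_severity, slope):
    """Hypothetical linear effect modification by baseline severity."""
    return base_effect + slope * mean_severity

def indirect_a_vs_b(severity_a_trials, severity_b_trials,
                    base_effect=0.30, slope=0.05):
    """Indirect A vs B via placebo when A and B share the same true effect."""
    a_vs_placebo = effect_vs_placebo(base_effect, severity_a_trials, slope)
    b_vs_placebo = effect_vs_placebo(base_effect, severity_b_trials, slope)
    return a_vs_placebo - b_vs_placebo

# Same severity in both sets of trials: the indirect estimate is unbiased.
balanced = indirect_a_vs_b(20.0, 20.0)

# A tested in more severely ill patients: a spurious advantage for A appears.
unbalanced = indirect_a_vs_b(28.0, 14.0)

print(balanced, unbalanced)
```

With balanced severity the indirect difference is zero, as it should be for two equally effective drugs. With unbalanced severity the difference equals the slope times the severity gap, even though nothing distinguishes the drugs themselves.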

Even if participants are randomised within a RCT to receive one of the available treatments, NMA is by nature an observational process because treatment comparisons are not randomised across trials. We do not necessarily have the same distribution of trial characteristics across treatment comparisons. A key assumption in NMA is *transitivity*, which implies that the distribution of the effect modifiers is the same across treatment comparisons. Transitivity is not violated if the trial characteristic does not modify the effect of the interventions. If A versus B studies involve younger participants than A versus C studies, we can still get a valid indirect estimate for B versus C if age is not an effect modifier. By contrast, if effectiveness of interventions changes with age, an indirect estimate is invalid and results from NMA are misleading. The *transitivity* assumption may be challenged when we have interventions included in trials conducted in different time periods, for example, old and new interventions. There is ample evidence that older trials involve smaller sample sizes, are of worse quality and show exaggerated effects.17,18 Publication bias is also more evident in older interventions. The transitivity assumption has a statistical manifestation known as the *consistency assumption*. This assumption implies that direct and indirect evidence are in agreement. In practice, if three treatments, A, B and C, form a closed loop of evidence (any subset of interventions each of which has been directly compared with all the others), the consistency equation requires that $\mu^{\text{direct}}_{AB} = \mu^{\text{direct}}_{AC} - \mu^{\text{direct}}_{BC}$, that is, that the direct estimate (left-hand side of the equation) equals the indirect estimate (right-hand side). The difference between direct and indirect evidence in a loop is called the *inconsistency factor* of the loop.

## A working example

There are many aspects in a NMA that need to be addressed in a thorough analysis. Are the underlying assumptions valid? How to proceed if not? How to present and interpret results? To illustrate how to conduct a NMA, we will use a published NMA that includes 67 RCTs (16 073 patients) and compares 15 antimanic drugs and placebo in acute mania. There were two primary outcomes: the mean change on mania rating scales (efficacy) and the number of patients who dropped out (acceptability), both at 3 weeks. The former was measured on a continuous scale and standardised mean differences were synthesised, whereas the latter was dichotomous and ORs were synthesised. All analyses were performed in Stata19 using the mvmeta command20 and a suite of commands for presenting and interpreting results from a NMA.21

## Network plot

The first step in a NMA is to understand the geometry of the network; that is, to understand which treatments have been compared directly in the included RCTs, how information flows indirectly and the contribution of certain interventions or treatment comparisons in the network. A standard tool to achieve these goals is a *network plot* that depicts the competing interventions as nodes and uses lines to connect those interventions that have been compared directly in a RCT. The size of a node can be used to represent extra information such as the number of studies involving this intervention or the number of participants who have been randomised to it. The width of a line can likewise denote the number of studies for each comparison or the number of participants observed in that comparison.
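The quantities that drive a network plot can be tabulated directly from a list of trials. The following is a minimal sketch, assuming each trial is recorded simply as the tuple of treatments it compared; the helper names and the toy trial list are hypothetical and not taken from the antimanic data set. It counts studies per node and per edge, and checks that the network is connected, which NMA requires.

```python
from collections import Counter
from itertools import combinations

def network_geometry(trials):
    """Count studies per treatment (node size) and per comparison (line width).

    trials: iterable of tuples of treatment names, one tuple per trial.
    Multi-arm trials contribute to every pair of their arms.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for arms in trials:
        arms = sorted(set(arms))
        node_counts.update(arms)
        edge_counts.update(combinations(arms, 2))
    return node_counts, edge_counts

def is_connected(edge_counts):
    """Depth-first search over the comparison graph."""
    neighbours = {}
    for a, b in edge_counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return True
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)

# Hypothetical toy network, for illustration only.
trials = [
    ("placebo", "haloperidol"),
    ("placebo", "olanzapine"),
    ("haloperidol", "olanzapine", "placebo"),  # three-arm trial
]
nodes, edges = network_geometry(trials)
print(nodes, edges, is_connected(edges))
```

Counting participants instead of studies would only require carrying a sample size alongside each arm and summing it in place of the unit increments.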

We can use the network plot to reveal information about characteristics that pertain to treatment comparisons. We may use different colours to represent trial characteristics that vary across treatment comparisons. Figure 1 shows two versions of the network plot. It is clear that most trials are placebo-controlled. Among the active antimanics, haloperidol, lithium and olanzapine contribute substantially to the network. There are methods, not explored here, to determine exactly the contribution of each intervention or comparison in the network.21,22 The right-hand side plot of figure 1 conveys the same information but lines have been coloured according to publication date. A light grey colour for a treatment comparison means that most studies for that comparison were published before 2003, whereas a dark grey colour means that the majority of studies were published from 2003 onwards. We observe that most of the evidence comparing carbamazepine, divalproex, haloperidol and lithium with other drugs comes from older studies.

The risk of bias in studies included in a treatment comparison should affect the confidence we place on the direct estimate. Salanti *et al*23 and Puhan *et al*24 extended the methodology developed by the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) working group for placing confidence in the treatment effect estimates from a NMA. In a network plot, we can use different colours in the lines to represent the level of confidence we place in a treatment comparison or in a specific item of the risk of bias tool.21

## Network estimates

NMA allows estimating the relative effectiveness between any pair of interventions. In our example of 15 antimanic drugs, there are 105 relative effect estimates for each of the two outcomes (210 in total). One way to present results from a NMA is by drawing a square matrix, known as a league table, which contains all information about relative effectiveness and their uncertainty for all pairs of interventions (table 1). It is difficult to comprehend all the information from the league table but a closer look reveals that risperidone and haloperidol are more effective than most of the drugs and almost all drugs are more effective than lamotrigine, placebo, topiramate and gabapentin. Topiramate also ranks very low in terms of acceptability, and there is a significant difference between olanzapine and haloperidol in favour of olanzapine. From a clinical point of view, this is very important, because these two drugs are among the most effective ones.
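The structure of a league table can be sketched as follows. Under consistency, every pairwise relative effect is the difference of two effects measured against a common reference, so the full table follows from one effect per treatment. The treatment labels and effect values below are hypothetical, and uncertainty is omitted because it requires the covariance matrix of the estimates, which is not available here. The count of 105 matches the figure quoted in the text for 15 treatments.

```python
def n_pairwise(n_treatments):
    """Number of distinct treatment pairs, n(n - 1) / 2."""
    return n_treatments * (n_treatments - 1) // 2

def league_table(effects_vs_reference):
    """All pairwise relative effects implied by consistency.

    effects_vs_reference: dict mapping treatment -> effect versus a common
    reference treatment (the reference itself has effect 0.0).
    Returns a dict keyed by (row, column) with the effect of row vs column.
    """
    names = list(effects_vs_reference)
    return {
        (row, col): effects_vs_reference[row] - effects_vs_reference[col]
        for row in names
        for col in names
        if row != col
    }

# Hypothetical effects versus placebo, for illustration only.
effects = {"placebo": 0.0, "drug_x": -0.5, "drug_y": -0.3}
table = league_table(effects)

print(n_pairwise(15))               # pairs among 15 treatments
print(table[("drug_x", "drug_y")])  # effect of drug_x relative to drug_y
```

Each pair appears twice in the dictionary with opposite signs, which mirrors the two triangles of a square league table.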

## Inconsistency plot

Consistency is a key assumption for NMA and should be checked in each closed loop of evidence. Figure 2 presents the inconsistency factors (differences between direct and indirect evidence) accompanied by the 95% CI and the p value from testing equality of direct and indirect evidence. We are not interested in the direction of inconsistency (whether direct estimates exceed the indirect ones or vice versa), but only in its magnitude. Hence, we reverse the sign of all negative inconsistency factors so that they can be depicted on a positive scale in a graph. A zero inconsistency factor implies that direct and indirect evidence are in agreement for that loop. Caution is needed because with many treatments and many closed loops of evidence we may find inconsistency in some loops by pure chance. In addition, it is common to have too few studies in some loops to compute the corresponding inconsistency factor with much certainty. Inconsistency cannot be excluded if the 95% CI of an inconsistency factor includes zero but is wide. Large inconsistency may compromise the validity of results from a NMA. Several other methods have been suggested to test for inconsistency.20,25,26 If inconsistency is found, we may explore it using network metaregression27 or encompass it by assuming statistical models that relax the consistency assumption.20,28 In figure 2 we see that there are significant inconsistencies in loops involving drugs such as carbamazepine, divalproex and lithium. These are drugs encountered in older studies. We found that the direct evidence from studies comparing carbamazepine and divalproex favoured divalproex (−0.84, 95% CI −1.63 to −0.03), while the indirect evidence favoured carbamazepine (0.30, 95% CI 0.01 to 0.57). Moreover, direct evidence for the relative effect for haloperidol versus lithium was very large (1.10, 95% CI 0.31 to 1.91) in favour of haloperidol whereas the indirect relative effect was very small (0.16, 95% CI 0.00 to 0.33).
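An inconsistency factor and its test can be approximated from a direct and an indirect estimate. The sketch below takes the carbamazepine versus divalproex figures reported above and makes two assumptions that the article does not state: SEs are recovered from the 95% CIs under a normal approximation, and the direct and indirect estimates are treated as independent. It is therefore an illustration of the idea and is not expected to reproduce the values in figure 2.

```python
import math

Z95 = 1.959964  # 97.5th percentile of the standard normal distribution

def se_from_ci(lower, upper):
    """Approximate SE from a symmetric 95% CI (normality assumed)."""
    return (upper - lower) / (2.0 * Z95)

def inconsistency_factor(direct, se_direct, indirect, se_indirect):
    """Absolute difference between direct and indirect evidence.

    Returns the magnitude, its SE and a two-sided p value from a z test.
    Assumption: direct and indirect estimates are independent.
    """
    diff = direct - indirect
    se = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return abs(diff), se, p_value

# Carbamazepine vs divalproex, values as reported in the text.
se_direct = se_from_ci(-1.63, -0.03)
se_indirect = se_from_ci(0.01, 0.57)
magnitude, se, p_value = inconsistency_factor(-0.84, se_direct, 0.30, se_indirect)
print(magnitude, se, p_value)
```

Taking the absolute value corresponds to the convention described above of plotting all inconsistency factors on a positive scale.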

## Ranking of interventions

One of the unique features of a NMA is the ability to rank the competing interventions. We can, for each treatment, estimate the probability of it assuming any of the possible ranks. It is common to use the ‘probability of being the best’ as a method of finding the best treatment. We strongly advise against this practice because it does not take into account the uncertainty of the relative effect estimate and the probabilities of assuming any of the other possible ranks.29 One can alternatively use rankograms and cumulative ranking probability plots.21,29 A rankogram plots the probabilities for a treatment to assume any of the possible ranks. Figure 3 shows rankograms of the antimanic drugs for efficacy. Risperidone and haloperidol have high probabilities of being either the best or the second best drug. Both drugs have probabilities larger than 80% of being among the two most effective drugs. This is also evident from figure 4, where we give the SUCRA (surface under the cumulative ranking curve) value for each intervention, which is the ratio of the area under the cumulative ranking curve to the entire area in the plot. The more quickly the cumulative ranking curve approaches one, the closer to unity this ratio is. SUCRA values may be seen as the percentage of effectiveness (or safety) a treatment achieves in relation to an imaginary treatment that is always the best without any uncertainty. Ideally, we would like to see peaks in a rankogram or a steep increase in the cumulative ranking plot, as that would suggest that the rank where we observe the peak is the most probable one for that drug. In our example, this happens for the most effective drugs (risperidone, haloperidol and olanzapine) and for the least effective drugs (divalproex, gabapentin, lamotrigine, placebo, topiramate and ziprasidone). A similar distribution of rank probabilities across all (or many) possible ranks indicates uncertain ranking for that treatment.

Figure 5 is a scatterplot of the SUCRA values for efficacy against those for acceptability of the antimanic drugs. We use different symbols to cluster drugs into groups. It seems that haloperidol makes a group on its own: although it is very effective, there are more patient dropouts from its trials than in trials of other effective drugs.
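A SUCRA value can be computed from one row of the rank probability matrix. The sketch below uses the usual discrete form of the area ratio, the average of the cumulative ranking probabilities over the first a − 1 ranks for a network of a treatments; the function name and the example probabilities are hypothetical.

```python
def sucra(rank_probs):
    """SUCRA from the rank probabilities of one treatment.

    rank_probs[j] is the probability of rank j + 1 (rank 1 = best);
    the probabilities are assumed to sum to one.
    """
    n_ranks = len(rank_probs)
    cumulative = 0.0
    area = 0.0
    # The last cumulative probability is always one, so it is excluded.
    for p in rank_probs[:-1]:
        cumulative += p
        area += cumulative
    return area / (n_ranks - 1)

# Hypothetical rank probabilities for three treatments in a 3-node network.
print(sucra([1.0, 0.0, 0.0]))        # certainly the best
print(sucra([0.0, 0.0, 1.0]))        # certainly the worst
print(sucra([1 / 3, 1 / 3, 1 / 3]))  # completely uncertain ranking
```

A treatment that is certainly the best reaches a SUCRA of one and one that is certainly the worst reaches zero, while a flat rankogram lands in the middle, which is why a SUCRA value should be read together with the rankogram itself.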

## Comparison-adjusted funnel plot

A funnel plot is a scatterplot of study effect size against its inverted SE, and an asymmetrical funnel plot implies there are differences in effectiveness between small and large studies, also known as small-study effects.30 With many treatments, there is not one summary estimate but many. Chaimani *et al*21 extended the use of the funnel plot to NMA by plotting the difference between each study-specific effect size and the corresponding comparison-specific summary effect against the inverted SE. Prior to drawing this plot, it is important to order the treatments in a meaningful way, reflecting which treatment the small-study effect would be expected to favour in each comparison. In figure 6 we sorted drugs according to efficacy as measured by their SUCRA values; in other words, our assumption was that more effective drugs were favoured in small trials. There was no asymmetry in the plot, suggesting that smaller studies do not favour more effective (or less effective) treatments. This plot may also reveal outlying effect sizes for a given study size. For instance, in this working example we found a large effect size for a small study comparing haloperidol versus lithium. This large effect size may be responsible for the large inconsistency factors observed in loops including this comparison.
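The coordinates of a comparison-adjusted funnel plot can be derived by centring each study effect on the summary effect of its own comparison. The sketch below uses a fixed-effect inverse-variance mean as the comparison-specific summary, which is an assumption made for brevity and may differ from the summary used in the article. It also assumes every comparison has already been oriented according to the chosen treatment ordering. The study records are hypothetical.

```python
from collections import defaultdict

def comparison_adjusted_points(studies):
    """Centre each study effect on its comparison-specific summary.

    studies: list of dicts with keys 'comparison', 'effect' and 'se'.
    Returns one (centred_effect, se) pair per study, in input order.
    Assumption: the summary is a fixed-effect inverse-variance mean.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for study in studies:
        weight = 1.0 / study["se"] ** 2
        weighted_sum[study["comparison"]] += weight * study["effect"]
        weight_total[study["comparison"]] += weight
    summary = {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}
    return [(s["effect"] - summary[s["comparison"]], s["se"]) for s in studies]

# Hypothetical studies, for illustration only.
studies = [
    {"comparison": "X vs placebo", "effect": -0.6, "se": 0.2},
    {"comparison": "X vs placebo", "effect": -0.2, "se": 0.2},
    {"comparison": "Y vs placebo", "effect": -0.4, "se": 0.3},
]
for centred, se in comparison_adjusted_points(studies):
    print(round(centred, 3), se)
```

The vertical coordinate is returned as the SE itself; how that axis is inverted for display is a plotting choice left outside this sketch. After centring, all comparisons share zero as their reference line, so asymmetry around zero can be judged across the whole network at once.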

## Conclusions

There is a series of conceptual challenges when conducting a NMA and these should be borne in mind by clinicians who read such publications in scientific journals. First of all, disagreement between direct and indirect evidence (inconsistency) poses a threat to the validity of results from a NMA. Presentation of results is not as straightforward as in traditional meta-analysis. Forest plots are of little use in a NMA. Instead, relative effects may be presented in a league table and small-study effects can be explored by a comparison-adjusted funnel plot. NMA is a relatively new technique and methodology is advancing rapidly. There is a lot of ongoing research on how to evaluate the quality of evidence from a NMA. Until recently, NMA was understood mainly by researchers with a strong statistical background, but the development of user-friendly software has acquainted clinicians with NMA and popularised the method. New methods for testing and accounting for inconsistency, and for ranking the available treatments, are constantly being developed. Just as in traditional meta-analysis, publication bias30 and missing outcome data,31 which are very common in mental health trials, may compromise overall results.

## Acknowledgments

DM, MG and GS received research funding from the European Research Council (IMMA 260559). AC acknowledges support from the NIHR Oxford cognitive health Clinical Research Facility.

## References

## Footnotes

Competing interests None.