Development and validation of a dementia risk score in the UK Biobank and Whitehall II cohorts

Background: Current dementia risk scores have had limited success in consistently identifying at-risk individuals across different ages and geographical locations.

Objective: We aimed to develop and validate a novel dementia risk score for a midlife UK population, using two cohorts: the UK Biobank and the UK Whitehall II study.

Methods: We divided the UK Biobank cohort into a training (n=176 611, 80%) and test sample (n=44 151, 20%) and used the Whitehall II cohort (n=2934) for external validation. We used Cox LASSO regression to select the strongest predictors of incident dementia from 28 candidate predictors and then developed the risk score using competing risk regression.

Findings: Our risk score, termed the UK Biobank Dementia Risk Score (UKBDRS), consisted of age, education, parental history of dementia, material deprivation, a history of diabetes, stroke, depression, hypertension, high cholesterol, household occupancy, and sex. The score showed strong discrimination in the UK Biobank test sample (area under the curve (AUC) 0.8, 95% CI 0.78 to 0.82) and in the Whitehall II cohort (AUC 0.77, 95% CI 0.72 to 0.81). The UKBDRS also significantly outperformed three other widely used dementia risk scores originally developed in cohorts in Australia (the Australian National University Alzheimer’s Disease Risk Index), Finland (the Cardiovascular Risk Factors, Ageing, and Dementia score), and the UK (Dementia Risk Score).

Clinical implications: Our risk score represents an easy-to-use tool to identify individuals at risk for dementia in the UK. Further research is required to determine the validity of this score in other populations.


Inclusion and Exclusion Criteria
For the UKB study, we restricted our sample to participants who were aged 50 years or older and had complete data available on all candidate predictors of interest, described in section 2.3. To improve alignment of age ranges between the WHII and UKB samples, Wave 5 (1997-1999) of the WHII study was selected as our 'baseline', given that the minimum age of the sample at that wave was just above 40. To minimise the likelihood of reverse causation, in UKB we excluded participants who self-reported dementia at baseline, or who developed dementia prior to or within the first 12 months of the baseline assessment. Similarly, for WHII, we excluded participants who self-reported a diagnosis at Wave 6 (2001, median years after baseline = 3), or who had developed dementia within 2 years of the baseline visit. The two-year criterion was introduced to ensure that at least 12 months had passed between the baseline assessment and the diagnosis date, given that the WHII sample only had the calendar year of dementia diagnosis available. For example, this would prevent including someone who had attended their Wave 5 appointment in December 1997 and then received a dementia diagnosis in January 1998. Individuals in the WHII cohort with no recorded follow-up (i.e., those who attended the baseline assessment and no further assessment, or who had no date recorded at a later assessment) were also excluded because their length of follow-up could not be determined.
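The two-year criterion for WHII can be illustrated with a short sketch. This is only one possible reading of the rule: it assumes that a participant is excluded when the difference between the calendar year of diagnosis and the calendar year of baseline is less than 2. The function name and the exact handling of the boundary are our assumptions, not taken from the study code.

```python
from typing import Optional

MIN_YEAR_GAP = 2  # two-year criterion described in the text


def excluded_by_two_year_rule(baseline_year: int,
                              diagnosis_year: Optional[int]) -> bool:
    """Return True if a participant would be excluded under the assumed rule.

    Assumption: only calendar years are known, and a gap of fewer than
    MIN_YEAR_GAP calendar years counts as "within 2 years of baseline".
    Diagnoses before baseline give a negative gap and are excluded as well.
    """
    if diagnosis_year is None:  # no dementia diagnosis recorded
        return False
    return diagnosis_year - baseline_year < MIN_YEAR_GAP


# Example from the text: baseline in December 1997, diagnosis in January 1998.
print(excluded_by_two_year_rule(1997, 1998))
```

Under this reading, a gap of exactly two calendar years guarantees that more than 12 months separate baseline and diagnosis, which is the stated purpose of the criterion.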

Dementia Ascertainment
In the UKB sample, all-cause dementia status was determined based on complementary sources of information, as done in several papers based on this cohort [4][5][6][7][8][9][10]. These included self-reported medical history at the time of interview (UKB field ID #20002), primary care records (#42040, #42039), hospital inpatient records (#41270, #41280), and death registry records (#40001, #40002). The International Classification of Diseases (ICD) was used to identify cases of all-cause dementia, with the list of ICD-9 and ICD-10 codes presented in SI Table 1. ICD-9 codes were included to enhance our sensitivity for detecting cases of dementia prior to enrolment. In the UKB cohort, an individual was classified as having dementia if they had (1) self-reported a diagnosis at baseline (excluded from analyses), (2) received a primary or secondary diagnosis of dementia (primary care or hospital records), (3) been prescribed dementia-related medications (e.g., rivastigmine) by their general practitioner (GP), or (4) a primary or secondary cause of death that was dementia-related. For individuals with multiple dates available for the above variables, the date of diagnosis was determined either from the algorithmically defined dates (field ID #42018) or, otherwise, as the date of a primary care diagnosis, the primary care prescription date for dementia-related medications, or the date of death derived from primary care records, whichever came earliest. Years to diagnosis was then calculated as the difference between the baseline assessment date and the earliest date of dementia diagnosis. In the WHII sample, dementia diagnosis was determined through self-report of a long-standing illness of dementia and hospital inpatient records [11][12][13] (SI Table 1). The date of diagnosis was taken as the earlier date of the two sources. For those with only a self-report of a long-standing illness of dementia at a study assessment, the date was taken as 1 year prior to the assessment year to account for the lag in assessment.
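The "earliest date across sources" rule described above can be sketched as follows. This is a minimal illustration rather than the study code: the function names and the example dates are hypothetical, and the conversion from days to years using 365.25 is our assumption.

```python
from datetime import date
from typing import Optional


def earliest_diagnosis(*source_dates: Optional[date]) -> Optional[date]:
    """Return the earliest non-missing date across record sources, or None."""
    known = [d for d in source_dates if d is not None]
    return min(known) if known else None


def years_to_diagnosis(baseline: date, diagnosis: date) -> float:
    """Years between baseline and diagnosis (365.25 days per year, assumed)."""
    return (diagnosis - baseline).days / 365.25


def whii_self_report_year(assessment_year: int) -> int:
    """WHII self-report only: date taken as 1 year before the assessment year."""
    return assessment_year - 1


# Hypothetical participant: primary care date, no prescription date, hospital date.
diagnosis = earliest_diagnosis(date(2015, 3, 10), None, date(2014, 11, 2))
print(diagnosis, years_to_diagnosis(date(2008, 5, 1), diagnosis))
```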

Follow up and censoring derivation
In the UKB, the censoring date was 2021-10-31, the date up to which diagnoses were available. Death records were used to obtain the date of death. For individuals with dementia, time at risk was calculated as the time in days between their baseline assessment and dementia diagnosis. For individuals with a death record and no diagnosis of dementia, time at risk was computed as the time between their baseline assessment and the date of their death record. For individuals who were alive and dementia-free, time at risk was taken as the time between their baseline assessment and the censoring date.
In the WHII, a combination of hospital records and study assessments was used to derive follow-up. The latest record was taken as the later of an individual's last assessment date and the year of their most recent hospital admission. For example, an individual with a baseline assessment in 1997 who last attended a study wave in 2012 and had a hospital record in 2013 was defined as having a time at risk of 2013 - 1997 = 16 years. Date of death was drawn from the national mortality register.
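The follow-up rules above can be sketched as a small function. The status codes (0 = censored, 1 = dementia, 2 = death without dementia) are our assumption, chosen because competing risk software typically expects a coding of this kind. The function and variable names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

UKB_CENSOR_DATE = date(2021, 10, 31)  # censoring date stated in the text


def ukb_time_at_risk(baseline: date,
                     dementia_date: Optional[date] = None,
                     death_date: Optional[date] = None,
                     censor_date: date = UKB_CENSOR_DATE) -> Tuple[int, int]:
    """Return (days at risk, status) following the UKB rules in the text.

    Status coding is an assumption: 0 = censored, 1 = dementia,
    2 = death without dementia (the competing event).
    """
    if dementia_date is not None:
        return (dementia_date - baseline).days, 1
    if death_date is not None:
        return (death_date - baseline).days, 2
    return (censor_date - baseline).days, 0


def whii_years_at_risk(baseline_year: int,
                       last_wave_year: int,
                       last_hospital_year: int) -> int:
    """WHII: the later of last study wave and last hospital record, minus baseline."""
    return max(last_wave_year, last_hospital_year) - baseline_year


# Worked example from the text: 2013 - 1997 = 16 years.
print(whii_years_at_risk(1997, 2012, 2013))
```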

Identifying Candidate Predictors
We compiled a list of 28 risk and protective factors associated with dementia, including the 12 modifiable factors identified by the Lancet Commission [14]. Predictors were selected for inclusion if (1) they had been consistently associated with dementia, (2) information about them was available in UKB, and (3) they could be easily obtained within a primary care setting. The list included demographic, biomedical, lifestyle, and genetic variables. Demographic variables consisted of age, sex, years of education, and material deprivation. Material deprivation was measured by the Townsend Deprivation Index, which combines data on car ownership, household overcrowding, owner occupation, and unemployment [36]. In the UK Biobank, scores are computed for each postcode and then split into quintiles, with the first quintile representing the fifth of the sample that was least deprived. In the WHII cohort, Townsend scores are available as quartile splits. Where applicable, we mapped the four quartiles in WHII to the second to fifth quintiles as assessed in the UK Biobank dataset. Biomedical variables included body mass index (BMI), systolic blood pressure (BP), total cholesterol, high-density lipoprotein (HDL) cholesterol, and low-density lipoprotein (LDL) cholesterol. Medical history included self-reported history of a stroke/transient ischaemic attack (TIA), traumatic brain injury (TBI), depression, diabetes (I and II), atrial fibrillation, hypertensive status, high cholesterol, and parental history of dementia. Hypertensive and cholesterol status were determined based on a combination of self-report, self-reported use of medications (anti-hypertensives or statins, respectively), or an inpatient diagnosis (see SI Table 2 for the codes used). 
Self-reported prescriptions of the following medications were also included: non-steroidal anti-inflammatory drugs (excluding aspirin) and hormone replacement therapy medications. Lifestyle variables consisted of physical activity (low, moderate/high-intensity classifications based on the International Physical Activity Questionnaire [15]), social engagement (household occupancy as living alone, living with one other, or living with multiple people, and frequency of family and friend visits per week), weekly fish consumption, average daily sleep duration, sleeplessness, weekly units of alcohol (computed as in [16,17]), and smoker status. The full list of predictors is available with detailed descriptions, including UKB field codes, in SI Table 2.

Construction of external risk scores
Each score was calculated using the formulae reported in the original papers for the DRS, ANU-ADRI and CAIDE. As described in the main text, the UKBDRS and the three external risk scores were computed in (1) the entire UKB test and WHII samples (results presented in Table 2) and (2) a stratified analysis in which the UKB and WHII samples were truncated to match the age ranges of the cohorts in which the external scores were originally developed (SI Table 7).

ANU-ADRI:
This score was originally developed using an evidence-based medicine approach, in which risk factors are identified based on a review of the literature [18]. A total of 15 dementia risk factors are included in the ANU-ADRI: age, sex, educational level, BMI, diabetes, TBI, depressive symptoms, high cholesterol, cognitive activity levels, social engagement, smoking, alcohol consumption, physical activity, fish intake and pesticide exposure. We computed the ANU-ADRI as the weighted sum of the beta coefficients of these risk factors, as reported in Anstey et al. [18]. For the UKB cohort, the cognitive information necessary to accurately estimate cognitive activity levels was not available, so this variable was excluded from the ANU-ADRI calculation. In the WHII, no information was available on pesticide exposure or on whether a participant had a history of TBI, so these variables were not included in the risk score calculation. Previous external validations of this score have also left out items such as pesticide exposure due to lack of data availability [19]. As a result, we were only able to compute a weighted sum rather than a predicted risk. Therefore, only discrimination (not calibration) was evaluated for the ANU-ADRI.

DRS:
The DRS [20] was originally developed using data (N = 930,395) from The Health Improvement Network (THIN) and offers two versions, one for individuals aged 60-79 years (N = 800,013, mean age = 65.6 ± 6.08) and another for individuals aged 80-95 years (N = 130,382, mean age = 84.8 ± 3.93). Given the age range of the UKB and WHII, we opted to use the DRS for 60-79 year-old individuals for all analyses in this study. The DRS is calculated using age, sex, BMI, calendar year (of participation), deprivation score, smoker status, drinker status, current depression and antidepressant use, aspirin use, history of stroke and TIA, history of atrial fibrillation and diabetes. While the deprivation score is computed using 5 equal groups based on quintiles of the Townsend deprivation index, only four groups based on deprivation scores were available in the WHII cohort. The DRS score is computed as:

Development of the UKBDRS
To identify a parsimonious model, we submitted the candidate predictors to a least absolute shrinkage and selection operator (LASSO) Cox regression. Recently utilised in the development of dementia risk models [22,23], LASSO is a type of regularised regression developed to minimise model overfitting [24]. This method favours a sparse solution by setting the coefficients of predictors that fall below a certain threshold to zero [24,25], effectively providing a list of the most informative predictors of dementia. An initial ten-fold cross-validation was performed to identify the optimal lambda value that determines this threshold. The highest lambda within one standard error of the minimum was selected for further analysis. As the results of LASSO can be unreliable in the presence of collinear predictors, the correlations between all numerical predictors were checked prior to analysis, with the correlation matrix presented in SI Figure 2. If two variables were highly correlated (i.e., r ≥ 0.8), one of these variables was removed. Given the high correlation between total and LDL cholesterol (Pearson's r = 0.95), the former was removed from the list of predictors submitted to the LASSO model.
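The two screening steps described above, the collinearity check and the "highest lambda within one standard error of the minimum" rule, can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code: all data below are made up, the function names are hypothetical, and the use of the absolute correlation (so that strong negative correlations are also flagged) is our assumption.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.8):
    """List predictor pairs whose |r| meets the threshold (|r| is our assumption)."""
    names = list(columns)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson_r(columns[a], columns[b])) >= threshold]


def one_se_lambda(lambdas, fold_errors):
    """Largest lambda whose mean CV error is within 1 SE of the minimum.

    fold_errors[i] holds the per-fold cross-validation errors for lambdas[i].
    """
    means = [mean(e) for e in fold_errors]
    ses = [stdev(e) / math.sqrt(len(e)) for e in fold_errors]
    best = min(range(len(lambdas)), key=means.__getitem__)
    cutoff = means[best] + ses[best]
    return max(lam for lam, m in zip(lambdas, means) if m <= cutoff)


# Made-up predictor values for four hypothetical participants.
toy = {"total_chol": [5.0, 6.0, 7.0, 8.0],
       "ldl_chol": [3.1, 3.9, 5.2, 6.0],
       "bmi": [30.0, 22.0, 27.0, 24.0]}
flagged = highly_correlated_pairs(toy)

# Made-up cross-validation errors (three folds shown for brevity).
chosen = one_se_lambda([0.001, 0.01, 0.1],
                       [[1.0, 1.2, 1.1], [1.1, 1.2, 1.15], [2.0, 2.1, 2.2]])
print(flagged, chosen)
```

Choosing the largest lambda within one standard error trades a small loss in cross-validated fit for a sparser model, which matches the stated aim of a parsimonious score.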
Failure to account for competing risk can inflate the associations between predictors and outcomes. A competing risk regression models the effect of predictors on the outcome while accounting for competing risk of death without dementia [26][27][28]. To obtain the coefficients of predictors, a competing risk regression was used with LASSO selected variables as predictors.
Time at risk was derived as previously described (Follow up and censoring derivation). We used the crr command from the R package cmprsk. The resulting coefficients are used to compute predicted risk via the formula:

$$\text{Predicted risk} = 1 - S_0^{\exp(LP)}$$

where LP is the linear predictor and $S_0$ is the baseline 14-year survival of an individual with all predictors equal to 0, i.e., being female, with the mean age of the training set (59.97 years), the mean years of education of the training set (13.54 years), no diabetes, no depression, no stroke, not living alone, not materially deprived, and without parental history, hypertension, or hypercholesterolemia.
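As an illustration of how a predicted risk could be obtained from a linear predictor and a baseline survival, the sketch below assumes the standard form 1 - S0^exp(LP), with age and education centred on the training-set means given in the text. The coefficients, the baseline survival value, and the example person are all hypothetical placeholders and are not the published UKBDRS values.

```python
import math

MEAN_AGE = 59.97        # training-set mean age, from the text
MEAN_EDUCATION = 13.54  # training-set mean years of education, from the text

# Hypothetical placeholders, NOT the published coefficients or baseline survival.
HYPOTHETICAL_COEFS = {"age": 0.18, "education": -0.04, "diabetes": 0.5}
HYPOTHETICAL_S0 = 0.99


def linear_predictor(coefs, predictors):
    """Sum of coefficient * predictor value over the predictors in coefs."""
    return sum(beta * predictors[name] for name, beta in coefs.items())


def predicted_risk(lp, baseline_survival):
    """Predicted risk under the assumed form 1 - S0 ** exp(LP)."""
    return 1.0 - baseline_survival ** math.exp(lp)


# Hypothetical person: aged 70, 10 years of education, with diabetes.
person = {"age": 70 - MEAN_AGE, "education": 10 - MEAN_EDUCATION, "diabetes": 1}
risk = predicted_risk(linear_predictor(HYPOTHETICAL_COEFS, person), HYPOTHETICAL_S0)
print(f"{risk:.3f}")
```

With all predictors at their reference values the linear predictor is 0, so the predicted risk reduces to 1 - S0, the baseline risk.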
Model assumptions were checked by fitting a Cox regression model with the selected predictors. The p-value for deviation from proportional hazards was significant for age and depression. However, visual inspection of the Schoenfeld residuals showed minimal trend for both variables (SI Figure 3). For further inspection of depression, we plotted survival against time for those with and without depression. The survival curves for the two groups were parallel and did not cross, which is consistent with the proportional hazards assumption. We also compared the form of the curves to those for the diabetes and family history variables, which did not have a significant p-value (SI Figure 4). In both cases, the curves showed a similar form to those for depression, increasing confidence that the assumptions were met. This led us to disregard the proportionality violations for these two predictors.

Evaluation of the UKBDRS
The performance of the UKBDRS was compared to the DRS, CAIDE, and ANU-ADRI in the UKB training, UKB test, and WHII datasets. All models were additionally compared to a baseline model consisting of chronological age only, to examine the added predictive value of the additional factors. Model discrimination was evaluated using the area under the curve (AUC). To obtain an AUC, a receiver operating characteristic (ROC) curve is first created by plotting the sensitivity of a test against 1 - specificity. The points along the ROC curve correspond to the sensitivity and specificity obtained with various decision thresholds. The AUC is then computed as the area under the ROC curve. The AUC is interpreted as the probability that, for a pair of individuals of whom one develops the outcome and the other either develops the outcome at a later time or not at all, the predicted risk is greater in the individual who develops dementia earlier [27]. In the UKB, the maximum follow-up time was 14.8 years, so we evaluated our score at a time horizon of 14 years. In the WHII, we evaluated AUC at 17 years, as participants were younger at baseline and relatively few individuals had developed dementia at shorter time windows. By 17 years, 71 WHII participants had a dementia diagnosis. The longer follow-up in WHII allowed sufficient cases of dementia in the analyses. We used risk calibration to assess the agreement between the observed proportion of dementia cases and the predicted probabilities of developing dementia as calculated from the risk score. In a well calibrated model, it is expected that 25% of individuals with a predicted risk of 25% will develop the outcome. Conversely, in a poorly calibrated model, of those with a 25% predicted risk, 10% or 50% may actually develop the outcome. Participants were stratified into 10 risk groups based on their predicted probability of developing dementia. 
For each group, the mean predicted risk is then plotted against the observed risk of the group, calculated using the cumulative incidence function [29,30]. A line of best fit through these 10 data points is then used to assess calibration. Perfect calibration corresponds to a diagonal line (i.e., an intercept of 0 and a slope of 1), indicating that the proportion of individuals in each group who develop the outcome is in line with the average predicted risk of the group. Deviations from an intercept of 0 and a slope of 1 indicate either over- or under-estimation of risk. As the WHII study did not have the minimum number of cases required to derive precise calibration intercepts and slopes (i.e., minimum n=100 dementia cases required) or calibration curves (minimum n=200 dementia cases required [31]), we only examined calibration in the UKB test set (SI Table 9, SI Figure 5).
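The discrimination and calibration measures described above can be sketched in simplified form. Both functions below treat the outcome as a plain binary label and therefore ignore censoring and the competing risk of death. They are not the time-dependent AUC or the cumulative-incidence-based calibration used in the analysis. The function names and the synthetic data are our own.

```python
def auc_pairwise(risks, outcomes):
    """Probability that a case has a higher predicted risk than a non-case.

    Simplified binary-outcome AUC; ties count as one half.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def calibration_line(risks, outcomes, n_groups=10):
    """Intercept and slope of observed vs mean predicted risk across risk groups.

    Observed risk here is the simple proportion of events in each group
    (an assumption; the text uses the cumulative incidence function).
    """
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    xs, ys = [], []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        xs.append(sum(r for r, _ in group) / len(group))  # mean predicted risk
        ys.append(sum(y for _, y in group) / len(group))  # observed proportion
    mx, my = sum(xs) / n_groups, sum(ys) / n_groups
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Synthetic, perfectly calibrated data: 10 groups of 10 people, where group g
# has predicted risk g/10 and exactly g observed events.
risks = [g / 10 for g in range(10) for _ in range(10)]
outcomes = [1 if i < g else 0 for g in range(10) for i in range(10)]
intercept, slope = calibration_line(risks, outcomes)
print(round(intercept, 6), round(slope, 6), auc_pairwise(risks, outcomes))
```

On these synthetic data the fitted line has an intercept close to 0 and a slope close to 1, the pattern expected under perfect calibration.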

20107, 20110
Hearing problems
Participants were asked whether they had difficulty with their hearing. Participants who indicated "Yes" or "I am completely deaf" were coded as "1", while those indicating "No" were coded as "0".

polymorphisms: rs429358 and rs7412. An individual with one or more E4 alleles was coded as a carrier (= "1").

Smoking status
Participants who reported being "ever" or "current" smokers were coded as "1", while those who did not smoke were coded as "0".

Sleeplessness
Participants were asked "Do you have trouble falling asleep at night or do you wake up in the middle of the night?", with the response options ranging from "Never/rarely" to "Usually".

1200
Frequency of family/friend visits
This item asked individuals to report how often they visited or received visits from their family and friends. The response options ranged from "No friends/family outside household" and "Never or almost never" to "Almost daily".

Household occupancy
This item required respondents to indicate "Including yourself, how many people are living together in your household? (Include those who usually live in the house such as students living away from home during term, partners in the armed forces or professions such as pilots)". From this, participants were grouped into one of three categories: lives alone, lives with one other person, or lives with multiple people.

Fish intake
Participants were asked to report their weekly oily (e.g., sardines, salmon, mackerel) and non-oily (e.g., cod, tinned tuna, haddock) fish intake.

Table 8: Risk score cut-offs at 80%, 85%, 90% and 95% sensitivity and specificity. Note: the predicted probabilities of risk have been converted to percentages to aid in interpretation. Abbreviations: NPV = Negative predictive value; PPV = Positive predictive value.

Table 9: Calibration intercept and slope. The mean predicted vs. observed risk is plotted within 10 risk stratifications, and linear regression is used to fit the intercept and slope of the line. Deviations from intercept/slope of 0/1 indicate imperfect calibration.

Model
Intercept