Reliability of the MRI-based Brain Atrophy and Lesion Index in the evaluation of whole-brain structural health
Abstract
Background
The Brain Atrophy and Lesion Index (BALI), which evaluates several common aging-related MRI changes in combination, has been validated as a feasible method to assess the status of structural brain health. Previous studies have been based primarily on older participants and high-field MRI. Here, we tested the generalizability of the BALI by examining its measurement properties in a wide age range at both high and conventional MRI field strengths.
Methods
Subjects (n = 229) who had T2WI at either 1.5T or 3.0T were grouped into younger (age ≤ 60 years) and older (age > 60 years) groups. Image evaluation and scoring were performed independently by two experienced neuroradiologists who have mastered the BALI method. Inter- and intrarater agreement rates were compared between age groups and between field strengths.
Results
The intraclass correlation coefficient for the BALI total score was consistently high under each experimental condition (interrater ICC ≥ 0.92, 95% CI: 0.84-0.96), with no statistical difference between age groups (Fisher Z = 1.43) or field strengths (Z = 0.60). The reliability for BALI category subscores ranged between moderate and perfect, was similar for both age groups, and was typically greater at 3.0T than at 1.5T (eg, 0.85 vs 0.57 for GA).
Conclusion
The BALI based on T2WI can be reliably applied to the evaluation of the whole-brain health of both younger and older adults at both field strengths, although high-field MRI is preferable.
1 INTRODUCTION
Aging is a process characterized by deficit accumulation over time.1 As is the situation with the body,2 various degenerative changes can also accumulate in the brain,3 and they interact to overwhelm repair processes, causing high-level failure in brain functions, and can lead to cognitive decline and dementia.4 To collectively evaluate the additive effects of several common structural deficits on brain function, a semiquantitative rating scale, the Brain Atrophy and Lesion Index (BALI), has been validated with the application of multiple research datasets.5, 6 The BALI assesses global atrophy (GA) and lesions in both the supratentorial and infratentorial compartments. Categorized changes are assessed as lesions in the gray matter (eg, cortical infarcts) and dilated perivascular spaces in the subcortical white matter as well as lesions in the periventricular regions, deep white matter, basal ganglia, and the surrounding regions.4-6 By integrating multiple deficits in the aging brain into one scale, the BALI has been used as a proxy measure of brain health status and a way to model the dynamic brain changes in the process of aging.3, 4, 7
To date, the BALI has proven sensitive to age and cognition: it differentiates people with different cognitive diagnoses and those at high risk of cognitive decline, and it predicts conversion to dementia.3-8 Moderate-to-high reliability has been reported for BALI based on different sequences.4, 9, 10 Nevertheless, previous studies have chiefly used high-field MRI (eg, 3.0 tesla) with a relatively high signal-to-noise ratio and assessed the brains of older subjects with many age-associated changes.4, 9, 10 It is yet to be determined (a) whether the BALI method can also be reliably applied in evaluating people of younger ages and (b) whether the data acquired from clinically conventional MRI may also be used to score BALI with acceptable reliability.
To test the generalizability of the BALI, we examined its measurement properties using data from adults across a wide age range and at both high and conventional MRI field strengths.
2 METHODS
2.1 Participants
We accessed the data (n = 229) from a convenience sample of adults who underwent a general health evaluation at Beijing Hospital from August 16, 2016, to August 31, 2017, agreed to have MRI brain scans at either 3.0T or 1.5T, and were not diagnosed with terminal malignancy, stroke, heart diseases, or cognitive decline. The sample was 72% male, with ages ranging from 25 to 80 years (mean age = 48.3 ± 12.5); over 95% of the participants were married, 91% had a college or university degree, and 82% were employed (Table 1).
Category | Sample | Cases | Age, years (mean ± SD) | Male (%) | Married (%) | High education (%) | Working on a job (%) |
---|---|---|---|---|---|---|---|
Overall | N | 229 | 48.3 ± 12.5 | 72.1 | 95.2 | 91.3 | 82.5 |
Overall | n | 48 | 46.2 ± 9.0 | 85.4 | 93.8 | 95.8 | 100.0 |
Age group | | | | | | | |
Younger | N | 194 | 44.5 ± 9.2 | 74.2 | 94.8 | 93.3 | 96.9 |
Younger | n | 35 | 46.3 ± 8.2 | 94.3 | 97.1 | 97.1 | 100.0 |
Older | N | 35 | 69.4 ± 5.7 | 60 | 97.1 | 80.0 | 2.9 |
Older | n | 35 | 69.4 ± 5.7 | 60 | 97.1 | 80.0 | 2.9 |
Field strength | | | | | | | |
1.5T | N | 60 | 47.0 ± 11.4 | 75.0 | 100.0 | 98.3 | 91.7 |
1.5T | n | 35 | 43.0 ± 9.0 | 75.0 | 100.0 | 97.2 | 100.0 |
3.0T | N | 169 | 48.8 ± 12.9 | 71.0 | 93.5 | 88.8 | 79.3 |
3.0T | n | 35 | 46.4 ± 9.0 | 85.7 | 91.4 | 94.3 | 100.0 |
- N, sample size; n, randomly selected subsample size; SD, standard deviation.
2.2 MRI scans
Whole-brain scans were acquired using one of four MRI scanners: two at 3.0T (Discovery MR750; General Electric Medical Systems, Waukesha, WI, USA; and Achieva; Philips Medical Systems, Best, The Netherlands) and two at 1.5T (MAGNETOM Espree, Siemens, Germany; and Optima MR360; General Electric Medical Systems). Nearly three quarters (73.8%) of the MRI data were acquired at 3.0T. BALI scoring was based on the evaluation of two-dimensional T2-weighted imaging (2D T2WI). The sequence parameter settings were as follows: TR/TE = 2500-5600/90-110 ms; flip angle = 90° or 140-160°; field of view = 230 mm; matrix size = 180 × 256; slice thickness = 5.0 mm; and 24 axial slices to cover the whole brain. Detailed parameter settings were optimized for scanner specifics.
2.3 Evaluation of the Brain Atrophy and Lesion Index
As described elsewhere,4-10 the BALI is a semiquantitative summary rating scale, adapted from several well-established scales that assess localized structural changes.11, 12 Changes that are integrated into the BALI evaluation include gray matter lesions (eg, cortical infarcts) and subcortical dilated perivascular spaces (GM-SV), deep white matter lesions (DWM), periventricular white matter lesions (PV), lesions in the basal ganglia and surrounding areas (BG), lesions in the infratentorial compartment (IT), and GA.
Applying the BALI rating schema, a value between 0 and 3 was assigned to assess a change in each category, with a higher score indicating greater severity. In the categories DWM and GA, values of 0-5 were used, allowing the capture of more severe changes and thereby avoiding ceiling effects. The “other findings” category was included to record other possible changes such as neoplasm, trauma, idiopathic normal-pressure hydrocephalus, focal asymmetry, and deformity, each of which is sometimes seen in older adults. The BALI total score was calculated as the sum of the subscores of all seven categories.5
Figure 1 shows examples from the sample of the BALI categories evaluated as having different subscores.
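The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys are hypothetical labels, and the 0-3 range for the "other findings" category is our reading of "each category" in the text rather than a value the section states separately.

```python
from typing import Dict

# Upper bound of each category subscore. The 0-3 range (most categories) and
# the 0-5 range (DWM, GA) come from the text; the key names and the 0-3 range
# for "other" (other findings) are assumptions made for this sketch.
MAX_SUBSCORE: Dict[str, int] = {
    "GM-SV": 3, "DWM": 5, "PV": 3, "BG": 3, "IT": 3, "GA": 5, "other": 3,
}


def bali_total(subscores: Dict[str, int]) -> int:
    """Return the BALI total score as the sum of the seven category subscores."""
    missing = set(MAX_SUBSCORE) - set(subscores)
    extra = set(subscores) - set(MAX_SUBSCORE)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for name, value in subscores.items():
        # Each subscore must be an integer within the range of its category.
        if not isinstance(value, int) or not 0 <= value <= MAX_SUBSCORE[name]:
            raise ValueError(f"{name} must be an integer in 0-{MAX_SUBSCORE[name]}")
    return sum(subscores.values())


# Hypothetical ratings for one subject (not data from the study).
example = {"GM-SV": 1, "DWM": 2, "PV": 1, "BG": 0, "IT": 1, "GA": 1, "other": 0}
print(bali_total(example))  # 6
```

Validating each subscore against its own range keeps the wider 0-5 scale of DWM and GA from being silently applied to the other categories.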

2.4 Reliability assessment
Images were evaluated independently by two experienced neuroradiologists (TG and HG) with 10 and 12 years of diagnostic imaging experience, respectively, and familiarity with the BALI method. Imaging evaluation and BALI scoring were performed with the subjects' demographic information and scan field strength masked.
For the assessment of the interrater agreement rate, the two raters each assessed a subsample of 105 (45.9%) subjects. For statistical analysis and group comparison, the subjects in the selected subsample were divided by experimental condition, that is, younger (≤60 years) vs older (>60 years) age groups, and conventional (1.5T) vs high (3.0T) MRI field strengths, with n = 70 for each comparison (35 per condition, accounting for 18%-100% of the original sample under each condition; Table 1). For the assessment of the intrarater agreement rate, each rater independently evaluated the same sample twice on separate days. The order of image evaluation was determined using a random number generator.
2.5 Statistical analysis
For the BALI total score (interval data), the interrater and intrarater agreement rates were assessed using the intraclass correlation coefficient (ICC; moderate: 0.50-0.75; good: 0.75-0.90; excellent: >0.90);13 comparisons of the ICC between different raters or different experimental conditions were made using the Fisher Z test. For the BALI subscores (categorical data), the interrater and intrarater agreement rates were assessed using the Cohen K coefficient (moderate: 0.41-0.60; substantial: 0.61-0.80; perfect: 0.81-1.00).14 Differences in the BALI total score between experimental conditions (age group and MRI field strength) were examined for each rater using two-way ANOVA.
All statistical analyses were performed using IBM SPSS Statistics version 22 (IBM, Chicago, IL, USA). The level of statistical significance was set at P < 0.050. The 95% confidence intervals (95% CI) were reported together with the mean values whenever appropriate.
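As an illustration of the ICC step, the sketch below computes one common form of the coefficient, the two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). The text does not state which ICC model was used in SPSS, so the choice of ICC(2,1) is an assumption, as are the function names; the interpretation bands are the ones quoted above.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a subjects x raters table (one row per subject)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_band(icc: float) -> str:
    """Map an ICC to the bands quoted in the text (label below 0.50 is ours)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "below moderate"


# Hypothetical BALI total scores from two raters for five subjects.
scores = [[4, 5], [11, 12], [3, 3], [7, 6], [9, 10]]
value = icc_2_1(scores)
print(round(value, 2), icc_band(value))
```

A consistency-type ICC, which ignores systematic offsets between raters, would generally give a different value, so the model should be matched to the one actually reported.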
3 RESULTS
As detailed in Table 1, the randomly selected subsamples for reliability testing represented the samples under the different experimental conditions well with regard to age and other demographic features. The BALI total score differed by age (for rater 1, 4.20 ± 2.27 for the younger group vs 11.37 ± 2.79 for the older group), which was consistent for both raters, with no age × field strength interaction (F for age = 77.02/64.73, F for field strength = 2.51/5.22, F for interaction = 0.34/0.22).
Considering the reliability of the BALI total score (Table 2), the ICC for the interrater agreement was 0.94 (95% CI = 0.90-0.97) for the overall sample. The interrater ICC values were also excellent for each of the different experimental conditions: 0.96 for younger, 0.92 for older, 0.92 for 1.5T, and 0.94 for 3.0T. There was no statistical difference in the ICC between age groups (Z = 1.43, P = 0.153) or field strengths (Z = 0.60, P = 0.549).
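The Z values above can be approximated with the textbook two-sample Fisher z-transformation. The sketch below assumes that this standard form (variance 1/(n − 3) per group, two-sided normal P value) is what was applied to the ICCs and that n = 35 per condition as listed in Table 1; the section itself does not spell out the formula, and the function name is hypothetical.

```python
import math
from typing import Tuple


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Compare two independent correlation-type coefficients.

    Returns the Z statistic and the two-sided P value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher z-transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal tail
    return z, p


# Interrater ICCs from the text, with n = 35 per condition (Table 1).
z_age, p_age = fisher_z_test(0.96, 35, 0.92, 35)       # younger vs older
z_field, p_field = fisher_z_test(0.94, 35, 0.92, 35)   # 3.0T vs 1.5T
print(round(z_age, 2), round(z_field, 2))  # 1.43 0.6
```

Under these assumptions the statistics round to the reported Z = 1.43 and Z = 0.60; the P values differ from the reported ones only in the third decimal, presumably because the published ICCs are themselves rounded.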
BALI total score | Overall | Age group: Younger | Age group: Older | Field strength: 1.5T | Field strength: 3.0T |
---|---|---|---|---|---|
Rater 1 | | | | | |
Mean ± SD | 4.20 ± 2.27 | 4.20 ± 2.27 | 11.37 ± 2.79 | 4.19 ± 2.35 | 4.86 ± 2.26 |
Intrarater agreement rate (95% CI) | 0.88 (0.77-0.94) | 0.88 (0.77-0.94) | 0.91 (0.82-0.95) | 0.93 (0.86-0.96) | 0.91 (0.82-0.95) |
Rater 2 | | | | | |
Mean ± SD | 4.69 ± 2.26 | 4.69 ± 2.26 | 11.26 ± 3.16 | 4.03 ± 2.04 | 5.34 ± 2.16 |
Intrarater agreement rate (95% CI) | 0.95 (0.90-0.97) | 0.95 (0.90-0.97) | 0.96 (0.92-0.98) | 0.88 (0.78-0.94) | 0.95 (0.90-0.97) |
Interrater agreement rate (95% CI) | 0.94 (0.90-0.97) | 0.96 (0.93-0.98) | 0.92 (0.85-0.96) | 0.92 (0.84-0.96) | 0.94 (0.87-0.97) |
Fisher Z (Sig) between experimental conditions (a) | | Younger vs Older: 1.43 (0.153) | | 1.5T vs 3.0T: 0.60 (0.549) | |
- SD, standard deviation; CI, confidence interval; Sig, level of significance.
- a Used to compare the interrater agreement rates between age groups and between field strengths.
The ICC for intrarater agreement of the BALI total score was consistently high for each rater (ICC ≥ 0.90 for the total sample). There was no significant difference in the intrarater ICC between younger (≥0.88) and older (≥0.91) age groups (Z ≤ 1.82, P ≥ 0.069), or between 1.5T (≥0.88) and 3.0T (≥0.91; Z ≤ 1.61, P ≥ 0.107; Table 2).
The reliability for the BALI category subscores was also high for the overall sample tested; the interrater Cohen K coefficients ranged between substantial (0.61) for DWM and perfect (0.85) for GM-SV (Table 3). The Cohen K values were higher in the younger than in the older subjects for GM-SV (0.83 vs 0.46) and BG (0.79 vs 0.52), but not for other subscores; in contrast, the Cohen K values were consistently higher (except for DWM) at 3.0T (0.68 for IT to 0.85 for GA) than at 1.5T (0.50 for BG to 0.65 for GM-SV), suggesting more robust identification of these brain changes at the higher field strength (Table 3).
BALI subcategory | Score | Overall | Age group: Younger | Age group: Older | Field strength: 1.5T | Field strength: 3.0T |
---|---|---|---|---|---|---|
Gray matter lesions and subcortical dilated perivascular spaces (GM-SV) | | | | | | |
Rater 1 | Mean ± SD | 1.38 ± 0.49 | 1.37 ± 0.49 | 1.77 ± 0.60 | 1.22 ± 0.49 | 1.49 ± 0.51 |
Rater 1 | Intrarater agreement rate | 0.82 | 0.81 | 0.77 | 0.55 | 0.89 |
Rater 2 | Mean ± SD | 1.44 ± 0.50 | 1.46 ± 0.51 | 1.71 ± 0.57 | 1.25 ± 0.44 | 1.57 ± 0.50 |
Rater 2 | Intrarater agreement rate | 0.83 | 0.77 | 0.72 | 0.46 | 0.94 |
Interrater agreement rate | | 0.85 | 0.83 | 0.46 | 0.65 | 0.83 |
Deep white matter lesions (DWM) | | | | | | |
Rater 1 | Mean ± SD | 1.04 ± 0.74 | 1.00 ± 0.77 | 2.31 ± 0.58 | 0.89 ± 0.71 | 1.09 ± 0.78 |
Rater 1 | Intrarater agreement rate | 0.62 | 0.48 | 0.61 | 0.70 | 0.55 |
Rater 2 | Mean ± SD | 1.15 ± 0.62 | 1.14 ± 0.65 | 2.43 ± 0.70 | 0.97 ± 0.65 | 1.23 ± 0.65 |
Rater 2 | Intrarater agreement rate | 0.65 | 0.72 | 0.63 | 0.78 | 0.62 |
Interrater agreement rate | | 0.61 | 0.57 | 0.51 | 0.57 | 0.58 |
Periventricular white matter lesions (PV) | | | | | | |
Rater 1 | Mean ± SD | 0.35 ± 0.53 | 0.34 ± 0.54 | 1.54 ± 0.82 | 0.25 ± 0.44 | 0.37 ± 0.55 |
Rater 1 | Intrarater agreement rate | 0.78 | 0.69 | 0.65 | 0.55 | 0.77 |
Rater 2 | Mean ± SD | 0.38 ± 0.49 | 0.34 ± 0.48 | 1.66 ± 0.77 | 0.44 ± 0.50 | 0.37 ± 0.49 |
Rater 2 | Intrarater agreement rate | 0.66 | 0.71 | 0.56 | 0.70 | 0.60 |
Interrater agreement rate | | 0.78 | 0.75 | 0.64 | 0.56 | 0.76 |
Lesions in the basal ganglia and surrounding areas (BG) | | | | | | |
Rater 1 | Mean ± SD | 0.54 ± 0.77 | 0.51 ± 0.74 | 2.09 ± 0.78 | 0.56 ± 0.91 | 0.66 ± 0.84 |
Rater 1 | Intrarater agreement rate | 0.76 | 0.77 | 0.70 | 0.55 | 0.74 |
Rater 2 | Mean ± SD | 0.58 ± 0.82 | 0.57 ± 0.82 | 2.03 ± 0.82 | 0.22 ± 0.64 | 0.74 ± 0.89 |
Rater 2 | Intrarater agreement rate | 0.65 | 0.62 | 0.66 | 0.56 | 0.65 |
Interrater agreement rate | | 0.77 | 0.79 | 0.52 | 0.50 | 0.76 |
Lesions in the infratentorial regions (IT) | | | | | | |
Rater 1 | Mean ± SD | 0.60 ± 0.64 | 0.57 ± 0.66 | 1.66 ± 0.97 | 0.69 ± 0.86 | 0.66 ± 0.64 |
Rater 1 | Intrarater agreement rate | 0.68 | 0.71 | 0.80 | 0.77 | 0.71 |
Rater 2 | Mean ± SD | 0.81 ± 0.84 | 0.74 ± 0.78 | 1.40 ± 1.01 | 0.75 ± 0.77 | 0.83 ± 0.86 |
Rater 2 | Intrarater agreement rate | 0.60 | 0.60 | 0.85 | 0.74 | 0.58 |
Interrater agreement rate | | 0.64 | 0.67 | 0.69 | 0.52 | 0.68 |
Global atrophy (GA) | | | | | | |
Rater 1 | Mean ± SD | 0.52 ± 0.68 | 0.40 ± 0.60 | 1.91 ± 0.82 | 0.53 ± 0.65 | 0.57 ± 0.70 |
Rater 1 | Intrarater agreement rate | 0.75 | 0.67 | 0.68 | 0.75 | 0.81 |
Rater 2 | Mean ± SD | 0.52 ± 0.68 | 0.43 ± 0.61 | 1.83 ± 0.75 | 0.36 ± 0.64 | 0.60 ± 0.70 |
Rater 2 | Intrarater agreement rate | 0.57 | 0.59 | 0.69 | 0.46 | 0.52 |
Interrater agreement rate | | 0.77 | 0.71 | 0.76 | 0.57 | 0.85 |
Under any given condition, the intrarater agreement rates for any BALI category subscore were always at least moderate (≥0.46) and mostly comparable between the raters (Table 3).
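For readers who want to reproduce this kind of subscore analysis, the following sketch computes an unweighted Cohen K for two sets of categorical ratings and maps it to the agreement bands quoted in Section 2.5. Whether the study used the unweighted or a weighted coefficient is not stated, so the unweighted form is an assumption, and the ratings in the example are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen kappa for two equally long lists of category labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given the same category twice.
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the marginal category frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def kappa_band(kappa: float) -> str:
    """Bands quoted in the text; the label below 0.41 is ours."""
    k = round(kappa, 2)  # the bands are defined to two decimals
    if k >= 0.81:
        return "perfect"
    if k >= 0.61:
        return "substantial"
    if k >= 0.41:
        return "moderate"
    return "below moderate"


# Hypothetical subscores for six subjects from two raters (or two sessions).
first = [0, 1, 1, 2, 2, 3]
second = [0, 1, 2, 2, 2, 3]
k = cohen_kappa(first, second)
print(round(k, 2), kappa_band(k))  # 0.77 substantial
```

The same function serves for interrater agreement (rater 1 vs rater 2) and intrarater agreement (first vs second session of one rater), since both compare two label lists over the same subjects.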
4 DISCUSSION
In the present study, we investigated the reliability of the BALI evaluation using routinely acquired clinical 2D T2-weighted MRI at 1.5T and 3.0T in a convenience sample of younger and older adult participants. Reliability tests were conducted on the BALI total score and each of the category subscores. Inter- and intrarater agreement rates were compared between age groups (younger vs older) and field strengths (1.5T vs 3.0T). Our data suggested consistently high inter- and intrarater agreement rates with the BALI total score under different experimental conditions, which did not differ between the different age groups and field strengths. The data also suggested moderate-to-perfect inter- and intrarater agreement rates with each BALI category subscore regardless of age group and field strength, even though the reliability for a few categories was higher at 3.0T than at 1.5T.
Previous research has repeatedly shown that the BALI can be used to capture and summarize several common structural changes in the aging brain, thereby providing a way to study the impact of global structural changes on brain function in older adults.5-10 It is known that various structural brain changes can coexist with brain aging, reflecting heterogeneous profiles in the process of aging.15 Importantly, brain changes, including cerebral volume loss, can start early in the adult life span.16 Many of these changes can be clinically less significant or meaningful when considered individually, but when combined, they produce additive effects on function.4 Indeed, our previous work has shown that the BALI total score differed significantly by age and in older people with different cognitive conditions.5-8 Here, the high reliability of the BALI total score and subscores in both younger and older subjects suggests that the BALI method may also be used in studying younger adults. Thus, the evidence from the present study supports extending the BALI method beyond older ages, allowing enhanced generalizability of the BALI for the global assessment of brain health changes in broader study settings. This contribution is significant because it opens a critical opportunity to investigate the accumulation of deficits in the brain prior to older adulthood.
The BALI focuses on morphologic changes from the widely available routine clinical sequences.10 Here, we have further tested the reliability of the BALI evaluation using MR imaging acquired at both 1.5T and 3.0T. Consistent with a previous report on older adults,9 the data from the present study involving both older and younger adults have confirmed that the method is easy to master and the evaluation time is manageable.5 The interrater agreement rate of the BALI total scores is satisfactory at both field strengths and by both raters, indicating the robustness of the BALI total score as a global measure of structural brain health. Meanwhile, the increased reliability for several category subscores at 3.0T demonstrates the advantages of high field strength: The higher signal-to-noise ratio can benefit BALI evaluation, allowing the raters to identify subtle changes more sensitively and to grade the BALI categories more robustly.
Our data must be interpreted with caution. In this study, both raters who evaluated the images are experts in diagnostic neuroimaging and have mastered the BALI method. Whether nonneuroradiologist raters may require additional training and practice to achieve the same level of reliability for each experimental condition remains to be determined. However, previous studies have shown highly reliable rating scores for older participants using 3.0-T T1-weighted MRI by nonneuroradiologist raters trained with the BALI method (using the rating schema descriptions, atlas, examples, and case discussions).5, 17, 18 Furthermore, our study used a subsample of the subjects from one study site for the reliability testing. Although the statistical analyses suggested satisfactory reliability, the extent to which the findings can be generalized to the general population deserves further research with increased sample sizes.
In conclusion, our study suggests that multiple structural brain changes can be collectively evaluated with the use of the BALI total score in adults of a wide age range. The BALI score based on both 1.5-T and 3.0-T T2WI is highly reliable in capturing global brain changes that can accumulate across the adult life course. High-field MRI can further improve the robustness in detecting subtle changes.
ACKNOWLEDGMENTS
The authors sincerely acknowledge Dr. K. Rockwood and Dr. W. Siu for critical discussions and Ms. Betty Chinda for proofreading the manuscript. This research was partly supported by Capital's Funds for Health Improvement and Research of China (2014-4-4052). Additional funding for data analysis was from Canadian Institutes of Health Research (CSE-125739) and Surrey Hospital & Outpatient Centre Foundation (2015-030, G2017-001).
CONFLICT OF INTEREST
Tao Gu receives a fellowship award from Beijing Hospital to conduct postdoctoral research in Canada. Hui Guo receives a fellowship award from the China Scholarship Council to conduct postdoctoral research in Canada.