Volume 39, Issue 7 pp. 3058-3071
RESEARCH ARTICLE

Reproducibility of myelin content-based human habenula segmentation at 3 Tesla

Joo-Won Kim

Corresponding Author

Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York

Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, New York

Correspondence Joo-Won Kim or Junqian Xu, One Gustave L. Levy Place, Box 1234, New York, NY 10029-6574. Email: [email protected] or [email protected]

Thomas P. Naidich

Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Neurosurgery, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, New York

Joshmi Joseph

Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York

Divya Nair

Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York

Matthew F. Glasser

Department of Neuroscience, Washington University School of Medicine, Saint Louis, Missouri

St. Luke's Hospital, Saint Louis, Missouri

Rafael O'Halloran

Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

Gaelle E. Doucet

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

Won Hee Lee

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

Hannah Krinsky

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

Alejandro Paulino

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

David C. Glahn

Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut

Department of Psychology, Yale University School of Medicine, New Haven, Connecticut

Olin Neuropsychiatric Research Center, Institute of Living, Hartford, Connecticut

Alan Anticevic

Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut

Sophia Frangou

Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York

Junqian Xu

Corresponding Author

Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, New York

Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, New York

Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York

First published: 26 March 2018

Funding information: Radiological Society of North America (RSNA) research scholar grant, Grant/Award Number: RSCH1328; Brain and Behavior Research Foundation (BBRF) young investigator grant, Grant/Award Number: NARSAD22324; National Institutes of Health, Grant/Award Number: R01 MH104284-01A1

Preliminary findings of this study can be found in the proceedings of the Annual Meetings for the Organization for Human Brain Mapping (OHBM) in 2016 (abstract 2189) and in 2017 (abstract 1857) (Kim et al., 2016a, 2017).

Abstract

In vivo morphological study of the human habenula, a pair of small epithalamic nuclei adjacent to the dorsomedial thalamus, has recently gained significant interest for its role in reward and aversion processing. However, segmenting the habenula from in vivo magnetic resonance imaging (MRI) is challenging due to the habenula's small size and low anatomical contrast. Although manual and semi-automated habenula segmentation methods have been reported, the test-retest reproducibility of the segmented habenula volume and the consistency of the boundaries of habenula segmentation have not been investigated. In this study, we evaluated the intra- and inter-site reproducibility of in vivo human habenula segmentation from 3T MRI (0.7–0.8 mm isotropic resolution) using our previously proposed semi-automated myelin contrast-based method and its fully-automated version, as well as a previously published manual geometry-based method. The habenula segmentation using our semi-automated method showed consistent boundary definition (high Dice coefficient, low mean distance, and moderate Hausdorff distance) and reproducible volume measurement (low coefficient of variation). Furthermore, the habenula boundary in our semi-automated segmentation from 3T MRI agreed well with that in the manual segmentation from 7T MRI (0.5 mm isotropic resolution) of the same subjects. Overall, our proposed semi-automated habenula segmentation showed reliable and reproducible habenula localization, while its fully-automated version offers an efficient way for large sample analysis.

1 INTRODUCTION

The habenula is a small brain structure consisting of a pair of epithalamic nuclei next to the dorsomedial thalamus and the third ventricle (Namboodiri, Rodriguez-Romaguera, & Stuber, 2016). The habenula comprises the lateral and medial habenula with further subdivisions, which differ in cell morphometry, neurochemical characteristics, and connectivity (Benarroch, 2015; Díaz, Bravo, Rojas, & Concha, 2011). So far, the lateral and medial habenula could not be confidently separated in in vivo human neuroimaging. The habenula receives inputs from the basal ganglia and limbic system (Hikosaka, Sesack, Lecourtier, & Shepard, 2008) and inhibits the dopaminergic and serotonergic neurons in the midbrain and brainstem nuclei (Jhou, Geisler, Marinelli, Degarmo, & Zahm, 2009b; Jhou, Fields, Baxter, Saper, & Holland, 2009a; Metzger, Bueno, & Lima, 2017; Stephenson-Jones, Floros, Robertson, & Grillner, 2012), such as the ventral tegmental area and substantia nigra pars compacta, interpeduncular nuclei, and raphe nuclei. Animal studies have shown that the habenula plays a critical role in a range of functional domains (Boulos, Darcq, & Kieffer, 2016; Elmer, Brown, & Shepard, 2016), in particular reward and aversion processing (Baker et al., 2016; Hikosaka, 2010; Matsumoto & Hikosaka, 2007; Proulx, Hikosaka, & Malinow, 2014), which mediates adaptive and maladaptive behaviors through reinforcement learning. Dysfunction of the habenula circuitry has been hypothesized to underlie the pathogenesis of psychiatric disorders such as major depressive disorder (MDD) and addiction (Batalla et al., 2017; Lecca, Meye, & Mameli, 2014; Velasquez, Molfese, & Salas, 2014).

Since the seminal report of histological evidence of reduced habenula volume in deceased subjects with depression (Ranft et al., 2010), there has been growing interest in linking the habenula volume with human behavior, using high-resolution in vivo magnetic resonance imaging (MRI). A smaller habenula volume measured by 3T MRI was documented in bipolar disorder and MDD patients compared with healthy controls (Savitz et al., 2011b), but not in post-traumatic stress disorder (PTSD) patients (Savitz et al., 2011a). In contrast, another study reported no significant total habenula volumetric differences measured by 3T MRI between healthy controls and MDD patients or patients at different stages (first episode, remitted-recurrent, or chronic) of MDD, except in women with a first-episode MDD (Carceller-Sindreu et al., 2015). A recent 7T MRI study reported a similar lack of habenula volume difference between MDD patients and healthy controls, but increasing habenula volume with disease severity in unmedicated MDD patients (Schmidt et al., 2016). Although these in vivo habenula volume findings in MDD are intriguing, the inconsistency among the reports highlights an urgent need to characterize the reproducibility and potential confounds of the segmentation methods used in volumetric studies of the habenula.

In addition, based on animal studies of deep brain stimulation (DBS) targeting the lateral habenula (Lim et al., 2015; Meng et al., 2011), lateral habenula DBS has been described in case reports of patients with treatment-resistant depression (Kiening & Sartorius, 2013; Sartorius et al., 2010). Accurate positioning of electrodes through meticulous individual neurosurgical planning is likely critical for the desired efficacy and may reveal optimal targets of the habenula sub-regions, as has been shown for the subthalamic nuclei in Parkinson's disease (Akram et al., 2017). Habenula segmentation from pre-surgical MRI is the necessary first step in such neurosurgical planning. The accuracy and consistency of the boundaries of habenula segmentation, which have not been carefully investigated so far, are arguably more important than the habenula volume in such DBS studies targeting the lateral habenula.

Segmentation of the human habenula in in vivo neuroimaging is challenging due to its small size (37.2/34.2 mm³ [Ahumada-Galleguillos et al., 2016] or 30.9/33.2 mm³ [Ranft et al., 2010] for left/right habenula, respectively, from postmortem histology in deceased subjects without reported cerebral illness or neuropathy) and its low anatomical contrast to the neighboring dorsomedial thalamus. A majority of previous in vivo human habenula volumetric studies have used manual contrast-based (Carceller-Sindreu et al., 2015; Savitz et al., 2011a; Savitz et al., 2011b) or geometric segmentation (Lawson, Drevets, & Roiser, 2013) on T1-weighted (T1w) 3T MRI, while more recently, manual contrast-based segmentation of the habenula from ultra-high field (7T) T1 maps has also been reported (Schmidt et al., 2016).

Recently, we developed an objective semi-automated human habenula segmentation scheme based on myelin content from the ratio of T1w and T2-weighted (T2w) images at 3T (Kim et al., 2016b) in order to improve the time-consuming and operator-dependent manual habenula segmentation processes and enable efficient and reproducible habenula volumetric analysis in large cohorts. Using our method, we have shown group-wise habenula center-of-gravity in the MNI space consistent with the geometric segmentation method of the habenula (i.e., Lawson's method), primarily aimed to objectively locate the habenula as seed regions in functional MRI (Furman & Gotlib, 2016; Hétu et al., 2016; Lawson et al., 2014; Lawson et al., 2017; Torrisi et al., 2017) or diffusion MRI (Shelton et al., 2012) applications.

In order to further evaluate our habenula segmentation method for volumetric and DBS applications, we (1) assess the intra-site and inter-site reproducibility of 3T habenula segmentation methods on repeated scans of the same subjects, (2) compare the reproducibility of our myelin contrast-based segmentation with geometric segmentation (i.e., Lawson's method) at 3T, and (3) compare the boundary consistency of our habenula segmentation at 3T to the manual segmentation at ultra-high resolution 7T MRI of the same subjects.

2 METHODS

2.1 Intra-site 3T MRI data

T1w and T2w images were acquired from twenty-seven healthy young adults (mean age 30.9 years, range 22–35 years, 20 females) on a 3T MRI scanner (3T Connectom Skyra, Siemens, Erlangen, Germany) using a 32-channel head coil (Siemens) at 0.7 mm isotropic resolution on two different days within 2 weeks as part of the Human Connectome Project (Van Essen et al., 2012) scan-rescan protocol with the following parameters: TR/TE/TI = 2400/2.14/1000 ms, flip angle (FA) = 8° for T1w acquisition, and TR/TE = 3200/565 ms for T2w acquisition (Glasser et al., 2013). The images on the first day are referred to as intra-site test and those on the second day as intra-site retest.

2.2 Inter-site 3T MRI data

T1w and T2w images were acquired from a separate cohort of 12 healthy young adults (mean age 27.0 years, range 21–36 years, 3 females) at three 3T sites (site 1 and 2: standard Skyra, Siemens; site 3: Trio, Siemens) with a 32-channel head coil (Siemens) at 0.8 mm isotropic resolution within a two-month period with the following HCP-like protocol (HCP Lifespan Pilot Project) harmonized across three sites: T1w 3D MPRAGE sequence, FOV 256 mm × 256 mm, matrix size 320 × 320, TR/TE/TI = 2400/2.07/1000 ms, flip angle 8° with binomial (1, −1) fat saturation, bandwidth 240 Hz/pixel, echo spacing 7.6 ms, in-plane acceleration (GRAPPA) factor 2, total acquisition time ∼7 min; T2w 3D variable-flip angle turbo-spin-echo (SPACE) sequence, FOV 256 mm × 256 mm, matrix size 320 × 320, TR/TE = 3200/565 ms, bandwidth 679 Hz/pixel, echo spacing 3.87 ms, in-plane acceleration (GRAPPA) factor 2, turbo factor 314, echo train duration 1169 ms, total acquisition time ∼7 min (Kim et al., 2016b inline supplementary material). The images at the three sites are referred to as inter-site 1, inter-site 2, and inter-site 3.

2.3 Intra-site 3T and 7T comparison data

Five healthy young adults (mean age 28.0 years, range 22–36 years, all males) from the 3T inter-site cohort were also scanned at ultra-high field (7T AS, Magnetom, Siemens) with a 32-channel head coil (Nova Medical Inc., Wilmington, MA) at 0.5 mm isotropic resolution 18 months after the inter-site scans at site 3, with the following parameters: MP2RAGE sequence (Marques et al., 2010), FOV = 224 mm × 202 mm × 112 (slice) mm, matrix size 450 × 406 × 224 (slice), phase and slice partial Fourier = 6/8, no in-plane parallel imaging acceleration, TR/TE/TI1/TI2 = 5000/5.75/900/2780 ms, FA1/FA2 = 5°/3°, bandwidth = 170 Hz/pixel, echo spacing = 10.4 ms, flow compensation in slice direction, single axial slab (covering a majority of the brain) with 8 mm slice oversampling, total acquisition time 25.5 min. The composite UNI image, combining the two images acquired at TI1 and TI2 (Marques et al., 2010), was used for further 7T image analysis. Another five healthy young adults (mean age 30.4 years, range 25–41 years, one female) were similarly scanned at both 3T and 7T within a one-year period using the same acquisitions as part of a separate study.

2.4 Intra/inter-site 3T MRI registration

The 3T T1w and T2w images were processed with the HCP PreFreeSurfer pipeline (Glasser et al., 2013), including gradient non-linearity distortion correction (Jovicich et al., 2006), anterior commissure (AC)–posterior commissure (PC) alignment, T2w-to-T1w registration, and AC-PC-to-MNI registration; but without bias field correction as previously noted (Kim et al., 2016b). Subject-specific unbiased target space was generated by registering (FSL/FLIRT) the T1w images to the average T1w image with 10 iterations. Rigid body transformation was used for the intra-site data, whereas affine transformation was used for the inter-site data. All transformations, including AC-PC alignment, T2w-to-T1w registration, and transformation to the subject-specific target space, were concatenated into a single step transformation for inter-site data, while we applied transformation to the subject-specific target space on preprocessed intra-site data that HCP provides.

2.5 Intra-site 3T and 7T image registration

After correcting gradient non-linearity distortion and extracting the brain (FSL/BET) from the same subject's 3T T1w and 7T UNI images, 3T T1w brain images were registered to 7T UNI brain images using affine transformation (FSL/FLIRT). In order to assess the partial volume effect on habenula contrast, the 7T UNI images were first down-sampled to 0.8 mm isotropic (3T) and then up-sampled back to 0.5 mm isotropic (7T) resolution to maintain the same slice location in the 7T image coordinate space for comparison. Image blurring was minimized during the upsampling using spline interpolation.

2.6 Semi-automated 3T habenula segmentation

Myelin-sensitive images were generated using T1w-to-T2w ratios (Glasser and Van Essen, 2011). Both a binary segmentation and a probability map of the left and right habenula were generated with our previously proposed semi-automated objective habenula segmentation scheme, consisting of five steps: (1) habenula region-of-interest initialization, (2) histogram-based thresholding, (3) region growing, (4) geometric constraint, and (5) partial volume estimation (Kim et al., 2016b; our open-source segmentation software can be downloaded from github.com/junqianxulab/habenula_segmentation). The only manual step was to initialize seed voxels at the centers of the left and right habenula in the subject-specific unbiased target space in step (1). For the intra-site 3T and 7T comparison, the habenula probability map from the 3T data was transformed to the 7T image space using the transformation in §2.5. The transformed habenula probability map was thresholded at an empirical value of 0.3, chosen by visual inspection (JWK) to minimize the overall mismatch between the 3T and 7T habenula segmentations.

2.7 Fully-automated 3T habenula segmentation

To test the performance of a fully-automated version of our previously proposed method, we generated the seed voxels by transforming a habenula template in the MNI space (github.com/junqianxulab/habenula_segmentation/releases/tag/template_v0.1-alpha) to the subject-specific unbiased target space. The template was derived from 49 HCP subjects in our previous study and is centered at (−2.7, −24.3, 2.2) for the left habenula and (4.0, −23.6, 2.2) for the right habenula (Kim et al., 2016b). Each of the transformed and spline-interpolated left and right habenula template regions was thresholded (0.1), and its center-of-mass was used as the seed to initiate the habenula segmentation as in §2.6.
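The seed-generation step can be illustrated with a minimal sketch. It assumes the transformed template is available as a dictionary mapping voxel indices to interpolated values, that voxels strictly above the threshold are kept, and that the center-of-mass is unweighted; the source does not specify these details, and the function name `seed_from_template` is hypothetical.

```python
def seed_from_template(template_values, threshold=0.1):
    """Return a seed voxel from a transformed habenula template.

    template_values: dict mapping (x, y, z) voxel indices to template values.
    Assumption: voxels with value > threshold are kept and the unweighted
    center-of-mass of the kept voxels, rounded to the voxel grid, is the seed.
    """
    kept = [voxel for voxel, value in template_values.items() if value > threshold]
    if not kept:
        raise ValueError("no template voxel exceeds the threshold")
    count = len(kept)
    center = tuple(sum(voxel[axis] for voxel in kept) / count for axis in range(3))
    # Round the continuous center-of-mass to the nearest voxel index.
    return tuple(int(round(coordinate)) for coordinate in center)


# Hypothetical template values in a subject-specific space.
example_template = {
    (0, 0, 0): 0.05,     # below threshold, ignored
    (10, 10, 10): 0.5,
    (12, 10, 10): 0.9,
    (11, 13, 10): 0.2,
}
seed = seed_from_template(example_template)
```

If the seed falls outside the individual habenula, the subsequent region growing starts from the wrong tissue, which is one way an imperfect template transformation can lead to a failed segmentation.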

To test the applicability of the fully automated habenula segmentation on more commonly available lower resolution (e.g., 1.0 mm isotropic) 3T anatomical data, a public dataset of 40 subjects (mean age 28.4 years, range 20–34 years, 25 females) was obtained from the CamCAN repository (www.mrc-cbu.cam.ac.uk/datasets/camcan; Taylor et al., 2017) and analyzed similarly. Anatomical acquisition parameters: T1w MPRAGE, 1.0 mm isotropic resolution, TR/TI/TE = 2250/900/2.98 ms, FA = 9°, in-plane acceleration (GRAPPA) factor 2, total acquisition time 4.5 min; and T2w SPACE, 1.0 mm isotropic resolution, TR/TE = 2800/408 ms, in-plane acceleration (GRAPPA) factor 2, total acquisition time 4.5 min (Shafto et al., 2014).

2.8 Geometrically defining the habenula (Lawson's method, 3T)

We implemented a graphical user interface (GUI) for the protocol (see details in the Supporting Information Figure S1) to define the habenula geometrically (Lawson et al., 2013), which uses the habenula/3rd-ventricle boundary and two lines from three landmark points (i.e., the first point is between the medial boundary of the habenula to the cerebrospinal fluid and the posterior/habenula commissure; the second point is the dorsal boundary of the habenula; and the third point is the lateral mesopontine junction next to the tentorial incisure). In our implementation, we added a fourth point between the habenula and third ventricle in order to avoid manually drawing the boundary, which results in two line segments and two elliptic curves for habenula definition. Our open-source implementation can be downloaded from github.com/junqianxulab/habenula_segmentation_lawson. Two trained raters (JJ and DN) independently performed manual segmentation using the GUI on 24 (out of 27) intra-site subject data and all 12 inter-site subject data within two weeks. To assess intra-rater reproducibility, each rater segmented one of the intra-site data (intra-site test) twice, referred to as intra-site test1 and intra-site test2. Segmentations of the images of each subject were separated by at least one day to ensure that rater memory was not a confound.

2.9 Manual habenula segmentation (7T)

For the 7T data, the habenula was manually segmented by an experienced habenula researcher (JWK) on 7T MP2RAGE UNI images in the acquisition coordinate space (to avoid interpolation) and refined according to reviews from an expert neuroanatomist (TPN). The 7T MP2RAGE UNI images offer clear habenula contrast (Figure 3, top row) to the medial boundary (i.e., cerebrospinal fluid) and the lateral boundary (i.e., thalamus) for manual segmentation. The only ambiguous boundaries were the dorsal anterior boundary with the stria medullaris (SM) and the ventral anterior lateral boundary with the fasciculus retroflexus (FR). As coronal slices move anteriorly, the habenula shifts superiorly and shows narrow inferior tails. Further anteriorly, these tails disappear, and we set that coronal slice as the anterior limit of the habenula segmentation. The boundary between the habenula and FR was determined so that the habenula has a smooth spherical shape in both coronal and axial views, according to histology in the literature (Díaz et al., 2011). The 7T habenula segmentation (n = 10) spanned 7–11 coronal and 9–17 axial slices in the 0.5 mm isotropic resolution 7T image space depending on subject and head tilting.

2.10 Group average habenula segmentation in the MNI space

To compare the 3T and 7T average habenula segmentation, we transformed each segmentation to the MNI152 template space (0.8 mm isotropic resolution) using the following steps. The 3T habenula probability map (in the AC-PC-aligned space) was transformed using AC-PC to MNI transformation generated by the HCP PreFreeSurfer pipeline. The 7T habenula manual segmentation (in the acquisition coordinate space) was transformed to the MNI space using the same subject's co-registered 3T MRI as an intermediate target. Habenula probability maps in the MNI space were created by averaging the subjects’ transformed habenula segmentation results (unthresholded) from 3T and 7T data, respectively.

2.11 Habenula segmentation similarity evaluation methods

The Dice coefficient (DC) measures the overlap of two binary segmentations (Dice, 1945) by $DC(A, B) = \frac{2|A \cap B|}{|A| + |B|}$, where $A$ and $B$ are the segmentations and $|X|$ is the number of segmented voxels in $X$. The DC ranges from 0 (no overlap) to 1 (identical).

The DC using probability map was implemented to account for partial volume estimation by defining $|X|$ as the sum of probabilities in $X$ and the probability of voxel $v$ in $A \cap B$ as the smaller probability between the probabilities of voxel $v$ in $A$ and $B$. That is, $|A \cap B| = \sum_{v} \min\left(p_A(v), p_B(v)\right)$, where $p_X(v)$ is the probability at the voxel $v$.
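As a minimal sketch of the two overlap measures described above, the following Python functions compute the Dice coefficient for binary segmentations (given as sets of voxel indices) and for probability maps (given as dictionaries from voxel index to probability). The data layout, the function names, and the convention of returning 1.0 for two empty segmentations are assumptions made for illustration, not part of the published software.

```python
def dice_binary(seg_a, seg_b):
    """Dice coefficient of two binary segmentations given as sets of voxels."""
    seg_a, seg_b = set(seg_a), set(seg_b)
    if not seg_a and not seg_b:
        return 1.0  # assumed convention for two empty segmentations
    return 2.0 * len(seg_a & seg_b) / (len(seg_a) + len(seg_b))


def dice_probability(prob_a, prob_b):
    """Dice coefficient of two probability maps given as {voxel: probability}.

    The size of each map is the sum of its probabilities, and the overlap at
    each voxel is the smaller of the two probabilities.
    """
    size_a = sum(prob_a.values())
    size_b = sum(prob_b.values())
    if size_a + size_b == 0:
        return 1.0  # assumed convention for two empty maps
    overlap = sum(min(p, prob_b.get(voxel, 0.0)) for voxel, p in prob_a.items())
    return 2.0 * overlap / (size_a + size_b)


# Toy example with hypothetical voxels.
binary_dc = dice_binary({(0, 0, 0), (1, 0, 0), (2, 0, 0)},
                        {(1, 0, 0), (2, 0, 0), (3, 0, 0)})
probability_dc = dice_probability({(0, 0, 0): 0.5, (1, 0, 0): 1.0},
                                  {(1, 0, 0): 0.8, (2, 0, 0): 0.2})
```

In the toy example, the binary segmentations share 2 of 3 voxels each, and the probability maps overlap by 0.8 out of a combined size of 2.5.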

Mean distance (MD) measures the mean of distances between surface voxels of two binary segmentations. The surface distance $d(a, B)$ from a surface voxel $a$ in segmentation $A$ to segmentation $B$ is defined as $d(a, B) = \min_{b \in S(B)} \lVert a - b \rVert$, where $S(B)$ denotes the surface voxels of $B$ and $\lVert a - b \rVert$ is the Euclidean distance between voxels $a$ and $b$. The MD between $A$ and $B$ is defined as $MD(A, B) = \frac{1}{|S(A)| + |S(B)|}\left(\sum_{a \in S(A)} d(a, B) + \sum_{b \in S(B)} d(b, A)\right)$.

Hausdorff distance (HD) measures the maximum of the distances between surface voxels of two binary segmentations (Aspert, Santa-Cruz, & Ebrahimi, 2002). The HD between $A$ and $B$ is defined as $HD(A, B) = \max\left(\max_{a \in S(A)} d(a, B), \max_{b \in S(B)} d(b, A)\right)$.
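The two surface-distance measures can be sketched in a few lines of Python. This is an illustrative brute-force version under stated assumptions: segmentations are sets of integer voxel indices, a surface voxel is one with at least one of its six face neighbors outside the segmentation, distances are measured to the nearest surface voxel of the other segmentation, and the mean pools the distances from both directions. These choices and the function names are assumptions, not a description of the published implementation.

```python
import math

# Offsets of the six face neighbors of a voxel.
FACE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface_voxels(segmentation):
    """Voxels that have at least one face neighbor outside the segmentation."""
    segmentation = set(segmentation)
    return {
        (x, y, z)
        for (x, y, z) in segmentation
        if any((x + dx, y + dy, z + dz) not in segmentation
               for dx, dy, dz in FACE_NEIGHBORS)
    }


def surface_distance(voxel, surface, voxel_size_mm):
    """Distance (mm) from one voxel to the nearest voxel of a surface."""
    return voxel_size_mm * min(
        math.sqrt(sum((p - q) ** 2 for p, q in zip(voxel, other)))
        for other in surface
    )


def mean_and_hausdorff_distance(seg_a, seg_b, voxel_size_mm=1.0):
    """Return (mean distance, Hausdorff distance) between two segmentations."""
    surface_a, surface_b = surface_voxels(seg_a), surface_voxels(seg_b)
    distances = [surface_distance(a, surface_b, voxel_size_mm) for a in surface_a]
    distances += [surface_distance(b, surface_a, voxel_size_mm) for b in surface_b]
    return sum(distances) / len(distances), max(distances)


# Toy example: two 2-voxel segmentations shifted by one voxel.
md, hd = mean_and_hausdorff_distance({(0, 0, 0), (1, 0, 0)},
                                     {(1, 0, 0), (2, 0, 0)})
```

The brute-force search is quadratic in the number of surface voxels, which is acceptable for a structure as small as the habenula but would be slow for large structures.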

For the intra-site data, the DC, MD, and HD were calculated from the pair of 3T test-retest data or between 3T and 7T data. For the inter-site data, the overall DC, MD, and HD were represented by the average values from all three combinatorial pairs of sites. To calculate the DC, MD, and HD of segmentation using Lawson's method, we converted the habenula region, consisting of line segments and elliptic curves, to a binary image in the same space as the T1w image by assigning a voxel as habenula if any point of the defined region was in the voxel.
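The inter-site averaging over the three pairs of sites can be sketched as follows. The helper `average_pairwise` and the toy segmentations are hypothetical; any pairwise similarity function could be passed in place of the simple Dice function defined here.

```python
from itertools import combinations


def dice(seg_a, seg_b):
    """Dice coefficient of two non-empty binary segmentations (sets of voxels)."""
    return 2.0 * len(seg_a & seg_b) / (len(seg_a) + len(seg_b))


def average_pairwise(segmentations, metric):
    """Average a pairwise metric over all unordered pairs of segmentations."""
    pairs = list(combinations(segmentations, 2))
    return sum(metric(first, second) for first, second in pairs) / len(pairs)


# Hypothetical segmentations of one subject at three sites.
site_1 = {(0, 0, 0), (1, 0, 0)}
site_2 = {(1, 0, 0), (2, 0, 0)}
site_3 = {(0, 0, 0), (1, 0, 0)}
overall_dc = average_pairwise([site_1, site_2, site_3], dice)
```

With three sites there are exactly three pairs (1-2, 1-3, 2-3), so each site contributes to two of the averaged values.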

2.12 Reproducibility of habenula volume

To evaluate systematic differences in habenula volume between the intra-site test and retest data and across sites, a paired t test (two-tailed) and analysis of variance (ANOVA) were used, respectively. In addition, the following two reproducibility metrics were calculated.

The coefficient of variation (COV) is the relative standard deviation (SD) of the habenula volume for the 3T test-retest data (intra- or inter-site), calculated as SD/mean.
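A minimal sketch of the COV calculation for one subject's repeated volume measurements is shown below. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the volumes are made-up numbers used only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(volumes):
    """SD / mean of repeated volume measurements of one subject.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return stdev(volumes) / mean(volumes)


# Hypothetical test and retest habenula volumes (mm^3) of one subject.
cov = coefficient_of_variation([20.0, 22.0])
```

For two measurements the sample SD reduces to the absolute difference divided by the square root of 2, so the COV here is about 0.067 (6.7%).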

The intraclass correlation coefficient (ICC) (Shrout & Fleiss, 1979) quantifies the reliability of quantitative measurements. The strength of agreement was interpreted from the ICC value as: 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect. ICC was calculated (model = oneway and type = agreement; McGraw & Wong, 1996) using the irr package version 0.84 (Gamer, Lemon, & Singh, 2012) in R (R Core Team, 2015).
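To make the one-way model concrete, the following sketch computes the textbook one-way random-effects, single-measure ICC, often written ICC(1,1), from between-subject and within-subject mean squares. It is a plain Python illustration under the assumption that every subject has the same number of measurements; it is not the irr package code, and results may differ from irr in edge cases.

```python
def icc_oneway(measurements):
    """One-way random-effects, single-measure ICC.

    measurements: list of per-subject lists, each with k repeated values.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares.
    """
    n = len(measurements)
    k = len(measurements[0])
    if any(len(row) != k for row in measurements):
        raise ValueError("every subject needs the same number of measurements")
    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(measurements, subject_means) for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical test-retest volumes: perfectly repeatable measurements.
icc_perfect = icc_oneway([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
# Subjects indistinguishable, all variance within subject.
icc_worst = icc_oneway([[1.0, 2.0], [2.0, 1.0]])
```

The two toy inputs mark the extremes: identical repeats give an ICC of 1, and data whose variance lies entirely within subjects give a negative ICC.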

Using our myelin contrast-based segmentation, the habenula volume was estimated with partial volume estimation (i.e., probability). Using Lawson's method, the habenula volume was calculated from the area inside the drawing (i.e., two lines and two curves), instead of counting all voxels within and on the geometric lines as originally proposed (Lawson et al., 2013).
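A partial-volume-based volume estimate can be sketched as the sum of voxel probabilities multiplied by the voxel volume. The dictionary layout, the function name, and the example probabilities are assumptions for illustration; the 0.7 mm isotropic voxel size matches the intra-site acquisition.

```python
def volume_from_probability(prob_map, voxel_size_mm):
    """Volume (mm^3) as the sum of voxel probabilities times the voxel volume.

    prob_map: dict mapping voxel indices to habenula probabilities in [0, 1].
    voxel_size_mm: edge length of an isotropic voxel.
    """
    voxel_volume = voxel_size_mm ** 3
    return sum(prob_map.values()) * voxel_volume


# Hypothetical probability map with one full and two partial-volume voxels.
example_map = {(0, 0, 0): 1.0, (1, 0, 0): 0.5, (2, 0, 0): 0.25}
volume = volume_from_probability(example_map, voxel_size_mm=0.7)
```

Counting these three voxels as binary would give 3 × 0.343 = 1.029 mm³, whereas the probability-weighted estimate is 1.75 × 0.343 ≈ 0.600 mm³, which shows how much boundary voxels can influence the volume of such a small structure.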

3 RESULTS

3.1 Registration and segmentation

Each subject's images (intra-site and inter-site 3T data or intra-site 3T and 7T comparison data) were registered to within half a voxel (Figure 1), as confirmed by careful visual verification by an experienced researcher (JWK). The semi-automated 3T habenula segmentation successfully segmented the habenula for all the intra-site (n = 27) and inter-site (n = 12) subjects (Figure 1, red overlay). The fully-automated 3T habenula segmentation successfully segmented the habenula in 27 intra-site subjects and in all 18 cases of 6 inter-site subjects, but failed in 13 out of 18 cases of the 6 other inter-site subjects. For the CamCAN data (i.e., 1.0 mm isotropic resolution), the fully-automated habenula segmentation successfully segmented the habenula in 30 out of 40 cases. We defined a segmentation as successful if it was located at the habenula and showed no obvious underestimation or overestimation, the same definition as in our previous study (Kim et al., 2016b). The main reason for the failed fully-automated segmentations was the imperfect MNI-to-native space non-linear transformation for small subcortical structures, such as the habenula (e.g., Supporting Information Figure S10). This imperfect transformation could place seed voxels outside of the individual habenula region, causing the initialization of our segmentation scheme to fail. More detailed interpretations of this fully-automated segmentation failure and its remedy will be discussed in §4. Using Lawson's method, two raters segmented 24 intra-site subjects and 12 inter-site subjects (Figure 2). For the intra-site 3T and 7T comparison data, the habenula was manually segmented from 7T images in all subjects (n = 10) and successfully semi-automatically segmented from 3T images in nine subjects (Figure 3). One 3T segmentation from the intra-site 3T and 7T comparison data was excluded after visual inspection because of obvious overestimation along the FR (Supporting Information Figure S2).

Figure 1. Representative coronal views of T1w (first row), T2w (second row), myelin-sensitive (T1w/T2w, third row) images, and the habenula segmentation probability map (fourth row) of the intra-site (left panel) and inter-site (right panel) 3T data

Figure 2. Representative manual habenula geometric segmentation (Lawson's method) of intra-site test (left column) and retest (right column) 3T data from rater 1 (top row) and rater 2 (bottom row) in coronal view (zoomed in)

Figure 3. Representative coronal views of the co-registered 7T (top row), down-sampled 7T (middle row), and 3T (bottom row) images of the habenula region with the corresponding habenula segmentation (blue and red outlines representing 7T and 3T habenula segmentation, respectively) from two representative cases with (subject 2) and without (subject 1) subtle overestimation of the fasciculus retroflexus (FR, black arrows) using our myelin contrast-based habenula segmentation. The 3T image was registered to the 7T image and the transformed 3T habenula probability map was thresholded (red boundary, threshold value = 0.3) to minimize overall mismatch between the 3T and 7T habenula segmentation

To evaluate the spatial confound of our myelin contrast-based habenula segmentation, we used the intra-site 3T and 7T comparison data and oriented them, as closely as possible, to the Allen Brain Atlas (Figure 4a–c), an existing histology slice stained (Luxol fast blue) for myelin (Figure 4d), and previously published ex vivo MRI (Figure 4, second row and q) at 200 μm isotropic resolution (see Kim et al., 2016b, Methods, Ex vivo 7T). Based on the spatial correspondence from different view angles, we attribute the subtle overestimation of the 3T habenula segmentation (Figure 4, red outline in m–p) to the inclusion of the emergence of the FR, ventral anterior lateral to the habenula (Figure 4, black arrows in e–h). Neither the manual habenula segmentation from the 7T data nor our myelin contrast-based habenula segmentation from the 3T data showed obvious inclusion of the stria medullaris (SM), dorsal anterior to the habenula (Figure 4, white arrows in g and k).

Figure 4. To visualize the spatial confound of our myelin contrast-based 3T habenula segmentation with respect to its afferent and efferent white matter fibers, representative habenula segmentations from the in vivo 3T and 7T comparison data were re-oriented and aligned, as closely as possible, with atlas, histology, and ex vivo MRI images. Aligned coronal views (columns 1–3 from the left, posterior to anterior) of Allen Brain Atlas images (a–c, human whole brain (34 years) reference atlas, slices 48, 50, and 52; left half: Nissl staining; right half: anatomical delineation; left and right halves are mirrored from the same hemisphere; © 2010 Allen Institute for Brain Science; Ding et al., 2016; Hawrylycz et al., 2012), ex vivo MRI (e–g), and in vivo 7T MRI without (i–k) and with (m–o) habenula segmentation overlays. Aligned oblique (slice orientation indicated as the orange dotted line in q) coronal-axial views (column 4, rightmost) of myelin stained (Luxol fast blue) histology slice (d), ex vivo MRI (h), and in vivo 7T MRI without (l) and with (p) habenula segmentation overlays. Red and blue outlines (m–p) are habenula segmentation from in vivo 3T and 7T comparison data, respectively. Medial and lateral habenula are labeled respectively as “MHN” and “LHN” in panels a–c, and “m” and “l” in panel d, while the habenula (Hb) is indicated by a white arrow in panel q (oblique sagittal view of ex vivo MRI). Fasciculus retroflexus (FR) is labeled as “fr” in panels a–c, indicated by black arrows in panels d–l, and traced by orange arrows in panel q. Stria medullaris (SM) is labeled as “smt” in panels a–c and indicated by white arrows in panels g and k

3.2 Habenula segmentation similarity using myelin-sensitive image contrast (3T)

Overall, the 3T habenula segmentation using our myelin-sensitive image contrast approach showed good similarity in both the intra- and inter-site 3T data (Table 1). The DCs (both binary and probability, Supporting Information Figure S3) were higher than 0.6 except in 6 out of 39 subjects (Figure 5a,b). The MD (Supporting Information Figure S4a) was less than 0.5 mm and the HD (Supporting Information Figure S4b; the case with maximum HD is shown in Supporting Information Figure S5) was less than 3.0 mm for all cases (Figure 5c,d). The probability map showed significantly (p < .01, paired t test) higher DC than the binary segmentation (Figure 5a,b, most of the points above the line of identity). The DC, MD, and HD results are correlated (Supporting Information Figure S6), as expected.
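As an illustration of the two Dice coefficients compared above, the following minimal sketch computes a binary DC and a probability-map ("soft") DC on toy data. The voxel indices and probabilities are hypothetical, and the soft overlap is assumed here to be the voxel-wise minimum of the two probabilities; the exact definition used in this study is given in the Methods and may differ.

```python
def dice_binary(mask_a, mask_b):
    """Dice coefficient of two binary masks given as collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def dice_probability(prob_a, prob_b):
    """Soft Dice of two probability maps given as {voxel index: probability}.

    Assumption: the overlap in a voxel is min(p, q). Other soft-Dice
    variants (e.g., the product p * q) are also in use.
    """
    total = sum(prob_a.values()) + sum(prob_b.values())
    if total == 0:
        return 1.0
    overlap = sum(min(p, prob_b.get(v, 0.0)) for v, p in prob_a.items())
    return 2.0 * overlap / total


# Hypothetical test/retest segmentations on a 1-D strip of voxels.
test_mask = {1, 2, 3, 4}
retest_mask = {2, 3, 4, 5}
test_prob = {0: 0.2, 1: 0.8, 2: 1.0, 3: 1.0, 4: 0.6}
retest_prob = {1: 0.6, 2: 1.0, 3: 1.0, 4: 0.8, 5: 0.2}

print(f"binary DC      = {dice_binary(test_mask, retest_mask):.2f}")
print(f"probability DC = {dice_probability(test_prob, retest_prob):.2f}")
```

In this toy case the partial-volume weights at the boundary voxels raise the soft DC above the binary DC, but the example is not meant to reproduce the values in Table 1.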

Figure 5. Semi- and fully-automated habenula segmentation comparison. Scatter plots and histograms of Dice coefficients (a and b, histogram bin size = 0.05) from binary segmentation and probability map, Hausdorff distances (c and d, histogram bin size = 0.5 mm), and mean distances (c and d, histogram bin size = 0.05 mm) of our semi- versus fully-automated habenula segmentation in intra-site (a and c) and inter-site (b and d) 3T data. Gray lines (in a and b) are identity lines [Color figure can be viewed at wileyonlinelibrary.com]

Table 1. Dice coefficient (DC), Mean distance (MD), and Hausdorff distance (HD) of habenula segmentation (mean ± SD) by our semi- and fully-automated methods from 3T myelin-sensitive image
| | Intra-site, semi (n = 27) | Intra-site, fully (n = 27) | Inter-site, semi (n = 12) | Inter-site, fully (n = 6) |
|---|---|---|---|---|
| DC, binary | 0.71 ± 0.09 | 0.69 ± 0.07 | 0.64 ± 0.05 | 0.61 ± 0.08 |
| DC, probability | 0.74 ± 0.06 | 0.74 ± 0.06 | 0.70 ± 0.05 | 0.71 ± 0.03 |
| MD (mm) | 0.25 ± 0.08 | 0.26 ± 0.07 | 0.35 ± 0.53 | 0.36 ± 0.09 |
| HD (mm) | 1.42 ± 0.44 | 1.44 ± 0.53 | 1.82 ± 0.45 | 1.60 ± 0.37 |

No obvious spatial bias exists between the semi- and fully-automated segmentation, excluding the failed segmentation cases. The DC using probability map, MD, and HD between semi- and fully-automated segmentation of the intra-site data were 0.85 ± 0.06, 0.16 ± 0.08 mm, and 1.34 ± 0.42 mm (mean ± SD, n = 27), respectively. The DC of the habenula segmentation probability map of the intra-site 3T data was not significantly different (p = .2) between the semi- and fully-automated segmentation. Although the DC of the habenula binary segmentation of the intra-site 3T data differed significantly (p = .02, n = 27) between the semi- (0.71 ± 0.09) and fully- (0.69 ± 0.07) automated segmentation, the difference is marginal.

The 3T habenula segmentation using Lawson's method also showed good similarity, comparable to the segmentation using our semi- or fully-automated methods, in both the intra- and inter-site 3T data (Supporting Information Table S1). The intra-site MD and HD using the automated methods were smaller than those using Lawson's method, while the inter-site MD and HD from one rater using Lawson's method were smaller than those using automated methods.

3.3 Habenula boundary comparison between semi-automated segmentation at 3T and manual segmentation at 7T

The DC, MD, and HD between the manual 7T habenula segmentation and the semi-automated 3T habenula segmentation (transformed to the 7T image space) from the same subjects were 0.66 ± 0.04, 0.39 ± 0.08 mm, and 2.16 ± 0.77 mm, respectively (mean ± SD, n = 9). Overall, the group average habenula segmentation co-localized well between the 3T and 7T results (Figure 6, MNI space). For the majority of the habenula boundaries, the group average habenula segmentation from the 7T data was inclusive of those from the 3T data (individual example in Figure 3, Subject 1), except for the emergence of the FR, ventral anterior lateral to the habenula (Figure 6, arrow; individual example in Figure 3, Subject 2). The over-estimated FR region in the 3T data was approximately 11% ± 8% (n = 9) of the 3T habenula segmentation volume.

Figure 6. Representative coronal (a), axial (b), and sagittal (c) views of group average (n = 9) habenula probability maps from intra-site 3T (blue) and 7T (red) comparison data in the MNI152 space. The probability maps were thresholded for visualization (threshold value 0.1) with overlap shown in purple. Note that the group-level threshold used here is different from the individual-level threshold used in Figure 3. Black arrows point to the emergence of the fasciculus retroflexus (FR), ventral anterior lateral to the habenula [Color figure can be viewed at wileyonlinelibrary.com]

3.4 Habenula volume

The segmented habenula volumes of intra-site and inter-site 3T data are summarized in Table 2. For each method or rater, the habenula volume difference between test and retest data or among sites (Supporting Information Figure S7) was not statistically significant (p > .1, paired t test for intra-site and ANOVA for inter-site) in our samples. For intra-rater reproducibility of both raters, the habenula volume of intra-site test1 (33.3 and 22.7 mm3 for rater 1 and 2, respectively) was significantly (p < .01) larger than that of intra-site test2 (28.3 and 19.1 mm3 for rater 1 and 2, respectively, Supporting Information Figure S8). The habenula volume from rater 1 was significantly larger than that from rater 2 (p < .01, Supporting Information Figure S9). For the intra-site 3T and 7T comparison data, the left/right habenula volumes of 7T manual segmentation and 3T semi-automated segmentation were 30.4 ± 6.1/28.4 ± 4.8 mm3 and 18.0 ± 5.2/16.9 ± 4.5 mm3, respectively (means ± SD, n = 9). For the CamCAN data (i.e., 1.0 mm isotropic resolution), the left/right habenula volumes were 20.1 ± 4.9/20.3 ± 5.7 mm3, respectively, from the 30 cases with successful fully-automated habenula segmentation.

Table 2. Habenula volumes (mean ± SD) in mm3 measured by semi- and fully-automated segmentation and two raters using Lawson's method.
| | Semi-automated, left | Semi-automated, right | Fully-automated, left | Fully-automated, right | Rater 1, left | Rater 1, right | Rater 2, left | Rater 2, right |
|---|---|---|---|---|---|---|---|---|
| Intra-site test1 | | | | | 34.0 ± 7.6§ | 32.5 ± 9.9§ | 24.7 ± 7.2§ | 20.7 ± 6.9§ |
| Intra-site test2 | 18.2 ± 3.9* | 18.9 ± 4.4* | 16.3 ± 4.0* | 16.5 ± 3.6* | 27.9 ± 10.0§ | 28.7 ± 8.4§ | 20.4 ± 7.1§ | 17.8 ± 6.3§ |
| Intra-site retest | 19.5 ± 3.4* | 18.6 ± 3.4* | 16.9 ± 3.2* | 16.3 ± 3.6* | 31.1 ± 10.8§ | 32.9 ± 10.2§ | 24.7 ± 7.4§ | 19.4 ± 7.2§ |
| Inter-site 1 | 19.3 ± 3.1† | 19.3 ± 4.0† | 16.5 ± 2.1‡ | 17.3 ± 4.0‡ | 36.6 ± 9.2† | 41.5 ± 11.0† | 24.1 ± 7.5† | 21.1 ± 7.1† |
| Inter-site 2 | 18.9 ± 4.4† | 17.6 ± 2.0† | 15.2 ± 3.1‡ | 17.1 ± 1.1‡ | 30.6 ± 11.7† | 32.3 ± 9.9† | 22.5 ± 7.3† | 20.9 ± 6.5† |
| Inter-site 3 | 18.6 ± 3.9† | 20.0 ± 1.4† | 16.4 ± 3.3‡ | 17.3 ± 3.9‡ | 33.4 ± 6.6† | 35.7 ± 8.6† | 22.2 ± 7.5† | 20.0 ± 9.9† |
  • *n = 27; §n = 24; †n = 12; ‡n = 6.

3.5 Habenula volume test-retest COV and ICC

The intra- and inter-site habenula volume COV and ICC are summarized in Figure 7 and Table 3. The semi- and fully-automated segmentation showed lower COV than Lawson's method. The intra-rater COV were comparable with intra/inter-site COV using Lawson's method, while they were consistently higher than our proposed semi- and fully-automated segmentation method. The COVs using semi-automated segmentation were all less than 0.25 except for 3 outliers (range 0.33–0.52) and those using fully-automated segmentation were all less than 0.34 except for one outlier (0.51). Using our myelin contrast-based semi- and fully-automated segmentation method, intra-site segmentation showed “substantial” (ICC = 0.62) and “moderate” reliability, respectively, while inter-site segmentation showed “slight” and “fair” reliability, respectively. Using Lawson's method, a range of “fair” to “moderate” reliabilities was observed for the two raters, and inter-site ICCs were higher than the semi-automated segmentation method.
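The two reproducibility measures reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation used in this study: the per-subject COV is taken as sample SD divided by the mean of the repeated volumes, the ICC is the one-way random-effects form ICC(1,1), and the qualitative labels are assumed to follow the Landis and Koch cut-offs. The function names and the volumes are hypothetical.

```python
import statistics


def retest_cov(measurements):
    """COV of repeated measurements of one subject: sample SD / mean (assumed form)."""
    return statistics.stdev(measurements) / statistics.mean(measurements)


def icc_oneway(data):
    """One-way random-effects ICC(1,1); rows = subjects, columns = repeated scans."""
    n, k = len(data), len(data[0])
    grand = statistics.mean([x for row in data for x in row])
    row_means = [statistics.mean(row) for row in data]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def reliability_label(icc):
    """Qualitative label, assuming the Landis and Koch cut-offs."""
    if icc < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if icc <= upper:
            return label
    return "almost perfect"


# Hypothetical test/retest habenula volumes (mm3) for four subjects.
volumes = [[18.0, 20.0], [16.0, 15.0], [22.0, 21.0], [19.0, 17.0]]

print([round(retest_cov(v), 3) for v in volumes])
print(round(icc_oneway(volumes), 3), reliability_label(icc_oneway(volumes)))
```

With these cut-offs, the ICC values of Table 3 map onto the labels used in the text (e.g., 0.62 to "substantial" and 0.17 to "slight").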

Figure 7. Box plots of COV of habenula volumes. “semi” and “fully” refer to semi- and fully-automated segmentation, respectively. “intra” and “inter” refer to intra-site and inter-site 3T data, respectively [Color figure can be viewed at wileyonlinelibrary.com]

Table 3. Coefficient of Variation (COV) and Intraclass Correlation Coefficient (ICC) of intra- and inter-site habenula segmentation (mean ± SD) by semi- and fully-automated segmentation and two raters using Lawson's method
| | Semi | Fully | Rater 1 | Rater 2 |
|---|---|---|---|---|
| COV | | | | |
| Intra-rater (n = 24) | | | 0.21 ± 0.19 | 0.21 ± 0.19 |
| Intra-site | 0.11 ± 0.10 (n = 27) | 0.12 ± 0.10 (n = 27) | 0.18 ± 0.18 (n = 24) | 0.24 ± 0.20 (n = 24) |
| Inter-site | 0.15 ± 0.06 (n = 12) | 0.14 ± 0.04 (n = 6) | 0.23 ± 0.14 (n = 12) | 0.23 ± 0.10 (n = 12) |
| ICC (confidence interval) | | | | |
| Intra-rater | | | 0.29 (0.01, 0.53) | 0.39 (0.12, 0.60) |
| Intra-site | 0.62 (0.42, 0.76) | 0.47 (0.23, 0.65) | 0.33 (0.06, 0.66) | 0.20 (−0.08, 0.45) |
| Inter-site | 0.17 (−0.06, 0.45) | 0.35 (0.00, 0.71) | 0.23 (−0.01, 0.52) | 0.49 (0.21, 0.68) |
  • The same number of subjects were used for ICC calculation as the number of subjects for the COV calculation.

4 DISCUSSIONS

In this study, we evaluated the reproducibility, both intra-site and inter-site at 3T, of the habenula segmentation in test-retest anatomical MRI data using our recently proposed myelin content-based segmentation method (Kim et al., 2016b) with semi- or fully-automated habenula seed voxel initialization, as well as a previously reported geometric segmentation method (Lawson et al., 2013). We also assessed the agreement of habenula boundaries between our semi-automated segmentation at 3T and manual segmentation on higher resolution (0.5 mm isotropic) 7T anatomical MRI of the same subjects.

Our semi-automated habenula segmentation from 3T data showed good agreement (mean DC = 0.66, MD = 0.39 mm, and HD = 2.16 mm) with the manual habenula segmentation from 7T data of the same subjects. The medial and lateral boundaries were similar. Nevertheless, we have observed subject-specific subtle overestimation in our 3T habenula segmentation (Figure 3, Subject 2). Through careful comparison with the 7T MRI, ex vivo MRI, Allen Brain Atlas, and myelin-stained histology, we attribute this subtle overestimation to the emergence of the FR, ventral anterior lateral to the habenula. Note that this subtle overestimation exists even after the geometric constraint step that we have implemented in our segmentation scheme (Kim et al., 2016b) and visual inspection of the results to exclude obvious outliers (e.g., Supporting Information Figure S2). Because of the highly myelinated fibers in the FR, similar to the high myelin content in the habenula, their histological boundary is not well defined (Díaz et al., 2011). Such ambiguity affects both contrast-weighted (e.g., T1w or T2w) and quantitative (e.g., T1 or T2* map) MRI. Any contrast-based habenula segmentation methods at 3T or 7T are likely to yield results with such subject-specific subtle overestimation of the FR. Therefore, we recommend that any habenula volumetric studies show a group average habenula probability map (e.g., Figure 6) to demonstrate the lack of this confound. Sufficiently high-resolution (e.g., 0.5 mm isotropic) 7T images helped to mitigate this subtle overestimation by clearer definition of the FR in the coronal view as a thinner structure than the habenula with sometimes discontinuous or denticular appearance (due to FR's anterior approach, Figure 4q, orange arrows), while these fine features of the FR disappeared after down-sampling the 7T images to the 0.8 mm isotropic resolution of 3T MRI (Figure 3).

The 7T UNI image and T1 map from MP2RAGE showed much higher habenula contrast than the 3T myelin-sensitive images (Figure 3), as has been previously suggested by others (Strotmann et al., 2014; Schmidt et al., 2016). We chose to use UNI image for our manual segmentation because of its similar contrast to T1w images (i.e., habenula as hyperintense). We do not expect the choice of UNI image or T1 map to significantly affect our conclusions, because the UNI image and T1 map showed very similar habenula contrast (Supporting Information Figure S11). The habenula volume measured at 7T using manual segmentation was larger than that measured at 3T using our semi- or fully-automated segmentation, but similar to our previously reported habenula volume using manual segmentation from ex vivo MRI at 200 μm isotropic resolution (Kim et al., 2016b). Both the resolution and contrast advantage at 7T are presumed to lead to less partial volume effect, hence the more likely inclusion of true habenula voxels at the boundaries. For this reason, we used the 7T manual habenula segmentation as a reference for our 3T habenula segmentation in this study. In addition, quantitative T1 map can be calculated from the MP2RAGE as a potentially more reliable source of habenula contrast in repeated scans of the same subjects. However, achieving 0.5 mm isotropic resolution requires lengthy MP2RAGE acquisition at 7T. Future habenula morphological investigation at 7T may focus on high-resolution, better than commonly achievable at 3T (i.e., ≤0.6 mm isotropic resolution), reduced field-of-view acquisition with short acquisition time, if whole brain coverage is not a requirement.

As implied by the 3T and 7T comparison, image resolution could play an important role in the boundary definition of habenula segmentation. In this and our previous studies, we have demonstrated successful habenula segmentation from 0.7, 0.8, and 0.9 mm isotropic resolution HCP-like T1w and T2w 3T anatomical MRI using our semi-automated method. Nevertheless, 1 mm isotropic resolution anatomical MRI are more commonly collected in neuroimaging studies. To assess the applicability of our segmentation method to this wider pool of anatomical MRI data, we tentatively examined our fully-automated segmentation on publicly available (e.g., CamCAN) or locally acquired (results not shown) 1.0 mm data. We found that successful habenula segmentation requires higher habenula-thalamus contrast-to-noise ratio (CNR) for 1.0 mm (i.e., voxel volume = 1.0 mm3) isotropic resolution data than for 0.7 mm (i.e., voxel volume = 0.34 mm3) or 0.8 mm (i.e., voxel volume = 0.51 mm3) isotropic resolution data, since 1.0 mm isotropic resolution data contain a much smaller number of voxels in and around the habenula region to inform the histogram-based thresholding in our segmentation algorithm (Kim et al., 2016b). This implies that a high quality T2w image with optimized gray-to-white matter contrast is critical for 1.0 mm isotropic resolution data. We show both successful and failed habenula segmentation examples (Supporting Information Figure S12) from 1.0 mm isotropic resolution datasets with high and low habenula-thalamus CNR, respectively. We did not attempt to compare the habenula volumes between datasets with different image resolutions from different samples to avoid Type I or Type II errors. A future study with anatomical images acquired at different resolutions in a single imaging session on the same subject would be more appropriate to address the impact of image resolution on the volume and reproducibility of habenula segmentation.
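The voxel-count argument above can be made concrete with a short calculation. The sketch below assumes an illustrative habenula volume of 18 mm3 per hemisphere, close to the 3T semi-automated volumes in Table 2; the constant and function names are hypothetical.

```python
def voxel_volume_mm3(edge_mm):
    """Volume of one isotropic voxel with the given edge length (mm)."""
    return edge_mm ** 3


def voxels_in_structure(structure_volume_mm3, edge_mm):
    """Approximate number of voxels needed to fill a structure of the given volume."""
    return structure_volume_mm3 / voxel_volume_mm3(edge_mm)


# Illustrative per-hemisphere habenula volume (an assumption, not a measured value).
ASSUMED_HABENULA_MM3 = 18.0

for edge in (0.7, 0.8, 1.0):
    print(
        f"{edge:.1f} mm isotropic: voxel volume = {voxel_volume_mm3(edge):.2f} mm3, "
        f"about {voxels_in_structure(ASSUMED_HABENULA_MM3, edge):.0f} voxels"
    )
```

Under this assumption, a 1.0 mm acquisition samples the habenula with roughly one third of the voxels available at 0.7 mm, which leaves fewer samples for the histogram-based thresholding.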

Similar to the habenula volume differences between our intra-site 3T and 7T comparison data of the same subjects, group level mean habenula volumes reported in the literature are variable. The mean habenula volumes measured by manual contrast-based segmentation at 3T were 18.8/16.4 mm3 (Savitz et al., 2011a), 19.5/17.9 mm3 (Savitz et al., 2011b), left/right habenula respectively, and 43.0 mm3 (Carceller-Sindreu et al., 2015), bilateral, while those from 7T T1 map were 17.6/17.3 mm3 (Schmidt et al., 2016), left/right habenula respectively in healthy controls. Using the geometric segmentation (i.e., Lawson's method), the habenula volumes from 3T were 27.9/28.0 mm3 (Hétu et al., 2016), 28.3/28.7 mm3 (Furman & Gotlib, 2016), 29.4/29.3 mm3 (Lawson et al., 2013), left/right habenula respectively, and 22.3 mm3 (Lawson et al., 2017) per hemisphere, while those from 7T (0.7 mm isotropic resolution MPRAGE) were 18.8/14.9 mm3 (Torrisi et al., 2017), left/right habenula respectively. Assessing the accuracy of in vivo habenula segmentation would require a carefully planned cadaver MRI, ex vivo MRI, and histology from the same brain, which is beyond the scope of this study.

The habenula segmentation similarity and reproducibility results in this study should be interpreted in the context of the habenula's small size compared to the MRI resolution and low anatomical contrast to the surrounding thalamus. Smaller structures are expected to have lower DC and higher volume COV (de Boer et al., 2010; Khan, Wang, & Beg, 2008; Loh et al., 2016; Powell et al., 2008) than larger structures (De Leener, Kadoury, & Cohen-Adad, 2014; Kullberg, Ahlström, Johansson, & Frimmel, 2007; Lemieux, Hagemann, Krakow, & Woermann, 1999). Moreover, potential segmentation bias to FR and/or SM could lead to the moderate HD result in our study (note that the HD is calculated from maximum single point difference, not as an average). The respectably high DC (>0.6), low MD (<0.5 mm), and moderate HD (<3 mm) for both the intra- and inter-site 3T data, and the intra-site 3T and 7T comparison data using our myelin content-based segmentation method (either semi- or fully-automated) indicate a respectable level of consistency in habenula boundary definition on repeated scans of the same subjects. In addition, the segmented habenula volume with partial volume estimation (i.e., probability) using our methods showed low COV (≤0.15) in both the intra-site and inter-site 3T data.
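The contrast between MD (an average) and HD (a single worst-case point) noted above can be illustrated with a minimal sketch. It assumes the symmetric nearest-neighbour definitions of both distances and uses hypothetical boundary points; the exact definitions used in this study are given in the Methods.

```python
import math


def directed_distances(points_a, points_b):
    """For each point in A, the Euclidean distance to its nearest point in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def mean_distance(points_a, points_b):
    """Symmetric mean of nearest-neighbour distances (one common MD definition)."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hausdorff_distance(points_a, points_b):
    """Largest nearest-neighbour distance in either direction."""
    return max(
        max(directed_distances(points_a, points_b)),
        max(directed_distances(points_b, points_a)),
    )


# Hypothetical boundary points (mm): identical outlines except for one
# protruding point, loosely mimicking a local over-inclusion.
boundary_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
boundary_b = boundary_a + [(3.0, 1.0, 0.0)]

print(f"MD = {mean_distance(boundary_a, boundary_b):.2f} mm")
print(f"HD = {hausdorff_distance(boundary_a, boundary_b):.2f} mm")
```

A single protruding point leaves the MD small while setting the HD to the full length of the protrusion, which is why a localized bias can produce a moderate HD alongside a low MD.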

The habenula volume reproducibility results (i.e., COV and ICC) have different implications. The COV results could be used for power analysis in cross-sectional studies of habenula volume differences in different cohorts, when using similar MRI acquisition and habenula segmentation methods as presented in this study. On the other hand, our ICC results (range = 0.17–0.62) suggest that the within-subject variation is at least on par with, if not larger than, the between-subject variation for both the intra-site and inter-site test-retest measurements. This relatively high within-subject variation might be attributed to the sensitivity of habenula segmentation methods to both technical (e.g., head motion and head tilting angle) and physiological (e.g., cerebrospinal fluid and vascular pulsation around the habenula) factors leading to image resolution and contrast variation in repeated scans of the same subject, given the small size of the habenula (i.e., boundary voxels have substantial influence on the measured volume) and its close vicinity to the third ventricle. Variance components related to these effects are random effects and we speculate that they dominate in both the within- and between-subject variation. Additional random between-subject variation includes subject variations of the true habenula size (COV = ∼0.3 [n = 38; Ahumada-Galleguillos et al., 2016] and ∼0.2 [n = 14; Ranft et al., 2010] from postmortem histology in deceased subjects without reported cerebral illness or neuropathy. Note that the COV here is between-subject variation, which is different from the test-retest COV calculated in §2.12). In addition, there are scanner/site-specific fixed effects variation components contributing to both the within- and between-subject variation in the inter-site data. Although different cohorts were studied in our intra-site and inter-site test-retest data, which prevented a rigorous intra- and inter-site comparison, we expect that the habenula segmentation reproducibility would be higher for intra-site than inter-site. Overall, our ICC results imply that designing a well-powered longitudinal habenula volumetric study, using the habenula segmentation methods presented in this study, would be challenging, even within a single site.
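As a rough illustration of how a COV could enter a power analysis for a cross-sectional comparison, the sketch below uses the standard two-sided, two-sample normal approximation, n = 2 (z_(1−α/2) + z_(power))² (COV / d)², where d is the relative group difference in mean volume. The COV passed in is assumed to be the total coefficient of variation of the measured volume within a cohort; the input values and the function name are hypothetical and are not recommendations derived from this study.

```python
import math
from statistics import NormalDist


def subjects_per_group(cov, relative_difference, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes and equal variances;
    `cov` and `relative_difference` are both expressed as fractions of the mean.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (z_alpha + z_power) ** 2 * (cov / relative_difference) ** 2
    return math.ceil(n)


# Hypothetical scenarios: detect a 10% group difference in habenula volume.
for cov in (0.15, 0.25):
    print(f"COV = {cov:.2f}: {subjects_per_group(cov, 0.10)} subjects per group")
```

Because the required sample size scales with the square of the COV, a modest reduction in measurement variability translates into a substantially smaller cohort.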

The COV and ICC results using our semi- or fully-automated segmentation were better (i.e., lower COV and higher ICC), except for the inter-site ICC, than those using Lawson's method. More extensively trained raters might show higher reproducibility. Admittedly, by the nature of the geometric delineation and unlike the contrast-based segmentation methods, Lawson's method was not designed for the purpose of evaluating habenula volume differences or changes. We caution against misusing Lawson's method to assess human habenula anatomical laterality (Bianco & Wilson, 2009), as suggested by Concha's group (Ahumada-Galleguillos et al., 2016). Lawson's method has been successfully used for objectively localizing the habenula seed regions in functional MRI (Furman & Gotlib, 2016; Hétu et al., 2016; Lawson et al., 2017; Torrisi et al., 2017) or diffusion MRI (Shelton et al., 2012) applications, for which our myelin content-based habenula segmentation method provides a reproducible and efficient alternative (Ely et al., 2016).

We have attempted to fully automate our previously developed semi-automated segmentation by transforming habenula seed voxels in the MNI atlas space to individual native space. The fully-automated segmentation performed as well as the semi-automated segmentation in the intra-site HCP data, but underperformed in the inter-site data, while its performance (75% success rate) in the CamCAN data (i.e., 1.0 mm isotropic resolution) is still encouraging. The main reason for the underperformance of the fully-automated segmentation was the imperfect MNI-to-native space non-linear transformation for small subcortical structures, such as the habenula (e.g., Supporting Information Figure S10). This imperfect transformation could place seed voxels outside of the individual habenula region, causing the initialization of our segmentation scheme to fail. In cases where the MNI-to-native space transformation successfully placed seed voxels inside the individual habenula, there is no significant difference between the semi- and fully-automated segmentation in our inter-site data (p > .05, unpaired t test), as expected. The universal success of the fully-automated segmentation in the intra-site HCP data was probably because our habenula template in the MNI space was generated using the healthy young adult HCP data from our previous study, hence relatively more representative of the HCP data, which suggests a scanner effect on the accuracy of MNI-to-native space habenula registration. Although the fully-automated habenula segmentation step requires more accurate MNI-to-native space subcortical registration (Osher, Saygin, Tobyne, & Somers, 2015) to achieve universal success, it can still be used as an efficient first pass processing step in large cohort studies and be complemented by the semi-automated segmentation for the failed cases during the visual inspection of the segmentation results.

5 CONCLUSIONS

We evaluated the reproducibility of human habenula segmentation methods on repeated 3T MRI measurements of the same subjects. Our proposed semi- and fully-automated segmentation showed consistent boundary definition (DC > 0.6, MD < 0.5 mm, HD < 3 mm) and reproducible volume measurement (COV ≤ 0.15) in both intra- and inter-site test-retest 3T data. The habenula boundary from the semi-automated segmentation of 3T data showed good agreement with habenula boundary from the manual segmentation of 7T data of the same subjects. Overall, our results indicate that the proposed semi-automated habenula segmentation has good reproducibility.

ACKNOWLEDGMENTS

We are grateful to Dr. Pedro Pasik (Mount Sinai Hospital, New York) for the myelin stained histology slice. We also thank Rima Fayad (Translational and Molecular Imaging Institute, Icahn School of Medicine at Mount Sinai), Jennifer Barret (Olin Neuropsychiatry Research Center, Institute of Living), and Nicole Santamauro (Yale University School of Medicine) for coordinating the inter-site MRI scans. Data collection and sharing for 1 mm isotropic resolution data analysis was provided by the Cambridge Centre for Ageing and Neuroscience (CamCAN). CamCAN funding was provided by the UK Biotechnology and Biological Sciences Research Council (grant number BB/H008217/1), together with support from the UK Medical Research Council and University of Cambridge, UK.

    CONFLICT OF INTERESTS

    Dr. Anticevic consults and is a member of the SAB for BlackThorn Therapeutics.

    Other authors declare no competing financial interests.
