Multi-modal robust inverse-consistent linear registration
Abstract
Registration performance can significantly deteriorate when image regions do not comply with model assumptions. Robust estimation improves registration accuracy by reducing or ignoring the contribution of voxels with large intensity differences, but existing approaches are limited to monomodal registration. In this work, we propose a robust and inverse-consistent technique for cross-modal, affine image registration. The algorithm is derived from a contextual framework of image registration. The key idea is to use a modality invariant representation of images based on local entropy estimation, and to incorporate a heteroskedastic noise model. This noise model allows us to draw the analogy to iteratively reweighted least squares estimation and to leverage existing weighting functions to account for differences in local information content in multimodal registration. Furthermore, we use the nonparametric windows density estimator to reliably calculate entropy of small image patches. Finally, we derive the Gauss–Newton update and show that it is equivalent to the efficient second-order minimization for the fully symmetric registration approach. We illustrate excellent performance of the proposed methods on datasets containing outliers for alignment of brain tumor, full head, and histology images.
INTRODUCTION
Tremendous progress in image registration has led to a vast number of new algorithms over the last decades. One of the important aspects for translating these algorithms from research to clinical practice is their robustness. Clinical images tend to frequently violate model assumptions that were incorporated in the derivation of the algorithm. Typical sources for such inconsistencies are imaging artifacts, noise, anatomical changes, and partial image information. Registration methods based on concepts from robust estimation have been proposed to address such issues by identifying outliers in the images and restricting their influence on the cost function [Reuter et al., 2010]. These techniques are targeted at minimizing weighted intensity differences and, therefore, cannot account for more complex intensity relationships, prevalent in multimodal images. However, multimodal registration could largely benefit from robust approaches because inconsistencies arise from the very nature of multimodal acquisitions: the depiction of complementary information. Structural information that is only available in selected image modalities thus adds to the inconsistencies already observed in monomodal registration and therefore highlights the need for robust multimodal registration.
Furthermore, several applications require the calculation of inverse-consistent (i.e., unbiased) transformations that are independent of the order of the two passed images. Usually, an asymmetric registration approach is taken, where only one of the images is transformed during registration and consequently only this image is resampled onto the grid of the fixed image. Even in global (rigid/affine) registration, this asymmetry in the resampling process introduces artifacts to only one of the images and has been shown to bias follow-up analyses [Fox et al., 2011; Reuter and Fischl, 2011; Thompson and Holland, 2011; Yushkevich et al., 2010]. Inverse-consistent approaches avoid this processing bias and have been a focus of research in registration [Avants et al., 2008; Christensen and Johnson, 2001; Reuter et al., 2010; Smith et al., 2002; Vercauteren et al., 2009]. A fully symmetric registration setup where both images are mapped into the mid-space avoids such problems and produces inverse-consistent results. However, existing symmetric approaches describe monomodal registration settings and focus mainly on nonlinear transformation models. The reliable and unbiased inclusion of multimodal image stacks into modern image processing pipelines therefore requires an inverse-consistent multimodal registration.
In this article, we address these requirements and introduce a robust, inverse-consistent, multimodal registration algorithm. The method is designed to produce a highly accurate affine alignment of images in the presence of potentially severe local outliers. We derive the new registration algorithm via a probabilistic framework that incorporates layers of latent random variables. The layers provide a structural representation of images, transforming the initial multimodal setup into a monomodal registration problem via localized entropy estimation [Wachinger and Navab, 2012a]. We incorporate a heteroskedastic Gaussian noise model for the similarity term that operates on the latent layers, allowing for spatially varying noise. This construction permits using techniques from robust estimation to model local noise variation via an iteratively reweighted least squares estimation. Moreover, we derive the Gauss–Newton method with symmetric transformation updates in the Lie group and show that this approach corresponds to an efficient second-order minimization (ESM) achieving a vastly improved convergence rate and frequency, as it uses gradient information from both images.
Applications
The first target application is the registration of multimodal, intrasubject, full head images with the objective to accurately align brain structures. As these images are collected very close in time (often within the same scan session), we expect only minimal changes in the brain between these multimodal image pairs, but potentially large differences in surrounding regions, for example, in soft tissue, different jaw, tongue, neck and eye placement, or different cropping planes due to subject motion. Applying deformable registration here is not meaningful as brain displacements are only rigid (or at most affine, depending on the acquisition). Nonlinear registration bears the risk of propagating unwanted deformations into the brain through regularization constraints. In full head registration, however, standard global alignment can fail because of the nonmatching structures and may terminate in local minima. A robust approach helps to overcome this problem by identifying nonmatching structures as outliers and by iteratively reducing their influence on the registration.
An alternative approach to this problem would be to align skull-stripped images and thus remove most of the local differences outside the brain. Unfortunately, brain extraction tools (BET) often work only on a single modality. Even when they can be applied to both images independently, they will lead to varying brain masks, as different parts of dura or skull may be included or removed by the algorithm. A robust registration (RR) approach is capable of identifying these regions and can produce highly accurate alignments of the brain independent of skull stripping errors. Common cross-modal registration procedures are often severely influenced by strong boundary edges and thus may drive the registration toward a good alignment of the boundary, sacrificing accuracy in the interior. Consequently, the proposed RR method can assist in pre-registering full head images for the simultaneous extraction of the brain in both modalities or to accurately align independently extracted brain images.
A second motivating application for our method is the alignment of brain tumor images. Tumors have a different appearance depending on the acquisition parameters and the usage of contrast agent (e.g., Gadolinium in MR angiography). The largely varying appearance can deteriorate the performance of common multimodal registration algorithms. Approaches based on robust statistics are useful in this context; they automatically discount those regions and recover the correct alignment based on the remainder of the image. Additionally, accurate alignment of full head images can also be beneficial for tumor images as the automatic brain extraction is very challenging in these cases. Common BETs use prior knowledge about intensity distributions to extract the brain. As the intensity values vary significantly in tumor regions, such methods fail frequently in the presence of pathology.
A final application is the mapping of histology images to optical coherence tomography (OCT) [Huang et al., 1991; Yaqoob et al., 2005] or to high-resolution MR images. Accurate registration with histology is necessary to validate the appearance of structures in OCT or MR based on their true appearance on the histological slices. Furthermore, accurate correspondence will be invaluable to guide development and validation of new algorithms for automatic detection of areal and laminar boundaries in the human brain, or of the extent of infiltrating tumors. Histology registration can be extremely challenging due to tears and deformations that occur during slicing, mounting, and staining. These artifacts are not present in the previously acquired OCT or MR images and give rise to complex registration problems that can be alleviated when using our proposed robust cross-modal algorithm.
Related Work
Stewart [1999] presents a general summary of robust parameter estimation in computer vision. Most RR techniques focus on monomodal registration. Nestares and Heeger [2000] propose a method based on M-estimators for the RR of images. Periaswamy and Farid [2006] introduce a method based on a local neighborhood and use the expectation-maximization algorithm to handle partial data. Reuter et al. [2010] introduce a robust, inverse-consistent registration approach, where both images are transformed into a mid space. Similar to [Nestares and Heeger, 2000], robust statistics are used to obtain accurate registration in the presence of temporal differences. This approach was shown to lead to superior results for the alignment of monomodal images acquired in the same session, as well as across time in longitudinal studies [Reuter et al., 2012].
In addition to related methods for monomodal registration, a few studies have reported results for robust multimodal registration. Itti et al. [1997] extract the brain surface and subsequently perform an iterative anisotropic chamfer matching. A hybrid registration approach, based on the extraction of the brain surface and intensity information is proposed in [Greve and Fischl, 2009]. Wong and Orchard [2009] use the residual of the local phase coherence representation for registration. This is similar to local phase mutual information (MI) [Mellor and Brady, 2005], where MI is calculated between local phase estimates from the image intensities for multimodal registration.
Here, we transform the multimodal registration problem to a monomodal one. Several structural representations have been proposed in the literature for this purpose. Andronache et al. [2008] recolor images, depending on the variance in the images. Other algorithms extract edges and ridges from the images [Maintz et al., 1996] or correlate image gradient directions [Haber and Modersitzki, 2007]. Wachinger and Navab [2012a] study the theoretical properties of structural representations and propose two approaches: an entropy representation, and Laplacian images with spectral decomposition for dimensionality reduction. Entropy images can be easily and quickly calculated via a dense grid of patches. While they bear similarities to gradient images, they better represent complex configurations such as triple junctions [Wachinger and Navab, 2012a]. Shechtman and Irani [2007] use a concept similar to the Laplacian images; however, they construct the self-similarity descriptor based only on local information. Heinrich et al. [2012] also use local self-similarity measures for multimodal registration, but without dimensionality reduction. With the length of the information vector corresponding to the size of the neighborhood, this approach is limited to small neighborhoods due to memory and computational complexity constraints.
In this study, we focus on affine and rigid transformation models with low degrees of freedom. Many applications require a highly accurate global registration, for example, to align different image modalities within a subject in close temporal proximity. Nonlinear registrations across subjects, or across time, are usually performed within-modality using a high-resolution image or full image stacks. Furthermore, nonlinear registrations rely on an accurate global alignment as a preprocessing step. There are a number of freely available registration software packages. The widely used registration tool FLIRT [Jenkinson et al., 2002], part of the FSL package [Smith et al., 2004], implements several intensity based cost functions such as sum of squared differences, correlation ratio, and MI. It also contains sophisticated optimization schemes to prevent the algorithms from being trapped in local minima. The freely available SPM software package [Ashburner and Friston, 1999] contains a registration tool based on [Collignon et al., 1995]. In our study, we compare against these two programs to evaluate the proposed registration method.
Outline
Robust Multimodal Registration section introduces our robust, inverse-consistent, multimodal registration method using a probabilistic framework of image registration. In Optimization section, we derive the steepest descent and Gauss–Newton optimization steps, with the latter one corresponding to the ESM for the proposed symmetric registration setup. The calculation of entropy images with the nonparametric (NP) windows estimator is described in Entropy Images with NP Windows section. Finally, we present experiments on several datasets in Experiments section.
ROBUST MULTIMODAL REGISTRATION
In this section, we detail our robust multimodal registration approach. First, we describe the parameterization of the transformations for the symmetric setup in Symmetric Transformation Parameterization section. Subsequently, we present a probabilistic model for robust multimodal registration based on entropy images in Probabilistic Model for RR section. Heteroskedastic Noise section introduces the spatially varying noise model that allows us to draw the analogy of the derived log-likelihood function to iteratively reweighted least squares estimation in Relationship to Iteratively Reweighted Least Squares section. This connection permits us to leverage existing robust estimators for the purpose of RR in Robust Estimators section.
Symmetric Transformation Parameterization
For RR, we are interested in transforming both images into an unbiased common space. This is achieved by a symmetric registration approach, mapping both images into the mid space [Avants et al., 2008]. The result is an inverse-consistent registration [Christensen and Johnson, 2001; Zeng and Chen, 2008]. This implies that the exact inverse transformation will be obtained by swapping the fixed and moving image. The common setup is to have one moving image $I_m$ and one fixed image $I_f$, together with a transformation $T$ that operates on the moving image, $I_m(T(x))$, for all spatial locations $x$ on the image grid $\Omega$. For the symmetric approach, we want to transform both images half way and, therefore, need to calculate the half transformation "$T^{1/2}$". As the space of most linear and nonlinear transformations is not a vector space, we use a parameterization of transformations that has a Lie group structure. This construction ensures that we stay within the transformation space after the parameter update and facilitates calculations, such as computing the inverse transformation, as required for the symmetric approach.

The exponential map

$$\exp: \mathfrak{g} \to G, \qquad \exp\Big(\sum_i p_i B_i\Big) = T,$$

assigns to an element of the Lie algebra $\mathfrak{g}$ that is dependent on the transformation parameters $p$ the corresponding transformation $T \in G$. For notational convenience, we define the transformation operator

$$x^p := \exp\Big(\sum_i p_i B_i\Big) \cdot x$$

that maps the location $x$ with the Lie algebra parameters $p$. This is a shorthand for creating a member of the Lie algebra with the basis elements $B_i$, applying the exponential map to obtain a transformation matrix, and using this matrix to map $x$.
Lie group parameterizations have been applied to rigid, affine, projective, and nonlinear registration [Arsigny et al., 2009; Benhimane and Malis, 2004; Vercauteren et al., 2009; Wachinger and Navab, 2009; Wachinger and Navab, 2013]. In this study, we work with rigid and affine transformations. While a standard parameterization for the rigid case exists [Zefran et al., 1998], different parameterizations have been proposed for affine transformations [Arsigny et al., 2009; Kaji et al., 2013; Kaji and Ochiai, 2014]. For the parameterization in [Arsigny et al., 2009], no closed form of the exponential map exists. The parameterization in [Kaji et al., 2013; Kaji and Ochiai, 2014] describes the transformation matrix as a product of two matrix exponentials, which complicates the computation of the half transform. A viable alternative in practice is to directly update the parameters of the transformation matrix, as gradient descent optimizations are unlikely to produce negative determinants when started from the identity. The half transform in this case can be directly computed via the matrix square root. Note that the matrix square root is equivalent to $\exp\big(\tfrac{1}{2}\log(T)\big)$ but computationally more efficient: it can be computed via a Schur decomposition, followed by Parlett's recurrence for the square root of the triangular factor.
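To make this concrete, the following sketch (not from the paper; numpy/scipy assumed, with an illustrative 2D affine parameterization) builds a transformation from Lie algebra parameters and verifies that the matrix square root agrees with $\exp(\tfrac{1}{2}\log(T))$:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def affine_from_params(p):
    """Illustrative 2D affine parameterization: six Lie algebra
    parameters mapped to a 3x3 homogeneous matrix via the exponential."""
    A = np.array([[p[0], p[1], p[4]],
                  [p[2], p[3], p[5]],
                  [0.0,  0.0,  0.0]])
    return expm(A)

p = np.array([0.05, -0.2, 0.1, 0.02, 3.0, -1.5])
T = affine_from_params(p)

# Half transform via the matrix square root (cheaper than expm(0.5*logm(T))).
T_half = sqrtm(T).real
assert np.allclose(T_half @ T_half, T)
assert np.allclose(T_half, expm(0.5 * logm(T)).real, atol=1e-6)

# Symmetric setup: the moving image is mapped by T_half,
# the fixed image by the inverse half transform.
T_half_inv = np.linalg.inv(T_half)
```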
Probabilistic Model for RR
Probabilistic modeling presents a mathematical framework to derive algorithms for image registration in which assumptions about the characteristics of the noise and signal are made explicit. In this work, we leverage techniques for robust estimation that require comparable intensity values, as in the monomodal registration setup. For this purpose, we use a probabilistic framework [Wachinger and Navab, 2012b] that introduced layers of latent random variables, called description layers. Figure 1 shows a schematic illustration of the framework in 2D. The description layer $D_m$ depends on $I_m$ and $D_f$ depends on $I_f$. In our study, we use the description layers to store a structural representation of images.

Figure 1. Contextual model for image registration illustrated in 2D. Image layers $I_m$ and $I_f$ consist of observed random variables. Description layers $D_m$ and $D_f$ are latent. Each descriptor variable at location $x$ depends on a local image neighborhood $\mathcal{N}_x$, which is of size 4 in the illustration. A one-to-one relationship exists between the description layers.
Heteroskedastic Noise

We model the relationship between the description layers with additive Gaussian noise whose variance varies spatially,

$$D_f(x) = D_m(x^p) + \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}(0, \sigma_x^2),$$

leading to the log-likelihood

$$\log p(D_f \mid D_m, p, \Sigma) = -\sum_{x \in \Omega} \left[ \frac{\log(2\pi\sigma_x^2)}{2} + \frac{\big(D_f(x) - D_m(x^p)\big)^2}{2\sigma_x^2} \right] \quad (3)$$

with the vector of variances $\Sigma = (\sigma_{x_1}^2, \ldots, \sigma_{x_{|\Omega|}}^2)$. The joint estimation of transformation parameters and variances poses a complex optimization problem. We use an iterative optimization procedure that alternates between optimizing the transformation parameters and the variances, as summarized in the following Algorithm 1:

Algorithm 1:
for i = 1, ..., maxIter do
  (I)  $\hat{\Sigma} \leftarrow \arg\max_{\Sigma}\, \log p(D_f \mid D_m, \hat{p}, \Sigma)$
  (II) $\hat{p} \leftarrow \arg\max_{p}\, \log p(D_f \mid D_m, p, \hat{\Sigma})$
end for

Initially, the transformation parameters are set to zero, corresponding to the identity transformation.¹ In step (I), the optimal variances $\hat{\Sigma}$ are calculated for the transformation of the last iteration $\hat{p}$. In step (II), the optimal transformation $\hat{p}$ is calculated for fixed variances $\hat{\Sigma}$. An important consequence of the decoupled optimization is that, when maximizing with respect to the transformation parameters in step (II) of Algorithm 1, only the second term of the log-likelihood in Eq. 3 is relevant.
To solve step (I), we draw the relationship to iteratively reweighted least squares estimation, for which a large number of robust estimators have been proposed. This allows us to estimate local variances even though we only have two intensity values at each location $x$.
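A minimal sketch of the alternation in Algorithm 1 (hypothetical callables standing in for the residual computation, the weight function of the following sections, and the transformation solver):

```python
import numpy as np

def alternate_registration(D_m, D_f, residual_fn, weight_fn, solve_fn, n_iter=20):
    """Algorithm 1 skeleton: alternate between (I) estimating per-voxel
    weights (inverse variances) for the current transform and (II)
    re-estimating the transform under fixed weights."""
    p = np.zeros(6)                      # identity transformation
    for _ in range(n_iter):
        r = residual_fn(D_m, D_f, p)     # differences of entropy images
        w = weight_fn(r)                 # step (I): w_x ~ 1 / sigma_x^2
        p = solve_fn(D_m, D_f, w, p)     # step (II): weighted estimation
    return p
```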
Relationship to Iteratively Reweighted Least Squares

Robust estimators replace the quadratic loss of least squares with a function $\rho$ that grows more slowly for large residuals, minimizing

$$\sum_{x \in \Omega} \rho(r_x), \qquad r_x = D_f(x) - D_m(x^p).$$

Setting the derivative with respect to the transformation parameters to zero yields the condition $\sum_{x} \rho'(r_x) \frac{\partial r_x}{\partial p} = 0$, which is also the normal equation of the weighted least squares problem $\sum_{x} w(r_x)\, r_x^2$ with weights $w(r) = \rho'(r)/r$ treated as fixed. Comparing this weighted least squares formulation to the maximization of the log-likelihood in Eq. 3 for fixed variances shows that the two coincide for $\sigma_x^2 \propto 1/w(r_x)$: locations with small weights correspond to locations with high noise variance. Step (I) of Algorithm 1 therefore reduces to evaluating a robust weight function on the current residuals, and the alternating optimization corresponds to iteratively reweighted least squares estimation.
Robust Estimators

A widely used robust estimator is Tukey's biweight function

$$\rho(r) = \begin{cases} \dfrac{c^2}{6}\left[1 - \left(1 - (r/c)^2\right)^3\right] & |r| \le c, \\[4pt] \dfrac{c^2}{6} & |r| > c, \end{cases}$$

which behaves like the quadratic function $\rho(r) = r^2/2$ for small residuals but saturates for residuals beyond $c$.
Figure 2 illustrates Tukey's biweight function as well as the quadratic function with the corresponding weights. The weight function $w(r) = \rho'(r)/r$ is the constant function $w(r) = 1$ for the quadratic function; for Tukey's biweight function it is $w(r) = \left(1 - (r/c)^2\right)^2$ for $|r| \le c$ and zero otherwise. Tukey's biweight (red curve) limits the influence of large residuals ($|r| > c$), for which the weights (red dashed curve) are zero (and the variance infinite). The regular least squares approach (blue curves) results in a constant weight function and variance, independent of the local residuals, and thus considers the contribution of all voxels equally. Using Tukey's estimator requires the saturation parameter $c$ to be set. A fixed value of $c = 4.685$ is recommended for unit Gaussian noise by [Holland and Welsch, 1977; Nestares and Heeger, 2000]. However, it may be more appropriate to specify the value depending on the characteristics of the input images. A method to automatically estimate the saturation parameter for each image pair in a preprocessing step on low-resolution images has been recommended in [Reuter et al., 2010]. The procedure allows for more outliers in the skull, jaw, and neck regions, where more variation is to be expected. We use this method to estimate the saturation parameter for entropy images automatically.

Figure 2. Illustration of Tukey's biweight function $\rho$ (red) and the quadratic function $\rho(r) = r^2/2$ (blue). The corresponding weight functions $w$ are shown as dashed lines. The weights are constant for the quadratic function; for Tukey's biweight, the weights are zero for $|r| > c$.
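For concreteness, a direct transcription of these two functions (Python/numpy; c defaults to the value quoted above):

```python
import numpy as np

def tukey_rho(r, c=4.685):
    """Tukey's biweight loss: quadratic-like near zero, constant for |r| > c."""
    r = np.asarray(r, dtype=float)
    rho = np.full(r.shape, c**2 / 6.0)
    inside = np.abs(r) <= c
    rho[inside] = c**2 / 6.0 * (1.0 - (1.0 - (r[inside] / c) ** 2) ** 3)
    return rho

def tukey_weight(r, c=4.685):
    """Corresponding weights w(r) = rho'(r)/r: (1-(r/c)^2)^2 inside, 0 outside."""
    r = np.asarray(r, dtype=float)
    w = np.zeros(r.shape)
    inside = np.abs(r) <= c
    w[inside] = (1.0 - (r[inside] / c) ** 2) ** 2
    return w

print(tukey_weight(np.array([0.0, 2.0, 5.0])))  # large residuals get weight 0
```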
OPTIMIZATION

For fixed variances, maximizing the log-likelihood in Eq. 3 with respect to the transformation parameters is equivalent to minimizing the weighted sum of squared differences between the half-way transformed description layers,

$$C(p) = \sum_{x \in \Omega} \frac{\left[D_m\big(x^{p/2}\big) - D_f\big(x^{-p/2}\big)\right]^2}{2\sigma_x^2}.$$

We consider two iterative schemes for this minimization, where the transformation parameters are updated by composition in the Lie group.
Steepest descent: The gradient of the cost function with respect to the transformation parameters is

$$\nabla C(p) = \sum_{x \in \Omega} \frac{r_x}{\sigma_x^2} \frac{\partial r_x}{\partial p}, \qquad r_x = D_m\big(x^{p/2}\big) - D_f\big(x^{-p/2}\big),$$

and the parameters are updated by a step along the negative gradient direction.
Gauss–Newton and Efficient Second-Order Minimization: The Gauss–Newton method approximates the Hessian from first derivatives of the residuals, yielding the parameter update

$$\Delta p = -\big(J^\top W J\big)^{-1} J^\top W r,$$

where $r$ is the vector of residuals, $W = \operatorname{diag}(1/\sigma_x^2)$ contains the weights, and $J$ is the Jacobian of the residuals with respect to the transformation parameters. In the symmetric setup, each row of the Jacobian contains the spatial gradients of both half-way transformed description layers,

$$J_x = \frac{1}{2}\left[\nabla D_m\big(x^{p/2}\big) + \nabla D_f\big(x^{-p/2}\big)\right] \frac{\partial x^p}{\partial p}.$$

As the first image is mapped exactly in the inverse direction of the second, the derivatives of the transformations only differ by a minus sign. Thus, the transformation derivative $\frac{\partial x^p}{\partial p}$ can be factored out. The resulting formula for the Jacobian is equivalent to the Jacobian of the ESM [Benhimane and Malis, 2004; Vercauteren et al., 2007; Wachinger and Navab, 2009, 2013]. For the fully symmetric registration setup, Gauss–Newton and ESM are therefore equivalent. A similar relationship for another symmetric formulation was independently discovered by [Lorenzi et al., 2013] in an analysis of the log-Demons algorithm.
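As an illustration of the update's structure (a simplified sketch, not the paper's implementation: pure 2D translation, so the transformation derivative is the identity; the description layers are assumed already resampled to the mid-space):

```python
import numpy as np

def esm_translation_update(Dm_half, Df_half, w):
    """One Gauss-Newton/ESM step for a 2D translation, given the two
    half-way transformed description layers and per-voxel weights w."""
    r = (Dm_half - Df_half).ravel()                     # residuals
    gy_m, gx_m = np.gradient(Dm_half)
    gy_f, gx_f = np.gradient(Df_half)
    # ESM: average the spatial gradients of both images.
    J = 0.5 * np.stack([(gx_m + gx_f).ravel(),
                        (gy_m + gy_f).ravel()], axis=1)  # |Omega| x 2
    W = w.ravel()
    H = J.T @ (W[:, None] * J)                           # weighted normal equations
    g = J.T @ (W * r)
    return -np.linalg.solve(H, g)                        # parameter update
```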


Additional Algorithmic Details
We use a multiresolution approach to estimate large displacements between the images [Roche et al., 1999]. The Gaussian pyramid is constructed by halving each dimension on each level until the image size is approximately $16^3$. This procedure results in five resolution levels for typical image sizes ($256^3$). The iterations at each level are terminated if the transformation update is below a specified threshold (0.01 mm) or a maximum number of iterations (5) is reached. The registration of the subsequent resolution level is initialized with the result of the previous one.
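A sketch of the pyramid construction described above (scipy assumed; smoothing parameters are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(vol, min_size=16):
    """Build a Gaussian pyramid of a 3D volume: smooth, then halve each
    dimension, until the coarsest level is about min_size voxels wide.
    For a 256^3 input this yields five levels (256, 128, 64, 32, 16)."""
    levels = [vol]
    while min(levels[-1].shape) >= 2 * min_size:
        smoothed = gaussian_filter(levels[-1], sigma=1.0)
        levels.append(smoothed[::2, ::2, ::2])
    return levels[::-1]                  # coarse-to-fine order

pyramid = gaussian_pyramid(np.random.rand(64, 64, 64))
print([level.shape for level in pyramid])  # coarse-to-fine shapes
```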

To compensate global intensity differences between the description layers, we include a symmetric exponential intensity scaling in the cost function,

$$C(p, s) = \sum_{x \in \Omega} \frac{\left[e^{s/2}\, D_m\big(x^{p/2}\big) - e^{-s/2}\, D_f\big(x^{-p/2}\big)\right]^2}{2\sigma_x^2},$$

where $s$ is the scaling parameter.² We choose an exponential intensity scaling, in contrast to the square root multiplication of $s$ in [Reuter et al., 2010], because it is symmetric (with respect to the parameter update) and improves the stability of the model.
ENTROPY IMAGES WITH NP WINDOWS

To obtain a structural representation, we compute the entropy of the image within a small patch around each location,

$$D(x) = H\big[I|_{\mathcal{N}_x}\big] = -\sum_{i \in \mathcal{I}} p_i \log p_i,$$

with the Shannon entropy $H$, $I|_{\mathcal{N}_x}$ the restriction of the image to the neighborhood $\mathcal{N}_x$, and $p_i$ the probability of intensity $i$ estimated within the patch. The calculation of the local entropy for all locations in the image leads to entropy images. Compared to MI, entropy images have advantages for the registration of images affected by bias fields [Wachinger and Navab, 2012a]. An essential step for the entropy calculation is the estimation of the intensity distribution within the patches $\mathcal{N}_x$. On the one hand, the patch size has to be large enough to provide enough samples for reliable density estimation. On the other hand, a large patch size leads to smoothing in the images. As we are interested in a highly accurate multimodal registration, we want to reduce the patch size as much as possible to allow for accurate localization and sharp structures.
One approach is to upsample the original image and to utilize the interpolated values for the density estimation. A recently proposed method of NP windows [Dowson et al., 2008] considers the asymptotic case by letting the number of interpolated samples go to infinity. The method involves constructing a continuous space representation of the discrete space signal using a suitable interpolation method. NP windows require only a small number of observed signal samples to estimate the density function and are completely data driven. Originally, Dowson et al. [2008] only presented closed-form solutions for certain 1D and 2D density estimation cases. Recently, Joshi et al. [2011] introduced a simplified version of NP windows estimation. Importantly, they also include closed-form solutions for 3D images. The density is estimated per cube in the volume, where each cube is divided into five tetrahedra. Finding the density estimate over the cube then reduces to finding densities for each of the tetrahedra. We use the NP windows density estimation for entropy images, leading to highly localized entropy images. Figure 3 shows entropy images for density estimation based on histograms and NP windows for different patch sizes. The entropy estimation with NP windows leads to a clear improvement, especially for small patch sizes.

Figure 3. Comparison of entropy images for patch sizes 3 × 3 × 3 to 7 × 7 × 7, created with density estimators based on histograms (a–c) and nonparametric (NP) windows (d–f). The appearance of the entropy images is smoother for larger patch sizes. Note the striking difference between the two estimation approaches for patches of size 3 (a vs. d).
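A simplified histogram-based sketch of the entropy image computation described above (illustrative patch size and bin count; the NP windows estimator replaces the histogram step):

```python
import numpy as np

def entropy_image(img, patch=3, bins=32):
    """Local Shannon entropy per voxel, using a histogram density estimate
    over the surrounding patch (borders are left at zero for simplicity)."""
    half = patch // 2
    out = np.zeros(img.shape, dtype=float)
    lo, hi = float(img.min()), float(img.max())
    for z in range(half, img.shape[0] - half):
        for y in range(half, img.shape[1] - half):
            for x in range(half, img.shape[2] - half):
                block = img[z - half:z + half + 1,
                            y - half:y + half + 1,
                            x - half:x + half + 1]
                counts, _ = np.histogram(block, bins=bins, range=(lo, hi))
                p = counts[counts > 0] / counts.sum()
                out[z, y, x] = -np.sum(p * np.log(p))
    return out

D = entropy_image(np.random.rand(16, 16, 16))
```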
EXPERIMENTS

We evaluate the proposed robust registration on several multimodal datasets. To quantify the difference between an estimated transformation and a reference transformation, we compute the root mean square (RMS) deviation over a sphere of radius $R$, as in [Reuter et al., 2010],

$$E_{\mathrm{RMS}} = \sqrt{\tfrac{1}{5} R^2 \operatorname{tr}\big(A^\top A\big) + t^\top t},$$

where $A$ is the deviation of the linear part of the difference transformation from the identity and $t$ is its translation. We calculate the RMS error on transformations in RAS (right, anterior, superior) coordinates, where the origin lies approximately in the center of the image, and choose $R$ large enough to define a sphere that includes the full brain. For example, an RMS error of 0.1 mm corresponds to 1/10 of a voxel displacement (for 1 mm isotropic images), which can easily be picked up in visual inspection.
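A sketch of this measure (the radius value of 100 mm is illustrative, not from the paper):

```python
import numpy as np

def rms_error(T1, T2, R=100.0):
    """RMS deviation between two 4x4 affine transforms over a sphere of
    radius R (mm) centered at the origin. R = 100 mm is an illustrative
    choice large enough to cover a full brain."""
    D = T1 @ np.linalg.inv(T2)      # difference transformation
    A = D[:3, :3] - np.eye(3)       # deviation of the linear part
    t = D[:3, 3]                    # translation component
    return np.sqrt(R**2 / 5.0 * np.trace(A.T @ A) + t @ t)
```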
Brain Registration with Ground Truth
In the first experiment, we work with simulated T1 and T2 images from the BrainWeb dataset [Cocosco et al., 1997], which are perfectly aligned. Knowing the ground truth transformations allows us to establish results on the accuracy of different registration procedures. The image resolution is $181 \times 217 \times 181$ with 1 mm isotropic spacing. We automatically skull-strip the T1 image with the BET [Smith, 2002], modify the brain mask (asymmetric and localized dilation), and apply it to the T2 image to induce variations in brain extraction between both modalities. We apply random transformations to the skull-stripped images with translations of 30 mm and rotations of 25° around an arbitrary axis. Different registration approaches are compared to determine how well they recover the correct registration; the statistical analysis of the RMS registration error for 100 repetitions is shown in Figure 4.

Figure 4. Statistical analysis of RMS errors for the skull-stripped T1–T2 registration study over 100 repetitions. Bars indicate mean error; error bars correspond to two standard deviations. *** indicates a significance level of $p < 0.001$. Robust registration with nonparametric windows (RR-NP) yields a significant reduction in registration error with respect to robust registration with histograms (RR), FLIRT, and SPM. Furthermore, setting the saturation parameter in Tukey's biweight function to a smaller value, which allows for more outliers, yields significantly better results.
For the robust registration with NP windows (RR-NP), we use a smaller patch size than for the robust registration with histograms (RR), which requires larger patches for reliable density estimation. We manually set the saturation parameter in Tukey's biweight function to simulate a robust approach that permits many outliers (small $c$) and a less robust approach that permits only few outliers (large $c$). We compare to popular and freely available registration software using MI: FLIRT in FSL and the coregistration tool in SPM. Our results show a significant reduction in registration error of our approach in contrast to these reference methods. Moreover, we observe a significant improvement for the density estimation with NP windows in comparison to histograms, highlighting the importance of the better localization with NP windows and confirming the qualitative results from Figure 3. Finally, the significant improvement of the more robust setting (small $c$) over the less robust one (large $c$) demonstrates the necessity of a robust approach to limit the influence of outliers on the registration. The creation of the entropy images took about 9 s with histograms and about 320 s with NP windows. The remaining registration took about 39 s.
Tumor Registration
In the second experiment, we evaluate registration accuracy based on a pair of real brain tumor MR T1 (magnetization-prepared rapid gradient-echo, MPRAGE [Mugler and Brookeman, 1990]) and MR T2 images, both acquired within the same session. The T2 image has an isotropic resolution of 1 mm. Figure 5 shows the pair of images that we use for the experiments with the corresponding entropy images. Both the automatic skull stripping tool in FreeSurfer and the BET [Smith, 2002] frequently fail for tumor images because of the different appearance of enhancing tumor compared to regular brain tissue, violating intensity priors used by the methods. To produce a useful result, we manually refined the brain extraction obtained from FreeSurfer on the T1 image and propagated the brain mask to the T2 image by registering the full head T1 and T2 images with FLIRT.

Figure 5. Coronal view of full head MR MPRAGE (a) and T2 (b) tumor images with corresponding entropy images (c,d).
When performing robust multimodal registration on the skull-stripped brain images, we expect the tumor regions to be marked as outliers because of their different appearance in both modalities. Figure 6 depicts the weights calculated with Tukey's biweight function (Robust Estimators section) at each spatial location after successful registration. Smaller weights are shown in red and yellow. Note that weights are inversely related to the variances. We observe that tumor regions are marked as outliers (yellow), as expected. Also note the decreased weight at the interface between the brain and the skull, which is likely caused by the inconsistent brain extraction (mask propagation) and differential distortion between the two modalities. These nonmatching structures in tumor regions and artifacts from the brain extraction, which can cause problems for nonrobust techniques, are correctly identified automatically by our method, and their influence is limited during the registration.

Figure 6. Coronal view of MR MPRAGE (a) and T2 (b) skull-stripped brain tumor images after registration. Estimated weights are overlaid as a heat map.
In the case of clinical tumor images, we do not know the ground truth alignment. To quantify the registration error, landmarks were manually selected in both modalities, permitting the computation of landmark distances after registration. Figure 7 shows the statistical analysis for 100 repetitions with random displacements of the images, similar to the previous experiment. We compare RR-NP with patch size 3 to FLIRT and SPM. The results indicate a significantly lower registration error for the proposed RR approach. The manual refinement of the brain extraction and the manual selection of landmarks make the extension of this study to many subjects complicated, but results on a large cohort are presented in the next section.

Figure 7. Statistical analysis of RMS errors for the tumor registration study over 100 repetitions. Bars indicate mean error; error bars correspond to two standard deviations. *** indicates a significance level of $p < 0.001$. Robust registration with NP windows (RR-NP) yields a significant reduction in registration error. The range of the y-axis is adjusted to highlight the differences between the methods.
Full Head Registration
In this experiment, we register T1 (multi-echo MPRAGE [Kouwe et al., 2008]) and T2 full head images of 106 healthy subjects. These images have a 1 mm isotropic resolution. Although each image pair was acquired within the same session, we expect local differences caused by jaw, tongue, and small head movements. Such local displacements can deteriorate the registration quality of full head scans. Figure 8 shows a registered pair of T1 and T2 images with an overlay of the estimated weights, together with a magnified view of the area around the tongue. Due to the movement of the tongue between the scans, the RR algorithm detects outliers in that region. Note that regions in the periphery of the brain also show low weights, which are caused by the different appearance of dura mater and CSF in T1- and T2-weighted images. The local information content differs in these regions, yielding differences between the entropy images. The benefit of the robust approach is to identify these contradictory regions as outliers and reduce their influence on the registration.

Figure 8. Sagittal view of MR T1 (a) and T2 (b) full head images with an overlay of the weights resulting from the robust registration. Magnifications of the tongue area (c,d), which is susceptible to motion. Areas with motion differences are assigned low weights to limit their influence on the registration. Regions in the periphery of the brain are also assigned low weights because of the different appearance of CSF and dura mater and the resulting dissimilarities in the entropy images.
Again, no ground truth alignment is available, and the number of scans in this study is too large for the manual identification of landmarks. Instead, we assume that the registration of skull-stripped images is more accurate than the registration of the full head images because most structures that are susceptible to patient motion are removed. As both scans were collected within the same session, we do not expect any changes in the brain. Local differences are mainly expected in the scalp, jaw, neck, and eye regions. It is, therefore, reasonable to assume higher registration accuracy when using the brain as a stable region of interest (ROI). Given a reference transformation $T_{\mathrm{ref}}$ computed on the skull-stripped images, we compute the RMS error $E_{\mathrm{Type}} = E_{\mathrm{RMS}}(T_{\mathrm{Type}}, T_{\mathrm{ref}})$, where $T_{\mathrm{Type}}$ is the transformation computed on the full head scans with the registration method Type that we want to evaluate. Figure 9 shows the difference in RMS error between the usage of FLIRT and SPM compared to RR-NP, that is, $E_{\mathrm{FLIRT}} - E_{\mathrm{RR\text{-}NP}}$ and $E_{\mathrm{SPM}} - E_{\mathrm{RR\text{-}NP}}$, where positive differences indicate a better approximation of the reference transformation by RR-NP. To avoid biasing results by selecting only one registration method to establish the reference transformation, we report results for using all three registration methods to compute $T_{\mathrm{ref}}$, shown on the x-axis of Figure 9. For each subject, we randomly transform the full head images (up to 15° and 20 mm) 10 times and perform the registration with each method, yielding 3180 full head registrations. The results show that RR-NP is significantly better in recovering the reference transformation established on the brain ROI, independent of which method was used to compute the reference.

Figure 9. Analysis of differences of RMS errors for the large control study on 106 subjects with 10 repeats. The RMS error is computed between full head registrations and the reference transformation, yielding $E_{\mathrm{Type}}$, where we use RR-NP, FLIRT, and SPM as registration Type on the full head scans. Differences of RMS errors $E_{\mathrm{FLIRT}} - E_{\mathrm{RR\text{-}NP}}$ and $E_{\mathrm{SPM}} - E_{\mathrm{RR\text{-}NP}}$ are plotted to compare the methods, where positive values show better performance of RR-NP. Bars indicate the mean difference error; error bars correspond to the standard error of the difference. *, **, and *** indicate significance levels from a paired t-test at $p < 0.05$, $p < 0.01$, and $p < 0.001$, respectively. To avoid biasing the results, we use each method (RR-NP, FLIRT, and SPM) to establish the reference transformation on the skull-stripped images (x-axis).
Histology-OCT Registration
In this multimodal experiment, we register images from OCT with histology slices. OCT is an optical microscopy technique, analogous to ultrasound imaging, which provides a resolution close to 1 µm. This technique requires neither staining nor dyes, as it relies on the intrinsic optical properties of the tissue. The block sample can be imaged prior to any cutting, which greatly reduces distortions in contrast to the histological protocol. Details on the technique and the acquisition of those images can be found in Magnain et al. [2014]. Histology–OCT alignment is of clinical importance to validate the appearance of structures in OCT based on their true appearance in the sliced tissue. The registration is challenging as histology images show artifacts due to local deformations and tearing of the tissue during slicing, which are not present in the previously acquired OCT images. Figure 10 shows the result of the registration between an OCT and a histology slice, as well as the estimated weights. The magnified view illustrates the differences between both images and shows that tears in the histology slices are correctly marked as outliers. Figure 11 shows another set of histology and OCT images with the corresponding entropy images. The gyrus was cropped from larger OCT and histology images independently, which introduces small differences at the top boundary edge due to slightly different cropping angles. The weight map shows that these differences, together with internal cracks, are marked as outliers by the RR.

Figure 10. OCT (first column) and histology (second column) slices together with an overlay of weights on the histology slices (third column). The top row shows the entire slices, while the bottom row shows a magnification around the large crack in the center of the images. Tears in the histology image and alignment errors at the boundary (caused by nonlinear deformations) are marked as outliers.

Figure 11. OCT (a) and histology (b) slices with corresponding entropy images (c,d). Estimated weights are shown on the OCT slice (e). Outliers are found in cracks of the histology slices and around the boundary.
To confirm the promising results from the qualitative analysis, we identified 26 landmark correspondences in four histology images and the corresponding OCT images. We use the transformation from image registration to map the landmarks from the OCT to the histology domain. The average Euclidean distance between mapped OCT and histology landmark pairs measures the registration error. Figure 12 shows the mean and standard error for the RR, for FSL, and for an alternative 2D affine registration method that is based on the SPM algorithm (also using MI and a Powell minimizer). The original SPM registration method is available only for the full 3D case and does not model 2D affine transformations. Because of the high image resolution in this experiment, we select a larger patch size and the histogram-based density estimation for constructing the entropy images in the RR method. The results from the quantitative analysis confirm the improved registration accuracy of the robust multimodal registration.

Figure 12. Registration error for histology–OCT alignment, measured as landmark distance in microns. Results are shown for different registration methods. Bars indicate mean error; error bars correspond to the standard error. The range of the y-axis is selected to highlight the differences between the methods.
Bias Field Robustness
Intensity bias fields in images can severely affect cost functions in cross-modal registration, such as MI [Myronenko and Song, 2010]. In contrast, the presented approach is very robust with respect to bias fields, as it is based on local entropy estimation. To demonstrate robustness with respect to changes in the bias field, we use five different cases from the BrainWeb dataset [Cocosco et al., 1997] and increase the size of this test set by simulating brain tumors [Prastawa et al., 2009] (including contrast enhancement, local distortion of healthy tissue, infiltrating edema adjacent to tumors, and destruction and deformation of fiber tracts). For each of the five original cases, tumor seeds were placed at two different locations and two different parameter settings were selected to vary the characteristics of the tumor (conservative vs. aggressive growth), yielding 20 tumor cases. Each simulation produced aligned T1, T2, and T1 Gadolinium-enhanced (T1Gad) tumor images (1 mm isotropic), to which we added noise and bias fields. We apply random rigid transformations with translations of 30 mm and rotations of 25° around an arbitrary axis, with the center of rotation corresponding to the image center, and measure how accurately each test method can estimate the true transformation. We perform the RR for a range of patch sizes and again compare to FLIRT and SPM. Figure 13 shows the RMS registration errors for the different cross-modal registration pairs. It can be seen that the bias field does not affect the RR approach. SPM performs better than FLIRT, but for both methods registration fails completely in a large number of cases. Without adding a bias field, none of the methods produce any of these severe registration failures (not shown). Bias field correction can, therefore, be expected to remove most of the intensity bias and the resulting registration problems, but this additional processing step is not required for our robust entropy-based cross-modal registration approach.

Figure 13. Multimodal registration of 20 synthetic brain tumor images (T1, T2, T1Gad) with bias field: comparison of RMS errors for different registration methods. Bars indicate mean error; error bars correspond to two standard deviations. White discs correspond to individual data points. The proposed methods (RR, RR-NP) are robust to the bias field due to the local entropy estimation, while the bias field has a strong influence on FLIRT and SPM.
DISCUSSION AND CONCLUSION
We presented a novel registration approach for inverse-consistent, multimodal registration that is robust with respect to local differences and with respect to bias fields. To achieve outlier robustness, we incorporated a heteroskedastic noise model and established the relationship to iteratively reweighted least squares estimation. We derived the Gauss–Newton optimization, which we showed to be equivalent to the ESM in the case of our robust and inverse-consistent registration setup. To allow for better localization of structures when constructing entropy images, we used a NP density estimator and demonstrated its advantages. We evaluated our method on different multimodal datasets and demonstrated increased accuracy and robustness. This work focuses on global registration, and it remains to investigate the performance and feasibility of the proposed robust multimodal approach for nonlinear registration. One concern is that locations with large differences may be marked as outliers and therefore produce no force on the deformation field, although they could potentially be correctly aligned in subsequent steps. However, deformation fields are never estimated only locally. Regularizers and parametric models combine the forces from several locations. This combination may still push the deformation field in the correct direction, in spite of a reduced weight at certain locations. Adjustments to the robust estimation approach may be required, for instance, the regularization of weights via local smoothing.
The presented registration method will be made freely available within the FreeSurfer software package.
ACKNOWLEDGMENTS
We are very thankful to Niranjan Joshi and Timor Kadir for providing Matlab code of the NP windows estimator. We are very grateful to the anonymous reviewers for their insightful comments. BF has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. BF's interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.
FOOTNOTES
- ¹ An alternative would be to initialize the transformation based on raw image moments, as described in [Reuter et al., 2010].
- ² We consider separate noise for the fixed and moving image, which are subject to the intensity scaling. Since a linear combination of mutually independent normal random variables is again normally distributed, we summarize it into a global noise term.