Volume 31, Issue 6 e2575
RESEARCH ARTICLE
Open Access

Memory-efficient compression of 𝒟ℋ²-matrices for high-frequency Helmholtz problems

Steffen Börm and Janne Henningsen

Mathematisches Seminar, Christian-Albrechts-Universität zu Kiel, Germany

Correspondence: Steffen Börm, Mathematisches Seminar, Christian-Albrechts-Universität zu Kiel, Germany.

Email: [email protected]
First published: 04 July 2024

Abstract

Directional interpolation is a fast and efficient compression technique for high-frequency Helmholtz boundary integral equations, but requires a very large amount of storage in its original form. Algebraic recompression can significantly reduce the storage requirements and speed up the solution process accordingly. During the recompression process, weight matrices are required to correctly measure the influence of different basis vectors on the final result, and for highly accurate approximations, these weight matrices require more storage than the final compressed matrix. We present a compression method for the weight matrices and demonstrate that it introduces only a controllable error to the overall approximation. Numerical experiments show that the new method leads to a significant reduction in storage requirements.

1 INTRODUCTION

We consider boundary element discretizations of the Helmholtz equation
$$ -\Delta u - \kappa^2 u = 0 $$
with the wave number $$ \kappa \in \mathbb{R} $$ on a domain $$ \Omega \subseteq \mathbb{R}^3 $$. Using the fundamental solution
$$ g(x,y) = \frac{\exp(\iota\kappa\|x-y\|)}{4\pi\|x-y\|} \qquad \text{for all } x,y \in \mathbb{R}^3,\ x \ne y, $$ (1)
the boundary integral formulation leads to an equation of the form
$$ \int_{\partial\Omega} g(x,y)\, \frac{\partial u}{\partial n}(y)\, dy = \frac{1}{2} u(x) + \int_{\partial\Omega} \frac{\partial g}{\partial n_y}(x,y)\, u(y)\, dy \qquad \text{for all } x \in \partial\Omega $$ (2)
that allows us to compute the Neumann boundary values u n $$ \frac{\partial u}{\partial n} $$ on Ω $$ \mathrm{\partial \Omega } $$ from the Dirichlet boundary values. Once we know both, the solution u $$ u $$ can be evaluated anywhere in the domain Ω $$ \Omega $$ .
In order to solve the integral equation (2), we employ a Galerkin discretization: the unknown Neumann values are approximated by a boundary element basis $$ (\varphi_i)_{i\in\mathcal{I}} $$ and the Dirichlet values by another, possibly different, basis $$ (\psi_j)_{j\in\mathcal{J}} $$. The discretization replaces the integral operators by matrices $$ G \in \mathbb{C}^{\mathcal{I}\times\mathcal{I}} $$ and $$ K \in \mathbb{C}^{\mathcal{I}\times\mathcal{J}} $$ given by
$$ g_{ij} := \int_{\partial\Omega} \varphi_i(x) \int_{\partial\Omega} g(x,y)\, \varphi_j(y)\, dy\, dx \quad \text{for all } i,j \in \mathcal{I}, \qquad k_{ij} := \int_{\partial\Omega} \varphi_i(x) \int_{\partial\Omega} \frac{\partial g}{\partial n_y}(x,y)\, \psi_j(y)\, dy\, dx \quad \text{for all } i \in \mathcal{I},\ j \in \mathcal{J}. $$
Both matrices are densely populated in general, and having to store them explicitly would severely limit the resolution and therefore the accuracy of the approximation. Fast multipole methods1-3 employ a special approximation of the kernel function that can be evaluated efficiently using FFT-like algorithms to significantly improve the speed of the matrix-vector product.

Local low-rank approximations offer an attractive solution without the need for special expansion: if the kernel function can be approximated by a tensor product, the corresponding part of the matrix can be approximated by a low-rank matrix, and keeping this matrix in factorized form will significantly reduce the storage requirements.

Directional interpolation4-9 offers a particularly convenient approach: the kernel function is split into a plane wave and a smooth remainder, and interpolation of the remainder yields a tensor-product approximation of the kernel function. Advantages of this approach include ease of implementation and very robust convergence properties. A major disadvantage is the large amount of storage required by this approximation.

Fortunately, this disadvantage can be overcome by combining the analytical approximation with an algebraic recompression10, 11 to significantly reduce the storage requirements and improve the speed of matrix-vector multiplications at the expense of some additional work in the setup phase. In order to guarantee the quality of the recompression, the algorithm relies on weight matrices that describe how “important” different basis vectors are for the final approximation.

If we are considering problems with high wave numbers and high resolutions, these weight matrices may require far more storage than the final result of the compression, that is, we may run out of storage even if the final result would fit a given computer system.

To solve this problem, we present an algorithm for replacing the exact weight matrices by compressed weight matrices. The key challenge is to ensure that the additional errors introduced by this procedure can be controlled and do not significantly reduce the accuracy of the final result of the computation.

The following Section 2 introduces the structure of 𝒟ℋ²-matrices used to represent operators for high-frequency Helmholtz boundary integral equations. Section 3 shows how algebraic compression can be applied to significantly reduce the storage requirements of 𝒟ℋ²-matrices. Section 4 is focused on deriving error estimates for the compression. In Section 5, we introduce an algorithm for approximating the weight matrices while preserving the necessary accuracy. Section 6 contains numerical results indicating that the new method preserves the convergence rates of the underlying Galerkin discretization.

2 𝒟ℋ²-MATRICES

Integral operators with smooth kernel functions can be handled efficiently by applying interpolation to the kernel function, since this gives rise to a low-rank approximation.

The kernel function of the high-frequency Helmholtz equation is oscillatory and therefore not well-suited for interpolation. The idea of directional interpolation7-9 is to split the kernel function into an oscillatory part that can be approximated by a plane wave and a smooth part that can be approximated by interpolation.

2.1 Directional interpolation

To illustrate this approach, we approximate the oscillatory part exp ( ι κ x y ) $$ \exp \left(\iota \kappa \left\Vert x-y\right\Vert \right) $$ for x τ $$ x\in \tau $$ , y σ $$ y\in \sigma $$ , where τ , σ 3 $$ \tau, \sigma \subseteq {\mathbb{R}}^3 $$ are star-shaped subsets with respect to centers x τ τ $$ {x}_{\tau}\in \tau $$ and y σ σ $$ {y}_{\sigma}\in \sigma $$ . A Taylor expansion of z : = x y $$ z:= x-y $$ around z 0 : = x τ y σ $$ {z}_0:= {x}_{\tau }-{y}_{\sigma } $$ yields
κ z = κ z 0 + κ z 0 z 0 , z z 0 + κ 0 1 ( 1 t ) sin 2 ( z z 0 , z 0 + t ( z z 0 ) ) z z 0 2 z 0 + t ( z z 0 ) d t . $$ \kappa \left\Vert z\right\Vert \kern0.5em =\kappa \left\Vert {z}_0\right\Vert +\kappa \left\langle \frac{z_0}{\left\Vert {z}_0\right\Vert },z-{z}_0\right\rangle +\kappa {\int}_0^1\left(1-t\right){\sin}^2\angle \left(z-{z}_0,{z}_0+t\left(z-{z}_0\right)\right)\frac{{\left\Vert z-{z}_0\right\Vert}^2}{\left\Vert {z}_0+t\left(z-{z}_0\right)\right\Vert}\kern0.3em dt. $$
Inserted into the exponential function, the first two terms on the right-hand side correspond to a plane wave. In order to ensure that this plane wave is a reasonably good approximation of the spherical wave appearing in the kernel function, we have to bound the integral term. Using the diameter and distance given by
diam ( τ ) : = max { x 1 x 2 : x 1 , x 2 τ } , dist ( τ , σ ) : = min { x y : x τ , y σ } , $$ {\displaystyle \begin{array}{ll}\hfill \mathrm{diam}\left(\tau \right)& := \max \left\{\left\Vert {x}_1-{x}_2\right\Vert \kern0.3em :\kern0.3em {x}_1,{x}_2\in \tau \right\},\\ {}\hfill \mathrm{dist}\left(\tau, \sigma \right)& := \min \left\{\left\Vert x-y\right\Vert \kern0.3em :\kern0.3em x\in \tau, \kern0.3em y\in \sigma \right\},\end{array}} $$
the third term is bounded if
$$ \kappa \max\{\operatorname{diam}(\tau)^2, \operatorname{diam}(\sigma)^2\} \le \eta_3 \operatorname{dist}(\tau,\sigma) $$ (3a)
holds with a suitable parameter η 3 > 0 $$ {\eta}_3\in {\mathbb{R}}_{>0} $$ . In terms of our kernel function, this means that we can approximate the spherical wave exp ( ι κ x y ) $$ \exp \left(\iota \kappa \left\Vert x-y\right\Vert \right) $$ by the plane wave traveling in direction z 0 $$ {z}_0 $$ .
Since z 0 $$ {z}_0 $$ depends on x τ $$ {x}_{\tau } $$ and y σ $$ {y}_{\sigma } $$ , we would have to use different directions for every pair ( τ , σ ) $$ \left(\tau, \sigma \right) $$ of subdomains, and this would make the approximation too expensive. To keep the number of directions under control, we restrict ourselves to a fixed set 𝒟 of unit vectors and approximate z 0 / z 0 $$ {z}_0/\left\Vert {z}_0\right\Vert $$ by an element c 𝒟 . If we can ensure
$$ \kappa \left\| \frac{x_\tau - y_\sigma}{\|x_\tau - y_\sigma\|} - c \right\| \max\{\operatorname{diam}(\tau), \operatorname{diam}(\sigma)\} \le \eta_2 $$ (3b)
with a parameter η 2 > 0 $$ {\eta}_2\in {\mathbb{R}}_{>0} $$ , the spherical wave divided by the plane wave
exp ( ι κ x y ) exp ( ι κ c , x y ) = exp ( ι κ ( x y c , x y ) ) $$ \frac{\exp \left(\iota \kappa \left\Vert x-y\right\Vert \right)}{\exp \left(\iota \kappa \left\langle c,x-y\right\rangle \right)}=\exp \left(\iota \kappa \left(\left\Vert x-y\right\Vert -\left\langle c,x-y\right\rangle \right)\right) $$
will still be sufficiently smooth, and the modified kernel function
g c ( x , y ) : = exp ( ι κ ( x y c , x y ) ) 4 π x y for all x τ , y σ $$ {g}_c\left(x,y\right):= \frac{\exp \left(\iota \kappa \left(\left\Vert x-y\right\Vert -\left\langle c,x-y\right\rangle \right)\right)}{4\pi \left\Vert x-y\right\Vert}\kern2.3em \mathrm{for}\ \mathrm{all}\kern0.3em x\in \tau, \kern0.3em y\in \sigma $$
will no longer be oscillatory. If the domains τ $$ \tau $$ and σ $$ \sigma $$ are small compared to the wave length, (3b) allows us to choose c = 0 $$ c=0 $$ and g c = g $$ {g}_c=g $$ to improve the efficiency of our implementation. In order to interpolate this function, we also have to keep its denominator under control. This can be accomplished by requiring
$$ \max\{\operatorname{diam}(\tau), \operatorname{diam}(\sigma)\} \le \eta_1 \operatorname{dist}(\tau,\sigma). $$ (3c)
If the three admissibility conditions (3a), (3b), and (3c) hold, standard tensor interpolation of g c $$ {g}_c $$ converges at a robust rate.6, 7 To summarize, (3a) allows us to approximate the spherical wave by a plane wave, (3b) allows us to limit the number of directions this plane wave can travel in, and (3c) ensures that we are sufficiently far away from the singularity to be able to approximate 1 / z $$ 1/\left\Vert z\right\Vert $$ by a polynomial.
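For concreteness, the following Python sketch (all names are our own illustration, not part of the method's description) evaluates the three conditions (3a), (3b), and (3c) for a pair of axis-parallel bounding boxes, each given by its lower and upper corner, and a candidate direction c.

import numpy as np

def diam(box):
    # diameter of an axis-parallel box given as (lower corner, upper corner)
    lo, hi = box
    return float(np.linalg.norm(hi - lo))

def dist(box_t, box_s):
    # Euclidean distance between two axis-parallel boxes
    (lo_t, hi_t), (lo_s, hi_s) = box_t, box_s
    gap = np.maximum(0.0, np.maximum(lo_t - hi_s, lo_s - hi_t))
    return float(np.linalg.norm(gap))

def is_admissible(box_t, box_s, kappa, c, eta1, eta2, eta3):
    # directional admissibility conditions (3a), (3b), and (3c)
    dt, ds, dd = diam(box_t), diam(box_s), dist(box_t, box_s)
    if dd == 0.0:
        return False          # touching or overlapping boxes are never admissible
    z0 = 0.5 * (box_t[0] + box_t[1]) - 0.5 * (box_s[0] + box_s[1])
    cond_a = kappa * max(dt, ds) ** 2 <= eta3 * dd                                       # (3a)
    cond_b = kappa * np.linalg.norm(z0 / np.linalg.norm(z0) - c) * max(dt, ds) <= eta2   # (3b)
    cond_c = max(dt, ds) <= eta1 * dd                                                    # (3c)
    return cond_a and cond_b and cond_c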
We choose interpolation points ( ξ τ , ν ) ν = 1 k $$ {\left({\xi}_{\tau, \nu}\right)}_{\nu =1}^k $$ with corresponding Lagrange polynomials ( τ , ν ) ν = 1 k $$ {\left({\ell}_{\tau, \nu}\right)}_{\nu =1}^k $$ in the subdomain τ $$ \tau $$ and interpolation points ( ξ σ , μ ) μ = 1 k $$ {\left({\xi}_{\sigma, \mu}\right)}_{\mu =1}^k $$ with corresponding Lagrange polynomials ( σ , μ ) μ = 1 k $$ {\left({\ell}_{\sigma, \mu}\right)}_{\mu =1}^k $$ in the subdomain σ $$ \sigma $$ and approximate g c $$ {g}_c $$ by the interpolating polynomial
g ˜ τ σ c ( x , y ) : = ν = 1 k μ = 1 k g c ( ξ τ , ν , ξ σ , μ ) τ , ν ( x ) σ , μ ( y ) . $$ {\tilde{g}}_{\tau \sigma c}\left(x,y\right):= \sum \limits_{\nu =1}^k\sum \limits_{\mu =1}^k{g}_c\left({\xi}_{\tau, \nu },{\xi}_{\sigma, \mu}\right){\ell}_{\tau, \nu }(x){\ell}_{\sigma, \mu }(y). $$
In order to obtain an approximation of the original kernel function g $$ g $$ , we have to multiply g c $$ {g}_c $$ by the plane wave exp ( ι κ c , x y ) $$ \exp \left(\iota \kappa \left\langle c,x-y\right\rangle \right) $$ and get
g ˜ τ σ ( x , y ) = ν = 1 k μ = 1 k g c ( ξ τ , ν , ξ σ , μ ) τ c , ν ( x ) σ c , μ ( y ) $$ {\tilde{g}}_{\tau \sigma}\left(x,y\right)=\sum \limits_{\nu =1}^k\sum \limits_{\mu =1}^k{g}_c\left({\xi}_{\tau, \nu },{\xi}_{\sigma, \mu}\right){\ell}_{\tau c,\nu }(x)\overline{\ell_{\sigma c,\mu }(y)} $$
with the modified Lagrange functions
τ c , ν ( x ) = exp ( ι κ c , x ) τ , ν ( x ) , σ c , μ ( y ) = exp ( ι κ c , y ) σ , μ ( y ) , $$ {\ell}_{\tau c,\nu }(x)=\exp \left(\iota \kappa \left\langle c,x\right\rangle \right){\ell}_{\tau, \nu }(x),\kern1em {\ell}_{\sigma c,\mu }(y)=\exp \left(\iota \kappa \left\langle c,y\right\rangle \right){\ell}_{\sigma, \mu }(y), $$
where we exploit
σ c , μ ( y ) = exp ( ι κ c , y ) σ , μ ( y ) = exp ( ι κ c , y ) σ , μ ( y ) for all y 3 , μ [ 1 : k ] . $$ \overline{\ell_{\sigma c,\mu }(y)}=\overline{\exp \left(\iota \kappa \left\langle c,y\right\rangle \right)}{\ell}_{\sigma, \mu }(y)=\exp \left(-\iota \kappa \left\langle c,y\right\rangle \right){\ell}_{\sigma, \mu }(y)\kern2.3em \mathrm{for}\ \mathrm{all}\kern0.3em y\in {\mathbb{R}}^3,\kern0.3em \mu \in \left[1:k\right]. $$
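As an illustration, the following sketch (our own notation, assuming tensor Chebyshev interpolation points and a fixed direction c) assembles the coupling matrix with the entries $$ g_c(\xi_{\tau,\nu}, \xi_{\sigma,\mu}) $$ for one pair of boxes; the modified Lagrange functions themselves only enter once the basis matrices are assembled by quadrature.

import numpy as np

def chebyshev_points(box, m):
    # tensor grid of k = m^3 Chebyshev points in an axis-parallel box
    lo, hi = box
    t = np.cos((2.0 * np.arange(m) + 1.0) * np.pi / (2.0 * m))   # nodes on [-1, 1]
    axes = [0.5 * (lo[d] + hi[d]) + 0.5 * (hi[d] - lo[d]) * t for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

def coupling_matrix(box_t, box_s, kappa, c, m):
    # entries g_c(xi_tau_nu, xi_sigma_mu) of the modified kernel for one block
    xi_t = chebyshev_points(box_t, m)
    xi_s = chebyshev_points(box_s, m)
    diff = xi_t[:, None, :] - xi_s[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    phase = r - diff @ c                          # ||x - y|| - <c, x - y>
    return np.exp(1j * kappa * phase) / (4.0 * np.pi * r)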

2.2 𝒟ℋ²-matrices

To obtain an approximation of the entire matrix $$ G $$, we have to partition its index set $$ \mathcal{I}\times\mathcal{I} $$ into subsets where our approximation can be used. Here, $$ \mathcal{I} $$ denotes the index set of the boundary element basis functions $$ (\varphi_i)_{i\in\mathcal{I}} $$ introduced before. We focus here on the matrix $$ G \in \mathbb{C}^{\mathcal{I}\times\mathcal{I}} $$ for the sake of simplicity, but note that the entire construction can be extended to handle the rectangular matrix $$ K \in \mathbb{C}^{\mathcal{I}\times\mathcal{J}} $$ in a very similar way.

Definition 1. (cluster tree) Let $$ \mathcal{T} $$ be a finite tree, and let each of its nodes $$ t \in \mathcal{T} $$ be associated with a subset $$ \hat{t} \subseteq \mathcal{I} $$.

$$ \mathcal{T} $$ is called a cluster tree for the index set $$ \mathcal{I} $$ if

  • the root $$ r = \operatorname{root}(\mathcal{T}) $$ is associated with $$ \hat{r} = \mathcal{I} $$,
  • for all $$ t \in \mathcal{T} $$ with children, we have
    $$ \hat{t} = \bigcup_{t' \in \operatorname{chil}(t)} \hat{t}'. $$
  • for all $$ t \in \mathcal{T} $$ and $$ t_1, t_2 \in \operatorname{chil}(t) $$, we have
    $$ t_1 \ne t_2 \Rightarrow \hat{t}_1 \cap \hat{t}_2 = \varnothing. $$

A cluster tree for $$ \mathcal{I} $$ is usually denoted by $$ \mathcal{T}_{\mathcal{I}} $$. Its leaves are denoted by $$ \mathcal{L}_{\mathcal{I}} := \{ t \in \mathcal{T}_{\mathcal{I}} : \operatorname{chil}(t) = \varnothing \} $$.

A cluster tree provides us with a hierarchy of subsets of the index set $$ \mathcal{I} $$ , and its leaves define a disjoint partition of $$ \mathcal{I} $$ . In order to define approximations for the matrix G $$ G $$ , we require a similar tree structure with subsets of × $$ \mathcal{I}\times \mathcal{I} $$ .
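A minimal sketch of such a hierarchy (our own construction, assuming each basis function is represented by a single characteristic point and splitting bounding boxes along their longest edge) could look as follows; the children's index sets are disjoint by construction and their union is the parent's index set, so the conditions of Definition 1 hold.

import numpy as np

class Cluster:
    # one node t of a cluster tree: index set \hat{t}, bounding box, children
    def __init__(self, indices, points, leaf_size=32):
        self.indices = indices
        pts = points[indices]
        self.box = (pts.min(axis=0), pts.max(axis=0))
        self.children = []
        if len(indices) > leaf_size:
            d = int(np.argmax(self.box[1] - self.box[0]))       # longest box edge
            mid = 0.5 * (self.box[0][d] + self.box[1][d])
            left = indices[points[indices, d] <= mid]
            right = indices[points[indices, d] > mid]
            if len(left) > 0 and len(right) > 0:
                self.children = [Cluster(left, points, leaf_size),
                                 Cluster(right, points, leaf_size)]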

Definition 2. (block tree) Let $$ \mathcal{T} $$ be a finite tree. It is called a block tree for the cluster tree $$ \mathcal{T}_{\mathcal{I}} $$ if

  • for all $$ b \in \mathcal{T} $$ there are $$ t, s \in \mathcal{T}_{\mathcal{I}} $$ with $$ b = (t,s) $$,
  • the root $$ r = \operatorname{root}(\mathcal{T}) $$ is given by $$ r = (\operatorname{root}(\mathcal{T}_{\mathcal{I}}), \operatorname{root}(\mathcal{T}_{\mathcal{I}})) $$,
  • for all $$ b = (t,s) \in \mathcal{T} $$ with $$ \operatorname{chil}(b) \ne \varnothing $$ we have
    $$ \operatorname{chil}(b) = \operatorname{chil}(t) \times \operatorname{chil}(s). $$

A block tree for $$ \mathcal{T}_{\mathcal{I}} $$ is usually denoted by $$ \mathcal{T}_{\mathcal{I}\times\mathcal{I}} $$. Its leaves are denoted by $$ \mathcal{L}_{\mathcal{I}\times\mathcal{I}} := \{ b \in \mathcal{T}_{\mathcal{I}\times\mathcal{I}} : \operatorname{chil}(b) = \varnothing \} $$.

Leaves satisfying (3) are called admissible and collected in the set $$ \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+} \subseteq \mathcal{L}_{\mathcal{I}\times\mathcal{I}} $$. Its complement is the set of inadmissible leaves $$ \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-} := \mathcal{L}_{\mathcal{I}\times\mathcal{I}} \setminus \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+} $$.

The definition implies that a block tree 𝒯 × for 𝒯 is indeed a cluster tree for the index set × $$ \mathcal{I}\times \mathcal{I} $$ , and therefore the leaves × $$ {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}} $$ of a block tree describe a disjoint partition of × $$ \mathcal{I}\times \mathcal{I} $$ , that is, a decomposition of G $$ G $$ into submatrices G | t ^ × ŝ $$ {\left.G\right|}_{\hat{t}\times \hat{s}} $$ for all b = ( t , s ) × $$ b=\left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}} $$ .

We expect to be able to efficiently approximate those blocks that satisfy the admissibility conditions (3). In order to keep the number of blocks small, we split only those blocks that do not satisfy these conditions. This approach ensures that we obtain the minimal number of blocks and therefore the most efficient representation.
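In code, this top-down construction can be sketched as a short recursion (using the Cluster class and an admissibility predicate like the is_admissible sketch above; both are our own illustrations): a block is split only if it is inadmissible and both of its clusters still have children.

def build_blocks(t, s, admissible, leaves):
    # recursive construction of the leaves of a block tree
    if admissible(t, s):
        leaves.append((t, s, True))       # admissible leaf: low-rank representation
    elif not t.children or not s.children:
        leaves.append((t, s, False))      # inadmissible leaf: stored as a dense matrix
    else:
        for tc in t.children:
            for sc in s.children:
                build_blocks(tc, sc, admissible, leaves)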

According to References 6 and 7, the admissibility conditions (3) guarantee that the directional interpolation converges exponentially with respect to $$ k^{1/3} $$ on the domain $$ \tau \times \sigma $$. In order to be able to apply our approximation to submatrices, we have to take the supports of the basis functions into account. Since we will be using tensor interpolation, we choose for every cluster $$ t \in \mathcal{T}_{\mathcal{I}} $$ an axis-parallel bounding box $$ \tau \subseteq \mathbb{R}^3 $$ such that
supp φ i τ for all i t ^ . $$ \operatorname{supp}{\varphi}_i\subseteq \tau \kern2.3em \mathrm{for}\ \mathrm{all}\kern0.3em i\in \hat{t}. $$
For every cluster s 𝒯 , we denote the corresponding bounding box by σ $$ \sigma $$ . If we have a block ( t , s ) × $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}} $$ with bounding boxes τ $$ \tau $$ and σ $$ \sigma $$ satisfying the admissibility conditions (3), we can expect the approximation
g ˜ τ σ ( x , y ) = ν = 1 k μ = 1 k g c ( ξ τ , ν , ξ σ , μ ) τ c , ν ( x ) σ c , μ ( y ) for all x τ , y σ $$ {\tilde{g}}_{\tau \sigma}\left(x,y\right)=\sum \limits_{\nu =1}^k\sum \limits_{\mu =1}^k{g}_c\left({\xi}_{\tau, \nu },{\xi}_{\sigma, \mu}\right){\ell}_{\tau c,\nu }(x)\overline{\ell_{\sigma c,\mu }(y)}\kern2.3em \mathrm{for}\ \mathrm{all}\kern0.3em x\in \tau, \kern0.3em y\in \sigma $$
for a suitably chosen direction c $$ c $$ , cf. (3b), to converge rapidly and therefore
$$ {\displaystyle \begin{array}{ll} g_{ij} & \approx \int_{\partial\Omega} \varphi_i(x) \int_{\partial\Omega} \tilde{g}_{\tau\sigma}(x,y)\, \varphi_j(y)\, dy\, dx \\ & = \sum_{\nu=1}^{k} \sum_{\mu=1}^{k} \underset{=: v_{tc,i\nu}}{\underbrace{\int_{\partial\Omega} \ell_{\tau c,\nu}(x)\, \varphi_i(x)\, dx}}\; \underset{=: s_{ts,\nu\mu}}{\underbrace{g_c(\xi_{\tau,\nu},\xi_{\sigma,\mu})}}\; \underset{=: \overline{v}_{sc,j\mu}}{\underbrace{\int_{\partial\Omega} \overline{\ell_{\sigma c,\mu}(y)}\, \varphi_j(y)\, dy}} \\ & = (V_{tc} S_{ts} V_{sc}^{\ast})_{ij} \qquad \text{for all } i \in \hat{t},\ j \in \hat{s}. \end{array}} $$ (4)
This means that the submatrices corresponding to the leaves
× + : = { ( t , s ) × : τ and σ satisfy (3) } $$ {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+}:= \left\{\left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}\kern0.3em :\kern0.3em \tau \kern0.3em \mathrm{and}\kern0.3em \sigma \kern0.3em \mathrm{satisfy}\ (3)\right\} $$
can be approximated by low-rank matrices in factorized form.

We can satisfy the admissibility condition (3b) only if large clusters are accompanied by a large number of directions to choose from.

Definition 3. (Directions) Let $$ \mathcal{T}_{\mathcal{I}} $$ be a cluster tree. For every cluster $$ t \in \mathcal{T}_{\mathcal{I}} $$ we let either $$ \mathcal{D}_t = \{0\} $$ or choose a subset $$ \mathcal{D}_t \subseteq \mathbb{R}^3 $$ such that

$$ \|c\| = 1 \quad \text{for every } c \in \mathcal{D}_t. $$
The family $$ (\mathcal{D}_t)_{t\in\mathcal{T}_{\mathcal{I}}} $$ is called a family of directions for the cluster tree $$ \mathcal{T}_{\mathcal{I}} $$.

Allowing $$ \mathcal{D}_t = \{0\} $$ makes algorithms more efficient for small clusters where (3b) can be fulfilled by choosing $$ c = 0 $$. In this case, the function $$ \ell_{\tau c,\nu} $$ becomes simply the Lagrange polynomial $$ \ell_{\tau,\nu} $$, and the modified kernel function $$ g_c $$ becomes just the standard kernel function $$ g $$.

In the high-frequency case, the large clusters require $$ \mathcal{O}(\kappa^2) \sim \mathcal{O}(n) $$ directions to satisfy (3b). This means that storing all matrices $$ (V_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$ would require at least $$ \mathcal{O}(n^2) $$ units of storage, where $$ n = \#\mathcal{I} $$ denotes the number of basis functions, and this would not be an improvement over simply storing the matrix explicitly. This problem can be overcome by taking advantage of the fact that we can approximate $$ V_{tc} $$ in terms of the matrices $$ V_{t'c'} $$ corresponding to its children: let $$ (\xi_{\tau',\nu'})_{\nu'=1}^{k} $$ and $$ (\ell_{\tau',\nu'})_{\nu'=1}^{k} $$ denote the interpolation points and Lagrange polynomials for the subdomain $$ \tau' $$ corresponding to the child cluster $$ t' $$. If we use the same polynomial order for all clusters, we have
τ , ν = ν = 1 k τ , ν ( ξ τ , ν ) τ , ν for all ν [ 1 : k ] $$ {\ell}_{\tau, \nu}\kern0.5em =\sum \limits_{\nu^{\prime }=1}^k{\ell}_{\tau, \nu}\left({\xi}_{\tau^{\prime },{\nu}^{\prime }}\right){\ell}_{\tau^{\prime },{\nu}^{\prime }}\kern1.00em \mathrm{for}\ \mathrm{all}\kern0.3em \nu \in \left[1:k\right] $$
by the identity theorem, and interpolating a slightly modified function instead yields
τ c , ν ( x ) = exp ( ι κ c , x ) τ , ν ( x ) = exp ( ι κ c , x ) exp ( ι κ c c , x ) τ , ν ( x ) exp ( ι κ c , x ) ν = 1 k exp ( ι κ c c , ξ τ , ν ) τ , ν ( ξ τ , ν ) = : e τ c , ν ν τ , ν ( x ) = ν = 1 k e τ c , ν ν τ c , ν ( x ) , $$ {\displaystyle \begin{array}{ll}\hfill {\ell}_{\tau c,\nu }(x)& =\exp \left(\iota \kappa \left\langle c,x\right\rangle \right){\ell}_{\tau, \nu }(x)\\ {}\hfill & =\exp \left(\iota \kappa \left\langle {c}^{\prime },x\right\rangle \right)\exp \left(\iota \kappa \left\langle c-{c}^{\prime },x\right\rangle \right){\ell}_{\tau, \nu }(x)\\ {}\hfill & \approx \exp \left(\iota \kappa \left\langle {c}^{\prime },x\right\rangle \right)\sum \limits_{\nu^{\prime }=1}^k\underset{=: {e}_{\tau^{\prime }c,{\nu}^{\prime}\nu }}{\underbrace{\exp \left(\iota \kappa \left\langle c-{c}^{\prime },{\xi}_{\tau^{\prime },{\nu}^{\prime }}\right\rangle \right){\ell}_{\tau, \nu}\left({\xi}_{\tau^{\prime },{\nu}^{\prime }}\right)}}{\ell}_{\tau^{\prime },{\nu}^{\prime }}(x)\\ {}\hfill & =\sum \limits_{\nu^{\prime }=1}^k{e}_{\tau^{\prime }c,{\nu}^{\prime}\nu}\kern0.3em {\ell}_{\tau^{\prime }{c}^{\prime },{\nu}^{\prime }}(x),\end{array}} $$
that is, we can approximate modified Lagrange polynomials on parent clusters by modified Lagrange polynomials in their children. Under moderate conditions, this approximation can be applied repeatedly without harming the total error too much,6, 7 so we can afford to replace the matrices V τ c $$ {V}_{\tau c} $$ defined in (4) in all nonleaf clusters by approximations.

Definition 4. (Directional cluster basis) Let $$ \mathcal{T}_{\mathcal{I}} $$ be a cluster tree with a family $$ \mathcal{D} = (\mathcal{D}_t)_{t\in\mathcal{T}_{\mathcal{I}}} $$ of directions. A family $$ (V_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$ of matrices $$ V_{tc} \in \mathbb{C}^{\hat{t}\times k} $$ is called a directional cluster basis for $$ \mathcal{T}_{\mathcal{I}} $$ and $$ \mathcal{D} $$ if for every $$ t \in \mathcal{T}_{\mathcal{I}} $$ and $$ t' \in \operatorname{chil}(t) $$ there are a direction $$ c' = \operatorname{dirchil}(t',c) $$ and a matrix $$ E_{t'c} \in \mathbb{C}^{k\times k} $$ with

$$ V_{tc}|_{\hat{t}'\times k} = V_{t'c'} E_{t'c}. $$ (5)
The matrices E t c $$ {E}_{t^{\prime }c} $$ are called transfer matrices. Since the matrices V t c $$ {V}_{tc} $$ have to be stored only for clusters without children, they are called leaf matrices.

The matrices V t c $$ {V}_{tc} $$ here correspond to the matrices introduced in (4) only for the leaves, while for nonleaf clusters the Lagrange polynomials are replaced by nested interpolation.
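The nested structure (5) is what makes this representation affordable: a product $$ V_{tc}^{\ast} x $$ can be evaluated without ever forming $$ V_{tc} $$ for nonleaf clusters. A sketch of this recursion (our own data layout: V_leaf and E are dictionaries of leaf and transfer matrices, dirchil is a callable) is shown below.

import numpy as np

def forward(t, c, x, V_leaf, E, dirchil):
    # compute xhat = V_{tc}^* x|_{\hat{t}} using the transfer relation (5)
    if not t.children:
        return V_leaf[(t, c)].conj().T @ x[t.indices]
    xhat = None
    for tp in t.children:
        cp = dirchil(tp, c)
        contrib = E[(tp, c)].conj().T @ forward(tp, cp, x, V_leaf, E, dirchil)
        xhat = contrib if xhat is None else xhat + contrib
    return xhat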

Definition 5. (𝒟ℋ²-matrix) Let $$ \mathcal{T}_{\mathcal{I}} $$ be a cluster tree with a family $$ \mathcal{D} $$ of directions, let $$ V = (V_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$ be a directional cluster basis, and let $$ \mathcal{T}_{\mathcal{I}\times\mathcal{I}} $$ be a block tree.

A matrix $$ G \in \mathbb{C}^{\mathcal{I}\times\mathcal{I}} $$ is called a 𝒟ℋ²-matrix if for every admissible leaf $$ b = (t,s) \in \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+} $$ there are a direction $$ c = \operatorname{dirblock}(t,s) \in \mathcal{D}_t \cap \mathcal{D}_s $$ and a matrix $$ S_{ts} \in \mathbb{C}^{k\times k} $$ with

$$ G|_{\hat{t}\times\hat{s}} = V_{tc} S_{ts} V_{sc}^{\ast}. $$ (6)
The matrix S t s $$ {S}_{ts} $$ is called a coupling matrix for the block b = ( t , s ) $$ b=\left(t,s\right) $$ .

The mappings $$ \operatorname{dirchil}(t',c) $$ and $$ \operatorname{dirblock}(t,s) $$ can be defined by taking the best approximation of $$ c $$ in $$ \mathcal{D}_{t'} $$ in the first case and the best approximation of the direction from the center of $$ \sigma $$ to the center of $$ \tau $$ in $$ \mathcal{D}_t \cap \mathcal{D}_s $$ in the second case.

If we have a 𝒟ℋ²-matrix, all admissible blocks can be represented by the cluster basis and the coupling matrices. For the inadmissible leaves $$ b = (t,s) \in \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-} $$, we store the corresponding submatrices $$ G|_{\hat{t}\times\hat{s}} $$ explicitly. Under moderate assumptions,10 these nearfield matrices require only $$ \mathcal{O}(nk) $$ units of storage in the low-frequency case, where the size of the leaf clusters is chosen proportional to $$ k $$ to balance far- and nearfield storage, and $$ \mathcal{O}(nk^2 \log(n)) $$ units of storage in the high-frequency case, where we have to ensure that leaf clusters are small enough to resolve waves.

For constant wave numbers $$ \kappa $$, a 𝒟ℋ²-matrix approximation of $$ G $$ requires only $$ \mathcal{O}(nk) $$ units of storage. In the high-frequency case, that is, if $$ \kappa \sim \sqrt{n} $$, a 𝒟ℋ²-matrix approximation requires only $$ \mathcal{O}(nk^2 \log n) $$ units of storage.7, 10 The matrix-vector multiplication can be performed in a similar complexity since it requires not more than two arithmetic operations per stored 𝒟ℋ²-matrix coefficient.
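To make the roles of the different ingredients explicit, here is a deliberately simple Python sketch of the matrix-vector product that applies the factorization (6) block by block; a full implementation would instead reuse the nested forward and backward transformations to reach the complexity stated above. All container names are our own.

import numpy as np

def dh2_matvec(x, leaves, dirblock, V, S, nearfield):
    # y = G x: admissible leaves via V_tc S_ts V_sc^*, inadmissible leaves densely
    y = np.zeros_like(x, dtype=complex)
    for t, s, admissible in leaves:
        if admissible:
            c = dirblock(t, s)
            xhat = V[(s, c)].conj().T @ x[s.indices]          # forward transformation
            y[t.indices] += V[(t, c)] @ (S[(t, s)] @ xhat)    # coupling and backward step
        else:
            y[t.indices] += nearfield[(t, s)] @ x[s.indices]
    return y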

3 COMPRESSION OF 𝒟ℋ²-MATRICES

Although directional interpolation leads to a robust and fairly fast algorithm with a complexity in $$ \mathcal{O}(nk^3 \log n) $$ for approximating the matrix $$ G $$, taking into account numerical quadrature, it requires a very large amount of storage, particularly if we are interested in a highly accurate approximation: Figure 1 shows that directional interpolation converges very robustly, but also that interpolation of higher order requires a very large amount of memory, close to 1 TB for the eighth order.

Figure 1. Left: Convergence of directional interpolation with increasing order. Right: Complete storage requirements (basis and farfield plus nearfield) with increasing order. 32,768 basis functions, $$ \kappa = 8 $$.

Algebraic compression techniques offer an attractive solution: we use directional interpolation only to provide an intermediate approximation of $$ G $$ and apply algebraic techniques to reduce the rank as far as possible. The resulting recompression algorithm can be implemented in a way that avoids having to store the entire intermediate 𝒟ℋ²-matrix, so that only the final result is written to memory and very large matrices can be handled at high accuracies.

We present here a version of the algorithm introduced in Reference 11 that will be modified in the following sections. Our goal is to find an improved cluster basis $$ Q = (Q_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$ for the matrix $$ G $$. In order to avoid redundant information and to ensure numerical stability, we aim for an isometric basis, that is, we require
$$ Q_{tc}^{\ast} Q_{tc} = I \qquad \text{for all } t \in \mathcal{T}_{\mathcal{I}},\ c \in \mathcal{D}_t. $$
The best approximation of a matrix block G | t ^ × ŝ $$ {\left.G\right|}_{\hat{t}\times \hat{s}} $$ with respect to this basis is given by the orthogonal projection Q t c Q t c G | t ^ × ŝ $$ {\left.{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}} $$ , and we have to ensure that all blocks connected to the cluster t $$ t $$ and the direction c $$ c $$ are approximated. We introduce the sets
t c : = { s 𝒯 : ( t , s ) × + , dirblock ( t , s ) = c } for all t 𝒯 , c 𝒟 t (7)
containing all column clusters connected via an admissible block to a row cluster t $$ t $$ and a given direction c $$ c $$ , cf. dark blue part in Figure 2, and note that
G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ for all s t c $$ G{\left|{}_{\hat{t}\times \hat{s}}\approx {Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}\kern0.3em \mathrm{for}\ \mathrm{all}\kern0.3em s\in {\mathcal{R}}_{tc} $$
is a minimal requirement for our new basis. But it is not entirely sufficient: if t 𝒯 has children, Definition 4 requires that Q t c | t ^ × k $$ {\left.{Q}_{tc}\right|}_{{\hat{t}}^{\prime}\times k} $$ can be expressed in terms of Q t c $$ {Q}_{t^{\prime }{c}^{\prime }} $$ for t chil ( t ) $$ {t}^{\prime}\in \mathrm{chil}(t) $$ and c = dirchil ( t , c ) $$ {c}^{\prime }=\mathrm{dirchil}\left({t}^{\prime },c\right) $$ , therefore the basis Q t c $$ {Q}_{tc} $$ has to be able to approximate all admissible blocks connected to the ancestors of t $$ t $$ , as well. To reflect this requirement, we extend t c $$ {\mathcal{R}}_{tc} $$ to
t c : = t c if t is the root of 𝒯 , t c c + 𝒟 t + dirchil ( t , c + ) = c t + c + if t chil ( t + ) , t + 𝒯 for all t 𝒯 , c 𝒟 t (8)
by including all admissible blocks connected to the parent, light blue part in Figure 2, and by induction to any of its ancestors.
Figure 2. Directional farfield clusters $$ \mathcal{R}_{tc} $$ (dark blue) and ancestors $$ \mathcal{R}_{tc}^{\ast} $$ (light blue) in horizontal direction.
A suitable cluster basis satisfies
$$ G|_{\hat{t}\times\hat{s}} \approx Q_{tc} Q_{tc}^{\ast} G|_{\hat{t}\times\hat{s}} \qquad \text{for all } s \in \mathcal{R}_{tc}^{\ast},\ t \in \mathcal{T}_{\mathcal{I}},\ c \in \mathcal{D}_t. $$
By combining all of these submatrices in a large matrix
$$ G_{tc} := G|_{\hat{t}\times\mathcal{C}_{tc}}, \qquad \mathcal{C}_{tc} := \bigcup \{ \hat{s} : s \in \mathcal{R}_{tc}^{\ast} \}, $$
we obtain the equivalent formulation
$$ G_{tc} \approx Q_{tc} Q_{tc}^{\ast} G_{tc} \qquad \text{for all } t \in \mathcal{T}_{\mathcal{I}},\ c \in \mathcal{D}_t, $$
and the singular value decompositions of G t c $$ {G}_{tc} $$ can be used to determine optimal isometric matrices Q t c $$ {Q}_{tc} $$ with this property. The resulting algorithm, investigated in Reference 10, has quadratic complexity, since it does not take the special structure of G $$ G $$ into account.
If $$ G $$ is already approximated by a 𝒟ℋ²-matrix, for example, via directional interpolation, we can make this algorithm considerably more efficient. We start by considering the root $$ t $$ of $$ \mathcal{T}_{\mathcal{I}} $$. Let $$ c \in \mathcal{D}_t $$. Since $$ G $$ is a 𝒟ℋ²-matrix, we have
G | t ^ × ŝ = V t c S t s V s c for all s t c , $$ {\left.G\right|}_{\hat{t}\times \hat{s}}={V}_{tc}{S}_{ts}{V}_{sc}^{\ast}\kern2.3em \mathrm{for}\ \mathrm{all}\kern0.3em s\in {\mathcal{R}}_{tc}, $$
and enumerating t c = { s 1 , , s m } $$ {\mathcal{R}}_{tc}=\left\{{s}_1,\dots, {s}_m\right\} $$ yields
G t c = V t c S t s 1 V s 1 c S t s m V s m c . $$ {G}_{tc}={V}_{tc}\left({S}_{t{s}_1}{V}_{s_1c}^{\ast}\kern0.5em \dots \kern0.5em {S}_{t{s}_m}{V}_{s_mc}^{\ast}\right). $$
The optimized cluster basis $$ Q_{tc} $$ depends only on the singular values and left singular vectors of $$ G_{tc} $$, not on the right singular vectors. In the factorized representation we have now obtained, this means that we can apply Householder reflections to condense the right factor into a small $$ k\times k $$ matrix without changing the resulting matrix $$ Q_{tc} $$. Using the transformations directly, however, is too computationally expensive, so we are looking for a way to avoid it.

Definition 6. (basis weights) A family $$ (R_{sc})_{s\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_s} $$ of matrices is called a family of basis weights for the basis $$ (V_{sc})_{s\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_s} $$ if for every $$ s \in \mathcal{T}_{\mathcal{I}} $$ and $$ c \in \mathcal{D}_s $$ there is an isometric matrix $$ Q_{sc} $$ with

V s c = Q s c R s c $$ {V}_{sc}={Q}_{sc}{R}_{sc} $$
and the matrices $$ R_{sc} $$ each have $$ k $$ columns and at most $$ k $$ rows.

If we have basis weights at our disposal, we obtain
G t c = V t c S t s 1 R s 1 c S t s m R s m c Q s 1 c Q s m c , $$ {G}_{tc}={V}_{tc}\left({S}_{t{s}_1}{R}_{s_1c}^{\ast}\kern0.5em \dots \kern0.5em {S}_{t{s}_m}{R}_{s_mc}^{\ast}\right)\left(\begin{array}{lll}{Q}_{s_1c}^{\ast }& & \\ {}& \ddots & \\ {}& & {Q}_{s_mc}^{\ast}\end{array}\right), $$
and since the multiplication by an adjoint isometric matrix from the right does not change the singular values or left singular vectors, we can replace G t c $$ {G}_{tc} $$ with
V t c S t s 1 R s 1 c S t s m R s m c . $$ {V}_{tc}\left({S}_{t{s}_1}{R}_{s_1c}^{\ast}\kern0.5em \dots \kern0.5em {S}_{t{s}_m}{R}_{s_mc}^{\ast}\right). $$
We can even go one step further and compute a thin Householder factorization of the right factor's adjoint
P ^ t c Z t c = R s 1 c S t s 1 R s m c S t s m $$ {\hat{P}}_{tc}{Z}_{tc}=\left(\begin{array}{l}{R}_{s_1c}{S}_{t{s}_1}^{\ast}\\ {}\vdots \\ {}{R}_{s_mc}{S}_{t{s}_m}^{\ast}\end{array}\right) $$
with an isometric matrix P ^ t c $$ {\hat{P}}_{tc} $$ and a matrix Z t c $$ {Z}_{tc} $$ that has only k $$ k $$ columns and not more than k $$ k $$ rows. If we set
P t c : = Q s 1 c Q s m c P ^ t c , $$ {P}_{tc}:= \left(\begin{array}{lll}{Q}_{s_1c}& & \\ {}& \ddots & \\ {}& & {Q}_{s_mc}\end{array}\right){\hat{P}}_{tc}, $$
we obtain
G t c = V t c Z t c P t c $$ {G}_{tc}={V}_{tc}{Z}_{tc}^{\ast }{P}_{tc}^{\ast } $$
and can drop the rightmost adjoint isometric matrix to work just with the thin matrix V t c Z t c $$ {V}_{tc}{Z}_{tc}^{\ast } $$ that has only at most k $$ k $$ columns, since multiplication by an isometric matrix from the right does not change the resulting matrix Q t c $$ {Q}_{tc} $$ .
So far, we have only considered the root of the cluster tree. If $$ t \in \mathcal{T}_{\mathcal{I}} $$ is a nonroot cluster, it has a parent $$ t^{+} \in \mathcal{T}_{\mathcal{I}} $$ and our definition (8) yields
$$ \mathcal{R}_{tc}^{\ast} = \mathcal{R}_{tc} \cup \bigcup_{\substack{c^{+}\in\mathcal{D}_{t^{+}} \\ \operatorname{dirchil}(t,c^{+}) = c}} \mathcal{R}_{t^{+}c^{+}}^{\ast}. $$
Let $$ c^{+} \in \mathcal{D}_{t^{+}} $$ with $$ \operatorname{dirchil}(t,c^{+}) = c $$. If we assume that $$ Z_{t^{+}c^{+}} $$ has already been computed, we have
$$ G_{tc}|_{\hat{t}\times\mathcal{C}_{t^{+}c^{+}}} = (G_{t^{+}c^{+}})|_{\hat{t}\times\mathcal{C}_{t^{+}c^{+}}} = (V_{t^{+}c^{+}} Z_{t^{+}c^{+}}^{\ast} P_{t^{+}c^{+}}^{\ast})|_{\hat{t}\times\mathcal{C}_{t^{+}c^{+}}} = V_{tc} E_{tc^{+}} Z_{t^{+}c^{+}}^{\ast} P_{t^{+}c^{+}}^{\ast}. $$
To apply this procedure to all directions $$ c^{+} $$, we enumerate them as
$$ \{ c_1^{+}, \dots, c_{\ell}^{+} \} = \{ c^{+} \in \mathcal{D}_{t^{+}} : \operatorname{dirchil}(t,c^{+}) = c \} $$
and the admissible blocks again as t c = { s 1 , , s m } $$ {\mathcal{R}}_{tc}=\left\{{s}_1,\dots, {s}_m\right\} $$ to get
G t c = V t c S t s 1 V s 1 c S t s m V s m c E t c 1 + Z t + c 1 + P t + c 1 + E t c + Z t + c + P t + c + = V t c S t s 1 R s 1 c Q s 1 c S t s m R s m c Q S m c E t c 1 + Z t + c 1 + P t + c 1 + E t c + Z t + c + P t + c + . $$ {\displaystyle \begin{array}{ll}\hfill {G}_{tc}& ={V}_{tc}\left({S}_{t{s}_1}{V}_{s_1c}^{\ast}\kern0.5em \dots \kern0.5em {S}_{t{s}_m}{V}_{s_mc}^{\ast}\kern0.5em {E}_{t{c}_1^{+}}{Z}_{t^{+}{c}_1^{+}}^{\ast }{P}_{t^{+}{c}_1^{+}}^{\ast}\kern0.5em \dots \kern0.5em {E}_{t{c}_{\ell}^{+}}{Z}_{t^{+}{c}_{\ell}^{+}}^{\ast }{P}_{t^{+}{c}_{\ell}^{+}}^{\ast}\right)\\ {}\hfill & ={V}_{tc}\left(\begin{array}{cccccc}{S}_{t{s}_1}{R}_{s_1c}^{\ast }{Q}_{s_1c}^{\ast }& \dots & {S}_{t{s}_m}{R}_{s_mc}^{\ast }{Q}_{S_mc}^{\ast }& {E}_{t{c}_1^{+}}{Z}_{t^{+}{c}_1^{+}}^{\ast }{P}_{t^{+}{c}_1^{+}}^{\ast }& \dots & {E}_{t{c}_{\ell}^{+}}{Z}_{t^{+}{c}_{\ell}^{+}}^{\ast }{P}_{t^{+}{c}_{\ell}^{+}}^{\ast}\end{array}\right).\end{array}} $$
The rightmost factors are again isometric, and we can once more compute a thin Householder factorization
P ^ t c Z t c = R s 1 c S t s 1 R s m c S t s m Z t + c 1 + E t c 1 + Z t + c + E t c + and set P t c : = Q s 1 c Q s m c P t + c 1 + P t + c + P ^ t c . $$ {\hat{P}}_{tc}{Z}_{tc}=\left(\begin{array}{l}{R}_{s_1c}{S}_{t{s}_1}^{\ast}\\ {}\vdots \\ {}{R}_{s_mc}{S}_{t{s}_m}^{\ast}\\ {}{Z}_{t^{+}{c}_1^{+}}{E}_{t{c}_1^{+}}^{\ast}\\ {}\vdots \\ {}{Z}_{t^{+}{c}_{\ell}^{+}}{E}_{t{c}_{\ell}^{+}}^{\ast}\end{array}\right)\kern1em \mathrm{and}\ \mathrm{set}\kern1em {P}_{tc}:= \left(\begin{array}{llllll}{Q}_{s_1c}& & & & & \\ {}& \ddots & & & & \\ {}& & {Q}_{s_mc}& & & \\ {}& & & {P}_{t^{+}{c}_1^{+}}& & \\ {}& & & & \ddots & \\ {}& & & & & {P}_{t^{+}{c}_{\ell}^{+}}\end{array}\right){\hat{P}}_{tc}. $$
to obtain
G t c = V t c Z t c P t c . $$ {G}_{tc}={V}_{tc}{Z}_{tc}^{\ast }{P}_{tc}^{\ast }. $$
Since the isometric matrices $$ P_{tc} $$ do not influence the range of $$ G_{tc} $$, we do not have to compute them; we only need the weight matrices $$ Z_{tc} $$.
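In practice the thin Householder factorizations can be delegated to a standard QR routine. The following sketch (our own data layout: dictionaries R, S, E, and Z indexed by clusters and directions) computes the weight matrix Z_tc for one cluster and direction, assuming the parent weights Z_{t+c+} have already been computed as in the recursion above.

import numpy as np

def total_weight(t, c, row_clusters, parent_dirs, R, S, E, Z):
    # row_clusters: the set R_tc; parent_dirs: pairs (t+, c+) with dirchil(t, c+) = c
    blocks = [R[(s, c)] @ S[(t, s)].conj().T for s in row_clusters]
    blocks += [Z[(tp, cp)] @ E[(t, cp)].conj().T for (tp, cp) in parent_dirs]
    stacked = np.vstack(blocks)                            # assumes at least one block
    _, Z[(t, c)] = np.linalg.qr(stacked, mode="reduced")   # stacked = \hat{P}_tc Z_tc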

Definition 7. (total weights) A family $$ (Z_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$ of matrices is called a family of total weights for the 𝒟ℋ²-matrix $$ G $$ if for every $$ t \in \mathcal{T}_{\mathcal{I}} $$ and $$ c \in \mathcal{D}_t $$ there is an isometric matrix $$ P_{tc} $$ with

$$ G_{tc} = V_{tc} Z_{tc}^{\ast} P_{tc}^{\ast} $$ (9)
and the matrices $$ Z_{tc} $$ each have $$ k $$ columns and at most $$ k $$ rows.

Remark 1. (Symmetric total weights) In the original approximation constructed by directional interpolation, the same cluster basis is used for rows and columns, since we have $$ G|_{\hat{t}\times\hat{s}} = V_{tc} S_{ts} V_{sc}^{\ast} $$ for all admissible blocks $$ b = (t,s) \in \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+} $$.

Since the matrix $$ G $$ is not self-adjoint, this property no longer holds for the adaptively constructed basis $$ (Q_{tc})_{t\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_t} $$, and we would have to construct a separate basis for the columns by applying the procedure to the adjoint matrix $$ G^{\ast} $$.

A possible alternative is to extend the total weight matrices to handle G $$ G $$ and G $$ {G}^{\ast } $$ simultaneously: for every s t c $$ s\in {\mathcal{R}}_{tc} $$ , we include not only R s c S t s $$ {R}_{sc}{S}_{ts}^{\ast } $$ in the construction of the weight Z t c $$ {Z}_{tc} $$ , but also R s c S s t $$ {R}_{sc}{S}_{st} $$ . This will give us an adaptive cluster basis that can be used for rows and columns, just like the original. Since the matrices appearing in the Householder factorization are now twice as large, the algorithm will take almost twice as long to complete and the adaptively chosen ranks may increase.

We can compute the total weights efficiently by this procedure as long as we have the basis weights $$ (R_{sc})_{s\in\mathcal{T}_{\mathcal{I}}, c\in\mathcal{D}_s} $$ at our disposal. These weights can be computed efficiently by taking advantage of their nested structure: if $$ s \in \mathcal{T}_{\mathcal{I}} $$ is a leaf, we compute the thin Householder factorization
V s c = Q s c R s c $$ {V}_{sc}={Q}_{sc}{R}_{sc} $$
with an isometric matrix Q s c $$ {Q}_{sc} $$ and a matrix R s c $$ {R}_{sc} $$ with k $$ k $$ columns and at most k $$ k $$ rows.
If $$ s \in \mathcal{T}_{\mathcal{I}} $$ has children, we first compute the basis weights for all children $$ \operatorname{chil}(s) = \{s_1, \dots, s_{\ell}\} $$ by recursion and let
$$ \hat{V}_{sc} = \begin{pmatrix} R_{s_1 c_1} E_{s_1 c} \\ \vdots \\ R_{s_{\ell} c_{\ell}} E_{s_{\ell} c} \end{pmatrix}, $$ (10)
where c i = dirchil ( s i , c ) $$ {c}_i=\mathrm{dirchil}\left({s}_i,c\right) $$ for all i [ 1 : ] $$ i\in \left[1:\ell \right] $$ . We compute the thin Householder factorization
V ^ s c = Q ^ s c R s c $$ {\hat{V}}_{sc}={\hat{Q}}_{sc}{R}_{sc} $$
and find
V s c = V s 1 c 1 E s 1 c V s c E s c = Q s 1 c 1 Q s c V ^ s c = Q s 1 c 1 Q s c Q ^ s c = : Q s c R s c . $$ {V}_{sc}=\left(\begin{array}{l}{V}_{s_1{c}_1}{E}_{s_1c}\\ {}\vdots \\ {}{V}_{s_{\ell }{c}_{\ell }}{E}_{s_{\ell }c}\end{array}\right)=\left(\begin{array}{lll}{Q}_{s_1{c}_1}& & \\ {}& \ddots & \\ {}& & {Q}_{s_{\ell }{c}_{\ell }}\end{array}\right){\hat{V}}_{sc}=\underset{=: {Q}_{sc}}{\underbrace{\left(\begin{array}{lll}{Q}_{s_1{c}_1}& & \\ {}& \ddots & \\ {}& & {Q}_{s_{\ell }{c}_{\ell }}\end{array}\right){\hat{Q}}_{sc}}}{R}_{sc}. $$
The matrix $$ Q_{sc} $$ is the product of two isometric matrices and therefore itself isometric. This shows that we can compute the basis weight matrices $$ R_{sc} $$ using only $$ \mathcal{O}(k^3) $$ operations per $$ s \in \mathcal{T}_{\mathcal{I}} $$ and $$ c \in \mathcal{D}_s $$ as long as we are not interested in $$ Q_{sc} $$. The algorithm is summarized in Figure 3.
Figure 3. Construction of the basis weights $$ R_{sc} $$.
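A compact Python sketch of the recursion of Figure 3 (our own data layout; the isometric factors are discarded and only the triangular factors of the thin QR factorizations are kept) is given below.

import numpy as np

def basis_weights(s, c, V_leaf, E, dirchil, R):
    # compute R_sc bottom-up by thin QR factorizations, never forming Q_sc
    if not s.children:
        _, R[(s, c)] = np.linalg.qr(V_leaf[(s, c)], mode="reduced")
        return
    rows = []
    for sp in s.children:
        cp = dirchil(sp, c)
        if (sp, cp) not in R:
            basis_weights(sp, cp, V_leaf, E, dirchil, R)
        rows.append(R[(sp, cp)] @ E[(sp, c)])        # blocks of \hat{V}_sc as in (10)
    _, R[(s, c)] = np.linalg.qr(np.vstack(rows), mode="reduced")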

Once the basis weights and total weights have been computed, we can construct the improved cluster basis Q t c $$ {Q}_{tc} $$ .

If $$ t \in \mathcal{T}_{\mathcal{I}} $$ is a leaf, we make use of (9) to get
$$ G_{tc} = V_{tc} Z_{tc}^{\ast} P_{tc}^{\ast}, $$
and we can again drop the isometric matrix P t c $$ {P}_{tc} $$ and only have to find the singular value decomposition of V t c Z t c $$ {V}_{tc}{Z}_{tc}^{\ast } $$ , choose a suitable rank k t c 0 $$ {k}_{tc}\in {\mathbb{N}}_0 $$ and use the first k t c $$ {k}_{tc} $$ left singular vectors as columns of the matrix Q t c $$ {Q}_{tc} $$ . We also prepare the matrix T t c : = Q t c V t c $$ {T}_{tc}:= {Q}_{tc}^{\ast }{V}_{tc} $$ describing the change of basis from V t c $$ {V}_{tc} $$ to Q t c $$ {Q}_{tc} $$ .
If $$ t \in \mathcal{T}_{\mathcal{I}} $$ is not a leaf, we first construct the basis for all children $$ \{t_1, \dots, t_{\ell}\} = \operatorname{chil}(t) $$. Since the parent can only approximate what has been kept by its children, we can switch to the orthogonal projection
$$ \hat{G}_{tc} := \begin{pmatrix} Q_{t_1 c_1}^{\ast} & & \\ & \ddots & \\ & & Q_{t_{\ell} c_{\ell}}^{\ast} \end{pmatrix} G_{tc} = \begin{pmatrix} Q_{t_1 c_1}^{\ast}\, G|_{\hat{t}_1\times\mathcal{C}_{tc}} \\ \vdots \\ Q_{t_{\ell} c_{\ell}}^{\ast}\, G|_{\hat{t}_{\ell}\times\mathcal{C}_{tc}} \end{pmatrix} $$
of G t c $$ {G}_{tc} $$ with c i = dirchil ( t i , c ) $$ {c}_i=\mathrm{dirchil}\left({t}_i,c\right) $$ for all i [ 1 : ] $$ i\in \left[1:\ell \right] $$ . Using again (9), we find
$$ \hat{G}_{tc} = \hat{V}_{tc} Z_{tc}^{\ast} P_{tc}^{\ast}, $$
with the projection of V t c $$ {V}_{tc} $$ into the children's bases
V ^ t c : = T t 1 , c 1 E t 1 c T t , c E t c $$ {\hat{V}}_{tc}:= \left(\begin{array}{l}{T}_{t_1,{c}_1}{E}_{t_1c}\\ {}\vdots \\ {}{T}_{t_{\ell },{c}_{\ell }}{E}_{t_{\ell }c}\end{array}\right) $$ (11)
that can be easily computed using the transfer matrices and the basis-change matrices. Once again we can drop the isometric factor P t c $$ {P}_{tc} $$ and only have to compute the singular value decomposition of V ^ t c Z t c $$ {\hat{V}}_{tc}{Z}_{tc}^{\ast } $$ , choose again a suitable rank k t c 0 $$ {k}_{tc}\in {\mathbb{N}}_0 $$ and use the first k t c $$ {k}_{tc} $$ left singular vectors as columns of a matrix Q ^ t c $$ {\hat{Q}}_{tc} $$ . Using
Q t c : = Q t 1 , c 1 Q t , c Q ^ t c $$ {Q}_{tc}:= \left(\begin{array}{lll}{Q}_{t_1,{c}_1}& & \\ {}& \ddots & \\ {}& & {Q}_{t_{\ell },{c}_{\ell }}\end{array}\right){\hat{Q}}_{tc} $$
gives us the new cluster basis, where the transfer matrices can be extracted from Q ^ t c $$ {\hat{Q}}_{tc} $$ . Again it is a good idea to prepare the basis-change matrix T t c : = Q t c V t c = Q ^ t c V ^ t c $$ {T}_{tc}:= {Q}_{tc}^{\ast }{V}_{tc}={\hat{Q}}_{tc}^{\ast }{\hat{V}}_{tc} $$ for the next steps of the recursion.

Under standard assumptions, the entire construction can be performed in $$ \mathcal{O}(nk^2) $$ operations for constant wave numbers and $$ \mathcal{O}(nk^3 \log n) $$ operations in the high-frequency case.11 The algorithm is summarized in Figure 4. It is important to note that the total weight matrices $$ Z_{tc} $$ can be constructed and discarded during the recursive algorithm; they do not have to be kept in storage permanently. This is in contrast to the basis weight matrices $$ R_{sc} $$ that may appear at any time during the recursion and therefore are kept in storage during the entire run of the algorithm.

Figure 4. Construction of an adaptive cluster basis.
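A condensed sketch of the recursion summarized in Figure 4 (our own data layout; the truncation keeps all singular values above the prescribed accuracy eps, so the first dropped singular value is at most eps) is shown below. For a nonleaf cluster, the blocks of the computed matrix serve as the new transfer matrices.

import numpy as np

def adaptive_basis(t, c, V_leaf, E, Z, dirchil, eps, Q, T):
    # construct Q_tc and the basis-change matrix T_tc = Q_tc^* V_tc
    if not t.children:
        Vhat = V_leaf[(t, c)]
    else:
        rows = []
        for tp in t.children:
            cp = dirchil(tp, c)
            if (tp, cp) not in Q:
                adaptive_basis(tp, cp, V_leaf, E, Z, dirchil, eps, Q, T)
            rows.append(T[(tp, cp)] @ E[(tp, c)])    # \hat{V}_tc as in (11)
        Vhat = np.vstack(rows)
    U, sigma, _ = np.linalg.svd(Vhat @ Z[(t, c)].conj().T, full_matrices=False)
    k_tc = int(np.sum(sigma > eps))                  # adaptively chosen rank
    Q[(t, c)] = U[:, :k_tc]
    T[(t, c)] = Q[(t, c)].conj().T @ Vhat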

Figure 5 shows an experiment with the single-layer matrix with the wave number $$ \kappa = 8 $$ and $$ n = 32,768 $$ piecewise constant basis functions on the unit sphere. The left image illustrates that the error decreases exponentially both with and without recompression. The right image compares the storage requirements for the original matrix (magenta; nearfield, farfield, and bases) with those for the recompressed matrix (green; again nearfield, farfield, and bases). The blue curve corresponds to the storage needed for the basis weights, and we can see that it grows beyond the storage for the entire recompressed 𝒟ℋ²-matrix if higher polynomial orders are used. This is not acceptable if we want to apply the method to high-frequency problems at high accuracies, so we will now work to reduce the storage requirements for these weights without harming the convergence of the overall method.

Figure 5. Left: convergence of recompressed interpolation with increasing order. Right: storage requirements of original interpolation and recompression.

4 ERROR CONTROL

In order to preserve the convergence properties, we have to investigate how our algorithm reacts to perturbations.

4.1 Error decomposition

We consider an admissible block ( t , s ) × + $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ with c = dirblock ( t , s ) $$ c=\mathrm{dirblock}\left(t,s\right) $$ that is approximated by our algorithm by
Q t c Q t c G | t ^ × ŝ . $$ {\left.{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}. $$
If t $$ t $$ is a leaf, the approximation error is given by
G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ . $$ G{\left|{}_{\hat{t}\times \hat{s}}-{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}. $$
If t $$ t $$ is not a leaf, there are children { t 1 , , t } = chil ( t ) $$ \left\{{t}_1,\dots, {t}_{\ell}\right\}=\mathrm{chil}(t) $$ with directions c i = dirchil ( t i , c ) $$ {c}_i=\mathrm{dirchil}\left({t}_i,c\right) $$ , i [ 1 : ] $$ i\in \left[1:\ell \right] $$ , and an isometric matrix Q ^ t c $$ {\hat{Q}}_{tc} $$ such that
$$ Q_{tc} = \underset{=: U_{tc}}{\underbrace{\begin{pmatrix} Q_{t_1 c_1} & & \\ & \ddots & \\ & & Q_{t_{\ell} c_{\ell}} \end{pmatrix}}} \hat{Q}_{tc} = U_{tc} \hat{Q}_{tc} $$
and the approximation error can be split into
G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ = G | t ^ × ŝ U t c Q ^ t c Q ^ t c U t c G | t ^ × ŝ = G | t ^ × ŝ U t c U t c G | t ^ × ŝ + U t c ( I Q ^ t c Q ^ t c ) U t c G | t ^ × ŝ = G | t ^ 1 × ŝ Q t 1 c 1 Q t 1 c 1 G | t ^ 1 × ŝ G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ + U t c ( I Q ^ t c Q ^ t c ) U t c G | t ^ × ŝ . $$ {\displaystyle \begin{array}{ll}\hfill G{\left|{}_{\hat{t}\times \hat{s}}-{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}& =G{\left|{}_{\hat{t}\times \hat{s}}-{U}_{tc}{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}\\ {}\hfill & {\left.=G\right|}_{\hat{t}\times \hat{s}}-{U}_{tc}{U}_{tc}^{\ast }G{\left|{}_{\hat{t}\times \hat{s}}+{U}_{tc}\left(I-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast}\right){U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}\\ {}\hfill & {\left.=\left(\begin{array}{c}G{\left|{}_{{\hat{t}}_1\times \hat{s}}-{Q}_{t_1{c}_1}{Q}_{t_1{c}_1}^{\ast }G\right|}_{{\hat{t}}_1\times \hat{s}}\\ {}\vdots \\ {}G{\left|{}_{{\hat{t}}_{\ell}\times \hat{s}}-{Q}_{t_{\ell }{c}_{\ell }}{Q}_{t_{\ell }{c}_{\ell}}^{\ast }G\right|}_{{\hat{t}}_{\ell}\times \hat{s}}\end{array}\right)+{U}_{tc}\left(I-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast}\right){U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}.\end{array}} $$
The ranges of both terms are perpendicular: for any pair x , y ŝ $$ x,y\in {\mathbb{C}}^{\hat{s}} $$ of vectors we have
( G | t ^ × ŝ U t c U t c G | t ^ × ŝ ) x , U t c ( I Q ^ t c Q ^ t c ) U t c G | t ^ × ŝ y = U t c ( G | t ^ × ŝ U t c U t c G | t ^ × ŝ ) x , ( I Q ^ t c Q ^ t c ) U t c G | t ^ × ŝ y = ( U t c G | t ^ × ŝ U t c G | t ^ × ŝ ) x , ( I Q ^ t c Q ^ t c ) U t c G | t ^ × ŝ y = 0 $$ {\displaystyle \begin{array}{ll}\hfill & {\left.\Big\langle \left(G|{}_{\hat{t}\times \hat{s}}-{U}_{tc}{U}_{tc}^{\ast }{G}_{\hat{t}\times \hat{s}}\right)x,{U}_{tc}\left(I-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast}\right){U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}y\Big\rangle \\ {}\hfill & {\left.\kern1em =\Big\langle {U}_{tc}^{\ast}\left(G|{}_{\hat{t}\times \hat{s}}-{U}_{tc}{U}_{tc}^{\ast }{G}_{\hat{t}\times \hat{s}}\right)x,\left(I-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast}\right){U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}y\Big\rangle \\ {}\hfill & {\left.\kern1em =\Big\langle \left({U}_{tc}^{\ast }G|{}_{\hat{t}\times \hat{s}}-{U}_{tc}^{\ast }{G}_{\hat{t}\times \hat{s}}\right)x,\left(I-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast}\right){U}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}y\Big\rangle =0\end{array}} $$
due to U t c U t c = I $$ {U}_{tc}^{\ast }{U}_{tc}=I $$ . By Pythagoras' theorem, this implies
$$ \left\| \left( G|_{\hat{t}\times\hat{s}} - Q_{tc} Q_{tc}^{\ast} G|_{\hat{t}\times\hat{s}} \right) x \right\|_2^2 = \sum_{i=1}^{\ell} \left\| \left( G|_{\hat{t}_i\times\hat{s}} - Q_{t_i c_i} Q_{t_i c_i}^{\ast} G|_{\hat{t}_i\times\hat{s}} \right) x \right\|_2^2 + \left\| \left( I - \hat{Q}_{tc} \hat{Q}_{tc}^{\ast} \right) U_{tc}^{\ast} G|_{\hat{t}\times\hat{s}}\, x \right\|_2^2 \qquad \text{for all } x \in \mathbb{C}^{\hat{s}}, $$ (12)
that is, we can split the error exactly into contributions of the children and a contribution of the parent t $$ t $$ . If the children have children again, we can proceed by induction. To make this precise, we introduce the sets of descendants
desc ( t , c ) : = { ( t , c ) } if chil ( t ) = , { ( t , c ) } t chil ( t ) desc ( t , dirchil ( t , c ) ) otherwise $$ \mathrm{desc}\left(t,c\right):= \left\{\begin{array}{ll}\left\{\left(t,c\right)\right\}\kern1em & \mathrm{if}\kern0.3em \mathrm{chil}(t)=\varnothing, \\ {}\left\{\left(t,c\right)\right\}{\cup \bigcup}_{t^{\prime}\in \mathrm{chil}(t)}\mathrm{desc}\left({t}^{\prime },\mathrm{dirchil}\left({t}^{\prime },c\right)\right)\kern1em & \mathrm{otherwise}\end{array}\right. $$
for all $$ t \in \mathcal{T}_{\mathcal{I}} $$ and $$ c \in \mathcal{D}_t $$.

Theorem 1. (error representation) We define

$$ \hat{G}_{tsc} := \begin{cases} G|_{\hat{t}\times\hat{s}} & \text{if } \operatorname{chil}(t) = \varnothing, \\ U_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} & \text{otherwise} \end{cases} \qquad \text{for all } t, s \in \mathcal{T}_{\mathcal{I}},\ c \in \mathcal{D}_t. $$
In the previous section, we have already defined $$ \hat{Q}_{tc} $$ for nonleaf clusters. We extend this notation by setting $$ \hat{Q}_{tc} := Q_{tc} $$ for all leaf clusters $$ t \in \mathcal{T}_{\mathcal{I}} $$ and all $$ c \in \mathcal{D}_t $$. Then we have
( G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ ) x 2 2 = ( t , c ) desc ( t , c ) ( G ^ t s c Q ^ t c Q ^ t c G ^ t s c ) x 2 2 $$ {\left\Vert \left(G{\left|{}_{\hat{t}\times \hat{s}}-{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}\right)x\right\Vert}_2^2=\sum \limits_{\left({t}^{\prime },{c}^{\prime}\right)\in \mathrm{desc}\left(t,c\right)}{\left\Vert \left({\hat{G}}_{t^{\prime }s{c}^{\prime }}-{\hat{Q}}_{t^{\prime }{c}^{\prime }}{\hat{Q}}_{t^{\prime }{c}^{\prime}}^{\ast }{\hat{G}}_{t^{\prime }s{c}^{\prime }}\right)x\right\Vert}_2^2 $$
for all ( t , s ) × + $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ with c = dirblock ( t , s ) $$ c=\mathrm{dirblock}\left(t,s\right) $$ and all x ŝ $$ x\in {\mathbb{C}}^{\hat{s}} $$ .

Proof. With the new notation, (12) takes the form

( G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ ) x 2 2 = t chil ( t ) c = dirchil ( t , c ) ( G | t ^ × ŝ Q t c Q t c G | t ^ × ŝ ) x 2 2 + ( G ^ t s c Q ^ t c Q ^ t c G ^ t s c ) x 2 2 , $$ {\left\Vert \left(G{\left|{}_{\hat{t}\times \hat{s}}-{Q}_{tc}{Q}_{tc}^{\ast }G\right|}_{\hat{t}\times \hat{s}}\right)x\right\Vert}_2^2\kern0.5em =\sum \limits_{\genfrac{}{}{0ex}{}{t^{\prime}\in \mathrm{chil}(t)}{c^{\prime }=\mathrm{dirchil}\left({t}^{\prime },c\right)}}^{\ell }{\left\Vert \left(G{\left|{}_{{\hat{t}}^{\prime}\times \hat{s}}-{Q}_{t^{\prime }{c}^{\prime }}{Q}_{t^{\prime }{c}^{\prime}}^{\ast }G\right|}_{{\hat{t}}^{\prime}\times \hat{s}}\right)x\right\Vert}_2^2+{\left\Vert \left({\hat{G}}_{tsc}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{\hat{G}}_{tsc}\right)x\right\Vert}_2^2, $$
and a straightforward induction yields the result.

We can see that the matrices G ^ t s c $$ {\hat{G}}_{tsc} $$ required by this theorem appear explicitly in the compression algorithm: G t c $$ {G}_{tc} $$ is the combination of all matrices G ^ t s c $$ {\hat{G}}_{tsc} $$ for s t c $$ s\in {\mathcal{R}}_{tc}^{\ast } $$ if t $$ t $$ is a leaf, and otherwise G ^ t c = U t c G t c = V ^ t c Z t c P t c $$ {\hat{G}}_{tc}={U}_{tc}^{\ast }{G}_{tc}={\hat{V}}_{tc}{Z}_{tc}^{\ast }{P}_{tc}^{\ast } $$ is the combination of all matrices G ^ t s c $$ {\hat{G}}_{tsc} $$ for s t c $$ s\in {\mathcal{R}}_{tc}^{\ast } $$ .

The compression algorithm computes the singular value decompositions of the matrices G t c $$ {G}_{tc} $$ and G ^ t c $$ {\hat{G}}_{tc} $$ , respectively, so we have all the singular values at our disposal to guarantee G t c Q t c Q t c G t c 2 ϵ $$ {\left\Vert {G}_{tc}-{Q}_{tc}{Q}_{tc}^{\ast }{G}_{tc}\right\Vert}_2\le \epsilon $$ or G ^ t c Q ^ t c Q ^ t c G ^ t c 2 ϵ $$ {\left\Vert {\hat{G}}_{tc}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{\hat{G}}_{tc}\right\Vert}_2\le \epsilon $$ , respectively, for any given accuracy ϵ > 0 $$ \epsilon \in {\mathbb{R}}_{>0} $$ by ensuring that the first dropped singular value σ k t c + 1 $$ {\sigma}_{k_{tc}+1} $$ is less than ϵ $$ \epsilon $$ .

4.2 Block-relative error control

Although multiple submatrices are combined in G t c $$ {G}_{tc} $$ , they do not all have to be treated identically12(chapter 6.8): we can scale the different submatrices with individually chosen weights, for example, given t 𝒯 and c 𝒟 t , we can choose a weight ω t s > 0 $$ {\omega}_{ts}\in {\mathbb{R}}_{>0} $$ for every s t c $$ s\in {\mathcal{R}}_{tc}^{\ast } $$ and replace G t c $$ {G}_{tc} $$ by
G ω , t c : = ω t s 1 1 G | t ^ × ŝ 1 ω t s m 1 G | t ^ × ŝ m $$ {G}_{\omega, tc}:= \left({\omega}_{t{s}_1}^{-1}G{\left|{}_{\hat{t}\times {\hat{s}}_1}\kern0.5em \dots \kern0.5em {\omega}_{t{s}_m}^{-1}G\right|}_{\hat{t}\times {\hat{s}}_m}\right) $$
with the enumeration t c = { s 1 , , s m } $$ {\mathcal{R}}_{tc}^{\ast }=\left\{{s}_1,\dots, {s}_m\right\} $$ . Correspondingly, G ^ t c $$ {\hat{G}}_{tc} $$ is replaced by a weighted version G ^ ω , t c $$ {\hat{G}}_{\omega, tc} $$ and Z t c $$ {Z}_{tc} $$ by Z ω , t c $$ {Z}_{\omega, tc} $$ . The modified algorithm will now guarantee
G ^ t s c Q ^ t c Q ^ t c G ^ t s c 2 = ω t s G ω , t c | t ^ × ŝ Q t c Q t c G ω , t c | t ^ × ŝ 2 ω t s G ω , t c Q t c Q t c G ω , t c 2 ω t s ϵ $$ {\displaystyle \begin{array}{ll}\hfill {\left\Vert {\hat{G}}_{ts c}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{\hat{G}}_{ts c}\right\Vert}_2& ={\omega}_{ts}{\left\Vert {G}_{\omega, tc}{\left|{}_{\hat{t}\times \hat{s}}-{Q}_{tc}{Q}_{tc}^{\ast }{G}_{\omega, tc}\right|}_{\hat{t}\times \hat{s}}\right\Vert}_2\\ {}\hfill & \le {\omega}_{ts}{\left\Vert {G}_{\omega, tc}-{Q}_{tc}{Q}_{tc}^{\ast }{G}_{\omega, tc}\right\Vert}_2\le {\omega}_{ts}\epsilon \end{array}} $$
for leaf clusters t 𝒯 and
G ^ t s c Q ^ t c Q ^ t c G ^ t s c 2 = ω t s U t G ω , t c | t ^ × ŝ Q ^ t c Q ^ t c U t G ω , t c | t ^ × ŝ 2 ω t s G ^ ω , t c Q ^ t c Q ^ t c G ^ ω , t c 2 ω t s ϵ $$ {\displaystyle \begin{array}{ll}\hfill {\left\Vert {\hat{G}}_{ts c}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{\hat{G}}_{ts c}\right\Vert}_2& ={\omega}_{ts}{\left\Vert {U}_t^{\ast }{G}_{\omega, tc}{\left|{}_{\hat{t}\times \hat{s}}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{U}_t^{\ast }{G}_{\omega, tc}\right|}_{\hat{t}\times \hat{s}}\right\Vert}_2\\ {}\hfill & \le {\omega}_{ts}{\left\Vert {\hat{G}}_{\omega, tc}-{\hat{Q}}_{tc}{\hat{Q}}_{tc}^{\ast }{\hat{G}}_{\omega, tc}\right\Vert}_2\le {\omega}_{ts}\epsilon \end{array}} $$
for nonleaf clusters. With these modifications, Theorem 1 yields
$$ \left\Vert \left( G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2^2 \le \sum_{(t',c')\in \mathrm{desc}(t,c)} \omega_{t's}^2\,\epsilon^2 \qquad \text{for all } (t,s)\in \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}. $$ (13)
The weights ω t s $$ {\omega}_{ts} $$ can be used to keep the error under close control. As an example, we consider how to implement block-relative error control, that is, how to ensure
$$ \left\Vert G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right\Vert_2 \le \epsilon\, \left\Vert G|_{\hat{t}\times\hat{s}} \right\Vert_2 $$
for a block ( t , s ) × + $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ . We start by observing that we have
$$ \left\Vert G|_{\hat{t}\times\hat{s}} \right\Vert_2 = \left\Vert V_{tc} S_{ts} V_{sc}^{\ast} \right\Vert_2 = \left\Vert R_{tc} S_{ts} R_{sc}^{\ast} \right\Vert_2 $$
due to the admissibility and using the basis weights introduced in Definition 6, so the spectral norm can be computed efficiently.
We assume that a cluster can have at most m $$ m $$ children and set
$$ \omega_{t's} := \begin{cases} \dfrac{1}{\sqrt{m+1}}\, \left\Vert G|_{\hat{t}\times\hat{s}} \right\Vert_2 & \text{if } t' = t, \\[1ex] \dfrac{1}{\sqrt{m+1}}\, \omega_{t^{+}s} & \text{if } t' \in \mathrm{chil}(t^{+}) \end{cases} \qquad \text{for all } (t',c')\in \mathrm{desc}(t,c). $$
Keeping in mind that every cluster can have at most m $$ m $$ children, substituting ω t s $$ {\omega}_{t^{\prime }s} $$ in (13) and summing up level by level yields
$$ \left\Vert G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right\Vert_2^2 \le \sum_{(t',c')\in \mathrm{desc}(t,c)} \omega_{t's}^2\,\epsilon^2 \le \sum_{\ell=0}^{\infty} \left(\frac{m}{m+1}\right)^{\ell} \omega_{ts}^2\,\epsilon^2, $$
allowing us to evaluate the geometric sum to conclude
$$ \left\Vert G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right\Vert_2^2 \le \frac{1}{1-\frac{m}{m+1}}\, \omega_{ts}^2\,\epsilon^2 = (m+1)\,\omega_{ts}^2\,\epsilon^2 = \epsilon^2 \left\Vert G|_{\hat{t}\times\hat{s}} \right\Vert_2^2. $$
The weights ω t s $$ {\omega}_{t^{\prime }s} $$ can be computed conveniently during the construction of the total weights, at only minimal additional cost.
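As an illustration of this recursion, the following sketch propagates the block-relative weights from the row cluster of a block to all of its descendants. It is a minimal example under the stated assumptions, not the authors' code; the class `Cluster`, the function `relative_weights`, and the parameters `block_norm` and `m` are hypothetical names introduced only for this illustration.

```python
import math

class Cluster:
    """A toy cluster with a list of children; stands in for the cluster tree."""
    def __init__(self, children=()):
        self.children = list(children)

def relative_weights(t, block_norm, m):
    """Assign omega_{t',s} to every descendant cluster t' of t:
    omega_{t,s} = ||G restricted to the block||_2 / sqrt(m+1) at the root,
    omega_{t',s} = omega_{parent,s} / sqrt(m+1) for the children, recursively."""
    weights = {}
    def visit(cluster, parent_value):
        w = parent_value / math.sqrt(m + 1)
        weights[cluster] = w
        for child in cluster.children:
            visit(child, w)
    visit(t, block_norm)
    return weights

# Usage with a small tree and at most m = 2 children per cluster:
# t = Cluster([Cluster(), Cluster()]); relative_weights(t, 1.0, 2)
```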

4.3 Stability

In order to improve the efficiency of our algorithm, we would like to replace the 𝒟 2 -matrix G $$ G $$ by an approximation. If we want to ensure that the result of the compression algorithm is still useful, we have to investigate its stability. In the following, G $$ G $$ denotes the matrix treated during the compression algorithm, while H $$ H $$ denotes the matrix that we actually want to approximate.

Lemma 1 (stability). Let H × $$ H\in {\mathbb{C}}^{\mathcal{I}\times \mathcal{I}} $$ . We have

$$ \left\Vert \left( H|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, H|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2 \le \left\Vert \left( G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2 + \left\Vert \left( H|_{\hat{t}\times\hat{s}} - G|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2 $$
for all ( t , s ) × + $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ with c = dirblock ( t , s ) $$ c=\mathrm{dirblock}\left(t,s\right) $$ and all x ŝ $$ x\in {\mathbb{C}}^{\hat{s}} $$ .

Proof. Let ( t , s ) × + $$ \left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ , c = dirblock ( t , s ) $$ c=\mathrm{dirblock}\left(t,s\right) $$ and x ŝ $$ x\in {\mathbb{C}}^{\hat{s}} $$ . We have

$$ \begin{aligned} H|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, H|_{\hat{t}\times\hat{s}} &= G|_{\hat{t}\times\hat{s}} + (H-G)|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, (H-G)|_{\hat{t}\times\hat{s}} \\ &= G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} + \left(I - Q_{tc}Q_{tc}^{\ast}\right) (H-G)|_{\hat{t}\times\hat{s}} \end{aligned} $$
and therefore by the triangle inequality
$$ \left\Vert \left( H|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, H|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2 \le \left\Vert \left( G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right) x \right\Vert_2 + \left\Vert \left(I - Q_{tc}Q_{tc}^{\ast}\right) (H-G)|_{\hat{t}\times\hat{s}}\, x \right\Vert_2. $$
We let y : = ( H G ) | t ^ × ŝ x $$ {\left.y:= \left(H-G\right)\right|}_{\hat{t}\times \hat{s}}x $$ and make use of the isometry of Q t c $$ {Q}_{tc} $$ to find
( I Q t c Q t c ) y 2 2 = y 2 2 y , Q t c Q t c y 2 Q t c Q t c y , y 2 + Q t c Q t c y 2 2 = y 2 2 2 y , Q t c Q t c Q t c Q t c y 2 + Q t c Q t c y 2 2 = y 2 2 2 Q t c Q t c y , Q t c Q t c y 2 + Q t c Q t c y 2 2 = y 2 2 Q t c Q t c y 2 2 y 2 2 . $$ {\displaystyle \begin{array}{ll}\hfill {\left\Vert \left(I-{Q}_{tc}{Q}_{tc}^{\ast}\right)y\right\Vert}_2^2& ={\left\Vert y\right\Vert}_2^2-{\left\langle y,{Q}_{tc}{Q}_{tc}^{\ast }y\right\rangle}_2-{\left\langle {Q}_{tc}{Q}_{tc}^{\ast }y,y\right\rangle}_2+{\left\Vert {Q}_{tc}{Q}_{tc}^{\ast }y\right\Vert}_2^2\\ {}\hfill & ={\left\Vert y\right\Vert}_2^2-2{\left\langle y,{Q}_{tc}{Q}_{tc}^{\ast }{Q}_{tc}{Q}_{tc}^{\ast }y\right\rangle}_2+{\left\Vert {Q}_{tc}{Q}_{tc}^{\ast }y\right\Vert}_2^2\\ {}\hfill & ={\left\Vert y\right\Vert}_2^2-2{\left\langle {Q}_{tc}{Q}_{tc}^{\ast }y,{Q}_{tc}{Q}_{tc}^{\ast }y\right\rangle}_2+{\left\Vert {Q}_{tc}{Q}_{tc}^{\ast }y\right\Vert}_2^2\\ {}\hfill & ={\left\Vert y\right\Vert}_2^2-{\left\Vert {Q}_{tc}{Q}_{tc}^{\ast }y\right\Vert}_2^2\le {\left\Vert y\right\Vert}_2^2.\end{array}} $$
Since this is equivalent to $$ {\left\Vert \left(I-{Q}_{tc}{Q}_{tc}^{\ast}\right)\,\left(H-G\right)|_{\hat{t}\times \hat{s}}\,x\right\Vert}_2^2\le {\left\Vert \left(H-G\right)|_{\hat{t}\times \hat{s}}\,x\right\Vert}_2^2 $$ , the proof is complete.

This lemma implies that if we want to approximate a matrix H $$ H $$ , but apply the algorithm to an approximation G $$ G $$ satisfying the block-relative error estimate
$$ \left\Vert H|_{\hat{t}\times\hat{s}} - G|_{\hat{t}\times\hat{s}} \right\Vert_2 \le \epsilon\, \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 \qquad \text{for all } (t,s)\in \mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}, $$
we will obtain
$$ \begin{aligned} \left\Vert H|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, H|_{\hat{t}\times\hat{s}} \right\Vert_2 &\le \left\Vert G|_{\hat{t}\times\hat{s}} - Q_{tc}Q_{tc}^{\ast}\, G|_{\hat{t}\times\hat{s}} \right\Vert_2 + \left\Vert H|_{\hat{t}\times\hat{s}} - G|_{\hat{t}\times\hat{s}} \right\Vert_2 \\ &\le \epsilon \left\Vert G|_{\hat{t}\times\hat{s}} \right\Vert_2 + \epsilon \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 \\ &\le \epsilon \left( \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 + \left\Vert G|_{\hat{t}\times\hat{s}} - H|_{\hat{t}\times\hat{s}} \right\Vert_2 \right) + \epsilon \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 \\ &\le \epsilon (1+\epsilon) \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 + \epsilon \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2 = \epsilon (2+\epsilon) \left\Vert H|_{\hat{t}\times\hat{s}} \right\Vert_2, \end{aligned} $$
that is, the basis constructed to ensure block-relative error estimates for the matrix G $$ G $$ will also ensure block-relative error estimates for the matrix H $$ H $$ , only with a slightly larger error factor 2 ϵ $$ \approx 2\epsilon $$ . Since our error-control strategy can ensure any accuracy ϵ > 0 $$ \epsilon >0 $$ , this is quite satisfactory.

5 APPROXIMATED WEIGHTS

Figure 5 suggests that for higher accuracies, the basis weights ( R s c ) s 𝒯 , c 𝒟 s can require more storage than the entire recompressed 𝒟 2 -matrix. With the error representation of Theorem 1 and the stability analysis of the previous section at our disposal, we can investigate ways to reduce the storage requirements without causing significant harm to the final result.

We do not have to worry about the total weights ( Z t c ) t 𝒯 , c 𝒟 t , since they can be set up during the recursive construction of the adaptive cluster basis.

5.1 Direct approximation of weights

The basis weight matrices R s c $$ {R}_{sc} $$ are required by our algorithm when it sets up the total weight matrix Z t c $$ {Z}_{tc} $$ with
G t c = V t c Z t c P t c $$ {G}_{tc}={V}_{tc}{Z}_{tc}^{\ast }{P}_{tc}^{\ast } $$
using
G t c | t ^ × ŝ = V t c S t s W s c = V t c S t s R s c Q s c $$ {\left.{G}_{tc}\right|}_{\hat{t}\times \hat{s}}={V}_{tc}{S}_{ts}{W}_{sc}^{\ast }={V}_{tc}{S}_{ts}{R}_{sc}^{\ast }{Q}_{sc}^{\ast } $$
for an admissible block b = ( t , s ) × + $$ b=\left(t,s\right)\in {\mathcal{L}}_{\mathcal{I}\times \mathcal{I}}^{+} $$ . The isometric matrix Q s c $$ {Q}_{sc} $$ influences only P t c $$ {P}_{tc} $$ and can be dropped since it does not influence the singular values or the left singular vectors.
Our goal is to replace the basis weight R s c $$ {R}_{sc} $$ by an approximation R ˜ s c $$ {\tilde{R}}_{sc} $$ while ensuring that the recompression algorithm keeps working reliably. We find
$$ \left\Vert G_{tc}|_{\hat{t}\times\hat{s}} - V_{tc} S_{ts} \tilde{R}_{sc}^{\ast} Q_{sc}^{\ast} \right\Vert_2 = \left\Vert V_{tc} S_{ts} \left( R_{sc}^{\ast} - \tilde{R}_{sc}^{\ast} \right) Q_{sc}^{\ast} \right\Vert_2 = \left\Vert V_{tc} S_{ts} \left( R_{sc} - \tilde{R}_{sc} \right)^{\ast} \right\Vert_2 $$
and conclude that it is sufficient to ensure that the product R ˜ s c S t s $$ {\tilde{R}}_{sc}{S}_{ts}^{\ast } $$ is a good approximation of the product R s c S t s $$ {R}_{sc}{S}_{ts}^{\ast } $$ ; we do not require R ˜ s c $$ {\tilde{R}}_{sc} $$ itself to be a good approximation of R s c $$ {R}_{sc} $$ . This is a crucial observation, because the important approximation properties are due to the kernel function represented by S t s $$ {S}_{ts} $$ , not to the essentially arbitrary polynomial basis represented by R s c $$ {R}_{sc} $$ .
Since the basis weight R s c $$ {R}_{sc} $$ will be used for multiple clusters t 𝒯 , we introduce the sets
𝒞 s c : = { t 𝒯 : ( t , s ) × + , dirblock ( t , s ) = c } for all s 𝒯 , c 𝒟 s (14)
in analogy to the sets t c $$ {\mathcal{R}}_{tc} $$ used for the compression algorithm in (7). Enumerating the elements by 𝒞 s c = { t 1 , , t m } leads us to consider the approximation of the matrix
W s c : = R s c S t 1 s S t m s . $$ {W}_{sc}:= {R}_{sc}\left({S}_{t_1s}^{\ast}\kern0.5em \dots \kern0.5em {S}_{t_ms}^{\ast}\right). $$ (15)
The optimal solution is again provided by the singular value decomposition of W s c $$ {W}_{sc} $$ : for the singular values σ 1 , σ 2 , $$ {\sigma}_1,{\sigma}_2,\dots $$ and a given accuracy ϵ > 0 $$ \epsilon \in {\mathbb{R}}_{>0} $$ , we choose a rank k s c $$ {k}_{sc}\in \mathbb{N} $$ such that σ k s c + 1 ϵ $$ {\sigma}_{k_{sc}+1}\le \epsilon $$ and combine the first k s c $$ {k}_{sc} $$ left singular vectors in an isometric matrix Q ˜ s c $$ {\tilde{Q}}_{sc} $$ . We use the corresponding low-rank approximation R ˜ s c : = Q ˜ s c Q ˜ s c R s c $$ {\tilde{R}}_{sc}:= {\tilde{Q}}_{sc}{\tilde{Q}}_{sc}^{\ast }{R}_{sc} $$ and find
$$ \left\Vert {R}_{sc} S_{ts}^{\ast} - \tilde{R}_{sc} S_{ts}^{\ast} \right\Vert_2 = \left\Vert R_{sc} S_{ts}^{\ast} - \tilde{Q}_{sc}\tilde{Q}_{sc}^{\ast} R_{sc} S_{ts}^{\ast} \right\Vert_2 \le \left\Vert W_{sc} - \tilde{Q}_{sc}\tilde{Q}_{sc}^{\ast} W_{sc} \right\Vert_2 \le \epsilon . $$
The resulting algorithm is summarized in Figure 6. It is important to note that the original basis weights R s c $$ {R}_{sc} $$ are discarded as soon as they are no longer needed, so that at any point of the algorithm the original weights have to be kept in storage only for the children of the current cluster and for the children of its ancestors.
Figure 6. Construction of the approximated basis weights $$ {\tilde{R}}_{sc}={\tilde{Q}}_{sc}{\hat{R}}_{sc} $$ .
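The following sketch illustrates the weight-compression step of Figure 6 under the assumption that the basis weight and the coupling matrices of the blocks in 𝒞 s c are available as dense NumPy arrays; the names `compress_weight`, `R_sc`, `S_list`, and `eps` are illustrative only and do not reproduce the authors' implementation.

```python
import numpy as np

def compress_weight(R_sc: np.ndarray, S_list, eps: float):
    """Return (Q_tilde, R_hat) with R_tilde = Q_tilde @ R_hat such that
    ||R_sc @ S.conj().T - R_tilde @ S.conj().T||_2 <= eps for every S in S_list."""
    # W_sc = R_sc (S_{t1 s}^* ... S_{tm s}^*), cf. (15)
    W_sc = R_sc @ np.concatenate([S.conj().T for S in S_list], axis=1)
    U, sigma, _ = np.linalg.svd(W_sc, full_matrices=False)
    k = int(np.searchsorted(-sigma, -eps))   # first dropped singular value <= eps
    Q_tilde = U[:, :k]                       # isometric, leading left singular vectors
    R_hat = Q_tilde.conj().T @ R_sc          # only this coefficient matrix is kept
    return Q_tilde, R_hat
```

Since the error of the truncated singular value decomposition of the stacked matrix bounds the error of each block column, the guarantee stated in the docstring follows directly from the estimate above.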

For the basis construction algorithm, cf. Figure 4, only the coefficient matrices R ^ s c : = Q ˜ s c R s c $$ {\hat{R}}_{sc}:= {\tilde{Q}}_{sc}^{\ast }{R}_{sc} $$ are required, since the isometric matrices Q ˜ s c $$ {\tilde{Q}}_{sc} $$ influence neither the singular values nor the left singular vectors. To save storage, our algorithm therefore provides only these coefficient matrices.

5.2 Block-relative error control

Again, we are interested in blockwise relative error estimates, and as before, we can modify the blockwise approximation by introducing weights ω t s > 0 $$ {\omega}_{ts}\in {\mathbb{R}}_{>0} $$ and considering
W ω , s c : = R s c ω t 1 s 1 S t 1 s ω t m s 1 S t m s . $$ {W}_{\omega, sc}:= {R}_{sc}\left({\omega}_{t_1s}^{-1}{S}_{t_1s}^{\ast}\kern0.5em \dots \kern0.5em {\omega}_{t_ms}^{-1}{S}_{t_ms}^{\ast}\right). $$ (16)
Replacing W s c $$ {W}_{sc} $$ by W ω , s c $$ {W}_{\omega, sc} $$ yields
$$ \begin{aligned} \left\Vert R_{sc} S_{ts}^{\ast} - \tilde{R}_{sc} S_{ts}^{\ast} \right\Vert_2 &= \omega_{ts} \left\Vert R_{sc}\, \omega_{ts}^{-1} S_{ts}^{\ast} - \tilde{Q}_{sc}\tilde{Q}_{sc}^{\ast} R_{sc}\, \omega_{ts}^{-1} S_{ts}^{\ast} \right\Vert_2 \\ &\le \omega_{ts} \left\Vert W_{\omega,sc} - \tilde{Q}_{sc}\tilde{Q}_{sc}^{\ast} W_{\omega,sc} \right\Vert_2 \le \omega_{ts}\,\epsilon . \end{aligned} $$
For the blockwise error we obtain
$$ \begin{aligned} \left\Vert G|_{\hat{t}\times\hat{s}} - V_{tc} S_{ts} \tilde{R}_{sc}^{\ast} Q_{sc}^{\ast} \right\Vert_2 &= \left\Vert V_{tc} S_{ts} W_{sc}^{\ast} - V_{tc} S_{ts} \tilde{R}_{sc}^{\ast} Q_{sc}^{\ast} \right\Vert_2 \\ &= \left\Vert V_{tc} S_{ts} \left( R_{sc} - \tilde{R}_{sc} \right)^{\ast} Q_{sc}^{\ast} \right\Vert_2 \\ &\le \left\Vert V_{tc} \right\Vert_2 \left\Vert Q_{sc} \left( R_{sc} - \tilde{R}_{sc} \right) S_{ts}^{\ast} \right\Vert_2 \\ &= \left\Vert V_{tc} \right\Vert_2 \left\Vert \left( R_{sc} - \tilde{R}_{sc} \right) S_{ts}^{\ast} \right\Vert_2 \le \left\Vert V_{tc} \right\Vert_2\, \omega_{ts}\,\epsilon , \end{aligned} $$
so a relative error bound is guaranteed if we ensure
ω t s V t c S t s V s c 2 V t c 2 . $$ {\omega}_{ts}\le \frac{{\left\Vert {V}_{tc}{S}_{ts}{V}_{sc}^{\ast}\right\Vert}_2}{{\left\Vert {V}_{tc}\right\Vert}_2}. $$
Evaluating the numerator and denominator exactly would again require us to have the full basis weights at our disposal. Fortunately, a projection is sufficient for our purposes: in a preparation step, we compute the basis weight R t c $$ {R}_{tc} $$ , find its singular value decomposition, and use a given number k norm $$ {k}_{\mathrm{norm}}\in \mathbb{N} $$ of left singular vectors to form an auxiliary isometric matrix P t c $$ {P}_{tc} $$ and store N t c : = P t c R t c $$ {N}_{tc}:= {P}_{tc}^{\ast }{R}_{tc} $$ . Since the first singular value corresponds to the spectral norm of R t c $$ {R}_{tc} $$ , and therefore to the spectral norm of V t c $$ {V}_{tc} $$ , we have
N t c 2 = P t c R t c 2 = R t c 2 = V t c 2 $$ {\left\Vert {N}_{tc}\right\Vert}_2={\left\Vert {P}_{tc}^{\ast }{R}_{tc}\right\Vert}_2={\left\Vert {R}_{tc}\right\Vert}_2={\left\Vert {V}_{tc}\right\Vert}_2 $$
and can evaluate the denominator exactly. Since P t c P t c $$ {P}_{tc}{P}_{tc}^{\ast } $$ is an orthogonal projection, we also have
V t c S t s V s c 2 = R t c S t s R s c 2 P t c R t c S t s R s c 2 = N t c S t s R s c 2 , $$ {\left\Vert {V}_{tc}{S}_{ts}{V}_{sc}^{\ast}\right\Vert}_2={\left\Vert {R}_{tc}{S}_{ts}{R}_{sc}^{\ast}\right\Vert}_2\ge {\left\Vert {P}_{tc}^{\ast }{R}_{tc}{S}_{ts}{R}_{sc}^{\ast}\right\Vert}_2={\left\Vert {N}_{tc}{S}_{ts}{R}_{sc}^{\ast}\right\Vert}_2, $$
that is, we can find a lower bound for the numerator. Fortunately, a lower bound is sufficient for our purposes, and we can use
ω t s : = N t c S t s R s c 2 N t c 2 V t c S t s V s c 2 V t c 2 . $$ {\omega}_{ts}:= \frac{{\left\Vert {N}_{tc}{S}_{ts}{R}_{sc}^{\ast}\right\Vert}_2}{{\left\Vert {N}_{tc}\right\Vert}_2}\le \frac{{\left\Vert {V}_{tc}{S}_{ts}{V}_{sc}^{\ast}\right\Vert}_2}{{\left\Vert {V}_{tc}\right\Vert}_2}. $$
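A minimal sketch of this weight computation, assuming N t c , S t s , and R s c are available as dense NumPy arrays; the function name `relative_weight` is illustrative and not part of the paper's implementation.

```python
import numpy as np

def relative_weight(N_tc: np.ndarray, S_ts: np.ndarray, R_sc: np.ndarray) -> float:
    """omega_ts = ||N_tc S_ts R_sc^*||_2 / ||N_tc||_2, a computable lower bound
    for ||V_tc S_ts V_sc^*||_2 / ||V_tc||_2."""
    numerator = np.linalg.norm(N_tc @ S_ts @ R_sc.conj().T, 2)
    denominator = np.linalg.norm(N_tc, 2)
    return numerator / denominator
```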
The algorithm for constructing the norm-estimation matrices N t c $$ {N}_{tc} $$ is summarized in Figure 7. It uses exact Householder factorizations Q t c R t c = V t c $$ {Q}_{tc}{R}_{tc}={V}_{tc} $$ for all basis matrices and then truncates R t c $$ {R}_{tc} $$ . In order to make the computation efficient, we use
V t c = V t 1 , c 1 E t 1 c V t n , c n E t n c = Q t 1 , c 1 Q t n c n V ^ t c , V ^ t c : = R t 1 , c 1 E t 1 c R t n , c n E t n c $$ {V}_{tc}=\left(\begin{array}{l}{V}_{t_1,{c}_1}{E}_{t_1c}\\ {}\vdots \\ {}{V}_{t_n,{c}_n}{E}_{t_nc}\end{array}\right)=\left(\begin{array}{lll}{Q}_{t_1,{c}_1}& & \\ {}& \ddots & \\ {}& & {Q}_{t_n{c}_n}\end{array}\right){\hat{V}}_{tc},\kern1em {\hat{V}}_{tc}:= \left(\begin{array}{l}{R}_{t_1,{c}_1}{E}_{t_1c}\\ {}\vdots \\ {}{R}_{t_n,{c}_n}{E}_{t_nc}\end{array}\right) $$ (17)
to replace V t c $$ {V}_{tc} $$ by the projected matrix V ^ t c $$ {\hat{V}}_{tc} $$ with chil ( t ) = { t 1 , , t n } $$ \mathrm{chil}(t)=\left\{{t}_1,\dots, {t}_n\right\} $$ and c i = dirchil ( t i , c ) $$ {c}_i=\mathrm{dirchil}\left({t}_i,c\right) $$ for all i [ 1 : n ] $$ i\in \left[1:n\right] $$ .
Figure 7. Construction of the norm-approximation matrices $$ {N}_{tc} $$ .
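The following sketch illustrates one step of this construction for a nonleaf cluster, assuming the children's R factors and the transfer matrices E are given as NumPy arrays; the names `build_norm_matrix`, `R_children`, `E_children`, and `k_norm` are illustrative, and the full recursive bookkeeping of Figure 7 is omitted.

```python
import numpy as np

def build_norm_matrix(R_children, E_children, k_norm: int):
    """For a nonleaf cluster, stack R_{ti,ci} E_{ti,c} to obtain V_hat_tc as in
    (17), compute its QR factorization to get R_tc, and keep k_norm left
    singular vectors to form N_tc = P_tc^* R_tc."""
    V_hat = np.concatenate([R @ E for R, E in zip(R_children, E_children)], axis=0)
    _, R_tc = np.linalg.qr(V_hat)                 # Householder QR of the stacked matrix
    U, _, _ = np.linalg.svd(R_tc, full_matrices=False)
    P_tc = U[:, :k_norm]                          # leading left singular vectors
    N_tc = P_tc.conj().T @ R_tc
    return R_tc, N_tc
```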

Figure 8 shows that compressing the basis weights following these principles leaves the accuracy of the matrix intact and significantly reduces the storage requirements.

Figure 8. Left: convergence of recompressed interpolation with compressed weights. Right: storage requirements of compressed and uncompressed weights.

6 NUMERICAL EXPERIMENTS

To demonstrate the properties of the new algorithms in practical applications, we consider direct boundary integral formulations for the Helmholtz problem on the unit sphere. We create a mesh for the unit sphere by starting with a double pyramid P = { x 3 : | x 1 | + | x 2 | + | x 3 | = 1 } $$ P=\left\{x\in {\mathbb{R}}^3\kern0.3em :\kern0.3em |{x}_1|+|{x}_2|+|{x}_3|=1\right\} $$ and refining each of its faces into m 2 $$ {m}^2 $$ triangles, where m { 16 , 24 , 32 , 48 , , 1024 } $$ m\in \left\{16,24,32,48,\dots, 1024\right\} $$ . Projecting these triangles' vertices to the unit sphere yields regular surface meshes with between 2048 $$ 2048 $$ and 8 , 388 , 608 $$ 8,388,608 $$ triangles.
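For illustration, the following sketch generates such a mesh: it refines each face of the double pyramid into m² triangles and projects the vertices onto the unit sphere. The function name and data layout are illustrative and do not reproduce the authors' mesh generator.

```python
import numpy as np

def sphere_mesh(m: int):
    """Return (vertices, triangles) of a triangulation of the unit sphere with
    8 * m**2 triangles, obtained by refining the double pyramid (octahedron)."""
    corners = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
    faces = [(4, 0, 2), (4, 2, 1), (4, 1, 3), (4, 3, 0),
             (5, 2, 0), (5, 1, 2), (5, 3, 1), (5, 0, 3)]
    vertices, triangles, index = [], [], {}

    def vertex(p):
        key = tuple(np.round(p, 12))              # deduplicate shared edge vertices
        if key not in index:
            index[key] = len(vertices)
            vertices.append(p / np.linalg.norm(p))  # projection onto the sphere
        return index[key]

    for a, b, c in faces:
        A, B, C = corners[a], corners[b], corners[c]
        # split the face into m**2 small triangles
        for i in range(m):
            for j in range(m - i):
                p00 = A + (B - A) * i / m + (C - A) * j / m
                p10 = A + (B - A) * (i + 1) / m + (C - A) * j / m
                p01 = A + (B - A) * i / m + (C - A) * (j + 1) / m
                triangles.append((vertex(p00), vertex(p10), vertex(p01)))
                if j < m - i - 1:
                    p11 = A + (B - A) * (i + 1) / m + (C - A) * (j + 1) / m
                    triangles.append((vertex(p10), vertex(p11), vertex(p01)))
    return np.array(vertices), np.array(triangles)

# sphere_mesh(16) yields 8 * 16**2 = 2048 triangles; sphere_mesh(1024) yields 8,388,608.
```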

We discretize the direct boundary integral formulations for the Dirichlet-to-Neumann and the Neumann-to-Dirichlet problem with piecewise constant basis functions for the Neumann values and continuous linear nodal basis functions for the Dirichlet values. The approximation of the single-layer matrix by a 𝒟 2 -matrix has already been discussed.

For the double-layer matrix, we apply directional interpolation to the kernel function and take the normal derivative of the result. This again yields a 𝒟 2 -matrix. The approximation error has been investigated in Reference 6.

For the hypersingular matrix, we use partial integration13(Corollary 3.3.24) and again apply directional interpolation to the remaining kernel function.

In order to save storage, basis weights for the row and column basis of the single-layer matrix and the row basis of the double-layer matrix are shared, and basis weights for the column basis of the double-layer matrix and the row and column basis of the hypersingular matrix are also shared. In our implementation, this can be easily accomplished by including more matrix blocks in the matrices W s c $$ {W}_{sc} $$ .

The resulting systems of linear equations are solved by a GMRES method that is preconditioned using an $$ \mathscr{H} $$ -LU factorization14(Chapter 7.6) of a coarse approximation of the 𝒟 2 -matrix.
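A minimal sketch of this solver setup, assuming the compressed matrix is only accessible through a matrix-vector product and that a coarse approximation is available for the preconditioner; here, the ℋ-LU factorization is replaced by an ordinary LU factorization of a dense stand-in, so the sketch only illustrates how the pieces fit together.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

def solve_with_gmres(apply_G, G_coarse, b):
    """Solve G x = b, where G is only available through the matrix-vector
    product apply_G, preconditioned by an LU factorization of the dense
    stand-in G_coarse (taking the place of the H-LU factorization)."""
    n = b.shape[0]
    A = LinearOperator((n, n), matvec=apply_G, dtype=complex)
    lu_piv = lu_factor(G_coarse)
    M = LinearOperator((n, n), matvec=lambda x: lu_solve(lu_piv, x), dtype=complex)
    x, info = gmres(A, b, M=M)
    if info != 0:
        raise RuntimeError(f"GMRES did not converge (info={info})")
    return x

# Usage with a dense stand-in:
# G = np.eye(100) + 0.01j * np.random.default_rng(0).standard_normal((100, 100))
# x = solve_with_gmres(lambda v: G @ v, G, np.ones(100, dtype=complex))
```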

Table 1 contains results for a first experiment with the constant wave number κ = 4 $$ \kappa =4 $$ . The column “n” gives the number of triangles, the column “m” gives the order of the interpolation, the column “Norm” gives the time in seconds for the approximation of the matrix norm with the algorithm given in Figure 7, the column “Comp” gives the time for the compression of the weights by the algorithm in Figure 6 and “Mem” the storage requirements in MB for the compressed weight matrices.

TABLE 1. Helmholtz boundary integral equation with constant wave number κ = 4 $$ \kappa =4 $$ .
Weight φ i $$ {\varphi}_i $$ SLP DLP
n m Norm Comp Mem Row Col Mem Row Col Mem
8192 3 < $$ < $$ 0.1 2.6 5 0.2 0.3 319 0.1 0.1 331
18,432 4 0.2 11.5 37 2.0 2.1 588 1.4 1.4 648
32,768 4 0.2 8.7 80 3.0 3.0 943 1.6 2.7 951
73,728 5 0.9 70.5 431 5.7 5.8 2013 4.4 4.3 1792
131,072 5 1.3 101.3 853 9.1 9.4 3594 7.0 6.5 3243
294,912 5 3.6 183.1 2072 20.7 20.9 8234 14.7 13.7 7774
524,288 6 6.5 447.7 7217 71.1 52.6 15,797 44.1 37.4 13,959
1,179,648 6 14.1 824.2 17,091 156.9 119.6 37,949 100.7 71.5 35,281
2,097,152 7 26.1 3310.3 51,516 1062.4 718.8 71,768 602.6 513.3 62,032
4,718,592 7 57.8 7034.6 122,268 2704.3 1858.2 176,761 1606.0 1316.2 162,386
8,388,608 7 101.9 12,041.2 224,330 4821.9 3636.7 332,950 2799.6 2616.1 323,817

The columns “Row” and “Col” give the times in seconds for constructing the adaptive row and column cluster bases, both for the single-layer and the double-layer matrix, while the columns “Mem” give the storage requirements in MB for the compressed 𝒟 2 -matrices.

The experiment was performed on a server with two AMD EPYC 7713 processors with 64 cores each and a total of 2048 GB of memory.

Our theoretical investigation suggests that the runtime and storage requirements grow like 𝒪 ( n k 2 ) for a constant wave number, and this prediction seems to be confirmed by our experiment. Truncation tolerances were chosen to ensure that the convergence of the original Galerkin discretization is preserved, that is, we obtain L 2 $$ {L}^2 $$ -norm errors falling like 𝒪 ( h ) for the Dirichlet-to-Neumann problem and like 𝒪 ( h 2 ) for the Neumann-to-Dirichlet problem.

Table 2 contains results for a second experiment with a wave number that grows as the mesh is refined. This is common in practical applications, where the mesh is chosen just fine enough to resolve the waves.

TABLE 2. Helmholtz boundary integral equation with growing wave number.
Weight φ i $$ {\varphi}_i $$ SLP DLP
n κ $$ \kappa $$ m Norm Comp Mem Row Col Mem Row Col Mem
8192 4 3 < $$ < $$ 0.1 2.3 5 0.8 0.9 319 0.3 0.4 331
18,432 6 4 0.2 12.3 35 3.2 3.2 993 1.7 2.0 1057
32,768 8 4 0.3 33.8 77 7.2 7.4 2295 4.6 4.5 2336
73,728 12 5 0.9 246.4 428 24.4 25.0 7318 19.4 18.7 7507
131,072 16 5 1.6 559.9 966 50.1 47.2 16,888 42.8 37.8 17,170
294,912 24 5 5.6 1579.0 3599 125.5 123.6 43,792 107.8 103.0 45,345
524,288 32 6 13.2 6815.1 14,383 474.0 331.5 94,766 489.1 389.5 93,371
1,179,648 48 6 34.7 19,657.0 44,523 1324.8 1008.2 255,373 1540.6 979.9 259,687
2,097,152 64 7 105.5 84,707.6 174,972 7622.8 6252.1 455,584 8541.0 6106.9 438,976

The growing wave number makes it significantly harder to satisfy the admissibility condition (3a) and thereby leads to an increase in the number of blocks that have to be treated. The admissibility condition (3b) implies that we also have to introduce a growing number of directions as the wave number increases.

The results of our experiment show the expected increase in both computing time and storage requirements: for 2,097,152 triangles, the setup takes far longer than in the low-frequency case, since more than thirteen times as many blocks have to be considered. This is not surprising, since the complexity analysis suggests 𝒪 ( n k 2 log n ) for the high-frequency case. In this case, a single coupling matrix requires 4 MB of storage, and storing several of these matrices during the setup of the matrix W s c $$ {W}_{sc} $$ can be expected to exceed the capacity of the available cache memory, thus slowing down the computation considerably.

The compression of the weight matrices takes almost as much time as the entire basis construction. Fortunately, the basis construction benefits significantly from the compressed weights and runs twice as fast. This effect cannot fully compensate for the time spent compressing the weight matrices, but it ensures that the new memory-efficient algorithm remains competitive with the original version.

Figure 9 illustrates the algorithms' practical performance: the left figure shows the runtime per degree of freedom using a logarithmic scale for the number of triangles. We can see that the runtimes grow like 𝒪 ( n log n ) , as predicted, with the slope depending on the order of interpolation.

Figure 9. Left: runtime per degree of freedom. Right: storage requirements per degree of freedom.

The right figure shows the storage requirements, again per degree of freedom, and we can see that the compressed 𝒟 2 -matrices for the three operators show the expected 𝒪 ( n log n ) behavior. The compressed weights grow slowly, while the uncompressed weights are on track to surpass the storage requirements of the matrices they are used to construct.

ACKNOWLEDGMENTS

Open Access funding enabled and organized by Projekt DEAL.

FUNDING INFORMATION

Part of this research was funded by the Deutsche Forschungsgemeinschaft in project BO 3289/7-1.

CONFLICT OF INTEREST STATEMENT

The authors are not aware of any conflicts of interest concerning this article.

DATA AVAILABILITY STATEMENT

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
