Geometrical and Spectral Properties of the Orthogonal Projections of the Identity
Abstract
We analyze the best approximation AN (in the Frobenius sense) to the identity matrix in an arbitrary matrix subspace AS (A ∈ ℝn×n nonsingular, S being any fixed subspace of ℝn×n). Some new geometrical and spectral properties of the orthogonal projection AN are derived. In particular, new inequalities for the trace and for the eigenvalues of matrix AN are presented for the special case that AN is symmetric and positive definite.
1. Introduction
The set of all n × n real matrices is denoted by ℝn×n, and I denotes the identity matrix of order n. In the following, AT and tr (A) denote, as usual, the transpose and the trace of matrix A ∈ ℝn×n. The notations 〈·,·〉F and ∥·∥F stand for the Frobenius inner product and matrix norm, defined on the matrix space ℝn×n. Throughout this paper, the terms orthogonality, angle, and cosine will be used in the sense of the Frobenius inner product.
The solution N to the problem (3), that is, the matrix N minimizing ∥AM−I∥F over all matrices M in the subspace S, will be referred to as the “optimal” or the “best” approximate inverse of matrix A in the subspace S. Since matrix AN is the best approximation to the identity in the subspace AS, it will also be referred to as the orthogonal projection of the identity matrix onto the subspace AS. Although many of the results presented in this paper remain valid when matrix N is singular, from now on we assume that the optimal approximate inverse N (and thus also the orthogonal projection AN) is a nonsingular matrix. The solution N to the problem (3) has been studied as a natural generalization of the classical Moore-Penrose inverse in [6], where it has been referred to as the S-Moore-Penrose inverse of matrix A.
The main goal of this paper is to derive new geometrical and spectral properties of the best approximations AN (in the sense of formula (3)) to the identity matrix. Such properties could be used to analyze the quality and theoretical effectiveness of the optimal approximate inverse N as a preconditioner for the system (1). However, it is important to highlight that the purpose of this paper is purely theoretical, and we are not looking for immediate numerical or computational approaches (although our theoretical results could potentially be applied to the preconditioning problem). In particular, the term “optimal (or best) approximate inverse” is used in the sense of formula (3) and not in any other sense of this expression.
Among the many different works dealing with practical algorithms that can be used to compute approximate inverses, we refer the reader to, for example, [4, 7–9] and to the references therein. In [4], the author presents an exhaustive survey of preconditioning techniques and, in particular, describes several algorithms for computing sparse approximate inverses based on Frobenius norm minimization, for instance, the well-known SPAI and FSAI algorithms. A different approach (also focused on approximate inverses based on minimizing ∥AM−I∥F) can be found in [7], where an iterative descent-type method is used to approximate each column of the inverse, and the iteration is carried out with “sparse matrix by sparse vector” operations. When the system matrix is expressed in block-partitioned form, some preconditioning options are explored in [8]. In [9], the idea of a “target” matrix is introduced in the context of sparse approximate inverse preconditioners, and the generalized Frobenius norms ∥·∥F,H (H symmetric positive definite) are used for minimization purposes as an alternative to the classical Frobenius norm.
The last results of our work are devoted to the special case that matrix AN is symmetric and positive definite. In this sense, let us recall that the cone of symmetric and positive definite matrices has a rich geometrical structure and, in this context, the angle that any symmetric and positive definite matrix forms with the identity plays a very important role [10]. There, the authors extend this geometrical point of view and analyze the geometrical structure of the subspace of symmetric matrices of order n, including the location of all orthogonal matrices, not only the identity matrix.
This paper has been organized as follows. In Section 2, we present some preliminary results required to make the paper self-contained. Sections 3 and 4 are devoted to obtaining new geometrical and spectral relations, respectively, for the orthogonal projections AN of the identity matrix. Finally, Section 5 closes the paper with its main conclusions.
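To make the minimization in (3) concrete, the following sketch (our own illustration, not part of the original exposition) computes N for an arbitrary basis {M1, …, Md} of S by solving the normal equations of the Frobenius-norm projection; all function names and the diagonal-subspace example are ours.

```python
import numpy as np

def best_approximate_inverse(A, basis):
    """Minimize ||A M - I||_F over M in S = span(basis).

    Writing M = sum_k c_k M_k, the optimal coefficients solve the normal
    equations G c = g, where G[j, k] = <A M_j, A M_k>_F and
    g[j] = <I, A M_j>_F = tr(A M_j).
    """
    AM = [A @ M for M in basis]
    G = np.array([[np.sum(X * Y) for Y in AM] for X in AM])  # Gram matrix of AS
    g = np.array([np.trace(X) for X in AM])                  # <I, A M_j>_F
    c = np.linalg.solve(G, g)
    return sum(ck * Mk for ck, Mk in zip(c, basis))

# Example: S = subspace of diagonal matrices of order n.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
basis = [np.diag(np.eye(n)[i]) for i in range(n)]  # E_11, ..., E_nn
N = best_approximate_inverse(A, basis)
AN = A @ N
# Projection property: I - AN is Frobenius-orthogonal to every A M in AS.
print(max(abs(np.sum((np.eye(n) - AN) * (A @ M))) for M in basis))  # ~1e-16
```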
2. Some Preliminaries
Now, we present some preliminary results concerning the orthogonal projection AN of the identity onto the matrix subspace AS ⊂ ℝn×n. For more details about these results and for their proofs, we refer the reader to [5, 6, 11].
Taking advantage of the prehilbertian character of the matrix Frobenius norm, the solution N to the problem (3) can be obtained using the orthogonal projection theorem. More precisely, the matrix product AN is the orthogonal projection of the identity onto the subspace AS, and it satisfies the conditions stated by the following lemmas; see [5, 11].
Lemma 1. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
0 ≤ tr (AN) = ∥AN∥F² ≤ n, (4)
∥AN−I∥F² = n − tr (AN). (5)
An explicit formula for matrix N can be obtained by expressing the orthogonal projection AN of the identity matrix onto the subspace AS by its expansion with respect to an orthonormal basis of AS [5]. This is the idea of the following lemma.
Lemma 2. Let A ∈ ℝn×n be nonsingular. Let S be a linear subspace of ℝn×n of dimension d and {M1, …, Md} a basis of S such that {AM1, …, AMd} is an orthogonal basis of AS. Then, the solution N to the problem (3) is
N = ∑i=1,…,d [tr (AMi)/∥AMi∥F²] Mi. (6)
Hence, we can explicitly obtain the solution N to the problem (3) for the subspace S′, from its basis {P1AT, …, PdAT}, as follows. If {AP1AT, …, APdAT} is an orthogonal basis of the subspace AS′, then we just use the orthogonal expansion (6) to obtain N. Otherwise, we use the Gram-Schmidt procedure again to obtain an orthogonal basis of the subspace AS′, and then we apply formula (6). The interest of this second example lies in the possibility of using the conjugate gradient method for solving the preconditioned linear system, when the symmetric matrix AN is positive definite. For a more detailed exposition of the computational aspects related to these two examples, we refer the reader to [5].
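As an illustration of this procedure (a sketch of our own, relying on the orthogonal expansion (6) as reconstructed above), the following routine orthogonalizes {AM1, …, AMd} in the Frobenius sense by Gram-Schmidt, carrying the same updates over to the matrices Mk, and then accumulates the expansion:

```python
import numpy as np

def frob(X, Y):
    """Frobenius inner product <X, Y>_F."""
    return np.sum(X * Y)

def approximate_inverse_gram_schmidt(A, basis):
    """Gram-Schmidt on {A M_k}, then the orthogonal expansion of Lemma 2."""
    Ms = [M.astype(float).copy() for M in basis]
    AMs = [A @ M for M in basis]
    N = np.zeros_like(np.asarray(A, dtype=float))
    for k in range(len(Ms)):
        # Remove the components of AM_k along AM_0, ..., AM_{k-1}; applying
        # the same combination to M_k preserves the invariant AMs[k] = A @ Ms[k].
        for j in range(k):
            r = frob(AMs[k], AMs[j]) / frob(AMs[j], AMs[j])
            AMs[k] = AMs[k] - r * AMs[j]
            Ms[k] = Ms[k] - r * Ms[j]
        # Expansion coefficient <I, AM_k>_F / ||AM_k||_F^2 = tr(AM_k) / ||AM_k||_F^2.
        N += (np.trace(AMs[k]) / frob(AMs[k], AMs[k])) * Ms[k]
    return N
```

In floating-point arithmetic, the modified Gram-Schmidt variant (or a QR factorization of the vectorized basis) would be the more robust choice.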
The following lemma [11] provides some inequalities involving the eigenvalues and singular values of the preconditioned matrix AN.
Lemma 3. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Then,
The following fact [11] is a direct consequence of Lemma 3.
Lemma 4. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Then, the smallest singular value and the smallest eigenvalue’s modulus of the orthogonal projection AN of the identity onto the subspace AS are never greater than 1. That is,
σn ≤ 1, |λn| ≤ 1.
The following theorem [11] establishes a tight connection between the closeness of matrix AN to the identity matrix and the closeness of σn (respectively, |λn|) to unity.
Theorem 5. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
Remark 6. Theorem 5 states that the closer the smallest singular value σn of matrix AN is to unity, the closer matrix AN will be to the identity, that is, the smaller ∥AN−I∥F will be, and conversely. The same happens with the smallest eigenvalue’s modulus |λn| of matrix AN. In other words, we get a good approximate inverse N of A when σn (respectively, |λn|) is sufficiently close to 1.
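The following small experiment (our own; it reproduces no computation from the paper) illustrates Remark 6: as the subspace S grows, σn(AN) approaches 1 and ∥AN−I∥F approaches 0.

```python
import numpy as np

def projection_of_identity(A, basis):
    """AN for the best approximate inverse N of A in span(basis)."""
    AM = [A @ M for M in basis]
    G = np.array([[np.sum(X * Y) for Y in AM] for X in AM])
    g = np.array([np.trace(X) for X in AM])
    c = np.linalg.solve(G, g)
    return sum(ck * X for ck, X in zip(c, AM))

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)
pool = [rng.standard_normal((n, n)) for _ in range(n * n)]  # generic basis matrices
for d in (2, 6, 18, n * n):     # nested subspaces S_2 ⊂ S_6 ⊂ ... ⊂ R^{n×n}
    AN = projection_of_identity(A, pool[:d])
    s_min = np.linalg.svd(AN, compute_uv=False).min()
    print(d, s_min, np.linalg.norm(AN - np.eye(n)))  # s_min -> 1, distance -> 0
```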
To finish this section, let us mention that, recently, lower and upper bounds on the normalized Frobenius condition number of the orthogonal projection AN of the identity onto the subspace AS have been derived in [12]. In addition, this work proposes a natural generalization (related to an arbitrary matrix subspace S of ℝn×n) of the normalized Frobenius condition number of the nonsingular matrix A.
3. Geometrical Properties
Lemma 7. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
cos (AN, I) = tr (AN)/(√n ∥AN∥F) = ∥AN∥F/√n, (16)
0 ≤ cos (AN, I) ≤ 1. (17)
Proof. First, using (15) and (4) we immediately obtain (16). As a direct consequence of (16), we derive that cos (AN, I) is always nonnegative. Finally, using (5) and (16), we get
∥AN−I∥F² = n − tr (AN) = n (1 − cos² (AN, I)) ≥ 0,
from which the upper bound in (17) follows.
Remark 8. In [13], the authors consider an arbitrary approximate inverse Q of matrix A and derive the following equality:
Remark 9. More precisely, formula (18) states that the closer cos (AN, I) is to unity (i.e., the smaller the angle ∠(AN, I) is), the smaller ∥AN−I∥F will be, and conversely. This gives us a new measure of the quality (in the Frobenius sense) of the approximate inverse N of matrix A, by comparing the minimum residual norm ∥AN−I∥F with the cosine of the angle between AN and the identity, instead of with tr (AN), ∥AN∥F (Lemma 1), or σn, |λn| (Theorem 5). So for a fixed nonsingular matrix A ∈ ℝn×n and for different subspaces S ⊂ ℝn×n, we have
Remark 10. Note that the ratio between cos (AN, I) and cos (A, I) is independent of the order n of matrix A. Indeed, assuming that tr (A) ≠ 0 and using (16), we immediately obtain
cos (AN, I)/cos (A, I) = ∥AN∥F ∥A∥F / tr (A).
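A quick numerical check of these cosine identities (our own, using formula (16), the identity ∥AN−I∥F² = n(1 − cos²(AN, I)) from the proof of Lemma 7, and the ratio of Remark 10), with S taken as the subspace of diagonal matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
# Best diagonal approximate inverse: minimize ||d_i A[:, i] - e_i||_2 per column.
d = np.array([A[i, i] / np.dot(A[:, i], A[:, i]) for i in range(n)])
AN = A @ np.diag(d)

cos_AN = np.trace(AN) / (np.sqrt(n) * np.linalg.norm(AN))
cos_A = np.trace(A) / (np.sqrt(n) * np.linalg.norm(A))
print(np.isclose(cos_AN, np.linalg.norm(AN) / np.sqrt(n)))                 # (16)
print(np.isclose(np.linalg.norm(AN - np.eye(n))**2, n * (1 - cos_AN**2)))  # Lemma 7
print(np.isclose(cos_AN / cos_A,
                 np.linalg.norm(AN) * np.linalg.norm(A) / np.trace(A)))    # Remark 10
```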
The following lemma compares the trace and the Frobenius norm of the orthogonal projection AN.
Lemma 11. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
Proof. Using (4), we immediately obtain the four leftmost equivalences. Using (16), we immediately obtain the two rightmost equivalences.
The next lemma relates the Frobenius norm of the inverse of matrix A to the Frobenius norm of its best approximate inverse N in the subspace S.
Lemma 12. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
However, for the special case that B = N (the solution to the problem (3)), we also get the following inequality.
Lemma 13. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
Proof. Using the Cauchy-Schwarz inequality and (5), we get
The next lemma provides us with lower and upper bounds on the inner product 〈AN,B〉F, for any n × n real matrix B.
Lemma 14. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then, for every B ∈ ℝn×n, we have
The next lemma provides an upper bound on the arithmetic mean of the squares of the n² entries of the orthogonal projection AN. Incidentally, it also provides an upper bound on the arithmetic mean of the n diagonal entries of AN, that is, on tr (AN)/n. These upper bounds are independent of the optimal approximate inverse N, and thus of the subspace S: they depend only on matrix A.
Lemma 15. Let A ∈ ℝn×n be nonsingular with tr (A) ≠ 0 and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
Proof. Using (32) for a = AT, x = I, and b = AN, together with the Cauchy-Schwarz inequality and (4), we get
Remark 16. Lemma 15 has the following interpretation in terms of the quality of the optimal approximate inverse N of matrix A in subspace S. The closer the ratio n∥A∥F/|tr (A)| is to zero, the smaller tr (AN) will be, and thus, due to (5), the larger ∥AN−I∥F will be, and this happens for any matrix subspace S.
4. Spectral Properties
In this section, we present some new spectral properties for matrix AN, N being the optimal approximate inverse of matrix A, defined by (3). Mainly, we focus on the case that matrix AN is symmetric and positive definite. This has been motivated by the following reason. When solving a large nonsymmetric linear system (1) by using Krylov methods, a possible strategy consists of searching for an adequate optimal preconditioner N such that the preconditioned matrix AN is symmetric positive definite [5]. This enables one to use the conjugate gradient method (CG-method), which is, in general, a computationally efficient method for solving the new preconditioned system [2, 15].
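As a sketch of this strategy (ours; the paper prescribes no code), assume that N has been computed and that AN is indeed symmetric positive definite. One can then run plain CG on (AN)y = b and recover x = Ny. Below, A is a random symmetric positive definite matrix and S = span{I}, for which the optimal N = (tr (A)/∥A∥F²) I makes AN symmetric positive definite by construction:

```python
import numpy as np

def cg(B, b, tol=1e-12, maxiter=1000):
    """Plain conjugate gradient for a symmetric positive definite B."""
    x = np.zeros_like(b)
    r = b - B @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Bp = B @ p
        alpha = rs / (p @ Bp)
        x += alpha * p
        r -= alpha * Bp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(3)
n = 8
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)               # SPD test matrix
c = np.trace(A) / np.linalg.norm(A) ** 2  # optimal N = c I in S = span{I}
b = rng.standard_normal(n)
y = cg(c * A, b)                          # solve (AN) y = b by CG
x = c * y                                 # x = N y solves A x = b
print(np.linalg.norm(A @ x - b))          # residual ~ 0
```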
Let us particularize (38) for some special cases.
The next lemma compares the traces of matrices AN and (AN)2.
Lemma 18. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Then,
- (i) for any orthogonal projection AN,
tr ((AN)²) ≤ tr (AN);
- (ii) for any symmetric orthogonal projection AN,
tr ((AN)²) = ∥AN∥F² = tr (AN);
- (iii) for any symmetric positive definite orthogonal projection AN,
tr ((AN)²) ≤ (tr (AN))².
Proof. (i) Using (38), we get
(ii) It suffices to use the obvious fact that, for symmetric AN, tr ((AN)²) = tr ((AN)T(AN)) = ∥AN∥F², together with the equality tr (AN) = ∥AN∥F² taken from (40).
(iii) It suffices to use (43) and the fact (see, e.g., [17, 18]) that if P and Q are symmetric positive definite matrices, then tr (PQ) ≤ tr (P) tr (Q), applied with P = Q = AN.
The rest of the paper is devoted to obtaining new properties of the eigenvalues of the orthogonal projection AN for the special case that this matrix is symmetric positive definite.
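A spot check (our own) of the two ingredients above that do not depend on any reconstruction details: tr ((AN)²) ≤ tr (AN) follows from the Cauchy-Schwarz inequality together with (4), with equality whenever AN is symmetric.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
# Best diagonal approximate inverse; in general AN is not symmetric here.
d = np.array([A[i, i] / np.dot(A[:, i], A[:, i]) for i in range(n)])
AN = A @ np.diag(d)

t1, t2 = np.trace(AN), np.trace(AN @ AN)
print(np.isclose(t1, np.linalg.norm(AN) ** 2))  # (4): tr(AN) = ||AN||_F^2
print(t2 <= t1 + 1e-12)                         # item (i) of Lemma 18
```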
First, let us recall that the smallest singular value and the smallest eigenvalue’s modulus of the orthogonal projection AN are never greater than 1 (see Lemma 4). The following theorem establishes the dual result for the largest eigenvalue of matrix AN (symmetric positive definite).
Theorem 19. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Suppose that matrix AN is symmetric and positive definite. Then, the largest eigenvalue of the orthogonal projection AN of the identity onto the subspace AS is never less than 1. That is,
λ1 ≥ 1.
Proof. Using (41), we get
In Theorem 19, the assumption that matrix AN is positive definite is essential for assuring that |λ1| ≥ 1, as the following simple counterexample shows. Moreover, from Lemma 4 and Theorem 19, the smallest and largest eigenvalues of AN (symmetric positive definite) satisfy λn ≤ 1 and λ1 ≥ 1, respectively. Nothing can be asserted about the remaining eigenvalues of the symmetric positive definite matrix AN, which can be greater than, equal to, or less than unity, as the same counterexample also shows.
Example 20. For n = 3, let
Let us arrange the eigenvalues and singular values of matrix AkNk, as usual, in nonincreasing order (as shown in (11)).
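Since the matrices of Example 20 are not reproduced above, here is a counterexample of our own construction carrying the same message. Take S = span{A⁻¹B}, so that AM = B for the basis matrix M = A⁻¹B, and, by the expansion (6), AN = (tr (B)/∥B∥F²) B. Choosing B symmetric, indefinite, and nearly trace-free makes AN symmetric with all eigenvalue moduli strictly below 1:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])   # any nonsingular A works here
B = np.diag([1.0, -1.0, 0.1])     # symmetric, indefinite, small trace
M = np.linalg.solve(A, B)         # basis matrix of S, so that A M = B
N = (np.trace(B) / np.linalg.norm(B) ** 2) * M  # optimal N in S by (6)
AN = A @ N                         # equals (tr(B)/||B||_F^2) B, symmetric
print(np.linalg.eigvalsh(AN))      # approx [-0.0498, 0.0050, 0.0498]
```

Here AN is symmetric but not positive definite, and all three eigenvalue moduli are below 1, so the conclusion λ1 ≥ 1 of Theorem 19 indeed fails without positive definiteness.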
The following corollary improves the lower bound zero on both tr (AN), given in (4), and cos (AN, I), given in (17).
Corollary 21. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Suppose that matrix AN is symmetric and positive definite. Then,
tr (AN) ≥ 1, cos (AN, I) ≥ 1/√n.
Proof. Denote by ∥·∥2 the spectral norm. Using the well-known inequality ∥·∥2 ≤ ∥·∥F [19], Theorem 19, and (4), we get
1 ≤ λ1 = ∥AN∥2 ≤ ∥AN∥F = √(tr (AN)),
and then (16) yields cos (AN, I) = ∥AN∥F/√n ≥ 1/√n.
Our last theorem improves the upper bound given in (60) for the special case that the orthogonal projection AN is symmetric positive definite.
Theorem 22. Let A ∈ ℝn×n be nonsingular and let S be a linear subspace of ℝn×n. Let N be the solution to the problem (3). Suppose that matrix AN is symmetric and positive definite. Then, all the eigenvalues of matrix AN satisfy
5. Conclusion
In this paper, we have considered the orthogonal projection AN (in the Frobenius sense) of the identity matrix onto an arbitrary matrix subspace AS (A ∈ ℝn×n nonsingular, S ⊂ ℝn×n). Among other geometrical properties of matrix AN, we have established a strong relation between the quality of the approximation AN ≈ I and the cosine of the angle ∠(AN, I). Also, the distance between AN and the identity has been related to the ratio n∥A∥F/|tr (A)| (which is independent of the subspace S). The spectral analysis has provided lower and upper bounds on the largest eigenvalue of the symmetric positive definite orthogonal projections of the identity.
Acknowledgments
The authors are grateful to the anonymous referee for valuable comments and suggestions, which have improved the earlier version of this paper. This work was partially supported by the “Ministerio de Economía y Competitividad” (Spanish Government), and FEDER, through Grant Contract CGL2011-29396-C03-01.