1. Introduction
The least squares method covers a wide range of applications in signal processing and system identification [1–5]. Many technical applications need robust and fast algorithms for fitting ellipses to given points in the plane. Established methods are Bookstein's conic fitting and Fitzgibbon's direct ellipse-specific fitting, in which an algebraic distance is minimized under a quadratic constraint [6, 7]. In this paper, we develop an extended theory of least squares minimization with a quadratic constraint based on the ideas of Bookstein and Fitzgibbon. We show the existence of a minimal solution and characterize it uniquely by the smallest positive generalized eigenvalue. In this way, arbitrary conic fitting problems with quadratic constraints become tractable.
Let A ∈ ℝn×m be a matrix with n ≥ m ≥ 2, let C ∈ ℝm×m be a symmetric matrix, and let d ∈ ℝ be a real value. We consider the problem of finding a vector x ∈ ℝm which minimizes the function F : ℝm → ℝ defined by

(1.1) F(x) = ∥Ax∥², subject to xtCx = d.

The side condition xtCx = d introduces an absolute quadratic constraint. The problem (1.1) is not a special case of Gander's optimization as presented in [8], because in our case C is an arbitrary real symmetric matrix, whereas Gander's side condition involves real symmetric matrices CtC which are positive definite. For our considerations, we require the following three assumptions.
Assumption 1.1. By replacing C by (−C) if necessary, we may consider d ≥ 0. For d = 0, the trivial solution x = 0 ∈ ℝm fulfills (1.1). Therefore, we demand d > 0.
Assumption 1.2. The set N := {x ∈ ℝm : xtCx = d} is not empty, that is, the matrix C has at least one positive eigenvalue. If C had only nonpositive eigenvalues, it would be negative semidefinite and

(1.2) xtCx ≤ 0 for all x ∈ ℝm

would hold. With d > 0 it follows that the set N would be empty in this case.
Assumption 1.3. In the following, we set S = AtA and assume that S is regular. S is sometimes called the scatter matrix.
In the following two sections, we introduce the theoretical basics of this optimization. The main result is the solution of a generalized eigenvalue problem. Afterwards, we reduce this system numerically to an eigenvalue problem. In Section 5, we present four typical applications of conic fitting with quadratic constraints. These applications comprise the ellipse fitting of Fitzgibbon, the hyperbola fitting of O'Leary, the conic fitting of Bookstein, and an optical application to shrunk aspheres [6, 7, 9, 10].
2. Existence of a Minimal Solution
Theorem 2.1. If S ∈ ℝm×m is regular, then there exists a global minimum to the problem (1.1).
Proof. The real regular matrix S = AtA is symmetric and positive definite. Therefore, a Cholesky decomposition S = RtR exists with a regular upper triangular matrix R ∈ ℝm×m. In (1.1), we are looking for a solution x ∈ ℝm minimizing

(2.1) F(x) = ∥Ax∥² = xtSx = ∥Rx∥², subject to xtCx = d.

With R regular, we substitute x by R−1y for y ∈ ℝm. Thus, we obtain a problem equivalent to (2.1), in which we want to find a vector y ∈ ℝm minimizing

(2.2) F(y) = ∥y∥², subject to yt(R−1)tCR−1y = d.
Now, we define G : ℝm → ℝ with G(y) = yt(R−1)tCR−1y − d and look for a solution y on the zero-set NG of G with minimal distance to the origin. Let y0 ∈ NG ≠ ∅, and let K be the closed ball in ℝm around 0 with radius r0 = ∥y0∥. Because K is compact and G is continuous, the set

(2.3) M := NG ∩ K = {y ∈ ℝm : G(y) = 0, ∥y∥ ≤ r0}

is nonempty, closed, and bounded. Therefore, the continuous function F(y) = ∥y∥² attains a minimum on M at a point yM with

(2.4) F(yM) ≤ F(y) for all y ∈ M.

For all y ∈ NG∖M, we have ∥y∥ > r0 and hence F(y) > F(y0) ≥ F(yM). So, yM is a minimum of F in NG. By the equivalence of (2.1) and (2.2), the assertion follows.
3. Generalized Eigenvalue Problem
The minimization problem in (1.1) induces a generalized eigenvalue problem. The following theorem was already proven by Bookstein and Fitzgibbon for the special case of ellipse fitting [6, 7].
Theorem 3.1. If xs is an extremum of F(x) subject to xtCx = d, then a positive λ0 ∈ ℝ exists with

(3.1) Sxs = λ0Cxs,

that is, xs is an eigenvector to the generalized eigenvalue λ0, and

(3.2) F(xs) = λ0d

holds.
Proof. Let G : ℝm → ℝ be defined as G(x) := d − xtCx. From G(x) = 0 and d > 0, it follows that x ≠ 0. Further, G is continuously differentiable with dG/dx = −2Cx ≠ 0 for all x of the zero-set of G. So, if xs is a local extremum of F(x) subject to G(x) = 0, then rank((dG/dx)(xs)) = 1. Since F is also a continuously differentiable function in ℝm with m > 1, it follows by using a Lagrange multiplier [11]: if xs is a local extremum of F(x) subject to G(x) = 0, then a λ0 ∈ ℝ exists such that the Lagrange function ϕ : ℝm+1 → ℝ given as

(3.3) ϕ(x, λ) = F(x) + λG(x) = ∥Ax∥² + λ(d − xtCx)

has a critical point in (xs, λ0). Therefore, xs necessarily fulfills the equations:

(3.4) Sxs − λ0Cxs = 0,

(3.5) xstCxs = d.

The first equation describes a generalized eigenvalue problem with

(3.6) Sx = λCx.

With d > 0, xs ≠ 0, and since xs fulfills (3.6), λ0 must be a generalized eigenvalue and xs a corresponding eigenvector to λ0 of (3.6), so that (S − λ0C) is a singular matrix. If λ0 is an eigenvalue and x0 ≠ 0 a corresponding eigenvector to λ0 of (3.6), then every vector αx0 is also a solution of (3.6) for λ0. Now, we are looking for α such that xs = αx0 satisfies (3.5). For λ0 ≠ 0, it follows with (3.4) that

(3.7) d = xstCxs = (1/λ0) xstSxs = (α² x0tSx0)/λ0.

Because the left side and the numerator are positive, the denominator must also be chosen positive, that is, only positive eigenvalues solve (3.4) and (3.5). By multiplication with λ0,

(3.8) λ0d = α² x0tSx0

follows, and xs = α·x0 fulfills the constraint G(xs) = 0.
Remark 3.2. Let x0 be a generalized eigenvector to a positive eigenvalue λ0 of problem (3.6). Then

(3.9) xs = ±√(λ0d/(x0tSx0)) · x0

are solutions of (3.8).
Lemma 3.3. If S is regular and C is symmetric, then all eigenvalues of (3.1) are real-valued and different from zero.
Proof. With det(S) ≠ 0, we have λ0 ≠ 0 in (3.1). The Cholesky decomposition S = RtR with a regular upper triangular matrix R turns (3.1) into

(3.10) RtRxs = λ0Cxs.

With R invertible and the substitution μ0 = 1/λ0, ys = Rxs, we obtain an eigenvalue problem for the matrix (Rt)−1CR−1:

(3.11) (Rt)−1CR−1ys = μ0ys.

Furthermore, we have

(3.12) ((Rt)−1CR−1)t = (R−1)tCt((Rt)−1)t = (Rt)−1CR−1.

Therefore, the matrix (Rt)−1CR−1 is symmetric and all eigenvalues μ0 are real. With λ0 = 1/μ0 for μ0 ≠ 0, the proposition follows.
Remark 3.4. Because S is regular and λ ≠ 0 in

(3.13) Sx = λCx,

we can consider the equivalent problem with μ = 1/λ instead of (3.11):

(3.14) S−1Cx = μx.

This system is called the inverse eigenvalue problem to (3.1). Here, the eigenspaces to the generalized eigenvalue λ0 in (3.1) and to the eigenvalue 1/λ0 in (3.14) are identical. Therefore, the generalized eigenvectors in (3.1) are perpendicular.
Definition 3.5. The set of all eigenvalues of a matrix C is called the spectrum σ(C). We also call the set of all eigenvalues of the generalized eigenvalue problem in (3.1) a spectrum and denote it by σ(S, C). σ+(S, C) is defined as the set of all positive values in σ(S, C).
Remark 3.6. In the case rg(C) < m = rg(S), the inverse problem in (3.14) has the eigenvalue 0 with multiplicity rg(S) − rg(C). Apart from that, for μ ∈ σ(S−1C) with μ ≠ 0, it follows that 1/μ ∈ σ(S, C). Analogously, for μ ∈ σ((Rt)−1CR−1) with μ ≠ 0, we have 1/μ ∈ σ(S, C).
The following lemma is a modified result of Fitzgibbon [7].
Lemma 3.7. The signs of the generalized eigenvalues of (3.1) are the same as the signs of the eigenvalues of C.
Proof. With S being nonsingular, every generalized eigenvalue λ0 of (3.1) is nonzero. Therefore, it follows for the equivalent problem (3.11) that μ0 = 1/λ0 is a nonzero eigenvalue of (Rt)−1CR−1 with the same sign as λ0, where R is the upper triangular matrix of the Cholesky decomposition of S. With Sylvester's Inertia Law, we know that the signs of the eigenvalues of the matrix (Rt)−1CR−1 are the same as those of C.
For the following proofs, we need the lemma of Lagrange (see, e.g., [12]).
Lemma 3.8 (Lemma of Lagrange). For M ⊂ ℝn, f : M → ℝ, g = (g1, …, gk) : M → ℝk, and Ng = {x ∈ M : g(x) = 0 ∈ ℝk}, let λ ∈ ℝk be such that xs ∈ Ng is a minimum of the function Φλ : M → ℝ with

(3.15) Φλ(x) = f(x) + λtg(x).

Then xs is a minimal solution of f in Ng.
Definition 3.9. Let λ* be the smallest positive value of σ+(S, C) and x* a corresponding generalized eigenvector to λ* subject to the constraint (x*)tCx* = d.
Lemma 3.10. Let S − λC be a positive semidefinite matrix for λ ∈ σ+(S, C). Then a generalized eigenvector xs corresponding to λ is a local minimum of (1.1).
Proof. We consider Φ : ℝm → ℝ with

(3.16) Φ(x) = xtSx + λ(d − xtCx).

With gradxΦ(x) = 2(Sx − λCx), it holds that gradxΦ(xs) = 0, and since the Hessian matrix 2(S − λC) of Φ is positive semidefinite, the vector xs is a minimal solution of Φ. Then xs also minimizes F(x) subject to xtCx = d by Lemma 3.8.
Remark 3.11. In fact, in Lemma 3.10 we can require only a positive semidefinite matrix, because xs ≠ 0 fulfills (S − λC)xs = 0 in (3.1).
Lemma 3.12. The matrix (S − λ*C) is positive semidefinite.
Proof. Let μ be an arbitrary eigenvalue of ((λ*)−1I − (Rt)−1CR−1), where R is the upper triangular matrix of the Cholesky decomposition of S. With

(3.17) det((λ*)−1I − (Rt)−1CR−1 − μI) = det(((λ*)−1 − μ)I − (Rt)−1CR−1),

it follows that ((λ*)−1 − μ) is an eigenvalue of (Rt)−1CR−1. By (3.11), this value corresponds to the inverse of a generalized eigenvalue of problem (3.1). Since λ* is the smallest value in σ+(S, C) and the positive eigenvalues of (Rt)−1CR−1 are the inverses of the values in σ+(S, C), every eigenvalue of (Rt)−1CR−1 is at most (λ*)−1, and it yields

(3.18) (λ*)−1 − μ ≤ (λ*)−1.

So μ ≥ 0 follows, that is, ((λ*)−1I − (Rt)−1CR−1) is positive semidefinite, and for y ∈ ℝm we obtain

(3.19) yt((λ*)−1I − (Rt)−1CR−1)y ≥ 0.

By setting y = Rx and with regular R, we get

(3.20) (λ*)−1xtSx − xtCx ≥ 0 for all x ∈ ℝm.

With λ* > 0, it follows that xt(S − λ*C)x ≥ 0 for all x ∈ ℝm, that is, S − λ*C is positive semidefinite.
Theorem 3.13. For the smallest value λ* in σ+(S, C), there exists a corresponding generalized eigenvector x* which minimizes F(x) subject to xtCx = d.
Proof. The matrix (S − λ*C) is positive semidefinite by Lemma 3.12, and with Lemma 3.10 it follows that x* is a local minimum of problem (1.1). Furthermore, we know by Theorem 3.1 that if xs is a local extremum of F(x) subject to xtCx = d, then a positive value λs exists with

(3.21) Sxs = λsCxs,

and it is F(xs) = λsd. Because of the existence of a minimum xE in Theorem 2.1, a value λE ∈ σ+(S, C) exists in problem (3.21) corresponding to xE. On the other hand, for an arbitrary local minimum xs,

(3.22) λEd = F(xE) ≤ F(xs) = λsd.

So λ* = λE follows, and x* is a minimum of F(x) subject to xtCx = d.
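The result of Theorem 3.13 translates directly into a short numerical procedure. The following Python sketch is illustrative only: the helper name, the use of scipy's generalized eigensolver instead of the reduction of Section 4, and the tolerance are our own choices, not part of the paper. It selects the smallest positive generalized eigenvalue of Sx = λCx and rescales the eigenvector according to Remark 3.2 so that the constraint xtCx = d holds.

```python
# Minimal sketch of Theorem 3.13 (illustrative, not from the paper):
# minimize ||Ax||^2 subject to x^t C x = d via the smallest positive
# generalized eigenvalue of S x = lambda C x, with S = A^t A.
import numpy as np
from scipy.linalg import eig

def constrained_lsq(A, C, d, tol=1e-10):
    S = A.T @ A                              # scatter matrix, assumed regular
    w, V = eig(S, C)                         # generalized eigenpairs of (S, C)
    w = np.real(w)                           # eigenvalues are real by Lemma 3.3
    pos = np.where(np.isfinite(w) & (w > tol), w, np.inf)
    i = int(np.argmin(pos))                  # smallest positive eigenvalue lambda*
    lam, x0 = w[i], np.real(V[:, i])
    # Rescale as in Remark 3.2 so that x^t C x = d (equivalently F(x) = lambda*·d).
    x = np.sqrt(lam * d / (x0 @ S @ x0)) * x0
    return x, lam
```

For well-conditioned S this direct approach suffices; the reduction in Section 4 provides an alternative when S is ill-conditioned or C has low rank.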
Example 3.14. We minimize F : ℝ2 → ℝ with F(x) = ∥x∥² = ξ1² + ξ2² subject to ξ1² − ξ2² = 1. So, we have d = 1, S the identity matrix I2 ∈ ℝ2×2, and C ∈ ℝ2×2 the diagonal matrix with the entries 1 and −1. Then, we get the following generalized eigenvalue problem:

(3.23) I2x = λCx

with eigenvalues 1 and −1. Because of Theorem 3.1, we consider a generalized eigenvector (α, 0)t with α ∈ ℝ∖{0} only for λ = 1. Then, (1, 0)t and (−1, 0)t are the solutions subject to ξ1² − ξ2² = 1. This result conforms to the geometric interpretation, since we are looking for the point x = (ξ1, ξ2)t on the hyperbola ξ1² − ξ2² = 1 with minimal distance to the origin.
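As a quick numerical check of this example (a sketch only; scipy's generalized eigensolver is used in place of the analytic computation):

```python
# Numerical check of Example 3.14 (illustrative): S = I_2, C = diag(1, -1), d = 1.
import numpy as np
from scipy.linalg import eig

S, C = np.eye(2), np.diag([1.0, -1.0])
w, V = eig(S, C)
print(np.real(w))        # generalized eigenvalues: 1 and -1
# Only lambda = 1 is positive; its eigenvectors (alpha, 0)^t scaled to the
# constraint x^t C x = 1 give the minimizers (1, 0)^t and (-1, 0)^t.
```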
4. Reduction to an Eigenvalue Problem of Dimension rg(C)
In numerical applications, a generalized eigenvalue problem is usually reduced to an eigenvalue problem, for example, by multiplication with S−1. Thus, we obtain the inverse problem (3.14) from (3.1) (see, e.g., [13]). But S may be ill-conditioned, so that a solution of (3.14) may be numerically unstable. Therefore, we present another reduction of (3.1).
Frequently, C is a sparse matrix with r := rank(C) ≤ rank(S). The symmetric matrix C is diagonalizable as C = PtDP with P orthogonal and D diagonal. Further, we assume that the first r diagonal entries of D are different from 0. For the characteristic polynomial of (3.1), it follows that

(4.1) p(λ) := det(S − λC) = det(P(S − λC)Pt) = det(PSPt − λD).

The degree of p is r. We decompose these matrices into

(4.2) PSPt = [S1, S2; S2t, S3], D = [D1, 0; 0, 0]

with S1, D1 ∈ ℝr×r, S2 ∈ ℝr×(m−r), and S3 ∈ ℝ(m−r)×(m−r). Now, we eliminate S2 in PSPt by multiplications with Givens rotations Gk ∈ ℝm×m, k = 1, …, l, so that

(4.3) Gl⋯G1 PSPt = [Σ1, 0; Σ2, Σ3], Gl⋯G1 D = [Δ1, 0; Δ2, 0]

with Σ1, Δ1 ∈ ℝr×r, Σ2, Δ2 ∈ ℝ(m−r)×r, and Σ3 ∈ ℝ(m−r)×(m−r). In (4.1), we achieve with the orthogonal Gk, k = 1, …, l,

(4.4) p(λ) = det(Gl⋯G1(PSPt − λD)) = det(Σ1 − λΔ1) · det(Σ3).
Because of p(0) = det(PSPt) = det(S) ≠ 0 and p(0) = det(Σ1)·det(Σ3), the submatrices Σ1, Σ3 are regular and the generalized eigenvalues determined by det(Σ1 − λΔ1) = 0 are different from zero. So, with y ∈ ℝr,

(4.5) Σ1y = λΔ1y

can be transformed into the equivalent eigenvalue problem

(4.6) Δ1−1Σ1y = λy.

This system can be solved by finding the matrix X with Δ1X = Σ1 using Gaussian elimination and determining the eigenvalues of X with the QR algorithm [13]. Because all steps are equivalent, we have σ(S, C) = σ(Δ1−1Σ1), that is, the eigenvalues of (3.1) and (4.6) are the same.
With Theorem 3.13, we are looking for the smallest value λ* ∈ σ+(S, C) and a corresponding generalized eigenvector x* to minimize the problem (1.1). So,

(4.7) Sx* = λ*Cx*

holds. By substitution of y* = Px* for x*, we obtain

(4.8) Gl⋯G1(PSPt − λ*D)y* = 0.

We decompose y* into the subvectors y1* ∈ ℝr and y2* ∈ ℝm−r with y* = ((y1*)t, (y2*)t)t. Then, y1* is a generalized eigenvector for λ* of the problems (4.5) and (4.6).

Let y1* be an eigenvector to the smallest positive eigenvalue λ* of (4.6). Since Σ3 is regular, it follows from (4.8) that

(4.9) y2* = −Σ3−1(Σ2 − λ*Δ2)y1*,

and a generalized eigenvector x* for λ* in (3.1) is given as

(4.10) x* = Pt((y1*)t, (y2*)t)t = Pty*.
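A compact numerical sketch of this reduction is given below (illustrative only; the variable names are ours, and a complete QR factorization takes over the role of the Givens rotations Gk, since any orthogonal elimination of the block S2 yields the form (4.3)). It assumes rank(C) < m; the resulting eigenvector can afterwards be rescaled to the constraint xtCx = d as in Remark 3.2.

```python
# Sketch of the reduction in Section 4 (illustrative). An orthogonal QR
# factorization replaces the Givens rotations used to eliminate the block S2.
import numpy as np

def reduced_gevp(S, C, tol=1e-10):
    m = S.shape[0]
    # Diagonalize C = P^t D P with P orthogonal; nonzero eigenvalues of C first.
    w, Q = np.linalg.eigh(C)                     # C = Q diag(w) Q^t, i.e. P = Q^t
    order = np.argsort(np.abs(w) <= tol)         # indices with |w| > tol come first
    w, Q = w[order], Q[:, order]
    r = int(np.sum(np.abs(w) > tol))             # r = rank(C), assumed < m
    w[r:] = 0.0
    P = Q.T
    T, D = P @ S @ P.T, np.diag(w)
    # Orthogonal G with (G T)[:r, r:] = 0, giving the block form (4.3).
    Q0 = np.linalg.qr(T[:, r:], mode='complete')[0]
    G = np.vstack([Q0[:, m - r:].T, Q0[:, :m - r].T])
    GT, GD = G @ T, G @ D
    Sigma1, Sigma2, Sigma3 = GT[:r, :r], GT[r:, :r], GT[r:, r:]
    Delta1, Delta2 = GD[:r, :r], GD[r:, :r]
    # r-dimensional eigenvalue problem (4.6), with Delta1 assumed regular.
    X = np.linalg.solve(Delta1, Sigma1)          # Delta1 X = Sigma1
    lam, Y = np.linalg.eig(X)
    lam = np.real(lam)
    i = int(np.argmin(np.where(lam > tol, lam, np.inf)))   # smallest positive value
    y1 = np.real(Y[:, i])
    # Back substitution (4.9) and back transformation (4.10).
    y2 = -np.linalg.solve(Sigma3, (Sigma2 - lam[i] * Delta2) @ y1)
    return lam[i], P.T @ np.concatenate([y1, y2])
```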
5. Applications in Conic Fitting
5.1. Fitzgibbon's Ellipse Fitting
First, we would like to find an ellipse for a given set of points in ℝ2. Generally, a conic in ℝ2 is implicitly defined as the zero set of f : ℝ6 × ℝ2 → ℝ for a constant parameter a = (α1, …, α6)t ∈ ℝ6:

(5.1) f(a, ξ, η) = α1ξ² + α2ξη + α3η² + α4ξ + α5η + α6.

The equation f(a, ξ, η) = 0 can also be written with x = (ξ, η)t as

(5.2) f(a, x) = xtAx + btx + α6 = 0, where A = [α1, α2/2; α2/2, α3] and b = (α4, α5)t.
The eigenvalues λ1, λ2 of A characterize a conic uniquely [14]. Thus, we need λ1λ2 > 0 for ellipses in f(a, x) = 0. Furthermore, every scaled vector μa with μ ∈ ℝ∖{0} describes the same zero-set of f. So, we can impose the constraint for ellipses as 4α1α3 − α2² = 1. For n (n ≥ 6) given points (ξi, ηi)t ∈ ℝ2, we want to find a parameter a ∈ ℝ6 which minimizes F : ℝ6 → ℝ with

(5.3) F(a) = Σi=1,…,n f(a, ξi, ηi)².

This ellipse fitting problem was established and solved by Fitzgibbon [7]. With the matrices D ∈ ℝn×6, C ∈ ℝ6×6 given by

(5.4) D = (ξi², ξiηi, ηi², ξi, ηi, 1)i=1,…,n, and C with the only nonzero entries C13 = C31 = 2, C22 = −1,

and with atCa = 4α1α3 − α2², we achieve the equivalent problem:

(5.5) F(a) = ∥Da∥², subject to atCa = 1.
For S = DtD, we have a special case of (1.1). Assuming S is a regular matrix and since the eigenvalues of C are −2, −1, 0, and 2, we know by Lemmas 3.3 and 3.7 that the generalized eigenvalue problem

(5.6) Sa = λCa

has exactly one positive solution λ* ∈ ℝ. Because of Theorem 3.13, a corresponding generalized eigenvector a* to λ* minimizes the problem (5.5), and a* consists of the coefficients of an implicitly given ellipse.
A numerically stable noniterative algorithm to solve this optimization problem is presented by Halir and Flusser [15]. In comparison with Section 4, their method uses a special block decomposition of the matrices D and C.
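For illustration, a short Python sketch of this fit is given below (our own illustrative code, not the Halir and Flusser algorithm): it builds D and C in the coefficient ordering of (5.1), with C encoding 4α1α3 − α2² = 1, and selects the smallest positive generalized eigenvalue as in Theorem 3.13.

```python
# Illustrative sketch of the ellipse fit (5.5); the function name is ours.
import numpy as np
from scipy.linalg import eig

def fit_ellipse(xi, eta):
    # rows of D: (xi^2, xi*eta, eta^2, xi, eta, 1)
    D = np.column_stack([xi**2, xi*eta, eta**2, xi, eta, np.ones_like(xi)])
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0                       # a^t C a = 4*a1*a3 - a2^2
    S = D.T @ D
    w, V = eig(S, C)
    w = np.real(w)
    i = int(np.argmin(np.where(np.isfinite(w) & (w > 0), w, np.inf)))
    a = np.real(V[:, i])                 # eigenvector to the single positive eigenvalue
    return a / np.sqrt(a @ C @ a)        # rescale so that a^t C a = 1
```

For degenerate point sets, S = DtD becomes singular and Assumption 1.3 is violated; the numerically stable method of Halir and Flusser mentioned above addresses this situation.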
5.2. Hyperbola Fitting
Instead of ellipses, O'Leary and Zsombor-Murray want to find a hyperbola for a set of scattered data xi ∈ ℝ2 [9]. A hyperbola is a conic which can be uniquely characterized by λ1λ2 < 0 [14]. So, we consider the constraint at(−C)a = 1 and obtain the optimization problem:

(5.7) F(a) = ∥Da∥², subject to at(−C)a = 1,

with D and C chosen as in Section 5.1. The matrix (−C) has two positive eigenvalues. In this case, a solution is given by a generalized eigenvector to the smallest value in σ+(S, −C). But O'Leary and Zsombor-Murray determine the best hyperbolic fit by evaluating the eigenvectors ai = (ai,1, …, ai,6)t associated to the positive values of σ+(S, −C).
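In code, this is a small variation of the fit_ellipse sketch above (again purely illustrative, reusing its variables): the constraint matrix is negated and the eigenvector is rescaled to the flipped constraint.

```python
# Hyperbola variant of the fit_ellipse sketch (illustrative): replace C by -C.
w, V = eig(S, -C)                        # generalized eigenpairs of (S, -C)
w = np.real(w)
i = int(np.argmin(np.where(np.isfinite(w) & (w > 0), w, np.inf)))
a = np.real(V[:, i])
a = a / np.sqrt(a @ (-C) @ a)            # rescale so that a^t(-C)a = 1
```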
5.3. Bookstein's Conic Fitting
In Bookstein's method, the conic constraint is restricted to

(5.8) λ1² + λ2² = 1,

where λ1, λ2 are the eigenvalues of A in f [6]. There, λ1 and λ2 are real and at least one of them is different from zero. But the constraint (5.8) is not a restriction to a special class of conics. Here, we determine an arbitrary conic which minimizes

(5.9) F(a) = ∥Da∥², subject to the constraint (5.8).

The resulting data matrix D ∈ ℝn×6 is the same as for Fitzgibbon's problem. The constraint matrix C ∈ ℝ6×6 has a diagonal shape with the entries (2, 1, 2, 0, 0, 0), that is, all eigenvalues of C are nonnegative. In the case of a regular matrix S, the problem (5.9) is solved by a generalized eigenvector to the smallest value in σ+(S, C).
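A minimal illustrative variation of the earlier sketches covers this case as well; only the constraint matrix changes.

```python
# Bookstein variant (illustrative): same data matrix D as in fit_ellipse,
# diagonal constraint matrix; solve for the smallest value in sigma+(S, C).
import numpy as np
C_bookstein = np.diag([2.0, 1.0, 2.0, 0.0, 0.0, 0.0])
# Proceed exactly as in fit_ellipse with C replaced by C_bookstein.
```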
5.4. Approximation of Shrunk Aspheres
After the molding process in optical applications, the shrinkage of rotation-symmetric aspheres is implicitly defined for x = (ξ, ζ)t in

(5.10)

where r ∈ ℝ∖{0} and a = (a1, …, a4)t are aspheric-specific constants [10]. For i = 1, …, n with n ≥ 4, the scattered data xi = (ξi, ζi)t ∈ ℝ2 of a shrunk asphere are given in this approximation problem. Here, we are looking for the conic parameter a = (α1, …, α4)t for a fixed value rref which minimizes

(5.11)
Analogously to Fitzgibbon, we have the matrices D ∈ ℝn×4 and C ∈ ℝ4×4 with

(5.12)

and we obtain the following optimization problem:

(5.13)
This is also an application of (1.1). The matrix C has the eigenvalues −2, 0, 1, and 2. So, the generalized eigenvalue problem in (3.1) with regular S = DtD ∈ ℝ4×4 has two positive values in σ+(S, C). With Theorem 3.13, a generalized eigenvector a* ∈ ℝ4 to the smaller of these two values solves (5.13).
The coefficients αi in the problems (5.5) and (5.13) do not correspond to the same monomials ξkζl. Hence, the matrices D and C are different.
6. Conclusion
In this paper, we present a minimization problem of least squares subject to absolute quadratic constraints. We develop a closed theory with the main result that a minimum is the solution of a generalized eigenvalue problem corresponding to the smallest positive eigenvalue. Further, we show a reduction to an eigensystem for numerical calculations. Finally, we study four applications in conic approximation. We analyze Fitzgibbon's method for direct ellipse-specific fitting, O'Leary's direct hyperbola approximation, Bookstein's conic fitting, and an optical application to shrunk aspheres. All these systems can be attributed to the general optimization problem.