Volume 2012, Issue 1, Article ID 806945
Research Article
Open Access

A Comparison between Fixed-Basis and Variable-Basis Schemes for Function Approximation and Functional Optimization

Giorgio Gnecco

Corresponding author: Giorgio Gnecco, Department of Communication, Computer and System Sciences (DIST), University of Genova, Via Opera Pia 13, 16145 Genova, Italy

First published: 12 January 2012
Academic Editor: Jacek Rokicki

Abstract

Fixed-basis and variable-basis approximation schemes are compared for the problems of function approximation and functional optimization (also known as infinite programming). Classes of problems are investigated for which variable-basis schemes with sigmoidal computational units perform better than fixed-basis ones, in terms of the minimum number of computational units needed to achieve a desired error in function approximation or approximate optimization. Previously known bounds on the accuracy are extended, with better rates, to families of d-variable functions whose actual dependence is on a subset of d′ ≤ d variables, where the indices of these d′ variables are not known a priori.

1. Introduction

In functional optimization problems, also known as infinite programming problems, functionals have to be minimized with respect to functions belonging to subsets of function spaces. Function-approximation problems, the classical problems of the calculus of variations [1] and, more generally, all optimization tasks in which one has to find a function that is optimal in a sense specified by a cost functional belong to this family of problems. Such functions may express, for example, the routing strategies in communication networks, the decision functions in optimal control problems and economic ones, and the input/output mappings of devices that learn from examples.

Experience has shown that optimization of functionals over admissible sets of functions made up of linear combinations of relatively few basis functions with a simple structure and depending nonlinearly on a set of “inner” parameters (e.g., feedforward neural networks with one hidden layer and linear output activation units) often provides surprisingly good suboptimal solutions. In such approximation schemes, each function depends on both external parameters (the coefficients of the linear combination) and inner parameters (the ones inside the basis functions). These are examples of variable-basis approximators, since the basis functions are not fixed but their choice depends on that of the inner parameters. In contrast, classical approximation schemes (such as the Ritz method in the calculus of variations [1]) do not use inner parameters but employ fixed basis functions, and the corresponding approximators exhibit only a linear dependence on the external parameters. Hence, they are called fixed-basis or linear approximators. In [2], certain variable-basis approximators were applied to obtain approximate solutions to functional optimization problems. This technique was later formalized as the extended Ritz method (ERIM) [3] and was motivated by the innovative and successful application of feedforward neural networks in the late 1980s. For experimental results and theoretical investigations about the ERIM, see [2–7] and the references therein.

The basic motivation to search for suboptimal solutions of these forms is quite intuitive: when the number of basis functions becomes sufficiently large, the convergence of the sequence of suboptimal solutions to an optimal one may be ensured by suitable properties of the set of basis functions, the admissible set of functions, and the functional to be optimized [1, 5, 8]. Computational feasibility requirements (i.e., memory occupancy and time needed to find sufficiently good values for the parameters) make it crucial to estimate the minimum number of computational units needed by an approximator to guarantee that suboptimal solutions are “sufficiently close” to an optimal one. Such a number plays the role of “model complexity” of the approximator and can be studied with tools from linear and nonlinear approximation theory [9, 10].

Compared with fixed-basis approximators, in variable-basis ones the nonlinearity of the parametrization of the basis functions may cause the loss of useful properties of best approximation operators [11], such as uniqueness, homogeneity, and continuity, but it may allow improved rates of approximation or approximate optimization [9, 12–14]. Then, to justify the use of variable-basis schemes instead of fixed-basis ones, it is crucial to investigate families of function-approximation and functional optimization problems for which, for a given desired accuracy, variable-basis schemes require a smaller number of computational units than fixed-basis ones. This is the aim of this work.

In the paper, the approximate solution of certain function-approximation and functional optimization problems via fixed- and variable-basis schemes is investigated. In particular, families of problems are presented, for which variable-basis schemes of a certain kind perform better than any fixed-basis one, in terms of the minimum number of computational units needed to achieve a desired worst-case error. Propositions 2.4, 2.7, 2.8, and 3.2 are the main contributions, which are presented after the exposition of results available in the literature.

The paper is organized as follows. Section 2 compares variable- and fixed-basis approximation schemes for function-approximation problems, which are particular instances of functional optimization. Section 3 extends the estimates to some more general families of functional optimization problems through the concepts of modulus of continuity and modulus of convexity of a functional. Section 4 is a short discussion.

2. Comparison of Bounds for Fixed- and Variable-Basis Approximation

Here and in the following, the “big O,” “big Ω,” and “big Θ” notations [18] are used. For two functions f, g : (0, +∞) → ℝ, one writes f = O(g) if and only if there exist M > 0 and x0 > 0 such that |f(x)| ≤ M|g(x)| for all x > x0, f = Ω(g) if and only if g = O(f), and f = Θ(g) if and only if both f = O(g) and f = Ω(g) hold. In order to use these notations also for multivariable functions, in the following it is assumed that all their arguments are fixed, with the exception of one of them (more precisely, the argument ɛ).

Two approaches have been adopted in the literature to compare the approximation capabilities of fixed- and variable-basis approximation schemes (see also [15] for a discussion on this topic). In the first one, one fixes the family of functions to be approximated (e.g., the unit ball in a Sobolev space [16]) and then finds bounds on the worst-case approximation error for functions belonging to such a family, for various approximation schemes (fixed- and variable-basis ones). The second approach, initiated by Barron [12, 17], fixes a variable-basis approximation scheme (e.g., the set of one-hidden-layer perceptrons with a given upper bound on the number of sigmoidal computational units) and searches for families of functions that are well approximated by such a scheme. Then, for these families of functions, the approximation capability of the variable-basis approximation scheme is compared with those of fixed-basis approximation schemes. In this context, one is interested in finding cases for which, the number of computational units being the same, the upper bounds on the worst-case approximation error for certain variable-basis approximation schemes are smaller than the corresponding lower bounds for any fixed-basis one, implying that such variable-basis schemes have better approximation capabilities than every fixed-basis one.

One problem of the first approach is that, for certain families of smooth functions to be approximated, the bounds on the worst-case approximation error obtained for fixed- and variable-basis approximation schemes are very similar. In particular, one typically obtains the so-called Jackson rate of approximation [4] n = Θ(ɛ^(−d/m)), where n is the number of computational units, ɛ > 0 is the worst-case approximation error, m is a measure of smoothness, and d is the number of variables on which such functions depend. Following the second approach, it was shown in [12, 17] that, for certain function-approximation problems, variable-basis schemes exhibit some advantages over fixed-basis ones (see Sections 2.1 and 2.2, where extensions of some results from [12, 17] are also derived).
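
As a purely numerical illustration of the gap between these two behaviors, the following sketch tabulates the number of units suggested by the Jackson-type rate n = Θ(ɛ^(−d/m)) and by the dimension-independent O(ɛ^(−2)) rate of Section 2.1; all multiplicative constants are set to 1, which is an assumption made only for this illustration.

```python
# Illustrative comparison of the two rates mentioned above: the Jackson-type
# rate n = Theta(eps^(-d/m)) for fixed-basis schemes and the dimension-free
# O(eps^(-2)) rate of Section 2.1. All multiplicative constants are set to 1,
# an assumption made only for this illustration.
import math

def n_fixed_basis(eps, d, m):
    """Units suggested by the Jackson rate eps^(-d/m) (constant factor = 1)."""
    return math.ceil(eps ** (-d / m))

def n_variable_basis(eps):
    """Units suggested by the O(eps^(-2)) rate (constant factor = 1)."""
    return math.ceil(eps ** (-2))

if __name__ == "__main__":
    eps, m = 0.1, 2  # desired accuracy and smoothness order (illustrative values)
    for d in (2, 5, 10, 20):
        print(d, n_fixed_basis(eps, d, m), n_variable_basis(eps))
```

Even for moderate d, the fixed-basis count grows exponentially with d, whereas the variable-basis count stays fixed.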

In Section 2.1, some bounds in the ℒ²-norm are considered, whereas Section 2.2 investigates bounds in the supnorm. Estimates in the ℒ²-norm can be applied, for example, to investigate the approximation of the optimal policies in static team optimization problems [19]. Estimates in the supnorm are required, for example, to investigate the approximation of the optimal policies in dynamic optimization problems with a finite number of stages [20]. Indeed, for such problems, the supnorm can be used to analyze the error propagation from one stage to the next one, while this is not the case for the ℒ²-norm [20]. Moreover, the supnorm provides guarantees on the approximation errors in the design of the optimal decision laws.

2.1. Bounds in the ℒ²-Norm

The following Theorem 2.1 from [12] describes a quite general set of functions of d real variables (described in terms of their Fourier distributions) whose approximation by variable-basis approximation schemes with sigmoidal computational units requires O(ɛ^(−2)) computational units, where ɛ > 0 is the desired worst-case approximation error measured in the ℒ²-norm. Recall that a sigmoidal function is defined in general as a bounded measurable function σ : ℝ → ℝ such that σ(y) → 1 as y → +∞ and σ(y) → 0 as y → −∞ [21]. For C > 0, d a positive integer, and B a bounded subset of ℝ^d containing 0, by ΓB,C we denote the set of functions f : ℝ^d → ℝ having a Fourier representation of the form
()
for some complex-valued measure (where F(dω) and θ(ω) are the magnitude distribution and the phase at the pulsation ω, resp.) such that
()
where 〈·, ·〉 is the standard inner product on ℝ^d. Functions in ΓB,C are continuously differentiable on B [12]. When B is the hypercube [−1,1]^d, the inequality (2.2) reduces to
()
where ∥·∥1 denotes the l1-norm.
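
For the special case of a function whose Fourier distribution is atomic (a finite sum of cosines), the integral bounded by C in (2.2)-(2.3) reduces, under the standard reading of those conditions, to a finite sum that can be computed directly; the following sketch does so for B = [−1,1]^d, with frequencies, amplitudes, and phases that are arbitrary illustrative choices.

```python
# A minimal sketch for the special case of an atomic Fourier distribution:
# f(x) = sum_j a_j * cos(<omega_j, x> + theta_j). For such f, the magnitude
# distribution F(domega) is concentrated on finitely many frequencies, and the
# quantity bounded by C in (2.3) reduces to sum_j |a_j| * ||omega_j||_1
# (an assumption of this illustration, consistent with the description above).
import numpy as np

omegas = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 1.0]])    # frequency vectors omega_j (rows), illustrative
amps   = np.array([0.5, 0.25])          # amplitudes a_j, illustrative
thetas = np.array([0.0, np.pi / 4])     # phases theta_j (irrelevant for the constant)

def f(x):
    """Evaluate the trigonometric sum at a point x of R^3."""
    return float(np.sum(amps * np.cos(omegas @ x + thetas)))

# Value of the quantity in (2.3) for this atomic distribution:
barron_type_constant = float(np.sum(np.abs(amps) * np.abs(omegas).sum(axis=1)))
print(f(np.zeros(3)), barron_type_constant)  # any C >= this value works for B = [-1,1]^3
```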

For a probability measure μ on B, we denote by ℒ²(B, μ) the Hilbert space of functions g : B → ℝ with inner product ⟨g, h⟩ = ∫_B gh dμ and the induced norm. When there is no risk of confusion, the simpler notation without explicit indication of the space is used for the norm.

Theorem 2.1 (see [12], Theorem 1). For every f ∈ ΓB,C, every sigmoidal function σ : ℝ → ℝ, every probability measure μ on B, and every n ≥ 1, there exist ak ∈ ℝ^d, bk, ck ∈ ℝ, and fn : B → ℝ of the form

()
such that
()

Variable-basis approximators of the form (2.4) are called one-hidden-layer perceptrons with n computational units. Formula (2.5) shows that at most
()
computational units are required to guarantee a desired worst-case approximation error ɛ in the ℒ²-norm, when variable-basis approximation schemes of the form (2.4) are used to approximate functions belonging to the set ΓB,C.
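
A minimal sketch of such a one-hidden-layer perceptron is the following; it assumes, for this illustration only, the standard parametrization f_n(x) = Σ_k c_k σ(⟨a_k, x⟩ + b_k) + c_0 for (2.4) and a bound of the form 2C/√n for (2.5), as in [12, Theorem 1].

```python
# A minimal sketch of a one-hidden-layer perceptron with n sigmoidal units.
# Assumptions of this sketch: (2.4) has the standard form
# f_n(x) = sum_k c_k * sigma(<a_k, x> + b_k) + c_0, and (2.5) is of the form
# 2C/sqrt(n), as in [12, Theorem 1].
import numpy as np

def sigma(y):
    """Logistic sigmoid: bounded, measurable, with limits 0 at -inf and 1 at +inf."""
    return 1.0 / (1.0 + np.exp(-y))

def perceptron(x, a, b, c, c0):
    """Evaluate f_n(x) = sum_k c[k] * sigma(<a[k], x> + b[k]) + c0."""
    return float(c @ sigma(a @ x + b) + c0)

def units_for_accuracy(eps, C):
    """Units sufficient for L2 error eps under the assumed 2C/sqrt(n) bound."""
    return int(np.ceil((2.0 * C / eps) ** 2))

rng = np.random.default_rng(0)
d, eps, C = 3, 0.1, 1.0
n = units_for_accuracy(eps, C)       # 400 units for these illustrative values
a = rng.normal(size=(n, d))          # inner parameters (placeholders, not trained)
b = rng.normal(size=n)
c = rng.normal(size=n) / n           # outer coefficients (placeholders, not trained)
print(n, perceptron(np.zeros(d), a, b, c, 0.0))
```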

In contrast, Theorem 2.2 from [12] shows that, when B is the unit hypercube [0,1]^d and μ = μu is the uniform probability measure on [0,1]^d, for the same set of functions ΓB,C the best linear approximation scheme requires Ω(ɛ^(−d)) computational units in order to achieve the same worst-case approximation error ɛ. The set of all linear combinations of n fixed basis functions h1, h2, …, hn in a linear space is denoted by span (h1, h2, …, hn).

Theorem 2.2 (see [12], Theorem 6). For every n ≥ 1 and every choice of fixed basis functions h1, h2, …, hn ∈ ℒ²([0,1]^d, μu), one has

()

Remark 2.3. Inspection of the proof of [12, Theorem  6] shows that the factors 1/8 and 1/n, which appear in the original statement of the theorem, have to be replaced by 1/16 and 1/2n in (2.7), respectively.

Inspection of the proof of Theorem 2.2 in [12] shows also that the lower bound (2.7) still holds if the set is replaced by either

()
or
()
where l denotes any multi-index and ∥l∥1 its norm (i.e., the sum of the components of l, which are nonnegative). Obviously, when B is the unit hypercube [0,1]^d, the upper bound (2.5) still holds under either of these two replacements, since both sets are contained in ΓB,C.

The inequality (2.7) implies that, for the uniform probability measure μu on [0,1]^d, at least
()
computational units are required to guarantee a desired worst-case approximation error ɛ in the ℒ²-norm, when fixed-basis approximation schemes of the form span (h1, h2, …, hn) are used to approximate functions in . Then, at least for a sufficiently small value of ɛ, Theorems 2.1 and 2.2 show that, for d > 2, variable-basis approximators of the form (2.4) provide a smaller approximation error than any fixed-basis one for functions in , the number of computational units being the same.

It should be noted that, for fixed C and ɛ, the estimate (2.6) is constant with respect to d, whereas the estimate (2.10) goes to 0 as d goes to +∞. So, a too small value of the lower bound (2.10) for fixed-basis approximation may make the theoretical advantage of variable-basis approximation of impractical use, since for large d it would be guaranteed only for sufficiently small ɛ (depending on C, too). In the following, families of d-variable functions are considered for which this drawback is mitigated. These are families of d-variable functions whose actual dependence is on a subset of d′ ≤ d variables, where the indices of these d′ variables are not known a priori. Such families are of interest, for example, in machine learning applications, for problems with redundant or correlated features. In this context, each of the d real variables represents a feature (e.g., a measure of some physical property of an object), and one is interested in learning a function of these features on the basis of a set of supervised examples. As often happens in applications, only a small subset of the features is useful for the specific task (typically, classification or regression), due to the presence of redundant or correlated features. Then, one may assume that the function to be learned depends only on a subset of d′ ≤ d features, but one may not know a priori which particular subset it is. The problem of finding such a subset (or finding a subset of features of sufficiently small cardinality d′ on which the function mostly depends, when the function depends on all the d features) is called the feature-selection problem [22].
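
As a toy illustration of this setting (with a target function, dimensions, and relevant indices chosen arbitrarily), the following sketch builds a function of d = 10 variables that depends only on d′ = 2 of them and shows that a single sigmoidal unit can ignore the irrelevant coordinates simply by assigning them zero inner weights, a form of adaptivity that a basis fixed in advance does not have.

```python
# A toy illustration of the feature-selection setting above: a function of
# d = 10 variables that actually depends only on d' = 2 of them (indices chosen
# arbitrarily here). A sigmoidal unit sigma(<a,x>+b) can "ignore" the remaining
# coordinates by having zero inner weights on them.
import numpy as np

d, relevant = 10, (2, 7)          # illustrative choice of relevant indices

def target(x):
    """A d-variable function depending only on x[2] and x[7]."""
    return np.cos(np.pi * x[relevant[0]]) + 0.5 * np.sin(np.pi * x[relevant[1]])

a = np.zeros(d)
a[list(relevant)] = [1.5, -0.8]   # inner weights: nonzero only on relevant features
b = 0.2
unit = lambda x: 1.0 / (1.0 + np.exp(-(a @ x + b)))   # one sigmoidal unit

x = np.random.default_rng(1).uniform(0.0, 1.0, size=d)
x_perturbed = x.copy()
x_perturbed[0] += 0.3             # change an irrelevant coordinate
print(target(x) == target(x_perturbed), unit(x) == unit(x_perturbed))  # True True
```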

For d′ a positive integer and d a multiple of it,  denotes the subset of functions in ΓB,C that depend only on d′ of their possible d arguments.

Proposition 2.4. For every n ≥ 1 and every choice of fixed basis functions h1, h2, …, hn ∈ ℒ²([0,1]^d, μu), for n ≤ (d + 1)/2 one has

()
and for n > (d + 1)/2
()

Proof. The proof is similar to the one of [12, Theorem 6]. The following is a list of the changes to that proof needed to derive (2.11) and (2.12). We denote by ∥l∥0 the number of nonzero components of the multi-index l. Proceeding as in the proof of [12, Theorem 6], we get

()
where m* is the smallest positive integer m such that the number of multi-indices l ∈ {0,1, …}^d with norm ∥l∥1 ≤ m that satisfy the constraint ∥l∥0 ≤ d′ is larger than or equal to 2n. More precisely, (2.13) is obtained by observing that, for such an integer m, the set contains at least 2n orthogonal cosinusoidal functions with ℒ²([0,1]^d, μu)-norm equal to C/4πm and applying [12, Lemma 6], which states that, for any orthonormal basis of a 2n-dimensional space, there does not exist a linear subspace of dimension n having distance smaller than 1/2 from every basis function in such an orthonormal basis. The constraint ∥l∥0 ≤ d′ is not present in the proof of [12, Theorem 6] and is due to the specific form of the set . Because of such a constraint, the functions in 𝒮2 with ∥l∥0 > d′ do not belong to .

Then we get

()
()
Indeed, for d′ = d the equality (2.14) follows recalling that the number of different ways of placing No identical objects in Nb distinct boxes is given in [23, Theorem 5.1], and in this case it is the same estimate as the one obtained in the proof of [12, Theorem 6]. Similarly, for 1 ≤ m ≤ d′ the constraint ∥l∥0 ≤ d′ is redundant, and we get again (2.14). Finally, for d/d′ a positive integer larger than 1 and m > 1, the upper bound in (2.15) is obtained by ignoring the constraint ∥l∥0 ≤ d′, whereas the lower bound is obtained as follows. First, we partition the set of d variables into d/d′ subsets of cardinality d′, and then we apply to each subset the estimate obtained by replacing d with d′ in (2.14). In this way, the multi-index l = 0 is counted d/d′ times (once for each subset), but the final estimate so obtained holds, since for m > 1 there are at least d/d′ − 1 other multi-indices that have not been counted in this process.
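
The counting step above can be checked by brute force for small values of d, d′, and m (the values below are illustrative): the unconstrained count of multi-indices with ∥l∥1 ≤ m equals the stars-and-bars value, and the constraint ∥l∥0 ≤ d′ only reduces it.

```python
# A brute-force check of the counting step above: the number of multi-indices
# l in {0,1,...}^d with ||l||_1 <= m (stars-and-bars gives comb(m+d, d)), and
# the same count under the extra constraint ||l||_0 <= d' used in Proposition 2.4.
# The specific values of d, d', m below are illustrative.
from itertools import product
from math import comb

def count(d, m, d_eff=None):
    """Count l in {0,...,m}^d with sum(l) <= m and (optionally) at most d_eff nonzeros."""
    total = 0
    for l in product(range(m + 1), repeat=d):
        if sum(l) <= m and (d_eff is None or sum(1 for li in l if li > 0) <= d_eff):
            total += 1
    return total

d, d_eff, m = 6, 2, 3
print(count(d, m), comb(m + d, d))   # unconstrained count matches comb(m+d, d) = 84
print(count(d, m, d_eff))            # constrained count (smaller in general)
```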

In the following, we apply (2.14) and (2.15) for m = 1 and m > 1, respectively. For m = 1, the condition becomes

()
so m* = 1 for n ≤ (d + 1)/2. This, combined with (2.13), proves (2.11).

Now, as in the proof of [12, Theorem 6], for m > 1 we exploit a bound derived from Stirling’s formula, according to which , so the condition holds if we impose

()
which is equivalent to
()
(note that, for n > (d + 1)/2, the value of m provided by (2.18) is indeed larger than 1, as required for the application of (2.15)). Since
()
we conclude that for n > (d + 1)/2. This, together with (2.13), proves the statement (2.12).

For the case considered by Proposition 2.4, the uniform probability measure on [0,1]^d, and 0 < ɛ < C/8π, formulas (2.11) and (2.12) show that at least
()
computational units are required to guarantee a desired worst-case approximation error ɛ in the ℒ²-norm, when fixed-basis approximation schemes of the form span (h1, h2, …, hn) are used to approximate functions in .

Remark 2.5. The quantity d′ in Proposition 2.4 has to be interpreted as an effective number of variables for the family of functions to be approximated. Roughly speaking, the flexibility of the neural-network architecture (2.4) allows one to identify, for each such function, the d′ variables on which it actually depends, whereas fixed-basis approximation schemes do not have this flexibility. Indeed, differently from the lower bound (2.10), for fixed C, ɛ, and d′ the lower bound (2.20) goes to +∞ as d goes to +∞. Finally, remarks similar to those in Remark 2.3 apply to Proposition 2.4.

2.2. Bounds in the Supnorm

The next result is from [17] and is analogous to Theorem 2.1, but it measures the worst-case approximation error in the supnorm.

Theorem 2.6 (see [17], Theorem 2). For every f ∈ ΓB,C and every n ≥ 1, there exists fn : B → ℝ of the form (2.4) such that

()

Upper bounds in the supnorm similar to the one from Theorem 2.6 are given, for example, in [24, 25]. Moreover, for the subset of functions that depend only on d′ of their d arguments, the following estimate holds.

Proposition 2.7. For every f in such a subset and every n ≥ 1, there exists fn : [0,1]^d → ℝ of the form (2.4) such that

()

Proof. Each such function depends on d′ arguments; let be their indices. Let be defined by , where , and all the other components of x are arbitrary in . Then , so by Theorem 2.6 there exists an approximation made up of n sigmoidal computational units and a constant term such that . Finally, we observe that can be extended to a function fn : [0,1]^d → ℝ of the form (2.4) such that ; then one obtains (2.22).

The estimates (2.21) and (2.22) show that at most
()
computational units, respectively, are required to guarantee a desired worst-case approximation error ɛ in the supnorm, when variable-basis approximation schemes of the form (2.4) are used to approximate functions belonging to the sets ΓB,C and , respectively.

The next proposition, combined with Theorem 2.6 and Proposition 2.7, allows one to compare the approximation capabilities of fixed- and variable-basis schemes in the supnorm, showing cases for which the upper bounds (2.21) and (2.22) are smaller than one of the corresponding lower bounds (2.24)–(2.26), at least for n sufficiently large.

Proposition 2.8. For every n ≥ 1 and every choice of fixed, bounded, and μu-measurable basis functions h1, h2, …, hn : [0,1]^d → ℝ, the following hold.

  • (i)

    For the approximation of functions in , one has

    ()

  • (ii)

    For the approximation of functions in , for n ≤ (d + 1)/2, one has

    ()
    whereas for n > (d + 1)/2

()

Proof. For each bounded and μu-measurable function g : [0,1]^d → ℝ, we get

()
so
()
Then we get the lower bounds (2.24)–(2.26) by (2.7), (2.11), and (2.12), respectively.

For the case considered by Proposition 2.8, the estimate (2.24) implies that at least computational units are required to guarantee a desired worst-case approximation error ɛ in the supnorm, when fixed-basis approximation schemes of the form span (h1, h2, …, hn) are used to approximate functions in . Similarly, for 0 < ɛ < C/8π, the bounds (2.25) and (2.26) imply that at least computational units are required when is replaced by . One can observe that, for each d, d′, and C, each of the lower bounds (2.25) and (2.26) is larger than (2.24). Moreover, all the other parameters being fixed, the lower bound (2.24) goes to 0 as d tends to +∞, whereas for d ≥ 2n − 1 the lower bound (2.25) holds, and it does not depend on the specific value of d. Finally, for d > 2, the upper bound (2.21) is smaller than the lower bound (2.24) for n sufficiently large, and similarly, for d′ > 2, the upper bound (2.22) is smaller than the lower bounds (2.25) and (2.26) for n sufficiently large. For instance, in the latter case and for d′ sufficiently small with respect to d, this happens for ⌈225d′²/π²⌉ ≤ n ≤ (d + 1)/2 and for
()
where K1 is a suitable positive constant and K2 = 2/(d′ − 2).

Remarks similar to those in Remark 2.3 can be made about the bounds in the supnorm derived in this section.

3. Application to Functional Optimization Problems

The results of Section 2 can be extended, with the same rates of approximation or similar ones, to the approximate solution of certain functional optimization problems. This can be done by exploiting the concepts of modulus of continuity and modulus of convexity of a functional, provided that continuity and uniform convexity assumptions are satisfied. The basic ideas are the following (see also [5] for a similar analysis).

3.1. Rates of Approximate Optimization in Terms of the Modulus of Continuity

Let 𝒳 be a normed linear space, X ⊆ 𝒳, and Φ : X → ℝ a functional. Suppose that the functional optimization problem
()
has a solution f, and let X1 ⊆ X2 ⊆ ⋯ ⊆ Xn ⊆ ⋯ ⊆ X be a nested sequence of subsets of X such that
()
for some ɛn > 0, where ɛn → 0 as n → +∞. Then, if the functional Φ is continuous, too, one has
()
where , defined by , is the modulus of continuity of Φ at f. For instance, if Φ is Lipschitz continuous with Lipschitz constant KΦ, one has , and by (3.2)
()
If an upper bound on ɛn in terms of n is known (e.g., ɛn = O(n^(−1/2)) under the assumptions of Theorem 2.1, where X = ΓB,C ⊆ ℒ²(B, μ) = 𝒳 and Xn is the set of functions of the form (2.4)), then the same upper bound (up to a multiplicative constant) holds on the error in approximate optimization. So, investigating the approximation capabilities of the sets Xn is useful for functional optimization purposes, too.
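
The following toy example makes the inequality chain concrete; the functional, target, and perturbation are chosen only for illustration and are not taken from the problems above. For Φ(f) = ∥f − t∥ on a grid discretization, which is Lipschitz continuous with constant KΦ = 1 by the triangle inequality, any fn within ɛn of the minimizer t yields a value of Φ within KΦɛn of the optimum.

```python
# A toy illustration of the rate-transfer argument of Section 3.1. The functional
# Phi(f) = ||f - t|| (distance to a fixed target t on a grid) is Lipschitz with
# constant K_Phi = 1 by the triangle inequality, and its minimizer is f = t.
# Any f_n with ||f_n - t|| <= eps_n then satisfies Phi(f_n) - Phi(t) <= K_Phi*eps_n.
# Target and perturbation below are arbitrary illustrative choices.
import numpy as np

grid = np.linspace(0.0, 1.0, 200)
t = np.sin(2 * np.pi * grid)                         # stand-in minimizer

def phi(f):
    """Discretized L2 distance of f from the target t."""
    return float(np.sqrt(np.mean((f - t) ** 2)))

eps_n = 0.05
f_n = t + eps_n * np.cos(2 * np.pi * grid)           # a suboptimal solution near t
gap = phi(f_n) - phi(t)                              # error in approximate optimization
dist = float(np.sqrt(np.mean((f_n - t) ** 2)))       # approximation error ||f_n - t||
print(gap, dist)                                     # gap <= K_Phi * dist holds
```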

3.2. Rates of Approximate Optimization in Terms of the Modulus of Convexity

When dealing with suboptimal solutions from a set Xn ⊆ X, the following question arises: suppose that is such that
()
for some γn > 0, where γn → 0 as n → +∞. This can be guaranteed, for example, if the functional is continuous, the sets Xn satisfy the property (3.2), and one chooses , assuming, almost without loss of generality, that such a set is nonempty. If this is not the case, then one can proceed as follows. For ϵ > 0, let . Then one obtains estimates similar to the ones of this section (obtained assuming that is nonempty) by choosing , where η > 1 is a constant. Does the estimate (3.5) imply an upper bound on the approximation error ? A positive answer can be given when the functional Φ is uniformly convex. Recall that a functional Φ : X → ℝ is called convex on a convex set X ⊆ 𝒳 if and only if, for all h, g ∈ X and all λ ∈ [0,1], one has Φ(λh + (1 − λ)g) ≤ λΦ(h) + (1 − λ)Φ(g), and it is called uniformly convex if and only if there exists a nonnegative function δ : [0, +∞) → [0, +∞) such that δ(0) = 0, δ(t) > 0 for all t > 0, and, for all h, g ∈ X and all λ ∈ [0,1], one has
()
Any such function δ is called a modulus of convexity of Φ [26]. The terminology is not unified: some authors use the term “strictly uniformly convex” instead of “uniformly convex” and reserve the term “uniformly convex” for the case where δ : [0, +∞) → [0, +∞) merely satisfies δ(0) = 0 and δ(t0) > 0 for some t0 > 0 (see, e.g., [27, 28]). Note that when 𝒳 is a Hilbert space and δ(t) has the quadratic expression
()
for some constant c > 0, the condition (3.6) is equivalent to the convexity of the functional . Indeed, the latter property means that, for all h, g ∈ X and all λ ∈ [0,1], one has
()
and this is equivalent to
()
since one can show through straightforward computations that, for 𝒳 a Hilbert space, one has
()
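
The straightforward computation invoked here amounts to the following elementary Hilbert-space identity, obtained by expanding the squared norms through the inner product (a reconstruction of the omitted display; multiplying it by c accounts for the quadratic modulus (3.7)):

```latex
\lambda\|h\|^{2} + (1-\lambda)\|g\|^{2} - \|\lambda h + (1-\lambda)g\|^{2}
  \;=\; \lambda(1-\lambda)\,\|h-g\|^{2},
  \qquad h, g \in \mathcal{X},\ \lambda \in [0,1].
```
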
One of the most useful properties of uniform convexity is that, for f a minimizer of Φ over X, one has the lower bound
()
for any element of X (see, e.g., [5, Proposition 2.1(iii)]). When the modulus of convexity has the form (3.7), this implies (together with (3.5))
()
When (3.2) holds, too, and Φ has modulus of continuity at f, one can take
()
in (3.12), thus obtaining
()
Again, this allows one to extend rates of function approximation to functional optimization, supposing, as in Section 3.1, that Φ is also Lipschitz continuous with Lipschitz constant KΦ and that ɛn = O(n^(−1/2)). Then, one obtains (from the choice (3.13) for γn and formula (3.14))
()
()
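
The orders behind the two rates just displayed can be reconstructed as follows (the symbol f̃n for the suboptimal solution and the generic constant κ are introduced only for this sketch): combining the lower bound (3.11) with the quadratic modulus (3.7), the choice (3.13) for γn, and ɛn = O(n^(−1/2)),

```latex
\kappa\,\bigl\|\tilde f_n - f\bigr\|_{\mathcal{X}}^{2}
  \;\le\; \Phi(\tilde f_n) - \Phi(f)
  \;\le\; \gamma_n \;=\; O\bigl(n^{-1/2}\bigr)
  \quad\Longrightarrow\quad
  \bigl\|\tilde f_n - f\bigr\|_{\mathcal{X}} \;=\; O\bigl(n^{-1/4}\bigr),
```

where κ > 0 depends only on the constant c in (3.7). So the error in approximate optimization decays as n^(−1/2), while the distance from the minimizer decays as n^(−1/4), consistently with the observation in Section 4 that the two errors have different rates.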

Remark 3.1. In [29], a greedy algorithm is proposed to construct a sequence of sets Xn corresponding to variable-basis schemes and functions that achieve the rate (3.15) for certain uniformly convex functional optimization problems. Such an algorithm can be interpreted as an extension to functional optimization of the greedy algorithm proposed in [12] for function approximation by sigmoidal neural networks.

Finally, it should be noted that the rate (3.15) is achieved in general by imposing some structure on the sets X and Xn. For instance, the set X in [29] is the convex hull of some set of functions G ⊆ 𝒳, that is,

()
whereas, for each positive integer n, the set Xn in [29] is
()
Functional optimization problems have, in general, a natural domain X larger than co G (or its closure in the norm of the ambient space 𝒳). Therefore, the choice of a set X of the form (3.17) as the domain of the functional Φ might seem unmotivated. This is not the case, because there are several examples of functional optimization problems for which, for suitable sets G and a natural domain X larger than co G (resp., its closure), the set
()
has a nonempty intersection with co G (resp., its closure), or is contained in it. This issue is studied in [20] for dynamic optimization problems and in [19] for static team optimization ones, where structural properties (e.g., smoothness) of the minimizers are investigated.

3.3. Comparison between Fixed- and Variable-Basis Schemes for Functional Optimization

The following proposition is obtained by combining the results derived in Sections 2.1, 3.1, and 3.2.

Proposition 3.2. Let the functional Φ be Lipschitz continuous with Lipschitz constant KΦ and uniformly convex with modulus of convexity of the form (3.7), let X = ΓB,C, μ any probability measure on B, and 𝒳 = ℒ²(B, μ), and suppose that there exists a minimizer . Then the following hold.

  • (i)

    For every n ≥ 1 there exists fn of the form (2.4) such that

    ()
    For each such fn one has
    ()
    and if a function of the form (2.4) is such that
    ()
    then
    ()

  • (ii)

    For B = [0,1]^d, μu equal to the uniform probability measure on [0,1]^d, every n ≥ 1, and every choice of fixed basis functions h1, …, hn ∈ ℒ²([0,1]^d, μu), there exists a uniformly convex functional (such a functional can also be chosen to be Lipschitz continuous with a suitable Lipschitz constant, but this is not needed in the inequalities (3.24)–(3.29), since they do not contain it) with modulus of convexity of the form (3.7) and minimizer such that, for every 0 < χ < 1, one has

    ()
    ()

  • (iii)

    The statements (i) and (ii) still hold if the set ΓB,C is replaced by the subset of functions that depend only on d′ of the d arguments, for d a multiple of d′. The only difference is that the estimates (3.24) and (3.25) are replaced, respectively, by

    ()
    ()
    for n ≤ (d + 1)/2 and by
    ()
    ()
    for n > (d + 1)/2.

Proof. (i) The estimate (3.20) follows by Theorem 2.1. The bound (3.21) follows by (3.20), the definition of modulus of continuity, and the assumption of Lipschitz continuity of Φ. Finally, (3.23) is obtained by property (3.11) of the modulus of convexity and its expression (3.7).

(ii) (3.24) comes from Theorem 2.2: the constant χ is introduced in order to remove the supremum with respect to in formula (2.7) and replace it with the choice , where is any function that achieves the bound (2.7) up to the constant factor χ; (3.25) follows from (3.24), (3.11), and (3.7), choosing as any functional that is uniformly convex with modulus of convexity of the form (3.7), and such that .

(iii) The estimates (3.20), (3.21), and (3.23) still hold when the set ΓB,C is replaced by the subset of functions depending only on d′ arguments, since the latter is contained in ΓB,C for B = [0,1]^d, whereas formulas (3.26)–(3.29) are obtained in the same way as formulas (3.24) and (3.25), by applying Proposition 2.4 instead of Theorem 2.2.

4. Discussion

Classes of function-approximation and functional optimization problems have been investigated for which, for a given desired error, certain variable-basis approximation schemes with sigmoidal computational units require fewer parameters than fixed-basis ones. Previously known bounds on the accuracy have been extended, with better rates, to families of functions whose effective number of variables d′ is much smaller than the number d of their arguments.

Proposition 3.2 shows that there is a strict connection between certain problems of function approximation and functional optimization. Indeed, for these two classes of problems, the approximation error rates for the first class can be converted into rates of approximate optimization for the second one and vice versa. In particular, for d > 2, , and any linear approximation scheme span {h1, h2, …, hn}, the estimates (3.21) and (3.25) show families of functional optimization problems for which the error in approximate optimization with variable-basis schemes of sigmoidal type is smaller than the one associated with the linear scheme. For d′ > 2 and , a similar remark can be made for the estimates (3.21) and (3.27) and for the bounds (3.21) and (3.29). Finally, the bound (3.23) shows that, for large n, any approximate minimizer of the form (2.4) differs slightly from the true minimizer f, even though the error in approximate optimization (3.22) and the associated approximation error (3.23) have different rates. In contrast, the estimates (3.24), (3.26), and (3.28) show that, for any linear approximation scheme span {h1, h2, …, hn}, there exists a functional optimization problem whose minimizer cannot be approximated with the same accuracy by the linear scheme.

The results presented in the paper provide some theoretical justification for the use of variable-basis approximation schemes (instead of fixed-basis ones) in function approximation and functional optimization.

Acknowledgment

The author was partially supported by a PRIN grant from the Italian Ministry for University and Research, project “Adaptive State Estimation and Optimal Control.”
