We investigate the proximal map for the weighted mean absolute error function. An algorithm for its efficient and vectorized evaluation is presented. As a demonstration, this algorithm is applied to a nonsmooth energy minimization problem.
1 INTRODUCTION
The proximity operator, or proximal map, plays a fundamental role in nonsmooth optimization; see, for instance, [1-3]. Given a function $f \colon \mathbb{R} \to \mathbb{R} \cup \{\infty\}$ and a parameter $\gamma > 0$, the proximal map $\operatorname{prox}_{\gamma f}$ is defined as the solution of the problem

$\operatorname{prox}_{\gamma f}(x) := \operatorname*{arg\,min}_{z \in \mathbb{R}} \; \gamma \, f(z) + \frac{1}{2} (z - x)^2.$ (1)

Under the mild condition that f is proper, lower semicontinuous, and convex, $\operatorname{prox}_{\gamma f}$ is well defined. We refer the reader to [4, Ch. 12.4] for details and further properties. We also mention http://proximity-operator.net/, where formulas and implementations of the proximal map for various functions have been collected.
In this paper, we present theory and an efficient algorithm for the evaluation of $\operatorname{prox}_{\gamma f}$, where $f \colon \mathbb{R} \to \mathbb{R}$ is defined as

$f(z) := \sum_{i=1}^{N} w_i \, |z - d_i|.$ (2)

Here $w_1, \ldots, w_N > 0$ are given, positive weights and $d_1, \ldots, d_N \in \mathbb{R}$ are given data, for some $N \in \mathbb{N}$. We refer to (2) as the weighted mean absolute error. Any of its minimizers is known as a weighted median of the data $d_1, \ldots, d_N$. Clearly, f is proper, continuous, and convex, and so is $\gamma f$ for any $\gamma > 0$.
By definition, the proximal map for f as in (2) is given by

$\operatorname{prox}_{\gamma f}(x) = \operatorname*{arg\,min}_{z \in \mathbb{R}} \; \gamma \sum_{i=1}^{N} w_i \, |z - d_i| + \frac{1}{2} (z - x)^2.$ (3)

In the case of a single data point ($N = 1$), problem (3) reduces to the well-known problem

$\operatorname*{min}_{z \in \mathbb{R}} \; \gamma \, w \, |z - d| + \frac{1}{2} (z - x)^2$ (4)

with $w = w_1$ and $d = d_1$, whose unique solution is explicitly given in terms of the soft-thresholding operator $S_\tau(x) := \operatorname{sign}(x) \max\{|x| - \tau, \, 0\}$. In this case, we have

$\operatorname{prox}_{\gamma f}(x) = d + S_{\gamma w}(x - d).$ (5)

This map, often with $d = 0$, arises in many iterative schemes for the solution of problems involving the 1-norm; see, for instance, [5, 6]. We can therefore view (3) as a multithresholding operation.
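To illustrate the single-data-point case (4)-(5), here is a minimal Python sketch; the function names are ours and not part of the implementation [8].

```python
import math

def soft_threshold(x, tau):
    """Soft-thresholding operator S_tau(x) = sign(x) * max(|x| - tau, 0)."""
    return math.copysign(max(abs(x) - tau, 0.0), x)

def prox_single(x, gamma, w, d):
    """Proximal map of z -> gamma * w * |z - d| evaluated at x, cf. (5)."""
    return d + soft_threshold(x - d, gamma * w)
```

For example, with gamma = 1, w = 0.5, and d = 1, the point x = 3 is pulled toward d by gamma * w = 0.5, while any x within distance 0.5 of d is mapped onto d exactly.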
We point out that [7] have considered the slightly more general problem

$\operatorname*{min}_{z \in \mathbb{R}} \; \sum_{i=1}^{N} w_i \, |z - d_i| + F(z)$

with F strictly convex and differentiable with a bijective derivative. The prototypical examples are the functions $F(z) = \frac{1}{\gamma p} \, |z - x|^p$ with $p > 1$. We concentrate on the case $p = 2$, which agrees with (3). The algorithm in [7] computes the solution of (3) by finding the median in the union of two sorted lists of cardinality N. By contrast, our implementation finds an index in a single sorted list of cardinality N. In contrast to [7], we provide a vectorized, open-source implementation of (3) in [8].
This paper is structured as follows: We establish an algorithm for the evaluation of the proximal map of the weighted mean absolute error (3) in Section 2 and prove its correctness in Theorem 2.1. In Section 3, we briefly discuss the structural properties of the proximal map. We conclude by showing an application of the proposed algorithm to a nonsmooth energy minimization problem.
2 ALGORITHM FOR THE EVALUATION OF THE PROXIMAL MAP
In this section, we derive an efficient algorithm for the evaluation of the proximal map (3) and prove its correctness in Theorem 2.1. To this end, we assume that the points have been sorted and duplicates have been removed and their weights added. As a result, we can assume $d_1 < d_2 < \cdots < d_N$. Moreover, we assume $\gamma > 0$, $N \ge 1$, and $w_i > 0$ for all $i = 1, \ldots, N$. Summands with $w_i = 0$ can obviously be dropped from the sum in (3).
We divide the real line into the intervals

$I_i := [d_i, \, d_{i+1}], \quad i = 1, \ldots, N-1,$ (6)

which overlap in the given data points. It is also useful to set $d_0 := -\infty$ and $d_{N+1} := \infty$, so that $I_0 = (-\infty, d_1]$ and $I_N = [d_N, \infty)$. We further introduce the forward and reverse cumulative weights as

$W_k := \sum_{i=1}^{k} w_i \quad \text{and} \quad \overline{W}_k := \sum_{i=k}^{N} w_i.$

We extend these definitions by setting $W_0 := 0$ and $\overline{W}_{N+1} := 0$. We therefore have $W_k + \overline{W}_{k+1} = W_N$ for all $k = 0, \ldots, N$. Using this notation, we can rewrite the derivative of f as

$f'(z) = W_i - \overline{W}_{i+1} \quad \text{for } z \in \operatorname{int} I_i, \quad i = 0, \ldots, N.$

This formula reflects the fact that f is piecewise linear and convex because $W_i - \overline{W}_{i+1}$ is monotonically increasing with i. Moreover, $f'(z) = -W_N$ holds for all $z < d_1$ (points to the left of the smallest data point $d_1$), and $f'(z) = W_N$ holds for all $z > d_N$ (points to the right of the largest data point $d_N$). At $z = d_i$, $i = 1, \ldots, N$, f is nondifferentiable but we can specify its subdifferential, which is

$\partial f(d_i) = \left[ W_{i-1} - \overline{W}_i, \; W_i - \overline{W}_{i+1} \right].$
The objective

$\Phi(z) := \gamma \sum_{i=1}^{N} w_i \, |z - d_i| + \frac{1}{2} (z - x)^2$ (7)

of (3) is piecewise quadratic and strongly convex. Its subdifferential $\partial \Phi(z) = \gamma \, \partial f(z) + z - x$ is thus strongly monotone and it satisfies

$\max \partial \Phi(x - \gamma W_N) \le 0 \le \min \partial \Phi(x + \gamma W_N).$

Consequently, the unique minimizer of Φ lies between these bounds.
The idea for finding the unique minimizer $z^*$ of (7) is to locate the smallest index $k \in \{1, \ldots, N\}$ such that $\max \partial \Phi(d_k) \ge 0$ holds, that is, $d_k$ is the nearest data point to the right of $z^*$ (or equal to it). In other words, we need to find $k$ such that

$\gamma \, (W_{k-1} - \overline{W}_k) + d_{k-1} - x < 0,$ (8a)
$\gamma \, (W_k - \overline{W}_{k+1}) + d_k - x \ge 0$ (8b)

holds. Now we can distinguish two cases: $z^* = d_k$ and $z^* \in \operatorname{int} I_{k-1} = (d_{k-1}, d_k)$. The first case applies if and only if

$\gamma \, (W_{k-1} - \overline{W}_k) + d_k - x \le 0.$

Otherwise, $x - \gamma \, (W_{k-1} - \overline{W}_k)$ lies in $\operatorname{int} I_{k-1}$, and thus, it is the unique minimizer of the locally quadratic objective Φ. In either case, once the index $k$ has been identified, $\operatorname{prox}_{\gamma f}(x)$ is given by

$\operatorname{prox}_{\gamma f}(x) = \min \left\{ d_k, \; x - \gamma \, (W_{k-1} - \overline{W}_k) \right\}.$ (9)
The considerations above lead to Algorithm 1, whose key step is to test the sign of the quantities

$s_k := \gamma \, (W_k - \overline{W}_{k+1}) + d_k - x, \quad k = 1, \ldots, N.$ (10)

In our implementation, we evaluate (10) for all k simultaneously and benefit from the quantities $s_k$ being monotone increasing with k when finding the first nonnegative entry.
ALGORITHM 1. Evaluation of (3), the proximal map of the weighted mean absolute error.
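Since the listing of Algorithm 1 may not reproduce well here, the following Python sketch reflects the steps derived above: form the cumulative weights, locate the first index k with the quantity (10) nonnegative, and evaluate formula (9). Function and variable names are our own, not those of the implementation [8].

```python
import numpy as np

def prox_wmae(x, gamma, w, d):
    """Prox of z -> gamma * sum_i w_i |z - d_i| at the point x.

    Assumes d is sorted in strictly increasing order and w > 0, cf. Section 2.
    """
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    N = d.size
    W = np.concatenate(([0.0], np.cumsum(w)))      # W[k] = w_1 + ... + w_k, W[0] = 0
    total = W[N]                                   # W_N; note Wbar_{k+1} = total - W[k]
    # s_k = gamma * (W_k - Wbar_{k+1}) + d_k - x for k = 1, ..., N, cf. (10)
    s = gamma * (2.0 * W[1:] - total) + d - x
    # s is nondecreasing; the first nonnegative entry yields the index k
    k = int(np.searchsorted(s, 0.0)) + 1           # k in {1, ..., N + 1}
    d_k = d[k - 1] if k <= N else np.inf           # d_{N+1} = +infinity
    # formula (9): min{ d_k, x - gamma * (W_{k-1} - Wbar_k) }
    return min(d_k, x - gamma * (2.0 * W[k - 1] - total))
```

With a single data point this reproduces the shifted soft-thresholding (5); for large γ it returns a weighted median of the data.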
Theorem 2.1. Under the assumptions stated in Algorithm 1, it returns $\operatorname{prox}_{\gamma f}(x)$, the unique solution of (3).

Proof. Let $k$ be the index found in Algorithm 1. First, suppose $2 \le k \le N$. Then $d_{k-1}$ and $d_k$ are both finite, and (8) and (10) imply

$\max \partial \Phi(d_{k-1}) < 0 \le \max \partial \Phi(d_k).$

Owing to the properties of the subdifferential of strongly convex functions, there exists a unique point $z^* \in (d_{k-1}, d_k]$ such that $0 \in \partial \Phi(z^*)$, that is, $z^*$ is the unique minimizer of (3). This point either belongs to $\operatorname{int} I_{k-1} = (d_{k-1}, d_k)$, or else $z^* = d_k$ holds. In the first case, Φ is differentiable at $z^*$, so that $\Phi'(z^*) = \gamma \, (W_{k-1} - \overline{W}_k) + z^* - x = 0$ holds, yielding

$z^* = x - \gamma \, (W_{k-1} - \overline{W}_k) < d_k.$

Otherwise, we have $z^* = d_k$, and $0 \in \partial \Phi(d_k)$ implies

$x - \gamma \, (W_{k-1} - \overline{W}_k) \ge d_k.$

In either case, the unique solution of (3) is determined by (9), which is the quantity returned in Algorithm 1.

It remains to verify the marginal cases $k = 1$ and $k = N + 1$. In case $k = N + 1$, we have $\max \partial \Phi(d_N) < 0$ due to the minimality of $k$. Hence, $z^* = x - \gamma \, W_N > d_N$, which agrees with (9) in view of $d_{N+1} = \infty$. A similar reasoning applies in the case $k = 1$.
We provide an efficient and vectorized open-source Python implementation of Algorithm 1 in [8]. It allows the simultaneous evaluation of (3) for multiple values of x, provided that each instance of (3) has the same number N of data points. The weights and data points as well as the prox parameter γ may vary between instances. The discussion so far assumed positive weights for simplicity, but the case $w_i = 0$ is a simple extension and it is allowed in our implementation. This is convenient in order to simultaneously solve problem instances that differ with respect to the number of data points N. In this case, we can easily pad all instances to the same number of data points using zero weights. In addition, data points are allowed to be duplicate, that is, we only require $d_i \le d_{i+1}$ for $i = 1, \ldots, N-1$. Notice that since the data points are assumed to be sorted, finding the index $k$ in Algorithm 1 is of complexity $\mathcal{O}(\log N)$.
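The batched evaluation and the zero-weight padding described above can be sketched as follows (again with our own naming, not the interface of [8]): many points x are processed at once with a single vectorized search, and zero-weight data points leave the result unchanged.

```python
import numpy as np

def prox_wmae_batch(x, gamma, w, d):
    """Evaluate the prox of z -> gamma * sum_i w_i |z - d_i| at many points x.

    w, d have shape (N,) with d nondecreasing and w >= 0; zero weights
    merely pad the instance and do not affect the result.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    d = np.asarray(d, dtype=float)
    W = np.concatenate(([0.0], np.cumsum(w)))       # forward cumulative weights
    total = W[-1]
    bounds = gamma * (2.0 * W[1:] - total) + d      # s_k >= 0 iff x <= bounds[k]
    k = np.searchsorted(bounds, x)                  # 0-based index in {0, ..., N}
    d_k = np.append(d, np.inf)[k]                   # d_{N+1} = +infinity
    return np.minimum(d_k, x - gamma * (2.0 * W[k] - total))   # formula (9)
```

Padding an instance with a zero-weight data point reproduces the unpadded result, which is what allows instances of different size N to be stacked.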
3 STRUCTURE OF $\operatorname{prox}_{\gamma f}$
In this section, we briefly discuss the structure of the map $x \mapsto \operatorname{prox}_{\gamma f}(x)$. Since it generalizes the soft-thresholding operation (5), it is not surprising that we obtain a graph that features a staircase pattern. An illustrative plot for certain choices of weights $w_i$, data points $d_i$, and prox parameter γ is shown in Figure 1. Each of the N distinct data points provides one plateau in the graph.

Two alternating regimes occur for $\operatorname{prox}_{\gamma f}(x)$, as x ranges over $\mathbb{R}$. First, when $x - \gamma \, (W_{k-1} - \overline{W}_k) < d_k$ holds, then (9) implies that $\operatorname{prox}_{\gamma f}(x)$ is an affine function of x with slope 1. This is the case for x whose associated index k is constant, that is,

$x \in \left( d_{k-1} + \gamma \, (W_{k-1} - \overline{W}_k), \; d_k + \gamma \, (W_{k-1} - \overline{W}_k) \right).$

As x increases beyond the upper bound of the interval above, $\operatorname{prox}_{\gamma f}(x) = d_k$ enters a constant regime that applies to

$x \in \left[ d_k + \gamma \, (W_{k-1} - \overline{W}_k), \; d_k + \gamma \, (W_k - \overline{W}_{k+1}) \right].$

Notice that the case $N = 1$ reduces to the soft-thresholding map (5) with only one plateau.
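In particular, the constant regime at $d_k$ has width $\gamma \big( (W_k - \overline{W}_{k+1}) - (W_{k-1} - \overline{W}_k) \big) = 2 \gamma w_k$. A small self-contained check of this fact, with arbitrarily chosen weights:

```python
# Cumulative weights for an arbitrary example with N = 3 data points
gamma, w = 0.25, [1.0, 3.0, 2.0]
total = sum(w)
W = [0.0]
for wi in w:
    W.append(W[-1] + wi)                     # W[k] = w_1 + ... + w_k
# width of the k-th plateau: gamma * ((W_k - Wbar_{k+1}) - (W_{k-1} - Wbar_k))
widths = [gamma * ((2 * W[k] - total) - (2 * W[k - 1] - total)) for k in range(1, 4)]
# each entry simplifies to 2 * gamma * w_k
```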
4 APPLICATION TO A DEFLECTION ENERGY MINIMIZATION PROBLEM
In this section, we consider the minimization of the nonsmooth energy of a deflected membrane. We apply an alternating direction method of multipliers (ADMM) scheme that requires the parallel evaluation of the proximal map via Algorithm 1 in each inner loop of the scheme.
Consider a bounded domain $\Omega \subset \mathbb{R}^2$ occupied by a thin membrane. In our model, the energy of this membrane is given in terms of its unknown deflection (displacement) function $z \colon \Omega \to \mathbb{R}$ and it takes the following form:

$E(z) := \frac{c}{2} \int_\Omega |\nabla z|^2 \, \mathrm{d}x - \int_\Omega f \, z \, \mathrm{d}x + \sum_{j=1}^{N} b_j \int_\Omega \max(0, \, z - \psi_j) \, \mathrm{d}x + \frac{\alpha}{2} \int_\Gamma z^2 \, \mathrm{d}s.$ (11)

The first term describes the potential energy of the membrane, where $c > 0$ is the stiffness constant. The second term accounts for an external area force density f, which we assume to be a nonnegative function on Ω. Consequently, the deflection will be nonnegative as well. The specialty of our model is the third term, which can be interpreted as follows. The nonnegative constants $\psi_1 < \psi_2 < \cdots < \psi_N$ serve as thresholds. Once the deflection z at a point in Ω exceeds any of these thresholds $\psi_j$, an additional downward force of size $b_j$ activates, contributing an energy of $b_j$ times the excess deflection. Finally, the fourth term models an additional potential energy due to springs along the boundary Γ of the domain with stiffness constant α. A related problem has been studied in [9] with an emphasis on a posteriori error analysis and adaptive solution.
Notice that the minimization of the convex, nonsmooth energy among all displacements $z \in H^1(\Omega)$ corresponds to the weak form of a partial differential equation (PDE), or rather a variational inequality. In fact, the necessary and sufficient optimality conditions for the minimization of (11) amount to

$c \int_\Omega \nabla z \cdot \nabla (v - z) \, \mathrm{d}x + \alpha \int_\Gamma z \, (v - z) \, \mathrm{d}s + \sum_{j=1}^{N} b_j \int_\Omega \chi_{\{z > \psi_j\}} \, (v - z) \, \mathrm{d}x + \sum_{j=1}^{N} b_j \int_\Omega \chi_{\{z = \psi_j\}} \, \max(0, \, v - z) \, \mathrm{d}x \ge \int_\Omega f \, (v - z) \, \mathrm{d}x \quad \text{for all } v \in H^1(\Omega).$ (12)

Here, $\chi_A$ is the characteristic function of the set A with values in $\{0, 1\}$. Provided that the sets $\{z = \psi_j\}$ are of Lebesgue measure zero, the variational inequality (12) becomes an equation, whose strong form—using integration by parts—can be seen to be

$-c \, \Delta z + \sum_{j=1}^{N} b_j \, \chi_{\{z > \psi_j\}} = f \quad \text{in } \Omega, \qquad c \, \partial_n z + \alpha \, z = 0 \quad \text{on } \Gamma.$

From here, we also learn that the boundary term in the energy (11) leads to boundary conditions of Robin type.
We employ a standard Galerkin approach to numerically discretize the energy (11). To this end, we replace the displacement space $H^1(\Omega)$ by some finite-dimensional subspace of piecewise linear, globally continuous finite element functions defined over a triangulation of Ω. Denoting the associated nodal basis by $\varphi_1, \ldots, \varphi_n$, we define the stiffness matrix

$K := \left( c \int_\Omega \nabla \varphi_i \cdot \nabla \varphi_j \, \mathrm{d}x + \alpha \int_\Gamma \varphi_i \, \varphi_j \, \mathrm{d}s \right)_{i,j=1}^{n}$

as well as the lumped mass matrix

$M := \operatorname{diag} \left( \int_\Omega \varphi_i \, \mathrm{d}x \right)_{i=1}^{n},$

with $M_{ij} = 0$ when $i \ne j$. The reason for using mass lumping to approximate all area integrals in (11) that do not involve derivatives is to translate the pointwise maximum operator into a coefficientwise one. This technique is crucial for an efficient numerical realization and has been used before, for example, in [10, 11].
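The lumped mass matrix can equivalently be obtained from the consistent mass matrix by moving each row sum to the diagonal, which preserves the total mass; a generic sketch (the 1D example matrix is ours, not data from the paper):

```python
import numpy as np

def lump_mass(M):
    """Replace a consistent mass matrix by its lumped (diagonal) counterpart.

    The diagonal entries are the row sums, which preserves the
    total mass 1^T M 1.
    """
    M = np.asarray(M, dtype=float)
    return np.diag(M.sum(axis=1))

# 1D example: consistent mass matrix for two linear elements of length h = 1
Mc = np.array([[2.0, 1.0, 0.0],
               [1.0, 4.0, 1.0],
               [0.0, 1.0, 2.0]]) / 6.0
Ml = lump_mass(Mc)
```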
We continue to use z to denote the vector of nodal values of the finite element discretization of the deflection and thus obtain the discrete energy

$E_h(z) := \frac{1}{2} z^\top K z - (M f)^\top z + \sum_{j=1}^{N} b_j \, \mathbf{1}^\top M \max(0, \, z - \psi_j \mathbf{1}).$ (13)

Here, $\mathbf{1}$ denotes the vector of all ones, f now denotes the vector of nodal values of the force density, and the max operation is understood coefficientwise.
The minimization of the convex but nonsmooth energy (13) is not straightforward. Our method of choice here is an ADMM scheme, which moves the nonsmooth terms into a separate subproblem that can then be efficiently solved using Algorithm 1. We refer the reader to [12] and the references therein for an overview of ADMM.
To obtain a subproblem of type (3), we use the identity $\max(0, s) = \frac{1}{2}(s + |s|)$ and arrive at

$E_h(z) = \frac{1}{2} z^\top K z - (M \tilde{f})^\top z + \frac{1}{2} \sum_{j=1}^{N} b_j \, \mathbf{1}^\top M \, |z - \psi_j \mathbf{1}| + C,$ (14)

where $C \in \mathbb{R}$ is a constant and $\tilde{f} := f - \frac{1}{2} \sum_{j=1}^{N} b_j \mathbf{1}$ is a modification of the force vector. Moreover, the absolute value is understood coefficientwise. Dropping the constant C, introducing a second variable y, the constraint $y = z$, and associated (scaled) Lagrange multiplier μ gives rise to the augmented Lagrangian

$L_\rho(z, y, \mu) := \frac{1}{2} z^\top K z - (M \tilde{f})^\top z + \frac{1}{2} \sum_{j=1}^{N} b_j \, \mathbf{1}^\top M \, |y - \psi_j \mathbf{1}| + \frac{\rho}{2} \, \|z - y + \mu\|_M^2 - \frac{\rho}{2} \, \|\mu\|_M^2,$ (15)

where $\rho > 0$ is a penalty parameter and $\|v\|_M^2 := v^\top M v$ is the squared norm induced by the positive diagonal matrix M.
An ADMM computes iterates $z^{(k)}, y^{(k)}, \mu^{(k)}$, $k = 0, 1, 2, \ldots$, according to the following update scheme:

$z^{(k+1)} := \operatorname*{arg\,min}_z \; \frac{1}{2} z^\top K z - (M \tilde{f})^\top z + \frac{\rho}{2} \, \|z - y^{(k)} + \mu^{(k)}\|_M^2,$ (16a)
$y^{(k+1)} := \operatorname*{arg\,min}_y \; \frac{1}{2} \sum_{j=1}^{N} b_j \, \mathbf{1}^\top M \, |y - \psi_j \mathbf{1}| + \frac{\rho}{2} \, \|z^{(k+1)} - y + \mu^{(k)}\|_M^2,$ (16b)
$\mu^{(k+1)} := \mu^{(k)} + z^{(k+1)} - y^{(k+1)}.$ (16c)

For the problem at hand, (16a) amounts to the solution of the discretized Poisson-like problem

$(K + \rho M) \, z^{(k+1)} = M \tilde{f} + \rho \, M \, (y^{(k)} - \mu^{(k)}).$
The y-update (16b) on the other hand can be cast as the problem

$\operatorname*{min}_y \; \frac{1}{2} \sum_{j=1}^{N} b_j \, \mathbf{1}^\top M \, |y - \psi_j \mathbf{1}| + \frac{\rho}{2} \, \|y - (z^{(k+1)} + \mu^{(k)})\|_M^2.$ (17)

Owing to the diagonal structure of the mass matrix, this problem fully decouples, the mass matrix cancels, and we obtain

$\operatorname*{min}_{y_i} \; \frac{1}{2} \sum_{j=1}^{N} b_j \, |y_i - \psi_j| + \frac{\rho}{2} \left( y_i - (z_i^{(k+1)} + \mu_i^{(k)}) \right)^2$ (18)

for each component $y_i$ of y. This problem fits into the pattern of (3) with weights $w_j = b_j / 2$, data $d_j = \psi_j$, and prox parameter $\gamma = 1/\rho$, and thus, can be solved efficiently and simultaneously for all components by Algorithm 1.
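A full ADMM sweep (16a)-(16c) might then look as follows; this is a sketch under our own assumptions, with K, M, f_tilde, b, psi, and rho as placeholder arguments, and with a single threshold (N = 1) so that the y-update reduces to the shifted soft-thresholding (5).

```python
import numpy as np

def admm_sketch(K, M, f_tilde, b, psi, rho=1.0, tol=1e-10, max_iter=1000):
    """One possible realization of the ADMM iteration (16a)-(16c).

    Single threshold (N = 1) for brevity: the y-update (16b) is the shifted
    soft-thresholding (5) with gamma = 1/rho and weight b/2.
    """
    n = K.shape[0]
    z = np.zeros(n)
    y = np.zeros(n)
    mu = np.zeros(n)
    for _ in range(max_iter):
        y_old = y
        # (16a): (K + rho M) z = M f_tilde + rho M (y - mu)
        z = np.linalg.solve(K + rho * M, M @ f_tilde + rho * (M @ (y - mu)))
        # (16b): componentwise shifted soft-thresholding, cf. (5) and (18)
        v = z + mu
        tau = 0.5 * b / rho
        y = psi + np.sign(v - psi) * np.maximum(np.abs(v - psi) - tau, 0.0)
        # (16c): scaled dual update
        mu = mu + z - y
        # stop when both primal and dual residuals are small
        if np.linalg.norm(z - y) < tol and np.linalg.norm(y - y_old) < tol:
            break
    return z, y, mu
```

For K = M = I the scheme converges to the componentwise soft-thresholding of f_tilde, which provides a quick sanity check of the update formulas.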
To validate the method, we consider two simple, convex domains $\Omega_1$ and $\Omega_2$. We use meshes with 2521 vertices for Ω1 and 2637 vertices for Ω2. We choose the stiffness constant c, the force density f, the boundary stiffness constant α, the additional downward forces $b_j$, and the threshold deflections $\psi_j$ to be constant over the domains in each case, as

(19)
We apply the ADMM scheme (16). The iterations are terminated as soon as the stopping criterion

(20)

is satisfied, where we recall that $\|\cdot\|_M$ denotes the norm induced by the lumped mass matrix M. With this setup, the ADMM algorithm required 186 iterations for the problem on Ω1 and 278 iterations in case of Ω2. The resulting solutions z are shown in Figure 2.
FIGURE 2. Resulting deflections z that minimize the discrete energy functional defined in (13) over two different domains, with data given in (19). Deflections were computed with the ADMM scheme (16) and stopping criterion (20). The subdomains are colored according to the number of active max terms $\max(0, \, z - \psi_j \mathbf{1})$ in (13).
ACKNOWLEDGMENTS
This work was supported by DFG grants HE 6077/10–2 and SCHM 3248/2–2 within the Priority Program SPP 1962 (Nonsmooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization), which is gratefully acknowledged.
Open access funding enabled and organized by Projekt DEAL.
REFERENCES
[1] Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120–145. https://doi.org/10.1007/s10851-010-0251-1
[2] Combettes, P. L., & Pesquet, J. C. (2011). Proximal splitting methods in signal processing. In H. Bauschke, R. Burachik, P. Combettes, V. Elser, D. Luke, H. Wolkowicz (Eds.), Fixed-point algorithms for inverse problems in science and engineering, Springer Optimization and Its Applications (Vol. 49, pp. 185–212). Springer. https://doi.org/10.1007/978-1-4419-9569-8_10
[4] Bauschke, H. H., & Combettes, P. L. (2011). Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer. https://doi.org/10.1007/978-1-4419-9467-7
[5] Daubechies, I., Defrise, M., & De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11), 1413–1457. https://doi.org/10.1002/cpa.20042
[7] Li, Y., & Osher, S. (2009). A new median formula with applications to PDE based denoising. Communications in Mathematical Sciences, 7(3), 741–753. https://doi.org/10.4310/cms.2009.v7.n3.a11
[8] Baumgärtner, L., Herzog, R., Schmidt, S., & Weiß, M. (2023). The proximal map of the weighted mean absolute error. https://doi.org/10.5281/zenodo.7620815
[9] Bostan, V., Han, W., & Reddy, B. D. (2005). A posteriori error estimation and adaptive solution of elliptic variational inequalities of the second kind. Applied Numerical Mathematics, 52(1), 13–38. https://doi.org/10.1016/j.apnum.2004.06.012
[10] Wachsmuth, G., & Wachsmuth, D. (2011). Convergence and regularization results for optimal control problems with sparsity functional. ESAIM: Control, Optimisation and Calculus of Variations, 17(3), 858–886. https://doi.org/10.1051/cocv/2010027
[11] Casas, E., Herzog, R., & Wachsmuth, G. (2012). Approximation of sparse controls in semilinear equations by piecewise linear functions. Numerische Mathematik, 122(4), 645–669. https://doi.org/10.1007/s00211-012-0475-7
[12] Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122. https://doi.org/10.1561/2200000016