Sample-Path Large Deviations in Credit Risk
Abstract
The event of large losses plays an important role in credit risk. As these large losses are typically rare, and portfolios usually consist of a large number of positions, large deviation theory is the natural tool to analyze the tail asymptotics of the probabilities involved. We first derive a sample-path large deviation principle (LDP) for the portfolio's loss process, which enables the computation of the logarithmic decay rate of the probabilities of interest. In addition, we derive exact asymptotic results for a number of specific rare-event probabilities, such as the probability of the loss process exceeding some given function.
1. Introduction
For financial institutions, such as banks and insurance companies, it is of crucial importance to accurately assess the risk of their portfolios. These portfolios typically consist of a large number of obligors, such as mortgages, loans, or insurance policies, and therefore it is computationally infeasible to treat each individual object in the portfolio separately. As a result, attention has shifted to measures that characterize the risk of the portfolio as a whole; see, for example, [1] for general principles concerning managing credit risk. The best-known metric is the so-called value at risk, see [2], which measures the loss level that, with α percent certainty, will not be exceeded over some given period. Several other measures have been proposed, such as economic capital, the risk-adjusted return on capital (RAROC), or expected shortfall, which is a coherent risk measure [3]. Each of these measures applies to market risk as well as credit risk. Loss given default (LGD) and exposure at default (EAD), by contrast, apply purely to credit risk. These and other measures are discussed in detail in, for example, [4].
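To make these notions concrete, the following minimal sketch computes the empirical value at risk and expected shortfall from simulated portfolio losses. The Bernoulli-default, exponential-loss portfolio and all parameter values are hypothetical choices for illustration only, not taken from the cited references.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Illustrative portfolio: n obligors defaulting independently with
# probability p; loss amounts are i.i.d. exponential (hypothetical choice).
n, p, num_scenarios = 1000, 0.02, 100_000
defaults = rng.binomial(1, p, size=(num_scenarios, n))
amounts = rng.exponential(scale=1.0, size=(num_scenarios, n))
losses = (defaults * amounts).sum(axis=1)  # total portfolio loss per scenario

alpha = 0.99
var = np.quantile(losses, alpha)        # value at risk: the alpha-quantile
es = losses[losses >= var].mean()       # expected shortfall: mean loss beyond VaR

print(f"VaR_{alpha:.0%} = {var:.2f}, ES_{alpha:.0%} = {es:.2f}")
```

Note how expected shortfall averages over the tail beyond the quantile, which is what makes it sensitive to exactly the rare, large losses studied in this paper.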
Turning back to the setting of credit risk, both of the results we present are derived in a setup where all obligors in the portfolio are i.i.d., in the sense that they behave independently and stochastically identically. A third contribution of our work concerns a discussion on how to extend our results to cases where the obligors are dependent (meaning that they, in the terminology of [5], react to the same “macroenvironmental” variable, conditional upon which they are independent again). We also treat the case of obligor-heterogeneity: we show how to extend the results to the situation of multiple classes of obligors.
The paper is structured as follows. In Section 2 we introduce the loss process and we describe the scaling under which we work. We also recapitulate a couple of relevant large-deviation results. Our first main result, the sample-path LDP for the cumulative loss process, is stated and proved in Section 3. Special attention is paid to easily checkable sufficient conditions under which this result holds. As argued above, the LDP is a generally applicable result, as it yields an expression for the decay rate of any probability that depends on the entire sample path. Then, in Section 4, we derive the exact asymptotic behavior of the probability that, at some point in time, the loss exceeds a certain threshold, that is, the asymptotics of pn, as defined in (1.3). After this we derive a similar result for the increments of the loss process. Finally, in Section 5, we discuss a number of possible extensions to the results we have presented. Special attention is given to allowing dependence between obligors, and to different classes of obligors, each having its own specific distributional properties. In the appendix we have collected a number of results from the literature in order to keep the exposition of the paper self-contained.
2. Notation and Definitions
The portfolios of banks and insurance companies are typically very large; they may consist of several thousands of assets. It is therefore computationally impossible to estimate the risks for each element, or obligor, in a portfolio. This explains why one attempts to assess the aggregated losses resulting from defaults, for example, bankruptcies, failure to repay loans or insurance claims, for the portfolio as a whole. The risk in the portfolio is then measured through this (aggregate) loss process. In the following sections we introduce the loss process and the portfolio constituents more formally.
2.1. Loss Process
Given the distribution of the loss amounts Ui and the default times τi, our goal is to investigate the loss process. Many of the techniques that have been developed so far first fix a time T (typically one year), and then study stochastic properties of the cumulative loss at time T, that is, Ln(T). Measures such as value at risk and economic capital are examples of these “one-dimensional” characteristics. Many interesting measures, however, involve properties of the entire path of the loss process rather than those of just one time epoch, examples being the probability that Ln(·) exceeds some barrier function ζ(·) for some t smaller than the horizon T, or the probability that (during a certain period) the loss always stays above a certain level. The event corresponding to the former probability might require the bank to attract more capital, or worse, it might lead to the bankruptcy of the bank. The latter event might also lead to the bankruptcy of the bank, as a long period of stress may have substantial negative implications. Having a handle on these probabilities is therefore a useful instrument when assessing the risks involved in the bank's portfolios.
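As an illustration of such path-dependent quantities, the sketch below simulates paths of a loss process of the form Ln(t) = ∑i Ui 1{τi ≤ t} (cf. (2.1)) and estimates the barrier-crossing probability by naive Monte Carlo. The uniform default epochs, exponential loss amounts, and the linear barrier ζ are all hypothetical choices on our part.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical model: default epochs tau_i uniform on {1,...,T},
# loss amounts U_i i.i.d. exponential; linear barrier zeta(t).
n, T, runs = 100, 10, 20_000
t_grid = np.arange(1, T + 1)
zeta = 0.11 * n * t_grid  # barrier for the unscaled loss L_n(t)

crossings = 0
for _ in range(runs):
    tau = rng.integers(1, T + 1, size=n)     # default epochs tau_i
    U = rng.exponential(scale=1.0, size=n)   # loss amounts U_i
    # cumulative loss path L_n(t) for t = 1, ..., T
    L = np.array([(U * (tau <= t)).sum() for t in t_grid])
    crossings += bool(np.any(L > zeta))

print("estimated crossing probability:", crossings / runs)
```

For genuinely rare barrier crossings such a naive estimator breaks down (almost all simulation runs are wasted), which is precisely what motivates the asymptotic analysis developed in this paper.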
As mentioned above, the number of obligors n in a portfolio is typically very large, thus prohibiting analyses based on the specific properties of the individual obligors. Instead, it is more natural to study the asymptotic behavior of the loss process as n → ∞. One could rely on a central-limit-theorem-based approach, but in this paper we focus on rare events, by using the theory of large deviations.
In the following subsection we provide some background on large-deviation theory, and we define a number of quantities that are used in the remainder of this paper.
2.2. Large Deviation Principle
In this section we give a short introduction to the theory of large deviations. Here one studies, in an abstract setting, the limiting behavior as n → ∞ of a family of probability measures {μn} on the Borel sets ℬ of a complete separable metric space (a Polish space) (𝒳, d). This behavior is referred to as the large deviation principle (LDP), and it is characterized in terms of a rate function. The LDP states lower and upper exponential bounds for the values that the measures μn assign to sets in the space 𝒳. Below we state the definition of a rate function, which is taken from [17].
Definition 2.1. A rate function is a lower semicontinuous mapping I : 𝒳 → [0, ∞], that is, a mapping such that for all α ∈ [0, ∞) the level set ΨI(α) := {x ∣ I(x) ≤ α} is a closed subset of 𝒳. A good rate function is a rate function for which all the level sets are compact subsets of 𝒳.
With the definition of the rate function in mind we state the large deviation principle for the sequence of measures {μn}.
Definition 2.2. We say that {μn} satisfies the large deviation principle with a rate function I(·) if
- (i)
(upper bound) for any closed set F ⊆ 𝒳,
\[
\limsup_{n\to\infty} \frac{1}{n} \log \mu_n(F) \leq -\inf_{x \in F} I(x);
\]
- (ii)
(lower bound) for any open set G ⊆ 𝒳,
\[
\liminf_{n\to\infty} \frac{1}{n} \log \mu_n(G) \geq -\inf_{x \in G} I(x).
\]
The LDP from Definition 2.2 provides upper and lower bounds for the log-asymptotic behavior of the measures μn. In the case of the loss process (2.1), fixed at some time t, we can easily establish an LDP by an application of Cramér's theorem (Theorem A.1). This theorem yields that the rate function is given by $\Lambda^{\star}_{UZ(t)}(\cdot)$, where $\Lambda^{\star}_{UZ(t)}$ is the Fenchel-Legendre transform of the random variable UZ(t).
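To give one concrete instance (an illustration on our part, not an example from the paper): if the loss amounts are degenerate, U ≡ 1, then UZ(t) is Bernoulli with parameter pt = ℙ(τ ≤ t), and the transform is available in closed form:
\[
\Lambda_{UZ(t)}(\theta) = \log\bigl(1 - p_t + p_t e^{\theta}\bigr),
\qquad
\Lambda^{\star}_{UZ(t)}(x) = x \log\frac{x}{p_t} + (1 - x)\log\frac{1 - x}{1 - p_t},
\quad x \in [0, 1].
\]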
The results we present in this paper involve either $\Lambda^{\star}_{U}$ (Section 3), which corresponds to the i.i.d. loss amounts Ui only, or $\Lambda^{\star}_{UZ(t)}$ (Section 4), which corresponds to those loss amounts up to time t. In the following section we derive an LDP for the whole path of the loss process, which can be considered an extension of Cramér's theorem.
3. A Sample-Path Large Deviation Result
In the previous section we have introduced the large deviation principle. In this section we derive a sample-path LDP for the cumulative loss process (2.1). We consider the exponential decay of the probability that the path of the loss process Ln(·) is in some set A, as the size n of the portfolio tends to infinity.
3.1. Assumptions
Assumption 1. Let φ, φn be as above. We assume that φn → φ and moreover that the measures μn and νn as defined in (3.5) and (3.6), respectively, are exponentially equivalent.
From Assumption 1 we learn that the difference between the two measures μn and νn vanishes at a “superexponential” rate. In the next section, in Lemma 3.3, we provide an easy-to-check sufficient condition under which this assumption holds.
3.2. Main Result
The large deviation principle allows us to approximate a large variety of probabilities related to the average loss process, such as the probability that the loss process stays above a certain time-dependent level or the probability that the loss process exceeds a certain level before some given point in time.
Theorem 3.1. With Φ as in (3.3) and under Assumption 1, the average loss process Ln(·)/n satisfies an LDP with rate function $I_{U,p}$, given by
Observing the rate function for this sample-path LDP, we see that the effects of the default times τi and the loss amounts Ui are nicely decoupled into the two terms in the rate function, one involving the distribution of the default epoch τ (the “Sanov term”, cf. [17, Theorem 6.2.10]), the other one involving the incurred loss size U (the “Cramér term”, cf. [17, Theorem 2.2.3]). Observe that we recover Cramér's theorem by considering a time grid consisting of a single time point, which means that Theorem 3.1 extends Cramér's result. We also remark that, informally speaking, the optimizing φ ∈ Φ in (3.8) can be interpreted as the “most likely” distribution of the loss epoch, given that the path of Ln(·)/n is close to x.
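For reference, the two building blocks just mentioned have the following standard forms (cf. [17]), with φ and p distributions on the time grid and ΛU the logarithmic moment generating function of U; this displays the generic shape of the terms, not a restatement of (3.8):
\[
\sum_{j=1}^{N} \varphi_j \log \frac{\varphi_j}{p_j}
\quad \text{(Sanov term: relative entropy of } \varphi \text{ with respect to } p\text{)},
\qquad
\Lambda_U^{\star}(x) = \sup_{\theta \in \mathbb{R}} \bigl\{\theta x - \Lambda_U(\theta)\bigr\}
\quad \text{(Cramér term)}.
\]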
In the proof of Theorem 3.1 we use the following lemma, which is related to the concept of epi-convergence, extensively discussed in [16]. After this proof, in which we use a “bare hands” approach, we discuss alternative, more sophisticated ways to establish Theorem 3.1.
Lemma 3.2. Let fn, f : D → ℝ, with D ⊂ ℝm compact. Assume that for all x ∈ D and for all xn → x in D we have
Proof. Let . Consider a subsequence . Let ϵ > 0 and choose such that for all k. By the compactness of D, there exists a limit point x ∈ D such that along a subsequence . By the hypothesis (3.10) we then have
Proof of Theorem 3.1. We start by establishing an identity from which we show both bounds. We need to calculate the probability
Upper Bound
Starting from Equality (3.15), let us first establish the upper bound of the LDP. To this end, let F be a closed set and consider the decay rate
We can bound the first term in this expression from above using Lemma A.5, which implies that the decay rate (3.16) is majorized by
Lower Bound
To complete the proof, we need to establish the corresponding lower bound. Let G be an open set and consider
In order to apply Theorem 3.1, one needs to check that Assumption 1 holds. In general, this can be quite a cumbersome exercise. In Lemma 3.3 below, we provide a sufficient, easy-to-check condition under which this assumption holds.
Lemma 3.3. Assume that ΛU(θ) < ∞ for all θ ∈ ℝ. Then Assumption 1 holds.
Remark 3.4. The assumption we make in Lemma 3.3, that is, that the logarithmic moment generating function is finite everywhere, is a common assumption in large deviations theory. We remark that, for instance, Mogul'skiĭ's theorem [17, Theorem 5.1.2] also relies on this assumption; this theorem is a sample-path LDP for the partial-sum process $Z_n(t) = \frac{1}{n}\sum_{i=1}^{\lfloor nt \rfloor} X_i$, $t \in [0, 1]$.
Remark 3.5. In Lemma 3.3 it was assumed that ΛU(θ) < ∞ for all θ ∈ ℝ; an equivalent condition is that the tail of U decays superexponentially, that is,
\[
\lim_{u\to\infty} \frac{1}{u} \log \mathbb{P}(U > u) = -\infty.
\]
Proof of Lemma 3.3. Let φn → φ for some sequence of φn ∈ Φ and φ ∈ Φ. We introduce two families of random vectors {Yn} and {Zn},
We have to show that for any δ > 0,
Remark 3.6. Large deviations analysis provides us with insight into the behavior of the system conditional on the rare event under consideration happening. In this remark we compare the insight we gain from the rate functions (3.7) and (3.8). We consider the decay rate of the probability of the rare event that the average loss process Ln(·)/n is in the set A, and do so by minimizing the rate function over x ∈ A (where x⋆ denotes the optimizing argument).
Let, for ease, the random vector (UiZi(1), …, UiZi(N)) have a density f(y1, …, yN). Then well-known large deviations reasoning yields that, conditional on the rare event A, the vector (UiZi(1), …, UiZi(N)) behaves as being sampled from an exponentially twisted distribution with density
Importantly, the rate function we identified in (3.8) gives more detailed information on the system conditional on being in the rare set A. The default times of the individual obligors are to be sampled from the distribution (with φ⋆ ∈ Φ the optimizing argument in (3.8)), whereas the claim size of an obligor defaulting at time i has density
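This twisted structure also underlies rare-event (importance-sampling) simulation. The sketch below, a minimal illustration rather than the paper's method, exponentially twists a discrete loss-given-default distribution with a parameter θ and samples from the result; in practice the twist would come from the optimization in (3.8).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def exponential_twist(support, probs, theta):
    """Return the theta-twisted distribution: p_theta(y) ~ exp(theta*y) * p(y)."""
    w = probs * np.exp(theta * support)
    return w / w.sum()

# Hypothetical loss-given-default distribution on {1, 2, 3, 4} (units of u).
support = np.array([1.0, 2.0, 3.0, 4.0])
probs = np.array([0.5, 0.3, 0.15, 0.05])

theta = 0.8  # illustrative twist parameter, not an optimized value
twisted = exponential_twist(support, probs, theta)
samples = rng.choice(support, size=10, p=twisted)

print("twisted probabilities:", np.round(twisted, 4))
print("samples under the twist:", samples)
```

The twist shifts probability mass toward large losses, so that the rare event of interest becomes typical under the sampling distribution; the likelihood ratio then corrects the resulting estimator.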
We conclude this section with some examples.
Example 3.7. Assume that the loss amounts have finite support, say on the interval [0, u]. Then we clearly have, for all θ ∈ ℝ,
\[
\Lambda_U(\theta) = \log \mathbb{E}\, e^{\theta U} \leq \max\{0, \theta u\} < \infty.
\]
In practical applications, one (always) chooses a distribution with finite support for the loss amounts, since the exposure to every obligor is finite. Theorem 3.1 thus clearly holds for any (realistic) model of the loss given default.
An explicit expression for the rate function (3.8), or even for the Fenchel-Legendre transform, is usually not available. On the other hand, one can use numerical optimization techniques to calculate these quantities.
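As an illustration of such a numerical approach (our sketch, with a hypothetical Poisson loss amount whose transform is known in closed form and so serves as a check), the Fenchel-Legendre transform $\Lambda_U^{\star}(x) = \sup_\theta\{\theta x - \Lambda_U(\theta)\}$ can be evaluated by one-dimensional convex optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam = 2.0  # illustrative Poisson parameter
Lambda = lambda theta: lam * (np.exp(theta) - 1.0)  # log-mgf of Poisson(lam)

def fl_transform(x):
    """Fenchel-Legendre transform: sup over theta of theta*x - Lambda(theta)."""
    res = minimize_scalar(lambda th: Lambda(th) - th * x)  # minimize the negative
    return -res.fun

x = 5.0
print("numerical  :", fl_transform(x))
print("closed form:", x * np.log(x / lam) - x + lam)  # known Poisson rate function
```

Since θ ↦ θx − ΛU(θ) is concave whenever ΛU is a logarithmic moment generating function, the one-dimensional search is reliable, and the same routine works for any ΛU one can evaluate numerically.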
We next present an example to which Lemma 3.3 applies.
Example 3.8. Assume that the loss amount U is measured in a certain unit, and takes on the values u, 2u, … for some u > 0. Assume that it has a distribution of Poisson type with parameter λ > 0, in the sense that for i = 0, 1, …,
\[
\mathbb{P}\bigl(U = (i+1)u\bigr) = e^{-\lambda} \frac{\lambda^{i}}{i!}.
\]
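Under this distribution the logarithmic moment generating function is finite everywhere, so Lemma 3.3 indeed applies; a direct computation gives
\[
\mathbb{E}\, e^{\theta U}
= \sum_{i=0}^{\infty} e^{\theta (i+1) u}\, e^{-\lambda} \frac{\lambda^{i}}{i!}
= e^{\theta u - \lambda} \sum_{i=0}^{\infty} \frac{\bigl(\lambda e^{\theta u}\bigr)^{i}}{i!}
= e^{\theta u}\, e^{\lambda (e^{\theta u} - 1)},
\]
so that $\Lambda_U(\theta) = \theta u + \lambda\bigl(e^{\theta u} - 1\bigr) < \infty$ for all $\theta \in \mathbb{R}$.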
4. Exact Asymptotic Results
In the previous section we have established a sample-path large deviation principle on a finite time grid; this LDP provides us with logarithmic asymptotics of the probability that the sample path of Ln(·)/n is contained in a given set, say A. The results presented in this section are different in several ways. In the first place, we derive exact asymptotics (rather than logarithmic asymptotics). In the second place, our time domain is not assumed to be finite; instead, we consider all of ℕ. The price to be paid is that we restrict ourselves to special sets A, namely, those corresponding to the loss process (or the increment of the loss process) exceeding a given function. We work under the setup that we introduced in Section 2.1.
4.1. Crossing a Barrier
Theorem 4.1. Assume that
Before proving the result, which relies on arguments similar to those in [18], we first discuss the meaning and implications of Theorem 4.1, and reflect on the role played by the assumptions. We do so through a sequence of remarks.
Remark 4.2. Comparing Theorem 4.1 to the Bahadur-Rao theorem (Theorem A.8), we observe that the probability of a sample mean exceeding a rare value has the same type of decay as the probability of our interest (i.e., the probability that the normalized loss process Ln(·)/n ever exceeds some function ζ). This decay looks like $C e^{-nI}/\sqrt{n}$ for positive constants C and I. This similarity can be explained as follows.
First, observe that the probability of our interest is actually the probability of a union of events. Evidently, this probability is larger than the probability of any of the events in this union, and hence also larger than the largest among these:
\[
\mathbb{P}\Bigl(\exists t : L_n(t) \geq n\,\zeta(t)\Bigr)
\;\geq\; \max_{t}\, \mathbb{P}\bigl(L_n(t) \geq n\,\zeta(t)\bigr).
\]
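Conversely, a union bound gives an upper bound of the same exponential order (the usual “principle of the largest term”):
\[
\max_{t}\, \mathbb{P}\bigl(L_n(t) \geq n\,\zeta(t)\bigr)
\;\leq\; \mathbb{P}\Bigl(\bigcup_{t} \bigl\{L_n(t) \geq n\,\zeta(t)\bigr\}\Bigr)
\;\leq\; \sum_{t} \mathbb{P}\bigl(L_n(t) \geq n\,\zeta(t)\bigr),
\]
and the sum is dominated, on the exponential scale, by its largest term; Assumption (4.4) is precisely what guarantees that the infinite tail of this sum does not contribute.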
As is clear from the statement of Theorem 4.1, two assumptions are needed to prove the claim; we now briefly comment on the role played by these.
Remark 4.3. Assumption (4.3) is needed to make sure that there is no time epoch, different from t⋆, having a contribution of the same order as that of t⋆. It can be verified from our proof that if the uniqueness assumption is not met, the probability under consideration remains asymptotically proportional to $e^{-nI}/\sqrt{n}$, but we lack a clean expression for the proportionality constant.
Assumption (4.4) has to be imposed to make sure that the contribution of the “upper tail”, that is, time epochs t ∈ {t⋆ + 1, t⋆ + 2, …}, can be neglected; more formally, we should have
Remark 4.4. We now comment on what Assumption (4.4) means. Clearly,
Proof of Theorem 4.1. We start by rewriting the probability of interest as
4.2. Large Increments of the Loss Process
A similar probability has been considered in [5], where the authors derive the logarithmic asymptotic behavior of the probability that the increment of the loss between times s and t, with s < t in a bounded interval, exceeds a threshold that depends only on t − s. In contrast, our approach uses a more flexible threshold, which depends on both times s and t, and in addition we derive the exact asymptotic behavior of this probability.
Theorem 4.5. Assume that
Remark 4.6. A first glance at Theorem 4.5 tells us that the result obtained is very similar to that of Theorem 4.1. The second condition, that is, Inequality (4.29), however, seems to be more restrictive than the corresponding condition, that is, Inequality (4.4), due to the infimum over s. This assumption ensures that the “upper tail” is negligible for any s. In the previous subsection we have seen that, under mild restrictions, the upper tail can be safely ignored when the barrier function grows at a rate of at least log t. We can extend this claim to our new setting of large increments, as follows.
First note that
The sufficient condition (4.32) shows that the range of admissible barrier functions is quite substantial, and, importantly, imposing (4.29) is not as restrictive as it seems at first glance.
Proof of Theorem 4.5. The proof of this theorem is very similar to that of Theorem 4.1. Therefore we only sketch the proof here.
As before, the probability of interest is split up into a “front part” and a “tail part.” The tail part can be bounded using Assumption (4.29); this is done analogously to the way Assumption (4.4) was used in the proof of Theorem 4.1. The uniqueness assumption (4.28) then shows that the probability of interest is asymptotically equal to the probability that the increment between time s⋆ and t⋆ exceeds ξ(s⋆, t⋆). An application of the Bahadur-Rao theorem to this latter probability yields the result.
5. Discussion and Concluding Remarks
In this paper, we have established a number of results with respect to the asymptotic behavior of the distribution of the loss process. In this section we discuss some of the assumptions in more detail and we consider extensions of the results that we have derived.
5.1. Extensions of the Sample-Path LDP
The first part of our work, Section 3, was devoted to establishing a sample-path large deviation principle on a finite time grid. Here we modeled the loss process as the sum of i.i.d. loss amounts multiplied by i.i.d. default indicators. From a practical point of view one can argue that the assumptions underlying our model are not always realistic. In particular, the random properties of the obligors cannot always be assumed independent. In addition, the assumption that all obligors have the same distributional properties will not necessarily hold in practice. Both shortcomings can be dealt with, however, by adapting the model slightly.
5.2. Extensions of the Exact Asymptotics
In the second part of the paper, that is, Section 4, we have derived the exact asymptotic behavior for two special events. First we showed that, under certain conditions, the probability that the loss process exceeds a certain time-dependent level is asymptotically equal to the probability that the process exceeds this level at the “most likely” time t⋆. The exact asymptotics of this probability are obtained by applying the Bahadur-Rao theorem. A similar result has been obtained for an event related to the increment of the loss process. One could think of refining the logarithmic asymptotics, as developed in Section 3, to exact asymptotics. Note, however, that this is far from straightforward, as for general sets these asymptotics do not necessarily coincide with those of a univariate random variable, cf. [19].
Acknowledgments
V. Leijdekker would like to thank ABN AMRO Bank for providing financial support. Part of this work was carried out while M. Mandjes was at Stanford University, USA. The authors are indebted to E. J. Balder (Utrecht University, The Netherlands) for pointing out the relevance of epi-convergence to their research.
Appendix
Background Results
In this section, we state a number of definitions and results, taken from [17], which are used in the proofs in this paper.
Theorem A.1 (Cramér). Let Xi be i.i.d. real-valued random variables with all exponential moments finite, and let μn be the law of the average $\frac{1}{n}\sum_{i=1}^{n} X_i$. Then the sequence {μn} satisfies an LDP with rate function Λ⋆(·), where Λ⋆ is the Fenchel-Legendre transform of the logarithmic moment generating function of the Xi.
Proof. See, for example, [17, Theorem 2.2.3].
Definition A.2. We say that two families of measures {μn} and {νn} on a complete separable metric space (𝒳, d) are exponentially equivalent if there exist two families of 𝒳-valued random variables {Yn} and {Zn} with marginal distributions {μn} and {νn}, respectively, such that for all δ > 0,
\[
\limsup_{n\to\infty} \frac{1}{n} \log \mathbb{P}\bigl(d(Y_n, Z_n) > \delta\bigr) = -\infty.
\]
Lemma A.3. For every triangular array , n ≥ 1, 1 ≤ i ≤ n,
Proof. Elementary, but also a direct consequence of [17, Lemma 1.2.15].
Lemma A.4. Let Λ(θ) < ∞ for all θ ∈ ℝ, then
Proof. This result is a part of [17, Lemma 2.2.20].
Lemma A.5. Let Kn,j be defined as Kn,j := #{i ∈ {1, …, n} ∣ τi = j}. Then for any vector k ∈ ℕN such that $\sum_{j=1}^{N} k_j = n$, we have that
Proof. See [17, Lemma 2.1.9].
Lemma A.6. Define
Proof. See [17, Lemma 5.1.8]. This lemma is one of the key steps in proving Mogul'skiĭ's theorem, which provides a sample-path LDP for Zn(·) on a bounded interval.
Theorem A.7. If an LDP with a good rate function I(·) holds for the probability measures {μn}, which are exponentially equivalent to {νn}, then the same LDP holds for {νn}.
Proof. See [17, Theorem 4.2.13].
Theorem A.8 (Bahadur-Rao). Let Xi be a sequence of i.i.d. real-valued random variables and let q > 𝔼[X1]. Then we have
\[
\mathbb{P}\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i \geq q\Bigr)
\sim \frac{C_{X,q}}{\sqrt{n}}\, e^{-n \Lambda_X^{\star}(q)},
\]
where the constant CX,q is determined by the following two cases.
- (i)
The law of X1 is lattice, that is, for some x0, d, the random variable (X1 − x0)/d is (a.s.) an integer, and d is the largest number with this property. Under the additional condition 0 < ℙ(X1 = q) < 1, the constant CX,q is given by
\[
C_{X,q} = \frac{d}{\bigl(1 - e^{-\theta_q d}\bigr)\, \sigma \sqrt{2\pi}},
\]
where θq satisfies $\Lambda_X'(\theta_q) = q$ and σ satisfies $\sigma^2 = \Lambda_X''(\theta_q)$.
- (ii)
If the law of X1 is nonlattice, the constant CX,q is given by
\[
C_{X,q} = \frac{1}{\theta_q\, \sigma \sqrt{2\pi}},
\]
with θq and σ as in case (i).
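As a numerical sanity check (our illustration, with hypothetical parameters), the sketch below compares the exact binomial tail with the lattice-case approximation of Theorem A.8 for Bernoulli(p) variables, for which d = 1, the equation Λ'X(θq) = q yields θq = log(q(1 − p)/(p(1 − q))), and σ² = q(1 − q):

```python
import numpy as np
from scipy.stats import binom

# Bernoulli(p) sample mean exceeding q > p: lattice case with d = 1.
p, q, n = 0.1, 0.25, 400

# Rate function and twisted parameters (standard closed forms for Bernoulli).
I = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
theta = np.log(q * (1 - p) / (p * (1 - q)))   # solves Lambda'(theta) = q
sigma = np.sqrt(q * (1 - q))                  # sigma^2 = Lambda''(theta)
C = 1.0 / ((1.0 - np.exp(-theta)) * sigma * np.sqrt(2.0 * np.pi))

exact = binom.sf(int(np.ceil(n * q)) - 1, n, p)   # P(sample mean >= q)
approx = C / np.sqrt(n) * np.exp(-n * I)

print(f"exact      : {exact:.3e}")
print(f"Bahadur-Rao: {approx:.3e}")
```

Even at moderate n the two values agree to within a few percent, illustrating why exact asymptotics of this type are so much sharper than the logarithmic asymptotics of Section 3.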