Missing phosphorus legacy of the Anthropocene: Quantifying residual phosphorus in the biosphere
Abstract
A defining feature of the Anthropocene is the distortion of the biosphere phosphorus (P) cycle. A relatively sudden acceleration of input fluxes without a concomitant increase in output fluxes has led to net accumulation of P in the terrestrial-aquatic continuum. Over the past century, P has been mined from geological deposits to produce crop fertilizers. When P inputs are not fully removed with harvest of crop biomass, the remaining P accumulates in soils. This residual P is a uniquely anthropogenic pool of P, and its management is critical for agronomic and environmental sustainability. Managing residual P first requires its quantification, but measuring residual P is challenging. In this review, we synthesize approaches to quantifying residual P, with emphasis on advantages, disadvantages, and complementarity. Common approaches to estimate residual P are mass balances, long-term experiments, soil test P trends, and chronosequences, each with varying suitability for, or even limitation to, distinct spatiotemporal scales. We demonstrate that individual quantification approaches are (i) constrained, (ii) often complementary, and (iii) feasible at only certain time–space scales. While some of these challenges are inherent to the quantification approach, many are surmountable and can be addressed by unifying existing P pool and flux datasets, standardizing and synchronizing data collection on pools and fluxes, and quantifying uncertainty. Though residual P is defined as a magnitude, its distribution and speciation are relatively less understood but shape its utilization and environmental impacts. The form of residual P will vary by agroecosystem context due to edaphoclimatic-specific transformation of the accumulated P, which has implications for management (e.g., crop usage) and future policies (e.g., lag times in P loading from non-point sources).
Quantifying the uncertainty in measuring residual P holds value beyond scientific understanding, as it supports prioritization of monitoring and management resources and informs policy.
1 INTRODUCTION
Phosphorus (P) is an essential element for life on Earth and accordingly has been managed by humans via agriculture. The degree of human alteration of the biogeochemical cycle of P has been drastic, and has occurred largely within the past century. The Anthropocene P cycle is characterized by a relatively rapid and 3- to 4-fold increase in P flux into the biosphere from the lithosphere (Bennett et al., 2001; Yuan et al., 2018). Influxes are dominated by fertilizer applications to soils under agricultural land use, enabled by historically recent (<90 years) large-scale mining of phosphate rock deposits to produce concentrated P fertilizers (Cordell et al., 2009; Yuan et al., 2018). This relatively recently accrued “positive balance” of P of the biosphere has directly caused P enrichment of soils, and consequently the terrestrial-aquatic continuum. Though often focused on soil due to its agricultural origins, the positive P balance of the biosphere—referred to as residual or in some cases “legacy” P—has changed the magnitude and potentially nature of P fluxes at varying spatiotemporal scales across soils and surface waters.
Since the 2000s, “legacy P” has emerged as a popular term to describe P that exists in soils and catchments as a result of either past anthropogenically released rock phosphate-derived P, or of human impacts on P fluxes in aquatic systems, including non-fertilizer P sources (e.g., sediment loading via erosion; Zhou & Margenot, 2023). The lack of more than one stable isotope for P has made it difficult to differentiate between the “legacy” of the newly added P and the background geogenic P in circulation prior to the mid-1900s. Phosphorus accumulated from past inputs, usually in an agricultural context but sometimes referring to historical or past loading of P, is often referred to as “legacy P”, though “residual P” was historically used to denote the accumulation of P inputs. For this review, we will use the term “residual P” because it better reflects the net increase in P stocks in the soil–water continuum due to anthropogenic P inputs (Zhou & Margenot, 2023).
While global estimates of residual P have been made (e.g., Cordell et al., 2009; Sattari et al., 2012), quantifying the magnitude and distribution of the positive P balance at finer spatiotemporal scales is necessary to understand and manage the anthropogenic P cycle. Here, we examine how residual P can be measured (how much and where) in order to advance fundamental biogeochemical understanding and applied management of the anthropogenic P cycle. Individual approaches to quantifying the magnitude and distribution of residual P are indirect and inadequate in isolation, leaving gaps across spatiotemporal scales (Figure 1). To measure residual P, mass balances, long-term experiments, soil test P trends, and chronosequences offer spatiotemporal-specific advantages and disadvantages. For this reason, combining approaches stands to improve the accuracy and certainty of estimation, though ultimately gaps will likely persist. Identifying such gaps is important, as the extent and (in)accuracy of residual P measurement have implications for research and policy, given increasing demands to monitor and manage the agronomic and environmental trade-offs of this element, which are complicated by its relatively recent accumulation in the biosphere.

2 RESIDUAL P IN THE TERRESTRIAL-AQUATIC CONTINUUM
The release and export of residual P from cultivated soil to aquatic systems are governed by multiple P transfer processes in the terrestrial-aquatic continuum at the watershed scale (Haygarth et al., 2005). Sediment inputs to streams and aquatic ecosystems can originate from upland erosion associated with intensive land clearance (e.g., deforestation), agricultural activities (e.g., tillage), and mining (James, 2013, 2018). These anthropogenic deposits are widely referred to as legacy sediment by environmental scientists, geomorphologists, sedimentologists, and hydrologists (Zhou & Margenot, 2023). Sediment-bound P has been referred to as sediment legacy P by some researchers (Lannergård et al., 2020; Søndergaard et al., 2003) because it can be released from lake or river beds to the water column through internal biogeochemical cycling processes, retarding the recovery of eutrophic aquatic systems.
However, the hydrological use of the term legacy P is distinct from and conflicts with the soil science and agronomy use of legacy P to mean residual P. Importantly, a large proportion of sediment-bound P is of non-anthropogenic origin. Thus, residual P interpreted strictly as P accumulated from past anthropogenic inputs would refer to the dissolved P in surface runoff and particulate P in eroded topsoil transported to rivers and lakes that was originally added to soils by humans (Barrow, 1980). The transfer of residual or newly applied P from terrestrial to aquatic ecosystems can lead to P accumulation in lakes and river beds, serving as a long-term P loading source to surface water (Holtan et al., 1988). According to the strict agronomic interpretation of residual P as anthropogenic inputs (i.e., accrued from fertilizer or manure applications), only the fertilizer-derived P desorbed from sediments would be considered to be sediment legacy P because it is derived from past applications of P to soil (Kleinman et al., 2011). The “legacy P hypothesis” postulated by Haygarth et al. (2014) for a catchment or watershed means that delays in catchment export of loaded P due to long-term legacy effects are important to consider for surface water P loading (Powers et al., 2016).
3 ESTIMATING RESIDUAL PHOSPHORUS
Most approaches to quantifying residual P are (i) estimates at best, due to assumptions that are commonly made, (ii) indirect, and (iii) individually inadequate. For example, calculation of residual P by difference via a mass balance is an indirect estimate. Direct measurements of soil P often cannot establish the magnitude of residual P with high confidence due to incomplete measures of soil P change, necessary assumptions, and/or the lack of a true baseline of native soil P (i.e., t = 0 prior to anthropogenic inputs). Combining approaches can improve accuracy and certainty of estimation. However, which scale(s) is feasible versus which scale(s) is necessary may not necessarily align. This holds implications for fundamental understanding of P biogeochemistry and environmental policy.
3.1 Mass balances

Conversely, by assuming a mass balance (i.e., magnitude) and defining a spatial (e.g., watershed or a farm) and temporal (e.g., 1 year) boundary and resolution, one can also estimate a missing term (e.g., input, output, storage). The ability to estimate a missing term is useful in the context of residual P, as residual P is often difficult to measure directly at larger spatial scales. If one assumes zero residual P (steady state), then, by knowing most inputs and intentional outputs, one can estimate unintentional outputs, which in the case of P entails losses to waters via erosion and runoff. If, on the other hand, we assume all inputs and outputs are known, we can estimate residual P.
If Inputs > Outputs there is a net surplus and there is likely to be an accumulation of P in soil. If Inputs < Outputs there is a net deficit and there is likely a drawdown of soil P. By adding net annual balances over many years for a given geographic area, one can estimate long-term build-up or depletion of P (e.g., Sabo et al., 2021).
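The summation of annual balances described above can be sketched in a few lines of Python. This is a minimal illustration rather than a method from any cited study: the function names, the flux categories, and all numerical values are hypothetical, and every flux is assumed to be expressed in kg P ha−1 year−1 for the same system boundary.

```python
from itertools import accumulate


def annual_balance(inputs, outputs):
    """Net P balance for one year: positive = surplus, negative = deficit."""
    return sum(inputs.values()) - sum(outputs.values())


def cumulative_balance(years):
    """Running sum of annual balances for a fixed system boundary.

    `years` is a sequence of (inputs, outputs) dict pairs, one per year.
    The last element estimates net build-up (> 0) or drawdown (< 0) of P.
    """
    return list(accumulate(annual_balance(i, o) for i, o in years))


# Hypothetical three-year record for a single field (illustrative values only)
record = [
    ({"fertilizer": 30.0, "manure": 10.0}, {"harvest": 25.0, "runoff": 1.0}),
    ({"fertilizer": 20.0, "manure": 0.0}, {"harvest": 24.0, "runoff": 1.0}),
    ({"fertilizer": 0.0, "manure": 0.0}, {"harvest": 22.0, "runoff": 1.0}),
]

print(cumulative_balance(record))  # [14.0, 9.0, -14.0]
```

In this invented record, two years of surplus are followed by a year without inputs, so the cumulative balance switches from accumulation to net drawdown. Any flux left out of the dictionaries (e.g., atmospheric deposition or erosion) is implicitly assumed to be zero.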
The application of the mass balance principle to derive nutrient budgets is one means to estimate residual P. Balances are a key tool in biogeochemistry (Evans et al., 1997) and agronomy (Roy et al., 2003; Vitousek et al., 2009), offering a means to inventory input–output flows and the net outcome of such flows: increases, decreases or no changes in the amount of nutrient in a defined system. It is important to consider how system boundary choices affect mass balance study results and their interpretation. To quantify residual P with high confidence, it is necessary to have accurate data on all inputs and outputs of a system. This is rarely the case; information on intentional inputs is usually more readily available than on outputs, and in turn, quantifying intentional outputs is easier than unintentional ones. Intentional inputs usually include mineral fertilizers and animal manure whereas intentional outputs are crop and animal biomass harvest. If the system includes people (e.g., a city) then wastewater treatment (or lack thereof) is also considered an input.
Often, inputs and outputs that are rarely measured or not feasible to measure are also negligible on short timescales, such as atmospheric deposition. With increasing timescale, these annually minor input–output fluxes may amplify to be appreciable in magnitude. For example, in the annual P balance for a 22,360 ha catchment (including a 3160 ha lake), 6.2% of input fluxes were atmospheric deposition and 0.78% of output fluxes were total P discharged via outlets (Li et al., 2016). Accounting for unintentional P export fluxes (i.e., losses) can deliver high accuracy to mass balance approaches to estimate residual P.
Illustrating this, in New Zealand pastures receiving fertilizer P as triple superphosphate over 57 years, mass balance calculations that accounted for P losses by irrigation outwash, surface runoff, and leaching (as well as excretal transfer by grazing livestock) were closely verified by soil sampling to quantify soil P stock change: of the 946–1932 kg ha−1 of residual P estimated by mass balances, 97%–99% was recovered in the soil to 100 cm depth (Tian et al., 2019). Relative to the magnitude of residual P, these output fluxes ranged from 17% (1932 kg residual P ha−1) to 25% (946 kg residual P ha−1), highlighting the potentially severe overestimation of residual P via mass balances if seemingly small and/or logistically challenging non-agronomic output fluxes are omitted. Uncertainty in mass balances is important to quantify, as it provides needed context on the confidence of the residual P estimate. For example, the mean positive P balance of 0.8 kg P ha−1 year−1 estimated for the EU and UK during 2011–2019 was over fourfold lower than the 90% confidence interval of 3.5 kg P ha−1 year−1 (Panagos et al., 2022), severely limiting certainty of changes in P balances over time.
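One simple way to attach an uncertainty to a balance is to propagate the uncertainty of each flux. The Python sketch below assumes, for illustration only, that flux errors are independent and roughly normal, so that standard deviations combine in quadrature; the cited studies may have used different procedures. The function names, the z-value of 1.645 (a two-sided 90% interval under normality), and the flux values are assumptions.

```python
import math


def balance_with_uncertainty(fluxes):
    """Return (balance, combined standard deviation).

    `fluxes` is a list of (value, sd, sign) tuples, with sign = +1 for
    inputs and -1 for outputs. Errors are ASSUMED independent, so the
    standard deviations add in quadrature.
    """
    balance = sum(sign * value for value, _sd, sign in fluxes)
    combined_sd = math.sqrt(sum(sd ** 2 for _value, sd, _sign in fluxes))
    return balance, combined_sd


def distinguishable_from_zero(balance, sd, z=1.645):
    """True if the interval balance +/- z*sd excludes zero."""
    return abs(balance) > z * sd


# Hypothetical fluxes in kg P/ha/yr: (value, sd, sign)
fluxes = [
    (20.0, 3.0, +1),  # fertilizer input
    (15.0, 4.0, -1),  # crop harvest output
]
balance, sd = balance_with_uncertainty(fluxes)
print(balance, sd)                             # 5.0 5.0
print(distinguishable_from_zero(balance, sd))  # False
```

Here a nominal surplus of 5 kg P ha−1 year−1 cannot be distinguished from a neutral balance, which mirrors the situation in which the interval around a balance exceeds the balance itself.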
Although nutrient budgets vary slightly in the inputs, outputs, and storage terms used, it is useful to contrast two prevalent “families” of mass balance approaches: the “soil” balance versus the “metabolic” balance. Haygarth et al. (1998) used a soil balance for dairy and sheep farms in the UK to identify considerable surplus. Subsequently, Page et al. (2005) demonstrated considerable accumulation of residual P in surface soil as a result of a positive soil P balance under UK grasslands. The metabolic balance draws on measures of many of the same inputs and outputs as the soil balance but makes distinct assumptions on the intermediary steps. The most common approach is called a Net Anthropogenic P Inventory (NAPI), derived from a systems approach developed for nitrogen (Howarth et al., 1996). This approach has been used to identify accumulation of P inputs, largely via fertilizer, in major basins globally, such as the Yangtze River basin, the largest watershed in China (Hu et al., 2020), or the Baltic Sea basin of Europe (McCrackin et al., 2018).
When we have information only on intentional inputs and outputs, we can still make inferences on the “risk” of P storage or loss by calculating the P use efficiency (PUE). There are multiple ways to define and calculate PUE, much of it inherited from metrics on nitrogen use efficiency, but a simple calculation that reflects the mass balance approach focused on intentional (i.e., agronomic) inputs and outputs (Figure 2) is the proportion of inputs, in magnitude, that is removed as outputs. It is important to note that this does not mean that outputs of P were necessarily derived from inputs of P, but that the PUE metric is a ratio of relative magnitudes. This metric is but one of several approaches to estimating PUE and, as for other nutrient elements such as nitrogen, is not a true determination of PUE, which requires direct tracing via isotopic labeling (i.e., 32P or 33P). Rather, such indirect approaches yield “apparent” use efficiency (Congreves et al., 2021; Harmsen & Moraghan, 1988). While useful to gauge PUE in cropping systems at longer timescales (e.g., Zou et al., 2022), this approach overestimates how efficiently crops (or a system of interest) use new P inputs because it does not explicitly account for residual P contributions (Dobermann, 2007). Similar to nitrogen use efficiency, PUE determined directly by isotopic labeling of the fertilizer nutrient is the most accurate metric of use efficiency (Congreves et al., 2021), but the relatively short half-lives of the two major P radioisotopes and their radioactivity limit applicability for experiments outside of controlled conditions (e.g., limitation to greenhouse).
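The apparent PUE described above is a ratio of magnitudes, which can be made concrete with a short Python sketch. The function name and the example values are hypothetical; both arguments are assumed to be in the same units (e.g., kg P ha−1 year−1) over the same boundary and period.

```python
def apparent_pue(p_removed, p_applied):
    """Apparent P use efficiency: intentional outputs / intentional inputs.

    A ratio of magnitudes only; it does NOT show that the P removed
    originated from the P applied (that would require isotopic tracing).
    """
    if p_applied <= 0:
        raise ValueError("apparent PUE is undefined without P inputs")
    return p_removed / p_applied


# Hypothetical values (kg P/ha/yr)
print(apparent_pue(18.0, 24.0))  # 0.75 -> surplus; P may accrue as residual P
print(apparent_pue(30.0, 24.0))  # 1.25 -> removal exceeds inputs (drawdown)
```

A value above 1 is possible precisely because crops can draw on soil P, including residual P, which is why this ratio cannot be read as the efficiency with which newly applied P is used.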
To a much greater extent than N, P not removed with biomass harvest can accumulate in the soil. As such, low same-season PUE means inputs can be present as residual P for future cropping seasons. In other words, although low nitrogen use efficiency generally means higher losses (Schröder et al., 2003; van Beek et al., 2003), with low PUE it may take several years for high inputs to be expressed as losses. For example, largely positive mass balances at the field scale (1.1–8.4 ha) incurred by poultry litter application over 15 years were positively associated (R2 = .95) with dissolved reactive P loss via surface runoff from row crop fields (Bos et al., 2021).
At the watershed and regional scales, P balances tend to be positively associated, albeit less strongly, with P loads to surface waters (McCrackin et al., 2018). In many cases, the mass balance approach is focused on predicting water quality rather than quantifying residual P. Furthermore, residual P is often precisely why P balances are not strongly correlated with P export from the system, particularly at large spatial scales such as watersheds (Metson et al., 2020; Stackpoole et al., 2019; Van Staden et al., 2022). Although soils (and other watershed components) can buffer waters from accumulated anthropogenic P, the lack of a correlation between annual budgets and water quality outcomes can reflect continued contributions of residual P to vertical (i.e., leaching) and horizontal (i.e., surface runoff) off-farm losses after inputs have ceased. This means that it may take tens or even hundreds of years to draw down residual P to the point that it no longer contributes to water quality issues, as has been modeled in Canada (Goyette et al., 2018; Kusmer et al., 2019) and Sweden (McCrackin et al., 2018). Meeting water quality objectives thus must account for residual P and its time-lag effect on water quality (Jarvie et al., 2013).
Although aggregated P surpluses over time estimate how much P has accumulated within the boundaries of a system, from study plot to continent (see Section 4.2), mass balances do not enable identification of where within the system the residual P is located (Van Meter et al., 2021). This is more important with increasing spatial scales, as the location of residual P has greater uncertainty. Here, field-based sampling to quantify soil P stocks is a powerful approach, provided that historic or non-agricultural land use reference soil samples are available as a baseline. In some cases, remote sensing (e.g., satellite) may be able to increase the resolution of inputs and outputs, and thus of how much potential residual P there is and where it is located. For example, remote sensing can quantify crop biomass with high precision at the sub-field scale (Guan et al., 2016; Ye et al., 2023) to estimate P efflux with harvest (e.g., Zhang et al., 2023), or historical changes in concentrated feeding operations that can be used to identify where the application of manure, and thus accumulation of P, is likely higher (e.g., Miralha et al., 2021; Shea et al., 2022).
3.2 Long-term experiments
Long-term research experiments can enable calculation and verification of positive P balances with relatively high certainty by virtue of small spatial area and control of major input and output fluxes, enabling near-complete and accurate balances. Accounting for all input and output fluxes, which includes detailed record keeping and measuring (not assuming) P fluxes (e.g., manure or grain P concentrations), increases certainty. For example, a recent evaluation of P balances across 56 long-term experiments in North America revealed that uncertainty in P balances averaged more than 50% (13.1 kg ha−1) of mean balances (22.4 kg P ha−1; Welikhe et al., 2023). Uncertainty was sufficiently large in magnitude to preclude determination of net change in P balance over time for 39% of long-term experiment balances. However, long-term experiments with good record keeping (e.g., as-applied rates of inputs and measurement of P concentrations in input–output fluxes) and measurement of P input–output fluxes are generally few and far between in the agricultural world, and relatively young. For example, at the turn of the 21st century, the US Department of Agriculture established an 18-site network of long-term agroecosystem research experiments (Kleinman et al., 2018). Historical long-term experiments that exceed 100 years can be counted on one hand. This cluster of centennial field experiments in the US was established with the advent of the US land grant institutions in the late 1800s, several of which continue today. In addition to the Morrow Plots at University of Illinois, the eldest sibling (1876), there is the Sanborn Field at University of Missouri (1888), Magruder Plots at Oklahoma State University (1892), and the Cullars Rotation at Auburn University (1911; Richter & Markewitz, 2001).
Phosphorus balances in select treatment plots of the second oldest continuous agricultural experiment in the world exemplify the strengths and weaknesses of long-term experiments for quantifying residual P (Figure 3). Balances from the Morrow Plots are positive (up to 5000 kg P ha−1) with highly variable increases in soil P in plots receiving inputs (fertilized) over a 133-year timespan. Residual P accrued rapidly at times, increasing at a rate of up to 125 kg ha−1 year−1 in the early 20th century under rock phosphate application. Phosphorus drawdown was significantly more constrained due to the inherent limits to P uptake and removal with harvest of crop biomass (maize grain). Consequently, the magnitude of change in the most heavily fertilized treatment after 133 years is fivefold that of the unfertilized treatment that did not receive P inputs but for which P was still exported via maize grain harvest.

However, long-term experiments provide valuable context on processes relevant to P dynamics with relatively high certainty of tracking P balances. For example, changes in soil organic matter or yield trends at timescales that exceed most experimental durations of 5–10 years (Bowles et al., 2020; Johnston et al., 2017) can be used to contextualize P balances and the magnitude—as well as speciation—of residual P (e.g., Sun et al., 2022). Finally, longitudinal data, especially if auxiliary data is collected (e.g., weather), raises the possibility for calibration and validation of biogeochemical P models at a scale that enables backcasting and forecasting at a longer range and with higher confidence.
3.3 Soil test trends
Agronomic measures of “soil test” P are generally regarded as capturing the form of P most readily available to plants and have been instrumental in guiding P management (Hopkins & Hansen, 2019), but only account for a small proportion of total P in soils. The first soil tests were developed in the 1940s to predict the probability of crop yield response to addition of fertilizer (Peck, 1990), and for P specifically with the “Bray-1” test (Bray & Kurtz, 1945), followed by the Olsen test (Olsen, 1954) and Mehlich tests (Mehlich, 1978, 1984). Because such soil tests are widely used by farmers in developed countries to guide P fertilizer applications, soil test P measurements arguably offer the largest dataset with greatest coverage in space and time over the past several decades. For this reason, scaling soil testing has been proposed as a cost-effective means to begin to identify the magnitude as well as spatial distribution of residual P (Wironen et al., 2018). Alternatively, modeling soil test P concentrations across croplands can provide an estimate, albeit coarse at best, of P accrual (McDowell et al., 2023).
However, these agronomic measures of soil P (i) are operational (e.g., do not necessarily measure bioavailable P), (ii) vary in extraction procedures that furnish values that cannot be directly compared, and (iii) generally represent a minor fraction of total P in soil. The latter is perhaps the most salient limitation to the use of soil test P to measure residual P. For example, a global dataset of total soil P stocks, albeit limited to 50 cm depth, indicated that in US soils, soil test P accounts for a minor proportion (<7%) of P stocks (Yang et al., 2013), and often in a non-proportional manner. While balances for soil test P can be constructed, in particular to assess surface or subsurface losses of P in dissolved forms (e.g., Ekholm et al., 2005), these balances reflect only a minor portion of total soil P. In a recent assessment of P balances across US agroecosystems, labile P comprised less than 10%, and approximately 3% on average, of total P (Swaney & Howarth, 2019). Similarly, recent work in Ohio, US, showed that while fertilizer P application in excess of P export with crop biomass harvest did not increase ("build") soil test P, fertilizer application rates at or below P export rates caused decreases in soil test P ("drawdown") (Fulford & Culman, 2018) and substantially shifted P speciation (Wade et al., 2019). Thus, changes in soil test P only capture an inconsistently small fraction of the total P stocks, and only indirectly (if at all) reflect residual P.
Another approach is the use of P saturation indices (Nair, 2014; Pautler & Sims, 2000), which are often calculated using Fe and Al from the same extractant used for soil test P (e.g., Mehlich-3 extraction; Maguire & Sims, 2002; Nair et al., 2004) to estimate the proportion of P binding sites that are occupied. While useful for gauging relative loading of P to soils and P loss risk (i.e., greater risk as saturation increases; Pote et al., 1999), P saturation indices do not provide any information on the amount of residual P. While attempts have been made to use P saturation indices to evaluate residual P (e.g., Sharpley et al., 2020), quantification of residual P magnitudes (i.e., direct quantification via changes in soil P stocks to depth) is still needed to verify the degree to which indices provide a reasonable estimate of residual P in specific contexts. Similar to soil test P, changes in P saturation indices may provide a qualitative, if imperfect, estimate of total soil P changes and thus residual P.
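As an illustration of how such an index is computed, the Python sketch below uses one common molar-ratio formulation, in which extractable P is divided by the sum of extractable Fe and Al after converting each from mg kg−1 to mmol kg−1. This particular form is an assumption made for illustration; other saturation indices use different extractants or apply scaling coefficients to the denominator, and the function name and soil values are hypothetical.

```python
# Standard atomic masses (g/mol), used to convert mg/kg to mmol/kg
ATOMIC_MASS = {"P": 30.974, "Fe": 55.845, "Al": 26.982}


def p_saturation_ratio(p_mg_kg, fe_mg_kg, al_mg_kg):
    """Molar ratio of extractable P to extractable (Fe + Al).

    ASSUMED formulation: P / (Fe + Al), all in mmol/kg, with P, Fe, and Al
    measured in the same extract (e.g., Mehlich-3). Unitless result.
    """
    p = p_mg_kg / ATOMIC_MASS["P"]
    fe = fe_mg_kg / ATOMIC_MASS["Fe"]
    al = al_mg_kg / ATOMIC_MASS["Al"]
    if fe + al <= 0:
        raise ValueError("Fe + Al must be positive")
    return p / (fe + al)


# Hypothetical soil: 60 mg P, 150 mg Fe, and 800 mg Al per kg soil
print(round(p_saturation_ratio(60.0, 150.0, 800.0), 3))
```

The result is a unitless ratio, not a mass of P per unit area, which is why an index of this kind can rank soils by relative P loading without quantifying residual P.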
Though there is room—and arguably a strong need—to develop soil tests that more quantitatively reflect residual P (Barrow et al., 2022), such tests would also need to be cost-effective and match, if not exceed, the value of entrenched soil testing approaches. Moreover, soil test P data is largely privately owned, being paid for by farmers and agronomists and measured by commercial testing labs. Anonymizing soil test P data from commercial labs offers a compromise to track medium–spatial-scale trends (e.g., county level of US states), as has been successfully demonstrated in Ohio, US (Dayton et al., 2020) and New Zealand (McDowell et al., 2020). When paired with mass balances, soil test P can be used to corroborate trends in mass balances (i.e., positive or negative) at state to county to field scale, with success (e.g., Dayton et al., 2020) or severe discrepancies (e.g., MacDonald et al., 2011). Accessing and consolidating such databases holds strong promise for diagnosing the directionality of residual P (i.e., continued accumulation or mining) with minimal resource investment.
The potential of soil test P values to proxy directionality and potentially the magnitude of residual P rests in large part on the fate of applied P not taken up by the crop. For example, soil Olsen P concentrations and stocks across an agricultural landscape were used as proxies for residual P accumulation and transfer as a function of tillage and water erosion (VandenBygaart et al., 2021). However, because of the multiple sinks for residual P in soils, and the non-linear or inconsistently proportional relationships between residual P and soil test P (Barrow, 1980; McCollum, 1991), soil test P is at best a qualitative metric of residual P. In year 145 of the Morrow Plots, soil test (Mehlich-3) P values did not reflect absolute values of residual P (kg ha−1) among 12 treatments resulting from combinations of crop rotation and fertility input practices (Figure 4). Thus, though generally more commercially available due to lower analysis cost and higher throughput than total P, soil test P measurements are a valuable starting point but alone cannot be used to quantify residual P. From a practical standpoint, intentionally managing agroecosystems to have a negative P balance by applying less P than is removed in crop biomass (i.e., “drawdown”) is often monitored by farmers using soil tests. Given transformations of residual P (see Section 3.2) and the operational nature of soil tests (i.e., probabilistic predictors of crop yield response rather than bioavailable P; Hopkins & Hansen, 2019; O'Connor, 1988), a key need is the development of soil test relationships with residual P for specific agroecosystem types. Such relationships would help ensure agronomically viable drawdown of accumulated P in soils and could double as a means of better quantifying residual P for broader management and research purposes.

3.4 Chronosequences
The chronosequence, or “time sequence,” is a classic approach to studying phenomena that operate over long timescales, such as soil formation over millions of years or ecosystem development over centuries. Also known as space-for-time substitution, a chronosequence is a series of spatially discrete sites that differ in age of a process of interest (Stevens & Walker, 1970; Walker et al., 2010). The key assumption is that sites differ largely or only in age, and not in other soil-forming factors (e.g., parent material or climate). Chronosequences have been used to provide insights into factors that determine the dynamics, bioavailability, and fate of P in soil–plant systems under minimal anthropogenic disturbance (Turner & Condron, 2013). A framework to understand changes in the quantity and speciation of P during pedogenesis was proposed by Walker and Syers (1976) using soil profile data from four contrasting chronosequences in New Zealand. Subsequent data from many other chronosequences have consistently validated the principles proposed by Walker and Syers (1976) for terrestrial P biogeochemistry, notably the transformation of insoluble calcium-bound inorganic P in parent material (e.g., apatite) to inorganic P adsorbed on iron and aluminum oxide surfaces, entailing decreasing net bioavailability of P over time.
At the shorter timescales of the Anthropocene, chronosequences of agroecosystems can be used to assess residual P by comparing fields that differ in age of P management practices. For example, a 2000-year chronosequence of a rice-wheat rotation in the Bay of Hangzhou (China) revealed initial (100 years) accumulation of P as organic P before stabilizing under agricultural land use (Jiang et al., 2017) and enrichment of P across colloidal pools of all sizes (Jiang et al., 2023), suggesting decoupled chemical versus physical transformations during residual P increases. Conceptually, chronosequences could be derived from a collection of long-term experiments if the age of treatment was the only or major difference among experiments. For example, long-term trials ranging in duration from 30 to nearly 150 years in the former tallgrass prairie region of the north central US (Huggins et al., 1998), now under highly intensified agriculture with high residual P hypothesized and estimated by models (MacDonald et al., 2011), could provide a coarse understanding of residual P depending on comparability of soil types and P input treatments.
A variant of the chronosequence approach is the resampling of soils that were sampled in the past to compare changes in soil P stocks and thus estimate residual P over time. Though not always available, archives of soils taken at one or more past points in time provide a time series, and if the location of samples is known (e.g., field experiment, soils from mapping surveys), resampling the original locations offers a chronosequence approach (e.g., Veenstra & Lee Burras, 2015). This approach pairs especially well with long-term field experiments and enables verification of calculated mass balances. However, soil archives, be they of soils from mapping surveys or (more commonly) long-term experiments, are generally less than half a century in age, averaging 48 years and with a median of 37 years in age as of 2022 across global archives (Bergh et al., 2022). This limits the temporal coverage of residual P. On the other hand, since P input fluxes to agriculturally managed soils only increased at the global scale starting in the second half of the 20th century (Cordell et al., 2009), the majority of residual P is likely to be captured by chronosequences covering the past 50–70 years. In some locations of the world (e.g., East Asia, sub-Saharan Africa) in which the use of P fertilizers increased only in the latter part of the 20th century (Obersteiner et al., 2013), multiple decades may be sufficient to capture residual P.
4 IMPROVING OUR UNDERSTANDING OF RESIDUAL P
4.1 What is the form or speciation of residual P?
Though P is added to soil largely as inorganic P from fertilizer inputs or manure and sometimes as organic P via manure and other organic matter amendments, the fate of P accrued in soils will depend on multiple factors, and has implications for agronomic reuse and transfer to surface waters. Major pathways of residual P transformation include immobilization, precipitation, and occlusion. The lack of more than one stable isotope of P and the short half-lives of its radioisotopes (14.3 days for 32P, 25.4 days for 33P; Evans & Read, 1992) preclude direct tracing of labeled P inputs at timescales relevant to residual P. Thus, resolving speciation of residual P will likely require comparisons of soils under contrasting P balances. However, it is difficult to obtain true controls for residual P over time. Because residual P is a positive balance, a neutral balance control is needed, in which P is added to exactly replace P export. Agronomically, this would entail P inputs that match P outputs via crop biomass harvest, but as described previously, the relatively lower magnitude but cumulatively appreciable input–output fluxes (e.g., atmospheric deposition, erosional or runoff losses) should also be accounted for, though they are challenging to quantify. Additionally, mimicking these non-fertilizer and non-crop input–output fluxes to provide a truly neutral P balance control via P fertilizers will likely entail artifacts (e.g., replacing P loss via erosion with fertilizer will entail a difference in P speciation). As a result, many data on the speciation of residual P are comparisons of positive P balances incurred by P fertilizer exceeding crop removal rate relative to a P-unfertilized control, yet this P-unfertilized comparison is a negative P balance (Figure 5). Comparisons are therefore of soils with residual P relative to soils with mined P. Ideally, large positive P balance treatments are compared to minor negative P balance treatments as a control to minimize potential artifacts.
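To illustrate why radioisotope labeling cannot follow P inputs over the years to decades relevant to residual P, the sketch below applies simple exponential decay to the two half-lives quoted above. The function name and the one-year horizon are illustrative choices only.

```python
# Half-lives (days) as quoted in the text.
HALF_LIFE_DAYS = {"32P": 14.3, "33P": 25.4}


def fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of the initial radioactivity left after `days` of decay."""
    return 0.5 ** (days / half_life_days)


for isotope, half_life in HALF_LIFE_DAYS.items():
    after_one_year = fraction_remaining(365.0, half_life)
    print(f"{isotope}: {365.0 / half_life:.1f} half-lives in one year, "
          f"fraction remaining = {after_one_year:.1e}")
```

Under these half-lives, less than one ten-thousandth of the label remains after a single year for either isotope, so contrasts among soils with different P balances become the practical route to resolving speciation.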

Despite this, limited evaluations to date suggest that residual P from highly soluble P fertilizers re-equilibrates across multiple pools over time, and that under certain conditions it may be disproportionately stored in certain pools depending on agroecosystem properties and management. For example, in Minnesota, USA, residual P over 4 years was preferentially accumulated in inorganic relative to organic pools, and two fractions interpreted as “labile” P (resin and NaHCO3-extractable Pi) accounted for 60% of residual P (Sims et al., 2023). Here, it is important to note that “residual” is often used to describe the fraction of non-extractable P, also referred to as “resistant P” and commonly interpreted as “occluded P” (Bowman et al., 1998), though occluded P is a pool (Walker & Syers, 1976) that may not necessarily be the same as the operationally defined non-extractable P. Residual P as a non-extractable fraction of soil P is defined as the difference between the sum of sequentially extractable P fractions and separately measured total P of a soil sample (Condron & Newman, 2011). As with any fractionation-derived measure, this non-extractable P fraction is defined operationally, and its interpretation remains an open question. Thus, we suggest caution in how “residual” P is used given different meanings for different contexts: residual P as a positive P balance as used in the present work (Zhou & Margenot, 2023) versus an operationally defined P fraction (Bowman et al., 1998; Condron & Newman, 2011).
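The operational definition of the non-extractable fraction can be written as a one-line calculation, sketched below. The fraction names follow a generic sequential extraction scheme and every value is hypothetical (mg P per kg soil); they are chosen only to show the arithmetic and do not represent any dataset discussed here.

```python
def non_extractable_p(total_p: float, fractions: dict[str, float]) -> float:
    """Non-extractable P = separately measured total P minus the sum of extracted fractions."""
    return total_p - sum(fractions.values())


# Hypothetical sequentially extracted fractions (mg P per kg soil).
fractions = {
    "resin_Pi": 25.0,
    "NaHCO3_Pi": 40.0,
    "NaHCO3_Po": 35.0,
    "NaOH_Pi": 120.0,
    "NaOH_Po": 90.0,
    "HCl_Pi": 150.0,
}
total_p = 600.0  # hypothetical, measured separately (e.g., by digestion)

leftover = non_extractable_p(total_p, fractions)
print(f"non-extractable P: {leftover:.0f} mg/kg ({100 * leftover / total_p:.0f}% of total P)")
```

Because this quantity is a difference, it inherits the measurement error of total P and of every extracted fraction, which is one reason its interpretation as a discrete pool warrants caution.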
Over longer timescales, residual P may equilibrate across various pools, particularly in fractions that are thought to represent less crop-available pools (Figure 5). For example, 16-year residual P magnitudes of 66–284 kg ha−1 at 0–15 cm depth in eastern Canada predominantly accumulated as labile Pi (27%–43%) and iron (Fe)- and aluminum (Al)-associated Pi (NaOH-Pi; 25%–48%) (Shi et al., 2013; Figure 5c). At a longer timescale of 145 years in Illinois, USA, residual P of 347–741 kg ha−1 to 90 cm depth entailed enrichment of calcium (Ca)-associated pools (HCl-Pi), followed by Fe- and Al-associated pools, and the least enrichment occurred in labile, organic, and non-extractable pools (Figure 5a). At Rothamsted, UK, 137 years of wheat cropping without fertilizer or with manure or fertilizer application led to −305 to +45 kg ha−1 P balances at 0–23 cm depth (Blake et al., 2003; Figure 5b). As residual P increased over time, Ca-associated P (5%–400%) and Fe- or Al-associated P (4%–88%) pools were the generally dominant if highly variable forms of residual P estimated by comparison to the unfertilized soil, whereas labile, organic, and occluded P pools, inferred by operationally defined fractions, decreased. Sequentially extracted P fractions are approximations of in situ pools (Gu & Margenot, 2021), which depending on mineralogy may lead to mis-estimation of pool sizes (e.g., NaOH-Pi as Fe- and Al-associated P) (Barrow et al., 2021; Gu et al., 2020). On the other hand, many operationally defined fractions can be interpreted with high confidence (e.g., exchangeable Pi) and with greater accuracy (e.g., organic P) than by complementary techniques such as X-ray absorption near edge spectroscopy (XANES; Gu et al., 2020). Thus, operationally defined sequentially extracted P fractions still hold value (Condron & Newman, 2011; Gu et al., 2020) in providing insights to P dynamics during soil weathering and pedogenesis (e.g., Cross & Schlesinger, 1995).
The composition of residual P can change with its magnitude over time, and in unexpectedly distinct ways depending on agroecosystem management practices. For example, long-term manure or P fertilizer application at Rothamsted (UK; Blake et al., 2003) initiated in 1856 resulted in large positive P balances (+1035 and +1222 kg ha−1 for FYM and PK, respectively) and net increases in labile P, Fe- and Al-associated Pi, and Ca-associated P pools before 1903, at which point balances became negative (−752 and −644 kg ha−1 for FYM and PK, respectively) with decreases in P pool sizes from 1903 to 1993 (Figure 6a), resulting in net decreases or minor changes in these same P fractions (Figure 5b) over the combined 1856–1993 period. Temporal changes in P stocks in agroecosystems are generally driven by changes in P application and crop removal rates, especially with the co-introduction of high-yielding hybrids and synthetic P fertilizers in the second half of the 20th century. Over a shorter timescale at the L'Acadie Experiment (Canada) (Shi et al., 2013) and with consistent P input treatments, changes in stocks of P pools varied (i) with time, (ii) non-linearly with the positive P balance, and (iii) by tillage practices (Figure 6b). For example, changes in the non-extractable P fraction from 1992 to 2002 were negative across treatments, which became positive from 2002 to 2008. Initial increases in labile P under no-tillage from 1992 to 2002 were counteracted by decreases after 2002, and Fe- and Al-associated P under high P input changed from 7%–21% to 60%–70% over time.

4.2 Quantifying residual P answers “how much”; knowing “where” also matters
The inability to determine where the net accumulation of P has occurred within a system is a key limitation of most approaches to quantify residual P, including the arguably most quantitative approach of mass balances (Van Meter et al., 2021). As the system becomes increasingly large, knowing where the residual P resides is valuable to (i) empirically characterize this pool, (ii) integrate residual P into biogeochemical models and understanding for a given location, and (iii) manage residual P. This is particularly important to identify hotspots and coldspots of P loss, which can account for the majority of P losses and can be driven by localized residual P (e.g., Andino et al., 2020) within the field scale (10²–10³ m²) to watershed scale (10⁴–10¹⁰ m²; Sharpley et al., 2001) and country scale (10⁸–10¹⁴ m²; e.g., Wang et al., 2018). Alternatively, identifying spatial distribution of residual P, even on a probability basis, enables targeting agronomic approaches (e.g., drawdown). For example, across Brazil, 31.8 Tg residual P in surface soils was estimated to be potentially available for crop production, but with marked variation at fine scales (Pavinato et al., 2020). Averages can mask localized differences in residual P. For example, an average balance of 0.11 kg ha−1 year−1 for the EU and UK varied at the country scale from −2.5 to 5.2 kg P ha−1 year−1 during 2011–2019 (Muntwyler et al., 2024), illustrating that net positive balances at a given spatial scale can mask substantial variation in balances at finer spatial scales.
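The way an aggregate balance can hide contrasting sub-unit balances can be sketched with an area-weighted mean. The three sub-regions, their areas, and their balances below are entirely hypothetical; they are not the country-level values behind the EU and UK figures cited above.

```python
# Hypothetical sub-regions: (name, agricultural area in Mha, P balance in kg P/ha/yr).
subregions = [
    ("region_A", 40.0, -2.5),  # net P mining
    ("region_B", 35.0, 0.3),   # near-neutral balance
    ("region_C", 25.0, 5.2),   # net P accumulation
]


def area_weighted_mean(units: list[tuple[str, float, float]]) -> float:
    """Mean balance across units, weighted by the area of each unit."""
    total_area = sum(area for _, area, _ in units)
    return sum(area * balance for _, area, balance in units) / total_area


aggregate = area_weighted_mean(subregions)
lowest = min(balance for _, _, balance in subregions)
highest = max(balance for _, _, balance in subregions)
print(f"aggregate balance: {aggregate:.2f} kg P/ha/yr")
print(f"sub-region range: {lowest:.1f} to {highest:.1f} kg P/ha/yr")
```

In this made-up case the aggregate is slightly positive even though the largest sub-region is being mined of P, which is the pattern that finer-scale balances are needed to reveal.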
As an example of blind spots on where residual P is distributed, accumulation of P in soils of the state of Illinois from 1945 to 1998 is estimated to have engendered approximately 2,200,000 Mg of residual P via fertilizer-driven positive balances (David & Gentry, 2000). Averaged across all of the state's 15 million ha, this corresponds to a mean residual P magnitude of 147 kg P ha−1, but reasonably accounting for just the 9.3 million ha of cropland corresponds to an average of 230 kg P ha−1. Clearly, the distribution of these 2.2 million Mg of residual P in soils and likely waters across this region is spatially heterogeneous. The spatial distribution of residual P could be refined by recalculating mass balances for subsystems (e.g., HUC-8 watersheds of Illinois in the aforementioned example) and by large-scale data collection such as soil testing (see Section 3.3), which despite its limitations may still identify trends in P at finer (e.g., county) spatial scales (e.g., Dayton et al., 2020). Here, partial P balances from publicly available crop yield and fertilizer sales data over time may provide a sense of localization, but in the US context these are limited to county scale (10⁴–10⁷ ha).
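The per-hectare figures in the Illinois example follow from dividing the statewide residual P mass by an assumed area, as the short sketch below shows. It uses only the rounded mass and areas quoted in the text; the function name is illustrative.

```python
RESIDUAL_P_MG = 2_200_000       # Mg of residual P, 1945-1998, as quoted in the text
STATE_AREA_HA = 15_000_000      # total area of Illinois, as quoted
CROPLAND_AREA_HA = 9_300_000    # cropland area, as quoted


def mean_residual_p_kg_ha(mass_mg: float, area_ha: float) -> float:
    """Spread a residual P mass (Mg) evenly over an area (ha); 1 Mg = 1000 kg."""
    return mass_mg * 1000.0 / area_ha


statewide = mean_residual_p_kg_ha(RESIDUAL_P_MG, STATE_AREA_HA)
cropland_only = mean_residual_p_kg_ha(RESIDUAL_P_MG, CROPLAND_AREA_HA)
print(f"all land: {statewide:.0f} kg P/ha")
print(f"cropland only: {cropland_only:.0f} kg P/ha")
```

With these rounded inputs the cropland-only mean comes out slightly above the approximately 230 kg P ha−1 stated above, a gap that presumably reflects rounding of the underlying mass and area. Either way, the choice of denominator alone shifts the mean by more than half, before any within-state heterogeneity is considered.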
At a finer scale, the location of residual P within the soil profile is strongly dictated by soil texture, precipitation, and tillage. The relatively lower mobility of P from fertilizers, largely added as orthophosphate in highly soluble salt forms (e.g., single or triple superphosphate, and ammonium phosphate), means that P is less translocated relative to N, but P can still be vertically translocated over time. Coarse-textured soils with low P sorption capacity, intensive tillage, and/or high precipitation (or irrigation) are anticipated to maximize vertical movement of P (Sims et al., 1998), whereas fine-textured soils with generally higher P sorption capacity, minimal soil mixing, and/or low surface inputs of water are likely to minimize vertical redistribution.
In general, residual P is found at surface depths due to surface application of P inputs. For example, on a silt loam in New Zealand, 75% of residual P (1932 kg ha−1) after 57 years was found at 0–25 cm depth (Tian et al., 2019). After 145 years on a silty clay loam in North Central US, the majority of residual P (525–833 kg ha−1) occurred at 0–30 cm depth (Figure 5). Despite large positive balances at 0–90 cm depth, differences in balances used to proxy residual P were mostly confined to 0–30 cm depth and occurred concomitantly with net decreases in total P at 60–90 cm depth, suggesting that P accumulation and pool size changes at surface depths can be decoupled from and occur independently of subsurface depth changes. An implication of this is that limiting soil sampling to surface depths would overestimate the magnitude of residual P by not accounting for subsurface P stock decreases.
Thus, though most residual P occurs at surface depths, full-profile assessments of soil P stocks are still needed to verify residual P magnitudes. Sampling beyond the 0–15 cm depth commonly used for agronomic soil sampling and soil science in recent decades (Yost et al., 2018) is therefore important for empirical measurements of residual P and/or to corroborate mass balance calculations. The form of P input can also influence the distribution of P, likely due to interactions of P input chemistry with soil properties that determine the fate of accumulated P. For example, after 16 years on loamy sand in Germany, a greater proportion of residual P from compost (60% of 1355 kg P ha−1) was found at 0–30 cm depth compared to triple superphosphate (39% of 891 kg P ha−1; Koch et al., 2018).
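The point made above that surface-only sampling can overestimate residual P when subsurface stocks decline can be sketched by summing stock changes by depth layer. The layer boundaries mirror the 0–90 cm profile discussed in this section, but the stock changes themselves are hypothetical and are not the values from any cited experiment.

```python
# Hypothetical change in total P stock per layer (kg P/ha), fertilized minus reference soil.
stock_change_by_layer = {
    (0, 30): 600.0,    # surface enrichment
    (30, 60): 40.0,    # minor subsurface enrichment
    (60, 90): -90.0,   # net decrease at depth
}


def profile_change(changes: dict[tuple[int, int], float], max_depth_cm: int) -> float:
    """Sum stock changes over all layers whose lower boundary is within max_depth_cm."""
    return sum(delta for (_, bottom), delta in changes.items() if bottom <= max_depth_cm)


surface_only = profile_change(stock_change_by_layer, max_depth_cm=30)
full_profile = profile_change(stock_change_by_layer, max_depth_cm=90)
print(f"0-30 cm estimate: {surface_only:.0f} kg P/ha")
print(f"0-90 cm estimate: {full_profile:.0f} kg P/ha")
print(f"overestimate from surface-only sampling: {surface_only - full_profile:.0f} kg P/ha")
```

The same layer-wise bookkeeping works in the other direction: where P has moved downward, a surface-only estimate would instead fall short of the full-profile value.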
5 OUTLOOK
5.1 Addressing data limitations in space and time: Consolidating datasets that enable quantification approaches
A major need in the quantification of residual P that forms a basis for its management is “simply” that of data. While certain time–space scales of residual P may be inherently light or even devoid of data, it is important to keep in mind that for certain space–time contexts, even coarse estimates of residual P may not be possible (Figure 1). As the Anthropocene intensifies, and with increasing calls to intentionally manage anthropogenic fluxes of P at a system scale (Reitzel et al., 2019; Springmann et al., 2018), considering what kind of data should be collected now to avoid limitations to future residual P quantification would likely prove fruitful.
Aggregating data of varying types that enable complementary approaches reviewed here to estimating residual P is needed to resolve “how much” and “where,” and will require cooperation among many entities. Though much data exist that could be used to construct or refine residual P estimates at varying space–time scales, data from long-term field experiments, soil testing labs (including commercial), and historical records may exist behind paywalls and across many decentralized public and private depots. Additionally, intersectional gaps in P between traditional domains should be addressed, as data and insights may be missed by complementary but unintegrated disciplines (Reitzel et al., 2019). A concerted effort to identify, collect, and harmonize data to enable residual P measurement is needed.
5.2 Quantifying uncertainty to guide action
There are multiple and interrelated agronomic, environmental, and sociopolitical reasons for deriving estimates of residual P. Managing this large anthropogenic P pool requires some sense of magnitude, but the uncertainties that result from incomplete and inexact approaches to its estimation raise the question of what degree of (im)precision is acceptable. This will likely vary by the application. For example, agronomic management of residual P may tolerate greater margins of error than water quality improvement programs. How researchers, land managers, and policymakers navigate trade-offs of uncertainty for action on residual P is a defining feature of this anthropogenic P pool that can be optimized in part by improving its quantification, such as combining measurement methods reviewed in Section 3.
However, it must be recognized that uncertainty is inherent to residual P measurement given the complexity of input and output fluxes that determine its magnitude, and that these uncertainties amplify over time. The spatial heterogeneity of residual P, from scales of meters within a field (e.g., Andino et al., 2020) to across continents (e.g., Pavinato et al., 2020), should be acknowledged as a defining aspect of this anthropogenic P pool in research, management and policy discussions. Moreover, uncertainties in residual P resulting from “data holes” (see Section 5.1) do not necessarily handicap efforts to understand and manage this anthropogenic P pool. Here, two specific questions should be considered: First, do pre-20th century P fluxes matter for residual P assessments? There is high uncertainty but also low magnitude of P fluxes with increasing backcasting. This reflects residual P as a defining feature of the Anthropocene: pre-mid-20th century, anthropogenic and thus net inputs of P to agroecosystems were minimal, and were often accompanied by relatively low output fluxes due to low crop productivity with net negative P balances (e.g., Figure 3). Thus, the majority of residual P is likely to have accrued in the past 50–80 years in industrialized nations (e.g., Germany or US), and in less time for more recently industrialized or industrializing countries (e.g., Brazil, China). Second, at what spatial scale is uncertainty lowest and highest, and thus where is uncertainty least versus most problematic? It is undeniable that uncertainties and blind spots challenge decision-making that seeks to provide precise and accurate recommendations. On the other hand, some blind spots may not be important to agronomic and environmental goals, and/or are likely impossible to resolve (Figure 1).
Understanding approaches to quantify residual P and the limitations of these approaches, individually and in combination, can help inform trade-off navigation between agricultural and environmental considerations.
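One way to picture how uncertainty in a cumulative P balance grows with the length of record is standard error propagation for a sum of annual balances, sketched below. This is an illustrative simplification rather than a method used in this review: it assumes every year carries the same standard deviation, and it contrasts two bounding assumptions, fully independent annual errors versus a fully shared (systematic) error. The function name and the numbers are hypothetical.

```python
import math


def cumulative_sd(annual_sd: float, years: int, correlated: bool = False) -> float:
    """Standard deviation of a cumulative balance summed over `years` annual balances.

    Independent errors add in quadrature (sd * sqrt(n));
    a fully shared systematic error adds linearly (sd * n).
    """
    if correlated:
        return annual_sd * years
    return annual_sd * math.sqrt(years)


ANNUAL_SD = 2.0  # hypothetical uncertainty of each annual balance, kg P/ha/yr

for n_years in (10, 25, 50):
    independent = cumulative_sd(ANNUAL_SD, n_years)
    systematic = cumulative_sd(ANNUAL_SD, n_years, correlated=True)
    print(f"{n_years} years: +/-{independent:.1f} (independent) "
          f"to +/-{systematic:.1f} (systematic) kg P/ha")
```

Under either assumption the absolute uncertainty widens with time, but a persistent bias in an input or output flux compounds much faster than random year-to-year error, which suggests that resolving systematic gaps may be the more valuable target for data collection.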
5.3 Recommended next steps
Beyond long-term agricultural experiments, scaling assessments of residual P spatially can partly overcome the data gap by capitalizing on large datasets that exist but are not consolidated or are difficult to access, particularly for mass balance and soil test trends. Doing so can further buttress modeling approaches to potentially address some of the space–time gaps of residual P estimates (Figure 1), though here, researchers may need to be selective on which gaps to prioritize. While answering the question of “how much?” by quantifying residual P is the first step, going beyond the magnitude and identifying “where” and “what form” of residual P also matters to guide management of residual P for agronomic (i.e., utilization by crops) and environmental (i.e., residual P impacts on water quality) outcomes. Scales and contexts in which the magnitude of residual P can be measured in concert with its distribution and speciation are key research priorities.
Large spatial scale evaluations at country to global scales are important first steps (e.g., MacDonald et al., 2011), but refining where residual P is located at scales appropriate for management and monitoring is ultimately needed. It is important to keep in mind that while there is a global positive P balance and thus residual P in soils under agriculture as a whole, this is highly spatially segregated, notably to North America, western Europe, and eastern Asia, particularly China (MacDonald et al., 2011). In countries with legacies of P inputs exceeding crop removal, residual P is a massive resource that can offset future P fertilizer requirements by 10%–40% (Sattari et al., 2012) if managed correctly, for which unknowns persist (Rowe et al., 2016) but can be informed by speciation of residual P (see Section 4.1). For this reason, global imbalances in P, including negative P balances, should be kept in mind: there is still a need for net P inputs to soils under agricultural land use in much of the world, particularly in the developing world tropics (Simons et al., 2014).
Finally, communicating uncertainties in the quantification of this anthropogenic P pool is essential to ensure reasonable decisions on trade-offs for its management. In particular for policy, “divide and conquer” may serve as a means to prioritize limited resources for residual P contamination based on agronomic versus environmental versus socioeconomic trade-offs. All three must be considered to steer and reshape this uniquely anthropogenic P pool to mitigate negative impacts and contribute to a more sustainable P future.
AUTHOR CONTRIBUTIONS
Andrew J. Margenot: Conceptualization; data curation; funding acquisition; investigation; methodology; project administration; resources; supervision; validation; visualization; writing – original draft; writing – review and editing. Shengnan Zhou: Visualization; writing – original draft; writing – review and editing. Leo M. Condron: Investigation; writing – original draft; writing – review and editing. Geneviève S. Metson: Writing – original draft; writing – review and editing. Philip M. Haygarth: Investigation; writing – original draft; writing – review and editing. Jordon Wade: Writing – original draft; writing – review and editing. Suwei Xu: Data curation; formal analysis; visualization; writing – review and editing. Prince C. Agyeman: Writing – review and editing.
ACKNOWLEDGMENTS
This work was supported in part by Illinois Nutrient Research & Education Council awards #2021-4-360731-469 (A.J.M.), #2023-4-360731-642 (A.J.M.), #2023-5-360731-527 (A.J.M.) and U.S. National Science Foundation award #2125626 (A.J.M.). We thank Maia G. Rothman for contributions on soil analyses.
CONFLICT OF INTEREST STATEMENT
The authors have no conflicts of interest to report.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available at Illinois Data Bank: Margenot, Andrew; Zhou, Shengnan; Xu, Suwei; Condron, Leo; Metson, Geneviève; Haygarth, Philip; Wade, Jordon; Agyeman, Prince Chapman (2024): Missing phosphorus legacy of the Anthropocene: quantifying residual phosphorus in the biosphere. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-1538422_V1.