Volume 29, Issue 20 pp. 5968-5980
RESEARCH ARTICLE
Open Access

Resolving the influence of lignin on soil organic matter decomposition with mechanistic models and continental-scale data

Bo Yi

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Chaoqun Lu

Corresponding Author

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Correspondence

Chaoqun Lu, Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA.

Email: [email protected]

Wenjuan Huang

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Wenjuan Yu

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Jihoon Yang

Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa, USA

Adina Howe

Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa, USA

Samantha R. Weintraub-Leff

National Ecological Observatory Network, Battelle, Boulder, Colorado, USA

Steven J. Hall

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA

Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, Wisconsin, USA

First published: 13 July 2023
Citations: 4

Abstract

Confidence in model estimates of soil CO2 flux depends on assumptions regarding fundamental mechanisms that control the decomposition of litter and soil organic carbon (SOC). Multiple hypotheses have been proposed to explain the role of lignin, an abundant and complex biopolymer that may limit decomposition. We tested competing mechanisms using data-model fusion with modified versions of the CN-SIM model and a 571-day laboratory incubation dataset where decomposition of litter, lignin, and SOC was measured across 80 soil samples from the National Ecological Observatory Network. We found that lignin decomposition consistently decreased over time in 65 samples, whereas in the other 15 samples, lignin decomposition subsequently increased. These “lagged-peak” samples can be predicted by low soil pH, high extractable Mn, and fungal community composition as measured by ITS PC2 (the second principal component of an ordination of fungal ITS amplicon sequences). The highest-performing model incorporated soil biogeochemical factors and daily dynamics of substrate availability (labile bulk litter:lignin) that jointly represented two hypotheses (C substrate limitation and co-metabolism) previously thought to influence lignin decomposition. In contrast, models representing either hypothesis alone were biased and underestimated cumulative decomposition. Our findings reconcile competing hypotheses of lignin decomposition and suggest the need to precisely represent the role of lignin and consider soil metal and fungal characteristics to accurately estimate decomposition in Earth-system models.

1 INTRODUCTION

Decomposition of plant litter is a critical process in terrestrial ecosystems. It is a dominant source of carbon dioxide (CO2) fluxes to the atmosphere and is the key step in soil organic matter (SOM) formation (Cotrufo et al., 2015; Wieder et al., 2015). Climate, litter quality, and soil nitrogen (N) availability have long been known to regulate litter decomposition at local to global scales (Berg et al., 1993; Cornwell et al., 2008; Currie et al., 2010; Meentemeyer, 1978), and the decomposition of lignin, in particular, may play a critical role (Austin & Ballaré, 2010; Coûteaux et al., 1995). Lignin is the second most abundant compound in terrestrial plants, representing up to 30% of plant carbon (C) (Boerjan et al., 2003). Furthermore, lignin may limit microbial accessibility to cellulose, the most abundant compound in plant litterfall, since a portion can only be decomposed after lignin is removed (Berg et al., 2000; Pauly & Keegstra, 2008).

Despite decades of research, lignin's stability, its biodegradability during litter decomposition, and its importance in controlling litter and soil organic carbon (SOC) decay remain controversial (Dao et al., 2018; Thevenot et al., 2010). Lignin was long regarded as relatively recalcitrant because of its complex molecular structure, and was thought to be efficiently decomposed only by certain specialized fungi (Hammel, 1997; Swift et al., 1979). Under this view, lignin degradation would limit litter decomposition and proceed only after labile, unprotected compounds were consumed, which we term the “substrate-limitation hypothesis” (Figure 1a, Berg & Staaf, 1980). Alternatively, lignin might decompose fastest during early litter decomposition due to co-metabolic degradation with labile C, which we term the “co-metabolism hypothesis” (Figure 1b, Klotzbücher et al., 2011). A recent study found evidence for aspects of both the substrate-limitation and co-metabolism hypotheses, which we term the “reconciliation hypothesis” (Figure 1c, Hall et al., 2020). However, these competing arguments have been overlooked or only partially represented in current C decomposition models. This disagreement over mechanisms makes it difficult for ecological modelers to represent litter and lignin degradation accurately in decomposition models, and it is a critical source of uncertainty in predictions of terrestrial carbon dynamics (Wieder et al., 2018).

Figure 1. Conceptual lignin C decomposition hypotheses incorporated into the process-based CN-SIM model for model structure comparison. The vertical gray line indicates the transition from the initial decline to the lagged-peak decomposition stage. Figure modified from Hall et al. (2020).

Most carbon decomposition models oversimplify the role of lignin and rely on fixed proxies or rate modifiers, even though lignin is known to be a critical determinant of litter and soil carbon decomposition (Cornwell et al., 2008; Stephan & Patrick, 2005; Talbot et al., 2012). For example, many SOC decomposition models lump structural litter and lignin into a single chemically resistant pool [e.g., ORCHIMIC (Krinner et al., 2005) and CORPSE (Sulman et al., 2014)] or represent litter decomposability by a lignin:N ratio proxy with a climate modifier [e.g., Century (Parton et al., 1987), CASA-CNP (Wieder et al., 2018), MIMICS (Zhang et al., 2020), LIDLE (Campbell et al., 2016), and MEMS (Zhang et al., 2021)], rather than simulating lignin dynamics independently. These approaches neglect the protective role of lignin or its time-varying interaction with litter during the course of decomposition. Besides lignin, other critical variables influence the decomposition of litter and lignin but are biased or difficult to represent in process models. Nitrogen is an important predictor that may have contrasting effects on C decomposition. Greater litter N content may increase lignin decomposition by alleviating N limitation (Talbot & Treseder, 2012), whereas increased N availability may also decrease lignin decomposition by suppressing the production of oxidative enzymes (Chen et al., 2018). Most C–N coupled decomposition models consider litter C/N or lignin/N ratios, as well as N regulation of the decomposition rate through N immobilization/mineralization. Soil geochemical and microbial characteristics may also explain variation in lignin and litter decomposition, but have been excluded or underrepresented in models (Parton et al., 1987; Tian et al., 2015). For instance, soil metals show positive (calcium, Lovett et al., 2016), negative (iron, Huang et al., 2019), or dual effects (manganese, Li et al., 2021) on lignin degradation in different studies. Microbial communities play key roles in decomposing SOC and releasing CO2 (Jansson & Hofmockel, 2020). Experimental evidence has revealed shifts in the microbial community caused by C substrate decomposability (Schimel & Schaeffer, 2012), as well as niche differentiation between bacteria and fungi in the decomposition of plant-derived organic matter (de Boer et al., 2005). Such microbial community interactions have been incorporated into only a few C decomposition models through microbial physiology, such as MIMICS (Wieder et al., 2014).

Deficiencies in model representation of lignin and critical soil characteristics hinder us from attributing CO2 fluxes to the corresponding C components and leave large uncertainties in predicting litter and SOC decomposition. In addition to underrepresented mechanisms, testing and validating lignin's roles in C decomposition models remains challenging due to a lack of systematic observation datasets that measure lignin decomposition in soils with diverse geochemical and microbial properties. In this study, a process-based decomposition model was confronted with a recently published laboratory incubation data set (Huang et al., 2022) to address these challenges.

Here, we used data from a laboratory incubation of soils collected across North America to inform and improve a process-based SOM decomposition model called CN-SIM (Petersen et al., 2005). The dataset includes 80 different National Ecological Observatory Network (NEON) soil samples spanning large-scale climatic and edaphic gradients, and CO2 fluxes measured over 571 days from three major C components (litter, 13Cβ-labeled lignin, and SOC) using a combination of enriched and natural abundance stable isotope treatments. To test the influence of local-scale variation in soil properties on decomposition rates, four replicates of surface soil (0–15 cm depth) were collected around a 40 × 40-m plot at each of the 20 NEON sites. We aimed to identify and integrate important biogeochemical information (e.g., soil N, metals, and microbes) into the CN-SIM model to better represent temporal CO2 fluxes from the corresponding C components. Moreover, we sought to evaluate competing lignin degradation configurations in models that explicitly represented these hypotheses (substrate limitation, co-metabolism, and reconciliation, Figure 1). The model-estimated CO2 fluxes from litter, lignin, and soil components were compared with laboratory incubation data to assess predictive capability and uncertainty. Overall, this data-model fusion study highlights the benefits of mechanistically representing lignin and site-specific geochemical/microbial controls on SOM decomposition to improve the temporal prediction accuracy of CO2 fluxes derived from litter, lignin, and SOC.

2 METHOD

In this data-model fusion work, we analyzed a published laboratory incubation dataset (Huang et al., 2022) to understand lignin decomposition trajectories and their relationships with litter and soil decomposition in soils sampled from across the United States. We first explored what biogeochemical indicators predicted the grouping of lignin decomposition trajectories (i.e., no-peak and lagged-peak groups). Then, we incorporated the laboratory incubation findings (key predictors and time-variant substrate availability) to improve a process-based C decomposition model (CN-SIM, Petersen et al., 2005). Finally, we compared the model-estimated CO2 fluxes with the laboratory incubation dataset and quantified the estimation uncertainties derived from different model structures informed by two classic but debated hypotheses (C substrate limitation and co-metabolism of lignin) thought to control lignin decomposition (Hall et al., 2020).

2.1 Status of laboratory incubation data

The decomposition model improvement and calibration were based upon a 571-day soil laboratory incubation dataset with samples collected from 20 sites across 18 of the eco-climatic domains in NEON. It included sites from across the continental United States as well as Alaska and Hawaii. At each site, four mineral soil samples were collected from each of the 0–15 and 15–30 cm depths around the perimeter of a 40 × 40-m NEON distributed base plot. Soil samples spanned a broad range of biogeochemical characteristics (Table S1) and land cover types (forest, grassland/shrubland, and wetland). In the laboratory incubation dataset, the decomposition of C4 grass litter, 13C-labeled lignin, and soil C was monitored over time (Figure 2) by measuring the stable C isotope ratio (δ13C) values of CO2 from replicates of each soil that received substrates with different δ13C values; these empirical data are described in detail in our published paper (Huang et al., 2023). Three separate treatments were implemented for each soil sample: (1) the soil incubated alone; (2) the soil amended with C4 grass (Andropogon gerardii) litter + natural abundance synthetic lignin; and (3) the soil amended with C4 grass litter + synthetic lignin 13C-labeled at the Cβ position of the propyl sidechain. Soils were gently mixed with the C4 grass litter + labeled lignin mixture at a 250:25:1 ratio, with 1 g of dry soil mass equivalent mixed with 100 mg C4 grass litter (containing natural lignin: 14% of added C) and 4 mg synthetic lignin (5% of added C). CO2 production from litter, lignin Cβ, and SOC was calculated using isotope mixing models (Hall et al., 2017) and used for model improvement. Soil incubation experiments lasted for 571 days, during which the CO2 was sampled and measured. Samples were incubated at a constant temperature (23°C) and moisture (field capacity). The C- and N-related, geochemical, and microbial properties of soil samples are listed in Table S1. These data were used for model initialization and lignin decomposition pattern identification (two patterns found in Figure 2). Additional details regarding the NEON soil laboratory incubation design, CO2 efflux measurements, and methods for microbial and geochemical measurements are reported in the Supporting Information.
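
The isotope mixing step can be illustrated with a two-end-member sketch. This is not the full partitioning used in the study, which separates three sources (litter, lignin Cβ, and SOC) by comparing paired treatments as described in Hall et al. (2017). The function name and all δ13C and flux values below are hypothetical and chosen only to show the arithmetic.

```python
def source_fraction(delta_mix, delta_a, delta_b):
    """Fraction of CO2 derived from source A in a two-end-member mixture.

    Assumes linear mixing: delta_mix = f * delta_a + (1 - f) * delta_b.
    """
    if delta_a == delta_b:
        raise ValueError("End members must differ in delta13C.")
    return (delta_mix - delta_b) / (delta_a - delta_b)


# Hypothetical values in per mil: a C4 litter end member and a soil C end member.
f_litter = source_fraction(delta_mix=-19.0, delta_a=-12.0, delta_b=-26.0)

total_co2 = 10.0                      # hypothetical total efflux, arbitrary units
litter_co2 = f_litter * total_co2     # CO2 attributed to litter
soil_co2 = total_co2 - litter_co2     # remainder attributed to soil C
```

The further apart the two end members are in δ13C, the less sensitive the estimated fraction is to measurement error in the mixture.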

Figure 2. Laboratory-measured decomposition rate of C4 grass litter C, lignin Cβ, and soil organic carbon for no-peak and lagged-peak groups over time in the NEON incubation dataset, expressed relative to initial C mass in each C source. Lines represent the average decomposition rate in group samples, and shaded areas represent a 95% confidence interval. The dashed lines represent the 109th day of incubation. NEON, National Ecological Observatory Network.

2.2 Model calibration and validation

The 0–15 cm soil samples from the NEON laboratory incubation dataset were used for model calibration. The measurements of C- and N-related and microbial properties were used to initialize the model pools, and CO2 efflux measurements were used to calibrate the sensitive parameters with a Markov chain Monte Carlo (MCMC) algorithm (details in Section 2.8). The improved CN-SIM model includes a separate lignin C pool (Figure S1) so that the model simulation can separate daily CO2 efflux into bulk litter pools and a lignin pool. In this paper, the “bulk” litter pool hereafter refers to two litter pools (Figure S1, excluding lignin) in the model, corresponding to the C that comes from “pure” litter (without lignin) in C4 grass. The model “lignin” C pool refers to all added lignin (natural lignin in C4 grass + synthetic labeled lignin). In model calibration and validation, laboratory-measured CO2 efflux from C4 grass was compared with the sum of model-simulated CO2 from bulk litter pools and the natural lignin proportion in the lignin pool. Model performance in estimating lignin decomposition was evaluated by comparing laboratory-measured CO2 from 13C-labeled lignin with model-simulated CO2 from synthetic lignin. Model validation was conducted using independent 15–30 cm soil samples collected from the same plots and incubated with the same protocol; the details are reported in the Supporting Information.

2.3 Identification of decomposition pattern predictors by Random Forest modeling

Random Forest (RF) models can capture multivariate, nonlinear classification relationships and rank the importance of each variable to the prediction (Hastie et al., 2009). We tested 24 soil geochemical, microbial, and N-related characteristics (Table S1) from the laboratory incubation dataset to predict different lignin decomposition patterns (no-peak and lagged-peak, Figure 2) in 80 NEON samples. The final RF model used three soil characteristics to predict binary lignin decomposition patterns (0 for no-peak and 1 for lagged-peak). We used 70% of samples to train the RF algorithm and the remaining 30% for accuracy evaluation. The RF algorithm was run using the scikit-learn (sklearn) library in Python 3.8 and was executed 100 times to evaluate the model's overall stability.
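
The repeated 70/30 train/test procedure described above might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions, not the authors' script: the feature matrix is synthetic, the labels are generated from an arbitrary rule purely so the code runs, and the stratified split, the number of trees, and the helper name `repeated_rf_accuracy` are all choices made here rather than details reported in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def repeated_rf_accuracy(X, y, n_repeats=100, test_size=0.3):
    """Fit an RF classifier on repeated random splits; return test accuracies."""
    scores = []
    for seed in range(n_repeats):
        # Stratified split is an assumption; it keeps both classes in each subset.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, clf.predict(X_te)))
    return np.array(scores)


# Synthetic stand-in for 80 samples; columns mimic pH, extractable Mn, ITS PC2.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(3.5, 8.5, 80),   # pH
    rng.uniform(0.0, 2.0, 80),   # extractable Mn (mg per g soil)
    rng.normal(0.0, 1.0, 80),    # ITS PC2 score
])
# Synthetic labels only: 1 = lagged-peak, 0 = no-peak.
y = ((X[:, 0] < 5.8) & (X[:, 1] > 0.8)).astype(int)

scores = repeated_rf_accuracy(X, y, n_repeats=10)
print(f"mean accuracy = {scores.mean():.2f}, sd = {scores.std():.2f}")
```

Reporting the spread of accuracies across repeats, rather than a single split, is what allows the stability of the classifier to be judged on a dataset of only 80 samples.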

2.4 SOC decomposition model

We adopted the SOM model structure from CN-SIM (Petersen et al., 2005). We added a lignin pool to the original structure (Figure S1), enabling the model to track dynamic litterfall quality (labile litter:lignin ratio) during decomposition. We also configured the microbial physiology for modeling microbial pools (details in Section 2.5). The C components (particulate organic matter [POM] and mineral-associated organic matter [MAOM]) measured on these same soils in a previous study (Yu et al., 2022) were used for soil C pool initialization. The improved CN-SIM model (Figure S1) is a SOM decomposition model that simulates the dynamics of pool sizes and C/N flows among plant litterfall (bulk litter and lignin), microbes, and soil C pools. For C–N coupling dynamics, we used measurements of mineral soil N (0, 30, 270, and 570 days in incubation) to calibrate N-related processes (shown in Figure S2). The plant litterfall consists of three pools: labile litter pool, stable litter pool, and lignin pool. The two model (bulk) litter pools were initialized by the C:N ratio measured in the added C4 grass litter, and the lignin pool was initialized based on lignin content (19%) in the added litter. Microbial pools consist of two microbial functional types, bacteria and fungi, whose sizes are initialized with bacterial and fungal biomass inferred from qPCR data (details can be found in Supporting Information). The SOM consists of three pools: the microbial residue pool (MR) represents microbial residue products, the active soil pool (POM) represents fine and coarse particulate organic matter, and the passive soil pool (MAOM) represents mineral-associated organic matter that is physically and chemically protected by clay or minerals. Soil C pools were initialized by laboratory-measured POM and MAOM data (Yu et al., 2022), assuming that 2.5% of POM C was assigned to the microbial residue (MR) pool. The C turnover rate from model pools is a function of the decay rate modified by climatic conditions (temperature, water) and soil properties (pH, available N, and soil clay content). Part of the decomposed C is released as CO2, while the remaining part goes into microbial pools or SOM pools that decompose more slowly. Microbial-assimilated C and N are allocated to maintenance respiration, population growth, and necromass (Figure S1).

2.5 Model improvements and microbial physiology

Based on the NEON incubation data analysis, we found two different lignin decomposition patterns (no-peak and lagged-peak in Figure 2) that were poorly represented by the original CN-SIM model because it lacked the relevant mechanisms. The incubation data also provided directions for model structure improvement, and we briefly summarize those here with additional details in Section 3. To assign each sample to a lignin decomposition pattern using the soil characteristics identified by RF, the following criteria were used: (1) soils with high pH (>5.8) or low extractable Mn (<0.8 mg g−1 soil, blue area in Figure 3b) were assigned to the no-peak lignin group; (2) soil samples with low pH (<5.8) and high extractable Mn (>0.8 mg g−1 soil) were assigned to the lagged-peak group. To track time-series lignin C loss quantitatively in the lagged-peak group: (1) lignin decomposition follows the co-metabolism hypothesis when bulk litter is abundant (labile bulk litter to lignin pool ratio is >2); (2) decomposition of occluded cellulose/lignin is stimulated following the substrate-limitation hypothesis when bioavailable litter becomes limited (labile bulk litter to lignin pool ratio is <2). In the no-peak group, co-metabolism is always the dominant mechanism, and the ratio of bulk litter to lignin does not have an impact. A graphical illustration (Figure S3) shows our proposed reconciliation hypothesis on lignin decomposition that integrates specific soil properties and two classic hypotheses.
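
The grouping criteria above reduce to a two-threshold rule, sketched here in Python. The thresholds come from the text; the function and constant names are hypothetical, and the treatment of values exactly at a threshold (assigned to the no-peak group here) is an assumption, since the text does not specify it.

```python
PH_THRESHOLD = 5.8   # soil pH threshold, from the text
MN_THRESHOLD = 0.8   # extractable Mn threshold in mg per g soil, from the text


def lignin_group(soil_ph, extractable_mn):
    """Assign a sample to the 'lagged-peak' or 'no-peak' lignin group.

    Lagged-peak requires both low pH and high extractable Mn;
    every other combination falls in the no-peak group.
    """
    if soil_ph < PH_THRESHOLD and extractable_mn > MN_THRESHOLD:
        return "lagged-peak"
    return "no-peak"


# Hypothetical samples: (pH, extractable Mn).
for ph, mn in [(4.9, 1.2), (4.9, 0.3), (6.5, 1.2)]:
    print(ph, mn, lignin_group(ph, mn))
```

Only the first hypothetical sample is acidic and Mn-rich, so it is the only one assigned to the lagged-peak group.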

Figure 3. Geographical locations of the sampling sites (a), lignin decomposition groups classified by the Random Forest (RF) model (b), and carbon substrate quality on the starting day of the lagged-peak stage (c). The NEON incubation dataset contains samples with nested scales of variation, including 20 NEON sites and four replicate samples collected around the perimeter of one 40 × 40-m “distributed base plot.” The RF model result is based on soil pH, Mn, and ITS composition index as input features (for clarity, only pH and citrate-dithionite extractable Mn are shown). NEON, National Ecological Observatory Network.

In order to represent the different lignin decomposition trajectories across samples, we improved the representation of microbial processes in the CN-SIM model. The original CN-SIM model included two functional microbial pools (-r/-K strategy) based on different C-use strategies (Petersen et al., 2005). In this study, we kept two microbial pools but enhanced them by integrating microbial physiological traits based on incubation dataset analysis and findings from previous studies. The NEON incubation dataset revealed that litter decomposition was related to bacterial and fungal abundance, while lignin decomposition was always slower than litter and was more strongly related to fungal abundance (Figure 2; Figure S4). Previous studies also revealed that some bacteria may have fast growth rates and flourish in environments enriched with labile C, and that some fungi are slow-growing microorganisms that more efficiently use C with lower bioavailability (de Boer et al., 2005; Fabian et al., 2017; Waring et al., 2013). We recognized that not all bacteria and fungi fit into these functional groups, but we elected to keep the model as simple as possible, and to enable quantification of microbial pool sizes using a simple and widely available metric (qPCR of 16S and ITS genes). Hence, we set bacteria and fungi as two microbial pools in CN-SIM with different degradation traits. Then, we configured the physiological traits of modeled microbial pools in the improved model with the following assumptions: different functional groups dominate bulk litter and lignin decomposition, with bacteria dominating in high-quality substrate conditions (defined by the ratio of labile bulk litter to lignin content larger than 2, Figure 3c), whereas the fungi group is more competitive when the substrate has low quality (defined as the labile bulk litter to lignin pool ratio equal to or smaller than 2).

The CO2 fluxes from microbial pools were adapted from Michaelis–Menten kinetics (Schimel & Weintraub, 2003), which links microbial biomass carbon (MBC) to the decay of litter products and CO2 release in a daily timestep:
$$\Delta \mathrm{CO_2} = \mathrm{MBC} \times V_{\max} \times \frac{C_{\mathrm{in}}}{K_m + C_{\mathrm{in}}}$$

Here, ΔCO2 (mg CO2-C g−1 soil day−1) represents daily CO2 loss by microbial respiration; MBC is the microbial biomass carbon (mg MBC g−1 soil) of the bacterial or fungal pool, regulated by the balance of microbial death and growth; and Vmax is the C (mg) consumed per unit of microbial biomass per day (mg C mg−1 MBC day−1):

$$V_{\max} = K_{\max} \times f_T \times f_w \times f_N \times f_{\mathrm{clay}}$$

where Kmax represents the decay rate in ideal conditions; fT, fw, fN, and fclay are the modifiers for temperature, moisture, nitrogen content, and soil clay content, respectively; Cin stands for carbon import (mg C g−1 soil) into the studied microbial pool; and Km is the half-saturation constant for microbial C assimilation (mg C 10−3 g−1 soil).
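
The two expressions above translate directly into code. The following is a minimal sketch with hypothetical function names and input values; it only evaluates the stated Michaelis–Menten form for a single day and does not reproduce the pool updates of the full CN-SIM model.

```python
def v_max(k_max, f_temp, f_water, f_n, f_clay):
    """Maximum C uptake rate: ideal decay rate scaled by environmental modifiers."""
    return k_max * f_temp * f_water * f_n * f_clay


def daily_co2(mbc, vmax, c_in, k_m):
    """Daily CO2 loss from one microbial pool (Michaelis-Menten form).

    mbc  : microbial biomass C of the pool
    vmax : maximum C consumed per unit biomass per day
    c_in : C imported into the pool
    k_m  : half-saturation constant (same units as c_in)
    """
    if k_m + c_in == 0:
        return 0.0  # no substrate and no half-saturation term: no flux
    return mbc * vmax * c_in / (k_m + c_in)


# Hypothetical values chosen only to show the arithmetic.
vmax = v_max(k_max=0.2, f_temp=1.0, f_water=0.8, f_n=1.0, f_clay=1.0)
flux = daily_co2(mbc=0.5, vmax=vmax, c_in=3.0, k_m=1.0)
```

When `c_in` equals `k_m` the saturation term is exactly one half, so the flux is half of `mbc * vmax`; this is a quick sanity check for any implementation of the equation.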

2.6 Model structure testing informed by different hypotheses

To evaluate the impacts of lignin on predictions of the process-based model, we informed models with different lignin degradation hypotheses (substrate limitation, co-metabolism, and reconciliation, Figure 1) and evaluated their capabilities in predicting daily CO2 from bulk litter, lignin, and soil decomposition. For the substrate-limitation hypothesis-informed model, lignin degradation is suppressed in the initial stage until the labile bulk litter to lignin pool size ratio reaches the proposed threshold of labile substrate limitation (labile bulk litter:lignin ratio <2). For the co-metabolism hypothesis-informed model, lignin degradation is triggered in the initial stage by the labile bulk litter substrate because of the enrichment of bioavailable C. The diagram of the reconciliation hypothesis-informed model is shown in Figure S3. The RF model classification (Figure 3b, pH < 5.8 and citrate-dithionite extractable Mn >0.8 mg g−1 soil) was first applied to determine the lignin decomposition trajectory group (no-peak or lagged-peak). Then, for the no-peak group, the co-metabolism hypothesis-informed model was applied to simulate decreasing lignin decomposition as overall bulk litter decomposition decreased, whereas for the lagged-peak group, the reconciliation hypothesis-informed model was applied to simulate the different time stages (initial and lagged-peak) of lignin decomposition.
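
The three model structures differ mainly in which mechanism governs lignin decay on a given day. A compact way to express that switch is sketched below. The function name, the returned labels, and the handling of a ratio exactly equal to 2 (treated as substrate-limited, following the "equal to or smaller than 2" wording in Section 2.5) are assumptions made for illustration; the actual rate equations are in the model itself.

```python
RATIO_THRESHOLD = 2.0  # labile bulk litter : lignin pool ratio, from the text


def lignin_mechanism(hypothesis, labile_litter_c, lignin_c, group="lagged-peak"):
    """Return the mechanism controlling lignin decay under a given model structure."""
    if lignin_c <= 0:
        raise ValueError("lignin_c must be positive.")
    labile_limited = labile_litter_c / lignin_c <= RATIO_THRESHOLD

    if hypothesis == "co-metabolism":
        # Lignin decays alongside labile C throughout.
        return "co-metabolism"
    if hypothesis == "substrate-limitation":
        # Lignin decay is held back until labile C becomes limiting.
        return "substrate-limitation" if labile_limited else "suppressed"
    if hypothesis == "reconciliation":
        # No-peak soils stay co-metabolic; lagged-peak soils switch at the threshold.
        if group == "no-peak" or not labile_limited:
            return "co-metabolism"
        return "substrate-limitation"
    raise ValueError(f"Unknown hypothesis: {hypothesis}")


# Hypothetical pool sizes early (ratio 5) and late (ratio 1.5) in an incubation.
early = lignin_mechanism("reconciliation", labile_litter_c=10.0, lignin_c=2.0)
late = lignin_mechanism("reconciliation", labile_litter_c=3.0, lignin_c=2.0)
```

In this sketch a lagged-peak soil under the reconciliation structure moves from co-metabolism to substrate limitation as labile litter is depleted, while a no-peak soil never switches.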

2.7 Model sensitivity analysis

We conducted a variance-based global sensitivity analysis, Sobol sensitivity analysis, to quantify the relative importance of each parameter to model CO2 efflux. The Sobol method is model independent and works for linear and nonlinear outputs (Sobol, 2001), so it is well suited for complex and high-dimensional nonlinear process-based ecosystem models. The Sobol method considers the whole parameter space in the form of a probability density function, including the main effect and interactions between parameters (Saltelli et al., 2008). We used the log-likelihood value computed from the mismatch between the measured and modeled CO2 to determine the sensitivity. Here, we identified a total of six parameters (shown in Figure S5) related to C pool decay rates (Kmax_Litter1, Kmax_Litter2, Kmax_Lignin, and Kmax_POM) and microbial half-saturation constant (Km_MB1 and Km_MB2) that are directly related to CO2 release in the bulk litter, lignin, and soil mixture system decomposition. Sobol analysis allowed us to find parameters with the most influence and fix other less influential parameters as default values in model parameterization and prediction, making the model more robust in prediction. Additional information about the prior distributions for model parameters is in Table S2.
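
To make the variance-based idea concrete, the sketch below estimates first-order Sobol indices with a standard pick-freeze Monte Carlo estimator using NumPy. It is a toy illustration, not the analysis performed in the study: inputs are assumed independent and uniform on [0, 1], the `toy_model` stands in for the log-likelihood output used in the text, and all names are hypothetical.

```python
import numpy as np


def first_order_sobol(model, n_params, n_samples=200_000, seed=0):
    """Estimate first-order Sobol indices for independent U(0, 1) inputs."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_samples, n_params))   # base sample matrix A
    b = rng.random((n_samples, n_params))   # independent sample matrix B
    f_a, f_b = model(a), model(b)
    total_var = np.var(np.concatenate([f_a, f_b]))

    indices = np.empty(n_params)
    for i in range(n_params):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]                # A with column i taken from B
        # Variance explained by parameter i alone, as a share of total variance.
        indices[i] = np.mean(f_b * (model(ab_i) - f_a)) / total_var
    return indices


def toy_model(x):
    """Additive toy output; analytic first-order indices are 0.2 and 0.8."""
    return x[:, 0] + 2.0 * x[:, 1]


s = first_order_sobol(toy_model, n_params=2)
print(np.round(s, 2))
```

For an additive model the first-order indices sum to about one; a shortfall from one in a real model signals that parameter interactions contribute to the output variance.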

2.8 Model parameterization

For each of the 80 soil samples, parameter estimates for six sensitive parameters (identified in Sobol analysis) were derived from an MCMC technique based on the Metropolis-Hastings Random Walk algorithm (Hastings, 1970) by fitting the CN-SIM model to the laboratory-observed time-series CO2 flux from bulk litter, lignin, and soil pools. This approach expresses each parameter estimate as a posterior distribution that combines the likelihood of the measured data with the prior information, which helps reveal parameters that are difficult to constrain. Moreover, uncertainties in estimating parameters can be propagated to uncertainties in the model output. Further details on step widths, chain lengths, and criteria for accepting a given MCMC chain are given in the Supporting Information. MCMC sampling and subsequent probability density function estimation were done with the MATLAB package DREAM(ZS) (Vrugt, 2016). The maximum a posteriori estimates of influential parameters in each sample, obtained by MCMC for the reconciliation model, are shown in Table S3.
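
The random-walk Metropolis-Hastings idea can be shown in a few lines of Python. This is a generic one-parameter sketch, not the DREAM(ZS) sampler used in the study: the Gaussian `log_post` stands in for a real log-posterior built from the model-data mismatch, and the step size, chain length, and burn-in are arbitrary illustrative choices.

```python
import math
import random


def metropolis_hastings(log_post, x0, step, n_iter, seed=0):
    """Random-walk Metropolis-Hastings sampler for a single parameter."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)     # symmetric Gaussian proposal
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        chain.append(x)                         # rejected steps repeat the old value
    return chain


def log_post(k):
    """Hypothetical log-posterior for a decay-rate parameter: mean 2.0, sd 0.5."""
    return -0.5 * ((k - 2.0) / 0.5) ** 2


chain = metropolis_hastings(log_post, x0=0.0, step=0.5, n_iter=20_000)
burned = chain[2_000:]                          # discard burn-in
posterior_mean = sum(burned) / len(burned)
```

In practice the retained samples, rather than a single best value, are what carry parameter uncertainty forward into the model output.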

2.9 Model implementation

For model estimation, we applied the improved CN-SIM model to estimate uncertainty from parameters and derive prediction intervals for daily CO2 from bulk litter/lignin/soil C decomposition and SOC stocks by running the model 1000 times with parameter sets drawn from the joint posterior distribution. The model-estimated CO2 fluxes derived from bulk litter, lignin, and soil decomposition in the reconciliation model are shown in Figure S6a, b, and c, respectively. We also assessed model performance in estimating the evolving content of soil available N (NH4+ and NO3− at 0, 30, 270, and 570 days) during the incubation period for each soil sample (Figure S2).

2.10 Model fitness statistics

Common statistical analysis methods were used to evaluate the fitness of models, including the coefficient of determination R2 (which measures how well the predicted values match the observed value), root mean square error (RMSE), mean absolute percent error (MAPE), and Pearson's correlation coefficients (r). The formulas of R2, RMSE, MAPE, and Pearson's r are given in Supporting Information.
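
For reference, the four statistics can be computed as in the sketch below. The exact formulas used in the study are in the Supporting Information; the definitions here are the common textbook ones and are an assumption, in particular R2 computed as one minus the ratio of residual to total sum of squares, which can be negative when a model fits worse than the observed mean.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percent error; observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson's correlation coefficient."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical observed and predicted fluxes.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred), rmse(obs, pred), mape(obs, pred), pearson_r(obs, pred))
```

Under this definition R2 and Pearson's r answer different questions: a prediction can be strongly correlated with the observations yet biased enough to yield a low or negative R2.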

3 RESULTS

3.1 Contrasting trajectories and hypotheses of lignin decomposition

Based on the laboratory incubation data analysis, we found two different trajectories in lignin decomposition rate among NEON samples (Figure 2): a “no-peak” group where lignin decomposition generally decreased over time (65 of 80 samples, Figure 3a), and a “lagged-peak” group where lignin decomposition significantly increased over time after an early decline (15 of 80 samples, Figure 3a). We defined the lagged-peak group by a continuous increase in lignin decomposition in three consecutive 14-day measurements. The two lignin decomposition groups differed in temporal dynamics and cumulative lignin C loss (Figure 2; Table S5). By the end of the laboratory incubation (571 days), cumulative lignin C loss was significantly higher in the lagged-peak group than in the no-peak group (12.6% vs. 5.6% of initial lignin C mass, p < .001 in t-test). We defined phases of decomposition as initial (0–109 days) and lagged-peak (109–571 days) stages based on lignin decomposition trajectories. Cumulative lignin C loss from the lagged-peak stage (109–571 days) in the lagged-peak group (10.8%) was more than double that in the no-peak group (3.7%). Meanwhile, litter and SOC decomposition varied less and without statistical differences (p > .05) in cumulative C loss and temporal variation between the two groups (Figure 2).
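
The lagged-peak criterion can be checked programmatically. The sketch below reads "a continuous increase in three consecutive 14-day measurements" as three successive increases between adjacent measurements; the text does not spell out this detail, so that reading, the function name, and the example series are assumptions.

```python
def has_lagged_peak(rates, n_consecutive=3):
    """Return True if the series rises across n_consecutive successive steps."""
    run = 0
    for prev, curr in zip(rates, rates[1:]):
        run = run + 1 if curr > prev else 0   # reset the run on any non-increase
        if run >= n_consecutive:
            return True
    return False


# Hypothetical lignin decomposition rates at successive measurement dates.
declining = [5.0, 4.0, 3.0, 2.0, 1.0]
rebounding = [5.0, 3.0, 2.0, 3.0, 4.0, 5.0]
noisy = [5.0, 3.0, 4.0, 3.0, 4.0, 3.0]

print(has_lagged_peak(declining), has_lagged_peak(rebounding), has_lagged_peak(noisy))
```

Requiring a sustained run of increases, rather than any single uptick, keeps measurement noise from being classified as a lagged peak.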

We used a RF model to identify the important soil C, N, geochemical, and microbial predictors for the occurrence of the lagged peak in lignin decomposition. Variables that were frequently reported to impact organic matter decomposition in previous studies, such as soil mineral N, lignin:soil N, and soil C:N ratio, were weakly correlated with lagged peak in lignin decomposition (Pearson's r = −.19, −.08, and −.03, respectively; Figure S7). Rather, soil pH, extractable metals (Mn, Fe, Al), and a fungal composition index (ITS PC2, the second principal component of an ordination of fungal ITS amplicon sequences) were the top five features for predicting whether a soil sample belonged to the “no-peak” or “lagged-peak” group of lignin decomposition (Figure S8). The trained RF model reached 92.5% accuracy for categorically predicting the two decomposition groups using pH, extractable Mn, and ITS PC2 values as input features (Figure 3b; for clarity, only pH and extractable Mn are shown). Furthermore, we also found that the starting day of the increased lignin decomposition in the lagged-peak group was predicted by the relative abundance of labile bulk litter and lignin remaining in soil (Figure 3c; Figure S9). Generally, the lagged-peak stage (109–571 days) was triggered on the day when the modeled ratio of labile bulk litter:lignin C pool sizes approached 2 in relatively acidic soils (pH < 5.8) and high Mn environment (Figure 3b,c). These data-derived mechanistic details were then incorporated into a version of the CN-SIM model that we term the “reconciliation” model because it interprets aspects of the substrate-limitation and co-metabolism hypotheses. This reconciliation model predicted which samples displayed the lagged peak in lignin decomposition and when it occurred. 
Specifically, we first applied the trained RF model to classify two decomposition pattern groups based on soil characteristics (pH, extractable Mn, and ITS PC2 value), then we simulated the decomposition for no-peak group samples using the co-metabolism hypothesis-informed model. Decomposition in the lagged-peak group samples was predicted by a modified substrate-limitation model that used the labile bulk litter:lignin ratio to initiate the lagged-peak stage in lignin decomposition (Figure S3).
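The two-step workflow described above (classify the sample, then choose the model structure) can be illustrated with the following sketch. It is a simplification under stated assumptions: a fixed threshold rule on pH and extractable Mn stands in for the trained RF classifier, ITS PC2 is omitted because no threshold for it is reported, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoilSample:
    ph: float
    extractable_mn: float  # mg Mn per g soil


PH_THRESHOLD = 5.8  # reported in the text: relatively acidic soils
MN_THRESHOLD = 0.8  # reported in the text: mg g-1 soil


def predict_group(sample: SoilSample) -> str:
    """Threshold stand-in for the trained RF classifier (assumption).

    The RF model in the study also used ITS PC2; it is left out here.
    """
    if sample.ph < PH_THRESHOLD and sample.extractable_mn > MN_THRESHOLD:
        return "lagged-peak"
    return "no-peak"


def choose_model(sample: SoilSample) -> str:
    """Pick the model structure used to simulate a sample."""
    if predict_group(sample) == "lagged-peak":
        return "modified substrate-limitation"
    return "co-metabolism"


# Hypothetical samples, not NEON measurements.
acidic_high_mn = SoilSample(ph=4.9, extractable_mn=1.2)
neutral_high_mn = SoilSample(ph=6.5, extractable_mn=1.2)

print(choose_model(acidic_high_mn))
print(choose_model(neutral_high_mn))
```

A hard threshold cannot reproduce the interactions that a random forest learns, so this sketch only conveys the dispatch logic, not the reported 92.5% classification accuracy.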

3.2 Predicting sample variation of CO2 flux with different hypothesis-informed models

Three model versions of CN-SIM were developed to quantify how the simulated CO2 flux was impacted by contrasting mechanistic understanding of lignin decomposition (Figure 1). The different models varied in explaining CO2 flux variation and estimation errors among the NEON samples. The reconciliation model captured 77%, 91%, and 90% of observed spatiotemporal variation in CO2 produced from the decomposition of litter, lignin, and SOC among all 80 NEON samples, and it performed better than the other two model structures (model fitness index: coefficient of determination = −0.21, −0.21, and −0.07 for substrate limitation; 0.71, 0.72, and 0.91 for co-metabolism, Figure 4; Table S6). Compared with the reconciliation model, the substrate-limitation model had greater estimation errors in both the no-peak and lagged-peak groups, with a laboratory-model MAPE (model prediction quality index) of 59%, 55%, and 57% in CO2 flux estimation from litter, lignin, and SOC, respectively (Table S6). Estimation error in the substrate-limitation model was caused by deficient performance in simulating the 65 samples in the no-peak group (Table S7). The co-metabolism model missed some high lignin fluxes (especially for rate >0.075% day−1) in the 15 lagged-peak samples (Figure 4) but was comparable with the reconciliation model in litter and soil decomposition rate simulation (Tables S6 and S7). The RMSE (model prediction quality index) values also indicated that the reconciliation model structure estimated the spatial variation of CO2 flux more accurately (RMSE = 0.02, 0.00, 0.01 for litter, lignin, and soil) than the substrate-limitation (RMSE = 0.05, 0.01, 0.04) and co-metabolism (RMSE = 0.03, 0.01, 0.01) models, with lignin decomposition having the best improvement overall (Table S7). 
The CN-SIM model with the reconciliation hypothesis was validated and produced reasonable results for litter, lignin, and soil CO2 fluxes (Pearson's r = .73, .52, and .87) in 15–30 cm soil samples (Figure S10). Further details on the validation analysis are reported in the Supporting Information.
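For readers unfamiliar with the three fit indices used here, the sketch below computes them in plain Python. The exact formulas used in the study are not given in this section, so the standard definitions are assumed, with the coefficient of determination taken as 1 − SS_res/SS_tot. That definition is consistent with the negative values reported for the substrate-limitation model, because a biased model can fit worse than the observed mean. The example values are hypothetical.

```python
import math
from typing import Sequence


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percent error; observations must be nonzero."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


# Hypothetical observed and simulated fluxes, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.1, 1.9, 3.2, 3.8]
biased_fit = [3.0, 4.0, 5.0, 6.0]  # right shape, constant offset

for name, pred in [("close", close_fit), ("biased", biased_fit)]:
    print(name, round(r_squared(observed, pred), 2),
          round(rmse(observed, pred), 2), round(mape(observed, pred), 1))
```

In the biased case the simulated series tracks the shape of the observations perfectly, yet the constant offset makes the residual sum of squares exceed the total sum of squares, which drives the coefficient of determination below zero.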

Figure 4. Comparison between different hypothesis-informed model simulations (y-axis) and laboratory-observed (x-axis) CO2 fluxes from C4 grass litter C, lignin Cβ, and soil organic carbon decomposition across different samples. Each dot denotes the decomposition rate on a measurement date in a soil sample, expressed as the proportion of CO2 efflux lost from the initial C mass in each C source. The blue and yellow circles represent the no-peak and lagged-peak groups, respectively. The dashed line indicates a 1:1 relationship. R2, coefficient of determination, an index representing how well the predicted values match the observed values (ranging from any negative number to +1). MAPE, mean absolute percent error. Model-simulated C4 grass litter CO2 was calculated by combining CO2 from two litter pools and the fraction of C4 grass CO2 loss in the lignin pool.

3.3 Tracking temporal trajectories of CO2 flux from different C components

The three hypothesis-informed model versions varied greatly in the simulation of cumulative C loss as CO2, source attribution, and temporal dynamics (Figure 5). For the 15 lagged-peak group samples, the reconciliation model estimated 20.4 ± 4.8% (mean ± SD, standardized by initial C content) cumulative CO2 from all C components after 571 days of incubation, which was closely comparable to the laboratory observation (21.6 ± 5.1% cumulative C loss as CO2, Figure 5e). However, the substrate-limitation and co-metabolism models predicted cumulative CO2 of only 16.9 ± 4.9% and 17.8 ± 5.1%, respectively, underestimating the observed cumulative CO2. A similar underestimate of 25.7% of cumulative CO2 was also shown in the simulation of the 65 no-peak group samples with the model informed by the substrate-limitation hypothesis (Figure S11).
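As a quick arithmetic check on the group means quoted above, the shortfall of each single-hypothesis model can be expressed relative to the observed mean. The text does not state how relative underestimates were calculated, so the formula below (observed minus simulated, divided by observed) is an assumption, and the function name is hypothetical.

```python
def relative_shortfall(observed: float, simulated: float) -> float:
    """Percent by which a simulated value falls short of the observed value."""
    return 100.0 * (observed - simulated) / observed


# Group means from the text: cumulative C loss as CO2 after 571 days
# (% of initial C) for the 15 lagged-peak samples.
OBSERVED = 21.6
SIMULATED = {
    "substrate-limitation": 16.9,
    "co-metabolism": 17.8,
}

shortfalls = {name: relative_shortfall(OBSERVED, value)
              for name, value in SIMULATED.items()}
for name, pct in shortfalls.items():
    print(f"{name}: {pct:.1f}% below observed")
```

Under this assumption the two models fall about 22% and 18% short of the observed mean, in line with the "roughly 20%" underestimate cited in the Discussion.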

Figure 5. Temporal dynamics of different hypothesis-informed model-simulated CO2-C flux rates from bulk litter (a), lignin (b), soil organic carbon (c), all C sources (d), and cumulative C loss over time (e) and after 571 days (f) from the 15 lagged-peak group samples. Each dot and line represent the mean value of the 15 lagged-peak samples. The vertical line in (a–d) marks the 109th incubation day, and the red circles in (f) represent the mean values of the 15 samples.

The different models also showed contrasting results in C source attribution and temporal dynamics of CO2 flux. For the 15 lagged-peak group samples, the substrate-limitation model underestimated CO2 from bulk litter by 26% and overestimated CO2 from SOC by 6% over the 571-day incubation, and the co-metabolism model underestimated CO2 from bulk litter C and SOC by 28% and 0.63%, respectively (Figure 5). These modeling biases in CO2 source attribution directly impacted the model estimation of the remaining soil C composition. The bulk litter:soil C pools were estimated as 63%:32% of total remaining C mass for the substrate-limitation model and 63%:33% for the co-metabolism model after the 571-day incubation. The reconciliation model's C pool estimate (bulk litter:soil C = 60%:35%) was closest to the laboratory observation (59%:36%). Moreover, in terms of temporal dynamics, the substrate-limitation model underestimated cumulative CO2 in the initial stage (0–109 days) by 7.9% and in the lagged-peak stage (109–571 days) by 26% (Figure 5). The co-metabolism model underestimated cumulative CO2 by 9% and 28% in those two stages, respectively. The reconciliation model had the best temporal accuracy when compared with the laboratory incubation, underestimating cumulative CO2 by only 0.9% and 1.9% in the two decomposition stages. Hence, failure to represent lignin decomposition mechanisms in models may not only increase variation across samples but may also misestimate CO2 fluxes and mismatch the timing of CO2 emission from different C components.

4 DISCUSSION

4.1 Incorporating a lignin pool and local soil properties into C decomposition modeling

Previous models typically treat lignin as part of the “structural” litter pool and use the lignin:N or C:N ratio to represent litter decomposability (Stewart et al., 2015), determining the turnover rate from litterfall C to SOC. Our NEON data analysis showed large variation in both the temporal decomposition dynamics and cumulative C mass loss from lignin and litter (Figure 2; Table S5), despite their correlated (Pearson's r = .62) decomposition rates (Figure S12). Specifically, litter and lignin significantly differed in cumulative C loss after the 571-day incubation (32.1% vs. 7.4% of initial C component mass, p < .001 in the t-test). Therefore, combining lignin with other conceptual litter pools in litterfall decomposition models would bias temporal representations of either pool. Considering the large range of lignin mass (5%–40%) in litterfall (Boerjan et al., 2003) and the two different decomposition patterns, explicitly modeling a lignin C pool and including overlooked soil characteristics can substantially improve model predictions of soil CO2 flux while decreasing bias.

The dominant conceptual model of litterfall decomposition proposes that the primary controls of decomposition rate are climate, plant litter chemistry, soil N availability, and decomposer community composition (Moorhead et al., 1999; Todd-Brown et al., 2014; Wieder et al., 2013). Interestingly, we found distinct temporal patterns of lignin decomposition and significant C loss differences (Figure 2), even though all of the NEON soils received the same addition of litter/lignin and experienced identical climate conditions in the laboratory incubation. We posit that this is due to the effects of soil variables, such as metal content and decomposer community composition, on C decay rate and C-use efficiency. Previous studies have also suggested that large uncertainty in decomposition model predictions was likely caused by variation in soil properties and how soil variables alter C decay rate and C-use efficiency in decomposition models (Wang et al., 2020). Many C models, like DayCent and DNDC, mainly use the lignin:N or C:N ratio as the essential soil indicators controlling lignin–litter interaction in decomposition, without allowing lignin decay to change under different soil conditions. In the NEON incubation data, we found that lignin:soil available N and soil C:N failed to explain lignin degradation groups or cumulative CO2 flux (Figure S7). Instead, we found that soil pH, Mn, and fungal composition were important for explaining variation in lignin decomposition among the 80 soil samples, and that including these variables can improve C decomposition modeling.

Previous studies were often limited to spatially restricted laboratory and field experiments or lacked explicit measurements of lignin decomposition over time, which made it difficult to use data synthesis methods such as meta-analyses to evaluate lignin's role in decomposition. Moreover, such data aggregation methods potentially obscure the influence of local variation in biogeochemical properties (Bradford et al., 2016), since variation among local within-site replicates is averaged. Our incubation of NEON samples included nested scales of spatial variation, including sites distributed across eco-climatic domains and replicates within each site. We found that even under the same vegetation type and climate conditions, several NEON sites (e.g., GRSM, HARV, OSBS, WREF) had different decomposition patterns (i.e., no-peak or lagged-peak lignin decomposition) among the four within-site replicate samples (Figure S6). This result challenges conceptual biogeochemical models built around climate-decomposition relationships that only use temperature and moisture as predominant controls on litter turnover rates (Todd-Brown et al., 2014; Tuomi et al., 2009). With the help of systematic incubation data, our study revealed that soil metal and fungal characteristics (Figure 3b) could be valuable for informing the model of lignin decomposition patterns and regulating decomposition processes. However, most current observation network datasets overlook this information, limiting our model's efficacy in large-scale applications. The RF model identified pH, extractable Mn, and fungal community composition as important predictors to classify temporal patterns in lignin decomposition (Figure 3b). Fungi are potentially dominant drivers of terrestrial organic matter decomposition (Lustenhouwer et al., 2020; Maynard et al., 2019), and ITS sequencing is a useful indicator of fungal community composition (Nilsson et al., 2019).
Soil pH directly impacts the abundance and composition of fungi and bacteria (Rousk et al., 2010), and acidic pH often boosts fungal growth and diversity relative to bacteria (Tedersoo et al., 2014). Manganese is important for forming manganese peroxidases of fungi (Datta et al., 2017), one of the essential extracellular enzymes for lignin biodegradation (Brown & Chang, 2014; Hatakka, 1994; Hofrichter, 2002). Our data-model fusion work indicates that a low pH (<5.8) and high extractable Mn (>0.8 mg g−1 soil) environment promotes a niche suitable for lignin-decomposing fungi, leading to a peak in lignin decomposition when labile bulk litter becomes insufficient (labile bulk litter:lignin ratio <2) (Figure 3).

4.2 Reconciling classic hypotheses to track spatiotemporal dynamics of litter–lignin–SOC decomposition

Although lignin's role in regulating litter and SOM decomposition has been debated (Berg & Staaf, 1980; Hall et al., 2020; Klotzbücher et al., 2011), studies evaluating the impacts of how lignin is represented in C decomposition models are lacking. Substrate-limitation and co-metabolism hypotheses are two dominant proposed mechanisms in lignin degradation studies. Our model comparison revealed that the co-metabolism model matched the observed decomposition patterns well for the 65 no-peak samples (coefficients of determination = 0.88, 0.98, 0.96 for bulk litter, lignin, and soil, Figure S11). For the lagged-peak group, the 15 samples partly agreed and partly conflicted with both the substrate-limitation (C decomposition increases after labile C substrate depletion) and co-metabolism (C decomposes fastest during early decomposition) hypotheses. Specifically, results from the initial stage (around 0–109 days) matched the co-metabolism hypothesis, and results from the lagged-peak stage (around 109–571 days) matched the substrate-limitation hypothesis. Our model simulation revealed that the transition from the initial co-metabolism stage to the lagged substrate-limitation stage was caused by the depletion of the easily decomposed labile bulk litter pool, which agrees with the substrate-limitation hypothesis that lignin degradation only proceeds after labile, unprotected compounds are consumed (Berg & Staaf, 1980). Although the day when lignin decomposition began to increase (beginning of lagged-peak phase) varied from the 67th to the 253rd day of incubation among the 15 samples (Figure S9), the predicted labile bulk litter C:lignin pool approached a ratio of 2 on the day when the lagged-peak stage began for all 15 samples (Figure 3c). Hence, we propose that the dynamic change of the labile C:lignin pool can be a valuable index for estimating the beginning of the lagged peak in lignin decomposition.
Although litter C mass change can be well simulated in C decomposition models over time without dynamic lignin content monitoring (Bonan et al., 2013), our results emphasize that explicit representation of a lignin C pool and its dynamic interactions with bulk litter may be critical for tracking the temporal change of CO2 fluxes in different environments.
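To illustrate how a pool-ratio index could mark the start of the lagged-peak stage, the toy sketch below tracks two pools with first-order decay and reports the first day on which the labile bulk litter:lignin ratio falls to 2. This is not the CN-SIM formulation: the independent first-order kinetics, the function name, and all pool sizes and rate constants are assumptions chosen only for illustration.

```python
import math
from typing import Optional

RATIO_TRIGGER = 2.0  # labile bulk litter:lignin ratio reported in the text


def trigger_day(labile0: float, lignin0: float,
                k_labile: float, k_lignin: float,
                n_days: int = 571) -> Optional[int]:
    """First day on which the labile:lignin pool ratio is <= RATIO_TRIGGER.

    Both pools decay independently at first-order rates (per day); this is
    an assumed toy model. Returns None if the ratio never reaches the
    trigger within n_days.
    """
    for day in range(n_days + 1):
        labile = labile0 * math.exp(-k_labile * day)
        lignin = lignin0 * math.exp(-k_lignin * day)
        if labile / lignin <= RATIO_TRIGGER:
            return day
    return None


# Hypothetical pools (arbitrary C units) and rate constants (day^-1).
fast_labile_loss = trigger_day(labile0=6.0, lignin0=1.0,
                               k_labile=0.01, k_lignin=0.0002)
slow_labile_loss = trigger_day(labile0=6.0, lignin0=1.0,
                               k_labile=0.001, k_lignin=0.0002)
print(fast_labile_loss, slow_labile_loss)
```

In this toy setting the trigger day depends only on the initial ratio and the difference between the two rate constants, so soils that deplete labile litter slowly may never reach the threshold within the incubation period.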

Comparison among predictions from the three versions of the CN-SIM model tested here allowed us to quantify the impacts of different lignin decomposition hypotheses. On their own, the two classical hypotheses were not able to capture the variable and complex decomposition patterns observed in soils from across North America (Figure 4). The substrate-limitation and co-metabolism models underestimated cumulative CO2 flux by roughly 20% after 571 days of incubation for the 15 lagged-peak group samples, but the underlying reasons were different (Figure 5). The substrate-limitation model underestimated CO2 flux because it was unable to represent the decline in lignin decomposition rate at the beginning of the experiment (Figure 5), which in turn led to an underestimate of lignin C loss. Specifically, since lignin protects bulk litter, an overestimated lignin pool size will suppress the decomposition rate of bulk litter pools over time. In contrast, in the co-metabolism hypothesis-informed model, the lagged peak (109–571 days) in lignin decomposition was not represented, leading to a lower prediction of CO2 loss and inaccurate source attribution (Figure 5a–c). The NEON incubation data showed that temporal dynamics of bulk litter and lignin decomposition were correlated even during the lagged-peak stage (Figure S9), and the lagged-peak stage contributed more than 70% of lignin decomposition in the lagged-peak group. The lack of a lagged lignin peak mechanism means that the model was not able to represent increased bulk litter C availability after removing lignin, therefore underestimating bulk litter CO2 flux in the lagged-peak stage. The reconciliation hypothesis-informed model performed better in capturing both the variation among samples (Figure 4) and temporal dynamics (Figure 5) of lignin decomposition for the lagged-peak group samples.
This model reproduced lignin decomposition well in the initial stage (0–109 days, around 16% of lignin CO2 flux) and in the lagged-peak stage (109–571 days, around 84% of the lignin CO2 flux, Figure 5).

In summary, model structure, parameter estimation, and variation in observational data are the three main uncertainty sources for process-based models (Chatfield, 1995). Based on a systematic NEON incubation dataset, our data-model fusion study ascertained regulatory factors in lignin decomposition, evaluated model structures with different lignin degradation hypotheses, and reduced uncertainties for process-based C decomposition modeling. Lignin decomposition patterns among sites varied with key soil geochemical characteristics (pH, Mn) and fungal community composition. Improving lignin prediction accuracy was necessary for accurate bulk litter simulation. Model structures informed by either the substrate-limitation or the co-metabolism hypothesis alone showed limitations in explaining observed spatial variation, temporal dynamics, and C component attribution of bulk litter and lignin decomposition. Predictions of CO2 flux from bulk litter, lignin, and SOC were improved by a reconciliation model structure, with the inclusion of specific geochemical (pH < 5.8, extractable Mn >0.8 mg g−1 soil) and microbial (fungal community composition) information as lignin decomposition pattern classifiers, and inclusion of a dynamic lignin protection effect on bulk litter. Modifying the model structure to represent the role of lignin in decomposition, especially to capture the CO2 pulse from decomposition in soils with lagged-peak traits, would be one way to lower the uncertainties of model-estimated global soil carbon dynamics.

ACKNOWLEDGMENTS

This work was funded by National Science Foundation grant 1802745. We thank three anonymous reviewers for their valuable feedback and constructive criticism, which improved the quality of this manuscript. The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle. Data collected/used in this research were obtained through the NEON Assignable Assets program. Open access funding provided by the Iowa State University Library.

CONFLICT OF INTEREST STATEMENT

The authors declare that there is no conflict of interest regarding the publication of this article.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available in the Environmental Data Initiative at https://doi.org/10.6073/pasta/3169668ed4727b41f8fbec1c0ebd46cb, and the code is publicly available on GitHub: https://github.com/boyiisme/NEON_PeakSite.
