Seeing the forest beyond the trees
Abstract
In a recent paper (Mitchard et al. 2014, Global Ecology and Biogeography, 23, 935–946) a new map of forest biomass based on a geostatistical model of field data for the Amazon (and surrounding forests) was presented and contrasted with two earlier maps based on remote-sensing data: Saatchi et al. (2011; RS1) and Baccini et al. (2012; RS2). Mitchard et al. concluded that both of the earlier remote-sensing-based maps were incorrect because they did not conform to Mitchard et al.'s interpretation of the field-based results. In making their case, however, they misrepresented the fundamental nature of primary field and remote-sensing data and committed critical errors in their assumptions about the accuracy of research plots, the interpolation methodology and the statistical analysis. By ignoring the large uncertainty associated with ground estimates of biomass and the significant under-sampling and spatial bias of research plots, Mitchard et al. reported erroneous trends and artificial patterns of biomass over Amazonia. Because of these misrepresentations and methodological flaws, we find their critique of the satellite-derived maps to be invalid.
Introduction
The estimation of carbon stocks in tropical forests is challenging for several reasons: (1) diversity in the structure, wood density and dynamics of tropical forests leads to complex and variable allometry (Chave et al., 2004); (2) natural and anthropogenic disturbances at various spatial and temporal scales add to forest heterogeneity (Espírito-Santo et al., 2014); and (3) there is no strong relationship between environmental (climate and soil) variables and forest biomass for predicting regional variations. Therefore, to meet this challenge, ground and remote-sensing observations have been combined to provide estimates of biomass distribution in the tropics at regional to continental scales (Saatchi et al., 2007, 2011; Mascaro et al., 2011; Baccini et al., 2012; Asner et al., 2013). On a per-hectare basis, the ground data (generally consisting of all tree diameters above a threshold, a sampling of tree heights and species identification that permits the inference of wood densities) are more comprehensive than remote-sensing data that generally measure aggregate structure such as canopy height. In contrast, airborne or satellite remote-sensing data are far more extensive, including millions of measurements over regional or continental scales compared with hundreds for research plots. Both are measures of physical properties that are not forest biomass (Clark and Kellner, 2012). Both efforts rely on statistical techniques to estimate biomass, using single-tree allometry in the case of field plots and plot-aggregate allometry in the case of satellite data.
Here, we show that Mitchard et al. (2014) misrepresented what they measured in the plots and committed significant methodological errors in extrapolating biomass estimates from plots to the whole of Amazonia and comparing their results with the satellite-derived maps of Saatchi et al. (2011) (RS1) and Baccini et al. (2012) (RS2).
The Fallacy of Ground Truth
Mitchard et al. (2014) used 413 plots covering a total of 404.6 ha to sample more than 650 million ha of forests in Amazonia and the Choco region, west of the Andes. The census data were taken between 1956 and 2013 with more than a third of the plots last censused before 1995. The quality of structure measurements and botanical information for more than half of the plots outside RAINFOR (Malhi et al. 2006) and the TEAM network (http://www.teamnetwork.org/) is unknown. Mitchard et al. (2014) argued that their biomass estimation from research plots and the derived maps are more accurate than satellite-derived maps. The reader is left with the tacit impression that the data of Mitchard et al. (2014) must be correct because they come from the ground, while the RS1 and RS2 data are from satellites. This argument is unsurprisingly compelling because the human brain is hard-wired to accept results from physical contact rather than from distant measurements. Liberman & Trope (2008) note that for the human brain, ‘Remote locations should bring to mind the distant rather than the near future, other people rather than oneself, and unlikely rather than likely events.’ In other words, ‘to see the forest we need to step back, whereas to see the trees we need to get closer’ (Liberman and Trope, 2008).
Are the primary Mitchard et al. (2014) data superior because they are closer to trees? No.
Before 2005, most local and regional tropical forest biomass allometry was based on measurements of tree diameter (D) only (Brown, 1997; Chambers et al., 2007; Chave et al., 2004), and in most cases did not include wood density (ρ) or height–diameter (H–D) relations. Following the work of Chave et al. (2005), Mitchard et al. (2014) estimated biomass using diameter, height and wood density – as did the estimates of biomass using satellite data in RS1 and RS2. The analyses of RS1 and RS2 use similar and sometimes overlapping data to Mitchard et al. (2014). There is no clear evidence that one set of data is superior to another. Therefore, the main difference between the Mitchard et al. (2014) and the satellite analyses is in the extrapolation approach. We note that the ground-based biomass estimates for all three studies have been challenged recently by the publication of a new tropical forest allometry by Chave et al. (2014).
Mitchard et al. (2014) provided six estimates of biomass using the Chave et al. (2005) moist forest allometry with three or two parameters (D, H and ρ) to allow for variations in biomass estimation but did not include these differences as uncertainty in their analysis. Moreover, each allometric estimate may have an additional error of 10–20% (smaller plots have a larger uncertainty) if one uses error propagation from basic measurements to model implementation (Chave et al., 2004). Including these errors (e.g. 10%) along with estimates from different allometries provides a realistic variation around the mean biomass for each plot location (Fig. S1a in Supporting Information).
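One way to picture this combined variation is sketched below. It is an illustration only: the function name, the six example values and the choice to add the two error components in quadrature (which assumes they are independent) are assumptions, not the procedure used for Fig. S1a.

```python
import math
import statistics

def plot_biomass_uncertainty(estimates, relative_error=0.10):
    """Combine the spread across allometries with a per-estimate error.

    estimates: AGB values (Mg/ha) for one plot, one per allometric model.
    relative_error: assumed fractional error of each estimate (0.10 = 10%).
    Returns (mean, total_sd). The two components are added in quadrature,
    which assumes they are independent (an assumption of this sketch).
    """
    mean = statistics.fmean(estimates)
    between_sd = statistics.stdev(estimates) if len(estimates) > 1 else 0.0
    propagated_sd = relative_error * mean
    return mean, math.hypot(between_sd, propagated_sd)

# Hypothetical six allometric estimates for a single plot (not real data).
example = [310.0, 295.0, 342.0, 288.0, 330.0, 305.0]
mean_agb, sd_agb = plot_biomass_uncertainty(example)
print(f"AGB = {mean_agb:.1f} +/- {sd_agb:.1f} Mg/ha")
```

Under these assumptions the total spread is never smaller than the propagated measurement error alone, so treating the mean of the six estimates as error-free understates the uncertainty at each plot.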
Mitchard et al. (2014) claimed that the four regional H–D models introduced by Feldpausch et al. (2012) improve ground estimates of biomass by ‘greatly reducing the error in the prediction of H from D compared to a pan-Amazonian model’. The models are implemented by casually assigning the plots to four manually delineated regions of Amazonia without any systematic method of stratification. The western Amazon model, for example, covers areas from highly seasonal rainfall in the south to areas with no dry season in the north, with soils varying from infertile in the east to fertile by the Andes foothills, and vegetation types ranging from floodplain and bamboo-dominated forests to forests on terrain with widely different geomorphology and topography. The H–D allometry by Feldpausch et al. (2012) has been found to introduce large bias (> 20%) (Chave et al., 2014) in estimates of biomass when compared with local relations (Fig. S1b) (see, e.g., Hunter et al., 2013; Kearsley et al. 2013) and is probably a simplistic approximation of H–D variations (Fig. S1c).
Wood specific gravity, the Achilles' heel of biomass estimation
In allometric models, biomass at tree level or at aggregate plot level varies linearly with wood density (Chave et al., 2005; Asner & Mascaro, 2014). However, wood density is not directly measured in the field and estimates are often extracted from published tabulated data with large uncertainty due to variations in measurement techniques, sample size, geographic concentration of samples and identification of species (Muller-Landau, 2004). The spatial variation of average wood density over Amazonia is unknown, but is expected to be large because of geographic variations in taxonomy and phylogenetic characteristics (Chave et al., 2009), as well as interspecific and inter-site variations in both soil fertility and complex processes of tree mortality (Muller-Landau, 2004). Field observations suggest that there is a significant pattern in wood density related to soil characteristics – trees with higher wood density in infertile soils of eastern Amazonia and those with lower wood density in more fertile soils of western Amazonia near the Andes foothills (ter Steege et al., 2006; Quesada et al., 2012). However, without systematic spatial sampling from ground or remote-sensing observations of wood traits, we will not be able to prove but only suggest a regional and large-scale pattern.
We challenge the claim of Mitchard et al. (2014) that research plots provide accurate estimates of variations in wood density over Amazonia. To demonstrate this, we use a larger dataset (n = 3616) compiled over Amazonia using plots provided by Mitchard et al. (2014) and additional data from other sources (Nogueira et al., 2005; Saatchi et al., 2011; S. Brown, Winrock International, pers. comm.). Dividing the data over the same four regions suggested by Mitchard et al. (2014), we show that the within-region variations in wood density are larger than the between-region variations and the regional mean values (average wood density of individual trees in plots) are less divergent (Fig. S2). Our data, although not based on a systematic sampling of Amazonia, suggest that the wood density may have larger variations within landscapes than at regional scales because of the heterogeneity in forest composition, soil characteristics, geomorphology, size-dependent tree mortality and disturbance regimes, all functioning at small scales (metres to hectares).
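The comparison of within-region and between-region variation can be sketched as a simple variance decomposition. The function name, the region labels and the wood density values below are hypothetical and serve only to show the calculation; they are not the n = 3616 dataset behind Fig. S2.

```python
import statistics

def variance_components(groups):
    """Return (between_region_variance, mean_within_region_variance).

    groups: dict mapping region name -> list of wood density values (g/cm^3).
    Between: population variance of the regional means.
    Within: unweighted average of the regional population variances.
    """
    means = [statistics.fmean(v) for v in groups.values()]
    within_each = [statistics.pvariance(v) for v in groups.values()]
    return statistics.pvariance(means), statistics.fmean(within_each)

# Hypothetical plot-level wood densities for two regions (not real data).
groups = {
    "west": [0.45, 0.60, 0.75],
    "east": [0.50, 0.65, 0.80],
}
between, within = variance_components(groups)
print(f"between-region variance: {between:.6f}")
print(f"mean within-region variance: {within:.6f}")
```

When the within-region term dominates, as in this toy example, regional means say little about the wood density of any particular landscape.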
Research Plots and the Curse of Sampling
The Mitchard et al. (2014) data may be more comprehensive within individual plots, yet they are several orders of magnitude less extensive across space. In other words, they are missing the forest for 0.00001% of the trees.
By referring to their plot network as an ‘inventory’, Mitchard et al. (2014) conflate measurement protocol in the field (i.e. wherein all trees are inventoried) with strategic planning to sample biomass and other forest properties as conducted by national forest inventories (e.g. McRoberts et al., 2005). Their research plot network, although designed and used for ecological studies, is not suitable for biomass inventory because: (1) more than half of the plots are inherited from different groups, increasing the likelihood of measurement errors; (2) plots are spatially clustered near roads, rivers and research stations for easy access; and (3) the plots are haphazardly located, yet falsely depicted on the map to convey a widespread distribution over Amazonia. The coordinates have large uncertainty (c. 10–50 km) (Supporting Information in Mitchard et al., 2014) because of lack of GPS recordings, particularly in older plots. Locations provided in their paper do not always match data provided in RAINFOR publications or websites (Baker et al., 2004; Malhi et al., 2006; http://www.rainfor.org/). The comparison of the biomass estimates from the opportunistic plot collection that is only broadly constrained in space (at times as poorly as c. 10–50 km) and time (the years 1956–2013) leads to uncertainties ignored by Mitchard et al. (2014) when comparing with the satellite estimates that are tightly constrained (2005 ± 3 years and < 100 m for Geoscience Laser Altimeter System (GLAS) lidar observations; Lefsky, 2010).
To compensate for the sparse sampling of their plot collection, Mitchard et al. (2014) opted to average ‘field plots within 20 km × 20 km boxes and compared the mean biomass values for these boxes to the mean AGB [aboveground biomass] of RS1 and RS2’. In the process, however, they committed several methodological errors.
First, they found 107 unique points (20 km × 20 km) with an average of 3.9 (1–14) plots for each box. We could not reproduce the same number of unique points with their data. We found 109 unique points at 80 km × 80 km (with a similar average of 3.8 plots per box) or 189 unique points for 5 km × 5 km boxes (with an average of 2.1 plots). This difference has large implications for trend analysis and map comparison because 80-km boxes are one order of magnitude larger than typical landscape scales (< 10 km).
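The dependence of the number of unique points on box size can be sketched as follows. This is a minimal illustration that assumes plot coordinates are already projected to kilometres; the function names and the five example plots are hypothetical and do not reproduce the counts reported above.

```python
import math
from collections import defaultdict

def bin_plots(plots, box_km):
    """Group plots into square boxes of side box_km.

    plots: iterable of (x_km, y_km, agb) in a projected coordinate system
           (an assumption of this sketch).
    Returns a dict mapping (column, row) box index -> list of AGB values.
    """
    boxes = defaultdict(list)
    for x, y, agb in plots:
        key = (math.floor(x / box_km), math.floor(y / box_km))
        boxes[key].append(agb)
    return boxes

def summarize(boxes):
    """Return (number of occupied boxes, mean number of plots per box)."""
    n_plots = sum(len(v) for v in boxes.values())
    return len(boxes), n_plots / len(boxes)

# Hypothetical plots: (x_km, y_km, AGB in Mg/ha), not real data.
plots = [(1, 1, 300), (4, 3, 280), (12, 2, 310), (45, 50, 250), (70, 75, 260)]
for size in (5, 20, 80):
    n_boxes, per_box = summarize(bin_plots(plots, size))
    print(f"{size} km boxes: {n_boxes} unique points, {per_box:.2f} plots per box")
```

Because the same plots collapse into fewer and fewer points as the box grows, the box size must be stated and reproducible before any trend fitted to the box means can be compared across studies.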
Second, Mitchard et al. (2014) ignored the sampling problem and treated these average biomass values as the true mean for each box. The large spatial variability of AGB suggests that at least 9–15 randomly located 1-ha plots are required in each 20-km box to estimate the mean biomass at each point with 20% error (Chave et al., 2003) (Fig. S3 shows examples of biomass spatial heterogeneity using airborne lidar data). Unlike Mitchard et al. (2014), we addressed the sampling problem in our map (RS1) and provided the uncertainty of using five GLAS lidar shots (> 0.25 ha each) systematically sampling the 1-km map units in developing the RS1 biomass map (Saatchi et al., 2011).
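A rough sense of how such plot numbers arise can be obtained from the standard sample-size formula for a mean under simple random sampling, n = (z × CV / E)². The sketch below uses a normal approximation, and the coefficients of variation are illustrative assumptions, not values taken from Chave et al. (2003).

```python
import math
from statistics import NormalDist

def plots_needed(cv, rel_error, confidence=0.95):
    """Number of randomly located plots needed to estimate a mean.

    cv: coefficient of variation of plot-level AGB within the box (assumed).
    rel_error: target half-width of the confidence interval, as a fraction
               of the mean (0.20 = 20%).
    Uses the normal approximation n = (z * cv / rel_error)^2, rounded up.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * cv / rel_error) ** 2)

# Illustrative coefficients of variation (assumptions, not measured values).
for cv in (0.30, 0.35, 0.40):
    print(f"CV = {cv:.2f}: {plots_needed(cv, 0.20)} plots for 20% error")
```

The formula applies only to randomly located plots. Clustered or opportunistic plots carry less independent information, so the required number would be larger still.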
Third, we followed their approach and performed the trend analysis with unique points (109 or 189) derived from their research plots along three directions (N–S, E–W and NE–SW) using the ordinary least squares (OLS) regression model [y = α + Xβ + ε; with X being the explanatory variable (e.g. latitude or longitude or diagonal distance), y being AGB, and ε representing a random error term]. The reproduced results show that β has similar significance levels to Fig. 2 in Mitchard et al. (2014) (Table S1). However, if AGB can be fully explained by X using OLS, the residual of OLS regression should be white noise; otherwise any significance test based upon OLS is erroneous (Lennon, 2000). Our analysis shows that the OLS residual error is spatially correlated (Moran's I test in Table S1), confirming the existence of spatial autocorrelation (Fig. S4a), even after accounting for the changes in the proposed explanatory variable (Fig. S4b). Such spatial autocorrelation can be modelled as non-zero covariance in the regression residual, under the assumption of covariance stationarity (which is the same assumption underlying ordinary or universal kriging). The so-called geostatistical regression (GR) (Johnson & Hoeting, 2011) utilizing the generalized least squares (GLS) method shows that none of the trends provided by Mitchard et al. (2014) is significant under this approach (Seber & Lee, 2012). By ignoring the presence of spatial correlation in the data, they have effectively overestimated the number of samples, and reported artificially low P-values (Duffy et al., 2007). The transformed residuals in the GR method are no longer dependent on distance, suggesting that the results of our GLS approach are valid (Fig. S4b).
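The residual check described here can be sketched in a few lines: fit the OLS trend, then compute Moran's I on the residuals. The sketch uses binary distance-band weights and a hypothetical one-dimensional transect; the weighting scheme, the function names and the data are assumptions for illustration and are not the analysis behind Table S1.

```python
import math

def ols_fit(x, y):
    """Simple OLS of y on x. Returns (alpha, beta, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return alpha, beta, resid

def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < d_ij <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if w_sum == 0.0:
        raise ValueError("no neighbouring pairs within max_dist")
    return (n / w_sum) * (num / sum(d * d for d in dev))

# Hypothetical transect: a linear trend plus spatially clustered deviations.
coords = [(float(i), 0.0) for i in range(12)]
x = [c[0] for c in coords]
bump = [5, 5, 5, -5, -5, -5, 5, 5, 5, -5, -5, -5]
y = [300 + 2 * xi + b for xi, b in zip(x, bump)]

alpha, beta, resid = ols_fit(x, y)
moran = morans_i(resid, coords, max_dist=1.0)
print(f"beta = {beta:.3f}, Moran's I of residuals = {moran:.3f}")
```

Under the null hypothesis of no spatial autocorrelation the expected value of Moran's I is −1/(n − 1), so a clearly positive value on the residuals signals that the OLS standard errors, and hence the P-values, cannot be trusted.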
Fourth, Mitchard et al. (2014) rely on interpolating 413 research plots over 650 million ha of Amazonian forests with large stretches (> 100 million ha) without a single plot. The performance of their interpolation is strongly dependent on the noise in the data, the spatial autocorrelation, the sampling pattern and the method of interpolation. By using a log–log axis for semivariogram analysis, Mitchard et al. (2014) misinterpreted the autocorrelation among plots. Using a linear axis, our analysis shows that spatial autocorrelation does exist and extends to more than 2000 km with large variations at local scales (presence of a non-zero nugget) (Fig. S4a). Mitchard et al. (2014) consequently ignored any statistically rigorous kriging analysis and simply applied an inverse distance-based kernel function for spatial interpolation (Isaaks and Srivastava, 1989). In the presence of spatial autocorrelation, the use of the inverse distance approach that ignores the uncertainty in the data has no sound statistical basis and can provide misleading interpolated surfaces (Zimmerman et al., 1999). In addition, the performance of the interpolation deteriorates significantly when sampling patterns are clustered instead of random, unless the sampling is designed initially to minimize the maximum or average kriging prediction-error variance (Brus & Heuvelink, 2007), which is not the case with the research plots. Without the optimized spatial sampling used in forest inventory techniques (Mandallaz, 2007), the interpolated surfaces of the AGB offered by Mitchard et al. (2014) are erroneous and do not provide any more information than the original plots.
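For readers unfamiliar with the semivariogram, a minimal sketch of the classical (Matheron) estimator is given below. The function name, the lag binning and the four-point example are assumptions for illustration; they are not the computation behind Fig. S4a.

```python
import math
from collections import defaultdict

def empirical_semivariogram(coords, values, lag_width, max_lag):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Pairs are grouped into distance bins [k * lag_width, (k + 1) * lag_width).
    Returns a dict mapping bin centre -> semivariance, for bins with pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            if h > max_lag:
                continue
            k = int(h // lag_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {(k + 0.5) * lag_width: sums[k] / (2 * counts[k])
            for k in sorted(counts)}

# Hypothetical transect with a steady gradient (not real data).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, lag_width=1.0, max_lag=3.0)
for lag, g in gamma.items():
    print(f"lag bin centre {lag:.1f}: semivariance {g:.2f}")
```

Plotted on linear axes, the semivariance at the shortest lags indicates the nugget, and the distance at which the curve levels off indicates the range of autocorrelation. Both are easy to misread on log–log axes.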
Misleading Assessment of Biomass Maps
Mitchard et al. (2014) claim to have documented a gradient in biomass from the south-west to the north-east in Amazonia. However, not only do they fail to rigorously prove this point, as demonstrated above, but they present a false narrative of the distribution of Amazonian biomass and what the maps present.
The literature on the spatial pattern of forest biomass in Amazonia is extremely diverse and sometimes contains contradictory results (Clark & Clark, 2000; DeWalt & Chave, 2004; Slik et al., 2010; Quesada et al., 2012). Forest biomass is a synthesis of several ecological and biological processes modulated by climate, soil and disturbance. If there is some non-randomness in these processes, then what are the controlling factors and at what scale do they operate? We have shown that research plots, without any statistical sampling design, are not suitable for providing reliable answers to these important questions.
In the absence of rigorously designed (and extensive) ground-based forest inventories, remote-sensing techniques with spatially resolved, systematic and repeated measurements of forest structure are the best alternative to inventory sampling (Asner et al., 2013; Neigh et al., 2013). The GLAS lidar measurements taken along the ICESat orbital tracks provide systematic samples of forest structure that are three orders of magnitude denser than the opportunistic research plots (Fig. S5). Using a simple statistical aggregation of the height samples measured by GLAS lidar at 50 km × 50 km grids (with over a thousand samples in undisturbed forest pixels), we provide spatial patterns of forest structure over Amazonia (Fig. S5), showing the distribution of tall trees, potential gradients and large spatial variability.
We converted the GLAS lidar samples to biomass from a single equation derived from plots scattered in Amazonia close to lidar measurements with knowledge about the uncertainty of using a single allometry or average wood density (Saatchi et al., 2011). However, using a single model based on canopy height over Amazonia had the advantage of not introducing any additional artificial and spatially correlated errors in the biomass map from unknown variations in wood density (Chave et al., 2009). Nevertheless, contrary to the claims of Mitchard et al. (2014) the biomass map (RS1), with its regional bias, preserves the potential known patterns in Amazonia. To show this, we used a soil map of Amazonia (Fig. S6) as the basic stratification of forest types, calculated the mean forest height and biomass for each stratum and coloured the map to highlight regional patterns (Fig. S7).
In Conclusion
Mitchard et al. (2014) have conducted an analysis with numerous and serious technical flaws. In place of rigorous analysis, they rely on human psychology and create a false impression that measures by touch (e.g. field plots) are superior to remotely sensed measures (from air and space). We know this is a flawed argument. Yet it is innately compelling because the human brain is wired to accept it. These instincts and arguments affected their methodology so that they erroneously used a few plots for inferences about forests distant in time and place, and for comparison with satellite observations, which are extensive in space and constrained in time. By doing this, they made ‘predictions, evaluations, and choices with respect to [their] construal of objects rather than the objects themselves’ (Liberman and Trope, 2008).
Acknowledgements
This work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and the University of California, Los Angeles, USA, under a contract with the National Aeronautics and Space Administration (NASA).