International Journal of Forestry Research

Volume 2025, Issue 1 9355771

Research Article

Open Access

A Hybrid Machine Learning Approach for Estimating Aboveground Biomass and Carbon Stock in Tanzania’s Miombo Woodlands

Emmanuel F. Chifunda (Corresponding Author), orcid.org/0009-0004-9443-2548

Ramkumar T. Balan, orcid.org/0000-0001-5206-5226

Peter J. Kirigiti, orcid.org/0000-0002-5038-2225

All authors: Department of Mathematics and Statistics, College of Natural and Mathematical Sciences, University of Dodoma, P.O. Box 338, Dodoma, Tanzania (udom.ac.tz)


First published: 19 July 2025

https://doi.org/10.1155/ijfr/9355771

Academic Editor: Anna Źróbek


Abstract

The complexity of Miombo woodlands, characterized by diverse attributes, poses challenges in developing accurate and reliable biomass estimation models using conventional approaches, which inadequately capture the intricate relationships between biomass and the numerous factors in these woodlands. This study proposes a novel approach combining artificial neural networks (ANNs) and random forest (RF) algorithms to estimate aboveground biomass (AGB) and carbon stock in the Miombo Woodland Ecosystem. A hybrid model (ANN-RF) was developed by stacking: two ANN models and one RF model served as base learners, and a linear-regression meta-learner combined their predictions. Comparative models, namely traditional allometric, ANN, and RF models, were also established; here, traditional allometric models refer to the regression-based allometric equations commonly used for biomass estimation. The input variables for estimating AGB and carbon stock included diameter at breast height, tree height, basal area, stem density, slope, elevation, precipitation, and soil pH. Model quality was evaluated using root-mean-square error (RMSE, Mg/tree), coefficient of determination (R²), and mean absolute error (MAE, Mg/tree). With the full input set, the ANN-RF model outperformed the individual models and traditional allometric equations, achieving the highest R² (0.975) and the lowest RMSE (0.153 Mg/tree), with MAE = 0.053 Mg/tree. With reduced input sets (diameter, height, and basal area, or diameter and height), the ANN-RF model retained high accuracy (R² = 0.966 and 0.958, respectively). Traditional allometric models showed markedly lower accuracy (R² = 0.153–0.348), highlighting the effectiveness of the ANN-RF model for estimating AGB and carbon stock in the Miombo Woodland Ecosystem.

1. Introduction

Forests are vital for maintaining ecological balance and providing numerous ecosystem services, making them essential for global environmental health [1]. They act as significant carbon sinks by sequestering carbon dioxide from the atmosphere, thereby mitigating climate change [2, 3]. Acknowledging the importance of forests in carbon storage and biodiversity conservation, international initiatives such as the United Nations Framework Convention on Climate Change (UNFCCC) have provided strategies to combat deforestation, including reducing emissions from deforestation and forest degradation (REDD+), afforestation programs, and sustainable forest management [1, 4].

Among forest types, Miombo Woodlands are a notable subtype of tropical forest that predominates in Tanzania and other parts of Africa; distinguished by their diverse flora and fauna, they play a fundamental role in global carbon storage. These woodlands display distinct physiognomic variations and are classified into dry and wet Miombo based on annual rainfall [5, 6]. Woodlands, which cover about 33.4 million hectares and comprise several dry forest types including Miombo, constitute approximately 95% of Tanzania’s total forest and woodland area [7]. The carbon stock in Miombo Woodlands, particularly the aboveground biomass (AGB) and aboveground carbon (AGC) stock per hectare, varies significantly across regions of Tanzania [8].

Because Miombo Woodlands are heavily populated, accurate estimation of their biomass and carbon is crucial for developing and implementing strategies to reduce greenhouse gas emissions in tropical regions. Traditional methods for AGB estimation, such as allometric equations, often fall short in capturing the complex structure and spatial variability of Miombo Woodlands [9]. These inadequacies arise from the inability of current models to account for the inherent variability in tree structure, species-specific growth patterns, and regional environmental factors within the Miombo woodlands ecosystem (MWE) [10]. Consequently, there is increasing interest in advanced modeling techniques, such as artificial neural network (ANN) and random forest (RF) algorithms, to improve the accuracy of AGB and carbon stock estimation.

This study aimed to develop a hybrid model that integrates two machine learning techniques, ANN and RF, to improve the precision of AGB and carbon stock estimates in Miombo Woodlands.

The outcomes of this research will contribute to the advancement of biomass estimation techniques in Miombo Woodlands and have broader implications for enhancing carbon stock assessments in similar ecosystems globally.

2. Materials and Methods

2.1. The Study Area

The study was conducted in Tanzania across six Miombo woodland forests. These included Angai Forest Reserve (AFR) in the Lindi region, Ayasanda and Duru Haitemba (ADH) in the Manyara region, and Gangalamtumba Village Land Forest Reserve (GVLFR) in the Iringa region. The other forests were Mkulazi Catchment Forest Reserve (MCFR) in the Morogoro region, Nyahua Forest Reserve (NFR) in the Tabora region, and Mpanda in the Katavi region (Figure 1).

Figure 1. Map of Tanzania showing regions with Miombo woodland cover.

2.2. Study Design and Selection of Study Sites

This was a panel longitudinal, systematic study of randomly sampled trees. Regions, forests, and Miombo trees were selected using double sampling for stratification. First, a dense grid of clusters spaced 5 km apart was placed over the map of mainland Tanzania, forming the first-phase sample. Second-phase samples were then systematically selected from the first-phase sample, with sampling intensity varying among the six regions. Tree measurements were taken in subplots within each 1-ha plot, following the National Forest Resources Monitoring and Assessment of Tanzania (NAFORMA) protocols [11].
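The two-phase selection described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the NAFORMA procedure: the function names, the projected-kilometre bounding box, and the sampling interval are hypothetical, because the regional sampling intensities are not reported here.

```python
import random


def first_phase_grid(x_min_km, x_max_km, y_min_km, y_max_km, spacing_km=5.0):
    """First phase: cluster centres on a regular grid (projected km coordinates)."""
    nx = int((x_max_km - x_min_km) // spacing_km) + 1
    ny = int((y_max_km - y_min_km) // spacing_km) + 1
    return [
        (x_min_km + i * spacing_km, y_min_km + j * spacing_km)
        for j in range(ny)
        for i in range(nx)
    ]


def systematic_subsample(clusters, interval, seed=0):
    """Second phase: every `interval`-th cluster after a random start.

    A larger interval means a lower sampling intensity; in the study the
    intensity differed between regions (those values are not reproduced here).
    """
    start = random.Random(seed).randrange(interval)
    return clusters[start::interval]


# Hypothetical example: a 20 km x 10 km block sampled with an interval of 4.
grid = first_phase_grid(0.0, 20.0, 0.0, 10.0)
second_phase = systematic_subsample(grid, interval=4)
```

Using a random start with a fixed interval keeps the second phase systematic while still giving every first-phase cluster the same inclusion probability.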

2.3. Data Collection

Data for model training and validation were obtained from the NAFORMA database, which is held by the National Carbon Monitoring Center (NCMC). The key input variables were diameter at breast height (Dbh), tree height (Ht), basal area (BA), stem density (SD), slope (Slp), elevation (Elv), precipitation (Prt), and soil pH (SpH); SpH was measured at 0–15 cm depth [12]. Tree-level AGB was calculated using species-specific allometric equations developed by Mugasha et al. [13], which are widely accepted for Miombo Woodlands, as presented in the following equation:

$$\mathrm{AGB} = 0.0763\,\mathrm{Dbh}^{2.2046}\,\mathrm{Ht}^{0.4918} \qquad (1)$$

where Dbh is the diameter at breast height (cm) and Ht is the total tree height (m).
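A minimal sketch of the allometric equation above follows, using the coefficients listed for Mugasha et al. [13] in Table 3. The function names are ours, and the output unit (kilograms per tree, converted to the Mg/tree used in Table 2) is an assumption that should be checked against the original publication.

```python
def agb_mugasha(dbh_cm: float, ht_m: float) -> float:
    """Tree AGB = 0.0763 * Dbh^2.2046 * Ht^0.4918 (coefficients from Table 3).

    Dbh in cm, Ht in m. The result is in the unit the equation was fitted in,
    assumed here to be kg per tree.
    """
    if dbh_cm <= 0 or ht_m <= 0:
        raise ValueError("Dbh and Ht must be positive")
    return 0.0763 * dbh_cm ** 2.2046 * ht_m ** 0.4918


def kg_to_mg(value_kg: float) -> float:
    """Convert kilograms to megagrams (tonnes), the unit used in Table 2."""
    return value_kg / 1000.0
```

Because the equation is a power law, doubling Dbh at fixed height multiplies AGB by 2^2.2046 (about 4.6), which is why Dbh dominates tree-level biomass.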

The BA of a single tree was calculated using its Dbh with the formula in the following equation.

$$\mathrm{BA} = \frac{\pi}{4}\left(\frac{\mathrm{Dbh}}{100}\right)^{2} \qquad (2)$$

where Dbh is in centimeters (cm). The Dbh value is divided by 100 to convert it into meters, ensuring that the resulting BA is expressed in square meters (m²).
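The basal-area formula can be checked numerically. The short sketch below (function name ours) converts Dbh from centimetres to metres before squaring. Because BA increases monotonically with Dbh, it should reproduce the minimum, median, and maximum BA in Table 2 from the corresponding Dbh values (0.6, 8.7, and 110.0 cm), to the precision shown there.

```python
import math


def basal_area_m2(dbh_cm: float) -> float:
    """Single-tree basal area in m² from Dbh in cm: (pi / 4) * (Dbh / 100)^2."""
    dbh_m = dbh_cm / 100.0  # cm -> m
    return math.pi / 4.0 * dbh_m ** 2
```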

High-resolution climate data at 2.5 arc-minute spatial resolution were obtained from the WorldClim v2.1 database (https://www.worldclim.org/data/worldclim21.html) for the baseline period 1970–2000 [14]. The key climate variable used in this study was mean annual Prt, which was spatially analyzed and visualized across the Miombo Woodland sites to assess geographic variation in rainfall distribution, as shown in Figure 2.
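Because the precipitation covariate comes from a raster rather than from field measurement, each plot has to be matched to a grid cell. The sketch below shows that lookup for the 2.5 arc-minute resolution mentioned above. The function name and the assumption of a global grid anchored at (−180°, 90°) are ours; in practice a GIS library would read the georeferencing from the raster file itself rather than hard-code it.

```python
import math

CELLS_PER_DEGREE = 60 / 2.5            # 2.5 arc-minutes -> 24 cells per degree
N_COLS = int(360 * CELLS_PER_DEGREE)   # 8640 columns for a global grid
N_ROWS = int(180 * CELLS_PER_DEGREE)   # 4320 rows for a global grid


def worldclim_cell(lon, lat):
    """(row, col) of the 2.5-arc-minute cell that contains a point.

    Assumes a global grid whose top-left corner is at lon -180, lat 90,
    with rows counted southwards and columns eastwards.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    # A point exactly on the east or south edge belongs to the last cell.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)
```

For example, a plot at longitude 35.75° E and latitude 6.125° S falls in row 2307, column 5178 of such a grid.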

2.4. Population and Sample Size

In this study, the population refers to all trees in Tanzania’s Miombo Woodlands, estimated at 20,080 trees [15]. Because it was not feasible to measure the whole population, a sample of 1619 trees was drawn from NAFORMA and distributed proportionally among the six Miombo Woodland forests, as presented in Table 1.

Table 1. Sample size distribution among Miombo Woodland forests.

Forests	Region	Category	Sample size
GVLFR	Iringa	Dry miombo	283
Mpanda	Katavi	Wet miombo	263
AFR	Lindi	Wet miombo	270
ADH	Manyara	Dry miombo	263
MCFR	Morogoro	Wet miombo	277
NFR	Tabora	Dry miombo	263
Total			1619

2.5. Data Preparation and Exploration

The dataset was initially imported into RStudio for preliminary examination of its structure and contents. This involved identifying variable types, evaluating the completeness of records, and detecting any inconsistencies or outliers. To enhance model simplicity and interpretability, the analysis excluded categorical variables such as site or species names, which tend to increase model complexity without substantially improving predictive performance [16]. Instead, emphasis was placed on the most relevant numeric variables known to contribute significantly to the estimation of AGB and carbon stock.

The variables retained for analysis include Dbh, Ht, BA, SD, Elv, Slp, SpH, and annual Prt. These variables are frequently cited in the ecological modeling literature as key predictors of forest biomass and carbon stocks. Descriptive statistics for each variable are summarized in Table 2, providing an overview of their central tendencies and ranges. Additionally, Figure 3 presents histograms for each numeric variable, offering visual insights into their distributional properties and potential skewness, which are critical for selecting appropriate modeling techniques.

Table 2. Characteristics of the datasets used.

Statistic	Dbh (cm)	Ht (m)	BA (m²/tree)	SD (stems/ha)	Elv (m)	Slp (degrees)	SpH	Prt (mm/year)	AGB (Mg/tree)	AGC (Mg/tree)
Min.	0.6	1.0	2.8e-05	14.1	859	0.0	5.7	704	0.004	0.002
1st Qu.	6.3	2.2	3.1e-03	14.1	979	10.0	6.4	821	0.162	0.081
Median	8.7	2.7	5.9e-03	14.1	1121	12.0	6.6	864	0.388	0.194
Mean	13.2	6.7	2.6e-02	26.6	1096	17.3	6.6	909	1.281	0.641
3rd Qu.	14.6	5.9	1.7e-02	14.1	1180	25.0	6.9	1010	1.119	0.560
Max.	110.0	88.1	9.5e-01	175.2	1288	55.0	8.5	1083	27.244	13.622

Note: AGC = 0.5 × AGB, where 0.5 is the commonly used average carbon fraction for most trees [2].

2.6. Feature Importance Analysis

Understanding the relative contribution of different variables in predicting AGB and carbon stock is crucial for accurate estimations [17, 18]. In this study, we applied the RF algorithm to assess feature importance, quantifying the significance of each predictor based on its ability to reduce impurity within the forest’s decision trees [19].

The results, illustrated in Figure 4, highlight the varying degrees of influence among the examined variables. The figure ranks predictors by this importance measure, with Dbh emerging as the most influential factor, followed by BA and Ht. These findings underscore the dominant role of tree structural attributes in estimating AGB and carbon stock [20].
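For regression forests, the impurity used by randomForest is the residual sum of squares (RSS). Each split is credited with the RSS it removes, and a variable's importance (reported as IncNodePurity) is the total decrease from splits on that variable, averaged over all trees. The plain-Python sketch below illustrates that bookkeeping on toy values. The helper names and data are hypothetical, and the sketch does not grow trees itself.

```python
from collections import defaultdict


def rss(values):
    """Residual sum of squares around the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_rss_decrease(x, y, threshold):
    """Impurity decrease when a node is split at x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return rss(y) - rss(left) - rss(right)


def best_split_decrease(x, y):
    """Largest RSS decrease over the candidate thresholds of one feature."""
    candidates = sorted(set(x))[:-1]  # splitting at the maximum leaves one side empty
    return max((split_rss_decrease(x, y, t) for t in candidates), default=0.0)


def impurity_importance(splits_per_tree):
    """Sum each feature's RSS decreases within trees, then average over trees.

    `splits_per_tree` is a list (one entry per tree) of (feature, decrease) pairs.
    """
    totals = defaultdict(float)
    for tree in splits_per_tree:
        for feature, decrease in tree:
            totals[feature] += decrease
    return {f: d / len(splits_per_tree) for f, d in totals.items()}
```

On a toy node with y = [1, 1, 5, 5], a feature that separates the two groups removes all 16 units of RSS, while a feature that mixes them removes none.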

2.7. Models for Estimating the AGB and AGC Stock

The AGB estimates produced by the models represent individual tree biomass, calculated using tree-level input variables such as Dbh and Ht, which ranged from 0.6 to 110.0 cm and 1.0 to 88.1 m, respectively. These measurements capture the structural variation among trees, which significantly influences biomass accumulation. BA and AGB are initially computed at the tree level, with AGB values ranging from 0.004 to 27.244 Mg per tree. Plot-level characteristics, such as SD, ranging from 14.1 to 175.2 stems/ha, are incorporated to reflect the competitive environment that affects individual tree growth [21]. Although the modeling framework operates at the individual tree level, the outputs can be aggregated to stand-level estimates (e.g., Mg C/ha) to support forest management, carbon accounting, and ecological assessments.
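The aggregation to stand level mentioned above can be sketched as a sum of tree-level carbon, with each tree scaled by the inverse of the area of the (sub)plot on which it was measured. The per-tree expansion factors (needed when trees of different sizes are recorded on subplots of different area), the function names, and the example values are assumptions for illustration. The carbon fraction of 0.5 follows the note to Table 2.

```python
def tree_agc(agb_mg, carbon_fraction=0.5):
    """Tree carbon from tree biomass: AGC = 0.5 x AGB (Table 2 note)."""
    return carbon_fraction * agb_mg


def stand_carbon_mg_per_ha(trees):
    """Stand-level aboveground carbon in Mg C/ha.

    `trees` holds (agb_mg, plot_area_ha) pairs; plot_area_ha is the area of
    the (sub)plot on which that tree was recorded, so 1 / plot_area_ha is the
    tree's per-hectare expansion factor.
    """
    return sum(tree_agc(agb) / area_ha for agb, area_ha in trees)


# Hypothetical plot: two trees on a 0.05 ha subplot, one on a 0.01 ha subplot.
example_trees = [(1.0, 0.05), (0.4, 0.05), (0.2, 0.01)]
carbon_per_ha = stand_carbon_mg_per_ha(example_trees)  # 10 + 4 + 10 = 24 Mg C/ha
```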

2.8. Model Architecture and Stacking Approach

To estimate AGB, a hybrid ensemble learning framework was employed that integrates ANNs and RF using a stacked generalization approach. The workflow consisted of three base learners (two ANNs and one RF), followed by a meta-learner trained on their predictions.

2.8.1. ANNs (ANN1 and ANN2)

The ANN models were implemented using the nnet method from the caret package in R, optimized for regression by setting linout = TRUE. Each network comprises an input layer (corresponding to the predictor variables), a single hidden layer, and a linear output node. The general structure of the ANN function is given in the following equation.

$$\hat{y} = \beta_0 + \sum_{j=1}^{H} \beta_j\,\sigma\!\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \qquad (3)$$

where ŷ is the predicted value (AGB), x_i is the i-th input variable (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt), n is the number of input features, H is the number of neurons in the hidden layer, σ(z) = 1/(1 + e^−z) is the sigmoid activation function applied at the hidden layer, w_ij and b_j are the weights and biases from the input to the hidden layer, and β_j and β₀ are the weights and bias from the hidden layer to the output layer.
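The forward pass of the network described above takes only a few lines. This sketch (names ours) evaluates the single-hidden-layer formula for given weights. It does not train the network, and because the fitted nnet weights are not reported here, any values passed in are placeholders. The sigmoid is written in a numerically stable form so that large negative inputs do not overflow.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ann_predict(x, w, b, beta, beta0):
    """Single hidden layer with sigmoid units and a linear output node.

    x: n inputs; w: H lists of n weights, w[j][i] = w_ij (input i -> hidden j);
    b: H hidden biases; beta: H hidden-to-output weights; beta0: output bias.
    """
    hidden = [
        sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j)
        for w_j, b_j in zip(w, b)
    ]
    return beta0 + sum(beta_j * h_j for beta_j, h_j in zip(beta, hidden))
```

With all input weights and biases set to zero, every hidden unit outputs 0.5, so two hidden units with output weights of 1 give a prediction of 1.0 regardless of the inputs.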

Hyperparameters such as the number of hidden nodes (size) and weight decay (decay) were tuned using 10-fold cross-validation. ANN1 used default tuning parameters, while ANN2 was further optimized using tuneLength = 5.

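The ANN1 network (default parameters), represented by equation (4), has a hidden layer of 5 neurons, decay = 0.1, and tuneLength = 3 (the default).

$$\hat{y}_{\text{ANN1}} = \beta_0 + \sum_{j=1}^{5} \beta_j\,\sigma\!\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \qquad (4)$$

The ANN1 prediction ŷ_ANN1 is used in stacking but contributes least, with a meta-learner coefficient of β₁ = 0.0146.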

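The ANN2 network (optimized), represented by equation (5), has a hidden layer of 9 neurons, decay = 0.0001, and tuneLength = 5.

$$\hat{y}_{\text{ANN2}} = \beta_0 + \sum_{j=1}^{9} \beta_j\,\sigma\!\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right) \qquad (5)$$

The ANN2 prediction ŷ_ANN2 contributes substantially to the stacked model, with a meta-learner coefficient of β₂ = 0.1477.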

2.8.2. RF

The RF model was trained using the randomForest package with 400 trees (ntree = 400) and seven randomly selected variables at each node split (mtry = 7). The RF prediction, which averages the predictions of all decision trees, is given in the following equation.

$$\hat{y}_{\text{RF}} = \frac{1}{T}\sum_{t=1}^{T} h_t(\mathbf{x}) \qquad (6)$$

where ŷ_RF is the RF prediction, T = 400 is the total number of decision trees in the ensemble, h_t(x) is the prediction of the t-th decision tree for input features x, and x is the input feature vector (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt).

In the stacked model, ŷ_RF is the most influential base learner (β₃ = 0.8763).
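To make the averaging in the RF prediction concrete, here is a deliberately simplified random forest in plain Python. Each of the T trees is a depth-one regression tree (a stump) grown on a bootstrap sample and choosing its split among mtry randomly drawn features, and the forest prediction is the mean of the tree predictions. Real randomForest trees are grown much deeper; the stump simplification, the toy data, and all function names are ours.

```python
import random


def _rss(values):
    """Residual sum of squares around the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(X, y, features):
    """Depth-one regression tree: best single split among `features` by RSS."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:  # the max would leave the right side empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = _rss(left) + _rss(right)
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split: predict the node mean everywhere
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature, threshold, left mean, right mean)


def fit_forest(X, y, n_trees=400, mtry=7, seed=0):
    """Bagged stumps; a stump has one node, so one feature draw per tree."""
    rng = random.Random(seed)
    p = len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(p), min(mtry, p))  # mtry cannot exceed p
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Forest prediction: the average of the T tree predictions h_t(x)."""
    preds = [left if x[f] <= t else right for f, t, left, right in forest]
    return sum(preds) / len(preds)


# Toy data: the target jumps from 0 to 10 once the first feature passes 9.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [0.0] * 10 + [10.0] * 10
forest = fit_forest(X, y, n_trees=50, mtry=2, seed=1)
```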

2.9. Stacked Ensemble Meta-Learner (ANN-RF)

The predictions from ANN1, ANN2, and RF on the training set were used as input features for a meta-model built using linear regression. This stacking technique (meta-learner) combines the outputs of the ANN and RF models, resulting in the ANN-RF model shown in Figure 5, which draws on the strengths of each base learner to produce final predictions. The meta-model is defined in the following equation.

$$\hat{y}_{\text{ANN-RF}} = \beta_0 + \beta_1\,\hat{y}_{\text{ANN1}} + \beta_2\,\hat{y}_{\text{ANN2}} + \beta_3\,\hat{y}_{\text{RF}} \qquad (7)$$

where ŷ_ANN-RF is the final prediction of AGB from the stacked model; ŷ_ANN1, ŷ_ANN2, and ŷ_RF are the predictions from base learners ANN1, ANN2, and RF, respectively; β₀ is the intercept of the meta-learner (linear regression), equal to 0.0006158; and β₁, β₂, and β₃ are the regression coefficients learned during meta-model training, equal to 0.0145688, 0.1476808, and 0.8762685, respectively. The observed AGB differs from this prediction by a residual error term ε.
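The meta-learner is ordinary least squares on the base-learner predictions. The sketch below (plain Python, names ours) estimates the weights via the normal equations and applies the coefficients reported above. It takes the base-learner predictions as given; how those predictions are generated is outside the sketch.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_meta_learner(base_preds, y):
    """OLS weights [beta0, beta1, ...] from rows of base-learner predictions."""
    X = [[1.0] + list(row) for row in base_preds]  # prepend intercept column
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def meta_predict(beta, row):
    """Stacked prediction: beta0 + sum of beta_k * base prediction_k."""
    return beta[0] + sum(b * p for b, p in zip(beta[1:], row))


# Coefficients reported for the ANN-RF meta-learner: intercept, ANN1, ANN2, RF.
PUBLISHED_BETA = [0.0006158, 0.0145688, 0.1476808, 0.8762685]
```

For example, if all three base learners predict 1.0 Mg/tree, the reported coefficients give a stacked prediction of about 1.039 Mg/tree.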

2.10. Data Partitioning and Model Evaluation

The dataset was split into 80% training and 20% testing subsets using stratified sampling via createDataPartition to preserve the distribution of the target variable (AGB). All models were trained and evaluated using 10-fold cross-validation to enhance generalizability. Model performance was assessed on the test set using root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The evaluation metrics were computed as shown in the following equations.

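$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (8)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert \qquad (9)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (10)$$

In these formulas, n denotes the total number of data points, and y_i, ŷ_i, and ȳ denote the measured values, the predicted values, and the mean of the measured values, respectively.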
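The three metrics, and the idea behind a target-stratified split, can be sketched in plain Python. For a numeric outcome, caret's createDataPartition groups the values by percentiles (five groups by default) and samples within each group. The stratified_split function below imitates that idea with equal-count quantile groups and is not caret's exact algorithm. All function names are ours.

```python
import math
import random


def rmse(y, yhat):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def r_squared(y, yhat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def stratified_split(y, train_frac=0.8, n_groups=5, seed=0):
    """Training indices drawn separately within quantile groups of a numeric target."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices by increasing target
    train = []
    for g in range(n_groups):
        group = order[g * len(y) // n_groups:(g + 1) * len(y) // n_groups]
        train.extend(rng.sample(group, round(train_frac * len(group))))
    return sorted(train)
```

Sampling within quantile groups keeps the large trees, which dominate the AGB error, represented in both the training and test subsets.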

2.11. Hyperparameter Tuning Process

To ensure optimal model performance, hyperparameter tuning was conducted for all base learners (ANN1, ANN2, and RF) as well as for the final stacked ensemble model.

ANNs (ANN1 and ANN2) consisted of a single hidden layer with sigmoid activation and a linear output node. ANN1 was configured with default hyperparameters: a hidden layer of five neurons and a weight decay of 0.1, selected by a basic grid search with tuneLength = 3 under 10-fold cross-validation. ANN2 underwent extended tuning with tuneLength = 5, which selected a larger hidden layer of nine neurons and a weaker weight decay of 0.0001. Because a lower decay means less regularization, this configuration is more flexible than that of ANN1. It was retained because it performed better under cross-validation.
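To show what tuneLength means for the nnet method, the sketch below rebuilds the candidate grid as we understand caret constructs it: odd hidden-layer sizes, and a weight-decay sequence of 0 followed by log-spaced values from 0.1 down to 0.0001. This reflects our reading of caret's model definition, so it should be checked against the installed caret version. Under that reading, tuneLength = 3 yields sizes {1, 3, 5} with decays {0, 0.1, 0.0001}, and tuneLength = 5 yields sizes up to 9 with decays down to 0.0001. Both grids contain the configurations reported for ANN1 and ANN2.

```python
def caret_nnet_grid(tune_length: int):
    """Candidate (size, decay) pairs for a given tuneLength (our reading of caret).

    size  = 1, 3, 5, ... (tune_length odd values)
    decay = 0 plus (tune_length - 1) values log-spaced from 1e-1 to 1e-4
    """
    sizes = [2 * k - 1 for k in range(1, tune_length + 1)]
    m = tune_length - 1
    if m == 1:
        exponents = [-1.0]  # a length-1 sequence keeps only the start value
    else:
        exponents = [-1.0 - 3.0 * k / (m - 1) for k in range(m)]
    decays = [0.0] + [10.0 ** e for e in exponents]
    return [(s, d) for d in decays for s in sizes]
```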

The RF model was trained using the randomForest package in R. Key hyperparameters were selected based on prior experimentation and validation. The number of trees was set to 400 (ntree = 400) to ensure model stability and reduce variance. The number of variables randomly selected at each node split was set to seven (mtry = 7), that is, seven of the eight predictors. This value is larger than the common heuristics (√p ≈ 2.8 for p = 8 predictors, or ⌊p/3⌋ = 2, the randomForest default for regression), so it reflects the experimentation described above rather than a rule of thumb.

The predictions from ANN1, ANN2, and RF on the training data were used as input features for a meta-learner based on linear regression. As the meta-model (ANN-RF) does not involve complex tuning parameters, no additional hyperparameter optimization was required. The stacking approach was used to leverage the complementary strengths of the base learners, with model weights estimated through ordinary least squares (OLS) regression.

3. Results

This study employed advanced machine learning techniques, including two ANNs (ANN1 and ANN2), a RF model, and an integrated ANN-RF hybrid model to capture the complex interactions among tree-specific, topographical, and environmental variables. Model performance was evaluated using R², RMSE, and MAE. As summarized in Table 3, the machine learning models developed in this study significantly outperformed traditional regression-based allometric models applied to the same dataset.

Table 3. Models’ comparison.

S/n	Author	Biomass model	R²	RMSE (Mg/tree)	MAE (Mg/tree)
1	This study (Group ‘a’-ANN1)	AGB∼Dbh + Ht + BA + SD + Elv + Slp + SpH + Prt	0.925	0.262	0.099
2	This study (Group ‘a’-ANN2)	AGB∼Dbh + Ht + BA + SD + Elv + Slp + SpH + Prt	0.954	0.207	0.108
3	This study (Group ‘a’-RF)	AGB∼Dbh + Ht + BA + SD + Elv + Slp + SpH + Prt	0.973	0.168	0.045
4	This study (Group ‘a’-ANN-RF)	AGB∼Dbh + Ht + BA + SD + Elv + Slp + SpH + Prt	0.975	0.153	0.053
5	This study (Group ‘b’-ANN1)	AGB∼Dbh + Ht + BA	0.970	0.167	0.078
6	This study (Group ‘b’-ANN2)	AGB∼Dbh + Ht + BA	0.960	0.195	0.075
7	This study (Group ‘b’-RF)	AGB∼Dbh + Ht + BA	0.962	0.186	0.045
8	This study (Group ‘b’-ANN-RF)	AGB∼Dbh + Ht + BA	0.966	0.178	0.050
9	This study (Group ‘c’-ANN1)	AGB∼Dbh + Ht	0.954	0.207	0.109
10	This study (Group ‘c’-ANN2)	AGB∼Dbh + Ht	0.974	0.157	0.063
11	This study (Group ‘c’-RF)	AGB∼Dbh + Ht	0.965	0.183	0.055
12	This study (Group ‘c’-ANN-RF)	AGB∼Dbh + Ht	0.958	0.196	0.066
13	This study (Group ‘d’-ANN1)	AGB∼Dbh	0.447	0.721	0.307
14	This study (Group ‘d’-ANN2)	AGB∼Dbh	0.447	0.721	0.307
15	This study (Group ‘d’-RF)	AGB∼Dbh	0.400	0.777	0.319
16	This study (Group ‘d’-ANN-RF)	AGB∼Dbh	0.361	0.851	0.341
17	Abbot et al. [22]	log₁₀AGB = −3.85 + 2.49 log₁₀Dbh	0.206	1.343	1.106
18	Brown [23]	AGB = 0.1359Dbh^2.2320	0.348	1.052	0.996
19	Chamshama et al. [24]	AGB = 0.0625Dbh^2.553	0.323	1.049	0.985
20	Malimbwi et al. [25]	AGB = 0.0001Dbh^2.032Ht^0.66	0.338	1.047	0.978
21	Malimbwi and Temu [26]	AGB = 0.092Dbh^2.59	0.320	1.049	0.984
22	Mugasha et al. [27]	AGB = 0.1027Dbh^2.4798	0.329	1.050	0.988
23	Mugasha et al. [13]	AGB = 0.0763Dbh^2.2046Ht^0.4918	0.342	1.047	0.979
24	Mwakalukwa et al. [29]	ln(AGB) = −2.6896 + 1.9041 ln(Dbh) + 0.9377 ln(Ht)	0.153	1.357	1.110
25	Temu [28]	log₁₀AGB = −1.2875 + 2.8436 log₁₀Dbh	0.206	1.343	1.106

Note: “This study” refers to models developed in this study, grouped as follows: Group ‘a’ (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt), Group ‘b’ (Dbh, Ht, and BA), Group ‘c’ (Dbh and Ht), and Group ‘d’ (Dbh only). For example, “This study (Group ‘a’-ANN1)” indicates the specific machine learning model (ANN1) using Group ‘a’ predictors.

3.1. Model Performance Evaluation

Visual assessments of model predictions (Figure 6) provided initial evidence of strong model accuracy. The scatter plot (Figure 6(a)) of observed versus predicted AGB values demonstrated that most data points clustered tightly around the 1:1 reference line, indicating high predictive precision. Additionally, the residuals versus predicted AGB plot (Figure 6(b)) confirmed that model errors were centered near zero and randomly distributed, with no discernible patterns or signs of systematic bias.

Further analysis of the residuals, through both histogram and normal Q-Q plots, supported these findings. The residual histogram (Figure 6(c)) was approximately symmetric, normal in shape, and centered at zero, while the Q-Q plot (Figure 6(d)) showed acceptable adherence to normality, with only minor deviations at the tails. Together, these diagnostics support the reliability of the machine learning models and indicate no systematic bias.

The linear-regression meta-learner was used to evaluate the individual contributions of ANN1, ANN2, and RF to AGB prediction within the hybrid ensemble. It exhibited an excellent fit, with a multiple R² of 0.987 and an adjusted R² of 0.987, indicating that 98.7% of the variance in AGB is explained by the combined predictions. The model was statistically significant (F-statistic = 31,990, p < 2.2e-16) and had a low residual standard error (0.1165), reflecting high prediction accuracy.

Individually, the RF model made the most substantial and highly significant contribution (β = 0.8763, p < 0.001), followed by ANN2 (β = 0.1477, p < 0.001), which also provided a meaningful complementary effect. Conversely, ANN1’s contribution was statistically insignificant (β = 0.0146, p = 0.361), suggesting a minimal added predictive value. The intercept was also nonsignificant (p = 0.849), as shown in Table 4. These results underscore that the RF model is the primary driver of predictive accuracy in the ensemble, with ANN2 offering additional support, while ANN1 contributes negligibly.

Table 4. Coefficients and statistics of the hybrid ANN-RF (meta-learner/linear regression) model for AGB prediction.

Coefficient	Estimate	Std. error	t value	p value	Significance
(Intercept)	0.0006	0.0032	0.19	0.849	
ANN1	0.0146	0.016	0.913	0.361	
ANN2	0.1477	0.0186	7.922	5.01e-15 (< 0.001)	***
RF	0.8763	0.0188	46.723	< 2e-16 (< 0.001)	***

Note: Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1.
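The t values in Table 4 are the estimates divided by their standard errors. With about 1,300 training trees (80% of 1619), the residual degrees of freedom are large enough for the standard normal to approximate the t distribution. The sketch below (names ours) uses that approximation to recover the reported two-sided p values from the rounded estimates, to within rounding error.

```python
import math


def wald_t_and_p(estimate: float, std_error: float):
    """Return t = estimate / SE and an approximate two-sided p value.

    Uses the standard normal in place of the t distribution, which is
    reasonable only when the residual degrees of freedom are large:
    p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    """
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p
```

For ANN1 (0.0146 / 0.016), this gives t ≈ 0.91 and p ≈ 0.36, matching the non-significant contribution reported in Table 4.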

3.2. Comparative Model Performance for AGB Prediction

This study evaluated the performance of AGB prediction models across four configurations, each employing varying sets of predictor variables. The models assessed include two ANNs (ANN1 and ANN2), RF, and a hybrid ANN-RF model.

In Model ‘a’, which incorporated a comprehensive set of predictors (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt), the hybrid ANN-RF model achieved the highest R² (0.975) and the lowest RMSE (0.153 Mg/tree), with an MAE of 0.053 Mg/tree. The RF model was nearly as accurate and had the lowest MAE of the four (R² = 0.973, RMSE = 0.168 Mg/tree, and MAE = 0.045 Mg/tree). While ANN1 (R² = 0.925) and ANN2 (R² = 0.954) provided reasonable results, their errors were comparatively higher (ANN1: RMSE = 0.262 Mg/tree and MAE = 0.099 Mg/tree; ANN2: RMSE = 0.207 Mg/tree and MAE = 0.108 Mg/tree). Residual and scatter plots (Figure 6) further confirmed the close fit and small prediction errors of the ANN-RF and RF models in this configuration.

In Model ‘b’, which was restricted to the most influential variables (Dbh, Ht, and BA), ANN1 achieved the highest R² (0.970) and the lowest RMSE (0.167 Mg/tree), with an MAE of 0.078 Mg/tree. The ANN-RF (R² = 0.966, RMSE = 0.178 Mg/tree, and MAE = 0.050 Mg/tree) and RF (R² = 0.962, RMSE = 0.186 Mg/tree, and MAE = 0.045 Mg/tree) models also maintained robust accuracy in this configuration. Notably, despite the reduced number of predictors, the performance of these models remained comparable to that obtained with the full variable set, indicating the high predictive power of Dbh, Ht, and BA.

In Model ‘c’, which used Dbh and Ht as predictors, ANN2 achieved the highest R² (0.974) and the lowest RMSE (0.157 Mg/tree), with an MAE of 0.063 Mg/tree. The RF model also performed well (R² = 0.965, RMSE = 0.183 Mg/tree, and MAE = 0.055 Mg/tree). ANN1 and ANN-RF also yielded good results (ANN1: R² = 0.954, RMSE = 0.207 Mg/tree, and MAE = 0.109 Mg/tree; ANN-RF: R² = 0.958, RMSE = 0.196 Mg/tree, and MAE = 0.066 Mg/tree).

In contrast, Model ‘d’, which relied exclusively on Dbh as the sole predictor, performed markedly worse across all models. Both ANN1 and ANN2 recorded a low R² value of 0.447, with much higher errors (RMSE = 0.721 Mg/tree and MAE = 0.307 Mg/tree). The RF (R² = 0.400) and ANN-RF (R² = 0.361) models also showed substantial drops in accuracy, further emphasizing that Dbh alone is insufficient for reliable AGB estimation.

Overall, the results underscore the critical importance of incorporating multiple predictor variables to achieve high-accuracy AGB predictions. The ANN-RF hybrid model performed strongly whenever multiple predictors were available, and best with the comprehensive set, while the results also highlight Dbh, Ht, and BA as key influential variables (see Figure 4).

3.3. Comparison With Traditional Allometric Models

This study’s machine learning models consistently and significantly outperformed traditional allometric equations in predicting AGB, even when these traditional equations were tested using the identical dataset (n = 1619) employed to train the machine learning models. Crucially, this dataset comprises samples from both wet and dry Miombo Woodlands, a diverse ecosystem where traditional allometric equations are often developed for specific subtypes (wet or dry Miombo) or even for individual species, possibly limiting their applicability across the full spectrum. The superior performance of this study’s models highlights the significant benefits of their adaptability to complex ecological variations and their ability to leverage a wider range of predictor variables.

Traditional allometric models, such as those by Abbot et al. [22], Brown [23], Chamshama et al. [24], Malimbwi et al. [25], Malimbwi and Temu [26], Mugasha et al. [13, 27], Mwakalukwa et al. [12], and Temu [28], consistently exhibited much lower predictive accuracy on the common dataset, with R² values ranging from 0.153 to 0.348 and substantially higher errors (see Table 3).

3.4. Key Findings

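Models with multiple predictor variables (Groups ‘a’ to ‘c’) performed far better in terms of R², RMSE, and MAE than models using Dbh alone (Group ‘d’), and the full predictor set (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt) yielded the most accurate model overall. This highlights the importance of considering multiple factors in biomass estimation. With the full predictor set, the ANN-RF hybrid model achieved the highest R² and lowest RMSE of all models, indicating that combining the strengths of different modeling approaches can improve predictive performance; with reduced predictor sets, individual ANNs were marginally more accurate (Table 3).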

Traditional models from previous studies generally perform poorly compared to the models developed in this study, suggesting that newer machine learning approaches (ANN and RF) provide significant improvements in biomass estimation accuracy and precision. The results demonstrate that advanced statistical models, particularly the integrated ANN-RF approach, significantly enhance the accuracy of AGB and carbon stock estimates in Miombo Woodlands. These findings support the adoption of advanced modeling techniques in biomass estimation efforts, which can inform more effective forest management and climate change mitigation strategies.

4. Discussion

The development and evaluation of a hybrid machine learning model (ANN-RF) for estimating AGB and carbon stock in Tanzania’s Miombo Woodlands revealed significant advancements over traditional allometric models, as evidenced by superior performance metrics across various predictor configurations. This discussion critically examines the performance of the ANN-RF model, individual machine learning models (ANN1, ANN2, and RF), and traditional allometric models, contextualizing their effectiveness in terms of predictive accuracy, dataset characteristics, predictor variables, and model complexity.

4.1. Model Performance Comparison

The ANN-RF hybrid model, particularly when using the full set of predictors (Group ‘a’: Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt), achieved the highest predictive accuracy, with an R² value of 0.975, RMSE value of 0.153 Mg/tree, and MAE value of 0.053 Mg/tree. It exceeded the individual machine learning models in R² and RMSE (ANN1: R² = 0.925, RMSE = 0.262 Mg/tree, and MAE = 0.099 Mg/tree; ANN2: R² = 0.954, RMSE = 0.207 Mg/tree, and MAE = 0.108 Mg/tree; RF: R² = 0.973, RMSE = 0.168 Mg/tree, and MAE = 0.045 Mg/tree), although its margin over RF was small and RF achieved a slightly lower MAE. It markedly surpassed the traditional allometric models, which reported R² values ranging from 0.153 to 0.348 and RMSE values from 1.047 to 1.357 Mg/tree (Table 3). The superior performance of the ANN-RF model is attributed to its ability to integrate the complementary strengths of ANN and RF through a stacking approach, capturing complex, nonlinear relationships among predictors that traditional allometric models fail to address [30–32].

When predictor sets were reduced, the ANN-RF model maintained robust performance. In Group ‘b’ (Dbh, Ht, and BA), the ANN-RF model achieved an R² value of 0.966, RMSE value of 0.178 Mg/tree, and MAE value of 0.050 Mg/tree, close to the most accurate model in that group, ANN1 (R² = 0.970, RMSE = 0.167 Mg/tree, and MAE = 0.078 Mg/tree). In Group ‘c’ (Dbh and Ht), ANN2 outperformed the others with an R² value of 0.974, RMSE value of 0.157 Mg/tree, and MAE value of 0.063 Mg/tree. However, Group ‘d’ models, relying solely on Dbh, performed markedly worse (R² = 0.361–0.447, RMSE = 0.721–0.851 Mg/tree, and MAE = 0.307–0.341 Mg/tree), underscoring the critical importance of incorporating multiple predictors to capture the structural and environmental variability of Miombo Woodlands [33, 34].

In contrast, traditional allometric models, such as those by Mugasha et al. [13, 27], Mwakalukwa et al. [12], and Malimbwi et al. [35], exhibited lower predictive accuracy, with R² values ranging from 0.153 to 0.348 and RMSE values from 1.047 to 1.357 Mg/tree. Notably, the literature-reported R² values for some allometric models (e.g., Mwakalukwa et al. [29], 96%–99%, and Mugasha et al. [27], 95%–97%) were derived from species-specific or region-specific datasets, which may not generalize well across the diverse wet and dry Miombo Woodlands sampled in this study (n = 1619). The limited adaptability of allometric models to ecological heterogeneity likely explains their inferior performance compared with the machine learning models developed here [10].

4.2. Influence of Sample Size and Data Characteristics

The large sample size (n = 1619) used in this study, drawn from the NAFORMA database across six diverse Miombo Woodland sites, likely strengthened the robustness of the machine learning models. By comparison, the allometric models were developed from smaller datasets (e.g., Mugasha et al. [27]: n = 167; Mwakalukwa et al. [12]: n = 142 trees and 57 shrubs; Malimbwi et al. [25]: n = 17–191). The larger dataset allowed the machine learning models to capture more of the variability in tree structure, species composition, and environmental conditions, thereby improving generalizability [16]. The stratified sampling approach, aligned with NAFORMA protocols, further ensured representative coverage of both wet and dry Miombo Woodlands, broadening the models’ applicability across Tanzania’s diverse forest ecosystems.

4.3. Role of Predictor Variables

The comprehensive predictor set of Group ‘a’ (Dbh, Ht, BA, SD, Elv, Slp, SpH, and Prt) was central to the ANN-RF model’s performance. Feature importance analysis (Figure 4) identified Dbh, Ht, and BA as the most influential predictors, consistent with ecological modeling literature that emphasizes tree structural attributes as the primary drivers of AGB [20, 33]. Environmental variables (Elv, Slp, SpH, and Prt) further refined the predictions by accounting for topographic and climatic influences on biomass accumulation [8]. Allometric models, in contrast, typically rely on fewer predictors (usually Dbh and Ht, and occasionally wood density), which makes them parsimonious but limits their ability to capture complex ecological interactions [36]. The poor performance of the Group ‘d’ models, which used only Dbh, highlights the inadequacy of single-predictor models in ecosystems as heterogeneous as the Miombo Woodlands.
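
The method behind Figure 4 is not restated in this section. As a hedged illustration, the sketch below contrasts two common random forest importance measures available in scikit-learn: impurity-based importance (feature_importances_, the mean decrease in impurity that reference [18] proposes to debias) and permutation importance on held-out data. The data and the column named Noise are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

names = ["Dbh", "Ht", "Noise"]   # "Noise" is a hypothetical uninformative column

# Hypothetical toy data (Dbh in cm, Ht in m, AGB in Mg/tree), not the study data.
rng = np.random.default_rng(2)
n = 500
dbh = rng.uniform(5.0, 60.0, n)
ht = 1.3 + 0.6 * dbh ** 0.8 * rng.uniform(0.5, 1.5, n)
noise = rng.normal(size=n)
X = np.column_stack([dbh, ht, noise])
y = 1e-4 * dbh ** 2 * ht * rng.lognormal(0.0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based (MDI) importance, computed from the training data and normalized to sum to 1.
mdi = dict(zip(names, rf.feature_importances_))
# Permutation importance: mean drop in held-out R² when one column is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
perm_mean = dict(zip(names, perm.importances_mean))

print(mdi)
print(perm_mean)
```

Because impurity-based scores are computed on the training data, they can assign some weight to uninformative continuous columns; permutation importance on a held-out set is a common cross-check.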

4.4. Model Complexity and Practical Implications

Machine learning models, particularly the ANN-RF hybrid, are complex and data-intensive, requiring large datasets and computational resources for training and tuning. The ANN-RF model’s stacking approach combines the predictions of ANN1, ANN2, and RF through a linear regression meta-learner, leveraging the strengths of both neural networks (nonlinear pattern recognition) and RF (robustness to overfitting) [30–32]. However, this complexity may limit its immediate use in field settings with limited computational infrastructure. Allometric models, being regression-based and requiring fewer predictors, are simpler and easier to apply in the field, which makes them practical for rapid biomass assessments [33]. This trade-off between accuracy and practicality suggests that machine learning models are best suited to large-scale, data-rich applications, such as national carbon inventories, whereas allometric models remain valuable in localized, resource-constrained settings.
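
The stacking design described above can be sketched with scikit-learn's StackingRegressor. This is a minimal sketch, not the authors' implementation: the layer sizes, tree count, 5-fold scheme, helper name, and toy data are assumptions, and StackingRegressor trains the linear meta-learner on cross-validated (out-of-fold) base-model predictions, which the paper may or may not have done.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_ann_rf_stack(seed=0):
    """Two MLPs and a random forest combined by a linear-regression meta-learner.

    Layer sizes and tree counts are hypothetical, not the authors' tuned values.
    """
    ann1 = make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed))
    ann2 = make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=seed))
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    return StackingRegressor(
        estimators=[("ann1", ann1), ("ann2", ann2), ("rf", rf)],
        final_estimator=LinearRegression(),  # meta-learner on base-model predictions
        cv=5,                                # meta-learner is fitted on out-of-fold predictions
    )


# Hypothetical toy data (Dbh in cm, Ht in m, AGB in Mg/tree), not the study data.
rng = np.random.default_rng(3)
n = 400
dbh = rng.uniform(5.0, 60.0, n)
ht = 1.3 + 0.6 * dbh ** 0.8 * rng.uniform(0.5, 1.5, n)
X = np.column_stack([dbh, ht])
y = 1e-4 * dbh ** 2 * ht * rng.lognormal(0.0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
stack = build_ann_rf_stack().fit(X_train, y_train)
test_r2 = r2_score(y_test, stack.predict(X_test))
weights = dict(zip(["ann1", "ann2", "rf"], stack.final_estimator_.coef_))
print(f"held-out R² = {test_r2:.3f}")
print("meta-learner weights:", weights)
```

The fitted meta-learner coefficients show how much weight each base model receives; inspecting them is one simple way to check whether a stack does more than reproduce its strongest member.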

4.5. Limitations and Future Directions

While the ANN-RF model demonstrated high accuracy, its reliance on a large training dataset may limit its applicability in data-sparse regions. Transfer learning or data augmentation techniques could be explored to improve model performance in such environments. Incorporating additional predictors, such as remote-sensing data or species-specific traits, could further improve accuracy and generalizability [37]. Finally, validating the ANN-RF model in other tropical forest ecosystems would strengthen its applicability to global carbon accounting and REDD+ initiatives.

5. Conclusion

The ANN-RF hybrid model substantially outperformed traditional allometric models in estimating AGB and carbon stock in Tanzania’s Miombo Woodlands and, with the full predictor set, achieved a higher R² and a lower RMSE than the individual machine learning models. Its ability to capture complex ecological relationships makes it a powerful tool for improving carbon stock assessments and supporting sustainable forest management. However, the practical advantages of allometric models in field applications highlight the need for context-specific model selection. These findings underscore the potential of advanced machine learning techniques to substantially improve biomass estimation in complex tropical ecosystems, with broader implications for global climate change mitigation strategies.

Ethics Statement

The authors have nothing to report.

Consent

The authors have nothing to report.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

No funding was received for this research.

Acknowledgments

The authors would like to express their gratitude to the NCMC for allowing them to use the tree measurement dataset from the NAFORMA database.

Open Research

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1 FAO, Global Forest Resources Assessment 2020–Key Findings, 2020, Food and Agriculture Organization of the United Nations.
2 IPCC, 2019 Refinement to the 2006 IPCC Guidelines for National Greenhouse Gas Inventories, 2019, Intergovernmental Panel on Climate Change (IPCC).
3 Nunes L. J. R., Meireles C. I. R., Pinto Gomes C. J., and Almeida Ribeiro N. M. C., Forest Contribution to Climate Change Mitigation: Management Oriented to Carbon Capture and Storage, Climate. (2020) 8, no. 2, https://doi.org/10.3390/cli8020021.
4 Brack D., Forests and Climate Change, Proceedings of Background Study Prepared for the Fourteenth Session of the United Nations Forum on Forests, March 2019, United Nations Forum on Forests.
5 Matowo G., Sangeda A., and Katani J., The Regeneration Dynamics of Miombo Tree Species in Sub-Saharan Africa, African Journal of Ecology and Ecosystems. (2019) 6, no. 5, 1–016.
6 Shamaoma H., Chirwa P. W., Ramoelo A., Hudak A. T., and Syampungani S., The Application of UASs in Forest Management and Monitoring: Challenges and Opportunities for Use in the Miombo Woodland, Forests. (2022) 13, no. 11, https://doi.org/10.3390/f13111812.
7 Ribeiro N. S., Grundy I. M., Gonçalves F. M. et al., People in the Miombo Woodlands: Socio-Ecological Dynamics, Miombo Woodlands in a Changing Environment: Securing the Resilience and Sustainability of People and Woodlands. (2020) 55–100, https://doi.org/10.1007/978-3-030-50104-4_3.
8 Manyanda B. J., Nzunda E. F., Mugasha W. A., and Malimbwi R. E., Estimates of Volume and Carbon Stock Removals in Miombo Woodlands of Mainland Tanzania, International Journal of Forestry Research. (2020) 2020, 1–10, https://doi.org/10.1155/2020/4043965.
9 Macave O. A., Ribeiro N. S., Ribeiro A. I. et al., Modelling Aboveground Biomass of Miombo Woodlands in Niassa Special Reserve, Northern Mozambique, Forests. (2022) 13, no. 2, https://doi.org/10.3390/f13020311.
10 Shirima D. D., Forests and Woodlands of Tanzania: Interactions Between Woody Plant Structure, Diversity, Carbon Stocks and Soil Nutrient Heterogeneity, 2019.
11 Tomppo E., Malimbwi R., Katila M. et al., A Sampling Design for a Large Area Forest Inventory: Case Tanzania, Canadian Journal of Forest Research. (2014) 44, no. 8, 931–948, https://doi.org/10.1139/cjfr-2013-0490.
12 Mwakalukwa E. E., Meilby H., and Treue T., Floristic Composition, Structure, and Species Associations of Dry Miombo Woodland in Tanzania, ISRN Biodiversity. (2014) 2014, 1–15, https://doi.org/10.1155/2014/153278.
13 Mugasha W. A., Eid T., Bollandsås O. M. et al., Allometric Models for Prediction of Above- and Belowground Biomass of Trees in the Miombo Woodlands of Tanzania, Forest Ecology and Management. (2013) 310, 87–101, https://doi.org/10.1016/j.foreco.2013.08.003.
14 Fick S. E. and Hijmans R. J., WorldClim 2: New 1-km Spatial Resolution Climate Surfaces for Global Land Areas, International Journal of Climatology. (2017) 37, no. 12, 4302–4315, https://doi.org/10.1002/joc.5086.
15 Mugasha W., Mauya E., Njana A., Karlsson K., Malimbwi R., and Ernest S., Height-Diameter Allometry for Tree Species in Tanzania Mainland, International Journal of Forestry Research. (2019) 2019, 1–17, https://doi.org/10.1155/2019/4832849.
16 Rudin C., Chen C., Chen Z., Huang H., Semenova L., and Zhong C., Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges, Statistics Surveys. (2022) 16, 1–85, https://doi.org/10.1214/21-ss133.
17 Bulut S., Machine Learning Prediction of Above-Ground Biomass in Pure Calabrian Pine (Pinus brutia Ten.) Stands of the Mediterranean Region, Türkiye, Ecological Informatics. (2023) 74, https://doi.org/10.1016/j.ecoinf.2022.101951.
18 Li X., Wang Y., Basu S., Kumbier K., and Yu B., A Debiased MDI Feature Importance Measure for Random Forests, Advances in Neural Information Processing Systems. (2019) 32.
19 Li Y., Li C., Li M., and Liu Z., Influence of Variable Selection and Forest Type on Forest Aboveground Biomass Estimation Using Machine Learning Algorithms, Forests. (2019) 10, no. 12, https://doi.org/10.3390/f10121073.
20 Moncada-Torres A., van Maaren M. C., Hendriks M. P., Siesling S., and Geleijnse G., Explainable Machine Learning Can Outperform Cox Regression Predictions and Provide Insights in Breast Cancer Survival, Scientific Reports. (2021) 11, no. 1, https://doi.org/10.1038/s41598-021-86327-7.
21 Brunner A. and Forrester D. I., Tree Species Mixture Effects on Stem Growth Vary With Stand Density–An Analysis Based on Individual Tree Responses, Forest Ecology and Management. (2020) 473, https://doi.org/10.1016/j.foreco.2020.118334.
22 Abbot P., Lowore J., and Werren M., Models for the Estimation of Single Tree Volume in Four Miombo Woodland Types, Forest Ecology and Management. (1997) 97, no. 1, 25–37, https://doi.org/10.1016/s0378-1127(97)00036-4.
23 Brown S., Estimating Biomass and Biomass Change of Tropical Forests: A Primer (FAO Forestry Paper 134), 1997, http://www.fao.org/docrep/w4095e/w4095e00.htm.
24 Chamshama S. A. O., Mugasha A. G., and Zahabu E., Stand Biomass and Volume Estimation for Miombo Woodlands at Kitulangalo, Morogoro, Tanzania, Southern African Forestry Journal. (2004) 200, no. 1, 59–70, https://doi.org/10.1080/20702620.2004.10431761.
25 Malimbwi E., Zahabu R. E., Solberg B., and Luoga E. J., Biomass and Carbon Stocks in Miombo Woodlands of Tanzania: Country Level Estimates, Journal of Forestry Research. (2017) 28, no. 4, 711–724.
26 Malimbwi R. E. and Temu A. B., Volume Functions for Pterocarpus angolensis and Julbernadia globiflora, Journal of Tanzania Association of Foresters. (1984) 1, 49–53.
27 Mugasha W. A., Eid T., Bollandsås O. M., and Malimbwi C., Allometric Models for Prediction of Aboveground Biomass of Single Trees in Miombo Woodlands in Tanzania, 2012.
28 Temu A. B., Double Sampling With Aerial Photographs in Estimating Wood Volume in Miombo Woodlands, 1981, Division of Forestry, University of Dar-Es-Salaam, Morogoro, Tanzania.
29 Mwakalukwa E. E., Meilby H., and Treue T., Volume and Aboveground Biomass Models for Dry Miombo Woodland in Tanzania, International Journal of Forestry Research. (2014) 2014, 1–11, https://doi.org/10.1155/2014/531256.
30 Bazrafkan A., Navasca H., Kim J.-H. et al., Predicting Dry Pea Maturity Using Machine Learning and Advanced Sensor Fusion With Unmanned Aerial Systems (UASs), Remote Sensing. (2023) 15, no. 11, https://doi.org/10.3390/rs15112758.
31 Breiman L., Random Forests, Machine Learning. (2001) 45, no. 1, 5–32, https://doi.org/10.1023/a:1010933404324.
32 Singh S. and Singh Y., Forest Resources (Chapter 9), Advances in Geospatial Technologies for Natural Resource Management. (2024) 213, https://doi.org/10.1201/9781003035404-9.
33 Chave J., Réjou-Méchain M., Búrquez A. et al., Improved Allometric Models to Estimate the Aboveground Biomass of Tropical Trees, Global Change Biology. (2014) 20, no. 10, 3177–3190, https://doi.org/10.1111/gcb.12629.
34 Chitayat A. B., Lewis M., Anyelwisye M. et al., Development of Spatial Models and Maps for Tree Species Diversity and Biomass in a Miombo Ecosystem, Western Tanzania, Applied Vegetation Science. (2024) 27, no. 4, https://doi.org/10.1111/avsc.70002.
35 Malimbwi R. E., Eid T., and Chamshama S. A. O., Allometric Tree Biomass and Volume Models in Tanzania, 2016, Department of Forest Mensuration and Management, Faculty of Forestry and Nature Conservation, Sokoine University of Agriculture.
36 Macave O. A., Ribeiro N. S., Ribeiro A. I. et al., Modelling Aboveground Biomass of Miombo Woodlands in Niassa Special Reserve, Northern Mozambique, Forests. (2022) 13, no. 2, https://doi.org/10.3390/f13020311.
37 Ribeiro N. S., Silva de Miranda P. L., and Timberlake J., Biogeography and Ecology of Miombo Woodlands, in Ribeiro N. S., Katerere Y., Chirwa P. W., and Grundy I. M. (Eds.), Miombo Woodlands in a Changing Environment: Securing the Resilience and Sustainability of People and Woodlands, 2020, Springer International Publishing, 9–53, https://doi.org/10.1007/978-3-030-50104-4_2.
