Volume 2024, Issue 1 5630435
Research Article
Open Access

Combination of Physics-Based Model and Artificial Intelligence for Rapid Simulation and Optimization of Dark Fermentative Hydrogen Production From Water Hyacinth

Phan Khanh Thinh Nguyen

Department of Chemical and Biological Engineering, Gachon University, Seongnam, 13120, Gyeonggi-do, Republic of Korea, gachon.ac.kr

Thi Thu Ha Tran

Corresponding Author

Faculty of Environment, Ho Chi Minh City University of Natural Resources and Environment, Tan Binh District, Ho Chi Minh City, Vietnam, hcmunre.edu.vn

Tuan Loi Nguyen

Corresponding Author

Institute of Fundamental and Applied Sciences, Duy Tan University, Ho Chi Minh City, 70000, Vietnam, duytan.edu.vn

Faculty of Environmental and Chemical Engineering, Duy Tan University, Da Nang City, 50000, Vietnam, duytan.edu.vn

First published: 26 October 2024
Citations: 3
Academic Editor: Prakash Bhuyar

Abstract

Dark fermentative hydrogen (H2) production from water hyacinth (WH) is considered a potentially sustainable process. It helps mitigate this weed's harmful effects on ecosystems and reduces dependence on fossil fuels. This study combined a physics-based model with artificial intelligence approaches for the first time to create a fast and accurate tool for simulating and optimizing this process. The physics-based model served as a generator of computational experimental data, saving the time and cost of acquiring experimental data. This synthetic dataset was used to train an artificial neural network (ANN) model that predicts the performance of dark fermentation fed with water hyacinth (DF@WH) in a fraction of the time. The particle swarm optimization (PSO) algorithm was then integrated to identify the optimal conditions for DF@WH. H2 productivity and total energy recovery were selected as objectives. Substrate concentration, initial pH, temperature, and operating time were used as the decision variables. The optimization results revealed that the maximum H2 productivity (i.e., a maximum yield of 266.8 mL/g-TS and a maximum rate of 80.5 mL/L/h) and the maximum energy efficiency (i.e., 11.4%) cannot be achieved simultaneously under a single operating condition. When these targets were considered equally important, the balanced optimal condition was a substrate concentration of 8.9 g-TS/L, an initial pH of 6.5, a temperature of 33.9°C, and an operating time of 28.2 h. Under these conditions, an H2 yield of 200.2 mL/g-TS, an H2 production rate of 62.9 mL/L/h, and a total energy recovery of 11.0% can be achieved.

1. Introduction

Because of its nontoxic nature, carbon-free combustion, and high energy density, hydrogen (H2) is considered a green fuel for sustainable energy solutions [1]. To date, however, H2 has been produced primarily through thermal cracking or steam reforming of fossil carbon–based sources. Despite their practicality, these routes contribute significantly to emissions and reduce the sustainability of H2 [2, 3]. Consequently, H2 production from green and clean sources through biological processes has received increasing research attention [4]. Among these processes, dark fermentative H2 production from water hyacinth (WH) is considered potentially sustainable: it helps mitigate this weed's harmful effects on ecosystems and reduces dependence on fossil fuels. WH is a noxious weed that damages agricultural and aquatic ecosystems [5]. Its vigorous growth depletes oxygen and nutrients, leading to the death of plants and animals. Meanwhile, dark fermentation (DF) is gaining interest because of its low investment and operating costs and its environmental friendliness [6]. Even so, DF fed with water hyacinth (DF@WH) is a complex process. Its performance is affected by various parameters, such as substrate concentration, initial pH, temperature, and inoculum type [7–15]. Many experimental studies on DF@WH have been carried out [7–15], but optimization studies of this process are currently absent from the literature. In general, DF can be optimized using response surface methodology (RSM) [16, 17]. However, because RSM relies exclusively on quadratic polynomial models, it is often ineffective for complicated processes [18, 19]. Moreover, the cost and time of the required experiments increase significantly as the number of independent variables grows [20].

In this context, mathematical model-based optimization has been regarded as a valuable strategy for reducing experimental cost and time [21]. Mathematical models for DF can be divided into physics-based models and data-driven models. Physics-based models, also known as equation-based models, are developed from fundamental physical principles, which enables them to describe process behavior comprehensively [22–24]. However, such models are often complex and computationally expensive, which prevents their use in process control and online applications. Moreover, they require the simultaneous solution of many nonlinear equations, so it is challenging to couple them with optimization algorithms [22–24].

Conversely, data-driven models have been considered practical tools for online applications and control because they can be simulated and optimized with fast computational turnaround [23, 25]. A common example is the artificial neural network (ANN). It mimics the functioning of the human brain and can capture complex relationships between predictors and responses without explicit physical formulations [25]. Compared with RSM, ANNs have been shown to be superior in predictive ability and convenience [18, 19, 25]. Most notably, ANN models can be optimized quickly and accurately using biologically inspired optimization algorithms [26, 27], especially particle swarm optimization (PSO). Various complex processes (e.g., solar energy [28], fuel cells [26], and spray drying [29]) have been effectively optimized by combining ANN and PSO.

However, building an effective ANN model requires a sufficiently large dataset (up to thousands of data points), which takes considerable time and experimental expense to accumulate [22]. Therefore, combining physics-based and ANN models has been proposed to overcome their respective limitations [30–32]. In this approach, an ANN model is built to reproduce the physics-based model. The large dataset required for ANN training can be generated by simulating the physics-based model, which considerably reduces the cost and time of experiments [20]. In addition, a dataset generated by a physics-based model is free of the gross and random errors that arise during experimental procedures, which helps ensure high precision of the ANN model [20]. Notably, the trained ANN model can be simulated and optimized in seconds [33]. Such a strategy has been implemented in a variety of research fields, including:

- steam methane reforming [30]
- carbon dioxide capture [31]
- H2 purification [32]
- crude oil distillation [34]
- biomass pyrolysis [24]
- conversion of biomethanol into dimethyl ether [23]
- fuel cells [35]

However, this combined approach has not yet been applied to DF in general or to DF@WH in particular.

Therefore, in this study, H2 production from DF@WH was optimized for the first time by combining a physics-based model, ANN, and PSO. Optimization using RSM was also performed for comparison. The inputs were basic process parameters: substrate concentration, initial pH, temperature, and operating time. H2 productivity and total energy recovery (TER) were selected as the targets. The combined method developed in this work is expected to be a convenient and effective mathematical tool for designing and optimizing DF fed with WH and other substrates.

2. Materials and Methods

Figure 1 illustrates the sequential steps taken to accomplish these goals. First, synthetic datasets for RSM implementation and ANN training were created by simulating the physics-based model of DF@WH. During ANN training, a variety of ANN topologies were examined to find the best ANN model. The optimal points were then determined by using the output of the best ANN model as the objective function for the optimization algorithm (i.e., PSO). Finally, these optimal points were validated experimentally and compared with the optimization results obtained from RSM. Details of the experimental procedure can be found in Appendix A (Supporting Information). All computations were performed on a desktop computer with an Intel Core i7-8700 processor and 16 GB of DDR4 RAM, running Windows 11.

Figure 1. The workflow for modeling and optimizing dark fermentation fed with water hyacinth (DF@WH) by combining physics-based model and intelligent techniques. ANN, artificial neural network; BBD, Box–Behnken design; H2, hydrogen; PSO, particle swarm optimization; RSM, response surface methodology.

2.1. Physics-Based Model

Numerous microorganisms (e.g., Clostridium sp., Bacillus sp., and Enterobacter sp.) have been found to convert organic material to H2 via DF. Even so, most studies on DF@WH have used mixed cultures rather than pure cultures [7, 9, 10, 12, 15], because mixed cultures offer greater economic and technological benefits [36–38]. In models of DF with mixed cultures, the microorganisms are usually treated as a single component or reactant in the system [39–41]. Substrate concentration, initial pH, temperature, and operating time are often selected as the basic operating parameters for modeling and optimization [39–41]. In most studies on DF@WH with mixed cultures, the Gompertz model has been used to describe H2 production [7, 8, 13]. Unfortunately, this model is not applicable to optimization. Its estimated parameters are specific to a single operating condition and therefore cannot be extended to other conditions [42].
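The limitation can be made concrete with the widely used modified Gompertz equation, H(t) = P·exp{−exp[(Rm·e/P)(λ − t) + 1]}. Here P is the H2 production potential, Rm the maximum production rate, and λ the lag time. The sketch below evaluates it with hypothetical parameter values. P, Rm, and λ are fitted constants rather than functions of pH, temperature, or substrate concentration. A fitted curve therefore describes only the single condition it was fitted to and cannot be searched over operating conditions.

```python
import math

def modified_gompertz(t, P, Rm, lam):
    """Cumulative H2 (mL/L) at time t (h) from the modified Gompertz equation.

    P   -- H2 production potential (mL/L)
    Rm  -- maximum H2 production rate (mL/L/h)
    lam -- lag-phase duration (h)
    """
    return P * math.exp(-math.exp(Rm * math.e / P * (lam - t) + 1.0))

# HYPOTHETICAL parameters fitted to ONE operating condition; nothing in them
# depends on pH, temperature, or substrate concentration.
P, Rm, lam = 3000.0, 120.0, 6.0
print([round(modified_gompertz(t, P, Rm, lam)) for t in range(0, 169, 24)])
```

The curve's maximum slope equals Rm and occurs at t = λ + P/(Rm·e), which is how the three parameters are usually interpreted.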

Recently, Phan, Phan, and Nguyen [43] developed a kinetic model, a type of physics-based model, for predicting the performance of DF@WH with mixed cultures. It integrates the effects of fundamental operating parameters such as temperature, initial pH, and substrate concentration. The model assumes that the WH feedstock exists in two forms (particulate and soluble) and undergoes two sequential steps (hydrolysis and acidogenesis) in the DF process. The surface-limiting model [42] and the modified Michaelis–Menten model [44] were used to describe the hydrolysis of the particulate and soluble forms, respectively. The Luedeking–Piret model [45] was used to describe the rate of H2 production. The effects of substrate concentration, operating temperature, and pH were modeled based on the Aiba model [46], the modified Ratkowsky model [47], and the ADM1 model [48], respectively. The model was validated against literature and experimental data for dark fermentative H2 production from WH, achieving a high R2 of 0.97 [43]. Therefore, this kinetic model was chosen as the generator of computational experimental data in this study.
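The rate expressions named above have well-known textbook forms. The sketch below writes them as Python functions so that their shapes can be inspected. It is not the model of [43]. Several elements are assumptions made for illustration:

- the parameter values
- the two-sided empirical pH function, which is one of the pH inhibition forms given in the ADM1 report
- multiplying the factors into a single rate modifier

```python
import math

def aiba_factor(S, Ks, Ki):
    """Aiba-type substrate term: Monod saturation times exponential substrate inhibition."""
    return S / (Ks + S) * math.exp(-S / Ki)

def ratkowsky_rate(T, b, c, Tmin, Tmax):
    """Extended Ratkowsky square-root model: sqrt(r) = b(T - Tmin)(1 - exp(c(T - Tmax)))."""
    if T <= Tmin or T >= Tmax:
        return 0.0
    return (b * (T - Tmin) * (1.0 - math.exp(c * (T - Tmax)))) ** 2

def ph_factor(pH, pH_LL, pH_UL):
    """Two-sided empirical pH inhibition; equals 1 at the midpoint of pH_LL and pH_UL."""
    num = 1.0 + 2.0 * 10.0 ** (0.5 * (pH_LL - pH_UL))
    den = 1.0 + 10.0 ** (pH - pH_UL) + 10.0 ** (pH_LL - pH)
    return num / den

def luedeking_piret(dX_dt, X, alpha, beta):
    """Product (H2) formation rate: growth-associated plus non-growth-associated terms."""
    return alpha * dX_dt + beta * X

# HYPOTHETICAL parameters, chosen only to show the shapes of the factors.
S, T, pH = 20.0, 35.0, 7.0                      # g-TS/L, °C, -
modifier = (aiba_factor(S, Ks=10.0, Ki=60.0)
            * ratkowsky_rate(T, b=0.03, c=0.3, Tmin=10.0, Tmax=45.0)
            * ph_factor(pH, pH_LL=4.5, pH_UL=8.5))
print(round(modifier, 4))
```

Each factor falls off on both sides of an optimum (substrate limitation versus inhibition, cold versus heat, acidic versus alkaline). This qualitative behavior is what the simulation results later attribute to the kinetic model's inhibition factors.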

2.2. Dataset Generation

The synthetic dataset was generated by simulating the abovementioned kinetic model in MATLAB/Simulink (2023a, MathWorks Inc., USA). The simulation used an adaptive, automatically selected variable-step solver with a relative tolerance of 10⁻⁴. The WH concentration, initial pH, temperature, and operating time were selected as the inputs. The Box–Behnken design (BBD) was chosen as the multilevel experimental design for RSM because of its simplicity and its ability to reduce the number of experiments required [49]. Accordingly, 29 operating condition sets were generated for the BBD, as listed in Table S1 (Supporting Information).

For the ANN training dataset, the studied ranges of the operating parameters of DF@WH were an initial pH of 5–8, a WH concentration of 5–45 g-TS/L, and a temperature of 20–40°C. The maximum operating time was set at 168 h, and the accumulated H2 volume was recorded every 6 h. This setup produced a dataset of 9135 data points. PSO-based feature selection [50] was used to reduce the dataset size for faster ANN training without loss of information, by selecting the most influential of the 9135 initial data points. The fitness function for feature selection was based on the mean of the normalized original dataset, in which all data points were transformed to values in the range 0–1. A data point was considered more influential the closer its normalized mean was to the mean of the normalized original dataset (i.e., 0.445). Selected dataset sizes of 1%, 2%, 5%, 8%, and 10% of the original dataset were used for ANN training. The best ANN models obtained in these procedures were validated with a fixed dataset containing the 456 least influential data points (i.e., equivalent to 5% of the original dataset).
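Two pieces of this procedure can be sketched compactly.

The first function builds the coded Box–Behnken design for four factors and maps it onto actual values. Each factor pair is set at ±1 with the other factors at the center, and center runs are added: 24 + 5 = 29 runs, matching the count above. Mapping onto the Table 1 bounds is an assumption, because the actual BBD levels are listed in Table S1.

The second function is a simple deterministic ranking that applies the stated closeness-to-mean criterion. It is an illustrative stand-in, not the PSO-based feature selection of [50]. Per-column min–max scaling is assumed for the 0–1 transformation.

```python
import itertools
import numpy as np

def box_behnken(k, n_center):
    """Coded Box-Behnken design: each factor pair at +/-1, other factors at 0, plus center runs."""
    runs = []
    for i, j in itertools.combinations(range(k), 2):
        for a, b in itertools.product((-1.0, 1.0), repeat=2):
            row = [0.0] * k
            row[i], row[j] = a, b
            runs.append(row)
    runs += [[0.0] * k for _ in range(n_center)]
    return np.array(runs)

def decode(coded, lower, upper):
    """Map coded levels (-1, 0, +1) onto actual values in [lower, upper]."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return (lower + upper) / 2.0 + coded * (upper - lower) / 2.0

def rank_by_mean_closeness(data, fraction):
    """Return (most, least) 'influential' row indices under the closeness-to-mean criterion."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    norm = (data - lo) / np.where(hi > lo, hi - lo, 1.0)   # ASSUMED per-column min-max scaling
    distance = np.abs(norm.mean(axis=1) - norm.mean())      # row mean vs. grand mean
    order = np.argsort(distance, kind="stable")
    n_keep = int(fraction * len(data))
    return order[:n_keep], order[len(data) - n_keep:]

# Bounds follow Table 1 (ASSUMPTION: the actual BBD levels are in Table S1).
design = decode(box_behnken(4, n_center=5), [5, 5, 20, 0], [45, 8, 40, 168])
print(design.shape)

# HYPOTHETICAL random data with the same number of rows as the original dataset.
fake = np.random.default_rng(0).random((9135, 5))
keep, least = rank_by_mean_closeness(fake, 0.05)
print(len(keep), len(least))
```

With 9135 rows, a 5% fraction gives int(456.75) = 456 points, consistent with the 456-point sets mentioned above.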

2.3. Evaluation Performance of DF@WH

The performance of DF@WH was evaluated through H2 production yield (HY, mL/g-TS), H2 production rate (HPR, mL/L/h), and TER (%), calculated as follows:
$$\mathrm{HY} = \frac{V_{H_2}}{C_S} \qquad (1)$$
$$\mathrm{HPR} = \frac{V_{H_2}}{t} \qquad (2)$$
$$\mathrm{TER} = \frac{W_{H_2}}{W_s + W_h + W_l + W_m} \times 100\% \qquad (3)$$
Equations (4)–(8) give, respectively, the energy of H2 production (W_H2, kJ/L), the substrate energy input (W_s, kJ/L), the heat energy input (W_h, kJ/L), the energy input to maintain the operating temperature (W_l, kJ/L), and the mixing energy (W_m, kJ/L):
$$W_{H_2} = \frac{V_{H_2}}{V_m}\,\Delta H_{H_2} \qquad (4)$$
$$W_s = C_S\,\Delta H_s \qquad (5)$$
$$W_h = \rho\,c_p\,(T - T_o) \qquad (6)$$
$$W_l = q_{loss}\,t \qquad (7)$$
$$W_m = 3.6\,q_{mix}\,t \qquad (8)$$
The symbols are defined as follows:

- V_H2 (mL/L): accumulated volume of H2 produced per 1 L of working volume of the DF reactor.
- t (h): operating time.
- V_m (mL/mol): molar volume of H2, used to convert the produced volume to moles.
- ΔH_H2 and ΔH_s: energy values of H2 (i.e., 285.8 kJ/mol [51]) and WH (i.e., 15.6 kJ/g-TS [15]), respectively.
- C_S (g-TS/L): WH concentration.
- T (°C): operating temperature.
- T_o: environmental temperature, assumed to be 25°C.
- c_p: specific heat capacity, taken as equal to that of water (c_p = 4.186 kJ/kg/°C).
- ρ: mass density of the DF medium, taken as equal to that of water (ρ = 1.0 kg/L).
- q_mix (W/L): stirring power needed to mix 1 L of the reactor's working volume, assumed to be 0.14 W/L [52]. The factor 3.6 in Equation (8) converts W·h to kJ.
- q_loss (kJ/L/h): heat lost through the DF reactor walls per hour of operation, assumed to be 0.5% of W_h.
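The performance indicators can be computed from a single accumulated H2 volume, as sketched below following Equations (1)–(8). Two elements are assumptions because this section does not state them:

- the molar volume of 22,400 mL/mol (ideal gas at 0°C and 1 atm)
- clamping the heating term at zero for temperatures below T_o

The function name `df_performance` is hypothetical.

```python
# Constants from the text, except where marked ASSUMPTION.
DH_H2 = 285.8          # kJ/mol, energy value of H2
DH_S = 15.6            # kJ/g-TS, energy value of WH
CP = 4.186             # kJ/kg/°C, taken as that of water
RHO = 1.0              # kg/L, taken as that of water
T_ENV = 25.0           # °C, environmental temperature
Q_MIX = 0.14           # W/L, stirring power per liter of working volume
LOSS_PER_H = 0.005     # hourly wall heat loss as a fraction of W_h
V_MOLAR = 22_400.0     # mL/mol, ASSUMPTION: ideal-gas molar volume at 0 °C and 1 atm

def df_performance(v_h2, c_s, temp, t):
    """Return (HY in mL/g-TS, HPR in mL/L/h, TER in %) for accumulated H2 v_h2 (mL/L)."""
    hy = v_h2 / c_s                                  # Eq. (1)
    hpr = v_h2 / t                                   # Eq. (2)
    w_h2 = v_h2 / V_MOLAR * DH_H2                    # Eq. (4): mol/L x kJ/mol = kJ/L
    w_s = c_s * DH_S                                 # Eq. (5)
    w_h = max(0.0, RHO * CP * (temp - T_ENV))        # Eq. (6); clamp at 0 is an ASSUMPTION
    w_l = LOSS_PER_H * w_h * t                       # Eq. (7)
    w_m = 3.6 * Q_MIX * t                            # Eq. (8): W h/L converted to kJ/L
    ter = 100.0 * w_h2 / (w_s + w_h + w_l + w_m)     # Eq. (3)
    return hy, hpr, ter

# Accumulated volume implied by the balanced optimum (200.2 mL/g-TS at 8.9 g-TS/L).
hy, hpr, ter = df_performance(200.2 * 8.9, c_s=8.9, temp=33.9, t=28.2)
print(round(hy, 1), round(hpr, 1), round(ter, 1))
```

Because V_MOLAR and the clamp are assumptions, the TER from this sketch is indicative only.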

2.4. Data-Driven Model Development, Comparison, and Optimization

As shown in the previous section, the accumulated volume of H2 produced (V_H2, mL/L) is the key quantity from which the other performance parameters (i.e., HY, HPR, and TER) can be determined. It was therefore selected as the ANN output. To simplify RSM-based optimization, HY, HPR, and TER were selected directly as the response variables of the RSM models. Four operating parameters served as the inputs of both the ANN and RSM models: WH concentration, initial pH, temperature, and operating time.

2.4.1. ANN Model Development and Optimization by PSO

A trial-and-error approach was used to identify the best ANN model, with the mean squared error (MSE) as the criterion for evaluating accuracy. MSE was defined using Equation (9) [26, 53], and the lowest MSE corresponds to the best ANN model:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(Y_{ex,i} - Y_{pr,i}\right)^2 \qquad (9)$$
where m is the number of data points and Yex and Ypr are the experimental and predicted data, respectively.

Various ANN structures were examined to establish the relationships between the inputs and the output, using the selected computational dataset described in Section 2.2. This dataset was divided into training (70%), validation (15%), and testing (15%) subsets. The characteristics of the hidden layers were varied to obtain the best ANN: the activation function, the number of hidden layers, and the number of neurons per hidden layer. Specifically, the following standard activation functions were examined in sequence:

- logistic sigmoid (logsig)
- triangular basis (tribas)
- positive saturating linear (satlin)
- radial basis (radbas)
- symmetric saturating linear (satlins)
- normalized radial basis (radbasn)
- softmax (softmax)
- Elliot sigmoid (elliotsig)
- rectified linear unit (ReLU)
- hyperbolic tangent sigmoid (tansig)

The number of hidden layers was varied from 1 to 3, and the number of neurons in each hidden layer from 1 to 20, as shown in Figure 2. Each ANN topology was trained sequentially using the Deep Learning Toolbox (MATLAB 2023a, MathWorks Inc., USA). Training stopped when the number of epochs reached 2000 or the gradient fell to 10⁻¹². The output of the best ANN model (i.e., the accumulated volume of H2 produced) was then converted to HY, HPR, and TER using Equations (1)–(3). These quantities served as the objective functions for optimization by PSO.
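The trial-and-error search can be sketched in plain NumPy. The sketch mirrors the described workflow:

- a 70/15/15 split of the data
- training stopped at 2000 epochs or at a gradient norm below 10⁻¹²
- selection of the network with the lowest MSE

It also makes several simplifying assumptions. It uses a single tanh hidden layer and full-batch gradient descent rather than the MATLAB toolbox's training algorithm. A hypothetical smooth target function stands in for kinetic-model data. Selection is based on validation MSE over a small set of hidden-layer sizes.

```python
import numpy as np

def split_70_15_15(n, seed=0):
    """Randomly split row indices into training (70%), validation (15%), and testing sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_tr, n_va = (70 * n) // 100, (15 * n) // 100
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def train_one_hidden(X, y, n_hidden, lr=0.05, max_epochs=2000, min_grad=1e-12, seed=0):
    """Full-batch gradient descent on MSE: tanh hidden layer, linear output neuron."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden)), np.zeros(n_hidden)
    w2, b2 = rng.normal(0.0, 0.5, n_hidden), 0.0
    for _ in range(max_epochs):                          # stop rule 1: epoch limit
        H = np.tanh(X @ W1 + b1)
        g = 2.0 * (H @ w2 + b2 - y) / len(y)             # dMSE / d(prediction)
        gw2, gb2 = H.T @ g, g.sum()
        gZ = np.outer(g, w2) * (1.0 - H ** 2)            # back-propagate through tanh
        gW1, gb1 = X.T @ gZ, gZ.sum(axis=0)
        norm = np.sqrt((gW1 ** 2).sum() + (gb1 ** 2).sum() + (gw2 ** 2).sum() + gb2 ** 2)
        if norm < min_grad:                              # stop rule 2: vanishing gradient
            break
        W1 -= lr * gW1
        b1 -= lr * gb1
        w2 -= lr * gw2
        b2 -= lr * gb2
    return lambda Xn: np.tanh(Xn @ W1 + b1) @ w2 + b2

# HYPOTHETICAL smooth response of four normalized inputs (stand-in for kinetic-model data).
rng = np.random.default_rng(1)
X = rng.random((300, 4))
y = np.sin(np.pi * X[:, 0]) * X[:, 1] + 0.5 * X[:, 2] - X[:, 3] ** 2
tr, va, te = split_70_15_15(len(y))

val_mse = {}
for n_hidden in (2, 5, 10):                              # trial and error over hidden size
    val_mse[n_hidden] = mse(y[va], train_one_hidden(X[tr], y[tr], n_hidden)(X[va]))
best = min(val_mse, key=val_mse.get)
print(best, round(val_mse[best], 5))
```

Extending the loop to more layers and all ten activation functions would follow the same pattern. The cost grows with the number of candidate topologies, which is why the study first reduced the dataset size.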

Figure 2. The structure of the artificial neural network (ANN) model developed in this study. H2, hydrogen; WH, water hyacinth.

The PSO procedure was performed using the Global Optimization Toolbox (MATLAB 2023a, MathWorks Inc., USA), with the input parameter boundaries presented in Table 1. All other PSO parameters were kept at the toolbox's default settings.

Table 1. Boundaries of input parameters for the optimization procedure.
Parameter | Lower boundary | Upper boundary
WH concentration (g-TS/L) | 5 | 45
Initial pH | 5 | 8
Temperature (°C) | 20 | 40
Operating time (h) | 0 | 168
  • Abbreviation: WH, water hyacinth.
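The sketch below shows a minimal global-best PSO over the Table 1 bounds. It is a textbook inertia-weight variant with fixed coefficients (w = 0.7, c1 = c2 = 1.5), not MATLAB's particleswarm, whose defaults differ. The objective is a hypothetical smooth stand-in with a known minimum. In the study, the objective is derived from the ANN output through Equations (1)–(3).

```python
import numpy as np

# Table 1 bounds: WH concentration (g-TS/L), initial pH, temperature (°C), operating time (h).
LOWER = np.array([5.0, 5.0, 20.0, 0.0])
UPPER = np.array([45.0, 8.0, 40.0, 168.0])

def pso_minimize(f, lower, upper, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with an inertia weight; positions are clipped to the box bounds."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, (n_particles, len(lower)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2,) + x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f                        # update each particle's personal best
        pbest[better], pbest_f[better] = x[better], fx[better]
        gbest = pbest[np.argmin(pbest_f)].copy()     # update the swarm's global best
    return gbest, float(pbest_f.min())

# HYPOTHETICAL objective with a known minimum, standing in for an ANN-derived target.
TARGET = np.array([15.0, 6.5, 35.0, 48.0])

def objective(x):
    z = (x - TARGET) / (UPPER - LOWER)               # scale each variable by its range
    return float(np.sum(z ** 2))

best_x, best_f = pso_minimize(objective, LOWER, UPPER)
print(np.round(best_x, 2), best_f)
```

To maximize a quantity such as HY or TER with this minimizer, one would pass its negative as the objective.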

2.4.2. RSM Model Development and Optimization

The relationship between the independent and response variables in the RSM model was described by the second-order polynomial in Equation (10) [54]. The significance of the RSM models was evaluated by analysis of variance (ANOVA), with p-values below 0.05 considered significant. The ANOVA and RSM optimization were carried out in Design-Expert software (version 12.0, Stat-Ease Inc., USA):
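$$Y = k_0 + \sum_{i=1}^{n} k_i X_i + \sum_{i=1}^{n} k_{ii} X_i^2 + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} k_{ij} X_i X_j \qquad (10)$$
where Y represents the response variable (i.e., HY, HPR, or TER); X_i and X_j represent the independent variables; n is the number of independent variables (four in this study); and k_0, k_i, k_ii, and k_ij are the RSM model coefficients.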
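The model form of Equation (10) can be fitted by ordinary least squares, as sketched below. With four factors it has 1 + 4 + 4 + 6 = 15 coefficients, which the 29 BBD runs can support. The sketch illustrates the model form only and runs on synthetic test data. The ANOVA and optimization in the study were done in Design-Expert.

```python
import itertools
import numpy as np

def quadratic_terms(X):
    """Columns for Eq. (10): 1, X_i, X_i^2, and X_i X_j (i < j)."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Least-squares estimates ordered as k0, k_i, k_ii, k_ij."""
    coef, *_ = np.linalg.lstsq(quadratic_terms(X), y, rcond=None)
    return coef

# Synthetic check: recover known coefficients from noise-free data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (50, 4))
true_coef = rng.normal(size=15)
y = quadratic_terms(X) @ true_coef
print(np.allclose(fit_rsm(X, y), true_coef))
```

Because the model contains only terms up to second order, it cannot represent sharper nonlinearities. This limitation is consistent with the accuracy gap reported against the ANN.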

2.4.3. Evaluation and Comparison Between ANN and RSM

In addition to MSE, the coefficient of determination (R2) and the normalized root mean squared error (NRMSE) were used to assess the accuracy of the developed ANN and RSM models, as defined in Equations (11) and (12), respectively. In general, model accuracy improves as R2 approaches its maximum of 1 and as NRMSE approaches its minimum of 0:
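$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(Y_{ex,i} - Y_{pr,i}\right)^2}{\sum_{i=1}^{m}\left(Y_{ex,i} - \bar{Y}_{ex}\right)^2} \qquad (11)$$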
(12)
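The three accuracy metrics can be computed as sketched below. Y_ex denotes the reference (synthetic) data and Y_pr the model predictions. MSE and R2 follow their standard definitions. The normalization used for NRMSE is not given in this section, so dividing the RMSE by the range of the reference data is an assumption.

```python
import numpy as np

def mse(y_ex, y_pr):
    """Equation (9): mean squared error."""
    y_ex, y_pr = np.asarray(y_ex, float), np.asarray(y_pr, float)
    return float(np.mean((y_ex - y_pr) ** 2))

def r2(y_ex, y_pr):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_ex, y_pr = np.asarray(y_ex, float), np.asarray(y_pr, float)
    ss_res = np.sum((y_ex - y_pr) ** 2)
    ss_tot = np.sum((y_ex - y_ex.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def nrmse(y_ex, y_pr):
    """RMSE divided by the range of the reference data (ASSUMED normalization)."""
    y_ex = np.asarray(y_ex, float)
    return float(np.sqrt(mse(y_ex, y_pr)) / (y_ex.max() - y_ex.min()))

y_ex = [0.0, 1.0, 2.0, 3.0, 4.0]   # reference (e.g., kinetic-model) values
y_pr = [0.0, 1.0, 2.0, 3.0, 5.0]   # model predictions
print(mse(y_ex, y_pr), r2(y_ex, y_pr), round(nrmse(y_ex, y_pr), 4))
```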

3. Results and Discussion

3.1. ANN Model Development

First, PSO-based feature selection was implemented to reduce the dataset size and thus shorten ANN training. The effects of the selected dataset size on ANN training performance are presented in Table S2 and Figure S2 (Supporting Information). Smaller datasets gave shorter training times: the training time decreased from 36.2 s for the original dataset to only 2.2 s for a selected dataset of 1%. However, reducing the size also discards information from the original dataset, which lowers the accuracy of the model. A size of 5% (i.e., 456 data points) was therefore considered the best choice. It reduced the training time by over 75% (to 8.6 s) while preserving the information of the original dataset.
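The figures quoted above can be checked directly:

$$\frac{456}{9135} \approx 0.0499 \;(\approx 5\%), \qquad \frac{36.2\ \mathrm{s} - 8.6\ \mathrm{s}}{36.2\ \mathrm{s}} \approx 0.762 \;(\approx 76\%\ \text{reduction in training time})$$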

Table 2 lists the performance of different ANN models in reproducing the kinetic model. For simplicity, only the model with the lowest MSE for each activation function is shown. Because of the complexity of DF@WH, ANN models with a single hidden layer were insufficient to reproduce the outcomes of the kinetic model. ANNs with more hidden layers performed better but required more training time, because their larger number of hidden neurons made them more complex. However, when the number of hidden layers was increased to three, the excessive complexity of the models caused overfitting, which in turn reduced the goodness of fit [55, 56].

Table 2. Summary of the training performance of various ANN topologies.
Hidden layer number | Activation function | Optimal hidden neuron number | MSE (training) | MSE (testing) | MSE (validation) | MSE (all) | NRMSE (all) | R2 (all) | Training time (s)
1 | elliotsig | 15 | 3764.3 | 14,318.6 | 4436.0 | 5451.0 | 0.0385 | 0.9868 | 3.77
1 | logsig | 12 | 5787.1 | 11,638.3 | 8212.8 | 7030.7 | 0.0437 | 0.9830 | 3.73
1 | radbas | 20 | 2523.9 | 10,899.5 | 8100.6 | 4620.2 | 0.0354 | 0.9888 | 2.03
1 | radbasn | 20 | 786.5 | 8505.5 | 17,961.5 | 4526.7 | 0.0351 | 0.9890 | 5.95
1 | ReLU | 16 | 4911.5 | 13,692.9 | 16,338.3 | 7947.7 | 0.0465 | 0.9808 | 0.58
1 | satlin | 20 | 5532.0 | 23,989.0 | 11,245.4 | 9163.5 | 0.0499 | 0.9778 | 0.64
1 | satlins | 18 | 4691.6 | 5522.4 | 31,629.5 | 8863.8 | 0.0491 | 0.9786 | 0.51
1 | softmax | 18 | 599.2 | 1283.5 | 835.0 | 737.5 | 0.0142 | 0.9982 | 5.13
1 | tansig | 16 | 2381.5 | 3251.8 | 6644.3 | 3152.7 | 0.0293 | 0.9924 | 4.98
1 | tribas | 19 | 3693.9 | 5336.8 | 16,416.0 | 5852.2 | 0.0399 | 0.9858 | 0.55
2 | elliotsig | 16-5 | 46.4 | 950.6 | 1509.9 | 402.2 | 0.0105 | 0.9990 | 6.46
2 | logsig | 16-8 | 14.7 | 24.3 | 1727.5 | 273.5 | 0.0086 | 0.9993 | 8.12
2 | radbas | 8-17 | 145.3 | 3305.6 | 7310.8 | 1696.7 | 0.0215 | 0.9959 | 8.23
2 | radbasn | 19-6 | 1.4 | 49.9 | 394.5 | 67.8 | 0.0043 | 0.9998 | 9.11
2 | ReLU | 13-11 | 586.4 | 2849.2 | 2919.6 | 1276.9 | 0.0186 | 0.9969 | 0.63
2 | satlin | 19-9 | 154.9 | 722.8 | 1450.7 | 434.9 | 0.0109 | 0.9989 | 0.85
2 | satlins | 16-18 | 275.4 | 2600.9 | 6274.1 | 1526.1 | 0.0204 | 0.9963 | 0.88
2 | softmax | 9-11 | 42.5 | 51.4 | 44.6 | 44.2 | 0.0025 | 0.9999 | 8.63
2 | tansig | 12-14 | 19.8 | 41.4 | 488.7 | 93.5 | 0.0050 | 0.9997 | 9.53
2 | tribas | 20-16 | 2022.7 | 6076.0 | 8761.0 | 3644.1 | 0.0315 | 0.9912 | 1.08
3 | elliotsig | 9-13-14 | 604.1 | 2955.8 | 1948.8 | 1159.5 | 0.0147 | 0.9972 | 14.93
3 | logsig | 8-11-19 | 880.6 | 1766.4 | 2700.3 | 1287.1 | 0.0139 | 0.9969 | 17.07
3 | radbas | 11-12-12 | 502.1 | 4444.7 | 7429.2 | 2135.2 | 0.0236 | 0.9948 | 14.40
3 | radbasn | 12-16-10 | 119.2 | 1318.2 | 3029.6 | 736.6 | 0.0089 | 0.9982 | 22.10
3 | ReLU | 9-17-10 | 1235.1 | 4475.8 | 4932.4 | 2277.5 | 0.0228 | 0.9945 | 1.82
3 | satlin | 11-19-10 | 961.4 | 4212.8 | 2919.9 | 1744.2 | 0.0167 | 0.9958 | 2.20
3 | satlins | 13-16-10 | 937.9 | 3039.1 | 10,077.4 | 2626.8 | 0.0247 | 0.9936 | 2.02
3 | softmax | 10-11-13 | 126.0 | 236.2 | 163.2 | 148.2 | 0.0042 | 0.9996 | 20.07
3 | tansig | 12-14-16 | 374.0 | 523.0 | 1412.0 | 552.4 | 0.0087 | 0.9986 | 21.05
3 | tribas | 8-14-11 | 2273.4 | 5965.1 | 9909.2 | 3975.3 | 0.0327 | 0.9904 | 2.37
  • Note: The best performance was obtained with two hidden layers, the softmax activation function, and a 9-11 hidden-neuron configuration.
  • Abbreviations: ANN, artificial neural network; MSE, mean square error; NRMSE, normalized root mean squared error.

As shown in Table 2, the ReLU, tribas, satlins, and satlin functions trained rapidly, taking about 2 s, but performed poorly. The radbas and radbasn functions were subpar in both training time and performance. The elliotsig, tansig, logsig, and softmax functions required more training time but gave superior results. The optimal ANN model was therefore obtained with the softmax function and a 4-9-11-1 topology: four inputs, two hidden layers of 9 and 11 neurons, and one output. This model performed best, with the lowest MSE and NRMSE of 44.2 and 0.0025, respectively. Figure S1 and Table S3 (Supporting Information) provide the specific characteristics of the best ANN model. Furthermore, Figure 3 shows that the predictions of the best ANN model were close to the synthetic data generated by the kinetic model. All R2 values for ANN training, testing, and validation exceed 0.999, indicating that the ANN model successfully reproduced the kinetic model.
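The sketch below makes the 4-9-11-1 topology concrete. It implements a forward pass with softmax hidden layers and counts the trainable parameters (4·9 + 9 + 9·11 + 11 + 11·1 + 1 = 167). The linear output layer and the random weights are assumptions for illustration; the characteristics of the trained model are given in Figure S1 and Table S3. The sketch also omits any input and output scaling the MATLAB network may apply.

```python
import numpy as np

LAYERS = [4, 9, 11, 1]   # inputs, hidden layer 1, hidden layer 2, output

def softmax(z):
    """Softmax across the neurons of a layer (shifted by the row max for stability)."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def init_params(sizes, seed=0):
    """HYPOTHETICAL random weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    """Softmax hidden layers followed by a linear output neuron (ASSUMED output layer)."""
    a = np.asarray(x, float)
    for W, b in params[:-1]:
        a = softmax(a @ W + b)
    W, b = params[-1]
    return a @ W + b

params = init_params(LAYERS)
n_params = sum(W.size + b.size for W, b in params)
out = forward(np.zeros((3, 4)), params)   # three hypothetical normalized input rows
print(n_params, out.shape)
```

The small parameter count helps explain why the trained network evaluates almost instantly compared with integrating the kinetic model's differential equations.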

Figure 3. Regression plots of the best artificial neural network (ANN) model for (a, b) training, (c, d) validation, and (e, f) testing. H2, hydrogen; MSE, mean square error; NRMSE, normalized root mean squared error.

3.2. Comparison With RSM

The RSM predictions and the corresponding synthetic data are shown in Table S4 (Supporting Information). ANOVA results for the HY, HPR, and TER responses are presented in Tables S5–S7 (Supporting Information), respectively. The coefficients of the quadratic equations relating these responses to substrate concentration, initial pH, temperature, and operating time are shown in Table S8 (Supporting Information). These RSM models were significant, as indicated by their high F-values (4.8–52.9) and low p-values (< 0.003) [54].

Nevertheless, the RSM models were substantially less accurate than the ANN model. Figure 4 shows the correlation between the synthetic data generated by the kinetic model and the predictions of the ANN and RSM models. The ANN model showed superior predictive ability, with R2 values of ≥ 0.9999 compared with 0.8283–0.9815 for RSM. In addition, the NRMSE values of the ANN model were considerably lower than those of the RSM models (0.004–0.005 vs. 0.049–0.108). RSM relies on a second-order polynomial, whereas the ANN model can capture higher-order nonlinear relationships, which accounts for its superior predictive accuracy [19, 25]. These results further confirm that the best ANN model is reliable for subsequent simulation and optimization.

Figure 4. Regression plots of the best artificial neural network (ANN) compared to response surface methodology (RSM): (a) H2 production yield (HY), (b) H2 production rate (HPR), and (c) total energy recovery (TER). NRMSE, normalized root mean squared error.

3.3. ANN Simulation

Figure 5 shows the simulated effects of WH concentration, temperature, initial pH, and operating time on DF@WH, obtained from the best ANN model. Figure 6 illustrates the detailed energy inputs and outputs of DF@WH. These simulations were performed under base conditions of a substrate concentration of 20 g-TS/L, an initial pH of 7.0, a temperature of 35°C, and an operating time of 48 h. The ANN simulation results agreed well with the synthetic data generated by the kinetic model, again confirming the high precision of the ANN model. Moreover, the best ANN model required considerably less computational time than the kinetic model. The kinetic model took over 1100 s to generate the data for Figure 5 (50 data points from 22 different operating condition sets). In contrast, the ANN model needed less than 2 s to emulate a 15-times-larger dataset (750 data points from 330 different operating condition sets).
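Normalizing the reported timings per operating condition set gives a rough sense of the speed-up. This is a lower bound, because the kinetic model took over 1100 s and the ANN less than 2 s. It also excludes the one-time cost of ANN training.

$$\frac{1100\ \mathrm{s}}{22\ \text{sets}} = 50\ \mathrm{s\ per\ set}, \qquad \frac{2\ \mathrm{s}}{330\ \text{sets}} \approx 6.1\ \mathrm{ms\ per\ set}, \qquad \frac{50}{2/330} = 8250$$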

Figure 5. Simulated effects of (a) WH concentration, (b) initial pH, (c) temperature, and (d) operating time on HY, HPR, and TER of DF@WH. The scatters and lines represent the simulated data from the kinetic and ANN models, respectively. The base conditions were set at a concentration of 20 g-TS/L of WH, initial pH of 7.0, temperature of 35°C, and operating time of 48 h. ANN, artificial neural network; DF@WH, dark fermentation fed with water hyacinth; HPR, H2 production rate; HY, H2 production yield; TER, total energy recovery; WH, water hyacinth.
Figure 6. Simulation of detailed energy inputs and outputs of DF@WH as affected by (a) substrate concentration, (b) initial pH, (c) temperature, and (d) operating time. Gray, orange, yellow, and green represent the substrate energy input, heat energy input, energy input to maintain the operating temperature, and mixing energy input, respectively. The base conditions were set at a concentration of 20 g-TS/L of WH, initial pH of 7.0, temperature of 35°C, and operating time of 48 h. DF@WH, dark fermentation fed with water hyacinth; H2, hydrogen; WH, water hyacinth.

It can be observed in Figure 5 that substrate concentration, initial pH, temperature, and operating time jointly affect the performance and energy efficiency of DF@WH. A WH concentration that is too high or too low reduces the performance of DF@WH: low substrate concentrations cause substrate limitation, whereas high concentrations lead to substrate inhibition, which reduces hydrolysis and acidogenesis rates [43]. Likewise, an initial pH or temperature that is too low or too high reduces the productivity and energy efficiency of DF@WH; these effects are captured by the corresponding inhibition factors in the kinetic model [43]. Meanwhile, HY and TER increased during the first 72 h of operation and then changed only marginally as operation continued up to 168 h. Because cumulative H2 production plateaued while the operating time kept increasing, HPR decreased when the operation was prolonged.
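The trade-off between substrate limitation and substrate inhibition can be illustrated with the Haldane (Andrews) rate expression, a common textbook form. This is a hedged sketch only: the kinetic model used in this study [43] may express inhibition differently, and the parameter values below are hypothetical, not fitted to DF@WH.

```python
import math


def haldane_rate(s, mu_max, k_s, k_i):
    """Haldane (Andrews) substrate-inhibition rate.

    At low s the denominator is dominated by k_s (substrate limitation);
    at high s the s**2 / k_i term dominates (substrate inhibition).
    """
    return mu_max * s / (k_s + s + s * s / k_i)


def haldane_optimum(k_s, k_i):
    """Substrate concentration that maximizes the Haldane rate: sqrt(k_s * k_i)."""
    return math.sqrt(k_s * k_i)


# Hypothetical parameters (not fitted to DF@WH); k_s and k_i in g-TS/L.
MU_MAX, K_S, K_I = 1.0, 2.0, 50.0

s_opt = haldane_optimum(K_S, K_I)
for s in (1.0, 5.0, s_opt, 20.0, 60.0):
    rate = haldane_rate(s, MU_MAX, K_S, K_I)
    print(f"S = {s:5.1f} g-TS/L -> relative rate {rate:.3f}")
```

The rate rises with concentration, peaks at sqrt(k_s * k_i), and falls again, which is the qualitative shape described for WH concentration above.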

Figure 6 illustrates the detailed energy output (i.e., H2 energy) and energy inputs of the DF@WH system. The H2 energy produced follows the same trend as HY because both are proportional to the cumulative volume of H2 produced, as described by Equations (1) and (4), respectively. However, this H2 energy output was substantially lower than the energy input into DF@WH, which was dominated by the chemical energy of the substrate. This is consistent with Figure 5, in which increasing the substrate concentration markedly reduced the energy conversion efficiency of DF. Unlike initial pH, which does not alter the energy input, a longer operating time and a higher temperature increase the energy required for heating, heat-loss compensation, and agitation. In other words, excessively increasing the substrate concentration, temperature, or operating time raises the energy input, which in turn can reduce the energy recovery of DF@WH. These simulation results indicate that substrate concentration, initial pH, temperature, and operating time must be optimized together to maximize the performance and efficiency of DF@WH.
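The energy bookkeeping behind this comparison can be sketched as follows. Equations (1) and (4) are not restated in this section, so every constant and functional form here (LHV-based H2 energy, a sensible-heat term for warming the broth, a linear heat-loss term, constant mixing power) is a hypothetical placeholder for illustration, not the authors' model.

```python
from dataclasses import dataclass

# Hypothetical placeholders for illustration only; NOT the study's values.
H2_LHV_KJ_PER_L = 10.8        # approx. lower heating value of H2 at standard conditions
SUBSTRATE_KJ_PER_G_TS = 16.0  # assumed heating value of water hyacinth
CP_KJ_PER_L_K = 4.18          # broth treated as water (density ~1 kg/L)
UA_KW_PER_L_K = 1e-5          # assumed heat-loss coefficient per litre (kW/K)
MIX_KW_PER_L = 5e-5           # assumed mixing power per litre (kW)
T_AMBIENT_C = 25.0


@dataclass
class EnergyBalance:
    h2_out: float        # kJ/L
    substrate_in: float  # kJ/L
    heating_in: float    # kJ/L, warming the broth to the operating temperature
    heat_loss_in: float  # kJ/L, compensating heat loss during operation
    mixing_in: float     # kJ/L

    @property
    def total_in(self) -> float:
        return self.substrate_in + self.heating_in + self.heat_loss_in + self.mixing_in

    @property
    def ter(self) -> float:
        """Total energy recovery as a fraction of all energy inputs."""
        return self.h2_out / self.total_in


def energy_balance(s_g_ts_per_l, hy_ml_per_g_ts, temp_c, time_h):
    """Energy terms (kJ per litre of broth) for one batch run."""
    dt = max(temp_c - T_AMBIENT_C, 0.0)
    seconds = time_h * 3600.0  # kW * s = kJ
    return EnergyBalance(
        h2_out=s_g_ts_per_l * hy_ml_per_g_ts / 1000.0 * H2_LHV_KJ_PER_L,
        substrate_in=s_g_ts_per_l * SUBSTRATE_KJ_PER_G_TS,
        heating_in=CP_KJ_PER_L_K * dt,
        heat_loss_in=UA_KW_PER_L_K * dt * seconds,
        mixing_in=MIX_KW_PER_L * seconds,
    )


short_run = energy_balance(10.0, 200.0, 35.0, 28.0)
long_run = energy_balance(10.0, 200.0, 35.0, 168.0)
print(f"TER at 28 h:  {short_run.ter:.1%}")
print(f"TER at 168 h: {long_run.ter:.1%}")
```

`energy_balance` takes no pH argument, mirroring the observation that initial pH leaves the energy input unchanged. With the yield held fixed, lengthening the run only adds heat-loss and mixing input and therefore lowers TER.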

3.4. ANN–PSO Optimization

The iteration plots and optimal values obtained from the optimization process are displayed in Figure 7. The PSO implementation converged quickly, requiring roughly 3 s and fewer than 50 iterations. Similarly fast computation with the ANN–PSO approach has been reported in previous studies [25, 28, 57]. Figure 7 also shows that DF@WH should be operated at an initial pH of 6.5 to achieve the highest performance and efficiency. In contrast, no single combination of substrate concentration, temperature, and operating time maximizes the performance and efficiency of DF@WH simultaneously. HY reached its maximum of 266.8 mL/g-TS when DF was operated at 5.0 g-TS/L of WH and 34.4°C for 168 h. HPR reached its maximum of 80.5 mL/L/h at a WH concentration of 15 g-TS/L and 34.5°C after 23.3 h of operation. Meanwhile, the maximum energy conversion efficiency of 11.4% required operating DF@WH at 7.8 g-TS/L and 32.6°C for 34 h. These optimal conditions fall within the typical optimal ranges previously reported for H2 production by DF [39, 58, 59].
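The PSO search summarized above can be sketched as a minimal global-best swarm over box bounds on the four operating variables. The trained ANN is not available here, so a hypothetical smooth function with a known peak stands in for it; the bounds, swarm size, and coefficients (w = 0.7298, c1 = c2 = 1.49618, a widely used stable setting) are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def pso_maximize(objective, lower, upper, n_particles=30, n_iter=50,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Random initial positions inside the bounds, zero initial velocities.
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)

    # Personal bests and the swarm's global best.
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    g = int(np.argmax(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    history = [gbest_val]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)  # keep particles in the feasible box

        vals = np.array([objective(p) for p in x])
        improved = vals > pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]

        g = int(np.argmax(pbest_val))
        if pbest_val[g] > gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
        history.append(gbest_val)

    return gbest, gbest_val, history


# Hypothetical stand-in for the ANN surrogate with a known peak.
# Variables: substrate (g-TS/L), initial pH, temperature (°C), time (h).
LOWER = np.array([5.0, 5.0, 25.0, 12.0])
UPPER = np.array([15.0, 8.0, 45.0, 168.0])
PEAK = np.array([9.0, 6.5, 34.0, 30.0])


def toy_surrogate(x):
    z = (x - PEAK) / (UPPER - LOWER)
    return -float(np.sum(z * z))


best_x, best_val, history = pso_maximize(toy_surrogate, LOWER, UPPER)
print(best_x, best_val, len(history))
```

`history` records the best objective value after each iteration, the quantity usually shown in PSO convergence plots. To optimize a trained surrogate instead, `objective` would wrap its prediction for a single operating point.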

PSO iteration plots for (a) HY, (b) HPR, (c) TER, and (d) co-optimization, and summary of (e) the determined optimal conditions and (f) the predicted optimal values. HPR, H2 production rate; HY, H2 production yield; PSO, particle swarm optimization; TER, total energy recovery.

When all performance parameters (i.e., HY, HPR, and TER) were assumed to be equally important, the co-optimal conditions for DF@WH were determined to be 8.9 g-TS/L of WH, an initial pH of 6.5, a temperature of 33.9°C, and an operating time of 28.2 h. Under these co-optimal conditions, HY, HPR, and TER were predicted to reach 200.2 mL/g-TS, 62.9 mL/L/h, and 11.0%, respectively.
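To make "equal importance" concrete, one common scalarization, assumed here and not confirmed as the authors' formulation, divides each objective by its individual maximum and averages the three ratios with equal weights. Applied to the values reported in this section, it quantifies the cost of the compromise.

```python
# Individual maxima reported in this section (HY mL/g-TS, HPR mL/L/h, TER %).
INDIVIDUAL_MAX = {"HY": 266.8, "HPR": 80.5, "TER": 11.4}
# Predicted values at the reported co-optimal condition.
CO_OPTIMAL = {"HY": 200.2, "HPR": 62.9, "TER": 11.0}


def equal_weight_score(values, maxima):
    """Mean of each objective normalized by its own maximum (assumed scalarization)."""
    ratios = {k: values[k] / maxima[k] for k in maxima}
    return sum(ratios.values()) / len(ratios), ratios


score, ratios = equal_weight_score(CO_OPTIMAL, INDIVIDUAL_MAX)
for name, r in ratios.items():
    print(f"{name}: {r:.1%} of its individual maximum")
print(f"equal-weight score: {score:.3f}")
```

With these numbers, the co-optimal condition retains roughly 75% of the maximum HY, 78% of the maximum HPR, and 96% of the maximum TER.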

Figure 8 presents the experimental data at the co-optimal conditions determined by RSM (i.e., 14.1 g-TS/L, initial pH of 6.5, 31.8°C, and 111.9 h) and by the ANN–PSO approach. Other RSM optimization results are presented in Table S9 (Supporting Information). Figure 8 also shows that the optimal values predicted by RSM were lower and less accurate than those predicted by the ANN–PSO approach. Specifically, the accuracy of RSM was only 80%–87% relative to the kinetic model and below 77% relative to the experimental data. These values were substantially lower than those of the ANN–PSO approach, whose accuracy exceeded 99% relative to the kinetic model and 96% relative to the experimental data (the experimental procedure is detailed in Appendix A, Supporting Information). This confirms the predictive accuracy of the ANN model and of the ANN–PSO optimization strategy.
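The accuracy percentages quoted here are not defined in this section. A common definition, assumed for this sketch, is 100% minus the absolute relative error of a prediction with respect to a reference value (the kinetic model or the experiment); the numbers in the usage line are hypothetical, not data from this study.

```python
def prediction_accuracy(predicted, reference):
    """Accuracy (%) = 100 * (1 - |predicted - reference| / |reference|)."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (1.0 - abs(predicted - reference) / abs(reference))


# Hypothetical numbers for illustration only.
print(prediction_accuracy(predicted=104.0, reference=100.0))
```

Under this definition, overprediction and underprediction by the same relative margin yield the same accuracy.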

(a, c) Predicted performance (i.e., HY, HPR, and TER) profiles from the kinetic model and (b, d) optimal points at the co-optimal conditions determined by (a, b) RSM and (c, d) the ANN–PSO approach. ANN–PSO, artificial neural network–particle swarm optimization; HPR, H2 production rate; HY, H2 production yield; RSM, response surface methodology; TER, total energy recovery.

It should be noted that, owing to the thermodynamic limit of the DF process, at least two-thirds of the hydrogen in the substrate remains in soluble by-products (such as volatile fatty acids and alcohols) instead of being released as H2 gas [60]. This limits the H2 productivity and energy conversion efficiency of DF@WH. The co-optimal yield obtained in this study (200.2 mL/g-TS) was approximately 50% higher than the highest value found in the literature (i.e., 134.9 mL/g-TS at 18 g-WH/L, 35°C, initial pH 6, and 24 h of operation [11]). However, it reached only about one-fifth of the theoretical H2 yield of WH (i.e., ~1000 mL/g-TS [12]). Therefore, downstream processes that can further convert soluble by-products into H2, such as microbial electrolysis cells and photofermentation, need to be investigated and optimized. To achieve these goals, in-depth analyses and simulations of microbial community variation and soluble by-product formation in DF@WH are also required.
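The two-thirds limit and the yield comparisons in this paragraph can be checked with a few lines of arithmetic. Glucose is used as a model hexose, an assumption for illustration, with acetate as the sole by-product (the most H2-favorable DF pathway); the yields are those quoted in this paragraph.

```python
from fractions import Fraction

# Complete oxidation: C6H12O6 + 6 H2O -> 6 CO2 + 12 H2 (12 mol H2 per mol glucose)
# Acetate pathway:    C6H12O6 + 2 H2O -> 2 CH3COOH + 2 CO2 + 4 H2 (4 mol H2)
MAX_H2_COMPLETE = 12
MAX_H2_DARK_FERMENTATION = 4

to_gas = Fraction(MAX_H2_DARK_FERMENTATION, MAX_H2_COMPLETE)
print(f"fraction released as H2 gas: {to_gas}; retained in by-products: {1 - to_gas}")

# Yields quoted in this paragraph (mL/g-TS).
co_optimal_hy = 200.2
best_literature_hy = 134.9  # [11]
theoretical_hy = 1000.0     # approximate, [12]

print(f"gain over literature: {co_optimal_hy / best_literature_hy - 1:.0%}")
print(f"share of theoretical yield: {co_optimal_hy / theoretical_hy:.0%}")
```

The 8 mol of H2-equivalents left in the two acetate molecules correspond to the two-thirds that downstream processes such as microbial electrolysis cells or photofermentation would aim to recover.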

4. Conclusion

This study demonstrates that combining the physics-based model, the ANN model, and the PSO algorithm is an effective strategy for accurate and rapid simulation and optimization of DF@WH. The simulation results revealed that substrate concentration, initial pH, temperature, and operating time have synergistic effects on DF@WH and must be optimized together. The optimization results showed that the highest energy efficiency and the highest performance could not be achieved simultaneously under any single operating condition. At the co-optimal conditions (i.e., with H2 productivity and TER weighted equally), an H2 yield of 200.2 mL/g-TS, a production rate of 62.9 mL/L/h, and a TER of 11.0% could be achieved. Nevertheless, coupling DF@WH with downstream processes such as microbial electrolysis cells and photofermentation requires further investigation and optimization to enhance H2 productivity and energy recovery from WH.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work was supported by Ho Chi Minh City University of Natural Resources and Environment, Vietnam; Duy Tan University, Vietnam; and Gachon University, South Korea.

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

Data Availability Statement

The data that support the findings of this study are available in the Supporting Information of this article.
