Combination of Physics-Based Model and Artificial Intelligence for Rapid Simulation and Optimization of Dark Fermentative Hydrogen Production From Water Hyacinth
Abstract
Dark fermentative hydrogen (H2) production from water hyacinth (WH) is considered a potentially sustainable process that helps minimize this weed’s harmful effects on the ecosystem and dependence on fossil fuels. To create a quick and precise tool for simulating and optimizing this process, this study combined a physics-based model with artificial intelligence approaches for the first time. The physics-based model was used as a computational experimental dataset generator to save the time and cost of acquiring experimental data. The resulting synthetic dataset was used to train an artificial neural network (ANN) model, which can predict the performance of dark fermentation fed with water hyacinth (DF@WH) in a fraction of the time. The particle swarm optimization (PSO) algorithm was then integrated to identify the ideal conditions for DF@WH. H2 productivity and total energy recovery were selected as objectives based on basic operating parameters such as substrate concentration, initial pH, temperature, and operating time. The optimization results revealed that the maximum values of H2 productivity (i.e., the maximum yield of 266.8 mL/g-TS and the maximum rate of 80.5 mL/L/h) and energy efficiency (i.e., 11.4%) cannot be achieved simultaneously under a single optimal condition. Instead, when these targets were considered equally important, the balanced optimal condition was determined at a substrate concentration of 8.9 g-TS/L, an initial pH of 6.5, a temperature of 33.9°C, and an operating time of 28.2 h. Under such conditions, a yield of 200.2 mL/g-TS can be achieved at a production rate of 62.9 mL/L/h with a total energy recovery of 11.0%.
1. Introduction
Due to its nontoxic nature, carbon-free properties, and high energy density, hydrogen (H2) is considered a green fuel for sustainable energy solutions [1]. However, to date, H2 has been primarily produced through thermal cracking or steam reforming of fossil carbon–based sources; despite their practicality, these routes contribute significantly to emissions and reduce the sustainability of H2 [2, 3]. For this reason, H2 production from green and clean sources through biological processes has received increasing research attention [4]. Among these processes, dark fermentative H2 production from water hyacinth (WH) is considered a potentially sustainable process that helps minimize this weed’s harmful effects on the ecosystem and dependence on fossil fuels. WH is a noxious weed that damages agricultural and aquatic ecosystems [5]. Its vigorous growth depletes oxygen and nutrients, leading to plant and animal death. Meanwhile, dark fermentation (DF) is gaining interest due to its low investment and operating costs and environmental friendliness [6]. Even so, DF fed with water hyacinth (DF@WH) is a complex process, and its performance is affected by various parameters, such as substrate concentration, initial pH, temperature, and inoculum type [7–15]. Many experimental studies on DF@WH have been carried out [7–15]; however, optimization studies for this process are currently absent from the literature. In general, the optimization of DF can be implemented via response surface methodology (RSM) [16, 17]. However, because RSM relies exclusively on quadratic polynomial models, it is often ineffective for complicated processes [18, 19]. Moreover, with a large number of independent variables to be investigated, the expense and time required for the experiments increase significantly [20].
In that context, mathematical model-based optimization has been regarded as a valuable tactic for cutting down on experimental expenses and time [21]. The mathematical models for DF can be divided into physics-based models and data-driven models. The physics-based models, also known as equation-based models, are developed based on fundamental physical principles that enable them to comprehensively describe the behavior of processes [22–24]. However, such physics-based models are complex, computationally demanding, and time-consuming, which prevents their use in control or online applications. Moreover, because they require the simultaneous solution of nonlinear equations with many variables, it is challenging to apply optimization algorithms to these physics-based models [22–24].
Conversely, data-driven models have been considered practical mathematical tools in online applications and control because they can be simulated and optimized with quick computational turnaround times [23, 25]. One common example of a data-driven model is the artificial neural network (ANN), which mimics the functioning of the human brain and is capable of capturing complex relationships between predictors and responses without the need for explicit physical formulations [25]. Compared to RSM, ANN has been proven to be superior in terms of predictive ability and convenience [18, 19, 25]. Most notably, ANN can be quickly and accurately optimized using biologically inspired optimization algorithms [26, 27], especially particle swarm optimization (PSO). Various complex processes (e.g., solar energy [28], fuel cells [26], and spray drying [29]) have been effectively optimized through the combination of ANN and PSO.
However, building an effective ANN model requires a sufficiently large dataset (up to thousands of data points), which takes a significant amount of time and experimental expense to accumulate [22]. Therefore, a combination of physics-based and ANN models has been proposed to overcome their respective limitations [30–32]. In this approach, an ANN model is built to reproduce the physics-based model. The large dataset required for the ANN training can be derived from simulating the physics-based model, leading to considerable reductions in the cost and time of experiments [20]. In addition, using a dataset generated from the physics-based model eliminates the gross and random errors of experimental procedures, thereby ensuring high precision of the ANN model [20]. Notably, the trained ANN model can be simulated and optimized in seconds [33]. Such a strategy has been implemented in a variety of research fields, including steam methane reforming [30], carbon dioxide capture [31], H2 purification [32], crude oil distillation [34], biomass pyrolysis [24], conversion of biomethanol into dimethyl ether [23], and fuel cells [35]. However, the application of this combined approach to DF, especially DF@WH, has not been reported in the literature.
Therefore, the H2 production from DF@WH is optimized for the first time by combining a physics-based model, ANN, and PSO. The optimization process using the RSM method was also investigated for comparison purposes. Some basic process parameters, including substrate concentration, initial pH, temperature, and operating time, were taken as the inputs; meanwhile, H2 productivity and total energy recovery (TER) were selected as the targets. The combined method developed in this work is anticipated to be a highly convenient and effective mathematical tool for designing and optimizing DF fed with WH and other substrates.
2. Materials and Methods
Figure 1 illustrates the sequential study steps to accomplish the aforementioned goals. First, synthetic datasets for the implementation of RSM and ANN training were created through the simulation of the physics-based model of DF@WH. In the ANN training procedure, a variety of ANN topologies were inspected to find the best ANN model. The optimal points were then determined by using the best ANN model’s output as the objective function for the optimization algorithm (i.e., PSO). Ultimately, these optimal points were experimentally validated (details of the experimental procedure can be found in Appendix A, Supporting Information) and compared with the optimization results obtained from RSM. All computational processes in this study were carried out on a desktop computer, which was configured with an Intel Core i7-8700 processor, ran on a Windows 11 operating system, and had 16 GB of DDR4 RAM.

2.1. Physics-Based Model
Until now, although numerous microorganisms have been found to be able to convert organic material to H2 via DF (e.g., Clostridium sp., Bacillus sp., and Enterobacter sp.), most studies on DF@WH have used mixed cultures instead of pure cultures [7, 9, 10, 12, 15], as the use of mixed cultures provides more significant economic and technological benefits [36–38]. In the modeling of DF using mixed cultures, the microorganisms are usually considered to be a component or reactant in the system [39–41]. Meanwhile, substrate concentration, initial pH, temperature, and operating time are often selected as basic operating parameters for modeling and optimization [39–41]. In most studies on DF@WH using mixed cultures, the Gompertz model has been widely used to describe H2 production [7, 8, 13]. Unfortunately, it is not applicable for optimization as the estimated parameters of the Gompertz model are limited to a specific operating condition and, therefore, cannot be extended to other conditions [42].
Recently, a kinetic model, a type of physics-based model, for performance prediction of DF@WH using mixed cultures has been developed by Phan, Phan, and Nguyen [43], which integrated the effect of fundamental operating parameters, such as temperature, initial pH, and substrate concentration. Accordingly, this model was constructed assuming that the WH feedstock exists in two forms (i.e., particulate and soluble) and undergoes two sequential steps (i.e., hydrolysis and acidogenesis) in the DF process. The surface-limiting model [42] and the modified Michaelis–Menten model [44] were used to model the hydrolysis of particulate and soluble forms, respectively. The Luedeking–Piret model [45] was used to model the change rate of H2 production. Meanwhile, the effects of substrate concentration, operating temperature, and pH were modeled based on the Aiba model [46], the modified Ratkowsky model [47], and the ADM1 model [48], respectively. This model was validated by the literature and experimental data with a high R2 value of 0.97 for dark fermentative H2 production from WH [43]. Therefore, this kinetic model was chosen as a computational experimental dataset generator for this study.
2.2. Dataset Generation
The synthetic dataset was generated by simulating the abovementioned kinetic model in the MATLAB/Simulink software (2023a, MathWorks Inc., USA) using an adaptive algorithm, variable-step type, automatically selected solver, and a relative tolerance of 10−4. The WH concentration, initial pH, temperature, and operating time were selected as the inputs. The Box–Behnken design (BBD) was chosen as the multilevel experimental design for RSM implementation because of its simplicity and ability to reduce the number of experiments required [49]. Accordingly, 29 operating condition sets had to be generated for BBD, as illustrated in Table S1 (Supporting Information). On the other hand, to provide a sufficient ANN training dataset, the studied range for the operating parameters of DF@WH comprised an initial pH of 5–8, a WH concentration of 5–45 g-TS/L, and a temperature of 20–40°C. The maximum operating time was set at 168 h, and the accumulated H2 volume was recorded every 6 h. With such a setup, a dataset of 9135 operating condition sets had to be generated. To reduce the dataset size for faster ANN training without loss of information, PSO-based feature selection [50] was used to select the most influential data points from the 9135 initial data points. The fitness function for feature selection was based on the mean of the normalized original dataset (i.e., all data points in the original dataset are transformed into their respective values in the range of 0–1). A data point is considered more influential the closer its normalized mean is to the mean of the normalized original dataset (i.e., 0.445). Selected data sizes of 1%, 2%, 5%, 8%, and 10% of the original dataset were employed for ANN training. The best ANN models obtained in these procedures were validated with a fixed dataset containing the 456 least influential data points (i.e., equivalent to 5% of the original dataset).
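The size of the dataset and the selection rule can be illustrated with a short sketch. The step sizes below (0.5 for pH, 5 g-TS/L for WH concentration, 5°C for temperature) are assumptions chosen because they reproduce the stated total of 9135 points; the text gives only the ranges and the 6 h sampling interval. The selection function is a simplified stand-in that ranks points directly by the stated closeness criterion rather than running the PSO-based feature selection of [50], and it is applied here to the inputs only because the simulated H2 volumes are not reproduced in this text.

```python
import itertools
import numpy as np

# Assumed grid levels (only the ranges and the 6 h interval come from the text).
wh_levels = np.linspace(5.0, 45.0, 9)      # g-TS/L, assumed step of 5
ph_levels = np.linspace(5.0, 8.0, 7)       # assumed step of 0.5
temp_levels = np.linspace(20.0, 40.0, 5)   # degC, assumed step of 5
time_points = np.linspace(0.0, 168.0, 29)  # h, one record every 6 h

grid = np.array(list(itertools.product(
    wh_levels, ph_levels, temp_levels, time_points)))


def rank_by_closeness_to_mean(data, fraction):
    """Return (most, least) influential row indices under the stated criterion.

    Each column is min-max normalized to 0-1. A row counts as more
    influential the closer its own normalized mean is to the mean of the
    whole normalized dataset. This is a direct ranking, not the PSO search.
    """
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    normalized = (data - lo) / (hi - lo)
    row_means = normalized.mean(axis=1)
    distance = np.abs(row_means - normalized.mean())
    n_keep = int(fraction * len(data))
    ranked = np.argsort(distance, kind="stable")
    return ranked[:n_keep], ranked[-n_keep:]


most_influential, least_influential = rank_by_closeness_to_mean(grid, 0.05)
print(grid.shape, len(most_influential), len(least_influential))
```

Under these assumed levels, 9 × 7 × 5 × 29 gives the 9135 points, and truncating 5% of 9135 gives the 456 points quoted for the validation subset.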
2.3. Evaluation Performance of DF@WH
2.4. Data-Driven Model Development, Comparison, and Optimization
Based on the previous section, it can be seen that the accumulated volume of H2 production (in mL/L) is a crucial parameter from which the other performance parameters (i.e., HY, HPR, and TER) can be determined. Therefore, it was selected as the ANN’s output. Meanwhile, to simplify optimization using RSM, HY, HPR, and TER were selected as the response variables for RSM model development. Four operating parameters, namely WH concentration, initial pH, temperature, and operating time, were selected as the inputs for both the ANN and RSM models.
2.4.1. ANN Model Development and Optimization by PSO
Various ANN structures were examined to ascertain relationships between inputs and outputs using the selected computational dataset (as mentioned in Section 2.2). The training, validation, and testing subsets accounted for 70%, 15%, and 15% of this selected dataset, respectively. The hidden layer’s characteristics, including the activation function, the number of hidden layers, and the number of neurons in each hidden layer, were altered to achieve the best ANN. Specifically, several standard activation functions, including logistic sigmoid (logsig), triangular basis (tribas), positive saturating linear (satlin), radial basis (radbas), symmetric saturating linear (satlins), radial basis normalized (radbasn), softmax (softmax), elliot sigmoid (elliotsig), rectified linear unit (ReLU), and hyperbolic tangent (tansig), were sequentially examined. The number of hidden layers and the number of neurons in each hidden layer were varied between 1–3 layers and 1–20 neurons, respectively, as shown in Figure 2. Each ANN topology was trained sequentially using the Deep Learning Toolbox (MATLAB 2023a, MathWorks Inc., USA). The ANN training was stopped when the epoch number reached a maximum of 2000 and/or the gradient value reached a minimum of 10−12. After obtaining the best ANN model, its output (i.e., the accumulated volume of H2 production) was transformed to HY, HPR, and TER based on Equations (1)–(3), which were then used as the objective functions for further optimization by PSO.
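To give a sense of the search described above, the following sketch splits a 456-point dataset 70/15/15 and enumerates the hidden-layer topologies within the stated limits. It is only an illustration: the study used MATLAB's Deep Learning Toolbox, and whether every topology was tried exhaustively is an assumption here. The helper names `split_indices` and `candidate_topologies` are hypothetical.

```python
import itertools
import numpy as np

# Activation function names as listed in the text.
ACTIVATIONS = ["logsig", "tribas", "satlin", "radbas", "satlins",
               "radbasn", "softmax", "elliotsig", "ReLU", "tansig"]


def split_indices(n_samples, seed=0):
    """Randomly split sample indices into 70% train, 15% validation, 15% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(0.70 * n_samples))
    n_val = int(round(0.15 * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


def candidate_topologies(max_layers=3, max_neurons=20):
    """Yield every tuple of hidden-layer sizes with 1-3 layers of 1-20 neurons."""
    for depth in range(1, max_layers + 1):
        yield from itertools.product(range(1, max_neurons + 1), repeat=depth)


train_idx, val_idx, test_idx = split_indices(456)
n_topologies = sum(1 for _ in candidate_topologies())
print(len(train_idx), len(val_idx), len(test_idx))
print(n_topologies, n_topologies * len(ACTIVATIONS))
```

An exhaustive sweep would cover 20 + 20² + 20³ = 8420 topologies per activation function, which is one reason a short per-model training time matters.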

The PSO procedure was performed in the Optimize Toolbox (MATLAB 2023a, MathWorks Inc., USA) with the boundaries of the input parameters presented in Table 1. Other parameters of the PSO optimization were kept at the default settings of the toolbox.
Parameter | Lower boundary | Upper boundary |
---|---|---|
WH concentration (g-TS/L) | 5 | 45 |
Initial pH | 5 | 8 |
Temperature (°C) | 20 | 40 |
Operating time (h) | 0 | 168 |
- Abbreviation: WH, water hyacinth.
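A bounded PSO search over the ranges in Table 1 can be sketched as follows. This is a minimal global-best PSO written with NumPy, not the MATLAB toolbox routine used in the study; the inertia and acceleration coefficients, swarm size, and iteration count are assumed values, and `toy_objective` with its `PEAK` location is a hypothetical placeholder standing in for the trained ANN's output.

```python
import numpy as np

# Bounds from Table 1: WH concentration, initial pH, temperature, time.
LOWER = np.array([5.0, 5.0, 20.0, 0.0])
UPPER = np.array([45.0, 8.0, 40.0, 168.0])


def pso_maximize(objective, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes `objective` inside box bounds."""
    rng = np.random.default_rng(seed)
    pos = lower + rng.random((n_particles, lower.size)) * (upper - lower)
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_val = np.array([objective(p) for p in pos])
    g_best = best_pos[np.argmax(best_val)].copy()
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside Table 1
        val = np.array([objective(p) for p in pos])
        improved = val > best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g_best = best_pos[np.argmax(best_val)].copy()
    return g_best, float(best_val.max())


# Hypothetical smooth objective with an arbitrary peak (NOT the study's model).
PEAK = np.array([10.0, 6.0, 30.0, 40.0])


def toy_objective(x):
    return -float(np.sum(((x - PEAK) / (UPPER - LOWER)) ** 2))


best_x, best_f = pso_maximize(toy_objective, LOWER, UPPER)
print(best_x, best_f)
```

In the workflow described in the text, the placeholder objective would be replaced by HY, HPR, or TER computed from the ANN's predicted H2 volume.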
2.4.2. RSM Model Development and Optimization
2.4.3. Evaluation and Comparison Between ANN and RSM
3. Results and Discussion
3.1. ANN Model Development
First, PSO-based feature selection was implemented to reduce the size of the dataset and thus save time in the ANN training procedure. The effects of various sizes of the selected dataset on the performance of ANN training are presented in Table S2 and Figure S2 (Supporting Information). A smaller dataset gave a shorter training time: the training time decreased from 36.2 s for the original dataset to only 2.2 s for a selected dataset size of 1%. However, the reduced size also causes a loss of information from the original dataset, which lowers the accuracy of the model. As a result, a size of 5% (i.e., corresponding to 456 data points) was considered the best choice, cutting training time by over 75%, to 8.6 s, without losing the information of the original dataset.
Table 2 lists the performance of different ANN models in replicating the kinetic model. For simplicity, only the model with the lowest MSE for each activation function is shown. Because of the complexity of DF@WH, ANN models with a single hidden layer were insufficient to replicate the outcomes of the kinetic model. ANNs with more hidden layers performed better but required more training time because their larger number of hidden neurons made them more complex. However, when the number of hidden layers rose to three, the excessive complexity of these models caused overfitting, which in turn reduced the goodness of fit [55, 56].
Hidden layer number | Activation function | Optimal hidden neuron number | MSE (training) | MSE (testing) | MSE (validation) | MSE (all) | NRMSE (all) | R2 (all) | Training time (s) |
---|---|---|---|---|---|---|---|---|---|
1 | elliotsig | 15 | 3764.3 | 14,318.6 | 4436.0 | 5451.0 | 0.0385 | 0.9868 | 3.77 |
1 | logsig | 12 | 5787.1 | 11,638.3 | 8212.8 | 7030.7 | 0.0437 | 0.9830 | 3.73 |
1 | radbas | 20 | 2523.9 | 10,899.5 | 8100.6 | 4620.2 | 0.0354 | 0.9888 | 2.03 |
1 | radbasn | 20 | 786.5 | 8505.5 | 17,961.5 | 4526.7 | 0.0351 | 0.9890 | 5.95 |
1 | ReLU | 16 | 4911.5 | 13,692.9 | 16,338.3 | 7947.7 | 0.0465 | 0.9808 | 0.58 |
1 | satlin | 20 | 5532.0 | 23,989.0 | 11,245.4 | 9163.5 | 0.0499 | 0.9778 | 0.64 |
1 | satlins | 18 | 4691.6 | 5522.4 | 31,629.5 | 8863.8 | 0.0491 | 0.9786 | 0.51 |
1 | softmax | 18 | 599.2 | 1283.5 | 835.0 | 737.5 | 0.0142 | 0.9982 | 5.13 |
1 | tansig | 16 | 2381.5 | 3251.8 | 6644.3 | 3152.7 | 0.0293 | 0.9924 | 4.98 |
1 | tribas | 19 | 3693.9 | 5336.8 | 16,416.0 | 5852.2 | 0.0399 | 0.9858 | 0.55 |
2 | elliotsig | 16-5 | 46.4 | 950.6 | 1509.9 | 402.2 | 0.0105 | 0.9990 | 6.46 |
2 | logsig | 16-8 | 14.7 | 24.3 | 1727.5 | 273.5 | 0.0086 | 0.9993 | 8.12 |
2 | radbas | 8-17 | 145.3 | 3305.6 | 7310.8 | 1696.7 | 0.0215 | 0.9959 | 8.23 |
2 | radbasn | 19-6 | 1.4 | 49.9 | 394.5 | 67.8 | 0.0043 | 0.9998 | 9.11 |
2 | ReLU | 13-11 | 586.4 | 2849.2 | 2919.6 | 1276.9 | 0.0186 | 0.9969 | 0.63 |
2 | satlin | 19-9 | 154.9 | 722.8 | 1450.7 | 434.9 | 0.0109 | 0.9989 | 0.85 |
2 | satlins | 16-18 | 275.4 | 2600.9 | 6274.1 | 1526.1 | 0.0204 | 0.9963 | 0.88 |
**2** | **softmax** | **9-11** | **42.5** | **51.4** | **44.6** | **44.2** | **0.0025** | **0.9999** | **8.63** |
2 | tansig | 12-14 | 19.8 | 41.4 | 488.7 | 93.5 | 0.0050 | 0.9997 | 9.53 |
2 | tribas | 20-16 | 2022.7 | 6076.0 | 8761.0 | 3644.1 | 0.0315 | 0.9912 | 1.08 |
3 | elliotsig | 9-13-14 | 604.1 | 2955.8 | 1948.8 | 1159.5 | 0.0147 | 0.9972 | 14.93 |
3 | logsig | 8-11-19 | 880.6 | 1766.4 | 2700.3 | 1287.1 | 0.0139 | 0.9969 | 17.07 |
3 | radbas | 11-12-12 | 502.1 | 4444.7 | 7429.2 | 2135.2 | 0.0236 | 0.9948 | 14.40 |
3 | radbasn | 12-16-10 | 119.2 | 1318.2 | 3029.6 | 736.6 | 0.0089 | 0.9982 | 22.10 |
3 | ReLU | 9-17-10 | 1235.1 | 4475.8 | 4932.4 | 2277.5 | 0.0228 | 0.9945 | 1.82 |
3 | satlin | 11-19-10 | 961.4 | 4212.8 | 2919.9 | 1744.2 | 0.0167 | 0.9958 | 2.20 |
3 | satlins | 13-16-10 | 937.9 | 3039.1 | 10,077.4 | 2626.8 | 0.0247 | 0.9936 | 2.02 |
3 | softmax | 10-11-13 | 126.0 | 236.2 | 163.2 | 148.2 | 0.0042 | 0.9996 | 20.07 |
3 | tansig | 12-14-16 | 374.0 | 523.0 | 1412.0 | 552.4 | 0.0087 | 0.9986 | 21.05 |
3 | tribas | 8-14-11 | 2273.4 | 5965.1 | 9909.2 | 3975.3 | 0.0327 | 0.9904 | 2.37 |
- Note: The boldface represents the best performance.
- Abbreviations: ANN, artificial neural network; MSE, mean square error; NRMSE, normalized root mean squared error.
Table 2 shows that the ReLU, tribas, satlins, and satlin functions trained rapidly, taking about 2 s or less, but performed poorly. The radbas and radbasn functions were subpar in both training time and performance. The elliotsig, tansig, logsig, and softmax functions all required more training time, but their results were superior. The optimal ANN model was thus found with the softmax function and a 4-9-11-1 network topology. With the lowest overall MSE and NRMSE of 44.2 and 0.0025, respectively, this model demonstrated the best performance. Figure S1 and Table S3 (Supporting Information) provide the specific characteristics of the best ANN model. Furthermore, Figure 3 illustrates that the best ANN model’s predicted data were close to the kinetic model-generated synthetic data. All R2 values in ANN training, testing, and validation surpass 0.999, indicating that the ANN model successfully reproduced the kinetic model.
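The 4-9-11-1 topology can be made concrete with a small forward-pass sketch. The weights here are random placeholders rather than the trained values reported in Table S3, the linear output layer is an assumption, and the input row is an arbitrary point inside the studied ranges; in practice the inputs would also be scaled before use.

```python
import numpy as np

LAYER_SIZES = [4, 9, 11, 1]  # inputs, two hidden layers, one output


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(sizes, seed=0):
    """Random placeholder weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(0.1 * rng.standard_normal((n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Softmax activation in the hidden layers, linear output (assumed)."""
    a = x
    for weights, bias in params[:-1]:
        a = softmax(a @ weights + bias)
    weights, bias = params[-1]
    return a @ weights + bias


params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)

# One illustrative input: WH (g-TS/L), initial pH, temperature (degC), time (h).
x = np.array([[20.0, 7.0, 35.0, 48.0]])
y = forward(params, x)
print(n_params, y.shape)
```

A network of this shape has (4·9 + 9) + (9·11 + 11) + (11·1 + 1) = 167 trainable parameters, small enough that a single prediction is essentially instantaneous.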






3.2. Comparison With RSM
The predicted results from RSM and the corresponding computational synthetic data are shown in Table S4 (Supporting Information). The ANOVA results for the HY, HPR, and TER responses are presented in Tables S5–S7 (Supporting Information), respectively. The quadratic equation coefficients for these responses in relation to substrate concentration, initial pH, temperature, and operating time are shown in Table S8 (Supporting Information). These RSM models were statistically significant, as indicated by their high F-values of 4.8–52.9 and low p-values of less than 0.003 [54].
Nevertheless, the RSM models were substantially less accurate than the ANN model. Figure 4 shows the correlation between the kinetic model-generated synthetic data and the data predicted by the ANN and RSM models. The ANN model predicted better than the RSM models, with R2 values of ≥ 0.9999 compared to 0.8283–0.9815 for RSM. Besides, the ANN model’s NRMSE values were considerably lower than those of the RSM models (i.e., 0.004–0.005 vs. 0.049–0.108). In contrast to RSM, which utilizes a second-order polynomial, the ANN model is able to describe higher-order nonlinear systems, which accounts for its superior prediction accuracy [19, 25]. These results again confirm that the best ANN model is reliable for further simulation and optimization.
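For readers who want to reproduce this kind of comparison, the two metrics can be computed as in the sketch below. The R2 formula is the standard coefficient of determination; normalizing the RMSE by the range of the reference data is an assumption, since the normalization used for NRMSE is not stated in this text. The sample arrays are made-up numbers for illustration only.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the reference data (assumed normalization)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())


# Made-up reference and predicted values, for illustration only.
reference = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
predicted = np.array([0.0, 1.0, 2.0, 3.0, 5.0])
print(r_squared(reference, predicted), nrmse(reference, predicted))
```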



3.3. ANN Simulation
The simulated impacts of WH concentration, temperature, initial pH, and operating time on DF@WH from the best ANN model are shown in Figure 5. In addition, the detailed energy inputs and outputs of DF@WH were simulated and are illustrated in Figure 6. These simulation results were obtained under the base conditions of a substrate concentration of 20 g-TS/L, an initial pH of 7.0, a temperature of 35°C, and an operating time of 48 h. The simulation results of the ANN model agreed well with the kinetic model-generated synthetic data, which again confirms the excellent precision of the ANN model. Moreover, the best ANN model considerably reduced computational time in comparison to the kinetic model. To generate the data for Figure 5 (i.e., 50 data points from 22 different operating condition sets), the kinetic model required over 1100 s; meanwhile, the ANN model needed less than 2 s to emulate a 15-times-larger dataset (i.e., 750 data points from 330 different operating condition sets).
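The timing figures above imply a per-point speedup that is easy to work out. The sketch below uses the quoted bounds (over 1100 s for 50 points, under 2 s for 750 points), so the result is a lower bound rather than a measured value.

```python
# Quoted bounds from the text: kinetic model > 1100 s, ANN < 2 s.
kinetic_seconds, kinetic_points = 1100.0, 50
ann_seconds, ann_points = 2.0, 750

kinetic_per_point = kinetic_seconds / kinetic_points  # seconds per data point
ann_per_point = ann_seconds / ann_points              # seconds per data point
speedup = kinetic_per_point / ann_per_point

print(f"kinetic: {kinetic_per_point:.1f} s/point")
print(f"ANN: {ann_per_point * 1000:.2f} ms/point")
print(f"speedup of at least about {speedup:.0f}x")
```

On a per-point basis this works out to roughly 22 s versus under 3 ms, a speedup on the order of several thousand.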








It can be observed in Figure 5 that substrate concentration, initial pH, temperature, and operating time have a multidimensional effect on the energy efficiency and performance of DF@WH. Accordingly, a WH concentration that is too high or too low causes a decline in the performance of DF@WH. As is known, low substrate concentrations cause substrate limitation, while high concentrations lead to substrate inhibition, which reduces hydrolysis and acidogenesis rates [43]. On the other hand, an initial pH and temperature that are too low or too high reduce DF@WH’s productivity and energy efficiency, which are modeled in their inhibition factors [43]. Meanwhile, HY and TER increased in the first 72 h of operation and then changed insignificantly when DF@WH continued to be operated for up to 168 h. This caused a decrease in HPR when the operation time was prolonged.
Figure 6 illustrates the detailed energy output (i.e., H2 energy) from and energy input into the DF@WH system. The H2 energy produced follows a trend similar to that of HY since both are proportional to the cumulative volume of H2 produced, as illustrated in Equations (1) and (4), respectively. However, this H2 energy output was significantly lower than the energy input into DF@WH, which is mainly the energy contained in the substrate. This is consistent with the results in Figure 5, as increasing substrate concentration significantly reduced the energy conversion efficiency of DF. Besides, whereas a change in initial pH does not change the energy input, increased operating time and temperature increase the energy input requirements associated with heating, heat loss compensation, and agitation. In other words, excessively increasing the substrate concentration, temperature, and operating time increases the input energy, which in turn can reduce the energy recovery of DF@WH. From these simulation results, it is clear that to maximize the performance and efficiency of DF@WH, the substrate concentration, initial pH, temperature, and operating time must be optimized.
3.4. ANN–PSO Optimization
The iteration diagrams and optimal values from the optimization process are displayed in Figure 7. The PSO procedure was fast, taking roughly 3 s and fewer than 50 iterations. Such fast calculation of the ANN–PSO approach has also been reported in previous studies [25, 28, 57]. As can also be observed in Figure 7, to achieve the highest performance and efficiency, DF@WH should be operated at an initial pH of 6.5. In contrast, no single value of the other operational parameters (substrate concentration, temperature, and operating time) maximizes the performance and efficiency of DF@WH simultaneously. HY reached its maximum value of 266.8 mL/g-TS when DF was operated at 5.0 g-TS of WH per liter and 34.4°C for 168 h. HPR reached its maximum of 80.5 mL/L/h at a WH concentration of 15 g-TS/L and 34.5°C after 23.3 h of operation. Meanwhile, to achieve the optimal energy conversion value of 11.4%, DF@WH must be operated at 7.8 g/L and 32.6°C for 34 h. These optimal conditions are within the typical optimal range previously observed for H2 production from DF [39, 58, 59].






When all performance parameters (i.e., HY, HPR, and TER) were assumed to be of equal importance, the co-optimal conditions for DF@WH were determined at 8.9 g-TS/L of WH, initial pH of 6.5, a temperature of 33.9°C, and operating time of 28.2 h. Under such co-optimal conditions, HY, HPR, and TER could be achieved at 200.2 mL/g-TS, 62.9 mL/L/h, and 11.0%, respectively.
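The reported co-optimal values can be cross-checked against each other with simple arithmetic. The sketch below assumes that HY is the accumulated H2 volume divided by the substrate concentration and that HPR is the same volume divided by the operating time; these definitions are inferred from the units, because Equations (1)–(3) are not reproduced in this text.

```python
# Reported co-optimal conditions and results.
substrate_g_ts_per_l = 8.9
operating_time_h = 28.2
hy_ml_per_g_ts = 200.2
hpr_reported_ml_per_l_h = 62.9

# Assumed definitions (inferred from units, not quoted from the paper):
#   HY  = V / substrate concentration
#   HPR = V / operating time
volume_ml_per_l = hy_ml_per_g_ts * substrate_g_ts_per_l
hpr_implied = volume_ml_per_l / operating_time_h
relative_gap = abs(hpr_implied - hpr_reported_ml_per_l_h) / hpr_reported_ml_per_l_h

print(f"implied H2 volume: {volume_ml_per_l:.1f} mL/L")
print(f"implied HPR: {hpr_implied:.1f} mL/L/h (reported 62.9)")
print(f"relative gap: {relative_gap:.2%}")
```

Under these assumed definitions the implied rate agrees with the reported 62.9 mL/L/h to within about 1%, a gap small enough to be explained by rounding of the reported inputs.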
Figure 8 presents the experimental data under the co-optimal conditions determined from the RSM (i.e., 14.1 g-TS/L, initial pH of 6.5, 31.8°C, and 111.9 h) and ANN–PSO approaches. Additionally, other optimization results of the RSM implementation are presented in Table S9 (Supporting Information). Figure 8 also shows that the predicted optimal values from RSM were lower and less accurate than those from the ANN–PSO approach. Specifically, the accuracy of RSM was only 80%–87% and less than 77% when compared to the kinetic model and experimental data, respectively. These values were significantly lower than those from the ANN–PSO approach, with an accuracy of more than 99% compared to the kinetic model and more than 96% compared to the experimental data (details of the experimental procedure can be found in Appendix A, Supporting Information), reaffirming the predictive precision of the ANN model as well as the ANN–PSO optimization strategy.




It should be noted that, due to the thermodynamic limit of the DF process, at least two-thirds of the hydrogen is retained in solution as soluble by-products (such as volatile fatty acids and alcohols) instead of being converted into H2 gas [60]. This results in low H2 productivity and energy conversion efficiency of DF@WH. The optimal yield obtained in this study was around 50% higher than the highest value found in the literature (i.e., 134.9 mL/g-TS at 18 g-WH/L, 35°C, initial pH 6, and 24 h of operation [11]). However, it only reached approximately one-fifth of the theoretical H2 yield of WH (i.e., ~1000 mL/g-TS [12]). Therefore, other downstream processes that can further convert soluble by-products to H2, such as microbial electrolysis cells and photofermentation, need to be further investigated and optimized. To achieve these goals, further in-depth analyses and simulations of microbial community variation and soluble by-product formation in DF@WH are also required.
4. Conclusion
This study demonstrates that the combined approach of the physics-based model, ANN model, and PSO algorithm is an effective strategy for accurate and rapid simulation and optimization of DF@WH. The simulation results revealed that substrate concentration, initial pH, temperature, and operating time have synergistic effects on DF@WH that must be optimized. The optimization results showed that achieving the highest energy efficiency and performance values at the same time was not possible under a specific optimal operational condition. At co-optimal conditions (i.e., H2 productivity and TER were seen as being equally significant), the H2 yield could be achieved at 200.2 mL/g-TS at the production rate of 62.9 mL/L/h with a TER of 11.0%. However, combining DF@WH with other downstream processes, such as microbial electrolysis cells and photofermentation, needs further investigation and optimization to enhance the H2 productivity and energy recovery from WH.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
This work was supported by Ho Chi Minh City University of Natural Resources and Environment, Vietnam; Duy Tan University, Vietnam; and Gachon University, South Korea.
Supporting Information
Additional supporting information can be found online in the Supporting Information section.
Open Research
Data Availability Statement
The data that support the findings of this study are available in the supporting information of this article.