Department of Computer Sciences , Faculty of Mathematics , Statistics and Computer Science , Semnan University , Semnan , P.O. Box: 35195–363 , Iran , semnan.ac.ir

Alireza Sabagh Moeini,

Faculty of Physics , Semnan University , Semnan , P.O. Box: 35195–363 , Iran , semnan.ac.ir

Fatemeh Shariatmadar Tehrani,

Corresponding Author

orcid.org/0000-0003-1479-8546

Faculty of Physics , Semnan University , Semnan , P.O. Box: 35195–363 , Iran , semnan.ac.ir

Alireza Naeimi-Sadigh,

orcid.org/0000-0001-9225-649X

Department of Computer Sciences , Faculty of Mathematics , Statistics and Computer Science , Semnan University , Semnan , P.O. Box: 35195–363 , Iran , semnan.ac.ir

First published: 19 May 2025

https://doi.org/10.1155/er/9974355

Academic Editor: Chaofan Sun

Abstract

Perovskites are a class of materials known for their diverse structural, electronic, and optical properties. The band gap of a perovskite is crucial in determining its suitability for applications such as solar cells, light-emitting diodes, and photodetectors. Because the band gap can be tuned through compositional and structural modifications, perovskites can be optimized for specific optoelectronic and energy-related applications, making them versatile materials in modern technology. Machine learning (ML) provides an efficient approach to predicting band gaps from atomic and structural features, facilitating the discovery of materials with tailored electronic properties. This study employs adaptive boosting regression (ABR), random forest regression (RFR), and gradient boosting regression (GBR) for band gap prediction, alongside support vector machine (SVM), random forest classifier (RFC), and multilayer perceptron (MLP) models for classifying compounds as having zero or nonzero band gaps. Regression models are assessed using mean absolute error (MAE), mean squared error (MSE), and R², while classification performance is evaluated using accuracy, precision, recall, and F1-score. ABR performs best in predicting the band gaps of inorganic perovskites, while RFC is the most effective classifier. Feature analysis identifies the standard deviation of valence charges as the key predictor. This study underscores ML’s potential to accelerate perovskite discovery through accurate band gap predictions.

1. Introduction

The study of perovskites is not new: research on these materials dates back to 1839, when the mineral calcium titanate (CaTiO₃) was discovered [1, 2]. The mineral was discovered in the Ural Mountains of Russia by Gustav Rose and named after the Russian mineralogist Lev Perovski [3]. Perovskites are significant because of their exceptional properties and wide range of uses in emerging technologies. The attractive electrical and magnetic properties of perovskites with the ABX₃ structure have led to their use in solar cells [4–6], photodetectors [7–9], high-temperature superconductivity [10–12], ferroelectricity [13–15], artificial synapse devices [16–18], and other fields.

Ceramics (processed inorganic materials) are among the most widely produced materials by humankind. Among them, a few specific ceramic phases remain dominant in human-made products because of their advantages in weight and volume and, especially, their technological significance [19]. Examination of lists of ternary crystal structures highlights 12 key structures in the world of ceramics [20]. One of the most important is the ABX₃ (perovskite) structure, which, through chemical modification, can offer a diverse set of phases with highly varied functionalities.

The general formula of perovskites is ABX₃, where the A cation (usually much larger than the B cation) is 12-fold coordinated by the X anions, while the B cation is 6-fold (octahedrally) coordinated by the X anions. The A cation is typically an alkali metal, an alkaline earth metal, or a rare earth element, while the B cation is typically a transition metal. The X anions are commonly O or the halogens F, Cl, Br, and I [21].

The empirical criterion for predicting the crystal structure of an ABX₃ compound is Goldschmidt’s tolerance factor t [22, 23]:

$$ t = \frac{R_A + R_X}{\sqrt{2}\,(R_B + R_X)} \qquad (1) $$

Here R_A, R_B, and R_X are the ionic radii of the A, B, and X ions, respectively. When t equals 1, the perovskite adopts the ideal cubic structure. Deviations from t = 1 introduce geometric strain and crystal distortions, and as t diverges further from unity, the crystal transitions to lower-symmetry structures. Calculating t therefore allows the crystal structure to be predicted and its geometric strain and stability to be assessed. A cubic structure is expected when t is close to 1. For t < 0.96, the structure is typically orthorhombic, while 0.96 < t < 1 corresponds to a rhombohedral structure. For t > 1, the A-site cation is too large, which impedes perovskite formation and leads to hexagonal structures. Similarly, if t < 0.8, the A-site cation is too small, potentially resulting in alternative structures [24–26].
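The tolerance-factor criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: Eq. (1) is implemented directly, and the structure ranges quoted in the text are mapped to labels. The `cubic_tol` parameter is a hypothetical stand-in for "close to 1", which the text does not quantify, and the radii in the usage line are illustrative placeholders rather than values from the dataset.

```python
import math


def tolerance_factor(r_a: float, r_b: float, r_x: float) -> float:
    """Goldschmidt tolerance factor, Eq. (1): t = (R_A + R_X) / (sqrt(2) * (R_B + R_X))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def expected_structure(t: float, cubic_tol: float = 0.01) -> str:
    """Rough structure label from the t ranges quoted in the text.

    cubic_tol is a hypothetical threshold for "t close to 1"; it is not taken
    from the source, and the exact boundaries vary between references.
    """
    if abs(t - 1.0) <= cubic_tol:
        return "cubic"
    if t > 1.0:
        return "hexagonal"       # A cation too large
    if t < 0.8:
        return "non-perovskite"  # A cation too small
    if t < 0.96:
        return "orthorhombic"
    return "rhombohedral"        # 0.96 <= t < 1


# Illustrative radii in angstroms (placeholders, not taken from the dataset).
t = tolerance_factor(r_a=1.40, r_b=0.65, r_x=1.40)
print(f"t = {t:.3f} -> {expected_structure(t)}")
```

The thresholds are checked from the cubic window outward so that the overlapping verbal ranges in the text resolve to a single label.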

The term "inorganic perovskites" covers several branches, each with distinct properties of its own. Oxide perovskites share the general ABX₃ structure but have oxygen (O) exclusively on the X site, whereas halide perovskites have Cl, Br, or I on the X site; double and layered perovskites are further subclasses of inorganic perovskites [27–30].

In materials science, the band gap is essential to understanding how materials respond to light and electricity. Its magnitude determines a material’s electrical conductivity and light absorption capacity, and ultimately its classification as an insulator, conductor, or semiconductor [31]. Many perovskites are semiconductors, which makes them suitable for photovoltaic applications (solar cells). Recently, their exceptional properties and potential applications in various technologies have made them highly attractive [32–35]. Electronic parameters such as the band gap can be determined both computationally and experimentally. Although experimental procedures generally offer greater reliability, they have drawbacks [36–38]. Density functional theory (DFT) is one of the best-known computational techniques and is widely used in condensed matter physics for many computations, including band gap estimation [39–42]. The local density approximation (LDA) and the generalized gradient approximation (GGA), two popular approaches for computing ground-state electronic properties, are based on the uniform electron gas and on density gradients (non-uniform electron densities), respectively; however, they do not produce accurate results for excited-state properties and typically underestimate band gaps [43–45]. Hybrid functionals, on the other hand, mix a fraction of exact exchange from Hartree–Fock theory into the DFT exchange-correlation functional. The GW approximation, based on many-body perturbation theory, is reliable and accurate, especially for analyzing electronic and optical properties [46, 47]. HSE06 is a widely used hybrid functional in DFT computations: it combines the GGA with a portion of exact Hartree–Fock exchange to increase the precision of electronic structure calculations, especially band gap predictions for semiconductors and insulators [48, 49].

Machine learning (ML) is another computational technique for predicting the electronic characteristics of materials. ML techniques offer a powerful means to accelerate materials discovery while significantly reducing computational costs, with accuracy that can approach that of traditional first-principles methods. The integration of big data, artificial intelligence, and materials modeling has ushered in a transformative era in materials design [50–53]. ML approaches developed by the statistical learning community are steering research toward a new, data-driven paradigm of science. ML has also demonstrated remarkable efficacy in forecasting the band gaps of various perovskites [54–58]. Extracting certain features through experiments or DFT calculations can be costly and time-intensive; however, features for ML can be generated efficiently from the properties of the constituent elements of a compound. Feature engineering, encompassing both feature construction and feature selection, plays a key role in the ML workflow. In many ML processes, a model’s maximum performance is largely determined by the validity of the features, the sample size, and the dimensionality of the feature space [59, 60]. Keisuke Takahashi et al. [61] investigated the potential of undiscovered perovskite materials for solar energy applications, combining data science and first-principles computations to find materials with promising photovoltaic properties. The work used an ML model, specifically a random forest (RF), to predict the band gaps of a dataset of 15,000 perovskite compounds, the band gap being a crucial quantity for solar energy absorption. According to the model, the band gap is mostly influenced by 18 physical characteristics. After training, the algorithm predicted 9328 candidate materials with possible uses in solar cell applications.
Among these, a subset of lithium (Li)- and sodium (Na)-based materials was analyzed further with first-principles calculations, revealing 11 novel materials with band gaps and formation energies suitable for efficient photovoltaic application [61]. Li et al. [62] showed that band gap predictions for ABO₃ perovskites can be significantly enhanced by integrating structural information (such as bond-valence vector sums) and formation energy in a progressive learning model. Their method effectively overcomes the difficulty posed by the structural diversity of perovskites, offering a strong foundation for materials discovery in industries such as photovoltaics; the model’s effective mapping of the link between structural properties and band gap values increased prediction precision [62]. Huang et al. selected 8 elemental features for wurtzite nitride semiconductors: covalent radius, melting point, valence, atomic weight, atomic number, periodic number, and first ionization energy. The models were trained and evaluated using a feature space derived from the 58-dimensional space representing all possible combinations of the 8 elemental characteristics. From a physical perspective, covalent radius and valence were shown to be the most significant electronic qualities of the elements, and these characteristics are believed to be the most applicable and effective descriptors for predicting electronic band gaps and band alignment [63]. Shun Feng et al. used ML approaches to predict the band gaps of organic–inorganic hybrid perovskites, which are essential for their optical characteristics and use in optoelectronic devices. The study assembled a dataset of 1208 entries with 30 feature descriptors pertaining to the A, B, and X components of perovskites. Four algorithms were used to create prediction models: XGBoost, RF, LightGBM, and gradient boosting regression (GBR).
With a mean absolute error (MAE) of 0.0901, a mean squared error (MSE) of 0.0173, and an R² value of 0.9913, the XGBoost model outperformed the others, demonstrating remarkable accuracy in band gap prediction. Its predictions were interpreted with the SHAP (SHapley Additive exPlanations) approach, which showed that the band gap is strongly influenced by the occupancy rate of the A-site ion, which is negatively correlated with the predicted values [64].

Vladislav Gladkikh et al. [65] used ML algorithms to describe the association between the elemental features of the component ions and the band gap of ABX₃ perovskites. The nonlinear mappings between the predictors and the band gap were discovered through data analysis using alternating conditional expectations (ACE), a lightweight, nonparametric ML technique. This approach avoids the curse of dimensionality, makes no prior assumptions about the functional form of the descriptors, and does not require much computing power. The band gap is shown to be predominantly influenced by the atomic radii, ionization energies, and electron affinities of the constituent elements, and this relationship is nonlinear in these descriptors. Further research is required to unravel the molecular mechanisms underlying this dependence [65]. Zhang et al. [66] used a set of 20 physical attributes to model and predict the band gaps of double perovskite materials. Their findings revealed that properties such as bulk modulus, superconducting transition temperature, and cation electronegativity play the most significant roles, reflecting their strong connection to the material’s electronic structure [66]. Zhuo et al. [67] employed 5 distinct ML models, each with features derived only from the elemental characteristics of the constituent elements, to predict the band gaps of inorganic solids. These attributes include, among other things, the physical properties of the atoms, their electronic structure, and their relative position in the periodic table [67]. Obada et al. [68] investigated the use of ML approaches to predict the electronic band gaps of ABX₃ perovskites. To create predictive models and provide interpretability, a critical component of materials science, the study uses explainable AI (XAI) tools, which help researchers better understand the fundamental causes of band gap behavior.
The authors employed a range of ML models, such as SVM and decision trees, and used techniques such as SHAP to explain the results and determine the significance of features such as atomic number, valence electrons, ionic radius, and first ionization energy. This explainability helps researchers better understand the connections between the band gaps and the structural and chemical characteristics of perovskites, which will inform future materials design [68]. Yang et al. [69] aimed to expedite the search for stable, lead-free, and environmentally benign hybrid perovskites with ideal band gaps for solar cells and optoelectronic applications. To predict band gaps, the authors used ML models such as neural networks (NN) and RF, and applied explainable AI (XAI) techniques such as SHAP to interpret how different elemental properties, such as atomic radii and ionization energy, influence the predictions. The model identified important parameters affecting the band gap, including the organic cation in the structure and the choice of halide on the X site. It effectively predicted novel perovskite materials with the intended band gaps, demonstrating the usefulness of interpretable ML in directing research [69]. To maximize material efficiency for electronic applications, Eti Mahal et al. used ML to predict the type of band alignment (type-I, type-II, or type-III) in 2D hybrid perovskites. The authors assembled a dataset of known 2D perovskite materials with band alignment information and trained predictive models with a variety of ML algorithms, such as SVM and RF, based on the structural and electronic characteristics of the perovskites. Electronic properties, lattice parameters, and elemental properties were important factors in the band alignment prediction. Feature importance analysis showed that properties such as band gap levels and differences in component electronegativity have a significant impact on the alignment predictions [70].

In this study, our data are derived from the paper by Chenebuah and Chenebuah [71], who extracted 16,323 data points from the Open Quantum Materials Database (OQMD). By compiling tens or even hundreds of thousands of DFT simulations into extensive databases, high-throughput DFT (HT-DFT) is rapidly becoming a powerful tool for accelerating materials design and discovery. Because these databases cover an extensive variety of structures and chemistries, complex materials challenges can be addressed much more thoroughly and efficiently. The OQMD contains over a million DFT-calculated crystal structures [71, 72]. Our aim is to classify inorganic perovskites into two categories, with zero and nonzero band gaps (E_g), and then to predict the band gap of these materials using ML. For band gap prediction we tested 14 different ML models, and for classification 7 different ML models, with various test-to-train ratios (5/95, 10/90, 15/85, 20/80, 25/75, 30/70), applying Grid Search Cross-Validation (GridSearchCV) to all models. Among them, MLP, SVM, and RFC proved to be the best for classification, while ABR, RFR, and GBR were the best for predicting the band gap. For both the classification and the regression of band gaps, we selected 41 features. Our approach uses fewer computationally costly features than the work of Chenebuah and Chenebuah [71], and our results are better than those they reported.

In the classification task, the RFC model outperformed SVM and MLP, achieving a cross-validation accuracy of about 90% and the highest F1-scores among the models. In band gap prediction, the ABR model outperformed RFR and GBR, reaching a test R² of about 88% and the lowest mean absolute error (MAE) and mean squared error (MSE) among the models. This work is in line with our previous study on the feature importance of low-symmetry perovskites [73], once again highlighting the importance of valence (std) and valence (mean) for band gap prediction.

2. Data and Features

The dataset, extracted from the OQMD, contains 16,323 samples of ABX₃ inorganic perovskite structures [71, 72]. Of these, 11,316 (about 69%) have a zero band gap, whereas 5,007 (about 31%) have a nonzero band gap. In the common formula ABX₃ of inorganic perovskites, A is usually a large cation, B is a smaller metal cation, and X is an anion, frequently a halide; in this dataset X is one of Te, I, H, N, Se, Br, S, F, Cl, and O. Except for space group, crystal structure, and formation energy, every attribute of a perovskite is determined by the features of its constituent atoms, so no perovskite-specific features need to be measured. For each significant atomic property, such as mass, covalent radius, and valence, the mean and standard deviation (std) over the constituent elements are computed and reported as features; this guarantees that every compound has the same number of features. Our goal in this study was to use as few computational features as possible. However, the data include perovskites with the same formula but different space groups, crystal structures, or formation energies. If these three computational features were not added to the feature set, the model could not discriminate between perovskites with the same formula, so we must use them. Two main aims have been established for this study. First, we classify inorganic perovskites into two groups, with zero and nonzero band gaps, using novel features and several ML algorithms. Then, we predict the band gap using all of the data.
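The mean/std featurization described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes each site is weighted by its stoichiometry (X counted three times), which the text does not specify, and the two-property lookup table `ELEMENT_PROPS` is a hypothetical stand-in for the 19 properties of Table 1 (the atomic numbers and standard atomic masses shown are real values).

```python
import numpy as np

# Hypothetical, trimmed property table; a real pipeline would hold all
# 19 elemental properties of Table 1 for every element in the dataset.
ELEMENT_PROPS = {
    "Ca": {"atomic_number": 20, "atomic_mass": 40.078},
    "Ti": {"atomic_number": 22, "atomic_mass": 47.867},
    "O": {"atomic_number": 8, "atomic_mass": 15.999},
}


def elemental_features(a: str, b: str, x: str, props=ELEMENT_PROPS) -> dict:
    """Return {<prop>_mean, <prop>_std} for an ABX3 compound.

    Assumption: sites are weighted by stoichiometry, so X appears three times.
    """
    sites = [a, b, x, x, x]
    feats = {}
    for name in props[a]:
        values = np.array([props[el][name] for el in sites], dtype=float)
        feats[f"{name}_mean"] = values.mean()
        feats[f"{name}_std"] = values.std()  # population std (ddof=0)
    return feats


feats = elemental_features("Ca", "Ti", "O")
print(feats["atomic_number_mean"], feats["atomic_number_std"])
```

With 19 properties this yields the 38 elemental columns; appending space group, crystal structure, and formation energy gives the 41-feature vector.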

3. ML Models

Below is a general definition of the unknown band gap (E_g) that requires estimation:

$$ E_g = f(x) \qquad (2) $$

where x is the vector of known perovskite features and f(x) is a function that models the relationship between these features and the output E_g. Using a labeled perovskite dataset, the training set, ML aims to determine f(x); the trained model is then used for prediction. In this study, we employed three ML models for classification (RFC [74], SVM [75], and MLP [76]) and three models for band gap prediction (ABR [77], RFR [78], and GBR [79]).

4. Criteria

4.1. Band Gap Prediction Criteria

To evaluate the prediction accuracy of each model on the test set, three metrics are used: MAE, MSE, and R².

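$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (3) $$

$$ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \qquad (4) $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2} \qquad (5) $$

In Equations (3)–(5), y_i is the actual band gap of the i-th compound in the test set, ȳ is the mean of the y_i, ŷ_i is the value predicted by the corresponding regression model, and i = 1, 2, …, N, where N is the number of inorganic perovskites in the test set.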
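Equations (3)–(5) can be checked numerically with a short NumPy sketch. The band gap values are hypothetical, and the scikit-learn functions are used only to confirm that the hand-written formulas agree with them.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """MAE, MSE and R^2 written out as in Eqs. (3)-(5)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))            # Eq. (3)
    mse = np.mean(residuals ** 2)               # Eq. (4)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - np.sum(residuals ** 2) / ss_tot  # Eq. (5)
    return mae, mse, r2


# Hypothetical band gaps (eV) and predictions, for illustration only.
y_true = [0.0, 1.2, 2.5, 0.0, 3.1]
y_pred = [0.1, 1.0, 2.7, 0.0, 2.9]

mae, mse, r2 = regression_metrics(y_true, y_pred)
# The hand-written formulas agree with scikit-learn's implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MAE = {mae:.3f} eV, MSE = {mse:.3f} eV^2, R^2 = {r2:.3f}")
```

Note that MAE is in eV while MSE is in eV², since it averages squared residuals.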

4.2. Classification Criteria

We validate the classification results with four metrics: accuracy, precision, recall, and F1-score. Each captures a different aspect of classifier performance, so all four are needed.

Accuracy is a metric that represents the proportion of correct predictions made by the model compared to the total number of predictions. It is commonly used for classification problems.

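$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6) $$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.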

Precision quantifies how reliable the model’s positive predictions are: it is the fraction of true positives among all of the model’s positive predictions.

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7) $$

A high precision means that when the model predicts the positive class, it is likely to be correct.

Recall, also known as sensitivity or the true-positive rate, measures a model’s ability to identify all relevant instances (true positives) in the dataset: it is the fraction of actual positives that the model correctly identifies.

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (8) $$

A high recall means that the model is effective in identifying positive instances.

The F1-score is the harmonic mean of precision and recall, offering a balanced measure of both, which makes it useful when an optimal balance between precision and recall is needed.

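$$ F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (9) $$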

The F1-score is especially helpful when the class distribution is imbalanced (one class is more frequent than the other). In such situations it is often more revealing than accuracy, because a high F1-score requires both precision and recall to be high [80, 81].
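Equations (6)–(9) can be computed directly from the four confusion-matrix counts. This is a minimal sketch with hypothetical counts, treating "nonzero band gap" as the positive class; the helper `safe_div`, which returns 0 when a metric is undefined, is an illustrative choice (scikit-learn exposes a similar `zero_division` option). The second call illustrates the point above about imbalanced classes.

```python
def safe_div(num: float, den: float) -> float:
    """Return num/den, or 0.0 when the denominator is zero (undefined metric)."""
    return num / den if den else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Eqs. (6)-(9) computed from the four confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)                     # Eq. (7)
    recall = safe_div(tp, tp + fn)                        # Eq. (8)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),  # Eq. (6)
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),  # Eq. (9)
    }


# Hypothetical counts, with "nonzero band gap" as the positive class.
print(classification_metrics(tp=80, tn=90, fp=20, fn=10))

# A trivial model that always predicts "zero band gap" on a hypothetical set of
# 690 zero-gap and 310 nonzero-gap compounds: accuracy looks acceptable,
# but the F1-score of the nonzero class exposes the failure.
print(classification_metrics(tp=0, tn=690, fp=0, fn=310))
```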

5. Results

Our data were taken entirely from the publication by Chenebuah and Chenebuah [71], whose dataset of 16,323 inorganic perovskites was itself extracted from the OQMD. Feature selection is an important phase in building ML models, and model performance also benefits from a sufficiently large dataset. Three computational and 38 elemental features were chosen for this study; the selected properties, taken from the OQMD and the periodic table, are listed in Table 1 [72, 82]. For all properties except space group, crystal structure, and formation energy, the mean and standard deviation (std) are taken, giving 3 + 2 × 19 = 41 features. RFC, SVM, and MLP are the three ML models used for classification, and ABR, RFR, and GBR are the three models used to predict the band gap. The ranges of hyperparameters tested and the best hyperparameters for band gap classification (C) and prediction (P), found with grid search, are given in Table 2.

Table 1. Features selected for this study (chosen without bias) and their units.

Features	Unit
Space group	—
Crystal structure	—
Formation energy	eV/atom
Atomic mass	amu
Boiling point	K
Density	g/L
Static average electric dipole polarizability	Å³
Period	—
Electron affinity	kJ/mol
Villars modified mendeleev number	—
Group	—
Pettifor mendeleev number	—
Atomic radius	Å
First ionization energy	kJ/mol
Specific heat capacity	J/(kg·°C)
Atomic number	—
Heat of fusion	J/g
Heat of vaporization	kJ/mol
Thermal conductivity	W/(m·K)
Molar volume	cm³/mol
Valence	—
Covalent radius	Å

Note: All values, except the first three, have their means and standard deviations computed.

Table 2. The range of hyperparameters tested and the best hyperparameters in band gap classification (C) and prediction (P).

Models	Range of hyperparameters tested	Best hyperparameters
SVM (C)	C: [1,10,100], degree: [2, 3, 4], kernel: [linear, rbf, poly], gamma: [scale, auto]	C: 100, degree: 2, kernel: rbf, gamma: auto

RFC (C)	n_estimators: [50, 100, 200], max_depth: [None, 10, 20, 30, 40, 50], min_samples_split: [2, 5, 10], min_samples_leaf: [1, 2, 4], max_features: [auto, sqrt, log2]	n_estimators: 200, max_depth: 50, min_samples_split: 10, min_samples_leaf: 4, max_features: sqrt

MLP (C)	hidden_layer_sizes: [(50,), (100,), (50, 50), (100, 50, 25)], activation: [relu, tanh, logistic], solver: [adam, sgd], alpha: [0.0001, 0.001, 0.01], learning_rate: [constant, adaptive], max_iter: [200, 400, 600]	hidden_layer_sizes: (100, 50, 25), activation: tanh, solver: adam, alpha: 0.01, learning_rate: adaptive, max_iter: 600

ABR (P)	estimator__max_depth: [10, 15, 20], n_estimators: [100, 200, 300], learning_rate: [0.01, 0.05, 0.1]	estimator__max_depth: 15, n_estimators: 300, learning_rate: 0.1

RFR (P)	n_estimators: [100, 200], max_depth: [None, 10, 20], min_samples_split: [2, 5, 10], min_samples_leaf: [1, 2, 4]	n_estimators: 200, max_depth: None, min_samples_split: 2, min_samples_leaf: 1

GBR (P)	n_estimators: [100, 200, 300], learning_rate: [0.01, 0.1, 0.2], max_depth: [3, 5, 7], subsample: [0.8, 0.9, 1.0], min_samples_split: [2, 5, 10]	n_estimators: 200, learning_rate: 0.2, max_depth: 7, subsample: 1.0, min_samples_split: 10

5.1. Band Gap Prediction

The 41-dimensional feature set is used to compare the performance of the three ML models in predicting the band gap of inorganic perovskites; the results are shown in Table 3. One objective was to identify the optimal test/train ratio for each of the three models (ABR, RFR, and GBR). RFR and GBR performed best with a test/train ratio of 15/85, whereas ABR performed best with a ratio of 10/90. With an MAE of 0.19 eV, an MSE of 0.18 eV², and an R² of 88% on the test set, ABR produced the best results of the three models, whichever assessment criterion (R², MSE, or MAE) is used. The best-performing model (LGB; MAE ≈ 0.21 eV, R² ≈ 87%) in the study by Chenebuah and Chenebuah [71], from which we obtained the data, performed roughly on par with the weakest model in our study (GBR; MAE ≈ 0.22 eV, R² ≈ 87%). Figure 1 compares the performance of the three ML models in predicting the band gap of ABX₃ perovskites using the 41-dimensional feature set. Because both studies use identical data, this comparison is valid; comparisons with other studies of inorganic perovskites require at least the same dataset, so comparing our findings with studies based on different data may not be meaningful. Two factors account for the improvement in our results: the ML models employed and the features used to train them. One advantage of our work is that we used as few computational features as feasible. Although a model generally learns more efficiently and discriminates better between compounds when given more computational features, we still achieved marginally improved results while lowering computing costs, in line with our objective.
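The test/train-ratio search described above can be sketched as a simple loop over `train_test_split` sizes. This is a hypothetical illustration on synthetic data, with a `RandomForestRegressor` standing in for all three models; it is not the authors’ script.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the 41-feature dataset.
X, y = make_regression(n_samples=300, n_features=41, noise=0.1, random_state=0)

# Test fractions corresponding to the test/train ratios examined in the study.
ratios = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
r2_by_ratio = {}
for test_size in ratios:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    r2_by_ratio[test_size] = model.score(X_te, y_te)  # test-set R^2

best = max(r2_by_ratio, key=r2_by_ratio.get)
print(f"best test/train ratio: {best:.2f}/{1 - best:.2f}")
```

Selecting the ratio by its own test score reuses the test set for model selection, which may make that score slightly optimistic; cross-validated scores such as those in Table 3 provide a useful cross-check.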

Figure 1. Band gap prediction performance on the train and test sets for ABX₃ perovskites using three ML models. The test/train split is 15/85 for the RFR and GBR models and 10/90 for the ABR model. The ABR model provides the best predictions, closely aligning with the ideal line, whereas the RFR and GBR models yield relatively similar outcomes.

Table 3. Statistics of predicted band gaps of ABX₃ perovskites by ABR, RFR, and GBR models based on 41 features.

Band gap prediction model	R² train (%)	R² test (%)	MSE (eV²)	MAE (eV)	Test/train (%)	CV R² (%), CV = 5	CV R² (%), CV = 10
ABR	99	88	0.18	0.18	10/90	86	85
RFR	98	88	0.19	0.21	15/85	86	85
GBR	98	87	0.19	0.22	15/85	85	84

The cross-validation (CV) R² scores are slightly lower than the test scores, which is expected. However, CV = 10 yields a slightly lower R² than CV = 5, which might indicate sensitivity to variance. To investigate how much the reported errors differ between CV = 5 and CV = 10, Table 4 reports the MSE and MAE for both settings. As shown in Table 4, the differences between the two settings are about 0.02–0.04 in both MSE and MAE, consistent with the observed differences in R², and do not indicate any unexpected deviation.
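Reporting R², MAE, and MSE under both CV = 5 and CV = 10, as in Tables 3 and 4, can be sketched with scikit-learn’s `cross_validate`. The data and model below are synthetic placeholders, and the shuffled `KFold` splitter is an assumption; the text does not state how the folds were formed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic placeholder data and model.
X, y = make_regression(n_samples=300, n_features=41, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)

summary = {}
for k in (5, 10):
    cv = KFold(n_splits=k, shuffle=True, random_state=0)
    res = cross_validate(
        model, X, y, cv=cv,
        scoring=("r2", "neg_mean_absolute_error", "neg_mean_squared_error"),
    )
    # scikit-learn maximizes scores, so MAE and MSE are returned negated.
    summary[k] = {
        "R2": res["test_r2"].mean(),
        "MAE": -res["test_neg_mean_absolute_error"].mean(),
        "MSE": -res["test_neg_mean_squared_error"].mean(),
    }

for k, m in summary.items():
    print(f"CV = {k}: R2 = {m['R2']:.3f}, MAE = {m['MAE']:.3f}, MSE = {m['MSE']:.3f}")
```

With 10 folds each validation fold is half the size of a 5-fold one, so per-fold scores fluctuate more; comparing both settings, as Table 4 does, exposes that sensitivity.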

Table 4. Analysis of the differences in MSE and MAE under the two cross-validation settings: CV = 5 and CV = 10.

Band gap prediction model	MSE (eV²), CV = 5	MAE (eV), CV = 5	MSE (eV²), CV = 10	MAE (eV), CV = 10	Test/train (%)

ABR	0.18	0.18	0.20	0.22	10/90
RFR	0.19	0.21	0.22	0.23	15/85
GBR	0.19	0.22	0.22	0.24	15/85

5.2. Classification

To classify inorganic perovskites using the same dataset and selected features, three ML models are employed: RFC, SVM, and MLP. The optimal test/train ratio for all three models was 5/95, which is acceptable given the large size of the dataset. Because the dataset is imbalanced, we also evaluated the models with cross-validation (CV = 5 and CV = 10) to reduce the variance caused by random sampling and to check that generalization is consistent. Four metrics, accuracy, precision (Pre), recall (Rec), and F1-score (F1), were used to validate the models. All models reach an accuracy of about 90% and give satisfactory results even though about 69% of the data have a zero band gap. These metrics and the confusion matrices are shown in Table 5, and the receiver operating characteristic (ROC) curve of each model is plotted in Figure 2. The confusion matrix is a 2 × 2 matrix that summarizes the model’s performance: its main diagonal holds the correctly predicted zero and nonzero band gaps, while the off-diagonal elements hold the incorrect predictions. The first column shows the correct and incorrect predictions for the zero band gap, and the second column shows the same for the nonzero band gap.

An ROC curve is a graphical tool for evaluating a binary classifier: it shows the trade-off between the true-positive rate (TPR) and the false-positive rate (FPR) at various threshold values. The TPR, also called recall or sensitivity, is the fraction of actual positives that the model correctly detects. The FPR is the fraction of actual negatives that the model incorrectly classifies as positive. A perfect classifier, with a TPR of 1 and an FPR of 0, has a curve passing through the upper-left corner of the plot, and the closer the ROC curve is to that corner, the better the model distinguishes positive from negative classes. The area under the ROC curve (AUC) is commonly used as a measure of model performance: AUC = 1 indicates a perfect classifier, AUC = 0.5 indicates a random classifier (equivalent to flipping a coin), and AUC < 0.5 indicates performance worse than random guessing. The closer the AUC is to 1, the better the classifier [83, 84]. The AUC is 0.95 for both MLP and RFC and 0.93 for SVM, which also indicates good results.
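The confusion-matrix and ROC/AUC evaluation described above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the data come from `make_classification` with roughly the dataset’s class imbalance, class 1 stands for "nonzero band gap", and the RFC hyperparameters are the best values from Table 2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in: class 0 (~69%) = zero band gap, class 1 = nonzero band gap.
# With real data the label would be built as y = (band_gap > 0).astype(int).
X, y = make_classification(n_samples=2000, n_features=41, weights=[0.69], random_state=0)

# 5/95 test/train split, stratified so both classes keep their proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.05, stratify=y, random_state=0
)

clf = RandomForestClassifier(
    n_estimators=200, max_depth=50, min_samples_split=10,
    min_samples_leaf=4, max_features="sqrt", random_state=0,
).fit(X_tr, y_tr)

# Rows = true class, columns = predicted class (scikit-learn convention).
cm = confusion_matrix(y_te, clf.predict(X_te))

# The ROC curve sweeps a threshold over the predicted positive-class probability.
scores = clf.predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(cm)
print(f"AUC = {auc:.3f}")
```

Because scikit-learn places true labels on rows and predictions on columns, check the orientation before comparing its output with the layout described for Table 5.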

Table 5. Classification results for MLP, RFC, and SVM: per-class precision (Pre), recall (Rec), and F1-score (F1); accuracy on the test set and with cross-validation (CV = 5, 10); and confusion matrices. The test/train ratio is 5/95 for all models.

Model	MLP			RFC			SVM
Band gap	Pre	Rec	F1	Pre	Rec	F1	Pre	Rec	F1
Zero	0.93	0.94	0.93	0.92	0.96	0.94	0.91	0.95	0.93
Nonzero	0.86	0.82	0.84	0.90	0.82	0.86	0.88	0.81	0.84
Accuracy	0.90			0.91			0.90
Accuracy (CV = 5)	0.89			0.90			0.89
Accuracy (CV = 10)	0.86			0.88			0.86
Confusion matrix
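The per-class values in Table 5 follow from the confusion matrix by standard formulas: precision divides the correct predictions for a class by everything predicted as that class, recall divides them by everything that truly belongs to that class, and F1 is their harmonic mean. The short sketch below makes this explicit. It assumes rows = true class and columns = predicted class, and the counts are hypothetical, chosen only for illustration; they are not the paper's confusion matrices.

```python
import numpy as np


def per_class_metrics(cm, labels):
    """Precision, recall, and F1 per class, plus overall accuracy.

    Assumes a square confusion matrix with rows = true class and
    columns = predicted class.
    """
    cm = np.asarray(cm, dtype=float)
    metrics = {}
    for i, label in enumerate(labels):
        tp = cm[i, i]
        precision = tp / cm[:, i].sum()  # column i: everything predicted as `label`
        recall = tp / cm[i, :].sum()     # row i: everything that truly is `label`
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
    return metrics, accuracy


# Hypothetical counts for illustration only (not taken from the paper).
cm = [[72, 8],
      [3, 17]]
metrics, accuracy = per_class_metrics(cm, ["zero", "nonzero"])
for label, m in metrics.items():
    print(label, {k: round(v, 2) for k, v in m.items()})
print("accuracy", round(accuracy, 2))
```

As a consistency check on Table 5 itself: for RFC on the nonzero class, F1 = 2(0.90)(0.82)/(0.90 + 0.82) ≈ 0.86, which matches the tabulated value.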

6. Feature Importance

Several feature-ranking approaches, such as Permutation Importance (PI) [74, 85] and Local Interpretable Model-Agnostic Explanations (LIME) [86], are used to evaluate the significance of the features. To explain why a model makes a specific prediction for a given data point, LIME locally approximates the complex model around that point with a simpler, interpretable model, such as linear regression. Examining the LIME explanations for various data points and predictions gives a better understanding of how the model behaves with its current hyperparameters; if the explanations are unclear or do not align with our understanding of the data, the hyperparameters should be changed [87]. PI, in contrast, randomly shuffles the values of a single feature, which dissociates that feature from the target variable; the resulting drop in model performance indicates the feature's importance. PI thus ranks features by their impact on model performance. Features whose shuffling consistently causes a significant performance decline are likely to be more influential, whereas features that show minimal or no impact after shuffling may be redundant or noninformative. Such features are candidates for removal, which yields a smaller set of features for further examination [88]. Although our prior work [73] used SHAP (SHapley Additive exPlanations) for feature ranking, this approach was not feasible here. The previous work was conducted on two datasets of 1493 double perovskites and 491 layered perovskites, whereas the present work uses a dataset of 16,323 data points. SHAP values are a powerful tool for feature importance, but their computational cost was prohibitive at this scale. We therefore employed PI and LIME to analyze feature importance and model behavior.
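A from-scratch sketch of both ideas makes their mechanics visible. It assumes a fitted scikit-learn-style regressor with a `predict` method. Here that is an `AdaBoostRegressor` trained on synthetic data, not the paper's model or features. The helper names are hypothetical. `permutation_importance_mae` measures the rise in MAE when one column of held-out data is shuffled. `local_surrogate` is a simplified LIME-style explanation built from Gaussian perturbations, an exponential proximity kernel, and a weighted ridge fit. It is not the `lime` package itself, whose sampling, kernel, and discretization defaults differ. For the first method, scikit-learn also ships a library implementation, `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error


def permutation_importance_mae(model, X, y, n_repeats=5, seed=0):
    """Mean increase in MAE after shuffling each feature column (higher = more important)."""
    rng = np.random.default_rng(seed)
    baseline = mean_absolute_error(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
            increases.append(mean_absolute_error(y, model.predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


def local_surrogate(model, x0, scale, n_samples=1000, seed=0):
    """Simplified LIME-style explanation: weighted linear fit around one instance."""
    rng = np.random.default_rng(seed)
    Z = x0 + rng.normal(size=(n_samples, x0.size)) * scale  # perturbations near x0
    dist = np.linalg.norm((Z - x0) / scale, axis=1)
    width = 0.75 * np.sqrt(x0.size)                        # kernel width (assumption)
    weights = np.exp(-(dist ** 2) / width ** 2)           # closer points count more
    surrogate = Ridge(alpha=1e-3).fit(Z, model.predict(Z), sample_weight=weights)
    return surrogate.coef_                                 # local effect of each feature


# Synthetic regression data (assumption): feature 0 drives the target strongly,
# feature 1 weakly, and feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = X[:450], X[450:], y[:450], y[450:]

model = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
pi = permutation_importance_mae(model, X_test, y_test)
slopes = local_surrogate(model, X_test[0], scale=X_train.std(axis=0))
print("permutation importance:", np.round(pi, 3))
print("local slopes at X_test[0]:", np.round(slopes, 3))
```

PI is computed on held-out rows here, so it reflects what the model relies on when generalizing rather than what it fitted during training. The two methods answer different questions. PI gives one global score per feature, whereas the surrogate's coefficients describe the model only in the neighbourhood of a single data point.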
Our goal in this work is to use the fastest, most cost-effective, and most accurate tools for investigating the band gap and ranking features. These two methods are used to identify the 10 most important input features for band gap prediction and classification. There are 12 possible cases for identifying the top 10: two targets (band gap prediction and classification) × one dataset (inorganic perovskites) × three ML models × two feature-ranking methods. The ranking procedure first examines how often a feature appears in the top 10. Because this alone is not a sufficient criterion for decision-making, we next assess the feature's position (from 1 to 10) in each ranking. Interpreting the ranking therefore requires both the number of times a feature appears in the top 10 and its average score, defined as the total score divided by the number of appearances. Lower scores indicate higher relevance: a score of “1” signifies the highest importance and “10” the lowest. The feature importance data clearly show that “Formation energy” plays a critical role in both band gap prediction and classification. It appears in 9 of the 12 possible cases, underscoring its high impact across scenarios, and its average score of 1 further emphasizes its importance. Because the other features do not exhibit as clear an order of importance as “Formation energy,” it is difficult to draw precise conclusions about their roles. As illustrated in Figure 3, the next four most important features are “Space group,” appearing 8 times with an average score of 2; “Valence (std),” appearing 6 times with an average score of 4; “Valence (mean),” appearing 6 times with an average score of 3.6; and “Crystal structure,” appearing 7 times with an average score of 4.5.
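The counting scheme above can be written down in a few lines. The sketch below enumerates the 12 cases, assuming ABR, RFR, and GBR for the band gap target and RFC, SVM, and MLP for classification, as in the rest of the paper. It then computes, for each feature, the number of appearances and the average score (total score divided by appearances, with 1 = most important). The tie-breaking order, with more appearances first and then a lower average score, is an assumption. The example rankings are hypothetical and truncated to a top 3 for brevity; they are not the paper's results.

```python
from collections import defaultdict
from itertools import product

# 2 targets x 1 dataset x 3 models x 2 importance methods = 12 top-10 lists.
MODELS = {"band gap": ["ABR", "RFR", "GBR"], "classification": ["RFC", "SVM", "MLP"]}
METHODS = ["PI", "LIME"]
CASES = [(target, model, method)
         for target, models in MODELS.items()
         for model, method in product(models, METHODS)]


def aggregate_rankings(top_lists):
    """Return (feature, appearances, average score) tuples; 1 = most important."""
    total = defaultdict(int)
    count = defaultdict(int)
    for ranking in top_lists:
        for position, feature in enumerate(ranking, start=1):
            total[feature] += position
            count[feature] += 1
    rows = [(f, count[f], total[f] / count[f]) for f in count]
    # Assumed ordering rule: more appearances first, then lower average score.
    return sorted(rows, key=lambda r: (-r[1], r[2]))


# Hypothetical, truncated rankings (top 3 instead of top 10), illustration only.
example = [
    ["Formation energy", "Space group", "Valence (std)"],
    ["Formation energy", "Valence (mean)", "Space group"],
    ["Space group", "Formation energy", "Crystal structure"],
]
print(len(CASES), "cases")
for feature, n, avg in aggregate_rankings(example):
    print(f"{feature}: {n} appearances, average score {avg:.2f}")
```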
As noted earlier, because perovskite formulas repeat in the dataset, we cannot rely solely on elemental features in these calculations. The mean and standard deviation of elemental features depend only on the formula, so compounds with the same formula receive identical features. To help the model differentiate between perovskites with the same formula, we incorporated computed features such as formation energy, space group, and crystal structure into the feature set wherever possible. One would therefore expect formation energy, space group, and crystal structure to have the greatest impact on the predictions. Notably, however, Valence (std) and Valence (mean) not only performed very close to Space group but also outperformed Crystal structure.
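The observation that composition-only statistics cannot separate entries sharing a formula can be checked directly. The sketch below assumes stoichiometry-weighted statistics and uses illustrative valence-electron counts (Sr 2, Ti 4, O 6) as the "valence" values. The helper `valence_stats`, the weighting scheme, and the formation-energy numbers are all assumptions for illustration, not the paper's feature pipeline. The two space groups, 221 (Pm-3m, cubic) and 140 (I4/mcm, tetragonal), are standard SrTiO3 phases.

```python
import numpy as np

# Illustrative valence-electron counts (assumption: not the paper's lookup table).
VALENCE = {"Sr": 2, "Ti": 4, "O": 6}


def valence_stats(composition):
    """Stoichiometry-weighted mean and standard deviation of elemental valence.

    The weighting scheme is an assumption made for this sketch.
    """
    elements = list(composition)
    values = np.array([VALENCE[e] for e in elements], dtype=float)
    weights = np.array([composition[e] for e in elements], dtype=float)
    weights /= weights.sum()
    mean = float(np.sum(weights * values))
    std = float(np.sqrt(np.sum(weights * (values - mean) ** 2)))
    return mean, std


# Two SrTiO3 entries that differ only in structure; the formation energies
# are placeholder numbers, not computed data.
entries = [
    {"formula": {"Sr": 1, "Ti": 1, "O": 3}, "space_group": 221, "formation_energy": -3.50},
    {"formula": {"Sr": 1, "Ti": 1, "O": 3}, "space_group": 140, "formation_energy": -3.48},
]

elemental = [valence_stats(e["formula"]) for e in entries]
augmented = [stats + (e["space_group"], e["formation_energy"])
             for stats, e in zip(elemental, entries)]

print(elemental)                     # identical (mean, std) pairs for both entries
print(augmented[0] == augmented[1])  # the computed features separate the two entries
```

Whatever weighting is used, the conclusion is the same: any function of the formula alone assigns both entries the same feature vector, so only structure-dependent descriptors can tell them apart.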

This work, like our previous study [73], once again confirms that, among the elemental features, Valence (std) and Valence (mean) have the greatest impact on band gap prediction and on distinguishing zero from nonzero band gaps. This repeated evidence makes the effect of these two features on the band gap an intriguing subject for future studies.

Naturally, the relationship between elemental features and the band gap is by no means straightforward, and complex correlations are expected. Aside from the substantial importance of the computed features, the elemental features also had a considerable impact on the predictions. Notably, the ML models, without any physical knowledge of what the features represent and receiving only a few numerical values, were able to predict band gaps and classify compounds effectively. This was achieved by providing only the mean and standard deviation of the constituent elements' features in inorganic perovskites, a result that deserves more attention in future research.

7. Conclusion

This study demonstrates that ABR outperforms RFR and GBR in predicting band gaps of inorganic perovskites, while RFC surpasses SVM and MLP in classifying zero and nonzero band gaps. By utilizing 38 generalized input features, supplemented with computationally derived descriptors, the models effectively capture critical relationships. Feature analysis using LIME and PI identifies the standard deviation of atomic valence as a key predictor, revealing its strong correlation with band gaps. These findings underscore the potential of ML to enhance the design and optimization of perovskite materials for advanced solar cell applications.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Alireza Sabagh Moeini: Data curation, formal analysis, investigation, methodology, software, validation, and writing–original draft. Fatemeh Shariatmadar Tehrani: Conceptualization, funding acquisition, project administration, supervision, visualization, and writing–review and editing. Alireza Naeimi-Sadigh: Conceptualization, methodology, supervision, validation, and writing–review and editing.

Funding

This research received no specific funding or grant.

Open Research

Data Availability Statement

The data supporting this article are available from the corresponding author upon request.

References

1 Zhang L., Mei L., and Wang K., et al., Advances in the Application of Perovskite Materials, Nano-Micro Letters. (2023) 15, no. 1, 177, https://doi.org/10.1007/s40820-023-01140-3.
2 Tejuca L. G., Fierro J. L. G., and Tascón J. M., Structure and Reactivity of Perovskite-Type Oxides, Advances in Catalysis. (1989) 36, 237–328.
3 De Graef M. and McHenry M. E., Structure of Materials: An Introduction to Crystallography, Diffraction and Symmetry, 2012, Cambridge University Press, https://doi.org/10.1017/CBO9781139051637.
4 Jeong M., Choi I. W., and Go E. M., et al., Stable Perovskite Solar Cells With Efficiency Exceeding 24.8% and 0.3-V Voltage Loss, Science. (2020) 369, no. 6511, 1615–1620, https://doi.org/10.1126/science.abb7167.
5 Correa-Baena J.-P., Saliba M., and Buonassisi T., et al., Promises and Challenges of Perovskite Solar Cells, Science. (2017) 358, no. 6364, 739–744, https://doi.org/10.1126/science.aam6323.
6 Wang D., Wright M., Elumalai N. K., and Uddin A., Stability of Perovskite Solar Cells, Solar Energy Materials and Solar Cells. (2016) 147, 255–275, https://doi.org/10.1016/j.solmat.2015.12.025.
7 Dou L., Yang Y., and You J., et al., Solution-Processed Hybrid Perovskite Photodetectors With High Detectivity, Nature Communications. (2014) 5, no. 1, 5404, https://doi.org/10.1038/ncomms6404.
8 Diao Z., Gong T., Li X., Hu Y., and Hu W., Solution-Processed Perovskite and Oxide-Semiconductor Heterostructure Construction for a High-Performance Ultraviolet Photodetector, ACS Applied Electronic Materials. (2024) 6, no. 4, 2316–2322, https://doi.org/10.1021/acsaelm.3c01839.
9 Zheng D. and Pauporté T., Advances in Optical Imaging and Optical Communications Based on High-Quality Halide Perovskite Photodetectors, Advanced Functional Materials. (2024) 34, no. 11, 2311205, https://doi.org/10.1002/adfm.202311205.
10 Rao C. N. R., Perovskite Oxides and High-Temperature Superconductivity, Ferroelectrics. (1990) 102, no. 1, 297–308, https://doi.org/10.1080/00150199008221489.
11 Fratello V. J. and Brandle C. D., Calculation of Dielectric Polarizabilities of Perovskite Substrate Materials for High-Temperature Superconductors, Journal of Materials Research. (1994) 9, no. 10, 2554–2560, https://doi.org/10.1557/JMR.1994.2554.
12 Ford P. J. and Saunders G. A., High-Temperature Superconductivity-Ten Years on, Contemporary Physics. (1997) 38, no. 1, 63–81, https://doi.org/10.1080/001075197182568.
13 Cohen R. E., Origin of Ferroelectricity in Perovskite Oxides, Nature. (1992) 358, no. 6382, 136–138, https://doi.org/10.1038/358136a0.
14 Nuraje N. and Su K., Perovskite Ferroelectric Nanomaterials, Nanoscale. (2013) 5, no. 19, 8752–8780, https://doi.org/10.1039/c3nr02543h.
15 Ye H.-Y., Tang Y.-Y., and Li P.-F., et al., Metal-Free Three-Dimensional Perovskite Ferroelectrics, Science. (2018) 361, no. 6398, 151–155, https://doi.org/10.1126/science.aas9330.
16 Choi J., Han J. S., Hong K., Kim S. Y., and Jang H. W., Organic-Inorganic Hybrid Halide Perovskites for Memories, Transistors, and Artificial Synapses, Advanced Materials. (2018) 30, no. 42, 1704002, https://doi.org/10.1002/adma.201704002.
17 Xue Z., Xu Y., Jin C., Liang Y., Cai Z., and Sun J., Halide Perovskite Photoelectric Artificial Synapses: Materials, Devices, and Applications, Nanoscale. (2023) 15, no. 10, 4653–4668, https://doi.org/10.1039/D2NR06403K.
18 Zhang B. W., Lin C. H., and Nirantar S., et al., Lead-Free Perovskites and Metal Halides for Resistive Switching Memory and Artificial Synapse, Small Structures. (2024) 5, no. 6, 2300524, https://doi.org/10.1002/sstr.202300524.
19 Bhalla A. S., Guo R., and Roy R., The Perovskite Structure—a Review of Its Role in Ceramic Science and Technology, Materials Research Innovations. (2000) 4, no. 1, 3–26, https://doi.org/10.1007/s100190000062.
20 Muller O. and Roy R., The Major Ternary Structural Families, 1974, Springer, https://doi.org/10.1007/978-3-642-65706-1.
21 Zhang G., Liu G., Wang L., and Irvine J. T. S., Inorganic Perovskite Photocatalysts for Solar Energy Utilization, Chemical Society Reviews. (2016) 45, no. 21, 5951–5984, https://doi.org/10.1039/C5CS00769K.
22 Goldschmidt V., Skrifter Norske Videnskaps-Akad, Oslo I, Mat-Naturvidensk Kl. (1926) 8, no. 2.
23 Sato T., Takagi S., Deledda S., Hauback B. C., and Orimo S.-I., Extending the Applicability of the Goldschmidt Tolerance Factor to Arbitrary Ionic Compounds, Scientific Reports. (2016) 6, no. 1, 23592, https://doi.org/10.1038/srep23592.
24 Travis W., Glover E. N. K., Bronstein H., Scanlon D. O., and Palgrave R. G., On the Application of the Tolerance Factor to Inorganic and Hybrid Halide Perovskites: A Revised System, Chemical Science. (2016) 7, no. 7, 4548–4556, https://doi.org/10.1039/C5SC04845A.
25 Jarin S., Yuan Y., and Zhang M., et al., Predicting the Crystal Structure and Lattice Parameters of the Perovskite Materials via Different Machine Learning Models Based on Basic Atom Properties, Crystals. (2022) 12, no. 11, 1570, https://doi.org/10.3390/cryst12111570.
26 Pradhan S., Moschitti A., and Xue N., et al., Towards Robust Linguistic Analysis Using OntoNotes, Proceedings of the Seventeenth Conference on Computational Natural Language Learning, 2013, Association for Computational Linguistics, 143–152.
27 Xiang W., Liu S. F., and Tress W., A Review on the Stability of Inorganic Metal Halide Perovskites: Challenges and Opportunities for Stable Solar Cells, Energy & Environmental Science. (2021) 14, no. 4, 2090–2113, https://doi.org/10.1039/D1EE00157D.
28 Karna L. R., Upadhyay R., and Ghosh A., All-Inorganic Perovskite Photovoltaics for Power Conversion Efficiency of 31%, Scientific Reports. (2023) 13, no. 1, 15212, https://doi.org/10.1038/s41598-023-42447-w.
29 Castelli I. E., García-Lastra J. M., Hüser F., Thygesen K. S., and Jacobsen K. W., Stability and Bandgaps of Layered Perovskites for One- and Two-Photon Water Splitting, New Journal of Physics. (2013) 15, no. 10, 105026, https://doi.org/10.1088/1367-2630/15/10/105026.
30 Castelli I. E., Thygesen K. S., and Jacobsen K. W., Bandgap Engineering of Double Perovskites for One- and Two-Photon Water Splitting, MRS Online Proceedings Library (OPL). (2013) 1523, 706.
31 Yu P. Y. and Cardona M., Fundamentals of Semiconductors: Physics and Materials Properties, 2010, Springer Science & Business Media.
32 Wang K., Yang D., Wu C., Sanghadasa M., and Priya S., Recent Progress in Fundamental Understanding of Halide Perovskite Semiconductors, Progress in Materials Science. (2019) 106, 100580, https://doi.org/10.1016/j.pmatsci.2019.100580.
33 Even J., Pedesseau L., and Katan C., et al., Solid-State Physics Perspective on Hybrid Perovskite Semiconductors, The Journal of Physical Chemistry C. (2015) 119, no. 19, 10161–10177, https://doi.org/10.1021/acs.jpcc.5b00695.
34 Su R., Fieramosca A., and Zhang Q., et al., Perovskite Semiconductors for Room-Temperature Exciton-Polaritonics, Nature Materials. (2021) 20, no. 10, 1315–1324, https://doi.org/10.1038/s41563-021-01035-x.
35 Schmidt-Mende L., Dyakonov V., and Olthof S., et al., Roadmap on Organic-Inorganic Hybrid Perovskite Semiconductors and Devices, APL Materials. (2021) 9, no. 10, 109202, https://doi.org/10.1063/5.0047616.
36 López R. and Gómez R., Band-Gap Energy Estimation From Diffuse Reflectance Measurements on Sol-Gel and Commercial TiO₂: A Comparative Study, Journal of Sol-Gel Science and Technology. (2012) 61, no. 1, 1–7, https://doi.org/10.1007/s10971-011-2582-9.
37 Gouder T., Eloirdi R., Martin R. L., Osipenko M., Giovannini M., and Caciuffo R., Measurements of the Band Gap of ThF4 by Electron Spectroscopy Techniques, Physical Review Research. (2019) 1, no. 3, 033005, https://doi.org/10.1103/PhysRevResearch.1.033005.
38 Jubu P. R., Obaseki O., and Ajayi D., et al., Considerations About the Determination of Optical Bandgap From Diffuse Reflectance Spectroscopy Using the Tauc Plot, Journal of Optics. (2024) 53, no. 5, 5054–5064, https://doi.org/10.1007/s12596-024-01741-0.
39 Choudhary K. and Garrity K. F., InterMat: Accelerating Band Offset Prediction in Semiconductor Interfaces With DFT and Deep Learning, Digital Discovery. (2024) 3, no. 7, 1365–1377, https://doi.org/10.1039/D4DD00031E.
40 Kauwe S. K., Welker T., and Sparks T. D., Extracting Knowledge From DFT: Experimental Band Gap Predictions Through Ensemble Learning, Integrating Materials and Manufacturing Innovation. (2020) 9, no. 3, 213–220, https://doi.org/10.1007/s40192-020-00178-0.
41 Chan M. K. Y. and Ceder G., Efficient Band Gap Prediction for Solids, Physical Review Letters. (2010) 105, no. 19, 196403, https://doi.org/10.1103/PhysRevLett.105.196403.
42 Tao S. X., Cao X., and Bobbert P. A., Accurate and Efficient Band Gap Predictions of Metal Halide Perovskites Using the DFT-1/2 Method: GW Accuracy With DFT Expense, Scientific Reports. (2017) 7, no. 1, 14386, https://doi.org/10.1038/s41598-017-14435-4.
43 Perdew J. P., Burke K., and Ernzerhof M., Generalized Gradient Approximation Made Simple, Physical Review Letters. (1996) 77, no. 18, 3865–3868, https://doi.org/10.1103/PhysRevLett.77.3865.
44 Zhang Y. and Yang W., Comment on “Generalized Gradient Approximation Made Simple”, Physical Review Letters. (1998) 80, no. 4, 890, https://doi.org/10.1103/PhysRevLett.80.890.
45 Negele J. W., Structure of Finite Nuclei in the Local-Density Approximation, Physical Review C. (1970) 1, no. 4, 1260–1321, https://doi.org/10.1103/PhysRevC.1.1260.
46 Aryasetiawan F. and Gunnarsson O., The GW Method, Reports on Progress in Physics. (1998) 61, no. 3, 237, https://doi.org/10.1088/0034-4885/61/3/002.
47 Aryasetiawan F. and Gunnarsson O., Electronic Structure of NiO in the GW Approximation, Physical Review Letters. (1995) 74, no. 16, 3221–3224, https://doi.org/10.1103/PhysRevLett.74.3221.
48 Schira R. and Latouche C., DFT and Hybrid-DFT Calculations on the Electronic Properties of Vanadate Materials: Theory Meets Experiments, New Journal of Chemistry. (2020) 44, no. 27, 11602–11607, https://doi.org/10.1039/D0NJ02316G.
49 Franchini C., Hybrid Functionals Applied to Perovskites, Journal of Physics: Condensed Matter. (2014) 26, no. 25, 253202, https://doi.org/10.1088/0953-8984/26/25/253202.
50 Ward L., Agrawal A., Choudhary A., and Wolverton C., A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials, npj Computational Materials. (2016) 2, no. 1, 1–7, https://doi.org/10.1038/npjcompumats.2016.28.
51 Faber F. A., Hutchison L., and Huang B., et al., Machine Learning Prediction Errors Better Than DFT Accuracy, arXiv. (2017).
52 Faber F. A., Hutchison L., and Huang B., et al., Prediction Errors of Molecular Machine Learning Models Lower Than Hybrid DFT Error, Journal of Chemical Theory and Computation. (2017) 13, no. 11, 5255–5264, https://doi.org/10.1021/acs.jctc.7b00577.
53 Schleder G. R., Padilha A. C. M., Acosta C. M., Costa M., and Fazzio A., From DFT to Machine Learning: Recent Approaches to Materials Science-A Review, Journal of Physics: Materials. (2019) 2, no. 3, 032001, https://doi.org/10.1088/2515-7639/ab084b.
54 Guo Z. and Lin B., Machine Learning Stability and Band Gap of Lead-Free Halide Double Perovskite Materials for Perovskite Solar Cells, Solar Energy. (2021) 228, 689–699, https://doi.org/10.1016/j.solener.2021.09.030.
55 Hu W. and Zhang L., High-Throughput Calculation and Machine Learning of Two-Dimensional Halide Perovskite Materials: Formation Energy and Band Gap, Materials Today Communications. (2023) 35, 105841, https://doi.org/10.1016/j.mtcomm.2023.105841.
56 Liu Y., Yan W., Zhu H., Tu Y., Guan L., and Tan X., Study on Bandgap Predications of ABX3-Type Perovskites by Machine Learning, Organic Electronics. (2022) 101, 106426, https://doi.org/10.1016/j.orgel.2021.106426.
57 Lee J., Seko A., Shitara K., Nakayama K., and Tanaka I., Prediction Model of Band Gap for Inorganic Compounds by Combination of Density Functional Theory Calculations and Machine Learning Techniques, Physical Review B. (2016) 93, no. 11, 115104, https://doi.org/10.1103/PhysRevB.93.115104.
58 Prateek S., Garg R., Kumar Saxena K., Srivastav V. K., Vasudev H., and Kumar N., Data-Driven Materials Science: Application of ML for Predicting Band Gap, Advances in Materials and Processing Technologies. (2024) 10, no. 2, 708–717, https://doi.org/10.1080/2374068X.2023.2171666.
59 Theng D. and Bhoyar K. K., Feature Selection Techniques for Machine Learning: A Survey of More Than Two Decades of Research, Knowledge and Information Systems. (2024) 66, no. 3, 1575–1637, https://doi.org/10.1007/s10115-023-02010-5.
60 Xie J., Sage M., and Zhao Y. F., Feature Selection and Feature Learning in Machine Learning Applications for Gas Turbines: A Review, Engineering Applications of Artificial Intelligence. (2023) 117, 105591, https://doi.org/10.1016/j.engappai.2022.105591.
61 Takahashi K., Takahashi L., Miyazato I., and Tanaka Y., Searching for Hidden Perovskite Materials for Photovoltaic Systems by Combining Data Science and First Principle Calculations, ACS Photonics. (2018) 5, no. 3, 771–775, https://doi.org/10.1021/acsphotonics.7b01479.
62 Li C., Hao H., and Xu B., et al., A Progressive Learning Method for Predicting the Band Gap of ABO3 Perovskites Using an Instrumental Variable, Journal of Materials Chemistry C. (2020) 8, no. 9, 3127–3136, https://doi.org/10.1039/C9TC06632B.
63 Huang Y., Yu C., and Chen W., et al., Band Gap and Band Alignment Prediction of Nitride-Based Semiconductors Using Machine Learning, Journal of Materials Chemistry C. (2019) 7, no. 11, 3238–3245, https://doi.org/10.1039/C8TC05554H.
64 Feng S. and Wang J., Prediction of Organic-Inorganic Hybrid Perovskite Band Gap by Multiple Machine Learning Algorithms, Molecules. (2024) 29, no. 2, 499, https://doi.org/10.3390/molecules29020499.
65 Gladkikh V., Kim D. Y., Hajibabaei A., Jana A., Myung C. W., and Kim K. S., Machine Learning for Predicting the Band Gaps of ABX3 Perovskites From Elemental Properties, The Journal of Physical Chemistry C. (2020) 124, no. 16, 8905–8918, https://doi.org/10.1021/acs.jpcc.9b11768.
66 Zhang J., Li Y., and Zhou X., Machine-Learning Prediction of the Computed Band Gaps of Double Perovskite Materials, arXiv. (2023) 15–27, https://doi.org/10.5121/csit.2023.130102.
67 Zhuo Y., Mansouri Tehrani A., and Brgoch J., Predicting the Band Gaps of Inorganic Solids by Machine Learning, The Journal of Physical Chemistry Letters. (2018) 9, no. 7, 1668–1673, https://doi.org/10.1021/acs.jpclett.8b00124.
68 Obada D. O., Okafor E., Abolade S. A., Ukpong A. M., Dodoo-Arhin D., and Akande A., Explainable Machine Learning for Predicting the Band Gaps of ABX3 Perovskites, Materials Science in Semiconductor Processing. (2023) 161, 107427, https://doi.org/10.1016/j.mssp.2023.107427.
69 Yang C., Chong X., and Hu M., et al., Accelerating the Discovery of Hybrid Perovskites With Targeted Band Gaps via Interpretable Machine Learning, ACS Applied Materials & Interfaces. (2023) 15, no. 34, 40419–40427, https://doi.org/10.1021/acsami.3c06392.
70 Mahal E., Roy D., Manna S. S., and Pathak B., Machine Learning-Driven Prediction of Band-Alignment Types in 2D Hybrid Perovskites, Journal of Materials Chemistry A. (2023) 11, no. 43, 23547–23555, https://doi.org/10.1039/D3TA05186B.
71 Chenebuah E. T. and Chenebuah D. T., An Inorganic ABX3 Perovskite Materials Dataset for Target Property Prediction and Classification Using Machine Learning, arXiv. (2023).
72 Saal J. E., Kirklin S., Aykol M., Meredig B., and Wolverton C., Materials Design and Discovery With High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD), JOM. (2013) 65, no. 11, 1501–1509, https://doi.org/10.1007/s11837-013-0755-4.
73 Sabagh Moeini A., Shariatmadar Tehrani F., and Naeimi-Sadigh A., Machine Learning-Enhanced Band Gaps Prediction for Low-Symmetry Double and Layered Perovskites, Scientific Reports. (2024) 14, no. 1, 26736, https://doi.org/10.1038/s41598-024-77081-7.
74 Breiman L., Random Forests, Machine Learning. (2001) 45, no. 1, 5–32, https://doi.org/10.1023/A:1010933404324.
75 Drucker H., Burges C. J., Kaufman L., Smola A., and Vapnik V., Support Vector Regression Machines, Advances in Neural Information Processing Systems. (1996) 9, 155–161.
76 LeCun Y., Touretzky D., Hinton G., and Sejnowski T., A Theoretical Framework for Back-Propagation, Proceedings of the 1988 Connectionist Models Summer School, 1988, CMU, 21–28.
77 Domingo C. and Watanabe O., MadaBoost: A Modification of AdaBoost, 2000, COLT.
78 Murphy K. P., Machine Learning: A Probabilistic Perspective, 2012, MIT Press.
79 Friedman J. H., Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics. (2001) 29, no. 5, 1189–1232, https://doi.org/10.1214/aos/1013203451.
80 Fawcett T., An Introduction to ROC Analysis, Pattern Recognition Letters. (2006) 27, no. 8, 861–874, https://doi.org/10.1016/j.patrec.2005.10.010.
81 Powers D. M., Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation, arXiv. (2020).
82 Ong S. P., Richards W. D., and Jain A., et al., Python Materials Genomics (Pymatgen): A Robust, Open-Source Python Library for Materials Analysis, Computational Materials Science. (2013) 68, 314–319, https://doi.org/10.1016/j.commatsci.2012.10.028.
83 Hanley J. A. and McNeil B. J., The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve, Radiology. (1982) 143, no. 1, 29–36, https://doi.org/10.1148/radiology.143.1.7063747.
84 Bradley A. P., The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms, Pattern Recognition. (1997) 30, no. 7, 1145–1159, https://doi.org/10.1016/S0031-3203(96)00142-2.
85 Fisher A., Rudin C., and Dominici F., All Models Are Wrong, But Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously, Journal of Machine Learning Research. (2019) 20, no. 177, 1–81.
86 Ribeiro M. T., Singh S., and Guestrin C., “Why Should I Trust You?” Explaining the Predictions of Any Classifier, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, Association for Computing Machinery, 1135–1144.
87 Molnar C., Interpretable Machine Learning, 2020.
88 Huang N., Lu G., and Xu D., A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest, Energies. (2016) 9, no. 10, 767, https://doi.org/10.3390/en9100767.

Machine Learning-Driven Band Gap Prediction/Classification and Feature Importance Analysis of Inorganic Perovskites
