Harnessing Principal Component Analysis and Artificial Neural Networks for Accurate Solar Radiation Prediction
Abstract
Accurate solar radiation prediction is essential for optimizing renewable energy systems and supporting grid stability. This study investigates the use of principal component analysis (PCA) for dimensionality reduction in solar radiation prediction models, followed by an evaluation of the models’ performance across varying feature sets. A series of case studies were conducted, comparing models using raw meteorological inputs with those employing reduced principal components (PCs) as inputs. Results demonstrate that while retaining fewer PCs reduces computational complexity, it can significantly affect model performance. The model with all meteorological inputs achieved the best results with an R2 of 0.99198, MSE of 562.612, and MAPE of 0.1899%. By contrast, the single-PC model exhibited an R2 of 0.11699 and MAPE of 64.5897%, highlighting the trade-off between dimensionality reduction and prediction accuracy. The study also emphasizes the computational efficiency gained through PCA, particularly in high-dimensional datasets. Future directions include integrating hybrid feature extraction techniques, leveraging advanced deep learning architectures, and exploring temporal and spatial dynamics to further refine prediction accuracy. The findings provide a roadmap for developing scalable and interpretable solar radiation prediction models, advancing their integration into real-time renewable energy systems.
1. Introduction
Accurate solar yield prediction is crucial for optimizing the performance of solar energy systems and improving the efficiency of renewable energy generation. As the demand for solar power continues to rise, precise prediction models are essential to minimize the uncertainty in energy production, which can be influenced by environmental factors, such as solar radiation, temperature, and cloud cover. Traditional methods of solar yield prediction, including statistical approaches and simple regression models, often fail to capture the complex, nonlinear relationships between these variables, leading to suboptimal accuracy in forecasting [1].
To address these challenges, machine learning (ML) techniques have become increasingly popular due to their ability to model complex and nonlinear relationships. Among these techniques, artificial neural networks (ANNs) have demonstrated strong predictive capabilities in various domains, including renewable energy prediction [2]. However, one of the primary challenges in applying ANNs to solar yield prediction lies in the high dimensionality of the input data, which can lead to overfitting, increased computational cost, and poor generalization performance [3]. Principal component analysis (PCA) has proven to be an effective dimensionality reduction technique, enabling the transformation of a large set of correlated features into a smaller set of uncorrelated components, while retaining most of the data’s variance [4]. When combined with an ANN, PCA can potentially improve model efficiency and reduce overfitting by identifying and retaining only the most significant features, while discarding irrelevant or redundant data [5].
Despite the individual success of PCA and ANN in different predictive modeling tasks, their integration in solar yield prediction remains an area with limited exploration. Some recent studies have explored the use of PCA for feature extraction in combination with ANN for solar radiation predictions [6] and energy load forecasting [7]. However, the combination of PCA and ANN for solar yield prediction has not been extensively studied, particularly with respect to the impact of dimensionality reduction on model accuracy and computational performance. This paper aims to investigate the effectiveness of using PCA for feature extraction and ANN for prediction, with the objective of improving solar yield prediction by reducing dimensionality while maintaining high prediction accuracy.
The remainder of this paper is organized as follows: Section 2 reviews related work on solar yield prediction and the use of PCA and ANN in energy forecasting. Section 3 presents the methodology, including the dataset, preprocessing steps, and model development. Section 4 discusses the results of the proposed models. Finally, Section 5 provides conclusions and suggests directions for future research.
2. Literature Review
The task of solar yield prediction has been extensively studied in recent years due to the growing importance of solar energy in the global renewable energy mix. Traditional prediction methods, such as statistical regression models, time-series analysis, and physical modeling, have been used to forecast solar power generation based on factors like solar irradiance, temperature, and humidity [3]. However, these methods often struggle to handle the complex, nonlinear relationships between input variables, which can result in lower prediction accuracy.
With the advent of ML techniques, ANNs have emerged as some of the most popular tools for solar yield prediction due to their ability to capture complex nonlinear patterns in data [8]. ANN models, particularly multilayer perceptrons (MLPs), have shown significant improvements in forecasting solar radiation and energy yield compared to traditional methods [9]. These models utilize multiple hidden layers and activation functions to approximate nonlinear mappings, making them well-suited for tasks where the relationship between input variables and output is highly intricate and difficult to model using linear techniques [10].
Despite their success, ANN models often face challenges related to the high dimensionality of input data. Solar energy datasets typically include a large number of features, such as various weather parameters (e.g., temperature, humidity, direct normal irradiance [DNI], diffuse horizontal irradiance [DHI], etc.) and time-related variables (e.g., time of day, season, and geographical location). The inclusion of many features can lead to overfitting, where the model captures noise in the data instead of the underlying patterns [11]. This issue often results in poor generalization performance and increased computational costs.
To address the dimensionality problem, PCA has been widely adopted as a dimensionality reduction technique. PCA transforms the original correlated feature space into a new set of orthogonal features, called principal components (PCs), that capture the most significant variance in the data [4]. By retaining only the PCs that account for the majority of the variance, PCA reduces the number of input variables for ML models without sacrificing much predictive power. This reduction can improve model efficiency, reduce the risk of overfitting, and simplify the training process [12].
Several studies have investigated the use of PCA in combination with ANN for various applications, including energy forecasting. In the context of solar energy, PCA-based feature extraction has been used to select the most relevant features and improve the performance of ANN models for predicting solar irradiance [13]. For example, in a study by Howley et al. [12], PCA was applied to reduce the dimensionality of weather data, and the transformed features were fed into an ANN for solar radiation forecasting. The results demonstrated that PCA-enhanced ANN models significantly outperformed traditional ANN models that used the original high-dimensional input data.
In addition, other dimensionality reduction techniques such as independent component analysis (ICA) and singular value decomposition (SVD) have also been explored for feature extraction in solar yield prediction. These methods, like PCA, aim to identify the most informative components of the data, though they differ in the mathematical approach and assumptions [14]. While ICA is more suited for situations where the data has non-Gaussian distributions, PCA remains the most commonly used method due to its simplicity and effectiveness in capturing the main variance in the data.
Although much progress has been made, challenges still remain in optimizing the integration of PCA with ANN models for solar yield prediction. Most studies focus on comparing the performance of PCA-based models against conventional ANN models or other ML techniques. However, there is a lack of consensus on how PCA impacts the model’s generalization performance and whether the improvements in accuracy justify the reduction in input dimensionality [15]. Additionally, the choice of ANN architecture, including the number of hidden layers, neurons, and activation functions, is often not standardized across studies, which can lead to inconsistent results.
Moreover, the combination of PCA and ANN for solar yield prediction has not been widely explored with respect to its application in real-time forecasting or integration into operational energy management systems. There is a need for further research to optimize the use of PCA for feature extraction in dynamic environments and to assess the scalability of these models when applied to large, real-time datasets [16].
This paper aims to contribute to the body of literature by exploring the effectiveness of PCA for feature extraction and ANN for solar yield prediction, specifically focusing on the impact of dimensionality reduction on prediction accuracy and computational efficiency. By comparing the performance of ANN models trained on original and PCA-reduced datasets, this study seeks to provide new insights into the role of dimensionality reduction in improving solar yield forecasting.
3. Methodology
3.1. Data Description
The dataset utilized in this study spans a period of 3 years (2014–2016), providing high-resolution, 1-min interval measurements of key solar radiation parameters: global horizontal irradiance (GHI), DNI, and DHI. These measurements were recorded at a location in Folsom, situated within the Sacramento metropolitan area in Northern California, with geographic coordinates of approximately 38°40′43″N, 121°09′35″W. In addition to solar irradiance data, the dataset includes complementary information from numerical weather prediction (NWP) models and meteorological variables, such as temperature, humidity, and wind speed. These overlapping datasets provide a robust framework for capturing the intricate interactions between solar radiation and weather conditions. The combination of detailed solar and meteorological data allows for a thorough investigation of solar radiation patterns across different timescales, encompassing daily and seasonal variations.

To ensure data quality and suitability for modeling, several preprocessing steps were applied. The original dataset comprised 1,048,575 records collected at a 1-min interval over the 3-year period. Missing values were handled using linear interpolation to maintain data continuity. To align with the temporal resolution typically used in solar radiation forecasting, the data were aggregated on an hourly basis. Additionally, nighttime records, where GHI = 0, were excluded, as they do not contribute to solar radiation prediction. After these preprocessing steps, the dataset was reduced to 9462 records, which were then divided into training (80%) and testing (20%) sets for model development and evaluation. This structured approach ensures that the dataset remains representative while optimizing model performance.
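A minimal sketch of this preprocessing pipeline, using pandas, is given below. The file and column names ("folsom_2014_2016.csv", "timestamp", "GHI") are illustrative assumptions, as is the chronological 80/20 split; the paper does not specify its exact implementation.

```python
# Sketch of the preprocessing steps described above (names are assumptions).
import pandas as pd

df = pd.read_csv("folsom_2014_2016.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Fill short gaps in the 1-min measurements by linear interpolation.
df = df.interpolate(method="linear")

# Aggregate the 1-min records to the hourly resolution used for forecasting.
hourly = df.resample("1h").mean()

# Exclude nighttime records, which carry no information for GHI prediction.
hourly = hourly[hourly["GHI"] > 0].dropna()

# 80/20 train/test split (assumed chronological).
split = int(len(hourly) * 0.8)
train, test = hourly.iloc[:split], hourly.iloc[split:]
```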
This comprehensive dataset serves as an ideal foundation for modeling and predicting solar yield, offering insights into both short-term and long-term trends. The high temporal resolution of the data further enhances its utility, enabling the analysis of rapid fluctuations in solar radiation and their correlation with weather phenomena, thereby improving the accuracy and reliability of solar yield prediction models.
3.2. Data Preprocessing
3.2.1. PCA
The study begins with comprehensive data cleaning and preprocessing to ensure the accuracy and reliability of the dataset. This process includes imputing missing values and removing outliers, establishing a robust foundation for subsequent analysis. Once the data are prepared, PCA is employed as a dimensionality reduction technique. PCA is an unsupervised learning method that simplifies high-dimensional data into a lower dimensional representation, enhancing its interpretability while retaining essential information for analysis and visualization [17, 18]. By varying the number of PCs, the study explores the influence of dimensionality on ANN performance, aiming to identify the optimal trade-off between computational efficiency and predictive accuracy.
The main advantages of PCA as a preprocessing step are as follows:

- It reduces the number of potentially correlated variables to a smaller, independent subset [19, 20].
- It decreases computational complexity and processing time.
- It eliminates irrelevant and noisy features from the dataset.
- It enhances overall data quality.
- It improves the accuracy and efficiency of algorithms.
- It facilitates data visualization in a reduced feature space.
- It boosts the performance of classification and regression tasks [21, 22].
Using different subsets of PCs, multiple ANN models are trained to assess how varying levels of dimensionality affect key performance metrics. The analysis also investigates the trade-off between preserving explained variance and reducing computational complexity. Results demonstrate the optimal number of PCs required for the best ANN performance, providing insights into the effectiveness of PCA in solar yield prediction. This approach highlights the versatility of PCA as a preprocessing technique, emphasizing its role in improving ANN efficiency and accuracy while enabling the systematic evaluation of feature dimensionality. Examples of the original features and the resulting PCA components are detailed in Tables 1 and 2, providing an overview of the transformation results.
Table 1: Sample of the original meteorological inputs.

| Timestamp | T (°C) | H (%) | Pr (hPa) | Wind speed (m/s) | Wind direction (°) | Max wind speed (m/s) | P (mm) | DNI (W/m²) | DHI (W/m²) | GHI (W/m²) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1/2/2014 | 2.93 | 74.97 | 1009.83 | 1.66 | 171.86 | 2.24 | 0 | 88.45 | 7.26 | 12.94 |
| 1/3/2014 | 6.33 | 63.11 | 1010.00 | 0.66 | 114.69 | 1.14 | 0 | 631.31 | 39.01 | 146.65 |
| 1/4/2014 | 11.17 | 47.74 | 1010.13 | 0.62 | 248.09 | 0.98 | 0 | 709.66 | 68.39 | 287.12 |
| 1/5/2014 | 13.44 | 41.77 | 1010.42 | 0.83 | 278.28 | 1.28 | 0 | 782.01 | 66.27 | 390.23 |
| 1/6/2014 | 16.31 | 34.28 | 1009.67 | 0.85 | 241.01 | 1.39 | 0 | 863.02 | 60.27 | 469.50 |
Table 2: The same sample after PCA transformation.

| Timestamp | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | GHI (W/m²) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1/2/2014 | −1.53 | 1.81 | −0.74 | −0.12 | 0.34 | −0.12 | −0.06 | 0.43 | −0.23 | 12.94 |
| 1/3/2014 | −1.58 | 0.01 | −1.29 | 0.81 | 0.64 | 1.07 | −0.98 | 0.57 | 0.03 | 146.65 |
| 1/4/2014 | −0.16 | −0.41 | −1.67 | 1.35 | 1.24 | −0.27 | −0.63 | 0.68 | 0.02 | 287.12 |
| 1/5/2014 | 0.51 | −0.26 | −1.70 | 1.28 | 1.48 | −0.40 | −0.51 | 0.61 | 0.02 | 390.23 |
| 1/6/2014 | 0.71 | −0.56 | −1.51 | 1.08 | 1.45 | 0.25 | −0.46 | 0.54 | 0.07 | 469.50 |
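The transformation from Table 1 to Table 2 can be sketched with scikit-learn as follows; the feature column names are illustrative, and standardizing before PCA is our assumption (standard practice for variables on different scales), with `train` and `test` taken from the preprocessing sketch above.

```python
# Sketch of the PCA transformation from Table 1 to Table 2.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ["temperature", "humidity", "pressure", "wind_speed",
            "wind_direction", "max_wind_speed", "precipitation",
            "DNI", "DHI"]  # assumed column names

scaler = StandardScaler().fit(train[features])   # fit on training data only
pca = PCA(n_components=9).fit(scaler.transform(train[features]))

X_train = pca.transform(scaler.transform(train[features]))  # PC1..PC9
X_test = pca.transform(scaler.transform(test[features]))
print(pca.explained_variance_ratio_)             # cf. Table 3
```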
3.2.2. Efficiency Assessment
Model performance is evaluated using five complementary metrics:

- RMSE evaluates the magnitude of prediction errors, penalizing larger deviations more heavily.
- MAE measures the average absolute difference between predicted and observed values, offering a straightforward interpretation.
- MAPE quantifies error as a percentage of the observed values, facilitating comparisons across datasets.
- MBE assesses the systematic bias of the predictions, indicating whether a model consistently over- or under-predicts.
- R2 measures the proportion of variance in the observed data explained by the model, with values closer to 1 indicating stronger predictive performance.
These metrics collectively enable a robust evaluation of the predictive capabilities and reliability of each model tested in the study.
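For reference, these metrics follow their standard definitions, where $y_i$ denotes the observed GHI, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the observations, and $n$ the number of test samples (the MBE sign convention, predicted minus observed, is our assumption):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,$$

$$\mathrm{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,\qquad \mathrm{MBE}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i-y_i\right),$$

$$R^2=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}.$$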
3.3. Neural Network (NN) Development
The architecture of an NN consists of multiple interconnected layers of neurons, as depicted in Figure 1. During the training process, the biases and weights of the neurons are iteratively adjusted using optimization techniques to minimize prediction error, as described in [24]. The relationship between the inputs and the output of a neuron can be expressed mathematically using Equation (10) [25]:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{10}$$

where $x_i$ are the inputs, $w_i$ the connection weights, $b$ the bias, $f$ the activation function, and $y$ the neuron output.
After conducting multiple pilot tests, the study determined that the most effective NN architecture for predicting GHI comprises an input layer, four hidden layers with 64, 512, 256, and 64 neurons, respectively, and an output layer with a single neuron dedicated to GHI prediction. To further enhance predictive performance, this optimized NN model is integrated with PCA for feature extraction and dimensionality reduction. This integration leverages the strengths of both PCA and NN to analyze the dataset efficiently and accurately. The results of this integrated approach, demonstrating its efficacy and impact on solar yield prediction, are detailed in the upcoming section.
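A minimal sketch of this architecture is given below; TensorFlow/Keras, the ReLU activations, the Adam optimizer, and the MSE loss are our assumptions, since the paper does not state them.

```python
# Sketch of the selected architecture: 64-512-256-64 hidden neurons,
# single output neuron for GHI (activations/optimizer are assumptions).
import tensorflow as tf

def build_ghi_model(n_inputs: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # single-neuron output for GHI
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```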
3.4. Prediction Process
The proposed framework for solar energy prediction is structured into four key stages. The first stage involves data collection, where relevant solar energy data are gathered to create a comprehensive dataset for analysis. This dataset serves as the foundation for the predictive modeling process.
In the second stage, data preprocessing is performed to ensure the dataset is clean and suitable for modeling. This includes handling missing values, removing outliers, normalizing the data, and aggregating it on an hourly basis to align with the modeling requirements. These steps ensure that the input data are consistent and ready for use in subsequent stages.
The third stage is model training, where preprocessed data are used to train NN algorithms. During this phase, the models learn to recognize patterns and relationships within the dataset, enabling them to develop accurate predictive capabilities.
Finally, the fourth stage involves model testing and evaluation. In this phase, the trained models are assessed for accuracy and reliability using a separate testing dataset. This evaluation ensures that the models perform effectively on unseen data and can provide trustworthy predictions.
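A compact sketch of stages three and four, continuing the earlier snippets (`build_ghi_model`, `X_train`, `X_test`, `train`, `test`), is shown below. The epoch cap, batch size, and early-stopping settings are assumptions; the varying epoch counts reported later in Table 5 suggest some stopping criterion was used.

```python
# Stages 3-4: train the NN on the PCA features and evaluate on held-out data.
import numpy as np
import tensorflow as tf

model = build_ghi_model(X_train.shape[1])
early_stop = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
model.fit(X_train, train["GHI"].to_numpy(),
          validation_split=0.1, epochs=500, batch_size=32,
          callbacks=[early_stop], verbose=0)

y_true = test["GHI"].to_numpy()
y_pred = model.predict(X_test).ravel()

# Metrics from Section 3.2.2 (MBE here is predicted minus observed).
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))
mbe = np.mean(y_pred - y_true)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.5f}  MAPE={mape:.4f}%  MBE={mbe:.3f}")
```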
The entire process, from data collection to model evaluation, is systematically depicted in Figure 2, offering a clear overview of the methodology and its sequential steps.
[Figure 2: Overview of the four-stage prediction workflow, from data collection to model evaluation.]
4. Results and Discussion
The variance explained by each PC, as shown in Table 3, highlights their relative importance in representing the dataset’s variability. PC1, with an explained variance of 39.75%, is the most significant contributor, followed by PC2 (19.85%) and PC3 (12.69%). This suggests that the first three PCs capture the majority (~72%) of the dataset’s variance, making them highly informative for the predictive task. The diminishing variance from PC4 to PC9 indicates that these components contribute progressively less information and may be less critical for model training. This observation motivates the incremental reduction in input dimensions for the case studies defined in Table 4 (a short sketch of the underlying cumulative-variance computation follows that table).
Table 3: Explained variance ratio of each principal component.

| Principal component | Explained variance |
|---|---|
| PC1 | 0.397516 |
| PC2 | 0.198547 |
| PC3 | 0.126920 |
| PC4 | 0.092933 |
| PC5 | 0.075476 |
| PC6 | 0.051769 |
| PC7 | 0.038788 |
| PC8 | 0.015533 |
| PC9 | 0.002519 |
Table 4: Input configurations for the case studies.

| Case | Inputs | Output |
|---|---|---|
| a | Temperature, humidity, pressure, wind speed, wind direction, max wind speed, precipitation, DNI, DHI | GHI |
| b | PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9 | GHI |
| c | PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8 | GHI |
| d | PC1, PC2, PC3, PC4, PC5, PC6, PC7 | GHI |
| e | PC1, PC2, PC3, PC4, PC5, PC6 | GHI |
| f | PC1, PC2, PC3, PC4, PC5 | GHI |
| g | PC1, PC2, PC3, PC4 | GHI |
| h | PC1, PC2, PC3 | GHI |
| i | PC1, PC2 | GHI |
| j | PC1 | GHI |
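For reference, the cumulative explained variance that motivates these case definitions can be computed directly from the fitted `pca` object of the earlier sketch:

```python
# Cumulative explained variance per number of retained PCs (cf. Table 3).
import numpy as np

cum_var = np.cumsum(pca.explained_variance_ratio_)
for k, v in enumerate(cum_var, start=1):
    print(f"PC1..PC{k}: {100 * v:.2f}% of total variance")
# The first three PCs together explain roughly 72% of the variance.
```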
Based on Table 5, Case a (all original features used) achieved the best performance, with an MSE of 562.612, an RMSE of 23.719, and an R2 of 0.99198. The MAPE of 0.1899% and MBE of 2.078 demonstrate exceptional predictive accuracy and minimal bias. Training required 179.41 s over 94 epochs, reflecting a balance between computational efficiency and model complexity. The inclusion of all original features appears to provide the model with comprehensive information, resulting in superior performance.

Case b utilized all nine PCs as inputs and yielded slightly lower performance than Case a. While the R2 of 0.99159 indicates excellent predictive capability, the MAPE increased to 0.4293%, and the training time rose to 231.79 s over 147 epochs. This suggests that while the PCs effectively compress the information, the transformation may introduce a minor loss in interpretability and predictive precision.

Cases c–e: retaining six or more PCs (e.g., Case c with PCs 1–8 and Case e with PCs 1–6) maintained high R2 values above 0.97. However, performance began to decline significantly in terms of MSE, RMSE, and MAPE, particularly in Case e (MSE: 1670.77; RMSE: 40.875; R2: 0.97619). Training times and epoch counts also increased, indicating greater computational demands to fit the reduced input spaces.

Cases f–j: retaining fewer than six PCs led to a pronounced drop in performance. For instance, Case f (PCs 1–5) had an R2 of 0.82080, and Case g (PCs 1–4) deteriorated further to 0.77989. The worst performance was observed in Case j (using only PC1), with an R2 of 0.11699, an MSE of 61,955.913, and a dramatically higher MAPE of 64.59%. These results confirm that while PC1 captures the largest share of variance, additional PCs are critical for preserving sufficient information for accurate GHI predictions.
Table 5: Performance metrics for Cases a–j.

| Case | MSE | RMSE | R2 | MAPE (%) | MBE | Training time (s) | Epochs |
|---|---|---|---|---|---|---|---|
| a | **562.612** | **23.719** | **0.99198** | **0.1899** | 2.078 | 179.405893 | 94 |
| b | 590.215 | 24.294 | 0.99159 | 0.4293 | 0.682 | 231.788185 | 147 |
| c | 564.068 | 23.750 | 0.99196 | 0.3142 | **−0.192** | 315.811647 | 247 |
| d | 682.345 | 26.122 | 0.99027 | 0.3167 | 3.993 | 338.209242 | 305 |
| e | 1670.770 | 40.875 | 0.97619 | 0.5754 | 4.741 | 495.685228 | 472 |
| f | 12,573.351 | 112.131 | 0.82080 | 0.5980 | −10.778 | 166.585155 | 80 |
| g | 15,443.813 | 124.273 | 0.77989 | 2.5443 | −6.071 | 237.033534 | 162 |
| h | 18,546.316 | 136.185 | 0.73568 | 7.1959 | −3.747 | 167.359929 | 102 |
| i | 27,141.639 | 164.747 | 0.61317 | 13.5462 | −12.078 | **126.686251** | **48** |
| j | 61,955.913 | 248.909 | 0.11699 | 64.5897 | 10.155 | 153.629068 | 76 |
Note: Bold values indicate the best value of each metric across all cases.
Interestingly, training time and the number of epochs required to converge varied across cases. Cases with fewer PCs (e.g., Case i, with PCs 1–2) had shorter training times (126.69 s) but suffered from poor performance, suggesting that the computational savings come at the expense of predictive accuracy. Conversely, Cases b–e required longer training times, with Case e taking the longest (495.69 s). This underscores a trade-off between computational efficiency and model complexity.
Figure 3 provides a visual comparison of the predicted and actual GHI values for different cases. Cases a and b exhibit tight clustering around the ideal prediction line, confirming their high accuracy. As PCs are reduced, the scatter widens, with Cases f–j showing significant deviations, particularly in the tails. This reflects the reduced capability of these models to capture the variance and dynamics of GHI with fewer input features.
[Figure 3: Scatter plots of predicted versus actual GHI for Cases a–j.]
The results demonstrate the utility of PCA as a dimensionality reduction technique. While using fewer PCs accelerates training, it also reduces predictive accuracy. For solar energy applications, where precision is paramount, retaining at least six PCs (Case e) appears to provide a reasonable balance between dimensionality reduction and accuracy. However, for critical scenarios, using all original features (Case a) remains the most effective approach.
The comparative evaluation underscores that the choice of input features significantly affects model performance. Cases with comprehensive inputs (original features or a large number of PCs) consistently outperform those with reduced dimensions. This is particularly evident in the stark contrast between Cases a and j. Furthermore, the computational trade-offs highlight the need to carefully balance model complexity against available resources.
These findings underscore the importance of input feature extraction and dimensionality reduction in solar radiation prediction models. Using PCA for dimensionality reduction can effectively compress data while maintaining high performance, particularly when retaining a sufficient number of PCs. However, retaining all original features yields the most accurate predictions. Future studies could explore hybrid approaches, combining PCA with feature selection techniques to further enhance model efficiency and precision.
5. Conclusion
This study investigated the impact of feature extraction and dimensionality reduction using PCA on solar GHI prediction models. The analysis demonstrated that while PCA effectively reduces the dimensionality of the dataset, the performance of the prediction models strongly depends on the number of PCs retained. Retaining all original features provided the highest accuracy, with Case a achieving the best performance metrics, including an R2 of 0.99198 and an MAPE of 0.1899%. However, reducing the number of PCs significantly degraded model performance, particularly when fewer than six PCs were used, as observed in Case j (R2 = 0.11699 and MAPE = 64.59%).
From a computational perspective, models with fewer PCs required less training time but at the cost of accuracy, highlighting a trade-off between computational efficiency and predictive performance. This trade-off is critical for solar energy forecasting applications, where precise predictions are essential for optimizing energy systems and planning.
Future research offers numerous opportunities to enhance solar radiation prediction models and their applications. Combining PCA with other feature selection methods, such as mutual information or recursive feature elimination, could improve model efficiency by focusing on the most relevant components while minimizing redundancy. Advanced NN architectures, such as convolutional neural networks (CNNs) or transformers, may better capture complex patterns in solar radiation data, especially with reduced feature sets. Incorporating temporal features like time of day and seasonality or spatial features like geographic location could further refine models, particularly for large-scale solar forecasting. Developing interpretable models to provide insights into the contributions of features or PCs would aid decision-makers in understanding the drivers of solar variability. Integrating these models into real-time forecasting systems and evaluating their performance under dynamic conditions would enhance their practicality. Additionally, extending this research to assess the impact of solar radiation predictions on photovoltaic system performance, battery storage optimization, and grid integration could significantly benefit renewable energy deployment. These directions collectively promise to improve the accuracy, efficiency, and applicability of solar radiation prediction models in renewable energy systems.
Conflicts of Interest
The authors declare no conflicts of interest.
Author Contributions
Conceptualization: Oussama Khouili and Mohamed Hanine. Methodology: Oussama Khouili and Mohamed Louzazni. Software, resources, data curation, writing–original draft preparation: Oussama Khouili. Validation, writing–review and editing, supervision: Mohamed Hanine and Mohamed Louzazni. Formal analysis, investigation, visualization, project administration: Oussama Khouili, Mohamed Louzazni, and Mohamed Hanine. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Acknowledgments
AI has not been used in the preparation of the manuscript.
Open Research
Data Availability Statement
The data are available publicly in Carreira Pedro, H., Larson, D., and Coimbra, C. (2019). A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods (Version V1) (dataset). Zenodo. https://doi.org/10.5281/zenodo.2826939.