Proceedings of MATSUS Spring 2025 Conference (MATSUSSpring25)
DOI: https://doi.org/10.29363/nanoge.matsusspring.2025.060
Publication date: 16th December 2024
Accurate predictive models are required for photovoltaic (PV) performance reliability assessment and failure diagnostics [1]-[3]. Studies have shown that machine learning models can accurately predict the power conversion efficiency of perovskite solar cells (PSCs) from composition and structural parameters [4]-[5]. Machine learning algorithms have also been applied to indoor stability datasets to predict the outdoor stability of perovskite-based devices [6]. However, studies that use high-throughput outdoor stability data from perovskite-based devices to predict their time-series power output are still limited in the literature [7]-[8].
This work applies several data-driven machine learning algorithms to predict the power output of different perovskite devices (in both single-junction and tandem configurations). Specifically, three gradient boosting models (CatBoost, XGBoost and LightGBM) were employed to predict the output power of perovskite-based devices from long-term outdoor data. The prediction performance of the models was evaluated using yearly datasets of instantaneous field measurements obtained at the outdoor test site in Nicosia, Cyprus. In all cases, the PV time-series dataset was randomly split into a 70:30 train and test set: 70% of the data was used for model training and the remaining 30% for testing model accuracy. Prior to model development, data quality checks were performed [9] and the most influential input parameters were identified using Pearson correlation.
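The two preprocessing steps described above (ranking input parameters by Pearson correlation and splitting the data randomly 70:30) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the column names (`irradiance`, `module_temp`, `power`), the synthetic data and the helper functions are all hypothetical, and only the Python standard library is used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (std_x * std_y)


def rank_features(columns, target):
    """Sort input columns by absolute Pearson correlation with the target."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def random_split(n_rows, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) for a random split of row indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_rows)
    return sorted(indices[:cut]), sorted(indices[cut:])


# Synthetic stand-in for field measurements (hypothetical values and units).
rng = random.Random(42)
irradiance = [rng.uniform(0.0, 1000.0) for _ in range(100)]   # W/m^2
module_temp = [rng.uniform(10.0, 60.0) for _ in range(100)]   # deg C
power = [0.2 * g + rng.gauss(0.0, 5.0) for g in irradiance]   # W

columns = {"irradiance": irradiance, "module_temp": module_temp, "power": power}
ranking = rank_features(columns, target="power")
train_idx, test_idx = random_split(len(power), train_fraction=0.7, seed=0)
```

Here `ranking` lists the candidate inputs from most to least linearly correlated with the target, and `train_idx` and `test_idx` select the rows passed to model training and testing. Fixing `seed` keeps the random split reproducible between runs.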
To evaluate the predictive accuracy of the constructed models, the normalized root mean square error (nRMSE) metric was used [8]. The results demonstrated that all models provide good predictive quality (nRMSE < 7%) on the instantaneous measurements. The best prediction performance was provided by the LightGBM regression model, which presented the lowest nRMSE (< 4%) across the whole test set. The prediction accuracy of the models was found to depend on the output power level, with larger discrepancies between actual and predicted power obtained at lower power levels.
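As an illustration of the metric, the sketch below computes an nRMSE in percent and a per-bin relative error that shows how discrepancies can grow at low power. The abstract does not state the normalization constant, so `norm` is left as an explicit parameter (for example, the nominal or maximum measured power); that choice, the bin edges and the sample values are assumptions made for this example only.

```python
import math


def nrmse(actual, predicted, norm):
    """Root mean square error divided by `norm`, expressed in percent.

    `norm` is an assumption of this sketch: the source does not say
    whether nominal power, maximum power or another value is used.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * math.sqrt(mse) / norm


def relative_error_by_bin(actual, predicted, edges):
    """Mean absolute percentage error within each [low, high) power bin."""
    result = {}
    for low, high in zip(edges[:-1], edges[1:]):
        errors = [
            abs(a - p) / a
            for a, p in zip(actual, predicted)
            if low <= a < high and a > 0
        ]
        result[(low, high)] = 100.0 * sum(errors) / len(errors) if errors else None
    return result


# Hypothetical measured and predicted power values (W).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

score = nrmse(actual, predicted, norm=400.0)
by_bin = relative_error_by_bin(actual, predicted, edges=[0.0, 250.0, 500.0])
```

With a constant absolute error of 10 W, `score` is 2.5%, while the relative error in the low-power bin is larger than in the high-power bin. This is one simple way in which the same absolute discrepancy weighs more heavily at lower power levels.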
Evaluation of model performance for different train set partitions and different filtering conditions is underway. This study will provide evidence regarding the dependence of the predictive accuracy on the train set duration, data filtering conditions and irradiance profile classification.
This work has been financed by the European Union through the TESTARE project (Grant ID: 101079488) and by the European Regional Development Fund and the Republic of Cyprus through the DegradationLab project (Grant ID: INFRASTRUCTURES/1216/0043).