Publication date: 10th April 2024
Overcoming the challenge of limited data availability is crucial for the broad applicability of machine learning in materials science, particularly for the well-established deep learning paradigms that have been successful in image and natural language processing. One pathway to overcome this limitation is transfer learning, in which a machine learning model pre-trained on a larger dataset is fine-tuned on a (typically smaller) target dataset. Our study explores the utility of transfer learning as a systematic solution to the lack of large datasets in materials science. We focus on developing a framework that leverages graph neural networks (GNNs) to transfer-learn material properties. To achieve this, we used seven diverse curated datasets from MatBench, ranging in size from 941 to 132,752 data points and covering a spectrum of material properties, including band gaps, formation energies, and piezoelectric moduli. We employed three distinct pre-training strategies, primarily to capture the impact of dataset size. Subsequently, we fine-tuned each pre-trained model on the remaining six datasets, again employing three fine-tuning strategies to determine the best performance. Importantly, we find that GNNs given greater flexibility during fine-tuning achieved higher performance (in terms of r² scores), even surpassing models trained from scratch on the respective datasets. Furthermore, we examine the utility of our pre-training and fine-tuning strategies on properties that have minimal physical correlation with the seven pre-training datasets considered. In summary, we demonstrate transfer learning as a pivotal strategy that facilitates a comprehensive understanding of material properties, bridges data gaps, and enhances material property prediction.
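To make the pre-train/fine-tune protocol concrete, the sketch below shows one plausible way such a workflow could look in plain PyTorch. This is a minimal illustration, not the authors' implementation: the toy TinyGNN architecture, the freeze_body toggle standing in for one of several possible fine-tuning strategies, and the random stand-in data are all assumptions introduced here; the paper's actual GNN and MatBench data handling are not reproduced.

```python
# Minimal sketch (not the authors' code) of a pre-train / fine-tune loop.
# TinyGNN, the layer sizes, and the random graph data are illustrative only.
import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    """Toy graph network: message passing via adjacency matmul, then a readout head."""
    def __init__(self, n_feat: int = 16, n_hidden: int = 32):
        super().__init__()
        self.conv1 = nn.Linear(n_feat, n_hidden)
        self.conv2 = nn.Linear(n_hidden, n_hidden)
        self.head = nn.Linear(n_hidden, 1)  # property-specific output layer

    def forward(self, x, adj):
        # Each "convolution" aggregates neighbour features through adj.
        x = torch.relu(self.conv1(adj @ x))
        x = torch.relu(self.conv2(adj @ x))
        return self.head(x.mean(dim=0))  # mean-pool over nodes -> scalar property

def fine_tune(model: TinyGNN, freeze_body: bool):
    """One hypothetical fine-tuning strategy: optionally freeze the pre-trained
    message-passing layers so only a fresh readout head adapts to the new task."""
    for p in model.parameters():
        p.requires_grad = True
    if freeze_body:
        for layer in (model.conv1, model.conv2):
            for p in layer.parameters():
                p.requires_grad = False
    # Re-initialise the head for the target property (always trainable).
    model.head = nn.Linear(model.head.in_features, 1)
    return torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )

# --- usage on random stand-in data ---
model = TinyGNN()
# (pre-training on the large source dataset would happen here)
opt = fine_tune(model, freeze_body=False)  # "fully flexible" variant
x, adj = torch.randn(8, 16), torch.eye(8)  # 8 atoms, toy self-loop graph
loss = (model(x, adj) - torch.tensor([1.0])).pow(2).mean()
loss.backward()
opt.step()
```

The freeze_body flag contrasts a restricted strategy (frozen message-passing layers, only the head retrained) with a fully flexible one (all weights updated), in the spirit of the finding above that greater flexibility during fine-tuning tends to yield higher performance.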