DOI: https://doi.org/10.29363/nanoge.stabpero.2020.002
Publication date: 31st May 2020
Compositional engineering of perovskites enables precise control of key material properties such as the bandgap [1]. This makes perovskites a promising material for multijunction (“tandem”) solar cells, in which combining two different bandgaps makes it possible to exceed the single-junction Shockley-Queisser limit and thus improve efficiency [2]. The remaining challenge is to find structures with the target bandgap that are both environmentally stable and non-toxic (i.e. lead-free).
To this end, computer simulations allow rapid screening of a large array of compositions for a given structure, followed by data modeling. However, typical high-throughput calculations fall short of capturing the effect of different geometries, which severely limits how well the results transfer to predicting “new” structures. This becomes especially problematic for large-cell systems such as mixed, lead-free double perovskites, where different compositions may have different relaxed geometries and sampling the whole feature space becomes infeasible.
Researchers currently try to work around this problem by building surrogate models. In our work, we follow the approach first outlined for the exploration of molecular datasets [3] and subsequently used in inorganic materials research [4]: first, we employ a fingerprinting function to create a regular feature-vector representation for every structure in a training database; then we feed it to a machine learning algorithm together with a target property obtained from ab initio calculations for the original structure. The resulting model can then use material fingerprints (which could also originate from experimental results) as a proxy for fast property prediction of new materials, sampling vast regions of the feature space and, ideally, opening up a way to extract “regions of interest” in feature space that inform high-level theoretical and practical work.
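The fingerprint-then-regress workflow described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the fingerprint is a bare histogram of pairwise distances for a non-periodic cluster (no periodic images, no elemental weighting), the regressor is a textbook Gaussian-kernel ridge regression, and the structures, target values and all function names (`rdf_fingerprint`, `krr_fit`, `krr_predict`) are hypothetical placeholders.

```python
import math

def rdf_fingerprint(positions, r_max=6.0, n_bins=12):
    """Normalized histogram of pairwise distances; length is always n_bins."""
    hist = [0.0] * n_bins
    width = r_max / n_bins
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d < r_max:
                hist[min(int(d / width), n_bins - 1)] += 1.0
    n_pairs = max(1, n * (n - 1) // 2)
    return [h / n_pairs for h in hist]

def gaussian_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, lam=1e-6, gamma=1.0):
    """Kernel ridge regression weights: alpha = (K + lam * I)^-1 y."""
    K = [[gaussian_kernel(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def krr_predict(X_train, alpha, x, gamma=1.0):
    return sum(a * gaussian_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))

# Toy "structures" (Cartesian coordinates) with different atom counts.
structures = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (3, 0, 0)],
]
targets = [1.0, 2.0, 1.5]  # placeholder numbers, NOT computed bandgaps

X = [rdf_fingerprint(s) for s in structures]
alpha = krr_fit(X, targets)

# Predict for an unseen structure from its fingerprint alone.
new_structure = [(0, 0, 0), (1.5, 0, 0)]
prediction = krr_predict(X, alpha, rdf_fingerprint(new_structure))
```

The point of the sketch is that structures with different numbers of atoms map to vectors of the same length, so a single regression model can be trained on all of them and queried with any new fingerprint.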
Building on our previous work, which introduced a new, general radial-distribution-function (RDF)-based fingerprint while still employing the typical kernel ridge regression (KRR) approach to machine learning (ML) [5], we are now exploring how well this approach generalizes to existing perovskite databases and to novel, in-house-developed databases of lead-free, mixed inorganic perovskites. To this end, we are replacing the KRR with neural networks and comparing against various other fingerprints (sine matrix, SOAP and more from the dscribe library [6]) as well as simple “structure-informed”, non-general property features (e.g. for an A2B structure, the average of a property over the A-site atoms).
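A “structure-informed” feature of the kind mentioned above can be illustrated as a site-wise average of an elemental property. The sketch below is only an assumption about how such a feature might be computed: the site labels, the `site_average` helper and the choice of electronegativity as the property are hypothetical, and the table holds approximate Pauling electronegativities for the three elements used.

```python
# Approximate Pauling electronegativities (only the entries needed here).
ELECTRONEGATIVITY = {"Cs": 0.79, "Rb": 0.82, "Sn": 1.96}

def site_average(structure, site, table):
    """Average an elemental property over all atoms occupying one site."""
    values = [table[element] for element, s in structure if s == site]
    if not values:
        raise ValueError(f"no atoms on site {site!r}")
    return sum(values) / len(values)

# Hypothetical A2B structure: two A-site atoms (mixed Cs/Rb), one B-site atom.
structure = [("Cs", "A"), ("Rb", "A"), ("Sn", "B")]

# One feature per site; the vector length depends on the structure type,
# which is why such features are not general across structure families.
features = [
    site_average(structure, "A", ELECTRONEGATIVITY),
    site_average(structure, "B", ELECTRONEGATIVITY),
]
```

Features like these are cheap and interpretable, but they presuppose a fixed assignment of atoms to sites, so they cannot be transferred to structures with a different site layout.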
The fingerprint size grows with database complexity (structure size and elemental variation) on the order of N^2, where N is the number of atoms in the structure. A larger fingerprint mandates a larger DFT-based training data set, which in turn requires DFT calculations that scale as O(N^3). To tackle this problem, we employ an autoencoder framework in which a specially designed neural network shrinks the structure fingerprint into a meaningful intermediate representation. Ideally, this representation should retain all the information that is (redundantly) stored in the fingerprint and allow a good, non-overfitting model to be built with much less data.
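The idea of compressing a redundant fingerprint into a shorter code can be sketched with the simplest possible autoencoder. The example below is a plain linear autoencoder trained by full-batch gradient descent in pure Python; it is not the specially designed network referred to above, and the toy fingerprints, layer sizes, learning rate and function names are all assumptions chosen for illustration.

```python
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_weights(n_in, n_code, seed=0):
    rng = random.Random(seed)
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_code)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_code)] for _ in range(n_in)]
    return W_enc, W_dec

def reconstruction_loss(W_enc, W_dec, data):
    """Mean squared reconstruction error per sample."""
    total = 0.0
    for x in data:
        x_hat = matvec(W_dec, matvec(W_enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train(W_enc, W_dec, data, lr=0.02, epochs=2000):
    """Full-batch gradient descent on the reconstruction loss (in place)."""
    n_code, n_in = len(W_enc), len(W_enc[0])
    for _ in range(epochs):
        g_enc = [[0.0] * n_in for _ in range(n_code)]
        g_dec = [[0.0] * n_code for _ in range(n_in)]
        for x in data:
            z = matvec(W_enc, x)                      # encode
            err = [a - b for a, b in zip(matvec(W_dec, z), x)]
            # Gradient of the squared error with respect to the code z.
            dz = [sum(2 * err[i] * W_dec[i][j] for i in range(n_in))
                  for j in range(n_code)]
            for i in range(n_in):
                for j in range(n_code):
                    g_dec[i][j] += 2 * err[i] * z[j]
            for j in range(n_code):
                for m in range(n_in):
                    g_enc[j][m] += dz[j] * x[m]
        scale = lr / len(data)
        for i in range(n_in):
            for j in range(n_code):
                W_dec[i][j] -= scale * g_dec[i][j]
        for j in range(n_code):
            for m in range(n_in):
                W_enc[j][m] -= scale * g_enc[j][m]

# Toy fingerprints with built-in redundancy: entries come in equal pairs,
# so two numbers are enough to describe each four-component vector.
data = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.5, 1.0, 1.0],
]

W_enc, W_dec = init_weights(n_in=4, n_code=2)
loss_before = reconstruction_loss(W_enc, W_dec, data)
train(W_enc, W_dec, data)
loss_after = reconstruction_loss(W_enc, W_dec, data)

code = matvec(W_enc, data[0])  # compressed representation fed to the regressor
```

In a setup like this, the downstream property model would be trained on `code` rather than on the full fingerprint, so it has fewer input dimensions to fit and correspondingly less room to overfit a small training set.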
Preliminary studies on the hybrid-perovskite dataset published in [7] show the promise of the latter approach. The structure of the dataset prevents efficient use of the PDDF, because the molecular center ions add a lot of noise and their properties are not as readily available as those of atoms. Nevertheless, adding a structural fingerprint improves the model compared with using only basic features of the coarse-grained hybrid perovskite structure. Furthermore, we can effectively reduce overfitting of the PDDF model by encoding the fingerprint with an autoencoder.
References
[1] Michael Saliba et al. “Incorporation of rubidium cations into perovskite solar cells improves photovoltaic performance”. In: Science 354.6309 (2016), pp. 206–209. issn: 0036-8075. doi: 10.1126/science.aah5557.
[2] Ajay Singh and Alessio Gagliardi. “Efficiency of all-perovskite two-terminal tandem solar cells: A drift-diffusion study”. In: Solar Energy 187 (July 2019), pp. 39–46. doi: 10.1016/j.solener.2019.05.006.
[3] Felix Faber et al. “Crystal structure representations for machine learning models of formation energies”. In: International Journal of Quantum Chemistry 115.16 (Apr. 2015), pp. 1094–1101. doi: 10.1002/qua.24917.
[4] G. Pilania et al. “Machine learning bandgaps of double perovskites”. In: Scientific Reports 6.19375 (). doi: 10.1038/srep19375.
[5] Jared C. Stanley, Felix Mayr, and Alessio Gagliardi. “Machine Learning Stability and Bandgaps of Lead-Free Perovskites for Photovoltaics”. (accepted to Adv. Theory Simul.) Oct. 2019.
[6] Lauri Himanen et al. “DScribe: Library of Descriptors for Machine Learning in Materials Science”. In: (Apr. 18, 2019). doi: 10.1016/j.cpc.2019.106949. arXiv: http://arxiv.org/abs/1904.08875v1 [cond-mat.mtrl-sci].
[7] Chiho Kim et al. “A hybrid organic-inorganic perovskite dataset”. In: Scientific Data 4 (May 2017), p. 170057. doi: 10.1038/sdata.2017.57.