By Olexandr Isayev, Ph.D.
Associate Professor, Carnegie Mellon University
Machine-learned interatomic potentials (MLIPs) are reshaping computational chemistry because they can largely sidestep the traditional tradeoff between accuracy and accessible length and time scales. Yet this efficiency pays off only when an MLIP uniquely enables insight into a target system or transfers broadly beyond its training dataset, and models of the latter kind are seldom reported. Recently, we presented the second generation of our atoms-in-molecules neural network potential (AIMNet2), which is applicable to species composed of up to 14 chemical elements in both neutral and charged states, making it a valuable tool for modeling the majority of non-metallic compounds.
Intermolecular interactions are central to many fields of materials science, including drug formulation and crystal engineering. Computational approaches to crystal structure prediction (CSP) generally fall into two groups, depending on whether the intermolecular energy is computed with first-principles methods or with force fields. In this work, we present a computationally efficient and accurate ML approach to CSP built on the AIMNet2 model, which adequately describes long-range interactions. AIMNet2 includes explicit dispersion and electrostatics terms and is efficient in terms of both training and computational cost.
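Schematically, a potential with explicit long-range terms splits the total energy into a learned short-range part plus physics-based dispersion and electrostatic contributions. The decomposition below is a generic sketch, not the exact AIMNet2 functional form: the symbols (atomic partial charges q_i, interatomic distances r_ij, Coulomb constant k_e, dispersion coefficients C_n and damping functions f_damp,n) are assumptions introduced here, the dispersion term is written in a DFT-D3-style two-body form, and practical implementations typically damp or screen the Coulomb term at short range.

```latex
E_{\text{total}} = E_{\text{NN}} + E_{\text{Coul}} + E_{\text{disp}}, \qquad
E_{\text{Coul}} = \frac{k_e}{2} \sum_{i \neq j} \frac{q_i \, q_j}{r_{ij}}, \qquad
E_{\text{disp}} = -\sum_{i < j} \sum_{n = 6, 8} s_n \, \frac{C_n^{ij}}{r_{ij}^{\,n}} \, f_{\text{damp},n}(r_{ij})
```

Because the last two terms decay as power laws in r_ij rather than vanishing at a cutoff, they are what lets such a model capture the long-range intermolecular contributions that matter for molecular crystals.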
The outlined MLIP-based methodology performs CSP by learning intermolecular interactions exclusively within molecular clusters, and it requires no first-principles calculations beyond the cost-effective generation of DFT labels for those clusters. The performance of the AIMNet2 interatomic potential is illustrated through several case studies. First, we predict the crystal energy landscape of 5-methyl-2-[(2-nitrophenyl)amino]-3-thiophenecarbonitrile, commonly known as ROY for the red, orange, and yellow colors of its polymorphs. ROY currently holds the world record for the most fully characterized polymorphs, with 12. Second, we study the FDA-approved drug Celebrex, known generically as celecoxib, which exhibits polymorphism; CSP was used to predict a recently characterized polymorph. Finally, we report the results of our participation in the seventh CSP blind test, whose target systems encompass chemically diverse molecules from the pharmaceutical and agrochemical industries.
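To make two quantities behind this workflow concrete, here is a minimal Python sketch, assuming total energies have already been computed with some potential (AIMNet2 or otherwise) in consistent units. It computes the supermolecular interaction energy of a cluster, E_int = E_cluster - sum_i E_i, which characterizes intermolecular interactions within a cluster, and ranks candidate crystal structures by lattice energy per molecule, E_latt = E_cell / Z - E_gas, relative to the most stable candidate. All names (`interaction_energy`, `Candidate`, `lattice_energy`, `rank_landscape`) are hypothetical and are not part of any AIMNet2 API, and the numbers in the demo are toy values, not results.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


def interaction_energy(e_cluster: float, e_monomers: Sequence[float]) -> float:
    """Supermolecular interaction energy: E_int = E_cluster - sum of monomer energies."""
    return e_cluster - sum(e_monomers)


@dataclass
class Candidate:
    """Hypothetical container for one predicted crystal structure."""
    name: str
    e_cell: float  # total energy of the unit cell, same units as the monomer energy
    z: int         # number of molecules in the unit cell


def lattice_energy(c: Candidate, e_monomer: float) -> float:
    """Lattice energy per molecule: E_cell / Z minus the isolated-molecule energy."""
    return c.e_cell / c.z - e_monomer


def rank_landscape(
    cands: Iterable[Candidate], e_monomer: float
) -> List[Tuple[str, float, float]]:
    """Sort candidates by lattice energy; return (name, E_latt, E_latt - lowest E_latt)."""
    ranked = sorted(
        ((c.name, lattice_energy(c, e_monomer)) for c in cands),
        key=lambda t: t[1],
    )
    if not ranked:
        return []
    e_min = ranked[0][1]
    return [(name, e, e - e_min) for name, e in ranked]


if __name__ == "__main__":
    # Toy numbers for illustration only (arbitrary units), not computed results.
    toy = [Candidate("A", e_cell=-410.0, z=4), Candidate("B", e_cell=-204.0, z=2)]
    for name, e, de in rank_landscape(toy, e_monomer=-100.0):
        print(f"{name}: E_latt = {e:.1f}, dE = {de:.1f}")
```

Dividing by Z puts cells with different numbers of molecules on a per-molecule footing, which is what makes their energies comparable on one landscape. A real CSP study would add structure generation, relaxation with the potential, and usually a plot of energy against density; this sketch covers only the final ranking step.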