Active learning for building data-efficient machine learning potentials

Published Time:  2024-07-03 21:10:05


If you want to use machine learning for potential energy surfaces, one of the biggest obstacles is getting the data to train a machine learning potential. We have recently developed a physics-informed active learning protocol for efficient data sampling and for training potentials from scratch, as described in this preprint.

We have integrated this protocol into MLatom to provide a user-friendly solution: it is now available in our new MLatom 3.7.0 release, which you can install with pip, find on GitHub, or use on the XACS cloud computing platform.

Here are some examples of what you can achieve with this protocol:

- Get a machine learning potential in a couple of days, trained on fewer than a thousand DFT calculations, and use it to run very long MD simulations to generate vibrational spectra.

- Run thousands of quasi-classical trajectories to find surprising rare events in reaction mechanisms, such as roaming in the Diels–Alder reaction with fullerene.

We also provide tutorials on performing such active learning-assisted simulations, including end-to-end active learning: from initial sampling, through choosing your ML model and the sampler of new points, to running the final simulations.
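
To give a feel for the stages such a workflow goes through (initial sampling, model training, sampling of new points, final use of the model), here is a minimal toy sketch in Python. It is not the MLatom interface and not the physics-informed protocol from the preprint. Everything in it is an assumption made for illustration: a one-dimensional Morse curve stands in for the DFT reference, a fixed grid of candidate geometries stands in for the sampler, and the uncertainty is simply the disagreement between two kernel ridge regression models with different length scales. All function names are hypothetical.

```python
import numpy as np


def reference_energy(x):
    """Toy stand-in for an expensive reference calculation (a Morse curve)."""
    return (1.0 - np.exp(-1.5 * (x - 1.0))) ** 2


def train_model(x_train, y_train, length_scale, regularization=1e-8):
    """Kernel ridge regression with a Laplacian kernel; returns a predictor."""
    kernel = np.exp(-np.abs(x_train[:, None] - x_train[None, :]) / length_scale)
    weights = np.linalg.solve(kernel + regularization * np.eye(len(x_train)), y_train)

    def predict(x):
        return np.exp(-np.abs(x[:, None] - x_train[None, :]) / length_scale) @ weights

    return predict


def active_learning(threshold=5e-3, max_iterations=100):
    # Candidate geometries; a real sampler would generate these on the fly.
    pool = np.linspace(0.6, 3.0, 241)

    # Initial sampling: 7 evenly spaced points, including both ends of the range.
    labeled = np.zeros(len(pool), dtype=bool)
    labeled[::40] = True

    for _ in range(max_iterations):
        x_train = pool[labeled]
        # For brevity all labels are recomputed; with a costly reference
        # method only the newly selected point would be calculated.
        y_train = reference_energy(x_train)

        # Two models that differ only in length scale; their disagreement
        # serves as a crude uncertainty estimate.
        model_a = train_model(x_train, y_train, length_scale=0.5)
        model_b = train_model(x_train, y_train, length_scale=2.0)
        disagreement = np.abs(model_a(pool) - model_b(pool))
        disagreement[labeled] = 0.0  # never re-select labeled points

        if disagreement.max() < threshold:
            break  # the models agree everywhere in the pool

        # Label the single most uncertain candidate and retrain.
        labeled[np.argmax(disagreement)] = True

    x_train = pool[labeled]
    final_model = train_model(x_train, reference_energy(x_train), length_scale=0.5)
    return final_model, x_train


model, x_train = active_learning()
print(f"Reference calculations used: {len(x_train)}")
print("Predicted energies:", model(np.array([1.0, 2.0])))
```

The loop stops either when the two models agree to within `threshold` on every candidate or when the iteration budget runs out, so the number of reference calculations is bounded in both cases. In a real application the choice of model, uncertainty criterion and sampler matters far more than in this toy setting, which is what the tutorials cover.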