Bringing the power of equivariant NN potential through the interface of MACE to MLatom@XACS

Published Time:  2024-01-10 21:45:58

Equivariant potentials are the (relatively) new kid on the block, with promisingly high accuracy in published benchmarks. One of them is MACE, which we have now added to the zoo of machine learning potentials available through the interfaces in MLatom. See the figure above for an overview of the MLPs supported by MLatom (in bold) and other representatives (modified from our MLP benchmark paper). We have just released MLatom 3.1.0 with MACE and show how to use it here.


Installation

pip install mlatom
git clone https://github.com/ACEsuit/mace.git
pip install ./mace


Data preparation

Below we provide a 1000-point dataset randomly sampled from the MD17 dataset for the ethanol molecule as the training data (xyz.dat, en.dat, and grad.dat, which store the geometries, potential energies, and energy gradients, respectively), along with test data of another 1000 points (file names begin with "test_").


mace_tutorial   (Note that the energies are in Hartree and the distances are in Ångstrom.)
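If you prepare your own data instead, note that the original MD17 distribution reports energies in kcal/mol, so a unit conversion may be needed first. A minimal sketch, assuming a standard conversion factor (MLatom itself exposes the same value as ml.constants.Hartree2kcalpermol; the helper function here is ours, not part of MLatom):

```python
# Conversion factor (assumption: standard value, also available in
# MLatom as ml.constants.Hartree2kcalpermol).
HARTREE2KCALPERMOL = 627.5095

def kcalpermol_to_hartree(energy_kcal):
    """Convert an energy from kcal/mol (MD17's original unit) to Hartree."""
    return energy_kcal / HARTREE2KCALPERMOL

# A raw MD17 ethanol energy of about -97208 kcal/mol corresponds
# to roughly -154.91 Hartree:
print(kcalpermol_to_hartree(-97208.4))
```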


Training, testing and using MACE can be done through input files, command line, and Python API. Below we show how.


Training and testing with an input file or the command line


createMLmodel            # task to create MLmodel
XYZfile=xyz.dat          # file with geometries
Yfile=en.dat             # file with energies
YgradXYZfile=grad.dat    # file with energy gradients
MLmodelType=MACE         # specify the model type to be MACE
mace.max_num_epochs=100  # only train for 100 epochs (optional)
MLmodelOut=mace.pt       # give your trained model a name

You can save the above input in a file train.inp and then run it with MLatom in your terminal as:


  > mlatom train.inp


Alternatively, you can run all options in the command line:

  > mlatom createMLmodel XYZfile=xyz.dat Yfile=en.dat YgradXYZfile=grad.dat MLmodelType=MACE mace.max_num_epochs=100 MLmodelOut=mace.pt


You can also submit a job to our XACS cloud computing or use its online terminal. It's free, but training on CPUs only can be very slow. To speed up this test, you can comment out or delete the line YgradXYZfile=grad.dat; the model will then be trained on energies only, which is faster.


The web interface of XACS cloud computing's job submitter



After the training of 100 epochs is finished (it may take a while, especially if you don't use a GPU), you will see an analysis of the training performance generated by MACE and MLatom. My result looks like this:



The validation RMSE is 14.1 meV (or 0.33 kcal/mol), which is quite impressive for just 1000 training points.
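As a quick sanity check of the quoted numbers, 1 eV is about 23.06 kcal/mol, so 14.1 meV should indeed come out near 0.33 kcal/mol:

```python
# Sanity check of the unit conversion: 1 eV ~ 23.0605 kcal/mol
# (assumption: standard approximate conversion factor).
EV2KCALPERMOL = 23.0605

rmse_mev = 14.1                               # validation RMSE in meV
rmse_kcal = rmse_mev / 1000.0 * EV2KCALPERMOL
print(f"{rmse_kcal:.2f} kcal/mol")            # -> 0.33 kcal/mol
```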

Then you can test the trained model on the test files with the following inputs:


useMLmodel
XYZfile=test_xyz.dat
YgradXYZestFile=test_gradest.dat
Yestfile=test_enest.dat
MLmodelType=MACE
MLmodelIn=mace.pt

analyze
Yfile=test_en.dat
YgradXYZfile=test_grad.dat
Yestfile=test_enest.dat
YgradXYZestFile=test_gradest.dat


The analysis results look like this (note that the original units are Hartree and Hartree/Å):



The RMSE is around 0.45 kcal/mol for energies and 0.76 kcal/mol/Å for gradients.



Training and testing in Python


First, let's import MLatom:

import mlatom as ml


which offers great flexibility. You can check the documentation.


Doing the training in Python is also simple. First, load the data into a molecular database:
molDB = ml.data.molecular_database.from_xyz_file(filename='xyz.dat')
molDB.add_scalar_properties_from_file('en.dat', 'energy')
molDB.add_xyz_vectorial_properties_from_file('grad.dat', 'energy_gradients')


Then define a MACE model and train with the database:

model = ml.models.mace(model_file='mace.pt', hyperparameters={'max_num_epochs': 100})
model.train(molDB, property_to_learn='energy', xyz_derivative_property_to_learn='energy_gradients')


Making predictions with the model:

test_molDB = ml.data.molecular_database.from_xyz_file(filename='test_xyz.dat')
test_molDB.add_scalar_properties_from_file('test_en.dat', 'energy')
test_molDB.add_xyz_vectorial_properties_from_file('test_grad.dat', 'energy_gradients')
model.predict(molecular_database=test_molDB, property_to_predict='mace_energy', xyz_derivative_property_to_predict='mace_gradients')


Then you can do whatever analysis you like, e.g., calculate the RMSE:

ml.stats.rmse(test_molDB.get_properties('energy'), test_molDB.get_properties('mace_energy'))*ml.constants.Hartree2kcalpermol
ml.stats.rmse(test_molDB.get_xyz_vectorial_properties('energy_gradients').flatten(), test_molDB.get_xyz_vectorial_properties('mace_gradients').flatten())*ml.constants.Hartree2kcalpermol
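For readers without MLatom at hand, here is a plain-Python sketch of what ml.stats.rmse is assumed to compute (a standard root-mean-square error), using the same Hartree-to-kcal/mol factor; the toy energies are made up for illustration:

```python
import math

# Same factor as ml.constants.Hartree2kcalpermol (assumption: standard value).
HARTREE2KCALPERMOL = 627.5095

def rmse(reference, estimated):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((r - e) ** 2 for r, e in zip(reference, estimated)) / len(reference)
    )

# Toy reference vs. predicted energies in Hartree (illustration only)
ref = [-154.912, -154.905, -154.920]
est = [-154.911, -154.907, -154.919]
print(rmse(ref, est) * HARTREE2KCALPERMOL)  # error in kcal/mol
```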


Using the model


After the model is trained, it can be used with MLatom for applications such as geometry optimizations or MD; check out MLatom's manual for details. Here is a brief example of how the input file for a geometry optimization would look:

geomopt                      # request geometry optimization
MLmodelType=MACE             # use ML model of the MACE type
MLmodelIn=mace.pt            # the model to be used
XYZfile=ethanol_init.xyz     # the file with the initial guess
optXYZ=eq_MACE.xyz           # optimized geometry output


In Python, geometry optimization is also quite simple:

import mlatom as ml
# load initial geometry
mol = ml.data.molecule.from_xyz_file('ethanol_init.xyz')
print(mol.get_xyz_string())

# load the model
model = ml.models.mace(model_file='mace.pt')

# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())


Summary


We are glad to introduce the MACE interface in MLatom and to share this tutorial on how to use it. The model shows great performance even with a relatively small training set. We hope it will be helpful to you.


Finally, atomistic machine learning is growing fast, and, as an integrative platform, MLatom will keep evolving with state-of-the-art methods to offer the best experience to the community.