Equivariant potentials are the (relatively) new kid on the block, with promisingly high accuracy in published benchmarks. One of them is MACE, which we have now added to the zoo of machine learning potentials available through the interfaces in MLatom. See the figure above for an overview of the MLPs supported by MLatom (in bold) and other representatives (modified from our MLP benchmark paper). We have just released MLatom 3.1.0 with MACE, and here we show how to use it.
Installation
pip install mlatom
git clone https://github.com/ACEsuit/mace.git
pip install ./mace
Data preparation
Below we provide a 1000-point dataset randomly sampled from the MD17 dataset for the ethanol molecule as the training data (xyz.dat, en.dat, and grad.dat, which store the geometries, potential energies, and energy gradients, respectively), along with test data of another 1000 points (file names begin with "test_").
mace_tutorial
(Note that the energies are in Hartree and the distances are in Ångstrom.)
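Before training, it can be worth checking that the geometry and energy files describe the same number of points. The sketch below assumes that xyz.dat is a concatenation of standard XYZ blocks (atom count, comment line, one line per atom) and that en.dat holds one energy per line. Both layouts are assumptions about the file format, and the helper names are made up for illustration.

```python
def count_xyz_geometries(text):
    """Count geometries in concatenated XYZ blocks (assumed layout)."""
    lines = text.splitlines()
    count = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip stray blank lines between blocks
            continue
        n_atoms = int(lines[i].strip())
        # One block = atom-count line + comment line + n_atoms coordinate lines
        i += 2 + n_atoms
        count += 1
    return count


def count_energies(text):
    """Count non-empty lines, assuming one energy per line."""
    return sum(1 for line in text.splitlines() if line.strip())


# Tiny in-memory example with two made-up geometries (not real MD17 data)
xyz_text = """2

H 0.0 0.0 0.0
H 0.0 0.0 0.74
2

H 0.0 0.0 0.0
H 0.0 0.0 0.80
"""
en_text = "-1.10\n-1.09\n"

n_geometries = count_xyz_geometries(xyz_text)
n_energies = count_energies(en_text)
print(n_geometries == n_energies)
```

With the real files, the two strings would be read with `open('xyz.dat').read()` and `open('en.dat').read()` instead of being defined inline.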
Training, testing, and using MACE can be done through input files, the command line, and the Python API. Below we show how.
Training and testing with input file and command line
createMLmodel
XYZfile=xyz.dat
Yfile=en.dat
YgradXYZfile=grad.dat
MLmodelType=MACE
mace.max_num_epochs=100
MLmodelOut=mace.pt
You can save the input above in a file named train.inp and then run it with MLatom in your terminal:
> mlatom train.inp
Alternatively, you can run all options in the command line:
> mlatom createMLmodel XYZfile=xyz.dat Yfile=en.dat YgradXYZfile=grad.dat MLmodelType=MACE mace.max_num_epochs=100 MLmodelOut=mace.pt
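The input file and the command line carry exactly the same options, so both can be generated from one list. The sketch below only builds strings and does not call MLatom; the helper names are hypothetical, and the `with_gradients` switch simply drops the YgradXYZfile option to get an energy-only training input.

```python
def build_options(with_gradients=True):
    """Return the createMLmodel options used in this tutorial."""
    options = [
        "createMLmodel",
        "XYZfile=xyz.dat",
        "Yfile=en.dat",
        "YgradXYZfile=grad.dat",
        "MLmodelType=MACE",
        "mace.max_num_epochs=100",
        "MLmodelOut=mace.pt",
    ]
    if not with_gradients:
        # Energy-only training: leave out the gradient file option
        options = [o for o in options if not o.startswith("YgradXYZfile=")]
    return options


def as_input_file(options):
    """One option per line, as in train.inp."""
    return "\n".join(options) + "\n"


def as_command_line(options):
    """All options on a single line after the mlatom command."""
    return "mlatom " + " ".join(options)


print(as_input_file(build_options()))
print(as_command_line(build_options(with_gradients=False)))
```

Writing `as_input_file(build_options())` to train.inp reproduces the input file shown earlier.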
You can also submit a job to our XACS cloud computing service or use its online terminal. It's free, but training on CPUs only can be very slow. To speed up the test, you can comment out or delete the line YgradXYZfile=grad.dat; the model is then trained on energies only, which is faster.
The web interface of XACS cloud computing's job submitter
useMLmodel
XYZfile=test_xyz.dat
YgradXYZestFile=test_gradest.dat
Yestfile=test_enest.dat
MLmodelType=MACE
MLmodelIn=mace.pt
analyze
Yfile=test_en.dat
YgradXYZfile=test_grad.dat
Yestfile=test_enest.dat
YgradXYZestFile=test_gradest.dat
The analysis results look like this (note that the original units are Hartree and Hartree/Ångstrom):
Around 0.45 kcal/mol for energies and 0.76 kcal/mol/Å for gradients.
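To make the unit conversion concrete, here is a minimal sketch of a root-mean-square error computed in Hartree and then converted to kcal/mol. It is not MLatom's own analysis code; the conversion factor is the usual approximate value of 627.5095 kcal/mol per Hartree, and the energies are made-up numbers for illustration only.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.5095  # approximate conversion factor


def rmse(reference, estimated):
    """Root-mean-square error between two equally long sequences."""
    if len(reference) != len(estimated):
        raise ValueError("reference and estimated must have the same length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up energies in Hartree (not results from the tutorial)
reference = [-154.000, -154.001, -154.002]
estimated = [-154.001, -154.001, -154.001]

rmse_hartree = rmse(reference, estimated)
rmse_kcal = rmse_hartree * HARTREE_TO_KCAL_PER_MOL
print(f"{rmse_hartree:.6f} Hartree = {rmse_kcal:.3f} kcal/mol")
```

The same factor applies to gradients, turning Hartree/Ångstrom into kcal/mol/Å.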
Training and using in Python
The Python API offers great flexibility; you can check the documentation for details. First, let's import MLatom:
import mlatom as ml
Then load the training data into a molecular database:
molDB = ml.data.molecular_database.from_xyz_file(filename = 'xyz.dat')
molDB.add_scalar_properties_from_file('en.dat', 'energy')
molDB.add_xyz_vectorial_properties_from_file('grad.dat', 'energy_gradients')
Then define a MACE model and train with the database:
model = ml.models.mace(model_file='mace.pt', hyperparameters={'max_num_epochs': 100})
model.train(molDB, property_to_learn='energy', xyz_derivative_property_to_learn='energy_gradients')
Making predictions with the model:
test_molDB = ml.data.molecular_database.from_xyz_file(filename = 'test_xyz.dat')
test_molDB.add_scalar_properties_from_file('test_en.dat', 'energy')
test_molDB.add_xyz_vectorial_properties_from_file('test_grad.dat', 'energy_gradients')
model.predict(molecular_database=test_molDB, property_to_predict='mace_energy', xyz_derivative_property_to_predict='mace_gradients')
Then you can do whatever analysis you like, e.g., calculate the RMSE:
# RMSE of energies, converted from Hartree to kcal/mol
print(ml.stats.rmse(test_molDB.get_properties('energy'), test_molDB.get_properties('mace_energy'))*ml.constants.Hartree2kcalpermol)
# RMSE of gradients, converted from Hartree/Ångstrom to kcal/mol/Å
print(ml.stats.rmse(test_molDB.get_xyz_vectorial_properties('energy_gradients').flatten(), test_molDB.get_xyz_vectorial_properties('mace_gradients').flatten())*ml.constants.Hartree2kcalpermol)
Using the model
After the model is trained, it can be used with MLatom for applications such as geometry optimizations or MD; check out MLatom's manual for details. Here is a brief example of what the input file for a geometry optimization would look like:
geomopt # Request geometry optimization
MLmodelType=MACE
MLmodelIn=mace.pt
XYZfile=ethanol_init.xyz # The file with the initial guess
optXYZ=eq_MACE.xyz
In Python, geometry optimization is also quite simple:
import mlatom as ml
mol = ml.data.molecule.from_xyz_file('ethanol_init.xyz')
print(mol.get_xyz_string())
model = ml.models.mace(model_file='mace.pt')
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())
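To quantify how much the geometry changed, the two printed XYZ strings can be compared directly. The sketch below assumes each string is a single standard XYZ block (atom count, comment line, one line per atom) and that the atom order is unchanged. It does not align the structures, so any overall translation or rotation introduced by the optimizer would also count as displacement. The helper names and the H2 coordinates are made up for illustration.

```python
import math


def parse_xyz(xyz_string):
    """Parse a single-geometry XYZ string into (symbols, coordinates)."""
    lines = xyz_string.strip().splitlines()
    n_atoms = int(lines[0])
    symbols, coords = [], []
    # Skip the atom-count line and the comment line
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        symbols.append(fields[0])
        coords.append(tuple(float(x) for x in fields[1:4]))
    return symbols, coords


def max_displacement(xyz_before, xyz_after):
    """Largest distance any atom moved, in the units of the input."""
    symbols_a, coords_a = parse_xyz(xyz_before)
    symbols_b, coords_b = parse_xyz(xyz_after)
    if symbols_a != symbols_b:
        raise ValueError("atom order or composition differs")
    return max(math.dist(a, b) for a, b in zip(coords_a, coords_b))


# Made-up H2 geometries in Ångstrom (not ethanol results)
before = "2\n\nH 0.0 0.0 0.0\nH 0.0 0.0 0.80\n"
after = "2\n\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"
print(f"{max_displacement(before, after):.3f}")
```

In the workflow above, `before` and `after` would be the strings returned by the two `mol.get_xyz_string()` calls.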
Summary