Train and use models for H2

This tutorial shows how to train machine learning potentials for H2 and how to use them for geometry optimization, taking two model types as examples.

KREG

First, we train a KREG model for H2. On the command line, prepare the input file h2_train_KREG.inp. Two auxiliary files are needed: h2.xyz with the XYZ geometries of H2 and E_FCI_451.dat with the labels (reference FCI/aug-cc-pV6Z energies in Hartree). The trained model will be saved in energies.unf.

# h2_train_KREG.inp
createMLmodel           # Specify the task for MLatom
MLmodelType=KREG        # Specify the model type
MLmodelOut=energies.unf # Save model in energies.unf
XYZfile=h2.xyz          # File with XYZ geometries
Yfile=E_FCI_451.dat     # The file with FCI energies
sigma=opt               # Optimize hyperparameter sigma
lgSigmaL=-4             # Lower bound of log2(sigma)
lambda=opt              # Optimize hyperparameter lambda

Then run it:

mlatom h2_train_KREG.inp
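
Training needs one label per geometry, so a mismatch between h2.xyz and E_FCI_451.dat is worth ruling out. The sketch below is an independent check and not part of MLatom. It assumes the usual multi-geometry XYZ layout (atom count, comment line, atom lines) and one energy per line in the label file. The function names and the two-geometry sample with placeholder energies are hypothetical and only illustrate the check.

```python
def count_xyz_geometries(xyz_text):
    """Count geometries in multi-geometry XYZ text (atom count, comment, atoms)."""
    lines = xyz_text.splitlines()
    count = 0
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank separator lines between geometries
            continue
        n_atoms = int(lines[i].split()[0])
        atom_lines = lines[i + 2 : i + 2 + n_atoms]  # lines[i + 1] is the comment line
        if len(atom_lines) != n_atoms:
            raise ValueError(f"geometry {count + 1} is truncated")
        count += 1
        i += 2 + n_atoms
    return count


def count_labels(label_text):
    """Count labels, assuming one floating-point number per non-empty line."""
    values = [float(line.split()[0]) for line in label_text.splitlines() if line.strip()]
    return len(values)


# Hypothetical two-geometry sample; the energies are placeholders, not FCI values.
sample_xyz = "2\n\nH 0.0 0.0 0.0\nH 0.0 0.0 0.5\n2\n\nH 0.0 0.0 0.0\nH 0.0 0.0 0.6\n"
sample_labels = "-1.0\n-1.1\n"

n_geoms = count_xyz_geometries(sample_xyz)
n_labels = count_labels(sample_labels)
print(f"{n_geoms} geometries, {n_labels} labels, consistent: {n_geoms == n_labels}")
```

For the real files, the same functions can be applied to the contents of h2.xyz and E_FCI_451.dat read with open(...).read().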

In Python, we need to provide the same two auxiliary files.

import mlatom as ml

# load data set
molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')

# define the model
model = ml.models.kreg(model_file='energies')

# split data set for optimizing hyperparameters
[subtraining_molDB, validation_molDB] = ml.data.sample(number_of_splits=2, fraction_of_points_in_splits=[0.8, 0.2], molecular_database_to_split=molDB, sampling='random')

# optimize hyperparameters
model.hyperparameters["sigma"].minval = 2**-4
model.optimize_hyperparameters(subtraining_molecular_database=subtraining_molDB,
                               validation_molecular_database=validation_molDB,
                               optimization_algorithm='nelder-mead',
                               hyperparameters=['lambda', 'sigma'],
                               training_kwargs={'property_to_learn': 'energy'},
                               prediction_kwargs=None)
lmbd = model.hyperparameters['lambda'].value
sigma = model.hyperparameters['sigma'].value
print(f'Optimized hyperparameters: lambda={lmbd}, sigma={sigma}')

# train the final model
model.train(molecular_database=molDB, property_to_learn='energy')
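
To give some intuition for what sigma and lambda control, the sketch below fits a one-dimensional kernel ridge regression model with a Gaussian kernel on the descriptor r_eq/r, which is the idea KREG builds on. It is not MLatom's implementation. It makes several assumptions: a Morse curve with made-up parameters stands in for the FCI energies, r_eq is set to 0.74 Angstrom, and sigma and lambda are fixed by hand instead of being optimized. All function names are hypothetical.

```python
import math

R_EQ = 0.74  # assumed equilibrium H-H distance in Angstrom (illustrative value)


def morse(r, d_e=0.17, a=1.9):
    """Hypothetical Morse curve standing in for the reference energies (Hartree)."""
    return d_e * (1.0 - math.exp(-a * (r - R_EQ))) ** 2 - d_e


def descriptor(r):
    """Descriptor for a diatomic: equilibrium distance over actual distance."""
    return R_EQ / r


def gaussian_kernel(x1, x2, sigma):
    """Gaussian kernel; sigma sets the length scale of similarity."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * sigma**2))


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def train(xs, ys, sigma, lmbd):
    """Return regression coefficients alpha from (K + lambda * I) alpha = y."""
    n = len(xs)
    k_matrix = [
        [gaussian_kernel(xs[i], xs[j], sigma) + (lmbd if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(k_matrix, ys)


def predict(x_new, xs, alphas, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(alpha * gaussian_kernel(x_new, x, sigma) for alpha, x in zip(alphas, xs))


SIGMA, LAMBDA = 0.5, 1e-8  # hand-picked for the sketch, not optimized
train_r = [0.5 + 0.05 * i for i in range(31)]  # bond lengths from 0.5 to 2.0 Angstrom
train_x = [descriptor(r) for r in train_r]
train_y = [morse(r) for r in train_r]
alphas = train(train_x, train_y, SIGMA, LAMBDA)

r_test = 0.77  # not among the training points
e_pred = predict(descriptor(r_test), train_x, alphas, SIGMA)
print(f"r = {r_test} A: predicted {e_pred:.5f}, reference {morse(r_test):.5f} Hartree")
```

In this picture, sigma sets how far the influence of each training point reaches in descriptor space, and lambda regularizes the linear solve. That is why both are tuned on a validation split before the final model is trained on all data.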

Now we can use the model. On the command line, prepare the input file h2_opt_KREG.inp for geometry optimization. We need to provide the initial geometry of H2 (h2_init.xyz) and the model trained in the previous step (energies.unf).

# h2_opt_KREG.inp
geomopt                # Request geometry optimization
MLmodelType=KREG       # of the KREG type
MLmodelIn=energies.unf # in energies.unf file
XYZfile=h2_init.xyz    # The file with initial guess
optXYZ=eq_KREG.xyz     # optimized geometry output
-------------------------------------------------------------------
# h2_init.xyz
2

H             0.0000000000000           0.0000000000000           0.0000000000000
H             0.0000000000000           0.0000000000000           0.8000000000000

Run the geometry optimization:

mlatom h2_opt_KREG.inp

The optimized geometry is written to eq_KREG.xyz:

cat eq_KREG.xyz
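
For a diatomic, the quantity of interest in the optimized geometry is the bond length. The sketch below uses only the Python standard library to parse a single-geometry XYZ string and compute the H-H distance. The helper names are hypothetical. It is demonstrated on the initial geometry from h2_init.xyz, since the contents of eq_KREG.xyz depend on the trained model.

```python
import math


def parse_xyz(xyz_text):
    """Parse single-geometry XYZ text into a list of (symbol, (x, y, z)) tuples."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for line in lines[2 : 2 + n_atoms]:  # lines[1] is the comment line
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def bond_length(atoms, i=0, j=1):
    """Euclidean distance between atoms i and j, in the units of the XYZ file."""
    return math.dist(atoms[i][1], atoms[j][1])


# Initial geometry from h2_init.xyz, as given in this tutorial.
H2_INIT_XYZ = """2

H             0.0000000000000           0.0000000000000           0.0000000000000
H             0.0000000000000           0.0000000000000           0.8000000000000
"""

initial_atoms = parse_xyz(H2_INIT_XYZ)
print(f"Initial H-H distance: {bond_length(initial_atoms):.4f} Angstrom")
```

Applying the same two functions to the text of eq_KREG.xyz gives the optimized bond length predicted by the model.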

In Python, we need to provide the same auxiliary files.

import mlatom as ml

# load initial geometry
mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
print(mol.get_xyz_string())

# load the model
model = ml.models.kreg(model_file='energies')

# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())

TorchANI

Besides the KREG model, we can also use other machine learning potentials, e.g., the ANI model. As above, on the command line, prepare the input file h2_train_ANI.inp and the auxiliary files (h2.xyz, E_FCI_451.dat). The trained model will be saved in energies_ani.pt.

# h2_train_ANI.inp
createMLmodel               # Specify the task for MLatom
MLmodelType=ANI             # Specify the model type
MLmodelOut=energies_ani.pt  # Save model in energies_ani.pt
XYZfile=h2.xyz              # File with XYZ geometries
Yfile=E_FCI_451.dat         # File with FCI energies (can be any other property)
#ani.max_epochs=16          # Optional: uncomment to train for only 16 epochs

Run it:

mlatom h2_train_ANI.inp

In Python, we need to prepare the same two auxiliary files.

import mlatom as ml

# load data set
molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')

# define the model
model = ml.models.ani(model_file='energies_ani_api.pt', hyperparameters={'max_epochs': 16})

# train the model
model.train(molecular_database=molDB, property_to_learn='energy')

Now we can use the model for geometry optimization. On the command line, prepare the input file h2_opt_ANI.inp and the auxiliary files: the initial geometry of H2 (h2_init.xyz) and the model trained in the previous step (energies_ani.pt).

# h2_opt_ANI.inp
geomopt                     # Request geometry optimization
MLmodelType=ANI             # of the ANI type
MLmodelIn=energies_ani.pt   # in energies_ani.pt file
XYZfile=h2_init.xyz         # The file with initial guess
optXYZ=eq_ANI.xyz           # optimized geometry output

Run the geometry optimization:

mlatom h2_opt_ANI.inp

The optimized geometry is written to eq_ANI.xyz:

cat eq_ANI.xyz

In Python, we need to prepare the same auxiliary files.

import mlatom as ml

# load initial geometry
mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
print(mol.get_xyz_string())

# load the model
model = ml.models.ani(model_file='energies_ani_api.pt')

# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())