Train and use models for H2
This section shows how to train and use different machine learning potential models, using H2 as an example:
KREG
First, we train a KREG model for H2. For the command line, prepare the input file h2_train_KREG.inp.
Two auxiliary files are needed: h2.xyz with XYZ geometries of hydrogen and E_FCI_451.dat with labels (reference FCI/aug-cc-pV6Z energies in Hartree). The trained model will be saved in energies.unf.
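Both workflows below assume that h2.xyz and E_FCI_451.dat line up entry by entry. A minimal sanity check is sketched here, assuming the standard XYZ layout (atom count, comment line, atom lines, repeated for every geometry) and one label per line in the same order as the geometries. The helpers `parse_xyz_frames`, `parse_labels` and `check_dataset` are hypothetical and not part of MLatom; the sample values are made-up placeholders, not FCI data.

```python
from typing import List, Tuple

# One geometry: list of (element, x, y, z) with coordinates in Angstrom
Geometry = List[Tuple[str, float, float, float]]


def parse_xyz_frames(text: str) -> List[Geometry]:
    """Parse a multi-geometry XYZ string: atom count, comment line, atom lines."""
    lines = text.splitlines()
    frames: List[Geometry] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # tolerate blank lines between geometries
            i += 1
            continue
        natoms = int(lines[i])
        atom_lines = lines[i + 2 : i + 2 + natoms]  # line i + 1 is the comment
        if len(atom_lines) != natoms:
            raise ValueError(f"geometry starting at line {i + 1} is truncated")
        frame = []
        for line in atom_lines:
            symbol, x, y, z = line.split()[:4]
            frame.append((symbol, float(x), float(y), float(z)))
        frames.append(frame)
        i += 2 + natoms
    return frames


def parse_labels(text: str) -> List[float]:
    """Assumed label format: one value per non-empty line (e.g. Hartree)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def check_dataset(xyz_text: str, labels_text: str):
    """Fail early if geometries and labels cannot be paired one-to-one."""
    frames = parse_xyz_frames(xyz_text)
    labels = parse_labels(labels_text)
    if len(frames) != len(labels):
        raise ValueError(f"{len(frames)} geometries but {len(labels)} labels")
    return frames, labels


# Tiny made-up example in the assumed layout (placeholder values, not FCI data)
sample_xyz = """2
geometry 1
H 0.0 0.0 0.0
H 0.0 0.0 0.7
2
geometry 2
H 0.0 0.0 0.0
H 0.0 0.0 0.8
"""
sample_labels = "-1.17\n-1.16\n"
frames, labels = check_dataset(sample_xyz, sample_labels)
print(len(frames), labels)
```

With the real files, the same call would be `check_dataset(open('h2.xyz').read(), open('E_FCI_451.dat').read())`.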
# h2_train_KREG.inp
createMLmodel # Specify the task for MLatom
MLmodelType=KREG # Specify the model type
MLmodelOut=energies.unf # Save model in energies.unf
XYZfile=h2.xyz # File with XYZ geometries
Yfile=E_FCI_451.dat # The file with FCI energies
sigma=opt # Optimize hyperparameter sigma
lgSigmaL=-4 # Lower bound of log2(sigma)
lambda=opt # Optimize hyperparameter lambda
Then run it:
mlatom h2_train_KREG.inp
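KREG is kernel ridge regression on the RE descriptor (internuclear distances of a reference equilibrium geometry divided by the current ones) with a Gaussian kernel, so `sigma` is the kernel width and `lambda` the regularization strength; `lgSigmaL=-4` restricts the search to sigma ≥ 2^-4. The numpy sketch below illustrates that idea for a diatomic only. It assumes a reference H-H distance of 0.7414 Å and the kernel form exp(-|x - x'|²/(2σ²)); it is not MLatom's implementation, and the training curve is a synthetic Morse-like function with illustrative parameters.

```python
import numpy as np

R_EQ = 0.7414  # assumed reference H-H distance (Angstrom) for the RE descriptor


def re_descriptor(r):
    """RE descriptor of a diatomic: r_eq / r, one feature per geometry."""
    return (R_EQ / np.asarray(r, dtype=float)).reshape(-1, 1)


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma**2))


def krr_fit(x, y, sigma, lam):
    """Solve (K + lambda * I) alpha = y for the regression coefficients."""
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)


def krr_predict(x_train, alpha, x_new, sigma):
    """A prediction is a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


# Synthetic Morse-like curve as a stand-in for the FCI labels (illustrative only)
DE, A = 0.1745, 1.943  # assumed well depth (Hartree) and width (1/Angstrom)
r_train = np.linspace(0.5, 3.0, 40)
y_train = DE * (1.0 - np.exp(-A * (r_train - R_EQ))) ** 2 - DE
x_train = re_descriptor(r_train)

sigma, lam = 0.1, 1e-6  # hand-picked here; MLatom optimizes them (sigma=opt, lambda=opt)
alpha = krr_fit(x_train, y_train, sigma, lam)
print(krr_predict(x_train, alpha, re_descriptor([0.74, 1.2]), sigma))
```

A small `sigma` makes the model local and flexible, while a larger `lambda` trades exact reproduction of the training labels for smoothness, which is why both are tuned rather than fixed.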
In Python, we also need to provide the same two auxiliary files.
import mlatom as ml
# load data set
molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')
# define the model
model = ml.models.kreg(model_file='energies')
# split data set for optimizing hyperparameters
[subtraining_molDB, validation_molDB] = ml.data.sample(number_of_splits=2, fraction_of_points_in_splits=[0.8, 0.2], molecular_database_to_split=molDB, sampling='random')
# optimize hyperparameters
model.hyperparameters["sigma"].minval = 2**-4
model.optimize_hyperparameters(subtraining_molecular_database=subtraining_molDB,
                               validation_molecular_database=validation_molDB,
                               optimization_algorithm='nelder-mead',
                               hyperparameters=['lambda', 'sigma'],
                               training_kwargs={'property_to_learn': 'energy'},
                               prediction_kwargs=None)
lmbd = model.hyperparameters['lambda'].value
sigma = model.hyperparameters['sigma'].value
print(f'Optimized hyperparameters: lambda={lmbd}, sigma={sigma}')
# train the final model
model.train(molecular_database=molDB, property_to_learn='energy')
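The script above splits the data 80/20, lets `optimize_hyperparameters` search `lambda` and `sigma` with Nelder-Mead by scoring candidates on the validation part, and then retrains on all data. The sketch below mirrors that split-then-search workflow with a simpler, swapped-in technique: a grid search over log2(sigma) and log2(lambda) that respects the sigma ≥ 2^-4 bound. The helpers `random_split`, `krr_validation_error` and `grid_search` are hypothetical, and the 1D data are synthetic.

```python
import itertools

import numpy as np


def random_split(n_points, train_fraction=0.8, seed=0):
    """Shuffle indices and cut them into sub-training and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_train = int(round(train_fraction * n_points))
    return order[:n_train], order[n_train:]


def krr_validation_error(x_tr, y_tr, x_val, y_val, sigma, lam):
    """Validation RMSE of a Gaussian-kernel ridge model on 1D features."""
    k_tr = np.exp(-((x_tr[:, None] - x_tr[None, :]) ** 2) / (2 * sigma**2))
    alpha = np.linalg.solve(k_tr + lam * np.eye(len(x_tr)), y_tr)
    k_val = np.exp(-((x_val[:, None] - x_tr[None, :]) ** 2) / (2 * sigma**2))
    return float(np.sqrt(np.mean((k_val @ alpha - y_val) ** 2)))


def grid_search(error_fn, log2_sigmas, log2_lambdas, log2_sigma_min=-4.0):
    """Return (error, sigma, lambda) with the lowest validation error."""
    best = None
    for ls, ll in itertools.product(log2_sigmas, log2_lambdas):
        if ls < log2_sigma_min:  # mirrors lgSigmaL=-4, i.e. sigma >= 2**-4
            continue
        sigma, lam = 2.0**ls, 2.0**ll
        err = error_fn(sigma, lam)
        if best is None or err < best[0]:
            best = (err, sigma, lam)
    return best


# Synthetic 1D data standing in for the H2 data set (not the FCI energies)
x = np.linspace(0.3, 1.5, 40)
y = np.sin(3.0 * x)
tr, val = random_split(len(x))
err, sigma, lam = grid_search(
    lambda s, l: krr_validation_error(x[tr], y[tr], x[val], y[val], s, l),
    log2_sigmas=np.arange(-6, 1),
    log2_lambdas=np.arange(-30, -4, 5),
)
print(f"Optimized hyperparameters: lambda={lam}, sigma={sigma}, RMSE={err:.2e}")
```

As in the MLatom script, the selected values would then be used to retrain on the full data set. Nelder-Mead usually needs fewer model fits than a grid, which matters once the training set is large.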
Now we can use the model. For the command line, prepare the input file h2_opt_KREG.inp for geometry optimization.
We need to provide the initial geometry of H2 (h2_init.xyz) and the model trained in the previous step (energies.unf).
# h2_opt_KREG.inp
geomopt # Request geometry optimization
MLmodelType=KREG # of the KREG type
MLmodelIn=energies.unf # in energies.unf file
XYZfile=h2_init.xyz # The file with initial guess
optXYZ=eq_KREG.xyz # optimized geometry output
-------------------------------------------------------------------
# h2_init.xyz
2

H 0.0000000000000 0.0000000000000 0.0000000000000
H 0.0000000000000 0.0000000000000 0.8000000000000
Perform geometry optimization.
mlatom h2_opt_KREG.inp
The optimized geometry is written to eq_KREG.xyz:
cat eq_KREG.xyz
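For H2 the optimized geometry boils down to one number, the bond length. A small sketch that extracts it is shown below, assuming a single-geometry file in the standard XYZ layout (atom count, comment line, then atom lines in Å). `bond_length_from_xyz` is a hypothetical helper, and the example input reuses h2_init.xyz because the optimized coordinates are not reproduced here.

```python
import math


def bond_length_from_xyz(text, i=0, j=1):
    """Distance between atoms i and j of a single-geometry XYZ string."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atoms = [line.split() for line in lines[2 : 2 + natoms]]  # skip comment line
    ri = [float(v) for v in atoms[i][1:4]]
    rj = [float(v) for v in atoms[j][1:4]]
    return math.dist(ri, rj)


# Contents of h2_init.xyz (initial guess) with an empty comment line
init_xyz = """2

H 0.0000000000000 0.0000000000000 0.0000000000000
H 0.0000000000000 0.0000000000000 0.8000000000000
"""
print(bond_length_from_xyz(init_xyz))  # initial H-H distance in Angstrom
```

Calling it on `open('eq_KREG.xyz').read()` gives the KREG equilibrium bond length, which can then be compared with the 0.8 Å starting value.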
In Python, we need to provide the same auxiliary files.
import mlatom as ml
# load initial geometry
mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
print(mol.get_xyz_string())
# load the model
model = ml.models.kreg(model_file='energies')
# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())
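In a geometry optimization, the model generally only has to supply energies and gradients; the optimizer moves the nuclei downhill until the remaining force falls below a threshold. The one-dimensional sketch below illustrates that loop. It assumes an analytic Morse curve as a stand-in for the trained KREG model (parameters are illustrative, roughly H2-like, and not fitted to the FCI data) and uses plain gradient descent with backtracking in place of the optimizer that ASE provides.

```python
import math

# Illustrative, roughly H2-like Morse parameters (assumptions, not fitted to FCI data)
DE = 0.1745   # well depth, Hartree
A = 1.943     # width parameter, 1/Angstrom
R_E = 0.7414  # minimum of this model curve, Angstrom


def energy(r):
    """Stand-in for the ML model: Morse energy (Hartree) at H-H distance r."""
    return DE * (1.0 - math.exp(-A * (r - R_E))) ** 2 - DE


def gradient(r):
    """Analytic dE/dr (Hartree/Angstrom) of the Morse curve above."""
    e = math.exp(-A * (r - R_E))
    return 2.0 * DE * A * (1.0 - e) * e


def optimize_bond(r0, step=1.0, gtol=1e-6, max_iter=500):
    """Gradient descent with backtracking until |dE/dr| < gtol."""
    r = r0
    for iteration in range(max_iter):
        g = gradient(r)
        if abs(g) < gtol:
            return r, iteration
        t = step
        # halve the step until the energy actually goes down
        while energy(r - t * g) >= energy(r) and t > 1e-12:
            t *= 0.5
        r -= t * g
    return r, max_iter


r_opt, n_iter = optimize_bond(0.8)  # start from the h2_init.xyz distance
print(f"optimized H-H distance: {r_opt:.6f} Angstrom after {n_iter} steps")
```

Quasi-Newton optimizers such as BFGS follow the same energy-and-gradient loop but also build up curvature information, which lets them take better-scaled steps in many dimensions.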
TorchANI
Besides the KREG model, we can also use other machine learning potential models, e.g., the ANI model. As above, for the command line, prepare the input file h2_train_ANI.inp and the auxiliary files (h2.xyz, E_FCI_451.dat). The trained model will be saved in energies_ani.pt.
# h2_train_ANI.inp
createMLmodel # Specify the task for MLatom
MLmodelType=ANI # Specify the model type
MLmodelOut=energies_ani.pt # Save model in energies_ani.pt
XYZfile=h2.xyz # File with XYZ geometries
Yfile=E_FCI_451.dat # File with FCI energies (can be any other property)
#ani.max_epochs=16 # Uncomment to train for only 16 epochs
Run it:
mlatom h2_train_ANI.inp
In Python, we need to prepare the same two auxiliary files.
import mlatom as ml
# load data set
molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')
# define the model
model = ml.models.ani(model_file='energies_ani_api.pt', hyperparameters={'max_epochs': 16})
# train the final model
model.train(molecular_database=molDB, property_to_learn='energy')
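Unlike KREG, ANI does not use the RE descriptor: each atom is described by an atomic environment vector (AEV), an element-specific neural network maps it to an atomic energy, and the atomic energies are summed. The sketch below computes only the radial AEV terms, assuming Gaussian radial functions multiplied by a cosine cutoff in the form used by ANI. The parameter values are illustrative assumptions rather than the exact TorchANI defaults, and the angular terms are omitted because in H2 each atom has only one neighbor, so there are no angles.

```python
import numpy as np

# Illustrative radial-AEV parameters (assumptions, not the exact TorchANI defaults)
R_CUT = 5.2                          # cutoff radius, Angstrom
ETA = 16.0                           # Gaussian width parameter, 1/Angstrom^2
SHIFTS = np.linspace(0.9, 4.7, 16)   # radial shifts R_s, Angstrom


def cutoff(r):
    """Cosine cutoff: 0.5*cos(pi*r/R_c) + 0.5 inside R_c, zero outside."""
    return np.where(r <= R_CUT, 0.5 * np.cos(np.pi * r / R_CUT) + 0.5, 0.0)


def radial_aev(coords, i):
    """Radial AEV of atom i for a single-element system (e.g. H2)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[i], axis=1)
    d = np.delete(d, i)  # exclude the atom itself
    # one Gaussian per shift R_s, damped smoothly to zero at the cutoff
    terms = np.exp(-ETA * (d[:, None] - SHIFTS[None, :]) ** 2) * cutoff(d)[:, None]
    return terms.sum(axis=0)


h2 = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.8]])  # geometry of h2_init.xyz
aev_h1 = radial_aev(h2, 0)
print(np.round(aev_h1, 4))
```

Because the cutoff goes smoothly to zero at `R_CUT`, atoms leaving the cutoff sphere do not cause jumps in energy or forces; the `max_epochs` setting above only controls how long the networks acting on these vectors are trained.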
Now we can use the model for geometry optimization. For the command line, prepare the input file h2_opt_ANI.inp and the auxiliary files: the initial geometry of H2 (h2_init.xyz) and the model trained in the previous step (energies_ani.pt).
# h2_opt_ANI.inp
geomopt # Request geometry optimization
MLmodelType=ANI # of the ANI type
MLmodelIn=energies_ani.pt # in energies_ani.pt file
XYZfile=h2_init.xyz # The file with initial guess
optXYZ=eq_ANI.xyz # optimized geometry output
Perform geometry optimization.
mlatom h2_opt_ANI.inp
The optimized geometry is written to eq_ANI.xyz:
cat eq_ANI.xyz
In Python, we need to prepare the same auxiliary files.
import mlatom as ml
# load initial geometry
mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
print(mol.get_xyz_string())
# load the model
model = ml.models.ani(model_file='energies_ani_api.pt')
# run geometry optimization
ml.optimize_geometry(model=model, molecule=mol, program='ASE')
print(mol.get_xyz_string())