Tutorial on geometry optimizations with MLatom@XACS

Published Time:  2024-01-18 10:30:12

In this tutorial we show how to optimize molecular geometries with MLatom. Here, only the optimization of the minima is shown and the optimization of the transition states will be shown elsewhere (for now, please check out the manual).


For the tutorials below, download the tutorial zipped archive of the required files:

We start with the simplest example, using the input file geomopt.inp with minimum number of options (see below for optimizing via Python API):

geomopt                 # 1. requests geometry optimization
ANI-1ccx                # 2. pre-trained model
xyzfile=init.xyz        # 3. initial geometry guess
optxyz=opt.xyz          # 4. file with optimized geometry

As you see, the input file is pretty self-explanatory, the geomopt option requests the geometry optimization in the first line. The user needs to provide the model or method, here we use the universal ML potential ANI-1ccx in the second line. In principle, any QM method, ML model (pre-trained or trained by the user), or hybrid ML/ML models that provides energies and gradients can be used (we will explain below how to do it).

xyzfile=[file name] tells MLatom in what file it can find the initial guess. The user should provide the initial guess in the XYZ file (in Angstrom, e.g., init.xyz) for the geometry, e.g., for ethanol which would look like:

9

C       -1.691449880     -0.315985130      0.000000000
H       -1.334777040      0.188413060      0.873651500
H       -1.334777040      0.188413060     -0.873651500
H       -2.761449880     -0.315971940      0.000000000
C       -1.178134160     -1.767917280      0.000000000
H       -1.534806620     -2.272315330      0.873651740
H       -1.534807450     -2.272316160     -0.873650920
O        0.251865840     -1.767934180     -0.000001150
H        0.572301420     -2.672876720      0.000175020

The final line with optxyz=[file name] will request MLatom to save the optimized geometry in XYZ file (here opt.xyz). Order of the options is not important, options are case-insensitive except for file names.

Important: MLatom will not overwrite the XYZ file with the optimized geometry defined by optxyz. Before you run, please make sure, there is no file with this name in your directory or the optimization will produce any optimized geometry. If you have such a file – rename or delete it.

After you created the input file and initial XYZ file, you can run MLatom, e.g., on XACS cloud, as:

mlatom geomopt.inp &> geomopt.out

The optimized geometry will be saved in XYZ file opt.xyz. The program output will be saved in file geomopt.out.

Alternatively, you can run the same simulations without any input file simply with in the command line with the same options:

mlatom geomopt ANI-1ccx xyzfile=init.xyz optxyz=opt.xyz

You should be able to find in the MLatom output the properties of the final geometry, i.e., the final energy and any other relevant properties such as standard deviation of the neural networks in the ANI-1ccx method. In addition, you can check how the optimization progresses as it will print out the energy for each optimization step. For our example, the part of the output would look like:

==============================================================================
Optimization of molecule 1
==============================================================================

Iteration         Energy (Hartree)
         1           -154.8894274342429
         2           -154.8894274342429
         3           -154.8913352129107
         4           -154.8915612759089
         5           -154.8917131295130
         6           -154.8917542901067
         7           -154.8918338589389
         8           -154.8918607356472
         9           -154.8918827332715
        10           -154.8919015405256
        11           -154.8919243290977
        12           -154.8919395989974
        13           -154.8919477894997
        14           -154.8919527832317
        15           -154.8919570330184
        16           -154.8919591168921


Final properties of molecule 1

Standard deviation of NNs                :      0.00063190 Hartree         0.39652 kcal/mol
Energy                                   :   -154.89195912 Hartree

Your numbers may somewhat vary, particularly depending on what optimizer your MLatom setup is using. Above example is using ASE with the L-BFGS optimizer and it took 16 iterations to converge. You check the full output file geomopt_ase.log the tutorial zip file to inspect it closer. See below the instructions on optimizers.

Choosing the method or model

The user can benefit from MLatom’s many choices of the methods or models to be used for optimizations. See the separate manuals and tutorials, here we give brief examples to quickly get started. Many methods such as ANI-1ccx or AIQM1 are recognized automatically by MLatom as shown above.

The general way to define the QM method is to use method and the corresponding QM program providing its implementations (MLatom uses interfaces to such programs). For example, ab initio (e.g., HF or MP2) or DFT (e.g., B3LYP/6-31G*) can be requested with the input:

geomopt                 # 1. requests geometry optimization
method=B3LYP/6-31G*     # 2. request running DFT optimization with B3LYP
qmprog=PySCF            # 3. request using PySCF for B3LYP calculations; qmprog=Gaussian can be also used
xyzfile=init.xyz        # 4. initial geometry guess
optxyz=opt.xyz          # 5. file with optimized geometry

The general way to request the geometry optimization with the user-trained ML model is to specify the ML model type with MLmodelType and MLmodelIn keyword such as:

geomopt                 # 1. requests geometry optimization
MLmodelType=MACE        # 2. request optimization with the MACE-type of machine learning potential
MLmodelIn=mace.pt       # 3. the file with the trained model should be provided, here it is mace.pt file
xyzfile=init.xyz        # 4. initial geometry guess
optxyz=opt.xyz          # 5. file with optimized geometry

See manuals and tutorials on how to train such models, e.g., for MACE.

The advanced user-defined hybrid models such as those based on the delta-learning concept can be used for geometry optimizations only via Python API (see example below).

Note: In principle, our default recommendation is to use AIQM1 as it is usually the most accurate method for affordable cost. However, it is currently only applicable to the compounds with CHNO elements. Another limitation is that to unlock its full potential, we recommend to install the MNDO program for its QM part. The XACS cloud uses Sparrow instead, which currently only provides numerical gradients limiting its applicability for geometry optimizations. ANI-1ccx is hence a good alternative on the XACS cloud if it can be applied (it has additional limitations to neutral closed-shell compounds in ground state).

Optimization of many molecules

MLatom as a data-oriented program naturally allows to perform geometry optimizations of many molecules. You need to simply prepare the XYZ file with initial guesses of multiple molecules, e.g., for optimizing the hydrogen and methane molecules in one computational job your init.xyz file may look like:

2

1         0.0000000000        0.0000000000        0.0000000000
1         0.7414000000        0.0000000000        0.0000000000
5

C         0.0000000000        0.0000000000        0.0000000000
H         1.0870000000        0.0000000000        0.0000000000
H        -0.3623333220       -1.0248334322       -0.0000000000
H        -0.3623333220        0.5124167161       -0.8875317869
H        -0.3623333220        0.5124167161        0.8875317869

MLatom will save the optimized geometries in the XYZ file with the same order of molecules as in the initial guess. The output will contain the optimization details for each molecule too.

Charge and multiplicity

A quick reminder on how to define the charges and multiplicities of a single or multiple molecules. The input might look something like:

geomopt                 # 1. requests geometry optimization
AIQM1                   # 2. AI-enhanced quantum mechanical method 1
xyzfile=init.xyz        # 3. initial geometry guess
optxyz=opt.xyz          # 4. file with optimized geometry
charges=0,1             # 5. charge of the first molecule is 0, of the second is 1.
multiplicities=1,2      # 6. multiplicity of the first molecule is 1, second - 2

Note: It does not make sense to define charges and multiplicities for models like ANI-1ccx which were trained on the neutral closed-shell species

Choosing the optimization program

Now, you might have noticed that MLatom tells which program it used as an optimizer for running the geometry optimizations. If you used the XACS cloud for computations, it should have been ASE:

Atomic simulation environment (ASE): A. Hjorth Larsen, et al. J. Phys. Condens. Matter. 2017, 29, 273002

Note: In the output, MLatom will summarize what methods, models, and programs it used with the recommended citations to include in your publications. This is very useful information to pay attention to and properly give credit to the hard work of people. However, is not printed when MLatom is used through Python API, so please check carefully the manual and publications.

MLatom supports the use of several optimizers: aforementioned ASE (using L-BFGS algorithm), the default one in popular Gaussian program, and optimizers implemented in SciPy (L-BFGS algorithm by default). They can be chosen by adding optprog=[ASE or Gaussian or SciPy] keyword (case-insensitive), i.e., as:

geomopt                 # 1. requests geometry optimization
ANI-1ccx                # 2. pre-trained model
xyzfile=init.xyz        # 3. initial geometry guess
optxyz=opt.xyz          # 4. file with optimized geometry
optprog=gaussian        # 5. Gaussian is choosen as the optimizer

If no optimizer is requested with a keyword, MLatom will check the availability and use the optimizer in the order: Gaussian -> ASE -> SciPy.

Each of the optimizers have their advantages and disadvantages:

  • ASE is open-source and tested by the community. It is available on the XACS cloud and can be easily installed.
  • Gaussian is very versatile and time-tested program, allowing its users to exploit all Gaussian’s functionality as usual with additional benefit of access to ML models. It should be obtained and installed separately by the user. The I/O is currently not very efficient so that the optimizations with very fast ML models will take much longer than with other optimizers. It might be improved in the future.
  • Our SciPy impelementation is an experimental option so please use it on your own risk.

Note: MLatom is being rapidly developed, so it will get more features and improved geometry optimization features. Our recommendations will change too. Please check from time-to-time this tutorial and also follow our updates on the usual channels.

Whatever optimizer you choose, it is worth checking the instructions below for pitfalls and tips.

ASE optimizer

ASE performs the geometry optimization until it reaches the defined threshold for forces or maximum number of iterations. The default value for the maximum number of iterations is 200, for the forces 0.02 eV/Angstrom and the default algorithm is L-BFGS. This default behavior can be changed by using the corresponding ASE keywords, e.g.:

geomopt                 # 1. requests geometry optimization
ANI-1ccx                # 2. pre-trained model
xyzfile=init.xyz        # 3. initial geometry guess
optxyz=opt.xyz          # 4. file with optimized geometry
optprog=ase             # 5. ASE is choosen as the optimizer
ase.steps=10000         # 6. increasing the maximum number of iterations from 200 to 10000
ase.fmax=0.01           # 7. tightening the convergence criterion from 0.02 to 0.01 eV/Angstrom
ase.optimizer=BFGS      # 8. selecting the BFGS instead of L-BFGS algorithm for optimization

Note: If the ASE optimizer reaches its maximum number of optimization iterations, MLatom will save the final geometry at the last iteration. Please double-check MLatom’s output and if the number of iterations is 200, the geometry may be still far from the optimized one and you need to restart optimization. Setting ase.steps to a higher value, e.g., ase.steps=10000 might help too.

Gaussian optimizer

When Gaussian optimizer is used, you will see the gaussian.log file which is similar to the common Gaussian output file with the difference, that it will use the MLatom’s model to predict energies and gradients to perform the geometry optimization. Then you can use your favorite tools to analyze this output as usual for Gaussian, which is a big advantage for the Gaussian users. For above example, you can check the MLatom’s output file for the above geometry optimization with Gaussian optimizer:

==============================================================================
Optimization of molecule 1
==============================================================================

Iteration         Energy (Hartree)
            1           -154.8894274617628
            2           -154.8918239620292
            3           -154.8919507619391
            4           -154.8919601346281


Final properties of molecule 1

Standard deviation of NNs                :      0.00062817 Hartree         0.39419 kcal/mol
Energy                                   :   -154.89196013 Hartree

The examples of the full MLatom and Gaussian output files can be found in the downloaded tutorial zip archive.

An additional important advantage is that the advanced users can modify the gaussian.com file which contains the input required for geometry optimization with MLatom’s models. This way, the user can unlock the access to all the great Gaussian functionality for geometry optimizations such as freezing coordinates or defining input in internal coordinates, etc.

After the gaussian.com file is modified, you can run with it the Gaussian job as usual, e.g., in your system like g16 gaussian.com in command line.

Optimization through Python API

For many users, it might be more convenient to use the Python API for geometry optimizations as it provides more flexibility to use more complicated models and build custom workflows.

Below is a quick recap of how to do all above simulations but with Python API. In addition, we will show how to build a more complicated model based on delta-learning to perform the geometry optimization.

For starters, let’s optimize the geometries of ethanol. This will give a pretty good overview of how to write such a code:

import mlatom as ml

# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('init.xyz')
# initmol charge and multiplicity can be changed if required, e.g., by uncommenting the lines below:
#initmol.charge = 1 ; initmol.multiplicity = 2

# Choose one of the predifined (automatically recognized) methods
mymodel = ml.models.methods(method='ANI-1ccx')
# or QM method, e.g., B3LYP with Gaussian
#mymodel = ml.models.methods(method='B3LYP/6-31G*', program='Gaussian')
# or ML model, e.g., MACE:
#mymodel = ml.models.mace(model_file='mace.pt')

# Optimize the geometry with the choosen optimizer:
geomopt = ml.optimize_geometry(model=mymodel, initial_molecule=initmol, program='ASE', maximum_number_of_steps=10000)

# Get the final geometry
final_mol = geomopt.optimized_molecule
# and its XYZ coordinates
final_mol.xyz_coordinates

# Check and save the final geometry
print('Optimized coordinates:')
print(final_mol.get_xyz_string())
final_mol.write_file_with_xyz_coordinates(filename='final.xyz')

# You can also check the optimization trajectory
print('Number of optimization steps:', len(geomopt.optimization_trajectory.steps))
# and access any geometry during optimization as
intermediate_step_geom = geomopt.optimization_trajectory.steps[2].molecule
print('Intermediate coordinates:')
print(intermediate_step_geom.get_xyz_string())

Note that there is a subtle but important difference: if you use initial_molecule, the initmol object with not be changed, it will remain the original initial molecule. However, if you use instead molecule parameters, it will save the final optimized coordinates and other properties (e.g., forces and energies) into the original molecule object, i.e.:

# This will not change the initmol, you can check it as
print('Initial molecule before optimization')
print(initmol.get_xyz_string())
geomopt = ml.optimize_geometry(model=mymodel, initial_molecule=initmol, program='ASE', maximum_number_of_steps=10000)
print('Initial molecule after optimization')
print(initmol.get_xyz_string())
print('Final coordinates after optimization')
print(geomopt.optimized_molecule.get_xyz_string())

# Compare to the following code that will change the initmol, you can check it as
print('Initial molecule before optimization')
print(initmol.get_xyz_string())
geomopt = ml.optimize_geometry(model=mymodel, molecule=initmol, program='ASE', maximum_number_of_steps=10000)
print('Initial molecule after optimization')
print(initmol.get_xyz_string())

The first option is good to preserve the initial molecule, the second option is good if you just want the final geometry.

To run above example, you might want to use the required files (init.xyz and mace.pt) from the downloaded tutorial zip archive.

Optimization with custom hybrid models

Finally, a more advanced example of using a custom delta-learning model, trained on the differences between full configuration interaction and Hartree–Fock energies:

XYZ coordinates, full CI energies and Hartree–Fock energies of the training points can be found in the downloaded tutorial zip archive (xyz_h2_451.datE_FCI_451.datE_HF_451.dat). You can try to train the delta-learning model by yourself. Here, we provide a pre-trained model delta_FCI-HF_h2.unf.

import mlatom as ml

# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('h2_init.xyz')

# Let's build our own delta-model
# First, define the baseline QM method, e.g.,
baseline = ml.models.model_tree_node(name='baseline', operator='predict', model=ml.models.methods(method='HF/STO-3G', program='pyscf'))
# Second, define the correcting ML model, in this case the KREG model trained on the differences between full CI and HF
delta_correction = ml.models.model_tree_node(name='delta_correction', operator='predict', model=ml.models.kreg(model_file='delta_FCI-HF_h2.unf',ml_program='MLatomF'))
# Third, build the delta-model which would sum up the predictions by the baseline (HF) with the correction from ML
delta_model = ml.models.model_tree_node(name='delta_model', children=[baseline, delta_correction], operator='sum')

# Optimize the geometry with the delta-model as usual:
geomopt = ml.optimize_geometry(model=delta_model, initial_molecule=initmol)

# Get the final geometry approximately at the full CI level
final_mol = geomopt.optimized_molecule
print('Optimized coordinates:')
print(final_mol.get_xyz_string())
final_mol.write_file_with_xyz_coordinates(filename='final.xyz')

# Let's check how many full CI calculations, our delta-model saved us
print('Number of optimization steps:', len(geomopt.optimization_trajectory.steps))

This example is inspired by our book chapters on delta-learning and kernel method potentials, with the models trained on data taken from the tutorial book chapter.

Given an initial geometry h2_init.xyz, you will get the following output when using the codes above.

Optimized coordinates:
2

H             0.0000000000000          -0.0000000000000           0.3707890910739
H             0.0000000000000          -0.0000000000000          -0.3707890910739

Number of optimization steps: 4

Any questions or suggestions?

If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via emailSlack (preferred), or WeChat (please send an email to request to add you to the XACS user support group).