Universal ML models

New: we highly recommend you to check out UAIQM that is our ultimate solution to the universal ML models.

MLatom supports a wide range of universal machine learning (ML)-based models including ML potentials and hybrid ML-enhanced quantum mechanical (QM) methods. They can be used out-of-box without training. The table below lists the methods available with the links to specific tutorials:

method

model type

keywords in MLatom

elements supported

gradients

hessian

charged

radicals

excited-states

UAIQM

ML & ML/QM

uaiqm

all elements

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

not available yet

AIQM1

ML/QM

aiqm1

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

AIQM1@DFT

ML/QM

aiqm1@dft

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

AIQM1@DFT*

ML/QM

aiqm1@dft*

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

\(\checkmark\)

AIQM1@XACS

ML/QM

aiqm1

H, C, N, O

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

AIQM1@DFT@XACS

ML/QM

aiqm1@dft

H, C, N, O

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

AIQM1@DFT*@XACS

ML/QM

aiqm1@dft*

H, C, N, O

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

DM21

ML/QM

dm21/[basis set]

main-group elements

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

DM21m

ML/QM

dm21m/[basis set]

main-group elements

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

DM21mc

ML/QM

dm21mc/[basis set]

main-group elements

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

DM21mu

ML/QM

dm21mu/[basis set]

main-group elements

\(\times\)

\(\times\)

\(\checkmark\)

\(\checkmark\)

\(\times\)

ANI-1x

MLP

ani1x, ani-1x

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

ANI-1x-D4

MLP

ani1xd4, ani-1x-d4

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

ANI-1ccx

MLP

ani1ccx, ani-1ccx

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

ANI-2x

MLP

ani2x, ani-2x

H, C, N, O, F, S, Cl

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

ANI-2x-D4

MLP

ani2xd4, ani-2x-d4

H, C, N, O, F, S, Cl

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

ANI-1xnr

MLP

ani-1xnr, ani1xnr

H, C, N, O

\(\checkmark\)

\(\checkmark\)

\(\times\)

\(\times\)

\(\times\)

AIMNet2@B97-3c

MLP

aimnet2@b973c

H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, I

\(\checkmark\)

\(\times\)

\(\checkmark\)

\(\times\)

\(\times\)

AIMNet2@ωB97M-D3

MLP

aimnet2@wb97m-d3

H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, I

\(\checkmark\)

\(\times\)

\(\checkmark\)

\(\times\)

\(\times\)

In this tutorial, we will first introduce how to use these universal methods to perform various tasks with MLatom in general. Then we will go through each method in detail.

Using universal ML models

Below is also a brief general overview of how to use universal ML models with MLatom:

Single-point calculations

For single-point calculation, only 3-5 lines would be needed for MLatom input file as usual with one of the methods above specified:

AIMNet2@b973c
xyzfile=sp.xyz
yestfile=energy.dat

where the sp.xyz is the XYZ geometries of molecule(s), which can also be defined explicitly:

AIMNet2@b973c
xyzfile='
2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580
'
yestfile=energy.dat

With keywords ygradxyzestfile and hessianestfile, gradients and hessian can also be obtained in the file specified.

Since DM21 functionals are integrated in PySCF, user would need to use method and qmprog keywords in input file as the way to define QM methods in MLatom. Here is an example of the input file for using DM21 functional with 6-31G* basis set:

method=DM21/6-31G*
qmprog=pyscf
xyzfile=sp.xyz
yestfile=energy.dat

Python API provides a flexible alternative to use methods in MLatom. In our case, user can define method by using mlatom.models.methods module and specify the keywords mentioned above, e.g.

method = mlatom.models.methods(method='ANI-1xnr')
# method = mlatom.models.methods(method='DM21/6-31G*', program='pyscf')

We provide here an example to calculate energy, gradients and hessian with ANI-1xnr.

import mlatom as ml
# read molecule from .xyz file
molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

# define method
model = ml.models.methods(method='ANI-1xnr')

model.predict(
    molecular_database=molDB,
    calculate_energy=True,
    calculate_energy_gradients=True,
    calculate_hessian=True)

print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
print(f'Hessian in Hartree/Angstrom^-2 for molecule 1: {molDB[1].hessian}')

For more details on how to perform single-point calculations with MLatom, please check our tutorial.

Geometry optimization and frequency calculations

Geometry optimization is a common task in studying chemical system and subsequent frequency calculation on the optimized molecule accompanied by thermochemical properties are also crutial for analysis.

To perform geometry optimization with input file in MLatom, user just need to request geomopt option in the first line along with the method to be used and initial guess:

geomopt                 # 1. requests geometry optimization
ANI-1ccx                # 2. universal MLP
xyzfile='               # 3. initial geometry guess
9

C       -1.691449880     -0.315985130      0.000000000
H       -1.334777040      0.188413060      0.873651500
H       -1.334777040      0.188413060     -0.873651500
H       -2.761449880     -0.315971940      0.000000000
C       -1.178134160     -1.767917280      0.000000000
H       -1.534806620     -2.272315330      0.873651740
H       -1.534807450     -2.272316160     -0.873650920
O        0.251865840     -1.767934180     -0.000001150
H        0.572301420     -2.672876720      0.000175020
'
optxyz=opt.xyz          # 4. (optional) file with optimized geometry.
optprog=geometric       # 5. request geometric optimizer

Each optimization step will be printed to the output file, which can be controlled by printall and printmin keywords in after version 3.4.0. User can also choose whether to dump the optimization trajectory with keyword dumpopttrajs.

After geometry optimization, frequency calculation can be performed with freq option in the input file as is shown below:

freq                    # 1. requests frequency calculation
ANI-1ccx                # 2. universal MLP
xyzfile='               # 3. optimized geometry
9

C            -1.672571          -0.341122          -0.000001
H            -1.307766           0.181713           0.885095
H            -1.307762           0.181707          -0.885099
H            -2.764560          -0.305014          -0.000003
C            -1.188732          -1.771664           0.000009
H            -1.559124          -2.298647           0.885998
H            -1.559099          -2.298653          -0.885987
O             0.237878          -1.729915           0.000028
H             0.575701          -2.626896           0.000135
'

In the output file, user will find the vibration analysis including frequency, reduced mass and force constant of each normal mode, and also thermochemistry results. The output file in this case can be downloaded here.

For more details on these two tasks with MLatom, please check our tutorials on geometry optimization and frequency calculations.

Molecular dynamics

One of the advantages of machine learning potentials is the ultra-fast speed to propagate thousands of trajectories within several hours compared with a few weeks for commonly used DFT methods (if you do not use DM21 that is). MLatom provides an easy way to run MD and also quasi-classical MD which is popular in chemical reaction simulation.

For using input file in MLatom, the only difference here is the keywords for method. For example, if you want to use AIMNet2 targeting RKS B97-3c to run dynamics for hydrogen molecule in the NVT ensemble using the Nosé–Hoover thermostat, the input file can look like:

MD                                # 1. requests molecular dynamics
AIMNet2@b973c                     # 2. use AIMNet2@B97-3c method
initConditions=user-defined       # 3. use user-defined initial conditions
initXYZ=h2_init.xyz               # 4. file with initial geometry; Unit: Angstrom
initVXYZ=h2_init.vxyz             # 5. file with initial velocity; Unit: Angstrom/fs
dt=0.3                            # 6. time step; Unit: fs
trun=30                           # 7. total time; Unit: fs
thermostat=Nose-Hoover            # 8. use Nose-Hoover thermostat
ensemble=NVT                      # 9. NVT ensemble
temperature=300                   # 10. Run MD at 300 Kelvin

The initial XYZ coordinates and velocities can be downloaded here: h2_init.xyz, h2_init.vxyz

We also provide below the snippet to run the same task with Python API. As usual, only the code to define the method used will be changed.

import mlatom as ml
# Use user-defined initial conditions
mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                            generation_method='user-defined',
                                            file_with_initial_xyz_coordinates='h2_init.xyz',
                                            file_with_initial_xyz_velocities='h2_init.vxyz')
init_mol = init_cond_db[0]

# Initializing model
model = ml.models.methods(method='AIMNet2@b973c')

# Initializing thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300, molecule=init_mol)

# Run molecular dynamics
dyn = ml.md(model=model,
            molecule_with_initial_conditions=init_mol,
            thermostat=nose_hoover,
            ensemble='NVT',
            time_step=0.3,
            maximum_propagation_time=30.0)

# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

print(f"Number of steps in the trajectory: {len(traj.steps)}")

Fine-tuning universal models

tl_ani

Note

Fine-tuning for the universal models is supported for ANI-type models ANI-1x, ANI-1ccx, ANI-1ccx-gelu, and ANI-2x.

When using this feature, please cite:

  • Seyedeh Fatemeh Alavi, Yuxinxin Chen, Yi-Fan Hou, Fuchun Ge, Peikun Zheng, Pavlo O. Dral. Towards Accurate and Efficient Anharmonic Vibrational Frequencies with the Universal Interatomic Potential ANI-1ccx-gelu and Its Fine-Tuning. 2024. Preprint on ChemRxiv: https://doi.org/10.26434/chemrxiv-2024-c8s16 (2024-10-09).

AIQM1

AIQM1 (artificial intelligence–quantum mechanical method 1) is a general-purpose method approaching the gold-standard coupled cluster quantum mechanical method with high computational speed of the approximate low-level semiempirical quantum mechanical methods for the ground-state, closed-shell species, but also transferable for calculation of charged and radical species as well as for excited-state calculations with a good accuracy. See AIQM1 paper for more details. Please cite this paper alongside other required citations:

Strengths: AIQM1 is especially good for energy calculations and geometry optimizations of closed-shell molecules in their ground-state.

Limitations: This method is currently limited to compounds only containing H, C, N, and O elements.

The detailed tutorial is available.

DM21

DM21 is an ML-enhanced DFT method published in Science by DeepMind (please cite it when you use this method). Our installation follows the GitHub page. There are four variants of DM21 (DM21 - default, DM21m, DM21mc, DM21mu), see the above GitHub page for the details.

Using DM21 and its variants is similar to using common DFT functionals. Users need to specify both the functional and the basis set to use. Worth noting is that DM21 is not stable and there is no gaurantee to converge. Time for prediction is longer than previous methods since by default in MLatom, it will start from the relatively cheap functional B3LYP as suggested by their official documentation to make SCF faster. It can be only used for single-point calculations in the current implementation (the interface program does not provide gradients or hessians and we did not implement numerical derivatives for this method yet).

Example of an input file:

method=DM21/6-31G*
qmprog=pyscf
xyzfile='
2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580
'
yestfile=energy.dat

In Python:

import mlatom as ml
# read molecule from .xyz file
molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

# define method
method = mlatom.models.methods(method='DM21/6-31G*', program='pyscf')

method.predict(
    molecular_database=molDB,
    calculate_energy=True,
    calculate_energy_gradients=True,
    calculate_hessian=True)

print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
print(f'Hessian in Hartree/Angstrom^-2 for molecule 1: {molDB[1].hessian}'')

ANI models zoo

MLatom contains 3 public models in ANI model zoo from TorchANI: ANI-1x, ANI-1ccx and ANI-2x. In addition, MLatom also allows to use D4-dispersion corrected methods ANI-1x-D4 and ANI-2x-D4. Below we provide some useful notes when using these methods in MLatom.

  • ANI-1x and ANI-2x were trainied on DFT level data

  • ANI-1ccx possess the highest accuracy targeting CCSD(T)*/CBS.

  • ANI-1ccx and ANI-1x are limited to CHNO elements, while ANI-2x can be used for CHNOFClS elements.

  • D4 dispersion correction in ANI-1x-D4 and ANI-2x-D4 correspond to ωB97X functional.

  • These methods are limited to predicting energies and forces for neutral closed-shell compounds in their ground state.

  • MLatom will report uncertainties for calculations with these methods based on the standard deviation between neural network (NN) predictions.

Example of an input file:

ANI-1ccx
geomopt
xyzfile='
2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580
'

In Python:

import mlatom as ml
# read molecule from .xyz file
molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

# define method
method = mlatom.models.methods(method='ANI-1ccx')

method.predict(
    molecular_database=molDB,
    calculate_energy=True,
    calculate_energy_gradients=True,
    calculate_hessian=True)

print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
print(f'Hessian in Hartree/Angstrom^-2 for molecule 1: {molDB[1].hessian})

Reactive ANI: ANI-1xnr

ANI-1xnr is a general reactive ANI-type NN trained on condensed-phase reactive data capable of real-world reactive systems containing C, H, N, O elements, see the Nature Chemistry publication. Implementation is done by interfacing to the model from ani-1xnr GitHub repository.

Note

The first time any of the models are istantiated, the models will be downloaded automatically from the ani-model-zoo repository to the local folder ./local. User can choose to download them beforehand.

The input is analogous to other ANI models.

AIMNet2

AIMNet2 aims to solve the problem of ANI which is less capable of dealing with non-local interaction and open-shell charged species. There are two pretrained model targeting B97-3c and ωB97M-D3 accuracy (the user need to choose one of them using the keywords aimnet2@b973c or aimnet2@wb97m-d3). It is applicable to 14 elements including H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, I. Currently, hessian is not supported in MLatom.

Note

The first time any of the models are istantiated, the models will be downloaded automatically from the AIMNet2 GitHub repository to the local folder ./local. User can choose to download them beforehand.

Example of an input file:

AIMNet2@wb97m-d3
geomopt
xyzfile='
2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580
'

In Python:

import mlatom as ml
# read molecule from .xyz file
molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

# define method
method = mlatom.models.methods(method='AIMNet2@wb97m-d3')

method.predict(
    molecular_database=molDB,
    calculate_energy=True,
    calculate_energy_gradients=True,
    calculate_hessian=True)

print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
print(f'Hessian in Hartree/Angstrom^-2 for molecule 1: {molDB[1].hessian})