AIQM1

AIQM1 (artificial intelligence–quantum mechanical method 1) is a general-purpose method approaching the gold-standard coupled cluster quantum mechanical method with high computational speed of the approximate low-level semiempirical quantum mechanical methods for the ground-state, closed-shell species, but also transferable for calculation of charged and radical species as well as for excited-state calculations with a good accuracy. See AIQM1 paper for more details. Please cite this paper alongside other required citations:

Strengths: AIQM1 is especially good for energy calculations and geometry optimizations of closed-shell molecules in their ground-state.

Limitations: This method is currently limited to compounds only containing H, C, N, and O elements.

Below we provide important notes on installation of AIQM1 and non-exhaustive examples of its uses via command line/input files and Python API for:

Installation

Refer to MLatom installation instructions. AIQM1 requires TorchANI and dftd4 as well as a semi-empirical program. The semi-empirical program can be MNDO which supports analytical gradients and excited-state calculations. On MLatom@XACS cloud computing service we cannot use MNDO but use development version of SCINE Sparrow (see also a paper on Sparrow). This version does not support analytical gradients and no excited-state calculations with AIQM1. Hence, Sparrow can be used with numerical gradients for geometry optimizations and MD but might lead to numerical instabilities in, e.g., frequency calculations. Some features described below might require interfaces to the Gaussian or ASE or PySCF packages. Installation instructions, a usage manual, and proper citations are given below.

Single-point calculations

Single-point calculations of energies (and gradients if needed) of closed-shell molecules in electronic ground state is the simplest job, which can be run with 3-4 line MLatom input file, e.g., sp.inp:

AIQM1 # or AIQM1@DFT or AIQM1@DFT* if you want to use these methods
xyzfile=sp.xyz
yestfile=enest.dat
ygradxyzestfile=gradest.dat

This input requires a sp.xyz file with XYZ geometries of molecules (you can provide many molecules as usual for MLatom), e.g., for hydrogen and methane sp.xyz file can look like (geometries in Å):

2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580

After you prepared your input files sp.inp and sp.xyz, you can run MLatom as usual:

mlatom sp.inp > sp.out

After the calculations finish, MLatom output sp.out will contain the standard deviation of NN prediction and components of AIQM1 energies:

Standard deviation of NN contribution   :      0.00892407 Hartree        5.59994 kcal/mol
NN contribution                         :     -0.00210898 Hartree
Sum of atomic self energies             :     -0.08587317 Hartree
ODM2* contribution                      :     -1.09094119 Hartree
D4 contribution                         :     -0.00000889 Hartree
Total energy                            :     -1.17893224 Hartree
Standard deviation of NN contribution   :      0.00025608 Hartree        0.16069 kcal/mol
NN contribution                         :      0.00958812 Hartree
Sum of atomic self energies             :    -33.60470494 Hartree
ODM2* contribution                      :     -6.86968756 Hartree
D4 contribution                         :     -0.00010193 Hartree
Total energy                            :    -40.46490632 Hartree

You can also get AIQM1 energies saved in file enest.dat looking like this (energies in Hartree):

 -1.178932238420
-40.464906315250

and gradients saved in file gradest.dat looking like this (gradients in Hartree/Å):

2

     0.000000000000       0.000000000000       0.000032023551
     0.000000000000       0.000000000000      -0.000032023551
5

    -0.000000000000      -0.000000000000       0.000000000000
     0.000490470799       0.000490470714       0.000490470881
    -0.000490470799      -0.000490470714       0.000490470881
     0.000490470799      -0.000490470714      -0.000490470881
    -0.000490470799       0.000490470714      -0.000490470881

You can also perform the same calculation in Python API. The code snippet looks like:

import mlatom as ml
# read molecule from .xyz file
molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')
# define method
model = ml.models.methods(method='AIQM1', qm_program='MNDO')
# perform the calculations
model.predict(molecular_database=molDB, calculate_energy_gradients=True, calculate_hessian=True)
# print the energy. You can print extract properties from each molecule object, such as energy gradients and Hessian
print(molDB[0].energy)
print(molDB[1].energy)
print(molDB[1].energy-molDB[0].energy)

The examples above were tested with the interface to the MNDO program. If you use the MLatom@XACS cloud computing service, it runs calculations using the interface to the SCINE Sparrow package, thus, the numbers may be slightly different. See tutorial for more details on single point energy calculations.

Geometry optimizations

Geometry optimization of closed-shell molecules in electronic ground state is as simple as running single point calculations and 4-line MLatom input file, e.g., opt.inp, looks like this:

AIQM1
xyzfile=opt.xyz
optxyz=final_geo.xyz
geomopt
optprog=gaussian # or optprog=ase if you choose ASE

This input requires opt.xyz file with initial XYZ geometries of molecules to be optimized (you can provide many molecules as usual for MLatom), e.g., for methane opt.xyz file can look like (geometries in Å):

5

C         0.0000000000        0.0000000000        0.0000000000
H         1.0870000000        0.0000000000        0.0000000000
H        -0.3623333220       -1.0248334322       -0.0000000000
H        -0.3623333220        0.5124167161       -0.8875317869
H        -0.3623333220        0.5124167161        0.8875317869

After you prepared your input files opt.inp and opt.xyz, you can run MLatom as usual:

mlatom opt.inp

After the calculations finish, the optimized geometries are saved in either Gaussian or ASE-style output files.

If you use Gaussian, the Gaussian output files of geometry optimizations are saved in mol_1.log, mol_2.log, … files for each molecule, where you can find the optimized geometries and visualize them by GausianView program as in any other typical Gaussian job.

If you use ASE, you can directly obtain optimized XYZ geometries in a single file final_geo.xyz with MLatom format, which for our example looks like (geometries in Å):

5

C        0.00000000       0.00000000       0.00000000
H        1.08666998      -0.00000000       0.00000000
H       -0.36222332      -1.02452229      -0.00000000
H       -0.36222332       0.51226114      -0.88726233
H       -0.36222332       0.51226114       0.88726233

In Python API, the code snippet looks like:

import mlatom as ml

# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('opt.xyz')

# Choose one of the predifined (automatically recognized) methods
mymodel = ml.models.methods(method='AIQM1')

# Optimize the geometry with the choosen optimizer:
geomopt = ml.optimize_geometry(model=mymodel,
                               initial_molecule=initmol,
                               program='ASE',
                               maximum_number_of_steps=10000)

# Get the final geometry
final_mol = geomopt.optimized_molecule
# and its XYZ coordinates
final_mol.xyz_coordinates

# Check and save the final geometry
print('Optimized coordinates:')
print(final_mol.get_xyz_string())
final_mol.write_file_with_xyz_coordinates(filename='final_geo.xyz')

See tutorial for more details on geometry optimization.

TS optimization

The geometry optimization of transition states structures (TSs) is almost the same as above with the only difference that we replace geomopt with ts keyword. These calculations require calculations of Hessians and thus are more expensive. Currently, they can only be performed with Gaussian interface, i.e., use optprog=gaussian in the input file. Example of input file ts.inp:

AIQM1
xyzfile=ts.xyz
optxyz=final_geo.xyz
ts
optprog=gaussian

The xyz file ts.xyz required by the input file should be the initial guess for TS. For a pericyclic reaction:

_images/pericyclic.jpg

the file ts.xyz can look like:

14

C         -0.12282076     -1.09054053      1.14065341
C          0.17199635      0.23301894      1.48248953
C         -0.12282076      1.35549534      0.70296425
C         -0.12282076      1.35549534     -0.70296425
C          0.17199635      0.23301894     -1.48248953
C         -0.12282076     -1.09054053     -1.14065341
H          0.82454258      0.40945200      2.34048883
H         -0.03331673      2.33317669      1.17510642
H         -0.03331673      2.33317669     -1.17510642
H          0.82454258      0.40945200     -2.34048883
H          0.43031244     -1.88602881     -1.64114758
H         -1.14789312     -1.35457363     -0.93625533
H          0.43031244     -1.88602881      1.64114758
H         -1.14789312     -1.35457363      0.93625533

When the ts.inp and ts.xyz are both ready, the TS geometry optimization could be conducted via running MLatom as usual:

mlatom ts.inp

After the calculations finish, the optimized TS geometries will be saved in the XYZ output file final_geo.xyz. You can check the TS optimization by investigating frequencies: the number of imaginary frequencies for successful TS optimization should be one and analyze the direction of nuclear motions by visualizing Gaussian output .log files with your favorite software (e.g., GaussView).

14

6  -0.14824052  -1.18232681   1.09011331
6   0.27697295   0.08303553   1.40719277
6  -0.14824052   1.26264088   0.68226088
6  -0.14824052   1.26264088  -0.68226088
6   0.27697295   0.08303553  -1.40719277
6  -0.14824052  -1.18232681  -1.09011331
1   1.16206956   0.21862475   2.03298866
1  -0.22714441   2.21025461   1.20566513
1  -0.22714441   2.21025461  -1.20566513
1   1.16206956   0.21862475  -2.03298866
1   0.37749585  -2.04342303  -1.49163954
1  -1.19537249  -1.36555399  -0.92796582
1   0.37749585  -2.04342303   1.49163954
1  -1.19537249  -1.36555399   0.92796582

In Python API, the code snippet looks like:

import mlatom as ml

# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('ts.xyz')

# Choose one of the predifined (automatically recognized) methods
mymodel = ml.models.methods(method='AIQM1')

# Optimize the geometry with the choosen optimizer:
geomopt = ml.optimize_geometry(model=mymodel,
                               initial_molecule=initmol,
                               ts=True,
                               program='ASE',
                               maximum_number_of_steps=10000)

# Get the final geometry
final_mol = geomopt.optimized_molecule
# and its XYZ coordinates
final_mol.xyz_coordinates

# Check and save the final geometry
print('Optimized coordinates:')
print(final_mol.get_xyz_string())
final_mol.write_file_with_xyz_coordinates(filename='final_geo.xyz')

See tutorial for more details on geometry optimization of transition states structures.

Thermochemical calculations

Thermochemical properties of closed-shell molecules in electronic ground state can be calculated at AIQM1 level by adding an option freq to the MLatom input file, e.g., freq.inp and they are typically run on AIQM1-optimized geometries. An example of MLatom input file:

AIQM1
xyzfile=freq.xyz
freq
optprog=gaussian # or optprog=ase if you choose ASE

This input requires freq.xyz file with initial XYZ geometries of molecules (you can provide many molecules as usual for MLatom), e.g., for methane freq.xyz file can look like (geometries in Å):

5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580

After you prepared your input files freq.inp and freq.xyz, you can run MLatom as usual:

mlatom freq.inp > freq.out

After the calculations finish, MLatom output freq.out will contain the summary with atomization enthalpy at 0 K, ZPVE-exclusive atomization energy at 0 K, and heat of formation at 298 K for the molecule.

If you use Gaussian, the Gaussian output files of frequency calculations are saved in mol_1.log, mol_2.log, … files for each molecule; these files contain ZPVE energy and lots of thermochemical data such as entropy and the Gibbs free energy. MLatom output will contain additional information as described above with following lines for methane example:

Standard deviation of NN contribution   :      0.00025608 Hartree        0.16069 kcal/mol
NN contribution                         :      0.00958812 Hartree
Sum of atomic self energies             :    -33.60470494 Hartree
ODM2* contribution                      :     -6.86968756 Hartree
D4 contribution                         :     -0.00010193 Hartree
Total energy                            :    -40.46490632 Hartree
Atomization enthalpy at 0 K             :       391.58894 kcal/mol
ZPE exclusive atomization energy at 0 K :       419.90907 kcal/mol
Heat of formation at 298.15 K           :       -17.30543 kcal/mol

When you use ASE for the calculation of thermochemical properties, you should specify ase.linear and ase.symmetrynumber this two keywrods. ase.linear is 0 for nonlinear molecule, 1 for linear molecule, and ase.symmetrynumber is the rotational symmetry number for each molecule, (see Table 10.1 and Appendix B of C. Cramer “Essentials of Computational Chemistry”, 2nd Ed.). For example, for hydrogen and methane this two molecules, you should set ase.linear=1,0 and ase.symmetrynumber=2,12.

The Python API provides more flexibility to use more complicated models and build custom workflows. Below is a workflow of optimizing geometry and calculating frequency.

import mlatom as ml

# Get the initial guess for the molecules to optimize
initmol = ml.data.molecule.from_xyz_file('freq.xyz')

# Choose one of the predifined (automatically recognized) methods
mymodel = ml.models.methods(method='AIQM1')

# Optimize the geometry with the choosen optimizer:
geomopt = ml.optimize_geometry(model=mymodel,
                               initial_molecule=initmol,
                               program='ASE',
                               maximum_number_of_steps=10000)

# Get the optimized molecule
final_mol = geomopt.optimized_molecule

# Do vibration analysis and thermochemistry calculation
freq = ml.thermochemistry(model=mymodel,
                          molecule=final_mol,
                          program='ASE')
# or vibration analysis only
# freq = ml.freq(model=mymodel, molecule=final_mol, program='')

# Save the molecule with vibration analysis and thermochemistry results
final_mol.dump(filename='final_mol.json',format='json')

# Check vibration analysis
print("Mode     Frequencies     Reduced masses     Force Constants")
print("           (cm^-1)            (AMU)           (mDyne/A)")
for ii in range(len(final_mol.frequencies)):
    print("%d   %13.4f   %13.4f   %13.4f"%(ii,final_mol.frequencies[ii],final_mol.reduced_masses[ii],final_mol.force_constants[ii]))

# Check thermochemistry results
print(f"Zero-point vibrational energy: {final_mol.ZPE} Hartree")
print(f"Enthalpy at 298 K: {final_mol.H} Hartree")
print(f"Gibbs Free energy at 298 K: {final_mol.G} Hartree")
print(f"Heat of formation at 298 K: {final_mol.DeltaHf298} Hartree")

See tutorial for more details on frequency and thermochemistry calculations.

Special input for semi-empirical programs

You need to use QMprogramKeywords

Sometimes, you may need to change the semiempirical quantum mechanical part of AIQM1 by informing the MNDO or Sparrow program. MLatom implementation therefore supports the request to read user-defined MNDO or Sparrow keywords via option QMprogramKeywords=[file name with MNDO or Sparrow keywords], which are passed to the MNDO program. For historical reasons, mndokeywords is also supported for the MNDO program. Please consult the MNDO program manual for the available keywords.

Note

Whenever you request special keywords for MNDO, keywords iop=-22 immdp=-1 nsav15=3 igeom=1 iform=1 are always required and AIQM1 calculations can be run only for a single molecule in xyz file.

Calculations of charged species and radicals

For example, if we want to optimize geometry of protonated water H3O+, then you can use the following MLatom input optcs.inp:

AIQM1
xyzfile=optcs.xyz
optxyz=final_geo.xyz
geomopt
optprog=ase
charges=1

Similarly, for radicals, you can provide the keyword multiplicities.

This input requires optcs.xyz file with initial XYZ geometries of molecules to be optimized, e.g. (geometries in Å):

4

O      0.00000000     0.00000000      0.08727273
H      0.00000000     0.90509668     -0.23272727
H     -0.78383672    -0.45254834     -0.23272727
H      0.78383672    -0.45254834     -0.23272727

After you prepared your input files optcs.inp and optcs.xyz, you can run MLatom as usual:

mlatom optcs.inp

The output geometry is saved in final_geo.xyz with MLatom format, which for our example looks like (geometries in Å):

4

O        0.00000000      -0.00000000       0.12401796
H        0.00000000       0.90385318      -0.24497568
H       -0.78275982      -0.45192659      -0.24497568
H        0.78275982      -0.45192659      -0.24497568

When you want to calculate the heat of formation of H3O+ with the above optimized geometry, then you can use the following MLatom input freqcs.inp:

AIQM1
xyzfile=final_geo.xyz
freq
optprog=ase
charges=1

After you run MLatom by mlatom freqcs.inp > freqcs.out, you can get the following heat of formation from the output:

Heat of formation at 298.15 K           :       150.84947 kcal/mol
* Warning * Heat of formation have high uncertainty!

Vertical excitation energies

For example, if we want to calculate the vertical excitation energies of ethene in 1B1u state, then you can use the following MLatom input vee.inp:

AIQM1
xyzfile=vee.xyz
mndokeywords=mndokw

This input requires a vee.xyz file with XYZ geometries of molecules, e.g. (geometries in Å):

6
Ethene 1B1u
C          0.000000      0.000000     0.668188
C          0.000000      0.000000    -0.668188
H          0.000000      0.923274     1.238289
H          0.000000     -0.923274     1.238289
H          0.000000      0.923274    -1.238289
H          0.000000     -0.923274    -1.238289

and mndokw file with MNDO keywords:

iop=-22 immdp=-1 +
igeom=1 iform=1 +
jop=-2 nsav15=3 +
kci=5 ici1=1 ici2=1 iroot=2 ioutci=2 +
movo=-1 nciref=1 mciref=3 levexc=2

After you prepared your input files vee.inp, vee.xyz, and mndokw you can run MLatom as usual:

mlatom vee.inp

The output of vertical excitation energy is saved in the MNDO program outfile mndo.out, where you can see:

State  2,  Mult. 1,  B1u (4),  E-E(1)=  7.910791 eV,  E=   -304.054934 eV

See also tutorial.

Excited-state geometry optimization

For example, if we want to optimize geometry of formaldehyde in 1* excited state (1A”), then you can use the following MLatom input optes.inp:

AIQM1
xyzfile=optes.xyz
optxyz=final_geo.xyz
geomopt
optprog=ase
mndokeywords=mndokw_es

This input requires optes.xyz file with initial XYZ geometries of molecules to be optimized, e.g. (geometries in Å):

4
formaldehyde 1npi*
C      -0.02667227        0.64339915       -0.00000000
O      -0.02667227       -0.71674790        0.00000000
H       0.18670592        0.93679416        1.04362466
H       0.18670592        0.93679416       -1.04362466

and mndokw_es file with MNDO keywords:

iop=-22 immdp=-1 +
igeom=1 iform=1 +
jop=-2 nsav15=3 +
kharge=0 imult=0 +
kci=5 ici1=6 ici2=4 jci1=1 jci2=1 ncisym=2 +
movo=-1 nciref=1 mciref=3 levexc=2 iroot=1 lroot=1

The first line in mndokw_es file requests the use of the ODM2* Hamiltonian (you do not need to modify it), the second line specifies that geometry is given in a free XYZ format, the third line requests calculation of gradients and saving energies and gradients into fort.15 file (required for geometry optimization), the fourth line sets charge (kharge=0) and multiplicity (imult=0), and the following lines setup the type of excited-state calculations. kci=5 requests GUGA-CI approach for excited state calculation; ici1 and ici2 define the number of active occupied orbitals and unoccupied orbitals respectively; jci1 and jci2 define the number of occupied and unoccupied pi-MOs included in the active space; ncisym define the state symmetry. Here the initial geometry has Cspoint group, which has A’ and A” this two irreducible representation, and we use ncisym=2 to calculate the 1A” state. nciref and mciref define the number of reference occupations and definition of reference occupations respectively; levexc define the maximum excitation level; iroot and lroot is the number of lowest CI states computed and the number of the CI state of interest.

After you prepared your input files optes.inp, optes.xyz, and mndokw_es you can run MLatom as usual:

mlatom optes.inp

The output geometry is saved in final_geo.xyz with MLatom format, which for our example looks like (geometries in Å):

4

C       -0.10906735       0.55934608      -0.00000000
O       -0.02784720      -0.77782274       0.00000000
H        0.22849093       1.00935811       0.92370071
H        0.22849093       1.00935811      -0.92370071

Visualizing molecular orbitals

If you use Sparrow for providing the ODM2* part: vmo_sparrow.inp, sp.xyz

AIQM1
qmprog=sparrow
xyzfile=sp.xyz
yestfile=enest.dat
QMprogramKeywords=SparrowKW

You need to use QMprogramKeywords for printing orbitals in Molden wavefunction format. The content of the file SparrowKW should be -W. After calculation, the wavefunction.molden.input file will be generated. You can use Jmol or MOLDEN program to open this file and see the orbitals. For Jmol program, to get the pictures of orbitals. You should right-click in the panel and select Surface and MO. For MOLDEN program, to get the pictures of orbitals, you need to choose Dens. Mode from the control panel and select the Orbital button, then select the Space option to set the contour value. Besides, you can also use Chemcraft to visualize the orbitals, but you need to rename this file with a .molden suffix.

If you use MNDO for providing the ODM2* part: vmo_mndo.inp, sp.xyz

AIQM1
qmprog=mndo
xyzfile=sp.xyz
yestfile=enest.dat
mndokeywords=mndokw_vmo

You should provide the mndokw_vmo file with general MNDO keywords and add an extra keyword nsav13=2:

iop=-22 immdp=-1 +
igeom=1 iform=1 +
jop=-1 nsav15=3 +
kharge=0 imult=0 nsav13=2

Then the molden.dat file will be generated. You can also use Jmol or MOLDEN program to visualize the orbitals.

Citations

If you have used AIQM1, then the following citations are appropriate in your publication:

  • AIQM1: P. Zheng, R. Zubatyuk, W. Wu, O. Isayev, P. O. Dral. Artificial Intelligence-Enhanced Quantum Chemical Method with Broad Applicability. Nat. Commun. 2021, 12, 7022. DOI: 10.1038/s41467-021-27340-2.

  • MLatom: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Mario Barbatti, Olexandr Isayev, Cheng Wang, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Lina Zhang, Shuang Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou. MLatom 3: A Platform for Machine Learning-enhanced Computational Chemistry Simulations and Workflows. J. Chem. Theory Comput. 2024, 20, 1193–1213. DOI: 10.1021/acs.jctc.3c01203.

  • MLatom: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Shuang Zhang, Lina Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou. MLatom: A Package for Atomistic Simulations with Machine Learning, version [add version number], Xiamen University, Xiamen, China, 2013–2024. MLatom.com.

  • ODM2* Hamiltonian: P. O. Dral, X. Wu, W. Thiel, J. Chem. Theory Comput. 2019, 15, 1743.

    • If you use MNDO program:

      MNDO program: W. Thiel, MNDO [check your version] (Max-Planck-Institut fur Kohlenforschung, Mulheim an der Ruhr, [year])

    • If you use MLatom@XACS cloud computing service, where SCINE Sparrow is used:

      F. Bosia, T. Husch, C. H. Müller, S. Polonius, J.-G. Sobez, M. Steiner, J. P. Unsleber, A. C. Vaucher, T. Weymuth, M. Reiher, qcscine/sparrow: to-be-released alpha version with modifications by P. Zheng, 2022.

      T. Husch, A. C. Vaucher, M. Reiher, Int. J. Quantum Chem. 2018, 118, e25799.

  • D4: E. Caldeweyher, C. Bannwarth, S. Grimme, J. Chem. Phys. 2017, 147, 034112.

  • D4 program: E. Caldeweyher, S. Ehlert, S. Grimme, DFT-D4, Version [check your version], (Mulliken Center for Theoretical Chemistry, University of Bonn, [year])

  • ANI model: J. S. Smith, O. Isayev, A. E. Roitberg, Chem. Sci. 2017, 8, 3192

  • TorchANI program: X. Gao, F. Ramezanghorbani, O. Isayev, J. S. Smith, A. E. Roitberg, J. Chem. Inf. Model. 2020, 60, 3408