Simulations

Single-point calculations

Single-point calculations can be performed either for given geometries (one or many) or for generic input vectors X (often molecular descriptors representing a molecule), using either a pre-trained model supported by MLatom or a user-trained model.

Input arguments

The user has to choose at least one model either with MLmodelIn or by giving the name of a pre-trained model.

At least one of the arguments, XYZfile or XfileIn, should be chosen.

  • useMLmodel MLmodelIn=[file with ML model] or AIQM1, ANI-1ccx, …

    • One and only one of these two options can be chosen.

    • useMLmodel MLmodelIn: see user-trained models for details. No default parameters to MLmodelIn are provided. Requests to read a file with ML model.

    • AIQM1, ANI-1ccx, …: see pre-trained models for a full list. This argument is optional and no default parameters are provided. Requests one of the pre-trained models supported by MLatom.

  • XYZfile=[file with XYZ coordinates] or XfileIn=[file with input vectors X]

    • One and only one of these two options can be chosen. No default file names.

    • XYZfile: requests to make predictions for one or many molecules provided in file with their XYZ coordinates. The units of coordinates depend on the model. For pre-trained models Å should be used.

    • XfileIn: requests to make predictions for provided list of input vectors (one input vector per line in text file), which are typically molecular descriptors.

Output file arguments

At least one of the below arguments is required.

  • YestFile=[file with estimated Y values]

    • This argument is optional and no default parameters are provided.

    • Saves predictions Y to the requested file. If a file with the same name already exists, program will terminate and not overwrite it. If predictions are made with pre-trained models, they are energies in Hartree. Other predictions depend on a model.

  • YgradXYZestFile=[file with estimated XYZ gradients]

    • This argument is optional and no default parameters are provided.

    • Should be used only with XYZfile option. Saves predicted XYZ gradients (first derivatives) to the requested file. If a file with the same name already exists, program will terminate and not overwrite it. If predictions are made with pre-trained models, they are XYZ gradients in Hartree/Å.

  • YgradEstFile=[file with estimated gradients]

    • This argument is optional and no default parameters are provided.

    • Should be used only with XfileIn option. Saves predicted gradients (first derivatives) with respect to the elements of input vector X to the requested file. If a file with the same name already exists, the program will terminate and not overwrite it.

Note

Calculations with AIQM1-based models also generate additional output files. If you need properties other than those available via the above options, run the calculation on a single XYZ geometry and inspect the MNDO output file mndo.out.

Example

Single-point calculation of energies (and gradients if needed) of closed-shell molecules in the electronic ground state is the simplest job, which can be run with a 3-4 line MLatom input file, e.g., sp.inp:

AIQM1 # or useMLmodel MLmodelIn=CH3Cl.unf if you, e.g., want to use CH3Cl.unf model
xyzfile=sp.xyz
yestfile=enest.dat
ygradxyzestfile=xyzgradest.dat

This input requires a sp.xyz file with XYZ geometries of molecules (you can provide many molecules as usual for MLatom), e.g., for hydrogen and methane sp.xyz file can look like (geometries in Å):

2

H    0.000000    0.000000    0.363008
H    0.000000    0.000000   -0.363008
5

C    0.000000    0.000000    0.000000
H    0.627580    0.627580    0.627580
H   -0.627580   -0.627580    0.627580
H    0.627580   -0.627580   -0.627580
H   -0.627580    0.627580   -0.627580

After you prepared your input files sp.inp and sp.xyz, you can run MLatom as usual:

mlatom sp.inp > sp.out
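
The two input files can also be generated from a script. The sketch below is only an illustration: the helper names write_xyz and write_sp_input are hypothetical, it reproduces the files shown above in a scratch directory, and it does not call MLatom itself.

```python
from pathlib import Path
import tempfile


def write_xyz(path, molecules):
    """Write several molecules into one XYZ file (coordinates in Å).

    Each molecule is a list of (symbol, x, y, z) tuples and is written as an
    atom-count line, a blank comment line, and one line per atom.
    """
    lines = []
    for atoms in molecules:
        lines.append(str(len(atoms)))
        lines.append("")  # comment line, left blank as in sp.xyz above
        for symbol, x, y, z in atoms:
            lines.append(f"{symbol:<2s}{x:12.6f}{y:12.6f}{z:12.6f}")
    Path(path).write_text("\n".join(lines) + "\n")


def write_sp_input(path, model="AIQM1"):
    """Write the four-line single-point input file shown above."""
    Path(path).write_text(
        f"{model}\n"
        "xyzfile=sp.xyz\n"
        "yestfile=enest.dat\n"
        "ygradxyzestfile=xyzgradest.dat\n"
    )


hydrogen = [("H", 0.0, 0.0, 0.363008), ("H", 0.0, 0.0, -0.363008)]
methane = [
    ("C", 0.0, 0.0, 0.0),
    ("H", 0.627580, 0.627580, 0.627580),
    ("H", -0.627580, -0.627580, 0.627580),
    ("H", 0.627580, -0.627580, -0.627580),
    ("H", -0.627580, 0.627580, -0.627580),
]

workdir = Path(tempfile.mkdtemp())  # scratch directory for this demonstration
write_xyz(workdir / "sp.xyz", [hydrogen, methane])
write_sp_input(workdir / "sp.inp")
# MLatom would then be run in that directory with: mlatom sp.inp > sp.out
```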

After the calculations finish, MLatom output sp.out will contain information depending on the model, e.g., for AIQM1 it will contain the standard deviation of NN prediction and components of AIQM1 energies:

Standard deviation of NN contribution   :      0.00892407 Hartree        5.59994 kcal/mol
NN contribution                         :     -0.00210898 Hartree
Sum of atomic self energies             :     -0.08587317 Hartree
ODM2* contribution                      :     -1.09094119 Hartree
D4 contribution                         :     -0.00000889 Hartree
Total energy                            :     -1.17893224 Hartree

Standard deviation of NN contribution   :      0.00025608 Hartree        0.16069 kcal/mol
NN contribution                         :      0.00958812 Hartree
Sum of atomic self energies             :    -33.60470494 Hartree
ODM2* contribution                      :     -6.86968756 Hartree
D4 contribution                         :     -0.00010193 Hartree
Total energy                            :    -40.46490632 Hartree

In any case, MLatom will save predicted values in file enest.dat which for above calculations will contain AIQM1 energies in Hartree:

 -1.178932238420
-40.464906315250

and XYZ gradients in Hartree/Å will be saved in file xyzgradest.dat looking like this:

2

    0.000000000000       0.000000000000       0.000032023551
    0.000000000000       0.000000000000      -0.000032023551
5

    -0.000000000000      -0.000000000000       0.000000000000
     0.000490470799       0.000490470714       0.000490470881
    -0.000490470799      -0.000490470714       0.000490470881
     0.000490470799      -0.000490470714      -0.000490470881
    -0.000490470799       0.000490470714      -0.000490470881

Note

Your output may have very minor numerical differences.
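
For post-processing, the two output files can be read back with a few lines of Python. This is a minimal sketch that assumes exactly the layouts shown above (one energy per line; atom count, one comment line, then one gradient line per atom); the function names are hypothetical and not part of MLatom.

```python
def parse_energies(text):
    """Return one energy (in Hartree) per non-empty line of a YestFile."""
    return [float(line) for line in text.splitlines() if line.strip()]


def parse_xyz_gradients(text):
    """Return per-molecule gradients as lists of (gx, gy, gz) tuples.

    Assumes each molecule block is: atom count, one comment (possibly
    blank) line, then one line with three numbers per atom.
    """
    lines = text.splitlines()
    molecules = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():  # skip blank lines between blocks
            i += 1
            continue
        natoms = int(lines[i])
        block = lines[i + 2 : i + 2 + natoms]
        molecules.append([tuple(float(v) for v in row.split()) for row in block])
        i += 2 + natoms
    return molecules


energies = parse_energies(" -1.178932238420\n-40.464906315250\n")
gradients = parse_xyz_gradients(
    "2\n"
    "\n"
    "    0.000000000000       0.000000000000       0.000032023551\n"
    "    0.000000000000       0.000000000000      -0.000032023551\n"
)
# Forces are the negative of the energy gradients.
forces = [[tuple(-g for g in atom) for atom in mol] for mol in gradients]
```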

Geometry optimization

Geometry optimizations can be performed for given geometries (one or many) with either a pre-trained model supported by MLatom or a user-trained model. Optimizations are performed using third-party software (Gaussian or ASE); please see the installation instructions for the third-party software packages Gaussian and ASE. An experimental option is to use SciPy geometry optimization, but it is not well tested. On the MLatom@XACS cloud, ASE is used by default.

Input and output arguments

The user has to choose a task (minimum energy or TS geometry optimization), provide initial geometry, and choose one of the models or methods.

Task

Choose the type of task; one and only one of these arguments is required.

  • geomopt: requests minimum-energy geometry optimization.

  • TS: requests optimization of a transition state structure. Only works via interface to Gaussian.

Method or model specification

  • Universal ML and QM/ML methods

    If one of the known universal ML models or hybrid QM/ML methods is requested, its name should be provided, e.g., AIQM1 or ANI-1ccx (see manual). The input file will look like:

    geomopt
    AIQM1
    xyzfile=ts_opt.xyz
    
  • QM methods

    If a QM method is used, please refer to the manual. In short, the QM method and program should be provided with the method and qmprog arguments like this:

    geomopt
    method=B3LYP/6-31G*
    qmprog=gaussian
    xyzfile=ts_opt.xyz
    
  • ML models

    MLmodelIn=[file with ML model] MLmodelType=[one of the supported types]: requests to read a file with ML model of the supported type (see manual). Note that the model should predict energies in Hartree for a successful geometry optimization. Example of input:

    geomopt
    MLmodelIn=ani.pt
    MLmodelType=ANI
    xyzfile=ts_opt.xyz
    

Initial XYZ coordinates

XYZfile=[file with XYZ coordinates]: it is required; no default parameters are provided. The units of coordinates should be Å. Provide the file with initial XYZ coordinates of one or many molecules.

Program for optimization

optprog=[either Gaussian or ASE or SciPy]: chooses a third-party program for optimization. By default, MLatom will try to use Gaussian. If Gaussian is not found, it will try to use ASE. If ASE is not found, it will try to use SciPy. The default algorithms in Gaussian, ASE, and SciPy are Berny optimization, LBFGS, and BFGS, respectively. Optimization settings can be changed by using additional options for the third-party program. If ASE is used, it may terminate after the maximum number of iterations without informing the user that the geometry is not optimized.
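
The default selection order described above can be pictured with a small sketch. It only mirrors the documented order (Gaussian, then ASE, then SciPy); the function name choose_optprog and the way availability is passed in are assumptions made for illustration and do not reflect MLatom's internal code.

```python
OPTPROG_FALLBACK_ORDER = ("gaussian", "ase", "scipy")


def choose_optprog(available, requested=None):
    """Pick an optimizer name following the documented fallback order.

    `available` is an iterable of program names found on the machine.
    `requested` corresponds to an explicit optprog= value, if any.
    """
    found = {name.lower() for name in available}
    if requested is not None:
        if requested.lower() not in found:
            raise ValueError(f"requested optimizer {requested!r} is not available")
        return requested.lower()
    for name in OPTPROG_FALLBACK_ORDER:
        if name in found:
            return name
    raise RuntimeError("none of Gaussian, ASE, or SciPy was found")


print(choose_optprog(["SciPy", "ASE"]))               # no Gaussian installed
print(choose_optprog(["Gaussian", "ASE"], "ASE"))     # explicit optprog=ASE
```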

Output files

optxyz=[file with optimized XYZ coordinates]: saves optimized geometries in the requested file. The default file name is optgeoms.xyz.

In addition, geometry optimizations will dump many useful files:

  • optimized geometries in optgeoms.xyz or other file saved under name requested with optxyz=.

  • the optimization trajectories in XYZ format opttraj1.xyz and JSON format opttraj1.json and so on for each molecule.

  • in case of optimizations with the Gaussian optimizer, you will also get the corresponding input and output files gaussian1.com and gaussian1.log etc for each molecule.

If you want to control how much information is saved (e.g., for big molecules and many molecules):

  • printmin will not print information about every iteration.

  • printall will print detailed information at each iteration.

  • dumpopttrajs=False will not dump any optimization trajectories.

Additional options for third-party programs

| Arguments | Default parameters |
| --- | --- |
| ase.fmax=[threshold of maximum force (in eV/Å)] | 0.02 |
| ase.steps=[maximum number of optimization steps] | 200 |
| ase.optimizer=[LBFGS or BFGS] | LBFGS |
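
Because an ASE optimization may stop after ase.steps iterations without a warning, it can be useful to check the final gradients against the ase.fmax threshold yourself. The sketch below assumes that the criterion is the one used by ASE optimizers (the largest per-atom force norm must be below fmax) and that gradients are given in Hartree/Å, as written by ygradxyzestfile for pre-trained models. The function names are hypothetical, and the gradient values are those from the single-point example earlier in this document.

```python
import math

HARTREE_TO_EV = 27.211386  # 1 Hartree expressed in eV


def max_force_ev_per_angstrom(gradients):
    """Largest per-atom force norm in eV/Å from gradients in Hartree/Å."""
    return HARTREE_TO_EV * max(
        math.sqrt(gx * gx + gy * gy + gz * gz) for gx, gy, gz in gradients
    )


def is_converged(gradients, fmax=0.02):
    """True if the largest per-atom force is below fmax (eV/Å)."""
    return max_force_ev_per_angstrom(gradients) < fmax


h2_gradients = [
    (0.0, 0.0, 0.000032023551),
    (0.0, 0.0, -0.000032023551),
]
ch4_gradients = [
    (0.0, 0.0, 0.0),
    (0.000490470799, 0.000490470714, 0.000490470881),
    (-0.000490470799, -0.000490470714, 0.000490470881),
    (0.000490470799, -0.000490470714, -0.000490470881),
    (-0.000490470799, 0.000490470714, -0.000490470881),
]

print(is_converged(h2_gradients), is_converged(ch4_gradients))
```

With the default threshold of 0.02 eV/Å, the hydrogen geometry from that example passes, while the methane geometry, with a largest force of about 0.023 eV/Å, does not.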

Example

Geometry optimization of closed-shell molecules in the electronic ground state is as simple as running single-point calculations; a 4-line MLatom input file, e.g., opt.inp, looks like this:

AIQM1 # or useMLmodel MLmodelIn=CH3Cl.unf if you, e.g., want to use CH3Cl.unf model
xyzfile=init.xyz
optxyz=opt.xyz
geomopt

This input requires init.xyz file with initial XYZ geometries of molecules to be optimized (you can provide many molecules as usual for MLatom), e.g., for hydrogen and methane init.xyz file can look like (geometries in Å):

2
Hydrogen molecule
H         0.0000000000        0.0000000000        0.0000000000
H         0.7414000000        0.0000000000        0.0000000000
5
Methane molecule
C         0.0000000000        0.0000000000        0.0000000000
H         1.0870000000        0.0000000000        0.0000000000
H        -0.3623333220       -1.0248334322       -0.0000000000
H        -0.3623333220        0.5124167161       -0.8875317869
H        -0.3623333220        0.5124167161        0.8875317869

After you prepared your input files opt.inp and init.xyz, you can run MLatom as usual:

mlatom opt.inp > opt.out

MLatom output file opt.out should contain lines similar to those below (if interface to the Gaussian program was used for optimization):

******************************************************************************
optprog: Gaussian 16

Standard deviation of NN contribution   :      0.00892062 Hartree        5.59777 kcal/mol
NN contribution                         :     -0.00210740 Hartree
Sum of atomic self energies             :     -0.08587317 Hartree
ODM2* contribution                      :     -1.09094281 Hartree
D4 contribution                         :     -0.00000889 Hartree
Total energy                            :     -1.17893227 Hartree

Standard deviation of NN contribution   :      0.00025608 Hartree        0.16069 kcal/mol
NN contribution                         :      0.00958812 Hartree
Sum of atomic self energies             :    -33.60470494 Hartree
ODM2* contribution                      :     -6.86968742 Hartree
D4 contribution                         :     -0.00010193 Hartree
Total energy                            :    -40.46490617 Hartree

==============================================================================
Wall-clock time: 21.60 s (0.36 min, 0.01 hours)

MLatom terminated on 11.11.2022 at 13:21:37
==============================================================================

After the calculations finish, the optimized geometries are saved in a single file opt.xyz, which for our example looks like (geometries in Å, there can be slight numerical differences depending on a machine, etc.):

2

H        0.00770082       0.00000000       0.00000000
H        0.73369918       0.00000000       0.00000000
5

C        0.00000000       0.00000000       0.00000000
H        1.08666998      -0.00000000       0.00000000
H       -0.36222332      -1.02452229      -0.00000000
H       -0.36222332       0.51226114      -0.88726233
H       -0.36222332       0.51226114       0.88726233

Frequencies and thermochemistry

Calculation of frequencies and thermochemical properties can be performed for given optimized geometries (one or many) with either a pre-trained model supported by MLatom or a user-trained model. The geometries should first be optimized with the same model. Calculations require third-party software (Gaussian, ASE, or PySCF); please see the installation instructions. On the MLatom@XACS cloud, both ASE and PySCF can be used.

Input arguments

The user has to choose at least one model either with MLmodelIn or by giving the name of a pre-trained model.

  • freq

    • This argument is required.

    • Requests frequencies calculations.

  • useMLmodel MLmodelIn=[file with ML model]

    • This argument is optional and no default parameters are provided.

    • Requests to read a file with ML model. Note that the model should predict energies in Hartree for successful frequency calculations.

  • ANI-1ccx, AIQM1, etc.

    • see pre-trained model for a full list. This argument is optional and no default parameters are provided.

    • requests one of the pre-trained models supported by MLatom, e.g., AIQM1, ANI-1ccx, etc. If AIQM1 or ANI-1ccx is requested, MLatom will also calculate enthalpies of formation at 298 K, see tutorial. The standard deviation of neural networks in AIQM1 and ANI-1ccx models will be reported and if it is larger than 0.41 and 1.68 kcal/mol, respectively, predicted enthalpies of formation have potentially too high uncertainty and a warning will be reported in the output file.

  • XYZfile=[file with optimized XYZ coordinates]

    • This argument is required; no default parameters are provided.

    • File with optimized XYZ coordinates of one or many molecules. The units of coordinates should be Å.

  • freqprog=[Gaussian, PySCF or ASE]

    • By default, MLatom will try to use Gaussian. If Gaussian is not found, it will try to use PySCF, then ASE.

    • Chooses a third-party program for frequencies calculations.
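
The uncertainty warning mentioned above amounts to a simple threshold test on the standard deviation of the neural-network predictions. The following sketch illustrates that test with the two thresholds given in this section; the function names are hypothetical, the Hartree-to-kcal/mol factor is the usual approximate value, and the example numbers are taken from the AIQM1 single-point output shown earlier.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # approximate conversion factor

# Thresholds (kcal/mol) quoted in this section for the two models.
NN_SD_THRESHOLD = {"AIQM1": 0.41, "ANI-1ccx": 1.68}


def nn_sd_kcal_per_mol(nn_sd_hartree):
    """Convert an NN standard deviation from Hartree to kcal/mol."""
    return nn_sd_hartree * HARTREE_TO_KCAL_PER_MOL


def has_high_uncertainty(model, nn_sd_hartree):
    """True if the NN standard deviation exceeds the model's threshold."""
    return nn_sd_kcal_per_mol(nn_sd_hartree) > NN_SD_THRESHOLD[model]


# Standard deviations reported by AIQM1 for hydrogen and methane.
print(has_high_uncertainty("AIQM1", 0.00892407))  # hydrogen
print(has_high_uncertainty("AIQM1", 0.00025608))  # methane
```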

Note

Frequency calculations with Gaussian program also generate Gaussian output files mol_1.log, mol_2.log, etc, which contain additional information.

Additional options for third-party programs

  • ase.linear=N,…,N

    • Available parameters are 0 (default) and 1.

    • 0 for nonlinear molecule, 1 for linear molecule. The order is the same as in XYZ file.

  • ase.symmetrynumber=N,…,N

    • The default is 1.

    • Rotational symmetry number for each molecule (see Table 10.1 and Appendix B of C. Cramer “Essentials of Computational Chemistry”, 2nd Ed.). This number only affects the entropy and free energy, and its influence is usually very small. The order is the same as in the XYZ file.

| Point group | Symmetry number |
| --- | --- |
| \(C_1\) | \(1\) |
| \(C_i\) | \(1\) |
| \(C_s\) | \(1\) |
| \(C_{{\infty}v}\) | \(1\) |
| \(D_{{\infty}h}\) | \(2\) |
| \(S_n,n=2,4,6,...\) | \(n/2\) |
| \(C_n,n=2,3,4,...\) | \(n\) |
| \(C_{nh},n=2,3,4,...\) | \(n\) |
| \(C_{nv},n=2,3,4,...\) | \(n\) |
| \(D_n,n=2,3,4,...\) | \(2n\) |
| \(D_{nh},n=2,3,4,...\) | \(2n\) |
| \(D_{nd},n=2,3,4,...\) | \(2n\) |
| \(T\) | \(12\) |
| \(T_d\) | \(12\) |
| \(O_h\) | \(24\) |
| \(I_h\) | \(60\) |


Example

Thermochemical properties of closed-shell molecules in the electronic ground state can be calculated at ANI-1ccx or AIQM1 level by adding an argument freq to the MLatom input file, e.g., freq.inp, and they should be run on geometries optimized with the corresponding model. An example of MLatom input file using the ANI-1ccx-optimized geometries:

ANI-1ccx
xyzfile=optgeoms.xyz
freq
freqprog=ASE # or freqprog=gaussian if you choose Gaussian
ase.linear=1,0
ase.symmetrynumber=2,12

When ASE is used for the calculation of thermochemical properties, you should specify two keywords: ase.linear and ase.symmetrynumber. ase.linear is 0 for a nonlinear molecule and 1 for a linear molecule, and ase.symmetrynumber is the rotational symmetry number of each molecule (see Table 10.1 and Appendix B of C. Cramer “Essentials of Computational Chemistry”, 2nd Ed.). For example, for the two molecules hydrogen and methane, you should set ase.linear=1,0 and ase.symmetrynumber=2,12.
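
For files with many molecules, the two comma-separated lists can be assembled programmatically. The sketch below encodes the symmetry-number table from this section and builds both input lines; the point-group spellings (e.g., "Dinfh", "C3v") and the function names are hypothetical conventions chosen for this example only.

```python
import re

# Point groups with a fixed symmetry number (from the table in this section).
FIXED_SYMMETRY_NUMBER = {
    "C1": 1, "Ci": 1, "Cs": 1, "Cinfv": 1, "Dinfh": 2,
    "T": 12, "Td": 12, "Oh": 24, "Ih": 60,
}


def symmetry_number(point_group):
    """Rotational symmetry number for a point-group label such as 'C3v'."""
    if point_group in FIXED_SYMMETRY_NUMBER:
        return FIXED_SYMMETRY_NUMBER[point_group]
    match = re.fullmatch(r"([SCD])(\d+)([hvd]?)", point_group)
    if match is None:
        raise ValueError(f"unknown point group: {point_group!r}")
    family, n, suffix = match.group(1), int(match.group(2)), match.group(3)
    if n >= 2:
        if family == "S" and suffix == "" and n % 2 == 0:
            return n // 2          # S_n, n = 2, 4, 6, ...
        if family == "C" and suffix in ("", "h", "v"):
            return n               # C_n, C_nh, C_nv
        if family == "D" and suffix in ("", "h", "d"):
            return 2 * n           # D_n, D_nh, D_nd
    raise ValueError(f"unknown point group: {point_group!r}")


def ase_thermo_arguments(molecules):
    """Build the two input lines from (point_group, is_linear) pairs,
    given in the same order as the molecules in the XYZ file."""
    linear = ",".join("1" if is_linear else "0" for _, is_linear in molecules)
    sigma = ",".join(str(symmetry_number(pg)) for pg, _ in molecules)
    return [f"ase.linear={linear}", f"ase.symmetrynumber={sigma}"]


# Hydrogen (linear, D_inf_h) followed by methane (nonlinear, T_d).
for line in ase_thermo_arguments([("Dinfh", True), ("Td", False)]):
    print(line)
```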

The file with preoptimized geometries optgeoms.xyz for our example is (geometries in Å):

2
hydrogen
H        0.15255733       0.00000000       0.00000000
H        0.58884267       0.00000000       0.00000000
5
methane
C        0.00000000       0.00000000       0.00000000
H        1.08733372      -0.00000000       0.00000000
H       -0.36244456      -1.02514806       0.00000000
H       -0.36244456       0.51257403      -0.88780426
H       -0.36244456       0.51257403       0.88780426

After you prepared your input files freq.inp and optgeoms.xyz, you can run MLatom as usual:

mlatom freq.inp > freq.out

After the calculations finish, MLatom output freq.out will contain the summary with atomization enthalpy at 0 K, ZPVE-exclusive atomization energy at 0 K, and heat of formation at 298.15 K for each molecule. If you use ASE, MLatom output will contain the same lines as above, but also include additional data such as entropy and the Gibbs free energy:

......
Zero-point vibrational energy           :         4.07528 kcal/mol
Atomization enthalpy at 0 K             :       126.42974 kcal/mol
ZPE exclusive atomization energy at 0 K :       130.50502 kcal/mol
Heat of formation at 298.15 K           :       -23.11424 kcal/mol
* Warning * Heat of formation have high uncertainty!
......
Zero-point vibrational energy           :        27.87144 kcal/mol
Atomization enthalpy at 0 K             :       391.92513 kcal/mol
ZPE exclusive atomization energy at 0 K :       419.79657 kcal/mol
Heat of formation at 298.15 K           :       -17.63420 kcal/mol

If you use Gaussian, the Gaussian output files of frequency calculations are saved in mol_1.log, mol_2.log, … files for each molecule; these files contain ZPVE energy and lots of thermochemical data such as entropy and the Gibbs free energy.

IRC

Intrinsic reaction coordinate (IRC) can be used to check the nature of the optimized TS. It can be performed for given geometries (one or many) with either a pre-trained model supported by MLatom or a user-trained model. IRC is performed via interface to the Gaussian program.

Input and output arguments

The user has to choose a task (minimum energy geometry optimization, TS optimization, or IRC) and choose at least one model either with MLmodelIn or by giving the name of a pre-trained model.

Method or model specification

  • Universal ML and QM/ML methods

    If one of the known universal ML models or hybrid QM/ML methods is requested, its name should be provided, e.g., AIQM1 or ANI-1ccx (see manual). The input file will look like:

    IRC
    AIQM1
    xyzfile=ts_opt.xyz
    
  • QM methods

    If a QM method is used, please refer to the manual. In short, the QM method and program should be provided with the method and qmprog arguments like this:

    IRC
    method=B3LYP/6-31G*
    qmprog=gaussian
    xyzfile=ts_opt.xyz
    
  • ML models

    MLmodelIn=[file with ML model] MLmodelType=[one of the supported types]: requests to read a file with ML model of the supported type (see manual). Note that the model should predict energies in Hartree for a successful IRC calculation. Example of input:

    IRC
    MLmodelIn=ani.pt
    MLmodelType=ANI
    xyzfile=ts_opt.xyz
    

Initial XYZ coordinates

XYZfile=[file with XYZ coordinates]: it is required; no default parameters are provided. The units of coordinates should be Å.

Example

An IRC calculation for closed-shell molecules in the electronic ground state is as simple as running single-point calculations; a 3-line MLatom input file, e.g., irc.inp, looks like this:

IRC
AIQM1 # or useMLmodel MLmodelIn=CH3Cl.unf if you, e.g., want to use CH3Cl.unf model
xyzfile=ts_opt.xyz

This input requires ts_opt.xyz file with the optimized TS geometries from which the IRC is followed (you can provide many molecules as usual for MLatom), e.g., for the Diels–Alder reaction ts_opt.xyz file can look like (geometries in Å):

16

C          0.48462430     -0.55755495      1.43729151
C          0.48462430     -0.55755495     -1.43729151
C         -0.27595797     -1.44977527      0.70359025
C         -0.27595797     -1.44977527     -0.70359025
C         -0.27595797      1.45086377      0.69299925
C         -0.27595797      1.45086377     -0.69299925
H          0.37292526     -0.50748993      2.51767690
H          1.44526264     -0.21636383      1.06867438
H          0.37292526     -0.50748993     -2.51767690
H          1.44526264     -0.21636383     -1.06867438
H         -1.05536225     -2.01444047      1.21328943
H         -1.05536225     -2.01444047     -1.21328943
H          0.51071931      1.96707995      1.23581344
H         -1.20625330      1.32768072      1.23591744
H          0.51071931      1.96707995     -1.23581344
H         -1.20625330      1.32768072     -1.23591744

After you prepared your input files irc.inp and ts_opt.xyz, you can run MLatom as usual:

mlatom irc.inp > irc.out

After the calculations finish, the IRC results will be saved in the Gaussian output file gaussian.log.

Molecular dynamics

MLatom can perform molecular dynamics of molecular systems with various methods and models through its interfaces to many quantum chemistry and machine learning packages.

Input and output arguments

| Arguments | Available and default parameters | Description |
| --- | --- | --- |
| dt | 0.1 by default | time step; unit: fs |
| trun | 1000 by default | length of trajectory; unit: fs |
| initXYZ | required | user-provided initial geometry (should be in Å) |
| initVXYZ | required when initConditions≠random | user-provided initial velocity (should be in Å/fs) |
| initConditions | user-defined by default, other options: random | algorithm of generating initial conditions |
| initTemperature | 300 by default | initial temperature; unit: K; necessary when initConditions=random |
| initXYZout | | output file of initial geometry |
| initVXYZout | | output file of initial velocity |
| Thermostat | NVE by default, other options: Andersen, Nose-Hoover | MD thermostat |
| Temperature | 300 by default | environment temperature |
| Gamma | 0.2 by default, required when Thermostat=Andersen | collision frequency; unit: fs⁻¹ |
| NHClength | 3 by default, required when Thermostat=Nose-Hoover | Nose-Hoover chain length |
| Nc | 3 by default, required when Thermostat=Nose-Hoover | multiple time step |
| Nys | 7 by default, required when Thermostat=Nose-Hoover | number of Yoshida-Suzuki steps; only 1,3,5,7 are available |
| NHCfreq | 0.0625 by default, required when Thermostat=Nose-Hoover | Nose-Hoover chain frequency; unit: fs⁻¹ |
| trajH5MDout | traj.h5 by default | trajectory saved in H5MD file format |
| trajTextout | traj by default | trajectory saved in plain text format |

Example

Below is the example of how to run MD with MLatom:

MD                            # Molecular dynamics
method=AIQM1                  # Use AIQM1
initConditions=user-defined   # Use user-defined initial conditions
initXYZ=init.xyz              # File with initial geometry
initVXYZ=init.vxyz            # File with initial velocities
dt=0.1                        # Time step
trun=100000                   # Length of trajectory
thermostat=nose-hoover        # Use Nose-Hoover thermostat
temperature=300               # Set temperature
trajH5MDout=traj.h5           # Save trajectory in traj.h5
qmprog=mndo
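
Before launching a long trajectory it helps to check what the chosen dt and trun imply, and whether user-provided velocities in Å/fs have a plausible magnitude. The sketch below is a generic estimate with hypothetical function names: the thermal velocity scale is the textbook sqrt(kB·T/m) per Cartesian component and is not necessarily how MLatom generates random initial conditions.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ATOMIC_MASS_UNIT_KG = 1.66053907e-27
M_PER_S_TO_ANGSTROM_PER_FS = 1.0e-5  # 1 m/s = 1e10 Å / 1e15 fs


def number_of_steps(trun_fs, dt_fs):
    """Number of MD steps needed to cover trun with time step dt (both in fs)."""
    return round(trun_fs / dt_fs)


def thermal_velocity_scale(mass_u, temperature_k):
    """sqrt(kB*T/m) in Å/fs: typical size of one velocity component."""
    v_m_per_s = math.sqrt(
        BOLTZMANN_J_PER_K * temperature_k / (mass_u * ATOMIC_MASS_UNIT_KG)
    )
    return v_m_per_s * M_PER_S_TO_ANGSTROM_PER_FS


# Settings from the example input above: trun=100000 fs, dt=0.1 fs.
print(number_of_steps(100000, 0.1))          # one million steps
# A hydrogen atom (1.008 u) at 300 K moves at roughly 0.016 Å/fs per component.
print(thermal_velocity_scale(1.008, 300.0))
```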

IR and power spectra from MD

MD trajectory can be used to generate IR spectrum and power spectrum.

| Arguments | Available and default parameters | Description |
| --- | --- | --- |
| trajH5MDin | required if trajdpin is not provided | file with trajectory in H5MD format |
| trajVXYZin | required if trajH5MDin is not provided | plain text file containing velocities |
| trajdpin | required if trajH5MDin is not provided | plain text file containing dipole moments |
| start_time | 0.0 by default | unit: fs; use trajectory from start_time to end_time |
| end_time | maximum time by default | unit: fs; use trajectory from start_time to end_time |
| autocorrelationDepth | 1024 by default | autocorrelation depth; unit: fs |
| zeropadding | 1024 by default | zero padding; unit: fs |
| title | | title of the plot |
| output | ir or ps | which spectrum to output |

The plot will be saved in ir.png or ps.png, and the spectrum will be saved in ir.npy or ps.npy.

Example

Below is an example of how to generate IR spectrum from MD trajectory:

MD2vibr                      # Generate vibrational spectrum from MD trajectory
trajH5MDin=traj.h5           # Read MD trajectory from traj.h5
dt=0.5                       # Time step
start_time=3000              # Start time
end_time=100000              # End time
autocorrelationDepth=1024    # Autocorrelation depth
zeropadding=1024             # Zero padding
output=ir                    # Generate IR spectrum

Below is an example of how to generate power spectrum from MD trajectory:

MD2vibr
trajH5MDin=traj.h5
dt=0.5
start_time=0
end_time=10000
autocorrelationDepth=1024
zeropadding=1024
output=ps
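
The time step and the analysed time window limit what a spectrum obtained from a trajectory can resolve. The sketch below uses only general properties of discrete Fourier transforms (the Nyquist limit and the spacing of frequency points); it is not a description of MLatom's internal algorithm, the function names are hypothetical, and the transform length of 4096 points is an arbitrary value chosen for illustration.

```python
SPEED_OF_LIGHT_CM_PER_FS = 2.99792458e-5


def n_intervals(start_time_fs, end_time_fs, dt_fs):
    """Number of time steps between start_time and end_time."""
    return round((end_time_fs - start_time_fs) / dt_fs)


def nyquist_wavenumber(dt_fs):
    """Highest wavenumber (cm^-1) representable with sampling interval dt."""
    return 1.0 / (2.0 * dt_fs * SPEED_OF_LIGHT_CM_PER_FS)


def wavenumber_spacing(n_points, dt_fs):
    """Spacing (cm^-1) between points of an n_points-long discrete transform."""
    return 1.0 / (n_points * dt_fs * SPEED_OF_LIGHT_CM_PER_FS)


# IR example above: dt=0.5 fs, window from 3000 fs to 100000 fs.
print(n_intervals(3000, 100000, 0.5))
print(nyquist_wavenumber(0.5))          # about 33356 cm^-1
print(wavenumber_spacing(4096, 0.5))    # about 16.3 cm^-1 for 4096 points
```

With dt=0.5 fs the Nyquist limit is far above any molecular vibration (X–H stretches lie below about 4000 cm⁻¹), so the sampling interval in the examples is not the limiting factor for the spectrum.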

Simulations with universal ML-based models

MLatom supports calculations with the following pre-trained ML-based models (MLatom arguments are spelled exactly the same way as the method names given below):

  • AIQM1, AIQM1@DFT, AIQM1@DFT* (tutorial with examples and installation instructions)
    • Strengths: AIQM1 is approaching CCSD(T)/CBS accuracy but with a speed of semiempirical methods (thousand times faster than DFT) for energy calculations and geometry optimizations of closed-shell molecules in their ground state. It is also transferable for calculations of charged and radical species as well as for excited-state calculations with good accuracy. MLatom will also report the standard deviation of neural networks correction and if it is larger than 0.41 kcal/mol the AIQM1 calculations have potentially too high uncertainty and a warning will be reported in the output file if heats of formation are predicted.

    • Limitations: only CHNO elements are supported. On XACS cloud, no analytical gradients are available, i.e., geometry optimizations and frequencies calculations are rather slow; install local MLatom if higher efficiency is needed.

  • ANI-1ccx, ANI-1x, ANI-2x, ANI-1x-D4 and ANI-2x-D4 (requires TorchANI to be installed)
    • Strengths: Faster than AIQM1. ANI-1ccx is also approaching CCSD(T)/CBS accuracy for energy calculations and geometry optimizations of closed-shell molecules in their ground state but is generally less accurate and reliable than AIQM1. MLatom will also report the standard deviation of neural networks and if it is larger than 1.68 kcal/mol the ANI-1ccx calculations have potentially too high uncertainty and a warning will be reported in the output file if heats of formation are predicted.

    • Limitations: only CHNO elements are supported by ANI-1ccx and ANI-1x; CHNOFClS are supported by ANI-2x. Not transferable to calculations of charged and radical species or to excited-state calculations. Accuracy for noncovalent interactions is poor if no D4 correction is included.

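They can be used for typical simulations such as single-point calculations, geometry optimizations, and calculations of frequencies and thermochemistry (see the corresponding sections for more details).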

Optional arguments

AIQM1 is using interfaces to MNDO or Sparrow to calculate QM contributions. Thus, the following AIQM1-specific arguments can be used:

  • QMprog=[program]:

    • Available programs: MNDO [default] and Sparrow [default if MNDO is not found].

    • Chooses a program for calculating the QM part of AIQM1. If neither the MNDO nor the Sparrow program is found, MLatom will not be able to run AIQM1 calculations.

  • mndokeywords=[file with MNDO keywords, e.g., mndokw]

    • Allows modifying the MNDO input to request non-standard calculations, e.g., to define charge, multiplicity, excited-state calculation settings, convergence criteria, etc. These keywords can be provided to MLatom via an MNDO keyword file, e.g., mndokw, which should contain at least the keywords iop=-22 immdp=-1. For more details see the AIQM1 tutorial and MNDO documentation.

Note

Calculations with AIQM1-based models also generate additional output files. If you need properties other than those available via the above arguments, run the calculation on a single XYZ geometry and inspect the MNDO output file mndo.out.

Example

Geometry optimization of closed-shell molecules in the electronic ground state is as simple as running single-point calculations; a 4-line MLatom input file, e.g., opt.inp, looks like this:

AIQM1 # or ANI-1ccx, ANI-2x, etc.
xyzfile=init.xyz
optxyz=opt.xyz
geomopt

This input requires init.xyz file with initial XYZ geometries of molecules to be optimized (you can provide many molecules as usual for MLatom), e.g., for hydrogen and methane init.xyz file can look like (geometries in Å):

2
Hydrogen molecule
H         0.0000000000        0.0000000000        0.0000000000
H         0.7414000000        0.0000000000        0.0000000000
5
Methane molecule
C         0.0000000000        0.0000000000        0.0000000000
H         1.0870000000        0.0000000000        0.0000000000
H        -0.3623333220       -1.0248334322       -0.0000000000
H        -0.3623333220        0.5124167161       -0.8875317869
H        -0.3623333220        0.5124167161        0.8875317869

After you prepared your input files opt.inp and init.xyz, you can run MLatom as usual:

mlatom opt.inp > opt.out

MLatom output file opt.out should contain lines similar to those below (if interface to the Gaussian program was used for optimization):

******************************************************************************
optprog: Gaussian 16

Standard deviation of NN contribution   :      0.00892062 Hartree        5.59777 kcal/mol
NN contribution                         :     -0.00210740 Hartree
Sum of atomic self energies             :     -0.08587317 Hartree
ODM2* contribution                      :     -1.09094281 Hartree
D4 contribution                         :     -0.00000889 Hartree
Total energy                            :     -1.17893227 Hartree

Standard deviation of NN contribution   :      0.00025608 Hartree        0.16069 kcal/mol
NN contribution                         :      0.00958812 Hartree
Sum of atomic self energies             :    -33.60470494 Hartree
ODM2* contribution                      :     -6.86968742 Hartree
D4 contribution                         :     -0.00010193 Hartree
Total energy                            :    -40.46490617 Hartree

==============================================================================
Wall-clock time: 21.60 s (0.36 min, 0.01 hours)

MLatom terminated on 11.11.2022 at 13:21:37
==============================================================================

After the calculations finish, the optimized geometries are saved in a single file opt.xyz, which for our example looks like (geometries in Å, there can be slight numerical differences depending on a machine, etc.):

2

H        0.00770082       0.00000000       0.00000000
H        0.73369918       0.00000000       0.00000000
5

C        0.00000000       0.00000000       0.00000000
H        1.08666998      -0.00000000       0.00000000
H       -0.36222332      -1.02452229      -0.00000000
H       -0.36222332       0.51226114      -0.88726233
H       -0.36222332       0.51226114       0.88726233

Simulations with QM methods

MLatom supports QM calculations with various popular programs.

Available interfaced QM programs

Arguments

  • method=[QM method]

    (required, case insensitive)

    Depending on the available interfaced QM program (see below), supported methods include ab initio, DFT, and semi-empirical QM methods. The following short list shows some standard QM methods recognized by MLatom:

    • usual format such as B3LYP/6-31G*

    • GFN2-xTB (interface to xtb)

    • CCSD(T)*/CBS (interface to ORCA)

    • ODM2 (interface to MNDO)

    • ODM2* (interface to MNDO and Sparrow)

  • qmprog=[supported QM program]

    (required, case insensitive)

    Brief and incomplete examples for each QM program are given below.

  • QMprogramKeywords=[file with keywords of QM program]

    (optional)

    Currently, only xtb and mndo keywords are supported for the respective programs.

  • multiplicities=[multiplicities of molecules]

    (optional)

    Default value is 1. If more than one molecule is provided, use commas to separate the multiplicities, e.g. multiplicities=3,3.

  • charges=[charges of molecules]

    (optional)

    Default value is 0. If more than one molecule is provided, use commas to separate the charges, e.g. charges=1,-1.

  • nthreads=[number of threads used]

    (optional)

    Default value is 1.

Examples of different calculation types

Single-point calculations

When doing simulations with QM methods, the method and qmprog keywords must be included in the input file (except for xTB, which does not require qmprog). The molecular structure file and output file can be defined as usual. See the tutorial about single-point calculations. Generally, the input file sp.inp should look like this:

method=B3LYP/6-31G*
qmprog=gaussian
xyzfile=sp.xyz
yestfile=enest.dat

where the input structure file sp.xyz contains:

5

C          0.00000000      0.00000000      0.00000000
H          0.62783705     -0.62783705      0.62783705
H         -0.62783705      0.62783705      0.62783705
H         -0.62783705     -0.62783705     -0.62783705
H          0.62783705      0.62783705     -0.62783705

After running mlatom sp.inp > sp.out, the output file sp.out will report the calculated single-point energy like this (the energy is also saved in enest.dat):

******************************************************************************

You are going to use feature(s) listed below.
Please cite corresponding work(s) in your paper:

Gaussian program:
See the Gaussian output file for
the proper citation

******************************************************************************

Energy of molecule      1:         -40.5182964000000 Hartree

==============================================================================
Wall-clock time: 1.07 s (0.02 min, 0.00 hours)

MLatom terminated on 08.10.2023 at 10:13:54
==============================================================================

If charges and multiplicities need to be defined, the input file can look like this:

method=B3LYP/6-31G*
qmprog=gaussian
xyzfile=sp.xyz
yestfile=enest.dat
charges=0,0,1
multiplicities=3,3,1

where structures of 3 molecules are included in sp.xyz. Note that charges and multiplicities should be defined in the same order.
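
Since the two comma-separated lists must line up with the molecules in the XYZ file, a quick consistency check before launching a job can save a failed run. The sketch below is an illustration only; the function names are hypothetical and the check is done outside MLatom. It parses the option values from the example above and pairs them per molecule.

```python
def parse_int_list(value):
    """Turn a comma-separated option value such as '0,0,1' into [0, 0, 1]."""
    return [int(item) for item in value.split(",")]


def pair_charges_and_multiplicities(n_molecules, charges, multiplicities):
    """Return [(charge, multiplicity), ...] in molecule order.

    Raises ValueError if either list does not have one entry per molecule.
    """
    charge_list = parse_int_list(charges)
    mult_list = parse_int_list(multiplicities)
    for name, values in (("charges", charge_list), ("multiplicities", mult_list)):
        if len(values) != n_molecules:
            raise ValueError(
                f"{name} has {len(values)} entries, expected {n_molecules}"
            )
    if any(mult < 1 for mult in mult_list):
        raise ValueError("multiplicities must be positive integers")
    return list(zip(charge_list, mult_list))


# Values from the example input: three molecules in sp.xyz.
pairs = pair_charges_and_multiplicities(3, "0,0,1", "3,3,1")
for index, (charge, mult) in enumerate(pairs, start=1):
    print(f"molecule {index}: charge={charge}, multiplicity={mult}")
```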

Frequency calculations

See tutorial about the frequency calculation and thermochemistry. Here is the typical input file:

method=B3LYP/6-31G*
qmprog=gaussian
xyzfile=sp.xyz
charges=0,0,1
multiplicities=3,3,1
freq
freqprog=gaussian # optional (default Gaussian)

Geometry optimization

See tutorial about the geometry optimization. Here is the typical input file:

method=B3LYP/6-31G*
qmprog=gaussian
xyzfile=sp.xyz
optxyz=opt.xyz
charges=0,0,1
multiplicities=3,3,1
geomopt
optprog=gaussian # optional (default Gaussian)

Examples for each QM program

(single point calculation only)

Gaussian

Gaussian is a commonly used QM program that supports various types of calculations at different levels of theory. Each Gaussian job should specify both the method and the basis set (usually separated by /), which are passed through the method keyword in MLatom. For available methods, see https://gaussian.com/capabilities/?tabid=0.

A typical input file of MLatom using Gaussian looks like this:

method=B3LYP/6-31G*
qmprog=gaussian
xyzfile=sp.xyz
yestfile=enest.dat

PySCF

The Python-based Simulations of Chemistry Framework (PySCF) is an open-source Python package that possesses various electronic structure modules. It can be used to simulate the properties of molecules, crystals, and custom Hamiltonians using mean-field and post-mean-field methods. For methods available, see https://pyscf.org/user.html

Here is the list of methods and jobs currently supported via the PySCF interface:

  • Energy: HF, MP2, DFT, CISD, FCI, CCSD/CCSD(T), TD-DFT/TD-HF

  • Gradients: HF, MP2, DFT, CISD, CCSD, RCCSD(T), TD-DFT/TD-HF

  • Hessian: HF, DFT

A typical input file of MLatom using PySCF looks like this:

method=b3lyp/6-31g*
qmprog=pyscf
xyzfile=sp.xyz
yestfile=enest.dat

Orca

Orca is a general-purpose quantum chemistry program with specific emphasis on spectroscopic properties of open-shell molecules. It supports various quantum chemistry methods at different levels of theory. More information can be found on their website.

We also implement the CCSD(T)*/CBS method in the Orca interface. It uses a composite scheme to extrapolate CCSD(T) to the complete basis set limit, which is faster than full CCSD(T)/CBS without sacrificing accuracy. Details of components in CCSD(T)*/CBS can be checked here

A typical input file of MLatom using Orca looks like this:

method=b3lyp/6-31g*
qmprog=orca
xyzfile=sp.xyz
yestfile=enest.dat

For the CCSD(T)*/CBS method, users just need to specify it directly like this:

CCSD(T)*/CBS
xyzfile=sp.xyz
yestfile=enest.dat

xTB

The open-source semiempirical extended tight binding (xTB) program supports calculations with the popular semiempirical quantum mechanical methods GFNn-xTB. Currently, MLatom uses GFN2-xTB as default.

A typical input file of MLatom using GFN2-xTB looks like this:

method=GFN2-xTB
xyzfile=sp.xyz
yestfile=enest.dat
QMprogramKeywords=xtb_kw # optional

The xtb_kw file looks like this (for details, see the xTB command-line options at https://xtb-docs.readthedocs.io/en/latest/commandline.html):

-c 1 -u 3

Note

  • When using xTB, there is no need to specify qmprog=xTB.

  • If GFN-xTB is to be used, please specify --gfn 1 in the keyword file.

MNDO

MNDO is a semiempirical quantum chemistry program that supports semiempirical calculations using orthogonalization corrections. For methods available, please refer to https://mndo.kofo.mpg.de/input.php

A typical input file of MLatom using MNDO looks like this:

method=ODM2
qmprog=mndo
xyzfile=sp.xyz
yestfile=enest.dat
QMprogramKeywords=mndokw # optional

Sparrow

SCINE Sparrow is an open-source command line tool for various semiempirical methods including MNDO-type models and DFTB models. For methods available, please refer to https://scine.ethz.ch/download/sparrow

A typical input file of MLatom using Sparrow looks like this:

method=ODM2*
qmprog=sparrow
xyzfile=sp.xyz
yestfile=enest.dat

Simulations with user-trained models

MLatom can read a user-trained model from a file to make predictions (single-point calculations) with it for new data (given either as input vectors X or as XYZ coordinates) and ultimately to perform the following simulations (for data given in XYZ coordinates):

  • single-point calculations

  • geometry optimizations

  • frequencies and thermochemistry

  • molecular dynamics

The models can be either native MLatom models or models from third-party interfaces to popular ML model types.

Arguments

Calculations with native implementations do not require additional arguments, while the use of third-party models requires the specification of a model type and/or the name of a third-party program, i.e., the user should provide the MLmodelType and/or MLprog argument (see also installation instructions). They can be used for the typical simulations listed above (see the corresponding sections for more details).

  • MLmodelType=[supported ML model type]

    +-------------+----------------+
    | MLmodelType | default MLprog |
    +-------------+----------------+
    | KREG        | MLatomF        |
    +-------------+----------------+
    | sGDML       | sGDML          |
    +-------------+----------------+
    | GAP-SOAP    | GAP            |
    +-------------+----------------+
    | PhysNet     | PhysNet        |
    +-------------+----------------+
    | DeepPot-SE  | DeePMD-kit     |
    +-------------+----------------+
    | ANI         | TorchANI       |
    +-------------+----------------+
    
  • MLprog=[supported ML program]

    Supported interfaces with default and tested ML model types:

    +------------+----------------------+
    | MLprog     | MLmodelType          |
    +------------+----------------------+
    | MLatomF    | KREG [default]       |
    |            | see                  |
    |            | MLatom.py KRR help   |
    +------------+----------------------+
    | sGDML      | sGDML [default]      |
    |            | GDML                 |
    +------------+----------------------+
    | GAP        | GAP-SOAP             |
    +------------+----------------------+
    | PhysNet    | PhysNet              |
    +------------+----------------------+
    | DeePMD-kit | DeepPot-SE [default] |
    |            | DPMD                 |
    +------------+----------------------+
    | TorchANI   | ANI [default]        |
    +------------+----------------------+
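
The first table above is effectively a lookup from model type to default program. As a small illustration (not MLatom code; the dictionary and function names are hypothetical, and exact-match lookup is an assumption of this sketch rather than documented behavior), the mapping can be written as:

```python
# Default MLprog for each MLmodelType, transcribed from the table above.
DEFAULT_MLPROG = {
    "KREG": "MLatomF",
    "sGDML": "sGDML",
    "GAP-SOAP": "GAP",
    "PhysNet": "PhysNet",
    "DeepPot-SE": "DeePMD-kit",
    "ANI": "TorchANI",
}


def default_mlprog(model_type):
    """Return the default program for a model type listed in the table."""
    try:
        return DEFAULT_MLPROG[model_type]
    except KeyError:
        known = ", ".join(sorted(DEFAULT_MLPROG))
        raise ValueError(
            f"unknown MLmodelType {model_type!r}; listed types: {known}"
        ) from None


for model_type in ("KREG", "ANI", "DeepPot-SE"):
    print(f"MLmodelType={model_type} -> MLprog={default_mlprog(model_type)}")
```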
    

Note

Calculations with third-party programs may also generate additional output files.

Example

Below is an example input file showing how to use a KREG model to optimize geometry (see tutorial):

geomopt                # Request geometry optimization
useMLmodel             # using existing ML model
MLmodelIn=energies.unf # in energies.unf file
MLmodelType=KREG       # of the KREG type
xyzfile=eq.xyz         # The file with initial guess

Quantum dynamics with machine learning

MLatom can perform quantum dissipative dynamics with a range of machine-learning methods via an interface to the MLQD program. Supported methods from the program’s website:

Input and output arguments

  • QDmodel=[createQDmodel or useQDmodel] (not optional)

    • default option is useQDmodel

    • requests MLQD to create or use QD model

  • QDmodelIn=[user-provided model file]

    • Not optional if QDmodel=useQDmodel. Passes the name of the file with the trained model.

  • QDmodelOut=[user-defined name of created model] (optional)

    • You can pass it if QDmodel=createQDmodel, and MLQD will save the trained model with this name. It is optional: if you don’t pass it, MLQD will choose a random name.

  • QDmodelType=[KRR or AIQD or OSTL]

    • default option is OSTL

    • It tells MLQD what type of QD model to use

  • systemType=[SB or FMO] (not optional)

    • no default option

    • It tells MLQD the type of the system

  • QDtrajOut=[file name for the output trajectory]

    • You can pass it if QDmodel=useQDmodel, and MLQD will save the predicted dynamics with this name. It is optional: if you don’t pass it, MLQD will choose a random name.

  • prepInput=[True or False]

    • default is False. Case sensitive

    • Prepare input files X and Y from the data

  • hyperParam=[True or False]

    • default is False. Case sensitive

    • Optimize the hyper parameters of the model

  • patience=[integer non-negative number]

    • Default value is 10

    • Patience for early stopping in CNN training

  • epochs=[integer non-negative number]

    • Default value is 100

    • Number of epochs for training and optimization of CNN model [OSTL and AIQD methods]

  • max_evals=[integer non-negative number]

    • Default value is 100

    • Number of maximum evaluations in hyperopt optimization of CNN model [OSTL and AIQD methods]

  • XfileIn=[name of X file]

    • Default is x_data if QDmodel=createQDmodel and prepInput=True

    • In the case of QDmodel=createQDmodel, it is optional. It passes the name of the X file: the X file is saved with this name if prepInput=True and read from this name if prepInput=False. However, if QDmodel=useQDmodel and QDmodelType=KRR, it is not optional: you need to pass the input short-time trajectory.

  • YfileIn=[name of Y file]

    • Default is y_data if QDmodel = createQDmodel and prepInput=True

    • In the case of QDmodel=createQDmodel, it is optional. It passes the name of the Y file: the Y file is saved with this name if prepInput=True and read from this name if prepInput=False.

  • dataPath=[absolute or relative path with data]

    • In the case of QDmodel=createQDmodel and prepInput=True, the path to the data needs to be passed so that MLQD can prepare the X and Y files. Note that the data should be in the same format as in our data set QD3SET-1 (to be published), especially when QDmodelType=OSTL or AIQD.

  • n_states=[number of states or sites, integer]

    • Default is 2 for SB and 7 for FMO

    • Number of states (SB) or sites (FMO)

  • initState=[number of initial site]

    • Default value is 1 (initial excitation is on site 1)

    • It represents initial site in FMO complex. Only required when we propagate dynamics with OSTL or AIQD method

  • time=[propagation time]

    • Default is 20 for SB and 50 for FMO

    • Propagation time in picoseconds (ps) for FMO complex and in atomic units (a.u.) for spin-boson model

  • time_step=[time step of propagation]

    • Default is 0.05 for SB and 0.005 for FMO

    • time step of propagation

  • energyDiff=[energy difference]

    • Default value is 1.0

    • Energy difference between the states in the case of SB, needed only when QDmodelType=OSTL or AIQD

  • Delta=[tunneling matrix element]

    • Default value is 1.0

    • The tunneling matrix element in the case of SB, needed only when QDmodelType = OSTL or AIQD

  • gamma=[characteristic frequency]

    • Default value is 10 in the case of SB and 500 in the case of FMO

    • Characteristic frequency. In cm^-1 for FMO and in (a.u.) for SB, and needed only when QDmodelType=OSTL or AIQD

  • lamb=[system-bath coupling strength]

    • Default value is 1.0 in the case of SB and 520 in the case of FMO

    • System-bath coupling strength. In cm^-1 for FMO and in (a.u.) for SB, and needed only when QDmodelType=OSTL or AIQD

  • temp=[temperature]

    • Default value is 1.0 in the case of SB and 510 in the case of FMO

    • Temperature (K) in the case of the FMO complex and inverse temperature in the case of SB; needed only when QDmodelType=OSTL or AIQD

  • energyNorm=[normalizer]

    • Default value is 1.0

    • Normalizer for the energy difference between the states in the case of SB

  • DeltaNorm=[normalizer]

    • Default value is 1.0

    • Normalizer for the tunneling matrix element in the case of SB

  • gammaNorm=[normalizer]

    • Default value is 10 in the case of SB and 500 in the case of FMO

    • Normalizer for characteristic frequency

  • lambNorm=[normalizer]

    • Default value is 1.0 in the case of SB and 520 in the case of FMO

    • Normalizer for system-bath coupling strength

  • tempNorm=[normalizer]

    • Default value is 1.0 in the case of SB and 510 in the case of FMO

    • Normalizer for temperature in the case of FMO and for inverse temperature in the case of SB

  • numLogf=[number of logistic functions]

    • Default value is 1

    • Number of logistic functions normalizing the dimension of time

  • LogCa=[coefficient]

    • Default value is 1.0

    • Coefficient “a” in the logistic function

  • LogCb=[coefficient]

    • Default value is 15.0

    • Coefficient “b” in the logistic function

  • LogCc=[coefficient]

    • Default value is -1.0

    • Coefficient “c” in the logistic function

  • LogCd=[coefficient]

    • Default value is 1.0

    • Coefficient “d” in the logistic function

  • dataCol=[column number]

    • Default value is 1

    • When QDmodelType=KRR, it only works for single output values. If there are multiple columns in your data files, you need to specify which column to use.

  • dtype=[real or imag]

    • Default is real

    • When you pass the column with dataCol and your data is complex, you need to specify which part of the complex data MLQD should use, real or imaginary.

  • xlength=[number of time steps in the short seed trajectory]

    • Default value is 81

    • Length of the input short trajectory. It is the number of time steps in the data you passed with dataCol

  • refTraj

    • MLQD has the option to plot the predicted dynamics against the reference trajectory. It is optional: if a reference trajectory is provided, MLQD will plot it; otherwise it will not.

  • xlim=[xaxis limit]

    • Default option is equal to the propagation time

    • The user can define xaxis limit for plotting

  • pltNstates=[number of states to be plotted]

    • Default option is to plot all states

    • Users can define how many states should be plotted by MLQD
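
The time-related arguments above determine how long the propagated trajectory is. The short sketch below works through the arithmetic for the documented defaults. It rests on two assumptions that are not stated in the argument list: that the number of propagation steps is time divided by time_step, and that xlength counts points including the initial one, so that xlength points span xlength - 1 steps. The function names are made up for this illustration.

```python
import math

# Defaults from the argument list: (time, time_step) per system type.
# SB values are in atomic units, FMO values in picoseconds.
DEFAULTS = {
    "SB": (20, 0.05),
    "FMO": (50, 0.005),
}
DEFAULT_XLENGTH = 81  # default length of the short seed trajectory


def n_steps(time, time_step):
    """Number of propagation steps, assuming steps = time / time_step."""
    return round(time / time_step)


def seed_duration(xlength, time_step):
    """Time covered by the seed, assuming xlength points span xlength - 1 steps."""
    return (xlength - 1) * time_step


for system, (time, time_step) in DEFAULTS.items():
    steps = n_steps(time, time_step)
    seed = seed_duration(DEFAULT_XLENGTH, time_step)
    print(f"{system}: {steps} steps, seed trajectory covers {seed:g} time units")
```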

Examples

These are just very brief examples; please see our detailed tutorial.

Training a KRR model

In the case of the spin-boson model, we have provided 20 trajectories from our QD3SET-1 database for demonstration. MLQD will use them automatically if you don’t pass a data path.

MLQD
QDmodel=createQDmodel
QDmodelType=KRR
prepInput=True
dataCol=1
dtype=real
xlength=81
systemType=SB
QDmodelOut=KRR_SB_model

Propagation of dynamics with the trained KRR model

We are providing a short input trajectory saved as state_1_pop.txt:

MLQD
time=20
time_step=0.05
QDmodel=useQDmodel
QDmodelType=KRR
XfileIn=state_1_pop.txt
systemType=SB
QDmodelIn=KRR_SB_model
QDtrajOut=KRR_trajectory

The reference trajectory for comparison: 2_epsilon-0.0_Delta-1.0_lambda-0.1_gamma-4.0_beta-1.0.npy

Training an AIQD model

MLQD
n_states=2
time=20
time_step=0.05
QDmodel=createQDmodel
QDmodelType=AIQD
prepInput=True
numLogf=10
LogCa=1.0
LogCb=15.0
LogCc=-1.0
LogCd=1.0
energyNorm=1.0
DeltaNorm=1.0
gammaNorm=10
lambNorm=1.0
tempNorm=1.0
systemType=SB
hyperParam=True
patience=10
epochs=10
max_evals=10
QDmodelOut=AIQD_SB_model

Propagation of dynamics with the trained AIQD model

We just pass the parameters and the trained AIQD model should be able to predict the corresponding dynamics

MLQD
n_states=2
time=20
time_step=0.05
energyDiff=1.0
Delta=1.0
gamma=4.0
lamb=0.1
temp=1.0
QDmodel=useQDmodel
QDmodelType=AIQD
energyNorm=1.0
DeltaNorm=1.0
gammaNorm=10
lambNorm=1.0
tempNorm=1.0
numLogf=10
systemType=SB
QDmodelIn=AIQD_SB_model.hdf5
QDtrajOut=Qd_trajectory

Training an OSTL model

MLQD
n_states=2
QDmodel=createQDmodel
QDmodelType=OSTL
prepInput=True
energyNorm=1.0
DeltaNorm=1.0
gammaNorm=10
lambNorm=1.0
tempNorm=1.0
systemType=SB
hyperParam=True
patience=10
epochs=10
max_evals=10
QDmodelOut=OSTL_SB_model

Propagation of dynamics with the trained OSTL model

We just pass the parameters and the trained OSTL model should be able to predict the corresponding dynamics in one shot

MLQD
n_states=2
time=20
time_step=0.05
energyDiff=1.0
Delta=1.0
gamma=4.0
lamb=0.1
temp=1.0
QDmodel=useQDmodel
QDmodelType=OSTL
energyNorm=1.0
DeltaNorm=1.0
gammaNorm=10
lambNorm=1.0
tempNorm=1.0
systemType=SB
QDmodelIn=OSTL_SB_model.hdf5
QDtrajOut=Qd_trajectory
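
The MLQD inputs above are plain lists of a flag followed by key=value lines, so they are easy to generate from a script when many parameter combinations have to be propagated. The following is a sketch under that assumption; build_input is a hypothetical helper and not part of MLatom or MLQD. It reproduces the OSTL propagation input shown above from a Python dictionary.

```python
def build_input(flags, options):
    """Assemble input text: bare flags first, then key=value lines in order."""
    lines = list(flags)
    lines.extend(f"{key}={value}" for key, value in options.items())
    return "\n".join(lines) + "\n"


# Keywords and values copied from the OSTL propagation example above.
ostl_options = {
    "n_states": 2,
    "time": 20,
    "time_step": 0.05,
    "energyDiff": 1.0,
    "Delta": 1.0,
    "gamma": 4.0,
    "lamb": 0.1,
    "temp": 1.0,
    "QDmodel": "useQDmodel",
    "QDmodelType": "OSTL",
    "energyNorm": 1.0,
    "DeltaNorm": 1.0,
    "gammaNorm": 10,
    "lambNorm": 1.0,
    "tempNorm": 1.0,
    "systemType": "SB",
    "QDmodelIn": "OSTL_SB_model.hdf5",
    "QDtrajOut": "Qd_trajectory",
}

input_text = build_input(["MLQD"], ostl_options)
print(input_text, end="")
```

Writing input_text to a file and passing that file to mlatom would then correspond to the manual example; changing entries such as gamma or lamb in the dictionary gives inputs for other parameter sets.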

UV/vis spectra

UV/vis spectra (cross-sections) can be calculated with ML-Nuclear Ensemble Approach (ML-NEA). Detailed tutorial 1 and tutorial 2 are available.

For full functionality, Newton-X (tested with version 2.2) and Gaussian should be installed (see installation instructions, including setting appropriate environment variables like $NX and $GAUSS_EXEDIR). Neither Newton-X nor Gaussian is available on the MLatom@XACS cloud.

optional arguments:

Nexcitations=N

number of excited states to calculate.

default=3

nQMpoints=N

user-defined number of QM calculations for training ML.

default=0, number of QM calculations will be determined iteratively

plotQCNEA

requests plotting QC-NEA cross section

deltaQCNEA=float

define the broadening parameter of QC-NEA cross section

plotQCSPC

requests plotting cross section obtained via single point convolution

required files:

  1. mandatory file
    • gaussian_optfreq.com input file for Gaussian opt and freq calculations. Alternatively, files eq.xyz (XYZ file with the equilibrium, i.e. optimized, geometry) and nea_geoms.xyz (file with all geometries in the nuclear ensemble) can be provided.

    • gaussian_ef.com template file for calculating excitation energies and oscillator strengths with Gaussian.

  2. optional file
    • cross-section_ref.dat reference cross section file in a format similar to that of Newton-X (1st column: DE/eV; 2nd column: lambda/nm; 3rd column: sigma/Å^2)

    • eq.xyz file with optimized geometry (has to be used together with nea_geoms.xyz)

    • nea_geoms.xyz file with all geometries in nuclear ensemble (has to be used together with eq.xyz)

    • E1.dat  E2.dat ... and f1.dat  f2.dat ... files that store the excitation energies and oscillator strengths, one value per line, in the same order as the geometries in nea_geoms.xyz.

output files:

  • cross-section/cross-section_ml-nea.dat: cross-section spectra calculated with ML-NEA method

  • cross-section/cross-section_qc-nea.dat: cross-section spectra calculated with QC-NEA method

  • cross-section/cross-section_spc.dat: cross-section spectra calculated with single-point-convolution

  • cross-section/plot.png: the plot that contains the cross sections calculated with the different methods.
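
The reference cross-section file lists both the excitation energy in eV and the wavelength in nm, which are related by lambda = hc/E. The sketch below illustrates this conversion and a simple three-column reader. It assumes whitespace-separated columns in the order described for cross-section_ref.dat; the demo rows are made-up numbers generated in the script for illustration, not MLatom or Newton-X output, and the function names are hypothetical.

```python
# h*c expressed in eV*nm, used to convert photon energy to wavelength.
HC_EV_NM = 1239.841984


def ev_to_nm(energy_ev):
    """Wavelength in nm for a photon energy in eV."""
    return HC_EV_NM / energy_ev


def parse_cross_section(text):
    """Parse rows of 'energy_eV wavelength_nm sigma' into float tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        rows.append(tuple(float(field) for field in fields))
    return rows


# Made-up demo rows (energy in eV, sigma values are arbitrary placeholders).
demo_energies_and_sigmas = [(4.0, 0.01), (5.0, 0.05), (6.0, 0.02)]
demo_text = "\n".join(
    f"{energy:.3f} {ev_to_nm(energy):.3f} {sigma:.5f}"
    for energy, sigma in demo_energies_and_sigmas
)

rows = parse_cross_section(demo_text)
peak_energy, peak_wavelength, peak_sigma = max(rows, key=lambda row: row[2])
print(f"largest sigma at {peak_energy:.2f} eV ({peak_wavelength:.1f} nm)")
```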

Two-photon absorption cross sections

This simulation type is performed as described in this publication. It is currently only available on the MLatom@XACS cloud and will be released soon. See the original source code on GitHub.

To run ML-TPA calculations locally, the following packages have to be installed:

  • python >= 3.7

  • scikit-learn<1.0.0

  • xgboost>=1.5.0

  • rdkit>=2022.03.3

  • numpy>=1.21.1

  • pandas>=1.0.1

After a proper Python environment is built (a conda environment is recommended), install the packages, i.e., run:

pip install pandas
pip install numpy==1.22
pip install scikit-learn==0.24.2
pip install xgboost==1.5
pip install rdkit
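
After installation, the version constraints listed above can be checked from Python. The following sketch uses importlib.metadata from the standard library (available from Python 3.8, so slightly newer than the minimum Python version listed above). The simplified version parsing and the helper names are assumptions of this illustration; it compares only the first three numeric components of each version.

```python
import re
from importlib import metadata

# Constraints transcribed from the package list above: name -> (operator, bound).
REQUIREMENTS = {
    "scikit-learn": ("<", (1, 0, 0)),
    "xgboost": (">=", (1, 5, 0)),
    "rdkit": (">=", (2022, 3, 3)),
    "numpy": (">=", (1, 21, 1)),
    "pandas": (">=", (1, 0, 1)),
}


def parse_version(text):
    """Convert '0.24.2' or '2022.03.3' into a tuple of three integers."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def satisfies(version_text, operator, bound):
    """Check a version string against a '<' or '>=' bound."""
    version = parse_version(version_text)
    if operator == "<":
        return version < bound
    if operator == ">=":
        return version >= bound
    raise ValueError(f"unsupported operator {operator!r}")


for name, (operator, bound) in REQUIREMENTS.items():
    requirement = f"{operator}{'.'.join(str(part) for part in bound)}"
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed (need {requirement})")
        continue
    status = "ok" if satisfies(installed, operator, bound) else "version mismatch"
    print(f"{name} {installed}: {status} (need {requirement})")
```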

Input and output arguments

  • MLTPA

    • required.

    • requests calculation of the two-photon absorption (TPA) cross section for a spectrum or a given wavelength.

  • SMILESfile=[file with SMILES]

    • this argument is required; no default file name.

    • file with SMILES of one or many molecules.

Output contains the comma-separated predicted ML-TPA cross section values or spectra in units of GM for each wavelength. Output files tpa[molecular index as in SMILESfile].txt are saved in a folder tpa[absolute time] in current path.

Additional options for wavelength and solvent

auxfile=[file with the information of wavelength and Et30 in the format of 'wavelength_lowbound,wavelength_upbound,Et30'] (wavelength in nm): If the auxiliary file does not exist, the default value of Et30 will be 33.9 (toluene) and the whole spectrum between 600 and 1100 nm will be provided. The entries (lines) should be provided in the same order as in SMILESfile. See the list with the solvents and their Et30 values.

Example

Here we show how to calculate TPA cross section for RHODAMINE 6G and RHODAMINE 123 molecules with MLatom input file mltpa.inp:

MLTPA
SMILESfile=Smiles.csv
auxfile=_aux.txt

This input requires Smiles.csv file with SMILES of molecules:

CCNC1=CC2=C(C=C1C)C(=C3C=C(C(=[NH+]CC)C=C3O2)C)C4=CC=CC=C4C(=O)OCC.[Cl-]
COC(=O)C1=CC=CC=C1C2=C3C=CC(=N)C=C3OC4=C2C=CC(=C4)N.Cl

and optional _aux.txt:

600,850,55.4
600,600,33.9
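
Each line of the auxiliary file holds the lower wavelength bound, the upper wavelength bound, and the Et30 value for the molecule on the same line of the SMILES file. The sketch below parses the two example lines and lists the wavelengths each request covers. The 10 nm spacing is inferred from the example output in this section and is an assumption here, as are the helper names; this is not MLatom code.

```python
# Contents of _aux.txt from the example above.
AUX_TEXT = """\
600,850,55.4
600,600,33.9
"""


def parse_aux(text):
    """Return a list of (low_nm, up_nm, et30) tuples, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        low, up, et30 = (float(field) for field in line.split(","))
        if low > up:
            raise ValueError(f"lower bound exceeds upper bound in line: {line}")
        entries.append((low, up, et30))
    return entries


def wavelength_grid(low, up, step=10.0):
    """Wavelengths from low to up inclusive, assuming a fixed step in nm."""
    n_intervals = int(round((up - low) / step))
    return [low + index * step for index in range(n_intervals + 1)]


entries = parse_aux(AUX_TEXT)
for index, (low, up, et30) in enumerate(entries, start=1):
    grid = wavelength_grid(low, up)
    print(f"molecule {index}: Et30={et30}, {len(grid)} wavelength(s), "
          f"{grid[0]:.0f}-{grid[-1]:.0f} nm")
```

In this example the first molecule is requested as a spectrum from 600 to 850 nm, while the second is requested at the single wavelength 600 nm.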

After you have prepared your input files mltpa.inp, Smiles.csv, and _aux.txt, you can run MLatom as usual:

mlatom mltpa.inp > mltpa.out

After the calculations finish, the predicted TPA cross section values are saved in a folder named tpa[absolute time]. In the folder, there are two files for the two molecules: tpa1.txt and tpa2.txt. For our example, they look like:

wavelength,predicted_sigma (GM)
600.0,285.19455
610.0,297.71707
620.0,284.11694
......
810.0,121.51988
820.0,116.537994
830.0,118.04909
840.0,103.65925
850.0,113.72374

wavelength,predicted_sigma (GM)
600.0,138.2346