.. _tutorial_freq: Frequencies and thermochemistry ========================================== In this tutorial we show how to calculate frequencies of molecules with MLatom. We will basically continue from :ref:`geometry optimization ` tutorial, as it is a common practice to first optimize geometry and then calculate its frequency, i.e., to check whether it is a true minimum or transition state (number of imaginary frequencies) and obtain ZPVE and other thermochemical corrections. .. warning:: We recommend to use :ref:`UAIQM ` methods for these calculations, example below is just to show that you can use other methods too. Here is a video of frequency and thermochemistry calculations using XACS cloud and Python API: .. raw:: html We start with the same example as that in the beginning of geometry optimization tutorial. Geometry optimization is typically needed before calculating frequencies and the same method or model should be used for both. After getting the optimized geometry (:download:`opt.xyz `) as described in the :ref:`tutorial on geometry optimization `, you can do frequency calculations with input file/command line options :download:`freq.inp ` shown below (see the :ref:`separate section ` on calculating frequencies via Python API): .. code-block:: freq # 1. requests frequency calculation ANI-1ccx # 2. pre-trained model, the same as that used in geometry optimization xyzfile=opt.xyz # 3. file with optimized geometry Similar to geometry optimization, the user needs to provide the model or method, which can be QM, ML or their combinations. Analytical Hessian is used when it is available, otherwise semianalytical (calculated with analytical gradients) or fully numerical Hessian (calculated with only energies) can be used. ``xyzfile=[file name]`` tells MLatom in what file it can find the geometry. The user can directly use the optimized geometry output by MLatom (file defined by ``optxyz`` keyword), or provide it in XYZ format, e.g., for this example (in Angstrom, e.g., :download:`opt.xyz `) which would look like: .. code-block:: 9 C -1.2121068029534 -0.2252488537660 0.0000164117920 H -1.2701429633537 -0.8599818439347 -0.8855315546559 H -1.2701264459840 -0.8599834499240 0.8855643352209 H -2.0710636368970 0.4504289041619 0.0000252076803 C 0.0804149514179 0.5556635798575 0.0000041512523 H 0.1395635927681 1.1970970820212 -0.8856250576131 H 0.1395848675981 1.1970854779804 0.8856411245403 O 1.1428329343502 -0.3971737931938 -0.0000106635278 H 1.9796722202817 0.0702558186994 -0.0001121252171 After you created the input file, you can run MLatom, e.g., on XACS cloud, as: .. code-block:: bash mlatom freq.inp &> freq.out The program output will be saved in file ``freq.out``. Alternatively, you can run the same simulation in the command line with the same options: .. code-block:: bash mlatom freq ANI-1ccx xyzfile=opt.xyz You should be able to find in the MLatom output the vibration analysis and thermochemistry results. The vibration analysis prints out frequency, reduced mass and force constant of each normal mode. The thermochemistry part prints out zero-point energy, enthalpy, Gibbs free energy, heat of formation and other properties. For our example, the part of the output would look like: .. _ethanol_freq_output: .. code-block:: ============================================================================== Vibration analysis for molecule 1 ============================================================================== Multiplicity: 1 Rotational symmetry number: 1 This is a nonlinear molecule Mode Frequencies Reduced masses Force Constants (cm^-1) (AMU) (mDyne/A) 1 261.0407 1.0929 0.0439 2 286.9474 1.1295 0.0548 3 428.0328 2.7260 0.2943 4 841.1644 1.0947 0.4564 5 928.2688 2.3346 1.1853 6 1082.6338 3.1779 2.1946 7 1138.6399 1.9583 1.4959 8 1212.5257 1.5518 1.3442 9 1287.5002 1.1134 1.0874 10 1292.8263 1.0682 1.0520 11 1426.7497 1.2307 1.4760 12 1457.4408 1.3203 1.6524 13 1496.9454 1.1496 1.5178 14 1501.2654 1.0458 1.3887 15 1524.8936 1.0529 1.4426 16 3063.7060 1.0530 5.8232 17 3076.9020 1.1120 6.2026 18 3093.2039 1.0342 5.8299 19 3201.4189 1.1000 6.6424 20 3213.8757 1.1004 6.6964 21 3741.4822 1.0681 8.8098 ============================================================================== Thermochemistry for molecule 1 ============================================================================== Standard deviation of NNs : 0.00063190 Hartree 0.39652 kcal/mol Energy : -154.89195912 Hartree ZPE-exclusive internal energy at 0 K: -154.89196 Hartree Zero-point vibrational energy : 0.08101 Hartree Internal energy at 0 K: -154.81095 Hartree Enthalpy at 298 K: -154.80576 Hartree Gibbs free energy at 298 K: -154.83631 Hartree Atomization enthalpy at 0 K: 1.21200 Hartree 760.54458 kcal/mol ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37653 kcal/mol Heat of formation at 298 K: -0.09030 Hartree -56.66187 kcal/mol .. note:: MLatom will dump molecule objects in json files named ``freq[molecule number].json`` for each molecule containing information from the frequency calculations such as normal modes, frequencies, thermochemistry data. .. _tutorial_freq_visualize: .. include:: tutorial_vibrview.inc Choosing the method or model ---------------------------- The user can benefit from MLatom's many choices of the methods or models. Many methods such as ANI-1ccx or AIQM1 are recognized automatically by MLatom as shown above. We also show examples of using QM method (B3LYP/6-31G*) and user-trained ML model (ANI model). The general way to define the QM method is to use ``method`` and the corresponding QM program providing its implementations (MLatom uses interfaces to such programs). For example, the input file of using B3LYP/6-31G*: .. code-block:: freq # 1. requests frequency calculation method=B3LYP/6-31G* # 2. requests running DFT calculation with B3LYP qmprog=PySCF # 3. requests using PySCF for B3LYP calculations; qmprog=Gaussian can also be used xyzfile=opt.xyz # 4. file with optimized geometry The general way to request the frequency calculations with the user-trained ML model is to specify the ML model type with ``MLmodelType`` and ``MLmodelIn`` keyword. For example, the input file of using ANI model: .. code-block:: freq # 1. requests frequency calculation MLmodelType=ANI # 2. request optimization with the ANI-type of machine learning potential MLmodelIn=ani.npz # 3. the file with the trained model should be provided, here it is ani.npz file xyzfile=opt.xyz # 4. file with optimized geometry The advanced user-defined hybrid models such as delta-learning models can be used for frequency calculation only via Python API as :ref:`demonstrated separately `. .. note:: In principle, our default recomendation is to use AIQM1 as it is usually the most accurate method for affordable cost. However, it is currently only applicable to the compounds with CHNO elements. Another limitation is that to unlock its full potential, we recommend to install the MNDO program for its QM part. The XACS cloud uses Sparrow instead, which currently only provides numerical gradients and Hessians severly limiting its applicability for frequency calculations. ANI-1ccx is hence a good alternative on the XACS cloud if it can be applied (it has additional limitations to neutral closed-shell compounds in ground state). Also, currently, among ML models, Hessians are currently available only for ANI-type models. Calculations of many molecules -------------------------------- MLatom can also perform frequency calculations of many molecules like geometry optimization. MLatom will output the vibration analysis and thermochemical properties of each molecule. To do these calculations, you simply need to prepare the XYZ file with the geometries of multiple molecules (or get this file from MLatom's previous geometry optimization calculations), e.g., for the hydrogen and methane molecules your ``opt.xyz`` file may look like: .. code-block:: bash 2 1 0.0000000000 0.0000000000 0.0000000000 1 0.7414000000 0.0000000000 0.0000000000 5 C 0.0000000000 0.0000000000 0.0000000000 H 1.0870000000 0.0000000000 0.0000000000 H -0.3623333220 -1.0248334322 -0.0000000000 H -0.3623333220 0.5124167161 -0.8875317869 H -0.3623333220 0.5124167161 0.8875317869 Charge and multiplicity ----------------------- A quick reminder on how to define the charges and multiplicities of a single or multiple molecules. The input might looks something like: .. code-block:: freq # 1. requests frequency calculation AIQM1 # 2. AI-enhanced quantum mechanical method 1 xyzfile=opt.xyz # 3. file with optimized geometries charges=0,1 # 4. charge of the first molecule is 0, of the second is 1. multiplicities=1,2 # 5. multiplicity of the first molecule is 1, second - 2 .. note:: It does not make sense to define charges and multiplicities for models like ANI-1ccx which were trained on the neutral closed-shell species. Choosing the program --------------------- MLatom provides several ways (programs) to calculate frequencies and thermochemical properties, similarly to geometry optimizations, the user can choose the program with the keyword ``freqprog``. Three options are available: ``freqprog=Gaussian``, ``freqprog=PySCF`` and ``freqprog=ASE`` for using the respective programs. These are actually different tasks and while they are split in Python API (:ref:`see below `), in input file, when ``freq`` is requested, both tasks are performed. ``freqprog=Gaussian`` will perform both tasks with Gaussian and will generate input and output files which the user can modify and inspect. Gaussian's interface is the most time-tested and hence, robust, but not available on the XACS cloud and needs to be installed by the user locally. ``freqprog=PySCF`` will use PySCF to generate thermochemical properties which is quite robust and also available on the XACS cloud since PySCF is an open-source Python package. If ``freqprog=ASE`` is requested, the ASE is used to calculate thermochemical properties following the frequencies calculations performed either with the improved code based on TorchANI (modified to correctly capture the negative frequency in transition states). Frequencies given by modified TorchANI's code was also tested by us and available on the XACS cloud. If no ``freqprog`` is provided, by default, MLatom will check whether Gaussian is available and use it, then PySCF and ASE & modified TorchANI combination. Whatever program you chhose, it is worth checking the instructions below for pitfalls and tips. ASE ++++ For MLatom version older than 3.1.1, the user should tell ASE whether molecules are linear by using ``ase.linear`` keyword. If the molecule is linear, you need to set this value as 1, otherwise it is by default 0. If your input XYZ file has more than 1 molecule, ``ase.linear=1`` makes MLatom assume that they are all linear molecules. You can specify the linearity by, e.g., ``ase.linear=1,0``. For version newer than 3.1.1 (including), MLatom can automatically detect the linearity of molecules so you don't have to specify explicitly. But you can still do that which will override MLatom's judgement. It is also important to specify the symmetry of your molecules by using keyword ``ase.symmetrynumber``. The default one in MLatom corresponds to :math:`C_1` point group. You can get the symmetry number by referring to the table below, taken from C. Cramer "Essentials of Computational Chemistry", 2nd Ed. ================================ ==================== Point Group Symmetry number ================================ ==================== :math:`C_1` :math:`1` :math:`C_i` :math:`1` :math:`C_s` :math:`1` :math:`C_{{\infty}v}` :math:`1` :math:`D_{{\infty}h}` :math:`2` :math:`S_n,n=2,4,6,...` :math:`n/2` :math:`C_n,n=2,3,4,...` :math:`n` :math:`C_{nh},n=2,3,4,...` :math:`n` :math:`C_{nv},n=2,3,4,...` :math:`n` :math:`D_n,n=2,3,4,...` :math:`2n` :math:`D_{nh},n=2,3,4,...` :math:`2n` :math:`D_{nd},n=2,3,4,...` :math:`2n` :math:`T` :math:`12` :math:`T_d` :math:`12` :math:`O_h` :math:`24` :math:`I_h` :math:`60` ================================ ==================== For example, if your input XYZ file has hydrogen (belongs to :math:`D_{{\infty}h}` group) and methane (belongs to :math:`T_d` group), the input file should look like this: .. code-block:: freq ANI-1ccx xyzfile=opt.xyz freqprog=ase ase.linear=1,0 ase.symmetrynumber=2,12 Gaussian +++++++++ When Gaussian is used, you will see the ``gaussian.log`` file which is similar to the common Gaussian output file with the difference, that it will use the MLatom's model to predict Hessian to perform the frequency calculations. Then you can use your favorite tools to analyze this output as usual for Gaussian, which is a big advantage for the Gaussian users. Here is the input file of using Gaussian to calculate frequencies. .. code-block:: freq ANI-1ccx xyzfile=opt.xyz freqprog=Gaussian For above example, you can check the MLatom's output file: .. code-block:: ============================================================================== Vibration analysis for molecule 1 ============================================================================== Multiplicity: 1 This is a nonlinear molecule Mode Frequencies Reduced masses Force Constants (cm^-1) (AMU) (mDyne/A) 1 264.1253 1.0872 0.0447 2 288.7816 1.1305 0.0555 3 429.1248 2.7295 0.2961 4 840.9571 1.0959 0.4566 5 929.0118 2.3254 1.1825 6 1083.2419 3.1763 2.1960 7 1138.7381 1.9666 1.5025 8 1211.7614 1.5471 1.3384 9 1291.9096 1.0694 1.0516 10 1299.4679 1.1131 1.1074 11 1434.7836 1.2319 1.4941 12 1464.0813 1.3257 1.6743 13 1497.4787 1.1427 1.5097 14 1502.1301 1.0457 1.3902 15 1525.4126 1.0539 1.4449 16 3045.3746 1.0531 5.7544 17 3061.9747 1.1130 6.1482 18 3092.9313 1.0343 5.8295 19 3197.6062 1.0999 6.6261 20 3213.0380 1.1000 6.6905 21 3741.0501 1.0681 8.8073 ============================================================================== Thermochemistry for molecule 1 ============================================================================== Standard deviation of NNs : 0.00062817 Hartree 0.39419 kcal/mol Energy : -154.89196013 Hartree ZPE-exclusive internal energy at 0 K: -154.89196 Hartree Zero-point vibrational energy : 0.08100 Hartree Internal energy at 0 K: -154.81096 Hartree Enthalpy at 298 K: -154.80578 Hartree Gibbs free energy at 298 K: -154.83631 Hartree Atomization enthalpy at 0 K: 1.21202 Hartree 760.55134 kcal/mol ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37710 kcal/mol Heat of formation at 298 K: -0.09032 Hartree -56.67409 kcal/mol The examples of the full :download:`MLatom ` and :download:`Gaussian ` output files can be downloaded too. An additional important advantage is that the advanced users can modify the ``gaussian.com`` file which contains the input required for geometry optimization with MLatom's models. After the ``gaussian.com`` file is modified, you can run with it the Gaussian job as usual, e.g., in your system like ``g16 gaussian.com`` in command line. PySCF +++++ Frequency calculation with PySCF works similar with that in Gaussian. We slightly modified `source code `_ to make it adaptive to MLatom. By using input as below, .. code-block:: freq ANI-1ccx xyzfile=opt.xyz freqprog=PySCF you will get the MLatom's output file: .. code-block:: ============================================================================== Vibration analysis for molecule 1 ============================================================================== Multiplicity: 1 This is a nonlinear molecule Mode Frequencies Reduced masses Force Constants (cm^-1) (AMU) (mDyne/A) 1 263.8939 1.0878 0.0446 2 288.6910 1.1303 0.0555 3 429.0255 2.7312 0.2962 4 840.8114 1.0961 0.4566 5 928.7823 2.3273 1.1829 6 1082.9072 3.1819 2.1984 7 1138.4541 1.9665 1.5017 8 1211.4647 1.5472 1.3379 9 1291.7699 1.0696 1.0516 10 1299.3247 1.1131 1.1072 11 1434.5579 1.2319 1.4937 12 1463.8316 1.3262 1.6743 13 1497.2788 1.1422 1.5087 14 1501.9697 1.0458 1.3901 15 1525.2540 1.0541 1.4448 16 3045.0585 1.0532 5.7540 17 3061.5892 1.1131 6.1473 18 3092.6304 1.0344 5.8292 19 3197.1933 1.1000 6.6252 20 3212.6500 1.1001 6.6896 21 3740.7180 1.0683 8.8072 ============================================================================== Thermochemistry for molecule 1 ============================================================================== Standard deviation of NNs : 0.00062817 Hartree 0.39419 kcal/mol Energy : -154.89196013 Hartree ZPE-exclusive internal energy at 0 K: -154.89196 Hartree Zero-point vibrational energy : 0.08098 Hartree Internal energy at 0 K: -154.81098 Hartree Enthalpy at 298 K: -154.80579 Hartree Gibbs free energy at 298 K: -154.83632 Hartree Atomization enthalpy at 0 K: 1.21203 Hartree 760.55897 kcal/mol ZPE-exclusive atomization energy at 0 K: 1.29301 Hartree 811.37718 kcal/mol Heat of formation at 298 K: -0.09033 Hartree -56.68121 kcal/mol .. _freq_with_PyAPI: Calculations through Python API ----------------------------------------- The Python API provides more flexibility to use more complicated models and build custom workflows. Below is a workflow of optimizing geometry and calculating frequency (see :download:`Jupyter notebook `): .. raw:: html :file: tutorial_files/freq/freq.html .. note:: Note that ``ml.thermochemistry`` contains both vibration analysis and thermochemistry calculation. If you just want vibrational analysis, you can use ``ml.freq`` instead. All the results are saved as the arguments of ``final_mol``. .. _freq_custom_models: Calculations with custom hybrid models ++++++++++++++++++++++++++++++++++++++ Finally, a more advanced example of using a custom delta-learning model, trained on the differences between full configuration interaction and Hartree--Fock energies: XYZ coordinates, full CI energies and Hartree--Fock energies of the training points can be downloaded here (:download:`xyz_h2_451.dat `, :download:`E_FCI_451.dat `, :download:`E_HF_451.dat `). You can try to train the delta-learning model by yourself. Here, we provide a pre-trained model using ANI :download:`delta_FCI-HF_h2_ani.npz `. .. code-block:: python import mlatom as ml # Get the initial guess for the molecules to optimize initmol = ml.data.molecule.from_xyz_file('h2_init.xyz') # Let's build our own delta-model # First, define the baseline QM method, e.g., baseline = ml.models.model_tree_node(name='baseline', operator='predict', model=ml.models.methods(method='HF/STO-3G', program='pyscf')) # Second, define the correcting ML model, in this case the ANI model trained on the differences between full CI and HF delta_correction = ml.models.model_tree_node(name='delta_correction', operator='predict', model=ml.models.ani(model_file='delta_FCI-HF_h2_ani.npz')) # Third, build the delta-model which would sum up the predictions by the baseline (HF) with the correction from ML delta_model = ml.models.model_tree_node(name='delta_model', children=[baseline, delta_correction], operator='sum') # Optimize the geometry with the delta-model as usual: geomopt = ml.optimize_geometry(model=delta_model, initial_molecule=initmol, program='ASE') # Get the final geometry approximately at the full CI level final_mol = geomopt.optimized_molecule print('Optimized coordinates:') print(final_mol.get_xyz_string()) final_mol.write_file_with_xyz_coordinates(filename='final.xyz') # Let's check how many full CI calculations, our delta-model saved us print('Number of optimization steps:', len(geomopt.optimization_trajectory.steps)) # Calculate frequency freq = ml.freq(model=delta_model, molecule=final_mol, program='') # Check vibration analysis print("Mode Frequencies Reduced masses Force Constants") print(" (cm^-1) (AMU) (mDyne/A)") for ii in range(len(final_mol.frequencies)): print("%d %13.4f %13.4f %13.4f"%(ii,final_mol.frequencies[ii],final_mol.reduced_masses[ii],final_mol.force_constants[ii])) Given an initial geometry :download:`h2_init.xyz `, you will get the following output when using the codes above. .. code-block:: Optimized coordinates: 2 H 0.1272159900000 0.0000000000000 0.0000000000000 H 0.8727840100000 0.0000000000000 0.0000000000000 Number of optimization steps: 7 Mode Frequencies Reduced masses Force Constants (cm^-1) (AMU) (mDyne/A) 0 4327.2885 1.0078 11.1190 Thermochemistry in Diels--Alder reaction ---------------------------------------- By using these thermochemical properties, you can easily evaluate relative energies for various aspects of chemical behavior, such as the change of energy profile during reaction process. We take the Diels--Alder reaction of cyclopentadiene and maleimide, an example in the `MLatom 3 paper `_. You can download the geometry file needed here: :download:`re.xyz `. There are 3 structures in ``re.xyz``: the first two strucutres are reactants and the third one is the product (see figure below): .. image:: files/re.png Thus, the basic idea here is that we will calculate thermochemical properties for each molecule and then calculate the energy changes from product to reactants. We will illustrate that it is very convenient to do with the Python API (the same can be done manually through input file/command line calculations by performing thermochemical calculations for each molecule and get the changes in each type of energy). .. code-block:: python import mlatom as ml # read molecules from .xyz file reDB = ml.data.molecular_database().read_from_xyz_file('re.xyz') optreDB = ml.data.molecular_database() # define method to calculate thermochemical properties model = ml.models.methods(method='AIQM1', qm_program='mndo') for mol in reDB: # geometry optimization geomopt = ml.optimize_geometry(model=model, initial_molecule=mol, program='gaussian') optmol = geomopt.optimized_molecule # frequency calculation ml.thermochemistry(model=model, molecule=optmol) optreDB.append(optmol) # get relative energies deltaE = optreDB[2].energy - optreDB[1].energy - optreDB[0].energy deltaG = optreDB[2].G - optreDB[1].G - optreDB[0].G deltaH = optreDB[2].H - optreDB[1].H - optreDB[0].H print(deltaE) print(deltaG) print(deltaH) Here is the summary of the output containing changes in ZPVE-exclusive energy, Gibbs free energy, and enthalpy. ========= ================== ================== ================== Method :math:`\Delta E_r` :math:`\Delta G_r` :math:`\Delta H_r` ========= ================== ================== ================== reference -34.20 \- \- AIQM1 -34.13 -15.98 -30.36 ========= ================== ================== ================== This shows that if you can use AIQM1, you should use this method to obtain accurate results fast; it is often much more preferable than performing analogous, slower and less accurate, DFT calculations. Any questions or suggestions? ----------------------------- If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via `email `_, `Slack `_ (preferred), or WeChat (please send an email to request to add you to the XACS user support group).