.. _tutorial_sp:

Single-point calculations
=========================
This tutorial shows how to run single-point calculations with MLatom.

Single-point calculations can be performed with various models and methods in MLatom 
including: 

* QM methods,
* hybrid QM/ML methods,
* pretrained and user-defined ML models.

For more details on the models in MLatom, please check an :ref:`overview <overview_models_and_methods>`. 
We also created a video covering the contents of this tutorial. You can find this video on `Youtube <https://www.youtube.com/watch?v=fNpB0LLlx-8>`__.

Here we will illustrate how to calculate isomerization energy of sugar in `ISOL24 <https://pubs.rsc.org/en/content/articlelanding/2017/cp/c7cp04913g>`_ with different types of methods in MLatom. 

.. image:: files/sugar.png

You can download the required file :download:`sugar_iso.xyz <files/sugar_iso.xyz>` with geometries of two isomers in XYZ format (in Angstrom). 
The basic idea here is that we will get single-point energies for isomers and calculate their energy difference. Since MLatom is a data-oriented package, it natively enables the calculations of many molecules. We first show how to run simulations via input files/command-line options and then :ref:`how to get single-point properties in Python <tutorial_sp_pyapi>`.


Single-point calculations with QM methods
+++++++++++++++++++++++++++++++++++++++++
MLatom supports single-point QM calculations with various popular programs such as 
`Gaussian <https://gaussian.com/capabilities/?tabid=0>`_, `PySCF <https://pyscf.org/user.html>`_, `xTB <https://xtb-docs.readthedocs.io/en/latest/>`_, etc. 
Detailed manuals for simulations with QM methods can be found in :ref:`Simulations with QM methods <cl_manual_qm_methods>`.

The typical input file for single-point calculations should contain ``xyzfile`` which defines geometry file and ``yestfile`` which defines the file to store energy. 
For QM methods, users need to provide the ``method`` name and ``qmprog``, i.e. the corresponding QM program. 
Here is an example of the input file for energy-only calculations at B3LYP/6-31G* level with Gaussian

.. code::

    method=B3LYP/6-31G*
    qmprog=gaussian
    xyzfile=sugar_iso.xyz
    yestfile=enest.dat

By running the command ``$mlatom mlatom.inp > mlatom.out``, you will the get the MLatom output file as follows:

.. code::

    ****************************************************************************** 

    Energy of molecule      1:        -687.1484077000000 Hartree
    Energy of molecule      2:        -687.1383648000000 Hartree

    ============================================================================== 
    Wall-clock time: 83.08 s (1.38 min, 0.02 hours)

    MLatom terminated on 03.02.2024 at 09:35:57
    ============================================================================== 

In the output file, energies of two molecules are listed in the same order as that in the input ``.xyz`` file.

Single-point calculations with hybrid QM/ML methods
+++++++++++++++++++++++++++++++++++++++++++++++++++

One of the successful examples of hybrid QM/ML method is `AIQM1 <https://doi.org/10.1038/s41467-021-27340-2>`_, which is  a general-purpose method approaching the gold-standard coupled cluster 
QM method at the cost of low-level semiempirical QM method. It takes the advantage of delta-learning strategy, 
correcting less accurate semiempirical method with state-of-art ML model ensemble (see figure below). Beyond this, in MLatom, users can also define their own delta-learning model,
which is currently easy to achieve with Python API (see :ref:`python API tutorial <tutorial_sp_api_hybrid>`). 

.. image:: files/aiqm1.png


For AIQM1, users just need to request ``AIQM1`` calculation in the first line:

.. code::

    AIQM1                     # 1. request AIQM1 method 
    xyzfile=sugar_iso.xyz     # 2. file with geometry 
    yestfile=enest.dat        # 3. file to store energy

After running the command ``$mlatom mlatom.inp > mlatom.out``, users will get the ``mlatom.out`` as :

.. code::

    ****************************************************************************** 

    Properties of molecule 1

    Standard deviation of NN contribution    :      0.00094895 Hartree         0.59548 kcal/mol
    NN contribution                          :      0.00275644 Hartree
    Sum of atomic self energies              :   -578.89384843 Hartree
    ODM2* contribution                       :   -107.70671483 Hartree
    D4 contribution                          :     -0.00358197 Hartree
    Total energy                             :   -686.60138879 Hartree


    Properties of molecule 2

    Standard deviation of NN contribution    :      0.00066560 Hartree         0.41767 kcal/mol
    NN contribution                          :      0.00159327 Hartree
    Sum of atomic self energies              :   -578.89384843 Hartree
    ODM2* contribution                       :   -107.68954498 Hartree
    D4 contribution                          :     -0.00351388 Hartree
    Total energy                             :   -686.58531402 Hartree

    ============================================================================== 
    Wall-clock time: 0.77 s (0.01 min, 0.00 hours)

    MLatom terminated on 03.02.2024 at 09:20:02
    ============================================================================== 
 
For each molecule, all the energy components contained in AIQM1 method are presented, including ``ODM2* energy``, ``D4 enengy`` and NN energy (sum of ``NN contribution`` and ``atomic self energies``),  
The ``standard deviation of NN contribution`` can be used to judge whether AIQM1 is reliable. 

As a comparison, we show here isomeration energies calculated by each method and their respective times used. 

.. list-table:: 
   :widths: 25 25 25
   :header-rows: 1

   * - Method
     - :math:`\Delta E_\text{r}` (kcal/mol)
     - time (s)
   * - reference       
     - 10.69  
     - \-
   * - AIQM1           
     - 10.08                          
     - 0.77
   * - B3LYP/6-31G*    
     - 6.27                            
     - 83  

Single-point calculations with user-trained ML models
+++++++++++++++++++++++++++++++++++++++++++++++++++++

MLatom can read a user-trained model from a file and make predictions with it for new data (given either as input vectors X or as XYZ coordinates).
Users should provide either ``MLmodelType`` and/or ``MLprog`` argument. Below is the table for type of ML models and programs available:

==============   ============
MLmodelType      MLporg   
==============   ============
KREG             MLatomF
sGDML            sGDML
GAP-SOAP         GAP 
PhysNet          PhysNet 
DeePot-SE        DeepMD-kit
ANI              TorchANI
MACE             MACE 
==============   ============

We will use a pretrained `MACE <https://openreview.net/forum?id=YPpSngE-ZU>`_ model to predict molecular enenrgy. 
You can download mace model file here: :download:`mace.pt <files/mace.pt>`.
Here is an example of how to use user-trained MACE model to perform single-point calculation:

.. code::

    useMLmodel              # 1. requests usage of ML model for prediction 
    MLmodelType=MACE         # 2. request the MACE model
    MLmodelIn=mace.pt       # 3. the file with the trained model should be provided, here it is mace.pt file
    xyzfile=sugar_iso.xyz   # 4. file with geometries

For how to train ML models with MLatom, please refer to `manual <http://mlatom.com/manual/#trainpopular>`_.


Gradients and Hessian
+++++++++++++++++++++
If users need gradients and Hessian, two lines can be added in the input file:

.. code::

    AIQM1                       # 1. request AIQM1 method 
    xyzfile=sugar_iso.xyz       # 2. file with geometry 
    yestfile=enest.dat          # 3. file to store energy 
    ygradxyzestfile=grad.dat    # 4. file to store gradients 
    hessianestfile=hess.dat     # 5. file to store hessian 

``ygradxyzestfile`` will create the file storing the gradients and ``hessianestfile`` the Hessian. These keywords are case-insensitive but the file names are. 


Charge and multiplicity
+++++++++++++++++++++++

If molecules have other than default charges (default 0) and multiplicities (default 1), the input might look like:

.. code::

    method=B3LYP/6-31G*
    qmprog=gaussian
    xyzfile=sp.xyz
    yestfile=enest.dat
    charges=0,1
    multiplicities=1,3

.. _tutorial_sp_pyapi:

Calculations through PyAPI
++++++++++++++++++++++++++

Define methods  
--------------
Python API provides a more flexible usage of MLatom methods. Detailed introduction is provided in :ref:`method in python API <methods>`.
For QM methods, AIQM1 and pre-trained ML models, user can define method by using ``mlatom.models.methods`` module. 
There are three arguments here which are commonly used:

* ``method``: define the method you want to use 
* ``program``: the program to use the method
* ``nthreads``: number of threads used by the program

For example, you can define QM method as :

.. code::

    method = mlatom.models.methods(method='b3lyp/6-31G*', program='gaussian')
    # method = mlatom.models.methods(method='b3lyp/6-31G*', program='pyscf', nthreads=18)

.. note::
    AIQM1 use ``qm_program`` keyword to define the program used for ODM2* parts.

For user-trained ML models, they use different modules listed below. See :ref:`manual of API <ml_model>` for details on these modules.

===========    ========================
model type      module 
===========    ========================
KREG           ``mlatom.models.kreg``
sGDML          ``mlatom.models.sgdml``
GAP-SOAP       ``mlatom.models.gap``
PhysNet        ``mlatom.models.physnet`` 
DeePot-SE      ``mlatom.models.dpmd``
ANI            ``mlatom.models.ani``
MACE           ``mlatom.models.mace`` 
===========    ========================


Perform calculations  
--------------------
Here we will introduce the pipeline of using python API to calculate isomerization energy of sugar.
The energy unit used in MLatom is Hartree. Since the calculations are on multiple molecules, we start from using ``mlatom.data.molecular_database`` class to load these molecules and 
use ``predict`` function in method to perform calculation

.. code:: 

    import mlatom as ml 
    # read molecule from .xyz file
    molDB = ml.data.molecular_database.from_xyz_file('sugar_iso.xyz')

    # define method 
    model = ml.models.methods(method='AIQM1') 
    # model = ml.models.methods(method='B3LYP/6-31G*'. program='pyscf')  ## calculations using QM methods 

    model.predict(molecular_database=molDB)

    print(molDB[0].energy)
    print(molDB[1].energy)
    print(molDB[1].energy-molDB[0].energy)

If gradients and hessian are needed, user can request gradient and hessian calculations by:

.. code:: 

    model.predict(molecular_database=molDB, calculate_energy_gradients=True, calculate_hessian=True)

If only one molecules is used, user can also use ``mlatom.data.molecule`` class

.. code:: 

    mol = ml.data.molecule.from_xyz_file('sp.xyz')

and use it to perform the calculation:

.. code:: 

    model.predict(molecule=mol)

User can also assign charges and multiplicities to ``molecular_database`` or ``molecule`` by:

.. code:: 

    mol.charge = 1
    mol.multiplicity = 3

    molDB.charges = [1,1]
    molDB.multiplicities = [1,3]


Retrieve properties 
-------------------

In MLatom, energy and Hessian will be stored in ``molecule`` object and gradients will be stored in ``atom`` object.
(Details in :ref:`data in python API <data>`). User can request them by using

.. code:: 

    mol.energy
    mol.get_energy_gradients()
    mol.hessian 

``molecular_database`` provide funciton ``get_properties`` to extract properties from each ``molecule`` object. 
You can retrieve an array of energy, gradients and hessians by using:

.. code:: 

    molDB.get_properties('energy')
    molDB.get_xyz_vectorial_properties('energy_gradients')
    molDB.get_properties('hessian')

    
.. _tutorial_sp_api_hybrid:

Single-point calculations with user-defined hybrid models
---------------------------------------------------------
MLatom possesses the power of combining different models together by use of ``mlatom.models.model_tree_node`` module, which works similar to tree structure 
(details in :ref:`model tree node in Python API <model_tree_node>`)

Here we use a custom delta-learning model as an example, which was trained on the differences between full configuration interaction and Hartree--Fock energies of H\ :sub:`2`.
The pretrained model file can be downloaded here :download:`delta_FCI-HF_h2_ani.npz <files/delta_FCI-HF_h2_ani.npz>`.

You can define 2 children nodes as: 

.. code::

    baseline=ml.models.model_tree_node(name='baseline', operator='predict', model=ml.models.methods(method='HF/STO-3G', program='pyscf')
    delta_correction=ml.models.model_tree_node(name='delta_correction', operator='predict', model=ml.models.ani(model_file='delta_FCI-HF_h2_ani.npz')`
 
You can see there is ``operator`` that operate on the model you defined (here each model just performs prediction task). 
Now we define the parent node as: 

.. code:: 

    ml.models.model_tree_node(name='hybrid', operator='sum', children=[baseline, delta_correction])

Here, the operator changes to ``sum`` since the operation we will perform on this node is summing up the predictions by two children nodes. 

Finally we combine these things together and predict energy of H\ :sub:`2`. You can download the geometry file needed here :download:`h2_init.xyz <files/h2_init.xyz>`.

.. code:: 
    
    import mlatom as ml

    # Get the initial guess for the molecules to optimize
    mol = ml.data.molecule.from_xyz_file('h2_init.xyz')

    # Let's build our own delta-model
    # First, define the baseline QM method, e.g., 
    baseline = ml.models.model_tree_node(name='baseline', operator='predict', model=ml.models.methods(method='HF/STO-3G', program='pyscf'))
    # Second, define the correcting ML model, in this case the KREG model trained on the differences between full CI and HF
    delta_correction = ml.models.model_tree_node(name='delta_correction', operator='predict', model=ml.models.ani(model_file='delta_FCI-HF_h2_ani.npz'))
    # Third, build the delta-model which would sum up the predictions by the baseline (HF) with the correction from ML
    delta_model = ml.models.model_tree_node(name='delta_model', children=[baseline, delta_correction], operator='sum')

    delta_model.predict(molecule=mol)

    print(mol.energy)


Any questions or suggestions?
-----------------------------

If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via `email <xacs@xmu.edu.cn>`_, `Slack <https://join.slack.com/t/xacs-support/shared_invite/zt-1gm1lpn68-pReQhfYGu813eCqwmvdGvA>`_ (preferred), or WeChat (please send an email to request to add you to the XACS user support group).