.. _tutorial_uniml:

Universal ML models
==========================================

:red:`New:` we highly recommend checking out :ref:`UAIQM <tutorial_uaiqm>`, which is **our ultimate solution for universal ML models**.

MLatom supports a wide range of universal machine learning (ML) models, including ML potentials and hybrid ML-enhanced quantum mechanical (QM) methods.
They can be used out of the box without training. The table below lists the available methods with links to the specific tutorials:

.. csv-table:: 
    :file: files/tutorial_uniml/universal_ml_table.csv 
    :header-rows: 1

In this tutorial, we first show how to use these universal methods for various tasks in MLatom and then go through each method in detail.

Using universal ML models
-------------------------------

Below is a brief general overview of how to use universal ML models with MLatom:

- :ref:`single-point calculations <tutorial_uniml_use_sp>`
- :ref:`geometry optimization and frequency calculations <tutorial_uniml_use_geomopt_freq>`
- :ref:`molecular dynamics <tutorial_uniml_use_md>`
- :ref:`fine-tuning universal models <tutorial_uni_tl>`


.. _tutorial_uniml_use_sp:

Single-point calculations
+++++++++++++++++++++++++
A single-point calculation needs only 3-5 lines in the MLatom input file, with one of the methods above specified:

.. code::

    AIMNet2@b973c
    xyzfile=sp.xyz
    yestfile=energy.dat

where ``sp.xyz`` is a file with the XYZ geometries of the molecule(s). The geometries can also be defined explicitly in the input file:

.. code::

    AIMNet2@b973c     
    xyzfile='
    2

    H    0.000000    0.000000    0.363008
    H    0.000000    0.000000   -0.363008
    5

    C    0.000000    0.000000    0.000000
    H    0.627580    0.627580    0.627580
    H   -0.627580   -0.627580    0.627580
    H    0.627580   -0.627580   -0.627580
    H   -0.627580    0.627580   -0.627580
    '
    yestfile=energy.dat

With the keywords ``ygradxyzestfile`` and ``hessianestfile``, gradients and Hessians can also be saved to the specified files.
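
The files used in these examples can also be prepared from Python. The following is a minimal sketch that uses only the standard library: it writes the two geometries shown above to ``sp.xyz`` and creates an input file requesting energies, gradients and Hessians. The helper ``xyz_block`` and the file names ``sp.inp``, ``gradients.dat`` and ``hessian.dat`` are hypothetical and chosen only for illustration; ANI-1ccx is used here as an example of a method for which Hessians are requested elsewhere in this tutorial.

.. code:: python

    from pathlib import Path

    # Geometries from the example above: H2 and CH4, coordinates in Angstrom.
    MOLECULES = [
        [("H", 0.000000, 0.000000, 0.363008),
         ("H", 0.000000, 0.000000, -0.363008)],
        [("C", 0.000000, 0.000000, 0.000000),
         ("H", 0.627580, 0.627580, 0.627580),
         ("H", -0.627580, -0.627580, 0.627580),
         ("H", 0.627580, -0.627580, -0.627580),
         ("H", -0.627580, 0.627580, -0.627580)],
    ]

    def xyz_block(atoms):
        """Return one XYZ record: atom count, blank comment line, coordinates."""
        lines = [str(len(atoms)), ""]
        lines += [f"{el:<2s} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms]
        return "\n".join(lines) + "\n"

    # Concatenate the records into one multi-molecule XYZ file.
    Path("sp.xyz").write_text("".join(xyz_block(m) for m in MOLECULES))

    # Input file requesting energies, gradients and Hessians.
    # The output file names are hypothetical.
    input_lines = [
        "ANI-1ccx",
        "xyzfile=sp.xyz",
        "yestfile=energy.dat",
        "ygradxyzestfile=gradients.dat",
        "hessianestfile=hessian.dat",
    ]
    Path("sp.inp").write_text("\n".join(input_lines) + "\n")

Because ``sp.xyz`` written this way contains two molecules, it also matches the Python examples in this tutorial that index the second molecule of the database.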

Since the DM21 functionals are integrated in PySCF, users need to use the ``method`` and ``qmprog`` keywords in the input file, in the same way as when :ref:`defining QM methods in MLatom <tutorial_sp>`. Here is an example of an input file using the DM21 functional with the 6-31G* basis set:

.. code::

    method=DM21/6-31G*
    qmprog=pyscf 
    xyzfile=sp.xyz
    yestfile=energy.dat

The Python API provides a flexible alternative for using methods in MLatom. Users can define a method with ``mlatom.models.methods`` and specify the keywords mentioned above, e.g.:

.. code::

    import mlatom as ml
    method = ml.models.methods(method='ANI-1xnr')
    # method = ml.models.methods(method='DM21/6-31G*', program='pyscf')
    

Here is an example of calculating energies, gradients and Hessians with ANI-1xnr.

.. code::

    import mlatom as ml
    # read molecule from .xyz file
    molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

    # define method
    model = ml.models.methods(method='ANI-1xnr')

    model.predict(
        molecular_database=molDB, 
        calculate_energy=True,
        calculate_energy_gradients=True,
        calculate_hessian=True)

    print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
    print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
    print(f'Hessian in Hartree/Angstrom^2 for molecule 1: {molDB[1].hessian}')
    

For more details on how to perform single-point calculations with MLatom, please check our :ref:`tutorial <tutorial_sp>`. 
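
Energies are reported in Hartree, so comparing structures often involves a unit conversion. The following is a minimal sketch, not part of MLatom: the helper ``relative_energies_kcal`` and the example energies are hypothetical, and the conversion factors are rounded standard values. Relative energies are only meaningful between structures with the same composition, e.g. conformers of one molecule.

.. code:: python

    # Conversion factors from Hartree (rounded standard values).
    HARTREE_TO_KCAL_PER_MOL = 627.5095
    HARTREE_TO_EV = 27.2114

    def relative_energies_kcal(energies_hartree):
        """Energies relative to the lowest one, converted to kcal/mol."""
        lowest = min(energies_hartree)
        return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]

    # Hypothetical energies of two conformers, in Hartree (made-up numbers).
    energies = [-154.9870, -154.9850]
    for index, value in enumerate(relative_energies_kcal(energies)):
        print(f"Conformer {index}: {value:.2f} kcal/mol")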

.. _tutorial_uniml_use_geomopt_freq:

Geometry optimization and frequency calculations  
++++++++++++++++++++++++++++++++++++++++++++++++
Geometry optimization is a common task in studying chemical systems. A subsequent frequency calculation on the optimized geometry, together with the resulting thermochemical properties, is also crucial for analysis.

To perform a geometry optimization with an MLatom input file, users just need to request the ``geomopt`` option in the first line, along with the method to be used and the initial guess:

.. code::

    geomopt                 # 1. requests geometry optimization
    ANI-1ccx                # 2. universal MLP
    xyzfile='               # 3. initial geometry guess
    9

    C       -1.691449880     -0.315985130      0.000000000
    H       -1.334777040      0.188413060      0.873651500
    H       -1.334777040      0.188413060     -0.873651500
    H       -2.761449880     -0.315971940      0.000000000
    C       -1.178134160     -1.767917280      0.000000000
    H       -1.534806620     -2.272315330      0.873651740
    H       -1.534807450     -2.272316160     -0.873650920
    O        0.251865840     -1.767934180     -0.000001150
    H        0.572301420     -2.672876720      0.000175020
    '                       
    optxyz=opt.xyz          # 4. (optional) file with optimized geometry. 
    optprog=geometric       # 5. request geometric optimizer 

Each optimization step is printed to the output file; the amount of output can be controlled with the ``printall`` and ``printmin`` keywords (available after version 3.4.0). Users can also choose whether to dump the optimization trajectory with the ``dumpopttrajs`` keyword.

After geometry optimization, a frequency calculation can be performed with the ``freq`` option in the input file, as shown below:

.. code::

    freq                    # 1. requests frequency calculation 
    ANI-1ccx                # 2. universal MLP
    xyzfile='               # 3. optimized geometry
    9

    C            -1.672571          -0.341122          -0.000001
    H            -1.307766           0.181713           0.885095
    H            -1.307762           0.181707          -0.885099
    H            -2.764560          -0.305014          -0.000003
    C            -1.188732          -1.771664           0.000009
    H            -1.559124          -2.298647           0.885998
    H            -1.559099          -2.298653          -0.885987
    O             0.237878          -1.729915           0.000028
    H             0.575701          -2.626896           0.000135
    '                      

In the output file, users will find the vibrational analysis, including the frequency, reduced mass and force constant of each normal mode, as well as the thermochemistry results. The output file in this case can be downloaded :download:`here <files/tutorial_uniml/uniml_freq_ani1ccx.out>`.

For more details on these two tasks with MLatom, please check our tutorials on :ref:`geometry optimization <tutorial-geomopt>` and :ref:`frequency calculations <tutorial_freq>`.

.. _tutorial_uniml_use_md:

Molecular dynamics
++++++++++++++++++
One of the advantages of machine learning potentials is their speed: thousands of trajectories can be propagated within several hours, compared with a few weeks for commonly used DFT methods (that is, as long as you do not use DM21). MLatom provides an easy way to run :ref:`MD <tutorial_md>` and also :ref:`quasi-classical MD <tutorial_qct>`, which is popular in simulations of chemical reactions.

When using an MLatom input file, the only difference from a usual MD input is the line that specifies the method. For example, to use AIMNet2 targeting RKS B97-3c to run dynamics of the hydrogen molecule in the NVT ensemble with the Nosé--Hoover thermostat, the input file can look like:

.. code::

    MD                                # 1. requests molecular dynamics
    AIMNet2@b973c                     # 2. use AIMNet2@B97-3c method
    initConditions=user-defined       # 3. use user-defined initial conditions
    initXYZ=h2_init.xyz               # 4. file with initial geometry; Unit: Angstrom
    initVXYZ=h2_init.vxyz             # 5. file with initial velocity; Unit: Angstrom/fs
    dt=0.3                            # 6. time step; Unit: fs
    trun=30                           # 7. total time; Unit: fs
    thermostat=Nose-Hoover            # 8. use Nose-Hoover thermostat
    ensemble=NVT                      # 9. NVT ensemble
    temperature=300                   # 10. Run MD at 300 Kelvin

The initial XYZ coordinates and velocities can be downloaded here: :download:`h2_init.xyz<files/tutorial_uniml/h2_init.xyz>`,  :download:`h2_init.vxyz<files/tutorial_uniml/h2_init.vxyz>`
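
The ``dt`` and ``trun`` values above determine how many propagation steps are needed. The arithmetic can be sketched as follows; the helper ``number_of_md_steps`` is hypothetical and not part of MLatom, and rejecting a total time that is not a multiple of the time step is a design choice of this sketch only.

.. code:: python

    def number_of_md_steps(time_step, total_time, tolerance=1e-9):
        """Number of propagation steps needed to cover total_time (same units, e.g. fs)."""
        if time_step <= 0 or total_time < 0:
            raise ValueError("time_step must be positive and total_time non-negative")
        ratio = total_time / time_step
        steps = round(ratio)
        # Guard against a total time that is not a multiple of the time step.
        if abs(ratio - steps) > tolerance:
            raise ValueError("total_time is not a multiple of time_step")
        return steps

    # Values from the input file above: dt=0.3 fs, trun=30 fs
    print(number_of_md_steps(0.3, 30.0))

For the settings above this gives 100 propagation steps. Whether the stored trajectory additionally counts the initial geometry is not covered by this sketch.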

Below we also provide a snippet that runs the same task with the Python API. As usual, only the code defining the method needs to be changed.

.. code::

    import mlatom as ml
    # Use user-defined initial conditions
    mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
    init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                                generation_method='user-defined',
                                                file_with_initial_xyz_coordinates='h2_init.xyz',
                                                file_with_initial_xyz_velocities='h2_init.vxyz')
    init_mol = init_cond_db[0]

    # Initializing model
    model = ml.models.methods(method='AIMNet2@b973c')

    # Initializing thermostat
    nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300, molecule=init_mol)

    # Run molecular dynamics
    dyn = ml.md(model=model,
                molecule_with_initial_conditions=init_mol,
                thermostat=nose_hoover,
                ensemble='NVT',
                time_step=0.3,
                maximum_propagation_time=30.0)

    # Dump trajectory
    traj = dyn.molecular_trajectory
    traj.dump(filename='traj', format='plain_text')
    traj.dump(filename='traj.h5', format='h5md')

    print(f"Number of steps in the trajectory: {len(traj.steps)}")

.. _tutorial_uni_tl:

Fine-tuning universal models
-------------------------------------------

.. include:: tutorial_tl_uni.inc

AIQM1
-----

AIQM1 (artificial intelligence–quantum mechanical method 1) is a general-purpose method that approaches the accuracy of the gold-standard coupled-cluster QM method at the high computational speed of approximate low-level semiempirical QM methods for ground-state, closed-shell species. It is also transferable, with good accuracy, to calculations of charged and radical species and to excited-state calculations. See the `AIQM1 paper <http://doi.org/10.1038/s41467-021-27340-2>`__ for more details. Please cite this paper alongside the other required :ref:`citations <tutorial_aiqm1_citations>`:

- Peikun Zheng, Roman Zubatyuk, Wei Wu, Olexandr Isayev, Pavlo O. Dral. `Artificial Intelligence-Enhanced Quantum Chemical Method with Broad Applicability <http://doi.org/10.1038/s41467-021-27340-2>`__. *Nat. Commun.* **2021**, *12*, 7022, DOI: 10.1038/s41467-021-27340-2.

**Strengths:** AIQM1 is especially good for energy calculations and geometry optimizations of closed-shell molecules in their ground state.

**Limitations:** This method is currently limited to compounds containing only the elements H, C, N, and O.

A :ref:`detailed tutorial <tutorial_aiqm1>` is available.

.. _tutorial_uniml_dm21:

DM21 
----

DM21 is an ML-enhanced DFT method `published in Science by DeepMind <http://doi.org/10.1126/science.abj6511>`__ (please cite it when you use this method). Our installation follows `the GitHub page <https://github.com/google-deepmind/deepmind-research/tree/master/density_functional_approximation_dm21>`__. There are four variants of DM21 (DM21 - default, DM21m, DM21mc, DM21mu), see the above GitHub page for the details.

Using DM21 and its variants is similar to using common DFT functionals: users need to specify both the functional and the basis set. Note that DM21 calculations are not always stable and SCF convergence is not guaranteed. Predictions also take longer than with the previous methods; by default, MLatom starts the SCF from the relatively cheap B3LYP functional, as suggested by the official DM21 documentation, to make the SCF faster. In the current implementation, DM21 can only be used for single-point calculations (the interface program does not provide gradients or Hessians, and numerical derivatives are not implemented for this method yet).

Example of an input file:

.. code::

    method=DM21/6-31G*
    qmprog=pyscf     
    xyzfile='
    2

    H    0.000000    0.000000    0.363008
    H    0.000000    0.000000   -0.363008
    5

    C    0.000000    0.000000    0.000000
    H    0.627580    0.627580    0.627580
    H   -0.627580   -0.627580    0.627580
    H    0.627580   -0.627580   -0.627580
    H   -0.627580    0.627580   -0.627580
    '
    yestfile=energy.dat

In Python:

.. code::

    import mlatom as ml
    # read molecule from .xyz file
    molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

    # define method
    method = ml.models.methods(method='DM21/6-31G*', program='pyscf')

    # only energies are requested: gradients and Hessians
    # are not available for DM21 in the current implementation
    method.predict(
        molecular_database=molDB,
        calculate_energy=True)

    print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
    print(f'Energy in Hartree for molecule 1: {molDB[1].energy}')

.. _tutorial_uniml_ani_zoo:

ANI models zoo
----------------

MLatom includes three public models from the `ANI model zoo <https://github.com/aiqm/ani-model-zoo>`__ of `TorchANI <https://aiqm.github.io/torchani/>`__: `ANI-1x <http://doi.org/10.1039/c6sc05720a>`__, `ANI-1ccx <http://doi.org/10.1038/s41467-019-10827-4>`__ and `ANI-2x <http://doi.org/10.1021/acs.jctc.0c00121>`__. In addition, MLatom supports the `D4-dispersion-corrected methods ANI-1x-D4 and ANI-2x-D4 <https://doi.org/10.1021/acs.jctc.3c01203>`__. Below are some useful notes on using these methods in MLatom.

- ANI-1x and ANI-2x were trained on DFT-level data.
- ANI-1ccx possesses the highest accuracy, targeting CCSD(T)*/CBS.
- ANI-1ccx and ANI-1x are limited to CHNO elements, while ANI-2x can be used for CHNOFClS elements.
- The D4 dispersion correction in ANI-1x-D4 and ANI-2x-D4 corresponds to the ωB97X functional.
- These methods are limited to predicting energies and forces for neutral closed-shell compounds in their ground state.
- MLatom will report uncertainties for calculations with these methods based on the standard deviation between neural network (NN) predictions.
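
The last note can be illustrated with a minimal sketch of an ensemble-based uncertainty. It is not MLatom's implementation: the energies are made-up numbers, taking the ensemble average as the prediction is an assumption, and whether MLatom uses the sample or the population form of the standard deviation is not stated here (the sketch uses the sample form).

.. code:: python

    from statistics import mean, stdev

    # Hypothetical energies (Hartree) predicted for one molecule by the
    # individual NNs of an ensemble; the numbers are made up for illustration.
    nn_energies = [-40.5012, -40.5009, -40.5015, -40.5011,
                   -40.5010, -40.5013, -40.5008, -40.5014]

    # Assumption of this sketch: the prediction is the ensemble average.
    ensemble_energy = mean(nn_energies)
    # Uncertainty estimate: spread between the individual NN predictions.
    uncertainty = stdev(nn_energies)

    print(f"Energy:      {ensemble_energy:.5f} Hartree")
    print(f"Uncertainty: {uncertainty:.5f} Hartree")

A large spread between the NNs suggests that the molecule lies far from the training data, so the prediction should be treated with more caution.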

Example of an input file:

.. code::

    ANI-1ccx
    geomopt   
    xyzfile='
    2

    H    0.000000    0.000000    0.363008
    H    0.000000    0.000000   -0.363008
    5

    C    0.000000    0.000000    0.000000
    H    0.627580    0.627580    0.627580
    H   -0.627580   -0.627580    0.627580
    H    0.627580   -0.627580   -0.627580
    H   -0.627580    0.627580   -0.627580
    '

In Python:

.. code::

    import mlatom as ml
    # read molecule from .xyz file
    molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

    # define method
    method = ml.models.methods(method='ANI-1ccx')

    method.predict(
        molecular_database=molDB,
        calculate_energy=True,
        calculate_energy_gradients=True,
        calculate_hessian=True)

    print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
    print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')
    print(f'Hessian in Hartree/Angstrom^2 for molecule 1: {molDB[1].hessian}')

.. _tutorial_uniml_ani1xnr:

Reactive ANI: ANI-1xnr 
++++++++++++++++++++++++

ANI-1xnr is a general reactive ANI-type NN trained on condensed-phase reactive data and capable of describing real-world reactive systems containing C, H, N, and O elements; see the `Nature Chemistry publication <http://doi.org/10.1038/s41557-023-01427-3>`__. The implementation interfaces to the model from the `ani-1xnr GitHub repository <https://github.com/atomistic-ml/ani-1xnr/>`__.

.. note::

    The first time any of the models is instantiated, the models will be downloaded automatically from the ani-model-zoo repository to the local folder ``./local``. Users can also download them beforehand.

The input is analogous to :ref:`other ANI models <tutorial_uniml_ani_zoo>`.
 
.. _tutorial_uniml_aimnet2:

AIMNet2
-------
`AIMNet2 <https://chemrxiv.org/engage/chemrxiv/article-details/6525b39e8bab5d2055123f75>`__ aims to address the limitations of ANI, which is less capable of dealing with non-local interactions and open-shell charged species. There are two pretrained models, targeting B97-3c and ωB97M-D3 accuracy (users need to choose one of them with the keyword ``aimnet2@b973c`` or ``aimnet2@wb97m-d3``). AIMNet2 is applicable to 14 elements: H, B, C, N, O, F, Si, P, S, Cl, As, Se, Br, and I. Currently, Hessians are not supported for AIMNet2 in MLatom.
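
Because the element coverage differs between the models, it can be handy to check a molecule's composition before choosing a method. The following is a minimal sketch based only on the element lists stated in this tutorial for the ANI and AIMNet2 models; the table ``SUPPORTED_ELEMENTS`` and the helper ``applicable_methods`` are hypothetical and not part of MLatom.

.. code:: python

    # Element coverage as stated in this tutorial.
    CHNO = {"H", "C", "N", "O"}
    AIMNET2_ELEMENTS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S",
                        "Cl", "As", "Se", "Br", "I"}

    SUPPORTED_ELEMENTS = {
        "ANI-1x": CHNO,
        "ANI-1ccx": CHNO,
        "ANI-2x": CHNO | {"F", "Cl", "S"},
        "AIMNet2@b973c": AIMNET2_ELEMENTS,
        "AIMNet2@wb97m-d3": AIMNET2_ELEMENTS,
    }

    def applicable_methods(elements):
        """Return the sorted names of methods covering all given element symbols."""
        needed = set(elements)
        return sorted(name for name, covered in SUPPORTED_ELEMENTS.items()
                      if needed <= covered)

    # Bromomethane (CH3Br) contains Br, which only AIMNet2 covers here.
    print(applicable_methods(["C", "H", "H", "H", "Br"]))

The check covers elements only; other limitations mentioned in this tutorial, such as charge and spin state, still need to be considered separately.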

.. note::

    The first time any of the models is instantiated, the models will be downloaded automatically from the `AIMNet2 GitHub repository <https://github.com/isayevlab/AIMNet2?tab=readme-ov-file>`__ to the local folder ``./local``. Users can also download them beforehand.

Example of an input file:

.. code::

    AIMNet2@wb97m-d3
    geomopt   
    xyzfile='
    2

    H    0.000000    0.000000    0.363008
    H    0.000000    0.000000   -0.363008
    5

    C    0.000000    0.000000    0.000000
    H    0.627580    0.627580    0.627580
    H   -0.627580   -0.627580    0.627580
    H    0.627580   -0.627580   -0.627580
    H   -0.627580    0.627580   -0.627580
    '

In Python:

.. code::

    import mlatom as ml
    # read molecule from .xyz file
    molDB = ml.data.molecular_database.from_xyz_file('sp.xyz')

    # define method
    method = ml.models.methods(method='AIMNet2@wb97m-d3')

    # Hessians are currently not supported for AIMNet2 in MLatom,
    # so only energies and gradients are requested
    method.predict(
        molecular_database=molDB,
        calculate_energy=True,
        calculate_energy_gradients=True)

    print(f'Energy in Hartree for molecule 0: {molDB[0].energy}')
    print(f'Gradients in Hartree/Angstrom for molecule 1: {molDB[1].get_energy_gradients()}')