.. _tutorial_qct:

Quasi-classical molecular dynamics 
=====================================

Quasi-classical molecular dynamics (also known as quasi-classical trajectories (QCT)) accounts for the zero-point energy (ZPE) in contrast to classical dynamics and is very popular in studying chemical reactions (see, e.g., works by Houk et al. in `PNAS 2012, 109, 12860 <https://doi.org/10.1073/pnas.1209316109>`__ and `J. Am. Chem. Soc. 2023, 145, 14446 <https://doi.org/10.1021/jacs.3c03773>`__). In this tutorial, we show how to perform such simulations on an :ref:`example <tutorial_qct_da>` reproducing the above PNAS paper. See also related :ref:`tutorial on MD <tutorial_md>`.

The implementation details are given in the following work (please cite it alongside other required citations when using this feature):

- Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral. `Physics-informed active learning for accelerating quantum chemical simulations <https://arxiv.org/abs/2404.11811>`__. *arXiv* **2024**, DOI: 10.48550/arXiv.2404.11811.

Background
----------

The key to performing QCT is to make sure that the initial conditions (geometries and velocities) contain the required ZPE. MLatom supports two normal mode sampling methods of generating such conditions:

- Wigner sampling (see Eq. 20 in `the corresponding paper <https://doi.org/10.1021/acs.jctc.2c00804>`_), sampling only from the lowest vibrational level. It is also widely used in :ref:`surface-hopping dynamics <tutorial_namd>` and spectra generation
- sampling from the harmonic quantum Boltzmann distribution (implemented following `this work <https://doi.org/10.1021/acs.jctc.2c00804>`_, also called "thermal sampling" in the manual to its early implementation in the VENUS program). In contrast to Wigner sampling, it samples from higher vibrational levels. This sampling is commonly used for performing QCT investigation of transition states: for all the real normal modes that are perpendicular to the reaction coordinate, the quantum number :math:`n_{i}` of normal mode :math:`i` is first sampled from a harmonic quantum Boltzmann distribution:

:math:`p(n_{i})=e^{- n_{i} h v_{i} / k_{\text{B}} T} (1 - e^{-h v_{i} / k_{\text{B}} T})`

where :math:`h` is the Planck constant, :math:`v_{i}` is the vibrational frequency, :math:`k_{B}` is the Boltzmann constant and :math:`T` is the temperature. The coordinates :math:`Q_{i}` and momenta :math:`P_{i}` are then generated by

:math:`Q_{i}=A_{i} \text{cos} (2 \pi R_{i})`,

:math:`P_{i}=-\omega_{i}A_{i} \text{cos} (2 \pi R_{i})`,

here, :math:`\omega_{i}=2 \pi v_{i}`, :math:`A_{i}= \sqrt{2E_{i}} / \omega_{i}` and :math:`R_{i}` is a uniform random number on :math:`[0,1]`. The reaction coordinate (corresponding to the normal mode with the imaginary frequency) is fixed while the mass-weighted momentum :math:`P` is calculated by :math:`\pm \sqrt{-2 k_{B} T \text{ln} (1-R)}`, where :math:`R` is also a uniform random number on :math:`[0,1]`.

.. _tutorial_qct_da:

Example
--------------------------

Below we will show you how to run quasi-classical MD in MLatom and reproduce results in the `PNAS 2012, 109, 12860 <https://doi.org/10.1073/pnas.1209316109>`__ and `arXiv.2404.11811 <https://arxiv.org/abs/2404.11811>`__ for the Diels-Alder reaction between 1,3-butadiene and ethene.

The general procedure is:

1. optimize the geometry, in our case do TS optimization,
2. run :ref:`frequency calculations <tutorial_freq>` which would generate the molecule object file dumped in the JSON format (here we provide this file for you: :download:`da_eqmol.json <files/da_eqmol.json>`, generated for the TS),
3. use the normal modes in the molecule file (in our case ``da_eqmol.json``) to generate initial conditions,
4. run dynamics with these conditions,
5. analyze results. Here, we are going to calculate the average time to traverse the transition zone and the average time gap of C-C bond formation.

See below a full version with Python API and a brief version with the input file (only showing how to do normal mode sampling and propagating trajectories).

.. _tutorial_qct_cli:

Example with input file
+++++++++++++++++++++++++++++++++

Generating initial conditions and running a single trajectory can be done with the following input file example:

.. code-block:: 

    MD                                            # 1. requests molecular dynamics
    method=UB3LYP/6-31G* qmprog=gaussian          # 2. use DFT method (if you have Gaussian). You can also use, e.g., AIQM1 or ANI-1ccx, or B3LYP and qmprog=PySCF...
    initConditions=harmonic-quantum-boltzmann     # 3. use harmonic quantum Boltzmann distribution 
    #initConditions=wigner                        # 3. also possible - use Wigner sampling
    normalModeFile=da_eqmol.json                  # 4. molecule with frequencies and normal modes 
    initTemperature=300                           # 5. initial temperature 
    trun=150                                      # 6. total time; Unit: fs
    dt=0.5                                        # 7. time step; Unit: fs
    
You will get tons of output files for this trajectory (such as xyz coordinates which you can view with VMD, and also h5md trajectory file, etc).                                        

Above example will take you quite long time (30 min for one trajectory), but if you use ML it will be much faster. Here provide a pre-trained model for this reaction :download:`da_energy_ani.npz <files/da_energy_ani.npz>` which would take you less than a minute for the same simulations using the following input file:

.. code-block:: 

    MD                                            
    MLmodelType=ANI
    MLmodelIn=da_energy_ani.npz
    initConditions=harmonic-quantum-boltzmann                     
    normalModeFile=da_eqmol.json                 
    initTemperature=300                           
    trun=150                                      
    dt=0.5                                       

.. _tutorial_qct_api:

Example with Python API
+++++++++++++++++++++++++++++++++

A more complete example using Python API:

.. code-block:: python 

    import mlatom as ml 
    import os 
    import numpy as np

    # Load molecule with frequencies and normal modes 
    eqmol = ml.data.molecule()
    eqmol.load('da_eqmol.json',format='json')

    # Generate initial conditions
    if not os.path.exists('init_cond_db.json'):
        init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                                    generation_method='harmonic-quantum-boltzmann',
                                                    initial_temperature=298,
                                                    number_of_initial_conditions=128)
        init_cond_db.dump('init_cond_db.json',format='json')
    else:
        init_cond_db = ml.data.molecular_database.load('init_cond_db.json',format='json')

    # Reverse the velocities to get the backward trajectories
    re_init_cond_db = init_cond_db.copy()
    for mol in re_init_cond_db:
        for atom in mol:
            atom.xyz_velocities = -atom.xyz_velocities

    # Define method
    method = ml.models.methods(method='UB3LYP/6-31G*',program='Gaussian',nthreads=8)

    # Run dynamics
    for imol in range(128):
        if not os.path.exists(f'forward_traj{imol+1}.xyz'):
            dyn = ml.md(model=method,
                        molecule_with_initial_conditions = init_cond_db[imol],
                        ensemble='NVE',
                        time_step=0.5,
                        maximum_propagation_time=150.0)
            traj = dyn.molecular_trajectory
            traj.dump(filename=f'forward_traj{imol+1}',format='plain_text')
        if not os.path.exists(f'backward_traj{imol+1}.xyz'):
            dyn = ml.md(model=method,
                        molecule_with_initial_conditions = re_init_cond_db[imol],
                        ensemble='NVE',
                        time_step=0.5,
                        maximum_propagation_time=150.0)
            traj = dyn.molecular_trajectory
            traj.dump(filename=f'backward_traj{imol+1}',format='plain_text')

    # Analyze trajectories
    def bond_length(molecule):
        d1 = molecule.internuclear_distance(0,10)
        d2 = molecule.internuclear_distance(6,11)
        return d1, d2 

    def check_traj(forward_traj,backward_traj):
        forward_label = 'ts'
        backward_label = 'ts'
        forward_time = -1
        backward_time = -1
        forward_distances_list = [] 
        backward_distances_list = []
        for imol in range(len(forward_traj)):
            d1,d2 = bond_length(forward_traj[imol])
            min_dist = min(d1,d2)
            max_dist = max(d1,d2)
            distances = [d1,d2]
            forward_distances_list.append(distances)
            if min_dist > 5.0:
                forward_label = 'reactant'
                break
            if max_dist < 1.6:
                forward_label = 'product'
                break 
            if (min_dist > 2.52 or max_dist < 2.02) and forward_time < 0:
                forward_time = imol*0.5

        for imol in range(len(backward_traj)):
            d1,d2 = bond_length(backward_traj[imol])
            min_dist = min(d1,d2)
            max_dist = max(d1,d2)
            distances = [d1,d2]
            backward_distances_list.append(distances)
            if min_dist > 5.0:
                backward_label = 'reactant'
                break 
            if max_dist < 1.6:
                backward_label = 'product'
                break
            if (min_dist > 2.52 or max_dist < 2.02) and backward_time < 0:
                backward_time = imol*0.5

        # Check time gap
        if (forward_label=='product' and backward_label=='reactant') or (forward_label=='reactant' and backward_label=='product'):
            if forward_label == 'product':
                distances_list = forward_distances_list 
            else:
                distances_list = backward_distances_list 

            bond1_formed = False 
            bond2_formed = False

            for istep in range(len(distances_list)):
                distances = distances_list[istep]
                # print(distances)
                
                if min(distances) < 1.6 and not bond1_formed:
                    gap = istep * 0.5 
                    bond1_formed = True
                if max(distances) < 1.6 and not bond2_formed:
                    gap = istep * 0.5 - gap
                    bond2_formed = True 
                    break
        else:
            gap = None
            
        return forward_label,backward_label,forward_time+backward_time,gap

    reactive = 0 
    nonreactive = 0
    traverse_time_list = []
    gap_list = []
    for imol in range(128):
        forward_traj = ml.data.molecular_database.from_xyz_file(f'forward_traj{imol+1}.xyz')
        backward_traj = ml.data.molecular_database.from_xyz_file(f'backward_traj{imol+1}.xyz')

        forward_label, backward_label, traverse_time, gap = check_traj(forward_traj,backward_traj)
        if (forward_label == 'product' and backward_label == 'reactant') or (forward_label == 'reactant' and backward_label == 'product'):
            reactive += 1 
            # print(f'{forward_time} + {backward_time} = {forward_time+backward_time}')
            traverse_time_list.append(traverse_time)
            gap_list.append(gap)
        else:
            nonreactive += 1

    print(f"Number of reactive trajectories: {reactive}")
    print(f"Number of non-reactive trajectories: {nonreactive}")
    print(f"Average time: {np.mean(traverse_time_list)} fs")
    print(f"Median time: {np.median(traverse_time_list)} fs")
    print(f"Standard deviation: {np.std(traverse_time_list)} fs")
    print(f"Average time gap: {np.mean(gap_list)} fs")
    print(f"Median time gap: {np.median(gap_list)} fs")
    print(f"Standard deviation: {np.std(gap_list)} fs")

Unfortunately, each DFT trajectory will take roughly 30 minutes and 128 trajectories will take more than 2 days. You definitely do not want to wait that long. Using machine learning model can greatly accelerate the simulations, so we provide a pre-trained model for this reaction :download:`da_energy_ani.npz <files/da_energy_ani.npz>`. You can simply replace the DFT method with this ANI model:

.. code-block:: python 

    method = ml.models.ani(model_file='da_energy_ani.npz',device='cpu') # If you want to use gpu, set device='cuda'

We provide our initial conditions (:download:`init_cond_db.json <files/init_cond_db.json>`) so that you can reproduce our results. The simulations should take about 15 minutes (it will be much faster if you run them in parallel by using MLatom's ``simulation.run_in_parallel`` function!) and you will be able to get the following outputs:

.. code-block:: 

    Number of reactive trajectories: 121 
    Number of non-reactive trajectories: 7
    Average time: 53.35537190082645 fs
    Median time: 53.0 fs
    Standard deviation: 11.622406049624619 fs
    Average time gap: 3.479338842975207 fs
    Median time gap: 3.0 fs
    Standard deviation: 2.885126215935735 fs