.. _tutorial_4d:

Learning molecular dynamics: GICnet
===========================

.. image:: files/tutorial_4d/4D.png
    :width: 931 
    :height: 554

Molecules are always in motion, and this motion can be simulated with :ref:`MD <tutorial_md>` but it needs many steps to evaluate and is very slow. We can accelerate this procedure with :ref:`machine learning potentials <tutorial_mlp>` to calculate forces faster. However, machine learning potentials still do not solve the fundamental problem with dynamics: it is iterative, steps cannot be parallelized, quality depends on time step, trajectories are discrete and not smooth.

That's why in `March 2022 <https://doi.org/10.26434/chemrxiv-2022-qf75v>`__, we introduced a novel concept of directly learning dynamics via **4D-spacetime atomistic AI models** (4D models) for short. The idea is very simple but not easy to realize: we predict the nuclear coordinates as a continuous function of time! We can parallelize predictions at different time steps as they do not need to be calculated sequentially. The trajectories also can be obtained with arbitrarily high resolution.

.. raw:: html

   <iframe src="https://player.bilibili.com/player.html?isOutside=true&aid=1156244583&bvid=BV1YZ421T7VE&cid=1618486035&p=1&autoplay=0" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true" height=500 width=800></iframe>

GICnet 
-----------

We managed to pull off a difficult task of creating a deep learning model to learn dynamics and predict the trajectories fast. This model is called GICnet and was `published in JPCL <https://doi.org/10.1021/acs.jpclett.3c01592>`_ in 2023. In effect, GICnet models are analytical representation of molecules and chemical reactions in 4D spacetime!

Amazingly, our model can learn dynamics of molecules of different complexity and even learns cis-trans isomerization dynamics of azobenzene. The trajectories can be obtained with GICnet orders of magnitude faster than with machine learning potentials.

See our paper for more details (please also cite it if you use GICnet):

- Fuchun Ge, Lina Zhang, Yi-Fan Hou, Yuxinxin Chen, Arif Ullah, Pavlo O. Dral*. Four-dimensional-spacetime atomistic artificial intelligence models. *J. Phys. Chem. Lett*. **2023**, 14, 7732–7743. DOI: `10.1021/acs.jpclett.3c01592 <https://doi.org/10.1021/acs.jpclett.3c01592>`_.

.. note::

    In this tutorial, we only talk about the Python API. For use with an input file/command-line, only a specific version of MLatom supports GICnet, which can be found in `GitHub <https://github.com/dralgroup/mlatom/tree/gicnet>`_.

Now, let's move on to the instructions of using GICnet inside MLatom.

Prerequisites
---------------------------

- ``MLatom 3.8.0`` or later
- ``tensorflow v2.17.0`` (no guarantee for other versions)

Training GICnet models
-----------------------

Here we use ethanol molecule.

First we need to generate some trajectories as the training data. Check this :ref:`tutorial <tutorial_md>` for running MD in MLatom.

.. code-block:: python
    
    from mlatom.GICnet import GICnet
    mol = ml.molecule.from_smiles_string('CCO')
    molDB = ml.generate_initial_conditions(
        molecule = mol,
        generation_method = 'random',
        initial_temperature = 300,
        number_of_initial_conditions = 8
    )
    ani = ml.models.methods('ANI-1ccx',tree_node=False)
    trajs = [
        ml.md(
            model=ani,
            molecule=mol,
            time_step=0.05,
            ensemble='NVE',
            maximum_propagation_time=10000,
        ).molecular_trajectory
    ]

Then, we can use these trajectories to train the GICnet model:
    
.. code-block:: python
    
    model = GICnet(
        model_file='GICnet',            # the name of the model file (it's a folder)
        hyperparameters={
            'ICidx': 'ethanol.ic',      # a file that contains the definition of the internal coordinates for ethanol
            'tc': 10.0,                 # the time cutoff
            'batchSize3D': 1024,        # the batch size for the built-in MLP model
            'batchSize4D': 1024,        # the batch size for the 4D model
            'maxEpoch3D': 1024,         # the maximum number of epochs of MLP model training
            'maxEpoch4D': 1024,         # the maximum number of epochs of 4D model training 
        }
    )

    model.train(trajs)

Since GICnet uses internal coordinates to describe the molecule, we need to provide the defintion of the coordinate system. More details can be found in our paper, here is an example for ethanol, for which the definition is saved in file :download:`ethanol.ic <files/tutorial_4d/ethanol.ic>`:

.. code::

    1   0  -1  -2
    2   1   0  -1
    3   1   2   0
    4   1   2   3
    5   1   2   3
    6   2   1   3
    7   2   1   3
    8   2   1   3
    9   3   1   2

Training status will be printed in stdout in real time:

.. code-block:: ruby
    
    ... ...
    ----------------------------------------------------
    epoch 5          1/1             t_epoch:        0.1       
    validate:  0.000650  
    best:      0.000650  
    ----------------------------------------------------
    epoch 6          1/1             t_epoch:        0.1       
    validate:  0.000599  
    best:      0.000599  
    ----------------------------------------------------
    ... ...
    ----------------------------------------------------
    epoch 1          1/1             t_epoch:        1.9       
               D          A          DA         Ep/Ek     
    train:     0.045998   0.049149   0.036743   0.005203  
      d/dt     0.009045   0.009566   0.008752   0.000000  
                                     Loss       0.842420  
    validate:  0.031084   0.024374   0.019558   0.001658  
      d/dt     0.013125   0.010855   0.009795   0.000000  
                                     Loss       0.821911  
    best:      0.031084   0.024374   0.019558   0.001658  
      d/dt     0.013125   0.010855   0.009795   0.000000  
                                     Loss       0.821911  
    ----------------------------------------------------
    epoch 2          1/1             t_epoch:        2.0       
               D          A          DA         Ep/Ek     
    train:     0.036901   0.049857   0.033790   0.005391  
      d/dt     0.008148   0.008873   0.008209   0.000000  
                                     Loss       0.781977  
    validate:  0.034426   0.029851   0.021169   0.003375  
      d/dt     0.012806   0.009719   0.007115   0.000000  
                                     Loss       0.752215  
    best:      0.034426   0.029851   0.021169   0.003375  
      d/dt     0.012806   0.009719   0.007115   0.000000  
                                     Loss       0.752215  
    ----------------------------------------------------
    ... ...
    
Propagate trajectories with GICnet model(s)
-----------------------------------------

Once you have a GICnet model, you can use it to propagate dynamics with lightning-fast speed. 

.. code-block:: python
    
    from mlatom.GICnet import GICnet
    mol=ml.molecule.from_smiles_string('CCO')
    mol= ml.generate_initial_conditions(molecule = mol,
                                        generation_method = 'random',
                                        initial_temperature=300)[0]
                                        
    model = GICnet(model_file='GICnet')
    
    # options are similar to normal MD
    dyn = model.propagate(molecule=mol,
                          maximum_propagation_time=20000, # fs
                          time_step=0.1 # fs
                          )

We highly recommend training multiple models and propagate the dynamics with their ensemble to improve the stability of the predictions:

.. code-block:: python
        
    model = GICnet(model_file=['GICnet_0',
                                'GICnet_1',
                                'GICnet_2',
                                'GICnet_3'])
    
    dyn = model.propagate(molecule=mol,
                          maximum_propagation_time=20000,
                          time_step=0.1)