.. _tutorial_tl:

Transfer learning
====================================

Transfer learning (TL) is a widely used technique in machine learning that helps you train better neural network models. The basic idea is to reuse the knowledge (parameters) of a model pre-trained on a different task by fixing or changing some of its layers and then fitting the model to the new task. There are several reasons to use this technique:

- **Better training performance.** A pre-trained model gives you a good starting point for training, which helps convergence and may improve the accuracy of the final model.

- **Less data hungry.** Training from a pre-trained model does not require as much data as training from scratch, because the pre-trained model already contains a lot of information from its original training data. This can be critical when access to data for the new task is limited.

Note that these advantages can only be realized if you choose the pre-trained model and train it properly. For example, you definitely do not want to start from a completely unrelated pre-trained model.

Here we show two tutorials:

- :ref:`fine-tuning the universal models `
- :ref:`generating all data and models from scratch `

.. _tutorial_tl_uni:

Transfer learning from the universal models
-------------------------------------------

The training data and initial geometry for the frequency calculation can be downloaded here: :download:`CH3NO2_MP2_avtz.json ` and :download:`ch3no2_init.xyz `.

.. include:: tutorial_tl_uni.inc

.. _tutorial_tl_scratch:

Transfer learning from scratch
------------------------------

A typical transfer learning scheme includes:

1. **Obtain the pre-trained model**
2. **Fix/change layers**
3. **Retrain**

The first and last steps can easily be done with the normal MLatom routines (via ``model.__init__(model_file)`` and ``model.train()``). The second step is a little more involved: we need to modify ``model.model``. Although platforms such as PyTorch and TensorFlow are flexible enough to let users modify a model as they wish, this modification still requires somewhat deeper knowledge of the model's architecture. Fortunately, we provide shortcut methods in our interface to TorchANI that make it easy to fix some of the layers in a model (no quick methods for adding layers or for other interfaces yet). Let's see how it works, starting with a minimal sketch of the three steps below and then a complete worked example.

- :download:`tutorial notebook `
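As a quick orientation, the three steps above might look like the following with the ANI interface. This is a schematic sketch rather than a ready-to-run script: ``pretrained.pt`` is a hypothetical model file and ``new_data`` stands for a molecular database with the higher-level reference values; both are prepared explicitly in the worked example below.

.. code-block:: python

    import mlatom as ml

    # 1. Obtain the pre-trained model by loading its model file
    #    ('pretrained.pt' is a placeholder; copy the file first if you want to
    #    keep the original pre-trained model untouched).
    model = ml.models.ani(model_file='pretrained.pt')

    # 2. Fix/change layers, e.g., freeze the first and last linear layers
    #    of the atomic networks.
    model.fix_layers([[0, 6]])

    # 3. Retrain on the new, higher-level data with a smaller learning rate.
    #    'new_data' is a placeholder for an ml.molecular_database containing
    #    the reference energies for the new task.
    model.train(molecular_database=new_data,
                property_to_learn='energy',
                hyperparameters={'learning_rate': 0.0001},
                reset_optimizer=True)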
ANI TL model
--------------

Here we just use the hydrogen molecule as a simple example. The Jupyter notebook can be downloaded from the bottom of this page.

First, we need to generate some data.

.. code-block:: python

    import mlatom as ml
    import numpy as np
    import matplotlib.pyplot as plt

    # prepare H2 geometries with bond lengths ranging from 0.5 to 5.0 Å
    xyz = np.zeros((451, 2, 3))
    xyz[:, 1, 2] = np.arange(0.5, 5.01, 0.01)
    z = np.ones((451, 2)).astype(int)
    molDB = ml.molecular_database.from_numpy(coordinates=xyz, species=z)

    # calculate HF energies
    hf = ml.models.methods(method='HF/STO-3G', program='PySCF')
    hf.predict(molecular_database=molDB, calculate_energy=True)
    molDB.add_scalar_properties(molDB.get_properties('energy'), 'HF_energy')  # save HF energy with a new name

    # calculate CISD energies
    cisd = ml.models.methods(method='CISD/cc-pVDZ', program='PySCF')
    cisd.predict(molecular_database=molDB, calculate_energy=True)
    molDB.add_scalar_properties(molDB.get_properties('energy'), 'CISD_energy')

Here we use ``HF/STO-3G`` and ``CISD/cc-pVDZ`` to generate two levels of energies for the geometries with H-H bond lengths ranging from 0.5 to 5.0 Å.

.. image:: tutorial_files/tl/energy.png
    :width: 640
    :height: 480

Then we can train an ANI model on the HF energies to serve as the pre-trained model.

.. code-block:: python

    # train an ANI model on the HF energies
    ani = ml.models.ani(model_file='ANI-HF.pt', verbose=False)
    ani.train(molecular_database=molDB, property_to_learn='HF_energy')

    # predict with the trained ANI model
    ani.predict(molecular_database=molDB, property_to_predict='ANI_HF_energy')

The model has the following default structure:

.. code-block:: python

    Sequential(
      (0): AEVComputer()
      (1): ANIModel(
        (H): Sequential(
          (0): Linear(in_features=48, out_features=160, bias=True)
          (1): CELU(alpha=0.1)
          (2): Linear(in_features=160, out_features=128, bias=True)
          (3): CELU(alpha=0.1)
          (4): Linear(in_features=128, out_features=96, bias=True)
          (5): CELU(alpha=0.1)
          (6): Linear(in_features=96, out_features=1, bias=True)
        )
      )
    )

Now let's do transfer learning. For this, you need to copy the file ``ANI-HF.pt`` with the model pre-trained on HF energies to ``ANI-HF-TL.pt``, which will be used for fine-tuning on CISD energies.

.. code-block:: python

    # fix some of the layers
    ani = ml.models.ani(model_file='ANI-HF-TL.pt', verbose=False)
    ani.fix_layers([[0, 6]])

    # transfer learning with every 40th geometry of the data
    step = 40
    val = molDB[::step][::10]
    sub = ml.molecular_database([mol for mol in molDB[::step] if mol not in val])
    ani.energy_shifter.self_energies = None  # let the model recalculate the atomic self-energies
    ani.train(molecular_database=sub,
              validation_molecular_database=val,
              property_to_learn='CISD_energy',
              hyperparameters={'learning_rate': 0.0001},
              reset_optimizer=True)

    # predict with the TL model
    ani.predict(molecular_database=molDB, property_to_predict='ANI_TL_energy')

Here, we use only 12 CISD data points (subtraining/validation: 10/2) to train the TL model, and we fix the first and the last linear layers of the model with ``ani.fix_layers([[0, 6]])`` (see :class:`mlatom.interfaces.torchani_interface.ani.fix_layers`). We also set the initial learning rate to a smaller value to make the training less aggressive.
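If you want to double-check which parameters were actually frozen by ``fix_layers``, you can inspect the underlying network. The sketch below assumes that the TorchANI network is exposed as ``ani.model`` (following the ``model.model`` convention mentioned earlier) and that it behaves like a standard PyTorch module:

.. code-block:: python

    # list every parameter of the underlying network together with a flag
    # showing whether it will still be updated during retraining
    # (requires_grad=False means the parameter is frozen)
    for name, parameter in ani.model.named_parameters():
        print(name, parameter.requires_grad)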
Next, let's train another model on the same data from scratch.

.. code-block:: python

    # train an ANI model directly on the CISD energies
    ani_cisd = ml.models.ani(model_file='ANI_CISD.pt', verbose=False)
    ani_cisd.train(molecular_database=sub,
                   validation_molecular_database=val,
                   property_to_learn='CISD_energy')

    # predict with the trained ANI model
    ani_cisd.predict(molecular_database=molDB, property_to_predict='ANI_CISD_energy')

Now let's check the results.

.. code-block:: python

    # plot the energies
    plt.plot(xyz[:, 1, 2], molDB.get_properties('HF_energy'), label='HF/STO-3G')
    plt.plot(xyz[:, 1, 2], molDB.get_properties('CISD_energy'), label='CISD/cc-pVDZ')
    plt.plot(xyz[:, 1, 2], molDB.get_properties('ANI_HF_energy'), label='ANI-HF')
    plt.plot(xyz[:, 1, 2], molDB.get_properties('ANI_TL_energy'), label='ANI-TL')
    plt.plot(xyz[:, 1, 2], molDB.get_properties('ANI_CISD_energy'), label='ANI-CISD')
    plt.plot(sub.xyz_coordinates[:, 1, 2], sub.get_properties('CISD_energy'), 'o', label='TL subtraining')
    plt.plot(val.xyz_coordinates[:, 1, 2], val.get_properties('CISD_energy'), 'o', label='TL validation')
    plt.legend()
    plt.xlabel('H-H bond length (Å)')
    plt.ylabel('energy (hartree)')
    plt.show()

First, we can see that the ANI-HF model trained on all the data agrees very well with its reference. At the CISD level, the transfer-learned model behaves much better than the model trained directly from scratch.

.. image:: tutorial_files/tl/energy-TL.png
    :width: 640
    :height: 480

This example shows the power of transfer learning: a lower-level pre-trained model can boost the higher-level model even with a small training set.
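If you want to quantify this comparison for your own runs, you can also compute error metrics against the CISD reference directly from the properties stored in ``molDB``. Below is a small sketch using NumPy; it only reuses arrays already produced above and assumes no additional MLatom functionality.

.. code-block:: python

    import numpy as np

    # CISD reference energies and the two ML predictions (in hartree)
    ref      = np.array(molDB.get_properties('CISD_energy'))
    e_tl     = np.array(molDB.get_properties('ANI_TL_energy'))
    e_direct = np.array(molDB.get_properties('ANI_CISD_energy'))

    # root-mean-square errors with respect to the CISD reference
    rmse_tl     = np.sqrt(np.mean((e_tl - ref)**2))
    rmse_direct = np.sqrt(np.mean((e_direct - ref)**2))
    print(f'RMSE of ANI-TL:   {rmse_tl:.6f} hartree')
    print(f'RMSE of ANI-CISD: {rmse_direct:.6f} hartree')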