Train and use models for H\ :sub:`2`\ 
======================================

Using H\ :sub:`2`\  as an example, this tutorial shows how to train and use two types of machine learning potentials:

- :ref:`KREG <KREG>`
- :ref:`TorchANI <TorchANI>`

.. _KREG:

KREG
~~~~~

First, we train a KREG model for H\ :sub:`2`\ . On the command line, prepare the input file :download:`h2_train_KREG.inp <files/h2_train_KREG.inp>`.
Two auxiliary files are needed: :download:`h2.xyz <files/h2.xyz>` with the XYZ geometries of H\ :sub:`2`\  and :download:`E_FCI_451.dat <files/E_FCI_451.dat>`
with the labels (reference FCI/aug-cc-pV6Z energies in Hartree). The trained model will be saved in ``energies.unf``.

.. code-block::

    # h2_train_KREG.inp
    createMLmodel           # Specify the task for MLatom
    MLmodelType=KREG        # Specify the model type
    MLmodelOut=energies.unf # Save model in energies.unf
    XYZfile=h2.xyz          # File with XYZ geometries
    Yfile=E_FCI_451.dat     # The file with FCI energies 
    sigma=opt               # Optimize hyperparameter sigma
    lgSigmaL=-4             # Lower bound of log2(sigma)
    lambda=opt              # Optimize hyperparameter lambda
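
Before starting the training, it can be worth checking that the two auxiliary files are consistent, i.e. that ``h2.xyz`` contains as many geometries as ``E_FCI_451.dat`` contains energies. The following is a minimal stand-alone sketch in plain Python that does not use MLatom. The helper names ``count_xyz_frames`` and ``count_labels`` are hypothetical, and the sketch assumes the usual multi-frame XYZ layout (atom count, comment line, one line per atom) and one label per line.

.. code-block:: python

    from pathlib import Path


    def count_xyz_frames(text):
        """Count the geometries in multi-frame XYZ text."""
        lines = text.splitlines()
        frames = 0
        i = 0
        while i < len(lines):
            if not lines[i].strip():
                i += 1  # skip blank lines between frames
                continue
            natoms = int(lines[i].split()[0])
            end = i + 2 + natoms  # atom count + comment line + atom lines
            if end > len(lines):
                raise ValueError(f"incomplete frame starting at line {i + 1}")
            frames += 1
            i = end
        return frames


    def count_labels(text):
        """Count the labels, assuming one number per non-empty line."""
        values = [float(line.split()[0]) for line in text.splitlines() if line.strip()]
        return len(values)


    if __name__ == "__main__":
        xyz_file, y_file = Path("h2.xyz"), Path("E_FCI_451.dat")
        if xyz_file.exists() and y_file.exists():
            n_geoms = count_xyz_frames(xyz_file.read_text())
            n_labels = count_labels(y_file.read_text())
            print(f"{n_geoms} geometries, {n_labels} labels")
            if n_geoms != n_labels:
                raise SystemExit("geometry and label counts differ")

If the two counts differ, the geometries and energies cannot be paired one to one, so it is better to fix the files before training.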

Then run MLatom with the input file:

.. code-block::

    mlatom h2_train_KREG.inp

In Python, we need the same two auxiliary files.

.. code-block::

    import mlatom as ml

    # load data set
    molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
    molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')

    # define the model
    model = ml.models.kreg(model_file='energies')

    # split data set for optimizing hyperparameters
    subtraining_molDB, validation_molDB = ml.data.sample(
        number_of_splits=2,
        fraction_of_points_in_splits=[0.8, 0.2],
        molecular_database_to_split=molDB,
        sampling='random')

    # optimize hyperparameters
    # lower bound for sigma: 2**-4, the counterpart of lgSigmaL=-4 in the input file
    model.hyperparameters["sigma"].minval = 2**-4
    model.optimize_hyperparameters(subtraining_molecular_database=subtraining_molDB,
                                   validation_molecular_database=validation_molDB,
                                   optimization_algorithm='nelder-mead',
                                   hyperparameters=['lambda', 'sigma'],
                                   training_kwargs={'property_to_learn': 'energy'},
                                   prediction_kwargs=None)
    lmbd = model.hyperparameters['lambda'].value
    sigma = model.hyperparameters['sigma'].value
    print(f'Optimized hyperparameters: lambda={lmbd}, sigma={sigma}')

    # train the final model
    model.train(molecular_database=molDB, property_to_learn='energy')
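
The workflow above (split the data, choose ``sigma`` and ``lambda`` on the validation set, then retrain on all points) can be illustrated without MLatom. KREG is a kernel ridge regression model with a Gaussian kernel, and the following is a rough plain-Python sketch of that idea, not MLatom's implementation. It makes several simplifying assumptions: the data are a toy Morse-like curve with made-up parameters rather than the FCI energies, the bare interatomic distance is used as input instead of KREG's descriptor, and a small grid search stands in for the Nelder-Mead optimizer. All function names are hypothetical.

.. code-block:: python

    import math
    import random


    def kernel(a, b, sigma):
        """Gaussian kernel between two scalar inputs."""
        return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


    def solve(matrix, rhs):
        """Solve a linear system by Gaussian elimination with partial pivoting."""
        n = len(rhs)
        aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
            aug[col], aug[pivot] = aug[pivot], aug[col]
            for r in range(col + 1, n):
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
        x = [0.0] * n
        for i in reversed(range(n)):
            tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (aug[i][n] - tail) / aug[i][i]
        return x


    def train(xs, ys, sigma, lam):
        """Return regression coefficients alpha from (K + lam * I) alpha = y."""
        k = [[kernel(a, b, sigma) + (lam if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        return solve(k, ys)


    def predict(x, xs, alphas, sigma):
        """Predict the label at x as a weighted sum of kernel values."""
        return sum(alpha * kernel(x, xi, sigma) for alpha, xi in zip(alphas, xs))


    def rmse(xs_val, ys_val, xs, alphas, sigma):
        """Root-mean-square error of the model on a validation set."""
        errors = [(predict(x, xs, alphas, sigma) - y) ** 2 for x, y in zip(xs_val, ys_val)]
        return math.sqrt(sum(errors) / len(errors))


    # Toy data: a Morse-like curve with made-up parameters (not the FCI energies).
    rs = [0.5 + 0.05 * i for i in range(31)]
    es = [(1.0 - math.exp(-(r - 0.74))) ** 2 for r in rs]

    # Random 80/20 split into subtraining and validation sets.
    indices = list(range(len(rs)))
    random.Random(0).shuffle(indices)
    n_sub = round(0.8 * len(indices))
    sub, val = indices[:n_sub], indices[n_sub:]
    r_sub, e_sub = [rs[i] for i in sub], [es[i] for i in sub]
    r_val, e_val = [rs[i] for i in val], [es[i] for i in val]

    # Pick the (sigma, lambda) pair with the lowest validation RMSE.
    best_rmse, best_sigma, best_lam = min(
        (rmse(r_val, e_val, r_sub, train(r_sub, e_sub, sigma, lam), sigma), sigma, lam)
        for sigma in [2.0 ** p for p in range(-4, 3)]
        for lam in [1e-8, 1e-6, 1e-4]
    )
    print(f"best sigma={best_sigma}, lambda={best_lam}, validation RMSE={best_rmse:.2e}")

    # Retrain on all points with the selected hyperparameters.
    final_alphas = train(rs, es, best_sigma, best_lam)

The validation set is used only to choose the hyperparameters. The final model is then trained on all points, which mirrors the last ``model.train`` call above.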

Now we can use the model. On the command line, prepare the input file :download:`h2_opt_KREG.inp <files/h2_opt_KREG.inp>` for geometry optimization.
We also need the initial geometry of H\ :sub:`2`\  (:download:`h2_init.xyz <files/h2_init.xyz>`) and the model trained in
the previous step (``energies.unf``).

.. code-block::

    # h2_opt_KREG.inp
    geomopt                # Request geometry optimization
    MLmodelType=KREG       # with a model of the KREG type
    MLmodelIn=energies.unf # stored in the energies.unf file
    XYZfile=h2_init.xyz    # File with the initial geometry
    optXYZ=eq_KREG.xyz     # File for the optimized geometry
    -------------------------------------------------------------------
    # h2_init.xyz
    2

    H             0.0000000000000           0.0000000000000           0.0000000000000
    H             0.0000000000000           0.0000000000000           0.8000000000000


Perform geometry optimization.

.. code-block::

    mlatom h2_opt_KREG.inp

The optimized geometry is written to ``eq_KREG.xyz``.

.. code-block::

    cat eq_KREG.xyz

In Python, we need to provide the same auxiliary files.

.. code-block::

    import mlatom as ml

    # load initial geometry
    mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
    print(mol.get_xyz_string())

    # load the model
    model = ml.models.kreg(model_file='energies')

    # run geometry optimization
    ml.optimize_geometry(model=model, molecule=mol, program='ASE')
    print(mol.get_xyz_string())
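
For a diatomic molecule, the result of the optimization is easiest to judge from the bond length. The following is a small sketch in plain Python that computes the distance between the first two atoms of a single-frame XYZ string. The helper ``bond_length`` is hypothetical and not part of MLatom, and the coordinates are assumed to be in Angstrom as usual for XYZ files.

.. code-block:: python

    import math


    def bond_length(xyz_text):
        """Distance between the first two atoms of a single-frame XYZ string."""
        lines = xyz_text.strip().splitlines()
        natoms = int(lines[0].split()[0])
        atom_lines = lines[2:2 + natoms]  # skip atom count and comment line
        coords = [tuple(float(v) for v in line.split()[1:4]) for line in atom_lines]
        return math.dist(coords[0], coords[1])


    # Initial geometry from h2_init.xyz, used here as a self-contained example.
    initial = """2

    H 0.0 0.0 0.0
    H 0.0 0.0 0.8
    """
    print(f"initial H-H distance: {bond_length(initial):.4f}")

The same function can be applied to the contents of ``eq_KREG.xyz`` or, assuming it returns plain XYZ text, to the output of ``mol.get_xyz_string()``. For reference, the experimental equilibrium bond length of H\ :sub:`2`\  is about 0.74 Angstrom.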

.. _TorchANI:

TorchANI
~~~~~~~~~

Besides the KREG model, we can use other machine learning potentials, e.g., the ANI model. As above,
on the command line, prepare the input file :download:`h2_train_ANI.inp <files/h2_train_ANI.inp>` and the auxiliary files
(:download:`h2.xyz <files/h2.xyz>`, :download:`E_FCI_451.dat <files/E_FCI_451.dat>`). The trained model will be saved in ``energies_ani.pt``.

.. code-block::

    # h2_train_ANI.inp
    createMLmodel               # Specify the task for MLatom
    MLmodelType=ANI             # Specify the model type
    MLmodelOut=energies_ani.pt  # Save model in energies_ani.pt 
    XYZfile=h2.xyz              # File with XYZ geometries
    Yfile=E_FCI_451.dat         # File with FCI energies (can be any other property)
    #ani.max_epochs=16          # Uncomment to train for only 16 epochs

Run it:

.. code-block::

    mlatom h2_train_ANI.inp
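
The KREG and ANI training inputs differ only in a few keywords, so when several model types are trained on the same data the input files can also be generated from a script. The following is a minimal sketch in plain Python. The helper ``render_input`` is hypothetical and not part of MLatom, and only keywords that already appear in the input files of this tutorial are used.

.. code-block:: python

    def render_input(task, options):
        """Render a task keyword plus key=value options, one per line."""
        lines = [task]
        lines += [f"{key}={value}" for key, value in options.items()]
        return "\n".join(lines) + "\n"


    # Options shared by both training inputs shown in this tutorial.
    common = {"XYZfile": "h2.xyz", "Yfile": "E_FCI_451.dat"}

    # Output file and model-specific options, copied from the two input files.
    models = {
        "KREG": ("energies.unf", {"sigma": "opt", "lgSigmaL": "-4", "lambda": "opt"}),
        "ANI": ("energies_ani.pt", {}),
    }

    inputs = {}
    for model_type, (model_out, extra) in models.items():
        options = {"MLmodelType": model_type, "MLmodelOut": model_out, **common, **extra}
        inputs[f"h2_train_{model_type}.inp"] = render_input("createMLmodel", options)

    # Print the generated files; writing them to disk is left to the reader.
    for name, text in inputs.items():
        print(f"# {name}")
        print(text)

The generated text keeps the keyword order of the hand-written files and omits the explanatory comments.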

In Python, we need the same two auxiliary files. Note that this example saves the model in ``energies_ani_api.pt`` and limits the training to 16 epochs.

.. code-block::

    import mlatom as ml

    # load data set
    molDB = ml.data.molecular_database.from_xyz_file('h2.xyz')
    molDB.add_scalar_properties_from_file('E_FCI_451.dat', 'energy')

    # define the model
    model = ml.models.ani(model_file='energies_ani_api.pt', hyperparameters={'max_epochs': 16})

    # train the final model
    model.train(molecular_database=molDB, property_to_learn='energy')

Now we can use the model for geometry optimization. On the command line, prepare the input file :download:`h2_opt_ANI.inp <files/h2_opt_ANI.inp>`
and the auxiliary files: the initial geometry of H\ :sub:`2`\  (:download:`h2_init.xyz <files/h2_init.xyz>`) and the model
trained in the previous step (``energies_ani.pt``).

.. code-block::

    # h2_opt_ANI.inp
    geomopt                     # Request geometry optimization
    MLmodelType=ANI             # with a model of the ANI type
    MLmodelIn=energies_ani.pt   # stored in the energies_ani.pt file
    XYZfile=h2_init.xyz         # File with the initial geometry
    optXYZ=eq_ANI.xyz           # File for the optimized geometry

Perform geometry optimization.

.. code-block::

    mlatom h2_opt_ANI.inp

The optimized geometry is written to ``eq_ANI.xyz``.

.. code-block::

    cat eq_ANI.xyz

In Python, we need the initial geometry and the model trained with the Python API (``energies_ani_api.pt``).

.. code-block::

    import mlatom as ml

    # load initial geometry
    mol = ml.data.molecule.from_xyz_file('h2_init.xyz')
    print(mol.get_xyz_string())

    # load the model
    model = ml.models.ani(model_file='energies_ani_api.pt')

    # run geometry optimization
    ml.optimize_geometry(model=model, molecule=mol, program='ASE')
    print(mol.get_xyz_string())