Molecular dynamics

This tutorial shows how to run molecular dynamics with MLatom.

We start with the example of running dynamics via input file/command line and later give a tutorial on how to use more versatile Python API. For the tutorial, you can download the files with the initial XYZ coordinates h2_md_kreg_init.xyz, initial XYZ velocities h2_md_kreg_init.vxyz and model file h2_kreg_energies.unf.

The input file h2_md_kreg.inp shows how to run dynamics for the hydrogen molecule in the NVT ensemble using the Nosé–Hoover thermostat:

# h2_md_kreg.inp
MD                                # 1. requests molecular dynamics
initConditions=user-defined       # 2. use user-defined initial conditions
initXYZ=h2_md_kreg_init.xyz       # 3. file with initial geometry; Unit: Angstrom
initVXYZ=h2_md_kreg_init.vxyz     # 4. file with initial velocity; Unit: Angstrom/fs
dt=0.3                            # 5. time step; Unit: fs
trun=30                           # 6. total time; Unit: fs
thermostat=Nose-Hoover            # 7. use Nose-Hoover thermostat
ensemble=NVT                      # 8. NVT ensemble
temperature=300                   # 9. Run MD at 300 Kelvin
MLmodelType=KREG                  # 10. KREG model is used
MLprog=MLatomF                    # 11. use KREG implemented in the Fortran part of MLatom
MLmodelIn=h2_kreg_energies.unf    # 12. file with the trained model

The initial coordinates are defined in the XYZ format (Angstrom) as:

2

H 0.0 0.0 1.0
H 0.0 0.0 0.0

The initial velocities are also defined in a simplified XYZ format (Angstrom/fs):

2

0.0 0.0 -0.05
0.0 0.0  0.05

Run the MD simulation.

mlatom h2_md_kreg.inp &> h2_md_kreg.out

After the end of the MD, the trajectory will be saved in the files traj.xyz with the XYZ coordinates, traj.vxyz with XYZ velocities, traj.ekin, traj.epot, and traj.etot with kinetic, potential, and totla energies (Hartree), respectively, and traj.grad with energy gradients (Hartree/Angstrom).

Note

In MD via input file or command line, translational and angular momenta are removed by default. The user cannot keep angular momenta or change the degrees of freedom as in Python API. The linearity of the molecule is checked automatically to ensure correct degrees of freedom.

Initial conditions

Apart from using user-defined initial conditions shown above, the user can also use random velocities via input file. initTemperature is used to control the kinetic energy of the molecule (\(E_\text{kin}=N_\text{DOF}k_\text{B}T_\text{init}/2\)).

MD                                # 1. requests molecular dynamics
initConditions=random             # 2. generate random velocities
initXYZ=h2_md_kreg_init.xyz       # 3. file with initial geometry; Unit: Angstrom
initTemperature=300               # 4. initial temperature
dt=0.3                            # 5. time step; Unit: fs
trun=30                           # 6. total time; Unit: fs
thermostat=Nose-Hoover            # 7. use Nose-Hoover thermostat
ensemble=NVT                      # 8. NVT ensemble
temperature=300                   # 9. Run MD at 300 Kelvin
MLmodelType=KREG                  # 10. KREG model is used
MLprog=MLatomF                    # 11. use KREG implemented in the Fortran part of MLatom
MLmodelIn=h2_kreg_energies.unf    # 12. file with the trained model

The user can also use normal mode sampling to generate initial conditions. We provide two options: Wigner distribution (initConditions=wigner) and quantum harmonic Boltzmann distribution (initConditions=quantum-harmonic-boltzmann).

MD                                            # 1. requests molecular dynamics
method=AIQM1                                  # 2. use AIQM1
initConditions=wigner                         # 3. use Wigner distribution
#initConditions=harmonic-quantum-boltzmann    # 3. use harmonic quantum Boltzmann distribution
normalModeFile=ethanol_freq.json              # 4. molecule with frequencies and normal modes
initTemperature=300                           # 5. initial temperature
trun=30                                       # 6. total time; Unit: fs
dt=0.3                                        # 7. time step; Unit: fs

normalModeFile specifies the json file that saves the molecule object, which should contain information needed for normal mode sampling, i.e., frequencies and normal modes. This file is automatically generated when you do frequency calculations (check freq[your molecule number].json file).

Choosing ensemble and thermostat

MLatom currenly supports NVE and NVT ensembles which can be requested with the ensemble=NVE and ensemble=NVT keywords, respectively. In the case of the NVT ensemble, the user can choose between the Andersen thermostat (thermostat=Andersen) and Nosé–Hoover (thermostat=Nose-Hoover).

Choosing the method or model

The user can benefit from MLatom’s many choices of the methods or models to be used for MD.

Above examples are shown for the KREG model trained by the user and defined with the following keywords:

...
MLmodelType=KREG                  # 10. KREG model is used
MLprog=MLatomF                    # 11. use KREG implemented in the Fortran part of MLatom
MLmodelIn=h2_kreg_energies.unf    # 12. file with the trained model

See manuals and tutorials on how to train such models, e.g., for MACE.

The above lines can be replaced with some of the automatically recognized methods such as ANI-1ccx or AIQM1, e.g., simply as:

...
ANI-1ccx # or AIQM1

The general way to define the QM method is to use method and the corresponding QM program providing its implementations (MLatom uses interfaces to such programs). For example, ab initio (e.g., HF or MP2) or DFT (e.g., B3LYP/6-31G*) can be requested with the input:

...
method=B3LYP/6-31G*     # 2. request running DFT optimization with B3LYP
qmprog=PySCF            # 3. request using PySCF for B3LYP calculations; qmprog=Gaussian can be also used

Note

In principle, our default recomendation is to use AIQM1 as it is usually the most accurate method for affordable cost. However, it is currently only applicable to the compounds with CHNO elements. Another limitation is that to unlock its full potential, we recommend to install the MNDO program for its QM part. The XACS cloud uses Sparrow instead, which currently only provides numerical gradients limiting its applicability for MD. ANI-1ccx is hence a good alternative on the XACS cloud if it can be applied (it has additional limitations to neutral closed-shell compounds in ground state).

Using PyAPI

Running molecular dynamics in PyAPI consists of 4 steps: generating initial conditions, initializing model, initializing thermostat and running dynamics. Here we use a pre-trained KREG model to run MD simulation for \(\text{H}_2\).

Initial conditions

MLatom provides a function ml.generate_initial_conditions to generate initial geometries and velocities for MD, which returns a ml.data.molecular_database class with the ml.data.molecule objects containing these properties. Example below uses initial conditions provided by the user. They can be initial conditions from previous calculations or generated by other softwares.

import mlatom as ml
# Use user-defined initial conditions
mol = ml.data.molecule.from_xyz_file('h2_md_kreg_init.xyz')
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                              generation_method='user-defined',
                                              file_with_initial_xyz_coordinates='h2_md_kreg_init.xyz',
                                              file_with_initial_xyz_velocities='h2_md_kreg_init.vxyz')
init_mol = init_cond_db[0]

The user can also generate random velocities with MLatom via generation_method='random'. The initial_temperature (in Kelvin) can be used to control the kinetic energy of the molecule. The kinetic energy is calculated by \(N_\text{DOF}k_\text{B}T_\text{init}/2\), where \(N_\text{DOF}\) is the degrees of freedom of the molecule, \(k_\text{B}\) is the Boltzmann constant and \(T_\text{init}\) is the initial temperature set by the user. You can also use initial_kinetic_energy argument to directly set the initial kinetic energy in Hartree.

import mlatom as ml
# Generate random velocities given geometry
mol = ml.data.molecule.from_xyz_file('h2_md_kreg_init.xyz')
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                              generation_method='random',
                                              initial_temperature=300,           # Set initial temperature in K
                                              #initial_kinetic_energy=4.75E-4,   # Or set initial kinetic energy in Hartree
                                              number_of_initial_conditions=1)

Note

In MLatom, if generation_method='random' is used, translational and angular momenta (velocities) are removed by default (\(N_\text{DOF}=3N_\text{atoms}-6\) for non-linear molecules). If you want to keep angular momentum, you can use eliminate_angular_momentum=False (\(N_\text{DOF}=3N_\text{atoms}-3\)). By default, for diatomic molecules, \(N_\text{DOF}=3N_\text{atoms}-5\), while for other linear molecules (\(N_\text{atoms}>2\)), \(N_\text{DOF}=3N_\text{atoms}-6\) (but MLatom will check the linearity of the molecule and print a warning message). You can use degrees_of_freedom argument to override the default setting of degrees of freedom in MLatom.

Apart from random velocities, MLatom provides two normal mode sampling methods as well which are required for quasi-classical MD. One is based on the non-sharp Wigner distribution as implemented in Newton-X (see Eq. 20 in the software’s paper). The codes are modified and rewritten in Python. The other one is based on the harmonic quantum Boltzmann distribution as described in this paper. Below we show how to use normal mode sampling and the molecule with frequencies and normal modes can be downloaded here ethanol_freq.json.

import mlatom as ml
eqmol = ml.data.molecule()
eqmol.load('ethanol_freq.json',format='json')
# Sample in Wigner distribution
init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                              generation_method='wigner',
                                              initial_temperature=300,    # Set initial temperature in K
                                              number_of_initial_conditions=1)

# Sample in harmonic quantum Boltzmann distribution
init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                              generation_method='harmonic-quantum-boltzmann',
                                              initial_temperature=300,    # Set initial temperature in K
                                              number_of_initial_conditions=1)

Note

Normal mode sampling needs frequencies and normal modes. Make sure that the molecule object has these properties (basically, just run frequency calculations). For transition states, we recommend using harmonic quantum Boltzmann distribution (make sure your frequency calculations produced only one imaginary frequency). The reaction coordinate is treated seperately.

Method or model

A model or method should be provided to calculate forces for MD simulation. Here, a pre-trained ML model is provided h2_kreg_energies.unf but other choices are possible, e.g., see the commented out lines:

# Initializing model
model = ml.models.kreg(model_file='h2_kreg_energies.unf',ml_program='MLatomF')
# or another ML model, e.g., MACE:
#mymodel = ml.models.mace(model_file='mace.pt')
# or choose one of the predifined (automatically recognized) methods
#model = ml.models.methods(method='ANI-1ccx')
# or QM method, e.g., B3LYP with Gaussian
#model = ml.models.methods(method='B3LYP/6-31G*', program='Gaussian')

Thermostat

A thermostat should be specified unless NVE ensemble (micro-canonical ensemble, the system has constant particle N, constant volume V and constant energy E) is used. Here, the Nosé–Hoover thermostat is initialized in order to run MD in NVT ensemble (canonical ensemble, the system has constant particle N, constant volume V and temperature fluctuating around T).

# Initializing thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300, molecule=init_mol)

Note

Degrees of freedom in Nosé–Hoover thermostat is basically the same as that in initial conditions except for linear molecules. The default degrees of freedom for all linear molecules are \(3N_\text{atoms}-5\). You can still use degrees_of_freedom argument to override the default settings in MLatom.

Advanced settings for the Nosé–Hoover thermostat such as Nosé–Hoover chain length (nose_hoover_chain_length), multiple time step (multiple_time_step), the number of the Yoshida–Suzuki steps (number_of_yoshida_suzuki_steps) and the Nosé–Hoover chain frequency (nose_hoover_chain_frequency) can also be changed. Please refer to the manual for more information.

Andersen thermostat is also available in MLatom.

andersen = ml.md.Andersen_thermostat(temperature=300)

Note

The random collision in Andersen thermostat is applied to all \(3N_\text{atoms}\) degrees of freedom.

You can use gamma to modify the collision rate (in \(\text{fs}^{-1}\)) in Andersen thermostat, which is 0.2 by default.

Dynamics

The last step is running molecular dynamics. The maximum propagation time (in fs) and time step (in fs) should be provided. Here, the maximum propagation time is 30 fs and the time step is 0.3 fs.

# Run molecular dynamics
dyn = ml.md(model=model,
            molecule_with_initial_conditions=init_mol,
            thermostat=nose_hoover,
            ensemble='NVT',
            time_step=0.3,
            maximum_propagation_time=30.0)

The trajectory is saved as an attribute of ml.md, which can be dumped into plain text files and h5md format file.

# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

Bringing it all together

After combining the 4 steps shown above, the code for the entire workflow from setting up the initial conditions to obtaining the dynamics trajectory files look like this:

import mlatom as ml
# Use user-defined initial conditions
mol = ml.data.molecule.from_xyz_file('h2_md_kreg_init.xyz')
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                                generation_method='user-defined',
                                                file_with_initial_xyz_coordinates='h2_md_kreg_init.xyz',
                                                file_with_initial_xyz_velocities='h2_md_kreg_init.vxyz')
init_mol = init_cond_db[0]

# Initializing model
model = ml.models.kreg(model_file='h2_kreg_energies.unf',ml_program='MLatomF')

# Initializing thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300, molecule=init_mol)

# Run molecular dynamics
dyn = ml.md(model=model,
            molecule_with_initial_conditions=init_mol,
            thermostat=nose_hoover,
            ensemble='NVT',
            time_step=0.3,
            maximum_propagation_time=30.0)

# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

print(f"Number of steps in the trajectory: {len(traj.steps)}")

You will find XYZ coordinates in traj.xyz, XYZ velocities in traj.vxyz, kinetic energies in traj.ekin, potential energies in traj.epot, total energies in traj.etot and energy gradients in traj.grad. There are 101 steps in the trajectory (including the initial condition) and the geometry of the last step is:

2

H             0.0000000000000           0.0000000000000           0.8764323118216
H             0.0000000000000           0.0000000000000           0.1235676881784

Stop function

You can define a stop function in MD so that it will stop when some conditions are met (only available via Python API). For example, we show below how to stop MD when the bond length is shorter than 0.9 Angstrom.

import mlatom as ml
# Use user-defined initial conditions
mol = ml.data.molecule.from_xyz_file('h2_md_kreg_init.xyz')
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                                generation_method='user-defined',
                                                file_with_initial_xyz_coordinates='h2_md_kreg_init.xyz',
                                                file_with_initial_xyz_velocities='h2_md_kreg_init.vxyz')
init_mol = init_cond_db[0]

# Initializing model
model = ml.models.kreg(model_file='h2_kreg_energies.unf',ml_program='MLatomF')

# Initializing thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300, molecule=init_mol)

# User-defined stop function
def stop_function(molecule):
    dist_matrix = molecule.get_internuclear_distance_matrix()
    # Get bond length
    bond_length = dist_matrix[0,1]
    if bond_length < 0.9:
        return True
    else:
        return False

# Run molecular dynamics
dyn = ml.md(model=model,
            molecule_with_initial_conditions=init_mol,
            thermostat=nose_hoover,
            ensemble='NVT',
            time_step=0.3,
            maximum_propagation_time=30.0,
            stop_function=stop_function,
            stop_function_kwargs={})

# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj_stop_function', format='plain_text')
traj.dump(filename='traj_stop_function.h5', format='h5md')

print(f"Number of steps in the trajectory: {len(traj.steps)}")

There are 4 steps in the trajectory and the geometry of the last step is:

2

H             0.0000000000000           0.0000000000000           0.9417784217628
H             0.0000000000000           0.0000000000000           0.0582215782372

Note

In every step, the molecule will be passed to the stop function to check whether the molecular dynamics needs to stop. The first argument in the user-defined stop function has to take in a molecule object.

Any questions or suggestions?

If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via email, Slack (preferred), or WeChat (please send an email to request to add you to the XACS user support group).