Quasi-classical molecular dynamics
Quasi-classical molecular dynamics (also known as quasi-classical trajectories (QCT)) accounts for the zero-point energy (ZPE) in contrast to classical dynamics and is very popular in studying chemical reactions (see, e.g., works by Houk et al. in PNAS 2012, 109, 12860 and J. Am. Chem. Soc. 2023, 145, 14446). In this tutorial, we show how to perform such simulations on an example reproducing the above PNAS paper. See also related tutorial on MD.
The implementation details are given in the following work (please cite it alongside other required citations when using this feature):
Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral. Physics-informed active learning for accelerating quantum chemical simulations. arXiv 2024, DOI: 10.48550/arXiv.2404.11811.
Background
The key to performing QCT is to make sure that the initial conditions (geometries and velocities) contain the required ZPE. MLatom supports two normal mode sampling methods of generating such conditions:
Wigner sampling (see Eq. 20 in the corresponding paper), sampling only from the lowest vibrational level. It is also widely used in surface-hopping dynamics and spectra generation
sampling from the harmonic quantum Boltzmann distribution (implemented following this work, also called “thermal sampling” in the manual to its early implementation in the VENUS program). In contrast to Wigner sampling, it samples from higher vibrational levels. This sampling is commonly used for performing QCT investigation of transition states: for all the real normal modes that are perpendicular to the reaction coordinate, the quantum number \(n_{i}\) of normal mode \(i\) is first sampled from a harmonic quantum Boltzmann distribution:
\(p(n_{i})=e^{- n_{i} h v_{i} / k_{\text{B}} T} (1 - e^{-h v_{i} / k_{\text{B}} T})\)
where \(h\) is the Planck constant, \(v_{i}\) is the vibrational frequency, \(k_{B}\) is the Boltzmann constant and \(T\) is the temperature. The coordinates \(Q_{i}\) and momenta \(P_{i}\) are then generated by
\(Q_{i}=A_{i} \text{cos} (2 \pi R_{i})\),
\(P_{i}=-\omega_{i}A_{i} \text{cos} (2 \pi R_{i})\),
here, \(\omega_{i}=2 \pi v_{i}\), \(A_{i}= \sqrt{2E_{i}} / \omega_{i}\) and \(R_{i}\) is a uniform random number on \([0,1]\). The reaction coordinate (corresponding to the normal mode with the imaginary frequency) is fixed while the mass-weighted momentum \(P\) is calculated by \(\pm \sqrt{-2 k_{B} T \text{ln} (1-R)}\), where \(R\) is also a uniform random number on \([0,1]\).
Example
Below we will show you how to run quasi-classical MD in MLatom and reproduce results in the PNAS 2012, 109, 12860 and arXiv.2404.11811 for the Diels-Alder reaction between 1,3-butadiene and ethene.
The general procedure is:
optimize the geometry, in our case do TS optimization,
run frequency calculations which would generate the molecule object file dumped in the JSON format (here we provide this file for you:
da_eqmol.json
, generated for the TS),use the normal modes in the molecule file (in our case
da_eqmol.json
) to generate initial conditions,run dynamics with these conditions,
analyze results. Here, we are going to calculate the average time to traverse the transition zone and the average time gap of C-C bond formation.
See below a full version with Python API and a brief version with the input file (only showing how to do normal mode sampling and propagating trajectories).
Example with input file
Generating initial conditions and running a single trajectory can be done with the following input file example:
MD # 1. requests molecular dynamics
method=UB3LYP/6-31G* qmprog=gaussian # 2. use DFT method (if you have Gaussian). You can also use, e.g., AIQM1 or ANI-1ccx, or B3LYP and qmprog=PySCF...
initConditions=harmonic-quantum-boltzmann # 3. use harmonic quantum Boltzmann distribution
#initConditions=wigner # 3. also possible - use Wigner sampling
normalModeFile=da_eqmol.json # 4. molecule with frequencies and normal modes
initTemperature=300 # 5. initial temperature
trun=150 # 6. total time; Unit: fs
dt=0.5 # 7. time step; Unit: fs
You will get tons of output files for this trajectory (such as xyz coordinates which you can view with VMD, and also h5md trajectory file, etc).
Above example will take you quite long time (30 min for one trajectory), but if you use ML it will be much faster. Here provide a pre-trained model for this reaction da_energy_ani.npz
which would take you less than a minute for the same simulations using the following input file:
MD
MLmodelType=ANI
MLmodelIn=da_energy_ani.npz
initConditions=harmonic-quantum-boltzmann
normalModeFile=da_eqmol.json
initTemperature=300
trun=150
dt=0.5
Example with Python API
A more complete example using Python API:
import mlatom as ml
import os
import numpy as np
# Load molecule with frequencies and normal modes
eqmol = ml.data.molecule()
eqmol.load('da_eqmol.json',format='json')
# Generate initial conditions
if not os.path.exists('init_cond_db.json'):
init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
generation_method='harmonic-quantum-boltzmann',
initial_temperature=298,
number_of_initial_conditions=128)
init_cond_db.dump('init_cond_db.json',format='json')
else:
init_cond_db = ml.data.molecular_database.load('init_cond_db.json',format='json')
# Reverse the velocities to get the backward trajectories
re_init_cond_db = init_cond_db.copy()
for mol in re_init_cond_db:
for atom in mol:
atom.xyz_velocities = -atom.xyz_velocities
# Define method
method = ml.models.methods(method='UB3LYP/6-31G*',program='Gaussian',nthreads=8)
# Run dynamics
for imol in range(128):
if not os.path.exists(f'forward_traj{imol+1}.xyz'):
dyn = ml.md(model=method,
molecule_with_initial_conditions = init_cond_db[imol],
ensemble='NVE',
time_step=0.5,
maximum_propagation_time=150.0)
traj = dyn.molecular_trajectory
traj.dump(filename=f'forward_traj{imol+1}',format='plain_text')
if not os.path.exists(f'backward_traj{imol+1}.xyz'):
dyn = ml.md(model=method,
molecule_with_initial_conditions = re_init_cond_db[imol],
ensemble='NVE',
time_step=0.5,
maximum_propagation_time=150.0)
traj = dyn.molecular_trajectory
traj.dump(filename=f'backward_traj{imol+1}',format='plain_text')
# Analyze trajectories
def bond_length(molecule):
d1 = molecule.internuclear_distance(0,10)
d2 = molecule.internuclear_distance(6,11)
return d1, d2
def check_traj(forward_traj,backward_traj):
forward_label = 'ts'
backward_label = 'ts'
forward_time = -1
backward_time = -1
forward_distances_list = []
backward_distances_list = []
for imol in range(len(forward_traj)):
d1,d2 = bond_length(forward_traj[imol])
min_dist = min(d1,d2)
max_dist = max(d1,d2)
distances = [d1,d2]
forward_distances_list.append(distances)
if min_dist > 5.0:
forward_label = 'reactant'
break
if max_dist < 1.6:
forward_label = 'product'
break
if (min_dist > 2.52 or max_dist < 2.02) and forward_time < 0:
forward_time = imol*0.5
for imol in range(len(backward_traj)):
d1,d2 = bond_length(backward_traj[imol])
min_dist = min(d1,d2)
max_dist = max(d1,d2)
distances = [d1,d2]
backward_distances_list.append(distances)
if min_dist > 5.0:
backward_label = 'reactant'
break
if max_dist < 1.6:
backward_label = 'product'
break
if (min_dist > 2.52 or max_dist < 2.02) and backward_time < 0:
backward_time = imol*0.5
# Check time gap
if (forward_label=='product' and backward_label=='reactant') or (forward_label=='reactant' and backward_label=='product'):
if forward_label == 'product':
distances_list = forward_distances_list
else:
distances_list = backward_distances_list
bond1_formed = False
bond2_formed = False
for istep in range(len(distances_list)):
distances = distances_list[istep]
# print(distances)
if min(distances) < 1.6 and not bond1_formed:
gap = istep * 0.5
bond1_formed = True
if max(distances) < 1.6 and not bond2_formed:
gap = istep * 0.5 - gap
bond2_formed = True
break
else:
gap = None
return forward_label,backward_label,forward_time+backward_time,gap
reactive = 0
nonreactive = 0
traverse_time_list = []
gap_list = []
for imol in range(128):
forward_traj = ml.data.molecular_database.from_xyz_file(f'forward_traj{imol+1}.xyz')
backward_traj = ml.data.molecular_database.from_xyz_file(f'backward_traj{imol+1}.xyz')
forward_label, backward_label, traverse_time, gap = check_traj(forward_traj,backward_traj)
if (forward_label == 'product' and backward_label == 'reactant') or (forward_label == 'reactant' and backward_label == 'product'):
reactive += 1
# print(f'{forward_time} + {backward_time} = {forward_time+backward_time}')
traverse_time_list.append(traverse_time)
gap_list.append(gap)
else:
nonreactive += 1
print(f"Number of reactive trajectories: {reactive}")
print(f"Number of non-reactive trajectories: {nonreactive}")
print(f"Average time: {np.mean(traverse_time_list)} fs")
print(f"Median time: {np.median(traverse_time_list)} fs")
print(f"Standard deviation: {np.std(traverse_time_list)} fs")
print(f"Average time gap: {np.mean(gap_list)} fs")
print(f"Median time gap: {np.median(gap_list)} fs")
print(f"Standard deviation: {np.std(gap_list)} fs")
Unfortunately, each DFT trajectory will take roughly 30 minutes and 128 trajectories will take more than 2 days. You definitely do not want to wait that long. Using machine learning model can greatly accelerate the simulations, so we provide a pre-trained model for this reaction da_energy_ani.npz
. You can simply replace the DFT method with this ANI model:
method = ml.models.ani(model_file='da_energy_ani.npz',device='cpu') # If you want to use gpu, set device='cuda'
We provide our initial conditions (init_cond_db.json
) so that you can reproduce our results. The simulations should take about 15 minutes (it will be much faster if you run them in parallel by using MLatom’s simulation.run_in_parallel
function!) and you will be able to get the following outputs:
Number of reactive trajectories: 121
Number of non-reactive trajectories: 7
Average time: 53.35537190082645 fs
Median time: 53.0 fs
Standard deviation: 11.622406049624619 fs
Average time gap: 3.479338842975207 fs
Median time gap: 3.0 fs
Standard deviation: 2.885126215935735 fs