.. _lecture8: ===================================================================== Transition state search and analysis ===================================================================== - :ref:`Slides ` - :ref:`TS optimization ` - :ref:`TS validation with normal modes ` - :ref:`TS validation with IRC ` - :ref:`TS analysis via downhill dynamics ` The first-principles eludication of the reaction mechanism requires extensive simulations in time domain by also treating nuclei quantum mechanically (i.e., beyond Born--Oppenheimer approximation). However, in practice, this is rarely affordable and, hence, most of the simulations have to rely on a painstacking manual search of intermediates and transition states (TSs) to map the reaction profiles. There are new automatic tools reported each year and promissing to solve this problem, but so far practice shows that they all are not yet mature enough to deliver the *routine* simulations. However, it looks like we are on a brink of actually having such tools, potentially replacing the need for human experts doing these cumbersome tasks manually. .. _ml-ts-slides: Slides ------ :download:`Slides <_static/mlatom/8-XACSW2024_20240705_Dral_wm.pdf>`: .. raw:: html .. _ts_search: TS optimization --------------- TS search is one of the most time-consuming and difficult tasks for a computational chemist which requries keen chemical understanding and intuition with calculations. The intuition can be obtained only via many trial-and-error attempts. Once TS found, you can calculate the reaction barrier (difference between the energies of the TS and reactants). MLatom provides several ways to find transition states which you can find in the `documentation `__. In the task below you can try out several of the simplest ones. .. _task8.2: .. admonition:: Example ML-TS-1. Try to find a TS for the Diels--Alder reaction between ethylene and butadiene: .. image:: _static/lecture8/da_ts_unknown.png :width: 300 :align: center :alt: Unknown TS of Diels--Alder reaction The most difficult part would be setting up a reasonable guess for the TS geometry used as an initial guess for TS optimization in MLatom. To simplify your life, we provide it. An input file: .. code:: uaiqm ts xyzfile='16 C 0.48462430 -0.55755495 1.43729151 C 0.48462430 -0.55755495 -1.43729151 C -0.27595797 -1.44977527 0.70359025 C -0.27595797 -1.44977527 -0.70359025 C -0.27595797 1.45086377 0.69299925 C -0.27595797 1.45086377 -0.69299925 H 0.37292526 -0.50748993 2.51767690 H 1.44526264 -0.21636383 1.06867438 H 0.37292526 -0.50748993 -2.51767690 H 1.44526264 -0.21636383 -1.06867438 H -1.05536225 -2.01444047 1.21328943 H -1.05536225 -2.01444047 -1.21328943 H 0.51071931 1.96707995 1.23581344 H -1.20625330 1.32768072 1.23591744 H 0.51071931 1.96707995 -1.23581344 H -1.20625330 1.32768072 -1.23591744 ' optxyz=da_ts_opt.xyz The TS should look something like this (depending on the level details may vary but it is a symmetric TS corresponding to the concerted mechanism): .. image:: _static/lecture8/da_ts.png :width: 800 :align: center :alt: TS of Diels-Alder reaction The optimized coordinates for your reference: .. code-block:: 16 C 0.4984885582383 -0.5338087844381 1.4145928440825 C 0.4960692097735 -0.5384413336966 -1.4181301495364 C -0.2896532359000 -1.4128506592682 0.7068921658160 C -0.2905515192621 -1.4145888148315 -0.7051528877014 C -0.2588083017453 1.4282744245852 0.6874347906976 C -0.2574941435174 1.4288896351987 -0.6919434751381 H 0.3947104660660 -0.4397347928006 2.4878894665433 H 1.4467820702090 -0.2162549498538 1.0088131803715 H 0.3876593222716 -0.4479872886549 -2.4913060754311 H 1.4458716855847 -0.2184181276041 -1.0177581601047 H -1.0783691986254 -1.9558675609816 1.2150705221731 H -1.0799202791492 -1.9588818640283 -1.2105952858916 H 0.5409477501467 1.9087387905544 1.2339324532537 H -1.1792980399036 1.2603193266422 1.2295776489298 H 0.5433702777959 1.9074275278623 -1.2394354082521 H -1.1774768154298 1.2605820031641 -1.2339285873837 Importantly, the reaction kinetics is very sensitive to the errors in the barrier heights. Unfortunately, they are not obtained with high enough accuracy even with DFT. You may wonder how good are universal MLPs? UAIQM is quite good. Earlier, we `benchmarked `__ and found that ANI-1ccx is not even good for geometry optimizations while AIQM1 is better than semi-empirical methods for both geometry optimizations and energies but it is not even as good as DFT. This is not surprising given that no transition states were in the training data of either method. In addition, you may want to try a newer `ANI-1xnr `__ or other methods specifically trained on reaction data. After the TSs located, it is important to confirm that the found structures actually correspond to the sought chemical transformation. .. _ts_validate_freq: TS validation with normal modes ------------------------------- The first sanity check whether your optimized TS structure is the correct saddle point on the PES connecting reactants and products, is to perform the frequency calculations on it and confirm that it has only one imaginary (negative) frequency. This frequency corresponds to a normal mode which you can visualize and check whether this vibration corresonds to the sought reaction coordinate. In addition, frequency calculations will give you thermochemical properties which you can use to calculate the Gibbs free energy of activation. .. _task8.3: .. admonition:: Example ML-TS-2. For the structure(s) you found in the :ref:`previous task `, calculate frequencies: .. code-block:: freq uaiqm xyzfile='16 C 0.4984885582383 -0.5338087844381 1.4145928440825 C 0.4960692097735 -0.5384413336966 -1.4181301495364 C -0.2896532359000 -1.4128506592682 0.7068921658160 C -0.2905515192621 -1.4145888148315 -0.7051528877014 C -0.2588083017453 1.4282744245852 0.6874347906976 C -0.2574941435174 1.4288896351987 -0.6919434751381 H 0.3947104660660 -0.4397347928006 2.4878894665433 H 1.4467820702090 -0.2162549498538 1.0088131803715 H 0.3876593222716 -0.4479872886549 -2.4913060754311 H 1.4458716855847 -0.2184181276041 -1.0177581601047 H -1.0783691986254 -1.9558675609816 1.2150705221731 H -1.0799202791492 -1.9588818640283 -1.2105952858916 H 0.5409477501467 1.9087387905544 1.2339324532537 H -1.1792980399036 1.2603193266422 1.2295776489298 H 0.5433702777959 1.9074275278623 -1.2394354082521 H -1.1774768154298 1.2605820031641 -1.2339285873837 ' If you done everything properly, the normal mode corresponding to the reaction coordinate would look like: .. image:: _static/lecture8/da_opt.gif :width: 600 :align: center :alt: TS of Diels-Alder reaction [Credit: taken from `MLatom's documentation `__] .. note:: If you wonder how to make such an animation: we used Gaussian to run frequency calculations with our model, and the Gaussian output can be visualized with Gaussview or ChemCraft. We can't provide Gaussian on the cloud computing though. We will provide a work-around soon -- please subscribe to our updates on WeChat! The output will contain one negative frequency: .. code-block:: Mode Frequencies Reduced masses Force Constants (cm^-1) (AMU) (mDyne/A) 1 -599.9203 8.3844 -1.7779 2 55.2239 5.8098 0.0104 3 137.6350 2.0276 0.0226 4 259.0093 2.8785 0.1138 5 364.0193 3.9539 0.3087 6 375.8171 2.3977 0.1995 ... .. _irc: Intrinsic reaction coordinate (IRC) -------------------------------------------------------------- The more rigorous check for the correctness of the optimized TS structure is following intrinsic reaction coordinate (IRC). IRC follows the minimum-energy path from TS to reactants and products and you can analyze the structures along this path as well the energy profile. Performing IRC is more rigorous than simply checking the normal mode corresponding to the reaction coordinate. If you have a Gaussian program and installed MLatom locally, the input file to MLatom is very simple: .. code:: # first of all, choose the method! IRC xyzfile=... # file with the optimized TS structure Then you can check whether the IRC leads to the reactants and products you expect. You can also check the energy profile along the reaction coordinate which may look like this (details will vary depending your level): .. image:: _static/lecture8/da_opt_irc.png :width: 600 :align: center :alt: IRC for TS of Diels--Alder reaction [Credit: taken from `MLatom's documentation `__] .. _ts_analysis_downhill_md: TS analysis via downhill dynamics --------------------------------- .. raw:: html The problem with IRC that it goes along the minimum-energy path while real reactions never exactly stick to it. The molecules always are in complex dynamical motions and hence it is more appropriate to perform dynamics to find reaction events. Unfortunately, reaction time scales are typically too long to be directly assessed in MD. The compromise is to find TS and start MD *downhill* from this TS to products. Also, typically, we need to sample geometries around TS to reflect that the reaction never exactly passes through the TS. A powerful and common tool for this is :ref:`quasi-classical trajectories ` which we already encountered before. A special consideration has to be taken that during the sampling of initial conditions, they are sampled from the normal modes orthogonal to the reaction coordinate. This is done automatically in MLatom. Since the trajectories are very slow, it is better to use :ref:`active learning ` procedures to generate a good ML potential and accelerate such downhill MD exploration. Also, you would need to run thousands of trajectories in both forward and backward directions to properly analyze the distribution of the reaction outcomes, particularly for the reactions with post-TS bifurcation (very interesting class of reactions where you can observe formation of several different products from the same TS!). You can try yourself to run downhill MD from the TS: .. include:: materials/lecture8/task8.4/task8.4_qct_da.inc When you run many such dynamics you can obtain beautiful reaction trajectories like we `obtained `__: .. image:: _static/lecture8/da_qct.gif :width: 200 :align: center :alt: IRC for TS of Diels-Alder reaction For larger reactions, you can even find some roaming like we did for the `Diels--Alder reaction of fullerene with 2,3-dimethyl-1,3-butadiene `__: .. image:: _static/lecture8/da_c60.gif :width: 400 :align: center :alt: IRC for TS of Diels-Alder reaction You can only obtain such insights with thousands of long trajectories which are much faster with MLPs. .. _uaiqm_qct_ts: Quasiclassical trajectories analysis on the bifurcating pericyclic reaction ----------------------------------------------------------------------------------------------------------------------------------------------------------- If you ever wondered how to get the computation of the beautiful reaction mechanisms, below are instructions from MLatom's documentation prepared by Yuxinxin Chen. In this section, we will reproduce the quasiclassical trajectories for bifurcating pericyclic reaction reported in `JACS (2017) 139, 8251 `__, where the human experts manually chosen the B3LYP-D3/6-31G* level of theory and got the distribution of the products with the resulting 117 reactive trajectories. With the UAIQM models, we can quickly shoot 1000 trajectories and get 867 reactive ones, while the energy profile is more accurate than B3LYP which is known for low reliability on energy prediction. .. image:: materials/qct_uaiqm.png Below we provide instructions on the procedure to **propagate one trajectory in less than one minute** with our UAIQM models. Time budget is reduced to 0.1 s for UAIQM method. The same setting for quasi-classical MD will be used as that in the original paper, i.e. 500 fs long trajectory with 1 fs time step starting from region near ambimodal transition state. Initial transition state file can be downloaded: :download:`TS1.xyz `. .. code:: import mlatom as ml # load initial transitiona state init_ts1 = ml.data.molecule.from_xyz_file('TS1.xyz') # choose optimal UAIQM method uaiqm_optimal = ml.models.uaiqm(method='uaiqm_optimal', verbose=False) uaiqm_optimal.select_optimal( molecule=init_ts1, nCPUs=1, time_budget='0.1s' ) # optimize geometry and get frequencies geomopt = ml.optimize_geometry( model=uaiqm_optimal, initial_molecule=init_ts1, ts=True, program='geometric' ) optmol_ts1 = geomopt.optimized_molecule ml.freq(model=uaiqm_optimal, molecule=optmol_ts1) # get initial conditions for ifreq in range(len(optmol_ts1.frequencies)): if optmol_ts1.frequencies[ifreq] > 0 and optmol_ts1.frequencies[ifreq] < 100: optmol_ts1.frequencies[ifreq] = 100 init_cond_db = ml.generate_initial_conditions(molecule=optmol_ts1, generation_method='harmonic-quantum-boltzmann', number_of_initial_conditions=1000, initial_temperature=298, use_hessian=False) # dump forward initial conditions init_cond_db.dump(f'TS1_incond_298.json',format='json') # Reverse the velocities to get the backward trajectories re_init_cond_db = init_cond_db.copy() for mol in re_init_cond_db: for atom in mol: atom.xyz_velocities = -atom.xyz_velocities # dump backward initial conditions re_init_cond_db.dump(f're_TS1_incond_298.json',format='json') # propagate one forward trajectory init_mol = init_cond_db[0] dyn = ml.md(model=uaiqm_optimal, molecule_with_initial_conditions = init_mol, ensemble='NVE', time_step=1, maximum_propagation_time=500, ) traj = dyn.molecular_trajectory traj.dump(filename=f'forward_temp298_mol0', format='plain_text') # propagate backward trajectory re_init_mol = re_init_cond_db[0] dyn = ml.md(model=uaiqm_optimal, molecule_with_initial_conditions = re_init_mol, ensemble='NVE', time_step=1, maximum_propagation_time=500, ) traj = dyn.molecular_trajectory traj.dump(filename=f'backward_temp298_mol0', format='plain_text') # check reactive trajectories import numpy as np def get_bond_data(traj_xyz, bonds): nmol = len(traj_xyz) data = np.zeros((nmol, len(bonds))) for ii in range(nmol): mol = traj_xyz[ii] bl_list = [] for bond in bonds: bl = mol.internuclear_distance(bond[0], bond[1]) bl_list.append(bl) data[ii] = bl_list return data def check_reactive(traj_bond_data): P1 = False # C2-C15 C7-C24 P2 = False # C2-C15 C3-C13 for ii in range(traj_bond_data.shape[0]): bond_data = traj_bond_data[ii] if bond_data[0] <= 1.6 and bond_data[1] <= 1.6: P2 = True if bond_data[0] <= 1.6 and bond_data[2] <= 1.6: P1 = True return P1, P2 bonds = [(1,14),(2,12),(6,23)] traj_forward = ml.data.molecular_database.from_xyz_file(f'forward_temp298_mol0.xyz') traj_backward = ml.data.molecular_database.from_xyz_file(f'backward_temp298_mol0.xyz') traj = ml.data.molecular_database() traj_forward.molecules.reverse() traj.molecules = traj_forward.molecules + traj_backward.molecules P1, P2 = check_reactive(get_bond_data(traj, bonds=bonds)) print(f'P1 in trajectory: {P1}\n') print(f'P2 in trajectory: {P2}\n')