.. _lecture7:

=====================================================================
Spectroscopy
=====================================================================

Spectroscopy simulations are essential for bridging the theoretical understanding of molecular structure and reactivity with experimental measurements. QM spectroscopy is typically quite expensive, and here ML can help either as a faster surrogate for QM methods or by predicting spectra directly. There is also long-standing interest in using ML to interpret spectra and, most importantly, to obtain structures from spectra, but this is much more difficult and less mature.

We can't cover all types of spectroscopy here, but we show on several common types how they can be simulated and what to pay attention to when setting up the simulations.

- :ref:`Slides <ml-spectra-slides>`
- :ref:`(Ro)vibrational spectra <vibr>`

  - :ref:`IR from frequency calculations <ir_freq>`
  - :ref:`IR and power spectra from MD <ir_md>`

- :ref:`UV/vis absorption spectra <uvvis>`

  - :ref:`UV/vis spectra from single-point calculations <uvvis_sp>`
  - :ref:`Nuclear-ensemble approach <uvvis_nea>`

- :ref:`Emission spectra <emission>`
- :ref:`Two-photon absorption spectra <tpa>`

.. _ml-spectra-slides:

Slides
------

:download:`Slides <_static/mlatom/5-XACSW2024_20240704_Dral_wm.pdf>`:

.. raw:: html

.. _vibr:

(Ro)vibrational spectra
-----------------------

Rotational-vibrational (rovibrational) spectroscopy, i.e., infrared (IR) and Raman spectroscopy, provides a direct way to probe molecular structure. Hence, simulating such spectra is of great practical importance. Accurate simulations with QM methods are, however, rather expensive, and ML can be used to accelerate them significantly.

Many approaches have been suggested for simulating rovibrational spectra of molecules. Most of them require a representation of the molecular PES, which allows calculating vibrational frequencies.
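
Computing vibrational frequencies from a PES boils down to a harmonic analysis of the Hessian. The sketch below (not MLatom's implementation) mass-weights a Cartesian Hessian, diagonalizes it, and converts the eigenvalues to wavenumbers. It assumes a Hessian in Hartree/Bohr² and masses in amu; the function name ``harmonic_wavenumbers`` and the toy force constant are illustrative, and translations and rotations are not projected out, so they show up as (near-)zero frequencies.

.. code:: python

   import numpy as np

   # Unit conversions (CODATA 2018 values)
   HARTREE_TO_J = 4.3597447222071e-18   # J per Hartree
   BOHR_TO_M = 5.29177210903e-11        # m per Bohr
   AMU_TO_KG = 1.66053906660e-27        # kg per atomic mass unit
   C_CM_PER_S = 2.99792458e10           # speed of light in cm/s


   def harmonic_wavenumbers(hessian, masses):
       """Harmonic wavenumbers (cm^-1) and normal modes from a Cartesian Hessian.

       hessian: (3N, 3N) array of second energy derivatives in Hartree/Bohr^2
       masses:  (N,) atomic masses in amu
       Imaginary frequencies are returned as negative wavenumbers.
       """
       m = np.repeat(np.asarray(masses, dtype=float), 3)   # one mass per Cartesian coordinate
       mw_hessian = np.asarray(hessian, dtype=float) / np.sqrt(np.outer(m, m))
       eigvals, eigvecs = np.linalg.eigh(mw_hessian)        # symmetric matrix -> eigh
       # Eigenvalues are omega^2 in Hartree/(Bohr^2 amu); convert them to s^-2
       omega2 = eigvals * HARTREE_TO_J / (BOHR_TO_M**2 * AMU_TO_KG)
       omega = np.sign(omega2) * np.sqrt(np.abs(omega2))    # angular frequency in rad/s
       return omega / (2.0 * np.pi * C_CM_PER_S), eigvecs   # wavenumbers in cm^-1


   # Toy check with hypothetical numbers: a diatomic with a harmonic bond along x
   k = 0.37                                # force constant in Hartree/Bohr^2
   hess = np.zeros((6, 6))
   hess[0, 0] = hess[3, 3] = k
   hess[0, 3] = hess[3, 0] = -k
   nu, modes = harmonic_wavenumbers(hess, [1.008, 1.008])
   print(np.round(nu, 1))  # five (near-)zero modes and one stretching mode

The columns of ``modes`` are the mass-weighted normal-mode vectors. IR intensities would additionally need the dipole derivatives along these modes, which is the extra information discussed next.
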

However, calculating intensities also requires knowledge of dipole moments and, often, of their first- and second-order derivatives, which are not always available (particularly with many ML methods). The two big classes of approaches considered here are :ref:`static <ir_freq>` and :ref:`dynamic <ir_md>`. Static calculations are faster but miss the effect of the conformer distribution and often cannot adequately describe anharmonic effects. Dynamic approaches, on the other hand, often describe quantum nuclear effects inadequately.

.. _ir_freq:

IR from frequency calculations
++++++++++++++++++++++++++++++

Vibrational spectra can be calculated statically by performing single-point calculations that require the costly evaluation of the Hessian matrix (second-order energy derivatives). The (mass-weighted) Hessian is diagonalized to obtain the frequencies and normal modes of vibration. As you know, these properties are obtained by simply performing frequency calculations with the ``freq`` keyword in MLatom. However, if you analyze the output file, you will discover that no intensities are predicted. Indeed, frequencies are just part of the answer: you also need intensities to obtain an IR spectrum that can be compared to experiment. Calculating IR intensities requires evaluating the first-order derivatives of the dipole moment with respect to the normal-mode coordinates. This is much more expensive and is also not available in all models. To request such calculations, use the ``ir`` keyword in MLatom:

.. raw:: html

Simulating infrared spectra with UAIQM methods using an input file in MLatom is ultra easy: it takes only 3 lines.

.. include:: materials/lecture7/task7.1.inc

Vibrational analysis and IR intensities can be found in the MLatom output files:

.. code::

   ==============================================================================
                        Vibration analysis for molecule 1
   ==============================================================================
   Multiplicity: 1
   This is a nonlinear molecule
     Mode   Frequencies  Reduced masses  Force Constants  IR intensities
                (cm^-1)           (AMU)        (mDyne/A)        (km/mol)
        1      241.0374          1.1507           0.0394         86.1148
        2      297.4762          1.0709           0.0558         54.1970
        3      418.2334          2.5803           0.2659         12.4199
        4      810.5925          1.0759           0.4165          0.0530
        5      910.3415          2.1135           1.0319         12.0631
        6     1048.6773          1.8779           1.2167         48.5102
        7     1125.4218          2.7007           2.0154         37.0761
        8     1193.0919          1.5109           1.2672          5.4162
        9     1274.9990          1.2567           1.2037         83.6804
       10     1305.5061          1.1150           1.1196          0.0292
       11     1404.9278          1.2383           1.4401          2.1079
       12     1454.1580          1.4707           1.8323         16.1926
       13     1493.5120          1.0405           1.3674          5.3892
       14     1508.6873          1.0534           1.4126          3.9672
       15     1539.9937          1.0936           1.5280          2.0331
       16     2996.6657          1.0554           5.5840         65.1341
       17     3032.2203          1.1085           6.0050         66.4608
       18     3037.2688          1.0350           5.6253         14.7477
       19     3120.1171          1.1012           6.3160         27.1681
       20     3126.3285          1.1032           6.3532         28.7563
       21     3844.2694          1.0659           9.2813         22.8496
   ==============================================================================
                          Thermochemistry for molecule 1
   ==============================================================================
   Selected UAIQM method: uaiqm_wb97x631gp@cc
   Selected version: latest
   Standard deviation of ML contribution :     0.00014563 Hartree   0.09139 kcal/mol
   Baseline contribution                 :  -154.98764057 Hartree
   NN contribution                       :     0.09856558 Hartree
   D4 contribution                       :    -0.00044829 Hartree
   Total energy                          :  -154.88952328 Hartree
   ZPE-exclusive internal energy    at   0 K:   -154.88952 Hartree
   Zero-point vibrational energy            :      0.08015 Hartree
   Internal energy                  at   0 K:   -154.80937 Hartree
   Enthalpy                         at 298 K:   -154.80413 Hartree
   Gibbs free energy                at 298 K:   -154.83475 Hartree
   Atomization enthalpy             at   0 K:      1.25151 Hartree   785.33412 kcal/mol
   ZPE-exclusive atomization energy at   0 K:      1.33166 Hartree   835.63152 kcal/mol
   Heat of formation                at 298 K:     -0.12976 Hartree   -81.42550 kcal/mol
   ==============================================================================

Here is the IR spectrum obtained with UAIQM:

.. image:: _static/lecture7/ir_ethanol_uaiqm_freq.png
   :width: 800
   :align: center
   :alt: Gas-phase IR spectrum of ethanol at UAIQM

It is in quite good agreement with the `experimental spectrum `__ available on NIST (make sure to compare with the gas-phase IR spectrum, because the calculations are done in vacuum!).

.. note::

   Currently, IR intensities can be obtained only with the DFT-based UAIQM methods, i.e., ``uaiqm_wb97x631gp@cc`` and ``uaiqm_wb97xdef2tzvpp@cc``, and with pure DFT methods. AIQM1 is supported only in local installations.

.. _ir_md:

IR and power spectra from MD
++++++++++++++++++++++++++++

Rovibrational spectra can be obtained by simulating the nuclear motion with :ref:`MD `. The advantage is that anharmonic effects are naturally included and the conformational landscape can be properly represented. The disadvantage is that very long trajectories are needed, which are quite expensive to obtain with non-ML approaches. Also, classical MD does not properly account for quantum nuclear effects, which are especially substantial for the light element hydrogen.

Similarly to frequency calculations, if we do not have dipole moments, we can only calculate power spectra, using the autocorrelation function of the velocities in the MD trajectory. If dipole moments are available, you can calculate IR spectra with intensities derived from the fast Fourier transform of the dipole-moment autocorrelation function. That may sound complicated, but in practice, MLatom provides the means to obtain both power and IR spectra by post-processing the MD trajectory (see a `dedicated tutorial `__ for more theory and calculation options). In the task below, you can calculate the IR spectrum from the trajectory and compare it to the spectrum from static calculations.
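
The velocity-autocorrelation route to the power spectrum can be sketched in a few lines of NumPy. This only illustrates the idea and is not MLatom's post-processing code: it assumes velocities stored as an array of shape ``(n_steps, n_atoms, 3)`` saved every ``dt_fs`` femtoseconds, ignores mass weighting, and uses a Hann window (an illustrative choice) to damp the noisy long-time tail of the autocorrelation function before the FFT. The function names are hypothetical. ``autocorrelation`` also accepts a dipole-moment trajectory of shape ``(n_steps, 3)``, but a proper IR line shape needs additional frequency-dependent prefactors that are omitted here.

.. code:: python

   import numpy as np

   C_CM_PER_FS = 2.99792458e-5   # speed of light in cm/fs


   def autocorrelation(x):
       """Time autocorrelation of x with shape (n_steps, ...), summed over all
       non-time axes and averaged over the available time origins."""
       x = np.asarray(x, dtype=float)
       n = x.shape[0]
       flat = x.reshape(n, -1)
       # Zero-padding to 2n turns the circular FFT correlation into a linear one
       f = np.fft.rfft(flat, n=2 * n, axis=0)
       acf = np.fft.irfft(f * np.conj(f), n=2 * n, axis=0)[:n].sum(axis=1)
       return acf / np.arange(n, 0, -1)   # number of pairs contributing to each lag


   def power_spectrum(velocities, dt_fs):
       """Power spectrum (arbitrary units) versus wavenumber (cm^-1) from
       velocities of shape (n_steps, n_atoms, 3) saved every dt_fs femtoseconds."""
       acf = autocorrelation(velocities)
       n = len(acf)
       acf = acf * np.hanning(2 * n)[n:]                      # damp the noisy long-time tail
       spectrum = np.abs(np.fft.rfft(acf))
       wavenumbers = np.fft.rfftfreq(n, d=dt_fs) / C_CM_PER_FS  # 1/fs -> cm^-1
       return wavenumbers, spectrum

As a sanity check, synthetic velocities oscillating at 1000 cm⁻¹ (e.g., ``np.cos(2 * np.pi * 1000 * C_CM_PER_FS * t)`` with ``t`` in fs) should give a peak close to 1000 cm⁻¹. The spacing of the wavenumber grid is 1/(n_steps · dt_fs · c), so the resolution is set by the trajectory length, which is one reason why long trajectories are needed.
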

.. include:: materials/lecture7/task7.2/task7.2.inc

The major qualitative difference is that the high-frequency peaks are shifted despite the excellent accuracy of AIQM1: this is a known consequence of neglecting quantum nuclear effects. Future versions of MLatom will include path-integral MD to take these effects into account.

.. include:: outdating.rst

.. _uvvis:

UV/vis absorption spectra
-------------------------

UV/vis absorption spectroscopy is one of the most important spectroscopies, and it is also easy to carry out experimentally. However, theoretical simulations are very difficult because they require calculating excited states, which is very costly, and the affordable methods are usually not very accurate. There is therefore a big incentive to use ML to accelerate theoretical UV/vis absorption spectra simulations and to improve their accuracy.

.. _uvvis_sp:

UV/vis spectra from single-point calculations
+++++++++++++++++++++++++++++++++++++++++++++

The simplest way to calculate UV/vis spectra is to optimize the geometry in the ground state and calculate the excitation energies and oscillator strengths at this geometry. The oscillator strengths are related to intensities via a simple relation, and the common practice is to broaden the line spectrum (excitation energies on the x-axis, oscillator strengths on the y-axis) with a Gaussian function. The width of the Gaussian function is usually selected manually to match the experimental band shapes, and sometimes the peaks are shifted to match experiment as well. This is the so-called single-point convolution approach, which is commonly employed to obtain at least a qualitative comparison with experiment and to explain the origin of the absorption bands. We are planning to provide a simple input file implementation for this type of calculation in the near future.

.. include:: outdating.rst
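
The single-point convolution described above amounts to placing a normalized Gaussian at each excitation energy and weighting it by the corresponding oscillator strength. Below is a minimal NumPy sketch; it is not an MLatom feature, and the function name, parameter names, and example excitations are hypothetical. The result is proportional to the absorption cross-section, so only relative intensities are meaningful unless the physical prefactor is added.

.. code:: python

   import numpy as np


   def convolute_line_spectrum(energies_ev, osc_strengths, grid_ev, fwhm_ev, shift_ev=0.0):
       """Broaden a line spectrum with normalized Gaussians.

       energies_ev:   excitation energies (eV) at the optimized geometry
       osc_strengths: oscillator strengths (dimensionless)
       grid_ev:       energies (eV) at which the spectrum is evaluated
       fwhm_ev:       full width at half maximum of each Gaussian (chosen manually)
       shift_ev:      optional rigid shift applied to all peaks (eV)
       Returns a relative spectrum whose integral equals the sum of the
       oscillator strengths (if the grid covers all bands).
       """
       e = np.asarray(energies_ev, dtype=float)[None, :] + shift_ev
       f = np.asarray(osc_strengths, dtype=float)[None, :]
       x = np.asarray(grid_ev, dtype=float)[:, None]
       sigma = fwhm_ev / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # FWHM -> standard deviation
       gauss = np.exp(-0.5 * ((x - e) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
       return (f * gauss).sum(axis=1)


   # Hypothetical excitations: a bright state at 4.9 eV and a weaker one at 6.2 eV
   grid = np.linspace(3.0, 8.0, 5001)
   spectrum = convolute_line_spectrum([4.9, 6.2], [0.30, 0.05], grid, fwhm_ev=0.4)
   wavelengths_nm = 1239.84198 / grid   # convert the energy axis to nm for plotting

Because the Gaussians are normalized, the width only redistributes intensity without changing the band area; this is why the manually chosen width and shift affect the shape and position of the spectrum but not the total intensity.
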

.. _uvvis_nea:

UV/vis spectra with nuclear-ensemble approach
+++++++++++++++++++++++++++++++++++++++++++++

The problem with single-point convolution is, as you might guess, that molecules are not stationary objects, and their excitation properties change as they vibrate. An example is benzene, which is highly symmetric at the ground-state minimum, but vibrations break the symmetry, and some of the dark excitations become slightly observable as very weak peaks in UV/vis.

.. image:: _static/lecture7/benzene_vibr.gif
   :width: 400
   :align: center
   :alt: One of the vibrational normal modes of benzene

How can we simulate such a process? We can sample many geometries from such vibrations; this collection of geometries is called a *nuclear ensemble*. The geometries can be sampled either from MD or from the normal modes obtained with frequency calculations (usually from the so-called Wigner distribution). The latter sampling is the easiest and most frequently employed. Once we have sampled the geometries, we can calculate the excitation properties for them and obtain a spectrum. Regardless of how the ensemble is obtained, this method of calculating spectra is called the *nuclear ensemble approach* (NEA). It is a more rigorous way of simulating UV/vis spectra, as it recovers absorption band shapes and positions better than single-point convolution. However, NEA typically requires many hundreds or thousands of expensive QM calculations of excited states, which precludes its widespread adoption. ML can greatly accelerate the simulations by predicting the excited-state properties after learning on a fraction of the usually required geometries.

In `our ML-NEA approach `__ implemented in MLatom, we exploited the essence of ML: the same program can learn different properties given just the data. This is in contrast to QM methods, where we first have to derive the proper physical models and then implement them for each property:

.. image:: _static/lecture7/qm_vs_ml.png
   :width: 400
   :align: center
   :alt: QM vs ML programming

In ML-NEA, we create KREG models to learn each property separately, i.e., we have twice as many models as excitations, because we learn both excitation energies and oscillator strengths. We also use a kind of :ref:`pool-based active learning `, where we keep sampling 50 more points from the Wigner distribution until the validation error of the ML models decreases by less than 10%:

.. image:: _static/lecture7/uvvis_workflow.png
   :width: 400
   :align: center
   :alt: ML-NEA workflow

This allows us to obtain very precise spectra with confidence: in contrast to QM NEA, we do not need to make subjective decisions about when to stop sampling more points and what broadening factor to use. You can see it for yourself in the next exercise.

.. include:: materials/lecture7/task7.3/task7.3.inc

As you can see from this task, single-point convolution gives an overly simple spectrum that misses the finer band structure due to vibrations. Compared to the QC-NEA spectrum, it is also shifted. The QC-NEA spectrum has the problem of being too rough, as 250 points are evidently not enough to provide a smooth spectrum; this leads to many spurious maxima. ML-NEA is the best and can be improved further with more points, although this would not change the spectrum much. *The best improvement would come from using a better QM method to generate the training data for ML.*

You might wonder how to generate such a spectrum from scratch. You can do it `if you install MLatom locally and have Gaussian program installed `__. We are planning to make ML-NEA available on the XACS cloud computing in the future as well, with alternative programs.

.. include:: outdating.rst

.. _emission:

Emission spectra
----------------

Calculating emission spectra is similar to UV/vis, with the major difference that we have to deal with the excited-state surface.
For example, with single-point convolution, we need to optimize the geometry in the excited state (usually the lowest excited state, according to Kasha's rule). We successfully applied this approach to `predict with AIQM1 that binding of fullerene with nanolassos quenches the fluorescence `__. However, the required MNDO program cannot be used on the cloud, so such simulations are currently only possible locally (see `instructions `__).

.. include:: outdating.rst

.. _tpa:

Two-photon absorption spectra
-----------------------------

Two-photon absorption (TPA) is a fascinating property because it arises when two photons are absorbed *simultaneously*, so that the electron is excited to a higher energy level than the absorption of one photon of the same energy would make possible. Once the energy is pumped into the molecule, the molecule can release it by emitting a photon of higher energy than each of the absorbed photons.

.. image:: _static/lecture7/tpa_energy_levels.png
   :width: 500
   :align: center
   :alt: Explanation of two-photon absorption with energy diagrams

[Credit: By `BP-Aegirsson - Own work `__, CC BY-SA 4.0]

This can be exploited in many applications, such as upconversion lasers, two-photon lithography, 3D printing, photodynamic therapy, and bioimaging. Finding molecules with strong TPA is therefore quite important. Unfortunately, calculating TPA from first principles, i.e., with QM methods, is not an easy task, and the accuracy of the affordable approaches is typically not high either. Hence, we developed a `procedure `__ to directly predict TPA with ML, which is fast and comparable in accuracy to DFT approaches. The calculations are quite simple, as you just need to provide the SMILES string of a molecule, information about the solvent, and the range of wavelengths of the photons to be absorbed (see the :ref:`next task `).
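
The energy bookkeeping behind TPA is simple arithmetic: a photon of wavelength λ carries :math:`E = hc/\lambda`, so two simultaneously absorbed identical photons deliver :math:`2hc/\lambda`, the same energy as a single photon of wavelength λ/2. The short sketch below converts a few wavelengths that might be used in a TPA scan into the excitation energies they reach; the function names and the chosen wavelengths are illustrative and not part of MLatom.

.. code:: python

   HC_EV_NM = 1239.841984  # h*c in eV*nm


   def photon_energy_ev(wavelength_nm):
       """Energy (eV) of one photon with the given wavelength (nm)."""
       return HC_EV_NM / wavelength_nm


   def two_photon_excitation(wavelength_nm):
       """Excitation energy (eV) reached by absorbing two identical photons
       simultaneously, and the one-photon wavelength (nm) with the same energy."""
       energy = 2.0 * photon_energy_ev(wavelength_nm)
       return energy, HC_EV_NM / energy


   for wl in (700.0, 800.0, 900.0):
       e, wl_one = two_photon_excitation(wl)
       print(f"2 x {wl:.0f} nm -> {e:.2f} eV (same as one {wl_one:.0f} nm photon)")

This is why TPA experiments typically use longer-wavelength (e.g., near-infrared) light to reach excited states that would otherwise require UV/vis photons.
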

.. include:: materials/lecture7/homework7.1.inc

The best answer to this exercise would tell which of the two molecules has the stronger two-photon absorption and, for the first molecule, where the absorption peak is.