.. _releases:

Releases
=================

Current release is ``MLatom, version 3.17.1``.

Upgrade your MLatom to the latest one with:

.. code-block:: bash

    pip install --upgrade mlatom

.. Upcoming releases
.. -----------------
.. - more efficient implementations of DENS24.
.. - more efficient predictions with the ANI models.
.. - bug fixes.

.. _release_3.17:

MLatom 3.17
---------------------

- 3.17.1 -- released on 26.03.2025.

What's new?
+++++++++++

- More support for reaction exploration including:

  - QST2 and QST3 via interface to Gaussian
  - Stable NEB method via interface to ASE
  - `reaction_database` as the MLatom data format for storing reaction data

- Improved data analysis and format conversion including:

  - Conversion between h5 files and `molecular_database`
  - More options for calculating RMSD between two structures
  - Improved UV/vis plots

- Improved error handling
- Many bug fixes and code refactoring to make MLatom more efficient and lightweight

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub <>`__.

``pip install mlatom==3.17.1``

.. _release_3.16:

MLatom 3.16
---------------------

- 3.16.2 -- released on 18.12.2024.
- 3.16.1 -- released on 11.12.2024.
- 3.16.0 -- released on 04.12.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.16.2``

What's new?
+++++++++++

- :ref:`TDDFT calculations with Gaussian and TDDFT and TDA calculations with PySCF `.
- parsing Gaussian output files into :ref:`MLatom data format `.
- new in 3.16.1:

  - DFTB and TD-DFTB calculations via the interface to DFTB+ program (see the :ref:`installation instructions ` on how to set it up with MLatom).
  - fixed the speed for importing mlatom as a Python package: ``import mlatom`` now takes less than a second.

- New in 3.16.2:

  - calculating bond lengths, angles, dihedral angles with respective functions (``molecule.bond_length(atom1_index, atom2_index)``, etc.).
  - calculating RMSD between two structures as ``mlatom.xyz.rmsd(molecule1, molecule2)``.
  - running ``geomopt`` (or ``ts``) and ``freq`` calculations in the same job, e.g., the input file can contain both keywords, and MLatom will run frequency calculations after the geometry optimization of the minimum or transition-state structure.

.. _release_3.16_nutshell:

In a nutshell
~~~~~~~~~~~~~~~~

To run TDDFT and TDA calculations with PySCF, just define your method as:

.. code-block:: python

    import mlatom as ml
    tddft=ml.models.methods(method='TD-B3LYP/6-31G*', program='PySCF')
    # or
    tda=ml.models.methods(method='TDA-B3LYP/6-31G*', program='PySCF')

Interface to Gaussian supports TDDFT as:

.. code-block:: python

    import mlatom as ml
    tddft=ml.models.methods(method='TD-B3LYP/6-31G*', program='Gaussian')
    # note that by default the Gaussian output files will not be saved. You can request to save them, e.g., in the current directory as:
    tddft=ml.models.methods(method='TD-B3LYP/6-31G*', program='Gaussian', working_directory='.')

To run DFTB or TD-DFTB calculations:

.. code-block:: python

    import mlatom as ml
    dftb=ml.models.methods(method='DFTB') # that will work for both DFTB and TD-DFTB, see below the difference when used in predict

Once defined, you can use the methods as usual for calculating excited-state properties, e.g., for :ref:`UV/vis spectra simulations `. Here is an example for single-point calculations for a single molecule or a molecular database (e.g., to get properties required for nuclear-ensemble approach in UV/vis spectra simulations or to generate data for ML):

.. code-block:: python

    tddft.predict(molecule=single_molecule, calculate_energy=True, nstates=10, current_state=5)
    # or
    tda.predict(molecular_database=my_molecular_database, calculate_energy_gradients=True, nstates=10, current_state=5)
    # or for TD-DFTB
    dftb.predict(molecular_database=my_molecular_database, calculate_energy_gradients=True, nstates=10, current_state=5)
    # for just ground-state DFTB
    dftb.predict(molecular_database=my_molecular_database, calculate_energy_gradients=True)

If you have Gaussian output files lying around, you can also parse them directly with MLatom to get the molecules in its data format, e.g.:

.. code-block:: python

    import mlatom as ml
    mol = ml.molecule.load('gaussian_output_file.log', format='gaussian')
    mol.dump('parsed_molecule.json', format='json')
    # or
    db = ml.molecular_database.load(filename='geomopt.log', format='gaussian')
    # or
    opttraj = ml.data.molecular_trajectory()
    opttraj.load(filename='sample_outputs/gaussian/geomopt.log', format='gaussian')
    # or
    mol = ml.molecule.load(filename='sample_outputs/gaussian/geomopt.log', format='gaussian')
    mol.optimization_trajectory
    mol.molecular_database

MLatom also now provides ways to calculate RMSD between structures:

.. code-block:: python

    import mlatom
    # Example of the simple use:
    rmsd = mlatom.xyz.rmsd(molecule1.xyz_coordinates, molecule2.xyz_coordinates)
    # Example of using Hungarian algorithm to check for homonuclear atom permutation and reflections:
    rmsd = mlatom.xyz.rmsd(molecule1, molecule2, reorder='Hungarian', check_reflection=True)

Calculating structural parameters is now also easy:

.. code-block:: python

    bond_length = mol.bond_length(0, 1) # bond length between atoms 1 and 2 (using Python indexing starting from zero)
    bond_angle = mol.bond_angle(0, 1, 2, degrees=True) # angles in degrees. If you want in radians, use degrees=False
    dihedral_angle = mol.dihedral_angle(0, 1, 2, 4, degrees=True)

And finally, after many requests, you can request geometry optimization followed by frequency calculations in the same input file:

.. code-block:: bash

    geomopt freq xyzfile=init.xyz

for TS optimization:

.. code-block:: bash

    ts freq xyzfile=init.xyz

Example of how the end of the output file looks:

.. code-block:: bash

       Iteration         Energy (Hartree)
               1          -0.9037967543490
               2          -0.9737420405670
               3          -0.9675615854860
               4          -0.9826833773690
               5          -0.9826861683990

     Final energy of molecule 1: -0.9826861683990 Hartree

    ==============================================================================
     Vibration analysis for molecule 1
    ==============================================================================
     Multiplicity: 1
     This is a linear molecule
       Mode     Frequencies     Reduced masses     Force Constants     IR intensities
                  (cm^-1)            (AMU)           (mDyne/A)            (km/mol)
          1      3755.3923          1.0078             8.3743              0.0000
    ==============================================================================
     Thermochemistry for molecule 1
    ==============================================================================
     ZPE-exclusive internal energy at 0 K: -0.98269 Hartree
     Zero-point vibrational energy : 0.00856 Hartree
     Internal energy at 0 K: -0.97413 Hartree
     Enthalpy at 298 K: -0.97083 Hartree
     Gibbs free energy at 298 K: -0.98570 Hartree

     Atomization enthalpy at 0 K: 0.18717 Hartree 117.44811 kcal/mol
     ZPE-exclusive atomization energy at 0 K: 0.19572 Hartree 122.81645 kcal/mol
     Heat of formation at 298 K: -0.02252 Hartree -14.13239 kcal/mol
    ==============================================================================
     Wall-clock time: 5.03 s (0.08 min, 0.00 hours)

     MLatom terminated on 17.12.2024 at 21:30:24
    ==============================================================================

Contributors
++++++++++++

- Pavlo O. Dral (TDDFT in Gaussian, parsing of Gaussian output files, DFTB+ interface, ``geomopt freq`` task)
- Vignesh Balaji Kumar (TDDFT and TDA in PySCF)
- Yi-Fan Hou (parsing of Gaussian output files)
- Xin-Yu Tong (DFTB and TD-DFTB calculations via DFTB+ interface)
- Mikolaj Martyka (bond lengths, bond angles, and dihedral angles)

.. _release_3.15.0:

MLatom 3.15.0
---------------------

- 3.15.0 -- released on 27.11.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.15.0``

What's new?
+++++++++++

- :ref:`fine-tuning universal ANI models `.

When using this feature, please cite:

* Seyedeh Fatemeh Alavi, Yuxinxin Chen, Yi-Fan Hou, Fuchun Ge, Peikun Zheng, Pavlo O. Dral. Towards Accurate and Efficient Anharmonic Vibrational Frequencies with the Universal Interatomic Potential ANI-1ccx-gelu and Its Fine-Tuning. 2024. Preprint on ChemRxiv: https://doi.org/10.26434/chemrxiv-2024-c8s16 (2024-10-09).

Contributors
++++++++++++

- Yuxinxin Chen
- Fuchun Ge

.. _release_3.14.0:

MLatom 3.14.0
---------------------

- 3.14.0 -- released on 20.11.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.14.0``

What's new?
+++++++++++

- :ref:`UV/vis spectra ` from single-point convolution and nuclear-ensemble approach via Python API with improved plotting routines.
- Updated interface to MACE to support its latest 0.3.8 version.
- minor bug fixes.

Contributors
++++++++++++

- Pavlo O. Dral
- Fuchun Ge
- Matheus de Oliveira Bispo

.. _release_3.13.0:

MLatom 3.13.0
---------------------

- 3.13.0 -- released on 06.11.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.13.0``

What's new?
+++++++++++

- :ref:`IR spectra calculations ` with AIQM1, AIQM2, UAIQM with semi-empirical baseline, and a range of QM methods (DFT, semi-empirical, ab initio wavefunction), with empirical scaling for better accuracy, special spectra module with plotting routines in Python.

Contributors
++++++++++++

- Yi-Fan Hou
- Fuchun Ge
- Pavlo O. Dral

.. _release_3.12.0:

MLatom 3.12.0
---------------------

- 3.12.0 -- released on 08.10.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.12.0``

What's new?
+++++++++++

- `AIQM2 `__
- `ANI-1ccx-gelu `__.

Contributors
++++++++++++

- Yuxinxin Chen

.. _release_3.11.0:

MLatom 3.11.0
---------------------

- 3.11.0 -- released on 23.09.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.11.0``

What's new?
+++++++++++

- :ref:`DENS24 implementation for our ensembles of DFT functionals `.
- :ref:`IR spectra with harmonic oscillator approximation with AIQM1 and DFT methods `.
- simpler input for methods, e.g., ``B3LYP/6-31G*`` in input will work, no need for ``method=B3LYP/6-31G* program=PySCF`` and similarly in Python API, no need to specify the ``program`` if the default is suitable.
- major bug fixes, particularly in active learning.

Contributors
++++++++++++

- Yuxinxin Chen
- Yifan Hou
- Mikolaj Martyka
- Fuchun Ge

.. _release_3.10.0:

MLatom 3.10.0
---------------------

- 3.10.1 -- released on 21.08.2024.
- 3.10.0 -- released on 21.08.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.10.1``

What's new?
+++++++++++

- big improvements for excited-state simulations and surface-hopping molecular dynamics:

  - :ref:`active learning ` for surface-hopping dynamics. It is efficient and robust: often, you can do surface-hopping dynamics from start to finish within a couple of days on a single GPU!
  - :ref:`multi-state learning model (MS-ANI) ` that has unrivaled accuracy for excited state properties (accuracy is often better than for models targeting only ground state!). We demonstrate that this model can be used for trajectory-surface hopping of multiple molecules (not just for a single molecule!)
  - :ref:`gapMD ` for efficient sampling of the vicinity of conical intersection

- quality of life improvements:

  - :ref:`visualizing molecules and their vibrations in Jupyter `, e.g., simply use ``mymol.view(normal_mode=1)``, etc.
  - you can now load the molecule using both:

    - ``mol = molecule(); mol.load(filename='mymol.json')``
    - ``mol = molecule.load(filename='mymol.json')``

  - now you can view the MD trajectory in Jupyter as ``mytraj.show()`` for quick checks
  - same for molecular databases: ``moldb.show()``

When using the above improvements for surface-hopping dynamics, please cite:

- Mikołaj Martyka, Lina Zhang, Fuchun Ge, Yi-Fan Hou, Joanna Jankowska, Mario Barbatti, Pavlo O. Dral. Charting electronic-state manifolds across molecules with multi-state learning and gap-driven dynamics via efficient and robust active learning. **2024**. Preprint on ChemRxiv: https://doi.org/10.26434/chemrxiv-2024-dtc1w.

Contributors
++++++++++++

- Mikolaj Martyka (:red:`New contributor, welcome to the MLatom developers team!`)
- Lina Zhang
- Pavlo O. Dral

.. _release_3.9.0:

MLatom 3.9.0
---------------------

- 3.9.0 -- released on 23.07.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.9.0``

What's new?
+++++++++++

- :ref:`periodic boundary conditions `

Contributed to this release:

- Fuchun Ge

.. _release_3.8.0:

MLatom 3.8.0
---------------------

- 3.8.0 -- released on 17.07.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.8.0``

What's new?
+++++++++++

- :ref:`Directly learning dynamics with GICnet `

The implementation details are given in the following work (please cite it alongside other required citations when using this feature):

- Fuchun Ge, Lina Zhang, Yi-Fan Hou, Yuxinxin Chen, Arif Ullah, Pavlo O. Dral*. Four-dimensional-spacetime atomistic artificial intelligence models. *J. Phys. Chem. Lett*. **2023**, 14, 7732–7743. DOI: `10.1021/acs.jpclett.3c01592 `_.

Contributed to this release:

- Fuchun Ge

.. _release_3.7.0:

MLatom 3.7.0 -- 3.7.1
---------------------

- 3.7.1 -- released on 04.07.2024. Bug fix.
- 3.7.0 -- released on 03.07.2024.

:download:`Download zip `, check these versions on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.7.1``

What's new?
+++++++++++

- :ref:`Active learning `
- batch parallelization of MD (heavily used in AL).

The implementation details are given in the following work (please cite it alongside other required citations when using this feature):

- Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral. `Physics-informed active learning for accelerating quantum chemical simulations `__. *arXiv* **2024**, DOI: 10.48550/arXiv.2404.11811.

Contributed to this release:

- Yi-Fan Hou (implementations and tests of AL, documentation, design of API)
- Pavlo O. Dral (implementation of earlier version of AL, design of API, documentation)
- Fuchun Ge (batch parallelization of MD)

.. include:: releases/license_3.7.0.rst

.. _release_3.6.0:

MLatom 3.6.0
------------

Released on 15.05.2024.

:download:`Download zip `, check this version on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.6.0``

What's new?
+++++++++++

This is a major release with the implementation of many new universal ML-based models (see :ref:`tutorial `):

- DM21
- AIMNet2
- ANI-1xnr

Contributed to this release: Yuxinxin Chen.

.. include:: releases/license_3.6.0.rst

.. _release_3.5.0:

MLatom 3.5.0
------------

Released on 08.05.2024.
:download:`Download zip `, check this version on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.5.0``

What's new?
+++++++++++

This is a major release with the implementation of the quasi-classical MD:

- sampling from the harmonic quantum Boltzmann distribution
- :ref:`quasi-classical molecular dynamics (quasi-classical trajectories (QCT)) `

The implementation details are given in the following work (please cite it alongside other required citations when using this feature):

- Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral. `Physics-informed active learning for accelerating quantum chemical simulations `__. *arXiv* **2024**, DOI: 10.48550/arXiv.2404.11811.

Contributed to this release: Yi-Fan Hou.

.. include:: releases/license_3.5.0.rst

.. _release_3.4.0:

MLatom 3.4.0
------------

Released on 29.04.2024.

:download:`Download zip `, check this version on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.4.0``

What's new?
+++++++++++

This release is a major release with usability improvements:

- :ref:`simpler input ` of xyz coordinates (and other data) directly in the input file.
- :ref:`more informative output ` of geometry optimizations.
- :ref:`interface ` to `geomeTRIC `__ package for geometry optimizations.
- and :ref:`other improvements ` related to geometry optimizations.

Contributed to this release: Pavlo O. Dral, Yuxinxin Chen, Fuchun Ge, and Yi-Fan Hou.

.. _release_3.4.0_nutshell:

In a nutshell
~~~~~~~~~~~~~~~~

Input becomes much simpler. Here is an example of the geometry optimization job:

.. code-block::

    geomopt GFN2-xTB XYZfile='
    2

    H 0 0 0
    H 0 0 0.8
    '

which would print out much more informative output, e.g., a snippet:

.. code-block::

    ------------------------------------------------------------------------------
     Iteration 8
    ------------------------------------------------------------------------------

     Molecule with 2 atom(s): H, H

     XYZ coordinates, Angstrom

       1 H             0.000000             0.000000            -0.138338
       2 H             0.000000             0.000000             0.638338

     Interatomic distance matrix, Angstrom

    [[0.         0.77667652]
     [0.77667652 0.        ]]

     Energy: -0.982686 Hartree

     Energy gradients, Hartree/Angstrom
       1 H             0.000000             0.000000             0.000020
       2 H             0.000000             0.000000            -0.000020

     Energy gradients norm: 0.000029 Hartree/Angstrom

.. _release_3.4.0_input:

Simpler input
~~~~~~~~~~~~~~~~

If you ever wanted to get rid of the auxiliary files, now it is possible. You just need to enclose the content of such files between ``'`` characters instead of providing the input file name, e.g., as above:

.. code-block::

    geomopt GFN2-xTB XYZfile='
    2

    H 0 0 0
    H 0 0 0.8
    '

This also works for other files such as velocities, etc.

Optimized geometries will be saved in ``optgeoms.xyz`` if option ``optxyz=`` is not provided.

.. _release_3.4.0_output:

Informative output
~~~~~~~~~~~~~~~~~~

Now you can track the progress of your geometry optimization in the output file, which prints the geometries, energies, and gradients at each iteration (if you have no more than 10 molecules):

.. code-block::

    ------------------------------------------------------------------------------
     Iteration 8
    ------------------------------------------------------------------------------

     Molecule with 2 atom(s): H, H

     XYZ coordinates, Angstrom

       1 H             0.000000             0.000000            -0.138338
       2 H             0.000000             0.000000             0.638338

     Interatomic distance matrix, Angstrom

    [[0.         0.77667652]
     [0.77667652 0.        ]]

     Energy: -0.982686 Hartree

     Energy gradients, Hartree/Angstrom
       1 H             0.000000             0.000000             0.000020
       2 H             0.000000             0.000000            -0.000020

     Energy gradients norm: 0.000029 Hartree/Angstrom

In addition, geometry optimizations will dump many useful files:

- optimized geometries in ``optgeoms.xyz`` or in the file requested with ``optxyz=``.
- the optimization trajectories in XYZ format ``opttraj1.xyz`` and JSON format ``opttraj1.json`` and so on for each molecule.
- in case of optimizations with the Gaussian optimizer, you will also get the corresponding input and output files ``gaussian1.com`` and ``gaussian1.log``, etc., for each molecule.

If you want to control how much information is saved (e.g., for big molecules and many molecules):

- ``printmin`` will not print information about every iteration.
- ``printall`` will print detailed information at each iteration.
- ``dumpopttrajs=False`` will not dump any optimization trajectories.

The corresponding controls are also available in Python API, i.e., arguments for ``ml.simulations.optimize_geometry``:

- ``print_properties`` (``None`` or ``str``, optional): properties to print. Default: ``None``. Possible ``'all'``.
- ``dump_trajectory_interval`` (``int``, optional): dump trajectory at every time step (1). Set to ``None`` to disable dumping (default).
- ``filename`` (``str``, optional): the file that saves the dumped trajectory. Default: ``None``.
- ``format`` (``str``, optional): format in which the dumped trajectory is saved. Default: ``'json'``.

.. _release_3.4.0_geometric:

geomeTRIC
~~~~~~~~~~~~~~~~~~

MLatom now supports geometry optimizations (including TS optimization) with `geomeTRIC `__. To install it, just run ``pip install geometric``. If you want to use it, in command line add ``optprog=geomeTRIC``, in Python API ``program=geometric``.

If you use this program, please cite:

* L.-P. Wang, C. C. Song, *J. Chem. Phys.* **2016**, *144*, 214108.

.. _release_3.4.0_misc:

Other changes
~~~~~~~~~~~~~~~~~~

- overwrite (and print a warning) the XYZ file with the optimized geometry if it exists. Before, MLatom would terminate and complain that the file exists. Practice showed that this behavior is annoying, as we often want to rerun calculations in the same folder and replace the old results.
- more graceful handling of failed geometry optimizations.

.. include:: releases/license_3.4.0.rst

.. _release_3.3.0:

MLatom 3.3.0
------------

Released on 03.04.2024.

:download:`Download zip `, check this version on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.3.0``

What's new?
+++++++++++

This is a major release with:

* :ref:`surface-hopping dynamics ` (within NAC-free Landau--Zener--Belyev--Lebedev (LZBL) approximation)
* support of :ref:`excited-state calculations ` with AIQM1, *ab initio* methods through COLUMBUS (for CASSCF) and Turbomole (for ADC(2)), semi-empirical methods through the MNDO program, and machine-learning and hybrid QM/ML methods.
* :ref:`Wigner sampling ` (with and without filtering by excitation energy window) - useful for generating initial conditions for surface-hopping dynamics or spectra simulations. The routine for Wigner sampling is adapted to Python from `Newton-X `__. Newton-X does not need to be installed, but you must cite the corresponding `Newton-X paper `__ when using the Wigner sampling.

Minor change:

* Since this release, we fixed how the validation loss is calculated in the training of ANI networks. Now it is calculated as the overall mean squared error over all batches, while before it was calculated as averaged RMSE of validation batches. This might lead to different numerical results as the model training uses the validation loss for, e.g., early stopping and learning rate.

Contributed to this release: Pavlo O. Dral, Lina Zhang, Fuchun Ge, Sebastian Pios, Yi-Fan Hou, and Yuxinxin Chen.
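To see why the change in the validation loss noted under the minor change can alter numerical results, here is a minimal NumPy sketch that compares the two aggregation schemes on made-up errors. It is only an illustration under stated assumptions: the helper names ``averaged_batch_rmse`` and ``overall_mse`` and the error values are hypothetical, and this is not the training code used in MLatom.

```python
import numpy as np

def averaged_batch_rmse(errors_per_batch):
    """Mean of the per-batch RMSE values (the scheme described as used before 3.3.0)."""
    return float(np.mean([np.sqrt(np.mean(np.square(e))) for e in errors_per_batch]))

def overall_mse(errors_per_batch):
    """Mean squared error over all validation points pooled across batches."""
    pooled = np.concatenate([np.asarray(e, dtype=float) for e in errors_per_batch])
    return float(np.mean(np.square(pooled)))

# Hypothetical validation errors split into two batches of unequal size
batches = [np.array([0.0, 0.0, 0.0, 0.0]), np.array([2.0, 2.0])]

old_style = averaged_batch_rmse(batches)  # every batch counts equally
new_style = overall_mse(batches)          # every validation point counts equally

print(f"averaged batch RMSE: {old_style:.4f}")
print(f"overall MSE: {new_style:.4f} (RMSE: {np.sqrt(new_style):.4f})")
```

With unequal batch sizes the two numbers differ even after taking the square root, so anything driven by the validation loss (e.g., early stopping) may trigger at a different epoch.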
See our paper for more details on the surface-hopping dynamics implementation (please also cite it if you use these features):

* Lina Zhang, Sebastian Pios, Mikołaj Martyka, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, Joanna Jankowska, Lipeng Chen, Mario Barbatti, `Pavlo O. Dral `__. `MLatom software ecosystem for surface hopping dynamics in Python with quantum mechanical and machine learning methods `__. *J. Chem. Theory Comput.* **2024**, *20*, 5043--5057. DOI: 10.1021/acs.jctc.4c00468. Preprint on *arXiv*: https://arxiv.org/abs/2404.06189.

The code snippet to give an idea how to use these new features (see the :ref:`dedicated tutorial `):

.. code-block:: python

    import mlatom as ml

    # Load the initial geometry of a molecule
    mol = ml.data.molecule()
    mol.charge=1
    mol.read_from_xyz_file('cnh4+.xyz')

    # Define model
    aiqm1 = ml.models.methods(method='AIQM1',qm_program_kwargs={'save_files_in_current_directory': True, 'read_keywords_from_file':'mndokw'})
    method_optfreq = ml.models.methods(method='B3LYP/Def2SVP', program='pyscf') # You can also use AIQM1 (it is recommended for neutral species because of its better quality)

    # Optimize geometry
    geomopt = ml.simulations.optimize_geometry(model=method_optfreq, initial_molecule=mol)
    eqmol = geomopt.optimized_molecule
    eqmol.write_file_with_xyz_coordinates('eq.xyz')

    # Get frequencies
    ml.simulations.freq(model=method_optfreq, molecule=eqmol)
    eqmol.dump(filename='eqmol.json', format='json')

    # Get initial conditions
    init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                                  generation_method='wigner',
                                                  number_of_initial_conditions=16,
                                                  initial_temperature=0)
    init_cond_db.dump('test.json','json')

    # Propagate multiple LZBL surface-hopping trajectories in parallel
    # .. setup dynamics calculations
    namd_kwargs = {
                'model': aiqm1,
                'time_step': 0.25,
                'maximum_propagation_time': 5,
                'hopping_algorithm': 'LZBL',
                'nstates': 3,
                'initial_state': 2,
                }

    # .. run trajectories in parallel
    dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db,
                                          task=ml.namd.surface_hopping_md,
                                          task_kwargs=namd_kwargs,
                                          create_and_keep_temp_directories=True)
    trajs = [d.molecular_trajectory for d in dyns]

    # Dump the trajectories
    itraj=0
    for traj in trajs:
        itraj+=1
        traj.dump(filename=f"traj{itraj}.h5",format='h5md')

    # Analyze the result of trajectories and make the population plot
    ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5)
    ml.namd.plot_population(trajectories=trajs, time_step=0.25,
                            max_propagation_time=5, nstates=3, filename='pop.png')

.. include:: releases/license_3.3.0.rst

.. _release_3.2.0:

MLatom 3.2.0
------------

Released on 19.03.2024.

:download:`Download zip `, check this version on `PyPI `__ and `GitHub `__.

``pip install mlatom==3.2.0``

What's new?
+++++++++++

This is a major release with many new features, usability and performance improvements, and bug fixes. In short, we:

* implemented :ref:`energy-weighted training ` of ANI machine learning potentials
* implemented :ref:`diffusion Monte Carlo `
* made AIQM1 on the XACS cloud :ref:`much faster and more stable `
* made :ref:`frequency calculations more robust `
* improved :ref:`ORCA interface `
* implemented CCSD(T)*/CBS calculations for :ref:`charged and open-shell species `
* implemented :ref:`geometry optimization with constraints `.

.. _release_3.2.0_weight_e:

Energy-weighted training of ANI-type machine learning potentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is hard to obtain machine learning potentials with a balanced description of different PES regions when training on global PES data with many strongly distorted molecular geometries which have high deformation energies. Hence, we implemented training of ANI machine learning potentials using an energy weighting function that downweights the importance of PES regions with high deformation energies.
We also provide recommendations, tutorials, and training scripts on an example of the glycine global PES.

See our paper for more details (please also cite it if you use this feature):

* F. Ge, R. Wang, C. Qu, P. Zheng, A. Nandi, R. Conte, P. L. Houston, J. M. Bowman, `P. O. Dral `__. `Tell machine learning potentials what they are needed for: Simulation-oriented training exemplified for glycine `__. *J. Phys. Chem. Lett.* **2024**, *15*, 4451--4460. DOI: 10.1021/acs.jpclett.4c00746. Preprint on *arXiv*: https://arxiv.org/abs/2403.11216.

The code snippet to give an idea how to use this new feature (see the `dedicated tutorial `__):

.. code-block::

    import torch
    import mlatom as ml

    # ...

    # define the weighting function
    def weighting_function(energy_reference, a):
        # a - is a parameter defining the shape of the function
        global_minimum = -284.33376035
        reference_tensor = torch.tensor(energy_reference - global_minimum)
        x=a*reference_tensor
        x_tensor = torch.tensor(x)
        x_pow5 = x_tensor ** 5
        x_pow4 = x_tensor ** 4
        x_pow3 = x_tensor ** 3
        w = -6 * x_pow5 + 15 * x_pow4 - 10 * x_pow3 + 1
        w = torch.clamp(w, min=0.000001)
        return w

    # train the ANI model with the energy weighting function
    # get subsets subtrain_molDB and validate_molDB from somewhere (not shown here)
    ani = ml.models.ani(model_file="glycine_ani_a2.15.pt")
    ani.train(
        molecular_database=subtrain_molDB,
        validation_molecular_database=validate_molDB,
        property_to_learn='energy',
        xyz_derivative_property_to_learn='energy_gradients',
        energy_weighting_function=weighting_function,
        energy_weighting_function_kwargs={'a': 2.15},
        hyperparameters={'lrReducePatience': 32}
    )

.. _release_3.2.0_dmc:

Diffusion Monte Carlo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Diffusion Monte Carlo (DMC) is a computationally expensive way of obtaining accurate frequencies and ZPVEs, and hence, the use of machine learning potentials is indispensable.
We interfaced MLatom to the great code `PyVibDMC `__ to enable the DMC simulations with different models.

See our paper for an example of DMC calculations for glycine conformers (please also cite it and PyVibDMC if you use this feature):

* F. Ge, R. Wang, C. Qu, P. Zheng, A. Nandi, R. Conte, P. L. Houston, J. M. Bowman, `P. O. Dral `__. Tell machine learning potentials what they are needed for: Simulation-oriented training exemplified for glycine. *J. Phys. Chem. Lett.* **2024**, *15*, 4451--4460. Preprint on *arXiv*: https://arxiv.org/abs/2403.11216.

The code snippet to give an idea how to use this new feature (see the `dedicated tutorial `__):

.. code-block::

    import mlatom as ml

    # ...

    dmc=ml.simulations.dmc(model=model, initial_molecule=conf)
    dmc.run(number_of_walkers=30000, number_of_timesteps=55000)
    print(f'ZPVE: {(dmc.get_zpe(start_step=-1000) + 284.33355671) * 219474.63} cm-1')

.. _release_3.2.0_aiqm1:

AIQM1 on the XACS cloud: much faster and more stable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, we know, AIQM1 has been too slow and numerically unstable on the XACS cloud. It is frustrating to both us and you. The reason is that, for licensing reasons, we cannot use the MNDO program (which has analytical gradients) on the cloud and instead rely on Sparrow, which is a great software for many semi-empirical approaches but unfortunately still has no analytical gradients for AIQM1 (it has them for many other methods...). Thus, as a poor man's solution, we did some tweaking with calculations based on numerical derivatives, and that greatly improved both speed and stability of AIQM1 on the cloud for geometry optimizations and, to some extent, also for frequencies and thermochemical calculations (see below). With more CPUs available for parallelization **AIQM1 on the cloud might be even faster than with single-CPU MNDO program**!

Geometries are good for the tested molecules when optimized with the Sparrow-based AIQM1.
Frequencies and thermochemistry are ok for small molecules like hydrogen or methane but not for bigger molecules like vinylacetylene and, hence, the output may have many negative frequencies which are purely an artifact of the errors introduced by numerical differentiation. Thus, when you use Sparrow-based AIQM1, please check the frequencies -- if there are too many negative ones, the frequencies and thermochemistry are not reliable. Currently, the only solution is to use MNDO-based AIQM1, but we have some methods in the works that should improve the situation in future releases.

Please give AIQM1 on the `XACS cloud `__ another try!

.. _release_3.2.0_freqprog:

New keyword ``freqprog``. Frequencies and thermochemistry with PySCF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now you can run frequency and thermochemistry calculations with PySCF with ``freqprog=pyscf`` for command-line use of MLatom. In Python, you can use the similar option ``mlatom.freq(program='PySCF', ...)`` and ``mlatom.thermochemistry(program='PySCF', ...)``.

This release also introduces the new keyword ``freqprog`` for command-line use and, hence, resolves the inconsistency of the previous releases, where users had to choose the program for frequency calculations with the odd-looking ``optprog``, which, as its name suggests, should only be used for choosing the optimization program...

Using PySCF has several advantages: it is open-source and can be used on our XACS cloud (where it is the default option now) and it has much more consistent handling of frequencies than our previous implementation based on TorchANI (and ASE for thermochemistry). PySCF automatically removes the rotational and translational frequencies; these can be particularly erroneous and mess up thermochemistry if you use numerical gradients and Hessians (as we do on the cloud for AIQM1). If you want to see those, however, our old implementation is the way to go (use ``freqprog=ASE``).
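To illustrate why removing translational modes matters with noisy numerical Hessians, here is a toy NumPy sketch for a one-dimensional diatomic. It is a conceptual example only, not the procedure implemented in PySCF or MLatom; the helper names ``mass_weight`` and ``project_out_translation``, the masses, the force constant, and the artificial noise are all assumptions made for the illustration.

```python
import numpy as np

def mass_weight(hessian, masses):
    """Mass-weighted Hessian: H_ij / sqrt(m_i * m_j)."""
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    return hessian * np.outer(inv_sqrt_m, inv_sqrt_m)

def project_out_translation(mw_hessian, masses):
    """Remove the rigid-translation direction from a 1D mass-weighted Hessian."""
    t = np.sqrt(masses)                 # translation vector in mass-weighted coordinates
    t = t / np.linalg.norm(t)
    projector = np.eye(len(masses)) - np.outer(t, t)
    return projector @ mw_hessian @ projector

# Toy 1D diatomic with unit masses and unit force constant (arbitrary units)
masses = np.array([1.0, 1.0])
exact_hessian = np.array([[1.0, -1.0], [-1.0, 1.0]])
# Artificial error mimicking the noise of numerical differentiation
noisy_hessian = exact_hessian + np.array([[0.01, 0.0], [0.0, 0.0]])

mw = mass_weight(noisy_hessian, masses)
raw = np.linalg.eigvalsh(mw)
clean = np.linalg.eigvalsh(project_out_translation(mw, masses))

print("eigenvalues without projection:", raw)
print("eigenvalues with projection:   ", clean)
```

Without the projection, the noise turns the translational mode into a spurious low-frequency vibration; after the projection, its eigenvalue is zero to numerical precision and only the genuine vibration remains.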
PySCF is now made the default program if no Gaussian is detected. The old workhorse Gaussian still works like a charm if you have it and, hence, it is still the default option in MLatom. .. _release_3.2.0_orca: ORCA interface ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Orca is a popular program and we use it to generate data for our ML models. Hence, we implemented what we needed so far. The current interface works nicely for DFT calculations but if you want to do something else, the interface has some dirty tricks which you can find in the API manual (that might be changed in the future though, if someone wants to improve the interface). The example of the Orca use in the input file is as usual: .. code-block:: method=B3LYP/6-31G* qmprog=Orca geomopt xyzfile=init.xyz optxyz=opt.xyz And in Python API: .. code-block:: import mlatom as ml # ... dft_with_orca = ml.models.methods(method='B3LYP/6-31G*', program='Orca') .. _release_3.2.0_ccsdt: CCSD(T)*/CBS for charged and open-shell species ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We use CCSD(T)*/CBS to generate data and test ML models. It is an expensive but a rather accurate approach. So far our implementation only supported closed-shell, neutral molecules because we didn't need anything else for our research. With the new orca interface, we rewrote the old code and the new one naturally supports charged and open-shell species. If you want to try it out for this purpose, the input file would look like: .. code-block:: CCSD(T)*/CBS yestfile=energy.dat xyzfile=init.xyz charges=1,0 multiplicities=2,1 In Python: .. code-block:: import mlatom as ml # ... ccsd = ml.models.methods(method='CCSD(T)*/CBS') mol.charge=1 ; mol.multiplicity=2 ccsd.predict(molecule=mol) print(mol.energy) .. 
.. _release_3.2.0_geomopt_constraints:

Geometry optimization with constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use the ASE implementation of constraints, so this feature is only available when you use the ASE optimizer via the Python API. It takes one more argument, ``constraints``, which should be used like this: ``constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]}`` (check the `FixInternals class in ASE `__ for more information). Bond lengths are given in Angstrom, angles and dihedrals in degrees. Note that atom indices start from 0!

The example below optimizes ethane with AIQM1 while constraining the C-C bond length to a target of 1.8 Angstrom. The initial XYZ coordinates are:

.. code-block::

    8

    C       -3.41779278      -2.06081078       0.00000000
    H       -3.06113836      -3.06962078       0.00000000
    H       -3.06111994      -1.55641259      -0.87365150
    H       -4.48779278      -2.06079760       0.00000000
    C       -2.90445057      -1.33485451       1.25740497
    H       -1.83445237      -1.33656157       1.25838332
    H       -3.25950713      -0.32548149       1.25642758
    H       -3.26271953      -1.83812251       2.13105517

In a Python script:

.. code-block:: python

    import mlatom as ml

    # load the initial geometry; atoms 0 and 4 are the two carbons
    mol = ml.data.molecule.from_xyz_file('ethane_initial.xyz')
    print(f"Initial C-C bond length: {mol.get_internuclear_distance_matrix()[0][4]} Angstrom")

    # constrain the C-C bond (atom indices 0 and 4) to 1.8 Angstrom
    constraints = {'bonds': [[1.8, [0, 4]]]}
    aiqm1 = ml.models.methods(method='AIQM1', qm_program='sparrow')

    # constraints are only supported with the ASE optimizer
    geomopt = ml.optimize_geometry(model=aiqm1, initial_molecule=mol, program='ase', constraints=constraints)
    optmol = geomopt.optimized_molecule
    print(f"Final C-C bond length: {optmol.get_internuclear_distance_matrix()[0][4]} Angstrom")
    print("XYZ coordinates:")
    print(optmol.get_xyz_string())

The output should look like this:

.. code-block::

    Initial C-C bond length: 1.5399999964612658 Angstrom
           Step     Time          Energy         fmax
    LBFGS:    0 10:11:55    -2169.358192        0.6954
    LBFGS:    1 10:11:56    -2168.493055        1.5444
    LBFGS:    2 10:11:56    -2168.324507        2.1746
    LBFGS:    3 10:11:57    -2168.702277        0.3925
    LBFGS:    4 10:11:57    -2168.719565        0.3556
    LBFGS:    5 10:11:58    -2168.744194        0.4116
    LBFGS:    6 10:11:58    -2168.765605        0.3914
    LBFGS:    7 10:11:58    -2168.779525        0.2010
    LBFGS:    8 10:11:59    -2168.782649        0.0282
    LBFGS:    9 10:11:59    -2168.782767        0.0237
    LBFGS:   10 10:12:00    -2168.782733        0.0370
    LBFGS:   11 10:12:00    -2168.782749        0.0236
    LBFGS:   12 10:12:00    -2168.782623        0.0873
    LBFGS:   13 10:12:01    -2168.782774        0.0203
    LBFGS:   14 10:12:01    -2168.782777        0.0217
    LBFGS:   15 10:12:01    -2168.782654        0.0988
    LBFGS:   16 10:12:02    -2168.782762        0.0299
    LBFGS:   17 10:12:02    -2168.782723        0.0372
    LBFGS:   18 10:12:03    -2168.782762        0.0250
    LBFGS:   19 10:12:03    -2168.777521        0.5284
    LBFGS:   20 10:12:03    -2168.782764        0.0369
    LBFGS:   21 10:12:04    -2168.782758        0.0397
    LBFGS:   22 10:12:04    -2168.781062        0.1423
    LBFGS:   23 10:12:04    -2168.782774        0.0375
    LBFGS:   24 10:12:05    -2168.782774        0.0251
    LBFGS:   25 10:12:05    -2168.782670        0.0433
    LBFGS:   26 10:12:06    -2168.782769        0.0209
    LBFGS:   27 10:12:06    -2168.782777        0.0177
    Final C-C bond length: 1.799999998386536 Angstrom
    XYZ coordinates:
    8

    C       -3.4609594100000      -2.1223293000000      -0.1060679300000
    H       -3.0827425100000      -3.1376804600000      -0.0751292200000
    H       -3.0834089300000      -1.5865323800000      -0.9697472700000
    H       -4.5445733300000      -2.1041917000000      -0.0741819000000
    C       -2.8607234100000      -1.2737628100000       1.3635074000000
    H       -1.7772071700000      -1.2935363400000       1.3338165800000
    H       -3.2376294200000      -0.2576124300000       1.3304506900000
    H       -3.2417292800000      -1.8070164100000       2.2269711900000

.. include:: releases/license_3.2.0.rst

Older releases
--------------

You can find older releases on the `legacy page `_.