.. _tutorial-geomopt: Geometry optimization ===================== In this tutorial we show how to optimize molecular geometries with MLatom. Here, only the optimization of the minima is shown and the optimization of the transition states will be shown elsewhere (for now, please check out the `manual `_). Here is a video of geometry optimization using XACS cloud and Python API: .. raw:: html We start with the simplest example, using the input file :download:`geomopt.inp ` with minimum number of options (see below for :ref:`optimizing via Python API `): .. code-block:: geomopt # 1. requests geometry optimization ANI-1ccx # 2. pre-trained model xyzfile=init.xyz # 3. initial geometry guess optxyz=opt.xyz # 4. file with optimized geometry As you see, the input file is pretty self-explanatory, the ``geomopt`` option requests the geometry optimization in the first line. The user needs to provide the model or method, here we use the universal ML potential ANI-1ccx in the second line. In principle, any QM method, ML model (pre-trained or trained by the user), or hybrid ML/ML models that provides energies and gradients can be used (we will explain :ref:`below how to do it `). ``xyzfile=[file name]`` tells MLatom in what file it can find the initial guess. The user should provide the initial guess in the XYZ file (in Angstrom, e.g., :download:`init.xyz `) for the geometry, e.g., for ethanol which would look like: .. code-block:: 9 C -1.691449880 -0.315985130 0.000000000 H -1.334777040 0.188413060 0.873651500 H -1.334777040 0.188413060 -0.873651500 H -2.761449880 -0.315971940 0.000000000 C -1.178134160 -1.767917280 0.000000000 H -1.534806620 -2.272315330 0.873651740 H -1.534807450 -2.272316160 -0.873650920 O 0.251865840 -1.767934180 -0.000001150 H 0.572301420 -2.672876720 0.000175020 .. note:: This self-contained input file without the need to provide auxiliary file also works: .. code-block:: ANI-1ccx # pre-trained model geomopt # requests geometry optimization xyzfile=' 9 C -1.691449880 -0.315985130 0.000000000 H -1.334777040 0.188413060 0.873651500 H -1.334777040 0.188413060 -0.873651500 H -2.761449880 -0.315971940 0.000000000 C -1.178134160 -1.767917280 0.000000000 H -1.534806620 -2.272315330 0.873651740 H -1.534807450 -2.272316160 -0.873650920 O 0.251865840 -1.767934180 -0.000001150 H 0.572301420 -2.672876720 0.000175020 ' Also, ``optxyz=`` is optional. If you do not provide it, it will dump the optimized geometries in ``optgeoms.xyz``. The final line with optional ``optxyz=[file name]`` will request MLatom to save the optimized geometry in XYZ file (here ``opt.xyz``). Order of the options is not important, options are case-insensitive except for file names. After you created the input file and initial XYZ file, you can run MLatom, e.g., on XACS cloud, as: .. code-block:: bash mlatom geomopt.inp &> geomopt.out The optimized geometry will be saved in XYZ file ``opt.xyz``. The program output will be saved in file ``geomopt.out``. Alternatively, you can run the same simulations without any input file simply with in the command line with the same options: .. code-block:: bash mlatom geomopt ANI-1ccx xyzfile=init.xyz optxyz=opt.xyz You should be able to find in the MLatom output the properties of the final geometry, i.e., the final energy and any other relevant properties such as standard deviation of the neural networks in the ANI-1ccx method. In addition, you can check how the optimization progresses as it will print out the energy for each optimization step. For our example, the part of the output would look like: .. code-block:: ============================================================================== Optimization of molecule 1 ============================================================================== ... ------------------------------------------------------------------------------ Iteration 16 ------------------------------------------------------------------------------ Molecule with 9 atom(s): C, H, H, H, C, H, H, O, H XYZ coordinates, Angstrom 1 C -1.672571 -0.341122 -0.000001 2 H -1.307766 0.181713 0.885095 3 H -1.307762 0.181707 -0.885099 4 H -2.764560 -0.305014 -0.000003 5 C -1.188732 -1.771664 0.000009 6 H -1.559124 -2.298647 0.885998 7 H -1.559099 -2.298653 -0.885987 8 O 0.237878 -1.729915 0.000028 9 H 0.575701 -2.626896 0.000135 Interatomic distance matrix, Angstrom [[0. 1.09079523 1.09079553 1.09258554 1.51014856 2.15168963 2.15169099 2.3618975 3.20616369] [1.09079523 0. 1.77019411 1.7727239 2.14784248 2.49306442 3.05812311 2.61279175 3.4955523 ] [1.09079553 1.77019411 0. 1.77272412 2.14784559 3.05812426 2.49306137 2.61280479 3.49561397] [1.09258554 1.7727239 1.77272412 0. 2.15274127 2.49251861 2.49252903 3.32339819 4.06798183] [1.51014856 2.14784248 2.14784559 2.15274127 0. 1.09538946 1.09538987 1.42722094 1.96077669] [2.15168963 2.49306442 3.05812426 2.49251861 1.09538946 0. 1.77198527 2.08269455 2.33451888] [2.15169099 3.05812311 2.49306137 2.49252903 1.09538987 1.77198527 0. 2.08269348 2.33459306] [2.3618975 2.61279175 2.61280479 3.32339819 1.42722094 2.08269455 2.08269348 0. 0.95848785] [3.20616369 3.4955523 3.49561397 4.06798183 1.96077669 2.33451888 2.33459306 0.95848785 0. ]] Energy: -154.891959 Hartree Energy gradients, Hartree/Angstrom 1 C -0.000084 0.000219 0.000000 2 H 0.000012 -0.000026 -0.000223 3 H 0.000012 -0.000025 0.000223 4 H 0.000084 -0.000248 0.000000 5 C 0.000027 -0.000139 -0.000001 6 H -0.000370 0.000067 0.000512 7 H -0.000371 0.000067 -0.000512 8 O 0.000601 0.000016 -0.000001 9 H 0.000089 0.000070 0.000001 Energy gradients norm: 0.001194 Hartree/Angstrom Iteration Energy (Hartree) 1 -154.8894274342429 2 -154.8894274342429 3 -154.8913352129107 4 -154.8915612759089 5 -154.8917131295130 6 -154.8917542901067 7 -154.8918338589389 8 -154.8918607356472 9 -154.8918827332715 10 -154.8919015405256 11 -154.8919243290977 12 -154.8919395989974 13 -154.8919477894997 14 -154.8919527832317 15 -154.8919570330184 16 -154.8919591168921 Final properties of molecule 1 Standard deviation of NNs : 0.00063190 Hartree 0.39652 kcal/mol Energy : -154.89195912 Hartree Your numbers may somewhat vary, particularly depending on what optimizer your MLatom setup is using. Above example is using ASE with the L-BFGS optimizer and it took 16 iterations to converge. .. You can download the full output file :download:`geomopt_ase.log ` to inspect it closer. See below the :ref:`instructions on optimizers `. In addition, geometry optimizations will dump many useful files: - the optimization trajectories in XYZ format ``opttraj1.xyz`` and JSON format ``opttraj1.json`` and so on for each molecule. - in case of optimizations with the Gaussian optimizer, you will also get the corresponding input and output files ``gaussian1.com`` and ``gaussian1.log`` etc for each molecule. If you want to control how much information is saved (e.g., for big molecules and many molecules): - ``printmin`` will not print information about every iteration. - ``printall`` will print detailed information at each iteration. - ``dumpopttrajs=False`` will not dump any optimization trajectories. .. _model-definition: Choosing the method or model ---------------------------- The user can benefit from MLatom's many choices of the methods or models to be used for optimizations. See the separate manuals and tutorials, here we give brief examples to quickly get started. Many methods such as ANI-1ccx or AIQM1 are recognized automatically by MLatom as shown above. The general way to define the QM method is to use ``method`` and the corresponding QM program providing its implementations (MLatom uses interfaces to such programs). For example, ab initio (e.g., HF or MP2) or DFT (e.g., B3LYP/6-31G*) can be requested with the input: .. code-block:: geomopt # 1. requests geometry optimization method=B3LYP/6-31G* # 2. request running DFT optimization with B3LYP qmprog=PySCF # 3. request using PySCF for B3LYP calculations; qmprog=Gaussian can be also used xyzfile=init.xyz # 4. initial geometry guess optxyz=opt.xyz # 5. file with optimized geometry The general way to request the geometry optimization with the user-trained ML model is to specify the ML model type with ``MLmodelType`` and ``MLmodelIn`` keyword such as: .. code-block:: geomopt # 1. requests geometry optimization MLmodelType=MACE # 2. request optimization with the MACE-type of machine learning potential MLmodelIn=mace.pt # 3. the file with the trained model should be provided, here it is mace.pt file xyzfile=init.xyz # 4. initial geometry guess optxyz=opt.xyz # 5. file with optimized geometry See manuals and tutorials on how to train such models, e.g., `for MACE `_. The advanced user-defined hybrid models such as those based on the delta-learning concept can be used for geometry optimizations only via Python API (see :ref:`example below `). .. note:: In principle, our default recomendation is to use AIQM1 as it is usually the most accurate method for affordable cost. However, it is currently only applicable to the compounds with CHNO elements. Another limitation is that to unlock its full potential, we recommend to install the MNDO program for its QM part. The XACS cloud uses Sparrow instead, which currently only provides numerical gradients limiting its applicability for geometry optimizations. ANI-1ccx is hence a good alternative on the XACS cloud if it can be applied (it has additional limitations to neutral closed-shell compounds in ground state). Optimization of many molecules --------------------------------- MLatom as a data-oriented program naturally allows to perform geometry optimizations of many molecules. You need to simply prepare the XYZ file with initial guesses of multiple molecules, e.g., for optimizing the hydrogen and methane molecules in one computational job your ``init.xyz`` file may look like: .. code-block:: bash 2 1 0.0000000000 0.0000000000 0.0000000000 1 0.7414000000 0.0000000000 0.0000000000 5 C 0.0000000000 0.0000000000 0.0000000000 H 1.0870000000 0.0000000000 0.0000000000 H -0.3623333220 -1.0248334322 -0.0000000000 H -0.3623333220 0.5124167161 -0.8875317869 H -0.3623333220 0.5124167161 0.8875317869 MLatom will save the optimized geometries in the XYZ file with the same order of molecules as in the initial guess. The output will contain the optimization details for each molecule too. Charge and multiplicity ----------------------- A quick reminder on how to define the charges and multiplicities of a single or multiple molecules. The input might looks something like: .. code-block:: geomopt # 1. requests geometry optimization AIQM1 # 2. AI-enhanced quantum mechanical method 1 xyzfile=init.xyz # 3. initial geometry guess optxyz=opt.xyz # 4. file with optimized geometry charges=0,1 # 5. charge of the first molecule is 0, of the second is 1. multiplicities=1,2 # 6. multiplicity of the first molecule is 1, second - 2 .. note:: It does not make sense to define charges and multiplicities for models like ANI-1ccx which were trained on the neutral closed-shell species. .. _optimizers: Choosing the optimization program --------------------------------- Now, you might have noticed that MLatom tells which program it used as an optimizer for running the geometry optimizations. If you used the XACS cloud for computations, it should have been ASE: .. code-block:: bash Atomic simulation environment (ASE): A. Hjorth Larsen, et al. J. Phys. Condens. Matter. 2017, 29, 273002 .. note:: In the output, MLatom will summarize what methods, models, and programs it used with the recommended citations to include in your publications. This is very useful information to pay attention to and properly give credit to the hard work of people. However, is not printed when MLatom is used through Python API, so please check carefully the manual and publications. MLatom supports the use of several optimizers: aforementioned ASE (using L-BFGS algorithm), the default one in popular Gaussian program, geomeTRIC (using the BFGS algorithm), and optimizers implemented in SciPy (L-BFGS algorithm by default). They can be chosen by adding ``optprog=[ASE or Gaussian or SciPy]`` keyword (case-insensitive), i.e., as: .. code-block:: geomopt # 1. requests geometry optimization ANI-1ccx # 2. pre-trained model xyzfile=init.xyz # 3. initial geometry guess optxyz=opt.xyz # 4. file with optimized geometry optprog=gaussian # 5. Gaussian is choosen as the optimizer If no optimizer is requested with a keyword, MLatom will check the availability and use the optimizer in the order Gaussian -> ASE -> SciPy. Each of the optimizers have their advantages and disadvantages: - ASE is open-source and tested by the community. It is available on the XACS cloud and can be easily installed. - Gaussian is very versatile and time-tested program, allowing its users to exploit all Gaussian's functionality as usual with additional benefit of access to ML models. It should be obtained and installed separately by the user. The I/O is currently not very efficient so that the optimizations with very fast ML models will take much longer than with other optimizers. It might be improved in the future. - `geomeTRIC `__ is also available. - Our SciPy impelementation is an experimental option so please use it on your own risk. .. note:: MLatom is being rapidly developed, so it will get more features and improved geometry optimization features. Our recommendations will change too. Please check from time-to-time this tutorial and also follow our updates on the usual channels. Whatever optimizer you choose, it is worth checking the instructions below for pitfalls and tips. ASE optimizer +++++++++++++ ASE performs the geometry optimization until it reaches the defined threshold for forces or maximum number of iterations. The default value for the maximum number of iterations is 200, for the forces 0.02 eV/Angstrom and the default algorithm is L-BFGS. This default behavior can be changed by using the corresponding ASE keywords, e.g.: .. code-block:: geomopt # 1. requests geometry optimization ANI-1ccx # 2. pre-trained model xyzfile=init.xyz # 3. initial geometry guess optxyz=opt.xyz # 4. file with optimized geometry optprog=ase # 5. ASE is choosen as the optimizer ase.steps=10000 # 6. increasing the maximum number of iterations from 200 to 10000 ase.fmax=0.01 # 7. tightening the convergence criterion from 0.02 to 0.01 eV/Angstrom ase.optimizer=BFGS # 8. selecting the BFGS instead of L-BFGS algorithm for optimization .. note:: If the ASE optimizer reaches its maximum number of optimization iterations, MLatom will save the final geometry at the last iteration. Please double-check MLatom's output and if the number of iterations is 200, the geometry may be still far from the optimized one and you need to restart optimization. Setting ``ase.steps`` to a higher value, e.g., ``ase.steps=10000`` might help too. Gaussian optimizer ++++++++++++++++++ When Gaussian optimizer is used, you will see the ``gaussian.log`` file which is similar to the common Gaussian output file with the difference, that it will use the MLatom's model to predict energies and gradients to perform the geometry optimization. Then you can use your favorite tools to analyze this output as usual for Gaussian, which is a big advantage for the Gaussian users. For above example, you can check the MLatom's output file for the above geometry optimization with Gaussian optimizer: .. code-block:: ============================================================================== Optimization of molecule 1 ============================================================================== Iteration Energy (Hartree) 1 -154.8894274617628 2 -154.8918239620292 3 -154.8919507619391 4 -154.8919601346281 Final properties of molecule 1 Standard deviation of NNs : 0.00062817 Hartree 0.39419 kcal/mol Energy : -154.89196013 Hartree The examples of the full :download:`MLatom ` and :download:`Gaussian ` output files can be downloaded too. An additional important advantage is that the advanced users can modify the ``gaussian.com`` file which contains the input required for geometry optimization with MLatom's models. This way, the user can unlock the access to all the great Gaussian functionality for geometry optimizations such as freezing coordinates or defining input in internal coordinates, etc. After the ``gaussian.com`` file is modified, you can run with it the Gaussian job as usual, e.g., in your system like ``g16 gaussian.com`` in command line. .. _opt_with_PyAPI: Optimization through Python API ------------------------------- For many users, it might be more convenient to use the Python API for geometry optimizations as it provides more flexibility to use more complicated models and build custom workflows. Below is a quick recap of how to do all above simulations but with Python API. In addition, we will show how to build a more complicated model based on delta-learning to perform the geometry optimization. For starters, let's optimize the geometries of ethanol. This will give a pretty good overview of how to write such a code: .. code-block:: python import mlatom as ml # Get the initial guess for the molecules to optimize initmol = ml.data.molecule.from_xyz_file('init.xyz') # initmol charge and multiplicity can be changed if required, e.g., by uncommenting the lines below: #initmol.charge = 1 ; initmol.multiplicity = 2 # Choose one of the predifined (automatically recognized) methods mymodel = ml.models.methods(method='ANI-1ccx') # or QM method, e.g., B3LYP with Gaussian #mymodel = ml.models.methods(method='B3LYP/6-31G*', program='Gaussian') # or ML model, e.g., MACE: #mymodel = ml.models.mace(model_file='mace.pt') # Optimize the geometry with the choosen optimizer: geomopt = ml.optimize_geometry(model=mymodel, initial_molecule=initmol, program='ASE', maximum_number_of_steps=10000) # Get the final geometry final_mol = geomopt.optimized_molecule # and its XYZ coordinates final_mol.xyz_coordinates # Check and save the final geometry print('Optimized coordinates:') print(final_mol.get_xyz_string()) final_mol.write_file_with_xyz_coordinates(filename='final.xyz') # You can also check the optimization trajectory print('Number of optimization steps:', len(geomopt.optimization_trajectory.steps)) # and access any geometry during optimization as intermediate_step_geom = geomopt.optimization_trajectory.steps[2].molecule print('Intermediate coordinates:') print(intermediate_step_geom.get_xyz_string()) Note that there is a subtle but important difference: if you use ``initial_molecule``, the initmol object with not be changed, it will remain the original initial molecule. However, if you use instead ``molecule`` parameters, it will save the final optimized coordinates and other properties (e.g., forces and energies) into the original molecule object, i.e.: .. code-block:: python # This will not change the initmol, you can check it as print('Initial molecule before optimization') print(initmol.get_xyz_string()) geomopt = ml.optimize_geometry(model=mymodel, initial_molecule=initmol, program='ASE', maximum_number_of_steps=10000) print('Initial molecule after optimization') print(initmol.get_xyz_string()) print('Final coordinates after optimization') print(geomopt.optimized_molecule.get_xyz_string()) # Compare to the following code that will change the initmol, you can check it as print('Initial molecule before optimization') print(initmol.get_xyz_string()) geomopt = ml.optimize_geometry(model=mymodel, molecule=initmol, program='ASE', maximum_number_of_steps=10000) print('Initial molecule after optimization') print(initmol.get_xyz_string()) The first option is good to preserve the initial molecule, the second option is good if you just want the final geometry. To run above example, you might want to download the required files (:download:`init.xyz ` and :download:`mace.pt `) .. _opt-with-delta-model: Optimization with custom hybrid models ++++++++++++++++++++++++++++++++++++++ .. include:: tutorial_geomopt_delta-learn.inc Optimization with constraints ++++++++++++++++++++++++++++++ You can also apply constraints to your molecule during geometry optimization. We use ASE implementation so this is only available when you use ASE optimizer via Python API. It needs one more argument ``constraints`` and it should be used like this: ``constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]}`` (Check `FixInternals class in ASE `__ for more information). The units of bond lengths are Angstrom and those of angles and dihedrals are degrees. Note that the indices of atoms start from 0! .. note:: Optimization with constraints is added to MLatom in 3.2.0 release. It looks like ASE checks optimization convergence before applying constraints, so if you start from a previously optimzed geometry, you may get the exactly the same geometry. One solution is to perturb the initial geometry a bit. Below shows an example of optimizing ethane using AIQM1 while setting the target of C-C bond length as 1.8 Angstrom. The initial XYZ coordinates are (:download:`ethane_initial.xyz `): .. code-block:: 8 C -3.41779278 -2.06081078 0.00000000 H -3.06113836 -3.06962078 0.00000000 H -3.06111994 -1.55641259 -0.87365150 H -4.48779278 -2.06079760 0.00000000 C -2.90445057 -1.33485451 1.25740497 H -1.83445237 -1.33656157 1.25838332 H -3.25950713 -0.32548149 1.25642758 H -3.26271953 -1.83812251 2.13105517 In Python script: .. code-block:: python import mlatom as ml mol = ml.data.molecule.from_xyz_file('ethane_initial.xyz') print(f"Initial C-C bond length: {mol.get_internuclear_distance_matrix()[0][4]} Angstrom") constraints = {'bonds':[[1.8,[0,4]]]} aiqm1 = ml.models.methods(method='AIQM1',qm_program='sparrow') geomopt = ml.optimize_geometry(model=aiqm1,initial_molecule=mol,program='ase',constraints=constraints) optmol = geomopt.optimized_molecule print(f"Final C-C bond length: {optmol.get_internuclear_distance_matrix()[0][4]} Angstrom") print("XYZ coordinates:") print(optmol.get_xyz_string()) The output should look like this: .. code-block:: Initial C-C bond length: 1.5399999964612658 Angstrom Step Time Energy fmax LBFGS: 0 10:11:55 -2169.358192 0.6954 LBFGS: 1 10:11:56 -2168.493055 1.5444 LBFGS: 2 10:11:56 -2168.324507 2.1746 LBFGS: 3 10:11:57 -2168.702277 0.3925 LBFGS: 4 10:11:57 -2168.719565 0.3556 LBFGS: 5 10:11:58 -2168.744194 0.4116 LBFGS: 6 10:11:58 -2168.765605 0.3914 LBFGS: 7 10:11:58 -2168.779525 0.2010 LBFGS: 8 10:11:59 -2168.782649 0.0282 LBFGS: 9 10:11:59 -2168.782767 0.0237 LBFGS: 10 10:12:00 -2168.782733 0.0370 LBFGS: 11 10:12:00 -2168.782749 0.0236 LBFGS: 12 10:12:00 -2168.782623 0.0873 LBFGS: 13 10:12:01 -2168.782774 0.0203 LBFGS: 14 10:12:01 -2168.782777 0.0217 LBFGS: 15 10:12:01 -2168.782654 0.0988 LBFGS: 16 10:12:02 -2168.782762 0.0299 LBFGS: 17 10:12:02 -2168.782723 0.0372 LBFGS: 18 10:12:03 -2168.782762 0.0250 LBFGS: 19 10:12:03 -2168.777521 0.5284 LBFGS: 20 10:12:03 -2168.782764 0.0369 LBFGS: 21 10:12:04 -2168.782758 0.0397 LBFGS: 22 10:12:04 -2168.781062 0.1423 LBFGS: 23 10:12:04 -2168.782774 0.0375 LBFGS: 24 10:12:05 -2168.782774 0.0251 LBFGS: 25 10:12:05 -2168.782670 0.0433 LBFGS: 26 10:12:06 -2168.782769 0.0209 LBFGS: 27 10:12:06 -2168.782777 0.0177 Final C-C bond length: 1.799999998386536 Angstrom XYZ coordinates: 8 C -3.4609594100000 -2.1223293000000 -0.1060679300000 H -3.0827425100000 -3.1376804600000 -0.0751292200000 H -3.0834089300000 -1.5865323800000 -0.9697472700000 H -4.5445733300000 -2.1041917000000 -0.0741819000000 C -2.8607234100000 -1.2737628100000 1.3635074000000 H -1.7772071700000 -1.2935363400000 1.3338165800000 H -3.2376294200000 -0.2576124300000 1.3304506900000 H -3.2417292800000 -1.8070164100000 2.2269711900000 Any questions or suggestions? ----------------------------- If you have further questions, criticism, and suggestions, we would be happy to receive them in English or Chinese via `email `_, `Slack `_ (preferred), or WeChat (please send an email to request to add you to the XACS user support group).