Installation

The easiest way to use MLatom is to run it on the XACS cloud, which requires no local installation. If you want to install it locally anyway, follow the instructions below. A short video demonstrating how to install and use MLatom is also available.

MLatom is a Python package and can be easily installed on Linux using this shell command:

pip install mlatom

You also need to install the required dependencies in your Python environment, as described below.

Dependencies

Minimal setup

pip install numpy scipy torch torchani tqdm matplotlib statsmodels h5py pyh5md
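After running the command above, it can help to confirm that each module is visible to the interpreter you plan to use with MLatom. The following is a minimal sketch, not part of MLatom: the function name check_modules and the PYTHON override variable are hypothetical, and python3 is assumed to be the interpreter on your PATH. It only looks each module up without importing it, so it stays fast even for large packages such as torch.

```shell
# Report which Python modules the chosen interpreter can find.
# PYTHON is a hypothetical override; python3 is assumed by default.
check_modules() {
    py="${PYTHON:-python3}"
    missing=0
    for mod in "$@"; do
        # find_spec locates a top-level module without importing it
        if "$py" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$mod') else 1)" >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every module was found
    [ "$missing" -eq 0 ]
}

if check_modules numpy scipy torch torchani tqdm matplotlib statsmodels h5py pyh5md; then
    echo "All minimal-setup modules were found."
else
    echo "Install the modules marked MISSING with pip."
fi
```

The same function can be called with the optional module names; note that some packages use an import name that differs from the pip name (for example, scikit-learn is imported as sklearn).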

Useful optional modules

pip install sgdml rmsd openbabel xgboost scikit-learn pyscf rdkit pandas \
   ase fortranformat tensorflow geometric

Full Anaconda environment setup:

Download mlatom.yml.

Then,

conda env create -n mlatom --file mlatom.yml
conda activate mlatom

Additional software packages

Many MLatom features rely on third-party software packages that are not Python modules and must be installed and set up separately, as described here.

The third-party packages below are optional and can be installed separately to enable specific features. In alphabetical order:

  • ASE (can be used for geometry optimizations and thermochemistry)

  • COLUMBUS (required for CASSCF)

  • DeePMD-kit (enables several machine learning potentials implemented there)

  • dftd4 (required for the D4 dispersion correction used in AIQM1, ANI-2x-D4, and ANI-1x-D4)

  • GAP and QUIP (required for GAP-SOAP machine learning potential)

  • Gaussian (can be used for QM calculations, geometry optimizations, frequencies and thermochemistry, required for IRC and anharmonic frequencies)

  • hyperopt (can be used for hyperparameter optimization)

  • MACE (required for the MACE potential)

  • MNDO (can be used & recommended for AIQM1 and many other semi-empirical QM methods)

  • Newton-X (required for UV/vis spectra simulations)

  • Orca (required for CCSD(T)*/CBS and can be used for DFT calculations)

  • PhysNet (required for PhysNet potential)

  • sGDML (required for sGDML potential)

  • Sparrow (can be used & recommended for AIQM1 and many other semi-empirical QM methods)

  • TorchANI (required for AIQM1 and ANI potentials)

  • Turbomole (required for ADC(2) calculations)

COLUMBUS

COLUMBUS 7 is required for CASSCF calculations. It can be obtained and installed as described on the program website.

It must be made available to MLatom by setting the environment variable COLUMBUS, e.g.:

export COLUMBUS=[path to COLUMBUS directory with runc executable]
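An export typed at the prompt lasts only for the current shell session, so variables like this one are usually added to a startup file. The sketch below is one way to do that without creating duplicate entries. The function name persist_export is hypothetical, ~/.bashrc is assumed as the default startup file, and /path/to/Columbus is a placeholder. The demonstration writes to a scratch file rather than to your real startup file.

```shell
# Append "export NAME='VALUE'" to a startup file unless NAME is already exported there.
# Values containing single quotes are not handled by this sketch.
persist_export() {
    name="$1"
    value="$2"
    rcfile="${3:-$HOME/.bashrc}"   # assumed default; pass another file as the third argument
    if grep -q "^export $name=" "$rcfile" 2>/dev/null; then
        echo "$name already set in $rcfile"
    else
        printf "export %s='%s'\n" "$name" "$value" >> "$rcfile"
        echo "added $name to $rcfile"
    fi
}

# Demonstration on a scratch file (placeholder path)
scratch="$(mktemp)"
persist_export COLUMBUS /path/to/Columbus "$scratch"
persist_export COLUMBUS /path/to/Columbus "$scratch"   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

The same helper applies to the other variables on this page; open a new shell or source the startup file for the change to take effect.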

Turbomole

Turbomole is required for ADC(2) calculations. It can be obtained and installed as described on the program website.

It must be made available to MLatom by setting the environment variable TURBODIR.

Orca

Orca is required for CCSD(T)*/CBS calculations and can be used for DFT calculations. We use Orca 4.2.0. It can be obtained and installed as described on the program website.

It must be made available to MLatom by setting the environment variable orcabin, e.g.:

export orcabin=[path to Orca executable]

Newton-X

Newton-X is required for ML-NEA calculations.

  1. Install Newton-X (NX), preferably version 2.2, for which our implementation was tested.

  2. Define the NX environment variable, e.g. export NX=/path/to/Newton-X.

TorchANI

TorchANI is required for calculations with AIQM1 and ANI family of potentials.

  1. install NumPy and a nightly build of PyTorch (if you do not have them already):

pip install numpy tensorboard
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu100/torch_nightly.html

  2. install TorchANI:

pip install torchani

Visit https://aiqm.github.io/torchani/ for more information. The latest TorchANI version used for testing was v2.2; if you run into problems with the newest version, install the tested one with pip install torchani==2.2. The CUDA extension for AEV calculation is currently not supported for the NN part of AIQM1 and ANI-1ccx.

DeePMD-kit

Required for DPMD and DeepPot-SE potentials.

  1. download installer for DeePMD-kit from GitHub (tested v1.2.2)

  2. run installer

  3. add the environment variable $DeePMDkit pointing to the directory where the dp binary is located (bin/ in your installation directory), e.g. export DeePMDkit=/export/home/fcge/deepmd-kit-1.2/bin.

GAP and QUIP

Required for GAP-SOAP potentials.

  1. compile QUIP and GAP from source

1.1 install prerequisites

sudo apt-get install gcc gfortran python python-pip libblas-dev liblapack-dev # for systems that use apt; do the equivalent on other OSes
pip install numpy ase f90wrap

1.2 get source code of QUIP and GAP

git clone --recursive https://github.com/libAtoms/QUIP.git

Get source code of GAP from http://www.libatoms.org/gap/gap_download.html (form-filling required).

Then put source code in QUIP/src/.

1.3 compile

cd QUIP
export QUIP_ARCH=linux_x86_64_gfortran_openmp # enables multi-threading; use 'export QUIP_ARCH=linux_x86_64_gfortran' to build without OpenMP (no multi-threading)
export QUIPPY_INSTALL_OPTS=--user # omit for a system-wide installation
make config

Enter Y for GAP when prompted, or set HAVE_GAP=1 in build/$QUIP_ARCH/Makefile.inc, then run make. The built binaries are QUIP/build/$QUIP_ARCH/quip and QUIP/build/$QUIP_ARCH/gap_fit, where the directory name matches the QUIP_ARCH value exported above (e.g. linux_x86_64_gfortran_openmp).

  2. add the environment variables $quip and $gap_fit pointing to the quip and gap_fit binaries, e.g.

export quip='/export/home/fcge/GAP-SOAP/QUIP/build/linux_x86_64_gfortran_openmp/quip'
export gap_fit='/export/home/fcge/GAP-SOAP/QUIP/build/linux_x86_64_gfortran_openmp/gap_fit'

Visit https://libatoms.github.io/GAP/index.html for more information.
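The two example exports above differ from each other only in the binary name, and the directory they share is named after QUIP_ARCH. Assuming QUIP places its binaries in build/<QUIP_ARCH>/ under the source root (which is what the example paths suggest), a small helper can print matching export lines for any build. The function name quip_export_lines and the path /path/to/QUIP are placeholders.

```shell
# Print export lines for the quip and gap_fit binaries.
# Assumption: binaries live in <QUIP root>/build/<QUIP_ARCH>/.
quip_export_lines() {
    root="$1"   # QUIP source directory
    arch="$2"   # QUIP_ARCH value used for the build
    for tool in quip gap_fit; do
        printf "export %s='%s/build/%s/%s'\n" "$tool" "$root" "$arch" "$tool"
    done
}

# Placeholder root; falls back to the OpenMP architecture if QUIP_ARCH is not set
quip_export_lines /path/to/QUIP "${QUIP_ARCH:-linux_x86_64_gfortran_openmp}"
```

To apply the lines in the current shell instead of printing them, wrap the call in eval, e.g. eval "$(quip_export_lines /path/to/QUIP "$QUIP_ARCH")".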

PhysNet

Required for PhysNet models.

  1. clone PhysNet from its GitHub page

git clone https://github.com/MMunibas/PhysNet.git

  2. install TensorFlow:

pip install tensorflow

  3. if you use TensorFlow v2, run the command below in PhysNet’s directory to make the scripts compatible with TF v2:

for i in $(find . -name '*.py'); do sed -i -e 's/import tensorflow as tf/import tensorflow.compat.v1 as tf\ntf.disable_v2_behavior()/g' "$i"; done

  4. add the environment variable $PhysNet pointing to the PhysNet directory, e.g.

export PhysNet=/export/home/fcge/PhysNet/
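The sed command above rewrites every Python file it is given, so it is worth being able to re-run it safely, for example after pulling new PhysNet sources. The sketch below applies the same substitution but skips files that already contain the compat import. The function names patch_tf_v1_compat and patch_tree are hypothetical, and GNU sed (the default on most Linux systems) is assumed for -i and for \n in the replacement.

```shell
# Apply the TensorFlow v1-compat substitution to one file, at most once.
# Assumes GNU sed for in-place editing and "\n" in the replacement text.
patch_tf_v1_compat() {
    file="$1"
    if grep -qF 'import tensorflow.compat.v1 as tf' "$file"; then
        echo "skipped  $file"    # already patched
    elif grep -qF 'import tensorflow as tf' "$file"; then
        sed -i -e 's/import tensorflow as tf/import tensorflow.compat.v1 as tf\ntf.disable_v2_behavior()/g' "$file"
        echo "patched  $file"
    fi
}

# Apply the patch to every .py file below a directory (default: current directory).
patch_tree() {
    find "${1:-.}" -name '*.py' | while IFS= read -r f; do
        patch_tf_v1_compat "$f"
    done
}

# Usage, from PhysNet's directory:
#   patch_tree .
```

Files that do not import TensorFlow are left untouched and produce no output.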

sGDML

Required for GDML and sGDML potentials.

  1. install sGDML

pip install sgdml==0.4.4

  2. add the path of the sGDML binary to the environment variable $sGDML, e.g.

export sGDML=/export/home/fcge/.linuxbrew/bin/sgdml

Visit http://quantum-machine.org/gdml/doc/ for more information.

Note

In our tests we found that installation is more stable with: pip install scipy==1.7.1.

MACE

Required for MACE potentials.

  1. clone the MACE repository from GitHub

git clone https://github.com/ACEsuit/mace.git

  2. install MACE with pip

pip install ./mace

Warning

We tested version 0.3.2 of MACE; please use this version if you encounter any problems.

MNDO

The MNDO program is required to provide the ODM2* part of AIQM1. Alternatively, a (development) version of SCINE Sparrow can be used (at the moment it has no analytical derivatives for the ODM2* part, so only single-point calculations are recommended; see a paper on Sparrow; note that the development version of Sparrow also implements single-point AIQM1 calculations).

The free binary and open-source code of the MNDO program are available from the official distributors of the MNDO code, as described at https://mndo.kofo.mpg.de.

After the MNDO program is installed, you need to set the environment variable pointing to the MNDO executable (typically mndo99), e.g., in bash:

export mndobin=[path to the executable]/mndo99

Sparrow

The SCINE Sparrow program can be used to provide many of the semi-empirical methods. See its website for the installation instructions.

Note that a (development) version can also be used instead of the MNDO program to provide the ODM2* part of AIQM1; its biggest limitation is that it currently has no analytical gradients. See a paper on Sparrow for details. The development version of Sparrow also implements single-point AIQM1 calculations. Because this version is difficult to install, we recommend using it for AIQM1 single-point calculations on the XACS cloud.

dftd4

The dftd4 program is required to provide the D4 part of AIQM1.

The dftd4 program can be obtained as both an executable and open-source code. We recommend using dftd4 v3.5.0 (dftd4 v2.5.0 for MLatom versions earlier than 3.0.1), which can calculate the Hessian needed for thermochemical calculations. To install the dftd4 program from source, see the README.md file on the dftd4 GitHub page.

After the dftd4 program is installed, you need to set the environment variable pointing to the dftd4 executable, e.g., in bash:

export dftd4bin=[path to the executable]/dftd4

Gaussian

Required for geometry optimizations, frequency calculations, TS searches, IRC, thermochemistry, and ML-NEA. For some of these tasks, ASE can be used instead; see below.

Our implementation works with both Gaussian 09 and Gaussian 16. Gaussian is a commercial program that must be obtained and installed separately.

To use the Gaussian interface, make sure that the environment variable $GAUSS_EXEDIR points to the right place.
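Most of the packages above are located through an environment variable that holds a path, so a single pass over those variables can catch typos and stale paths before a calculation is started. The following is a minimal sketch under stated assumptions: the function name check_path_var and the list name MLATOM_PATH_VARS are hypothetical, and the check only tests that the path exists, not that the program runs. GAUSS_EXEDIR is left out because Gaussian setups often store a colon-separated list of directories in it, which a single-path test would misreport.

```shell
# Path-valued variables named on this page (GAUSS_EXEDIR is deliberately excluded).
MLATOM_PATH_VARS="COLUMBUS TURBODIR orcabin NX DeePMDkit quip gap_fit PhysNet sGDML mndobin dftd4bin"

# Report whether an exported variable is set and points to an existing path.
# Return status: 0 = ok, 1 = unset or empty, 2 = set but the path does not exist.
check_path_var() {
    name="$1"
    value="$(printenv "$name" || true)"   # printenv sees exported variables only
    if [ -z "$value" ]; then
        echo "unset    $name"
        return 1
    elif [ -e "$value" ]; then
        echo "ok       $name=$value"
        return 0
    else
        echo "missing  $name=$value (path does not exist)"
        return 2
    fi
}

for v in $MLATOM_PATH_VARS; do
    check_path_var "$v" || true   # keep going; unset variables are fine for unused features
done
```

Because every third-party package is optional, an "unset" line is only a problem for the features you actually intend to use.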

ASE

Required for geometry optimizations, frequency calculations, and thermochemistry. Alternatively, Gaussian can be used; see above.

ASE (Atomic Simulation Environment) is a set of Python modules that can be installed as described on the ASE website, i.e.:

pip install ase

hyperopt

To enable hyperopt, install the hyperopt package with pip install hyperopt.