Overview

The Python API (or PyAPI for short) of MLatom mainly contains 3 components: mlatom.data, mlatom.models, and mlatom.simulations.

With class mlatom.data, one can create/manipulate/save/load chemical data into objects like mlatom.data.atom, mlatom.data.molecule, and mlatom.data.molecular_database.

The class mlatom.models contains lots of computational chemistry models, based on quantum mechanics and machine learning, for making predictions on molecules. The models can be categorized into 3 types:

methods: Models of methods that can be used ‘as is’ without the need to be trained by the user.
ml_model: Machine learning models that require user to train them.
model_tree_node: models consisting of other model objects (mlatom.models.methods, mlatom.models.ml_model, or another mlatom.models.model_tree_node).

mlatom.simulations makes use of an mlatom.models object to perform simulations such as geometry optimizations or molecular dynamics.

Please check a simple example that illustrates the usage of these components.

Data

!---------------------------------------------------------------------------!
! data: Module for working with data                                        !
! Implementations by: Pavlo O. Dral, Fuchun Ge,                             !
!                     Shuang Zhang, Yi-Fan Hou, Yanchi Ou                   !
!---------------------------------------------------------------------------!

Create an atom object.

Parameters:

nuclear_charge (int, optional) – provide the nuclear charge to define the atom.
atomic_number (int, optional) – provide the atomic number to define the atom.
element_symbol (int, optional) – provide the element symbol to define the atom.
nuclear_mass (int, optional) – provide the nuclear mass to define the atom.
xyz_coordinates (Array-like, optional) – specify the location of atom in Cartesian coordinates.

copy(atomic_labels=None) → atom[source]: Return a copy of the current atom object.

class mlatom.data.molecule(charge: int = 0, multiplicity: int = 1, atoms: List[atom] = None)[source]

Create a molecule object.

Parameters:

charge (float, optional) – Specify the charge of the molecule.
multiplicity (int, optional) – Specify the multiplicity of the molecule.
atoms (List[atom], optional) – Specify the atoms in the molecule.

Examples

Select an atom inside with subscription:

from mlatom.data import atom, molecule
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
print(id(at), id(mol[0]))

id: The unique ID for this molecule.

charge: The electric charge of the molecule.

multiplicity: The multiplicity of the molecule.

read_from_xyz_file(filename: str, format: str | None = None) → molecule[source]

Load molecular geometry from XYZ file.

Data in standard XYZ format can be read if parameter format is not specified.

Extra formats supoorted are:

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

Parameters:

filename (str) – The name of the file to be read.
format (str, optional) – The format of the file.

read_from_xyz_string(string: str = None, format: str | None = None) → molecule[source]

Load molecular geometry from XYZ string.

Data in standard XYZ format can be read if parameter format is not specified.

Extra formats supoorted are:

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

Parameters:

string (str) – The string input.
format (str, optional) – The format of the string.

read_from_numpy(coordinates: ndarray, species: ndarray) → molecule[source]

Load molecular geometry from a numpy array of coordinates and another one for species.

The shape of the input coordinates should be (N, 3), while (N,) for the input species.

Where the N is the number of atoms.

read_from_smiles_string(smi_string: str) → molecule[source]

Generate molecular geometry from a SMILES string provided.

The geometry will be generated and optimized with Pybel’s make3D() method.

classmethod from_xyz_file(filename: str, format: str | None = None) → molecule[source]: Classmethod wrapper for molecule.read_from_xyz_file(), returns a molecule object.

classmethod from_xyz_string(string: str = None, format: str | None = None) → molecule[source]: Classmethod wrapper for molecule.read_from_xyz_string(), returns a molecule object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) → molecule[source]: Classmethod wrapper for molecule.read_from_numpy(), returns a molecule object.

classmethod from_smiles_string(smi_string: str) → molecule[source]: Classmethod wrapper for molecule.read_from_smiles_string(), returns a molecule object.

add_atom_from_xyz_string(line: str) → None[source]: Add an atom to molecule from a string in XYZ format

add_scalar_property(scalar, property_name: str = 'y') → None[source]

Add a scalar property to the molecule. So the property can be called by molecule.<property_name>.

Parameters:

scalar – The scalar to be added.
property_name (str, optional) – The name assign to the scalar property.

add_xyz_derivative_property(derivative, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None[source]

Add a XYZ derivative property to the molecule.

Parameters:

derivative – The derivative property to be added.
property_name (str, optional) – The name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assign to the derivative property.

add_xyz_vectorial_property(vector, xyz_vectorial_property: str = 'xyz_vector') → None[source]

Add a XYZ vectorial property to the molecule.

Parameters:

vector – The vector to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial property.

write_file_with_xyz_coordinates(filename: str, format: str | None = None) → None[source]

Write the molecular geometry data into a file. Data in standard XYZ format can be read if parameter format is not specified.

Extra formats supoorted are:

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

Parameters:

filename (str) – The name of the file to be written.
format (str, optional) – The format of the file.

get_xyz_string() → str[source]: Return the molecular geometry in a string of XYZ format.

property atomic_numbers: ndarray: The atomic numbers of the atoms in the molecule.

property element_symbols: ndarray: The element symbols of the atoms in the molecule.

property smiles: str: The SMILES representation of the molecule.

property xyz_coordinates: ndarray: The XYZ geometry of the molecule.

property kinetic_energy: float: Give out the kinetic energy (A.U.) based on the xyz_velocities.

copy(atomic_labels=None, molecular_labels=None)[source]: Return a copy of current molecule object.

dump(filename=None, format='json')[source]: Dump the current molecule object into a file. Only in json format, which is supported now.

load(filename=None, format='json')[source]: Load a molecule object from a dumped file.

property state_energies: ndarray: The electronic state energies of the molecule.

property state_gradients: ndarray: The electronic state energy gradients of the molecule.

property energy_gaps: ndarray: The energy gaps of different states.

property excitation_energies: ndarray: The excitation energies of the molecule from ground state.

class mlatom.data.molecular_database(molecules: List[molecule] = None)[source]

Create a database for molecule objects.

Parameters:: molecules (List[molecule]) – A list of molecule to be included in the molecular database.

Examples

Select an atom inside with subscription:

from mlatom.data import atom, molecule, molecular_database
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
molDB = molecular_database([mol])
print(id(mol) == id(molDB[0]))
# the output should be 'True'

Slicing the database like a numpy array:

from mlatom.data import molecular_database
molDB = molecular_database.from_xyz_file('devtests/al/h2_fci_db/xyz.dat')
print(len(molDB))           # 451
print(len(molDB[:100:4]))   # 25

read_from_xyz_file(filename: str, append: bool = False) → molecular_database[source]

Load molecular geometry from XYZ file.

Parameters:

filename (str) – The name of the file to be read.
append (bool, optional) – Append to current database if True, otherwise clear the current database.

read_from_xyz_string(string: str, append=False) → molecular_database[source]

Load molecular geometry from XYZ string.

Parameters:

string (str) – The name of the file to be read.
append (bool, optional) – Append to current database if True, otherwise clear the current database.

read_from_numpy(coordinates: ndarray, species: ndarray, append: bool = False) → molecular_database[source]

Load molecular geometries from a numpy array of coordinates and another one for species.

The shape of the input coordinates should be (M, N, 3), while (M, N,) for the input species.

Where the N is the number of atoms and M is the number of molecules.

read_from_smiles_file(smi_file: str, append: bool = False) → molecular_database[source]

Generate molecular geometries from a SMILES file provided.

The geometries will be generated and optimized with Pybel’s make3D() method.

read_from_smiles_string(smi_string: str, append: bool = False) → molecular_database[source]

Generate molecular geometries from a SMILES string provided.

The geometries will be generated and optimized with Pybel’s make3D() method.

classmethod from_xyz_file(filename: str) → molecular_database[source]: Classmethod wrapper for molecular_database.read_from_xyz_file(), returns a molecular_database object.

classmethod from_xyz_string(string: str) → molecular_database[source]: Classmethod wrapper for molecular_database.read_from_xyz_string(), returns a molecular_database object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) → molecular_database[source]: Classmethod wrapper for molecular_database.read_from_numpy(), returns a molecular_database object.

classmethod from_smiles_file(smi_file: str) → molecular_database[source]: Classmethod wrapper for molecular_database.read_from_smiles_file(), returns a molecular_database object.

classmethod from_smiles_string(smi_string: str | List) → molecular_database[source]: Classmethod wrapper for molecular_database.read_from_smiles_string(), returns a molecular_database object.

add_scalar_properties(scalars, property_name: str = 'y') → None[source]

Add scalar properties to the molecules.

Parameters:

scalars – The scalar to be added.
property_name (str, optional) – The name assign to the scalar property.

add_scalar_properties_from_file(filename: str, property_name: str = 'y') → None[source]

Add scalar properties from a file to the molecules.

Parameters:

filename (str) – Specify the text file that contains properties.
property_name (str, optional) – The name assign to the scalar property.

add_xyz_derivative_properties(derivatives, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None[source]

Add a XYZ derivative property to the molecule.

Parameters:

derivatives – The derivatives to be added.
property_name (str, optional) – The name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assign to the derivative property.

add_xyz_derivative_properties_from_file(filename: str, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None[source]

Add a XYZ derivatives from a text file to the molecules.

Parameters:

filename (str) – The filename that contains derivatives to be added.
property_name (str, optional) – The name of the associated non-derivative properties.
xyz_derivative_property (str, optional) – the name assign to the derivative properties.

add_xyz_vectorial_properties(vectors, xyz_vectorial_property: str = 'xyz_vector') → None[source]

Add a XYZ vectorial properties to the molecules.

Parameters:

vectors – The vectors to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial properties.

add_xyz_vectorial_properties_from_file(filename: str, xyz_vectorial_property: str = 'xyz_vector') → None[source]

Add a XYZ derivatives from a text file to the molecules.

Parameters:

filename (str) – The filename that contains vectorial properties to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial properties.

write_file_with_xyz_coordinates(filename: str) → None[source]

Write the molecular geometries into a file in XYZ format.

Parameters:: filename (str) – The name of the file to be written.

get_xyz_string() → None[source]: Return a string in XYZ format for the molecules.

write_file_with_properties(filename, property_to_write='y')[source]: Write a property of molecules to a text file.

property atomic_numbers: ndarray: The 2D array of the atomic numbers of each atom, for all molecules in the database.

property element_symbols: ndarray: The 2D array of the element symbols of each atom, for all molecules in the database.

property ids: The IDs of the molecules in the database.

property smiles: str: The SMILES string of the molecules in the database.

write_file_with_smiles(filename)[source]: Write the SMILES of the molecules in the database to a file.

property nuclear_masses: The nuclear_masses of the molecules in the database.

property charges: The electric charges of the molecules in the database.

property multiplicities: The multiplicities of the molecules in the database.

get_properties(property_name='y')[source]: Return the properties of the molecules by a given property name.

set_properties(**kwargs)[source]: Set properties of the molecules by given property name(s) as keyword(s).

get_xyz_derivative_properties(xyz_derivative_property='xyz_derivatives')[source]: Return XYZ derivative properties by the name.

get_xyz_vectorial_properties(property_name)[source]: Return XYZ vectorial properties by the name.

write_file_with_xyz_derivative_properties(filename, xyz_derivative_property_to_write='xyz_derivatives')[source]: Write XYZ derivative properties into a file.

write_file_energy_gradients(filename)[source]: Write energy gradients into a file.

write_file_with_xyz_vectorial_properties(filename, xyz_vectorial_property_to_write='xyz_vector')[source]: Write XYZ vectorial properties into a file.

write_file_with_hessian(filename, hessian_property_to_write='hessian')[source]: Write Hessians into a file.

append(obj)[source]: Append a molecule/molecular database.

copy(atomic_labels=None, molecular_labels=None, molecular_database_labels=None)[source]: Return a copy of the database.

dump(filename=None, format=None)[source]: Dump the molecular database to a file.

classmethod load(filename=None, format=None)[source]: Load a molecular database from a file.

property xyz_coordinates: The XYZ coordinates of each atom in every molecule.

class mlatom.data.molecular_trajectory(steps=None)[source]

A class for storing/access molecular trajectory data, which is generated from a dynamics or a optimization task.

dump(filename=None, format=None)[source]

Dump the molecular_trajectory object into a file.

Available formats are:

'h5md' (requires python module h5py and pyh5md)
'json'
'plain_text'

load(filename: str = None, format: str = None)[source]: Load the previously dumped molecular_trajectory from file.

get_xyz_string() → str[source]: Return the XYZ string of the molecules in the trajectory.

class mlatom.data.h5md(filename: str, data: Dict[str, Any] = {}, mode: str = 'w')[source]

Saving trajectory data to file in H5MD format

Parameters:

filename (str) – The filename of the h5md file output.
data (Dict) – The data to be stored (optional, if provided, the file will be closed after storing data).
mode (str, optional) – A string that controls the file processing mode (default value: ‘w’ for a new file, ‘r+’ for an exisiting file). The choices are listed in the table below which is consistent with pyh5md.File()

r	Readonly, file must exist
r+	Read/write, file must exist
w	Create file, truncate if exists
w- or x	Create file, fail if exists

Examples:

traj0 = h5md('traj.h5')  # open 'traj.h5'
traj1 = h5md('/tmp/test.h5', mode='r')  # open an existing file in readonly mode
traj2 = h5md('/tmp/traj2.h5', data={'time': 1.0, 'total_energy': -32.1, 'test': 8848}) # add some data to the file, then close the file

traj0.write(data) # write data to opened file
traj0(data) # an alternative way to write data

data = traj0.export() # export the data in the opened file
data = traj0() # an alternative way to export data
with h5md('test.h5') as traj: # export with a with statement
    data = traj.export()


traj0.close() # close the file

Note

the default data path in HDF5 file

particles/all:
‘box’, ‘gradients’, ‘mass’, ‘nad’, ‘names’, ‘position’, ‘species’, ‘velocities’

observables:
‘angular_momentum’, ‘generated_random_number’, ‘kinetic_energy’, ‘linear_momentum’, ‘nstatdyn’, ‘oscillator_strengths’, ‘populations’, ‘potential_energy’, ‘random_seed’, ‘sh_probabilities’, ‘total_energy’, ‘wavefunctions’,

and any other keywords

h5: the HDF5 file object

write(data: Dict[str, Any]) → None[source]: Write data to the opened H5 file. Data should be a dictionary-like object with ‘time’ in its keys().

export() → Dict[str, ndarray][source]

Export the data in the opened H5 file.

Returns:: A dictionary of the trajectory data in the H5 file.

close() → None[source]: Close the opened file.

__call__() → Dict[str, ndarray]

Export the data in the opened H5 file.

Returns:: A dictionary of the trajectory data in the H5 file.

Models

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

methods

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.models.methods(method: str = None, program: str = None, **kwargs)[source]

Create a model object with a specified method.

Parameters:

method (str) – Specify the method. Available methods are listed in the section below.
program (str, optional) – Specify the program to use.
**kwargs – Other method-specific options

Available Methods:

'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'AM1', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'CCSD(T)*/CBS', 'CNDO/2', 'D4', 'DFTB0', 'DFTB2', 'DFTB3', 'GFN2-xTB', 'MINDO/3', 'MNDO', 'MNDO/H', 'MNDO/d', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM2*', 'ODM3', 'ODM3*', 'OM1', 'OM2', 'OM3', 'PM3', 'PM6', 'RM1', 'SCC-DFTB', 'SCC-DFTB-heats'.

Methods listed above can be accepted without specifying a program. The required programs still have to be installed though as described in the installation manual.

Available Programs and Their Corresponding Methods:

Program

Methods

TorchANI

'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4'

dftd4

'AIQM1', 'AIQM1@DFT', 'ANI-1x-D4', 'ANI-2x-D4', 'D4'

MNDO or Sparrow

'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'MNDO', 'MNDO/d', 'ODM2*', 'ODM3*', 'OM2', 'OM3', 'PM3', 'SCC-DFTB', 'SCC-DFTB-heats'

MNDO

'CNDO/2', 'MINDO/3', 'MNDO/H', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM3', 'OM1', semiempirical OMx, DFTB, NDDO-type methods

Sparrow

'DFTB0', 'DFTB2', 'DFTB3', 'PM6', 'RM1', semiempirical OMx, DFTB, NDDO-type methods

xTB

'GFN2-xTB', semiempirical GFNx-TB methods

Orca

'CCSD(T)*/CBS', DFT

Gaussian

ab initio methods, DFT

PySCF

ab initio methods, DFT

property nthreads

int([x]) -> integer int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal. >>> int(‘0b100’, base=0) 4

predict(*args, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

AIQM1

!---------------------------------------------------------------------------!
! aiqm1: Artificial intelligence quantum-mechanical method 1                !
! Implementations by: Peikung Zheng & Pavlo O. Dral                         !
!---------------------------------------------------------------------------!

class mlatom.aiqm1.aiqm1(method='AIQM1', qm_program=None, qm_program_kwargs={}, dftd4_kwargs={}, **kwargs)[source]

The Artificial intelligence-quantum mechanical method as in the AIQM1 paper.

Parameters:

method (str, optional) – AIQM method used. Currently supports AIQM1, AIQM1@DFT*, and AIQM1@DFT. Default value: AIQM1.
qm_program (str) – The QM program used in the calculation of ODM2* part. Currently supports MNDO and Sparrow program.
qm_program_kwargs (dictionary, optional) – Keywords passed to QM program.

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Run AIQM1 calculation
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
aiqm1.predict(molecule=mol, calculate_energy=True, calculate_energy_gradients=True)
# Get energy, gradient, and prediction uncertainty of AIQM1
energy = mol.energy
gradient = mol.gradient
std = mol.aiqm1_nn.energy_standard_deviation

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, nstates=1, current_state=0, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

ml_model

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.models.hyperparameter(value: Any = None, optimization_space: str = 'linear', dtype: Callable | None = None, name: str = '', minval: Any = None, maxval: Any = None, step: Any = None, choices: Iterable[Any] = [], **kwargs)[source]

Class of hyperparameter object, containing data could be used in hyperparameter optimizations.

Parameters:

value (Any, optional) – The value of the hyperparameter.
optimization_space (str, optional) – Defines the space for hyperparameter. Currently supports 'linear', and 'log'.
dtype (Callable, optional) – A callable object that forces the data type of value. Automatically choose one if set to None.

update(new_hyperparameter: hyperparameter) → None[source]

Update hyperparameter with data in another instance.

Parameters:: new_hyperparameter (mlatom.models.hyperparamters) – Whose data are to be applied to the current instance.

copy()[source]

Returns a copy of current instance.

Returns:: a new instance copied from current one.
Return type:: mlatom.models.hyperparamter

class mlatom.models.hyperparameters(dict=None, /, **kwargs)[source]

Class for storing hyperparameters, values are auto-converted to mlatom.models.hyperparameter objects. Inherit from collections.UserDict.

Initiaion:

Initiate with a dictinoary or kwargs or both.

e.g.:

hyperparamters({'a': 1.0}, b=hyperparameter(value=2, minval=0, maxval=4))

copy(keys: Iterable[str] | None = None) → hyperparameters[source]

Returns a copy of current instance.

Parameters:: keys (Iterable[str], optional) – If keys provided, only the hyperparameters selected by keys will be copied, instead of all hyperparameters.
Returns:: a new instance copied from current one.
Return type:: mlatom.models.hyperparamters

class mlatom.models.kreg(model_file: str | None = None, ml_program: str = 'KREG_API', equilibrium_molecule: molecule | None = None, prior: float = 0, nthreads: int | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {})[source]

Create a KREG model object

Parameters:

model_file (str, optional) – The name of the file where the model should be dumped to or loaded from.
ml_program (str, optional) – Specify which ML program to use. Avaliable options: 'KREG_API', 'MLatomF.
equilibrium_molecule (mlatom.data.molecule | None) – Specify the equilibrium geometry to be used to generate RE descriptor. The geometry with lowest energy/value will be selected if set to None.
prior (default - None) – the prior can be ‘mean’, None (0.0), or any floating point number.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.

train(molecular_database=None, property_to_learn=None, xyz_derivative_property_to_learn=None, save_model=True, invert_matrix=False, matrix_decomposition=None, prior=None, hyperparameters: Dict[str, Any] | hyperparameters = {})[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.

predict(molecular_database=None, molecule=None, calculate_energy=False, calculate_energy_gradients=False, calculate_hessian=False, property_to_predict=None, xyz_derivative_property_to_predict=None, hessian_to_predict=None)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.

mlatom.models.ani(**kwargs)[source]: Returns an ANI model object (see mlatom.interfaces.torchani_interface.ani).

mlatom.models.dpmd(**kwargs)[source]: Returns a DPMD model object (see mlatom.interfaces.dpmd_interface.dpmd).

mlatom.models.gap(**kwargs)[source]: Returns a GAP model object (see mlatom.interfaces.gap_interface.gap).

mlatom.models.physnet(**kwargs)[source]: Returns a PhysNet model object (see mlatom.interfaces.physnet_interface.physnet).

mlatom.models.sgdml(**kwargs)[source]: Returns an sGDML model object (see mlatom.interfaces.sgdml_interface.sgdml).

mlatom.models.mace(**kwargs)[source]: Returns an MACE model object (see mlatom.interfaces.mace_interface.mace).

model_tree_node

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.models.model_tree_node(name=None, parent=None, children=None, operator=None, model=None)[source]

Create a model tree node.

Parameters:

name (str) – The name assign to the object.
parent – The parent of the model node.
children – The children of this model tree node.
operator – Specify the operation to be made when making predictions.

predict(**kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

dump(filename=None, format='json')[source]: Dump the object to a file.

Interfaces

Interfaces to third-party software.

TorchANI

!---------------------------------------------------------------------------!
! Interface_TorchANI: Interface between TorchANI and MLatom                 !
! Implementations by: Fuchun Ge and Max Pinheiro Jr                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.torchani_interface.ani(model_file: str = None, device: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, verbose=1)[source]

Create an ANI (ANAKIN-ME: Accurate NeurAl networK engINe for Molecular Energie) model object.

Interfaces to TorchANI.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
device (str, optional) – Indicate which device the calculation will be run on. i.e. ‘cpu’ for CPU, ‘cuda’ for Nvidia GPUs. When not speficied, it will try to use CUDA if there exists valid CUDA_VISIBLE_DEVICES in the environ of system.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

save(model_file: str = '') → None[source]

Save the model to file (.pt format).

Parameters:: model_file (str, optional) – The filename that the model to be saved into. If not provided, a randomly generated string will be used.

load(model_file: str = '', species_order: List[str] | None = None, AEV_parameters: Dict | None = None, self_energies: List[float] | None = None, reset_parameters: bool = False, method: str = '') → None[source]

Load a saved ANI model from file (.pt format).

Parameters:

model_file (str) – The filename of the model to be loaded.
species_order (List[str], optional) – Manually provide species order if it’s not present in the saved model.
AEV_parameters (Dict, optional) – Manually provide AEV parameters if it’s not present in the saved model.
self_energies (List[float], optional) – Manually provide self energies if it’s not present in the saved model.
reset_parameters (bool) – Reset network paramters in the loaded model.
method (str) – Load an ANI method, see also ani.load_ani_model().

load_ani_model(method: str, **hyperparameters) → None[source]

Load an ANI model.

Arguments:
method(str): Can be 'ANI-1x', 'ANI-1ccx', or 'ANI-2x'.

train(molecular_database: data.molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: data.molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | models.hyperparameters = {}, spliting_ratio: float = 0.8, save_model: bool = True, check_point: str = None, reset_optim_state: bool = False, use_last_model: bool = False, reset_parameters: bool = False, reset_network: bool = False, reset_optimizer: bool = False, save_every_epoch: bool = False, energy_weighting_function: Callable = None, energy_weighting_function_kwargs: dict = {}) → None[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.
validation_molecular_database (mlatom.data.molecular_database | str, optional) – Explicitly defines the database for validation, or use 'sample_from_molecular_database' to make it sampled from the training set.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
spliting_ratio (float, optional) – The ratio sub-training dataset in the whole training dataset.
save_model (bool, optional) – Whether save the model to disk during training process. Note that the model might be saved many times during training.
reset_optim_state (bool, optional) – Whether to reset the state of optimizer.
use_last_model (bool, optional) – Whether to keep the self.model as it is in the last training epoch. If False, the best model will be loaded to memory at the end of training.
reset_parameters (bool, optional) – Whether to reset the model’s parameters before training.
reset_network (bool, optional) – Whether to re-construct the network before training.
reset_optimizer (bool, optional) – Whether to re-define the optimizer before training .
save_every_epoch (bool, optional) – Whether to save model in every epoch, valid when save_model is True.
energy_weighting_function (Callable, optional) – A weighting function $\mathit{W}(\mathbf{E_ref})$ that assign weights to training points based on their reference energies.
energy_weighting_function_kwargs (dict, optional) – Extra weighting function arguments in a dictionary.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 65536) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.
batch_size (int, optional) – The batch size for batch-predictions.

NN_initialize(a: float = 1.0) → None[source]

Reset the network parameters using torch.nn.init.kaiming_normal_().

Parameters:: a (float) – Check torch.nn.init.kaiming_normal_().

fix_layers(layers_to_fix: List[List[int]] | List[int])[source]

Fix specific layers to be non-trainable for each element.

Parameters:

layers_to_fix (List) –

Should be:

A list of integers. Layers indicate by the integers will be fixed
A list of lists of integers. Each sub-list defines the layers to be fixed for each species, in the order of self.species_order.

class mlatom.interfaces.torchani_interface.ani_methods(method: str = 'ANI-1ccx', device: str = 'cpu', **kwargs)[source]

Create a model object with one of the ANI methods

Parameters:

method (str) – A string that specifies the method. Available choices: 'ANI-1x', 'ANI-1ccx', or 'ANI-2x'.
device (str, optional) – Indicate which device the calculation will be run on, i.e. ‘cpu’ for CPU, ‘cuda’ for Nvidia GPUs. When not speficied, it will try to use CUDA if there exists valid CUDA_VISIBLE_DEVICES in the environ of system.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, batch_size: int = 65536) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
batch_size (int, optional) – The batch size for batch-predictions.

DeepMD-kit

!---------------------------------------------------------------------------!
! Interface_DeePMDkit: Interface between DeePMD-kit and MLatom              !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.dpmd_interface.dpmd(model_file=None, hyperparameters={}, verbose=1)[source]

Create an DeepPot-SE model object.

Interfaces to DeepMD-kit.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, stdout=None, stderr=None, json_input=None, dirname=None)[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.

GAP/QUIP

!---------------------------------------------------------------------------!
! Interface_GAP: Interface between GAP and MLatom                           !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.gap_interface.gap(model_file=None, hyperparameters={}, verbose=1)[source]

Create an GAP-SOAP model object.

Interfaces to QUIP.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, stdout=None, stderr=None)[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.

PhysNet

!---------------------------------------------------------------------------!
! Interface_PhysNet: Interface between PhysNet and MLatom                   !
! Implementations by: Fuchun Ge and Max Pinheiro Jr                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.physnet_interface.physnet(model_file=None, hyperparameters={}, verbose=1)[source]

Create an PhysNet model object.

Interfaces to PhysNet program.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, save_model=True, log=True, check_point=False, use_last_model=False, summary_interval=1, validation_interval=1, save_interval=1, record_run_metadata=0) → None[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.

MACE

!---------------------------------------------------------------------------!
! Interface_MACE: Interface between MACE and MLatom                         !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

mlatom.interfaces.mace_interface.get_dataset_from_molDB(train_db: molecular_database, valid_db: molecular_database, valid_fraction: float = 0.1, config_type_weights: Dict = {'Default': 1.0}, energy_key: str = '', gradients_key: str = '') → Tuple[mace.tools.scripts_utils.SubsetCollection, Dict[int, float] | None][source]: Load training and test dataset from molecular database

class mlatom.interfaces.mace_interface.mace(model_file: str = None, device: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, verbose=True)[source]

Create an MACE model object.

Interfaces to MACE program.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
device (str, optional) – Indicate which device the calculation will be run on. i.e. ‘cpu’ for CPU, ‘cuda’ for Nvidia GPUs. When not speficied, it will try to use CUDA if there exists valid CUDA_VISIBLE_DEVICES in the environ of system.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

train(molecular_database: data.molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: data.molecular_database | str | None = 'sample_from_molecular_database', spliting_ratio: float = 0.8, hyperparameters: Dict[str, Any] | models.hyperparameters = {}) → None[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.
validation_molecular_database (mlatom.data.molecular_database | str, optional) – Explicitly defines the database for validation, or use 'sample_from_molecular_database' to make it sampled from the training set.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 8) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.
batch_size (int, optional) – The batch size for batch-predictions.

class mlatom.interfaces.mace_interface.CheckpointHandler_modified(*args: Any, **kwargs: Any)[source]

sGDML

!---------------------------------------------------------------------------!
! Interface_sGDML: Interface between sGDML and MLatom                       !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.sgdml_interface.sgdml(model_file=None, hyperparameters={}, verbose=1, max_memory=None, max_processes=None, use_torch=False, lazy_training=False)[source]

Create an sGDML model object.

Interfaces to sGDML.

Parameters:

model_file (str, optional) – The filename that the model to be saved with or loaded from.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the hyperparameters of the model with provided.
verbose (int, optional) – 0 for silence, 1 for verbosity.

Hyperparameters:

Please refer to the sgdml manual

no_E: mlatom.models.hyperparameter(value=False) gdml: mlatom.models.hyperparameter(value=False) perms: mlatom.models.hyperparameter(value=None) sigma: mlatom.models.hyperparameter(value=None) E_cstr: mlatom.models.hyperparameter(value=False) cprsn: mlatom.models.hyperparameter(value=False)

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, save_model=True, task_dir=None, overwrite=False, max_memory=None, max_processes=None, use_torch=None, lazy_training=None)[source]

Train the model with molecular database provided.

Parameters:

molecular_database (mlatom.data.molecular_database) – The database of molecules for training.
property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 65536) → None[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.
batch_size (int, optional) – The batch size for batch-predictions.

Gaussian

!---------------------------------------------------------------------------!
! gaussian: interface to the Gaussian program                               !
! Implementations by: Pavlo O. Dral, Peikun Zheng, Yi-Fan Hou               !
!---------------------------------------------------------------------------!

class mlatom.interfaces.gaussian_interface.gaussian_methods(method='B3LYP/6-31G*', nthreads=None, save_files_in_current_directory=False, working_directory=None, **kwargs)[source]

Gaussian interface

Parameters:

method (str) – Method to use
nthreads (int) – equivalent to %proc in Gaussian input file
save_files_in_current_directory (bool) – whether to keep input and output files, default 'False'
working_directory (str) – path to the directory where the program output files and other tempory files are saved, default 'None'

Note

The format of method should be the same as that in Gaussian, e.g., 'B3LYP/6-31G*'

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, gaussian_keywords=None)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
gaussian_keywords (some type) – # needs to be documented.

Orca

!---------------------------------------------------------------------------!
! orca: interface to the ORCA program                                       !
! Implementations by: Yuxinxin Chen                                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.orca_interface.orca_methods(method='wb97x/6-31G*', **kwargs)[source]

ORCA interface

Parameters:

method (str) – method used the same as in ORCA, e.g. 'B3LYP/6-31G*' (case insensitive). This is the first line in orca input and you can also store the whole first line here.
save_files_in_current_directory (bool) – whether to keep input and output files, default 'False'
working_directory (str) – path to the directory where the program output files and other tempory files are saved, default 'None'
nthreads (int) – equivalent to %pal nprocs in ORCA input file
nthreads_list (list) – a list of number of nthreads used in CCSD(T)*/CBS method. The order should be [mp2_tz, mp2_qz, dlpno_normal_dz, dlpno_normal_tz, dlpno_tight_dz]
additional_keywords (list) – list of keywords(str) to be added to orca input (first line)
input_file (str) – name of your orca input file. Default will use “[molecule number]_”
output_keywords (list) – list of keywords that you want to extract from output file. (Currently customized keywords are only supported in energy calculation and in property.txt file)

Note

When using CCSD(T)*/CBS for calculation, please make sure the 'nthreads' you used for each method will not cause memory exceeding. We suggest using 'nthreads_list' to properly set 'nthreads' for each component method in the order: [MP2/cc-pVTZ, MP2/cc-pVQZ, DLPNO-CCSD(T)-normalPNO/cc-pVDZ, DLPNO-CCSD(T)-normalPNO/cc-pVTZ, DLPNO-CCSD(T)-tightPNO/cc-pVTZ] If only 'nthreads' is set, all component methods would use the same number of threads.

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
**kwargs – # needs to be documented.

DFT-D4

!---------------------------------------------------------------------------!
! dftd4: interface to the dftd4 program                                     !
! Implementations by: Pavlo O. Dral & Peikun Zheng                          !
!---------------------------------------------------------------------------!

class mlatom.interfaces.dftd4_interface.dftd4_methods(functional=None, save_files_in_current_directory=True, working_directory=None, **kwargs)[source]

DFT-D4 interface

Parameters:

functional (str) – functional to use
save_files_in_current_directory (bool) – whether to keep input and output files, default 'True'
working_directory (str) – path to the directory where the program output files and other tempory files are saved, default 'None'

Note

The default DFT-D4 implementation provides a shared memory parallelisation for CPUs. They offer openMP parallelisation, which is not implemented here currently. For more discussion, please refer to https://github.com/dftd4/dftd4/issues/20.

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, nstates=1, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

PySCF

!---------------------------------------------------------------------------!
! Interface_PySCF: interface to the PySCF program                           !
! Implementations by: Yuxinxin Chen                                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.pyscf_interface.pyscf_methods(method='B3LYP/6-31g', **kwargs)[source]

PySCF interface

Parameters:

method (str) – Method to use
nthreads (int) – Set the number of OMP threads

Note

Methods supported:

Energy: HF, MP2, DFT, CISD, FCI, CCSD/CCSD(T), TD-DFT/TD-HF

Gradients: HF, MP2, DFT, CISD, CCSD, RCCSD(T), TD-DFT/TD-HF

Hessian: HF, DFT

predict(molecule=None, molecular_database=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
**kwargs – # needs to be documented.

Sparrow

!---------------------------------------------------------------------------!
! sparrow: interface to the Sparrow program                                 !
! Implementations by: Pavlo O. Dral & Peikun Zheng                          !
!---------------------------------------------------------------------------!

class mlatom.interfaces.sparrow_interface.sparrow_methods(method='ODM2*', read_keywords_from_file='', save_files_in_current_directory=False, working_directory=None, **kwargs)[source]

Sparrow interface

Parameters:

method (str) – method to use
read_keywords_from_file (str) – keywords used in Sparrow
save_files_in_current_directory (bool) – whether to keep input and output files, default 'False'
working_directory (str) – path to the directory where the program output files and other tempory files are saved, default 'None'

Note

Methods supported:

Energy: DFTB0, DFTB2, DFTB3 MNDO, MNDO/d, AM1, PM3, PM6, RM1, OM2, OM3, ODM2*, ODM3* AIQM1

Gradients: DFTB0, DFTB2, DFTB3 MNDO, MNDO/d, AM1, PM3, PM6, RM1

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

xTB

!---------------------------------------------------------------------------!
! xtb: interface to the xtb program                                         !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.xtb_interface.xtb_methods(method='GFN2-xTB', read_keywords_from_file='', **kwargs)[source]

xTB interface

Parameters:

method (str) – xTB methods
read_keywords_from_file (str) – keywords used in xTB

Note

Only GFN2-xTB is available.

Examples:

from ml.interfaces.xtb import xtb_methods()

# read molecule from xyz file
mol = ml.data.molecule()
mol.read_from_xyz_file('sp.xyz')

# initialize xtb methods
model = xtb_methods(method='GFN2-xTB)

# calculate energy, gradients and hessian
model.predict(molecule=mol,
            calculate_energy_gradients=True,
            calculate_hessian=True)
print(mol.energy)

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

MNDO

!---------------------------------------------------------------------------!
! mndo: interface to the MNDO program                                       !
! Implementations by: Pavlo O. Dral, Peikun Zheng, and Lina Zhang           !
!---------------------------------------------------------------------------!

class mlatom.interfaces.mndo_interface.mndo_methods(method='ODM2*', read_keywords_from_file='', save_files_in_current_directory=True, working_directory=None, **kwargs)[source]

MNDO interface

Parameters:

method (str) – method used in MNDO
read_keywords_from_file (str) – keywords used in MNDO
save_files_in_current_directory (bool) – whether to keep input and output files, default 'True'
working_directory (str) – path to the directory where the program output files and other tempory files are saved, default 'None'

predict(molecular_database=None, molecule=None, nstates=1, current_state=0, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, calculate_nacv=False, read_density_matrix=False)[source]

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database contains the molecules whose properties need to be predicted by the model.
molecule (mlatom.models.molecule, optional) – A molecule object whose property needs to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.

Simulations

!---------------------------------------------------------------------------!
! simulations: Module for simulations                                       !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.simulations.optimize_geometry(model=None, model_predict_kwargs={}, initial_molecule=None, molecule=None, ts=False, program=None, optimization_algorithm=None, maximum_number_of_steps=None, convergence_criterion_for_forces=None, working_directory=None, **kwargs)[source]

Geometry optimization.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies and forces.
initial_molecule (mlatom.data.molecule) – the molecule object to optimize.
ts (bool, optional) – whether to do transition state search. Currently only be done with program=Gaussian or ASE.
program (str, optional) – the engine used in geometry optimization. Currently supports Gaussian, ASE, and scipy.
optimization_algorithm (str, optional) – the optimization algorithm used in ASE. Default value: LBFGS (ts=False), dimer (ts=False).
maximum_number_of_steps (int, optional) – the maximum number of steps for ASE and SciPy. Default value: 200.
convergence_criterion_for_forces (float, optional) – forces convergence criterion in ASE. Default value: 0.02 eV/Angstroms.
working_directory (str, optional) – working directory. Default value: ‘.’, i.e., current directory.
constraints (dict, optional) – constraints for geometry optimization. Currently only available with program=ASE and follows the same conventions as in ASE: constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]} (check FixInternals class in ASE for more information).

Examples:

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run geometry optimization
geomopt = ml.simulations.optimize_geometry(model = aiqm1, initial_molecule=mol, program = 'ASE')
# Get the optimized geometry, energy, and gradient
optmol = geomopt.optimized_molecule
geo = optmol.get_xyz_coordinates()
energy = optmol.energy
gradient = optmol.get_energy_gradients()

class mlatom.simulations.freq(model=None, model_predict_kwargs={}, molecule=None, program=None, normal_mode_normalization='mass deweighted normalized', anharmonic=False, anharmonic_kwargs={}, working_directory=None)[source]

Frequence analysis.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies and forces and Hessian.
molecule (mlatom.data.molecule) – the molecule object with necessary information.
program (str, optional) – the engine used in frequence analysis through modified TorchANI (if Gaussian not found or any other string is given), pyscf or Gaussian interfaces.
normal_mode_normalization (str, optional) – normal modes output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).
anharmonic (bool) – whether to do anharmonic frequence calculation.
working_directory (str, optional) – working directory. Default value: ‘.’, i.e., current directory.

Examples:

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run frequence analysis
ml.simulations.freq(model=aiqm1, molecule=mol, program='ASE')
# Get frequencies
frequencies = mol.frequencies

class mlatom.simulations.thermochemistry(model=None, molecule=None, program=None, normal_mode_normalization='mass deweighted normalized')[source]

Thermochemical properties calculation.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies and forces and Hessian.
molecule (mlatom.data.molecule) – the molecule object with necessary information.
program (str) – the engine used in thermochemical properties calculation. Currently support Gaussian and ASE.
normal_mode_normalization (str, optional) – normal modes output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted unnormalized (default).

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run thermochemical properties calculation
ml.simulations.thermochemistry(model=aiqm1, molecule=mol, program='ASE')
# Get ZPE and heat of formation
ZPE = mol.ZPE
Hof = mol.DeltaHf298

The thermochemical properties available in molecule object after the calculation:

ZPE: Zero-point energy
DeltaE2U: Thermal correction to Energy (only available in Gaussian)
DeltaE2H: Thermal correction to Enthalpy (only available in Gaussian)
DeltaE2G: Thermal correction to Gibbs free energy (only available in Gaussian)
U0: Internal energy at 0K
H0: Enthalpy at 0K
U: Internal energy (only available in Gaussian)
H: Enthalpy
G: Gibbs free energy
S: Entropy (only available in Gaussian)
atomization_energy_0K
ZPE_exclusive_atomization_energy_0K
DeltaHf298: Heat of formation at 298 K

class mlatom.simulations.dmc(model: model, initial_molecule: molecule = None, initial_molecular_database: molecular_database = None, energy_scaling_factor: float = 1.0)[source]

Run diffusion Monte Carlo simulation for molecule(s) using PyVibDMC.

Parameters:

model (mlatom.models.model) – The potential energy surfaces model. The unit should be Hartree, otherwise a correct energy_scaling_factor need to be set.
initial_molecule (mlatom.data.molecule) – The initial geometry for the walkers. Usually a energy minimum geometry should be provided. By default every coordinate will be scaled by 1.01 to make it slightly distorted.
energy_scaling_factor (float, optional) – A factor that will be multiplied to the model’s energy pridiction.

run(run_dir: str = 'DMC', weighting: str = 'discrete', number_of_walkers: int = 5000, number_of_timesteps: int = 10000, equilibration_steps: int = 500, dump_trajectory_interval: int = 500, dump_wavefunction_interval: int = 1000, descendant_weighting_steps: int = 300, time_step: float = 0.024188843265857, initialize: bool = False)[source]

Run the DMC simulation.

Parameters:

run_dir (str) – The folder for the output files.
weighting (str) – 'discrete' or 'continuous'. 'continuous' keeps the ensemble size constant.
number_of_walkers (int) – The number of geometries exploring the potential surface.
number_of_timesteps (int) – The number of steps the simulation will go.
equilibration_steps (int) – The number of steps for equilibration.
dump_trajectory_interval (int) – The interval for dumping walkers’ trajectories.
dump_wavefunction_interval (int) – The interval for collecting wave function.
descendant_weighting_steps (int) – The number of time steps for descendant weighting per wave function.
time_step (float) – The length of each time step in fs.

load(filename)[source]: Load previous simulation results from a HDF5 file.

get_zpe(start_step=1000) → float[source]

Return calculated zero-point energy in Hartree.

Parameters:: start_step (int) – The starting step for averaging the energies.

Initial conditions

mlatom.initial_conditions.generate_initial_conditions(molecule=None, generation_method=None, number_of_initial_conditions=1, file_with_initial_xyz_coordinates=None, file_with_initial_xyz_velocities=None, eliminate_angular_momentum=True, degrees_of_freedom=None, initial_temperature=None, initial_kinetic_energy=None, use_hessian=False, reaction_coordinate_momentum=True, filter_by_energy_window=False, window_filter_kwargs={}, random_seed=None)[source]

Generate initial conditions

Parameters:

molecule (data.molecule) – Molecule with necessary information
generation_method (str) – Initial condition generation method
number_of_initial_conditions (int) – Number of initial conditions to generate, 1 by default
file_with_initial_xyz_coordinates (str) – File with initial xyz coordinates, only valid for generation_method='user-defined'
file_with_initial_xyz_velocities (str) – File with initial xyz velocities, only valid for generation_method='user-defined'
eliminate_angular_momentum (bool) – Remove angular momentum from velocities, valid for generation_method='random' and generation_method='wigner'
degrees_of_freedom (int) – Degrees of freedom of the molecule, by default remove translational and rotational degrees of freedom. It can be a negative value, which means that some value is subtracted from 3*Natoms
initial_temperature (float) – Initial temperature in Kelvin, control random initial velocities
initial_kinetic_energy (float) – Initial kinetic energy in Hartree, control random initial velocities
random_seed (int) – Random seed for numpy random number generator

generation_method	description
`'user-defined'` (default)	Use user-defined initial conditions
`'random'`	Generate random velocities
`'maxwell-boltzmann'`	Randomly generate initial velocities from Maxwell-Boltzmann distribution
`'wigner'`	Use Wigner sampling as implemented in Newton-X

Returns:: A molecular database (ml.data.molecular_database) with number_of_initial_conditions initial conditions

Examples:

# Use user-defined initial conditions
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz',
                                              number_of_initial_conditions = 1)
# Generate random velocities
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'random',
                                              initial_temperature = 300,
                                              number_of_initial_conditions = 1)
# Use Wigner sampling
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'wigner',
                                              number_of_initial_conditions = 1)

Note

Wigner sampling needs Hessian matrix. You can use ml.models.methods.predict(molecule=mol,calculate_hessian=True) to get Hessian matrix.

Molecular dynamics

!---------------------------------------------------------------------------!
! MD: Module for molecular dynamics                                         !
! Implementations by: Yi-Fan Hou & Pavlo O. Dral                            !
!---------------------------------------------------------------------------!

class mlatom.md.md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=1000, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None)[source]

Molecular dynamics

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method which provides energies and forces.
molecule_with_initial_conditions (data.molecule) – The molecule with initial conditions.
ensemble (str, optional) – Which kind of ensemble to use.
thermostat (thermostat.Thermostat) – The thermostat applied to the system.
time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum propagation time in femtoseconds.
dump_trajectory_interval (int, optional) – Dump trajectory at which interval. Set to None to disable dumping.
filename (str, optional) – The file that saves the dumped trajectory
format (str, optional) – Format in which the dumped trajectory is saved
stop_function (any, optional) – User-defined function that stops MD before maximum_propagation_time
stop_function_kwargs (Dict, optional) – Kwargs of stop_function

ensemble	description
`'NVE'` (default)	Microcanonical (NVE) ensemble
`'NVT'`	Canonical (NVT) ensemble

thermostat	description
`ml.md.Andersen_thermostat`	Andersen thermostat
`ml.md.Nose_Hoover_thermostat`	Hose-Hoover thermostat
`None` (default)	No thermostat is applied

For theoretical details, see and cite original paper.

Examples:

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1')
# User-defined initial condition
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz')
init_mol = init_cond_db.molecules[0]
# Initialize thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300,molecule=init_mol,degrees_of_freedom=-6)
# Run dynamics
dyn = ml.md(model=aiqm1,
            molecule_with_initial_conditions = init_mol,
            ensemble='NVT',
            thermostat=nose_hoover,
            time_step=0.5,
            maximum_propagation_time = 10.0)
# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

Note

Trajectory is saved in ml.md.molecular_trajectory, which is a ml.data.molecular_trajectory class

Warning

In MLatom, energy unit is Hartree and distance unit is Angstrom. Make sure that the units in your model are consistent.

class Andersen_thermostat(**kwargs)

Andersen thermostat object

Parameters:

gamma (float) – collision rate in fs^{-1}, 0.2 by default
temperature (float) – System temperature to equilibrate in Kelvin, 300 by default

class Nose_Hoover_thermostat(**kwargs)

Nose-Hoover thermostat object

Parameters:

nose_hoover_chain_length (int) – Nose Hoover chain length, should be a positive number, 3 by default
multiple_time_step (int) – Multiple time step, should be a positive number, 3 by default
number_of_yoshida_suzuki_steps (int) – Number of Yoshida Suzuki steps, can be any in (1,3,5,7), 7 by default
nose_hoover_chain_frequency (float) – Nose-Hoover chain frequency in $fs^{-1}$, 0.0625 by default, should be comparable to the frequency you want to equilibrate
temperature (float) – System temperature to equilibrate in Kelvin, 300 by default
molecule (data.molecule) – The molecule to be equilibrated
degrees_of_freedom – Degrees of freedom of the system

Surface-hopping dynamics

!—————————————————————————! ! MD: Module for molecular dynamics ! ! Implementations by: Lina Zhang & Pavlo O. Dral ! !—————————————————————————!

class mlatom.namd.surface_hopping_md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=100, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None, hopping_algorithm='LZBL', nstates=None, initial_state=None, random_seed=<function generate_random_seed>, prevent_back_hop=False, rescale_velocity_direction='along velocities', reduce_kinetic_energy=False)[source]

Surface-hopping molecular dynamics

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method which provides energies and forces.
model_predict_kwargs (Dict, optional) – Kwargs of model prediction
molecule_with_initial_conditions (data.molecule) – The molecule with initial conditions.
molecule (data.molecule) – Work the same as molecule_with_initial_conditions
ensemble (str, optional) – Which kind of ensemble to use.
thermostat (thermostat.Thermostat) – The thermostat applied to the system.
time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum propagation time in femtoseconds.
dump_trajectory_interval (int, optional) – Dump trajectory at which interval. Set to None to disable dumping.
filename (str, optional) – The file that saves the dumped trajectory
format (str, optional) – Format in which the dumped trajectory is saved
stop_function (any, optional) – User-defined function that stops MD before maximum_propagation_time
stop_function_kwargs (Dict, optional) – Kwargs of stop_function
hopping_algorithm (str, optional) – Surface hopping algorithm
nstates (int) – Number of states
initial_state (int) – Initial state
random_seed (int) – Random seed
prevent_back_hop (bool, optional) – Whether to prevent back hopping
rescale_velocity_direction (string, optional) – Rescale velocity direction
reduce_kinetic_energy (bool, optional) – Whether to reduce kinetic energy

ensemble	description
`'NVE'` (default)	Microcanonical (NVE) ensemble
`'NVT'`	Canonical (NVT) ensemble

thermostat	description
`ml.md.Andersen_thermostat`	Andersen thermostat
`ml.md.Nose_Hoover_thermostat`	Hose-Hoover thermostat
`None` (default)	No thermostat is applied

For theoretical details, see and cite original paper (to be submitted).

Examples:

import mlatom as ml

# Load the initial geometry of a molecule
mol = ml.data.molecule()
mol.charge=1
mol.read_from_xyz_file('cnh4+.xyz')

# Define models
aiqm1 = ml.models.methods(method='AIQM1',
                qm_program_kwargs={'save_files_in_current_directory': True,
                                    'read_keywords_from_file': f'mndokw'})
method_optfreq = ml.models.methods(method='B3LYP/Def2SVP', program='pyscf')

# Optimize geometry
geomopt = ml.simulations.optimize_geometry(model=method_optfreq,
                                        initial_molecule=mol)
eqmol = geomopt.optimized_molecule
eqmol.write_file_with_xyz_coordinates('eq.xyz')

# Get frequencies
ml.simulations.freq(model=method_optfreq,
                    molecule=eqmol)
eqmol.dump(filename='eqmol.json', format='json')

# Get initial conditions
init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                    generation_method='wigner',
                                    number_of_initial_conditions=16,
                                    initial_temperature=0)
init_cond_db.dump('test.json','json')

# Propagate multiple LZBL surface-hopping trajectories in parallel
# .. setup dynamics calculations
namd_kwargs = {
            'model': aiqm1,
            'time_step': 0.25,
            'maximum_propagation_time': 5,
            'hopping_algorithm': 'LZBL',
            'nstates': 3,
            'initial_state': 2,
            }

# .. run trajectories in parallel
dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db,
                                    task=ml.namd.surface_hopping_md,
                                    task_kwargs=namd_kwargs,
                                    create_and_keep_temp_directories=True)
trajs = [d.molecular_trajectory for d in dyns]

# Dump the trajectories
itraj=0
for traj in trajs:
    itraj+=1
    traj.dump(filename=f"traj{itraj}.h5",format='h5md')

# Analyze the result of trajectories and make the population plot
ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5)
ml.namd.plot_population(trajectories=trajs, time_step=0.25,
                    max_propagation_time=5, nstates=3, filename=f'pop.png')

Note

Trajectory is saved in ml.md.molecular_trajectory, which is a ml.data.molecular_trajectory class

Warning

In MLatom, energy unit is Hartree and distance unit is Angstrom. Make sure that the units in your model are consistent.

Spectra

Active learning

Initial data sampling

initdata_sampler can be:

'wigner'
'harmonic-quantum-boltzmann'

User-defined ML models

The user has the flexibility to create their own ML model class for AL. Minimum requirements to such a class:

it must have the usual train and predict functions.
the train function must accept molecular_database parameter.
the predict function must accept molecule and/or molecular_database parameters.

The realistic, fully fledged example of how to create a usable ML model class is below (it is what we use in al routine!):

class my_model():
    def __init__(self, al_info = {}, model_file=None, device=None, verbose=False):
        import torch
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'

        if model_file is None:
            if 'mlmodel_file' in al_info.keys():
                self.model_file = al_info['mlmodel_file']
            else:
                self.model_file = 'mlmodel'
                al_info['mlmodel_file'] = self.model_file
        else:
            self.model_file = model_file
            al_info['mlmodel_file'] = self.model_file
        if 'main_mlmodel_file' in al_info.keys():
            main_mlmodel_file = al_info['main_mlmodel_file']
        else:
            main_mlmodel_file = f'{self.model_file}.pt'
            al_info['main_mlmodel_file'] = main_mlmodel_file
        if 'aux_mlmodel_file' in al_info.keys():
            aux_mlmodel_file = al_info['aux_mlmodel_file']
        else:
            aux_mlmodel_file = f'aux_{self.model_file}.pt'
            al_info['aux_mlmodel_file'] = aux_mlmodel_file
        self.device = device
        self.verbose = verbose
        self.main_model = ml.models.ani(model_file=main_mlmodel_file,device=device,verbose=verbose)
        self.aux_model = ml.models.ani(model_file=aux_mlmodel_file,device=device,verbose=verbose)

    def train(self, molecular_database=None, al_info={}):
        if 'working_directory' in al_info.keys():
            workdir = al_info['working_directory']
            self.main_model.model_file = f'{workdir}/{self.model_file}.pt'
            self.aux_model.model_file = f'{workdir}/aux_{self.model_file}.pt'

        validation_set_fraction = 0.1
        [subtraindb, valdb] = molecular_database.split(number_of_splits=2, fraction_of_points_in_splits=[1-validation_set_fraction, validation_set_fraction], sampling='random')

        # train the model on energies and gradients
        self.main_model = ml.models.ani(model_file=self.main_model.model_file,device=self.device,verbose=self.verbose)
        self.main_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy',xyz_derivative_property_to_learn='energy_gradients')

        # train the auxiliary model only on energies
        self.aux_model = ml.models.ani(model_file=self.aux_main_model.model_file,device=self.device,verbose=self.verbose)
        self.aux_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy')

        if not 'uq_threshold' in al_info.keys():
            self.predict(molecular_database=valdb)
            uqs = valdb.get_property('uq')
            al_info['uq_threshold'] = np.median(uqs) + 3*stats.calc_median_absolute_deviation(uqs)
        self.uq_threshold = al_info['uq_threshold']

        # if the models were trained successfully, let's update al info where we can find them
        al_info['main_mlmodel_file'] = self.main_model.model_file
        al_info['aux_mlmodel_file'] = self.aux_model.model_file

    def predict(self, molecule=None, molecular_database=None):

        # predict energies and gradients with the main model
        self.main_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='energy',xyz_derivative_property_to_predict='energy_gradients')

        # predict energies with the auxiliary model
        self.aux_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='aux_energy')

        # calculate uncertainties
        moldb = molecular_database
        if moldb is None:
            moldb = ml.molecular_database()

        for mol in moldb:
            mol.uq = abs(mol.energy - mol.aux_energy)
            if mol.uq > self.uq_threshold:
                mol.uncertain = True
            else:
                mol.uncertain = False

    # This are useful in some internal al routines, e.g., when we want to make predictions in parallel (if nthreads is not set properly, it may slow down al significantly!)
    @property
    def nthreads(self):
        return self.main_model.nthreads

    @nthreads.setter
    def nthreads(self, value):
        self.main_model.nthreads = value
        self.aux_model.nthreads  = value

ml.al(
    ...
    ml_model = my_model,
    # do not use my_model(...), if you want to pass any arguments, use ml_model_kwargs:
    ml_model_kwargs = {...}, # 'al_info' is unnecessary to include, it will be added automatically. If you supply 'al_info' key, it will overwrite the default one so use if you know what you are doing.
    ...
)

As you can see, it is helpful (but not required) if the __init__ and train functions of the ML model class also accept the al_info parameter which can be used to pass information during active learning from one routine to another.

Sampler

Here is a realistic example of the sampler function used in the physics-informed active learning:

def my_sampler(al_info={}, ml_model=None, initcond_sampler=None, initcond_sampler_kwargs={}, maximum_propagation_time=1000, time_step=0.1, ensemble='NVE', thermostat=None, dump_trajs=False, dump_trajectory_interval=None, stop_function=None, batch_parallelization=True):

    moldb2label = ml.data.molecular_database()

    # generate initial conditions
    if type(initcond_sampler) == str:
        if initcond_sampler.casefold() in ['wigner', 'harmonic-quantum-boltzmann']:
            initcond_sampler = ml.generate_initial_conditions
            initcond_sampler_kwargs['generation_method'] = initcond_sampler
    import inspect
    args, varargs, varkw, defaults = inspect.getargspec(initcond_sampler)
    # Do we need al_info below?
    if 'al_info' in args:
        initial_molecular_database = initcond_sampler(al_info=al_info, **initcond_sampler_kwargs)
    else:
        initial_molecular_database = initcond_sampler(**initcond_sampler_kwargs)

    # run MD in parallel to collect uncertain points
    if batch_parallelization: # Faster way to propagate many trajs with ML
        dyn = ml.md_parallel(model=ml_model,
                             molecular_database=initial_molecular_database,
                             ensemble=ensemble,
                             thermostat=thermostat,
                             time_step=time_step,
                             maximum_propagation_time=maximum_propagation_time,
                             dump_trajectory_interval=dump_trajectory_interval,
                             stop_function=stop_function)
        trajs = dyn.molecular_trajectory
        for itraj in range(len(trajs.steps[0])):
            print(f"Trajectory {itraj} number of steps: {trajs.traj_len[itraj]}")
            if trajs.steps[trajs.traj_len[itraj]][itraj].uncertain:
                print(f'Adding molecule from trajectory {itraj} at time {trajs.traj_len[itraj]*time_step} fs')
                moldb2label.molecules.append(trajs.steps[trajs.traj_len[itraj]][itraj])

            # Dump traj
            if dump_trajs:
                import os
                traj = ml.data.molecular_trajectory()
                for istep in range(trajs.traj_len[itraj]+1):

                    step = ml.data.molecular_trajectory_step()
                    step.step = istep
                    step.time = istep * time_step
                    step.molecule = trajs.steps[istep][itraj]
                    traj.steps.append(step)
                if 'working_directory' in al_info.keys():
                    dirname = f'{al_info['working_directory']}/trajs'
                else:
                    dirname = 'trajs'
                if not os.path.exists(dirname):
                    os.makedirs(dirname)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    else:
        md_kwargs = {
                    'molecular_database': initial_molecular_database,
                    'model': ml_model,
                    'time_step': time_step,
                    'maximum_propagation_time': maximum_propagation_time,
                    'ensemble': ensemble,
                    'thermostat': thermostat,
                    'dump_trajectory_interval': dump_trajectory_interval,
                    'stop_function': stop_function
                    }
        dyns = ml.simulations.run_in_parallel(molecular_database=initial_molecular_database,
                                            task=ml.md,
                                            task_kwargs=md_kwargs,
                                            create_and_keep_temp_directories=False)
        trajs = [d.molecular_trajectory for d in dyns]
        itraj=0
        for traj in trajs:
            itraj+=1
            print(f"Trajectory {itraj} number of steps: {len(traj.steps)}")
            if traj.steps[-1].molecule.uncertain:
                print('Adding molecule from trajectory %d at time %.2f fs' % (itraj, traj.steps[-1].time))
                moldb2label.molecules.append(traj.steps[-1].molecule)

            # Dump traj
            if dump_trajs:
                import os
                if 'working_directory' in al_info.keys():
                    dirname = f'{al_info['working_directory']}/trajs'
                else:
                    dirname = 'trajs'
                if not os.path.exists(dirname):
                    os.makedirs(dirname)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    # add the source of molecule
    for mol in moldb2label:
        mol.sampling = 'md'
    return moldb2label

ml.al(
    ...
    sampler=my_sampler,
    sampler_kwargs={'time_step': 0.5},
    ...
)

Analysis

!—————————————————————————! ! xyz: different operations on XYZ coordinates ! ! Implementations by: Fuchun Ge ! ! To-do: implement permutation (Hungarian algorithm) ! !—————————————————————————!