Overview
The Python API (or PyAPI for short) of MLatom mainly contains 3 components: mlatom.data
, mlatom.models
, and mlatom.simulations
.
With class mlatom.data
, one can create/manipulate/save/load chemical data into objects like mlatom.data.atom
, mlatom.data.molecule
, and mlatom.data.molecular_database
.
The class mlatom.models
contains lots of computational chemistry models, based on quantum mechanics and machine learning, for making predictions on molecules. The models can be categorized into 3 types:
methods: Models of methods that can be used ‘as is’ without the need to be trained by the user.
ml_model: Machine learning models that require user to train them.
model_tree_node: models consisting of other model objects (
mlatom.models.methods
,mlatom.models.ml_model
, or anothermlatom.models.model_tree_node
).
mlatom.simulations
makes use of an mlatom.models
object to perform simulations such as geometry optimizations or molecular dynamics.
Please check a simple example that illustrates the usage of these components.
Data
!---------------------------------------------------------------------------!
! data: Module for working with data !
! Implementations by: Pavlo O. Dral, Fuchun Ge, !
! Shuang Zhang, Yi-Fan Hou, Yanchi Ou !
!---------------------------------------------------------------------------!
- class mlatom.data.atom(nuclear_charge: int | None = None, atomic_number: int | None = None, element_symbol: str | None = None, nuclear_mass: float | None = None, xyz_coordinates: ndarray | List | None = None)
Create an atom object.
- Parameters:
nuclear_charge (int, optional) – provide the nuclear charge to define the atom.
atomic_number (int, optional) – provide the atomic number to define the atom.
element_symbol (int, optional) – provide the element symbol to define the atom.
nuclear_mass (int, optional) – provide the nuclear mass to define the atom.
xyz_coordinates (Array-like, optional) – specify the location of atom in Cartesian coordinates.
- class mlatom.data.molecule(charge: int = 0, multiplicity: int = 1, atoms: List[atom] = None, pbc: ndarray | bool | None = None, cell: ndarray | None = None)
Create a molecule object.
- Parameters:
charge (float, optional) – Specify the charge of the molecule.
multiplicity (int, optional) – Specify the multiplicity of the molecule.
atoms (List[
atom
], optional) – Specify the atoms in the molecule.
Examples
Select an atom inside with subscription:
from mlatom.data import atom, molecule at = atom(element_symbol = 'C') mol = molecule(atoms = [at]) print(id(at), id(mol[0]))
- id
The unique ID for this molecule.
- charge
The electric charge of the molecule.
- multiplicity
The multiplicity of the molecule.
- load(filename: stringe, format: string):
Load a molecule object from a dumped file.
Updates a molecule object if initialized:
mol = molecule(); mol.load(filename='mymol.json')
Returns a molecule object if called as class method:
mol = molecule.load(filename='mymol.json')
- Arguments:
filename (str): filename or path
format (str, optional): currently, only ‘json’ format is supported.
- property pbc
The periodic boundary conditions of the molecule. Setting it with
mol.pbc = True
is equal tomol.pbc = [True, True, True]
.
- property cell
The matrix of 3 vectors that defines the unicell. The setter of it simply wraps ase.geometry.cell.cellpar_to_cell().
- property cell_coordinates: ndarray
The relative coordinates in the cell.
- map_to_unicell()
Map all atoms outside the unicell into it.
- read_from_xyz_file(filename: str, format: str | None = None) molecule
Load molecular geometry from XYZ file.
Data in standard XYZ format can be read if parameter
format
is not specified.Extra formats supoorted are:
'COLUMBUS'
'NEWTON-X'
or'NX'
'turbomol'
- Parameters:
filename (str) – The name of the file to be read.
format (str, optional) – The format of the file.
- read_from_xyz_string(string: str = None, format: str | None = None) molecule
Load molecular geometry from XYZ string.
Data in standard XYZ format can be read if parameter
format
is not specified.Extra formats supoorted are:
'COLUMBUS'
'NEWTON-X'
or'NX'
'turbomol'
- Parameters:
string (str) – The string input.
format (str, optional) – The format of the string.
- read_from_numpy(coordinates: ndarray, species: ndarray) molecule
Load molecular geometry from a numpy array of coordinates and another one for species.
The shape of the input
coordinates
should be(N, 3)
, while(N,)
for the inputspecies
.Where the
N
is the number of atoms.
- read_from_smiles_string(smi_string: str) molecule
Generate molecular geometry from a SMILES string provided.
The geometry will be generated and optimized with Pybel’s
make3D()
method.
- classmethod from_xyz_file(filename: str, format: str | None = None) molecule
Classmethod wrapper for
molecule.read_from_xyz_file()
, returns amolecule
object.
- classmethod from_xyz_string(string: str = None, format: str | None = None) molecule
Classmethod wrapper for
molecule.read_from_xyz_string()
, returns amolecule
object.
- classmethod from_numpy(coordinates: ndarray, species: ndarray) molecule
Classmethod wrapper for
molecule.read_from_numpy()
, returns amolecule
object.
- classmethod from_smiles_string(smi_string: str) molecule
Classmethod wrapper for
molecule.read_from_smiles_string()
, returns amolecule
object.
- add_atom_from_xyz_string(line: str) None
Add an atom to molecule from a string in XYZ format
- add_scalar_property(scalar, property_name: str = 'y') None
Add a scalar property to the molecule. So the property can be called by molecule.<property_name>.
- Parameters:
scalar – The scalar to be added.
property_name (str, optional) – The name assign to the scalar property.
- add_xyz_derivative_property(derivative, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None
Add a XYZ derivative property to the molecule.
- Parameters:
derivative – The derivative property to be added.
property_name (str, optional) – The name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assign to the derivative property.
- add_xyz_vectorial_property(vector, xyz_vectorial_property: str = 'xyz_vector') None
Add a XYZ vectorial property to the molecule.
- Parameters:
vector – The vector to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial property.
- write_file_with_xyz_coordinates(filename: str, format: str | None = None) None
Write the molecular geometry data into a file. Data in standard XYZ format can be read if parameter
format
is not specified.Extra formats supoorted are:
'COLUMBUS'
'NEWTON-X'
or'NX'
'turbomol'
- Parameters:
filename (str) – The name of the file to be written.
format (str, optional) – The format of the file.
- get_xyz_string() str
Return the molecular geometry in a string of XYZ format.
- property atomic_numbers: ndarray
The atomic numbers of the atoms in the molecule.
- property element_symbols: ndarray
The element symbols of the atoms in the molecule.
- property smiles: str
The SMILES representation of the molecule.
- property xyz_coordinates: ndarray
The XYZ geometry of the molecule.
- property kinetic_energy: float
Give out the kinetic energy (A.U.) based on the xyz_velocities.
- copy(atomic_labels=None, molecular_labels=None)
Return a copy of current molecule object.
- proliferate(shifts: Iterable | None = None, XYZshifts: Iterable | None = None, Xshifts: Iterable | None = [0], Yshifts: Iterable | None = [0], Zshifts: Iterable | None = [0], PBC_constrained: bool = True) molecule
Proliferate the unicell by specified shifts along cell vectors (called X/Y/Z here).
Returns a new
molecule
object.- Parameters:
shifts (Iterable, optional) – The list of shifts to perform. Each shift should be a 3D vector that indicates the coefficient applies to the corresponding cell vector.
XYZshifts (Iterable, optional) – Generate all possible shifts with given shift coefficients in all three directions when a list is specified. When a list of 3 lists is specified, it’s equal to setting X/Y/Zshifts
Xshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the first cell vector.
Yshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the second cell vector.
Zshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the third cell vector.
PBC_constrained (bool) – Controls whether the shifts in some directions are disabled where corresponding PBC is false. Only applies to XYZshifts.
Note
- Priorities for different types of shifts:
shifts
>XYZshifts
>X/Y/Zshifts
Examples
Single H atom in the centre of a cubic cell (2x2x2):
mol = ml.molecule.from_numpy(np.ones((1, 3)), np.array([1])) mol.pbc = True mol.cell = 2
Proliferate to get two periods in all three directions, with shifts:
new_mol = mol.proliferate( shifts = [ [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1], ] )
with XYZshifts:
new_mol = mol.proliferate(XYZshifts=range(2)) # or new_mol = mol.proliferate(XYZshifts=[range(2)]*3)
with X/Y/Zshifts:
new_mol = mol.proliferate(Xshifts=range(2), Yshifts=(0, 1), Zshifts=[0, 1]))
All scripts above will make
new_mol.xyz_coordinates
be:array([[1., 1., 1.], [3., 1., 1.], [1., 3., 1.], [3., 3., 1.], [1., 1., 3.], [3., 1., 3.], [1., 3., 3.], [3., 3., 3.]])
- dump(filename=None, format='json')
Dump the current molecule object into a file. Only in json format, which is supported now.
- property state_energies: ndarray
The electronic state energies of the molecule.
- property state_gradients: ndarray
The electronic state energy gradients of the molecule.
- property energy_gaps: ndarray
The energy gaps of different states.
- property excitation_energies: ndarray
The excitation energies of the molecule from ground state.
- get_xyzvib_string(normal_mode=0)
Get the xyz string with geometries and displacements along the vibrational normal modes
- view(normal_mode=None)
Visualize the molecule and its vibrations if requested. Uses
py3Dmol
. :param normal_mode: the index of a normal mode to visualize. Default: None. :type normal_mode: integer, optional
- class mlatom.data.molecular_database(molecules: List[molecule] = None)
Create a database for molecule objects.
- Parameters:
molecules (List[
molecule
]) – A list of molecule to be included in the molecular database.
Examples
Select an atom inside with subscription:
from mlatom.data import atom, molecule, molecular_database at = atom(element_symbol = 'C') mol = molecule(atoms = [at]) molDB = molecular_database([mol]) print(id(mol) == id(molDB[0])) # the output should be 'True'
Slicing the database like a numpy array:
from mlatom.data import molecular_database molDB = molecular_database.from_xyz_file('devtests/al/h2_fci_db/xyz.dat') print(len(molDB)) # 451 print(len(molDB[:100:4])) # 25
- read_from_xyz_file(filename: str, append: bool = False) molecular_database
Load molecular geometry from XYZ file.
- Parameters:
filename (str) – The name of the file to be read.
append (bool, optional) – Append to current database if True, otherwise clear the current database.
- read_from_xyz_string(string: str, append=False) molecular_database
Load molecular geometry from XYZ string.
- Parameters:
string (str) – The name of the file to be read.
append (bool, optional) – Append to current database if True, otherwise clear the current database.
- read_from_numpy(coordinates: ndarray, species: ndarray, append: bool = False) molecular_database
Load molecular geometries from a numpy array of coordinates and another one for species.
The shape of the input
coordinates
should be(M, N, 3)
, while(M, N,)
for the inputspecies
.Where the
N
is the number of atoms andM
is the number of molecules.
- read_from_smiles_file(smi_file: str, append: bool = False) molecular_database
Generate molecular geometries from a SMILES file provided.
The geometries will be generated and optimized with Pybel’s
make3D()
method.
- read_from_smiles_string(smi_string: str, append: bool = False) molecular_database
Generate molecular geometries from a SMILES string provided.
The geometries will be generated and optimized with Pybel’s
make3D()
method.
- classmethod from_xyz_file(filename: str) molecular_database
Classmethod wrapper for
molecular_database.read_from_xyz_file()
, returns amolecular_database
object.
- classmethod from_xyz_string(string: str) molecular_database
Classmethod wrapper for
molecular_database.read_from_xyz_string()
, returns amolecular_database
object.
- classmethod from_numpy(coordinates: ndarray, species: ndarray) molecular_database
Classmethod wrapper for
molecular_database.read_from_numpy()
, returns amolecular_database
object.
- classmethod from_smiles_file(smi_file: str) molecular_database
Classmethod wrapper for
molecular_database.read_from_smiles_file()
, returns amolecular_database
object.
- classmethod from_smiles_string(smi_string: str | List) molecular_database
Classmethod wrapper for
molecular_database.read_from_smiles_string()
, returns amolecular_database
object.
- add_scalar_properties(scalars, property_name: str = 'y') None
Add scalar properties to the molecules.
- Parameters:
scalars – The scalar to be added.
property_name (str, optional) – The name assign to the scalar property.
- add_scalar_properties_from_file(filename: str, property_name: str = 'y') None
Add scalar properties from a file to the molecules.
- Parameters:
filename (str) – Specify the text file that contains properties.
property_name (str, optional) – The name assign to the scalar property.
- add_xyz_derivative_properties(derivatives, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None
Add a XYZ derivative property to the molecule.
- Parameters:
derivatives – The derivatives to be added.
property_name (str, optional) – The name of the associated non-derivative property.
xyz_derivative_property (str, optional) – the name assign to the derivative property.
- add_xyz_derivative_properties_from_file(filename: str, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') None
Add a XYZ derivatives from a text file to the molecules.
- Parameters:
filename (str) – The filename that contains derivatives to be added.
property_name (str, optional) – The name of the associated non-derivative properties.
xyz_derivative_property (str, optional) – the name assign to the derivative properties.
- add_xyz_vectorial_properties(vectors, xyz_vectorial_property: str = 'xyz_vector') None
Add a XYZ vectorial properties to the molecules.
- Parameters:
vectors – The vectors to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial properties.
- add_xyz_vectorial_properties_from_file(filename: str, xyz_vectorial_property: str = 'xyz_vector') None
Add a XYZ derivatives from a text file to the molecules.
- Parameters:
filename (str) – The filename that contains vectorial properties to be added.
xyz_vectorial_property (str, optional) – the name assign to the vectorial properties.
- write_file_with_xyz_coordinates(filename: str) None
Write the molecular geometries into a file in XYZ format.
- Parameters:
filename (str) – The name of the file to be written.
- get_xyz_string() None
Return a string in XYZ format for the molecules.
- write_file_with_properties(filename, property_to_write='y')
Write a property of molecules to a text file.
- property atomic_numbers: ndarray
The 2D array of the atomic numbers of each atom, for all molecules in the database.
- property element_symbols: ndarray
The 2D array of the element symbols of each atom, for all molecules in the database.
- property ids
The IDs of the molecules in the database.
- property smiles: str
The SMILES string of the molecules in the database.
- write_file_with_smiles(filename)
Write the SMILES of the molecules in the database to a file.
- property nuclear_masses
The nuclear_masses of the molecules in the database.
- property charges
The electric charges of the molecules in the database.
- property multiplicities
The multiplicities of the molecules in the database.
- get_properties(property_name='y')
Return the properties of the molecules by a given property name.
- set_properties(**kwargs)
Set properties of the molecules by given property name(s) as keyword(s).
- get_xyz_derivative_properties(xyz_derivative_property='xyz_derivatives')
Return XYZ derivative properties by the name.
- get_xyz_vectorial_properties(property_name)
Return XYZ vectorial properties by the name.
- write_file_with_xyz_derivative_properties(filename, xyz_derivative_property_to_write='xyz_derivatives')
Write XYZ derivative properties into a file.
- write_file_energy_gradients(filename)
Write energy gradients into a file.
- write_file_with_xyz_vectorial_properties(filename, xyz_vectorial_property_to_write='xyz_vector')
Write XYZ vectorial properties into a file.
- write_file_with_hessian(filename, hessian_property_to_write='hessian')
Write Hessians into a file.
- append(obj)
Append a molecule/molecular database.
- copy(atomic_labels=None, molecular_labels=None, molecular_database_labels=None)
Return a copy of the database.
- proliferate(*args, **kwargs) molecular_database
Proliferate the unicell by specified shifts along cell vectors.
Returns a new
molecular_databse
object.Check
molecule.proliferate()
for details on options.
- dump(filename=None, format=None)
Dump the molecular database to a file.
- classmethod load(filename=None, format=None)
Load a molecular database from a file.
- split(sampling='random', number_of_splits=2, split_equally=None, fraction_of_points_in_splits=None)
Splits molecular database.
- Parameters:
sampling (str, optional) – default ‘random’. Can be also ‘none’.
split_equally (bool, optinoal) – default
False
; if set toTrue
splits 50:50.fraction_of_points_in_splits (list, optional) – e.g., [0.8, 0.2] is the default one
indices
- property xyz_coordinates
The XYZ coordinates of each atom in every molecule.
- view()
Visualize the molecular database. Uses
py3Dmol
.
- class mlatom.data.molecular_trajectory(steps=None)
A class for storing/access molecular trajectory data, which is generated from a dynamics or a optimization task.
- dump(filename=None, format=None)
Dump the molecular_trajectory object into a file.
Available formats are:
'h5md'
(requires python moduleh5py
andpyh5md
)'json'
'plain_text'
- load(filename: str = None, format: str = None)
Load the previously dumped molecular_trajectory from file.
- get_xyz_string() str
Return the XYZ string of the molecules in the trajectory.
- to_database() molecular_database
Return a molecular database comprising the molecules in the trajectory.
- view()
Visualize the molecular trajectory. Uses
py3Dmol
.
- class mlatom.data.h5md(filename: str, data: Dict[str, Any] = {}, mode: str = 'w')
Saving trajectory data to file in H5MD format
- Parameters:
filename (str) – The filename of the h5md file output.
data (Dict) – The data to be stored (optional, if provided, the file will be closed after storing data).
mode (str, optional) – A string that controls the file processing mode (default value: ‘w’ for a new file, ‘r+’ for an exisiting file). The choices are listed in the table below which is consistent with
pyh5md.File()
r
Readonly, file must exist
r+
Read/write, file must exist
w
Create file, truncate if exists
w- or x
Create file, fail if exists
Examples:
traj0 = h5md('traj.h5') # open 'traj.h5' traj1 = h5md('/tmp/test.h5', mode='r') # open an existing file in readonly mode traj2 = h5md('/tmp/traj2.h5', data={'time': 1.0, 'total_energy': -32.1, 'test': 8848}) # add some data to the file, then close the file traj0.write(data) # write data to opened file traj0(data) # an alternative way to write data data = traj0.export() # export the data in the opened file data = traj0() # an alternative way to export data with h5md('test.h5') as traj: # export with a with statement data = traj.export() traj0.close() # close the file
Note
the default data path in HDF5 file
- particles/all:
‘box’, ‘gradients’, ‘mass’, ‘nad’, ‘names’, ‘position’, ‘species’, ‘velocities’
- observables:
‘angular_momentum’, ‘generated_random_number’, ‘kinetic_energy’, ‘linear_momentum’, ‘nstatdyn’, ‘oscillator_strengths’, ‘populations’, ‘potential_energy’, ‘random_seed’, ‘sh_probabilities’, ‘total_energy’, ‘wavefunctions’,
and any other keywords
- h5
the HDF5 file object
- write(data: Dict[str, Any]) None
Write data to the opened H5 file. Data should be a dictionary-like object with ‘time’ in its keys().
- export() Dict[str, ndarray]
Export the data in the opened H5 file.
- Returns:
A dictionary of the trajectory data in the H5 file.
- close() None
Close the opened file.
- __call__() Dict[str, ndarray]
Export the data in the opened H5 file.
- Returns:
A dictionary of the trajectory data in the H5 file.
Models
!---------------------------------------------------------------------------!
! models: Module with models !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, !
! Peikun Zheng !
!---------------------------------------------------------------------------!
- class mlatom.models.model
Parent (super) class for models to enable useful features such as logging during geometry optimizations.
- config_multiprocessing()
for scripts that need to be executed before running model in parallel
- predict(molecular_database: molecular_database = None, molecule: molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, **kwargs)
Make predictions for molecular geometries with the model.
- Parameters:
molecular_database (
mlatom.data.molecular_database
, optional) – A database contains the molecules whose properties need to be predicted by the model.molecule (
mlatom.models.molecule
, optional) – A molecule object whose property needs to be predicted by the model.calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
methods
!---------------------------------------------------------------------------!
! models: Module with models !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, !
! Peikun Zheng !
!---------------------------------------------------------------------------!
- class mlatom.models.methods(method: str = None, program: str = None, **kwargs)
Create a model object with a specified method.
- Parameters:
method (str) – Specify the method. Available methods are listed in the section below.
program (str, optional) – Specify the program to use.
**kwargs – Other method-specific options
Available Methods:
'AIQM1'
,'AIQM1@DFT'
,'AIQM1@DFT*'
,'AM1'
,'ANI-1ccx'
,'ANI-1x'
,'ANI-1x-D4'
,'ANI-2x'
,'ANI-2x-D4'
,'CCSD(T)*/CBS'
,'CNDO/2'
,'D4'
,'DFTB0'
,'DFTB2'
,'DFTB3'
,'GFN2-xTB'
,'MINDO/3'
,'MNDO'
,'MNDO/H'
,'MNDO/d'
,'MNDO/dH'
,'MNDOC'
,'ODM2'
,'ODM2*'
,'ODM3'
,'ODM3*'
,'OM1'
,'OM2'
,'OM3'
,'PM3'
,'PM6'
,'RM1'
,'SCC-DFTB'
,'SCC-DFTB-heats'
.Methods listed above can be accepted without specifying a program. The required programs still have to be installed though as described in the installation manual.
Available Programs and Their Corresponding Methods:
Program
Methods
TorchANI
'AIQM1'
,'AIQM1@DFT'
,'AIQM1@DFT*'
,'ANI-1ccx'
,'ANI-1x'
,'ANI-1x-D4'
,'ANI-2x'
,'ANI-2x-D4'
,'ANI-1xnr'
dftd4
'AIQM1'
,'AIQM1@DFT'
,'ANI-1x-D4'
,'ANI-2x-D4'
,'D4'
MNDO or Sparrow
'AIQM1'
,'AIQM1@DFT'
,'AIQM1@DFT*'
,'MNDO'
,'MNDO/d'
,'ODM2*'
,'ODM3*'
,'OM2'
,'OM3'
,'PM3'
,'SCC-DFTB'
,'SCC-DFTB-heats'
MNDO
'CNDO/2'
,'MINDO/3'
,'MNDO/H'
,'MNDO/dH'
,'MNDOC'
,'ODM2'
,'ODM3'
,'OM1'
, semiempirical OMx, DFTB, NDDO-type methodsSparrow
'DFTB0'
,'DFTB2'
,'DFTB3'
,'PM6'
,'RM1'
, semiempirical OMx, DFTB, NDDO-type methodsxTB
'GFN2-xTB'
, semiempirical GFNx-TB methodsOrca
'CCSD(T)*/CBS'
, DFTGaussian
ab initio methods, DFT
PySCF
ab initio methods, DFT
- property nthreads
int([x]) -> integer int(x, base=10) -> integer
Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.
If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal. >>> int(‘0b100’, base=0) 4
- predict(*args, **kwargs)
Make predictions for molecular geometries with the model.
- Parameters:
molecular_database (
mlatom.data.molecular_database
, optional) – A database contains the molecules whose properties need to be predicted by the model.molecule (
mlatom.models.molecule
, optional) – A molecule object whose property needs to be predicted by the model.calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
- config_multiprocessing()
for scripts that need to be executed before running model in parallel
AIQM1
ml_model
!---------------------------------------------------------------------------!
! models: Module with models !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, !
! Peikun Zheng !
!---------------------------------------------------------------------------!
- class mlatom.models.ml_model
Useful as a superclass for the ML models that need to be trained.
- train(molecular_database: molecular_database, property_to_learn: str | None = 'y', xyz_derivative_property_to_learn: str = None) None
Train the model with molecular database provided.
- Parameters:
molecular_database (
mlatom.data.molecular_database
) – The database of molecules for training.property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.
- predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, property_to_predict: str | None = 'estimated_y', calculate_energy_gradients: bool = False, xyz_derivative_property_to_predict: str | None = 'estimated_xyz_derivatives_y', calculate_hessian: bool = False, hessian_to_predict: str | None = 'estimated_hessian_y') None
Make predictions for molecular geometries with the model.
- Parameters:
molecular_database (
mlatom.data.molecular_database
, optional) – A database contains the molecules whose properties need to be predicted by the model.molecule (
mlatom.models.molecule
, optional) – A molecule object whose property needs to be predicted by the model.calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.
- generate_model_dict()
Generates model dictionary for dumping in json format.
- reset()
Resets model (deletes the ML model file from the hard disk).
- dump(filename=None, format='json')
Dumps model class object information in a json file (do not confused with saving the model itself, i.e., its parameters!).
- calculate_validation_loss(training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, calculate_CV_split_errors=False, subtraining_molecular_database=None, validation_molecular_database=None, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)
Returns the validation loss for the given hyperparameters.
By default, the validation loss is RMSE evaluated as a geometric mean of scalar and vectorial properties, e.g., energies and gradients.
- Parameters:
training_kwargs (dict, optional) – the kwargs to be passed to
yourmodel.train()
function.prediction_kwargs (dict, optional) – the kwargs to be passed to
yourmodel.predict()
function.cv_splits_molecular_databases (list, optional) – the list with cross-validation splits, each element is
molecular_database
.calculate_CV_split_errors (bool, optional) – requests to return the errors for each cross-validation split as a list in addtion to the aggregate cross-validation error.
subtraining_molecular_database (
molecular_database
, optional) – molecular database for sub-training to be passed toyourmodel.train()
function.validation_molecular_database (
molecular_database
, optional) – molecular database for validation to be passed toyourmodel.predict()
function.validation_loss_function (function, optional) – user-defined validation function.
validation_loss_function_kwargs (dict, optional) – kwargs for above
validation_loss_function
.
- optimize_hyperparameters(hyperparameters=None, training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, subtraining_molecular_database=None, validation_molecular_database=None, optimization_algorithm=None, optimization_algorithm_kwargs={}, maximum_evaluations=10000, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)
Optimizes hyperparameters by minimizing the validation loss.
By default, the validation loss is RMSE evaluated as a geometric mean of scalar and vectorial properties, e.g., energies and gradients.
- Parameters:
hyperparameters (list, required) – the list with strings - names of hyperparameters. Hyperparameters themselves must be in
youmodel.hyperparameters
defined with class instancehyperparameters
consisting ofhyperparameter
defining the optimization space.training_kwargs (dict, optional) – the kwargs to be passed to
yourmodel.train()
function.prediction_kwargs (dict, optional) – the kwargs to be passed to
yourmodel.predict()
function.cv_splits_molecular_databases (list, optional) – the list with cross-validation splits, each element is
molecular_database
.calculate_CV_split_errors (bool, optional) – requests to return the errors for each cross-validation split as a list in addtion to the aggregate cross-validation error.
subtraining_molecular_database (
molecular_database
, optional) – molecular database for sub-training to be passed toyourmodel.train()
function.validation_molecular_database (
molecular_database
, optional) – molecular database for validation to be passed toyourmodel.predict()
function.validation_loss_function (function, optional) – user-defined validation function.
validation_loss_function_kwargs (dict, optional) – kwargs for above
validation_loss_function
.optimization_algorithm (str, required) – optimization algorithm. No default, must be specified among: ‘grid’ (‘brute’), ‘TPE’, ‘Nelder-Mead’, ‘BFGS’, ‘L-BFGS-B’, ‘Powell’, ‘CG’, ‘Newton-CG’, ‘TNC’, ‘COBYLA’, ‘SLSQP’, ‘trust-constr’, ‘dogleg’, ‘trust-krylov’, ‘trust-exact’.
optimization_algorithm_kwargs (dict, optional) – kwargs to be passed to optimization algorithm, e.g.,
{'grid_size': 5}
(default 9 for the grid search).maximum_evaluations (int, optional) – maximum number of optimization evaluations (default: 10000) supported by all optimizers except for grid search.
Saves the final hyperparameters in
yourmodel.hyperparameters
adn validation loss inyourmodel.validation_loss
.
- class mlatom.models.hyperparameter(value: Any = None, optimization_space: str = 'linear', dtype: Callable | None = None, name: str = '', minval: Any = None, maxval: Any = None, step: Any = None, choices: Iterable[Any] = [], **kwargs)
Class of hyperparameter object, containing data could be used in hyperparameter optimizations.
- Parameters:
value (Any, optional) – The value of the hyperparameter.
optimization_space (str, optional) – Defines the space for hyperparameter. Currently supports
'linear'
, and'log'
.dtype (Callable, optional) – A callable object that forces the data type of value. Automatically choose one if set to
None
.
- update(new_hyperparameter: hyperparameter) None
Update hyperparameter with data in another instance.
- Parameters:
new_hyperparameter (
mlatom.models.hyperparamters
) – Whose data are to be applied to the current instance.
- copy()
Returns a copy of current instance.
- Returns:
a new instance copied from current one.
- Return type:
mlatom.models.hyperparamter
- class mlatom.models.hyperparameters(dict=None, /, **kwargs)
Class for storing hyperparameters, values are auto-converted to
mlatom.models.hyperparameter
objects. Inherit from collections.UserDict.- Initiaion:
Initiate with a dictinoary or kwargs or both.
e.g.:
hyperparamters({'a': 1.0}, b=hyperparameter(value=2, minval=0, maxval=4))
- copy(keys: Iterable[str] | None = None) hyperparameters
Returns a copy of current instance.
- Parameters:
keys (Iterable[str], optional) – If keys provided, only the hyperparameters selected by keys will be copied, instead of all hyperparameters.
- Returns:
a new instance copied from current one.
- Return type:
mlatom.models.hyperparamters
- class mlatom.models.kreg(model_file: str | None = None, ml_program: str = 'KREG_API', equilibrium_molecule: molecule | None = None, prior: float = 0, nthreads: int | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {})
Create a KREG model object.
- Parameters:
model_file (str, optional) – The name of the file where the model should be dumped to or loaded from.
ml_program (str, optional) – Specify which ML program to use. Avaliable options:
'KREG_API'
,'MLatomF
.equilibrium_molecule (
mlatom.data.molecule
| None) – Specify the equilibrium geometry to be used to generate RE descriptor. The geometry with lowest energy/value will be selected if set toNone
.prior (default - None) – the prior can be ‘mean’, None (0.0), or any floating point number.
hyperparameters (Dict[str, Any] |
mlatom.models.hyperparameters
, optional) – Updates the hyperparameters of the model with provided.
- generate_model_dict()
Generates model dictionary for dumping in json format.
- train(molecular_database=None, property_to_learn=None, xyz_derivative_property_to_learn=None, save_model=True, invert_matrix=False, matrix_decomposition=None, prior=None, hyperparameters: Dict[str, Any] | hyperparameters = {})
Train the KREG model with molecular database provided.
- Parameters:
molecular_database (
mlatom.data.molecular_database
) – The database of molecules for training.property_to_learn (str, optional) – The label of property to be learned in model training.
xyz_derivative_property_to_learn (str, optional) – The label of XYZ derivative property to be learned.
prior (str or float or int, optional) – default zero prior. It can also be ‘mean’ and any user-defined number.
- predict(molecular_database=None, molecule=None, calculate_energy=False, calculate_energy_gradients=False, calculate_hessian=False, property_to_predict=None, xyz_derivative_property_to_predict=None, hessian_to_predict=None)
Make predictions for molecular geometries with the model.
- Parameters:
molecular_database (
mlatom.data.molecular_database
, optional) – A database contains the molecules whose properties need to be predicted by the model.molecule (
mlatom.models.molecule
, optional) – A molecule object whose property needs to be predicted by the model.calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
property_to_predict (str, optional) – The label name where the predicted properties to be saved.
xyz_derivative_property_to_predict (str, optional) – The label name where the predicted XYZ derivatives to be saved.
hessian_to_predict (str, optional) – The label name where the predicted Hessians to be saved.
- mlatom.models.ani(**kwargs)
Returns an ANI model object (see
mlatom.interfaces.torchani_interface.ani
).
- mlatom.models.dpmd(**kwargs)
Returns a DPMD model object (see
mlatom.interfaces.dpmd_interface.dpmd
).
- mlatom.models.gap(**kwargs)
Returns a GAP model object (see
mlatom.interfaces.gap_interface.gap
).
- mlatom.models.physnet(**kwargs)
Returns a PhysNet model object (see
mlatom.interfaces.physnet_interface.physnet
).
- mlatom.models.sgdml(**kwargs)
Returns an sGDML model object (see
mlatom.interfaces.sgdml_interface.sgdml
).
- mlatom.models.mace(**kwargs)
Returns an MACE model object (see
mlatom.interfaces.mace_interface.mace
).
model_tree_node
!---------------------------------------------------------------------------!
! models: Module with models !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, !
! Peikun Zheng !
!---------------------------------------------------------------------------!
- class mlatom.models.model_tree_node(name=None, parent=None, children=None, operator=None, model=None)
Create a model tree node.
- Parameters:
name (str) – The name assign to the object.
parent – The parent of the model node.
children – The children of this model tree node.
operator – Specify the operation to be made when making predictions.
- predict(**kwargs)
Make predictions for molecular geometries with the model.
- Parameters:
molecular_database (
mlatom.data.molecular_database
, optional) – A database contains the molecules whose properties need to be predicted by the model.molecule (
mlatom.models.molecule
, optional) – A molecule object whose property needs to be predicted by the model.calculate_energy (bool, optional) – Use the model to calculate energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate energy hessian.
- dump(filename=None, format='json')
Dump the object to a file.
Interfaces
Interfaces to third-party software.
TorchANI
DeepMD-kit
GAP/QUIP
PhysNet
MACE
sGDML
Gaussian
Orca
DFT-D4
PySCF
Sparrow
xTB
MNDO
Simulations
!---------------------------------------------------------------------------!
! simulations: Module for simulations !
! Implementations by: Pavlo O. Dral !
!---------------------------------------------------------------------------!
Geomopt, freq, DMC
- class mlatom.simulations.optimize_geometry(model=None, model_predict_kwargs={}, initial_molecule=None, molecule=None, ts=False, program=None, optimization_algorithm=None, maximum_number_of_steps=None, convergence_criterion_for_forces=None, working_directory=None, print_properties=None, dump_trajectory_interval=None, filename=None, format='json', **kwargs)
Geometry optimization.
- Parameters:
model (
mlatom.models.model
ormlatom.models.methods
) – any model or method which provides energies and forces.initial_molecule (
mlatom.data.molecule
) – the molecule object to optimize.ts (bool, optional) – whether to do transition state search. Currently only be done with program=Gaussian, ASE and geometric.
program (str, optional) – the engine used in geometry optimization. Currently supports Gaussian, ASE, scipy and PySCF.
optimization_algorithm (str, optional) – the optimization algorithm used in ASE. Default value: LBFGS (ts=False), dimer (ts=False).
maximum_number_of_steps (int, optional) – the maximum number of steps for ASE, SciPy and geometric. Default value: 200.
convergence_criterion_for_forces (float, optional) – forces convergence criterion in ASE. Default value: 0.02 eV/Angstroms.
working_directory (str, optional) – working directory. Default value: ‘.’, i.e., current directory.
constraints (dict, optional) – constraints for geometry optimization. Currently only available with program=ASE and program=geometric. For program=ASE, constraints follows the same conventions as in ASE:
constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]}
(check FixInternals class in ASE for more information). For program=geometric, the name of constraint file should be provided and please refer to constrained optimization for the format of the constraint file.print_properties (None or str, optional) – properties to print. Default: None. Possible ‘all’.
dump_trajectory_interval (int, optional) – dump trajectory at every time step (1). Set to
None
to disable dumping (default).filename (str, optional) – the file that saves the dumped trajectory.
format (str, optional) – format in which the dumped trajectory is saved.
Examples:
# Initialize molecule mol = ml.data.molecule() mol.read_from_xyz_file(filename='ethanol.xyz') # Initialize methods aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO') # Run geometry optimization geomopt = ml.simulations.optimize_geometry(model = aiqm1, initial_molecule=mol, program = 'ASE') # Get the optimized geometry, energy, and gradient optmol = geomopt.optimized_molecule geo = optmol.get_xyz_coordinates() energy = optmol.energy gradient = optmol.get_energy_gradients()
- class mlatom.simulations.freq(model=None, model_predict_kwargs={}, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized', anharmonic=False, anharmonic_kwargs={}, working_directory=None)
Frequence analysis.
- Parameters:
model (
mlatom.models.model
ormlatom.models.methods
) – any model or method which provides energies and forces and Hessian.molecule (
mlatom.data.molecule
) – the molecule object with necessary information.program (str, optional) – the engine used in frequence analysis through modified TorchANI (if Gaussian not found or any other string is given), pyscf or Gaussian interfaces.
normal_mode_normalization (str, optional) – normal modes output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).
anharmonic (bool) – whether to do anharmonic frequence calculation.
working_directory (str, optional) – working directory. Default value: ‘.’, i.e., current directory.
Examples:
# Initialize molecule mol = ml.data.molecule() mol.read_from_xyz_file(filename='ethanol.xyz') # Initialize methods aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO') # Run frequence analysis ml.simulations.freq(model=aiqm1, molecule=mol, program='ASE') # Get frequencies frequencies = mol.frequencies
- class mlatom.simulations.thermochemistry(model=None, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized')
Thermochemical properties calculation.
- Parameters:
model (
mlatom.models.model
ormlatom.models.methods
) – any model or method which provides energies and forces and Hessian.molecule (
mlatom.data.molecule
) – the molecule object with necessary information.program (str) – the engine used in thermochemical properties calculation. Currently support Gaussian and ASE.
normal_mode_normalization (str, optional) – normal modes output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted unnormalized (default).
# Initialize molecule mol = ml.data.molecule() mol.read_from_xyz_file(filename='ethanol.xyz') # Initialize methods aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO') # Run thermochemical properties calculation ml.simulations.thermochemistry(model=aiqm1, molecule=mol, program='ASE') # Get ZPE and heat of formation ZPE = mol.ZPE Hof = mol.DeltaHf298
The thermochemical properties available in
molecule
object after the calculation:ZPE
: Zero-point energyDeltaE2U
: Thermal correction to Energy (only available in Gaussian)DeltaE2H
: Thermal correction to Enthalpy (only available in Gaussian)DeltaE2G
: Thermal correction to Gibbs free energy (only available in Gaussian)U0
: Internal energy at 0KH0
: Enthalpy at 0KU
: Internal energy (only available in Gaussian)H
: EnthalpyG
: Gibbs free energyS
: Entropy (only available in Gaussian)atomization_energy_0K
ZPE_exclusive_atomization_energy_0K
DeltaHf298
: Heat of formation at 298 K
- class mlatom.simulations.dmc(model: model, initial_molecule: molecule = None, initial_molecular_database: molecular_database = None, energy_scaling_factor: float = 1.0)
Run diffusion Monte Carlo simulation for molecule(s) using PyVibDMC.
- Parameters:
model (
mlatom.models.model
) – The potential energy surfaces model. The unit should be Hartree, otherwise a correctenergy_scaling_factor
need to be set.initial_molecule (
mlatom.data.molecule
) – The initial geometry for the walkers. Usually a energy minimum geometry should be provided. By default every coordinate will be scaled by 1.01 to make it slightly distorted.energy_scaling_factor (float, optional) – A factor that will be multiplied to the model’s energy pridiction.
- run(run_dir: str = 'DMC', weighting: str = 'discrete', number_of_walkers: int = 5000, number_of_timesteps: int = 10000, equilibration_steps: int = 500, dump_trajectory_interval: int = 500, dump_wavefunction_interval: int = 1000, descendant_weighting_steps: int = 300, time_step: float = 0.024188843265857, initialize: bool = False)
Run the DMC simulation.
- Parameters:
run_dir (str) – The folder for the output files.
weighting (str) –
'discrete'
or'continuous'
.'continuous'
keeps the ensemble size constant.number_of_walkers (int) – The number of geometries exploring the potential surface.
number_of_timesteps (int) – The number of steps the simulation will go.
equilibration_steps (int) – The number of steps for equilibration.
dump_trajectory_interval (int) – The interval for dumping walkers’ trajectories.
dump_wavefunction_interval (int) – The interval for collecting wave function.
descendant_weighting_steps (int) – The number of time steps for descendant weighting per wave function.
time_step (float) – The length of each time step in fs.
- load(filename)
Load previous simulation results from a HDF5 file.
- get_zpe(start_step=1000) float
Return calculated zero-point energy in Hartree.
- Parameters:
start_step (int) – The starting step for averaging the energies.
Initial conditions
- mlatom.initial_conditions.generate_initial_conditions(molecule=None, generation_method=None, number_of_initial_conditions=1, file_with_initial_xyz_coordinates=None, file_with_initial_xyz_velocities=None, eliminate_angular_momentum=True, degrees_of_freedom=None, initial_temperature=None, initial_kinetic_energy=None, use_hessian=False, reaction_coordinate_momentum=True, filter_by_energy_window=False, window_filter_kwargs={}, random_seed=None)
Generate initial conditions
- Parameters:
molecule (
data.molecule
) – molecule with necessary informationgeneration_method (str) – initial condition generation method, see below the table
number_of_initial_conditions (int) – number of initial conditions to generate, 1 by default
file_with_initial_xyz_coordinates (str) – file with initial xyz coordinates, only valid for
generation_method='user-defined'
file_with_initial_xyz_velocities (str) – file with initial xyz velocities, only valid for
generation_method='user-defined'
eliminate_angular_momentum (bool) – remove angular momentum from velocities, valid for
generation_method='random'
andgeneration_method='wigner'
degrees_of_freedom (int) – degrees of freedom of the molecule, by default remove translational and rotational degrees of freedom. It can be a negative value, which means that some value is subtracted from 3*Natoms
initial_temperature (float) – initial temperature in Kelvin, control random initial velocities
initial_kinetic_energy (float) – initial kinetic energy in Hartree, control random initial velocities
random_seed (int) – random seed for numpy random number generator (do not use unless you want to obtain the same results every time)
filter_by_energy_window (bool) – filter by excitation energy window
window_filter_kwargs (dict) – keyword arguments for filtering the energy window, see below the table
generation_method
description
'user-defined'
(default)use user-defined initial conditions
'random'
generate random velocities
'maxwell-boltzmann'
randomly generate initial velocities from Maxwell-Boltzmann distribution
'wigner'
use Wigner sampling as implemented in Newton-X
window_filter_kwargs
description
model
model or method that can calculate excitation energies and oscillator strengths
model_predict_kwargs
keyword arguments for above model, typically
nstates
specifying how many states to calculatetarget_excitation_energy (float)
in eV
window_half_width (float)
in eV
random_seed (int)
random seed for numpy random number generator (do not use unless you want to obtain the same results every time)
- Returns:
A molecular database (
ml.data.molecular_database
) withnumber_of_initial_conditions
initial conditions
Examples:
# Use user-defined initial conditions init_cond_db = ml.generate_initial_conditions(molecule = mol, generation_method = 'user-defined', file_with_initial_xyz_coordinates = 'ethanol.xyz', file_with_initial_xyz_velocities = 'ethanol.vxyz', number_of_initial_conditions = 1) # Generate random velocities init_cond_db = ml.generate_initial_conditions(molecule = mol, generation_method = 'random', initial_temperature = 300, number_of_initial_conditions = 1) # Use Wigner sampling init_cond_db = ml.generate_initial_conditions(molecule = mol, generation_method = 'wigner', number_of_initial_conditions = 1) # Sample with filtering by excitation energy window. Requires the model for calculating excitation energies and oscillator strengths. model = ml.models.methods(method='AIQM1') model_predict_kwargs={'nstates':9} # requests calculation of 9 electronic states window_filter_kwargs={'model':model, 'model_predict_kwargs':model_predict_kwargs, 'target_excitation_energy':5.7, # eV 'window_half_width':0.1, # eV} init_cond_db = ml.generate_initial_conditions(molecule=mol, generation_method='wigner', number_of_initial_conditions=5, initial_temperature=0, random_seed=0, use_hessian=False, filter_by_energy_window=True, window_filter_kwargs=window_filter_kwargs)
Note
Wigner sampling needs Hessian matrix. You can use
ml.models.methods.predict(molecule=mol,calculate_hessian=True)
to get Hessian matrix.
Molecular dynamics
!---------------------------------------------------------------------------!
! md: Module for molecular dynamics !
! Implementations by: Yi-Fan Hou & Pavlo O. Dral !
!---------------------------------------------------------------------------!
- class mlatom.md.md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=1000, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None)
Molecular dynamics
- Parameters:
model (
mlatom.models.model
ormlatom.models.methods
) – Any model or method which provides energies and forces.molecule_with_initial_conditions (
data.molecule
) – The molecule with initial conditions.ensemble (str, optional) – Which kind of ensemble to use.
thermostat (
thermostat.Thermostat
) – The thermostat applied to the system.time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum propagation time in femtoseconds.
dump_trajectory_interval (int, optional) – Dump trajectory at which interval. Set to
None
to disable dumping.filename (str, optional) – The file that saves the dumped trajectory
format (str, optional) – Format in which the dumped trajectory is saved
stop_function (any, optional) – User-defined function that stops MD before
maximum_propagation_time
stop_function_kwargs (Dict, optional) – Kwargs of
stop_function
ensemble
description
'NVE'
(default)Microcanonical (NVE) ensemble
'NVT'
Canonical (NVT) ensemble
thermostat
description
ml.md.Andersen_thermostat
Andersen thermostat
ml.md.Nose_Hoover_thermostat
Hose-Hoover thermostat
None
(default)No thermostat is applied
For theoretical details, see and cite original paper.
Examples:
# Initialize molecule mol = ml.data.molecule() mol.read_from_xyz_file(filename='ethanol.xyz') # Initialize methods aiqm1 = ml.models.methods(method='AIQM1') # User-defined initial condition init_cond_db = ml.generate_initial_conditions(molecule = mol, generation_method = 'user-defined', file_with_initial_xyz_coordinates = 'ethanol.xyz', file_with_initial_xyz_velocities = 'ethanol.vxyz') init_mol = init_cond_db.molecules[0] # Initialize thermostat nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300,molecule=init_mol,degrees_of_freedom=-6) # Run dynamics dyn = ml.md(model=aiqm1, molecule_with_initial_conditions = init_mol, ensemble='NVT', thermostat=nose_hoover, time_step=0.5, maximum_propagation_time = 10.0) # Dump trajectory traj = dyn.molecular_trajectory traj.dump(filename='traj', format='plain_text') traj.dump(filename='traj.h5', format='h5md')
Note
Trajectory is saved in
ml.md.molecular_trajectory
, which is aml.data.molecular_trajectory
classWarning
In MLatom, energy unit is Hartree and distance unit is Angstrom. Make sure that the units in your model are consistent.
- class Andersen_thermostat(**kwargs)
Andersen thermostat object
- Parameters:
gamma (float) – collision rate in fs^{-1}, 0.2 by default
temperature (float) – System temperature to equilibrate in Kelvin, 300 by default
- class Nose_Hoover_thermostat(**kwargs)
Nose-Hoover thermostat object
- Parameters:
nose_hoover_chain_length (int) – Nose Hoover chain length, should be a positive number, 3 by default
multiple_time_step (int) – Multiple time step, should be a positive number, 3 by default
number_of_yoshida_suzuki_steps (int) – Number of Yoshida Suzuki steps, can be any in (1,3,5,7), 7 by default
nose_hoover_chain_frequency (float) – Nose-Hoover chain frequency in $fs^{-1}$, 0.0625 by default, should be comparable to the frequency you want to equilibrate
temperature (float) – System temperature to equilibrate in Kelvin, 300 by default
molecule (
data.molecule
) – The molecule to be equilibrateddegrees_of_freedom – Degrees of freedom of the system
Surface-hopping dynamics
!---------------------------------------------------------------------------!
! namd: Module for nonadiabatic molecular dynamics !
! Implementations by: Lina Zhang & Pavlo O. Dral !
!---------------------------------------------------------------------------!
- class mlatom.namd.surface_hopping_md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=100, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None, hopping_algorithm='LZBL', nstates=None, initial_state=None, random_seed=<function generate_random_seed>, prevent_back_hop=False, reduce_memory_usage=False, rescale_velocity_direction='along velocities', reduce_kinetic_energy=False)
Surface-hopping molecular dynamics
- Parameters:
model (
mlatom.models.model
ormlatom.models.methods
) – Any model or method which provides energies and forces.model_predict_kwargs (Dict, optional) – Kwargs of model prediction
molecule_with_initial_conditions (
data.molecule
) – The molecule with initial conditions.molecule (
data.molecule
) – Work the same as molecule_with_initial_conditionsensemble (str, optional) – Which kind of ensemble to use.
thermostat (
thermostat.Thermostat
) – The thermostat applied to the system.time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum propagation time in femtoseconds.
dump_trajectory_interval (int, optional) – Dump trajectory at which interval. Set to
None
to disable dumping.filename (str, optional) – The file that saves the dumped trajectory
format (str, optional) – Format in which the dumped trajectory is saved
stop_function (any, optional) – User-defined function that stops MD before
maximum_propagation_time
stop_function_kwargs (Dict, optional) – Kwargs of
stop_function
hopping_algorithm (str, optional) – Surface hopping algorithm
nstates (int) – Number of states
initial_state (int) – Initial state
random_seed (int) – Random seed
prevent_back_hop (bool, optional) – Whether to prevent back hopping
rescale_velocity_direction (string, optional) – Rescale velocity direction
reduce_kinetic_energy (bool, optional) – Whether to reduce kinetic energy
ensemble
description
'NVE'
(default)Microcanonical (NVE) ensemble
'NVT'
Canonical (NVT) ensemble
thermostat
description
ml.md.Andersen_thermostat
Andersen thermostat
ml.md.Nose_Hoover_thermostat
Hose-Hoover thermostat
None
(default)No thermostat is applied
For theoretical details, see and cite original paper (to be submitted).
Lina Zhang, Sebastian Pios, Mikołaj Martyka, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, Joanna Jankowska, Lipeng Chen, Mario Barbatti, Pavlo O. Dral. MLatom software ecosystem for surface hopping dynamics in Python with quantum mechanical and machine learning methods. 2024, to be submitted. Preprint on arXiv: https://arxiv.org/abs/2404.06189.
Examples:
# Propagate multiple LZBL surface-hopping trajectories in parallel # .. setup dynamics calculations namd_kwargs = { 'model': aiqm1, 'time_step': 0.25, 'maximum_propagation_time': 5, 'hopping_algorithm': 'LZBL', 'nstates': 3, 'initial_state': 2, } # .. run trajectories in parallel dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db, task=ml.namd.surface_hopping_md, task_kwargs=namd_kwargs, create_and_keep_temp_directories=True) trajs = [d.molecular_trajectory for d in dyns] # Dump the trajectories itraj=0 for traj in trajs: itraj+=1 traj.dump(filename=f"traj{itraj}.h5",format='h5md') # Analyze the result of trajectories and make the population plot ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5) ml.namd.plot_population(trajectories=trajs, time_step=0.25, max_propagation_time=5, nstates=3, filename=f'pop.png', pop_filename='pop.txt')
Note
Trajectory is saved in
ml.md.molecular_trajectory
, which is aml.data.molecular_trajectory
classWarning
In MLatom, energy unit is Hartree and distance unit is Angstrom. Make sure that the units in your model are consistent.
Active learning
Initial data sampling
initdata_sampler
can be:
'wigner'
'harmonic-quantum-boltzmann'
User-defined ML models
The user has the flexibility to create their own ML model class for AL. Minimum requirements to such a class:
it must have the usual
train
andpredict
functions.the
train
function must acceptmolecular_database
parameter.the
predict
function must acceptmolecule
and/ormolecular_database
parameters.
The realistic, fully fledged example of how to create a usable ML model class is below (it is what we use in al routine!):
class my_model():
def __init__(self, al_info = {}, model_file=None, device=None, verbose=False):
import torch
if device is None:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
if model_file is None:
if 'mlmodel_file' in al_info.keys():
self.model_file = al_info['mlmodel_file']
else:
self.model_file = 'mlmodel'
al_info['mlmodel_file'] = self.model_file
else:
self.model_file = model_file
al_info['mlmodel_file'] = self.model_file
if 'main_mlmodel_file' in al_info.keys():
main_mlmodel_file = al_info['main_mlmodel_file']
else:
main_mlmodel_file = f'{self.model_file}.pt'
al_info['main_mlmodel_file'] = main_mlmodel_file
if 'aux_mlmodel_file' in al_info.keys():
aux_mlmodel_file = al_info['aux_mlmodel_file']
else:
aux_mlmodel_file = f'aux_{self.model_file}.pt'
al_info['aux_mlmodel_file'] = aux_mlmodel_file
self.device = device
self.verbose = verbose
self.main_model = ml.models.ani(model_file=main_mlmodel_file,device=device,verbose=verbose)
self.aux_model = ml.models.ani(model_file=aux_mlmodel_file,device=device,verbose=verbose)
def train(self, molecular_database=None, al_info={}):
if 'working_directory' in al_info.keys():
workdir = al_info['working_directory']
self.main_model.model_file = f'{workdir}/{self.model_file}.pt'
self.aux_model.model_file = f'{workdir}/aux_{self.model_file}.pt'
validation_set_fraction = 0.1
[subtraindb, valdb] = molecular_database.split(number_of_splits=2, fraction_of_points_in_splits=[1-validation_set_fraction, validation_set_fraction], sampling='random')
# train the model on energies and gradients
self.main_model = ml.models.ani(model_file=self.main_model.model_file,device=self.device,verbose=self.verbose)
self.main_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy',xyz_derivative_property_to_learn='energy_gradients')
# train the auxiliary model only on energies
self.aux_model = ml.models.ani(model_file=self.aux_main_model.model_file,device=self.device,verbose=self.verbose)
self.aux_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy')
if not 'uq_threshold' in al_info.keys():
self.predict(molecular_database=valdb)
uqs = valdb.get_property('uq')
al_info['uq_threshold'] = np.median(uqs) + 3*stats.calc_median_absolute_deviation(uqs)
self.uq_threshold = al_info['uq_threshold']
# if the models were trained successfully, let's update al info where we can find them
al_info['main_mlmodel_file'] = self.main_model.model_file
al_info['aux_mlmodel_file'] = self.aux_model.model_file
def predict(self, molecule=None, molecular_database=None):
# predict energies and gradients with the main model
self.main_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='energy',xyz_derivative_property_to_predict='energy_gradients')
# predict energies with the auxiliary model
self.aux_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='aux_energy')
# calculate uncertainties
moldb = molecular_database
if moldb is None:
moldb = ml.molecular_database()
for mol in moldb:
mol.uq = abs(mol.energy - mol.aux_energy)
if mol.uq > self.uq_threshold:
mol.uncertain = True
else:
mol.uncertain = False
# This are useful in some internal al routines, e.g., when we want to make predictions in parallel (if nthreads is not set properly, it may slow down al significantly!)
@property
def nthreads(self):
return self.main_model.nthreads
@nthreads.setter
def nthreads(self, value):
self.main_model.nthreads = value
self.aux_model.nthreads = value
ml.al(
...
ml_model = my_model,
# do not use my_model(...), if you want to pass any arguments, use ml_model_kwargs:
ml_model_kwargs = {...}, # 'al_info' is unnecessary to include, it will be added automatically. If you supply 'al_info' key, it will overwrite the default one so use if you know what you are doing.
...
)
As you can see, it is helpful (but not required) if the __init__
and train
functions of the ML model class also accept the al_info
parameter which can be used to pass information during active learning from one routine to another.
Sampler
Here is a realistic example of the sampler function used in the physics-informed active learning:
def my_sampler(al_info={}, ml_model=None, initcond_sampler=None, initcond_sampler_kwargs={}, maximum_propagation_time=1000, time_step=0.1, ensemble='NVE', thermostat=None, dump_trajs=False, dump_trajectory_interval=None, stop_function=None, batch_parallelization=True):
moldb2label = ml.data.molecular_database()
# generate initial conditions
if type(initcond_sampler) == str:
if initcond_sampler.casefold() in ['wigner', 'harmonic-quantum-boltzmann']:
initcond_sampler = ml.generate_initial_conditions
initcond_sampler_kwargs['generation_method'] = initcond_sampler
import inspect
args, varargs, varkw, defaults = inspect.getargspec(initcond_sampler)
# Do we need al_info below?
if 'al_info' in args:
initial_molecular_database = initcond_sampler(al_info=al_info, **initcond_sampler_kwargs)
else:
initial_molecular_database = initcond_sampler(**initcond_sampler_kwargs)
# run MD in parallel to collect uncertain points
if batch_parallelization: # Faster way to propagate many trajs with ML
dyn = ml.md_parallel(model=ml_model,
molecular_database=initial_molecular_database,
ensemble=ensemble,
thermostat=thermostat,
time_step=time_step,
maximum_propagation_time=maximum_propagation_time,
dump_trajectory_interval=dump_trajectory_interval,
stop_function=stop_function)
trajs = dyn.molecular_trajectory
for itraj in range(len(trajs.steps[0])):
print(f"Trajectory {itraj} number of steps: {trajs.traj_len[itraj]}")
if trajs.steps[trajs.traj_len[itraj]][itraj].uncertain:
print(f'Adding molecule from trajectory {itraj} at time {trajs.traj_len[itraj]*time_step} fs')
moldb2label.molecules.append(trajs.steps[trajs.traj_len[itraj]][itraj])
# Dump traj
if dump_trajs:
import os
traj = ml.data.molecular_trajectory()
for istep in range(trajs.traj_len[itraj]+1):
step = ml.data.molecular_trajectory_step()
step.step = istep
step.time = istep * time_step
step.molecule = trajs.steps[istep][itraj]
traj.steps.append(step)
if 'working_directory' in al_info.keys():
dirname = f'{al_info['working_directory']}/trajs'
else:
dirname = 'trajs'
if not os.path.exists(dirname):
os.makedirs(dirname)
traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
else:
md_kwargs = {
'molecular_database': initial_molecular_database,
'model': ml_model,
'time_step': time_step,
'maximum_propagation_time': maximum_propagation_time,
'ensemble': ensemble,
'thermostat': thermostat,
'dump_trajectory_interval': dump_trajectory_interval,
'stop_function': stop_function
}
dyns = ml.simulations.run_in_parallel(molecular_database=initial_molecular_database,
task=ml.md,
task_kwargs=md_kwargs,
create_and_keep_temp_directories=False)
trajs = [d.molecular_trajectory for d in dyns]
itraj=0
for traj in trajs:
itraj+=1
print(f"Trajectory {itraj} number of steps: {len(traj.steps)}")
if traj.steps[-1].molecule.uncertain:
print('Adding molecule from trajectory %d at time %.2f fs' % (itraj, traj.steps[-1].time))
moldb2label.molecules.append(traj.steps[-1].molecule)
# Dump traj
if dump_trajs:
import os
if 'working_directory' in al_info.keys():
dirname = f'{al_info['working_directory']}/trajs'
else:
dirname = 'trajs'
if not os.path.exists(dirname):
os.makedirs(dirname)
traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
# add the source of molecule
for mol in moldb2label:
mol.sampling = 'md'
return moldb2label
ml.al(
...
sampler=my_sampler,
sampler_kwargs={'time_step': 0.5},
...
)