Overview

MLatom's Python API (PyAPI for short) has three main components: mlatom.data, mlatom.models, and mlatom.simulations.

mlatom.data creates, manipulates, saves, and loads chemical data as mlatom.data.atom, mlatom.data.molecule, and mlatom.data.molecular_database objects.

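mlatom.models contains many computational chemistry models, based on quantum mechanics and machine learning, that make predictions for molecules. These models fall into three categories, documented under Models below: methods, ml_model, and model_tree_node.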

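mlatom.simulations uses mlatom.models to run molecular simulations such as geometry optimization or dynamics.

A simple example illustrating the use of these components is provided for reference.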

Data

!---------------------------------------------------------------------------!
! data: Module for working with data                                        !
! Implementations by: Pavlo O. Dral, Fuchun Ge,                             !
!                     Shuang Zhang, Yi-Fan Hou, Yanchi Ou                   !
!---------------------------------------------------------------------------!
class mlatom.data.atom(nuclear_charge: int | None = None, atomic_number: int | None = None, element_symbol: str | None = None, nuclear_mass: float | None = None, xyz_coordinates: ndarray | List | None = None)

Create an atom object.

Parameters:
  • nuclear_charge (int, optional) – nuclear charge used to define the atom.

  • atomic_number (int, optional) – atomic number used to define the atom.

  • element_symbol (str, optional) – element symbol used to define the atom.

  • nuclear_mass (float, optional) – nuclear mass used to define the atom.

  • xyz_coordinates (Array-like, optional) – position of the atom in Cartesian coordinates.

copy(atomic_labels=None) -> atom

Return a copy of the current atom object.

class mlatom.data.molecule(charge: int = 0, multiplicity: int = 1, atoms: List[atom] = None, pbc: ndarray | bool | None = None, cell: ndarray | None = None)

Create a molecule object.

Parameters:
  • charge (int, optional) – charge of the molecule.

  • multiplicity (int, optional) – multiplicity of the molecule.

  • atoms (List[atom], optional) – atoms in the molecule.

Examples

Select an atom by index:

from mlatom.data import atom, molecule
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
print(id(at), id(mol[0]))

id

The unique ID of this molecule.

charge

The charge of the molecule.

multiplicity

The multiplicity of the molecule.

load(filename: str, format: str)

Load a molecule object from a dump file.

Updates a molecule object if initialized:

mol = molecule(); mol.load(filename='mymol.json')

Returns a molecule object if called as class method:

mol = molecule.load(filename='mymol.json')

Parameters:

filename (str): filename or path

format (str, optional): currently, only ‘json’ format is supported.

property pbc

The periodic boundary conditions of the molecule. Setting it with mol.pbc = True is equal to mol.pbc = [True, True, True].
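
The scalar-to-triple behaviour described for pbc can be pictured with a small stand-alone sketch. The PeriodicFlags class below is hypothetical and is not MLatom code; it only illustrates how a setter might expand a single boolean to one flag per cell vector.

```python
class PeriodicFlags:
    """Hypothetical stand-in (not MLatom code): one PBC flag per cell vector."""

    def __init__(self):
        self._pbc = [False, False, False]

    @property
    def pbc(self):
        return self._pbc

    @pbc.setter
    def pbc(self, value):
        if isinstance(value, bool):
            # A single boolean applies to all three directions.
            self._pbc = [value] * 3
        else:
            flags = [bool(v) for v in value]
            if len(flags) != 3:
                raise ValueError("pbc needs exactly three flags")
            self._pbc = flags


flags_scalar = PeriodicFlags()
flags_scalar.pbc = True
flags_list = PeriodicFlags()
flags_list.pbc = [True, True, True]
print(flags_scalar.pbc == flags_list.pbc)
```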

property cell

The matrix of three vectors that defines the unit cell. Its setter simply wraps ase.geometry.cell.cellpar_to_cell().

property cell_coordinates: ndarray

The relative coordinates in the cell.

map_to_unicell()

Map all atoms outside the unit cell into it.

read_from_xyz_file(filename: str, format: str | None = None) -> molecule

Load the molecular geometry from an xyz file.

If format is not specified, the standard xyz format is read.

Other supported formats:

  • 'COLUMBUS'

  • 'NEWTON-X' or 'NX'

  • 'turbomol'

Parameters:
  • filename (str) – name of the file to read.

  • format (str, optional) – format of the file.

read_from_xyz_string(string: str = None, format: str | None = None) -> molecule

Load the molecular geometry from an xyz string.

If format is not specified, the standard xyz format is read.

Other supported formats:

  • 'COLUMBUS'

  • 'NEWTON-X' or 'NX'

  • 'turbomol'

Parameters:
  • string (str) – the input string.

  • format (str, optional) – format of the string.
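
For readers unfamiliar with the standard xyz layout mentioned above (atom count, comment line, then one "symbol x y z" line per atom), here is a minimal stand-alone parser. The function parse_xyz is a hypothetical helper written for illustration; it is not MLatom's reader and ignores the other formats listed.

```python
def parse_xyz(text):
    """Parse a standard XYZ string into (symbols, coordinates)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # line 1: number of atoms
    # Line 2 is a free-form comment and is skipped.
    symbols, coordinates = [], []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coordinates.append((float(x), float(y), float(z)))
    if len(symbols) != n_atoms:
        raise ValueError("fewer atom lines than declared in the header")
    return symbols, coordinates


h2_xyz = """2
hydrogen molecule
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""
symbols, coordinates = parse_xyz(h2_xyz)
print(symbols, coordinates)
```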

read_from_numpy(coordinates: ndarray, species: ndarray) -> molecule

Load the molecular structure from a numpy array of coordinates and a numpy array of species.

coordinates must have shape (N, 3) and species must have shape (N,),

where N is the number of atoms.

read_from_smiles_string(smi_string: str) -> molecule

Generate the molecular structure from the provided SMILES string.

The optimized geometry is generated with Pybel's make3D() method.

classmethod from_xyz_file(filename: str, format: str | None = None) -> molecule

Class-method version of molecule.read_from_xyz_file(); returns a molecule object.

classmethod from_xyz_string(string: str = None, format: str | None = None) -> molecule

Class-method version of molecule.read_from_xyz_string(); returns a molecule object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) -> molecule

Class-method version of molecule.read_from_numpy(); returns a molecule object.

classmethod from_smiles_string(smi_string: str) -> molecule

Class-method version of molecule.read_from_smiles_string(); returns a molecule object.

add_atom_from_xyz_string(line: str) -> None

Add an atom to the molecule from an xyz-format string.

add_scalar_property(scalar, property_name: str = 'y') -> None

Add a scalar property to the molecule. The property can then be accessed as molecule.<property_name>.

Parameters:
  • scalar – the scalar to add.

  • property_name (str, optional) – name assigned to the scalar property.

add_xyz_derivative_property(derivative, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') -> None

Add an xyz derivative property to the molecule.

Parameters:
  • derivative – the derivative property to add.

  • property_name (str, optional) – name of the associated non-derivative property.

  • xyz_derivative_property (str, optional) – name assigned to the derivative property.

add_xyz_vectorial_property(vector, xyz_vectorial_property: str = 'xyz_vector') -> None

Add an xyz vectorial property to the molecule.

Parameters:
  • vector – the vector to add.

  • xyz_vectorial_property (str, optional) – name assigned to the vectorial property.

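write_file_with_xyz_coordinates(filename: str, format: str | None = None) -> None

Write the molecular geometry to a file. If format is not specified, the standard xyz format is written.

Other supported formats:

  • 'COLUMBUS'

  • 'NEWTON-X' or 'NX'

  • 'turbomol'

Parameters:
  • filename (str) – name of the file to write.

  • format (str, optional) – format of the file.

get_xyz_string() -> str

Return the molecular geometry in xyz format.

property atomic_numbers: ndarray

The atomic numbers of the atoms in the molecule.

property element_symbols: ndarray

The element symbols of the atoms in the molecule.

property smiles: str

The SMILES representation of the molecule.

property xyz_coordinates: ndarray

The xyz geometry of the molecule.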

property kinetic_energy: float

The kinetic energy (in a.u.) calculated from the xyz velocities.
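
As a reminder of what a kinetic energy computed from Cartesian velocities looks like, the sketch below evaluates the textbook sum of 0.5 * m * |v|^2 over atoms. The function name and the sample numbers are illustrative assumptions; any unit conversion that MLatom applies internally is not shown here, so the inputs are simply assumed to be in mutually consistent units.

```python
def kinetic_energy(masses, velocities):
    """Return sum_i 0.5 * m_i * |v_i|^2 for inputs in consistent units."""
    total = 0.0
    for mass, (vx, vy, vz) in zip(masses, velocities):
        total += 0.5 * mass * (vx * vx + vy * vy + vz * vz)
    return total


# Two fictitious particles; the values are chosen only for easy checking.
masses = [2.0, 4.0]
velocities = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
ke = kinetic_energy(masses, velocities)
print(ke)
```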

copy(atomic_labels=None, molecular_labels=None)

Return a copy of the current molecule object.

proliferate(shifts: Iterable | None = None, XYZshifts: Iterable | None = None, Xshifts: Iterable | None = [0], Yshifts: Iterable | None = [0], Zshifts: Iterable | None = [0], PBC_constrained: bool = True) -> molecule

Proliferate the unit cell by the specified shifts along the cell vectors (called X/Y/Z here).

Returns a new molecule object.

Parameters:
  • shifts (Iterable, optional) – The list of shifts to perform. Each shift should be a 3D vector that indicates the coefficient applies to the corresponding cell vector.

  • XYZshifts (Iterable, optional) – When a single list is specified, generate all possible shifts from the given shift coefficients in all three directions. When a list of 3 lists is specified, it is equivalent to setting X/Y/Zshifts.

  • Xshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the first cell vector.

  • Yshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the second cell vector.

  • Zshifts (Iterable, optional) – Specify all possible shift coefficients in the direction of the third cell vector.

  • PBC_constrained (bool) – Controls whether the shifts in some directions are disabled where corresponding PBC is false. Only applies to XYZshifts.

Notes

Priorities for different types of shifts:

shifts > XYZshifts > X/Y/Zshifts

Examples

Single H atom in the centre of a cubic cell (2x2x2):

import numpy as np
import mlatom as ml
mol = ml.molecule.from_numpy(np.ones((1, 3)), np.array([1]))
mol.pbc = True
mol.cell = 2

Proliferate to get two periods in all three directions, with shifts:

new_mol = mol.proliferate(
    shifts = [
        [0, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 1],
    ]
)

with XYZshifts:

new_mol = mol.proliferate(XYZshifts=range(2))
# or
new_mol = mol.proliferate(XYZshifts=[range(2)]*3)

with X/Y/Zshifts:

new_mol = mol.proliferate(Xshifts=range(2), Yshifts=(0, 1), Zshifts=[0, 1])

All scripts above will make new_mol.xyz_coordinates be:

array([[1., 1., 1.],
       [3., 1., 1.],
       [1., 3., 1.],
       [3., 3., 1.],
       [1., 1., 3.],
       [3., 1., 3.],
       [1., 3., 3.],
       [3., 3., 3.]])
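
To make the relation between X/Y/Zshifts and the explicit shifts list concrete, the following stand-alone sketch expands per-direction coefficients into shift vectors and applies them to the atom of the cubic example above. Both helper functions are hypothetical and only handle a cubic cell; the iteration order (X varying fastest) is an assumption chosen so that the result matches the array listed above, not a statement about MLatom's internals.

```python
from itertools import product


def expand_shifts(x_shifts, y_shifts, z_shifts):
    """Combine per-direction coefficients into (x, y, z) shift vectors."""
    # product() varies its last argument fastest, so pass z, y, x
    # to obtain vectors in which the X coefficient changes first.
    return [(x, y, z) for z, y, x in product(z_shifts, y_shifts, x_shifts)]


def proliferate_cubic(coordinates, cell_length, shifts):
    """Replicate atoms of a cubic cell (edge cell_length) by integer shifts."""
    images = []
    for sx, sy, sz in shifts:
        for x, y, z in coordinates:
            images.append((x + sx * cell_length,
                           y + sy * cell_length,
                           z + sz * cell_length))
    return images


shifts = expand_shifts(range(2), range(2), range(2))
images = proliferate_cubic([(1.0, 1.0, 1.0)], 2.0, shifts)
for image in images:
    print(image)
```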
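
dump(filename=None, format='json')

Dump the current molecule object to a file. Currently only the json format is supported.

property state_energies: ndarray

The electronic state energies of the molecule.

property state_gradients: ndarray

The energy gradients of the electronic states of the molecule.

property energy_gaps: ndarray

The energy gaps between different states.

property excitation_energies: ndarray

The excitation energies of the molecule from the ground state.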

property nstates: np.int

The number of electronic states.

get_xyzvib_string(normal_mode=0)

Get the xyz string with geometries and displacements along the vibrational normal modes.

view(normal_mode=None, slider=True)

Visualize the molecule and, if requested, its vibrations. Uses py3Dmol.

Parameters:
  • normal_mode (integer, optional) – the index of a normal mode to visualize. Default: None.

  • slider – show an interactive slider to choose the mode. Default: True (only works if normal_mode is not None).

class mlatom.data.molecular_database(molecules: List[molecule] = None)

Create a database of molecule objects.

Parameters:

molecules (List[molecule]) – list of the molecules to include in the molecular database.

Examples

Select a molecule by index:

from mlatom.data import atom, molecule, molecular_database
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
molDB = molecular_database([mol])
print(id(mol) == id(molDB[0]))
# the output should be 'True'

Slice the database like a numpy array:

from mlatom.data import molecular_database
molDB = molecular_database.from_xyz_file('devtests/al/h2_fci_db/xyz.dat')
print(len(molDB))           # 451
print(len(molDB[:100:4]))   # 25
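
The count of 25 in the slicing example follows from ordinary Python slice arithmetic and can be checked without MLatom. In the sketch below a plain range stands in for the 451-entry database; this substitution is an assumption made only to keep the snippet self-contained.

```python
# A range of 451 indices stands in for the database entries.
indices = range(451)

# [:100:4] keeps entries 0, 4, 8, ..., 96.
subset = indices[:100:4]

print(len(indices))
print(len(subset))
print(list(subset)[:3], subset[-1])
```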
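
read_from_xyz_file(filename: str, append: bool = False) -> molecular_database

Load molecular geometries from an xyz file.

Parameters:
  • filename (str) – name of the file to read.

  • append (bool, optional) – if True, append to the current database; otherwise clear the current database first.

read_from_xyz_string(string: str, append=False) -> molecular_database

Load molecular geometries from an xyz string.

Parameters:
  • string (str) – the string to read.

  • append (bool, optional) – if True, append to the current database; otherwise clear the current database first.

read_from_numpy(coordinates: ndarray, species: ndarray, append: bool = False) -> molecular_database

Load multiple molecular structures from a numpy array of coordinates and a numpy array of species.

coordinates must have shape (M, N, 3) and species must have shape (M, N),

where N is the number of atoms and M is the number of molecules.

read_from_smiles_file(smi_file: str, append: bool = False) -> molecular_database

Generate molecular geometries from the provided SMILES file.

The optimized geometries are generated with Pybel's make3D() method.

read_from_smiles_string(smi_string: str, append: bool = False) -> molecular_database

Generate molecular geometries from the provided SMILES string.

The optimized geometries are generated with Pybel's make3D() method.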

read_from_h5_file(h5_file: str = '', properties: list = None, parallel: bool | int | tuple = False, verbose: bool = False) -> molecular_database

Generate a molecular database from a formatted h5 file. The first level should be configurations (or ensembles of molecules with the same number of atoms) and the second level should be conformations and their properties. ‘species’ and ‘coordinates’ are required to construct a molecule. An example layout of an h5 file:

/003                  dict
/003/species          array (624, 3) [int8]
/003/coordinates      array (624, 3, 3) [float32]
/003/energies         array (624,) [float64]
/003/property1        ['wb97x/def2tzvpp']
/003/property2        array (624, 2) [int8]

If the first two dimensions of a value equal (number_of_configurations, number_of_atoms), the remaining dimensions are assigned to each atom as xyz derivative properties. If only the first dimension equals the number of configurations, the corresponding value is assigned to each molecule. If only one value is provided for a property, it is copied into each molecule. For example, in the case above, the properties stored in each molecule object would be: {‘energies’: float, ‘property1’: ’wb97x/def2tzvpp’, ‘property2’: numpy.ndarray of shape (2,)}

Parameters:
  • h5_file (str) – path to the h5 file.

  • properties (list) – the properties to be stored in the molecular database. By default, all properties present in the h5 file are stored.

  • parallel (int or tuple or bool) –

    • If an int is provided, it sets the number of workers. The batch size is calculated automatically.

    • If a tuple is provided, the first value sets the number of workers and the second value sets the batch size.

    • If a bool is provided, True means that all available CPUs are used and the batch size is adjusted accordingly.

  • verbose (bool) – whether to print the loading message.
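
The shape-based assignment rules described for read_from_h5_file can be summarised in a few lines. The function classify_property below is a hypothetical illustration that works on shape tuples only; it does not read h5 files and is not how MLatom implements the loader.

```python
def classify_property(shape, n_configurations, n_atoms):
    """Say how a stored value of the given shape would be distributed."""
    shape = tuple(shape)
    if len(shape) >= 2 and shape[:2] == (n_configurations, n_atoms):
        return "per-atom"      # remaining dimensions go to each atom
    if len(shape) >= 1 and shape[0] == n_configurations:
        return "per-molecule"  # one entry per molecule
    return "shared"            # a single value copied into every molecule


# Shapes taken from the example layout: 624 configurations of 3 atoms.
layout = {
    "coordinates": (624, 3, 3),
    "energies": (624,),
    "property1": (1,),
    "property2": (624, 2),
}
kinds = {name: classify_property(shape, 624, 3) for name, shape in layout.items()}
for name, kind in kinds.items():
    print(name, kind)
```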

classmethod from_xyz_file(filename: str) -> molecular_database

Class-method version of molecular_database.read_from_xyz_file(); returns a molecular_database object.

classmethod from_xyz_string(string: str) -> molecular_database

Class-method version of molecular_database.read_from_xyz_string(); returns a molecular_database object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) -> molecular_database

Class-method version of molecular_database.read_from_numpy(); returns a molecular_database object.

classmethod from_smiles_file(smi_file: str) -> molecular_database

Class-method version of molecular_database.read_from_smiles_file(); returns a molecular_database object.

classmethod from_smiles_string(smi_string: str | List) -> molecular_database

Class-method version of molecular_database.read_from_smiles_string(); returns a molecular_database object.

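add_scalar_properties(scalars, property_name: str = 'y') -> None

Add scalar properties to the molecules.

Parameters:
  • scalars – the scalars to add.

  • property_name (str, optional) – name assigned to the scalar property.

add_scalar_properties_from_file(filename: str, property_name: str = 'y') -> None

Add scalar properties from a file to the molecules.

Parameters:
  • filename (str) – text file containing the properties.

  • property_name (str, optional) – name assigned to the scalar property.

add_xyz_derivative_properties(derivatives, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') -> None

Add xyz derivative properties to the molecules.

Parameters:
  • derivatives – the derivatives to add.

  • property_name (str, optional) – name of the associated non-derivative property.

  • xyz_derivative_property (str, optional) – name assigned to the derivative property.

add_xyz_derivative_properties_from_file(filename: str, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') -> None

Add xyz derivative properties to the molecules from a text file in xyz format.

Parameters:
  • filename (str) – name of the file with the derivatives to add.

  • property_name (str, optional) – name of the associated non-derivative property.

  • xyz_derivative_property (str, optional) – name assigned to the derivative property.

add_xyz_vectorial_properties(vectors, xyz_vectorial_property: str = 'xyz_vector') -> None

Add xyz vectorial properties to the molecules.

Parameters:
  • vectors – the vectors to add.

  • xyz_vectorial_property (str, optional) – name assigned to the vectorial property.

add_xyz_vectorial_properties_from_file(filename: str, xyz_vectorial_property: str = 'xyz_vector') -> None

Add xyz vectorial properties to the molecules from a text file in xyz format.

Parameters:
  • filename (str) – name of the file with the vectorial properties to add.

  • xyz_vectorial_property (str, optional) – name assigned to the vectorial property.

write_file_with_xyz_coordinates(filename: str) -> None

Write the molecular geometries to an xyz file.

Parameters:

filename (str) – name of the file to write.

get_xyz_string() -> None

Return the xyz string of the molecules.

write_file_with_properties(filename, property_to_write='y')

Write a property of the molecules to a text file.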

property atomic_numbers: ndarray

2D array of the atomic numbers of every atom in all molecules of the database.

property element_symbols: ndarray

2D array of the element symbols of every atom in all molecules of the database.

property ids

The IDs of the molecules in the database.

property smiles: str

The SMILES strings of the molecules in the database.

write_file_with_smiles(filename)

Write the SMILES of the molecules in the database to a file.

property nuclear_masses

The nuclear masses of the molecules in the database.

property charges

The charges of the molecules in the database.

property multiplicities

The multiplicities of the molecules in the database.

get_properties(property_name='y')

Return the properties of the molecules for the given property name.

set_properties(**kwargs)

Set properties of the molecules, with the property names given as keywords.

get_xyz_derivative_properties(xyz_derivative_property='xyz_derivatives')

Return xyz derivative properties by name.

get_xyz_vectorial_properties(property_name)

Return xyz vectorial properties by name.

write_file_with_xyz_derivative_properties(filename, xyz_derivative_property_to_write='xyz_derivatives')

Write xyz derivative properties to a file.

write_file_energy_gradients(filename)

Write energy gradients to a file.

write_file_with_xyz_vectorial_properties(filename, xyz_vectorial_property_to_write='xyz_vector')

Write xyz vectorial properties to a file.

write_file_with_hessian(filename, hessian_property_to_write='hessian')

Write Hessians to a file.

append(obj)

Append a molecule or a molecular database.

copy(atomic_labels=None, molecular_labels=None, molecular_database_labels=None)

Return a copy of the database.

proliferate(*args, **kwargs) -> molecular_database

Proliferate the unit cell by the specified shifts along the cell vectors.

Returns a new molecular_database object.

See molecule.proliferate() for details on the options.

dump(filename=None, format=None)

Dump the molecular database to a file.

classmethod load(filename=None, format=None)

Load a molecular database from a file.

split(sampling='random', number_of_splits=2, split_equally=None, fraction_of_points_in_splits=None)

Split the molecular database.

Parameters:
  • sampling (str, optional) – default ‘random’. Can also be ‘none’.

  • split_equally (bool, optional) – default False; if set to True, splits 50:50.

  • fraction_of_points_in_splits (list, optional) – e.g., [0.8, 0.2], which is the default.

  • indices
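
The idea of splitting by fractions, with or without random sampling, can be sketched on a plain list. The helper split_items below is hypothetical: its rounding rule and the choice to give any remainder to the last split are assumptions and may differ from what molecular_database.split() does.

```python
import random


def split_items(items, fractions=(0.8, 0.2), sampling="random", seed=None):
    """Split items into consecutive parts sized by the given fractions."""
    items = list(items)
    if sampling == "random":
        random.Random(seed).shuffle(items)
    elif sampling != "none":
        raise ValueError("sampling must be 'random' or 'none'")
    splits, start = [], 0
    for position, fraction in enumerate(fractions):
        if position == len(fractions) - 1:
            stop = len(items)  # the last split takes whatever remains
        else:
            stop = start + round(fraction * len(items))
        splits.append(items[start:stop])
        start = stop
    return splits


ordered_train, ordered_test = split_items(range(10), sampling="none")
random_train, random_test = split_items(range(10), seed=0)
print(ordered_train, ordered_test)
print(len(random_train), len(random_test))
```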

property xyz_coordinates

The xyz coordinates of every atom in each molecule.

view()

Visualize the molecular database. Uses py3Dmol.

class mlatom.data.molecular_trajectory(steps=None)

A class for storing and accessing molecular trajectory data generated by dynamics or geometry optimization.

dump(filename=None, format=None)

Dump the molecular_trajectory object to a file.

Available formats:

  • 'h5md' (requires the Python modules h5py and pyh5md)

  • 'json'

  • 'plain_text'

load(filename: str = None, format: str = None)

Load a previously dumped molecular_trajectory from a file.

get_xyz_string() -> str

Return the xyz string of the molecules in the trajectory.

to_database() -> molecular_database

Return a molecular database comprising the molecules in the trajectory.

view()

Visualize the molecular trajectory. Uses py3Dmol.

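class mlatom.data.h5md(filename: str, data: Dict[str, Any] = {}, mode: str = 'w')

Save trajectory data to a file in H5MD format.

Parameters:
  • filename (str) – name of the h5md file to write.

  • data (Dict) – the data to store (optional; if provided, the file is closed after the data are stored).

  • mode (str, optional) – string that controls the file-handling mode (default: 'w' for a new file, 'r+' for an existing file). The options below are consistent with pyh5md.File():

    • r – read-only; the file must already exist.

    • r+ – read and write; the file must already exist.

    • w – create the file, overwriting it if it exists.

    • w- or x – create the file; fail if it exists.

Examples: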

traj0 = h5md('traj.h5')  # open 'traj.h5'
traj1 = h5md('/tmp/test.h5', mode='r')  # open an existing file in readonly mode
traj2 = h5md('/tmp/traj2.h5', data={'time': 1.0, 'total_energy': -32.1, 'test': 8848}) # add some data to the file, then close the file

traj0.write(data) # write data to opened file
traj0(data) # an alternative way to write data

data = traj0.export() # export the data in the opened file
data = traj0() # an alternative way to export data
with h5md('test.h5') as traj: # export with a with statement
    data = traj.export()


traj0.close() # close the file

Notes

Default data paths in the HDF5 file

particles/all:

‘box’, ‘gradients’, ‘mass’, ‘nad’, ‘names’, ‘position’, ‘species’, ‘velocities’

observables:

‘angular_momentum’, ‘generated_random_number’, ‘kinetic_energy’, ‘linear_momentum’, ‘nstatdyn’, ‘oscillator_strengths’, ‘populations’, ‘potential_energy’, ‘random_seed’, ‘sh_probabilities’, ‘total_energy’, ‘wavefunctions’, and other keywords

h5

The HDF5 file object.

write(data: Dict[str, Any]) -> None

Write data to the opened H5 file. data should be a dict-like object whose keys include ‘time’.

export() -> Dict[str, ndarray]

Export the data in the opened H5 file.

Returns:

A dictionary of the trajectory data in the H5 file.

close() -> None

Close the opened file.

__call__() -> Dict[str, ndarray]

Export the data in the opened H5 file.

Returns:

A dictionary of the trajectory data in the H5 file.

Models

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen,  !
!                     Peikun Zheng                                          !
!---------------------------------------------------------------------------!
class mlatom.models.model

Parent (super) class for models to enable useful features such as logging during geometry optimizations.

config_multiprocessing()

For scripts that need to be executed before running the model in parallel.

predict(molecular_database: molecular_database = None, molecule: molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:
  • molecular_database (mlatom.data.molecular_database, optional) – a database of the molecules whose properties are to be predicted by the model.

  • molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.

  • calculate_energy (bool, optional) – use the model to calculate the energy.

  • calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.

  • calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.
methods

class mlatom.models.methods(method: str = None, program: str = None, **kwargs)

Create a model object with the specified method.

Parameters:
  • method (str) – the method to use. The available methods are listed below.

  • program (str, optional) – the program to use.

  • **kwargs – other method-specific options.

Available methods:

'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'AM1', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'CCSD(T)*/CBS', 'CNDO/2', 'D4', 'DFTB0', 'DFTB2', 'DFTB3', 'GFN2-xTB', 'MINDO/3', 'MNDO', 'MNDO/H', 'MNDO/d', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM2*', 'ODM3', 'ODM3*', 'OM1', 'OM2', 'OM3', 'PM3', 'PM6', 'RM1', 'SCC-DFTB', 'SCC-DFTB-heats'.

The methods listed above can be used without specifying a program. The required programs must still be installed, as described in the installation manual.

Available programs and their corresponding methods (program – methods):

  • TorchANI – 'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'ANI-1xnr'

  • dftd4 – 'AIQM1', 'AIQM1@DFT', 'ANI-1x-D4', 'ANI-2x-D4', 'D4'

  • MNDO or Sparrow – 'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'MNDO', 'MNDO/d', 'ODM2*', 'ODM3*', 'OM2', 'OM3', 'PM3', 'SCC-DFTB', 'SCC-DFTB-heats'

  • MNDO – 'CNDO/2', 'MINDO/3', 'MNDO/H', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM3', 'OM1', semiempirical OMx, DFTB, and NDDO-type methods

  • Sparrow – 'DFTB0', 'DFTB2', 'DFTB3', 'PM6', 'RM1', semiempirical OMx, DFTB, and NDDO-type methods

  • xTB – 'GFN2-xTB', semiempirical GFNx-TB methods

  • Orca – 'CCSD(T)*/CBS', DFT

  • Gaussian – ab initio methods, DFT

  • PySCF – ab initio methods, DFT

property nthreads


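predict(*args, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:
  • molecular_database (mlatom.data.molecular_database, optional) – a database of the molecules whose properties are to be predicted by the model.

  • molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.

  • calculate_energy (bool, optional) – use the model to calculate the energy.

  • calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.

  • calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.

config_multiprocessing()

For scripts that need to be executed before running the model in parallel.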

AIQM1

ml_model

class mlatom.models.ml_model

Useful as a superclass for the ML models that need to be trained.

train(molecular_database: molecular_database, property_to_learn: str | None = 'y', xyz_derivative_property_to_learn: str = None) -> None

Train the model with the provided molecular database.

Parameters:
  • molecular_database (mlatom.data.molecular_database) – the molecular database used to train the model.

  • property_to_learn (str, optional) – label of the property to learn during model training.

  • xyz_derivative_property_to_learn (str, optional) – label of the xyz derivative property to learn.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, property_to_predict: str | None = 'estimated_y', calculate_energy_gradients: bool = False, xyz_derivative_property_to_predict: str | None = 'estimated_xyz_derivatives_y', calculate_hessian: bool = False, hessian_to_predict: str | None = 'estimated_hessian_y') -> None

Make predictions for molecular geometries with the model.

Parameters:
  • molecular_database (mlatom.data.molecular_database, optional) – a database of the molecules whose properties are to be predicted by the model.

  • molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.

  • calculate_energy (bool, optional) – use the model to calculate the energy.

  • calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.

  • calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.

  • property_to_predict (str, optional) – label under which the predicted property is saved.

  • xyz_derivative_property_to_predict (str, optional) – label under which the predicted xyz derivative property is saved.

  • hessian_to_predict (str, optional) – label under which the predicted Hessians are saved.

generate_model_dict()

Generates a model dictionary for dumping in json format.

reset()

Resets the model (deletes the ML model file from the hard disk).

dump(filename=None, format='json')

Dumps the model class object information to a json file (not to be confused with saving the model itself, i.e., its parameters!).

calculate_validation_loss(training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, calculate_CV_split_errors=False, subtraining_molecular_database=None, validation_molecular_database=None, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)

Returns the validation loss for the given hyperparameters.

By default, the validation loss is the RMSE evaluated as a geometric mean over scalar and vectorial properties, e.g., energies and gradients.

Parameters:
  • training_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.train() function.

  • prediction_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.predict() function.

  • cv_splits_molecular_databases (list, optional) – the list of cross-validation splits; each element is a molecular_database.

  • calculate_CV_split_errors (bool, optional) – requests that the errors for each cross-validation split be returned as a list in addition to the aggregate cross-validation error.

  • subtraining_molecular_database (molecular_database, optional) – molecular database for sub-training, to be passed to the yourmodel.train() function.

  • validation_molecular_database (molecular_database, optional) – molecular database for validation, to be passed to the yourmodel.predict() function.

  • validation_loss_function (function, optional) – user-defined validation function.

  • validation_loss_function_kwargs (dict, optional) – kwargs for the above validation_loss_function.
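
The default loss described above, an RMSE combined as a geometric mean over a scalar and a vectorial property, can be written out in a few lines. This is a sketch of the formula only: the function names are hypothetical, and how MLatom flattens gradient arrays or treats a missing property is not covered.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def combined_loss(rmse_scalar, rmse_vectorial):
    """Geometric mean of the scalar and vectorial RMSEs."""
    return math.sqrt(rmse_scalar * rmse_vectorial)


# Made-up reference and predicted values, chosen for easy checking.
energy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
gradient_rmse = rmse([0.0, 0.0], [3.0, 3.0])
loss = combined_loss(energy_rmse, gradient_rmse)
print(energy_rmse, gradient_rmse, loss)
```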

optimize_hyperparameters(hyperparameters=None, training_kwargs=None, prediction_kwargs=None, cv_splits_molecular_databases=None, subtraining_molecular_database=None, validation_molecular_database=None, optimization_algorithm=None, optimization_algorithm_kwargs={}, maximum_evaluations=10000, validation_loss_function=None, validation_loss_function_kwargs={}, debug=False)

Optimizes hyperparameters by minimizing the validation loss.

By default, the validation loss is the RMSE evaluated as a geometric mean over scalar and vectorial properties, e.g., energies and gradients.

Parameters:
  • hyperparameters (list, required) – the list of strings with the names of the hyperparameters. The hyperparameters themselves must be in yourmodel.hyperparameters, defined as an instance of the hyperparameters class consisting of hyperparameter objects that define the optimization space.

  • training_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.train() function.

  • prediction_kwargs (dict, optional) – the kwargs to be passed to the yourmodel.predict() function.

  • cv_splits_molecular_databases (list, optional) – the list of cross-validation splits; each element is a molecular_database.

  • calculate_CV_split_errors (bool, optional) – requests that the errors for each cross-validation split be returned as a list in addition to the aggregate cross-validation error.

  • subtraining_molecular_database (molecular_database, optional) – molecular database for sub-training, to be passed to the yourmodel.train() function.

  • validation_molecular_database (molecular_database, optional) – molecular database for validation, to be passed to the yourmodel.predict() function.

  • validation_loss_function (function, optional) – user-defined validation function.

  • validation_loss_function_kwargs (dict, optional) – kwargs for the above validation_loss_function.

  • optimization_algorithm (str, required) – optimization algorithm. No default; must be one of: ‘grid’ (‘brute’), ‘TPE’, ‘Nelder-Mead’, ‘BFGS’, ‘L-BFGS-B’, ‘Powell’, ‘CG’, ‘Newton-CG’, ‘TNC’, ‘COBYLA’, ‘SLSQP’, ‘trust-constr’, ‘dogleg’, ‘trust-krylov’, ‘trust-exact’.

  • optimization_algorithm_kwargs (dict, optional) – kwargs to be passed to the optimization algorithm, e.g., {'grid_size': 5} (default 9 for the grid search).

  • maximum_evaluations (int, optional) – maximum number of optimization evaluations (default: 10000); supported by all optimizers except the grid search.

Saves the final hyperparameters in yourmodel.hyperparameters and the validation loss in yourmodel.validation_loss.

class mlatom.models.hyperparameter(value: Any = None, optimization_space: str = 'linear', dtype: Callable | None = None, name: str = '', minval: Any = None, maxval: Any = None, step: Any = None, choices: Iterable[Any] = [], **kwargs)

Class of hyperparameter objects, containing data that can be used in hyperparameter optimizations.

Parameters:
  • value (Any, optional) – The value of the hyperparameter.

  • optimization_space (str, optional) – Defines the space for the hyperparameter. Currently supports 'linear' and 'log'.

  • dtype (Callable, optional) – A callable object that forces the data type of value. If set to None, one is chosen automatically.
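
The difference between a 'linear' and a 'log' optimization space is easiest to see on a grid of trial values between minval and maxval. The grid_points helper below is hypothetical, and the use of base-10 logarithms is an assumption; the documentation above does not state which base MLatom uses or how it places grid points.

```python
import math


def grid_points(minval, maxval, n_points, optimization_space="linear"):
    """Return n_points values spaced evenly in linear or log10 space."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if optimization_space == "linear":
        lower, upper = float(minval), float(maxval)
    elif optimization_space == "log":
        lower, upper = math.log10(minval), math.log10(maxval)
    else:
        raise ValueError("optimization_space must be 'linear' or 'log'")
    step = (upper - lower) / (n_points - 1)
    points = [lower + i * step for i in range(n_points)]
    if optimization_space == "log":
        # Map the evenly spaced exponents back to actual values.
        points = [10.0 ** p for p in points]
    return points


linear_grid = grid_points(0, 4, 5)
log_grid = grid_points(1e-3, 1e1, 5, optimization_space="log")
print(linear_grid)
print(log_grid)
```

In the log space the trial values are spread evenly over orders of magnitude, which suits hyperparameters such as regularization strengths that can vary over several decades.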

update(new_hyperparameter: hyperparameter) -> None

Update the hyperparameter with the data in another instance.

Parameters:

new_hyperparameter (mlatom.models.hyperparameter) – the instance whose data are to be applied to the current instance.

copy()

Returns a copy of the current instance.

Returns:

a new instance copied from the current one.

Return type:

mlatom.models.hyperparameter

class mlatom.models.hyperparameters(dict=None, /, **kwargs)

Class for storing hyperparameters; values are auto-converted to mlatom.models.hyperparameter objects. Inherits from collections.UserDict.

Initialization:

Initialize with a dictionary, kwargs, or both.

e.g.:

hyperparameters({'a': 1.0}, b=hyperparameter(value=2, minval=0, maxval=4))
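
The auto-conversion of plain values can be illustrated with a UserDict subclass that wraps anything assigned to it. Param and ParamDict below are hypothetical stand-ins written for this sketch; they mimic the described behaviour and are not the MLatom classes.

```python
from collections import UserDict


class Param:
    """Hypothetical stand-in for a hyperparameter object."""

    def __init__(self, value=None, minval=None, maxval=None):
        self.value = value
        self.minval = minval
        self.maxval = maxval


class ParamDict(UserDict):
    """Dictionary that wraps plain values in Param on assignment."""

    def __setitem__(self, key, value):
        if not isinstance(value, Param):
            value = Param(value=value)
        super().__setitem__(key, value)


# UserDict routes both the dict argument and the kwargs through
# __setitem__, so the plain float is wrapped at construction time.
params = ParamDict({"a": 1.0}, b=Param(value=2, minval=0, maxval=4))
params["c"] = 0.5
print({key: type(item).__name__ for key, item in params.items()})
```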

copy(keys: Iterable[str] | None = None) -> hyperparameters

Returns a copy of the current instance.

Parameters:

keys (Iterable[str], optional) – If keys are provided, only the hyperparameters selected by keys are copied, instead of all hyperparameters.

Returns:

a new instance copied from the current one.

Return type:

mlatom.models.hyperparameters

class mlatom.models.kreg(model_file: str | None = None, ml_program: str = 'KREG_API', equilibrium_molecule: molecule | None = None, prior: float = 0, nthreads: int | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {})

Create a KREG model object.

Parameters:
  • model_file (str, optional) – name of the file the model should be dumped to or loaded from.

  • ml_program (str, optional) – the ML program to use. Available options: 'KREG_API', 'MLatomF

  • equilibrium_molecule (mlatom.data.molecule | None) – the equilibrium geometry used to generate the RE descriptor. If set to None, the geometry with the lowest energy/value is selected.

  • prior (default - None) – the prior can be ‘mean’, None (0.0), or any floating-point number.

  • hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – updates the hyperparameters of the model with the provided values.

generate_model_dict()

Generates a model dictionary for dumping in json format.

train(molecular_database=None, property_to_learn=None, xyz_derivative_property_to_learn=None, save_model=True, invert_matrix=False, matrix_decomposition=None, prior=None, hyperparameters: Dict[str, Any] | hyperparameters = {})

Train the KREG model with the provided molecular database.

Parameters:
  • molecular_database (mlatom.data.molecular_database) – the molecular database used to train the model.

  • property_to_learn (str, optional) – label of the property to learn during model training.

  • xyz_derivative_property_to_learn (str, optional) – label of the xyz derivative property to learn.

  • prior (str or float or int, optional) – default zero prior. It can also be ‘mean’ or any user-defined number.

predict(molecular_database=None, molecule=None, calculate_energy=False, calculate_energy_gradients=False, calculate_hessian=False, property_to_predict=None, xyz_derivative_property_to_predict=None, hessian_to_predict=None)

Make predictions for molecular geometries with the model.

Parameters:
  • molecular_database (mlatom.data.molecular_database, optional) – a database of the molecules whose properties are to be predicted by the model.

  • molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.

  • calculate_energy (bool, optional) – use the model to calculate the energy.

  • calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.

  • calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.

  • property_to_predict (str, optional) – label under which the predicted property is saved.

  • xyz_derivative_property_to_predict (str, optional) – label under which the predicted xyz derivative property is saved.

  • hessian_to_predict (str, optional) – label under which the predicted Hessians are saved.

mlatom.models.ani(**kwargs)[source]

Returns an ANI model object (see mlatom.interfaces.torchani_interface.ani).

mlatom.models.dpmd(**kwargs)[source]

Returns a DPMD model object (see mlatom.interfaces.dpmd_interface.dpmd).

mlatom.models.gap(**kwargs)[source]

Returns a GAP model object (see mlatom.interfaces.gap_interface.gap).

mlatom.models.physnet(**kwargs)[source]

Returns a PhysNet model object (see mlatom.interfaces.physnet_interface.physnet).

mlatom.models.sgdml(**kwargs)[source]

Returns an sGDML model object (see mlatom.interfaces.sgdml_interface.sgdml).

mlatom.models.mace(**kwargs)[source]

Returns a MACE model object (see mlatom.interfaces.mace_interface.mace).

model_tree_node

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen,  !
!                     Peikun Zheng                                          !
!---------------------------------------------------------------------------!
class mlatom.models.model_tree_node(name=None, parent=None, children=None, operator=None, model=None)[源代码]

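Create a model tree node.

Parameters:
  • name (str) – the name assigned to the model tree node.

  • parent – the parent of the model tree node.

  • children – the children of the model tree node.

  • operator – the operation to perform when making predictions.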

predict(**kwargs)[源代码]

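Make predictions for molecular geometries with the model.

Parameters:
  • molecular_database (mlatom.data.molecular_database, optional) – a database containing the molecules whose properties are to be predicted by the model.

  • molecule (mlatom.data.molecule, optional) – a molecule object whose properties are to be predicted by the model.

  • calculate_energy (bool, optional) – use the model to calculate the energy.

  • calculate_energy_gradients (bool, optional) – use the model to calculate the energy gradients.

  • calculate_hessian (bool, optional) – use the model to calculate the energy Hessian.

dump(filename=None, format='json')[source]

Dump the model tree node to a file.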
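
To make the role of parent, children, and operator more concrete, here is a toy model tree written from scratch. It is not MLatom's implementation: the class ToyNode, the operator names ('predict', 'sum', 'average'), and the callable "models" are all assumptions chosen for illustration.

```python
class ToyNode:
    """Toy stand-in for a model tree node; NOT MLatom's implementation."""

    def __init__(self, name, operator, model=None, children=None):
        self.name = name
        self.operator = operator        # assumed names: 'predict', 'sum', 'average'
        self.model = model              # callable geometry -> energy (leaf nodes only)
        self.children = children or []

    def predict(self, geometry):
        if self.operator == 'predict':
            return self.model(geometry)                 # leaf: ask the model directly
        values = [child.predict(geometry) for child in self.children]
        if self.operator == 'sum':
            return sum(values)                          # e.g. baseline + correction
        if self.operator == 'average':
            return sum(values) / len(values)            # e.g. ensemble of models
        raise ValueError(f"unknown operator: {self.operator}")

# Delta-learning style tree: a baseline energy plus a learned correction
baseline = ToyNode('baseline', 'predict', model=lambda geometry: 10.0)
correction = ToyNode('correction', 'predict', model=lambda geometry: -0.5)
total = ToyNode('total', 'sum', children=[baseline, correction])
energy = total.predict(geometry=None)
```

The same recursion covers ensembles: replacing 'sum' with 'average' at the root returns the mean of the children's predictions.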

Interfaces

Interfaces to third-party software.

TorchANI

DeepMD-kit

GAP/QUIP

PhysNet

MACE

sGDML

Gaussian

Orca

DFT-D4

PySCF

Sparrow

xTB

MNDO

Simulations

!---------------------------------------------------------------------------!
! simulations: Module for simulations                                       !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

Geomopt, freq, DMC

class mlatom.simulations.optimize_geometry(model=None, model_predict_kwargs={}, initial_molecule=None, molecule=None, ts=False, program=None, optimization_algorithm=None, maximum_number_of_steps=None, convergence_criterion_for_forces=None, working_directory=None, print_properties=None, dump_trajectory_interval=None, filename=None, format='json', **kwargs)[源代码]

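Geometry optimization.

Parameters:
  • model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.

  • initial_molecule (mlatom.data.molecule) – the molecule object to be optimized.

  • ts (bool, optional) – whether to perform a transition state search. Currently supported only with the Gaussian, ASE, or geometric programs.

  • program (str, optional) – the program used for geometry optimization. Currently supports Gaussian, ASE, scipy, and PySCF.

  • optimization_algorithm (str, optional) – the optimization algorithm used in ASE. Default: LBFGS (ts=False), dimer (ts=True).

  • maximum_number_of_steps (int, optional) – maximum number of steps for ASE, SciPy, and geometric. Default: 200.

  • convergence_criterion_for_forces (float, optional) – convergence criterion for forces in ASE. Default: 0.02 eV/Angstrom.

  • working_directory (str, optional) – working directory. Default: '.', i.e., the current directory.

  • constraints (dict, optional) – constraints for geometry optimization. Currently available only with program=ASE and program=geometric. For program=ASE, constraints follow the same conventions as in ASE: constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]} (see the FixInternals class in ASE for more information). For program=geometric, provide the name of the constraints file; for its format, see the geomeTRIC documentation on constrained optimization.

  • print_properties (None or str, optional) – properties to print. Default: None. The available option is 'all'.

  • dump_trajectory_interval (int, optional) – dump the trajectory at every time step (1). Set to None to disable dumping (default).

  • filename (str, optional) – the file in which the dumped trajectory is saved.

  • format (str, optional) – the format in which the dumped trajectory is saved.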
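
The nested layout of the constraints dictionary for program=ASE is easy to get wrong, so a small builder can help. The sketch below assembles the dictionary in the shape described above and checks the number of atom indices per entry. build_constraints is a hypothetical helper and not part of MLatom; the target values are placeholders whose units must match what ASE expects.

```python
def build_constraints(bonds=(), angles=(), dihedrals=()):
    """Assemble a constraints dict; every entry is given as (target, atom_indices)."""
    expected = {'bonds': 2, 'angles': 3, 'dihedrals': 4}
    given = {'bonds': bonds, 'angles': angles, 'dihedrals': dihedrals}
    constraints = {}
    for kind, entries in given.items():
        items = []
        for target, indices in entries:
            if len(indices) != expected[kind]:
                raise ValueError(
                    f"{kind} need {expected[kind]} atom indices, got {len(indices)}")
            items.append([target, list(indices)])
        if items:                      # leave out empty constraint types
            constraints[kind] = items
    return constraints

# Constrain the 0-1 bond and the 0-1-2 angle (placeholder target values)
constraints = build_constraints(bonds=[(1.5, (0, 1))],
                                angles=[(109.5, (0, 1, 2))])
```

The resulting dictionary can then be passed as the constraints argument of optimize_geometry.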

Example:

import mlatom as ml
# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run geometry optimization
geomopt = ml.simulations.optimize_geometry(model = aiqm1, initial_molecule=mol, program = 'ASE')
# Get the optimized geometry, energy, and gradient
optmol = geomopt.optimized_molecule
geo = optmol.get_xyz_coordinates()
energy = optmol.energy
gradient = optmol.get_energy_gradients()

class mlatom.simulations.freq(model=None, model_predict_kwargs={}, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized', anharmonic=False, anharmonic_kwargs={}, working_directory=None)[source]

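Frequency analysis.

Parameters:
  • model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies, forces, and Hessians.

  • molecule (mlatom.data.molecule) – the molecule object with the necessary information.

  • program (str, optional) – the program used for frequency analysis. Supports pyscf and Gaussian.

  • normal_mode_normalization (str, optional) – normal-mode output scheme. It must be one of the following: mass weighted normalized, mass deweighted unnormalized, or mass deweighted normalized (default).

  • anharmonic (bool) – whether to perform an anharmonic frequency calculation.

  • working_directory (str, optional) – working directory. Default: '.', i.e., the current directory.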

Example:

import mlatom as ml
# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run frequency analysis
ml.simulations.freq(model=aiqm1, molecule=mol, program='ASE')
# Get frequencies
frequencies = mol.frequencies

class mlatom.simulations.thermochemistry(model=None, molecule=None, program=None, ir=False, raman=False, normal_mode_normalization='mass deweighted normalized')[source]

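Thermochemical properties calculation.

Parameters:
  • model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies, forces, and Hessians.

  • molecule (mlatom.data.molecule) – the molecule object with the necessary information.

  • program (str) – the program used for the thermochemical properties calculation. Currently supports Gaussian and ASE.

  • normal_mode_normalization (str, optional) – normal-mode output scheme. It must be one of the following: mass weighted normalized, mass deweighted unnormalized, or mass deweighted normalized (default).

Example:

import mlatom as ml
# Initialize molecule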
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run thermochemical properties calculation
ml.simulations.thermochemistry(model=aiqm1, molecule=mol, program='ASE')
# Get ZPE and heat of formation
ZPE = mol.ZPE
Hof = mol.DeltaHf298

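After the calculation, the molecule object holds the following thermochemical properties:

  • ZPE: zero-point energy

  • DeltaE2U: thermal correction to the energy (Gaussian only)

  • DeltaE2H: thermal correction to the enthalpy (Gaussian only)

  • DeltaE2G: thermal correction to the Gibbs free energy (Gaussian only)

  • U0: internal energy at 0 K

  • H0: enthalpy at 0 K

  • U: internal energy (Gaussian only)

  • H: enthalpy

  • G: Gibbs free energy

  • S: entropy (Gaussian only)

  • atomization_energy_0K

  • ZPE_exclusive_atomization_energy_0K

  • DeltaHf298: heat of formation at 298 K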

class mlatom.simulations.dmc(model: model, initial_molecule: molecule = None, initial_molecular_database: molecular_database = None, energy_scaling_factor: float = 1.0)[源代码]

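Run diffusion Monte Carlo (DMC) simulations with PyVibDMC.

Parameters:
  • model (mlatom.models.model) – the potential energy surface model. Energies should be in Hartree; otherwise set energy_scaling_factor accordingly.

  • initial_molecule (mlatom.data.molecule) – the initial geometry of the walkers. Usually an energy-minimum geometry should be provided. By default, every coordinate is scaled by 1.01 to distort the geometry slightly.

  • energy_scaling_factor (float, optional) – factor by which the model's energy predictions are multiplied.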

run(run_dir: str = 'DMC', weighting: str = 'discrete', number_of_walkers: int = 5000, number_of_timesteps: int = 10000, equilibration_steps: int = 500, dump_trajectory_interval: int = 500, dump_wavefunction_interval: int = 1000, descendant_weighting_steps: int = 300, time_step: float = 0.024188843265857, initialize: bool = False)[源代码]

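Run the DMC simulation.

Parameters:
  • run_dir (str) – the folder in which output files are stored.

  • weighting (str) – 'discrete' or 'continuous'. 'continuous' keeps the ensemble size constant.

  • number_of_walkers (int) – number of geometries used to explore the potential energy surface.

  • number_of_timesteps (int) – number of steps the simulation runs.

  • equilibration_steps (int) – number of equilibration steps.

  • dump_trajectory_interval (int) – interval at which the walkers' trajectories are dumped.

  • dump_wavefunction_interval (int) – interval at which the wavefunction is collected.

  • descendant_weighting_steps (int) – number of time steps of descendant weighting per wavefunction.

  • time_step (float) – length of each time step in femtoseconds (fs). The default, 0.024188843265857 fs, equals one atomic unit of time.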
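
Because the DMC time step is given in femtoseconds while its default corresponds to one atomic unit of time, a pair of conversion helpers can avoid unit slips. This is a minimal sketch; the function names are illustrative and not part of MLatom.

```python
AU_TIME_IN_FS = 0.024188843265857   # one atomic unit of time expressed in fs

def fs_to_au(time_fs):
    """Convert a time from femtoseconds to atomic units."""
    return time_fs / AU_TIME_IN_FS

def au_to_fs(time_au):
    """Convert a time from atomic units to femtoseconds."""
    return time_au * AU_TIME_IN_FS

default_step_au = fs_to_au(0.024188843265857)   # the default time_step in a.u.
total_time_fs = au_to_fs(10000)                 # 10000 default-sized steps, in fs
```

With the default settings (10000 steps of one atomic unit each) the simulated imaginary time is therefore about 242 fs.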

load(filename)[源代码]

Load previous simulation results from an HDF5 file.

get_zpe(start_step=1000) float[源代码]

Returns the calculated zero-point energy in Hartree.

Parameters:
  • start_step (int) – the first step included in the energy average.

mlatom.simulations.numerical_gradients(molecule, model, displacement=1e-05, model_kwargs={}, return_molecular_database=False, nthreads=None)[源代码]

Calculate numerical gradients. Two-point numerical differentiation is used and the required single-point calculations are run in parallel.

Parameters:
  • molecule (mlatom.data.molecule) – the molecule object.

  • model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies (takes molecule as an argument).

  • displacement (float, optional) – displacement of nuclear coordinates in Angstrom (default: 1e-5).

  • model_kwargs (dict, optional) – kwargs to be passed to model (except for molecule).

  • return_molecular_database (bool, optional) – whether to return the mlatom.data.molecular_database with the displaced geometries and energies (default: False).

  • nthreads (int, optional) – number of threads (default: None, using all threads it can find).
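
The two-point scheme mentioned above can be sketched without any MLatom objects. The example below applies central differences to a plain Python energy function that takes a flat list of coordinates. It runs serially and is only an illustration of the formula, not MLatom's parallel implementation; the function names are made up for this sketch.

```python
def numerical_gradient(energy, coordinates, displacement=1e-5):
    """Central-difference gradient of energy(coords) for a flat list of coordinates."""
    gradient = []
    for i in range(len(coordinates)):
        plus = list(coordinates)
        plus[i] += displacement            # geometry displaced forward along i
        minus = list(coordinates)
        minus[i] -= displacement           # geometry displaced backward along i
        gradient.append((energy(plus) - energy(minus)) / (2.0 * displacement))
    return gradient

def harmonic(coords):
    """Test potential: E = sum_i 0.5 * k_i * x_i**2 with k_i = i + 1."""
    return sum(0.5 * (i + 1) * x * x for i, x in enumerate(coords))

grad = numerical_gradient(harmonic, [1.0, -2.0, 0.5])
```

For this quadratic potential the analytical gradient is k_i * x_i, so the result should be close to [1.0, -4.0, 1.5]. Each coordinate costs two single-point calculations, which is why running them in parallel pays off.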

mlatom.simulations.numerical_hessian(molecule, model, displacement=0.000529167, displacement4grads=1e-05, model_kwargs={})[源代码]

Calculate numerical Hessians. Two-point numerical differentiation is used and the required single-point calculations are run in parallel.

Parameters:
  • molecule (mlatom.data.molecule) – the molecule object.

  • model (mlatom.models.model or mlatom.models.methods) – any model or method which provides energies (takes molecule as an argument).

  • displacement (float, optional) – displacement of nuclear coordinates in Angstrom (default: 5.29167e-4).

  • displacement4grads (float, optional) – displacement of nuclear coordinates in Angstrom (default: 1e-5) when calculating gradients.

  • model_kwargs (dict, optional) – kwargs to be passed to model (except for molecule).
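
A Hessian can be obtained by differentiating gradients once more with the same two-point formula. The sketch below does this for an analytical gradient function and symmetrizes the result. Whether MLatom builds its Hessian exactly this way is an assumption; the two displacement parameters in the signature merely suggest a gradient-of-gradients scheme.

```python
def numerical_hessian(gradient, coordinates, displacement=5.29167e-4):
    """Central-difference Hessian built from a gradient function."""
    n = len(coordinates)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus = list(coordinates)
        plus[j] += displacement
        minus = list(coordinates)
        minus[j] -= displacement
        g_plus, g_minus = gradient(plus), gradient(minus)
        for i in range(n):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * displacement)
    # Average off-diagonal pairs so that numerical noise cannot break symmetry
    for i in range(n):
        for j in range(i + 1, n):
            average = 0.5 * (hessian[i][j] + hessian[j][i])
            hessian[i][j] = hessian[j][i] = average
    return hessian

def gradient(coords):
    """Analytical gradient of E = x**2 + x*y + 2*y**2."""
    x, y = coords
    return [2.0 * x + y, x + 4.0 * y]

hess = numerical_hessian(gradient, [0.3, -0.7])
```

The exact Hessian of this test function is [[2, 1], [1, 4]], independent of the geometry.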

Initial conditions

mlatom.initial_conditions.generate_initial_conditions(molecule=None, generation_method=None, number_of_initial_conditions=1, file_with_initial_xyz_coordinates=None, file_with_initial_xyz_velocities=None, eliminate_angular_momentum=True, degrees_of_freedom=None, initial_temperature=None, initial_kinetic_energy=None, use_hessian=False, reaction_coordinate_momentum=True, filter_by_energy_window=False, window_filter_kwargs={}, random_seed=None)[源代码]

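Generate initial conditions.

Parameters:
  • molecule (data.molecule) – the molecule with the necessary information.

  • generation_method (str) – the method used to generate initial conditions; see the table below.

  • number_of_initial_conditions (int) – number of initial conditions to generate; 1 by default.

  • file_with_initial_xyz_coordinates (str) – file with the initial XYZ coordinates; only used with generation_method='user-defined'.

  • file_with_initial_xyz_velocities (str) – file with the initial XYZ velocities; only used with generation_method='user-defined'.

  • eliminate_angular_momentum (bool) – remove the angular momentum from the velocities; applies to generation_method='random' and generation_method='wigner'.

  • degrees_of_freedom (int) – degrees of freedom of the molecule; by default, translational and rotational degrees of freedom are removed. A negative value means that its magnitude is subtracted from 3N, where N is the number of atoms in the molecule.

  • initial_temperature (float) – initial temperature in Kelvin; controls the random initial velocities.

  • initial_kinetic_energy (float) – initial kinetic energy in Hartree; controls the random initial velocities.

  • random_seed (int) – random seed for the numpy random number generator (do not use it unless you want to obtain the same results every time).

  • filter_by_energy_window (bool) – filter by excitation energy window.

  • window_filter_kwargs (dict) – keyword arguments for filtering by energy window; see the table below.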

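Generation method          Description
'user-defined' (default)   Use user-defined initial conditions
'random'                   Generate random velocities
'maxwell-boltzmann'        Randomly generate initial velocities from the Maxwell-Boltzmann distribution
'wigner'                   Use Wigner sampling as implemented in Newton-X

window_filter_kwargs               Description
model                              Model or method that can calculate excitation energies and oscillator strengths
model_predict_kwargs               Keyword arguments for the above model, typically nstates, which specifies how many states to calculate
target_excitation_energy (float)   In eV
window_half_width (float)          In eV
random_seed (int)                  Random seed for the numpy random number generator (do not use it unless you want to obtain the same results every time)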

Returns:

A molecular database (ml.data.molecular_database) containing number_of_initial_conditions initial conditions.

Example:

# Use user-defined initial conditions
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz',
                                              number_of_initial_conditions = 1)
# Generate random velocities
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'random',
                                              initial_temperature = 300,
                                              number_of_initial_conditions = 1)
# Use Wigner sampling
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'wigner',
                                              number_of_initial_conditions = 1)

# Sample with filtering by excitation energy window. Requires the model for calculating excitation energies and oscillator strengths.
model = ml.models.methods(method='AIQM1')
model_predict_kwargs={'nstates':9} # requests calculation of 9 electronic states
window_filter_kwargs={'model':model,
                      'model_predict_kwargs':model_predict_kwargs,
                      'target_excitation_energy':5.7, # eV
                      'window_half_width':0.1} # eV
init_cond_db = ml.generate_initial_conditions(molecule=mol,
                                            generation_method='wigner',
                                            number_of_initial_conditions=5,
                                            initial_temperature=0,
                                            random_seed=0,
                                            use_hessian=False,
                                            filter_by_energy_window=True,
                                            window_filter_kwargs=window_filter_kwargs)

Note

Use ml.models.methods.predict(molecule=mol,calculate_hessian=True) to obtain the Hessian matrix when it is needed (for example, for Wigner sampling).
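
As an illustration of what the 'maxwell-boltzmann' option stands for, the sketch below draws Cartesian velocity components from a normal distribution with standard deviation sqrt(kB*T/m) and checks the resulting kinetic temperature. It uses SI units and the standard library only; it does not reproduce MLatom's internals (units, removal of angular momentum, degrees-of-freedom handling), and the function names are illustrative.

```python
import math
import random

K_B = 1.380649e-23         # Boltzmann constant, J/K
AMU = 1.66053906660e-27    # atomic mass unit, kg

def maxwell_boltzmann_velocities(masses_amu, temperature, rng):
    """Draw one 3D velocity (m/s) per atom from the Maxwell-Boltzmann distribution."""
    velocities = []
    for mass in masses_amu:
        # Standard deviation of each Cartesian velocity component
        sigma = math.sqrt(K_B * temperature / (mass * AMU))
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    return velocities

def kinetic_temperature(masses_amu, velocities):
    """Instantaneous temperature from the kinetic energy, using 3N degrees of freedom."""
    kinetic = sum(0.5 * mass * AMU * sum(v * v for v in velocity)
                  for mass, velocity in zip(masses_amu, velocities))
    return 2.0 * kinetic / (3 * len(masses_amu) * K_B)

rng = random.Random(0)                 # fixed seed only to make the sketch repeatable
masses = [12.011] * 3000               # many carbon atoms, for good statistics
velocities = maxwell_boltzmann_velocities(masses, 300.0, rng)
temperature = kinetic_temperature(masses, velocities)
```

For a large number of atoms the kinetic temperature fluctuates only slightly around the requested 300 K; for a small molecule a single sample can deviate substantially.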

Molecular dynamics

!---------------------------------------------------------------------------!
! md: Module for molecular dynamics                                         !
! Implementations by: Yi-Fan Hou & Pavlo O. Dral                            !
!---------------------------------------------------------------------------!
class mlatom.md.md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=1000, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None)[源代码]

Molecular dynamics

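Parameters:
  • model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.

  • molecule_with_initial_conditions (data.molecule) – the molecule with initial conditions.

  • ensemble (str, optional) – which ensemble to use.

  • thermostat (thermostat.Thermostat) – the thermostat applied to the system.

  • time_step (float) – time step in femtoseconds.

  • maximum_propagation_time (float) – maximum simulation time in femtoseconds.

  • dump_trajectory_interval (int, optional) – interval at which the trajectory is dumped. Set to None to disable dumping.

  • filename (str, optional) – the file in which the dumped trajectory is saved.

  • format (str, optional) – the format in which the dumped trajectory is saved.

  • stop_function (any, optional) – user-defined function that stops the MD simulation before maximum_propagation_time.

  • stop_function_kwargs (Dict, optional) – keyword arguments of stop_function.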

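Ensemble          Description
'NVE' (default)   Microcanonical (NVE) ensemble
'NVT'             Canonical (NVT) ensemble

Thermostat                     Description
ml.md.Andersen_thermostat      Andersen thermostat
ml.md.Nose_Hoover_thermostat   Nose-Hoover thermostat
None (default)                 No thermostat is applied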

For theoretical details, see and cite original paper.

Example:

import mlatom as ml
# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1')
# User-defined initial condition
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz')
init_mol = init_cond_db.molecules[0]
# Initialize thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300,molecule=init_mol,degrees_of_freedom=-6)
# Run dynamics
dyn = ml.md(model=aiqm1,
            molecule_with_initial_conditions = init_mol,
            ensemble='NVT',
            thermostat=nose_hoover,
            time_step=0.5,
            maximum_propagation_time = 10.0)
# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

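Note

The trajectory is stored in ml.md.molecular_trajectory, which is an ml.data.molecular_trajectory object.

Warning

In MLatom, energies are in Hartree and distances are in Angstrom. Make sure the units used by your model are consistent with this.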
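
To show how time_step and maximum_propagation_time translate into a propagation loop, here is a generic velocity Verlet integrator for one particle in one dimension. It is a textbook sketch in arbitrary consistent units; which integrator MLatom uses internally is not stated here, so treat the choice as an assumption.

```python
def velocity_verlet(position, velocity, force, mass, time_step, n_steps):
    """Propagate one particle in 1D; returns a list of (position, velocity) pairs."""
    trajectory = [(position, velocity)]
    current_force = force(position)
    for _ in range(n_steps):
        velocity += 0.5 * time_step * current_force / mass   # first half kick
        position += time_step * velocity                     # drift
        current_force = force(position)                      # force at the new geometry
        velocity += 0.5 * time_step * current_force / mass   # second half kick
        trajectory.append((position, velocity))
    return trajectory

def total_energy(position, velocity, mass=1.0, force_constant=1.0):
    """Kinetic plus potential energy of a harmonic oscillator."""
    return 0.5 * mass * velocity ** 2 + 0.5 * force_constant * position ** 2

# Harmonic oscillator: maximum_propagation_time / time_step gives the step count
time_step = 0.01
n_steps = round(10.0 / time_step)
traj = velocity_verlet(1.0, 0.0, force=lambda x: -x, mass=1.0,
                       time_step=time_step, n_steps=n_steps)
drift = abs(total_energy(*traj[-1]) - total_energy(*traj[0]))
```

In an NVE run the total energy should stay nearly constant, as it does here; a noticeable drift usually points to a time step that is too large or to inconsistent units between model and integrator.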

class Andersen_thermostat(**kwargs)

Andersen thermostat object

Parameters:
  • gamma (float) – collision rate in fs^{-1}, default 0.2

  • temperature (float) – system temperature in Kelvin, default 300
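
The role of gamma can be illustrated with the textbook Andersen scheme: at every step each atom collides with the heat bath with probability gamma * time_step, and a colliding atom receives a fresh Maxwell-Boltzmann velocity. The sketch below uses SI units for velocities and the standard library only; it is an illustration of the idea, not MLatom's implementation, and andersen_collisions is a made-up name.

```python
import math
import random

K_B = 1.380649e-23         # Boltzmann constant, J/K
AMU = 1.66053906660e-27    # atomic mass unit, kg

def andersen_collisions(velocities, masses_amu, temperature, gamma, time_step, rng):
    """Resample each atom's velocity with probability gamma * time_step.

    gamma is in fs^-1 and time_step in fs, so their product is dimensionless.
    """
    probability = gamma * time_step
    new_velocities = []
    for velocity, mass in zip(velocities, masses_amu):
        if rng.random() < probability:
            # Collision with the bath: draw a new thermal velocity (m/s)
            sigma = math.sqrt(K_B * temperature / (mass * AMU))
            new_velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
        else:
            new_velocities.append(list(velocity))   # no collision: keep the velocity
    return new_velocities

rng = random.Random(1)
start = [[0.0, 0.0, 0.0] for _ in range(4)]
masses = [1.008] * 4
untouched = andersen_collisions(start, masses, 300.0, gamma=0.0, time_step=0.5, rng=rng)
resampled = andersen_collisions(start, masses, 300.0, gamma=0.2, time_step=5.0, rng=rng)
```

With gamma=0 nothing changes, while gamma * time_step = 1 resamples every atom; realistic settings lie in between, so that the dynamics is perturbed only occasionally.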

class Nose_Hoover_thermostat(**kwargs)

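Nose-Hoover thermostat object

Parameters:
  • nose_hoover_chain_length (int) – Nose-Hoover chain length; must be positive, default 3

  • multiple_time_step (int) – number of multiple time steps; must be positive, default 3

  • number_of_yoshida_suzuki_steps (int) – number of Yoshida-Suzuki steps; one of 1, 3, 5, 7; default 7

  • nose_hoover_chain_frequency (float) – Nose-Hoover chain frequency in fs^{-1}, default 0.0625; it should be comparable to the frequency you want to equilibrate

  • temperature (float) – system temperature in Kelvin, default 300

  • molecule (data.molecule) – the molecule to be equilibrated

  • degrees_of_freedom – the degrees of freedom of the system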

Surface-hopping dynamics

!---------------------------------------------------------------------------!
! namd: Module for nonadiabatic molecular dynamics                          !
! Implementations by: Lina Zhang & Pavlo O. Dral                            !
!---------------------------------------------------------------------------!
class mlatom.namd.surface_hopping_md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=100, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None, hopping_algorithm='LZBL', nstates=None, initial_state=None, random_seed=<function generate_random_seed>, prevent_back_hop=False, reduce_memory_usage=False, rescale_velocity_direction='along velocities', reduce_kinetic_energy=False)[源代码]

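Surface-hopping molecular dynamics

Parameters:
  • model (mlatom.models.model or mlatom.models.methods) – any model or method that provides energies and forces.

  • model_predict_kwargs (Dict, optional) – keyword arguments passed to the model when making predictions.

  • molecule_with_initial_conditions (data.molecule) – the molecule with initial conditions.

  • molecule (data.molecule) – works the same way as molecule_with_initial_conditions.

  • ensemble (str, optional) – which ensemble to use.

  • thermostat (thermostat.Thermostat) – the thermostat applied to the system.

  • time_step (float) – time step in femtoseconds.

  • maximum_propagation_time (float) – maximum simulation time in femtoseconds.

  • dump_trajectory_interval (int, optional) – interval at which the trajectory is dumped. Set to None to disable dumping.

  • filename (str, optional) – the file in which the dumped trajectory is saved.

  • format (str, optional) – the format in which the dumped trajectory is saved.

  • stop_function (any, optional) – user-defined function that stops the MD simulation before maximum_propagation_time.

  • stop_function_kwargs (Dict, optional) – keyword arguments of stop_function.

  • hopping_algorithm (str, optional) – the surface-hopping algorithm.

  • nstates (int) – number of states.

  • initial_state (int) – the initial state.

  • random_seed (int) – random seed.

  • prevent_back_hop (bool, optional) – whether to prevent back-hopping.

  • rescale_velocity_direction (string, optional) – the direction along which velocities are rescaled.

  • reduce_kinetic_energy (bool, optional) – whether to reduce the kinetic energy.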

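Ensemble          Description
'NVE' (default)   Microcanonical (NVE) ensemble
'NVT'             Canonical (NVT) ensemble

Thermostat                     Description
ml.md.Andersen_thermostat      Andersen thermostat
ml.md.Nose_Hoover_thermostat   Nose-Hoover thermostat
None (default)                 No thermostat is applied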

For theoretical details, see and cite the original paper (to be submitted).

  • Lina Zhang, Sebastian Pios, Mikołaj Martyka, Fuchun Ge, Yi-Fan Hou, Yuxinxin Chen, Joanna Jankowska, Lipeng Chen, Mario Barbatti, Pavlo O. Dral. MLatom software ecosystem for surface hopping dynamics in Python with quantum mechanical and machine learning methods. 2024, to be submitted. Preprint on arXiv: https://arxiv.org/abs/2404.06189.
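
The default hopping_algorithm='LZBL' refers to the Landau-Zener approach in the Belyaev-Lebedev formulation, where the hopping probability is evaluated at a local minimum of the adiabatic energy gap Z from the gap and its second time derivative: P = exp(-pi/(2*hbar) * sqrt(Z^3 / Z'')). The sketch below evaluates this expression from three consecutive gap values with a finite-difference second derivative. It is a stand-alone illustration in atomic units; the function name and the three-point scheme are assumptions, not MLatom's code.

```python
import math

def lzbl_hopping_probability(gap_prev, gap_now, gap_next, time_step, hbar=1.0):
    """Belyaev-Lebedev Landau-Zener probability at a local minimum of the gap.

    Gaps and time step are in atomic units (hbar = 1).
    Returns 0.0 when gap_now is not a strict local minimum.
    """
    if not (gap_now < gap_prev and gap_now < gap_next):
        return 0.0
    # Three-point finite-difference estimate of the second time derivative
    second_derivative = (gap_prev - 2.0 * gap_now + gap_next) / time_step ** 2
    exponent = -math.pi / (2.0 * hbar) * math.sqrt(gap_now ** 3 / second_derivative)
    return math.exp(exponent)

# Gap passes through a minimum of 0.01 Hartree: a hop is likely
p_hop = lzbl_hopping_probability(0.02, 0.01, 0.02, time_step=10.0)
# Gap is still decreasing: no minimum yet, so no hop is attempted
p_none = lzbl_hopping_probability(0.03, 0.02, 0.01, time_step=10.0)
```

In a trajectory, a hop would be accepted when a uniform random number falls below this probability, which is why random_seed affects the outcome of a run.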

Example:

# Propagate multiple LZBL surface-hopping trajectories in parallel
# .. setup dynamics calculations
namd_kwargs = {
            'model': aiqm1,
            'time_step': 0.25,
            'maximum_propagation_time': 5,
            'hopping_algorithm': 'LZBL',
            'nstates': 3,
            'initial_state': 2,
            }

# .. run trajectories in parallel
dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db,
                                    task=ml.namd.surface_hopping_md,
                                    task_kwargs=namd_kwargs,
                                    create_and_keep_temp_directories=True)
trajs = [d.molecular_trajectory for d in dyns]

# Dump the trajectories
for itraj, traj in enumerate(trajs, start=1):
    traj.dump(filename=f"traj{itraj}.h5", format='h5md')

# Analyze the result of trajectories and make the population plot
ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5)
ml.namd.plot_population(trajectories=trajs, time_step=0.25,
                    max_propagation_time=5, nstates=3, filename=f'pop.png',
                    pop_filename='pop.txt')

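Note

The trajectory is stored in ml.md.molecular_trajectory, which is an ml.data.molecular_trajectory object.

Warning

In MLatom, energies are in Hartree and distances are in Angstrom. Make sure the units used by your model are consistent with this.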

Spectra

!---------------------------------------------------------------------------!
! spectra: Module for working with spectra                                  !
! Implementations by: Yi-Fan Hou, Fuchun Ge, Bao-Xin Xue, Pavlo O. Dral     !
!---------------------------------------------------------------------------!
class mlatom.spectra.uvvis(x=None, y=None, wavelengths_nm=None, energies_eV=None, molar_absorbance=None, cross_section=None, meta_data=None)[源代码]

UV/Vis absorption spectrum class

Parameters:
  • x (float, np.ndarray) – range of spectra (e.g., wavelength in nm, recommended, or energies in eV)

  • y (float, np.ndarray) – user-provided intensities (e.g., molar absorbance, recommended, or cross section)

It is better to provide spectrum information explicitly so that the correct conversions to different units are done:

  • wavelengths_nm (float, np.ndarray) – range of wavelengths in nm

  • energies_eV (float, np.ndarray) – range of energies in eV

  • molar_absorbance (float, np.ndarray) – molar absorbance (extinction coefficients) in M^-1 cm^-1

  • cross_section (float, np.ndarray) – cross section in A^2/molecule

Also, the user is encouraged to provide the meta-data:

  • meta_data (str) – meta data such as solvent, references, etc.

Example

uvvis = mlatom.spectra.uvvis(
    wavelengths_nm = np.array(...),
    molar_absorbance = np.array(...),
    meta_data = 'solvent: benzene, reference: DOI...'
    )

# spectral properties can be accessed as:
# uvvis.x is equivalent to what is provided by the user, e.g., wavelengths_nm or energies_eV
# uvvis.y is equivalent to what is provided by the user, e.g., molar_absorbance or cross_section
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section

classmethod spc(molecule=None, band_width=0.3, shift=0.0, refractive_index=1.0)[源代码]

Single-point convolution (SPC) approach for obtaining a UV/vis spectrum by calculating the extinction coefficient (and absorption cross section) from single-point excited-state simulations for a single geometry. Implementation follows http://doi.org/10.1007/s00894-020-04355-y

Parameters:
  • molecule (mlatom.data.molecule) – molecule object with excitation_energies (in Hartree, not eV!) and oscillator_strengths

  • wavelengths_nm (float, np.ndarray) – range of wavelengths in nm (default: np.arange(400, 800))

  • band_width (float) – band width in eV (default: 0.3 eV)

  • shift (float) – shift of excitation energies, eV (default: 0 eV)

  • refractive_index (float) – refractive index (default: 1)
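
Several units meet in these parameters: excitation energies stored in Hartree, band width and shift in eV, and the spectral axis in nm. The helpers below sketch the conversions with CODATA constants; the function names are illustrative and not part of mlatom.spectra.

```python
HARTREE_TO_EV = 27.211386245988   # 1 Hartree in eV
HC_EV_NM = 1239.8419843320        # Planck constant times speed of light, in eV*nm

def hartree_to_ev(energy_hartree):
    """Convert an energy from Hartree to eV."""
    return energy_hartree * HARTREE_TO_EV

def ev_to_nm(energy_ev):
    """Wavelength in nm of a photon with the given energy in eV."""
    return HC_EV_NM / energy_ev

def nm_to_ev(wavelength_nm):
    """Photon energy in eV for the given wavelength in nm."""
    return HC_EV_NM / wavelength_nm

excitation_ev = hartree_to_ev(0.2)          # an excitation energy of 0.2 Hartree
peak_position_nm = ev_to_nm(excitation_ev)  # where its band appears on the nm axis
green_light_ev = nm_to_ev(500.0)
```

Because wavelength is inversely proportional to energy, a constant band width in eV corresponds to a broader band in nm at long wavelengths than at short ones.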

Example

uvvis = mlatom.spectra.uvvis.spc(
    molecule=mol,
    wavelengths_nm=np.arange(100, 200),
    band_width=0.3)

# spectral properties can be accessed as:
# uvvis.x is equivalent to uvvis.wavelengths_nm
# uvvis.y is equivalent to uvvis.molar_absorbance
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section
# quick plot
uvvis.plot(filename='uvvis.png')

classmethod spc_broadening_func(DeltaE, ff, wavelength_range, band_width, refractive_index=1, shift=0.0)[源代码]

Spectrum convolution function

Parameters:
  • band_width (float) – width of band

  • DeltaE (float) – vertical excitation energy, eV

  • ff (float) – oscillator strength

  • wavelength_range (float, np.ndarray) – range of wavelengths

  • refractive_index (float) – refractive index

  • shift (float) – peak shift

Returns:

extinction coefficients in M^-1 cm^-1

Return type:

(float, np.ndarray)

classmethod nea(molecular_database=None, wavelengths_nm=None, broadening_width=0.05)[源代码]

Nuclear ensemble approach (NEA) for obtaining UV/vis spectrum. Implementation follows Theor. Chem. Acc. 2012, 131, 1237.

Parameters:
  • molecular_database (mlatom.data.molecular_database) – molecular_database object with molecules containing excitation_energies (in Hartree, not eV!) and oscillator_strengths

  • wavelengths_nm (float, np.ndarray) – range of wavelengths in nm (default: determined automatically)

  • broadening_width (float) – broadening factor in eV (default: 0.05 eV)

Example

uvvis = mlatom.spectra.uvvis.nea(molecular_database=db,
                                 wavelengths_nm=wavelengths_nm,
                                 broadening_width=0.02)

# spectral properties can be accessed as:
# uvvis.x is equivalent to uvvis.wavelengths_nm
# uvvis.y is equivalent to uvvis.molar_absorbance
# wavelength range (float, np.ndarray) in nm
uvvis.wavelengths_nm
# molar absorbance (extinction coefficients) (float, np.ndarray) in M^-1 cm^-1
uvvis.molar_absorbance
# energies corresponding to the wavelength range (float, np.ndarray), in eV
uvvis.energies_eV
# absorption cross-section (float, np.ndarray) in A^2/molecule
uvvis.cross_section
# quick plot
uvvis.plot(filename='uvvis.png')

Active learning

Initial data sampling

initdata_sampler can be:

  • 'wigner'

  • 'harmonic-quantum-boltzmann'

User-defined ML models

The user has the flexibility to create their own ML model class for AL. Minimum requirements to such a class:

  • it must have the usual train and predict functions.

  • the train function must accept molecular_database parameter.

  • the predict function must accept molecule and/or molecular_database parameters.

A realistic, fully fledged example of a usable ML model class is shown below (it is what we use in the al routine):

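import numpy as np
import mlatom as ml
from mlatom import stats  # assumed import path for the stats helpers used in train()

class my_model():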
    def __init__(self, al_info = {}, model_file=None, device=None, verbose=False):
        import torch
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'

        if model_file is None:
            if 'mlmodel_file' in al_info.keys():
                self.model_file = al_info['mlmodel_file']
            else:
                self.model_file = 'mlmodel'
                al_info['mlmodel_file'] = self.model_file
        else:
            self.model_file = model_file
            al_info['mlmodel_file'] = self.model_file
        if 'main_mlmodel_file' in al_info.keys():
            main_mlmodel_file = al_info['main_mlmodel_file']
        else:
            main_mlmodel_file = f'{self.model_file}.pt'
            al_info['main_mlmodel_file'] = main_mlmodel_file
        if 'aux_mlmodel_file' in al_info.keys():
            aux_mlmodel_file = al_info['aux_mlmodel_file']
        else:
            aux_mlmodel_file = f'aux_{self.model_file}.pt'
            al_info['aux_mlmodel_file'] = aux_mlmodel_file
        self.device = device
        self.verbose = verbose
        self.main_model = ml.models.ani(model_file=main_mlmodel_file,device=device,verbose=verbose)
        self.aux_model = ml.models.ani(model_file=aux_mlmodel_file,device=device,verbose=verbose)

    def train(self, molecular_database=None, al_info={}):
        if 'working_directory' in al_info.keys():
            workdir = al_info['working_directory']
            self.main_model.model_file = f'{workdir}/{self.model_file}.pt'
            self.aux_model.model_file = f'{workdir}/aux_{self.model_file}.pt'

        validation_set_fraction = 0.1
        [subtraindb, valdb] = molecular_database.split(number_of_splits=2, fraction_of_points_in_splits=[1-validation_set_fraction, validation_set_fraction], sampling='random')

        # train the model on energies and gradients
        self.main_model = ml.models.ani(model_file=self.main_model.model_file,device=self.device,verbose=self.verbose)
        self.main_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy',xyz_derivative_property_to_learn='energy_gradients')

        # train the auxiliary model only on energies
        self.aux_model = ml.models.ani(model_file=self.aux_model.model_file,device=self.device,verbose=self.verbose)
        self.aux_model.train(molecular_database=subtraindb,validation_molecular_database=valdb,property_to_learn='energy')

        if 'uq_threshold' not in al_info.keys():
            self.predict(molecular_database=valdb)
            uqs = valdb.get_property('uq')
            al_info['uq_threshold'] = np.median(uqs) + 3*stats.calc_median_absolute_deviation(uqs)
        self.uq_threshold = al_info['uq_threshold']

        # if the models were trained successfully, let's update al info where we can find them
        al_info['main_mlmodel_file'] = self.main_model.model_file
        al_info['aux_mlmodel_file'] = self.aux_model.model_file

    def predict(self, molecule=None, molecular_database=None):

        # predict energies and gradients with the main model
        self.main_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='energy',xyz_derivative_property_to_predict='energy_gradients')

        # predict energies with the auxiliary model
        self.aux_model.predict(molecule=molecule, molecular_database=molecular_database,property_to_predict='aux_energy')

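        # calculate uncertainties
        moldb = molecular_database
        if moldb is None:
            # only a single molecule was passed: wrap it so that the loop below covers it
            moldb = ml.data.molecular_database()
            moldb.molecules.append(molecule)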

        for mol in moldb:
            mol.uq = abs(mol.energy - mol.aux_energy)
            if mol.uq > self.uq_threshold:
                mol.uncertain = True
            else:
                mol.uncertain = False

    # These are useful in some internal al routines, e.g., when we want to make predictions in parallel (if nthreads is not set properly, it may slow down al significantly!)
    @property
    def nthreads(self):
        return self.main_model.nthreads

    @nthreads.setter
    def nthreads(self, value):
        self.main_model.nthreads = value
        self.aux_model.nthreads  = value

ml.al(
    ...
    ml_model = my_model,
    # do not use my_model(...), if you want to pass any arguments, use ml_model_kwargs:
    ml_model_kwargs = {...}, # 'al_info' is unnecessary to include, it will be added automatically. If you supply 'al_info' key, it will overwrite the default one so use if you know what you are doing.
    ...
)

As you can see, it is helpful (but not required) if the __init__ and train functions of the ML model class also accept the al_info parameter which can be used to pass information during active learning from one routine to another.
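
The uncertainty threshold in the train function above is set to the median of the validation-set uncertainties plus three median absolute deviations. The stand-alone sketch below reproduces that rule with the standard library. It uses the plain, unscaled MAD; whether stats.calc_median_absolute_deviation applies an additional scaling constant is not stated here, so treat that detail as an assumption.

```python
from statistics import median

def uq_threshold(uncertainties, n_mad=3.0):
    """Threshold = median + n_mad * (unscaled) median absolute deviation."""
    center = median(uncertainties)
    mad = median(abs(u - center) for u in uncertainties)
    return center + n_mad * mad

# Absolute differences between main and auxiliary model on a validation set
uqs = [1.0, 2.0, 3.0, 4.0, 100.0]
threshold = uq_threshold(uqs)
uncertain = [u > threshold for u in uqs]   # same test as mol.uncertain in predict
```

Because both the median and the MAD are robust statistics, the single outlier (100.0) barely moves the threshold and is the only point flagged as uncertain.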

Sampler

Here is a realistic example of the sampler function used in the physics-informed active learning:

def my_sampler(al_info={}, ml_model=None, initcond_sampler=None, initcond_sampler_kwargs={}, maximum_propagation_time=1000, time_step=0.1, ensemble='NVE', thermostat=None, dump_trajs=False, dump_trajectory_interval=None, stop_function=None, batch_parallelization=True):

    moldb2label = ml.data.molecular_database()

    # generate initial conditions
    if isinstance(initcond_sampler, str):
        if initcond_sampler.casefold() in ['wigner', 'harmonic-quantum-boltzmann']:
            # store the method name before replacing the string with the sampler function
            initcond_sampler_kwargs['generation_method'] = initcond_sampler
            initcond_sampler = ml.generate_initial_conditions
    import inspect
    # inspect.getargspec was removed in Python 3.11; getfullargspec is its replacement
    args = inspect.getfullargspec(initcond_sampler).args
    # Do we need al_info below?
    if 'al_info' in args:
        initial_molecular_database = initcond_sampler(al_info=al_info, **initcond_sampler_kwargs)
    else:
        initial_molecular_database = initcond_sampler(**initcond_sampler_kwargs)

    # run MD in parallel to collect uncertain points
    if batch_parallelization: # Faster way to propagate many trajs with ML
        dyn = ml.md_parallel(model=ml_model,
                             molecular_database=initial_molecular_database,
                             ensemble=ensemble,
                             thermostat=thermostat,
                             time_step=time_step,
                             maximum_propagation_time=maximum_propagation_time,
                             dump_trajectory_interval=dump_trajectory_interval,
                             stop_function=stop_function)
        trajs = dyn.molecular_trajectory
        for itraj in range(len(trajs.steps[0])):
            print(f"Trajectory {itraj} number of steps: {trajs.traj_len[itraj]}")
            if trajs.steps[trajs.traj_len[itraj]][itraj].uncertain:
                print(f'Adding molecule from trajectory {itraj} at time {trajs.traj_len[itraj]*time_step} fs')
                moldb2label.molecules.append(trajs.steps[trajs.traj_len[itraj]][itraj])

            # Dump traj
            if dump_trajs:
                import os
                traj = ml.data.molecular_trajectory()
                for istep in range(trajs.traj_len[itraj]+1):

                    step = ml.data.molecular_trajectory_step()
                    step.step = istep
                    step.time = istep * time_step
                    step.molecule = trajs.steps[istep][itraj]
                    traj.steps.append(step)
                if 'working_directory' in al_info:
                    # double quotes outside: reusing the same quote inside an f-string fails before Python 3.12
                    dirname = f"{al_info['working_directory']}/trajs"
                else:
                    dirname = 'trajs'
                os.makedirs(dirname, exist_ok=True)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    else:
        md_kwargs = {
                    'molecular_database': initial_molecular_database,
                    'model': ml_model,
                    'time_step': time_step,
                    'maximum_propagation_time': maximum_propagation_time,
                    'ensemble': ensemble,
                    'thermostat': thermostat,
                    'dump_trajectory_interval': dump_trajectory_interval,
                    'stop_function': stop_function
                    }
        dyns = ml.simulations.run_in_parallel(molecular_database=initial_molecular_database,
                                            task=ml.md,
                                            task_kwargs=md_kwargs,
                                            create_and_keep_temp_directories=False)
        trajs = [d.molecular_trajectory for d in dyns]
        # trajectories are numbered from 1 in this branch
        for itraj, traj in enumerate(trajs, start=1):
            print(f"Trajectory {itraj} number of steps: {len(traj.steps)}")
            if traj.steps[-1].molecule.uncertain:
                print('Adding molecule from trajectory %d at time %.2f fs' % (itraj, traj.steps[-1].time))
                moldb2label.molecules.append(traj.steps[-1].molecule)

            # Dump traj
            if dump_trajs:
                import os
                if 'working_directory' in al_info:
                    # double quotes outside: reusing the same quote inside an f-string fails before Python 3.12
                    dirname = f"{al_info['working_directory']}/trajs"
                else:
                    dirname = 'trajs'
                os.makedirs(dirname, exist_ok=True)
                traj.dump(f"{dirname}/traj{itraj}.h5",format='h5md')
    # record how each molecule was sampled
    for mol in moldb2label:
        mol.sampling = 'md'
    return moldb2label

ml.al(
    ...
    sampler=my_sampler,
    sampler_kwargs={'time_step': 0.5},
    ...
)
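
The sampler above forwards al_info to initcond_sampler only when that function declares the parameter. This dispatch can be isolated as in the sketch below, which uses inspect.signature from the standard library. The helper call_with_optional_al_info and the two toy samplers are hypothetical names for illustration, not part of MLatom.

```python
import inspect


def call_with_optional_al_info(func, al_info, **kwargs):
    """Call func, passing al_info only if func declares that parameter."""
    if 'al_info' in inspect.signature(func).parameters:
        return func(al_info=al_info, **kwargs)
    return func(**kwargs)


def sampler_with_info(al_info=None, number_of_points=1):
    # toy sampler that reads the shared active learning dictionary
    return ('with', al_info['working_directory'], number_of_points)


def sampler_without_info(number_of_points=1):
    # toy sampler that does not know about al_info
    return ('without', number_of_points)


info = {'working_directory': 'al_run'}
print(call_with_optional_al_info(sampler_with_info, info, number_of_points=5))
print(call_with_optional_al_info(sampler_without_info, info, number_of_points=5))
```

Like the check in my_sampler, this only looks for a parameter named al_info. A function that takes it through **kwargs would not be detected and would be called without it.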