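Overview

The Python API of MLatom (PyAPI for short) consists of three main parts: mlatom.data, mlatom.models, and mlatom.simulations.

mlatom.data is used to create, manipulate, save, and load chemical data stored in mlatom.data.atom, mlatom.data.molecule, and mlatom.data.molecular_database objects.

mlatom.models contains many computational-chemistry models based on quantum mechanics and machine learning, which make predictions for molecules. These models fall into three categories:

methods: models that directly use existing methods and need no training by the user.
ml_model: machine-learning models that the user must train.
model_tree_node: models composed of other model objects (mlatom.models.methods, mlatom.models.ml_model, or mlatom.models.model_tree_node).

mlatom.simulations uses mlatom.models to run molecular simulations such as geometry optimization or dynamics.

Here we provide a simple example illustrating the use of these components for reference.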

Data

!---------------------------------------------------------------------------!
! data: Module for working with data                                        !
! Implementations by: Pavlo O. Dral, Fuchun Ge,                             !
!                     Shuang Zhang, Yi-Fan Hou, Yanchi Ou                   !
!---------------------------------------------------------------------------!

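Create an atom object.

Parameters:

nuclear_charge (int, optional) – Provide the nuclear charge to define the atom.
atomic_number (int, optional) – Provide the atomic number to define the atom.
element_symbol (str, optional) – Provide the element symbol to define the atom.
nuclear_mass (float, optional) – Provide the nuclear mass to define the atom.
xyz_coordinates (Array-like, optional) – Position of the atom in Cartesian coordinates.

copy(atomic_labels=None) → atom: Return a copy of the current atom object.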

class mlatom.data.molecule(charge: int = 0, multiplicity: int = 1, atoms: List[atom] = None)

Create a molecule object.

Parameters:

charge (int, optional) – Charge of the molecule.
multiplicity (int, optional) – Spin multiplicity of the molecule.
atoms (List[atom], optional) – Atoms in the molecule.

Example

Access an atom by index:

from mlatom.data import atom, molecule
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
print(id(at), id(mol[0]))
# the two IDs should be identical: mol[0] is the atom object itself

id: 这个分子的唯一ID。

charge: 分子的电荷。

multiplicity: 分子的多重度。

read_from_xyz_file(filename: str, format: str | None = None) → molecule[源代码]

从xyz文件加载分子构型。

如果没有指定参数格式 format ，可以读取标准xyz格式的数据。

支持的其他格式有：

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

参数:

filename (str) – 待读取的文件的名称。
format (str, optional) – 文件的格式。

read_from_xyz_string(string: str = None, format: str | None = None) → molecule

Load a molecular geometry from an XYZ string.

If format is not specified, the string is read as standard XYZ.

Other supported formats:

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

Parameters:

string (str) – The input string.
format (str, optional) – Format of the string.

read_from_numpy(coordinates: ndarray, species: ndarray) → molecule

Load a molecular geometry from a numpy array of coordinates and a numpy array of species.

coordinates must have shape (N, 3) and species must have shape (N,), where N is the number of atoms.

read_from_smiles_string(smi_string: str) → molecule

Generate a molecular geometry from the given SMILES string.

The geometry is generated and optimized with Pybel's make3D() method.

classmethod from_xyz_file(filename: str, format: str | None = None) → molecule: Class-method version of molecule.read_from_xyz_file(); returns a molecule object.

classmethod from_xyz_string(string: str = None, format: str | None = None) → molecule: Class-method version of molecule.read_from_xyz_string(); returns a molecule object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) → molecule: Class-method version of molecule.read_from_numpy(); returns a molecule object.

classmethod from_smiles_string(smi_string: str) → molecule: Class-method version of molecule.read_from_smiles_string(); returns a molecule object.

add_atom_from_xyz_string(line: str) → None: Add an atom to the molecule from a line in XYZ format.

add_scalar_property(scalar, property_name: str = 'y') → None

Add a scalar property to the molecule. The property can then be accessed as molecule.<property_name>.

Parameters:

scalar – The scalar to add.
property_name (str, optional) – Name assigned to the scalar property.
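Access as molecule.<property_name> implies that the value is stored as an attribute. A minimal sketch of that behavior, assuming plain `setattr`. `ToyMolecule` is a hypothetical stand-in, not MLatom's molecule class.

```python
class ToyMolecule:
    """Hypothetical stand-in illustrating attribute-style scalar properties."""

    def add_scalar_property(self, scalar, property_name: str = 'y') -> None:
        # Store the value as an attribute so it can be read as mol.<property_name>
        setattr(self, property_name, scalar)


mol = ToyMolecule()
mol.add_scalar_property(-1.1745, property_name='energy')
mol.add_scalar_property(0.5)   # default name 'y'
print(mol.energy, mol.y)       # -1.1745 0.5
```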

add_xyz_derivative_property(derivative, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None

Add an xyz derivative property to the molecule.

Parameters:

derivative – The derivative property to add.
property_name (str, optional) – Name of the associated non-derivative property.
xyz_derivative_property (str, optional) – Name assigned to the derivative property.

add_xyz_vectorial_property(vector, xyz_vectorial_property: str = 'xyz_vector') → None

Add an xyz vectorial property to the molecule.

Parameters:

vector – The vector to add.
xyz_vectorial_property (str, optional) – Name assigned to the vectorial property.

write_file_with_xyz_coordinates(filename: str, format: str | None = None) → None

Write the molecular geometry to a file. If format is not specified, the file is written in standard XYZ format.

Other supported formats:

'COLUMBUS'

'NEWTON-X' or 'NX'

'turbomol'

Parameters:

filename (str) – Name of the file to write.
format (str, optional) – Format of the file.

get_xyz_string() → str: Return the molecular geometry as a string in XYZ format.
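This sketch shows the inverse direction: it formats element symbols and Cartesian coordinates as a standard XYZ block, similar in spirit to get_xyz_string(). `to_xyz_string` is a hypothetical helper, and the fixed-width column layout is an arbitrary choice here. MLatom's own output formatting may differ.

```python
from typing import Sequence, Tuple


def to_xyz_string(symbols: Sequence[str],
                  coords: Sequence[Tuple[float, float, float]],
                  comment: str = '') -> str:
    """Format one molecule as a standard XYZ block (hypothetical helper)."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), comment]   # atom count, then comment line
    for sym, (x, y, z) in zip(symbols, coords):
        lines.append(f"{sym:<2} {x:15.8f} {y:15.8f} {z:15.8f}")
    return "\n".join(lines) + "\n"


print(to_xyz_string(['H', 'H'], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)], 'H2'))
```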

property atomic_numbers: ndarray: 分子中的原子个数。

property element_symbols: ndarray: 分子中原子的元素符号。

property smiles: str: 分子的SMILES表示。

property xyz_coordinates: ndarray: 分子的xyz构型。

property kinetic_energy: float: 根据速度的xyz文件给出动能(A.U.)。
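As a reminder of the underlying formula, the kinetic energy is \(E_\mathrm{kin} = \tfrac{1}{2}\sum_i m_i \lvert\mathbf{v}_i\rvert^2\). The sketch below evaluates it in Hartree, assuming masses in Da and velocities in Å/fs. Those input units are an assumption here, so check them against the velocity units actually stored on the molecule before relying on the conversion factor.

```python
from typing import Sequence, Tuple

# CODATA 2018 constants
DALTON_KG = 1.66053906660e-27      # 1 Da in kg
HARTREE_J = 4.3597447222071e-18    # 1 Hartree in J
# 1 Da*(Å/fs)^2 in Hartree: 1 Å/fs = 1e5 m/s, so (Å/fs)^2 = 1e10 m^2/s^2
DA_A2_FS2_TO_HARTREE = DALTON_KG * 1e10 / HARTREE_J


def kinetic_energy(masses: Sequence[float],
                   velocities: Sequence[Tuple[float, float, float]]) -> float:
    """Kinetic energy in Hartree; assumes masses in Da and velocities in Å/fs."""
    ekin = 0.0
    for m, (vx, vy, vz) in zip(masses, velocities):
        ekin += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return ekin * DA_A2_FS2_TO_HARTREE


print(kinetic_energy([1.008], [(0.01, 0.0, 0.0)]))
```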

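copy(atomic_labels=None, molecular_labels=None): Return a copy of the current molecule object.

dump(filename=None, format='json'): Dump the current molecule object to a file. Currently only the JSON format is supported.

load(filename=None, format='json'): Load a molecule object from a dump file.

property state_energies: ndarray: Energies of the electronic states of the molecule.

property state_gradients: ndarray: Energy gradients of the electronic states of the molecule.

property energy_gaps: ndarray: Energy gaps between different states.

property excitation_energies: ndarray: Excitation energies of the molecule relative to the ground state.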
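To make the relationship between these properties concrete, excitation energies are measured from the ground state, \(\Delta E_i = E_i - E_0\). The sketch below computes them, along with gaps between consecutive states. Treating energy_gaps as consecutive-state differences is an assumption here, because the description above only says "gaps between different states". Both helpers are hypothetical.

```python
from typing import List, Sequence


def excitation_energies(state_energies: Sequence[float]) -> List[float]:
    """E_i - E_0 for every excited state i >= 1 (state 0 is the ground state)."""
    e0 = state_energies[0]
    return [e - e0 for e in state_energies[1:]]


def consecutive_gaps(state_energies: Sequence[float]) -> List[float]:
    """E_{i+1} - E_i for neighboring states (assumed definition of a gap)."""
    return [b - a for a, b in zip(state_energies, state_energies[1:])]


energies = [-1.00, -0.80, -0.50]    # Hartree, states ordered by energy
print(excitation_energies(energies))
print(consecutive_gaps(energies))
```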

class mlatom.data.molecular_database(molecules: List[molecule] = None)

Create a database of molecule objects.

Parameters: molecules (List[molecule]) – List of molecules to include in the molecular database.

Examples

Select a molecule by index:

from mlatom.data import atom, molecule, molecular_database
at = atom(element_symbol = 'C')
mol = molecule(atoms = [at])
molDB = molecular_database([mol])
print(id(mol) == id(molDB[0]))
# the output should be 'True'

Slice the database like a numpy array:

from mlatom.data import molecular_database
molDB = molecular_database.from_xyz_file('devtests/al/h2_fci_db/xyz.dat')
print(len(molDB))           # 451
print(len(molDB[:100:4]))   # 25
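The slice in the example above follows ordinary Python slice rules. Out of 451 molecules, [:100:4] keeps indices 0, 4, ..., 96, which gives 25 entries. Below is a minimal sketch of a container that returns a new database for a slice and the molecule itself for an integer index, which is the behavior the examples suggest. `ToyDatabase` is hypothetical and is not MLatom's implementation.

```python
from typing import List, Union


class ToyDatabase:
    """Hypothetical list-backed container mimicking database-style indexing."""

    def __init__(self, molecules: List[object]):
        self.molecules = list(molecules)

    def __len__(self) -> int:
        return len(self.molecules)

    def __getitem__(self, key: Union[int, slice]):
        if isinstance(key, slice):
            # A slice yields a new container holding the same molecule objects
            return ToyDatabase(self.molecules[key])
        return self.molecules[key]   # an int yields the molecule itself


db = ToyDatabase([object() for _ in range(451)])
print(len(db), len(db[:100:4]))   # 451 25
```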

read_from_xyz_file(filename: str, append: bool = False) → molecular_database

Load molecular geometries from an XYZ file.

Parameters:

filename (str) – Name of the file to read.
append (bool, optional) – If True, append to the current database; otherwise, clear the current database first.

read_from_xyz_string(string: str, append=False) → molecular_database

Load molecular geometries from an XYZ string.

Parameters:

string (str) – The string to read.
append (bool, optional) – If True, append to the current database; otherwise, clear the current database first.

read_from_numpy(coordinates: ndarray, species: ndarray, append: bool = False) → molecular_database

Load multiple molecular geometries from a numpy array of coordinates and a numpy array of species.

coordinates must have shape (M, N, 3) and species must have shape (M, N), where N is the number of atoms and M is the number of molecules.

read_from_smiles_file(smi_file: str, append: bool = False) → molecular_database

Generate molecular geometries from the given SMILES file.

The geometries are generated and optimized with Pybel's make3D() method.

read_from_smiles_string(smi_string: str, append: bool = False) → molecular_database

Generate molecular geometries from the given SMILES string.

The geometries are generated and optimized with Pybel's make3D() method.

classmethod from_xyz_file(filename: str) → molecular_database: Class-method version of molecular_database.read_from_xyz_file(); returns a molecular_database object.

classmethod from_xyz_string(string: str) → molecular_database: Class-method version of molecular_database.read_from_xyz_string(); returns a molecular_database object.

classmethod from_numpy(coordinates: ndarray, species: ndarray) → molecular_database: Class-method version of molecular_database.read_from_numpy(); returns a molecular_database object.

classmethod from_smiles_file(smi_file: str) → molecular_database: Class-method version of molecular_database.read_from_smiles_file(); returns a molecular_database object.

classmethod from_smiles_string(smi_string: str | List) → molecular_database: Class-method version of molecular_database.read_from_smiles_string(); returns a molecular_database object.

add_scalar_properties(scalars, property_name: str = 'y') → None

Add scalar properties to the molecules.

Parameters:

scalars – The scalars to add.
property_name (str, optional) – Name assigned to the scalar property.

add_scalar_properties_from_file(filename: str, property_name: str = 'y') → None

Add scalar properties from a file to the molecules.

Parameters:

filename (str) – Text file containing the properties.
property_name (str, optional) – Name assigned to the scalar property.

add_xyz_derivative_properties(derivatives, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None

Add xyz derivative properties to the molecules.

Parameters:

derivatives – The derivatives to add.
property_name (str, optional) – Name of the associated non-derivative property.
xyz_derivative_property (str, optional) – Name assigned to the derivative property.

add_xyz_derivative_properties_from_file(filename: str, property_name: str = 'y', xyz_derivative_property: str = 'xyz_derivatives') → None

Add xyz derivative properties from a text file to the molecules.

Parameters:

filename (str) – Name of the file containing the derivatives.
property_name (str, optional) – Name of the associated non-derivative property.
xyz_derivative_property (str, optional) – Name assigned to the derivative property.

add_xyz_vectorial_properties(vectors, xyz_vectorial_property: str = 'xyz_vector') → None

Add xyz vectorial properties to the molecules.

Parameters:

vectors – The vectors to add.
xyz_vectorial_property (str, optional) – Name assigned to the vectorial property.

add_xyz_vectorial_properties_from_file(filename: str, xyz_vectorial_property: str = 'xyz_vector') → None

Add xyz vectorial properties from a text file to the molecules.

Parameters:

filename (str) – Name of the file containing the vectorial properties.
xyz_vectorial_property (str, optional) – Name assigned to the vectorial property.

write_file_with_xyz_coordinates(filename: str) → None

Write the molecular geometries to an XYZ file.

Parameters: filename (str) – Name of the file to write.

get_xyz_string() → None: Return the XYZ text of the molecules.

write_file_with_properties(filename, property_to_write='y'): Write the molecular properties to a text file.

property atomic_numbers: ndarray: 2D array of the atomic numbers of each atom in all molecules of the database.

property element_symbols: ndarray: 2D array of the element symbols of each atom in all molecules of the database.

property ids: IDs of the molecules in the database.

property smiles: str: SMILES strings of the molecules in the database.

write_file_with_smiles(filename): Write the SMILES of the molecules in the database to a file.

property nuclear_masses: Nuclear masses of the molecules in the database.

property charges: Charges of the molecules in the database.

property multiplicities: Spin multiplicities of the molecules in the database.

get_properties(property_name='y'): Return the molecular properties with the given property name.

set_properties(**kwargs): Set molecular properties, with the property names given as keywords.

get_xyz_derivative_properties(xyz_derivative_property='xyz_derivatives'): Return the xyz derivative properties by name.

get_xyz_vectorial_properties(property_name): Return the xyz vectorial properties by name.

write_file_with_xyz_derivative_properties(filename, xyz_derivative_property_to_write='xyz_derivatives'): Write the xyz derivative properties to a file.

write_file_energy_gradients(filename): Write the energy gradients to a file.

write_file_with_xyz_vectorial_properties(filename, xyz_vectorial_property_to_write='xyz_vector'): Write the xyz vectorial properties to a file.

write_file_with_hessian(filename, hessian_property_to_write='hessian'): Write the Hessians to a file.

append(obj): Append a molecule or a molecular database.

copy(atomic_labels=None, molecular_labels=None, molecular_database_labels=None): Return a copy of the database.

dump(filename=None, format=None): Dump the molecular database to a file.

classmethod load(filename=None, format=None): Load a molecular database from a file.

property xyz_coordinates: XYZ coordinates of every atom in each molecule.

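class mlatom.data.molecular_trajectory(steps=None)

Class for storing and accessing molecular trajectory data generated by dynamics or geometry optimization.

dump(filename=None, format=None)

Dump the molecular_trajectory to a file.

Available formats:

'h5md' (requires the Python modules h5py and pyh5md)
'json'
'plain_text'

load(filename: str = None, format: str = None): Load a previously dumped molecular_trajectory from a file.

get_xyz_string() → str: Return the XYZ string of the molecules in the trajectory.

class mlatom.data.h5md(filename: str, data: Dict[str, Any] = {}, mode: str = 'w')

Save trajectory data to a file in H5MD format.

Parameters:

filename (str) – Name of the output h5md file.
data (Dict) – Data to store (optional; if provided, the file is closed after the data are stored).
mode (str, optional) – String controlling the file-handling mode (default: 'w' for a new file, 'r+' for an existing file). The options, consistent with pyh5md.File(), are listed in the table below.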

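Mode	Meaning
r	Read-only; the file must already exist
r+	Read/write; the file must already exist
w	Create the file, overwriting it if it exists
w- or x	Create the file; fail if it exists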

Examples:

from mlatom.data import h5md

traj0 = h5md('traj.h5')  # open 'traj.h5' (default mode='w' creates or overwrites it)
traj1 = h5md('/tmp/test.h5', mode='r')  # open an existing file in readonly mode
traj2 = h5md('/tmp/traj2.h5', data={'time': 1.0, 'total_energy': -32.1, 'test': 8848}) # add some data to the file, then close the file

data = {'time': 0.0, 'total_energy': -32.1}  # every record must contain the 'time' key
traj0.write(data) # write data to opened file
traj0(data) # an alternative way to write data

data = traj0.export() # export the data in the opened file
data = traj0() # an alternative way to export data
with h5md('test.h5') as traj: # export with a with statement
    data = traj.export()

traj0.close() # close the file

Notes

Default data paths in the HDF5 file

particles/all:
'box', 'gradients', 'mass', 'nad', 'names', 'position', 'species', 'velocities'

observables:
'angular_momentum', 'generated_random_number', 'kinetic_energy', 'linear_momentum', 'nstatdyn', 'oscillator_strengths', 'populations', 'potential_energy', 'random_seed', 'sh_probabilities', 'total_energy', 'wavefunctions', and other keys.

h5: The HDF5 file object.

write(data: Dict[str, Any]) → None: Write data to the opened H5 file. data should be a dict-like object whose keys() include 'time'.

export() → Dict[str, ndarray]

Export the data in the opened H5 file.

Returns: Dictionary of the trajectory data in the H5 file.

close() → None: Close the opened file.

__call__() → Dict[str, ndarray]

Export the data in the opened H5 file.

Returns: Dictionary of the trajectory data in the H5 file.

Models

!---------------------------------------------------------------------------!
! models: Module with models                                                !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

methods

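class mlatom.models.methods(method: str = None, program: str = None, **kwargs)

Create a model object with the specified method.

Parameters:

method (str) – The method to use. The available methods are listed below.
program (str, optional) – The program to use.
**kwargs – Other method-specific options.

Available methods: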

'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'AM1', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4', 'CCSD(T)*/CBS', 'CNDO/2', 'D4', 'DFTB0', 'DFTB2', 'DFTB3', 'GFN2-xTB', 'MINDO/3', 'MNDO', 'MNDO/H', 'MNDO/d', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM2*', 'ODM3', 'ODM3*', 'OM1', 'OM2', 'OM3', 'PM3', 'PM6', 'RM1', 'SCC-DFTB', 'SCC-DFTB-heats'.

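The methods listed above can be used without specifying a program. The required programs still need to be installed, as described in the installation manual.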

Available programs and their corresponding methods:

Program	Methods
TorchANI	'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'ANI-1ccx', 'ANI-1x', 'ANI-1x-D4', 'ANI-2x', 'ANI-2x-D4'
dftd4	'AIQM1', 'AIQM1@DFT', 'ANI-1x-D4', 'ANI-2x-D4', 'D4'
MNDO or Sparrow	'AIQM1', 'AIQM1@DFT', 'AIQM1@DFT*', 'MNDO', 'MNDO/d', 'ODM2*', 'ODM3*', 'OM2', 'OM3', 'PM3', 'SCC-DFTB', 'SCC-DFTB-heats'
MNDO	'CNDO/2', 'MINDO/3', 'MNDO/H', 'MNDO/dH', 'MNDOC', 'ODM2', 'ODM3', 'OM1', semi-empirical OMx, DFTB, and NDDO-type methods
Sparrow	'DFTB0', 'DFTB2', 'DFTB3', 'PM6', 'RM1', semi-empirical OMx, DFTB, and NDDO-type methods
xTB	'GFN2-xTB', semi-empirical GFNx-TB methods
Orca	'CCSD(T)*/CBS', DFT
Gaussian	ab initio methods, DFT
PySCF	ab initio methods, DFT

property nthreads

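predict(*args, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.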

AIQM1

!---------------------------------------------------------------------------!
! aiqm1: Artificial intelligence quantum-mechanical method 1                !
! Implementations by: Peikung Zheng & Pavlo O. Dral                         !
!---------------------------------------------------------------------------!

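class mlatom.aiqm1.aiqm1(method='AIQM1', qm_program=None, qm_program_kwargs={}, dftd4_kwargs={}, **kwargs)

Artificial intelligence quantum-mechanical method; see the AIQM1 paper.

Parameters:

method (str, optional) – The AIQM1 method to use. Currently AIQM1, AIQM1@DFT*, and AIQM1@DFT are supported. Defaults to AIQM1.
qm_program (str) – QM program used for the ODM2* part of the calculation. Currently MNDO and Sparrow are supported.
qm_program_kwargs (dictionary, optional) – Keywords passed to the QM program.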

import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Run AIQM1 calculation
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
aiqm1.predict(molecule=mol, calculate_energy=True, calculate_energy_gradients=True)
# Get energy, gradient, and prediction uncertainty of AIQM1
energy = mol.energy
gradient = mol.get_energy_gradients()  # energy gradients of all atoms as an array
std = mol.aiqm1_nn.energy_standard_deviation

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, nstates=1, current_state=0, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.

ml_model

class mlatom.models.hyperparameter(value: Any = None, optimization_space: str = 'linear', dtype: Callable | None = None, name: str = '', minval: Any = None, maxval: Any = None, step: Any = None, choices: Iterable[Any] = [], **kwargs)

Class of hyperparameter objects, containing data that can be used in hyperparameter optimization.

Parameters:

value (Any, optional) – The value of the hyperparameter.
optimization_space (str, optional) – Defines the space for the hyperparameter. Currently supports 'linear' and 'log'.
dtype (Callable, optional) – A callable object that enforces the data type of value. One is chosen automatically if set to None.
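One common reading of optimization_space is that the optimizer varies either the value itself ('linear') or its logarithm ('log'). Log space is the usual choice for parameters such as regularization strengths that span orders of magnitude. A sketch of that mapping follows. The helper names and the base-10 logarithm are assumptions, not MLatom's implementation.

```python
import math


def to_search_space(value: float, space: str = 'linear') -> float:
    """Map a hyperparameter value to the coordinate an optimizer would vary."""
    if space == 'linear':
        return value
    if space == 'log':
        return math.log10(value)   # requires value > 0
    raise ValueError(f"unsupported optimization_space: {space!r}")


def from_search_space(x: float, space: str = 'linear') -> float:
    """Inverse of to_search_space."""
    if space == 'linear':
        return x
    if space == 'log':
        return 10.0 ** x
    raise ValueError(f"unsupported optimization_space: {space!r}")


# The midpoint of [1e-6, 1e-2] in log space is 1e-4, not ~5e-3 as in linear space
lo, hi = to_search_space(1e-6, 'log'), to_search_space(1e-2, 'log')
print(from_search_space((lo + hi) / 2, 'log'))
```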

update(new_hyperparameter: hyperparameter) → None

Update the hyperparameter with the data in another instance.

Parameters: new_hyperparameter (mlatom.models.hyperparameter) – The instance whose data are applied to the current instance.

copy()

Return a copy of the current instance.

Returns: A new instance copied from the current one.
Return type: mlatom.models.hyperparameter

class mlatom.models.hyperparameters(dict=None, /, **kwargs)

Class for storing hyperparameters; values are automatically converted to mlatom.models.hyperparameter objects. Inherits from collections.UserDict.

Initialization:

Initialize with a dictionary, keyword arguments, or both.

e.g.:

hyperparameters({'a': 1.0}, b=hyperparameter(value=2, minval=0, maxval=4))
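Because collections.UserDict routes both constructor arguments and item assignment through `__setitem__`, overriding that one method is enough to obtain the automatic conversion described above. A minimal sketch under that assumption, where `ToyHyperparameter` and `ToyHyperparameters` are hypothetical stand-ins for the MLatom classes:

```python
from collections import UserDict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyHyperparameter:
    """Hypothetical stand-in for mlatom.models.hyperparameter."""
    value: Any = None
    minval: Optional[Any] = None
    maxval: Optional[Any] = None


class ToyHyperparameters(UserDict):
    """UserDict that wraps plain values, mirroring the auto-conversion described above."""

    def __setitem__(self, key: str, item: Any) -> None:
        if not isinstance(item, ToyHyperparameter):
            item = ToyHyperparameter(value=item)   # wrap bare values
        super().__setitem__(key, item)


hps = ToyHyperparameters({'a': 1.0}, b=ToyHyperparameter(value=2, minval=0, maxval=4))
print(hps['a'], hps['b'])
```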

copy(keys: Iterable[str] | None = None) → hyperparameters

Return a copy of the current instance.

Parameters: keys (Iterable[str], optional) – If keys are provided, only the hyperparameters selected by keys are copied, instead of all hyperparameters.
Returns: A new instance copied from the current one.
Return type: mlatom.models.hyperparameters

class mlatom.models.kreg(model_file: str | None = None, ml_program: str = 'KREG_API', equilibrium_molecule: molecule | None = None, prior: float = 0, nthreads: int | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {})

Create a KREG model object.

Parameters:

model_file (str, optional) – Name of the file to which the model is dumped or from which it is loaded.
ml_program (str, optional) – The ML program to use. Available options: 'KREG_API', 'MLatomF'.
equilibrium_molecule (mlatom.data.molecule | None) – Equilibrium geometry used to generate the RE descriptor. If set to None, the geometry with the lowest energy/value is chosen.
prior (float | str, optional) – The prior can be 'mean', None (0.0), or any float. Defaults to 0.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model with the provided hyperparameters.
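In the KREG literature, the RE (relative-to-equilibrium) descriptor is defined as the vector of ratios \(r_{ij}^\mathrm{eq} / r_{ij}\) over all atom pairs. Under that definition, the equilibrium geometry maps to a vector of ones. Below is a pure-Python sketch of that definition. `re_descriptor` is a hypothetical helper, and the pair ordering it uses may differ from MLatom's.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def re_descriptor(coords: Sequence[Coord], eq_coords: Sequence[Coord]) -> List[float]:
    """Ratios r_ij(eq) / r_ij for all atom pairs i < j (hypothetical helper)."""
    if len(coords) != len(eq_coords):
        raise ValueError("geometries must have the same number of atoms")
    desc = []
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])          # current interatomic distance
        r_eq = math.dist(eq_coords[i], eq_coords[j])  # equilibrium distance
        desc.append(r_eq / r)
    return desc


eq = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]
stretched = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.48)]
print(re_descriptor(eq, eq), re_descriptor(stretched, eq))
```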

train(molecular_database=None, property_to_learn=None, xyz_derivative_property_to_learn=None, save_model=True, invert_matrix=False, matrix_decomposition=None, prior=None, hyperparameters: Dict[str, Any] | hyperparameters = {})

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – Molecular database used to train the model.
property_to_learn (str, optional) – Label of the property to be learned during training.
xyz_derivative_property_to_learn (str, optional) – Label of the xyz derivative property to be learned.
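For orientation, KREG is kernel ridge regression. In a common textbook form with a constant prior, training solves \((\mathbf{K} + \lambda\mathbf{I})\boldsymbol{\alpha} = \mathbf{y} - \text{prior}\) with a Gaussian kernel, and prediction evaluates \(f(\mathbf{x}) = \text{prior} + \sum_i \alpha_i k(\mathbf{x}, \mathbf{x}_i)\). The pure-Python sketch below implements that textbook form and solves the system by Gaussian elimination rather than with the invert_matrix or matrix_decomposition options in the signature. `sigma` and `lambda_` are hypothetical names, and this is not MLatom's KREG_API code.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def krr_train(xs, ys, sigma=1.0, lambda_=1e-8, prior=0.0):
    k = [[gaussian_kernel(xi, xj, sigma) for xj in xs] for xi in xs]
    for i in range(len(xs)):
        k[i][i] += lambda_                         # regularization on the diagonal
    return solve(k, [y - prior for y in ys])       # learn deviations from the prior


def krr_predict(x, xs, alpha, sigma=1.0, prior=0.0):
    return prior + sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, xs))


xs = [[0.0], [1.0], [2.0]]
ys = [-1.0, -0.5, -0.8]
prior = sum(ys) / len(ys)                          # analogous to prior='mean'
alpha = krr_train(xs, ys, prior=prior)
print([round(krr_predict(x, xs, alpha, prior=prior), 6) for x in xs])
```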

predict(molecular_database=None, molecule=None, calculate_energy=False, calculate_energy_gradients=False, calculate_hessian=False, property_to_predict=None, xyz_derivative_property_to_predict=None, hessian_to_predict=None)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – Label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – Label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – Label under which the predicted Hessian is saved.

mlatom.models.ani(**kwargs): Return an ANI model object (see mlatom.interfaces.torchani_interface.ani).

mlatom.models.dpmd(**kwargs): Return a DPMD model object (see mlatom.interfaces.dpmd_interface.dpmd).

mlatom.models.gap(**kwargs): Return a GAP model object (see mlatom.interfaces.gap_interface.gap).

mlatom.models.physnet(**kwargs): Return a PhysNet model object (see mlatom.interfaces.physnet_interface.physnet).

mlatom.models.sgdml(**kwargs): Return an sGDML model object (see mlatom.interfaces.sgdml_interface.sgdml).

mlatom.models.mace(**kwargs): Return a MACE model object (see mlatom.interfaces.mace_interface.mace).

model_tree_node

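class mlatom.models.model_tree_node(name=None, parent=None, children=None, operator=None, model=None)

Create a model tree node object.

Parameters:

name (str) – Name of the model tree node.
parent – Parent of the model tree node.
children – Children of the model tree node.
operator – Operation to perform when making predictions.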
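A model tree typically combines the predictions of its children. In Δ-learning, for example, a cheap baseline and an ML correction are summed. Below is a minimal sketch of such a node with 'sum' and 'average' operators. `ToyNode`, those operator names, and the use of plain floats as predictions are simplifications for illustration, not MLatom's model_tree_node interface.

```python
from typing import Callable, List, Optional


class ToyNode:
    """Hypothetical tree node: a leaf wraps a model, an inner node combines children."""

    def __init__(self, name: str, children: Optional[List['ToyNode']] = None,
                 operator: Optional[str] = None,
                 model: Optional[Callable[[float], float]] = None):
        self.name = name
        self.parent: Optional['ToyNode'] = None
        self.children = children or []
        self.operator = operator
        self.model = model
        for child in self.children:
            child.parent = self                # link children back to this node

    def predict(self, x: float) -> float:
        if self.model is not None:             # leaf node: evaluate its model
            return self.model(x)
        values = [child.predict(x) for child in self.children]
        if self.operator == 'sum':
            return sum(values)
        if self.operator == 'average':
            return sum(values) / len(values)
        raise ValueError(f"unknown operator: {self.operator!r}")


baseline = ToyNode('baseline', model=lambda x: -1.0 * x)   # e.g., a cheap QM method
correction = ToyNode('correction', model=lambda x: 0.05)   # e.g., an ML correction
delta = ToyNode('delta', children=[baseline, correction], operator='sum')
print(delta.predict(2.0))
```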

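predict(**kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.

dump(filename=None, format='json'): Dump the model tree node to a file.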

Interfaces

Interfaces to third-party software.

TorchANI

!---------------------------------------------------------------------------!
! Interface_TorchANI: Interface between TorchANI and MLatom                 !
! Implementations by: Fuchun Ge and Max Pinheiro Jr                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.torchani_interface.ani(model_file: str = None, device: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, verbose=1)

Create an ANI (ANAKIN-ME: Accurate NeurAl networK engINe for Molecular Energies) model object.

Interface to TorchANI.

Parameters:

model_file (str, optional) – Name of the file to which the model is saved or from which it is loaded.
device (str, optional) – Device on which to run the calculations, e.g., 'cpu' for CPU and 'cuda' for Nvidia GPUs. If not specified, CUDA is tried when a valid CUDA_VISIBLE_DEVICES is set in the environment.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model with the provided hyperparameters.
verbose (int, optional) – 0 for silent, 1 for verbose output.

save(model_file: str = '') → None

Save the model to a file (.pt format).

Parameters: model_file (str, optional) – Name of the file to save the model to. If not provided, a randomly generated string is used as the file name.

load(model_file: str = '', species_order: List[str] | None = None, AEV_parameters: Dict | None = None, self_energies: List[float] | None = None, reset_parameters: bool = False, method: str = '') → None

Load a saved ANI model from a file (.pt format).

Parameters:

model_file (str) – Name of the model file to load.
species_order (List[str], optional) – Species order; provide it manually if the saved model does not contain it.
AEV_parameters (Dict, optional) – AEV parameters; provide them manually if the saved model does not contain them.
self_energies (List[float], optional) – Self energies; provide them manually if the saved model does not contain them.
reset_parameters (bool) – Reset the network parameters of the loaded model.
method (str) – Load an ANI method; see ani.load_ani_model().

load_ani_model(method: str, **hyperparameters) → None

Load an ANI model.

Parameters:

method (str) – Can be 'ANI-1x', 'ANI-1ccx', or 'ANI-2x'.

train(molecular_database: data.molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: data.molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | models.hyperparameters = {}, spliting_ratio: float = 0.8, save_model: bool = True, check_point: str = None, reset_optim_state: bool = False, use_last_model: bool = False, reset_parameters: bool = False, reset_network: bool = False, reset_optimizer: bool = False, save_every_epoch: bool = False, energy_weighting_function: Callable = None, energy_weighting_function_kwargs: dict = {}) → None

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – Molecular database used to train the model.
property_to_learn (str, optional) – Label of the property to be learned during training.
xyz_derivative_property_to_learn (str, optional) – Label of the xyz derivative property to be learned.
validation_molecular_database (mlatom.data.molecular_database | str, optional) – Explicitly defines the database used for validation; alternatively, use 'sample_from_molecular_database' to sample it from the training set.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model with the provided hyperparameters.
spliting_ratio (float, optional) – Fraction of the whole training set used as the sub-training set.
save_model (bool, optional) – Whether to save the model to disk during training. Note that the model may be saved multiple times during training.
reset_optim_state (bool, optional) – Whether to reset the optimizer state.
use_last_model (bool, optional) – Whether to keep self.model as it is at the last epoch. If False, the best model is loaded into memory at the end of training.
reset_parameters (bool, optional) – Whether to reset the model parameters before training.
reset_network (bool, optional) – Whether to rebuild the network before training.
reset_optimizer (bool, optional) – Whether to redefine the optimizer before training.
save_every_epoch (bool, optional) – Whether to save the model at every epoch; only effective when save_model is True.
energy_weighting_function (Callable, optional) – A weighting function \(\mathit{W}(\mathbf{E}_\mathrm{ref})\) that assigns weights to training points based on their reference energies.
energy_weighting_function_kwargs (dict, optional) – Additional arguments of the weighting function, given as a dictionary.
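As an illustration of energy_weighting_function, a Boltzmann-like scheme assigns weight 1 to the lowest-energy training point and down-weights higher-energy points exponentially. The calling convention MLatom expects is not stated here, so the sketch assumes that the function receives the reference energies as a sequence, with the entries of energy_weighting_function_kwargs passed as keyword arguments. `boltzmann_weights` and `temperature_hartree` are hypothetical names.

```python
import math
from typing import List, Sequence


def boltzmann_weights(energies: Sequence[float],
                      temperature_hartree: float = 0.01) -> List[float]:
    """w_i = exp(-(E_i - E_min) / kT), so the lowest-energy point has weight 1."""
    e_min = min(energies)
    return [math.exp(-(e - e_min) / temperature_hartree) for e in energies]


weights = boltzmann_weights([-1.00, -0.99, -0.95], temperature_hartree=0.01)
print(weights)
```

Under the same assumption, it would be passed as energy_weighting_function=boltzmann_weights together with energy_weighting_function_kwargs={'temperature_hartree': 0.01}.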

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 65536) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – Label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – Label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – Label under which the predicted Hessian is saved.
batch_size (int, optional) – Batch size for batched prediction.

NN_initialize(a: float = 1.0) → None

Reset the network parameters with torch.nn.init.kaiming_normal_().

Parameters: a (float) – See torch.nn.init.kaiming_normal_().
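For reference on the a argument, torch.nn.init.kaiming_normal_ with its defaults mode='fan_in' and nonlinearity='leaky_relu' samples from \(\mathcal{N}(0, \sigma^2)\) with \(\sigma = \text{gain}/\sqrt{\text{fan\_in}}\) and \(\text{gain} = \sqrt{2/(1+a^2)}\). With the default a = 1.0, the gain is 1, so \(\sigma = 1/\sqrt{\text{fan\_in}}\). The arithmetic is shown below, assuming that those PyTorch defaults for mode and nonlinearity are in effect. The helper is hypothetical.

```python
import math


def kaiming_normal_std(fan_in: int, a: float = 1.0) -> float:
    """Std used by kaiming_normal_ with mode='fan_in' and nonlinearity='leaky_relu'."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    return gain / math.sqrt(fan_in)


print(kaiming_normal_std(64, a=1.0))   # 0.125, i.e. 1/sqrt(64)
print(kaiming_normal_std(64, a=0.0))   # sqrt(2/64), the ReLU case
```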

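fix_layers(layers_to_fix: List[List[int]] | List[int])

Fix specific network layers so that their parameters are no longer trained.

Parameters: layers_to_fix (List) – Should be either:
- a list of integers: the layers indexed by these integers are fixed;
- a list of lists of integers: each sub-list defines the layers to fix for one species, in the order of self.species_order.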
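The two accepted forms of layers_to_fix can be normalized into a single per-species mapping. In the sketch below, a flat list applies to every species, and a list of lists is matched to species_order by position. `normalize_layers_to_fix` is a hypothetical helper, not part of MLatom.

```python
from typing import Dict, List, Sequence, Union


def normalize_layers_to_fix(layers_to_fix: Union[List[int], List[List[int]]],
                            species_order: Sequence[str]) -> Dict[str, List[int]]:
    """Expand both accepted forms into a per-species mapping (hypothetical helper)."""
    if all(isinstance(x, int) for x in layers_to_fix):
        # Flat list: the same layers are fixed for every species
        return {sp: list(layers_to_fix) for sp in species_order}
    if len(layers_to_fix) != len(species_order):
        raise ValueError("need one sub-list per species in species_order")
    # Nested list: the i-th sub-list belongs to the i-th species
    return {sp: list(layers) for sp, layers in zip(species_order, layers_to_fix)}


print(normalize_layers_to_fix([0, 1], ['H', 'C']))
print(normalize_layers_to_fix([[0], [0, 2]], ['H', 'C']))
```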

class mlatom.interfaces.torchani_interface.ani_methods(method: str = 'ANI-1ccx', device: str = 'cpu', **kwargs)

Create a model object with an ANI method.

Parameters:

method (str) – A string that specifies the method. Available choices: 'ANI-1x', 'ANI-1ccx', or 'ANI-2x'.
device (str, optional) – Device on which to run the calculations, e.g., 'cpu' for CPU and 'cuda' for Nvidia GPUs. If not specified, CUDA is tried when a valid CUDA_VISIBLE_DEVICES is set in the environment.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, batch_size: int = 65536) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
batch_size (int, optional) – Batch size for batched prediction.

DeePMD-kit

!---------------------------------------------------------------------------!
! Interface_DeePMDkit: Interface between DeePMD-kit and MLatom              !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.dpmd_interface.dpmd(model_file=None, hyperparameters={}, verbose=1)

Create a DeepPot-SE model object.

Interface to DeePMD-kit.

Parameters:

model_file (str, optional) – Name of the file to which the model is saved or from which it is loaded.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model with the provided hyperparameters.
verbose (int, optional) – 0 for silent, 1 for verbose output.

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, stdout=None, stderr=None, json_input=None, dirname=None)

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – Molecular database used to train the model.
property_to_learn (str, optional) – Label of the property to be learned during training.
xyz_derivative_property_to_learn (str, optional) – Label of the xyz derivative property to be learned.
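When validation_molecular_database is left at 'sample_from_molecular_database', part of the training set is held out for validation. spliting_ratio (0.8 by default in this signature) is the fraction kept as the sub-training set. A sketch of such a random split by index follows. The shuffling and rounding choices are assumptions and may not match what the interface does internally.

```python
import random
from typing import List, Optional, Tuple


def split_indices(n: int, spliting_ratio: float = 0.8,
                  seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Randomly split range(n) into sub-training and validation index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # reproducible when a seed is given
    n_sub = int(round(n * spliting_ratio))
    return indices[:n_sub], indices[n_sub:]


sub_train, valid = split_indices(451, 0.8, seed=0)
print(len(sub_train), len(valid))   # 361 90
```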

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – Label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – Label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – Label under which the predicted Hessian is saved.

GAP/QUIP

!---------------------------------------------------------------------------!
! Interface_GAP: Interface between GAP and MLatom                           !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.gap_interface.gap(model_file=None, hyperparameters={}, verbose=1)

Create a GAP-SOAP model object.

Interface to QUIP.

Parameters:

model_file (str, optional) – Name of the file to which the model is saved or from which it is loaded.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model with the provided hyperparameters.
verbose (int, optional) – 0 for silent, 1 for verbose output.

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, stdout=None, stderr=None)

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – Molecular database used to train the model.
property_to_learn (str, optional) – Label of the property to be learned during training.
xyz_derivative_property_to_learn (str, optional) – Label of the xyz derivative property to be learned.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – Database of molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – Molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – Label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – Label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – Label under which the predicted Hessian is saved.

PhysNet

!---------------------------------------------------------------------------!
! Interface_PhysNet: Interface between PhysNet and MLatom                   !
! Implementations by: Fuchun Ge and Max Pinheiro Jr                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.physnet_interface.physnet(model_file=None, hyperparameters={}, verbose=1)[源代码]

创建一个 PhysNet 模型对象。

与 PhysNet 程序的接口。

参数:

model_file (str, optional) – 待保存或加载模型的文件名。
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – 使用提供更新模型的超参数。
verbose (int, optional) – 0代表silence, 1代表verbosity。

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = 'sample_from_molecular_database', hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, save_model=True, log=True, check_point=False, use_last_model=False, summary_interval=1, validation_interval=1, save_interval=1, record_run_metadata=0) → None

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – The molecular database used to train the model.
property_to_learn (str, optional) – The label of the property to learn during model training.
xyz_derivative_property_to_learn (str, optional) – The label of the xyz derivative property to learn.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – The label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – The label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – The label under which the predicted Hessian is saved.

MACE

!---------------------------------------------------------------------------!
! Interface_MACE: Interface between MACE and MLatom                         !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

mlatom.interfaces.mace_interface.get_dataset_from_molDB(train_db: molecular_database, valid_db: molecular_database, valid_fraction: float = 0.1, config_type_weights: Dict = {'Default': 1.0}, energy_key: str = '', gradients_key: str = '') → Tuple[mace.tools.scripts_utils.SubsetCollection, Dict[int, float] | None]: Load the training and validation datasets from molecular databases.

class mlatom.interfaces.mace_interface.mace(model_file: str = None, device: str = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, verbose=True)

Create a MACE model object.

Interface to the MACE program.

Parameters:

model_file (str, optional) – The name of the file the model is saved to or loaded from.
device (str, optional) – The device on which computations run, e.g., 'cpu' for CPUs and 'cuda' for Nvidia GPUs. If not specified, CUDA is tried when a valid CUDA_VISIBLE_DEVICES is set in the environment.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model's hyperparameters with the provided ones.
verbose (int, optional) – 0 for silence, 1 for verbosity.

train(molecular_database: data.molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: data.molecular_database | str | None = 'sample_from_molecular_database', spliting_ratio: float = 0.8, hyperparameters: Dict[str, Any] | models.hyperparameters = {}) → None

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – The molecular database used to train the model.
property_to_learn (str, optional) – The label of the property to learn during model training.
xyz_derivative_property_to_learn (str, optional) – The label of the xyz derivative property to learn.
validation_molecular_database (mlatom.data.molecular_database | str, optional) – Explicitly defines the database used for validation, or use 'sample_from_molecular_database' to sample it from the training set.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model's hyperparameters with the provided ones.
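With validation_molecular_database='sample_from_molecular_database', part of the training data is held out for validation, and the signature's spliting_ratio=0.8 suggests the fraction kept for training. The sketch below shows such a random split on plain indices; the helper name split_indices is hypothetical, and how MLatom actually samples the validation set is an assumption here.

```python
import random


def split_indices(n_molecules, spliting_ratio=0.8, seed=None):
    """Hypothetical helper: randomly split range(n_molecules) into training
    and validation index lists, keeping `spliting_ratio` for training."""
    indices = list(range(n_molecules))
    random.Random(seed).shuffle(indices)          # reproducible with a seed
    n_train = int(round(n_molecules * spliting_ratio))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_indices(10, spliting_ratio=0.8, seed=0)
print(len(train_idx), len(valid_idx))  # 8 2
```

Sorting the returned indices is only a convenience; the split itself is decided by the shuffle.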

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 8) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – The label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – The label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – The label under which the predicted Hessian is saved.
batch_size (int, optional) – The batch size for batched prediction.
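batch_size controls how many molecules are passed to the network at once. A minimal sketch of the batching idea, using plain integers as stand-ins for molecule objects; iterate_batches is a hypothetical helper, not part of MLatom.

```python
def iterate_batches(items, batch_size=8):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


molecules = list(range(20))  # stand-ins for molecule objects
sizes = [len(batch) for batch in iterate_batches(molecules, batch_size=8)]
print(sizes)  # [8, 8, 4]
```

Larger batches usually use the GPU more efficiently at the cost of memory, which is one plausible reason the sGDML interface defaults to a much larger batch size than MACE.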

class mlatom.interfaces.mace_interface.CheckpointHandler_modified(*args: Any, **kwargs: Any)

sGDML

!---------------------------------------------------------------------------!
! Interface_sGDML: Interface between sGDML and MLatom                       !
! Implementations by: Fuchun Ge                                             !
!---------------------------------------------------------------------------!

class mlatom.interfaces.sgdml_interface.sgdml(model_file=None, hyperparameters={}, verbose=1, max_memory=None, max_processes=None, use_torch=False, lazy_training=False)

Create an sGDML model object.

Interface to sGDML.

Parameters:

model_file (str, optional) – The name of the file the model is saved to or loaded from.
hyperparameters (Dict[str, Any] | mlatom.models.hyperparameters, optional) – Updates the model's hyperparameters with the provided ones.
verbose (int, optional) – 0 for silence, 1 for verbosity.

Hyperparameters:

Please refer to the sGDML manual.

no_E: mlatom.models.hyperparameter(value=False)
gdml: mlatom.models.hyperparameter(value=False)
perms: mlatom.models.hyperparameter(value=None)
sigma: mlatom.models.hyperparameter(value=None)
E_cstr: mlatom.models.hyperparameter(value=False)
cprsn: mlatom.models.hyperparameter(value=False)

train(molecular_database: molecular_database, property_to_learn: str = 'energy', xyz_derivative_property_to_learn: str = None, validation_molecular_database: molecular_database | str | None = None, hyperparameters: Dict[str, Any] | hyperparameters = {}, spliting_ratio=0.8, save_model=True, task_dir=None, overwrite=False, max_memory=None, max_processes=None, use_torch=None, lazy_training=None)

Train the model with the provided molecular database.

Parameters:

molecular_database (mlatom.data.molecular_database) – The molecular database used to train the model.
property_to_learn (str, optional) – The label of the property to learn during model training.
xyz_derivative_property_to_learn (str, optional) – The label of the xyz derivative property to learn.

predict(molecular_database: data.molecular_database = None, molecule: data.molecule = None, calculate_energy: bool = False, calculate_energy_gradients: bool = False, calculate_hessian: bool = False, property_to_predict: str | None = 'estimated_y', xyz_derivative_property_to_predict: str | None = None, hessian_to_predict: str | None = None, batch_size: int = 65536) → None

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
property_to_predict (str, optional) – The label under which the predicted property is saved.
xyz_derivative_property_to_predict (str, optional) – The label under which the predicted xyz derivative property is saved.
hessian_to_predict (str, optional) – The label under which the predicted Hessian is saved.
batch_size (int, optional) – The batch size for batched prediction.

Gaussian

!---------------------------------------------------------------------------!
! gaussian: interface to the Gaussian program                               !
! Implementations by: Pavlo O. Dral, Peikun Zheng, Yi-Fan Hou               !
!---------------------------------------------------------------------------!

class mlatom.interfaces.gaussian_interface.gaussian_methods(method='B3LYP/6-31G*', nthreads=None, save_files_in_current_directory=False, working_directory=None, **kwargs)

Gaussian interface

Parameters:

method (str) – Method to use
nthreads (int) – Equivalent to %proc in the Gaussian input file
save_files_in_current_directory (bool) – Whether to keep the input and output files, default 'False'
working_directory (str) – Path to the directory where the program output files and other temporary files are saved, default 'None'

Note

The method should be given in the same format as in Gaussian, e.g., 'B3LYP/6-31G*'

predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, gaussian_keywords=None)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
gaussian_keywords (some type) – # needs to be documented.

Orca

!---------------------------------------------------------------------------!
! orca: interface to the ORCA program                                       !
! Implementations by: Yuxinxin Chen                                         !
!---------------------------------------------------------------------------!

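class mlatom.interfaces.orca_interface.orca_methods(method='wb97x/6-31G*', **kwargs)

ORCA interface

Parameters:

method (str) – Method to use, in the same form as in ORCA, e.g., 'B3LYP/6-31G*' (case-insensitive). This becomes the first line of the ORCA input; you can also store the whole first line here.
save_files_in_current_directory (bool) – Whether to keep the input and output files, default 'False'
working_directory (str) – Path to the directory where the program output files and other temporary files are saved, default 'None'
nthreads (int) – Equivalent to %pal nprocs in the ORCA input file
nthreads_list (list) – List of thread counts used in the CCSD(T)*/CBS method. The order should be [mp2_tz, mp2_qz, dlpno_normal_dz, dlpno_normal_tz, dlpno_tight_dz]
additional_keywords (list) – Keywords (str) to be added to the ORCA input (first line)
input_file (str) – Name of the ORCA input file. Defaults to '[molecule number]_'
output_keywords (list) – List of keywords to extract from the output file. (Custom keywords are currently supported only for energy calculations and the property.txt file.)

Note

When running CCSD(T)*/CBS calculations, make sure the 'nthreads' used for each method does not exhaust the available memory. We recommend using 'nthreads_list' to set an appropriate 'nthreads' for each component method, in the following order: [MP2/cc-pVTZ, MP2/cc-pVQZ, DLPNO-CCSD(T)-normalPNO/cc-pVDZ, DLPNO-CCSD(T)-normalPNO/cc-pVTZ, DLPNO-CCSD(T)-tightPNO/cc-pVDZ]. If only 'nthreads' is set, all component methods use the same number of threads.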

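predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
**kwargs – # needs to be documented.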

DFT-D4

!---------------------------------------------------------------------------!
! dftd4: interface to the dftd4 program                                     !
! Implementations by: Pavlo O. Dral & Peikun Zheng                          !
!---------------------------------------------------------------------------!

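class mlatom.interfaces.dftd4_interface.dftd4_methods(functional=None, save_files_in_current_directory=True, working_directory=None, **kwargs)

DFT-D4 interface

Parameters:

functional (str) – Functional to use
save_files_in_current_directory (bool) – Whether to keep the input and output files, default 'True'
working_directory (str) – Path to the directory where the program output files and other temporary files are saved, default 'None'

Note

The default DFT-D4 implementation provides shared-memory parallelization for CPUs. It also offers OpenMP parallelization, but this is not yet implemented here. For details, see https://github.com/dftd4/dftd4/issues/20.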

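predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, nstates=1, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.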

PySCF

!---------------------------------------------------------------------------!
! Interface_PySCF: interface to the PySCF program                           !
! Implementations by: Yuxinxin Chen                                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.pyscf_interface.pyscf_methods(method='B3LYP/6-31g', **kwargs)

PySCF interface

Parameters:

method (str) – Method to use
nthreads (int) – Number of OMP threads

Note

Supported methods:

Energy: HF, MP2, DFT, CISD, FCI, CCSD/CCSD(T), TD-DFT/TD-HF

Gradients: HF, MP2, DFT, CISD, CCSD, RCCSD(T), TD-DFT/TD-HF

Hessian: HF, DFT

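predict(molecule=None, molecular_database=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.
**kwargs – # needs to be documented.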

Sparrow

!---------------------------------------------------------------------------!
! sparrow: interface to the Sparrow program                                 !
! Implementations by: Pavlo O. Dral & Peikun Zheng                          !
!---------------------------------------------------------------------------!

class mlatom.interfaces.sparrow_interface.sparrow_methods(method='ODM2*', read_keywords_from_file='', save_files_in_current_directory=False, working_directory=None, **kwargs)

Sparrow interface

Parameters:

method (str) – Method to use
read_keywords_from_file (str) – Keywords used in Sparrow
save_files_in_current_directory (bool) – Whether to keep the input and output files, default 'False'
working_directory (str) – Path to the directory where the program output files and other temporary files are saved, default 'None'

Note

Supported methods:

Energy: DFTB0, DFTB2, DFTB3, MNDO, MNDO/d, AM1, PM3, PM6, RM1, OM2, OM3, ODM2*, ODM3*, AIQM1

Gradients: DFTB0, DFTB2, DFTB3, MNDO, MNDO/d, AM1, PM3, PM6, RM1

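predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, **kwargs)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.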

xTB

!---------------------------------------------------------------------------!
! xtb: interface to the xtb program                                         !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.interfaces.xtb_interface.xtb_methods(method='GFN2-xTB', read_keywords_from_file='', **kwargs)

xTB interface

Parameters:

method (str) – xTB method
read_keywords_from_file (str) – Keywords used in xTB

Note

Only GFN2-xTB is available.

Example:

import mlatom as ml
from mlatom.interfaces.xtb_interface import xtb_methods

# read molecule from xyz file
mol = ml.data.molecule()
mol.read_from_xyz_file('sp.xyz')

# initialize xtb methods
model = xtb_methods(method='GFN2-xTB')

# calculate energy, gradients and hessian
model.predict(molecule=mol,
              calculate_energy_gradients=True,
              calculate_hessian=True)
print(mol.energy)

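predict(molecular_database=None, molecule=None, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.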

MNDO

!---------------------------------------------------------------------------!
! mndo: interface to the MNDO program                                       !
! Implementations by: Pavlo O. Dral, Peikun Zheng, and Lina Zhang           !
!---------------------------------------------------------------------------!

class mlatom.interfaces.mndo_interface.mndo_methods(method='ODM2*', read_keywords_from_file='', save_files_in_current_directory=True, working_directory=None, **kwargs)

MNDO interface

Parameters:

method (str) – Method used in MNDO
read_keywords_from_file (str) – Keywords used in MNDO
save_files_in_current_directory (bool) – Whether to keep the input and output files, default 'True'
working_directory (str) – Path to the directory where the program output files and other temporary files are saved, default 'None'

predict(molecular_database=None, molecule=None, nstates=1, current_state=0, calculate_energy=True, calculate_energy_gradients=False, calculate_hessian=False, calculate_nacv=False, read_density_matrix=False)

Make predictions for molecular geometries with the model.

Parameters:

molecular_database (mlatom.data.molecular_database, optional) – A database containing the molecules whose properties are to be predicted by the model.
molecule (mlatom.data.molecule, optional) – A molecule object whose properties are to be predicted by the model.
calculate_energy (bool, optional) – Use the model to calculate the energy.
calculate_energy_gradients (bool, optional) – Use the model to calculate the energy gradients.
calculate_hessian (bool, optional) – Use the model to calculate the energy Hessian.

Simulations

!---------------------------------------------------------------------------!
! simulations: Module for simulations                                       !
! Implementations by: Pavlo O. Dral                                         !
!---------------------------------------------------------------------------!

class mlatom.simulations.optimize_geometry(model=None, model_predict_kwargs={}, initial_molecule=None, molecule=None, ts=False, program=None, optimization_algorithm=None, maximum_number_of_steps=None, convergence_criterion_for_forces=None, working_directory=None, **kwargs)

Geometry optimization.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method that provides energies and forces.
initial_molecule (mlatom.data.molecule) – The molecule object to be optimized.
ts (bool, optional) – Whether to do a transition state search. Currently this can only be done with program=Gaussian or ASE.
program (str, optional) – The engine used in geometry optimization. Currently supports Gaussian, ASE, and SciPy.
optimization_algorithm (str, optional) – The optimization algorithm used in ASE. Default value: LBFGS (ts=False), dimer (ts=True).
maximum_number_of_steps (int, optional) – The maximum number of steps for ASE and SciPy. Default value: 200.
convergence_criterion_for_forces (float, optional) – The force convergence criterion in ASE. Default value: 0.02 eV/Angstrom.
working_directory (str, optional) – Working directory. Default value: '.', i.e., the current directory.
constraints (dict, optional) – Constraints for geometry optimization. Currently only available with program=ASE and follows the same conventions as in ASE: constraints={'bonds':[[target,[index0,index1]], ...],'angles':[[target,[index0,index1,index2]], ...],'dihedrals':[[target,[index0,index1,index2,index3]], ...]} (see the FixInternals class in ASE for more information).

Example:

import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run geometry optimization
geomopt = ml.simulations.optimize_geometry(model = aiqm1, initial_molecule=mol, program = 'ASE')
# Get the optimized geometry, energy, and gradient
optmol = geomopt.optimized_molecule
geo = optmol.get_xyz_coordinates()
energy = optmol.energy
gradient = optmol.get_energy_gradients()
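The default convergence_criterion_for_forces is quoted in eV/Angstrom, while MLatom itself works in Hartree and Angstrom. The sketch below is a toy steepest-descent loop (not ASE's LBFGS) on an assumed harmonic bond, only to illustrate how a force threshold and maximum_number_of_steps end an optimization, and how the threshold converts to Hartree/Angstrom. The function optimize_bond and its parameters k, r0 and step are hypothetical.

```python
HARTREE_IN_EV = 27.211386245988  # 1 Hartree in eV (CODATA 2018)


def optimize_bond(r_start, k=10.0, r0=0.96, step=0.05,
                  fmax=0.02, maximum_number_of_steps=200):
    """Toy steepest descent on E(r) = 0.5*k*(r - r0)**2 (eV, Angstrom).

    Stops when |F| < fmax (eV/Angstrom) or after maximum_number_of_steps.
    Returns (final r, number of steps taken, converged flag).
    """
    r = r_start
    for nstep in range(maximum_number_of_steps):
        force = -k * (r - r0)          # F = -dE/dr
        if abs(force) < fmax:
            return r, nstep, True
        r += step * force              # move downhill along the force
    return r, maximum_number_of_steps, False


r, nsteps, converged = optimize_bond(1.2)
print(r, nsteps, converged)
# The same 0.02 eV/Angstrom threshold expressed in Hartree/Angstrom:
print(0.02 / HARTREE_IN_EV)
```

Real optimizers such as LBFGS use curvature information and converge in far fewer steps on realistic surfaces; the point here is only the stopping logic and the units.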

class mlatom.simulations.freq(model=None, model_predict_kwargs={}, molecule=None, program=None, normal_mode_normalization='mass deweighted normalized', anharmonic=False, anharmonic_kwargs={}, working_directory=None)

Frequency analysis.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method that provides energies, forces, and Hessians.
molecule (mlatom.data.molecule) – A molecule object with the necessary information.
program (str, optional) – The program used for frequency analysis. Supports pyscf or Gaussian.
normal_mode_normalization (str, optional) – Normal mode output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).
anharmonic (bool) – Whether to perform an anharmonic frequency calculation.
working_directory (str, optional) – Working directory. Default value: '.', i.e., the current directory.

Example:

import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run frequency analysis
ml.simulations.freq(model=aiqm1, molecule=mol, program='ASE')
# Get frequencies
frequencies = mol.frequencies
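The normal_mode_normalization options differ in whether modes are reported in mass-weighted or Cartesian ("mass deweighted") coordinates. As a hedged illustration of where those two forms come from, the NumPy sketch below diagonalizes the mass-weighted Hessian of a toy one-dimensional diatomic with an assumed force constant k and masses m1, m2 in arbitrary consistent units; it is not MLatom's implementation.

```python
import numpy as np


def normal_modes(hessian, masses):
    """Diagonalize the mass-weighted Hessian of a 1D toy system.

    Returns eigenvalues (omega**2), mass-weighted modes (columns), and
    mass-deweighted (Cartesian) modes renormalized to unit length.
    """
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    h_mw = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)   # M^-1/2 H M^-1/2
    eigvals, modes_mw = np.linalg.eigh(h_mw)            # ascending eigenvalues
    modes_dw = modes_mw * inv_sqrt_m[:, None]           # Cartesian displacements
    modes_dw = modes_dw / np.linalg.norm(modes_dw, axis=0)
    return eigvals, modes_mw, modes_dw


k, m1, m2 = 0.5, 1.0, 16.0
hessian = k * np.array([[1.0, -1.0], [-1.0, 1.0]])
eigvals, modes_mw, modes_dw = normal_modes(hessian, np.array([m1, m2]))
mu = m1 * m2 / (m1 + m2)
print(eigvals[1], k / mu)  # the vibrational eigenvalue equals k/mu
```

The zero eigenvalue is the translation; in the deweighted vibrational mode the lighter atom moves 16 times as far as the heavier one, keeping the centre of mass fixed.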

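class mlatom.simulations.thermochemistry(model=None, molecule=None, program=None, normal_mode_normalization='mass deweighted normalized')

Calculation of thermochemical properties.

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method that provides energies, forces, and Hessians.
molecule (mlatom.data.molecule) – A molecule object with the necessary information.
program (str) – The program used for calculating thermochemical properties. Currently supports Gaussian and ASE.
normal_mode_normalization (str, optional) – Normal mode output scheme. It should be one of: mass weighted normalized, mass deweighted unnormalized, and mass deweighted normalized (default).

Example:

import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1', qm_program='MNDO')
# Run thermochemical properties calculation
ml.simulations.thermochemistry(model=aiqm1, molecule=mol, program='ASE')
# Get ZPE and heat of formation
ZPE = mol.ZPE
Hof = mol.DeltaHf298

After the calculation, the molecule object holds the following thermochemical properties:

ZPE: Zero-point energy
DeltaE2U: Thermal correction to the energy (only supported with Gaussian)
DeltaE2H: Thermal correction to the enthalpy (only supported with Gaussian)
DeltaE2G: Thermal correction to the Gibbs free energy (only supported with Gaussian)
U0: Internal energy at 0 K
H0: Enthalpy at 0 K
U: Internal energy (only supported with Gaussian)
H: Enthalpy
G: Gibbs free energy
S: Entropy (only supported with Gaussian)
atomization_energy_0K: Atomization energy at 0 K
ZPE_exclusive_atomization_energy_0K: Atomization energy at 0 K excluding the ZPE
DeltaHf298: Heat of formation at 298 K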
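In the harmonic approximation the ZPE is half the sum of the vibrational quanta, ZPE = ½ Σ hcν̃ᵢ. A minimal sketch, assuming harmonic wavenumbers in cm⁻¹ and that imaginary modes are reported as negative numbers (a common convention, assumed here); harmonic_zpe is a hypothetical helper, not an MLatom function.

```python
HARTREE_IN_CM1 = 219474.6313632  # 1 Hartree in cm^-1 (CODATA 2018)


def harmonic_zpe(wavenumbers_cm1):
    """Harmonic zero-point energy in Hartree: ZPE = 1/2 * sum(h*c*nu~)."""
    real_modes = [w for w in wavenumbers_cm1 if w > 0.0]  # skip imaginary modes
    return 0.5 * sum(real_modes) / HARTREE_IN_CM1


print(harmonic_zpe([1000.0, 2000.0]))  # 1500 cm^-1 expressed in Hartree
```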

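class mlatom.simulations.dmc(model: model, initial_molecule: molecule = None, initial_molecular_database: molecular_database = None, energy_scaling_factor: float = 1.0)

Run diffusion Monte Carlo (DMC) simulations with PyVibDMC.

Parameters:

model (mlatom.models.model) – Potential energy surface model. Its energies should be in Hartree; otherwise, set an appropriate energy_scaling_factor.
initial_molecule (mlatom.data.molecule) – Initial geometry of the walkers. Usually, the energy-minimum geometry should be provided. By default, every coordinate is scaled by 1.01 to distort it slightly.
energy_scaling_factor (float, optional) – Factor by which the model's energy predictions are multiplied.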
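For example, if a hypothetical PES model returned energies in kcal/mol, energy_scaling_factor should convert them to Hartree. A small sketch of that multiplication; scaled_energy is illustrative only.

```python
KCAL_PER_MOL_PER_HARTREE = 627.509474  # 1 Hartree in kcal/mol


def scaled_energy(model_energy, energy_scaling_factor=1.0):
    """Multiply a model energy by energy_scaling_factor, as described above."""
    return model_energy * energy_scaling_factor


# Factor for a model that predicts energies in kcal/mol
factor = 1.0 / KCAL_PER_MOL_PER_HARTREE
print(scaled_energy(627.509474, factor))  # about 1 Hartree
```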

run(run_dir: str = 'DMC', weighting: str = 'discrete', number_of_walkers: int = 5000, number_of_timesteps: int = 10000, equilibration_steps: int = 500, dump_trajectory_interval: int = 500, dump_wavefunction_interval: int = 1000, descendant_weighting_steps: int = 300, time_step: float = 0.024188843265857, initialize: bool = False)

Run the DMC simulation.

Parameters:

run_dir (str) – The folder for output files.
weighting (str) – 'discrete' or 'continuous'. 'continuous' keeps the ensemble size constant.
number_of_walkers (int) – The number of geometries exploring the potential energy surface.
number_of_timesteps (int) – The number of steps in the simulation.
equilibration_steps (int) – The number of equilibration steps.
dump_trajectory_interval (int) – The interval for dumping walker trajectories.
dump_wavefunction_interval (int) – The interval for collecting wavefunctions.
descendant_weighting_steps (int) – The number of time steps for descendant weighting of each wavefunction.
time_step (float) – The length of each time step in femtoseconds (fs).

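load(filename): Load the results of a previous simulation from an HDF5 file.

get_zpe(start_step=1000) → float

Return the calculated zero-point energy in Hartree.

Parameters: start_step (int) – The first step from which the energy is averaged.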
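In DMC, the ground-state energy is estimated as the time average of the reference energy after equilibration, which is what start_step controls. A minimal sketch, assuming one reference energy (in Hartree, relative to the potential minimum) is recorded per time step; average_zpe and the synthetic data are hypothetical.

```python
import numpy as np


def average_zpe(reference_energies, start_step=1000):
    """Average the DMC reference energy from start_step onward.

    reference_energies: one energy (Hartree) per time step, index 0 = step 0.
    """
    energies = np.asarray(reference_energies, dtype=float)
    if start_step >= len(energies):
        raise ValueError("start_step must be smaller than the number of steps")
    return float(energies[start_step:].mean())


# Synthetic data: an equilibration transient, then fluctuations around 0.02
steps = np.arange(3000)
energies = 0.02 + 0.05 * np.exp(-steps / 50.0) + 0.001 * np.sin(steps)
print(average_zpe(energies, start_step=1000))
```

Starting the average too early biases the estimate with the equilibration transient, which is why a non-zero start_step is used by default.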

Initial conditions

mlatom.initial_conditions.generate_initial_conditions(molecule=None, generation_method=None, number_of_initial_conditions=1, file_with_initial_xyz_coordinates=None, file_with_initial_xyz_velocities=None, eliminate_angular_momentum=True, degrees_of_freedom=None, initial_temperature=None, initial_kinetic_energy=None, use_hessian=False, reaction_coordinate_momentum=True, filter_by_energy_window=False, window_filter_kwargs={}, random_seed=None)

Generate initial conditions.

Parameters:

molecule (data.molecule) – Molecule with the necessary information
generation_method (str) – Initial condition generation method
number_of_initial_conditions (int) – Number of initial conditions to generate, 1 by default
file_with_initial_xyz_coordinates (str) – File with initial xyz coordinates, only valid for generation_method='user-defined'
file_with_initial_xyz_velocities (str) – File with initial xyz velocities, only valid for generation_method='user-defined'
eliminate_angular_momentum (bool) – Remove angular momentum from velocities, valid for generation_method='random' and generation_method='wigner'
degrees_of_freedom (int) – Degrees of freedom of the molecule; by default, translational and rotational degrees of freedom are removed. It can be a negative value, which means that this value is subtracted from 3*Natoms
initial_temperature (float) – Initial temperature in Kelvin; controls random initial velocities
initial_kinetic_energy (float) – Initial kinetic energy in Hartree; controls random initial velocities
random_seed (int) – Random seed for the NumPy random number generator

generation_method	Description
`'user-defined'` (default)	Use user-defined initial conditions
`'random'`	Generate random velocities
`'maxwell-boltzmann'`	Randomly generate initial velocities from the Maxwell-Boltzmann distribution
`'wigner'`	Use Wigner sampling as implemented in Newton-X

Returns: A molecular database (ml.data.molecular_database) containing number_of_initial_conditions initial conditions
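initial_temperature and initial_kinetic_energy are two views of the same quantity, linked through the degrees of freedom by E_kin = ½·N_dof·k_B·T. The NumPy sketch below draws Maxwell-Boltzmann velocities, removes centre-of-mass motion, and rescales to the exact target temperature. It is an assumption-laden illustration, not MLatom's routine: maxwell_boltzmann_velocities is hypothetical, masses are in electron masses, velocities in atomic units, the default assumes a nonlinear molecule (3N-6), and angular momentum is not removed.

```python
import numpy as np

BOLTZMANN_HARTREE_PER_K = 3.166811563e-6  # k_B in Hartree/K


def maxwell_boltzmann_velocities(masses, temperature, degrees_of_freedom=None, seed=None):
    """Sample velocities (a.u.) and rescale them to exactly `temperature` (K).

    Returns (velocities with shape (natoms, 3), target kinetic energy in Hartree).
    """
    rng = np.random.default_rng(seed)
    masses = np.asarray(masses, dtype=float)
    natoms = len(masses)
    sigma = np.sqrt(BOLTZMANN_HARTREE_PER_K * temperature / masses)
    velocities = rng.normal(size=(natoms, 3)) * sigma[:, None]
    # Remove centre-of-mass motion
    com_velocity = (masses[:, None] * velocities).sum(axis=0) / masses.sum()
    velocities -= com_velocity
    if degrees_of_freedom is None:
        degrees_of_freedom = 3 * natoms - 6          # nonlinear molecule assumed
    elif degrees_of_freedom < 0:
        degrees_of_freedom = 3 * natoms + degrees_of_freedom  # subtract from 3N
    kinetic = 0.5 * (masses[:, None] * velocities**2).sum()
    target = 0.5 * degrees_of_freedom * BOLTZMANN_HARTREE_PER_K * temperature
    return velocities * np.sqrt(target / kinetic), target


masses = np.array([29166.2, 1837.4, 1837.4])  # roughly O, H, H in electron masses
v, target = maxwell_boltzmann_velocities(masses, 300.0, seed=1)
```

Rescaling by a single scalar keeps the centre-of-mass momentum at zero while fixing the kinetic energy exactly.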

Example:

import mlatom as ml

# Load the molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Use user-defined initial conditions
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz',
                                              number_of_initial_conditions = 1)
# Generate random velocities
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'random',
                                              initial_temperature = 300,
                                              number_of_initial_conditions = 1)
# Use Wigner sampling (requires the Hessian, see the note below)
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'wigner',
                                              number_of_initial_conditions = 1)

Note

To obtain the Hessian matrix (e.g., for Wigner sampling), use ml.models.methods.predict(molecule=mol, calculate_hessian=True).
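Wigner sampling draws coordinates and momenta from the quantum ground-state distribution of each normal mode, whose harmonic frequencies come from the Hessian. For one mode in mass-weighted atomic units (ħ = 1), the ground-state Wigner function is proportional to exp(-(p² + ω²q²)/ω), i.e., Gaussian with var(q) = 1/(2ω) and var(p) = ω/2. The sketch below samples one mode only; it is not Newton-X's implementation, which also transforms the samples back to Cartesian coordinates, and wigner_sample_mode is hypothetical.

```python
import numpy as np


def wigner_sample_mode(omega, n_samples, seed=None):
    """Sample (q, p) for one harmonic normal mode in its ground state.

    Mass-weighted coordinates, atomic units (hbar = 1):
    var(q) = 1/(2*omega), var(p) = omega/2.
    """
    rng = np.random.default_rng(seed)
    q = rng.normal(0.0, np.sqrt(1.0 / (2.0 * omega)), n_samples)
    p = rng.normal(0.0, np.sqrt(omega / 2.0), n_samples)
    return q, p


omega = 0.01  # harmonic frequency in atomic units (about 2195 cm^-1)
q, p = wigner_sample_mode(omega, 200000, seed=0)
energy = 0.5 * p**2 + 0.5 * omega**2 * q**2
print(energy.mean(), 0.5 * omega)  # sample mean approaches the ZPE omega/2
```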

Molecular dynamics

!---------------------------------------------------------------------------!
! MD: Module for molecular dynamics                                         !
! Implementations by: Yi-Fan Hou & Pavlo O. Dral                            !
!---------------------------------------------------------------------------!

class mlatom.md.md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=1000, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None)

Molecular dynamics

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method that provides energies and forces.
molecule_with_initial_conditions (data.molecule) – Molecule with initial conditions.
ensemble (str, optional) – Which ensemble to use.
thermostat (thermostat.Thermostat) – The thermostat applied to the system.
time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum simulation time in femtoseconds.
dump_trajectory_interval (int, optional) – Interval for dumping the trajectory. Set to None to disable dumping.
filename (str, optional) – File in which the dumped trajectory is saved.
format (str, optional) – Format in which the dumped trajectory is saved.
stop_function (any, optional) – User-defined function that stops the MD simulation before maximum_propagation_time.
stop_function_kwargs (Dict, optional) – Kwargs of stop_function

ensemble	Description
`'NVE'` (default)	Microcanonical ensemble (NVE)
`'NVT'`	Canonical ensemble (NVT)

thermostat	Description
`ml.md.Andersen_thermostat`	Andersen thermostat
`ml.md.Nose_Hoover_thermostat`	Nose-Hoover thermostat
`None` (default)	No thermostat applied

For theoretical details, see and cite the original paper.

Example:

import mlatom as ml

# Initialize molecule
mol = ml.data.molecule()
mol.read_from_xyz_file(filename='ethanol.xyz')
# Initialize methods
aiqm1 = ml.models.methods(method='AIQM1')
# User-defined initial condition
init_cond_db = ml.generate_initial_conditions(molecule = mol,
                                              generation_method = 'user-defined',
                                              file_with_initial_xyz_coordinates = 'ethanol.xyz',
                                              file_with_initial_xyz_velocities  = 'ethanol.vxyz')
init_mol = init_cond_db.molecules[0]
# Initialize thermostat
nose_hoover = ml.md.Nose_Hoover_thermostat(temperature=300,molecule=init_mol,degrees_of_freedom=-6)
# Run dynamics
dyn = ml.md(model=aiqm1,
            molecule_with_initial_conditions = init_mol,
            ensemble='NVT',
            thermostat=nose_hoover,
            time_step=0.5,
            maximum_propagation_time = 10.0)
# Dump trajectory
traj = dyn.molecular_trajectory
traj.dump(filename='traj', format='plain_text')
traj.dump(filename='traj.h5', format='h5md')

Note

The trajectory is saved in ml.md.molecular_trajectory, which is an ml.data.molecular_trajectory object.

Warning

In MLatom, energies are in Hartree and distances in Angstrom. Make sure the units in your model are consistent.

class Andersen_thermostat(**kwargs)

Andersen thermostat object

Parameters:

gamma (float) – Collision rate in fs^{-1}, default 0.2
temperature (float) – System temperature in Kelvin, default 300
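In the Andersen scheme, each atom collides with a heat bath with probability γ·Δt per step, and a colliding atom receives a fresh velocity from the Maxwell-Boltzmann distribution at the target temperature. A NumPy sketch of one such collision step under stated assumptions (velocities and masses in atomic units, gamma in fs⁻¹, time_step in fs, first-order collision probability); andersen_step is hypothetical and is not MLatom's class.

```python
import numpy as np

BOLTZMANN_HARTREE_PER_K = 3.166811563e-6  # k_B in Hartree/K


def andersen_step(velocities, masses, temperature=300.0, gamma=0.2,
                  time_step=0.1, rng=None):
    """Apply one Andersen collision step; returns (new velocities, collision mask)."""
    rng = np.random.default_rng() if rng is None else rng
    velocities = np.array(velocities, dtype=float)
    masses = np.asarray(masses, dtype=float)
    # Each atom collides with probability gamma * time_step
    collide = rng.random(len(masses)) < gamma * time_step
    sigma = np.sqrt(BOLTZMANN_HARTREE_PER_K * temperature / masses[collide])
    velocities[collide] = rng.normal(size=(collide.sum(), 3)) * sigma[:, None]
    return velocities, collide


rng = np.random.default_rng(0)
m = np.full(1000, 1837.0)
v, c = andersen_step(np.zeros((1000, 3)), m, gamma=0.2, time_step=0.5, rng=rng)
print(c.sum())  # roughly 100 of 1000 atoms collide
```

A larger gamma thermalizes faster but perturbs the natural dynamics more, which is the usual trade-off when choosing the collision rate.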

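class Nose_Hoover_thermostat(**kwargs)

Nose-Hoover thermostat object

Parameters:

nose_hoover_chain_length (int) – Nose-Hoover chain length; should be a positive integer, default 3
multiple_time_step (int) – Number of multiple time steps; should be a positive integer, default 3
number_of_yoshida_suzuki_steps (int) – Number of Yoshida-Suzuki steps; can be any of (1, 3, 5, 7), default 7
nose_hoover_chain_frequency (float) – Nose-Hoover chain frequency in fs^{-1}, default 0.0625; should be comparable to the frequency to be equilibrated
temperature (float) – System temperature in Kelvin, default 300
molecule (data.molecule) – The molecule to be equilibrated
degrees_of_freedom – Degrees of freedom of the system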
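The chain frequency typically enters through the thermostat "masses". A common choice (Martyna, Klein and Tuckerman) is Q₁ = N_f·k_B·T/ω² for the first chain element and Qⱼ = k_B·T/ω² for the others. Whether MLatom uses exactly this convention, and whether nose_hoover_chain_frequency is an angular frequency, are assumptions in the sketch below; nose_hoover_chain_masses is hypothetical.

```python
BOLTZMANN_HARTREE_PER_K = 3.166811563e-6   # k_B in Hartree/K
AU_TIME_IN_FS = 0.02418884326585747        # 1 atomic unit of time in fs


def nose_hoover_chain_masses(degrees_of_freedom, temperature=300.0,
                             nose_hoover_chain_frequency=0.0625,
                             nose_hoover_chain_length=3):
    """Thermostat masses Q_j in atomic units (Martyna-Klein-Tuckerman choice assumed)."""
    omega = nose_hoover_chain_frequency * AU_TIME_IN_FS  # fs^-1 -> (a.u. time)^-1
    kT = BOLTZMANN_HARTREE_PER_K * temperature
    q_first = degrees_of_freedom * kT / omega**2          # couples to all N_f DOFs
    return [q_first] + [kT / omega**2] * (nose_hoover_chain_length - 1)


Q = nose_hoover_chain_masses(degrees_of_freedom=3)
```

A higher chain frequency gives lighter thermostat masses and therefore faster, more aggressive temperature control.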

Surface-hopping dynamics

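!---------------------------------------------------------------------------!
! MD: Module for molecular dynamics                                         !
! Implementations by: Lina Zhang & Pavlo O. Dral                            !
!---------------------------------------------------------------------------!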

class mlatom.namd.surface_hopping_md(model=None, model_predict_kwargs={}, molecule_with_initial_conditions=None, molecule=None, ensemble='NVE', thermostat=None, time_step=0.1, maximum_propagation_time=100, dump_trajectory_interval=None, filename=None, format='h5md', stop_function=None, stop_function_kwargs=None, hopping_algorithm='LZBL', nstates=None, initial_state=None, random_seed=<function generate_random_seed>, prevent_back_hop=False, rescale_velocity_direction='along velocities', reduce_kinetic_energy=False)

Surface-hopping molecular dynamics

Parameters:

model (mlatom.models.model or mlatom.models.methods) – Any model or method that provides energies and forces.
model_predict_kwargs (Dict, optional) – Keyword arguments for model prediction
molecule_with_initial_conditions (data.molecule) – Molecule with initial conditions.
molecule (data.molecule) – Works the same as molecule_with_initial_conditions
ensemble (str, optional) – Which ensemble to use.
thermostat (thermostat.Thermostat) – The thermostat applied to the system.
time_step (float) – Time step in femtoseconds.
maximum_propagation_time (float) – Maximum simulation time in femtoseconds.
dump_trajectory_interval (int, optional) – Interval for dumping the trajectory. Set to None to disable dumping.
filename (str, optional) – File in which the dumped trajectory is saved.
format (str, optional) – Format in which the dumped trajectory is saved.
stop_function (any, optional) – User-defined function that stops the MD simulation before maximum_propagation_time.
stop_function_kwargs (Dict, optional) – Kwargs of stop_function
hopping_algorithm (str, optional) – Surface-hopping algorithm
nstates (int) – Number of states
initial_state (int) – Initial state
random_seed (int) – Random seed
prevent_back_hop (bool, optional) – Whether to prevent back hops
rescale_velocity_direction (string, optional) – Direction along which velocities are rescaled
reduce_kinetic_energy (bool, optional) – Whether to reduce the kinetic energy

ensemble	Description
`'NVE'` (default)	Microcanonical ensemble (NVE)
`'NVT'`	Canonical ensemble (NVT)

thermostat	Description
`ml.md.Andersen_thermostat`	Andersen thermostat
`ml.md.Nose_Hoover_thermostat`	Nose-Hoover thermostat
`None` (default)	No thermostat applied

For theoretical details, see and cite the original paper (to be submitted).
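hopping_algorithm='LZBL' refers to Landau-Zener-Belyaev-Lebedev surface hopping, where a hop is attempted when the adiabatic energy gap Z reaches a local minimum, with probability P = exp(-(π/2ħ)·sqrt(Z³/Z̈)). The sketch below evaluates that formula from three consecutive gaps using a finite-difference second derivative; the exact numerical scheme in MLatom is an assumption here, and lzbl_hopping_probability is hypothetical.

```python
import math

AU_TIME_IN_FS = 0.02418884326585747  # 1 atomic unit of time in fs


def lzbl_hopping_probability(gaps, time_step):
    """Landau-Zener-Belyaev-Lebedev hopping probability (sketch).

    gaps: three consecutive adiabatic energy gaps Z (Hartree) at t-dt, t, t+dt.
    time_step: dt in fs. Returns 0.0 unless Z(t) is a local minimum.
    """
    z_prev, z_mid, z_next = gaps
    if not (z_mid < z_prev and z_mid < z_next):
        return 0.0
    dt = time_step / AU_TIME_IN_FS                    # fs -> atomic units
    z_ddot = (z_next - 2.0 * z_mid + z_prev) / dt**2  # finite-difference d2Z/dt2
    return math.exp(-0.5 * math.pi * math.sqrt(z_mid**3 / z_ddot))  # hbar = 1


print(lzbl_hopping_probability([0.02, 0.01, 0.02], time_step=0.25))
```

A smaller gap at the minimum, or a sharper minimum (larger Z̈), makes a hop more likely; the hop is then accepted by comparing P with a random number.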

Example:

import mlatom as ml

# Load the initial geometry of a molecule
mol = ml.data.molecule()
mol.charge=1
mol.read_from_xyz_file('cnh4+.xyz')

# Define models
aiqm1 = ml.models.methods(method='AIQM1',
                qm_program_kwargs={'save_files_in_current_directory': True,
                                    'read_keywords_from_file': f'mndokw'})
method_optfreq = ml.models.methods(method='B3LYP/Def2SVP', program='pyscf')

# Optimize geometry
geomopt = ml.simulations.optimize_geometry(model=method_optfreq,
                                        initial_molecule=mol)
eqmol = geomopt.optimized_molecule
eqmol.write_file_with_xyz_coordinates('eq.xyz')

# Get frequencies
ml.simulations.freq(model=method_optfreq,
                    molecule=eqmol)
eqmol.dump(filename='eqmol.json', format='json')

# Get initial conditions
init_cond_db = ml.generate_initial_conditions(molecule=eqmol,
                                    generation_method='wigner',
                                    number_of_initial_conditions=16,
                                    initial_temperature=0)
init_cond_db.dump('test.json','json')

# Propagate multiple LZBL surface-hopping trajectories in parallel
# .. setup dynamics calculations
namd_kwargs = {
            'model': aiqm1,
            'time_step': 0.25,
            'maximum_propagation_time': 5,
            'hopping_algorithm': 'LZBL',
            'nstates': 3,
            'initial_state': 2,
            }

# .. run trajectories in parallel
dyns = ml.simulations.run_in_parallel(molecular_database=init_cond_db,
                                    task=ml.namd.surface_hopping_md,
                                    task_kwargs=namd_kwargs,
                                    create_and_keep_temp_directories=True)
trajs = [d.molecular_trajectory for d in dyns]

# Dump the trajectories
itraj=0
for traj in trajs:
    itraj+=1
    traj.dump(filename=f"traj{itraj}.h5",format='h5md')

# Analyze the result of trajectories and make the population plot
ml.namd.analyze_trajs(trajectories=trajs, maximum_propagation_time=5)
ml.namd.plot_population(trajectories=trajs, time_step=0.25,
                    max_propagation_time=5, nstates=3, filename=f'pop.png')
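Conceptually, the population of a state at a given time step is the fraction of trajectories currently on that state, which is what a population plot shows. A NumPy sketch of that bookkeeping, assuming the current-state indices (0-based, an assumption) have already been collected into an array; extracting them from molecular_trajectory objects is not shown, and state_populations is hypothetical.

```python
import numpy as np


def state_populations(state_histories, nstates):
    """Fraction of trajectories in each state at each time step.

    state_histories: shape (ntraj, nsteps), current state index per step.
    Returns an array of shape (nsteps, nstates).
    """
    states = np.asarray(state_histories)
    counts = np.stack([(states == s).sum(axis=0) for s in range(nstates)], axis=1)
    return counts / states.shape[0]


histories = [[2, 2, 1],
             [2, 1, 1],
             [2, 2, 0],
             [2, 1, 0]]
pop = state_populations(histories, nstates=3)
print(pop)
```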

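Note

The trajectory is saved in ml.md.molecular_trajectory, which is an ml.data.molecular_trajectory object.

Warning

In MLatom, energies are in Hartree and distances in Angstrom. Make sure the units in your model are consistent.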

Spectra

Active learning

Initial data sampling

initdata_sampler can be:

'wigner'
'harmonic-quantum-boltzmann'
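To illustrate what 'wigner' sampling means, the sketch below draws a coordinate and momentum for a single mass-weighted harmonic normal mode from its Wigner distribution, a Gaussian with variances coth(ω/2kT)/(2ω) and ω·coth(ω/2kT)/2 in atomic units (ħ = 1), which reduce to 1/(2ω) and ω/2 at 0 K. This is a one-mode illustration under those assumptions, not MLatom's generate_initial_conditions (which works on all modes from a frequency calculation); the function name is hypothetical and 'harmonic-quantum-boltzmann' is not covered.

```python
import math
import random
from typing import Optional, Tuple


def sample_harmonic_wigner(omega: float, kT: float = 0.0,
                           rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Draw (Q, P) for one mass-weighted harmonic mode from its Wigner distribution.

    Atomic units (hbar = 1): omega is the angular frequency and kT is the
    thermal energy k_B*T, both in Hartree. kT = 0 gives the ground state.
    """
    rng = rng or random.Random()
    # coth(omega / 2kT) broadens the ground-state Gaussian at finite temperature
    factor = 1.0 / math.tanh(omega / (2.0 * kT)) if kT > 0.0 else 1.0
    sigma_q = math.sqrt(factor / (2.0 * omega))
    sigma_p = math.sqrt(factor * omega / 2.0)
    return rng.gauss(0.0, sigma_q), rng.gauss(0.0, sigma_p)


# A mode with omega = 0.01 Hartree (about 2195 cm^-1) at 0 K
rng = random.Random(42)
samples = [sample_harmonic_wigner(0.01, rng=rng) for _ in range(20000)]
```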

User-defined ML models

Users can create their own ML model class for active learning (AL). The minimum requirements for such a class are:

It must have the usual train and predict methods.
The train method must accept the molecular_database parameter.
The predict method must accept the molecule and/or molecular_database parameters.
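A toy class that meets only these minimum requirements can make the interface concrete. The sketch below uses a hypothetical mean_energy_model that predicts the mean training energy for every molecule; SimpleNamespace objects stand in for ml.data.molecule, and extra arguments that the AL routine might pass are absorbed by **kwargs (a design choice, not a documented requirement).

```python
from statistics import mean
from types import SimpleNamespace


class mean_energy_model:
    """Hypothetical toy model with the minimum AL interface (not useful for real work)."""

    def __init__(self, **kwargs):
        self.mean_energy = None

    def train(self, molecular_database=None, **kwargs):
        # molecular_database: iterable of objects with an `energy` attribute
        self.mean_energy = mean(mol.energy for mol in molecular_database)

    def predict(self, molecule=None, molecular_database=None, **kwargs):
        # accept a single molecule, a database, or both
        molecules = []
        if molecule is not None:
            molecules.append(molecule)
        if molecular_database is not None:
            molecules.extend(molecular_database)
        for mol in molecules:
            mol.energy = self.mean_energy


# SimpleNamespace objects stand in for ml.data.molecule here
train_db = [SimpleNamespace(energy=-1.0), SimpleNamespace(energy=-3.0)]
model = mean_energy_model()
model.train(molecular_database=train_db)
new_mol = SimpleNamespace()
model.predict(molecule=new_mol)
print(new_mol.energy)  # -2.0
```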

Below is a realistic, fully fledged example of a usable ML model class (it is the one we use in the al routine):

import numpy as np
import mlatom as ml
from mlatom import stats

class my_model():
    def __init__(self, al_info=None, model_file=None, device=None, verbose=False):
        import torch
        if al_info is None:
            al_info = {}
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'

        if model_file is None:
            if 'mlmodel_file' in al_info.keys():
                self.model_file = al_info['mlmodel_file']
            else:
                self.model_file = 'mlmodel'
                al_info['mlmodel_file'] = self.model_file
        else:
            self.model_file = model_file
            al_info['mlmodel_file'] = self.model_file
        if 'main_mlmodel_file' in al_info.keys():
            main_mlmodel_file = al_info['main_mlmodel_file']
        else:
            main_mlmodel_file = f'{self.model_file}.pt'
            al_info['main_mlmodel_file'] = main_mlmodel_file
        if 'aux_mlmodel_file' in al_info.keys():
            aux_mlmodel_file = al_info['aux_mlmodel_file']
        else:
            aux_mlmodel_file = f'aux_{self.model_file}.pt'
            al_info['aux_mlmodel_file'] = aux_mlmodel_file
        self.device = device
        self.verbose = verbose
        # the UQ threshold is unknown until the first training (unless a previous AL iteration stored it)
        self.uq_threshold = al_info.get('uq_threshold')
        self.main_model = ml.models.ani(model_file=main_mlmodel_file, device=device, verbose=verbose)
        self.aux_model = ml.models.ani(model_file=aux_mlmodel_file, device=device, verbose=verbose)

    def train(self, molecular_database=None, al_info=None):
        if al_info is None:
            al_info = {}
        if 'working_directory' in al_info.keys():
            workdir = al_info['working_directory']
            self.main_model.model_file = f'{workdir}/{self.model_file}.pt'
            self.aux_model.model_file = f'{workdir}/aux_{self.model_file}.pt'

        # hold out 10% of the data for validation
        validation_set_fraction = 0.1
        [subtraindb, valdb] = molecular_database.split(number_of_splits=2, fraction_of_points_in_splits=[1-validation_set_fraction, validation_set_fraction], sampling='random')

        # train the main model on energies and gradients
        self.main_model = ml.models.ani(model_file=self.main_model.model_file, device=self.device, verbose=self.verbose)
        self.main_model.train(molecular_database=subtraindb, validation_molecular_database=valdb, property_to_learn='energy', xyz_derivative_property_to_learn='energy_gradients')

        # train the auxiliary model only on energies
        self.aux_model = ml.models.ani(model_file=self.aux_model.model_file, device=self.device, verbose=self.verbose)
        self.aux_model.train(molecular_database=subtraindb, validation_molecular_database=valdb, property_to_learn='energy')

        # on the first training, derive the UQ threshold from the validation-set uncertainties
        if 'uq_threshold' not in al_info.keys():
            self.predict(molecular_database=valdb)
            uqs = np.array([mol.uq for mol in valdb])
            al_info['uq_threshold'] = np.median(uqs) + 3*stats.calc_median_absolute_deviation(uqs)
        self.uq_threshold = al_info['uq_threshold']

        # the models were trained successfully: record in al_info where they are saved
        al_info['main_mlmodel_file'] = self.main_model.model_file
        al_info['aux_mlmodel_file'] = self.aux_model.model_file

    def predict(self, molecule=None, molecular_database=None):

        # predict energies and gradients with the main model
        self.main_model.predict(molecule=molecule, molecular_database=molecular_database, property_to_predict='energy', xyz_derivative_property_to_predict='energy_gradients')

        # predict energies with the auxiliary model
        self.aux_model.predict(molecule=molecule, molecular_database=molecular_database, property_to_predict='aux_energy')

        # calculate uncertainties for everything that was predicted (a molecule, a database, or both)
        molecules = []
        if molecular_database is not None:
            molecules.extend(molecular_database)
        if molecule is not None:
            molecules.append(molecule)

        for mol in molecules:
            # the UQ is the disagreement between the main and auxiliary energies
            mol.uq = abs(mol.energy - mol.aux_energy)
            # no molecule can be flagged before the threshold has been determined
            mol.uncertain = self.uq_threshold is not None and mol.uq > self.uq_threshold

    # These properties are used by some internal AL routines, e.g., when making predictions in parallel (if nthreads is not set properly, AL may slow down significantly!)
    @property
    def nthreads(self):
        return self.main_model.nthreads

    @nthreads.setter
    def nthreads(self, value):
        self.main_model.nthreads = value
        self.aux_model.nthreads = value

ml.al(
    ...
    ml_model = my_model,
    # pass the class itself, not an instance such as my_model(...); to pass arguments, use ml_model_kwargs:
    ml_model_kwargs = {...}, # there is no need to include 'al_info': it is added automatically. If you supply an 'al_info' key, it overwrites the default one, so do this only if you know what you are doing.
    ...
)

As the example shows, it is helpful (but not required) for the __init__ and train methods of the ML model class to also accept the al_info parameter, which is used to pass information from one routine to another during active learning.
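In my_model, the uncertainty of a prediction is the absolute difference between the main and auxiliary energies, and a molecule is flagged as uncertain when it exceeds a threshold set to the median plus three median absolute deviations (MAD) of the validation-set uncertainties. The dependency-free sketch below reproduces that logic on plain numbers; it assumes stats.calc_median_absolute_deviation is the unscaled MAD, median(|x - median(x)|), which may differ from MLatom's implementation (e.g., by a scale factor). All function names here are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Unscaled MAD: median of |x - median(x)| (assumed definition)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def uq_threshold(uqs: Sequence[float], n_mad: float = 3.0) -> float:
    """Threshold = median(uqs) + n_mad * MAD(uqs), as in my_model.train."""
    return median(uqs) + n_mad * median_absolute_deviation(uqs)


def flag_uncertain(main_energies: Sequence[float], aux_energies: Sequence[float],
                   threshold: float) -> List[bool]:
    """A molecule is uncertain if |E_main - E_aux| exceeds the threshold."""
    return [abs(e_main - e_aux) > threshold
            for e_main, e_aux in zip(main_energies, aux_energies)]


# uncertainties (Hartree) on a toy validation set
val_uqs = [0.001, 0.002, 0.002, 0.003, 0.010]
thr = uq_threshold(val_uqs)  # approximately 0.005
print(flag_uncertain([-1.000, -1.020], [-1.001, -1.000], thr))
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold robust to a few very uncertain validation points, such as the 0.010 value above.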

Example

Here is a realistic example of the sampler function used in physics-informed active learning:

import inspect
import os
import mlatom as ml

def my_sampler(al_info=None, ml_model=None, initcond_sampler=None, initcond_sampler_kwargs=None, maximum_propagation_time=1000, time_step=0.1, ensemble='NVE', thermostat=None, dump_trajs=False, dump_trajectory_interval=None, stop_function=None, batch_parallelization=True):
    if al_info is None:
        al_info = {}
    # copy so that the caller's dictionary is not modified
    initcond_sampler_kwargs = dict(initcond_sampler_kwargs or {})

    moldb2label = ml.data.molecular_database()

    # generate initial conditions
    if isinstance(initcond_sampler, str):
        if initcond_sampler.casefold() in ['wigner', 'harmonic-quantum-boltzmann']:
            # store the method name before replacing it with the generator function
            initcond_sampler_kwargs['generation_method'] = initcond_sampler.casefold()
            initcond_sampler = ml.generate_initial_conditions
    # pass al_info only if the sampler accepts it
    if 'al_info' in inspect.signature(initcond_sampler).parameters:
        initial_molecular_database = initcond_sampler(al_info=al_info, **initcond_sampler_kwargs)
    else:
        initial_molecular_database = initcond_sampler(**initcond_sampler_kwargs)

    # directory for dumped trajectories
    if 'working_directory' in al_info.keys():
        dirname = f"{al_info['working_directory']}/trajs"
    else:
        dirname = 'trajs'
    if dump_trajs:
        os.makedirs(dirname, exist_ok=True)

    # run MD in parallel to collect uncertain points
    if batch_parallelization: # faster way to propagate many trajectories with ML
        dyn = ml.md_parallel(model=ml_model,
                             molecular_database=initial_molecular_database,
                             ensemble=ensemble,
                             thermostat=thermostat,
                             time_step=time_step,
                             maximum_propagation_time=maximum_propagation_time,
                             dump_trajectory_interval=dump_trajectory_interval,
                             stop_function=stop_function)
        trajs = dyn.molecular_trajectory
        for itraj in range(len(trajs.steps[0])):
            print(f"Trajectory {itraj} number of steps: {trajs.traj_len[itraj]}")
            # the last step of each trajectory is where it stopped
            last_mol = trajs.steps[trajs.traj_len[itraj]][itraj]
            if last_mol.uncertain:
                print(f'Adding molecule from trajectory {itraj} at time {trajs.traj_len[itraj]*time_step} fs')
                moldb2label.molecules.append(last_mol)

            # Dump traj
            if dump_trajs:
                traj = ml.data.molecular_trajectory()
                for istep in range(trajs.traj_len[itraj]+1):
                    step = ml.data.molecular_trajectory_step()
                    step.step = istep
                    step.time = istep * time_step
                    step.molecule = trajs.steps[istep][itraj]
                    traj.steps.append(step)
                traj.dump(f"{dirname}/traj{itraj}.h5", format='h5md')
    else:
        # the molecules are supplied through run_in_parallel's molecular_database argument,
        # as in the surface-hopping example, so they are not part of the task kwargs
        md_kwargs = {
                    'model': ml_model,
                    'time_step': time_step,
                    'maximum_propagation_time': maximum_propagation_time,
                    'ensemble': ensemble,
                    'thermostat': thermostat,
                    'dump_trajectory_interval': dump_trajectory_interval,
                    'stop_function': stop_function
                    }
        dyns = ml.simulations.run_in_parallel(molecular_database=initial_molecular_database,
                                            task=ml.md,
                                            task_kwargs=md_kwargs,
                                            create_and_keep_temp_directories=False)
        trajs = [d.molecular_trajectory for d in dyns]
        for itraj, traj in enumerate(trajs, start=1):
            print(f"Trajectory {itraj} number of steps: {len(traj.steps)}")
            if traj.steps[-1].molecule.uncertain:
                print('Adding molecule from trajectory %d at time %.2f fs' % (itraj, traj.steps[-1].time))
                moldb2label.molecules.append(traj.steps[-1].molecule)

            # Dump traj
            if dump_trajs:
                traj.dump(f"{dirname}/traj{itraj}.h5", format='h5md')
    # record where each molecule came from
    for mol in moldb2label:
        mol.sampling = 'md'
    return moldb2label

ml.al(
    ...
    sampler=my_sampler,
    sampler_kwargs={'time_step': 0.5},
    ...
)
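The core of my_sampler is: propagate each trajectory until it reaches an uncertain geometry (via stop_function) or the maximum time, then send the last geometry to labeling only if it is flagged as uncertain. The toy sketch below shows that selection logic in plain Python; frames are hypothetical dictionaries with a precomputed 'uq' value instead of real MD steps, and neither function is part of MLatom.

```python
from typing import Iterable, List, Optional, Tuple


def run_until_uncertain(frames: Iterable[dict], threshold: float,
                        time_step: float) -> Optional[Tuple[float, dict]]:
    """Toy stand-in for one trajectory that stops at the first uncertain frame.

    Returns (stop_time, frame) for an uncertain frame, or None if the
    trajectory ended without becoming uncertain.
    """
    for istep, frame in enumerate(frames):
        if frame['uq'] > threshold:
            return istep * time_step, frame
    return None


def collect_points_to_label(trajectories: Iterable[Iterable[dict]], threshold: float,
                            time_step: float) -> List[dict]:
    """Keep only the stopping frames of trajectories that became uncertain."""
    to_label = []
    for frames in trajectories:
        result = run_until_uncertain(frames, threshold, time_step)
        if result is not None:
            stop_time, frame = result
            frame['time'] = stop_time
            frame['sampling'] = 'md'  # record the source, as my_sampler does
            to_label.append(frame)
    return to_label


trajectories = [
    [{'uq': 0.001}, {'uq': 0.002}],                # stays certain: nothing added
    [{'uq': 0.001}, {'uq': 0.009}, {'uq': 0.020}],  # stops at step 1
]
selected = collect_points_to_label(trajectories, threshold=0.005, time_step=0.5)
print(selected)
```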

Analysis

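!---------------------------------------------------------------------------!
! xyz: different operations on XYZ coordinates                              !
! Implementations by: Fuchun Ge                                             !
! To-do: implement permutation (Hungarian algorithm)                        !
!---------------------------------------------------------------------------!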